Join the session (up to 10 minutes early) | Read the paper
Understanding Adversarial Examples Requires A Theory Of Artifacts For Deep Learning
Related: How to connect to Neural Mechanism Webinars
We are excited about the next Neural Mechanisms webinar this Friday. As always, it is free. You can find information about how and when to join the webinar below or at the Neural Mechanisms website—where you can also sign up for the mailing list that notifies people about upcoming webinars, webconferences, and more!
Cameron Buckner (University of Houston)
16 April 2021
h14-16 Greenwich Mean Time / 16-18 CEST
(Convert to your local time here)
Abstract. Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably, a susceptibility to adversarial examples. Here, I review recent empirical research suggesting that deep neural networks may be detecting features in adversarial examples that are predictively useful, though inscrutable to humans. To understand the implications of this research, we should contend with some older philosophical puzzles about scientific reasoning, which can help us determine whether these features are reliable targets of scientific investigation or just the distinctive processing artifacts of deep neural networks.