Publication: Towards an Artificial Visual System More Like the Primate Visual System
Date
2022-09-12
Authors
Dapello, Joel
The Harvard community has made this article openly available.
Citation
Dapello, Joel. 2022. Towards an Artificial Visual System More Like the Primate Visual System. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Current state-of-the-art object recognition models in computer vision are largely based on convolutional neural network (CNN) architectures, which are loosely inspired by the primate visual system. While many of these same CNNs have demonstrated remarkable accuracy as models of primate visual processing, a number of noteworthy discrepancies remain between these models and the behavior of primates on object recognition tasks. First, many current CNNs can be readily fooled by explicitly crafted image perturbations that carry no semantic meaning for humans, and they further struggle to recognize objects in noise-corrupted images that humans still recognize easily. More generally, it has been shown that even the best CNN models of primate visual processing do not align well with the image-by-image behavioral error patterns observed in humans. Taken together, these discrepancies point to critical differences between the underlying computational algorithms used in CNN models of object recognition and those present in the primate visual system, and they hinder the deployment of CNN models in real-world applications. In this dissertation, I start by demonstrating a relationship between a model's ability to predict primate primary visual cortex firing rates and its robustness to small adversarial attacks. Following this, I develop a new hybrid CNN architecture, the VOneNet, with a front end simulating the known processing characteristics of the primate primary visual cortex followed by a standard downstream CNN architecture, and demonstrate that this new model has improved robustness to adversarial attacks and common image corruptions.
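The "explicitly crafted image perturbations" mentioned above can be illustrated with a minimal sketch of one standard gradient-based attack, the fast gradient sign method, applied to a toy linear scorer. The model, gradient, and epsilon value here are illustrative assumptions for exposition, not the specific attacks evaluated in the dissertation:

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon):
    """Fast gradient sign method: nudge every input dimension by
    epsilon in the direction that increases the loss."""
    return x + epsilon * np.sign(grad)

# Toy linear "model": score = w . x, so the gradient of the score
# with respect to the input x is simply w.
rng = np.random.default_rng(0)
w = rng.normal(size=100)   # hypothetical model weights
x = rng.normal(size=100)   # hypothetical input "image"

x_adv = fgsm_perturb(x, w, epsilon=0.05)
```

The per-dimension change is bounded by epsilon, yet the perturbation moves the score in the worst-case direction at every coordinate, which is why such small, semantically meaningless changes can flip a classifier's decision.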
After discovering that stochastic representations play a critical role in the robustness of the VOneNet, I use a recent mean-field theoretic manifold analysis technique to investigate how this mechanism provides protection, and demonstrate that adding stochastic representations to auditory models of word recognition also improves adversarial robustness. Finally, shifting focus, I develop a method to directly align CNN representations with the inferior temporal (IT) cortex, a high-level visual processing region of the primate visual system from which neural recordings have been shown to contain highly predictive information about an animal's behavior on object recognition tasks. I demonstrate that this method generates CNN models that are more similar to the primate inferior temporal cortex, even on never-before-seen images and held-out animals, and that this increase in IT similarity also correlates strongly with improvements in adversarial robustness and alignment with human behavioral error patterns. Overall, this dissertation demonstrates that by increasing the biological fidelity of CNN models, we not only make better models of the primate visual system, we also make better object recognition models in general, with improved robustness and alignment with human behavior.
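The stochastic representations described above can be sketched as a layer that replaces deterministic activations with Poisson samples, in the spirit of trial-to-trial neuronal variability. The rectification and rate interpretation here are illustrative assumptions, not the dissertation's exact implementation:

```python
import numpy as np

def stochastic_layer(activations, rng):
    """Treat rectified activations as mean firing rates and draw a
    Poisson spike count for each unit, so repeated presentations of
    the same input yield different representations."""
    rates = np.maximum(activations, 0.0)  # firing rates are non-negative
    return rng.poisson(rates).astype(float)

rng = np.random.default_rng(0)
acts = np.array([0.0, 1.0, 5.0, 20.0])  # hypothetical unit activations
noisy = stochastic_layer(acts, rng)
```

Because an attacker's gradient is estimated through a different noise draw than the one the model actually uses at decision time, this kind of stochasticity can degrade the precision of crafted perturbations.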
Keywords
Adversarial Attacks, Computer Vision, Machine Learning, Neural Representations, Primate Vision, Robustness, Neurosciences
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service