Publication: Building Features in Visual Neural Networks
Abstract
Deep neural networks have emerged as the dominant model class of the human visual system; however, the algorithms they implement are often thought to be inscrutable. In this dissertation, we push back against this characterization, particularly as it applies to neural networks for object classification. Analogous to the ventral visual stream, these models are thought to compute a hierarchy of feature detectors, with the features in each layer computed as a function of the features represented in the previous layer. Features in early layers are thought to detect simple things, like colors and edges, while those in deep layers might detect complex constructs, like 'Junco feathers'. A complete understanding of object recognition requires that we account not only for what features are represented in this hierarchy, but also for how rich features are computed through the composition of simpler constituents. In three chapters, we will develop several 'interpretability' tools for deep neural networks, aimed at providing visually intuitive explanations of feature construction. In Chapter 1, we will combat a major obstacle to our understanding of feature construction: features are embedded in large networks with many parameters. A consequence of this is that, when one simply considers the network's architecture, the function that computes any feature in the network also has many parameters. We propose a technique for 'circuit pruning', which specifies a sparse route through the network by which a given feature computes its response to one or more input images. In Chapter 2, we propose a novel technique for visualizing what a feature responds to in a particular input image, which we call 'feature accentuation'. Typically, explanations of feature responses to individual images rely on attribution maps, which are displayed as heatmaps over the image highlighting the most exciting regions. However, an explanation of where important regions are located is insufficient; what is it that the feature sees in these locations? With feature accentuation, we exaggerate the expression of a feature in a given input with gradient-based activation maximization, revealing that even when two features are excited by the same region of an image, it is often for very different reasons. In Chapter 3, we will address the functional role of feature inhibition; that is, what are the mechanisms by which the model ensures that images do not express a given feature? Inhibition has received far less treatment in the literature than excitation, yet it is critical for the construction of discriminative features. We observe that standard interpretability tools are not immediately suited to the inhibitory case, given the asymmetry introduced by the ReLU activation function. Given this, we propose that inhibition be understood through a study of 'maximally tense images', i.e., those images that excite and inhibit a given feature simultaneously.
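The ReLU asymmetry mentioned for Chapter 3 can be illustrated with a toy example. The sketch below is not taken from the dissertation: it assumes a single hypothetical feature defined as relu(w·x + b), with made-up weights and inputs, and compares the gradient of the feature with respect to the input when the feature is excited versus inhibited. For the inhibited input the post-ReLU gradient is all zeros, so a gradient-based attribution would carry no information about what suppressed the feature, whereas the pre-ReLU gradient still does.

```python
# Toy illustration (hypothetical feature, weights, and inputs; not the author's code).

def pre_activation(w, b, x):
    """Linear response w.x + b of the feature before the ReLU."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def relu(z):
    """Standard rectifier: passes positive values, clamps the rest to zero."""
    return z if z > 0.0 else 0.0

def grad_post_relu(w, b, x):
    """Gradient of relu(w.x + b) with respect to x.

    Equals w when the pre-activation is positive and zero otherwise.
    """
    z = pre_activation(w, b, x)
    return [wi if z > 0.0 else 0.0 for wi in w]

def grad_pre_relu(w):
    """Gradient of the pre-activation w.x + b with respect to x (always w)."""
    return list(w)

w = [1.0, -2.0]          # second input dimension inhibits the feature
b = 0.0
excited = [3.0, 1.0]     # pre-activation = 3 - 2 = 1
inhibited = [1.0, 3.0]   # pre-activation = 1 - 6 = -5

print(relu(pre_activation(w, b, excited)), grad_post_relu(w, b, excited))
print(relu(pre_activation(w, b, inhibited)), grad_post_relu(w, b, inhibited))
print(grad_pre_relu(w))
```

In this toy setting, both inputs share the same pre-ReLU gradient, but only the excited input yields a non-zero gradient after the ReLU, which is one way to read the claim that tools built for excitation do not transfer directly to inhibition.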