Publication:

Building Features in Visual Neural Networks

Date

2024-05-13

The Harvard community has made this article openly available.

Citation

Hamblin, Christopher J. 2024. Building Features in Visual Neural Networks. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Deep neural networks have emerged as the dominant class of models of the human visual system; however, the algorithms they implement are often thought to be inscrutable. In this dissertation, we push back against this characterization, particularly as it applies to neural networks for object classification. Analogous to the ventral visual stream, these models are thought to compute a hierarchy of feature detectors, with the features in each layer computed as a function of the features represented in the previous layer. Features in early layers are thought to detect simple things, like colors and edges, while those in deep layers might detect complex constructs, like 'Junco feathers'. A complete understanding of object recognition requires that we understand not only what features are represented in this hierarchy, but also how rich features are computed through the composition of simpler constituents. In three chapters, we develop several 'interpretability' tools for deep neural networks, aimed at providing visually intuitive explanations of feature construction. In Chapter 1, we tackle a major obstacle to our understanding of feature construction: features are embedded in large networks with many parameters. Consequently, when one considers only the network's architecture, the function that computes any given feature also has many parameters. We propose a technique for 'circuit pruning', which specifies a sparse route through the network by which a given feature computes its response to one or more input images. In Chapter 2, we propose a novel technique for visualizing what a feature responds to in a particular input image, which we call 'feature accentuation'. Typically, explanations of feature responses to individual images rely on attribution maps, which are displayed as heatmaps over the image that highlight the most exciting regions. However, an explanation of where the important regions are located is insufficient; what does the feature see in these locations? With feature accentuation, we exaggerate the expression of a feature in a given input using gradient-based activation maximization, revealing that even when two features are excited by the same region of an image, it is often for very different reasons. In Chapter 3, we address the functional role of feature inhibition; that is, what are the mechanisms by which the model ensures that images do not express a given feature? Inhibition has received far less treatment in the literature than excitation, yet it is critical for the construction of discriminative features. We observe that standard interpretability tools are not immediately suited to the inhibitory case, given the asymmetry introduced by the ReLU activation function. We therefore propose that inhibition be understood through the study of 'maximally tense images', i.e., images that excite and inhibit a given feature simultaneously.
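The general idea behind seeding activation maximization with an input image, as the abstract describes for feature accentuation, can be illustrated with a toy sketch. This is not the dissertation's implementation: the linear-ReLU "feature", the L2 penalty that keeps the result near the seed image, the function names, and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def feature_response(x, w, b=0.0):
    """Toy feature: ReLU applied to a linear readout of a flattened image."""
    return max(0.0, float(w @ x + b))

def accentuate(x0, w, steps=100, lr=0.05, reg=0.1):
    """Gradient ascent on the feature's pre-activation, seeded at image x0.

    The penalty reg * ||x - x0||^2 is an assumed regularizer that keeps the
    result close to the seed image. Ascending the pre-activation rather than
    the ReLU output avoids a zero gradient when the feature starts inactive.
    """
    x = x0.astype(float).copy()
    for _ in range(steps):
        # Gradient of (w @ x) - reg * ||x - x0||^2 with respect to x.
        grad = w - 2.0 * reg * (x - x0)
        x += lr * grad
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=16)    # stand-in for a flattened input image
w_a = rng.normal(size=16)   # hypothetical feature A
w_b = rng.normal(size=16)   # hypothetical feature B

acc_a = accentuate(x0, w_a)  # the same image, accentuated for feature A
acc_b = accentuate(x0, w_b)  # the same image, accentuated for feature B
```

Starting from the same seed image, the two hypothetical features pull it in different directions, which is a simplified version of the point that two features can respond to the same input for different reasons. In a real network the gradient would come from automatic differentiation through the model rather than from a closed-form expression.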

Keywords

Convolutional Neural Networks, Explainable AI, Feature Visualization, Mechanistic Interpretability, Neural Circuits, Neural Network Interpretability, Artificial intelligence, Psychology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
