Publication: Multiplicative Feature-Based Attention for Transfer Learning in Deep Convolutional Neural Networks
Abstract
Recent years have seen dramatic progress in the use of deep convolutional neural networks (CNNs) to solve problems in computer vision. Typically, CNNs for computer vision are trained to perform a single task, such as image classification, with a fixed set of target image categories. In contrast, our own brains must flexibly solve many different tasks using the same neuronal “hardware.” One mechanism that may contribute to this flexibility is attention, which allows the brain to dynamically weight neural representations in a top-down, task-dependent way. While some past efforts have explored adding attention to deep neural networks (Mnih et al., 2014; Xu et al., 2016), these have mostly focused on spatial attention, which directs processing toward specific locations in space. Here, we explore feature-based attention, in which attention amplifies particular task-relevant feature detectors rather than spatial locations. We investigate feature-based attention in neural networks in the context of transfer learning. A CNN is first trained to perform a reference task; next, a multiplicative weighting function that amplifies selected filters is learned to improve performance on a new task. Because this weighting function has relatively few parameters, it can be learned quickly, yielding rapid improvements in performance on the new task. Consistent with our expectations, we find that the filters with the highest initial discriminative ability are amplified the most, and we analyze which parts of the new-task images are amplified most. This work has the potential both to advance practical methods for rapid transfer learning and to provide insights into how feature-based attention might operate in the brain.
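The abstract describes the mechanism only at a high level. As a rough illustration, the NumPy sketch below shows one way a per-filter multiplicative gain could be learned on top of a frozen network while every original weight stays fixed. It is not the authors' implementation, and several assumptions are made for the sake of a small, runnable example: the "pretrained CNN" is a single random ReLU layer whose channels are reduced to one pooled value each, the new task is a synthetic binary problem, the readout is a fixed sum over gain-weighted channels, and only the gains plus one bias are fit with logistic loss under a non-negativity constraint. All function names (`make_frozen_features`, `attended_score`, `train_gains`, and so on) are hypothetical.

```python
import numpy as np


def make_frozen_features(x, w_frozen):
    """Hypothetical stand-in for a pretrained CNN layer: fixed filters, ReLU,
    one (pooled) activation per channel. These weights are never updated."""
    return np.maximum(x @ w_frozen, 0.0)


def attended_score(h, gains, bias):
    """Feature-based attention as a per-channel multiplicative gain, followed by
    a fixed sum readout (the sum readout is an assumption of this sketch)."""
    return (h * gains).sum(axis=1) + bias


def bce_loss(h, y, gains, bias):
    """Binary cross-entropy written as log(1 + e^z) - y*z for numerical stability."""
    z = attended_score(h, gains, bias)
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def init_params(h):
    """Start from 'no attention' (all gains 1) with the threshold centred on the data."""
    gains = np.ones(h.shape[1])
    bias = -float(h.sum(axis=1).mean())
    return gains, bias


def train_gains(h, y, steps=500, lr=0.1):
    """Projected gradient descent on the gains and bias only; the filters stay frozen."""
    n = h.shape[0]
    gains, bias = init_params(h)
    for _ in range(steps):
        z = attended_score(h, gains, bias)
        err = 1.0 / (1.0 + np.exp(-z)) - y     # per-example dLoss/dz (before averaging)
        gains = gains - lr * (h.T @ err) / n   # gradient w.r.t. each channel's gain
        bias = bias - lr * float(err.mean())
        gains = np.maximum(gains, 0.0)         # a gain can amplify or suppress a channel, never flip it
    return gains, bias


rng = np.random.default_rng(0)
n, d, c = 400, 20, 32
w_frozen = rng.normal(scale=1.0 / np.sqrt(d), size=(d, c))  # "reference task" filters (random here)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
y = rng.integers(0, 2, size=n).astype(float)                 # synthetic new-task labels
x = rng.normal(size=(n, d)) + 2.0 * y[:, None] * direction   # class 1 shifted along one direction
h = make_frozen_features(x, w_frozen)

loss_before = bce_loss(h, y, *init_params(h))
gains, bias = train_gains(h, y)
loss_after = bce_loss(h, y, gains, bias)

# Per-channel discriminability (d') on the new task, measured without attention.
h1, h0 = h[y == 1], h[y == 0]
d_prime = (h1.mean(axis=0) - h0.mean(axis=0)) / np.sqrt(0.5 * (h1.var(axis=0) + h0.var(axis=0)))

n_trainable = gains.size + 1   # one gain per channel plus the bias
n_frozen = w_frozen.size       # filter weights left untouched
print(f"trainable: {n_trainable}, frozen: {n_frozen}")
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
print(f"corr(d', gain): {np.corrcoef(d_prime, gains)[0, 1]:.2f}")
```

Only c + 1 values are trained here versus d × c frozen filter weights, which mirrors the point that a multiplicative weighting function is cheap to learn. The last printed line is a simple diagnostic for the reported relationship between a filter's initial discriminability and how much it is amplified; in this toy setup that relationship is not guaranteed and should not be read as a reproduction of the paper's result.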