Learning Interpretable and Bias-Free Models for Visual Question Answering
Citation: Grand, Gabriel. 2019. Learning Interpretable and Bias-Free Models for Visual Question Answering. Bachelor's thesis, Harvard College.
Abstract: Visual Question Answering (VQA) is an innovative test of artificial intelligence that challenges machines to answer human-generated questions about everyday images. Over the past several years, steadily ascending scores on VQA benchmarks have generated the impression of swift progress towards systems capable of human-level reasoning. However, closer inspection of current VQA datasets and models reveals serious methodological issues lurking behind the façade of progress. Many popular VQA datasets contain systematic language biases that enable models to cheat by answering questions “blindly” without considering visual context. Meanwhile, the predominant approach to VQA relies on black-box neural networks that render this kind of cheating hard to detect, and even harder to prevent.
In light of these issues, this work presents two sets of original research findings addressing the twin problems of interpretability and bias in VQA. We first aim to endow VQA models with the capacity to better explain their decisions by pointing to visual counterexamples. Our experiments suggest that VQA models overlook key semantic distinctions between visually similar images, indicating an over-reliance on language biases. Motivated by this result, we introduce a technique called adversarial regularization that is designed to mitigate the effects of language bias on learning. We demonstrate that adversarial regularization makes VQA models more robust to latent biases in the data, and improves their ability to generalize to new domains. Drawing on our findings, we recommend a set of design principles for future VQA benchmarks to promote the development of interpretable and bias-free models.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364587
- FAS Theses and Dissertations