Show simple item record

dc.contributor.author           Grand, Gabriel
dc.date.accessioned             2020-08-28T09:18:37Z
dc.date.created                 2019-03
dc.date.issued                  2019-08-23
dc.date.submitted               2019
dc.identifier.citation          Grand, Gabriel. 2019. Learning Interpretable and Bias-Free Models for Visual Question Answering. Bachelor's thesis, Harvard College.
dc.identifier.uri               https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364587
dc.description.abstract         Visual Question Answering (VQA) is an innovative test of artificial intelligence that challenges machines to answer human-generated questions about everyday images. Over the past several years, steadily ascending scores on VQA benchmarks have generated the impression of swift progress towards systems capable of human-level reasoning. However, closer inspection of current VQA datasets and models reveals serious methodological issues lurking behind the façade of progress. Many popular VQA datasets contain systematic language biases that enable models to cheat by answering questions “blindly” without considering visual context. Meanwhile, the predominant approach to VQA relies on black-box neural networks that render this kind of cheating hard to detect, and even harder to prevent. In light of these issues, this work presents two sets of original research findings addressing the twin problems of interpretability and bias in VQA. We first aim to endow VQA models with the capacity to better explain their decisions by pointing to visual counterexamples. Our experiments suggest that VQA models overlook key semantic distinctions between visually similar images, indicating an over-reliance on language biases. Motivated by this result, we introduce a technique called adversarial regularization that is designed to mitigate the effects of language bias on learning. We demonstrate that adversarial regularization makes VQA models more robust to latent biases in the data, and improves their ability to generalize to new domains. Drawing on our findings, we recommend a set of design principles for future VQA benchmarks to promote the development of interpretable and bias-free models.
dc.description.sponsorship      Computer Science
dc.format.mimetype              application/pdf
dc.language.iso                 en
dash.license                    LAA
dc.title                        Learning Interpretable and Bias-Free Models for Visual Question Answering
dc.type                         Thesis or Dissertation
dash.depositing.author          Grand, Gabriel
dc.date.available               2020-08-28T09:18:37Z
thesis.degree.date              2019
thesis.degree.grantor           Harvard College
thesis.degree.level             Undergraduate
thesis.degree.name              AB
dc.type.material                text
thesis.degree.department        Computer Science
thesis.degree.discipline-joint  Mind Brain Behavior
dash.identifier.vireo
dc.identifier.orcid             0000-0003-1920-0021
dash.author.email               gabrieljgrand@gmail.com
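
For context on the abstract above: a common formulation of adversarial regularization for VQA (in the style of Ramakrishnan et al., 2018) trains a question-only adversary to predict the answer from the question alone, while a gradient-reversal layer pushes the shared question encoder to discard the language biases the adversary exploits. The sketch below illustrates that idea only; all names (GradReverse, AdversariallyRegularizedVQA, q_feat, v_feat, lambd) and dimensions are illustrative assumptions, not the implementation used in the thesis.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; scales and negates gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the question features.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class AdversariallyRegularizedVQA(nn.Module):
    """Hypothetical VQA head with a question-only adversary.

    The main head answers from fused image+question features. The adversary
    answers from question features alone; gradient reversal means the
    adversary minimizes its loss while the upstream question encoder is
    updated to make question-only prediction harder, discouraging reliance
    on language priors.
    """
    def __init__(self, q_dim=512, v_dim=512, n_answers=1000, lambd=0.1):
        super().__init__()
        self.fusion = nn.Linear(q_dim + v_dim, 512)
        self.main_head = nn.Linear(512, n_answers)
        self.adversary = nn.Linear(q_dim, n_answers)  # question-only head
        self.lambd = lambd

    def forward(self, q_feat, v_feat):
        fused = torch.relu(self.fusion(torch.cat([q_feat, v_feat], dim=-1)))
        main_logits = self.main_head(fused)
        adv_logits = self.adversary(grad_reverse(q_feat, self.lambd))
        return main_logits, adv_logits

# Usage sketch (shapes and the upstream encoders are assumed):
model = AdversariallyRegularizedVQA()
q = torch.randn(8, 512)            # question features from some encoder
v = torch.randn(8, 512)            # image features from some encoder
y = torch.randint(0, 1000, (8,))   # answer labels
main_logits, adv_logits = model(q, v)
ce = nn.CrossEntropyLoss()
loss = ce(main_logits, y) + ce(adv_logits, y)
loss.backward()
```

The gradient-reversal layer is the key design choice here: a single backward pass lets the adversary head descend its own loss while the shared question encoder upstream receives the negated gradient, yielding the min-max training dynamic without a separate optimization loop.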

