Publication: A causal framework for explaining the predictions of black-box sequence-to-sequence models
Date
2017-11-14
Published Version
Publisher
Association for Computational Linguistics
Citation
D. Alvarez-Melis and T. S. Jaakkola. "A causal framework for explaining the predictions of black-box sequence-to-sequence models." In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, September 2017. Association for Computational Linguistics, 2017.
Abstract
We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
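The perturb-query-score pipeline the abstract describes might be sketched as follows. This is a toy illustration, not the paper's method: the paper perturbs inputs with a variational autoencoder and selects components by solving a graph-partitioning problem, whereas this stand-in uses random word dropout and a simple difference-of-conditional-frequencies score for each input-output token pair. All function names here (`toy_model`, `perturb`, `dependency_scores`) are hypothetical.

```python
import random
from collections import defaultdict

def toy_model(tokens):
    # Stand-in black box: uppercases tokens and drops "the".
    return [t.upper() for t in tokens if t != "the"]

def perturb(tokens, n=200, p=0.3, rng=None):
    # Crude perturbation by random word dropout (the paper instead
    # samples semantically meaningful neighbors from a VAE).
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n):
        kept = [t for t in tokens if rng.random() > p] or tokens
        samples.append(kept)
    return samples

def dependency_scores(x_tokens, model, n=200):
    # Estimate the dependency between each input token i and output
    # token o: how much more often does o appear in the output when i
    # was kept in the perturbed input than when i was dropped?
    y_vocab = set(model(x_tokens))
    # (i kept & o seen), (i kept), (i dropped & o seen), (i dropped)
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for px in perturb(x_tokens, n=n):
        py = set(model(px))
        kept = set(px)
        for i in x_tokens:
            for o in y_vocab:
                c = counts[(i, o)]
                if i in kept:
                    c[1] += 1
                    c[0] += o in py
                else:
                    c[3] += 1
                    c[2] += o in py
    scores = {}
    for (i, o), (a, b, c, d) in counts.items():
        p_kept = a / b if b else 0.0
        p_drop = c / d if d else 0.0
        scores[(i, o)] = p_kept - p_drop
    return scores
```

On a toy input such as `["the", "cat", "sat"]`, the score for `("cat", "CAT")` is near 1 while cross pairs like `("cat", "SAT")` score near 0, so thresholding or partitioning the resulting bipartite graph recovers the causally related token groups.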
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.