Publication:

A causal framework for explaining the predictions of black-box sequence-to-sequence models

Date

2017-11-14

Published Version

Publisher

Association for Computational Linguistics
The Harvard community has made this article openly available.

Citation

D. Alvarez-Melis and T. S. Jaakkola. "A causal framework for explaining the predictions of black-box sequence-to-sequence models". In: Proc. 2017 Conference on Empirical Methods in Natural Language Processing. EMNLP (Copenhagen, Denmark, Sept. 2017). Association for Computational Linguistics, 2017.

Abstract

We interpret the predictions of any black-box structured input-structured output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We focus the general approach on sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
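The abstract's pipeline — perturb the input, query the black-box model, and infer which input tokens causally drive which output tokens — can be illustrated with a minimal sketch. This is not the paper's implementation: `toy_model` is a hypothetical stand-in for any opaque seq2seq model, random token deletion replaces the paper's VAE-based perturbations, and the graph-partitioning step that selects relevant components is omitted. Only the query-and-associate logic is shown.

```python
import random

def perturb(tokens, n_samples=50, p_drop=0.3, seed=0):
    """Generate perturbed inputs by randomly dropping tokens.
    (The paper uses a variational autoencoder to produce semantically
    meaningful perturbations; random deletion is a simpler stand-in.)"""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        kept = [t for t in tokens if rng.random() > p_drop]
        samples.append(kept or tokens)  # never query with an empty input
    return samples

def toy_model(tokens):
    """Hypothetical black-box model: 'translates' each token by reversing
    it. Stands in for any model we can only query, not inspect."""
    return [t[::-1] for t in tokens]

def dependency_scores(x_tokens, model, n_samples=100):
    """For each (input token, output token) pair, compare how often the
    output token appears when the input token is present vs. absent across
    the perturbed queries. Large differences suggest a causal dependency;
    these scores would form the edge weights of the token graph."""
    y_tokens = model(x_tokens)
    # counts[(xi, yj)] = [co-occurrences, xi present, co-occ. w/o xi, xi absent]
    counts = {(xi, yj): [0, 0, 0, 0] for xi in x_tokens for yj in y_tokens}
    for xs in perturb(x_tokens, n_samples):
        ys = set(model(xs))
        present = set(xs)
        for xi in x_tokens:
            for yj in y_tokens:
                if xi in present:
                    counts[(xi, yj)][1] += 1
                    counts[(xi, yj)][0] += yj in ys
                else:
                    counts[(xi, yj)][3] += 1
                    counts[(xi, yj)][2] += yj in ys
    scores = {}
    for (xi, yj), (co, p, co_без, a) in counts.items():
        p_on = co / p if p else 0.0
        p_off = co_без / a if a else 0.0
        scores[(xi, yj)] = p_on - p_off  # association strength
    return scores
```

Because the toy model translates token-by-token, the score for a matched pair such as ("hello", "olleh") comes out at 1.0, while unmatched pairs score lower — the kind of contrast the partitioning step would then exploit to group causally related tokens.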

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
