Publication:

TRIM: Text Replacement for Interpreting Models, A Novel Approach for Interpreting Text Classifiers

Date

2023-06-30

Citation

Haidar, Masaoud. 2023. TRIM: Text Replacement for Interpreting Models, A Novel Approach for Interpreting Text Classifiers. Bachelor's thesis, Harvard College.

Abstract

Current local model-agnostic interpretation algorithms perturb text data by deleting random words and measuring how much each deletion changes the classifier's output. We show that this random deletion breaks the grammar and structure of the text and produces data that is out of distribution for the classifier. As an alternative, we propose TRIM: Text Replacement for Interpreting Models. Rather than deleting words, TRIM replaces them with "neutral" words that fit into the same text, and uses the resulting change in output to measure the contribution of the original words.

We train a classifier on two different classification tasks (Sentiment Analysis and Question-Answering Classification) and interpret the classifier using current algorithms and TRIM. We show that TRIM is better at estimating word contribution in complex contexts and in the presence of multiple important words. In addition, we use QUACKIE to evaluate the two interpretation algorithms, finding that TRIM outperforms the baseline in most settings and improves accuracy by up to 4.5%.
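
The contrast between deletion-based and replacement-based perturbation described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the scorer `toy_score`, the `LEXICON` weights, and the `NEUTRAL` lookup table are all hypothetical stand-ins, and the abstract does not say how TRIM actually selects its "neutral" words. The toy scorer is deliberately brittle (a "not" only affects the very next token) so that the effect of breaking sentence structure is visible.

```python
from typing import Callable, Dict, List

# Hypothetical sentiment weights; a stand-in for a trained classifier.
LEXICON: Dict[str, float] = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

# Hypothetical neutral substitutes. How TRIM really picks neutral words
# is not specified in the abstract; a fixed table is assumed here.
NEUTRAL: Dict[str, str] = {"not": "just", "very": "quite", "good": "typical"}


def toy_score(tokens: List[str]) -> float:
    """Toy scorer: sums lexicon weights; 'not' flips only the next token."""
    total = 0.0
    negate = False
    for tok in tokens:
        if tok == "not":
            negate = True
            continue
        weight = LEXICON.get(tok, 0.0)
        total += -weight if negate else weight
        negate = False  # negation reaches only the immediately following token
    return total


def neutral_for(token: str) -> str:
    """Return the assumed neutral substitute, or the token itself if none."""
    return NEUTRAL.get(token, token)


def deletion_contributions(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[float]:
    """Contribution of each word = score drop when that word is deleted."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def replacement_contributions(
    tokens: List[str],
    score: Callable[[List[str]], float],
    neutral: Callable[[str], str],
) -> List[float]:
    """Contribution of each word = score drop when it is swapped for a neutral word."""
    base = score(tokens)
    return [
        base - score(tokens[:i] + [neutral(tok)] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    sentence = ["not", "very", "good"]
    print("deletion:   ", deletion_contributions(sentence, toy_score))
    print("replacement:", replacement_contributions(sentence, toy_score, neutral_for))
```

On this toy input, deletion credits "very" with a large contribution only because removing it places "not" directly before "good", whereas replacing it with a neutral word keeps the sentence shape intact and leaves the score unchanged. The example says nothing about how real classifiers behave; it only shows the mechanics of the two perturbation styles under the stated assumptions.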

Keywords

Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
