Publication:

Analyzing and Evaluating Post hoc Explanation Methods for Black Box Machine Learning

Date

2022-05-23

Published Version

Citation

Pombra, Javin. 2022. Analyzing and Evaluating Post hoc Explanation Methods for Black Box Machine Learning. Bachelor's thesis, Harvard College.

Abstract

Over the past decade, complex tools such as deep learning models have been increasingly employed in high-stakes domains such as healthcare and criminal justice. These models achieve state-of-the-art accuracy, but at the expense of interpretability. As a result, practitioners, end users, and regulators have expressed a strong desire for more widely available post hoc explanation methods, that is, ways to explain complex model architectures after the model is trained and deployed. Unfortunately, because the field of explainability is still young, there is little to no work comparing and analyzing the behavior of popular post hoc methods.

This work introduces the disagreement problem in explainable machine learning. Through a series of user studies and offline experiments, we establish that the most common post hoc methods deployed on tabular, vision, and language datasets exhibit significant disagreement. Having established this, we then aim to resolve the disagreement problem for graph neural networks and deep learning recommendation models. To this end, we formalize novel metrics to test the efficacy of explainability methods. Starting with graph neural networks, we show under which dataset and model conditions various post hoc explainability methods operate best. We then move to the recommendation modeling space, formulating explainability as a joint task of interpreting embedding layers and neural layers. In addition to presenting a novel method, we conduct offline and online experiments to determine which methods target users prefer.

Keywords

black box, explainability, graph neural network, interpretability, machine learning, recommendation model, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
