Publication: Analyzing and Evaluating Post hoc Explanation Methods for Black Box Machine Learning
Abstract
Over the past decade, complex tools such as deep learning models have been increasingly employed in high-stakes domains such as healthcare and criminal justice. These models achieve state-of-the-art accuracy, but at the expense of interpretability. As a result, practitioners, end users, and regulators have expressed a strong desire for more widely available post hoc explanation methods, that is, ways to explain complex model architectures after the model is trained and deployed. Unfortunately, because the field of explainability is still nascent, there is little to no work comparing and analyzing the behavior of popular post hoc methods.
This work introduces the disagreement problem in explainable machine learning. Through a series of user studies and offline experiments, we establish that the most common post hoc methods deployed on tabular, vision, and language datasets exhibit significant disagreement. Having established this, we then aim to resolve the disagreement problem for graph neural networks and deep learning recommendation models. To this end, we formalize novel metrics to test the efficacy of explainability methods. Starting with graph neural networks, we show under which dataset and model conditions various post hoc explainability methods perform best. We then move to the recommendation modeling space, formulating explainability as a joint task of interpreting embedding layers and neural layers. In addition to presenting a novel method, we conduct offline and online experiments to determine which methods target users prefer.