Publication: Explaining Explanations and Perturbing Perturbations
Date
2020-06-17
Authors
Jia, Emily
Citation
Jia, Emily. 2020. Explaining Explanations and Perturbing Perturbations. Bachelor's thesis, Harvard College.
Abstract
The impressive performance of machine learning models on prediction and classification tasks has sparked interest in deploying them to make high-stakes decisions in domains such as health care and criminal justice. Because of the complexity and opacity of these models, and the sensitive nature of the tasks, explanation methods are needed to help identify bias and undesired behavior. LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) are two widely used explanation methods that can explain the local behavior of any black-box model on a single data instance.
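For concreteness, here is a minimal sketch of that usage, assuming the open-source `lime` and `shap` Python packages and a synthetic random-forest classifier (none of this code is from the thesis):

```python
# Minimal sketch (assumed setup, not from the thesis): explaining a single
# instance of a black-box classifier with LIME and Kernel SHAP.
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
instance = X[0]

# LIME: fit a weighted linear surrogate on perturbed samples around `instance`.
lime_explainer = LimeTabularExplainer(X, mode="classification")
lime_exp = lime_explainer.explain_instance(instance, model.predict_proba,
                                           num_features=5)
print(lime_exp.as_list())

# Kernel SHAP: the same surrogate family, weighted by the Shapley kernel.
shap_explainer = shap.KernelExplainer(model.predict_proba, shap.sample(X, 50))
print(shap_explainer.shap_values(instance))
```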
The first part of the thesis establishes the theoretical relationship between LIME and Kernel SHAP. We prove that Kernel SHAP is an instance of LIME, and the unique one satisfying three desirable properties for feature attributions: local accuracy, missingness, and consistency. As the name Kernel SHAP suggests, the proof relies on a solution concept from cooperative game theory called the "Shapley value."

The second part of the thesis demonstrates significant vulnerabilities in LIME and Kernel SHAP that make them unreliable for bias detection. We present a scaffolded model that adversarially attacks LIME or Kernel SHAP to produce any desired explanation. The consistency of this attack across three datasets (COMPAS, Communities and Crime, German Credit) demonstrates that highly biased classifiers can fool perturbation-based explanation methods such as LIME and Kernel SHAP into producing innocuous explanations.
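For reference, the kernel that makes LIME coincide with Kernel SHAP is the Shapley kernel of Lundberg and Lee (2017); with this weighting, the surrogate's coefficients are exactly the Shapley values:

```latex
% Shapley kernel: M = number of simplified features,
% |z'| = number of nonzero entries in the coalition z'.
\[
  \pi_{x'}(z') \;=\; \frac{M - 1}{\binom{M}{|z'|}\,|z'|\,(M - |z'|)}
\]
% The resulting attributions are the Shapley values:
\[
  \phi_i \;=\; \sum_{z' \subseteq x'} \frac{|z'|!\,(M - |z'| - 1)!}{M!}
  \left[ f_x(z') - f_x(z' \setminus i) \right]
\]
```

The scaffolding attack can also be illustrated with a short sketch under stated assumptions (this is not the thesis's code): `is_perturbation` stands in for an out-of-distribution detector trained to distinguish real data from explainer perturbation samples. On real inputs the scaffold defers to the biased model, while explainer queries are routed to an innocuous one.

```python
# Hypothetical sketch of a scaffolded model in the spirit of the attack above;
# `is_perturbation` is an assumed OOD detector returning a boolean mask.
import numpy as np

class ScaffoldedModel:
    """Acts like `biased` on real data, like `innocuous` on explainer queries."""

    def __init__(self, biased, innocuous, is_perturbation):
        self.biased = biased                # e.g., depends on a sensitive feature
        self.innocuous = innocuous          # e.g., depends on unrelated features
        self.is_perturbation = is_perturbation

    def predict_proba(self, X):
        X = np.atleast_2d(X)
        mask = self.is_perturbation(X)      # True for off-manifold perturbed rows
        out = np.empty((len(X), 2))
        if mask.any():
            out[mask] = self.innocuous.predict_proba(X[mask])
        if (~mask).any():
            out[~mask] = self.biased.predict_proba(X[~mask])
        return out
```

Because LIME and Kernel SHAP only ever query the model on perturbed, largely off-manifold samples, the explanations they return reflect the innocuous model even though every real prediction comes from the biased one.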
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.