Publication:

Optimizing the Infidelity, Sensitivity, and Complexity of Feature Importance Explanations for Machine Learning Models

Date

2025-05-22

The Harvard community has made this article openly available.

Citation

Du, Dennis. 2025. Optimizing the Infidelity, Sensitivity, and Complexity of Feature Importance Explanations for Machine Learning Models. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Interpretable machine learning aims to bridge the gap between complex model predictions and human understanding. Because the field is open-ended, there are many methods for achieving interpretability and many metrics for evaluating the quality of explanations. In this thesis, we survey existing work in the field and focus on three main types of metrics, which we refer to as infidelity, sensitivity, and complexity. We explore a novel framework for interpretability that balances these three objectives through a single metric of explanation quality, so that explanations accurately capture the model's behavior, remain stable across similar inputs, and avoid being needlessly hard to interpret. Specifically, we consider explanations that assign an importance score to each feature of the input. We compute the optimal explanation by minimizing this metric with an algorithm we introduce, based on coordinate descent and the Adam optimizer. We implement this algorithm and evaluate our approach on a neural network trained on the MNIST dataset.
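
The abstract does not give the exact form of the combined metric or of the optimization algorithm, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a perturbation-based squared-error infidelity term, a sensitivity term that penalizes differences between the explanations of nearby inputs, and an L1 complexity term, combined with hypothetical weights `lam_s` and `lam_c`. The toy `model`, its weights `W`, and all function names are illustrative assumptions. Each input's explanation is treated as one block and updated in turn with an Adam step while the others are held fixed.

```python
import math
import random

W = (1.5, -2.0, 0.0)  # hypothetical weights; the third feature is irrelevant

def model(x):
    """Toy stand-in for the trained network: a fixed nonlinear scalar function."""
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))

def sample_perturbations(d, n=64, scale=0.1, seed=0):
    """Draw n Gaussian perturbation vectors of dimension d (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n)]

def block_loss_grad(e, x, other_expls, perturbs, lam_s, lam_c):
    """Loss and gradient with respect to one explanation e, others held fixed."""
    d = len(e)
    loss, grad = 0.0, [0.0] * d
    # Infidelity (assumed form): mean squared gap between I.e and the
    # change in model output when the input is perturbed by I.
    for I in perturbs:
        x_pert = [xi - Ii for xi, Ii in zip(x, I)]
        r = sum(Ii * ei for Ii, ei in zip(I, e)) - (model(x) - model(x_pert))
        loss += r * r / len(perturbs)
        for j in range(d):
            grad[j] += 2.0 * r * I[j] / len(perturbs)
    # Sensitivity (assumed form): squared distance to explanations of nearby inputs.
    for other in other_expls:
        for j in range(d):
            diff = e[j] - other[j]
            loss += lam_s * diff * diff
            grad[j] += 2.0 * lam_s * diff
    # Complexity (assumed form): L1 norm, with subgradient 0 at exactly 0.
    for j in range(d):
        loss += lam_c * abs(e[j])
        grad[j] += lam_c * ((e[j] > 0) - (e[j] < 0))
    return loss, grad

def adam_step(e, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam update of e using the moment estimates in state."""
    state["t"] += 1
    t = state["t"]
    for j in range(len(e)):
        state["m"][j] = b1 * state["m"][j] + (1 - b1) * grad[j]
        state["v"][j] = b2 * state["v"][j] + (1 - b2) * grad[j] ** 2
        m_hat = state["m"][j] / (1 - b1 ** t)
        v_hat = state["v"][j] / (1 - b2 ** t)
        e[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def explain(points, lam_s=0.1, lam_c=0.001, sweeps=200):
    """Block coordinate descent: cycle over inputs, one Adam step per block."""
    d = len(points[0])
    perturbs = sample_perturbations(d)
    expls = [[0.0] * d for _ in points]
    states = [{"t": 0, "m": [0.0] * d, "v": [0.0] * d} for _ in points]
    for _ in range(sweeps):
        for k, x in enumerate(points):
            others = [expls[m] for m in range(len(points)) if m != k]
            _, g = block_loss_grad(expls[k], x, others, perturbs, lam_s, lam_c)
            adam_step(expls[k], g, states[k])
    return expls

# Two nearby inputs; each row of the result holds one importance score per feature.
points = [[0.2, 0.1, 0.5], [0.25, 0.1, 0.45]]
explanations = explain(points)
```

In this toy setup the three terms pull in different directions: infidelity pushes each score toward the model's local slope, sensitivity pulls the two explanations toward each other, and the L1 term shrinks scores toward zero. The weights `lam_s` and `lam_c` control that trade-off and would need tuning for a real model such as an MNIST classifier.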

Keywords

Computer science, Artificial intelligence

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
