Publication:
Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness-intervention Algorithms in Classification

Date

2023-06-30

Authors

He, Luxi
Citation

He, Luxi. 2023. Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness-intervention Algorithms in Classification. Bachelor's thesis, Harvard College.

Abstract

Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We show how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy under fairness constraints and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and to investigate fairness risks in data with missing values. In recent years, almost 400 machine learning fairness interventions have been proposed. Our approach helps characterize the best achievable performance for a given data distribution, as well as how far current fairness interventions fall short of this limit. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination, and hence there may be diminishing returns in continuing to develop algorithms of this kind. However, when data contains missing values, there is still significant room for improvement in handling aleatoric discrimination.
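
To make the two notions concrete, the sketch below (not taken from the thesis) approximates the fairness-constrained accuracy limit on a small, fully known discrete distribution by solving a linear program over randomized classifiers, then reads off the epistemic gap for a hypothetical trained model. The distribution P, the parity tolerance eps, and the model accuracy model_acc are illustrative assumptions, and the statistical-parity constraint stands in for whichever group-fairness constraint is imposed.

```python
import numpy as np
from scipy.optimize import linprog

# Toy joint distribution P[x, s, y] over a binary feature x, binary group s,
# and binary label y. The numbers are illustrative, not from the thesis.
P = np.array([
    [[0.15, 0.05], [0.10, 0.10]],   # x = 0: rows are s = 0, 1; columns y = 0, 1
    [[0.05, 0.20], [0.05, 0.30]],   # x = 1
])
P = P / P.sum()
n_x, n_s, _ = P.shape

# Decision variables q[x, s] = P(yhat = 1 | x, s): a randomized classifier,
# evaluated with perfect knowledge of the distribution (the "aleatoric" setting).
idx = lambda x, s: x * n_s + s
n_vars = n_x * n_s

# Accuracy = sum_{x,s} P[x,s,0] * (1 - q[x,s]) + P[x,s,1] * q[x,s]
#          = P[:,:,0].sum() + sum_{x,s} (P[x,s,1] - P[x,s,0]) * q[x,s].
c = np.zeros(n_vars)
for x in range(n_x):
    for s in range(n_s):
        c[idx(x, s)] = -(P[x, s, 1] - P[x, s, 0])   # linprog minimizes, so negate

# Statistical-parity constraint: |P(yhat=1 | s=0) - P(yhat=1 | s=1)| <= eps.
eps = 0.01
p_s = P.sum(axis=(0, 2))                            # group marginals P(s)
a = np.zeros(n_vars)
for x in range(n_x):
    a[idx(x, 0)] += P[x, 0, :].sum() / p_s[0]
    a[idx(x, 1)] -= P[x, 1, :].sum() / p_s[1]
A_ub = np.vstack([a, -a])
b_ub = np.array([eps, eps])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * n_vars)
fair_limit = P[:, :, 0].sum() - res.fun             # accuracy limit under the constraint

# Epistemic discrimination of a hypothetical trained model that satisfies the
# same eps-parity constraint but only reaches model_acc accuracy.
model_acc = 0.70                                    # assumed, for illustration
print(f"fairness-constrained accuracy limit: {fair_limit:.3f}")
print(f"epistemic discrimination (gap):      {fair_limit - model_acc:.3f}")
```

In this toy setting, whatever accuracy the constrained optimum itself gives up relative to the unconstrained Bayes classifier is attributed to aleatoric discrimination, while the printed gap is epistemic and could in principle be closed by a better intervention.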

Keywords

Computer science, Mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
