Publication: Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness-intervention Algorithms in Classification
Date
2023-06-30
Citation
He, Luxi. 2023. Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness-intervention Algorithms in Classification. Bachelor's thesis, Harvard College.
Abstract
Machine learning (ML) models can underperform on certain population groups because of choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution, and we show how to characterize these limits by applying Blackwell’s results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model’s accuracy under fairness constraints and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing interventions and to investigate fairness risks in data with missing values. In recent years, almost 400 machine learning fairness interventions have been proposed. Our approach helps clarify both the best achievable performance for a given data distribution and how far current fairness interventions are from this limit. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination, and hence there may be diminishing returns in continuing to develop algorithms of this kind. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination.
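The two quantities in the abstract can be illustrated with a small sketch. This is not the thesis's method, which characterizes the limit through Blackwell's results. It is a brute-force grid search on a made-up toy distribution, with statistical parity assumed as the fairness constraint. The distribution `JOINT` and the names `accuracy`, `parity_gap`, `fairness_limit`, and `epistemic_gap` are all hypothetical and chosen for illustration only.

```python
from itertools import product

# Hypothetical toy joint distribution P(S=s, X=x, Y=y) over a binary group S,
# binary feature X, and binary label Y. The numbers are invented and sum to 1.
JOINT = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.15, (1, 0, 1): 0.10,
    (1, 1, 0): 0.15, (1, 1, 1): 0.10,
}
KEYS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (s, x) cells of a classifier
TOL = 1e-9  # tolerance for floating-point comparisons


def accuracy(h):
    """Accuracy of a randomized classifier h[(s, x)] = P(predict 1 | s, x)."""
    return sum(
        p * (h[(s, x)] if y == 1 else 1.0 - h[(s, x)])
        for (s, x, y), p in JOINT.items()
    )


def parity_gap(h):
    """Absolute difference in positive prediction rates between the groups."""
    rates = []
    for s in (0, 1):
        p_s = sum(p for (g, _, _), p in JOINT.items() if g == s)
        pos = sum(p * h[(g, x)] for (g, x, _), p in JOINT.items() if g == s)
        rates.append(pos / p_s)
    return abs(rates[0] - rates[1])


def fairness_limit(alpha, steps=10):
    """Best accuracy over a grid of classifiers with parity gap <= alpha.

    With the distribution known exactly, this stands in for the limit
    imposed by aleatoric discrimination (assumption: grid search, not
    the Blackwell-based characterization described in the abstract).
    """
    grid = [i / steps for i in range(steps + 1)]
    best = 0.0
    for values in product(grid, repeat=len(KEYS)):
        h = dict(zip(KEYS, values))
        if parity_gap(h) <= alpha + TOL:
            best = max(best, accuracy(h))
    return best


def epistemic_gap(h_model, alpha):
    """Gap between the limit at level alpha and a given model's accuracy."""
    if parity_gap(h_model) > alpha + TOL:
        raise ValueError("model violates the fairness constraint")
    return fairness_limit(alpha) - accuracy(h_model)


# A hypothetical model that always predicts 0: perfectly "fair", but suboptimal.
always_zero = {key: 0.0 for key in KEYS}
print(fairness_limit(1.0), fairness_limit(0.0), epistemic_gap(always_zero, 0.0))
```

In this toy setting, tightening the parity constraint lowers the best achievable accuracy even with full knowledge of the distribution, which corresponds to the aleatoric component. The remaining distance between a particular model and that limit corresponds to the epistemic component.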
Keywords
Computer science, Mathematics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service