Publication:

Epistemic Limits of Trustworthy Machine Learning


Date

2025-08-05

The Harvard community has made this article openly available. Please share how this access benefits you.

Citation

Monteiro Paes, Lucas W. 2025. Epistemic Limits of Trustworthy Machine Learning. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Theoretical understanding of a system’s limits has long driven technological breakthroughs. Carnot delineated the fundamental limits of heat engine efficiency, paving the way for the design of modern state-of-the-art engines. More than a century later, Claude Shannon unraveled the fundamental limit of communication, known as channel capacity. This insight revolutionized communication systems, enabling continual improvements that ultimately led to wireless communication as we know it today.

This thesis discusses the epistemic limits of machine learning (ML) and leverages them to improve the trustworthiness of ML systems. An ML model has an epistemic limit when it is impossible to prove that the model has a given property. "Epistemic" refers to the impossibility of providing theoretical guarantees (knowledge) about a model's property. Epistemic limits are information-theoretic converse results on the hypothesis test that checks a model's property.

First, we prove a limit on how much information personalized models can use while ensuring a reliable test for performance gains across all users (the epistemic limit of personalization). We leverage this limit to develop a tool that helps with feature selection. Second, we show a limit on reliably testing whether model performance is equitable across multiple demographic groups (the epistemic limit of fairness testing). We exploit this limit to design a metric for efficient algorithmic bias detection. Third, we prove a limit on testing whether one model outperforms another on average (the epistemic limit of model selection). We use this result to delineate the set of indistinguishably good models, the Rashomon set. Finally, we argue that the epistemic limits in model selection imply that explaining the predictions of ML models is necessary, and we develop efficient methods for explaining the content produced by large language models.

Keywords

Artificial Intelligence, Explainability, Fairness, Hypothesis Testing, Information Theory, Predictive Multiplicity, Applied mathematics, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
