Publication:
Information-Theoretic Tools for Machine Learning Beyond Accuracy

Date

2023-05-15

Citation

Hsu, Hsiang. 2023. Information-Theoretic Tools for Machine Learning Beyond Accuracy. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Over the past several decades, information theory and machine learning have propelled each other forward. Information theory has provided mathematical tools to tackle emerging challenges in machine learning, such as lack of transparency and interpretability, privacy leakage, algorithmic bias, and predictive multiplicity. Meanwhile, machine learning has enabled new statistical techniques for computing and estimating information-theoretic metrics. This thesis leverages machine learning techniques to estimate information-theoretic quantities from data without assuming known distributions. We investigate the Lancaster decomposition of joint distributions, which generalizes canonical correlation analysis. We estimate trimmed information density for image and text data, and compute the channel capacity of a class of image classifiers. Moreover, we solve information projection via modern automatic differentiation solvers and convex programs. These information-theoretic tools are, in turn, used to design machine learning algorithms with provable guarantees. The Lancaster decomposition enables large-scale correspondence analysis to interpret data dependencies (e.g., in computer vision and natural language datasets) for various types of learning. Trimmed information density extends privacy-assuring formulations to general information metrics with richer operational meanings, and inspires a privacy watchdog algorithm that better preserves data utility. Channel capacity serves as a rigorous metric for predictive multiplicity, which we estimate for the first time on large-scale models such as neural networks. Finally, we derive precise conditions for the fair use of group attributes and propose a model-agnostic post-processing algorithm that enforces group fairness constraints based on information projection.
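For context, two quantities invoked above have standard textbook formulations (these are the classical definitions, not necessarily the exact variants developed in the thesis): the channel capacity of a channel P(Y|X), used here to quantify predictive multiplicity, and the information projection (I-projection) of a reference distribution onto a constraint set, which underlies the fairness post-processing step. In LaTeX notation:

C = \max_{P_X} I(X;Y), \qquad Q^{\ast} = \operatorname*{arg\,min}_{Q \in \mathcal{Q}} D_{\mathrm{KL}}(Q \,\|\, P)

Here I(X;Y) is the mutual information between the channel input and output, D_KL is the Kullback-Leibler divergence, and, on one natural reading of the abstract, \mathcal{Q} would be the set of output distributions satisfying the group fairness constraints.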

Keywords

Algorithmic Fairness, Correspondence Analysis, Data Privacy, Predictive Multiplicity, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the repository's Terms of Service.
