Publication: Information-Theoretic Tools for Machine Learning Beyond Accuracy
Date
2023-05-15
Authors
Hsu, Hsiang
Citation
Hsu, Hsiang. 2023. Information-Theoretic Tools for Machine Learning Beyond Accuracy. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Over the past few decades, information theory and machine learning have propelled each other forward. Information theory has provided mathematical tools to tackle emerging challenges in machine learning, such as lack of transparency and interpretability, privacy leakage, algorithmic bias, and predictive multiplicity. Meanwhile, machine learning has enabled new statistical techniques for computing and estimating information-theoretic metrics.
This thesis leverages machine learning techniques to estimate information-theoretic quantities from data without assuming known distributions. We investigate the Lancaster decomposition of joint distributions, which generalizes canonical correlation analysis. We estimate trimmed information density for image and text data, and compute the channel capacity of a class of image classifiers. Moreover, we solve information projection via modern automatic differentiation solvers and convex programs.
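In the thesis this capacity computation is carried out on learned classifiers, but the underlying quantity is the classical channel capacity of a discrete channel. As a point of reference only, the following is a minimal NumPy sketch of the standard Blahut-Arimoto iteration on a small channel; the function name, tolerance, and the binary-symmetric-channel example are illustrative and not taken from the thesis.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=1000):
    """Capacity (in nats) of a discrete memoryless channel W[x, y] = P(y | x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])   # input law, initialized uniform
    for _ in range(max_iter):
        q = p @ W                               # output marginal induced by p
        # D[x] = KL(W[x, :] || q), with the convention 0 * log 0 = 0.
        ratio = np.where(W > 0, W, 1.0) / np.where(q > 0, q, 1.0)
        D = np.where(W > 0, W * np.log(ratio), 0.0).sum(axis=1)
        p_next = p * np.exp(D)                  # multiplicative capacity update
        p_next /= p_next.sum()
        if np.abs(p_next - p).max() < tol:
            break
        p = p_next
    # At a fixed point, capacity = I(X; Y) = sum_x p(x) * KL(W[x, :] || q).
    return float(p @ D), p

# Binary symmetric channel, crossover 0.1: capacity = ln 2 - H_b(0.1) ~ 0.368 nats.
print(blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]]))[0])
```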
These information-theoretic tools are, in turn, used to design machine learning algorithms with provable guarantees. The Lancaster decomposition enables large-scale correspondence analysis to interpret data dependencies (e.g., in computer vision and natural language datasets) across various learning settings. Trimmed information density extends privacy-assuring formulations to general information metrics with richer operational meanings, and inspires a privacy watchdog algorithm that better preserves data utility. Channel capacity serves as a rigorous metric for predictive multiplicity, estimated here for the first time on large-scale models such as neural networks. Finally, we derive precise conditions for the fair use of group attributes and propose a model-agnostic post-processing algorithm that enforces group fairness constraints via information projection.
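The post-processing step rests on information projection: finding the distribution closest in KL divergence to a reference distribution within a constraint set. Below is a minimal sketch of a toy I-projection posed as a convex program, assuming cvxpy is available; the alphabet, the distribution P, the statistic f, and the single moment constraint are illustrative placeholders, not the thesis's actual fairness formulation.

```python
import numpy as np
import cvxpy as cp

# Reference distribution P over a toy 4-point alphabet (illustrative numbers).
P = np.array([0.4, 0.3, 0.2, 0.1])
# A statistic whose mean the constraint set pins to zero; it stands in for a
# richer group-fairness constraint on a model's output distribution.
f = np.array([1.0, 1.0, -1.0, -1.0])

Q = cp.Variable(4, nonneg=True)
# rel_entr(q, p) = q * log(q / p) elementwise, so the objective is KL(Q || P).
problem = cp.Problem(
    cp.Minimize(cp.sum(cp.rel_entr(Q, P))),
    [cp.sum(Q) == 1, f @ Q == 0.0],
)
problem.solve()
print(Q.value)  # the I-projection of P onto {Q : E_Q[f] = 0}
```

A standard fact makes the output easy to sanity-check: the I-projection under a linear constraint is an exponential tilting of P, i.e., Q is proportional to P * exp(lambda * f) for some multiplier lambda.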
Keywords
Algorithmic Fairness, Correspondence Analysis, Data Privacy, Predictive Multiplicity, Computer science
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.