Publication:

Learning from high-dimensional measurements

Date

2025-05-09

The Harvard community has made this article openly available. Please share how this access benefits you.

Citation

Gowri, Gokul. 2025. Learning From High-Dimensional Measurements. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

This thesis considers the problems of distilling and quantifying information in high-dimensional measurements, with a focus on applications in biology. First, we explore the idea that underlying low-dimensional structure in high-dimensional data can be exploited to circumvent the curse of dimensionality in mutual information (MI) estimation. We develop a method that we call latent MI (LMI) approximation, which applies a nonparametric MI estimator to low-dimensional representations learned by a simple, theoretically motivated model architecture. Using several benchmarks, we show that, unlike existing techniques, LMI can approximate MI well for variables with $>10^3$ dimensions when their dependence structure has low intrinsic dimensionality. Second, we study how measurement noise in data affects the quality of representation learning models. Using an information-theoretic metric of representation quality, we show that model performance scales predictably with molecular undersampling noise in single-cell genomic data. We show that the form of this relationship can be recovered from a simple Gaussian noise model, which provides an intuitive interpretation of the scaling law. Finally, we show that the same scaling relationship emerges in image classification problems, suggesting that noise scaling may be a general phenomenon.
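
As a reading aid for the latent MI idea in the abstract, the block below states the definition of mutual information and the data processing inequality, a standard result that bounds the MI between representations by the MI between the original variables. The notation is an assumption made here for illustration and is not taken from the thesis: $X$ and $Y$ are the high-dimensional variables, $f$ and $g$ are hypothetical learned encoders, and $Z_X$, $Z_Y$ are their low-dimensional outputs. The thesis may use different symbols and may motivate the method differently.

```latex
% Mutual information between X and Y (continuous case)
I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy

% Hypothetical low-dimensional representations
Z_X = f(X), \qquad Z_Y = g(Y)

% Data processing inequality: processing cannot create information
I(Z_X; Z_Y) \;\le\; I(X;Y)
```

Under this reading, an MI estimate computed on $Z_X$ and $Z_Y$ targets a lower bound on $I(X;Y)$, and the bound is tight when the encoders discard nothing that $X$ and $Y$ share. This is one way to see why low intrinsic dimensionality of the dependence structure matters for the approach.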

Keywords

high-dimensional, mutual information, scaling laws, single-cell genomics, Statistics, Biology, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
