Publication: Advancing Molecular and Functional Understanding of Cells with Artificial Intelligence
Abstract
Rapid advancements in biotechnology are transforming our ability to measure biological systems across multiple modalities and scales. On the molecular level, spatially resolved single-cell transcriptomics enables comprehensive profiling of gene expression while preserving the spatial architecture of tissues. On the functional level, flexible brain-machine interfaces permit stable, long-term recordings of single-neuron activity throughout behavioral learning and disease progression. These innovations have increasingly shifted the life sciences toward a data-centric paradigm. However, traditional computational modeling approaches remain limited in their capacity to extract meaningful biological insights from heterogeneous, high-dimensional data, especially for decoding complex cell states and functions across space and time. To address this critical gap, this dissertation introduces a suite of computational methods that integrate artificial intelligence and machine learning (AI/ML) with cutting-edge biotechnologies. These methods are designed to interpret large-scale, multimodal biological data and feed insights back into experimental design for iterative discovery. Chapter 2 presents ClusterMap, a spatially informed, unsupervised clustering framework for single-cell and tissue segmentation directly from in situ transcriptomic data. Building on this, Chapter 3 constructs a comprehensive spatial atlas of the mouse central nervous system by integrating single-cell gene expression and spatial data at subcellular resolution. To generalize spatial analysis across datasets and technologies, Chapter 4 introduces FuseMap, a universal deep-learning framework that harmonizes multiple brain atlases into a common coordinate framework, enabling gene imputation, tissue region annotation, and cross-dataset integration.
Expanding beyond transcriptomics, Chapter 5 introduces AutoSort, a real-time multimodal spike sorting algorithm for stable long-term neural recordings, and UnitedNet, a multi-task learning model that jointly performs cell-type identification, cross-modal prediction, and feature relevance discovery across diverse single-cell modalities. In summary, this dissertation presents novel AI-powered frameworks that bridge molecular and functional modalities at the single-cell level. These approaches not only enhance our ability to decode the complex architecture and dynamics of biological systems but also provide a foundation for future integrative studies in development, disease, and therapeutic response.