Publication: SHARD: Spatio-Hierarchical Architectures for RNA Data
Abstract
We present a novel hierarchical transformer architecture that integrates morphological features with gene expression data for spatial single-cell analysis. Unlike existing approaches, which either process bulk tissue regions or analyze the modalities separately, our framework is the first to directly pair individual segmented cell images with their corresponding gene expression vectors through a cross-modal attention mechanism while also incorporating information from neighboring cells. It processes both individual cells and their microenvironments (niches) through a multimodal, multi-scale approach that preserves cell-level granularity while capturing tissue context.
The system has three components: a cell image encoder for morphological features, a gene expression encoder based on the pre-trained scGPT model, and a cross-modal attention transformer that aligns the two data types. The combined model is trained on 17.5 million cells from 20 tissue samples profiled with the 10X Xenium platform, using masked gene expression modeling with a negative binomial loss.
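As a rough illustration of masked gene expression modeling with a negative binomial loss, the sketch below scores predicted mean counts against observed counts at masked gene positions only. It is a minimal, dependency-free example and not the thesis implementation: the mean/inverse-dispersion parameterization, the single shared `theta`, and the function names `nb_nll` and `masked_nb_loss` are assumptions made for illustration.

```python
import math


def nb_nll(x, mu, theta):
    """Negative log-likelihood of an observed count x under a negative
    binomial with mean mu > 0 and inverse dispersion theta > 0.
    (This parameterization is an assumption, not taken from the thesis.)"""
    log_p = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    return -log_p


def masked_nb_loss(counts, pred_mu, theta, mask):
    """Average the NB negative log-likelihood over masked genes only.

    counts  : observed counts per gene
    pred_mu : predicted mean per gene (hypothetical model output)
    mask    : True where the gene was hidden from the model
    """
    terms = [
        nb_nll(x, mu, theta)
        for x, mu, hidden in zip(counts, pred_mu, mask)
        if hidden
    ]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    counts = [0, 5, 2, 9]
    pred_mu = [0.4, 4.0, 2.5, 1.0]
    mask = [True, False, True, True]
    print(masked_nb_loss(counts, pred_mu, theta=2.0, mask=mask))
```

Restricting the average to masked positions means the model is scored only on genes it could not see, which is what makes the objective a reconstruction task rather than a copy task.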
The resulting model produces better clustering than existing methods, with a higher silhouette score (0.49 vs. 0.32), and achieves cell classification performance comparable to state-of-the-art methods.
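For readers unfamiliar with the silhouette score reported above, the following sketch computes it from scratch. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the lowest mean distance to any other cluster; the point's coefficient is `(b - a) / max(a, b)`, and the score is the mean over all points. Euclidean distance and the toy data are assumptions for illustration; the abstract does not state which distance or embedding space was used.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    points : list of coordinate tuples (e.g. cell embeddings)
    labels : cluster label for each point
    Euclidean distance is assumed here.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    # Toy 1-D "embeddings": two tight, well-separated groups.
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(silhouette_score(pts, [0, 0, 1, 1]))
```

Scores range from -1 to 1; values near 1 indicate compact, well-separated clusters, so a higher value means the embedding separates the labeled groups more cleanly.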
This thesis establishes a foundation for cell analysis that connects segmented morphological data and gene expression while representing tissue context at multiple scales. The resulting embeddings support a more generalizable understanding of cell function, with applications in disease classification, cell type identification, and spatial context analysis.