Publication:

SHARD: Spatio-Hierarchical Architectures for RNA Data

Date

2025-05-16

Published Version

The Harvard community has made this article openly available.

Citation

Boesen, John. 2025. SHARD: Spatio-Hierarchical Architectures for RNA Data. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

We present a novel hierarchical transformer architecture for integrating morphological features with gene expression data in spatial single-cell analysis. Unlike existing approaches, which either process bulk tissue regions or analyze the two modalities separately, our framework is the first to directly pair individual segmented cell images with their corresponding gene expression vectors through a cross-modal attention mechanism while also incorporating information from neighboring cells. It processes both individual cells and their microenvironments (niches) through a multimodal, multi-scale approach that preserves cell-level granularity while capturing tissue context.
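To make the idea of cross-modal attention concrete, the following is a minimal single-head sketch in NumPy in which one cell's image tokens attend over that cell's gene tokens. It is an illustration only, not the thesis's implementation: the token counts, the embedding width `d_model`, the random weights, and the names `image_tokens` and `gene_tokens` are all assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q = queries @ w_q                         # (n_queries, d)
    k = context @ w_k                         # (n_context, d)
    v = context @ w_v                         # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 16  # hypothetical embedding width

# Hypothetical inputs: 4 patch embeddings from one segmented cell image
# and 10 gene embeddings from the same cell's expression vector.
image_tokens = rng.normal(size=(4, d_model))
gene_tokens = rng.normal(size=(10, d_model))

# Random projection matrices stand in for learned weights.
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

fused, attn = cross_attention(image_tokens, gene_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Under the same assumptions, neighboring cells could be included by appending their tokens to `context`, so that a cell attends over its niche as well as its own expression profile.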

We constructed a system with three components: a cell image encoder for morphological features, a gene expression encoder based on the pre-trained scGPT model, and a cross-modal attention transformer that aligns the two data types. The model is trained on 17.5 million cells from 20 tissue samples profiled on the 10X Xenium platform, using masked gene expression modeling with a negative binomial loss.
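As an illustration of masked gene expression modeling with a negative binomial loss, the sketch below computes the negative log-likelihood of observed counts under a mean/dispersion parameterization and averages it over masked positions only. The parameterization, the function names `nb_log_prob` and `masked_nb_loss`, and the toy numbers are assumptions made for this example; the abstract does not specify these details.

```python
import math

import numpy as np

# Elementwise log-gamma built from the standard library.
_lgamma = np.vectorize(math.lgamma)


def nb_log_prob(x, mu, theta, eps=1e-8):
    """Log-probability of counts x under a negative binomial distribution
    with mean mu and dispersion theta (assumed parameterization)."""
    log_theta_mu = np.log(theta + mu + eps)
    return (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - log_theta_mu)
        + x * (np.log(mu + eps) - log_theta_mu)
    )


def masked_nb_loss(counts, mu, theta, mask):
    """Mean negative log-likelihood over positions where mask is True."""
    nll = -nb_log_prob(counts, mu, theta)
    return float((nll * mask).sum() / max(mask.sum(), 1))


# Toy data: 2 cells x 4 genes (hypothetical values).
counts = np.array([[0.0, 3.0, 1.0, 7.0], [2.0, 0.0, 5.0, 1.0]])
predicted_mean = np.array([[0.5, 2.5, 1.5, 6.0], [2.0, 0.3, 4.0, 1.2]])
dispersion = 2.0
# Genes hidden from the model, whose counts it must reconstruct.
mask = np.array([[True, False, False, True], [False, True, True, False]])

loss = masked_nb_loss(counts, predicted_mean, dispersion, mask)
print(loss)
```

Restricting the loss to masked genes means the model is scored only on counts it could not see, which is the usual motivation for masked modeling objectives.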

The resulting embeddings cluster better than those of existing methods, with higher silhouette scores (0.49 vs. 0.32), and achieve cell classification performance comparable to state-of-the-art methods.
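For readers unfamiliar with the metric, the silhouette score compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), using (b - a) / max(a, b); values near 1 indicate compact, well-separated clusters. The sketch below is a plain NumPy version on toy 2-D points, assuming Euclidean distance and at least two clusters. It is not the evaluation code behind the reported figures, and the data are invented for illustration.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distance.
    Assumes at least two distinct cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # convention: a singleton cluster scores 0
        # Mean distance to the other members of the same cluster.
        a = dists[i, same].sum() / (n_same - 1)
        # Mean distance to the nearest other cluster.
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Toy embeddings: two tight pairs of points far apart (hypothetical data).
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across it
print(good, bad)
```

A labeling that follows the geometry scores close to 1, while one that cuts across it scores below 0, which is why a higher score is read as better-separated cell populations.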

This thesis establishes a foundation for cell analysis that connects segmented morphological data with gene expression while representing tissue context at multiple scales. The resulting embeddings support a more generalizable understanding of cell function and have applications in disease classification, cell type identification, and spatial context analysis.

Keywords

Computer science, Biostatistics, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
