Publication:

Combining Foundation Models in Computational Pathology: Unlocking Multi-Representational Insights

Date

2025-05-16

The Harvard community has made this article openly available.

Citation

Runevic, Joel. 2025. Combining Foundation Models in Computational Pathology: Unlocking Multi-Representational Insights. Bachelors Thesis, Harvard University Engineering and Applied Sciences.

Abstract

Foundation models have revolutionized computational pathology, enabling impressive results on tasks involving the classification of gigapixel whole-slide images (WSIs). However, no single foundation model consistently excels across all clinical scenarios. Given that these models differ substantially in their self-supervised training strategies, architectures, and data distributions, each model captures distinct morphological and structural features from histopathological slides. Leveraging multiple foundation models through patch-level feature fusion offers a promising approach to integrate their complementary strengths, potentially improving model robustness and generalization.

In this work, we present what is, to the best of our knowledge, one of the first and most comprehensive investigations of patch-level feature fusion using multiple foundation models. We systematically evaluate fusing three state-of-the-art pathology foundation models (UNI, Virchow, and GigaPath) across 11 established pathology tasks, 8 distinct fusion strategies, all possible encoder combinations, and various latent-space dimensionalities to thoroughly assess robustness. Since clinicians and researchers typically lack advance knowledge of which foundation model will perform best on unseen data, we adopt the average single-model performance as a practically relevant baseline for evaluating fusion methods. Our analysis demonstrates that a novel MLP-based fusion operator surpasses this baseline in 132 out of 176 experiments (75%) across four multiple-instance learning (MIL) frameworks.
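The evaluation setup described above can be illustrated with a minimal Python sketch. It enumerates the encoder combinations available for fusion and compares a fused score against the average single-model baseline. The function names and all scores are hypothetical placeholders rather than values from the thesis, and averaging over only the encoders in a given combination is an assumption, since the abstract does not specify how the baseline is computed.

```python
from itertools import combinations
from statistics import mean

ENCODERS = ("UNI", "Virchow", "GigaPath")


def encoder_combinations(encoders):
    """Return every subset of two or more encoders (fusion needs at least two)."""
    return [
        combo
        for size in range(2, len(encoders) + 1)
        for combo in combinations(encoders, size)
    ]


def average_single_model_baseline(single_scores, combo):
    """Mean score of the individual encoders in `combo`.

    Assumption: the baseline averages only the encoders being fused.
    """
    return mean(single_scores[name] for name in combo)


def beats_baseline(fused_score, single_scores, combo):
    """True if the fused model scores strictly above the average baseline."""
    return fused_score > average_single_model_baseline(single_scores, combo)


# Placeholder scores for illustration only; NOT results from the thesis.
single_scores = {"UNI": 0.80, "Virchow": 0.84, "GigaPath": 0.82}
combos = encoder_combinations(ENCODERS)

for combo in combos:
    baseline = average_single_model_baseline(single_scores, combo)
    print(combo, round(baseline, 3))
```

Three encoders give four combinations of two or more (three pairs and one triple). This would be consistent with 176 = 11 tasks × 4 combinations × 4 MIL frameworks, although the abstract does not spell out that breakdown.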

We further investigate factors influencing fusion effectiveness, finding that learned, parametric fusion operators typically outperform simpler, non-parametric methods predominantly studied in prior work. Additionally, we find that careful tuning of latent dimensionality can yield further performance gains, particularly for challenging multi-class subtyping tasks. Compared to conventional ensembles (aggregating final predictions), we discover that deep patch-level fusion is especially beneficial for multi-class diagnostic scenarios, whereas simpler ensembles may suffice for binary molecular biomarker tasks. Overall, this thesis provides valuable methodological insights and demonstrates the potential of multi-encoder patch-level fusion as a practical strategy for improving computational pathology systems.
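To make the contrast between patch-level fusion and a conventional ensemble concrete, here is a small dependency-free Python sketch. It is not the thesis's fusion operator: the layer sizes, the toy embedding dimensions, the untrained random weights, and all function names are assumptions chosen for illustration. Patch-level fusion combines per-patch embeddings before slide-level aggregation, whereas the ensemble averages the final predictions of separately trained models.

```python
import random


def concat_patch_features(per_encoder_feats):
    """Concatenate, patch by patch, the embeddings produced by each encoder."""
    n_patches = len(per_encoder_feats[0])
    assert all(len(feats) == n_patches for feats in per_encoder_feats)
    return [
        [value for feats in per_encoder_feats for value in feats[i]]
        for i in range(n_patches)
    ]


def make_linear(rng, in_dim, out_dim):
    """Create a randomly initialised (untrained) linear layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a linear layer to one vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def mlp_fuse(patches, hidden_layer, output_layer):
    """Parametric fusion: map each concatenated patch vector to a latent vector."""
    return [linear(relu(linear(p, hidden_layer)), output_layer) for p in patches]


def ensemble_predictions(per_model_probs):
    """Late fusion: average class probabilities across models."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


rng = random.Random(0)
toy_dims = {"UNI": 4, "Virchow": 6, "GigaPath": 5}  # toy sizes, not the real ones
n_patches = 3
feats = [
    [[rng.random() for _ in range(dim)] for _ in range(n_patches)]
    for dim in toy_dims.values()
]

concat = concat_patch_features(feats)
latent_dim = 8  # the latent dimensionality is the tunable quantity
hidden_layer = make_linear(rng, sum(toy_dims.values()), 16)
output_layer = make_linear(rng, 16, latent_dim)
fused = mlp_fuse(concat, hidden_layer, output_layer)

# Hypothetical slide-level class probabilities from three separate models.
avg_probs = ensemble_predictions([[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]])
```

In a full pipeline the fused patch vectors would feed a MIL aggregator and the fusion weights would be trained jointly with it. The ensemble path needs no extra trainable parameters, which is one plausible reason it may suffice for simpler binary tasks.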

Keywords

Computer science, Mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
