
Bayesian Non-Negative Matrix Factorization with Correlated Mutation Type Probabilities for Mutational Signatures


Date

2025-05-16

Published Version

The Harvard community has made this article openly available.

Citation

Lang, Iris. 2025. Bayesian Non-Negative Matrix Factorization with Correlated Mutation Type Probabilities for Mutational Signatures. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Somatic mutations, or alterations in the DNA of somatic cells, are key markers of cancer. They reflect diverse underlying biological processes, yet the precise mechanisms driving many of these mutagenic processes remain only partially understood. In recent years, mutational signature analysis has become a prominent field of study within cancer research, and Non-Negative Matrix Factorization (NMF) has become a commonly used technique in the field. NMF decomposes a tumor mutation counts matrix into the product of a latent signatures matrix and an exposures matrix.
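As a concrete, non-Bayesian illustration of this decomposition, the sketch below factors a counts matrix M (mutation types by tumors) into a signatures matrix P and an exposures matrix E using the classical Lee and Seung multiplicative updates for the Poisson (generalized KL) NMF objective. The function name `nmf_kl`, the symbols M and E, and the simulated data are illustrative assumptions; this is not the thesis's method, which is Bayesian and fit by MCMC.

```python
import numpy as np


def nmf_kl(M, n_signatures, n_iter=1000, seed=0, eps=1e-10):
    """Factor a nonnegative counts matrix M (K mutation types x G tumors)
    as P @ E, with P (K x N) the signatures and E (N x G) the exposures.

    Uses the Lee & Seung multiplicative updates, which are designed not to
    increase the generalized KL divergence between M and P @ E.
    """
    rng = np.random.default_rng(seed)
    K, G = M.shape
    P = rng.uniform(0.1, 1.0, size=(K, n_signatures))
    E = rng.uniform(0.1, 1.0, size=(n_signatures, G))
    for _ in range(n_iter):
        # Update exposures with signatures held fixed.
        ratio = M / (P @ E + eps)
        E *= (P.T @ ratio) / (P.sum(axis=0)[:, None] + eps)
        # Update signatures with exposures held fixed.
        ratio = M / (P @ E + eps)
        P *= (ratio @ E.T) / (E.sum(axis=1)[None, :] + eps)
    # Normalize each signature (column of P) to sum to 1 and move the scale
    # into E, so the product P @ E is unchanged.
    scale = P.sum(axis=0)
    return P / scale, E * scale[:, None]


# Hypothetical simulated data: 6 mutation types, 2 signatures, 20 tumors.
rng = np.random.default_rng(1)
P_true = rng.dirichlet(np.ones(6), size=2).T          # 6 x 2, columns sum to 1
E_true = rng.gamma(shape=2.0, scale=500.0, size=(2, 20))
M = rng.poisson(P_true @ E_true).astype(float)        # 6 x 20 mutation counts

P_hat, E_hat = nmf_kl(M, n_signatures=2)
rel_err = np.linalg.norm(M - P_hat @ E_hat) / np.linalg.norm(M)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Normalizing the columns of P to sum to 1 mirrors the usual reading of a signature as a probability distribution over mutation types; the Bayesian formulations discussed here replace these point estimates with posterior samples.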

Much of the recent mutational signatures literature focuses on Bayesian NMF, which allows for uncertainty quantification and the inclusion of prior knowledge. However, current methods assume independence across mutation types in the signatures matrix. This thesis extends current Bayesian NMF methodology with novel methods that account for the dependencies between mutation types. First, we implement a Bayesian NMF specification with a Multivariate Truncated Normal prior on the signatures matrix, which lets us model its covariance structure using external information; here, the covariance is estimated from the COSMIC signatures database. Fitted with MCMC, this model converges in fewer iterations than a model with independent Truncated Normal priors on the elements of the signatures matrix (as in Landy et al. (2025)), and it improves accuracy, especially on small sample sizes. Second, we develop a hierarchical model in which the covariance structure of the signatures matrix P is learned from the data rather than specified upfront, giving the algorithm more flexibility. Learning the dependence structure of the signatures in this way allows for a better understanding of biological interactions and of how these change across cancer types. The code for this project is contributed to an open-source R software package. Our work lays the groundwork for future research that incorporates dependency structure across mutation types in the signatures matrix, and it applies to any use of NMF, not only to single base substitution (SBS) mutational signatures.
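The abstract does not describe how the Multivariate Truncated Normal prior is sampled. One standard approach for a multivariate normal restricted to the nonnegative orthant is coordinate-wise Gibbs sampling, in which each coordinate's full conditional is a univariate normal truncated at zero. The sketch below assumes a hypothetical reference matrix (random Dirichlet columns standing in for COSMIC signatures), estimates a mean and covariance across mutation types from it, and draws correlated samples of one signature column from the prior only. The function name `gibbs_truncated_mvn`, the ridge term, and all sizes are illustrative assumptions, not the thesis's R implementation.

```python
import numpy as np
from scipy.stats import truncnorm


def gibbs_truncated_mvn(mu, Sigma, n_iter, rng, x0=None):
    """Draw n_iter correlated samples from N(mu, Sigma) truncated to x >= 0
    by coordinate-wise Gibbs sampling."""
    K = len(mu)
    Lambda = np.linalg.inv(Sigma)  # precision matrix
    x = np.maximum(mu, 0.1) if x0 is None else np.array(x0, dtype=float)
    samples = np.empty((n_iter, K))
    for t in range(n_iter):
        for k in range(K):
            others = np.arange(K) != k
            # Untruncated conditional of x_k given the other coordinates:
            # mean mu_k - Lambda_kk^{-1} Lambda_{k,-k} (x_{-k} - mu_{-k}),
            # variance Lambda_kk^{-1}.
            cond_var = 1.0 / Lambda[k, k]
            cond_mean = mu[k] - cond_var * Lambda[k, others] @ (x[others] - mu[others])
            sd = np.sqrt(cond_var)
            # truncnorm expects bounds in standardized units; truncate at 0 from below.
            lower = (0.0 - cond_mean) / sd
            x[k] = truncnorm.rvs(lower, np.inf, loc=cond_mean, scale=sd, random_state=rng)
        samples[t] = x
    return samples


# Hypothetical reference: 30 signatures over 4 mutation types (stand-in for COSMIC).
rng = np.random.default_rng(0)
reference = rng.dirichlet(np.ones(4), size=30).T      # 4 x 30, columns sum to 1
mu = reference.mean(axis=1)
# Each column sums to 1, so the sample covariance across types is singular;
# a small ridge (an assumption) keeps it invertible.
Sigma = np.cov(reference) + 1e-3 * np.eye(4)

samples = gibbs_truncated_mvn(mu, Sigma, n_iter=1000, rng=rng)
kept = samples[200:]  # discard burn-in
print("mean of sampled signature column:", kept.mean(axis=0).round(3))
print("empirical correlation across types:\n", np.corrcoef(kept.T).round(2))
```

If the likelihood is Normal, the full conditional of a signature column under such a prior would again be a truncated multivariate normal with updated mean and covariance, so a step like this could sit inside a larger MCMC sampler; that is an assumption about the model, not something the abstract states.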

Keywords

Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.