Publication: Bayesian Non-Negative Matrix Factorization with Correlated Mutation Type Probabilities for Mutational Signatures
Abstract
Somatic mutations, or alterations in the DNA of somatic cells, are key markers of cancer. They reflect diverse underlying biological processes, yet the precise mechanisms driving many of these mutagenic processes remain only partially understood. In recent years, mutational signature analysis has become a prominent field of study within cancer research, and Non-Negative Matrix Factorization (NMF) has become a commonly used technique in the field. NMF decomposes a matrix of tumor mutation counts into the product of a latent signatures matrix and an exposures matrix, both constrained to be nonnegative.
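To make the decomposition concrete, the following Python sketch factors a simulated counts matrix M (mutation types × tumors) as M ≈ PE, with P holding signatures and E holding exposures. The letters M and E, the function name `nmf_kl`, and the simulated data are illustrative assumptions. The sketch uses the classic Lee-Seung multiplicative updates for the Poisson (generalized KL) objective, which give a point estimate only; it is not the Bayesian model developed in this thesis.

```python
import numpy as np


def nmf_kl(M, n_signatures, n_iter=500, seed=0, eps=1e-10):
    """Factor a nonnegative counts matrix M (K x G) as P @ E with the
    Lee-Seung multiplicative updates for the generalized KL (Poisson) loss.

    P (K x N): signatures, E (N x G): exposures. Returns P, E and the
    KL divergence after each iteration.
    """
    rng = np.random.default_rng(seed)
    K, G = M.shape
    P = rng.random((K, n_signatures)) + eps
    E = rng.random((n_signatures, G)) + eps
    trace = []
    for _ in range(n_iter):
        # Signatures update: scale P by the exposure-weighted ratio M / (PE).
        R = M / (P @ E + eps)
        P *= (R @ E.T) / (E.sum(axis=1) + eps)
        # Exposures update: scale E by the signature-weighted ratio M / (PE).
        R = M / (P @ E + eps)
        E *= (P.T @ R) / (P.sum(axis=0)[:, None] + eps)
        # Generalized KL divergence D(M || PE); non-increasing under these
        # updates (up to the small eps safeguard).
        PE = P @ E + eps
        trace.append(np.sum(M * np.log((M + eps) / PE) - M + PE))
    # Normalize each signature to sum to 1 and move the scale into E,
    # which leaves the product P @ E unchanged.
    scale = P.sum(axis=0)
    return P / scale, E * scale[:, None], np.array(trace)


# Usage on simulated data: 96 mutation types, 3 signatures, 20 tumors.
sim = np.random.default_rng(1)
P_true = sim.dirichlet(np.ones(96), size=3).T    # shape (96, 3)
E_true = sim.gamma(2.0, 100.0, size=(3, 20))     # shape (3, 20)
M = sim.poisson(P_true @ E_true).astype(float)
P_hat, E_hat, trace = nmf_kl(M, n_signatures=3)
```

Normalizing each column of P to sum to one mirrors the usual convention that a signature is a probability distribution over mutation types, so the exposures in E then carry the mutation counts.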
Much of the recent mutational signatures literature focuses on Bayesian NMF, which allows for uncertainty quantification and the inclusion of prior knowledge. However, current methods assume independence across mutation types in the signatures matrix. This thesis extends current Bayesian NMF methodology by proposing novel methods that account for dependencies between mutation types. First, we implement a Bayesian NMF specification with a Multivariate Truncated Normal prior on the signatures matrix, which allows the covariance structure to be modeled using external information; in our case, the covariance is estimated from the COSMIC signatures database. Compared with a model that places independent Truncated Normal priors on the elements of the signatures matrix, as described in Landy et al. (2025), this model converges in fewer MCMC iterations and improves accuracy, especially for small sample sizes. Additionally, we develop a hierarchical model in which the covariance structure of the signatures matrix P is learned from the data rather than specified upfront, giving the algorithm more flexibility. Learning the dependence structure of the signatures allows for a better understanding of biological interactions and of how these change across different types of cancer. The code for this project has been contributed to an open-source R software package. Our work lays the groundwork for future research that incorporates dependency structure across mutation types in the signatures matrix, and it applies to any use of NMF, not only single base substitution (SBS) mutational signatures.
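A minimal sketch of the prior idea, under stated assumptions. Here each signature (a column of P over K mutation types) is given a multivariate normal prior truncated to the nonnegative orthant, with a covariance estimated from a reference signatures matrix. Because every reference signature sums to one, the raw sample covariance across mutation types is singular, so the sketch shrinks it toward its diagonal. This shrinkage step, the helper names `shrunk_covariance`, `gibbs_truncated_mvn` and `sample_std_truncnorm_lower`, and the random stand-in for the COSMIC catalogue are all hypothetical and are not taken from the thesis or its R package. Draws come from a coordinate-wise Gibbs sampler whose univariate truncated-normal steps use Robert's (1995) rejection scheme.

```python
import numpy as np


def sample_std_truncnorm_lower(a, rng):
    """One draw from N(0, 1) truncated to [a, inf), after Robert (1995)."""
    if a <= 0.0:
        # Naive rejection: acceptance probability is at least 1/2 when a <= 0.
        while True:
            z = rng.standard_normal()
            if z >= a:
                return z
    # Shifted exponential proposal with the optimal rate for this bound.
    alpha = (a + np.sqrt(a * a + 4.0)) / 2.0
    while True:
        z = a + rng.exponential(1.0 / alpha)
        if rng.random() <= np.exp(-0.5 * (z - alpha) ** 2):
            return z


def shrunk_covariance(ref_signatures, shrink=0.1):
    """Covariance across mutation types (rows), using signatures (columns) as
    observations, shrunk toward its diagonal. Hypothetical regularization."""
    S = np.cov(ref_signatures)  # rows = variables, columns = observations
    return (1.0 - shrink) * S + shrink * np.diag(np.diag(S))


def gibbs_truncated_mvn(mu, Sigma, n_sweeps, rng):
    """Coordinate-wise Gibbs sampler for N(mu, Sigma) restricted to x >= 0."""
    mu = np.asarray(mu, dtype=float)
    Q = np.linalg.inv(Sigma)  # precision matrix
    x = np.maximum(mu, 0.0)   # feasible starting point
    draws = np.empty((n_sweeps, mu.size))
    for s in range(n_sweeps):
        for i in range(mu.size):
            # x_i | x_{-i} ~ N(m, 1 / Q_ii), truncated to [0, inf).
            resid = x - mu
            resid[i] = 0.0
            m = mu[i] - (Q[i] @ resid) / Q[i, i]
            sd = 1.0 / np.sqrt(Q[i, i])
            z = sample_std_truncnorm_lower(-m / sd, rng)
            x[i] = max(m + sd * z, 0.0)  # guard against round-off below zero
        draws[s] = x
    return draws


# Usage with a random stand-in for a reference catalogue (NOT COSMIC data):
# 6 hypothetical mutation types, 40 reference signatures summing to one.
rng = np.random.default_rng(0)
ref = rng.dirichlet(np.ones(6), size=40).T     # shape (6, 40)
mu = ref.mean(axis=1)
Sigma = shrunk_covariance(ref, shrink=0.2)
draws = gibbs_truncated_mvn(mu, Sigma, n_sweeps=2000, rng=rng)
```

In a full sampler, conditional draws of this kind would also incorporate the likelihood of the mutation counts. A hierarchical variant would place a prior on Sigma instead of fixing it; an inverse-Wishart prior is one common choice, assumed here only for illustration.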