Publication: Allelic imbalance of chromatin accessibility in cancer identifies candidate causal risk variants and their mechanisms
Date
2022-09-07
Citation
Grishin, Dennis. 2022. Allelic imbalance of chromatin accessibility in cancer identifies candidate causal risk variants and their mechanisms. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Understanding the functional impact of non-coding variants remains a major challenge in cancer genetics. In the case of non-coding germline risk associations from GWAS, many associations are in high LD and cannot be resolved by statistical fine-mapping. In the case of non-coding somatic drivers from tumor sequencing, existing datasets are generally underpowered for discovery with frequency-based methods. For both domains, methods that can infer the functional impact of individual variants within regulatory elements are thus urgently needed.
Here I studied allelic imbalance of chromatin accessibility in 406 ATAC-Seq samples across 23 cancer types. I employed a statistical model of allelic imbalance that aggregates signals across individuals while modeling individual-level somatic copy number variation. I discovered 7,262 germline allele-specific accessibility QTLs (as-aQTLs) and found that they are highly enriched for cancer risk heritability across seven common cancer GWAS (e.g., prostate cancer as-aQTLs showed a 145 ± 35.7-fold enrichment for prostate cancer risk, p = 6.3 × 10⁻⁵) and are largely explained by genetic variants that directly alter transcription factor binding and gene expression. To connect as-aQTLs to putative risk mechanisms, I introduced the Regulome-Wide Association Study (RWAS). RWAS identified accessible peaks genetically associated with cancer risk at >70% of known breast and prostate loci (compared to % for a conventional Transcriptome-Wide Association Study) and discovered novel risk loci in all examined cancer types.
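As a rough illustration of aggregating allelic signal across individuals while accounting for copy number, the sketch below pools read counts at one heterozygous site with a simple binomial score test. It is not the model used in the dissertation: the function names, the input layout, and the use of copy number only to set each individual's expected reference fraction are assumptions made for this example, and overdispersion is ignored.

```python
import math


def pooled_imbalance_z(samples):
    """Pooled z-score for allelic imbalance at one heterozygous site.

    samples: iterable of (ref_reads, alt_reads, expected_ref_fraction).
    expected_ref_fraction is a hypothetical per-individual input, e.g.
    ref_copies / total_copies from that individual's local copy number
    (0.5 for a balanced diploid region).
    """
    deviation = 0.0  # sum over individuals of (observed - expected) ref reads
    variance = 0.0   # sum of binomial variances under the null
    for ref, alt, p0 in samples:
        n = ref + alt
        if n == 0 or not 0.0 < p0 < 1.0:
            continue  # skip individuals that carry no information
        deviation += ref - n * p0
        variance += n * p0 * (1.0 - p0)
    if variance == 0.0:
        return 0.0
    return deviation / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, an individual with 40 reference and 20 alternate reads looks imbalanced against an expected fraction of 0.5, but not against an expected fraction of 2/3 when the reference allele sits on two of three copies. This is the kind of confounding that modeling copy number is meant to remove.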
To estimate allele-specificity for variants that could not be tested, I developed siamAS, a predictor of allelic imbalance that uses a novel “Siamese” neural network approach that trains two mirrored networks on allele-specific features. I trained siamAS on my allele specificity data incorporating >7,000 features from multiple variant effect predictors. In a hold-out data set, siamAS achieved a classification AUC of 0.87, which substantially outperformed any individual predictive feature. Finally, I applied siamAS to variants within regulatory elements and non-coding somatic mutations in TCGA. siamAS identified germline variants that are enriched for cancer risk heritability and non-coding somatic mutations that result in allelic imbalance.
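The weight-sharing idea behind a "Siamese" architecture can be sketched as a forward pass in which one encoder is applied to the features of each allele and the two encodings are compared. This is only a minimal illustration and not siamAS itself: the layer sizes, the `tanh` encoder, the difference-based output head, and all function names are assumptions, and no training step is shown.

```python
import math
import random


def make_encoder(n_in, n_hidden, seed=0):
    """Create one set of randomly initialised weights, used for BOTH alleles."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_hidden)]
    biases = [0.0] * n_hidden
    return weights, biases


def encode(encoder, features):
    """Single dense layer with tanh activation."""
    weights, biases = encoder
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, biases)]


def siamese_score(encoder, head, ref_features, alt_features):
    """Score in (0, 1) computed from the difference of the two encodings.

    Because both alleles pass through the same encoder, identical inputs
    give exactly 0.5 and swapping the alleles mirrors the score around 0.5.
    """
    h_ref = encode(encoder, ref_features)
    h_alt = encode(encoder, alt_features)
    logit = sum(w * (r - a) for w, r, a in zip(head, h_ref, h_alt))
    return 1.0 / (1.0 + math.exp(-logit))
```

With this hypothetical head, the output is symmetric by construction: the score for (reference, alternate) and the score for (alternate, reference) sum to one, so the network cannot favor an allele merely because of which input slot it occupies.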
In summary, my results establish cancer as-aQTLs, RWAS, and siamAS as powerful tools to study the genetic architecture of cancer risk.
Keywords
allelic imbalance, RWAS, siamAS, Genetics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service