Publication:
Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts

Thumbnail Image

Open/View Files

Date

2017

Journal Title

Journal ISSN

Volume Title

Publisher

BioMed Central
The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Oguz, Cihan, Shurjo K. Sen, Adam R. Davis, Yi-Ping Fu, Christopher J. O’Donnell, and Gary H. Gibbons. 2017. “Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts.” BMC Systems Biology 11 (1): 99. doi:10.1186/s12918-017-0474-5. http://dx.doi.org/10.1186/s12918-017-0474-5.

Research Data

Abstract

Background: One goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), which is an intermediate endo-phenotype of coronary artery disease (CAD). Methods: Model inputs were derived from advanced cases in the ClinSeq®; discovery cohort (n=16) and the FHS replication cohort (n=36) from 89th-99th CAC score percentile range, and age-matched controls (ClinSeq®; n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables and genotypes of 56 single nucleotide polymorphisms (SNPs) ranked highest in terms of their nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the areas under receiver operating characteristic curves (ROC-AUC). Results: RF models trained and tested with clinical variables generated ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort were highly predictive (ROC-AUC ≥0.85) with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq®; data and tested with FHS data and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several CAD and “vascular aging" related biological processes were enriched in the network of genes constructed from the predictive SNPs. Conclusions: We identified a molecular network predictive of advanced coronary calcium using genotype data from ClinSeq®; and FHS cohorts. Our results illustrate that machine learning tools, which utilize complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders, hold promise for deriving predictive disease models and networks. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0474-5) contains supplementary material, which is available to authorized users.

Description

Keywords

Coronary artery calcium, Random forest, Neural networks, Case-control study, Coronary heart disease, Genotype data, Systems biology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Referenced By

Related Stories