Unsupervised Phenotyping of Severe Asthma Research Program Participants Using Expanded Lung Data
Author
Wu, Wei
Bleecker, Eugene
Moore, Wendy
Busse, William W.
Castro, Mario
Chung, Kian Fan
Erzurum, Serpil
Gaston, Benjamin
Curran-Everett, Douglas
Wenzel, Sally E.
Published Version
https://doi.org/10.1016/j.jaci.2013.11.042
Citation
Wu, Wei, Eugene Bleecker, Wendy Moore, William W. Busse, Mario Castro, Kian Fan Chung, Serpil Erzurum et al. "Unsupervised Phenotyping of Severe Asthma Research Program Participants Using Expanded Lung Data." Journal of Allergy and Clinical Immunology 133, no. 5 (2014): 1280-1288. DOI: 10.1016/j.jaci.2013.11.042
Abstract
Background
Previous studies have identified asthma phenotypes based on small numbers of clinical, physiologic or inflammatory characteristics. However, no studies have utilized a wide range of variables using machine learning approaches.
Objectives
To identify subphenotypes of asthma utilizing blood, bronchoscopic, exhaled nitric oxide and clinical data from the Severe Asthma Research Program using unsupervised clustering, and then characterize them using supervised learning approaches.
Methods
Unsupervised clustering approaches were applied to 112 clinical, physiologic and inflammatory variables from 378 subjects. Variable selection and supervised learning techniques were employed to select relevant and nonredundant variables, address their predictive values, as well as the predictive value of the full variable set.
Results
Ten variable clusters and six subject clusters were identified, which differed and overlapped with previous clusters. Traditionally defined severe asthmatics distributed through subject Clusters 3–6. Cluster 4 identified early onset allergic asthmatics with low lung function and eosinophilic inflammation. Later onset, mostly severe asthmatics with nasal polyps and eosinophilia characterized Cluster 5. Cluster 6 asthmatics manifested persistent inflammation in blood and bronchoalveolar lavage and exacerbations despite high systemic corticosteroid use and side effects. Age of asthma onset, quality of life, symptoms, medications and health care utilization were some of the 51 nonredundant variables distinguishing subject clusters. These 51 variables classified test cases with 88% accuracy, compared to 93% accuracy with all 112 variables.
Conclusion
The unsupervised machine learning approaches used here provide unique insights into disease, confirming other approaches while revealing novel additional phenotypes.
Other Sources
http://pubmed.ncbi.nlm.nih.gov/24589344/
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37370428
Collections
- HMS Scholarly Articles [17928]