Publication:
Use of Machine Learning Algorithms to Predict Illness Severity and Diagnosis Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) From Patient Data

Date

2020-03-03

The Harvard community has made this article openly available.

Citation

Friedman, Rill. 2020. Use of Machine Learning Algorithms to Predict Illness Severity and Diagnosis Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) From Patient Data. Master's thesis, Harvard Extension School.

Abstract

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a condition characterized by a constellation of symptoms, including post-exertional malaise and disabling fatigue. Neither a blood test nor a single set of biomarkers exists for ME/CFS. Dr. Jose G. Montoya, former Professor of Medicine, Infectious Diseases, and Geographic Medicine at the Stanford University Medical Center, provided us with a multidimensional data set consisting of samples from 192 ME/CFS cases and 392 healthy controls. He eliminated participants with missing data, yielding a final sample size of 186 ME/CFS cases and 388 healthy controls. Each participant completed the MFI-20, a 20-item quality of life questionnaire, on the day of blood sample collection. Dr. Montoya measured each participant’s serum cytokine levels using a 51-multiplex array. We divided these data into training and test sets to develop a binary predictive classifier to identify whether a study participant has ME/CFS. We tested eight different binary classification models and fine-tuned the top three performing baseline models. We then employed the best of these three, an elastic net generalized linear model, to determine whether a study participant has ME/CFS. The model’s total accuracy is 75%, with an area under the receiver operating characteristic curve of 0.703, sensitivity of 0.439, and specificity of 0.93. This model is statistically significant at the α = 0.05 level, with a McNemar's test p-value of 0.001. Next, we similarly evaluated eight regression prediction models for disease severity. As above, we tuned the best three of the eight baseline models. The top performer of these three models, evaluated on mean absolute error (MAE), was partial least squares regression.
This two-component partial least squares model has a root mean square error (RMSE) of 25.33, a training coefficient of multiple determination (R²) of 0.062, a training MAE of 22.49, a test RMSE of 25.86, and a test MAE of 23.25. We successfully created a statistically significant binary classification model based on the 51 cytokine values measured in serum samples from ME/CFS patients and healthy controls. From the same cytokine values, combined with the MFI-20 survey measure of illness severity, we derived a regression model to predict illness severity. These findings may contribute to the development of a blood test for ME/CFS as well as the continuing hunt for disease biomarkers.
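For readers unfamiliar with the evaluation measures quoted in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, MAE, and RMSE are conventionally computed. It is not the thesis code: the function names, the confusion-matrix counts, and the severity scores are all hypothetical and chosen only for illustration.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn count ME/CFS cases predicted correctly/incorrectly;
    tn/fp count healthy controls predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
    }


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted scores."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical test-set counts (NOT the study's confusion matrix).
metrics = classification_metrics(tp=20, fn=30, tn=90, fp=10)

# Hypothetical severity scores (NOT study data).
observed = [40.0, 55.0, 70.0, 85.0]
predicted = [50.0, 50.0, 60.0, 75.0]
severity_mae = mae(observed, predicted)
severity_rmse = rmse(observed, predicted)

print(metrics)
print(severity_mae, severity_rmse)
```

Two properties are worth keeping in mind when reading the reported figures. A sensitivity well below the specificity, as in the abstract (0.439 versus 0.93), means the classifier clears most healthy controls but misses a large share of true cases. RMSE is never smaller than MAE on the same predictions, and the gap widens as individual errors become more uneven.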

Keywords

machine learning, biomarkers, classification, prediction, ME/CFS, Chronic Fatigue Syndrome, diagnosis, mathematical model

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
