Use of Machine Learning Algorithms to Predict Illness Severity and Diagnosis Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) From Patient Data
Citation: Friedman, Rill. 2020. Use of Machine Learning Algorithms to Predict Illness Severity and Diagnosis Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) From Patient Data. Master's thesis, Harvard Extension School.
Abstract
Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a condition characterized by a constellation of symptoms, including post-exertional malaise and disabling fatigue. Neither a blood test nor a single set of biomarkers exists for ME/CFS.
Dr. Jose G. Montoya, former Professor of Medicine, Infectious Diseases, and Geographic Medicine at the Stanford University Medical Center, provided us with a multidimensional data set consisting of samples from 192 ME/CFS cases and 392 healthy controls. He excluded participants with missing data, yielding a final sample of 186 ME/CFS cases and 388 healthy controls. On the day of blood sample collection, each participant completed the MFI-20 (Multidimensional Fatigue Inventory), a 20-item self-report fatigue questionnaire. Dr. Montoya measured each participant's serum cytokine levels using a 51-plex cytokine array.
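The exclusion step can be pictured with a small sketch of the data layout: one record per participant holding a diagnosis label, an MFI-20 score, and 51 cytokine values, followed by listwise (complete-case) deletion. The record type, field names, and the assumption that "missing data" means any missing questionnaire score or cytokine value are all hypothetical; the abstract does not describe the actual file format or exclusion rule.

```python
from dataclasses import dataclass
from typing import List, Optional

N_CYTOKINES = 51  # size of the multiplex cytokine panel


@dataclass
class Participant:
    """Hypothetical record layout: one row per study participant."""
    participant_id: str
    has_mecfs: bool                    # diagnosis label: case (True) or control (False)
    mfi20_score: Optional[float]       # questionnaire score; None if missing
    cytokines: List[Optional[float]]   # serum cytokine levels; None marks a missing value


def complete_cases(participants: List[Participant]) -> List[Participant]:
    """Listwise deletion: keep only participants with every value present."""
    return [
        p for p in participants
        if p.mfi20_score is not None
        and len(p.cytokines) == N_CYTOKINES
        and all(c is not None for c in p.cytokines)
    ]
```

After filtering, the remaining case and control counts can be checked with `sum(p.has_mecfs for p in kept)` and its complement.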
We divided the data into training and test sets to develop a binary classifier that predicts whether a study participant has ME/CFS. We tested eight binary classification models and fine-tuned the three best-performing baseline models. We then used the best of these three, an elastic net generalized linear model, to determine whether a study participant has ME/CFS. The model has an overall accuracy of 75%, an area under the receiver operating characteristic (ROC) curve of 0.703, a sensitivity of 0.439, and a specificity of 0.93. It is statistically significant at the α = 0.05 level, with a McNemar's test p-value of 0.001.
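As an illustration of the model family and the reported metrics, here is a minimal pure-Python sketch of elastic-net-penalized logistic regression fitted by proximal gradient descent, using the same penalty form as R's glmnet (`alpha` mixes the L1 and L2 terms, `lam` sets the overall strength). The abstract does not say which software or solver was used; the function names, the solver, and the default values of `lam`, `alpha`, `lr`, and `epochs` are hypothetical. The McNemar's test assumes the standard test on the two off-diagonal cells of the confusion matrix, with the continuity correction that R's `mcnemar.test` applies by default.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, gamma):
    """Proximal operator of gamma*|z|: shrinks z toward 0 (the L1 part)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.01, alpha=0.5, lr=0.1, epochs=1000):
    """Minimize mean log-loss + lam*(alpha*||w||_1 + (1-alpha)/2*||w||_2^2)
    by proximal gradient descent. X: rows of features, y: 0/1 labels.
    The intercept b is left unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on log-loss + ridge term, then soft-threshold for L1.
        w = [soft_threshold(wj - lr * (gj + lam * (1 - alpha) * wj), lr * lam * alpha)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, X):
    """Predicted probability of class 1 (ME/CFS) for each row of X."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the probability a random case outscores a random control (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and McNemar's p-value, with 1 = ME/CFS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # McNemar's chi-square (1 df) on the off-diagonal cells; it tests whether
    # false positives and false negatives are equally common. The continuity
    # correction is applied only when the two cells differ, as in mcnemar.test.
    stat = 0.0 if fp == fn else (abs(fp - fn) - 1) ** 2 / (fp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcnemar_p": math.erfc(math.sqrt(stat / 2)),  # chi-square(1) upper tail
    }
```

Here `lam` and `alpha` are fixed for brevity. The thesis tuned its models, so the analogous step would be a search over these two values on the training set; the abstract does not specify the tuning procedure.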
Next, we evaluated eight regression models for predicting disease severity in the same way and, as above, tuned the best three of the eight baseline models. The top performer of these three, evaluated on mean absolute error (MAE), was partial least squares (PLS) regression. On the training data, this two-component PLS model has a root mean square error (RMSE) of 25.33, a coefficient of multiple determination (R²) of 0.062, and an MAE of 22.49; on the test data, its RMSE is 25.86 and its MAE is 23.25.
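To make the severity model concrete, the following is a minimal pure-Python sketch of partial least squares regression for a single response (the NIPALS variant often called PLS1), limited to a chosen number of components, together with the RMSE, MAE, and R² metrics quoted above. The abstract does not describe its implementation, so the function names are hypothetical. Columns are centered but not scaled here; standardizing the cytokine columns first is a common choice. R² is computed as 1 - SS_res/SS_tot, which may differ from packages that report the squared correlation between observed and predicted values instead.

```python
import math


def mean(v):
    return sum(v) / len(v)


def fit_pls1(X, y, n_components=2):
    """Fit single-response PLS with NIPALS.
    Returns column means, the y mean, and a (w, p, q) triple per component."""
    n, m = len(X), len(X[0])
    x_means = [mean([row[j] for row in X]) for j in range(m)]
    y_mean = mean(y)
    E = [[row[j] - x_means[j] for j in range(m)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                                # centered y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # X^T y
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # residual y is already fully explained
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_means, y_mean, components


def predict_pls1(model, X):
    """Predict by applying the same centering and deflation to each new row."""
    x_means, y_mean, components = model
    preds = []
    for row in X:
        e = [xj - mj for xj, mj in zip(row, x_means)]
        yhat = y_mean
        for w, p, q in components:
            t = sum(ej * wj for ej, wj in zip(e, w))
            yhat += q * t
            e = [ej - t * pj for ej, pj in zip(e, p)]
        preds.append(yhat)
    return preds


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 (as 1 - SS_res / SS_tot)."""
    res = [a - b for a, b in zip(y_true, y_pred)]
    ybar = mean(y_true)
    ss_res = sum(r * r for r in res)
    ss_tot = sum((a - ybar) ** 2 for a in y_true)
    return {
        "rmse": math.sqrt(ss_res / len(res)),
        "mae": sum(abs(r) for r in res) / len(res),
        "r2": 1 - ss_res / ss_tot,
    }
```

Fitting on the training rows and calling `regression_metrics` once on training predictions and once on test predictions gives the two sets of figures that the paragraph above reports.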
We created a statistically significant binary classification model based on 51 serum cytokine values from ME/CFS patients and healthy controls. From the same cytokine values, combined with the MFI-20 survey measure of illness severity, we derived a regression model that predicts illness severity. These findings may contribute to the development of a blood test for ME/CFS and to the ongoing search for disease biomarkers.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364879