Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records

Title: Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records
Author: Lin, Chen; Karlson, Elizabeth W.; Canhao, Helena; Miller, Timothy A.; Dligach, Dmitriy; Chen, Pei Jun; Perez, Raul Natanael Guzman; Shen, Yuanyan; Weinblatt, Michael E.; Shadick, Nancy A.; Plenge, Robert M.; Savova, Guergana K.

Note: Order does not necessarily reflect citation order of authors.

Citation: Lin, C., E. W. Karlson, H. Canhao, T. A. Miller, D. Dligach, P. J. Chen, R. N. G. Perez, et al. 2013. “Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records.” PLoS ONE 8 (8): e69932. doi:10.1371/journal.pone.0069932. http://dx.doi.org/10.1371/journal.pone.0069932.
Abstract: Objective: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. Materials and Methods: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. Results: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. Conclusion: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
Published Version: doi:10.1371/journal.pone.0069932
Other Sources: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3745469/pdf/
Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11855841
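
The abstract reports its results as Area Under the Receiver Operating Characteristic Curve (AUC). As a minimal sketch of what that number measures, and not the authors' evaluation code, the function below computes AUC for a binary classifier as the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise (Mann-Whitney) formulation: the share of
    positive/negative pairs where the positive scores higher,
    with tied scores counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 could stand for one disease activity class,
# 0 for another; the scores are made-up classifier outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the reported 0.831 against the baseline 0.758. The pairwise loop is quadratic in the number of examples; it is written for clarity rather than speed.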