Evaluating the state of the art in disorder recognition and normalization of the clinical narrative

Pradhan, Sameer; Elhadad, Noémie; South, Brett R; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy W; Savova, Guergana

Published Version

https://doi.org/10.1136/amiajnl-2013-002544

Citation

Pradhan, Sameer, Noémie Elhadad, Brett R South, David Martinez, Lee Christensen, Amy Vogel, Hanna Suominen, Wendy W Chapman, and Guergana Savova. 2014. “Evaluating the state of the art in disorder recognition and normalization of the clinical narrative.” Journal of the American Medical Informatics Association : JAMIA 22 (1): 143-154. doi:10.1136/amiajnl-2013-002544. http://dx.doi.org/10.1136/amiajnl-2013-002544.

Abstract

Objective: The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners.

Materials and methods: We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text: 199 clinical notes for training and 99 for testing (roughly 180 K words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance.

Results: For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59.

Discussion: Most of the participating systems used a hybrid approach by supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources.

Conclusions: The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.
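
The Task 1a figures in the Results can be checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is an illustrative choice here and is not taken from the task's evaluation scripts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Convention assumed here: F1 is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the best Task 1a system.
best_f1 = f1_score(0.80, 0.71)
print(round(best_f1, 2))  # prints 0.75
```

Rounded to two decimal places, the result agrees with the F1 score of 0.75 stated in the abstract.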

Other Sources

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4433360/pdf/

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page

http://nrs.harvard.edu/urn-3:HUL.InstRepos:24983942
