Show simple item record

dc.contributor.author | Deleger, Louise | en_US
dc.contributor.author | Molnar, Katalin | en_US
dc.contributor.author | Savova, Guergana | en_US
dc.contributor.author | Xia, Fei | en_US
dc.contributor.author | Lingren, Todd | en_US
dc.contributor.author | Li, Qi | en_US
dc.contributor.author | Marsolo, Keith | en_US
dc.contributor.author | Jegga, Anil | en_US
dc.contributor.author | Kaiser, Megan | en_US
dc.contributor.author | Stoutenborough, Laura | en_US
dc.contributor.author | Solti, Imre | en_US
dc.date.accessioned | 2014-03-11T10:17:20Z
dc.date.issued | 2012 | en_US
dc.identifier.citation | Deleger, L., K. Molnar, G. Savova, F. Xia, T. Lingren, Q. Li, K. Marsolo, et al. 2012. “Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.” Journal of the American Medical Informatics Association : JAMIA 20 (1): 84-94. doi:10.1136/amiajnl-2012-001012. http://dx.doi.org/10.1136/amiajnl-2012-001012. | en
dc.identifier.issn | 1067-5027 | en
dc.identifier.uri | http://nrs.harvard.edu/urn-3:HUL.InstRepos:11879366
dc.description.abstract | Objective: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents. Material and methods: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated ‘gold standard’. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured. Results: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (annotators' performance was 92.15%(R)/93.95%(P) and 94.55%(R)/88.45%(P) overall while the best system obtained 92.91%(R)/95.73%(P) on same text). The impact of automated de-identification was minimal on the utility of the narrative notes for subsequent information extraction as measured by the sensitivity and precision of medication name extraction. Discussion and conclusion: NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively. | en
dc.language.iso | en_US | en
dc.publisher | BMJ Group | en
dc.relation.isversionof | doi:10.1136/amiajnl-2012-001012 | en
dc.relation.hasversion | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3555323/pdf/ | en
dash.license | LAA | en_US
dc.subject | Natural language processing | en
dc.subject | privacy of patient data | en
dc.subject | health insurance portability and accountability act | en
dc.subject | automated de-identification | en
dc.subject | protected health information | en
dc.subject | NLP | en
dc.subject | text mining | en
dc.subject | human-computer interaction and human-centered computing | en
dc.subject | providing just-in-time access to the biomedical literature and other health information | en
dc.subject | applications that link biomedical knowledge from diverse primary sources (includes automated indexing) | en
dc.subject | linking the genotype and phenotype | en
dc.subject | discovery | en
dc.subject | bionlp | en
dc.subject | medical informatics | en
dc.subject | biomedical informatics | en
dc.subject | disease networks | en
dc.subject | translational medicine | en
dc.subject | drug repositioning | en
dc.subject | rare diseases | en
dc.title | Large-scale evaluation of automated clinical note de-identification and its impact on information extraction | en
dc.type | Journal Article | en_US
dc.description.version | Version of Record | en
dc.relation.journal | Journal of the American Medical Informatics Association : JAMIA | en
dash.depositing.author | Savova, Guergana | en_US
dc.date.available | 2014-03-11T10:17:20Z
dc.identifier.doi | 10.1136/amiajnl-2012-001012 | *
dash.authorsordered | false
dash.contributor.affiliated | Savova, Guergana

