Person: Szolovits, Peter
Last Name: Szolovits
First Name: Peter
Name: Szolovits, Peter
Search Results
Now showing 1 - 5 of 5 (5 results)
Publication
Normalization of Plasma 25-Hydroxy Vitamin D Is Associated with Reduced Risk of Surgery in Crohn's Disease
(Oxford University Press (OUP), 2013-08-01)
Ananthakrishnan, Ashwin; Cagan, Andrew; Gainer, Vivian S.; Cai, Tianxi; Cheng, Su-Chun; Savova, Guergana; Chen, Pei; Szolovits, Peter; Xia, Zongqi; De Jager, Philip; Shaw, Stanley; Churchill, Susanne; Karlson, Elizabeth; Kohane, Isaac; Plenge, Robert; Murphy, Shawn; Liao, Katherine

Introduction: Vitamin D may have an immunological role in Crohn's disease (CD) and ulcerative colitis (UC). Retrospective studies have suggested a weak association between vitamin D status and disease activity but have significant limitations.
Methods: Using a multi-institution inflammatory bowel disease (IBD) cohort, we identified all CD and UC patients who had at least one measured plasma 25-hydroxy vitamin D [25(OH)D]. Plasma 25(OH)D was considered sufficient at levels ≥ 30 ng/mL. Logistic regression models adjusting for potential confounders were used to identify the impact of measured plasma 25(OH)D on subsequent risk of IBD-related surgery or hospitalization. In the subset of patients with multiple measures of 25(OH)D, we examined the impact of normalization of vitamin D status on study outcomes.
Results: Our study included 3,217 patients (55% CD, mean age 49 years). The median lowest plasma 25(OH)D was 26 ng/mL (IQR 17–35 ng/mL). In CD, on multivariable analysis, plasma 25(OH)D < 20 ng/mL was associated with an increased risk of surgery (OR 1.76, 95% CI 1.24–2.51) and of IBD-related hospitalization (OR 2.07, 95% CI 1.59–2.68) compared to those with 25(OH)D ≥ 30 ng/mL. Similar estimates were also seen for UC. Furthermore, CD patients who had initial levels < 30 ng/mL but subsequently normalized their 25(OH)D had a reduced likelihood of surgery (OR 0.56, 95% CI 0.32–0.98) compared to those who remained deficient.
Conclusion: Low plasma 25(OH)D is associated with an increased risk of surgery and hospitalization in both CD and UC, and normalization of 25(OH)D status is associated with a reduction in the risk of CD-related surgery.

Publication
Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing
(Oxford University Press (OUP), 2013-06)
Ananthakrishnan, Ashwin; Cai, Tianxi; Savova, Guergana; Cheng, Su-Chun; Chen, Pei; Guzman, Raul; Gainer, Vivian S.; Murphy, Shawn; Szolovits, Peter; Xia, Zongqi; Shaw, Stanley; Churchill, Susanne; Karlson, Elizabeth; Kohane, Isaac; Plenge, Robert M.; Liao, Katherine

Introduction: Prior studies identifying patients with inflammatory bowel disease (IBD) using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record (EMR) based model for classification of IBD that leverages the combination of codified data and information from clinical text notes extracted using natural language processing (NLP).
Methods: Using the EMRs of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥ 1 ICD-9 code for each disease. We utilized codified data (i.e., ICD-9 codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease, with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
Results: We confirmed 399 (67%) CD cases in the CD training set and 378 (63%) UC cases in the UC training set. For both diseases, a combined model including narrative and codified data had better accuracy (area under the curve (AUC) 0.95 for CD; 0.94 for UC) than models utilizing only disease ICD-9 codes (AUC 0.89 for CD; 0.86 for UC). Addition of NLP narrative terms to our final model resulted in classification of 6–12% more subjects with the same accuracy.
Conclusion: Inclusion of narrative concepts identified using NLP improves the accuracy of EMR case definition for CD and UC while simultaneously identifying more subjects compared to models using codified data alone.
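The abstract above describes variable selection with an adaptive-LASSO-penalized logistic regression over codified and NLP-derived features. As a rough illustration of that modeling step (not the authors' code, and with synthetic stand-in data rather than real EMR features), the following Python sketch uses the standard two-stage adaptive LASSO construction: an initial ridge fit supplies feature-specific weights, features are rescaled by those weights, and an L1-penalized logistic regression on the rescaled features performs the selection.

```python
# Hypothetical two-stage adaptive LASSO logistic regression, sketching the
# variable-selection step described in the abstract (synthetic data only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for codified features (ICD-9 counts, prescriptions) and NLP concept counts.
n_patients, n_features = 600, 50
X = rng.poisson(1.0, size=(n_patients, n_features)).astype(float)
true_beta = np.zeros(n_features)
true_beta[:5] = [1.2, -0.8, 0.9, 0.7, -1.0]          # only a few informative features
y = (rng.random(n_patients) < 1 / (1 + np.exp(-(X @ true_beta - 1.5)))).astype(int)

# Stage 1: ridge (L2) fit provides initial coefficient magnitudes as adaptive weights.
init = LogisticRegression(penalty="l2", C=1.0, max_iter=5000).fit(X, y)
weights = np.abs(init.coef_.ravel()) + 1e-6           # avoid division by zero

# Stage 2: an L1 fit on weight-rescaled features is equivalent to an adaptive LASSO penalty.
X_scaled = X * weights
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=5000)
lasso.fit(X_scaled, y)

# Map coefficients back to the original feature scale; nonzero entries are the selected variables.
beta = lasso.coef_.ravel() * weights
print("selected feature indices:", np.flatnonzero(beta))
print("predicted case probability, first 5 patients:",
      lasso.predict_proba(X_scaled[:5])[:, 1].round(3))
```

In this construction the penalty on each coefficient is inversely proportional to its initial estimate, so strongly supported features are penalized less, which is the usual motivation for preferring the adaptive LASSO over a plain L1 penalty for variable selection.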
Publication
Robust Parameter Extraction for Decision Support Using Multimodal Intensive Care Data
(The Royal Society, 2008)
Clifford, Gari D.; Long, W.J.; Moody, G.B.; Szolovits, Peter

Digital information flow within the intensive care unit (ICU) continues to grow with advances in technology and computational biology. Recent developments in the integration and archiving of these data have created new opportunities for data analysis and clinical feedback. New problems associated with ICU databases have also arisen. ICU data are high-dimensional, often sparse, asynchronous and irregularly sampled, as well as being non-stationary, noisy and subject to frequent exogenous perturbations by clinical staff. Relationships between different physiological parameters are usually nonlinear (except within restricted ranges), and the equipment used to measure the observables is often inherently error-prone and biased. The prior probabilities associated with an individual's genetics, pre-existing conditions, lifestyle and ongoing medical treatment all affect prediction and classification accuracy. In this paper, we describe some of the key problems and associated methods that hold promise for robust parameter extraction and data fusion for use in clinical decision support in the ICU.

Publication
Automated De-Identification of Free-Text Medical Records
(BioMed Central, 2008)
Neamatullah, Ishna; Douglass, Margaret M.; Lehman, Li-wei H.; Reisner, Andrew; Villarroel, Mauricio; Long, William J.; Szolovits, Peter; Moody, George B.; Mark, Roger Greenwood; Clifford, Gari D.

Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automated methods for large-scale de-identification.
Methods: We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, and X-ray reports. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus.
Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, a precision of 0.749, and a fallout of approximately 0.002. On the test corpus, a total of 90 false negatives were found, or 27 per 100,000 words, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.
Conclusion: We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software outperforms a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized that it can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient for public dissemination of medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
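The de-identification package described above is written in Perl and relies on lexical look-up tables, regular expressions, and simple heuristics; the actual software and corpus are distributed via PhysioNet. Purely to illustrate that pattern-matching style (in Python rather than Perl, with invented patterns, a toy name list, and a made-up note rather than the package's dictionaries and rules), a minimal sketch might look like this:

```python
# Minimal sketch of dictionary- and regex-based PHI scrubbing in the spirit of the
# approach described above. Patterns, name list, and note text are illustrative
# placeholders, not the package's actual look-up tables or heuristics.
import re

# Toy lexical look-up table of known names (real systems use large dictionaries).
NAME_LOOKUP = {"smith", "garcia", "nguyen"}

# A few regular expressions for common PHI formats: dates, phone numbers, record numbers.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[**DATE**]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[**PHONE**]"),
    (re.compile(r"\bMRN[:\s]*\d{6,8}\b", re.IGNORECASE), "[**MRN**]"),
]

def deidentify(note: str) -> str:
    """Replace PHI-like spans with surrogate tags using regexes plus a name dictionary."""
    for pattern, tag in PHI_PATTERNS:
        note = pattern.sub(tag, note)
    # Simple heuristic: replace any token found in the name look-up table.
    scrubbed = [
        "[**NAME**]" if tok.strip(".,;:").lower() in NAME_LOOKUP else tok
        for tok in note.split()
    ]
    return " ".join(scrubbed)

if __name__ == "__main__":
    note = "Pt Smith seen 3/14/2007, MRN: 1234567, call 617-555-0199 with results."
    print(deidentify(note))
    # -> Pt [**NAME**] seen [**DATE**], [**MRN**], call [**PHONE**] with results.
```

A real system of this kind layers many more patterns and context checks (surrounding keywords, known hospital and location lists, date arithmetic), which is where the recall and precision figures reported in the abstract come from.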
Publication
High-Throughput Phenotyping With Electronic Medical Record Data Using a Common Semi-Supervised Approach (PheCAP)
(Springer Science and Business Media LLC, 2019-11-20)
Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Xia, Zongqi; Castro, Victor; Gagnon, David; Savova, Guergana; Churchill, Susanne; Gaziano, John; Kohane, Isaac; Cai, Tianxi; Ho, Yuk-Lam; Ananthakrishnan, Ashwin; Shaw, Stanley; Gainer, Vivian; Link, Nicholas; Honerlaw, Jacqueline; Huong, Sicong; Karlson, Elizabeth; Plenge, Robert; Szolovits, Peter; O'Donnell, Christopher; Murphy, Shawn; Liao, Katherine

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches to phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, with machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 days if all data are available; however, the timing is largely dependent on the chart review step, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes/no).
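PheCAP is distributed as its own standardized pipeline; the sketch below is only a simplified, hypothetical rendering in Python of the general idea stated in the abstract: combine structured data with NLP-derived concept counts, train a model on a small chart-reviewed labeled subset, and then output a phenotype probability and a yes/no classification for every patient. The feature set, cutoff, and data are invented for illustration, and the additional standardized feature-curation steps of the actual pipeline are not shown.

```python
# Simplified, hypothetical sketch of the phenotyping idea described above: train on a
# small set of chart-reviewed gold-standard labels, then score every patient with a
# phenotype probability and a yes/no call. This is not the PheCAP package itself.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stand-in feature matrix: log(1 + count) of ICD codes, medications, and NLP concept mentions.
n_patients = 5000
counts = rng.poisson([3.0, 1.0, 2.0, 0.5], size=(n_patients, 4))
X = np.log1p(counts)

# Chart review provides gold-standard labels for only a small subset of patients.
labeled_idx = rng.choice(n_patients, size=200, replace=False)
latent = X @ np.array([1.0, 0.5, 1.2, -0.3]) - 3.0          # synthetic ground truth
y_all = (rng.random(n_patients) < 1 / (1 + np.exp(-latent))).astype(int)

# Train the phenotype algorithm on the labeled subset only.
model = LogisticRegression(max_iter=1000).fit(X[labeled_idx], y_all[labeled_idx])

# Final products: a probability for every patient and a yes/no classification
# at a chosen probability cutoff (0.5 here, purely for illustration).
phenotype_prob = model.predict_proba(X)[:, 1]
phenotype_yes_no = phenotype_prob >= 0.5
print(f"patients classified as cases: {int(phenotype_yes_no.sum())} of {n_patients}")
```

The appeal of this kind of pipeline, as the abstract notes, is that the expensive step is the small amount of chart review needed for gold-standard labels; everything downstream (feature assembly, model fitting, and scoring of all patients) can be automated.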