Person: Clifford, Gari D
Search Results
Publication Automated De-Identification of Free-Text Medical Records
(BioMed Central, 2008) Neamatullah, Ishna; Douglass, Margaret M; Lehman, Li-wei H; Reisner, Andrew; Villarroel, Mauricio; Long, William J; Szolovits, Peter; Moody, George B; Mark, Roger Greenwood; Clifford, Gari D
Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification. Methods: We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002.
On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus. Conclusion: We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software out-performs a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized and can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
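The pattern-matching approach summarized in this abstract (look-up tables, regular expressions, and simple heuristics) can be illustrated with a minimal Python sketch. This is not the published Perl package: the name list, the patterns, the title heuristic, and the `[**...**]` placeholder format are all assumptions chosen for illustration.

```python
import re

# Hypothetical look-up table; a real system would load large name dictionaries.
KNOWN_NAMES = {"smith", "jones"}

# Illustrative patterns only; not the patterns used by the published software.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
# Heuristic: a capitalized word directly after a title is treated as a name.
TITLE_RE = re.compile(r"\b((?:Dr|Mr|Mrs|Ms)\.?\s+)[A-Z][a-z]+")
WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def deidentify(text):
    """Replace suspected PHI with category placeholders."""
    text = DATE_RE.sub("[**DATE**]", text)
    text = PHONE_RE.sub("[**PHONE**]", text)
    text = TITLE_RE.sub(r"\1[**NAME**]", text)

    def mask_known_name(match):
        # Dictionary look-up: mask any word found in the name table.
        word = match.group(0)
        return "[**NAME**]" if word.lower() in KNOWN_NAMES else word

    return WORD_RE.sub(mask_known_name, text)


if __name__ == "__main__":
    print(deidentify("Pt Smith seen by Dr. Lee on 3/14/2008, call 555-123-4567."))
```

A scheme like this trades precision for recall: anything resembling PHI is masked, which is consistent in spirit with the recall-oriented evaluation reported in the abstract, though the real system's rules are far more extensive.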
Publication Open source model for generating RR intervals in atrial fibrillation and beyond
(BioMed Central, 2007) Lian, Jie; Clifford, Gari D; Müssig, Dirk; Lang, Volker
Background: Realistic modeling of cardiac inter-beat (RR) intervals is highly desirable for basic research in cardiac electrophysiology, clinical management of heart diseases, and developing signal processing tools for ECG analysis. Methods: We present an open source computer model that is capable of generating realistic time series of RR intervals in both physiologic and pathologic conditions. Detailed model structure and the software implementation are described. Results: Examples are provided on how to use this model to generate RR intervals in atrial fibrillation with ventricular pacing, normal sinus rhythm with heart rate variability, and typical atrial flutter with atrioventricular block. The extensibility of the model is also discussed. Conclusion: The present computer model provides a unified platform wherein various types of ventricular rhythm can be simulated. The availability of this open source model promises to support and stimulate future studies.
Publication Artificial Arterial Blood Pressure Artifact Models and an Evaluation of a Robust Blood Pressure and Heart Rate Estimator
(BioMed Central, 2009) Li, Qiao; Mark, Roger Greenwood; Clifford, Gari D
Background: Within the intensive care unit (ICU), arterial blood pressure (ABP) is typically recorded at different (and sometimes uneven) sampling frequencies, and from different sensors, and is often corrupted by different artifacts and noise which are often non-Gaussian, nonlinear and nonstationary. Extracting robust parameters from such signals, and providing confidences in the estimates is therefore difficult and requires an adaptive filtering approach which accounts for artifact types. Methods: Using a large ICU database, and over 6000 hours of simultaneously acquired electrocardiogram (ECG) and ABP waveforms sampled at 125 Hz from a 437 patient subset, we documented six general types of ABP artifact. We describe a new ABP signal quality index (SQI), based upon the combination of two previously reported signal quality measures weighted together. One index measures morphological normality, and the other degradation due to noise. After extracting a 6084-hour subset of clean data using our SQI, we evaluated a new robust tracking algorithm for estimating blood pressure and heart rate (HR) based upon a Kalman Filter (KF) with an update sequence modified by the KF innovation sequence and the value of the SQI. In order to do this, we have created six novel models of different categories of artifacts that we have identified in our ABP waveform data. These artifact models were then injected into clean ABP waveforms in a controlled manner. Clinical blood pressure (systolic, mean and diastolic) estimates were then made from the ABP waveforms for both clean and corrupted data. The mean absolute error for systolic, mean and diastolic blood pressure was then calculated for different levels of artifact pollution to provide estimates of expected errors given a single value of the SQI.
Results: Our artifact models demonstrate that artifact types have differing effects on systolic, diastolic and mean ABP estimates. We show that, for most artifact types, diastolic ABP estimates are less noise-sensitive than mean ABP estimates, which in turn are more robust than systolic ABP estimates. We also show that our SQI can provide error bounds for both HR and ABP estimates. Conclusion: The KF/SQI-fusion method described in this article was shown to provide an accurate estimate of blood pressure and HR derived from the ABP waveform even in the presence of high levels of persistent noise and artifact, and during extreme bradycardia and tachycardia. Differences in error between artifact types, measurement sensors and the quality of the source signal can be factored into physiological estimation using an unbiased adaptive filter, signal innovation and signal quality measures.
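The idea of letting a signal quality index modulate a Kalman filter update can be sketched with a scalar filter. This is a simplified illustration, not the method from the article: the random-walk state model, the parameter values `q` and `r0`, and the rule that inflates measurement noise as `r0 / sqi**2` are all assumptions made for this example.

```python
def kf_sqi_track(measurements, sqis, q=0.1, r0=1.0):
    """Scalar Kalman filter whose measurement noise grows as SQI falls.

    measurements: noisy beat-by-beat estimates (e.g. systolic ABP or HR)
    sqis: signal quality values in (0, 1], one per measurement
    q: assumed process noise variance (random-walk state model)
    r0: assumed measurement noise variance at perfect quality (SQI = 1)
    """
    x = measurements[0]  # initial state estimate
    p = 1.0              # initial estimate variance (assumed)
    estimates = [x]
    for z, sqi in zip(measurements[1:], sqis[1:]):
        p = p + q                     # predict step: variance grows by q
        sqi = max(sqi, 1e-3)          # guard against division by zero
        r = r0 / (sqi * sqi)          # assumed rule: low SQI inflates noise
        k = p / (p + r)               # Kalman gain
        x = x + k * (z - x)           # update using the innovation z - x
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    values = [100.0, 100.0, 200.0, 100.0]   # third sample is an artifact
    quality = [1.0, 1.0, 0.01, 1.0]         # flagged as poor quality
    print(kf_sqi_track(values, quality))
```

When the SQI is low the gain approaches zero, so the filter effectively holds its previous estimate through the artifact instead of following it.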
Publication Cardiac MRI with Concurrent Physiological Monitoring Using MRI-Compatible 12-Lead ECG
(BioMed Central, 2012) Tse, Zion; Dumoulin, Charles; Clifford, Gari D; Oster, Julien; Jerosch-Herold, Michael; Kwong, Raymond; Stevenson, William; Schmidt, Ehud Jeruham
Publication Robust Parameter Extraction for Decision Support Using Multimodal Intensive Care Data
(The Royal Society, 2008) Clifford, Gari D; Long, W.J.; Moody, G.B.; Szolovits, Peter
Digital information flow within the intensive care unit (ICU) continues to grow, with advances in technology and computational biology. Recent developments in the integration and archiving of these data have resulted in new opportunities for data analysis and clinical feedback. New problems associated with ICU databases have also arisen. ICU data are high-dimensional, often sparse, asynchronous and irregularly sampled, as well as being non-stationary, noisy and subject to frequent exogenous perturbations by clinical staff. Relationships between different physiological parameters are usually nonlinear (except within restricted ranges), and the equipment used to measure the observables is often inherently error-prone and biased. The prior probabilities associated with an individual's genetics, pre-existing conditions, lifestyle and ongoing medical treatment all affect prediction and classification accuracy. In this paper, we describe some of the key problems and associated methods that hold promise for robust parameter extraction and data fusion for use in clinical decision support in the ICU.