Person:

Mark, Roger Greenwood

Last Name

Mark

First Name

Roger Greenwood

Search Results

  • Publication

    Automated De-Identification of Free-Text Medical Records

    (BioMed Central, 2008) Neamatullah, Ishna; Douglass, Margaret M; Lehman, Li-wei H; Reisner, Andrew; Villarroel, Mauricio; Long, William J; Szolovits, Peter; Moody, George B; Mark, Roger Greenwood; Clifford, Gari D

    Background: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification.

    Methods: We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus.

    Results: Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.

    Conclusion: We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software out-performs a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized and can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
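
    The general pattern the abstract describes (look-up tables, regular expressions, and a simple context heuristic) can be illustrated with a minimal Python sketch. This is not the authors' Perl package: the name list, title prefixes, patterns, and placeholder tags below are all hypothetical, chosen only to show the idea. The small metric helpers use the standard definitions of recall, precision, and fallout.

```python
import re

# Hypothetical look-up tables; the real package's dictionaries are not reproduced here.
KNOWN_NAMES = {"smith", "jones"}
NAME_PREFIXES = {"dr", "dr.", "mr", "mr.", "mrs", "mrs.", "ms", "ms."}

# Hypothetical patterns for two kinds of PHI.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

PUNCT = ".,;:"


def deidentify(text):
    """Replace dates, phone numbers, and likely names with placeholder tags."""
    text = DATE_RE.sub("[**DATE**]", text)
    text = PHONE_RE.sub("[**PHONE**]", text)
    out = []
    prev = ""
    for tok in text.split():
        core = tok.strip(PUNCT).lower()
        # Dictionary hit, or heuristic: any token following a title is treated as a name.
        if core in KNOWN_NAMES or prev.lower() in NAME_PREFIXES:
            trailing = tok[len(tok.rstrip(PUNCT)):]  # keep trailing punctuation
            out.append("[**NAME**]" + trailing)
        else:
            out.append(tok)
        prev = tok
    return " ".join(out)


def recall(tp, fn):
    """Fraction of true PHI instances that were flagged."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of flagged items that were truly PHI."""
    return tp / (tp + fp)


def fallout(fp, tn):
    """Fraction of non-PHI items that were wrongly flagged."""
    return fp / (fp + tn)


print(deidentify("Pt seen by Dr. Adams on 3/14/2007, call 617-555-0100."))
```

    The title heuristic deliberately over-flags (any word after "Dr." is tagged), which trades precision for recall, the same direction of trade-off the reported figures show.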

  • Publication

    Artificial Arterial Blood Pressure Artifact Models and an Evaluation of a Robust Blood Pressure and Heart Rate Estimator

    (BioMed Central, 2009) Li, Qiao; Mark, Roger Greenwood; Clifford, Gari D

    Background: Within the intensive care unit (ICU), arterial blood pressure (ABP) is typically recorded at different (and sometimes uneven) sampling frequencies, and from different sensors, and is often corrupted by different artifacts and noise which are often non-Gaussian, nonlinear and nonstationary. Extracting robust parameters from such signals, and providing confidences in the estimates is therefore difficult and requires an adaptive filtering approach which accounts for artifact types.

    Methods: Using a large ICU database, and over 6000 hours of simultaneously acquired electrocardiogram (ECG) and ABP waveforms sampled at 125 Hz from a 437 patient subset, we documented six general types of ABP artifact. We describe a new ABP signal quality index (SQI), based upon the combination of two previously reported signal quality measures weighted together. One index measures morphological normality, and the other degradation due to noise. After extracting a 6084-hour subset of clean data using our SQI, we evaluated a new robust tracking algorithm for estimating blood pressure and heart rate (HR) based upon a Kalman Filter (KF) with an update sequence modified by the KF innovation sequence and the value of the SQI. In order to do this, we have created six novel models of different categories of artifacts that we have identified in our ABP waveform data. These artifact models were then injected into clean ABP waveforms in a controlled manner. Clinical blood pressure (systolic, mean and diastolic) estimates were then made from the ABP waveforms for both clean and corrupted data. The mean absolute error for systolic, mean and diastolic blood pressure was then calculated for different levels of artifact pollution to provide estimates of expected errors given a single value of the SQI.

    Results: Our artifact models demonstrate that artifact types have differing effects on systolic, diastolic and mean ABP estimates. We show that, for most artifact types, diastolic ABP estimates are less noise-sensitive than mean ABP estimates, which in turn are more robust than systolic ABP estimates. We also show that our SQI can provide error bounds for both HR and ABP estimates.

    Conclusion: The KF/SQI-fusion method described in this article was shown to provide an accurate estimate of blood pressure and HR derived from the ABP waveform even in the presence of high levels of persistent noise and artifact, and during extreme bradycardia and tachycardia. Differences in error between artifact types, measurement sensors and the quality of the source signal can be factored into physiological estimation using an unbiased adaptive filter, signal innovation and signal quality measures.
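
    The idea of letting a signal quality index modulate a Kalman filter update can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's algorithm: the state model is a scalar random walk, and the rule that inflates measurement noise as `r0 * exp(1 / sqi**2 - 1)` is one plausible choice assumed here. The function name `sqi_kalman` and the parameters `q`, `r0`, and `sqi_floor` are hypothetical.

```python
import math


def sqi_kalman(measurements, sqis, q=0.1, r0=1.0, sqi_floor=0.1):
    """Scalar random-walk Kalman filter whose measurement noise grows as SQI falls.

    measurements: per-beat estimates (e.g. systolic pressure in mmHg)
    sqis: signal quality in (0, 1], where 1 means fully trusted
    """
    x = measurements[0]  # initial state estimate
    p = 1.0              # initial estimate variance (assumed)
    estimates = []
    for z, sqi in zip(measurements, sqis):
        # Predict: a random-walk state keeps its value, uncertainty grows by q.
        p = p + q
        # Assumed rule: inflate measurement noise exponentially for low SQI.
        sqi = max(sqi, sqi_floor)
        r = r0 * math.exp(1.0 / sqi ** 2 - 1.0)
        # Update: the gain shrinks toward zero when r is large.
        k = p / (p + r)
        innovation = z - x
        x = x + k * innovation
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# A spike of 200 mmHg arrives with poor signal quality and is largely ignored.
readings = [80.0, 80.0, 200.0, 80.0]
with_sqi = sqi_kalman(readings, [1.0, 1.0, 0.1, 1.0])
without_sqi = sqi_kalman(readings, [1.0, 1.0, 1.0, 1.0])
print(with_sqi[2], without_sqi[2])
```

    With full trust in every sample the estimate is pulled well toward the spike, while the low-SQI version holds near the previous value, which is the qualitative behaviour the abstract attributes to the KF/SQI fusion.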

  • Publication

    Discovering Shared Dynamics in Physiological Signals: Application to Patient Monitoring in ICU

    (Institute of Electrical and Electronics Engineers, 2012) Lehman, Li-wei H.; Nemati, Shamim; Adams, Ryan Prescott; Mark, Roger Greenwood

    Modern clinical databases include time series of vital signs, which are often recorded continuously during a hospital stay. Over several days, these recordings may yield many thousands of samples. In this work, we explore the feasibility of characterizing the “state of health” of a patient using the physiological dynamics inferred from these time series. The ultimate objective is to assist clinicians in allocating resources to high-risk patients. We hypothesize that “similar” patients exhibit similar dynamics and the properties and duration of these states are indicative of health and disease. We use Bayesian nonparametric machine learning methods to discover shared dynamics in patients' blood pressure (BP) time series. Each such “dynamic” captures a distinct pattern of evolution of BP and is possibly recurrent within the same time series and shared across multiple patients. Next, we examine the utility of this low-dimensional representation of BP time series for predicting mortality in patients. Our results are based on an intensive care unit (ICU) cohort of 480 patients (with 16% mortality) and indicate that the dynamics of time series of vital signs can be an independent useful predictor of outcome in ICU.