Person:

Liao, Katherine

Search Results

  • Publication

    Genetics of rheumatoid arthritis contributes to biology and drug discovery

    (2013) Okada, Yukinori; Wu, Di; Trynka, Gosia; Raj, Towfique; Terao, Chikashi; Ikari, Katsunori; Kochi, Yuta; Ohmura, Koichiro; Suzuki, Akari; Yoshida, Shinji; Graham, Robert R.; Manoharan, Arun; Ortmann, Ward; Bhangale, Tushar; Denny, Joshua C.; Carroll, Robert J.; Eyler, Anne E.; Greenberg, Jeffrey D.; Kremer, Joel M.; Pappas, Dimitrios A.; Jiang, Lei; Yin, Jian; Ye, Lingying; Su, Ding-Feng; Yang, Jian; Xie, Gang; Keystone, Ed; Westra, Harm-Jan; Esko, Tõnu; Metspalu, Andres; Zhou, Xuezhong; Gupta, Namrata; Mirel, Daniel; Stahl, Eli A.; Diogo, Dorothée; Cui, Jing; Liao, Katherine; Guo, Michael; Myouzen, Keiko; Kawaguchi, Takahisa; Coenen, Marieke J.H.; van Riel, Piet L.C.M.; van de Laar, Mart A.F.J.; Guchelaar, Henk-Jan; Huizinga, Tom W.J.; Dieudé, Philippe; Mariette, Xavier; Bridges, S. Louis; Zhernakova, Alexandra; Toes, Rene E.M.; Tak, Paul P.; Miceli-Richard, Corinne; Bang, So-Young; Lee, Hye-Soon; Martin, Javier; Gonzalez-Gay, Miguel A.; Rodriguez-Rodriguez, Luis; Rantapää-Dahlqvist, Solbritt; Ärlestig, Lisbeth; Choi, Hyon; Kamatani, Yoichiro; Galan, Pilar; Lathrop, Mark; Eyre, Steve; Bowes, John; Barton, Anne; de Vries, Niek; Moreland, Larry W.; Criswell, Lindsey A.; Karlson, Elizabeth; Taniguchi, Atsuo; Yamada, Ryo; Kubo, Michiaki; Liu, Jun; Bae, Sang-Cheol; Worthington, Jane; Padyukov, Leonid; Klareskog, Lars; Gregersen, Peter K.; Raychaudhuri, Soumya; Stranger, Barbara E.; De Jager, Philip; Franke, Lude; Visscher, Peter M.; Brown, Matthew A.; Yamanaka, Hisashi; Mimori, Tsuneyo; Takahashi, Atsushi; Xu, Huji; Behrens, Timothy W.; Siminovitch, Katherine A.; Momohara, Shigeki; Matsuda, Fumihiko; Yamamoto, Kazuhiko; Plenge, Robert M.

    A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological datasets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA) [1]. Here, we performed a genome-wide association study (GWAS) meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ~10 million single nucleotide polymorphisms (SNPs). We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 [2–4]. We devised an in-silico pipeline using established bioinformatics methods based on functional annotation [5], cis-acting expression quantitative trait loci (cis-eQTL) [6], and pathway analyses [7–9] – as well as novel methods based on genetic overlap with human primary immunodeficiency (PID), hematological cancer somatic mutations and knock-out mouse phenotypes – to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.

  • Publication

    Modeling Disease Severity in Multiple Sclerosis Using Electronic Health Records

    (Public Library of Science, 2013) Xia, Zongqi; Secor, Elizabeth; Chibnik, Lori; Bove, Riley; Cheng, Suchun; Chitnis, Tanuja; Cagan, Andrew; Gainer, Vivian S.; Chen, Pei J.; Liao, Katherine; Shaw, Stanley; Ananthakrishnan, Ashwin; Szolovits, Peter; Weiner, Howard; Karlson, Elizabeth; Murphy, Shawn; Savova, Guergana; Cai, Tianxi; Churchill, Susanne E.; Plenge, Robert M.; Kohane, Isaac; De Jager, Philip

    Objective: To optimally leverage the scalability and unique features of the electronic health records (EHR) for research that would ultimately improve patient care, we need to accurately identify patients and extract clinically meaningful measures. Using multiple sclerosis (MS) as a proof of principle, we showcased how to leverage routinely collected EHR data to identify patients with a complex neurological disorder and derive an important surrogate measure of disease severity heretofore only available in research settings. Methods: In a cross-sectional observational study, 5,495 MS patients were identified from the EHR systems of two major referral hospitals using an algorithm that includes codified and narrative information extracted using natural language processing. In the subset of patients who receive neurological care at an MS Center where disease measures have been collected, we used routinely collected EHR data to extract two aggregate indicators of MS severity of clinical relevance: multiple sclerosis severity score (MSSS) and brain parenchymal fraction (BPF, a measure of whole brain volume). Results: The EHR algorithm that identifies MS patients has an area under the curve of 0.958, 83% sensitivity, 92% positive predictive value, and 89% negative predictive value when a 95% specificity threshold is used. The correlation between EHR-derived and true MSSS has a mean R² = 0.38±0.05, and that between EHR-derived and true BPF has a mean R² = 0.22±0.08. To illustrate its clinical relevance, derived MSSS captures the expected difference in disease severity between relapsing-remitting and progressive MS patients after adjusting for sex, age of symptom onset and disease duration (p = 1.56×10⁻¹²). Conclusion: Incorporation of sophisticated codified and narrative EHR data accurately identifies MS patients and provides estimation of a well-accepted indicator of MS severity that is widely used in research settings but not part of the routine medical records. Similar approaches could be applied to other complex neurological disorders.

  • Publication

    TYK2 Protein-Coding Variants Protect against Rheumatoid Arthritis and Autoimmunity, with No Evidence of Major Pleiotropic Effects on Non-Autoimmune Complex Traits

    (Public Library of Science, 2015) Diogo, Dorothée; Bastarache, Lisa; Liao, Katherine; Graham, Robert R.; Fulton, Robert S.; Greenberg, Jeffrey D.; Eyre, Steve; Bowes, John; Cui, Jing; Lee, Annette; Pappas, Dimitrios A.; Kremer, Joel M.; Barton, Anne; Coenen, Marieke J. H.; Franke, Barbara; Kiemeney, Lambertus A.; Mariette, Xavier; Richard-Miceli, Corrine; Canhão, Helena; Fonseca, João E.; de Vries, Niek; Tak, Paul P.; Crusius, J. Bart A.; Nurmohamed, Michael T.; Kurreeman, Fina; Mikuls, Ted R.; Okada, Yukinori; Stahl, Eli A.; Larson, David E.; Deluca, Tracie L.; O'Laughlin, Michelle; Fronick, Catrina C.; Fulton, Lucinda L.; Kosoy, Roman; Ransom, Michael; Bhangale, Tushar R.; Ortmann, Ward; Cagan, Andrew; Gainer, Vivian; Karlson, Elizabeth; Kohane, Isaac; Murphy, Shawn N.; Martin, Javier; Zhernakova, Alexandra; Klareskog, Lars; Padyukov, Leonid; Worthington, Jane; Mardis, Elaine R.; Seldin, Michael F.; Gregersen, Peter K.; Behrens, Timothy; Raychaudhuri, Soumya; Denny, Joshua C.; Plenge, Robert M.

    Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3×10⁻²¹), A928V (rs35018800, OR = 0.53, P = 1.2×10⁻⁹), and I684S (rs12720356, OR = 0.86, P = 4.6×10⁻⁷). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, P_omnibus = 6×10⁻¹⁸), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P_omnibus = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.

  • Publication

    Phenome‐Wide Association Study of Autoantibodies to Citrullinated and Noncitrullinated Epitopes in Rheumatoid Arthritis

    (John Wiley and Sons Inc., 2017) Liao, Katherine; Sparks, Jeffrey; Hejblum, Boris P.; Kuo, I‐Hsin; Cui, Jing; Lahey, Lauren J.; Cagan, Andrew; Gainer, Vivian S.; Liu, Weidong; Cai, T. Tony; Sokolove, Jeremy; Cai, Tianxi

    Objective: Patients with rheumatoid arthritis (RA) develop autoantibodies against a spectrum of antigens, but the clinical significance of these autoantibodies is unclear. Using a phenome‐wide association study (PheWAS) approach, we examined the association between autoantibodies and clinical subphenotypes of RA. Methods: This study was conducted in a cohort of RA patients identified from the electronic medical records (EMRs) of 2 tertiary care centers. Using a published multiplex bead assay, we measured 36 autoantibodies targeting epitopes implicated in RA. We extracted all International Classification of Diseases, Ninth Revision (ICD‐9) codes for each subject and grouped them into disease categories (PheWAS codes), using a published method. We tested for the association of each autoantibody (grouped by the targeted protein) with PheWAS codes. For significant associations (at a false discovery rate [FDR] of ≤0.1), we reviewed the medical records of 50 patients with each PheWAS code to determine positive predictive values (PPVs). Results: We studied 1,006 RA patients; the mean ± SD age of the patients was 61.0 ± 12.9 years, and 79.0% were female. A total of 3,568 unique ICD‐9 codes were grouped into 625 PheWAS codes; the 206 PheWAS codes with a prevalence of ≥3% were studied. Using the PheWAS method, we identified 24 significant associations of autoantibodies to epitopes at an FDR of ≤0.1. The associations that were strongest and had the highest PPV for the PheWAS code were those between autoantibodies against fibronectin and obesity (P = 6.1 × 10⁻⁴, PPV 100%) and between autoantibodies against fibrinogen and pneumonopathy (P = 2.7 × 10⁻⁴, PPV 96%). Pneumonopathy codes included diagnoses for cryptogenic organizing pneumonia and obliterative bronchiolitis. Conclusion: We demonstrated application of a bioinformatics method, the PheWAS, to screen for the clinical significance of RA‐related autoantibodies. Using the PheWAS approach, we identified potentially significant links between variations in the levels of autoantibodies and comorbidities of interest in RA.

  • Publication

    Development of phenotype algorithms using electronic medical records and incorporating natural language processing

    (BMJ Publishing Group Ltd., 2015) Liao, Katherine; Cai, Tianxi; Savova, Guergana K; Murphy, Shawn; Karlson, Elizabeth; Ananthakrishnan, Ashwin; Gainer, Vivian S; Shaw, Stanley; Xia, Zongqi; Szolovits, Peter; Churchill, Susanne; Kohane, Isaac

    Electronic medical records are emerging as a major source of data for clinical and translational research studies, although phenotypes of interest need to be accurately defined first. This article provides an overview of how to develop a phenotype algorithm from electronic medical records, incorporating modern informatics and biostatistics methods.

  • Publication

    Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

    (Public Library of Science, 2015) Liao, Katherine; Ananthakrishnan, Ashwin; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley; Karlson, Elizabeth; Cai, Tianxi

    Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results: We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD-9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.

  • Publication

    High-Throughput Phenotyping With Electronic Medical Record Data Using a Common Semi-Supervised Approach (PheCAP)

    (Springer Science and Business Media LLC, 2019-11-20) Zhang, Yichi; Cai, Tianrun; Yu, Sheng; Cho, Kelly; Hong, Chuan; Sun, Jiehuan; Huang, Jie; Xia, Zongqi; Castro, Victor; Gagnon, David; Savova, Guergana; Churchill, Susanne; Gaziano, John; Kohane, Isaac; Cai, Tianxi; Ho, Yuk-Lam; Ananthakrishnan, Ashwin; Shaw, Stanley; Gainer, Vivian; Link, Nicholas; Honerlaw, Jacqueline; Huong, Sicong; Karlson, Elizabeth; Plenge, Robert; Szolovits, Peter; O'Donnell, Christopher; Murphy, Shawn; Liao, Katherine

    Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping using EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures reducing the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 days if all data are available; however, the timing is largely dependent on the chart review step which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes/no).

  • Publication

    Improving Case Definition of Crohn’s Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing

    (Oxford University Press (OUP), 2013-06) Ananthakrishnan, Ashwin; Cai, Tianxi; Savova, Guergana; Cheng, Su-Chun; Chen, Pei; Guzman, Raul; Gainer, Vivian S.; Murphy, Shawn; Szolovits, Peter; Xia, Zongqi; Shaw, Stanley; Churchill, Susanne; Karlson, Elizabeth; Kohane, Isaac; Plenge, Robert M.; Liao, Katherine

    Introduction: Prior studies identifying patients with inflammatory bowel disease (IBD) utilizing administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record (EMR) based model for classification of IBD leveraging the combination of codified data and information from clinical text notes using natural language processing (NLP).

    Methods: Using the EMR of 2 large academic centers, we created data marts for Crohn’s disease (CD) and ulcerative colitis (UC) comprising patients with ≥ 1 ICD-9 code for each disease. We utilized codified (i.e. ICD-9 codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.

    Results: We confirmed 399 (67%) CD cases in the CD training set and 378 (63%) UC cases in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve (AUC) for CD 0.95; UC 0.94) than models utilizing only disease ICD-9 codes (AUC 0.89 for CD; 0.86 for UC). Addition of NLP narrative terms to our final model resulted in classification of 6–12% more subjects with the same accuracy.

    Conclusion: Inclusion of narrative concepts identified using NLP improves the accuracy of EMR case-definition for CD and UC while simultaneously identifying more subjects compared to models using codified data alone.

  • Publication

    Normalization of Plasma 25-Hydroxy Vitamin D Is Associated with Reduced Risk of Surgery in Crohn’s Disease

    (Oxford University Press (OUP), 2013-08-01) Ananthakrishnan, Ashwin; Cagan, Andrew; Gainer, Vivian S.; Cai, Tianxi; Cheng, Su-Chun; Savova, Guergana; Chen, Pei; Szolovits, Peter; Xia, Zongqi; De Jager, Philip; Shaw, Stanley; Churchill, Susanne; Karlson, Elizabeth; Kohane, Isaac; Plenge, Robert; Murphy, Shawn; Liao, Katherine

    Introduction: Vitamin D may have an immunological role in Crohn’s disease (CD) and ulcerative colitis (UC). Retrospective studies suggested a weak association between vitamin D status and disease activity but have significant limitations.

    Methods: Using a multi-institution inflammatory bowel disease (IBD) cohort, we identified all CD and UC patients who had at least one measured plasma 25-hydroxy vitamin D [25(OH)D]. Plasma 25(OH)D was considered sufficient at levels ≥ 30 ng/mL. Logistic regression models adjusting for potential confounders were used to identify the impact of measured plasma 25(OH)D on subsequent risk of IBD-related surgery or hospitalization. In a subset of patients where multiple measures of 25(OH)D were available, we examined the impact of normalization of vitamin D status on study outcomes.

    Results: Our study included 3,217 patients (55% CD, mean age 49 yrs). The median lowest plasma 25(OH)D was 26 ng/mL (IQR 17–35 ng/mL). In CD, on multivariable analysis, plasma 25(OH)D < 20 ng/mL was associated with an increased risk of surgery (OR 1.76, 95% CI 1.24 – 2.51) and IBD-related hospitalization (OR 2.07, 95% CI 1.59 – 2.68) compared to those with 25(OH)D ≥ 30 ng/mL. Similar estimates were also seen for UC. Furthermore, CD patients who had initial levels < 30 ng/mL but subsequently normalized their 25(OH)D had a reduced likelihood of surgery (OR 0.56, 95% CI 0.32 – 0.98) compared to those who remained deficient.

    Conclusion: Low plasma 25(OH)D is associated with increased risk of surgery and hospitalizations in both CD and UC, and normalization of 25(OH)D status is associated with a reduction in the risk of CD-related surgery.

  • Publication

    Association between inflammation and systolic blood pressure in RA compared to patients without RA

    (BioMed Central, 2018) Yu, Zhi; Kim, Seoyoung; Vanni, Kathleen; Huang, Jie; Desai, Rishi; Murphy, Shawn; Solomon, Daniel; Liao, Katherine

    Background: The relationship between inflammation and blood pressure (BP) has been studied mainly in the general population. In this study, we examined the association between inflammation and BP across a broader range of inflammation observed in rheumatoid arthritis (RA) and non-RA outpatients. Methods: We studied subjects from a tertiary care outpatient population with C-reactive protein (CRP) and BP measured on the same date in 2009–2010; RA outpatients were identified using a validated algorithm. General population data were obtained from the National Health and Nutrition Examination Survey (NHANES) as comparison. To study the cross-sectional association between CRP and BP in the three groups, we constructed a generalized additive model. Longitudinal association between CRP and BP was examined using a repeated-measures linear mixed-effects model in RA outpatients with significant change in inflammation at two consecutive time points. Results: We studied 24,325 subjects from the outpatient population, of whom 1811 had RA, and 5561 were from NHANES. In RA outpatients, we observed a positive relationship between CRP and systolic BP (SBP) at CRP < 6 mg/L and an inverse association at CRP ≥ 6 mg/L. A similar inverse U-shaped relationship was observed in non-RA outpatients. In NHANES, we observed a positive relationship between CRP and SBP as demonstrated by previous studies. Longitudinal analysis in RA showed that every 10 mg/L increase in CRP was associated with a 0.38 mmHg reduction in SBP. Conclusions: Across a broad range of CRP observed in RA and non-RA outpatients, we found an inverse U-shaped relationship between CRP and SBP, highlighting a relationship not previously observed when studying the general population. Electronic supplementary material The online version of this article (10.1186/s13075-018-1597-9) contains supplementary material, which is available to authorized users.