Person: Karczewski, Konrad
Publication Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects
(Nature Publishing Group, 2016) Zou, James; Valiant, Gregory; Valiant, Paul; Karczewski, Konrad; Chan, Siu On; Samocha, Kaitlin E.; Lek, Monkol; Sunyaev, Shamil; Daly, Mark; MacArthur, Daniel
As new proposals aim to sequence ever-larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in the current cohorts. Our results quantify the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.
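The extrapolation step described above can be illustrated with a short sketch. It assumes the frequency distribution of variants is already known (estimating that distribution from observed counts is what UnseenEst does, and that part is not reproduced here) and that alleles are sampled independently, two chromosomes per individual. Under those assumptions a variant with allele frequency p is seen at least once among 2N chromosomes with probability 1 - (1 - p)^(2N), and summing over the distribution gives the expected number of variants captured. The function names and the histogram are hypothetical, for illustration only.

```python
# Sketch (NOT UnseenEst): expected number of variants observed as a cohort grows,
# ASSUMING the allele-frequency histogram is already known and that alleles are
# sampled independently (2 chromosomes per individual). Names are hypothetical.
from typing import Dict

Histogram = Dict[float, float]  # allele frequency p -> number of variants at that p


def expected_observed(histogram: Histogram, n_individuals: int) -> float:
    """Expected count of distinct variants seen at least once in n_individuals."""
    chromosomes = 2 * n_individuals
    # A variant at frequency p is missed by all 2N chromosomes with probability (1 - p)^(2N).
    return sum(count * (1.0 - (1.0 - p) ** chromosomes) for p, count in histogram.items())


def fraction_observed(histogram: Histogram, n_individuals: int) -> float:
    """Expected fraction of all variants in the histogram that would be captured."""
    total = sum(histogram.values())
    return expected_observed(histogram, n_individuals) / total


if __name__ == "__main__":
    # Made-up histogram: mostly ultra-rare variants, a few common ones (illustrative only).
    toy = {1e-7: 5_000_000, 1e-6: 500_000, 1e-4: 50_000, 0.05: 5_000}
    for n in (60_706, 500_000):
        print(f"N={n:>7}: {fraction_observed(toy, n):.1%} of variants expected to be seen")
```

The numbers printed depend entirely on the made-up histogram; the point is only that, once a frequency distribution is available, projecting discovery to larger cohorts is a direct calculation.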
Publication The ExAC browser: displaying reference data information from over 60 000 exomes
(Oxford University Press, 2017) Karczewski, Konrad; Weisburd, Ben; Thomas, Brett; Solomonson, Matthew; Ruderfer, Douglas M.; Kavanagh, David; Hamamsy, Tymor; Lek, Monkol; Samocha, Kaitlin E.; Cummings, Beryl; Birnbaum, Daniel; Daly, Mark; MacArthur, Daniel
Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 individuals in the Exome Aggregation Consortium (ExAC). The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, we provide a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. This browser is open-source, freely available at http://exac.broadinstitute.org, and has already been used extensively by clinical laboratories worldwide.
Publication Ten Simple Rules to Enable Multi-site Collaborations through Data Sharing
(Public Library of Science, 2017) Boland, Mary Regina; Karczewski, Konrad; Tatonetti, Nicholas P.
Publication Patterns of genic intolerance of rare copy number variation in 59,898 human exomes
(2016) Ruderfer, Douglas M.; Hamamsy, Tymor; Lek, Monkol; Karczewski, Konrad; Kavanagh, David; Samocha, Kaitlin E.; Daly, Mark; MacArthur, Daniel; Fromer, Menachem; Purcell, Shaun M.
Copy number variation (CNV) impacting protein-coding genes contributes significantly to human diversity and disease. Here we characterized the rates and properties of rare genic CNV (<0.5% frequency) in exome-sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC). On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes impacted by CNVs were more intolerant than in controls. ExAC CNV data constitutes a critical component of an integrated database spanning the spectrum of human genetic variation, aiding the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
Publication Analysis of protein-coding genetic variation in 60,706 humans
(2016) Lek, Monkol; Karczewski, Konrad; Minikel, Eric; Samocha, Kaitlin E.; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark; MacArthur, Daniel
Summary: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. We describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence.
We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
Publication Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel
(Nature Pub. Group, 2015) Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L.; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Al Turki, Saeed; Amuzu, Antoinette; Anderson, Carl A.; Anney, Richard; Antony, Dinu; Artigas, María Soler; Ayub, Muhammad; Bala, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Benn, Marianne; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick F.; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Pablo Casas, Juan; Chambers, John C.; Charlton, Ruth; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebahattin; Clapham, Peter; Clement, Gail; Coates, Guy; Cocca, Massimiliano; Collier, David A.; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day, Ian N. M.; Day-Williams, Aaron; Dedoussis, George; Down, Thomas; Du, Yuanping; van Duijn, Cornelia M.; Dunham, Ian; Edkins, Sarah; Ekong, Rosemary; Ellis, Peter; Evans, David M.; Farooqi, I. Sadaf; Fitzpatrick, David R.; Flicek, Paul; Floyd, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Gasparini, Paolo; Gaunt, Tom R.; Geihs, Matthias; Geschwind, Daniel; Greenwood, Celia; Griffin, Heather; Grozeva, Detelina; Guo, Xiaosen; Guo, Xueqin; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey E.; Holmans, Peter; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Iotchkova, Valentina; Isaacs, Aaron; Jackson, David K.; Jamshidi, Yalda; Johnson, Jon; Joyce, Chris; Karczewski, Konrad; Kaye, Jane; Keane, Thomas; Kemp, John P.; Kennedy, Karen; Kent, Alastair; Keogh, Julia; Khawaja, Farrah; Kleber, Marcus E.; van Kogelenberg, Margriet; Kolb-Kokocinski, Anja; Kooner, Jaspal S.; Lachance, Genevieve; Langenberg, Claudia; Langford, Cordelia; Lawson, Daniel; Lee, Irene; van Leeuwen, Elisabeth M.; Lek, Monkol; Li, Rui; Li, Yingrui; Liang, Jieqin; Lin, Hong; Liu, Ryan; Lönnqvist, Jouko; Lopes, Luis R.; Lopes, Margarida; Luan, Jian'an; MacArthur, Daniel; Mangino, Massimo; Marenne, Gaëlle; März, Winfried; Maslen, John; Matchan, Angela; Mathieson, Iain; McGuffin, Peter; McIntosh, Andrew M.; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Migone, Nicola; Mitchison, Hannah M.; Moayyeri, Alireza; Morris, James; Morris, Richard; Muddyman, Dawn; Muntoni, Francesco; Nordestgaard, Børge G.; Northstone, Kate; O'Donovan, Michael C.; O'Rahilly, Stephen; Onoufriadis, Alexandros; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Payne, Stewart J.; Perry, John R. B.; Pietilainen, Olli; Plagnol, Vincent; Pollitt, Rebecca C.; Povey, Sue; Quail, Michael A.; Quaye, Lydia; Raymond, Lucy; Rehnström, Karola; Ridout, Cheryl K.; Ring, Susan; Ritchie, Graham R. S.; Roberts, Nicola; Robinson, Rachel L.; Savage, David B.; Scambler, Peter; Schiffels, Stephan; Schmidts, Miriam; Schoenmakers, Nadia; Scott, Richard H.; Scott, Robert A.; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shaw, Adam; Shihab, Hashem A.; Shin, So-Youn; Skuse, David; Small, Kerrin S.; Smee, Carol; Smith, George Davey; Southam, Lorraine; Spasic-Boskovic, Olivera; Spector, Timothy D.; St Clair, David; St Pourcain, Beate; Stalker, Jim; Stevens, Elizabeth; Sun, Jianping; Surdulescu, Gabriela; Suvisaari, Jaana; Syrris, Petros; Tachmazidou, Ioanna; Taylor, Rohan; Tian, Jing; Tobin, Martin D.; Toniolo, Daniela; Traglia, Michela; Tybjaerg-Hansen, Anne; Valdes, Ana M.; Vandersteen, Anthony M.; Varbo, Anette; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T. R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Eleanor; Whincup, Peter; Whyte, Tamieka; Williams, Hywel J.; Williamson, Kathleen A.; Wilson, Crispian; Wilson, Scott G.; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zaza, Gianluigi; Zeggini, Eleftheria; Zhang, Feng; Zhang, Pingbo; Zhang, Weihua; Gambaro, Giovanni; Richards, J. Brent; Durbin, Richard; Timpson, Nicholas J.; Marchini, Jonathan; Soranzo, Nicole
Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling.
We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
Publication Efficient genotype compression and analysis of large genetic variation datasets
(2015) Layer, Ryan M.; Kindlon, Neil; Karczewski, Konrad; Quinlan, Aaron R.
Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443-fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes.
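The general idea of answering genotype queries over many samples with bitwise operations on a genotype index can be sketched in a few lines. This is a toy illustration under stated assumptions, not GQT's actual data layout, compression scheme, or API: each variant gets one bit vector per genotype code (Python ints serve as arbitrary-length bit vectors), and a sample subset such as a set of cases is itself a bit mask. All function names are hypothetical.

```python
# Toy bitmap genotype index illustrating bitwise genotype queries.
# NOT GQT's layout or compression; all names are hypothetical.
from typing import Dict, Iterable, List

HOM_REF, HET, HOM_ALT, MISSING = 0, 1, 2, 3


def build_index(genotypes: List[List[int]]) -> List[Dict[int, int]]:
    """genotypes[v][s] is the genotype code of sample s at variant v.

    Returns, per variant, one bit vector per genotype code; bit s is set
    when sample s has that genotype.
    """
    index = []
    for row in genotypes:
        bits = {code: 0 for code in (HOM_REF, HET, HOM_ALT, MISSING)}
        for sample, code in enumerate(row):
            bits[code] |= 1 << sample
        index.append(bits)
    return index


def sample_mask(samples: Iterable[int]) -> int:
    """Bit vector selecting a subset of samples (e.g. cases, or one family)."""
    mask = 0
    for s in samples:
        mask |= 1 << s
    return mask


def carrier_count(index: List[Dict[int, int]], variant: int, mask: int) -> int:
    """Number of selected samples carrying at least one alternate allele."""
    carriers = index[variant][HET] | index[variant][HOM_ALT]
    return bin(carriers & mask).count("1")


def variants_carried_by_all(index: List[Dict[int, int]], mask: int) -> List[int]:
    """Variants at which every selected sample is a carrier."""
    return [v for v, bits in enumerate(index)
            if ((bits[HET] | bits[HOM_ALT]) & mask) == mask]


if __name__ == "__main__":
    gts = [[0, 1, 2, 0],   # variant 0
           [1, 1, 0, 3],   # variant 1 (sample 3 missing)
           [2, 2, 2, 2]]   # variant 2
    idx = build_index(gts)
    cases = sample_mask([0, 1])
    print(carrier_count(idx, 0, cases))         # → 1
    print(variants_carried_by_all(idx, cases))  # → [1, 2]
```

Once the index is built, a query over a sample subset touches only a few machine-word operations per variant rather than re-parsing genotype text, which is the kind of saving a genotype index aims for; a real tool would additionally compress the bit vectors.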
Publication Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies
(BioMed Central, 2015) Li, Yun R.; van Setten, Jessica; Verma, Shefali S.; Lu, Yontao; Holmes, Michael V.; Gao, Hui; Lek, Monkol; Nair, Nikhil; Chandrupatla, Hareesh; Chang, Baoli; Karczewski, Konrad; Wong, Chanel; Mohebnasab, Maede; Mukhtar, Eyas; Phillips, Randy; Tragante, Vinicius; Hou, Cuiping; Steel, Laura; Lee, Takesha; Garifallou, James; Guettouche, Toumy; Cao, Hongzhi; Guan, Weihua; Himes, Aubree; van Houten, Jacob; Pasquier, Andrew; Yu, Reina; Carrigan, Elena; Miller, Michael B.; Schladt, David; Akdere, Abdullah; Gonzalez, Ana; Llyod, Kelsey M.; McGinn, Daniel; Gangasani, Abhinav; Michaud, Zach; Colasacco, Abigail; Snyder, James; Thomas, Kelly; Wang, Tiancheng; Wu, Baolin; Alzahrani, Alhusain J.; Al-Ali, Amein K.; Al-Muhanna, Fahad A.; Al-Rubaish, Abdullah M.; Al-Mueilo, Samir; Monos, Dimitri S.; Murphy, Barbara; Olthoff, Kim M.; Wijmenga, Cisca; Webster, Teresa; Kamoun, Malek; Balasubramanian, Suganthi; Lanktree, Matthew B.; Oetting, William S.; Garcia-Pavia, Pablo; MacArthur, Daniel; de Bakker, Paul I W; Hakonarson, Hakon; Birdwell, Kelly A.; Jacobson, Pamala A.; Ritchie, Marylyn D.; Asselbergs, Folkert W.; Israni, Ajay K.; Shaked, Abraham; Keating, Brendan J.
Background: In addition to HLA genetic incompatibility, non-HLA differences between transplant donors and recipients that lead to allograft rejection are now becoming evident. We aimed to create a unique genome-wide platform to facilitate genomic research in transplant-related studies. We designed a genome-wide genotyping tool based on the most recent human genomic reference datasets, and included customized content for known and potentially relevant metabolic and pharmacological loci in transplantation.
Methods: We describe here the design and implementation of a customized genome-wide genotyping array, the ‘TxArray’, comprising approximately 782,000 markers with tailored content for deeper capture of variants across HLA, KIR, pharmacogenomic, and metabolic loci important in transplantation. To test concordance and genotyping quality, we genotyped 85 HapMap samples on the array, including eight trios.
Results: We show low Mendelian error rates and high concordance rates for HapMap samples (average parent-parent-child heritability of 0.997, and concordance of 0.996). We performed genotype imputation across autosomal regions, masking directly genotyped SNPs to assess imputation accuracy, and report an accuracy of >0.962 for directly genotyped SNPs. We demonstrate much higher capture of the natural killer cell immunoglobulin-like receptor (KIR) region versus comparable platforms. Overall, we show that the genotyping quality and coverage of the TxArray is very high when compared to reference samples and to other genome-wide genotyping platforms.
Conclusions: We have designed a comprehensive genome-wide genotyping tool which enables accurate association testing and imputation of ungenotyped SNPs, facilitating powerful and cost-effective large-scale genotyping of transplant-related studies.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0211-x) contains supplementary material, which is available to authorized users.
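The two quality-control measures named in this abstract, genotype concordance against reference calls and Mendelian consistency within trios, can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: sites are autosomal and biallelic, genotypes are coded as alternate-allele counts (0, 1, 2), and None marks a missing call, which is excluded from both metrics. The function names are hypothetical, and the sketch does not reproduce the "heritability" figure quoted in the abstract.

```python
# Sketch of two QC metrics under stated assumptions: autosomal, biallelic sites;
# genotypes coded as alternate-allele counts (0, 1, 2); None = missing call.
# Function names are hypothetical and not taken from the TxArray study.
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]


def concordance(calls: Sequence[Genotype], reference: Sequence[Genotype]) -> float:
    """Fraction of sites, among those called in both, where the call matches the reference."""
    pairs = [(a, b) for a, b in zip(calls, reference) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sites called in both data sets")
    return sum(a == b for a, b in pairs) / len(pairs)


def _transmittable(genotype: int) -> Tuple[int, ...]:
    """Alternate-allele counts (0 or 1) that a parent with this genotype can pass on."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[genotype]


def mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return any(f + m == child for f in _transmittable(father) for m in _transmittable(mother))


def mendelian_error_rate(trio_sites: Sequence[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of fully called (father, mother, child) sites that violate Mendelian inheritance."""
    called = [t for t in trio_sites if None not in t]
    if not called:
        raise ValueError("no fully called trio sites")
    errors = sum(not mendelian_consistent(*t) for t in called)
    return errors / len(called)


if __name__ == "__main__":
    print(concordance([0, 1, 2, None], [0, 1, 1, 2]))
    print(mendelian_error_rate([(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]))
```

Excluding missing calls from the denominator is one common convention; reporting the call rate separately keeps it from inflating or deflating these rates.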
Publication Insights into the genetic epidemiology of Crohn's and rare diseases in the Ashkenazi Jewish population
(Public Library of Science, 2018) Rivas, Manuel A.; Avila, Brandon E.; Koskela, Jukka; Huang, Hailiang; Stevens, Christine; Pirinen, Matti; Haritunians, Talin; Neale, Benjamin; Kurki, Mitja; Ganna, Andrea; Graham, Daniel; Glaser, Benjamin; Peter, Inga; Atzmon, Gil; Barzilai, Nir; Levine, Adam P.; Schiff, Elena; Pontikos, Nikolas; Weisburd, Ben; Lek, Monkol; Karczewski, Konrad; Bloom, Jonathan; Minikel, Eric; Petersen, Britt-Sabina; Beaugerie, Laurent; Seksik, Philippe; Cosnes, Jacques; Schreiber, Stefan; Bokemeyer, Bernd; Bethge, Johannes; Heap, Graham; Ahmad, Tariq; Plagnol, Vincent; Segal, Anthony W.; Targan, Stephan; Turner, Dan; Saavalainen, Paivi; Farkkila, Martti; Kontula, Kimmo; Palotie, Aarno; Brant, Steven R.; Duerr, Richard H.; Silverberg, Mark S.; Rioux, John D.; Weersma, Rinse K.; Franke, Andre; Jostins, Luke; Anderson, Carl A.; Barrett, Jeffrey C.; MacArthur, Daniel; Jalas, Chaim; Sokol, Harry; Xavier, Ramnik; Pulver, Ann; Cho, Judy H.; McGovern, Dermot P. B.; Daly, Mark
As part of a broader collaborative network of exome sequencing studies, we developed a jointly called data set of 5,685 Ashkenazi Jewish (AJ) exomes. We make publicly available a resource of site and allele frequencies, which should serve as a reference for medical genetics in the Ashkenazim (hosted in part at https://ibd.broadinstitute.org, also available in gnomAD at http://gnomad.broadinstitute.org). We estimate that 34% of protein-coding alleles present in the Ashkenazi Jewish population at frequencies greater than 0.2% are significantly more frequent (mean 15-fold) than their maximum frequency observed in other reference populations. Arising via a well-described founder effect approximately 30 generations ago, this catalog of enriched alleles can contribute to differences in genetic risk and overall prevalence of diseases between populations.
As validation, we document 148 AJ-enriched protein-altering alleles that overlap with "pathogenic" ClinVar alleles (table available at https://github.com/macarthur-lab/clinvar/blob/master/output/clinvar.tsv), including those that account for 10–100-fold differences in prevalence between AJ and non-AJ populations for some rare diseases, especially recessive conditions, including Gaucher disease (GBA, p.Asn409Ser, 8-fold enrichment); Canavan disease (ASPA, p.Glu285Ala, 12-fold enrichment); and Tay-Sachs disease (HEXA, c.1421+1G>C, 27-fold enrichment; p.Tyr427IlefsTer5, 12-fold enrichment). We next sought to use this catalog, of well-established relevance to Mendelian disease, to explore Crohn's disease (CD), a common disease with an estimated two- to four-fold excess prevalence in AJ. We specifically attempt to evaluate whether strongly acting rare alleles, particularly protein-truncating or otherwise large-effect-size alleles, enriched by the same founder effect, contribute excess genetic risk to Crohn's disease in AJ, and find that ten rare genetic risk factors in NOD2 and LRRK2 that are enriched in AJ (p < 0.005), including several novel contributing alleles, show evidence of association to CD. Independently, we find that genome-wide common variant risk defined by GWAS shows a strong difference between AJ and non-AJ European control population samples (0.97 s.d. higher, p < 10⁻¹⁶). Taken together, the results suggest coordinated selection in the AJ population for higher CD risk alleles in general. The results and approach illustrate the value of exome sequencing data in case-control studies, along with reference data sets like ExAC (sites VCF available via FTP at ftp.broadinstitute.org/pub/ExAC_release/release0.3/), to pinpoint genetic variation that contributes to variable disease predisposition across populations.
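The enrichment measure used in this abstract, an allele's frequency in one population divided by its maximum frequency across other reference populations, restricted to alleles above a 0.2% frequency, can be sketched as below. This is a minimal illustration, not the study's code: the significance test the authors applied is not reproduced, and the allele IDs, population labels, and frequencies are hypothetical.

```python
# Sketch of a fold-enrichment screen: target-population allele frequency divided by
# the MAXIMUM frequency across other reference populations, for alleles above a
# minimum frequency. The study's significance test is NOT reproduced here.
# Allele IDs, population labels and frequencies are hypothetical.
import math
from typing import Dict, List, Tuple


def fold_enrichment(af_target: float, af_others: Dict[str, float]) -> float:
    """Ratio of the target-population frequency to the highest frequency elsewhere."""
    max_other = max(af_others.values())
    return math.inf if max_other == 0 else af_target / max_other


def enriched_alleles(
    alleles: Dict[str, Tuple[float, Dict[str, float]]],
    min_af: float = 0.002,   # 0.2%, the frequency threshold quoted in the abstract
    min_fold: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (allele, fold) for alleles with target AF > min_af and fold > min_fold, highest first."""
    hits = []
    for allele_id, (af_target, af_others) in alleles.items():
        if af_target <= min_af:
            continue
        fold = fold_enrichment(af_target, af_others)
        if fold > min_fold:
            hits.append((allele_id, fold))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy = {
        "variant_A": (0.010, {"pop_1": 0.0005, "pop_2": 0.0001}),  # about 20-fold
        "variant_B": (0.004, {"pop_1": 0.0040, "pop_2": 0.0010}),  # 1-fold, not enriched
        "variant_C": (0.001, {"pop_1": 0.0000, "pop_2": 0.0000}),  # below the 0.2% cutoff
    }
    for allele_id, fold in enriched_alleles(toy):
        print(allele_id, round(fold, 1))
```

Dividing by the maximum rather than the mean frequency elsewhere makes the ratio conservative: an allele counts as enriched only if it is more common in the target population than in every other population considered.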
Publication Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes
(Nature Publishing Group UK, 2017) Balasubramanian, Suganthi; Fu, Yao; Pawashe, Mayur; McGillivray, Patrick; Jin, Mike; Liu, Jeremy; Karczewski, Konrad; MacArthur, Daniel; Gerstein, Mark
Variants predicted to result in the loss of function of human genes have attracted interest because of their clinical impact and surprising prevalence in healthy individuals. Here, we present ALoFT (annotation of loss-of-function transcripts), a method to annotate and predict the disease-causing potential of loss-of-function variants. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between loss-of-function variants that are deleterious as heterozygotes and those causing disease only in the homozygous state. Investigation of variants discovered in healthy populations suggests that each individual carries at least two heterozygous premature stop alleles that could potentially lead to disease if present as homozygotes. When applied to de novo putative loss-of-function variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, analysis of somatic variants in >6500 cancer exomes shows that putative loss-of-function variants predicted to be deleterious by ALoFT are enriched in known driver genes.