Person: de Bakker, Paul
Publication Meta-Analysis of Genome-Wide Association Studies in Celiac Disease and Rheumatoid Arthritis Identifies Fourteen Non-HLA Shared Loci
(Public Library of Science, 2011) Zhernakova, Alexandra; Stahl, Eli A.; Trynka, Gosia; Festen, Eleanora A.; Franke, Lude; Westra, Harm-Jan; Fehrmann, Rudolf S. N.; Kurreeman, Fina A. S.; Thomson, Brian; Gupta, Namrata; Romanos, Jihane; McManus, Ross; Ryan, Anthony W.; Turner, Graham; Brouwer, Elisabeth; Posthumus, Marcel D.; Remmers, Elaine F.; Tucci, Francesca; Toes, Rene; Grandone, Elvira; Mazzilli, Maria Cristina; Rybak, Anna; Cukrowska, Bozena; Coenen, Marieke J. H.; Radstake, Timothy R. D. J.; van Riel, Piet L. C. M.; Li, Yonghong; Gregersen, Peter K.; Worthington, Jane; Siminovitch, Katherine A.; Klareskog, Lars; Huizinga, Tom W. J.; Wijmenga, Cisca; Raychaudhuri, Soumya; de Bakker, Paul; Plenge, Robert M.
Epidemiology and candidate gene studies indicate a shared genetic basis for celiac disease (CD) and rheumatoid arthritis (RA), but the extent of this sharing has not been systematically explored. Previous studies demonstrate that 6 of the established non-HLA CD and RA risk loci (out of 26 loci for each disease) are shared between both diseases. We hypothesized that there are additional shared risk alleles and that combining genome-wide association study (GWAS) data from each disease would increase power to identify these shared risk alleles. We performed a meta-analysis of two published GWAS on CD (4,533 cases and 10,750 controls) and RA (5,539 cases and 17,231 controls). After genotyping the top associated SNPs in 2,169 CD cases and 2,255 controls, and 2,845 RA cases and 4,944 controls, 8 additional SNPs demonstrated P < 5 × 10^−8 in a combined analysis of all 50,266 samples, including four SNPs that have not been previously confirmed in either disease: rs10892279 near the DDX6 gene (P_combined = 1.2 × 10^−12), rs864537 near CD247 (P_combined = 2.2 × 10^−11), rs2298428 near UBE2L3 (P_combined = 2.5 × 10^−10), and rs11203203 near UBASH3A (P_combined = 1.1 × 10^−8).
We also confirmed that 4 gene loci previously established in either CD or RA are associated with the other autoimmune disease at combined P < 5 × 10^−8 (SH2B3, 8q24, STAT4, and TRAF1-C5). Of the 14 shared gene loci, 7 SNPs showed a genome-wide significant effect on expression of one or more transcripts in the linkage disequilibrium (LD) block around the SNP. These associations implicate antigen presentation and T-cell activation as a shared mechanism of disease pathogenesis and underscore the utility of cross-disease meta-analysis for identification of genetic risk factors with pleiotropic effects between two clinically distinct diseases.
Publication Modeling the Cumulative Genetic Risk for Multiple Sclerosis from Genome-Wide Association Data
(BioMed Central, 2011) Wang, Joanne H; Pappas, Derek; Pelletier, Daniel; Kappos, Ludwig; Polman, Chris H; Matthews, Paul M; Hauser, Stephen L; Baranzini, Sergio E; Oksenberg, Jorge R; De Jager, Philip; de Bakker, Paul; Chibnik, Lori; Hafler, David
Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples: 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was used to compare clinical characteristics of individuals with various degrees of genetic risk. Gene Ontology and pathway enrichment analyses were performed using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration.
In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity of 62.3% and specificity of 75.9%. No differences in disease progression or T2-lesion volumes were observed among the four predicted genetic risk groups (high, medium, low, misclassified). However, a significant difference (F = 2.75, P = 0.04) was detected in age at disease onset between affected individuals misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and the completeness of current knowledge in MS genetics.
Publication Concept, Design and Implementation of a Cardiovascular Gene-centric 50 K SNP Array for Large-scale Genomic Association Studies
(Public Library of Science, 2008) Keating, Brendan J.; Tischfield, Sam; Murray, Sarah S.; Bhangale, Tushar; Price, Thomas S.; Glessner, Joseph T.; Galver, Luana; Barrett, Jeffrey C.; Grant, Struan F. A.; Farlow, Deborah N.; Chandrupatla, Hareesh R.; Ajmal, Saad; Papanicolaou, George J.; Guo, Yiran; Li, Mingyao; DerOhannessian, Stephanie; Bailey, Swneke D.; Montpetit, Alexandre; Edmondson, Andrew C.; Taylor, Kent; Gai, Xiaowu; Wang, Susanna S.; Fornage, Myriam; Shaikh, Tamim; Groop, Leif; Boehnke, Michael; Hall, Alistair S.; Hattersley, Andrew T.; Frackelton, Edward; Patterson, Nick; Chiang, Charleston W. K.; Kim, Cecelia E.; Fabsitz, Richard R.; Ouwehand, Willem; Munroe, Patricia; Caulfield, Mark; Drake, Thomas; Boerwinkle, Eric; Whitehead, A. Stephen; Cappola, Thomas P.; Samani, Nilesh J.; Lusis, A. Jake; Schadt, Eric; Wilson, James G.; Koenig, Wolfgang; McCarthy, Mark I.; Kathiresan, Sekar; Gabriel, Stacey B.; Hakonarson, Hakon; Anand, Sonia S.; Reilly, Muredach; Engert, James C.; Nickerson, Deborah A.; Rader, Daniel J.; FitzGerald, Garret A.; Reitsma, Pieter H.; Hansen, Mark; de Bakker, Paul; Price, Alkes; Reich, David; Hirschhorn, Joel
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome-wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array uses a “cosmopolitan” tagging approach to capture the genetic diversity across ∼2,000 loci in populations represented in the HapMap and SeattleSNPs projects.
The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway-based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high-priority loci with a greater density of markers than existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array (the array described here) can be used to complement GWAS, increasing coverage in high-priority cardiovascular disease (CVD)-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array, with a significant portion of the generated data being released into the academic domain to facilitate in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
Publication Comparative Modelling by Restraint-Based Conformational Sampling
(BioMed Central, 2008) Furnham, Nicholas; de Bakker, Paul; Gore, Swanand; Burke, David F; Blundell, Tom L
Background: Although comparative modelling is routinely used to produce three-dimensional models of proteins, very few automated approaches are formulated in a way that allows inclusion of restraints derived from experimental data as well as those from the structures of homologues. Furthermore, proteins are usually described as a single conformer, rather than an ensemble that represents the heterogeneity and inaccuracy of experimentally determined protein structures. Here we address these issues by exploring the application of the restraint-based conformational space search engine, RAPPER, which has previously been developed for rebuilding experimentally defined protein structures and for fitting models to electron density derived from X-ray diffraction analyses. Results: A new application of RAPPER for comparative modelling uses positional restraints and knowledge-based sampling to generate models with accuracies comparable to other leading modelling tools. Knowledge-based predictions are based on geometrical features of the homologous templates and rules concerning main-chain and side-chain conformations. By directly changing the restraints derived from available templates, we estimate the accuracy limits of the method in comparative modelling. Conclusion: The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. Enhanced methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model.
By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data.
Publication Novel Loci for Metabolic Networks and Multi-Tissue Expression Studies Reveal Genes for Atherosclerosis
(Public Library of Science, 2012) Inouye, Michael; Ripatti, Samuli; Kettunen, Johannes; Lyytikäinen, Leo-Pekka; Oksala, Niku; Laurila, Pirkka-Pekka; Kangas, Antti J.; Soininen, Pasi; Savolainen, Markku J.; Viikari, Jorma; Kähönen, Mika; Perola, Markus; Salomaa, Veikko; Raitakari, Olli; Lehtimäki, Terho; Taskinen, Marja-Riitta; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Palotie, Aarno; de Bakker, Paul
Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. Compared with univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.
Publication GWAS Identifies Novel Susceptibility Loci on 6p21.32 and 21q21.3 for Hepatocellular Carcinoma in Chronic Hepatitis B Virus Carriers
(Public Library of Science, 2012) Li, Shengping; Qian, Ji; Zhao, Wanting; Dai, Juncheng; Bei, Jin-Xin; Foo, Jia Nee; McLaren, Paul J.; Li, Zhiqiang; Yang, Jingmin; Shen, Feng; Yang, Jiamei; Li, Shuhong; Pan, Shandong; Li, Wenjin; Zhai, Xiangjun; Zhou, Boping; Shi, Lehua; Chen, Xinchun; Chu, Minjie; Yan, Yiqun; Cheng, Shuqun; Shen, Jiawei; Jia, Weihua; Liu, Jibin; Yang, Jiahe; Wen, Zujia; Li, Aijun; Zhang, Guoliang; Luo, Xianrong; Qin, Hongbo; Chen, Minshan; Lin, Dongxin; Shen, Hongbing; Wang, Hongyang; Zeng, Yi-Xin; Wu, Mengchao; Hu, Zhibin; Shi, Yongyong; Liu, Jianjun; Zhou, Weiping; Yang, Yuan; Liu, Li; Wang, Yi; Wang, Jun; Zhang, Ying; Wang, Hua; Jin, Li; He, Lin; de Bakker, Paul
Genome-wide association studies (GWAS) have recently identified KIF1B as a susceptibility locus for hepatitis B virus (HBV)–related hepatocellular carcinoma (HCC). To further identify novel susceptibility loci associated with HBV–related HCC and replicate the previously reported association, we performed a large three-stage GWAS in the Han Chinese population. In the discovery stage, 523,663 autosomal SNPs were genotyped in 1,538 HBV–positive HCC patients and 1,465 chronic HBV carriers. Top candidate SNPs were genotyped in the initial validation samples of 2,112 HBV–positive HCC cases and 2,208 HBV carriers and then in the second validation samples of 1,021 cases and 1,491 HBV carriers. We discovered two novel associations, at rs9272105 (HLA-DQA1/DRB1) on 6p21.32 (OR = 1.30, P = 1.13 × 10^−19) and rs455804 (GRIK1) on 21q21.3 (OR = 0.84, P = 1.86 × 10^−8), which were further replicated in a fourth independent sample of 1,298 cases and 1,026 controls (rs9272105: OR = 1.25, P = 1.71 × 10^−4; rs455804: OR = 0.84, P = 6.92 × 10^−3). We also revealed associations of HLA-DRB1*0405 and 0901*0602, which could partially account for the association at rs9272105.
The association at rs455804 implicates GRIK1 as a novel susceptibility gene for HBV–related HCC, suggesting the involvement of glutamate signaling in the development of HBV–related HCC.
Publication Amino Acid Position 11 of HLA-DRβ1 is a Major Determinant of Chromosome 6p Association with Ulcerative Colitis
(2012) Achkar, Jean-Paul; Klei, Lambertus; de Bakker, Paul; Bellone, Gaia; Rebert, Nancy; Scott, Regan; Lu, Ying; Regueiro, Miguel; Brzezinski, Aaron; Kamboh, M. Ilyas; Fiocchi, Claudio; Devlin, Bernie; Trucco, Massimo; Ringquist, Steven; Roeder, Kathryn; Duerr, Richard H
The major histocompatibility complex (MHC) on chromosome 6p is an established risk locus for ulcerative colitis (UC) and Crohn’s disease (CD). We aimed to better define MHC association signals in UC and CD by combining data from dense single nucleotide polymorphism (SNP) genotyping and from imputation of classical HLA types, their constituent SNPs and corresponding amino acids in 562 UC, 611 CD, and 1,428 control subjects. Univariate and multivariate association analyses were performed, controlling for ancestry. In univariate analyses, absence of the rs9269955 C allele was strongly associated with risk for UC (P = 2.67 × 10^−13). rs9269955 is a SNP in the codon for amino acid position 11 of HLA-DRβ1, located in the P6 pocket of the HLA-DR antigen binding cleft. This amino acid position was also the most significantly UC-associated amino acid in omnibus tests (P = 2.68 × 10^−13). Multivariate modeling identified rs9269955-C and 13 other variants as best predicting UC versus control status. In contrast, there was only suggestive evidence of association between the MHC and CD. Taken together, these data demonstrate that variation at HLA-DRβ1 amino acid 11, in the P6 pocket of the HLA-DR antigen binding cleft, is a major determinant of chromosome 6p association with ulcerative colitis.
Publication Effective Detection of Human Leukocyte Antigen Risk Alleles in Celiac Disease Using Tag Single Nucleotide Polymorphisms
(Public Library of Science, 2008) Monsuur, Alienke J.; Zhernakova, Alexandra; Pinto, Dalila; Verduijn, Willem; Romanos, Jihane; Auricchio, Renata; Lopez, Ana; van Heel, David A.; Crusius, J. Bart A; Wijmenga, Cisca; de Bakker, Paul
Background: The HLA genes, located in the MHC region on chromosome 6p21.3, play an important role in many autoimmune disorders, such as celiac disease (CD), type 1 diabetes (T1D), rheumatoid arthritis, multiple sclerosis, psoriasis and others. Known HLA variants that confer risk to CD, for example, include DQA1*05/DQB1*02 (DQ2.5) and DQA1*03/DQB1*0302 (DQ8). To diagnose the majority of CD patients and to study disease susceptibility and progression, typing these strongly associated HLA risk factors is of utmost importance. However, current genotyping methods for HLA risk factors involve many reactions, and are complicated and expensive. We sought a simple experimental approach using tagging SNPs that predict the CD-associated HLA risk factors. Methodology: Our tagging approach exploits linkage disequilibrium between single nucleotide polymorphisms (SNPs) and the CD-associated HLA risk factors DQ2.5 and DQ8, which indicate direct risk, and DQA1*0201/DQB1*0202 (DQ2.2) and DQA1*0505/DQB1*0301 (DQ7), which add to the DQ2.5-associated risk of CD. To evaluate the predictive power of this approach, we performed an empirical comparison of the DQ types predicted from six tag SNPs with those obtained using currently validated laboratory typing methods for the HLA-DQA1 and -DQB1 genes in three large cohorts. The results were validated in three European celiac populations. Conclusion: Using this method, only six SNPs were needed to predict the risk types carried by >95% of CD patients. We determined that for this tagging approach the sensitivity was >0.991, specificity >0.996 and the predictive value >0.948. Our results show that this tag SNP method is very accurate and provides an excellent basis for population screening for CD.
This method is broadly applicable in European populations.
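The sensitivity, specificity and predictive value quoted for the tag-SNP method come from cross-tabulating predicted DQ types against laboratory typing. As an illustration only, the Python sketch below shows the standard definitions of these measures for a single binary call (risk type present or absent). The function names and toy data are hypothetical, and because the abstract does not say whether its "predictive value" is positive or negative, both are computed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predicted calls with reference (laboratory) calls."""
    tp: int  # predicted positive, reference positive
    fp: int  # predicted positive, reference negative
    tn: int  # predicted negative, reference negative
    fn: int  # predicted negative, reference positive


def tabulate(predicted: list[bool], reference: list[bool]) -> Confusion:
    """Cross-tabulate paired predicted/reference calls for one risk type (hypothetical helper)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    return Confusion(
        tp=sum(p and r for p, r in pairs),
        fp=sum(p and not r for p, r in pairs),
        tn=sum((not p) and (not r) for p, r in pairs),
        fn=sum((not p) and r for p, r in pairs),
    )


def _ratio(num: int, den: int) -> float:
    # A metric with an empty denominator is undefined; report it as NaN.
    return num / den if den else math.nan


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    """Sensitivity, specificity, and positive/negative predictive values."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical toy data, not study data: True = risk type (e.g. DQ2.5) present.
    predicted = [True, True, False, False, True]
    reference = [True, False, False, False, True]
    print(diagnostic_metrics(tabulate(predicted, reference)))
```

In the toy data one individual is predicted positive but typed negative, so sensitivity stays at 1.0 while specificity and PPV drop to 2/3.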
Publication Common Missense Variant in the Glucokinase Regulatory Protein Gene Is Associated With Increased Plasma Triglyceride and C-Reactive Protein but Lower Fasting Glucose Concentrations
(American Diabetes Association, 2008) Orho-Melander, Marju; Melander, Olle; Guiducci, Candace; Perez-Martinez, Pablo; Corella, Dolores; Roos, Charlotta; Tewhey, Ryan; Rieder, Mark J.; Hall, Jennifer; Abecasis, Goncalo; Tai, E. Shyong; Welch, Cullan; Arnett, Donna K.; Lyssenko, Valeriya; Lindholm, Eero; Burtt, Noel; Voight, Benjamin F.; Tucker, Katherine L.; Hedner, Thomas; Tuomi, Tiinamaija; Isomaa, Bo; Eriksson, Karl-Fredrik; Taskinen, Marja-Riitta; Wahlstrand, Björn; Hughes, Thomas E.; Parnell, Laurence D.; Lai, Chao-Qiang; Berglund, Göran; Peltonen, Leena; Vartiainen, Erkki; Jousilahti, Pekka; Havulinna, Aki S.; Salomaa, Veikko; Nilsson, Peter; Groop, Leif; Ordovas, Jose M.; Kathiresan, Sekar; Saxena, Richa; de Bakker, Paul; Hirschhorn, Joel; Altshuler, David
Objective: Using the genome-wide association approach, we recently identified the glucokinase regulatory protein gene (GCKR, rs780094) region as a novel quantitative trait locus for plasma triglyceride concentration in Europeans. Here, we sought to study the association of GCKR variants with metabolic phenotypes, including measures of glucose homeostasis, to evaluate the GCKR locus in samples of non-European ancestry and to fine-map across the associated genomic interval. Research Design and Methods: We performed association studies in 12 independent cohorts comprising >45,000 individuals representing several ancestral groups (whites from Northern and Southern Europe, whites from the U.S., African Americans from the U.S., Hispanics of Caribbean origin, and Chinese, Malays, and Asian Indians from Singapore). We conducted genetic fine-mapping across the ∼417-kb region of linkage disequilibrium spanning GCKR and 16 other genes on chromosome 2p23 by imputing untyped HapMap single nucleotide polymorphisms (SNPs) and genotyping 104 SNPs across the associated genomic interval.
Results: We provide comprehensive evidence that GCKR rs780094 is associated with opposite effects on fasting plasma triglyceride (P_meta = 3 × 10^−56) and glucose (P_meta = 1 × 10^−13) concentrations. In addition, we confirmed recent reports that the same SNP is associated with C-reactive protein (CRP) level (P = 5 × 10^−5). Both fine-mapping approaches revealed a common missense GCKR variant (rs1260326, Pro446Leu, 34% frequency, r^2 = 0.93 with rs780094) as the strongest association signal in the region. Conclusions: These findings point to a molecular mechanism in humans by which higher triglycerides and CRP can be coupled with lower plasma glucose concentrations and position GCKR in central pathways regulating both hepatic triglyceride and glucose metabolism.
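The r^2 = 0.93 between rs1260326 and rs780094 is the usual squared-correlation measure of linkage disequilibrium between two biallelic SNPs: r^2 = D^2 / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The abstract does not describe how r^2 was estimated, so the Python sketch below only illustrates that textbook formula, assuming phased haplotype counts are available; the function name and counts are hypothetical.

```python
def ld_r_squared(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Squared correlation r^2 between two biallelic SNPs from phased haplotype counts.

    Alleles A/a belong to the first SNP and B/b to the second; each argument is
    the number of haplotypes carrying that allele combination (hypothetical input).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical counts: perfect LD, partial LD, and independence.
    print(ld_r_squared(50, 0, 0, 50))
    print(ld_r_squared(40, 10, 10, 40))
    print(ld_r_squared(25, 25, 25, 25))
```

With these toy counts, fully coupled haplotypes give r^2 = 1, independent SNPs give 0, and the 40/10/10/40 split gives about 0.36; a value near 0.93 means one SNP is a close, though not perfect, proxy for the other.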
Publication Deleterious Alleles in the Human Genome Are on Average Younger Than Neutral Alleles of the Same Frequency
(Public Library of Science, 2013) Kiezun, Adam; Pulit, Sara L.; Francioli, Laurent C.; van Dijk, Freerk; Swertz, Morris; Boomsma, Dorret I.; van Duijn, Cornelia M.; Slagboom, P. Eline; van Ommen, G. J. B.; Wijmenga, Cisca; de Bakker, Paul; Sunyaev, Shamil
Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that have an effect on molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based on allele frequency (testing for a relative excess of rare alleles) or ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet USA 6:669–673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency and discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.