[Motivation] Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. ; [Results] We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the 'spuriously correlated' SNP merely happens to be correlated with the 'truly causal' SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. ; E.F. and L.B. gratefully acknowledge financial support from the European Research Council (grant 295642, The Foundations of Economic Preferences, FEP). D.S. gratefully acknowledges financial support from the German National Science Foundation (DFG, grant SCHU 2828/2-1, Inference statistical methods for behavioral genetics and neuroeconomics). A.N.
gratefully acknowledges support from the Instituto de Salud Carlos III (grants RD12/0032/0011 and PT13/0001/0026) and the Spanish Government Grant (BFU2012-38236) and from FEDER. ; Peer reviewed
[Background] Plant breeding has been proposed as one of the most effective and environmentally safe methods to control fungal infection and to reduce fumonisin accumulation. However, conventional breeding can be hampered by the complex genetic architecture of resistance to fumonisin accumulation and marker-assisted selection is proposed as an efficient alternative. In the current study, GWAS has been performed for the first time for detecting high-resolution QTL for resistance to fumonisin accumulation in maize kernels complementing published GWAS results for Fusarium ear rot. ; [Results] Thirty-nine SNPs significantly associated with resistance to fumonisin accumulation in maize kernels were found and clustered into 17 QTL. Novel QTLs for fumonisin content would be at bins 3.02, 5.02, 7.05 and 8.07. Genes with annotated functions probably implicated in resistance to pathogens based on previous studies have been highlighted. ; [Conclusions] Breeding approaches to fix favorable functional variants for genes implicated in maize immune response signaling may be especially useful to reduce kernel contamination with fumonisins without significantly interfering in mycelia development and growth and, consequently, in the beneficial endophytic behavior of Fusarium verticillioides. ; This research was funded by the Autonomous Government of Galicia, Spain (project IN607A/013), and by the "Secretaría de Estado de Investigación, Desarrollo e Innovación", Spain, within the projects AGL2015–67313-C2–1-R and AGL2015–67313-C2–2-R, which were co-financed with European Social Funds. R. Santiago acknowledges postdoctoral contract "Ramón y Cajal" financed by the "Secretaría de Estado de Investigación, Desarrollo e Innovación" and co-financed by the "Universidad de Vigo", Spain, and the European Social Funds.
Traditional statistical methods for confidentiality protection of statistical databases do not scale well to deal with GWAS databases especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially-private approach to penalized logistic regression.
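As a rough illustration of what releasing a differentially private allele frequency can look like, the sketch below applies the standard Laplace mechanism to a single SNP. It is not taken from the paper: the function names, the 0/1/2 genotype coding, and the choice of neighbouring datasets (one individual's genotype replaced, giving a sensitivity of 1/N for the frequency) are assumptions made for the example.

```python
import random


def allele_frequency(genotypes):
    """Frequency of the counted allele; each genotype is 0, 1 or 2 copies."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2 * len(genotypes))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_allele_frequency(genotypes, epsilon, rng=None):
    """Release an epsilon-differentially-private allele frequency.

    Assumption: neighbouring datasets differ in one individual's genotype,
    so the allele count changes by at most 2 and the frequency by 1/N.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    sensitivity = 1.0 / len(genotypes)
    noisy = allele_frequency(genotypes) + laplace_noise(sensitivity / epsilon, rng)
    # Clipping is post-processing and does not weaken the privacy guarantee.
    return min(1.0, max(0.0, noisy))


if __name__ == "__main__":
    cohort = [0, 1, 2, 1, 0, 0, 1, 2]  # hypothetical genotypes for one SNP
    print(private_allele_frequency(cohort, epsilon=0.5, rng=random.Random(42)))
```

Smaller values of epsilon give stronger privacy but noisier frequencies, which is the utility cost the abstract refers to; releasing many SNPs would also require splitting the privacy budget across them.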
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, vol. 13, no. 4, pp. 398-403
Self-rated health has been shown to be a highly reliable and valid measure of overall health, as judged against other indicators, in many population groups. It has also been shown to be a very good predictor of mortality, chronic or severe diseases, and the need for services, and is positively correlated with clinical assessments. Genetic factors have been estimated to account for 25–64% of the variance in the liability of self-rated health. The aim of the present study was to identify Single Nucleotide Polymorphisms (SNPs) underlying the heritability of self-rated health by conducting a genome-wide association analysis in a large sample of 6,706 Australian individuals aged 18–92. No genome-wide significant SNPs associated with self-rated health could be identified, indicating that self-rated health may be influenced by a large number of SNPs with very small effect sizes. A very large sample will be needed to identify these SNPs.
Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary data are available at Bioinformatics online.
In: United Kingdom and Ireland Renal Transplant Consortium (UKIRTC) and the Wellcome Trust Case Control Consortium (WTCCC)-3 2018, 'Long- and short-term outcomes in renal allografts with deceased donors: A large recipient and donor genome-wide association study', American Journal of Transplantation, vol. 18, no. 6, pp. 1370-1379. https://doi.org/10.1111/ajt.14594
Improvements in immunosuppression have modified short-term survival of deceased-donor allografts, but not their rate of long-term failure. Mismatches between donor and recipient HLA play an important role in the acute and chronic allogeneic immune response against the graft. Perfect matching at clinically relevant HLA loci does not obviate the need for immunosuppression, suggesting that additional genetic variation plays a critical role in both short- and long-term graft outcomes. By combining patient data and samples from supranational cohorts across the United Kingdom and European Union, we performed the first large-scale genome-wide association study analyzing both donor and recipient DNA in 2094 complete renal transplant-pairs with replication in 5866 complete pairs. We studied deceased-donor grafts allocated on the basis of preferential HLA matching, which provided some control for HLA genetic effects. No strong donor or recipient genetic effects contributing to long- or short-term allograft survival were found outside the HLA region. We discuss the implications for future research and clinical application.
Publisher's version (published article) ; Rationale: Idiopathic pulmonary fibrosis (IPF) is a complex lung disease characterized by scarring of the lung that is believed to result from an atypical response to injury of the epithelium. Genome-wide association studies have reported signals of association implicating multiple pathways including host defense, telomere maintenance, signaling, and cell-cell adhesion. Objectives: To improve our understanding of factors that increase IPF susceptibility by identifying previously unreported genetic associations. Methods: We conducted genome-wide analyses across three independent studies and meta-analyzed these results to generate the largest genome-wide association study of IPF to date (2,668 IPF cases and 8,591 controls). We performed replication in two independent studies (1,456 IPF cases and 11,874 controls) and functional analyses (including statistical fine-mapping, investigations into gene expression, and testing for enrichment of IPF susceptibility signals in regulatory regions) to determine putatively causal genes. Polygenic risk scores were used to assess the collective effect of variants not reported as associated with IPF. Measurements and Main Results: We identified and replicated three new genome-wide significant (P < 5 × 10⁻⁸) signals of association with IPF susceptibility (associated with altered gene expression of KIF15, MAD1L1, and DEPTOR) and confirmed associations at 11 previously reported loci. Polygenic risk score analyses showed that the combined effect of many thousands of as yet unreported IPF susceptibility variants contributes to IPF susceptibility. Conclusions: The observation that decreased DEPTOR expression associates with increased susceptibility to IPF supports recent studies demonstrating the importance of mTOR signaling in lung fibrosis. New signals of association implicating KIF15 and MAD1L1 suggest a possible role of mitotic spindle-assembly genes in IPF susceptibility. ; R.J.A.
is an Action for Pulmonary Fibrosis Research Fellow. L.V.W. holds a GSK/British Lung Foundation Chair in Respiratory Research. R.G.J. is supported by a National Institute for Health Research (NIHR) Research Professorship (NIHR reference RP-2017-08-ST2-014). I.N. is supported by the NHLBI (R01HL130796). B.G.-G. is funded by Agencia Canaria de Investigación, Innovación y Sociedad de la Información (TESIS2015010057) cofunded by European Social Fund. J.M.O. is supported by the NHLBI (K23HL138190). C.F. is supported by the Spanish Ministry of Science, Innovation and Universities (grant RTC-2017-6471-1; Ministerio de Ciencia e Innovacion/Agencia Estatal de Investigación/Fondo Europeo de Desarrollo Regional, Unión Europea) cofinanced by the European Regional Development Funds "A way of making Europe" from the European Union and by agreement OA17/008 with Instituto Tecnológico y de Energías Renovables to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology. The Spain Biobank array genotyping service was performed at CEGEN-PRB3-ISCIII, which is supported by PT17/0019, of the PE I+D+i 2013–2016, funded by Instituto de Salud Carlos III, and cofinanced by the European Regional Development Funds. P.L.M. is an Action for Pulmonary Fibrosis Research Fellow. M.O. is a fellow of the Parker B. Francis Foundation and a Scholar of the Michael Smith Foundation for Health Research. B.D.H. is supported by NIH K08 HL136928, Parker B. Francis Research Opportunity Award. M.H.C. and G.M.H. are supported by NHLBI grants R01HL113264 (M.H.C.), R01HL137927 (M.H.C.), R01HL135142 (M.H.C. and G.M.H.), R01111024 (G.M.H.), and R01130974 (G.M.H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funding body has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. 
T.M.M. is supported by an NIHR Clinician Scientist Fellowship (NIHR Ref: CS-2013-13-017) and a British Lung Foundation Chair in Respiratory Research (C17-3). M.D.T. is supported by a Wellcome Trust Investigator Award (WT202849/Z/16/Z). The research was partially supported by the NIHR Leicester Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the National Health Service (NHS), the NIHR, or the Department of Health. I.P.H. was partially supported by the NIHR Nottingham Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. I.S. is supported by Medical Research Council (G1000861) and Asthma UK (AUK-PG-2013-188). D.F. was supported by an Intermediate Fellowship from the Wellcome Trust (097152/Z/11/Z). This work was partially supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. V.N. is funded by an NIHR Clinical Lectureship. G.G. is supported by project grant 141513-051 from the Icelandic Research Fund and Landspitali Scientific Fund A-2016-023, A-2017-029, and A-2018-025. D.J.L. and A.M. are supported by Multi-Ethnic Study of Atherosclerosis (MESA) and the MESA SNP Health Association Resource (SHARe) project are conducted and supported by the NHLBI in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881, and DK063491. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California) and the Broad Institute of Harvard and Massachusetts Institute of Technology (Boston, Massachusetts) using the Affymetrix Genome-Wide Human SNP Array 6.0. 
This work was supported by NIH grants R01 HL131565 (A.M.), R01 HL103676 (D.J.L.), and R01 HL137234 (D.J.L.). ; Peer Reviewed
Frost tolerance is a key trait with economic and agronomic importance in barley because it is a major component of winter hardiness, and therefore limits the geographical distribution of the crop and the effective transfer of quality traits between spring and winter crop types. Three main frost tolerance QTL (Fr-H1, Fr-H2 and Fr-H3) have been identified from bi-parental genetic mapping but it can be argued that those mapping populations only capture a portion of the genetic diversity of the species. A genetically broad dataset consisting of 184 genotypes, representative of the barley gene pool cultivated in the Mediterranean basin over an extended time period, was genotyped with 1536 SNP markers. Frost tolerance phenotype scores were collected from two trial sites, Foradada (Spain) and Fiorenzuola (Italy), and combined with the genotypic data in genome-wide association analyses (GWAS) using Eigenstrat and kinship approaches to account for population structure. ; The above work was partially funded by the European Union-INCO-MED program (MABDE ICA3-CT2002-10026). The James Hutton Institute received grant in aid from the Scottish Government's Rural and Environment Science and Analytical Services Division. The Centre UdL-IRTA forms part of the Centre CONSOLIDER on Agrigenomics and acknowledges competitive grants GEN2006-28560-E and AGL2011-30529-C02-02 from the Spanish Ministry of Science and Innovation.
BACKGROUND: The epilepsies are a clinically heterogeneous group of neurological disorders. Despite strong evidence for heritability, genome-wide association studies have had little success in identification of risk loci associated with epilepsy, probably because of relatively small sample sizes and insufficient power. We aimed to identify risk loci through meta-analyses of genome-wide association studies for all epilepsy and the two largest clinical subtypes (genetic generalised epilepsy and focal epilepsy). METHODS: We combined genome-wide association data from 12 cohorts of individuals with epilepsy and controls from population-based datasets. Controls were ethnically matched with cases. We phenotyped individuals with epilepsy into categories of genetic generalised epilepsy, focal epilepsy, or unclassified epilepsy. After standardised filtering for quality control and imputation to account for different genotyping platforms across sites, investigators at each site conducted a linear mixed-model association analysis for each dataset. Combining summary statistics, we conducted fixed-effects meta-analyses of all epilepsy, focal epilepsy, and genetic generalised epilepsy. We set the genome-wide significance threshold at p < 1.66 × 10⁻⁸. FINDINGS: We included 8696 cases and 26 157 controls in our analysis. Meta-analysis of the all-epilepsy cohort identified loci at 2q24.3 (p = 8.71 × 10⁻¹⁰), implicating SCN1A, and at 4p15.1 (p = 5.44 × 10⁻⁹), harbouring PCDH7, which encodes a protocadherin molecule not previously implicated in epilepsy. For the cohort of genetic generalised epilepsy, we noted a single signal at 2p16.1 (p = 9.99 × 10⁻⁹), implicating VRK2 or FANCL. No single nucleotide polymorphism achieved genome-wide significance for focal epilepsy.
INTERPRETATION: This meta-analysis describes a new locus not previously implicated in epilepsy and provides further evidence about the genetic architecture of these disorders, with the ultimate aim of assisting in disease classification and prognosis. The data suggest that specific loci can act pleiotropically, raising risk for epilepsy broadly, or can have effects limited to a specific epilepsy subtype. Future genetic analyses might benefit from both lumping (ie, grouping of epilepsy types together) and splitting (ie, analysis of specific clinical subtypes). FUNDING: International League Against Epilepsy and multiple governmental and philanthropic agencies.
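To make the fixed-effects meta-analysis step concrete, the following is a minimal sketch of the common inverse-variance weighted estimator applied to per-cohort effect sizes for one SNP. The abstract does not say which weighting scheme was used, so inverse-variance weighting is an assumption here, as are the function name and the example numbers; only the 1.66 × 10⁻⁸ threshold comes from the text.

```python
import math

# Genome-wide significance threshold quoted in the abstract.
GENOME_WIDE_THRESHOLD = 1.66e-8


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    Returns the pooled effect, its standard error, the z score and the
    two-sided p-value under a normal approximation.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need matching, non-empty effect and SE lists")
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


if __name__ == "__main__":
    # Hypothetical per-cohort log odds ratios and standard errors.
    beta, se, z, p = fixed_effects_meta([0.12, 0.15, 0.09], [0.03, 0.04, 0.05])
    print(beta, se, z, p, p < GENOME_WIDE_THRESHOLD)
```

Cohorts with smaller standard errors, typically the larger ones, dominate the pooled estimate, which is why combining many modest cohorts can reach significance when none does alone.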
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, vol. 15, no. 5, pp. 615-623
Recent Genome-Wide Association Studies (GWAS) have identified four low-penetrance ovarian cancer susceptibility loci. We hypothesized that further moderate- or low-penetrance variants exist among the subset of single-nucleotide polymorphisms (SNPs) not well tagged by the genotyping arrays used in the previous studies, which would account for some of the remaining risk. We therefore conducted a time- and cost-effective stage 1 GWAS on 342 invasive serous cases and 643 controls genotyped on pooled DNA using the high-density Illumina 1M-Duo array. We followed up 20 of the most significantly associated SNPs, which are not well tagged by the lower density arrays used by the published GWAS, by genotyping them on individual DNA. Most of the top 20 SNPs were clearly validated by individually genotyping the samples used in the pools. However, none of the 20 SNPs replicated when tested for association in a much larger stage 2 set of 4,651 cases and 6,966 controls from the Ovarian Cancer Association Consortium. Given that most of the top 20 SNPs from pooling were validated in the same samples by individual genotyping, the lack of replication is likely to be due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach. We conclude that there are unlikely to be any moderate or large effects on ovarian cancer risk untagged by less dense arrays. However, our study lacked power to make clear statements on the existence of hitherto untagged small-effect variants.
WOS: 000373197500020 ; PubMed ID: 27016271 ; BACKGROUND AND OBJECTIVE: Developmental language disorder (DLD) is a highly prevalent neurodevelopmental disorder associated with negative outcomes in different domains; the etiology of DLD is unknown. To investigate the genetic underpinnings of DLD, we performed genome-wide association and whole exome sequencing studies in a geographically isolated population with a substantially elevated prevalence of the disorder (ie, the AZ sample). METHODS: DNA samples were collected from 359 individuals for the genome-wide association study and from 12 severely affected individuals for whole exome sequencing. Multifaceted phenotypes, representing major domains of expressive language functioning, were derived from collected speech samples. RESULTS: Gene-based analyses revealed a significant association between SETBP1 and complexity of linguistic output (P = 5.47 × 10⁻⁷). The analysis of exome variants revealed coding sequence variants in 14 genes, most of which play a role in neural development. Targeted enrichment analysis implicated myocyte enhancer factor-2 (MEF2)-regulated genes in DLD in the AZ population. The main findings were successfully replicated in an independent cohort of children at risk for related disorders (n = 37). CONCLUSIONS: MEF2-regulated pathways were identified as potential candidate pathways in the etiology of DLD. Several genes (including the candidate SETBP1 and other MEF2-related genes) seem to jointly influence certain, but not all, facets of the DLD phenotype. Even when genetic and environmental diversity is reduced, DLD is best conceptualized as etiologically complex. Future research should establish whether the signals detected in the AZ population can be replicated in other samples and languages and provide further characterization of the identified pathway.
; National Institute of Health [R01 DC007665, P50 HD052120]; NIH Centers for Mendelian Genomics [5U54HG006504]; National Science Foundation Integrative Graduate Education and Research Traineeship grant [114399]; Government of the Russian Federation [14.Z50.31.0027]; National Institutes of Health (NIH) ; Supported by National Institute of Health grants R01 DC007665 (Dr Grigorenko, Principal Investigator) and P50 HD052120 (Richard Wagner, Principal Investigator), NIH Centers for Mendelian Genomics (5U54HG006504), National Science Foundation Integrative Graduate Education and Research Traineeship grant 114399 (Dr Magnuson, Principal Investigator), and grant 14.Z50.31.0027 from the Government of the Russian Federation (Dr Grigorenko, Principal Investigator). Funded by the National Institutes of Health (NIH).
Leukocyte telomere length (LTL) is a heritable biomarker of genomic aging. In this study, we perform a genome-wide meta-analysis of LTL by pooling densely genotyped and imputed association results across large-scale European-descent studies including up to 78,592 individuals. We identify 49 genomic regions at a false discovery rate (FDR) […] 350,000 UK Biobank participants suggest that genetically shorter telomere length increases the risk of hypothyroidism and decreases the risk of thyroid cancer, lymphoma, and a range of proliferative conditions. Our results replicate previously reported associations with increased risk of coronary artery disease and lower risk for multiple cancer types. Our findings substantially expand current knowledge on genes that regulate LTL and their impact on human health and disease. ; The ENGAGE Project was funded under the European Union Framework 7 – Health Theme (HEALTH-F4-2007- 201413). The InterAct project received funding from the European Union (Integrated Project LSHM-CT-2006-037197 in the Framework Programme 6 of the European Community). The EPIC-CVD study was supported by core funding from the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946), the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), and the National Institute for Health Research [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. C.P.N. is funded by the BHF. V.C., C.P.N. and N.J.S. are supported by the NIHR Leicester Cardiovascular Biomedical Research Centre and N.J.S. holds an NIHR Senior Investigator award. Chen Li is supported by a 4-year Wellcome Trust PhD Studentship; CL, LAL, NJW are funded by the Medical Research Council (MC_UU_12015/1). NJW is an NIHR Senior Investigator. JD is funded by the National Institute for Health Research [Senior Investigator Award]. Cohort specific and further acknowledgements are given in the Supplemental Data.
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, vol. 15, no. 6, pp. 767-774
As part of the Genes, Environment and Development Initiative, the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study, which we describe here. A total of 8,405 research participants, clustered in four-member families, have been successfully genotyped on 527,829 single nucleotide polymorphism (SNP) markers using Illumina's Human660W-Quad array. Quality control screening of samples and markers as well as SNP imputation procedures are described. We also describe methods for ancestry control and how the familial clustering of the MCTFR sample can be accounted for in the analysis using a Rapid Feasible Generalized Least Squares algorithm. The rich longitudinal MCTFR assessments provide numerous opportunities for collaboration.