Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the 'spuriously correlated' SNP merely happens to be correlated with the 'truly causal' SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. ; E.F. and L.B. gratefully acknowledge financial support from the European Research Council (grant 295642, The Foundations of Economic Preferences, FEP). D.S. gratefully acknowledges financial support from the German National Science Foundation (DFG, grant SCHU 2828/2-1, Inference statistical methods for behavioral genetics and neuroeconomics). A.N. gratefully acknowledges support from the Instituto de Salud Carlos III (grants RD12/0032/0011 and PT13/0001/0026) and the Spanish Government Grant (BFU2012-38236) and from FEDER.
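To make the multiple-testing burden mentioned in this abstract concrete, the sketch below shows two generic ways of controlling the family-wise error rate over a list of marginal P-values: the Bonferroni threshold and Holm's step-down procedure. It is not the hierarchical procedure implemented in hierGWAS; the function names and example P-values are hypothetical and serve only as illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the FWER at or below alpha."""
    return alpha / n_tests


def holm_reject(pvalues, alpha=0.05):
    """Holm's step-down procedure; returns one rejection flag per input P-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P-values are retained
    return reject


# Hypothetical marginal P-values for three SNPs.
flags = holm_reject([0.01, 0.04, 0.03], alpha=0.05)
```

With one million tests and alpha = 0.05, `bonferroni_threshold` gives 5e-8, which illustrates why single-SNP analyses need such strong marginal signals.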
[Background] Plant breeding has been proposed as one of the most effective and environmentally safe methods to control fungal infection and to reduce fumonisin accumulation. However, conventional breeding can be hampered by the complex genetic architecture of resistance to fumonisin accumulation and marker-assisted selection is proposed as an efficient alternative. In the current study, GWAS has been performed for the first time for detecting high-resolution QTL for resistance to fumonisin accumulation in maize kernels complementing published GWAS results for Fusarium ear rot. ; [Results] Thirty-nine SNPs significantly associated with resistance to fumonisin accumulation in maize kernels were found and clustered into 17 QTL. Novel QTLs for fumonisin content would be at bins 3.02, 5.02, 7.05 and 8.07. Genes with annotated functions probably implicated in resistance to pathogens based on previous studies have been highlighted. ; [Conclusions] Breeding approaches to fix favorable functional variants for genes implicated in maize immune response signaling may be especially useful to reduce kernel contamination with fumonisins without significantly interfering in mycelia development and growth and, consequently, in the beneficial endophytic behavior of Fusarium verticillioides. ; This research was funded by the Autonomous Government of Galicia, Spain (project IN607A/013), and by the "Secretaría de Estado de Investigación, Desarrollo e Innovación", Spain, within the projects AGL2015–67313-C2–1-R and AGL2015–67313-C2–2-R, which were co-financed with European Social Funds. R. Santiago acknowledges postdoctoral contract "Ramón y Cajal" financed by the "Secretaría de Estado de Investigación, Desarrollo e Innovación" and co-financed by the "Universidad de Vigo", Spain, and the European Social Funds.
The popularization of large-scale federated Genome-Wide Association Studies (GWAS), in which multiple data owners share their genome data to conduct federated analytics, uncovers new privacy issues that have remained unnoticed or have not been given proper attention. Indeed, as soon as diverse types of interested parties (e.g., private or public biocenters and governmental institutions from around the globe) and individuals from heterogeneous populations participate in cooperative studies, interdependent and multi-party privacy appear as crucial issues that are currently not adequately assessed. In federated GWAS environments, the privacy of individuals and parties no longer depends solely on their own behavior but also on that of others, because a collaborative environment opens new credible adversary models. For instance, one might want to tailor the privacy guarantees to withstand the presence of potentially colluding federation members aiming to violate other members' data privacy, as well as the privacy deterioration that might occur in the presence of interdependent genomic data (e.g., due to the presence of relatives in studies or the perpetuation of previous genomic privacy leaks in future studies). In this work, we catalog and discuss the features, unsolved problems, and challenges to tackle toward truly end-to-end private and practical federated GWAS.
Traditional statistical methods for confidentiality protection of statistical databases do not scale well to deal with GWAS databases especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially-private approach to penalized logistic regression.
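As a rough illustration of the kind of release this abstract describes, the sketch below applies the standard Laplace mechanism to per-SNP allele frequencies. It is not the authors' method: the function names, the example counts, and the even split of the privacy budget across SNPs are assumptions made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_frequencies(minor_allele_counts, n_individuals, epsilon, seed=None):
    """Release noisy allele frequencies under epsilon-differential privacy.

    Assumption: each individual carries 0, 1 or 2 copies of the minor allele at
    every SNP, so replacing one individual changes each frequency by at most
    2 / (2N) = 1 / N. The budget epsilon is split evenly across the SNPs.
    """
    rng = random.Random(seed)
    n_snps = len(minor_allele_counts)
    sensitivity = 1.0 / n_individuals
    scale = sensitivity * n_snps / epsilon  # per-SNP budget is epsilon / n_snps
    released = []
    for count in minor_allele_counts:
        freq = count / (2.0 * n_individuals)
        noisy = freq + laplace_noise(scale, rng)
        released.append(min(1.0, max(0.0, noisy)))  # clamping is post-processing
    return released


# Hypothetical counts for three SNPs in a cohort of 685 individuals.
noisy_freqs = private_allele_frequencies([100, 685, 0], 685, epsilon=1.0, seed=0)
```

The noise scale grows with the number of SNPs released, which is one way to see the utility cost the abstract mentions: releasing more statistics under a fixed budget makes each one noisier.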
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, Volume 13, Issue 4, p. 398-403
Self-rated health questions have been proven to be a highly reliable and valid measure of overall health as measured by other indicators in many population groups. It also has been shown to be a very good predictor of mortality, chronic or severe diseases, and the need for services, and is positively correlated with clinical assessments. Genetic factors have been estimated to account for 25–64% of the variance in the liability of self-rated health. The aim of the present study was to identify Single Nucleotide Polymorphisms (SNPs) underlying the heritability of self-rated health by conducting a genome-wide association analysis in a large sample of 6,706 Australian individuals aged 18–92. No genome wide significant SNPs associated with self-rated health could be identified, indicating that self-rated health may be influenced by a large number of SNPs with very small effect size. A very large sample will be needed to identify these SNPs.
Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary data are available at Bioinformatics online.
The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient's profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. But GWAS methodology is limited when considering important genetic aspects of the disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments. ; L.A. was supported by grant BES-2017-081635. This publication is part of R&D and Innovation grant BES-2017-081635 funded by MCIN and by "FSE Investing in your future". I.M. was supported by grant FJCI-2017-31878. This publication is part of R&D and Innovation grant FJCI-2017-31878 funded by MCIN. C.S. received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433.
; Peer Reviewed ; Postprint (published version)
In: Vesci Nacyjanal'naj Akadėmii Navuk Belarusi: Izvestija Nacional'noj Akademii Nauk Belarusi = Proceedings of the National Academy of Sciences of Belarus. Seryja ahrarnych navuk = Serija agrarnych nauk = Agrarian sciences series, Volume 59, Issue 1, p. 71-80
Genetic technologies used in the breeding of small ruminants require a search for new molecular markers of productive traits. The most effective approach for this is a genome-wide association study (GWAS) of single nucleotide polymorphisms (SNPs) with economically valuable traits. The paper presents the results of a study of associations between the frequency of single nucleotide polymorphisms and a rank assessment based on a complex of productive traits (super-elite) in Romanov sheep, using Ovine Infinium HD BeadChip 600K DNA biochips. Eleven SNPs were found to be significantly associated with membership of the "super-elite" group. Five substitutions are located in gene introns and six are intergenic polymorphisms. The most reliable association with productivity was observed for substitution rs410516628 (p = 3.14 × 10⁻⁹), located on chromosome 3. Substitution rs422028000 on chromosome 2 is notable in that it was found in 90 % of haplotypes in the "super-elite" group. Polymorphisms rs411162754 (chromosome 1) and rs417281100 (chromosome 10) turned out to be the rarest in our study: they occurred only in the "super-elite" group and only in a quarter of haplotypes. The genes located near the identified SNPs are mainly associated with metabolic and regulatory processes. Our study has identified several new candidate genes with polymorphisms probably associated with productivity ranking in Romanov sheep: LTBP1, KCNH8, LMX1B, ZBTB43, MSRA, CHPF, PID1 and DNER. The results provide a theoretical basis for further study of candidate genes affecting the expression of phenotypic traits in Romanov sheep. The polymorphisms found to be associated with the productive traits of sheep can be used in practical breeding as molecular genetic markers for the selection of parental pairs.
In: United Kingdom and Ireland Renal Transplant Consortium (UKIRTC) and the Wellcome Trust Case Control Consortium (WTCCC)-3 2018, 'Long- and short-term outcomes in renal allografts with deceased donors: A large recipient and donor genome-wide association study', American Journal of Transplantation, vol. 18, no. 6, pp. 1370-1379. https://doi.org/10.1111/ajt.14594
Improvements in immunosuppression have modified short-term survival of deceased-donor allografts, but not their rate of long-term failure. Mismatches between donor and recipient HLA play an important role in the acute and chronic allogeneic immune response against the graft. Perfect matching at clinically relevant HLA loci does not obviate the need for immunosuppression, suggesting that additional genetic variation plays a critical role in both short- and long-term graft outcomes. By combining patient data and samples from supranational cohorts across the United Kingdom and European Union, we performed the first large-scale genome-wide association study analyzing both donor and recipient DNA in 2094 complete renal transplant-pairs with replication in 5866 complete pairs. We studied deceased-donor grafts allocated on the basis of preferential HLA matching, which provided some control for HLA genetic effects. No strong donor or recipient genetic effects contributing to long- or short-term allograft survival were found outside the HLA region. We discuss the implications for future research and clinical application.
Publisher's version (útgefin grein) ; Rationale: Idiopathic pulmonary fibrosis (IPF) is a complex lung disease characterized by scarring of the lung that is believed to result from an atypical response to injury of the epithelium. Genome-wide association studies have reported signals of association implicating multiple pathways including host defense, telomere maintenance, signaling, and cell-cell adhesion. Objectives: To improve our understanding of factors that increase IPF susceptibility by identifying previously unreported genetic associations. Methods: We conducted genome-wide analyses across three independent studies and meta-analyzed these results to generate the largest genome-wide association study of IPF to date (2,668 IPF cases and 8,591 controls). We performed replication in two independent studies (1,456 IPF cases and 11,874 controls) and functional analyses (including statistical fine-mapping, investigations into gene expression, and testing for enrichment of IPF susceptibility signals in regulatory regions) to determine putatively causal genes. Polygenic risk scores were used to assess the collective effect of variants not reported as associated with IPF. Measurements and Main Results: We identified and replicated three new genome-wide significant (P < 5 × 10⁻⁸) signals of association with IPF susceptibility (associated with altered gene expression of KIF15, MAD1L1, and DEPTOR) and confirmed associations at 11 previously reported loci. Polygenic risk score analyses showed that the combined effect of many thousands of as yet unreported IPF susceptibility variants contributes to IPF susceptibility. Conclusions: The observation that decreased DEPTOR expression associates with increased susceptibility to IPF supports recent studies demonstrating the importance of mTOR signaling in lung fibrosis. New signals of association implicating KIF15 and MAD1L1 suggest a possible role of mitotic spindle-assembly genes in IPF susceptibility. ; R.J.A.
is an Action for Pulmonary Fibrosis Research Fellow. L.V.W. holds a GSK/British Lung Foundation Chair in Respiratory Research. R.G.J. is supported by a National Institute for Health Research (NIHR) Research Professorship (NIHR reference RP-2017-08-ST2-014). I.N. is supported by the NHLBI (R01HL130796). B.G.-G. is funded by Agencia Canaria de Investigación, Innovación y Sociedad de la Información (TESIS2015010057) cofunded by European Social Fund. J.M.O. is supported by the NHLBI (K23HL138190). C.F. is supported by the Spanish Ministry of Science, Innovation and Universities (grant RTC-2017-6471-1; Ministerio de Ciencia e Innovacion/Agencia Estatal de Investigación/Fondo Europeo de Desarrollo Regional, Unión Europea) cofinanced by the European Regional Development Funds "A way of making Europe" from the European Union and by agreement OA17/008 with Instituto Tecnológico y de Energías Renovables to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology. The Spain Biobank array genotyping service was performed at CEGEN-PRB3-ISCIII, which is supported by PT17/0019, of the PE I+D+i 2013–2016, funded by Instituto de Salud Carlos III, and cofinanced by the European Regional Development Funds. P.L.M. is an Action for Pulmonary Fibrosis Research Fellow. M.O. is a fellow of the Parker B. Francis Foundation and a Scholar of the Michael Smith Foundation for Health Research. B.D.H. is supported by NIH K08 HL136928, Parker B. Francis Research Opportunity Award. M.H.C. and G.M.H. are supported by NHLBI grants R01HL113264 (M.H.C.), R01HL137927 (M.H.C.), R01HL135142 (M.H.C. and G.M.H.), R01111024 (G.M.H.), and R01130974 (G.M.H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funding body has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. 
T.M.M. is supported by an NIHR Clinician Scientist Fellowship (NIHR Ref: CS-2013-13-017) and a British Lung Foundation Chair in Respiratory Research (C17-3). M.D.T. is supported by a Wellcome Trust Investigator Award (WT202849/Z/16/Z). The research was partially supported by the NIHR Leicester Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the National Health Service (NHS), the NIHR, or the Department of Health. I.P.H. was partially supported by the NIHR Nottingham Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. I.S. is supported by Medical Research Council (G1000861) and Asthma UK (AUK-PG-2013-188). D.F. was supported by an Intermediate Fellowship from the Wellcome Trust (097152/Z/11/Z). This work was partially supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. V.N. is funded by an NIHR Clinical Lectureship. G.G. is supported by project grant 141513-051 from the Icelandic Research Fund and Landspitali Scientific Fund A-2016-023, A-2017-029, and A-2018-025. D.J.L. and A.M. are supported by Multi-Ethnic Study of Atherosclerosis (MESA) and the MESA SNP Health Association Resource (SHARe) project are conducted and supported by the NHLBI in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881, and DK063491. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California) and the Broad Institute of Harvard and Massachusetts Institute of Technology (Boston, Massachusetts) using the Affymetrix Genome-Wide Human SNP Array 6.0. 
This work was supported by NIH grants R01 HL131565 (A.M.), R01 HL103676 (D.J.L.), and R01 HL137234 (D.J.L.). ; Peer Reviewed
Frost tolerance is a key trait with economic and agronomic importance in barley because it is a major component of winter hardiness, and therefore limits the geographical distribution of the crop and the effective transfer of quality traits between spring and winter crop types. Three main frost tolerance QTL (Fr-H1, Fr-H2 and Fr-H3) have been identified from bi-parental genetic mapping but it can be argued that those mapping populations only capture a portion of the genetic diversity of the species. A genetically broad dataset consisting of 184 genotypes, representative of the barley gene pool cultivated in the Mediterranean basin over an extended time period, was genotyped with 1536 SNP markers. Frost tolerance phenotype scores were collected from two trial sites, Foradada (Spain) and Fiorenzuola (Italy) and combined with the genotypic data in genome wide association analyses (GWAS) using Eigenstrat and kinship approaches to account for population structure. ; The above work was partially funded by the European Union-INCO-MED program (MABDE ICA3-CT2002-10026). The James Hutton Institute received grant in aid from the Scottish Government's Rural and Environment Science and Analytical Services Division. The Centre UdL-IRTA forms part of the Centre CONSOLIDER on Agrigenomics and acknowledges competitive grants and GEN2006-28560-E and AGL2011-30529-C02-02 from the Spanish Ministry of Science and Innovation.
BACKGROUND: The epilepsies are a clinically heterogeneous group of neurological disorders. Despite strong evidence for heritability, genome-wide association studies have had little success in identification of risk loci associated with epilepsy, probably because of relatively small sample sizes and insufficient power. We aimed to identify risk loci through meta-analyses of genome-wide association studies for all epilepsy and the two largest clinical subtypes (genetic generalised epilepsy and focal epilepsy). METHODS: We combined genome-wide association data from 12 cohorts of individuals with epilepsy and controls from population-based datasets. Controls were ethnically matched with cases. We phenotyped individuals with epilepsy into categories of genetic generalised epilepsy, focal epilepsy, or unclassified epilepsy. After standardised filtering for quality control and imputation to account for different genotyping platforms across sites, investigators at each site conducted a linear mixed-model association analysis for each dataset. Combining summary statistics, we conducted fixed-effects meta-analyses of all epilepsy, focal epilepsy, and genetic generalised epilepsy. We set the genome-wide significance threshold at p<1·66 × 10⁻⁸. FINDINGS: We included 8696 cases and 26 157 controls in our analysis. Meta-analysis of the all-epilepsy cohort identified loci at 2q24.3 (p=8·71 × 10⁻¹⁰), implicating SCN1A, and at 4p15.1 (p=5·44 × 10⁻⁹), harbouring PCDH7, which encodes a protocadherin molecule not previously implicated in epilepsy. For the cohort of genetic generalised epilepsy, we noted a single signal at 2p16.1 (p=9·99 × 10⁻⁹), implicating VRK2 or FANCL. No single nucleotide polymorphism achieved genome-wide significance for focal epilepsy.
INTERPRETATION: This meta-analysis describes a new locus not previously implicated in epilepsy and provides further evidence about the genetic architecture of these disorders, with the ultimate aim of assisting in disease classification and prognosis. The data suggest that specific loci can act pleiotropically raising risk for epilepsy broadly, or can have effects limited to a specific epilepsy subtype. Future genetic analyses might benefit from both lumping (ie, grouping of epilepsy types together) or splitting (ie, analysis of specific clinical subtypes). FUNDING: International League Against Epilepsy and multiple governmental and philanthropic agencies.
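The fixed-effects meta-analysis step named in the METHODS of this abstract can be sketched as an inverse-variance weighted combination of per-cohort effect estimates for one variant. This is a generic textbook version, not the study's actual pipeline; the function name and the example effect sizes are hypothetical, and only the threshold of 1.66 × 10⁻⁸ is taken from the abstract.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis for one variant.

    Returns the pooled effect, its standard error, the z-score and a
    two-sided P-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-cohort estimates (log odds ratios) for a single SNP.
beta, se, z, p = fixed_effects_meta([0.2, 0.2], [0.1, 0.1])
genome_wide_significant = p < 1.66e-8  # threshold quoted in the abstract
```

Cohorts with smaller standard errors receive larger weights, so large cohorts dominate the pooled estimate; the model assumes a single true effect shared by all cohorts.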
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, Volume 15, Issue 5, p. 615-623
Recent Genome-Wide Association Studies (GWAS) have identified four low-penetrance ovarian cancer susceptibility loci. We hypothesized that further moderate- or low-penetrance variants exist among the subset of single-nucleotide polymorphisms (SNPs) not well tagged by the genotyping arrays used in the previous studies, which would account for some of the remaining risk. We therefore conducted a time- and cost-effective stage 1 GWAS on 342 invasive serous cases and 643 controls genotyped on pooled DNA using the high-density Illumina 1M-Duo array. We followed up 20 of the most significantly associated SNPs, which are not well tagged by the lower-density arrays used by the published GWAS, by genotyping them on individual DNA. Most of the top 20 SNPs were clearly validated by individually genotyping the samples used in the pools. However, none of the 20 SNPs replicated when tested for association in a much larger stage 2 set of 4,651 cases and 6,966 controls from the Ovarian Cancer Association Consortium. Given that most of the top 20 SNPs from pooling were validated in the same samples by individual genotyping, the lack of replication is likely to be due to the relatively small sample size in our stage 1 GWAS rather than to problems with the pooling approach. We conclude that there are unlikely to be any moderate or large effects on ovarian cancer risk untagged by less dense arrays. However, our study lacked power to make clear statements on the existence of hitherto untagged small-effect variants.