Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the 'spuriously correlated' SNP merely happens to be correlated with the 'truly causal' SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. ; E.F. and L.B. gratefully acknowledge financial support from the European Research Council (grant 295642, The Foundations of Economic Preferences, FEP). D.S.
gratefully acknowledges financial support from the German National Science Foundation (DFG, grant SCHU 2828/2-1, Inference statistical methods for behavioral genetics and neuroeconomics). A.N. gratefully acknowledges support from the Instituto de Salud Carlos III (grants RD12/0032/0011 and PT13/0001/0026) and the Spanish Government Grant (BFU2012-38236) and from FEDER.
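The contrast between marginal and joint testing described in the abstract above can be illustrated with a toy simulation. The sketch below is not the hierGWAS procedure: it assumes an ordinary linear model rather than a generalized linear model, only two SNPs, and hypothetical parameter values (minor allele frequency 0.3, 90% genotype agreement between the two SNPs, unit effect size). All function and variable names are illustrative.

```python
import random


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def simulate(n=20000, seed=0, maf=0.3, agreement=0.9, effect=1.0):
    """Return (marginal slope of tag SNP, joint coef. of causal SNP, joint coef. of tag SNP)."""
    rng = random.Random(seed)

    def genotype():
        # Number of minor alleles (0, 1 or 2) for one individual.
        return (rng.random() < maf) + (rng.random() < maf)

    causal, tag, phenotype = [], [], []
    for _ in range(n):
        g = genotype()
        # The tag SNP copies the causal SNP most of the time (assumed correlation).
        h = g if rng.random() < agreement else genotype()
        causal.append(g)
        tag.append(h)
        # Only the causal SNP influences the phenotype.
        phenotype.append(effect * g + rng.gauss(0.0, 1.0))

    c, t, y = center(causal), center(tag), center(phenotype)
    s_cc, s_tt, s_ct = dot(c, c), dot(t, t), dot(c, t)
    s_cy, s_ty = dot(c, y), dot(t, y)

    # Marginal analysis: regress the phenotype on the tag SNP alone.
    marginal_tag = s_ty / s_tt

    # Joint analysis: solve the 2x2 normal equations for both SNPs together.
    det = s_cc * s_tt - s_ct ** 2
    joint_causal = (s_tt * s_cy - s_ct * s_ty) / det
    joint_tag = (s_cc * s_ty - s_ct * s_cy) / det
    return marginal_tag, joint_causal, joint_tag


marginal_tag, joint_causal, joint_tag = simulate()
print(f"marginal slope, tag SNP:   {marginal_tag:.3f}")
print(f"joint coef., causal SNP:   {joint_causal:.3f}")
print(f"joint coef., tag SNP:      {joint_tag:.3f}")
```

Under these assumptions the tag SNP shows a clear marginal association with the phenotype, while its coefficient in the joint model is close to zero once the causal SNP is included.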
[Background] Plant breeding has been proposed as one of the most effective and environmentally safe methods to control fungal infection and to reduce fumonisin accumulation. However, conventional breeding can be hampered by the complex genetic architecture of resistance to fumonisin accumulation and marker-assisted selection is proposed as an efficient alternative. In the current study, GWAS has been performed for the first time for detecting high-resolution QTL for resistance to fumonisin accumulation in maize kernels complementing published GWAS results for Fusarium ear rot. ; [Results] Thirty-nine SNPs significantly associated with resistance to fumonisin accumulation in maize kernels were found and clustered into 17 QTL. Novel QTLs for fumonisin content would be at bins 3.02, 5.02, 7.05 and 8.07. Genes with annotated functions probably implicated in resistance to pathogens based on previous studies have been highlighted. ; [Conclusions] Breeding approaches to fix favorable functional variants for genes implicated in maize immune response signaling may be especially useful to reduce kernel contamination with fumonisins without significantly interfering in mycelia development and growth and, consequently, in the beneficial endophytic behavior of Fusarium verticillioides. ; This research was funded by the Autonomous Government of Galicia, Spain (project IN607A/013), and by the "Secretaría de Estado de Investigación, Desarrollo e Innovación", Spain, within the projects AGL2015–67313-C2–1-R and AGL2015–67313-C2–2-R, which were co-financed with European Social Funds. R. Santiago acknowledges postdoctoral contract "Ramón y Cajal" financed by the "Secretaría de Estado de Investigación, Desarrollo e Innovación" and co-financed by the "Universidad de Vigo", Spain, and the European Social Funds.
The popularization of large-scale federated Genome-Wide Association Studies (GWAS), in which multiple data owners share their genome data to conduct federated analytics, uncovers new privacy issues that have remained unnoticed or have not been given proper attention. Indeed, as soon as diverse interested parties (e.g., private or public biocenters and governmental institutions from around the globe) and individuals from heterogeneous populations participate in cooperative studies, interdependent and multi-party privacy become crucial issues that are currently not adequately assessed. In fact, in federated GWAS environments, the privacy of individuals and parties no longer depends solely on their own behavior but also on that of others, because a collaborative environment opens new credible adversary models. For instance, one might want to tailor the privacy guarantees to withstand both the presence of potentially colluding federation members aiming to violate other members' data privacy and the privacy deterioration that might occur in the presence of interdependent genomic data (e.g., due to the presence of relatives in studies or the perpetuation of previous genomic privacy leaks in future studies). In this work, we catalog and discuss the features, unsolved problems, and challenges to tackle toward truly end-to-end private and practical federated GWAS.
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, Volume 13, Issue 4, pp. 398-403
Self-rated health questions have been proven to be a highly reliable and valid measure of overall health as measured by other indicators in many population groups. Self-rated health has also been shown to be a very good predictor of mortality, chronic or severe diseases, and the need for services, and is positively correlated with clinical assessments. Genetic factors have been estimated to account for 25–64% of the variance in the liability of self-rated health. The aim of the present study was to identify Single Nucleotide Polymorphisms (SNPs) underlying the heritability of self-rated health by conducting a genome-wide association analysis in a large sample of 6,706 Australian individuals aged 18–92. No genome-wide significant SNPs associated with self-rated health could be identified, indicating that self-rated health may be influenced by a large number of SNPs with very small effect sizes. A very large sample will be needed to identify these SNPs.
Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior effects and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. The bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary data are available at Bioinformatics online.
The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient's profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. But GWAS methodology is limited when considering important genetic aspects of the disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments. ; L.A. was supported by grant BES-2017-081635. This publication is part of R&D and Innovation grant BES-2017-081635 funded by MCIN and by "FSE Investing in your future". I.M. was supported by grant FJCI-2017-31878. This publication is part of R&D and Innovation grant FJCI-2017-31878 funded by MCIN. C.S. received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433.
; Peer Reviewed ; Postprint (published version)
In: United Kingdom and Ireland Renal Transplant Consortium (UKIRTC) and the Wellcome Trust Case Control Consortium (WTCCC)-3 2018, 'Long- and short-term outcomes in renal allografts with deceased donors: A large recipient and donor genome-wide association study', American Journal of Transplantation, vol. 18, no. 6, pp. 1370-1379. https://doi.org/10.1111/ajt.14594
Improvements in immunosuppression have modified short-term survival of deceased-donor allografts, but not their rate of long-term failure. Mismatches between donor and recipient HLA play an important role in the acute and chronic allogeneic immune response against the graft. Perfect matching at clinically relevant HLA loci does not obviate the need for immunosuppression, suggesting that additional genetic variation plays a critical role in both short- and long-term graft outcomes. By combining patient data and samples from supranational cohorts across the United Kingdom and European Union, we performed the first large-scale genome-wide association study analyzing both donor and recipient DNA in 2094 complete renal transplant-pairs with replication in 5866 complete pairs. We studied deceased-donor grafts allocated on the basis of preferential HLA matching, which provided some control for HLA genetic effects. No strong donor or recipient genetic effects contributing to long- or short-term allograft survival were found outside the HLA region. We discuss the implications for future research and clinical application.
Publisher's version (útgefin grein) ; Rationale: Idiopathic pulmonary fibrosis (IPF) is a complex lung disease characterized by scarring of the lung that is believed to result from an atypical response to injury of the epithelium. Genome-wide association studies have reported signals of association implicating multiple pathways including host defense, telomere maintenance, signaling, and cell-cell adhesion. Objectives: To improve our understanding of factors that increase IPF susceptibility by identifying previously unreported genetic associations. Methods: We conducted genome-wide analyses across three independent studies and meta-analyzed these results to generate the largest genome-wide association study of IPF to date (2,668 IPF cases and 8,591 controls). We performed replication in two independent studies (1,456 IPF cases and 11,874 controls) and functional analyses (including statistical fine-mapping, investigations into gene expression, and testing for enrichment of IPF susceptibility signals in regulatory regions) to determine putatively causal genes. Polygenic risk scores were used to assess the collective effect of variants not reported as associated with IPF. Measurements and Main Results: We identified and replicated three new genome-wide significant (P < 5 × 10^-8) signals of association with IPF susceptibility (associated with altered gene expression of KIF15, MAD1L1, and DEPTOR) and confirmed associations at 11 previously reported loci. Polygenic risk score analyses showed that the combined effect of many thousands of as yet unreported IPF susceptibility variants contributes to IPF susceptibility. Conclusions: The observation that decreased DEPTOR expression associates with increased susceptibility to IPF supports recent studies demonstrating the importance of mTOR signaling in lung fibrosis. New signals of association implicating KIF15 and MAD1L1 suggest a possible role of mitotic spindle-assembly genes in IPF susceptibility. ; R.J.A.
is an Action for Pulmonary Fibrosis Research Fellow. L.V.W. holds a GSK/British Lung Foundation Chair in Respiratory Research. R.G.J. is supported by a National Institute for Health Research (NIHR) Research Professorship (NIHR reference RP-2017-08-ST2-014). I.N. is supported by the NHLBI (R01HL130796). B.G.-G. is funded by Agencia Canaria de Investigación, Innovación y Sociedad de la Información (TESIS2015010057) cofunded by European Social Fund. J.M.O. is supported by the NHLBI (K23HL138190). C.F. is supported by the Spanish Ministry of Science, Innovation and Universities (grant RTC-2017-6471-1; Ministerio de Ciencia e Innovacion/Agencia Estatal de Investigación/Fondo Europeo de Desarrollo Regional, Unión Europea) cofinanced by the European Regional Development Funds "A way of making Europe" from the European Union and by agreement OA17/008 with Instituto Tecnológico y de Energías Renovables to strengthen scientific and technological education, training, research, development and innovation in Genomics, Personalized Medicine and Biotechnology. The Spain Biobank array genotyping service was performed at CEGEN-PRB3-ISCIII, which is supported by PT17/0019, of the PE I+D+i 2013–2016, funded by Instituto de Salud Carlos III, and cofinanced by the European Regional Development Funds. P.L.M. is an Action for Pulmonary Fibrosis Research Fellow. M.O. is a fellow of the Parker B. Francis Foundation and a Scholar of the Michael Smith Foundation for Health Research. B.D.H. is supported by NIH K08 HL136928, Parker B. Francis Research Opportunity Award. M.H.C. and G.M.H. are supported by NHLBI grants R01HL113264 (M.H.C.), R01HL137927 (M.H.C.), R01HL135142 (M.H.C. and G.M.H.), R01111024 (G.M.H.), and R01130974 (G.M.H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funding body has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. 
T.M.M. is supported by an NIHR Clinician Scientist Fellowship (NIHR Ref: CS-2013-13-017) and a British Lung Foundation Chair in Respiratory Research (C17-3). M.D.T. is supported by a Wellcome Trust Investigator Award (WT202849/Z/16/Z). The research was partially supported by the NIHR Leicester Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the National Health Service (NHS), the NIHR, or the Department of Health. I.P.H. was partially supported by the NIHR Nottingham Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. I.S. is supported by Medical Research Council (G1000861) and Asthma UK (AUK-PG-2013-188). D.F. was supported by an Intermediate Fellowship from the Wellcome Trust (097152/Z/11/Z). This work was partially supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. V.N. is funded by an NIHR Clinical Lectureship. G.G. is supported by project grant 141513-051 from the Icelandic Research Fund and Landspitali Scientific Fund A-2016-023, A-2017-029, and A-2018-025. D.J.L. and A.M. are supported by Multi-Ethnic Study of Atherosclerosis (MESA) and the MESA SNP Health Association Resource (SHARe) project are conducted and supported by the NHLBI in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881, and DK063491. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California) and the Broad Institute of Harvard and Massachusetts Institute of Technology (Boston, Massachusetts) using the Affymetrix Genome-Wide Human SNP Array 6.0. 
This work was supported by NIH grants R01 HL131565 (A.M.), R01 HL103676 (D.J.L.), and R01 HL137234 (D.J.L.). ; Peer Reviewed
Frost tolerance is a key trait with economic and agronomic importance in barley because it is a major component of winter hardiness, and therefore limits the geographical distribution of the crop and the effective transfer of quality traits between spring and winter crop types. Three main frost tolerance QTL (Fr-H1, Fr-H2 and Fr-H3) have been identified from bi-parental genetic mapping, but it can be argued that those mapping populations only capture a portion of the genetic diversity of the species. A genetically broad dataset consisting of 184 genotypes, representative of the barley gene pool cultivated in the Mediterranean basin over an extended time period, was genotyped with 1536 SNP markers. Frost tolerance phenotype scores were collected from two trial sites, Foradada (Spain) and Fiorenzuola (Italy), and combined with the genotypic data in genome-wide association analyses (GWAS) using Eigenstrat and kinship approaches to account for population structure. ; The above work was partially funded by the European Union-INCO-MED program (MABDE ICA3-CT2002-10026). The James Hutton Institute received grant in aid from the Scottish Government's Rural and Environment Science and Analytical Services Division. The Centre UdL-IRTA forms part of the Centre CONSOLIDER on Agrigenomics and acknowledges competitive grants GEN2006-28560-E and AGL2011-30529-C02-02 from the Spanish Ministry of Science and Innovation.
In: Twin research and human genetics: the official journal of the International Society for Twin Studies (ISTS) and the Human Genetics Society of Australasia, Volume 15, Issue 5, pp. 615-623
Recent Genome-Wide Association Studies (GWAS) have identified four low-penetrance ovarian cancer susceptibility loci. We hypothesized that further moderate- or low-penetrance variants exist among the subset of single-nucleotide polymorphisms (SNPs) not well tagged by the genotyping arrays used in the previous studies, which would account for some of the remaining risk. We therefore conducted a time- and cost-effective stage 1 GWAS on 342 invasive serous cases and 643 controls genotyped on pooled DNA using the high-density Illumina 1M-Duo array. We followed up 20 of the most significantly associated SNPs, which are not well tagged by the lower density arrays used by the published GWAS, and genotyped them on individual DNA. Most of the top 20 SNPs were clearly validated by individually genotyping the samples used in the pools. However, none of the 20 SNPs replicated when tested for association in a much larger stage 2 set of 4,651 cases and 6,966 controls from the Ovarian Cancer Association Consortium. Given that most of the top 20 SNPs from pooling were validated in the same samples by individual genotyping, the lack of replication is likely to be due to the relatively small sample size in our stage 1 GWAS rather than due to problems with the pooling approach. We conclude that there are unlikely to be any moderate or large effects on ovarian cancer risk untagged by less dense arrays. However, our study lacked power to make clear statements on the existence of hitherto untagged small-effect variants.
WOS: 000373197500020 ; PubMed ID: 27016271 ; BACKGROUND AND OBJECTIVE: Developmental language disorder (DLD) is a highly prevalent neurodevelopmental disorder associated with negative outcomes in different domains; the etiology of DLD is unknown. To investigate the genetic underpinnings of DLD, we performed genome-wide association and whole exome sequencing studies in a geographically isolated population with a substantially elevated prevalence of the disorder (i.e., the AZ sample). METHODS: DNA samples were collected from 359 individuals for the genome-wide association study and from 12 severely affected individuals for whole exome sequencing. Multifaceted phenotypes, representing major domains of expressive language functioning, were derived from collected speech samples. RESULTS: Gene-based analyses revealed a significant association between SETBP1 and complexity of linguistic output (P = 5.47 × 10^-7). The analysis of exome variants revealed coding sequence variants in 14 genes, most of which play a role in neural development. Targeted enrichment analysis implicated myocyte enhancer factor-2 (MEF2)-regulated genes in DLD in the AZ population. The main findings were successfully replicated in an independent cohort of children at risk for related disorders (n = 37). CONCLUSIONS: MEF2-regulated pathways were identified as potential candidate pathways in the etiology of DLD. Several genes (including the candidate SETBP1 and other MEF2-related genes) seem to jointly influence certain, but not all, facets of the DLD phenotype. Even when genetic and environmental diversity is reduced, DLD is best conceptualized as etiologically complex. Future research should establish whether the signals detected in the AZ population can be replicated in other samples and languages and provide further characterization of the identified pathway.
; National Institute of Health [R01 DC007665, P50 HD052120]; NIH Centers for Mendelian Genomics [5U54HG006504]; National Science Foundation Integrative Graduate Education and Research Traineeship grant [114399]; Government of the Russian Federation [14.Z50.31.0027]; National Institutes of Health (NIH) ; Supported by National Institute of Health grants R01 DC007665 (Dr Grigorenko, Principal Investigator) and P50 HD052120 (Richard Wagner, Principal Investigator), NIH Centers for Mendelian Genomics (5U54HG006504), National Science Foundation Integrative Graduate Education and Research Traineeship grant 114399 (Dr Magnuson, Principal Investigator), and grant 14.Z50.31.0027 from the Government of the Russian Federation (Dr Grigorenko, Principal Investigator). Funded by the National Institutes of Health (NIH).
Leukocyte telomere length (LTL) is a heritable biomarker of genomic aging. In this study, we perform a genome-wide meta-analysis of LTL by pooling densely genotyped and imputed association results across large-scale European-descent studies including up to 78,592 individuals. We identify 49 genomic regions at a false discovery rate (FDR) [...] 350,000 UK Biobank participants suggest that genetically shorter telomere length increases the risk of hypothyroidism and decreases the risk of thyroid cancer, lymphoma, and a range of proliferative conditions. Our results replicate previously reported associations with increased risk of coronary artery disease and lower risk for multiple cancer types. Our findings substantially expand current knowledge on genes that regulate LTL and their impact on human health and disease. ; The ENGAGE Project was funded under the European Union Framework 7 – Health Theme (HEALTH-F4-2007- 201413). The InterAct project received funding from the European Union (Integrated Project LSHM-CT-2006-037197 in the Framework Programme 6 of the European Community). The EPIC-CVD study was supported by core funding from the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946), the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), and the National Institute for Health Research [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. C.P.N. is funded by the BHF. V.C., C.P.N. and N.J.S. are supported by the NIHR Leicester Cardiovascular Biomedical Research Centre and N.J.S. holds an NIHR Senior Investigator award. Chen Li is supported by a 4-year Wellcome Trust PhD Studentship; CL, LAL, NJW are funded by the Medical Research Council (MC_UU_12015/1). NJW is an NIHR Senior Investigator. JD is funded by the National Institute for Health Research [Senior Investigator Award]. Cohort specific and further acknowledgements are given in the Supplemental Data.