The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient's profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. Nevertheless, GWAS methodology is limited when considering important genetic aspects of disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments. ; L.A. was supported by grant BES-2017-081635. This publication is part of R&D and Innovation grant BES-2017-081635 funded by MCIN and by "FSE Investing in your future". I.M. was supported by grant FJCI-2017-31878. This publication is part of R&D and Innovation grant FJCI-2017-31878 funded by MCIN. C.S. received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433. 
; Peer Reviewed ; Postprint (published version)
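The abstract above describes GWAS as a statistical comparison of variant frequencies between diseased and non-diseased individuals. As a minimal sketch of that idea for a single variant, the following computes a Pearson chi-square test on a 2x2 table of allele counts. The function name and the counts are hypothetical and chosen only for illustration; real studies typically use regression models with covariates rather than this bare test.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio). Assumes every cell is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one variant (not taken from any study).
chi2, p_value, odds_ratio = allelic_chi_square(300, 700, 200, 800)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

In a genome-wide setting this test would be repeated for every variant, which is why a stringent significance threshold (5e-8 is a common convention) is applied to the resulting p-values.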
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help to fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high coverage (30x) whole-genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with Mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies. ; GCAT|Genomes for Life, a cohort study of the Genomes of Catalonia, Fundació Institut Germans Trias i Pujol (IGTP); IGTP is part of the CERCA Program/Generalitat de Catalunya; GCAT is supported by Acción de Dinamización del ISCIII-MINECO; Ministry of Health of the Generalitat of Catalunya [ADE 10/00026]; Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR) [2017-SGR 529]; B.C. is supported by national grants [PI18/01512]; X.F. 
is supported by VEIS project [001-P-001647] (co-funded by European Regional Development Fund (ERDF), 'A way to build Europe'); a full list of the investigators who contributed to the generation of the GCAT data is available from www.genomesforlife.com/; Severo Ochoa Program, awarded by the Spanish Government [SEV-2011-00067 and SEV2015-0493]; Spanish Ministry of Science and Innovation [TIN2015-65316-P]; Generalitat de Catalunya [2014-SGR-1051 to D.T.]; Agencia Estatal de Investigación (AEI, Spain) [BFU2016-77244-R and PID2019-107836RB-I00]; European Regional Development Fund (FEDER, EU) (to M.C.); Spanish Ministry of Science and Innovation [FPI BES-2016-0077344 to J.V.M.]; C.S. received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement [H2020-MSCA-COFUND-2016-754433]; this study made use of data generated by the UK10K Consortium from UK10K COHORT IMPUTATION [EGAS00001000713]; formal agreement with the Barcelona Supercomputing Center (BSC); this study made use of data generated by the 'Genome of the Netherlands' project, which is funded by the Netherlands Organization for Scientific Research [184021007], allowing us to use the GoNL reference panel containing SVs, upon request (GoNL Data Access request 2019203); this study also used data generated by the Haplotype Reference Consortium (HRC) accessed through the European Genome-phenome Archive with the accession number EGAD00001002729; formal agreement of the Barcelona Supercomputing Center (BSC) with WTSI; this study made use of data generated by the 1000 Genomes (1000G), accessed through the FTP portal (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/); this study used the GeneHancer-for-AnnotSV dump for GeneCards Suite Version 4.14, through a formal agreement between the BSC and the Weizmann Institute of Science. 
; Peer Reviewed ; "Article signed by 21 authors: Jordi Valls-Margarit, Iván Galván-Femenía, Daniel Matías-Sánchez, Natalia Blay, Montserrat Puiggròs, Anna Carreras, Cecilia Salvoro, Beatriz Cortés, Ramon Amela, Xavier Farre, Jon Lerga-Jaso, Marta Puig, Jose Francisco Sánchez-Herrero, Victor Moreno, Manuel Perucho, Lauro Sumoy, Lluís Armengol, Olivier Delaneau, Mario Cáceres, Rafael de Cid, David Torrents" ; Postprint (published version)
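The catalogue figures in the abstract above can be checked with simple arithmetic, and the stated ≥50 bp threshold suggests how a variant might be assigned to a class. The sketch below is an illustration under assumptions: classifying by the length difference between reference and alternate alleles is a common convention, not necessarily the procedure used in the study, and the names `classify_variant` and `SV_MIN_LENGTH` are hypothetical.

```python
SV_MIN_LENGTH = 50  # threshold stated in the abstract: SVs are >= 50 bp


def classify_variant(ref, alt):
    """Coarse variant class from REF/ALT allele strings (assumed convention)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Equal-length multi-base substitutions are not covered by the abstract.
        return "other"
    return "SV" if size >= SV_MIN_LENGTH else "indel"


# Counts reported in the abstract.
catalogue = {"SV": 89_178, "SNV": 30_325_064, "indel": 5_017_199}
total = sum(catalogue.values())
sv_share = catalogue["SV"] / total

print(classify_variant("A", "G"))             # single-base substitution
print(classify_variant("A", "ACGT"))          # 3 bp insertion
print(classify_variant("A", "A" + "C" * 60))  # 60 bp insertion
print(total, f"{sv_share:.2%}")
```

The three reported classes add up exactly to the catalogue total of 35 431 441, and SVs make up roughly a quarter of one percent of all catalogued variants.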
Funding for open access charge: the same funders and grants acknowledged above.
Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases. ; This work has been sponsored by the grants SEV-2011-00067 and SEV2015-0493 of the Severo Ochoa Program, awarded by the Spanish Government, by the grant TIN2015-65316-P, awarded by the Spanish Ministry of Science and Innovation, and by the Generalitat de Catalunya (contract 2014-SGR-1051). This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by a Sara Borrell Fellowship from the Instituto Carlos III, a Beatriu de Pinós fellowship from the Agency for Management of University and Research Grants (AGAUR) and by the American Diabetes Association Innovative and Clinical Translational Award 1-19-ICTS-068. Sílvia Bonàs was supported by an FI-DGR Fellowship from FIDGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya), and a 'Juan de la Cierva' postdoctoral fellowship (MINECO;FJCI-2017-32090). 
Cecilia Salvoro received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433. Cristian Ramon-Cortes's pre-doctoral contract is financed by the Spanish Ministry of Science, Innovation, and Universities under contract BES-2016-076791. Elizabeth G. Atkinson was supported by the National Institute of Mental Health (grants K01MH121659 and T32MH017119). Jose Florez was supported by NIH/NIDDK award K24 DK110550. This study made use of data generated by the UK10K Consortium, derived from samples from UK10K COHORT IMPUTATION (EGAS00001000713). A full list of the investigators who contributed to the generation of the data is available at www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310. This study made use of data generated by the 'Genome of the Netherlands' project, which is funded by the Netherlands Organization for Scientific Research (grant no. 184021007). The data were made available as a Rainbow Project of BBMRI-NL. Samples were contributed by LifeLines (http://lifelines.nl/lifelines-research/general), the Leiden Longevity Study (http://www.healthy-ageing.nl; http://www.langleven.net), the Netherlands Twin Registry (NTR: http://www.tweelingenregister.org), the Rotterdam studies (http://www.erasmus-epidemiology.nl/rotterdamstudy) and the Genetic Research in Isolated Populations program (http://www.epib.nl/research/geneticepi/research.html#gip). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). This study also made use of data generated by The Haplotype Reference Consortium (HRC) accessed through The European Genome-phenome Archive at the European Bioinformatics Institute with the accession number EGAD00001002729, under a formal agreement of the Barcelona Supercomputing Center (BSC) with WTSI. 
This research was also conducted using the UK Biobank Resource (application numbers 31063 and 27892). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 07/16/2019. We acknowledge PRACE for awarding us access to both the MareNostrum supercomputer of the Barcelona Supercomputing Center, based in Barcelona, Spain, and the SuperMUC supercomputer of the Leibniz Supercomputing Center (LRZ), based in Garching, Germany (proposal numbers 2016143358 and 2016163985). The technical support group from the Barcelona Supercomputing Center is gratefully acknowledged. Finally, we thank all the Computational Genomics group at the BSC for their helpful discussions and valuable comments on the manuscript. We also acknowledge Elias Rodriguez Fos for designing the GUIDANCE logo. ; Peer Reviewed ; Article signed by 22 authors: Marta Guindo-Martínez 1,18; Ramon Amela 1,18; Silvia Bonàs-Guarch 1,2,3; Montserrat Puiggròs 1; Cecilia Salvoro 1; Irene Miguel-Escalada 1,2,3; Caitlin E. Carey 4,5; Joanne B. Cole 6,7,8,9; Sina Rüeger 10; Elizabeth Atkinson 4,5,11; Aaron Leong 8,12; Friman Sanchez 1; Cristian Ramon-Cortes 1; Jorge Ejarque 1; Duncan S. Palmer 4,5,17; Mitja Kurki 10; FinnGen Consortium*; Krishna Aragam 11,13,14; Jose C. Florez 6,7,15; Rosa M. Badia 1; Josep M. Mercader 1,6,7,15,19✉ & David Torrents 1,16,19✉ *A full list of members and their affiliations appears in the Supplementary Information. 1 Barcelona Supercomputing Center (BSC), Barcelona, Spain. 2 Regulatory Genomics and Diabetes, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain. 3 CIBER de Diabetes y Enfermedades Metabólicas Asociadas, Madrid, Spain. 
4 Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 5 Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA. 6 Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 7 Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA. 8 Harvard Medical School, Boston, MA, USA. 9 Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA. 10 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland. 11 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 12 Department of Medicine, Massachusetts General Hospital, Boston, MA, USA. 13 Cardiology Division, Massachusetts General Hospital, Boston, MA, USA. 14 Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA. 15 Department of Medicine, Harvard Medical School, Boston, MA, USA. 16 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain. 17 Present address: GENOMICS plc, Oxford, UK. 18 These authors contributed equally: Marta Guindo-Martínez, Ramon Amela. 19 These authors jointly supervised this work: Josep M. Mercader, David Torrents. ; Postprint (published version)
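The GUIDANCE abstract above contrasts the additive model with non-additive models and notes that recessive variants are harder to detect under additive testing. The following sketch illustrates how genotypes can be encoded under different inheritance models and why a recessive effect looks diluted in a per-allele contrast. It is not GUIDANCE code: the function names are hypothetical, the dominant model is included only as an assumed example of another non-additive model, the genotype counts are invented for illustration, and the allelic odds ratio is a simple stand-in for an additive regression test.

```python
def encode(alt_allele_count, model):
    """Encode a genotype (0, 1 or 2 alternate alleles) under an inheritance model."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 alternate alleles")
    if model == "additive":
        return alt_allele_count            # effect scales with allele count
    if model == "dominant":
        return int(alt_allele_count >= 1)  # one copy is enough
    if model == "recessive":
        return int(alt_allele_count == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")


def odds_ratio(a, b, c, d):
    """Odds ratio for exposed/unexposed cases (a, b) and controls (c, d)."""
    return (a * d) / (b * c)


def recessive_odds_ratio(cases, controls):
    """Compare homozygous alternate carriers against everyone else."""
    a = sum(n for g, n in cases.items() if encode(g, "recessive"))
    c = sum(n for g, n in controls.items() if encode(g, "recessive"))
    return odds_ratio(a, sum(cases.values()) - a, c, sum(controls.values()) - c)


def allelic_odds_ratio(cases, controls):
    """Per-allele contrast: alternate versus reference allele counts."""
    def alleles(counts):
        alt = sum(encode(g, "additive") * n for g, n in counts.items())
        return alt, 2 * sum(counts.values()) - alt
    a, b = alleles(cases)
    c, d = alleles(controls)
    return odds_ratio(a, b, c, d)


# Hypothetical genotype counts {alt_allele_count: individuals}.
cases = {0: 810, 1: 170, 2: 20}
controls = {0: 815, 1: 180, 2: 5}

print(f"recessive OR = {recessive_odds_ratio(cases, controls):.2f}")
print(f"allelic OR   = {allelic_odds_ratio(cases, controls):.2f}")
```

With these invented counts the excess risk sits almost entirely in homozygous carriers, so the recessive contrast is strong while the per-allele contrast is close to 1, which is the kind of signal an additive-only analysis can miss.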