Abstract. Genomic rearrangements are gross DNA changes ranging in size from a couple of hundred base pairs, the size of an average exon, to megabases (Mb). When greater than 3 to 5 Mb, such changes are usually visible microscopically by chromosome studies. Human diseases that result from genomic rearrangements have been called genomic disorders. Three major mechanisms have been proposed for genomic rearrangements in the human genome. Non-allelic homologous recombination (NAHR) is mostly mediated by low-copy repeats (LCRs) with recombination hotspots, gene conversion and apparent minimal efficient processing segments. NAHR accounts for most of the recurrent rearrangements: those that share a common size, show clustering of breakpoints, and recur in multiple individuals. Non-recurrent rearrangements are of different sizes in each patient, but may share a smallest region of overlap whose change in copy number may result in shared clinical features among different patients. LCRs do not mediate, but may stimulate, non-recurrent events. Some rare NAHRs can also be mediated by highly homologous repetitive sequences (for example, Alu, LINE); these NAHRs account for some of the non-recurrent rearrangements. Other non-recurrent rearrangements can be explained by the non-homologous end-joining (NHEJ) and Fork Stalling and Template Switching (FoSTeS) models. These mechanisms occur both in germ cells, where the rearrangements can be associated with genomic disorders, and in somatic cells, where such rearrangements can cause disorders such as cancer. NAHR, NHEJ and FoSTeS probably account for the majority of genomic rearrangements in our genome, and the frequency distribution of the three at a given locus may partially reflect the genomic architecture in proximity to that locus. We provide a review of the current understanding of these three models.
Pathogenic variants in MYH3 cause distal arthrogryposis type 2A and type 2B3 as well as contractures, pterygia and spondylocarpotarsal fusion syndromes types 1A and 1B. These disorders are ultra-rare, and their natural course and phenotypic variability are not well described. In this study, we summarize the clinical features and genetic findings of 17 patients from 10 unrelated families with vertebral malformations caused by dominant or recessive pathogenic variants in MYH3. Twelve novel pathogenic variants in MYH3 (NM_002470.4) were identified: three were de novo or inherited in an autosomal dominant manner, and nine were inherited in an autosomal recessive manner. The patients had vertebral segmentation anomalies accompanied by variable joint contractures, short stature and dysmorphic facial features. There was significant phenotypic overlap between dominant and recessive MYH3-associated conditions regarding the degree of short stature as well as the number of vertebral fusions. All monoallelic variants caused significantly decreased SMAD3 phosphorylation, which is consistent with the previously proposed pathogenic mechanism of impaired canonical TGF-beta signaling. Most of the biallelic variants were predicted to be protein-truncating, while one missense variant, c.4244T>G, p.(Leu1415Arg), which was inherited in an autosomal recessive manner, was found to alter the phosphorylation level of p38, suggesting an inhibition of the non-canonical pathway of TGF-beta signaling. In conclusion, the identification of 12 novel pathogenic variants and overlapping phenotypes in 17 affected individuals from 10 unrelated families expands the mutational and phenotypic spectrum of MYH3-associated skeletal disorders. We show that disturbances of canonical or non-canonical TGF-beta signaling pathways are involved in the pathogenesis of MYH3-associated skeletal fusion (MASF) syndrome.
Funding Agencies: National Natural Science Foundation of China (NSFC) [81930068, 81772299, 81822030, 82072391, 81972132, 81672123, 81972037, 81902178]; Beijing Natural Science Foundation [JQ20032, 7191007]; CAMS Innovation Fund for Medical Sciences (CIFMS) [2021-I2M-1-051, 2021-I2M-1-052]; Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences [2019PT320025]; Tsinghua University-Peking Union Medical College Hospital Initiative Scientific Research Program; PUMC Youth Fund & the Fundamental Research Funds for the Central Universities [3332019021]; Swedish Research Council [K2015-54X-22 736-01-4, 2015-02227, 2018-03046]; Swedish Governmental Agency for Innovation Systems (Vinnova) [2014-01438]; Marianne and Marcus Wallenberg Foundation; IngaBritt och Arne Lundbergs forskningsstiftelse; Byggmästare Olle Engkvist Stiftelse; Promobilia; Nyckelfonden; Stiftelsen Frimurare Barnhuset i Stockholm; Region Stockholm; Karolinska Institutet, Stockholm, Sweden; Örebro University, Örebro, Sweden; Sällskapet Barnavård; Karolinska Institutet; Stiftelsen Sällsyntafonden; Stiftelsen Samariten; Stiftelsen Promobilia; Region Stockholm [20180131, 20200500]; US National Institutes of Health (NIH), National Institute of Neurological Disorders and Stroke [NINDS R35 NS105078]; National Human Genome Research Institute/National Heart, Lung, and Blood Institute [NHGRI/NHLBI UM1 HG006542]; US NIH National Human Genome Research Institute [NHGRI K08 HG008986]
Abstract. Background: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. Results: We demonstrate Parliament's efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. Conclusions: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, .
Purpose: Pathogenic variants in SETD1B have been associated with a syndromic neurodevelopmental disorder including intellectual disability, language delay, and seizures. To date, clinical features have been described for 11 patients with (likely) pathogenic SETD1B sequence variants. This study aims to further delineate the spectrum of the SETD1B-related syndrome based on characterizing an expanded patient cohort. Methods: We perform an in-depth clinical characterization of a cohort of 36 unpublished individuals with SETD1B sequence variants, describing their molecular and phenotypic spectrum. Selected variants were functionally tested using in vitro and genome-wide methylation assays. Results: Our data present evidence for a loss-of-function mechanism of SETD1B variants, resulting in a core clinical phenotype of global developmental delay, language delay including regression, intellectual disability, autism and other behavioral issues, and variable epilepsy phenotypes. Developmental delay appeared to precede seizure onset, suggesting SETD1B dysfunction impacts physiological neurodevelopment even in the absence of epileptic activity. Males are significantly overrepresented and more severely affected, and we speculate that sex-linked traits could affect susceptibility to penetrance and the clinical spectrum of SETD1B variants. Conclusion: Insights from this extensive cohort will facilitate the counseling regarding the molecular and phenotypic landscape of newly diagnosed patients with the SETD1B-related syndrome. We thank all patients and families for participation in this study. Part of this research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England.
The Wellcome Trust, Cancer Research UK, and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. Family 2 was collected as part of the SYNaPS Study Group collaboration funded by The Wellcome Trust and strategic award (Synaptopathies) funding (WT093205 MA and WT104033aIA) and research was conducted as part of the Queen Square Genomics group at University College London, supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre. HH is funded by The MRC (MR/S01165X/1, MR/S005021/1, G0601943), The National Institute for Health Research University College London Hospitals Biomedical Research Centre, Rosetree Trust, Ataxia UK, MSA Trust, Brain Research UK, Sparks GOSH Charity, Muscular Dystrophy UK (MDUK), Muscular Dystrophy Association (MDA USA). G.M.M. was supported by Jordan's Guardian Angels, the Brotman Baty Institute, and the Sunderland Foundation. J.R.L. acknowledges support by the Baylor Hopkins Center for Mendelian Genomics funded by the US National Human Genome Research Institute (UM1 HG006542). The DECODE-EE project (Health Research Call 2018, Tuscany Region) provided research funding to R.G. The Epilepsy Society supported this work, with funding to S.M.S. S.M.S. acknowledges that his work was partly carried out at NIHR University College London Hospitals Biomedical Research Centre, which receives a proportion of funding from the UK Department of Health's NIHR Biomedical Research Centres funding scheme. A.J. is supported by Solve-RD. The Solve-RD project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement number 779257. STA, R.R., K.J.C.L., K.A.P.G., and F.J.G.V. 
were supported by funding from King Abdullah University of Science and Technology (KAUST) through the baseline fund and award numbers FCC/1/1976-25 and REI/1/4446-01 from the Office of Sponsored Research (OSR). T.S.B.'s lab is supported by the Netherlands Organisation for Scientific Research (ZonMW Veni, grant 91617021), a NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation, an Erasmus MC Fellowship 2017, and Erasmus MC Human Disease Model Award 2018.