Transcriptional activity and strain-specific history of mouse pseudogenes
Copyright © The Author(s) 2020.

Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse and 303 in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique to that strain, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.

This project was supported by the Wellcome Trust (grant numbers WT108749/Z/15/Z, WT098051, WT202878/Z/16/Z and WT202878/B/16/Z), Cancer Research UK (20412), the European Research Council (615584), the European Molecular Biology Laboratory, and the National Human Genome Research Institute of the National Institutes of Health under Award Number U41HG007234. The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement HEALTH-F4-2010-241504 (EURATRANS).