Transcriptional activity and strain-specific history of mouse pseudogenes.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
29 07 2020
History:
received: 02 08 2018
accepted: 08 06 2020
entrez: 31 7 2020
pubmed: 31 7 2020
medline: 9 9 2020
Status:
epublish
Abstract
Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse and 303 in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.
Identifiers
pubmed: 32728065
doi: 10.1038/s41467-020-17157-w
pii: 10.1038/s41467-020-17157-w
pmc: PMC7392758
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3695
Grants
Agency: Wellcome Trust
ID: WT108749/Z/15/Z
Country: United Kingdom
Agency: Wellcome Trust
ID: 202878/Z/16/Z
Country: United Kingdom
Agency: Medical Research Council
ID: G0800024
Country: United Kingdom
Agency: Cancer Research UK
ID: 20412
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: Wellcome Trust
ID: WT202878/B/16/Z
Country: United Kingdom
Agency: Wellcome Trust
ID: WT202878/Z/16/Z
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R017565/1
Country: United Kingdom
Agency: Wellcome Trust
ID: WT098051
Country: United Kingdom