Transcriptional activity and strain-specific history of mouse pseudogenes.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
29 07 2020
History:
received: 02 08 2018
accepted: 08 06 2020
entrez: 31 7 2020
pubmed: 31 7 2020
medline: 9 9 2020
Status: epublish

Abstract

Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse and 303 in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.

Identifiers

pubmed: 32728065
doi: 10.1038/s41467-020-17157-w
pii: 10.1038/s41467-020-17157-w
pmc: PMC7392758

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3695

Grants

Agency: Wellcome Trust
ID: WT108749/Z/15/Z
Country: United Kingdom
Agency: Wellcome Trust
ID: 202878/Z/16/Z
Country: United Kingdom
Agency: Medical Research Council
ID: G0800024
Country: United Kingdom
Agency: Cancer Research UK
ID: 20412
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U41 HG007234
Country: United States
Agency: Wellcome Trust
ID: WT202878/B/16/Z
Country: United Kingdom
Agency: Wellcome Trust
ID: WT202878/Z/16/Z
Country: United Kingdom
Agency: Wellcome Trust
Country: United Kingdom
Agency: Medical Research Council
ID: MR/R017565/1
Country: United Kingdom
Agency: Wellcome Trust
ID: WT098051
Country: United Kingdom

Authors

Cristina Sisu (C)

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
Department of Life Sciences, Brunel University London, London, UB8 3PH, UK.

Paul Muir (P)

Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT, 06520, USA.
Systems Biology Institute, Yale University, West Haven, CT, 06516, USA.

Adam Frankish (A)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Ian Fiddes (I)

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95064, USA.

Mark Diekhans (M)

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95064, USA.

David Thybert (D)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Earlham Institute, Norwich Research Park, Norwich, NR4 7UH, UK.

Duncan T Odom (DT)

University of Cambridge, Cancer Research UK Cambridge Institute, Robinson Way, Cambridge, CB2 0RE, UK.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Paul Flicek (P)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

Thomas M Keane (TM)

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Tim Hubbard (T)

Department of Medical and Molecular Genetics, King's College London, London, SE1 9RT, UK.

Jennifer Harrow (J)

ELIXIR, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Mark Gerstein (M)

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA. mark@gersteinlab.org.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA. mark@gersteinlab.org.
Systems Biology Institute, Yale University, West Haven, CT, 06516, USA. mark@gersteinlab.org.
Department of Computer Science, Yale University, New Haven, CT, 06520, USA. mark@gersteinlab.org.
Department of Statistics & Data Science, Yale University, New Haven, CT, 06520, USA. mark@gersteinlab.org.
