Reconstruction of full-length LINE-1 progenitors from ancestral genomes.
KRAB zinc finger protein
LINE-1
ancestral sequence reconstruction
evolutionary arms race
Journal
Genetics
ISSN: 1943-2631
Abbreviated title: Genetics
Country: United States
NLM ID: 0374636
Publication information
Publication date:
2022-07-04
History:
received: 2022-03-25
accepted: 2022-04-27
pubmed: 2022-05-14
medline: 2022-07-07
entrez: 2022-05-13
Status:
ppublish
Abstract
Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
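To make the "uninterrupted ORFs" criterion concrete, here is a minimal sketch of how a reconstructed sequence could be checked for a long open reading frame. It is an illustration only, not the authors' pipeline: the function name `longest_orf`, the forward-strand-only scan, and the toy sequence are assumptions made for this example.

```python
# Hypothetical helper: scan the three forward reading frames of a DNA
# sequence and report the longest ATG-to-stop open reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest forward-strand ORF.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns (0, 0) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


# Toy sequence (assumed for illustration): ATG AAA TTT TAG in frame 2.
toy = "CCATGAAATTTTAGGG"
start, end = longest_orf(toy)
print(start, end, toy[start:end])
```

In practice one would compare the ORF lengths found this way against the expected sizes of the L1 coding regions; a premature stop codon would show up as an unexpectedly short ORF.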
Identifiers
pubmed: 35552404
pii: 6584822
doi: 10.1093/genetics/iyac074
pmc: PMC9252281
Chemical substances
Retroelements
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: CIHR
ID: FDN-148403
Country: Canada
Copyright information
© The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.