Fake IDs? Widespread misannotation of DNA transposons as a general transcription factor.
Annotation
DNA transposon
GTF2
Genome
Transcription factor
Transposable element
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
13 Nov 2023
History:
received: 14 Apr 2023
accepted: 01 Nov 2023
medline: 15 Nov 2023
pubmed: 14 Nov 2023
entrez: 14 Nov 2023
Status:
epublish
Abstract
Accurate annotation of genes and transposable elements (TEs) is vital for understanding genomes, but current annotation pipelines often misannotate TEs as genes. This study reveals how DNA transposons in non-mammalian species have been erroneously annotated as the general transcription factor II-I repeat domain-containing protein 2 (GTF2IRD2), because this gene contains a 3' fused hAT transposase domain. It also demonstrates the generality of the problem by identifying TEs misannotated as genes in other vertebrate genomes. Such misannotations can lead to errors in phylogenetic analyses and waste investigators' time. The study proposes adding a final TE-check to gene annotation pipelines to mitigate this problem.
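The abstract does not describe how the proposed final TE-check would work, so the following is only a minimal sketch of one possible form. It assumes each predicted gene model already carries a list of detected protein domain names, and it flags models whose domains are all TE-associated. The `GeneModel` class, the `flag_possible_tes` function, the keyword list, and the domain strings are hypothetical illustrations, not the authors' method or any real pipeline's API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical keyword list for illustration only; a real pipeline would
# rely on curated TE protein domain models rather than substring matching.
TE_DOMAIN_KEYWORDS = ("transposase", "reverse transcriptase", "integrase")


@dataclass
class GeneModel:
    """A predicted gene with its assigned name and detected domains (hypothetical type)."""
    gene_id: str
    annotated_name: str
    domains: List[str] = field(default_factory=list)


def is_te_domain(domain: str) -> bool:
    """Return True if the domain name contains a TE-associated keyword."""
    lowered = domain.lower()
    return any(keyword in lowered for keyword in TE_DOMAIN_KEYWORDS)


def flag_possible_tes(genes: List[GeneModel]) -> List[str]:
    """Return IDs of gene models that have domains and whose domains are all TE-associated.

    A model that pairs a TE domain with other, non-TE domains (as a genuine
    fusion gene would) is not flagged by this simple rule.
    """
    flagged = []
    for gene in genes:
        if gene.domains and all(is_te_domain(d) for d in gene.domains):
            flagged.append(gene.gene_id)
    return flagged


if __name__ == "__main__":
    # Illustrative, made-up gene models.
    example = [
        GeneModel("g1", "GTF2IRD2", ["GTF2I-like repeat", "hAT transposase"]),
        GeneModel("g2", "GTF2IRD2", ["hAT transposase"]),
    ]
    print(flag_possible_tes(example))
```

Under this rule, a model annotated as GTF2IRD2 but carrying only a transposase domain would be sent for manual review, while a model that also has non-TE domains would pass. Any real check would need a more careful criterion than keyword matching.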
Identifiers
pubmed: 37957683
doi: 10.1186/s13059-023-03102-9
pii: 10.1186/s13059-023-03102-9
pmc: PMC10641963
Chemical substances
DNA Transposable Elements
0
Transcription Factors, General
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
260
Copyright information
© 2023. The Author(s).