RepetDB: a unified resource for transposable element references.

Keywords: Database; RepetDB; Transposable element

Journal

Mobile DNA
ISSN: 1759-8753
Abbreviated title: Mob DNA
Country: England
NLM ID: 101519891

Publication information

Publication date:
2019
History:
received: 26 10 2018
accepted: 24 01 2019
entrez: 6 2 2019
pubmed: 6 2 2019
medline: 6 2 2019
Status: epublish

Abstract

Thanks to their ability to move around and replicate within genomes, transposable elements (TEs) are perhaps the most important contributors to genome plasticity and evolution. Their detection and annotation are considered essential in any genome sequencing project. The number of fully sequenced genomes is rapidly increasing with improvements in high-throughput sequencing technologies. A fully automated de novo annotation process for TEs is therefore required to cope with the deluge of sequence data. However, all automated procedures are error-prone, and an automated procedure for TE identification and classification would be no exception. It is therefore crucial to provide not only the TE reference sequences, but also evidence justifying their classification, at the scale of the whole genome. A few TE databases already exist, but none provides evidence to justify TE classification. Moreover, biological information about the sequences remains globally poor. We present here the RepetDB database developed in the framework of GnpIS, a genetic and genomic information system. RepetDB is designed to store and retrieve detected, classified and annotated TEs in a standardized manner. RepetDB is an implementation with extensions of InterMine, an open-source data warehouse framework used here to store, search, browse, analyze and compare all the data recorded for each TE reference sequence. InterMine can display diverse information for each sequence and allows simple to very complex queries. Finally, TE data are displayed via a worldwide data discovery portal. RepetDB is accessible at urgi.versailles.inra.fr/repetdb. RepetDB is designed to be a TE knowledge base populated with full de novo TE annotations of complete (or near-complete) genome sequences. Indeed, the description and classification of TEs facilitates the exploration of specific TE families, superfamilies or orders across a large range of species. It also makes possible cross-species searches and comparisons of TE family content between genomes.

Identifiers

pubmed: 30719103
doi: 10.1186/s13100-019-0150-y
pii: 150
pmc: PMC6350395

Publication types

Journal Article

Languages

eng

Pagination

6

Conflict of interest statement

The authors declare that they have no competing interests. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Authors

Joëlle Amselem (J)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Guillaume Cornut (G)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Nathalie Choisne (N)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Michael Alaux (M)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Françoise Alfama-Depauw (F)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Véronique Jamilloux (V)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Florian Maumus (F)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Thomas Letellier (T)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Isabelle Luyten (I)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Cyril Pommier (C)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Anne-Françoise Adam-Blondon (AF)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

Hadi Quesneville (H)

URGI, INRA, Université Paris-Saclay, 78026 Versailles, France.

MeSH classifications