getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories.

Keywords: Assembly; DNA; Genome sequences; Metadata; Nucleotide diversity; Repository

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
ID NLM: 100965194

Publication information

Publication date:
08 Jul 2022
History:
received: 26 Oct 2021
accepted: 23 Jun 2022
entrez: 8 Jul 2022
pubmed: 9 Jul 2022
medline: 14 Jul 2022
Status: epublish

Abstract

Biological sequences are accumulating rapidly and exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a wide variety of organisms. The getSequenceInfo software tool gives access to sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It runs on Linux, MacOS, and Microsoft Windows, and can be used either programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 is intended to help users retrieve information on queried sequences that may be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data for a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in its own folder. It also computes some basic statistics, such as the GC content of queried assemblies. An empirically designed nucleotide ratio, calculated using nucleotide information, is used to tentatively provide a "NucleScore" for the studied genome assemblies. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis. The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV, or HTML format. The program is freely available at: https://github.com/karubiotools/getSequenceInfo . getSequenceInfo and the supplementary tools are partly available through the recently released Galaxy KaruBioNet platform ( http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html ).
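Two of the basic operations described above, computing GC content and separating chromosomes from plasmids into per-component folders, can be illustrated with a short Python sketch. This is not gSeqI's own code: the function names, the header-keyword rule used to classify records, and the choice to compute GC content over A/C/G/T bases only (ignoring ambiguous bases such as N) are assumptions made for illustration. The NucleScore formula is not given in the abstract, so it is not reproduced here.

```python
from collections import defaultdict
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """GC percentage over unambiguous bases (A, C, G, T); an assumed convention."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def component_type(header):
    """Hypothetical rule: classify a record by keywords in its FASTA header."""
    h = header.lower()
    if "plasmid" in h:
        return "plasmid"
    if "chromosome" in h or "complete genome" in h:
        return "chromosome"
    return "other"


def split_components(text):
    """Group FASTA records by component type (chromosome, plasmid, other)."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        groups[component_type(header)].append((header, seq))
    return dict(groups)


def write_components(groups, outdir):
    """Write each component type to its own folder, one FASTA file per record."""
    for kind, records in groups.items():
        folder = Path(outdir) / kind
        folder.mkdir(parents=True, exist_ok=True)
        for header, seq in records:
            name = header.split()[0]  # first header token, e.g. an accession
            (folder / f"{name}.fasta").write_text(f">{header}\n{seq}\n")


if __name__ == "__main__":
    # Made-up example records, not real repository entries.
    example = (
        ">seq1 Example organism chromosome, complete genome\nATGCGC\nGGCC\n"
        ">seq2 Example organism plasmid pEx1, complete sequence\nATATGC\n"
    )
    for kind, records in split_components(example).items():
        for header, seq in records:
            print(kind, header.split()[0], round(gc_content(seq), 2))
    # prints:
    # chromosome seq1 80.0
    # plasmid seq2 33.33
```

Classifying by header keywords is only a simple heuristic; a real tool would more likely rely on the repository's own metadata (for example, the molecule type recorded for each sequence) rather than on free-text descriptions.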


Identifiers

pubmed: 35804320
doi: 10.1186/s12859-022-04809-5
pii: 10.1186/s12859-022-04809-5
pmc: PMC9264741

Chemical substances

Nucleotides 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

268

Grants

Agency: European Regional Development Fund (ERDF)
ID: 2015-FED-186

Copyright information

© 2022. The Author(s).

Authors

Vincent Moco (V)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Damien Cazenave (D)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Maëlle Garnier (M)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Matthieu Pot (M)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Isabel Marcelino (I)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Antoine Talarmin (A)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Stéphanie Guyomard-Rabenirina (S)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Sébastien Breurec (S)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.
Faculté de Médecine Hyacinthe Bastaraud, Université des Antilles, Pointe-à-Pitre, France.
Centre d'Investigation Clinique Antilles Guyane, Inserm CIC 1424, Pointe-à-Pitre, France.

Séverine Ferdinand (S)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Alexis Dereeper (A)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

Yann Reynaud (Y)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France.

David Couvin (D)

Unité Transmission, Réservoir et Diversité des Pathogènes, Institut Pasteur de Guadeloupe, Les Abymes, Guadeloupe, France. david.couvin@googlemail.com.
