SQMtools: automated processing and visual analysis of 'omics data with R and anvi'o.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
14 Aug 2020
History:
received: 23 Apr 2020
accepted: 28 Jul 2020
entrez: 16 Aug 2020
pubmed: 17 Aug 2020
medline: 1 Sep 2020
Status: epublish

Abstract

The dramatic decrease in sequencing costs over the last decade has boosted the adoption of high-throughput sequencing applications as a standard tool for the analysis of environmental microbial communities. Nowadays even small research groups can easily obtain raw sequencing data. After that, however, non-specialists are faced with the double challenge of choosing among an ever-increasing array of analysis methodologies, and navigating the vast amounts of results returned by these approaches. Here we present a workflow that relies on the SqueezeMeta software for the automated processing of raw reads into annotated contigs and reconstructed genomes (bins). A set of custom scripts seamlessly integrates the output into the anvi'o analysis platform, allowing filtering and visual exploration of the results. Furthermore, we provide a software package with utility functions to expose the SqueezeMeta results to the R analysis environment. Altogether, our workflow allows non-expert users to go from raw sequencing reads to custom plots with only a few powerful, flexible and well-documented commands.

Identifiers

pubmed: 32795263
doi: 10.1186/s12859-020-03703-2
pii: 10.1186/s12859-020-03703-2
pmc: PMC7430844

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

358

Grants

Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España
ID: CTM2016-80095-C2-1-R
Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España
ID: PID2019-110011RB-C31
Agency: Ministerio de Ciencia, Innovación y Universidades
ID: IJC2018-035180-I
Agency: Ministerio de Ciencia, Innovación y Universidades
ID: SEV-2013-0347-17-2

Authors

Fernando Puente-Sánchez (F)

Systems Biology Department, Centro Nacional de Biotecnología (CNB-CSIC), C/ Darwin n° 3, Campus de Cantoblanco, 28049, Madrid, Spain. fpusan@gmail.com.

Natalia García-García (N)

Systems Biology Department, Centro Nacional de Biotecnología (CNB-CSIC), C/ Darwin n° 3, Campus de Cantoblanco, 28049, Madrid, Spain.

Javier Tamames (J)

Systems Biology Department, Centro Nacional de Biotecnología (CNB-CSIC), C/ Darwin n° 3, Campus de Cantoblanco, 28049, Madrid, Spain.
