ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data.

Keywords: Analysis workflow; Annotation; Metagenome-assembled genomes; Metagenomics

Journal

BMC Bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
22 Jun 2020
History:
received: 23 May 2019
accepted: 08 Jun 2020
entrez: 24 Jun 2020
pubmed: 24 Jun 2020
medline: 11 Aug 2020
Status: epublish

Abstract

Metagenomics studies provide valuable insight into the composition and function of microbial populations from diverse environments; however, the data processing pipelines that rely on mapping reads to gene catalogs or genome databases for cultured strains yield results that underrepresent the genes and functional potential of uncultured microbes. Recent improvements in sequence assembly methods have eased the reliance on genome databases, thereby allowing the recovery of genomes from uncultured microbes. However, configuring these tools, linking them with advanced binning and annotation tools, and maintaining provenance of the processing continue to be challenging for researchers. Here we present ATLAS, a software package for customizable data processing from raw sequence reads to functional and taxonomic annotations using state-of-the-art tools to assemble, annotate, quantify, and bin metagenome data. Abundance estimates at genome resolution are provided for each sample in a dataset. ATLAS is written in Python and the workflow is implemented in Snakemake; it operates in a Linux environment and is compatible with Python 3.5+ and Anaconda 3+ versions. The source code for ATLAS is freely available, distributed under a BSD-3 license. ATLAS provides a user-friendly, modular and customizable Snakemake workflow for metagenome data processing; it is easily installable with conda and maintained as open-source on GitHub at https://github.com/metagenome-atlas/atlas.
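
As a rough illustration of how a dependency-driven workflow such as the one described above can order its processing steps, the following sketch resolves a small stage graph into a valid execution order. The stage names and their dependencies are hypothetical, chosen only to mirror the steps named in the abstract (quality control is an assumed first step); this is not ATLAS's actual rule graph, and it uses the Python standard library rather than Snakemake.

```python
# Requires Python 3.9+ for graphlib (newer than the 3.5+ stated in the abstract).
from graphlib import TopologicalSorter

# Hypothetical stages, each mapped to the set of stages it depends on.
# This layout is an assumption for illustration, not ATLAS's real rules.
STAGES = {
    "quality_control": set(),
    "assembly": {"quality_control"},
    "annotation": {"assembly"},
    "binning": {"assembly"},
    "quantification": {"quality_control", "binning"},
}


def execution_order(stages):
    """Return one valid order in which every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
print(order)
```

Several orders can satisfy the same graph (annotation and binning are independent here), which is what lets a workflow engine schedule such steps in parallel.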

Identifiers

pubmed: 32571209
doi: 10.1186/s12859-020-03585-4
pii: 10.1186/s12859-020-03585-4
pmc: PMC7310028

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

257

Grants

Agency: Pacific Northwest National Laboratory LDRD program
ID: Microbiomes in Transition Initiative
Agency: European Research Council
ID: ERC-COG-2018
Country: International

Authors

Silas Kieser (S)

Department of Cell Physiology and Metabolism, Faculty of Medicine, Centre Medical Universitaire, 1206, Geneva, Switzerland.
Swiss Institute of Bioinformatics, Geneva, Switzerland.

Joseph Brown (J)

Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, 99352, USA.
Current address: Department of Human Genetics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112, USA.

Evgeny M Zdobnov (EM)

Swiss Institute of Bioinformatics, Geneva, Switzerland.
Institute of Genetics and Genomics in Geneva (iGE3), University of Geneva, 1206, Geneva, Switzerland.
Department of Genetic Medicine and Development, University of Geneva, 1206, Geneva, Switzerland.

Mirko Trajkovski (M)

Department of Cell Physiology and Metabolism, Faculty of Medicine, Centre Medical Universitaire, 1206, Geneva, Switzerland.
Institute of Genetics and Genomics in Geneva (iGE3), University of Geneva, 1206, Geneva, Switzerland.
Diabetes Center, Faculty of Medicine, Centre Medical Universitaire, 1206, Geneva, Switzerland.

Lee Ann McCue (LA)

Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, 99352, USA. leeann.mccue@pnnl.gov.
