Species abundance information improves sequence taxonomy classification accuracy.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
2019 Oct 11
History:
received: 2019 Feb 27
accepted: 2019 Sep 19
entrez: 2019 Oct 13
pubmed: 2019 Oct 13
medline: 2020 Feb 6
Status: epublish

Abstract

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we achieve a significant increase in species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which compares favourably with the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that, for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
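The abstract's argument rests on Bayes' rule: a naive Bayes classifier scores each species s for a read x as P(s | x) ∝ P(s) · P(x | s), so treating every reference species as equally likely is itself a choice of prior P(s). The following is a minimal toy sketch of that effect, assuming a multinomial naive Bayes model over overlapping k-mer counts with additive smoothing. It is not the q2-clawback or q2-feature-classifier implementation; the species names, sequences, and abundance weights are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train(references, k=4, alpha=1.0):
    """Toy multinomial naive Bayes: per-species k-mer counts plus a smoothed denominator.

    Additive (Laplace) smoothing spreads alpha over all 4**k possible k-mers.
    """
    model = {}
    for species, seqs in references.items():
        counts = Counter()
        for s in seqs:
            counts.update(kmer_counts(s, k))
        denom = sum(counts.values()) + alpha * 4 ** k
        model[species] = (counts, denom)
    return model


def posterior(query, model, prior, k=4, alpha=1.0):
    """P(species | query) proportional to P(species) * product of P(k-mer | species)."""
    q = kmer_counts(query, k)
    log_scores = {}
    for species, (counts, denom) in model.items():
        log_lik = sum(n * math.log((counts[km] + alpha) / denom) for km, n in q.items())
        log_scores[species] = math.log(prior[species]) + log_lik
    top = max(log_scores.values())  # log-sum-exp normalisation for numerical stability
    z = sum(math.exp(v - top) for v in log_scores.values())
    return {s: math.exp(v - top) / z for s, v in log_scores.items()}


# Hypothetical toy references: two species sharing their first four bases.
refs = {"abundant_sp": ["AAAACCCC"], "rare_sp": ["AAAAGGGG"]}
model = train(refs)
query = "AAAAG"  # shares one more k-mer with rare_sp than with abundant_sp

uniform = {s: 1 / len(refs) for s in refs}       # "all species equally likely"
weighted = {"abundant_sp": 0.9, "rare_sp": 0.1}  # hypothetical environment-specific abundances

p_uniform = posterior(query, model, uniform)
p_weighted = posterior(query, model, weighted)
print(max(p_uniform, key=p_uniform.get))    # rare_sp (posterior 2/3)
print(max(p_weighted, key=p_weighted.get))  # abundant_sp (posterior 9/11)
```

Here the likelihood favours rare_sp by a factor of two, so the uniform prior assigns the read to it; a 0.9/0.1 abundance prior outweighs that factor and reverses the call. In practice such weights would have to be estimated from abundance data for the relevant sample type rather than set by hand as in this toy.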

Identifiers

pubmed: 31604942
doi: 10.1038/s41467-019-12669-6
pii: 10.1038/s41467-019-12669-6
pmc: PMC6789115

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

4643

Grants

Agency: NIMH NIH HHS
ID: P30 MH062512
Country: United States


Authors

Benjamin D Kaehler (BD)

Research School of Biology, Australian National University, Canberra, Australia. b.kaehler@adfa.edu.au.
School of Science, University of New South Wales, Canberra, Australia. b.kaehler@adfa.edu.au.

Nicholas A Bokulich (NA)

Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA. nicholas.bokulich@nau.edu.
Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA. nicholas.bokulich@nau.edu.

Daniel McDonald (D)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.

Rob Knight (R)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA.

J Gregory Caporaso (JG)

Center for Applied Microbiome Science, The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA. gregcaporaso@gmail.com.
Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA. gregcaporaso@gmail.com.

Gavin A Huttley (GA)

Research School of Biology, Australian National University, Canberra, Australia. gavin.huttley@anu.edu.au.

Similar articles


Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.

MeSH classifications