Assessing the performance of different approaches for functional and taxonomic annotation of metagenomes.

MeSH terms: Algorithms; Cluster Analysis; Computational Biology / methods; Metagenome / genetics; Metagenomics; Molecular Sequence Annotation / methods; Sequence Analysis, DNA

Keywords: Assembly; Functional annotation; Metagenomics; Taxonomic annotation

Journal

BMC genomics

ISSN: 1471-2164

Abbreviated title: BMC Genomics

Country: England

NLM ID: 100965258

Publication information

Publication date:
10 Dec 2019

History:

received: 16 Jan 2019

accepted: 14 Nov 2019

entrez: 12 Dec 2019

pubmed: 12 Dec 2019

medline: 6 May 2020

Status: epublish

Abstract

BACKGROUND

Metagenomes can be analysed using different approaches and tools. One of the most important distinctions is how taxonomic and functional assignment is performed: either by assembling the reads and annotating the predicted genes, or by analysing the raw sequence reads directly through homology searching, k-mer analysis, or detection of marker genes. Many instances of each approach can be found in the literature, but to the best of our knowledge no evaluation of their respective performance has been carried out, and it is unclear whether their results are comparable.

RESULTS

We have analysed several real and mock metagenomes using different methodologies and tools, and compared the resulting taxonomic and functional profiles. Our results show that database completeness (the representation of diverse organisms and taxa in the database) is the main factor determining the performance of methods that assign reads directly, whether by homology, k-mer composition, or similarity to marker genes. In contrast, methods relying on assembly and assignment of predicted genes are most influenced by metagenome size, which in turn determines the completeness of the assembly (the percentage of reads that were assembled).

CONCLUSIONS

Although differences exist, taxonomic profiles are rather similar between raw read assignment and assembly assignment methods, whereas they diverge more for methods based on k-mers and marker genes. Regarding functional annotation, analysis of raw reads retrieves more functions, but it also makes a substantial number of over-predictions. Assembly methods become more advantageous as the size of the metagenome increases.
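To illustrate why database completeness governs direct read assignment, here is a minimal Python sketch of k-mer-based classification. It is not the method of any tool evaluated in the study: it indexes the k-mers of a few reference sequences and assigns a read to the taxon contributing the most taxon-specific k-mers, ignoring k-mers shared between taxa (production classifiers typically resolve shared k-mers on a taxonomy instead, for example to a lowest common ancestor). The function names, reference names, sequences, and the value of `K` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (case-insensitive)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index: Dict[str, Set[str]] = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Return the taxon with the most taxon-specific k-mer hits, or None if there are no hits."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        taxa = index.get(km)
        if taxa is not None and len(taxa) == 1:  # ignore k-mers shared by several taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None  # the read cannot be placed with this database
    return votes.most_common(1)[0][0]


# Hypothetical reference "database" with two taxa.
K = 5
refs = {"taxonA": "ACGTACGTGGCCAAT", "taxonB": "TTGACCGTAGGCATC"}
full_db = build_index(refs, K)
partial_db = build_index({"taxonA": refs["taxonA"]}, K)  # taxonB is missing

read_b = "GACCGTAGGC"  # a fragment of taxonB's sequence
print(classify_read(read_b, full_db, K))     # taxonB
print(classify_read(read_b, partial_db, K))  # None
```

The same read is assigned correctly when its taxon is represented in the reference set and stays unassigned when it is not, which is the database-completeness effect reported in the abstract.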

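The abstract defines assembly completeness as the percentage of reads that were assembled, and compares taxonomic profiles produced by different methods. The following minimal sketch computes that percentage and, as one common way to quantify how far two profiles diverge, the Bray-Curtis dissimilarity between relative abundances. Bray-Curtis is an assumption made for illustration; the abstract does not state which measure the authors used. All function names, taxon names, and counts are hypothetical.

```python
from typing import Dict


def assembly_completeness(total_reads: int, assembled_reads: int) -> float:
    """Percentage of reads that were incorporated into the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * assembled_reads / total_reads


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for profiles sharing no taxa."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical numbers for illustration only.
print(assembly_completeness(1_000_000, 620_000))  # 62.0
raw = relative_abundance({"Proteobacteria": 500, "Bacteroidetes": 300, "Firmicutes": 200})
asm = relative_abundance({"Proteobacteria": 450, "Bacteroidetes": 350, "Actinobacteria": 200})
print(round(bray_curtis(raw, asm), 3))  # 0.25
```

Taxa missing from one profile are treated as zero abundance, so a method that reports a taxon the other misses entirely contributes its full abundance to the dissimilarity.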

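Over-prediction can only be measured where the true functional content is known, as in a mock metagenome. A minimal sketch, assuming each method's output has been reduced to a set of function identifiers, scores a prediction against the known set. Precision and recall are an assumed scoring scheme, not necessarily the authors'; the function name and the KEGG-Orthology-style identifiers are hypothetical.

```python
from typing import Dict, Set


def compare_to_truth(predicted: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Score predicted functions against the functions known to be in a mock community."""
    true_pos = predicted & truth
    over = predicted - truth    # over-predictions (false positives)
    missed = truth - predicted  # functions present but not retrieved
    return {
        "retrieved": len(predicted),
        "correct": len(true_pos),
        "over_predicted": len(over),
        "missed": len(missed),
        "precision": len(true_pos) / len(predicted) if predicted else 0.0,
        "recall": len(true_pos) / len(truth) if truth else 0.0,
    }


# Hypothetical identifiers for illustration only.
truth = {"K00001", "K00002", "K00003", "K00004"}
raw_reads = {"K00001", "K00002", "K00003", "K00010", "K00011"}
assembly = {"K00001", "K00002"}
print(compare_to_truth(raw_reads, truth))
print(compare_to_truth(assembly, truth))
```

In this toy case the raw-read set retrieves more functions (5 versus 2) and achieves higher recall, but two of its predictions are absent from the mock community, mirroring the trade-off described in the abstract.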
Identifiers

DOI: 10.1186/s12864-019-6289-6 PMID: 31823721 PMC: PMC6902526

pii: 10.1186/s12864-019-6289-6

pmc: PMC6902526

doi:

Publication types

Comparative Study; Journal Article

Languages

English

Pagination

960

Grants

Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España

ID: CTM2013-48292-C3-2-R

Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España

ID: CTM2016-80095-C2-1-R

Authors

Javier Tamames

Marta Cobo-Simón

Fernando Puente-Sánchez