Assessing the performance of different approaches for functional and taxonomic annotation of metagenomes.
Assembly
Functional annotation
Metagenomics
Taxonomic annotation
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
10 Dec 2019
History:
received: 16 Jan 2019
accepted: 14 Nov 2019
entrez: 12 Dec 2019
pubmed: 12 Dec 2019
medline: 6 May 2020
Status:
epublish
Abstract
BACKGROUND
Metagenomes can be analysed using different approaches and tools. One of the most important distinctions is how taxonomic and functional assignment is performed: either by first assembling the reads, or by directly analysing raw sequence reads through homology searching, k-mer analysis, or detection of marker genes. Many instances of each approach can be found in the literature but, to the best of our knowledge, no evaluation of their relative performance has been carried out, and it is unclear whether their results are comparable.
RESULTS
We analysed several real and mock metagenomes using different methodologies and tools, and compared the resulting taxonomic and functional profiles. Our results show that database completeness (how well diverse organisms and taxa are represented in it) is the main factor determining the performance of methods that rely on direct read assignment, whether by homology, k-mer composition, or similarity to marker genes. Methods that rely on assembly and assignment of predicted genes are most influenced by metagenome size, which in turn determines the completeness of the assembly (the percentage of reads that were assembled).
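As a rough illustration of the assembly completeness measure described in the Results (the percentage of reads that were assembled), the following Python sketch computes it from a list of read-to-contig assignments. The function name `assembly_completeness`, the input convention (`None` for a read that mapped to no contig), and the toy data are assumptions made for this example; they are not taken from the study's pipeline.

```python
from typing import Optional, Sequence


def assembly_completeness(read_to_contig: Sequence[Optional[str]]) -> float:
    """Return the percentage of reads that were assigned to any contig.

    Hypothetical input convention: one entry per read, holding the name of
    the contig the read mapped to, or None if the read stayed unassembled.
    """
    if not read_to_contig:
        raise ValueError("no reads supplied")
    assembled = sum(1 for contig in read_to_contig if contig is not None)
    return 100.0 * assembled / len(read_to_contig)


# Toy data (made up for illustration): 3 of 4 reads map back to contigs.
toy_mapping = ["contig_1", "contig_1", None, "contig_2"]
completeness = assembly_completeness(toy_mapping)
print(f"{completeness:.1f}% of reads assembled")
```

In practice the read-to-contig assignments would come from mapping the reads back to the assembled contigs; which mapper to use is left open here.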
CONCLUSIONS
Although differences exist, taxonomic profiles are rather similar between raw read assignment and assembly assignment methods, while they are more divergent for methods based on k-mers and marker genes. Regarding functional annotation, analysis of raw reads retrieves more functions, but it also produces a substantial number of over-predictions. Assembly methods become more advantageous as the size of the metagenome increases.
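One way to put a number on how similar or divergent two taxonomic profiles are is the Bray-Curtis dissimilarity. The abstract does not state which measure the authors used, so the choice of metric here is an assumption, as are the function name `bray_curtis` and the toy abundance values, which are invented for illustration and are not results from the study.

```python
from typing import Dict


def bray_curtis(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(profile_a) | set(profile_b)
    # Sum of absolute abundance differences over every taxon seen in either profile.
    diff = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return diff / total


# Toy relative abundances (hypothetical values, not data from the paper).
raw_read_profile = {"Proteobacteria": 0.5, "Bacteroidetes": 0.3, "Firmicutes": 0.2}
assembly_profile = {"Proteobacteria": 0.6, "Bacteroidetes": 0.3, "Firmicutes": 0.1}

dissimilarity = bray_curtis(raw_read_profile, assembly_profile)
print(f"Bray-Curtis dissimilarity: {dissimilarity:.2f}")
```

A value near 0 would correspond to the "rather similar" profiles described above, and larger values to more divergent ones.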
Identifiers
pubmed: 31823721
doi: 10.1186/s12864-019-6289-6
pii: 10.1186/s12864-019-6289-6
pmc: PMC6902526
Publication types
Comparative Study
Journal Article
Languages
eng
Citation subsets
IM
Pagination
960
Grants
Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España
ID: CTM2013-48292-C3-2-R
Agency: Ministerio de Economía, Industria y Competitividad, Gobierno de España
ID: CTM2016-80095-C2-1-R