Improved recovery and annotation of genes in metagenomes through the prediction of fungal introns.

artificial intelligence; eukaryote; fungi; gene prediction; intron; metagenomics

Journal

Molecular ecology resources
ISSN: 1755-0998
Abbreviated title: Mol Ecol Resour
Country: England
NLM ID: 101465604

Publication information

Publication date:
Nov 2023
History:
received: 21 11 2022
revised: 27 06 2023
accepted: 31 07 2023
pubmed: 10 8 2023
medline: 10 8 2023
entrez: 10 8 2023
Status: ppublish

Abstract

Metagenomics provides a tool to assess the functional potential of environmental and host-associated microbiomes based on the analysis of environmental DNA: assembly, gene prediction and annotation. While gene prediction is straightforward for most bacterial and archaeal taxa, it has limited applicability in the majority of eukaryotic organisms, including fungi that contain introns in gene coding sequences. As a consequence, eukaryotic genes are underrepresented in metagenomics datasets and our understanding of the contribution of fungi and other eukaryotes to microbiome functioning is limited. Here, we developed a machine intelligence-based algorithm that predicts fungal introns in environmental DNA with reasonable precision and used it to improve the annotation of environmental metagenomes. Intron removal increased the number of predicted genes by up to 9.1% and improved the annotation of several others. The proportion of newly predicted genes increased with the share of eukaryotic genes in the metagenome and, within fungal taxa, increased with the number of introns per gene. Our approach provides a tool named SVMmycointron for improved metagenome annotation, especially of microbiomes with a high proportion of eukaryotes. The scripts described in the paper are made publicly available and can be readily utilized by microbiome researchers analysing metagenomics data.
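The intron-removal step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the SVMmycointron implementation, whose interface is not described in this record. It assumes that some upstream predictor has already produced intron coordinates as half-open `(start, end)` intervals on a contig; the function name `excise_introns` and the toy sequence are hypothetical.

```python
from typing import List, Tuple


def excise_introns(contig: str, introns: List[Tuple[int, int]]) -> str:
    """Return the contig with predicted introns removed.

    `introns` holds half-open, 0-based (start, end) intervals. This input
    format is an assumption made for illustration only.
    """
    pieces = []
    cursor = 0  # first position not yet copied or removed
    for start, end in sorted(introns):
        if not (0 <= start < end <= len(contig)):
            raise ValueError(f"interval out of range: {(start, end)}")
        if start < cursor:
            # Overlaps an intron that was already removed; skip it.
            continue
        pieces.append(contig[cursor:start])  # keep the exonic stretch
        cursor = end                         # jump over the intron
    pieces.append(contig[cursor:])           # keep the tail of the contig
    return "".join(pieces)


# Toy contig: two exon fragments around one canonical GT...AG intron.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTTAG", "GGATAA"
contig = exon1 + intron + exon2
spliced = excise_introns(contig, [(len(exon1), len(exon1) + len(intron))])
print(spliced)
```

In a workflow of the kind the abstract outlines, the intron-free sequence would then be passed to a standard gene predictor, so that open reading frames previously interrupted by introns can be recovered.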

Identifiers

pubmed: 37561110
doi: 10.1111/1755-0998.13852

Databases

RefSeq
PRJNA603240, SRX099567, SRX1686623, SRX1990991, SRX2488989, SRX2575203, SRX2720157, SRX3197864, SRX691280, SRX1557139, SRX1944669, SRX2316877, SRX2538108, SRX2648762, SRX2939063, SRX665338, SRX732059

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1800-1811

Grants

Agency: Grantová Agentura České Republiky
ID: 21-17749S
Agency: Ministry of Education, Youth and Sports of the Czech Republic
ID: e-INFRA CZ LM2018140

Copyright information

© 2023 John Wiley & Sons Ltd.

Authors

Anh Vu Le (AV)

Department of Computer Science, Czech Technical University in Prague, Praha, Czech Republic.

Tomáš Větrovský (T)

Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.

Denis Barucic (D)

Department of Computer Science, Czech Technical University in Prague, Praha, Czech Republic.

Joao Pedro Saraiva (JP)

Department of Environmental Microbiology, UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany.

Priscila Thiago Dobbler (PT)

Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.

Petr Kohout (P)

Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.

Martin Pospíšek (M)

Department of Genetics and Microbiology, Charles University, Praha, Czech Republic.

Ulisses Nunes da Rocha (UN)

Department of Environmental Microbiology, UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany.

Jiří Kléma (J)

Department of Computer Science, Czech Technical University in Prague, Praha, Czech Republic.

Petr Baldrian (P)

Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.

MeSH classifications