A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions.

disease prediction; machine learning; metadata; metagenome; shotgun sequencing

Journal

Frontiers in microbiology
ISSN: 1664-302X
Abbreviated title: Front Microbiol
Country: Switzerland
NLM ID: 101548977

Publication information

Publication date:
2024
History:
received: 23 11 2023
accepted: 29 01 2024
medline: 29 2 2024
pubmed: 29 2 2024
entrez: 29 2 2024
Status: epublish

Abstract

Metagenomics, metabolomics, and metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, which hinders robust data stratification and the consideration of confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories that host metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage the potential of metagenomic data. Furthermore, we explore future directions for applying Machine Learning (ML) to metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in the development of ML models.

Identifiers

pubmed: 38419630
doi: 10.3389/fmicb.2024.1343572
pmc: PMC10900530

Publication types

Journal Article; Review

Languages

eng

Pagination

1343572

Copyright information

Copyright © 2024 Kumar, Lorusso, Fosso and Pesole.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Authors

Bablu Kumar (B)

Università degli Studi di Milano, Milan, Italy.
Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy.

Erika Lorusso (E)

Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy.
National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy.

Bruno Fosso (B)

Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy.

Graziano Pesole (G)

Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy.
National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy.

MeSH classifications