Exploring machine learning for untargeted metabolomics using molecular fingerprints.

Ataxia telangiectasia; Machine learning; Mass spectrometry; Molecular fingerprinting; Untargeted metabolomics

Journal

Computer methods and programs in biomedicine
ISSN: 1872-7565
Abbreviated title: Comput Methods Programs Biomed
Country: Ireland
NLM ID: 8506513

Publication information

Publication date:
08 Apr 2024
History:
received: 18 Dec 2023
revised: 15 Mar 2024
accepted: 03 Apr 2024
medline: 17 Apr 2024
pubmed: 17 Apr 2024
entrez: 16 Apr 2024
Status: aheadofprint

Abstract

Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways. This study, inspired by well-established methods in drug discovery, employs machine learning on metabolite fingerprints to explore the relationship of their structure with responses in experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates fingerprinting effectiveness in representing metabolites, addressing challenges like class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and interpretable features. Feature importance analysis is then applied to reveal key chemical configurations affecting classification, identifying related metabolite groups. The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways, unveiling new affected metabolite groups for further study. In conclusion, the presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.
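The abstract does not name the fingerprint type, the classifier, or the feature importance measure, so the following is only a minimal sketch of the general idea under stated assumptions. It uses invented toy fingerprints and labels, merges bits that encode identical columns (duplicate structural encoding), drops constant bits, balances the classes with sample weights, and ranks the remaining bits by weighted information gain. Information gain is a stand-in chosen here for simplicity and is not claimed to be the authors' method; all names in the code are hypothetical.

```python
from collections import defaultdict
from math import log2

# Hypothetical toy data, invented for illustration only.
# Each metabolite is a 6-bit fingerprint; label 1 = responded, 0 = unchanged.
fingerprints = {
    "met_a": [1, 1, 0, 1, 1, 0],
    "met_b": [1, 1, 0, 1, 1, 1],
    "met_c": [1, 1, 0, 0, 0, 0],
    "met_d": [1, 0, 0, 0, 0, 0],
    "met_e": [1, 0, 0, 0, 0, 1],
    "met_f": [1, 0, 0, 0, 0, 0],
}
labels = {"met_a": 1, "met_b": 1, "met_c": 0, "met_d": 0, "met_e": 0, "met_f": 0}


def group_bits(rows):
    """Map each distinct non-constant column to the bit positions sharing it."""
    groups = defaultdict(list)
    for j, column in enumerate(zip(*rows)):
        if len(set(column)) > 1:  # constant bits carry no signal
            groups[column].append(j)
    return groups


def entropy(mass):
    """Shannon entropy (bits) of a class -> total weight mapping."""
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return -sum(w / total * log2(w / total) for w in mass.values() if w > 0)


def weighted_information_gain(column, y, sample_weight):
    """Entropy reduction from splitting on one bit, using weighted samples."""
    def class_mass(indices):
        mass = defaultdict(float)
        for i in indices:
            mass[y[i]] += sample_weight[i]
        return mass

    everyone = range(len(y))
    total = sum(sample_weight)
    gain = entropy(class_mass(everyone))
    for value in (0, 1):
        subset = [i for i in everyone if column[i] == value]
        share = sum(sample_weight[i] for i in subset) / total
        gain -= share * entropy(class_mass(subset))
    return gain


def rank_bit_groups(fingerprints, labels):
    """Return (gain, bit positions) pairs, most informative group first."""
    names = sorted(fingerprints)
    rows = [fingerprints[name] for name in names]
    y = [labels[name] for name in names]
    counts = {c: y.count(c) for c in set(y)}
    # Balanced weights: every class contributes the same total mass.
    weights = [len(y) / (len(counts) * counts[c]) for c in y]
    scored = [
        (weighted_information_gain(column, y, weights), bits)
        for column, bits in group_bits(rows).items()
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


ranking = rank_bit_groups(fingerprints, labels)
for gain, bits in ranking:
    print(f"bits {bits}: gain {gain:.3f}")
```

In a real analysis the fingerprints would come from a cheminformatics toolkit and the ranking from a trained model; grouping duplicate bits before scoring keeps one substructure from being counted several times in the importance list.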

Identifiers

pubmed: 38626559
pii: S0169-2607(24)00159-7
doi: 10.1016/j.cmpb.2024.108163

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

108163

Copyright information

Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Christel Sirocchi (C)

Department of Pure and Applied Sciences, University of Urbino, Piazza della Repubblica, 13, Urbino, 61029, Italy. Electronic address: c.sirocchi2@campus.uniurb.it.

Federica Biancucci (F)

Department of Biomolecular Sciences, University of Urbino, Via Saffi 2, Urbino, 61029, Italy.

Matteo Donati (M)

Department of Pure and Applied Sciences, University of Urbino, Piazza della Repubblica, 13, Urbino, 61029, Italy.

Alessandro Bogliolo (A)

Department of Pure and Applied Sciences, University of Urbino, Piazza della Repubblica, 13, Urbino, 61029, Italy.

Mauro Magnani (M)

Department of Biomolecular Sciences, University of Urbino, Via Saffi 2, Urbino, 61029, Italy.

Michele Menotta (M)

Department of Biomolecular Sciences, University of Urbino, Via Saffi 2, Urbino, 61029, Italy.

Sara Montagna (S)

Department of Pure and Applied Sciences, University of Urbino, Piazza della Repubblica, 13, Urbino, 61029, Italy.

MeSH classifications