Multimodal regularized linear models with flux balance analysis for mechanistic integration of omics data.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
25 Oct 2021
History:
received: 16 08 2020
revised: 06 01 2021
accepted: 27 04 2021
medline: 12 5 2021
pubmed: 12 5 2021
entrez: 11 5 2021
Status: ppublish

Abstract

Thanks to technological advances, high-throughput biological data have become cheaper to collect, leading to the availability of vast amounts of omic data of different types. In parallel, the in silico reconstruction and modeling of metabolic systems is now acknowledged as a key tool to complement experimental data on a large scale. The integration of this model- and data-driven information is therefore emerging as a new challenge in systems biology, with no clear guidance on how best to take advantage of the inherent multisource and multiomic nature of these data types while preserving mechanistic interpretation. Here, we investigate different regularization techniques for high-dimensional data derived from the integration of gene expression profiles with metabolic flux data, extracted from strain-specific metabolic models, to improve cellular growth rate predictions. To this end, we propose ad hoc extensions of previous regularization frameworks, including group, view-specific and principal component regularization, and experimentally compare them using data from 1143 Saccharomyces cerevisiae strains. We observe a divergence between methods in terms of regression accuracy and integration effectiveness based on the type of regularization employed. In multiomic regression tasks, when learning from experimental and model-generated omic data, our results demonstrate the competitiveness and ease of interpretation of multimodal regularized linear models compared to data-hungry methods based on neural networks. All data, models and code produced in this work are available on GitHub at https://github.com/Angione-Lab/HybridGroupIPFLasso_pc2Lasso. Supplementary data are available at Bioinformatics online.
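
As an illustration only, the view-specific regularization mentioned in the abstract can be sketched as a lasso whose L1 penalty differs between data views, here a block of gene expression features and a block of flux features. This is not the authors' implementation (their code is in the GitHub repository cited above): the function names, the plain coordinate-descent solver, the synthetic data and the penalty values are all assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def view_weighted_lasso(X, y, penalties, n_iter=200):
    """Coordinate descent for the objective
    (1/2n) * ||y - X b||^2 + sum_j penalties[j] * |b_j|   (no intercept).

    penalties holds one L1 weight per feature, so every feature of a view
    can share that view's penalty (hypothetical helper, illustration only).
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero feature carries no signal
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new
    return coef


# Synthetic stand-in data (assumption): 4 "expression" and 3 "flux" features.
rng = random.Random(0)
n_samples, n_expr, n_flux = 60, 4, 3
X = [[rng.gauss(0, 1) for _ in range(n_expr + n_flux)] for _ in range(n_samples)]
true_coef = [1.5, 0.0, 0.0, 0.0, -2.0, 0.0, 0.0]
y = [sum(x * b for x, b in zip(row, true_coef)) + rng.gauss(0, 0.1) for row in X]

# View-specific penalties (arbitrary values): stronger shrinkage on the
# expression view than on the flux view.
penalties = [0.2] * n_expr + [0.05] * n_flux
coef = view_weighted_lasso(X, y, penalties)
print([round(c, 3) for c in coef])
```

In practice the per-view penalties would be tuned, for instance by cross-validation over a grid of penalty ratios, rather than fixed by hand as in this sketch.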

Identifiers

pubmed: 33974036
pii: 6273576
doi: 10.1093/bioinformatics/btab324

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

3546-3552

Grants

Agency: UKRI Research England's THYME project
Agency: Children's Liver Disease Foundation Research

Copyright information

© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Giuseppe Magazzù (G)

School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.

Guido Zampieri (G)

School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.
Department of Biology, University of Padova, Padova, Italy.

Claudio Angione (C)

School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.
Healthcare Innovation Centre, Teesside University, Middlesbrough, UK.
Centre for Digital Innovation, Teesside University, Middlesbrough, UK.

MeSH classifications