Interoperable and scalable data analysis with microservices: applications in metabolomics.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
1 October 2019
History:
received: 22 August 2018
revised: 25 February 2019
accepted: 8 March 2019
pubmed: 10 March 2019
medline: 9 June 2020
entrez: 10 March 2019
Status: ppublish

Abstract

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.

The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.

Supplementary data are available at Bioinformatics online.
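The idea of wrapping each tool in a container and running the steps as jobs on Kubernetes can be illustrated with a minimal sketch. It only builds Kubernetes `batch/v1` Job manifests as plain Python dictionaries, one per workflow step; it is not the PhenoMeNal implementation, and the step names, image names and commands are hypothetical placeholders chosen for illustration.

```python
import json


def make_job(step_name, image, command):
    """Build a Kubernetes batch/v1 Job manifest (as a dict) for one containerized tool."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": step_name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": step_name, "image": image, "command": command}
                    ],
                    # Job pods must use "Never" or "OnFailure" as restart policy.
                    "restartPolicy": "Never",
                }
            },
            # Number of retries before the Job is marked as failed.
            "backoffLimit": 2,
        },
    }


# Hypothetical three-step workflow (assumption): names, images and commands
# are placeholders and do not refer to real PhenoMeNal containers.
WORKFLOW = [
    ("preprocess", "example.org/tools/preprocess:latest", ["preprocess", "--in", "/data/raw"]),
    ("statistics", "example.org/tools/statistics:latest", ["statistics", "--in", "/data/features"]),
    ("identify", "example.org/tools/identify:latest", ["identify", "--in", "/data/significant"]),
]


def build_workflow(steps):
    """Return one Job manifest per step, in the order the steps should run."""
    return [make_job(name, image, command) for name, image, command in steps]


if __name__ == "__main__":
    for job in build_workflow(WORKFLOW):
        print(json.dumps(job, indent=2))
```

Each printed manifest could be saved to a file and submitted with `kubectl apply -f`. In practice a workflow engine would submit the jobs and wait for each one to finish before starting the next; this sketch leaves that part out.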

Identifiers

pubmed: 30851093
pii: 5372675
doi: 10.1093/bioinformatics/btz160
pmc: PMC6761976

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3752-3760

Grants

Agency: Biotechnology and Biological Sciences Research Council
ID: BB/H024921/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/I000771/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/L024055/1
Country: United Kingdom

Copyright information

© The Author(s) 2019. Published by Oxford University Press.

Authors

Payam Emami Khoonsari (P)

Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.

Pablo Moreno (P)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

Sven Bergmann (S)

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Joachim Burman (J)

Department of Neuroscience, Uppsala University, Uppsala, Sweden.

Marco Capuccini (M)

Department of Information Technology, Uppsala University, Uppsala, Sweden.
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

Matteo Carone (M)

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

Marta Cascante (M)

Department of Biochemistry and Molecular Biomedicine, and Institute of Biomedicine (IBUB), Faculty of Biology, Universitat de Barcelona (IBUB), Barcelona, Spain.
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) and Metabolomics Node at INB-Bioinformatics Platform, Instituto de Salud Carlos III (ISCIII), Madrid, Spain.

Pedro de Atauri (P)

Department of Biochemistry and Molecular Biomedicine, and Institute of Biomedicine (IBUB), Faculty of Biology, Universitat de Barcelona (IBUB), Barcelona, Spain.
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) and Metabolomics Node at INB-Bioinformatics Platform, Instituto de Salud Carlos III (ISCIII), Madrid, Spain.

Carles Foguet (C)

Department of Biochemistry and Molecular Biomedicine, and Institute of Biomedicine (IBUB), Faculty of Biology, Universitat de Barcelona (IBUB), Barcelona, Spain.
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) and Metabolomics Node at INB-Bioinformatics Platform, Instituto de Salud Carlos III (ISCIII), Madrid, Spain.

Alejandra N Gonzalez-Beltran (AN)

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.

Thomas Hankemeier (T)

Division of Analytical Biosciences, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands.

Kenneth Haug (K)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

Sijin He (S)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

Stephanie Herman (S)

Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.
Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

David Johnson (D)

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.

Namrata Kale (N)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.

Anders Larsson (A)

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden.

Steffen Neumann (S)

Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Halle, Germany.
German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany.

Kristian Peters (K)

Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Halle, Germany.

Luca Pireddu (L)

CRS4: Center for Advanced Studies, Research and Development in Sardinia, Distributed Computing Group, Pula, Italy.

Philippe Rocca-Serra (P)

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.

Pierrick Roger (P)

CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB, Gif-sur-Yvette, France.

Rico Rueedi (R)

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Christoph Ruttkies (C)

Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Halle, Germany.

Noureddin Sadawi (N)

Faculty of Medicine, Department of Surgery & Cancer, Imperial College London, London, UK.

Reza M Salek (RM)

International Agency for Research on Cancer, 69372 Lyon CEDEX 08, France.

Susanna-Assunta Sansone (SA)

Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, UK.

Daniel Schober (D)

Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Halle, Germany.

Vitaly Selivanov (V)

Department of Biochemistry and Molecular Biomedicine, and Institute of Biomedicine (IBUB), Faculty of Biology, Universitat de Barcelona (IBUB), Barcelona, Spain.
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD) and Metabolomics Node at INB-Bioinformatics Platform, Instituto de Salud Carlos III (ISCIII), Madrid, Spain.

Etienne A Thévenot (EA)

CEA, LIST, Laboratory for Data Analysis and Systems' Intelligence, MetaboHUB, Gif-sur-Yvette, France.

Michael van Vliet (M)

Division of Analytical Biosciences, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands.

Gianluigi Zanetti (G)

CRS4: Center for Advanced Studies, Research and Development in Sardinia, Distributed Computing Group, Pula, Italy.

Christoph Steinbeck (C)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Jena, Germany.

Kim Kultima (K)

Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.

Ola Spjuth (O)

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
