Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework.

bioinformatics; computational workflows; data-independent acquisition; Galaxy; mass spectrometry; proteomics

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
15 02 2022
History:
received: 27 07 2021
revised: 26 11 2021
accepted: 12 01 2022
entrez: 15 2 2022
pubmed: 16 2 2022
medline: 5 4 2022
Status: ppublish

Abstract

Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity of the data and the large data volumes and sample sizes, which require specialized software and extensive computing infrastructure. Most available open-source DIA software requires basic programming skills and covers only a fraction of a complete DIA data analysis. Consequently, DIA data analysis often requires multiple software tools that must be compatible with one another, which severely limits usability and reproducibility. To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for use on the freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with their full configuration, in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by analyzing a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community. The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework, in combination with extensive training material, empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.

Identifiers

pubmed: 35166338
pii: 6528772
doi: 10.1093/gigascience/giac005
pmc: PMC8848309

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2022. Published by Oxford University Press GigaScience.

Authors

Matthias Fahrner (M)

Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Straße 115a, D-79106 Freiburg, Germany.
Faculty of Biology, Albert-Ludwigs-University Freiburg, Schänzlestraße 1, D-79104 Freiburg, Germany.
Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Albertstraße 19A, D-79104 Freiburg, Germany.

Melanie Christine Föll (MC)

Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Straße 115a, D-79106 Freiburg, Germany.
Khoury College of Computer Sciences, Northeastern University, 440 Huntington Ave, Boston, MA 02115, USA.

Björn Andreas Grüning (BA)

Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany.

Matthias Bernt (M)

Young Investigators Group Bioinformatics and Transcriptomics, Helmholtz Centre for Environmental Research-UFZ, Permoserstraße 15, D-04318 Leipzig, Germany.

Hannes Röst (H)

Donnelly Centre, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada.

Oliver Schilling (O)

Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Straße 115a, D-79106 Freiburg, Germany.
German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Hugstetter Straße 55, D-79106 Freiburg, Heidelberg, Germany.
BIOSS Centre for Biological Signaling Studies, University of Freiburg, Schänzlestraße 18, D-79104 Freiburg, Germany.
