Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework.

Computational Biology / methods; Mass Spectrometry; Proteomics / methods; Reproducibility of Results; Software

bioinformatics; computational workflows; data-independent acquisition; galaxy; mass spectrometry; proteomics

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date: 2022-02-15

History:

received: 2021-07-27

revised: 2021-11-26

accepted: 2022-01-12

entrez: 2022-02-15

pubmed: 2022-02-16

medline: 2022-04-05

Status: ppublish

Abstract

Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity of the data and the large data volumes and sample sizes, which require specialized software and vast computing infrastructures. Most available open-source DIA software requires basic programming skills and covers only a fraction of a complete DIA data analysis. Consequently, DIA data analysis often requires multiple software tools that must be compatible with one another, which severely limits usability and reproducibility. To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for use on the freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with their full configuration, in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by analyzing a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community. The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework, in combination with extensive training material, empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.

Identifiers

DOI: 10.1093/gigascience/giac005

PMID: 35166338

PMC: PMC8848309

PII: 6528772

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Authors

Matthias Fahrner (M)

Melanie Christine Föll (MC)

Björn Andreas Grüning (BA)

Matthias Bernt (M)

Hannes Röst (H)

Oliver Schilling (O)
