Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework.
bioinformatics
computational workflows
data-independent acquisition
galaxy
mass spectrometry
proteomics
Journal
GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872
Publication information
Publication date:
15 02 2022
History:
received: 27 07 2021
revised: 26 11 2021
accepted: 12 01 2022
entrez: 15 02 2022
pubmed: 16 02 2022
medline: 05 04 2022
Status:
ppublish
Abstract
BACKGROUND
Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and the large data and sample sizes, which require specialized software and vast computing infrastructures. Most available open-source DIA software requires basic programming skills and covers only a fraction of a complete DIA data analysis. Consequently, DIA data analysis often requires the use of multiple software tools that must be compatible with one another, which severely limits usability and reproducibility.
FINDINGS
To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for use on the freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with their full configuration, in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by analyzing a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community.
CONCLUSION
The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework, in combination with extensive training material, empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.
Identifiers
pubmed: 35166338
pii: 6528772
doi: 10.1093/gigascience/giac005
pmc: PMC8848309
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2022. Published by Oxford University Press GigaScience.