ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics.

Keywords: bioinformatics; community platform; deep learning; educational platform; machine learning; proteomics

Journal

Journal of proteome research
ISSN: 1535-3907
Abbreviated title: J Proteome Res
Country: United States
NLM ID: 101128775

Publication information

Publication date:
2023-02-03
History:
pubmed: 2023-01-25
medline: 2023-02-07
entrez: 2023-01-24
Status: ppublish

Abstract

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Because predictive proteomics is an emerging field, labs that predict peptide behavior in LC-MS setups often use unique and complex data processing pipelines to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.

Identifiers

pubmed: 36693629
doi: 10.1021/acs.jproteome.2c00629
pmc: PMC9903315

Chemical substances

Peptides 0

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

632-636

Grants

Agency: NIA NIH HHS
ID: U19 AG023122
Country: United States
Agency: NIGMS NIH HHS
ID: R24 GM127667
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM087221
Country: United States
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/V018779/1
Country: United Kingdom
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/S01781X/1
Country: United Kingdom
Agency: Wellcome Trust
ID: 223745/Z/21/Z
Country: United Kingdom

Authors

Tobias G Rehfeldt (TG)

Institute for Mathematics and Computer Science, University of Southern Denmark, 5000 Odense, Denmark.

Ralf Gabriels (R)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium.

Robbin Bouwmeester (R)

VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium.
Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium.

Siegfried Gessulat (S)

MSAID GmbH, Berlin 10559, Germany.

Benjamin A Neely (BA)

National Institute of Standards and Technology, Charleston, South Carolina 29412, United States.

Magnus Palmblad (M)

Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands.

Yasset Perez-Riverol (Y)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

Tobias Schmidt (T)

MSAID GmbH, Garching b. Munich 85748, Germany.

Juan Antonio Vizcaíno (JA)

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.

Eric W Deutsch (EW)

Institute for Systems Biology, Seattle, Washington 98109, United States.
