Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.

Locality-sensitive hashing; Mass spectrometry; Signal processing

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
20 Jul 2022
History:
received: 01 Jul 2021
accepted: 08 Jul 2022
entrez: 20 Jul 2022
pubmed: 21 Jul 2022
medline: 23 Jul 2022
Status: epublish

Abstract

Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties. In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through appropriate choice of algorithm parameters it is possible to balance false-positive and false-negative rates. On synthetic data, a superior performance compared to an intensity thresholding approach was achieved. Real data could be strongly reduced without losing relevant information. Our implementation scaled up to 32 threads and supports acceleration by GPUs. Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Generated data and code are available at https://github.com/hildebrandtlab/mzBucket. Raw data is available at https://zenodo.org/record/5036526.
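The following is a minimal sketch of the general idea behind locality-sensitive hashing, not the mzBucket implementation. It assumes a random-hyperplane hash family and hypothetical helper names (`make_tables`, `signature`, `collides`), none of which come from the record above. Two intensity windows count as similar if their bit signatures agree in at least one hash table.

```python
import random

def make_tables(dim, n_bits, n_tables, seed=0):
    """Draw n_tables independent sets of n_bits random hyperplanes in R^dim."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        for _ in range(n_tables)
    ]

def signature(vec, planes):
    """Hash key: one bit per hyperplane, set by the side the vector falls on."""
    return tuple(
        sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
    )

def collides(a, b, tables):
    """AND over the bits within a table, OR across the tables."""
    return any(signature(a, planes) == signature(b, planes) for planes in tables)

# Hypothetical binned intensities of one m/z window in two neighbouring scans.
scan_a = [0.0, 5.0, 9.0, 4.0, 1.0, 0.0]
scan_b = [2.0 * x for x in scan_a]  # same peak pattern, doubled intensity
flipped = [-x for x in scan_a]      # artificial vector pointing the opposite way

tables = make_tables(dim=len(scan_a), n_bits=8, n_tables=4)
print(collides(scan_a, scan_b, tables))
print(collides(scan_a, flipped, tables))
```

In this hash family the sign of a projection does not depend on the overall intensity scale, so a pattern and its scaled copy share a signature. Under the assumptions of this sketch, raising `n_bits` makes collisions rarer (fewer false positives), while raising `n_tables` makes them more likely (fewer false negatives), which is one way to read the parameter trade-off mentioned in the abstract.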

Identifiers

pubmed: 35858828
doi: 10.1186/s12859-022-04833-5
pii: 10.1186/s12859-022-04833-5
pmc: PMC9301846

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

287

Grants

Agency: Deutsche Forschungsgemeinschaft
ID: 329350978
Agency: Bundesministerium für Bildung und Forschung
ID: 031L0217B
Agency: Bundesministerium für Bildung und Forschung
ID: 031L0217A

Copyright information

© 2022. The Author(s).

Authors

Konstantin Bob (K)

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

David Teschner (D)

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Thomas Kemmer (T)

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

David Gomez-Zepeda (D)

Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.
Immunoproteomics Unit, Helmholtz-Institute for Translational Oncology (HI-TRON) Mainz, D-55131, Mainz, Germany.

Stefan Tenzer (S)

Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.
Immunoproteomics Unit, Helmholtz-Institute for Translational Oncology (HI-TRON) Mainz, D-55131, Mainz, Germany.

Bertil Schmidt (B)

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Andreas Hildebrandt (A)

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany. andreas.hildebrandt@uni-mainz.de.
