Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.
Keywords: Locality-sensitive hashing; Mass spectrometry; Signal processing
Journal: BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date: 20 Jul 2022
History:
received: 01 Jul 2021
accepted: 08 Jul 2022
entrez: 20 Jul 2022
pubmed: 21 Jul 2022
medline: 23 Jul 2022
Status: epublish
Abstract
Mass spectrometry is an important experimental technique in the field of proteomics. However, the analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data; second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions about the properties of the signals. In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through an appropriate choice of algorithm parameters, it is possible to balance false-positive and false-negative rates. On synthetic data, the approach achieved superior performance compared with an intensity thresholding approach. Real data could be strongly reduced without losing relevant information. Our implementation scaled out to up to 32 threads and supports GPU acceleration. Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Generated data and code are available at https://github.com/hildebrandtlab/mzBucket. Raw data are available at https://zenodo.org/record/5036526.
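The abstract states that the choice of algorithm parameters balances false-positive and false-negative rates. The following is a minimal, generic sketch of why this holds for banded random-hyperplane LSH (SimHash-style hashing of intensity vectors). It is not the mzBucket implementation: the hash family, the helper names (hyperplane_collision_prob, band_match_prob, signature, bucket_keys) and all parameter values are assumptions chosen for illustration. A signature of rows × bands bits is split into bands, and two vectors become candidates if any band matches exactly.

```python
import math
import random


def hyperplane_collision_prob(cos_sim: float) -> float:
    """Probability that a single random-hyperplane bit agrees for two vectors
    with the given cosine similarity: 1 - theta / pi."""
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding outside [-1, 1]
    return 1.0 - math.acos(cos_sim) / math.pi


def band_match_prob(cos_sim: float, rows: int, bands: int) -> float:
    """Probability that two vectors share at least one bucket when the signature
    is split into `bands` bands of `rows` bits each: 1 - (1 - p**rows)**bands."""
    p = hyperplane_collision_prob(cos_sim)
    return 1.0 - (1.0 - p ** rows) ** bands


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0."""
    return tuple(int(sum(v * w for v, w in zip(vec, plane)) >= 0) for plane in planes)


def bucket_keys(sig, rows):
    """Split a signature into band keys; each (band index, bits) key is one bucket."""
    return [(i, sig[i * rows:(i + 1) * rows]) for i in range(len(sig) // rows)]


if __name__ == "__main__":
    # Parameter sweep with illustrative (hypothetical) values, not taken from the paper.
    for rows, bands in [(4, 8), (8, 8), (8, 32)]:
        probs = [round(band_match_prob(s, rows, bands), 3) for s in (0.5, 0.8, 0.95)]
        print(f"rows={rows:2d} bands={bands:2d} P(candidate | cos=0.5, 0.8, 0.95) = {probs}")

    # Hash two toy intensity vectors with the same random hyperplanes.
    rng = random.Random(0)
    dim, rows, bands = 16, 4, 8
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows * bands)]
    a = [rng.random() for _ in range(dim)]
    b = [x + rng.gauss(0.0, 0.05) for x in a]  # slightly perturbed copy of a
    keys_a = set(bucket_keys(signature(a, planes), rows))
    keys_b = set(bucket_keys(signature(b, planes), rows))
    print(f"shared buckets: {len(keys_a & keys_b)} of {bands}")
```

In this generic scheme, more rows per band make each band harder to match, which suppresses weakly similar pairs (fewer false positives), while more bands give a true match more chances to collide somewhere (fewer false negatives).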
Identifiers
pubmed: 35858828
doi: 10.1186/s12859-022-04833-5
pii: 10.1186/s12859-022-04833-5
pmc: PMC9301846
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination: 287
Grants:
Agency: Deutsche Forschungsgemeinschaft, ID: 329350978
Agency: Bundesministerium für Bildung und Forschung, ID: 031L0217B
Agency: Bundesministerium für Bildung und Forschung, ID: 031L0217A
Copyright information
© 2022. The Author(s).