An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics.

Tandem Mass Spectrometry / methods; Proteomics / methods; Algorithms; Databases, Protein; Peptides / chemistry; Proteins / chemistry

Journal

Bioinformatics (Oxford, England)

ISSN: 1367-4811

Abbreviated title: Bioinformatics

Country: England

NLM ID: 9808944

Publication information

Publication date:
28 Jun 2024

History:

medline: 28 6 2024

pubmed: 28 6 2024

entrez: 28 6 2024

Status: ppublish

Abstract

Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or from physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, pairs of chemically linked peptides must be accurately identified, a process that requires (i) database search, which creates a ranked list of candidate peptide pairs for each experimental spectrum, and (ii) false discovery rate (FDR) estimation, which determines the probability of a false match in a group of top-ranked peptide pairs with scores above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that impact the estimation accuracy and increase run time over potential decoy-free approaches (DFAs).

We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time.

Availability and implementation: https://github.com/shawn-peng/xlms
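To make the idea of mixture-based FDR estimation concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a three-component skew normal mixture has already been fitted and estimates the FDR above a score threshold by averaging posterior error probabilities. The component parameters, the function names, and the choice to count partially incorrect pairs as false matches are all assumptions made for illustration; the constrained expectation-maximization fit and the use of second-ranked matches described in the abstract are not shown.

```python
import math


def skew_normal_pdf(x, loc, scale, shape):
    """Density of a skew normal distribution: (2/scale) * phi(z) * Phi(shape*z)."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi


# Hypothetical, hand-picked parameters (weight, loc, scale, shape).
# In practice these would come from fitting the mixture to search scores.
COMPONENTS = {
    "correct": (0.3, 30.0, 6.0, 2.0),
    "partial": (0.3, 18.0, 5.0, 1.0),
    "incorrect": (0.4, 10.0, 4.0, 1.0),
}


def posterior_correct(score, components=COMPONENTS):
    """Posterior probability that a match with this score is fully correct."""
    weighted = {
        name: w * skew_normal_pdf(score, loc, scale, shape)
        for name, (w, loc, scale, shape) in components.items()
    }
    total = sum(weighted.values())
    return weighted["correct"] / total if total > 0.0 else 0.0


def estimate_fdr(scores, threshold, components=COMPONENTS):
    """Mean posterior error probability among matches scoring >= threshold.

    Assumption: partially incorrect and incorrect pairs both count as false.
    """
    accepted = [s for s in scores if s >= threshold]
    if not accepted:
        return 0.0
    errors = sum(1.0 - posterior_correct(s, components) for s in accepted)
    return errors / len(accepted)


if __name__ == "__main__":
    example_scores = [5.0, 12.0, 20.0, 28.0, 35.0, 40.0]  # made-up scores
    for t in (0.0, 20.0, 35.0):
        print(t, round(estimate_fdr(example_scores, t), 4))
```

Because no decoy database is searched, the estimate depends entirely on how well the fitted components describe the observed score distribution.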

Identifiers

DOI: 10.1093/bioinformatics/btae233 PMID: 38940171

pubmed: 38940171

pii: 7700896

doi: 10.1093/bioinformatics/btae233

Chemical substances

Peptides 0

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

Pagination

i428-i436

Authors

Yisu Peng

Shantanu Jain

Predrag Radivojac
