Bayesian Hierarchical Model for Protein Identifications.
Gibbs sampling
Mass-Spectrometry
cluster
hierarchical model
protein identification
Journal
Journal of applied statistics
ISSN: 0266-4763
Abbreviated title: J Appl Stat
Country: England
NLM ID: 9883455
Publication information
Publication date:
2019
History:
entrez: 2019-05-21
pubmed: 2019-05-21
medline: 2019-05-21
Status:
ppublish
Abstract
In proteomics, identifying proteins in complex mixtures extracted from biological samples is an important problem. Among the experimental technologies, mass spectrometry (MS) is the most popular. Protein identification from MS data typically relies on a "two-step" procedure that first identifies the peptides and then, separately, identifies the proteins. In this setup, the interdependence of peptides and proteins is neglected, resulting in relatively inaccurate protein identification. In this article, we propose a Markov chain Monte Carlo (MCMC) based Bayesian hierarchical model, the first of its kind in protein identification, which integrates the two steps and performs a joint analysis of proteins and peptides using posterior probabilities. We remove the assumption of independence among proteins by assigning clustering group priors to the proteins, based on the assumption that proteins sharing the same biological pathway are correlated and are likely to be present or absent together. Because the complete conditionals of the proposed joint model are tractable, we propose and implement a Gibbs sampling scheme for full posterior inference that provides estimates and statistical uncertainties for all relevant parameters. The model has better operating characteristics than two existing "one-step" procedures across a range of simulation settings as well as on two well-studied datasets.
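The abstract does not give the model's equations, so the following is only a minimal sketch of how a Gibbs sampler with cluster-level priors on protein presence could look. It is not the authors' model. Every modelling choice here is an assumption made for illustration: the function name `gibbs_protein_presence`, the unit-variance Gaussian peptide scores with means `mu_absent` and `mu_present`, the rule that a peptide is "present" when any of its parent proteins is present, and the Beta(`a`, `b`) prior on each cluster's presence rate.

```python
import numpy as np

def gibbs_protein_presence(scores, parents, clusters, n_iter=2000, burn_in=500,
                           mu_absent=0.0, mu_present=3.0, a=1.0, b=1.0, seed=0):
    """Toy Gibbs sampler (illustrative assumption, not the published model).

    scores[i]   : observed score of peptide i
    parents[i]  : list of protein indices that could have produced peptide i
    clusters[j] : cluster (e.g. pathway) label of protein j, labelled 0..C-1
    Returns the estimated posterior presence probability of each protein.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    clusters = np.asarray(clusters)
    n_proteins = len(clusters)
    n_clusters = int(clusters.max()) + 1
    # children[j] lists the peptides that map to protein j
    children = [[i for i, ps in enumerate(parents) if j in ps]
                for j in range(n_proteins)]
    z = rng.integers(0, 2, size=n_proteins)   # protein presence indicators
    pi = np.full(n_clusters, 0.5)             # per-cluster presence rates
    presence_sum = np.zeros(n_proteins)

    def log_lik(i, present):
        # Gaussian log-likelihood of peptide i's score, up to a constant
        mu = mu_present if present else mu_absent
        return -0.5 * (scores[i] - mu) ** 2

    for it in range(n_iter):
        # Step 1: draw each z[j] from its full conditional
        for j in range(n_proteins):
            logp1 = np.log(pi[clusters[j]])
            logp0 = np.log1p(-pi[clusters[j]])
            for i in children[j]:
                other_present = any(z[k] for k in parents[i] if k != j)
                logp1 += log_lik(i, True)
                logp0 += log_lik(i, other_present)
            p1 = 1.0 / (1.0 + np.exp(logp0 - logp1))
            z[j] = rng.random() < p1
        # Step 2: conjugate Beta update of each cluster's presence rate
        for c in range(n_clusters):
            members = clusters == c
            k = z[members].sum()
            pi[c] = rng.beta(a + k, b + members.sum() - k)
        if it >= burn_in:
            presence_sum += z
    return presence_sum / (n_iter - burn_in)

# Hypothetical data: proteins 0 and 1 share a cluster and have high-scoring peptides
scores = [3.2, 2.8, 3.5, 0.1, -0.3, 0.2]
parents = [[0], [0, 1], [1], [2], [2, 3], [3]]
clusters = [0, 0, 1, 1]
posterior = gibbs_protein_presence(scores, parents, clusters)
```

In this sketch the shared cluster rate is what couples proteins in the same group: when one member is sampled as present, the updated rate raises the prior odds for the others on the next sweep.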
Identifiers
pubmed: 31105371
doi: 10.1080/02664763.2018.1454893
pmc: PMC6519717
mid: NIHMS1507092
Publication types
Journal Article
Languages
eng
Pagination
30-46
Grants
Agency: NCI NIH HHS
ID: R15 CA133844
Country: United States
Agency: NCI NIH HHS
ID: R15 CA170091
Country: United States
Conflict of interest statement
Disclosure statement: No potential conflict of interest.