Predicting protein residue-residue contacts using random forests and deep networks.

Keywords: Direct coupling analysis; Protein; Random forest; Residue-residue contact prediction; Web server

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
14 Mar 2019
History:
entrez: 16 Mar 2019
pubmed: 16 Mar 2019
medline: 3 May 2019
Status: epublish

Abstract

BACKGROUND
The ability to predict which pairs of amino acid residues in a protein are in contact with each other offers many advantages for various areas of research that focus on proteins. For example, contact prediction can be used to reduce the computational complexity of predicting the structure of proteins and even to help identify functionally important regions of proteins. These predictions are becoming especially important given the relatively low number of experimentally determined protein structures compared to the amount of available protein sequence data.

RESULTS
Here we have developed and benchmarked a set of machine learning methods for performing residue-residue contact prediction, including random forests, direct-coupling analysis, support vector machines, and deep networks (stacked denoising autoencoders). These methods are able to predict contacting residue pairs given only the amino acid sequence of a protein. According to our own evaluations performed at a resolution of +/- two residues, the predictors we trained with the random forest algorithm were our top-performing methods, with average top-10 prediction accuracy scores of 85.13% (short range), 74.49% (medium range), and 54.49% (long range). Our ensemble models (stacked denoising autoencoders combined with support vector machines) were our best-performing deep network predictors and achieved top-10 prediction accuracy scores of 75.51% (short range), 60.26% (medium range), and 43.85% (long range) using the same evaluation. These tests were blindly performed on targets from the CASP11 dataset, and the results suggested that our models achieved comparable performance to contact predictors developed by groups that participated in CASP11.

CONCLUSIONS
Due to the challenging nature of contact prediction, it is beneficial to develop and benchmark a variety of different prediction methods. Our work has produced useful tools with a simple interface that can provide contact predictions to users without requiring a lengthy installation process. In addition, we have released our C++ implementation of the direct-coupling analysis method as a standalone software package. Both this tool and our RFcon web server are freely available to the public at http://dna.cs.miami.edu/RFcon/.
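
The evaluation described in the abstract (top-10 accuracy per sequence-separation range, scored at a resolution of +/- two residues) could be sketched roughly as follows. This is an illustrative sketch and not the authors' code. It rests on several assumptions that the abstract does not state: the range boundaries follow the common CASP convention (short 6-11, medium 12-23, long 24 or more residues apart); a prediction counts as correct when a true contact lies within `tol` positions of both of its residue indices; and accuracy is divided by the number of pairs actually ranked. The names `RANGES`, `is_hit`, and `top_k_accuracy` are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Pair = Tuple[int, int]

# Assumed CASP-style sequence-separation bins (not given in the abstract).
RANGES = {"short": (6, 11), "medium": (12, 23), "long": (24, None)}


def in_range(pair: Pair, lo: int, hi: Optional[int]) -> bool:
    """True if the sequence separation |i - j| falls inside [lo, hi]."""
    sep = abs(pair[0] - pair[1])
    return sep >= lo and (hi is None or sep <= hi)


def is_hit(pred: Pair, truth: Set[Pair], tol: int = 2) -> bool:
    """Assumed reading of '+/- two residues': a predicted pair is a hit if
    some true contact lies within tol positions of both of its indices."""
    i, j = sorted(pred)
    return any(
        (i + di, j + dj) in truth
        for di in range(-tol, tol + 1)
        for dj in range(-tol, tol + 1)
    )


def top_k_accuracy(
    scored: Iterable[Tuple[Pair, float]],
    true_contacts: Iterable[Pair],
    range_name: str,
    k: int = 10,
    tol: int = 2,
) -> float:
    """Fraction of the k highest-scoring predictions in one range that are hits."""
    lo, hi = RANGES[range_name]
    # Store true contacts with the smaller index first so lookups are order-free.
    truth = {(min(p), max(p)) for p in true_contacts}
    candidates = [s for s in scored if in_range(s[0], lo, hi)]
    ranked = sorted(candidates, key=lambda s: s[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(is_hit(pair, truth, tol) for pair, _ in ranked)
    return hits / len(ranked)
```

For instance, calling `top_k_accuracy(scored, true_contacts, "long")` on a list of `((i, j), score)` predictions for one target would give that target's long-range top-10 accuracy under these assumptions; averaging over targets would then give figures comparable in kind to those reported above.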


Identifiers

pubmed: 30871477
doi: 10.1186/s12859-019-2627-6
pii: 10.1186/s12859-019-2627-6
pmc: PMC6419322

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

100

Grants

Agency: NIGMS NIH HHS
ID: R15 GM120650
Country: United States

Authors

Joseph Luttrell (J)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA.

Tong Liu (T)

Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA.

Chaoyang Zhang (C)

School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA.

Zheng Wang (Z)

Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA. zheng.wang@miami.edu.
