Balancing multiple objectives in conformation sampling to control decoy diversity in template-free protein structure prediction.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
25 Apr 2019
History:
received: 07 Jan 2019
accepted: 04 Apr 2019
entrez: 27 Apr 2019
pubmed: 27 Apr 2019
medline: 30 May 2019
Status: epublish

Abstract

Computational approaches for determining the biologically active (native) three-dimensional structures of proteins with novel sequences must handle several challenges. The conformation space, that is, the space of possible three-dimensional arrangements of the chain of amino acids that constitutes a protein molecule, is vast and high-dimensional. Conformation spaces are explored by sampling, and the search is biased by the internal energy, which sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and yield overly rugged energy surfaces riddled with artifact local minima. The usual response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys), which increases the likelihood of obtaining a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations.
In this paper we pursue a complementary approach and propose to control the diversity of generated decoys directly. Inspired by hard optimization problems in high-dimensional and non-linear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and can also access lower-energy regions of the energy landscape of a given protein, with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms.
The presented results constitute a promising research direction for improving decoy generation in template-free protein structure prediction with regard to balancing multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are directions of research that attenuate dependence on protein energy models.
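To make the multi-objective framing concrete, the following is a minimal sketch of Pareto-based survivor selection, one common way evolutionary algorithms balance conflicting objectives. It is not the authors' algorithm: the abstract does not specify the objectives or the selection operator, so the two-objective score tuples, the function names (`dominates`, `pareto_rank`, `select_survivors`) and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each decoy is scored by a tuple of objectives,
# all of which are to be minimized (for example, separate energy terms).
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_rank(scores: List[Objectives]) -> List[int]:
    """For each decoy, count how many decoys dominate it (0 means non-dominated)."""
    return [sum(dominates(other, s) for other in scores) for s in scores]


def select_survivors(scores: List[Objectives], k: int) -> List[int]:
    """Return indices of k decoys: fewest dominators first.

    Ties are broken by the first objective (an assumption of this sketch).
    """
    ranks = pareto_rank(scores)
    order = sorted(range(len(scores)), key=lambda i: (ranks[i], scores[i][0]))
    return order[:k]


# Made-up scores for four decoys, two objectives each (illustrative values only).
example_scores: List[Objectives] = [
    (-120.0, 30.0),
    (-100.0, 10.0),
    (-90.0, 40.0),
    (-130.0, 50.0),
]
example_ranks = pareto_rank(example_scores)
example_survivors = select_survivors(example_scores, 3)
```

In this toy population the third decoy is dominated by two others and is dropped, while the three mutually non-dominated decoys survive even though they differ widely in each individual objective. Selecting on dominance rather than on a single summed energy is what lets such a scheme retain diverse decoys.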

Identifiers

pubmed: 31023237
doi: 10.1186/s12859-019-2794-5
pii: 10.1186/s12859-019-2794-5
pmc: PMC6485169

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

211

Grants

Agency: National Science Foundation
ID: 1763233


Authors

Ahmed Bin Zaman (AB)

Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA.

Amarda Shehu (A)

Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA.
Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA.
School of Systems Biology, George Mason University, Manassas, 20110, VA, USA.
