SOLart: a structure-based method to predict protein solubility and aggregation.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Titre abrégé: Bioinformatics
Pays: England
ID NLM: 9808944
Informations de publication
Date de publication:
01 03 2020
01 03 2020
Historique:
received:
03
06
2019
revised:
31
08
2019
accepted:
08
10
2019
pubmed:
12
10
2019
medline:
18
9
2020
entrez:
12
10
2019
Statut:
ppublish
Résumé
The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomic studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performances are very good, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7 both in cross-validation on the training dataset and in an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-art solubility predictors. It is available through a user-friendly webserver, which is easy to use by non-expert scientists. The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. Supplementary data are available at Bioinformatics online.
Identifiants
pubmed: 31603466
pii: 5585748
doi: 10.1093/bioinformatics/btz773
doi:
Substances chimiques
Escherichia coli Proteins
0
Solvents
0
Types de publication
Journal Article
Research Support, Non-U.S. Gov't
Langues
eng
Sous-ensembles de citation
IM
Pagination
1445-1452Informations de copyright
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.