SOLart: a structure-based method to predict protein solubility and aggregation.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 03 2020
History:
received: 03 06 2019
revised: 31 08 2019
accepted: 08 10 2019
pubmed: 12 10 2019
medline: 18 09 2020
entrez: 12 10 2019
Status: ppublish

Abstract

The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomics studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performs very well, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7, both in cross-validation on the training dataset and on an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-the-art solubility predictors. It is available through a user-friendly webserver that is easy for non-expert scientists to use. The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. Supplementary data are available at Bioinformatics online.
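The abstract reports SOLart's accuracy as the Pearson correlation coefficient between experimental and predicted solubility values, r = cov(x, y) / (σ_x σ_y). As a minimal sketch of how that standard metric is computed (this is not SOLart's own code), the function below implements the formula directly; the name `pearson_r` and the numeric values are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a factor of n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor of n).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    # The factors of n cancel in the ratio.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up values for illustration only (not from the paper).
experimental = [10.0, 35.0, 50.0, 80.0, 95.0]
predicted = [20.0, 30.0, 60.0, 70.0, 85.0]
r = pearson_r(experimental, predicted)
print(f"Pearson r = {r:.3f}")
```

Note that r measures only how linearly the predictions track the measurements; it is insensitive to a constant offset or scaling of the predicted values, which is one reason it is commonly reported for regression-style solubility predictors.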

Identifiers

pubmed: 31603466
pii: 5585748
doi: 10.1093/bioinformatics/btz773

Chemical substances

Escherichia coli Proteins 0
Solvents 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1445-1452

Copyright information

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Qingzhen Hou (Q)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue Roosevelt 50, 1050 Brussels, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium.

Jean Marc Kwasigroch (JM)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue Roosevelt 50, 1050 Brussels, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium.

Marianne Rooman (M)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue Roosevelt 50, 1050 Brussels, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium.

Fabrizio Pucci (F)

Computational Biology and Bioinformatics, Université Libre de Bruxelles, Avenue Roosevelt 50, 1050 Brussels, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium.
John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.
