rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics.

MeSH terms: Amino Acid Sequence; Computational Biology/methods; Computing Methodologies; Proteins/chemistry; Reproducibility of Results; Software

Keywords: Computational protein design; Data analysis; Protein structural metrics; Scoring; rstoolbox

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
15 May 2019

History:

received: 23 January 2019

accepted: 8 April 2019

entrez: 17 May 2019

pubmed: 17 May 2019

medline: 21 June 2019

Status: epublish

Abstract

Large-scale datasets of protein structures and sequences are becoming ubiquitous in many domains of biological research. Experimental approaches and computational modelling methods are generating biological data at an unprecedented rate. The detailed analysis of structure-sequence relationships is critical for unveiling the governing principles of protein folding, stability and function. Computational protein design (CPD) has emerged as an important structure-based approach to engineer proteins for novel functions. Generally, CPD workflows rely on the generation of large numbers of structural models to search for optimal structure-sequence configurations. As such, an important step of the CPD process is the selection of a small subset of sequences to be experimentally characterized. Given the limitations of current CPD scoring functions, multi-step design protocols and elaborate analysis of the decoy populations have become essential for selecting sequences for experimental characterization and for the success of CPD strategies. Here, we present the rstoolbox, a Python library for the analysis of large-scale structural data tailored for CPD applications. rstoolbox is oriented towards both CPD software users and developers and is easily integrated into analysis workflows. For users, it offers the ability to profile and select decoy sets, which may guide multi-step design protocols or the choice of candidates for follow-up experimental characterization. rstoolbox provides intuitive solutions for the visualization of large sequence/structure datasets (e.g. logo plots and heatmaps) and facilitates the analysis of experimental data obtained through traditional biochemical techniques (e.g. circular dichroism and surface plasmon resonance) and high-throughput sequencing. For CPD software developers, it provides a framework to easily benchmark and compare different CPD approaches. Here, we showcase the rstoolbox in both types of applications.

rstoolbox is a library for the evaluation of protein structure datasets, tailored for CPD data. It provides interactive access through seamless integration with IPython, while still being suitable for high-performance computing. In addition to its functionalities for data analysis and graphical representation, the inclusion of rstoolbox in protein design pipelines will make it easy to standardize the selection of design candidates, as well as improve the overall reproducibility and robustness of CPD selection processes.
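The abstract describes selecting a small subset of decoys out of a large population of design models. As a minimal, hypothetical sketch of that kind of selection in plain Python (this is not the rstoolbox API; the `Decoy` fields, the `select_decoys` helper, the metric names and the thresholds are all illustrative assumptions), a simple protocol might filter by a structural cutoff, rank by score, and keep one model per unique sequence:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decoy:
    """One design model with a few per-model metrics (hypothetical field names)."""
    name: str
    sequence: str
    total_score: float  # assumption: energy-like score, lower is better
    rmsd: float         # assumption: deviation from the target backbone, in angstroms


def select_decoys(decoys: List[Decoy], max_rmsd: float, top_n: int) -> List[Decoy]:
    """Hypothetical helper: keep decoys within an RMSD cutoff, rank them by score,
    and return at most top_n models, one per unique sequence."""
    # Step 1: structural filter (builds a new list, the input is not modified).
    passing = [d for d in decoys if d.rmsd <= max_rmsd]
    # Step 2: rank by score, ascending, because lower is assumed to be better.
    passing.sort(key=lambda d: d.total_score)
    # Step 3: keep only the best-scoring model for each unique sequence.
    seen = set()
    selected = []
    for d in passing:
        if len(selected) >= top_n:
            break
        if d.sequence in seen:
            continue
        seen.add(d.sequence)
        selected.append(d)
    return selected


# Illustrative toy data, not taken from the paper.
decoys = [
    Decoy("d1", "MKVL", -250.0, 1.2),
    Decoy("d2", "MKIL", -270.0, 2.5),
    Decoy("d3", "MKVL", -260.0, 0.9),
    Decoy("d4", "MRVL", -240.0, 1.0),
]
best = select_decoys(decoys, max_rmsd=1.5, top_n=2)
print([d.name for d in best])  # prints ['d3', 'd4']
```

Deduplicating after sorting means the retained model for each sequence is always its best-scoring one; in a real multi-step protocol the single score would typically be replaced by several metrics and more elaborate profiling.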

Identifiers

DOI: 10.1186/s12859-019-2796-3 PMID: 31092198 PMC: PMC6521408

Chemical substances

Proteins

Publication types

Journal Article

Languages

eng

Pagination

240

Grants

Agency: European Research Council

ID: 716058

Country: International

Agency: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

ID: 310030_163139

Authors

Jaume Bonet

Zander Harteveld

Fabian Sesterhenn

Andreas Scheck

Bruno E Correia
