cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
15 May 2019
History:
received: 7 July 2018
revised: 18 September 2018
accepted: 5 October 2018
pubmed: 9 October 2018
medline: 11 June 2020
entrez: 9 October 2018
Status: ppublish

Abstract

Many areas of bioinformatics require assigning domain matches to stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited or no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools increasingly face very large datasets, for which they require prohibitive amounts of CPU time and memory. We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows that CRH delivers very similar or slightly improved results, with much better CPU and memory performance on large datasets. CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. Supplementary data are available at Bioinformatics online.
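The kind of dynamic programming the abstract refers to can be illustrated with a minimal sketch of weighted interval scheduling. This is not CRH's actual implementation: it assumes each hit is a single continuous segment with inclusive residue coordinates and a positive score, it allows no overlap at all, and it ignores discontinuous domains. The names `Hit` and `resolve_hits` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_left
from typing import NamedTuple


class Hit(NamedTuple):
    """A hypothetical candidate match (continuous segment only)."""
    label: str
    start: int    # first residue, inclusive
    stop: int     # last residue, inclusive
    score: float  # assumed positive; higher is better


def resolve_hits(hits: list[Hit]) -> list[Hit]:
    """Return a maximum-total-score subset of mutually non-overlapping hits."""
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]

    # best[i] = best total score achievable using only the first i hits.
    best = [0.0] * (len(ordered) + 1)
    # prev[i] = number of earlier hits ending strictly before hit i starts.
    prev = [0] * len(ordered)

    for i, hit in enumerate(ordered):
        prev[i] = bisect_left(stops, hit.start, 0, i)
        skip = best[i]
        take = best[prev[i]] + hit.score
        best[i + 1] = max(skip, take)

    # Trace back through the table to recover the chosen hits.
    chosen = []
    i = len(ordered)
    while i > 0:
        hit = ordered[i - 1]
        if best[prev[i - 1]] + hit.score > best[i - 1]:
            chosen.append(hit)
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]


# Illustrative data only; these hits are made up for the example.
candidates = [
    Hit("a", 1, 50, 10.0),
    Hit("b", 40, 90, 25.0),
    Hit("c", 91, 150, 8.0),
    Hit("d", 60, 150, 12.0),
]
resolved = resolve_hits(candidates)
print([h.label for h in resolved])  # prints ['b', 'c']
```

Sorting takes O(n log n) and each hit needs one binary search, so this simplified version scales to large inputs; tolerating limited overlap or handling discontinuous domains, as the abstract describes, would need a more elaborate formulation than the one sketched here.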

Identifiers

pubmed: 30295745
pii: 5123356
doi: 10.1093/bioinformatics/bty863
pmc: PMC6513158

Chemical substances

Proteins 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1766-1767

Grants

Agency: Biotechnology and Biological Sciences Research Council
ID: BB/L002817/1
Country: United Kingdom

Copyright information

© The Author(s) 2018. Published by Oxford University Press.


Authors

T E Lewis (TE)

Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK.

I Sillitoe (I)

Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK.

J G Lees (JG)

Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, Oxfordshire, UK.
