HAMAP as SPARQL rules: a portable annotation pipeline for genomes and proteomes.


Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
01 02 2020
History:
received: 28 06 2019
revised: 30 11 2019
accepted: 13 01 2020
entrez: 9 2 2020
pubmed: 9 2 2020
medline: 28 1 2021
Status: ppublish

Abstract

Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
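The idea of expressing an annotation rule as a graph pattern plus an annotation template can be illustrated with a toy sketch. The example below is not HAMAP code and does not use a SPARQL engine; it only mimics, in plain Python, the shape of a SPARQL CONSTRUCT query applied to protein data held as triples. It assumes a rule that combines a signature match with a taxonomic condition, and every identifier in it (the ex: prefix, the predicate names, the signature and taxon names, the annotation text) is hypothetical rather than taken from the HAMAP or UniProt vocabularies.

```python
from typing import Set, Tuple

# Toy illustration only: a rule as "graph pattern -> new triples",
# mimicking the shape of a SPARQL CONSTRUCT query in plain Python.
# All identifiers below (ex:..., signature and taxon names) are
# hypothetical and are NOT taken from HAMAP or UniProt vocabularies.

Triple = Tuple[str, str, str]

# Input data as (subject, predicate, object) triples:
# two proteins, each with one signature match and one taxon.
DATA: Set[Triple] = {
    ("ex:protein1", "ex:matchesSignature", "ex:SIG_0001"),
    ("ex:protein1", "ex:inTaxon", "ex:Bacteria"),
    ("ex:protein2", "ex:matchesSignature", "ex:SIG_0001"),
    ("ex:protein2", "ex:inTaxon", "ex:Archaea"),
}


def apply_rule(data: Set[Triple]) -> Set[Triple]:
    """Annotate proteins that match SIG_0001 and belong to Bacteria.

    Rough SPARQL analogue (hypothetical vocabulary):
      CONSTRUCT { ?p ex:recommendedName "Example protein" }
      WHERE     { ?p ex:matchesSignature ex:SIG_0001 ;
                     ex:inTaxon ex:Bacteria . }
    """
    matched = {s for (s, p, o) in data
               if p == "ex:matchesSignature" and o == "ex:SIG_0001"}
    bacterial = {s for (s, p, o) in data
                 if p == "ex:inTaxon" and o == "ex:Bacteria"}
    # Both conditions must hold for the same subject (a join on ?p).
    return {(s, "ex:recommendedName", "Example protein")
            for s in matched & bacterial}


annotations = apply_rule(DATA)
for triple in sorted(annotations):
    print(triple)
```

In this sketch only the first protein satisfies both conditions, so only it receives the annotation. With a real SPARQL engine the rule would be a query file and the data an RDF graph, which is what makes the rule portable between engines.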

Identifiers

pubmed: 32034905
pii: 5731417
doi: 10.1093/gigascience/giaa003
pmc: PMC7007698

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Grants

Agency: NHGRI NIH HHS
ID: U24 HG007822
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Jerven Bolleman (J)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Edouard de Castro (E)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Delphine Baratin (D)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Sebastien Gehant (S)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Beatrice A Cuche (BA)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Andrea H Auchincloss (AH)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Elisabeth Coudert (E)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Chantal Hulo (C)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Patrick Masson (P)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Ivo Pedruzzi (I)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Catherine Rivoire (C)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Ioannis Xenarios (I)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.
Centre Hospitalier Universitaire Vaudois/Ludwig Institute for Cancer Research, Agora Centre, CH-1005 Lausanne, Switzerland.

Nicole Redaschi (N)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Alan Bridge (A)

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.
