PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison.
Knowledge comparison
Knowledge engineering
Linked open data
Ontology
Pharmacogenomics
Semantic web
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
18 Apr 2019
History:
entrez: 20 Apr 2019
pubmed: 20 Apr 2019
medline: 15 Jun 2019
Status:
epublish
Abstract
BACKGROUND
Pharmacogenomics (PGx) studies how genomic variations affect drug response phenotypes. PGx knowledge is typically composed of units that take the form of ternary relationships gene variant - drug - adverse event. Such a relationship states that an adverse event may occur in patients who carry the specified gene variant and are exposed to the specified drug. State-of-the-art PGx knowledge is mainly available in reference databases such as PharmGKB and reported in the biomedical literature. However, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs); in this case, it may either constitute new knowledge or confirm state-of-the-art knowledge that lacks a "clinical counterpart" or validation. For this reason, there is a need for automatic comparison of knowledge units from distinct sources.
RESULTS
In this article, we propose an approach based on Semantic Web technologies to represent and compare PGx knowledge units. To this end, we developed PGxO, a simple ontology that represents PGx knowledge units and their components. Combined with PROV-O, an ontology developed by the W3C to represent provenance information, PGxO makes it possible to encode provenance information and associate it with PGx relationships. Additionally, we introduce a set of rules to reconcile PGx knowledge, i.e., to identify when two relationships, potentially expressed using different vocabularies and levels of granularity, refer to the same or to different knowledge units. We evaluated our ontology and rules by populating PGxO with knowledge units extracted from PharmGKB (2,701), the literature (65,720), and discoveries reported in EHR analysis studies (only 10, manually extracted), and by testing their similarity. We named the resulting knowledge base, which represents and reconciles knowledge units of these various origins, PGxLOD (PGx Linked Open Data).
CONCLUSIONS
The proposed ontology and reconciliation rules constitute a first step toward a more complete framework for knowledge comparison in PGx. In this direction, the experimental instantiation of PGxO, named PGxLOD, illustrates both the ability to reconcile various existing knowledge sources and the difficulties involved.
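The idea of reconciling ternary knowledge units that come from different sources can be illustrated with a minimal Python sketch. It is not the PGxO ontology or the authors' rule set: the `KnowledgeUnit` class, the `CANONICAL` mapping, and the `same_unit` function are hypothetical names introduced here, the label mapping is a toy simplification, and differences in granularity are not handled. The sketch only shows that two units recorded with different vocabularies and different provenance can be recognized as the same unit once their components are mapped to shared identifiers.

```python
from dataclasses import dataclass

# Hypothetical mapping from source-specific labels to shared identifiers.
# The entries are illustrative; labels are treated as equivalent purely
# for the purpose of this example.
CANONICAL = {
    "warfarin": "drug:warfarin",
    "Coumadin": "drug:warfarin",
    "CYP2C9*3": "variant:CYP2C9*3",
    "rs1057910": "variant:CYP2C9*3",
    "bleeding": "phenotype:hemorrhage",
    "hemorrhage": "phenotype:hemorrhage",
}


@dataclass(frozen=True)
class KnowledgeUnit:
    """A ternary unit (gene variant - drug - adverse event) plus its source."""
    variants: frozenset
    drugs: frozenset
    phenotypes: frozenset
    source: str  # provenance label, e.g. "PharmGKB", "literature", "EHR study"


def normalize(labels):
    """Map each label to its shared identifier; unknown labels stay as-is."""
    return frozenset(CANONICAL.get(label, label) for label in labels)


def components(unit):
    """Return the three normalized components, ignoring provenance."""
    return (
        normalize(unit.variants),
        normalize(unit.drugs),
        normalize(unit.phenotypes),
    )


def same_unit(a, b):
    """Toy rule: two units match when all normalized components are equal."""
    return components(a) == components(b)


unit_db = KnowledgeUnit(
    frozenset({"CYP2C9*3"}), frozenset({"warfarin"}),
    frozenset({"bleeding"}), "PharmGKB",
)
unit_ehr = KnowledgeUnit(
    frozenset({"rs1057910"}), frozenset({"Coumadin"}),
    frozenset({"hemorrhage"}), "EHR study",
)

if __name__ == "__main__":
    print(same_unit(unit_db, unit_ehr))
```

Provenance is kept on each unit but excluded from the comparison, so a match between `unit_db` and `unit_ehr` can be read as an EHR-derived unit confirming a database-derived one.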
Identifiers
pubmed: 30999867
doi: 10.1186/s12859-019-2693-9
pii: 10.1186/s12859-019-2693-9
pmc: PMC6471679
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
139