simona: a comprehensive R package for semantic similarity analysis on bio-ontologies.
Bio-ontology
Bioconductor
R package
Semantic similarity
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date: 16 Sep 2024
History:
received: 17 Apr 2024
accepted: 02 Sep 2024
medline: 17 Sep 2024
pubmed: 17 Sep 2024
entrez: 16 Sep 2024
Status: epublish
Abstract
Bio-ontologies are key to structuring complex biological information for effective data integration and knowledge representation. Semantic similarity analysis on bio-ontologies quantitatively assesses the degree of similarity between biological concepts based on the semantics encoded in ontologies. It plays an important role in the structured and meaningful interpretation and integration of complex data from multiple biological domains. We present simona, a novel R package for semantic similarity analysis on general bio-ontologies. Simona implements infrastructure for ontology analysis by offering efficient data structures, fast ontology traversal methods, and elegant visualizations. Moreover, it provides a robust toolbox supporting over 70 methods for semantic similarity analysis. With simona, we conducted a benchmark against current semantic similarity methods. The results demonstrate that methods cluster according to their mathematical methodologies, which can guide researchers in selecting appropriate methods. Additionally, we compared annotation-based and topology-based methods, revealing that semantic similarities based solely on ontology topology can efficiently reveal semantic similarity structures, facilitating analysis of less-studied organisms and other ontologies. Simona offers a versatile interface and efficient implementation for processing, visualization, and semantic similarity analysis on bio-ontologies. We believe that simona will serve as a robust tool for uncovering relationships and enhancing the interoperability of biological knowledge systems.
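To make the idea of a topology-based similarity concrete, the following is a minimal Python sketch of a depth-based measure in the style of Wu and Palmer. It is an illustration only and not simona's API or implementation: the toy ontology `PARENTS` and the function names `depth`, `ancestors`, and `wu_palmer` are all hypothetical. The score depends only on the structure of the ontology graph and uses no gene annotations, which is why measures of this kind can be applied to less-studied organisms.

```python
from functools import lru_cache

# Hypothetical toy ontology (assumption): each term maps to its parents.
# It is a DAG with a single root; "D" and "E" have two parents each.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["C", "D"],
}


@lru_cache(maxsize=None)
def depth(term):
    """Longest distance from the root, counting the root itself as depth 1."""
    parents = PARENTS[term]
    return 1 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def ancestors(term):
    """All ancestors of a term, including the term itself."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return frozenset(result)


def wu_palmer(a, b):
    """Depth-based similarity: 2 * depth(LCA) / (depth(a) + depth(b)).

    The LCA is taken as the deepest common ancestor of the two terms.
    """
    common = ancestors(a) & ancestors(b)
    lca_depth = max(depth(t) for t in common)
    return 2 * lca_depth / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("A", "B"), ("C", "D"), ("E", "E")]:
        print(pair, round(wu_palmer(*pair), 3))
```

Terms that share only the root score low, terms with a deep common ancestor score higher, and a term compared with itself scores 1. An annotation-based measure would instead derive each term's information content from how often it is annotated to genes, which requires annotation data for the organism in question.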
Identifiers
pubmed: 39285315
doi: 10.1186/s12864-024-10759-4
pii: 10.1186/s12864-024-10759-4
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
869
Copyright information
© 2024. The Author(s).