Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining.
atopic dermatitis
eczema
information retrieval
medical terminology
text mining
Journal
Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology
ISSN: 1365-2222
Abbreviated title: Clin Exp Allergy
Country: England
NLM ID: 8906443
Publication information
Publication date:
09 2021
History:
received: 9 March 2021
accepted: 30 June 2021
pubmed: 3 July 2021
medline: 24 March 2022
entrez: 2 July 2021
Status:
ppublish
Abstract
BACKGROUND
Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications.
OBJECTIVE
To investigate the consequences of the ambiguous use of the terms "eczema" and "atopic dermatitis" (AD) from an information retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining.
METHODS
Articles were retrieved by querying PubMed using the terms "eczema" (D003876) and "dermatitis, atopic" (D004485). We used machine learning to investigate the differences between the contexts in which each term is used: we trained a decision tree model to predict whether an article would be indexed with the eczema or the AD tag. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated how the retrieval of key findings differed according to the terminology used.
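The decision tree step can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is in the repository cited in the conclusions); it assumes each article is reduced to a set of index terms, and the toy articles, labels and function names below are invented for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_term(docs, labels, terms):
    """Term whose presence/absence split yields the highest information gain."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for term in sorted(terms):  # sorted so that ties are broken deterministically
        present = [lab for doc, lab in zip(docs, labels) if term in doc]
        absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
        if not present or not absent:
            continue
        remainder = (len(present) * entropy(present) + len(absent) * entropy(absent)) / len(labels)
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best, best_gain = term, gain
    return best


def build_tree(docs, labels, terms, max_depth=3):
    """Grow a small ID3-style tree; leaves are the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    term = best_term(docs, labels, terms)
    if term is None:
        return majority
    present = [(d, l) for d, l in zip(docs, labels) if term in d]
    absent = [(d, l) for d, l in zip(docs, labels) if term not in d]
    rest = terms - {term}
    return {
        "term": term,
        "present": build_tree([d for d, _ in present], [l for _, l in present], rest, max_depth - 1),
        "absent": build_tree([d for d, _ in absent], [l for _, l in absent], rest, max_depth - 1),
    }


def predict(tree, doc):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["term"] in doc else tree["absent"]
    return tree


# Toy data (hypothetical): each article is a set of index terms plus the tag it carries.
docs = [
    {"inflammation", "cytokines", "dogs"},
    {"inflammation", "allergens"},
    {"allergens", "dogs"},
    {"public health", "asthma"},
    {"asthma", "infection"},
    {"public health", "infection", "inflammation"},
]
labels = ["AD", "AD", "AD", "eczema", "eczema", "eczema"]
vocabulary = set().union(*docs)

tree = build_tree(docs, labels, vocabulary)
print(tree)
print(predict(tree, {"asthma", "public health"}))
```

The terms chosen at each split are the ones that best separate the two tags, which is what makes a tree useful for spotting and characterizing where the two literatures diverge.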
RESULTS
The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology, whereas the eczema query was linked to public health, infectious disease and the respiratory system. The Medical Subject Headings terms associated with "AD" and "Eczema" differed, with 52% agreement between the two top-40 lists. Terms related to cellular mechanisms, especially allergies and inflammation, characterized the AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those in articles indexed with eczema. Fewer enriched genes were retrieved with the eczema query than with the AD query.
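The agreement between two top-k lists can be computed in a few lines. The sketch below assumes that agreement means the share of terms present in both top-k lists (the abstract does not define the measure), and the term counts are invented toy values, not results from the study.

```python
from collections import Counter


def top_k(term_counts, k):
    """Return the k most frequent terms, breaking ties alphabetically."""
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def top_k_agreement(counts_a, counts_b, k=40):
    """Fraction of the k top-ranked terms that the two lists share (assumed definition)."""
    shared = set(top_k(counts_a, k)) & set(top_k(counts_b, k))
    return len(shared) / k


# Toy counts (hypothetical) of index terms co-occurring with each tag.
counts_ad = Counter({"Humans": 90, "Animals": 40, "Inflammation": 35, "Dogs": 20, "Asthma": 5})
counts_eczema = Counter({"Humans": 95, "Asthma": 50, "Public Health": 30, "Inflammation": 25, "Dogs": 2})

agreement = top_k_agreement(counts_ad, counts_eczema, k=4)
print(f"{agreement:.0%}")
```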
CONCLUSIONS AND CLINICAL RELEVANCE
Text mining retrieves considerably different bio-entities depending on whether eczema or AD is used as the search term. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should use both terms jointly. We propose decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
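Searching with both terms jointly amounts to combining them with OR in a single PubMed query. The sketch below only builds the query string and an NCBI E-utilities ESearch URL, without sending any request; the helper names are hypothetical, and the choice of the `[MeSH Terms]` field tag is an assumption about how the search would be restricted.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def joint_query(terms, field="MeSH Terms"):
    """Combine several terms with OR so that articles indexed under any of them are retrieved."""
    return " OR ".join(f'"{term}"[{field}]' for term in terms)


def esearch_url(query, retmax=100):
    """Build (but do not send) an ESearch request URL for PubMed."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = joint_query(["eczema", "dermatitis, atopic"])
url = esearch_url(query)
print(query)
print(url)
```

Passing a single term to `joint_query` reproduces the separate searches, so the article sets returned under each name can be compared with the joint result.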
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1185-1194
Grants
Agency: Medical Research Council
ID: MR/S025340/1
Country: United Kingdom
Agency: INRAE
Copyright information
© 2021 The Authors. Clinical & Experimental Allergy published by John Wiley & Sons Ltd.