Assessing domain adaptation in adverse drug event extraction on real-world breast cancer records.

Keywords: Adverse drug event; Breast cancer; Electronic health record; Natural language processing

Journal

International Journal of Medical Informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057

Publication information

Publication date:
09 Jul 2024
History:
received: 29 Aug 2023
revised: 21 Jun 2024
accepted: 01 Jul 2024
medline: 1 Aug 2024
pubmed: 1 Aug 2024
entrez: 31 Jul 2024
Status: ahead of print

Abstract sections

BACKGROUND
Adverse drug events (ADEs) are key information recorded in the unstructured portions of electronic health records. They pose a significant challenge in healthcare, ranging from mild discomfort to severe complications, and can affect patient safety and treatment outcomes.
METHODS
We explore the influence of domain shift between a set of dummy clinical notes and a real-world hospital corpus of Japanese clinical notes on breast cancer treatment when extracting ADEs from free text. We annotated a subset of the hospital dataset and used it to fine-tune a Named Entity Recognition (NER) model that had initially been trained on the set of dummy documents. We fine-tuned with increasing amounts of the annotated data and evaluated the impact on the model's performance. Additionally, we examined the extracted information to identify combinations of drugs that are likely to cause ADEs.
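The last step of the methods, looking for drug combinations associated with ADEs, can be illustrated with a simple co-occurrence count. This is only a sketch under stated assumptions and not the authors' procedure: the record layout (`drugs`, `ades`), the placeholder drug names, and the helper names `count_drug_pairs` and `ade_rate` are all hypothetical, and co-occurrence in a note does not by itself establish that a combination causes an ADE.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: one record per clinical note, listing the drug
# and ADE mentions extracted from it. Field names and values are assumptions.
notes = [
    {"drugs": ["drug_a", "drug_b"], "ades": ["nausea"]},
    {"drugs": ["drug_a", "drug_b", "drug_c"], "ades": ["rash", "nausea"]},
    {"drugs": ["drug_a", "drug_c"], "ades": []},
    {"drugs": ["drug_b"], "ades": ["fatigue"]},
]


def count_drug_pairs(notes):
    """Count notes mentioning each unordered drug pair, overall and with an ADE."""
    seen = Counter()
    with_ade = Counter()
    for note in notes:
        # Deduplicate and sort so each pair has a single canonical form.
        drugs = sorted(set(note["drugs"]))
        for pair in combinations(drugs, 2):
            seen[pair] += 1
            if note["ades"]:
                with_ade[pair] += 1
    return seen, with_ade


def ade_rate(seen, with_ade):
    """Share of notes mentioning a pair that also mention at least one ADE."""
    return {pair: with_ade[pair] / total for pair, total in seen.items()}


seen, with_ade = count_drug_pairs(notes)
for pair, rate in sorted(ade_rate(seen, with_ade).items()):
    print(pair, seen[pair], rate)
```

Pairs seen in only a handful of notes give unstable rates, so in practice a minimum support threshold would likely be applied before ranking combinations.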
RESULTS
We show that domain adaptation can significantly improve model performance in the new domain: fine-tuning on a small subset of 100 documents yielded a 40% improvement in model performance. However, we also observed diminishing returns when fine-tuning the model with a larger dataset. For instance, with eight times more data, we saw only a further 18% improvement in extraction performance.
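The diminishing returns can be made concrete with a little arithmetic on the figures quoted above. The sketch below assumes that "eight times more data" means 800 documents in total and that the two percentages are expressed in the same unit and can be compared directly. The abstract states neither the metric nor the baseline, so the result is indicative only.

```python
# Figures quoted in the abstract. Treating them as gains in the same unit
# is an assumption; the metric and baseline are not stated in the abstract.
FIRST_DOCS = 100      # documents in the first fine-tuning subset
FIRST_GAIN = 40.0     # improvement reported for that subset (%)
SCALE = 8             # "eight times more data"
FURTHER_GAIN = 18.0   # further improvement reported with the larger set (%)


def gain_per_100_docs(gain, docs):
    """Average improvement contributed by each block of 100 documents."""
    return gain * 100 / docs


total_docs = FIRST_DOCS * SCALE        # assumed to mean 800 documents in total
extra_docs = total_docs - FIRST_DOCS   # documents added after the first subset

early = gain_per_100_docs(FIRST_GAIN, FIRST_DOCS)
late = gain_per_100_docs(FURTHER_GAIN, extra_docs)

print(f"first {FIRST_DOCS} documents: {early:.1f} per 100 documents")
print(f"next {extra_docs} documents: {late:.1f} per 100 documents")
print(f"ratio: {early / late:.1f}x")
```

Under these assumptions, each additional block of 100 documents contributes roughly 2.6 points on average, against 40 for the first block.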
CONCLUSIONS
Variations in writing style and vocabulary across clinical corpora can significantly affect the quality of NER results. We show that domain adaptation can help mitigate these discrepancies and achieve better performance. Yet, while providing in-domain data to a model helps, there are diminishing returns when fine-tuning with large amounts of data.

Identifiers

pubmed: 39084086
pii: S1386-5056(24)00202-8
doi: 10.1016/j.ijmedinf.2024.105539

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

105539

Copyright information

Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Gabriel Herman Bernardim Andrade (G)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan. Electronic address: herman_bernardim_andrade.hi1@is.naist.jp.

Tomohiro Nishiyama (T)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan.

Takako Fujimaki (T)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan.

Shuntaro Yada (S)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan.

Shoko Wakamiya (S)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan.

Mari Takagi (M)

Department of Pharmacy, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan.

Mizuki Kato (M)

Cancer Control Center, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan.

Isao Miyashiro (I)

Cancer Control Center, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan.

Eiji Aramaki (E)

Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan. Electronic address: aramaki@is.naist.jp.

MeSH classifications