Information extraction from German radiological reports for general clinical text and language understanding.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
9 February 2023
History:
received: 5 November 2022
accepted: 2 February 2023
entrez: 9 February 2023
pubmed: 10 February 2023
medline: 14 February 2023
Status: epublish

Abstract

Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit, as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, modern text processing applications that require a large amount of training data are difficult to use, as only a few data sets are available, mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain-adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to other clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
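
The abstract does not spell out the sampling strategy, so the following is only a generic sketch of one common active learning heuristic, smallest-margin uncertainty sampling, and not the authors' method. The function names `margin` and `select_for_annotation` and the toy probability values are hypothetical and chosen for illustration. Given per-item class probabilities from some model, the sketch selects the items whose two most likely classes are closest together, on the assumption that these are the most informative ones to hand to a human annotator.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_for_annotation(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool items with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:k]


# Toy, made-up class probabilities for four unlabeled items (illustration only).
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # very uncertain, smallest margin
    [0.55, 0.44, 0.01],  # uncertain between two classes
    [0.70, 0.20, 0.10],  # moderately confident
]

picked = select_for_annotation(pool, 2)
print(picked)
```

In a loop of this kind, the selected items would be annotated, added to the training set, and the model re-scored on the remaining pool; how the study itself scores and samples reports may differ.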

Identifiers

pubmed: 36759679
doi: 10.1038/s41598-023-29323-3
pii: 10.1038/s41598-023-29323-3
pmc: PMC9911592

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2353

Copyright information

© 2023. The Author(s).

Authors

Michael Jantscher (M)

Know-Center, 8010, Graz, Austria.

Felix Gunzer (F)

Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria.

Roman Kern (R)

Know-Center, 8010, Graz, Austria.

Eva Hassler (E)

Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria.

Sebastian Tschauner (S)

Division of Pediatric Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria.

Gernot Reishofer (G)

Department of Radiology, Medical University Graz, 8036, Graz, Austria. gernot.reishofer@medunigraz.at.
BioTechMed-Graz, 8010, Graz, Austria. gernot.reishofer@medunigraz.at.
