Information extraction from German radiological reports for general clinical text and language understanding.

Humans Language Information Storage and Retrieval Semantics Natural Language Processing Radiology

Journal

Scientific reports

ISSN: 2045-2322

Titre abrégé: Sci Rep

Pays: England

ID NLM: 101563288

Informations de publication

Date de publication:
09 02 2023

Historique:

received: 05 11 2022

accepted: 02 02 2023

entrez: 9 2 2023

pubmed: 10 2 2023

medline: 14 2 2023

Statut: epublish

Résumé

Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.

Identifiants

DOI: 10.1038/s41598-023-29323-3 PMID: 36759679 PMC: PMC9911592

pubmed: 36759679

doi: 10.1038/s41598-023-29323-3

pii: 10.1038/s41598-023-29323-3

pmc: PMC9911592

doi:

Types de publication

Journal Article Research Support, Non-U.S. Gov't

Langues

eng

Sous-ensembles de citation

Pagination

2353

Informations de copyright

Références

Wang, Y. et al. Clinical information extraction applications: A literature review. J. Biomed. Inform. 77, 34–49 (2018).

doi: 10.1016/j.jbi.2017.11.011 pubmed: 29162496

Sheikhalishahi, S. et al. Natural language processing of clinical notes on chronic diseases: Systematic review. JMIR Med. Inform. 7, e12239 (2019).

doi: 10.2196/12239 pubmed: 31066697 pmcid: 6528438

Kreimeyer, K. et al. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J. Biomed. Inform. 73, 14–29 (2017).

doi: 10.1016/j.jbi.2017.07.012 pubmed: 28729030 pmcid: 6864736

Wu, S. et al. Deep learning in clinical natural language processing: A methodical review. J. Am. Med. Inform. Assoc. 27, 457–470 (2020).

doi: 10.1093/jamia/ocz200 pubmed: 31794016

Pons, E., Braun, L. M., Hunink, M. M. & Kors, J. A. Natural language processing in radiology: A systematic review. Radiology 279, 329–343 (2016).

doi: 10.1148/radiol.16142770 pubmed: 27089187

Maros, M. E. et al. Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual radlex mappings. Sci. Rep. 11, 1–18 (2021).

doi: 10.1038/s41598-021-85016-9

Viani, N. et al. A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Sci. Rep. 11, 1–12 (2021).

doi: 10.1038/s41598-020-80457-0

Chu, S. H. et al. An independently validated, portable algorithm for the rapid identification of copd patients using electronic health records. Sci. Rep. 11, 1–9 (2021).

doi: 10.1038/s41598-021-98719-w

Zeng, X., Linwood, S. L. & Liu, C. Pretrained transformer framework on pediatric claims data for population specific tasks. Sci. Rep. 12, 1–13 (2022).

Khurshid, S. et al. Cohort design and natural language processing to reduce bias in electronic health records research. NPJ Digit. Med. 5, 1–14 (2022).

doi: 10.1038/s41746-022-00590-0

Patel, T. A. et al. Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer 123, 114–121 (2017).

doi: 10.1002/cncr.30245 pubmed: 27571243

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

Roller, R. et al. Information extraction models for German clinical text. In 2020 IEEE International Conference on Healthcare Informatics (ICHI), 1–2 (IEEE, 2020).

Toepfer, M. et al. Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med. Inform. Decis. Mak. 15, 1–16 (2015).

Madan, S. et al. Deep learning-based detection of psychiatric attributes from German mental health records. Int. J. Med. Inform. 161, 104724 (2022).

doi: 10.1016/j.ijmedinf.2022.104724 pubmed: 35279550

Frei, J. & Kramer, F. GERNERMED: An open German medical NER model. Softw. Impacts 11, 100212 (2022).

doi: 10.1016/j.simpa.2021.100212

Bressem, K. K. et al. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics. 36, 5255–5261 (2020).

doi: 10.1093/bioinformatics/btaa668

Fink, M. A. et al. Deep learning-based assessment of oncologic outcomes from natural language processing of structured radiology reports. Radiol. Artif. Intell. 4, e220055 (2022).

doi: 10.1148/ryai.220055 pubmed: 36204531 pmcid: 9530771

Liang, S. et al. Fine-tuning BERT models for summarizing German radiology findings. In Proceedings of the 4th Clinical Natural Language Processing Workshop, 30–40 (2022).

Ghaddar, A., Langlais, P., Rashid, A. & Rezagholizadeh, M. Context-aware adversarial training for name regularity bias in named entity recognition. Trans. Assoc. Comput. Linguist. 9, 586–604 (2021).

doi: 10.1162/tacl_a_00386

Mishra, S., He, S. & Belli, L. Assessing demographic bias in named entity recognition. arXiv preprint arXiv:2008.03415 (2020).

Irvin, J. et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 590–597 (2019).

Jain, S. et al. Radgraph: Extracting clinical entities and relations from radiology reports. arXiv preprint arXiv:2106.14463 (2021).

Ramponi, A. & Plank, B. Neural unsupervised domain adaptation in NLP—A survey. In Proceedings of the 28th International Conference on Computational Linguistics (International Committee on Computational Linguistics, 2020).

Salhofer, E., Liu, X. L. & Kern, R. Impact of training instance selection on domain-specific entity extraction using BERT. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, 83–88 (2022).

Settles, B. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences (2009).

Scheffer, T., Decomain, C. & Wrobel, S. Active hidden markov models for information extraction. In International Symposium on Intelligent Data Analysis, 309–318 (Springer, 2001).

Jiang, H. & Gupta, M. Minimum-margin active learning. arXiv preprint arXiv:1906.00025 (2019).

Shrestha, M. Development of a language model for medical domain. Ph.D. thesis, (Hochschule Rhein-Waal, 2021).

Wu, S. & He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2361–2364 (2019).

Gururangan, S. et al. Don’t stop pretraining: adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964 (2020).

Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 3, 1–23 (2021).

doi: 10.1145/3458754

Casey, A. et al. A systematic review of natural language processing applied to radiology reports. BMC Med. Inform. Decis. Mak. 21, 1–18 (2021).

doi: 10.1186/s12911-021-01533-7

Zech, J. et al. Natural language-based machine learning models for the annotation of clinical radiology reports. Radiology 287, 570–580 (2018).

doi: 10.1148/radiol.2018171093 pubmed: 29381109

Proisl, T. & Uhrig, P. SoMaJo: State-of-the-art tokenization for German web and social media texts. In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, 57–62 (Association for Computational Linguistics (ACL), 2016).

Uzuner, Ö., South, B. R., Shen, S. & DuVall, S. L. 2010 i2b2/va challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18, 552–556 (2011).

doi: 10.1136/amiajnl-2011-000203 pubmed: 21685143 pmcid: 3168320

Uzuner, Ö., Solti, I. & Cadag, E. Extracting medication information from clinical text. J. Am. Med. Inform. Assoc. 17, 514–518 (2010).

doi: 10.1136/jamia.2010.003947 pubmed: 20819854 pmcid: 2995677

Stenetorp, P. et al. Brat: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 102–107 (2012).

Salazar, J., Liang, D., Nguyen, T. Q. & Kirchhoff, K. Masked language model scoring. arXiv preprint arXiv:1910.14659 (2019).

Information extraction from German radiological reports for general clinical text and language understanding.

Journal

Informations de publication

Résumé

Identifiants

Types de publication

Langues

Sous-ensembles de citation

Pagination

Informations de copyright

Références

Auteurs

Michael Jantscher (M)

Felix Gunzer (F)

Roman Kern (R)

Eva Hassler (E)

Sebastian Tschauner (S)

Gernot Reishofer (G)

Articles similaires

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

Classifications MeSH