Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology.

Data Mining; Machine Learning; Natural Language Processing; Pilot Projects; Radiology

Classification system; Free text; Machine learning; Natural language processing; Radiology; Reporting

Journal

Journal of Digital Imaging

ISSN: 1618-727X

Abbreviated title: J Digit Imaging

Country: United States

NLM ID: 9100529

Publication information

Publication date:
August 2020

History:

pubmed: 23 February 2020

medline: 17 August 2021

entrez: 21 February 2020

Status: ppublish

Abstract

Reports are the standard means of communication between the radiologist and the referring clinician. Efforts are being made to improve this communication, for instance by introducing standardization and structured reporting. Natural Language Processing (NLP) is another promising tool that can improve and enhance the radiological report by processing free text. In this way, NLP adds structure to the report and exposes its information, which can then be used for further analysis. This paper describes pre-processing and processing steps and highlights important challenges to overcome in order to successfully implement a free text mining algorithm using NLP tools and machine learning in a small language area such as Dutch. A rule-based algorithm was constructed to classify the T-stage of pulmonary oncology from the original free text radiological report, based on the items tumor size, presence and involvement according to the 8th TNM classification system. PyContextNLP, spaCy and regular expressions were used as tools to extract the correct information and process the free text. Overall accuracy of the algorithm for evaluating T-stage was 0.83 in the training set and 0.87 in the validation set, which shows that the approach in this pilot study is promising. Future research with larger datasets and external validation is needed before more machine learning approaches can be introduced and, perhaps, the required input of domain-specific knowledge reduced. However, a hybrid NLP approach will probably achieve the best results.
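To make the size-based part of such a rule-based approach concrete, the following is a minimal sketch in Python using only regular expressions. It is not the authors' implementation: the pattern `SIZE_PATTERN`, the function names, and the Dutch sample sentences are hypothetical, and the sketch looks at tumor size only, ignoring the presence and involvement items and any negation handling that the abstract mentions. The size thresholds follow the 8th TNM edition for lung tumors (T1 up to 3 cm, T2 up to 5 cm, T3 up to 7 cm, T4 above 7 cm), without the a/b/c subcategories.

```python
import re

# Hypothetical pattern: a number with an optional decimal part
# (Dutch reports often use a decimal comma), followed by mm or cm.
SIZE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def extract_sizes_cm(text):
    """Return every size mentioned in the text, converted to centimetres."""
    sizes = []
    for value, unit in SIZE_PATTERN.findall(text):
        number = float(value.replace(",", "."))
        sizes.append(number / 10 if unit.lower() == "mm" else number)
    return sizes


def t_stage_from_size(size_cm):
    """Map the largest tumor dimension to a T category (size criterion only)."""
    if size_cm <= 3:
        return "T1"
    if size_cm <= 5:
        return "T2"
    if size_cm <= 7:
        return "T3"
    return "T4"


def classify_t_stage(text):
    """Simplification: stage on the largest size found; None if no size is found."""
    sizes = extract_sizes_cm(text)
    if not sizes:
        return None
    return t_stage_from_size(max(sizes))


def accuracy(predicted, reference):
    """Fraction of reports whose predicted stage equals the reference stage."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    # Hypothetical report sentence, for illustration only.
    print(classify_t_stage("Massa in de rechter bovenkwab van 4,2 cm."))
```

A real pipeline would also need to decide which measurement belongs to the primary tumor and whether a finding is negated or uncertain, which is where tools such as spaCy and pyConTextNLP come in.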

Identifiers

DOI: 10.1007/s10278-020-00327-z PMID: 32076924 PMC: PMC7522136

pubmed: 32076924

doi: 10.1007/s10278-020-00327-z

pii: 10.1007/s10278-020-00327-z

pmc: PMC7522136

Publication types

Journal Article

Languages

eng

Pagination

1002-1008

Authors

J Martijn Nobel (JM)

Sander Puts (S)

Frans C H Bakers (FCH)

Simon G F Robben (SGF)

André L A J Dekker (ALAJ)
