TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines.

MeSH terms: Humans; Female; Natural Language Processing; Deep Learning; Machine Learning; Benchmarking; Middle Aged; Breast Neoplasms / diagnostic imaging; Mammography / classification; Datasets as Topic; Radiology Information Systems / standards; Adult

Keywords: BI-RADS classification; Breast radiological reports; ML; NLP; TF-IDF; Word2vec

Journal

BMC medical informatics and decision making

ISSN: 1472-6947

Abbreviated title: BMC Med Inform Decis Mak

Country: England

NLM ID: 101088682

Publication information

Publication date:
24 Oct 2024

History:

received: 07 09 2023

accepted: 10 10 2024

medline: 24 10 2024

pubmed: 24 10 2024

entrez: 24 10 2024

Status: epublish

Abstract

Machine learning (ML), deep learning (DL), and natural language processing (NLP) have recently shown promising results in classifying free-form radiological reports. Classifying such reports reliably requires a high-quality, annotated and curated dataset. Currently, no publicly available dataset of breast imaging radiological reports exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a dataset of breast imaging radiological reports and provide benchmark results for it.

The dataset was originally in Spanish. Board-certified radiologists at the Breast Radiology department, TecSalud Hospitals, Monterrey, Mexico, collected the reports and annotated them according to the BI-RADS lexicon and categories. The reports were first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all of them women, with an average age of 53 years. We used the word-level NLP embedding techniques term frequency-inverse document frequency (TF-IDF) and word2vec to extract semantic and syntactic information, and compared the performance of ML, DL and large language model (LLM) classifiers for BI-RADS category classification.

The final dataset contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient-Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT) and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier with preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812), 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).

In this work, we propose a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for a range of ML, DL and LLM models for BI-RADS classification that can serve as a starting point for future investigation. The main objective is to provide a repository for investigators who wish to enter the field and push its boundaries further.
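
The preprocessing, TF-IDF weighting, and sensitivity-based evaluation described in the abstract can be sketched as follows. This is a minimal, standard-library illustration and not the authors' pipeline: the function names (`preprocess`, `tfidf`, `mean_sensitivity`), the whitespace tokenizer, the `tf * log(N / df)` weighting variant, the toy reports, and the reading of "mean sensitivity" as the unweighted average of per-class recall are all assumptions made for this example.

```python
import math
from collections import Counter


def preprocess(reports):
    """Drop missing or empty reports and exact duplicates (first copy is kept)."""
    seen = set()
    cleaned = []
    for text in reports:
        if text is None or not text.strip():
            continue  # missing value
        key = text.strip().lower()
        if key in seen:
            continue  # duplicate report
        seen.add(key)
        cleaned.append(key)
    return cleaned


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def mean_sensitivity(y_true, y_pred):
    """Unweighted mean of per-class sensitivity (recall) over classes in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        support = sum(1 for t in y_true if t == label)
        recalls.append(true_pos / support)
    return sum(recalls) / len(recalls)


# Hypothetical toy reports; the real dataset is not reproduced here.
reports = [
    "no suspicious mass benign findings",
    "no suspicious mass benign findings",  # duplicate, removed
    None,                                   # missing, removed
    "irregular spiculated mass suspicious",
]
docs = preprocess(reports)
vectors = tfidf(docs)

# Hypothetical BI-RADS labels and predictions for four reports.
print(mean_sensitivity(["2", "2", "4", "4"], ["2", "4", "4", "4"]))  # prints 0.75
```

Terms that occur in every report (here "mass" and "suspicious") receive a weight of zero under this variant, which is why TF-IDF emphasises the vocabulary that distinguishes one report from another. Averaging recall per class, rather than over all reports, keeps rare BI-RADS categories from being swamped by common ones, although the abstract does not state which averaging the authors used.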

Identifiers

DOI: 10.1186/s12911-024-02717-7 PMID: 39444035

pii: 10.1186/s12911-024-02717-7

Publication types

Journal Article

Languages

eng

Pagination

310

Authors

Sadam Hussain (S)

Usman Naseem (U)

Mansoor Ali (M)

Daly Betzabeth Avendaño Avalos (DB)

Servando Cardona-Huerta (S)

Beatriz Alejandra Bosques Palomo (BA)

Jose Gerardo Tamez-Peña (JG)
