TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines.
BI-RADS classification
Breast radiological reports
ML
NLP
TF-IDF
Word2vec
Journal
BMC medical informatics and decision making
ISSN: 1472-6947
Abbreviated title: BMC Med Inform Decis Mak
Country: England
NLM ID: 101088682
Publication information
Publication date:
24 Oct 2024
History:
received: 07 Sep 2023
accepted: 10 Oct 2024
medline: 24 Oct 2024
pubmed: 24 Oct 2024
entrez: 24 Oct 2024
Status:
epublish
Abstract
BACKGROUND
Recently, machine learning (ML), deep learning (DL), and natural language processing (NLP) have shown promising results in classifying free-form radiological reports. Classifying radiological reports properly requires a high-quality annotated and curated dataset. Currently, no publicly available dataset of breast imaging radiological reports exists for classifying Breast Imaging Reporting and Data System (BI-RADS) categories and breast density scores, as defined by the American College of Radiology (ACR). To address this gap, we construct and annotate a dataset of breast imaging radiological reports and provide benchmark results for it. The dataset was originally in Spanish. Board-certified radiologists collected and annotated it according to the BI-RADS lexicon and categories at the Breast Radiology department, TecSalud Hospitals Monterrey, Mexico. It was first translated into English using Google Translate and then preprocessed by removing duplicates and missing values. After preprocessing, the final dataset consists of 5046 unique reports from 5046 patients, all of them women, with an average age of 53 years. We then used two word-level NLP embedding techniques, term frequency-inverse document frequency (TF-IDF) and word2vec, to extract semantic and syntactic information. We also compared the performance of ML, DL, and large language model (LLM) classifiers for BI-RADS category classification.
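The preprocessing and TF-IDF steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how duplicates were detected or which TF-IDF variant was used, so the case- and whitespace-insensitive duplicate check, the weighting tf * log(N / df), the function names `preprocess` and `tfidf`, and the toy reports are all hypothetical.

```python
import math
from collections import Counter


def preprocess(reports):
    """Drop missing or empty reports and exact duplicates, keeping first occurrences.

    Assumption: two reports count as duplicates when they match after
    lowercasing and collapsing whitespace.
    """
    seen = set()
    cleaned = []
    for text in reports:
        if text is None:
            continue  # missing value
        normalized = " ".join(text.lower().split())
        if not normalized or normalized in seen:
            continue  # empty report or duplicate
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy reports, not taken from the dataset.
raw_reports = [
    "Benign calcifications in the left breast",
    "benign  calcifications in the left breast",  # duplicate after normalization
    None,                                          # missing value
    "Suspicious mass in the right breast",
]
reports = preprocess(raw_reports)
vectors = tfidf(reports)
```

With this weighting, terms that occur in every report (such as "breast" in the toy data) receive a weight of zero, while terms specific to one report receive a positive weight. The resulting vectors would then feed a classifier such as those compared in the results.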
RESULTS
The final dataset of breast imaging radiological reports contains 5046 unique reports. We compared K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), and Biomedical Generative Pre-trained Transformer (BioGPT) classifiers. The BioGPT classifier trained on preprocessed data performed best, with a mean sensitivity of 0.60 (95% confidence interval (CI), 0.391-0.812). This is 6 percentage points higher than the second-best classifier, BERT, which achieved a mean sensitivity of 0.54 (95% CI, 0.477-0.607).
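For readers unfamiliar with the metric, the sketch below shows one way a mean sensitivity with a 95% interval could be computed for a multi-class task. It is an assumption-laden illustration: the abstract does not state how the mean or the confidence interval was obtained, so averaging per-class sensitivities and using a normal approximation (mean ± 1.96 * sd / sqrt(n)) are choices made here, and the labels, predictions, and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) per class: TP / (TP + FN)."""
    result = {}
    for c in sorted(set(y_true)):
        # Predictions for all samples whose true label is c.
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        result[c] = sum(1 for p in preds_for_c if p == c) / len(preds_for_c)
    return result


def mean_with_interval(values, z=1.96):
    """Mean and normal-approximation interval: mean ± z * sd / sqrt(n)."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical BI-RADS labels and predictions, not results from the paper.
y_true = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
y_pred = [1, 2, 2, 2, 3, 3, 3, 4, 1, 4]

sens = per_class_sensitivity(y_true, y_pred)
mean_sens, ci_low, ci_high = mean_with_interval(list(sens.values()))
print(f"mean sensitivity {mean_sens:.3f} (95% interval {ci_low:.3f}-{ci_high:.3f})")
```

Averaging over classes weights each BI-RADS category equally regardless of how many reports it contains, which matters when categories are imbalanced.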
CONCLUSIONS
In this work, we present a curated and annotated benchmark dataset that can be used for BI-RADS and breast density category classification. We also provide baseline results for common ML, DL, and LLM models on BI-RADS classification, which can serve as a starting point for future investigation. The main objective of this work is to provide a repository for investigators who wish to enter the field and push its boundaries further.
Identifiers
pubmed: 39444035
doi: 10.1186/s12911-024-02717-7
pii: 10.1186/s12911-024-02717-7
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
310
Copyright information
© 2024. The Author(s).