A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future.

Data Accuracy; Machine Learning; Mammography; Motion Pictures; Reproducibility of Results; Datasets as Topic

Journal

Scientific Data

ISSN: 2052-4463

Abbreviated title: Sci Data

Country: England

NLM ID: 101640192

Publication information

Publication date: 8 September 2023

History:

received: 28 November 2022

accepted: 31 July 2023

medline: 11 September 2023

pubmed: 9 September 2023

entrez: 8 September 2023

Status: epublish

Abstract

Rising rates of breast cancer, particularly in emerging economies, have driven interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of how well the eight available datasets adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It shows that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use-case. The mix of digital captures and scanned film compounds this variability, as do differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards, such as the BI-RADS criteria for labelling and annotation and a consistent file format, could markedly improve access to, and use of, larger amounts of standardized data. The volume of such data could in turn be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.

Identifiers

DOI: 10.1038/s41597-023-02430-6 PMID: 37684306 PMC: PMC10491669

pii: 10.1038/s41597-023-02430-6

Publication types

Review; Journal Article

Languages

eng

Pagination

595

Authors

Joe Logan (J)

Paul J Kennedy (PJ)

Daniel Catchpoole (D)
