A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
2023-09-08
History:
received: 2022-11-28
accepted: 2023-07-31
medline: 2023-09-11
pubmed: 2023-09-09
entrez: 2023-09-08
Status: epublish

Abstract

The increasing rates of breast cancer, particularly in emerging economies, have led to interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of the FAIR principles of the eight available datasets. It demonstrates that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use-case. Additionally, the mix of digital captures and scanned film compounds the problem of variability, along with differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards such as the BIRADS criteria for labelling and annotation, and a consistent file format, could markedly improve access and use of larger amounts of standardized data. This, in turn, could be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.

Identifiers

pubmed: 37684306
doi: 10.1038/s41597-023-02430-6
pii: 10.1038/s41597-023-02430-6
pmc: PMC10491669

Publication types

Review
Journal Article

Languages

eng

Citation subsets

IM

Pagination

595

Copyright information

© 2023. Springer Nature Limited.

Authors

Joe Logan (J)

Alixir Technologies Pty Ltd, Sydney, NSW, Australia. joe@alixir.ai.
Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia. joe@alixir.ai.

Paul J Kennedy (PJ)

Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia.

Daniel Catchpoole (D)

Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia.
The Tumour Bank, The Children's Cancer Research Unit, Kids Research, The Children's Hospital at Westmead, Sydney, NSW, Australia.
