Automated curation of large-scale cancer histopathology image datasets using deep learning.
colorectal cancer
deep learning
digital pathology
quality control
Journal
Histopathology
ISSN: 1365-2559
Abbreviated title: Histopathology
Country: England
NLM ID: 7704136
Publication information
Publication date: 26 Feb 2024
History:
revised: 29 Dec 2023
received: 19 Sep 2023
accepted: 09 Feb 2024
medline: 27 Feb 2024
pubmed: 27 Feb 2024
entrez: 27 Feb 2024
Status: aheadofprint
Abstract
BACKGROUND
Artificial intelligence (AI) has numerous applications in pathology, supporting diagnosis and prognostication in cancer. However, most AI models are trained on highly selected data, typically one tissue slide per patient. In reality, especially for large surgical resection specimens, dozens of slides can be available for each patient. Manually sorting and labelling whole-slide images (WSIs) is very time-consuming, which hinders the direct application of AI to tissue samples collected from large cohorts. In this study, we addressed this issue by developing a deep-learning (DL)-based method for automatic curation of large pathology datasets with several slides per patient.
METHODS
We collected multiple large multicentric datasets of colorectal cancer histopathological slides from the United Kingdom (FOXTROT, N = 21,384 slides; CR07, N = 7985 slides) and Germany (DACHS, N = 3606 slides). These datasets contained multiple types of tissue slides, including bowel resection specimens, endoscopic biopsies, lymph node resections, immunohistochemistry-stained slides, and tissue microarrays. We developed, trained, and tested a deep convolutional neural network model to predict the type of slide from the slide overview (thumbnail) image. The primary statistical endpoint was the macro-averaged area under the receiver operating characteristic curve (AUROC) for detection of the type of slide.
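As an illustration of the primary endpoint, the following minimal sketch computes a macro-averaged one-vs-rest AUROC for a multi-class slide-type classifier. It is not the authors' code: the function names, the three class labels, and the toy probabilities are assumptions chosen for the example, and the AUROC is obtained from the rank-based Mann-Whitney U statistic rather than from any particular library.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_auroc(
    y_true: Sequence[str],
    probs: Sequence[Dict[str, float]],
    classes: List[str],
) -> float:
    """Unweighted mean of the one-vs-rest AUROC of each class."""
    per_class = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in probs]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / len(per_class)


# Hypothetical toy data: three slide types, six thumbnails.
classes = ["resection", "biopsy", "ihc"]
y_true = ["resection", "biopsy", "ihc", "resection", "biopsy", "ihc"]
probs = [
    {"resection": 0.80, "biopsy": 0.10, "ihc": 0.10},
    {"resection": 0.20, "biopsy": 0.70, "ihc": 0.10},
    {"resection": 0.10, "biopsy": 0.20, "ihc": 0.70},
    {"resection": 0.25, "biopsy": 0.65, "ihc": 0.10},
    {"resection": 0.30, "biopsy": 0.60, "ihc": 0.10},
    {"resection": 0.05, "biopsy": 0.05, "ihc": 0.90},
]
macro = macro_auroc(y_true, probs, classes)
print(round(macro, 3))  # 0.917
```

Macro-averaging gives every slide type the same weight regardless of how many slides it contains, so a rare class such as tissue microarrays influences the endpoint as much as a common one.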
RESULTS
In the primary dataset (FOXTROT), the algorithm achieved high classification performance, with an AUROC of 0.995 (95% confidence interval [CI]: 0.994-0.996), and accurately predicted the type of slide from the thumbnail image alone. In the two external test cohorts (CR07 and DACHS), AUROCs of 0.982 (95% CI: 0.979-0.985) and 0.875 (95% CI: 0.864-0.887) were observed, indicating that the trained model generalizes to unseen datasets. With a confidence threshold of 0.95, the model reached an accuracy of 94.6% (7331 classified cases) in CR07 and 85.1% (2752 classified cases) in DACHS.
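The confidence-threshold result can be read as selective classification: a slide is assigned a type only when the model's top probability reaches the threshold, and accuracy is computed over the slides that were assigned. The sketch below illustrates that reading under stated assumptions; the abstract does not describe the exact procedure, and the function names and toy values are hypothetical. The coverage figures at the end assume that "classified cases" are counted against the cohort sizes given in the abstract.

```python
from typing import Dict, Optional, Sequence, Tuple


def predict_with_threshold(
    probs: Dict[str, float], threshold: float = 0.95
) -> Optional[str]:
    """Return the top class, or None if the model is not confident enough."""
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None


def selective_accuracy(
    y_true: Sequence[str],
    all_probs: Sequence[Dict[str, float]],
    threshold: float = 0.95,
) -> Tuple[float, int]:
    """Accuracy over confidently classified slides, plus how many there were."""
    preds = [predict_with_threshold(p, threshold) for p in all_probs]
    kept = [(y, p) for y, p in zip(y_true, preds) if p is not None]
    if not kept:
        raise ValueError("no prediction reached the confidence threshold")
    correct = sum(1 for y, p in kept if y == p)
    return correct / len(kept), len(kept)


# Hypothetical toy data: four thumbnails, two slide types.
truth = ["resection", "biopsy", "resection", "biopsy"]
outputs = [
    {"resection": 0.99, "biopsy": 0.01},  # confident and correct
    {"resection": 0.97, "biopsy": 0.03},  # confident but wrong
    {"resection": 0.60, "biopsy": 0.40},  # below threshold, left unclassified
    {"resection": 0.04, "biopsy": 0.96},  # confident and correct
]
accuracy, n_classified = selective_accuracy(truth, outputs)
print(n_classified, round(accuracy, 3))  # 3 0.667

# Share of each cohort classified at the threshold, assuming the cohort
# sizes from the abstract are the denominators.
coverage_cr07 = 7331 / 7985
coverage_dachs = 2752 / 3606
print(round(coverage_cr07, 3), round(coverage_dachs, 3))  # 0.918 0.763
```

Under these assumptions, roughly 92% of CR07 slides and 76% of DACHS slides would be sorted automatically, with the remainder left for manual review.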
CONCLUSION
Our findings show that the low-resolution thumbnail image is sufficient to accurately classify the type of slide in digital pathology. This can help researchers make the vast resource of existing pathology archives accessible to modern AI models with only minimal manual annotation.
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: HORIZON EUROPE European Research Council
Copyright information
© 2024 The Authors. Histopathology published by John Wiley & Sons Ltd.