A visual-language foundation model for computational pathology.

Journal

Nature medicine

ISSN: 1546-170X

Titre abrégé: Nat Med

Pays: United States

ID NLM: 9502015

Informations de publication

Date de publication:
19 Mar 2024

Historique:

received: 02 08 2023

accepted: 05 02 2024

medline: 20 3 2024

pubmed: 20 3 2024

entrez: 20 3 2024

Statut: aheadofprint

Résumé

The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text and, notably, over 1.17 million image-caption pairs through task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving histopathology images and/or text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.

Identifiants

DOI: 10.1038/s41591-024-02856-4 PMID: 38504017

pubmed: 38504017

doi: 10.1038/s41591-024-02856-4

pii: 10.1038/s41591-024-02856-4

doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

Informations de copyright

Références

Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).

doi: 10.1038/s44222-023-00096-8

Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).

pubmed: 31399699 pmcid: 6880861 doi: 10.1038/s41571-019-0252-y

Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer 3, 1026–1038 (2022).

pubmed: 36138135 doi: 10.1038/s43018-022-00436-4

Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).

pubmed: 36220072 pmcid: 10655164 doi: 10.1016/j.ccell.2022.09.012

Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

doi: 10.1001/jama.2017.14585

Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).

pubmed: 30224757 pmcid: 9847512 doi: 10.1038/s41591-018-0177-5

Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

pubmed: 33649564 pmcid: 8711640 doi: 10.1038/s41551-020-00682-w

Skrede, O.-J. et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet 395, 350–360 (2020).

pubmed: 32007170 doi: 10.1016/S0140-6736(19)32998-8

Chen, R. J. et al. Pan-cancer integrative histology–genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).

pubmed: 35944502 pmcid: 10397370 doi: 10.1016/j.ccell.2022.07.004

Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519–1525 (2019).

pubmed: 31591589 doi: 10.1038/s41591-019-0583-3

Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

pubmed: 33953404 doi: 10.1038/s41586-021-03512-4

Zhu, L. et al. An accurate prediction of the origin for bone metastatic cancer using deep learning on digital pathological images. EBioMedicine 87, 104426 (2023).

pubmed: 36577348 doi: 10.1016/j.ebiom.2022.104426

Kalra, S. et al. Yottixel—an image search engine for large archives of histopathology whole slide images. Med. Image Anal. 65, 101757 (2020).

pubmed: 32623275 doi: 10.1016/j.media.2020.101757

Hegde, N. et al. Similar image search for histopathology: SMILY. NPJ Digit. Med. 2, 56 (2019).

pubmed: 31304402 pmcid: 6588631 doi: 10.1038/s41746-019-0131-z

Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).

pubmed: 36270093 doi: 10.1016/j.media.2022.102645

Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420–1434 (2022).

pubmed: 36217022 pmcid: 9792371 doi: 10.1038/s41551-022-00929-8

Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).

pubmed: 33763651 pmcid: 7610412 doi: 10.1038/s43018-020-0087-6

Saldanha, O. L. et al. Self-supervised attention-based deep learning for pan-cancer mutation prediction from histopathology. NPJ Precis. Oncol. 7, 35 (2023).

pubmed: 36977919 pmcid: 10050159 doi: 10.1038/s41698-023-00365-0

Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).

pubmed: 31561183 doi: 10.1016/j.media.2019.101563

Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

pubmed: 31308507 pmcid: 7418463 doi: 10.1038/s41591-019-0508-1

Bulten, W. et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).

pubmed: 31926805 doi: 10.1016/S1470-2045(19)30739-9

Nagpal, K. et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit. Med. 2, 48 (2019).

pubmed: 31304394 pmcid: 6555810 doi: 10.1038/s41746-019-0112-2

Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).

pubmed: 29531073 pmcid: 5879673 doi: 10.1073/pnas.1717139115

Chen, R. J. et al. Multimodal co-attention transformer for survival prediction in gigapixel whole slide images. In Proc. IEEE/CVF International Conference on Computer Vision 4015–4025 (IEEE, 2021).

Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).

pubmed: 35122049 doi: 10.1038/s43018-020-0085-8

Sammut, S.-J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).

pubmed: 34875674 doi: 10.1038/s41586-021-04278-5

Huang, Z. et al. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ Precis. Oncol. 7, 14 (2023).

pubmed: 36707660 pmcid: 9883475 doi: 10.1038/s41698-023-00352-5

Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).

pubmed: 36624314 doi: 10.1038/s41591-022-02134-1

Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).

pubmed: 36038778 pmcid: 9586871 doi: 10.1038/s43018-022-00416-8

Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

Jia, C. et al. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 4904–4916 (PMLR, 2021).

Yu, J. et al. CoCa: contrastive captioners are image–text foundation models. Trans. Mach. Learn. Artif. Intell. https://openreview.net/forum?id=Ee277P3AYC (2022).

Li, J., Li, D., Xiong, C. & Hoi, S. BLIP: bootstrapping language–image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning (eds Chaudhur, K. et al.) 12888–12900 (PMLR, 2022).

Singh, A. et al. FLAVA: a foundational language and vision alignment model. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15638–15650 (IEEE, 2022).

Li, H. et al. Uni-Perceiver v2: a generalist model for large-scale vision and vision-language tasks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 2691–2700 (IEEE, 2023).

Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 35, 23716–23736 (2022).

Li, Y., Fan, H., Hu, R., Feichtenhofer, C. & He, K. Scaling language–image pre-training via masking. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 23390–23400 (IEEE, 2023).

Wang, W. et al. Image as a foreign language: BEiT pretraining for vision and vision-language tasks. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 19175–19186 (IEEE, 2023).

Schuhmann, C. et al. LAION-5B: an open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process. Syst. 35, 25278–25294 (2022).

Chen, Z., Song, Y., Chang, T.-H. & Wan, X. Generating radiology reports via memory-driven transformer. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) 1439–1449 (Association for Computational Linguistics, 2020); https://aclanthology.org/2020.emnlp-main.112

Liu, G. et al. Clinically accurate chest X-ray report generation. In Proc. 4th Machine Learning for Healthcare Conference (eds Doshi-Velez, F. et al.), Vol. 106, 249–269 (PMLR, 2019).

Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).

pubmed: 36109605 pmcid: 9792370 doi: 10.1038/s41551-022-00936-9

Huang, S.-C., Shen, L., Lungren, M. P. & Yeung, S. GLoRIA: a multimodal global–local representation learning framework for label-efficient medical image recognition. In Proc. IEEE/CVF International Conference on Computer Vision 3942–3951 (IEEE, 2021).

Zhang, S. et al. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image–text pairs. Preprint at https://doi.org/10.48550/arXiv.2303.00915 (2023).

Wang, Z., Wu, Z., Agarwal, D. & Sun, J. MedCLIP: contrastive learning from unpaired medical images and text. In Proc. 2022 Conference on Empirical Methods in Natural Language Processing (eds Che, W. & Shutova, E.) 3876–3887 (Association for Computational Linguistics, 2022).

Schaumberg, A. J. et al. Interpretable multimodal deep learning for real-time pan-tissue pan-disease pathology search on social media. Mod. Pathol. 33, 2169–2185 (2020).

pubmed: 32467650 pmcid: 7581495 doi: 10.1038/s41379-020-0540-1

Maleki, D. & Tizhoosh, H. R. LILE: look in-depth before looking elsewhere—a dual attention network using transformers for cross-modal information retrieval in histopathology archives. In International Conference on Medical Imaging with Deep Learning (eds Konukoglu, E. et al.) 879–894 (PMLR, 2022).

Zhang, Y., Jiang, H., Miura, Y., Manning, C. D. & Langlotz, C. P. Contrastive learning of medical visual representations from paired images and text. In Machine Learning for Healthcare Conference (eds Lipton, Z. et al.) 2–25 (PMLR, 2022).

Zhang, H. et al. PathNarratives: data annotation for pathological human–AI collaborative diagnosis. Front. Med. 9, 1070072 (2023).

doi: 10.3389/fmed.2022.1070072

Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. In International Conference on Medical Imaging with Deep Learning (eds Konukoglu, E. et al.) 1235–1250 (PMLR, 2022).

Zhang, R., Weber, C., Grossman, R. & Khan, A. A. Evaluating and interpreting caption prediction for histopathology images. In Machine Learning for Healthcare Conference (eds Doshi-Velez, F. et al.) 418–435 (PMLR, 2020).

Naseem, U., Khushi, M. & Kim, J. Vision-language transformer for interpretable pathology visual question answering. IEEE J. Biomed. Health Inform. 27, 1681–1690 (2022).

doi: 10.1109/JBHI.2022.3163751

He, X. Towards visual question answering on pathology images. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (eds Zong, C. et al.) 708–718 (Association for Computational Linguistics, 2021).

Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual-language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

pubmed: 37592105 doi: 10.1038/s41591-023-02504-3

Gamper, J. & Rajpoot, N. Multiple instance captioning: learning representations from histopathology textbooks and articles. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16549–16559 (IEEE, 2021).

Lu, M. Y. et al. Visual language pretrained multiple instance zero-shot transfer for histopathology images. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 19764–19775 (IEEE, 2023).

Lin, W. et al. PMC-CLIP: contrastive language–image pre-training using biomedical documents. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2023 (ed. Greenspan, H. et al.) 525–536 (Springer Nature, 2023).

Ikezogwo, W. O. et al. Quilt-1M: one million image–text pairs for histopathology. In Advances in Neural Information Processing Systems (eds Oh, A. et al.) 37995–38017 (Curran Associates, Inc., 2023).

Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).

pubmed: 35952419 doi: 10.1016/j.media.2022.102559

Gatta, G. et al. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet—a population-based study. Lancet Oncol. 18, 1022–1039 (2017).

pubmed: 28687376 doi: 10.1016/S1470-2045(17)30445-X

Riasatian, A. et al. Fine-tuning and training of densenet for histopathology image representation using TCGA diagnostic slides. Med. Image Anal. 70, 102032 (2021).

pubmed: 33773296 doi: 10.1016/j.media.2021.102032

Kundra, R. et al. OncoTree: a cancer classification system for precision oncology. JCO Clin. Cancer Inform. 5, 221–230 (2021).

pubmed: 33625877 doi: 10.1200/CCI.20.00108

Alfasly, S. et al. When is a foundation model a foundation model. Preprint at https://doi.org/10.48550/arXiv.2309.11510 (2023).

Zhou, K., Yang, J., Loy, C. C. & Liu, Z. Learning to prompt for vision-language models. Int. J. Comput. Vis. 130, 2337–2348 (2022).

doi: 10.1007/s11263-022-01653-1

Gao, P. et al. CLIP-Adapter: better vision-language models with feature adapters. Int. J. Comput. Vis. 132, 581–595 (2024).

doi: 10.1007/s11263-023-01891-x

Perez, E., Kiela, D. & Cho, K. True few-shot learning with language models. Adv. Neural Inf. Process. Syst. 34, 11054–11070 (2021).

Sanh, V. et al. Multitask prompted training enables zero-shot task generalization. In 10th International Conference on Learning Representations https://openreview.net/forum?id=9Vrb9D0WI4 (OpenReview.net 2021).

Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You Only Look Once: unified, real-time object detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 779–788 (IEEE, 2016).

Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).

pubmed: 36156661 doi: 10.1093/bib/bbac409

Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In 9th International Conference on Learning Representations https://openreview.net/forum?id=YicbFdNTTy (OpenReview.net, 2021).

Zhou, J. et al. Image BERT pre-training with online tokenizer. In 10th International Conference on Learning Representations https://openreview.net/forum?id=ydopy-e6Dg (OpenReview.net, 2022).

Silva-Rodriguez, J., Colomer, A., Dolz, J. & Naranjo, V. Self-learning for weakly supervised Gleason grading of local patterns. IEEE J. Biomed. Health Inform. 25, 3094–3104 (2021).

pubmed: 33621184 doi: 10.1109/JBHI.2021.3061457

Dice, L. R. Measures of the amount of ecologic association between species. Ecology 26, 297–302 (1945).

doi: 10.2307/1932409

Kolesnikov, A., Zhai, X. & Beyer, L. Revisiting self-supervised visual representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1920–1929 (IEEE, 2019).

Wang, J. et al. GIT: a generative image-to-text transformer for vision and language. Trans. Mach. Learn. Res. https://openreview.net/forum?id=b4tMhpN0JC (2022).

Li, J., Li, D., Savarese, S. & Hoi, S. BLIP-2: bootstrapping language–image pre-training with frozen image encoders and large language models. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 19730–19742 (PMLR, 2023).

Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization 65–72 (Association for Computational Linguistics, 2005).

Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out 74–81 (Association for Computational Linguistics, 2004).

Lewis, M., Dauphin, Y. & Fan, A. Hierarchical neural story generation. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (eds Gurevych, I. & Miyao, Y.) 889–898 (Association for Computational Linguistics, 2018).

Wei, J. W. et al. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci. Rep. 9, 3358 (2019).

pubmed: 30833650 pmcid: 6399447 doi: 10.1038/s41598-019-40041-7

Kather, J. N. et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16, e1002730 (2019).

pubmed: 30677016 pmcid: 6345440 doi: 10.1371/journal.pmed.1002730

Han, C. et al. WSSS4LUAD: Grand Challenge on weakly-supervised tissue semantic segmentation for lung adenocarcinoma. Preprint at https://doi.org/10.48550/arXiv.2204.06455 (2022).

Da, Q. et al. DigestPath: a benchmark dataset with challenge review for the pathological detection and segmentation of digestive-system. Med. Image Anal. 80, 102485 (2022).

pubmed: 35679692 doi: 10.1016/j.media.2022.102485

Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).

pubmed: 35169150 pmcid: 8847577 doi: 10.1038/s41597-022-01157-0

Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource [Data set]. EBRAINS https://doi.org/10.25493/WQ48-ZGX (2022).

Huo, X. et al. Comprehensive AI model development for Gleason grading: from scanning, cloud-based annotation to pathologist–AI interaction. Preprint at SSRN https://doi.org/10.2139/ssrn.4172090 (2022).

Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).

pubmed: 35027755 pmcid: 8799467 doi: 10.1038/s41591-021-01620-2

A visual-language foundation model for computational pathology.

Journal

Informations de publication

Résumé

Identifiants

Types de publication

Langues

Sous-ensembles de citation

Informations de copyright

Références

Auteurs

Ming Y Lu (MY)

Bowen Chen (B)

Drew F K Williamson (DFK)

Richard J Chen (RJ)

Ivy Liang (I)

Tong Ding (T)

Guillaume Jaume (G)

Igor Odintsov (I)

Long Phi Le (LP)

Georg Gerber (G)

Anil V Parwani (AV)

Andrew Zhang (A)

Faisal Mahmood (F)

Classifications MeSH