The impact of site-specific digital histology signatures on deep learning model accuracy and bias.


Journal

Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
20 07 2021
History:
received: 09 12 2020
accepted: 01 07 2021
entrez: 21 7 2021
pubmed: 22 7 2021
medline: 28 7 2021
Status: epublish

Abstract

The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
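
The site-preserved splitting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' quadratic programming method: it swaps in a simple greedy heuristic that assigns each submitting site, as a whole, to one cross-validation fold while trying to keep class counts even across folds. The function name `site_preserved_folds`, its parameters, and the toy data are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def _imbalance(fold_counts, site_counts, target, classes):
    """Sum of squared deviations of per-class counts across folds,
    assuming the site were added to fold `target`."""
    k = len(fold_counts)
    total = 0.0
    for c in classes:
        counts = [
            fold_counts[i][c] + (site_counts[c] if i == target else 0)
            for i in range(k)
        ]
        mean = sum(counts) / k
        total += sum((x - mean) ** 2 for x in counts)
    return total


def site_preserved_folds(sites, labels, k=3):
    """Greedy stand-in (an assumption, not the paper's solver):
    map each site to exactly one of k folds, balancing labels."""
    # Tally how many samples of each label every site contributes.
    per_site = defaultdict(Counter)
    for site, label in zip(sites, labels):
        per_site[site][label] += 1

    classes = sorted(set(labels))
    fold_counts = [Counter() for _ in range(k)]
    fold_of_site = {}

    # Place the largest sites first; ties are broken by site name.
    order = sorted(per_site, key=lambda s: (-sum(per_site[s].values()), s))
    for site in order:
        best = min(
            range(k),
            key=lambda f: _imbalance(fold_counts, per_site[site], f, classes),
        )
        fold_of_site[site] = best
        fold_counts[best].update(per_site[site])
    return fold_of_site


# Toy data: four hypothetical sites with binary labels.
sites = ["A"] * 4 + ["B"] * 4 + ["C"] * 2 + ["D"] * 2
labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1]
folds = site_preserved_folds(sites, labels, k=2)
```

Because every sample inherits the fold of its site, no site contributes to both the training and validation sets of any split. A greedy pass gives no optimality guarantee; an exact formulation such as the one the abstract describes would be needed for that.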

Identifiers

pubmed: 34285218
doi: 10.1038/s41467-021-24698-1
pii: 10.1038/s41467-021-24698-1
pmc: PMC8292530

Chemical substances

Biomarkers, Tumor 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

4423

Grants

Agency: NIDCR NIH HHS
ID: K08 DE026500
Country: United States
Agency: NCI NIH HHS
ID: P20 CA233307
Country: United States
Agency: NCI NIH HHS
ID: T32 CA009566
Country: United States
Agency: NCI NIH HHS
ID: U01 CA243075
Country: United States

Copyright information

© 2021. The Author(s).

Authors

Frederick M Howard (FM)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.

James Dolezal (J)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.

Sara Kochanny (S)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.

Jefree Schulte (J)

Department of Pathology, University of Chicago, Chicago, IL, USA.

Heather Chen (H)

Department of Pathology, University of Chicago, Chicago, IL, USA.

Lara Heij (L)

Department of Surgery and Transplantation, University Hospital RWTH Aachen, Aachen, Germany.
Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany.

Dezheng Huo (D)

Department of Public Health Sciences, University of Chicago, Chicago, IL, USA.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.

Rita Nanda (R)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.

Olufunmilayo I Olopade (OI)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.

Jakob N Kather (JN)

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Pathology and Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK.
Medical Oncology, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany.

Nicole Cipriani (N)

Department of Pathology, University of Chicago, Chicago, IL, USA.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA.

Robert L Grossman (RL)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA. rgrossman1@uchicago.edu.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA. rgrossman1@uchicago.edu.

Alexander T Pearson (AT)

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA. apearson5@medicine.bsd.uchicago.edu.
University of Chicago Comprehensive Cancer Center, Chicago, IL, USA. apearson5@medicine.bsd.uchicago.edu.
