Machine Learning and Real-World Data to Predict Lung Cancer Risk in Routine Care.


Journal

Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology
ISSN: 1538-7755
Abbreviated title: Cancer Epidemiol Biomarkers Prev
Country: United States
NLM ID: 9200608

Publication information

Publication date:
2023-03-06
History:
received: 2022-08-12
revised: 2022-10-07
accepted: 2022-12-19
pubmed: 2022-12-29
medline: 2023-03-08
entrez: 2022-12-28
Status: ppublish

Abstract

Abstract sections

BACKGROUND
This study used machine learning to develop a 3-year lung cancer risk prediction model with large real-world data in a mostly younger population.
METHODS
Over 4.7 million individuals, aged 45 to 65 years with no history of any cancer or lung cancer screening, diagnostic, or treatment procedures, with an outpatient visit in 2013 were identified in Optum's de-identified Electronic Health Record (EHR) dataset. A least absolute shrinkage and selection operator model was fit using all available data in the 365 days prior. Temporal validation was assessed with recent data. External validation was assessed with data from Mercy Health Systems EHR and Optum's de-identified Clinformatics Data Mart Database. Racial inequities in model discrimination were assessed with xAUCs.
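As an illustration of the discrimination metric named above, the following minimal Python sketch computes a cross-group AUC. It assumes xAUC is the probability that a case from one group receives a higher score than a non-case from another group; the abstract does not spell out the definition, and the function names and toy data here are hypothetical, not taken from the study.

```python
from itertools import product


def cross_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win, as in the usual rank-based AUC.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos_scores, neg_scores)
    )
    return wins / (len(pos_scores) * len(neg_scores))


def xauc(scores, labels, groups, pos_group, neg_group):
    """Assumed xAUC: cases from pos_group ranked against non-cases from neg_group."""
    pos = [s for s, y, g in zip(scores, labels, groups) if y == 1 and g == pos_group]
    neg = [s for s, y, g in zip(scores, labels, groups) if y == 0 and g == neg_group]
    return cross_auc(pos, neg)


# Toy data (hypothetical): predicted risks, outcomes, and group membership.
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.3, 0.35, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

xauc_ab = xauc(scores, labels, groups, "a", "b")  # cases in a vs non-cases in b
xauc_ba = xauc(scores, labels, groups, "b", "a")  # cases in b vs non-cases in a
print(xauc_ab, xauc_ba)
```

A gap between the two directions suggests that the score separates cases from non-cases less well for one group than for the other, which is the kind of disparity such a check is meant to surface.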
RESULTS
The model AUC was 0.76. Top predictors included age, smoking, race, ethnicity, and diagnosis of chronic obstructive pulmonary disease. The model identified a high-risk group with lung cancer incidence 9 times the average cohort incidence, representing 10% of patients with lung cancer. The model performed well in temporal and external validation, but performance was reduced for Asian and Hispanic patients.
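The high-risk group figures imply a rough size for that group. As a back-of-the-envelope sketch, assuming "representing 10% of patients with lung cancer" means the group contains 10% of all cases, its share of the cohort is the case share divided by the incidence ratio. The function name below is hypothetical.

```python
def high_risk_cohort_share(case_share, incidence_ratio):
    """Share of the cohort in the high-risk group.

    Derivation: with C total cases among N people, the group holds
    case_share * C cases among n people, and its incidence is
    incidence_ratio times the cohort average:
        (case_share * C) / n = incidence_ratio * (C / N)
    Solving for n / N gives case_share / incidence_ratio.
    """
    if incidence_ratio <= 0:
        raise ValueError("incidence_ratio must be positive")
    return case_share / incidence_ratio


# Values from the RESULTS text: 10% of cases, 9 times the average incidence.
share = high_risk_cohort_share(0.10, 9.0)
print(f"{share:.1%}")
```

Under that reading, the high-risk group would be roughly 1% of the cohort.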
CONCLUSIONS
A high-dimensional model trained using big data identified a subset of patients with high lung cancer risk. The model demonstrated transportability to EHR and claims data, while underscoring the need to assess racial disparities when using machine learning methods.
IMPACT
This internally and externally validated real-world data-based lung cancer prediction model is available on an open-source platform for broad sharing and application. Model integration into an EHR system could minimize physician burden by automating identification of high-risk patients.

Identifiers

pubmed: 36576991
pii: 712419
doi: 10.1158/1055-9965.EPI-22-0873
pmc: PMC9986687

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

337-343

Copyright information

©2022 The Authors; Published by the American Association for Cancer Research.

Authors

Urmila Chandran (U)

Johnson & Johnson Global Epidemiology, Titusville, New Jersey.
Lung Cancer Initiative, Johnson & Johnson, New Brunswick, New Jersey.

Jenna Reps (J)

Johnson & Johnson Global Epidemiology, Titusville, New Jersey.

Robert Yang (R)

Lung Cancer Initiative, Johnson & Johnson, New Brunswick, New Jersey.

Anil Vachani (A)

University of Pennsylvania, Perelman School of Medicine, Philadelphia, Pennsylvania.

Fabien Maldonado (F)

Vanderbilt University, Nashville, Tennessee.

Iftekhar Kalsekar (I)

Lung Cancer Initiative, Johnson & Johnson, New Brunswick, New Jersey.
