Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting.

Keywords: Chinese population; extreme gradient boosting; lung cancer; risk model; single nucleotide polymorphisms

Journal

Cancer medicine
ISSN: 2045-7634
Abbreviated title: Cancer Med
Country: United States
NLM ID: 101595310

Publication information

Publication date:
December 2022
History:
received: 2020-10-12
revised: 2022-04-22
accepted: 2022-04-24
entrez: 2022-05-02
pubmed: 2022-05-03
medline: 2022-12-15
Status: ppublish

Abstract

BACKGROUND
Detecting lung cancer at an early stage is critical for reducing lung cancer mortality; however, existing risk models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case-control study.
MATERIALS AND METHODS
A total of 974 cases and 1005 controls were recruited in Shanghai and Taizhou, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariable logistic regression was used to estimate the association between each signal SNP and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost), a scalable machine learning algorithm, were used to build lung cancer risk models. Both models were evaluated with 10-fold cross-validation, and predictive performance was measured by the area under the receiver operating characteristic curve (AUC).
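The evaluation protocol described above (10-fold cross-validation scored by AUC) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the names `auc`, `stratified_folds`, `cross_validated_auc`, and `score_by_first_feature` are hypothetical; folds are assumed to be stratified by case/control status; and out-of-fold scores are assumed to be pooled into a single AUC (the abstract does not say whether per-fold AUCs were averaged instead). The `fit` argument stands in for any learner, such as an LR or XGBoost fitter.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
# A fitter takes training rows and labels and returns a scoring function.
Fitter = Callable[[List[Row], List[int]], Callable[[Row], float]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case (label 1) scores
    higher than a random control (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with a similar case/control ratio
    (assumption: stratification by outcome)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X: List[Row], y: List[int], fit: Fitter,
                        k: int = 10, seed: int = 0) -> float:
    """Train on k-1 folds, score the held-out fold, then compute one AUC
    over all pooled out-of-fold scores (assumption: pooling)."""
    scores = [0.0] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            scores[i] = predict(X[i])
    return auc(y, scores)


def score_by_first_feature(X_train: List[Row], y_train: List[int]) -> Callable[[Row], float]:
    """Hypothetical placeholder learner: ignores training data, scores by feature 0."""
    return lambda row: row[0]


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
print(cross_validated_auc(X, y, score_by_first_feature))  # → 1.0
```

In practice, `fit` would wrap a logistic regression or an XGBoost classifier trained on epidemiologic factors with or without the genotyped SNPs, so that the same folds and the same AUC definition are applied to every model being compared.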
RESULTS
After false discovery rate (FDR) adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For overall lung cancer risk prediction, the model using only epidemiologic factors attained an AUC of 0.703 with LR and 0.744 with XGBoost. Compared with this epidemiology-only LR model, adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001). In the XGBoost model, BAG6 rs1077393 was the most important predictor among all SNPs, followed by TERT rs2735845 and CAMKK1 rs7214723. In analyses stratified by histology, applying XGBoost and adding SNPs significantly improved the AUC for lung adenocarcinoma (ADC) from 0.639 to 0.699 (p = 0.009), whereas the best model for lung squamous cell carcinoma (SCC) was the LR model with epidemiologic factors and SNPs (AUC = 0.833), compared with 0.816 for the XGBoost model.
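The results report significance after FDR adjustment but do not name the procedure. A minimal sketch, assuming the common Benjamini-Hochberg step-up method (an assumption, not confirmed by the source), shows how adjusted p-values for a panel of SNP association tests could be computed. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each sorted p-value p_(r) becomes min over s >= r of p_(s) * m / s,
    capped at 1, so adjusted values never decrease with rank.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    # Walk from the largest p-value to the smallest, carrying the minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four SNP association tests.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
print([round(v, 4) for v in q])  # → [0.04, 0.0533, 0.0533, 0.2]
print([i for i, v in enumerate(q) if v < 0.05])  # → [0]
```

With 61 genotyped SNPs, the input would hold the 61 per-SNP p-values from the logistic regression models, and SNPs whose adjusted value falls below 0.05 would be reported as significant.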
CONCLUSION
Our lung cancer risk prediction models for the Chinese population show strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiology-based logistic regression model significantly improves predictive performance.

Identifiers

pubmed: 35499292
doi: 10.1002/cam4.4800
pmc: PMC9741969

Chemical substances

BAG6 protein, human 0
Molecular Chaperones 0

Publication types

Multicenter Study
Journal Article
Research Support, Non-U.S. Gov't

Language

eng

Citation subsets

IM

Pagination

4469-4478

Copyright information

© 2022 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

Authors

Yutao Li (Y)

School of Life Sciences, Fudan University, Shanghai, China.

Zixiu Zou (Z)

School of Life Sciences, Fudan University, Shanghai, China.

Zhunyi Gao (Z)

Company 6 of Basic Medical School, Navy Military Medical University, Shanghai, China.

Yi Wang (Y)

School of Life Sciences, Fudan University, Shanghai, China.

Man Xiao (M)

Department of Biochemistry and Molecular Biology, Hainan Medical University, Haikou, China.

Chang Xu (C)

Clinical College of Xiangnan University, Chenzhou, China.

Gengxi Jiang (G)

Department of Thoracic Surgery, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China.

Haijian Wang (H)

School of Life Sciences, Fudan University, Shanghai, China.

Li Jin (L)

School of Life Sciences, Fudan University, Shanghai, China.

Jiucun Wang (J)

School of Life Sciences, Fudan University, Shanghai, China.

Huai Zhou Wang (HZ)

Department of Laboratory Diagnosis, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China.

Shicheng Guo (S)

School of Life Sciences, Fudan University, Shanghai, China.

Junjie Wu (J)

School of Life Sciences, Fudan University, Shanghai, China.
Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China.
Department of Pulmonary and Critical Care Medicine, Shanghai Geriatric Medical Center, Shanghai, China.
