Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study.
Gradient Boosting Decision Tree
airflow limitation
chronic obstructive pulmonary disease
logistic regression
medical check-up
Journal
JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109
Publication information
Publication date:
06 Jul 2021
History:
received: 05 Oct 2020
accepted: 11 Apr 2021
revised: 17 Nov 2020
entrez: 13 Jul 2021
pubmed: 14 Jul 2021
medline: 14 Jul 2021
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors exist.
OBJECTIVE
This study aimed to predict the risk factors for COPD diagnosis using machine learning in an annual medical check-up database.
METHODS
In this retrospective observational cohort study (ARTDECO [Analysis of Risk Factors to Detect COPD]), annual medical check-up records for all Hitachi Ltd employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened and those aged 30 to 75 years without a prior diagnosis of COPD/asthma or a history of cancer were included. The database included clinical measurements (eg, pulmonary function tests) and questionnaire responses. To predict the risk factors for COPD diagnosis within a 3-year period, the Gradient Boosting Decision Tree machine learning (XGBoost) method was applied as a primary approach, with logistic regression as a secondary method. A diagnosis of COPD was made when the ratio of the prebronchodilator forced expiratory volume in 1 second (FEV
RESULTS
Of the 26,101 individuals screened, 1213 met the exclusion criteria, and thus, 24,815 individuals were included in the analysis. The top 10 predictors for COPD diagnosis were FEV
CONCLUSIONS
Using a machine learning model in this longitudinal database, we identified a number of parameters as risk factors other than smoking exposure or lung function to support general practitioners and occupational health physicians to predict the development of COPD. Further research to confirm our results is warranted, as our analysis involved a database used only in Japan.
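The modeling approach named in METHODS (gradient-boosted decision trees as the primary method, logistic regression as the secondary method, predictors ranked by importance) can be illustrated with a minimal sketch. This is not the study's pipeline: it uses synthetic data, a hand-rolled booster built from depth-1 trees in place of XGBoost, and a simple gain-style importance score. All function names, feature indices, and parameter values below are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n=400, seed=0):
    # Synthetic stand-in data (assumption): the outcome depends strongly on
    # feature 0, weakly on feature 1, and not at all on feature 2.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(3)]
        p = sigmoid(3.0 * row[0] + 0.5 * row[1])
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_stump(X, residuals, n_thresholds=9):
    # Depth-1 regression tree: pick the (feature, threshold) split that most
    # reduces the squared error of the residuals.
    n = len(X)
    total = sum(residuals)
    base_sse = sum(r * r for r in residuals) - total * total / n
    best = None
    for j in range(len(X[0])):
        column = sorted(row[j] for row in X)
        for k in range(1, n_thresholds + 1):
            t = column[k * n // (n_thresholds + 1)]  # quantile threshold
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - ml) ** 2 for r in left)
                   + sum((r - mr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    sse, j, t, ml, mr = best
    return j, t, ml, mr, base_sse - sse


def fit_boosting(X, y, rounds=20, lr=0.5):
    # Each round fits a stump to the negative gradient of the log loss
    # (y - p) and adds its shrunken leaf values to the running score.
    scores = [0.0] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, ml, mr, gain = fit_stump(X, residuals)
        stumps.append((j, t, lr * ml, lr * mr))
        importance[j] += gain  # accumulate error reduction per feature
        scores = [s + (lr * ml if row[j] <= t else lr * mr)
                  for row, s in zip(X, scores)]
    return stumps, importance


def predict_boosting(stumps, row):
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return 1 if score > 0 else 0


def fit_logistic(X, y, steps=200, lr=0.5):
    # Plain batch gradient descent on the mean log loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


X, y = make_data()
stumps, importance = fit_boosting(X, y)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
accuracy = sum(predict_boosting(stumps, row) == yi for row, yi in zip(X, y)) / len(y)
weights, bias = fit_logistic(X, y)
print("boosting feature ranking (most important first):", ranking)
print("boosting training accuracy:", round(accuracy, 3))
print("logistic regression weights:", [round(wj, 2) for wj in weights])
```

In this toy setup both models should single out feature 0, which drives the synthetic outcome. With real check-up data, a library implementation and held-out evaluation would replace the hand-rolled pieces shown here.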
Identifiers
pubmed: 34255684
pii: v9i7e24796
doi: 10.2196/24796
pmc: PMC8293159
Publication types
Journal Article
Languages
eng
Pagination
e24796
Copyright information
©Shigeo Muro, Masato Ishida, Yoshiharu Horie, Wataru Takeuchi, Shunki Nakagawa, Hideyuki Ban, Tohru Nakagawa, Tetsuhisa Kitamura. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 06.07.2021.