Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study.

Gradient Boosting Decision Tree; airflow limitation; chronic obstructive pulmonary disease; logistic regression; medical check-up

Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
06 Jul 2021
History:
received: 05 Oct 2020
revised: 17 Nov 2020
accepted: 11 Apr 2021
entrez: 13 Jul 2021
pubmed: 14 Jul 2021
medline: 14 Jul 2021
Status: epublish

Abstract sections

BACKGROUND
Airflow limitation is a critical physiological feature in chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk. However, not all long-term smokers develop COPD, meaning that other risk factors exist.
OBJECTIVE
This study aimed to predict the risk factors for COPD diagnosis using machine learning in an annual medical check-up database.
METHODS
In this retrospective observational cohort study (ARTDECO [Analysis of Risk Factors to Detect COPD]), annual medical check-up records for all Hitachi Ltd employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened and those aged 30 to 75 years without a prior diagnosis of COPD/asthma or a history of cancer were included. The database included clinical measurements (eg, pulmonary function tests) and questionnaire responses. To predict the risk factors for COPD diagnosis within a 3-year period, the Gradient Boosting Decision Tree machine learning (XGBoost) method was applied as a primary approach, with logistic regression as a secondary method. A diagnosis of COPD was made when the ratio of the prebronchodilator forced expiratory volume in 1 second (FEV
RESULTS
Of the 26,101 individuals screened, 1213 met the exclusion criteria, and thus, 24,815 individuals were included in the analysis. The top 10 predictors for COPD diagnosis were FEV
CONCLUSIONS
Using a machine learning model in this longitudinal database, we identified a number of parameters as risk factors other than smoking exposure or lung function to support general practitioners and occupational health physicians to predict the development of COPD. Further research to confirm our results is warranted, as our analysis involved a database used only in Japan.
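
The secondary method named in METHODS, logistic regression, can be illustrated with a minimal sketch. This is not the study's pipeline: the primary XGBoost model is not reproduced, the data are synthetic, and the three feature names (a lung-function measure, a smoking-exposure measure, and an uninformative feature) are hypothetical stand-ins. The sketch fits a logistic regression by gradient descent using only the Python standard library and scores it with the area under the ROC curve.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, seed):
    """Synthetic cohort (assumption): risk rises as the hypothetical
    lung-function feature falls and the smoking-exposure feature rises."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        lung = rng.gauss(0.0, 1.0)    # hypothetical standardized lung function
        smoke = rng.gauss(0.0, 1.0)   # hypothetical standardized smoking exposure
        noise = rng.gauss(0.0, 1.0)   # uninformative feature
        logit = -2.0 * lung + 1.0 * smoke - 1.0
        y.append(1 if rng.random() < sigmoid(logit) else 0)
        X.append([lung, smoke, noise])
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


X_train, y_train = make_synthetic(400, seed=0)
X_test, y_test = make_synthetic(200, seed=1)
w, b = fit_logistic(X_train, y_train)
auc = roc_auc(y_test, predict_proba(X_test, w, b))
print("weights:", [round(v, 2) for v in w], "AUC:", round(auc, 3))
```

The fitted weights give a crude ranking of the synthetic predictors by magnitude. A tree-based model such as XGBoost would rank predictors through its own importance scores instead.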

Identifiers

pubmed: 34255684
pii: v9i7e24796
doi: 10.2196/24796
pmc: PMC8293159

Publication types

Journal Article

Languages

eng

Pagination

e24796

Copyright information

©Shigeo Muro, Masato Ishida, Yoshiharu Horie, Wataru Takeuchi, Shunki Nakagawa, Hideyuki Ban, Tohru Nakagawa, Tetsuhisa Kitamura. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 06.07.2021.

Authors

Shigeo Muro (S)

Department of Respiratory Medicine, Nara Medical University, Nara, Japan.

Masato Ishida (M)

Department of Respiratory and Immunology, Medical, AstraZeneca KK, Osaka, Japan.

Yoshiharu Horie (Y)

Department of Data Science, Medical, AstraZeneca KK, Osaka, Japan.

Wataru Takeuchi (W)

Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan.

Shunki Nakagawa (S)

Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan.

Hideyuki Ban (H)

Center for Technology Innovation-Artificial Intelligence, Research & Development Group, Hitachi, Ltd, Tokyo, Japan.

Tohru Nakagawa (T)

Hitachi Health Care Center, Hitachi, Ltd, Ibaraki, Japan.

Tetsuhisa Kitamura (T)

Division of Environmental Medicine and Population Sciences, Department of Social and Environmental Medicine, Graduate School of Medicine, Osaka University, Osaka, Japan.

MeSH classifications