Gene expression microarray public dataset reanalysis in chronic obstructive pulmonary disease.
Age Factors
Air Pollutants
/ adverse effects
Biomarkers
/ metabolism
Data Mining
Datasets as Topic
Down-Regulation
Female
Gene Expression Profiling
/ statistics & numerical data
Humans
Logistic Models
Machine Learning
Male
Models, Genetic
Oligonucleotide Array Sequence Analysis
/ statistics & numerical data
Pulmonary Disease, Chronic Obstructive
/ diagnosis
Risk Assessment
/ methods
Risk Factors
Sex Factors
Smoking
/ adverse effects
Transcriptome
/ genetics
United States
/ epidemiology
Up-Regulation
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2019
History:
received: 17 June 2019
accepted: 21 October 2019
entrez: 16 November 2019
pubmed: 16 November 2019
medline: 3 April 2020
Status:
epublish
Abstract
Chronic obstructive pulmonary disease (COPD) was classified by the Centers for Disease Control and Prevention in 2014 as the 3rd leading cause of death in the United States (US). The main cause of COPD is exposure to tobacco smoke and air pollutants. Problems associated with COPD include under-diagnosis of the disease and an increase in the number of smokers worldwide. The goal of our study is to identify disease variability in the gene expression profiles of COPD subjects compared to controls by reanalyzing pre-existing, publicly available microarray expression datasets. Our inclusion criteria required that microarray datasets report the smoking status, age and sex of blood donors. The selected datasets used Affymetrix and Agilent microarray platforms (7 datasets, 1,262 samples). We re-analyzed the curated raw microarray expression data using R packages, and used Box-Cox power transformations to normalize the datasets. To identify significant differentially expressed genes, we used generalized least squares models with disease state, age, sex, smoking status and study as effects, including binary interactions, followed by likelihood ratio tests (LRT). We found 3,315 statistically significant (Storey-adjusted q-value <0.05) differentially expressed genes with respect to disease state (COPD or control). We further filtered these genes for biological effect (using LRT q-value <0.05 and the 10% two-tailed quantiles of the model estimates of mean differences between COPD and control) to identify 679 genes. Through analysis of disease, sex, age, and also smoking status and disease interactions, we identified differentially expressed genes involved in a variety of immune responses and cell processes in COPD. We also trained a logistic regression model using the common array genes as features, which enabled prediction of disease status with 81.7% accuracy.
Our results suggest potential for improving blood-based diagnosis of COPD and highlight novel gene expression disease signatures.
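The biological-effect filter described in the abstract (a q-value cutoff combined with the tails of the mean-difference distribution) can be illustrated with a minimal Python sketch. It is not the authors' code, which used R packages. It assumes that "10% two-tailed quantiles" means keeping genes at or below the 10th percentile or at or above the 90th percentile of the mean differences, and that the quantiles are computed over all genes supplied. The function name `filter_genes`, the gene labels, and every number in the toy data are hypothetical.

```python
from statistics import quantiles


def filter_genes(results, q_cutoff=0.05, tail=0.10):
    """Keep genes that pass the q-value cutoff and whose mean difference
    lies in either tail of the mean-difference distribution.

    results: dict mapping gene name -> (q_value, mean_difference).
    tail: fraction of genes in each tail (an assumed reading of the
          abstract's "10% two-tailed quantiles").
    """
    diffs = [diff for _, diff in results.values()]
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(diffs, n=100, method="inclusive")
    lower = cuts[round(tail * 100) - 1]         # e.g. 10th percentile
    upper = cuts[round((1 - tail) * 100) - 1]   # e.g. 90th percentile
    return sorted(
        gene
        for gene, (q_value, diff) in results.items()
        if q_value < q_cutoff and (diff <= lower or diff >= upper)
    )


# Toy input: 11 hypothetical genes with mean differences -5..5.
# These values are invented for illustration and are not study results.
toy = {f"gene_{i:02d}": (0.01, float(i - 5)) for i in range(11)}
toy["gene_01"] = (0.20, -4.0)  # large effect, but fails the q-value cutoff

selected = filter_genes(toy)
print(selected)
```

Requiring both conditions means a gene with a large mean difference but a non-significant q-value (here `gene_01`) is excluded, as is a significant gene with a small mean difference.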
Identifiers
pubmed: 31730674
doi: 10.1371/journal.pone.0224750
pii: PONE-D-19-16838
pmc: PMC6857915
Chemical substances
Air Pollutants
0
Biomarkers
0
Databanks
figshare
10.6084/m9.figshare.8233175
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0224750
Conflict of interest statement
I have read the journal’s policy and the authors of this manuscript have the following competing interests: GIM has previously consulted for Colgate-Palmolive. LRKR and MV have declared that no competing interests exist. This does not alter our adherence to PLOS ONE policies on sharing data and materials.