Columbia Open Health Data for COVID-19 Research: Database Analysis.

Keywords: COVID-19; access; cohort; data; data science; database; electronic health record; open data; prevalence; research; symptom

Journal

Journal of Medical Internet Research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
30 09 2021
History:
received: 11 06 2021
accepted: 03 08 2021
revised: 03 08 2021
pubmed: 21 9 2021
medline: 6 10 2021
entrez: 20 9 2021
Status: epublish

Abstract

BACKGROUND
COVID-19 has threatened the health of tens of millions of people all over the world. Massive research efforts have been made in response to the COVID-19 pandemic. Clinical data can accelerate these efforts because important patient characteristics are often identified by examining such data. Publicly accessible clinical data on COVID-19, however, remain limited despite the immediate need.
OBJECTIVE
To provide shareable clinical data to catalyze COVID-19 research, we present Columbia Open Health Data for COVID-19 Research (COHD-COVID), a publicly accessible database providing clinical concept prevalence, clinical concept co-occurrence, and clinical symptom prevalence for hospitalized patients with COVID-19. COHD-COVID also provides data on hospitalized patients with influenza and general hospitalized patients as comparator cohorts.
METHODS
The data used in COHD-COVID were obtained from NewYork-Presbyterian/Columbia University Irving Medical Center's electronic health records database. Condition, drug, and procedure concepts were obtained from the visits of identified patients from the cohorts. Rare concepts were excluded, and the true concept counts were perturbed using Poisson randomization to protect patient privacy. Concept prevalence, concept prevalence ratio, concept co-occurrence, and symptom prevalence were calculated using the obtained concepts.
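The privacy step described above (exclude rare concepts, then perturb the true counts with Poisson randomization before computing prevalence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold `RARE_THRESHOLD`, the function names, and the choice to divide the perturbed count by an unperturbed cohort size are all hypothetical, since the abstract does not specify them.

```python
import math
import random

# Hypothetical cutoff; the abstract does not state the value actually used.
RARE_THRESHOLD = 10


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's method.

    The rate is split into chunks of at most 500 so that exp(-chunk)
    stays representable; a sum of independent Poisson draws is itself
    Poisson with the summed rate.
    """
    total = 0
    remaining = lam
    while remaining > 0:
        chunk = min(remaining, 500.0)
        limit = math.exp(-chunk)
        k, product = 0, rng.random()
        while product > limit:
            k += 1
            product *= rng.random()
        total += k
        remaining -= chunk
    return total


def perturbed_prevalence(true_counts, cohort_size, seed=0):
    """Return {concept: prevalence} after rare-concept exclusion and perturbation."""
    rng = random.Random(seed)
    prevalence = {}
    for concept, count in true_counts.items():
        if count <= RARE_THRESHOLD:
            continue  # rare concepts are excluded entirely
        noisy_count = poisson_sample(count, rng)  # Poisson draw centered on the true count
        prevalence[concept] = noisy_count / cohort_size
    return prevalence
```

Because each released count is a random draw whose mean is the true count, the published value does not reveal the exact number of patients, while the relative error shrinks as counts grow.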
RESULTS
Concept prevalence and concept prevalence ratio analyses showed the clinical characteristics of the COVID-19 cohorts, confirming the well-known characteristics of COVID-19 (eg, acute lower respiratory tract infection and cough). Concepts related to these characteristics had high prevalence and high prevalence ratios in the COVID-19 cohort compared with the hospitalized influenza cohort and the general hospitalized cohort. Concept co-occurrence analyses showed potential associations between specific concepts. For acute lower respiratory tract infection in the COVID-19 cohort, high co-occurrence ratios were obtained with COVID-19-related concepts and commonly used drugs (eg, disease due to coronavirus and acetaminophen). Symptom prevalence analysis indicated symptom-level characteristics of the cohorts and confirmed that well-known symptoms of COVID-19 (eg, fever, cough, and dyspnea) were more prevalent in the COVID-19 cohort than in the hospitalized influenza cohort and the general hospitalized cohort.
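The cohort comparison reported above can be illustrated with a small sketch that takes the prevalence ratio to be a concept's prevalence in the target cohort divided by its prevalence in a comparator cohort. That definition, the function names, and every number below are assumptions made up for illustration; they are not values from COHD-COVID.

```python
def prevalence_ratio(target_prev, comparator_prev):
    """Return {concept: target prevalence / comparator prevalence}.

    Concepts missing from the comparator, or with zero prevalence there,
    are skipped because the ratio is undefined.
    """
    return {
        concept: prev / comparator_prev[concept]
        for concept, prev in target_prev.items()
        if comparator_prev.get(concept, 0) > 0
    }


def rank_by_ratio(ratios):
    """Return concepts sorted from highest to lowest prevalence ratio."""
    return sorted(ratios, key=ratios.get, reverse=True)


# Made-up prevalences, for illustration only.
covid_cohort = {"cough": 0.30, "dyspnea": 0.24, "acetaminophen": 0.50}
influenza_cohort = {"cough": 0.15, "dyspnea": 0.08, "acetaminophen": 0.50}

ratios = prevalence_ratio(covid_cohort, influenza_cohort)
ranking = rank_by_ratio(ratios)
```

A ratio well above 1 flags a concept as enriched in the target cohort, whereas a ratio near 1 (as for the commonly used drug in this toy input) indicates a concept that is frequent in both cohorts and therefore not distinctive.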
CONCLUSIONS
We present COHD-COVID, a publicly accessible database providing useful clinical data for hospitalized patients with COVID-19, hospitalized patients with influenza, and general hospitalized patients. We expect COHD-COVID to provide researchers and clinicians with quantitative measures of COVID-19-related clinical features to better understand and combat the pandemic.

Identifiers

pubmed: 34543225
pii: v23i9e31122
doi: 10.2196/31122
pmc: PMC8485985

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e31122

Grants

Agency: NCATS NIH HHS
ID: OT2 TR003434
Country: United States
Agency: NLM NIH HHS
ID: R01 LM012895
Country: United States

Copyright information

©Junghwan Lee, Jae Hyun Kim, Cong Liu, George Hripcsak, Karthik Natarajan, Casey Ta, Chunhua Weng. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.09.2021.

Authors

Junghwan Lee (J)

Columbia University, New York, NY, United States.

Jae Hyun Kim (JH)

Columbia University, New York, NY, United States.

Cong Liu (C)

Columbia University, New York, NY, United States.

George Hripcsak (G)

Columbia University, New York, NY, United States.

Karthik Natarajan (K)

Columbia University, New York, NY, United States.

Casey Ta (C)

Columbia University, New York, NY, United States.

Chunhua Weng (C)

Columbia University, New York, NY, United States.
