A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems.

Keywords: UK Biobank; electronic health records; medical informatics; phenotyping

Journal

JAMIA Open
ISSN: 2574-2531
Abbreviated title: JAMIA Open
Country: United States
NLM ID: 101730643

Publication information

Publication date:
Dec 2020
History:
received: 19 05 2020
revised: 10 08 2020
accepted: 14 09 2020
entrez: 23 2 2021
pubmed: 24 2 2021
medline: 24 2 2021
Status: epublish

Abstract

The UK Biobank (UKB) is making primary care electronic health records (EHRs) for 500 000 participants available for COVID-19-related research. Data are extracted from four sources, recorded using five clinical terminologies, and stored in different schemas. The aims of our research were to (a) develop a semi-supervised approach for bootstrapping EHR phenotyping algorithms in UKB EHRs, and (b) evaluate our approach by implementing and evaluating phenotypes for 31 common biomarkers.

We describe an algorithmic approach to phenotyping biomarkers in primary care EHRs involving (a) bootstrapping definitions using existing phenotypes, (b) excluding generic, rare, or semantically distant terms, (c) forward-mapping terminology terms, (d) expert review, and (e) data extraction. We evaluated the phenotypes by assessing their ability to reproduce known epidemiological associations with all-cause mortality using Cox proportional hazards models.

We created and evaluated phenotyping algorithms for 31 biomarkers, many of which are directly related to COVID-19 complications (for example, diabetes, cardiovascular disease, and respiratory disease). Our algorithm identified 1651 Read v2 and Clinical Terms Version 3 terms and automatically excluded 1228 terms. Clinical review excluded 103 terms and included 44 terms, resulting in 364 terms for data extraction (sensitivity 0.89, specificity 0.92). We extracted 38 190 682 events and identified 220 978 participants with at least one biomarker measured.

Bootstrapping phenotyping algorithms from similar EHRs can potentially address pre-existing methodological concerns that undermine the outputs of biomarker discovery pipelines and provide research-quality phenotyping algorithms.
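The automatic exclusion and expert review steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CandidateTerm` fields, the `min_events` and `min_similarity` thresholds, the function names, and the example codes (`X001` to `X004`) are all hypothetical and chosen only for illustration. The final helper checks that the term counts reported in the abstract are consistent with reading "included 44 terms" as terms added back at review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTerm:
    code: str          # placeholder code, NOT a real Read v2 / CTV3 code
    description: str
    n_events: int      # how often the term is recorded (hypothetical field)
    similarity: float  # similarity to seed phenotype terms, 0..1 (hypothetical)
    is_generic: bool   # flagged as too generic to identify one biomarker


def auto_filter(terms, min_events=100, min_similarity=0.5):
    """Split candidates into kept and excluded lists.

    A term is excluded if it is generic, rare, or semantically distant.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept, excluded = [], []
    for term in terms:
        if (term.is_generic
                or term.n_events < min_events
                or term.similarity < min_similarity):
            excluded.append(term)
        else:
            kept.append(term)
    return kept, excluded


def apply_review(kept, reviewer_excluded, reviewer_added):
    """Apply expert review: drop rejected codes, then append added terms."""
    final = [term for term in kept if term.code not in reviewer_excluded]
    final.extend(reviewer_added)
    return final


def remaining_terms(identified, auto_excluded, review_excluded, review_included):
    """Arithmetic check on term counts after each step."""
    return identified - auto_excluded - review_excluded + review_included


candidates = [
    CandidateTerm("X001", "Serum creatinine level", 5000, 0.9, False),
    CandidateTerm("X002", "Blood test", 90000, 0.6, True),       # generic
    CandidateTerm("X003", "Creatinine, rarely used term", 3, 0.8, False),  # rare
    CandidateTerm("X004", "Urine colour", 700, 0.1, False),      # distant
]
kept, excluded = auto_filter(candidates)
final_terms = apply_review(kept, reviewer_excluded=set(), reviewer_added=[])

# Counts from the abstract: 1651 identified, 1228 auto-excluded,
# 103 excluded and 44 included at clinical review.
n_final = remaining_terms(1651, 1228, 103, 44)
```

With the counts from the abstract, `remaining_terms(1651, 1228, 103, 44)` gives 364, which matches the number of terms reported for data extraction.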

Identifiers

pubmed: 33619467
doi: 10.1093/jamiaopen/ooaa047
pii: ooaa047
pmc: PMC7717266

Publication types

Journal Article

Languages

eng

Pagination

545-556

Grants

Agency: Medical Research Council
ID: MC_PC_13041
Country: United Kingdom
Agency: Medical Research Council
ID: MR/K006584/1
Country: United Kingdom

Copyright information

© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Authors

Spiros Denaxas (S)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.
The Alan Turing Institute, London, UK.
British Heart Foundation Research Accelerator, University College London, London, UK.

Anoop D Shah (AD)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.

Bilal A Mateen (BA)

The Alan Turing Institute, London, UK.
King's College Hospital, London, UK.

Valerie Kuan (V)

Health Data Research UK, University College London, London, UK.
British Heart Foundation Research Accelerator, University College London, London, UK.
Institute of Cardiovascular Science, University College London, London, UK.

Jennifer K Quint (JK)

Health Data Research UK, University College London, London, UK.
National Heart and Lung Institute, Imperial College London, London, UK.

Natalie Fitzpatrick (N)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.

Ana Torralbo (A)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.

Ghazaleh Fatemifar (G)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.

Harry Hemingway (H)

Institute of Health Informatics, University College London, London, UK.
Health Data Research UK, University College London, London, UK.
British Heart Foundation Research Accelerator, University College London, London, UK.

MeSH classifications