CriteriaMapper: establishing the automatic identification of clinical trial cohorts from electronic health records by matching normalized eligibility criteria and patient clinical characteristics.
Clinical trials
Cohort identification
Electronic healthcare records
Eligibility criteria attribute normalization
Eligibility criteria phenotyping
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
25 Oct 2024
History:
received: 04 Apr 2024
accepted: 22 Oct 2024
medline: 26 Oct 2024
pubmed: 26 Oct 2024
entrez: 25 Oct 2024
Status:
epublish
Abstract
The use of electronic health records (EHRs) holds the potential to enhance clinical trial activities. However, the identification of eligible patients within EHRs presents considerable challenges. We aimed to develop a CriteriaMapper system for phenotyping eligibility criteria, enabling the identification of patients from EHRs with clinical characteristics that match those criteria. We utilized clinical trial eligibility criteria and patient EHRs from the Mount Sinai Database. The CriteriaMapper system was developed to normalize the criteria using national standard terminologies and in-house databases, facilitating computability and queryability to bridge clinical trial criteria and EHRs. The system employed rule-based pattern recognition and manual annotation. Our system normalized 367 out of 640 unique eligibility criteria attributes, covering various medical conditions including non-small cell lung cancer, small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, Crohn's disease, non-alcoholic steatohepatitis, and sickle cell anemia. Of these, 174 attributes were encoded with standard terminologies and 193 were normalized using in-house reference tables. The agreement between automated and manual normalization was high (Cohen's Kappa = 0.82), and patient matching demonstrated a 0.94 F1 score. Our system has proven effective on EHRs from multiple institutions, showing broad applicability and promising improved clinical trial processes, better patient selection, and enhanced clinical research outcomes.
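For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal Python sketch of how Cohen's kappa and an F1 score are conventionally computed. It is not the authors' code. The function names, the label values ("std", "inhouse", "none"), and the patient identifiers are hypothetical toy data chosen for illustration, and they do not reproduce the reported values of 0.82 and 0.94.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(true_ids, predicted_ids):
    """F1 score for a predicted cohort against a reference cohort (sets of IDs)."""
    true_set, pred_set = set(true_ids), set(predicted_ids)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: how six criteria attributes were normalized
# manually versus automatically (labels are illustrative assumptions).
manual = ["std", "std", "inhouse", "inhouse", "none", "none"]
automated = ["std", "std", "inhouse", "none", "none", "none"]
kappa = cohen_kappa(manual, automated)

# Hypothetical toy cohorts: reference-eligible patients versus matched patients.
eligible = ["p1", "p2", "p3", "p4"]
matched = ["p2", "p3", "p4", "p5"]
f1 = f1_score(eligible, matched)

print(f"kappa={kappa:.2f}, F1={f1:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported for annotation comparisons, while F1 balances precision and recall of the matched cohort.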
Identifiers
pubmed: 39455879
doi: 10.1038/s41598-024-77447-x
pii: 10.1038/s41598-024-77447-x
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
25387
Copyright information
© 2024. The Author(s).