Applying computer text mining algorithms for oversampling tumor mutation status in medical records for the NCI Patterns of Care studies.

MeSH terms: United States; Humans; Carcinoma, Non-Small-Cell Lung / genetics; Lung Neoplasms / genetics; National Cancer Institute (U.S.); Anaplastic Lymphoma Kinase / genetics; ErbB Receptors / genetics; Mutation; Algorithms; Computers; Medical Records

Keywords: ALK; EGFR; Non-small cell lung cancer; Text mining algorithm; Tumor mutation

Journal

International journal of medical informatics

ISSN: 1872-8243

Abbreviated title: Int J Med Inform

Country: Ireland

NLM ID: 9711057

Publication information

Publication date: 09 2023

History:

received: 25 02 2023

revised: 29 06 2023

accepted: 16 07 2023

medline: 14 8 2023

pubmed: 22 7 2023

entrez: 22 7 2023

Status: ppublish

Abstract

BACKGROUND: The National Cancer Institute (NCI) conducts Patterns of Care (POC) studies for selected cancer sites under a congressional mandate. These studies collect treatment information beyond what the NCI's Surveillance, Epidemiology, and End Results (SEER) Program typically collects. The 2019 POC study focused on two cancer sites: non-small cell lung cancer (NSCLC) and melanoma. For the NSCLC cases, one of the primary sampling objectives was to oversample patients who tested positive for EGFR/ALK mutations, but mutation test results were not available before the study sample was selected.

METHODS: To address this, text mining algorithms were developed to screen all eligible NSCLC cases in the SEER database. These algorithms were designed to identify mutation test status, allowing for stratified sampling based on SEER registry, sex, race/ethnicity, and tumor mutation test results.

RESULTS: The final NSCLC sample included 2,434 patients aged 20+ with advanced stage (IIIB-IVB) NSCLC diagnosed in 2017 and 2018. Among this sample, 692 cases (13.2%) tested positive for EGFR/ALK mutations. An evaluation of the text mining algorithms' performance, based on cases where both algorithm results and known EGFR/ALK status from medical chart abstraction were available, showed good results: sensitivity of 77.6%, specificity of 90.8%, and overall accuracy of 84.8%.

CONCLUSIONS: Adapting text mining algorithms proved effective for oversampling patients with uncommon conditions in studies where electronic medical records are accessible. The 2019 POC study provides valuable data for researchers to evaluate cancer therapy details and patient characteristics, particularly among patients who tested positive for EGFR/ALK mutations.
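
The screening and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not publish the actual text mining rules, so the cue words, the clause-splitting heuristic, and the function names `screen_note` and `evaluate` below are all assumptions made for illustration; this is not the authors' algorithm. The sketch flags a record as positive, negative, or unknown for EGFR/ALK from free text, then computes sensitivity, specificity, and accuracy against a reference status such as chart abstraction, using only cases where the algorithm returned a result.

```python
import re

# Hypothetical cue lists: the abstract does not describe the real rules.
GENE = re.compile(r"\b(?:EGFR|ALK)\b", re.IGNORECASE)
NEGATIVE_CUE = re.compile(
    r"\b(?:negative|not detected|no mutation|wild[- ]?type)\b", re.IGNORECASE
)
POSITIVE_CUE = re.compile(
    r"\b(?:positive|detected|mutated|rearranged)\b", re.IGNORECASE
)
CLAUSE_BREAK = re.compile(r"[.;,\n]")


def screen_note(text):
    """Return 'positive', 'negative', or 'unknown' for EGFR/ALK status."""
    found_negative = False
    for clause in CLAUSE_BREAK.split(text):
        if not GENE.search(clause):
            continue  # clause does not mention either gene
        # Check negative cues first so "not detected" is not read as "detected".
        if NEGATIVE_CUE.search(clause):
            found_negative = True
        elif POSITIVE_CUE.search(clause):
            return "positive"  # any positive gene mention flags the case
    return "negative" if found_negative else "unknown"


def evaluate(predicted, truth):
    """Compare algorithm labels with reference status (True = mutation positive).

    Cases labelled 'unknown' are skipped, so the metrics cover only cases
    where both an algorithm result and a reference status are available.
    """
    tp = fp = tn = fn = 0
    for label, is_positive in zip(predicted, truth):
        if label == "unknown":
            continue
        if label == "positive":
            if is_positive:
                tp += 1
            else:
                fp += 1
        else:
            if is_positive:
                fn += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Invented example notes and reference labels, for illustration only.
    notes = [
        "EGFR mutation detected (exon 19 deletion)",
        "ALK negative by FISH; EGFR wild-type",
        "EGFR not detected, ALK rearranged",
        "PD-L1 expression 60%",
        "EGFR positive",
    ]
    reference = [True, False, True, True, False]
    labels = [screen_note(note) for note in notes]
    print(labels)
    print(evaluate(labels, reference))
```

In a design like the one the abstract describes, the predicted label would serve as one stratification variable alongside registry, sex, and race/ethnicity, so that the predicted-positive stratum can be sampled at a higher rate.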


Identifiers

DOI: 10.1016/j.ijmedinf.2023.105157

PMID: 37480595

PII: S1386-5056(23)00175-2

Chemical substances

Anaplastic Lymphoma Kinase EC 2.7.10.1

ErbB Receptors EC 2.7.10.1

Publication types

Journal Article

Languages

eng

Pagination

105157

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Benmei Liu (B)

Jennifer Stevens (J)

Gary Beverungen (G)

Michael T Halpern (MT)
