Development and validation of coding algorithms to identify patients with incident lung cancer in United States healthcare claims data.
Medicare claims
SEER registry
administrative claims
algorithm
lung cancer
machine learning
pharmacoepidemiology
positive predictive value
sensitivity
validation
Journal
Pharmacoepidemiology and drug safety
ISSN: 1099-1557
Abbreviated title: Pharmacoepidemiol Drug Saf
Country: England
NLM ID: 9208369
Publication information
Publication date:
November 2020
History:
received: 22 April 2020
revised: 1 September 2020
accepted: 9 September 2020
pubmed: 5 October 2020
medline: 25 November 2021
entrez: 4 October 2020
Status: ppublish
Abstract
Our aim was to develop and validate a practical US healthcare claims algorithm for identifying incident lung cancer that improves on the positive predictive value (PPV) and sensitivity observed in past studies. Patients newly diagnosed with lung cancer in the Surveillance, Epidemiology, and End Results (SEER) registry (the gold standard) were linked with Medicare claims. A 5% Medicare "other cancer" sample and a noncancer sample served as controls. A split-sample validation approach was used. Rules-based, regression, and machine learning models for developing algorithms were explored. Algorithms were developed in the model-building subset. Rules-based algorithms and those with the highest F scores were evaluated in the validation subset. F scores were compared across 1000 bootstrap samples. Misclassification was evaluated by calculating the odds of selection by the algorithm among true positives and true negatives. A practical single-score algorithm derived from a logistic regression model had sensitivity = 78.22% and PPV = 78.50% (F score: 78.36). The algorithm was most likely to misclassify patients who were older (aged ≥80 years), had missing data in the SEER registry, had shorter follow-up time in Medicare (<3 months), were insured through Veterans Affairs, had >1 cancer recorded in SEER, or had certain Charlson comorbidities (dementia, chronic pulmonary disease, liver disease, or myocardial infarction). In this dataset, a practical point-based algorithm for identifying incident lung cancer demonstrated significant and substantial improvement (7.9% and 23.9% absolute improvement in sensitivity and PPV, respectively) compared with a current standard.
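The reported F score of 78.36 is consistent with the harmonic mean of sensitivity (recall) and PPV (precision), i.e. the balanced F measure. The sketch below reproduces that arithmetic and illustrates one way to compare two algorithms' F scores across bootstrap resamples of patients. It is an illustration under stated assumptions, not the authors' code: the function names, the 0/1 patient-level inputs, and resampling patients with replacement are all assumptions, and any toy data passed to it is hypothetical.

```python
import random


def f_score(sensitivity: float, ppv: float) -> float:
    """Balanced F measure: harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def confusion_f(truth, predicted) -> float:
    """F score from paired 0/1 lists.

    truth: gold-standard status per patient (assumed: 1 = incident lung cancer in SEER).
    predicted: 1 if the claims algorithm flags the patient, else 0.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return f_score(sens, ppv)


def bootstrap_f_difference(truth, pred_new, pred_old, n_boot=1000, seed=0):
    """Hypothetical helper: resample patients with replacement and return
    F(new) - F(old) for each of n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(truth)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [truth[i] for i in idx]
        f_new = confusion_f(t, [pred_new[i] for i in idx])
        f_old = confusion_f(t, [pred_old[i] for i in idx])
        diffs.append(f_new - f_old)
    return diffs


# Reported operating point: sensitivity 78.22%, PPV 78.50%
print(round(100 * f_score(0.7822, 0.7850), 2))  # 78.36
```

The share of bootstrap differences above zero gives a rough sense of how consistently one algorithm outperforms the other; the abstract does not specify which summary of the 1000 bootstrap F scores the authors used.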
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1465-1479
Copyright information
© 2020 John Wiley & Sons Ltd.
References
Schulman KL, Berenson K, Tina Shih YC, et al. A checklist for ascertaining study cohorts in oncology health services research using secondary data: report of the ISPOR oncology good outcomes research practices working group. Value Health. 2013;16(4):655-669.
Uno HR, Ritzwoller DP, Cronin AM, Carroll NM, Hornbrook MC, Hassett MJ. Determining the time of cancer recurrence using claims or Electronic Medical Record data. JCO Clin Cancer Inform. 2018;2:1-10.
Abraha I, Montedori A, Serraino D, et al. Accuracy of administrative databases in detecting primary breast cancer diagnoses: a systematic review. BMJ Open. 2018;8(7):e019264.
van Walraven C, Austin P. Administrative database research has unique characteristics that can risk biased results. J Clin Epidemiol. 2012;65(2):126-131.
Chan AW, Fung K, Tran JM, et al. Application of recursive partitioning to derive and validate a claims-based algorithm for identifying keratinocyte carcinoma (nonmelanoma skin cancer). JAMA Dermatol. 2016;152(10):1122-1127.
Nordstrom BL, Simeone JC, Malley KG, et al. Validation of claims algorithms for progression to metastatic cancer in patients with breast, non-small cell lung, and colorectal cancer. Front Oncol. 2016;6:18.
Bergquist SL, Brooks GA, Keating NL, Landrum MB, Rose S. Classifying lung cancer severity with ensemble machine learning in Health Care Claims Data. Proc Mach Learn Res. 2017;68:25-38.
McClish DK, Penberthy L, Whittemore M, et al. Ability of Medicare claims data and cancer registries to identify cancer cases and treatment. Am J Epidemiol. 1997;145(3):227-233.
Cooper GS, Yuan Z, Stange KC, Amini SB, Dennis LK, Rimm AA. The utility of Medicare claims data for measuring cancer stage. Med Care. 1999;37(7):706-711.
Ramsey SD, Scoggins JF, Blough DK, McDermott CL, Reyes CM. Sensitivity of administrative claims to identify incident cases of lung cancer: a comparison of 3 health plans. J Manag Care Pharm. 2009;15(8):659-668.
Setoguchi S, Solomon DH, Glynn RJ, Cook EF, Levin R, Schneeweiss S. Agreement of diagnosis and its date for hematologic malignancies and solid tumors between Medicare claims and cancer registry data. Cancer Causes Control. 2007;18(5):561-569.
Penberthy L, McClish D, Manning C, Retchin S, Smith T. The added value of claims for cancer surveillance: results of varying case definitions. Med Care. 2005;43(7):705-712.
SEER program (Number of persons by race and Hispanic ethnicity for SEER participants [2010 Census Data]). Available at: http://seer.cancer.gov/registries/data.html. Accessed December 16, 2019.
Nattinger AB, Laud PW, Bajorunaite R, Sparapani RA, Freeman JL. An algorithm for the use of Medicare claims data to identify women with incident breast cancer. Health Serv Res. 2004;39(6 Pt 1):1733-1749.
Zhao Z, Zhang R, Cox J, Duling D, Sarle W. Massively parallel feature selection: an approach based on variance preservation. Mach Learn. 2013;92(1):195-220.
Friedman JH. Multivariate adaptive regression splines. Ann Statist. 1991;19(1):1-141.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B Methodol. 1996;58(1):267-288.
National Center for Health Statistics (NCHS) (2018). Continuous NHANES 2007-2008. Public-use data file and documentation. Available at: https://wwwn.cdc.gov/nchs/nhanes/ResponseRates.aspx#population-totals.
Howlader N, Noone AM, et al. SEER Cancer Statistics Review, 1975-2008, National Cancer Institute. Bethesda, MD. Available at: https://seer.cancer.gov/csr/1975_2008/. Based on November 2010 SEER data submission, posted to the SEER website, 2011.
Flatiron Health Inc. Electronic health record-derived database. Data on File. Accessed March 23, 2018.
Berger NA, Savvides P, Koroukian SM, et al. Cancer in the Elderly. Trans Am Clin Climatol Assoc. 2006;117:147-156.
Blair DC. Information Retrieval, 2nd ed. C.J. Van Rijsbergen. London: Butterworths; 1979: 208 pp. J Am Soc Inf Sci. 1979;30:374-375.
Nadpara P, Madhavan SS, Tworek C. Guideline-concordant timely lung cancer care and prognosis among elderly patients in the United States: a population-based study. Cancer Epidemiol. 2015;39(6):1136-1144.
Wong ML, McMurry TL, Stukenborg GJ, et al. Impact of age and comorbidity on treatment of non-small cell lung cancer recurrence following complete resection: a nationally representative cohort study. Lung Cancer. 2016;102:108-117.
Lin CC, Virgo KS. Diagnosis date agreement between SEER and medicare claims data: impact on treatment. Med Care. 2014;52(1):32-37.