Generalizability of electroencephalographic interpretation using artificial intelligence: An external validation study.
automatic EEG analysis
epilepsy
normal variants
routine EEG
sleep
Journal
Epilepsia
ISSN: 1528-1167
Abbreviated title: Epilepsia
Country: United States
NLM ID: 2983306R
Publication information
Publication date:
14 Aug 2024
History:
revised: 12 Jul 2024
received: 27 Apr 2024
accepted: 25 Jul 2024
medline: 14 Aug 2024
pubmed: 14 Aug 2024
entrez: 14 Aug 2024
Status:
aheadofprint
Abstract
The automated interpretation of clinical electroencephalograms (EEGs) using artificial intelligence (AI) holds the potential to bridge the treatment gap in resource-limited settings and reduce the workload at specialized centers. However, to facilitate broad clinical implementation, it is essential to establish generalizability across diverse patient populations and equipment. We assessed whether SCORE-AI demonstrates diagnostic accuracy comparable to that of experts when applied to a geographically different patient population, recorded with distinct EEG equipment and technical settings.
We assessed the diagnostic accuracy of a "fixed-and-frozen" AI model, using an independent dataset and external gold standard, and benchmarked it against three experts blinded to all other data. The dataset comprised 50% normal and 50% abnormal routine EEGs, with the abnormal EEGs equally distributed among the four major classes of EEG abnormalities (focal epileptiform, generalized epileptiform, focal nonepileptiform, and diffuse nonepileptiform). To assess diagnostic accuracy, we computed sensitivity, specificity, and accuracy of the AI model and the experts against the external gold standard.
We analyzed EEGs from 104 patients (64 females, median age = 38.6 [range = 16-91] years). SCORE-AI performed as well as the experts, with an overall accuracy of 92% (95% confidence interval [CI] = 90%-94%) versus 94% (95% CI = 92%-96%) for the experts. There was no significant difference between SCORE-AI and the experts for any metric or category. SCORE-AI performed well regardless of vigilance state (false classification while awake: 5/41 [12.2%], false classification during sleep: 2/11 [18.2%]; p = .63) and regardless of the presence of normal variants (false classification in presence of normal variants: 4/14 [28.6%], false classification in absence of normal variants: 3/38 [7.9%]; p = .07).
SCORE-AI achieved diagnostic performance equal to human experts in an EEG dataset independent of the development dataset, in a geographically distinct patient population, recorded with different equipment and technical settings than the development dataset.
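The accuracy measures and subgroup comparisons described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the statistical test behind its p values; a two-sided Fisher's exact test is assumed here, and under that assumption the reported false-classification counts yield p values that round to .63 and .07. The function names are hypothetical, and the confusion-matrix counts passed to `diagnostic_metrics` are invented for illustration and are not study data.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # abnormal EEGs correctly flagged
        "specificity": tn / (tn + fp),  # normal EEGs correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalized probability of k first-column events in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Integer comparison avoids floating-point ties.
    tail = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(diagnostic_metrics(tp=45, fp=10, tn=40, fn=5))

    # Counts from the abstract: (false, correct) classifications per subgroup.
    p_vigilance = fisher_exact_two_sided(5, 36, 2, 9)   # awake vs. sleep
    p_variants = fisher_exact_two_sided(4, 10, 3, 35)   # variants present vs. absent
    print(f"vigilance state: p = {p_vigilance:.2f}")    # rounds to 0.63
    print(f"normal variants: p = {p_variants:.2f}")     # rounds to 0.07
```

The exact test is a natural choice for subgroups this small (for example, 11 sleep recordings), where a chi-square approximation would be unreliable. The confidence intervals in the abstract are not reproduced here because the abstract does not describe how they were computed.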
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: CIHR
ID: PJT-175056
Country: Canada
Agency: Duke University
ID: Start-up funding
Copyright information
© 2024 The Author(s). Epilepsia published by Wiley Periodicals LLC on behalf of International League Against Epilepsy.