Generalizability of electroencephalographic interpretation using artificial intelligence: An external validation study.

Keywords: automatic EEG analysis; epilepsy; normal variants; routine EEG; sleep

Journal

Epilepsia

ISSN: 1528-1167

Abbreviated title: Epilepsia

Country: United States

NLM ID: 2983306R

Publication information

Publication date: 14 Aug 2024

History:

received: 27 Apr 2024

revised: 12 Jul 2024

accepted: 25 Jul 2024

medline: 14 Aug 2024

pubmed: 14 Aug 2024

entrez: 14 Aug 2024

Status: ahead of print

Abstract

The automated interpretation of clinical electroencephalograms (EEGs) using artificial intelligence (AI) holds the potential to bridge the treatment gap in resource-limited settings and reduce the workload at specialized centers. However, to facilitate broad clinical implementation, it is essential to establish generalizability across diverse patient populations and equipment. We assessed whether SCORE-AI demonstrates diagnostic accuracy comparable to that of experts when applied to a geographically different patient population, recorded with distinct EEG equipment and technical settings.

We assessed the diagnostic accuracy of a "fixed-and-frozen" AI model, using an independent dataset and external gold standard, and benchmarked it against three experts blinded to all other data. The dataset comprised 50% normal and 50% abnormal routine EEGs, equally distributed among the four major classes of EEG abnormalities (focal epileptiform, generalized epileptiform, focal nonepileptiform, and diffuse nonepileptiform). To assess diagnostic accuracy, we computed sensitivity, specificity, and accuracy of the AI model and the experts against the external gold standard.

We analyzed EEGs from 104 patients (64 females, median age = 38.6 [range = 16-91] years). SCORE-AI performed equally well compared to the experts, with an overall accuracy of 92% (95% confidence interval [CI] = 90%-94%) versus 94% (95% CI = 92%-96%). There was no significant difference between SCORE-AI and the experts for any metric or category. SCORE-AI performed well independently of the vigilance state (false classification during awake: 5/41 [12.2%], false classification during sleep: 2/11 [18.2%]; p = .63) and normal variants (false classification in presence of normal variants: 4/14 [28.6%], false classification in absence of normal variants: 3/38 [7.9%]; p = .07).

SCORE-AI achieved diagnostic performance equal to human experts in an EEG dataset independent of the development dataset, in a geographically distinct patient population, recorded with different equipment and technical settings than the development dataset.
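
The two subgroup comparisons reported in the abstract (awake versus sleep, and normal variants present versus absent) can be checked from the stated counts. The abstract does not name the statistical test, so the sketch below assumes a two-sided Fisher's exact test; the function name `fisher_exact_two_sided` and the table layout are illustrative choices, not taken from the paper. Under that assumption, the computed values round to the reported p = .63 and p = .07.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two subgroups; columns are (false classification, correct).
    Assumption: this is a plausible test for the abstract's p-values; the
    abstract itself does not say which test the authors used.
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # total number of false classifications
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Unnormalized hypergeometric weights, kept as integers so the
    # "as extreme or more extreme" comparison is exact.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / total


# Counts taken from the abstract: false classifications / recordings.
p_vigilance = fisher_exact_two_sided(5, 41 - 5, 2, 11 - 2)   # awake vs. sleep
p_variants = fisher_exact_two_sided(4, 14 - 4, 3, 38 - 3)    # variants present vs. absent

print(f"vigilance state: p = {p_vigilance:.2f}")
print(f"normal variants: p = {p_variants:.2f}")
```

Integer arithmetic is used for the hypergeometric weights so that ties between table probabilities are resolved exactly rather than through a floating-point tolerance.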

Identifiers

DOI: 10.1111/epi.18082 PMID: 39141002

Publication types

Journal Article

Languages

English

Grants

Agency: CIHR

ID: PJT-175056

Country: Canada

Agency: Duke University

ID: Start-up funding

Authors

Daniel Mansilla

Jesper Tveit

Harald Aurlien

Tamir Avigdor

Victoria Ros-Castello

Alyssa Ho

Chifaou Abdallah

Jean Gotman

Sándor Beniczky

Birgit Frauscher
