Exploring scoring methods for research studies: Accuracy and variability of visual and automated sleep scoring.

Algorithms; Healthy Volunteers; Humans; Male; Observer Variation; Polysomnography / methods; Reproducibility of Results; Research Design / standards; Retrospective Studies; Sleep / physiology

automatic scoring; large datasets; scoring variability; visual scoring

Journal

Journal of Sleep Research

ISSN: 1365-2869

Abbreviated title: J Sleep Res

Country: England

NLM ID: 9214441

Publication information

Publication date:
10 2020

History:

received: 15 08 2019

revised: 20 01 2020

accepted: 20 01 2020

pubmed: 19 2 2020

medline: 9 1 2021

entrez: 19 2 2020

Status: ppublish

Abstract

Sleep studies face new challenges in terms of data, objectives and metrics. This requires reappraising the adequacy of existing analysis methods, including scoring methods. Visual and automatic sleep scoring of healthy individuals were compared in terms of reliability (i.e., accuracy and stability) to find a scoring method capable of giving access to the actual data variability without adding exogenous variability. A first dataset (DS1, four recordings) scored by six experts plus an autoscoring algorithm was used to characterize inter-scoring variability. A second dataset (DS2, 88 recordings) scored a few weeks later was used to explore intra-expert variability. Percentage agreements and Conger's kappa were derived from epoch-by-epoch comparisons on pairwise and consensus scorings. On DS1, the proportion of epochs of agreement decreased as the number of experts increased, from 86% (pairwise) to 69% (all experts). Adding autoscoring to visual scorings changed the kappa value from 0.81 to 0.79. Agreement between expert consensus and autoscoring was 93%. On DS2, the hypothesis of intra-expert variability was supported by a systematic decrease between datasets in the kappa scores between each single expert and the autoscoring used as reference (from 0.75 to 0.70). Although visual scoring induces inter- and intra-expert variability, autoscoring methods can cope with intra-scorer variability, making them a sensible option to reduce exogenous variability and give access to the endogenous variability in the data.
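As an illustration of the agreement metrics named in the abstract, the following is a minimal sketch, not the authors' code. It assumes each scoring is a list of stage labels, one per epoch, with all scorings covering the same epochs. The function names and the toy hypnograms are hypothetical. The kappa follows the usual formulation of Conger's multi-rater kappa: mean pairwise observed agreement, corrected by the mean pairwise chance agreement computed from each rater's own stage proportions.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of epochs on which two scorings assign the same stage."""
    if len(a) != len(b):
        raise ValueError("scorings must cover the same epochs")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def full_agreement(scorings):
    """Fraction of epochs on which every scoring assigns the same stage."""
    n_epochs = len(scorings[0])
    return sum(len(set(epoch)) == 1 for epoch in zip(*scorings)) / n_epochs


def conger_kappa(scorings):
    """Multi-rater kappa from pairwise observed and chance agreement."""
    n_epochs = len(scorings[0])
    pairs = list(combinations(range(len(scorings)), 2))
    # Per-rater marginal proportion of each stage.
    marginals = [
        {stage: count / n_epochs for stage, count in Counter(s).items()}
        for s in scorings
    ]
    # Observed agreement: mean over all rater pairs.
    p_obs = sum(
        percent_agreement(scorings[g], scorings[h]) for g, h in pairs
    ) / len(pairs)
    # Chance agreement: mean over pairs of the product of marginals.
    p_exp = sum(
        sum(p * marginals[h].get(stage, 0.0) for stage, p in marginals[g].items())
        for g, h in pairs
    ) / len(pairs)
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data (hypothetical): two visual scorings and one automatic scoring.
expert_a = ["W", "W", "N1", "N2", "N2", "N3", "N3", "N2", "R", "R"]
expert_b = ["W", "N1", "N1", "N2", "N2", "N3", "N2", "N2", "R", "R"]
auto = ["W", "W", "N1", "N2", "N2", "N3", "N3", "N2", "R", "W"]

if __name__ == "__main__":
    print(percent_agreement(expert_a, expert_b))
    print(full_agreement([expert_a, expert_b, auto]))
    print(conger_kappa([expert_a, expert_b, auto]))
```

With only two scorings this kappa reduces to Cohen's kappa. Requiring agreement across all scorings can only lower the agreement rate relative to any single pair, which is consistent with the drop from pairwise to all-expert agreement reported on DS1.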

Identifiers

DOI: 10.1111/jsr.12994 PMID: 32067298

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e12994

Authors

Christian Berthomier (C)

Vincenzo Muto (V)

Christina Schmidt (C)

Gilles Vandewalle (G)

Mathieu Jaspar (M)

Jonathan Devillers (J)

Giulia Gaggioni (G)

Sarah L Chellappa (SL)

Christelle Meyer (C)

Christophe Phillips (C)

Eric Salmon (E)

Pierre Berthomier (P)

Jacques Prado (J)

Odile Benoit (O)

Romain Bouet (R)

Marie Brandewinder (M)

Jérémie Mattout (J)

Pierre Maquet (P)
