Exploring scoring methods for research studies: Accuracy and variability of visual and automated sleep scoring.


Journal

Journal of sleep research
ISSN: 1365-2869
Abbreviated title: J Sleep Res
Country: England
NLM ID: 9214441

Publication information

Publication date:
October 2020
History:
received: 15 August 2019
revised: 20 January 2020
accepted: 20 January 2020
pubmed: 19 February 2020
medline: 9 January 2021
entrez: 19 February 2020
Status: ppublish

Abstract

Sleep studies face new challenges in terms of data, objectives and metrics, which calls for reappraising the adequacy of existing analysis methods, including scoring methods. Visual and automatic sleep scoring of healthy individuals were compared in terms of reliability (i.e., accuracy and stability), with the aim of finding a scoring method that gives access to the actual variability of the data without adding exogenous variability. A first dataset (DS1, four recordings), scored by six experts and by an autoscoring algorithm, was used to characterize inter-scoring variability. A second dataset (DS2, 88 recordings), scored a few weeks later, was used to explore intra-expert variability. Percentage agreement and Conger's kappa were derived from epoch-by-epoch comparisons of pairwise and consensus scorings. On DS1, the proportion of epochs on which the scorers agreed decreased as the number of experts increased, from 86% (pairwise) to 69% (all experts). Adding autoscoring to the visual scorings changed kappa from 0.81 to 0.79. Agreement between the expert consensus and autoscoring was 93%. On DS2, the hypothesis of intra-expert variability was supported by a systematic decrease between datasets in the kappa between each individual expert and the autoscoring used as reference (0.75 to 0.70). Although visual scoring induces inter- and intra-expert variability, autoscoring methods can cope with intra-scorer variability, making them a sensible option for reducing exogenous variability and giving access to the endogenous variability in the data.
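
The abstract reports epoch-by-epoch percentage agreement, both pairwise and across all experts, as well as agreement between an expert consensus and the autoscoring. The following is a minimal sketch of how such figures can be computed, assuming each scoring is an equal-length list of stage labels, one label per epoch. The function names, the stage labels and the strict-majority consensus rule are illustrative assumptions; the abstract does not state how the authors defined their consensus.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence

Stage = Hashable  # hypothetical labels, e.g. "W", "N1", "N2", "N3", "R"


def _check_lengths(scorings: Sequence[Sequence[Stage]]) -> int:
    """Return the shared number of epochs, or raise if the scorings differ in length."""
    lengths = {len(s) for s in scorings}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all scorings must be non-empty and of equal length")
    return lengths.pop()


def pairwise_agreement(a: Sequence[Stage], b: Sequence[Stage]) -> float:
    """Fraction of epochs on which two scorings assign the same stage."""
    n = _check_lengths([a, b])
    return sum(x == y for x, y in zip(a, b)) / n


def all_scorer_agreement(scorings: Sequence[Sequence[Stage]]) -> float:
    """Fraction of epochs on which every scorer assigns the same stage."""
    n = _check_lengths(scorings)
    return sum(len(set(epoch)) == 1 for epoch in zip(*scorings)) / n


def majority_consensus(scorings: Sequence[Sequence[Stage]]) -> List[Optional[Stage]]:
    """Per-epoch stage chosen by a strict majority of scorers, else None.

    Assumed, illustrative consensus rule; not taken from the paper.
    """
    _check_lengths(scorings)
    m = len(scorings)
    consensus: List[Optional[Stage]] = []
    for epoch in zip(*scorings):
        stage, count = Counter(epoch).most_common(1)[0]
        consensus.append(stage if count > m / 2 else None)
    return consensus


def consensus_agreement(consensus: Sequence[Optional[Stage]],
                        auto: Sequence[Stage]) -> float:
    """Agreement with an automatic scoring, restricted to epochs that have a consensus."""
    _check_lengths([consensus, auto])
    kept = [(c, x) for c, x in zip(consensus, auto) if c is not None]
    if not kept:
        raise ValueError("no epoch has a consensus stage")
    return sum(c == x for c, x in kept) / len(kept)


if __name__ == "__main__":
    # Toy five-epoch scorings from three hypothetical experts and one autoscorer.
    expert_a = ["W", "N1", "N2", "N2", "R"]
    expert_b = ["W", "N2", "N2", "N2", "R"]
    expert_c = ["W", "N1", "N2", "N3", "R"]
    auto = ["W", "N2", "N2", "N2", "R"]

    print(pairwise_agreement(expert_a, expert_b))                    # 0.8
    print(all_scorer_agreement([expert_a, expert_b, expert_c]))      # 0.6
    consensus = majority_consensus([expert_a, expert_b, expert_c])
    print(consensus_agreement(consensus, auto))                      # 0.8
```

Restricting the consensus comparison to epochs with a majority is one design choice among several; an alternative is to keep only epochs on which all experts agree, which is what `all_scorer_agreement` counts.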
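
Conger's (1980) kappa extends Cohen's kappa to more than two raters while keeping the chance-corrected form kappa = (P_o - P_e) / (1 - P_e). Under the usual definition, P_o is the mean over scorer pairs of their epoch-by-epoch agreement and P_e is the mean over scorer pairs of the summed products of their marginal stage proportions; with two scorers it reduces to Cohen's kappa. The sketch below assumes every scorer rated every epoch (no missing epochs) and uses hypothetical stage labels; it is an illustration of the statistic, not the authors' analysis code.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def conger_kappa(scorings: Sequence[Sequence[Hashable]]) -> float:
    """Conger's multi-rater kappa for scorers who all rated every epoch."""
    m = len(scorings)
    if m < 2:
        raise ValueError("need at least two scorings")
    n = len(scorings[0])
    if n == 0 or any(len(s) != n for s in scorings):
        raise ValueError("all scorings must be non-empty and of equal length")

    pairs = list(combinations(range(m), 2))
    stages = {stage for s in scorings for stage in s}
    # Per-scorer stage counts; Counter returns 0 for stages a scorer never used.
    counts = [Counter(s) for s in scorings]

    # Observed agreement: mean over scorer pairs of the fraction of matching epochs.
    p_o = sum(
        sum(x == y for x, y in zip(scorings[i], scorings[j])) / n for i, j in pairs
    ) / len(pairs)
    # Chance agreement: mean over scorer pairs of sum_k p_ik * p_jk.
    p_e = sum(
        sum(counts[i][k] * counts[j][k] for k in stages) / n**2 for i, j in pairs
    ) / len(pairs)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a = ["W", "N1", "N2", "N2", "R"]
    b = ["W", "N2", "N2", "N2", "R"]
    c = ["W", "N1", "N2", "N3", "R"]
    print(round(conger_kappa([a, b]), 3))     # 0.706 (equals Cohen's kappa for two scorers)
    print(round(conger_kappa([a, b, c]), 3))  # 0.649
```

Averaging chance agreement over rater pairs, rather than pooling all ratings into one marginal distribution, is what distinguishes Conger's kappa from Fleiss' kappa.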

Identifiers

pubmed: 32067298
doi: 10.1111/jsr.12994

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e12994

Copyright information

© 2020 European Sleep Research Society.

Authors

Vincenzo Muto (V)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Walloon Excellence in Life Sciences and Biotechnology (WELBIO), Liège, Belgium.
Psychology and Cognitive Neuroscience Research Unit, University of Liège, Liège, Belgium.

Christina Schmidt (C)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Psychology and Cognitive Neuroscience Research Unit, University of Liège, Liège, Belgium.

Gilles Vandewalle (G)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.

Mathieu Jaspar (M)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Walloon Excellence in Life Sciences and Biotechnology (WELBIO), Liège, Belgium.
Psychology and Cognitive Neuroscience Research Unit, University of Liège, Liège, Belgium.

Jonathan Devillers (J)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Walloon Excellence in Life Sciences and Biotechnology (WELBIO), Liège, Belgium.

Giulia Gaggioni (G)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.

Sarah L Chellappa (SL)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.

Christelle Meyer (C)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Walloon Excellence in Life Sciences and Biotechnology (WELBIO), Liège, Belgium.

Christophe Phillips (C)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium.

Eric Salmon (E)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.

Pierre Berthomier (P)

PHYSIP, Paris, France.

Jacques Prado (J)

PHYSIP, Paris, France.

Odile Benoit (O)

PHYSIP, Paris, France.

Romain Bouet (R)

Lyon Neuroscience Research Center, INSERM U1028, CNRS UMR 5292, University of Lyon 1, Lyon, France.

Marie Brandewinder (M)

PHYSIP, Paris, France.

Jérémie Mattout (J)

Lyon Neuroscience Research Center, INSERM U1028, CNRS UMR 5292, University of Lyon 1, Lyon, France.

Pierre Maquet (P)

GIGA-Cyclotron Research Centre-In vivo Imaging, University of Liège, Liège, Belgium.
Walloon Excellence in Life Sciences and Biotechnology (WELBIO), Liège, Belgium.
Department of Neurology, CHU Liège, Liège, Belgium.
