GlottisNetV2: Temporal Glottal Midline Detection Using Deep Convolutional Neural Networks.
Laryngeal endoscopy
biomedical imaging
deep learning
deep neural networks
glottis
midline
Journal
IEEE Journal of Translational Engineering in Health and Medicine
ISSN: 2168-2372
Abbreviated title: IEEE J Transl Eng Health Med
Country: United States
NLM ID: 101623153
Publication information
Publication date: 2023
History:
received: 2022-08-01
revised: 2022-11-27
accepted: 2023-01-04
entrez: 2023-02-23
pubmed: 2023-02-24
medline: 2023-03-03
Status: epublish
Abstract
High-speed videoendoscopy is a major tool in quantitative laryngology. Glottis segmentation and glottal midline detection are crucial for computing vocal fold-specific quantitative parameters; however, fully automated solutions show limited clinical applicability, and unbiased glottal midline detection in particular remains a challenging problem. We developed a multitask deep neural network for glottis segmentation and glottal midline detection, using techniques from pose estimation to locate the anterior and posterior points in endoscopy images. Neural networks were set up in TensorFlow/Keras and trained and evaluated on the BAGLS dataset. We found that a dual-decoder deep neural network, termed GlottisNetV2, outperforms the previously proposed GlottisNet in terms of MAPE on the test dataset (1.85% vs. 6.3%) while converging faster. Various hyperparameter tunings allow fast and directed training. Using temporally variant data from an additional dataset designed for this task, the median prediction error improves from 2.1% to 1.76% when using 12 consecutive frames and additional temporal filtering. We found that temporal glottal midline detection using a dual-decoder architecture together with keypoint estimation allows accurate midline prediction. We show that our proposed architecture yields stable and reliable glottal midline predictions, ready for clinical use and for the analysis of symmetry measures.
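As a hedged illustration (not the authors' code), the temporal filtering described in the abstract — pooling per-frame midline keypoint predictions over a window of consecutive frames — can be sketched in plain Python. The window size of 12 matches the abstract; the function and variable names are hypothetical, and a median filter is one plausible choice of temporal smoother.

```python
from statistics import median

def smooth_midline(points, window=12):
    """Median-filter per-frame midline keypoints over a sliding window.

    points: list of (x, y) tuples, one predicted anterior (or posterior)
    keypoint per video frame. Returns a list of smoothed keypoints of the
    same length; near the sequence edges the window is truncated.
    """
    smoothed = []
    half = window // 2
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((median(xs), median(ys)))
    return smoothed

# A single outlier frame is suppressed by the temporal median:
track = [(100.0, 50.0)] * 6 + [(140.0, 90.0)] + [(100.0, 50.0)] * 6
print(smooth_midline(track)[6])  # -> (100.0, 50.0)
```

A median (rather than a mean) is used here because it discards isolated mispredictions — e.g. frames where the glottis is nearly closed and the keypoints are ambiguous — without blurring the stable estimates around them.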
Identifiers
pubmed: 36816097
doi: 10.1109/JTEHM.2023.3237859
pmc: PMC9933989
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
137-144