An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks.
Journal
Computational and Mathematical Methods in Medicine
ISSN: 1748-6718
Abbreviated title: Comput Math Methods Med
Country: United States
NLM ID: 101277751
Publication information
Publication date: 2022
History:
Received: 2022-01-21
Revised: 2022-02-17
Accepted: 2022-03-07
Entrez: 2022-05-09
PubMed: 2022-05-10
MEDLINE: 2022-05-11
Status: epublish
Abstract
Diseases of internal organs other than the vocal folds can also affect a person's voice. As a result, voice disorders are on the rise, even though they are frequently overlooked. According to a recent study, voice pathology detection systems can effectively support the assessment of voice abnormalities and enable the early diagnosis of voice pathology. In particular, automatic systems for distinguishing healthy from pathological voices have received much attention for the early identification and diagnosis of voice problems. Artificial intelligence-assisted voice analysis therefore opens up new possibilities in healthcare. This work aimed to assess the utility of several automatic speech signal analysis methods for diagnosing voice disorders and to propose a strategy for classifying healthy and pathological voices. The proposed framework integrates three voice characteristics: chroma, mel spectrogram, and mel frequency cepstral coefficients (MFCC). We also designed a deep neural network (DNN) capable of learning from the extracted features and producing a highly accurate voice-based disease prediction model. The paper describes a series of experiments using the Saarbruecken Voice Database (SVD) to detect pathological voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced at high, low, and normal pitch. We also held out the "continuous sentence" audio files from SVD to evaluate how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain. Additionally, the model attains an accuracy of 88.01% when speaker gender information is integrated. A model trained on selected diseases can reach a maximum accuracy of 96.77% (cordectomy vs. healthy). The suggested framework is therefore well suited to healthcare applications.
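As a rough illustration of the pipeline the abstract describes (chroma, mel spectrogram, and MFCC features extracted from SVD recordings and classified by a deep neural network), the sketch below uses librosa for feature extraction and Keras for the classifier. It is not the authors' released code: the layer widths, dropout rates, 40-coefficient MFCC setting, and file names are illustrative assumptions, and the demo block uses synthetic data in place of the SVD audio.

```python
# Minimal sketch of a chroma + mel spectrogram + MFCC feature pipeline with a small DNN.
# All hyperparameters and paths are assumptions, not the published configuration.
import numpy as np
import librosa
import tensorflow as tf


def extract_features(wav_path: str) -> np.ndarray:
    """Load one recording and return a fixed-length feature vector."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)            # 40 coefficients (assumed)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)              # 12 chroma bins
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # 128 mel bands
    # Average each feature over time so every file yields one vector (40 + 12 + 128 = 180 dims).
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1), mel.mean(axis=1)])


def build_dnn(input_dim: int) -> tf.keras.Model:
    """Small fully connected classifier: healthy (0) vs. pathological (1)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Synthetic demo data so the sketch runs without the SVD audio;
    # in practice X would come from extract_features() over the SVD files.
    X = np.random.randn(64, 180).astype("float32")
    y = np.random.randint(0, 2, size=64).astype("float32")
    model = build_dnn(X.shape[1])
    model.fit(X, y, epochs=5, batch_size=16, validation_split=0.25, verbose=0)
```

Averaging each feature over time is one simple way to obtain a fixed-length input per recording; gender information, as mentioned in the abstract, could be appended to this vector as an extra input dimension.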
Identifiers
pubmed: 35529259
doi: 10.1155/2022/7814952
pmc: PMC9071878
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
7814952
Copyright information
Copyright © 2022 Mohammed Zakariah et al.
Conflict of interest statement
The authors declare that there is no conflict of interest regarding the publication of this paper.