The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection.

MFCC; Pathology detection; SVM; Speech analysis; Voice pathology

Journal

Journal of voice : official journal of the Voice Foundation
ISSN: 1873-4588
Abbreviated title: J Voice
Country: United States
ID NLM: 8712262

Publication information

Publication date:
27 Apr 2022
History:
received: 26 01 2022
accepted: 21 03 2022
entrez: 30 4 2022
pubmed: 1 5 2022
medline: 1 5 2022
Status: ahead of print

Abstract

Automatic voice pathology detection is a research topic that has gained increasing interest recently. Although methods based on deep learning are becoming popular, classical pipeline systems based on a two-stage architecture, consisting of a feature extraction stage and a classifier stage, are still widely used. In these classical detection systems, frame-wise computation of mel-frequency cepstral coefficients (MFCCs) is the most popular feature extraction method. However, no systematic study has been conducted to investigate the effect of the MFCC frame length on automatic voice pathology detection. In this work, we studied the effect of the MFCC frame length in voice pathology detection using three disorders (hyperkinetic dysphonia, hypokinetic dysphonia and reflux laryngitis) from the Saarbrücken Voice Disorders (SVD) database. Detection performance was compared between speaker-dependent and speaker-independent scenarios, as well as between speaking-task-dependent and speaking-task-independent scenarios. The support vector machine (SVM), the most widely used classifier in this research area, was used as the classifier. The results show that detection accuracy depended on the MFCC frame length in all the scenarios studied. The best detection accuracy was obtained using an MFCC frame length of 500 ms with a shift of 5 ms.
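To make the frame length and frame shift concrete, the following Python sketch converts both from milliseconds to samples and splits a signal into overlapping frames, which is the first step of frame-wise MFCC computation. It is not the authors' implementation: the function names are hypothetical, framing without zero padding is an assumption, and the 50 kHz sampling rate in the example is an illustrative assumption. The per-frame MFCC steps that follow framing (windowing, FFT, mel filterbank, log compression, DCT) are omitted.

```python
"""Sketch: frame-wise segmentation for MFCC analysis (illustrative only)."""


def frame_params(sample_rate_hz, frame_ms, shift_ms):
    """Convert a frame length and a frame shift from milliseconds to samples."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop = int(round(sample_rate_hz * shift_ms / 1000.0))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame length and shift must each be at least one sample")
    return frame_len, hop


def num_frames(num_samples, frame_len, hop):
    """Number of complete frames that fit in the signal (no zero padding)."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


def split_into_frames(signal, frame_len, hop):
    """Return overlapping frames; frame i starts at sample i * hop."""
    n = num_frames(len(signal), frame_len, hop)
    return [signal[i * hop : i * hop + frame_len] for i in range(n)]


if __name__ == "__main__":
    fs = 50_000  # assumed sampling rate (Hz), for illustration only
    one_second = [0.0] * fs  # placeholder 1 s signal
    for frame_ms in (25, 500):
        frame_len, hop = frame_params(fs, frame_ms, shift_ms=5)
        frames = split_into_frames(one_second, frame_len, hop)
        print(frame_ms, "ms:", frame_len, "samples/frame,", len(frames), "frames")
    # prints:
    # 25 ms: 1250 samples/frame, 196 frames
    # 500 ms: 25000 samples/frame, 101 frames
```

With a 5 ms shift, consecutive 500 ms frames overlap by 495 ms, so a 1 s signal still yields 101 frames, each spanning a much longer stretch of speech than a conventional 25 ms frame.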

Identifiers

pubmed: 35490081
pii: S0892-1997(22)00087-X
doi: 10.1016/j.jvoice.2022.03.021

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.

Authors

Saska Tirronen (S)

Department of Signal Processing and Acoustics, Aalto University, Finland.

Sudarsana Reddy Kadiri (SR)

Department of Signal Processing and Acoustics, Aalto University, Finland. Electronic address: sudarsana.kadiri@aalto.fi.

Paavo Alku (P)

Department of Signal Processing and Acoustics, Aalto University, Finland.

MeSH classifications