Overlapped speech detection using phase features.
Journal
The Journal of the Acoustical Society of America
ISSN: 1520-8524
Abbreviated title: J Acoust Soc Am
Country: United States
NLM ID: 7503051
Publication information
Publication date:
10 2021
History:
entrez: 31 10 2021
pubmed: 1 11 2021
medline: 16 11 2021
Status:
ppublish
Abstract
Simultaneous speech from multiple speakers is known as overlapped speech, and it causes problems for speech recognition and speaker diarization systems. The present work applies signal phase information, which has seen comparatively little use, to the task of overlapped speech detection. In this context, Instantaneous Frequency Cosine Coefficient (IFCC) and Modified Group Delay Cepstral Coefficient (MGDCC) features are explored. IFCC captures the time-varying characteristics of the phase, while MGDCC represents the frequency-varying information of the phase spectrum. A classifier based on a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) is used for the classification. The present work uses synthetically generated overlapped speech from the GRID corpus. The proposed method is benchmarked against three baseline approaches that use magnitude spectrum features. The combination of IFCC and MGDCC features with the CNN-LSTM classifier is observed to perform better than the baselines. Combining the phase features with magnitude-based Mel-frequency cepstral coefficient (MFCC) features provides the best performance, indicating that the two kinds of features carry complementary information. The present study also investigates the effect of segment duration, speaker gender, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is also evaluated on real overlapped data from the AMI corpus.
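The abstract states that the overlapped speech is generated synthetically but does not give the mixing procedure. As a rough illustration only, the sketch below shows one common way such data could be produced: two single-speaker signals are truncated to a common length and summed after scaling the second one to a chosen energy ratio. The function name `mix_overlap`, the `ratio_db` parameter, and the energy-ratio scaling are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def mix_overlap(a, b, ratio_db=0.0):
    """Return a + gain * b, where gain sets the a-to-b energy ratio.

    Hypothetical helper for illustration; the paper's actual mixing
    procedure is not described in the abstract.
    """
    n = min(len(a), len(b))                      # truncate to common length
    a = np.asarray(a[:n], dtype=np.float64)
    b = np.asarray(b[:n], dtype=np.float64)
    power_a = np.mean(a ** 2)
    power_b = np.mean(b ** 2)
    # Gain that makes power_a / (gain**2 * power_b) equal 10**(ratio_db / 10).
    gain = np.sqrt(power_a / (power_b * 10.0 ** (ratio_db / 10.0) + 1e-12))
    return a + gain * b

# Stand-in signals (sine tones) in place of real single-speaker utterances.
t = np.arange(16000) / 16000.0
speaker_a = 0.1 * np.sin(2 * np.pi * 220.0 * t)
speaker_b = 0.05 * np.sin(2 * np.pi * 330.0 * t[:12000])
overlapped = mix_overlap(speaker_a, speaker_b, ratio_db=0.0)
```

With `ratio_db=0.0` both speakers contribute equal energy to the mixture; segments produced this way would be labeled as overlapped, and unmixed segments as single-speaker, before feature extraction.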
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM