Overlapped speech detection using phase features.

Female; Humans; Male; Memory, Long-Term; Neural Networks, Computer; Speech

Journal

The Journal of the Acoustical Society of America

ISSN: 1520-8524

Abbreviated title: J Acoust Soc Am

Country: United States

NLM ID: 7503051

Publication information

Publication date: October 2021

History:

Entrez: 31 Oct 2021

PubMed: 1 Nov 2021

MEDLINE: 16 Nov 2021

Status: ppublish

Abstract

Simultaneous speech from multiple speakers, known as overlapped speech, causes problems for speech recognition and speaker diarization systems. This work applies signal phase information, which has been less utilized in previous work, to the task of overlapped speech detection. Two phase-based features are explored: Instantaneous Frequency Cosine Coefficients (IFCC) and Modified Group Delay Cepstral Coefficients (MGDCC). IFCC captures the time-varying characteristics of the phase, while MGDCC represents the frequency-varying information of the phase spectrum. A classifier based on a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) performs the classification. Synthetically generated overlapped speech from the GRID corpus is used. The proposed method is benchmarked against three baseline approaches that use magnitude-spectrum features. The combination of IFCC and MGDCC features with the CNN-LSTM classifier outperforms the baselines. Combining the phase features with the magnitude-based MFCC feature gives the best performance, indicating that phase and magnitude carry complementary information. The study also investigates the effect of segment duration, speaker gender, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is evaluated on real overlapped speech from the AMI corpus.
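The MGDCC feature named in the abstract is commonly built on the modified group delay function. One widely used formulation computes, per windowed frame x[n], the spectra X(w) of x[n] and Y(w) of n·x[n], then tau(w) = (X_R Y_R + X_I Y_I) / S(w)^(2·gamma), where S(w) is a cepstrally smoothed magnitude spectrum, and finally tau_m(w) = sign(tau)·|tau|^alpha; cepstral coefficients are taken with a DCT. The sketch below follows that formulation under assumed settings (alpha = 0.4, gamma = 0.9, a 30-coefficient lifter, a 512-point FFT, 13 output coefficients). These values and the helper names are hypothetical illustrations, not the configuration reported by the authors.

```python
import numpy as np

# Illustrative sketch of modified group delay cepstral coefficients.
# Helper names and default parameters are ASSUMPTIONS, not the paper's settings.

def smoothed_magnitude(frame, n_fft=512, lifter=30):
    """Cepstrally smoothed magnitude spectrum S(w) of one frame."""
    log_mag = np.log(np.abs(np.fft.rfft(frame, n_fft)) + 1e-10)
    ceps = np.fft.irfft(log_mag, n_fft)        # real cepstrum, length n_fft
    ceps[lifter:n_fft - lifter + 1] = 0.0      # symmetric lifter: keep low quefrencies
    return np.exp(np.fft.rfft(ceps, n_fft).real)


def modified_group_delay(frame, alpha=0.4, gamma=0.9, n_fft=512, lifter=30):
    """Modified group delay tau_m(w) for one windowed frame."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)              # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)          # spectrum of n * x[n]
    S = smoothed_magnitude(frame, n_fft, lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(tau) * np.abs(tau) ** alpha  # compress the dynamic range


def dct2_ortho(x):
    """Orthonormal DCT-II written with NumPy only."""
    N = len(x)
    k = np.arange(N)[:, None]
    m = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * N))
    out = np.sqrt(2.0 / N) * (basis @ x)
    out[0] /= np.sqrt(2.0)
    return out


def mgdcc(frame, n_coeffs=13, **kwargs):
    """First n_coeffs DCT coefficients of the modified group delay."""
    return dct2_ortho(modified_group_delay(frame, **kwargs))[:n_coeffs]


# Usage on a synthetic 400-sample Hamming-windowed noise frame.
frame = np.hamming(400) * np.random.default_rng(0).standard_normal(400)
print(mgdcc(frame).shape)  # -> (13,)
```

Using the smoothed spectrum S(w) instead of the raw |X(w)| in the denominator is the usual way to suppress the spikes that zeros close to the unit circle produce in the plain group delay; alpha and gamma further control its dynamic range.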

Identifiers

DOI: 10.1121/10.0006614

PMID: 34717446


Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

English

Pagination

2770

Authors

Shikha Baghel

S R Mahadeva Prasanna

Prithwijit Guha

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Effectiveness of Virtual Yoga for Chronic Low Back Pain: A Randomized Clinical Trial.

MeSH classifications