Overlapped speech detection using phase features.


Journal

The Journal of the Acoustical Society of America
ISSN: 1520-8524
Abbreviated title: J Acoust Soc Am
Country: United States
NLM ID: 7503051

Publication information

Publication date:
10 2021
History:
entrez: 31 10 2021
pubmed: 1 11 2021
medline: 16 11 2021
Status: ppublish

Abstract

Simultaneous speech of multiple speakers is known as overlapped speech, and it causes problems for speech recognition and speaker diarization systems. The present work applies signal phase information, which has previously seen little use, to the task of overlapped speech detection. In this context, Instantaneous Frequency Cosine Coefficient (IFCC) and Modified Group Delay Cepstral Coefficient (MGDCC) features are explored. IFCC captures the time-varying phase characteristics, while MGDCC represents the frequency-varying information of the phase spectrum. A classifier based on a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) performs the classification. The present work uses synthetically generated overlapped speech from the GRID corpus. The proposed method is benchmarked against three baseline approaches that use magnitude spectrum features. The combination of IFCC and MGDCC features with the CNN-LSTM classifier is observed to perform better than the baselines. Combining the phase features with magnitude-based MFCC features provides the best performance, indicating the importance of complementary information. The present study also investigates the effect of segment duration, speaker gender, and number of simultaneous speakers on the overlapped speech detection system. Finally, the proposed method is also evaluated on real overlapped data from the AMI corpus.
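
To make "frequency-varying information of the phase spectrum" concrete, the following is a minimal sketch of the plain group delay function (the negative derivative of phase with respect to frequency), computed with the standard identity that avoids phase unwrapping. It is not the authors' MGDCC pipeline: the modified variant adds spectral smoothing and compression steps that the abstract does not specify. The function name `group_delay` and the parameters `n_fft` and `eps` are assumptions chosen for illustration.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Plain group delay (in samples) of one frame, without phase unwrapping.

    Illustrative only: this is the unmodified group delay, not MGDCC.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x[n]
    # tau(w) = (X_R * Y_R + X_I * Y_I) / |X(w)|^2
    numerator = X.real * Y.real + X.imag * Y.imag
    # eps guards against division by zero at spectral nulls
    return numerator / np.maximum(np.abs(X) ** 2, eps)

# Sanity check: an impulse delayed by 5 samples has a flat group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
tau = group_delay(impulse)
```

For real speech frames the raw group delay is spiky near spectral zeros, which is the usual motivation given for modified variants of the function.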

Identifiers

pubmed: 34717446
doi: 10.1121/10.0006614

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2770

Authors

Shikha Baghel (S)

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati-781039, India.

S R Mahadeva Prasanna (SRM)

Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad 580011, India.

Prithwijit Guha (P)

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati-781039, India.
