Publication details
Bibliographic description
Convolutive weighted multichannel Wiener filter front-end for distant automatic speech recognition in reverberant multispeaker scenarios / Mieszko FRAŚ, Marcin WITKOWSKI, Konrad KOWALCZYK // In: INTERSPEECH 2022 [Electronic document] : September 18–22, Incheon, Korea. Windows version. Text data. [Seoul : The Acoustical Society of Korea], [2022]. Pp. 2943–2947. System requirements: Adobe Reader. Access mode: https://isca-speech.org/archive/pdfs/interspeech_2022/fras22_... [2022-09-03]. Bibliography p. 2947, abstract.
Authors (3)
Keywords
Bibliometric data
BaDAP ID | 141901 |
---|---|
Date added to BaDAP | 2022-09-06 |
DOI | 10.21437/Interspeech.2022-10780 |
Publication year | 2022 |
Publication type | conference proceedings (authored) |
Open access | |
Creative Commons | |
Conference | INTERSPEECH 2022 |
Abstract
The performance of automatic speech recognition (ASR) systems strongly deteriorates when the desired speech signal is contaminated with room reverberation and when the speech of interfering speakers overlaps. To achieve acceptable word error rates (WER) by distant ASR in multispeaker reverberant scenarios, source separation and dereverberation can be performed as front-end processing. An existing optimum filter suitable for this task is the recently proposed weighted power minimization distortionless response convolutional beamformer (WPD). In this paper, we introduce a novel speech enhancement front-end for improving the accuracy of back-end ASR in scenarios with multiple reverberant overlapping speakers. The convolutional weighted multichannel Wiener filter (CW-MWF) is optimum for the joint separation and dereverberation task, and it is derived from the convolutional weighted minimum mean square error (CW-MMSE) optimization criterion, presented recently by the current authors. The WER results of the performed experiments indicate superior performance of the CW-MWF in real and simulated rooms, irrespective of the method used for filter parameter estimation and the DNN model used for back-end ASR.