Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Target Selection Strategies for Demucs-Based Speech Enhancement
Version 1: Received: 12 June 2023 / Approved: 13 June 2023 / Online: 13 June 2023 (04:19:50 CEST)
A peer-reviewed article of this Preprint also exists.
Rascon, C.; Fuentes-Pineda, G. Target Selection Strategies for Demucs-Based Speech Enhancement. Appl. Sci. 2023, 13, 7820.
Abstract
Online speech enhancement is of great interest because of its diverse areas of application. The Demucs architecture has recently been shown to achieve a high level of performance, even with small audio segments, while requiring a relatively small amount of computational resources. However, an issue that it shares with most state-of-the-art speech enhancement techniques is that of target speech selection: it is assumed that only one speech source is present. In real-life scenarios, more than one speech source may be present, and it is uncertain which speech source within the input mixture should be enhanced. Thus, it is of interest to complement speech enhancement techniques (such as those based on Demucs) with a target selection scheme, so as to ensure that only the source of interest is enhanced. In this work, two target selection strategies are explored: (1) an embedding-based strategy, which uses an encoded sample of the target speech, and (2) a location-based strategy, which uses a beamforming-based pre-filter to select the target located in front of a 2-microphone array. It is shown that, while both strategies improve the performance of the Demucs-based technique when one or more speech interferences are present, each has its pros and cons. Specifically, while the beamforming-based strategy achieves better overall performance than the embedding-based strategy, it is sensitive to variations in the location of the target speech source, a limitation that the embedding-based strategy does not have.
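To make the location-based idea concrete, the following is a minimal sketch of a phase-based pre-filter for a 2-microphone array. It is not the authors' implementation: the function name `front_phase_mask`, the single-frame FFT processing, and the `max_phase_diff` threshold are all assumptions chosen for illustration. It relies only on the observation that a source directly in front of the array reaches both microphones at the same time, so its inter-microphone phase difference is close to zero at every frequency.

```python
import numpy as np

def front_phase_mask(x1, x2, max_phase_diff=0.35):
    """Hypothetical phase-based pre-filter for one frame of 2-microphone audio.

    Keeps only the frequency bins whose inter-microphone phase difference
    is below max_phase_diff (radians, an assumed threshold), which is the
    signature of a source located in front of the array.
    """
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    # Phase difference between the two microphones, per frequency bin.
    phase_diff = np.angle(X1 * np.conj(X2))
    # Binary mask: True where the bin looks like it comes from the front.
    mask = np.abs(phase_diff) < max_phase_diff
    # Average the two channels and zero out the rejected bins.
    Y = mask * 0.5 * (X1 + X2)
    return np.fft.irfft(Y, n=len(x1))

# Toy mixture: the target is in front (identical at both microphones),
# while the interference arrives with a 1.0 rad phase lag at microphone 2.
n = 512
t = np.arange(n)
target = np.cos(2 * np.pi * 20 * t / n)
interf_mic1 = np.cos(2 * np.pi * 45 * t / n)
interf_mic2 = np.cos(2 * np.pi * 45 * t / n - 1.0)

enhanced = front_phase_mask(target + interf_mic1, target + interf_mic2)
```

In this toy case the interference is removed and the frame handed to the enhancement model contains only the frontal source. The sketch also hints at the sensitivity noted in the abstract: if the target moves away from the front of the array, its own phase difference grows and its bins start to be rejected by the mask.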
Keywords
demucs; target selection; embeddings; phase-based beamforming
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.