Preprint Article Version 1 This version is not peer-reviewed

Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

Version 1 : Received: 30 October 2019 / Approved: 31 October 2019 / Online: 31 October 2019 (16:40:30 CET)

How to cite: Gutiérrez-Muñoz, M.; González-Salazar, A.; Coto-Jiménez, M. Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement. Preprints 2019, 2019100376 (doi: 10.20944/preprints201910.0376.v1).

Abstract

Speech signals are degraded in real-life environments by background noise and other factors. Processing such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes degraded signals difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone along multiple paths. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have shown surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extensive experimentation in several cases. In this work, we evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation is based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. The results support the view that hybrid networks are an important solution for speech signal enhancement, with advantages in efficiency but without a significant drop in quality.
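The efficiency argument can be illustrated with a rough parameter count. The sketch below is not the authors' implementation: it assumes the standard LSTM formulation without peephole connections, and the feature dimension, hidden size, and layer orderings are hypothetical values chosen only for illustration. Parameter count is at best a coarse proxy for training time, which the paper measures directly.

```python
def lstm_params(n_in, n_hidden):
    """Trainable weights in one standard LSTM layer (no peepholes)."""
    # Four blocks (input gate, forget gate, output gate, cell candidate),
    # each with input weights, recurrent weights, and a bias vector.
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


def dense_params(n_in, n_out):
    """Trainable weights in one fully connected (perceptron) layer."""
    return n_in * n_out + n_out


def count_params(layer_types, n_features, n_hidden):
    """Total weights for a stack of hidden layers plus a dense output layer.

    layer_types is a sequence of "lstm" or "dense" entries, one per hidden
    layer; the output layer maps the last hidden layer back to n_features.
    """
    total = 0
    n_in = n_features
    for kind in layer_types:
        if kind == "lstm":
            total += lstm_params(n_in, n_hidden)
        elif kind == "dense":
            total += dense_params(n_in, n_hidden)
        else:
            raise ValueError(f"unknown layer type: {kind}")
        n_in = n_hidden
    return total + dense_params(n_in, n_features)


# Hypothetical sizes, not taken from the paper.
N_FEATURES = 40
N_HIDDEN = 100

pure = count_params(["lstm", "lstm", "lstm"], N_FEATURES, N_HIDDEN)
hybrid = count_params(["lstm", "dense", "dense"], N_FEATURES, N_HIDDEN)

print("pure LSTM, 3 hidden layers:", pure)
print("hybrid LSTM + 2 dense layers:", hybrid)
```

With the same number of hidden layers, each perceptron layer that replaces an LSTM layer removes the recurrent weights and three of the four weight blocks, which is one reason a hybrid stack can be cheaper to train.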

Subject Areas

artificial neural network; deep learning; LSTM; speech processing
