Version 1
Received: 3 November 2017 / Approved: 3 November 2017 / Online: 3 November 2017 (14:51:47 CET)
How to cite:
Kum, S.; Nam, J. Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks. Preprints 2017, 2017110027. https://doi.org/10.20944/preprints201711.0027.v1
APA Style
Kum, S., & Nam, J. (2017). Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks. Preprints. https://doi.org/10.20944/preprints201711.0027.v1
Chicago/Turabian Style
Kum, S., and Juhan Nam. 2017. "Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks." Preprints. https://doi.org/10.20944/preprints201711.0027.v1
Abstract
Singing melody extraction is the task of identifying the melody pitch contour of the singing voice in polyphonic music. Most traditional melody extraction algorithms are based on computing salient pitch candidates or separating the melody source from the mixture. Recently, classification-based approaches built on deep learning have drawn much attention. In this paper, we present a classification-based singing melody extraction model using deep convolutional neural networks. The proposed model consists of a singing pitch extractor (SPE) and a singing voice activity detector (SVAD). The SPE is trained to predict a high-resolution pitch label of the singing voice from a short segment of spectrogram, which allows the model to predict highly continuous pitch curves. The melody contour is further smoothed by post-processing the output of the melody extractor. The SVAD is trained to determine whether a long segment of mel-spectrogram contains a singing voice. This often produces voice false-alarm errors around the boundaries of singing segments, which we reduce by exploiting the output of the SPE. Finally, we evaluate the proposed melody extraction model on several public datasets. The results show that the proposed model is comparable to state-of-the-art algorithms.
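To make the classification framing concrete, the sketch below shows how a continuous F0 value could be quantized into one of the "high-resolution pitch labels" the abstract mentions, with class 0 reserved for non-voiced frames. The pitch range (starting at E2) and the resolution (4 classes per semitone) are illustrative assumptions, not the paper's exact configuration:

```python
import math

# Assumed configuration for the sketch (not taken from the paper):
FMIN = 82.41            # E2 in Hz, assumed lower bound of the singing range
BINS_PER_SEMITONE = 4   # assumed "high-resolution" quantization step
N_PITCH_CLASSES = 45 * BINS_PER_SEMITONE  # assumed ~45-semitone range

def hz_to_class(f0_hz):
    """Quantize an F0 in Hz to a pitch class; class 0 means 'no voice'."""
    if f0_hz <= 0:
        return 0  # non-voiced frame
    semitones = 12 * math.log2(f0_hz / FMIN)
    idx = int(round(semitones * BINS_PER_SEMITONE)) + 1
    return min(max(idx, 1), N_PITCH_CLASSES)

def class_to_hz(idx):
    """Inverse mapping: a pitch class index back to its bin-center frequency."""
    if idx == 0:
        return 0.0
    return FMIN * 2 ** ((idx - 1) / (12 * BINS_PER_SEMITONE))
```

A classifier trained against these labels then outputs a per-frame class index, and decoding back to Hz with `class_to_hz` yields a pitch curve whose step size is a quarter of a semitone under these assumptions, which is why finer quantization gives the "highly continuous" contours described above.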
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.