ARTICLE | doi:10.20944/preprints202107.0691.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: semantic segmentation; activation function; deep ensembles
Online: 30 July 2021 (09:36:28 CEST)
Semantic segmentation is a very popular topic in modern computer vision, with applications in many fields. Researchers have proposed a variety of architectures over time, but the most common ones exploit an encoder-decoder structure that aims to capture both the semantics of the image and its low-level features. The encoder uses convolutional layers, generally with a stride larger than one, to extract the features, while the decoder recreates the image by upsampling and using skip connections with the first layers. In this work, we use DeepLab as the architecture to test the effectiveness of creating an ensemble of networks by randomly changing the activation functions inside the network multiple times. We also use different backbone networks in our DeepLab to validate our findings. We reach a Dice coefficient of 0.888 and a mean Intersection over Union (mIoU) of 0.825 on the competitive Kvasir-SEG dataset. Results in skin detection also confirm the performance of the proposed ensemble, which ranks first among state-of-the-art approaches (including HardNet) on a large set of testing datasets. The developed code will be available at https://github.com/LorisNanni.
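The two metrics quoted in the abstract above can be illustrated with a minimal sketch for a single pair of binary masks. The function name `dice_and_iou`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration; they are not taken from the authors' code. How per-image scores are averaged into the reported mIoU is not specified here.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: 3 predicted pixels, 3 ground-truth pixels, 2 in common.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.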
ARTICLE | doi:10.20944/preprints202103.0180.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: convolutional neural networks; activation functions; biomedical classification; ensembles; MeLU variants
Online: 5 March 2021 (10:05:38 CET)
Recently, much attention has been devoted to finding highly efficient and powerful activation functions for CNN layers. Because activation functions inject different nonlinearities between layers that affect performance, varying them is one method for building robust ensembles of CNNs. The objective of this study is to examine the performance of CNN ensembles made with different activation functions, including six new ones presented here: 2D Mexican ReLU, TanELU, MeLU+GaLU, Symmetric MeLU, Symmetric GaLU, and Flexible MeLU. The highest performing ensemble was built with CNNs having different activation layers that randomly replaced the standard ReLU. A comprehensive evaluation of the proposed approach was conducted across fifteen biomedical data sets representing various classification tasks. The proposed method was tested on two basic CNN architectures: Vgg16 and ResNet50. Results demonstrate the superiority in performance of this approach. The MATLAB source code for this study will be available at https://github.com/LorisNanni.
ARTICLE | doi:10.20944/preprints202010.0526.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: audio classification; dissimilarity space; siamese network; ensemble of classifiers; pattern recognition; animal audio
Online: 26 October 2020 (13:57:01 CET)
The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs), designed using four different backbones, with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat vocalizations and one of bird vocalizations. Different clustering methods reduce the spectrograms in the dataset to a set of centroids that generate (in both a supervised and an unsupervised fashion) the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, additional experiments process the spectrograms using the Heterogeneous Auto-Similarities of Characteristics. Once the dissimilarity spaces are computed, a vector space representation of each pattern is generated and used to train a Support Vector Machine (SVM) that classifies a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best stand-alone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50). The MATLAB code used in this study is available at https://github.com/LorisNanni.
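The dissimilarity-space idea in the abstract above can be sketched as follows: each pattern is represented by the vector of its dissimilarities to a fixed set of centroids, and that vector is what the classifier sees. In this sketch a plain Euclidean distance stands in for the trained Siamese network, and the function names and toy centroids are hypothetical.

```python
import math


def euclidean(a, b):
    """Stand-in dissimilarity; in the study this role is played by a trained Siamese network."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_dissimilarity_space(pattern, centroids, dissimilarity=euclidean):
    """Map a pattern to the vector of its dissimilarities to each centroid."""
    return [dissimilarity(pattern, c) for c in centroids]


# Two toy centroids (in the study these come from clustering the spectrograms).
centroids = [[0.0, 0.0], [3.0, 4.0]]
vector = to_dissimilarity_space([3.0, 0.0], centroids)
# `vector` has one entry per centroid and would be the input to the SVM.
```

The dimensionality of the resulting representation equals the number of centroids, which is why the choice of clustering method shapes the space the SVM is trained in.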
ARTICLE | doi:10.20944/preprints202002.0231.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Convolutional Neural Networks; ensemble of classifiers; activation functions; image classification; skin detection
Online: 17 February 2020 (01:50:08 CET)
In recent years, the field of deep learning has achieved considerable success in pattern recognition, image segmentation and many other classification fields. There are many studies and practical applications of deep learning for image, video and text classification. In this study, we suggest a method for changing the architecture of the best-performing CNN models with the aim of designing new models to be used as stand-alone networks or as components of an ensemble. We propose to replace each activation layer of a CNN (usually a ReLU layer) with a different activation function stochastically drawn from a set of activation functions; in this way, the resulting CNN has a different set of activation function layers.
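The stochastic replacement described in the abstract above can be sketched on a toy network description. This is only an illustration under stated assumptions: the network is modelled as a list of layer names, the candidate pool `ACTIVATIONS` is hypothetical (the study's actual set of functions is not listed in the abstract), and no training is performed.

```python
import math
import random

# Hypothetical candidate pool, chosen only for illustration.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "tanh": math.tanh,
}


def randomize_activations(layers, rng):
    """Return a copy of `layers` in which every activation slot is redrawn at random."""
    names = sorted(ACTIVATIONS)  # sorted for reproducibility across runs
    return [rng.choice(names) if layer in ACTIVATIONS else layer for layer in layers]


def build_ensemble(layers, n_members, seed=0):
    """Create `n_members` network descriptions that differ only in their activations."""
    rng = random.Random(seed)
    return [randomize_activations(layers, rng) for _ in range(n_members)]


# Toy template: non-activation layers are left untouched.
template = ["conv1", "relu", "conv2", "relu", "fc", "relu"]
ensemble = build_ensemble(template, n_members=3)
```

Each member would then be trained separately and the members' outputs combined; the fusion rule is not stated in the abstract, so it is left out of the sketch.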
ARTICLE | doi:10.20944/preprints202104.0766.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: PAM; Passive acoustic monitoring; audio classification; texture classification; PAM-filter; experimental protocols for audio classification; statistical tests
Online: 29 April 2021 (07:55:09 CEST)
Passive acoustic monitoring (PAM) is a non-invasive technique for monitoring wildlife. Acoustic surveillance is preferable in some situations, such as with marine mammals, which spend most of their time underwater, making it hard to obtain images of them. Machine learning is very useful for PAM, for example to identify species from audio recordings, but care should be taken when evaluating the capability of a system. We define PAM-filters as the creation of experimental protocols according to the dates and locations of the recordings, aiming to avoid the use of the same individuals, noise and recording devices in both training and test sets. A random division of a database yields accuracies much higher than those obtained with protocols generated with a PAM-filter. Although we use animal vocalizations, our method converts the audio into spectrogram images and then describes the images using texture features. These are well-known techniques for audio classification, and they have already been used for species classification. We also perform statistical tests, with several well-known classifiers, to demonstrate the significant difference between accuracies obtained with and without PAM-filters. The configuration of our experimental protocols and the database have been made available online.
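The protocol idea in the abstract above, keeping recordings from the same date and location out of both training and test sets, can be sketched as a group-wise split. Treating each (date, location) pair as one indivisible session is an assumption made for illustration, as are the function name `pam_filter_split`, the record fields, and the toy file names; the authors' actual filtering rules may differ.

```python
import random


def pam_filter_split(recordings, test_fraction=0.3, seed=0):
    """Split so that no (date, location) session appears in both training and test sets."""
    # Assumed grouping key: one session per distinct (date, location) pair.
    sessions = sorted({(r["date"], r["location"]) for r in recordings})
    rng = random.Random(seed)
    rng.shuffle(sessions)

    # Whole sessions, not individual recordings, are assigned to the test set.
    n_test = max(1, round(len(sessions) * test_fraction))
    test_sessions = set(sessions[:n_test])

    train = [r for r in recordings if (r["date"], r["location"]) not in test_sessions]
    test = [r for r in recordings if (r["date"], r["location"]) in test_sessions]
    return train, test


# Hypothetical recordings; a.wav and b.wav share a session and must stay together.
recordings = [
    {"file": "a.wav", "date": "2020-01-01", "location": "site1"},
    {"file": "b.wav", "date": "2020-01-01", "location": "site1"},
    {"file": "c.wav", "date": "2020-02-01", "location": "site2"},
    {"file": "d.wav", "date": "2020-03-01", "location": "site3"},
]
train, test = pam_filter_split(recordings)
```

A purely random split would instead shuffle individual recordings, which lets the same session (and hence the same individuals, noise and devices) leak into both sets; that leakage is the effect the abstract reports as inflated accuracy.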