ARTICLE | doi:10.20944/preprints202107.0691.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: semantic segmentation; activation functions; deep ensembles
Online: 30 July 2021 (09:36:28 CEST)
Semantic segmentation is a very popular topic in modern computer vision, and it has applications in many fields. Researchers have proposed a variety of architectures over time, but the most common ones exploit an encoder-decoder structure that aims to capture the semantics of the image and its low-level features. The encoder uses convolutional layers, in general with a stride larger than one, to extract the features, while the decoder recreates the image by upsampling and using skip connections with the first layers. In this work, we use DeepLab as the architecture to test the effectiveness of creating an ensemble of networks by randomly changing the activation functions inside the network multiple times. We also use different backbone networks in our DeepLab to validate our findings. We reach a dice coefficient of 0.888 and a mean Intersection over Union (mIoU) of 0.825 on the competitive Kvasir-SEG dataset. Results in skin detection also confirm the performance of the proposed ensemble, which ranks first with respect to other state-of-the-art approaches (including HardNet) on a large set of testing datasets. The developed code will be available at https://github.com/LorisNanni.
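The two segmentation metrics reported above can be made concrete with a short sketch. The masks below are illustrative toy data; in the actual experiments they would be DeepLab predictions versus ground-truth annotations:

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0

def iou(pred, target):
    """Intersection over Union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

# Toy 4x4 prediction and ground-truth masks
pred = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
gt   = np.array([[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(f"{dice_coefficient(pred, gt):.4f}")  # → 0.8889
print(f"{iou(pred, gt):.4f}")               # → 0.8000
```

The reported mIoU is simply this per-mask IoU averaged over the test set (and, in multi-class settings, over the classes).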
ARTICLE | doi:10.20944/preprints202111.0047.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Data augmentation; Deep Learning; Convolutional Neural Networks; Ensemble.
Online: 2 November 2021 (11:18:23 CET)
Convolutional Neural Networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their lack of generalizability and tendency to overfit when presented with small training sets. Augmentation directly confronts this problem by generating new data points that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, with two novel approaches proposed here: one based on the Discrete Wavelet Transform and the other on the Constant-Q Gabor transform. Pretrained ResNet50 networks are fine-tuned on each augmentation method. Combinations of these networks are evaluated and compared across three benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach. The best ensemble proposed in this work achieves state-of-the-art performance across all three data sets. This result shows that varying the data augmentation is a feasible way of building an ensemble of classifiers for image classification (code available at https://github.com/LorisNanni).
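The abstract does not detail the proposed wavelet augmentation, so the following is only an illustrative sketch of the general idea: decompose an image with a one-level Haar DWT, perturb the detail sub-bands, and reconstruct, yielding a new training sample. The Haar basis and the perturbation scheme here are assumptions, not the paper's exact method:

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar DWT (even-sized images): (LL, LH, HL, HH)."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # row averages
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # row details
    LL = (a[0::2, :] + a[1::2, :]) / 2.0
    LH = (a[0::2, :] - a[1::2, :]) / 2.0
    HL = (d[0::2, :] + d[1::2, :]) / 2.0
    HH = (d[0::2, :] - d[1::2, :]) / 2.0
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    h, w = LL.shape
    a = np.empty((2 * h, w)); d = np.empty((2 * h, w))
    a[0::2], a[1::2] = LL + LH, LL - LH
    d[0::2], d[1::2] = HL + HH, HL - HH
    img = np.empty((2 * h, 2 * w))
    img[:, 0::2], img[:, 1::2] = a + d, a - d
    return img

def dwt_augment(img, rng, scale=0.5):
    """Keep the approximation band, randomly rescale the detail
    bands, then reconstruct an augmented image."""
    LL, LH, HL, HH = haar2d(img)
    LH, HL, HH = (b * rng.uniform(1 - scale, 1 + scale) for b in (LH, HL, HH))
    return ihaar2d(LL, LH, HL, HH)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
aug = dwt_augment(img, rng)
print(np.allclose(ihaar2d(*haar2d(img)), img))  # → True (transform is invertible)
```

In practice a library such as PyWavelets would replace the hand-rolled transform; the point is that augmentation in the coefficient domain preserves coarse structure while varying fine detail.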
ARTICLE | doi:10.20944/preprints202103.0180.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: convolutional neural networks; activation functions; biomedical classification; ensembles; MeLU variants
Online: 5 March 2021 (10:05:38 CET)
Recently, much attention has been devoted to finding highly efficient and powerful activation functions for CNN layers. Because activation functions inject different nonlinearities between layers that affect performance, varying them is one method for building robust ensembles of CNNs. The objective of this study is to examine the performance of CNN ensembles made with different activation functions, including six new ones presented here: 2D Mexican ReLU, TanELU, MeLU+GaLU, Symmetric MeLU, Symmetric GaLU, and Flexible MeLU. The highest performing ensemble was built with CNNs having different activation layers that randomly replaced the standard ReLU. A comprehensive evaluation of the proposed approach was conducted across fifteen biomedical data sets representing various classification tasks. The proposed method was tested on two basic CNN architectures: Vgg16 and ResNet50. Results demonstrate the superior performance of this approach. The MATLAB source code for this study will be available at https://github.com/LorisNanni.
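The closed forms of the six new activations are not given in the abstract. As an illustration only, a MeLU-style (Mexican ReLU) activation can be sketched as a ReLU plus a weighted sum of triangular "hat" bumps with learnable centres, widths, and coefficients; the exact parameterization below is a simplified assumption, not the paper's definition:

```python
import numpy as np

def hat(x, a, lam):
    """Triangular 'Mexican hat' bump: max(lam - |x - a|, 0)."""
    return np.maximum(lam - np.abs(x - a), 0.0)

def melu(x, coeffs):
    """Sketch of a Mexican-ReLU-style activation: ReLU plus a
    weighted sum of hat functions. coeffs holds (c, a, lam)
    triples, which would be trainable in a real layer."""
    y = np.maximum(x, 0.0)
    for c, a, lam in coeffs:
        y = y + c * hat(x, a, lam)
    return y

x = np.linspace(-2, 2, 5)                 # [-2, -1, 0, 1, 2]
print(melu(x, [(0.5, 1.0, 1.0)]))         # ReLU plus a bump centred at 1
```

Because the bump parameters are learned, each such activation can adapt its nonlinearity per layer, which is what makes swapping them a useful source of ensemble diversity.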
ARTICLE | doi:10.20944/preprints202010.0526.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: audio classification; dissimilarity space; siamese network; ensemble of classifiers; pattern recognition; animal audio
Online: 26 October 2020 (13:57:01 CET)
The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs), designed using four different backbones, with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat and one of bird vocalizations. Different clustering methods reduce the spectrograms in the dataset to a set of centroids that generate (in both a supervised and an unsupervised fashion) the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, additional experiments process the spectrograms using the Heterogeneous Auto-Similarities of Characteristics descriptor. Once the dissimilarity spaces are computed, a vector space representation of each pattern is generated and then used to train a Support Vector Machine (SVM) that classifies a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best stand-alone approach is also evaluated on the challenging Environmental Sound Classification (ESC50) dataset. The MATLAB code used in this study is available at https://github.com/LorisNanni.
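The dissimilarity-space pipeline can be sketched as follows: cluster the training patterns into centroids (prototypes), then represent every pattern by its vector of distances to those centroids, which is what the SVM is trained on. In the paper the distance is the learned Siamese similarity; here a Euclidean distance is a stand-in, and the k-means below is a minimal assumed implementation:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means returning k centroids (the prototypes)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def dissimilarity_space(X, C, dist):
    """Each pattern becomes its vector of distances to the centroids;
    `dist` stands in for the trained Siamese network's output."""
    return np.array([[dist(x, c) for c in C] for x in X])

rng = np.random.default_rng(1)
X = rng.random((20, 8))             # e.g. flattened spectrogram features
C = kmeans(X, k=4)
D = dissimilarity_space(X, C, lambda a, b: np.linalg.norm(a - b))
print(D.shape)                      # → (20, 4): vectors that would train the SVM
```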
ARTICLE | doi:10.20944/preprints202002.0231.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Convolutional Neural Networks; ensemble of classifiers; activation functions; image classification; skin detection
Online: 17 February 2020 (01:50:08 CET)
In recent years, the field of deep learning has achieved considerable success in pattern recognition, image segmentation, and many other classification fields. There are many studies and practical applications of deep learning to image, video, and text classification. In this study, we suggest a method for changing the architecture of the best-performing CNN models with the aim of designing new models to be used as stand-alone networks or as components of an ensemble. We propose to replace each activation layer of a CNN (usually a ReLU layer) with a different activation function stochastically drawn from a set of activation functions: in this way, each resulting CNN has a different set of activation function layers.
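The stochastic drawing step described above can be sketched as follows. The pool of candidate activations and the plain list-of-names "network" are illustrative stand-ins for real CNN activation layers; each independent draw defines one member of the ensemble:

```python
import random
import numpy as np

# Pool of candidate activations the random draw picks from (illustrative)
ACTIVATIONS = {
    "relu":       lambda x: np.maximum(x, 0.0),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
    "elu":        lambda x: np.where(x > 0, x, np.exp(x) - 1.0),
    "tanh":       np.tanh,
}

def random_activation_config(n_layers, seed=None):
    """Draw one activation per layer; each draw yields a distinct
    network variant to be trained as an ensemble member."""
    rng = random.Random(seed)
    return [rng.choice(sorted(ACTIVATIONS)) for _ in range(n_layers)]

# An ensemble of 5 variants of a network with 4 activation layers
ensemble = [random_activation_config(4, seed=s) for s in range(5)]
for cfg in ensemble:
    print(cfg)
```

In a real framework each name would be mapped to the corresponding layer when the model is instantiated, and the variants' predictions would be fused at test time.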
ARTICLE | doi:10.20944/preprints202108.0094.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Siamese networks; Ensemble of classifiers; Loss function; Discrete cosine transform
Online: 3 August 2021 (15:49:22 CEST)
In this paper, we examine two strategies for boosting the performance of ensembles of Siamese networks (SNNs) for image classification using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER). With FULLY, the distance between a pattern and a prototype is calculated by comparing two images using the fully connected layer of the Siamese network. With DEEPER, each pattern is described using a deeper layer combined with dimensionality reduction. The basic design of the SNNs takes advantage of supervised k-means clustering for building the dissimilarity spaces that train a set of support vector machines, which are then combined by the sum rule for a final decision. The robustness and versatility of this approach are demonstrated on several cross-domain image data sets, including a portrait data set, two bioimage data sets, and two animal vocalization data sets. Results show that the strategies employed in this work to increase the performance of dissimilarity-based image classification with SNNs are closing the gap with stand-alone CNNs. Moreover, when our best system is combined with an ensemble of CNNs, the resulting performance is superior to that of the CNN ensemble alone, demonstrating that our new strategy extracts additional information.
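The sum-rule fusion mentioned above is simple enough to show directly: the per-class score matrices of the individual SVMs are added, and the class with the largest summed score wins. The score values below are illustrative:

```python
import numpy as np

def sum_rule(score_matrices):
    """Sum-rule fusion: add the per-class score matrices of the
    individual classifiers, then take the argmax per pattern."""
    total = np.sum(score_matrices, axis=0)
    return np.argmax(total, axis=1)

# Scores of 3 classifiers for 2 patterns over 3 classes (illustrative)
s1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
s2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
s3 = np.array([[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]])
print(sum_rule([s1, s2, s3]))   # → [0 2]: predicted class per pattern
```

With SVMs the scores would typically be calibrated decision values or class probabilities so that the summed quantities are comparable across classifiers.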
ARTICLE | doi:10.20944/preprints202104.0766.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: PAM; Passive acoustic monitoring; audio classification; texture classification; PAM-filter; experimental protocols for audio classification; statistical tests.
Online: 29 April 2021 (07:55:09 CEST)
Passive acoustic monitoring (PAM) is a non-invasive technique for wildlife surveillance. Acoustic monitoring is preferable in some situations, such as with marine mammals, which spend most of their time underwater, making it hard to obtain images of them. Machine learning is very useful for PAM, for example, to identify species from audio recordings, but some care is needed when evaluating the capability of a system. We define PAM-filters as the creation of experimental protocols according to the dates and locations of the recordings, with the aim of avoiding the use of the same individuals, background noise, and recording devices in both the training and test sets. A random division of a database yields accuracies much higher than those obtained with protocols generated with the PAM-filter. Although we work with animal vocalizations, our method converts the audio into spectrogram images and then describes the images using texture features. These are well-known techniques for audio classification and have already been used for species classification. We also perform statistical tests to demonstrate the significant difference between accuracies generated with and without PAM-filters across several well-known classifiers. The configuration of our experimental protocols and the database have been made available online.
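The core of a PAM-filter is a group-wise train/test split: every (location, date) group, a proxy for "same individuals, noise, and devices," must land entirely on one side of the split. A minimal sketch, with illustrative recording records:

```python
def pam_filter_split(recordings, test_groups):
    """Split recordings so that no (location, date) group appears in
    both training and test sets, unlike a random per-sample split."""
    train, test = [], []
    for rec in recordings:
        group = (rec["location"], rec["date"])
        (test if group in test_groups else train).append(rec)
    return train, test

# Illustrative records; fields and values are made up for the sketch
recordings = [
    {"id": 1, "location": "siteA", "date": "2020-05-01", "label": "orca"},
    {"id": 2, "location": "siteA", "date": "2020-05-01", "label": "orca"},
    {"id": 3, "location": "siteB", "date": "2020-06-10", "label": "humpback"},
    {"id": 4, "location": "siteB", "date": "2020-07-02", "label": "humpback"},
]
train, test = pam_filter_split(recordings, {("siteB", "2020-06-10")})
print([r["id"] for r in train], [r["id"] for r in test])  # → [1, 2, 4] [3]
```

A random split would let clips 1 and 2, recorded in the same session, fall on opposite sides, which is exactly the leakage that inflates the accuracies the paper warns about.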