ARTICLE | doi:10.20944/preprints201811.0461.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Software quality; cross-project defect prediction; multi-source; dissimilarity space; arc-cosine kernel function
Online: 19 November 2018 (11:48:50 CET)
Software defect prediction is an important means of guaranteeing software quality. Because a single project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods use feature attributes to represent samples and cannot avoid negative transfer, which may result in poorly performing models in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method first uses density-based clustering to construct the prototype set from the cluster centers of samples in the target set. Then, the arc-cosine kernel is used to form the dissimilarity space, in which the training set is selected with the earth mover's distance (EMD) method. The unlabeled samples converted from the target set are labeled using the KNN algorithm. Finally, the TrAdaBoost method is used to establish the prediction model. The experimental results show that this approach performs better than other traditional CPDP methods.
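The central representation step above can be sketched in a few lines of NumPy: samples are mapped into a dissimilarity space by their kernel-induced distances to a prototype set, here using the order-1 arc-cosine kernel of Cho and Saul. This is a minimal illustration under standard definitions, not necessarily the exact construction used in the paper.

```python
import numpy as np

def arc_cosine_kernel(x, y):
    """Order-1 arc-cosine kernel (Cho & Saul, 2009)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(np.dot(x, y) / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return (nx * ny / np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def to_dissimilarity_space(samples, prototypes):
    """Represent each sample by its distance to each prototype, where
    d(x, p)^2 = k(x, x) - 2 k(x, p) + k(p, p) is the squared distance
    in the kernel's implicit feature space."""
    return np.array([
        [arc_cosine_kernel(x, x) - 2 * arc_cosine_kernel(x, p) + arc_cosine_kernel(p, p)
         for p in prototypes]
        for x in samples
    ])
```

Each row of the result is one sample's coordinates in the dissimilarity space, with one dimension per prototype (e.g. per cluster center from the density-based clustering step).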
ARTICLE | doi:10.20944/preprints202010.0526.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: audio classification; dissimilarity space; siamese network; ensemble of classifiers; pattern recognition; animal audio
Online: 26 October 2020 (13:57:01 CET)
The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat and one of bird vocalizations. Different clustering methods reduce the spectrograms in the dataset to a set of centroids that generate (in both a supervised and unsupervised fashion) the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, additional experiments process the spectrograms using the Heterogeneous Auto-Similarities of Characteristics. Once the similarity spaces are computed, a vector space representation of each pattern is generated and then used to train a Support Vector Machine (SVM) that classifies a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad-hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best stand-alone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50). The MATLAB code used in this study is available at https://github.com/LorisNanni.
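The pipeline in this abstract (cluster to centroids, build dissimilarity vectors, train an SVM) can be sketched as follows. This is a simplified stand-in: random vectors replace the Siamese-network embeddings of spectrograms, and plain Euclidean distance replaces the learned SNN dissimilarity; scikit-learn is assumed available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for SNN embeddings of 60 spectrograms across 3 classes
X = rng.normal(size=(60, 16))
y = np.repeat([0, 1, 2], 20)

# Step 1: reduce the training set to centroids that anchor the dissimilarity space
centroids = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X).cluster_centers_

def dissimilarity_vector(x, centroids):
    # Distance to each centroid; the paper uses the SNN's learned dissimilarity instead
    return np.linalg.norm(centroids - x, axis=1)

# Step 2: each pattern becomes a vector of dissimilarities to the centroids
D = np.array([dissimilarity_vector(x, centroids) for x in X])

# Step 3: an SVM classifies patterns by their dissimilarity vectors
clf = SVC(kernel="rbf").fit(D, y)
```

New spectrograms are classified by embedding them, computing their dissimilarity vector against the same centroids, and calling `clf.predict`.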
ARTICLE | doi:10.20944/preprints201906.0004.v1
Subject: Engineering, Other Keywords: weighted dissimilarity measure; feature-based indoor positioning; signals of opportunity; location-dependent standard deviation
Online: 3 June 2019 (08:37:55 CEST)
We propose an iterative scheme for feature-based positioning using a new weighted dissimilarity measure with the goal of reducing the impact of large errors among the measured or modeled features. The weights are computed from the location-dependent standard deviations of the features and stored as part of the reference fingerprint map (RFM). Spatial filtering and kernel smoothing of the kinematically collected raw data allow the standard deviations to be estimated efficiently during RFM generation. In the positioning stage, the weights control the contribution of each feature to the dissimilarity measure, which in turn quantifies the difference between the set of online measured features and the fingerprints stored in the RFM. Features with little variability contribute more to the estimated position than features with high variability. Iterations are necessary because the variability depends on the location, and the location is initially unknown when estimating the position. Using real WiFi signal strength data from extended test measurements with ground truth in an office building, we show that the standard deviations of these features vary considerably within the region of interest and are neither simple functions of the signal strength nor of the distances from the corresponding access points. This motivates including the empirical standard deviations in the RFM. We then analyze the deviations of the estimated positions with and without the location-dependent weighting. In the present example the maximum radial positioning error from ground truth is reduced by 40% compared to kNN without the weighted dissimilarity measure.
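The iterative weighted-dissimilarity idea can be sketched as below. The weighting `w = 1/sigma**2` and the nearest-fingerprint (1-NN) matching are common, illustrative choices; the paper's exact dissimilarity measure and estimator may differ.

```python
import numpy as np

def weighted_dissimilarity(online, fingerprint, sigma):
    """Sum of squared feature differences, down-weighting high-variability features."""
    w = 1.0 / np.square(sigma)
    return np.sum(w * np.square(online - fingerprint))

def estimate_position(online, rfm_features, rfm_sigmas, rfm_positions, n_iter=3):
    """Iterate: match the online features to the RFM, then reuse the matched
    location's standard deviations as weights for the next pass."""
    sigma = np.ones_like(online)  # first pass is unweighted: location still unknown
    idx = 0
    for _ in range(n_iter):
        d = [weighted_dissimilarity(online, f, sigma) for f in rfm_features]
        idx = int(np.argmin(d))
        sigma = rfm_sigmas[idx]  # location-dependent std devs stored in the RFM
    return rfm_positions[idx]
```

The loop converges quickly in practice because after the first match the weights are drawn from roughly the correct region of the RFM.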
ARTICLE | doi:10.20944/preprints201809.0146.v1
Subject: Earth Sciences, Geoinformatics Keywords: Fuzzy c-Means (FCM) Classifier, Similarity and Dissimilarity measures, Distance, Fuzzy Error Matrix (FERM)
Online: 8 September 2018 (01:46:24 CEST)
In this study, the fuzzy c-means classifier has been studied with nine other similarity and dissimilarity measures: Manhattan distance, chessboard distance, Bray-Curtis distance, Canberra distance, Cosine distance, correlation distance, mean absolute difference, median absolute difference and normalised squared Euclidean distance. Both single and composite modes were used with a varying weight constant (m) and at different α-cuts. The two best single norms obtained were combined to study the effect of composite norms on the datasets used. An image-to-image accuracy check was conducted to assess the accuracy of the classified images. The Fuzzy Error Matrix (FERM) was applied to measure the accuracy assessment outcomes for a Landsat-8 dataset with respect to the Formosat-2 dataset. In conclusion, the FCM classifier with the Cosine norm performed better than the conventional Euclidean norm. However, due to the inability of the FCM classifier to handle noise properly, the classification accuracy was around 75%.
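The experimental variable in this study, swapping the distance norm inside fuzzy c-means, can be sketched with a minimal FCM loop that takes the dissimilarity as a parameter (here the Cosine distance, the study's best-performing norm). This is a textbook FCM update, not the authors' implementation, and the weighted-mean center update is a heuristic when used with non-Euclidean norms.

```python
import numpy as np

def cosine_distance(x, c):
    return 1.0 - np.dot(x, c) / (np.linalg.norm(x) * np.linalg.norm(c))

def fcm(X, n_clusters=2, m=2.0, n_iter=50, dist=cosine_distance, seed=0):
    """Fuzzy c-means with a pluggable dissimilarity; m is the weight constant."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted centers
        D = np.array([[max(dist(x, c), 1e-12) for c in centers] for x in X])
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1))
        Dinv = D ** (-2.0 / (m - 1.0))
        U = Dinv / Dinv.sum(axis=1, keepdims=True)
    return U, centers
```

Replacing `cosine_distance` with another callable (e.g. Manhattan or Canberra) reproduces the kind of norm comparison the study performs.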