Dissimilarity Space Based Multi-Source Cross-Project Defect Prediction

Shengbing Ren; Wanying Zhang; Hafiz Shahbaz Munir; Lei Xia

doi:10.20944/preprints201811.0461.v1

Submitted: 16 November 2018

Posted: 19 November 2018


Abstract

Software defect prediction is an important means of ensuring software quality. Because a project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods represent samples by their feature attributes, which cannot avoid negative transfer and may yield poorly performing models in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method first uses density-based clustering to construct the prototype set from the cluster centers of the samples in the target set. Then, the arc-cosine kernel is used to form the dissimilarity space, and in this space the training set is selected using the earth mover’s distance (EMD). The unlabeled samples converted from the target set are labeled with the KNN algorithm. Finally, the TrAdaBoost method is used to build the prediction model. The experimental results show that our approach performs better than other traditional CPDP methods.
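To make the dissimilarity-space step concrete, the following is a minimal sketch of how samples might be re-represented relative to a prototype set using an arc-cosine kernel. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which kernel degree is used or how kernel values are turned into dissimilarities, so the sketch assumes the degree-1 arc-cosine kernel and the standard kernel-induced distance d(x, p) = sqrt(k(x, x) - 2 k(x, p) + k(p, p)). The function names `arc_cosine_kernel` and `to_dissimilarity_space` are hypothetical, and the prototypes are taken as given rather than derived by density-based clustering.

```python
import numpy as np


def arc_cosine_kernel(X, P):
    """Degree-1 arc-cosine kernel between the rows of X and the rows of P.

    k(x, p) = (1/pi) * ||x|| * ||p|| * (sin(t) + (pi - t) * cos(t)),
    where t is the angle between x and p.
    The choice of degree 1 is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    norm_x = np.linalg.norm(X, axis=1)[:, None]   # shape (n, 1)
    norm_p = np.linalg.norm(P, axis=1)[None, :]   # shape (1, m)
    norms = norm_x * norm_p                       # shape (n, m)
    # Guard against division by zero for all-zero rows.
    cos_t = np.clip(X @ P.T / np.maximum(norms, 1e-12), -1.0, 1.0)
    t = np.arccos(cos_t)
    return norms / np.pi * (np.sin(t) + (np.pi - t) * np.cos(t))


def to_dissimilarity_space(X, P):
    """Map each row of X to its vector of kernel-induced distances to P.

    Assumption: dissimilarity is the distance induced by the kernel.
    For the degree-1 arc-cosine kernel, k(x, x) = ||x||^2.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    k_xx = np.sum(X * X, axis=1)[:, None]
    k_pp = np.sum(P * P, axis=1)[None, :]
    d2 = k_xx - 2.0 * arc_cosine_kernel(X, P) + k_pp
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))
```

Each row of the returned matrix is one sample expressed by its dissimilarities to the prototypes, so a sample with d metrics and m prototypes becomes an m-dimensional vector. The later steps in the abstract (EMD-based selection, KNN labeling, TrAdaBoost) would then operate on these vectors rather than on the raw feature attributes.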

Keywords: Software quality; cross-project defect prediction; multi-source; dissimilarity space; arc-cosine kernel function

Subject: Computer Science and Mathematics - Computer Science

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
