Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Rough Noise-Filtered Easy Ensemble for Software Fault Prediction

Version 1 : Received: 17 May 2018 / Approved: 17 May 2018 / Online: 17 May 2018 (13:01:51 CEST)

How to cite: Riaz, S.; Arshad, A.; Jiao, L. Rough Noise-Filtered Easy Ensemble for Software Fault Prediction. Preprints 2018, 2018050248. https://doi.org/10.20944/preprints201805.0248.v1

Abstract

Software fault prediction is an important research topic in software quality assurance. Data-driven approaches provide robust mechanisms for software fault prediction. However, the prediction performance of a model depends heavily on the quality of the dataset, and many software datasets suffer from class imbalance. Under-sampling is a popular data pre-processing method for dealing with class imbalance, and Easy Ensemble (EE) is a robust approach that achieves a high classification rate and addresses the bias towards majority-class samples. However, class imbalance is not the only issue that harms classifier performance: noisy examples and irrelevant features may further reduce predictive accuracy. In this paper, we propose a two-stage data pre-processing approach that incorporates feature selection and a new Rough set Easy Ensemble scheme. In the feature selection stage, we eliminate irrelevant features with a feature ranking algorithm. In the second stage, we build a new Rough set Easy Ensemble by applying a Rough K nearest neighbor rule filter (RK) before executing Easy Ensemble (EE); we call the combination RKEE for short. RK can remove noisy examples from both the minority and the majority class. We evaluate the approach experimentally on real-world software projects, such as the NASA and Eclipse datasets, to demonstrate its effectiveness. Furthermore, this paper comprehensively investigates the factors that influence our approach, such as the impact of Rough set theory on the noise filter and the relationship between model performance and imbalance ratio. Comprehensive experiments indicate that the proposed approach shows outstanding performance, with significance, in terms of area under the curve (AUC).
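The second stage described above (noise filtering followed by Easy Ensemble under-sampling) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not define the rough set rule used by RK, so a plain k-nearest-neighbour vote filter stands in for it here; the function names `knn_noise_filter` and `balanced_subsets` and the toy data are hypothetical; and only the sampling step of Easy Ensemble is shown, with the training of one classifier per subset omitted.

```python
import random
from collections import Counter


def knn_noise_filter(X, y, k=3):
    """Drop samples whose label disagrees with the majority vote of their
    k nearest neighbours. This is a plain kNN filter, NOT the paper's
    rough set rule, which the abstract does not specify."""
    keep = []
    for i, xi in enumerate(X):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_subsets(X, y, minority=1, n_subsets=4, seed=0):
    """Easy-Ensemble-style sampling: each subset holds every minority
    sample plus an equally sized random draw from the majority class."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
        idx = min_idx + drawn
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


# Hypothetical toy data: label 1 = faulty module (minority class).
# The last point carries label 1 but sits inside the label-0 cluster.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [5.1], [5.2], [0.25]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = knn_noise_filter(X, y, k=3)
subsets = balanced_subsets(X_clean, y_clean, minority=1, n_subsets=4)
```

In the full method, a classifier would be trained on each balanced subset and the outputs combined. Filtering before sampling means a mislabelled point cannot be copied into every subset as a minority sample.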

Keywords

software fault prediction; data preprocessing; feature selection; rough set theory; class imbalance; noise filter; easy ensemble

Subject

Computer Science and Mathematics, Computer Science
