Submitted:
15 March 2026
Posted:
16 March 2026
Abstract
Keywords:
1. Introduction
- This article proposes a novel feature selection (FS) approach for software fault prediction (SFP): the FS-BWOA-COA framework, a hybrid of the Beluga Whale Optimization Algorithm (BWOA) and the Coati Optimization Algorithm (COA).
- The proposed FS-BWOA-COA approach is compared with the standalone BWOA and COA feature selection algorithms and evaluated on eleven benchmark SFP datasets.
- The experimental outcomes and the characteristics of the datasets are examined to draw inferences from the obtained results.
- To confirm that the differences in outcomes between the baseline methods and the proposed FS-BWOA-COA approach are significant, statistical analysis (paired t-test, Friedman test, and Wilcoxon signed-rank post-hoc test) has also been carried out.
2. Related Work
3. Problem Statement
4. Proposed Methodology
4.1. Feature Encoding and Population Initialization
4.2. Fitness Evaluation
4.3. Balance Factor (Bf) and Whale Fall Probability (Wf)
4.4. Global Exploration Using BWOA
4.5. Whale Fall Mechanism (Escaping Local Optima)
4.6. Adaptive Switching to COA
4.7. Local Exploitation Using COA
4.8. Position Update and Binarization
4.9. Stopping Criterion and Output
Algorithm 1: Hybrid BWOA–COA Feature Selection
Input:
1: Encode all features as binary vectors
2: Initialize the beluga whale population Xi (i = 1, 2, …, N)
3: Calculate the fitness of each beluga whale using the objective function f(x)
4: Set iteration counter T = 1
5: while T < Tmax do
6:     Calculate the balance factor Bf using Equation (4)
7:     Calculate the whale-fall factor Wf using Equation (5)
8:     if Bf > 0.5 then
9:         Enter global exploration phase (BWOA)
10:        Update beluga positions using the exploration strategy
11:    else
12:        Enter exploitation phase (BWOA)
13:        Update beluga positions using local exploitation
14:        if Bf < Wf then
15:            Calculate parameters P, Ps, C2, Xstep
16:            Update position using whale-fall mechanism (Equation (8))
17:            if Ps > P then
18:                Calculate new position X^(T+1)
19:            else
20:                Calculate X^(T+1)_new using reverse learning strategy
21:            end if
22:            Apply cooperative optimization using Equation (9)
23:            to assist weaker individuals in improving their solutions
24:        end if
25:    end if
26:    Evaluate fitness of updated beluga population
27:    Update global best solution X*
28:    T = T + 1
29: end while
30: Initialize COA population around the best BWOA solution X*
31: Set COA iteration counter C = 1
32: while C < Cmax do
33:     Perform local exploitation using coati foraging and climbing behavior
34:     Update coati positions to refine selected features
35:     Evaluate fitness and remove redundant features
36:     C = C + 1
37: end while
38: Output the best feature subset obtained from BWOA–COA hybrid optimization
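
The control flow of Algorithm 1 (a global stage gated by Bf and Wf, followed by a local stage seeded at the best solution) can be sketched in Python as below. This is only an illustrative skeleton under stated assumptions, not the authors' implementation: the position updates are simplified placeholders rather than Equations (8) and (9), the forms used for Bf and Wf are the ones commonly given for beluga whale optimization and may differ from Equations (4) and (5), and `toy_fitness`, `to_mask`, `hybrid_search`, and the `relevant` mask are hypothetical names introduced here. A real wrapper would replace `toy_fitness` with a classifier-based error.

```python
import math
import random

def to_mask(position):
    # Assumed threshold binarization: a feature is selected when its coordinate is positive.
    return [1 if x > 0 else 0 for x in position]

def toy_fitness(mask, relevant, alpha=0.9):
    # Hypothetical stand-in for classifier error: share of features whose state
    # disagrees with a known "relevant" mask, plus a subset-size penalty (lower is better).
    error = sum(m != r for m, r in zip(mask, relevant)) / len(mask)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)

def hybrid_search(relevant, n_agents=10, t_max=30, c_max=30, seed=0):
    rng = random.Random(seed)
    lb, ub = -1.0, 1.0
    dim = len(relevant)

    def score(pos):
        return toy_fitness(to_mask(pos), relevant)

    def clip(x):
        return max(lb, min(ub, x))

    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    best = min(pop, key=score)[:]
    history = [score(best)]

    # Stage 1: global search gated by the balance factor Bf and whale-fall factor Wf.
    for t in range(1, t_max + 1):
        wf = 0.1 - 0.05 * t / t_max                    # assumed form of Wf
        for i in range(n_agents):
            x = pop[i]
            bf = rng.random() * (1 - t / (2 * t_max))  # assumed form of Bf
            if bf > 0.5:
                # Placeholder exploration: move relative to a randomly chosen peer.
                peer = rng.choice(pop)
                cand = [clip(a + rng.uniform(-1, 1) * (p - a)) for a, p in zip(x, peer)]
            else:
                # Placeholder exploitation: move toward the current best.
                cand = [clip(a + rng.random() * (b - a)) for a, b in zip(x, best)]
                if bf < wf:
                    # Placeholder whale fall: a random jump that shrinks over time.
                    shrink = math.exp(-2 * t / t_max)
                    cand = [clip(c + rng.uniform(lb, ub) * shrink) for c in cand]
            pop[i] = cand
            if score(cand) < score(best):
                best = cand[:]
        history.append(score(best))

    # Stage 2: COA-style local refinement around the stage-1 best, with greedy acceptance
    # and a search radius that narrows as the counter c grows.
    coatis = [best[:]] + [[clip(b + rng.gauss(0, 0.1)) for b in best] for _ in range(n_agents - 1)]
    for c in range(1, c_max + 1):
        for i, x in enumerate(coatis):
            cand = [clip(a + (1 - 2 * rng.random()) * (lb / c + rng.random() * (ub / c - lb / c)))
                    for a in x]
            if score(cand) < score(x):
                coatis[i] = cand
        best = min(coatis + [best], key=score)[:]
        history.append(score(best))
    return to_mask(best), history

relevant = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # hypothetical ground-truth mask for the toy fitness
mask, history = hybrid_search(relevant)
print(mask, history[0], history[-1])
```

The best fitness recorded in `history` never increases, because a candidate replaces the incumbent only when it scores strictly better in either stage.
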
5. Result Analysis
- A. Dataset Description
| Dataset | Project Description | No. of Modules | No. of Metrics | Defective Modules (%) | Key Characteristics |
|---|---|---|---|---|---|
| CM1 | Spacecraft instrument software | ~505 | 21 | ~9–10% | Small dataset, highly imbalanced |
| KC1 | Storage management system | ~2109 | 21 | ~15–16% | Medium size, correlated metrics |
| KC3 | Storage system (variant) | ~194 | 39 | ~18–19% | High dimensional, small sample |
| MC1 | Mission-critical software | ~9466 | 39 | ~6–7% | Large dataset, severe imbalance |
| MC2 | Mission-critical system (variant) | ~161 | 39 | ~30–32% | Small dataset, relatively balanced |
| MW1 | Satellite ground software | ~403 | 38 | ~7–8% | Sparse defective samples |
| PC1 | Flight software | ~1109 | 21 | ~6–7% | Medium size, real-world project |
| PC2 | Flight control system | ~5589 | 37 | ~0.4–1% | Extremely imbalanced |
| PC3 | Satellite flight software | ~1125 | 38 | ~12–13% | Moderate imbalance |
| PC4 | Updated satellite software | ~1458 | 38 | ~12–13% | Improved data quality |
| JM1 | Real-time predictive system | ~10885 | 21 | ~19–20% | Large, noisy, imbalanced |
- B. Experimental Condition
- C. Baseline Model
- D. Experimental Result

6. Statistical Analysis
| Dataset | Comparison | Mean Diff | SD | SE | t-Value | df | p-Value | Significant (α = 0.05)? |
|---|---|---|---|---|---|---|---|---|
| CM1 | BWOA vs BWOA+COA | 0.0228 | 0.0012 | 0.0006 | 38 | 3 | p < 0.0001 | Yes |
| CM1 | COA vs BWOA+COA | 0.0166 | 0.0019 | 0.00095 | 17.5 | 3 | p < 0.001 | Yes |
| KC1 | BWOA vs BWOA+COA | 0.0137 | 0.0016 | 0.0008 | 17.1 | 3 | p < 0.001 | Yes |
| KC1 | COA vs BWOA+COA | 0.0143 | 0.0062 | 0.0031 | 4.6 | 3 | p < 0.02 | Yes |
| KC3 | BWOA vs BWOA+COA | 0.0158 | 0.0027 | 0.00135 | 11.7 | 3 | p < 0.001 | Yes |
| KC3 | COA vs BWOA+COA | 0.0102 | 0.0019 | 0.00095 | 10.7 | 3 | p < 0.01 | Yes |
| MC2 | BWOA vs BWOA+COA | 0.0202 | 0.003 | 0.0015 | 13.5 | 3 | p < 0.0001 | Yes |
| MC2 | COA vs BWOA+COA | 0.01095 | 0.0007 | 0.00035 | 31.3 | 3 | p < 0.0001 | Yes |
| PC3 | BWOA vs BWOA+COA | 0.0291 | 0.013 | 0.0065 | 4.5 | 3 | p ≈ 0.02 | Yes |
| PC3 | COA vs BWOA+COA | 0.0163 | 0.0086 | 0.0043 | 3.8 | 3 | p ≈ 0.03 | Yes |
| PC4 | BWOA vs BWOA+COA | 0.0603 | 0.0048 | 0.0024 | 25.1 | 3 | p < 0.0001 | Yes |
| PC4 | COA vs BWOA+COA | 0.0413 | 0.0017 | 0.00085 | 48.6 | 3 | p < 0.0001 | Yes |
| MW1 | BWOA vs BWOA+COA | 0.058 | 0.0087 | 0.00435 | 13.3 | 3 | p < 0.001 | Yes |
| MW1 | COA vs BWOA+COA | 0.0298 | 0.0033 | 0.00165 | 18.1 | 3 | p < 0.001 | Yes |
| PC1 | BWOA vs BWOA+COA | 0.0658 | 0.0055 | 0.0028 | 23.5 | 3 | p < 0.0001 | Yes |
| PC1 | COA vs BWOA+COA | 0.0343 | 0.0036 | 0.0018 | 19.1 | 3 | p < 0.0001 | Yes |
| PC2 | BWOA vs BWOA+COA | 0.078 | 0.0105 | 0.00525 | 14.9 | 3 | p < 0.001 | Yes |
| PC2 | COA vs BWOA+COA | 0.036 | 0.0017 | 0.00085 | 42.4 | 3 | p < 0.0001 | Yes |
| MC1 | BWOA vs BWOA+COA | 0.077 | 0.0077 | 0.00385 | 20 | 3 | p < 0.001 | Yes |
| MC1 | COA vs BWOA+COA | 0.0383 | 0.0009 | 0.00045 | 85.1 | 3 | p < 0.0001 | Yes |
| JM1 | BWOA vs BWOA+COA | 0.077 | 0.0077 | 0.00385 | 20 | 3 | p < 0.001 | Yes |
| JM1 | COA vs BWOA+COA | 0.0383 | 0.0009 | 0.00045 | 85.1 | 3 | p < 0.0001 | Yes |
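
The columns of the paired t-test table are linked by SE = SD/√n and t = Mean Diff/SE, with df = n − 1 = 3 for the four classifiers. The sketch below works through the CM1 comparison of BWOA against BWOA+COA using the per-classifier accuracies from the results table. The function name `paired_t` is introduced here for illustration. Because the inputs are accuracies rounded to three decimals, the computed values can differ slightly from the tabulated ones (for example, a mean difference of 0.0225 here against 0.0228 in the table).

```python
import math
import statistics

# CM1 accuracies from the results table, ordered as
# Decision Tree, SVM, KNN, Naive Bayes.
bwoa = [0.851, 0.898, 0.901, 0.882]
hybrid = [0.872, 0.921, 0.925, 0.904]

def paired_t(before, after):
    """Return (mean difference, SD, SE, t, df) for paired samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)   # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)         # standard error of the mean difference
    return mean_diff, sd, se, mean_diff / se, n - 1

mean_diff, sd, se, t_value, df = paired_t(bwoa, hybrid)
print(f"mean diff={mean_diff:.4f} SD={sd:.4f} SE={se:.5f} t={t_value:.1f} df={df}")
```
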
7. Conclusions
8. Conflict-of-Interest Statement
Author Contributions Statement
9. Data Availability Statement
References
- Qiu, S.; He, J.; Wang, Y.; E, B. A Feature Selection Method for Software Defect Prediction Based on Improved Beluga Whale Optimization Algorithm. Comput. Mater. Contin. 2025, 83(3), 4879–4898.
- Ghaedi; Bardsiri, A. K.; Shahbazzadeh, M. J. Software Failure Prediction Based on Game Theory and Convolutional Neural Network Optimized by Cat Hunting Optimization (CHO) Algorithm. Management Strategies and Engineering Sciences 2025, 7(1), 34–55.
- Pethe, Y. S.; Gourisaria, M. K.; Singh, P. K.; Das, H. FSBOA: feature selection using bat optimization algorithm for software fault detection. Discover Internet of Things 2024, 4(1), Art. no. 17.
- Rathi, S. C.; Misra, S.; Colomo-Palacios, R.; Adarsh, R.; Neti, L. B. M.; Kumar, L. Empirical evaluation of the performance of data sampling and feature selection techniques for software fault prediction. Expert Syst. Appl. 2023, 223, Art. no. 119806.
- Goyal, S.; Bhatia, P. K. Software fault prediction using lion optimization algorithm. Int. J. Inf. Technol. 2021, 13(6), 2185–2190.
- Kumar, H.; Das, H. Cost-Effective Prediction Model for Optimal Selection of Software Faults Using Coati Optimization Algorithm. SN Comput. Sci. 2025, 6, Art. no. 420.
- Hassouneh, Y.; Turabieh, H.; Thaher, T.; Tumar, I.; Chantar, H.; Too, J. Boosted Whale Optimization Algorithm with Natural Selection Operators for Software Fault Prediction. IEEE Access 2021, 9, 14238–14258.
- Das, H.; Prajapati, S.; Gourisaria, M. K.; Pattanayak, R. M.; Alameen, A.; Kolhar, M. Feature Selection Using Golden Jackal Optimization for Software Fault Prediction. Mathematics 2023, 11(11), Art. no. 2438.
- Medicharla, S.; Kumar, S.; Devarakonda, P.; Agrawalla, B.; Reddy, B. R. Software Fault Prediction Using FeatBoost Feature Selection Algorithm. Procedia Comput. Sci. 2024, 235, 316–325.
- Das, H. Enhancing Software Fault Prediction Through Feature Selection With Spider Wasp Optimization Algorithm. IEEE Access 2024, 12, 105312–105325.
- Febrian, M. M.; Saputro, S. W.; Saragih, T. H.; Abadi, F.; Herteno, R. Hybrid Feature Selection and Balancing Data Approach for Improved Software Defect Prediction. Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics 2025, 6(3), 232–244.
- Akbar, M.; Herteno, R.; Saputro, S. W.; Faisal, M. R.; Nugroho, R. A. Enhancing Software Defect Prediction through Hybrid Optimization for Feature Selection and Gradient Boosting Classification. J. Electron. Electromed. Eng. Med. Informatics 2024, 6(2), 169–181.
- Balogun; Bajeh, A. O.; Orie, V. A.; Yusuf-Asaju, A. W. Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method. FUOYE Journal of Engineering and Technology 2018, 3(2).
- Balogun. Empirical analysis of rank aggregation-based multi-filter feature selection methods in software defect prediction. Electronics (Switzerland) 2021, 10(2), 1–16.
- Iqbal; Aftab, S. A classification framework for software defect prediction using multi-filter feature selection technique and MLP. International Journal of Modern Education and Computer Science 2020, 12(1), 18–25.
- Sharma, T.; Bhaskar, S.; Jatain, A.; Pabreja, K. Optimizing Software Defect Detection using Advanced Feature Selection, Ensemble Learning, and Class Imbalance Solutions. Library Progress International 2024. Available online: www.bpasjournals.com.
- Kumar, R.; Kaur, K. A Comparative Analysis of Techniques, Datasets, Feature Selection Methods, and Evaluation Metrics in Software Fault Prediction. Int. J. Emerg. Sci. Eng. 2025, 13(8), 1–9.





| Sr. No | Dataset | Classifier | BWOA Accuracy | BWOA No. of Selected Features | COA Accuracy | COA No. of Selected Features | BWOA+COA Accuracy | BWOA+COA No. of Selected Features |
|---|---|---|---|---|---|---|---|---|
| 1 | CM1 | Decision Tree | 0.851 | 10 | 0.858 | 6 | 0.872 | 9 |
| 1 | CM1 | SVM | 0.898 | 13 | 0.903 | 9 | 0.921 | 9 |
| 1 | CM1 | KNN | 0.901 | 15 | 0.906 | 12 | 0.925 | 12 |
| 1 | CM1 | Naive Bayes | 0.882 | 4 | 0.889 | 6 | 0.904 | 7 |
| 1 | CM1 | Average | 0.883 | 10.5 | 0.889 | 8.25 | 0.906 | 9.25 |
| 2 | KC1 | Decision Tree | 0.892 | 14 | 0.899 | 11 | 0.907 | 11 |
| 2 | KC1 | SVM | 0.910 | 13 | 0.915 | 11 | 0.922 | 11 |
| 2 | KC1 | KNN | 0.904 | 10 | 0.898 | 7 | 0.918 | 7 |
| 2 | KC1 | Naive Bayes | 0.924 | 5 | 0.916 | 9 | 0.936 | 6 |
| 2 | KC1 | Average | 0.9077 | 10.5 | 0.9071 | 9.5 | 0.9214 | 8.75 |
| 3 | KC3 | Decision Tree | 0.818 | 19 | 0.825 | 17 | 0.838 | 23 |
| 3 | KC3 | SVM | 0.852 | 17 | 0.858 | 20 | 0.867 | 22 |
| 3 | KC3 | KNN | 0.834 | 17 | 0.839 | 18 | 0.848 | 19 |
| 3 | KC3 | Naive Bayes | 0.821 | 19 | 0.826 | 17 | 0.835 | 14 |
| 3 | KC3 | Average | 0.8317 | 18 | 0.8373 | 18 | 0.8474 | 19.5 |
| 4 | MC2 | Decision Tree | 0.742 | 20 | 0.754 | 16 | 0.766 | 19 |
| 4 | MC2 | SVM | 0.748 | 19 | 0.756 | 19 | 0.768 | 18 |
| 4 | MC2 | KNN | 0.732 | 17 | 0.739 | 22 | 0.749 | 19 |
| 4 | MC2 | Naive Bayes | 0.735 | 25 | 0.743 | 13 | 0.754 | 21 |
| 4 | MC2 | Average | 0.7395 | 20.25 | 0.7488 | 17.5 | 0.7597 | 19.25 |
| 5 | PC3 | Decision Tree | 0.842 | 16 | 0.850 | 20 | 0.861 | 18 |
| 5 | PC3 | SVM | 0.848 | 16 | 0.871 | 19 | 0.889 | 20 |
| 5 | PC3 | KNN | 0.872 | 17 | 0.879 | 18 | 0.888 | 16 |
| 5 | PC3 | Naive Bayes | 0.758 | 16 | 0.771 | 17 | 0.799 | 13 |
| 5 | PC3 | Average | 0.8305 | 16.25 | 0.8433 | 18.5 | 0.8596 | 16.75 |
| 6 | PC4 | Decision Tree | 0.842 | 20 | 0.861 | 16 | 0.902 | 11 |
| 6 | PC4 | SVM | 0.871 | 19 | 0.889 | 16 | 0.931 | 12 |
| 6 | PC4 | KNN | 0.868 | 18 | 0.884 | 17 | 0.923 | 8 |
| 6 | PC4 | Naive Bayes | 0.849 | 17 | 0.872 | 16 | 0.915 | 10 |
| 6 | PC4 | Average | 0.8575 | 18.5 | 0.8765 | 16.25 | 0.9177 | 10.25 |
| 7 | MW1 | Decision Tree | 0.892 | 5 | 0.915 | 9 | 0.942 | 14 |
| 7 | MW1 | SVM | 0.905 | 9 | 0.928 | 8 | 0.962 | 12 |
| 7 | MW1 | KNN | 0.876 | 8 | 0.902 | 10 | 0.931 | 16 |
| 7 | MW1 | Naive Bayes | 0.848 | 10 | 0.889 | 7 | 0.918 | 10 |
| 7 | MW1 | Average | 0.88025 | 8 | 0.9085 | 8.5 | 0.93825 | 13 |
| 8 | PC1 | Decision Tree | 0.885 | 14 | 0.912 | 6 | 0.948 | 10 |
| 8 | PC1 | SVM | 0.903 | 11 | 0.928 | 7 | 0.965 | 13 |
| 8 | PC1 | KNN | 0.872 | 13 | 0.901 | 9 | 0.936 | 15 |
| 8 | PC1 | Naive Bayes | 0.844 | 10 | 0.889 | 10 | 0.918 | 4 |
| 8 | PC1 | Average | 0.876 | 12 | 0.9075 | 8 | 0.94175 | 10.5 |
| 9 | PC2 | Decision Tree | 0.882 | 10 | 0.914 | 11 | 0.952 | 9 |
| 9 | PC2 | SVM | 0.903 | 12 | 0.936 | 11 | 0.971 | 10 |
| 9 | PC2 | KNN | 0.861 | 14 | 0.907 | 13 | 0.944 | 14 |
| 9 | PC2 | Naive Bayes | 0.832 | 15 | 0.889 | 13 | 0.923 | 16 |
| 9 | PC2 | Average | 0.8695 | 12.75 | 0.9115 | 12 | 0.9475 | 12.25 |
| 10 | MC1 | Decision Tree | 0.885 | 14 | 0.918 | 15 | 0.957 | 13 |
| 10 | MC1 | SVM | 0.904 | 15 | 0.936 | 14 | 0.974 | 12 |
| 10 | MC1 | KNN | 0.872 | 14 | 0.914 | 14 | 0.951 | 10 |
| 10 | MC1 | Naive Bayes | 0.841 | 13 | 0.889 | 10 | 0.928 | 10 |
| 10 | MC1 | Average | 0.8755 | 14 | 0.91425 | 13.25 | 0.9525 | 11.25 |
| 11 | JM1 | Decision Tree | 0.875 | 10 | 0.915 | 10 | 0.955 | 14 |
| 11 | JM1 | SVM | 0.892 | 9 | 0.931 | 12 | 0.968 | 12 |
| 11 | JM1 | KNN | 0.861 | 16 | 0.902 | 13 | 0.941 | 13 |
| 11 | JM1 | Naive Bayes | 0.832 | 15 | 0.874 | 12 | 0.913 | 10 |
| 11 | JM1 | Average | 0.865 | 12.5 | 0.9055 | 11.75 | 0.94425 | 12.25 |

| Compared Algorithms (difference in average accuracy) | CM1 | KC1 | KC3 | MC2 | PC3 | PC4 | MW1 | PC1 | PC2 | MC1 | JM1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (BWOA+COA) − COA | 0.016625 | 0.014325 | 0.01015 | 0.01095 | 0.016325 | 0.04125 | 0.02975 | 0.03425 | 0.036 | 0.03825 | 0.03875 |
| (BWOA+COA) − BWOA | 0.022775 | 0.0137 | 0.01575 | 0.0202 | 0.0291 | 0.06025 | 0.058 | 0.06575 | 0.078 | 0.077 | 0.07925 |

| Algorithm | CM1 | KC1 | KC3 | MC2 | PC3 | PC4 | MW1 | PC1 | PC2 | MC1 | JM1 | Total Average Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BWOA | 0.8833 | 0.9078 | 0.8317 | 0.7396 | 0.8306 | 0.8575 | 0.8803 | 0.8760 | 0.8695 | 0.8755 | 0.8650 | 0.8561 |
| COA | 0.8895 | 0.9071 | 0.8373 | 0.7488 | 0.8434 | 0.8765 | 0.9085 | 0.9075 | 0.9115 | 0.9143 | 0.9055 | 0.8773 |
| BWOA + COA | 0.9061 | 0.9215 | 0.8475 | 0.7598 | 0.8597 | 0.9178 | 0.9383 | 0.9418 | 0.9475 | 0.9525 | 0.9443 | 0.9033 |
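
The "Total Average Accuracy" column is the unweighted mean of the eleven per-dataset averages, and the gain of the hybrid over each baseline is the row-wise difference. A small sketch of that arithmetic, using the values from the table above (the variable names are introduced here for illustration):

```python
datasets = ["CM1", "KC1", "KC3", "MC2", "PC3", "PC4", "MW1", "PC1", "PC2", "MC1", "JM1"]
avg_acc = {
    "BWOA":     [0.8833, 0.9078, 0.8317, 0.7396, 0.8306, 0.8575, 0.8803, 0.8760, 0.8695, 0.8755, 0.8650],
    "COA":      [0.8895, 0.9071, 0.8373, 0.7488, 0.8434, 0.8765, 0.9085, 0.9075, 0.9115, 0.9143, 0.9055],
    "BWOA+COA": [0.9061, 0.9215, 0.8475, 0.7598, 0.8597, 0.9178, 0.9383, 0.9418, 0.9475, 0.9525, 0.9443],
}

# Unweighted mean over the eleven datasets for each algorithm.
overall = {name: sum(values) / len(values) for name, values in avg_acc.items()}

# Per-dataset accuracy gain of the hybrid over each baseline.
gain_over = {
    base: [round(h - b, 4) for h, b in zip(avg_acc["BWOA+COA"], avg_acc[base])]
    for base in ("BWOA", "COA")
}

for name, value in overall.items():
    print(f"{name}: {value:.4f}")
for base, gains in gain_over.items():
    print(base, dict(zip(datasets, gains)))
```

Every entry in `gain_over` is positive, that is, the hybrid has the highest average accuracy on all eleven datasets.
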

| Hyperparameter | BWOA | COA | BWOA+COA |
|---|---|---|---|
| Population size | 30 | 30 | 30 |
| Number of iterations | 50 | 50 | 50 |
| alpha | - | - | 1 |
| beta | 1 | - | - |
| rho | - | 0.2 | - |
| Wmin | - | 0.4 | - |
| C1 | - | - | 2 |
| Wmax | - | - | 0.9 |
| C2 | - | 2 | - |
| MR | 0.01 | - | - |
| SF | - | - | 0.8 |

| Dataset | χ² | df | p-Value | p < α (α = 0.05) | Significant |
|---|---|---|---|---|---|
| CM1 | 8 | 2 | 0.0183 | Yes | Yes |
| KC1 | 6 | 2 | 0.0498 | Yes | Yes |
| KC3 | 8 | 2 | 0.0183 | Yes | Yes |
| MC2 | 8 | 2 | 0.0183 | Yes | Yes |
| PC3 | 8 | 2 | 0.0183 | Yes | Yes |
| PC4 | 8 | 2 | 0.0183 | Yes | Yes |
| MW1 | 8 | 2 | 0.0183 | Yes | Yes |
| PC1 | 8 | 2 | 0.0183 | Yes | Yes |
| PC2 | 8 | 2 | 0.0183 | Yes | Yes |
| MC1 | 8 | 2 | 0.0183 | Yes | Yes |
| JM1 | 8 | 2 | 0.0183 | Yes | Yes |
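
The Friedman statistics above can be checked by hand. The sketch below treats the four classifiers as blocks and the three algorithms as treatments, ranks the accuracies within each block, and evaluates χ² = 12/(nk(k+1)) · ΣR_j² − 3n(k+1). For df = 2 the chi-square tail probability is exactly exp(−χ²/2), so no statistics library is needed. The function name is introduced here for illustration, and no tie correction is applied (the data used have no ties).

```python
import math

def friedman_chi2(blocks):
    """Friedman statistic for blocks (rows) that each hold one score per treatment."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1   # tied scores share the average rank
            for j in order[pos:end + 1]:
                rank_sums[j] += avg_rank
            pos = end + 1
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Columns: BWOA, COA, BWOA+COA; rows: Decision Tree, SVM, KNN, Naive Bayes.
cm1 = [(0.851, 0.858, 0.872), (0.898, 0.903, 0.921), (0.901, 0.906, 0.925), (0.882, 0.889, 0.904)]
kc1 = [(0.892, 0.899, 0.907), (0.910, 0.915, 0.922), (0.904, 0.898, 0.918), (0.924, 0.916, 0.936)]

for name, data in (("CM1", cm1), ("KC1", kc1)):
    chi2 = friedman_chi2(data)
    p_value = math.exp(-chi2 / 2)   # exact chi-square survival function for df = 2
    print(f"{name}: chi2={chi2:.1f} p={p_value:.4f}")
```

This reproduces χ² = 8 (p ≈ 0.0183) for CM1 and χ² = 6 (p ≈ 0.0498) for KC1. With four blocks and three treatments, χ² = 8 is the largest value the statistic can take, which is why the same value appears for every dataset on which the three algorithms keep the same ordering across all classifiers.
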
| Comparison | p-Value | Holm α | Significant |
|---|---|---|---|
| BWOA+COA vs COA | 0.003 | 0.025 | Yes |
| BWOA+COA vs BWOA | 0.003 | 0.05 | Yes |
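
The Holm thresholds in the table follow the step-down rule: with m = 2 comparisons, the smallest p-value is compared with α/2 = 0.025 and the next with α/1 = 0.05. The sketch below applies that rule and also shows one way the p-values themselves could arise, assuming they come from a Wilcoxon signed-rank test over the eleven per-dataset average accuracies with the normal approximation (this assumption is not stated in the table, and the function names are introduced here for illustration).

```python
import math

def wilcoxon_normal_p(x, y):
    """Two-sided Wilcoxon signed-rank p-value, normal approximation.
    No continuity correction; tied |differences| get ordinal ranks."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(ranked, start=1) if diffs[i] > 0)
    w = min(w_plus, n * (n + 1) / 2 - w_plus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return math.erfc(abs(w - mu) / sigma / math.sqrt(2))

def holm(p_values, alpha=0.05):
    """Holm step-down: the i-th smallest p-value (i = 0, 1, ...) is compared with alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    results, still_rejecting = [None] * m, True
    for step, i in enumerate(order):
        threshold = alpha / (m - step)
        still_rejecting = still_rejecting and p_values[i] <= threshold
        results[i] = (threshold, still_rejecting)
    return results

# Per-dataset average accuracies (CM1 ... JM1) from the summary table.
bwoa = [0.8833, 0.9078, 0.8317, 0.7396, 0.8306, 0.8575, 0.8803, 0.8760, 0.8695, 0.8755, 0.8650]
coa = [0.8895, 0.9071, 0.8373, 0.7488, 0.8434, 0.8765, 0.9085, 0.9075, 0.9115, 0.9143, 0.9055]
hybrid = [0.9061, 0.9215, 0.8475, 0.7598, 0.8597, 0.9178, 0.9383, 0.9418, 0.9475, 0.9525, 0.9443]

p_vs_coa = wilcoxon_normal_p(hybrid, coa)
p_vs_bwoa = wilcoxon_normal_p(hybrid, bwoa)
decisions = holm([0.003, 0.003])   # the p-values reported in the table
print(round(p_vs_coa, 4), round(p_vs_bwoa, 4), decisions)
```

Because the hybrid wins on all eleven datasets in both comparisons, both approximate p-values come out near 0.003, consistent with the table, and both hypotheses are rejected under the Holm thresholds.
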

| Study | Year | Method | Average Accuracy |
|---|---|---|---|
| Balogun et al. [13] | 2018 | CfsSubsetEval Bagged KNN | 0.759 |
| Balogun et al. [14] | 2021 | RMFFS NB CS | 0.746 |
| Iqbal and Aftab [15] | 2020 | MLP MFFS ROS | 0.817 |
| Sharma et al. [16] | 2024 | SMOTE MI RFE CV PCA KNN | 0.72 |
| Febrian et al. [11] | 2025 | PSO SMOTE CS CFS CMFS | 0.872 |
| Akbar et al. [12] | 2024 | HGWOPSO-CatBoost | 0.8949 |
| Proposed Method | New | BWOA+COA | 0.9033 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).