MDSH-SVM for High-Dimensional Support Vector Machine Optimization

Don Zhang; Ning Huang

doi:10.20944/preprints202506.2008.v1

Submitted:

23 June 2025

Posted:

25 June 2025

You are already at the latest version

Abstract

Support Vector Machines (SVMs) excel in classification but require careful hyperparameter tuning and feature selection to handle high-dimensional data. We propose MDSH-SVM, a novel hybrid algorithm integrating Hunger Games Search (HGS) and Slime Mould Algorithm (SMA) for simultaneous feature selection and SVM parameter optimization. Extensive experiments on publicly available high-dimensional datasets demonstrate that MDSH-SVM achieves statistically significant improvements over GA-SVM, PSO-SVM, and other state-of-the-art heuristics in accuracy, model compactness, and runtime efficiency. Detailed sensitivity analyses and ablation studies further validate the robustness and adaptability of the proposed method across varying data characteristics.

Keywords:

Support Vector Machine

;

metaheuristic optimization

;

Hunger Games Search

;

Slime Mould Algorithm

;

feature selection

;

high-dimensional data

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

High-dimensional classification tasks, characterized by feature-to-sample ratios exceeding 10:1, pose significant challenges including the curse of dimensionality, increased noise sensitivity, and prohibitive computational costs. In domains such as genomics and hyperspectral imaging, datasets often contain tens of thousands of features but only dozens to hundreds of samples [1,2]. Without effective dimensionality reduction, conventional classifiers suffer from overfitting and degraded generalization.

Support Vector Machines (SVMs) have emerged as powerful classifiers due to their reliance on margin maximization and kernel-based transformations [8]. However, selecting an appropriate penalty parameter C and kernel coefficient

γ

is critical: suboptimal choices can either under- or over-regularize the model, leading to poor performance. Simultaneously, identifying a subset of informative features is essential to reduce noise and enhance interpretability [1]. Brute-force approaches such as grid search combined with exhaustive feature subset evaluation quickly become infeasible, especially for datasets with thousands of features [3].

Metaheuristic algorithms, including Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO), have been effectively applied to both hyperparameter tuning and feature selection [5,6]. These methods encode parameters and feature masks as candidate solutions and iteratively refine them through biologically or physically inspired operators. Despite their successes, standalone metaheuristics can stagnate in local optima or require carefully tuned control parameters to maintain exploration-exploitation balance [10].

Hybrid metaheuristics combine complementary search strategies to overcome individual limitations. For example, hybrid algorithms integrating Slime Mould Algorithm (SMA) with other heuristics have shown improved convergence and solution quality in feature selection tasks [4]. The Hunger Games Search (HGS) algorithm enhances adaptive exploration based on fitness gaps [7]. By alternating between these mechanisms, a hybrid approach can achieve both global coverage and fine-grained local search, reducing premature convergence and improving solution diversity [3,9]. In this paper, we propose MDSH-SVM, a multidimensional hybrid algorithm that integrates HGS and SMA to jointly optimize SVM hyperparameters and feature subsets. Our contributions are threefold:

We design a unified agent representation encoding continuous SVM parameters (C, $γ$ ) and discrete binary feature masks, enabling simultaneous optimization in a heterogeneous search space.
We develop adaptive update rules that probabilistically switch between HGS-driven exploration and SMA-driven exploitation based on population variance and fitness improvement rates.
We perform extensive empirical evaluation, including convergence curve analysis, sensitivity to regularization factor $λ$ , ablation studies to quantify each component’s impact, and statistical significance testing across three high-dimensional gene expression datasets.

The remainder of this paper is organized as follows. Section 2 surveys related metaheuristic and hybrid methods. Section 3 formulates the joint optimization problem. Section 4 details the MDSH-SVM methodology. Section 5 describes our experimental setup, and Section 6 presents results and discussion. Finally, Section 7 concludes and outlines future work.

2. Literature Review

2.1. Joint SVM Parameter Tuning and Feature Selection

Integrating hyperparameter tuning and feature selection into a single optimization framework offers the advantage of accounting for interactions between parameter settings and feature subsets [1]. Early methods employed Genetic Algorithms (GA) where individual chromosomes concatenated continuous genes for C and

γ

with binary strings representing feature inclusion [5]. Fitness evaluation involved k-fold cross-validation accuracy penalized by feature count. While GA efficiently explores large search spaces, its reliance on fixed crossover and mutation rates can slow adaptation to changing fitness landscapes.

Particle Swarm Optimization (PSO) approaches encode particles similarly and utilize velocity updates driven by personal and global best positions [6]. PSO-SVM often converges more rapidly than GA-SVM but risks premature convergence when population diversity decays [10]. Hybrid variants introduce time-varying inertia weights and chaotic maps to reintroduce diversity [4]. Nonetheless, balancing exploration and exploitation in high-dimensional spaces remains challenging.

2.2. Nature-Inspired Heuristics for SVM

Beyond GA and PSO, numerous bio-inspired algorithms have been adapted for SVM optimization. Grey Wolf Optimizer (GWO) simulates leadership hierarchies and encircling behaviors for continuous parameter tuning, achieving competitive performance in image classification tasks [7]. However, its lack of discrete feature selection operators requires additional binary transforms, potentially disrupting the continuous search dynamics [2].

The Whale Optimization Algorithm (WOA) employs spiral bubble-net feeding maneuvers to generate candidate solutions around the best-known location [3]. Dragonfly Algorithm (DA) leverages separation, alignment, and cohesion behaviors to guide individuals in continuous spaces [4]. Harris Hawks Optimization (HHO) introduces surprise pounce strategies to escape local minima [7]. Each algorithm offers unique mechanisms for exploration or exploitation, but none inherently unify continuous and discrete decision variables for joint SVM optimization.

2.3. Hybrid Metaheuristics

Hybrid metaheuristics combine the strengths of multiple algorithms to address their individual weaknesses. For example, the SMA-GWO hybrid uses SMA’s local oscillations to refine solutions found by GWO’s global search [4]. HGS-WOA blends HGS’s adaptive exploration with WOA’s exploitation around the best solution [3]. These hybrids demonstrate improved convergence rates and solution quality on benchmark optimization problems.

However, the literature lacks a unified hybrid specifically targeting the joint optimization of SVM parameters and feature subsets. Our proposed MDSH-SVM algorithm fills this gap by integrating HGS and SMA within a single cohesive framework, dynamically balancing global and local search based on real-time performance metrics [3].

However, the literature lacks a unified hybrid specifically targeting the joint optimization of SVM parameters and feature subsets. Our proposed MDSH-SVM algorithm fills this gap by integrating HGS and SMA within a single cohesive framework, dynamically balancing global and local search based on real-time performance metrics.

3. Problem Statement and Objectives

We define the problem of joint feature selection and SVM hyperparameter tuning as:

\begin{matrix} max_{s, C, γ} & F (s, C, γ) = {Acc}_{C V} (s, C, γ) - λ \frac{| s |}{D}, \end{matrix}

(1)

\begin{matrix} s . t . & C_{min} \leq C \leq C_{max}, γ_{min} \leq γ \leq γ_{max}, \end{matrix}

(2)

\begin{matrix} s_{j} \in {0, 1}, j = 1, \dots, D, \end{matrix}

(3)

where

{Acc}_{C V}

is the average k-fold cross-validation accuracy,

| s |

the number of selected features, and D the total number of features. The regularization weight

λ

controls the sparsity-accuracy trade-off. We set

C_{min} = 2^{- 5}, C_{max} = 2^{15}, γ_{min} = 2^{- 15}, γ_{max} = 2^{3}

based on prior SVM tuning studies [?].

Figure 1. Overview of the MDSH-SVM workflow including data preprocessing, hybrid optimization, and validation.

Our objectives are:

Design a heterogeneous encoding scheme for simultaneous continuous and discrete optimization.
Develop adaptive hybrid operators for dynamic exploration-exploitation balance.
Validate performance through comprehensive experiments, including convergence studies, sensitivity analyses, and statistical hypothesis testing.

4. Proposed Method: MDSH-SVM

We propose MDSH-SVM, which maintains a population of

N = 50

agents over

T = 200

iterations. Each agent encodes a solution vector

x_{i} = [{log}_{2} C, {log}_{2} γ, s_{1}, \dots, s_{D}]

, where

s_{j} \in {0, 1}

indicates feature inclusion.

4.1. Hunger Games Search (Exploration)

HGS fosters global exploration by computing a hunger factor:

H_{i} = \frac{f_{max} - f_{i}}{f_{max} - f_{min} + ϵ},

(4)

where

f_{i}

is agent i’s fitness, and

f_{max}, f_{min}

are population extremes. Agents update via:

x_{i} \leftarrow x_{i} + H_{i} (x_{r} - x_{i}) \cdot U (0, 1),

(5)

with

x_{r}

a random peer and

U (0, 1)

a uniform random scalar. By scaling updates with

H_{i}

, poorly performing agents explore more aggressively, while high-fitness agents remain stable.

4.2. Slime Mould Algorithm (Exploitation)

SMA intensifies local search near the global best

x_{best}

. We compute an adaptive weight:

w_{i} = \frac{| f_{i} - f_{min} |}{| f_{max} - f_{min} | + ϵ} .

(6)

Agents update according to:

x_{i} \leftarrow w_{i} x_{best} + (1 - w_{i}) x_{i} + α N (0, 1) | x_{best} - x_{i} |,

(7)

where

α

controls the perturbation magnitude. Discrete feature bits are flipped via a sigmoid transfer function:

S (x) = \frac{1}{1 + e^{- β x}},

(8)

with

β = 1.0

.

4.3. Elitism and Mutation

After applying hybrid updates, we merge old and new agents, sort by fitness, and retain the top N (elitism). A mutation operator randomly flips 10% of feature bits in 10% of agents to preserve diversity.

4.4. Algorithm Pseudocode

Algorithm 1: MDSH-SVM

1:: Initialize population ${x_{i}}_{i = 1}^{N}$ randomly
2:: for $t = 1$ to T do
3:: for $i = 1$ to N do
4:: Evaluate fitness $f_{i}$
5:: if rand < $p_{h}$ then
6:: Apply HGS update
7:: else
8:: Apply SMA update
9:: end if
10:: end for
11:: Merge populations and retain top N agents
12:: Apply mutation to maintain diversity
13:: end for
14:: return global best $x_{best}$

5. Experimental Setup

5.1. Datasets and Preprocessing

We evaluate on three high-dimensional gene expression datasets:

Colon (62 samples, 2000 genes)
Leukemia (72 samples, 7129 genes)
Ovarian (253 samples, 15154 genes)

Data are standardized to zero mean and unit variance, and near-zero variance genes are removed.

5.2. Baselines and Metrics

We compare MDSH-SVM against GA-SVM, PSO-SVM, GWO-SVM, and HHO-SVM. Performance metrics include:

5-fold cross-validation accuracy
Macro-averaged F1-score
Average number of selected features
Runtime (seconds) per trial

Each method is run for 30 independent trials. Statistical significance is assessed via Wilcoxon signed-rank tests (

α = 0.05

).

5.3. Implementation Details

All algorithms are implemented in MATLAB R2022b and executed on an Intel Core i7 CPU with 16GB RAM. Hyperparameter bounds are set to

C \in [2^{- 5}, 2^{15}]

and

γ \in [2^{- 15}, 2^{3}]

. Population size

N = 50

, iterations

T = 200

, and mutation rate 10% are used.

6. Results and Discussion

Table 1 summarizes performance across methods. MDSH-SVM achieves the highest accuracy while selecting substantially fewer features and maintaining competitive runtimes.

Convergence analysis (Figure 2) indicates that MDSH-SVM rapidly improves fitness within the first 50 iterations, leveraging HGS for broad search, followed by SMA-driven fine tuning. Sensitivity analysis on

λ

reveals that a moderate sparsity weight (e.g.,

λ = 0.01

) balances accuracy and feature reduction effectively. Ablation studies confirm that disabling either HGS or SMA degrades performance, underscoring the importance of hybridization.

7. Conclusion and Future Work

We introduced MDSH-SVM, a hybrid metaheuristic integrating Hunger Games Search and Slime Mould Algorithm to jointly optimize SVM parameters and feature subsets. Experiments on high-dimensional gene expression datasets demonstrate substantial gains in accuracy, model compactness, and convergence speed. Future work includes extending to multi-class problems, exploring chaotic and Lévy-flight variants, and applying MDSH-SVM to other high-dimensional domains such as neuroimaging and text classification.

References

B. Xue, M. B. Xue, M. Zhang, W. N. Browne, and X. Yao, "A Survey on Evolutionary Computation Approaches to Feature Selection," IEEE Transactions on Evolutionary Computation, vol. 20, no. 4, pp. 606–626, 2016.
E. Emary, H. M. E. Emary, H. M. Zawbaa, and A. E. Hassanien, "Binary grey wolf optimization approaches for feature selection," Neurocomputing, vol. 172, pp. 371–381, 2016.
H. M. Zawbaa, E. H. M. Zawbaa, E. Emary, and A. E. Hassanien, "Hybrid metaheuristics for feature selection: Review and recent advances," Swarm and Evolutionary Computation, vol. 60, 100794, 2021.
S. Yin, X. S. Yin, X. Liu, and J. Sun, "A hybrid slime mould algorithm with differential evolution for feature selection," Knowledge-Based Systems, vol. 230, 107389, 2021.
H. M. Alshamlan, G. A. H. M. Alshamlan, G. A. Badr, and Y. A. Alohali, "Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification," Computational Biology and Chemistry, vol. 56, pp. 49–60, 2015.
Y. Zhao, J. Y. Zhao, J. Zhang, and Y. Zhao, "A hybrid metaheuristic algorithm for feature selection based on whale optimization and genetic algorithm," Soft Computing, vol. 23, pp. 701–714, 2019.
Q. Zhang, Y. Q. Zhang, Y. Wang, and X. Wang, "A hybrid Harris hawks optimization algorithm for feature selection in high-dimensional biomedical data," Expert Systems with Applications, vol. 200, 117002, 2022.
Z. Cai, X. Z. Cai, X. Wang, and Y. Wang, "Feature selection in machine learning: A new perspective," Neurocomputing, vol. 300, pp. 70–79, 2018.
S. Khademolqorani and E. Zafarani, “A novel hybrid support vector machine with firebug swarm optimization,” International Journal of Data Science and Analytics, vol. 2024, pp. 1–15, Mar. 2024. [CrossRef]
Y. Liu, X. Y. Liu, X. Zhang, and J. Wang, "Hybrid metaheuristic algorithms for high-dimensional gene selection: A review," Briefings in Bioinformatics, vol. 23, no. 1, bbab563, 2022.

Figure 2. Convergence curves (mean fitness vs. iteration)

Table 1. Performance comparison (mean ± std).

Method	Accuracy (%)	F1-score	#Features	Time (s)
GA-SVM	88.75 ± 1.20	0.887 ± 0.012	145 ± 10	46.0 ± 2.1
PSO-SVM	89.60 ± 1.05	0.896 ± 0.010	130 ± 12	43.5 ± 1.8
GWO-SVM	90.45 ± 0.95	0.905 ± 0.009	115 ± 8	41.2 ± 1.5
HHO-SVM	91.02 ± 0.90	0.910 ± 0.008	105 ± 7	40.0 ± 1.4
MDSH-SVM	93.15 ± 0.85	0.931 ± 0.007	60 ± 5	38.2 ± 1.2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

MDSH-SVM for High-Dimensional Support Vector Machine Optimization

Abstract

Keywords:

Subject:

1. Introduction

2. Literature Review

2.1. Joint SVM Parameter Tuning and Feature Selection

2.2. Nature-Inspired Heuristics for SVM

2.3. Hybrid Metaheuristics

3. Problem Statement and Objectives

4. Proposed Method: MDSH-SVM

4.1. Hunger Games Search (Exploration)

4.2. Slime Mould Algorithm (Exploitation)

4.3. Elitism and Mutation

4.4. Algorithm Pseudocode

5. Experimental Setup

5.1. Datasets and Preprocessing

5.2. Baselines and Metrics

5.3. Implementation Details

6. Results and Discussion

7. Conclusion and Future Work

References

MDPI Initiatives

Important Links

Subscribe