Highlights
Robust Noise Mitigation via Hellinger Distance: Demonstrates a two-phase process—outlier epoch removal and channel selection—using the Hellinger Distance in the time–frequency domain, thereby effectively reducing artifacts and improving data quality.
Superior Classification Accuracy in Low-Channel Regime: Achieves around 70% accuracy with only three EEG channels, outperforming other CSP variants and highlighting suitability for portable or resource-constrained BCI applications.
Scalable Performance with Increased Channels: Maintains consistently high accuracy (exceeding 80% in some configurations) as channel count grows, indicating broad applicability across diverse EEG setups.
1. Introduction
Electroencephalography (EEG) has gained increasing attention in brain-computer interface (BCI) applications and clinical diagnostics due to its non-invasive nature and ability to capture real-time neural activity [1,2]. Despite these benefits, EEG signals are characteristically non-stationary and prone to various sources of noise, such as physiological artifacts and external electromagnetic interference. These challenges often lead to substantial variability in the recorded data, complicating the extraction of reliable features for classification or clinical interpretation.
Channel selection, a process of identifying the most informative subset of electrodes, has emerged as a key strategy for managing this complexity. By reducing the dimensionality of the data, channel selection not only streamlines computational overhead but can also enhance classification accuracy by excluding channels with predominantly noisy or irrelevant signals. Traditional approaches to channel selection rely on criteria such as mutual information or variance-based metrics; however, they may not be robust in capturing subtle changes in EEG signals’ spectral content or in handling highly non-stationary properties [3].
To address these gaps, this study proposes a novel pipeline centered on the Hellinger Distance computed over time-frequency representations (TFRs) of EEG. In brief, we first transform raw EEG signals into the time-frequency domain using the Short-Time Fourier Transform (STFT) and the Continuous Wavelet Transform (CWT). Then, we employ the Hellinger Distance to perform two critical steps: automatic detection and removal of outlier epochs, and ranking and selection of the most informative channels based on their ability to discriminate between different classes or conditions. By leveraging the Hellinger Distance for both noise mitigation and feature selection, our framework provides a unified approach to boosting classification performance.
2. Analysis
The overall goal is twofold: first, to establish a procedure for detecting and eliminating noisy epochs that may adversely affect classification performance; and second, to identify a minimal yet informative set of channels that maximizes discriminative power.
We begin by revisiting the core principles of time-frequency representations (Section 2.1), emphasizing the importance of capturing transient spectral fluctuations in EEG signals. Next, we introduce Hellinger Distance as a measure of distributional divergence (Section 2.2), illustrating its dual role in noise detection (Section 2.3) and channel selection (Section 2.4).
2.1. Time-Frequency Representation of EEG Signals
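Let the EEG signal be $X \in \mathbb{R}^{N \times C \times T}$, where $N$ is the number of samples (epochs), $C$ is the number of channels, and $T$ is the number of timepoints, and let $x(t)$ be the signal of a single epoch and channel, where $t = 1, \dots, T$.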
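Because EEG signals are non-stationary, meaning that their distribution varies over time, we analyze them by transforming them into the time-frequency domain to capture transient spectral features. Using the Short-Time Fourier Transform (STFT), the Time-Frequency Representation (TFR) of a signal $x(t)$ can be calculated as

$$\mathrm{STFT}_x(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j 2 \pi f t}\, dt,$$

where $w(t - \tau)$ is the window function centered at time $\tau$ [2]. Analogously to the STFT, the Continuous Wavelet Transform (CWT) analyzes the signal with scaled and shifted copies of a wavelet function $\psi(t)$ instead of a fixed window. In this study, the Morlet wavelet function was used, which is defined (up to a normalization constant) as

$$\psi(t) = \exp\left(-\frac{t^2}{2\sigma^2}\right) e^{j 2 \pi f_0 t},$$

where $f_0$ is the central frequency and $\sigma$ controls the time spread [4].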
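The Continuous Wavelet Transform (CWT) of a signal $x(t)$ is computed using the wavelet $\psi$ as

$$W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt,$$

where $a$ is the scale (inverse frequency) and $b$ is the time shift [5].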
2.2. Utilization of Hellinger Distance for Z-Score Calculation
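Hellinger Distance is a measure of the difference between two probability distributions. For two discrete distributions $P = (p_1, \dots, p_K)$ and $Q = (q_1, \dots, q_K)$, the Hellinger distance $H(P, Q)$ is defined as

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{k=1}^{K} \left(\sqrt{p_k} - \sqrt{q_k}\right)^2}.$$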
In this study, Hellinger distance is utilized in two different ways: noise removal and channel selection.
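A minimal sketch of the standard discrete Hellinger distance follows. The helper name `hellinger` and the small `eps` guard against division by zero are assumptions made for this example; inputs are normalized to sum to one before the distance is taken.

```python
import numpy as np

def hellinger(p, q, eps=1e-12):
    """Hellinger distance between two non-negative arrays treated as distributions."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    # Normalize so each input sums to one (eps avoids division by zero).
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

same = hellinger([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])  # non-overlapping distributions
print(same, disjoint)
```

The distance is bounded between 0 (identical distributions) and 1 (distributions with no overlap), which makes scores comparable across epochs and channels.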
2.3. Noise Removal
To calculate the Z-score for each epoch, we first compute $H_i$, where

$$H_i = H(P_i, \bar{P})$$

for each epoch $i$. Here $P_i$ denotes the normalized TFR of epoch $i$, and $\bar{P}$ represents the overall average of $P_i$. This $H_i$ serves as a measure of how discriminative epoch $i$’s features are between conditions. By subtracting $\mu_H$, which is the average of $H_i$, and dividing by $\sigma_H$, which is the standard deviation of $H_i$, we get each epoch $i$’s Z-score:

$$Z_i = \frac{H_i - \mu_H}{\sigma_H}.$$

A large Z-score indicates that the signal of a trial is extreme compared with the other epochs. We therefore treat epochs with a high Z-score as outliers (in this case, noise) and remove them.
Accordingly, we can define $X_{\mathrm{clean}}$, which contains the non-noisy trials, as

$$X_{\mathrm{clean}} = \{\, X_i \mid Z_i \le \theta \,\},$$

where $X_i$ indicates the raw time-domain EEG data corresponding to the $i$th epoch and $\theta$ is the Z-score threshold.
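The epoch-rejection step can be sketched as follows, under the assumption that each epoch is compared with the average distribution over all epochs and that epochs whose Z-score exceeds a threshold are dropped. The function name `reject_noisy_epochs`, the threshold value of 3.0, and the synthetic data are hypothetical choices for illustration only.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two already-normalized distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def reject_noisy_epochs(power, threshold=3.0):
    """Return a boolean keep-mask and the Z-scores for epochs of shape (N, bins).

    threshold is an assumed cut-off, not a value taken from the paper.
    """
    dist = power / power.sum(axis=1, keepdims=True)    # per-epoch distribution
    mean_dist = dist.mean(axis=0)                      # overall average distribution
    h = np.array([hellinger(d, mean_dist) for d in dist])
    z = (h - h.mean()) / h.std()                       # Z-score of each epoch
    return z <= threshold, z

# Synthetic example: 20 similar epochs plus one epoch with a shifted spectrum.
rng = np.random.default_rng(0)
bins = np.arange(32)
base = np.exp(-0.5 * ((bins - 8) / 3.0) ** 2) + 0.01
inliers = base * (1.0 + 0.05 * rng.random((20, 32)))
outlier = np.exp(-0.5 * ((bins - 24) / 3.0) ** 2) + 0.01
epochs = np.vstack([inliers, outlier])

keep, z = reject_noisy_epochs(epochs)
clean = epochs[keep]
print(keep.sum(), int(np.argmax(z)))
```

Note that with a single extreme outlier among N epochs the largest attainable Z-score is roughly the square root of N minus one, so a fixed threshold only makes sense when enough epochs are available.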
2.4. Channel Selection
After the noisy epochs are removed, the most informative EEG channels for classification are identified. The goal of this process is to retain channels that maximize class separability while discarding those that contribute redundant or noisy information. This is achieved by computing the Hellinger Distance between class-specific time-frequency representations (TFRs) of each EEG channel.
The TFR features extracted from each channel serve as probability distributions of spectral power. Let the normalized power spectrum of a channel $c$ for class 1 and class 2 be represented as $P_c^{(1)}$ and $P_c^{(2)}$, respectively. The Hellinger Distance for a given channel $c$ is computed as:

$$H_c = \frac{1}{\sqrt{2}} \sqrt{\sum_{k} \left(\sqrt{P_c^{(1)}(k)} - \sqrt{P_c^{(2)}(k)}\right)^2}$$

The computed Hellinger Distance $H_c$ for each channel provides a measure of how well the channel distinguishes between the two classes. A higher $H_c$ value indicates greater separability. To rank channels, we define the effect size
as:
where
is the overall mean spectral distribution across all epochs. The channels are then sorted in descending order based on
.
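The ranking idea can be sketched as below. Because the exact effect-size formula is not reproduced here, this example ranks channels directly by the per-channel Hellinger Distance between class-averaged spectra, which is a simplifying assumption and not necessarily the score used in this study. The function name `rank_channels` and the synthetic data are likewise hypothetical.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance after normalizing both inputs to sum to one."""
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def rank_channels(power, labels):
    """Rank channels by class separability.

    power:  array of shape (epochs, channels, bins) holding spectral power
    labels: array of shape (epochs,) with exactly two distinct class labels
    """
    classes = np.unique(labels)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = np.empty(power.shape[1])
    for c in range(power.shape[1]):
        p1 = power[labels == classes[0], c].mean(axis=0)  # class-1 mean spectrum
        p2 = power[labels == classes[1], c].mean(axis=0)  # class-2 mean spectrum
        scores[c] = hellinger(p1, p2)
    order = np.argsort(scores)[::-1]                      # descending separability
    return order, scores

# Synthetic example: only channel 2 carries class-dependent spectral power.
rng = np.random.default_rng(0)
power = 1.0 + 0.01 * rng.random((10, 3, 16))
labels = np.array([0] * 5 + [1] * 5)
power[labels == 1, 2, :8] += 5.0

order, scores = rank_channels(power, labels)
print(order, scores.round(3))
```

Selecting the top entries of `order` then gives the reduced channel set that is passed on to the classifier.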
3. Results
According to [3], CSP-based motor imagery classification algorithms can be enumerated as CSP, $L_1$-norm CSP, SCSP, FBCSP, and E-CSP. To evaluate the performance of HD-CSP (Hellinger Distance CSP), the accuracy of each algorithm, including HD-CSP, is computed on the BCI Competition IV dataset 2a, a publicly available motor imagery dataset with four classes: left hand movement, right hand movement, foot movement, and tongue movement [1]. Each subject’s data are split using 5-fold cross-validation for training and evaluation. A CSP filter is applied to transform the signal, and a Support Vector Machine (SVM) classifier is trained on the transformed features.
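For orientation, a generic CSP-plus-SVM evaluation with 5-fold cross-validation might look like the sketch below. It is a plain two-class CSP written for illustration, not the HD-CSP implementation evaluated here; the class name `SimpleCSP`, the `n_pairs` parameter, the linear kernel, and the synthetic data are assumptions. It relies on SciPy and scikit-learn.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

class SimpleCSP(BaseEstimator, TransformerMixin):
    """Two-class CSP: spatial filters from a generalized eigendecomposition."""

    def __init__(self, n_pairs=1):
        self.n_pairs = n_pairs

    def fit(self, X, y):
        # X has shape (epochs, channels, time); average the covariance per class.
        covs = [np.mean([e @ e.T / e.shape[1] for e in X[y == k]], axis=0)
                for k in np.unique(y)]
        # Solve C1 w = lambda (C1 + C2) w; eigenvalues come back in ascending order.
        _, vecs = eigh(covs[0], covs[0] + covs[1])
        # Keep the filters at both ends of the eigenvalue spectrum.
        picks = np.r_[np.arange(self.n_pairs), np.arange(-self.n_pairs, 0)]
        self.filters_ = vecs[:, picks].T
        return self

    def transform(self, X):
        projected = np.einsum("fc,ect->eft", self.filters_, X)
        return np.log(projected.var(axis=2))   # log-variance features

# Synthetic two-class data: each class has extra variance on a different channel.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4, 200))
y = np.array([0] * 20 + [1] * 20)
X[y == 0, 0] *= 3.0
X[y == 1, 3] *= 3.0

model = make_pipeline(SimpleCSP(n_pairs=1), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_accuracy = cross_val_score(model, X, y, cv=cv)
print(fold_accuracy.mean())
```

Wrapping the CSP step in the pipeline means the spatial filters are re-estimated on the training folds only, which avoids leaking test data into the filter estimation.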
In our comparative evaluation of multiple CSP variants (standard CSP, $L_1$-norm CSP, SCSP, FBCSP, E-CSP, and HD-CSP) under different numbers of EEG channels (Table 1), HD-CSP outperformed the competing approaches in all tested conditions except for one case in the 11-channel configuration. The results show that HD-CSP not only yields superior classification accuracy overall but also maintains a stable advantage across different channel counts (Figure 1).
Notably, in the most restrictive scenario of only three channels, HD-CSP still reaches around 70% classification accuracy. This outcome is particularly compelling because many methods suffer a significant drop in performance when the dimensionality of the EEG data is limited: the other models achieve only 36% to 46% accuracy in this setting. The robust performance of HD-CSP in this low-channel regime illustrates its suitability for practical applications where hardware or time constraints require the use of fewer electrodes.
Moreover, HD-CSP remains superior as the number of channels increases. Its classification accuracy surpasses that of standard CSP, $L_1$-norm CSP, SCSP, FBCSP, and E-CSP in all tested configurations apart from the single 11-channel case noted above, and it reaches 80% accuracy in the 19-channel configuration, a level that none of the other models attains. These findings highlight HD-CSP’s efficacy in balancing feature extraction quality and noise handling, suggesting it can serve as a strong default method for both minimal-channel and full-cap montage scenarios.
References
1. Brunner, C.; Leeb, R.; Müller-Putz, G.; Schlögl, A.; Pfurtscheller, G. BCI Competition 2008–Graz data set A. Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology 2008, 16, 1–6.
2. Goyal, D.; Pabla, B.S. Condition based maintenance of machine tools—A review. CIRP Journal of Manufacturing Science and Technology 2015, 10, 24–35.
3. Lee, S. A Mathematical Review on EEG Channel Selection Techniques for Motor Imagery Classification, 2025.
4. Wallisch, P.; Lusignan, M.; Benayoun, M.; Baker, T.I.; Dickey, A.S.; Hatsopoulos, N.G. Chapter 9 - Wavelets. In Matlab for Neuroscientists; Wallisch, P.; Lusignan, M.; Benayoun, M.; Baker, T.I.; Dickey, A.S.; Hatsopoulos, N.G., Eds.; Academic Press: London, 2009; pp. 133–140.
5. Dişli, F.; Gedikpınar, M.; Fırat, H.; Şengür, A.; Güldemir, H.; Koundal, D. Epilepsy Diagnosis from EEG Signals Using Continuous Wavelet Transform-Based Depthwise Convolutional Neural Network Model. Diagnostics (Basel, Switzerland) 2025, 15.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).