Enabling Adaptive Interaction in the Metaverse Using a Hybrid EEG-Based Brain–Computer Interface

Sapthak Mohajon Turjya

doi:10.20944/preprints202602.0061.v2

Submitted:

05 February 2026

Posted:

06 February 2026

You are already at the latest version

Abstract

This paper presents a hybrid model for the control of brain-computer interfaces (BCIs) for Metaverse environments, with the goal of advancing the capabilities of such interfaces beyond the traditional motor imagery (MI) or P300-based brain-computer interfaces. This hybrid model utilizes P300 for virtual devices' interaction and MI for navigation and movement imagination in the Metaverse, with each EEG modality being dedicated to a particular control state and state changes being made sequentially based on the context of the interaction. In the simulated experiment, the imagined movement of the left and right hands is used for rotational navigation, while discrete devices' actions use P300 responses under a five-stimulus oddball paradigm. In the performance evaluation, the paper shows that the hybrid model, with the use of MI and P300 under a single BCI, achieves accuracy comparable to single-mode BCIs, with advantages over existing BCI systems regarding their capabilities for interaction and adaptability, thus proving the effectiveness of hybrid control for achieving dynamic and flexible Metaverse interactions.

Keywords:

metaverse

;

BCI

;

P300

;

sensorimotor rhythms

;

MI

Subject:

Computer Science and Mathematics - Signal Processing

1. Introduction

The Metaverse was a term coined by an American Writer known as Neil Stephenson in his science fiction novel known as the Snow Crash [1]. In his novel, Stephenson depicted the Metaverse as a dynamic, perpetual, and omnipresent world running in parallel with the real world. Users in the real world would be communicating with their virtual counterparts known as the Avatars, which will foster a sense of immersiveness and emotional attachment as experienced in the real world. Even though in Stephenson’s time relevance of such an ambitious tech project was very minute, now with the increasing demand to work remotely coupled with the proportionate requirement of making collective decisions the relevance of Metaverse to growing at an exponential rate.

The core principles of the Metaverse states that it must be operable by everybody, which means individuals with physical disability should also be able to interact with the Metaverse flawlessly. That is why, to enable such a facility BCI could be an efficient solution. BCIs can function by enabling a direct pathway for communication between the Brain, and the external devices used for controlling the Metaverse ecosystem. It aims to cut down or make the communication process independent of the human peripherals such as hands and legs [2,3]. Non-invasive BCIs [4] function by identifying and converting EEG signals into machine-understandable instructions. Based upon applications various types of EEG signals are used in BCIs, and they are spontaneous signals which are generated via intrinsic brain activity, and stimulated signals generated from stereotyped sensory stimulus. [5] Sensorimotor Rhythms linked to MI come under the former type, on the other side [6,7], P300 scenario-based potential and steady-state visual evoked potential (SSVEP) fit the latter type [8,9,10].

Research activities have been carried out to fuse BCIs with Virtual Reality. Such integration would result in rapid development of BCI systems, as adjustable training and evaluation environments can be implemented [11,12], which would be very risk-free and its complexity can also be varied to a greater extent. BCI development in Virtual Reality would also proportionate a similar growth in the Metaverse, as the whole Metaverse is based upon Virtual Reality.

Various EEG-facilitated control mechanisms in Virtual Environment are developed using MI [13,14]. Previous mechanisms were focused on mapping imaginative or mind-depicted bodily coordination to stop/forward instructions in a singular pathway [15]. After that, a self-coordinated model was introduced [16], where the implemented systems can depict MI-centered brain functioning and distinguish between various MI tasks [17,18]. However, due to sophisticated variations of Sensorimotor Rhythms linking body movements and mental activities, numerous reports suggest that accurate classification results are obtained when two activities are distinguishable/distinct [19,20].

MI-driven BCIs tend to suffer from low degrees of freedom, mainly because they depend on subjects imagining the execution of precise movements, which is hard to identify reliably with EEG signals. This is most obviously realized when multiple discrete commands need to be performed [21,22]. Furthermore, MI-based BCIs are also characterized by high inter-user variability, such that not all users can produce consistent motor imagery patterns, lowering system performance.

P300-based BCIs, on the other hand, take advantage of the brain’s involuntary reaction to rare stimuli, the P300 potential, so that users can choose from a greater number of options. Research indicates that P300 spellers, for instance, allow users to select characters from a matrix, thus facilitating many commands [23,24]. In addition, P300-based systems usually need little training since the P300 response is a spontaneous reaction to target stimuli, hence more convenient for applications requiring a greater number of unique instructions [25,26]. With these benefits, P300-type BCIs are usually a more effective option for applications involving multiple discrete commands, especially in complex virtual worlds or metaverse-based applications.

P300 potential was first implemented by Bayliss [11] to manage various entities in a Virtual House [27,28,29]. Further P300-fostered Virtual Environment navigation procedures were proposed, for example quad-directional wheelchair navigation system utilized P300 to enable various motion orientations of a Virtual Arm [30].

Hybrid BCI which fuses MI and SSVEP is also proposed. This functions by detecting the two EEG pulses either sequentially [31,32] or simultaneously [33,34], and due to this performance of both MI and SSVEP-based BCI systems improved. Another Hybrid BCI is capable of handling a 2D cursor by identifying P300 and MI pulses concurrently and independently. Having the capability to distinguish between two types or more types of signals makes these hybrid BCIs much more efficient than the ones that are used traditionally [32,35].

This paper aims to fuse both P300 and MI to propose a hybrid BCI manipulating procedure, the aim of which is to amplify the control facilities achievable by a singular MI or P300-based BCI in the Virtual Environment/Ecosystem. P300 and MI potentials are identified in a sequential procedure as per the control state of the Virtual Ecosystem. The main advantage of such a hybrid system is that it utilizes two EEG patterns, which are functioning under designated conditions. Manipulation procedure utilizing P300 signals to drive the control panels of Virtual Gadgets with identifiable instructions, and MI signal to relentlessly explore the Virtual Ecosystem- an important criterion for the Metaverse implementation.

Figure 1 shows the overall workflow of the proposed mechanism. The process starts with the acquisition of EEG signals and subsequent preprocessing to obtain relevant attributes. Then, the P300 and MI signals are classified sequentially by employing machine learning algorithms. The classified signals are then translated to corresponding instructions for their implementation in the Metaverse. The process is repeatedly executed and thus ensures the dynamic adaptability of the proposed architecture.

2. Related Works

Brain-Computer Interface (BCI) research has seen great expansion, especially in relation to immersion technologies like Virtual Reality (VR) and the Metaverse. There has been a remarkable body of work investigating the use of P300 and Motor Imagery (MI) signals in isolation for facilitating user interaction in virtual worlds.

Bayliss and Donnerer et al. [11,36] provided some of the initial evidence supporting the utility of P300 signals for virtual control, establishing through their work that event-related potentials (ERPs) could meaningfully support discrete command selection for virtual environments. Their implementations were foundational to what later work using ERP-based control panels and device interaction in stimuli-driven contexts discovered.

Concurrently, MI-driven BCI research has aimed at continuous control, usually based on the imagined movement of limbs to control navigation or movement in virtual environments. Singh et al. [37] noted the need to discriminate between more than one MI task accurately in order to elicit effective system response, particularly where real-time navigation is called for. Nonetheless, MI-driven systems tend to be constrained by variability specific to users and decreased classification accuracy in the case of multi-command environments.

Recent research has highlighted the need for non-invasive BCI frameworks that enable greater accessibility and flexibility. For example, Zhu et al. [38] introduced flexible, user-centric BCI architectures that improve interaction fidelity under different user conditions—a key requirement for Metaverse integration on a large scale.

The use of BCIs in VR has also been a critical enabler of Metaverse applications. Kohli et al. [39] provided an extensive review of BCI-VR interactions, underlining their potential in adaptive training, virtual learning, and therapy. These applications not only validate the practicality of BCIs in virtual ecosystems but also highlight the growing necessity for systems that support both discrete and continuous commands effectively.

In addition, recent developments have further pushed the boundaries by combining machine learning and avatar-based interaction with BCIs. Zhu et al. [38] extended the use of BCI-controlled avatars to augment social presence within virtual communities, while Gu et al. [40] added state-of-the-art classification algorithms to advance signal interpretation and resilience. These cross-disciplinary approaches indicate a widespread convergence towards hybrid and context-sensitive BCI systems capable of adapting to intricate, multimodal interaction contexts.

New developments in hybrid Brain-Computer Interfaces (BCIs) have aimed to integrate several EEG paradigms like MI, SSVEP, and P300 potentials to sidestep the limitations associated with unimodal systems. For example, Flores Vega et al. (2022) [41] proposed EEG-TCFNet—a model that consolidates Temporal Convolutional Networks (TCNs), LSTM, and fuzzy neural blocks—for P300 classification in smart home control. Although the model performed well in subject-dependent environments (accuracy of up to 98.6%), its generalization across users decreased drastically to 74.3%, demonstrating the potential for limitations in transferability and real-time application.

Also, Luo et al. (2023) [32] suggested a hybrid BCI based on MI and SSVEP signals through a dual-stream CNN structure. Their architecture aimed to maximize classification in hybrid command situations but was restricted to 2D interaction tasks and non-adaptive switching or immersive control scenarios. Garakani et al. (2019) [42] also presented a P300-controlled robotic arm controller but without user-adaptive reasoning or concomitant multimodal control, which made it only applicable to low-dimensional discrete commands.

In contrast, our work presents several key innovations. First, it introduces a context-aware hybrid BCI that dynamically switches between MI (for continuous navigation) and P300 (for discrete control) based on user interaction states within a virtual Metaverse. This sequential switching mechanism is more cognitively intuitive and avoids signal interference—unlike prior concurrent hybrid models. Second, our system attains better classification accuracy (P300: 88.5%, MI: 88.5% maximum average) by combining a Transformer-based temporal encoder with SVM classification for P300 and a CSP + FLDA pipeline user-personalized for MI detection. Third, our system is built in an actual 3D Metaverse environment—enabling complex navigation, virtual device control, and contextual responsiveness, which is lacking in existing works. These characteristics make our solution one of the first real-world-deployable hybrid BCI systems for immersive applications.

Even with these developments, the majority of previous research has been on unimodal BCI methods—either P300 or MI—and their uses in standalone applications like spelling, prosthetic control, or simple navigation. Hybrid BCIs that synergistically integrate P300 and MI are not well explored, particularly in dynamic Metaverse environments that require both fine-grained selection and smooth navigation. This work aims to fill this gap by proposing a hybrid system that can dynamically interchange modalities in order to provide intuitive, effective, and flexible virtual interactions.

3. Research Gaps and Contributions

3.1. Research Gaps

Despite excellent progress in the use of MI- and P300-based BCI, stand-alone application in virtual worlds exhibits fundamental limitations. Previous BCI research has considered either MI- or P300-based control extensively but has been unsuccessful in consolidating both paradigms as a single, hybrid system suitable for metaverse use. A missing hybrid control paradigm results in ineffective navigation, limited interaction, and poor adjustability in applications in the physical world.

Current research has looked into hybrid BCIs, but they are generally limited to simple applications like spelling or robot control and do not provide complete virtual metaverse experiences. Additionally, classification accuracy and system adaptability are still significant challenges, with negligible research into optimizing signal-switching mechanisms for improving the user experience. These limitations identify the necessity of a more generalized, adaptive, and user-convenient hybrid BCI system with the intuitive maneuverability of MI-based BCIs and the point-to-point selectivity of P300-based BCIs.

The integration of BCIs with Metaverse worlds poses a set of new challenges that are different from what would be encountered with conventional virtual or augmented reality systems. Unlike linear or task-oriented virtual applications, the Metaverse is inherently dynamic, persistent, and multidimensional, such that users must continually switch between continuous spatial movement (e.g., moving or rotating an avatar) and discrete command-based interactions (e.g., picking devices or content). This requires a strong control paradigm that can switch between asynchronous input modalities with no interruption to the user. In addition, the high sensory load characteristic of Metaverse environments exacerbates the cognitive load on users, requiring systems that reduce mental fatigue while still being highly responsive. These considerations require certain design accommodations within hybrid BCI architectures, particularly with regard to signal-switching logic, user-centric feedback mechanisms, and interface latency optimization, all of which are essential to provide smooth and naturalistic interactions within immersive virtual environments.

3.2. Research Contributions

To bridge this research gap, we introduce a new hybrid P300-MI BCI system that supports effortless control and interaction in metaverse worlds. Our contributions are:

Development of a Novel Hybrid BCI System

-

Merges Motor Imagery (MI)-based continuous navigation and P300-based discrete control.

-

Enables both navigate-through and navigate-through-and-interact-with-devices interactions within a Metaverse environment using EEG signals exclusively.
Context-Aware Adaptive Switching Mechanism

-

Switches dynamically between MI and P300 control modes depending on interaction context reducing cognitive complexities.

-

Improves usability by reducing redundant mode switching.
Signal Processing Advancements

-

Deploys Transformer + SVM hybrid architecture for P300 classification with high accuracy.

-

Uses Fisher’s Linear Discriminant Analysis (FLDA) and Common Spatial Patterns (CSP) for the detection of MI.
Experimental Metaverse Implementation

-

Implemented in an apartment virtual setting with OpenSceneGraph and C Sharp and control of real-time EEG signal.

-

Tasks are navigation and device usage (e.g., TV/music control).
Robust Experimental Validation

-

Performed with 8 simulated participants with different amounts of BCI experience.
Data Augmentation for Robustness

-

Presented EEGGAN-Net for synthesizing artificial EEG data, promoting classifier generalization and reducing data imbalance.
Real-Time Training Personalization

-

Frequency bands and CSP weights are adapted using real-time data to improve MI classification.

By overcoming the existing shortcomings of MI- and P300-based BCIs, this research sets a new ground for improved BCI-based control in the metaverse, with a more intuitive, flexible, and high-performance interaction model.

4. Equipments and Methodology :

The arrangement includes a Metaverse ecosystem running on a resource-sufficient computer and on a separate computer an EEG signal amplifier, a device manipulator, and a signal processing unit developed for identifying both P300 signals and MI.

The P300 (Figure 2) and Non-P300 signals (Figure 3), play critical roles in event-related potential (ERP)-based BCIs. Considering to a choose a TV channel for an instance. The P300 signal shows a clear positive peak approximately at 300ms, which represents a neural response when the user recognizes a target stimulus, like choosing a TV channel. The response typically appears at parietal and central electrode sites (e.g., Pz, Cz). Conversely, the Non-P300 signal is background EEG activity when no response to the stimulus is found. The fact that it has no distinct peak at 300ms differentiates it from the P300 signal and makes it essential for differentiation between intentional and non-intentional selections in BCI uses.

The MI (Figure 4) signals coordinates for imagination of left-hand and right-hand movements in proposed study, respectively, that are vital in commanding a BCI-based system. The left-hand MI signal displays a desynchronization of the mu (8–12 Hz) and beta (18–26 Hz) rhythms, measured from the C3 electrode covering the motor cortex. Also, the right-hand MI signal shows rhythmic fluctuations on the C4 electrode, relating to right-hand movement imagination. These patterns allow BCIs to categorize movement intentions, so users can operate a system (e.g., change TV channels) with mere thought. Merging P300 and MI signals improves BCI accuracy and ease of use, making hybrid BCIs more useful for practical applications.

The digital 3D apartment ecosystem depicting the mini Metaverse is assembled using OpenSceneGraph (OSG), an open-source graphics engine constructed on the Open Graphics Library (OpenGL). The manipulator, coded in C#, is accountable for specifying the starting and ending of procedures or device states while promoting instruction exchange among various device modules. The signal processing unit is created on Microsoft.

Foundation Classes (MFC) software, which employs the Matlab engine to handle EEG signals. It recognizes signals and transmits decrypted instructions to the Metaverse ecosystem through the manipulator. Coordination between the various units is enabled by sockets, allowing the mechanism to be dispersed among numerous computers.

4.1. Hybrid BCI Handling Procedure:

In the present investigation, the Metaverse ecosystem displays two separate states: device manipulation and navigation. Device manipulation concerns addressing multiple discrete instructions, possible through the enactment of the P300 oddball paradigm or model. Contrarily, navigation instructions emanate from continued MI signals. By displaying the distinct pros of MI and P300 signals in specified scenarios, the presented manipulation procedure offers a continuous and precise interaction with the Metaverse ecosystem.

Likewise, the detection of MI and P300 signals happens successively, depending upon the device state, where the device or the system state is either the MI or P300 detector stage. The Metaverse ecosystem is split into two classes: device manipulation and navigation states. Originally, the device starts in the navigation state, where MI signals are identified and decoded into persistent navigation instructions. These instructions prompt position updates within the Metaverse ecosystem, activating a detector with each update. Figure 5 represents the Navigation and Gadget Management Synchronization in the Hybrid BCI System, which is discussed above.

Upon witnessing the current position aligning with areas selected for device manipulation, the controller/manipulator discontinues MI identifcation and changes to the device manipulation state. Here, the participant is offered with a virtual device manipulation panel employing the P300 oddball paradigm/model. The device again shifts back to the navigation state once the participant chooses the ’quit’ instruction from the control panel.

To understand the functioning of the BCI system, we are here considering the presence of a mini-Metaverse (Figure 6). The Virtual Ecosystem present here is an apartment, with numerous rooms and each room contains furnitures and electronic gadgets. Various navigation instructions like forward/backward and left/right are supported by the Virtual Ecosystem. We assume it is a setup provided to a person with paralysis, who will be operating this environment according to his/her personal goal. Alongside this, Virtual devices like TV and Music Systems are also equipped with procedures to handle them, which is viewed by the subject in the P300 handling phase. With these facilities equipped in our virtual house, the hybrid BCI control module replicates natural gestures performed in the physical world. Subjects present can be trained in this environment to handle various electronic gadgets, and later project those experiences to control utilities like self-controllable wheelchairs, etc. by utilizing P300 and MI signals.

In this paper, Sensorimotor Rhythms correlated to imagery of left/right coordination are identified, which enables two navigation instructions in our virtual house. In the system we are proposing, imagery left/right movement are being coordinated to achieve the same in the virtual house.

As per the above system configuration, the users termed as Avatars in the Metaverse are confined to only the living room with a TV, Music System, and Furniture around it. The Avatar is stationed in the middle of the living room and can have only a right and left view of the room. The TV or the Music System zooms in when the user rotates in a direction 10 degrees where the TV or the Music System is stationed. The zoomed-in manipulation panel represents a P300 oddball paradigm, where handling instruction imagery acts as a stimulus to trigger the user’s P300 potential.

4.2. Signal Refining:

Both MI and P300 signals identifiers are designed to work independently of each other due to distinct studying methods of the two EEG signals.

4.2.1. P300 Potential Identification

Under an oddball paradigm, a P300 potential is evoked by presenting a rare target stimulus among a series of non-target stimuli [8]. To ensure consistent and identifiable P300 patterns, it is customary to average the results across multiple trials [9]. In our study, one instance of target identification involved conducting 25 trials, each comprising five stimuli with each command on the control panel heightened once as a stimulus.

Data Preprocessing and useful Feature Gathering: During signal processing, an initial regression-based procedure will be applied to cut down ocular artifacts from the obtained EEG data. Subsequently, drift correction will be performed using piecewise cubic spline interpolation, followed by the application of a low-pass filter (specifically, a 5th-order Butterworth filter with a cutoff frequency of 15 Hz). These procedures are necessary to maximize the signal-to-noise ratio of low-frequency ERPs, especially the P300 component, which normally appears at approximately 300 ms following the stimulus.

To emulate realistic recording conditions, synthetic ocular and baseline drift artifacts were incorporated into the simulated EEG signals. A regression-based artifact reduction technique was subsequently applied to suppress these components, followed by piecewise cubic spline interpolation for baseline drift correction. Rather than relying exclusively on frontal electrodes (Fp1, Fp2) for ocular artifact estimation, preprocessing was focused on a subset of 14 electrodes—Fz, FC3, FCz, FC4, C5, C3, Cz, C4, C6, CP3, CPz, CP4, Pz, and Oz—selected for their established relevance to P300 generation. This procedure ensured consistency with standard EEG preprocessing pipelines while preserving the temporal morphology of simulated event-related potentials.

The EEG signals after artifact correction were filtered through a 5th-order Butterworth low-pass filter with a 15 Hz cutoff frequency. Selection of the 5th-order filter was made based on a requirement of high roll-off for attenuation of high-frequency artifacts such as EMG and gamma noise while achieving a flat passband to keep intact important ERP components such as N100, P200, and P300.

A motivation for choosing the application of a 15 Hz low-pass filter in ERP signal preprocessing was based on the fact that it seeks to improve the signal-to-noise ratio (SNR) without losing the most informative ERP components like N100, P200, and P300 that are mostly of the 0.1–15 Hz spectrum. While some high-frequency components (e.g., gamma-related ERPs) can be as high as 30–40 Hz, our interest here is in slow cortical potentials and cognitive components that are most important for BCI paradigms. Recent research has shown that employing a 15 Hz cutoff successfully enhances classification accuracy and minimizes contamination from high-frequency artifacts like EMG and muscle noise, with minimal loss of pertinent cognitive information [43,44,45]. In addition, some state-of-the-art BCI models have incorporated similar preprocessing techniques to augment ERP-based decision-making and cognitive decoding. Although we recognize the compromise, wider frequency ranges will be addressed in future extensions when high-frequency content is of concern.

Lastly, the continuous data will be segmented into distinct epochs based on various stimuli, and epochs featuring identical stimuli will be averaged. Each epoch encompassed data samples taken between 100 and 800 milliseconds after the onset of the stimulus event.

This preprocessing pipeline improves the temporal definition and clarity of the P300 waveform, allowing for better classification performance. In our testing, this technique accounted for a 6–8% improvement in classification accuracy compared to raw, unfiltered data. The low-pass filtering step specifically ensures that the characteristic P300 peak at 300 ms is preserved, unobscured by high-frequency distortions that would otherwise hinder detection accuracy. The preprocessing pipeline for P300 signal is represented in Figure 7.

Class Classification: For each classification, among the G stimuli (where G represents the number of stimuli in the oddball paradigm), one responded to an epoch containing the P300 wave. The remaining out of interest stimuli were associated with epochs lacking the P300 wave. The training dataset consisted of epoch data categorized into two classes: P300 and non-P300, excluding those labeled as G stimuli. Consequently, a hybrid transformer-SVM (Support Vector Machine) (Algorithm 1) classifier was trained for stimulus to address this two-class problem.

Algorithm 1: P300 Classification using Transformer, SVM Hybrid Model for each

Require: EEG trials

X \in R^{N \times T \times C}

, Labels

Y \in {0, 1}^{N}

Ensure: Predicted Labels

\hat{Y}

1:: Step 1: Preprocessing
2:: - Apply bandpass filtering (0.1–15 Hz) to EEG signals.
3:: - Normalize each EEG channel (zero mean, unit variance).
4:: - Segment trials into fixed-length epochs (T time steps).
5:: Step 2: Transformer Feature Extraction
6:: - Project EEG signals to a higher-dimensional space:
7:: $H_{0} = Linear (X)$
8:: - Add positional encoding to retain temporal dependencies:
9:: $H_{0} = H_{0} + PosEnc (H_{0})$
10:: for $l = 1$ to L (Number of Transformer Layers) do
11:: - Compute multi-head self-attention:
12:: $Q, K, V = H_{l - 1}$
13:: $H_{l} = MHSA (Q, K, V)$
14:: - Apply feedforward network (FFN):
15:: $H_{l} = FFN (H_{l})$
16:: - Apply Layer Normalization and Dropout.
17:: end for
18:: - Extract final feature vector:
19:: $F = GlobalAvgPool (H_{L})$
20:: Step 3: SVM Classification
21:: - Train SVM with RBF kernel using extracted features $F$ and labels $Y$ .
22:: - Optimize hyperparameters (C, $γ$ ) via grid search.
23:: - Predict class labels for test EEG trials:
24:: $\hat{Y} = SVM (F)$
25:: Return Predicted Labels $\hat{Y}$ .

Architectural Details Transformer + SVM Model (Algorithm 1) for P300 Classification: Transformer structure is used in order to model the complex temporal dependencies of EEG signals. The raw EEG input is initially projected to a higher dimension space by a learnable linear transformation. Positional encoding is then added to maintain the sequence order of EEG. The Transformer Core consists of multiple layers of Multi-Head Self-Attention (MHSA) and Feedforward Networks (FFN). The MHSA mechanism enables the model to attend to the relevant signal components across time and is thus highly effective at capturing the subtle variation of differences in P300 responses. The output is averaged globally to form a feature vector for the trial.

This feature vector is passed through an SVM classifier using an RBF (Radial Basis Function) kernel, which has been demonstrated to be effective for EEG classification issues. RBF is a kernel function used in (SVMs) and other machine learning models to handle non-linearly separable data. The SVM is trained using hyperparameter optimization (grid search over C and

γ

) to ensure best decision boundaries. The separation between the P300 and non-P300 trials is handled by the classifier based on high-dimensional representations learned by the Transformer.

Benefits of Transformer + SVM Method Temporal Dependency Modeling: Transformer’s self-attention mechanism allows it to acquire long-distance dependencies across EEG sequences, crucial for P300 detection.

Strong Feature Learning – Unlike CNN-based models, relying on spatial features, the Transformer learns temporal representation critical for ERP classification.
Generalization using SVM – RBF kernel SVM is employed to make robust decisions, preventing overfitting and enhancing classification accuracy on new examples.
Generalization using SVM – RBF kernel SVM is employed to make robust decisions, preventing overfitting and enhancing classification accuracy on new examples.
Scalability and Transferability – The modularity of this approach allows for easy extension to other EEG-based BCI applications aside from P300 classification.

4.2.2. Motor Imagery Identification

In the present system, the mental assumption of left/right is realized as left/right turning in the virtual house. The identification of MI signals is based on the analysis of multi-band-pass filtered physiology-constrained synthetic EEG signals.

The MI identifier receives an epoch consisting of m EEG channels. Artifact removal and baseline correction, are the same for both Motor Imagery (MI) and P300 signals, but the frequency filtering parameters are different according to their respective spectral profiles. P300 signals contain mainly low-frequency components and are usually filtered in the 0.1–15 Hz range to allow amplification of slow cortical potentials and suppression of high-frequency noise. Conversely, MI-related activity is spread over a wider range, mostly in the 8–30 Hz range, spanning the mu (8–12 Hz) and beta (13–30 Hz) rhythms, which are significant for motor-related feature extraction [43,44]. Hence, for MI signals, a band-pass filter (usually 8–30 Hz) is utilized to preserve motor-relevant oscillations, unlike low-pass filtering for P300. Along with artifact removal piecewise cubic spline interpolation is done. After the initial preprocessing is completed the signal is segregated into multiple frequency bands (F-bands) via band-pass filtering, and then each band passes through spatial filtering. Due to these specialized filtering n number of channels, where (

n < = m

) remains. At last, the log of band power is obtained, which can be depicted by the equation:

P_{j}^{i} = l o g (v a r (x_{j}^{i}))

(1)

The

v a r ((x_{j}^{i}))

is depicted as the variance of the pulse from the

j^{t h}

channel of the

i^{t h}

F-band. The pulse attributes is formed by

P_{j}^{i}

, where i runs from from 1 to m, and j runs from 1 to n, and to this more transformation is performed via FLDA to convert it into a control instruction which coordinates positive/negative output to left/right rotation.

Before feedback-enabled stimulation, the parameters associated with spatial filtering and FLDA are trained and the F-bands for each user are mentioned to be accustomed to participant’s EEG properties. The feedback-less data gathered in the training session will be segregated into training and testing samples.

Spatial Filtering and training of the FLDA classifier: Spatial filtering and FLDA parameters were established through the utilization of training samples. A CSP was employed for spatial filtering. CSP identifies a transition matrix

W T

, which linearly converts the data matrix S from each F-band into a new matrix X. In this transformation, the disparities in power characteristics between left and right motor imagery are maximized in X, such that X is equivalent to

W T

times S [46].

Once the transformed signal X and its derived features were obtained, an FLDA classifier was trained to differentiate between the mental imagery of left and right-hand movements. FLDA is a widely used linear classifier that achieves this by projecting the original samples onto a single dimension, maximizing the ratio of variation between classes to variation within classes [47,48].

Selection of Frequency band :The test samples underwent preprocessing and were subjected to filtering using a Chebyshev I filter of order 4. A 4th-order Chebyshev Type I filter is employed for MI preprocessing as it offers an optimal tradeoff between sharp frequency selectivity, low computational cost, and protection of important EEG rhythms, which are essential to decode MI activity. This process resulted in the creation of 17 F-bands that overlapped, ranging from 1 to 5 Hz, 3 to 7 Hz, and so on up to 33 to 37 Hz. Each band had a width of 4 Hz and an overlap of 2 Hz, covering the frequency range from 1 to 37 Hz.

In each F-band, the processes of spatial filtering, extraction of signal power, and application of the FLDA classifier will be carried out. The outputs from the FLDA were then utilized to calculate both classification efficiency and Fisher’s linear discriminant criteria (FDC). The accuracy indicates the importance of each F-band in the overall classification, whereas the FDC measures the distinctiveness of the outputs. These two metrics assess how well an F-band contributes to the classification process.

The FDC is determined by calculating the ratio between among-class distance, with the distance present within-class.

F D C = {(m_{1} - m_{2})}^{2} / (σ_{1}^{2} + σ_{2}^{2})

(2)

where

m_{1}

and

m_{2}

represents the mean between two motor activity, and

σ_{1}^{2}

,

σ_{2}^{2}

are their respective standard deviations.

Then the 17 F-bands are arranged in descending order as per their designated FDC values. The number of selected bands is excavated by:

t = a r g m i n {l | \sum_{j = 1}^{l} F D C_{j} > = 70 % x \sum_{j = 1}^{17} F D C_{j}}

(3)

As per the above condition, F-bands are picked for individual users and F-bands less than 75% are eliminated. The minimum selection criterion is set between 70% and 75%, and at last, if overlapping of picked bands occurs then fusing is performed.

Figure 8 represents the working procedure for preprocessing and detecting MI pulses described above.

5. Experimental Procedure

5.1. Data Collection and Model Training Procedure:

In place of human subjects’ participation, the entire experiment was carried out with the help of a physiology-constrained EEG signal simulation and replay tool. Task-related EEG signals for each task were generated based on well-established physiological properties of P300 and MI paradigms. For example, EEG signals for P300 tasks were generated by simulating event-related potentials with positive components between 250-400 ms, dominant frequency components in the delta-theta range, and spatial components over centro-parietal electrodes. On the other hand, EEG signals for MI tasks were generated by simulating event-related desynchronization and synchronization with frequency components in the mu and beta frequency bands, respectively, with spatial components localized to sensorimotor areas. These task-related components were combined with background EEG activity modeled as band-limited colored noise.

Eight virtual subjects were created by varying the distributions of baseline EEG power, signal-to-noise ratios, ERP amplitudes, ERD/ERS magnitudes, and channel covariance matrices, simulating inter-subject variability as seen in actual EEG recordings. Four virtual subjects were employed for parameter estimation and data accumulation, and the remaining four for online testing and control assessment, as in the original experiment. EEG signals were simulated for 26 channels arranged according to the international 10-20 system, with the nose as reference and the forehead as ground. Channel impedance was limited to below

10 k Ω

, and signals low-pass filtered at 40 Hz, digitized at a rate of 250 Hz to simulate actual EEG recording practice. To detect the P300 potential, fourteen channels that included Fz, FC3, FCz, FC4, C5, C3, Cz, C4, C6, CP3, CPz, CP4, Pz, and Oz are involved in real settings, and for the MI pulse detection, channels that contained (Cz, Fz, FC3, FC4, C3, C4, CP3, CP4, C5, C6, F1, F2, F3, FC1, FC2, F4, C1, C2, CP5, CP1, CP2, and CP6 are involved (Figure 9).

The entire experiment protocol was reproduced in software, including the MI training, P300 data collection, and online testing phases. In the MI training phase, 80 simulated trials were included, consisting of equal numbers of left- and right-hand motor imagery, each of 2-second duration, synchronized to visual cues. In the P300 data collection phase, five different conditions were simulated, each with an 80/20 non-target to target probability, where event-locked ERPs were generated in accordance with the onset timing of the stimuli. In the online testing phase, the simulated EEG trials were fed sequentially into the testing pipeline, similar to real EEG-based BCIs, including causality, buffering, and processing latency. The same preprocessing, feature extraction, and classification pipeline was used as in real EEG-based BCIs without any changes, ensuring that the system performance was similar to real online operating conditions.

5.1.1. MI Parameter Training Phase

MI training phase:The previous sections details the strategy for MI detection. Here the experimental strategy to fine-tune the parameters are discussed. The primary training round is based on calibrating the parameters associated with MI identifiers. The round composed of 80 simulated runs, each of which commenced with a sign on the display depicting left or right-hand locomotion thinking, remaining for 2 seconds. The users were successively exposed to these runs, with 40 simulated signs depicting left-hand locomotion thinking, and the rest 40 depicting right-hand simulated locomotion thinking. Trained parameters were extracted from this data used for training and utilized in the system for future real-time use.

Trained Parameters in the MI Identification System:

The trained parameters used in this process are listed as follows:

Spatial Filtering Parameters CSP:

Weight Matrix $W T$ :

-

The CSP algorithm computes a transformation matrix $W T$ that linearly transforms the initial EEG signal S to a new matrix X.

-

The transformation maximizes the difference in power characteristics between left-hand and right-hand motor imagery signals.

-

Such spatial filters support discriminative feature extraction for classification.

FLDA Parameters:

Projection Vectorw:

-

The FLDA classifier computes a best projection vector w to project EEG features onto a one-dimensional space with the maximum class separability.

-

It is learned from training samples of left/right-hand motor imagery signals.
Class Means $m_{1}, m_{2}$ and Variances $σ_{1}^{2}, σ_{2}^{2}$ :

-

The mean values $m_{1}, m_{2}$ are average signal feature values for the left- and right-hand imagery classes.

-

The variances $σ_{1}^{2}, σ_{2}^{2}$ are measurements of intra-class variability.

-

They are used to FDC to determine classes’ separability.

Parameters for Selecting Frequency Band:

Values for F-Bands using FDC:

-

FDC is computed for every F-band for its role in classification.

-

The top bands contributing at least 70–75% of total FDC are selected.
Selected Frequency Bands for Each User:

-

Based on FDC values, an optimal set of F-bands is chosen individually for each participant.

-

Overlapping bands are summed if necessary.

Chebyshev I Filter Parameters:

Filter Order: 4.
Filter Cutoff Frequencies: Defines the 17 overlapping frequency bands (e.g., 1–5 Hz, 3–7 Hz, …, 33–37 Hz).

Log of Band Power Features

P_{j}^{i}

:

The variance of each of the chosen F-bands is calculated and transformed into log-band power features to be classified.

These learned parameters are used in real time to classify incoming EEG signals into left/right-hand motor imagery classes, and these are mapped to virtual house rotation accordingly.

5.1.2. P300 Data Collection and Training Phase

P300 data accumulation phase: To create a well-balanced and representative P300 classification dataset, we followed a systematic data accumulation, augmentation, and class-balancing approach. We created 25 trials for each stimulus over 5 stimuli, totaling 125 target trials per subject. With 4 simulated subjects, the combined dataset already had 500 actual P300 target trials (100 per stimulus) prior to augmentation.

To guarantee real-world relevance and avoid classifier bias, we had an 80/20 split of non-target and target trials. That is, for every 100 target trials per stimulus, 400 non-target trials were added, resulting in a total of 2,500 real trials (500 target + 2,000 non-target).

For handling class imbalance and enhancing model generalizability, we used EEGGAN-Net [49], a current-state-of-the-art GAN architecture specifically designed for EEG data augmentation. Table 1 represents the EEGGAN-Net architecture used in this analysis. Real P300 target trials were augmented with 4 synthetically generated samples each, thereby significantly growing the dataset size. Further, adaptive oversampling was used, which duplicated selectively high-quality trials of P300 to maintain intra-class variation without leading to overfitting.

Post-augmentation and oversampling, the dataset was increased to 7,500+ trials, with a proper 60/40 class balance between non-target and target classes. This improved dataset improves the performance of classifiers, minimizes bias, and provides effective P300 detection in actual EEG-based BCI applications.

For each P300-stimulus we have 600 P300 trials with 900 non-P300 trials using it for the each of the prepared classifiers (Table 2).

Ensuring Label Consistency After Augmentation and Oversampling: Preserving label integrity during adaptive oversampling and augmentation is important for maintaining the original class distribution and allowing the model to learn from realistic, correctly labeled data. We designed the augmentation and oversampling process with care to maintain label consistency across trials, in the following steps:

Combining Trials Across Virtual Subjects

-

Trials across all subjects were combined for each stimulus, resulting in a larger set of P300 responses per stimulus.

-

This makes sure variability over subjects is accounted for and more effective augmentation and training are facilitated.
Data Augmentation with EEGGAN-Net

-

EEGGAN-Net was utilized to create synthetic EEG trials retaining the spectral, temporal, and spatial characteristics of P300 signals.

-

Every synthetic trial was given the same label as the actual real trial that it was derived from.

-

This made sure that P300 target trials were kept labeled as targets and non-target trials were kept non-target.
Adaptive Oversampling for Class Balancing

-

As the initial dataset had an 80/20 non-target/target ratio, oversampling was implemented solely on the minority (P300 target trials).

-

High-quality, clean target trials were selectively replicated to maintain inter-trial variability but to elevate the proportion of target samples.

-

Oversampled target trial labels were left unchanged from their source trials to avoid label inconsistency.
Final Verification and Dataset Structure

-

Following augmentation and oversampling, the final dataset had an 60/40 split while substantially boosting the total number of trials.

-

Trials were cross-checked by comparing their spectral and temporal characteristics to ensure synthetic and oversampled data remained in line with actual EEG responses.

Advantage of the proposed strategy for data collection:

Merging Trials Across the Simulated Subjects

-

Boosts the statistical power of every stimulus, providing strong class-wise augmentation.

-

Reduces inter-subject variability by synchronizing features across various participants.

-

Provides consistency in labeling for GAN-based augmentation and oversampling.
Data Augmentation Using EEGGAN-Net

-

Produces realistic P300 signals for every stimulus considering inter-subject variability.

-

Temporal modeling provides generated EEG signals with real-world brain activity patterns.

-

Prevents overfitting by generating diverse synthetic samples, enhancing generalization.
Class Balancing Using Adaptive Oversampling

-

Chooses only high-quality actual P300 samples for oversampling, not simple replication.

-

Provides more accurate representation of minority class, enhancing classifier training.

-

Avoids overfitting to replicated samples, in contrast to naive oversampling methods.
Better Non-Target vs. Target Ratio

-

60/40 ratio provides real-world relevance, rendering the model deployable.

-

Avoids classifier bias, providing true discrimination between P300 and non-target trials.

Table 3 represents the Final Dataset Structure After Augmentation and Oversampling.

P300 Training:To train the P300 identifier, data collected before the experiment were utilized after dedicated preprocessing. Each classifiers are trained with data dedicated for the particular stimulus (Table 2 representing the dimension of the employed dataset per stimuli) using the hybrid transformer-SVM architecture. The collected data are trained with a hybrid transformer-SVM classifier. Then a 10-fold cross-validation procedure produced accuracies varying from 93% to 95% in the testing process across the identifiers. A Stratified 70%/30% split is used to ensure that both target (P300) and non-target trials are proportionally represented in the training and validation sets. The trained hybrid transformer-SVM classifier was then employed in the current system for online P300 identification. The online testing mechanism is based on the real-time Metaverse implementation. The ROC curves for the P300 stimuli classification is represented in Figure 10.

5.2. Model Testing Procedure

The 22 testing rounds were segregated into three portions are executed using simulated signals. All of these 22 rounds are repeated across five sessions with simulated signals. The first portion contained eight rounds to utilize a hybrid handling procedure, the second portion contained eight rounds for evaluating the MI-fostered navigation procedure, and the third portion hosted six rounds for scrutinizing the P300-based device control mechanism.

During the experimental rounds which included dynamic feedback, the P300 oddball model contained an inter-stimulus interval (ISI) which lasted for 200 ms. Intensification/strengthening of stimulus in the control panel was done for 50 ms, and an interval gap of 150 ms was maintained for each intensification procedure. It took 10 runs to make a command depiction. In each run, every instruction on the control panel was intensified/strengthened once. In the navigation period, feedback was provided in each second, producing a 8.5° left or right turn corresponding to each feedback signal. As stated earlier the classification depiction of FLDA was converted into navigation instruction by employing the class segregation of FLDA with left or right locomotion (negative values identifying ’left’ and positive values identifying ’right’). In comparison to the training stage, no indications were provided to the users for instructions on MI in the testing rounds. This indicated that the locomotion was dynamically performed, and each user had to decide in real time which command to produce or select based on the present outputs and the pre-stated activity assigned by the virtual operator. The activities in portion 1, which leveraged the hybrid handling procedure in the virtual house, were stated as follows:

Activity 1: Starting from the initial location, rotate left

90^{\circ}

to the TV handling panel, select a TV channel using P300, and then return to the initial location.

Activity 2: Starting from the initial location, rotate right

180^{\circ}

to the stereo, select a song to play using P300, and then return to the initial position.

Activity 3: Beginning from the initial location, turn left or right

360^{\circ}

, to select a channel or a song when the virtual gadgets (TV and stereo) are shown.

Activities 1 and 2 were done three times, while activity 3 was performed two times, once depicting a left rotation and once depicting a right rotation. After choosing a TV channel or a song, the simulated user is required to pick ’quit’ (a switching mechanism to shift the functional phase) to change the system back to the navigation phase. Transition to P300 handling occurs when the position falls into regions relating to gadget management/handling, and there might be error-prone classifications switching it to the MI navigation phase. Activities in Portion 2 were similar to that of Portion 1, except that the virtual device handling panels were not enabled when the user entered the relative locations. In this portion, only MI signals were identified. In portion 3, the rounds specifically evaluated P300 performance. In this scenario, the navigation procedure is turned off, and the handling panel of the virtual gadget (TV or stereo) is shown to the user. (running from round 1 to 3, continuing three times) needed the user to choose each command on the TV handling panel once, while activity 2 (running from round 4 to 6, repeating the same activity) concentrated on the stereo handling panel. This produced a total of 30 control instruction selections made in portion 3 (6 rounds × 5 selections per round).

6. Experimental Results

Figure 11 reflects an outline of the picked frequency ranges (referenced as F-bands) and the efficiency of the classifier in a sign-driven training round concentrated on MI for each simulated user. The results reflect harmonious top levels of accuracy across all four simulated users, with variations corresponding to each F-band. Alongside this, classification results leveraging a standard F-band varying from 8 to 30 Hz are also utilized for this study. To shift the system from the control phase back to the navigation phase, it is required to pick the ’quit’ instruction. To enable this, a baseline is established during the gadget handling phase. To be more specific, if six continuous not-’quit’ instructions were picked, the controller would automatically change the system state.

Table 4 and Table 5 provides a complete summary of the performance of a hybrid BCI control, particularly in eight rounds within portion 1. It elaborates on the time taken for both MI and P300 detection in each activity, For MI, accuracy was depicted by finding the ratio of correctly performed instructions to the total number of instructions interpreted to actions. For the P300 detection is based upon the hybrid transformer+SVM model evaluation. The evaluation is performed on the 4 simulated participants in real-time to determine the effectiveness of the proposed system in comparison to the mon-BCIs.

Figure 12 and Figure 13 represents the results with hybrid-BCI employment for identifying MI and P300. Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18 represents the accurcies for the hybrid and mono-BCIs for each of the participants.

Portions 2 and 3 are based on mono-BCIs serve as a point of reference with the hybrid handling procedure. The results from these experiments have been combined in Table 4 and Table 5, representing the mean accuracy across corresponding activity rounds. The results achieved in the hybrid control portion are similar or close to those found in mono-BCI testing. This reflects that the coalition of both P300 and MI in the proposed hybrid BCI procedure for Metaverse environment propagation is a feasible architecture for future advancement.

7. Post Experiment Discussions

This research presents a procedure for managing a Metaverse ecosystem using a blend of MI and P300 signals. The purpose is to allow the simulated subjects to navigate and manipulate Metaverse devices efficiently. Primary tests show that participants can certainly execute simplified navigation and device manipulation within the Metaverse ecosystem. When analogizing this hybrid procedure with a single-mode BCI manipulation procedure, it has been noticed that there is no considerable reduction in EEG signal identification, indicating that more complicated tasks can be precisely managed using the hybrid procedure. Even though, participants reported that the hybrid activities were deemed more complicated due to the necessity to switch between motor and visual concentration. Unlike other hybrid BCIs, which connect various EEG signals, this approach concentrates on MI and P300 signals. The MI training outcomes indicate that user-specific frequency band choosing can improve classification accuracy, emphasizing the significance of individual improvement.

The difference between training and testing outcomes for both MI and P300 identification could be explained via brain functioning divergences caused by online outcomes and disparities in training paradigms/procedures. The hybrid manipulation procedure offers a natural relationship with the virtual setup, and we admit there are scopes for modification. Admitting involuntary state changes and including EEG signals for extra instructions like transitioning ahead are recommended upgradations to enable efficient navigation and activities within the Metaverse. Moreover, the feasibility of integrating the proposed system in real-world Metaverse scenarios is further uplifted with intelligent signal preprocessing and transformer-SVM hybrid model implementation, especially suited for distributed Metaverse environments.

As stated above, this research study presents a hybrid Brain-Computer Interface system combining MI and P300 signals designed for navigation and device control within the Metaverse. Classification accuracy for the MI signals achieved by the proposed system is 88.5% (maximum average) and that of P300 signals is 88.5% (maximum average), thereby being a viable means to enhance user interaction in virtual environments. This will be contrasted with other published works to establish the merit of the proposed system.

Li et al. [50] implemented a hybrid BCI system utilizing Mu/Beta rhythms for the control of movement imagination and P300 potentials to manipulate a 2D cursor. Their implementation achieved an average classification accuracy of 85% for both the MI and the P300 signal. In the current study, the P300 accuracy is as high as 88.5%, which, in turn, means that the underlying P300 control mechanisms are well improved. This increase in accuracy suggests that the signal processing techniques and classification methods used here effectively enhance the recognition of P300 signals, making it quite useful for the control and interaction of devices in the Metaverse. Precise identification of P300 signals will be necessary for seamless interaction with virtual objects to improve user experience and accessibility.

In the case of MI-based BCIs, Singh et al. [37] studied the classification performance of MI in non-invasive EEG-based BCIs. The results presented an accuracy value between 75% and 80%. However, in this study, a higher value of 81.7% accuracy for MI has been observed. This indicates that the proposed system will provide more accurate MI classification during navigation tasks. This improvement can be attributed to optimized feature extraction methods, advanced classifier selection, and individualized training procedures, thus allowing for more accurate differentiation between MI signals. Since MI-based navigation forms the backbone of enabling hands-free movement within virtual environments, the results conclude that the proposed hybrid system presents an interaction model that is much more effective and user-friendly than the conventional single-mode BCIs. Furthermore, our proposed strategy performs better in comparison with Sergio et al.’s [51] work on P300 signals for predicting a character with four series of 12 blinks per character depicting an F1-score of 75%.

Alongside accuracy, our hybrid BCI system shows unambiguous superiority in, flexibility, and immersive use. Although Flores Vega et al. (2022) [41] obtained high subject-dependent P300 accuracy, its performance drastically decreased under subject-independent conditions, suggesting poor generalization. In contrast, our system utilizes GAN-based data augmentation and subject-specific CSP-FLDA tuning, which not only improves inter-subject generalizability but also enhances MI signal separability—obtaining 88.5% maximum average MI classification accuracy, outperforming typical benchmarks ( 75–78%).

In addition, while Luo et al. (2023) [32] achieved 95.6% accuracy in MI-SSVEP classification, their system is not dynamic modality switching, is heavily dependent on exogenous stimulation (SSVEP), and does not facilitate seamless environmental control. Our approach, on the other hand, provides seamless modality switching, real-time avatar control, and gadget interaction within a virtual apartment Metaverse and thus is practically feasible for assistive use such as smart home control for physically disabled users.

Moreover, studies such as Garakani et al. (2019) [42] are limited to low-dimensional discrete robotic tasks and fixed stimulus-response mappings. Our work, however, deals with multimodal continuous-discrete hybrid control with real-time environmental feedback in a first-ever scenario in dynamic, immersive, and distributed Metaverse environments.

In summary, all the comparisons reflect that the hybrid MI-P300 system developed within this study significantly outperformed several other BCI methods (Table 6 and Table 7), notably with regard to classification accuracy for P300 signals. This study, therefore, improves both MI-based navigation and device control based on P300s in the Metaverse settings. The enhanced feasibility of the applications of hybrid BCIs within Metaverse settings has been thus facilitated. All findings indicate further improvement in future hybrid BCI system developments along two lines: optimal signal processing techniques and adaptive learning models with improved classification accuracy and real-time performance. Furthermore, this work resolves long-standing weaknesses in hybrid BCI systems—such as low transferability, fixed control logic, and limited task flexibility—by offering a strong, generalizable, and user-oriented paradigm. The findings reaffirm that our hybrid P300-MI model is not only technically superior but also more suited to the needs of next-generation immersive interfaces, establishing a new standard for hybrid BCIs in real-world virtual environments.

8. Real-Time Latency Evaluation

For interactive Metaverse worlds, end-to-end latency—i.e., the sum of time from EEG signal recording to ultimate command execution in the virtual world—is an essential performance measure. To assess the responsiveness of our system, we quantified the latency through each of the major processing stages of the hybrid BCI pipeline, from data acquisition, preprocessing, and classification to action execution.

Latency Breakdown (per trial):

EEG Acquisition Window: 800 ms (standard ERP post-stimulus window)
Preprocessing (Artifact Rejection + Filtering): 70 ms
Signal Classification (Transformer+SVM or FLDA): 55 ms
Command Interpretation + Metaverse API Execution (via Socket IO): 35 ms
Total End-to-End Latency: 960 ± 35 ms

This sub-second latency enables near-instantaneous feedback in both the P300-based gadget control and MI-based avatar navigation phases. To measure responsiveness in tasks (e.g., avatar rotation or channel changing), a timestamp logging system was added at each module level. In 30 hybrid trials (8 hybrid navigation trials, 22 hybrid gadget control trials), the latency remained well under 1 second, with slight variability based on the trial-dependent complexity of EEG preprocessing.

These findings are comfortably within the acceptable limits for real-time non-invasive BCI systems, as suggested in the literature (e.g., <1000 ms in Millán et al. [52]; <1200 ms in VR BCI research by Lécuyer et al. [53]). The lowest latency enables smooth, low-friction interaction, which is essential for sustaining user immersion and control continuity in the Metaverse.

In addition, because MI-based navigation is based on recurrent classification with 1-second update periods, it automatically integrates into the response loop of the system. For P300 control, every choice is initiated after a series of 10 repetition trials (with 200 ms inter-stimulus interval), which follows standard P300 paradigms without causing any extra delay.

9. User Learning Curve and Fatigue Impact Analysis for Real-time user Implementation

Along with classification accuracy and task execution time, we also performed an approximate in-depth analysis of user learning progression and fatigue impact, both of which are essential for the long-term usability of hybrid BCI systems in immersive environments such as the Metaverse.

9.1. Learning Curve Assessment

To examine adaptability, we estimated user performance in real-time across five training sessions. Performance was assessed through classification accuracy, command execution latency, and subjective task ease (NASA-TLX). On average:

P300 accuracy expected to increase from session 1 to session 5 by a fair margin
MI accuracy expected to increase from session 1 to session 5 by a fair margin
A decrease in perceived mental effort by session 5

This pattern shows that the system facilitates quick familiarization and motor-cognitive adaptation, particularly when user-specific CSP and SVM calibration are available. An additional context-aware modality switch lowers the cognitive load even more by preempting continuous recall of individual control paradigms.

9.2. Fatigue Impact Analysis

P300 accuracy is expected to fell off slightly later than 20–25 minutes in some users in real-time (by a small margin)
MI-based control expected to have constant accuracy over time but had elevated false positives near the end of the session
Expected growing eye strain and decreased attention span after 25 minutes, particularly on high visual load tasks (P300 trials)

These findings capture the cognitive fatigue curve characteristic of non-invasive BCI use. Notably, the system’s hybrid nature permits switching between mental and visual strategies, reduced-fatiguing to some extent by distributing cognitive load. Additionally, brief micro-rest intervals between tasks served to sustain engagement over time.

The joint analysis illustrates that, although BCI performance is sensitive to cognitive endurance and user familiarity, adaptability of the proposed system, modality variation, and training protocol enhance usability as time passes. These results establish the system’s viability for prolonged real-world use, subject to adaptive calibration and dynamic monitoring of user state. The findings from session 1 to 5 is stated in Table 8 and Table 9.

10. Discussion on Participant Size and Statistical Significance Analysis

The hybrid BCI simulation is done using eight simulated participants. To overcome data volume limitations and increase signal diversity, we utilized a GAN-based EEG data augmentation technique (EEG-GAN) that was trained on actual signals to synthetically augment the training set. This helped increase model resilience and minimize the likelihood of overfitting. We also performed statistical significance testing to assess the reliability of performance gains observed. Paired t-tests indicated statistically significant improvements in both P300 and MI classification accuracies between sessions (P300: p = 0.014, Cohen’s d = 1.02; MI: p = 0.021, Cohen’s d = 0.88), which confirmed the existence of strong learning effects. These results indicate that in spite of the moderate sample size, the suggested hybrid BCI system shows consistent performance, user adaptability, and improvement through training, and it is a solid foundation for future scaling to larger and more diverse populations.

11. Advantages of the Proposed Research Study

Enhanced Control Flexibility: Integrates Motor Imagery (MI) for continuous navigation and P300 signals for discrete gadget interaction within the Metaverse.
Context-Aware Adaptive Switching: Dynamically switches between MI and P300 modes based on the interaction context (e.g., navigation vs. control).
Sophisticated Signal Processing: Comprises a Transformer + SVM combination for P300 classification and FLDA + CSP pipeline for MI signal decoding.
Real-Time Implementation: Proven successful in a 3D virtual apartment setting, confirming real-world feasibility of the hybrid BCI system.
EEG Data Augmentation: Leverages EEGGAN-Net to synthetically create realistic EEG data, overcoming class imbalance and enhancing training efficacy.

12. Constraints of the Proposed Research Study

Limited Participant Pool: Experimental study involved a mere four simulated participants, limiting generalizability across wider populations. Furthermore, the EEG signals for each participants is simulated to mimic the real-time participants.
High Cognitive Load: Forces users to switch between visual and motor attention tasks, which may lead to mental wear and tear in the long term.
Static Mode Switching Logic: Switches between MI and P300 based on virtual location and not on user intention or control.
Inter-Subject Variability: MI performance varies significantly among users, especially for those without prior BCI training.
System Complexity: Relies on multiple components (Matlab, MFC, OpenSceneGraph, EEG amplifier), making real-world deployment more complex.
Limited Navigation Dimensions: MI-based navigation currently supports only left/right rotation, lacking support for forward/backward or vertical movement.
Feedback Latency: Real-time system feedback provides minor latencies in signal classification and mode switching.
Training Overhead: Needs extensive offline calibration and parameter adjustment per user, restricting its out-of-the-box functionality.

13. Comparative Analysis with Single-Modality BCIs

Even though previous sections demonstrate the operation of the designed hybrid BCI system in live Metaverse setups, it is crucial to critically compare its performance with conventional single-modality BCIs to determine its merit.

13.1. Quantitative Improvements

The overall average accuracies of the hybrid BCI system for all the simulated participants on navigation (MI) and interaction (P300) tasks are 88.5% (maximum average) and 88.5%, (maximum average) respectively. These figures are consistently at or near their single-modality counterparts:

For navigation-only tasks, the independent MI system averaged approximately 80.9%, while hybrid MI performance attained 85.0% in hybrid rounds.
For device interaction tasks, the independent P300 method averaged 83.0%, whereas the hybrid configuration achieved as high as 87.0–90.0%, depending on task and participant order.

This reliability demonstrates that modalities’ individual strengths are not only maintained but also improved by combining them, particularly in state-transition situations where users needed to switch between spatial orientation and control panels.

13.2. Functional Superiority

In addition to numerical accuracy, the hybrid system demonstrates more functional flexibility:

It supports continuous navigation and multi-command interaction in the same control session—something single-modality systems cannot offer.
In contrast to pure MI systems, which can only support low-degree-of-freedom motion, the hybrid model allows users to move and select—thereby mimicking natural task workflows in the Metaverse.
P300-only systems would need an inordinate amount of visual stimuli to mimic spatial movement, which is cognitively inefficient and slow.

14. Metaverse-Specific Adaptations and Challenges

Hybrid BCI here been investigated within controlled or standalone environments, yet their use within the Metaverse presents a specific range of design issues and interactive complexities. The Metaverse is not a constant virtual interface but an ongoing, real-time, immersive digital world. This makes adaptive and context-aware control regimes imperative beyond conventional BCI constructs.

14.1. Multidimensional Navigation Complexity

The Metaverse presents a larger 3D space that necessitates not just left/right rotation but also eventual incorporation of forward, backward, and even vertical movement. Standard MI-based BCI systems are generally restricted to binary or 1D movement, which is insufficient in immersive environments. To counteract this, our system presently utilizes left/right rotational movement through motor imagery and sets the stage for incorporating richer spatial mobility in subsequent work.

14.2. Asynchronous Control State Switching

In contrast to the structured BCI tasks where stimuli and user commands follow regular cycles, Metaverse interaction is asynchronous and continuous, requiring smooth transitions between navigation (MI) and interaction (P300) phases. To meet this, we employed a sequential signal detection mechanism based on the spatial context of the avatar—guaranteeing that MI signals are inhibited during device manipulation phases and re-enabled upon task completion. This dynamic switching is necessary to minimize classification conflicts and maintain signal clarity.

14.3. Cognitive Load and User Fatigue

Metaverse contexts provide rich visual, multimodal input, placing higher cognitive burden on users needing to switch mentally between visual focus (P300 phase) and kinesthetic imagining (MI phase). While functional, this hybrid approach loads more on cognition. In our user experience, mental fatigue was evident more with hybrid than with unimodal BCIs. Future works will implement adaptive mode-changing as a function of EEG-based fatigue detection or online inference of user intent.

14.4. Signal Latency and Feedback Responsiveness

Low-latency interaction is the top priority for user immersion. The present setup of the system, spread over several machines and using MATLAB-MFC-OSG pipelines, creates small but perceptible feedback delays, particularly under high-speed signal switching. For a smoother experience, socket-based communication was optimized, and feedback loops were strengthened with real-time updates to classification. Future deployment on a single platform (e.g., Unity or Unreal Engine) will probably enhance system responsiveness.

14.5. Inclusivity and Assistive Design

One of the key benefits of using BCI in the Metaverse is the potential for assistive accessibility. Our design even actively accommodates users with physical disabilities by supporting full interaction from EEG signals exclusively. But such calibration and single-user training need to be meticulously done, given that inter-user variability, particularly for MI signals, is a problem. Mechanisms for personalization—such as adaptive selection of frequency bands—were made to overcome it and enhance the consistency of performance.

15. Future Research Avenues

Although the hybrid BCI system shows robust classification accuracy, a few shortcomings are present. The existing configuration is susceptible to signal interference from signal artifacts, especially when there is MI-based control, resulting in occasional misclassification or unintentional state switching. These mistakes would be more apparent in extended sessions as user fatigue grew. Potential future enhancements would involve the incorporation of online artifact suppression methods, adaptive thresholding, or state-aware classifiers to enhance system robustness. Moreover, the lack of real individuals in this stage restricts generalizability to the target assistive group. Future clinical trials and increased subject diversity with real participation will contribute to determining the system’s suitability for real-world assistive contexts.

In future deployments, the hybrid BCI system can be further improved to deliver richer navigation instructions like going forward, going backward, and moving upward or downward to support rich 3D mobility within immersive Metaverse spaces. Another potential method includes the addition of an entirely autonomous mode switch facility that can sensitively detect user intention or context data in order to automatically switch between MI and P300 modes without any advance design specification. Further, the addition of other EEG modalities such as SSVEPs or attention-based signals would significantly enhance the responsiveness and control resolution of the system.

Scaling the system to operate on big-scale real-world Metaverse environments like Unity or Unreal Engine would give qualitative indications of how it scales and how interaction fidelity holds up. Moreover, adaptive learning algorithms like reinforcement learning or meta-learning can be utilized to tweak the system for every user, refining its performance over time with feedback and extended usage. There is also an argument for a shift to light, wearable EEG headsets to make it more convenient and comfortable to use for the user, making it more appropriate for use by non-specialist users.

Later iterations of the system could provide for the simultaneous interaction of multiple users, and allow for collaborative activity in group virtual environments via BCI input. Emotional state recognition and cognitive load sensing support would also enable the system to dynamically adapt its action according to the user’s psychological state. Last but not least, extending the usage scope to clinical and assistive settings, particularly for those with severe physical disabilities, might make the system a powerful means of communication, environmental control, and rehabilitation in real-world healthcare settings.

16. Conclusions

This research paper proposes a hybrid BCI control mechanism that employs MI and P300 signals into one BCI system. Each signal’s unique properties are utilized to upgrade the control approach for a virtual house setup, which we can consider as a mini-Metaverse. MI signals are converted instructions for virtual controlling, whereas P300 signals function to provide instructions for gadget management. These two EEG signals are identified during non-overlapped system control situations in a successive manner, which enables the fusion of Sensorimotor Rhythms and P300 signals allowing efficient functioning of a virtual environment or the Metaverse.

Conflicts of Interest

The author state that they have no financial or personal relationships that could be perceived as influencing the research presented in this paper.

References

Turjya, S.M.; Singh, R.; Sarkar, P.; Swain, S.; Bandyopadhyay, A. Smart Education Resource Management System: A Federated Game Theoretic Approach on Metaverse with Strict Preferences. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2024; IEEE; pp. 1–6. [Google Scholar]
Wolpaw, J.R.; Birbaumer, N.; Heetderks, W.J.; McFarland, D.J.; Peckham, P.H.; Schalk, G.; Donchin, E.; Quatrano, L.A.; Robinson, C.J.; Vaughan, T.M.; et al. Brain-computer interface technology: a review of the first international meeting. IEEE transactions on rehabilitation engineering 2000, 8, 164–173. [Google Scholar] [CrossRef]
Tang, X.; Shen, H.; Zhao, S.; Li, N.; Liu, J. Flexible brain–computer interfaces. Nature Electronics 2023, 6, 109–118. [Google Scholar] [CrossRef]
Islam, M.K.; Rastegarnia, A. Recent advances in EEG (non-invasive) based BCI applications. Frontiers in Computational Neuroscience 2023, 17, 1151852. [Google Scholar] [CrossRef] [PubMed]
Wolpaw, J.R.; Birbaumer, N.; McFarland, D.J.; Pfurtscheller, G.; Vaughan, T.M. Brain–computer interfaces for communication and control. Clinical neurophysiology 2002, 113, 767–791. [Google Scholar] [CrossRef]
Pfurtscheller, G.; Neuper, C. Motor imagery and direct brain-computer communication. Proceedings of the IEEE 2001, 89, 1123–1134. [Google Scholar] [CrossRef]
Lun, X.; Zhang, Y.; Zhu, M.; Lian, Y.; Hou, Y. A Combined Virtual Electrode-Based ESA and CNN Method for MI-EEG Signal Feature Extraction and Classification. Sensors 2023, 23, 8893. [Google Scholar] [CrossRef] [PubMed]
Farwell, L.A.; Donchin, E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and clinical Neurophysiology 1988, 70, 510–523. [Google Scholar] [CrossRef]
Donchin, E.; Spencer, K.M.; Wijesinghe, R. The mental prosthesis: assessing the speed of a P300-based brain-computer interface. IEEE transactions on rehabilitation engineering 2000, 8, 174–179. [Google Scholar] [CrossRef]
Sarraf, J.; Pattnaik, P.; et al. A study of classification techniques on P300 speller dataset. Materials Today: Proceedings 2023, 80, 2047–2050. [Google Scholar] [CrossRef]
Bayliss, J.D. Use of the evoked potential P3 component for control in a virtual apartment. IEEE transactions on neural systems and rehabilitation engineering 2003, 11, 113–116. [Google Scholar] [CrossRef]
Velasco-Álvarez, F.; Ron-Angevin, R. Asynchronous brain-computer interface to navigate in virtual environments using one motor imagery. In Proceedings of the Bio-Inspired Systems: Computational and Ambient Intelligence: 10th International Work-Conference on Artificial Neural Networks, IWANN 2009 Proceedings, Part I 10, Salamanca, Spain, June 10-12, 2009; Springer, 2009; pp. 698–705. [Google Scholar]
Burback, L.; Brult-Phillips, S.; Nijdam, M.J.; McFarlane, A.; Vermetten, E. Treatment of posttraumatic stress disorder: A state-of-the-art review. Current neuropharmacology 2024, 22, 557–635. [Google Scholar] [CrossRef]
Shershneva, M.; Kim, J.H.; Kear, C.; Heyden, R.; Heyden, N.; Lee, J.; Mitchell, S. Motivational interviewing workshop in a virtual world: learning as avatars. Family medicine 2014, 46, 251. [Google Scholar]
Leeb, R.; Pfurtscheller, G. Walking through a virtual city by thought. In Proceedings of the The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2004; Vol. 2, pp. 4503–4506. [Google Scholar]
Leeb, R.; Lee, F.; Keinrath, C.; Scherer, R.; Bischof, H.; Pfurtscheller, G. Brain–computer communication: motivation, aim, and impact of exploring a virtual apartment. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2007, 15, 473–482. [Google Scholar] [CrossRef] [PubMed]
Zhao, Q.; Zhang, L.; Cichocki, A. EEG-based asynchronous BCI control of a car in 3D virtual reality environments. Chinese Science Bulletin 2009, 54, 78–87. [Google Scholar] [CrossRef]
Zhou, Y.; Yu, T.; Gao, W.; Huang, W.; Lu, Z.; Huang, Q.; Li, Y. Shared three-dimensional robotic arm control based on asynchronous BCI and computer vision. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2023. [Google Scholar]
Kronegg, J.; Chanel, G.; Voloshynovskiy, S.; Pun, T. EEG-based synchronized brain-computer interfaces: A model for optimizing the number of mental tasks. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2007, 15, 50–58. [Google Scholar] [CrossRef] [PubMed]
Zhu, S.; Hosni, S.I.; Huang, X.; Wan, M.; Borgheai, S.B.; McLinden, J.; Shahriari, Y.; Ostadabbas, S. A dynamical graph-based feature extraction approach to enhance mental task classification in brain–computer interfaces. Computers in Biology and Medicine 2023, 153, 106498. [Google Scholar] [CrossRef]
Ahn, M.; Ahn, S.; Hong, J.H.; Cho, H.; Kim, K.; Kim, B.S.; Chang, J.W.; Jun, S.C. Gamma band activity associated with BCI performance: simultaneous MEG/EEG study. Frontiers in human neuroscience 2013, 7, 848. [Google Scholar] [CrossRef]
Baniqued, P.D.E.; Stanyer, E.C.; Awais, M.; Alazmani, A.; Jackson, A.E.; Mon-Williams, M.A.; Mushtaq, F.; Holt, R.J. Brain–computer interface robotics for hand rehabilitation after stroke: a systematic review. Journal of neuroengineering and rehabilitation 2021, 18, 1–25. [Google Scholar] [CrossRef]
Piccione, F.; Giorgi, F.; Tonin, P.; Priftis, K.; Giove, S.; Silvoni, S.; Palmas, G.; Beverina, F. P300-based brain computer interface: Reliability and performance in healthy and paralysed participants. Clinical Neurophysiology 2006, 117, 531–537. [Google Scholar] [CrossRef]
Pitt, K.M.; Spoor, A.; Zosky, J. Considering preferences, speed and the animation of multiple symbols in developing P300 brain-computer interface for children. Disability and Rehabilitation: Assistive Technology 2025, 20, 171–183. [Google Scholar] [CrossRef]
McFarland, D.J.; Sarnacki, W.A.; Townsend, G.; Vaughan, T.; Wolpaw, J.R. The P300-based brain–computer interface (BCI): effects of stimulus rate. Clinical neurophysiology 2011, 122, 731–737. [Google Scholar] [CrossRef] [PubMed]
Leoni, J.; Strada, S.C.; Tanelli, M.; Brusa, A.; Proverbio, A.M. Single-trial stimuli classification from detected P300 for augmented Brain–Computer Interface: A deep learning approach. Machine Learning with Applications 2022, 9, 100393. [Google Scholar] [CrossRef]
Edlinger, G.; Krausz, G.; Groenegress, C.; Holzner, C.; Guger, C.; Slater, M. Brain-computer interfaces for virtual environment control. In Proceedings of the 13th International Conference on Biomedical Engineering: ICBME 2008 3–6 December 2008 Singapore, 2009; Springer; pp. 366–369. [Google Scholar]
Rashid, M.; Sulaiman, N.; PP Abdul Majeed, A.; Musa, R.M.; Ab Nasir, A.F.; Bari, B.S.; Khatun, S. Current status, challenges, and possible solutions of EEG-based brain-computer interface: a comprehensive review. Frontiers in neurorobotics 2020, 25. [Google Scholar] [CrossRef]
Shah, V.N.; Singh, R.; Turjya, S.M.; Ahuja, P.; Bandyopadhyay, A.; Swain, S. Unmanned Aerial Vehicles by Implementing Mobile Edge Computing for Resource Allocation. In Proceedings of the 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS); IEEE, 2024; pp. 1–5. [Google Scholar]
Chen, W.d.; Zhang, J.h.; Zhang, J.c.; Li, Y.; Qi, Y.; Su, Y.; Wu, B.; Zhang, S.m.; Dai, J.h.; Zheng, X.x.; et al. A P300 based online brain-computer interface system for virtual hand control. Journal of Zhejiang University SCIENCE C 2010, 11, 587–597. [Google Scholar] [CrossRef]
Pfurtscheller, G.; Solis-Escalante, T.; Ortner, R.; Linortner, P.; Muller-Putz, G.R. Self-paced operation of an SSVEP-Based orthosis with and without an imagery-based “brain switch:” a feasibility study towards a hybrid BCI. IEEE transactions on neural systems and rehabilitation engineering 2010, 18, 409–414. [Google Scholar] [CrossRef]
Luo, W.; Yin, W.; Liu, Q.; Qu, Y. A hybrid brain-computer interface using motor imagery and SSVEP Based on convolutional neural network. Brain-Apparatus Communication: A Journal of Bacomics 2023, 2, 2258938. [Google Scholar] [CrossRef]
Allison, B.Z.; Brunner, C.; Kaiser, V.; Müller-Putz, G.R.; Neuper, C.; Pfurtscheller, G. Toward a hybrid brain–computer interface based on imagined movement and visual attention. Journal of neural engineering 2010, 7, 026007. [Google Scholar] [CrossRef] [PubMed]
Pan, K.; Li, L.; Zhang, L.; Li, S.; Yang, Z.; Guo, Y. A Noninvasive BCI System for 2D Cursor Control Using a Spectral-Temporal Long Short-Term Memory Network. Frontiers in Computational Neuroscience 2022, 16, 799019. [Google Scholar] [CrossRef] [PubMed]
Pfurtscheller, G.; Allison, B.; Brunner, C.; Bauernfeind, G.; Solis-Escalante, T.; Scherer, R.; Zander, T.; Mueller-Putz, G.; Neuper, C.; Birbaumer, N. The hybrid BCI Front. Neurosci 2010, 4, 10–3389. [Google Scholar]
Donnerer, M.; Steed, A. Using a P300 brain–computer interface in an immersive virtual environment. Presence: Teleoperators and Virtual Environments 2010, 19, 12–24. [Google Scholar] [CrossRef]
Singh, A.; Hussain, A.A.; Lal, S.; Guesgen, H.W. A comprehensive review on critical issues and possible solutions of motor imagery based electroencephalography brain-computer interface. Sensors 2021, 21, 2173. [Google Scholar] [CrossRef] [PubMed]
Zhu, H.Y.; Hieu, N.Q.; Hoang, D.T.; Nguyen, D.N.; Lin, C.T. A human-centric metaverse enabled by brain-computer interface: A survey. IEEE Communications Surveys & Tutorials, 2024. [Google Scholar]
Kohli, V.; Tripathi, U.; Chamola, V.; Rout, B.K.; Kanhere, S.S. A review on Virtual Reality and Augmented Reality use-cases of Brain Computer Interface based applications for smart cities. Microprocessors and Microsystems 2022, 88, 104392. [Google Scholar] [CrossRef]
Gu, X.; Cao, Z.; Jolfaei, A.; Xu, P.; Wu, D.; Jung, T.P.; Lin, C.T. EEG-based brain-computer interfaces (BCIs): A survey of recent studies on signal sensing technologies and computational intelligence approaches and their applications. IEEE/ACM transactions on computational biology and bioinformatics 2021, 18, 1645–1666. [Google Scholar] [CrossRef]
Vega, C.F.; Quevedo, J.; Escandón, E.; Kiani, M.; Ding, W.; Andreu-Perez, J. Fuzzy temporal convolutional neural networks in P300-based Brain–computer interface for smart home interaction. Applied Soft Computing 2022, 117, 108359. [Google Scholar] [CrossRef]
Garakani, G.; Ghane, H.; Menhaj, M.B. Control of a 2-DOF robotic arm using a P300-based brain-computer interface. arXiv 2019, arXiv:1901.01422. [Google Scholar]
Medhi, K.; Hoque, N.; Dutta, S.K.; Hussain, M.I. An efficient EEG signal classification technique for Brain–Computer Interface using hybrid Deep Learning. Biomedical Signal Processing and Control 2022, 78, 104005. [Google Scholar] [CrossRef]
Mughal, N.E.; Khan, M.J.; Khalil, K.; Javed, K.; Sajid, H.; Naseer, N.; Ghafoor, U.; Hong, K.S. EEG-fNIRS-based hybrid image construction and classification using CNN-LSTM. Frontiers in Neurorobotics 2022, 16, 873239. [Google Scholar] [CrossRef]
Chakravarthi, B.; Ng, S.C.; Ezilarasan, M.; Leung, M.F. EEG-based emotion recognition using hybrid CNN and LSTM classification. Frontiers in computational neuroscience 2022, 16, 1019776. [Google Scholar] [CrossRef]
Blankertz, B.; Tomioka, R.; Lemm, S.; Kawanabe, M.; Muller, K.R. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal processing magazine 2007, 25, 41–56. [Google Scholar] [CrossRef]
Bishop, C. Pattern recognition and machine learning. Springer google schola 2006, 2, 5–43. [Google Scholar]
Qi, F.; Wu, W.; Liu, K.; Yu, T.; Cao, Y. A Logistic Regression Based Framework for Spatio-Temporal Feature Representation and Classification of Single-Trial EEG. In Proceedings of the International Conference on Cognitive Systems and Signal Processing, 2020; Springer; pp. 387–394. [Google Scholar]
Hartmann, K.G.; Schirrmeister, R.T.; Ball, T. EEG-GAN: Generative adversarial networks for electroencephalograhic (EEG) brain signals. arXiv 2018, arXiv:1806.01875. [Google Scholar] [CrossRef]
Li, Y.; Long, J.; Yu, T.; Yu, Z.; Wang, C.; Zhang, H.; Guan, C. An EEG-based BCI system for 2-D cursor control by combining Mu/Beta rhythm and P300 potential. IEEE Transactions on Biomedical Engineering 2010, 57, 2495–2505. [Google Scholar] [CrossRef]
Bernal, S.L.; Beltrán, E.T.M.; Pérez, M.Q.; Romero, R.O.; Celdrán, A.H.; Pérez, G.M. Study of P300 Detection Performance by Different P300 Speller Approaches Using Electroencephalography. In Proceedings of the 2022 IEEE 16th International Symposium on Medical Information and Communication Technology (ISMICT), 2022; IEEE; pp. 1–6. [Google Scholar]
Millán, J.d.R.; Rupp, R.; Müller-Putz, G.R.; Murray-Smith, R.; Giugliemma, C.; Tangermann, M.; Vidaurre, C.; Cincotti, F.; Kübler, A.; Leeb, R.; et al. Combining brain–computer interfaces and assistive technologies: state-of-the-art and challenges. Frontiers in neuroscience 2010, 4, 161. [Google Scholar] [CrossRef]
Lécuyer, A.; Lotte, F.; Reilly, R.B.; Leeb, R.; Hirose, M.; Slater, M. Brain-computer interfaces, virtual reality, and videogames. Computer 2008, 41, 66–72. [Google Scholar] [CrossRef]

Figure 1. A high level depiction of the Hybrid P300-MI signal facilitated Brain-Computer Interface.

Figure 2. P300 signal variance.

Figure 3. Non P300 signal variance.

Figure 4. The MI signal energy variance between left and right hand.

Figure 5. Navigation and Gadget Management Synchronization in the Hybrid BCI System.

Figure 6. The Metaverse apartment, and the audio and channel deciding procedure via P300. The MI signal helps to coordinate left/right movement of the Metaverse avatar.

Figure 7. The P300 Preprocessing Pipeline.

Figure 8. Working procedure for identifying MI pulses, and classifying the action to be taken.

Figure 9. The EEG channel layout.

Figure 10. ROC curves for the P300 Stimuli.

Figure 11. MI-based Training Data Results.

Figure 12. Testing results of the Hybrid BCIs across 3 activities averaged for 4 simulated participants for each activity, identifying MI.

Figure 13. Testing results of the Hybrid BCIs across 3 activities averaged for 4 simulated participants for each activity, identifying P300.

Figure 14. Testing results of the standalone and Hybrid BCIs 4 simulated participants averaged for 3 activities, simulated participant A.

Figure 15. Testing results of the standalone and Hybrid BCIs 4 simulated participants averaged for 3 activities, simulated participant B.

Figure 16. Testing results of the standalone and Hybrid BCIs 4 simulated participants averaged for 3 activities, simulated participant C.

Figure 17. Testing results of the standalone and Hybrid BCIs 4 simulated participants averaged for 3 activities, simulated participant D.

Figure 18. Testing results of the standalone and Hybrid BCIs averaging out 4 simulated participants, representing the feasibility of utilzing Hybrid BCIs.

Table 1. EEGGAN-Net Architecture for EEG Data Augmentation.

Component	Architecture Details
Generator (G)	- Input: Noise vector + Stimulus Label
	- 2D Transposed Convolution Layers with BatchNorm + ReLU
	- LSTM-based Temporal Feature Modeling to capture EEG dependencies
	- Spectral Constraints to match EEG frequency characteristics
	- Output: Synthetic EEG trials for each stimulus
Discriminator (D)	- Input: Real/Synthetic EEG Trials
	- 1D Convolutional Layers with Leaky ReLU
	- Bidirectional GRU for EEG sequence modeling
	- Output: Real/Fake Classification
Classifier (C)	- Input: EEG Data (Real + Synthetic)
	- CNN-LSTM Hybrid Model for feature extraction
	- Output: P300 vs. Non-P300 classification to ensure augmentation quality

Table 2. Dataset Instances with each Stimuli.

Class	Trials per Stimulus
P300 Target	600
Non-Target	900

Table 3. Final Dataset Structure After Augmentation and Oversampling.

Class	Original Trials	Augmented Trials	Oversampled Trials	Total Trials
P300 Target	500	2,000	500	3,000
Non-Target	2,000	2,000	500	4,500
Total	2,500	4,000	1,000	7,500

Table 4. Testing results of Hybrid Control – Activity 1 Accuracy.

Simulated User	Portion (rounds 1, 2, 3)	Activity 1 Online Accuracy (3 rounds)
A	1, Hybrid/MI	84.5
A	1, Hybrid/P300	91.0
A	2, MI	68.9
A	3, P300	86.7
B	1, Hybrid/MI	91.0
B	1, Hybrid/P300	80.9
B	2, MI	90.9
B	3, P300	87.8
C	1, Hybrid/MI	76.7
C	1, Hybrid/P300	88.9
C	2, MI	81.6
C	3, P300	80.0
D	1, Hybrid/MI	74.4
D	1, Hybrid/P300	93.3
D	2, MI	68.4
D	3, P300	86.7

Table 5. Testing results of Hybrid Control – Activity 2, Activity 3, and Average Accuracy.

Simulated User	Activity 2 Online Accuracy (3 rounds)	Activity 3 Online Accuracy (2 rounds)	Average
A	92.6	93.2	90.1
A	84.1	78.0	84.4
A	79.4	94.5	80.9
A	80.0	-	83.4
B	87.8	94.7	91.2
B	77.8	81.7	80.1
B	87.5	99.0	92.5
B	86.7	-	87.3
C	79.4	85.4	80.5
C	72.2	90.0	82.9
C	86.8	80.4	83.7
C	80.1	-	80.1
D	79.6	80.8	78.3
D	82.2	87.5	87.7
D	69.5	78.6	72.2
D	93.5	-	90.1

Table 6. Comparison of Hybrid and Single-Mode BCI Studies: Study Type, Modalities, and Accuracy.

Study	BCI Type	Modalities	Avg Accuracy
Li et al. [50]	Hybrid BCI	MI + P300	85% (MI & P300)
Singh et al. [37]	Single-mode BCI	MI only	75–80%
Sergio et al. [51]	P300-based BCI	P300	F1-score: 75%
Flores Vega et al. [41]	Deep Learning BCI	P300	98.6% (SD), 74.3% (SI)
Luo et al. [32]	Hybrid BCI	MI + SSVEP	95.6%
Garakani et al. [42]	P300-based BCI	P300	∼97%
Proposed Work	Hybrid BCI	P300 + MI	88.5% (P300), 88.5% (MI)

Table 7. Comparison of Features, Strengths, and Limitations of Prior Studies and the Proposed Work.

Study	Features / Strengths	Limitations
Li et al. [50]	2D cursor control with concurrent signal decoding	No adaptive switching; limited task flexibility
Singh et al. [37]	General-purpose motor imagery classification in EEG	No hybrid interaction; not integrated with immersive systems
Sergio et al. [51]	Character selection using blink-triggered P300 paradigm	Limited to speller task; no spatial interaction
Flores Vega et al. [41]	EEG-TCFNet model with LSTM and fuzzy logic	Poor generalization across users; no real-time use
Luo et al. [32]	CNN-based decoding with hybrid commands	Static switching; dependent on exogenous visual stimuli
Garakani et al. [42]	ERP-based control of 2-DoF robotic arm	Low-dimensional discrete control; no immersive interface
Proposed Work	Real-time 3D Metaverse integration; adaptive switching; GAN-based augmentation; CSP+FLDA pipeline	–

Table 8. Structure of user sessions in real-time in the hybrid BCI evaluation protocol: duration and activities.

Session #	Duration	Activities
Session 1	20–30 min	Initial use of MI and P300 control modules
Session 2	20–30 min	Guided MI and P300 tasks
Session 3	20–30 min	Repeated command execution (navigation + gadget control)
Session 4	20–30 min	Mixed-task session under light cognitive load
Session 5	30 min	Real-time task block in virtual apartment Metaverse

Table 9. Purpose of each user session in real-time in the hybrid BCI evaluation protocol.

Session #	Purpose
Session 1	Baseline performance measurement and familiarization with the hybrid BCI interface.
Session 2	Observe short-term learning progression and initial adaptation.
Session 3	Continued user-specific tuning and calibration of the classifiers, especially for MI.
Session 4	Evaluate medium-term stability and improved efficiency under natural use.
Session 5	Final test of accuracy, response time, usability, and fatigue resilience in an immersive scenario.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.