1. Introduction
Brain-computer interfaces (BCIs) enable direct communication between the human brain and external devices. As technology advances, the range of applications for BCIs spans from medical rehabilitation to enhanced human-computer interaction and entertainment [1]. One common method to control a BCI is the motor imagery (MI) paradigm. MI BCIs are particularly utilized as a rehabilitation strategy for post-stroke patients, aiding the recovery of affected limbs [2,3]. During MI, the user imagines the movement of a body part without actual physical execution. This process of imagination shares neural mechanisms with actual execution [4], which makes MI BCIs especially suited for motor recovery in chronic stroke patients. Specifically, during MI, the power of the μ and β rhythms measured over the sensorimotor area of the brain decreases (event-related desynchronization, ERD) and recovers after MI (event-related synchronization, ERS).
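The paper does not spell out how ERD/ERS is quantified; the standard definition, reproduced here for context (it is not the authors' own notation), expresses it as the relative band-power change with respect to a reference interval preceding the cue:

\mathrm{ERD/ERS}(t) = \frac{P(t) - P_{\mathrm{ref}}}{P_{\mathrm{ref}}} \times 100\%

where P(t) is the band power of the μ or β rhythm at time t and P_ref is the mean band power in the reference interval; negative values correspond to ERD, positive values to ERS.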
BCIs in general, and MI BCIs in particular, are (initially) difficult to operate for inexperienced users as they rely on the endogenous modulation of brain rhythms [5] instead of external stimuli. Consequently, a large percentage of users are not able to control a BCI, a problem known as BCI inefficiency [6,7]. The authors of [6,7] split BCI users into different groups based on their performance and recommend different solutions for each group to improve their performance. These recommendations include using a better decoder (one that is able to extract more complex features), employing adaptive decoders to counteract distribution shifts, and providing longer or better user training.
User learning is especially important as there are users who exhibit promising brain modulation during the initial screening but then fail to elicit the proper signals during MI [7]. In other words, among the users considered inefficient, there are users who have the potential to efficiently control a BCI. Encouragingly, research indicates that BCI usage is a learnable skill [8,9,10,11,12,13] and that users are able to improve their performance and brain modulation [14] through longitudinal training. A crucial part of learning or mastering any skill is the guidance and feedback received during or after execution [15]. BCIs that provide feedback to the user are referred to as closed-loop or online BCIs, whereas BCIs without feedback are termed open-loop or offline BCIs. Apart from enabling self-regulation and user learning, closed-loop BCIs also increase the attention and motivation of participants during BCI usage [12,16]. In short, delivering feedback is an indispensable cornerstone of BCIs.
To date, closed-loop systems mostly employ traditional methods such as Common Spatial Patterns (CSP) combined with Linear Discriminant Analysis (LDA) or Support Vector Machine (SVM) classifiers [17]. This stands in contrast to the trends in single-trial/offline decoding, where deep learning (DL) has largely overtaken traditional methods [18]. Deep learning models, especially convolutional neural networks (CNNs), achieve superior performance by implicitly learning complex discriminative features directly from the data. However, how this superior offline performance of deep learning models can be translated to online decoding is almost entirely unaddressed in the literature.
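For concreteness, the traditional pipeline referenced above can be assembled in a few lines. The snippet below is a generic illustration using MNE-Python and scikit-learn on placeholder data; it is not the pipeline of [17] or of this work.

# Minimal sketch of a traditional CSP + LDA motor imagery decoder (illustrative only).
# X: band-pass filtered EEG epochs, shape (n_trials, n_channels, n_samples); y: class labels.
import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 22, 500))  # placeholder data: 80 trials, 22 channels, 500 samples
y = rng.integers(0, 2, size=80)         # placeholder binary labels (e.g., left vs. right hand MI)

clf = make_pipeline(CSP(n_components=6, log=True), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)  # offline estimate; online use applies the fitted
print(scores.mean())                       # pipeline to short sliding windows instead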
The studies in [19,20] validated the general feasibility of deep learning for online control by employing long windows, as in single-trial decoding, to control a robotic arm. In [19] the arm was moved at the end of the trial, whereas the approach of [20] is closer to continuous online control as sliding windows with a length of 4 s are used. While both studies prove that deep-learning-based decoders are generally useful for control, their settings are not suited for continuous feedback as the window size is too long and the update frequency too low.
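To make explicit why window length and shift determine the feedback granularity, the short helper below computes the number of decoder outputs per trial and the resulting update rate; the durations used here are assumed example values, not figures from the cited studies.

# Illustrative arithmetic only; trial, window and shift durations are assumed example values.
def windows_per_trial(trial_s: float, window_s: float, shift_s: float) -> int:
    """Number of sliding windows that fit into one trial."""
    return int(round((trial_s - window_s) / shift_s)) + 1

trial_s, window_s, shift_s = 4.0, 1.0, 0.04           # 4 s trial, 1 s window, 40 ms shift
print(windows_per_trial(trial_s, window_s, shift_s))  # -> 76 windows per trial
print(1.0 / shift_s)                                  # -> feedback updated at 25 Hz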
In other studies [21,22,23], short sliding windows (at most 1 s in length) were used to continuously control virtual reality feedback or the position of a cursor. The authors of [21] developed a new CNN, while [22,23] used modified versions of ShallowNet [24] and EEGNet [25] as decoding architectures, which are among the most popular offline decoding models.
We hypothesize that there are two primary factors contributing to the limited literature on DL models for closed-loop decoding. Firstly, as DL models are mostly developed to classify whole trials, it is unclear how to properly use them for shorter sequences (e.g., sliding windows). Secondly, DL models require substantial amounts of training data, and since closed-loop decoders are typically trained individually for each subject, the burden of offline calibration needed for each subject would be overwhelming.
We solve the first problem by proposing a new method called real-time adaptive pooling (RAP), which modifies existing offline deep learning models towards online decoding. As RAP tailors the deep learning model to the specific online decoding requirements (short window size, high update frequency), our model is able to decode multiple consecutive windows at once, which reduces the computational demand by a large factor. RAP allows re-using intermediate outputs of the network and therefore effectively exploits the continuous and overlapping nature of sliding windows. This is important because although sliding windows enable continuous control, they introduce a larger computational demand per trial: for short windows and high update frequencies, i.e., a high overlap between consecutive windows, the number of windows per trial increases. If each window were decoded individually, this would result in more forward passes and consequently a larger computational demand per trial.
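One way to picture the joint decoding that RAP enables is to choose the temporal pooling of a CNN such that each position of the final feature map corresponds to one sliding window. The sketch below illustrates this idea in PyTorch with assumed layer sizes, sampling rate, window length (1 s) and shift (0.2 s); it is a minimal toy model, not the authors' BaseNet or their exact RAP implementation.

# Toy sketch of joint window decoding via pooling (assumed values, not the paper's RAP code).
# If the temporal pooling kernel spans one window and its stride equals the window shift,
# one forward pass over a whole trial yields one prediction per overlapping window,
# re-using all shared intermediate activations instead of one forward pass per window.
import torch
import torch.nn as nn

fs = 250                        # assumed sampling rate (Hz)
n_channels, n_classes = 22, 2   # assumed montage and number of classes

feature_extractor = nn.Sequential(                        # shallow temporal + spatial convolution
    nn.Conv2d(1, 8, kernel_size=(1, 25), padding=(0, 12)),
    nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),
    nn.BatchNorm2d(16),
    nn.ELU(),
)
window_pool = nn.AvgPool2d(kernel_size=(1, fs), stride=(1, fs // 5))  # 1 s window, 0.2 s shift
classifier = nn.Conv2d(16, n_classes, kernel_size=(1, 1))             # per-window logits

trial = torch.randn(1, 1, n_channels, 4 * fs)   # one 4 s trial
feats = feature_extractor(trial)                # computed once, shared by all windows
logits = classifier(window_pool(feats))         # shape (1, n_classes, 1, 16): 16 windows at once
print(logits.shape[-1], "windows decoded in a single forward pass")

Decoding the same 16 overlapping windows individually would require 16 forward passes over largely identical data, which is exactly the redundancy that joint decoding avoids.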
The second limiting factor mentioned above is the large amount of training data required. The straightforward solution of simply recording more calibration data for each subject is time-consuming and would burden, fatigue or bore the user over time due to the non-interactive (feedback-free) recording procedure. Consequently, even if one opts to record extensive calibration data per subject, the quality of the recorded data would likely be compromised.
An alternative approach to tackle this issue is to leverage existing data collected from other subjects to train a cross-subject decoder. As the number of subjects increases, the amount of data required per individual subject decreases. Additionally, a cross-subject decoder is able to immediately provide feedback to facilitate user learning without the need for an open-loop calibration phase.
However, although deep learning models are able to generalize across domains to a certain extent when trained on multiple domains, cross-subject models still underperform their within-subject counterparts (provided there is enough subject-specific data). This is due to the domain shift between training and test data, which arises from the different EEG patterns exhibited by different individuals. Solutions mitigating such shifts are categorized as domain adaptation methods. In the context of domain adaptation, the training data is often referred to as the source data/domain and the test data as the target data/domain. Depending on the setting and the availability of target data, different solutions such as supervised few-shot learning, unsupervised domain adaptation (UDA) and online test-time adaptation (OTTA) are possible.
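As one example of an unsupervised adaptation step that requires no target labels, Euclidean Alignment whitens each subject's trials with the inverse square root of that subject's mean spatial covariance, so that all subjects share an identity mean covariance. An alignment of this kind appears as "EA" in the result tables; since its implementation is not described in this section, the snippet below is a generic reference sketch rather than the authors' code.

# Generic sketch of Euclidean Alignment (EA) for cross-subject EEG decoding (not the paper's code).
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials: np.ndarray) -> np.ndarray:
    """Align one subject's trials, shape (n_trials, n_channels, n_samples), without any labels."""
    covs = np.stack([x @ x.T / x.shape[-1] for x in trials])  # per-trial spatial covariance
    r_mean = covs.mean(axis=0)                                # subject-specific mean covariance
    r_inv_sqrt = fractional_matrix_power(r_mean, -0.5).real   # whitening matrix R^{-1/2}
    return np.stack([r_inv_sqrt @ x for x in trials])         # aligned trials

In an online setting, the mean covariance can be estimated incrementally from the windows observed so far, which makes such label-free alignment compatible with OTTA.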
In this work we investigate how different domain adaptation techniques can be used to adapt a pre-trained cross-subject model towards a specific target subject. Compared to the typically employed within-subject decoders that are trained once on subject-specific training data, our method has multiple advantages. It can be 1) calibration-free and is therefore able to immediately provide feedback to facilitate user learning without a dedicated calibration session for the target user. Through the usage of a cross-subject model we also 2) eliminate the risk of building a subject-specific decoder on bad data, which would potentially hamper subsequent user learning [26]. Further, the domain adaptation part of our framework allows the model to 3) evolve from a generic model towards a user-specific one. Through the continuous adaptation during OTTA, the decoder can also adapt to behavioral changes within the user, which enables mutual learning of user and decoder.
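To illustrate what such continuous adaptation during OTTA can look like, the sketch below updates the BatchNorm statistics of a pre-trained model with each incoming, unlabeled window before predicting, in the spirit of the AdaBN variants listed in the result tables. The adaptation schedule and the names basenet and window_stream are assumptions for illustration, not the authors' exact procedure.

# Sketch of AdaBN-style online test-time adaptation (assumed schedule and variable names).
import torch
import torch.nn as nn

@torch.no_grad()
def adabn_step(model: nn.Module, window: torch.Tensor) -> torch.Tensor:
    """Update BatchNorm running statistics with one unlabeled target window, then predict."""
    model.train()    # BatchNorm layers update running_mean/running_var from the incoming window
    model(window)    # forward pass purely for the statistics update; no labels, no gradients
    model.eval()     # freeze the updated statistics for the actual prediction
    return model(window)

# Hypothetical usage during closed-loop operation:
# for window in window_stream:               # stream of (1, 1, n_channels, n_samples) tensors
#     logits = adabn_step(basenet, window)   # feedback is driven by the gradually adapted model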
Figure 1.
Trial structure for the Dreyer2023 dataset [27]. Each trial starts with a fixation cross, followed by a short auditory signal. The cue occurs after 3 seconds and is present for 1.25 seconds. The cue is followed by a 3.75 second feedback phase. Between trials, a black screen is displayed for 1.5-3.5 seconds.
Figure 2.
A) BaseNet architecture and output dimensions for the Dreyer2023 dataset. Layer names are specified following the PyTorch API conventions. For Conv2d layers, the first value indicates the number of filters, and the tuple represents the kernel size. In the pooling layers, the first tuple indicates the kernel size, while the second tuple specifies the stride. B) Visualization of the sliding window extraction in the second pooling layer of BaseNet.
Figure 3.
Computational gain of joint decoding for different trial lengths and window lengths.
Figure 4.
Simplified overview of the domain adaptation landscape. Italic text specifies requirements regarding calibration data.
Figure 5.
Cross-subject results for the Dreyer2023 dataset for different data compositions. Each dot represents a subject; the stars within the brackets indicate the significance level (p<0.05 (*), p<0.01 (**) and p<0.001 (***)) when comparing the two experiments connected by that bracket.
Figure 6.
Cross-subject results for the Lee2019 dataset for different data compositions. Each dot represents a subject; the stars within the brackets indicate the significance level (p<0.05 (*), p<0.01 (**) and p<0.001 (***)) when comparing the two experiments connected by that bracket.
Figure 7.
Accuracy and number of samples per entropy bin for BaseNet (each bin has a width of 0.1, starting from 0). Each black dot represents one subject. Trials are averaged over the seeds, but the accuracies are filtered, i.e., accuracies of bins that contain no trials are removed and thus no averaging can be performed for them.
Figure 8.
Accuracies per window. Each black line indicates one subject, the red line corresponds to the average and the blue lines correspond to the average ± standard deviation.
Figure 9.
Topoplots of the EDS scores for BaseNet and both datasets. Good subjects have a TAcc > 70%, bad subjects yield a performance below or equal to this threshold.
Table 1.
Within-subject results. Results above the double line are for the Dreyer2023 dataset, below are for the Lee2019 dataset. The stars after the method indicate the different significance levels (p<0.05(*), p<0.01(**) and p<0.001(***)) compared to BaseNet.
Method | TAcc(%) | uTAcc(%) | WAcc(%)
RiemannMDM*** | | |
BaseNet | | |
================================
RiemannMDM* | | |
BaseNet | | |
Table 2.
Dreyer2023 dataset cross-subject experiments. The stars after the method indicate the different significance levels (p<0.05(*), p<0.01(**) and p<0.001(***)) compared to the benchmark method in the same setting.
Setting | Method | TAcc(%) | uTAcc(%) | WAcc(%)
 | BaseNet | | |
supervised | RiemannMDM+PAR | | |
 | BaseNet*** | | |
 | BaseNet+EA*** | | |
 | BaseNet+RA*** | | |
unsupervised | RiemannMDM | | |
 | BaseNet+EA*** | | |
 | BaseNet+RA*** | | |
 | BaseNet+AdaBN* | | |
 | BaseNet+EA+AdaBN*** | | |
 | BaseNet+RA+AdaBN*** | | |
online | RiemannMDM+GR | | |
 | BaseNet+EA | | |
 | BaseNet+RA | | |
 | BaseNet+AdaBN | | |
 | BaseNet+EA+AdaBN | | |
 | BaseNet+RA+AdaBN | | |
Table 3.
Lee2019 cross-subject experiments. The stars after the method indicate the different significance levels (p<0.05(*), p<0.01(**) and p<0.001(***)) compared to the benchmark method in the same setting.
Setting | Method | TAcc(%) | uTAcc(%) | WAcc(%)
 | BaseNet | | |
supervised | RiemannMDM+PAR | | |
 | BaseNet*** | | |
 | BaseNet+EA*** | | |
 | BaseNet+RA*** | | |
unsupervised | RiemannMDM | | |
 | BaseNet+EA*** | | |
 | BaseNet+RA*** | | |
 | BaseNet+AdaBN*** | | |
 | BaseNet+EA+AdaBN*** | | |
 | BaseNet+RA+AdaBN*** | | |
online | RiemannMDM+GR | | |
 | BaseNet+EA* | | |
 | BaseNet+RA* | | |
 | BaseNet+AdaBN | | |
 | BaseNet+EA+AdaBN*** | | |
 | BaseNet+RA+AdaBN*** | | |