Preprint Article. This version is not peer-reviewed.

A Lightweight Temporal-Spatial Fusion Network for Neonatal Sleep Staging

Submitted: 23 April 2026
Posted: 24 April 2026


Abstract
Background: Accurate assessment of neonatal sleep is critical for monitoring brain development and identifying potential neurological disorders, yet manual scoring of multi-channel EEG recordings is labor-intensive and prone to variability. Methods: To address this, we propose a lightweight temporal-spatial feature fusion network for automatic neonatal sleep staging. The model employs a dual-branch architecture to separately capture temporal dependencies and spatial correlations in EEG signals, which are then integrated via an adaptive fusion module to obtain comprehensive feature representations while maintaining low computational complexity. Results: The framework was evaluated on a clinical neonatal dataset (CHFD) for tasks including sleep–wake classification, quiet sleep detection, and three-stage sleep staging, achieving superior performance compared with several state-of-the-art methods. Additional experiments on the MASS-SS3 adult dataset demonstrate that the model retains competitive accuracy and F1-score, indicating strong generalization across populations. Conclusions: These results suggest that jointly modeling temporal and spatial features enables robust and efficient automatic sleep staging. The proposed approach offers a practical solution for clinical applications and edge deployment, providing reliable, multi-dimensional assessment of neonatal brain activity and laying the groundwork for future studies integrating larger datasets or multimodal physiological signals.
Keywords: 
Subject: Engineering - Bioengineering

1. Introduction

Sleep plays a fundamental role in human physiological and neurological development, particularly during the neonatal period. In early life, sleep is closely associated with brain maturation, synaptic plasticity, and cognitive development, and abnormalities in neonatal sleep patterns may indicate neurological dysfunction or developmental disorders [1,2]. Continuous monitoring and analysis of neonatal sleep architecture, especially in neonatal intensive care units (NICUs), therefore provide critical insights into brain development and clinical outcomes [3]. Among various physiological signals, electroencephalography (EEG) is considered the most informative modality for characterizing neonatal sleep states due to its direct reflection of cortical activity. Traditionally, sleep staging is performed manually by trained experts using polysomnography (PSG), which remains the clinical gold standard [4,5]. However, manual scoring is labor-intensive, time-consuming, and subject to inter-rater variability. These limitations have motivated the development of automated sleep staging methods, particularly those based on EEG signals, to improve efficiency, consistency, and accessibility of sleep assessment [6].
Early studies on automated sleep staging primarily relied on traditional machine learning methods, which typically involved handcrafted feature extraction followed by classifiers such as support vector machines or random forests [7,8]. However, their reliance on manually designed features limits their ability to capture complex patterns in EEG signals. With the advancement of deep learning, data-driven methods have demonstrated superior capability in learning hierarchical representations directly from raw signals, significantly improving performance in sleep staging tasks. For example, DeepSleepNet employs deep CNNs to learn representative sleep features automatically [9]. Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have been introduced to capture temporal dependencies across sleep epochs, as demonstrated in SeqSleepNet [10]. To further enhance feature representation, attention mechanisms have been incorporated to adaptively focus on informative temporal segments, while Transformer-based models, such as SleepTransformer, leverage self-attention to model long-range dependencies more effectively [11]. In addition, to explicitly model spatial relationships among EEG channels, graph neural networks (GNNs) have been explored; GraphSleepNet, for example, captures inter-channel connectivity patterns and improves representation learning [12].
However, the above methods either focus on a single aspect of EEG signals, such as temporal dynamics or single-channel representations, while neglecting the intrinsic spatial correlations among multiple EEG channels, or they are computationally intensive, involving large numbers of parameters and complex architectures that restrict deployment in resource-constrained environments such as bedside monitoring systems or wearable devices. Moreover, neonatal EEG signals exhibit strong non-stationarity, a low signal-to-noise ratio, and rapid developmental variability across subjects and ages, leading to higher inter-class similarity and ambiguity [13,14,15]. Most existing methods are tailored to adult sleep data; although they achieve strong performance on adult datasets, their direct application to neonatal EEG often results in significant performance degradation due to domain differences. Even methods trained on limited neonatal datasets show insufficient generalization across different populations, making them less reliable in real-world clinical scenarios. These challenges collectively highlight the need for models that are not only efficient and lightweight but also capable of capturing comprehensive temporal-spatial representations with strong generalization ability.
To address the above challenges, we propose a lightweight temporal-spatial fusion network for neonatal sleep staging based on multi-channel EEG signals. The proposed model adopts a dual-stream architecture to separately learn temporal dynamics and spatial correlations, and subsequently fuses these complementary representations to obtain a more comprehensive understanding of brain activity. By jointly modeling temporal and spatial information, the proposed approach enhances feature representation capability while maintaining a compact model size. In addition, the lightweight design significantly reduces computational overhead, making it suitable for deployment in edge computing scenarios. Extensive experiments demonstrate that the proposed method not only achieves competitive performance on neonatal datasets but also exhibits strong generalization ability across different populations, including adult sleep datasets.

2. Materials and Methods

2.1. Dataset and Preprocessing

2.1.1. Children’s Hospital of Fudan University Dataset

A clinical neonatal sleep dataset collected from the Children’s Hospital of Fudan University (CHFD) is employed to evaluate the proposed method. This dataset consists of 64 EEG recordings from neonates, with an average duration of approximately 131 minutes per recording. EEG data acquisition was performed using a Nicolet system, following the international 10–20 electrode placement scheme with channels F3, F4, C3, C4, T3, T4, P3, and P4, and a sampling rate of 500 Hz. The study protocol was approved by the ethics committee of CHFD (Approval No. (2017) 89). All recordings were manually annotated by an experienced neurophysiologist into three categories, wakefulness, quiet sleep (QS), and active sleep (AS), according to established practical guidelines and recommendations for neonatal sleep staging [16,17,18,19]. Detailed information about the neonatal sleep dataset is provided in Table 1.

2.1.2. MASS-SS3 Dataset

To further assess the generalization capability of the proposed method, additional experiments are conducted on adult sleep data. Specifically, a publicly available dataset, the Montreal Archive of Sleep Studies (MASS), is adopted to evaluate the model in multi-channel sleep staging tasks on adult recordings [20]. The MASS-SS3 dataset is selected because its EEG channel configuration is consistent with that of our private neonatal CHFD dataset, which allows for a fair and coherent cross-dataset evaluation under the same model architecture. In contrast, other widely used public datasets, such as Sleep-EDF and SHHS, either provide single-channel EEG recordings or employ different electrode configurations, making them less suitable for evaluation within a multi-channel framework [21,22]. Detailed information about the MASS-SS3 dataset is provided in Table 2.

2.1.3. Preprocessing

To enhance signal quality and ensure compatibility with the model input, the raw EEG recordings are first processed using a zero-phase notch filter to remove power-line interference. Specifically, a 50 Hz notch filter is applied to the CHFD dataset, whereas a 60 Hz notch filter is used for the MASS-SS3 dataset, in accordance with their respective recording environments. Using zero-phase filtering ensures that the phase of the EEG signals is preserved during the filtering process. Subsequently, a zero-phase band-pass filter with a frequency range of 0.3–35 Hz is applied. This frequency range is commonly adopted in sleep EEG studies, as it retains the key components of sleep-related brain activity, including delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and low beta (13–30 Hz) rhythms, while attenuating low-frequency drifts (below 0.3 Hz) caused by factors such as perspiration, electrode movement, or respiration, as well as high-frequency noise (above 35 Hz) mainly arising from muscle artifacts and external interference [23,24,25]. Zero-phase filtering is used to prevent distortion of the temporal characteristics of the EEG signals. After filtering, the EEG recordings are segmented into 30-second epochs, consistent with standard manual sleep scoring protocols. Epochs with substantial artifacts, interruptions, or abnormal patterns are manually discarded to ensure data quality. Finally, the EEG signals are downsampled from 500 Hz to 100 Hz to reduce computational load while retaining sufficient temporal resolution for effective analysis.
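To make the pipeline concrete, a minimal sketch of this preprocessing chain is given below using SciPy. The filter orders, the notch quality factor, and the use of scipy.signal.decimate for downsampling are illustrative assumptions; only the zero-phase notch and 0.3–35 Hz band-pass filtering, 30-second epoching, and 500 Hz to 100 Hz downsampling follow the description above, and the manual rejection of artifact-contaminated epochs is not reproduced here.

```python
import numpy as np
from scipy import signal

def preprocess_eeg(raw, fs=500, notch_hz=50.0, band=(0.3, 35.0),
                   target_fs=100, epoch_sec=30):
    """Zero-phase notch + band-pass filtering, 30-s epoching, and downsampling.

    raw: array of shape (channels, samples) at the original sampling rate fs.
    Returns an array of shape (n_epochs, channels, epoch_sec * target_fs).
    """
    # Zero-phase notch filter (50 Hz for CHFD, 60 Hz for MASS-SS3).
    b_notch, a_notch = signal.iirnotch(w0=notch_hz, Q=30.0, fs=fs)
    x = signal.filtfilt(b_notch, a_notch, raw, axis=-1)

    # Zero-phase 0.3-35 Hz band-pass (4th-order Butterworth, assumed order).
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, x, axis=-1)

    # Segment into non-overlapping 30-second epochs.
    samples_per_epoch = epoch_sec * fs
    n_epochs = x.shape[-1] // samples_per_epoch
    x = x[:, : n_epochs * samples_per_epoch]
    epochs = x.reshape(x.shape[0], n_epochs, samples_per_epoch).transpose(1, 0, 2)

    # Downsample 500 Hz -> 100 Hz (factor 5) with an anti-aliasing FIR filter.
    factor = fs // target_fs
    epochs = signal.decimate(epochs, factor, axis=-1, ftype="fir", zero_phase=True)
    return epochs
```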

2.2. Temporal-spatial Feature Fusion Network

The overall architecture of the proposed framework is illustrated in Figure 1. The model follows a dual-branch design to jointly learn temporal dependencies and spatial correlations from multi-channel EEG signals. Specifically, the input EEG is simultaneously fed into a temporal representation learning branch and a spatial correlation learning branch. These two branches are designed to capture complementary characteristics of EEG signals from temporal and spatial perspectives, respectively. After extracting temporal dependencies and spatial relationships, the learned representations are integrated through an adaptive fusion module, which dynamically balances their contributions. The fused features are then passed to a classifier to learn discriminative patterns and produce the final sleep stage predictions.

2.2.1. Temporal Representation Learning Branch

The temporal representation learning branch is designed to extract local temporal patterns and model their dependencies, as shown in Figure 2. Given an input EEG segment $X \in \mathbb{R}^{B \times C \times T}$, where $B$, $C$, and $T$ denote the batch size, number of channels, and temporal length, respectively, a lightweight convolutional neural network (CNN) is first applied along the temporal dimension to extract local features:
$$F_t = \mathrm{CNN}(X)$$
where $F_t \in \mathbb{R}^{B \times C \times F}$ represents the temporal feature embeddings.
To further capture dependencies among temporal features, a temporal attention mechanism is introduced. The features are projected into query and key spaces:
$$Q = F_t W_q, \qquad K = F_t W_k$$
where $W_q, W_k \in \mathbb{R}^{F \times H}$ are learnable projection matrices and $H$ denotes the hidden dimension.
The dynamic dependency matrix is computed as:
$$A_{\mathrm{dyn}} = Q K^{\top}$$
To incorporate prior structural information, a learnable static adjacency matrix $A_{\mathrm{static}} \in \mathbb{R}^{C \times C}$ is introduced. The final temporal dependency matrix is defined as:
$$A_t = \sigma\left(A_{\mathrm{dyn}} + A_{\mathrm{static}}\right)$$
where $\sigma(\cdot)$ denotes a non-linear activation function (ReLU in this work). The resulting matrix $A_t \in \mathbb{R}^{B \times C \times C}$ encodes the dependency relationships among temporal features.
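A minimal PyTorch sketch of this branch is given below, assuming the design follows the equations above. The specific convolution settings (kernel size, stride, pooled feature dimension) and the final propagation of features through $A_t$ are illustrative assumptions; the text specifies only that a lightweight CNN extracts local temporal features and that dynamic and static dependency matrices are combined through a ReLU.

```python
import torch
import torch.nn as nn

class TemporalBranch(nn.Module):
    """Sketch of the temporal branch: local CNN features plus dynamic/static adjacency."""

    def __init__(self, in_channels=8, feat_dim=64, hidden_dim=32):
        super().__init__()
        # Lightweight temporal CNN: one strided 1-D convolution plus pooling (assumed design).
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, in_channels, kernel_size=25, stride=5, padding=12),
            nn.BatchNorm1d(in_channels),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(feat_dim),
        )
        self.W_q = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.W_k = nn.Linear(feat_dim, hidden_dim, bias=False)
        # Learnable static adjacency A_static in R^{C x C}.
        self.A_static = nn.Parameter(torch.zeros(in_channels, in_channels))

    def forward(self, x):                               # x: (B, C, T)
        f_t = self.cnn(x)                                # F_t: (B, C, F)
        q, k = self.W_q(f_t), self.W_k(f_t)              # (B, C, H)
        a_dyn = torch.bmm(q, k.transpose(1, 2))          # dynamic dependencies, (B, C, C)
        a_t = torch.relu(a_dyn + self.A_static)          # A_t = sigma(A_dyn + A_static)
        # Propagating features through A_t is an assumption; the paper only states
        # that A_t encodes the dependency relationships among temporal features.
        return torch.bmm(a_t, f_t)                       # (B, C, F)
```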

2.2.2. Spatial Correlation Learning Branch

The spatial correlation learning branch focuses on modeling inter-channel relationships while preserving the spatial structure of EEG signals, as shown in Figure 3. A group convolution (Group-CNN) is first applied to extract channel-wise global representations:
$$F_s = \mathrm{GroupCNN}(X)$$
where $F_s \in \mathbb{R}^{B \times C \times F}$ denotes the spatial feature embeddings.
To capture spatial correlations among channels, a spatial attention mechanism is employed. The features are projected into query and key representations:
$$Q_s = F_s W_q, \qquad K_s = F_s W_k$$
The dynamic spatial adjacency matrix is computed as:
$$A_{\mathrm{dyn}}^{(s)} = Q_s K_s^{\top}$$
Similarly, a learnable static adjacency matrix $A_{\mathrm{static}}^{(s)}$ is introduced. The final spatial correlation matrix is given by:
$$A_s = \sigma\left(A_{\mathrm{dyn}}^{(s)} + A_{\mathrm{static}}^{(s)}\right)$$
This formulation enables the model to jointly learn data-driven and prior-informed inter-channel relationships, thereby enhancing spatial representation capability.
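The main architectural difference from the temporal branch is the channel-wise feature extractor. The short sketch below illustrates how a grouped 1-D convolution, with the number of groups equal to the number of EEG electrodes, keeps each channel's features separate so that the subsequent spatial attention operates on per-electrode representations; the kernel size, stride, and feature dimension shown are assumptions.

```python
import torch.nn as nn

class GroupFeatureExtractor(nn.Module):
    """Channel-wise (grouped) convolution: each EEG channel keeps its own filters,
    preserving the spatial (electrode) structure for later correlation learning."""

    def __init__(self, n_channels=8, feat_dim=64):
        super().__init__()
        self.group_cnn = nn.Sequential(
            # groups=n_channels -> no mixing across electrodes at this stage.
            nn.Conv1d(n_channels, n_channels, kernel_size=49, stride=10,
                      padding=24, groups=n_channels),
            nn.BatchNorm1d(n_channels),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(feat_dim),
        )

    def forward(self, x):          # x: (B, C, T)
        return self.group_cnn(x)   # F_s: (B, C, F)
```

The spatial attention applied on top of $F_s$ mirrors the temporal case: query and key projections, a dynamic adjacency $Q_s K_s^{\top}$, and a learnable static matrix added before the non-linearity.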

2.2.3. Adaptive Fusion Module

To effectively integrate temporal and spatial features, an adaptive fusion module is proposed. This module first learns the relative importance of temporal and spatial representations using two sets of learnable weights. Given temporal features $F_t$ and spatial features $F_s$, element-wise (Hadamard) multiplication is applied:
$$\tilde{F}_t = W_t \odot F_t, \qquad \tilde{F}_s = W_s \odot F_s$$
where $W_t$ and $W_s$ are learnable parameters, and $\odot$ denotes element-wise multiplication.
The weighted features are then projected into a unified feature space:
$$Z_t = \tilde{F}_t W_t, \qquad Z_s = \tilde{F}_s W_s$$
The projected features are concatenated to form the final representation:
$$Z = \mathrm{Concat}(Z_t, Z_s)$$
Finally, the fused representation is passed through a classifier composed of two fully connected layers:
$$\hat{y} = \mathrm{FC}_2\left(\mathrm{ReLU}\left(\mathrm{FC}_1(Z)\right)\right)$$
where $\hat{y}$ denotes the predicted sleep stage.
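A hedged sketch of the adaptive fusion module and the two-layer classifier is given below. The flattening of per-channel features before the fully connected layers, the projection dimension, and the hidden size of FC$_1$ are assumptions not specified in the text; the learnable element-wise weights, the concatenation of the two streams, and the ReLU-separated fully connected layers follow the equations above.

```python
import torch
import torch.nn as nn

class AdaptiveFusionClassifier(nn.Module):
    """Sketch of the adaptive fusion module and the two-layer classifier."""

    def __init__(self, n_channels=8, feat_dim=64, proj_dim=32, hidden=128, n_classes=3):
        super().__init__()
        # Learnable element-wise importance weights for the two streams.
        self.w_t = nn.Parameter(torch.ones(n_channels, feat_dim))
        self.w_s = nn.Parameter(torch.ones(n_channels, feat_dim))
        # Projections into a shared feature space.
        self.proj_t = nn.Linear(feat_dim, proj_dim)
        self.proj_s = nn.Linear(feat_dim, proj_dim)
        self.fc1 = nn.Linear(n_channels * 2 * proj_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, f_t, f_s):                     # both (B, C, F)
        z_t = self.proj_t(self.w_t * f_t)            # Hadamard weighting + projection
        z_s = self.proj_s(self.w_s * f_s)
        z = torch.cat([z_t, z_s], dim=-1)            # Z = Concat(Z_t, Z_s)
        z = z.flatten(start_dim=1)                   # flatten before classifier (assumed)
        return self.fc2(torch.relu(self.fc1(z)))     # y_hat = FC2(ReLU(FC1(Z)))
```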

2.3. Evaluation Metrics

We evaluate the proposed approach using multiple performance metrics, including accuracy, F1-score, Cohen’s kappa coefficient, specificity, and sensitivity. The model is trained for 150 epochs and evaluated using subject-wise 10-fold cross-validation. All experiments are conducted on a server equipped with an NVIDIA GeForce RTX™ 4090 GPU and an Intel® Xeon® Platinum 8383C CPU (80 logical processors), implemented in Python with the PyTorch framework. After completing all folds, predictions from all subjects are combined to compute the overall performance metrics.
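For reference, once the fold-wise predictions are pooled, the reported metrics can be computed as sketched below with scikit-learn and NumPy; deriving macro-sensitivity and macro-specificity from the per-class confusion matrix is our own reading of how these quantities are defined, not a detail stated in the text.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score, confusion_matrix

def overall_metrics(y_true, y_pred):
    """Accuracy, macro-F1, Cohen's kappa, macro-sensitivity, and macro-specificity
    computed from predictions pooled across all cross-validation folds."""
    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - (tp + fn + fp)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "macro_sensitivity": float(np.mean(tp / (tp + fn))),
        "macro_specificity": float(np.mean(tn / (tn + fp))),
    }
```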

3. Results

Comprehensive experiments are carried out to evaluate the performance of the proposed temporal-spatial feature fusion network for neonatal sleep staging. Our primary experiments focus on the CHFD dataset, which contains multi-channel EEG recordings from neonates, to assess the performance of the model in clinically relevant sleep staging tasks. To further evaluate the generalization capability of the proposed approach across populations, additional experiments are conducted on the MASS-SS3 adult sleep dataset. This cross-dataset validation allows us to examine how effectively the model, trained primarily on neonatal EEG data, can generalize to adult sleep signals, thereby demonstrating its robustness and versatility. The evaluation considers multiple performance metrics, including accuracy, F1-score, Cohen’s kappa coefficient, specificity, and sensitivity, providing a comprehensive assessment of the model’s discriminative power and reliability. In addition, to contextualize the performance of the proposed method, we conduct comparative experiments with several representative baseline approaches on the CHFD dataset, including both traditional machine learning methods and recent deep learning models. These comparisons allow us to highlight the advantages of our dual-branch temporal-spatial learning framework over existing approaches in neonatal sleep staging. The results are shown in Table 3 and Figure 4.

3.1. Performance on Sleep-Wake Task on CHFD Dataset

For the binary sleep–wake classification on the CHFD dataset, the proposed temporal-spatial feature fusion network achieved an overall accuracy of 88.6%, with an F1-score of 0.870 and a Cohen’s kappa of 0.740. The macro-averaged sensitivity and specificity were both 0.868, indicating balanced performance across the two classes. As shown in the corresponding confusion matrix, the model correctly identified 92.1% of sleep epochs and 81.4% of wake epochs. Misclassification primarily occurred when brief wake periods were embedded within sleep, which is consistent with the known challenges of distinguishing transient arousals in neonatal EEG. These results demonstrate the model’s robust capability to discriminate between sleep and wake states in neonates.

3.2. Performance on QS Detection Task on CHFD Dataset

For quiet sleep (QS) detection, the model achieved higher performance, with an overall accuracy of 91.6%, an F1-score of 0.906, and a kappa coefficient of 0.811. The macro-sensitivity and macro-specificity were both approximately 0.902. The confusion matrix indicates that 94.6% of non-quiet sleep (NQS) epochs and 85.8% of QS epochs were correctly classified. The slightly lower QS sensitivity reflects occasional confusion with non-quiet sleep stages, particularly during transitional periods. Overall, the results highlight the model’s effectiveness in detecting QS epochs, which is critical for assessing neonatal sleep architecture.

3.3. Performance on AS-W-QS Task on CHFD Dataset

For the three-class classification task involving active sleep (AS), wake (W), and quiet sleep (QS), the model achieved an overall accuracy of 81.9%, an F1-score of 0.819, and a kappa of 0.729. The macro-sensitivity and macro-specificity were 0.818 and 0.910, respectively, indicating strong discriminative capability across all three classes. From the confusion matrix, AS epochs were correctly classified with 79.1% accuracy, wake epochs with 80.0%, and QS epochs with 86.4%. Misclassifications predominantly occurred between AS and wake, reflecting the intrinsic similarity of EEG patterns during brief arousals. These results confirm that the proposed dual-branch temporal-spatial framework effectively captures both temporal dependencies and spatial correlations to distinguish subtle differences among neonatal sleep states.

3.4. Performance on Five-Stage Classification Task on MASS-SS3 Dataset

To evaluate cross-population generalization, the model was applied to the five-stage adult sleep classification task (Wake–N1–N2–N3–REM) on the MASS-SS3 dataset. The model achieved an overall accuracy of 82.0%, an F1-score of 0.739, and a Cohen’s kappa of 0.729. Macro-sensitivity and macro-specificity were 0.723 and 0.944, respectively. The confusion matrix shows that N2 and REM stages were classified with high accuracy (91.9% and 84.7%, respectively), while N1 and N3 stages exhibited lower classification performance due to their inherent ambiguity. Wake epochs were correctly identified in 79.3% of cases. Despite being primarily trained on neonatal EEG, the model demonstrates competitive performance on adult data, suggesting that the temporal-spatial feature fusion framework effectively captures generalized sleep representations across populations.

3.5. Cross-Task Analysis

A joint analysis across all tasks provides deeper insights into the performance and robustness of the proposed temporal-spatial feature fusion network. On the CHFD dataset, the model demonstrates consistently high accuracy for both binary (Sleep–Wake) and QS detection tasks, with slightly lower performance observed in the three-class AS–W–QS task. This trend indicates that while the model effectively captures clear distinctions between sleep and wake or between QS and non-QS epochs, differentiating among closely related neonatal sleep states, particularly AS and wake, remains inherently challenging due to the subtle and transient EEG patterns during these transitions.
Comparing neonatal tasks with the adult five-stage classification task on MASS-SS3 highlights the generalization capability of the proposed framework. Despite differences in EEG characteristics between neonates and adults, the model achieves competitive performance across all five adult sleep stages, with particularly high accuracy in N2 and REM stages. Misclassifications primarily occur in N1 and N3 stages, reflecting the well-known ambiguity of these transitional and deep sleep stages. These observations suggest that the dual-branch temporal-spatial architecture effectively captures generalized representations of sleep-related brain activity, enabling the model to maintain robust performance across multiple tasks and populations.
Additionally, the macro-sensitivity and macro-specificity values across tasks indicate balanced performance, with minimal bias toward any particular class. Overall, the cross-task analysis underscores the effectiveness of integrating temporal dependencies and spatial correlations, demonstrating that the proposed model can handle both neonatal and adult sleep staging tasks with high reliability, while providing a unified framework for multi-class sleep classification.

3.6. Comparison with Existing Methods

To further validate the effectiveness of the proposed method, a comparative study with several representative existing approaches was conducted on the CHFD dataset for the three-stage (AS–W–QS) classification task. The compared methods include both traditional machine learning-based approaches and advanced deep learning models, covering CNN, RNN, attention, and graph-based architectures [9,12,26,27,28,29,30,31]. All methods were evaluated under the same experimental settings using subject-wise 10-fold cross-validation to ensure fairness.
As shown in Table 4, the proposed method achieves the best overall performance, with an accuracy of 81.9%, an F1-score of 0.819, and a Cohen’s kappa of 0.729. In particular, the proposed model significantly outperforms classical CNN-based methods such as MB-CNN and Conv-2d, which rely heavily on handcrafted features or limited spatial modeling capability. For instance, MB-CNN achieves an accuracy of 72.8%, while Conv-2d methods exhibit even lower performance (approximately 52–53%), indicating that relying solely on local convolutional features is insufficient to capture the complex temporal-spatial patterns in neonatal EEG signals.
Compared with more advanced deep learning methods, such as DeepSleepNet and AttnSleep, the proposed method also demonstrates clear advantages [9,31]. Although DeepSleepNet leverages multi-scale CNN and bi-directional LSTM to model temporal dependencies, its performance (68.9% accuracy) is limited, likely due to its focus on single-channel input and insufficient modeling of inter-channel relationships. Similarly, AttnSleep introduces attention mechanisms but still falls short (68.0% accuracy), suggesting that temporal attention alone is not sufficient without effective spatial modeling.
Graph-based approaches, such as GraphSleepNet and MVST-GCN, attempt to capture spatial relationships among EEG channels [12,30]. However, their performance remains suboptimal (68.9% and 69.7% accuracy, respectively), which may be attributed to their reliance on handcrafted features or fixed graph structures that limit their adaptability. In contrast, the proposed method introduces an adaptive adjacency learning mechanism that jointly considers both static and dynamic relationships, enabling more flexible and data-driven modeling of channel interactions.
Among all compared methods, MS-HNN achieves relatively competitive performance (75.4% accuracy), as it incorporates both temporal learning and multi-scale feature extraction tailored for neonatal data. Nevertheless, the proposed method still surpasses MS-HNN by a notable margin across all evaluation metrics. This improvement can be attributed to the proposed dual-branch architecture, which explicitly disentangles temporal and spatial feature learning and further integrates them through an adaptive fusion module.
In addition to performance improvements, the proposed method maintains a favorable trade-off between accuracy and model complexity. With only 0.81M parameters, it is significantly more lightweight than models such as DeepSleepNet (24.75M) and MS-HNN (25.63M), while achieving superior or comparable results. Although some lightweight models (e.g., MB-CNN, GraphSleepNet) have fewer parameters, their performance is considerably lower, highlighting the effectiveness of the proposed design in balancing efficiency and accuracy.
Overall, these results demonstrate that the proposed temporal-spatial feature fusion network effectively leverages multi-channel EEG information, capturing both temporal dependencies and spatial correlations in a unified framework. This leads to superior performance compared to existing methods, while maintaining a lightweight architecture suitable for practical deployment.

4. Discussion

In this study, we proposed a lightweight temporal-spatial feature fusion network for neonatal sleep staging and evaluated its performance on both neonatal and adult EEG datasets. The results demonstrate that the model consistently achieves high accuracy, F1-score, and Cohen’s kappa across multiple classification tasks, including binary sleep–wake discrimination, quiet sleep (QS) detection, and three-stage AS–W–QS classification in neonates, as well as five-stage adult sleep classification. Compared with several representative state-of-the-art methods, our approach exhibits superior performance while maintaining a compact model size, highlighting the effectiveness of integrating temporal and spatial feature learning within a unified framework.
The strong performance on neonatal tasks suggests that explicitly modeling both temporal dependencies and spatial correlations is critical for capturing the complex dynamics of neonatal EEG signals. In the binary sleep–wake and QS detection tasks, the high sensitivity and specificity indicate that the model is able to accurately differentiate distinct brain states even in the presence of transient arousals or subtle EEG patterns. For the three-stage AS–W–QS classification, slightly lower accuracy reflects the inherent difficulty in distinguishing active sleep from wake periods, consistent with previous observations that AS and wake exhibit overlapping EEG features. The use of adaptive adjacency matrices, combining static and dynamic components, likely contributed to capturing subtle temporal and inter-channel interactions that traditional CNN or RNN methods fail to fully exploit.
The evaluation on the adult MASS-SS3 dataset provides evidence of the generalization capability of the proposed framework. Despite being primarily trained on neonatal data, the model achieves competitive performance across five adult sleep stages, with particularly high accuracy for N2 and REM stages. Misclassifications in N1 and N3 are consistent with prior studies highlighting the transitional and deep sleep stages as challenging even for expert scorers. These findings suggest that the temporal-spatial fusion approach captures generalized patterns of sleep-related brain activity that are transferable across different age populations and EEG characteristics.
Compared with existing methods, our model demonstrates clear advantages. Traditional single-channel CNN-based methods, such as Conv-2d and MB-CNN, achieve limited performance, highlighting the importance of multi-channel spatial information. Graph-based and attention-based models (GraphSleepNet, MVST-GCN, AttnSleep) improve performance by incorporating spatial or temporal dependencies, but often rely on handcrafted features or fixed adjacency structures, limiting adaptability. In contrast, the proposed approach jointly models temporal and spatial features with adaptive, data-driven adjacency learning, which allows the network to dynamically capture complex interactions between channels and across time. Notably, while MS-HNN also targets neonatal sleep with multi-scale CNN and temporal learning, our dual-branch design with explicit temporal-spatial fusion further improves classification performance while maintaining a smaller parameter footprint, emphasizing the efficiency and practicality of the approach.
The findings have several implications for neonatal and clinical sleep research. First, accurate automated sleep staging enables continuous monitoring of neonatal brain development, which is critical for early detection of neurodevelopmental disorders and assessment of treatment effects. Second, the lightweight design and multi-channel integration suggest that the model can be feasibly deployed in bedside monitoring systems or wearable EEG devices, facilitating real-time and scalable neonatal sleep assessment. Third, the demonstrated generalization to adult datasets indicates the potential for broader applications in sleep research, including cross-population studies and investigations of sleep-related neurological conditions.
Several limitations should be acknowledged. Despite the promising performance, the neonatal dataset remains relatively small and may not fully capture the diversity of EEG patterns across different gestational ages or clinical conditions. The model performance could potentially be enhanced with larger and more diverse datasets. Additionally, while the adaptive fusion module improves feature integration, exploring more sophisticated fusion strategies, such as graph attention networks or transformer-based cross-channel attention, may further enhance performance. Future research may also focus on longitudinal studies to assess developmental trajectories and on integrating additional physiological signals, such as ECG or respiratory measures, to provide a multimodal assessment of neonatal sleep [32]. Finally, interpretability analyses of the learned temporal-spatial dependencies could offer insights into neurophysiological mechanisms underlying neonatal sleep states [33].

5. Conclusions

In this paper, we proposed a lightweight temporal-spatial feature fusion network for automatic neonatal sleep staging using multi-channel EEG signals. The model employs a dual-branch architecture to capture temporal dependencies and spatial correlations, and integrates them through an adaptive fusion module to obtain comprehensive feature representations. Experimental results on the CHFD dataset demonstrate that the proposed method achieves superior performance across multiple neonatal sleep staging tasks compared to existing approaches. In addition, evaluation on the MASS-SS3 adult dataset shows that the model maintains competitive performance, indicating good generalization capability across different populations. Overall, the proposed method provides an effective and efficient solution for neonatal sleep staging, with potential for practical deployment in real-world clinical scenarios. Future work will focus on improving generalization with larger datasets and exploring multimodal extensions.

Author Contributions

Conceptualization, C.C. and W.C.; methodology, L.Z.; software, L.Z.; validation, L.Z., C.C. and W.C.; formal analysis, L.Z.; investigation, L.Z.; resources, Y.X.; data curation, Y.X.; writing—original draft preparation, L.Z.; writing—review and editing, C.C.; visualization, L.Z.; supervision, C.C.; project administration, L.W.; funding acquisition, C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Children’s Hospital of Fudan University (Approval No. (2017) 89).

Data Availability Statement

The datasets analyzed during the current study are not publicly available due to privacy and ethical restrictions involving neonatal patient data. Access to these clinical EEG recordings is restricted to authorized personnel. In the future, the authors may consider making de-identified data available for research purposes, pending appropriate approvals and ethical clearance.

Acknowledgments

The authors would like to thank the clinical staff at the Children’s Hospital of Fudan University for their support in data collection and annotation. We also acknowledge the administrative and technical assistance provided by the hospital’s EEG laboratory, which was essential for the completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PSG Polysomnography
NICUs Neonatal intensive care units
CHFD Children’s Hospital of Fudan University dataset
MASS Montreal Archive of Sleep Studies
SHHS Sleep Heart Health Study
EEG Electroencephalography
CNN Convolutional neural network
RNN Recurrent neural network
GNN Graph neural network
LSTM Long short-term memory
QS Quiet sleep
AS Active sleep

References

  1. Tarullo, A.R.; Balsam, P.D.; Fifer, W.P. Sleep and infant learning. Infant and Child Development 2011, 20, 35–46. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1002/icd.685. [CrossRef] [PubMed]
  2. Ryan, M.A.J.; Mathieson, S.R.; Livingstone, V.; O’Sullivan, M.P.; Dempsey, E.M.; Boylan, G.B. Sleep state organisation of moderate to late preterm infants in the neonatal unit. Pediatric Research 2023, 93, 595–603. [Google Scholar] [CrossRef] [PubMed]
  3. Abbasi, S.F.; Abbas, A.; Ahmad, I.; Alshehri, M.S.; Almakdi, S.; Ghadi, Y.Y.; Ahmad, J. Automatic neonatal sleep stage classification: A comparative study. Heliyon 2023, 9, e22195. [Google Scholar] [CrossRef] [PubMed]
  4. Choo, B.P.; Mok, Y.; Oh, H.C.; Patanaik, A.; Kishan, K.; Awasthi, A.; Biju, S.; Bhattacharjee, S.; Poh, Y.; Wong, H.S. Benchmarking performance of an automatic polysomnography scoring system in a population with suspected sleep disorders. Frontiers in Neurology 2023, 14–2023. [Google Scholar] [CrossRef]
  5. Alhejaili, F. Comparing polysomnography auto scoring with the standard of care in sleep medicine. Annals of Thoracic Medicine 2025, 21, 21–28. [Google Scholar] [CrossRef]
  6. Zhang, X.; Zhang, X.; Huang, Q.; Lv, Y.; Chen, F. A review of automated sleep stage based on EEG signals. Biocybernetics and Biomedical Engineering 2024, 44, 651–673. [Google Scholar] [CrossRef]
  7. Alickovic, E.; Subasi, A. Ensemble SVM Method for Automatic Sleep Stage Classification. IEEE Transactions on Instrumentation and Measurement 2018, 67, 1258–1265. [Google Scholar] [CrossRef]
  8. Klok, A.B.; Edin, J.; Cesari, M.; Olesen, A.N.; Jennum, P.; Sorensen, H.B. A New Fully Automated Random-Forest Algorithm for Sleep Staging. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018; pp. 4920–4923. [Google Scholar] [CrossRef]
  9. Supratak, A.; Dong, H.; Wu, C.; Guo, Y. DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2017, 25, 1998–2008. [Google Scholar] [CrossRef]
  10. Phan, H.; Andreotti, F.; Cooray, N.; Chén, O.Y.; De Vos, M. SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2019, 27, 400–410. [Google Scholar] [CrossRef]
  11. Phan, H.; Mikkelsen, K.; Chén, O.Y.; Koch, P.; Mertins, A.; De Vos, M. SleepTransformer: Automatic Sleep Staging With Interpretability and Uncertainty Quantification. IEEE Transactions on Biomedical Engineering 2022, 69, 2456–2467. [Google Scholar] [CrossRef]
  12. Jia, Z.; Lin, Y.; Wang, J.; Zhou, R.; Ning, X.; He, Y.; Zhao, Y. GraphSleepNet: Adaptive Spatial-Temporal Graph Convolutional Networks for Sleep Stage Classification. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20); Bessiere, C., Ed.; International Joint Conferences on Artificial Intelligence Organization, 2020; pp. 1324–1330. [Google Scholar] [CrossRef]
  13. Alix, J.J.; Ponnusamy, A.; Pilling, E.; Hart, A.R. An introduction to neonatal EEG. Paediatrics and Child Health 2017, 27, 135–142. [Google Scholar] [CrossRef]
  14. Vanhatalo, S.; Kaila, K. Development of neonatal EEG activity: From phenomenology to physiology. Seminars in Fetal and Neonatal Medicine;Assessing Brain Function in the Perinatal Period 2006, 11, 471–478. [Google Scholar] [CrossRef] [PubMed]
  15. Zhang, Y.; Chan, G.S.H.; Tracy, M.B.; Lee, Q.Y.; Hinder, M.; Savkin, A.V.; Lovell, N.H. Spectral analysis of systemic and cerebral cardiovascular variabilities in preterm infants: relationship with clinical risk index for babies (CRIB). Physiological Measurement 2011, 32, 1913. [Google Scholar] [CrossRef] [PubMed]
  16. Tsuchida, T.N.; Wusthoff, C.J.; Shellhaas, R.A.; Abend, N.S.; Hahn, C.D.; Sullivan, J.E.; Nguyen, S.; Weinstein, S.; Scher, M.S.; Riviello, J.J.; et al. American clinical neurophysiology society standardized EEG terminology and categorization for the description of continuous EEG monitoring in neonates: report of the American Clinical Neurophysiology Society critical care monitoring committee. Journal of clinical neurophysiology 2013, 30, 161–173. [Google Scholar] [CrossRef]
  17. Bertelle, V.; Sevestre, A.; Laou-Hap, K.; Nagahapitiye, M.; Sizun, J. Sleep in the neonatal intensive care unit. The Journal of perinatal & neonatal nursing 2007, 21, 140–148. [Google Scholar]
  18. Dereymaeker, A.; Pillay, K.; Vervisch, J.; De Vos, M.; Van Huffel, S.; Jansen, K.; Naulaers, G. Review of sleep-EEG in preterm and term neonates. Early human development 2017, 113, 87–103. [Google Scholar] [CrossRef]
  19. Grigg-Damberger, M.M. The visual scoring of sleep in infants 0 to 2 months of age. Journal of clinical sleep medicine 2016, 12, 429–445. [Google Scholar] [CrossRef]
  20. O’Reilly, C.; Gosselin, N.; Carrier, J.; Nielsen, T. Montreal Archive of Sleep Studies: an open-access resource for instrument benchmarking and exploratory research. Journal of Sleep Research 2014, 23, 628–635. Available online: https://onlinelibrary.wiley.com/doi/pdf/10.1111/jsr.12169. [CrossRef]
  21. Kemp, B.; Zwinderman, A.; Tuk, B.; Kamphuisen, H.; Oberye, J. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Transactions on Biomedical Engineering 2000, 47, 1185–1194. [Google Scholar] [CrossRef]
  22. Zhang, G.Q.; Cui, L.; Mueller, R.; Tao, S.; Kim, M.; Rueschman, M.; Mariani, S.; Mobley, D.; Redline, S. The National Sleep Research Resource: towards a sleep data commons. Journal of the American Medical Informatics Association 2018, 25, 1351–1358. Available online: https://academic.oup.com/jamia/article-pdf/25/10/1351/34150622/ocy064.pdf. [CrossRef]
  23. Mazzotti, D.R.; Guindalini, C.; de Souza, A.A.L.; Sato, J.R.; Santos-Silva, R.; Bittencourt, L.R.A.; Tufik, S. Adenosine Deaminase Polymorphism Affects Sleep EEG Spectral Power in a Large Epidemiological Sample. PLOS ONE 2012, 7, 1–6. [Google Scholar] [CrossRef]
  24. Crespo-Garcia, M.; Atienza, M.; Cantero, J.L. Muscle Artifact Removal from Human Sleep EEG by Using Independent Component Analysis. Annals of Biomedical Engineering 2008, 36, 467–475. [Google Scholar] [CrossRef] [PubMed]
  25. Reis, P.; Hebenstreit, F.; Gabsteiger, F.; von Tscharner, V.; Lochmann, M. Methodological aspects of EEG and body dynamics measurements during motion. Frontiers in Human Neuroscience 2014, 8–2014. [Google Scholar] [CrossRef] [PubMed]
  26. Ansari, A.H.; Wel, O.D.; Lavanga, M.; Caicedo, A.; Dereymaeker, A.; Jansen, K.; Vervisch, J.; Vos, M.D.; Naulaers, G.; Huffel, S.V. Quiet sleep detection in preterm infants using deep convolutional neural networks. Journal of Neural Engineering 2018, 15, 066006. [Google Scholar] [CrossRef] [PubMed]
  27. Ansari, A.H.; Wel, O.D.; Pillay, K.; Dereymaeker, A.; Jansen, K.; Huffel, S.V.; Naulaers, G.; Vos, M.D. A convolutional neural network outperforming state-of-the-art sleep staging algorithms for both preterm and term infants. Journal of Neural Engineering 2020, 17, 016028. [Google Scholar] [CrossRef]
  28. Zhu, H.; Wang, L.; Shen, N.; Wu, Y.; Feng, S.; Xu, Y.; Chen, C.; Chen, W. MS-HNN: Multi-Scale Hierarchical Neural Network With Squeeze and Excitation Block for Neonatal Sleep Staging Using a Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2023, 31, 2195–2204. [Google Scholar] [CrossRef]
  29. Siddiqa, H.A.; Tang, Z.; Xu, Y.; Wang, L.; Irfan, M.; Abbasi, S.F.; Nawaz, A.; Chen, C.; Chen, W. Single-Channel EEG Data Analysis Using a Multi-Branch CNN for Neonatal Sleep Staging. IEEE Access 2024, 12, 29910–29925. [Google Scholar] [CrossRef]
  30. Jia, Z.; Lin, Y.; Wang, J.; Ning, X.; He, Y.; Zhou, R.; Zhou, Y.; Lehman, L.w.H. Multi-View Spatial-Temporal Graph Convolutional Networks With Domain Generalization for Sleep Stage Classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2021, 29, 1977–1986. [Google Scholar] [CrossRef]
  31. Eldele, E.; Chen, Z.; Liu, C.; Wu, M.; Kwoh, C.K.; Li, X.; Guan, C. An Attention-Based Deep Learning Approach for Sleep Stage Classification With Single-Channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2021, 29, 809–818. [Google Scholar] [CrossRef]
  32. Lyu, J.; Shi, W.; Zhang, C.; Yeh, C.H. A Novel Sleep Staging Method Based on EEG and ECG Multimodal Features Combination. IEEE Transactions on Neural Systems and Rehabilitation Engineering 2023, 31, 4073–4084. [Google Scholar] [CrossRef]
  33. Zhou, D.; Xu, Q.; Zhang, J.; Wu, L.; Xu, H.; Kettunen, L.; Chang, Z.; Zhang, Q.; Cong, F. Interpretable Sleep Stage Classification Based on Layer-Wise Relevance Propagation. IEEE Transactions on Instrumentation and Measurement 2024, 73, 1–10. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed method.
Figure 2. The temporal representation learning branch.
Figure 3. The spatial correlation learning branch.
Figure 4. Confusion matrices of the proposed method on different sleep analysis tasks. (a) Confusion matrix on the CHFD Sleep-Wake classification task. (b) Confusion matrix on the CHFD QS detection task. (c) Confusion matrix on the CHFD AS-Wake-QS classification task. (d) Confusion matrix on the MASS-SS3 five-stage classification task.
Table 1. Specifications of the CHFD dataset
Terms Details
Gender (b: g) 32:32
Gestational age (w + d) 38.3 ± 1.8
Postmenstrual age (w + d) 40.5 ± 1.7
Weight (kg) 3.3 ± 0.6
Number of wakefulness epochs 5514 (32.8%)
Number of QS epochs 5749 (34.2%)
Number of AS epochs 5540 (33.0%)
EEG channel F3, F4, C3, C4, T3, T4, P3, and P4
Sampling rate 500 Hz
*b: g means boy: girl and w + d denotes week + day.
Table 2. Specifications of the MASS-SS3 dataset
Terms Details
Gender (m: f) 28:34
Scoring rules AASM
Sampling rate 256 Hz
Number of wakefulness epochs 6442
Number of N1 epochs 4839
Number of N2 epochs 29802
Number of N3 epochs 7653
Number of REM epochs 10581
Selected EEG channel F3, F4, C3, C4, T3, T4, P3, and P4
*m: f means male: female.
Table 3. Performance evaluation on CHFD and MASS-SS3 datasets
Dataset Task Accuracy MF1 Kappa Macro-sensitivity Macro-specificity
CHFD Sleep-Wake 0.886 0.870 0.740 0.868 0.868
CHFD QS Detection 0.916 0.906 0.811 0.902 0.902
CHFD AS-W-QS 0.819 0.819 0.729 0.818 0.910
MASS-SS3 W-N1-N2-N3-REM 0.820 0.739 0.729 0.723 0.944
Table 4. Comparison of different methods on the CHFD AS-W-QS task.
Method Accuracy MF1 Kappa M-sens M-spec Parameters
MB-CNN [29] 0.728 0.682 0.561 0.671 0.850 <0.01M
Conv-2d [26] 0.535 0.531 0.489 0.768 0.536 <0.01M
Conv-2d [27] 0.523 0.519 0.411 0.761 0.523 <0.01M
DeepSleepNet [9] 0.689 0.682 0.535 0.845 0.692 24.75M
AttnSleep [31] 0.680 0.646 0.659 0.839 0.650 5.20M
MS-HNN [28] 0.754 0.758 0.728 0.876 0.755 25.63M
GraphSleepNet [12] 0.689 0.682 0.535 0.845 0.692 <0.05M
MVST-GCN [30] 0.697 0.696 0.547 0.849 0.699 <0.05M
Proposed 0.819 0.819 0.729 0.818 0.910 0.81M
M-sens: Macro-sensitivity, M-spec: Macro-specificity.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.