Preprint
Article

This version is not peer-reviewed.

SE-SNN: Squeeze-and-Excitation Enhanced Spiking Neural Networks with Learnable Neuron Dynamics for Event-Based Vision

A peer-reviewed version of this preprint was published in:
Biomimetics 2026, 11(5), 359. https://doi.org/10.3390/biomimetics11050359

Submitted: 26 March 2026

Posted: 27 March 2026


Abstract
Spiking Neural Networks (SNNs) have emerged as a promising paradigm for energy-efficient neuromorphic computing, particularly when processing asynchronous event streams from dynamic vision sensors (DVS). However, SNNs often suffer from limited representational capacity and suboptimal feature recalibration compared to their artificial counterparts. To address these challenges, we propose SE-SNN, a novel architecture that integrates Squeeze-and-Excitation (SE) blocks into deep residual SNNs, enabling channel-wise attention without spike generation in the gating mechanism. Furthermore, we introduce a Robust Parametric Leaky Integrate-and-Fire (RobustPLIF) neuron model with learnable membrane time constant (τ) and firing threshold (Vth), allowing adaptive temporal dynamics per layer. Our model is trained on the CIFAR10-DVS dataset. Experimental results demonstrate that SE-SNN achieves state-of-the-art accuracy of 78.8% on CIFAR10-DVS with only 16 time steps, significantly outperforming baseline SNNs while maintaining biological plausibility and hardware efficiency. Ablation studies confirm the individual contributions of SE blocks and learnable neuron parameters to performance gains.

1. Introduction

Spiking Neural Networks (SNNs) have garnered extensive attention in recent years owing to their superior biological interpretability, lower power consumption, and lower latency. As a distinctive feature, SNNs employ discrete spike events for information transmission and exhibit sparse activation properties [1]. These characteristics not only enable ultra-low power consumption and low latency but also endow SNNs with a unique capability to capture key features in dynamic time-series data, thereby demonstrating enormous application potential [2]. Specifically, SNNs possess prominent event-driven sparsity: their neurons strictly adhere to a spike-triggered mechanism, where spike signals are generated only when the accumulated membrane potential exceeds the firing threshold. This biologically realistic "fire-rest" dynamic trait allows the neural network to maintain highly sparse activation in both spatial and temporal dimensions (i.e., spatiotemporal sparsity), and measurements have shown that it can reduce synaptic operation energy consumption by up to 60-80% [3]. Consequently, SNNs outperform traditional artificial neural networks (ANNs) in processing dynamic and continuous signal data, particularly displaying obvious advantages in tasks requiring efficient temporal feature processing, such as action recognition and speech processing [4,5]. With the continuous advancement of computing power, SNNs have been widely applied in various fields including image processing and signal recognition, further highlighting their great potential in low-power and efficient computing [6,7].
As the third generation of neural networks, SNNs differ significantly from traditional ANNs in terms of their constituent units, input-output methods, and operating mechanisms [8]. SNNs use spike signals as information carriers, and their neurons transmit and process information through discrete spike events, exhibiting strong temporal dynamic characteristics that are closer to the operating mechanisms of biological nervous systems in multiple aspects [9]. Firstly, SNNs adopt spike signals as the basic unit for information transmission, encoding information through the timing and frequency of spike emissions. This discrete, time-dependent information transmission method enables them to process spatiotemporal dynamic data more efficiently [10]. Secondly, neuronal activity in SNNs is sparse and event-driven, with spikes emitted only when receiving input signals of sufficient intensity, which confers significant low-power consumption advantages. Additionally, the learning algorithms of SNNs are based on biological principles, enabling more natural neural plasticity. These characteristics enable SNNs to exhibit great potential in simulating brain neural signal processing, handling complex spatiotemporal tasks, and achieving low-power computing, thus being widely recognized as an important direction in neural network development and representing the core features of the third generation of neural networks [11].
Currently, the learning methods of SNNs are mainly categorized into two types: the ANN-to-SNN conversion method and direct training [12]. Traditional ANNs inherently involve redundant computations, which inevitably increase computational costs during data processing. Both the conversion of ANNs to SNNs and the direct training of SNNs based on backpropagation supervised learning rules require substantial labeled data for model training, accompanied by high computational overhead. These methods still exhibit a considerable gap compared with the event-driven, efficient information processing mechanisms of biological neural systems. Despite recent advancements in conversion-based approaches and direct training methods, SNNs still lag behind ANNs in accuracy when tackling complex visual tasks. Two key limitations account for this performance gap: (1) the fixed dynamics of neurons fail to adapt to layer-specific feature statistics, and (2) the lack of explicit mechanisms for modeling inter-channel dependencies, which is critical for discriminative feature learning [13].
Moreover, SNNs still confront significant challenges in training algorithm performance, parameter optimization, and network architecture design, which severely restrict their performance improvement and practical application promotion [14]. Specifically, the existing challenges of SNNs include the lack of effective training algorithms, the need for refined parameter optimization, and the requirement for adaptive adjustments to network architectures. Research on training algorithms [15] primarily addresses the issue that gradient descent algorithms cannot be directly applied to SNNs due to the non-differentiable nature of spike firing functions. Research on parameter optimization [16] aims to further enhance the accuracy and reduce the latency of SNNs. In terms of network architecture adjustments [17], the asynchronous information processing driven by spike events differs significantly from the synchronous continuous-value processing in ANNs. Therefore, when leveraging ANN architectures to construct SNNs, corresponding modifications to the network architecture are often indispensable. Typical adjustments include improvements to residual connections, pooling layers, and Batch Normalization (BN) methods. These adjustments adapt to the characteristics of spike signals by redesigning the input and output forms of each layer, thereby enabling the effective application of such architectures in SNNs [18].
Nevertheless, existing SNN architectures still have numerous shortcomings. Some architectures merely treat spiking neurons as a special type of activation function, ignoring the temporal correlation between spikes and thus failing to fully utilize the spatiotemporal characteristics of SNNs. Others neglect the binary nature of spike sequences during information transmission, leading to inaccurate inter-layer data propagation and even information loss [19].
Event-based cameras, such as Dynamic Vision Sensors (DVS), capture visual information as asynchronous streams of sparse "events" triggered by pixel-level intensity changes. This paradigm offers advantages in high temporal resolution, low latency, and energy efficiency over conventional frame-based cameras [20]. Inspired by biological neural systems, SNNs naturally align with such asynchronous data due to their event-driven computation and temporal coding capabilities. We implemented our model using the SpikingJelly framework [21] and evaluated it on the challenging CIFAR10-DVS dataset [22]. Our training protocol incorporates Mixup, EMA, and robust learning rate scheduling to stabilize the optimization process. The proposed SE-SNN achieves competitive performance with minimal computational overhead, demonstrating the effectiveness of attention mechanisms and adaptive neuron models in SNNs.
To bridge the aforementioned gaps, we propose three synergistic innovations:
  • A learnable robust PLIF neuron (RobustPLIF) with trainable $\tau$ and $V_{th}$, enabling automatic adjustment of temporal integration and spiking behavior.
  • Integration of Squeeze-and-Excitation (SE) blocks [23] into SNN residual blocks, where the SE module operates on membrane potentials (not spikes) to generate channel-wise attention weights via standard differentiable operations.
  • An experimental evaluation showing that SE-SNN achieves a state-of-the-art accuracy of 78.8% on CIFAR10-DVS.
The remainder of this paper is organized as follows. Section 2 briefly reviews the background on neural encoding, network structures, and learning algorithms for SNNs. Section 3 details the proposed method. Section 4 presents and discusses the experimental results, and Section 5 concludes the paper.

3. Methodology

In this section, we present the detailed architecture of our proposed SE-SNN (Squeeze-and-Excitation Spiking Neural Network), a deep residual SNN enhanced with channel attention mechanisms for robust event-based object recognition. We first introduce the robust neuron model, followed by the SE-ResNet architecture, temporal integration strategy, and comprehensive training pipeline. The overall architecture is shown in Figure 2.
The detailed architectural configuration of the proposed SE-SNN is summarized in Table 3.

3.1. Robust PLIF Neuron Model

Unlike conventional Leaky Integrate-and-Fire (LIF) neurons with fixed hyperparameters, we propose a Robust Parametric Leaky Integrate-and-Fire (RobustPLIF) neuron in which both the membrane time constant $\tau$ and the firing threshold $v_{th}$ are learnable parameters optimized during training.

Membrane Dynamics

The subthreshold dynamics of our robust PLIF neuron follow:
$$\tau \frac{dv(t)}{dt} = -\left(v(t) - v_{rest}\right) + I(t),$$
where $v_{rest} = 0$ is the resting potential, $I(t)$ represents the input current, and $\tau \in \mathbb{R}^{+}$ is the learnable time constant controlling the leakage rate.

Firing Mechanism

When the membrane potential reaches or exceeds the learnable threshold $v_{th}$, the neuron emits a spike:
$$s(t) = \Theta\left(v(t) - v_{th}\right) = \begin{cases} 1, & \text{if } v(t) \geq v_{th}, \\ 0, & \text{otherwise}, \end{cases}$$
where $\Theta(\cdot)$ is the Heaviside step function. After firing, the membrane potential is reset to $v_{rest}$.
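To make the neuron dynamics concrete, the following minimal Python sketch simulates a single neuron in discrete time. It assumes a forward-Euler discretization with unit time step, a hard reset to the resting potential, and illustrative values τ = 2.0 and v_th = 0.5. The class name `PLIFNeuron` and these values are hypothetical and are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class PLIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron with tunable tau and v_th."""
    tau: float = 2.0      # membrane time constant (learnable in the paper)
    v_th: float = 0.5     # firing threshold (learnable in the paper)
    v_rest: float = 0.0   # resting / reset potential
    v: float = 0.0        # current membrane potential

    def step(self, current: float) -> int:
        # Forward-Euler update of tau * dv/dt = -(v - v_rest) + I, with dt = 1 (assumed).
        self.v += (-(self.v - self.v_rest) + current) / self.tau
        spike = 1 if self.v >= self.v_th else 0
        if spike:
            self.v = self.v_rest  # hard reset after firing
        return spike


neuron = PLIFNeuron(tau=2.0, v_th=0.5)
spikes = [neuron.step(0.6) for _ in range(6)]  # constant input current of 0.6
```

With a constant input of 0.6, the membrane potential climbs over three steps before crossing the threshold, so this toy neuron fires on every third step. A larger τ slows the climb, and a larger v_th delays the spike, which is the behavior the learnable parameters are meant to adjust.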

Surrogate Gradient

To enable backpropagation through the non-differentiable spike function, we employ the ArcTangent surrogate gradient:
$$\frac{\partial s}{\partial v} \approx \frac{\alpha / 2}{1 + \left(\frac{\pi}{2} \alpha \left(v - v_{th}\right)\right)^{2}},$$
where $\alpha = 2.0$ controls the steepness of the gradient. The detach_reset option controls whether the reset operation is detached from the computation graph during backpropagation, which avoids gradient interference caused by the reset. We set detach_reset = True so that no gradient flows through the reset mechanism, while the temporal dependency through the membrane potential is maintained.
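The surrogate gradient can be evaluated directly from the formula above. The sketch below is a plain-Python illustration for a scalar membrane potential; the function name `atan_surrogate_grad` is hypothetical, and the threshold value 0.5 is chosen only for the example.

```python
import math


def atan_surrogate_grad(v: float, v_th: float, alpha: float = 2.0) -> float:
    """Smooth stand-in for the derivative of the Heaviside spike function."""
    x = v - v_th
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)


peak = atan_surrogate_grad(0.5, 0.5)       # equals alpha / 2 exactly at the threshold
off_peak = atan_surrogate_grad(1.0, 0.5)   # decays as v moves away from v_th
```

The gradient peaks at α/2 when the membrane potential equals the threshold and decays symmetrically on both sides, so neurons close to firing receive the largest learning signal.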

Parameter Constraints

To ensure stable dynamics, we apply hard constraints during optimization:
$$\tau \in [1.0, 20.0], \quad v_{th} \in [0.2, 0.8].$$
These constraints are enforced via projection after each gradient update.
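Projection onto a box constraint reduces to clamping each parameter to its interval. The following sketch shows one update followed by projection. It uses a plain gradient step purely for illustration (the experiments use AdamW), and the function names and numeric inputs are hypothetical.

```python
def project(value: float, low: float, high: float) -> float:
    """Clamp a scalar parameter back into the closed interval [low, high]."""
    return max(low, min(high, value))


def step_with_projection(tau, v_th, grad_tau, grad_v_th, lr):
    # Illustrative plain gradient step, followed by projection onto the allowed box.
    tau = project(tau - lr * grad_tau, 1.0, 20.0)
    v_th = project(v_th - lr * grad_v_th, 0.2, 0.8)
    return tau, v_th


# Both raw updates leave the allowed range and are pulled back to the boundary.
tau, v_th = step_with_projection(tau=1.05, v_th=0.79, grad_tau=1.0, grad_v_th=-1.0, lr=0.1)
```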

3.2. SE-ResNet Architecture

Our network follows a residual architecture with channel-wise attention modules specifically designed for spiking neural networks. The overall structure comprises an initial convolutional stem, four stages of SE-Residual blocks with progressive channel expansion, and a classification head with temporal aggregation. The forward propagation implementation of SE-SNN is detailed in Algorithm 1.
Algorithm 1 SE-SNN Forward Propagation
Require: Event stream $X \in \mathbb{R}^{N \times T \times 2 \times H \times W}$, network parameters $\theta$, time steps $T$
Ensure: Logits $Y \in \mathbb{R}^{N \times N_{cls}}$
    Initialize empty list O ← [ ]
    for t = 1 to T do
        x_t ← X[:, t, :, :, :]              ▹ Extract t-th frame
        h ← InitConv(x_t)                   ▹ Conv(2→64, 3×3) + BN + PLIF + MaxPool
        h ← Layer1(h)                       ▹ SE-ResBlock ×2, 64 channels
        h ← Layer2(h)                       ▹ SE-ResBlock ×2, 128 channels, stride 2
        h ← Layer3(h)                       ▹ SE-ResBlock ×2, 256 channels, stride 2
        h ← Layer4(h)                       ▹ SE-ResBlock ×2, 512 channels, stride 2
        h ← AdaptiveAvgPool2d(h, (4, 4))
        Append h to O
    end for
    H ← stack(O, dim = 1)                   ▹ Shape: [N, T, 512, 4, 4]
    F ← max(H, dim = 1).values              ▹ Temporal max pooling
    Y ← Classifier(F)                       ▹ Flatten + FC(8192→1024) + PLIF + Dropout + FC(1024→10)
    return Y

3.2.1. Squeeze-and-Excitation for SNNs

Traditional SE blocks operate on activation maps in ANNs. In SNNs, we adapt this mechanism to operate on membrane potentials rather than binary spikes, preserving continuous information for effective channel recalibration.
Given an intermediate membrane potential tensor $X \in \mathbb{R}^{N \times C \times H \times W}$ at a specific time step, the SE module performs:
Squeeze Operation
Global average pooling aggregates spatial information into a channel descriptor $z \in \mathbb{R}^{N \times C}$:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{c,i,j}, \quad c = 1, \ldots, C.$$
Excitation Operation
A bottleneck architecture learns channel-wise dependencies:
$$s = \sigma\left(W_2 \cdot \mathrm{ReLU}\left(W_1 \cdot z\right)\right),$$
where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are learnable weights with reduction ratio $r = 16$, and $\sigma(\cdot)$ denotes the sigmoid function.
Scaling Operation
The final output is obtained by channel-wise multiplication:
$$\tilde{X}_{c,i,j} = s_c \cdot X_{c,i,j}.$$
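The three operations can be traced on a toy input. The sketch below is a dependency-free illustration for a single sample stored as nested lists of shape [C][H][W]. It uses C = 2 channels and one hidden unit (a reduction ratio of 2 rather than the paper's 16), omits bias terms, and the weight values are hypothetical.

```python
import math


def se_recalibrate(x, w1, w2):
    """Apply squeeze, excitation, and scaling to one sample x of shape [C][H][W]."""
    # Squeeze: global average over the spatial grid, one value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck W1 (C/r x C) -> ReLU -> W2 (C x C/r) -> sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scaling: multiply every spatial position of channel c by its weight s[c].
    scaled = [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    return scaled, s


x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C = 2, H = W = 2
w1 = [[1.0, 1.0]]     # shape (C/r) x C = 1 x 2, hypothetical weights
w2 = [[0.0], [1.0]]   # shape C x (C/r) = 2 x 1, hypothetical weights
scaled, s = se_recalibrate(x, w1, w2)
```

Here the first channel receives the neutral weight sigmoid(0) = 0.5, while the second channel receives sigmoid(3), so it is attenuated far less. Because every step is an ordinary differentiable operation on continuous values, no surrogate gradient is needed inside the gate.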

3.2.2. SE-Residual Block

Each residual block consists of two convolutional layers with SE attention and skip connections:
$$h_1 = \mathrm{PLIF}\left(\mathrm{BN}\left(\mathrm{Conv}_{3 \times 3}(x)\right)\right),$$
$$h_2 = \mathrm{SE}\left(\mathrm{BN}\left(\mathrm{Conv}_{3 \times 3}\left(\mathrm{Dropout}(h_1)\right)\right)\right),$$
$$y = \mathrm{PLIF}\left(h_2 + F_{shortcut}(x)\right),$$
where $F_{shortcut}$ is an identity mapping, or a $1 \times 1$ convolution with batch normalization when the spatial dimensions or channel numbers change.
The forward pass implementation of SE-Residual Block is detailed in Algorithm 2.
Algorithm 2 SE-Residual Block Forward Pass
Require: Input membrane potential $x$, input channels $C_{in}$, output channels $C_{out}$, stride $s$, use-SE flag $I_{SE}$, dropout rate $p$
Ensure: Output membrane potential $y$
    identity ← x
    out ← Conv2d(x, C_out, kernel = 3, stride = s, padding = 1)
    out ← BatchNorm2d(out)
    out ← PLIF(out)
    if p > 0 then
        out ← Dropout2d(out, p)
    end if
    out ← Conv2d(out, C_out, kernel = 3, stride = 1, padding = 1)
    out ← BatchNorm2d(out)
    if I_SE = True then
        z ← AdaptiveAvgPool2d(out, (1, 1))      ▹ Squeeze: global spatial avg
        z ← Flatten(z)                          ▹ Shape: [N, C_out]
        w ← Linear(z, C_out // 16)              ▹ Reduction
        w ← ReLU(w)
        w ← Linear(w, C_out)                    ▹ Expansion
        w ← Sigmoid(w)                          ▹ Excitation: channel weights
        out ← out × w.view(N, C_out, 1, 1)      ▹ Scale
    end if
    if s ≠ 1 or C_in ≠ C_out then
        identity ← Conv2d(x, C_out, kernel = 1, stride = s)
        identity ← BatchNorm2d(identity)
    end if
    out ← out + identity                        ▹ Residual connection
    y ← PLIF(out)                               ▹ Final activation
    return y

3.3. Temporal Information Integration

For an input event stream $X \in \mathbb{R}^{N \times T \times C_{in} \times H \times W}$ with $T$ time steps, we process each frame independently through the spatial network $f_\theta(\cdot)$ and aggregate temporal information via max pooling over time:
$$F_{agg} = \max_{t \in \{1, \ldots, T\}} f_\theta(X_t),$$
where $X_t \in \mathbb{R}^{N \times C_{in} \times H \times W}$ denotes the frame at time $t$. This strategy emphasizes salient events while suppressing background noise, outperforming conventional average firing rate encoding.
The aggregated features $F_{agg}$ are then fed into the classification head comprising fully-connected layers with dropout and PLIF activation.
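As a small illustration of the aggregation step, the sketch below takes the element-wise maximum over the time axis of a list of per-step feature vectors and, for contrast, also computes the temporal mean. The feature values and the function name `temporal_max` are hypothetical.

```python
def temporal_max(features):
    """Element-wise maximum over the time axis of a [T][D] feature list."""
    return [max(step[d] for step in features) for d in range(len(features[0]))]


# Three time steps of a 4-dimensional feature vector (hypothetical values).
per_step = [
    [0.1, 0.9, 0.0, 0.2],
    [0.4, 0.3, 0.0, 0.8],
    [0.2, 0.5, 0.0, 0.1],
]
f_agg = temporal_max(per_step)                                   # max over time
f_mean = [sum(col) / len(per_step) for col in zip(*per_step)]    # mean over time
```

A response that is strong at a single time step (such as 0.9 in the second feature) survives max pooling unchanged, whereas averaging dilutes it across the quieter steps.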

3.4. Training Pipeline

3.4.1. Mixup Data Augmentation

To improve generalization on event-based data, we apply Mixup [52] in the input space:
$$\lambda \sim \mathrm{Beta}(\alpha, \alpha), \quad \alpha = 0.2,$$
$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j,$$
$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j,$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are two randomly sampled training examples.
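A minimal sketch of the mixing step is given below, using only the Python standard library. Samples are represented as flat lists and labels as one-hot vectors. The function name `mixup` and the toy inputs are hypothetical, and in practice the same operation would be applied to whole batches of event frames.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with a Beta-sampled weight."""
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


rng = random.Random(0)  # fixed seed so the example is repeatable
x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

With α = 0.2 the Beta distribution concentrates its mass near 0 and 1, so most mixed samples stay close to one of the two originals and only occasionally form an even blend.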

3.4.2. Exponential Moving Average (EMA)

An exponential moving average (EMA) of the model weights gives higher weight to recent parameter values and smooths out fluctuations between optimization steps. We maintain an EMA model $\theta_{EMA}$ alongside the training model $\theta$:
$$\theta_{EMA}^{(t)} = \gamma \cdot \theta_{EMA}^{(t-1)} + (1 - \gamma) \cdot \theta^{(t)},$$
with decay rate $\gamma = 0.995$. The EMA model is used for validation and final evaluation, providing more stable predictions.
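The update rule can be sketched in a few lines. The example below treats the weights as a flat list and holds the training weights constant at 1.0, which is an artificial setting chosen only to expose how quickly the average responds. The function name `ema_update` is hypothetical.

```python
def ema_update(theta_ema, theta, gamma=0.995):
    """Return the updated EMA weights given the current training weights."""
    return [gamma * e + (1.0 - gamma) * p for e, p in zip(theta_ema, theta)]


theta_ema = [0.0]
for _ in range(1000):            # training weights held at 1.0 for illustration
    theta_ema = ema_update(theta_ema, [1.0])
# Starting from 0 and tracking a constant 1, the EMA after n updates is 1 - gamma**n.
```

With γ = 0.995 the average has an effective memory on the order of 1/(1 − γ) = 200 updates, so it lags the training weights slightly while filtering out step-to-step noise.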

3.5. Complexity Analysis

Computational Cost

For an input with T time steps, the total floating-point operations (FLOPs) are:
$$\mathrm{FLOPs} = T \times \sum_{l=1}^{L} \mathrm{FLOPs}_{\mathrm{spatial},l} + \mathrm{FLOPs}_{\mathrm{temporal}} + \mathrm{FLOPs}_{\mathrm{head}},$$
where $L = 4$ is the number of residual stages. The temporal aggregation adds negligible overhead ($O(N \cdot T \cdot C \cdot H \cdot W)$).

Memory Footprint

During training, we maintain two sets of parameters ( θ and θ E M A ) and gradients. Peak memory consumption is approximately:
$$\mathrm{Memory} \approx 2 \times |\theta| + |\mathrm{grad}| + |\mathrm{activations}| \approx 4 \times |\theta| \quad (\text{with AMP}).$$
For our 19.6M parameter model, this requires ∼300 MB for parameters and ∼2-4 GB for activations depending on batch size.
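The parameter-memory figure can be checked with a short calculation. The sketch below assumes that every stored copy uses 32-bit floats (4 bytes per value) and that 1 MB equals 10^6 bytes; both are assumptions made for this example rather than details stated in the text.

```python
PARAMS = 19.6e6          # parameter count reported in the text
BYTES_PER_VALUE = 4      # assumption: 32-bit floats for every stored copy

one_copy_mb = PARAMS * BYTES_PER_VALUE / 1e6     # a single copy of the weights
four_copies_mb = 4 * one_copy_mb                 # the 4 x |theta| estimate
```

Under these assumptions one copy of the weights occupies about 78 MB and four copies about 314 MB, which agrees with the ∼300 MB estimate.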

4. Experiments

4.1. Dataset and Setup

CIFAR10-DVS is an event-stream dataset for object classification. A total of 10,000 frame-based images from the CIFAR-10 dataset were converted into 10,000 event streams using an event sensor with a resolution of 128 × 128 pixels. The dataset is of moderate difficulty and covers 10 categories. The conversion is achieved using Repetitive Closed-Loop Smooth (RCLS) movement of the frame-based images. This movement produces rich local intensity changes over continuous time, which each pixel of the event-based camera quantizes into events.
All experiments are implemented using PyTorch and the SpikingJelly framework [21]. We employ the AdamW optimizer with an initial learning rate of $4 \times 10^{-4}$, weight decay of $5 \times 10^{-4}$, and betas $(0.9, 0.999)$. We evaluate on CIFAR10-DVS, which contains 10,000 DVS recordings (10 classes, 1,000 per class) of CIFAR-10 images displayed on an LCD screen. Each sample is converted to a 16-frame event representation of size $128 \times 128 \times 2$ (on/off channels). We split the data into 90% training and 10% testing, following standard practice. To accommodate the heterogeneity of neuronal dynamics, we apply a 0.5× learning rate reduction for the learnable parameters $\tau$ and $v_{th}$ in the robust PLIF neurons. The learning rate schedule consists of a 10-epoch linear warmup from 0.1× to 1.0× the base learning rate, followed by cosine annealing to $1 \times 10^{-7}$ over the remaining epochs. We set the batch size to 16 and train for a maximum of 300 epochs with early stopping (patience = 20). Gradient clipping with max norm 1.0 is applied to ensure training stability. The experimental configuration is summarized as follows.
  • Data Augmentation: Random horizontal flip and crop on event frames; Mixup with $\alpha = 0.2$.
  • Optimization: AdamW optimizer with weight decay $5 \times 10^{-4}$; separate learning rates for neuron parameters ($2 \times 10^{-4}$) and others ($4 \times 10^{-4}$).
  • Learning Rate Schedule: Linear warmup (10 epochs) followed by cosine annealing to $10^{-7}$.
  • Regularization: Gradient clipping (max norm=1.0), dropout (0.1-0.5), label smoothing (0.1).
  • Model Averaging: Exponential Moving Average (EMA) with decay 0.995.
  • Early Stopping: Patience of 20 epochs based on validation accuracy.
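The warmup-plus-cosine schedule described above can be written as a single function of the epoch index. The sketch below is one plausible reading of that description: it assumes 0-indexed epochs, a per-epoch (rather than per-iteration) update, and that the neuron parameters follow the same schedule at half the base rate. The function name `learning_rate` is hypothetical.

```python
import math


def learning_rate(epoch, base_lr=4e-4, warmup_epochs=10, total_epochs=300, min_lr=1e-7):
    """Learning rate at a given 0-indexed epoch under warmup plus cosine annealing."""
    if epoch < warmup_epochs:
        # Linear ramp from 0.1 x base_lr at epoch 0 up to base_lr at the end of warmup.
        return base_lr * (0.1 + 0.9 * epoch / warmup_epochs)
    # Cosine annealing from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [learning_rate(e) for e in range(301)]
neuron_lr = [0.5 * lr for lr in schedule]   # tau and v_th use half the base rate
```

Under these assumptions the rate starts at 4 × 10^-5, reaches 4 × 10^-4 after the 10 warmup epochs, and decays to 10^-7 by epoch 300, with the neuron parameters peaking at 2 × 10^-4.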
The experimental platform is equipped with an Intel Core i9-13900HX (13th generation) processor, 32 GB of RAM, and an NVIDIA GeForce RTX 4070 GPU with 8 GB of dedicated video memory. The experiment was programmed and tested using Python 3.10.11 with CUDA version 12.0, and the selected deep learning frameworks are PyTorch and SpikingJelly.

4.2. Comparison with State-of-the-Art

We compare our SE-PLIF-SNN against existing state-of-the-art methods on CIFAR10-DVS. As shown in Table 4, our method achieves competitive performance among direct-training SNN approaches.
Our method achieves 78.8% accuracy with T = 16 , outperforming most existing direct-training methods and approaching the current state-of-the-art. Notably, our model utilizes 19.6M parameters, demonstrating superior parameter efficiency compared to competing methods. With reduced timesteps ( T = 10 ), our method maintains 76.5% accuracy, indicating robust temporal compression capability.

4.3. Analysis of Neuronal Dynamics

Learnable Parameter Evolution

We analyze the evolution of the learnable parameters $\tau$ and $v_{th}$ during training. Figure 3 illustrates that different layers converge to distinct temporal dynamics: shallow layers prefer smaller $\tau$ (fast response to edge features), while deep layers adopt larger $\tau$ (sustained integration for semantic features). The thresholds stabilize in the range [0.35, 0.55], balancing firing sparsity and information transmission.

Spike Activity Analysis

Figure 4 presents the average firing rates across layers. The SE blocks effectively modulate channel-wise activity, reducing redundant spikes by 15% while preserving task-relevant information. The overall network maintains a moderate firing rate of 23%, indicating energy-efficient event-driven computation.

4.4. Robustness Evaluation

We evaluate the robustness of SE-PLIF-SNN under various challenging conditions:

Temporal Resolution Robustness

Testing with varying timesteps $T \in \{4, 8, 10, 16, 20\}$ shows graceful degradation as $T$ decreases: 65.2% ($T = 4$), 68.8% ($T = 8$), 74.5% ($T = 10$), 78.8% ($T = 16$), 78.6% ($T = 20$). The model maintains reasonable performance even with 4 timesteps, which is crucial for low-latency applications.

Noise Resilience

We inject Gaussian noise ($\sigma \in \{0.01, 0.05, 0.1\}$) into the input frames. The accuracy degrades gradually: 77.2% ($\sigma = 0.01$), 74.8% ($\sigma = 0.05$), 70.3% ($\sigma = 0.1$), demonstrating robustness to the sensor noise inherent in event cameras.

Spatial Perturbations

Random erasing (probability 0.5) and cutout (hole size 16 × 16 ) result in 77.1% and 76.9% accuracy respectively, indicating strong spatial generalization.

4.5. Ablation Study

To validate the effectiveness of each proposed component, we conduct comprehensive ablation experiments on CIFAR10-DVS. All ablation experiments maintain the same hyperparameters unless otherwise specified.
Table 5. Ablation study of proposed components on CIFAR10-DVS. All experiments use T = 16 timesteps.
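| Configuration | PLIF | SE Block | Mixup | Accuracy (%) |
|---|---|---|---|---|
| Baseline LIF | | | | 64.3 ± 0.4 |
| + PLIF only | ✓ | | | 66.8 ± 0.3 |
| + SE only | | ✓ | | 67.1 ± 0.5 |
| + Mixup only | | | ✓ | 65.9 ± 0.4 |
| PLIF + SE | ✓ | ✓ | | 74.2 ± 0.3 |
| PLIF + Mixup | ✓ | | ✓ | 76.5 ± 0.4 |
| SE + Mixup | | ✓ | ✓ | 76.8 ± 0.3 |
| Full Model (PLIF+SE+Mixup) | ✓ | ✓ | ✓ | 77.5 ± 0.2 |
| Full Model + EMA | ✓ | ✓ | ✓ | 78.8 ± 0.2 |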

Effect of PLIF Neurons

Replacing standard LIF with learnable PLIF neurons improves accuracy by 2.5 percentage points (64.3% to 66.8%), demonstrating that adaptive membrane time constants better capture the heterogeneous temporal dynamics of event-based data. The learnable thresholds also contribute to optimized firing patterns.

Effect of SE Blocks

Integrating Squeeze-and-Excitation blocks into the residual blocks provides a gain of 2.8 percentage points (64.3% to 67.1%). The channel-wise attention mechanism effectively recalibrates feature responses, enhancing the discriminative power of spiking representations.

Effect of Mixup Augmentation

Mixup regularization improves accuracy by 1.6 percentage points (64.3% to 65.9%), which is particularly beneficial given the limited training data in CIFAR10-DVS. The linear interpolation of event-based frames creates virtual training samples that smooth the decision boundary.

Synergistic Effects

The combination of all three components achieves 77.5% accuracy, significantly outperforming each individual addition. The EMA strategy further raises accuracy to 78.8%, supporting the use of weight averaging to stabilize SNN training.
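The synergy claim can be checked against the mean accuracies in Table 5 with simple arithmetic. The sketch below compares the sum of the three single-component gains with the gain of the full model over the baseline. It uses only the reported means and ignores the reported standard deviations.

```python
accuracy = {                     # mean accuracy (%) from Table 5
    "baseline": 64.3,
    "plif": 66.8,
    "se": 67.1,
    "mixup": 65.9,
    "full": 77.5,
    "full_ema": 78.8,
}

gains = {k: round(accuracy[k] - accuracy["baseline"], 1) for k in ("plif", "se", "mixup")}
sum_of_single_gains = round(sum(gains.values()), 1)                  # 2.5 + 2.8 + 1.6
combined_gain = round(accuracy["full"] - accuracy["baseline"], 1)    # all three together
ema_gain = round(accuracy["full_ema"] - accuracy["full"], 1)         # extra gain from EMA
```

The single-component gains add up to 6.9 percentage points, whereas the full model improves on the baseline by 13.2 points, so the components together contribute more than the sum of their separate effects. EMA adds a further 1.3 points.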

5. Conclusion

We presented SE-SNN, a novel spiking neural network that combines Squeeze-and-Excitation attention with learnable neuron dynamics for event-based vision. By operating SE blocks on membrane potentials and parameterizing key neuron properties, our model achieves state-of-the-art results on CIFAR10-DVS while preserving the energy efficiency and temporal coding advantages of SNNs. The experimental results validate the effectiveness of the proposed SE-PLIF-SNN architecture. Future work includes extending this framework to larger datasets (e.g., DVS128 Gesture) and exploring hardware-aware deployment on neuromorphic chips like Loihi or TrueNorth.

Author Contributions

Conceptualization, C.L. and C.Y.; methodology, C.L.; software, C.Y.; validation, C.Y.; writing (original draft preparation), C.Y.; writing (review and editing), C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the 69th batch of general funding projects of the China Postdoctoral Science Foundation, China (Grant No. 2021M693858), the Technological Innovation Program for Young People of Shenyang City, China (Grant No. RC210400), and the Scientific Research Funding Project of the Education Department of Liaoning Province, China (Grant No. LJ212411035009).

Institutional Review Board Statement

Not applicable.

Acknowledgments

The authors would like to thank the editors and reviewers for providing useful comments and suggestions to improve the quality of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, C.; Song, W.; Chen, Q.; Dai, C.; Fu, Y.; Li, L. An Energy Efficient Residual Spiking Neural Network Accelerator With Ternary Spikes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2025, 44, 395–400. https://doi.org/10.1109/TCAD.2024.3443003.
  2. Xia, Q.; Yu, Y.; Chang, Z.; Hui, B.; Luo, H. CPT-SNN: A spiking neural network that can combine the previous timestep. Neurocomputing 2025, 640. https://doi.org/10.1016/j.neucom.2025.130253.
  3. Huang, Z.; Chang, Y.; Wu, W.; Zhao, C.; Luo, H.; He, S.; Guo, D. Modeling of Spiking Neural Network With Optimal Hidden Layer via Spatiotemporal Orthogonal Encoding for Patterns Recognition. IEEE Transactions on Emerging Topics in Computational Intelligence 2025, 9, 2194–2207. https://doi.org/10.1109/TETCI.2025.3537944.
  4. Zhang, H.; Wang, H.; An, J.; Zheng, S.; Wu, D. A lightweight spiking neural network for EEG-based motor imagery classification. Neural Networks 2025, 191. https://doi.org/10.1016/j.neunet.2025.107741.
  5. Saini, A.K.; Gehlot, N.; Kumar, R.; Hans, S.; Chaudhary, S.; Sharma, G. Spiking neural network-based energy-efficient framework for real-time robotic arm manipulation. Engineering Applications of Artificial Intelligence 2026, 167. https://doi.org/10.1016/j.engappai.2026.113805.
  6. Wang, Z.; Li, S.; Xuan, J.; Shi, T. Biologically inspired compound defect detection using a spiking neural network with continuous time-frequency gradients. Advanced Engineering Informatics 2025, 65. https://doi.org/10.1016/j.aei.2025.103132.
  7. Gatti, M.; Barbato, J.A.; Zandron, C. Spiking neural network classification of X-ray chest images. Knowledge-Based Systems 2025, 314. https://doi.org/10.1016/j.knosys.2025.113194.
  8. Song, Y.; Han, L.; Zhang, T.; Xu, B. Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decoding. Frontiers in Neuroscience 2025, 19. https://doi.org/10.3389/fnins.2025.1551656.
  9. Ramaswamy, R.K.; Rajendiran, A.; Devakanth, J.J.M.A.; Balan, S.K. SCSN-Net: Siamese convolutional spiking neural network for childhood medulloblastoma detection using microscopic images. Knowledge-Based Systems 2026, 337. https://doi.org/10.1016/j.knosys.2026.115357.
  10. Zuo, L.; Ding, Y.; Jing, M.; Yang, K.; Deng, H. Learning spatio-temporal consistency in spiking neural networks by self-distillation. Pattern Recognition 2026, 175. https://doi.org/10.1016/j.patcog.2026.113108.
  11. Choi, H.; Kim, J.; Park, J.; Park, S.; Jang, H.J.; Lee, S.H.; Ju, B.K.; Jeong, Y. STAR-SNN: A spatio-temporal adaptive recurrent spiking neural network with separated propagation surrogate gradient for hardware efficient real-time learning. Neurocomputing 2026, 674. https://doi.org/10.1016/j.neucom.2026.132968.
  12. Lu, Y.; Pan, Z.; Zhang, R.; Jia, Y.; Che, K.; Zhou, Z. Spatially-enhanced Spiking neural network for efficient point cloud analysis. Neural Networks 2026, 195. https://doi.org/10.1016/j.neunet.2025.108190.
  13. Gan, Y.; Dong, Y.; Guo, W.; Yan, C.; Zou, G. HASNN: Hierarchical attention spiking neural network for dynamic graph. Knowledge-Based Systems 2026, 339. https://doi.org/10.1016/j.knosys.2026.115541.
  14. Yan, J.; Liu, Q.; Zhang, M.; Feng, L.; Ma, D.; Li, H.; Pan, G. Efficient spiking neural network design via neural architecture search. Neural Networks 2024, 173. https://doi.org/10.1016/j.neunet.2024.106172.
  15. Yang, J.; Zhao, J. A novel parallel merge neural network with streams of spiking neural network and artificial neural network. Information Sciences 2023, 642. https://doi.org/10.1016/j.ins.2023.119034.
  16. Tang, J.; Li, D.; Zhang, Z.; Zeng, Z. NG-SNN: A neurogenesis-inspired dynamic adaptive framework for efficient spike classification. Neural Networks 2026, 199. https://doi.org/10.1016/j.neunet.2026.108656.
  17. Li, Y.; Zhao, F.; Zhao, D.; Zeng, Y. Directly training temporal Spiking Neural Network with sparse surrogate gradient. Neural Networks 2024, 179. https://doi.org/10.1016/j.neunet.2024.106499.
  18. Hong, D.; Qi, Y.; Wang, Y. Quantifying knowledge during full-layer ANN-to-SNN knowledge distillation. Pattern Recognition 2026, 175. https://doi.org/10.1016/j.patcog.2026.113066.
  19. Fan, H.; Zheng, H.; Wang, Z.; Mao, J.; Yin, H.; Guo, H.; Deng, L. Temporal local attention with adaptive decoding: Enhancing spiking neural networks for temporal computing applications. Neural Networks 2026, 198. https://doi.org/10.1016/j.neunet.2026.108558.
  20. Cheng, Y.C.; Hu, W.X.; He, Y.L.; Huang, J.Z. A comprehensive multimodal benchmark of neuromorphic training frameworks for spiking neural networks. Engineering Applications of Artificial Intelligence 2025, 159. https://doi.org/10.1016/j.engappai.2025.111543.
  21. Fang, W.; Chen, Y.; Ding, J.; Yu, Z.; Masquelier, T.; Chen, D.; Huang, L.; Zhou, H.; Li, G.; Tian, Y. SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence. Science Advances 2023, 9. https://doi.org/10.1126/sciadv.adi1480.
  22. Li, H.; Liu, H.; Ji, X.; Li, G.; Shi, L. CIFAR10-DVS: An Event-Stream Dataset for Object Classification. Frontiers in Neuroscience 2017, 11.
  23. Wang, J.; Lv, P.; Wang, H.; Shi, C. SAR-U-Net: Squeeze-and-excitation block and atrous spatial pyramid pooling based residual U-Net for automatic liver segmentation in Computed Tomography. Computer Methods and Programs in Biomedicine 2021, 208. https://doi.org/10.1016/j.cmpb.2021.106268.
  24. Eshraghian, J.K.; Ward, M.; Neftci, E.O.; et al. Training spiking neural networks using lessons from deep learning. Proceedings of the IEEE 2023, 111, 1016–1054.
  25. Gerstner, W.; Kistler, W.M.; Naud, R.; Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition; Cambridge University Press, 2014.
  26. Mostafa, H. Supervised learning based on temporal coding in spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems 2018, 29, 3227–3235.
  27. Thorpe, S.J.; Delorme, A.; Van Rullen, R. Spike rank order coding: A fast coding scheme for visual information processing. Neural Networks 2001, 14, 713–724.
  28. Kim, S.; Yu, J.J.; Yoon, H.; Park, B. Deep spiking neural network for image recognition based on phase coding. IEEE Access 2018, 6, 57512–57522.
  29. Su, Q.; Mei, S.; Xing, X.; Yao, M.; Zhang, J.; Xu, B.; Li, G. SNN-BERT: Training-efficient Spiking Neural Networks for energy-efficient BERT. Neural Networks 2024, 180, 106630. https://doi.org/10.1016/j.neunet.2024.106630.
  30. Pamu, P.; et al. Spiking Neural Networks in Imaging: A Review and Case Study. Sensors 2025, 25, 6747.
  31. Comsa, I.M.; Potempa, K.; Versari, L. Temporal Coding in Spiking Neural Networks With Alpha Synaptic Function: Learning With Backpropagation. IEEE Transactions on Neural Networks and Learning Systems 2022, 33, 5939–5952.
  32. Fang, W.; Yu, Z.; Chen, Y.; Huang, T.; Masquelier, T.; Tian, Y. Deep Residual Learning in Spiking Neural Networks. In Proceedings of the Neural Information Processing Systems, 2021.
  33. Hu, Y.; Deng, L.; Wu, Y.; Yao, M.; Li, G. Advancing Spiking Neural Networks Toward Deep Residual Learning. IEEE Transactions on Neural Networks and Learning Systems 2025, 36, 2353–2367. https://doi.org/10.1109/TNNLS.2024.3355393.
  34. Zheng, H.; Wu, Y.; Deng, L.; Hu, Y.; Li, G. Going Deeper With Directly-Trained Larger Spiking Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, Vol. 35, pp. 10680–10688.
  35. Yi, Z.; Lian, J.; Liu, Q.; Zhu, H.; Liang, D.; Liu, J. Learning rules in spiking neural networks: A survey. Neurocomputing 2023, 531, 163–179. https://doi.org/10.1016/j.neucom.2023.02.026.
  36. Maass, W.; Natschläger, T.; Markram, H. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation 2002, 14, 2531–2560.
  37. Zhang, Y.; Mo, L.; He, X.; Meng, X. Unsupervised spiking neural network based on liquid state machine and self-organizing map. Neurocomputing 2025, 620, 129120. https://doi.org/10.1016/j.neucom.2024.129120.
  38. Gao, S.; Fan, X.; Deng, X.; Hong, Z.; Zhou, H.; Zhu, Z. TE-Spikformer: Temporal-enhanced spiking neural network with transformer. Neurocomputing 2024, 602, 128268. https://doi.org/10.1016/j.neucom.2024.128268.
  39. Zhou, C.; Yu, L.; Zhou, Z.; Zhang, H.; Ma, Z.; Zhou, H.; Tian, Y. Spikingformer: A Key Foundation Model for Spiking Neural Networks. arXiv preprint arXiv:2304.11954 2024.
  40. Lu, C.; Du, H.; Wei, W.; Sun, Q.; Wang, Y.; Zeng, D.; Chen, W.; Zhang, M.; Yang, Y. ESTSformer: Efficient spatio-temporal spiking transformer. Neural Networks 2025, 191, 107786. https://doi.org/10.1016/j.neunet.2025.107786.
  41. Neftci, E.O.; Mostafa, H.; Zenke, F. Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-Based Optimization to Spiking Neural Networks. IEEE Signal Processing Magazine 2019, 36, 51–63. https://doi.org/10.1109/MSP.2019.2931595.
  42. Caporale, N.; Dan, Y. Spike-Timing-Dependent Plasticity: A Hebbian Learning Rule. Annual Review of Neuroscience 2008, 31, 25–46.
  43. Diehl, P.U.; Neil, D.; Binas, J.; Cook, M.; Liu, S.C.; Pfeiffer, M. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, Killarney, Ireland, 2015; pp. 1–8. https://doi.org/10.1109/IJCNN.2015.7280696.
  44. Rathi, N.; Srinivasan, G.; Panda, P.; Roy, K. Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation. 2020.
  45. Meng, Q.; Xiao, M.; Yan, S.; Wang, Y.; Lin, Z.; Luo, Z.Q. Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022, pp. 12444–12453. https://doi.org/10.1109/CVPR52688.2022.01210.
  46. Fang, W.; Yu, Z.; Chen, Y.; Masquelier, T.; Huang, T.; Tian, Y. Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2641–2651. https://doi.org/10.1109/ICCV48922.2021.00266.
  47. Wu, Y.; Deng, L.; Li, G.; Zhu, J.; Shi, L. Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks. Frontiers in Neuroscience 2018, 12, 331. https://doi.org/10.3389/fnins.2018.00331.
  48. Zhou, Z.; Zhu, Y.; He, C.; Wang, Y.; Yan, S.; Tian, Y.; Yuan, L. Spikformer: When Spiking Neural Network Meets Transformer. In Proceedings of the International Conference on Learning Representations (ICLR), 2023, [arXiv:cs.NE/2209.15425].
  49. Bellec, G.; Scherr, F.; Subramoney, A.; Hajek, E.; Salaj, D.; Legenstein, R.; Maass, W. A solution to the vanishing gradient problem for spiking recurrent neural networks. In Proceedings of the Advances in Neural Information Processing Systems, 2020, Vol. 33, pp. 1385–1396.
  50. Li, Y.; Zhao, F.; Zhao, D.; Zeng, Y. Directly training temporal Spiking Neural Network with sparse surrogate gradient. Neural Networks 2024, 179, 106499. https://doi.org/10.1016/j.neunet.2024.106499.
  51. Davies, M.; Wild, A. Advancing neuromorphic computing with spiking neural networks: A review and perspective. Proceedings of the IEEE 2021, 109, 846–879. https://doi.org/10.1109/JPROC.2021.3059794.
  52. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations (ICLR). OpenReview, 2017.
Figure 1. Taxonomy of SNN learning algorithms. The three main branches include: (1) Biologically-plausible unsupervised learning (e.g., STDP); (2) Indirect supervised learning via ANN-to-SNN conversion; and (3) Direct supervised learning using surrogate gradient methods.
Figure 2. Overall SE-SNN architecture for CIFAR10-DVS classification.
Figure 3. Evolution of learnable parameters across different layers during training. (a) Convergence of the time constant τ. (b) Convergence of the firing threshold v_th.
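To make the roles of τ and the firing threshold in Figure 3 concrete, the following is a minimal sketch of a discrete leaky integrate-and-fire update. It assumes the standard PLIF charge equation from [46] with a hard reset to zero; the RobustPLIF neuron used in this work may differ in details not reproduced here. The function names `lif_step` and `simulate` are hypothetical, and the defaults τ = 10.0 and v_th = 0.4 are the values listed for the stem PLIF layer in Table 3.

```python
# Illustrative sketch only: a plain-Python LIF neuron with fixed (not learned)
# tau and v_th. Assumes the PLIF charge rule of [46] and a hard reset.

def lif_step(v, x, tau=10.0, v_th=0.4, v_reset=0.0):
    """Advance the membrane potential by one time step.

    Returns the next membrane potential and a binary spike.
    """
    # Charge: leak toward v_reset and integrate the input, scaled by 1/tau.
    h = v + (x - (v - v_reset)) / tau
    # Fire when the charged potential reaches the threshold.
    spike = 1 if h >= v_th else 0
    # Hard reset after a spike, otherwise keep the charged potential.
    v_next = v_reset if spike else h
    return v_next, spike


def simulate(inputs, tau=10.0, v_th=0.4):
    """Run the neuron over a sequence of inputs and return the spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v, s = lif_step(v, x, tau=tau, v_th=v_th)
        spikes.append(s)
    return spikes


if __name__ == "__main__":
    constant_input = [1.0] * 16  # 16 time steps, as in the main experiment
    print(sum(simulate(constant_input, tau=10.0)))  # slower integration
    print(sum(simulate(constant_input, tau=2.0)))   # faster integration
```

Under a constant input, a smaller τ lets the potential reach the threshold in fewer steps, so the firing rate rises. Making τ and v_th learnable per layer lets training pick that trade-off instead of fixing it by hand.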
Figure 4. Average firing rates across different layers and timesteps. Lower layers exhibit higher activity due to rich edge information in event data.
Table 1. Comparison of Neural Encoding Schemes in SNNs.
| Encoding Scheme | Temporal Precision | Energy Efficiency | Noise Robustness | Primary Applications |
|---|---|---|---|---|
| Rate Coding [25] | Low | Low | High | Static image classification, ANN conversion |
| TTFS/Latency Coding [26] | High | High | Medium | Real-time processing, event-based vision |
| Rank-Order Coding [27] | High | High | High | Rapid categorization, olfaction |
| Phase Coding [28] | High | Medium | Medium | Navigation, temporal pattern recognition |
| ΣΔ Encoding [29] | Medium | High | High | Biomedical signals, wearable devices |
Table 2. Comparison of SNN Learning Algorithms.
| Learning Paradigm | Supervision | Latency | Accuracy | Biological Plausibility |
|---|---|---|---|---|
| STDP [42] | Unsupervised | Low | Low | High |
| ANN-to-SNN Conversion [43] | Supervised | Very High | Very High | Low |
| Surrogate Gradient (BPTT) [41] | Supervised | Low | High | Medium |
| Hybrid (Conversion + Fine-tuning) [44] | Supervised | Medium | Very High | Medium |
| Local Supervised Learning [45] | Supervised | Low | Medium | High |
Table 3. Detailed Architecture Configuration of SE-SNN.
| Stage | Layer | Configuration | Output Size | Stride | SE | Params |
|---|---|---|---|---|---|---|
| Input | - | DVS event frames | (N, T, 2, 128, 128) | - | - | - |
| Stem | Conv | 3 × 3, 64 | (N, T, 64, 128, 128) | 1 | No | 1,152 |
| | BN | - | - | - | - | 128 |
| | PLIF | τ = 10.0, v_th = 0.4 | - | - | - | 2 |
| | MaxPool | 2 × 2 | (N, T, 64, 64, 64) | 2 | - | - |
| Layer1 | SE-ResBlock | [3 × 3, 64] × 2 | (N, T, 64, 64, 64) | 1 | Yes | 74,240 |
| | SE-ResBlock | [3 × 3, 64] × 2 | (N, T, 64, 64, 64) | 1 | Yes | 74,240 |
| Layer2 | SE-ResBlock | [3 × 3, 128] × 2 | (N, T, 128, 32, 32) | 2 | Yes | 262,784 |
| | SE-ResBlock | [3 × 3, 128] × 2 | (N, T, 128, 32, 32) | 1 | Yes | 262,784 |
| Layer3 | SE-ResBlock | [3 × 3, 256] × 2 | (N, T, 256, 16, 16) | 2 | Yes | 1,049,088 |
| | SE-ResBlock | [3 × 3, 256] × 2 | (N, T, 256, 16, 16) | 1 | Yes | 1,049,088 |
| Layer4 | SE-ResBlock | [3 × 3, 512] × 2 | (N, T, 512, 8, 8) | 2 | Yes | 4,195,840 |
| | SE-ResBlock | [3 × 3, 512] × 2 | (N, T, 512, 8, 8) | 1 | Yes | 4,195,840 |
| Neck | AdaptiveAvgPool | (4, 4) | (N, T, 512, 4, 4) | - | - | - |
| | TemporalMax | max over T | (N, 512, 4, 4) | - | - | - |
| Head | Flatten | - | (N, 8192) | - | - | - |
| | Dropout | p = 0.5 | - | - | - | - |
| | FC + PLIF | 8192 → 1024 | (N, 1024) | - | - | 8,389,632 |
| | Dropout | p = 0.5 | - | - | - | - |
| | FC | 1024 → 10 | (N, 10) | - | - | 10,250 |
| Total Parameters | | | | | | 19,566,066 |
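As a rough consistency check on Table 3, the sketch below traces the spatial resolution through the stages and recomputes the parameter counts of the stem convolution and the two fully connected layers. It rests on assumptions not stated in the table: the stem convolution has no bias term, the fully connected layers do, and each stride-2 stage halves the height and width. The helper names `conv_params`, `linear_params`, and `trace_spatial` are hypothetical, and the SE-ResBlock counts are not re-derived here.

```python
# Illustrative sketch: shape and parameter arithmetic for selected rows of Table 3.
# Assumptions (not from the source): bias-free stem conv, biased FC layers,
# and exact halving of H and W at every stride-2 stage.

def conv_params(c_in, c_out, k, bias=False):
    """Weights of a k x k convolution, plus an optional bias per output channel."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def linear_params(n_in, n_out, bias=True):
    """Weights of a fully connected layer, plus an optional bias per output."""
    return n_in * n_out + (n_out if bias else 0)


def trace_spatial(size=128):
    """Follow the spatial side length from the stem to the neck."""
    sizes = {"stem": size}            # stride-1 conv keeps 128 x 128
    size //= 2                        # 2 x 2 max pooling with stride 2
    sizes["maxpool"] = size
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        size //= stride               # first block of each stage applies the stride
        sizes[name] = size
    sizes["neck"] = 4                 # adaptive average pooling to (4, 4)
    return sizes


sizes = trace_spatial()
flat_features = 512 * sizes["neck"] ** 2   # channels x 4 x 4 after TemporalMax

if __name__ == "__main__":
    print(sizes)
    print("flattened features:", flat_features)
    print("stem conv params:", conv_params(2, 64, 3))
    print("FC 8192 -> 1024 params:", linear_params(flat_features, 1024))
    print("FC 1024 -> 10 params:", linear_params(1024, 10))
```

Under these assumptions the recomputed values agree with the corresponding table entries: a flattened size of 8192, 1,152 parameters for the stem convolution, 8,389,632 for the first fully connected layer, and 10,250 for the classifier.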
Table 4. Comparison with state-of-the-art methods on CIFAR10-DVS.
| Method | Type | Architecture | Timestep | Accuracy (%) | Params (M) |
|---|---|---|---|---|---|
| STBP-tdBN [34] | Direct | ResNet-19 | 10 | 67.8 | 12.6 |
| PLIF [46] | Direct | PLIF-Net | 20 | 74.8 | 11.3 |
| SEW ResNet [32] | Direct | Wide-7B-Net | 16 | 74.4 | 15.8 |
| SE-PLIF-SNN (Ours) | Direct | SE-ResNet | 16 | 78.8 ± 0.2 | 19.6 |
| SE-PLIF-SNN (Ours) | Direct | SE-ResNet | 10 | 76.5 ± 0.3 | 19.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.