Submitted:
19 February 2024
Posted:
20 February 2024
Abstract
Keywords:
1. Introduction
- A high number of neurons in a compact implementation by making heavy use of pipelining and distributed memory, using 33% less FPGA space and power per neuron than current architectures while keeping simulation step times under 140.
- Support for a wide variety of neuron models by following a powerful design pattern, which also makes it an interesting choice to perform automatic mapping from network description languages [16] to FPGA-based SNN accelerators.
2. Background
2.1. Neuron Model Rules
- The internal state of a neuron is modelled by a set of differential equations (deterministic or stochastic, ordinary or partial) and continuously evolves with time.
- Input spikes received through the input synapses trigger changes in those state variables (spikes are binary, discrete events).
- Output spikes are generated when some condition is satisfied.
- Neuron dynamics: the set of differential equations stated as the rate of change of the neuron state, $\dot{\mathbf{s}} = f(\mathbf{s}')$, where $\mathbf{s}'$ is a vector holding the state of the neuron after the input synaptic processing.
- Equation solver: since the simulation is discrete in time, we need a numerical method to compute the next step of the neuron state. An example of this is the Forward Euler method, $\mathbf{s}[n+1] = \mathbf{s}'[n] + \Delta t \, f(\mathbf{s}'[n])$, where $\Delta t$ is the simulation step time. This method is accurate enough to showcase the LIF model.
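The equation-solver step above can be illustrated with a minimal Python sketch of Forward Euler applied to a leaky membrane. The function names, the time constant `tau`, and the step time `dt` are illustrative assumptions, not values taken from the paper.

```python
def euler_step(state, rate_of_change, dt):
    """One Forward Euler update: next = state + dt * f(state)."""
    return state + dt * rate_of_change(state)


def lif_rate(v, v_rest=0.0, tau=10e-3):
    """Leaky dynamics (assumed form): dv/dt = -(v - v_rest) / tau."""
    return -(v - v_rest) / tau


dt = 1e-3   # simulation step time in seconds (assumed)
v = 1.0     # initial membrane potential, arbitrary units (assumed)
trace = [v]
for _ in range(5):
    v = euler_step(v, lif_rate, dt)
    trace.append(v)

# With these values each step scales the potential by (1 - dt / tau) = 0.9,
# so the membrane decays geometrically toward its resting value.
print(trace)
```

With no input spikes the potential only leaks, so this isolates the solver behaviour from the synaptic processing.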
2.2. Clock-Driven vs. Event-Driven Simulations
2.3. Motivation and Previous Works
3. Fully-Connected Core
- In high-level SNN description languages, neurons are usually defined in groups or populations that are easily mapped into neuron cores. These can then be sized accordingly based on the number of neurons and synapses.
- All processing inside a neuron model is self-contained, meaning that the information inside each neuron does not depend on other neurons. This, together with the fact that simulation step times (0.1 ms to 10 ms) are usually orders of magnitude longer than typical FPGA clock periods, makes hardware pipelining an interesting choice.
- Since internal states and synaptic weights are accessed sequentially and independently, we can use distributed memory blocks, which are readily available on FPGAs.
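To make the timing argument concrete, the following sketch estimates how many clock cycles fit into one simulation step and how many neurons a single pipeline could then process sequentially. The 100 MHz clock, the pipeline depth of 8, and the one-neuron-per-cycle throughput are assumptions chosen for illustration only; they are not figures from the paper.

```python
def cycles_per_step(step_time_s, clock_hz):
    """Clock cycles available within one simulation step."""
    return round(step_time_s * clock_hz)


def max_neurons(cycles, pipeline_depth):
    """Neurons that fit in one step if the pipeline accepts one neuron
    per cycle (assumed) and needs pipeline_depth cycles to fill."""
    return max(0, cycles - pipeline_depth + 1)


clock_hz = 100e6  # assumed FPGA clock frequency
depth = 8         # assumed pipeline depth

for step_time in (0.1e-3, 1e-3, 10e-3):
    cycles = cycles_per_step(step_time, clock_hz)
    print(step_time, cycles, max_neurons(cycles, depth))
```

Even at the shortest step time in the quoted range, thousands of cycles are available, which is why time-multiplexing many neurons over one datapath is attractive.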
3.1. Architecture
3.2. LIF Core Implementation
- Synaptic processing: the core implements the synaptic model of Equation (2), where static weights are used for training and inference. To simplify the implementation, weights cannot be updated on the fly and the simulation relies on offline training. The virtual synapse implements the LIF state reset, taking the previous membrane voltage and resetting it to its rest potential $v_{rest}$ if the neuron has spiked in the previous time step. After that, the following synaptic stages take that updated potential and sequentially process the input spikes.
- Neuron dynamics: the state equations are hard-coded into the core definition, although they can be modified before synthesis if the model needs to be changed. The rate of change for this simplified LIF model is a linear function of the difference between the membrane voltage and its resting potential, $f(v') = -k\,(v' - v_{rest})$, where $v'$ is the membrane potential after the synaptic processing and $k$ is a parameter that determines how fast the membrane goes back to its resting potential. This coefficient is a power of two to implement the multiplication as a bit-shift, but the core can incorporate multipliers if the model needs more generic parameters. The equation solver simply adds this value to the previous state, assimilating the value of the step time into $k$.
- Spike output: the spike output condition function of Equation (5) is implemented in this case as a simple threshold which is compared against the state variable, $v \geq v_{th}$, where $v_{th}$ is an arbitrary constant threshold potential.
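The three stages above can be sketched as a small integer-only software model. This is a behavioural illustration under stated assumptions, not the paper's hardware description: the function name `lif_core_step`, the parameter values, and the choice to compare the threshold against the state after the solver update are all hypothetical.

```python
def lif_core_step(v_prev, spiked_prev, in_spikes, weights,
                  v_rest=0, leak_shift=3, v_th=64):
    """One simulation step of a simplified integer LIF neuron.

    leak_shift encodes a power-of-two leak coefficient 2**-leak_shift,
    so the multiplication becomes an arithmetic right shift (assumed).
    """
    # Virtual synapse: reset to rest if the neuron spiked last step.
    v = v_rest if spiked_prev else v_prev

    # Synaptic stages: add the static weight of every active input.
    for spike, w in zip(in_spikes, weights):
        if spike:
            v += w

    # Neuron dynamics: linear leak toward the resting potential.
    dv = -((v - v_rest) >> leak_shift)

    # Equation solver: add the rate of change to the state
    # (the step time is assumed to be folded into the leak coefficient).
    v_next = v + dv

    # Spike output: simple threshold comparison (assumed to use v_next).
    return v_next, v_next >= v_th


weights = [40, 10, 40]  # hypothetical static weights
v, spiked = lif_core_step(0, False, [1, 0, 1], weights)
print(v, spiked)
v, spiked = lif_core_step(v, spiked, [0, 1, 0], weights)
print(v, spiked)
```

In the first call two inputs push the potential over the threshold; in the second call the reset is applied before the single incoming spike is accumulated.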
3.3. Latency
3.4. Quantization and Memory
4. Convolutional Core
- Removing all unnecessary zeros: every output neuron only processes information about a small region of the input space. That means that the weights connecting it to the rest of the input are zero and can be ignored. A fully-connected layer would need one weight for every pair of input and output neurons, while a layer with no zero-valued weights would need only one weight per kernel element for every output neuron.
- Storing kernel weights only once: all neurons that apply the same kernel share its weights, so the number of stored synaptic weights drops again to the size of the kernel.
- Emulating kernel stride/movement: by carefully arranging the kernel weights and getting the input information in a certain order, we can take advantage of the input synaptic pipeline to "move" the kernel around. This is done all while minimizing the amount of intermediate storage needed inside the pipeline.
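The savings from the first two points can be checked with a short calculation. The sketch below assumes a 28x28 single-channel input, 5x5 kernels, stride 1, and no padding for the first layer; these values are not stated in the text and are inferred from the layer sizes in the table at the end of the document. The function name is hypothetical.

```python
def conv_weight_counts(in_h, in_w, kernel, n_kernels, in_channels=1):
    """Weight counts for a 2D convolution (stride 1, no padding assumed)."""
    out_h = in_h - kernel + 1
    out_w = in_w - kernel + 1
    neurons = out_h * out_w * n_kernels
    inputs = in_h * in_w * in_channels
    fan_in = kernel * kernel * in_channels
    return {
        "neurons": neurons,
        "fully_connected": neurons * inputs,  # one weight per input-output pair
        "nonzero_only": neurons * fan_in,     # zeros removed
        "shared_kernels": n_kernels * fan_in  # each kernel stored once
    }


first = conv_weight_counts(28, 28, kernel=5, n_kernels=6)
second = conv_weight_counts(12, 12, kernel=5, n_kernels=16, in_channels=6)
print(first)
print(second)
```

Under these assumptions the `nonzero_only` and `shared_kernels` values reproduce the Synapses and Weights rows of the layer table for both convolutional layers (86 400 and 150, then 153 600 and 2 400).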
4.1. 2D Convolution
4.2. Single-Input Single-Kernel 2D Convolution
4.2.1. Pipelining
4.2.2. Control and Padding
4.3. Multiple-Input Single-Kernel 2D Convolution
3D Line Buffer
4.4. Multiple-Input Multiple-Kernel 2D Convolution
5. Network
6. Results
6.1. Latency and Minimum Timestep
6.2. Benchmarking

7. Conclusions
- Takes advantage of pipelining and neuron data independence to accelerate simulations and reduce hardware usage, using around 33% fewer slices per neuron than the best current implementation.
- Achieves very low-power operation, using nearly 4 times less power per neuron than the most efficient state-of-the-art accelerator.
- Manages to keep simulation step times low, which contributes to making more accurate simulations and leaves room for bigger networks.
- Makes it easy to map high-level descriptions of SNNs into hardware.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Networks 1997, 10, 1659–1671.
- Kasabov, N.K. Time-Space, Spiking Neural Networks and Brain-Inspired Artificial Intelligence; Springer Berlin Heidelberg, 2019.
- Yamazaki, K.; Vo-Ho, V.K.; Bulsara, D.; Le, N. Spiking Neural Networks and Their Applications: A Review. Brain Sciences 2022, 12, 863.
- Frenkel, C.; Bol, D.; Indiveri, G. Bottom-Up and Top-Down Neural Processing Systems Design: Neuromorphic Intelligence as the Convergence of Natural and Artificial Intelligence, 2021. arXiv:2106.01288.
- Bogdan, P.A.; Marcinnò, B.; Casellato, C.; Casali, S.; Rowley, A.G.; Hopkins, M.; Leporati, F.; D’Angelo, E.; Rhodes, O. Towards a Bio-Inspired Real-Time Neuromorphic Cerebellum. Frontiers in Cellular Neuroscience 2021, 15.
- Pei, J.; Deng, L.; Song, S.; Zhao, M.; Zhang, Y.; Wu, S.; Wang, G.; Zou, Z.; Wu, Z.; He, W.; Chen, F.; Deng, N.; Wu, S.; Wang, Y.; Wu, Y.; Yang, Z.; Ma, C.; Li, G.; Han, W.; Li, H.; Wu, H.; Zhao, R.; Xie, Y.; Shi, L. Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 2019, 572, 106–111.
- Indiveri, G. Computation in Neuromorphic Analog VLSI Systems. In Perspectives in Neural Computing; Springer London, 2002; pp. 3–20.
- Davidson, S.; Furber, S.B. Comparison of Artificial and Spiking Neural Networks on Digital Hardware. Frontiers in Neuroscience 2021, 15.
- Basu, A.; Deng, L.; Frenkel, C.; Zhang, X. Spiking Neural Network Integrated Circuits: A Review of Trends and Future Directions. 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 1–8.
- Golosio, B.; Tiddia, G.; Luca, C.D.; Pastorelli, E.; Simula, F.; Paolucci, P.S. Fast Simulations of Highly-Connected Spiking Cortical Models Using GPUs. Frontiers in Computational Neuroscience 2021, 15.
- Furber, S.B.; Lester, D.R.; Plana, L.A.; Garside, J.D.; Painkras, E.; Temple, S.; Brown, A.D. Overview of the SpiNNaker System Architecture. IEEE Transactions on Computers 2013, 62, 2454–2467.
- Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.J.; Taba, B.; Beakes, M.; Brezzo, B.; Kuang, J.B.; Manohar, R.; Risk, W.P.; Jackson, B.; Modha, D.S. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2015, 34, 1537–1557.
- Davies, M.; Srinivasa, N.; Lin, T.H.; Chinya, G.; Cao, Y.; Choday, S.H.; Dimou, G.; Joshi, P.; Imam, N.; Jain, S.; Liao, Y.; Lin, C.K.; Lines, A.; Liu, R.; Mathaikutty, D.; McCoy, S.; Paul, A.; Tse, J.; Venkataramanan, G.; Weng, Y.H.; Wild, A.; Yang, Y.; Wang, H. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro 2018, 38, 82–99.
- Deng, L.; Wang, G.; Li, G.; Li, S.; Liang, L.; Zhu, M.; Wu, Y.; Yang, Z.; Zou, Z.; Pei, J.; Wu, Z.; Hu, X.; Ding, Y.; He, W.; Xie, Y.; Shi, L. Tianjic: A Unified and Scalable Chip Bridging Spike-Based and Continuous Neural Computation. IEEE Journal of Solid-State Circuits 2020, 55, 2228–2246.
- Pham, Q.T.; Nguyen, T.Q.; Hoang, P.C.; Dang, Q.H.; Nguyen, D.M.; Nguyen, H.H. A review of SNN implementation on FPGA. 2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR). IEEE, 2021.
- Davison, A.; Brüderle, D.; Eppler, J.; Kremkow, J.; Muller, E.; Pecevski, D.; Perrinet, L.; Yger, P. PyNN: a common interface for neuronal network simulators. Frontiers in Neuroinformatics 2009, 2.
- Brette, R.; others. Simulation of networks of spiking neurons: A review of tools and strategies. Journal of Computational Neuroscience 2007, 23, 349–398.
- Neil, D.; Liu, S.C. Minitaur, an Event-Driven FPGA-Based Spiking Network Accelerator. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2014, 22, 2621–2628.
- Mo, L.; Tao, Z. EvtSNN: Event-driven SNN simulator optimized by population and pre-filtering. Frontiers in Neuroscience 2022, 16.
- Ma, D.; Shen, J.; Gu, Z.; Zhang, M.; Zhu, X.; Xu, X.; Xu, Q.; Shen, Y.; Pan, G. Darwin: A neuromorphic hardware co-processor based on spiking neural networks. Journal of Systems Architecture 2017, 77, 43–51.
- Gupta, S.; Vyas, A.; Trivedi, G. FPGA Implementation of Simplified Spiking Neural Network. 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2020.
- Wang, Q.; Li, Y.; Shao, B.; Dey, S.; Li, P. Energy efficient parallel neuromorphic architectures with approximate arithmetic on FPGA. Neurocomputing 2017, 221, 146–158.
- Liu, Y.; Chen, Y.; Ye, W.; Gui, Y. FPGA-NHAP: A General FPGA-Based Neuromorphic Hardware Acceleration Platform With High Speed and Low Power. IEEE Transactions on Circuits and Systems I: Regular Papers 2022, 69, 2553–2566.
- Gerlinghoff, D.; Wang, Z.; Gu, X.; Goh, R.S.M.; Luo, T. E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks With Emerging Neural Encoding on FPGAs. IEEE Transactions on Parallel and Distributed Systems 2022, 33, 3207–3219.
- Carpegna, A.; Savino, A.; Carlo, S.D. Spiker: an FPGA-optimized Hardware accelerator for Spiking Neural Networks. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2022.
- Rueckauer, B.; Lungu, I.A.; Hu, Y.; Pfeiffer, M.; Liu, S.C. Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification. Frontiers in Neuroscience 2017, 11.

| | 2D conv. (24x24x6) | Pooling (12x12x6) | 2D conv. (8x8x16) | Pooling (4x4x16) | Dense (120) | Dense (84) | Dense (10) | Total |
|---|---|---|---|---|---|---|---|---|
| Neurons | 3 456 | 864 | 1 024 | 256 | 120 | 84 | 10 | 5 814 |
| Synapses | 86 400 | 3 456 | 153 600 | 1 024 | 30 720 | 10 080 | 840 | 286 120 |
| Weights | 150 | 4 | 2 400 | 4 | 30 720 | 10 080 | 840 | 44 198 |

| | Slice LUTs | Slice Registers | Slices | LUT as Logic | LUT as Memory | Block RAM |
|---|---|---|---|---|---|---|
| Input | 21 | 38 | 13 | 21 | 0 | 0 |
| 2D conv. (24x24x6) | 564 | 709 | 184 | 547 | 17 | 2.5 |
| Pooling (12x12x6) | 98 | 129 | 51 | 87 | 11 | 1 |
| 2D conv. (8x8x16) | 3420 | 3984 | 1044 | 2973 | 447 | 0 |
| Pooling (4x4x16) | 81 | 117 | 42 | 67 | 14 | 0.5 |
| Dense (120) | 5890 | 7184 | 1986 | 5878 | 12 | 68.5 |
| Dense (84) | 2128 | 3062 | 789 | 2118 | 10 | 60.5 |
| Dense (10) | 2051 | 2783 | 654 | 2033 | 18 | 0 |
| Spike Memory | 15 | 4 | 6 | 1 | 14 | 2 |
| Total | 14266 | 18010 | 4744 | 13723 | 543 | 135 |
| Total (%) | 22.5 % | 14.2 % | 29.9 % | 21.7 % | 2.9 % | 100 % |
| Available | 63400 | 126800 | 15850 | 63400 | 19000 | 135 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).