Submitted: 06 March 2025
Posted: 07 March 2025
Abstract
Keywords:
1. Introduction
- 1. An improved trade-off between classification accuracy and complexity. The proposed KWS system uses the same analog feature extraction configuration as [9] and achieves higher accuracy: 92.33% with the full-precision model and 91.35% with the quantized model, compared with 91.00% in [9]. Classification is performed by a simple GRU-based structure with only 80 neurons per layer, compared with the 128 neurons required in [9].
- 2. Use of Learned Step Size Quantization (LSQ) and Look-Up Table (LUT)-aware quantization for highly efficient computation, unlike previous works that rely on post-training or fixed-point quantization. This approach adapts the quantization parameters during training, reducing power and memory consumption while maintaining high recognition accuracy. The proposed model uses 4-bit weights and 8-bit activations, significantly reducing computational complexity compared with prior works that rely on 8-bit weights and activations of 12 bits or more [11,14].
2. Description of the Proposed KWS Architecture
2.1. Analog Filter Bank Feature Extractor
2.2. GRU-Based Classifier
3. Learned Step Size Quantized GRU Classifier
3.1. Background
3.2. Methodology
Algorithm 1: Quantization for Sigmoid and Tanh with LUT Approximation
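The general idea of replacing sigmoid and tanh with look-up tables can be illustrated with a minimal sketch, assuming 8-bit unsigned input and output codes with affine (scale, zero-point) quantization: every possible input code is dequantized, passed through the floating-point activation once offline, and requantized, so that at inference each nonlinearity reduces to one table lookup. The helper name `build_activation_lut` and all scale and zero-point values are hypothetical and are not taken from the paper's Algorithm 1.

```python
import math


def build_activation_lut(fn, in_scale, in_zero, out_scale, out_zero, bits=8):
    """Hypothetical helper: map every unsigned input code to an output code.

    Assumes affine quantization x = (q - zero) * scale on both sides.
    """
    qmax = 2 ** bits - 1
    lut = []
    for q in range(qmax + 1):
        x = (q - in_zero) * in_scale              # dequantize the input code
        y = fn(x)                                 # evaluate the float activation once, offline
        q_out = round(y / out_scale) + out_zero   # requantize (Python rounds halves to even)
        lut.append(min(max(q_out, 0), qmax))      # clamp to [0, 2^bits - 1]
    return lut


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed grids: inputs cover [-8, 8) in steps of 1/16; sigmoid outputs cover [0, 1].
SIGMOID_LUT = build_activation_lut(sigmoid, 1 / 16, 128, 1 / 255, 0)
# tanh outputs cover [-1, 1], with the output zero point in the middle of the code range.
TANH_LUT = build_activation_lut(math.tanh, 1 / 16, 128, 1 / 127, 128)

if __name__ == "__main__":
    q_in = 144  # input code for x = (144 - 128) / 16 = 1.0
    print(SIGMOID_LUT[q_in], TANH_LUT[q_in])
```

With 8-bit codes each table holds only 256 entries, which is why a lookup can be cheaper in hardware than evaluating the exponential; the ranges chosen for the input grid determine how much of the saturating tails is represented.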
Algorithm 2: Learned Step Size Quantization (LSQ) with Gradient Scaling
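Learned step size quantization as introduced by Esser et al. [25] can be sketched without a deep-learning framework, as a hedged illustration rather than the authors' implementation: the forward pass clips and rounds `v / s`, rounding is bypassed with the straight-through estimator [28], and the gradient of the step size `s` is multiplied by `g = 1 / sqrt(N * Q_P)`, where `N` is the number of quantized elements. The function names and example values below are hypothetical; a training framework would normally express the same rule through autograd instead of hand-written gradients.

```python
import math


def lsq_forward(v, s, qn, qp):
    """Fake-quantize one value: v_hat = round(clip(v / s, -qn, qp)) * s."""
    x = min(max(v / s, -qn), qp)
    return round(x) * s


def lsq_local_grads(v, s, qn, qp):
    """Return (d v_hat / d v, d v_hat / d s) for one value.

    Rounding is bypassed with the straight-through estimator, so d v_hat / d v
    is 1 inside the clipping range and 0 outside it.
    """
    x = v / s
    if x <= -qn:
        return 0.0, float(-qn)
    if x >= qp:
        return 0.0, float(qp)
    return 1.0, round(x) - x


def lsq_backward(values, grad_out, s, qn, qp):
    """Back-propagate upstream gradients; the step-size gradient is scaled by g = 1/sqrt(N * qp)."""
    g = 1.0 / math.sqrt(len(values) * qp)
    grad_v, grad_s = [], 0.0
    for v, go in zip(values, grad_out):
        dv, ds = lsq_local_grads(v, s, qn, qp)
        grad_v.append(go * dv)
        grad_s += go * ds
    return grad_v, g * grad_s


if __name__ == "__main__":
    bits = 4
    qn, qp = 2 ** (bits - 1), 2 ** (bits - 1) - 1  # signed 4-bit weights: codes -8..7
    w = [0.26, 1.0, -0.03]                          # hypothetical weights
    s = 0.1                                         # hypothetical step size
    print([lsq_forward(v, s, qn, qp) for v in w])
    print(lsq_backward(w, [1.0, 1.0, 1.0], s, qn, qp))
```

The gradient scale keeps the step-size update on a magnitude comparable to the weight updates, which is the motivation given for it in [25]; for unsigned 8-bit activations the same code applies with `qn = 0` and `qp = 255`.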
4. Training, Performance Results and Comparison
4.1. Data Preparation
4.2. Training
- Number of epochs: 20 to 150;
- Batch size: 16 to 256;
- Learning rate: 1e-5 to 0.1;
- Weight decay: 1e-6 to 0.01;
- GRU dropout: 0.0 to 0.5;
- Learning rate scheduler decay factor: 0.1 to 0.9.
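The search ranges listed above can be written down as a small configuration table. The sketch below is illustrative only: it draws configurations by plain random search, with log-uniform sampling assumed for the learning rate and weight decay, whereas a TPE sampler such as the one in Optuna [30] proposes new values based on the results of earlier trials. Restricting the batch size to powers of two, and all names in the code, are hypothetical choices.

```python
import math
import random

# Search ranges from the list above: (low, high) bounds per hyperparameter.
SEARCH_SPACE = {
    "epochs": (20, 150),
    "batch_size": (16, 256),
    "learning_rate": (1e-5, 0.1),
    "weight_decay": (1e-6, 0.01),
    "gru_dropout": (0.0, 0.5),
    "lr_decay_factor": (0.1, 0.9),
}


def log_uniform(rng, low, high):
    """Sample so that every decade between low and high is equally likely."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    return min(max(value, low), high)  # guard against floating-point overshoot at the ends


def sample_config(rng):
    """Draw one random configuration (plain random search, not TPE)."""
    lo_b, hi_b = SEARCH_SPACE["batch_size"]
    # Assumption: batch sizes restricted to powers of two inside the range.
    batch_choices = [2 ** k for k in range(17) if lo_b <= 2 ** k <= hi_b]
    return {
        "epochs": rng.randint(*SEARCH_SPACE["epochs"]),
        "batch_size": rng.choice(batch_choices),
        "learning_rate": log_uniform(rng, *SEARCH_SPACE["learning_rate"]),
        "weight_decay": log_uniform(rng, *SEARCH_SPACE["weight_decay"]),
        "gru_dropout": rng.uniform(*SEARCH_SPACE["gru_dropout"]),
        "lr_decay_factor": rng.uniform(*SEARCH_SPACE["lr_decay_factor"]),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_config(rng))
```

Log-uniform sampling is a common choice for parameters such as the learning rate and weight decay, whose plausible values span several orders of magnitude.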
Algorithm 3: Initialization of Activation Quantization Step Size and Zero Point
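One common way to obtain an initial activation step size and zero point, shown here as a sketch that may differ from the authors' Algorithm 3, is to take the minimum and maximum of the activations observed on a calibration batch, widen that range so that 0.0 is exactly representable (as in the integer-arithmetic scheme of Jacob et al. [22]), and derive an unsigned 8-bit affine grid from it. The function names and calibration values are hypothetical.

```python
def init_activation_qparams(samples, bits=8):
    """Hypothetical min/max initialization of an unsigned (asymmetric) activation quantizer.

    Returns (scale, zero_point) such that x ~ (q - zero_point) * scale for q in [0, 2^bits - 1].
    """
    qmax = 2 ** bits - 1
    lo = min(min(samples), 0.0)  # widen the range so that 0.0 is exactly representable
    hi = max(max(samples), 0.0)
    if hi == lo:                 # all-zero calibration batch: fall back to a unit step
        return 1.0, 0
    scale = (hi - lo) / qmax
    zero_point = min(max(round(-lo / scale), 0), qmax)
    return scale, zero_point


def quantize(x, scale, zero_point, bits=8):
    """Map a real activation to its unsigned integer code."""
    q = round(x / scale) + zero_point
    return min(max(q, 0), 2 ** bits - 1)


if __name__ == "__main__":
    calib = [-1.0, 0.5, 2.0, 3.1]  # hypothetical calibration activations
    s, z = init_activation_qparams(calib)
    print(s, z, [quantize(x, s, z) for x in calib])
```

Because the zero point is an integer code, a real value of 0.0 maps exactly to `zero_point`, which avoids a systematic bias for zero-valued inputs and padding.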
Algorithm 4: Initialization of Weight Quantization Step Size
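For weights, the original LSQ paper [25] initializes each step size from the mean magnitude of the layer's weights, `s0 = 2 * mean(|w|) / sqrt(Q_P)`, and LSQ+ [26] proposes a statistics-based alternative. The sketch below implements the LSQ rule for signed 4-bit weights; whether Algorithm 4 uses exactly this rule is an assumption, and the function name is hypothetical.

```python
import math


def init_weight_step_size(weights, bits=4):
    """LSQ initialization (Esser et al.): s0 = 2 * mean(|w|) / sqrt(Q_P)."""
    qp = 2 ** (bits - 1) - 1  # 7 for signed 4-bit weights
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    return 2.0 * mean_abs / math.sqrt(qp)


if __name__ == "__main__":
    w = [0.1, -0.3, 0.2, -0.2]  # hypothetical weight values
    print(init_weight_step_size(w))
```

In practice the function would be called once per weight tensor (for example, each GRU gate matrix) before quantization-aware training starts, after which the step size is learned together with the weights.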
4.3. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LSQ | Learned step size quantization |
| GRU | Gated recurrent unit |
| KWS | Keyword spotting |
| GSCDv2 | Google Speech Commands dataset v2 |
| MAC | Multiply-accumulate |
| AI | Artificial intelligence |
| ASR | Automatic speech recognition |
| VAD | Voice activity detection |
| AFE | Analog front-end |
| LUT | Look-up table |
| FC | Fully connected |
| RNN | Recurrent neural network |
| LSTM | Long short-term memory |
| CNN | Convolutional neural network |
| CRNN | Convolutional recurrent neural network |
| DS-CNN | Depth-wise separable convolutional neural network |
| PTQ | Post-training quantization |
| QAT | Quantization-aware training |
| FQN | Fake quantization node |
| STE | Straight-through estimator |
| TPE | Tree-structured Parzen estimator |
| TPR | True positive rate |
| FEx | Feature extractor |
References
- Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 2016, 3, 637–646.
- Yu, C.H.; Kim, H.E.; Shin, S.; Bong, K.; Kim, H.; Boo, Y.; Bae, J.; Kwon, M.; Charfi, K.; Kim, J.; et al. 2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), 2024; Volume 67, pp. 42–44.
- Yang, M.; Yeh, C.H.; Zhou, Y.; Cerqueira, J.P.; Lazar, A.A.; Seok, M. Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog Feature Extraction. IEEE Journal of Solid-State Circuits 2019, 54, 1764–1777.
- Shen, Y.; Straeussnigg, D.; Gutierrez, E. Towards Ultra-Low Power Consumption VAD Architectures with Mixed Signal Circuits. In Proceedings of the 2023 IEEE International Symposium on Circuits and Systems (ISCAS), 2023; pp. 1–5.
- Shan, W.; Yang, M.; Wang, T.; Lu, Y.; Cai, H.; Zhu, L.; Xu, J.; Wu, C.; Shi, L.; Yang, J. A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS. IEEE Journal of Solid-State Circuits 2021, 56, 151–164.
- Kawada, A.; Kobayashi, K.; Shin, J.; Sumikawa, R.; Hamada, M.; Kosuge, A. A 250.3μW Versatile Sound Feature Extractor Using 1024-point FFT 64-ch LogMel Filter in 40nm CMOS. In Proceedings of the 2024 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), November 2024; pp. 183–187.
- Gutierrez, E.; Perez, C.; Hernandez, F.; Hernandez, L. Time-Encoding-Based Ultra-Low Power Features Extraction Circuit for Speech Recognition Tasks. Electronics 2020, 9.
- Shen, Y.; Perez, C.; Straeussnigg, D.; Gutierrez, E. Time-Encoded Mostly Digital Feature Extraction for Voice Activity Detection Tasks. In Proceedings of the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024; pp. 1–5.
- Mostafa, A.; Hardy, E.; Badets, F. 17.8 0.4V 988nW Time-Domain Audio Feature Extraction for Keyword Spotting Using Injection-Locked Oscillators. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC), 2024; Volume 67, pp. 328–330.
- Croce, M.; Friend, B.; Nesta, F.; Crespi, L.; Malcovati, P.; Baschirotto, A. A 760-nW, 180-nm CMOS Fully Analog Voice Activity Detection System for Domestic Environment. IEEE Journal of Solid-State Circuits 2021, 56, 778–787.
- Kim, K.; Gao, C.; Graça, R.; Kiselev, I.; Yoo, H.J.; Delbruck, T.; Liu, S.C. A 23-μW Keyword Spotting IC With Ring-Oscillator-Based Time-Domain Feature Extraction. IEEE Journal of Solid-State Circuits 2022, 57, 3298–3311.
- Narayanan, S.; Cartiglia, M.; Rubino, A.; Lego, C.; Frenkel, C.; Indiveri, G. SPAIC: A sub-μW/Channel, 16-Channel General-Purpose Event-Based Analog Front-End with Dual-Mode Encoders. In Proceedings of the 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS), October 2023; pp. 1–5.
- Chen, Q.; Kim, K.; Gao, C.; Zhou, S.; Jang, T.; Delbruck, T.; Liu, S.C. DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM. IEEE Transactions on Circuits and Systems for Artificial Intelligence 2024, 1–9.
- Yang, H.; Seol, J.H.; Rothe, R.; Fan, Z.; Zhang, Q.; Kim, H.S.; Blaauw, D.; Sylvester, D. A 1.5-μW Fully-Integrated Keyword Spotting SoC in 28-nm CMOS With Skip-RNN and Fast-Settling Analog Frontend for Adaptive Frame Skipping. IEEE Journal of Solid-State Circuits 2023, 59, 29–39.
- Zhang, Y.; Suda, N.; Lai, L.; Chandra, V. Hello Edge: Keyword Spotting on Microcontrollers, 2018. arXiv:cs.SD/1711.07128.
- Bartels, J.; Hagihara, A.; Minati, L.; Tokgoz, K.K.; Ito, H. An Integer-Only Resource-Minimized RNN on FPGA for Low-Frequency Sensors in Edge-AI. IEEE Sensors Journal 2023, 23, 17784–17793.
- Campos, V.; Jou, B.; Giró-i-Nieto, X.; Torres, J.; Chang, S.F. Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks, 2018. arXiv:cs.AI/1708.06834.
- Gao, C.; Neil, D.; Ceolini, E.; Liu, S.C.; Delbruck, T. DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’18), New York, NY, USA, 2018; pp. 21–30.
- Kusupati, A.; Singh, M.; Bhatia, K.; Kumar, A.; Jain, P.; Varma, M. FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network, 2019. arXiv:cs.LG/1901.02358.
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces, 2024. arXiv:cs.LG/2312.00752.
- Deng, L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proceedings of the IEEE 2020, 108, 485–532.
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017. arXiv:cs.LG/1712.05877.
- Bartels, J.; Tokgoz, K.K.; A, S.; Fukawa, M.; Otsubo, S.; Li, C.; Rachi, I.; Takeda, K.I.; Minati, L.; Ito, H. TinyCowNet: Memory- and Power-Minimized RNNs Implementable on Tiny Edge Devices for Lifelong Cow Behavior Distribution Estimation. IEEE Access 2022, 10, 32706–32727.
- Sari, E.; Courville, V.; Nia, V.P. iRNN: Integer-only Recurrent Neural Network, 2022. arXiv:cs.LG/2109.09828.
- Esser, S.K.; McKinstry, J.L.; Bablani, D.; Appuswamy, R.; Modha, D.S. Learned Step Size Quantization, 2020. arXiv:cs.LG/1902.08153.
- Bhalgat, Y.; Lee, J.; Nagel, M.; Blankevoort, T.; Kwak, N. LSQ+: Improving low-bit quantization through learnable offsets and better initialization, 2020. arXiv:cs.CV/2004.09576.
- PyTorch. GRU — PyTorch 2.6.0 Documentation. Available online: https://pytorch.org/docs/stable/generated/torch.nn.GRU.html (accessed on 12 February 2025).
- Bengio, Y.; Léonard, N.; Courville, A. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013. arXiv:cs.LG/1308.3432.
- Warden, P. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition, 2018. arXiv:cs.CL/1804.03209.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization, 2019. arXiv:cs.LG/1711.05101.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015. arXiv:cs.CV/1502.01852.
- Rybakov, O.; Kononenko, N.; Subrahmanya, N.; Visontai, M.; Laurenzo, S. Streaming Keyword Spotting on Mobile Devices. In Proceedings of Interspeech 2020, ISCA, 2020.
- Villamizar, D.A.; Muratore, D.G.; Wieser, J.B.; Murmann, B. An 800 nW Switched-Capacitor Feature Extraction Filterbank for Sound Classification. IEEE Transactions on Circuits and Systems I: Regular Papers 2021, 68, 1578–1588.
| | Villamizar TCASI’21 [34] | Kim JSSC’22 [11] | Yang JSSC’23 [14] | Mostafa ISSCC’24 [9] | Chen TCASAI’24 [13] | This work |
|---|---|---|---|---|---|---|
| Feature extractor | Analog | Analog | Digital | Analog | Digital | Analog¹ |
| # Channels | 32 | 16 | 26 | 16 | 10 | 16 |
| Classifier | Li-GRU² | GRU | Skip RNN | GRU² | Delta RNN | GRU³ |
| # RNN layers | - | 2 | 1 | 2 | 1 | 2 |
| Units / layer | - | 48 | 64 | 128 | 64 | 80 |
| NN quantization (weights, activations) | - | 8b, 14b | 8b, 12b | - | 8b, - | 4b, 8b |
| NN memory (kB) | - | 24 | 18 | - | 24 | 34.84 |
| Dataset | GSCDv2 | GSCDv2 | GSCDv1 | GSCDv2 | GSCDv2 | GSCDv2 |
| # Classes (# KWs) | 12 (10) | 12 (10) | 7 (5) | 10 (10) | 12 (10) | 12 (10) |
| Accuracy (%) | 92.10 | 86.03 | 92.80 | 91.00 | 89.50 | 91.35 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).



