Submitted: 26 May 2025
Posted: 27 May 2025
Abstract
Keywords:
1. Introduction
2. Gesture Recognition with Convolutional Neural Networks

3. HW/SW Architecture for Gesture Recognition
4. Software Model Implementation in C
5. Floating-Point vs Fixed-Point Representations
5.1. Floating-Point
5.2. Fixed-Point
- RND: Round to plus infinity.
- RND_ZERO: Round to zero.
- RND_MIN_INF: Round to minus infinity.
- RND_INF: Round to infinity.
- RND_CONV: Convergent rounding.
- TRN: Truncation to minus infinity (default).
- TRN_ZERO: Truncation to zero.
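These modes differ only in how a value lying between two representable fixed-point levels is mapped onto the grid (they correspond to the `AP_`-prefixed quantization modes of the `ap_fixed` type in Vitis HLS). The minimal C sketch below emulates them on `double` values to make the differences concrete. It is not the `ap_fixed` implementation: the enum and function names are hypothetical, and overflow and saturation are ignored. With F fractional bits the grid step is 2^-F, so truncation can introduce an error of up to one step and rounding up to half a step; for F = 19, one step is about 1.91E-06.

```c
#include <assert.h>

/* Hypothetical enum mirroring the quantization modes listed above. */
typedef enum { TRN, TRN_ZERO, RND, RND_ZERO, RND_MIN_INF, RND_INF, RND_CONV } qmode_t;

/* floor() and ceil() for values that fit in a long long, without libm. */
static long long floor_ll(double v) {
    long long t = (long long)v;          /* C casts truncate toward zero */
    return (v < (double)t) ? t - 1 : t;  /* correct negative non-integers */
}
static long long ceil_ll(double v) { return -floor_ll(-v); }

/* Quantize x to a grid of step 2^-frac_bits with the selected mode.
   Overflow and saturation are deliberately not modeled. */
double quantize(double x, int frac_bits, qmode_t mode) {
    assert(frac_bits >= 0 && frac_bits < 52);
    double scale = (double)(1LL << frac_bits);
    double s = x * scale;                 /* value in units of one LSB */
    long long q;
    switch (mode) {
    case TRN:         q = floor_ll(s); break;                 /* toward -inf */
    case TRN_ZERO:    q = (long long)s; break;                /* toward zero */
    case RND:         q = floor_ll(s + 0.5); break;           /* ties toward +inf */
    case RND_MIN_INF: q = ceil_ll(s - 0.5); break;            /* ties toward -inf */
    case RND_ZERO:                                            /* ties toward zero */
        q = (s >= 0) ? ceil_ll(s - 0.5) : floor_ll(s + 0.5); break;
    case RND_INF:                                             /* ties away from zero */
        q = (s >= 0) ? floor_ll(s + 0.5) : ceil_ll(s - 0.5); break;
    case RND_CONV:
    default: {                                                /* ties to even */
        long long f = floor_ll(s);
        double d = s - (double)f;
        if (d < 0.5)      q = f;
        else if (d > 0.5) q = f + 1;
        else              q = (f % 2 == 0) ? f : f + 1;
        break;
    }
    }
    return (double)q / scale;
}
```

For example, with one fractional bit (step 0.5), `quantize(0.75, 1, RND_CONV)` returns 1.0 because ties go to the even level, whereas `quantize(0.75, 1, RND_ZERO)` returns 0.5.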
6. Digital System Design Optimizations
6.1. Pipelining
6.2. Loop Unroll
6.3. Merging
7. Evaluation of the Proposed System
7.1. Classification Accuracy
- Parameters: all kernel weights and bias values of every layer (4,300 values stored).
- Input: the input values (384 values stored).
- 1st Convolution: the output values of the 1st Convolution (3,072 values stored).
- 1st MaxPool: the output values of the 1st MaxPool (336 values stored).
- 2nd Convolution: the output values of the 2nd Convolution (672 values stored).
- 2nd MaxPool: the output values of the 2nd MaxPool (224 values stored).
- 1st Dense: the output values of the 1st Dense layer (16 values stored).
- 2nd Dense: the output values of the 2nd Dense layer (4 values stored).
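The value counts in these groups follow from the layer shapes in the model summary table: an (H, W, C) tensor holds H·W·C values, a dense layer has n_in·n_out weights plus n_out biases, and a convolution has kh·kw·C_in·C_out weights plus C_out biases. The kernel sizes are not stated; the parameter counts imply kernel areas kh·kw of 12 (104 = 12·1·8 + 8) and 4 (528 = 4·8·16 + 16), which this sketch takes as assumed inputs. All helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers: value counts derived from layer shapes. */
typedef struct { int h, w, c; } shape_t;   /* (height, width, channels) */

/* Number of values in an H x W x C tensor. */
size_t tensor_values(shape_t s) { return (size_t)s.h * s.w * s.c; }

/* Dense layer: n_in * n_out weights plus n_out biases. */
size_t dense_params(int n_in, int n_out) { return (size_t)n_in * n_out + n_out; }

/* 2-D convolution whose kernel covers k_area = kh * kw positions. */
size_t conv_params(int k_area, int c_in, int c_out) {
    return (size_t)k_area * c_in * c_out + c_out;
}

/* Total parameters; kernel areas 12 and 4 are inferred, not stated. */
size_t model_params(void) {
    return conv_params(12, 1, 8)     /* 1st Convolution: 104 */
         + conv_params(4, 8, 16)     /* 2nd Convolution: 528 */
         + dense_params(224, 16)     /* 1st Dense: 3600 */
         + dense_params(16, 4);      /* 2nd Dense: 68 */
}
```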
7.2. Impact of Design Optimizations
7.3. FPGA Resources
- Design Solution 1 (DS #1): after the fixed-point conversion, before the Pipeline optimization.
- Design Solution 2 (DS #2): after the Pipeline optimization, before the Loop Unroll optimization.
- Design Solution 3 (DS #3): after the Loop Unroll optimization, before the Merge optimization.
- Design Solution 4 (DS #4): after the Merge optimization.
- Design Solution 5 (DS #5): the final architecture, including the bit-width optimization (Section 7.2).
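In Vitis HLS, the Pipeline and Loop Unroll optimizations are typically requested with the `PIPELINE` and `UNROLL` pragmas. The sketch below shows one possible placement in a plain C dense layer sized like the 1st Dense layer (224 inputs, 16 outputs). It is an illustration under assumptions, not the authors' code: the function name, the `float` data type (a fixed-point design would use an `ap_fixed` type), the absence of an activation and the unroll factor are all hypothetical. A host C compiler ignores the HLS pragmas, so the same function can be checked in software.

```c
/* Hypothetical 1st Dense layer: 224 inputs, 16 outputs, no activation. */
#define N_IN  224
#define N_OUT 16

typedef float data_t;  /* would be an ap_fixed<W,I> type in the HLS design */

void dense(data_t in[N_IN], data_t w[N_OUT][N_IN], data_t b[N_OUT], data_t out[N_OUT]) {
DENSE_OUT:
    for (int o = 0; o < N_OUT; o++) {
        data_t acc = b[o];
DENSE_IN:
        for (int i = 0; i < N_IN; i++) {
            /* Loop Unroll: replicate the loop body 4 times per iteration. */
#pragma HLS UNROLL factor=4
            /* Pipeline: request an initiation interval of one cycle; the achieved
               II depends on memory ports (array partitioning) and on the
               accumulation dependency on acc. */
#pragma HLS PIPELINE II=1
            acc += w[o][i] * in[i];
        }
        out[o] = acc;
    }
}
```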
7.4. System Performance
8. Related Works
9. Discussion
10. Conclusions
Funding
References
1. TensorFlow. TensorFlow Lite. https://www.tensorflow.org/lite/guide, 2022.
2. Situnayake, D.; Warden, P. TinyML; O’Reilly Media, 2019; Chapters 11–12.
3. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008, 2008; pp. 1–70.
4. Xilinx. Vitis High-Level Synthesis User Guide (UG1399). https://docs.xilinx.com/r/2021.1-English/ug1399-vitis-hls, 2021.
5. Tsai, Y.C.; Lai, Y.H.; Xu, C.H.; Ruan, S.J. FPGA-based implementation of a dynamic hand gesture recognition system. In Proceedings of the IET International Conference on Engineering Technologies and Applications (ICETA 2023), 2023; pp. 41–42.
6. Eggimann, M.; Erb, J.; Mayer, P.; Magno, M.; Benini, L. Low Power Embedded Gesture Recognition Using Novel Short-Range Radar Sensors. In Proceedings of the 2019 IEEE SENSORS, 2019; pp. 1–4.
7. Zhang, T.; Zhou, W.; Jiang, X.; Liu, Y. FPGA-based Implementation of Hand Gesture Recognition Using Convolutional Neural Network. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), 2018; pp. 133–138.
8. V, V.; C, R.A.; Prasanna, R.; Kakarla, P.C.; PJ, V.S.; Mohan, N. Implementation of Tiny Machine Learning Models on Arduino 33 BLE for Gesture and Speech Recognition. arXiv 2022, arXiv:2207.12866 [eess.AS].
9. Núñez-Prieto, R.; Gómez, P.C.; Liu, L. A Real-Time Gesture Recognition System with FPGA Accelerated ZynqNet Classification. In Proceedings of the 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), 2019; pp. 1–6.
10. Khalife, R.; Mrad, R.; Dabbous, A.; Ibrahim, A. Real-Time Implementation of Tiny Machine Learning Models for Hand Motion Classification. In Proceedings of the Applications in Electronics Pervading Industry, Environment and Society; Bellotti, F.; Grammatikakis, M.D.; Mansour, A.; Ruo Roch, M.; Seepold, R.; Solanas, A.; Berta, R., Eds.; Cham, 2024; pp. 487–492.
11. Zhou, W.; Lyu, C.; Jiang, X.; Li, P.; Chen, H.; Liu, Y.H. Real-time implementation of vision-based unmarked static hand gesture recognition with neural networks based on FPGAs. In Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2017; pp. 1026–1031.
12. Raza, W. Hand Gesture Recognition Using TinyML on OpenMV, 2023. Accessed on 3 March 2025.




| Layer | Input Shape | Output Shape | Number of Parameters |
|---|---|---|---|
| Convolution 2D | (128, 3, 1) | (128, 3, 8) | 104 |
| MaxPooling 2D | (128, 3, 8) | (42, 1, 8) | 0 |
| Convolution 2D | (42, 1, 8) | (42, 1, 16) | 528 |
| MaxPooling 2D | (42, 1, 16) | (14, 1, 16) | 0 |
| Flatten | (14, 1, 16) | (224) | 0 |
| Dense | (224) | (16) | 3600 |
| Dense | (16) | (4) | 68 |
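The pooled shapes in this table are consistent with non-overlapping max pooling without padding, for example a 3×3 window with stride 3 in the first MaxPool layer ((128, 3) to (42, 1), since floor(128/3) = 42) and a 3×1 window in the second ((42, 1) to (14, 1)). These window sizes are inferred from the shapes rather than stated, and the function below is a hypothetical reference implementation for a row-major, channel-last tensor.

```c
#include <assert.h>

/* Output length for a window of size p with stride p and no padding. */
int pooled_dim(int in, int p) { return in / p; }

/* Max-pool an H x W x C tensor (row-major, channel last) with a ph x pw
   window and stride equal to the window; incomplete windows are dropped. */
void maxpool2d(const float *in, int h, int w, int c, int ph, int pw, float *out) {
    assert(ph > 0 && pw > 0 && h >= ph && w >= pw);
    int oh = pooled_dim(h, ph), ow = pooled_dim(w, pw);
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++)
            for (int k = 0; k < c; k++) {
                float m = in[((y * ph) * w + x * pw) * c + k];   /* window corner */
                for (int dy = 0; dy < ph; dy++)
                    for (int dx = 0; dx < pw; dx++) {
                        float v = in[((y * ph + dy) * w + (x * pw + dx)) * c + k];
                        if (v > m) m = v;
                    }
                out[(y * ow + x) * c + k] = m;
            }
}
```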
| Gesture | W | O | L | Negative | Total |
|---|---|---|---|---|---|
| Maximum Absolute Error | 1.82E-06 | 1.88E-06 | 2.83E-06 | 2.50E-06 | 2.83E-06 |
| Mean Absolute Error | 1.95E-08 | 6.09E-08 | 1.50E-07 | 1.23E-07 | 8.17E-08 |
| Quadratic Absolute Error | 1.29E-14 | 2.65E-14 | 1.42E-13 | 1.26E-13 | 6.71E-14 |
| Gesture | W | O | L | Negative | Total |
|---|---|---|---|---|---|
| Maximum Error | 1.82E-06 | 1.94E-06 | 2.22E-06 | 1.55E-06 | 2.22E-06 |
| Minimum Error | -1.79E-06 | -1.97E-06 | -2.21E-06 | -1.46E-06 | -2.21E-06 |
| Average Error | 2.70E-10 | -1.37E-09 | -8.34E-11 | 1.28E-09 | -2.42E-10 |
| Quadratic Error | 7.08E-15 | 2.47E-14 | 6.35E-14 | 3.54E-14 | 3.21E-14 |
| Standard Deviation | 1.13E-07 | 2.13E-07 | 3.34E-07 | 2.83E-07 | 2.43E-07 |
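The error tables summarize the difference between two model outputs through its maximum, minimum, mean, mean absolute value, quadratic error and standard deviation. A minimal C sketch of these statistics follows, assuming err = reference - test (so truncation, which never increases a value, yields non-negative errors, as in the TRN column of the rounding-mode table) and that the quadratic error is the mean squared error. The latter is consistent with the reported values, since mean² + σ² reproduces the quadratic error, e.g. for TRN (9.43E-07)² + (5.84E-07)² ≈ 1.23E-12. The type and function names are hypothetical; the square root uses Newton iterations so the sketch does not need libm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of err = ref - test over n outputs. */
typedef struct { double max, min, mean, mean_abs, quadratic, std_dev; } err_stats_t;

/* Square root by Newton's method for v >= 0 (avoids linking libm). */
static double sqrt_newton(double v) {
    if (v <= 0.0) return 0.0;
    double x = (v > 1.0) ? v : 1.0;        /* start at or above sqrt(v) */
    for (int i = 0; i < 200; i++) x = 0.5 * (x + v / x);
    return x;
}

err_stats_t error_stats(const double *ref, const double *test, size_t n) {
    assert(n > 0);
    err_stats_t s;
    double sum = 0.0, sum_abs = 0.0, sum_sq = 0.0;
    s.max = s.min = ref[0] - test[0];
    for (size_t i = 0; i < n; i++) {
        double e = ref[i] - test[i];
        if (e > s.max) s.max = e;
        if (e < s.min) s.min = e;
        sum += e;
        sum_abs += (e < 0) ? -e : e;
        sum_sq += e * e;
    }
    s.mean = sum / n;
    s.mean_abs = sum_abs / n;
    s.quadratic = sum_sq / n;                       /* mean squared error */
    double var = s.quadratic - s.mean * s.mean;     /* population variance */
    s.std_dev = sqrt_newton(var > 0 ? var : 0);
    return s;
}
```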
| Metric | TRN | TRN_ZERO | RND_ZERO | RND_CONV |
|---|---|---|---|---|
| Maximum Error | 1.89E-06 | 1.89E-06 | 9.31E-07 | 9.31E-07 |
| Minimum Error | 0.00E+00 | -1.82E-06 | -9.37E-07 | -9.37E-07 |
| Mean Error | 9.43E-07 | -1.50E-07 | 2.90E-08 | 2.90E-08 |
| Mean Absolute Error | 9.43E-07 | 9.78E-07 | 4.47E-07 | 4.47E-07 |
| Quadratic Error | 1.23E-12 | 1.30E-12 | 2.85E-13 | 2.85E-13 |
| Standard Deviation | 5.84E-07 | 1.13E-06 | 5.33E-07 | 5.33E-07 |
| Mode | 1.70E-06 | 8.94E-07 | 7.00E-07 | 7.00E-07 |
| Median | 8.94E-07 | -1.90E-07 | 9.69E-08 | 9.69E-08 |
| Quantization Mode | DSP | FF | LUT |
|---|---|---|---|
| TRN | 2 | 181 | 266 |
| TRN_ZERO | 2 | 182 | 320 |
| RND_ZERO | 2 | 182 | 321 |
| RND_CONV | 2 | 182 | 321 |
| Metric | 1st Convolution | 1st MaxPool | 2nd Convolution | 2nd MaxPool | 1st Dense | 2nd Dense | Softmax |
|---|---|---|---|---|---|---|---|
| Maximum Error | 3.59E-03 | 3.59E-03 | 1.94E-03 | 1.94E-03 | 3.63E-03 | 4.12E-04 | 2.22E-04 |
| Minimum Error | -2.76E-03 | -2.31E-03 | -2.78E-03 | -2.77E-03 | -3.44E-03 | -4.81E-04 | -2.22E-04 |
| Mean Error | -1.00E-06 | 6.39E-05 | -1.18E-05 | -1.22E-05 | -1.12E-04 | 5.25E-06 | 1.05E-08 |
| Mean Absolute Error | 1.51E-04 | 3.76E-04 | 1.22E-04 | 2.22E-04 | 8.87E-04 | 3.14E-04 | 1.11E-04 |
| Quadratic Error | 1.99E-07 | 5.58E-07 | 1.79E-07 | 3.27E-07 | 2.62E-06 | 1.18E-07 | 2.47E-08 |
| Standard Deviation | 4.46E-04 | 7.45E-04 | 4.23E-04 | 5.72E-04 | 1.62E-03 | 3.43E-04 | 1.57E-04 |
| Mode | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | - | - |
| Median | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Configuration | Params | Input | 1st Conv. | 1st MaxPool | 2nd Conv. | 2nd MaxPool | 1st Dense | 2nd Dense |
|---|---|---|---|---|---|---|---|---|
| T1 | 3.15 | 17.19 | 17.19 | 17.19 | 17.19 | 17.19 | 17.19 | 17.19 |
| T2 | 3.15 | 12.0 | 13.1 | 13.1 | 17.19 | 17.19 | 17.19 | 17.19 |
| T3 | 3.15 | 12.0 | 13.1 | 13.1 | 14.2 | 14.2 | 17.19 | 17.19 |
| T4 | 3.15 | 12.0 | 13.1 | 13.1 | 14.2 | 14.2 | 14.5 | 17.19 |
| T5 | 3.15 | 12.0 | 13.1 | 13.1 | 14.2 | 14.2 | 14.5 | 10.2 |
| Metric | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|
| Maximum Error | 1.43E-03 | 2.67E-02 | 4.53E-02 | 5.13E-02 | 1.42E-01 |
| Minimum Error | -1.43E-03 | -2.96E-02 | -5.06E-02 | -5.22E-02 | -1.43E-01 |
| Mean Error | -1.53E-10 | 4.53E-11 | -2.28E-10 | -3.44E-10 | -1.07E-10 |
| Mean Absolute Error | 3.76E-05 | 5.70E-04 | 7.64E-04 | 7.63E-04 | 4.49E-03 |
| Quadratic Error | 1.40E-08 | 4.38E-06 | 8.80E-06 | 9.20E-06 | 1.60E-04 |
| Standard Deviation | 1.18E-04 | 2.09E-03 | 2.97E-03 | 3.03E-03 | 1.27E-02 |
| Mode | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Median | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Mismatch Predictions | 0 | 0 | 1 | 0 | 7 |
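A plausible reading of the Mismatch Predictions row is the number of test samples for which the fixed-point model predicts a different class (the arg-max of the four outputs W, O, L and Negative) than the floating-point model. This sketch counts such disagreements under that assumption; the names are hypothetical and the scores are stored one sample after another.

```c
#include <stddef.h>

#define N_CLASSES 4  /* W, O, L, Negative */

/* Index of the highest score; the first index wins on ties. */
int argmax(const double *scores, int n) {
    int best = 0;
    for (int k = 1; k < n; k++)
        if (scores[k] > scores[best]) best = k;
    return best;
}

/* Count samples whose predicted class differs between two models.
   ref and test each hold n_samples * N_CLASSES scores. */
size_t count_mismatches(const double *ref, const double *test, size_t n_samples) {
    size_t mismatches = 0;
    for (size_t s = 0; s < n_samples; s++)
        if (argmax(ref + s * N_CLASSES, N_CLASSES) !=
            argmax(test + s * N_CLASSES, N_CLASSES))
            mismatches++;
    return mismatches;
}
```

The percentage form used in the per-gesture table is then the mismatch count divided by the number of samples, times 100.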
| Resource | T0 | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|
| BRAM | 22 | 19 | 16 | 15 | 15 | 15 |
| DSP | 77 | 77 | 55 | 54 | 53 | 53 |
| FF | 5,647 | 5,190 | 4,418 | 4,277 | 4,212 | 4,044 |
| LUT | 11,336 | 11,175 | 10,952 | 9,816 | 9,859 | 9,685 |
| Wordlength | Parameters (3) | 1st Conv + MaxPool (13) | 2nd Conv + MaxPool (14) | 1st Dense (14) | 2nd Dense (10) |
|---|---|---|---|---|---|
| 20 | 0 | 0 | 0 | 0 | 0 |
| 19 | 0 | 0 | 2 | 1 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 |
| 17 | 0 | 0 | 2 | 1 | 1 |
| 16 | 0 | 0 | 1 | 0 | 1 |
| 15 | 0 | 0 | 1 | 3 | 1 |
| 14 | 0 | 0 | 5 | 6 | 2 |
| 13 | 2 | 2 | - | - | 3 |
| 12 | 1 | - | - | - | 5 |
| 11 | 10 | - | - | - | 9 |
| 10 | 8 | - | - | - | 29 |
| Configuration | Parameters | Input | 1st Convolution | 1st MaxPool | 2nd Convolution | 2nd MaxPool | 1st Dense | 2nd Dense |
|---|---|---|---|---|---|---|---|---|
| T5 | 3.15 | 12.0 | 13.1 | 13.1 | 14.2 | 14.2 | 14.5 | 10.2 |
| T6 | 3.9 | 12.0 | 13.0 | 13.0 | 14.0 | 14.0 | 14.1 | 10.4 |
| T7 | 3.7 | 12.0 | 13.0 | 13.0 | 14.0 | 14.0 | 14.1 | 10.4 |
| Metric | T5 | T6 | T7 |
|---|---|---|---|
| Maximum Error | 1.42E-01 | 1.11E-01 | 3.41E-01 |
| Minimum Error | -1.43E-01 | -1.06E-01 | -3.41E-01 |
| Mean Error | -1.07E-10 | -2.15E-10 | -5.41E-10 |
| Mean Absolute Error | 4.49E-03 | 3.91E-03 | 1.14E-02 |
| Quadratic Error | 1.60E-04 | 1.26E-04 | 1.15E-03 |
| Standard Deviation | 1.27E-02 | 1.12E-02 | 3.39E-02 |
| Mode | 0.00 | 0.00 | 0.00 |
| Median | 0.00 | -4.40E-25 | 4.41E-11 |
| Mismatch Predictions | 7 | 3 | 12 |
| Bits stored | 144,408 | 113,352 | 104,752 |
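The Bits stored row can be reproduced by reading each I.F entry of the format tables as I integer bits and F fractional bits (word length I + F) and multiplying it by the number of values in the corresponding group from Section 7.1; this interpretation gives the three totals exactly. A hypothetical C sketch of the calculation, with configuration T5 filled in:

```c
#include <stddef.h>

/* One stored data group: number of values and its I.F format. */
typedef struct { size_t count; int int_bits, frac_bits; } group_t;

/* Total storage in bits: each value occupies I + F bits. */
size_t bits_stored(const group_t *groups, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += groups[i].count * (size_t)(groups[i].int_bits + groups[i].frac_bits);
    return total;
}

/* Configuration T5: value counts from Section 7.1, formats from the T5 row. */
static const group_t T5[] = {
    {4300, 3, 15},  /* Parameters */
    {384, 12, 0},   /* Input */
    {3072, 13, 1},  /* 1st Convolution */
    {336, 13, 1},   /* 1st MaxPool */
    {672, 14, 2},   /* 2nd Convolution */
    {224, 14, 2},   /* 2nd MaxPool */
    {16, 14, 5},    /* 1st Dense */
    {4, 10, 2},     /* 2nd Dense */
};
```

For T5 this yields 144,408 bits; with the T6 and T7 formats the same computation gives 113,352 and 104,752 bits.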
| Resource | T5 | T6 | T7 |
|---|---|---|---|
| BRAM | 15 | 13 | 12 |
| DSP | 53 | 53 | 53 |
| FF | 4,044 | 3,843 | 3,785 |
| LUT | 9,685 | 9,466 | 9,386 |
| Metric | W | O | L | Negative | Total |
|---|---|---|---|---|---|
| Maximum Error | 2.19E-01 | 1.55E-01 | 2.70E-01 | 3.24E-01 | 3.24E-01 |
| Minimum Error | -1.28E-01 | -1.68E-01 | -2.70E-01 | -3.24E-01 | -3.24E-01 |
| Mean Error | 2.43E-10 | -1.26E-09 | -1.46E-10 | 9.66E-10 | -2.65E-10 |
| Mean Absolute Error | 3.46E-03 | 6.24E-03 | 2.06E-02 | 2.04E-02 | 1.10E-02 |
| Quadratic Error | 3.20E-04 | 3.28E-04 | 2.14E-03 | 2.51E-03 | 1.07E-03 |
| Standard Deviation | 1.79E-02 | 1.81E-02 | 4.63E-02 | 5.01E-02 | 3.28E-02 |
| Mismatch Predictions | 1 | 0 | 7 | 3 | 11 |
| Mismatch Predictions (%) | 0.45% | 0.00% | 3.08% | 4.84% | 1.51% |
| Resource | DS #1 | DS #2 | DS #3 | DS #4 | DS #5 |
|---|---|---|---|---|---|
| BRAM | 22 | 55 | 50 | 120 | 45 |
| DSP | 77 | 271 | 717 | 302 | 106 |
| FF | 5,647 | 41,483 | 131,136 | 48,793 | 17,526 |
| LUT | 11,336 | 35,891 | 80,022 | 37,704 | 23,042 |
| BRAM | DSP | FF | LUT |
|---|---|---|---|
| 140 | 220 | 106,400 | 53,200 |
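Assuming the last table gives the capacity of the target device, each design solution can be checked against it by computing per-resource utilization. The hypothetical C sketch below does this; for instance, DS #3 requests 717 DSPs against 220 available and therefore does not fit, while DS #5 uses about 32% of the BRAM, 48% of the DSPs, 16% of the FFs and 43% of the LUTs.

```c
/* Hypothetical resource record: BRAM, DSP, FF and LUT counts. */
typedef struct { long bram, dsp, ff, lut; } resources_t;

/* Percentage of the available amount that is used. */
double util_pct(long used, long available) {
    return 100.0 * (double)used / (double)available;
}

/* Returns 1 if every resource fits in the device, 0 otherwise. */
int fits(resources_t used, resources_t avail) {
    return used.bram <= avail.bram && used.dsp <= avail.dsp &&
           used.ff <= avail.ff && used.lut <= avail.lut;
}
```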
| Metric | No Pipeline (10ns) | No Pipeline (10.4ns) |
|---|---|---|
| Clock (ns) | 10.00 | 10.40 |
| Total Time (ns) | 2.644E+06 | 2.739E+06 |
| Metric | 1st Convolution | 2nd Convolution | 1st MaxPool | 2nd MaxPool | 1st Dense | 2nd Dense |
|---|---|---|---|---|---|---|
| Clock (ns) | 10.40 | 10.40 | 10.40 | 10.40 | 10.41 | 10.41 |
| Total Time (ns) | 1.408E+06 | 4.780E+05 | 2.760E+05 | 2.550E+05 | 1.020E+05 | 9.913E+04 |
| Function Time - Before (ns) | 1.363E+06 | 9.370E+05 | 2.130E+05 | 2.829E+04 | 1.920E+05 | 3.414E+03 |
| Function Time - After (ns) | 3.221E+04 | 7.592E+03 | 1.063E+04 | 7.114E+03 | 4.137E+04 | 2.390E+02 |
| Function Speedup | 42.3x | 123.4x | 20.0x | 4.0x | 4.6x | 14.3x |
| Total Speedup (10.4ns) | 1.9x | 5.7x | 9.9x | 10.7x | 26.9x | 27.6x |
| Total Speedup (10ns) | 1.9x | 5.5x | 9.6x | 10.4x | 25.9x | 26.7x |
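The speedup rows are ratios of execution times: a function speedup divides the function's time before pipelining by its time after, and the total speedups divide the unpipelined total time (2.739E+06 ns at 10.4 ns, or 2.644E+06 ns at 10 ns) by the new total time. The Total Time row also approximately follows from swapping one function's time for its optimized time; for the 1st Convolution, 2.739E+06 - 1.363E+06 + 3.221E+04 ≈ 1.408E+06 ns. A small C sketch of both calculations, with hypothetical function names:

```c
/* Speedup of an optimized version relative to a baseline (times in ns). */
double speedup(double t_before_ns, double t_after_ns) {
    return t_before_ns / t_after_ns;
}

/* Estimated total time after replacing one function's time by its
   optimized time; small changes in the other functions are ignored. */
double updated_total(double total_ns, double func_before_ns, double func_after_ns) {
    return total_ns - func_before_ns + func_after_ns;
}
```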
| Metric | After Pipeline | 1st Dense (LU) | 1st Convolution (LU) |
|---|---|---|---|
| Clock (ns) | 10.41 | 10.41 | 10.37 |
| Total Time (ns) | 9.913E+04 | 6.391E+04 | 6.367E+04 |
| Function Time - Before Pipeline (ns) | - | 1.920E+05 | 1.363E+06 |
| Function Time - Before Optimization (ns) | - | 4.137E+04 | 3.224E+04 |
| Function Time - After (ns) | - | 3.727E+03 | 3.212E+04 |
| Function Total Speedup | - | 51.5x | 42.4x |
| Function Optimization Speedup | - | 11.1x | 1.004x |
| Total Speedup (10.4ns) | 27.6x | 42.9x | 43.0x |
| Total Speedup (10ns) | 26.7x | 41.4x | 41.5x |
| Metric | After Pipeline | 1st Conv+Maxpool (M) | 2nd Conv+Maxpool (M) | 2nd Conv+1st Dense (M) |
|---|---|---|---|---|
| Clock (ns) | 10.41 | 10.37 | 10.37 | 10.37 |
| Total Time (ns) | 9.913E+04 | 5.318E+04 | 4.606E+04 | 4.266E+04 |
| Function Time - Before Pipeline (ns) | - | 1.577E+06 | 9.655E+05 | 1.157E+06 |
| Function Time - Before Optimization (ns) | - | 4.271E+04 | 1.466E+04 | 1.128E+04 |
| Function Time - After (ns) | - | 3.224E+04 | 7.570E+03 | 7.902E+03 |
| Function Total Speedup | - | 48.9x | 127.5x | 146.4x |
| Function Optimization Speedup | - | 1.3x | 1.9x | 1.4x |
| Total Speedup (10.4ns) | 27.6x | 51.5x | 59.5x | 64.2x |
| Total Speedup (10ns) | 26.7x | 49.7x | 57.4x | 62.0x |
| Work | Device Used | CNN Model | Performance | FPGA Resources |
|---|---|---|---|---|
| FPGA-based Implementation of a Dynamic Hand Gesture Recognition System [5] | Xilinx Zynq-7000 | CNN | 85% accuracy, 30 FPS | 60% LUTs, 70 DSPs |
| Low Power Embedded Gesture Recognition Using Radar Sensors [6] | Not FPGA-based | CNN + TCN | 92% accuracy | N/A |
| FPGA-based Implementation of Hand Gesture Recognition Using CNN [7] | Xilinx ZCU102 | CNN (ResNet-like) | 75% accuracy, 15 FPS | 72% LUTs, 88 DSPs |
| Implementation of TinyML Models on Arduino 33 BLE [8] | Not FPGA-based | TinyML (DNN) | 80% accuracy, Low latency | N/A |
| A Real-Time Gesture Recognition System with FPGA Acceleration [9] | Xilinx ZCU104 | Modified ZynqNet | 88% accuracy, 60 FPS | 65% LUTs, 96 DSPs |
| Real-Time Implementation of TinyML Models for Hand Gesture Recognition [10] | Not FPGA-based | CNN | 82% accuracy | N/A |
| Real-Time Vision-Based Static Hand Gesture Recognition on FPGA [11] | Xilinx Virtex-7 | CNN + SOM | 78% accuracy, 40 FPS | 58% LUTs, 62 DSPs |
| Hand Gesture Recognition Using TinyML on OpenMV [12] | Not FPGA-based | TinyML (CNN) | 80% accuracy | N/A |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).