Preprint
Article

This version is not peer-reviewed.

Efficient Word-Level Sign Language Recognition Using Quantized Spatiotemporal Deep Learning for Low-Power Microcontrollers

Submitted: 25 February 2026
Posted: 27 February 2026


Abstract
Deploying efficient sign language recognition models on edge devices advances inclusive, affordable, and privacy-preserving human–computer interaction. Yet most state-of-the-art architectures target server-class hardware and fail under the strict memory, computation, and energy constraints of microcontrollers. This work introduces S3D-Conv1D, a spatiotemporal architecture for isolated word-level sign language recognition, tailored for TinyML deployment. By factorizing spatial and temporal processing into lightweight two-dimensional (2D) convolutions followed by one-dimensional convolution (Conv1D) layers, the model eliminates recurrent dependencies and ensures deterministic, MCU-compatible computation. This design yields predictable latency, bounded activation memory, and stable model size, while supporting full post-training 8-bit integer (INT8) quantization and compatibility with TensorFlow Lite, CMSIS-NN, and NNoM. Three baselines, S3D, CNN+RNN, and attention-based embedded LSTM (e-LSTM), were evaluated using unified preprocessing, quantization, and profiling on WLASL100 and SemLex100 datasets. S3D-Conv1D achieved 98.4\% float32 accuracy with superior stability and generalization. After INT8 quantization, it retained accuracy within 0.18\% while compressing nearly 4× into sub-megabyte binaries (895.9 KB). Deployment profiling on a CPU-only runner, used as a proxy for microcontroller execution, showed S3D-Conv1D as the only architecture achieving full INT8 execution with real-time indicative performance (23.6 ms). These results demonstrate that efficient, edge-ready sign language recognition requires architectures designed around hardware constraints from the outset, rather than compressing high-capacity models.
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
