2. Literature Review
The global population of individuals with hearing impairments is increasing, making sign language recognition systems essential for improving communication within this community. Research on sign language (SL) translation and recognition is extensive and diverse, encompassing various innovative approaches. From haptic technology, smart wearable devices, and human-machine interfaces to advanced computer vision techniques, the design of SL translation systems remains a challenging topic that has produced promising results, making such systems a viable way to promote more inclusive communication.
A hydrogel deformation sensor with silica nanoparticles was developed in [17] to improve the control of robotic hands and recognise gestures. The sensor successfully identified 15 sign language signs with an accuracy of . A gesture recognition system based on a silicone elastomer and ultrasonic waves achieved an accuracy of 91.13% in recognising eight common hand gestures and 88.5% in recognising ten SL digits [18]. Another innovative device is a flexible deformation sensor with a snake-inspired structure, which recognised 21 SL signs with a 98% success rate [19]. Furthermore, a smart glove designed for SL translation uses a single sensor to capture hand movements and convert them into text or voice, achieving 90% accuracy for the 26 letters of the American SL alphabet [20]. Lastly, an optical fibre glove has been built for American SL recognition. It recognises ten digits, 26 letters, 18 words, and five sentences, with a success rate of 98.6% for static gestures and 95% for dynamic gestures [21].
For real-time translation, a glove for Turkish SL has been developed that combines fuzzy logic with extreme learning machines. This system has achieved 96.8% accuracy with a minimal processing time of for 120 words, providing an innovative solution for real-time communication [22]. The “SignTalk” mobile application converts recognised signs into text and voice. It relies on a tiny machine learning (TinyML) solution that recognises SL using a low-cost wearable device connected to the Internet. A lightweight deep neural network interprets isolated signs of Indian SL from motion sensor data, with transfer learning used to initialise the model parameters. The system achieves an accuracy of 87.18% with only four observations of the 120 signs [23].
Researchers have also proposed using computer vision, neural networks, and landmark estimation to improve SL translation and recognition. This work involves developing and comparing algorithms capable of understanding SL. Several studies have applied deep neural networks to SL data in RGB video or image format, resulting in significant progress in translation and recognition tasks. One proposal is an application that leverages the Internet of Things (IoT) and gesture recognition with MediaPipe [24]. This application enables communication between Mexican SL users and hearing individuals by connecting an Apple Watch with a camera to translate gestures. It has achieved 96% recognition accuracy for 44 classes, including digits, Spanish alphabet letters, greetings, and an object. Another study proposed a mobile application to improve communication for deaf or hard-of-hearing individuals who do not know SL [25]. The app uses the MediaPipe library for video classification and includes a real-time translation function. Initially, a VGG16 network was used to create and train the model, but because of its low accuracy, the MediaPipe library was chosen for its better 3D coordinate detection. The model achieved 85% accuracy for ten words.
The work in [26] proposed a CNN-based algorithm that achieved an accuracy of 99.7% for 32 classes, outperforming current models and showing considerable potential for assistive technology. In [27], a translation system between Mexican SL and Spanish achieved an accuracy of 98.8% for ten signs, including phrases that can be used in school, using a bidirectional RNN. MediaPipe is used to find key points in the gestures. The study highlights the importance of manual features in SL translation and the potential to extend the approach to other languages. In [28], a system for translating Panamanian SL into Spanish text was developed using an LSTM model, which can handle non-static signs as sequential data. The deep learning model focuses on action detection, processing the frames in which gestures are executed. In addition to the hands, visual features such as the signer’s face and pose were considered. The system was trained with a dataset of 330 videos corresponding to five classes and achieved an accuracy of 98.8%.
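Recurrent models such as the RNN and LSTM systems above consume each sign as a sequence of per-frame feature vectors, and clips of different durations are usually brought to a common length first. The following is a minimal sketch of that step, assuming each frame has already been reduced to a flat list of keypoint coordinates. The function name `to_fixed_length`, the zero-padding choice, and the uniform subsampling are illustrative assumptions and are not taken from [27] or [28].

```python
from typing import List

# One flattened keypoint vector per video frame (hypothetical layout).
Frame = List[float]


def to_fixed_length(frames: List[Frame], target_len: int) -> List[Frame]:
    """Return a clip with exactly target_len frames.

    Longer clips are subsampled at evenly spaced indices; shorter clips
    are padded at the end with all-zero frames.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("clip has no frames")

    n = len(frames)
    dim = len(frames[0])

    if n >= target_len:
        if target_len == 1:
            indices = [0]
        else:
            # Evenly spaced indices that keep the first and last frame.
            indices = [round(i * (n - 1) / (target_len - 1))
                       for i in range(target_len)]
        return [list(frames[i]) for i in indices]

    # Assumption: zero frames mark "no gesture" after the clip ends.
    padding = [[0.0] * dim for _ in range(target_len - n)]
    return [list(f) for f in frames] + padding
```

With this sketch, every clip in a dataset can be stacked into an array of shape (clips, target_len, features) before being passed to a sequence model. Padding with zeros is only one option; repeating the last frame or masking padded steps are common alternatives.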
One study introduces a method for translating Urdu sign language using a CNN model designed explicitly for languages with limited resources [29]. In contrast to previous research focusing on sign languages with large resources, this study addresses translation in Urdu sign language, for which resources are limited. The study conducted experiments using two datasets of 1500 and 78,000 images for 37 Urdu signs. The process involved modules for gathering, preprocessing, categorisation, and prediction. The results showed that the model outperformed other machine learning methods, achieving an accuracy of 95% and excelling in precision, recall, and F1-score. Another proposal focused on expanding Indian and Vietnamese sign language datasets [30]. This involved distorting gestures extracted with the MediaPipe tool and classifying them with a GRU-LSTM model, resulting in over 95% accuracy for 60 Indian SL words. The approach applied data augmentation to the Vietnamese sign language corpus to obtain 4364 new samples, demonstrating its potential to enhance resources for sign language recognition systems.
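Augmentation by distorting gestures can be illustrated with a small geometric transform applied to 2D landmarks. The sketch below is an assumption about what such a distortion might look like (a random rotation, scaling, and jitter around the centroid of the landmarks); the specific transforms and parameters used in [30] are not described here, and the function name `distort_landmarks` and its default values are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def distort_landmarks(
    points: List[Point],
    max_angle_deg: float = 10.0,   # hypothetical default
    max_scale: float = 0.1,        # hypothetical default
    jitter: float = 0.01,          # hypothetical default
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Return a randomly rotated, scaled, and jittered copy of the landmarks."""
    if not points:
        return []
    rng = rng or random.Random()

    # One rotation angle and one scale factor per sample.
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    # Transform around the centroid so the gesture stays in place.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    distorted = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rx = scale * (dx * cos_a - dy * sin_a)
        ry = scale * (dx * sin_a + dy * cos_a)
        # Independent small noise per landmark coordinate.
        nx = cx + rx + rng.uniform(-jitter, jitter)
        ny = cy + ry + rng.uniform(-jitter, jitter)
        distorted.append((nx, ny))
    return distorted
```

Calling the function several times on the same recorded sample yields several slightly different training samples with the same label. Passing a seeded `random.Random` makes the generated set reproducible.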
On the other hand, research in SL translation lags behind the advances made in spoken language translation. Because obtaining annotations is costly and slow, a new method of processing SL videos without using annotations was proposed in [31]. This approach uses the interpreter’s skeletal points to identify movements, which increases the model’s robustness against background noise. The method was tested with German SL (RWTH-PHOENIX-Weather 2014T) and Korean SL (KETI), achieving an 84% success rate. Likewise, existing research in SL translation has yet to fully utilise techniques such as skin masking, edge detection, and feature extraction [32]. The study employs SURF for feature extraction, which is resistant to variations such as rotation, scale, and occlusion. Additionally, it uses the Bag of Visual Words (BoVW) model for gesture-to-text conversion. The system was evaluated with several machine learning algorithms, achieving accuracy between 79% and 92% for the classification of Indian SL letters and digits.
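The BoVW step can be sketched as follows: each local descriptor extracted from an image is assigned to its nearest "visual word" in a codebook, and the image is then represented by the normalised histogram of word counts, which a conventional classifier can consume. This is a minimal illustration under stated assumptions, not the implementation in [32]. The codebook is assumed to be given (it is typically learned by clustering training descriptors, for example with k-means), and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    best_index = 0
    best_dist = float("inf")
    for i, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bovw_histogram(descriptors: Sequence[Vector],
                   codebook: Sequence[Vector]) -> List[float]:
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # An image with no detected keypoints yields an all-zero vector.
        return [0.0] * len(codebook)
    return [c / total for c in counts]
```

Because the histogram has a fixed length equal to the codebook size, images with different numbers of keypoints map to feature vectors of the same dimension.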
Finally, some studies focus on creating two-way sign language translation systems to help people with hearing impairments communicate more effectively. One study introduces a real-time avatar system that can translate text or voice into Arabic SL movements using a deep learning model based on YOLOv8. This system can recognise and interpret gestures in real time with high precision. The avatar is trained with three sets of data: an RGB dataset of the Arabic alphabet sign language, an SL detection image dataset, and an Arabic SL dataset with recognition accuracy. This technology aims to improve communication for deaf and mute individuals in Arabic-speaking communities [33]. Similarly, an automatic translation system was developed to convert Arabic text and audio into sign language using an animated character to replicate the corresponding gestures. Due to limited resources, a dataset of 12,187 pairs of words and signs was created. The model uses bidirectional transformers and achieved an accuracy of 94.71% in training and 87.04% in testing [34].
In summary, automatic sign language translation and recognition have seen significant advancements in recent years, thanks to the convergence of disciplines such as artificial intelligence and image processing. Research in this area has led to a range of innovative solutions, from sensor-based systems and wearable devices to sophisticated deep-learning algorithms. The studies reviewed agree on the feasibility and potential of these technologies to improve communication and inclusion for deaf people. The developed systems have achieved increasingly high levels of accuracy in gesture classification and translation through advanced techniques in deep learning and transformation methods.
Despite significant progress in the field, several challenges still require deeper study. These include the need for standardisation in the notation used to represent sign language, the development of vision-based techniques to improve the accuracy of gesture translation, and the creation of language-independent annotations to connect different sign languages.
First, generating synthetic sign language datasets can enhance the effectiveness of sign language systems, as demonstrated in [35]. Second, it is important to separate the key elements involved in the process, such as posture and hand position. This was effectively accomplished in [36], where the researchers achieved high accuracy by training models with data from one hand and two hands, even with noisy backgrounds. Finally, the lack of standardised notations for representing sign language and the scarcity of large annotated datasets limit the development of more robust and generalisable models. For instance, the work in [37] highlights the need for standardised annotation methods by focusing on the interpreter’s body movements rather than the meanings of the signs. This underscores the necessity for a broader range of databases to support machine learning models.