Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Sign Language Recognition with Multimodal Sensors and Deep Learning Methods

Version 1: Received: 18 September 2023 / Approved: 20 September 2023 / Online: 21 September 2023 (08:33:08 CEST)

A peer-reviewed article of this Preprint also exists.

Lu, C.; Kozakai, M.; Jing, L. Sign Language Recognition with Multimodal Sensors and Deep Learning Methods. Electronics 2023, 12, 4827.

Abstract

Sign language recognition is essential for communication by hearing-impaired people, and it has become an important topic in computer vision thanks to rapid progress in image recognition technology. However, recognition based on a general monocular camera suffers from occlusion and limited accuracy. In this research, we aim to improve accuracy by using 2-axis bending sensors as an aid to image-based recognition: hand keypoints of sign language motions are extracted from monocular RGB video and combined with the sensor readings. Because we use our laboratory's own dataset, which is small, a new model is needed that can learn effectively from both sensor data and image data. We use MediaPipe, a hand and face skeleton estimation framework provided by Google, to extract hand keypoints, and we combine a CNN, which learns spatial information, with a BiLSTM, which learns time-series information, to recognize signs from the fused keypoint and sensor sequences. Combining the CNN and BiLSTM yields higher recognition accuracy, and the 2-axis bending-sensor glove data further support model training. Our method outperforms recognition based on skeletal information alone, achieving 96.5% Top-1 accuracy.
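As a rough illustration of the pipeline the abstract describes, the sketch below shows how MediaPipe hand keypoints and bending-sensor readings could be fused and fed to a CNN-BiLSTM classifier. This is a minimal sketch under stated assumptions, not the authors' released code: the sequence length, sensor channel count, class count, layer sizes, and the frame-level fusion point are placeholder choices, and only the standard MediaPipe Hands and Keras APIs are relied on.

```python
# Hedged sketch: MediaPipe hand keypoints + 2-axis bending-sensor data fused
# into a CNN-BiLSTM classifier. T, N_SENSORS, N_CLASSES, and layer sizes are
# illustrative assumptions, not values taken from the paper.
import numpy as np
import cv2
import mediapipe as mp
from tensorflow.keras import layers, Model

T = 60            # assumed frames per sign clip
N_KEYPOINTS = 21  # MediaPipe Hands returns 21 landmarks per hand
N_SENSORS = 10    # assumed: one 2-axis bending sensor per finger
N_CLASSES = 20    # assumed number of sign classes in the lab dataset

def extract_hand_keypoints(video_path: str) -> np.ndarray:
    """Return a (T, N_KEYPOINTS * 3) array of (x, y, z) hand landmarks per frame."""
    hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < T:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            frames.append([c for p in lm for c in (p.x, p.y, p.z)])
        else:
            frames.append([0.0] * (N_KEYPOINTS * 3))  # occluded / missed frame
    cap.release()
    hands.close()
    while len(frames) < T:                            # zero-pad short clips
        frames.append([0.0] * (N_KEYPOINTS * 3))
    return np.asarray(frames, dtype=np.float32)

def build_fusion_model() -> Model:
    """Frame-level fusion of keypoints and sensor data, then CNN + BiLSTM."""
    kp_in = layers.Input(shape=(T, N_KEYPOINTS * 3), name="keypoints")
    sensor_in = layers.Input(shape=(T, N_SENSORS), name="bending_sensors")
    # concatenate keypoints and sensor readings frame by frame
    x = layers.Concatenate()([kp_in, sensor_in])
    # 1D convolutions learn local patterns over short frame windows
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(x)
    x = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)
    # the BiLSTM learns the temporal structure of the whole sign
    x = layers.Bidirectional(layers.LSTM(128))(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(N_CLASSES, activation="softmax")(x)
    model = Model([kp_in, sensor_in], out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In this sketch the two modalities are concatenated per frame before the CNN and BiLSTM; the paper may fuse them at a different stage, so treat the fusion point and all dimensions here as assumptions rather than the authors' exact architecture.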

Keywords

Sign Language Recognition; sensor fusion

Subject

Computer Science and Mathematics, Computer Vision and Graphics
