Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Automated Word-level Lip Reading using Convolutional Neural Networks with New Turkish Dataset

Version 1: Received: 18 April 2023 / Approved: 20 April 2023 / Online: 20 April 2023 (10:07:48 CEST)

How to cite: Pervan-Akman, N.; Berkol, A.; Erdem, H. Automated Word-level Lip Reading using Convolutional Neural Networks with New Turkish Dataset. Preprints 2023, 2023040645. https://doi.org/10.20944/preprints202304.0645.v1

Abstract

Automated lip reading is a research problem that has developed considerably in recent years. In some cases, lip reading is evaluated both visually and audibly. One application of lip reading models is detecting specific words in footage from security cameras, but audio-visual databases cannot be used in this setting because the audio of the pronounced word is not always available. In this study, we collected a new Turkish dataset that contains only images. The dataset was produced from YouTube videos, which are an uncontrolled environment, so the images vary considerably in environmental factors such as lighting, camera angle, and color, as well as in the personal characteristics of the face. Despite differing facial features such as mustaches, beards, and make-up, the visual speech recognition problem was addressed on 10 classes, comprising single words and two-word phrases, using Convolutional Neural Networks (CNN) without any intervention on the data. Using only visual data, the proposed study obtained an automated visual speech recognition model based on a deep learning approach. In addition, because only visual data are used, the computational cost and resource usage are lower than in multimodal studies. To our knowledge, this is also the first study to address the lip reading problem with a deep learning algorithm using a new dataset for a language of the Ural-Altaic family.

Keywords

Lip Reading; Multiclass Classification; Turkish Lip Reading Dataset; Deep Learning; Convolutional Neural Networks; Lip Detection

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
