Submitted: 12 December 2023
Posted: 13 December 2023
Abstract
Keywords:
1. Introduction
2. Literature Survey
3. Methodology
3.1. Dataset
3.2. Model Architecture

4. Results and Discussion
| Model | Year | Accuracy (%) |
|---|---|---|
| LipNet [2] | 2016 | 95.2 |
| LipNet (with face cutout) [17] | 2020 | 97.9 |
| LCANet [15] | 2018 | 97.9 |
| CTC/Attention [18] | 2022 | 98.8 |
| Our Model | 2023 | 97 |
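Lip-reading benchmarks on the GRID corpus [16] commonly report accuracy as 100 × (1 − WER), where the word error rate (WER) is the word-level edit distance between the predicted and reference sentences divided by the number of reference words. Assuming the Accuracy column follows that convention, the Python sketch below shows the computation. This is not the authors' evaluation code. The helper names (`word_edit_distance`, `word_error_rate`, `accuracy_percent`) and the sample sentences are hypothetical.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    # prev[j] holds the distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one sentence: edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(ref, hyp) / len(ref)


def accuracy_percent(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level accuracy (%) = 100 * (1 - WER), pooling edits over all sentences.

    Insertions can push WER above 1, so this value can in principle be negative.
    """
    edits = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    return 100.0 * (1.0 - edits / words)


if __name__ == "__main__":
    # Hypothetical GRID-style sentences: command, colour, preposition, letter, digit, adverb.
    pairs = [
        ("bin blue at f two now", "bin blue at f two now"),      # exact match
        ("place red by a one soon", "place red by e one soon"),  # one substitution
    ]
    print(f"{accuracy_percent(pairs):.2f}")  # → 91.67, i.e. 100 * (1 - 1/12)
```

Pooling edits and words over the whole test set, rather than averaging per-sentence WER, weights every word equally. On GRID this makes little difference because every sentence has six words.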
5. Conclusions and Future Scope
References
1. Chen W, Tan X, Xia Y, Qin T, Wang Y, Liu TY. DualLip: A system for joint lip reading and generation. In Proceedings of the 28th ACM International Conference on Multimedia 2020 Oct 12 (pp. 1985-1993).
2. Assael YM, Shillingford B, Whiteson S, De Freitas N. LipNet: End-to-end sentence-level lipreading. arXiv preprint 2016 Nov 5. arXiv:1611.01599.
3. Garg A, Noyola J, Bagadia S. Lip reading using CNN and LSTM. Technical report, Stanford University, CS231n project report. 2016.
4. Wand M, Koutník J, Schmidhuber J. Lipreading with long short-term memory. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016 Mar 20 (pp. 6115-6119). IEEE.
5. Stafylakis T, Tzimiropoulos G. Combining residual networks with LSTMs for lipreading. arXiv preprint 2017 Mar 12. arXiv:1703.04105.
6. Parkhi OM, Vedaldi A, Zisserman A. Deep face recognition. Proceedings of the British Machine Vision Conference. 2015;1(3):6.
7. Petridis S, Stafylakis T, Ma P, Cai F, Tzimiropoulos G, Pantic M. End-to-end audiovisual speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 Apr 15 (pp. 6548-6552). IEEE.
8. Ma P, Petridis S, Pantic M. End-to-end audio-visual speech recognition with conformers. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021 Jun 6 (pp. 7613-7617). IEEE.
9. Martinez B, Ma P, Petridis S, Pantic M. Lipreading using temporal convolutional networks. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020 May 4 (pp. 6319-6323). IEEE.
10. Miled M, Messaoud MA, Bouzid A. Lip reading of words with lip segmentation and deep learning. Multimedia Tools and Applications. 2023 Jan;82(1):551-71.
11. Manaswi NK. Understanding and working with Keras. Deep learning with applications using Python: Chatbots and face, object, and speech recognition with TensorFlow and Keras. 2018:31-43.
12. Bradski G, Kaehler A. Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, Inc.; 2008 Sep 24.
13. Howse J. OpenCV computer vision with Python. Birmingham: Packt Publishing; 2013 Apr 23.
14. Sarhan AM, Elshennawy NM, Ibrahim DM. HLR-net: A hybrid lip-reading model based on deep convolutional neural networks. Computers, Materials and Continua. 2021 Jan 1;68(2):1531-49.
15. Xu K, Li D, Cassimatis N, Wang X. LCANet: End-to-end lipreading with cascaded attention-CTC. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) 2018 May 15 (pp. 548-555). IEEE.
16. Cooke M, Barker J, Cunningham S, Shao X. An audio-visual corpus for speech perception and automatic speech recognition. The Journal of the Acoustical Society of America. 2006;120(5):2421-2424.
17. Zhang Y, Yang S, Xiao J, Shan S, Chen X. Can we read speech beyond the lips? Rethinking RoI selection for deep visual speech recognition. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) 2020 Nov 16 (pp. 356-363). IEEE.
18. Ma P, Petridis S, Pantic M. Visual speech recognition for multiple languages in the wild. Nature Machine Intelligence. 2022 Nov;4(11):930-9.
19. Ajeyprasaath KB, Vetrivelan P. A QoE Framework for Video Services in 5G Networks with Supervised Machine Learning Approach. In International Conference on Machine Intelligence and Signal Processing 2022 Mar 12 (pp. 661-668). Singapore: Springer Nature Singapore.
20. Ajeyprasaath KB, Vetrivelan P. Machine Learning Based Classifiers for QoE Prediction Framework in Video Streaming over 5G Wireless Networks. Computers, Materials & Continua. 2023 Jan 1;75(1):1919-39.



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).