Preprint · Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Large Scale Speech Recognition for Low Resource Language Amharic, an End-to-End Approach

Version 1 : Received: 14 February 2024 / Approved: 14 February 2024 / Online: 15 February 2024 (02:11:18 CET)

How to cite: Ejigu, Y.A.; Asfaw, T.T. Large Scale Speech Recognition for Low Resource Language Amharic, an End-to-End Approach. Preprints 2024, 2024020813. https://doi.org/10.20944/preprints202402.0813.v1

Abstract

Speech recognition, or automatic speech recognition (ASR), is a technology that converts spoken language into text. Conventional ASR systems, however, comprise several distinct components: language, acoustic, and pronunciation models built on hand-crafted dictionaries. This modular approach is time-consuming to build and can degrade performance. In this study, we propose a method that streamlines the speech recognition process into a single end-to-end architecture. Our model combines a convolutional neural network (CNN) front end with a recurrent neural network (RNN) and is trained with a connectionist temporal classification (CTC) loss function. Experiments were carried out on a dataset of 576,656 valid sentences, augmented with erosion techniques. Evaluated on the word error rate (WER) metric, the model achieved a WER of 2%. This approach has significant implications for speech recognition: it removes the need for labor-intensive dictionary creation, improves the efficiency and accuracy of ASR systems, and makes them more applicable to real-world scenarios. For future work, we recommend adding dialectal and spontaneous speech to the dataset to broaden the model's adaptability. Additionally, fine-tuning the model for specific tasks or domains can further optimize its performance in those areas.
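To illustrate two ingredients named in the abstract, the following is a minimal, self-contained sketch of CTC greedy decoding (collapse repeated frame labels, then drop blanks) and of the word error rate metric computed as a word-level Levenshtein distance. The function names and the toy inputs are illustrative assumptions, not taken from the paper.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then remove CTC blank symbols."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)


# Toy usage: frame-wise argmax labels from a CTC model, 0 = blank
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 0, 1]))  # → [1, 2, 1]
print(wer("a b c d", "a x c"))  # 1 substitution + 1 deletion → 0.5
```

In a full system, `ctc_greedy_decode` would run on the per-frame argmax of the network's output distribution; beam-search decoding, often with a language model, is the usual refinement over this greedy variant.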

Keywords

Automatic speech recognition; Convolutional Neural Network; Connectionist Temporal Classification; End-to-End; Neural network; Erosion; Recurrent Neural Network

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning

