Version 1: Received: 13 October 2020 / Approved: 15 October 2020 / Online: 15 October 2020 (13:42:28 CEST)
How to cite:
Belay, B.; Habtegebrial, T.; Belay, G.; Meshesha, M.; Liwicki, M.; Stricker, D. Learning by Injection: Attention Embedded Recurrent Neural Network for Amharic Text-image Recognition. Preprints 2020, 2020100324
APA Style
Belay, B., Habtegebrial, T., Belay, G., Meshesha, M., Liwicki, M., & Stricker, D. (2020). Learning by Injection: Attention Embedded Recurrent Neural Network for Amharic Text-image Recognition. Preprints. https://doi.org/
Chicago/Turabian Style
Belay, B., T. Habtegebrial, G. Belay, M. Meshesha, M. Liwicki, and D. Stricker. 2020. "Learning by Injection: Attention Embedded Recurrent Neural Network for Amharic Text-image Recognition." Preprints. https://doi.org/
Abstract
At present, the growth of digitization and worldwide communication makes OCR systems for exotic languages an increasingly important task. In this paper, we develop an OCR system for one such language with a unique script, Amharic. Motivated by the recent success of the attention mechanism in Neural Machine Translation (NMT), we extend the attention mechanism to Amharic text-image recognition. The proposed model consists of a CNN and an attention-embedded recurrent encoder-decoder network, integrated following the seq2seq framework. The attention network parameters are trained end-to-end, and the context vector is injected, together with the previously predicted output, at each time step of decoding. Unlike existing OCR models that minimize the CTC objective function, the new model minimizes the categorical cross-entropy loss. The proposed attention-based model is evaluated on the test set of the ADOCR database, which consists of both printed and synthetically generated Amharic text-line images, and achieves promising results with CERs of 1.54% and 1.17%, respectively.
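The core mechanism the abstract describes, computing an attention context vector over encoder states and "injecting" it alongside the embedding of the previously predicted character at each decoding step, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes additive (Bahdanau-style) attention and a plain tanh recurrence in place of the paper's recurrent cell, and all function and parameter names (`attention_context`, `decode_step`, `W_e`, `W_d`, `v`, etc.) are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(encoder_states, decoder_state, W_e, W_d, v):
    """Additive attention: score each encoder time step against the
    current decoder state, then return the weighted sum (context)."""
    scores = np.array(
        [v @ np.tanh(W_e @ h + W_d @ decoder_state) for h in encoder_states]
    )
    alpha = softmax(scores)                       # attention weights, sum to 1
    context = (alpha[:, None] * encoder_states).sum(axis=0)
    return alpha, context

def decode_step(prev_embed, context, decoder_state, W_in, W_h, b):
    """One decoding step with 'injection': the context vector is
    concatenated with the embedding of the previous prediction and
    fed into the recurrence together."""
    x = np.concatenate([prev_embed, context])
    return np.tanh(W_in @ x + W_h @ decoder_state + b)
```

At inference time, `decode_step` would be called once per output character, with `prev_embed` taken from the argmax of the previous step's softmax; training would instead minimize categorical cross-entropy against the ground-truth character at each step.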
Computer Science and Mathematics, Algebra and Number Theory
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.