Preprint
Article

This version is not peer-reviewed.

Exploiting Linguistic Knowledge for Low-Resource Neural Machine Translation

Submitted: 29 February 2020

Posted: 02 March 2020


Abstract
Exploiting the linguistic knowledge of the source language for neural machine translation (NMT) has recently achieved impressive performance on many large-scale language pairs. However, the Turkish→English translation task is low-resource and the source-side Turkish is morphologically rich, so only limited bilingual corpora and linguistic resources are available to further improve NMT performance. To address these issues, we propose a multi-source NMT approach that models words and external linguistic features in parallel, using two separate encoders to explicitly incorporate linguistic knowledge into the NMT model. We extend the word embedding layer of the knowledge-based encoder to accommodate each word's linguistic annotations in context. Moreover, we share all parameters across the encoders to strengthen the model's representation of the source language. Experimental results show that our approach achieves substantial improvements of up to 2.4 and 1.1 BLEU in the Turkish→English and English→Turkish translation tasks, respectively, pointing to a promising way of utilizing source-side linguistic knowledge for low-resource NMT.
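For concreteness, the following is a minimal PyTorch sketch of the two-encoder design the abstract describes: a plain word encoder alongside a knowledge-based encoder whose embedding layer is extended with per-word linguistic annotations (e.g., POS and morphology tags), with the recurrent parameters shared between the two. The class names, dimensions, and the choice of a bidirectional GRU are illustrative assumptions, not the paper's exact configuration.

# A minimal sketch of the two-encoder idea from the abstract: one encoder
# reads plain word IDs, a second "knowledge-based" encoder reads the same
# words plus per-word linguistic annotations. Vocabulary sizes, dimensions,
# and the GRU are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn


class KnowledgeAwareEmbedding(nn.Module):
    """Word embedding extended with embeddings of linguistic annotations."""

    def __init__(self, vocab_size, feat_vocab_sizes, word_dim=256, feat_dim=32):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One small embedding table per annotation type (POS, morphology, ...).
        self.feat_embs = nn.ModuleList(
            nn.Embedding(n, feat_dim) for n in feat_vocab_sizes
        )
        # Project the concatenated [word; features] vector back to word_dim
        # so both encoders can share the same recurrent parameters.
        self.proj = nn.Linear(word_dim + feat_dim * len(feat_vocab_sizes), word_dim)

    def forward(self, words, feats):
        # words: (batch, seq_len); feats: (batch, seq_len, num_feature_types)
        parts = [self.word_emb(words)]
        parts += [emb(feats[..., i]) for i, emb in enumerate(self.feat_embs)]
        return self.proj(torch.cat(parts, dim=-1))


class MultiSourceEncoder(nn.Module):
    """Two encoders over the same source sentence with shared RNN weights."""

    def __init__(self, vocab_size, feat_vocab_sizes, hidden=256):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, hidden)
        self.knowledge_embedding = KnowledgeAwareEmbedding(
            vocab_size, feat_vocab_sizes, word_dim=hidden
        )
        # A single bidirectional GRU stands in for "sharing all parameters
        # across encoders" mentioned in the abstract.
        self.shared_rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, words, feats):
        word_states, _ = self.shared_rnn(self.word_embedding(words))
        know_states, _ = self.shared_rnn(self.knowledge_embedding(words, feats))
        # The decoder can attend to both state sequences (here concatenated).
        return torch.cat([word_states, know_states], dim=-1)


# Toy usage: batch of 2 sentences, length 5, with 2 annotation types per word.
enc = MultiSourceEncoder(vocab_size=1000, feat_vocab_sizes=[20, 50])
words = torch.randint(0, 1000, (2, 5))
feats = torch.stack(
    [torch.randint(0, 20, (2, 5)), torch.randint(0, 50, (2, 5))], dim=-1
)
print(enc(words, feats).shape)  # torch.Size([2, 5, 1024])

Projecting the feature-augmented embedding back to the word-embedding dimension is what allows a single recurrent module to serve both encoders; concatenating the two state sequences is a simple stand-in for a decoder that attends over both sources.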
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.