Article
Version 2
This version is not peer-reviewed
Exploiting Linguistic Knowledge for Low-Resource Neural Machine Translation
Version 1 : Received: 18 February 2020 / Approved: 19 February 2020 / Online: 19 February 2020 (10:51:41 CET)
Version 2 : Received: 29 February 2020 / Approved: 2 March 2020 / Online: 2 March 2020 (15:28:34 CET)
How to cite: Pan, Y.; Li, X.; Yang, Y.; Dong, R. Exploiting Linguistic Knowledge for Low-Resource Neural Machine Translation. Preprints 2020, 2020020273
Abstract
Exploiting the linguistic knowledge of the source language for neural machine translation (NMT) has recently achieved impressive performance on many large-scale language pairs. However, the Turkish→English machine translation task is low-resource and the source-side Turkish is morphologically rich, so only limited bilingual corpora and linguistic information are available to further improve NMT performance. To address these issues, we propose a multi-source NMT approach that models the word feature in parallel with external linguistic features, using two separate encoders to explicitly incorporate linguistic knowledge into the NMT model. We extend the word embedding layer of the knowledge-based encoder to accommodate each word’s linguistic annotations in context. Moreover, we share all parameters across encoders to enhance the representation ability of the NMT model on the source language. Experimental results show that our proposed approach achieves substantial improvements of up to 2.4 and 1.1 BLEU points on the Turkish→English and English→Turkish machine translation tasks, respectively, which points to a promising way to utilize source-side linguistic knowledge for low-resource NMT.
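As a rough illustration of the two-encoder idea described in the abstract, the following toy sketch embeds a word sequence and a parallel sequence of linguistic annotations, then passes both through one shared layer. It is not the authors' implementation. The feature types (lemma, part of speech, morphological tag), the concatenation of feature embeddings, the dimensions, the tag labels, and the single `tanh` layer standing in for an encoder are all assumptions made for this example.

```python
import math
import random

random.seed(0)

D_MODEL = 8  # hypothetical model width


def make_table(vocab, dim):
    """Build a random embedding table mapping each symbol to a vector of length dim."""
    return {sym: [random.uniform(-0.1, 0.1) for _ in range(dim)] for sym in vocab}


# Hypothetical vocabularies; feature names and tag labels are illustrative only.
word_table = make_table(["evlerde", "kaldık"], D_MODEL)
feature_tables = {
    "lemma": make_table(["ev", "kal"], 4),
    "pos": make_table(["Noun", "Verb"], 2),
    "morph": make_table(["A3pl+Loc", "Past+A1pl"], 2),
}  # 4 + 2 + 2 == D_MODEL, so both inputs have the same width


def embed_words(tokens):
    """Word encoder input: one embedding per surface word."""
    return [word_table[t] for t in tokens]


def embed_features(annotations):
    """Knowledge encoder input: concatenate one embedding per annotation (assumed)."""
    out = []
    for ann in annotations:
        vec = []
        for name in ("lemma", "pos", "morph"):
            vec.extend(feature_tables[name][ann[name]])
        out.append(vec)
    return out


# One weight matrix used by both encoders, standing in for shared encoder parameters.
shared_w = [[random.uniform(-0.5, 0.5) for _ in range(D_MODEL)] for _ in range(D_MODEL)]


def shared_encoder(vectors):
    """Toy position-wise layer tanh(W x); a real encoder would be recurrent or attention-based."""
    return [
        [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in shared_w]
        for vec in vectors
    ]


tokens = ["evlerde", "kaldık"]
annotations = [
    {"lemma": "ev", "pos": "Noun", "morph": "A3pl+Loc"},
    {"lemma": "kal", "pos": "Verb", "morph": "Past+A1pl"},
]

word_states = shared_encoder(embed_words(tokens))
knowledge_states = shared_encoder(embed_features(annotations))
```

Here `word_states` and `knowledge_states` each hold one vector of width `D_MODEL` per source position. How a decoder would combine the two streams is not stated in the abstract, so the sketch stops at the encoder outputs.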
Keywords
linguistic knowledge; source language; neural machine translation (NMT); low-resource; multi-source NMT
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Comments (1)
Commenter: Yirong Pan
Commenter's Conflict of Interests: Author