Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization

Version 1: Received: 28 August 2023 / Approved: 28 August 2023 / Online: 29 August 2023 (14:18:04 CEST)

A peer-reviewed article of this preprint also exists:

Liu, C.; Silamu, W.; Li, Y. A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization. Appl. Sci. 2023, 13, 10589.

Abstract

Low-resource languages suffer from a shortage of training data, which leads to poor machine translation quality. One way to address this is data augmentation: creating new data by transforming existing data, for example by flipping, cropping, rotating, or adding noise. In low-resource machine translation, pseudo-parallel corpora have traditionally been generated by randomly replacing individual words. This, however, can introduce ambiguity, since the same word may have different meanings in different contexts. This study instead proposes generating pseudo-parallel corpora by replacing phrases. The approach is compared with other data augmentation methods, and combining it with them is observed to improve performance further. To make the model more robust, R-Drop regularization is also applied; R-Drop is an effective method for improving translation quality. On Chinese-Kazakh (Latin script) translation tasks, the proposed method yields performance improvements of 4.99 and 7.7 for Chinese-to-Kazakh and Kazakh-to-Chinese translation, respectively.
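The phrase-replacement idea can be made concrete with a small sketch. The paper's exact procedure is not reproduced on this page, so the Python below is only a minimal illustration under stated assumptions: a hand-built table of aligned source/target phrase pairs, where PHRASE_TABLE, find_sub, augment_pair, and the example tokens are all hypothetical names and data, not the authors' implementation.

import random

# Hypothetical aligned phrase table for illustration: each aligned
# (source phrase, target phrase) pair maps to alternative aligned pairs
# that can stand in the same slot on both sides at once.
PHRASE_TABLE = {
    (("went", "home"), ("uyge", "bardy")): [
        (("went", "to", "school"), ("mektepke", "bardy")),
    ],
}

def find_sub(seq, sub):
    # Index of the first occurrence of phrase `sub` in token list `seq`,
    # or None if it does not occur.
    n = len(sub)
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n]) == sub:
            return i
    return None

def augment_pair(src, tgt, table=PHRASE_TABLE, p=0.5):
    # Produce a pseudo-parallel pair by swapping one aligned phrase pair.
    # The phrase is replaced on BOTH sides simultaneously, so the new
    # pair stays parallel by construction.
    for (s_phr, t_phr), alternatives in table.items():
        s_idx, t_idx = find_sub(src, s_phr), find_sub(tgt, t_phr)
        if s_idx is not None and t_idx is not None and random.random() < p:
            new_s, new_t = random.choice(alternatives)
            src = src[:s_idx] + list(new_s) + src[s_idx + len(s_phr):]
            tgt = tgt[:t_idx] + list(new_t) + tgt[t_idx + len(t_phr):]
            break
    return src, tgt

# Example: augment_pair("he went home".split(), "ol uyge bardy".split())
# may yield (['he', 'went', 'to', 'school'], ['ol', 'mektepke', 'bardy']).

Replacing the phrase jointly on both sides is what keeps the generated pair parallel; single-word replacement has no such guarantee, which is the ambiguity problem the abstract describes.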
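R-Drop itself is a published regularizer (Liang et al., 2021): the same batch is passed through the model twice, so that dropout produces two different output distributions, and a symmetric KL-divergence term penalizes their disagreement on top of the ordinary cross-entropy loss. A minimal PyTorch-style sketch follows; model, inputs, labels, alpha, and the tensor shapes are illustrative assumptions, not the authors' code.

import torch
import torch.nn.functional as F

def r_drop_loss(model, inputs, labels, alpha=5.0):
    # R-Drop: run the same batch through the model twice; with dropout
    # active (model.train()), the two passes sample two different
    # sub-models and therefore produce two different distributions.
    logits1 = model(inputs)  # (batch, seq, vocab) -- illustrative shapes
    logits2 = model(inputs)

    # Standard cross-entropy on the labels, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1.transpose(1, 2), labels)
                + F.cross_entropy(logits2.transpose(1, 2), labels))

    # Symmetric KL divergence pulls the two predictive distributions
    # toward each other, regularizing away the dropout noise.
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(logp1, logp2, reduction="batchmean", log_target=True)
                + F.kl_div(logp2, logp1, reduction="batchmean", log_target=True))

    return ce + alpha * kl

In training, r_drop_loss would simply replace the usual loss computation, with alpha controlling how strongly the two passes are pulled together.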

Keywords

machine translation; data augmentation; phrase replacement; R-Drop

Subject

Computer Science and Mathematics, Computer Science
