Submitted:
28 August 2023
Posted:
29 August 2023
Abstract
Keywords:
1. Introduction
- Parallel corpus augmentation: We expanded the parallel corpus for the Chinese-Kazakh machine translation task using phrase replacement. The added variation enriches the training data and gives the translation model more information and context to learn from.
- Joint data augmentation: To improve translation quality for low-resource languages, we combined several data augmentation methods. In addition to generating pseudo-parallel corpora through phrase replacement, we employed other methods such as random phrase replacement and deletion flipping. Combining these methods further increased the diversity of the training data and improved the model’s adaptability to varied scenarios.
- R-Drop regularization: We introduced the R-Drop regularization method to make the model more robust. During training, R-Drop passes each input through two sub-models randomly sampled by dropout and keeps their outputs consistent by minimizing the bidirectional Kullback-Leibler (KL) divergence between them. This reduces the inconsistency between the training and inference stages and strengthens the model’s generalization to unseen data.
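The R-Drop objective described in the last item can be sketched for a single prediction step as follows. This is a minimal illustration in plain Python, not the training code used in this work: the function names, the toy logits, and the weight `alpha` are assumptions introduced here, and a real system would compute the same quantities on batched tensors inside the training toolkit.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def r_drop_loss(logits_a, logits_b, target, alpha=1.0):
    """R-Drop style loss for one token.

    logits_a and logits_b stand for the outputs of two forward passes of the
    same input under different dropout masks (hypothetical inputs here).
    alpha is an assumed weight on the consistency term.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    # Negative log-likelihood of the reference token under both passes.
    nll = -math.log(p[target]) - math.log(q[target])
    # Bidirectional KL divergence, averaged over the two directions.
    kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * kl

# Two passes that disagree are penalized more as alpha grows.
pass_a = [2.0, 0.5, -1.0]
pass_b = [1.0, 1.5, -1.0]
loss = r_drop_loss(pass_a, pass_b, target=0, alpha=1.0)
```

When both passes produce identical distributions the KL term vanishes and the loss reduces to twice the usual negative log-likelihood, which is why the regularizer only acts on the disagreement introduced by dropout.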
2. Related Work

3. Method
3.1. Generating a high-quality phrase table
- The phrase translation probability φ(f|e),
- The lexical weighting lex(f|e),
- The inverse phrase translation probability φ(e|f),
- The inverse lexical weighting lex(e|f),
- The phrase penalty, currently always e ≈ 2.718.
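To make the five scores above concrete, the following sketch reads phrase-table lines in the `source ||| target ||| scores` layout used by Moses-style phrase tables and keeps only entries whose first score clears a threshold. The sample lines, the placeholder tokens, the `min_prob` value, and the choice of which score to filter on are all assumptions for illustration; they are not the filtering criteria used in this work.

```python
from typing import NamedTuple, Tuple

class PhraseEntry(NamedTuple):
    source: str
    target: str
    scores: Tuple[float, ...]

def parse_phrase_line(line):
    """Split one 'source ||| target ||| scores' line into its fields."""
    fields = [f.strip() for f in line.split("|||")]
    scores = tuple(float(s) for s in fields[2].split())
    return PhraseEntry(fields[0], fields[1], scores)

def filter_entries(lines, min_prob=0.5, prob_index=0):
    """Keep entries whose score at prob_index is at least min_prob.

    prob_index=0 selects the first score in the list above; both
    parameters are hypothetical knobs, not values from the paper.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_phrase_line(line)
        if entry.scores[prob_index] >= min_prob:
            kept.append(entry)
    return kept

# Placeholder phrase pairs; the last score is the constant phrase penalty.
SAMPLE = [
    "f1 f2 ||| e1 e2 ||| 0.8 0.6 0.7 0.5 2.718",
    "f1 f2 ||| e3 ||| 0.1 0.05 0.2 0.1 2.718",
]
high_quality = filter_entries(SAMPLE)
```

Filtering on a single probability is the simplest possible criterion; combining several of the scores is equally plausible, and the sketch leaves that choice open through `prob_index`.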

3.2. Generating pseudo-parallel data using a phrase table
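As a rough illustration of how a phrase table can yield pseudo-parallel data, the sketch below swaps one phrase pair for another on both sides of a tokenized sentence pair. It is one plausible reading of phrase substitution, not necessarily the procedure used in this work: the helper names, the placeholder tokens, and the way the replacement pair is chosen (here simply passed in by the caller) are assumptions.

```python
def find_sublist(tokens, phrase):
    """Return the start index of phrase inside tokens, or -1 if absent."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i
    return -1

def substitute_phrase(src_tokens, tgt_tokens, old_pair, new_pair):
    """Replace old_pair with new_pair in a sentence pair.

    old_pair and new_pair are (source phrase, target phrase) strings, e.g.
    taken from a filtered phrase table. Returns None when the old phrase
    is missing on either side, so the sentence pair stays consistent.
    """
    old_src, old_tgt = old_pair[0].split(), old_pair[1].split()
    new_src, new_tgt = new_pair[0].split(), new_pair[1].split()
    i = find_sublist(src_tokens, old_src)
    j = find_sublist(tgt_tokens, old_tgt)
    if i < 0 or j < 0:
        return None
    new_source = src_tokens[:i] + new_src + src_tokens[i + len(old_src):]
    new_target = tgt_tokens[:j] + new_tgt + tgt_tokens[j + len(old_tgt):]
    return new_source, new_target

# Placeholder tokens stand in for real Chinese and Kazakh words.
pair = substitute_phrase(
    "a b c d".split(), "w x y".split(),
    old_pair=("b c", "x"), new_pair=("p", "q r"),
)
```

Replacing the phrase on both sides at once keeps the new sentence pair aligned, which is the property that lets it serve as additional training data.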
3.3. Utilizing R-Drop for Model Fine-tuning

4. Experiments
4.1. Data and preprocessing
4.2. System Environment and Model Parameters
4.3. Results and discussion
4.3.1. Baseline
4.3.2. Result
| Method | Zh-Kk BLEU | Zh-Kk chrF++ | Kk-Zh BLEU | Kk-Zh chrF++ |
| Transformer | 49.47 | 0.745 | 52.04 | 0.463 |
| Back-translation | 49.93 | 0.746 | 54.21 | 0.478 |
| Replace | 49.90 | 0.747 | 57.15 | 0.514 |
| Token | 49.26 | 0.744 | 56.99 | 0.512 |
| Swap | 48.91 | 0.742 | 57.15 | 0.513 |
| Source | 48.93 | 0.742 | 52.14 | 0.462 |
| Reverse | 49.81 | 0.750 | 57.85 | 0.516 |
| Phrase-substitution | 50.15 | 0.752 | 57.35 | 0.514 |
4.3.3. Combining multiple augmentation methods
| Method | Zh-Kk BLEU | Zh-Kk chrF++ | Kk-Zh BLEU | Kk-Zh chrF++ |
| Phrase-rep.+Rev. | 50.42 | 0.756 | 58.74 | 0.530 |
| Phrase-rep.+Rev.+Token | 50.55 | 0.756 | 58.99 | 0.532 |
| Phrase-rep.+Rev.+Swap | 51.46 | 0.761 | 58.58 | 0.527 |
4.3.4. Fine-tuning using R-Drop
| | Zh-Kk BLEU | Zh-Kk chrF++ | Zh-Kk Time | Kk-Zh BLEU | Kk-Zh chrF++ | Kk-Zh Time |
| 0.3 | 53.56 | 0.771 | 3.11h | 59.53 | 0.539 | 4.49h |
| 0.4 | 54.46 | 0.777 | 4.48h | 60.15 | 0.544 | 5.46h |
| 0.5 | 54.45 | 0.776 | 5.35h | 59.59 | 0.538 | 3.35h |
| 0.6 | 54.29 | 0.774 | 5.96h | 59.28 | 0.535 | 3.05h |
| 0.7 | 54.44 | 0.777 | 6.38h | 59.74 | 0.539 | 5.64h |
4.4. Qualitative analysis
should be
(Occupying the road for business activities), some translation errors need to be corrected. The phrase-substitution method improves on the baseline translation by fixing vocabulary choices and translation errors; its output is closer to the source sentence and conveys the original meaning better. The mixed enhancement method makes the translation more accurate and fluent still, staying closer to the source sentence in grammar, structure, and vocabulary while remaining natural. The mixed enhancement+R-Drop method further simplifies the translation by omitting redundant words, making it more concise.

5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).