Submitted: 05 April 2026
Posted: 07 April 2026
Abstract
Keywords:
1. Introduction
- We present GaroNMT, the first publicly available neural machine translation system for Garo, supporting bidirectional translation between Garo (Latin script) and English.
- We release a curated gold parallel corpus of 15,441 sentence pairs spanning mixed domains, building on an earlier 2,500-pair release (Labs 2025).
- We conduct a systematic ablation study across six configurations examining zero-shot performance, the effect of LLM-based backtranslation augmentation, and continued pretraining on monolingual Garo data.
- We document a Garo-specific evaluation artefact: inconsistent use of two visually near-identical interpunct code points, U+00B7 (·, MIDDLE DOT) and U+2219 (∙, BULLET OPERATOR), artificially suppresses automatic MT metrics, so the two must be normalised to a single code point before evaluation.
- We report a null result for continued pretraining (CPT): monolingual pretraining converges within one epoch and provides no downstream translation benefit, a finding relevant to practitioners working in similar low-resource settings.
- We introduce a GaroBERT-guided agentic reranking layer over GaroNMT, the first such system demonstrated for a Northeast Indian language.
- We release LaBSE-Garo, a fine-tuned cross-lingual embedding model achieving 89.8% mean translation retrieval accuracy on Garo-English pairs, enabling future corpus mining and quality estimation work for Garo.
- All models will be released under CC-BY-4.0 upon acceptance.
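The retrieval accuracy quoted for LaBSE-Garo is not defined in this section. One common reading of translation retrieval accuracy is top-1 nearest-neighbour accuracy under cosine similarity, averaged over both retrieval directions. The sketch below assumes that definition; the toy vectors stand in for real sentence embeddings, and `retrieval_accuracy` and `mean_bidirectional_accuracy` are hypothetical helper names, not part of any released code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieval_accuracy(queries, candidates):
    """Fraction of queries whose most similar candidate has the same index.

    Assumes queries[i] and candidates[i] embed a gold-aligned sentence pair.
    """
    hits = 0
    for i, query in enumerate(queries):
        best = max(range(len(candidates)),
                   key=lambda j: cosine(query, candidates[j]))
        hits += int(best == i)
    return hits / len(queries)

def mean_bidirectional_accuracy(src, tgt):
    """Average of src->tgt and tgt->src top-1 accuracy (assumed definition)."""
    return 0.5 * (retrieval_accuracy(src, tgt) + retrieval_accuracy(tgt, src))

# Toy 3-dimensional "embeddings" for three aligned Garo-English pairs.
garo_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.6, 0.6, 0.1]]
eng_vecs = [[0.9, 0.2, 0.1], [0.1, 0.9, 0.1], [0.1, 0.2, 1.0]]

score = mean_bidirectional_accuracy(garo_vecs, eng_vecs)
print(f"mean retrieval accuracy: {score:.3f}")
```

In this toy set the third pair is deliberately misaligned in embedding space, so two of three queries are retrieved correctly in each direction.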
2. Background
2.1. The Garo Language
2.2. NLP for Northeast Indian Languages
2.3. Low-Resource Neural Machine Translation
3. Data
3.1. Gold Parallel Corpus
3.2. Backtranslation Data
3.3. Monolingual Data
4. Model and Training
4.1. Base Model and Tokenizer
4.2. Training Configurations
Zero-shot Fresh (ZS-F).
Zero-shot CPT (ZS-C).
A1 – Fresh base, gold only.
A2 – CPT base, gold only.
B1 – Fresh base, BT + gold.
B2 – CPT base, BT + gold.
4.3. Evaluation Protocol and Unicode Normalisation
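The interpunct artefact listed among the contributions (U+00B7 versus U+2219) can be removed with a single substitution applied to both hypotheses and references before scoring. The following is a minimal sketch, assuming U+00B7 is chosen as the canonical form; the text here does not say which code point the authors normalise to, and `normalise_interpunct` is a hypothetical helper name.

```python
MIDDLE_DOT = "\u00b7"       # ·  assumed canonical Garo interpunct
BULLET_OPERATOR = "\u2219"  # ∙  visually similar variant

def normalise_interpunct(text: str, target: str = MIDDLE_DOT) -> str:
    """Map both interpunct variants to one code point before scoring."""
    return text.replace(BULLET_OPERATOR, target).replace(MIDDLE_DOT, target)

# The same sentence typed with two different interpunct code points.
ref = "Na\u00b7a mi cha\u00b7ahama?"
hyp = "Na\u2219a mi cha\u2219ahama?"

print(hyp == ref)                                              # strings differ
print(normalise_interpunct(hyp) == normalise_interpunct(ref))  # strings agree
```

Without this step, a token such as `Na∙a` shares no exact n-gram with `Na·a`, so string-matching metrics treat an otherwise identical output as a mismatch.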
5. Results
In-domain results (en→grt and grt→en):

| Run | Data | en→grt BLEU | en→grt ChrF++ | en→grt TER↓ | en→grt METEOR | grt→en BLEU | grt→en ChrF++ | grt→en TER↓ | grt→en METEOR |
|---|---|---|---|---|---|---|---|---|---|
| ZS-F | — | 0.23 | 8.95 | 221.27 | 0.056 | 0.56 | 14.78 | 148.81 | 0.087 |
| ZS-C | Mono CPT | 0.69 | 11.42 | 136.04 | 0.064 | 0.71 | 12.74 | 106.61 | 0.056 |
| A1 | Gold | 12.31 | 48.57 | 80.37 | 0.290 | 27.36 | 47.34 | 63.04 | 0.450 |
| A2 | CPT + Gold | 12.68 | 48.44 | 79.45 | 0.292 | 25.76 | 46.02 | 66.09 | 0.437 |
| B1 | BT + Gold | 14.06 | 51.38 | 75.70 | 0.321 | 29.50 | 49.23 | 59.88 | 0.485 |
| B2 | CPT + BT + Gold | 13.25 | 51.26 | 75.08 | 0.314 | 29.36 | 49.36 | 60.20 | 0.483 |

| Run | en→grt BLEU | en→grt ChrF | grt→en BLEU | grt→en ChrF |
|---|---|---|---|---|
| B1 | 16.50 | 54.52 | 45.37 | 60.15 |
| B2 | 14.55 | 52.88 | 45.32 | 60.55 |
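To read the ablation off the in-domain table, it can help to compute per-metric differences between configurations. The sketch below copies the BLEU and ChrF++ figures from that table and subtracts one run from another; `SCORES` and `delta` are hypothetical names introduced only for this illustration, and no significance testing is implied.

```python
# (BLEU, ChrF++) per direction, copied from the in-domain results table.
SCORES = {
    "A1": {"en-grt": (12.31, 48.57), "grt-en": (27.36, 47.34)},
    "A2": {"en-grt": (12.68, 48.44), "grt-en": (25.76, 46.02)},
    "B1": {"en-grt": (14.06, 51.38), "grt-en": (29.50, 49.23)},
    "B2": {"en-grt": (13.25, 51.26), "grt-en": (29.36, 49.36)},
}

def delta(run: str, baseline: str, direction: str) -> tuple:
    """Return (BLEU, ChrF++) differences of `run` minus `baseline`."""
    return tuple(
        round(r - b, 2)
        for r, b in zip(SCORES[run][direction], SCORES[baseline][direction])
    )

for direction in ("en-grt", "grt-en"):
    print(direction, "BT effect  (B1 - A1):", delta("B1", "A1", direction))
    print(direction, "CPT effect (A2 - A1):", delta("A2", "A1", direction))
```

On these figures, adding backtranslated data (B1 versus A1) raises both metrics in both directions, while continued pretraining alone (A2 versus A1) moves en→grt BLEU by well under one point and lowers both grt→en metrics.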
Zero-shot baselines.
BT augmentation helps consistently.
CPT is a null result.
B1 vs. B2.
5.1. Qualitative Analysis
6. Discussion
6.1. The Unicode Interpunct Evaluation Artefact
6.2. On the CPT Null Result
6.3. Deployment Context
6.4. GaroBERT-Guided Agentic Reranking
7. Limitations
OOD reference quality.
Single-reference evaluation.
BT quality.
Script coverage.
Human evaluation.
8. Future Work
9. Conclusion
Acknowledgments
Appendix A. Reference Inadequacy: Three Systematic Failure Modes
Appendix A.1. Native Vocabulary vs. Loanwords
| English | I am going to the market. |
| Reference | Anga bazarona re·angenga. |
| B1 Prediction | Anga antiona re·angenga. |
| Analysis | bazar is a Hindi/Assamese loanword; anti is the native Garo word for market. The model is more linguistically faithful; BLEU assigns zero credit for the content word. |
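The effect on BLEU can be made concrete with the modified (clipped) n-gram precision that BLEU is built from (Papineni et al. 2002). The sketch below is a simplified illustration, assuming whitespace tokenisation and a single reference; it is not the scoring pipeline used for the reported results, and `clipped_precision` is a hypothetical helper name.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis: str, reference: str, n: int) -> float:
    """Modified n-gram precision against one reference (whitespace tokens)."""
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total

reference = "Anga bazarona re·angenga."
prediction = "Anga antiona re·angenga."

p1 = clipped_precision(prediction, reference, 1)
p2 = clipped_precision(prediction, reference, 2)
print(f"unigram precision: {p1:.3f}, bigram precision: {p2:.3f}")
```

Because the substituted word sits in the middle of a three-token sentence, it breaks every bigram as well as one unigram, which is why a single valid lexical choice costs so much on short sentences.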
Appendix A.2. Semantic Precision: Eating vs. Drinking
| English | Have you eaten rice? |
| Reference | Na·a mi chi·ahama? |
| B1 Prediction | Na·a mi cha·ahama? |
| Analysis | chi·a (consume liquid) is semantically imprecise for eating rice. cha·a (eat solid food) is the contextually correct verb. The model is more accurate; BLEU penalises it for not reproducing the reference’s imprecision. |
Appendix A.3. Politeness and Register
| English | Please sit down. |
| Reference | Asongbo. |
| B1 Prediction | Ka·sapae asongbo. |
| English | Please speak slowly. |
| Reference | Saksate saksate nengmitaniko on·bo. |
| B1 Prediction | Ka·sapae saksate nengmitaniko on·bo. |
| Analysis | In both cases the reference omits the politeness marker. The model adds it correctly, producing a more faithful translation of the English source. BLEU penalises the additional token despite it being semantically necessary. |
References
- Adelani, David Ifeoluwa, Jesujoba Alabi, Angela Fan, et al. 2022. A few thousand translations go a long way! Leveraging pre-trained models for African news translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle, United States: Association for Computational Linguistics, pp. 3053–3070.
- Eberhard, David M., Gary F. Simons, and Charles D. Fennig. 2023. Ethnologue: Languages of the world.
- Feng, Fangxiaoyu, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2022. Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics, pp. 878–891.
- Gala, Jay, Pranjal A Madhani, Mitesh M Khapra, et al. 2023. IndicTrans2: Towards high-quality and accessible machine translation models for all 22 scheduled Indian languages. Transactions on Machine Learning Research.
- Labs, MWire. 2025. garo-english-parallel-corpus (revision 66dbd04).
- Müller, Benjamin, Antonios Anastasopoulos, Benoît Sagot, and Djamé Seddah. 2021. Unseen languages are not unseen: Language families are enough for cross-lingual transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, August, pp. 4795–4806.
- NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia-Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. 2022. No language left behind: Scaling human-centered machine translation. arXiv:2207.04672.
- Nyalang, Badal. 2026. NE-BERT: A multilingual language model for nine Northeast Indian languages. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026). Rabat, Morocco: Association for Computational Linguistics, March, pp. 1–12.
- Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, July, pp. 311–318.
- Popović, Maja. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation. Lisbon, Portugal: Association for Computational Linguistics, September, pp. 392–395.
- Post, Matt. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers. Brussels, Belgium: Association for Computational Linguistics, October, pp. 186–191.
- Sennrich, Rico, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, August, pp. 86–96.
| Hyperparameter | Value |
|---|---|
| Base model | NLLB-200-distilled-600M |
| Epochs | 10 |
| Batch size (per device) | 32 |
| Gradient accumulation | 2 (effective: 64) |
| Learning rate | 5e-5 |
| Warmup steps | 200 |
| Precision | bf16 |
| Max sequence length | 128 tokens |
| Beam size (decoding) | 5 |
| Random seed | 42 |
| Hardware | NVIDIA A40 (48 GB) |
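The hyperparameters above can be collected into a small configuration object, which also makes the effective batch size arithmetic explicit. This is a sketch only: the class and field names are hypothetical, a single GPU is assumed (the table lists one hardware type but not a device count), and the example size passed to `optimizer_steps_per_epoch` is the full gold corpus count used as a stand-in, since split sizes are not given here.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FineTuneConfig:
    """Values copied from the hyperparameter table; names are illustrative."""
    base_model: str = "NLLB-200-distilled-600M"
    epochs: int = 10
    per_device_batch_size: int = 32
    gradient_accumulation_steps: int = 2
    learning_rate: float = 5e-5
    warmup_steps: int = 200
    precision: str = "bf16"
    max_sequence_length: int = 128
    beam_size: int = 5
    seed: int = 42
    num_devices: int = 1  # assumption: not stated in the table

    @property
    def effective_batch_size(self) -> int:
        # Examples consumed per optimizer update.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

def optimizer_steps_per_epoch(num_examples: int, cfg: FineTuneConfig) -> int:
    """Number of optimizer updates needed to see every example once."""
    return math.ceil(num_examples / cfg.effective_batch_size)

cfg = FineTuneConfig()
print("effective batch size:", cfg.effective_batch_size)
# 15,441 is the full gold corpus size, used here only as a stand-in.
print("steps per epoch:", optimizer_steps_per_epoch(15_441, cfg))
```

Under these assumptions the 200 warmup steps would complete within the first epoch of gold-only training; with backtranslated data added, the number of steps per epoch would be correspondingly larger.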
| English | Reference | B1 Prediction | Note |
|---|---|---|---|
| Farmers work until night. | Gamgiparang walona kingking kam ka·a. | Gamgiparang walona kingking kam ka·a. | Exact match |
| I went to the Army Camp. | Anga Army Camp-on re·angaha. | Anga Army Camp-ona re·angaha. | Interpunct variant only |
| She was very happy to see her family. | Ua gisinni gipinrangko nangna pilaknan man·a. | Ua gisinni gipin-rangko nangan dake man·a. | Valid morphological alt. |
| Please sit down. | Asongbo. | Ka·sapae asongbo. | Model > reference |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).