Submitted: 01 May 2025
Posted: 06 May 2025
Abstract
Keywords:
1. Introduction
<Slavic> <51.14> How are you? – Jak se máš? (Czech) (Training data)
<Slavic> <56.38> How are you? – Как ты? (Russian) (Training data)
<Slavic> <54.26> How are you? – (Model generated Slavic language)
<Slavic> <54.26> How are you? – Як у цябе справы? (Belarusian) (Reference)
<Albanian> <41.20> How are you? – Si jeni? (Albanian) (Training data)
<Slavic> <42.23> How are you? – Как си? (Bulgarian) (Training data)
<Romance> <44.26> How are you? – Ce mai faci? (Romanian) (Training data)
<Germanic> <43.23> How are you? – (Model generated hypothetical Balkan Germanic)
(Expected potential model functionality)
2. Background
3. Training Data
(’<Indo-European>’, ’<Slavic>’, ’<Cyrillic>’, [55.8, 37.6], "After the war, Pavel was only able to work in agriculture...", После войны Павел мог работать только в сельском хозяйстве...)
(’<Afro-Asiatic>’, ’<Semitic>’, ’<Georgian>’, [24.7, 46.7], "After the war, Fanno was only able to work in agriculture...", ბდ ლჰრბ, კნ ფნწ ფქთ...)
(’<Indo-European>’, ’<Slavic>’, ’<Latin>’, [55.8, 37.6], "According to Guerraggio & Nastasi (page 9, 2005) Luigi Cremona is considered the founder of the Italian school of algebraic geometry.", Soglasno Gverradzhio i Nastasi (str. 9, 2005) Luidzhi Kremona schitaetsia osnovatelem ital’ianskoi shkoly algebraicheskoi geometrii.)
(’<Indo-European>’, ’<Indic>’, ’<Greek>’, [28.6, 77.2], "They respect traditional values.", βε πααρΝπρικ μυυλυοΝ καα σμμααν κρτε hαιΝ /)
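The tuples above pair a language family, a branch, a script, a coordinate, an English sentence, and a target sentence. As a minimal sketch of how one such record could be held in code and flattened into a tagged input string, consider the following. The class name `TrainingExample`, the method `to_model_input`, and the exact tag layout are assumptions made for illustration; the source does not specify how the fields are serialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    """One parallel sentence with its control attributes (hypothetical container)."""
    family: str   # e.g. "Indo-European"
    branch: str   # e.g. "Slavic"
    script: str   # e.g. "Cyrillic"
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    source: str   # English sentence
    target: str   # sentence in the target language and script

    def to_model_input(self) -> str:
        # Assumed layout: tags first, then coordinates, then the source text.
        tags = f"<{self.family}> <{self.branch}> <{self.script}>"
        coords = f"<{self.lat:.1f}> <{self.lon:.1f}>"
        return f"{tags} {coords} {self.source}"


example = TrainingExample(
    "Indo-European", "Slavic", "Cyrillic", 55.8, 37.6,
    "After the war, Pavel was only able to work in agriculture...",
    "После войны Павел мог работать только в сельском хозяйстве...",
)
print(example.to_model_input())
```

Keeping the attributes as separate fields makes it straightforward to recombine them, for instance pairing a Slavic target with a Latin script tag as in the third tuple above.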
4. Experimental Setup
5. Results
5.1. Language Clustering
5.2. Germanic Interpolation
(59.33, 18.07) | Germanic (Latin): Forskaren presenterade en framgångsrik teori under den internationella fysikkonferensen idag. (Swedish)
(57.46, 16.79) | Germanic (Latin): Forskeren præsenterede en fremragende teori under den internationale fysikkonference i dag. (Closer to Danish)
(55.80, 15.65) | Germanic (Latin): Den videnskabelige præsenterede en fremragende teori under den internationale fysikkonference i dag.
(52.91, 13.67) | Germanic (Latin): Der Wissenschaftler präsentierte heute auf der internationalen Physikskonferenz eine gründliche Theorie. (German)
(57.85, 17.06) | Germanic (Latin): Forskaren presenterade en nystartad teori under den internationella fysikkonferensen idag.
(56.80, 16.34) | Germanic (Latin): Videnskabsmanden præsenterede en fremragende teori under den internationale fysikkonference i dag.
(55.80, 15.65) | Germanic (Latin): Den videnskabelige præsenterede en fremragende teori under den internationale fysikkonference i dag.
(54.51, 14.77) | Germanic (Latin): Der Wissenschaftler præsenterede i dag på den internationale fysikkonference en fremragende teori.
(54.30, 14.62) | Germanic (Latin): Der Wissenschaftler präsentierte heute auf der internationalen Physikskonferenz eine gründliche Theorie
(53.85, 14.31) | Germanic (Latin): Der Wissenschaftler stellte heute auf der internationalen Physikskonferenz eine gründliche Theorie vor
(53.66, 14.19) | Germanic (Latin): Der Wissenschaftler präsentierte heute auf der internationalen Physikskonferenz eine herausragende Theorie.
(53.42, 14.02) | Germanic (Latin): Der Wissenschaftler stellte heute auf der internationalen Physikskonferenz eine gründliche Theorie vor
(52.91, 13.67) | Germanic (Latin): Der Wissenschaftler präsentierte heute auf der internationalen Physikskonferenz eine gründliche Theorie
(52.52, 13.41) | Germanic (Latin): Der Wissenschaftler stellte heute eine gründliche Theorie auf der internationalen Physikskonferenz vor.
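The coordinates listed above appear to lie on a straight line in latitude/longitude space between (59.33, 18.07) and (52.52, 13.41). The sketch below shows one way such intermediate query points could be produced by plain linear interpolation. The function names `interpolate` and `path` are hypothetical, and the even spacing used here is an assumption; the points in the listing are not evenly spaced.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def interpolate(start: Coord, end: Coord, t: float) -> Coord:
    """Return the point a fraction t (0 <= t <= 1) of the way from start to end."""
    lat = start[0] + t * (end[0] - start[0])
    lon = start[1] + t * (end[1] - start[1])
    return (round(lat, 2), round(lon, 2))


def path(start: Coord, end: Coord, steps: int) -> List[Coord]:
    """Return `steps` evenly spaced points from start to end (steps must be >= 2)."""
    return [interpolate(start, end, i / (steps - 1)) for i in range(steps)]


# Endpoints taken from the first and last rows of the listing above.
route = path((59.33, 18.07), (52.52, 13.41), steps=5)
for lat, lon in route:
    print(f"({lat:.2f}, {lon:.2f})")
```

Linear interpolation in degrees ignores the curvature of the Earth, which is a reasonable simplification over a span of a few hundred kilometres such as this one.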
5.3. Balkan Germanic Experiment
(44.4, 26.1) | Germanic (Latin): O epistemones parousiase mia prootupse theoria kata te diarkeia tes diethnes sunanteseis phusikes.
(44.4, 26.1) | Germanic (Cyrillic): Научниците представят нова теория по време на международната конференция по физиката днес.
(43.0, 23.9) | Germanic (Latin): O epistemones parousiase mia proothetike theoria kata te diarkeia tes diethnes sunergasias phusikes semera.
(43.0, 23.9) | Germanic (Cyrillic): Епистемонистът епхересе миа протхесена теориа по време на мегалутера конференция по физиката.
5.4. Slavic Interpolation
(49.8, 15.5) | Slavic (Latin): Vědec předložil novou teorii na mezinárodní fyzikální konferenci dnes.
(49.8, 15.5) | Slavic (Cyrillic): Vědci prezentoval novou theoriiu na mezinárodní fyzikální konferenci dnes.
(51.0, 19.9) | Slavic (Latin): Naukowiec przedstawił nową teorię na konferencji fizyki.
(51.0, 19.9) | Slavic (Cyrillic): Вчені представили перехідну теорію на сьогоднішній міжнародної конференції з фізики.
(54.6, 33.2) | Slavic (Latin): Uchenik predstavil peredovuiu teoriiu na mezhdunarodnoi konferentsii po fiziki.
(54.6, 33.2) | Slavic (Cyrillic): Ученый представил переломную теорию на международной конференции по физике.
(55.8, 37.6) | Slavic (Latin): Uchenik predstavil novuiu teoriiu na mezhdunarodnoi konferentsii po fizike.
(55.8, 37.6) | Slavic (Cyrillic): Ученый представил переломную теорию на международной конференции по физике.
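When reading outputs at intermediate coordinates, it can help to know which of the queried points a given location is geographically closest to. The sketch below uses the haversine great-circle formula for that purpose. It is an illustrative aid only, not part of the described model; the names `haversine_km`, `nearest`, and `QUERY_POINTS` are hypothetical, and the Earth radius of 6371 km is the usual mean value.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# The four coordinates queried in the listing above.
QUERY_POINTS: List[Coord] = [(49.8, 15.5), (51.0, 19.9), (54.6, 33.2), (55.8, 37.6)]


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest(query: Coord, points: List[Coord] = QUERY_POINTS) -> Coord:
    """Return the listed point closest to the query coordinate."""
    return min(points, key=lambda p: haversine_km(query, p))


print(nearest((50.0, 16.0)))
print(round(haversine_km(QUERY_POINTS[0], QUERY_POINTS[-1])))
```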
5.5. Transliteration Capabilities
(55.8, 37.6) | Slavic (Latin): Uchenik predstavil novuiu teoriiu na mezhdunarodnoi konferentsii po fizike.
Wqd qtrH l`lm nZry@ ry’ysy@ khll mnshr fyzyky ldwly l’ym
Zhe Ge Zhu Yao Huan Zai Guo Jian Ji Zhong Xin Jiao Zhong De Yi Ge Zong Liao Yi Xie Zhi
German (Cyrillic): (52.5, 13.4) | Germanic (Cyrillic): Der Wissenschaftler stellte heute auf der internationalen Physik-Konferenz eine gründliche Theorie vor.
Russian (Hanzi): (55.8, 37.6) | Slavic (Hanzi): 今日,科学家在国际物……性的理论。
(55.8, 37.6) | Semitic (Arabic): Tieteilijä esitti nykyajan teorian kansainvälisessä fysiikan konferenssissa.
Afro-Asiatic Slavic (Latin): Napriek tomu, táto vedecká predstavila novodobú teóriu na medzinárodnej fyzikálnej konferencii.
Afro-Asiatic Slavic (Cyrillic): A tudós átfogó elméletet prezentált a mai nemzetközi fizikai konferencián.
Afro-Asiatic Slavic (Arabic): Naprikes, a tudós prezentálta újonnanou teóriát a mai nemzetközi fizikai konferencián.
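For comparison with the script-conditioned outputs above, a rule-based transliteration can be sketched with a simple character table. The mapping below is a partial, hand-written table chosen to resemble the romanization visible in the Slavic (Latin) outputs (for example `teoriiu` and `konferentsii`). It is an assumption for illustration, not the tool used to build the training data, and the input sentence is assembled for this example. The names `CYR_TO_LAT` and `transliterate` are hypothetical.

```python
# Partial Cyrillic-to-Latin table: only the letters needed for the example below.
CYR_TO_LAT = {
    "а": "a", "в": "v", "д": "d", "е": "e", "ж": "zh", "з": "z",
    "и": "i", "й": "i", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "ц": "ts", "ч": "ch", "ы": "y", "ю": "iu",
}


def transliterate(text: str) -> str:
    """Map each Cyrillic letter to Latin; leave unknown characters unchanged."""
    out = []
    for ch in text:
        latin = CYR_TO_LAT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, punctuation, unmapped letters
        elif ch.isupper():
            out.append(latin.capitalize())  # preserve sentence-initial capitals
        else:
            out.append(latin)
    return "".join(out)


sentence = "Ученый представил новую теорию на международной конференции по физике."
print(transliterate(sentence))
```

A table like this is deterministic and reversible only for scripts with near one-to-one letter correspondences, which is one reason a learned model may be preferred for pairs such as Russian to Hanzi.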
6. Conclusion and Future Work
- Increasing the volume and diversity of data
- Training from scratch with explicit tag-based and coordinate-aware supervision
- Incorporating more low-resource languages, such as Kalmyk (a Mongolic language in Europe), Ossetian (an Iranian language in the Caucasus), or Sorbian (a Slavic language in Germany)
- Including extinct languages with written records, like Gothic (a Germanic language once spoken in Southern Europe and Crimea)
- Expanding to the Austronesian family, with its vast geographic dispersion
- Adding additional linguistic dimensions, such as grammatical categories (tense, aspect, voice), sociolinguistic registers, or genre tags


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).