Submitted: 03 April 2026
Posted: 06 April 2026
Abstract
Keywords:
1. Introduction
Objective of the Study
2. Methodology
2.1. Limitation
2.2. Models Overview
2.3. Metadata Analysis of Multilingual Afrocentric Models from 2018 to 2024
3. Multilingual Testing Proficiency of Cheetah on Ibibio and Margi: A Big Data Analysis of Linguistic Stratification
3.1. Prompt-Based Testing and Linguistic Tasks
- Language Identification (LID): The model’s ability to correctly classify the input language.
- Basic Sentence Translation (L1 → English): Assessing cross-lingual transfer and semantic preservation.
- Text Continuation: Evaluating the model’s grasp of discourse coherence and syntactic structure.
- Paraphrase Generation: Measuring semantic flexibility and lexical diversity.
- Question Answering (Closed-domain): Testing grounded understanding and information retrieval.
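The five tasks above can be sketched as a small set of prompt templates applied to each test sentence. This is a minimal illustration, not the study's actual harness: only the "Identify the language:" prefix is visible in the reported model outputs, so every other template string, the `TASK_TEMPLATES` name, and the `build_prompts` helper are assumptions introduced here.

```python
# Hypothetical prompt templates for the five evaluation tasks.
# Only the "Identify the language:" prefix appears in the reported outputs;
# all other template strings are illustrative assumptions.
TASK_TEMPLATES = {
    "language_identification": "Identify the language: {text}",
    "translation": "Translate to English: {text}",
    "text_continuation": "Continue the text: {text}",
    "paraphrase": "Paraphrase: {text}",
    "question_answering": "Answer the question: {text}",
}


def build_prompts(text: str) -> dict[str, str]:
    """Return one prompt per task for a single input sentence."""
    return {
        task: template.format(text=text)
        for task, template in TASK_TEMPLATES.items()
    }


if __name__ == "__main__":
    # Margi test sentence taken from the results section.
    sample = "Abar pidar nyi mji Margi. Nyai lapya?"
    for task, prompt in build_prompts(sample).items():
        print(f"{task}: {prompt}")
```

Keeping the templates in one mapping means the same sentence is presented identically across languages, so differences in output can be attributed to the language rather than to the prompt wording.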
3.2. Empirical Results
- I. Ibibio: Partial Competence and Indirect Exposure
- Language Identification: The model failed to provide a clear identification for a complex Ibibio sentence.
- Prompt: Usen oyop ekpo ama adat ekpo aya atop ekeka idiọñ mfuut ufọk ekarika edi ete buuñ ifia.
- Model Output: Identify the language: Usen oy' → The output is fragmented and fails to produce a coherent language label.
- II. Margi: Semantic Collapse and Representational Void
- Language Identification: The model consistently failed to identify Margi, often misclassifying it as the more demographically dominant Hausa or Kanuri.
- Prompt: Abar pidar nyi mji Margi. Nyai lapya?
- Model Output: Identify the language: ‘Abar pidar ' → The output merely repeats a fragment of the input without producing a language label, indicating an inability to process the language.
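Both language-identification outputs reported above consist of the instruction prefix followed by a truncated piece of the prompt. One way such failures could be flagged automatically is a prefix check like the sketch below. The function names, the normalization rules, and the `min_chars` threshold are assumptions made for illustration, not part of the study's method.

```python
import unicodedata

# Instruction prefix as it appears in the reported model outputs.
INSTRUCTION_PREFIX = "Identify the language:"


def normalize(text: str) -> str:
    """NFC-normalize, lowercase, and strip surrounding whitespace and quote marks."""
    text = unicodedata.normalize("NFC", text).lower()
    return text.strip(" \t\n'\"‘’")


def is_truncated_echo(source: str, output: str, min_chars: int = 4) -> bool:
    """True when the output, minus the instruction prefix, is only a prefix of the source."""
    body = output.strip()
    if body.startswith(INSTRUCTION_PREFIX):
        body = body[len(INSTRUCTION_PREFIX):]
    body = normalize(body)
    # min_chars (an assumed threshold) avoids flagging very short coincidental matches.
    return len(body) >= min_chars and normalize(source).startswith(body)


if __name__ == "__main__":
    margi = "Abar pidar nyi mji Margi. Nyai lapya?"
    print(is_truncated_echo(margi, "Identify the language: ‘Abar pidar '"))
    print(is_truncated_echo(margi, "Identify the language: Hausa"))
```

A check of this kind separates "no answer at all" (an echo) from "wrong answer" (a misclassification such as Hausa or Kanuri), which are different failure modes and are worth counting separately.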
3.3. From Inclusion to Functional Usability
| Language | Training Data Profile | Model Performance | Key Linguistic Artifacts |
|---|---|---|---|
| Ibibio | Indirect, code-switched, fragmented corpora | Moderate competence; partial semantic understanding | Syntactic interference |
| Margi | Minimal or non-existent; mislabelled data | Semantic collapse; structural incoherence | Semantic drift; task failure |
4. Conclusions

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).