Submitted: 14 January 2026
Posted: 15 January 2026
Abstract
Keywords:
1. Introduction
- The availability of task-specific annotations;
- The existence of unannotated texts in the language;
- The presence of auxiliary data.

2. Sociolinguistic and Linguistic Features
2.1. Language Digraphs
2.2. Language Overview
2.2.1. Wolof
2.2.2. Pulaar / Fula
2.2.3. Sérère (Sereer)
2.2.4. Diola (Joola)
2.2.5. Mandingue (Mandinka)
2.2.6. Soninké
3. Low-Resource Context and Data Availability
4. Current NLP Efforts and Tasks
4.1. Parsing & Tokenization
4.2. Token Classification
4.3. Text Classification
4.3.1. Sentiment Analysis
4.3.2. Hate Speech Detection
4.3.3. Intent Classification
4.4. Lexicons & Spell Checking
- Those based on expert rules;
- Those incorporating a context model that allows candidate corrections to be re-ranked;
- Those that learn error patterns from a training dataset.
- Maintenance complexity, as the number of rules grows rapidly and updates become increasingly difficult;
- Dependence on the size of the dictionary;
- Lack of awareness of linguistic context.
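As a minimal sketch of the second category of spell checkers listed above (a context model that reorders candidate corrections), the Python example below generates dictionary candidates within one edit of an unknown token and ranks them by bigram counts with the preceding word. The toy English corpus, the alphabet, and all function names (`edits1`, `build_model`, `candidates`, `rerank`, `correct`) are illustrative assumptions and are not taken from any system surveyed here; a real Wolof checker would need an alphabet covering characters such as ë, ñ and ŋ, and a far larger corpus.

```python
from collections import Counter

# Assumption: a plain Latin alphabet, used only for this toy example.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Assumption: a tiny placeholder corpus standing in for real training text.
TOY_CORPUS = [
    "the cat sat on the mat",
    "the cat ate the rat",
    "the cat sat on a hat",
    "a bat flew at night",
    "a bat ate a rat",
]


def edits1(word):
    """Return every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def build_model(sentences):
    """Count unigrams and adjacent word pairs (bigrams) in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def candidates(word, unigrams):
    """Known words are kept as-is, so real-word errors are not handled."""
    if word in unigrams:
        return [word]
    found = sorted(c for c in edits1(word) if c in unigrams)
    return found or [word]  # fall back to the input if nothing matches


def rerank(prev_word, cands, unigrams, bigrams):
    """Order candidates by bigram count with the previous word, then unigram count."""
    return sorted(cands,
                  key=lambda c: (bigrams[(prev_word, c)], unigrams[c]),
                  reverse=True)


def correct(sentence, unigrams, bigrams):
    """Correct a sentence left to right, using the previous output word as context."""
    output = []
    for token in sentence.lower().split():
        prev_word = output[-1] if output else None
        ranked = rerank(prev_word, candidates(token, unigrams), unigrams, bigrams)
        output.append(ranked[0])
    return " ".join(output)


UNIGRAMS, BIGRAMS = build_model(TOY_CORPUS)
print(correct("the xat sat on the mat", UNIGRAMS, BIGRAMS))
print(correct("a xat flew at night", UNIGRAMS, BIGRAMS))
```

With this toy corpus, the same misspelling `xat` resolves to `cat` after `the` and to `bat` after `a`, whereas a dictionary-only ranking by word frequency would choose `cat` in both positions. The sketch also makes the listed limitations concrete: candidates can only come from words present in the dictionary, and nothing is proposed when no dictionary word lies within one edit.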
4.5. Machine Translation
4.6. Question Answering and Dialogue Systems
- Commerce: Automation of customer services, intelligent recommendations, optimization of sales processes;
- Healthcare: Digital medical assistance, easier access to health information, AI-assisted preliminary diagnosis;
- Banking and fintech: Simplification of transactions, user support for mobile banking services;
- Education: Access to online learning in local languages, interactive tutorials via educational chatbots, support for digital literacy.
4.7. Speech Processing
4.7.1. Automatic Speech Recognition
- Incentivise the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge;
- Support research fellows for a period of 3-4 months to create datasets annotated for NLP tasks;
- Host competitive Machine Learning challenges on the basis of these datasets.
4.7.2. Speech Synthesis
4.7.3. Spoken Dialog Systems
5. Case Study: NLP Pipelines for the Social Sciences
6. Discussions & Perspectives
6.1. Challenges
6.2. Opportunities and Future Directions
7. Conclusion
Acknowledgments
References
- Nekoto, W.; Marivate, V.; Matsila, T.; Fasubaa, T.; Fagbohungbe, T.; Akinola, S.O.; Muhammad, S.; Kabongo Kabenamualu, S.; Osei, S.; Sackey, F.; et al. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020; Cohn, T.; He, Y.; Liu, Y., Eds., Online, 2020; pp. 2144–2160. [CrossRef]
- Hedderich, M.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 01 2021, pp. 2545–2568. [CrossRef]
- Adebara, I.; Abdul-Mageed, M. Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go. In Proceedings of the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022; Volume 1: Long Papers, pp. 3814–3841. [Google Scholar] [CrossRef]
- Esch, D.V.; Lucassen, T.; Ruder, S.; Caswell, I.; Rivera, C.E. Writing System and Speaker Metadata for 2,800+ Language Varieties. In Proceedings of the Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022; pp. 5035–5046. [Google Scholar]
- Adebara, I.; Toyin, H.O.; Ghebremichael, N.T.; Elmadany, A.A.; Abdul-Mageed, M. Where Are We? Evaluating LLM Performance on African Languages. In Proceedings of the Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics; Vienna, Austria, Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; 2025; Volume 1: Long Papers, pp. 32704–32731. [Google Scholar] [CrossRef]
- Olofsson, S. When AI Can’t Understand Your Language, Democracy Breaks Down. https://www.techpolicy.press/when-ai-cant-understand-your-language-democracy-breaks-down-/, 2025. Accessed: 2025-12-17.
- Simpson, A. Language and National Identity in Africa; Oxford University Press, 2008. [Google Scholar] [CrossRef]
- Dimé, M. Reflux des solidarités intergénérationnelles en contexte de précarité à Dakar. Gérontologie et société 2019, 41, 85–98. [Google Scholar] [CrossRef]
- Tonja, A.L.; Belay, T.D.; Azime, I.A.; Ayele, A.A.; Mehamed, M.A.; Kolesnikova, O.; Yimam, S.M. Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities. arXiv 2023, arXiv:2303.14406. [Google Scholar] [CrossRef]
- Leclerc, J. « Senegal » dans L’aménagement linguistique dans le monde, Québec, CEFAN, Université Laval. http://www.axl.cefan.ulaval.ca/afrique/senegal.htm, 2015. Accessed: 2025-12-16.
- World Bank. Population, total - Senegal. https://data.worldbank.org/indicator/SP.POP.TOTL?locations=SN, 2024. Accessed: 2025-12-16.
- Ouzerrout, S.; Saadallah, I. Réhabiliter l’écriture Ajami : un levier technologique pour l’alphabétisation en Afrique. In Proceedings of the Actes des 18e Rencontres Jeunes Chercheurs en RI (RJCRI) et 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL); Bechet, F.; Chifu, A.G.; Pinel-sauvagnat, K.; Favre, B.; Maes, E.; Nurbakova, D., Eds., Marseille, France, 6 2025; pp. 253–267.
- Ngom, F. Ajami Scripts in the Senegalese Speech Community. Journal of Arabic and Islamic Studies 2017, 10, 1–23. [Google Scholar] [CrossRef]
- Nguer, E.M.; Bao, D.S.; Fall, Y.A.; Khoule, M. Digraph of Senegal s local languages: issues, challenges and prospects of their transliteration. arXiv 2020, arXiv:2005.02325. [Google Scholar] [CrossRef]
- Le, N.T.; Mijiyawa, A.; Leye, A.; Sadat, F. The Best of Both Worlds: Exploring Wolofal in the Context of NLP. In Proceedings of the Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script; El-Haj, M., Ed., Abu Dhabi, UAE, 2025; pp. 1–6.
- Fall, E.M.; hadji M. Nguer, E.; Sokhna, B.D.; Khoule, M.; Mangeot, M.; Cisse, M.T. Digraphie des langues ouest africaines: Latin2Ajami: un algorithme de translitteration automatique. arXiv, 2020; arXiv:cs.CL/2005.02827. [Google Scholar]
- Eberhard, D.; Simons, G.; Fennig, C. Ethnologue: Languages of the World, 22nd Edition; SIL International, 2019. [Google Scholar]
- Gauthier, E.; Besacier, L.; Voisin, S.; Melese, M.; Elingui, U.P. Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof. In Proceedings of the Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16); Calzolari, N.; Choukri, K.; Declerck, T.; Goggi, S.; Grobelnik, M.; Maegaard, B.; Mariani, J.; Mazo, H.; Moreno, A.; Odijk, J.; et al., Eds., Portorož, Slovenia, 2016; pp. 3863–3867.
- Çetinoğlu, Ö.; Schulz, S.; Vu, N.T. Challenges of Computational Processing of Code-Switching. In Proceedings of the Proceedings of the Second Workshop on Computational Approaches to Code Switching, Austin, Texas, 2016; pp. 1–11. [CrossRef]
- Kihm, A. Le sérère (seereer siin). working paper or preprint.
- Creissels, D. Le joola fooñi. working paper or preprint.
- Creissels, D. Le mandinka (màndìŋkàkáŋò). working paper or preprint.
- Tapo, A.A.; Assogba, K.; Homan, C.M.; Rafique, M.M.; Zampieri, M. Bayelemabaga: Creating Resources for Bambara NLP. In Proceedings of the Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Chiruzzo, L.; Ritter, A.; Wang, L., Eds., Albuquerque, New Mexico, 2025; pp. 12060–12070. [CrossRef]
- Creissels, D.; Ismael, D. LE SONINKÉ[quelques contrastes pertinents pour l’acquisition du Français Langue Seconde par des locuteurs du soninké], 2016. LE SONINKÉ[quelques contrastes pertinents pour l’acquisition du Français Langue Seconde par des locuteurs du soninké].
- Scharff, C.; Gaikhe, O.; Tretyakov, J.; Shah, N.K.; Scharff, E. AI Strategies: A Review of Selected National and Intergovernmental Approaches. In Proceedings of the Research and Innovation Forum 2024; Visvizi, A.; Troisi, O.; Corvello, V.; Grimaldi, M., Eds., Cham, 2026; pp. 3–17.
- Ndiaye, S. NLP and Some Research Results in Senegal. In Proceedings of the Mathematics of Computer Science, Cybersecurity and Artificial Intelligence; Gueye, C.T.; Ngom, P.; Diop, I., Eds., Cham, 2024; pp. 21–29. Gueye, C.T., Ngom, P., Diop, I., Eds..
- Grallet, G. Pionniers: Voyage aux frontières de l’intelligence artificielle; Bernard Grasset: Paris, France, 2025. Release date: 05/11/2025.
- Heng, S.; Tsilionis, K.; Scharff, C.; Wautelet, Y. Understanding AI ecosystems in the Global South: The cases of Senegal and Cambodia. International Journal of Information Management 2022, 64, 102454. [Google Scholar] [CrossRef]
- Sinha, Y.R. What is Parsing in NLP: Its Types and Techniques. https://intellipaat.com/blog/what-is-parsing-in-nlp/, 2025. Accessed: 2025-12-05.
- Jaiswal, S. Natural Language Processing – Dependency Parsing. https://towardsdatascience.com/natural-language-processing-dependency-parsing-cf094bbbe3f7/, 2021. Accessed: 2025-12-05.
- Dione, C.M.B. A Morphological Analyzer For Wolof Using Finite-State Techniques. In Proceedings of the Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 2012; pp. 894–901. [Google Scholar]
- Dione, C.B. Handling Wolof clitics in LFG. In Challenging Clitics; Salvesen, C.M.; Helland, H.P., Eds.; John Benjamins Publishing Company, 2013; chapter Handling Wolof clitics in LFG, pp. 87–118. [CrossRef]
- Dione, C.M.B. Finite-State Tokenization for a Deep Wolof LFG Grammar. Bergen Language and Linguistics Studies 2017, 8. [Google Scholar] [CrossRef]
- Dione, C.M.B. Implementation and Evaluation of an LFG-based Parser for Wolof. In Proceedings of the Proceedings of the Twelfth Language Resources and Evaluation Conference; Calzolari, N.; Béchet, F.; Blache, P.; Choukri, K.; Cieri, C.; Declerck, T.; Goggi, S.; Isahara, H.; Maegaard, B.; Mariani, J.; et al., Eds., Marseille, France, 2020; pp. 5128–5136.
- Sulger, S.; Butt, M.; King, T.H.; Meurer, P.; Laczkó, T.; Rákosi, G.; Dione, C.B.; Dyvik, H.; Rosén, V.; De Smedt, K.; et al. ParGramBank: The ParGram Parallel Treebank. In Proceedings of the Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Schuetze, H.; Fung, P.; Poesio, M., Eds., Sofia, Bulgaria, 2013; pp. 550–560.
- Dione, C.M.B. Pruning the Search Space of the Wolof LFG Grammar Using a Probabilistic and a Constraint Grammar Parser. In Proceedings of the Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14); Calzolari, N.; Choukri, K.; Declerck, T.; Loftsson, H.; Maegaard, B.; Mariani, J.; Moreno, A.; Odijk, J.; Piperidis, S., Eds., Reykjavik, Iceland, 2014; pp. 2863–2870.
- Dione, B. LFG parse disambiguation for Wolof. Journal of Language Modelling 2014, 2, 105. [Google Scholar] [CrossRef]
- Dione, C.B. Developing Universal Dependencies for Wolof. In Proceedings of the Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019); Rademaker, A.; Tyers, F., Eds., Paris, France, 2019; pp. 12–23. [CrossRef]
- Qi, P.; Zhang, Y.; Zhang, Y.; Bolton, J.; Manning, C.D. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. arXiv 2020, arXiv:2003.07082. [Google Scholar] [CrossRef]
- Arnett, C.; Hudspeth, M.; O’Connor, B. Evaluating Morphological Alignment of Tokenizers in 70 Languages. arXiv 2025, arXiv:2507.06378. [Google Scholar] [CrossRef]
- Dione, C.M.B. From LFG To UD: A Combined Approach. In Proceedings of the Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020); de Marneffe, M.C.; de Lhoneux, M.; Nivre, J.; Schuster, S., Eds., Barcelona, Spain (Online), 2020; pp. 57–66.
- Dione, C.M.B. Multilingual Dependency Parsing for Low-Resource African Languages: Case Studies on Bambara, Wolof, and Yoruba. In Proceedings of the Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021); Oepen, S.; Sagae, K.; Tsarfaty, R.; Bouma, G.; Seddah, D.; Zeman, D., Eds., Online, 2021; pp. 84–92. [CrossRef]
- HuggingFace. Token Classification. https://huggingface.co/tasks/token-classification, 2024. Accessed: 2025-11-26.
- Adelani, D.I.; Abbott, J.; Neubig, G.; D’souza, D.; Kreutzer, J.; Lignos, C.; Palen-Michel, C.; Buzaaba, H.; Rijhwani, S.; Ruder, S.; et al. MasakhaNER: Named Entity Recognition for African Languages. Transactions of the Association for Computational Linguistics 2021, 9, 1116–1131. [Google Scholar] [CrossRef]
- Adelani, D.I.; Neubig, G.; Ruder, S.; Rijhwani, S.; Beukman, M.; Palen-Michel, C.; Lignos, C.; Alabi, J.O.; Muhammad, S.H.; Nabende, P.; et al. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition. In Proceedings of the Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Goldberg, Y.; Kozareva, Z.; Zhang, Y., Eds., Abu Dhabi, United Arab Emirates, 2022; pp. 4488–450. [CrossRef]
- Dione, C.M.B.; Kuhn, J.; Zarrieß, S. Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal). In Proceedings of the Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10); Calzolari, N.; Choukri, K.; Maegaard, B.; Mariani, J.; Odijk, J.; Piperidis, S.; Rosner, M.; Tapias, D., Eds., Valletta, Malta, 2010.
- Dione, C.M.B.; Adelani, D.I.; Nabende, P.; Alabi, J.; Sindane, T.; Buzaaba, H.; Muhammad, S.H.; Emezue, C.C.; Ogayo, P.; Aremu, A.; et al. MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages. In Proceedings of the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Rogers, A.; Boyd-Graber, J.; Okazaki, N., Eds., Toronto, Canada, 2023; pp. 10883–10900. [CrossRef]
- Geitgey, A. Text Classification is Your New Secret Weapon. https://medium.com/@ageitgey/text-classification-is-your-new-secret-weapon-7ca4fad15788, 2018. Accessed: 2025-11-27. Available online:.
- Amazon, A. What is Sentiment Analysis? https://aws.amazon.com/what-is/sentiment-analysis/, 2024. Accessed: 2025-11-27.
- Kandé, D.; Marone, R.M.; Ndiaye, S.; Camara, F. A Novel Term Weighting Scheme Model. In Proceedings of the Proceedings of the 4th International Conference on Frontiers of Educational Technologies, New York, NY, USA, 2018; ICFET ’18, p. 92–96. [CrossRef]
- Das, M.; K., S.; Alphonse, P.J.A. A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset. arXiv 2023, arXiv:2308.04037. [Google Scholar] [CrossRef]
- Chen, K.; Zhang, Z.; Long, J.; Zhang, H. Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Syst. Appl. 2016, 66, 245–260. [Google Scholar] [CrossRef]
- Mbaye, D.; Diallo, M. Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector. In Proceedings of the Innovations and Interdisciplinary Solutions for Underserved Areas; Cheikh M. F. Kebe, K.; Gueye, A.; Ndiaye, A.; Sene, N.A.; Maiga, A.S., Eds., Cham, 2025; pp. 311–325.
- Kandé, D.; Camara, F.; Ndiaye, S.; Guirassy, F.M.L. FWLSA-score: French and Wolof Lexicon-based for Sentiment Analysis. In Proceedings of the 2019 5th International Conference on Information Management (ICIM), 2019, pp. 215–220. [CrossRef]
- Samb, S.M.K.; Kandé, D.; Camara, F.; Ndiaye, S. Improved Bilingual Sentiment Analysis Lexicon Using Word-level Trigram. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), 2019, pp. 112–119. [CrossRef]
- Sarr, A.D.; Kandé, D.; Camara, F. Markov Model for French-Wolof Text Analysis. In Proceedings of the 2023 International Conference on Communications, Computing and Artificial Intelligence (CCCAI), 2023, pp. 29–34. [CrossRef]
- Faty, L.; Ndiaye, M.; Sarr, E.N.; Sall, O.; Mbaye, S.N.; Landu, T.T.; Birregah, B.; Bousso, M.; Toure, F. SenOpinion: A New Lexicon for Opinion Tagging in Senegalese News Comments. In Proceedings of the 2020 15th Iberian Conference on Information Systems and Technologies (CISTI), 2020, pp. 1–6. [CrossRef]
- Sarr, A.D.; Kandé, D.; Camara, F. A lexicon-based sentiment analysis approach using a graph structure for modeling relationships between opinion words in French and Wolof corpora. In Proceedings of the Proceedings of the 2024 2nd International Conference on Communications, Computing and Artificial Intelligence, New York, NY, USA, 2024; CCCAI ’24, p. 71–76. [CrossRef]
- Faty, L.; Drame, K.; Ngor Sarr, E.; Ndiaye, M.; Dia, Y.; Sall, O. COMFO : Corpus Multilingue pour la Fouille d’Opinions (COMFO: Multilingual Corpus for Opinion Mining). In Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale; Estève, Y.; Jiménez, T.; Parcollet, T.; Zanon Boito, M., Eds., Avignon, France, 6 2022; pp. 297–304.
- Mbaye, D.; Seye, M.R.; Diallo, M.; Ndiaye, M.L.; Sow, D.; Adjanohoun, D.S.; Mbengue, T.; Wade, C.S.; Pablo, D.R.; Munyaka, J.C.B.; et al. Sentiment Analysis on the Young People’s Perception About the Mobile Internet Costs in Senegal. In Proceedings of the Proceedings of Tenth International Congress on Information and Communication Technology; Yang, X.S.; Sherratt, R.S.; Dey, N.; Joshi, A., Eds., Singapore, 2026; pp. 201–217.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; Jurafsky, D.; Chai, J.; Schluter, N.; Tetreault, J., Eds., Online, 2020; pp. 8440–8451. [CrossRef]
- Barbieri, F.; Anke, L.E.; Camacho-Collados, J. XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond. arXiv 2022, arXiv:2104.12250. [Google Scholar] [CrossRef]
- Malik, J.S.; Qiao, H.; Pang, G.; van den Hengel, A. Deep Learning for Hate Speech Detection: A Comparative Study. arXiv 2023, arXiv:cs.CL/2202.0951. [Google Scholar] [CrossRef]
- Jacobs, C.; Rakotonirina, N.C.; Chimoto, E.A.; Bassett, B.A.; Kamper, H. Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili. arXiv 2023, arXiv:2306.00410. [Google Scholar] [CrossRef]
- Ndao, I.; Dramé, K.; Sambe, G.; Diallo, G. Annotated tweet data of mixed Wolof-French for detecting obnoxious messages. Data in Brief 2025, 60, 111500. [Google Scholar] [CrossRef]
- Ndao, I.; Dramé, K.; Sambe, G.; Diallo, G. Comparative Study of Machine Learning Models for the Detection of Abusive Messages: Case of Wolof-French Codes Mixing Data. In Proceedings of the Innovations and Interdisciplinary Solutions for Underserved Areas; Cheikh M. F. Kebe, K.; Gueye, A.; Ndiaye, A.; Sene, N.A.; Maiga, A.S., Eds., Cham, 2025; pp. 252–263.
- Ndao, I.; Dramé, K.; Sambe, G.; Diallo, G. AbuseBERT-WoFr: refined BERT model for detecting abusive messages on tweets mixing Wolof-French codes. In Proceedings of the Proceedings of Digital Avenues for Low-Resource Languages of Sub-Saharan Africa (DASSA 2025); Melatagia Yonta, P.; Nouvel, D.; Valentin, S., Eds., Yaoundé, Cameroon, 2025; p. 42 p. Source Agritrop Cirad (https://agritrop.cirad.fr/614384/).
- Razumovskaia, E.; Glavaš, G.; Majewska, O.; Ponti, E.M.; Korhonen, A.; Vulić, I. Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems. arXiv 2022, arXiv:cs.CL/2104.08570. [Google Scholar] [CrossRef]
- Ravuri, S.; Stolcke, A. Recurrent Neural Network and LSTM Models for Lexical Utterance Classification. In Proceedings of the Proc. Interspeech. ISCA - International Speech Communication Association, September 2015, pp. 135–139.
- Mesnil, G.; Dauphin, Y.; Yao, K.; Bengio, Y.; Deng, L.; Hakkani-Tur, D.; He, X.; Heck, L.; Tur, G.; Yu, D.; et al. Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2015, 23, 530–539. [Google Scholar] [CrossRef]
- Weld, H.; Huang, X.; Long, S.; Poon, J.; Han, S.C. A survey of joint intent detection and slot-filling models in natural language understanding. arXiv 2021, arXiv:cs.CL/2101.08091. [Google Scholar] [CrossRef]
- Yu, H.; Alabi, J.O.; Bukula, A.; Zhuang, J.Y.; Lee, E.S.A.; Guge, T.K.; Azime, I.A.; Buzaaba, H.; Sibanda, B.K.; Kalipe, G.K.; et al. INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages. arXiv 2025, arXiv:2502.09814. [Google Scholar] [CrossRef]
- Kandji, A.K.; Precioso, F.; Ba, C.; Ndiaye, S.; Ndione, A. WolBanking77: Wolof Banking Speech Intent Classification Dataset. arXiv 2025, arXiv:2509.19271. [Google Scholar] [CrossRef]
- Casanueva, I.; Temčinas, T.; Gerz, D.; Henderson, M.; Vulić, I. Efficient Intent Detection with Dual Sentence Encoders. In Proceedings of the Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI; Wen, T.H.; Celikyilmaz, A.; Yu, Z.; Papangelis, A.; Eric, M.; Kumar, A.; Casanueva, I.; Shah, R., Eds., Online, 2020; pp. 38–45. [CrossRef]
- Ying, C.; Thomas, S. Label Errors in BANKING77. In Proceedings of the Proceedings of the Third Workshop on Insights from Negative Results in NLP; Tafreshi, S.; Sedoc, J.; Rogers, A.; Drozd, A.; Rumshisky, A.; Akula, A., Eds., Dublin, Ireland, 2022; pp. 139–143. [CrossRef]
- Yu, S.; Sun, Q.; Zhang, H.; Jiang, J. Translate-Train Embracing Translationese Artifacts. In Proceedings of the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); Muresan, S.; Nakov, P.; Villavicencio, A., Eds., Dublin, Ireland, 2022; pp. 362–370. [CrossRef]
- Singh, S.; Vargus, F.; D’souza, D.; Karlsson, B.F.; Mahendiran, A.; Ko, W.Y.; Shandilya, H.; Patel, J.; Mataciunas, D.; O’Mahony, L.; et al. Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning. In Proceedings of the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.W.; Martins, A.; Srikumar, V., Eds., Bangkok, Thailand, 2024; pp. 11521–1156. Bangkok, Thailand. [CrossRef]
- Hládek, D.; Staš, J.; Pleva, M. Survey of Automatic Spelling Correction. Electronics 2020, 9. [Google Scholar] [CrossRef]
- Hládek, D.; Staš, J.; Pleva, M. Survey of Automatic Spelling Correction. Electronics 2020, 9, 1670. [Google Scholar] [CrossRef]
- Mangeot, M.; Enguehard, C. DILAF : des dictionnaires africains en ligne et une méthodologie. In Proceedings of the Francophonie et Langues Nationales, Dakar, Senegal, 2014.
- Mbodj, C.; Enguehard, C. Production et mise en ligne d’un dictionnaire électronique du wolof. JEP-TALN-RECITAL 2015, 2. [Google Scholar]
- Lo, A.; Nguer, E.H.M.; Abdoulaye, N.; Dione, C.B.; Mangeot, M.; Khoule, M.; Bao-Diop, S.; Cissé, M.T. Correction orthographique pour la langue wolof : état de l’art et perspectives. In Proceedings of the JEP-TALN-RECITAL 2016: Traitement Automatique des Langues Africaines TALAF 2016, Paris, France, 2016.
- Khoule, M.; Mangeot, M.; Nguer, E.H.M.; Cissé, M.T. iBaatukaay : un projet de base lexicale multilingue contributive sur le web à structure pivot pour les langues africaines notamment sénégalaises. In Proceedings of the hal-02054921 , version 1, 2016.
- Cissé, T.I.; Sadat, F. Automatic Spell Checker and Correction for Under-represented Spoken Languages: Case Study on Wolof. In Proceedings of the Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023); Mabuya, R.; Mthobela, D.; Setaka, M.; Van Zaanen, M., Eds., Dubrovnik, Croatia, 2023; pp. 1–10. [CrossRef]
- Goodman, J. The State of the Art in Language Modeling. In Proceedings of the Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Tutorials - Volume 5, USA, 2003; NAACL-Tutorials ’03, p. 4. [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. ArXiv 2014, 1409. [Google Scholar]
- Cissé, T.I.; Sadat, F. Advancing Language Diversity and Inclusion: Towards a Neural Network-based Spell Checker and Correction for Wolof. In Proceedings of the Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024; Mabuya, R.; Matfunjwa, M.; Setaka, M.; van Zaanen, M., Eds., Torino, Italia, 2024; pp. 140–151.
- Khoulé, M.; Mangeot, M.; Nguer, M. Manipulation de dictionnaires d’origines diverses pour des langues peu dotées : la méthodologie iBaatukaay. In Proceedings of the Traitement Automatique des Langues Africaines 2018, Grenoble, France, 2018.
- IFAN. Sentermino, première plateforme terminologique du Sénégal. https://ifan.ucad.sn/sentermino-premiere-plateforme-terminologique-du-senegal/, 2025. Accessed: 2025-11-24.
- Outeirinho, D.B.; Otero, P.G.; de Dios-Flores, I.; Campos, J.R.P. Exploring the effects of vocabulary size in neural machine translation: Galician as a target language. In Proceedings of the Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1; Gamallo, P.; Claro, D.; Teixeira, A.; Real, L.; Garcia, M.; Oliveira, H.G.; Amaro, R., Eds., Santiago de Compostela, Galicia/Spain, 2024; pp. 600–604.
- Koehn, P. Statistical Machine Translation; Cambridge University Press, 2009. [CrossRef]
- Luong, T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2015; pp. 1412–1421. [CrossRef]
- Koehn, P.; Knowles, R. Six Challenges for Neural Machine Translation. In Proceedings of the Proceedings of the First Workshop on Neural Machine Translation, Vancouver, 2017; pp. 28–39. [CrossRef]
- Federmann, C.; Kocmi, T.; Xin, Y. NTREX-128 – News Test References for MT Evaluation of 128 Languages. In Proceedings of the Proceedings of the First Workshop on Scaling Up Multilingual Evaluation; Ahuja, K.; Anastasopoulos, A.; Patra, B.; Neubig, G.; Choudhury, M.; Dandapat, S.; Sitaram, S.; Chaudhary, V., Eds., Online, 2022; pp. 21–24. [CrossRef]
- Caswell, I.; Nielsen, E.; Luo, J.; Cherry, C.; Kovacs, G.; Shemtov, H.; Talukdar, P.; Tewari, D.; Diane, B.M.; Diane, D.; et al. SMOL: Professionally translated parallel data for 115 under-represented languages. arXiv 2025, arXiv:2502.12301. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I.; Luxburg, U.V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R., Eds. Curran Associates, Inc., 2017, Vol. 3.
- Ranathunga, S.; Lee, E.S.A.; Skenduli, M.P.; Shekhar, R.; Alam, M.; Kaur, R. Neural Machine Translation for Low-Resource Languages: A Survey. arXiv 2021, arXiv:cs.CL/2106.15115. [Google Scholar] [CrossRef]
- Nguer, E.M.; Lo, A.; Dione, C.M.B.; Ba, S.O.; Lo, M. SENCORPUS: A French-Wolof Parallel Corpus. In Proceedings of the Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2020; pp. 2803–2811.
- Lo, A.; Dione, C.M.B.; Nguer, E.M.; Ba, S.O.; Lo, M. Building Word Representations for Wolof Using Neural Networks. In Proceedings of the Innovations and Interdisciplinary Solutions for Underserved Areas; Thorn, J.P.R.; Gueye, A.; Hejnowicz, A.P., Eds., Cham, 2020; pp. 274–286.
- Alla, L.; Bamba, D.C.; Mamadou, N.E.; Ba, B.S.O.; Moussa, L. Using LSTM to Translate French to Senegalese Local Languages: Wolof as a Case Study. arXiv 2020, arXiv:2004.13840. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural computation 1997, 9, 1735–80. [Google Scholar] [CrossRef]
- Mbaye, D.; Diallo, M.; Diop, T.I. Low-Resourced Machine Translation for Senegalese Wolof Language. In Proceedings of the Proceedings of Eighth International Congress on Information and Communication Technology; Yang, X.S.; Sherratt, R.S.; Dey, N.; Joshi, A., Eds., Singapore, 2024; pp. 243–255.
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002; pp. 311–318. [CrossRef]
- Jónsson, H.P.; Símonarson, H.B.; Snæbjarnarson, V.; Steingrímsson, S.; Loftsson, H. Experimenting with Different Machine Translation Models in Medium-Resource Settings. In Proceedings of the Text, Speech, and Dialogue: 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings, Berlin, Heidelberg, 2020; p. 95–103. [CrossRef]
- Dione, C.M.B.; Lo, A.; Nguer, E.M.; Ba, S. Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for WolofFrench. In Proceedings of the Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 2022; pp. 6654–6661.
- Kapoor, S.; Cantrell, E.; Peng, K.; Pham, T.H.; Bail, C.A.; Gundersen, O.E.; Hofman, J.M.; Hullman, J.; Lones, M.A.; Malik, M.M.; et al. REFORMS: Reporting Standards for Machine Learning Based Science. arXiv 2023, arXiv:2308.07832. [Google Scholar] [CrossRef]
- Lones, M.A. How to avoid machine learning pitfalls: a guide for academic researchers. arXiv 2024, arXiv:2108.02497. [Google Scholar] [CrossRef]
- Lo, A.; Nguer, E.M.; Ba, S.O.; Dione, C.M.B.; Lo, M. SenTekki: Online Platform and Restful Web Service for Translation Between Wolof and French. In Proceedings of the Innovations and Interdisciplinary Solutions for Underserved Areas; Mambo, A.D.; Gueye, A.; Bassioni, G., Eds., Cham, 2022; pp. 290–298.
- Arivazhagan, N.; Bapna, A.; Firat, O.; Lepikhin, D.; Johnson, M.; Krikun, M.; Chen, M.X.; Cao, Y.; Foster, G.; Cherry, C.; et al. Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges. arXiv 2019, arXiv:1907.05019. [Google Scholar] [CrossRef]
- Adelani, D.; Alabi, J.; Fan, A.; Kreutzer, J.; Shen, X.; Reid, M.; Ruiter, D.; Klakow, D.; Nabende, P.; Chang, E.; et al. Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation. In Proceedings of the Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, 2022; pp. 3053–3070. [CrossRef]
- Fan, A.; Bhosale, S.; Schwenk, H.; Ma, Z.; El-Kishky, A.; Goyal, S.; Baines, M.; Celebi, O.; Wenzek, G.; Chaudhary, V.; et al. Beyond English-Centric Multilingual Machine Translation. arXiv 2020, arXiv:2010.11125. [Google Scholar] [CrossRef]
- Mohammadshahi, A.; Nikoulina, V.; Berard, A.; Brun, C.; Henderson, J.; Besacier, L. SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages. arXiv 2022, arXiv:cs.CL/2210.11621. [Google Scholar]
- Mbaye, D.; Diallo, M. Task-Oriented Dialog Systems for the Senegalese Wolof Language. In Proceedings of the Proceedings of the 31st International Conference on Computational Linguistics; Rambow, O.; Wanner, L.; Apidianaki, M.; Al-Khalifa, H.; Eugenio, B.D.; Schockaert, S., Eds., Abu Dhabi, UAE, 2025; pp. 4803–4812.
- Team, N.; Costa-jussà, M.R.; Cross, J.; Çelebi, O.; Elbayad, M.; Heafield, K.; Heffernan, K.; Kalbassi, E.; Lam, J.; Licht, D.; et al. No Language Left Behind: Scaling Human-Centered Machine Translation. arXiv 2022, arXiv:2207.04672. [Google Scholar] [CrossRef]
- Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. PaLM 2 Technical Report. arXiv 2023, arXiv:2305.10403. [Google Scholar] [CrossRef]
- Caswell, I. 110 new languages are coming to Google Translate. https://blog.google/products/translate/google-translate-new-languages-2024/, 2024. Accessed: 2025-11-23.
- DeepL. DeepL Translator languages. https://support.deepl.com/hc/en-us/articles/360019925219-DeepL-Translator-languages, 2025. Accessed: 2025-11-28.
- Smartling. How Accurate is DeepL? https://www.smartling.com/blog/how-accurate-is-deepl, 2024. Accessed: 2025-11-28.
- Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed.; Prentice Hall, 2025. Online manuscript released January 12, 2025.
- Young, S.; Gašić, M.; Thomson, B.; Williams, J.D. POMDP-Based Statistical Spoken Dialog Systems: A Review. Proceedings of the IEEE 2013, 101, 1160–1179. [Google Scholar] [CrossRef]
- Adelani, D.I.; Ojo, J.; Azime, I.A.; Zhuang, J.Y.; Alabi, J.O.; He, X.; Ochieng, M.; Hooker, S.; Bukula, A.; Lee, E.S.A.; et al. IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models. arXiv 2024, arXiv:2406.03368. [Google Scholar] [CrossRef]
- Kudugunta, S.; Caswell, I.; Zhang, B.; Garcia, X.; Choquette-Choo, C.A.; Lee, K.; Xin, D.; Kusupati, A.; Stella, R.; Bapna, A.; et al. MADLAD-400: A Multilingual And Document-Level Large Audited Dataset. arXiv 2023, arXiv:cs.CL/2309.04662.
- Kreutzer, J.; Caswell, I.; Wang, L.; Wahab, A.; van Esch, D.; Ulzii-Orshikh, N.; Tapo, A.; Subramani, N.; Sokolov, A.; Sikasote, C.; et al. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. Transactions of the Association for Computational Linguistics 2022, 10, 50–72. https://doi.org/10.1162/tacl_a_00447. [Google Scholar] [CrossRef]
- Hussen, K.Y.; Sewunetie, W.T.; Ayele, A.A.; Imam, S.H.; Muhammad, S.H.; Yimam, S.M. The State of Large Language Models for African Languages: Progress and Challenges. arXiv 2025, arXiv:2506.02280. [Google Scholar] [CrossRef]
- Adewumi, T.; Adeyemi, M.; Anuoluwapo, A.; Peters, B.; Buzaaba, H.; Samuel, O.; Rufai, A.M.; Ajibade, B.; Gwadabe, T.; Traore, M.M.K.; et al. AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages. arXiv 2022, arXiv:cs.CL/2204.08083. [Google Scholar]
- Budzianowski, P.; Wen, T.H.; Tseng, B.H.; Casanueva, I.; Ultes, S.; Ramadan, O.; Gašić, M. MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling. arXiv 2020, arXiv:cs.CL/1810.00278. [Google Scholar]
- Ogundepo, O.; Gwadabe, T.R.; Rivera, C.E.; Clark, J.H.; Ruder, S.; Adelani, D.I.; Dossou, B.F.P.; DIOP, A.A.; Sikasote, C.; Hacheme, G.; et al. AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages. arXiv 2023, arXiv:cs.CL/2305.06897. [Google Scholar]
- Bandarkar, L.; Liang, D.; Muller, B.; Artetxe, M.; Shukla, S.N.; Husa, D.; Goyal, N.; Krishnan, A.; Zettlemoyer, L.; Khabsa, M. The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.W.; Martins, A.; Srikumar, V., Eds., Bangkok, Thailand, 2024; pp. 749–77. [CrossRef]
- Blaschke, V.; Fedzechkina, M.; Ter Hoeve, M. Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025; Che, W.; Nabende, J.; Shutova, E.; Pilehvar, M.T., Eds., Vienna, Austria, 2025; pp. 8653–868. [CrossRef]
- Idris, T.K.; Mitra, P.; Eiselen, R. Can Embedding Similarity Predict Cross-Lingual Transfer? A Systematic Study on African Languages. arXiv 2026, arXiv:2601.03168. [Google Scholar] [CrossRef]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. arXiv 2022, arXiv:2203.02155. [Google Scholar] [CrossRef]
- CohereForAI. The Aya Movement at a glance: Accelerating multilingual AI through open science. https://cohere.com/research/aya/aya-at-a-glance.pdf, 2024. Accessed: 2025-11-25.
- Üstün, A.; Aryabumi, V.; Yong, Z.X.; Ko, W.Y.; D’souza, D.; Onilude, G.; Bhandari, N.; Singh, S.; Ooi, H.L.; Kayid, A.; et al. Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model. arXiv 2024, arXiv:2402.07827. [Google Scholar] [CrossRef]
- Yu, H.; Xu, T.; Hedderich, M.A.; Hamidouche, W.; Zamir, S.W.; Adelani, D.I. AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages. arXiv 2026, arXiv:cs.CL/2601.06395. [Google Scholar]
- Penedo, G.; Kydlíček, H.; Sabolčec, V.; Messmer, B.; Foroutan, N.; Kargaran, A.H.; Raffel, C.; Jaggi, M.; Werra, L.V.; Wolf, T. FineWeb2: One Pipeline to Scale Them All – Adapting Pre-Training Data Processing to Every Language. arXiv 2025, arXiv:cs.CL/2506.20920. [Google Scholar]
- Penedo, G.; Kydlíček, H.; Allal, L.B.; Lozhkov, A.; Mitchell, M.; Raffel, C.; Werra, L.V.; Wolf, T. The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale. arXiv 2024, arXiv:2406.17557. [Google Scholar] [CrossRef]
- Dossou, B.F.P.; Tonja, A.L.; Yousuf, O.; Osei, S.; Oppong, A.; Shode, I.; Awoyomi, O.O.; Emezue, C. AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages. In Proceedings of the Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP); Fan, A.; Gurevych, I.; Hou, Y.; Kozareva, Z.; Luccioni, S.; Sadat Moosavi, N.; Ravi, S.; Kim, G.; Schwartz, R.; Rücklé, A., Eds., Abu Dhabi, United Arab Emirates (Hybrid), 2022; pp. 52–6. [CrossRef]
- Ogueji, K.; Zhu, Y.; Lin, J. Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages. In Proceedings of the 1st Workshop on Multilingual Representation Learning; Ataman, D.; Birch, A.; Conneau, A.; Firat, O.; Ruder, S.; Sahin, G.G., Eds., Punta Cana, Dominican Republic, 2021; pp. 116–126. [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Burstein, J.; Doran, C.; Solorio, T., Eds., Minneapolis, Minnesota, 2019; pp. 4171–4186. [CrossRef]
- Adebara, I.; Elmadany, A.; Abdul-Mageed, M.; Alcoba Inciarte, A. SERENGETI: Massively Multilingual Language Models for Africa. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023; Rogers, A.; Boyd-Graber, J.; Okazaki, N., Eds., Toronto, Canada, 2023; pp. 1498–1537. [CrossRef]
- TRT Afrika. AWA: Senegalese start-up’s AI muse speaks in Wolof. https://www.trtafrika.com/english/article/18244712, 2024. Accessed: 2025-11-25.
- Gauthier, E.; SégaWade, P.; Moudenc, T.; Collen, P.; De Neef, E.; Ba, O.; Khoyane Cama, N.; Bamba Kebe, A.; Aissatou Gningue, N.; Mendo’O Aristide, T. Preuve de concept d’un bot vocal dialoguant en wolof (Proof-of-Concept of a Voicebot Speaking Wolof). In Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale; Estève, Y.; Jiménez, T.; Parcollet, T.; Zanon Boito, M., Eds., Avignon, France, 6 2022; pp. 403–412.
- Kang, Y.; Zhang, Y.; Kummerfeld, J.K.; Tang, L.; Mars, J. Data Collection for Dialogue System: A Startup Perspective. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers); Bangalore, S.; Chu-Carroll, J.; Li, Y., Eds., New Orleans, Louisiana, 2018; pp. 33–40. [CrossRef]
- Bocklisch, T.; Faulkner, J.; Pawlowski, N.; Nichol, A. Rasa: Open Source Language Understanding and Dialogue Management. arXiv 2017, arXiv:1712.05181. [Google Scholar] [CrossRef]
- Akuthota, K.S.; Kishor Kumar Reddy, C.; Shuaib, M.; Alam, S.; Alshanketi, F.; Kyatham, A.R. A Comprehensive Von Willebrand Disease Awareness and Support Chatbot for Senegalese Communities. In Proceedings of the 2025 International Conference on Information Networking (ICOIN), 2025, pp. 714–719. [CrossRef]
- Qwen; Yang, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Li, C.; Liu, D.; et al. Qwen2.5 Technical Report. arXiv 2025, arXiv:cs.CL/2412.15115.
- Sy, Y.; Doucoure, D. Oolel: A High-Performing Open LLM for Wolof. https://huggingface.co/soynade-research/Oolel-v0.1, 2024. Accessed: 2025-11-25.
- Mehrish, A.; Majumder, N.; Bhardwaj, R.; Mihalcea, R.; Poria, S. A Review of Deep Learning Techniques for Speech Processing. arXiv 2023, arXiv:eess.AS/2305.00359. [Google Scholar] [CrossRef]
- Tamgno, J.K.; Elingui, P.U.; Mendo’o, A.T.; Richomme, M.; Lishou, C.; Obono, S.D.O. Speech Recognition and Text-to-Speech Solution for Vernacular Languages: Free Software and Community Involvement to Develop Voice Services. In Proceedings of the ICDT 2011: The Sixth International Conference on Digital Telecommunications. IARIA, 2011, pp. 56–63.
- Kakade, S.M.; Krishnamurthy, A.; Mahajan, G.; Zhang, C. Learning Hidden Markov Models Using Conditional Samples. arXiv 2024, arXiv:2302.14753. [Google Scholar] [CrossRef]
- Besacier, L.; Gauthier, E.; Mangeot, M.; Bretier, P.; Bagshaw, P.; Rosec, O.; Moudenc, T.; Pellegrino, F.; Voisin, S.; Marsico, E.; et al. Speech Technologies for African Languages: Example of a Multilingual Calculator for Education. In Proceedings of the Interspeech 2015 (short demo paper), Dresden, Germany, 2015.
- Povey, D.; Ghoshal, A.; Boulianne, G.; Burget, L.; Glembek, O.; Goel, N.K.; Hannemann, M.; Motlícek, P.; Qian, Y.; Schwarz, P.; et al. The Kaldi Speech Recognition Toolkit. In Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.
- Gauthier, E.; Besacier, L.; Voisin, S. Automatic Speech Recognition for African Languages with Vowel Length Contrast. Procedia Computer Science 2016, 81, 136–143. SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages 09-12 May 2016 Yogyakarta, Indonesia. [CrossRef]
- Gauthier, E.; Besacier, L.; Voisin, S. Speed perturbation and vowel duration modeling for ASR in Hausa and Wolof languages. In Proceedings of Interspeech 2016, San Francisco, United States, 2016.
- Gauthier, E.; Besacier, L.; Voisin, S. Machine Assisted Analysis of Vowel Length Contrasts in Wolof. arXiv 2017, arXiv:1706.00465. [Google Scholar] [CrossRef]
- DeRenzi, B.; Dixon, A.; Farhi, M.; Resch, C. Synthetic Voice Data for Automatic Speech Recognition in African Languages. 2025. [Google Scholar] [CrossRef]
- SpeechBrain. speechbrain/asr-wav2vec2-dvoice-wolof: ASR model for Wolof. https://huggingface.co/speechbrain/asr-wav2vec2-dvoice-wolof, 2024. Accessed: 2025-11-29.
- Kandji, A.K.; Ba, C.; Ndiaye, S. State-of-the-Art Review on Recent Trends in Automatic Speech Recognition. In Proceedings of the Emerging Technologies for Developing Countries; Masinde, M.; Möbs, S.; Bagula, A., Eds., Cham, 2024; pp. 185–203.
- Abdou Mohamed, N.; Allak, A.; Gaanoun, K.; et al. Multilingual Speech Recognition Initiative for African Languages. International Journal of Data Science and Analytics 2025, 20, 3513–3528. [CrossRef]
- Siminyu, K.; Kalipe, G.; Orlic, D.; Abbott, J.; Marivate, V.; Freshia, S.; Sibal, P.; Neupane, B.; Adelani, D.I.; Taylor, A.; et al. AI4D – African Language Program. arXiv 2021, arXiv:cs.CL/2104.02516. [Google Scholar]
- Pratap, V.; Tjandra, A.; Shi, B.; Tomasello, P.; Babu, A.; Kundu, S.; Elkahky, A.; Ni, Z.; Vyas, A.; Fazel-Zarandi, M.; et al. Scaling Speech Technology to 1,000+ Languages. arXiv 2023, arXiv:cs.CL/2305.13516. [Google Scholar]
- Ericsson, L.; Gouk, H.; Loy, C.C.; Hospedales, T.M. Self-Supervised Representation Learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine 2022, 39, 42–62. [Google Scholar] [CrossRef]
- Baevski, A.; Zhou, H.; Mohamed, A.; Auli, M. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv 2020, arXiv:cs.CL/2006.11477. [Google Scholar]
- Keren, G.; Kozhevnikov, A.; Meng, Y.; Ropers, C.; Setzler, M.; Wang, S.; Adebara, I.; Auli, M.; Balioglu, C.; et al. (Omnilingual ASR team). Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages. arXiv 2025, arXiv:2511.09690. [Google Scholar] [CrossRef]
- Velluet, Q. Senegalese startup Lengo brings AI to informal retailers. The Africa Report 2024. Accessed: 2025-11-29.
- Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv 2022, arXiv:2212.04356. [Google Scholar] [CrossRef]
- Basilwango, E.; Oche Ankeli, D.B.; Le Beux, Y. Fine-Tuning Automatic Speech Recognition Models for Wolof and Hausa in the Domain of Maternal and Reproductive Health. Deep Learning Indaba Poster session 1: African Datasets, IndabaX, Publications and General posters nos. 60 – 190, https://drive.google.com/file/d/1Qv8Y7SV0oSJoWjktggdDOoHaXBGAFSAt/, 2025. Accessed: 2025-12-18.
- Diallo, S. Whosper-large: A Multilingual ASR Model for Wolof with Enhanced Code-Switching Capabilities. 2025. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:cs.CL/2106.09685. [Google Scholar]
- Caubrière, A.; Gauthier, E. Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context. arXiv 2024, arXiv:cs.CL/2404.02000. [Google Scholar]
- Hsu, W.N.; Bolte, B.; Tsai, Y.H.H.; Lakhotia, K.; Salakhutdinov, R.; Mohamed, A. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. arXiv 2021, arXiv:cs.CL/2106.07447. [Google Scholar] [CrossRef]
- Caubrière, A.; Gauthier, E. Représentation de la parole multilingue par apprentissage auto-supervisé dans un contexte subsaharien. In Proceedings of the Actes des 35èmes Journées d’Études sur la Parole; Balaguer, M.; Bendahman, N.; Ho-dac, L.M.; Mauclair, J.; G Moreno, J.; Pinquier, J., Eds., Toulouse, France, 7 2024; pp. 163–172.
- Conneau, A.; Ma, M.; Khanuja, S.; Zhang, Y.; Axelrod, V.; Dalmia, S.; Riesa, J.; Rivera, C.; Bapna, A. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech. arXiv 2022, arXiv:2205.12446. [Google Scholar] [CrossRef]
- Djionang Pindoh, P.; Melatagia Yonta, P. Self-supervised and multilingual learning applied to the Wolof, Swahili and Fongbe. Revue Africaine de Recherche en Informatique et Mathématiques Appliquées 2025, 42. Submission to Episciences. [CrossRef]
- Sy, Y.; Doucouré, D.; Cerisara, C.; Illina, I. Speech Language Models for Under-Represented Languages: Insights from Wolof. arXiv 2025, arXiv:2509.15362. [Google Scholar] [CrossRef]
- Parmar, J.; Satheesh, S.; Patwary, M.; Shoeybi, M.; Catanzaro, B. Reuse, Don’t Retrain: A Recipe for Continued Pretraining of Language Models. arXiv 2024, arXiv:cs.CL/2407.07263. [Google Scholar]
- Black, A.W.; Taylor, P.; Caley, R. Festival Speech Synthesis System: System Documentation, edition 2.4, for Festival version 2.4.0, 2014. Accessed: 2025-11-29.
- GalsenAI-Lab. Anta: Wolof female-voice Text-to-Speech dataset. https://huggingface.co/datasets/galsenai/anta_women_tts, 2024.
- Resemble-AI. Introducing Resemble Enhance: Open Source Speech Super Resolution AI Model. https://www.resemble.ai/introducing-resemble-enhance/, 2023. Accessed: 2025-11-29.
- Casanova, E.; Davis, K.; Gölge, E.; Göknar, G.; Gulea, I.; Hart, L.; Aljafari, A.; Meyer, J.; Morais, R.; Olayemi, S.; et al. XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model. arXiv 2024, arXiv:eess.AS/2406.04904. [Google Scholar]
- GalsenAI-Lab. galsenai/xTTS-v2-wolof. https://huggingface.co/galsenai/xTTS-v2-wolof, 2024.
- Concree. Adia TTS. https://huggingface.co/CONCREE/Adia_TTS, 2025.
- Lacombe, Y.; Srivastav, V.; Gandhi, S. Parler-TTS. https://github.com/huggingface/parler-tts, 2024.
- Lyth, D.; King, S. Natural language guidance of high-fidelity text-to-speech with synthetic annotations. arXiv 2024, arXiv:cs.SD/2402.01912. [Google Scholar]
- Mbaye, M. TTS-WOLOF : Building Inclusive Voice AI for African Languages – The Wolof Case. https://ascii.org.sn/index.php/cnria-2025, 2025. Accessed: 2025-11-29.
- Ogayo, P.; Neubig, G.; Black, A.W. Building African Voices. arXiv 2022, arXiv:2207.00688. [Google Scholar] [CrossRef]
- Ji, S.; Chen, Y.; Fang, M.; Zuo, J.; Lu, J.; Wang, H.; Jiang, Z.; Zhou, L.; Liu, S.; Cheng, X.; et al. WavChat: A Survey of Spoken Dialogue Models. arXiv 2024, arXiv:eess.AS/2411.13577. [Google Scholar]
- Fihlani, P. Lost in translation - How Africa is trying to close the AI language gap. https://www.bbc.com/news/articles/crkzgkkpx0lo, 2025. Accessed: 2025-11-28.
- Peng, J.; Wang, Y.; Li, B.; Guo, Y.; Wang, H.; Fang, Y.; Xi, Y.; Li, H.; Li, X.; Zhang, K.; et al. A Survey on Speech Large Language Models for Understanding. arXiv 2025, arXiv:eess.AS/2410.18908. [Google Scholar] [CrossRef]
- Arora, S.; Chang, K.W.; Chien, C.M.; Peng, Y.; Wu, H.; Adi, Y.; Dupoux, E.; Lee, H.Y.; Livescu, K.; Watanabe, S. On The Landscape of Spoken Language Models: A Comprehensive Survey. arXiv 2025, arXiv:2504.08528. [Google Scholar] [CrossRef]
- GalsenAI. Keyword Spotting with African Languages. https://k4all.org/project/keyword-spotting-with-african-languages/, 2019. Accessed: 2025-11-28.
- Warden, P. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. arXiv 2018, arXiv:1804.03209. [Google Scholar] [CrossRef]
- Adjanohoun, D.S.; Mbengue, T.P.; Sow, D.; Wade, C.S.; Seye, M.R.; Mbaye, D.; Diallo, M.; Ndiaye, M.L.; De-Roulet, P.; Munyaka, J.C.B.; et al. The digital policies in the face of access and usage inequalities among young people in intermediate cities in Senegal: The case of Saint-Louis. Journal of Infrastructure, Policy and Development 2025, 9, 10899. [Google Scholar] [CrossRef]
- Sow, D.; Adjanohoun, D.; Mbengue, T.; Wade, C.; Seye, M.; Mbaye, D.; Diallo, M.; Ndiaye, M.; De Roulet, P.; Munyaka Baraka, J.C.; et al. Digital Inclusion and Youth Participation in Urban Governance in Sub-Saharan Africa: The Case of Saint-Louis, Senegal. Global Journal for Research Analysis 2025, 50–55. [Google Scholar] [CrossRef]
- Yang, J.; Hussein, A.; Wiesner, M.; Khudanpur, S. JHU IWSLT 2022 Dialect Speech Translation System Description. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022); Salesky, E.; Federico, M.; Costa-jussà, M., Eds., Dublin, Ireland (in-person and online), 2022; pp. 319–326. [CrossRef]
- Hou, Y.; Huang, J. Natural language processing for social science research: A comprehensive review. Chinese Journal of Sociology 2025, 11, 121–157. [Google Scholar] [CrossRef]
- Rathje, S.; Mirea, D.M.; Sucholutsky, I.; Marjieh, R.; Robertson, C.E.; Bavel, J.J.V. GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences 2024, 121, e2308950121. [Google Scholar] [CrossRef] [PubMed]
- Gilardi, F.; Alizadeh, M.; Kubli, M. ChatGPT outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences 2023, 120. [Google Scholar] [CrossRef]
- Chiang, C.H.; Lee, H.Y. Can Large Language Models Be an Alternative to Human Evaluations? arXiv 2023, arXiv:2305.01937. [Google Scholar] [CrossRef]
- Abdurahman, S.; Ziabari, A.S.; Moore, A.K.; Bartels, D.M.; Dehghani, M. A Primer for Evaluating Large Language Models in Social-Science Research. Advances in Methods and Practices in Psychological Science 2025, 8, 25152459251325174. [Google Scholar] [CrossRef]
- Suri, G.; Slater, L.R.; Ziaee, A.; Nguyen, M. Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5. arXiv 2023, arXiv:cs.AI/2305.04400. [Google Scholar]
- Ecofin. Atos to deliver Senegal’s supercomputer in the coming months. https://www.ecofinagency.com/telecom/1501-39520-atos-to-deliver-senegal-s-supercomputer-in-the-coming-months, 2019. Accessed: 2025-12-14.
- Socialnetlink. Macky Sall exige la mise en service du Supercalculateur non fonctionnel depuis son acquisition à hauteur de 15 millions d’euros. https://www.socialnetlink.org/2022/01/27/macky-sall-exige-la-mise-en-service-du-supercalculateur-non-fonctionnel-depuis-son-acquisition-a-hauteur-de-15-millions-deuros/, 2022. Accessed: 2025-12-14.
- UNESCO. Global AI Ethics and Governance Observatory. https://www.unesco.org/ethics-ai/en/senegal, 2025. Accessed: 2025-12-14.
- Ecofin. Le Sénégal dévoile une stratégie à 206 millions de dollars pour numériser l’éducation. https://www.agenceecofin.com/actualites-numerique/0701-124683-le-senegal-devoile-une-strategie-a-206-millions-de-dollars-pour-numeriser-l-education, 2025. Accessed: 2025-12-14.
- RTS. Éducation: Le MEN lance un vaste programme d’intégration de l’intelligence artificielle à l’école. https://www.rts.sn/actualite/detail/a-la-une/education-le-men-lance-un-vaste-programme-dintegration-de-lintelligence-artificielle-a-lecole, 2025. Accessed: 2025-12-14.
- Presidency. Senegal Signs a $10 Million Strategic Partnership with the Gates Foundation to Accelerate the Technological New Deal. https://www.presidence.sn/en/actualites/senegal-signs-a-10-million-strategic-partnership-with-the-gates-foundation-to-accelerate-the-technological-new-deal-1, 2024. Accessed: 2025-12-14.
- Gueye, O. Le Sénégal en force au Sommet mondial pour l’Action sur l’Intelligence Artificielle à Paris. https://letechobservateur.sn/le-senegal-en-force-au-sommet-mondial-pour-laction-sur-lintelligence-artificielle-a-paris/, 2025. Accessed: 2025-12-14.
- Sikiru, R. From Senegal with Insights: Learnings from Deep Learning Indaba 2024. https://medium.com/@rasheedatsikiru/from-senegal-with-insights-learnings-from-deep-learning-indaba-2024-def6b334b596, 2024. Accessed: 2025-12-14.
- Fall, A. SALTIS 2025: Africa codes its future in Dakar. https://www.seneweb.com/en/news/Technologie/saltis-2025-lafrique-code-son-futur-a-dakar_n_464812.html, 2025. Accessed: 2025-12-14.
- Africa Tech Review. Senegal bolsters AI development with launch of ALIVE and DiCentre4AI laboratories. https://techreviewafrica.com/news/1997/senegal-bolsters-ai-development-with-launch-of-alive-and-dicentre4ai-laboratories, 2025. Accessed: 2025-12-14.
- Chalumeau, F.; Rajaonarivonivelomanantsoa, D.; de Kock, R.; Formanek, C.; Abramowitz, S.; Mahjoub, O.; Khlifi, W.; Toit, S.D.; Nessir, L.B.; Shabe, R.; et al. Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies. arXiv 2025, arXiv:2505.21236. [Google Scholar] [CrossRef]
- Google Scholar. Top publications. https://scholar.google.es/citations?view_op=top_venues&hl=en&vq=eng_artificialintelligence, 2025. Accessed: 2025-12-18.
- UNESCO. AI’s potential for Africa development and prosperity. https://www.unesco.org/en/articles/ais-potential-africa-development-and-prosperity, 2025. Accessed: 2025-12-14.
- Ruder, S.; Clark, J.; Gutkin, A.; Kale, M.; Ma, M.; Nicosia, M.; Rijhwani, S.; Riley, P.; Sarr, J.M.; Wang, X.; et al. XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 2023, pp. 1856–1884. [CrossRef]
- Ashesi. Using WhatsApp for Speech Dataset Building in Ghana. https://ashesi-org.github.io/dataset/nlp/ai/whatsapp/speech/2022/05/16/using-whatsapp-speech-dataset.html, 2022. Accessed: 2025-12-14.
| 1 | Traditionally framed in terms of access to internet connectivity. |
| 2 | Presidential Decree No 71-566 of May 21, 1971. |
| 3 | |
| 4 | According to recent estimates, Senegal’s population is approximately 18.5 million people [11]. |
| 5 | Process of representing a word, phrase, or text in a different script or writing system. |
| 6 | This variant is usually used in NLP research papers. |
| 7 | |
| 8 | A technique in machine learning (ML) in which knowledge learned from one task or language is reused to boost performance on a related task or language. |
| 9 | |
| 10 | Lexical Functional Grammar. |
| 11 | A morpheme that has syntactic characteristics of a word, but depends phonologically on another word or phrase. |
| 12 | The coverage has subsequently been extended to 20 African languages in [45] with no additional Senegalese languages. |
| 13 | FWLSA = French and Wolof Lexicon-based Sentiment Analysis. |
| 14 | The use of two or more languages in the same sentence. |
| 15 | Banking, Home, Travel, Utility, and Kitchen & Dining. |
| 16 | Texts translated by either humans or machines. |
| 17 | |
| 18 | A string of characters that defines a search pattern for matching text. |
| 19 | A service that uses the principles of REST (Representational State Transfer) to allow applications to communicate over the web using standard HTTP methods. |
| 20 | |
| 21 | |
| 22 | |
| 23 | |
| 24 | |
| 25 | Input-output pairs that teach AI models to follow instructions, with each data point typically containing an instruction, an input query, and the expected output response. |
| 26 | |
| 27 | Senegalese Artificial Intelligence community with thousands of members across Senegal, Africa, and the diaspora. |
| 28 | |
| 29 | An approach in which the dialog system is broken down into a pipeline of distinct sub-tasks. |
| 30 | A lifelong bleeding condition that makes it hard for blood to clot. |
| 31 | The authors used Certainly, a proprietary conversational AI platform, to train an NLU model. |
| 32 | Series of failures in a system of interconnected parts, where the failure of one component triggers the failure of others. |
| 33 | |
| 34 | Two versions (short/long) of the same vowel that both exist in the phoneme inventory of the language. |
| 35 | Gaussian Mixture Model, Deep Neural Network, and Hidden Markov Model. |
| 36 | |
| 37 | |
| 38 | |
| 39 | |
| 40 | Leading telecom operator in Senegal. |
| 41 | |
| 42 | |
| 43 | Conference on Research in Computer Science and its Applications. |
| 44 | |
| 45 | Study and design of how people and computers interact with each other, focusing on creating user-friendly and effective technology interfaces. |
| 46 | The task of learning to detect spoken keywords. |
| 47 | |
| 48 | Feedback shared by a small group of local social science researchers we interviewed. |
| 49 | Senegalese Committee for the Protection of Sensitive Data. |
| 50 | Senegalese Agency for Copyright and Related Rights. |
| 51 | |
| 52 | |
| 53 | |
| 54 | |
| 55 | |
| 56 | Neural Information Processing Systems. |
| 57 | |
| 58 | |
| 59 | |
| 60 | |
| 61 | |


| Language | Text corpora | Speech corpora | Existing tools |
|---|---|---|---|
| Wolof | OPUS, FLORES-200, NTREX, LORELEI, MAFAND-MT, SMOL, MADLAD-400, MasakhaNER, MasakhaPOS, Universal Dependencies, AfriQA, AYA, AfriWOZ-1.0, Belebele, FineWeb2, AWOFRO, WolBanking77, Masakhane-NLU, AjamiXTranslit | AI4D-Urban, ALFFA, WolBanking77, Waxal, Fleurs, Kallaama, AI4D-Baamtu, Keyword Spotting | Wolof keyboards, Wolof library, Stanza, MorphScore, Common Voice, Dvoice, AfroLID, GlotLID |
| Pulaar | AjamiXTranslit, FineWeb2, Fulani-English Pair Dataset, MADLAD-400, SMOL | Kallaama, Keyword Spotting | - |
| Sereer | - | Kallaama, Keyword Spotting | - |
| Joola | - | Keyword Spotting | - |
| Mandinka | - | Keyword Spotting | - |
| Soninké | AjamiXTranslit | Keyword Spotting | - |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
