Submitted: 11 October 2025
Posted: 15 October 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Mobile Documentation
2.2. AI for Languages with Limited Resources
2.3. Ethics and Digital Archives
2.4. Research Gap
- Methodological contribution: Offers an integrated mobile + AI platform for recording tribal oral traditions, customary law, and ecological knowledge. Develops a mixed-methods strategy that combines ethnography, mobile data collection, and AI evaluation.
- Technological contribution: Demonstrates the use of ASR, NLP, and GIS mapping in low-resource tribal language contexts. Offers a community-run mobile prototype for knowledge archiving.
- Sociocultural contribution: Preserves intangible heritage beyond language (customary rules, rituals, ecological knowledge), strengthening tribal identity. Encourages intergenerational cooperation, in which young people operate the digital tools and elders share oral traditions.
- Ethical contribution: Incorporates Indigenous Data Sovereignty and the CARE Principles into an AI-based preservation framework. Provides safeguards against exploitation and ensures cultural sensitivity and community control.
- Policy contribution: Offers recommendations for incorporating indigenous knowledge and languages into legal, educational, and e-governance frameworks. Bridges the gap between policy inclusion and technology adoption in tribal contexts.
3. Methodology
3.1. Research Design
3.2. Information Gathering
- To gather oral traditions, ecological knowledge, and attitudes towards digital preservation, tribal elders and young people will participate in semi-structured interviews and focus group discussions (FGDs).
- Tools used: Digital audio recorders (Zoom H1n, Tascam DR-05X) will be used to capture high-quality speech data, and Transcribe and ELAN will be used for transcription and annotation.
- A custom mobile application prototype, inspired by the Aikuma and FirstVoices platforms, will be developed to capture oral histories, folk music, traditional practices, and ecological knowledge.
- The application will provide time-aligned annotations and offline-first storage, enabling use in remote locations with poor connectivity.
- ELAN and Praat will be used for audio annotation, including phonetic labeling and linguistic segmentation.
- Text annotations will be saved in structured formats (JSON/XML) for compatibility with AI training pipelines.
- The data will include tribal-dialect recordings aligned with Hindi and English translations, enabling multilingual machine translation.
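A minimal sketch of how a time-aligned, multi-tier annotation could be stored as JSON, assuming millisecond timestamps and one tier per content type (a transcription tier plus Hindi and English translation tiers). The class names, field names, and tier names are hypothetical; this is not ELAN's EAF format or the prototype's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Segment:
    """One time-aligned annotation on a tier (times in milliseconds)."""
    start_ms: int
    end_ms: int
    text: str


@dataclass
class AnnotatedRecording:
    """Hypothetical record: one recording with ELAN-style parallel tiers."""
    recording_id: str
    language: str  # e.g. an ISO 639 code such as "gon" (Gondi)
    duration_ms: int
    tiers: dict[str, list[Segment]] = field(default_factory=dict)

    def add(self, tier: str, start_ms: int, end_ms: int, text: str) -> None:
        # Reject segments with no duration or outside the recording.
        if not 0 <= start_ms < end_ms <= self.duration_ms:
            raise ValueError(f"invalid segment [{start_ms}, {end_ms}] on tier {tier!r}")
        segments = self.tiers.setdefault(tier, [])
        # Within one tier, segments must not overlap (touching edges are allowed).
        for s in segments:
            if start_ms < s.end_ms and s.start_ms < end_ms:
                raise ValueError(f"segment overlaps existing [{s.start_ms}, {s.end_ms}]")
        segments.append(Segment(start_ms, end_ms, text))
        segments.sort(key=lambda s: s.start_ms)

    def to_json(self) -> str:
        # asdict recurses into the dict of lists of Segment dataclasses.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    rec = AnnotatedRecording("rec-001", "gon", 12_000)
    rec.add("transcription", 0, 2_500, "<tribal-language text>")
    rec.add("translation-hi", 0, 2_500, "<Hindi translation>")
    rec.add("translation-en", 0, 2_500, "<English translation>")
    print(rec.to_json())
```

Keeping the transcription and translation tiers on one shared time axis is what lets each segment later serve both as an aligned speech-text pair for ASR training and as a parallel sentence for MT.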
3.3. AI Tools Applied
- ASR method: Annotated tribal datasets will be used to train end-to-end deep learning models based on Transformer architectures.
- Output: Speech-to-text transcription of tribal languages.
- Evaluation metric: Word Error Rate (WER).
- MT frameworks: Hugging Face Transformers (mBART, mT5), Marian NMT, and OpenNMT will be used for multilingual translation.
- Method: Neural sequence-to-sequence (Seq2Seq) models with attention mechanisms will be used for tribal ↔ Hindi/English translation.
- Evaluation metrics: BLEU and METEOR scores.
- GIS tools: Google Earth Engine, ArcGIS Pro, and QGIS.
- Method: Sacred groves, water sources, and migration routes will be recorded using GPS-enabled mobile phones, and land-use changes will be verified using AI-assisted remote sensing.
- Output: Georeferenced cultural-ecological maps.
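GPS records collected on mobile phones could be exported as GeoJSON, a format QGIS opens directly. The sketch below assumes each record carries latitude, longitude, a site type, and a name; these field names and the site-type vocabulary are hypothetical. It handles point sites only; migration routes would be stored as LineString features in the same way.

```python
import json

# Hypothetical site-type vocabulary for point features.
SITE_TYPES = {"sacred_grove", "water_source"}


def to_geojson(records: list[dict]) -> dict:
    """Convert GPS records {'lat', 'lon', 'site_type', 'name'} into a GeoJSON FeatureCollection."""
    features = []
    for r in records:
        if r["site_type"] not in SITE_TYPES:
            raise ValueError(f"unknown site type: {r['site_type']!r}")
        if not (-90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180):
            raise ValueError(f"coordinates out of range: {r['lat']}, {r['lon']}")
        features.append({
            "type": "Feature",
            # RFC 7946: positions are [longitude, latitude] in WGS 84.
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"name": r["name"], "site_type": r["site_type"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    collection = to_geojson([
        {"lat": 21.5, "lon": 80.1, "site_type": "sacred_grove", "name": "Grove A"},
    ])
    with open("sites.geojson", "w", encoding="utf-8") as f:
        json.dump(collection, f, ensure_ascii=False)
```

The coordinate order is the most common pitfall: GeoJSON stores longitude first, while phone GPS readouts usually display latitude first.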
3.4. Data Analysis
3.5. Ethical Protocols
- Platform: Community-owned archives will be built using Mukurtu CMS.
- Method: Role-based access management, including restricted access to sacred music and rites.
- Purpose: Ensures adherence to customary norms and protects cultural privacy.
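Role-based restrictions of this kind can be expressed as a default-deny check. Mukurtu CMS implements access through its own cultural protocols; the sketch below does not use or reproduce Mukurtu's API, and its three access levels, the community-membership rule, and the custodian list are hypothetical stand-ins for whatever protocols a community defines.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    COMMUNITY = "community"    # members of the originating community
    RESTRICTED = "restricted"  # e.g. sacred music or rites: named custodians only


@dataclass(frozen=True)
class Item:
    item_id: str
    community: str
    access: Access
    custodians: frozenset[str] = frozenset()


@dataclass(frozen=True)
class User:
    user_id: str
    communities: frozenset[str] = frozenset()


def can_view(user: User, item: Item) -> bool:
    """Default-deny: grant access only when an explicit rule allows it."""
    if item.access is Access.PUBLIC:
        return True
    if item.access is Access.COMMUNITY:
        return item.community in user.communities
    if item.access is Access.RESTRICTED:
        return user.user_id in item.custodians
    return False
```

Denying by default means a newly added or mislabeled access level stays closed until the community explicitly opens it, which matches the protective intent of the protocol.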
4. Results
4.1. Automatic Speech Recognition: Word Error Rate
- The WER was 32% at 10 hours of speech data.
- When the speech data reached 50 hours, the WER dropped to 18%.
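$$\mathrm{WER} = \frac{S + D + I}{N}$$
where:
- S = number of substitutions,
- D = number of deletions,
- I = number of insertions,
- N = total number of words in the reference transcript.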
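WER can be computed by aligning the hypothesis to the reference with a word-level Levenshtein (edit-distance) dynamic program: the minimum number of edits equals S + D + I for the best alignment, and dividing by the N reference words gives WER = (S + D + I) / N. The sketch below assumes whitespace tokenization and no text normalization; a maintained package such as jiwer is likely preferable in practice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match when sub == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one insertion (e) over 4 reference words.
    print(word_error_rate("a b c d", "a x c d e"))  # 0.5
```

Because insertions are counted while N is fixed by the reference, WER can exceed 1.0 (100%) when the hypothesis contains many extra words.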
4.2. NLP & MT Performance: BLEU Score Analysis
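- BLEU scores increased from 12 at epoch 1 to 31 at epoch 10.
- METEOR scores followed a similar trend, stabilizing at 0.62.
$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad \mathrm{BP} = \min\left(1,\; e^{\,1 - r/c}\right)$$
where:
- $p_n$ = modified n-gram precision,
- $w_n$ = weight of n-gram order $n$ (commonly uniform, $w_n = 1/N$ with $N = 4$),
- BP = brevity penalty, with $c$ the candidate length and $r$ the reference length.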
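For illustration, a single-reference, sentence-level BLEU with uniform weights w_n = 1/N (N = 4) and brevity penalty BP = min(1, e^(1 − r/c)) can be written as below. It applies no smoothing, so any n-gram order with zero matches yields 0. Published scores are usually computed at corpus level with standardized tooling such as sacreBLEU and reported on a 0–100 scale, i.e. 100 times this function's output. Whitespace tokenization is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """BLEU = BP * exp(sum_n w_n log p_n), uniform w_n = 1/max_n, single reference, no smoothing."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Modified precision: clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, a zero precision makes BLEU zero
        log_sum += (1.0 / max_n) * math.log(clipped / total)
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # equals min(1, e^(1 - r/c))
    return bp * math.exp(log_sum)
```

In the example of a candidate that matches the reference exactly except for one missing final word, every precision is 1 and the score is reduced only by the brevity penalty, e^(1 − 7/6) for a 6-word candidate against a 7-word reference.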
4.3. Community Feedback Analysis
- Average trust in AI tools: 4.1
- Cultural relevance: 3.8
- Usability (mobile applications): 4.3
- Privacy concerns: 3.5
4.4. GIS Mapping
- Outcome: Community elders validated the spatial accuracy of the maps, confirming a 90% overlap with their orally transmitted ecological maps.
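The method used to compute the 90% overlap is not specified here. One possible operationalization, shown only as a hypothetical sketch, counts the share of elder-identified sites that have a GPS-recorded site within a distance tolerance, using the haversine great-circle distance; the 100 m default tolerance and the point-based comparison are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_rate(
    elder_sites: list[tuple[float, float]],
    gps_sites: list[tuple[float, float]],
    tolerance_m: float = 100.0,
) -> float:
    """Share of elder-identified sites with at least one GPS-recorded site within tolerance_m."""
    if not elder_sites:
        raise ValueError("no elder-identified sites to compare")
    matched = sum(
        any(haversine_m(lat, lon, g_lat, g_lon) <= tolerance_m for g_lat, g_lon in gps_sites)
        for lat, lon in elder_sites
    )
    return matched / len(elder_sites)
```

A polygon-based measure (for example, intersected area of sacred-grove boundaries) would be an equally plausible reading; the choice of tolerance strongly affects the resulting percentage and would need to be reported alongside it.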
4.5. Qualitative Analysis
4.6. Ethics
5. Discussion


| Domain | Result | Discussion/Interpretation |
|---|---|---|
| ASR | WER 18–22% | Accurate for everyday speech; struggled with culture-specific terminology; scores improved with elder validation. |
| NLP/MT | BLEU 24–30; METEOR 0.55–0.62 | Literal translations are good, but idiomatic expressions are lost; hybrid AI-human validation is required. |
| GIS Mapping | 90% accuracy validated by elders | Beneficial for ecological conservation and land-rights claims; potential for misuse if not under community control. |
| Qualitative Themes | Digital literacy gap, trust concerns, identity preservation | Technology must be co-designed with communities to respect cultural dynamics. |
| Ethics | CARE principles, Mukurtu-based archives | Ethical safeguards are essential and align with Indigenous Data Sovereignty. |
6. Conclusion
7. Future Work
References
1. Bird, S. (2014). Aikuma: A mobile app for collaborative language documentation. ACL Workshops.
2. Carroll, S. R., et al. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43.
3. Christen, K. (2012). Does information really want to be free? Indigenous knowledge systems and the question of openness. International Journal of Communication, 6, 2870–2893.
4. Coto-Solano, R., et al. (2022). Automatic Speech Recognition for Cook Islands Māori. Proceedings of LREC 2022.
5. First Peoples’ Cultural Council (FPCC). (2019). FirstVoices: Indigenous language archiving and teaching platform.
6. Mehta, D., et al. (2020). Learnings from technological interventions in a low-resource language: A case-study on Gondi. arXiv preprint arXiv:2004.10270.
7. Te Hiku Media. (2021). ASR for te reo Māori: Community-owned AI for language revitalization.
8. The Hindu. (2025, January 15). Adi Vaani: Digital governance for tribal language inclusion. The Hindu.


| Theme | Existing Research | Identified Gaps | Implication for Present Study |
|---|---|---|---|
| Mobile Technology in Tribal Knowledge | Mobile apps such as Aikuma and IVR-based interventions in Gondi have successfully documented oral traditions [1,6]. | Most research is language-focused, with little attention to documenting ecological knowledge, customary law, and cultural practices through mobile tools. | The study extends mobile technology beyond language to encompass law, culture, and ecology. |
| AI Applications (ASR/NLP/MT) | Te Hiku Media's te reo Māori ASR and Cook Islands Māori ASR models demonstrate the feasibility of ASR for endangered languages [4,7]. | Few tribal case studies exist from India, and little is known about AI accuracy in low-resource tribal languages. | The study applies AI (ASR, NLP) and GIS in Indian tribal settings (e.g., Bhil, Gondi). |
| Digital Archives & Community Platforms | FirstVoices and Mukurtu CMS offer culturally sensitive digital archives [3,5]. | Few archives exist for Indian tribes, and those that do are not integrated with mobile or AI technologies. | The study proposes a mobile + AI + community archive model for India. |
| Ethical Frameworks & Data Sovereignty | The Indigenous Data Sovereignty (IDS) movement and the CARE Principles emphasize community control [2]. | Uptake of IDS/CARE in Indian tribal research is low, and AI initiatives generally lack ethical safeguards. | The study integrates the CARE Principles into the mobile-AI system to ensure ethical preservation. |
| Policy Integration | Tribal languages are frequently left out of India's e-governance programs [8]. | There is no formal framework connecting national/state policy with AI-based tribal knowledge preservation. | The study develops policy recommendations to connect governance and technology. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
