Preprint
Article

This version is not peer-reviewed.

Digital Platform for Teaching and Preservation of the Kayapó Indigenous Language

Submitted: 13 June 2026

Posted: 16 June 2026

Abstract
The preservation of Indigenous languages is hindered by the scarcity of technological resources and the lack of culturally appropriate educational tools, a reality that is evident in the Kayapó (Mebêngôkre) language. This article presents the development of a bilingual digital platform aimed at language teaching, learning, and linguistic preservation, designed through bibliographic research and software engineering principles. The study combines the identification of sociocultural requirements, prototyping with emerging technologies, and comparative functional validation. The results indicate that the platform is feasible, accessible, and compatible with Kayapó language learning practices, contributing to community engagement and linguistic revitalization.

1. Introduction

In Brazil, the country’s vast ethnic and linguistic diversity contrasts with the fact that many Indigenous languages are at risk of extinction, as a result of historical processes of colonization, territorial displacement, and educational models that disregard native epistemologies (Alnizar, 2023). This scenario compromises the intergenerational transmission of knowledge and cultural practices, increasing educational and social inequalities. Among the languages in vulnerable situations, the Kayapó (Mebêngôkre) language stands out, spoken by communities in southern Pará, whose continuity depends on initiatives aimed at linguistic strengthening and valorization (Zoia and Rondon, 2021).
Despite advances in emerging technologies applied to education and software development, there is still a significant gap in the development of technological solutions focused on Brazilian Indigenous languages. Although technologies such as Artificial Intelligence (AI), cloud computing, and automation have improved the quality and scalability of digital systems (Amalfitano et al., 2024; Angrish et al., 2017; Abboud and Jacob, 2023), many of these solutions show limited adaptation to the cultural and sociolinguistic specificities of minority communities. Studies indicate that low-resource languages remain underserved due to the scarcity of data, infrastructure, and adequate technological tools (Haddow et al., 2022; Liu et al., 2022). Furthermore, linguistic preservation initiatives often result in isolated projects with limited integration into community practices, highlighting the need for more robust and pedagogically oriented digital platforms to support the teaching and preservation of these languages (Alnizar, 2023; Zoia and Rondon, 2021; Criollo et al., 2023; Heyndels, 2023).
From this perspective, this research is justified by the need to expand the development of technological solutions aimed at valuing and strengthening Indigenous languages. The creation of a digital environment for the Kayapó (Mebêngôkre) language makes it possible to integrate technological resources with educational and cultural practices, contributing to the dissemination of linguistic knowledge and supporting cultural preservation. In addition, the use of educational technologies can facilitate access to learning materials and stimulate new forms of interaction with the language, strengthening its presence in digital and educational environments while promoting the appreciation of Indigenous linguistic heritage.
In this context, this study proposes the development of a virtual learning platform based on digital technologies, conceived as a social technology to support the learning and preservation of the Kayapó (Mebêngôkre) Indigenous language. This study did not involve data collection from human participants; it was based exclusively on publicly available data and computational analysis of an expandable functional software model (Angrish et al., 2017), following an iterative prototyping approach supported by AI techniques for testing and architecture optimization (Bjarnason, Lang, and Mjöberg, 2023). The research also adopts an intercultural perspective, treating the technology as a communication support tool for field agents of SESAI rather than a system intended for direct access by members of the Kayapó community.
The remainder of this article is organized as follows: Section 2 presents the Kayapó language. Section 3 presents related work. Section 4 describes the adopted methodology. Section 5 presents the results and discussion. Finally, Section 6 presents the concluding remarks.

2. The Kayapó Language (Mebêngôkre)

The Kayapó language, known as Mebêngôkre, belongs to the Jê family of the Macro-Jê linguistic trunk, one of the main Indigenous linguistic groupings in Brazil (Rodrigues, 1986). It is spoken by communities located in the states of Pará and Mato Grosso, with a strong presence in southern Pará (Moore, Galucio, and Gabas Jr., 2008). In addition to presenting linguistic characteristics typical of Jê languages, Mebêngôkre plays a central role in the cultural identity of the Kayapó people, serving as a means for transmitting traditional knowledge, cultural practices, and community values (Turner, 1995). Thus, the language not only enables everyday communication but also contributes to cultural continuity and the strengthening of the collective identity of the Kayapó.

4. Materials and Methods

This research is characterized as applied, technological, and documentary in nature, aiming at the development of a bilingual digital platform (Kayapó–Portuguese) focused on teaching, learning, and language preservation. The study adopts a sociotechnical and interdisciplinary approach, integrating foundations of Software Engineering, emerging technologies, and principles of social technology, with emphasis on creating an accessible and culturally contextualized solution.
The methodological process was structured into four main stages: requirements gathering, system development, technological implementation, and functional validation. Initially, the study involved the collection and analysis of sociocultural, linguistic, and technical requirements based on specialized literature and publicly available databases related to the Kayapó language, as illustrated in Figure 1. This stage followed principles of problem-oriented requirements engineering, aiming to understand the system requirements and define its essential functionalities (Zave and Jackson, 1997).
During the development stage, the Model-View-Controller (MVC) architectural model was adopted, as it is widely used in web applications for promoting the separation between business logic, user interface, and data control. This approach contributes to improving the structural organization of the system, facilitating maintenance, and enabling application scalability. The development of the platform employed open-source web technologies, including PHP for server-side processing, SQL for database management, HTML and CSS for interface structure and styling, and JavaScript for dynamic user interaction. The Bootstrap library was used to ensure responsiveness and compatibility across different devices, especially smartphones and tablets (Li, Wang, and Zhang, 2023).
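The separation of concerns that MVC promotes can be illustrated with a minimal sketch. The platform itself is written in PHP; the Python below is only an analogy, and every name in it (`Entry`, `LexiconModel`, `render_entry`, `LookupController`) is hypothetical rather than taken from the platform's code. Placeholder strings stand in for real vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One lexical entry (hypothetical shape, not the platform's schema)."""
    portuguese: str
    kayapo: str


class LexiconModel:
    """Model: owns the data and the lookup logic."""

    def __init__(self, entries):
        # Index by case-folded Portuguese headword for case-insensitive lookup.
        self._by_portuguese = {e.portuguese.casefold(): e for e in entries}

    def find(self, portuguese_word):
        return self._by_portuguese.get(portuguese_word.strip().casefold())


def render_entry(entry):
    """View: turns a model result into display text and nothing else."""
    if entry is None:
        return "No entry found."
    return f"{entry.portuguese} -> {entry.kayapo}"


class LookupController:
    """Controller: receives the request, queries the model, selects the view."""

    def __init__(self, model):
        self._model = model

    def handle_search(self, query):
        return render_entry(self._model.find(query))


# Placeholder values only; no real Kayapó vocabulary is asserted here.
model = LexiconModel([Entry("termo-exemplo", "<kayapo-term>")])
controller = LookupController(model)
print(controller.handle_search("  Termo-Exemplo "))
```

Because the view never touches storage and the model never formats output, either side can be replaced (for example, a table-based HTML view) without changing the other.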
To enhance the robustness and security of the system, this stage also incorporated language-model-based techniques for exploratory vulnerability analysis and software security testing, which can be implemented through machine learning frameworks such as TensorFlow. AI-based models help identify failure patterns and potential software risks, contributing to the continuous improvement of system quality (Topcu et al., 2023).
The source code of the Kayapó platform was submitted to different language models (GPT-2 Small, BERT, T5, and XLNet) through structured prompts describing each vulnerability category from the OWASP Top 10 framework. For each category, each model was instructed to identify risk patterns in the code and assign a compliance score ranging from 0 to 100. Following the feedback alignment principles proposed by Ouyang et al. (2022), the model responses were compared by two evaluators with expertise in software development. Disagreements were resolved by consensus on which response identified vulnerabilities with greater accuracy and depth. This comparative judgment informed a final scoring criterion for each category.
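As a rough sketch of the procedure above, the following shows how a structured prompt per category might be assembled, how a 0 to 100 score could be parsed from a response, and how two evaluators' preferences could be reconciled. The prompt wording, the `SCORE:` convention, and all function names are assumptions made for illustration; the study does not publish its prompts or tooling.

```python
import re


def build_prompt(category, source_code):
    """Assemble one structured prompt for a single vulnerability category."""
    return (
        f"Review the following code for the OWASP Top 10 category '{category}'.\n"
        "Identify risk patterns, then end with a line 'SCORE: <0-100>' "
        "giving a compliance score.\n\n"
        f"{source_code}"
    )


def parse_score(response):
    """Return the last 'SCORE: n' value, or None if absent or outside 0-100."""
    matches = re.findall(r"SCORE:\s*(\d{1,3})\b", response)
    if not matches:
        return None
    score = int(matches[-1])
    return score if 0 <= score <= 100 else None


def resolve_preference(choice_a, choice_b, consensus=None):
    """Keep the response both evaluators prefer; otherwise require consensus."""
    if choice_a == choice_b:
        return choice_a
    if consensus is None:
        raise ValueError("evaluators disagree; a consensus choice is required")
    return consensus


# Two categories named in the results; the code snippet is a placeholder.
for category in ("Insecure Design", "Software and Data Integrity Failures"):
    prompt = build_prompt(category, "<?php /* code under review */ ?>")
    print(prompt.splitlines()[0])
```

Rejecting out-of-range or missing scores, instead of clamping them, keeps malformed model output from silently entering the totals.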
The final stage consisted of the functional validation of the platform, carried out through a comparative analysis with different technologies and existing approaches in the field, including neural machine translation systems, academic initiatives such as AmericasNLP, and linguistic documentation platforms such as FirstVoices and Enduring Voices Project. The evaluation considered criteria such as objectives, methodology, system architecture, accessibility, and social technology transfer, based on methodologically valid procedures for comparative analysis in applied sciences and principles of multicriteria decision-making (Saaty and Vargas, 2012; Pant et al., 2022).

5. Results and Discussion

The results indicate that the proposed platform is technically feasible as a support for language preservation initiatives. Functional validation confirmed the proper operation of the implemented modules, especially the word registration module, which allows the insertion of terms in the Kayapó language along with their respective Portuguese translations and associated audio files for correct pronunciation. In its current version, the platform contains 200 registered terms for demonstration purposes, a number that can be expanded as users require new specific terms.
Figure 2 illustrates an organized interface, featuring a layout divided between a side navigation menu and a central content panel. The menu presents essential modules identified by intuitive icons, facilitating quick navigation. The interface prioritizes visual clarity, while the light/dark theme selector enhances accessibility, visual comfort, and personalization of the user experience. The central interface concentrates the system’s main functionality: searching for and playing back words in the Kayapó language. The header integrates an Indigenous cultural icon along with a search bar for efficient term retrieval. Words are displayed in a table containing their Portuguese equivalents, Kayapó translations, and either synthesized or native audio playback for pronunciation support. Playback controls, integration with VLibras, and zoom features further reinforce accessibility, inclusion, and interface adaptability.
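A word registration module of this kind can be sketched with a single table holding the term, its translation, and a path to the audio file. The example below uses Python's built-in `sqlite3` with an in-memory database; the table layout, column names, and function names are assumptions for illustration, not the platform's actual schema, and the Kayapó value is a placeholder.

```python
import sqlite3


def open_lexicon():
    """Create an in-memory database with a hypothetical words table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE words (
            id INTEGER PRIMARY KEY,
            kayapo TEXT NOT NULL,
            portuguese TEXT NOT NULL,
            audio_path TEXT
        )
        """
    )
    return conn


def register_word(conn, kayapo, portuguese, audio_path=None):
    """Insert one term with its translation and optional audio file path."""
    cur = conn.execute(
        "INSERT INTO words (kayapo, portuguese, audio_path) VALUES (?, ?, ?)",
        (kayapo, portuguese, audio_path),
    )
    conn.commit()
    return cur.lastrowid


def search_portuguese(conn, fragment):
    """Return rows whose Portuguese headword contains the fragment."""
    rows = conn.execute(
        "SELECT portuguese, kayapo, audio_path FROM words "
        "WHERE portuguese LIKE ? ORDER BY portuguese",
        (f"%{fragment}%",),
    )
    return rows.fetchall()


conn = open_lexicon()
register_word(conn, "<kayapo-term>", "casa", "audio/casa.mp3")
print(search_portuguese(conn, "cas"))
```

Storing a file path instead of the audio bytes keeps the table small; the audio files themselves would live on disk or in object storage.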
The light mode, in particular, was designed as a digital inclusion and pedagogical engagement feature, improving usability in well-lit environments and enhancing the experience of users with light sensitivity (Li, Wang, and Zhang, 2023).
This feature reinforces the platform’s commitment to ensuring that the teaching, learning, and preservation of the Kayapó language take place in an accessible, comfortable, and democratic manner, as illustrated on the home screen with the light mode activated in Figure 3.
Through the use of language models, the source code was evaluated according to web development security criteria based on the methodological perspectives proposed by Ouyang et al. (2022). The analysis revealed the appropriate use of prepared statements, bind_param, and password_hash, ensuring protection against SQL injection attacks and preserving password integrity. However, aspects related to input validation, session management, and error message handling still require further improvement.
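The two protections noted above, parameter binding and salted password hashing, can be illustrated outside PHP. The sketch below is a Python analogue, not the platform's code: `?` placeholders in `sqlite3` play the role of `bind_param`, and PBKDF2 from `hashlib` stands in for `password_hash` (which uses a different algorithm by default). Table, column, and user names are hypothetical.

```python
import hashlib
import hmac
import secrets
import sqlite3

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Derive a salted hash; a fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def find_user(conn, username):
    """Bound parameter: the input is treated as data, never as SQL text."""
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
salt, pw_hash = hash_password("correct horse")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("agent01", salt, pw_hash))

print(find_user(conn, "agent01"))
print(find_user(conn, "agent01' OR '1'='1"))  # injection attempt matches no row
print(verify_password("correct horse", salt, pw_hash))
```

The injection string is compared literally against the `username` column, so it returns no row instead of altering the query.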
The results of this evaluation, summarized in Table 1, present the scores obtained by each language model across seven vulnerability categories drawn from the OWASP Top 10 framework, with scores ranging from 0 to 100 for each category and a maximum possible total score of 700 points.
The vulnerability analysis results revealed significant differences among the evaluated models. T5 achieved the best overall performance (496 points), standing out in the detection of critical vulnerabilities, particularly in Insecure Design and Software and Data Integrity Failures, which confirms its high capability in identifying complex security risks. XLNet demonstrated robust performance (470 points), maintaining consistency in highly critical scenarios. GPT-2 Small (450 points) showed limitations related to design flaws and authentication vulnerabilities, while BERT obtained the lowest score (265 points), exhibiting greater sensitivity to cryptographic weaknesses. The integration of advanced Natural Language Processing (NLP) models proved effective in strengthening system security, improving code structural quality, and increasing software resilience.
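To put the reported totals on a common scale, each can be expressed as a share of the 700-point maximum (seven categories at 100 points each). The short sketch below uses only the totals stated in the text; the per-category scores in Table 1 are not reproduced, and the helper name `as_percentage` is illustrative.

```python
MAX_PER_CATEGORY = 100
N_CATEGORIES = 7
MAX_TOTAL = MAX_PER_CATEGORY * N_CATEGORIES  # 700 points

# Totals as reported in the text.
reported_totals = {"T5": 496, "XLNet": 470, "GPT-2 Small": 450, "BERT": 265}


def as_percentage(total, max_total=MAX_TOTAL):
    """Express a total score as a percentage of the maximum, to one decimal."""
    if not 0 <= total <= max_total:
        raise ValueError(f"total {total} outside 0..{max_total}")
    return round(100 * total / max_total, 1)


ranking = sorted(reported_totals.items(), key=lambda kv: kv[1], reverse=True)
for name, total in ranking:
    print(f"{name}: {total}/{MAX_TOTAL} = {as_percentage(total)}%")
```

On this scale T5 reaches about 70.9% of the maximum and BERT about 37.9%, which makes the gap between the best and worst totals easier to read than raw points.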
In comparison, the Kayapó platform adopts a modular architecture based on the Model-View-Controller (MVC) pattern, using open-source technologies such as PHP, SQL, JavaScript, and Bootstrap, along with intelligent language models for security testing, optimization, and scalability. The object-oriented modeling approach, the use of classes, and the structured development methodology demonstrate a level of Software Engineering rigor that is absent in most existing documentation-oriented tools (Nguyen et al., 2022), as shown in Table 2.
The state-of-the-art analysis demonstrates that, although significant advances have been achieved in machine translation for Indigenous languages, most existing solutions remain restricted to experimental or academic environments, with limited attention given to practical deployment, accessibility, and social impact.
In this context, the platform presented in this article distinguishes itself by integrating Software Engineering, Artificial Intelligence (AI), and social technology, positioning itself not only as a computational system but also as a replicable model for language preservation, aligned with the real needs of the Kayapó community and international quality standards (Kann et al., 2022).
For the construction of the radar chart, the qualitative descriptors presented in Table 2 were converted into normalized numerical scores (0–100%), following an ordinal scale based on the approach proposed by Kusmaryono, Wijayanti, and Maharani (2022), which establishes the assignment of proportional numerical values to each qualitative response category as a methodologically valid procedure for comparative analysis in applied sciences, combined with the principles of multicriteria decision-making (Saaty and Vargas, 2012; Pant et al., 2022).
Each descriptor received a value proportional to its relative position within the evaluation spectrum of the analyzed criterion: Absent and Not Provided were coded as 0%; Limited as 20%; Partial as 50%; Moderate as 60%; and Core to the Project or Priority as 100%, as illustrated in Figure 4.
This conversion enables comparative visual representation across multiple axes simultaneously, preserving the ordinal relationships among the systems without assuming interval equivalence between the original qualitative descriptors. The criteria of Accessibility and Social Transfer received 0% for Neural Machine Translation (NMT) and AmericasNLP (both classified as Absent and Not Provided), intermediate scores for FirstVoices and Enduring Voices Project (Partial and Moderate), and scores between 90% and 100% for the Kayapó Platform, due to the incorporation of VLibras, responsive design, and its central role within the project.
The criteria of Objective, Methodology, Architecture, and Cultural Validation followed the same proportional logic. The visual results clearly demonstrate that the Kayapó Platform achieves broader coverage in the social and accessibility dimensions, while NMT systems and AmericasNLP remain predominantly restricted to technical and academic domains.
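The descriptor-to-score conversion described above amounts to a lookup table. A minimal sketch follows, using the percentages stated in the text; the function names and the sample row are illustrative and do not reproduce Table 2.

```python
# Ordinal coding stated in the text (percent).
DESCRIPTOR_SCORES = {
    "Absent": 0,
    "Not Provided": 0,
    "Limited": 20,
    "Partial": 50,
    "Moderate": 60,
    "Core to the Project": 100,
    "Priority": 100,
}


def to_score(descriptor):
    """Map one qualitative descriptor to its normalized score."""
    try:
        return DESCRIPTOR_SCORES[descriptor.strip()]
    except KeyError:
        raise ValueError(f"unknown descriptor: {descriptor!r}") from None


def radar_row(descriptors_by_criterion):
    """Convert one system's descriptors into the values plotted on each axis."""
    return {criterion: to_score(d) for criterion, d in descriptors_by_criterion.items()}


# Illustrative row only; the actual descriptors are those listed in Table 2.
example = {"Accessibility": "Absent", "Social Transfer": "Not Provided",
           "Architecture": "Partial"}
print(radar_row(example))
```

Raising an error on an unknown descriptor, instead of defaulting to 0, avoids silently scoring a misspelled label as "Absent".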

6. Final Considerations

This study presented the development of a bilingual digital platform designed for the teaching, learning, and preservation of the Kayapó (Mebêngôkre) Indigenous language, structured on the principles of Software Engineering, emerging technologies, and a sociotechnical perspective. The proposal sought to address the existing gap in the availability of technological solutions aimed at low-resource languages by integrating lexical access functionalities, oral language support, and accessibility features within a unified digital environment.
The results obtained indicate that the platform demonstrates technical feasibility and potential application in educational and linguistic preservation contexts, as it provides a modular architecture based on the Model-View-Controller (MVC) pattern while incorporating features that enhance usability, accessibility, and user interaction. Comparative analysis with existing systems showed that the proposed solution expands beyond the traditional focus of linguistic tools by integrating technical, educational, and social dimensions into a single application.
From a scientific perspective, this work contributes by proposing an integrated approach between software engineering and linguistic preservation, highlighting the role of digital technologies as instruments to support the valorization of intangible cultural heritage. Furthermore, the use of language models for exploratory security analysis reinforces the discussion regarding the application of Artificial Intelligence techniques in supporting software development, even if in a complementary manner.
However, the study presents important limitations. The main limitation concerns the absence of empirical validation with end users, since the evaluation was restricted to functional and comparative system analysis. Additionally, the intercultural perspective adopted was based solely on bibliographic review, without involving direct interaction with the Kayapó community. Another relevant point is the exploratory nature of the security analysis based on language models, which does not replace specialized verification and security testing tools.
As future work, it is proposed to conduct empirical studies involving user participation, especially members of Indigenous communities, in order to validate the platform’s usability, pedagogical effectiveness, and cultural adequacy. It is also recommended to expand the linguistic resources by increasing the vocabulary database, which is still under development, and integrating the system with more robust linguistic databases. Finally, the proposed approach demonstrates strong potential for replication across other Indigenous languages, contributing to broader language preservation initiatives.
In summary, this study reinforces the idea that the development of digital technologies guided by social and cultural principles can represent an important strategy for the preservation, strengthening, and appreciation of Indigenous languages.

References

  1. Abboud, D.; Jacob, D. Implementation of a new authorization system from monolithic solution to microservice architecture. 2023. Available at: https://arxiv.org/abs/2307.05994.
  2. Alnizar, F. Review of Eda Derhemi and Christopher Moseley (eds.). Endangered Languages in the 21st Century. London: Routledge, 2023. Russian Journal of Linguistics, v. 27, n. 3, p. 745–749, 2023.
  3. Amalfitano, D.; Faralli, S.; Hauck, J. C. R.; Matalonga, S.; Distante, D. Artificial intelligence applied to software testing: a tertiary study. ACM Computing Surveys, v. 56, n. 3, art. 58, p. 1–38, 2024.
  4. Angrish, A.; Starly, B.; Lee, Y.-S.; Cohen, P. H. A flexible data schema and system architecture for the virtualization of manufacturing machines (VMM). Journal of Manufacturing Systems, v. 45, p. 236–247, 2017.
  5. Bjarnason, E.; Lang, F.; Mjöberg, A. An empirically based model of software prototyping: a mapping study and a multi-case study. Empirical Software Engineering, v. 28, art. 115, 2023.
  6. Chen, X.; Hu, X.; Huang, Y. et al. Software engineering based on deep learning: progress, challenges and opportunities. Science China Information Sciences, v. 68, 111102, 2025.
  7. Criollo, C.; Govea, J.; Játiva, W.; Pierrottet, J.; Guerrero Arias, A.; Jaramillo Alcázar, Á.; Luján Mora, S. Towards the integration of emerging technologies as support for the teaching and learning model in higher education. Sustainability, v. 15, n. 7, art. 6055, 2023.
  8. Haddow, B.; Bawden, R.; Barone, A. V. M.; Helcl, J.; Birch, A. Survey of low-resource machine translation. Computational Linguistics, v. 48, n. 3, p. 673–732, 2022.
  9. Harrison, K. D. When languages die: the extinction of the world’s languages and the erosion of human knowledge. New York: Oxford University Press, 2007. 292 p.
  10. Heyndels, S. Technology and neutrality. Philosophy & Technology, v. 36, n. 4, art. 75, 2023.
  11. Kann, K. et al. AmericasNLI: machine translation and natural language inference systems for Indigenous languages of the Americas. Frontiers in Artificial Intelligence, v. 5, 2022.
  12. Kusmaryono, I.; Wijayanti, D.; Maharani, H. R. Number of response options, reliability, validity, and potential bias in the use of the Likert scale education and social science research: a literature review. International Journal of Educational Methodology, v. 8, n. 4, p. 625–637, Nov. 2022.
  13. Li, X.; Wang, Z.; Zhang, Q. Research on HTML5 responsive web front-end development based on Bootstrap framework. In: 2023 5th International Conference on Computer, Communication and Network Technology (CCNT). 2023.
  14. Liu, Z.; Richardson, C.; Hatcher Jr., R.; Prud’hommeaux, E. Not always about you: prioritizing community needs when developing endangered language technology. arXiv, 2022.
  15. Mager, M.; Ontanón, S.; Costa-Jussà, M. R. et al. Findings of the AmericasNLP 2021 shared task on open machine translation for Indigenous languages of the Americas. In: Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas. 2021.
  16. Moore, D.; Galucio, A. V.; Gabas Jr., N. O desafio de documentar e preservar as línguas amazônicas. Revista do Instituto de Estudos Brasileiros, n. 47, p. 91–107, 2008.
  17. Moore, P.; Hennessy, K. New technologies and contested ideologies: the Tagish FirstVoices project. The American Indian Quarterly, v. 30, n. 1-2, p. 119–137, 2006.
  18. Nguyen, L. A. T.; Huynh, T. S.; Tran, D. T.; Vu, Q. H. Design and implementation of web application based on MVC Laravel architecture. European Journal of Electrical Engineering and Computer Science, v. 6, n. 4, p. 23–29, 2022.
  19. Niu, C.; Li, C.; Luo, B.; Ng, V. Deep learning meets software engineering: a survey on pre-trained models of source code. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 2022.
  20. Ouyang, L. et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 2022.
  21. Pant, S. et al. Consistency indices in analytic hierarchy process: a review. Mathematics, v. 10, n. 8, art. 1206, 2022.
  22. Rodrigues, A. D. Línguas brasileiras: para o conhecimento das línguas indígenas. São Paulo: Loyola, 1986. Available at: http://www.etnolinguistica.org/biblio:rodrigues-1986-linguas. Accessed: 24 Mar. 2026.
  23. Saaty, T. L.; Vargas, L. G. Models, methods, concepts & applications of the analytic hierarchy process. 2. ed. New York: Springer, 2012.
  24. Topcu, A. E.; Alzoubi, Y. I.; Elbasi, E.; Camalan, E. Social media zero day attack detection using TensorFlow. Electronics, v. 12, n. 17, art. 3554, 2023.
  25. Turner, T. Social body and embodied subject: bodiliness, subjectivity, and sociality among the Kayapó. Cultural Anthropology, v. 10, n. 2, p. 143–170, 1995. Available at: https://www.jstor.org/stable/656331. Accessed: 24 Mar. 2026.
  26. Zave, P.; Jackson, M. Four dark corners of requirements engineering. ACM Transactions on Software Engineering and Methodology, v. 6, n. 1, p. 1–30, 1997.
  27. Zoia, A.; Rondon, M. T. Conhecimento tradicional e produção de materiais didáticos para o fortalecimento das línguas indígenas em Mato Grosso (Brasil). Pedagogía Social. Revista Interuniversitaria, n. 39, p. 47–59, 2021.
Figure 1. Methodological Process of Software Development.
Figure 2. Word Access Interface.
Figure 3. Initial System Interface with Light Mode Enabled.
Figure 4. Comparative graph of multicriteria performance among Indigenous language preservation and teaching systems.
Table 1. Percentage of Compliance with Security Best Practices.
Table 2. Comparative Analysis of Existing Systems.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
