Submitted: 12 May 2025
Posted: 12 May 2025
Abstract
Keywords:
1. Introduction
- To identify the needs, expectations, challenges, and perspectives of key actors regarding the integration of AI-based tools for assessment purposes;
- To propose quality assurance measures and practical guidelines for the responsible use of generative AI in formal assessment settings;
- To design a new conceptual framework that enables a systematic and stakeholder-sensitive integration of generative AI into assessment practices in higher education;
- To support the development of transparent, scalable, and learner-centred evaluations that maintain academic integrity while embracing technological innovation;
- To examine the technological, pedagogical, and ethical implications of deploying AI-powered systems for designing, administering, and evaluating academic assessments.
2. State-of-the-Art Review of Generative AI Tools and Their Assessment Applications
2.1. Classification of Assessment in Higher Education and the Role of Generative AI in This Process
2.2. Standards, Policies and Procedures for Assessment Quality in Higher Education
2.3. Generative AI Tools for Assessment in Higher Education
- Content-generation tools – these platforms (e.g., ChatGPT, Eduaide.ai, Copilot) focus on the automatic creation of assessment elements such as quiz questions, prompts, rubrics, and feedback. They support instructors in designing assessments aligned with learning taxonomies and learning outcomes.
- Interactive quiz generators – tools like Quizgecko use generative algorithms to convert any text into structured assessment formats, such as multiple-choice, fill-in-the-blank, or true/false questions. Their focus is on streamlining formative assessment, particularly in asynchronous or online contexts.
- AI-assisted feedback systems – platforms such as GrammarlyGO provide real-time, formative feedback on student writing, helping learners improve clarity, coherence, and argumentation. These tools are particularly useful in large classes and for supporting autonomous learning.
- Grading automation systems – tools like Gradescope apply AI models to evaluate student submissions using rubrics, group similar responses, and standardise feedback. They are widely adopted in STEM education and are integrated with major LMSs.
- Learning assistants and peer support tools – ChatGPT and similar apps serve as AI tutors that guide students through problem-solving steps and help reinforce foundational concepts, indirectly supporting assessment preparation through personalised coaching.
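As an illustration of the second category, interactive quiz generators must turn a model's free-text response into structured items before export to an LMS. The sketch below is a minimal, hypothetical example of that step: the `Q: / A) / Answer:` response format and all function names are assumptions for illustration, not the format of any specific tool named above.

```python
import re
from dataclasses import dataclass, field

@dataclass
class MCQItem:
    """One multiple-choice question parsed from generated text."""
    stem: str
    options: list = field(default_factory=list)
    answer: str = ""

def parse_mcq_block(text: str) -> list:
    """Parse blank-line-separated blocks of the assumed form
    'Q: ... / A) ... / B) ... / Answer: X' into MCQItem objects."""
    items = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        item = MCQItem(stem="")
        for ln in (line.strip() for line in chunk.splitlines()):
            if ln.startswith("Q:"):
                item.stem = ln[2:].strip()
            elif re.match(r"^[A-D]\)", ln):       # option lines: "A) ...", "B) ..."
                item.options.append(ln[2:].strip())
            elif ln.startswith("Answer:"):
                item.answer = ln[len("Answer:"):].strip()
        if item.stem:
            items.append(item)
    return items

sample = """Q: What does a GARCH model primarily describe?
A) The mean of returns
B) The conditional volatility of returns
Answer: B"""

quiz = parse_mcq_block(sample)
```

Once items are in a structured form like this, export to multiple-choice, fill-in-the-blank, or true/false formats reduces to serialising `MCQItem` fields into the target LMS schema.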
3. Related Work
3.1. Theoretical Frameworks for Application of Generative AI in Assessment Processes
3.2. Measuring the Attitudes Towards Generative AI Assessment and Its Quality
4. Framework for Generative AI-Supported Assessment in Higher Education
5. Verification of the Proposed Generative AI-Based Assessment Framework
6. Conclusions and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Van Damme, D.; Zahner, D. (Eds.) Does Higher Education Teach Students to Think Critically? OECD Publishing: Paris, France, 2022. [CrossRef]
- UNESCO. Reimagining Our Futures Together: A New Social Contract for Education. Available online: https://www.unesco.org/en/articles/reimagining-our-futures-together-new-social-contract-education (accessed on 20 May 2025).
- Sevnarayan, K.; Potter, M.A. Generative Artificial Intelligence in Distance Education: Transformations, Challenges, and Impact on Academic Integrity and Student Voice. J. Appl. Learn. Teach. 2024, 7. [Google Scholar] [CrossRef]
- World Economic Forum. Future of Jobs Report 2023. Available online: https://www.weforum.org/publications/the-future-of-jobs-report-2023 (accessed on 20 May 2025).
- Gamage, K.A.; Dehideniya, S.C.; Xu, Z.; Tang, X. ChatGPT and Higher Education Assessments: More Opportunities Than Concerns? J. Appl. Learn. Teach. 2023, 6, 358–369. [Google Scholar] [CrossRef]
- ENQA. Working Group Report on Academic Integrity. Available online: https://www.enqa.eu/wp-content/uploads/ENQA-WG-Report-on-Academic-Integrity-.pdf (accessed on 20 May 2025).
- NEAA. Criteria for Programme Accreditation of Professional Field / Specialty from the Regulated Professions. Available online: https://www.neaa.government.bg/images/Criteria_EN/ENG_Kriterii_za_programna_akreditacija_na_PN-SRP.pdf (accessed on 20 May 2025).
- European Commission. The European Higher Education Area in 2024: Bologna Process Implementation Report. 2024. Available online: https://eurydice.eacea.ec.europa.eu/publications/european-higher-education-area-2024-bologna-process-implementation-report (accessed on 20 May 2025).
- Nikolic, S.; Daniel, S.; Haque, R.; Belkina, M.; Hassan, G.M.; Grundy, S.; Sandison, C. ChatGPT Versus Engineering Education Assessment: A Multidisciplinary and Multi-Institutional Benchmarking and Analysis. Eur. J. Eng. Educ. 2023, 48, 559–614. [Google Scholar] [CrossRef]
- Gruenhagen, J.H.; Sinclair, P.M.; Carroll, J.A.; Baker, P.R.; Wilson, A.; Demant, D. The Rapid Rise of Generative AI and Its Implications for Academic Integrity: Students’ Perceptions and Use of Chatbots. Comput. Educ. Artif. Intell. 2024, 7, 100273. [Google Scholar] [CrossRef]
- Larenas, C.D.; Díaz, A.J.; Orellana, Y.R.; Villalón, M.J.S. Exploring the Principles of English Assessment Instruments. Ensaio: Aval. Polít. Públicas Educ. 2021, 29, 461–483. [Google Scholar] [CrossRef]
- Thanh, B.N.; Vo, D.T.H.; Nhat, M.N.; Pham, T.T.T.; Trung, H.T.; Xuan, S.H. Race with the Machines: Assessing the Capability of Generative AI in Solving Authentic Assessments. Australas. J. Educ. Technol. 2023, 39, 59–81. [Google Scholar] [CrossRef]
- Kane, M. Validity and Fairness. Lang. Test. 2010, 27, 177–182. [Google Scholar] [CrossRef]
- ISO 21001:2018. Educational Organizations—Management Systems for Educational Organizations—Requirements with Guidance for Use. Available online: https://www.iso.org/standard/66266.html (accessed on 20 May 2025).
- Mai, F. Anforderungen an Lerndienstleister und Lerndienstleistungen. In Qualitätsmanagement in der Bildungsbranche; Springer Gabler: Wiesbaden, Germany, 2020. [CrossRef]
- ENQA. Standards and Guidelines for Quality Assurance in the European Higher Education Area (ESG). 2015. Available online: https://www.enqa.eu/wp-content/uploads/2015/11/ESG_2015.pdf (accessed on 20 May 2025).
- Ellis, R.; Hogard, E. (Eds.) Handbook of Quality Assurance for University Teaching; Routledge: London, UK; New York, NY, USA, 2019. [Google Scholar] [CrossRef]
- Kalimullin, A.M.; Khodyreva, E.; Koinova-Zoellner, J. Development of Internal System of Education Quality Assessment at a University. Int. J. Environ. Sci. Educ. 2016, 11, 6002–6013. [Google Scholar]
- Xia, Q.; Weng, X.; Ouyang, F.; Lin, T.J.; Chiu, T.K. A Scoping Review on How Generative Artificial Intelligence Transforms Assessment in Higher Education. Int. J. Educ. Technol. High. Educ. 2024, 21. [Google Scholar] [CrossRef]
- Xiong, Y.; Suen, H.K. Assessment Approaches in Massive Open Online Courses: Possibilities, Challenges and Future Directions. Int. Rev. Educ. 2018, 64, 241–263. [Google Scholar] [CrossRef]
- Vetrivel, S.C.; Vidhyapriya, P.; Arun, V.P. The Role of AI in Transforming Assessment Practices in Education. In AI Applications and Strategies in Teacher Education; IGI Global, 2025. [CrossRef]
- Smolansky, A.; Cram, A.; Raduescu, C.; Zeivots, S.; Huber, E.; Kizilcec, R.F. Educator and Student Perspectives on the Impact of Generative AI on Assessments in Higher Education. In Proceedings of the 10th ACM Conference on Learning @ Scale, Copenhagen, Denmark, 20–22 July 2023; pp. 378–382. [Google Scholar] [CrossRef]
- Agostini, D.; Picasso, F. Large Language Models for Sustainable Assessment and Feedback in Higher Education: Towards a Pedagogical and Technological Framework. Intelligenza Artificiale 2024, 18, 121–138. [Google Scholar] [CrossRef]
- Kolade, O.; Owoseni, A.; Egbetokun, A. Is AI Changing Learning and Assessment as We Know It? Evidence from a ChatGPT Experiment and a Conceptual Framework. Heliyon 2024, 10. [Google Scholar] [CrossRef] [PubMed]
- Salinas-Navarro, D.E.; Vilalta-Perdomo, E.; Michel-Villarreal, R.; Montesinos, L. Using Generative Artificial Intelligence Tools to Enhance Experiential Learning for Authentic Assessment. Educ. Sci. 2024, 14. [Google Scholar] [CrossRef]
- Khlaif, Z.N.; Alkouk, W.A.; Salama, N.; Abu Eideh, B. Redesigning Assessments for AI-Enhanced Learning: A Framework for Educators in the Generative AI Era. Educ. Sci. 2025, 15. [Google Scholar] [CrossRef]
- Williams, P. AI, Analytics and a New Assessment Model for Universities. Educ. Sci. 2023, 13, 1040. [Google Scholar] [CrossRef]
- Chiu, T.K. Future Research Recommendations for Transforming Higher Education with Generative AI. Comput. Educ. Artif. Intell. 2024, 6, 100197. [Google Scholar] [CrossRef]
- Ogunleye, B.; Zakariyyah, K.I.; Ajao, O.; Olayinka, O.; Sharma, H. Higher Education Assessment Practice in the Era of Generative AI Tools. J. Appl. Learn. Teach. 2024, 7. [Google Scholar] [CrossRef]
- Perkins, M.; Furze, L.; Roe, J.; MacVaugh, J. The Artificial Intelligence Assessment Scale (AIAS): A Framework for Ethical Integration of Generative AI in Educational Assessment. J. Univ. Teach. Learn. Pract. 2024, 21, 49–66. [Google Scholar] [CrossRef]

| Tool name | Based on LLM | Main educational functionality | Supported platforms | Input Limit (Max Tokens) | Access type |
|---|---|---|---|---|---|
| ChatGPT | GPT-4.5 | Assessment creation, rubric assistance, feedback generation | Web, mobile, API | Up to 32,000 tokens | Freemium and subscription |
| Eduaide.ai | GPT- and Anthropic-based | Instructional/assessment resource generation, Bloom's alignment | Web | N/A | Freemium and subscription |
| Quizgecko | GPT-based | Quiz generation from text, export to LMS | Web | N/A | Freemium and subscription |
| GrammarlyGO | Proprietary GPT-based | Real-time writing support, revision suggestions | Desktop, browser, mobile | N/A | Freemium and subscription |
| Gradescope | Custom AI models | Rubric-based AI-assisted grading and feedback | Web, LMS integration | N/A | Institutional license |
| Copilot | GPT-4.5 (via Azure) | Automation of content generation, rubric writing, document summarization | Microsoft 365 apps | Up to 32,000 tokens | Subscription (Microsoft 365) |
| Reference | Study Type | Target Assessment | Primary Focus | Methodological Approach |
|---|---|---|---|---|
| Nikolic et al. [9] | Empirical (experiment) | Engineering assignments/exams | ChatGPT performance on engineering tasks – implications for authenticity and design | Empirical (performance analysis) |
| Smolansky et al. [22] | Empirical (survey study) | Assessments (general) | Educator vs. student perspectives on GAI impact (integrity, usage, attitudes) | Empirical (survey research) |
| Thanh et al. [12] | Empirical (experiment) | Authentic assessments | GAI’s ability to solve real-world tasks – benefits and limitations | Empirical (comparative) |
| Williams [27] | Conceptual (analysis) | Summative HE assessment (general) | Post-COVID assessment model redesign incorporating AI influences | Conceptual (theoretical model) |
| Agostini & Picasso [23] | Conceptual (analysis) | Assessment processes (general) | Sustainable, alternative assessment practices in response to LLMs | Conceptual (strategy proposal) |
| Chiu [28] | Survey (qualitative) | General (multiple assessment forms) | Policy and strategy recommendations for GAI adoption in HE | Empirical (focus groups) |
| Gruenhagen et al. [10] | Conceptual (commentary) | Written exams & assignments | Academic integrity challenges posed by GAI in assessments | Conceptual (issue commentary) |
| Kolade et al. [24] | Empirical (experiment + framework) | Multiple course assessments | ChatGPT’s effect on learning/assessment – framework for AI-era assessment | Empirical (with conceptual elements) |
| Ogunleye et al. [29] | Empirical (analysis) | Various HE assessment tasks | Evaluating GAI tools’ capabilities on student assessments – impact on practice | Empirical (multi-task analysis) |
| Perkins et al. [30] | Conceptual (framework proposal) | Assessment practices (general) | AIAS framework for ethical GAI integration in assessment policies | Conceptual (theoretical) |
| Salinas-Navarro et al. [25] | Empirical (exploratory study) | Experiential/authentic tasks | Using GAI to enhance experiential learning and authentic assessments | Empirical (AI tool exploration) |
| Khlaif et al. [26] | Framework proposal (qualitative study) | General (all course assessments) | Framework for assessment redesign in the GAI era | Empirical (interviews & focus groups) |
| Criterion | Max points | Scoring guidance |
|---|---|---|
| 1. Understanding of the Output Table | 4 | 4: Fully explains structure and roles of model components (mean, volatility, distribution); 3: Minor inaccuracies; 2: Partial understanding; 1: Minimal; 0: None |
| 2. Explanation of Model Purpose | 3 | 3: Clearly explains GARCH use in modelling volatility; 2: Some context; 1: Basic idea but unclear; 0: Incorrect or missing |
| 3. Interpretation of Key Parameters | 4 | 4: Accurate and insightful on all key terms (mu, omega, alpha, beta, nu); 3: Mostly accurate; 2: Some errors; 1: Misinterpretation; 0: Not attempted |
| 4. Use of Statistical Significance | 2 | 2: Discusses p-values/confidence intervals correctly; 1: Some understanding; 0: No mention or incorrect |
| 5. Discussion of Convergence Warning | 2 | 2: Identifies warning and explains implications; 1: Recognised but misinterpreted; 0: Ignored or incorrect |
| 6. Structure and Clarity | 2 | 2: Logically structured, easy to follow; 1: Some clarity issues; 0: Disorganised or unclear |
| 7. Authenticity and Originality (human vs. AI only) | 3 | 3: Demonstrates unique insight or personal approach; 2: Generic but plausible; 1: AI-like phrasing; 0: Likely copied or lacks original reasoning |
| Total | 20 | |
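The rubric above can be applied programmatically when grading at scale. The sketch below (criterion keys and function name are illustrative assumptions) validates each criterion score against the stated maximum before totalling, so that a mis-keyed score cannot silently inflate the 20-point total.

```python
# Maximum points per criterion, as listed in the rubric table above.
RUBRIC_MAX = {
    "understanding_output_table": 4,
    "model_purpose": 3,
    "key_parameters": 4,
    "statistical_significance": 2,
    "convergence_warning": 2,
    "structure_clarity": 2,
    "authenticity_originality": 3,
}

def total_score(scores: dict) -> int:
    """Validate each criterion score against its rubric maximum, then total."""
    total = 0
    for criterion, points in scores.items():
        maximum = RUBRIC_MAX[criterion]
        if not 0 <= points <= maximum:
            raise ValueError(f"{criterion}: {points} outside range 0..{maximum}")
        total += points
    return total
```

A full-marks submission scores `total_score(dict(RUBRIC_MAX))`, i.e. the rubric's stated maximum of 20 points.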
| Student | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L (points) | 32 | 17 | 31 | 30 | 34 | 37 | 28 | 33 | 18 | 30 | 40 | 33 | 23 | 23 | 37 |
| L (grade) | 5 | 3 | 5 | 5 | 5 | 6 | 4 | 5 | 3 | 5 | 6 | 5 | 4 | 4 | 6 |
| GAI (points) | 39 | 22 | 27 | 33 | 34 | 38 | 26 | 29 | 22 | 29 | 39 | 31 | 28 | 25 | 34 |
| GAI (grade) | 6 | 3 | 4 | 5 | 5 | 6 | 4 | 5 | 3 | 5 | 6 | 5 | 4 | 4 | 5 |
| Difference (grade) | -1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
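The table does not state the rule used to convert exam points to grades on the 2–6 scale. The thresholds in the sketch below are an inferred reconstruction, not the authors' published scale: they are simply the simplest cut-offs that reproduce all thirty point/grade pairs reported above.

```python
def points_to_grade(points: int) -> int:
    """Map exam points to a grade on the 2-6 scale.

    ASSUMPTION: these thresholds are inferred from the table above
    (they reproduce every reported point/grade pair); the course's
    actual grading scale may differ.
    """
    if points >= 35:
        return 6
    if points >= 29:
        return 5
    if points >= 23:
        return 4
    return 3

# Per-student totals from the table above.
L_POINTS   = [32, 17, 31, 30, 34, 37, 28, 33, 18, 30, 40, 33, 23, 23, 37]
GAI_POINTS = [39, 22, 27, 33, 34, 38, 26, 29, 22, 29, 39, 31, 28, 25, 34]

l_grades   = [points_to_grade(p) for p in L_POINTS]
gai_grades = [points_to_grade(p) for p in GAI_POINTS]
grade_diff = [l - g for l, g in zip(l_grades, gai_grades)]
```

Under these assumed thresholds, the computed grade rows and the difference row match the table exactly, with disagreement of at most one grade step for 3 of the 15 students.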
| Question | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Final exam score |
|---|---|---|---|---|---|---|---|
| MAE | 0.93 | 1.07 | 0.53 | 1.87 | 0.93 | 1.00 | 2.93 |
| MSE | 1.47 | 1.73 | 0.67 | 7.33 | 1.73 | 1.93 | 12.00 |
| RMSE | 1.21 | 1.32 | 0.82 | 2.71 | 1.32 | 1.39 | 3.46 |
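The error metrics in the table follow the standard definitions MAE = (1/n)Σ|eᵢ|, MSE = (1/n)Σeᵢ², and RMSE = √MSE, where eᵢ is the lecturer-minus-GAI score difference for student i. As a consistency check, the sketch below reproduces the final-exam-score column from the per-student totals reported earlier (variable names are illustrative).

```python
import math

# Per-student final exam totals: lecturer (L) and generative-AI (GAI) scoring.
L_SCORES   = [32, 17, 31, 30, 34, 37, 28, 33, 18, 30, 40, 33, 23, 23, 37]
GAI_SCORES = [39, 22, 27, 33, 34, 38, 26, 29, 22, 29, 39, 31, 28, 25, 34]

def error_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired score lists."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mae, mse, math.sqrt(mse)

mae, mse, rmse = error_metrics(L_SCORES, GAI_SCORES)
# mae ≈ 2.93, mse = 12.00, rmse ≈ 3.46 — matching the final exam score column.
```

The per-question columns would be computed the same way from the per-question score pairs, which are not reproduced here.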
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).