Submitted:
26 July 2025
Posted:
30 July 2025
You are already at the latest version
Abstract
Keywords:
1. Introduction
1.1. Generative AI in Higher Education: Evolution, Impact, and Institutional Response
1.2. Assessing the Effectiveness of AI Detection Tools and Ethical Issues in Teaching and Practice
1.3. Identifying Gaps in Existing Research and Rationale for Study
- Institutional Implementation of AI Detection Tools in Academic Integrity Frameworks and Assessment Systems
- Evaluating the Effectiveness and Limitations of Current AI Detection Tools in Distinguishing AI-Generated Content from Student Work
- The Ethical, Procedural, and Pedagogical Concerns that AI Detection Systems Raise for Students and Faculty in Higher Education
2. Materials and Methods
2.1. Review Process
3. Results
3.1. Screening Summary
3.2. Study Characteristics
3.3. Thematic Analysis
4. Discussion
4.1. Institutional Implementation of AI Detection Tools in Academic Integrity Frameworks and Assessment Systems
4.1.1. Policy Integration and Ethical Framing in the Age of AI
4.1.2. Technological Adoption and the Effectiveness of AI Detection Tools
4.1.3. Transformations in Assessment Design and Pedagogical Strategy
4.2. Evaluating the Effectiveness and Limitations of Current AI Detection Tools in Distinguishing AI-Generated Content from Student Work
4.2.1. Effectiveness and Shortcomings of Current AI Detection Tools
4.2.2. Emerging Innovations and Methodological Approaches in AI Detection
4.2.3. Ethical, Legal, and Educational Implications
4.3. Ethical, Procedural, and Pedagogical Concerns Raised by AI Detection Tools in Higher Education
4.3.1. Ethical Ambiguities and Student Vulnerabilities
4.3.2. Procedural Inconsistencies and Institutional Risks
4.3.3. Pedagogical Consequences and the Future of Assessment
5. Limitations of Current Research
6. Suggestions for Future Research
7. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Abbas M, Jam FA, Khan TI. Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students. International Journal of Educational Technology in Higher Education. 2024;21.
- Alexander K, Savvidou C, Alexander C. Who wrote this essay? Detecting ai-generated writing in second language education in higher education. Teaching English with Technology. 2023;23:25–43.
- Davar NF, Dewan MAA, Zhang X. AI Chatbots in Education: Challenges and Opportunities. Information (Switzerland). Multidisciplinary Digital Publishing Institute (MDPI); 2025.
- Al-Zahrani, AM. The impact of generative AI tools on researchers and research: Implications for academia in higher education. Innovations in Education and Teaching International. 2024;61:1029–43.
- Lund BD, Wang T, Mannuru NR, Nie B, Shimray S, Wang Z. ChatGPT and a new academic reality: Artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing. J Assoc Inf Sci Technol. 2023;74:570–81.
- Revell T, Yeadon W, Cahilly-Bretzin G, Clarke I, Manning G, Jones J, et al. ChatGPT versus human essayists: an exploration of the impact of artificial intelligence for authorship and academic integrity in the humanities. International Journal for Educational Integrity [Internet]. 2024;20:18. Available from: https://edintegrity.biomedcentral.com/articles/10. 1007.
- Birks D, Clare J. Linking artificial intelligence facilitated academic misconduct to existing prevention frameworks. International Journal for Educational Integrity. 2023;19.
- Nelson AS, Santamaría P V., Javens JS, Ricaurte M. Students’ Perceptions of Generative Artificial Intelligence (GenAI) Use in Academic Writing in English as a Foreign Language †. Educ Sci (Basel). 2025;15.
- Haenlein M, Kaplan A. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. Calif Manage Rev. 2019;61:5–14.
- Zhao Y, Borelli A, Martinez F, Xue H, Weiss GM. Admissions in the age of AI: detecting AI-generated application materials in higher education. Sci Rep. 2024;14:26411.
- Yeo, MA. Academic integrity in the age of Artificial Intelligence (AI) authoring apps. TESOL Journal. 2023;14.
- Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Foltýnek T, Guerrero-Dib J, Popoola O, et al. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. 2023;19.
- Gaumann N, Veale M. AI providers as criminal essay mills? Large language models meet contract cheating law. Information and Communications Technology Law. 2024.
- Ibrahim, K. Using AI-based detectors to control AI-assisted plagiarism in ESL writing: “The Terminator Versus the Machines.” Language Testing in Asia. 2023;13:46.
- Ibrahim, K. Using AI-based detectors to control AI-assisted plagiarism in ESL writing: “The Terminator Versus the Machines.” Language Testing in Asia. 2023;13.
- Perkins, M. Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching and Learning Practice. 2023;20.
- Perkins, M. Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching and Learning Practice. 2023;20.
- Waltzer T, Pilegard C, Heyman GD. Can you spot the bot? Identifying AI-generated writing in college essays. International Journal for Educational Integrity. 2024;20.
- Ardito, CG. Generative AI detection in higher education assessments. New Directions for Teaching and Learning. 2024. [CrossRef]
- Hao J, Von Davier AA, Davier V, College B, Harris DJ. Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI [Internet]. 2024. Available from: https://openai.
- Yeo, MA. Academic integrity in the age of Artificial Intelligence (AI) authoring apps. TESOL Journal. 2023;14.
- Elkhatat AM, Elsaid K, Almeer S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity. 2023;19.
- Kar SK, Bansal T, Modi S, Singh A. How Sensitive Are the Free AI-detector Tools in Detecting AI-generated Texts? A Comparison of Popular AI-detector Tools. Indian J Psychol Med. 2024.
- Waltzer T, Pilegard C, Heyman GD. Can you spot the bot? Identifying AI-generated writing in college essays. International Journal for Educational Integrity. 2024;20.
- Sukhera, J. Narrative Reviews: Flexible, Rigorous, and Practical. J Grad Med Educ. 2022;14:414–7.
- Sukhera, J. Narrative Reviews: Flexible, Rigorous, and Practical. J Grad Med Educ. 2022;14:414–7.
- Deep P Das, Martirosyan N, Ghosh N, Rahaman MdS. ChatGPT in ESL Higher Education: Enhancing Writing, Engagement, and Learning Outcomes. Information. 2025;16:316.
- Hao J, von Davier AA, Yaneva V, Lottridge S, von Davier M, Harris DJ. Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI. Educational Measurement: Issues and Practice. 2024;43:16–29.
- Kar SK, Bansal T, Modi S, Singh A. How Sensitive Are the Free AI-detector Tools in Detecting AI-generated Texts? A Comparison of Popular AI-detector Tools. Indian J Psychol Med. 2025;47:275–8.
- Krawczyk N, Probierz B, Kozak J. Towards AI-Generated Essay Classification Using Numerical Text Representation. Applied Sciences. 2024;14:9795.
- Krishna K, Song Y, Karpinska M, Wieting J, Iyyer M. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. Adv Neural Inf Process Syst. Neural information processing systems foundation; 2023.
- Pan WH, Chok MJ, Wong JLS, Shin YX, Poon YS, Yang Z, et al. Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education. Proceedings of the 46th International Conference on Software Engineering: Software Engineering Education and Training. New York, NY, USA: ACM; 2024. p. 1–11.
- Baethge C, Goldbeck-Wood S, Mertens S. SANRA—a scale for the quality assessment of narrative review articles. Res Integr Peer Rev. 2019;4.
- Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med. American College of Physicians; 2018. p. 467–73.
- Ardito, CG. Generative AI detection in higher education assessments. New Directions for Teaching and Learning. 2024. [CrossRef]
- Alexander K, Savvidou C, Alexander C. Who wrote this essay? Detecting ai-generated writing in second language education in higher education. Teaching English with Technology. 2023;23:25–43.
- Davar NF, Dewan MAA, Zhang X. AI Chatbots in Education: Challenges and Opportunities. Information (Switzerland). Multidisciplinary Digital Publishing Institute (MDPI); 2025.
- Revell T, Yeadon W, Cahilly-Bretzin G, Clarke I, Manning G, Jones J, et al. ChatGPT versus human essayists: an exploration of the impact of artificial intelligence for authorship and academic integrity in the humanities. International Journal for Educational Integrity. 2024;20.
- Nelson AS, Santamaría P V., Javens JS, Ricaurte M. Students’ Perceptions of Generative Artificial Intelligence (GenAI) Use in Academic Writing in English as a Foreign Language †. Educ Sci (Basel). 2025;15.
- Zhao Y, Borelli A, Martinez F, Xue H, Weiss GM. Admissions in the age of AI: detecting AI-generated application materials in higher education. Sci Rep. 2024;14.
- Elkhatat AM, Elsaid K, Almeer S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity. 2023;19.
- Krawczyk N, Probierz B, Kozak J. Towards AI-Generated Essay Classification Using Numerical Text Representation. Applied Sciences (Switzerland). 2024;14.
- Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, Foltýnek T, Guerrero-Dib J, Popoola O, et al. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. 2023;19.

| Keyword/Concept | Boolean Operators | Purpose/Focus |
| AI Detection Tools | “AI detection tools” OR “AI writing detectors” OR “AI content detection” | To identify literature discussing technologies designed to detect AI-generated content in educational contexts. |
| Higher Education Context | “Higher education” OR “university” OR “tertiary education” OR “college” | To narrow the scope to post-secondary education environments where such tools are being evaluated and implemented. |
| Academic Integrity | “Academic integrity” AND “plagiarism” OR “misconduct” OR “academic honesty” | To capture studies exploring how AI detection tools intersect with integrity frameworks and institutional misconduct policies. |
| Generative AI Tools | “Generative AI” OR “ChatGPT” OR “large language models” OR “GPT-3” OR “GPT-4” | To focus on content related to the technologies most relevant to recent shifts in academic writing and integrity concerns. |
| Specific Detection Tools | “Turnitin” OR “GPTZero” OR “ZeroGPT” OR “Copyleaks” OR “CrossPlag” | To retrieve empirical studies that tested or reviewed commonly used detection platforms in educational contexts. |
| Detection Accuracy & Effectiveness | “accuracy” OR “detection performance” AND “false positives” OR “false negatives” | To locate studies that evaluated the technical reliability of AI detectors, particularly in distinguishing human vs. machine-generated texts. |
| Student and Faculty Perceptions | “Student perception” OR “faculty perception” AND “AI detection” OR “AI surveillance” | To understand user attitudes, concerns, and ethical views toward institutional use of AI detection technologies. |
| Ethical Implications | “ethics” OR “ethical implications” AND “privacy” OR “fairness” OR “consent” OR “bias” OR “equity” | To uncover scholarship discussing AI detection’s moral and social dimensions, especially around data use, consent, and equity in enforcement. |
| Assessment and Curriculum Redesign | “Assessment design” OR “curriculum redesign” AND “AI-generated writing” OR “ChatGPT” | To identify pedagogical responses to AI, including innovative assessments that reduce reliance on detection and emphasize human-centered learning. |
| AI Literacy and Responsible Use | “AI literacy” OR “responsible AI use” AND “teaching practice” OR “student support” | To gather conceptual and practice-oriented papers promoting student engagement with AI tools in ethical and educationally beneficial ways. |
| Criteria | Inclusion | Exclusion | Rationale |
| Publication Date | Peer-reviewed articles published between 2021 and 2024. | Articles published before 2021. | Ensures the review reflects the post-ChatGPT era, significantly accelerating the deployment and institutional scrutiny of AI detection tools in higher education. The 2021–2024 window aligns with the rapid technological and policy transformation period in academia. |
| Language | Studies published in English. | Non-English publications. | English was selected as the review language to maintain linguistic consistency, avoid translation inaccuracies, and reflect the dominance of English in scholarly discourse, particularly in AI and higher education research domains. |
| Peer-Review Status | Only peer-reviewed journal articles and conference proceedings included. | Excludes preprints, white papers, blogs, media articles, and non-peer-reviewed sources. | Peer-reviewed sources ensure methodological rigor and quality, according to the SANRA (Scale for the Assessment of Narrative Review Articles) criteria for evaluating the included literature. |
| Focus on Higher Education | Articles focused on university-level or tertiary education settings, including institutional, student, or faculty perspectives. | Research on K–12 education, vocational training, or non-academic sectors. | Ensures alignment with the study’s scope—AI detection tools are used within university-level academic integrity systems, not other educational levels or industries. |
| AI Detection Systems | Articles discussing AI writing detection tools (e.g., Turnitin AI, GPTZero, Copyleaks, ZeroGPT, CrossPlag, OpenAI classifiers). | Articles discussing AI in education generally and which do not reference detection tools. | The tool-specific review assesses detection systems’ technical, ethical, and procedural roles in enforcing academic integrity. |
| Thematic Relevance | Studies that explore: (a) institutional policy integration, (b) effectiveness and accuracy of detection tools, or (c) ethical and pedagogical impacts. | Studies lacking focus on institutional policy, accuracy, or ethical/pedagogical implications. | These themes correspond directly to the three research questions (RQ1–RQ3) guiding the review and structure the analysis across policy, performance, and pedagogy/ethics. |
| Methodological Clarity | Research with explicit methodological frameworks, such as case studies, comparative evaluations, theoretical models, or structured reviews. | Articles lacking any straightforward methodological approach or lacking an evaluative structure. | This ensures replicability, transparency, and analytical depth, especially where empirical claims about tool performance or institutional practice are made. |
| Stakeholder Involvement | Studies involving university stakeholders: students, faculty, administrators, policy makers, or institutional frameworks. | Articles focusing on AI tool developers, industry marketing, or technical benchmarking without a user perspective. | It focuses on the academic community, which is most impacted by AI detection, ensuring that findings remain grounded in educational practice, student experience, and faculty decision-making. |
| Ethical and Social Dimensions | Articles addressing fairness, surveillance, student rights, linguistic bias, transparency, or AI misuse/misidentification. | Papers focused purely on technical algorithms or architecture without discussing social or ethical implications. | Ethical risk is a significant pillar of the review. Including studies that engage with due process, trust, and academic freedom ensures a holistic assessment of the tools’ impact, beyond raw accuracy. |
| No | Reference (In-Text) | Study Location | Target Group | Research Objective | Research Approach | Principal Outcomes |
| 1 | [1] | Universities in Pakistan | University students | To examine the causes and consequences of ChatGPT usage among university students | Quantitative | Academic workload and time pressure increase ChatGPT use; reward-sensitive students use it less; use is linked to procrastination, memory loss, and lower CGPA. |
| 2 | [4] | Saudi Arabia |
Higher education students | To explore students’ awareness, use, impact perception, and ethical considerations of Generative AI (GenAI) tools in academic research | Quantitative survey | Students reported high awareness and positive experiences with GenAI, optimism about its future, and strong ethical concerns. However, training/support was limited. |
| 3 | [2] | Cyprus | ESL lecturers in higher education | Evaluate the effectiveness of AI detectors and human judgment in ESL assessment. | Qualitative | The paper reports that AI detectors (e.g, Turnitin, GPTZero, Crossplag) were generally more accurate than the ESL lecturers, particularly in identifying fully AI-generated essays |
| 4 | [19] | Global | Higher education faculty and students | Critically analyze generative AI detection tools’ effectiveness, vulnerabilities, and ethical implications in academic assessments. | Narrative analysis | AI detectors are unreliable, biased, and misaligned with educational goals. The study advocates for replacing detection tools with robust, AI-inclusive assessment frameworks. |
| 5 | [7] | Global | Commentary | Conceptual analysis using the situational crime prevention (SCP) framework, supplemented by academic misconduct literature. |
Theoretical analysis | AI-misconduct can be tackled using opportunity-reduction strategies from SCP, including redesigning assessments, increasing detection risks, reducing temptations, and clarifying institutional policies. |
| 6 | [3] | Global | Students and Educators in Higher Education | To explore the benefits and challenges of AI chatbots (e.g., ChatGPT) in education | Narrative Review | AI chatbots can support adaptive, real-time, and personalized learning and teaching, but raise concerns about ethics, academic integrity, misinformation, and accessibility |
| 7 | [27] | Global | Students and Educators in Higher Education | To evaluate the accuracy and ethical concerns surrounding AI-content detection tools | Narrative Review | Limitations include false positives, a lack of transparency, and fairness concerns; Turnitin AI, GPTZero, and Copyleaks were analyzed. |
| 8 | [22] | UAE | University Students | To explore students’ perceptions of using ChatGPT in academic writing, particularly regarding fairness, bias, and trust | Quantitative | Most students acknowledged ChatGPT’s usefulness, but raised concerns over fairness, accuracy, and trustworthiness. |
| 9 | [13] | Global | Legal scholars, policymakers, and education regulators. | Engaged in rigorous legal and policy analysis, reviewing statutes, AI services, and advertising practices | Conceptual Commentary | LLM providers may unintentionally fall under contract cheating laws; legal frameworks are unclear and need reform to protect legitimate AI use while targeting essay mills. |
| 10 | [28] | Not Applicable — the study is conceptual | Assessment professionals, educators, and students (Both K–12 and higher education) |
Explore how LLMs and generative AI affect educational assessment | Conceptual | AI improves test creation and scoring efficiency but introduces concerns around fairness, validity, and security. |
| 11 | [9] |
Not location-based; global focus | General readers, scholars, and business leaders | To provide a historical overview, current applications, and future outlook of AI in business and society | Narrative review/editorial | The article outlines the historical development of AI and its current applications in business contexts like HR, marketing, and decision-making. It also discusses future challenges related to ethics, regulation, and societal impact. |
| 12 | [14] | Kuwait | University ESL students | Test the effectiveness of AI detectors for identifying AI-generated plagiarism. | Empirical descriptive comparative study | Crossplag showed greater sensitivity to machine-generated texts, as reflected by a higher concentration of scores |
| 13 | [29] | India | Academic integrity experts | To assess how well free AI-detection tools can identify AI-generated and AI-paraphrased academic texts | Quantitative | Detection capability varied across tools; a few accurately identified original and paraphrased AI-generated content, while others failed to detect AI-generated text reliably. |
| 14 | [30] | Poland | AI-generated vs. student-written essays | To evaluate and compare different numerical text representation methods for classifying AI-generated essays | Quantitative | TF-IDF and Bag of Words with simple classifiers effectively detected AI-generated essays, while pretrained models performed poorly. |
| 15 | [31] | Not location-specific | Higher education (implicitly) | To evaluate how paraphrasing affects the performance of AI-text detectors and to test retrieval-based defense mechanisms | Experimental | Paraphrasing weakens detection tools, but retrieval-based methods offer a more reliable way to identify AI-generated content |
| 16 | [5] | Global | Scholars & publishers | To assess the ethical, practical, and academic implications of using ChatGPT in scholarly publishing. | Conceptual review | Identified benefits (efficiency, editing) and ethical concerns (bias, authorship, plagiarism) |
| 17 | [8] | Ecuador | Higher Education (EFL Students) | To explore Ecuadorian EFL students’ perceptions of generative AI (e.g., ChatGPT) in academic writing, particularly regarding academic dishonesty, usage, and institutional responses. | Quantitative | Students partially understood AI misuse, overestimated detection tools, and favored ethical guidance. |
| 18 | [32] | Malaysia & Singapore | Higher education students | To assess the performance and limitations of AIGC detectors in identifying AI-generated code in education contexts | Empirical (Quantitative) | AIGC detectors struggled to distinguish between AI- and human-written code. Most tools frequently misclassified code, indicating a need for improved detection models tailored to programming content. Sapling and GLTR showed relatively better performance with specific code variations. |
| 19 | [17] | Vietnam | Higher Education Institutions (HEIs), academic staff, and students | Examine academic integrity issues from student use of AI/LLMs like ChatGPT | Literature review | LLMs like ChatGPT produce undetectable, fluent content. Misconduct occurs when their use isn’t disclosed. |
| 20 | [6] | UK | Higher education students | Compare the performance of ChatGPT-generated essays with human essays analyzing Old English poetry, and evaluate detection accuracy. | Mixed-methods approach | AI essays scored similarly to student essays but lacked depth and cultural insight. Human markers identified AI essays around 79% of the time; AI detectors like Quillbot were even more accurate, at around 95%. The study urges reassessing how academic integrity is upheld as AI tools advance. |
| 21 | [18] | USA | Higher Education (Students & instructors) | To find out if people can tell the difference between AI-generated and student-written essays | Quantitative | Instructors struggled to detect AI essays, experience didn’t help, and both groups had mixed views on AI use. |
| 22 | [12] | Cross-country | Higher education academic integrity context | Evaluate detection tools for AI-generated text | Quantitative | Tools are mostly inaccurate; Turnitin was the most reliable |
| 23 | [11] | Singapore | Higher Education (TESOL educators and second/foreign language learners) | To explore how AI writing tools affect academic integrity, authorship, teaching, and assessment in language education. | Conceptual analysis | AI challenges authorship & integrity. Risk of misuse but learning potential. Teachers should integrate AI ethically. |
| 24 | [10] | USA | Higher Education | To detect AI-generated and AI-revised content in Letters of Recommendation (LORs) and Statements of Intent (SOIs) | Quantitative; | Domain-specific AI detectors (e.g., BERT, DistilBERT) achieved near-perfect accuracy in distinguishing human vs AI content in admissions materials; general detectors showed poor cross-domain performance. |
| Central Theme (Research Question) | Co-Themes | Structural Interpretation |
| 1. Institutional Policy and Implementation |
a. Lack of AI-Specific Academic Integrity Policies | Many institutions have yet to update academic integrity policies to reflect generative AI, creating confusion among faculty and students. |
| b. Policy Gaps and Faculty Uncertainty | Faculty members often rely on personal judgment in AI-related cases due to vague or missing guidelines, which can lead to inconsistency. | |
| c. Student Confusion and Inconsistent Enforcement | Students face unclear expectations, with institutional responses varying by instructor or department, increasing the risk of unfair accusations. | |
| 2. Detection Accuracy and System Performance |
a. False Positives and False Negatives | AI detectors often misclassify human writing as AI-generated and vice versa, undermining trust in academic decisions. |
| b. Tool Performance with Evolving LLMs | Many detection tools are not trained to detect the most recent language models (e.g., GPT-4), limiting their effectiveness. | |
| c. Gaps in Validation Across Student Populations | Detection tools often perform poorly with diverse writing styles, especially those from multilingual or non-traditional students. | |
| 3. Ethical and Pedagogical Implications |
a. Transparency and Due Process Issues | Students are rarely informed how their work is analyzed, raising concerns about consent, privacy, and fairness in enforcement. |
| b. Equity and Bias Concerns | Linguistic bias leads to disproportionate scrutiny of non-native English speakers, creating equity concerns. | |
| c. Need for Inclusive Assessment Design | Redesigning assessments (e.g., drafts, oral exams) can reduce reliance on detection tools and promote authentic student work. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).