Submitted:
11 September 2025
Posted:
15 September 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Methodology
3.1. Document Analysis
- Marketing and official website statements;
- Technical manuals and product specifications;
- TOS, privacy policies, and data use policies;
- Promotional literature and user guidelines for teachers.
3.2. Benchmarking Against Global Standards
- GDPR (EU, 2018): Principles of lawful data processing; data minimization; purpose limitation; DPIAs; rights of data subjects.
- EU AI Act (Draft, 2024): Classifies educational AI as high-risk, requiring transparency, auditability, and human oversight measures.
- OECD AI Principles (2019): Sector-agnostic values of fairness, robustness, transparency, and human-centeredness.
- UNESCO AI in Education Guidance (2021): Ethical use in support of the SDGs, universal inclusiveness, and human control.
- COPPA (US): Parental consent requirements and limits on the processing of children’s online data.
3.3. Gap Analysis
- Privacy Protections – working security practices, handling of children’s data, and adherence to GDPR/COPPA/FERPA.
- Fairness Testing – whether empirical audits for subgroup bias exist, whether the training data were representative, and whether bias mitigation was attempted.
- Accountability & Transparency – clear assignment of responsibilities (e.g., which duties fall to teachers, schools, and platform developers), mechanisms for explaining and understanding the system’s decisions, and audit trails.
4. Ethical and Pedagogical Dimensions
4.1. Ethical Considerations
- Fairness: Although Notegrade.ai claims unbiased scoring, no publicly available evidence shows systematic testing for bias across subgroups such as gender, socioeconomic status, or language background. This is an important omission: models trained on imbalanced educational datasets can deepen existing inequalities through algorithmic decision-making (Baker & Smith, 2019; Holmes et al., 2022). Without fairness audits, minority or marginalized student groups could be left worse off, contrary to the spirit of UNESCO’s recommendations on inclusivity in AI.
- Explainability: The “right to explanation” associated with GDPR (Article 22) and the transparency obligations of the EU AI Act both require that learners and educators be able to understand the reasoning behind automated grading. Notegrade.ai discloses relatively little about how its scoring algorithms work, which makes its decision-making opaque. Without mechanisms that clarify how results were derived, affected students appear to have no recourse, which undermines both the procedural fairness and the accountability of the system.
- Accountability: In ethical AI governance, responsibility can and should be allocated consistently across developers, institutions, and educators (Leslie, 2020). Notegrade.ai’s terms of service indemnify the platform by placing liability on users. This obscures who is accountable when harm occurs: developers, teachers, or institutions. Such “accountability gaps” are a common problem in AI governance more generally (Sloane et al., 2022).

4.2. Pedagogical Implications
- Teacher Autonomy: Automation reduces workload, but too much of it can erode professional judgment. Teachers are uniquely positioned to interpret contextual signals in students’ work, such as creativity, critical reasoning, or cultural relevance, that indicate competence or its absence (Luckin, 2021). If AI grading became normalized, teachers would risk being reduced to “mere managers of automated processes”.
- Student Engagement: If students perceive feedback as mechanical rather than dialogic, automated grading might inadvertently promote surface learning (Williamson & Piattoeva, 2022). Conversational feedback, a central tenet of formative assessment pedagogy, risks being reduced to mere scoring when AI systems offer minimal explanation, which would undermine intrinsic motivation and reflective learning.
- Curricular Alignment: Rubrics are built into the training or programming of AI systems. This could create an incentive to favor standardized, easily measurable outcomes over complex, higher-order skills. Selwyn (2019) warns that such alignment tends to ‘constrain curricular possibilities’ by sidelining skills such as ‘creativity, collaboration, and ethical reasoning’ because they are difficult to measure.
4.3. Ethical–Pedagogical Tensions
5. Data Privacy and Security
5.1. Centrality of Data Protection in Educational AI
5.2. GDPR Obligations and Observed Gaps
- Lawful Basis & Informed Consent: The platform must have a legal ground for processing (consent, contractual necessity, legitimate interest, etc.). Notegrade.ai’s policies do not explicitly state the legal basis for its data collection.
- Data Minimization & Purpose Limitation: AI systems should process data only for the educational purposes that require it. Notegrade.ai’s policies do not clarify whether secondary uses, such as training algorithms or commercial partnerships, are excluded.
- Data Subject Rights (DSRs): GDPR guarantees rights of access, rectification, erasure, restriction, and portability. Notegrade.ai does not appear to document any process through which learners can exercise these rights.
- Data Protection Impact Assessments (DPIAs): Required whenever processing is likely to result in high risk, for instance profiling or automated decision-making in education. No evidence was found that any DPIA has been made public.
5.3. FERPA and COPPA Considerations
- FERPA (1974): Prevents disclosure of education records without parental or eligible student permission. Notegrade.ai’s documentation makes no mention of compliance procedures, leaving it unclear whether its grading results or student records are considered “education records” that fall under FERPA protections.
- COPPA (1998): Mandates verifiable parental consent before any personal information is collected from children under 13. Because AI grading tools are likely to be used in K–12 settings, where many students are under 13, the absence of COPPA compliance statements is troubling.
| Regulation | Key Requirement | Notegrade.ai Status | Compliance Gap |
|---|---|---|---|
| GDPR (EU) | Lawful processing, DPIA, right to explanation | Not explicitly documented | No DPIA evidence |
| FERPA (US) | Student record confidentiality | Not mentioned | Potential risk for minors |
| COPPA (US) | Parental consent for <13 years | No explicit mention | Risk in K-12 adoption |
| UK DPA 2018 | Age-appropriate design, children’s rights | Not specified | Underdeveloped safeguards |
5.4. Security Safeguards and Technical Gaps
- Encryption standards such as AES-256;
- Retention of data;
- Access or data-sharing agreements;
- Breach notification policies.
5.5. Implications for Trust and Adoption
6. Transparency and Explainability
6.1. Central Role of Transparency in AI Governance
6.2. Current Transparency Practices in Notegrade.ai
- Unexplained Methodology: The platform does not disclose the training datasets it uses, the criteria in its scoring rubrics, or its bias mitigations.
- Lack of activity logs: No automated records show how particular grades were assigned over time, which limits traceability.
- Nontransparent error feedback: Students and teachers receive no information on error margins, model confidence levels, or patterns of error within the system.
6.3. International Benchmarks for Explainability
- Singapore Model AI Governance Framework (2019, 2020): Accessible explanations tailored to each stakeholder group, detailed audit trails, and accountability chains for each role.
- OECD AI Principles (2019): Advocate transparency that allows users to “understand the rationale of the AI’s recommendations or decisions, in so far as this is appropriate.”
- EU AI Act (Draft, 2024): High-risk systems, including those used in education, shall be designed, developed, and used in such a way that information concerning the logic, significance, and probability of the consequences of the processing is accessible and comprehensible to natural persons.

6.4. Pedagogical and Ethical Implications of Limited Transparency
- For Students: Feedback without a rationale appears arbitrary, fails to foster intrinsic motivation, and closes off opportunities for reflective learning (Williamson & Piattoeva, 2022).
- For Educators: Teachers cannot verify the AI’s choices or situate them within broader assessment objectives, which diminishes teacher autonomy.
- For Institutions: Opaque grading logic presents legal and reputational risks, especially where a learner disputes results by invoking rights such as those in Article 22 of GDPR.
6.5. Towards Transparent AI Assessment
- Layered Explanations: Plain-language reasoning for learners, rubric alignment for instructors, and technical documentation for auditors.
- Machine-Readable Audit Trails: A logged record of scoring decisions that supports accountability and compliance audits.
- Error Metrics Disclosure: Confidence intervals and known error rates should be reported so that outputs are not presented as authoritative.
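The three recommendations above could be combined in a single logged record per grading decision. The following is a minimal sketch, not Notegrade.ai’s actual format: the schema, the `audit_record` helper, the sample values, and the choice of a Wilson score interval for the reported error rate are all illustrative assumptions. What counts as an "error" (for example, an AI score that deviates from the human consensus by more than one point) would have to be defined by the auditing institution.

```python
import json
import math
from datetime import datetime, timezone


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an observed error rate (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def audit_record(submission_id, score, rubric_scores, model_version,
                 errors, n_validated):
    """Build one machine-readable audit entry (hypothetical schema)."""
    low, high = wilson_interval(errors, n_validated)
    record = {
        "submission_id": submission_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "score": score,
        # Layered explanations: one view per stakeholder group.
        "explanations": {
            "learner": f"Your work received {score} points; see the rubric breakdown.",
            "instructor": rubric_scores,
            "auditor": {"model_version": model_version},
        },
        # Disclosed uncertainty, so the score is not read as authoritative.
        "error_rate_ci95": [round(low, 4), round(high, 4)],
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical example: 10 errors found in a validation sample of 100 essays.
entry = audit_record("essay-001", 7, {"argument": 4, "evidence": 3},
                     "v0-hypothetical", errors=10, n_validated=100)
print(entry)
```

Storing the interval alongside each score keeps the uncertainty visible to every reader of the log, rather than only in a separate validation report.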
7. Compliance Gaps and Risks
7.1. Fairness and Bias Risks
7.2. Transparency Deficiencies
7.3. Data Protection Gaps
- FERPA (US): Protects students’ education records. It is unclear whether grading outputs are treated as education records or whether any controls on disclosure exist.
- COPPA (US): Protects children younger than 13. No verifiable parental consent mechanisms are described.
- GDPR (EU): Requires Data Protection Impact Assessments (DPIAs) and guarantees Data Subject Rights (DSRs). Neither is addressed in the platform documentation. The lack of explicit commitments is a concern for institutional uptake, especially in K-12 contexts where children’s data receives stronger legal protection.
7.4. Accountability Ambiguities
7.5. Aggregated Risks
- Legal Risk: Institutions may inadvertently breach GDPR, FERPA, or COPPA and expose themselves to fines or penalties.
- Ethical Risk: Bias, lack of transparency, and absent accountability challenge fairness and justice in education.
- Pedagogical Risk: Diminished teacher agency and student trust undermine the legitimacy of assessments.
- Reputational Risk: Institutions that adopt non-compliant AI systems are vulnerable to reputational damage if harm occurs.
| Risk Category | Specific Gap | Potential Impact |
|---|---|---|
| Ethics | No subgroup fairness testing | Bias against minority learners |
| Privacy | Lack of FERPA/COPPA documentation | Risk to children’s data |
| Transparency | Limited scoring rationale | Black-box problem |
| Accountability | No clear responsibility structure | Unclear liability for errors |
8. Proposed Audit Protocol for AI in Education
8.1. Step 1 – Data Collection Audit
8.2. Step 2 – Human Consensus Grading
8.3. Step 3 – AI Scoring Validation
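The source names these steps without fixing a statistic. One plausible sketch, offered under that assumption, takes the median of several human ratings as the consensus score (Step 2) and then measures AI-human agreement with quadratic weighted kappa, a metric widely used in automated essay scoring (Step 3). The function names, the 1 to 4 score scale, and the sample ratings are hypothetical.

```python
from statistics import median_low


def consensus_scores(ratings_per_essay):
    """One consensus rule: the (low) median of the human ratings per essay."""
    return [median_low(ratings) for ratings in ratings_per_essay]


def quadratic_weighted_kappa(human, ai, min_score, max_score):
    """Agreement between two integer score lists; 1.0 is perfect agreement."""
    if len(human) != len(ai) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("score scale needs at least two levels")
    # Confusion matrix of human (rows) versus AI (columns) scores.
    observed = [[0] * n for _ in range(n)]
    for h, a in zip(human, ai):
        observed[h - min_score][a - min_score] += 1
    total = len(human)
    hist_h = [sum(row) for row in observed]
    hist_a = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # larger gaps cost more
            num += weight * observed[i][j]
            den += weight * hist_h[i] * hist_a[j] / total  # chance agreement
    # den is 0 only when both raters gave every essay the same single score.
    return 1.0 if den == 0 else 1.0 - num / den


# Hypothetical data: three human raters per essay, scores on a 1-4 scale.
human_ratings = [[1, 1, 2], [2, 2, 3], [3, 3, 3], [4, 4, 3]]
human = consensus_scores(human_ratings)
ai = [1, 2, 3, 3]
kappa = quadratic_weighted_kappa(human, ai, min_score=1, max_score=4)
print(human, round(kappa, 3))
```

Any acceptance threshold for kappa is a policy decision for the auditing institution and is not specified here.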
8.4. Step 4 – Subgroup Fairness Testing
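As one hedged illustration of this step, the sketch below compares the mean signed error (AI score minus human consensus score) across subgroups and reports the largest gap between groups. The record fields, group labels, and data are hypothetical. A real audit would also need adequate sample sizes per subgroup and a significance test, neither of which is shown.

```python
from collections import defaultdict
from statistics import mean


def subgroup_score_gaps(records):
    """Mean signed error (AI minus human) for each subgroup."""
    errors = defaultdict(list)
    for record in records:
        errors[record["group"]].append(record["ai"] - record["human"])
    return {group: mean(values) for group, values in errors.items()}


def max_disparity(gaps):
    """Largest difference in mean signed error between any two subgroups."""
    return max(gaps.values()) - min(gaps.values())


# Hypothetical records: group labels "A" and "B" stand in for any audited
# attribute (e.g. gender, socioeconomic status, or language background).
records = [
    {"group": "A", "human": 4, "ai": 4},
    {"group": "A", "human": 3, "ai": 4},
    {"group": "B", "human": 4, "ai": 3},
    {"group": "B", "human": 5, "ai": 4},
]
gaps = subgroup_score_gaps(records)
print(gaps, max_disparity(gaps))
```

A positive mean for a group indicates that the AI tends to over-score that group relative to human graders, and a negative mean indicates under-scoring.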
8.5. Step 5 – Transparency and Explainability Audit
- Students: Specific feedback tied to learning outcomes.
- Teachers: Detailed scoring logic and error reporting.
- Institutions: Technical documentation, audit trails, and chains of accountability.
8.6. Step 6 – Accountability Mapping

9. Conclusions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).