Algorithmic Accountability and Continuous Audit in High-Risk Public AI Systems: A Narrative Review

Lourenço Correia; Antonio Goncalves; Mario Monteiro Marques

doi:10.20944/preprints202606.1380.v1

Submitted: 17 June 2026

Posted: 18 June 2026


Abstract

This narrative review examines algorithmic accountability and continuous audit in high-risk public AI systems. Public-sector AI systems are increasingly used in areas such as criminal justice, healthcare, welfare, migration, taxation, education, and public security, where automated or semi-automated outputs may affect rights, access to services, liberty, welfare, and public trust. Rather than presenting original empirical data, this review synthesises interdisciplinary literature, regulatory instruments, standards, and documented public-sector AI cases to identify recurring accountability gaps. The review argues that transparency and compliance-oriented documentation are necessary but insufficient for high-risk public AI systems. Public institutions require audit-ready governance structures capable of linking system design, data provenance, model behaviour, human oversight, monitoring, redress, and institutional responsibility. The article proposes an audit-ready accountability perspective in which algorithmic systems are assessed as sociotechnical infrastructures rather than isolated technical tools. The review concludes that continuous audit, traceability, contestability, and post-deployment monitoring are essential conditions for accountable public-sector AI.

Keywords: AI classification; algorithmic accountability; algorithmic bias; public services; explainability; AI governance; audit; EU AI Act

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Governments across Europe and North America have spent the past decade integrating algorithmic classifiers into criminal justice risk assessment, welfare fraud detection, child protection screening, and healthcare resource allocation, contexts in which automated outputs directly determine individual outcomes: liberty, economic security, and access to essential services [2,3].

Institutions are drawn to AI for clear reasons: standardized decision-making at scale, lower operational costs, and claimed improvements in predictive accuracy. But public-sector classifiers operate in a normative environment that commercial applications do not. Democratic accountability, due process, and fundamental rights protections impose requirements that are simply absent from, say, a content recommendation algorithm. When an automated decision denies a benefit, assigns a risk score, or flags an individual for investigation, its legitimacy depends on more than accuracy; it must also be explicable, contestable, and traceable to identifiable actors who can be held responsible for it.

The regulatory environment has responded to these concerns with increasing urgency. The European Union's AI Act (Regulation 2024/1689), which entered into force in August 2024, classifies systems used in criminal justice, social benefit administration, and biometric identification as high-risk under Annex III, imposing obligations relating to risk management, data governance, human oversight, and transparency [7]. The General Data Protection Regulation, operating in parallel, establishes qualified rights to explanation for automated decisions with significant individual effects, provisions whose precise legal scope and technical operationalization remain contested [12]. Together, these instruments represent the most substantial regulatory effort yet to impose accountability obligations on public-sector AI. The gap between regulatory ambition and operational reality, however, remains wide: full obligations for most high-risk systems do not apply until August 2026, and for certain high-risk systems embedded in regulated products the deadline extends to August 2027.

Despite growing regulatory attention, critical gaps persist at the intersection of technical explainability, institutional accountability, and citizens' access to effective redress. Current deployments frequently exhibit deliberate algorithmic opacity [4,5], in the form of proprietary protections that foreclose meaningful external scrutiny even when public bodies rely on algorithmic systems for consequential administrative determinations. Post-hoc explainability tools generate outputs that may not faithfully represent the model's actual decision logic [9], while existing redress mechanisms are often procedurally inaccessible to the citizens most affected by erroneous determinations [13].

This article is presented as a narrative review. It does not report original empirical research or introduce a new dataset. Instead, it synthesises academic literature, regulatory developments, technical standards, and documented public-sector AI cases in order to examine how algorithmic accountability can be strengthened through continuous audit and audit-ready governance. The central contribution of the review is conceptual: it distinguishes between compliance-oriented transparency and audit-ready accountability, arguing that high-risk public AI systems require evidence structures that allow decisions, model behaviour, human oversight, redress mechanisms, and governance responsibilities to be reconstructed and assessed over time.

2. Research Approach and Methodology

This article adopts a narrative interdisciplinary review design rather than a systematic review protocol. This choice reflects the nature of the research problem. Ethical and accountability challenges in AI classification for critical public services do not belong to a single disciplinary field and cannot be adequately assessed through technical literature alone. They emerge at the intersection of machine learning design, administrative decision-making, legal accountability, public procurement, and citizens’ procedural rights. A narrative review is therefore appropriate for synthesising heterogeneous evidence across computer science, public administration, legal theory, and data ethics, particularly in an emerging technological domain where concepts, regulatory obligations, and deployment practices remain unsettled.

The review covers the period from 2016 to 2026. This timeframe begins with the public documentation of the COMPAS case by Angwin et al. [3], which remains a reference point for debates on algorithmic bias, proprietary opacity, and contestability in criminal justice. More recent sources were prioritised where they address developments in AI governance, explainability, redress, and regulation, particularly in relation to the EU AI Act and public-sector algorithmic transparency. Foundational conceptual work was also included where it remains directly relevant to the analytical framework, such as Burrell’s account of opacity in machine learning systems [5] and Mökander et al.’s work on ethics-based auditing [11].

Source selection was guided by relevance, credibility, and evidentiary value. The review draws primarily on peer-reviewed academic literature addressing algorithmic bias, explainable AI, public administration, AI governance, and accountability. It also incorporates institutional and regulatory materials, including the EU AI Act and GPAI/OECD guidance on algorithmic transparency in the public sector, because these sources define the governance environment within which public bodies operate. In addition, the review uses high-quality investigative journalism where it provides empirically documented evidence not otherwise available in academic form, most notably ProPublica’s investigation of COMPAS. Such sources are treated as case evidence rather than as theoretical authority.

The empirical cases discussed in the review were selected because they are among the most publicly documented and analytically relevant examples of AI classification failures in critical public services. COMPAS illustrates the criminal justice domain, the healthcare risk stratification case discussed by Alon-Barkat et al. [2] illustrates the health domain, and the Dutch childcare benefits scandal, the Rotterdam welfare fraud algorithm, and MiDAS illustrate the social benefits and welfare domain. These cases were not selected to represent all public-sector AI deployments. They were selected because they provide verified evidence of bias, opacity, institutional responsibility gaps, and failures of redress across distinct administrative contexts.

The interdisciplinary structure of the review follows from the claim that accountability failures in public-sector AI cannot be reduced to model performance. Computer science literature is used to examine classification systems, explainability, bias indicators, model monitoring, and audit-ready technical architectures. Public administration literature is used to analyse delegation, bureaucratic responsibility, good governance, and the outsourcing of public authority. Legal and regulatory sources are used to assess the right to explanation, high-risk AI obligations, procurement-related opacity, and the limits of formal compliance. Data ethics literature is used to connect these technical and institutional questions to discrimination, fairness, and democratic accountability.

The review has three limitations. First, as a narrative review, it does not generate new empirical evidence and does not claim to provide an exhaustive mapping of all public-sector AI classification systems. Second, the selected cases are those that became publicly visible through litigation, journalism, regulatory scrutiny, or academic analysis, which introduces a selection bias towards documented failures. Third, the findings cannot be generalised to undocumented deployments whose design, performance, or governance arrangements remain inaccessible. The purpose of the review is therefore analytical rather than statistical: to identify recurring accountability mechanisms and governance failures across well-documented cases, and to use those patterns to develop the distinction between compliance-oriented transparency and audit-ready governance.

No new empirical data were collected or analysed for this review; the cases discussed serve as documented illustrations of recurring accountability challenges rather than as a comparative empirical sample.

Figure 1 summarises the audit-ready accountability logic proposed in this review. The figure should be read as an evidence chain rather than as a purely technical pipeline. It shows that accountability in high-risk public AI systems depends on the ability to connect training data, deployment, monitoring, bias detection, redress, and governance review into a continuous audit structure.

The figure reinforces the central argument of the review: accountability cannot be located only at the moment of model development or formal regulatory approval. It must be maintained throughout the operational lifecycle of the system. If any part of the evidence chain is missing, public institutions may be unable to explain, audit, correct, or suspend the use of the AI system when harms or failures emerge.
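One way to make the evidence-chain reading operational is a completeness check that reports which links are missing for a given system. The sketch below is a minimal illustration under stated assumptions: the stage names, the `EvidenceChain` class, and the record identifiers are hypothetical and do not come from any standard or from the framework's specification.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, loosely following the evidence chain described for Figure 1.
REQUIRED_STAGES = (
    "training_data",
    "deployment",
    "monitoring",
    "bias_detection",
    "redress",
    "governance_review",
)

@dataclass
class EvidenceChain:
    system_id: str
    # Maps a stage name to a reference for its evidence record (e.g. a document ID or log hash).
    records: dict[str, str] = field(default_factory=dict)

    def missing_stages(self) -> list[str]:
        """Return required stages with no (or an empty) evidence reference, in lifecycle order."""
        return [s for s in REQUIRED_STAGES if not self.records.get(s)]

    def is_audit_ready(self) -> bool:
        return not self.missing_stages()

chain = EvidenceChain("benefit-screening-v3", {
    "training_data": "datasheet-2025-11",
    "deployment": "release-3.2.0",
    "monitoring": "drift-report-2026-05",
})
print(chain.missing_stages())  # ['bias_detection', 'redress', 'governance_review']
print(chain.is_audit_ready())  # False
```

A check of this kind does not establish that the evidence is adequate, only that each link exists and can be located by an oversight body.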

3. Conceptual Framework

3.1. AI Classification Systems in Public Sector Contexts

For the purposes of this review, an AI classification system is understood as any computational system that processes input data to assign individuals, cases, or events to predefined categories, such as risk levels, eligibility determinations, fraud indicators, or recidivism scores, in ways that directly inform or substitute for human administrative judgment. In the public sector, such systems typically operate as automated or semi-automated decision support tools embedded within bureaucratic workflows, including scoring systems that rank applicants for social benefits and predictive models that flag welfare fraud or assess criminal recidivism risk [2,3]. These classifiers span a wide technical spectrum, ranging from logistic regression and decision trees through ensemble methods such as gradient-boosted models and, increasingly, deep learning architectures whose internal reasoning is not directly interpretable by human operators.

Public administration is structured around principles of legality, proportionality, non-discrimination, and procedural due process. An algorithm that determines whether a citizen receives a benefit, faces investigation, or is deprived of liberty is an exercise of public authority, subject to the same democratic accountability requirements as any other form of administrative action [14]. The technical architecture of a classifier matters less than the normative context in which it operates.

3.2. Critical Public Services as High-Stakes Deployment Domains

Critical public services, as the term is used throughout this review, refers to those governmental functions in which automated decisions carry the potential for life-altering consequences for individual citizens: criminal justice and policing, healthcare resource allocation and clinical risk stratification, social benefit entitlement and fraud detection, and child protection. The designation reflects more than the severity of potential harm: it captures the asymmetric power relationship between the state and the individual citizen who is subject to algorithmic determination without necessarily having the means to contest, understand, or appeal it [6,13].

3.3. Algorithmic Bias: Definitions and Structural Origins

Algorithmic bias refers to the systematic and unjustified difference in outcomes produced by an AI system across demographic groups, in ways that cannot be attributed solely to legitimate predictive features [8,15]. The literature distinguishes bias that originates in training data selection (pre-processing), model architecture choices (in-processing), and inconsistent application of outputs (post-processing), categories that are not mutually exclusive in practice [1].

Bias in AI systems is not primarily a technical artefact introduced by careless engineering. It is, in substantial part, a social one, a reflection of the structural inequalities embedded in the historical data used for training [2]. When a recidivism prediction model is trained on arrest records, it learns patterns that reflect policing practices rather than actual criminal behavior, because arrest rates are themselves a function of which communities have historically been subject to intensive law enforcement surveillance. The model then amplifies those patterns as objective risk scores, and presents a socially constructed inequality as a statistically derived prediction [3]. This distinction between bias as an engineering failure and bias as a social reproduction mechanism is decisive: it implies that bias cannot be resolved by better data cleaning or hyperparameter tuning alone, but requires interrogating the social conditions that produced the training data in the first place. Goncalves and Correia [9] reinforce this position from an engineering governance perspective, treating bias and fairness checks as first-class governance indicators rather than optional diagnostics, a framing that acknowledges the structural rather than incidental nature of the problem.
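Treating bias checks as governance indicators, rather than one-off diagnostics, implies computing a group-level measure on every monitoring cycle and routing breaches to human review. The sketch below assumes a single measure, the ratio of the highest to the lowest group selection rate; the function names and the 1.25 threshold are illustrative assumptions, not a legal standard, and selection-rate parity is only one of several fairness criteria, which can conflict with one another.

```python
from collections import defaultdict

def selection_rates(groups, flagged):
    """Share of cases flagged (e.g. referred for investigation) within each group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g, f in zip(groups, flagged):
        totals[g] += 1
        hits[g] += int(f)
    return {g: hits[g] / totals[g] for g in totals}

def disparity_indicator(groups, flagged, max_ratio=1.25):
    """Governance indicator: highest / lowest group selection rate.

    max_ratio is a hypothetical policy threshold chosen for illustration.
    """
    rates = selection_rates(groups, flagged)
    lo, hi = min(rates.values()), max(rates.values())
    if hi == 0:
        ratio = 1.0  # nobody flagged in any group: no disparity to report
    elif lo == 0:
        ratio = float("inf")
    else:
        ratio = hi / lo
    status = "ok" if ratio <= max_ratio else "escalate"
    return {"rates": rates, "ratio": ratio, "status": status}

# Invented monitoring batch: 10 cases per group.
groups = ["A"] * 10 + ["B"] * 10
flagged = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 6
result = disparity_indicator(groups, flagged)
print(result["rates"])                    # {'A': 0.2, 'B': 0.4}
print(result["ratio"], result["status"])  # 2.0 escalate
```

Because the argument above locates bias in the historical data, such an indicator can only flag a disparity for review; deciding whether it is justified remains an institutional judgment.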

3.4. Explainable AI (XAI): Concepts and Limits

Explainable artificial intelligence (XAI) encompasses a family of methods and techniques designed to make the outputs, reasoning processes, or internal representations of AI models interpretable to human observers. The field distinguishes between intrinsic explainability, where the model architecture itself is interpretable by design (as in linear models, decision trees, or rule-based classifiers), and post-hoc explainability, where explanation techniques are applied externally to an already-trained model to generate approximations of its decision logic [9]. The most widely deployed post-hoc methods, including SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and counterfactual generation, produce feature importance rankings or contrastive examples that approximate local model behavior but do not provide direct access to the global decision boundary or causal structure of complex models.

Under the GDPR, Articles 13(2)(f), 14(2)(g), and 15(1)(h) entitle data subjects to "meaningful information about the logic involved" in the solely automated decisions governed by Article 22, which itself establishes a qualified right not to be subject to such decisions where they produce legal or similarly significant effects. The EU AI Act adds a requirement that high-risk systems be sufficiently transparent to enable deployers to interpret their outputs [7,12]. But technical explainability and legally meaningful explanation are not the same thing. An explanation that identifies "age" and "employment history" as the leading features in a benefit denial decision provides information about the model’s statistical associations; it does not constitute a justification in the legal or administrative sense, because it does not establish whether those associations are legitimate, non-discriminatory, and proportionate to the decision outcome.

3.5. Accountability: Structures, Chains, and Algorithmic Disruption

Accountability, in the public administration literature, refers to the obligation of actors exercising delegated public authority to justify their actions to designated forums, and to bear consequences if those actions fall short of established expectations [2]. In democratic governance, accountability chains run from individual citizens through elected representatives to public servants, and back again, so that the exercise of public power remains subject to scrutiny and correction. AI classification systems disrupt this chain. The "agent" making the determination is no longer a human civil servant whose reasoning can be interrogated, but an algorithmic system whose outputs may be opaque, whose training may have been performed by a third-party vendor, and whose decision logic may be protected by proprietary intellectual property claims [2,14].

The result is diffuse accountability: no single institutional actor can be held responsible for an adverse outcome because responsibility is spread across developers, deploying agencies, supervisors, and elected officials, without clear delineation [6]. This dynamic is a democratic deficit that existing accountability frameworks, designed for human bureaucratic action, have not been redesigned to address.

3.6. Model Governance and the Audit Imperative

Model governance refers to the organizational, procedural, and technical mechanisms by which an institution maintains control over the lifecycle of AI systems from design and training through deployment, monitoring, and retirement. In the context of high-risk public sector AI, model governance encompasses: the documentation of model specifications, training data provenance, and performance metrics; the establishment of bias monitoring protocols as continuous governance indicators rather than one-time pre-deployment checks; the versioning of governance parameters (thresholds, escalation triggers, override procedures) to accommodate regulatory change without system re-engineering; and the structuring of explainability outputs into audit-ready evidence bundles that can be queried, replayed, and verified by oversight bodies [9]. Auditability, in this context, is not a feature that can be added retrospectively to a deployed system: it must be designed into the system architecture from the outset, as an intrinsic property of how evidence is generated, stored, and made accessible to accountability forums.
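Two of the governance mechanisms listed above, versioned governance parameters and queryable, verifiable evidence, can be sketched together. The example below is an illustration under stated assumptions, not a reference design: `GovernanceParams`, `decide`, `EvidenceLog`, the thresholds, and the case identifiers are all hypothetical, and hash chaining is one common tamper-evidence technique chosen here for illustration rather than a method prescribed by the text. Explanation artefacts could be added to each decision payload in the same way.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class GovernanceParams:
    # Hypothetical parameter set. A threshold change creates a new version record,
    # so past decisions remain tied to the rules in force when they were made.
    version: str
    review_threshold: float
    escalation_threshold: float

def decide(score: float, p: GovernanceParams) -> str:
    if score >= p.escalation_threshold:
        return "escalate"
    if score >= p.review_threshold:
        return "human_review"
    return "no_action"

class EvidenceLog:
    """Append-only log; each entry commits to the SHA-256 hash of the previous entry."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev: str, payload: dict) -> str:
        body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; an edit to any earlier entry breaks the chain."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or self._digest(prev, e["payload"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True

params = GovernanceParams(version="2026-06", review_threshold=0.7, escalation_threshold=0.9)
log = EvidenceLog()
log.append({"type": "params", **asdict(params)})
for case_id, score in [("C-001", 0.82), ("C-002", 0.41)]:
    log.append({"type": "decision", "case": case_id, "score": score,
                "decision": decide(score, params), "params_version": params.version})

print(log.verify())                       # True
log.entries[1]["payload"]["score"] = 0.5  # simulated after-the-fact edit
print(log.verify())                       # False
```

The design choice worth noting is that each decision records the parameter version rather than the thresholds themselves, so an oversight body can replay a decision against the rules that were actually in force.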

4. State of the Art

4.1. AI Classification in Criminal Justice: Risk Scoring and Predictive Policing

The emblematic case in criminal justice AI remains the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) system, documented by Angwin et al. [3] in an investigation covering over 7,000 defendants in Broward County, Florida. COMPAS assigns recidivism risk scores that are routinely provided to judges at pretrial and sentencing stages. ProPublica's analysis revealed systematically asymmetric errors: among defendants who did not reoffend, Black defendants were nearly twice as likely as White defendants to be misclassified as higher risk, while White defendants who went on to reoffend were more frequently misclassified as low risk [3]. Controlling for criminal history, charge type, age, gender, and actual reoffending, Black defendants remained 77% more likely to be assigned higher risk scores for violent recidivism. Northpointe, the system's developer, declined to disclose how its scores were calculated, citing proprietary protection, a position the courts accepted and one that left defendants with no viable mechanism to contest the algorithmic basis of their risk assessment.
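The asymmetry described above is a difference in group-conditional error rates: false positive rates among people who did not reoffend, and false negative rates among people who did. A minimal sketch of that computation follows. The data are invented toy values, not the Broward County records, and the binary high/low labels are an assumption, since COMPAS outputs graded scores that must first be dichotomised.

```python
def false_positive_rate(y_true, y_pred):
    """Share of people who did NOT reoffend (y_true == 0) but were labelled high risk."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(negatives) / len(negatives) if negatives else float("nan")

def false_negative_rate(y_true, y_pred):
    """Share of people who DID reoffend (y_true == 1) but were labelled low risk."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 - p for p in positives) / len(positives) if positives else float("nan")

def error_rates_by_group(groups, y_true, y_pred):
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        out[g] = {"fpr": false_positive_rate(t, p), "fnr": false_negative_rate(t, p)}
    return out

# Invented toy data: 8 people per group, 4 who did not reoffend and 4 who did.
groups = ["A"] * 8 + ["B"] * 8
y_true = [0, 0, 0, 0, 1, 1, 1, 1] * 2
y_pred = [1, 1, 0, 0, 1, 1, 1, 0] + [1, 0, 0, 0, 1, 1, 0, 0]
print(error_rates_by_group(groups, y_true, y_pred))
# {'A': {'fpr': 0.5, 'fnr': 0.25}, 'B': {'fpr': 0.25, 'fnr': 0.5}}
```

Computing both rates per group matters because a system can look balanced on overall accuracy while distributing its errors in opposite directions across groups, which is the pattern the COMPAS analysis reported.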

What the COMPAS case reveals about the character of algorithmic bias in criminal justice matters beyond its factual specifics. The model does not explicitly use race as an input variable; and yet it produces racially disparate outputs, because the proxy variables it employs (residential stability, employment history, prior arrests) are themselves correlated with race through historical patterns of discriminatory policing and socioeconomic marginalization. The algorithm does not introduce racial bias; it discovers, formalizes, and scales it [3]. The model's proprietary status meant that neither defense attorneys, judges, nor affected defendants could audit its logic, and this opacity was commercially motivated and institutionally tolerated.

4.2. AI Classification in Healthcare: Risk Stratification and Resource Allocation

The healthcare domain presents a distinct but structurally analogous challenge. AI classifiers in clinical contexts are deployed for patient risk stratification, disease prediction, treatment recommendation, and resource allocation, functions that determine which patients receive priority access to specialist care, intensive monitoring, or preventive interventions. The most extensively documented case of racially biased healthcare classification involves a widely used commercial algorithm [2], which governed the allocation of additional care management resources to high-risk patients in the United States. The algorithm used healthcare costs as a proxy for clinical need, a seemingly technical and race-neutral design choice that produced systematically biased outcomes because Black patients, due to longstanding barriers to healthcare access, had historically lower healthcare expenditures than White patients with equivalent clinical severity. The algorithm therefore systematically underestimated the health risk of Black patients relative to White patients at comparable levels of illness, effectively directing care resources away from the population with lower historical healthcare spending. This is a textbook instance of pre-processing bias: the choice of proxy variable (cost) embedded a structural social inequality into the model before any training had taken place.
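The proxy mechanism can be reproduced with a toy ranking. The sketch assumes invented numbers: two groups with identical clinical need, where group B's recorded spending per unit of need is 60% of group A's, standing in for historical access barriers. The selection rule only ever sees the score it is given; it never looks at group membership. When that score is cost rather than need, filling a fixed number of care-management slots excludes group B entirely.

```python
# Invented toy data: both groups have identical clinical need (1..10).
need = range(1, 11)
patients = [(group, n) for group in ("A", "B") for n in need]
spend_per_need = {"A": 1.0, "B": 0.6}  # hypothetical; simulates lower historical spending for B

def cost(patient):
    group, n = patient
    return n * spend_per_need[group]  # what the proxy label actually records

def clinical_need(patient):
    return patient[1]  # what the programme intends to target

def select(score, slots=4):
    """Fill a fixed number of care-management slots with the highest-scoring patients."""
    return sorted(patients, key=score, reverse=True)[:slots]

def share_b(selected):
    return sum(group == "B" for group, _ in selected) / len(selected)

print(select(clinical_need), share_b(select(clinical_need)))
# [('A', 10), ('B', 10), ('A', 9), ('B', 9)] 0.5
print(select(cost), share_b(select(cost)))
# [('A', 10), ('A', 9), ('A', 8), ('A', 7)] 0.0
```

In this toy setting a group B patient with the maximum need of 10 ranks below a group A patient with need 7, which is the sense in which the bias is fixed by the choice of label before any model is trained.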

4.3. AI Classification in Social Benefits and Welfare: Fraud Detection and Eligibility

The social benefits domain has produced some of the most consequential AI classification failures in Europe. The Dutch childcare benefits scandal, in which the national tax authority deployed a risk-scoring algorithm using nationality as a fraud indicator, resulted in over 30,000 families wrongly flagged, stripped of benefits, subjected to forced repayment demands, and in many cases pushed into bankruptcy before the system was identified as discriminatory [2]. The scale illustrates the specific risk of automated decision-making: algorithmic errors replicate systematically across all cases to which the model is applied, producing mass effects operationally impossible in a purely human bureaucracy.

Investigative reporting subsequently revealed that the municipality of Rotterdam had additionally deployed a discriminatory welfare fraud detection algorithm that used Dutch language proficiency as a risk indicator, effectively treating the paperwork errors of residents with limited Dutch language skills as predictive of deliberate fraud [2]. This case is instructive because the bias mechanism was not encoded in the algorithm's explicit design (no one deliberately selected language skills as a fraud proxy), but emerged from the training data, which conflated genuine administrative mistakes with intentional fraud across a population where language barriers correlated strongly with migration background. The algorithm learned the conflation and reproduced it at scale.

The US context provides an additional landmark case in welfare-related algorithmic discrimination: the MiDAS (Michigan Integrated Data Automated System) unemployment fraud detection system, which generated approximately 40,000 incorrect fraud determinations, with the algorithm assessed to have been wrong in approximately 93% of the cases it flagged, through a proprietary scoring process similarly insulated from external audit [2].

4.4. Emerging Regulatory Trends and Governance Frameworks

The regulatory response has moved quickly. The EU AI Act, the broadest such framework in force, imposes binding governance requirements on AI in high-risk public contexts, including conformity assessments, data governance obligations that require datasets to be examined for possible biases, human oversight requirements, and post-market monitoring for systems classified under Annex III [7]. The GDPR's Article 22 provisions on automated decision-making, contested in their scope and technical operationalizability since they became applicable in 2018, have been substantially reinforced by the AI Act's complementary requirements, though the legal relationship between the two instruments and their combined implications for the right to explanation in administrative contexts remain subjects of active jurisprudential debate [12].

At the institutional and technical governance level, the GPAI/OECD framework on algorithmic transparency in the public sector establishes normative expectations for proactive disclosure of algorithmic systems used in government, including the publication of algorithmic registers and impact assessments [10]. However, as Goncalves and Correia [9] document, the dominant challenge is not the absence of regulatory requirements but the gap between those requirements and operational engineering practice: in most current deployments, explainability outputs are generated as isolated technical reports, disconnected from governance workflows and the continuous monitoring mechanisms that genuine compliance with the EU AI Act's lifecycle obligations would require. The technical vocabulary of accountability (model cards, conformity documentation, bias audits) has proliferated without the institutional infrastructure required to make these artefacts meaningful instruments of democratic oversight rather than compliance theater.

The cases reviewed in the preceding subsections illustrate how these regulatory and governance concerns materialise across different public-service domains. Although the systems differ in purpose, technical design, and institutional context, they reveal a recurring pattern: high-risk public AI systems may affect rights, services, scrutiny, or access to essential support while remaining difficult to explain, contest, audit, or correct. Table 1 summarises these cases and identifies the main accountability lesson that each one offers for continuous audit.

5. Critical Analysis

5.1. Intrinsic versus Post-Hoc Explainability: A False Equivalence in High-Stakes Contexts

The dominant paradigm for delivering explainability in deployed AI classification systems relies overwhelmingly on post-hoc methods, techniques applied externally to an already-trained model to approximate a human-interpretable account of its outputs. SHAP, LIME, and counterfactual generation typically produce explanations by probing the model's input-output behavior around a given case rather than by reading its internal decision logic [9]. Two dimensions of this problem are worth distinguishing: fidelity, meaning how accurately an explanation represents the model's actual decision mechanism; and stability, meaning the consistency of explanations for similar inputs. Both are measurable engineering properties that Goncalves and Correia [9] treat as governance indicators, yet in most deployments neither is monitored or reported.
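Both properties can be measured with modest tooling. The sketch below uses a simplified LIME-style local surrogate, a proximity-weighted linear regression fitted to Gaussian perturbations around one case, rather than the `lime` or `shap` packages. The black-box model, kernel width, sample size, and the top-k Jaccard stability score are all illustrative assumptions. Fidelity is reported as the weighted R² of the surrogate on the perturbed samples; stability as the mean pairwise overlap of the top-ranked features across re-runs with different random seeds.

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque model: interaction between features 0 and 1, weak effect of feature 2.
    return 1 / (1 + np.exp(-(2.0 * X[:, 0] * X[:, 1] + np.sin(3 * X[:, 1]) + 0.1 * X[:, 2])))

def local_surrogate(model, x, n_samples=500, scale=0.5, seed=0):
    """LIME-style sketch: fit a proximity-weighted linear model around x.

    Returns (attributions, fidelity) where fidelity is the weighted R^2.
    """
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = model(Z)
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))  # proximity kernel
    A = np.hstack([np.ones((n_samples, 1)), Z - x])                # intercept + centred features
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    pred = A @ coef
    ybar = np.average(y, weights=w)
    r2 = 1 - np.sum(w * (y - pred) ** 2) / np.sum(w * (y - ybar) ** 2)
    return coef[1:], r2

def top_k(attr, k=2):
    return set(np.argsort(-np.abs(attr))[:k].tolist())

def stability(model, x, seeds, k=2):
    """Mean pairwise Jaccard overlap of top-k features across re-runs with different seeds."""
    sets = [top_k(local_surrogate(model, x, seed=s)[0], k) for s in seeds]
    scores = [len(a & b) / len(a | b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return float(np.mean(scores))

x = np.array([0.3, -0.2, 1.0])
attr, fidelity = local_surrogate(black_box, x)
print(np.round(attr, 3), round(float(fidelity), 3))
print(stability(black_box, x, seeds=range(5)))
```

Logging these two numbers alongside each explanation is what would allow them to function as monitored indicators: a fall in fidelity or stability after a model update is itself audit evidence.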

The difference between intrinsic and post-hoc explainability determines what accountability can mean in practice. An intrinsically interpretable model (a sparse linear classifier, a shallow decision tree, or a rule-based system) produces decisions whose logic is constitutively transparent: any human observer with sufficient numeracy can follow the reasoning from inputs to output. A SHAP explanation of a gradient-boosted ensemble or deep neural network, by contrast, is a statistical approximation of that reasoning: it may be locally accurate yet misleading as a global account, and it is open to manipulation, deliberate or inadvertent, that presents a defensible account of a decision whose actual mechanism the explanation does not faithfully represent [2]. Goncalves and Correia [9] document this gap in engineering practice directly, noting that in many deployments, explanations are generated as isolated technical reports that remain weakly connected to decision provenance, governance actions, and audit logs, meaning that the explanation is produced for compliance purposes but does not function as an instrument of actual oversight.
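For contrast, an intrinsically interpretable model needs no approximation: its explanation is its own arithmetic. The sketch below assumes a hypothetical sparse linear risk score; the feature names, weights, and intercept are invented, and the weights are exact binary fractions only so that the printed values are exact.

```python
# Hypothetical sparse linear risk score; feature names, weights, and intercept are invented.
WEIGHTS = {"prior_claims": 0.5, "months_unemployed": 0.125, "address_changes": 0.25}
INTERCEPT = -1.5

def score_with_reasons(case: dict) -> tuple[float, dict]:
    """Return the score and each feature's contribution (weight x value)."""
    contributions = {f: w * case[f] for f, w in WEIGHTS.items()}
    return INTERCEPT + sum(contributions.values()), contributions

score, reasons = score_with_reasons({"prior_claims": 2, "months_unemployed": 6, "address_changes": 1})
print(score)    # 0.5
print(reasons)  # {'prior_claims': 1.0, 'months_unemployed': 0.75, 'address_changes': 0.25}
```

Because the contributions are the model's own terms, they sum exactly to the score minus the intercept and can be re-derived by any reviewer from the published weights. For a linear model with independent features, SHAP values reduce to the same terms measured against a baseline, w_i(x_i - E[x_i]); the gap between explanation and mechanism opens mainly when the underlying model is not additive.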

As Goncalves and Correia [9] argue, explainability artefacts need to be routed into structured, versioned, queryable evidence chains throughout the AI lifecycle, where explanation stability and fidelity are treated as governance indicators subject to continuous monitoring, not assumed at the point of initial deployment.

A further limitation concerns the instability and strategic manipulability of post-hoc explanations. If an explanation is produced by an external approximation method rather than by the model’s internal decision structure, similar inputs may generate materially different explanations under small changes in sampling, perturbation settings, model version, or surrounding data distribution. This creates an accountability problem because the affected citizen, reviewer, or auditor may be presented with an explanation that is locally plausible but not stable enough to support institutional justification. Goncalves and Correia [9] treat explanation stability and fidelity as governance indicators precisely because an explanation that cannot be reproduced, compared, or monitored across versions cannot function as audit evidence. The problem is not only technical instability but also adversarial explainability: a system may be configured, deliberately or indirectly, to produce explanations that appear compliant while leaving the underlying decision logic unchanged or insufficiently scrutinised. In legal terms, this exposes the weakness of treating post-hoc explanation as equivalent to meaningful accountability. As Nisevic et al. [12] show in their analysis of the relationship between the GDPR and the AI Act, the legal demand for explainability is not satisfied by the mere production of an intelligible output; the explanation must support the affected person’s ability to understand, contest, and obtain review of the decision. Post-hoc explanations that satisfy formal transparency requirements while remaining unstable, non-reproducible, or detached from decision provenance therefore risk becoming compliance artefacts rather than instruments of accountability.

5.2. Diffuse Accountability and the Outsourcing Dynamic

Democratic accountability assumes that the exercise of public authority can be traced to identifiable actors who can be held responsible for its consequences. Algorithmic decision-making disrupts this by introducing what Alon-Barkat et al. [2] describe as a new link in the delegation chain, a technical artefact whose functioning is opaque, whose development may have occurred entirely outside the deploying institution, and whose outputs carry public authority while remaining insulated from the scrutiny that human administrative discretion is ordinarily subject to.

The opacity problem is compounded further by the proprietary nature of most AI systems acquired from commercial vendors. When a public body deploys a privately developed classifier, intellectual property protections attached to that system may prevent the deploying institution itself from fully understanding how the system reaches its outputs, let alone enabling external audit, judicial review, or citizen contestation [2,4].

The empirical findings of Alon-Barkat et al. [2] add a further concern. Their preregistered survey experiment with 2,483 Dutch citizens found that citizens assign meaningfully greater organizational responsibility to public bodies for algorithmic discrimination when the algorithm is developed in-house than when it is procured from an external private vendor. The authors theorize that this finding may create perverse institutional incentives. If external procurement reduces the responsibility that citizens attribute to public bodies, those bodies may be rationally incentivized to outsource algorithmic development precisely as a way of attenuating their accountability exposure for system failures. This dynamic, outsourcing as responsibility diffusion, would systematically bias procurement decisions toward externally developed systems even when in-house development or rigorous vendor oversight would better serve the public interest. Accountability mechanisms must therefore follow the functional exercise of algorithmic authority rather than the formal employment relationship of the system developer. Procurement regulations must likewise impose accountability obligations on the public body deploying the system, regardless of where that system was built.

This outsourcing dynamic also needs to be addressed at the procurement stage, before the system enters public administration. The accountability problem is not only ex-post, arising after an algorithmic failure has occurred; it is also ex-ante, embedded in the contractual conditions under which public bodies acquire proprietary systems. Bloch-Wehba [4] shows that procurement arrangements can intensify algorithmic opacity when vendors retain control over model documentation, source logic, training data access, or audit permissions through trade secrecy and contractual restrictions. In this context, audit rights cannot be treated as a negotiable safeguard to be requested after vendor selection. They must be mandatory procurement conditions for any system used to exercise public authority. Alon-Barkat et al. [2] reinforce the institutional significance of this point: when citizens attribute less responsibility to public bodies for outsourced algorithmic discrimination, external procurement can weaken perceived accountability even though the public body remains the actor deploying the system. Public contracts for AI classification in critical services should therefore require audit access, documentation duties, explanation traceability, and cooperation with oversight bodies as baseline conditions of eligibility, not as optional terms dependent on vendor discretion.

5.3. Algorithmic Discrimination as Structural Inequality: Beyond Individual Error

Public and regulatory discourse often treats algorithmic bias as individual error: mispredictions fixable by improving model accuracy. The cases reviewed in Section 4, read alongside the theoretical account in Section 3.3, point to a more specific claim: algorithmic discrimination in critical public services is structurally produced, the systematic outcome of applying a pattern-recognition system trained on historically inequitable data to populations that continue to bear the consequences of those historical inequities.

This structural account has several implications that the individual-error framing obscures. First, it implies that improving aggregate model accuracy, as measured by standard metrics such as overall predictive accuracy, does not necessarily reduce discriminatory impact. The COMPAS system was reported to achieve approximately 65% overall predictive accuracy for recidivism, only modestly above chance. Yet within this aggregate figure, it generated systematically asymmetric error rates across racial groups in ways that were harmful and discriminatory [3]. An algorithm can be accurate in aggregate and discriminatory by group, because aggregate performance metrics provide no information about whether errors are distributed equitably. Second, the choice of fairness criterion cannot be settled technically. Ferrara [8] and Wang et al. [15] both document the mathematical tensions between competing fairness definitions (equalized odds, demographic parity, individual fairness, and calibration). When base rates differ across groups and prediction is imperfect, these criteria cannot all be satisfied simultaneously, a result known in the technical literature as the fairness impossibility theorem. The choice of which fairness metric to optimize is therefore a value-laden institutional decision, not a purely technical one. Yet in current practice, that decision is typically made by the algorithm developer without democratic deliberation or public mandate.
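The gap between aggregate accuracy and group-level error can be shown with a few lines of arithmetic. The sketch below uses synthetic confusion-matrix counts for two hypothetical groups. These counts are not COMPAS data, and the helper names are illustrative.

```python
from collections import defaultdict


def _rate(numerator: int, denominator: int) -> float:
    """Safe ratio: NaN when a group has no cases in the denominator."""
    return numerator / denominator if denominator else float("nan")


def error_rates_by_group(records):
    """records: iterable of (group, y_true, y_pred) with binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_pred and y_true:
            c["tp"] += 1
        elif y_pred and not y_true:
            c["fp"] += 1
        elif not y_pred and not y_true:
            c["tn"] += 1
        else:
            c["fn"] += 1
    return {
        group: {
            "accuracy": _rate(c["tp"] + c["tn"], sum(c.values())),
            "fpr": _rate(c["fp"], c["fp"] + c["tn"]),  # wrongly flagged as high risk
            "fnr": _rate(c["fn"], c["fn"] + c["tp"]),  # wrongly cleared as low risk
        }
        for group, c in counts.items()
    }


def synthetic_group(name, tp, fp, tn, fn):
    """Expand confusion-matrix counts into (group, y_true, y_pred) records."""
    return ([(name, 1, 1)] * tp + [(name, 0, 1)] * fp
            + [(name, 0, 0)] * tn + [(name, 1, 0)] * fn)


# Synthetic counts chosen for illustration; these are not COMPAS figures.
records = synthetic_group("A", tp=4, fp=2, tn=3, fn=1) + synthetic_group("B", tp=2, fp=0, tn=5, fn=3)
overall = sum(y_true == y_pred for _, y_true, y_pred in records) / len(records)
print(f"overall accuracy = {overall:.2f}")
for group, rates in sorted(error_rates_by_group(records).items()):
    print(group, {name: round(value, 2) for name, value in rates.items()})
```

Both groups, and the sample as a whole, reach 0.70 accuracy. However, negatives in group A are wrongly flagged 40% of the time, compared with 0% in group B, while group B bears a 60% false-negative rate. An audit that reported only the 0.70 figure would miss this asymmetry entirely.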

5.4. The Cognitive Limits of Human-in-the-Loop Oversight

The EU AI Act's requirements for human oversight rest on an unexamined assumption: that human review of algorithmic outputs actually functions as a safeguard against erroneous or discriminatory determinations. The relevant question is not whether a human is formally present in the decision loop but whether that human is positioned, with the right information, time, and institutional support, to scrutinize the output rather than simply ratify it.

The human-computer interaction literature has extensively documented the phenomenon of automation bias: the tendency of human operators to over-rely on automated system outputs, particularly when those outputs are presented with apparent precision or authority, and when the cognitive load of independent verification is high [2]. In the context of AI classification in public services, the conditions that produce automation bias are structurally present: caseworkers reviewing algorithmic risk scores operate under time pressure, with large decision volumes, and with limited access to the technical information necessary to evaluate whether a given score is reliable for the individual case at hand. The human is in the loop formally but not substantively.

Goncalves and Correia [9] address this from the engineering side, arguing that effective human oversight requires interface layers that adapt explanations to different user profiles and provide structured mechanisms for review and override, structuring the cognitive process of scrutiny rather than simply displaying model outputs. Meaningful human oversight is a design requirement, not a default condition achieved by inserting any human agent into any position adjacent to the algorithmic output.
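One way to make the distinction between formal and substantive oversight auditable is to monitor how reviewers actually interact with algorithmic outputs. The following is a minimal sketch under stated assumptions: the `Review` record layout, the 2% override-rate floor, and the 60-second median review-time floor are all hypothetical. A low override rate is not proof of automation bias, since the model may simply be right. The output is therefore a signal for follow-up rather than a finding.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Review:
    reviewer: str
    model_recommendation: str  # e.g. "flag" or "clear"
    final_decision: str        # decision recorded after human review
    seconds_spent: float


def oversight_alerts(reviews, min_override_rate=0.02, min_median_seconds=60.0):
    """Return reviewers whose behaviour suggests rubber-stamping, with reasons.

    Thresholds are hypothetical defaults; a deploying body would set its own.
    """
    by_reviewer: dict[str, list[Review]] = {}
    for review in reviews:
        by_reviewer.setdefault(review.reviewer, []).append(review)
    alerts = {}
    for reviewer, items in by_reviewer.items():
        overrides = sum(r.final_decision != r.model_recommendation for r in items)
        override_rate = overrides / len(items)
        median_time = median(r.seconds_spent for r in items)
        reasons = []
        if override_rate < min_override_rate:
            reasons.append(f"override rate {override_rate:.1%}")
        if median_time < min_median_seconds:
            reasons.append(f"median review time {median_time:.0f}s")
        if reasons:
            alerts[reviewer] = reasons
    return alerts


# Hypothetical review logs: r1 accepts every score within seconds; r2 engages.
reviews = [Review("r1", "flag", "flag", 20.0) for _ in range(50)]
reviews += [Review("r2", "flag", "clear" if i < 5 else "flag", 240.0) for i in range(50)]
print(oversight_alerts(reviews))
```

Here only reviewer `r1` is flagged, because this reviewer accepts every score in about 20 seconds.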

6. Implications for Audit

6.1. From Regulatory Compliance to Continuous Audit: Reframing the Audit Function

A tension runs through current AI governance practice: regulatory frameworks impose compliance obligations, but satisfying those obligations through conformity assessment, technical documentation, and explainability reports does not constitute ongoing institutional oversight. Compliance is a snapshot; accountability requires ongoing scrutiny of how authority is exercised. A system that passes conformity assessment at deployment may quietly degrade in fairness, accuracy, or explanation stability as the operational data distribution shifts, with no automatic trigger for re-evaluation or notification to oversight bodies [9].

The audit function in AI-intensive public administration cannot be a periodic inspection of static documentation. It needs to be a continuous process that draws on structured evidence produced by the AI system throughout its operational life. This has direct technical requirements. The AI system must be engineered to emit compliance evidence (decision records linking inputs, model version, explanation artefacts, and human reviewer actions) at every decision point, not merely on demand during an audit engagement. The evidence must be structured in standardized, queryable schemas, versioned to support temporal tracing, and cryptographically protected against post-hoc modification. Goncalves and Correia [9] operationalize these conditions in their XAI-Compliance-by-Design framework through append-only logging, hash-chaining, and role-based access controls on evidence stores. The audit function then becomes the process of querying these evidence bundles to verify three things: that the system's documented behavior corresponds to its actual decision patterns, that bias indicators remain within governance-approved thresholds, and that explanation stability has not degraded beyond configured monitoring criteria.

This reframing has direct consequences for public procurement: audit-readiness cannot be negotiated after vendor selection; it must be a conformity specification from the start, requiring vendors to demonstrate that their systems are engineered to produce verifiable evidence rather than merely asserting compliance in documentation.
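The evidence properties described in this subsection, namely append-only decision records protected by hash-chaining, can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib` and `json` modules. It is not the XAI-Compliance-by-Design implementation of [9], and the record fields (`case_id`, `model_version`, `explanation_id`, `reviewer_action`) are hypothetical.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class DecisionLog:
    """Append-only, hash-chained log of decision records (illustrative only)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"record": record, "prev_hash": prev_hash}
        entry_hash = _digest(body)
        self._entries.append({**body, "hash": entry_hash})
        return entry_hash  # the latest hash is the value to anchor externally

    def verify(self) -> bool:
        """Recompute the chain. Edited, reordered, or removed interior entries
        break it; truncation at the end is only detectable against an
        externally anchored latest hash."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {"record": entry["record"], "prev_hash": entry["prev_hash"]}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = DecisionLog()
log.append({"case_id": "C-001", "model_version": "v1.3", "score": 0.81,
            "explanation_id": "E-001", "reviewer_action": "upheld"})
log.append({"case_id": "C-002", "model_version": "v1.3", "score": 0.12,
            "explanation_id": "E-002", "reviewer_action": "overridden"})
print(log.verify())  # intact chain
log._entries[0]["record"]["reviewer_action"] = "overridden"  # simulated tampering
print(log.verify())  # chain no longer verifies
```

Hash-chaining makes tampering evident, but it does not make tampering impossible. An actor who controls the whole store could rewrite every entry and recompute the chain. In practice, the latest hash would therefore be anchored somewhere the operator cannot alter, for example with an oversight body, and the store itself would sit behind role-based access controls.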

6.2. Data Governance and Bias as First-Class Audit Indicators

The structural account of algorithmic bias developed in Section 5.3 implies that bias monitoring cannot be a pre-deployment quality check; it must be a persistent, first-class governance indicator maintained throughout the operational life of the system. This position is explicitly adopted by Goncalves and Correia [9], who treat bias and fairness checks as integrated governance indicators rather than optional diagnostics, and require that they produce evidence artefacts structured for audit inspection with traceable escalation pathways when thresholds are breached. The engineering case for continuous monitoring is that data distributions shift: the population a public sector classifier operates on changes over time due to demographic change, policy reforms, or evolving administrative practices; a system whose fairness metrics were acceptable at training time may develop discriminatory patterns post-deployment without any change to its model parameters.
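Distribution shift of the kind described above can be detected without access to outcome labels. The sketch below computes the Population Stability Index (PSI), one common drift statistic, over binned counts of a single input feature. The age bands, counts, and 0.2 trigger are hypothetical. PSI is offered as an example, not as a metric prescribed by the reviewed sources.

```python
import math


def population_stability_index(baseline_counts, current_counts, eps=1e-6):
    """PSI between two binned distributions of the same feature.

    Bins with zero share are floored at eps to keep the logarithm finite.
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("both distributions must use the same bins")
    total_b, total_c = sum(baseline_counts), sum(current_counts)
    psi = 0.0
    for b, c in zip(baseline_counts, current_counts):
        p = max(b / total_b, eps)  # share of the bin at training time
        q = max(c / total_c, eps)  # share of the bin in current operation
        psi += (q - p) * math.log(q / p)
    return psi


# Hypothetical applicant counts per age band: training snapshot vs. this month.
training = [250, 250, 250, 250]
this_month = [100, 200, 300, 400]
DRIFT_TRIGGER = 0.2  # hypothetical threshold set by the deploying body
psi = population_stability_index(training, this_month)
print(f"PSI = {psi:.3f}", "-> trigger fairness re-evaluation" if psi > DRIFT_TRIGGER else "")
```

With these counts, the PSI is about 0.23. A common rule of thumb reads values above roughly 0.25 as a major shift, but the trigger level is itself a governance choice. Input drift does not prove that fairness has degraded. It signals that the fairness metrics validated at training time need to be recomputed on current data.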

Effective data governance for audit purposes therefore encompasses several interconnected commitments that are frequently absent from current public sector deployments. Training data provenance must be documented in sufficient detail to support a retrospective determination of whether the selection criteria, labeling procedures, and feature choices were themselves equitable. Bias monitoring in production must additionally operate across the full set of fairness metrics relevant to the deployment domain (not merely the metric optimized during training) and must generate drift reports that function as governance triggers for review and potential recalibration when disparate impact indicators cross configurable thresholds [9]. Most consequentially for accountability, however, the fairness thresholds themselves must be institutionally determined by the deploying public body (not technically defaulted by the model developer), because the choice of which demographic groups to monitor, which fairness metric to prioritize, and what level of disparity is institutionally acceptable are irreducibly political decisions that belong to democratic governance rather than to technical optimization.
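The separation between institutionally set thresholds and their technical computation can be made explicit in code. In the sketch below, a versioned `FairnessPolicy` object records who approved the monitored attribute, the reference group, and the tolerated disparity in adverse flags. The monitoring code merely applies that policy. All field names, group labels, counts, and the 1.25 ratio are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairnessPolicy:
    """Monitoring thresholds adopted by the deploying public body (hypothetical)."""
    version: str                 # versioned so past audits can be replayed
    approved_by: str             # the institutional owner of the value choice
    monitored_attribute: str     # which grouping is monitored (a policy choice)
    reference_group: str
    max_flag_rate_ratio: float   # tolerated disparity in adverse flags


def flag_rate_ratios(outcomes: dict[str, tuple[int, int]], reference_group: str) -> dict[str, float]:
    """outcomes maps group -> (number flagged, number assessed)."""
    ref_flagged, ref_total = outcomes[reference_group]
    ref_rate = ref_flagged / ref_total
    if ref_rate == 0:
        raise ValueError("reference group has no flags; ratio is undefined")
    return {group: (flagged / total) / ref_rate for group, (flagged, total) in outcomes.items()}


def breaches(policy: FairnessPolicy, outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Groups whose flag rate exceeds the tolerated ratio to the reference group."""
    ratios = flag_rate_ratios(outcomes, policy.reference_group)
    return {g: round(r, 2) for g, r in ratios.items() if r > policy.max_flag_rate_ratio}


policy = FairnessPolicy(version="2026-1", approved_by="municipal oversight board",
                        monitored_attribute="nationality_group",
                        reference_group="group_a", max_flag_rate_ratio=1.25)
monthly = {"group_a": (30, 1000), "group_b": (45, 1000), "group_c": (33, 1000)}
print(policy.version, breaches(policy, monthly))
```

Because the policy object is frozen and versioned, each monitoring report can state exactly which approved thresholds it was evaluated against. This supports the temporal tracing that audit requires.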

6.3. Redress Mechanisms as Audit Instruments

The right to redress (to contest an algorithmic determination, obtain human review, and receive remediation) is both a fundamental rights protection and an underutilized audit instrument. Pi and Proctor [13] find that redress is the least developed component of current AI governance frameworks, despite being the most directly democratically significant. A complaint mechanism that collects structured information about contested decisions generates a data stream capable of revealing discriminatory patterns at scale; individual errors that are too numerous to investigate case-by-case become statistically identifiable in the aggregate [11]. An effective redress framework for algorithmic public administration requires four mutually reinforcing conditions. Citizens must have a legally enforceable right to human review of any consequential automated determination. Complaints must be registered in a structured way that captures information usable for pattern analysis. Public bodies must be obligated to analyze complaint data for evidence of systemic error or discriminatory impact. And findings from that analysis must feed back into governance review of the deployed model.

The EU AI Act's requirement for adequate redress for persons affected by high-risk AI systems is a necessary but substantially underspecified obligation in this regard [7]. Effective implementation would require public bodies to invest in complaint infrastructure that does not currently exist in most jurisdictions. It would also require them to recognize this infrastructure as a core component of the audit architecture for AI systems, rather than simply a service for processing individual complaints. The scale, opacity, and speed of error propagation in algorithmic systems make individual casework review an insufficient accountability mechanism on its own.
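The aggregate analysis of structured complaint data described in this subsection can be illustrated with a standard two-proportion z-test. The sketch makes a hypothetical assumption: each complaint record carries the administrative unit that issued the decision and states whether the complaint was upheld on human review. The unit names, counts, and strict threshold of 3.0 are illustrative. The normal approximation is only reasonable when counts are not very small.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 == p2, using the pooled-proportion standard error."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se if se > 0 else 0.0


def systemic_patterns(upheld: dict[str, int], decisions: dict[str, int], z_threshold: float = 3.0):
    """Flag units whose rate of upheld complaints is high relative to all others.

    upheld[u]: complaints upheld on human review; decisions[u]: decisions issued.
    One-sided test with a deliberately strict threshold, since many units are compared.
    """
    total_upheld, total_decisions = sum(upheld.values()), sum(decisions.values())
    flagged = {}
    for unit, n1 in decisions.items():
        if n1 == 0 or n1 == total_decisions:
            continue  # no comparison group available
        x1 = upheld.get(unit, 0)
        z = two_proportion_z(x1, n1, total_upheld - x1, total_decisions - n1)
        if z > z_threshold:
            flagged[unit] = round(z, 2)
    return flagged


# Hypothetical quarter of structured complaint data for a benefits classifier.
decisions = {"north": 4000, "south": 4000, "east": 2000}
upheld = {"north": 12, "south": 14, "east": 30}
print(systemic_patterns(upheld, decisions))
```

Here only the eastern unit is flagged. Its upheld-complaint rate of 1.5% is several times that of the other units combined, a pattern that stays invisible to caseworkers who each see only a handful of cases. The same grouping could be applied to any attribute that the deploying body chooses to monitor.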

7. Challenges and Limitations

7.1. The Persistence of Bias: Structural Resilience and Mitigation Gaps

The most fundamental challenge in the governance of AI classification systems in critical public services is that algorithmic bias resists detection and is deeply resilient to the mitigation strategies currently available. Technical debiasing approaches include re-weighting training samples, applying fairness constraints during model training, and post-processing model outputs to equalize error rates across demographic groups. These approaches can reduce measurable disparities on the specific fairness metric being optimized while leaving others unaddressed, or while introducing new disparities along dimensions that were not monitored [8,15]. Under realistic data conditions, all standard fairness definitions cannot be satisfied at once (see Section 5.3), so no technical fix can eliminate this fundamental tension. The challenge is therefore less about finding the correct fairness algorithm than about developing institutional processes for making democratically legitimate choices between competing fairness definitions, and for monitoring those choices as operational conditions evolve over time.
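Re-weighting, the first debiasing approach mentioned above, also illustrates how such fixes remain partial. The sketch below implements reweighing in the style of Kamiran and Calders, where each (group, label) combination receives the weight P(g)P(y)/P(g,y). The two-group sample is hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Kamiran-Calders style reweighing: w(g, y) = P(g) * P(y) / P(g, y).

    Computed from counts as n_g * n_y / (n * n_gy), which is algebraically identical.
    """
    n = len(labels)
    n_g, n_y = Counter(groups), Counter(labels)
    n_gy = Counter(zip(groups, labels))
    return {(g, y): n_g[g] * n_y[y] / (n * c) for (g, y), c in n_gy.items()}


def weighted_positive_rate(groups, labels, weights, group):
    """Share of (weighted) positive labels within one group after reweighing."""
    pos = sum(weights[(g, y)] for g, y in zip(groups, labels) if g == group and y == 1)
    tot = sum(weights[(g, y)] for g, y in zip(groups, labels) if g == group)
    return pos / tot


# Hypothetical training sample: group A has a higher historical positive rate.
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]
w = reweighing_weights(groups, labels)
for grp in ("A", "B"):
    print(grp, round(weighted_positive_rate(groups, labels, w, grp), 3))
```

The historical positive rates (about 0.67 for A and 0.25 for B) are both equalized to 0.5 in the weighted training data. This is a demographic-parity-style adjustment to the inputs. It guarantees nothing about calibration or error-rate balance in a model subsequently trained on them, which is precisely the tension described above.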

7.2. Opacity, Intentional and Technical

The opacity of AI classification systems in public services operates at two distinct levels. The first is deliberate opacity: the protection of algorithmic methodology through intellectual property law and trade secrecy. Burrell [5] distinguishes this (opacity as intentional corporate or state secrecy) from technical opacity arising from model complexity; Bloch-Wehba [4] demonstrates how it operates through procurement contracts that place vendors beyond freedom-of-information obligations, a dynamic documented in European deployments by Alon-Barkat et al. [2]. This form of opacity is a governance choice. It results from procurement frameworks that accept proprietary vendor constraints and is therefore addressable through procurement reform, mandatory disclosure requirements, and secure audit access regimes so that oversight bodies can examine model internals under confidentiality conditions.

The second form of opacity is technical: the irreducible complexity of modern machine learning architectures whose internal decision logic cannot be rendered fully interpretable even with complete access to all model parameters and training data. This is not a governance failure but an engineering reality: when the model's causal reasoning is not accessible even to its developers, any explanation provided to an affected citizen is necessarily an approximation whose fidelity is measurable but limited [9].
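Fidelity, the extent to which an explanation reproduces the model it describes, can be quantified as an agreement rate. In this minimal sketch, both the "opaque" model and the one-rule surrogate explanation are hypothetical functions on integer features scaled 0 to 10. They are chosen so that the surrogate cannot represent the model's interaction between income and debt.

```python
from itertools import product


def black_box(income: int, debt: int) -> int:
    """Stand-in for an opaque model: flags when an income-debt interaction is low."""
    return int(income * (10 - debt) < 30)


def surrogate(income: int, debt: int) -> int:
    """A simple, human-readable rule offered to citizens as the 'explanation'."""
    return int(income < 4)


def fidelity(model, explainer, inputs) -> float:
    """Share of inputs on which the explanation's rule reproduces the model's output."""
    agreements = sum(model(*x) == explainer(*x) for x in inputs)
    return agreements / len(inputs)


# Hypothetical features on a 0-10 scale, evaluated on every grid point.
grid = list(product(range(11), range(11)))
print(f"fidelity = {fidelity(black_box, surrogate, grid):.2f}")
```

The surrogate agrees with the model on 85 of 121 grid points, giving a fidelity of about 0.70. It is readable, but wrong for nearly 30% of cases. A measured fidelity of this kind is the sort of limit an audit record could report alongside each explanation.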

7.3. Model Security: Adversarial Threats to Classification Integrity

Model security against adversarial manipulation is a challenge frequently absent from ethical and accountability discussions, even though the EU AI Act requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity [7]. AI classifiers in high-stakes public contexts are potentially attractive targets for three categories of attack, each with distinct governance implications. The first is data poisoning, the deliberate manipulation of training data to induce systematic errors or to create backdoor vulnerabilities in the trained model. It is particularly relevant where training data are collected from administrative records susceptible to manipulation by motivated actors, such as criminal justice records, welfare claim histories, or electronic health records. A public official, vendor, or external actor with access to training data pipelines could in principle introduce corrupted samples that shift model behavior in targeted ways without triggering conventional quality checks [9].
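A modest control against poisoning introduced after a training snapshot is approved is to fingerprint the approved data and check it before each retraining. The sketch below is a hypothetical illustration using SHA-256 digests from Python's `hashlib`. The record fields and identifiers are invented for the example.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding of one training record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def build_manifest(records: list[dict]) -> dict[str, str]:
    """Map each record_id to its digest at the time the snapshot is approved."""
    return {r["record_id"]: record_digest(r) for r in records}


def diff_against_manifest(records: list[dict], manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report records added, removed, or silently modified since approval."""
    current = build_manifest(records)
    return {
        "added": sorted(current.keys() - manifest.keys()),
        "removed": sorted(manifest.keys() - current.keys()),
        "modified": sorted(k for k in current.keys() & manifest.keys() if current[k] != manifest[k]),
    }


# Hypothetical approved snapshot of administrative records used for training.
approved = [
    {"record_id": "r1", "prior_claims": 0, "label": 0},
    {"record_id": "r2", "prior_claims": 3, "label": 0},
    {"record_id": "r3", "prior_claims": 1, "label": 1},
]
manifest = build_manifest(approved)

# Before retraining: one label flipped and one unreviewed record injected.
candidate = [dict(r) for r in approved]
candidate[1]["label"] = 1
candidate.append({"record_id": "r4", "prior_claims": 9, "label": 1})
print(diff_against_manifest(candidate, manifest))
```

This detects silent modification, injection, or deletion after approval. It does not detect poisoning that was already present in the administrative records when the snapshot was approved, which requires provenance review and statistical screening of the data itself.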

Model extraction attacks work differently. An adversary reconstructs an approximation of the deployed model by repeatedly querying its API and then uses that reconstruction to study decision boundaries in enough detail to game inputs. For example, the adversary might systematically adjust the features used in welfare fraud detection or recidivism classification to obtain favorable classifications. The governance implication is that audit frameworks must include adversarial robustness assessment alongside fairness and explainability verification. Goncalves and Correia [9] address this requirement through integrity protection of evidence stores and proactive red-teaming as a component of continuous compliance monitoring.

A further security risk concerns attacks that expose sensitive information rather than directly altering classification outcomes. Model inversion attacks attempt to infer characteristics of the training data from model outputs, while training-data extraction attacks seek to recover specific records or data fragments used during model training. In critical public services, this risk has a distinct governance weight because training datasets often contain sensitive information about citizens: criminal records, welfare histories, health conditions, employment status, household composition, or previous interactions with administrative authorities. Even where the deployed classifier does not disclose raw data, repeated interaction with model outputs may allow an adversary to infer protected attributes or reconstruct information about vulnerable groups. Goncalves and Correia [9] frame this type of risk as part of the broader need for lifecycle governance, since compliance cannot be limited to fairness and explainability if the evidentiary architecture itself exposes confidential decision data. Audit-ready governance must therefore include access control, logging of model queries, integrity protection of evidence stores, and adversarial testing capable of detecting whether model outputs, explanations, or audit artefacts create unintended channels for sensitive data disclosure.
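Logging of model queries, mentioned above, can double as a coarse detector for extraction or inversion attempts, because these attacks often require large numbers of queries. The sketch below is a hypothetical per-client sliding-window monitor. The class name, the cap of 100 queries per 60 seconds, and the client identifiers are assumptions for illustration. Timestamps are assumed to be non-decreasing for each client.

```python
from collections import defaultdict, deque


class QueryMonitor:
    """Log every model query and flag clients exceeding a sliding-window volume cap."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._recent: dict[str, deque] = defaultdict(deque)
        self.audit_log: list[dict] = []  # every query is retained as audit evidence

    def record(self, client_id: str, timestamp: float) -> bool:
        """Log one query; return True if the client is over the cap in the current window."""
        window = self._recent[client_id]
        window.append(timestamp)
        while timestamp - window[0] > self.window_seconds:
            window.popleft()  # drop queries older than the window
        flagged = len(window) > self.max_queries
        self.audit_log.append({"client": client_id, "t": timestamp, "flagged": flagged})
        return flagged


monitor = QueryMonitor(max_queries=100, window_seconds=60.0)
# A caseworker scoring 20 cases in a minute vs. a client issuing 10 queries per second.
caseworker = [monitor.record("caseworker-17", t * 3.0) for t in range(20)]
scraper = [monitor.record("api-client-x", t * 0.1) for t in range(500)]
print(any(caseworker), scraper.index(True))
```

The caseworker is never flagged, while the automated client is flagged from its 101st query onward. Volume caps are easily evaded by spreading queries across accounts or over time. This check therefore complements access control and adversarial testing rather than replacing them.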

7.4. Regulatory Risk and Implementation Gaps

The regulatory environment governing AI in public services is shifting rapidly, and public bodies that have invested in compliance architectures built around current regulatory interpretations face a specific organizational risk: the precise scope of required conformity assessments, the technical standards against which they will be measured, and the allocation of supervisory responsibility between national authorities and sector regulators remain subjects of active regulatory development [7]. The European Commission's Digital Omnibus package illustrates that the regulatory parameters themselves are subject to revision, reinforcing the case for versioned, configurable governance frameworks that can be updated without requiring the re-engineering of technical systems [9].

The organizational risks extend beyond regulatory uncertainty to the capacity constraints of public bodies attempting to implement AI governance requirements without the in-house technical expertise to evaluate whether their deployed systems actually satisfy those requirements. Alon-Barkat et al. [2] note that public sector technical expertise already seriously lags behind the private sector in AI capabilities, and the gap is self-reinforcing: outsourcing erodes the institutional motivation to develop in-house expertise; without that expertise, public bodies cannot meaningfully evaluate vendor claims of compliance, which makes them more dependent on those vendors, which erodes the motivation for expertise further.

8. Conclusion

This review has examined the ethical and accountability challenges of AI classification systems in critical public services: how algorithmic bias reproduces historical inequalities, why explainability falls short of genuine accountability, and why existing redress mechanisms consistently fail the citizens they are supposed to protect. Across criminal justice, healthcare, and welfare administration, the same structural failure recurs: proprietary opacity insulates systems from scrutiny while human oversight remains formal rather than substantive, and accountability mechanisms activate only after large-scale harm has already materialised.

The original contribution of this article is the distinction between compliance-oriented transparency and audit-ready governance as an analytical framework for assessing AI classification in public services. Compliance-oriented transparency treats explanation, documentation, and conformity assessment as outputs produced to satisfy regulatory requirements. Audit-ready governance treats those same elements as operational evidence within a continuous accountability architecture. The distinction matters because an AI system may be formally transparent while remaining practically unaccountable if its explanations are unstable, its bias monitoring is episodic, its decision records are not traceable, or its redress mechanisms are disconnected from institutional review.

Governance frameworks have too often treated compliance documentation as a substitute for accountability; the gap this creates is institutional rather than technical. Audit must shift to a continuous process built on structured evidence generated throughout system operation, with auditability engineered in from the start [9]. Bias monitoring must be a persistent governance indicator measured against thresholds that institutions set deliberately. Redress must function as both a citizen right and a systemic audit signal; structured complaint data can reveal discriminatory patterns at scale that individual case review cannot.

The review also has limitations. As a narrative review, it does not produce new empirical evidence and does not claim to provide a systematic inventory of all public-sector AI deployments. The cases examined are those that became publicly visible through journalism, litigation, regulatory attention, or academic analysis, which creates selection bias towards documented failures. The findings therefore cannot be generalised statistically to undocumented systems whose design, performance, and governance arrangements remain inaccessible. Their value lies instead in identifying recurring accountability failures across well-documented cases and translating those patterns into governance implications.

For public bodies, the practical implications are direct. First, audit rights over AI systems used in critical services should be mandatory procurement conditions, including access to documentation, model behaviour evidence, explanation artefacts, bias metrics, and vendor cooperation with oversight bodies. Second, public agencies should implement continuous monitoring of fairness, explanation stability, model drift, human overrides, and complaint patterns, with predefined escalation triggers when governance thresholds are breached. Third, redress mechanisms should be redesigned as part of the audit architecture: citizens must be able to contest consequential classifications, and complaint data must be structured, analysed, and fed back into model governance.

Within these limits, the contribution of this article is to identify recurring accountability gaps across documented public-sector AI cases and to propose an audit-ready governance perspective for high-risk public AI systems. The review argues that accountable public AI requires more than transparency statements, ethical principles, or one-off compliance checks. It requires continuous audit structures capable of preserving evidence, monitoring performance and bias, supporting contestability, and enabling public institutions to correct, suspend, or redesign systems when their effects cannot be justified. In this sense, audit-ready accountability offers a bridge between AI governance theory and the practical responsibilities of public administration.

Author Contributions

Conceptualization, L.C., A.G. and M.M.M.; methodology, L.C. and A.G.; investigation, L.C.; writing—original draft preparation, L.C.; writing—review and editing, L.C., A.G. and M.M.M.; supervision, A.G. and M.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AI	Artificial Intelligence
API	Application Programming Interface
COMPAS	Correctional Offender Management Profiling for Alternative Sanctions
EU	European Union
GDPR	General Data Protection Regulation
GPAI	Global Partnership on Artificial Intelligence
LIME	Local Interpretable Model-agnostic Explanations
MDPI	Multidisciplinary Digital Publishing Institute
MiDAS	Michigan Integrated Data Automated System
OECD	Organisation for Economic Co-operation and Development
SHAP	SHapley Additive exPlanations
XAI	Explainable Artificial Intelligence

References

1. Ahmad, A.; Vallès, Y.; Idaghdour, Y. Bias in AI systems: Integrating formal and socio-technical approaches. Front. Big Data 2026, 8, 1686452.
2. Alon-Barkat, S.; Busuioc, M.; Schwoerer, K.; Weißmüller, K. S. Algorithmic discrimination in public service provision: Understanding citizens’ attribution of responsibility for human versus algorithmic discriminatory outcomes. J. Public Adm. Res. Theory 2025, 35(4), 469–488.
3. Angwin, J.; Larson, J.; Mattu, S.; Kirchner, L. Machine bias. ProPublica, 23 May 2016. Available online: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
4. Bloch-Wehba, H. Transparency’s AI problem. Knight First Amendment Institute at Columbia University, 2021. Available online: https://ssrn.com/abstract=3871293.
5. Burrell, J. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data Soc. 2016, 3(1), 1–12.
6. Cheong, B. C. Transparency and accountability in AI systems. Front. Hum. Dyn. 2024, 6, 1421273.
7. European Parliament; Council of the European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024, L 202401689.
8. Ferrara, E. Fairness and bias in artificial intelligence: A brief survey. Sci 2024, 6(1), 3.
9. Goncalves, A.; Correia, A. Engineering explainable AI systems for GDPR-aligned decision transparency: A modular framework for continuous compliance. J. Cybersecur. Priv. 2026, 6(1), 7.
10. GPAI/OECD. Algorithmic transparency in the public sector. In Global Partnership on Artificial Intelligence; 2024.
11. Mökander, J.; Morley, J.; Taddeo, M.; Floridi, L. Ethics-based auditing of automated decision-making systems: Nature, scope, and limitations. Sci. Eng. Ethics 2021, 27(4), 44.
12. Nisevic, M.; Cuypers, A.; De Bruyne, J. Explainable AI: Can the AI Act and the GDPR go out for a date? In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2024), 2024.
13. Pi, Y.; Proctor, M. Toward empowering AI governance with redress mechanisms. Camb. Forum AI Law. Gov. 2025, 1(e24), 1–22.
14. Roehl, M.; Hansen, M. B. Automated, administrative decision-making and good governance: Synergies. Public Adm. Rev. 2024, 84(6), 1184–1199.
15. Wang, X.; Wu, Y.; Ji, X.; Fu, S. Algorithmic discrimination: Examining its types and regulatory measures. Front. Artif. Intell. 2024, 7, 1320277.

Figure 1. Audit-ready evidence chain: continuous governance cycle for AI classification systems in critical public services.

Table 1. Public-sector AI Cases and Accountability Lessons.

Domain	Case or system	Main accountability risk	Lesson for continuous audit
Criminal justice	COMPAS	Risk scores may influence liberty-related decisions while remaining difficult to contest	High-risk AI requires explainability, independent scrutiny, and traceable human decision-making
Healthcare	Healthcare risk stratification systems	Biased proxies may reproduce unequal access to care	Labels, proxy variables, and subgroup performance require continuous monitoring
Welfare and social benefits	Dutch childcare benefits scandal	Automated suspicion may shift evidential burdens onto citizens	Redress, proportionality, and evidence preservation are essential
Local government	Rotterdam welfare fraud algorithm	Risk indicators may be used without adequate transparency or contestability	Public-sector AI requires audit access, legal justification, and review mechanisms
Public administration	MiDAS	Automated classification may produce large-scale administrative harm	Continuous audit must include post-deployment monitoring and correction mechanisms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
