From Prediction to Public Accountability: An Accountability-Chain Framework for AI Classification in Critical Public Services

João Figueiredo; Antonio Goncalves; Mario Monteiro Marques

doi:10.20944/preprints202606.1297.v1

Submitted:

15 June 2026

Posted:

17 June 2026

You are already at the latest version

Abstract

AI classification systems are increasingly used in critical public services to prioritise patients, screen welfare claims, assess risk, route migration applications, detect fraud, allocate educational support, and identify security alerts. Although these systems may improve triage, consistency, and administrative capacity, they also redistribute suspicion, delay access to essential services, affect liberty and welfare, and shift evidential burdens onto citizens. This review examines ethical and accountability challenges in public-sector AI classification through the linked requirements of fairness, explainability, auditability, cybersecurity, data governance, procurement control, human review, and contestability. It contributes a Public Classification Accountability Chain that connects seven auditable elements: public-purpose justification, data and label legitimacy, model and threshold choice, human oversight, audit evidence, citizen contestability, and post-deployment authority. The article argues that high-stakes public classifiers should be governed as sociotechnical accountability systems rather than as isolated predictive tools. Responsible deployment requires public institutions to justify the classification task, document data and thresholds, monitor subgroup harms, reconstruct decisions, secure systems, provide meaningful review and appeal, and suspend or redesign classifiers when their effects cannot be made proportionate, lawful, and accountable.

Keywords:

AI classification

;

public services

;

algorithmic accountability

;

fairness

;

explainable AI

;

algorithmic auditing

;

public-sector governance

;

automated decision-making

;

data governance

;

cybersecurity

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Public institutions increasingly use AI and data-driven classification systems to sort people, cases, applications, records, transactions, and alerts into operational categories. A system may classify a patient as high-risk, a welfare claim as suspicious, a defendant as likely to reoffend, a migrant application as requiring additional scrutiny, or a transaction as anomalous. In critical public services, these classifications are not merely technical outputs. They help determine who receives attention, who comes under suspicion, who must provide additional evidence, and who is delayed, excluded, investigated, sanctioned, or prioritised.

Classification can serve legitimate administrative purposes. It may help agencies process large caseloads, standardise triage, detect irregularities, and direct scarce human review toward cases that require closer attention [7,16]. The strongest argument for public-sector classification is therefore not that it replaces judgement, but that it can support judgement under conditions of scale and limited resources. A hospital may use a risk model to identify patients who need preventive care; a welfare agency may use anomaly detection to prioritise investigations; a border agency may use screening tools to route applications for review.

The same systems can also create distinctive public-law and ethical risks. A false positive may wrongly classify a person as risky, fraudulent, non-compliant, dangerous, or suspicious. A false negative may miss disease, abuse, fraud, safety risk, or a security threat. These errors are not only statistical. They can lead to delayed treatment, intrusive investigation, benefit suspension, intensified policing, reputational harm, or a transfer of evidential burden from the state to the individual. In this sense, public-sector classification redistributes attention, suspicion, administrative burden, and public authority.

The central problem is that many classifiers are complex, data-dependent systems whose logic, data lineage, error patterns, and operational effects are difficult for citizens, officials, auditors, courts, and regulators to inspect. Accuracy is necessary but insufficient. A system may be accurate on average and still be discriminatory, opaque, insecure, disproportionate, or institutionally unaccountable [1,15]. Conversely, a transparent model may still be illegitimate if it relies on biased labels, an unlawful target, or a procedure that gives affected individuals no realistic way to challenge the outcome.

This article reviews ethical and accountability challenges in AI classification for critical public services. It treats classification as a sociotechnical accountability problem rather than as a narrow model-performance problem. The review develops an accountability-chain framework: a sequence of public justifications connecting the classification task, data and labels, model and threshold, human review, audit trail, citizen contestation, and post-deployment authority. The framework reflects the article’s organising claim: many public-sector AI failures arise not only from model architecture, but also from weak problem definition, unsuitable data, poor procurement, symbolic oversight, missing records, and fragmented responsibility.

1.1. Contribution of This Review

This review makes three contributions. First, it synthesises ethical, legal, technical, and governance challenges associated with AI classification in critical public services. Second, it proposes the Public Classification Accountability Chain, a conceptual framework that links public-purpose justification, data and label legitimacy, model and threshold choice, human review, auditability, citizen contestability, and post-deployment authority. Third, it translates these governance requirements into practical audit questions that can support public-sector oversight, procurement, compliance, and accountability activities.

2. Review Method and Scope

This article uses a narrative review method to synthesise academic literature, technical frameworks, policy documents, standards, and public-administration scholarship on AI classification in critical public services. The review is conceptual and analytical rather than systematic: it does not claim exhaustive coverage of empirical studies and does not provide a meta-analysis of model performance. Its purpose is to connect technical properties of classifiers with institutional requirements of legality, fairness, auditing, cybersecurity, administrative procedure, and democratic accountability.

The review considered literature published primarily between 2018 and 2025, while also incorporating earlier foundational works where necessary to define enduring concepts in algorithmic accountability, fairness, explainability, and public administration. Source tracing focused on terms such as AI classification, risk scoring, algorithmic accountability, algorithmic auditing, fairness in machine learning, explainable AI, model cards, datasheets, automated decision-making, data protection impact assessment, dataset shift, calibration, and AI risk management.

Materials were included when they contributed to understanding at least one of four dimensions: (1) the technical operation of classification systems; (2) ethical risks such as bias, opacity, and automation bias; (3) institutional mechanisms including audits, documentation, procurement controls, and appeal procedures; or (4) domain-specific challenges in healthcare, criminal justice, border control, welfare, education, fraud detection, and public security. Materials were excluded when they treated AI primarily as a private-sector optimisation tool, discussed automation without classification or risk scoring, or lacked clear relevance to critical public services.

Although the review is narrative rather than systematic, the selection process followed a structured logic intended to ensure conceptual breadth and analytical coherence. The article should therefore be read as an integrated conceptual synthesis designed to connect technical AI governance literature with public-sector accountability requirements.

3. Conceptual Background

3.1. AI Classification

AI classification assigns observations to categories on the basis of patterns learned from data. In public administration, the observation may be a person, claim, case file, application, medical record, transaction, address, image, or text. The output may be a class label, a risk score, or a ranked priority. Classification differs from regression, which estimates continuous values, although many public systems convert scores into categorical decisions through thresholds. This conversion is a governance-critical step because it determines when a score becomes an intervention.

A classifier is not a complete decision procedure. It is an artefact embedded in a wider administrative process that defines the target variable, selects data, trains or configures the model, sets a threshold, routes the output to officials, and determines consequences. Even when the formal decision remains with a human official, the classifier may shape the evidence available to that official and influence the burden placed on the affected person.

For accountability purposes, the technical pipeline should be described in operational rather than purely mathematical terms. Public classifiers normally require a problem definition, an input dataset, feature construction, a target label, training and validation procedures, calibration checks, threshold selection, deployment rules, monitoring, and a record of how outputs are used in decisions. Each step creates a distinct governance question. A model trained on a weak label can be invalid even if it is statistically accurate; a calibrated score can become unfair if the threshold exposes one group to disproportionate scrutiny; and a well-documented classifier can still be unlawful if it is used for a purpose that the institution cannot justify. This pipeline view explains why classification governance must extend from data collection to appeal, correction, and system withdrawal.

3.2. Critical Public Services

Critical public services are services in which administrative classification can materially affect rights, liberty, welfare, health, education, migration status, taxation, policing, security, or access to essential support. They are high-stakes because the state has coercive or allocative authority, citizens may have limited alternatives, and errors may be difficult to reverse. A poor classification in this setting can do more than inconvenience a service user; it can alter access to basic goods or expose a person to investigation, surveillance, or sanction.

Criticality is also institutional. A person may be unable to avoid the public authority, choose an alternative provider, or inspect the records used to classify them. Administrative classifications can therefore become coercive even when they are formally described as triage or prioritisation. Scored welfare, policing, migration, and credit-like public systems show how predictive categories can become embedded in administrative routines and due-process burdens [3].

3.3. Decision Support, De Facto Automation, and Full Automation

A useful distinction can be drawn between decision-support systems, de facto automated systems, and fully automated systems. In a decision-support system, the classifier supplies evidence or a recommendation while an official remains responsible for the final decision. In a fully automated system, the classification directly triggers a consequence such as referral, denial, suspension, or prioritisation. Between these poles lies de facto automation: the official formally retains discretion, but organisational pressure, time limits, interface design, or lack of contrary evidence makes acceptance of the model output routine.

This distinction matters because legal and ethical risk does not depend only on whether a human is nominally present. A risk score that is almost always accepted may have the same practical effect as automation. Auditors should therefore test actual practice: acceptance rates, override rates, review time, reasons for disagreement, appeal outcomes, and whether staff can obtain the evidence needed to reject the model. The relevant question is not simply whether a human is in the loop, but whether the human can change the outcome on informed and accountable grounds [2].

3.4. Supervised Learning, Risk Scoring, and Anomaly Detection

Most high-stakes classifiers use supervised learning: a model learns relationships between input features and labelled outcomes. The technical sequence is usually simple in principle but difficult in public practice. Historical records are converted into examples; variables are selected or engineered as features; a label is chosen as the target; the model is fitted on training data; performance is tested on validation or holdout data; and the final system is deployed with rules for thresholding, review, logging, and monitoring. Failures can occur at each stage. Training data may omit people who were never served, features may encode protected characteristics through proxies, labels may reflect prior administrative choices rather than the underlying social condition, and validation data may not represent future deployment conditions.

The legitimacy of supervised learning depends heavily on the legitimacy of the label. A label such as prior arrest, historic cost, or previous investigation may be available in administrative data, but it may not represent the substantive concept the institution claims to predict. The health allocation case analysed by Obermeyer et al. [12] illustrates this problem: cost was used as a proxy for health need, but cost partly reflected unequal access to care. A similar problem can arise wherever the target is administratively convenient but normatively weak, such as using investigation history as a proxy for fraud risk or recorded incidents as a proxy for underlying danger.

Risk scoring usually produces a probability or ordered category that must be translated into administrative action. Thresholds determine which cases receive review or intervention. Calibration matters when a score is presented as a probability; a poorly calibrated score can appear precise while misleading decision-makers. Validation should therefore ask not only whether the model separates cases, but whether scores mean what officials believe they mean, whether confidence is communicated honestly, whether performance is stable across subgroups, and whether the threshold is proportionate to the consequence. Anomaly detection is different. It identifies unusual patterns, often without labelled examples of wrongdoing. In public administration, unusualness should be treated as a trigger for inquiry, not as proof of fraud, danger, or non-compliance.

3.5. Bias, Explainability, and Accountability

Algorithmic bias can enter through historical data, proxy variables, missing records, unequal measurement, biased labels, deployment context, or feedback loops [8,15]. Fairness is not a single metric. Equal false-positive rates, equal false-negative rates, calibration, and individualised treatment can conflict, especially when base rates differ across groups. Public authorities must therefore explain which fairness harms matter in the specific context and why a chosen trade-off is lawful and proportionate.

Explainability concerns whether officials, auditors, regulators, courts, and affected individuals can understand the basis and limits of a classification. Post-hoc tools such as LIME, SHAP, and counterfactual explanations may be useful, but they do not by themselves make a decision accountable. In some high-stakes settings, interpretable models should be preferred unless a more complex model is demonstrably necessary and accompanied by stronger safeguards [14].

Accountability means that an institution can justify, document, review, correct, and, where necessary, suspend the use of a classifier. It includes answerability, enforceability, and institutional responsibility [2]. The state cannot discharge this responsibility by pointing to a vendor model, a generic accuracy score, or an official nominally “in the loop” who lacks the information or authority to challenge the system.

3.6. Accountability as a Sociotechnical Property

Accountability in public-sector AI classification cannot be reduced to model performance or technical transparency. Classifiers operate within organisational structures, legal frameworks, administrative procedures, procurement arrangements, and human decision-making processes. Consequently, accountability should be understood as a sociotechnical property emerging from the interaction between technology, institutions, public authority, and affected individuals. This perspective provides the conceptual basis for the accountability-chain framework developed later in this article.

4. State of the Art in Public-Sector AI Classification

Public-sector classifiers range from interpretable scoring models to complex machine-learning pipelines. Logistic regression and constrained decision trees remain attractive where reasons must be communicated and thresholds defended. Random forests, gradient boosting, support vector machines, and neural networks may improve predictive performance in some structured or unstructured-data settings, but they generally increase the burden of validation, explanation, monitoring, and procurement control. Natural language processing classifiers may help process complaints, case notes, or reports, but they can lose context and reproduce linguistic bias. Anomaly detection can identify unusual patterns in tax, welfare, border, or security data, but it should not be treated as proof of wrongdoing.

Table 1. Governance Implications of Common Classification Approaches.

Approach	Typical Strength	Main Governance Implication
Scoring models, logistic regression, and small decision trees	Relatively interpretable and easier to document	Suitable where reasons, appeals, and threshold justification are central, but still require label and subgroup-error review.
Ensembles, support vector machines, and gradient boosting	Often strong performance on structured administrative data	Require stronger calibration, subgroup validation, explanation, monitoring, and procurement documentation.
Neural networks and NLP classifiers	Useful for images, text, and complex signals	Harder to justify in adverse decisions unless necessity, reviewability, and contestability are explicit.
Anomaly detection	Useful where labelled examples are limited	Should trigger further inquiry rather than serve as evidence of liability, fraud, or danger.

The state of the art is no longer limited to model development. Documentation and governance tools have become central. Model cards and datasheets provide structured accounts of intended use, training data, evaluation, limitations, and risks [6,9]. Algorithmic audits examine whether system design, deployment, and monitoring align with stated goals and legal constraints [13]. Risk-management frameworks such as the NIST AI Risk Management Framework and ISO/IEC 23894 frame AI governance as an ongoing organisational process, while ISO/IEC 42001 treats AI governance as a management-system issue [10,19,20].

Legal and policy developments reinforce this institutional view. The EU AI Act regulates high-risk AI systems through obligations related to risk management, data governance, technical documentation, record keeping, transparency, human oversight, accuracy, robustness, and cybersecurity [5]. Data-protection law and guidance on automated individual decision-making also strengthen requirements for lawful basis, transparency, meaningful information, impact assessment, and rights-based safeguards where personal data are used for consequential classification. UNESCO, OECD, WHO, and the Council of Europe place AI governance within human rights, accountability, transparency, public benefit, and democratic oversight [11,17,26,28]. These instruments differ in legal force, but they share a common implication: classification systems in public services require institutional justification and control, not only technical optimisation.

The practical tension remains. Complex models can sometimes improve prediction, including in safety-relevant contexts, but their opacity may weaken explanation and contestation. Simpler models are easier to audit but may underperform or reproduce flawed labels. The appropriate choice depends on the decision context. A public authority that adopts a more opaque or proprietary classifier assumes a heavier burden to show why simpler alternatives are insufficient and how opacity will be governed.

Current practice also shows a shift from single models to integrated decision platforms. A classifier may be embedded in a case-management system, combined with document-processing tools, connected to multiple administrative databases, and presented through dashboards that rank cases for staff. In such settings, accountability cannot be limited to the model file or algorithmic method. Interface design, data-sharing arrangements, default recommendations, alert fatigue, and workflow incentives may determine whether the classifier is used cautiously or becomes a source of routine administrative pressure. For this reason, public-sector AI governance should examine the full information system, not only the predictive component.

4.1. Current Governance Gaps

Despite significant progress in AI governance, existing approaches often address isolated aspects of accountability. Model cards focus on model documentation, datasheets focus on datasets, impact assessments focus on risks, and management frameworks focus on organisational controls. However, public-sector classification systems require accountability across the entire lifecycle linking public purpose, data, modelling choices, operational use, citizen rights, and institutional oversight. This fragmentation motivates the development of an integrated accountability-chain framework.

5. Critical Analysis of AI Classification Risks

5.1. Accuracy, Thresholds, and Drift

Aggregate accuracy can conceal unequal false-positive and false-negative rates across groups. A classifier may perform well on average while performing poorly for minorities, rare cases, or people with missing or atypical data. Public evaluation should therefore include sensitivity, specificity, precision, recall, calibration, subgroup performance, error distribution, and confidence intervals, but it should also ask what each error means in the relevant administrative process [1].

The same metric can have different administrative meanings. High recall may be appropriate for a clinical screening tool where missed cases are dangerous, but it may be unacceptable for a fraud-screening tool if it produces large numbers of intrusive investigations. High precision may reduce unnecessary burdens, but it may leave vulnerable people unidentified in support-oriented triage. Evaluation should therefore be tied to a harm model: who is harmed by each error, how severe the harm is, whether the person can detect and challenge it, and whether the harm is reversible.

Threshold selection is not a purely technical optimisation problem. Lowering a threshold may detect more cases while exposing more people to investigation, delay, or stigma. Raising it may reduce burdens on individuals while missing cases in need of support or protection. Public authorities should document why an operating point is proportionate to the public purpose, how expected errors are distributed, and whether affected groups bear unequal costs. They should also preserve evidence of alternative thresholds considered, because a threshold is often where a statistical score becomes an administrative burden.

Deployed systems also face drift. Data drift occurs when the distribution of inputs changes; concept drift occurs when the relationship between inputs and the target changes. In public services, drift can arise from legal changes, altered reporting practices, new eligibility rules, changing clinical practice, feedback from enforcement, or changes in service-user behaviour. A classifier validated at procurement or pilot stage may therefore become unreliable after deployment unless monitoring is continuous. Monitoring should include scheduled recalibration checks, subgroup performance review, incident reporting, and a clear rule for pausing the system when performance or fairness falls outside accepted limits.

5.2. Fairness and Discrimination

Algorithmic discrimination may arise even without explicit protected attributes. Postcodes, employment histories, medical spending, school records, family composition, or policing data can act as proxies for protected or socially salient characteristics. Historical labels can encode prior discrimination: arrests may reflect policing patterns; benefit investigations may reflect enforcement priorities; health spending may reflect unequal access; school outcomes may reflect social inequality. A model trained on such labels can reproduce institutional history as prediction.

Fairness interventions are necessary but limited. Removing protected attributes may not remove proxies, and equalising one error rate may worsen another. The governance question is therefore not whether a system is simply “fair”, but whether its target, features, threshold, intervention, and appeal route can be justified together. Fairness assessment should be linked to administrative consequence. A disparity in a support-oriented triage tool may require a different response from a disparity in a sanctioning or surveillance tool.

5.3. Transparency, Automation Bias, and Human Oversight

Transparency has different audiences. Citizens need understandable reasons and a way to contest errors. Frontline officials need enough information to evaluate the output rather than defer to it. Auditors need logs, versions, thresholds, data lineage, and test results. Regulators and courts need evidence that the system operates within legal limits. A single explanation interface cannot satisfy all these needs.

Human oversight is meaningful only when humans have time, competence, information, and authority. A nominal human-in-the-loop design may still produce automation bias if officials treat model outputs as presumptively correct, if caseloads make independent review unrealistic, or if organisational incentives reward agreement with the system. Oversight should include training, escalation routes, override records, and periodic review of whether officials actually challenge the model when appropriate.

5.4. Security and Adversarial Risk

Public classifiers also create security and privacy risks. Adversarial examples, data poisoning, model extraction, membership inference, and manipulation of input records can affect confidentiality, integrity, and availability. The risk is not limited to technical attack. Users, vendors, staff, or external actors may learn how a system flags cases and adapt behaviour accordingly. A claimant may alter records to avoid scrutiny, a malicious insider may manipulate training data, a vendor integration may expose sensitive logs, or an attacker may infer whether a person was included in a sensitive dataset.

Security is therefore part of public accountability. If a classification system can be manipulated, extracted, or queried in ways that reveal sensitive personal information, the public authority cannot credibly defend the integrity of resulting decisions. Controls should include data-provenance checks, access management, secure model and API deployment, logging of administrative changes, separation between development and production environments, incident-response procedures, vendor security obligations, and periodic adversarial testing proportionate to the system’s risk. Privacy controls should also address data minimisation, retention, purpose limitation, and whether explanations or audit logs disclose more personal information than is necessary for contestation and oversight.

6. Key Ethical and Accountability Issues

6.1. Error Impact and Harm Analysis

The ethical significance of a classification error depends on the consequence attached to it. In low-stakes services, an error may create delay or inconvenience. In critical services, it may deny access to healthcare, suspend income support, trigger police attention, affect liberty, or produce migration consequences. Harm analysis should therefore examine who bears the error, whether the person can detect it, whether the harm is reversible, and whether the institution has incentives to minimise individual burden.

False positives and false negatives both matter, but they matter differently across domains. Fraud detection must control false positives because a suspicious classification can impose stigma and administrative burden. Healthcare triage must attend closely to false negatives when missed intervention can cause clinical harm. Criminal-justice and policing systems require especially careful scrutiny because classifications may influence restrictions on liberty, surveillance, or enforcement attention. Metrics should therefore be tied to the public value at stake and to the severity and reversibility of likely harms.

6.2. Contestability and Procedural Justice

A person affected by an adverse classification should be able to understand the practical reason for the decision, correct relevant data, submit contrary evidence, obtain human review, and receive a reasoned outcome. These requirements are consistent with long-standing procedural-justice concerns in public administration and with data-protection safeguards for automated individual decision-making. Contestability cannot be reduced to a general statement that an automated tool was used. It requires enough information to identify what can be challenged and enough institutional capacity to correct mistakes.

The cases of SyRI, Robodebt, the UK visa streaming tool, and Ofqual’s 2020 grading process illustrate different contestability failures: opaque risk indicators, weak evidential bases, routine administrative burden, and limited ability to challenge a model-mediated outcome [18,21,22,23,24,25]. These examples differ legally and factually, but they share a common lesson. Where classification affects public entitlements or scrutiny, legality and fairness depend not only on model design but also on notice, reasons, evidence preservation, and correction mechanisms.

Table 2. Public Cases and Accountability-Chain Lessons.

Case	Governance Failure Illustrated	Accountability Lesson
SyRI welfare-risk system	Opaque risk indication and limited public scrutiny of data-linking logic	Risk classification requires a clear public purpose, proportionality analysis, transparency about data sources, and a route for affected persons to understand and contest suspicion.
Robodebt	Weak evidential basis and administrative burden shifted onto citizens	Automated or data-driven debt claims require reliable evidence, preserved records, human review, and correction mechanisms before adverse action is taken.
UK visa streaming tool	Triage affected scrutiny while remaining difficult for applicants to inspect	Routing systems can materially affect burden and delay even when they do not formally decide the application; they therefore require auditability and non-discrimination assessment.
Ofqual 2020 grading	Model-mediated allocation produced perceived unfairness and constrained individual correction	Aggregate standardisation can conflict with individual justice; affected persons need meaningful reasons, evidence, and review routes when model outputs affect education outcomes.

6.3. Proportionality, Necessity, and Democratic Control

Public-sector classification should satisfy proportionality and necessity. A system should address a legitimate public purpose, be suitable for that purpose, use no more intrusive method than necessary, and avoid harms disproportionate to its benefits. This test matters because predictive power alone does not justify surveillance, exclusion, or administrative burden. In some contexts, the proper governance conclusion may be non-deployment: if the target is illegitimate, the data are too biased, the consequence is severe and irreversible, or the affected person cannot meaningfully contest the result, additional safeguards may not cure the problem.

Democratic control also requires clarity about who authorises the system, who monitors it, who receives performance reports, who can suspend it, and how the public can scrutinise its use. Without these mechanisms, accountability becomes fragmented across data teams, vendors, managers, frontline staff, legal departments, and political authorities.

7. Implications for Auditing and Information Systems

7.1. Technological Risk and Internal Control

AI classifiers in public services are information systems that process sensitive data, transform those data into classifications, and feed the classifications into administrative action. Their risks include data-quality errors, unauthorised access, insecure integration, model drift, weak logging, vendor dependence, and failure to preserve decision records. Audit work should therefore examine the complete control environment rather than the model alone.

Internal controls should cover purpose approval, data governance, access management, model validation, threshold documentation, change control, human-review procedures, override logging, incident response, post-deployment monitoring, and decommissioning authority. Controls should also define the division of responsibility between policy owners, operational managers, data scientists, procurement teams, cybersecurity staff, legal advisers, auditors, vendors, and frontline officials.

7.2. Compliance and Governance Instruments

Public agencies should distinguish between different types of governance instruments. Binding law may impose legal duties; public-law principles guide administrative justification; standards structure organisational controls; ethical frameworks express normative commitments; and audit artefacts provide evidence. Treating all instruments as equivalent weakens compliance analysis.

Table 3. Governance Instruments and Their Roles.

Instrument Type	Examples	Role in Classifier Governance
Binding law and regulation	EU AI Act; GDPR; sector-specific legal duties	Establish legal obligations for high-risk systems, personal-data processing, documentation, oversight, rights, and compliance.
Public-law principles	Legality, proportionality, reason-giving, due process	Require justification of purpose, procedure, intervention, and burdens imposed on affected persons.
Technical and management standards	ISO/IEC 42001; ISO/IEC 23894; NIST AI RMF	Structure risk management, organisational controls, monitoring, and audit evidence.
Ethical and human-rights guidance	OECD, UNESCO, WHO, Council of Europe	Clarify normative expectations for transparency, human rights, safety, equity, and public benefit.
Documentation and audit artefacts	Model cards, datasheets, logs, impact assessments, data protection impact assessments	Preserve evidence about design choices, limitations, testing, deployment, data flows, and decisions.

7.3. Data Governance and Automated Decision-Making

Data governance is central because labels and features define what the model can learn. Agencies should document data provenance, collection purpose, quality controls, missingness, representativeness, retention rules, access permissions, sharing arrangements, and the rationale for using each dataset [6]. They should also examine whether data collected for one administrative purpose are being repurposed for another in a way that changes citizens’ reasonable expectations or legal position.

Data governance should be specific enough to support audit and contestation. Dataset records should identify source systems, update frequency, known quality issues, missingness patterns, transformation rules, metadata, retention schedules, and access controls. Data minimisation is not only a privacy principle; it is also a reliability principle, because unnecessary variables increase the risk of proxy discrimination, leakage, and unjustified inference. Where multiple agencies share data, governance should identify which body is responsible for accuracy, correction, deletion, and responses to citizen challenges.

Automated decision-making becomes especially problematic when classification outputs trigger consequences without meaningful review, particularly where affected persons are subject to legal or similarly significant effects. Even decision-support systems can function as de facto automation if officials rarely challenge the output. Auditors should therefore assess how often outputs are accepted, overridden, escalated, corrected, or appealed, and whether affected persons receive enough explanation to identify error. They should also verify whether the system’s operational design matches its policy description: a tool described as advisory may be substantively determinative if staff lack the time, training, or authority to depart from it.

8. Accountability Models and Governance Mechanisms

8.1. Human-in-the-Loop Governance

Human-in-the-loop governance is a safeguard only when it changes the decision process. The official must understand the system’s purpose and limits, have access to relevant evidence, be able to disagree with the output, and be protected from incentives that make rubber-stamping likely. Oversight should be evaluated empirically through override rates, review time, error corrections, appeal outcomes, and staff interviews, not merely asserted in policy documents.

Meaningful human review also requires institutional independence. If the same unit that benefits from higher throughput is responsible for reviewing model outputs, staff may have weak incentives to slow down, request additional evidence, or reverse an automated recommendation. Reviewers should receive training on model limitations, common data errors, subgroup performance, and the legal consequences of accepting a classification. Override decisions should be logged without penalising justified disagreement, because a low override rate may indicate either a highly reliable system or a culture of deference.

8.2. Theoretical Foundation of the Accountability Chain

Existing governance instruments provide valuable guidance but frequently focus on specific stages of the AI lifecycle. Public accountability, however, requires institutions to justify the entire sequence from problem definition to post-deployment monitoring. The Public Classification Accountability Chain is proposed as an integrative governance model that connects technical, organisational, legal, and procedural accountability requirements into a single auditable structure.

8.3. Public Classification Accountability Chain

The central contribution of this article is the accountability-chain framework. The framework differs from model cards, datasheets, impact assessments, and risk-management standards because it connects their evidence to the sequence of public justifications required for a classification to affect rights, services, or scrutiny. It is not a replacement for those tools. It is an organising framework that asks whether the institution can defend each step from public purpose to post-deployment suspension.

Figure 1 illustrates this sequence as a lifecycle of public accountability. It shows that accountability does not arise at a single point, such as model validation or final human review, but depends on the continuity between public-purpose justification, data and label legitimacy, model and threshold choice, human review, audit evidence, citizen contestability, and post-deployment authority.

The chain should therefore be read from left to right as a set of cumulative accountability requirements. If one link is absent, the institution may be unable to justify, reconstruct, contest, or suspend the classifier in a legally and administratively defensible way.

Table 4. Public Classification Accountability Chain

Chain Element	Core Question	Minimum Evidence
Public purpose	What legitimate public function does classification serve?	Legal basis, policy rationale, necessity and proportionality assessment.
Data and label legitimacy	Do the data and label represent the concept being acted upon?	Data provenance, label rationale, bias assessment, missingness and representativeness review.
Model and threshold choice	Why this model and operating point?	Validation results, calibration, subgroup errors, threshold rationale, comparison with simpler alternatives.
Human review	Can officials challenge the output in practice?	Review procedure, training, override authority, escalation route, review-time evidence.
Audit trail	Can a decision be reconstructed?	Model version, input record, score, threshold, explanation, official action, override and appeal logs.
Citizen contestability	Can an affected person identify and challenge error?	Notice, reasons, data-correction route, human appeal, evidential preservation.
Post-deployment authority	Who can modify or suspend the system?	Monitoring reports, incident process, named owner, suspension and decommissioning criteria.

8.4. Impact Assessments, Audits, Procurement, and Traceability

Algorithmic impact assessments and, where personal-data risks require it, data protection impact assessments should be conducted before deployment and updated when purpose, data, model, threshold, vendor, or consequence changes [10]. They should identify affected groups, legal basis, data flows, foreseeable harms, mitigation measures, contestation routes, cybersecurity risks, and monitoring responsibilities. Independent audits add value when they have access to relevant evidence, authority to test claims, and a mandate broader than performance scoring.

Procurement is a central accountability mechanism. Contracts should require disclosure of intended use, training and validation evidence, limitations, security controls, documentation, audit access, model-change notification, incident reporting, portability of logs, and rights to suspend or terminate unsafe systems [27]. Vendor secrecy cannot override the public institution’s obligation to explain and defend administrative action.

Traceability links accountability to evidence. Each consequential classification should be reproducible in institutional records: the data used, model version, threshold, score, explanation, human action, override, notice, appeal, and correction. Without traceability, citizens cannot challenge decisions and auditors cannot determine whether the system operated lawfully.

Audits should move from broad governance assertions to testable controls. A document stating that a model is fair or that humans remain responsible is weak evidence unless it is supported by records showing how the claim was tested in practice. Effective audit work may include sampling individual decisions, reproducing scores from archived data, comparing subgroup error rates, reviewing threshold-change approvals, testing access logs, examining vendor change notices, interviewing frontline reviewers, and tracing whether appeals led to correction of data or model use. These procedures connect algorithmic governance with conventional information-systems control testing.

9. Practical Implications

The proposed framework has implications for multiple stakeholders. Public institutions should use the accountability chain to evaluate whether classifiers can be justified, explained, monitored, contested, and suspended when necessary. The framework encourages agencies to move beyond narrow performance measures and to assess whether each stage of the classification lifecycle is supported by evidence, documentation, and institutional authority.

For auditors, the framework provides a structure for assurance procedures that extend beyond model accuracy. Audit work should examine data provenance, label legitimacy, threshold governance, subgroup performance, override practices, citizen appeals, cybersecurity controls, and post-deployment monitoring. This supports a more complete assessment of whether the classifier is embedded in an accountable administrative process.

For procurement teams, the framework can inform contractual requirements for documentation, transparency, audit access, traceability, security controls, model-change notification, incident reporting, and suspension rights. These requirements are particularly important where public institutions rely on proprietary or vendor-operated AI systems.

For regulators and oversight bodies, the framework offers a way to evaluate whether high-risk public-sector AI systems satisfy requirements of legality, fairness, transparency, contestability, and continuing governance. Because the framework links technical choices to administrative consequences, it can support more structured supervisory review.

Finally, affected individuals benefit from the framework because it emphasises notice, explanation, reviewability, correction, and appeal. In this sense, the accountability chain connects technical governance with procedural fairness and public trust.

10. Domain-Specific Considerations

10.1. Healthcare

Healthcare classifiers may support diagnosis, triage, resource allocation, readmission prediction, and preventive care. Their benefits are strongest where they improve clinical attention or identify overlooked need. Their risks include biased training data, unrepresentative validation, unequal access embedded in labels, automation bias, and unsafe deployment across hospitals or populations different from the training context. The healthcare-cost proxy case shows why a technically predictive target may still be ethically misaligned with clinical need [12]. Health systems should therefore require clinical validation, equity analysis, monitoring across sites, and clear responsibility for final clinical judgement.

Healthcare also shows why false negatives and distribution shift require special attention. A model trained in one hospital may not generalise to another with different equipment, coding practices, patient demographics, or clinical pathways. Governance should require external validation where possible, clinician-facing explanations that disclose uncertainty, monitoring for alert fatigue, and procedures for reporting adverse events linked to classifier use. The final clinical decision should remain traceable to accountable clinical judgement rather than to a risk score alone.

10.2. Criminal Justice, Policing, and Public Security

Criminal-justice and policing classifiers can influence bail, sentencing support, supervision, patrol allocation, watchlists, and security alerts. These uses are especially sensitive because they may affect liberty, surveillance, and coercive state power. Historical policing data may reflect enforcement patterns rather than underlying offending; risk scores may be treated as stronger evidence than they are; and false positives may impose severe burdens. Public authorities should therefore require legal basis, subgroup error analysis, independent scrutiny, reason-giving, and strict limits on using predictions as substitutes for evidence [4].

A further risk is feedback. If predictive policing sends more patrols to areas already heavily policed, the system may generate more recorded incidents in those areas and treat the resulting data as confirmation of the original risk assessment. In criminal justice, risk scores may also influence discretionary decisions even when formally described as only one factor. These systems therefore require narrow use limitations, defence access to relevant information where liberty is affected, periodic independent evaluation, and a prohibition on treating group-level risk as individual proof.

10.3. Border Control and Migration

Border and migration classifiers may route applications, detect anomalies, or prioritise security review. Even where a tool does not formally decide an application, routing can affect delay, scrutiny, evidential burden, and perceived legitimacy. The UK visa streaming controversy illustrates the governance risk of administrative triage systems that affect scrutiny while remaining difficult to contest [18,25]. Migration systems should therefore provide clear purpose limitation, auditability, non-discrimination assessment, and review routes for persons affected by higher-scrutiny classification.

Migration contexts raise additional accountability problems because affected persons may have limited access to records, limited language support, and limited practical ability to challenge security-sensitive reasoning. Public authorities should separate legitimate confidentiality from excessive opacity: operational details may sometimes require protection, but the institution should still be able to provide meaningful reasons, preserve evidence, test nationality or proxy-based disparities, and permit independent oversight.

10.4. Welfare and Social Benefits

Welfare classifiers can help target support or identify possible irregularities, but they may also intensify surveillance of vulnerable groups, delay payments, or require claimants to disprove suspicion. Robodebt and SyRI show the importance of lawful basis, evidential quality, transparency, proportionality, and contestability in automated or data-driven welfare enforcement [23,24]. Welfare systems are especially sensitive because the affected person may depend on the benefit for subsistence and may lack resources to contest an automated suspicion.

Governance should therefore distinguish support-oriented classification from enforcement-oriented classification. A triage tool used to identify people needing additional assistance raises different risks from a fraud-risk tool that triggers investigation or suspension. The more punitive the consequence, the stronger the requirements for evidential quality, prior notice, human review, data correction, and independent appeal.

10.5. Education

Education classifiers may affect admissions, grade moderation, dropout-risk prediction, safeguarding referrals, scholarship allocation, or support prioritisation. The potential benefit is early identification of students needing help. The risk is that historical attainment, attendance, postcode, school resources, or family background can become proxies for disadvantage, turning structural inequality into prediction. Ofqual’s 2020 grading process showed how model-mediated administrative decisions can generate perceived unfairness and limited individual correction routes [21,22].

Education systems should therefore treat classification as a support tool unless a stronger justification exists. Where outputs affect grades, admissions, or disciplinary consequences, students should receive understandable reasons, access to relevant records, and a human appeal route capable of correcting individual unfairness. Aggregate consistency should not override the requirement to treat individual learners fairly.

10.6. Fraud Detection, Taxation, and Public Procurement

Fraud detection across tax, procurement, and benefits should be treated as investigation prioritisation unless independent evidence supports adverse action. An anomaly or risk score can identify a case for review, but it should not itself establish fraud, dishonesty, or liability. These systems often rely on cross-dataset matching, transaction histories, business relationships, addresses, or behavioural indicators, which can create privacy and proxy-discrimination risks.

Auditability is particularly important in fraud detection because public agencies may be reluctant to disclose detection rules. Some confidentiality is legitimate to prevent gaming, but secrecy should not prevent internal audit, judicial review, or the affected person’s ability to challenge factual errors. Agencies should separate the detection logic from the evidential basis for enforcement: the classifier may explain why a case was selected, but adverse action should rest on independently reviewable evidence.

11. Challenges and Limitations

Several challenges persist even when governance mechanisms are present. First, bias cannot be solved by technical debiasing alone because it may originate in social conditions, historical records, institutional incentives, and the definition of the target label. Second, explainability remains limited when complex systems, proprietary models, or high-dimensional data are used. Third, human oversight can be symbolic if officials lack time, information, authority, or incentives to challenge outputs. Fourth, responsibility is often fragmented among vendors, data scientists, policy teams, frontline officials, legal departments, auditors, and political leaders.

Regulatory and organisational capacity also matters. Some agencies lack staff who can evaluate model validity, fairness, cybersecurity, procurement terms, and audit logs. Others may have technical capacity but weak legal or procedural safeguards. Standards and frameworks help, but they do not by themselves create institutional competence. Capacity building, interdisciplinary governance, and independent scrutiny are therefore necessary conditions for responsible deployment.

This review has several limitations. It is a narrative rather than systematic review, draws on selected literature and public cases, and does not provide a quantitative synthesis of empirical evidence. The article also combines multiple jurisdictions and public-service domains despite substantial differences in legal frameworks, administrative traditions, institutional capacity, and accountability mechanisms.

Furthermore, the Public Classification Accountability Chain remains a conceptual framework that has not yet been empirically validated across different sectors or jurisdictions. Future research should therefore test the framework through comparative case studies, audit exercises, implementation assessments, and longitudinal analyses of deployed systems. Additional work is also needed to examine how public institutions choose thresholds, manage fairness trade-offs, audit vendor systems, and respond to harms identified through post-deployment monitoring.

12. Recommendations

Public institutions should adopt a lifecycle approach to AI classification in critical services. Before deployment, they should establish the legal basis and public purpose, define the target label, assess data provenance and representativeness, compare simpler and more complex models, justify thresholds, evaluate subgroup performance, and determine whether the intended intervention is necessary and proportionate. They should not deploy a classifier when the target is illegitimate, the data cannot support the intended inference, or the consequence is too severe to permit meaningful correction.

During procurement and development, agencies should require documentation of intended use, training data, validation results, model limitations, cybersecurity controls, explainability methods, logging requirements, change-control procedures, audit access, and data-protection responsibilities. Vendor systems should not be treated as black boxes beyond public scrutiny. Contracts should preserve the institution’s ability to inspect, monitor, explain, suspend, and terminate systems.

During deployment, agencies should ensure that human review is substantive, that affected persons receive meaningful notice and reasons, that appeal and data-correction routes are usable, and that outputs are not treated as conclusive evidence. Agencies should monitor drift, subgroup error patterns, override rates, appeal outcomes, complaints, incidents, and operational dependence on the system. Monitoring results should feed into periodic review, retraining, threshold adjustment, redesign, or suspension.

For audit practice, agencies should translate these requirements into testable procedures. Auditors should be able to select a sample of classifications and reconstruct the data inputs, model version, score, threshold, explanation, human action, notice given, and any subsequent appeal or correction. They should also test whether stated controls operate in practice by reviewing access logs, override records, incident tickets, model-change approvals, and vendor notifications. Where documentation is incomplete, the audit finding should not be treated as a minor recordkeeping issue; missing evidence may mean that the institution cannot demonstrate lawful and accountable use.

Table 5. Minimum Audit Questions for High-Stakes Public Classifiers

Audit Question	Evidence Required
What is the legal basis and public purpose?	Legal assessment, policy rationale, necessity and proportionality analysis.
What target label is being predicted?	Label definition, rationale, proxy-risk analysis, domain-expert review.
How were data selected and assessed?	Provenance records, data-quality tests, missingness analysis, representativeness review.
Why were the model and threshold chosen?	Validation results, calibration, subgroup error analysis, threshold report, comparison with alternatives.
Can humans meaningfully override the output?	Review procedures, training records, override logs, escalation policy.
Can a decision be reconstructed and contested?	Logs, model version, input record, score, explanation, notice, appeal route, correction records.
Who can suspend or redesign the system?	Named governance owner, monitoring reports, incident process, suspension criteria.

Finally, agencies should treat non-deployment as a legitimate accountability outcome. Some classification tasks are not made acceptable by better explanations or more detailed logs. Where the label is morally or legally misaligned, where errors are severe and irreversible, where affected persons cannot realistically contest the result, or where deployment would intensify unjustified surveillance, the accountable decision may be not to classify at all.

13. Conclusions

AI classification systems can support critical public services by improving analytical capacity, consistency, triage, and risk detection. These benefits matter, but they do not settle the governance question. When classification influences rights, services, liberty, healthcare, welfare, education, taxation, migration, or security, prediction becomes part of public administration.

The main conclusion of this review is that public-sector classifiers should be governed as sociotechnical accountability systems rather than as isolated predictive tools. Their legitimacy depends on data governance, fairness assessment, explainability, human oversight, contestability, cybersecurity, compliance, documentation, auditing, procurement design, and democratic control. Accuracy remains important, but it is only one component of legitimacy. A classifier can be statistically competent and still be unfair, opaque, insecure, disproportionate, or institutionally unaccountable.

The accountability-chain framework links public purpose, label legitimacy, model and threshold choice, human review, audit evidence, citizen contestability, and post-deployment authority. It shifts attention from whether a classifier predicts its target to whether the public institution can justify each step from purpose definition to citizen challenge and system suspension. Its practical value is diagnostic: when one link in the chain is missing, the institution can identify whether the problem is an illegitimate target, unsuitable data, an unjustified threshold, symbolic oversight, absent records, weak appeal rights, or lack of authority to stop the system.

Responsible deployment therefore requires context-specific validation, fairness testing, transparent documentation, meaningful review, appeal mechanisms, internal controls, cybersecurity safeguards, procurement accountability, and continuing monitoring. Public institutions remain responsible even when vendors build or host the system. Accountability cannot be outsourced to software or supplied after the fact by generic explanation tools. The accountable institution must be able to show not merely that a classifier works, but that the surrounding administrative system can explain, correct, audit, and discontinue its use.

Future research should empirically test audit frameworks for high-risk public-sector AI, evaluate appeal mechanisms, compare governance models across jurisdictions, and examine how agencies choose thresholds, handle fairness trade-offs, audit vendor systems, and respond when monitoring reveals unequal harm. The legitimacy of AI classification in critical public services ultimately depends less on prediction in the abstract than on whether public institutions can justify, explain, audit, contest, and suspend the use of prediction when it affects rights, services, or liberty.

Beyond its practical recommendations, the framework contributes to the growing literature on public-sector AI governance by providing an integrated conceptual model capable of linking technical performance with institutional accountability. In this sense, the accountability chain offers a bridge between AI governance theory and operational public-sector practice.

Author Contributions

Conceptualization, J.F., A.G. and M.M.M.; methodology, J.F. and A.G.; formal analysis, J.F., A.G. and M.M.M.; writing—original draft preparation, J.F. and A.G.; writing—review and editing, J.F., A.G. and M.M.M.; supervision, M.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT Press.
Busuioc, M. (2021). Accountable artificial intelligence: Holding algorithms to account. Public Administration Review, 81(5), 825–836.
Levy, K. E. C., Chasalow, K. E., & Riley, S. (2021). Algorithms and decision-making in the public sector. Annual Review of Law and Social Science, 17, 309–334.
Meijer, A., & Wessels, M. (2019). Predictive policing: Review of benefits and drawbacks. International Journal of Public Administration, 42(12), 1031–1039.
European Parliament and Council of the European Union. (2024). Regulation (EU) 2024/1689 of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689, 12 July 2024.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92.
Margetts, H., & Dorobantu, C. (2019). Rethink government with AI. Nature, 568(7751), 163–165.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), Article 115, 1–35.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229.
National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. NIST.
OECD. (2019). Recommendation of the Council on Artificial Intelligence. OECD/LEGAL/0449.
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453.
Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 33–44.
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1, 206–215.
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 59–68.
Wirtz, B. W., Weyerer, J. C., & Geyer, C. (2019). Artificial intelligence and the public sector: Applications and challenges. International Journal of Public Administration, 42(7), 596–615.
Council of Europe. (2024). Council of Europe Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law. Council of Europe.
Electronic Immigration Network. (2020). Home Office suspends use of digital streaming tool for visa applications after legal action by JCWI and Foxglove. Electronic Immigration Network, 9 August 2020.
International Organization for Standardization. (2023a). ISO/IEC 42001:2023 Information technology—Artificial intelligence—Management system. ISO.
International Organization for Standardization. (2023b). ISO/IEC 23894:2023 Information technology—Artificial intelligence—Guidance on risk management. ISO.
Office of Qualifications and Examinations Regulation. (2020a). Awarding GCSE, AS & A levels in summer 2020: Interim report. Ofqual.
Office of Qualifications and Examinations Regulation. (2020b). Summer 2020 results analysis: GCSE, AS and A level—Update to the interim report. Ofqual.
Rechtbank Den Haag. (2020). Judgment of 5 February 2020, ECLI:NL:RBDHA:2020:1878. De Rechtspraak.
Royal Commission into the Robodebt Scheme. (2023). Report of the Royal Commission into the Robodebt Scheme. Commonwealth of Australia.
UK Parliament. (2019). Visa processing algorithms. Hansard, HC Deb, 19 June 2019.
UNESCO. (2021). Recommendation on the Ethics of Artificial Intelligence. UNESCO.
World Economic Forum. (2020). AI Procurement in a Box: AI Government Procurement Guidelines. World Economic Forum.
World Health Organization. (2021). Ethics and governance of artificial intelligence for health: WHO guidance. World Health Organization.

Figure 1. Public Classification Accountability Chain for high-stakes public-sector AI classifiers.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.