Submitted:
22 May 2026
Posted:
25 May 2026
Abstract
Keywords:
1. Introduction
- It positions this survey against nine core prior surveys, clarifying where the literature is already mature and where important gaps remain.
- It uses centralized learning as a comparative baseline while maintaining federated learning as the primary scope of analysis.
- It introduces an Accountability-Integrated Taxonomy (AIT) that extends conventional poisoning categories with additional dimensions for observability, attribution granularity, required audit evidence, and trust assumptions.
- It synthesizes FL defenses—including aggregation-based, detection-based, and cryptographic mechanisms—through the lenses of auditability, traceability, and accountability, while offering a deeper critical discussion of untrusted servers, verifiable aggregation, and standards-relevant evidence.
2. Survey Methodology
2.1. Review Type, Search Strategy, and Eligibility Criteria
| Stage | Input | Inclusion logic | Exclusion logic | Output |
|---|---|---|---|---|
| Discovery search | Broad pool (≈120 works) | Papers addressing poisoning attacks, poisoning defenses, FL trust assumptions, secure aggregation, or accountability-related evidence | Adversarial-example papers focused solely on inference, purely application-specific case studies without transferable poisoning insight, and duplicates | Initial literature corpus |
| Survey positioning | 18 survey and survey-like works | Structured surveys with taxonomies, comparative discussion, or explicit treatment of poisoning in ML/FL | Only marginal treatment of FL, no comparative structure, or no meaningful treatment of training-time poisoning | 9 core surveys for direct positioning |
| Primary-study synthesis | Remaining relevant papers | Empirical, conceptual, or governance works needed to analyze attacks, defenses, trade-offs, and audit artifacts | Papers cited only tangentially or lacking analytical value for the paper’s research questions | Evidence base for later sections |
| Analytical output | Final manuscript corpus | Cross-survey positioning, scenario-based defense comparison, accountability indicators, and standards-oriented discussion | Claims of exhaustive or statistically pooled evidence synthesis | Scoping review with analytical survey contribution |
2.2. Positioning Relative to Existing Surveys
| Survey | Primary scope | Main strength | Gap relative to this work |
|---|---|---|---|
| Pitropakis et al. [9] | Broad adversarial ML taxonomy | Foundational attack taxonomy across ML settings | Limited FL-specific poisoning detail and no accountability- or audit-evidence perspective |
| Tian et al. [3] | Poisoning in ML | Strong overview of poisoning attacks and countermeasures in centralized ML | Limited treatment of FL-specific trust boundaries, server-side threats, and forensic readiness |
| Cinà et al. (2023) | Training-data poisoning in ML | Detailed synthesis of poisoning threat models and defenses | Emphasis remains on poisoning taxonomy itself rather than on FL accountability, untrusted aggregation, and evidence requirements |
| Sikandar et al. [11] | FL attacks and defenses broadly | Broad security overview of FL attack surfaces | Not poisoning-centric and offers limited critical discussion of accountability or auditability |
| Lianga et al. [12] | FL poisoning attacks and defenses | FL-oriented catalog of attack and defense families | Limited direct comparison with other surveys and limited emphasis on accountability under untrusted-server assumptions |
| Xia et al. [13] | FL poisoning survey | Concise FL-specific poisoning overview | More descriptive than critical, with limited discussion of evidence-bearing defenses and governance implications |
| Nowroozi et al. [22] | FL data poisoning | Recent FL-focused synthesis with updated attack examples | Narrower focus on data poisoning and a less developed treatment of accountability-oriented design |
| Li et al. [14] | FL life-cycle threats and defenses | Broad life-cycle perspective on FL security and privacy | Poisoning is one component among many; there is limited centralized-baseline comparison and no explicit accountability taxonomy |
| Zhou et al. [15] | FL data-poisoning defenses | Updated defense-focused discussion of FL poisoning | Limited attention to traceability, audit evidence, and malicious-server scenarios |
| This survey | FL poisoning with a centralized baseline | Cross-survey positioning, explicit scope clarification, AIT, and a deeper discussion of accountability and untrusted servers | Analytical rather than exhaustive; uses representative literature instead of claiming complete coverage |
3. Poisoning Attacks and Accountability Taxonomy
3.1. Baseline Poisoning Taxonomy
3.1.1. Taxonomy Overview
- Attacker knowledge: the degree of visibility into model internals or the training pipeline (white-box, gray-box, black-box) Papernot et al. [30].
- Manipulation target: the component the adversary modifies (training data, model parameters, or training dynamics).
- Attack intent: whether the goal is the untargeted degradation of overall performance or the targeted manipulation of specific outputs or behaviors.
| Access | Target | Intent | Example Mechanism | Representative Refs. |
|---|---|---|---|---|
| White-box | Data / Model | Untargeted / Targeted | Gradient manipulation, label flipping, and model replacement | Nowroozi et al. [22]; Sikandar et al. [11]; Biggio et al. [2]; Bagdasaryan et al. [31] |
| Gray-box | Data / Model | Untargeted / Targeted | Limited-knowledge poisoning and adaptive gradient updates (e.g., AgrEvader) | Mazzone et al. [32]; Zhang et al. [33] |
| Black-box | Data | Untargeted / Targeted | Query-based or transfer poisoning, shadow-model training | Sakhnovych [34]; Wang et al. [35] |
| White-box | Backdoor | Targeted | Trigger embedding and model-level implants | Lianga et al. [12]; Shafahi et al. [36]; Zhang et al. [37]; Shah et al. [38] |
3.1.2. Data Poisoning Attacks
- Clean-label poisoning: the attacker cannot modify labels, but instead crafts adversarial inputs that collide with target-class features Shafahi et al. [36]; Zhang et al. [39]. GAN-generated poisons can operate under non-IID conditions and without full knowledge of the victim model Xia et al. [13].
- Dirty-label poisoning: both inputs and labels are modified to implant malicious behavior or enable targeted misclassification Weng et al. [40]; Pan et al. [41]; Xu et al. [42].

Recent work shows that these categories remain active areas of research rather than closed historical cases. For example, recent defenses against data poisoning in neural networks, together with newer empirical evaluations of FL poisoning, continue to refine how clean-label, dirty-label, and label-flipping attacks are instantiated and measured in practice De Gaspari et al. [43]; Bena et al. [44]. A canonical example is label flipping: Static Label Flipping (SLF) directly alters labels (e.g., "1" to "7"), whereas Dynamic Label Flipping (DLF) targets feature-overlapping samples to increase stealth Zhang et al. [39]. These attacks highlight the risks posed by insufficient data provenance and the lack of audit trails.
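As a rough illustration of the static label-flipping case described above, the following Python sketch flips a fixed fraction of one class to another. The function name `static_label_flip`, its parameters, and the toy label list are assumptions made for this example and are not taken from the cited works.

```python
import random


def static_label_flip(labels, source, target, fraction, seed=0):
    """Flip a fraction of `source` labels to `target` (illustrative SLF sketch).

    Returns the poisoned label list and the indices that were changed.
    """
    rng = random.Random(seed)
    # Candidate positions are all samples that currently carry the source label.
    candidates = [i for i, y in enumerate(labels) if y == source]
    k = int(len(candidates) * fraction)
    flipped = set(rng.sample(candidates, k))
    poisoned = [target if i in flipped else y for i, y in enumerate(labels)]
    return poisoned, sorted(flipped)


# Toy labels (hypothetical): flip half of the "1" labels to "7".
labels = [1, 7, 1, 3, 1, 1, 7, 0]
poisoned, flipped_idx = static_label_flip(labels, source=1, target=7, fraction=0.5)
```

The list of flipped indices is exactly the kind of record that a provenance trail would need to retain; without it, the poisoned labels are indistinguishable from ordinary annotation noise.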
3.1.3. Model Poisoning Attacks
- Model replacement: adversaries scale updates to overwrite the global model in a single round Bagdasaryan et al. [31].
- Gradient ascent: adversaries send updates in the opposite optimization direction to amplify divergence Shejwalkar and Houmansadr [48]; Guerraoui et al. [49].
- Stealth poisoning: adversaries craft malicious updates that mimic benign statistics in order to evade anomaly detection Shejwalkar and Houmansadr [48].
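The model-replacement idea can be sketched in a few lines. The example below assumes a plain FedAvg-style rule in which the server adds the averaged client deltas, scaled by a server learning rate, to the global model, and it assumes the benign clients have converged so that their deltas are zero. The function names and toy vectors are hypothetical.

```python
def fedavg_round(global_model, local_models, server_lr):
    """One aggregation round: global += (server_lr / n) * sum(local - global)."""
    n = len(local_models)
    return [
        g + (server_lr / n) * sum(local[j] - g for local in local_models)
        for j, g in enumerate(global_model)
    ]


def replacement_update(global_model, attacker_target, n_clients, server_lr):
    """Scale the attacker's delta by n / server_lr so it survives averaging."""
    gamma = n_clients / server_lr
    return [g + gamma * (x - g) for g, x in zip(global_model, attacker_target)]


global_model = [0.5, -1.0, 2.0]
target = [3.0, 3.0, 3.0]  # model the attacker wants to install (toy values)
# Assumption: nine converged benign clients submit the unchanged global model.
benign = [list(global_model) for _ in range(9)]
malicious = replacement_update(global_model, target, n_clients=10, server_lr=1.0)
new_global = fedavg_round(global_model, benign + [malicious], server_lr=1.0)
```

Under these assumptions the new global model equals the attacker's target after a single round. The scaled update is also far larger than any benign update, which is why norm-based filters can catch the naive version and why stealthier variants constrain the update to look statistically benign.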
3.1.4. Backdoor Attacks
- Data level: imperceptible patches, pixel patterns, or text perturbations embedded into poisoned samples Chen et al. [50]; Rocha and Conti [51].
- Model level: direct manipulation of model weights or aggregation procedures to implant latent malicious behavior Zhang et al. [37]; Shah et al. [38].

Effective backdoors are both stealthy and semantically valid, making them particularly dangerous in FL, where decentralized updates hinder attribution and forensic reconstruction.
3.1.5. Summary
3.2. Poisoning Attacks in Centralized Learning
3.2.1. Observations & Auditability Differences
- Full data visibility enables high-impact, optimization-driven poisoning. In centralized settings, adversaries can directly inspect or manipulate the entire training dataset. This access enables precise bilevel optimization attacks Oprea et al. [53]; Jagielski et al. [54], adaptive gradient poisoning Srivastava et al. [24], and large-scale integrity attacks that are difficult to realize when only partial data visibility is available, as in FL Li et al. [14].
- The stealth-versus-impact trade-off is more pronounced. Recent clean-label and influence-based attacks Xia et al. [13]; Zhang et al. [37] achieve high stealth but smaller global impact, whereas aggressive label-flip and gradient-based poisons Gharib et al. [55]; Zhou et al. [15] cause substantial degradation but are easier to detect. Centralized settings allow adversaries to manage this trade-off more effectively because they retain full control over the data pipeline.
- High auditability and traceability via centralized logging. Centralized workflows benefit from reproducible pipelines, dataset versioning, and unified audit logs Muñoz-González et al. (2017); Tian et al. [3]. Forensic tools such as gradient fingerprinting and lineage tracing Sun et al. [52]; Gao et al. [28] allow investigators to reconstruct poisoning events. In contrast, FL lacks global visibility, and poisoned updates often blend with benign ones Bhagoji et al. [27]; Bagdasaryan et al. [45].
- Limited adversarial diversity compared with federated learning. Centralized systems involve a single training pipeline, making multi-party collusion and multi-round adaptive poisoning relatively rare. Recent federated attack studies Sikandar et al. [11]; Lianga et al. [12]; Nowroozi et al. [22] demonstrate far greater diversity, including client–server collusion, Byzantine coordination, and cross-round adaptive poisoning.
- Clearer provenance and accountability pathways. Centralized ML benefits from complete ownership over data collection, preprocessing, and training workflows. Accountability tools such as dataset fingerprinting, secure provenance tracking, and auditable pipelines Kroll [61]; Miguel and Chen [62] are easier to enforce. In FL, the absence of unified provenance and verifiable aggregation complicates the attribution of malicious behavior Zhang et al. [37]; Li et al. [14].

These observations highlight that centralized poisoning attacks remain highly damaging, yet comparatively easier to audit and attribute than their federated counterparts. This motivates a separate analysis of poisoning in federated learning, where untrusted participants, opaque aggregation, and cross-round interactions create a significantly broader and more complex threat surface.
3.3. Poisoning Attacks in Federated Learning
3.3.1. Client-Side Poisoning
3.3.2. Server-Side and Aggregation Attacks
3.3.3. Collusion and Multi-Round Poisoning
3.3.4. Why Federated Learning Is Harder to Audit
- Lack of global visibility. The server never sees client data, intermediate states, or, when secure aggregation is used, local gradients, which makes attribution extremely difficult Zhang et al. [37].
- Opaque aggregation. Robust aggregation techniques discard statistical information needed for forensics, while secure aggregation fully hides client updates Ma et al. [64].
- Untrusted participation. Cross-device FL allows open enrollment, enabling adversaries to create multiple fake clients and amplify their influence through Sybil-style coordination Fung et al. [66]; Cao et al. [63].
- No unified provenance trail. FL often lacks unified dataset lineage, versioning, and update histories, unlike centralized pipelines in which provenance is easier to trace Miguel and Chen [62].

These structural challenges make FL intrinsically harder to secure and audit and motivate the need for accountability-integrated taxonomies and forensic-ready FL architectures, as explored in later sections.
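To make the "opaque aggregation" point concrete, the sketch below shows additive pairwise masking, one common building block of secure aggregation. It is a simplification under stated assumptions: updates are already quantized to small non-negative integers, and a single seeded random generator stands in for the pairwise key agreement that a real protocol would use. All names are hypothetical.

```python
import random

MODULUS = 2**31 - 1  # arithmetic is done modulo a fixed public value


def masked_updates(updates, seed=0):
    """Add a shared random mask to client i and subtract it from client j, for every pair."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets (assumption)
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS
    return masked


def aggregate(masked):
    """The server only sums masked vectors; the pairwise masks cancel in the sum."""
    return [sum(column) % MODULUS for column in zip(*masked)]


updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # toy quantized client updates
masked = masked_updates(updates)
total = aggregate(masked)  # equals the element-wise sum of the raw updates
```

The server recovers the correct sum, yet each masked vector on its own carries no usable per-client statistics. This is the privacy benefit and, at the same time, the reason per-client anomaly detection and later attribution become difficult.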
3.4. Accountability-Integrated Taxonomy (AIT)
- Observability of the training process: whether the relevant evidence is fully visible, partially visible, or hidden by decentralization or privacy-preserving mechanisms.
- Attribution granularity: whether suspicious behavior can be attributed to a sample, a client, a communication round, the server, or only to the federation as a whole.
- Required audit evidence: which artifacts are needed to support verification, such as dataset provenance, training logs, validation traces, cryptographic commitments, signatures, or zero-knowledge proofs.
- Trust assumption and control point: whether the dominant threat lies in data contributors, model updaters, the aggregation server, or collusion across these roles.

Under AIT, an attack is therefore characterized not only by what is manipulated, but also by what can be seen, who can be blamed, which evidence is needed, and where trust is most brittle. These additional dimensions distinguish the accountability-oriented analysis in this paper from the conventional poisoning taxonomy summarized earlier.
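One way to read the four AIT dimensions is as a small record type attached to each attack. The Python sketch below encodes them as enumerations and a dataclass. The class names, the `missing_evidence` helper, and the example profile are illustrative assumptions and do not reproduce a classification given in this survey.

```python
from dataclasses import dataclass
from enum import Enum


class Observability(Enum):
    FULL = "fully visible"
    PARTIAL = "partially visible"
    HIDDEN = "hidden by decentralization or privacy mechanisms"


class Attribution(Enum):
    SAMPLE = "sample"
    CLIENT = "client"
    ROUND = "communication round"
    SERVER = "server"
    FEDERATION = "federation as a whole"


class ControlPoint(Enum):
    DATA_CONTRIBUTOR = "data contributors"
    MODEL_UPDATER = "model updaters"
    AGGREGATION_SERVER = "aggregation server"
    COLLUSION = "collusion across roles"


@dataclass(frozen=True)
class AITProfile:
    attack: str
    observability: Observability
    attribution: Attribution
    audit_evidence: frozenset  # artifacts needed to verify or attribute the attack
    control_point: ControlPoint

    def missing_evidence(self, available):
        """Return the required artifacts that a deployment does not yet collect."""
        return sorted(self.audit_evidence - set(available))


# Hypothetical profile, for illustration only.
profile = AITProfile(
    attack="backdoor via model update under secure aggregation",
    observability=Observability.HIDDEN,
    attribution=Attribution.FEDERATION,
    audit_evidence=frozenset({"cryptographic commitments", "signatures", "training logs"}),
    control_point=ControlPoint.MODEL_UPDATER,
)
gaps = profile.missing_evidence(["training logs"])
```

Encoding the taxonomy this way makes the evidence gap for a given deployment something that can be computed and compared, rather than only described.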
3.4.1. Operational Accountability Indicators
4. Defenses and Countermeasures in Federated Learning
4.1. Category-Based Defenses in Federated Learning
4.1.1. Anomaly and Statistical Detection
4.1.2. Performance-Based Filtering
4.1.3. Byzantine-Robust Aggregation
4.1.4. Cryptographic and Encryption-Based Defenses
4.1.5. Accountability-Oriented Frameworks
4.2. Structured Trade-off Analysis
4.3. Persistent Gaps
4.3.1. Poisoning Attack Challenges
- Adaptive and Byzantine attackers: most aggregation-based defenses assume static attackers with limited capabilities and that malicious clients constitute only a small minority. Recent studies challenge these assumptions by highlighting adaptive poisoning attacks that dynamically adjust their behavior to mimic benign updates or modulate gradient magnitudes to evade statistical filters Chen et al. [101]; Mohamed et al. [102]. This adaptiveness renders fixed-threshold and static rule-based defenses insufficient.
- Persistent backdoor injection: backdoor attacks remain one of the most enduring threats in federated learning. Malicious clients can embed stealthy triggers across training rounds with minimal impact on global accuracy, allowing persistent targeted manipulation Masunda and Ajayi [103]. These triggers are difficult to detect, especially under secure aggregation or encrypted updates.
- Data heterogeneity exploitation: highly heterogeneous (non-IID) client data distributions enable adversaries to craft model updates that appear statistically consistent with legitimate local training Tallam [104]; Zhukabayeva et al. [105]. This undermines anomaly-detection techniques that assume homogeneous or centrally accessible data, leading to high false-positive rates or undetected attacks.
- Limited global observability: privacy constraints restrict the central server from accessing client data or intermediate training signals, thereby weakening validation- and verification-based defenses Chen et al. [101]. This creates a structural detection blind spot, particularly for data-centric poisoning attacks executed during the training phase.
- Untrusted server and aggregation manipulation: most poisoning defenses implicitly assume an honest server, yet recent analyses show that a compromised or malicious aggregator can bias model updates, alter aggregation weights, or fork global models without client visibility Nowroozi et al. [98]; Lianga et al. [99]. Because clients cannot inspect server-side operations, such manipulations remain invisible, creating a critical accountability gap in FL.
- Secure aggregation as a double-edged sword: while secure aggregation preserves client privacy, it also prevents the server from inspecting individual gradients or update statistics, eliminating many anomaly-based defenses and limiting post-incident forensics Ma et al. [64]; Sikandar et al. [11]. This creates structural blind spots in which data poisoning, backdoors, and collusive behaviors can operate undetected.
- Lack of standardized forensic and robustness benchmarks: FL poisoning research relies heavily on small image datasets (e.g., MNIST and CIFAR-10) and lacks standardized evaluation protocols that capture realistic heterogeneity, collusion, or accountability requirements Tian et al. [3]; Lianga et al. [12]. Without benchmarks that incorporate provenance, traceability, and verifiable auditing, comparing defenses or assessing forensic readiness remains difficult.
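The minority assumption in the first item can be illustrated with coordinate-wise median aggregation, used here as one representative Byzantine-robust rule chosen for this example rather than a method singled out by the survey. The client vectors are toy values.

```python
from statistics import mean, median


def coordinate_mean(updates):
    """Plain averaging: a single extreme update can shift every coordinate."""
    return [mean(column) for column in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median: robust only while attackers are a minority."""
    return [median(column) for column in zip(*updates)]


benign = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [1.0, 1.0]]
attack = [100.0, -100.0]

# One attacker among five clients: the median ignores the outlier.
minority = benign + [attack]
# Three attackers among five clients: the median follows the attackers.
majority = benign[:2] + [attack, attack, attack]

mean_minority = coordinate_mean(minority)
median_minority = coordinate_median(minority)
median_majority = coordinate_median(majority)
```

With one attacker the median stays at the benign value while the mean is pulled far away; once attackers form a majority, the median offers no protection. Adaptive attackers who keep their updates inside the benign range are not addressed by this rule at all, which is the gap the items above describe.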
4.3.2. Anomaly Detection and Traceability
- Enabling traceability: mechanisms such as gradient fingerprinting, temporal consistency analysis, and inter-round similarity metrics help identify poisoned updates and attribute them to specific clients or rounds Nguyen [106]; Tallam [104]. These capabilities provide a foundation for forensic readiness and auditability in FL.
- Enhancing explainability: modern anomaly-detection frameworks increasingly incorporate attention mechanisms or other explainable neural components, offering interpretable insights into why updates are flagged as malicious Hamouda [107]; Shaik et al. [108]. Such transparency is essential for trust, human oversight, and alignment with ethical AI principles.
- Forensic analysis in federated learning: emerging research integrates anomaly detection directly into forensic-analysis pipelines, enabling the reconstruction of poisoning behavior and supporting proactive security strategies Mohamed et al. [102]. This shift from reactive detection to proactive attribution is critical to next-generation accountable FL systems.
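A minimal sketch of an inter-round similarity check is given below. It assumes the server can observe per-client updates (so it does not apply under secure aggregation) and uses cosine similarity between a client's consecutive updates as the consistency signal. The threshold, function names, and history values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_inconsistent(history, threshold=0.0):
    """Return (client, round_index, similarity) where consecutive updates diverge."""
    flags = []
    for client, updates in history.items():
        for r in range(1, len(updates)):
            sim = cosine(updates[r - 1], updates[r])
            if sim < threshold:
                flags.append((client, r, round(sim, 3)))
    return flags


# Toy per-client update history over three rounds (round indices 0, 1, 2).
history = {
    "client_a": [[1.0, 0.5], [0.9, 0.6], [1.0, 0.4]],
    "client_b": [[1.0, 0.5], [-1.0, -0.5], [0.8, 0.5]],  # direction reversal at round 1
}
flags = flag_inconsistent(history)
```

Each flag names a client and a round, which is the attribution granularity the traceability item refers to. In this toy run the round after the reversal is flagged as well, because it is compared against the anomalous update; a practical detector would need to handle that, along with benign drift under non-IID data.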
4.4. Opportunities for Unified Frameworks
- Privacy-preserving observability: systems need minimal but verifiable evidence channels so that poisoning-relevant events can be inspected without exposing raw client data.
- Evidence-carrying aggregation: aggregation should produce auditable artifacts—for example commitments, signatures, or proofs—rather than only a model output.
- Attribution across trust boundaries: the design must distinguish client misconduct, server misconduct, and collusion, since each requires different evidence and response mechanisms.
- Standards-mappable assurance: technical artifacts should be expressible as governance evidence that can support internal audits, external assurance, or compliance review.

The interaction between these requirements is illustrated in Figure 2. The three layers should be read as a design decomposition rather than a claim of implementation completeness. At the privacy-preserving layer, techniques such as differential privacy, homomorphic encryption, and secure multiparty computation protect client updates from inference or leakage. The robustness layer incorporates Byzantine-resilient aggregation, anomaly detection, and validation-based filtering to preserve model integrity against poisoning and backdoor attacks [113,114,115]. Above these, the accountability layer introduces tamper-evident audit trails using cryptographic commitments, hash-chained logs, identity-binding signatures, or zero-knowledge proofs to enable verifiable attribution and post-hoc forensic reconstruction [116,117,118].

This layered view also helps deepen the paper’s brief mapping to ISO/IEC 42001. At the governance level, FL systems need explicit role definitions, risk ownership, and incident-handling rules. At the operational level, they need evidence about sampling, aggregation, validation, and model release decisions. At the assurance level, they need artifacts that an independent auditor could inspect without re-running the entire training process. In other words, standards alignment is meaningful only if technical defenses emit reviewable evidence, not merely if they improve accuracy under attack.

Future work should therefore formalize composable trust objectives that jointly quantify privacy budgets, robustness guarantees, and auditing costs under unified optimization criteria.
Promising directions include: (1) developing quantitative metrics that capture privacy–robustness–accountability interactions; (2) constructing benchmark suites for untrusted-server and multi-round poisoning scenarios; (3) designing hybrid cryptographic–forensic frameworks that combine zero-knowledge verification with blockchain-secured aggregation; and (4) evaluating which evidence artifacts are actually useful to human auditors, regulators, and incident responders. Such integrated approaches would enable the emergence of trustworthy federated intelligence, a next-generation paradigm where security, transparency, and regulatory compliance coexist more coherently.
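The hash-chained logs mentioned for the accountability layer can be sketched with the Python standard library. The record fields used here (`round`, `sampled_clients`, `model_digest`) are hypothetical placeholders for whatever evidence a deployment chooses to emit, and the sketch leaves out signatures and external anchoring of the latest hash, both of which a real system would need.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev_hash, record):
    """Hash the previous entry's hash together with a canonical JSON record."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log, record):
    """Append a record whose hash commits to the entire preceding log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks verification."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append_record(log, {"round": 1, "sampled_clients": ["c1", "c2"], "model_digest": "ab12"})
append_record(log, {"round": 2, "sampled_clients": ["c2", "c3"], "model_digest": "cd34"})
ok_before = verify(log)

# Simulate after-the-fact tampering with the round-1 sampling record.
log[0]["record"]["sampled_clients"].append("c9")
ok_after = verify(log)
```

Verification succeeds on the untouched log and fails after the edit. An auditor can run this check without re-running training, which matches the assurance-level requirement described above.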
5. Implications for Future Research and Practice
- Developing auditable benchmark suites that model adversarial clients, collusion, and untrusted-server behavior under realistic deployment constraints.
- Embedding forensic auditability and explainability into model evaluation pipelines to support reproducibility in high-stakes or regulated settings.
- Creating open datasets containing poisoning traces, backdoor triggers, and unlearning metadata to accelerate the validation of accountable and verifiable defenses.
- Integrate tamper-evident audit logs and forensic accountability mechanisms to support transparent monitoring across the model lifecycle.
- Map technical safeguards to governance standards such as ISO/IEC 42001:2023 and the NIST AI Risk Management Framework, enabling lifecycle traceability and verifiable compliance.
- Adopt federated explainable AI (fXAI) components, ensuring interpretable reasoning pathways for regulators, auditors, and end-users.
Future Directions.
- Federated forensics by design: integrating neuron- and feature-level provenance with cryptographic round receipts to enable explainable reconstruction of poisoning and backdoor behavior.
- Privacy-preserving auditability: using zero-knowledge–attested robust aggregation and verifiable client sampling to prevent selective aggregation, tampering, and forgery while maintaining confidentiality.
- Standards alignment: mapping FL artifacts—hashes, provenance metadata, policies, and KPIs—to ISO/IEC 42001 controls, producing machine-verifiable risk and compliance evidence that regulators can query without accessing sensitive data.
- ZK-proof integration: improving the efficiency and deployment realism of zero-knowledge proofs in aggregation workflows so that update correctness can be certified without exposing sensitive data.
- Explainable accountability: evaluating whether interpretable anomaly indicators and provenance signals can serve as reliable, audit-ready evidence in regulated environments.
- Lifecycle governance: operationalizing ISO/IEC 42001 and similar standards through machine-verifiable artifacts that document risk management, training policies, and compliance events.
- Forensic optimization: identifying computationally efficient strategies for tamper-evident logging, dynamic audit triggering, and cryptographic verification that preserve model convergence.
- Human–AI oversight: developing visualization and review tools to support investigators, auditors, and domain experts in understanding attack patterns and audit trails in cross-silo deployments.

Advancing these directions will require collaboration across machine learning, cybersecurity, distributed systems, and regulatory domains. Establishing reproducible forensic benchmarks and standard metrics for trust will accelerate progress toward FL systems that are not only secure and privacy-preserving, but also transparent, verifiable, and aligned with global Responsible AI requirements.
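Verifiable client sampling, listed among the directions above, can be approximated by deriving each round's selection from public inputs so that any participant can recompute it. The sketch below ranks clients by a SHA-256 hash of a public seed, the round number, and the client identifier. The seed string and client names are hypothetical, and a production design would more plausibly rely on a verifiable random function or a public randomness beacon.

```python
import hashlib


def sample_clients(client_ids, public_seed, round_no, k):
    """Deterministically pick k clients by ranking their per-round hash scores."""
    def score(client_id):
        message = f"{public_seed}|{round_no}|{client_id}".encode("utf-8")
        return hashlib.sha256(message).hexdigest()
    return sorted(client_ids, key=score)[:k]


def verify_sample(claimed, client_ids, public_seed, round_no):
    """Any client can recompute the selection and compare it with the server's claim."""
    return claimed == sample_clients(client_ids, public_seed, round_no, len(claimed))


clients = [f"client_{i}" for i in range(10)]
chosen = sample_clients(clients, public_seed="beacon-2026", round_no=7, k=3)
honest = verify_sample(chosen, clients, "beacon-2026", 7)

# A server that swaps in a preferred client produces a claim that fails the check.
outsider = next(c for c in clients if c not in chosen)
rigged = verify_sample(chosen[:2] + [outsider], clients, "beacon-2026", 7)
```

Because the selection depends only on public values, a server that selectively includes or excludes clients can be detected by anyone holding the client list, without access to client data.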
Author Contributions
Conflicts of Interest
References
- Jobin, A.; Ienca, M.; Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell. 2019, 1, 389–399.
- Biggio, B.; Nelson, B.; Laskov, P. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
- Tian, Z.; Cui, L.; Liang, J.; Yu, S. A comprehensive survey on poisoning attacks and countermeasures in machine learning. ACM Comput. Surv. 2022, 55, 1–35.
- Mittelstadt, B. D.; Allo, P.; Taddeo, M.; Wachter, S.; Floridi, L. The ethics of algorithms: Mapping the debate. Big Data Soc. 2016, 3, 2053951716679679.
- Novelli, C.; Taddeo, M.; Floridi, L. Accountability in artificial intelligence: what it is and how it works. AI Soc. 2024, 39, 1871–1882.
- Miguel, B. S.; Naseer, A.; Inakoshi, H. Putting accountability of AI systems into practice. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2021a; pp. 5276–5278.
- Cen, S.; Alur, R. From transparency to accountability and back: A discussion of access and evidence in AI auditing. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024a.
- Kroll, J. A. Outlining traceability: A principle for operationalizing accountability in computing systems. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (ACM); 2021b; pp. 21–31.
- Pitropakis, N.; Panaousis, E.; Giannetsos, T.; Anastasiadis, E.; Loukas, G. A taxonomy and survey of attacks against machine learning. Comput. Sci. Rev. 2019, 34, 100199.
- Cinà, A. E.; Demontis, A.; Biggio, B.; Pelillo, M.; Roli, F. Wild patterns reloaded: A survey of machine learning security against training data poisoning. ACM Comput. Surv. 2023, 55, 1–39.
- Sikandar, H.; Waheed, H.; Tahir, S.; Malik, S. A detailed survey on federated learning attacks and defenses. Electronics 2023, 12, 260.
- Lianga, J.; Wang, R.; Feng, C.; Chang, C.-C. A survey on federated learning poisoning attacks and defenses. arXiv 2023a, arXiv:2306.03397.
- Xia, G.; Chen, J.; Yu, C.; Ma, J. Poisoning attacks in federated learning: A survey. IEEE Access 2023, 11, 12345–1236.
- Li, J.; et al. Threats and defenses in the federated learning life cycle: A comprehensive survey and challenges. Front. AI 2025.
- Zhou, Y.; et al. Defending against data poisoning attacks in federated learning: A survey. ACM Comput. Surv. 2025.
- Zeng, H.; et al. A federated learning framework with blockchain-based auditable participant selection. J. Inf. Secur. Appl. Press 2024a.
- Wang, Z.; et al. zkfl: Zero-knowledge proof-based gradient aggregation for federated learning. arXiv 2023, arXiv:2310.02554.
- Zhu, Y.; et al. Secure and verifiable data collaboration with low-cost zero-knowledge proofs. PVLDB 2024, 17, 2321–2334.
- He, X.; et al. Enabling privacy-preserving and publicly auditable federated learning. arXiv 2024, arXiv:2405.04029.
- Chen, J.; et al. Privacy-preserving and traceable federated learning for industrial IoT. Expert Syst. With Appl. 2023, i.
- Gu, M.; et al. Enhancing data provenance and model transparency in federated learning. arXiv 2024, arXiv:2403.01451.
- Nowroozi, E.; Haider, I.; Taheri, R. Federated learning under attack: Exposing vulnerabilities through data poisoning attacks in computer networks. IEEE Transactions on Information Forensics and Security, 2025a.
- Zhang, Y.; Jia, R.; Pei, H.; Wang, W.; Li, B.; Song, D. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020b; pp. 253–261.
- Srivastava, M.; Kaushik, A.; Loughran, R.; McDaid, K. Data poisoning attacks in the training phase of machine learning models: A review. 2024.
- Biggio, B.; Nelson, B.; Laskov, P. Support vector machines under adversarial label noise. Asian conference on machine learning (PMLR), 2011; pp. 97–112.
- Bonawitz, K.; Kairouz, P.; McMahan, B.; Ramage, D. Federated learning and privacy: Building privacy-preserving systems for machine learning and data science on decentralized data. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021.
- Bhagoji, A. N.; Chakraborty, S.; Mittal, P.; Calo, S. Analyzing federated learning through an adversarial lens. International conference on machine learning (PMLR), 2019a; pp. 634–643.
- Gao, Y.; Doan, B. G.; Zhang, Z.; Ma, S.; Zhang, J.; Fu, A.; et al. Backdoor attacks and countermeasures on deep learning: A comprehensive review. arXiv 2020, arXiv:2007.10760.
- Li, Y.; Jiang, Y.; Li, Z.; Xia, S.-T. Backdoor learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 5–22.
- Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z. B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, 2017; pp. 506–519.
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. arXiv 2020a, arXiv:1807.00459.
- Mazzone, F.; Badawi, A. A.; Polyakov, Y.; Everts, M. Investigating privacy attacks in the gray-box setting to enhance collaborative learning schemes. 2024.
- Zhang, Y.; Bai, G.; Chamikara, M.; Ma, M. Agrevader: Poisoning membership inference against byzantine-robust federated learning. In Proceedings of the ACM Asia Conference on Computer and Communications Security, 2023c. [Google Scholar] [CrossRef]
- Sakhnovych, Y. Black-box Model Watermarking in Federated Learning. Ph. D. thesis, TU Wien, 2024. [Google Scholar]
- Wang, Z.; Ma, J.; Wang, X.; Hu, J.; Qin, Z.; Ren, K. Threats to training: A survey of poisoning attacks and defenses on machine learning systems. ACM Comput. Surv. 2022. [Google Scholar] [CrossRef]
- Shafahi, A.; Huang, W. R.; Najibi, M.; Suciu, O.; Studer, C.; Dumitras, T.; et al. Poison frogs! targeted clean-label poisoning attacks on neural networks. Adv. Neural Inf. Process. Syst. 2018. [Google Scholar]
- Zhang, R.; Hussain, S.; Chen, H.; Javaheripi, M. Systemization of knowledge: robust deep learning using hardware-software co-design in centralized and federated settings. ACM Comput. Surv. 2023b. [Google Scholar] [CrossRef]
- Shah, A.; Ahmad, A.; Ali, B.; Anwer, S. Guarding the gates: A comprehensive survey of backdoor attacks on neural networks. 2025. [Google Scholar]
- Zhang, J.; Chen, B.; Cheng, X.; Binh, H. T. T.; Yu, S. Poisongan: Generative poisoning attacks against federated learning in edge computing systems. IEEE Internet Things J. 2020a, 8, 1126 3310–3322. [Google Scholar] [CrossRef]
- Weng, C.-H.; Lee, Y.-T.; Wu, S.-H. On the trade-off between adversarial and backdoor robustness. Adv. Neural Inf. Process. Syst. 2020, 33((NeurIPS)1101). [Google Scholar]
- Pan, M.; Zeng, Y.; Lyu, L.; Lin, X.; Jia, R. Asset: Robust backdoor data detection across a multiplicity of deep learning paradigms. In Proceedings of the 32nd USENIX Security Symposium, 2023. [Google Scholar]
- Xu, C.; Liu, W.; Zheng, Y.; Wang, S.; Chang, C.-H. An imperceptible data augmentation based blackbox clean-label backdoor attack on deep neural networks. In IEEE Transactions on Circuits and Systems I: Regular Papers; 2024. [Google Scholar] [CrossRef]
- De Gaspari, F.; Hitaj, D.; Mancini, L. V. Have you poisoned my data? defending neural networks against data poisoning. In Computer Security – ESORICS 2024; Springer, 2024; pp. 85–104.
- Bena, N.; Anisetti, M.; Damiani, E.; Yeun, C. Y.; Ardagna, C. A. Protecting machine learning from poisoning attacks: A risk-based approach. Comput. Secur. 2025, 155, 104468.
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. AISTATS, 2020b.
- Xie, C.; Huang, K.; Chen, P.-Y.; Li, B. Dba: Distributed backdoor attacks against federated learning. International Conference on Learning Representations (ICLR), 2019.
- Wan, Y.; Qu, Y.; Ni, W.; Xiang, Y.; Gao, L. Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanisms: A comprehensive survey. IEEE Communications Surveys & Tutorials, 2024.
- Shejwalkar, V.; Houmansadr, A. Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning. USENIX Security Symposium, 2021a.
- El Mhamdi, E. M.; Guerraoui, R.; Rouault, S. The hidden vulnerability of distributed learning in byzantium. In Proceedings of the 35th International Conference on Machine Learning (ICML) (PMLR), 2018; pp. 3521–3530.
- Chen, L.; Liu, X.; Wang, A.; Zhai, W.; Cheng, X. Flsad: Defending backdoor attacks in federated learning via self-attention distillation. Symmetry 2024a, 16, 1497.
- Rocha, A.; Conti, M. Weidetect: Weibull distribution-based defense against poisoning attacks in federated learning for network intrusion detection systems. arXiv 2025, arXiv:2504.04367.
- Sun, Z.; Liu, C.; Yang, Q.; Qi, Y. Data poisoning attacks on federated machine learning. ACM Trans. Knowl. Discov. From Data (TKDD) 2021, 15, 1–2510.
- Oprea, A.; Li, X.; Ma, Y.; Rigazzi, G.; Bridges, R. A.; Marchal, S. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. Expert Syst. With Appl. 2022, 204, 1175411.
- Jagielski, M.; Oprea, A.; Biggio, B.; Liu, C.; Nita-Rotaru, C.; Li, B. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. IEEE Symposium on Security and Privacy, 2018.
- Gharib, A.; Abawajy, J.; Tari, Z. Poisoning attacks and defenses in machine learning: A survey. IEEE Access, 2022.
- Paudice, A.; Muñoz-González, L.; Lupu, E. C.; et al. Label sanitization against label flipping poisoning attacks. IEEE Access 2018, 6, 5423–5431.
- Gu, T.; Dolan-Gavitt, B.; Garg, S. Badnets: Identifying vulnerabilities in the machine learning model supply chain. In Proceedings of the IEEE, 2017.
- Kiss, I.; Gulyás, G. G.; Imre, S. Adversarial machine learning in malware detection: Arms race between evasion attack and defense. 2017 IEEE International Conference on Future IoT Technologies, 2017.
- Koh, P. W.; Liang, P. Understanding black-box predictions via influence functions. International Conference on Machine Learning, 2017.
- Ma, Y.; Li, X.; Rigazzi, G.; Oprea, A.; Marchal, S. Systematic poisoning attacks on and defenses for machine learning in healthcare. arXiv 2022a, arXiv:2206.12345.
- Kroll, J. A. Accountability in machine learning: Governance, auditability, and responsibility. Communications of the ACM, 2021a.
- Miguel, J.; Chen, T. Machine learning provenance for accountability. USENIX Symposium on Operating Systems Design and Implementation, 2021.
- Cao, D.; Chang, S.; Lin, Z.; Liu, G. Understanding distributed poisoning attack in federated learning. IEEE Conference on Communications and Network Security (CNS), 2019.
- Ma, Z.; Ma, J.; Miao, Y.; Li, Y. Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning. IEEE Transactions on Information Forensics and Security, 2022b.
- ElZemity, A.; Arief, B. Privacy threats and countermeasures in federated learning for internet of things: A systematic review. 2024 IEEE Conference on Communications and Network Security (CNS), 2024.
- Fung, C.; Yoon, C. J. M.; Beschastnikh, I. Mitigating sybils in federated learning poisoning. arXiv 2018b, arXiv:1808.04866.
- Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018; pp. 5650–5659.
- Cao, X.; et al. Foolsgold++: Detecting sybil attacks with enhanced gradient similarity. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
- Blanchard, P.; El Mhamdi, E. M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems (NeurIPS); 2017.
- Cao, X.; Fang, M.; Liu, J.; Gong, N. Z. FLTrust: Byzantine-robust federated learning via trust bootstrapping. arXiv 2020, arXiv:2012.13995.
- Tran, B.; Li, J.; Madry, A. Spectral signatures in backdoor attacks. NeurIPS 2018.
- Wang, B.; et al. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. IEEE S&P, 2019.
- Gao, Y.; et al. Strip: A defence against trojan attacks on deep neural networks. ACSAC, 2019.
- Zhang, Q.; Liu, F.; Wang, C. Bvdfed: Byzantine- and verifiability-resilient federated learning framework. Pattern Recognit. Lett. 2023a, 176, 44–53.
- Li, W.; Zhao, P.; Ahmed, S. Cafcor: Covariance-bound aggregation with secret-based local differential privacy for federated learning. IEEE Trans. Neural Netw. Learn. Syst. 2024a.
- Ning, Z.; Li, C.; Xu, Y. Blockchain-enabled accountable federated learning for edge ai systems. IEEE Internet Things J. 2024b, 11, 10325–10337.
- Liu, Z.; Wang, Y.; Tang, H. Verifiable and accountable federated learning via permissioned blockchain. Future Gener. Comput. Syst. 2024b, 155, 521–535.
- Tounsi, A.; Salem, O.; Mehaoua, A. Anomaly detection in federated learning: A comprehensive study on data poisoning and energy consumption patterns in iot devices. IEEE Internet of Things Journal, 2024.
- Li, D.; Wong, W. E.; Wang, W.; Yao, Y.; Chan, M. C. Detection and mitigation of label-flipping attacks in federated learning systems with kpca and k-means. 8th International Conference on Dependable Systems and Their Applications (DSA); IEEE, 2021; pp. 551–559.
- Chen, X.; Zan, D.; Li, W.; Guan, B. A gan-based data poisoning framework against anomaly detection in vertical federated learning. IEEE Transactions on Neural Networks and Learning Systems, 2024b.
- Alsulaimawi, Z. Federated learning with anomaly detection via gradient and reconstruction analysis. arXiv 2024.
- Khraisat, A.; Alazab, A.; Alazab, M.; Jan, T.; Singh, S. Securing federated learning: a defense strategy against targeted data poisoning attack. Cogn. Comput. 2025.
- Gambs, S.; Zhao, L.; Patel, D. Client-side validation voting in federated learning. ACM Transactions on Privacy and Security, 2021.
- Zhang, X.; Kim, M.; Kumar, R. Flcert: Federated certification via client grouping and voting. arXiv preprint arXiv:2201.XXXXX, 2022.
- Hakeem, S.; Kim, H. Advancing intrusion detection in v2x networks: A comprehensive survey on machine learning, federated learning, and edge ai for v2x security. IEEE Access, 2025.
- Li, S.; Ngai, E. C. H.; Voigt, T. An experimental study of byzantine-robust aggregation schemes in federated learning. IEEE Transactions on Dependable and Secure Computing, 2023.
- Liu, X.; Li, H.; Xu, G.; Chen, Z.; Huang, X.; Lu, R. Privacy-enhanced federated learning against poisoning adversaries. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4574–4588.
- Ashwinee, K.; Natarajan, V. Efficient gradient clipping in federated learning defense. J. Priv. Confidentiality 2021.
- Rathee, M.; Shen, C.; Wagh, S.; Popa, R. A. Elsa: Secure aggregation for federated learning with malicious actors. 2023 IEEE Symposium on Security and Privacy (SP), 2023.
- Ma, Y.; Woods, J.; Angel, S.; Polychroniadou, A.; Rabin, T. Flamingo: Multi-round single-server secure aggregation with applications to private federated learning. 2023 IEEE Symposium on Security and Privacy (SP), 2023.
- Lycklama, H.; Burkhalter, L.; Viand, A.; Küchler, N.; Hithnawi, A. Rofl: Robustness of secure federated learning. 2023 IEEE Symposium on Security and Privacy (SP), 2023.
- Jiang, Y.; Zarezadeh, M.; Dai, T.; Köpsell, S. Alphafl: Secure aggregation with malicious security for federated learning against dishonest majority. Proc. Priv. Enhancing Technol. 2025, 348–368.
- Ma, X.; Cheng, K.; Shen, Y.; Li, X.; Chang, Z.; Zhang, T.; et al. Trusted model aggregation with zero-knowledge proofs in federated learning. IEEE Trans. Parallel Distrib. Syst. 2024b, 35, 2284–2296.
- Xu, B.; et al. Efficient verifiable secure aggregation protocols for federated learning. J. Inf. Secur. Appl. 2025, 80, 104161.
- Bouamama, S.; et al. Vesafl: Verifiable secure aggregation for privacy-preserving federated learning. IEEE Transactions on Dependable and Secure Computing, 2025.
- Bottoni, P.; et al. Verifiability and privacy in federated learning through distributed ledger technologies and randomized response techniques. In Proceedings of the 11th International Conference on Big Data Computing Applications and Technologies (BDCAT ’25), 2025.
- Bhagoji, A. N.; Chakraborty, S.; Mittal, P.; Calo, S. Analyzing federated learning through an adversarial lens. International Conference on Machine Learning (ICML), 2019b.
- Nowroozi, E.; et al. Federated learning under attack: A systematic review of poisoning threats and defenses. Front. Artif. Intell. 2025b.
- Lianga, Z.; et al. A survey on federated learning poisoning attacks and defenses. IEEE Transactions on Big Data, 2023b.
- Shejwalkar, V.; Houmansadr, A. Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning. IEEE Symposium on Security and Privacy Workshops (SPW), 2021b.
- Chen, C.; Liu, J.; Tan, H.; Li, X.; Wang, K.; Li, P. Trustworthy federated learning: privacy, security, and beyond. Knowledge and Information Systems, 2025a.
- Mohamed, H.; Koroniotis, N.; Moustafa, N. Harnessing federated learning for digital forensics in iot. IEEE Transactions on Information Forensics and Security, 2024.
- Masunda, M.; Ajayi, R. Enhancing security in federated learning [Dataset]. 2025.
- Tallam, K. Engineering risk-aware, security-by-design frameworks for autonomous ai. arXiv 2025.
- Zhukabayeva, T.; Zholshiyeva, L.; Karabayev, N.; Khan, S. Cybersecurity solutions for industrial internet of things–edge computing integration: Challenges, threats, and future directions. Sens. 2025, 25, 213.
- Nguyen, D. IoT Security: From Context-based Authentication to Secure Federated Learning Anomaly Detection. Ph.D. thesis, TU Darmstadt, 2024.
- Hamouda, D. New technologies for security and privacy issues in the era of industry 5.0. PhD Dissertation, 2024.
- Shaik, M.; Bojja, G.; Gudala, L. Leveraging artificial intelligence for enhanced threat detection. Res. Prepr. 2025.
- Kairouz, P.; McMahan, B.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2023, 16, 1–210.
- Truex, S.; Liu, L.; Gursoy, M. E.; Yu, L.; Wei, W. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security (AISec); 2019; pp. 1–11.
- Yao, J.; Shen, J.; Wu, Y.; Zhang, R. Aero: Efficient and verifiable secure aggregation in federated learning. In Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS), 2023; pp. 1012–1023.
- Zeng, Q.; Zhou, L.; Li, X. Decentralized and auditable federated learning using blockchain and smart contracts. Inf. Sci. 2024b, 656, 119982.
- Fung, C.; Yoon, C. J.; Beschastnikh, I. Mitigating sybils in federated learning poisoning. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018a; pp. 168–1829.
- Xia, Y.; Cheng, L. Robust federated learning under adversarial attacks: Survey and outlook. ACM Comput. Surv. 2023, 55, 1–39.
- Rahman, M.; Debnath, B. A survey on statistical poisoning detection in federated learning. J. Netw. Comput. Appl. 2024, 238, 104890.
- Cen, X.; Alur, R. Auditable ai systems: Foundations and techniques. Commun. ACM 2024b, 67, 76–88.
- Miguel, E.; Sanchez, J.; Ortega, P. Accountability in artificial intelligence: From principles to practice. AI Ethics 2021b, 1, 43–59.
- Malgieri, G.; Pasquale, F. The emerging principle of accountability in ai regulation. Comput. Law. Secur. Rev. 2022, 45, 105701.
- International Organization for Standardization. ISO/IEC 42001: Artificial Intelligence Management System (AIMS) – Requirements. ISO/IEC JTC 1/SC 42; International Organization for Standardization, 2023.
- Ning, W.; Zhu, Y.; Song, C.; Li, H.; Zhu, L.; Xie, J.; et al. Blockchain-based federated learning: A survey and new perspectives. Appl. Sci. 2024a, 14, 9459.
- Ma, R.; et al. Trusted model aggregation with zero-knowledge proofs in federated learning. IEEE Trans. Dependable Secur. Comput. 2024a.
- Gill, W.; et al. Tracefl: Interpretability-driven debugging in federated learning. arXiv 2023, arXiv:2312.13632.


| Attack Type | Description / Mechanism | Impact / Observations |
|---|---|---|
| Gradient-based Poisoning | Crafts poisoned samples using bilevel optimization to distort gradient updates Oprea et al. [53]. | Can degrade accuracy by up to 60%; enables precise targeted misclassification or divergence. |
| Label Flipping | Randomly or selectively alters labels to mislead training Paudice et al. [56]; Gharib et al. [55]. | Reduces accuracy by 20–40%; effects are visible in class-confusion patterns. |
| Clean-label Poisoning | Inserts benign-looking samples that collide with target-class features Shafahi et al. [36]. | Hard to detect; preserves overall accuracy while enabling targeted misclassification. |
| Backdoor Poisoning | Embeds imperceptible triggers into poisoned inputs Gu et al. [57]. | Produces near-100% targeted misclassification when the trigger is present. |
| Availability Attacks | Corrupts data or features to prevent convergence Kiss et al. [58]. | Leads to accuracy collapse or unstable training failure. |
| Influence Function Attacks | Identifies and poisons influential training points using Hessian-vector analysis Koh and Liang [59]. | Enables stealthy manipulation with a small poisoning budget. |
| Regression Poisoning | Manipulates continuous-valued regression data to maximize prediction error Oprea et al. [53]. | Doubles or triples MSE and affects downstream risk models. |
| Domain-Specific Poisoning | Uses contextualized attacks targeting healthcare or ICS systems Ma et al. [60]. | Causes misdiagnosis or anomaly suppression in 85–90% of industrial monitoring cases. |

| Attack Type | Mechanism | Stealth Level | Detectability | Primary Defense (Examples) |
|---|---|---|---|---|
| Label flipping | Clients alter labels randomly or target specific classes Paudice et al. [56]; Gharib et al. [55] | Low (large gradient shift) | High under distance metrics | Local/global validation, confusion-matrix checks |
| Gradient poisoning | Adversarial gradient crafting via bilevel optimization Biggio et al. [2]; Oprea et al. [53] | Medium | Medium (norm and angle metrics) | Krum Blanchard et al. [69], Trimmed Mean Yin et al. [67] |
| Model replacement | Single-round overwrite of global model via scaled malicious updates Bagdasaryan et al. [31] | Low (extreme deviation) | High (Euclidean distance, cosine) | FLTrust Cao et al. [70], Multi-Krum Blanchard et al. [69] |
| Backdoor (Visible) | Large, high-frequency or patterned triggers embedded in inputs Gu et al. [57] | Medium | Medium (spectral activation analysis) | Spectral signatures Tran et al. [71]; pruning |
| Backdoor (Stealthy) | Clean-label triggers or imperceptible perturbations Shafahi et al. [36]; Xia et al. [13] | Very High | Very Low under non-IID noise | Neural Cleanse Wang et al. [72], STRIP Gao et al. [73] |
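
The table above lists Trimmed Mean (Yin et al. [67]) among the primary defenses against gradient poisoning. As a rough illustration of why such aggregators blunt a single scaled update, the following minimal sketch implements a coordinate-wise trimmed mean in plain Python. It is not the exact estimator analyzed in the cited work; the function names `trimmed_mean` and `plain_mean`, the `trim` parameter, and the toy update vectors are all assumptions made for this example.

```python
from typing import List, Sequence


def trimmed_mean(updates: Sequence[Sequence[float]], trim: int) -> List[float]:
    """Coordinate-wise trimmed mean (illustrative sketch, hypothetical helper).

    For every coordinate, sort the client values, drop the `trim` smallest
    and `trim` largest, and average what remains.
    """
    n = len(updates)
    if n == 0:
        raise ValueError("need at least one client update")
    if trim < 0 or 2 * trim >= n:
        raise ValueError("trim must satisfy 0 <= 2 * trim < number of updates")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same dimension")

    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[trim:n - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


def plain_mean(updates: Sequence[Sequence[float]]) -> List[float]:
    """Unprotected averaging, shown only for comparison."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


# Toy round: four honest clients near (1, -1) and one scaled malicious update.
honest = [[0.9, -1.1], [1.0, -1.0], [1.1, -0.9], [1.0, -1.0]]
malicious = [[100.0, 100.0]]
updates = honest + malicious

robust = trimmed_mean(updates, trim=1)  # stays close to the honest cluster
naive = plain_mean(updates)             # dragged toward the malicious update
```

In this toy round the plain average is pulled far from the honest cluster, while the trimmed result stays near it. The sketch assumes at most `trim` malicious values per coordinate; colluding or Sybil clients that exceed that budget are outside what it can absorb.
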
| Indicator | Question answered | Typical evidence or proxy | Interpretation for comparison |
|---|---|---|---|
| Observability | Can poisoning-relevant events be inspected at all? | Visible per-client updates, validation traces, or encrypted-only outputs | Low observability limits anomaly detection and post-incident analysis even if privacy is strong. |
| Attribution specificity | To what entity can suspicious behavior be linked? | Sample-, client-, round-, or server-level identifiers; committee decisions; signed messages | Higher specificity improves blame assignment and remediation precision. |
| Aggregation verifiability | Can the claimed aggregate be independently checked? | Commitments, ZK proofs, authenticated transcripts, verifiable computation logs | High verifiability is especially important when the server is not fully trusted. |
| Tamper evidence | Can later modification or deletion of evidence be detected? | Hash chains, append-only ledgers, signatures, timestamps | Strong tamper evidence improves forensic readiness and institutional trust. |
| Forensic replayability | Can an incident be reconstructed after the fact? | Versioned models, round metadata, rejection logs, challenge transcripts | Replayability determines whether audits are actionable rather than merely symbolic. |
| Audit overhead | What extra cost is incurred to preserve evidence? | Communication, computation, storage, and latency overhead | High accountability may be impractical unless the overhead remains deployment-compatible. |
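
The tamper-evidence indicator above names hash chains as typical evidence. The following minimal sketch, using only the Python standard library, shows how a hash-chained log of per-round decisions can reveal later modification of a record. The record fields (`round`, `client`, `decision`) and the helper names `append_record` and `verify_log` are hypothetical and chosen only for illustration; a deployed system would also need signatures and an externally anchored head hash, since a bare chain does not reveal truncation of its most recent entries.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _digest(prev_hash: str, payload: Dict) -> str:
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: List[Dict], payload: Dict) -> None:
    """Append a record whose hash depends on the entire preceding chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": _digest(prev_hash, payload)})


def verify_log(log: List[Dict]) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict] = []
append_record(audit_log, {"round": 1, "client": "c07", "decision": "accepted"})
append_record(audit_log, {"round": 1, "client": "c13", "decision": "rejected"})
append_record(audit_log, {"round": 2, "client": "c07", "decision": "accepted"})

intact = verify_log(audit_log)

# Simulate an after-the-fact edit that hides a rejection.
audit_log[1]["payload"]["decision"] = "accepted"
tamper_detected = not verify_log(audit_log)
```
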
| Theme | Representative work (2023–2025) | Relevance to accountability, auditability, and traceability |
|---|---|---|
| Verifiable aggregation/integrity | Ma et al. [93]; Xu et al. [94] | Enables verifiable aggregation through cryptographic proofs, such as zero-knowledge or verifiable computation techniques, allowing the server or external auditors to verify aggregation correctness without revealing individual client updates. This directly strengthens auditability in accountable federated learning. |
| Malicious-secure aggregation (preventive poisoning resilience) | Jiang et al. [92]; Rathee et al. [89]; Ma et al. [90]; Lycklama et al. [91] | Provides cryptographic and MPC-based aggregation protocols that tolerate malicious clients and untrusted servers by enforcing update integrity and admissibility constraints before aggregation, thereby limiting adaptive poisoning attempts and strengthening accountability guarantees. |
| Client authenticity/participant integrity | Zeng et al. [16]; Chen et al. [13] | Uses auditable participant selection, identity-binding signatures, or traceable communication channels to reduce impersonation, repudiation, and Sybil-style poisoning entry points before model aggregation begins. |
| Update provenance/round traceability | Gu et al. [21]; Chen et al. [13] | Preserves round-level lineage, commitments, or provenance metadata so that suspicious updates can be linked to a participant, a communication round, and an evidence trail for later audit or incident response. |
| Secure aggregation with accountability support | Bouamama et al. [95]; Bottoni et al. [96] | Integrates secure aggregation with verification artifacts such as commitments, authenticated logs, or proofs that can serve as forensic evidence, thereby enabling traceability and post-hoc audits while preserving client privacy. |
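
Several rows above refer to commitments that bind an update to a participant and a communication round. As a hedged illustration of that idea, the sketch below uses a simple salted SHA-256 commitment: a client publishes the digest before aggregation and can later open it for an auditor. This is a generic commit-and-open pattern, not the protocol of any cited work; the function names, the JSON encoding, and the example identifiers are assumptions for this example only.

```python
import hashlib
import hmac
import json
import secrets
from typing import Sequence, Tuple


def _encode(client_id: str, round_id: int, update: Sequence[float]) -> bytes:
    """Canonical encoding of what is being committed to (hypothetical format)."""
    record = {"client": client_id, "round": round_id, "update": list(update)}
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def commit_update(client_id: str, round_id: int,
                  update: Sequence[float]) -> Tuple[str, bytes]:
    """Return (commitment, nonce); the nonce stays private until opening."""
    nonce = secrets.token_bytes(32)  # fixed length keeps the concatenation unambiguous
    digest = hashlib.sha256(nonce + _encode(client_id, round_id, update)).hexdigest()
    return digest, nonce


def open_commitment(commitment: str, nonce: bytes, client_id: str,
                    round_id: int, update: Sequence[float]) -> bool:
    """Check that the revealed values match the earlier commitment."""
    expected = hashlib.sha256(nonce + _encode(client_id, round_id, update)).hexdigest()
    return hmac.compare_digest(commitment, expected)


update = [0.12, -0.03, 0.40]
commitment, nonce = commit_update("c07", 5, update)

honest_opening = open_commitment(commitment, nonce, "c07", 5, update)
altered_update = open_commitment(commitment, nonce, "c07", 5, [0.12, -0.03, 9.99])
wrong_round = open_commitment(commitment, nonce, "c07", 6, update)
```

A commitment of this kind only links a revealed update to a round and a claimed identity; attributing it to a real participant would additionally require the digest to be signed or recorded in an authenticated log.
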
| Defense family | Non-IID | Secure agg. | Server distrust | Collusive attackers | Audit artifact | Decision-support interpretation |
|---|---|---|---|---|---|---|
| Anomaly/statistical detection | Low–Medium | Low | Low | Low–Medium | Suspicion scores, cluster traces, rejected-update logs | Useful when per-client updates are visible and heterogeneity is moderate; it degrades when benign updates naturally diverge or when secure aggregation hides local updates. |
| Performance-based filtering | Medium | Low–Medium | Low | Low–Medium | Validation outcomes, committee decisions, and rejection records | Practical when trusted validation data exist, but vulnerable to poisoned validation sets and stealthy attacks that preserve clean accuracy. |
| Byzantine-robust aggregation | Low–Medium | Medium | Low | Low | Aggregation trace, selected-update metadata | Effective against isolated Byzantine clients under honest-majority assumptions, but weak against Sybils, coordinated collusion, and malicious aggregation logic. |
| Verifiable crypto defenses | High | High | High | Medium | Proofs, signed transcripts, and verification logs | Best suited to settings where server distrust, auditability, or compliance dominate; stronger integrity guarantees come with higher communication and computation overhead. |
| Audit-oriented frameworks | Medium | Medium | Medium–High | Medium | Tamper-evident logs, signatures, and provenance ledgers | Most valuable as an orchestration layer for evidence and governance; they improve attribution and auditability but must be combined with robust detection or aggregation to block poisoning in real time. |

| Title | Focus Area |
|---|---|
| ISO/IEC 42001: Artificial Intelligence Management Systems International Organization for Standardization [119] | Lifecycle governance, auditability, accountability |
| Blockchain-Based Federated Learning: A Survey and New Perspectives Ning et al. [120] | Auditing, traceability, taxonomy in BCFL |
| A Comprehensive Survey on Blockchain-based FL Liu et al. [50] | Security and audit frameworks for BCFL |
| Auditable FL with Blockchain-Based Participant Selection Zeng et al. [16] | Auditable sampling, ledgered evidence |
| zkFL: ZKP-based Gradient Aggregation for FL Wang et al. [17] | Verifiable aggregation, privacy |
| Trusted Model Aggregation with Zero-Knowledge Proofs Ma et al. [121] | ZK proofs for trusted aggregation |
| RiseFL: Secure and Verifiable Data Collaboration with Low-Cost ZKPs Zhu et al. [18] | Low-cost ZK proofs, verifiable training |
| Publicly Auditable and Privacy-Preserving FL He et al. [19] | Public auditability with robust aggregation |
| TraceFL: Interpretability-Driven Debugging in FL Gill et al. [122] | Neuron-level provenance, debugging |
| Enhancing Data Provenance & Transparency in FL Gu et al. [21] | Provenance tracking, reproducibility |
| PPTFL: Privacy-Preserving and Traceable FL Chen et al. [13] | Traceability with privacy preservation |
| Interpretable/Explainable FL Survey Li et al. [80] | Explainability, fXAI taxonomy |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.