Time-Aware Security Intelligence for Federated Financial Systems: Deep Reinforcement Learning Against Temporal Poisoning Attacks

Wenan Liu; Qixuan Yang; Weihang Gong; Rongji Yin; Zheng Li

doi:10.20944/preprints202510.0141.v1

Submitted: 01 October 2025

Posted: 02 October 2025

Abstract

Financial institutions operating distributed machine learning systems face an emerging class of stealthy adversaries who exploit temporal patterns across training cycles to inject persistent backdoors that can remain dormant for months before activation. Unlike conventional single-round attacks, these temporal poisoning strategies leverage sequential dependencies to bypass existing detection mechanisms while gradually compromising model integrity. Existing security frameworks largely assume static threat models and fail to address multi-period adversarial strategies that unfold over time in financial transaction streams, where small perturbations can trigger systemic failures. To address these challenges, we propose DEFEND, a defense framework that integrates temporal behavior analysis, robust statistical aggregation, and multi-scale verification into a unified multi-layer architecture. Our framework formulates defense coordination as a Markov Decision Process and employs Proximal Policy Optimization for adaptive policy learning that dynamically balances security enforcement with model utility. We design three temporal attack models to evaluate our defense mechanism: fixed-period data poisoning, multi-period data poisoning, and model weight poisoning. The multi-layer defense architecture combines geometric median-based robust aggregation with Dynamic Time Warping pattern matching and adaptive client participation control. Extensive experiments on CIFAR-10, FEMNIST, and MNIST demonstrate that DEFEND achieves defense success rates of 95.6% for ResNet-18 and 94.0% for MobileNet V2, while maintaining clean accuracy between 85% and 95% across various data heterogeneity levels and malicious client ratios. Our framework provides theoretical guarantees for Byzantine robustness and practical scalability for moderate-scale federated deployments, making it well suited to real-world financial applications requiring both security and efficiency.

Keywords: federated learning; temporal backdoor attacks; financial security; deep reinforcement learning; Byzantine robustness; Markov decision process; geometric median aggregation; adaptive defense coordination; multi-layer security; temporal behavior analysis

Subject: Computer Science and Mathematics - Computer Science

1. Introduction

1.1. Background

The financial industry is undergoing rapid digital transformation, with institutions increasingly relying on distributed machine learning techniques to improve fraud detection, risk management, and regulatory compliance. Federated learning (FL) has emerged as a promising paradigm to enable collaborative intelligence across banks, payment platforms, and other financial entities while maintaining data privacy [1,2,3]. By allowing participants to train shared models without centralizing raw data, FL provides a natural fit for sensitive financial applications where strict privacy regulations and competitive concerns prevent data sharing.

However, the adoption of FL in financial systems introduces significant security challenges. Recent studies highlight that FL is inherently vulnerable to poisoning and backdoor attacks, where malicious participants inject crafted updates to manipulate global models [4,5]. These attacks can be particularly damaging in financial environments, where small perturbations may lead to large-scale fraud or systemic risks. Beyond traditional single-round threats, researchers have uncovered persistent backdoor strategies that exploit temporal dependencies across multiple training rounds, allowing attackers to evade conventional defenses and achieve long-term stealth [6,7]. Such temporal vulnerabilities are especially critical in financial transaction streams, which naturally exhibit sequential correlations and evolving patterns.

To address these challenges, researchers have proposed robust aggregation methods and privacy-enhancing techniques to mitigate adversarial updates in FL. Approaches such as robust learning rate adjustment [8,9,10] and privacy-preserving backdoor defenses for heterogeneous data distributions [11] have demonstrated effectiveness under certain threat models. Nevertheless, most of these methods remain static and fail to capture multi-period adversarial strategies that unfold over time. Moreover, existing solutions often trade accuracy for security, limiting their practicality in high-stakes domains like finance where both precision and reliability are paramount.

Meanwhile, reinforcement learning (RL) has shown considerable potential for adaptive cybersecurity, offering dynamic decision-making in the face of evolving adversarial behaviors [12]. By framing defense coordination as a sequential decision problem, RL methods can enable financial FL systems to respond adaptively to temporal threats, balancing security enforcement with model utility. This motivates the development of integrated frameworks that combine multi-layered defenses with RL-based coordination to address sophisticated temporal backdoor attacks in financial federated learning environments.

1.2. Motivation and Contributions

Despite growing efforts to enhance the robustness of federated learning, several critical research gaps remain unaddressed in the context of financial systems:

1. Existing security frameworks do not adequately account for temporal backdoor attacks that exploit sequential dependencies in financial transaction streams. Most defenses assume static or isolated threat models, overlooking coordinated multi-round adversarial strategies.
2. Current defense mechanisms are largely single-layered and static, focusing either on aggregation robustness or anomaly detection, without integrating multiple complementary layers that can collectively enhance resilience against adaptive attackers.
3. Few approaches incorporate dynamic coordination mechanisms to balance security and utility. Static thresholds or fixed strategies cannot adapt to changing adversarial intensity, heterogeneous client behaviors, and evolving network conditions.

To address these gaps, this work introduces DEFEND, a comprehensive defense framework for federated learning in financial systems. Our framework integrates temporal behavior analysis, robust statistical aggregation, and multi-scale verification into a multi-layered architecture, while employing a Markov Decision Process (MDP) formulation and reinforcement learning for adaptive coordination. The contributions of this paper are summarized as follows:

We formalize temporal backdoor threats in federated financial learning by characterizing attack strategies that exploit multi-period dependencies across training rounds.
We design a multi-layer defense architecture that combines temporal behavior profiling, robust aggregation, and multi-scale verification to jointly enhance detection accuracy and resilience.
We formulate defense coordination as an MDP problem and develop an RL-based policy using Proximal Policy Optimization (PPO) to dynamically manage defense actions, balancing robustness and model performance.
We validate DEFEND on multiple benchmark datasets (CIFAR-10, FEMNIST, MNIST) under varying degrees of heterogeneity and adversarial participation, demonstrating superior defense success rates and detection efficiency compared to state-of-the-art baselines.

The remainder of this paper is organized as follows. Section 2 reviews existing research on federated learning security, temporal attack detection, and multi-layer defense coordination. Section 3 describes the proposed DEFEND framework. Section 4 presents experimental evaluations. Section 6 concludes and outlines directions for future work.

2. Related Work

2.1. Federated Learning Security in Financial Systems

Federated learning security in financial systems has emerged as a critical research area addressing the inherent privacy and security challenges in distributed financial data processing environments [13,14,15,16,17,18,19,20,21]. Chen et al. [13] conducted an extensive survey identifying the intricate security challenges within federated learning frameworks, emphasizing vulnerabilities in communication links and potential cyber threats across decentralized networks. Their comprehensive analysis delves into various defensive strategies and explores applications across different sectors, contributing to the development of secure and efficient federated learning systems. To address specific financial fraud detection challenges, Aljunaid et al. [14] proposed an Explainable Federated Learning (XFL) model that integrates Shapley Additive Explanations (SHAP) and LIME techniques for enhanced interpretability while maintaining privacy compliance. Their approach achieved 99.95% accuracy with a miss rate of 0.05%, effectively eliminating false positives in financial fraud classification while preserving data privacy and regulatory compliance.

Building on decentralized security considerations, Hallaji et al. [15] performed a thorough security analysis of decentralized federated learning systems, studying possible variations of threats and adversaries and surveying potential defense mechanisms. Their work discusses how blockchain technologies can eliminate server-related threats, while acknowledging the new privacy challenges that decentralized architectures introduce. To enhance privacy preservation in financial technology applications, Xiong et al. [20] developed a Heterogeneous Privacy-Preserving Blockchain-Enabled Federated Learning (HPP-BEFL) system specifically designed for social fintech environments. Their PKI and identity-based heterogeneous authenticated asymmetric group key agreement (PKI-IB-HAAGKA) protocol mitigates man-in-the-middle and inference attacks while addressing cryptosystem heterogeneity.

Recent advances have also explored credit risk assessment architectures [16], behavioral anomaly detection in dynamic transaction graphs [17], and blockchain-based knowledge enhancement mechanisms [22]. However, existing federated learning security frameworks lack comprehensive temporal attack detection mechanisms and fail to adequately address sophisticated backdoor attacks that exploit temporal patterns in financial data streams, which are essential for defending against coordinated multi-period adversarial strategies in distributed financial learning environments.

Table 1. Comparison of our work with related studies.

Compared references (table columns): [13], [23], [15], [20], [24], [25], [26], [27], [14], [28], [29], Proposed work

Feature | Proposed work
Financial application domain | ✓
Federated learning framework | ✓
Temporal attack detection | ✓
Multi-layer defense design | ✓
MDP/RL coordination | ✓

2.2. Temporal Attack Detection and Defense Mechanisms

Temporal attack detection and defense mechanisms have gained significant attention as adversaries increasingly exploit time-dependent vulnerabilities in distributed systems [24,25,30,31,32,33,34,35]. Zamanzadeh et al. [25] provided a comprehensive survey of deep learning approaches for time series anomaly detection, highlighting the importance of identifying anomalous patterns that indicate novel or unexpected events such as production faults and system defects. Their taxonomy encompasses anomaly detection strategies and deep learning models, highlighting the challenges presented by the large size and complexity of temporal patterns in time series data. Duan et al. [24] addressed practical cyber attack detection through Continuous Temporal Graph (CTG) neural networks in dynamic network systems, proposing an interaction-centered perspective that refines information interactions between network entities into CTG evolution processes. Their framework naturally incorporates new node access behaviors and presents a message aggregation scheme that fuses spatio-temporal neighborhoods with actual time distribution and historical states, demonstrating superior performance on ToN-IoT, UNSWNB15, CIC-Dark2020, and J.P. Morgan payment datasets.

To address zero-day attack challenges, Wu et al. [31] developed an active learning framework using Deep Q-Network (DQN) for intelligent sample selection with probability distribution analysis. Their approach integrates Bi-directional Long Short-Term Memory (BiLSTM) networks into the DQN model to analyze temporal correlations within static classification contexts, employing Euclidean distance functions for accurate sample labeling. Hammad et al. [33] explored deep reinforcement learning for adaptive cyber defense, implementing cutting-edge DRL techniques including Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) for real-time threat discernment and neutralization across varied cyber threat scenarios ranging from malware invasions to phishing attacks and adversarial assaults.

Recent developments have also investigated spatio-temporal advanced persistent threat detection in cyber-physical power systems [30], advances in time-series anomaly detection algorithms and benchmarks [36], and security defense strategies for Internet of Things based on deep reinforcement learning [37]. However, existing temporal attack detection frameworks lack sophisticated multi-period pattern recognition capabilities and fail to address coordinated backdoor injection strategies that exploit temporal dependencies across distributed learning rounds, which are crucial for detecting and defending against sophisticated temporal poisoning attacks in federated financial learning environments.

2.3. Multi-Layer Defense and MDP-based Coordination

Multi-layer defense mechanisms and MDP-based coordination strategies have emerged as critical components for robust federated learning systems, addressing the need for comprehensive protection against sophisticated adversarial attacks [23,26,27,28,29,38,39,40,41,42]. Li et al. [28] proposed a Multi-layer Aggregation Backdoor Defense Framework (MABDF) that ensures secure model aggregation through adaptive similarity filtering, pruning mean aggregation, and subspace robust projection methods. Their three-layer architecture calculates pairwise cosine similarity between client updates with dynamic thresholds based on median and standard deviation, applies pruning mean aggregation to detect hidden gradient operations, and projects updates onto low-rank subspaces through singular value decomposition (SVD) to suppress backdoor neuron activation, achieving backdoor attack success rates below 3% while maintaining model accuracy decrease of no more than 1.5%. Uddin et al. [23] conducted a systematic literature review using the PRISMA framework, analyzing 244 studies across eight themes of robust federated learning including objective regularization, optimizer modification, differential privacy, client selection, and new aggregation algorithms, providing comprehensive insights into approaches for enhancing FL model robustness against adversarial attacks and noisy updates.

For dynamic policy coordination, Bello et al. [26] developed a security strategy incorporating zero-trust models with dynamic policy decisions through stochastic games and reinforcement learning techniques. Their approach employs Generalized Proximal Policy Optimization with sample reuse (GePPO) and its meta-learning variant GePPO-ML, along with Sample Dropout PPO with meta-learning (SDPPO-ML) for adaptive policy updates, demonstrating superior performance compared to baseline REINFORCE and PPO algorithms in next generation network security scenarios. Huang et al. [27] addressed Byzantine robustness in heterogeneous federated learning through Self-Driven Entropy Aggregation (SDEA), leveraging random public datasets to conduct robust aggregation by introducing learnable aggregation weights that minimize instance-prediction entropy while maximizing batch-prediction entropy to accommodate diverse client tendencies and detect Byzantine attackers effectively.

Recent advances have also explored multi-layered protection systems for cloud computing environments [39], safe reinforcement learning frameworks for risk-averse dispatch with frequency security constraints [40], and security metrics for assessing power grids against attacks from EV charging ecosystems using Markov decision processes [29,43]. However, existing multi-layer defense frameworks lack integrated temporal anomaly detection capabilities and fail to provide coordinated MDP-based defense mechanisms that can dynamically adapt to evolving temporal backdoor attack patterns while maintaining optimal resource allocation and system performance in distributed financial learning environments.

3. Method

This section presents our framework for detecting and defending against temporal backdoor attacks in federated learning. The methodology comprises three temporal poisoning attack models, a multi-objective detection objective, statistical analysis of client behavior, and an MDP-based defense mechanism. The design aims to sustain detection and defense performance across diverse federated learning scenarios while remaining computationally efficient. The method architecture is shown in Figure 1.

3.1. Problem Formulation

Consider a federated learning system comprising $N$ participating clients denoted as $C = \{c_{1}, c_{2}, \dots, c_{N}\}$, where each client $c_{i}$ maintains a local dataset $D_{i} = \{(x_{i,j}, y_{i,j})\}_{j=1}^{n_{i}}$ over a finite communication horizon $T$. The temporal data sequence for client $c_{i}$ is represented as $X_{i} = \{x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{T}\}$, where $x_{i}^{t} \in \mathbb{R}^{m}$ denotes the $m$-dimensional feature vector observed at communication round $t$. The corresponding ground-truth labels are denoted as $y_{i} = \{y_{i}^{1}, y_{i}^{2}, \dots, y_{i}^{T}\}$, where $y_{i}^{t} \in Y$ represents the true class label at round $t$ from the label space $Y$.

The client operational states are characterized by a discrete state space $S = \{s_{n}, s_{s}, s_{p}\}$, where $s_{n}$ represents the normal operational state, $s_{s}$ indicates a suspicious state requiring further monitoring, and $s_{p}$ denotes a confirmed poisoned state. During the poisoning process, a subset of clients $C_{p} \subset C$ with cardinality $|C_{p}| \leq \kappa N$, where $\kappa \in (0, 1)$ represents the maximum poisoning ratio, becomes compromised during specific time intervals $T_{p} \subset \{1, 2, \dots, T\}$.

Each client $c_{i}$ is characterized by a resource profile $R_{i} = (C_{i}^{\max}, M_{i}^{\max}, E_{i}^{\max}, B_{i}^{\max})$, where $C_{i}^{\max}$ represents maximum computational capacity, $M_{i}^{\max}$ denotes memory capacity, $E_{i}^{\max}$ indicates energy budget, and $B_{i}^{\max}$ specifies communication bandwidth. The available resources at round $t$ are denoted as $R_{i}^{t} = (C_{i}^{t}, M_{i}^{t}, E_{i}^{t}, B_{i}^{t})$, where the bound $0 \leq R_{i}^{t} \leq R_{i}$ holds component-wise.

The fundamental objective of our detection framework is to minimize the overall detection error, formulated as a multi-objective optimization problem with temporal consistency constraints:

$$L_{\tau} = \sum_{i=1}^{N} \sum_{t=1}^{T} l_{d}(\hat{s}_{i}^{t}, s_{i}^{t}) + \lambda_{1} \sum_{i \in C_{p}} l_{t}(\hat{T}_{p,i}, T_{p,i}) + \lambda_{2} R_{f,p} + \lambda_{3} \int_{0}^{T} C(u)\, du + \lambda_{4} H(s_{t}), \tag{1}$$

where $l_{d} : S \times S \to \mathbb{R}_{\geq 0}$ represents the client state classification loss function, $\hat{s}_{i}^{t}$ denotes the predicted client state, $l_{t}$ quantifies the temporal localization error between predicted intervals $\hat{T}_{p,i}$ and true poisoning intervals $T_{p,i}$, $R_{f,p}$ denotes the false positive penalty term, $C(u)$ is the continuous cost function modeling resource consumption, $H(s_{t})$ represents the entropy regularization term for uncertainty quantification, and $\lambda_{1}, \lambda_{2}, \lambda_{3}, \lambda_{4} \in \mathbb{R}_{>0}$ are regularization coefficients that balance the different objectives.

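The temporal localization error is computed as the average of the two directed Hausdorff distances between the predicted and true interval sets:

$$l_{t}(\hat{T}_{p,i}, T_{p,i}) = \frac{1}{2} \left[ \max_{t \in \hat{T}_{p,i}} \min_{\tau \in T_{p,i}} |t - \tau| + \max_{\tau \in T_{p,i}} \min_{t \in \hat{T}_{p,i}} |t - \tau| \right], \tag{2}$$

The first term penalizes predicted rounds that lie far from every true poisoning round (a precision-type error), and the second penalizes true poisoning rounds that no predicted round is close to (a recall-type error), so the measure accounts for both in temporal poisoning interval detection.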
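To make Eq. (2) concrete, the following is a minimal Python sketch that treats both interval sets as finite sets of round indices. The function names and the example rounds are illustrative assumptions, not part of the original framework.

```python
def directed_gap(a, b):
    """Largest distance from a round in `a` to its nearest round in `b`."""
    return max(min(abs(t - tau) for tau in b) for t in a)


def temporal_localization_error(predicted, actual):
    """Average of the two directed Hausdorff distances, as in Eq. (2)."""
    if not predicted or not actual:
        raise ValueError("both interval sets must be non-empty")
    return 0.5 * (directed_gap(predicted, actual) + directed_gap(actual, predicted))


# Hypothetical detector output and ground truth, for illustration only.
predicted_rounds = {10, 11, 12, 13}
true_rounds = {12, 13, 14, 15, 16}
error = temporal_localization_error(predicted_rounds, true_rounds)
```

In this example the predicted interval starts two rounds early and ends three rounds early, so the two directed terms are 2 and 3 and the error is their average.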

3.2. Temporal Backdoor Attack Models

To comprehensively evaluate our detection framework, we design three sophisticated temporal backdoor strategies that exploit different characteristics of federated learning protocols. Each attack model represents a distinct threat vector commonly encountered in real-world federated environments, incorporating realistic constraints on attack capabilities and detectability thresholds.

3.2.1. Fixed-Period Data Poisoning Attack

The fixed-period data poisoning attack systematically injects backdoor triggers into client datasets during predetermined temporal windows, eventually culminating in significant global model degradation at coordinated trigger points. This attack strategy maintains stealth by concentrating the poisoning effect within specific time intervals while preserving normal behavior patterns outside attack periods.

For malicious client $c_{i} \in C_{p}$ at communication round $t$, the poisoned local dataset $\tilde{D}_{i}^{t}$ is constructed according to:

$$\tilde{D}_{i}^{t} = \begin{cases} D_{i} \cup P_{i}^{t} & \text{if } t \in T_{\phi}, \\ D_{i} & \text{otherwise}, \end{cases} \tag{3}$$

where $T_{\phi} = [t_{s}, t_{e}] \subset \{1, 2, \dots, T\}$ denotes the fixed attack interval, and the poisoned sample set $P_{i}^{t}$ is generated by a process that incorporates temporal dynamics and spatial correlations.

The poisoned sample construction employs multi-dimensional trigger injection with harmonic modulation:

$$P_{i}^{t} = \left\{ \left( x_{j} + \epsilon_{i}^{t} \odot a_{i}^{t} + \sum_{k=1}^{K} \alpha_{k} \sin\left( \frac{2 \pi k (t - t_{s})}{|T_{\phi}|} + \phi_{k} \right) w_{k},\; y_{\tau} \right) : (x_{j}, y_{j}) \in \mathcal{S}_{i}^{t} \right\}, \tag{4}$$

where $\odot$ denotes element-wise multiplication, $a_{i}^{t} \in \mathbb{R}^{m}$ represents the primary attack direction vector, $K$ is the number of harmonic components, $\alpha_{k}$ and $\phi_{k}$ control the amplitude and phase of the $k$-th component, $w_{k} \in \mathbb{R}^{m}$ are harmonic weight vectors, $y_{\tau}$ denotes the target label, and $\mathcal{S}_{i}^{t} \subset D_{i}$ represents the subset selected for poisoning.

The time-dependent perturbation magnitude $\epsilon_{i}^{t}$ follows an accumulative schedule incorporating memory effects and stochastic variations:

$$\epsilon_{i}^{t} = \alpha_{i} \sum_{\tau = t_{s}}^{t} \beta^{t - \tau} \gamma_{\tau} \cdot \mathbb{I}[\tau \in T_{\phi}] \cdot \exp\left( -\frac{(\tau - \mu_{i})^{2}}{2 \sigma_{i}^{2}} \right) \cdot \left( 1 + \delta_{i} \sum_{j \in N_{i}} \zeta_{i,j}^{\tau} \right), \tag{5}$$

where $\alpha_{i} > 0$ represents the base accumulation rate, $\beta \in (0, 1)$ is the temporal decay factor, $\gamma_{\tau}$ are stochastic scaling factors following a Markov chain with transition probabilities $P_{k,l} = \frac{\exp(\theta_{k,l})}{\sum_{j} \exp(\theta_{k,j})}$, the Gaussian envelope ensures smooth temporal transitions with mean $\mu_{i}$ and variance $\sigma_{i}^{2}$, $\delta_{i}$ controls neighborhood influence, $N_{i}$ represents the neighboring clients, and $\zeta_{i,j}^{\tau}$ captures inter-client correlation effects.

3.2.2. Multi-Period Data Poisoning Attack

The multi-period data poisoning attack exploits temporal dependencies by distributing backdoor injection across multiple non-consecutive communication rounds, creating sophisticated interference patterns that evade single-period detection mechanisms while maintaining cumulative backdoor effectiveness.

The poisoned dataset construction employs period-specific trigger variations across disjoint temporal intervals $T_{\mu} = \{T_{1}, T_{2}, \dots, T_{K}\}$, where $T_{j} \cap T_{k} = \emptyset$ for $j \neq k$:

$$\tilde{D}_{i}^{t} = \begin{cases} D_{i} \cup P_{i,j}^{t} & \text{if } t \in T_{j},\ j \in \{1, 2, \dots, K\}, \\ D_{i} & \text{otherwise}, \end{cases} \tag{6}$$

where the period-specific poisoned samples $P_{i,j}^{t}$ incorporate adaptive trigger patterns and cross-period consistency constraints.

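The multi-dimensional periodic signal injection follows:

$$P_{i,j}^{t} = \left\{ \left( x_{l} + \gamma_{i} S_{i,j}^{t} + \delta_{i} Q_{i,j}^{t},\; y_{\tau} \right) : (x_{l}, y_{l}) \in \mathcal{S}_{i,j}^{t} \right\}, \tag{7}$$

where $\gamma_{i}, \delta_{i} \in \mathbb{R}_{>0}$ control the injection intensities of the primary and secondary periodic signals $S_{i,j}^{t}$ and $Q_{i,j}^{t}$, respectively, and $\mathcal{S}_{i,j}^{t} \subset D_{i}$ denotes the subset of samples selected for poisoning in period $j$ (written in calligraphic type to distinguish it from the signal $S_{i,j}^{t}$).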

The primary multi-dimensional periodic signal $S_{i,j}^{t}$ is defined as:

$$\begin{aligned} S_{i,j}^{t} = {} & \sum_{k=1}^{K_{j}} A_{i,j,k} \odot \sin\left( \frac{2 \pi t}{P_{i,j,k}} + \phi_{i,j,k} \right) \odot W_{i,j,k}^{t} \\ & + \sum_{l=1}^{L_{j}} B_{i,j,l} \odot \cos\left( \frac{2 \pi t}{Q_{i,j,l}} + \psi_{i,j,l} \right) \odot V_{i,j,l}^{t}, \end{aligned} \tag{8}$$

where $K_{j}$ and $L_{j}$ represent the numbers of sine and cosine components for period $j$, $A_{i,j,k}, B_{i,j,l} \in \mathbb{R}^{m}$ denote amplitude vectors, $P_{i,j,k}$ and $Q_{i,j,l}$ are fundamental periods, $\phi_{i,j,k}$ and $\psi_{i,j,l}$ are phase shifts, and $W_{i,j,k}^{t}$ and $V_{i,j,l}^{t}$ are time-varying weight vectors.

3.2.3. Model Weight Poisoning Attack

The model weight poisoning attack directly manipulates local model parameters before transmission to the server, generating high-intensity anomalous parameter patterns that bypass data-level detection mechanisms while maintaining coordinated execution across multiple malicious clients.

The poisoned model update follows a multi-state regime-switching framework:

$$\tilde{\theta}_{i}^{t} = \begin{cases} \theta_{i}^{t} + \delta_{i}^{t} \odot n_{i}^{t} + \eta_{i}^{t} m_{i}^{t} & \text{if } c_{i} \in C_{p} \text{ and } M_{i}^{t} = 1, \\ \theta_{i}^{t} + \frac{1}{2} \delta_{i}^{t} \odot n_{i}^{t} & \text{if } c_{i} \in C_{p} \text{ and } M_{i}^{t} = 2, \\ \theta_{i}^{t} + \varepsilon_{i}^{t} \chi_{i}^{t} & \text{if } c_{i} \in C_{p} \text{ and } M_{i}^{t} = 3, \\ \theta_{i}^{t} & \text{otherwise}, \end{cases} \tag{9}$$

where $M_{i}^{t} \in \{1, 2, 3, 4\}$ is a Markov chain state indicator controlling attack intensity (state 4 falls under the last case, in which the update is left unmodified), $\delta_{i}^{t}$ controls the primary poisoning magnitude, $n_{i}^{t}$ is the primary directional perturbation vector, $\eta_{i}^{t}$ controls the secondary poisoning magnitude, $m_{i}^{t}$ is the secondary perturbation vector, $\varepsilon_{i}^{t}$ controls the background noise magnitude, and $\chi_{i}^{t}$ represents background perturbations.

The poisoning intensity follows a multi-phase decay model coordinated across malicious clients:

$$\delta_{i}^{t} = \delta_{m,i}\, f_{i}(t - t_{\star,i}) \cdot \mathbb{I}[c_{i} \in C_{p}], \tag{10}$$

$$f_{i}(u) = \begin{cases} \exp\left( -\frac{u}{\tau_{d,1,i}} \right) & \text{if } 0 \leq u \leq T_{1,i}, \\ \exp\left( -\frac{u - T_{1,i}}{\tau_{d,2,i}} \right) & \text{if } T_{1,i} < u \leq T_{2,i}, \\ \exp\left( -\frac{u - T_{2,i}}{\tau_{d,3,i}} \right) & \text{if } u > T_{2,i}, \end{cases} \tag{11}$$

where $\delta_{m,i}$ is the maximum intensity for client $c_{i}$, $t_{\star,i}$ is the attack initiation time, $\tau_{d,j,i}$ are phase-specific decay time constants, and $T_{j,i}$ are phase transition times.
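As a minimal sketch of Eqs. (10) and (11), the Python functions below evaluate the intensity profile that a defender would need to anticipate. The argument names and the example constants are illustrative assumptions; returning zero before the initiation time is also an assumption, since Eq. (11) is only defined for $u \geq 0$.

```python
import math


def phase_decay(u, t1, t2, tau1, tau2, tau3):
    """f_i(u) from Eq. (11) for elapsed time u >= 0 since attack initiation."""
    if u < 0:
        raise ValueError("u must be non-negative")
    if u <= t1:
        return math.exp(-u / tau1)
    if u <= t2:
        return math.exp(-(u - t1) / tau2)
    return math.exp(-(u - t2) / tau3)


def poisoning_intensity(t, t_start, delta_max, is_malicious, **phases):
    """delta_i^t from Eq. (10); assumed to be zero before round t_start."""
    if not is_malicious or t < t_start:
        return 0.0
    return delta_max * phase_decay(t - t_start, **phases)


# Hypothetical phase boundaries and decay constants, for illustration only.
phases = dict(t1=5, t2=10, tau1=2.0, tau2=3.0, tau3=4.0)
profile = [poisoning_intensity(t, 20, 1.0, True, **phases) for t in range(18, 36)]
```

With these constants, each branch of Eq. (11) restarts close to 1 just after a phase boundary, so the profile is a sequence of decaying bursts rather than a single monotone decay. A detector that only tracks a steadily fading signal could therefore miss the later phases.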

3.3. Multi-Layer Defense Framework

Our defense strategy employs a sophisticated three-tier detection architecture that combines temporal behavioral analysis, robust statistical aggregation, and multi-scale validation protocols to counter the identified attack vectors while maintaining theoretical guarantees and computational efficiency.

3.3.1. Temporal Behavioral Analysis Layer

The temporal analysis layer monitors client behavioral patterns across communication rounds through comprehensive statistical profiling and anomaly detection mechanisms incorporating both individual client dynamics and cross-client correlation analysis.

For each client $c_{i}$, we construct a multi-dimensional behavioral profile vector encompassing temporal features, historical patterns, connectivity measures, and uncertainty quantification:

$$b_{i}^{t} = [F_{i}^{t}, H_{i}^{t}, C_{i}^{t}, U_{i}^{t}, R_{i}^{t}], \tag{12}$$

where $F_{i}^{t}$ represents temporal features, $H_{i}^{t}$ encodes historical patterns, $C_{i}^{t}$ models connectivity, $U_{i}^{t}$ quantifies uncertainty, and $R_{i}^{t}$ captures resource utilization patterns.

The temporal feature vector $F_{i}^{t} \in \mathbb{R}^{d_{f}}$ contains multi-scale statistical features extracted from recent communication windows:

$$F_{i}^{t} = [\mu_{i}^{t}, \sigma_{i}^{t}, \varsigma_{i}^{t}, \kappa_{i}^{t}, F_{\omega,i}^{t}, F_{\psi,i}^{t}, F_{\xi,i}^{t}], \tag{13}$$

where the statistical moments of the update norms (mean, standard deviation, smoothed skewness, and smoothed excess kurtosis) are computed across multiple time scales:

$$\mu_{i}^{t} = \frac{1}{w_{f}} \sum_{\tau = t - w_{f} + 1}^{t} \| \theta_{i}^{\tau} \|_{F} + \frac{1}{w_{s}} \sum_{\tau = t - w_{s} + 1}^{t - w_{f}} \| \theta_{i}^{\tau} \|_{F}, \tag{14}$$

$$\sigma_{i}^{t} = \sqrt{ \frac{1}{w_{f} - 1} \sum_{\tau = t - w_{f} + 1}^{t} \left( \| \theta_{i}^{\tau} \|_{F} - \mu_{i}^{t} \right)^{2} + \epsilon_{\sigma} }, \tag{15}$$

$$\varsigma_{i}^{t} = \frac{1}{w_{f}} \sum_{\tau = t - w_{f} + 1}^{t} \left( \frac{ \| \theta_{i}^{\tau} \|_{F} - \mu_{i}^{t} }{ \sigma_{i}^{t} } \right)^{3} + \rho_{\varsigma}\, \varsigma_{i}^{t-1}, \tag{16}$$

$$\kappa_{i}^{t} = \frac{1}{w_{f}} \sum_{\tau = t - w_{f} + 1}^{t} \left( \frac{ \| \theta_{i}^{\tau} \|_{F} - \mu_{i}^{t} }{ \sigma_{i}^{t} } \right)^{4} - 3 + \rho_{\kappa}\, \kappa_{i}^{t-1}, \tag{17}$$

where $w_{f}$ and $w_{s}$ are the short and long window sizes, $\epsilon_{\sigma}$ prevents numerical instability, $\rho_{\varsigma}, \rho_{\kappa} \in (0, 1)$ provide temporal smoothing, $F_{\omega,i}^{t} \in \mathbb{R}^{d_{\omega}}$ contains dominant frequency components from spectral analysis, $F_{\psi,i}^{t} \in \mathbb{R}^{d_{\psi}}$ includes wavelet coefficients, and $F_{\xi,i}^{t} \in \mathbb{R}^{d_{\xi}}$ represents spectral entropy measures.
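The moment features of Eqs. (14) to (17) can be sketched in a few lines of Python. This is a minimal illustration that follows the formulas term by term for a single client, given its history of update norms; the function name, the default smoothing factors, and the example values are assumptions, and the spectral and wavelet features are not covered.

```python
import math


def window_moments(norms, w_f, w_s, prev_skew=0.0, prev_kurt=0.0,
                   rho_skew=0.5, rho_kurt=0.5, eps=1e-8):
    """Return (mu, sigma, skew, kurt) for the latest round, per Eqs. (14)-(17).

    `norms` lists the update norms ||theta_i^tau||_F, oldest first.
    """
    if not (2 <= w_f < w_s <= len(norms)):
        raise ValueError("need 2 <= w_f < w_s <= len(norms)")
    fast = norms[-w_f:]          # rounds t - w_f + 1 .. t
    slow = norms[-w_s:-w_f]      # rounds t - w_s + 1 .. t - w_f
    # Eq. (14): short-window mean plus the older rounds scaled by 1 / w_s.
    mu = sum(fast) / w_f + sum(slow) / w_s
    # Eq. (15): sample deviation over the short window, stabilized by eps.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in fast) / (w_f - 1) + eps)
    z = [(x - mu) / sigma for x in fast]
    # Eqs. (16)-(17): standardized moments smoothed with the previous values.
    skew = sum(v ** 3 for v in z) / w_f + rho_skew * prev_skew
    kurt = sum(v ** 4 for v in z) / w_f - 3.0 + rho_kurt * prev_kurt
    return mu, sigma, skew, kurt


# Hypothetical norm history: four quiet rounds followed by two active ones.
mu, sigma, skew, kurt = window_moments([0.0, 0.0, 0.0, 0.0, 1.0, 3.0], w_f=2, w_s=6)
```

Calling the function once per round and passing the previous skewness and kurtosis back in as `prev_skew` and `prev_kurt` reproduces the recursive smoothing in Eqs. (16) and (17).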

The anomaly detection mechanism employs multi-scale sliding window analysis with adaptive thresholding:

$$A_{i}^{t} = \bigvee_{w \in W} \left[ \frac{1}{w} \sum_{\tau = t - w + 1}^{t} \left\| b_{i}^{\tau} - \mu_{i}^{t-w} \right\|_{\Sigma_{i}^{t-w}} > \tau_{\alpha}^{w} \right], \tag{18}$$

where $W = \{w_{1}, w_{2}, \dots, w_{L}\}$ contains multiple window sizes, $\mu_{i}^{t-w}$ and $\Sigma_{i}^{t-w}$ represent the historical mean and covariance computed through robust estimation, $\| \cdot \|_{\Sigma}$ denotes the Mahalanobis distance, and $\tau_{\alpha}^{w}$ are scale-specific detection thresholds.
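The multi-scale test of Eq. (18) can be illustrated with the following Python sketch. It makes two simplifying assumptions that are not in the original: the covariance is treated as diagonal, and the historical mean and variance are plain sample estimates rather than robust ones. The function names, thresholds, and profiles are hypothetical.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mean, var)))


def anomaly_flag(profiles, windows, thresholds, eps=1e-8):
    """Eq. (18): True if any window's mean distance exceeds its threshold.

    `profiles` lists the behavioral vectors b_i^tau, oldest first.
    """
    for w in windows:
        history, recent = profiles[:-w], profiles[-w:]
        if len(history) < 2:
            continue  # not enough history to estimate statistics at this scale
        dim = len(history[0])
        mean = [sum(p[d] for p in history) / len(history) for d in range(dim)]
        var = [sum((p[d] - mean[d]) ** 2 for p in history) / (len(history) - 1) + eps
               for d in range(dim)]
        score = sum(mahalanobis_diag(p, mean, var) for p in recent) / w
        if score > thresholds[w]:
            return True
    return False


# Hypothetical two-dimensional profiles that alternate between two benign values.
benign = [[1.0 + 0.1 * (k % 2), 2.0 - 0.1 * (k % 2)] for k in range(12)]
attacked = benign[:-2] + [[3.0, 0.0], [3.0, 0.0]]
limits = {2: 3.0, 4: 3.0}
benign_flag = anomaly_flag(benign, [2, 4], limits)
attack_flag = anomaly_flag(attacked, [2, 4], limits)
```

The logical OR over window sizes means that a short burst can trip the small window while a slow drift can trip the large one, which is the motivation for using several scales at once.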

For multi-period attack detection, we implement a pattern matching algorithm based on dynamic time warping (DTW):

$$P_{i}^{t} = \max_{P \in P_{\Theta}} \mathrm{DTW}\left( \{A_{i}^{\tau}\}_{\tau = t - T_{P} + 1}^{t},\; P \right), \tag{19}$$

where $P_{\Theta}$ contains known attack pattern templates and DTW computes optimal alignment scores accounting for temporal distortions.

The DTW distance is computed through dynamic programming with warping constraints:

$$\mathrm{DTW}(X, Y) = D[n_{x}, n_{y}], \tag{20}$$

$$D[i, j] = d(x_{i}, y_{j}) + \min \{ D[i-1, j],\; D[i, j-1],\; D[i-1, j-1] \}, \tag{21}$$

$$\text{s.t.} \quad |i - j| \leq W_{warp}, \tag{22}$$

where $d(x_{i}, y_{j})$ is the local distance function, $n_{x}$ and $n_{y}$ are the sequence lengths, and $W_{warp}$ constrains warping flexibility.
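The recurrence of Eqs. (20) to (22) corresponds to the standard banded dynamic program, sketched below in Python. The function name, the absolute-difference local distance, and the boundary condition $D[0, 0] = 0$ with all other border cells set to infinity are assumptions, since the text does not state them.

```python
import math


def dtw_distance(x, y, warp, dist=lambda a, b: abs(a - b)):
    """DTW distance D[n_x, n_y] restricted to the band |i - j| <= warp."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # assumed boundary condition
    for i in range(1, n + 1):
        # Only cells inside the warping band of Eq. (22) are filled in.
        for j in range(max(1, i - warp), min(m, i + warp) + 1):
            D[i][j] = dist(x[i - 1], y[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    return D[n][m]


# Hypothetical sequences: the first is the second with its opening value held longer.
stretched = dtw_distance([0, 0, 1, 2], [0, 1, 2], warp=1)
rigid = dtw_distance([0, 0, 0], [1, 1, 1], warp=0)
```

With `warp=0` only the diagonal is allowed, so the result reduces to a sum of point-wise distances. If the band is narrower than the difference in sequence lengths, the final cell is unreachable and the function returns infinity.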

3.3.2. Robust Statistical Aggregation Layer

The aggregation layer employs Byzantine-robust techniques enhanced with temporal consistency constraints and multi-dimensional filtering to identify and mitigate malicious model updates while preserving the convergence properties of the global optimization process. The robust aggregation mechanism operates through sequential filtering stages incorporating geometric median estimation, distance-based outlier detection, and weighted consensus formation. The robust central tendency is established through iterative geometric median estimation:

$$\hat{\theta}_{\nu}^{t} = \arg\min_{\theta} \sum_{i=1}^{N} \| \theta - \theta_{i}^{t} \|_{F}, \tag{23}$$

solved using the accelerated Weiszfeld algorithm with momentum-based convergence enhancement:

$$\theta^{(k+1)} = \frac{ \sum_{i=1}^{N} \dfrac{ \theta_{i}^{t} }{ \| \theta^{(k)} - \theta_{i}^{t} \|_{F} + \epsilon } }{ \sum_{i=1}^{N} \dfrac{ 1 }{ \| \theta^{(k)} - \theta_{i}^{t} \|_{F} + \epsilon } }, \tag{24}$$

$$\theta^{(k+1)} \leftarrow \theta^{(k+1)} + \alpha_{k} \left( \theta^{(k+1)} - \theta^{(k)} \right), \tag{25}$$

where

ϵ

prevents numerical instabilities and

α_{k}

provides adaptive momentum acceleration. Suspicious clients are identified through sophisticated distance-based analysis incorporating multiple statistical measures:

S^{t} = \{c_{i} : {∥ θ_{i}^{t} - {\hat{θ}}_{ν}^{t} ∥}_{F} > Q_{1 - α} ({d_{j}^{t}}_{j = 1}^{N}) + β \cdot IQR ({d_{j}^{t}}_{j = 1}^{N}) \lor A_{i}^{t} = True\},

(26)

where

d_{j}^{t} = {∥ θ_{j}^{t} - {\hat{θ}}_{ν}^{t} ∥}_{F}

,

Q_{1 - α}

denotes the

(1 - α)

-quantile, IQR represents the interquartile range,

β > 0

controls filtering sensitivity, and

A_{i}^{t}

incorporates temporal anomaly detection results. The final global model incorporates temporal consistency constraints and reputation-based weighting:

θ^{t} = arg min_{θ} \sum_{i \notin S^{t}} w_{i}^{t} {∥ θ - θ_{i}^{t} ∥}_{F}^{2} + λ_{τ} {∥ θ - θ^{t - 1} ∥}_{F}^{2} + λ_{ν} {∥ θ - {\hat{θ}}_{ν}^{t} ∥}_{F}^{2},

(27)

where the client weights incorporate multi-factor scoring:

w_{i}^{t} = \frac{exp (- γ \cdot (A_{i}^{t} + I [c_{i} \in S^{t}] + R_{i}^{t}))}{\sum_{j \notin S^{t}} exp (- γ \cdot (A_{j}^{t} + I [c_{j} \in S^{t}] + R_{j}^{t}))},

(28)

and the reputation score

R_{i}^{t}

captures historical behavior patterns through multi-scale temporal weighting:

R_{i}^{t} = \sum_{k = 1}^{K_{r}} ω_{k} \sum_{τ = max (1, t - w_{k})}^{t - 1} exp (- \frac{t - τ}{σ_{k}}) I [c_{i} flagged at round τ],

(29)

where

K_{r}

is the number of temporal scales,

ω_{k}

are scale-specific weights,

w_{k}

are window sizes, and

σ_{k}

control exponential decay rates.
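To make the aggregation steps concrete, the following Python sketch implements the Weiszfeld iteration of Equations (24)-(25) and the distance-based part of the filter in Equation (26). It is a sketch under stated assumptions: updates are flat lists of floats, the quantile uses linear interpolation, the momentum defaults to zero because the extrapolated iteration is not guaranteed to converge for arbitrary momentum values, and the temporal anomaly term of Equation (26) is omitted. All function names are hypothetical.

```python
import math

def l2(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def geometric_median(points, eps=1e-8, momentum=0.0, iters=100, tol=1e-9):
    """Weiszfeld iteration (Eq. 24) with an optional extrapolation step (Eq. 25)."""
    dim = len(points[0])
    cur = [sum(p[k] for p in points) / len(points) for k in range(dim)]  # start at the mean
    for _ in range(iters):
        inv = [1.0 / (l2(cur, p) + eps) for p in points]  # eps avoids division by zero
        total = sum(inv)
        nxt = [sum(w * p[k] for w, p in zip(inv, points)) / total for k in range(dim)]
        nxt = [n + momentum * (n - c) for n, c in zip(nxt, cur)]
        if l2(nxt, cur) < tol:
            return nxt
        cur = nxt
    return cur

def quantile(values, q):
    """Linear-interpolation quantile (one of several common conventions)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def flag_outliers(points, center, alpha=0.25, beta=1.5):
    """Indices whose distance to the center exceeds Q_{1-alpha} + beta * IQR."""
    d = [l2(p, center) for p in points]
    iqr = quantile(d, 0.75) - quantile(d, 0.25)
    threshold = quantile(d, 1.0 - alpha) + beta * iqr
    return [i for i, di in enumerate(d) if di > threshold]

# Four benign updates and one far-away (made-up) malicious update.
updates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
center = geometric_median(updates)
suspects = flag_outliers(updates, center)
```

Unlike the arithmetic mean, which the single distant update pulls to roughly (20.4, 20.4), the geometric median stays close to the benign cluster, so the distant update is the only one flagged.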

3.3.3. Multi-Scale Validation Layer

The validation layer provides continuous monitoring of global model integrity through clean performance tracking, backdoor trigger detection, and coordinated response protocols incorporating sophisticated statistical testing and model rollback mechanisms. We maintain a held-out validation set

D_{υ}

and track performance degradation through robust statistical testing:

L_{c}^{t} = \frac{1}{| D_{υ} |} \sum_{(x, y) \in D_{υ}} l (f_{θ^{t}} (x), y),

(30)

with alert generation through sequential hypothesis testing:

A_{c}^{t} = I [L_{c}^{t} > L_{c}^{t - 1} + τ_{c}] \lor I [L_{c}^{t} > μ_{β} + σ_{β} \cdot z_{α_{c}}],

(31)

where

μ_{β}, σ_{β}

characterize historical performance distribution and

z_{α_{c}}

is the critical value for significance level

α_{c}

. We maintain a trigger detection dataset

D_{ξ} = {(x_{j} + δ_{k}, y_{j})}

covering potential trigger patterns and monitor:

L_{β}^{t} = max_{y_{τ}} \frac{1}{| D_{ξ} |} \sum_{(x_{ξ}, y_{π}) \in D_{ξ}} I [f_{θ^{t}} (x_{ξ}) = y_{τ}],

(32)

with backdoor detection alert:

A_{β}^{t} = I [L_{β}^{t} > τ_{β}] \lor I [\frac{L_{β}^{t}}{L_{c}^{t}} > τ_{ρ}],

(33)

where

τ_{β}

and

τ_{ρ}

are detection thresholds for absolute and relative trigger activation rates. Upon alert generation, the system implements graduated response protocols through state transition mechanisms:

R^{t} = \{\begin{matrix} M_{e} & if A_{c}^{t} \land \neg A_{β}^{t}, \\ I_{c} & if A_{β}^{t} \land P_{i}^{t} > τ_{π}, \\ R_{θ} & if A_{β}^{t} \land L_{β}^{t} > τ_{κ}, \\ N_{o} & otherwise, \end{matrix}

(34)

where

M_{e}

denotes enhanced monitoring,

I_{c}

represents client investigation,

R_{θ}

implements model rollback, and

N_{o}

continues normal operation. The model rollback mechanism employs temporal checkpointing with integrity verification:

θ^{t} = \{\begin{matrix} θ^{t - k_{★}} & if R^{t} = R_{θ} \land V (θ^{t - k_{★}}) > τ_{υ}, \\ θ^{t - \hat{k}} with \hat{k} = arg max_{k \leq K_{max}} V (θ^{t - k}) & if R^{t} = R_{θ} \land V (θ^{t - k_{★}}) \leq τ_{υ}, \\ θ^{t} & otherwise, \end{matrix}

(35)

where

k_{★}

is determined by the severity of the detected anomaly,

V (\cdot)

quantifies model integrity through comprehensive validation metrics, and

K_{max}

bounds the maximum rollback distance.
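The alert logic of Equations (31) and (33) and the graduated response of Equation (34) can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the cases of Equation (34) are evaluated top to bottom, so the first matching case wins, and the per-client pattern score is summarized by its maximum over clients. The enum and function names are hypothetical.

```python
from enum import Enum

class Response(Enum):
    NORMAL = "N_o"
    ENHANCED_MONITORING = "M_e"
    CLIENT_INVESTIGATION = "I_c"
    ROLLBACK = "R_theta"

def clean_alert(loss_t, loss_prev, tau_c, mu_hist, sigma_hist, z_crit):
    """Eq. (31): sudden jump in clean loss, or loss above the historical band."""
    return loss_t > loss_prev + tau_c or loss_t > mu_hist + sigma_hist * z_crit

def backdoor_alert(trigger_rate, clean_loss, tau_beta, tau_rho):
    """Eq. (33): absolute or relative trigger activation above threshold."""
    relative = clean_loss > 0 and trigger_rate / clean_loss > tau_rho
    return trigger_rate > tau_beta or relative

def select_response(a_c, a_beta, max_pattern_score, trigger_rate, tau_pi, tau_kappa):
    """Eq. (34), with cases checked in the order they are listed (assumption)."""
    if a_c and not a_beta:
        return Response.ENHANCED_MONITORING
    if a_beta and max_pattern_score > tau_pi:
        return Response.CLIENT_INVESTIGATION
    if a_beta and trigger_rate > tau_kappa:
        return Response.ROLLBACK
    return Response.NORMAL
```

Because the investigation and rollback conditions can hold at the same time, the evaluation order determines which response is chosen; a deployment would need to fix that order explicitly.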

3.4. MDP Framework for Defense Coordination

We formulate the temporal backdoor defense problem as a Markov Decision Process (MDP) defined as

M_{d} = (S_{d}, A_{d}, P_{d}, R_{d}, γ)

, whose components (state space, action space, transition dynamics, reward function, and discount factor) are defined in the following subsections.

3.4.1. Defense State Space Design

The defense environment state

s_{t}^{d} \in S_{d}

at communication round t encompasses comprehensive information about the federated system security status through multiple information channels:

s_{t}^{d} = {B_{t}, D_{t}, A_{t}, S_{t}, U_{t}, R_{t}, G_{t}},

(36)

where

B_{t}

represents client behavioral profiles,

D_{t}

encodes detection history,

A_{t}

models anomaly indicators,

S_{t}

maintains security metrics,

U_{t}

quantifies uncertainty measures,

R_{t}

captures resource utilization, and

G_{t}

represents global system statistics.

The client behavioral profile matrix

B_{t} \in R^{N \times d_{b}}

contains comprehensive behavioral features for all clients:

B_{t} [i, :] = [b_{i}^{t}, Δ b_{i}^{t}, ∥ b_{i}^{t} - b_{i}^{t - 1} ∥_{2}, rank (b_{i}^{t}), B_{ρ, i}^{t}],

(37)

where

b_{i}^{t}

is the current behavioral profile,

Δ b_{i}^{t}

represents temporal changes, the norm captures profile stability,

rank (b_{i}^{t})

indicates behavioral complexity, and

B_{ρ, i}^{t}

encodes cross-client correlations.

The detection history matrix

D_{t} \in R^{N \times d_{d}}

maintains weighted information about previous detection decisions with multi-scale temporal decay:

D_{t} [i, :] = [\sum_{k = 1}^{K_{d}} ω_{d, k} \sum_{τ = max (1, t - w_{d, k})}^{t - 1} ω_{d}^{(t - τ) / k} \cdot I [{\hat{s}}_{i}^{τ} = s_{p}], R_{i}^{t}, C_{i}^{t}, V_{i}^{t}],

(38)

where

K_{d}

is the number of temporal scales,

ω_{d, k}

are scale-specific weights,

w_{d, k}

are scale-specific window sizes,

ω_{d} \in (0, 1)

provides exponential temporal weighting,

R_{i}^{t}

represents reputation scores,

C_{i}^{t}

captures confidence levels, and

V_{i}^{t}

quantifies detection variance.

The anomaly indicator vector

A_{t} \in R^{N}

captures multiple types of anomalies through ensemble-based detection:

A_{t} [i] = α_{a} A_{τ, i}^{t} + β_{a} A_{σ, i}^{t} + γ_{a} A_{γ, i}^{t} + δ_{a} A_{ς, i}^{t},

(39)

where

α_{a}, β_{a}, γ_{a}, δ_{a}

are weighting coefficients,

A_{τ, i}^{t}

represents temporal anomalies,

A_{σ, i}^{t}

captures statistical anomalies,

A_{γ, i}^{t}

indicates geometric median deviations, and

A_{ς, i}^{t}

quantifies spectral anomalies.

The security metrics vector

S_{t} \in R^{d_{s}}

tracks system-wide security indicators:

S_{t} = [ξ_{t}^{g}, ξ_{t}^{h}, ξ_{t}^{c}, ξ_{t}^{z}, ξ_{t}^{θ}, ξ_{t}^{r}],

(40)

where

ξ_{t}^{g}

measures global model integrity,

ξ_{t}^{h}

quantifies information entropy,

ξ_{t}^{c}

captures aggregation consensus,

ξ_{t}^{z}

indicates system stability,

ξ_{t}^{θ}

represents threat level assessment, and

ξ_{t}^{r}

measures defense resilience.

3.4.2. Defense Action Space Formulation

The defense action space

A_{d}

encompasses coordinated defense decisions, resource allocation, and temporal control through a hierarchical structure:

A_{d} = A_{δ} \times A_{μ} \times A_{α} \times A_{ζ},

(41)

where

A_{δ}

represents detection actions,

A_{μ}

determines mitigation strategies,

A_{α}

controls resource allocation, and

A_{ζ}

manages temporal adaptations.

The detection action space includes comprehensive detection strategies:

A_{δ} = {a_{m}, a_{i}, a_{v}, a_{p}, a_{q}, a_{e}}^{N},

(42)

where

a_{m}

denotes passive monitoring,

a_{i}

represents detailed inspection,

a_{v}

indicates verification protocols,

a_{p}

triggers active probing,

a_{q}

implements temporary isolation, and

a_{e}

enforces permanent exclusion.

The mitigation action space operates on multiple defense strategies with coordination constraints:

\begin{matrix} A_{μ} & = {μ \in {[0, 1]}^{N \times M} : \sum_{j = 1}^{M} μ_{i, j} = 1 \forall i, \sum_{i = 1}^{N} μ_{i, j} \leq C_{j} \forall j}, \end{matrix}

(43)

\begin{matrix} C_{ρ} & = max_{i, j, k, l} \frac{μ_{i, j}}{μ_{k, l}} \leq ζ_{ρ}, \end{matrix}

(44)

where

μ_{i, j}

represents the intensity of mitigation strategy j applied to client i, M is the number of mitigation strategies including: enhanced monitoring (

j = 1

), gradient clipping (

j = 2

), noise injection (

j = 3

), weight decay regularization (

j = 4

), and adaptive learning rate scaling (

j = 5

),

C_{j}

bounds the total capacity for strategy j,

C_{ρ}

ensures coordination constraints, and

ζ_{ρ}

limits mitigation disparity.

The resource allocation space manages computational and communication resources across defense mechanisms:

\begin{matrix} A_{α} & = {ρ \in {[0, 1]}^{D \times R} : \sum_{r = 1}^{R} ρ_{d, r} = 1 \forall d, ρ_{d, r} \geq ρ_{min} \forall d, r}, \end{matrix}

(45)

\begin{matrix} E_{η} & = \sum_{d = 1}^{D} \sum_{r = 1}^{R} ρ_{d, r} F_{d, r} (S_{t}) \geq E_{min}, \end{matrix}

(46)

where

ρ_{d, r}

represents the fraction of resource type r allocated to defense mechanism d, D is the number of defense mechanisms including: temporal analysis (

d = 1

), statistical aggregation (

d = 2

), and validation monitoring (

d = 3

), R is the number of resource types including: CPU computation (

r = 1

), memory storage (

r = 2

), and communication bandwidth (

r = 3

),

ρ_{min}

ensures minimum allocation,

F_{d, r}

quantifies efficiency functions, and

E_{min}

guarantees minimum defense effectiveness.

The temporal adaptation action space

A_{ζ}

controls dynamic client participation and weight acceptance decisions:

A_{ζ} = {ζ \in {0, 1}^{N} : \sum_{i = 1}^{N} ζ_{i} \geq N_{min}},

(47)

where

ζ_{i} \in {0, 1}

indicates whether to accept model weights from client

c_{i}

at the current round (1 for accept, 0 for reject), and

N_{min}

ensures minimum participation for convergence. The temporal adaptation decision for each client is governed by:

ζ_{i}^{t} = \{\begin{matrix} 0 & if A_{i}^{t} = True \land P_{i}^{t} > τ_{ζ}, \\ 0 & if c_{i} \in S^{t} \land R_{i}^{t} > τ_{ρ}, \\ Bernoulli (p_{i}^{t}) & if A_{i}^{t} = True \land P_{i}^{t} \leq τ_{ζ}, \\ 1 & otherwise, \end{matrix}

(48)

where

τ_{ζ}

and

τ_{ρ}

are rejection thresholds for pattern-matching scores and reputation scores, respectively, and the stochastic acceptance probability is:

p_{i}^{t} = σ (α_{ζ} - β_{ζ} A_{i}^{t} - γ_{ζ} R_{i}^{t} - δ_{ζ} ∥ θ_{i}^{t} - {\hat{θ}}_{ν}^{t} ∥_{F}),

(49)

where

σ (\cdot)

is the sigmoid function, and

α_{ζ}, β_{ζ}, γ_{ζ}, δ_{ζ}

are learned parameters that balance acceptance probability based on anomaly levels, reputation, and parameter deviation.
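The per-client participation rule of Equations (48)-(49) can be sketched in Python as shown here. The coefficient defaults are placeholder values chosen for illustration (the paper treats them as learned parameters), the Boolean anomaly flag is cast to 0 or 1 inside the sigmoid, and the minimum-participation constraint of Equation (47) is not enforced in this sketch. The function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def acceptance_probability(anomaly, reputation, deviation,
                           a=2.0, b=1.0, c=1.0, d=0.5):
    """Eq. (49); a, b, c, d stand in for the learned coefficients."""
    return sigmoid(a - b * anomaly - c * reputation - d * deviation)

def participation(anomalous, pattern_score, in_suspect_set, reputation,
                  deviation, tau_zeta, tau_rho, rng=random):
    """Eq. (48): returns 1 to accept the client's update, 0 to reject it."""
    if anomalous and pattern_score > tau_zeta:
        return 0
    if in_suspect_set and reputation > tau_rho:
        return 0
    if anomalous:  # anomalous, but pattern score at or below the threshold
        p = acceptance_probability(1.0, reputation, deviation)
        return 1 if rng.random() < p else 0
    return 1
```

Passing a seeded `random.Random` instance as `rng` makes the stochastic branch reproducible across runs.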

3.4.3. Defense Transition Dynamics

The state transition probabilities

P_{d} : S_{d} \times A_{d} \times S_{d} \to [0, 1]

capture the stochastic evolution of the defense environment under coordinated security actions:

P_{d} (s_{t + 1}^{d} | s_{t}^{d}, a_{t}^{d}) = \prod_{j = 1}^{| s |} P_{j} (s_{t + 1, j}^{d} | s_{t}^{d}, a_{t}^{d}),

(50)

where the factorization assumes conditional independence across state components given the current state and action.

The behavioral profile transition dynamics incorporate temporal evolution and defense interventions:

\begin{matrix} P_{B} (B_{t + 1} | s_{t}^{d}, a_{t}^{d}) & = \prod_{i = 1}^{N} N (b_{i, t + 1} | μ_{b, i}^{t} + W_{b} a_{δ, i}^{t}, Σ_{b, i}^{t}), \end{matrix}

(51)

\begin{matrix} μ_{b, i}^{t} & = A_{b} b_{i}^{t} + B_{b} h_{i}^{t} + c_{b}, \end{matrix}

(52)

\begin{matrix} h_{i}^{t} & = tanh (U_{h} b_{i}^{t - 1} + V_{h} a_{δ, i}^{t - 1} + b_{h}), \end{matrix}

(53)

where

A_{b}, B_{b}, W_{b}

are learned transition matrices,

h_{i}^{t}

represents latent behavioral states,

U_{h}, V_{h}

control temporal dependencies, and

Σ_{b, i}^{t}

captures uncertainty in behavioral evolution.

The detection history transitions incorporate memory decay and decision outcomes:

P_{D} (D_{t + 1} | s_{t}^{d}, a_{t}^{d}) = \prod_{i = 1}^{N} δ (d_{i, t + 1} - T_{d} (d_{i, t}, a_{δ, i}^{t}, ω_{t})),

(54)

where

δ (\cdot)

is the Dirac delta function,

T_{d}

represents the deterministic update function, and

ω_{t}

captures environmental stochasticity.

3.4.4. Defense Reward Function Design

The defense reward function

R_{d} : S_{d} \times A_{d} \times S_{d} \to R

incorporates multiple defense objectives through a sophisticated multi-criteria framework:

\begin{matrix} R_{d} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}) & = λ_{1} \sum_{i = 1}^{N} R_{δ, i} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}) + λ_{2} \sum_{i = 1}^{N} R_{μ, i} (s_{t}^{d}, a_{t}^{d}) \\ + λ_{3} R_{σ} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}) + λ_{4} R_{η} (a_{t}^{d}) \\ - λ_{5} R_{κ} (a_{t}^{d}) - λ_{6} R_{ϕ} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}), \end{matrix}

(55)

where the reward components capture detection accuracy, mitigation effectiveness, security improvement, operational efficiency, resource costs, and system disruption.

The detection reward incorporates accuracy, timeliness, and confidence weighting:

\begin{matrix} R_{δ, i} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}) & = I [s_{i}^{*} = s_{p}] \cdot I [a_{δ, i} \in {a_{i}, a_{v}, a_{p}}] \cdot η_{δ} \\ \cdot e^{- λ_{τ} (t - t_{π, i})} \cdot (1 - U_{t} [i]) \cdot W_{ν}^{i}, \end{matrix}

(56)

where

s_{i}^{*}

represents the true client state,

η_{δ}

is the base detection reward,

λ_{τ}

controls temporal decay,

t_{π, i}

is the actual attack start time,

U_{t} [i]

quantifies detection uncertainty, and

W_{ν}^{i}

captures network effect weighting.

The mitigation reward measures the effectiveness of applied defense strategies:

R_{μ, i} (s_{t}^{d}, a_{t}^{d}) = \sum_{j = 1}^{M} μ_{i, j} \cdot E_{j} (S_{t}, A_{t} [i]) \cdot I [a_{δ, i} \neq a_{m}] \cdot η_{μ},

(57)

where

E_{j}

quantifies the effectiveness of mitigation strategy j given system state and anomaly levels, and

η_{μ}

is the base mitigation reward.

The security reward tracks overall system security improvement:

R_{σ} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d}) = \sum_{k = 1}^{d_{s}} ω_{k}^{σ} max (0, S_{t + 1} [k] - S_{t} [k]) + η_{σ} I_{θ}^{t},

(58)

where

ω_{k}^{σ}

are security metric weights,

S_{t} [k]

represents individual security metrics,

η_{σ}

is the threat mitigation reward, and

I_{θ}^{t}

indicates successful threat neutralization.
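As a worked illustration of Equations (56) and (58), the following Python sketch evaluates the detection and security reward terms for a single client and a single transition. The action labels, default coefficients, and function names are hypothetical placeholders; only the functional form follows the equations.

```python
import math

ACTIVE_DETECTION = {"inspect", "verify", "probe"}  # stand-ins for a_i, a_v, a_p

def detection_reward(is_poisoned, action, t, t_attack, uncertainty,
                     network_weight, eta=1.0, decay=0.1):
    """Eq. (56): rewarded only when a poisoned client receives an active check."""
    if not (is_poisoned and action in ACTIVE_DETECTION):
        return 0.0
    timeliness = math.exp(-decay * (t - t_attack))  # earlier detection pays more
    return eta * timeliness * (1.0 - uncertainty) * network_weight

def security_reward(s_next, s_cur, weights, neutralized, eta_sigma=1.0):
    """Eq. (58): weighted positive improvements plus a neutralization bonus."""
    gain = sum(w * max(0.0, a - b) for w, a, b in zip(weights, s_next, s_cur))
    return gain + eta_sigma * (1.0 if neutralized else 0.0)
```

The `max(0, .)` term means that a decline in a security metric is not penalized by this component; penalties enter the total reward of Equation (55) through the cost and disruption terms.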

3.4.5. Defense Policy Optimization

The optimal defense policy

π^{*} : S_{d} \to A_{d}

is learned through advanced reinforcement learning techniques incorporating temporal credit assignment and multi-objective optimization:

π^{*} = arg max_{π} E_{τ \sim π} [\sum_{t = 0}^{T - 1} γ^{t} R_{d} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d})],

(59)

where

τ

represents a trajectory,

γ \in (0, 1)

is the discount factor, and the expectation is taken over the policy-induced distribution.

The policy optimization employs actor-critic architecture with attention mechanisms:

\begin{matrix} π_{θ} (a_{t}^{d} | s_{t}^{d}) & = softmax (W_{π} h_{π}^{t} + b_{π}), \end{matrix}

(60)

\begin{matrix} h_{π}^{t} & = Attention (Q_{π}, K_{π}, V_{π}) + f_{π} (s_{t}^{d}), \end{matrix}

(61)

\begin{matrix} V_{ϕ} (s_{t}^{d}) & = W_{V} h_{V}^{t} + b_{V}, \end{matrix}

(62)

where

h_{π}^{t}, h_{V}^{t}

are attention-enhanced hidden representations,

f_{π}

encodes state features, and

θ, ϕ

are learnable parameters.

Algorithm 1 DEFEND: DEep Federated Ensemble Network Defense
1: Input: Client updates ${θ_{i}^{t}}_{i = 1}^{N}$ , historical profiles ${b_{i}^{t - 1}}_{i = 1}^{N}$ , defense state $s_{t - 1}^{d}$ , hyperparameters $λ_{τ}, λ_{ν}, γ$ , window sizes $W$ , detection thresholds ${τ_{α}^{w}}$ ;
2: Output: Aggregated model $θ^{t}$ , updated profiles ${b_{i}^{t}}_{i = 1}^{N}$ , defense action $a_{t}^{d}$ ;
3: for each client $c_{i}$ do
4: Compute temporal profile $b_{i}^{t}$	▹ Equation (12)
5: Calculate statistical moments ${μ_{i}^{t}, σ_{i}^{t}, ς_{i}^{t}, κ_{i}^{t}}$	▹ Equation (14)-(17)
6: Detect temporal anomalies $A_{i}^{t}$	▹ Equation (18)
7: Check multi-period patterns $P_{i}^{t}$	▹ Equation (19)
8: end for
9: Construct defense state $s_{t}^{d}$	▹ Equation (36)
10: Select defense action $a_{t}^{d} \sim π_{θ} (a_{t}^{d} | s_{t}^{d})$	▹ Equation (60)
11: Determine client participation ${ζ_{i}^{t}}_{i = 1}^{N}$	▹ Equation (48)
12: Compute geometric median ${\hat{θ}}_{ν}^{t}$ for active clients	▹ Equation (23)
13: Apply Weiszfeld algorithm with updates	▹ Equation (24)-(25)
14: Perform outlier detection to identify $S^{t}$	▹ Equation (26)
15: Calculate client weights ${w_{i}^{t}}$ for participating clients	▹ Equation (28)
16: Execute weighted aggregation $θ^{t}$	▹ Equation (27)
17: Monitor clean performance $L_{c}^{t}$	▹ Equation (30)
18: Check trigger responses $L_{β}^{t}$	▹ Equation (32)
19: Generate alerts ${A_{c}^{t}, A_{β}^{t}}$	▹ Equation (31)-(33)
20: if $A_{c}^{t} \lor A_{β}^{t}$ then
21: Execute graduated response $R^{t}$	▹ Equation (34)
22: Update reputation scores	▹ Equation (29)
23: if $R^{t} = R_{θ}$ then
24: Perform model rollback	▹ Equation (35)
25: end if
26: end if
27: Compute reward $r_{t}^{d} = R_{d} (s_{t}^{d}, a_{t}^{d}, s_{t + 1}^{d})$	▹ Equation (55)
28: Update policy parameters $θ$ and value function $ϕ$ using PPO
29: Store experience tuple $(s_{t}^{d}, a_{t}^{d}, r_{t}^{d}, s_{t + 1}^{d})$ in replay buffer
30: return $θ^{t}$ , ${b_{i}^{t}}_{i = 1}^{N}$ , $a_{t}^{d}$ ;

Algorithm 1 integrates all components of the defense framework into a single procedure that executes at each communication round through coordinated multi-layer processing and decision making.

We analyze the computational complexity of Algorithm 1 by examining each component separately and providing theoretical bounds for the overall framework execution. The computation of behavioral profiles for all clients in lines 4-5 requires

O (N \cdot d_{f} \cdot w_{max})

operations, where

d_{f}

is the feature dimension and

w_{max} = max (w_{f}, w_{s})

is the maximum window size. The anomaly detection using Equation (18) across multiple window sizes has complexity

O (N \cdot | W | \cdot d_{b}^{2})

due to Mahalanobis distance computations, where

d_{b}

is the behavioral profile dimension. The DTW-based pattern matching in Equation (19) requires

O (N \cdot | P_{Θ} | \cdot T_{P}^{2})

operations for N clients and

| P_{Θ} |

pattern templates. State construction according to Equation (36) has complexity

O (N \cdot d_{s})

where

d_{s}

is the total state dimension. Policy evaluation using the attention mechanism from Equation (60)-(61) requires

O (d_{s}^{2} + d_{a} \cdot d_{s})

operations, where

d_{a}

is the action space dimension. The geometric median computation via the Weiszfeld algorithm in lines 12-13 has complexity

O (K_{iter} \cdot N_{active} \cdot d)

where

K_{iter}

is the number of iterations,

N_{active} = \sum_{i = 1}^{N} ζ_{i}^{t}

is the number of participating clients, and d is the model parameter dimension. Outlier detection using Equation (26) requires

O (N_{active}^{2} \cdot d)

for pairwise distance computations. The weighted aggregation from Equation (27) has complexity

O (N_{active} \cdot d)

. Clean performance monitoring using Equation (30) requires

O (| D_{υ} | \cdot d)

operations for model evaluation. Trigger detection from Equation (32) has complexity

O (| D_{ξ} | \cdot | Y | \cdot d)

where

| Y |

is the number of possible target labels. Model rollback using Equation (35) when triggered requires

O (K_{max} \cdot d)

operations. Reward computation using Equation (55) has complexity

O (N \cdot M)

where M is the number of mitigation strategies. Policy and value function updates require

O (d_{θ} + d_{ϕ})

operations where

d_{θ}, d_{ϕ}

are the parameter dimensions. The total computational complexity per communication round is:

O (N \cdot max (N \cdot d, | W | \cdot d_{b}^{2}, | P_{Θ} | \cdot T_{P}^{2}) + K_{iter} \cdot N_{active} \cdot d + | D_{υ} | \cdot d + | D_{ξ} | \cdot | Y | \cdot d),

(63)

which scales quadratically with the number of clients in the worst case due to pairwise distance computations, but remains practical for moderate-scale federated deployments with

N \leq 100

clients and an efficient implementation of the geometric median algorithm. The temporal adaptation mechanism in line 11 reduces the effective computational load by filtering out suspicious clients early, leading to

N_{active} < N

in most scenarios, thereby improving overall efficiency.

4. Experiment

This section presents comprehensive experimental evaluations to validate the effectiveness of our proposed DEFEND framework against temporal backdoor attacks in federated learning environments. We conduct extensive experiments across multiple datasets, network architectures, and attack scenarios to demonstrate the robustness and practical applicability of our multi-layer defense mechanism.

4.1. Experimental Setup

We evaluate our framework on three widely used federated learning benchmarks: CIFAR-10, FEMNIST, and MNIST. These datasets represent diverse application domains with varying data characteristics and complexity levels. To simulate realistic federated environments, we consider both Independent and Identically Distributed (IID) and Non-IID data distributions across clients. For Non-IID scenarios, we employ a Dirichlet distribution with concentration parameters

α \in {0.1, 0.5, 1.0}

, where smaller values indicate higher data heterogeneity. The client population varies from 10 to 50 participants to assess scalability across different federation sizes. We evaluate defense performance under various threat models by varying the malicious client ratio

κ \in {0.1, 0.2, 0.3, 0.4}

, representing scenarios where 10% to 40% of participants are compromised.
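A common way to realize such a Non-IID split is to draw, for each class, a Dirichlet-distributed vector of client proportions and divide that class's samples accordingly. The Python sketch below follows this convention; the paper does not state its exact partitioning procedure, so the per-class scheme, the function names, and the seed handling are assumptions.

```python
import random

def dirichlet_sample(alpha, k, rng):
    """Symmetric Dirichlet(alpha) sample via normalized Gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / k] * k
    return [x / total for x in g]

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so that no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# Toy labels: 10 balanced classes, split across 10 clients with alpha = 0.1.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
```

With small concentration values most of a class tends to land on a few clients, whereas larger values give each client a share closer to uniform.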

We employ two representative deep learning architectures: MobileNet V2 for lightweight mobile applications and ResNet-18 for standard computer vision tasks. The detailed experimental parameters and hyperparameter configurations are summarized in Table 2.

We implement the three temporal backdoor attack strategies described in Section 3.2: fixed-period data poisoning, multi-period data poisoning, and model weight poisoning attacks. For fixed-period attacks, malicious clients inject backdoor triggers during rounds 20-40 with poisoning ratio

ρ = 0.1

. Multi-period attacks distribute poisoning across three disjoint intervals: rounds 10-15, 25-30, and 45-50, maintaining the same total poisoning budget. Model weight poisoning attacks apply Gaussian perturbations with magnitude

δ_{i}^{t} \in [0.001, 0.01]

following the decay schedule in Equation (11). Trigger patterns consist of 3×3 pixel patches with intensity variations, and target labels are randomly selected from classes not present in the victim’s local data distribution.

4.2. Evaluation Metrics

We employ three comprehensive metrics to evaluate the performance of our DEFEND framework from multiple perspectives:

4.2.1. Clean Accuracy under Defense

The clean accuracy under defense measures the model’s classification performance on benign test samples when the defense framework is actively protecting against malicious clients, using the held-out validation set

D_{υ}

:

A_{c}^{T} = \frac{1}{| D_{υ} |} \sum_{(x, y) \in D_{υ}} I [f_{θ^{T}} (x) = y],

(64)

where

θ^{T}

represents the final global model after T communication rounds, and

f_{θ^{T}}

denotes the model’s prediction function. Higher values indicate better preservation of normal task performance under defense conditions.

4.2.2. Defense Success Rate

The defense success rate quantifies the effectiveness of our defense framework by measuring the proportion of triggered samples that do not produce the attacker’s target label, using the trigger detection dataset

D_{ξ}

:

S_{d}^{T} = 1 - max_{y_{τ}} \frac{1}{| D_{ξ} |} \sum_{(x_{ξ}, y_{π}) \in D_{ξ}} I [f_{θ^{T}} (x_{ξ}) = y_{τ}],

(65)

where

x_{ξ}

represents samples with injected triggers,

y_{π}

are the original labels, and

y_{τ}

represents the target labels across all possible attack targets. Higher

S_{d}^{T}

values indicate more effective defense against backdoor attacks.
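Read literally, Equation (65) is one minus the largest fraction of triggered samples that share a single predicted label. A minimal Python sketch, with the hypothetical function name `defense_success_rate` and made-up predictions, is:

```python
from collections import Counter

def defense_success_rate(predictions):
    """1 - max over labels of the fraction of triggered samples predicted as that label."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    most_common_count = Counter(predictions).most_common(1)[0][1]
    return 1.0 - most_common_count / len(predictions)

# Made-up predictions on ten triggered samples; label 3 dominates.
rate = defense_success_rate([3, 3, 3, 1, 2, 0, 3, 4, 5, 3])
```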

4.2.3. Temporal Detection Efficiency

The temporal detection efficiency measures the framework’s ability to rapidly and accurately identify malicious clients by considering both detection accuracy and temporal responsiveness:

E_{d} = \frac{1}{| C_{p} |} \sum_{c_{i} \in C_{p}} \frac{I [\exists t \in {t_{a, i}, \dots, T} : {\hat{s}}_{i}^{t} = s_{p} \land s_{i}^{t} = s_{p}]}{t_{d, i} - t_{a, i} + 1},

(66)

where

t_{d, i}

is the communication round when client

c_{i}

is first correctly classified as poisoned state

s_{p}

,

t_{a, i}

is the round when client

c_{i}

begins malicious behavior, and

C_{p}

represents the set of malicious clients. The indicator function ensures only successful detections are counted. Higher

E_{d}

values indicate faster and more accurate malicious client identification.
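Equation (66) averages the reciprocal detection delay over malicious clients, with undetected clients contributing zero. The Python sketch below assumes that the attack start rounds and the first correct detection rounds are available as dictionaries; the function name and the example values are hypothetical.

```python
def detection_efficiency(attack_start, first_detection):
    """Mean of 1 / (t_d - t_a + 1) over malicious clients; 0 for undetected ones.

    attack_start    : {client_id: round t_a at which malicious behavior begins}
    first_detection : {client_id: round t_d of first correct detection, or None}
    """
    if not attack_start:
        raise ValueError("no malicious clients given")
    total = 0.0
    for client, t_a in attack_start.items():
        t_d = first_detection.get(client)
        if t_d is not None and t_d >= t_a:
            total += 1.0 / (t_d - t_a + 1)
    return total / len(attack_start)

# Made-up example: immediate detection, a three-round delay, and a miss.
efficiency = detection_efficiency(
    {"c1": 20, "c2": 20, "c3": 10},
    {"c1": 20, "c2": 23, "c3": None},
)
```

Here the three clients contribute 1, 1/4, and 0, giving an efficiency of about 0.417; the metric reaches 1 only if every malicious client is detected in the round in which it starts attacking.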

4.3. Implementation Details

Our DEFEND framework is implemented using PyTorch 2.1.0 and Python 3.9. All experiments are conducted on NVIDIA A100 GPUs with 40GB memory. The MDP-based defense coordination employs Proximal Policy Optimization (PPO) for policy learning with clip ratio 0.2 and entropy coefficient 0.01. The geometric median computation uses the accelerated Weiszfeld algorithm with momentum coefficient

α_{k} = 0.9

and convergence tolerance

ϵ = 10^{- 6}

.
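For reference, the standard PPO clipped surrogate with an entropy bonus, using the clip ratio 0.2 and entropy coefficient 0.01 reported above, can be sketched in plain Python. This is the textbook form of the objective; how the authors combine it with the value loss and batching is not described, so the function name and argument layout are assumptions.

```python
import math

def ppo_objective(new_logp, old_logp, advantages, entropies,
                  clip=0.2, ent_coef=0.01):
    """Mean clipped surrogate plus entropy bonus (quantity to be maximized)."""
    surrogate = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        surrogate += min(ratio * adv, clipped * adv)    # pessimistic bound
    n = len(advantages)
    return surrogate / n + ent_coef * sum(entropies) / n

# Ratios 1.5 and 0.5 are both clipped: the terms become 1.2 and -0.8.
value = ppo_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0],
                      [1.0, -1.0], [0.0, 0.0])
```

Clipping removes the incentive to move the policy ratio outside the interval [0.8, 1.2] within a single update, which limits how far one batch of defense experience can shift the policy.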

For behavioral profile construction, we extract spectral features using Fast Fourier Transform (FFT) with window size 32, and wavelet coefficients using Daubechies-4 wavelets with 3 decomposition levels. The pattern matching employs Dynamic Time Warping with warping constraint

W_{warp} = 5

and Euclidean local distance function.

The validation set

D_{υ}

comprises 10% of the total training data, randomly sampled and held out from all clients. The trigger detection dataset

D_{ξ}

contains 1000 samples per class with systematically generated trigger patterns covering various spatial positions and intensities.

Each experimental configuration is repeated 5 times with different random seeds, and results are reported with 95% confidence intervals. Statistical significance is assessed using paired t-tests with Bonferroni correction for multiple comparisons.
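One way to obtain such intervals from five runs is a Student's t interval with four degrees of freedom, paired with a Bonferroni-adjusted significance level. The Python sketch below illustrates this under the assumption that a t interval is used (the paper does not state the interval construction); the accuracy values are made up for illustration and are not results from the paper.

```python
import math
import statistics

T_CRIT_95_DF4 = 2.776  # two-sided 95% critical value of Student's t, 4 d.o.f.

def ci95_five_runs(values):
    """95% confidence interval for the mean of exactly five repeated runs."""
    if len(values) != 5:
        raise ValueError("the hard-coded critical value assumes five runs")
    mean = statistics.fmean(values)
    half_width = T_CRIT_95_DF4 * statistics.stdev(values) / math.sqrt(5)
    return mean - half_width, mean + half_width

def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / num_comparisons

# Made-up accuracies from five seeds.
low, high = ci95_five_runs([0.90, 0.92, 0.91, 0.93, 0.89])
```

With only five runs the critical value is 2.776 rather than the large-sample 1.96, so the interval is noticeably wider than a normal approximation would suggest.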

5. Results

5.1. MDP Policy Learning Performance

Figure 2 shows the training performance of the policy optimization algorithms considered for our MDP-based defense coordination framework. We compare two reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), with a Genetic Algorithm (GA) baseline across different network architectures and data heterogeneity levels.

The experimental results reveal several important insights about the effectiveness of different reinforcement learning approaches for defense policy optimization. PPO consistently achieves the highest cumulative rewards across all configurations, demonstrating superior learning efficiency and stability in the complex multi-objective defense environment. The algorithm shows rapid convergence within the first 150 episodes and maintains stable performance thereafter, reaching peak rewards around 3800-4000 across different settings.

SAC exhibits competitive performance with gradual learning progression, ultimately achieving comparable final rewards to PPO but requiring more episodes for convergence. The continuous learning curve suggests that SAC benefits from its off-policy nature and entropy regularization, particularly evident in scenarios with higher data heterogeneity (lower

α

values). The algorithm demonstrates consistent improvement throughout training, reaching final rewards between 3200-3600.

GA shows relatively stable but lower performance compared to the other two approaches, with rewards plateauing around 2000-2800 across all configurations. While GA provides baseline performance guarantees and avoids local optima through population-based exploration, it lacks the sophisticated gradient-based optimization capabilities needed for complex sequential decision-making in the defense scenario.

The impact of data heterogeneity (controlled by

α

) appears more pronounced in ResNet-18 configurations compared to MobileNet V2, where PPO maintains consistently high performance regardless of the heterogeneity level. This suggests that the lightweight MobileNet V2 architecture provides more robust defense policy learning under varying data distribution conditions, while ResNet-18 benefits from the additional model capacity when dealing with homogeneous data distributions (

α = 1.0

).

5.2. Clean Accuracy Preservation Performance

Figure 3 illustrates the evolution of clean accuracy under defense across different reinforcement learning algorithms, network architectures, and data heterogeneity settings. This metric evaluates how well each algorithm maintains model utility for legitimate tasks while defending against temporal backdoor attacks.

The results demonstrate significant differences in how each reinforcement learning algorithm balances security and utility preservation. PPO consistently achieves superior clean accuracy performance, reaching and maintaining accuracy levels between 0.85-0.95 across most configurations after approximately 200 training episodes. The algorithm shows remarkable stability in preserving model utility while implementing defense mechanisms, with particularly strong performance in homogeneous data distributions (

α = 1.0

).

SAC exhibits competitive clean accuracy preservation with gradual but steady improvement throughout training. The algorithm demonstrates robust performance across heterogeneous data settings, achieving final accuracy values between 0.82-0.92. SAC’s continuous learning approach proves particularly effective in MobileNet V2 configurations, where it matches or occasionally exceeds PPO’s performance, suggesting that the off-policy learning strategy works well with lightweight architectures.

GA shows the most variable performance with significant fluctuations throughout training episodes. While GA occasionally achieves high accuracy spikes (up to 0.95 in some configurations), it struggles to maintain consistent performance, with accuracy frequently dropping to 0.4-0.6 range. This instability indicates that population-based optimization may be less suitable for maintaining the delicate balance between defense effectiveness and model utility preservation in federated learning environments.

The impact of data heterogeneity reveals interesting patterns across architectures. ResNet-18 shows greater sensitivity to heterogeneity levels, with more pronounced performance differences between

α = 0.1

and

α = 1.0

configurations. In contrast, MobileNet V2 demonstrates more robust performance across different heterogeneity levels, particularly with PPO and SAC algorithms, suggesting that lightweight architectures may provide inherent advantages for federated defense scenarios.

Notably, clean accuracy tracks the reward patterns observed in Figure 2, which suggests that higher cumulative rewards in the MDP framework translate to better utility preservation during defense operations. This is consistent with our multi-objective reward formulation, which balances security improvements with model performance maintenance.

5.3. Defense Effectiveness Evaluation

We evaluate the defense effectiveness of our DEFEND framework using two key metrics: Defense Success Rate (

S_{d}^{T}

) and Temporal Detection Efficiency (

E_{d}

) across various system configurations. The following tables present comprehensive results under different combinations of client population sizes, data heterogeneity levels, malicious client ratios, and network architectures.

Table 3 and Table 4 demonstrate that ResNet-18 exhibits strong sensitivity to data heterogeneity levels under a fixed malicious client ratio ($κ = 0.2$). The Defense Success Rate improves substantially from highly heterogeneous ($α = 0.1$) to homogeneous ($α = 1.0$) distributions, with relative improvements of 9.1% at $N = 10$ and 8.8% at $N = 50$. The Temporal Detection Efficiency shows even more pronounced improvements of 14.1% to 13.3%, indicating that homogeneous data distributions significantly enhance the temporal behavioral analysis layer’s ability to identify malicious patterns. The consistent performance gains with increased client population size suggest that ResNet-18 benefits from larger federation scales for improved defense coordination.
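
The relative improvements quoted for the Defense Success Rate can be re-derived from the Table 3 entries. The short sketch below does this for the smallest and largest federation sizes; the helper name `relative_gain` is purely illustrative and not part of the DEFEND implementation.

```python
# Defense Success Rate values copied from Table 3 (ResNet-18, kappa = 0.2).
table3 = {
    10: {"alpha_0.1": 0.847, "alpha_1.0": 0.924},
    50: {"alpha_0.1": 0.879, "alpha_1.0": 0.956},
}


def relative_gain(low: float, high: float) -> float:
    """Relative improvement, in percent, when moving from `low` to `high`."""
    return 100.0 * (high - low) / low


gains = {n: relative_gain(v["alpha_0.1"], v["alpha_1.0"]) for n, v in table3.items()}
for n, g in gains.items():
    print(f"N = {n}: {g:.1f}% relative improvement")
```

The percentages are relative to the heterogeneous baseline ($α = 0.1$), not absolute differences in Defense Success Rate.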

Table 5 and Table 6 reveal the critical impact of malicious client ratios on ResNet-18 defense performance under moderate heterogeneity ($α = 0.5$). The Defense Success Rate shows a clear linear degradation as malicious ratios increase, with performance dropping from 0.972 to 0.803 (a 17.4% relative decrease) at the largest federation size. More concerning is the dramatic decline in Temporal Detection Efficiency, which drops from 0.864 to 0.554 (a 35.9% relative decrease), approaching the theoretical limits of Byzantine fault tolerance. This indicates that while our framework maintains reasonable attack prevention capabilities even at high malicious ratios, the speed and accuracy of malicious client identification become significantly compromised beyond $κ = 0.3$.

Table 7 and Table 8 show that MobileNet V2 maintains competitive defense performance despite its lightweight architecture, achieving 10.6% improvement in Defense Success Rate and 14.3% improvement in Temporal Detection Efficiency from heterogeneous to homogeneous distributions. While the absolute values are slightly lower than ResNet-18, MobileNet V2 demonstrates more consistent relative improvements across different client population sizes, with smaller confidence intervals indicating greater stability. The architecture shows particular resilience in heterogeneous environments, making it well-suited for resource-constrained federated deployments where data distribution control is limited.

Table 9 and Table 10 demonstrate that MobileNet V2 exhibits similar vulnerability patterns to ResNet-18 under increasing malicious ratios, but with notably more stable degradation characteristics. The Defense Success Rate decreases by 18.8% from $κ = 0.1$ to $κ = 0.4$, while Temporal Detection Efficiency drops by 37.1%, comparable to ResNet-18’s performance degradation. However, MobileNet V2 shows consistently smaller confidence intervals across all configurations, indicating more predictable and stable defense behavior. This stability advantage becomes particularly valuable in dynamic federated environments where malicious client ratios may fluctuate over time, providing more reliable defense guarantees compared to the higher-capacity but more variable ResNet-18 architecture.

6. Conclusions

This paper presents DEFEND, a comprehensive multi-layer defense framework that counters temporal backdoor attacks in federated learning through sophisticated mathematical modeling and reinforcement learning techniques. Our primary contributions include three distinct temporal attack models, a three-tier defense architecture combining behavioral analysis with robust aggregation, and a novel MDP-based approach for adaptive defense coordination. Experimental evaluation demonstrates strong performance with Defense Success Rates reaching 0.956 ± 0.010 for ResNet-18 and 0.940 ± 0.012 for MobileNet V2, while maintaining clean accuracy levels between 0.85 and 0.95. The framework shows resilience across different data heterogeneity levels and client populations, though performance degrades when malicious ratios exceed 30%.

The framework has limitations including quadratic computational complexity and reduced effectiveness against highly coordinated attacks approaching Byzantine fault tolerance limits. Future research should focus on developing more efficient algorithms, adaptive threshold mechanisms, and extensions to other domains beyond computer vision. As federated learning expands into critical applications, robust defense mechanisms like DEFEND become essential for maintaining system integrity and user trust in distributed machine learning paradigms.

Appendix A. Byzantine Robustness Analysis

This section establishes the theoretical foundation for the Byzantine robustness of our DEFEND framework through rigorous mathematical analysis of the geometric median aggregation mechanism and outlier detection procedures.

Appendix A.1. Fundamental Byzantine Robustness Theorem

Theorem A1 (Byzantine Robustness of DEFEND Framework). Consider a federated learning system with N participating clients where at most f clients are Byzantine adversaries satisfying $f < N / 2$. Let $C_{h}$ denote honest clients and $C_{p}$ denote Byzantine clients with $| C_{p} | = f$. Under the DEFEND framework with geometric median aggregation (23) and statistical outlier detection (26), the global model parameter $θ^{t}$ converges to within an ϵ-neighborhood of the optimal solution $θ^{*}$ with high probability.

Specifically, for any $ϵ > 0$ and confidence parameter $δ \in (0, 1)$, there exists a finite round $T_{0}$ such that for all $t \geq T_{0}$:

P [{∥ θ^{t} - θ^{*} ∥}_{F} \leq ϵ + 2 ξ + O (\sqrt{\frac{log (1 / δ)}{N}})] \geq 1 - δ,

(A1)

where ξ bounds the geometric median estimation error, provided:

1. The Byzantine client fraction satisfies $f \leq ⌊ (N - 1) / 2 ⌋$;
2. The geometric median approximation error is bounded: ${∥ {\hat{θ}}_{ν}^{t} - {\bar{θ}}_{h}^{t} ∥}_{F} \leq ξ$, where ${\bar{θ}}_{h}^{t}$ represents the mean of honest client updates;
3. The outlier detection mechanism achieves controlled error rates: false positive rate $α \leq 0.05$ and false negative rate $β \leq 0.1$.

Proof.

The proof proceeds through four main steps: (1) establishing the breakdown point properties of geometric median, (2) analyzing the statistical concentration of honest client updates, (3) bounding the outlier detection accuracy, and (4) proving convergence under Byzantine presence.

Step 1: Breakdown Point Analysis of Geometric Median

The geometric median ${\hat{θ}}_{ν}^{t}$ defined in (23) possesses a breakdown point of exactly $1 / 2$, meaning it can tolerate up to $⌊ (N - 1) / 2 ⌋$ arbitrary outliers without complete failure. This fundamental property ensures robustness against Byzantine adversaries when $f < N / 2$.

For the geometric median computation, let ${\bar{θ}}_{h}^{t} = \frac{1}{N - f} \sum_{i \in C_{h}} θ_{i}^{t}$ denote the empirical mean of honest client updates. By the robustness property of the geometric median, we have:

{∥ {\hat{θ}}_{ν}^{t} - {\bar{θ}}_{h}^{t} ∥}_{F} \leq \frac{2 f}{N - f} \cdot max_{i \in C_{h}} {∥ θ_{i}^{t} - {\bar{θ}}_{h}^{t} ∥}_{F} .

(A2)

Since honest clients follow the true learning dynamics, their updates concentrate around the optimal direction. Under standard federated learning assumptions with bounded gradient variance, we have:

max_{i \in C_{h}} {∥ θ_{i}^{t} - {\bar{θ}}_{h}^{t} ∥}_{F} \leq σ \sqrt{\frac{2 log (N)}{n_{min}}},

(A3)

with probability at least $1 - N^{- 1}$, where $σ$ is the gradient noise parameter and $n_{min}$ is the minimum local dataset size.
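
The breakdown-point statement is easiest to see in one dimension, where the geometric median reduces to the ordinary median. The sketch below is an illustration under that simplifying assumption (scalar updates, a hypothetical attack value of `1e6`, and a hypothetical helper `aggregate_1d`); it is not the aggregation rule (23) itself.

```python
import numpy as np


def aggregate_1d(honest, n_byzantine, attack_value=1e6):
    """Return (median, mean) of honest scalars plus `n_byzantine` extreme values."""
    updates = np.concatenate([honest, np.full(n_byzantine, attack_value)])
    return float(np.median(updates)), float(np.mean(updates))


N = 9
honest_values = np.array([0.90, 0.95, 1.00, 1.00, 1.10])

# f = 4 = floor((N - 1) / 2): the median stays inside the honest range,
# while the plain mean is dragged towards the attack value.
median_ok, mean_ok = aggregate_1d(honest_values[: N - 4], 4)

# f = 5 >= N / 2: the adversaries now control the median.
median_bad, _ = aggregate_1d(honest_values[: N - 5], 5)

print(f"f = 4: median = {median_ok}, mean = {mean_ok:.1f}")
print(f"f = 5: median = {median_bad}")
```

With four of nine clients compromised the median remains an honest value, whereas a single additional adversary moves it to the attack value, matching the $f < N / 2$ requirement.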

Step 2: Statistical Concentration of Honest Updates

For honest clients $c_{i} \in C_{h}$, their local model updates $θ_{i}^{t}$ are generated through standard gradient descent on local data. Under the assumption of sub-Gaussian gradient noise with parameter $σ^{2}$, we can establish concentration bounds.

Let $\nabla F_{i} (θ^{t - 1})$ denote the true local gradient for client $c_{i}$. The empirical gradient computed from local data satisfies:

P [{∥ \nabla {\hat{F}}_{i} (θ^{t - 1}) - \nabla F_{i} (θ^{t - 1}) ∥}_{F} \geq s] \leq 2 exp (- \frac{n_{i} s^{2}}{2 σ^{2} d}),

(A4)

where $s > 0$ is the deviation level (written s to avoid a clash with the round index t), d is the model parameter dimension, and $n_{i}$ is the local dataset size for client $c_{i}$.

The local update deviation from the ideal direction can be bounded by:

{∥ θ_{i}^{t} - θ_{ideal}^{t} ∥}_{F} \leq η {∥ \nabla {\hat{F}}_{i} (θ^{t - 1}) - \nabla F_{i} (θ^{t - 1}) ∥}_{F} + η L_{F} {∥ θ^{t - 1} - θ^{*} ∥}_{F},

(A5)

where $η$ is the learning rate and $L_{F}$ is the Lipschitz constant from the smoothness assumption.

Step 3: Outlier Detection Accuracy Analysis

Our outlier detection mechanism (26) identifies suspicious clients based on their distance from the geometric median. For a client $c_{i}$, define the detection statistic:

d_{i}^{t} = {∥ θ_{i}^{t} - {\hat{θ}}_{ν}^{t} ∥}_{F} .

(A6)

Under the null hypothesis that client $c_{i}$ is honest, $d_{i}^{t}$ follows a distribution characterized by the concentration properties established in Step 2. The detection threshold $τ_{α}$ is set based on quantiles of this null distribution as defined in (26).

For honest clients, the false positive probability is bounded by:

P [c_{i} \in S^{t} | c_{i} \in C_{h}] \leq α + exp (- \frac{n_{i} {(τ_{α} - E [d_{i}^{t}])}^{2}}{2 σ^{2} d}),

(A7)

where $α$ is the nominal false positive rate.

For Byzantine clients with significantly deviating updates, the detection probability satisfies:

P [c_{j} \in S^{t} | c_{j} \in C_{p}] \geq 1 - β,

(A8)

provided their update magnitude exceeds the detection threshold by a sufficient margin, where $β$ is the false negative rate.
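
A distance-based detector of the kind analyzed in Step 3 can be sketched as follows. The detection statistic follows (A6); the threshold rule `median + k * MAD`, the coordinate-wise median used as a robust center, and the function name `flag_outliers` are all assumptions made for illustration, standing in for the quantile-based $τ_{α}$ of (26) and the geometric median of (23).

```python
import numpy as np


def flag_outliers(updates, center, k=3.0):
    """Flag clients whose distance to `center` exceeds a robust threshold.

    The threshold median(d) + k * MAD(d) is an illustrative stand-in for the
    quantile-based tau_alpha of (26); it is not the rule used in the paper.
    """
    d = np.linalg.norm(updates - center, axis=1)   # detection statistic (A6)
    med = np.median(d)
    mad = np.median(np.abs(d - med))               # median absolute deviation
    tau = med + k * mad
    return d > tau, d, tau


rng = np.random.default_rng(1)
honest = rng.normal(0.0, 0.1, size=(8, 4))         # clients 0..7
byzantine = np.full((2, 4), 50.0)                  # clients 8..9, hypothetical attackers
updates = np.vstack([honest, byzantine])

center = np.median(updates, axis=0)                # simple robust center (assumption)
flags, d, tau = flag_outliers(updates, center)
print("flagged clients:", np.flatnonzero(flags))
```

Large-deviation attackers are flagged reliably, while an honest client near the tail of the distance distribution may occasionally be flagged as well, which is the false positive behavior that (A7) bounds.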

Step 4: Convergence Analysis Under Byzantine Presence

After outlier detection and removal, the weighted aggregation operates on the filtered client set. With high probability $1 - δ$, the set $\{c_{1}, \dots, c_{N}\} ∖ S^{t}$ contains predominantly honest clients.

The weighted aggregation (27) yields:

θ^{t} = arg min_{θ} \sum_{i \notin S^{t}} w_{i}^{t} {∥ θ - θ_{i}^{t} ∥}_{F}^{2} + λ_{τ} {∥ θ - θ^{t - 1} ∥}_{F}^{2} + λ_{ν} {∥ θ - {\hat{θ}}_{ν}^{t} ∥}_{F}^{2} .

(A9)

The solution can be expressed as:

θ^{t} = \frac{\sum_{i \notin S^{t}} w_{i}^{t} θ_{i}^{t} + λ_{τ} θ^{t - 1} + λ_{ν} {\hat{θ}}_{ν}^{t}}{\sum_{i \notin S^{t}} w_{i}^{t} + λ_{τ} + λ_{ν}} .

(A10)
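
The step from (A9) to (A10) is the first-order optimality condition of a quadratic objective. Assuming non-negative weights $w_{i}^{t}$ and positive $λ_{τ}$, $λ_{ν}$ (so that the objective is strictly convex and the stationary point is the unique minimizer), it can be spelled out as:

```latex
\begin{aligned}
0 &= \nabla_{θ} \Big[ \sum_{i \notin S^{t}} w_{i}^{t} \|θ - θ_{i}^{t}\|_{F}^{2}
      + λ_{τ} \|θ - θ^{t-1}\|_{F}^{2}
      + λ_{ν} \|θ - \hat{θ}_{ν}^{t}\|_{F}^{2} \Big] \\
  &= 2 \sum_{i \notin S^{t}} w_{i}^{t} (θ - θ_{i}^{t})
      + 2 λ_{τ} (θ - θ^{t-1})
      + 2 λ_{ν} (θ - \hat{θ}_{ν}^{t}), \\
\Longrightarrow \quad
\Big( \sum_{i \notin S^{t}} w_{i}^{t} + λ_{τ} + λ_{ν} \Big) θ
  &= \sum_{i \notin S^{t}} w_{i}^{t} θ_{i}^{t} + λ_{τ} θ^{t-1} + λ_{ν} \hat{θ}_{ν}^{t}.
\end{aligned}
```

Dividing by the scalar on the left-hand side gives the weighted average in (A10).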

Since the filtered set predominantly contains honest clients, we can decompose the aggregation error as:

{∥ θ^{t} - θ^{*} ∥}_{F} \leq {∥ θ^{t} - E [θ^{t}] ∥}_{F} + {∥ E [θ^{t}] - θ^{*} ∥}_{F} .

(A11)

The concentration term is bounded using Azuma-Hoeffding inequality for weighted sums:

{∥ θ^{t} - E [θ^{t}] ∥}_{F} \leq O (\sqrt{\frac{log (1 / δ)}{| \{c_{1}, \dots, c_{N}\} ∖ S^{t} |}}),

(A12)

with probability at least $1 - δ$.

The bias term is controlled by the geometric median approximation error and temporal regularization:

{∥ E [θ^{t}] - θ^{*} ∥}_{F} \leq 2 ξ + \frac{λ_{τ}}{λ_{ν}} {∥ θ^{t - 1} - θ^{*} ∥}_{F} .

(A13)

The temporal regularization coefficient $λ_{τ} / λ_{ν} < 1$ ensures contraction, leading to convergence as $t \to \infty$.

Combining the concentration and bias bounds establishes (A1), completing the proof. □

Appendix A.2. Corollaries and Extensions

Corollary A1 (Finite-Sample Convergence Rate). Under the conditions of Theorem A1, the DEFEND framework achieves ϵ-convergence in at most

T (ϵ, δ) = O (\frac{log ({∥ θ^{0} - θ^{*} ∥}_{F} / ϵ)}{log (λ_{ν} / λ_{τ})} + log (1 / δ))

(A14)

communication rounds with probability at least $1 - δ$.

Proof.

The proof follows from iterating the contraction property in (A13) and using the union bound over all communication rounds. Each round shrinks the initial error by the factor $λ_{τ} / λ_{ν} < 1$, so the contribution of ${∥ θ^{0} - θ^{*} ∥}_{F}$ falls below ϵ once $t \geq log ({∥ θ^{0} - θ^{*} ∥}_{F} / ϵ) / log (λ_{ν} / λ_{τ})$. □

Corollary A2 (Optimality of Byzantine Tolerance). The Byzantine tolerance threshold $f < N / 2$ in Theorem A1 is optimal. No algorithm can guarantee convergence when $f \geq N / 2$ in the worst case.

Proof.

This follows from the fundamental impossibility results in Byzantine fault tolerance. When $f \geq N / 2$, Byzantine clients can coordinate to completely overwhelm honest clients, making any aggregation rule vulnerable to manipulation. □

Appendix A.3. Robustness Under Stronger Attack Models

Lemma A1 (Robustness Against Coordinated Attacks). The DEFEND framework maintains its robustness guarantees even when Byzantine clients coordinate their attacks, provided they cannot observe the geometric median computation in real-time.

Proof.

Coordinated attacks can increase the correlation between Byzantine updates but cannot change the fundamental breakdown point of the geometric median. The proof follows the same structure as Theorem A1 with modified concentration bounds for correlated adversarial behavior. □

Appendix B. Weiszfeld Algorithm Convergence Analysis

This section provides a comprehensive theoretical analysis of the convergence properties of the accelerated Weiszfeld algorithm used for geometric median computation in our DEFEND framework.

Appendix B.1. Preliminaries and Algorithm Description

The Weiszfeld algorithm solves the geometric median problem defined in (23):

{\hat{θ}}_{ν}^{t} = arg min_{θ} \sum_{i = 1}^{N} {∥ θ - θ_{i}^{t} ∥}_{F} .

(A15)

Our accelerated version incorporates momentum-based acceleration as defined in (24) and (25). Let $\{θ_{1}^{t}, \dots, θ_{N}^{t}\}$ denote the set of client updates at communication round t, and assume they are distinct with probability 1.

Definition A1 (Weiszfeld Iteration Operator). The Weiszfeld iteration operator $T : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is defined as:

T (θ) = \frac{\sum_{i = 1}^{N} \frac{θ_{i}^{t}}{∥ θ - θ_{i}^{t} ∥_{F} + ϵ}}{\sum_{i = 1}^{N} \frac{1}{∥ θ - θ_{i}^{t} ∥_{F} + ϵ}},

(A16)

where $ϵ > 0$ is the regularization parameter to avoid division by zero.
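
The operator in (A16) is an inverse-distance weighted average, which the following NumPy sketch implements and iterates without momentum. The function name `weiszfeld_operator`, the four test points, and the iteration count are illustrative choices. The test points form a convex quadrilateral, for which the geometric median is known to be the intersection of the diagonals, here $(12/7, 12/7)$, so the result can be checked against a closed form.

```python
import numpy as np


def weiszfeld_operator(theta, updates, eps=1e-6):
    """One application of T from (A16): inverse-distance weighted average."""
    dist = np.linalg.norm(updates - theta, axis=1)
    w = 1.0 / (dist + eps)                 # eps guards against division by zero
    return (w[:, None] * updates).sum(axis=0) / w.sum()


# Four points in convex position; the diagonals (0,0)-(10,10) and (4,0)-(0,3)
# intersect at (12/7, 12/7), which is the geometric median.
updates = np.array([[0.0, 0.0], [4.0, 0.0], [10.0, 10.0], [0.0, 3.0]])

theta = updates.mean(axis=0)               # start from the plain average
for _ in range(100):
    theta = weiszfeld_operator(theta, updates)

print("geometric median estimate:", theta)
```

Starting from the arithmetic mean, the iterate moves away from the distant point $(10, 10)$ and settles near the diagonal intersection, illustrating how the weighting discounts far-away updates.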

Appendix B.2. Main Convergence Theorem

Theorem A2 (Convergence of Accelerated Weiszfeld Algorithm). Consider the accelerated Weiszfeld algorithm defined by (24) and (25) applied to compute the geometric median of client updates $\{θ_{1}^{t}, \dots, θ_{N}^{t}\} \subset \mathbb{R}^{d}$. Let ${\hat{θ}}_{ν}^{t *}$ denote the unique geometric median. Then:

1. Linear Convergence: The algorithm converges linearly to ${\hat{θ}}_{ν}^{t *}$ with rate $ρ \in (0, 1)$, where

${∥ θ^{(k + 1)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} \leq ρ {∥ θ^{(k)} - {\hat{θ}}_{ν}^{t *} ∥}_{F};$

(A17)
2. Iteration Complexity: The number of iterations required to achieve ϵ-accuracy is

$K (ϵ) = O (\sqrt{κ} log (\frac{∥ θ^{(0)} - {\hat{θ}}_{ν}^{t *} ∥_{F}}{ϵ})),$

(A18)

where κ is the condition number of the problem;
3. Acceleration Benefit: The momentum acceleration reduces the convergence constant by a factor of $(1 - \sqrt{μ / L})$, where μ and L are the strong convexity and Lipschitz parameters.

Proof.

The proof is structured in four main parts: (1) establishing strong convexity of the geometric median objective, (2) proving contraction properties of the Weiszfeld operator, (3) analyzing momentum acceleration, and (4) deriving iteration complexity bounds.

Part 1: Strong Convexity Analysis

The geometric median objective function is:

f (θ) = \sum_{i = 1}^{N} {∥ θ - θ_{i}^{t} ∥}_{F} .

(A19)

For $θ \neq θ_{i}^{t}$ (which occurs with probability 1), the function f is twice differentiable. The Hessian matrix is:

\nabla^{2} f (θ) = \sum_{i = 1}^{N} \frac{1}{∥ θ - θ_{i}^{t} ∥_{F}} (I - \frac{(θ - θ_{i}^{t}) {(θ - θ_{i}^{t})}^{T}}{∥ θ - θ_{i}^{t} ∥_{F}^{2}}) .

(A20)
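
The Hessian in (A20) can be evaluated numerically to inspect its spectrum. The sketch below is illustrative (the function name and both point sets are assumptions): it compares a generic configuration with one where all points and $θ$ lie on a single line. In the collinear case every term projects onto the same orthogonal complement, so one eigenvalue is zero and curvature is lost along that line.

```python
import numpy as np


def geometric_median_hessian(theta, updates):
    """Hessian (A20) of f(theta) = sum_i ||theta - theta_i||, for theta != theta_i."""
    d = theta.shape[0]
    hessian = np.zeros((d, d))
    for x in updates:
        diff = theta - x
        r = np.linalg.norm(diff)
        # (1 / r) times the projector onto the orthogonal complement of diff
        hessian += (np.eye(d) - np.outer(diff, diff) / r**2) / r
    return hessian


generic = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
collinear = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])

eig_generic = np.linalg.eigvalsh(
    geometric_median_hessian(np.array([1.0, 1.0]), generic))
eig_collinear = np.linalg.eigvalsh(
    geometric_median_hessian(np.array([2.0, 0.0]), collinear))

print("generic eigenvalues:  ", eig_generic)
print("collinear eigenvalues:", eig_collinear)
```

A strictly positive smallest eigenvalue therefore relies on the client updates not all lying on one line through $θ$, which holds generically for high-dimensional model updates.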

Lemma A2 (Strong Convexity Parameter). The objective function $f (θ)$ is strongly convex with parameter

μ = min_{i = 1, \dots, N} \frac{1}{{∥ θ - θ_{i}^{t} ∥}_{F} + ϵ} \geq \frac{1}{D_{max} + ϵ},

(A21)

where $D_{max} = {max}_{i, j} {∥ θ_{i}^{t} - θ_{j}^{t} ∥}_{F}$ is the diameter of the client update set.

Proof of Lemma A2.

Each term in the Hessian corresponds to a projection onto the orthogonal complement of $(θ - θ_{i}^{t})$. The minimum eigenvalue is achieved when $θ$ is furthest from all client updates, giving the stated bound. □

Part 2: Contraction Analysis of Weiszfeld Operator

The Weiszfeld operator can be interpreted as a proximal gradient step. Define the subdifferential of f at $θ$:

∂ f (θ) = \sum_{i = 1}^{N} \frac{θ - θ_{i}^{t}}{{∥ θ - θ_{i}^{t} ∥}_{F} + ϵ} .

(A22)

Lemma A3 (Contraction Property). The Weiszfeld operator $T$ is a contraction mapping with contraction factor

ρ_{base} = 1 - \frac{μ}{L} \leq 1 - \frac{1}{L (D_{max} + ϵ)},

(A23)

where L is the Lipschitz constant of $\nabla f$.

Proof of Lemma A3.

The Weiszfeld iteration can be written as:

θ^{(k + 1)} = θ^{(k)} - α^{(k)} \nabla f (θ^{(k)}),

(A24)

where $α^{(k)} = {(\sum_{i = 1}^{N} \frac{1}{{∥ θ^{(k)} - θ_{i}^{t} ∥}_{F} + ϵ})}^{- 1}$ is an adaptive step size.

By the strong convexity of f, we have:

f (θ^{(k + 1)}) \leq f (θ^{(k)}) - \frac{μ}{2} {∥ θ^{(k + 1)} - θ^{(k)} ∥}_{F}^{2} .

(A25)

The optimality condition $\nabla f ({\hat{θ}}_{ν}^{t *}) = 0$ combined with strong convexity yields:

{∥ θ^{(k + 1)} - {\hat{θ}}_{ν}^{t *} ∥}_{F}^{2} \leq (1 - μ / L) {∥ θ^{(k)} - {\hat{θ}}_{ν}^{t *} ∥}_{F}^{2} .

(A26)

□
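
The gradient-step form (A24) used in this proof can be checked directly against the operator (A16). Reading $\nabla f$ in (A24) as the ϵ-regularized expression (A22), which is an interpretation rather than something stated explicitly, and writing $w_{i} = 1 / ({∥ θ - θ_{i}^{t} ∥}_{F} + ϵ)$ with $α = (\sum_{i} w_{i})^{- 1}$:

```latex
\begin{aligned}
θ - α \sum_{i=1}^{N} w_{i} (θ - θ_{i}^{t})
  &= θ - α \Big( \sum_{i=1}^{N} w_{i} \Big) θ + α \sum_{i=1}^{N} w_{i} θ_{i}^{t} \\
  &= \frac{\sum_{i=1}^{N} w_{i} θ_{i}^{t}}{\sum_{i=1}^{N} w_{i}} = T(θ).
\end{aligned}
```

The first two terms cancel because $α \sum_{i} w_{i} = 1$, so the adaptive step size is exactly what turns the gradient step into the weighted average.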

Part 3: Momentum Acceleration Analysis

The momentum-accelerated version from (25) follows the heavy ball method:

θ^{(k + 1)} \leftarrow θ^{(k + 1)} + α_{k} (θ^{(k + 1)} - θ^{(k)}),

(A27)

where $α_{k}$ is the momentum coefficient.

Lemma A4 (Acceleration Benefit). With optimally chosen momentum parameter

α_{k} = \frac{\sqrt{L} - \sqrt{μ}}{\sqrt{L} + \sqrt{μ}},

(A28)

the accelerated convergence rate becomes

ρ_{acc} = \frac{\sqrt{L} - \sqrt{μ}}{\sqrt{L} + \sqrt{μ}} = \frac{\sqrt{κ} - 1}{\sqrt{κ} + 1},

(A29)

where $κ = L / μ$ is the condition number.

Proof of Lemma A4.

The momentum method can be analyzed using the potential function approach. Define:

Φ^{(k)} = f (θ^{(k)}) - f ({\hat{θ}}_{ν}^{t *}) + \frac{β}{2} {∥ θ^{(k)} - θ^{(k - 1)} ∥}_{F}^{2},

(A30)

for an appropriate constant $β > 0$.

The momentum parameter choice ensures:

Φ^{(k + 1)} \leq {(\frac{\sqrt{κ} - 1}{\sqrt{κ} + 1})}^{2} Φ^{(k)} .

(A31)

This implies the stated convergence rate for the function values, which translates to the parameter convergence rate through strong convexity. □

Part 4: Iteration Complexity Derivation

To achieve $ϵ$-accuracy, ${∥ θ^{(k)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} \leq ϵ$, we need:

ρ_{acc}^{k} {∥ θ^{(0)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} \leq ϵ .

(A32)

Taking logarithms:

k \geq \frac{log ({∥ θ^{(0)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} / ϵ)}{log (1 / ρ_{acc})} .

(A33)

Using the approximation $log (1 / ρ_{acc}) \approx 2 / \sqrt{κ}$ for large $κ$:

k = O (\sqrt{κ} log (\frac{∥ θ^{(0)} - {\hat{θ}}_{ν}^{t *} ∥_{F}}{ϵ})) .

(A34)

This completes the proof of all three statements in Theorem A2. □
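
To give a sense of scale for the bound in Part 4, the sketch below evaluates the exact iteration count from (A33) and the large-κ approximation behind (A34). The values $κ = 100$, unit initial error, and target accuracy $10^{-6}$ are illustrative assumptions, not settings taken from the experiments.

```python
import math


def accelerated_rate(kappa: float) -> float:
    """rho_acc from (A29) for condition number kappa = L / mu (requires kappa > 1)."""
    s = math.sqrt(kappa)
    return (s - 1.0) / (s + 1.0)


def iterations_needed(kappa: float, initial_error: float, eps: float) -> int:
    """Smallest integer k satisfying (A33)."""
    rho = accelerated_rate(kappa)
    return math.ceil(math.log(initial_error / eps) / math.log(1.0 / rho))


kappa, initial_error, eps = 100.0, 1.0, 1e-6

k_exact = iterations_needed(kappa, initial_error, eps)
# Large-kappa approximation: log(1 / rho_acc) ~ 2 / sqrt(kappa).
k_approx = math.sqrt(kappa) / 2.0 * math.log(initial_error / eps)

print(f"exact: {k_exact} iterations, approximation: {k_approx:.1f}")
```

For this choice the approximation and the exact count agree to within one iteration, consistent with the $O (\sqrt{κ} log (1 / ϵ))$ scaling.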

Appendix B.3. Practical Implementation Considerations

Lemma A5 (Numerical Stability). The regularization parameter ϵ in Definition A1 can be chosen as $ϵ = O (10^{- 6})$ without significantly affecting the convergence rate, provided the condition number κ is bounded.

Proof.

The regularization affects the strong convexity parameter by at most $ϵ / D_{max}$, which is negligible for practical choices of $ϵ$ relative to the client update diameter $D_{max}$. □

Corollary A3 (Distributed Implementation). The Weiszfeld algorithm can be implemented in a distributed manner where each iteration requires only $O (N)$ communication complexity to exchange the current iterate and compute the weighted average.

Proof.

Each iteration of the Weiszfeld algorithm requires computing the weighted average in (A16), which can be done with a single round of communication where each client sends its current update $θ_{i}^{t}$ and receives the current iterate $θ^{(k)}$. □

Appendix B.4. Robustness Under Approximate Computation

Theorem A3 (Robustness to Computation Errors). Suppose each Weiszfeld iteration is computed with additive error $e^{(k)}$ satisfying ${∥ e^{(k)} ∥}_{F} \leq δ$ for some $δ > 0$. Then the algorithm still converges to within $O (δ / μ)$ of the true geometric median ${\hat{θ}}_{ν}^{t *}$.

Proof.

The proof follows by modifying the contraction analysis in Lemma A3 to account for the additional error terms in each iteration. The modified iteration becomes:

θ^{(k + 1)} = T (θ^{(k)}) + e^{(k)} .

(A35)

The contraction property still holds with an additional bias term:

{∥ θ^{(k + 1)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} \leq ρ {∥ θ^{(k)} - {\hat{θ}}_{ν}^{t *} ∥}_{F} + δ .

(A36)

Taking the limit as $k \to \infty$ shows that the algorithm converges to within $δ / (1 - ρ) = O (δ / μ)$ of the true geometric median. □
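
The limit invoked in this proof follows from unrolling (A36) as a geometric series. Writing $e_{k} = {∥ θ^{(k)} - {\hat{θ}}_{ν}^{t *} ∥}_{F}$ and taking $ρ = 1 - μ / L$ from (A23), with L treated as a constant (an assumption made here to recover the $O (δ / μ)$ form):

```latex
\begin{aligned}
e_{k} &\leq ρ\, e_{k-1} + δ
       \leq ρ^{k} e_{0} + δ \sum_{j=0}^{k-1} ρ^{j}
       = ρ^{k} e_{0} + δ\, \frac{1 - ρ^{k}}{1 - ρ}, \\
\limsup_{k \to \infty} e_{k} &\leq \frac{δ}{1 - ρ} = \frac{δ L}{μ}.
\end{aligned}
```

The initial error is forgotten geometrically, while the per-iteration error δ accumulates to a floor that scales inversely with the strong convexity parameter.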

Appendix B.5. Integration with DEFEND Framework

Corollary A4 (Convergence in DEFEND Context). When integrated into the DEFEND framework, the Weiszfeld algorithm for computing ${\hat{θ}}_{ν}^{t}$ in (23) achieves the complexity bound from Theorem A2 at each communication round t, with the condition number κ determined by the spread of honest client updates.

Proof.

The condition number $κ$ in each communication round depends on the ratio $L / μ$, where L and $μ$ are determined by the distribution of client updates $\{θ_{1}^{t}, \dots, θ_{N}^{t}\}$. Under the assumptions of bounded client update variance, $κ$ remains bounded across communication rounds, ensuring consistent convergence performance. □


Figure 1. DEFEND Framework.

Figure 2. Training curves of different reinforcement learning algorithms for MDP-based defense policy optimization across various network architectures and data heterogeneity levels. The curves show cumulative reward over training episodes, where $α$ represents the Dirichlet concentration parameter controlling data distribution heterogeneity (lower values indicate higher heterogeneity).

Figure 3. Comparison of clean accuracy under defense across different reinforcement learning algorithms, network architectures, and data heterogeneity levels. The curves show each algorithm’s ability to preserve model utility on legitimate classification tasks while actively defending against temporal backdoor attacks during training.

Table 2. Experimental Parameters and Configuration Settings.

Parameter	Symbol	Value
Dataset and Client Configuration
Datasets	–	CIFAR-10, FEMNIST, MNIST
Number of clients	N	10, 20, 30, 40, 50
Data distribution	–	Non-IID
Dirichlet concentration	$α$	0.1, 0.5, 1.0
Malicious client ratio	$κ$	0.1, 0.2, 0.3, 0.4
Model architecture	–	MobileNet V2, ResNet-18
Federated Learning Parameters
Learning rate	$η$	0.01
Local epochs	–	5
Communication rounds	T	100
Batch size	–	32
Poisoning ratio	$ρ$	0.1
Defense Framework Parameters
Short window size	$w_{f}$	5
Long window size	$w_{s}$	10
Detection threshold	$τ_{α}$	0.8
Temporal regularization	$λ_{τ}$	0.1
Pattern templates	$|P_{Θ}|$	10
Geometric median tolerance	$ϵ$	$10^{- 6}$
Reputation decay scales	$K_{r}$	3
MDP and Reinforcement Learning Parameters
Discount factor	$γ$	0.95
Policy learning rate	–	0.001
Training episodes	–	500
Max steps per episode	–	1000
Exploration rate	$ϵ$	0.1
Experience buffer size	–	10,000
Target network update	–	1000 steps
Minimum participation	$N_{min}$	$⌈ 0.6 N ⌉$
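
The client configuration in Table 2 can be illustrated with a short sketch. It assumes the common per-class Dirichlet scheme for producing non-IID splits, which the table does not spell out, and the function names `dirichlet_partition` and `min_participation` are hypothetical helpers rather than part of DEFEND. Lower $α$ concentrates each class on fewer clients, and $N_{min} = ⌈ 0.6 N ⌉$ is computed in integer arithmetic to avoid floating-point rounding.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet(alpha) shares.

    Assumption: this is the widely used label-skew scheme; the source only
    states "Non-IID" with a Dirichlet concentration parameter.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Fraction of this class that each client receives.
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative shares into split points inside idx.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


def min_participation(num_clients):
    """Return ceil(0.6 * N) using exact integer arithmetic (0.6 = 3/5)."""
    return (3 * num_clients + 4) // 5


if __name__ == "__main__":
    # Synthetic labels for illustration only: 10 classes, 100 samples each.
    toy_labels = np.repeat(np.arange(10), 100)
    for alpha in (0.1, 0.5, 1.0):
        parts = dirichlet_partition(toy_labels, num_clients=10, alpha=alpha)
        print(alpha, [len(p) for p in parts])
    print([min_participation(n) for n in (10, 20, 30, 40, 50)])
```

Running the script prints the per-client sample counts for each $α$ from Table 2, which gives a quick sense of how uneven the splits become as $α$ decreases.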

Table 3. Defense Success Rate for ResNet-18 under varying client numbers and data heterogeneity ($κ = 0.2$).

N	$α = 0.1$	$α = 0.5$	$α = 1.0$
10	0.847 ± 0.023	0.892 ± 0.019	0.924 ± 0.015
20	0.863 ± 0.021	0.906 ± 0.017	0.938 ± 0.013
30	0.871 ± 0.019	0.915 ± 0.016	0.947 ± 0.012
40	0.876 ± 0.018	0.921 ± 0.015	0.952 ± 0.011
50	0.879 ± 0.017	0.925 ± 0.014	0.956 ± 0.010

Table 4. Temporal Detection Efficiency for ResNet-18 under varying client numbers and data heterogeneity ($κ = 0.2$).

N	$α = 0.1$	$α = 0.5$	$α = 1.0$
10	0.673 ± 0.041	0.721 ± 0.038	0.768 ± 0.033
20	0.695 ± 0.039	0.742 ± 0.035	0.789 ± 0.031
30	0.708 ± 0.037	0.755 ± 0.033	0.802 ± 0.029
40	0.716 ± 0.036	0.763 ± 0.032	0.811 ± 0.028
50	0.721 ± 0.035	0.769 ± 0.031	0.817 ± 0.027

Table 5. Defense Success Rate for ResNet-18 under varying client numbers and malicious ratios ($α = 0.5$).

N	$κ = 0.1$	$κ = 0.2$	$κ = 0.3$	$κ = 0.4$
10	0.943 ± 0.012	0.892 ± 0.019	0.834 ± 0.026	0.768 ± 0.032
20	0.957 ± 0.011	0.906 ± 0.017	0.847 ± 0.024	0.781 ± 0.030
30	0.964 ± 0.010	0.915 ± 0.016	0.856 ± 0.023	0.791 ± 0.029
40	0.969 ± 0.009	0.921 ± 0.015	0.863 ± 0.022	0.798 ± 0.028
50	0.972 ± 0.009	0.925 ± 0.014	0.867 ± 0.021	0.803 ± 0.027

Table 6. Temporal Detection Efficiency for ResNet-18 under varying client numbers and malicious ratios ($α = 0.5$).

N	$κ = 0.1$	$κ = 0.2$	$κ = 0.3$	$κ = 0.4$
10	0.825 ± 0.028	0.721 ± 0.038	0.612 ± 0.045	0.498 ± 0.052
20	0.841 ± 0.026	0.742 ± 0.035	0.634 ± 0.042	0.521 ± 0.049
30	0.852 ± 0.025	0.755 ± 0.033	0.649 ± 0.040	0.537 ± 0.047
40	0.859 ± 0.024	0.763 ± 0.032	0.658 ± 0.039	0.547 ± 0.046
50	0.864 ± 0.023	0.769 ± 0.031	0.664 ± 0.038	0.554 ± 0.045

Table 7. Defense Success Rate for MobileNet V2 under varying client numbers and data heterogeneity ($κ = 0.2$).

N	$α = 0.1$	$α = 0.5$	$α = 1.0$
10	0.821 ± 0.025	0.869 ± 0.021	0.908 ± 0.017
20	0.836 ± 0.023	0.883 ± 0.019	0.921 ± 0.015
30	0.845 ± 0.022	0.892 ± 0.018	0.930 ± 0.014
40	0.851 ± 0.021	0.898 ± 0.017	0.936 ± 0.013
50	0.855 ± 0.020	0.902 ± 0.016	0.940 ± 0.012

Table 8. Temporal Detection Efficiency for MobileNet V2 under varying client numbers and data heterogeneity ($κ = 0.2$).

N	$α = 0.1$	$α = 0.5$	$α = 1.0$
10	0.657 ± 0.043	0.703 ± 0.040	0.751 ± 0.036
20	0.678 ± 0.041	0.724 ± 0.038	0.772 ± 0.034
30	0.691 ± 0.039	0.737 ± 0.036	0.785 ± 0.032
40	0.699 ± 0.038	0.745 ± 0.035	0.793 ± 0.031
50	0.704 ± 0.037	0.751 ± 0.034	0.799 ± 0.030

Table 9. Defense Success Rate for MobileNet V2 under varying client numbers and malicious ratios ($α = 0.5$).

N	$κ = 0.1$	$κ = 0.2$	$κ = 0.3$	$κ = 0.4$
10	0.926 ± 0.014	0.869 ± 0.021	0.807 ± 0.028	0.739 ± 0.035
20	0.940 ± 0.013	0.883 ± 0.019	0.821 ± 0.026	0.753 ± 0.033
30	0.947 ± 0.012	0.892 ± 0.018	0.831 ± 0.025	0.763 ± 0.032
40	0.952 ± 0.011	0.898 ± 0.017	0.838 ± 0.024	0.770 ± 0.031
50	0.955 ± 0.010	0.902 ± 0.016	0.843 ± 0.023	0.775 ± 0.030

Table 10. Temporal Detection Efficiency for MobileNet V2 under varying client numbers and malicious ratios ($α = 0.5$).

N	$κ = 0.1$	$κ = 0.2$	$κ = 0.3$	$κ = 0.4$
10	0.809 ± 0.030	0.703 ± 0.040	0.591 ± 0.047	0.474 ± 0.054
20	0.825 ± 0.028	0.724 ± 0.038	0.613 ± 0.045	0.497 ± 0.052
30	0.836 ± 0.027	0.737 ± 0.036	0.628 ± 0.043	0.514 ± 0.050
40	0.843 ± 0.026	0.745 ± 0.035	0.637 ± 0.042	0.525 ± 0.049
50	0.848 ± 0.025	0.751 ± 0.034	0.644 ± 0.041	0.533 ± 0.048

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
