
Fair Federated Learning

Submitted: 05 September 2024. Posted: 06 September 2024.


Abstract
Optimization is critical in various fields such as smart vehicles and transportation. Federated Learning (FL) has emerged as an effective approach in the coordination of autonomous vehicles, but traditional methods such as FedAvg can create performance disparities across clients. This paper addresses this fairness issue through the $q$-Fair Federated Learning ($q$-FFL) framework, adjusting model performance across clients using a tunable fairness parameter. We propose a modified FedAvg algorithm for $q$-FFL that maintains comparable convergence rates, ensuring more balanced client outcomes. Additionally, we explore incentive mechanisms in FL using a Stackelberg game model, incorporating a fairness coefficient to encourage equitable participation. Building on prior works, we redefine client utility functions to address communication and computation costs, ensuring fair resource allocation. The proposed framework achieves both global and local fairness, maintaining a unique Nash equilibrium in the modified game setting.

1. Introduction

Optimization plays a crucial role in various domains such as smart homes [1,2], finance [3,4], transportation [5,6], and solar energy systems [7,8,9]. Recently, learning models have shown promising ability to solve complex optimization problems. Among them, most recent studies in Federated Learning (FL) have focused on designing optimization algorithms with proven convergence guarantees for the finite-sum objective
$$\min_{x \in \mathbb{R}^m} f(x) \triangleq \frac{1}{N} \sum_{i=1}^{N} f_i(x) \qquad (1)$$
where $N$ is the number of clients in a network and $f_i(x) \triangleq \mathbb{E}_{\zeta_i \sim D_i}[F_i(x;\zeta_i)]$ is the expected prediction loss of client $i$ given the model parameter $x$ and data distribution $D_i$ [10]. However, as [11] pointed out, naively minimizing the aggregate loss function may create disparities in model performance (i.e., prediction accuracy) across different clients. For example, it could result in overfitting the model to one client at the cost of another; the prediction accuracy would then be higher for the clients that contribute more data to train the model. To better describe the issue, [11] defines fairness of performance distribution as the uniformity of model performance across devices: a trained model $w$ is fairer than a model $\tilde{w}$ if the performance of $w$ on the $N$ devices is more uniform than that of $\tilde{w}$.
Inspired by the $\alpha$-fairness function used in resource allocation for wireless networks, [11] proposes an optimization objective called $q$-Fair Federated Learning ($q$-FFL) that addresses this fairness issue, i.e., the disproportionate model performance across devices. Unlike the objective in (1), $q$-FFL penalizes the loss functions of devices with a tunable parameter $q$, so that the model performance across clients in the network can be pushed toward uniformity to any desired extent.
The authors mention that with $q$-FFL as the optimization objective, FedAvg is no longer directly applicable, since the newly defined expected loss for client $i$, $\frac{1}{q+1} f_i^{q+1}(x)$, is not of the form $\mathbb{E}_{\zeta_i \sim D_i}[F_i(x;\zeta_i)]$, and thus the local SGD update $x_i^t \leftarrow x_i^{t-1} - \gamma G_i^t$ in FedAvg (where $G_i^t = \nabla F_i(x_i^{t-1};\zeta_i)$) cannot be used in the $q$-FFL setting.
But if we also change the per-sample loss from $F_i(x;\zeta_i)$ to $\frac{1}{q+1} F_i^{q+1}(x;\zeta_i)$ and perform the local update as $x_i^t \leftarrow x_i^{t-1} - \gamma H_i^t$ (where $H_i^t = G_i^t \cdot F_i^q(x;\zeta_i)$), the new objective can still fit in the FedAvg framework. (Notice that the equality $\mathbb{E}_{\zeta_i \sim D_i}[\frac{1}{q+1} F_i^{q+1}(x;\zeta_i)] = \frac{1}{q+1} f_i^{q+1}(x)$ is not true in general, so we need to find a proper $F_i(\cdot)$ that satisfies this equality to make it work.)
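To make the modified local step concrete, the following minimal Python sketch (function and variable names are ours; it assumes the per-sample loss and its gradient are supplied as callables) applies one update with the surrogate gradient $H_i^t = F_i^q \cdot G_i^t$:

```python
import numpy as np

def qffl_local_step(x, sample, F_i, grad_F_i, q, lr):
    """One local update x <- x - lr * H_i, with H_i = F_i(x; sample)^q * grad F_i(x; sample)."""
    loss = F_i(x, sample)          # F_i(x; zeta_i), assumed non-negative
    grad = grad_F_i(x, sample)     # G_i^t
    return x - lr * (loss ** q) * grad

# Tiny illustrative usage with a made-up quadratic per-sample loss F_i(x; a) = 0.5 ||x - a||^2:
F = lambda x, a: 0.5 * np.sum((x - a) ** 2)
gF = lambda x, a: x - a
x_new = qffl_local_step(np.array([1.0, 2.0]), np.array([0.0, 0.0]), F, gF, q=1.0, lr=0.1)
print(x_new)
```

With $q = 0$ the factor $F_i^q$ equals one, so the step reduces to the plain FedAvg SGD update.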
Assuming we can find such a suitable $F_i(\cdot)$, we are able to prove that the convergence rate of FedAvg on $q$-FFL is comparable with that of the same algorithm on the original finite-sum objective. Under standard assumptions on the objective functions $f_i(\cdot)$, [12] have shown that FedAvg for non-convex optimization achieves an $O(1/\sqrt{NT})$ rate. In this work, we show that the $q$-Fair Federated Learning objective with a FedAvg-like algorithm maintains the $O(1/\sqrt{NT})$ convergence rate under almost the same assumptions on $f_i(\cdot)$ (instead of assumptions on $f_i^{q+1}(\cdot)$).

2. q-Fair Federated Learning and Its Performance Analysis

For non-negative cost functions $f_i$ and parameter $q > 0$, the objective of $q$-FFL is defined as:
$$\min_{x \in \mathbb{R}^m} f_q(x) \triangleq \frac{1}{N} \sum_{i=1}^{N} \frac{1}{q+1} f_i^{q+1}(x) \qquad (2)$$
where $f_i^{q+1}(\cdot)$ denotes $f_i(\cdot)$ raised to the power of $q+1$, and $q > 0$ is the tunable fairness parameter. Notice that, similar to the $\alpha$-fairness framework, when $q = 0$ no fairness is imposed and problem (2) reduces to problem (1); when $q = +\infty$, it corresponds to max-min fairness.
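To see how the fairness parameter reweights clients, a small numeric illustration (the per-client losses below are made up) evaluates the $q$-FFL objective and each client's relative contribution for a few values of $q$:

```python
import numpy as np

losses = np.array([0.2, 0.5, 2.0])   # hypothetical per-client losses f_i(x)

for q in [0.0, 1.0, 5.0]:
    terms = losses ** (q + 1) / (q + 1)   # (1/(q+1)) f_i^{q+1}(x)
    objective = terms.mean()              # f_q(x) in problem (2)
    share = terms / terms.sum()           # relative weight of each client in the objective
    print(f"q={q}: objective={objective:.3f}, client shares={np.round(share, 3)}")
# As q grows, the client with the largest loss dominates the objective, so minimizing it
# pushes performance toward uniformity (max-min fairness in the limit q -> infinity).
```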
Table 1. Notation summary.

| Symbol | Definition |
|---|---|
| $q$ | Fairness parameter |
| $N$ | Number of clients |
| $D_i$ | Set of data at client $i$ |
| $\zeta_i$ | Example drawn from $D_i$ |
| $F_i(x;\zeta_i)$ | Loss on example $\zeta_i$ at client $i$ with model parameter $x$ |
| $f_i(x)$ | Expected loss at client $i$ with model parameter $x$, i.e., $\mathbb{E}_{\zeta_i \sim D_i}[F_i(x;\zeta_i)]$ |
| $g_i(x)$ | Expected $q$-FFL loss at client $i$ with model parameter $x$, i.e., $\frac{1}{q+1} f_i^{q+1}(x)$ |
| $f(x)$ | Objective function of federated learning, i.e., $\frac{1}{N}\sum_{i=1}^{N} f_i(x)$ |
| $g(x)$ | Objective function of $q$-fair federated learning, i.e., $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{q+1} f_i^{q+1}(x)$ |
| $G_i^t$ | Stochastic gradient of $F_i$ at $x_i^{t-1}$ with example $\zeta_i^t$, i.e., $\nabla F_i(x_i^{t-1};\zeta_i^t)$ |
| $H_i^t$ | $G_i^t \cdot F_i^q(x_i^{t-1};\zeta_i^t)$ |
Throughout this paper, we assume problem (2) satisfies the following assumption.
Assumption 1. We assume that each $f_i(\cdot)$ satisfies:
  • Smoothness: each $f_i$ is smooth with modulus $L$, i.e., for any $x, y \in \mathbb{R}^m$, $\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\|$.
  • Bounded variance and second moments: the stochastic gradient $\nabla F_i(\cdot)$ has bounded variance and second moments, i.e., there exist constants $\sigma > 0$ and $G > 0$ such that
    $$\mathbb{E}_{\zeta_i \sim D_i}\|\nabla F_i(x;\zeta_i) - \nabla f_i(x)\|^2 \le \sigma^2, \quad \forall x, i$$
    $$\mathbb{E}_{\zeta_i \sim D_i}\|\nabla F_i(x;\zeta_i)\|^2 \le G^2, \quad \forall x, i$$
  • Bounded loss function: there exists $M > 0$ such that $f_i(x) \le M$ for all $x \in X$, where $X \subseteq \mathbb{R}^m$ is a non-empty compact set.
We assume the data are independent and identically distributed (IID). Each worker can compute an unbiased stochastic gradient (on the previous iterate $x_i^{t-1}$ and data sample $\zeta_i^t$) given by
$$H_i^t = G_i^t \cdot F_i^q(x_i^{t-1};\zeta_i^t)$$
For simplicity, we denote the expected loss for device $i$ by $g_i(\cdot)$, i.e., $g_i(x) \triangleq \frac{1}{q+1} f_i^{q+1}(x)$, so that the conditional expectation of the stochastic gradient is $\mathbb{E}[H_i^t \mid \zeta^{[t-1]}] = \nabla g_i(x_i^{t-1}) = f_i^q(x_i^{t-1})\, \nabla f_i(x_i^{t-1})$, where $\zeta^{[t-1]} \triangleq \{\zeta_i^\tau\}_{i \in \{1,2,\dots,N\},\, \tau \in \{1,\dots,t-1\}}$ denotes all the random samples used to calculate stochastic gradients up to iteration $t-1$.
The FedAvg-like algorithm with q-FFL objective is described in Algorithm 1.
Algorithm 1: FedAvg-like algorithm with $q$-fairness. Here, $i$ is the client index, $K$ is the number of local epochs, and $s$ is the learning rate.
Algorithm 1 is essentially the same as FedAvg described in [13], except that the stochastic gradient $H_i^t$ is computed on $g_i(\cdot)$, the expected loss for the newly defined $q$-FFL objective.
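Since the pseudocode figure for Algorithm 1 did not survive extraction, the following minimal Python sketch (our own function and variable names; the per-sample loss, its gradient, and the minibatch sampler are assumed to be provided as callables) illustrates the procedure with full client participation:

```python
import numpy as np

def qffl_fedavg(x0, clients, sample_batch, F, grad_F, q, s, K, rounds):
    """FedAvg-like training on the q-FFL objective.

    x0          : initial model parameters (numpy array)
    clients     : list of client datasets D_i
    sample_batch(D) -> minibatch drawn from client data D
    F(x, batch), grad_F(x, batch) : per-sample loss and its gradient
    q : fairness parameter, s : learning rate, K : local steps per round
    """
    x_global = x0.copy()
    for _ in range(rounds):
        local_models = []
        for D_i in clients:
            x_i = x_global.copy()
            for _ in range(K):                      # K local epochs
                batch = sample_batch(D_i)
                G = grad_F(x_i, batch)              # G_i^t
                H = (F(x_i, batch) ** q) * G        # H_i^t = F_i^q * G_i^t
                x_i = x_i - s * H                   # local SGD step on g_i
            local_models.append(x_i)
        x_global = np.mean(local_models, axis=0)    # server averages the local solutions
    return x_global
```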
Fixing an iteration index $t$, we define
$$\bar{x}^t \triangleq \frac{1}{N} \sum_{i=1}^{N} x_i^t$$
as the average of the local solutions $x_i^t$ over all $N$ nodes. It is immediate that
$$\bar{x}^t = \bar{x}^{t-1} - s\, \frac{1}{N} \sum_{i=1}^{N} H_i^t = \bar{x}^{t-1} - s\, \frac{1}{N} \sum_{i=1}^{N} f_i^q(x_i^{t-1})\, G_i^t$$
The proof starts from the descent lemma and the $L$-smoothness assumption. We then bound the resulting quadratic and cross terms using the same technique as [12]. Lastly, through a telescoping sum, we obtain an upper bound on the (average) expected squared gradient norm. The following useful lemma relates the client drift $\mathbb{E}[\|\bar{x}^t - x_i^t\|^2]$ to the node synchronization interval $K$. Table 1 summarizes our notation.
Lemma 1 (Client drift). Under Assumption 1, the algorithm ensures
$$\mathbb{E}[\|x_i^t - \bar{x}^t\|^2] \le 4 s^2 K^2 G^2 M^{2q}, \quad \forall i, t$$
where $s$ is the constant stepsize, $G$ and $M$ are the constants defined in Assumption 1, and $q$ is the fairness parameter.
Proof. 
Let $t_0$ be the index of the most recent synchronization step before $t$ (so that $t - t_0 \le K$) and let $\bar{y} = \bar{x}^{t_0}$ denote the synchronized model at that step. Then
$$x_i^t = \bar{y} - s \sum_{\tau=t_0+1}^{t} H_i^\tau$$
$$\bar{x}^t = \frac{1}{N}\sum_{i=1}^{N} x_i^t = \bar{y} - s \sum_{\tau=t_0+1}^{t} \frac{1}{N}\sum_{i=1}^{N} H_i^\tau$$
Thus, we have
$$\begin{aligned}
\mathbb{E}[\|x_i^t - \bar{x}^t\|^2] &= \mathbb{E}\Big[\Big\| s \sum_{\tau=t_0+1}^{t} \frac{1}{N}\sum_{i=1}^{N} H_i^\tau - s \sum_{\tau=t_0+1}^{t} H_i^\tau \Big\|^2\Big] = s^2\, \mathbb{E}\Big[\Big\| \sum_{\tau=t_0+1}^{t} \frac{1}{N}\sum_{i=1}^{N} H_i^\tau - \sum_{\tau=t_0+1}^{t} H_i^\tau \Big\|^2\Big] \\
&\le 2 s^2\, \mathbb{E}\Big[\Big\|\sum_{\tau=t_0+1}^{t} \frac{1}{N}\sum_{i=1}^{N} H_i^\tau\Big\|^2 + \Big\|\sum_{\tau=t_0+1}^{t} H_i^\tau\Big\|^2\Big] \\
&\le 2 s^2 (t - t_0)\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t} \Big\|\frac{1}{N}\sum_{i=1}^{N} H_i^\tau\Big\|^2 + \sum_{\tau=t_0+1}^{t} \|H_i^\tau\|^2\Big] \\
&\le 2 s^2 (t - t_0)\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t} \Big(\frac{1}{N}\sum_{i=1}^{N} \|H_i^\tau\|^2\Big) + \sum_{\tau=t_0+1}^{t} \|H_i^\tau\|^2\Big] \\
&= 2 s^2 (t - t_0)\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t} \Big(\frac{1}{N}\sum_{i=1}^{N} f_i^{2q}(x_i^{\tau-1})\, \|G_i^\tau\|^2\Big) + \sum_{\tau=t_0+1}^{t} f_i^{2q}(x_i^{\tau-1})\, \|G_i^\tau\|^2\Big] \\
&\le 4 s^2 K^2 G^2 M^{2q}
\end{aligned}$$
where the second and third inequalities use $\|a - b\|^2 \le 2\|a\|^2 + 2\|b\|^2$ and Jensen's inequality, and the last step uses $t - t_0 \le K$ together with Assumption 1.
   □
Theorem 1. Under Assumption 1, if $0 < s \le \frac{1}{L}$ in the Algorithm, then for all $T \ge 1$, we have
$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \le \frac{2}{sT}\big(g(\bar{x}^0) - f^*\big) + 4 s^2 K^2 G^2 L^2 M^{4q} + \frac{L s \sigma^2 M^{2q}}{N}$$
where $f^*$ is the minimum value of problem (2).
Proof. 
From $L$-smoothness and the descent lemma:
$$\mathbb{E}[g(\bar{x}^t)] \le \mathbb{E}[g(\bar{x}^{t-1})] + \mathbb{E}[\langle \nabla g(\bar{x}^{t-1}),\, \bar{x}^t - \bar{x}^{t-1}\rangle] + \frac{L}{2}\,\mathbb{E}[\|\bar{x}^t - \bar{x}^{t-1}\|^2] \qquad (3)$$
For the quadratic term,
$$\begin{aligned}
\mathbb{E}[\|\bar{x}^t - \bar{x}^{t-1}\|^2] &= s^2\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N} H_i^t\Big\|^2\Big] \\
&= \frac{s^2}{N^2}\sum_{i=1}^{N}\mathbb{E}[\|H_i^t - \nabla g_i(x_i^{t-1})\|^2] + s^2\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N}\nabla g_i(x_i^{t-1})\Big\|^2\Big] \\
&= \frac{s^2}{N^2}\sum_{i=1}^{N}\mathbb{E}\big[\|f_i^q(x_i^{t-1})\,(G_i^t - \nabla f_i(x_i^{t-1}))\|^2\big] + s^2\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] \\
&\le \frac{s^2 M^{2q}}{N^2}\sum_{i=1}^{N}\mathbb{E}[\|G_i^t - \nabla f_i(x_i^{t-1})\|^2] + s^2 M^{2q}\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N}\nabla f_i(x_i^{t-1})\Big\|^2\Big] \\
&\le \frac{s^2 \sigma^2 M^{2q}}{N} + s^2 M^{2q}\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N}\nabla f_i(x_i^{t-1})\Big\|^2\Big]
\end{aligned} \qquad (4)$$
For the cross term:
$$\begin{aligned}
\mathbb{E}[\langle \nabla g(\bar{x}^{t-1}),\, \bar{x}^t - \bar{x}^{t-1}\rangle] &= -s\, \mathbb{E}\Big[\Big\langle \nabla g(\bar{x}^{t-1}),\, \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\rangle\Big] \\
&= -\frac{s}{2}\, \mathbb{E}\Big[\|\nabla g(\bar{x}^{t-1})\|^2 + \Big\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2 - \Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big]
\end{aligned} \qquad (5)$$
Substituting (4) and (5) into (3) yields
$$\begin{aligned}
\mathbb{E}[g(\bar{x}^t)] \le{}& \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s - s^2 L}{2}\, \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] - \frac{s}{2}\, \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \\
&+ \frac{s}{2}\, \mathbb{E}\Big[\Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] + \frac{L s^2 \sigma^2 M^{2q}}{2N}
\end{aligned} \qquad (6)$$
$$\begin{aligned}
\mathbb{E}\Big[\Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] &= \mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(\bar{x}^{t-1})\,\nabla f_i(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] \\
&\le \frac{1}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|f_i^q(\bar{x}^{t-1})\,\nabla f_i(\bar{x}^{t-1}) - f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\|^2\Big] \\
&\le \frac{M^{2q}}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})\|^2\Big] \\
&\le \frac{M^{2q} L^2}{N}\sum_{i=1}^{N}\mathbb{E}[\|\bar{x}^{t-1} - x_i^{t-1}\|^2] \le 4 s^2 K^2 G^2 L^2 M^{4q}
\end{aligned} \qquad (7)$$
Plugging (7) back into (6) and letting $0 < s \le \frac{1}{L}$, we have
$$\mathbb{E}[g(\bar{x}^t)] \le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\, \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + 2 s^3 K^2 G^2 L^2 M^{4q} + \frac{L s^2 \sigma^2 M^{2q}}{2N} \qquad (8)$$
Dividing both sides of (8) by $s/2$ and rearranging terms yields
$$\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \le \frac{2}{s}\big(\mathbb{E}[g(\bar{x}^{t-1})] - \mathbb{E}[g(\bar{x}^t)]\big) + 4 s^2 K^2 G^2 L^2 M^{4q} + \frac{L s \sigma^2 M^{2q}}{N}$$
Summing over $t \in \{1, \dots, T\}$ and dividing both sides by $T$ yields
$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \le \frac{2}{sT}\big(g(\bar{x}^0) - \mathbb{E}[g(\bar{x}^T)]\big) + 4 s^2 K^2 G^2 L^2 M^{4q} + \frac{L s \sigma^2 M^{2q}}{N} \le \frac{2}{sT}\big(g(\bar{x}^0) - f^*\big) + 4 s^2 K^2 G^2 L^2 M^{4q} + \frac{L s \sigma^2 M^{2q}}{N}$$
   □
The next corollary follows by substituting suitable values of $s$ and $K$ into Theorem 1.
Corollary 1. Consider the problem under Assumption 1 and let $T \ge N$.
1. If we choose $s = \frac{\sqrt{N}}{L\sqrt{T}}$ in the Algorithm, then $\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \le \frac{2L}{\sqrt{NT}}\big(g(\bar{x}^0) - g^*\big) + \frac{4 N K^2 G^2 M^{4q}}{T} + \frac{\sigma^2 M^{2q}}{\sqrt{NT}}$.
2. If we further choose $K \le \frac{T^{1/4}}{N^{3/4}}$, then $\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \le \frac{2L}{\sqrt{NT}}\big(g(\bar{x}^0) - g^*\big) + \frac{4 G^2 M^{4q}}{\sqrt{NT}} + \frac{\sigma^2 M^{2q}}{\sqrt{NT}} = O\Big(\frac{1}{\sqrt{NT}}\Big)$, where $g^*$ is the minimum value of the problem.
Thus, under proper assumptions, FedAvg can achieve the same convergence rate on the objective of $q$-fair federated learning as on the original objective.
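For concreteness, a short worked substitution (our own arithmetic, applying the bound of Theorem 1 with the choices of $s$ and $K$ stated in Corollary 1) shows where the $O(1/\sqrt{NT})$ rate comes from:
$$\frac{2}{sT}\big(g(\bar{x}^0) - g^*\big) = \frac{2L}{\sqrt{NT}}\big(g(\bar{x}^0) - g^*\big), \qquad
4 s^2 K^2 G^2 L^2 M^{4q} = \frac{4 N K^2}{T}\, G^2 M^{4q} \le \frac{4 G^2 M^{4q}}{\sqrt{NT}}, \qquad
\frac{L s \sigma^2 M^{2q}}{N} = \frac{\sigma^2 M^{2q}}{\sqrt{NT}},$$
where the middle inequality uses $K \le T^{1/4}/N^{3/4}$, so that $K^2 \le \sqrt{T}/N^{3/2}$.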

3. Fair Incentive Mechanism Design Through Stackelberg Game Settings

Most studies in Federated Learning (FL) have focused on the convergence rate and stationarity gap of optimization algorithms. However, designing mechanisms that motivate local devices, known as clients in FL, to collaborate in global model training has not received enough attention. [14] devised an incentive mechanism in a Stackelberg game setting and proved the existence of a unique Nash equilibrium.
In the first stage of the Stackelberg game, the parameter server (leader) strives to convince clients (followers) to participate in the training of a global model by declaring a total payment $\tau > 0$. In the second stage, the participants decide how much local data they are going to train on. [14] proved that for any $\tau$, the second-stage game has a unique Nash equilibrium, and furthermore, there exists a unique Stackelberg equilibrium for the game. The edge nodes also incur two costs, a communication cost $C_n^{com}$ and a computation cost $C_n^{cmp}$, which depend on their training data set size $x_n$.
Based on the works of [14] and [15], we add a fairness coefficient to the utility functions of clients in a federated learning incentive mechanism, with the hope that this term leads to a fairer allocation of resources to clients. [15] argues that the global fairness of a model is measured over the full dataset across all clients, whereas local fairness is measured on each client's own data set, which is typically non-IID. To address this issue, they propose a global and a local fairness metric, i.e.,
$$F_{global} = \Pr(\hat{Y} = 1 \mid A = 0, Y = 1) - \Pr(\hat{Y} = 1 \mid A = 1, Y = 1)$$
where $\hat{Y}$ is the prediction of the trained classifier and $A$ is a sensitive group attribute, say gender. The ideal value of this metric is zero, meaning that the true positive rate should be equal regardless of gender. The local fairness metric is defined as:
$$F_k = \Pr(\hat{Y} = 1 \mid A = 0, Y = 1, C = k) - \Pr(\hat{Y} = 1 \mid A = 1, Y = 1, C = k)$$
where $C = k$ indicates the $k$-th client, so $F_k$ is computed on the $k$-th client's data set and distribution [15]. Furthermore, in FedAvg [13], the conventional global update uses the aggregation weights $w_k^t = \frac{n_k}{\sum_k n_k}$. In particular, [15] argues that these naive aggregation weights disfavour clients with smaller data sets, and proposes the following modified aggregation weights to achieve global fairness:
$$w_k^t = \exp\big(\beta\,|F_k^t - F_{global}^t|\big)\,\frac{n_k}{\sum_k n_k}$$
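A minimal sketch of computing these aggregation weights (the helper name and the final renormalization so the weights sum to one are our own choices, and the weight form follows the expression above as written):

```python
import numpy as np

def fairness_adjusted_weights(n_k, F_k, F_global, beta):
    """Aggregation weights combining data-size weights with the local-global fairness gap.

    n_k      : array of client dataset sizes
    F_k      : array of local fairness metrics F_k^t
    F_global : global fairness metric F_global^t
    beta     : fairness sensitivity coefficient
    """
    base = n_k / n_k.sum()                            # standard FedAvg weights n_k / sum_k n_k
    w = np.exp(beta * np.abs(F_k - F_global)) * base  # fairness-adjusted weights
    return w / w.sum()                                # renormalize so the weights sum to 1

# Example with made-up numbers:
w = fairness_adjusted_weights(np.array([100, 400, 1500]),
                              np.array([0.05, 0.20, 0.01]), F_global=0.03, beta=2.0)
print(np.round(w, 3))
```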
We adopt this approach and integrate it into the framework of [14]. The notation and the proof of the existence of a Nash equilibrium in the modified game are presented in what follows.
The utility of each client is modified by incorporating a fairness coefficient, i.e.,
$$u_n(x_n, x_{-n}) = \exp(\gamma_n)\,\frac{x_n}{\sum_i x_i}\,\tau - C_n^{com}\, x_n - C_n^{cmp}\, x_n$$
where $\gamma_n = \beta\,|F_k^t - F_{global}^t|$. To maximize the utility of each client, we take the derivative of the utility function with respect to $x_n$ and set it equal to zero, i.e.,
$$u_n'(x_n, x_{-n}) = \exp(\gamma_n)\,\frac{\sum_i x_i - x_n}{\big(\sum_i x_i\big)^2}\,\tau - C_n^{com} - C_n^{cmp} = 0$$
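As a worked intermediate step (our own algebra), solving this first-order condition for $x_n$, with $\sum_{i\ne n} x_i$ denoting the other clients' total contribution, gives
$$\exp(\gamma_n)\,\tau\,\frac{\sum_{i\ne n} x_i}{\big(x_n + \sum_{i\ne n} x_i\big)^2} = C_n^{com} + C_n^{cmp}
\;\Longrightarrow\;
x_n + \sum_{i\ne n} x_i = \sqrt{\frac{\tau\,\exp(\gamma_n)\,\sum_{i\ne n} x_i}{C_n^{com} + C_n^{cmp}}},$$
which rearranges to the closed form stated below, after the concavity check.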
To ensure that the solutions of the above equation are not saddle points, we compute the second derivative with respect to $x_n$, i.e.,
$$u_n''(x_n, x_{-n}) = \exp(\gamma_n)\,\frac{\big(-2\sum_i x_i\big)\big(\sum_i x_i - x_n\big)}{\big(\sum_i x_i\big)^4}\,\tau < 0$$
This means that the utility function is strictly concave in $x_n$ and the stationary points are global maxima, i.e.,
$$x_n = \sqrt{\frac{\tau\,\exp(\gamma_n)\,\sum_{i\ne n} x_i}{C_n^{com} + C_n^{cmp}}} - \sum_{i\ne n} x_i$$
However, there is a limitation on each client's contribution: $x_n$ cannot exceed $d_n$, and it also depends on $\tau$. Hence, we have:
$$x_n = \begin{cases} 0, & \tau < \dfrac{\big(\sum_{i\ne n} x_i\big)\big(C_n^{com} + C_n^{cmp}\big)}{\exp(\gamma_n)} \\[2ex] \sqrt{\dfrac{\tau\,\exp(\gamma_n)\,\sum_{i\ne n} x_i}{C_n^{com} + C_n^{cmp}}} - \sum_{i\ne n} x_i, & x_n \in (0, d_n) \\[2ex] d_n, & \text{otherwise} \end{cases}$$
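A minimal sketch of this best-response rule as a Python function (the function name and the clipping of the interior solution to $[0, d_n]$ are our own implementation choices):

```python
import numpy as np

def best_response(x_others_sum, tau, gamma_n, C_com, C_cmp, d_n):
    """Best response x_n of a client given the other clients' total contribution.

    x_others_sum : sum_{i != n} x_i
    tau          : total payment announced by the server
    gamma_n      : fairness coefficient beta * |F_k^t - F_global^t|
    C_com, C_cmp : unit communication and computation costs
    d_n          : client n's data budget (upper bound on x_n)
    """
    cost = C_com + C_cmp
    if tau < x_others_sum * cost / np.exp(gamma_n):
        return 0.0                                    # participating is not profitable
    x_n = np.sqrt(tau * np.exp(gamma_n) * x_others_sum / cost) - x_others_sum
    return float(min(max(x_n, 0.0), d_n))             # clip to the feasible range [0, d_n]
```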
Then, one can derive the best responses of the followers (clients). At an interior equilibrium, for each client $m$ the total contribution satisfies:
$$\sum_n x_n^* = \sqrt{\frac{\tau\,\big(\sum_{n\ne m} x_n^*\big)\,\exp(\gamma_m)}{C_m^{com} + C_m^{cmp}}}$$
Then, writing $a \triangleq \sum_{n} x_n^*$ and squaring both sides, observe that
$$x_1^* = a - \frac{a^2\,\big(C_1^{com} + C_1^{cmp}\big)}{\tau\,\exp(\gamma_1)}$$
and likewise for every client up to the $M$-th,
$$x_M^* = a - \frac{a^2\,\big(C_M^{com} + C_M^{cmp}\big)}{\tau\,\exp(\gamma_M)}$$
Summing both sides of these equations over clients $1$ to $M$ yields:
$$a = M a - \sum_{n=1}^{M} \frac{a^2\,\big(C_n^{com} + C_n^{cmp}\big)}{\tau\,\exp(\gamma_n)}$$
implying that,
$$a = \frac{M - 1}{\sum_{n=1}^{M} \frac{C_n^{com} + C_n^{cmp}}{\tau\,\exp(\gamma_n)}}$$
Finally, observe that:
$$x_m^* = \frac{M - 1}{\sum_{n=1}^{M} \frac{C_n^{com} + C_n^{cmp}}{\tau\,\exp(\gamma_n)}} - \left(\frac{M - 1}{\sum_{n=1}^{M} \frac{C_n^{com} + C_n^{cmp}}{\tau\,\exp(\gamma_n)}}\right)^{2} \cdot \frac{C_m^{com} + C_m^{cmp}}{\tau\,\exp(\gamma_m)}$$
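The closed-form equilibrium contributions can be evaluated directly; the following sketch (our own function name and made-up numbers; it covers only the interior case, without the $[0, d_n]$ clipping above) implements the formula:

```python
import numpy as np

def equilibrium_contributions(tau, gamma, C_com, C_cmp):
    """Interior Nash-equilibrium contributions x_m^* for all M clients.

    tau          : total payment announced by the server
    gamma        : array of fairness coefficients gamma_n
    C_com, C_cmp : arrays of per-client communication / computation costs
    """
    M = len(gamma)
    S = np.sum((C_com + C_cmp) / (tau * np.exp(gamma)))    # sum_n (C_n^com + C_n^cmp) / (tau exp(gamma_n))
    a = (M - 1) / S                                         # total contribution a = sum_n x_n^*
    return a - a**2 * (C_com + C_cmp) / (tau * np.exp(gamma))

x_star = equilibrium_contributions(tau=10.0,
                                   gamma=np.array([0.1, 0.3, 0.0]),
                                   C_com=np.array([0.2, 0.1, 0.3]),
                                   C_cmp=np.array([0.1, 0.1, 0.1]))
print(np.round(x_star, 3))   # the contributions sum to a
```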
So far, we have found the best responses of the followers. At this point, the leader knows that, whatever the payment is, there exists a unique equilibrium; the leader then strives to obtain the maximum profit by optimally tuning $\tau$, i.e., by maximizing the parameter server's utility function $u(\tau) = \lambda\, g\big(\sum_n x_n^*\big) - \tau$ as defined in [14].
$$\frac{\partial u(\tau)}{\partial \tau} = \lambda\,\frac{\partial g(X)}{\partial X}\,\frac{\partial X}{\partial \tau} - 1$$
where $X = \sum_n x_n^*$, and we have that
$$\frac{\partial x_m^*}{\partial \tau} = \frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}} - \left(\frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}}\right)^{2} \frac{C_m^{com} + C_m^{cmp}}{\exp(\gamma_m)}$$
And,
$$\frac{\partial^2 x_m^*}{\partial \tau^2} = 0$$
Hence, we have,
$$\frac{\partial u(\tau)}{\partial \tau} = \lambda\,\frac{\partial g(X)}{\partial X}\,\sum_{m=1}^{M}\left[\frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}} - \left(\frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}}\right)^{2} \frac{C_m^{com} + C_m^{cmp}}{\exp(\gamma_m)}\right] - 1$$
Given that $\frac{\partial^2 g(X)}{\partial X^2} < 0$, we have
$$\frac{\partial^2 u(\tau)}{\partial \tau^2} = \lambda\,\frac{\partial^2 g(X)}{\partial X^2}\left(\sum_{m=1}^{M}\left[\frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}} - \left(\frac{M - 1}{\sum_{n} \frac{C_n^{com} + C_n^{cmp}}{\exp(\gamma_n)}}\right)^{2} \frac{C_m^{com} + C_m^{cmp}}{\exp(\gamma_m)}\right]\right)^{2} < 0$$
Since $u(\tau)$ is a concave function and its value is zero at $\tau = 0$, it has a unique maximizer, and the proof is complete.
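Because $u(\tau)$ is concave, the leader's optimal payment can be found by a simple one-dimensional search. The sketch below is illustrative only: the concave valuation $g$ is not specified in the text, so we assume $g(X) = \log(1 + X)$, and all numeric values are made up.

```python
import numpy as np

def equilibrium_total(tau, gamma, C_com, C_cmp):
    """Total interior equilibrium contribution X(tau) = sum_n x_n^*, as derived above."""
    M = len(gamma)
    S = np.sum((C_com + C_cmp) / (tau * np.exp(gamma)))
    return (M - 1) / S

def leader_utility(tau, lam, gamma, C_com, C_cmp, g=np.log1p):
    """u(tau) = lambda * g(X(tau)) - tau, with an assumed concave valuation g."""
    return lam * g(equilibrium_total(tau, gamma, C_com, C_cmp)) - tau

gamma = np.array([0.1, 0.3, 0.0])
C_com = np.array([0.2, 0.1, 0.3]); C_cmp = np.array([0.1, 0.1, 0.1])
taus = np.linspace(0.1, 50.0, 1000)                      # grid search over candidate payments
utils = [leader_utility(t, lam=5.0, gamma=gamma, C_com=C_com, C_cmp=C_cmp) for t in taus]
print("approximately optimal tau:", round(float(taus[int(np.argmax(utils))]), 2))
```

Since $u$ is concave with $u(0) = 0$, the grid maximum approximates the unique Stackelberg payment.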

4. Summary

In this project, we studied the fairness issue in federated learning. Two different notions of fairness were examined: (1) fair (uniform) model performance across local clients and (2) a fair incentive mechanism. To achieve the first type of fairness, a novel optimization objective, $q$-FFL, was introduced. We tried to fit this federated learning task into the FedAvg framework and derived the convergence rate.
At first, we thought we had proven it by following a procedure similar to previous literature. However, after a careful examination of the new objective function, we realized that some parts of the proof procedure of [12] cannot be used in this case due to the change of objective. For example, we do not know $\mathbb{E}_{\zeta_i \sim D_i}[\frac{1}{q+1} F_i^{q+1}(x;\zeta_i)]$ when only knowing $\mathbb{E}_{\zeta_i \sim D_i}[F_i(x;\zeta_i)] = f_i(x)$. Therefore, we could not use plain SGD for the local update as in FedAvg: either we need more assumptions, or we need to find a suitable structure of $F_i(x)$ to make Theorem 1 hold.
As for fairness in the incentive mechanism, we modeled the training process as a Stackelberg game in which the server is the leader and the clients are followers. Once the server announces a payment for training the global model, the followers decide how much data (and effort) they would like to contribute to the training task in order to maximize their utility. We have shown that the system reaches a unique Stackelberg equilibrium.

Appendix A

This part gives an alternative upper bound, which is looser than the bound proved above. The technique used here can be applied whenever one wants to bound each loss function differently. In [11], the Lipschitz coefficient of the $q$-FFL objective is local and defined at every point $x$ as
$$L_q = L\, f^q(x) + q\, f^{q-1}(x)\, \|\nabla f(x)\|^2$$
They further assume a step size $s = 1/L_q$ for every local agent. Considering either such a dynamic Lipschitz coefficient or such a dynamic step size makes it very difficult to simplify the terms above. What we assume instead is a fixed stepsize $s$ and a Lipschitz constant defined at a fixed point such as $\bar{x}^T$.
Assumption 2. Assume that the functions $f_i$ are bounded such that $\max_{i,x} f_i^q(x) \le M$ (this could be generalized to a specific bound on each loss function).
Under Assumption 2, the client drift is simplified as follows:
$$\begin{aligned}
\mathbb{E}[\|x_i^t - \bar{x}^t\|^2] &\le 2 s^2 (t - t_0)\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t}\Big(\frac{1}{N}\sum_{i=1}^{N}\|H_i^\tau\|^2\Big) + \sum_{\tau=t_0+1}^{t}\|H_i^\tau\|^2\Big] \\
&= 2 s^2 (t - t_0)\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t}\Big(\frac{1}{N}\sum_{i=1}^{N}\|f_i^q(x_i^{\tau-1})\, G_i^\tau\|^2\Big) + \sum_{\tau=t_0+1}^{t}\|f_i^q(x_i^{\tau-1})\, G_i^\tau\|^2\Big] \\
&\le 2 s^2 (t - t_0)\, M^2\, \mathbb{E}\Big[\sum_{\tau=t_0+1}^{t}\Big(\frac{1}{N}\sum_{i=1}^{N}\|G_i^\tau\|^2\Big) + \sum_{\tau=t_0+1}^{t}\|G_i^\tau\|^2\Big] \\
&\le 4 M^2 s^2 K^2 G^2
\end{aligned}$$
Previously, we showed that
$$\begin{aligned}
&\mathbb{E}\Big[\Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] \\
&\quad \le \frac{1}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N} f_i^{2q}(\bar{x}^{t-1})\,\|\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})\|^2\Big] + \frac{1}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\big(f_i^q(\bar{x}^{t-1}) - f_i^q(x_i^{t-1})\big)^2\,\|\nabla f_i(x_i^{t-1})\|^2\Big]
\end{aligned}$$
We know that
$$\mathbb{E}\Big[\sum_{i=1}^{N} f_i^{2q}(\bar{x}^{t-1})\,\|\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})\|^2\Big] \le M^2 L^2 \sum_{i=1}^{N}\mathbb{E}[\|\bar{x}^{t-1} - x_i^{t-1}\|^2] \le 4 N M^2 s^2 K^2 G^2 L^2$$
Subsequently,
$$\mathbb{E}\Big[\Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] \le 4 M^2 s^2 K^2 G^2 L^2 + \frac{1}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\big(f_i^q(\bar{x}^{t-1}) - f_i^q(x_i^{t-1})\big)^2\,\|\nabla f_i(x_i^{t-1})\|^2\Big]$$
Observe that
$$\big(f_i^q(\bar{x}^{t-1}) - f_i^q(x_i^{t-1})\big)^2 \le M^2$$
Thereby,
$$\mathbb{E}\Big[\Big\|\nabla g(\bar{x}^{t-1}) - \frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] \le 4 M^2 s^2 K^2 G^2 L^2 + \frac{M^2}{N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big]$$
Observe that now:
$$\begin{aligned}
\mathbb{E}[g(\bar{x}^t)] \le{}& \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s - s^2 L}{2}\,\mathbb{E}\Big[\Big\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\Big\|^2\Big] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] \\
&+ 2 M^2 s^3 K^2 G^2 L^2 + \frac{s M^2}{2N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big] + \frac{L s^2 \sigma^2}{2 N^2}\sum_{i=1}^{N}\mathbb{E}[f_i^{2q}(x_i^{t-1})]
\end{aligned}$$
Now, we simplify the term $-\frac{s - s^2 L}{2}\,\mathbb{E}[\|\frac{1}{N}\sum_{i=1}^{N} f_i^q(x_i^{t-1})\,\nabla f_i(x_i^{t-1})\|^2]$ assuming that $s < \frac{1}{L}$. Since $s < \frac{1}{L}$, we have $\frac{1}{s} > L$, and thus $\frac{s^2 L}{2} - \frac{s}{2} < \frac{s^2}{2}\cdot\frac{1}{s} - \frac{s}{2} = 0$. Consequently, by dropping this negative term, we obtain:
$$\begin{aligned}
\mathbb{E}[g(\bar{x}^t)] &\le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + 2 M^2 s^3 K^2 G^2 L^2 + \frac{s M^2}{2N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big] + \frac{L s^2 \sigma^2}{2 N^2}\sum_{i=1}^{N}\mathbb{E}[f_i^{2q}(x_i^{t-1})] \\
&\le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + 2 M^2 s^3 K^2 G^2 L^2 + \frac{s M^2}{2N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big] + \frac{L M^2 s^2 \sigma^2}{2 N}
\end{aligned} \qquad (11)$$
To simplify $\frac{s M^2}{2N}\,\mathbb{E}[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2]$, we use $\|a - b\|^2 \le 2\|a\|^2 + 2\|b\|^2$. Observe that
$$\frac{s M^2}{2N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big] = \frac{s M^2}{2N}\sum_{i=1}^{N}\mathbb{E}\big[\|(\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})) - \nabla f_i(\bar{x}^{t-1})\|^2\big] \le \frac{s M^2}{N}\sum_{i=1}^{N}\mathbb{E}[\|\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})\|^2] + \frac{s M^2}{N}\sum_{i=1}^{N}\mathbb{E}[\|\nabla f_i(\bar{x}^{t-1})\|^2] \qquad (12)$$
Due to the client drift, we have:
$$\frac{s M^2}{N}\sum_{i=1}^{N}\mathbb{E}[\|\nabla f_i(\bar{x}^{t-1}) - \nabla f_i(x_i^{t-1})\|^2] \le 4 L^2 s^3 M^2 K^2 G^2 \qquad (13)$$
Rewriting (11) using (12) and (13), we obtain:
$$\begin{aligned}
\mathbb{E}[g(\bar{x}^t)] &\le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + 2 M^2 s^3 K^2 G^2 L^2 + \frac{s M^2}{2N}\,\mathbb{E}\Big[\sum_{i=1}^{N}\|\nabla f_i(x_i^{t-1})\|^2\Big] + \frac{L M^2 s^2 \sigma^2}{2 N} \\
&\le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + \frac{s M^2}{N}\sum_{i=1}^{N}\mathbb{E}[\|\nabla f_i(\bar{x}^{t-1})\|^2] + 4 M^2 s^3 K^2 G^2 L^2 + 2 M^2 s^3 K^2 G^2 L^2 + \frac{L M^2 s^2 \sigma^2}{2 N}
\end{aligned} \qquad (14)$$
Due to [16], we have:
$$\frac{1}{N}\sum_{i}\mathbb{E}[\|\nabla f_i(x)\|^2] \le G^2 + B^2\,\mathbb{E}[\|\nabla f(x)\|^2]$$
We rewrite (14) using the above inequality, i.e.,
$$\mathbb{E}[g(\bar{x}^t)] \le \mathbb{E}[g(\bar{x}^{t-1})] - \frac{s}{2}\,\mathbb{E}[\|\nabla g(\bar{x}^{t-1})\|^2] + s M^2\big(G^2 + B^2\,\mathbb{E}[\|\nabla f(\bar{x}^{t-1})\|^2]\big) + 4 M^2 s^3 K^2 G^2 L^2 + 2 M^2 s^3 K^2 G^2 L^2 + \frac{L M^2 s^2 \sigma^2}{2 N}$$
Then, given an upper bound on $\mathbb{E}[\|\nabla f(\bar{x}^{t-1})\|^2]$, iterating over $t$ and dividing both sides by $T$ yields the desired (looser) upper bound. □

References

  1. Nematirad, R.; Ardehali, M.; Khorsandi, A.; Mahmoudian, A. Optimization of Residential Demand Response Program Cost with Consideration for Occupants Thermal Comfort and Privacy. IEEE Access 2024. [CrossRef]
  2. Talebi, A. A multi-objective mixed integer linear programming model for supply chain planning of 3D printing. 2024. arXiv:2408.05213.
  3. Varmaz, A.; Fieberg, C.; Poddig, T. Portfolio optimization for sustainable investments. Annals of Operations Research 2024, pp. 1–26.
  4. Talebi, A.; Haeri Boroujeni, S.P.; Razi, A. Integrating random regret minimization-based discrete choice models with mixed integer linear programming for revenue optimization. Iran Journal of Computer Science 2024, pp. 1–15.
  5. Archetti, C.; Peirano, L.; Speranza, M.G. Optimization in multimodal freight transportation problems: A Survey. European Journal of Operational Research 2022, 299, 1–20. [CrossRef]
  6. Talebi, A. Simulation in discrete choice models evaluation: SDCM, a simulation tool for performance evaluation of DCMs. 2024. arXiv:2407.17014.
  7. Nematirad, R.; Pahwa, A.; Natarajan, B. A Novel Statistical Framework for Optimal Sizing of Grid-Connected Photovoltaic–Battery Systems for Peak Demand Reduction to Flatten Daily Load Profiles. Solar. MDPI, 2024, Vol. 4, pp. 179–208.
  8. Nematirad, R.; Pahwa, A.; Natarajan, B.; Wu, H. Optimal sizing of photovoltaic-battery system for peak demand reduction using statistical models. Frontiers in Energy Research 2023, 11, 1297356. [CrossRef]
  9. Soleymani, S.; Talebi, A. Forecasting solar irradiance with geographical considerations: integrating feature selection and learning algorithms. Asian Journal of Social Science 2024, 8, 5.
  10. Talebi, A. Convergence Rate Analysis of Non-I.I.D. SplitFed Learning with Partial Worker Participation and Auxiliary Networks. Preprints 2024. [CrossRef]
  11. Li, T.; Sanjabi, M.; Beirami, A.; Smith, V. Fair Resource Allocation in Federated Learning. International Conference on Learning Representations, 2020.
  12. Yu, H.; Yang, S.; Zhu, S. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence 2019, 33, 5693–5700. [CrossRef]
  13. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. Artificial intelligence and statistics. PMLR, 2017, pp. 1273–1282.
  14. Zhan, Y.; Li, P.; Qu, Z.; Zeng, D.; Guo, S. A Learning-Based Incentive Mechanism for Federated Learning. IEEE Internet of Things Journal 2020, 7, 6360–6368. [CrossRef]
  15. Ezzeldin, Y.H.; Yan, S.; He, C.; Ferrara, E.; Avestimehr, S. Fairfed: Enabling group fairness in federated learning. 2021. arXiv:2110.00857.
  16. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. International Conference on Machine Learning. PMLR, 2020, pp. 5132–5143.