Preprint
Article

This version is not peer-reviewed.

A Model-Based Stochastic Augmented Lagrangian Method for Online Stochastic Optimization

Submitted:

13 April 2026

Posted:

14 April 2026


Abstract
In this paper, we focus on online stochastic optimization problems in which the random parameters follow time-varying distributions. At each round t, a decision is obtained by solving the current optimization problem. Then samples are drawn from distributions that are updated after the decision is made. The objective and constraint are updated in this process, and the updated problem is used to obtain the next decision. To solve the online stochastic optimization problem, we propose a model-based stochastic augmented Lagrangian method, referred to as MSALM. At each round, we construct model functions for the sampled objective and constraint functions based on their properties, which reduces the computational complexity. The step size is designed in a dynamic form and decreases as t increases to accelerate convergence. Due to the setting of the online stochastic problem, we use stochastic dynamic regret and constraint violation to measure the performance of our algorithm. Under standard assumptions, we prove that our algorithm's stochastic dynamic regret and constraint violation have sublinear bounds in the total number of rounds T. We design simulation experiments to verify the efficiency of our online algorithm. Its performance is evaluated on a range of information and system engineering problems, including adaptive filtering, online logistic regression, time-varying smart grid energy dispatch, online network resource allocation, and path planning. In addition, in the context of the path planning problem, we integrate our algorithm with supervised learning to demonstrate its enhanced capabilities. The experimental results validate the performance of our new algorithm in practical applications.

1. Introduction

In recent years, online optimization problems have garnered widespread attention because of their ability to make real-time decisions with partial information. In the online optimization process, decisions are made sequentially according to feedback from the environment. The general form of this problem can be written as follows:
\min_{x} f_t(x), \quad t \in \{1, 2, \dots, T\}
where T is the total number of decision rounds. The meaning of online optimization is clearly defined in several classic review articles [1,2,3], which define regret to measure the performance of an online algorithm:

\mathrm{Regret}_T := \sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x)
Considering online optimization with constraints, many algorithms have been proposed for the constrained setting, as in [4,5], where a constraint is added to the problem framework:

\text{s.t.} \quad g_t(x) \le 0

The constraint violation is defined as:

\mathrm{Violation}_T := \sum_{t=1}^{T} \left[ g_t(x_t) \right]_+
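As a concrete illustration of these two metrics, the following Python sketch computes them for a toy decision sequence; the per-round functions f_t, g_t and the best-in-hindsight point are hypothetical choices for illustration only.

```python
def regret_and_violation(fs, gs, xs, x_best):
    """Static regret and cumulative constraint violation of a decision
    sequence, following the definitions above. fs[t]/gs[t] are the round-t
    objective and constraint; x_best is the best fixed decision in
    hindsight (assumed known here for illustration)."""
    regret = sum(f(x) for f, x in zip(fs, xs)) - sum(f(x_best) for f in fs)
    violation = sum(max(g(x), 0.0) for g, x in zip(gs, xs))
    return regret, violation

# Toy rounds: f_t(x) = (x - t)^2 for t = 0, 1, 2 and g_t(x) = x - 1 <= 0,
# with the learner playing x_t = 0.5 every round.
fs = [lambda x, t=t: (x - t) ** 2 for t in range(3)]
gs = [lambda x: x - 1.0 for _ in range(3)]
reg, vio = regret_and_violation(fs, gs, [0.5, 0.5, 0.5], x_best=1.0)
# reg = 0.75 (2.75 - 2.0); vio = 0.0 since g_t(0.5) < 0 every round
```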
In practice, the problems encountered during the online process are often generated randomly. Therefore, studying online stochastic optimization problems, in which the objective and constraint contain random parameters, is more suitable for meeting real-world demands. Taira et al. [6] study an online stochastic optimization algorithm in which, at each round, the parameters are sampled independently from a fixed distribution. To more closely approximate the randomly generated nature of the problem, Cao et al. [7] study the following online stochastic optimization problem with time-varying distributions:

\min_{x} \; F_t(x) := \mathbb{E}_{\theta \sim P_t}\left[ f(x, \theta) \right] \quad \text{s.t.} \quad G_t(x) := \mathbb{E}_{\xi \sim Q_t}\left[ g(x, \xi) \right] \le 0, \qquad t \in \{1, 2, \dots, T\} \tag{1}
In their work, they employ the projected stochastic gradient method (PSGD), a stochastic counterpart of online gradient descent (OGD) [1], to solve the problem. However, PSGD uses a fixed step size and lacks an adaptive mechanism, which may lead to slower convergence in online settings. Research on this kind of problem is still very limited at present. Thus, we aim to design a new algorithm that efficiently solves online stochastic optimization problems with time-varying distributions.
For stochastic optimization problems, existing stochastic gradient-based algorithms such as SGD [8], SVRG [9], and SPDAM [10] give us some insight into analyzing and handling stochastic problems. The Lagrangian method [11] is widely used in constrained optimization, and several works apply it to online optimization, for example [12,13]. Liu et al. proposed a model-based augmented Lagrangian method to solve online constrained optimization [14]. This method performs well for problems with specific structure: it simplifies the computation by approximating the objective and constraint functions at each round, and together with the corresponding primal-dual update formula it guarantees a sublinear regret upper bound. In addition, [15] shows that the augmented Lagrangian method converges superlinearly in the asymptotic regime.
Online algorithms can also be measured by a dynamic version of regret, defined in [1]. Dynamic regret measures the ability of an algorithm to track the optimal solution at each round. Adapting these measures to the stochastic setting, [7] defines the dynamic regret and constraint violation as:

\mathrm{Regret}(T) := \mathbb{E}\left[ \sum_{t=1}^{T} F_t(x_t) \right] - \sum_{t=1}^{T} F_t(x_t^*), \qquad \mathrm{Violation}_i(T) := \mathbb{E}\left[ \sum_{t=1}^{T} G_{t,i}(x_t) \right] \tag{2}
Recent studies on distributed online optimization with coupled constraints [16] and online composite optimization with time-varying regularizers [17] also provide valuable insights into handling dynamic and structured online optimization problems, further motivating our approach.
Inspired by these ideas, we solve the online stochastic optimization problem with time-varying distribution by using the model-based augmented Lagrangian method in [14]. At the same time, we incorporate time-varying distribution approximations and a dynamic step size. Stochastic dynamic regret and constraint violation are used to evaluate the performance of our algorithm [7].
The following are the main contributions of our work:
1. We propose a model-based stochastic augmented Lagrangian method (MSALM) for online stochastic optimization. In each round, we construct model functions to approximate the stochastic objective and constraint functions, which are sampled from time-varying distributions. This construction reduces the computational complexity. The step size is designed in a dynamic form and decreases as t increases to accelerate convergence.
2. We adopt dynamic regret and constraint violation as our performance metrics. These measures are particularly suited for online stochastic optimization with time-varying distributions. Under standard assumptions, we prove that the algorithm's regret and constraint violation have sublinear bounds in terms of the total number of rounds T.
3. We demonstrate the practical efficacy of our proposed algorithm through a series of simulation experiments. In the contexts of adaptive filtering and online logistic regression, we compare our method with PSGD. The results show that MSALM attains lower regret and constraint violation than PSGD, indicating that MSALM converges more rapidly toward the theoretical optimum while maintaining stricter adherence to constraints. In addition, results from the time-varying smart grid energy dispatch, online network resource allocation, and path planning problems collectively confirm that regret and constraint violation have bounds of O(\sqrt{T}).

2. MSALM for Online Stochastic Optimization

This section presents the online stochastic optimization problem and the details of the model-based stochastic augmented Lagrangian method (MSALM). Then we describe the update strategies of our algorithm.

2.1. The Online Stochastic Optimization Problem

The online optimization problem is a process of making decisions sequentially with partial information. It generates a sequence of decisions through continuous interaction with the environment, where the environment refers to the optimization objective (loss function) and its constraints at each round. If the generation process of the objectives and constraints is stochastic, the problem becomes an online stochastic problem, as shown in (1). The random parameters θ and ξ represent the samples, which are drawn from the time-varying distributions defined in (1). The problem determines the online decision process. At round t, the decision x_{t-1} has been selected based on previous information. Then the distributions P_{t-1} and Q_{t-1} are updated to P_t and Q_t, and parameters θ_t ∼ P_t and ξ_t ∼ Q_t are drawn from the current distributions. After that, f_t and g_t are obtained as samples of F_t and G_t:

f_t(x) = f(x, \theta_t), \qquad g_t(x) = g(x, \xi_t)

where x \in X \subseteq \mathbb{R}^n. The new decision x_t is selected by solving this optimization problem.
Based on the above setup, we make the following standard assumptions.
Assumption 1.
X is a bounded set, and there exists a constant R > 0 such that for any x, y \in X,

\|x - y\| \le R
Assumption 2.
f(x, \theta) and g(x, \xi) are convex and differentiable in x for any \theta \in \Theta and \xi \in \Xi.
Assumption 3
(Slater's condition). At each round t, there exists a point x_s \in X and a constant \varepsilon_0 > 0 such that for all i = 1, 2, \dots, m,

g_t^{(i)}(x_s) \le -\varepsilon_0

2.2. MSALM Algorithm

To efficiently solve the online stochastic optimization problem with time-varying distributions defined in (1), we extend the model-based augmented Lagrangian method (MALM) proposed by Liu et al. [14] to the stochastic setting. In the MALM framework [14], model-based means that we conservatively approximate the objective and constraint based on the properties of the functions. The approximations \hat f_t(x) and \hat g_t(x) are the model functions of the objective and constraint at x_t. The model functions satisfy the following conditions [14]:
Assumption 4.
1. For any x \in X,

\hat f_t(x) \le f_t(x), \qquad \hat f_t(x_t) = f_t(x_t)

2. For any x \in X and any i = 1, 2, \dots, m,

\hat g_t^{(i)}(x) \le g_t^{(i)}(x), \qquad \hat g_t^{(i)}(x_t) = g_t^{(i)}(x_t)

3. \hat g_t(\cdot) = [\hat g_t^{(1)}(\cdot), \hat g_t^{(2)}(\cdot), \dots, \hat g_t^{(m)}(\cdot)]^\top is a bounded mapping on X, and there exists a constant D > 0 such that for any x \in X,

\|\hat g_t(x)\| \le D
In many model constructions, we need gradient information of f_t(x) and g_t(x). However, in our stochastic setting, the exact gradients \nabla F_t(x) and \nabla G_t(x) are not directly accessible. Instead, we use the gradients \nabla f_t(x) and \nabla g_t(x) of the sampled functions, which are unbiased stochastic gradient estimates:

\mathbb{E}_{\theta_t \sim P_t}\left[ \nabla f_t(x) \right] = \nabla F_t(x), \qquad \mathbb{E}_{\xi_t \sim Q_t}\left[ \nabla g_t(x) \right] = \nabla G_t(x)
Efficient models that satisfy Assumption 4 can be designed. Depending on the properties of f_t(x) and g_t(x), different models are selected for approximation. Several model functions are presented in the MALM algorithm [14]:
  • Linearized model:

\hat f_t(x) := f_t(x_t) + \langle \nabla f_t(x_t), x - x_t \rangle, \qquad \hat g_t^{(i)}(x) := g_t^{(i)}(x_t) + \langle \nabla g_t^{(i)}(x_t), x - x_t \rangle, \quad i = 1, \dots, m

  • Quadratic model:

\hat f_t(x) := f_t(x_t) + \langle \nabla f_t(x_t), x - x_t \rangle + \frac{\iota}{2} \|x - x_t\|^2

  • Truncated model:

\hat f_t(x) := \left[ f_t(x_t) + \langle \nabla f_t(x_t), x - x_t \rangle \right]_+

  • Plain model:

\hat f_t(x) := f_t(x), \qquad \hat g_t^{(i)}(x) := g_t^{(i)}(x), \quad i = 1, \dots, m
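For instance, for a differentiable convex f_t the linearized model can be built as a closure. The following minimal Python sketch (with a hypothetical quadratic test objective) illustrates the two conditions of Assumption 4, namely that the model lies below the function everywhere and matches it at x_t:

```python
import numpy as np

def linearized_model(f, grad_f, x_t):
    """Return f_hat(x) = f(x_t) + <grad f(x_t), x - x_t>. For a convex f
    this lies below f everywhere and matches it (value and gradient) at
    the anchor point x_t, as Assumption 4 requires."""
    f_val, g_val = f(x_t), grad_f(x_t)
    return lambda x: f_val + g_val @ (x - x_t)

f = lambda x: float(x @ x)      # a convex test objective (hypothetical)
grad_f = lambda x: 2.0 * x
x_t = np.array([1.0, -2.0])
f_hat = linearized_model(f, grad_f, x_t)
# f_hat(x_t) == f(x_t) == 5.0, and f_hat(x) <= f(x) for any x
```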
Then we define the model-based stochastic augmented Lagrangian of the problem as:

\hat L_{t,\sigma}(x, \lambda) := \hat f_t(x) + \frac{1}{2\sigma} \left[ \left\| \left[ \lambda + \sigma \hat g_t(x) \right]_+ \right\|^2 - \|\lambda\|^2 \right]

where the operator [\,\cdot\,]_+ denotes \max\{\cdot, 0\}, applied componentwise.
In the MALM algorithm [14], the primal variable is updated by solving the following proximal augmented Lagrangian subproblem:

x_{t+1} = \arg\min_{x \in X} \left[ \hat L_{t,\sigma}(x, \lambda_t) + \frac{\alpha}{2} \|x - x_t\|^2 \right]

where \alpha > 0 is the parameter of the proximal term. The optimality condition of this subproblem can be written as:

x_{t+1} = x_t - \frac{1}{\alpha} \nabla_x \hat L_{t,\sigma}(x_{t+1}, \lambda_t)

so 1/\alpha plays the role of the step size in a gradient descent step.
We design the step-size parameter as \alpha_t = \alpha_0 \sqrt{t}, so that the effective step size 1/\alpha_t decreases as t increases, to accelerate convergence. At the beginning of the iteration, the algorithm has a larger step size; as the iteration proceeds, the step size continues to decrease. This design meets the requirements of the different stages of the iterative process: in the early stage, a larger step size accelerates the iteration, while in the late stage, a smaller step size controls the update amplitude and pursues precision. The proposed primal update of MSALM is:

x_{t+1} = \arg\min_{x \in X} \left[ \hat L_{t,\sigma}(x, \lambda_t) + \frac{\alpha_t}{2} \|x - x_t\|^2 \right] \tag{3}

The multiplier \lambda is updated by:

\lambda_{t+1} = \left[ \lambda_t + \sigma \hat g_t(x_{t+1}) \right]_+ \tag{4}
The algorithm based on the model-based augmented Lagrangian method is as follows.
Algorithm 1 MSALM
  • Require: Choose an initial point x_0 \in X arbitrarily. Set parameters \alpha_0 > 0, \sigma > 0. Set the initial multiplier \lambda_0 = 0.
  • for t = 1, 2, ..., T do
  •     Submit the decision x_t.
  •     Update distributions P_t and Q_t to determine F_t and G_t.
  •     Generate f_t and g_t by sampling \theta_t \sim P_t and \xi_t \sim Q_t.
  •     Approximate f_t(x) and g_t(x) by \hat f_t(x) and \hat g_t(x).
  •     Update x_{t+1} and \lambda_{t+1} by (3) and (4).
  • end for
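To make the update concrete, the following Python sketch runs MSALM-style rounds with linearized models. It is an illustration, not the paper's exact procedure: the proximal subproblem (3), which the analysis assumes is solved exactly, is here approximated by a few projected gradient steps, and the toy problem (minimize f(x) = x subject to g(x) = -x <= 0 on X = [-2, 2], with optimum x* = 0) is a hypothetical choice.

```python
import numpy as np

def msalm_round(x_t, lam, grad_f, g_val, grad_g, alpha_t, sigma,
                project, inner_steps=50, lr=0.05):
    """One MSALM round with linearized models. The proximal subproblem (3)
    is approximated by projected gradient steps; (4) is the multiplier
    update. grad_f: gradient of f_t at x_t; g_val, grad_g: constraint
    value and Jacobian at x_t; project maps a point back onto X."""
    x = x_t.copy()
    for _ in range(inner_steps):
        g_hat = g_val + grad_g @ (x - x_t)            # linearized constraint model
        mult = np.maximum(lam + sigma * g_hat, 0.0)   # [lam + sigma * g_hat]_+
        grad = grad_f + grad_g.T @ mult + alpha_t * (x - x_t)
        x = project(x - lr * grad)
    g_hat = g_val + grad_g @ (x - x_t)
    lam_next = np.maximum(lam + sigma * g_hat, 0.0)   # update (4)
    return x, lam_next

# Toy run: f(x) = x, g(x) = -x <= 0, X = [-2, 2]; optimum x* = 0.
project = lambda x: np.clip(x, -2.0, 2.0)
x, lam = np.array([1.5]), np.zeros(1)
for t in range(1, 31):
    alpha_t = 1.0 * np.sqrt(t)                        # dynamic step size alpha_0 * sqrt(t)
    x, lam = msalm_round(x, lam, grad_f=np.array([1.0]),
                         g_val=np.array([-x[0]]), grad_g=np.array([[-1.0]]),
                         alpha_t=alpha_t, sigma=0.5, project=project)
```

After 30 rounds the iterate hovers near the constrained optimum x* = 0 with a nonnegative multiplier, as the theory predicts for this toy instance.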
We measure the algorithm performance by the stochastic dynamic regret and constraint violation defined in (2), where x_t^* is the theoretical optimal decision at each round.

3. Convergence Analysis

In this section, we analyze the performance of MSALM in online stochastic optimization by stochastic dynamic regret and constraint violation.
To bound the stochastic dynamic regret, we adopt the definition of the drift and the assumption on the drift from [7]:

\Delta(T) := \sum_{t=2}^{T} \left\| x_{t-1}^* - x_t^* \right\|
Assumption 5.
Before the start of the algorithm, a bound \bar\Delta(T) is known for any T satisfying

\Delta(T) \le \bar\Delta(T)
The drift and Assumption 5 ensure that, although the distributions are time-varying, the problem retains the characteristic of a common decision set shared across rounds, as in standard online optimization.
Assumption 6.
The gradient of f_t(x) is bounded; i.e., there is a constant G_f > 0 satisfying

\|\nabla f_t(x)\| \le G_f \quad \text{for all } x \in X
Lemma 1.
Under Assumptions 1-6, we have

\mathbb{E}\left[ \sum_{t=1}^{T} \alpha_t \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) \right] \le \alpha_0 R \sqrt{T} \left( 2 \bar\Delta(T) + 3R \right)
Proof.
By the triangle inequality,

\|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \le \|x_t^* - x_t\|^2 - \|x_{t+1}^* - x_{t+1}\|^2 + 2 \|x_t^* - x_{t+1}^*\| \, \|x_{t+1}^* - x_{t+1}\| - \|x_t^* - x_{t+1}^*\|^2

Let b_t = \|x_t^* - x_{t+1}^*\|; then

\|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \le \|x_t^* - x_t\|^2 - b_t^2 - \|x_{t+1}^* - x_{t+1}\|^2 + 2 b_t \|x_{t+1}^* - x_{t+1}\| \tag{5}

Let A_t = \mathbb{E}\left[ \|x_t^* - x_t\|^2 \right]. By Assumption 1, we have \mathbb{E}\left[ \|x_{t+1}^* - x_{t+1}\| \right] \le R and A_t \le R^2. Taking the expectation of (5),

\mathbb{E}\left[ \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right] \le A_t - A_{t+1} - b_t^2 + 2 R b_t

Multiplying by \alpha_t and summing,

\mathbb{E}\left[ \sum_{t=1}^{T} \alpha_t \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) \right] \le \sum_{t=1}^{T} \alpha_t (A_t - A_{t+1}) + \sum_{t=1}^{T} \alpha_t \left( 2 R b_t - b_t^2 \right) \tag{6}

We examine the two terms on the right-hand side separately. By Abel summation and \alpha_t = \alpha_0 \sqrt{t},

\sum_{t=1}^{T} \alpha_t (A_t - A_{t+1}) = \alpha_1 A_1 - \alpha_T A_{T+1} + \sum_{t=2}^{T} (\alpha_t - \alpha_{t-1}) A_t \le \alpha_0 R^2 + R^2 \left( \alpha_0 \sqrt{T} - \alpha_0 \right) = \alpha_0 R^2 \sqrt{T}

\sum_{t=1}^{T} \alpha_t \left( 2 R b_t - b_t^2 \right) \le 2 \alpha_0 R \sqrt{T} \sum_{t=1}^{T} b_t \le 2 \alpha_0 R \sqrt{T} \, \Delta(T+1) \le 2 \alpha_0 R \sqrt{T} \left( \bar\Delta(T) + R \right)

Hence (6) becomes:

\mathbb{E}\left[ \sum_{t=1}^{T} \alpha_t \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) \right] \le \alpha_0 R^2 \sqrt{T} + 2 \alpha_0 R \sqrt{T} \left( \bar\Delta(T) + R \right) = \alpha_0 R \sqrt{T} \left( 2 \bar\Delta(T) + 3R \right)
Theorem 1.
Suppose Assumptions 1-6 hold. The stochastic dynamic regret of MSALM has a sublinear upper bound when the parameters are set as \alpha_0 = 1/\sqrt{\bar\Delta(T)} and \sigma = 1/\sqrt{T}:

\mathrm{Regret}(T) \le \frac{D^2 \sqrt{T}}{2} + \left( G_f^2 + R \right) \sqrt{T \bar\Delta(T)} + \frac{3 R^2}{2} \sqrt{\frac{T}{\bar\Delta(T)}} = O\!\left( \sqrt{T \bar\Delta(T)} \right)
Proof. We construct an auxiliary optimization problem:

\min_{x \in X} \; \hat L_{t,\sigma}(x, \lambda_t) + \frac{\alpha_t}{2} \left( \|x - x_t\|^2 - \|x - x_{t+1}\|^2 \right)

The optimal solution of the auxiliary problem satisfies:

\nabla_x \hat L_{t,\sigma}(x_{t+1}, \lambda_t) + \alpha_t (x_{t+1} - x_t) = 0 \tag{7}

Comparing (7) with the optimality condition of (3), x_{t+1} is the optimal point of the auxiliary problem. We therefore have the inequality:

\hat L_{t,\sigma}(x_{t+1}, \lambda_t) + \frac{\alpha_t}{2} \|x_{t+1} - x_t\|^2 \le \hat L_{t,\sigma}(x_t^*, \lambda_t) + \frac{\alpha_t}{2} \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) \tag{8}
We analyze the two sides of inequality (8) separately. According to the multiplier update (4), the left-hand side satisfies:

\hat L_{t,\sigma}(x_{t+1}, \lambda_t) = \hat f_t(x_{t+1}) + \frac{1}{2\sigma} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right]

Combining the convexity of \hat f_t with Assumptions 4 and 6,

\hat f_t(x_{t+1}) \ge \hat f_t(x_t) + \langle \nabla \hat f_t(x_t), x_{t+1} - x_t \rangle \ge f_t(x_t) - G_f \|x_{t+1} - x_t\|

so the left-hand side of (8) is bounded below by:

\hat L_{t,\sigma}(x_{t+1}, \lambda_t) \ge f_t(x_t) - G_f \|x_{t+1} - x_t\| + \frac{1}{2\sigma} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right]

Meanwhile, the right-hand side of (8) is bounded as:

\hat L_{t,\sigma}(x_t^*, \lambda_t) \le f_t(x_t^*) + \langle \lambda_t, \hat g_t(x_t^*) \rangle + \frac{\sigma}{2} \|\hat g_t(x_t^*)\|^2

Since x_t^* is a feasible solution, we have \langle \lambda_t, \hat g_t(x_t^*) \rangle \le 0. Considering Assumption 4, we obtain:

\hat L_{t,\sigma}(x_t^*, \lambda_t) \le f_t(x_t^*) + \frac{\sigma}{2} D^2
From the above two parts, inequality (8) becomes:

f_t(x_t) - G_f \|x_{t+1} - x_t\| + \frac{1}{2\sigma} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] + \frac{\alpha_t}{2} \|x_{t+1} - x_t\|^2 \le f_t(x_t^*) + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right)

Since -G_f \|x_{t+1} - x_t\| + \frac{\alpha_t}{2} \|x_{t+1} - x_t\|^2 \ge -\frac{G_f^2}{2\alpha_t}, rearranging gives:

f_t(x_t) - f_t(x_t^*) \le \frac{G_f^2}{2\alpha_t} - \frac{1}{2\sigma} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right)

Summing from t = 1 to T, we have:

\sum_{t=1}^{T} \left( f_t(x_t) - f_t(x_t^*) \right) \le \sum_{t=1}^{T} \frac{G_f^2}{2\alpha_t} - \frac{1}{2\sigma} \sum_{t=1}^{T} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] + \frac{\sigma D^2 T}{2} + \frac{1}{2} \sum_{t=1}^{T} \alpha_t \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) \tag{9}

Taking the expectation of (9) and telescoping with \lambda_1 = 0:

\mathbb{E} \sum_{t=1}^{T} \left( f_t(x_t) - f_t(x_t^*) \right) \le \frac{1}{2} \mathbb{E} \sum_{t=1}^{T} \alpha_t \left( \|x_t^* - x_t\|^2 - \|x_t^* - x_{t+1}\|^2 \right) + \sum_{t=1}^{T} \frac{G_f^2}{2\alpha_t} - \frac{1}{2\sigma} \mathbb{E} \left[ \|\lambda_{T+1}\|^2 \right] + \frac{\sigma D^2 T}{2}

where, for the term \sum_{t=1}^{T} \frac{G_f^2}{2\alpha_t}, using \alpha_t = \alpha_0 \sqrt{t} and \sum_{t=1}^{T} 1/\sqrt{t} \le 2\sqrt{T}, we have:

\sum_{t=1}^{T} \frac{G_f^2}{2\alpha_t} \le \frac{G_f^2 \sqrt{T}}{\alpha_0}

Setting \alpha_0 = 1/\sqrt{\bar\Delta(T)} and \sigma = 1/\sqrt{T}, dropping the nonpositive term -\frac{1}{2\sigma} \mathbb{E}[\|\lambda_{T+1}\|^2], and substituting the result of Lemma 1:

\mathrm{Regret}(T) = \mathbb{E} \sum_{t=1}^{T} \left( f_t(x_t) - f_t(x_t^*) \right) \le \frac{D^2 \sqrt{T}}{2} + \left( G_f^2 + R \right) \sqrt{T \bar\Delta(T)} + \frac{3 R^2}{2} \sqrt{\frac{T}{\bar\Delta(T)}} = O\!\left( \sqrt{T \bar\Delta(T)} \right)
Assumption 7.
The gradient of g_t(x) is bounded; i.e., there is a constant G_g > 0 satisfying

\|\nabla g_t(x)\| \le G_g \quad \text{for all } x \in X
Lemma 2.
\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le 2 G_f R + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right] - \varepsilon_0 \mathbb{E}\left[ \|\lambda_t\| \right]

where x_s is the Slater point of Assumption 3.
Proof. Taking x = x_s, a point satisfying Slater's condition, in inequality (8):

\hat L_{t,\sigma}(x_{t+1}, \lambda_t) + \frac{\alpha_t}{2} \|x_{t+1} - x_t\|^2 \le \hat L_{t,\sigma}(x_s, \lambda_t) + \frac{\alpha_t}{2} \left( \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right) \tag{10}

From the definition of the augmented Lagrangian function,

\hat L_{t,\sigma}(x_s, \lambda_t) = \hat f_t(x_s) + \frac{1}{2\sigma} \left[ \left\| \left[ \lambda_t + \sigma \hat g_t(x_s) \right]_+ \right\|^2 - \|\lambda_t\|^2 \right]

Using the non-expansiveness property of the projection operator,

\left\| \left[ \lambda_t + \sigma \hat g_t(x_s) \right]_+ \right\|^2 \le \|\lambda_t\|^2 + 2\sigma \langle \lambda_t, \hat g_t(x_s) \rangle + \sigma^2 \|\hat g_t(x_s)\|^2

Therefore,

\hat L_{t,\sigma}(x_s, \lambda_t) \le \hat f_t(x_s) + \langle \lambda_t, \hat g_t(x_s) \rangle + \frac{\sigma}{2} \|\hat g_t(x_s)\|^2

By Assumptions 3 and 4, \langle \lambda_t, \hat g_t(x_s) \rangle \le -\varepsilon_0 \|\lambda_t\|, so

\hat L_{t,\sigma}(x_s, \lambda_t) \le f_t(x_s) - \varepsilon_0 \|\lambda_t\| + \frac{\sigma}{2} D^2

Inequality (10) then becomes:

f_t(x_t) - G_f \|x_{t+1} - x_t\| + \frac{1}{2\sigma} \left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] + \frac{\alpha_t}{2} \|x_{t+1} - x_t\|^2 \le f_t(x_s) - \varepsilon_0 \|\lambda_t\| + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \left( \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right)

Rearranging terms and taking the expectation,

\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le \mathbb{E}\left[ f_t(x_s) - f_t(x_t) \right] + G_f \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] - \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_{t+1} - x_t\|^2 \right] - \varepsilon_0 \mathbb{E}\left[ \|\lambda_t\| \right] + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right]

We bound each term:

\mathbb{E}\left[ f_t(x_s) - f_t(x_t) \right] \le G_f \mathbb{E}\left[ \|x_s - x_t\| \right] \le G_f R, \qquad G_f \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] - \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_{t+1} - x_t\|^2 \right] \le G_f R

The inequality becomes:

\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le 2 G_f R + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right] - \varepsilon_0 \mathbb{E}\left[ \|\lambda_t\| \right] \tag{11}
Lemma 3.
There exist constants C_1, C_2, C_3, C_4 > 0 such that for any t \ge 0 and any positive integer s,

\mathbb{E}\left[ \|\lambda_t\| \right] \le \psi(\sigma, \alpha_0, s) := C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma s

where C_1 = \frac{4 G_f R}{\varepsilon_0}, C_2 = \frac{R^2}{\varepsilon_0}, C_3 = \frac{D^2}{\varepsilon_0} + D, C_4 = \frac{8 D^2}{\varepsilon_0} \log \frac{32 D^2}{\varepsilon_0^2}.
Proof. For any t \ge 0, inequality (11) gives:

\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le 2 G_f R + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right] - \varepsilon_0 \mathbb{E}\left[ \|\lambda_t\| \right]

Summing over rounds t, t+1, \dots, t+s-1,

\frac{1}{2\sigma} \sum_{l=0}^{s-1} \mathbb{E}\left[ \|\lambda_{t+l+1}\|^2 - \|\lambda_{t+l}\|^2 \right] \le \sum_{l=0}^{s-1} \frac{\alpha_{t+l}}{2} \mathbb{E}\left[ \|x_s - x_{t+l}\|^2 - \|x_s - x_{t+l+1}\|^2 \right] - \varepsilon_0 \sum_{l=0}^{s-1} \mathbb{E}\left[ \|\lambda_{t+l}\| \right] + s \left( 2 G_f R + \frac{\sigma}{2} D^2 \right) \tag{12}

We have,

\sum_{l=0}^{s-1} \frac{\alpha_{t+l}}{2} \mathbb{E}\left[ \|x_s - x_{t+l}\|^2 - \|x_s - x_{t+l+1}\|^2 \right] \le \frac{\alpha_0 \sqrt{t+s-1}}{2} \sum_{l=0}^{s-1} \mathbb{E}\left[ \|x_s - x_{t+l}\|^2 - \|x_s - x_{t+l+1}\|^2 \right] \le \frac{1}{2} \alpha_0 \sqrt{t+s-1} \, R^2

Since the projection operator is non-expansive,

\|\lambda_{t+1} - \lambda_t\| = \left\| \left[ \lambda_t + \sigma \hat g_t(x_{t+1}) \right]_+ - \lambda_t \right\| \le \sigma \|\hat g_t(x_{t+1})\| \le \sigma D

For any l \ge 0,

\|\lambda_{t+l}\| \ge \|\lambda_t\| - \sigma D l

Therefore,

\sum_{l=0}^{s-1} \mathbb{E}\left[ \|\lambda_{t+l}\| \right] \ge \sum_{l=0}^{s-1} \left( \mathbb{E}\left[ \|\lambda_t\| \right] - \sigma D l \right) = s \, \mathbb{E}\left[ \|\lambda_t\| \right] - \frac{\sigma D s (s-1)}{2}

Substituting the above results into (12),

\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+s}\|^2 - \|\lambda_t\|^2 \right] \le s \left( 2 G_f R + \frac{\sigma}{2} D^2 \right) + \frac{1}{2} \alpha_0 \sqrt{t+s-1} \, R^2 - \varepsilon_0 \left( s \, \mathbb{E}\left[ \|\lambda_t\| \right] - \frac{\sigma D s (s-1)}{2} \right)

Since \|\lambda_{t+s}\|^2 \ge 0,

\varepsilon_0 s \, \mathbb{E}\left[ \|\lambda_t\| \right] \le s \left( 2 G_f R + \frac{\sigma}{2} D^2 \right) + \frac{1}{2} \alpha_0 \sqrt{t+s-1} \, R^2 + \frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_t\|^2 \right] + \frac{\varepsilon_0 \sigma D s (s-1)}{2}

From the update rule, the change of \|\lambda_t\|^2 in one step is also controlled:

\left| \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right| \le 2 \|\lambda_t\| \, \|\lambda_{t+1} - \lambda_t\| + \|\lambda_{t+1} - \lambda_t\|^2 \le 2 \sigma D \|\lambda_t\| + \sigma^2 D^2
We now bound \mathbb{E}[\|\lambda_t\|] via a drift argument. From Lemma 2, for every t,

\frac{1}{2\sigma} \mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le K_1 - \varepsilon_0 \mathbb{E}\left[ \|\lambda_t\| \right]

where K_1 = 2 G_f R + \frac{\sigma}{2} D^2 + \frac{\alpha_t}{2} \mathbb{E}\left[ \|x_s - x_t\|^2 - \|x_s - x_{t+1}\|^2 \right] collects the positive bounded terms. From the non-expansiveness shown above, each step of the multiplier is small:

\|\lambda_{t+1} - \lambda_t\| \le \sigma D

Moreover, since \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 = \left( \|\lambda_{t+1}\| - \|\lambda_t\| \right) \left( \|\lambda_{t+1}\| + \|\lambda_t\| \right), whenever \mathbb{E}[\|\lambda_t\|] \ge \theta := \frac{2 K_1}{\varepsilon_0} we have the negative drift

\mathbb{E}\left[ \|\lambda_{t+1}\|^2 - \|\lambda_t\|^2 \right] \le 2 \sigma K_1 - 2 \sigma \varepsilon_0 \theta \le -2 \sigma K_1

That is, \|\lambda_t\| drifts back whenever it exceeds \theta, while each step changes it by at most \sigma D. Under this drift condition, applying the lemma from [14] yields

\mathbb{E}\left[ \|\lambda_t\| \right] \le \theta + \sigma D + \frac{4 \sigma^2 D^2}{\varepsilon_0 / 2} \log \frac{8 \sigma^2 D^2}{(\varepsilon_0 / 2)^2} = \theta + \sigma D + \frac{8 \sigma^2 D^2}{\varepsilon_0} \log \frac{32 \sigma^2 D^2}{\varepsilon_0^2}

Bounding the proximal-difference term in K_1 by \frac{\alpha_0 R^2}{2}, so that \theta \le \frac{4 G_f R}{\varepsilon_0} + \frac{\sigma D^2}{\varepsilon_0} + \frac{\alpha_0 R^2}{\varepsilon_0}, and grouping terms with \sigma \le 1 \le s, we obtain the desired form

\mathbb{E}\left[ \|\lambda_t\| \right] \le \psi(\sigma, \alpha_0, s) := C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma s

where

C_1 = \frac{4 G_f R}{\varepsilon_0}, \qquad C_2 = \frac{R^2}{\varepsilon_0}, \qquad C_3 = \frac{D^2}{\varepsilon_0} + D, \qquad C_4 = \frac{8 D^2}{\varepsilon_0} \log \frac{32 D^2}{\varepsilon_0^2}
Theorem 2.
Suppose Assumptions 1-7 hold. The constraint violation of MSALM has a sublinear upper bound when the parameters are set as \alpha_0 = 1/\sqrt{\bar\Delta(T)} and \sigma = 1/\sqrt{T}:

\mathrm{Violation}_i(T) := \mathbb{E} \sum_{t=1}^{T} G_{t,i}(x_t) \le O\!\left( \sqrt{T} \right)
Proof. According to the Lagrange multiplier update rule (4), Assumption 4, and Assumption 7, we have:

g_{t,i}(x_t) \le \frac{1}{\sigma} \left( \lambda_{t+1,i} - \lambda_{t,i} \right) + G_g \|x_{t+1} - x_t\|

Summing from t = 1 to T and telescoping (with \lambda_1 = 0),

\sum_{t=1}^{T} g_{t,i}(x_t) \le \frac{1}{\sigma} \sum_{t=1}^{T} \left( \lambda_{t+1,i} - \lambda_{t,i} \right) + G_g \sum_{t=1}^{T} \|x_{t+1} - x_t\| \le \frac{1}{\sigma} \lambda_{T+1,i} + G_g \sum_{t=1}^{T} \|x_{t+1} - x_t\|

Taking the expectation,

\mathbb{E} \sum_{t=1}^{T} g_{t,i}(x_t) \le \frac{1}{\sigma} \mathbb{E}\left[ \lambda_{T+1,i} \right] + G_g \sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le \frac{1}{\sigma} \mathbb{E}\left[ \|\lambda_{T+1}\| \right] + G_g \sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right]

According to Lemma 3, for any t \ge 0 and positive integer s,

\mathbb{E}\left[ \|\lambda_t\| \right] \le C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma s

Specifically, for t = T + 1, choosing s = \sqrt{T},

\mathbb{E}\left[ \|\lambda_{T+1}\| \right] \le C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma \sqrt{T}

Substituting the parameter choices \alpha_0 = 1/\sqrt{\bar\Delta(T)} and \sigma = 1/\sqrt{T},

\mathbb{E}\left[ \|\lambda_{T+1}\| \right] \le C_1 + C_2 \frac{1}{\sqrt{\bar\Delta(T)}} + C_3 \frac{1}{\sqrt{T}} + C_4 \frac{1}{\sqrt{T}} \sqrt{T} = C_1 + C_2 \frac{1}{\sqrt{\bar\Delta(T)}} + C_3 \frac{1}{\sqrt{T}} + C_4

Since \bar\Delta(T) is an upper bound of \Delta(T) and, under reasonable assumptions, does not tend to zero, we conclude:

\mathbb{E}\left[ \|\lambda_{T+1}\| \right] \le O(1)
It remains to bound \mathbb{E}[\|x_{t+1} - x_t\|]. Computing the gradient,

\nabla_x \hat L_{t,\sigma}(x, \lambda_t) = \nabla \hat f_t(x) + \nabla \hat g_t(x)^\top \left[ \lambda_t + \sigma \hat g_t(x) \right]_+

The gradients of the models are bounded:

\|\nabla \hat f_t(x)\| \le G_f, \qquad \|\nabla \hat g_t(x)\| \le G_g

Meanwhile, according to Lemma 3, \|\lambda_t\| is bounded in expectation; therefore,

\left\| \nabla_x \hat L_{t,\sigma}(x_{t+1}, \lambda_t) \right\| \le G_f + G_g \left( \|\lambda_t\| + \sigma D \right)

Thus, by the optimality condition of (3),

\|x_{t+1} - x_t\| \le \frac{1}{\alpha_t} \left( G_f + G_g \left( \|\lambda_t\| + \sigma D \right) \right)

Taking the expectation and substituting \alpha_t = \alpha_0 \sqrt{t} together with the bound from Lemma 3,

\mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le \frac{1}{\alpha_0 \sqrt{t}} \left( G_f + G_g \left( C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma s + \sigma D \right) \right)

Summing from t = 1 to T and using the integral bound

\sum_{t=1}^{T} \frac{1}{\sqrt{t}} \le 1 + \int_{1}^{T} \frac{1}{\sqrt{t}} \, dt = 1 + 2 \left( \sqrt{T} - 1 \right) \le 2 \sqrt{T}

we obtain

\sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le \frac{2 \sqrt{T}}{\alpha_0} \left( G_f + G_g \left( C_1 + C_2 \alpha_0 + C_3 \sigma + C_4 \sigma s + \sigma D \right) \right)

Substituting the parameter choices \alpha_0 = 1/\sqrt{\bar\Delta(T)}, \sigma = 1/\sqrt{T}, and s = \sqrt{T},

\sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le 2 \sqrt{T \bar\Delta(T)} \left( G_f + G_g \left( C_1 + C_2 \frac{1}{\sqrt{\bar\Delta(T)}} + C_3 \frac{1}{\sqrt{T}} + C_4 + D \frac{1}{\sqrt{T}} \right) \right)

Under reasonable assumptions (namely, that \bar\Delta(T) does not grow too fast), we have

\sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le O\!\left( \sqrt{T} \right)

So we have,

\mathrm{Violation}_i(T) = \mathbb{E} \sum_{t=1}^{T} g_{t,i}(x_t) \le \frac{1}{\sigma} \mathbb{E}\left[ \|\lambda_{T+1}\| \right] + G_g \sum_{t=1}^{T} \mathbb{E}\left[ \|x_{t+1} - x_t\| \right] \le \frac{1}{\sigma} \cdot O(1) + G_g \cdot O\!\left( \sqrt{T} \right) = \sqrt{T} \cdot O(1) + O\!\left( \sqrt{T} \right) = O\!\left( \sqrt{T} \right)
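The step-size sum used in the proof can be checked numerically; the following small Python snippet verifies the integral bound \sum_{t=1}^{T} 1/\sqrt{t} \le 1 + 2(\sqrt{T} - 1) \le 2\sqrt{T} for a few horizons.

```python
import math

def sqrt_sum(T):
    """Partial sum sum_{t=1}^T 1/sqrt(t), compared against the integral
    bound 1 + 2*(sqrt(T) - 1) used in the proof above."""
    return sum(1.0 / math.sqrt(t) for t in range(1, T + 1))

for T in (1, 10, 100, 10000):
    assert sqrt_sum(T) <= 1 + 2 * (math.sqrt(T) - 1) <= 2 * math.sqrt(T)
```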

4. Numerical Experiments

In this section, our algorithm is demonstrated to be capable of solving many real problems. First, we explore the influence of initial parameter values on our algorithm for different mathematical models and provide the best parameter combination for each model; we then compare our algorithm with the PSGD algorithm in the same simulation environment. Second, we create simulation experiments to observe the performance of our algorithm on different real problems and present the results. Finally, we combine our algorithm with supervised learning for path planning. The results show that regret and constraint violation again have convergent bounds, which demonstrates that our algorithm can solve online path planning problems.
In addition, in this paper, the model training was conducted with Python 3.12.6, and all experiments were conducted in MATLAB R2025b on a laptop running Windows 11, for fairness. The CPU of this laptop is an AMD Ryzen AI 9 H 465 w/ Radeon 880M (2.00 GHz), with 32 GB of RAM.

4.1. Comparative Experiment with the Existing Algorithm

We compared our algorithm with the PSGD algorithm using adaptive filtering and online logistic regression problems.
Adaptive filtering is a core recursive estimation technique in modern signal processing, system identification, and control. In many applications, the impulse response exhibits sparsity in the time domain. Meanwhile, physically realizable systems are stable, so their impulse-response energy must be finite. This yields two constraints based on the actual context of the problem. The mathematical model of adaptive filtering is as follows:

\min_{x \in \mathbb{R}^n} \; a^\top x + b \quad \text{s.t.} \quad \|x\|_1 \le \gamma_s, \; \|x\|_2^2 \le \gamma_e
At each round, a and b are drawn from two independent normal distributions, whose means and standard deviations both vary smoothly over time in a sinusoidal or cosinusoidal manner.
Online logistic regression is a classic binary classification method in machine learning, with significant application value in dynamic data stream environments. The problem is mathematically formulated as a constrained regularized empirical risk minimization problem, where the regularization term controls the model's complexity and prevents overfitting. The mathematical model of online logistic regression is as follows:

\min_{\theta \in \mathbb{R}^n} \; \frac{1}{m} \sum_{i=1}^{m} \left[ -y_i \log\left( \sigma(x_i^\top \theta) \right) - (1 - y_i) \log\left( 1 - \sigma(x_i^\top \theta) \right) \right] + \frac{\lambda}{2} \|\theta\|_2^2 \quad \text{s.t.} \quad \|\theta\|_1 \le \gamma_s, \; \|\theta\|_2^2 \le \gamma_e
At each round, x t is drawn from a normal distribution with fixed covariance, y t is drawn from a Bernoulli distribution, and the true parameter vector w true ( t ) varies smoothly over time in a sinusoidal manner.
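As an illustration of such a drifting data stream, one round of online logistic regression data can be generated as follows; the dimensions, period, and drift pattern here are arbitrary stand-ins, not the paper's exact simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_round(t, n=4, T=200):
    """Draw one round of online logistic regression data: features from a
    fixed normal distribution and a Bernoulli label whose underlying true
    parameter drifts sinusoidally with t (illustrative constants)."""
    w_true = np.sin(2.0 * np.pi * t / T + np.arange(n))  # slowly drifting parameter
    x = rng.normal(size=n)                               # feature vector
    p = 1.0 / (1.0 + np.exp(-x @ w_true))                # sigmoid probability
    y = rng.binomial(1, p)                               # Bernoulli label
    return x, y

x, y = sample_round(t=1)
```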
We explored the influence of initial parameter values on our algorithm by comparing different values of α_0. There are two criteria for measuring the quality of an online algorithm, so a multi-objective approach was adopted for a mixed measurement. We used the AHP (Analytic Hierarchy Process) to determine the weights of the two quantities: the pair of measurements corresponds to scale value 3 in the 1-9 scale method, so the weight of regret is 0.25 and the weight of constraint violation is 0.75. Our mixed measurement mix is defined as follows:

\mathrm{mix} = 0.25 \cdot \mathrm{regret} + 0.75 \cdot \mathrm{violation}
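The 0.25/0.75 split follows from the standard AHP computation for a 2x2 pairwise comparison matrix with scale value 3; a small Python check using row geometric means (which coincide with the principal eigenvector for a 2x2 reciprocal matrix):

```python
import numpy as np

# Pairwise comparison matrix: constraint violation is judged 3 times as
# important as regret (scale value 3 on the 1-9 scale).
A = np.array([[1.0, 1.0 / 3.0],   # regret vs (regret, violation)
              [3.0, 1.0]])        # violation vs (regret, violation)
gm = A.prod(axis=1) ** (1.0 / A.shape[1])  # row geometric means
w = gm / gm.sum()                          # AHP weights: [0.25, 0.75]

def mix(regret, violation):
    """Mixed measurement used to compare parameter settings."""
    return w[0] * regret + w[1] * violation
```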
In Figure 1, we find that the algorithm performs best at α_0 = 0.2 for adaptive filtering and at α_0 = 0.5 for online logistic regression.
Then we compared our algorithm with PSGD algorithm. The comparison results are as follows:
Figure 2. Comparison of our algorithm with the PSGD algorithm. (a) The regret of adaptive filtering. (b) The constraint violation of adaptive filtering. (c) The regret of online logistic regression. (d) The constraint violation of online logistic regression.
We compared the performance of the two algorithms on adaptive filtering and online logistic regression. The figures show that MSALM attains lower regret and constraint violation than PSGD, indicating that our algorithm converges more rapidly toward the theoretical optimum while adhering better to the constraints. According to the experimental results, we conclude that, with a suitable choice of α_0, MSALM is superior to the existing algorithm.

4.2. Experiments Under Existing Models

In order to test the practicality of our algorithm, we applied it to an energy dispatch problem and to network resource allocation.
The time-varying smart grid energy dispatch problem is one of the core tasks of modern smart grids: to economically and reliably coordinate multiple heterogeneous energy sources while meeting the changing electricity demand. The constraints of this problem include a power balance constraint and resource capacity constraints. The mathematical model of time-varying smart grid energy dispatch is as follows:

\min_{x \in \mathbb{R}^n} \; c^\top x + \frac{1}{2} x^\top Q x

\text{s.t.} \quad \sum_{i=1}^{n_g} x_i - \sum_{j=1}^{n_s} x_{n_g + j} - \sum_{k=1}^{n_d} x_{n_g + n_s + k} = d

0 \le x_i \le u_i, \quad i = 1, \dots, n
At each round, c, Q are drawn from independent normal distributions with fixed standard deviations. Their means evolve over time according to a random walk with a decaying step size.
Online network resource allocation is a key challenge in modern computing systems. The key task is to efficiently allocate limited resources to continuously arriving real-time tasks or user requests. The objective function consists of two parts: Quality of Service cost and resource usage cost. The mathematical model of online network resource allocation is as follows:
\min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{n} \left( d_i - x_i \right)^2 + \alpha \sum_{i=1}^{n} x_i

\text{s.t.} \quad x_i \le c_i, \quad i = 1, \dots, n
At each round, the demand vector is drawn from a multivariate normal distribution with fixed standard deviation, and is then truncated to be nonnegative. Its mean vector varies smoothly over time following a sinusoidal pattern with a slow linear trend.
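Because the per-round objective is separable across resources, each round of this model has a closed-form solution, which is convenient as a correctness baseline when testing the online algorithm. A Python sketch (which additionally clips at zero, on our added assumption that allocations are nonnegative):

```python
import numpy as np

def allocate(d, c, alpha):
    """Closed-form minimizer of sum_i (d_i - x_i)^2 + alpha * sum_i x_i
    subject to x_i <= c_i. The problem separates per coordinate: setting
    the derivative -2*(d_i - x_i) + alpha to zero gives x_i = d_i - alpha/2,
    then we project onto [0, c_i] (nonnegativity is our added assumption)."""
    return np.clip(d - alpha / 2.0, 0.0, c)

x = allocate(d=np.array([3.0, 0.1, 5.0]), c=np.array([4.0, 1.0, 2.0]), alpha=1.0)
# -> [2.5, 0.0, 2.0]: interior point, clipped at 0, clipped at capacity
```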
We conducted simulation experiments on two mathematical models, and the results are as follows,
Figure 3. Our algorithm applied to the energy dispatch problem and online network resource allocation. (a) Regret of time-varying smart grid energy dispatch. (b) Constraint violation of time-varying smart grid energy dispatch. (c) Regret of online network resource allocation. (d) Constraint violation of online network resource allocation.
We took α_0 = 0.7 and α_0 = 0.37 for the two problems, respectively. The simulation experiments show that regret and constraint violation have sublinear bounds in T, demonstrating that our algorithmic solution gradually approaches the theoretically optimal solution while adhering to the constraints. The results show that our algorithm can be applied to the time-varying smart grid energy dispatch problem and the online network resource allocation problem.

4.3. Experiment Combining Our Algorithm with Supervised Learning

We combined MSALM with supervised learning to solve a path planning problem. To obtain an explicit function for use in our algorithm, we applied parameter regression methods from supervised learning. First, data from 10 flight trajectories for the same flight at the same time slot on different dates were randomly selected. We then used these data to train a mathematical model of the flight trajectory through parameter regression, with B-spline interpolation providing the explicit function of the trajectory. To find the appropriate number of B-spline control points, we compared the errors between the fitted trajectory and the 10 known trajectories under different numbers of control points. The results are as follows:
Table 1. Fitting error with different numbers of control points.
Number of Control Points 3D RMSE (m) Mean Error (m) Max Error (m)
6 16212.93 14630.05 42953.94
7 15278.20 13697.46 28088.96
8 10301.73 9577.57 20571.04
9 7767.29 7094.96 15264.24
10 6056.13 5466.36 12094.13
11 6474.88 5956.19 12599.98
12 6416.03 5793.27 12159.70
According to the table above, accuracy is highest when the trajectory has 10 control points, so we obtained the fitting function in this case. The coordinates of the control points are as follows:
Table 2. Coordinates of the control points.
Control Point X (m) Y (m) Z (m)
1 478288.923 4423806.307 1199.714
2 483007.653 4412762.620 1939.867
3 491176.780 4411013.545 1022.683
4 540450.109 4386975.935 6298.608
5 613313.197 4283290.653 6170.731
6 753322.541 4268797.429 6188.807
7 792318.111 4171779.364 6157.780
8 773451.251 4061749.319 3778.580
9 782481.198 4033694.042 2106.506
10 782988.940 4014086.869 1142.003
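The least-squares B-spline fit behind Tables 1 and 2 can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the function name, the clamped knot vector construction, and the uniform parameterization are our assumptions, built on SciPy's `make_lsq_spline`.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def fit_bspline_traj(points, n_ctrl, k=3):
    """Fit one cubic B-spline per coordinate of a 3D trajectory so that
    the fit has exactly n_ctrl control points; return the splines and
    the 3D RMSE of the fit against the input points."""
    points = np.asarray(points, dtype=float)
    u = np.linspace(0.0, 1.0, len(points))       # uniform parameterization
    # Clamped knot vector: n_ctrl + k + 1 knots in total,
    # with multiplicity k + 1 at both ends.
    n_interior = n_ctrl - k - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    t = np.r_[np.zeros(k + 1), interior, np.ones(k + 1)]
    splines = [make_lsq_spline(u, points[:, d], t, k=k) for d in range(3)]
    fitted = np.column_stack([s(u) for s in splines])
    rmse = np.sqrt(np.mean(np.sum((fitted - points) ** 2, axis=1)))
    return splines, rmse
```

Sweeping `n_ctrl` from 6 to 12 and comparing the resulting RMSE values reproduces the kind of comparison reported in Table 1.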
We imposed randomly generated constraints to simulate real flight conditions: 9 dynamic obstacle avoidance constraints and 4 airspace boundary constraints. The dynamic obstacle avoidance constraints have the following form:
$$g_{k,i} = d_{\mathrm{safe},k}^{2}(t) - \left\| p_i - c_k(t) \right\|^{2} \le 0$$
where $c_k(t)$ denotes the obstacle center coordinates, $d_{\mathrm{safe},k}(t)$ is the safe distance, and $p_i$ is a control point subject to an obstacle avoidance constraint. We set 3 obstacles and imposed obstacle avoidance constraints on the 1st, 5th, and 10th control points, so $k = 1, 2, 3$ and $i \in \{1, 5, 10\}$.
The mathematical model of airspace boundary constraints has the following form:
$$x_{\min} \le x \le x_{\max}, \qquad y_{\min} \le y \le y_{\max}$$
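As an illustration, these two constraint families can be evaluated pointwise as follows (the function names are ours, the time dependence of the obstacle center and safe distance is suppressed for brevity, and a positive return value signals a violation):

```python
import numpy as np

def obstacle_constraint(p_i, c_k, d_safe_k):
    """g_{k,i} = d_safe,k^2 - ||p_i - c_k||^2; feasible when <= 0."""
    p_i, c_k = np.asarray(p_i, dtype=float), np.asarray(c_k, dtype=float)
    return d_safe_k ** 2 - np.sum((p_i - c_k) ** 2)

def box_violation(p, x_min, x_max, y_min, y_max):
    """Total amount by which the (x, y) position leaves the airspace box;
    returns 0.0 when both boundary constraints are satisfied."""
    x, y = p[0], p[1]
    return max(0.0, x_min - x, x - x_max) + max(0.0, y_min - y, y - y_max)
```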
We then used our algorithm to solve this online optimization problem. The experimental results are as follows:
Figure 4. Our algorithm is applied to flight path planning. (a) Regret of flight path planning. (b) Constraint violation of flight path planning.
The simulation results in this figure show that the regret and constraint violation of our algorithm grow sublinearly in T, demonstrating progressive convergence to the theoretical optimum while complying with the constraints. This indicates that the aircraft can make decisions that gradually approach the optimal choice during flight and remain as safe as possible in the face of emergencies, and that our algorithm can be combined with machine learning methods to solve similar path planning problems.

5. Conclusions

In this paper, we presented a new online stochastic augmented Lagrangian method, MSALM, for solving online stochastic optimization problems with time-varying distributions. At each round, we construct model functions for the objective and constraint functions based on their properties, which reduces the computational complexity. The step size is designed in a dynamic form and decreases as t increases to accelerate convergence. Under standard assumptions, we proved that our algorithm achieves regret and constraint violation bounds that are sublinear in the total number of slots T. Simulation experiments demonstrate the performance and practical utility of the MSALM algorithm. Additionally, in the context of path planning, we combined our algorithm with supervised learning to further demonstrate its extensibility.

Figure 1. We plotted the mixed quantity under different values of α_0 to determine which parameter achieves better algorithm performance. (a) The mixed value of adaptive filtering. (b) The mixed value of online logistic regression.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.