A Novel Symmetrical Inertial Alternating Direction Method of Multipliers with Proximal Term for Nonconvex Optimization with Applications


Submitted: 14 May 2025
Posted: 15 May 2025


Abstract
In this paper, we propose a novel alternating direction method of multipliers based on inertial acceleration techniques for a class of nonconvex optimization problems with a two-block structure. To address the nonconvex subproblem, we introduce a proximal term that reduces the difficulty of solving it. For the smooth subproblem, we employ a gradient descent step on the augmented Lagrangian function, which significantly reduces the computational cost. Under the assumptions that the generated sequence is bounded and that the auxiliary function satisfies the Kurdyka–Łojasiewicz property, we establish the global convergence of the proposed algorithm. Finally, the effectiveness and superior performance of the proposed algorithm are validated through numerical experiments on signal processing and SCAD problems.

1. Introduction

In recent years, nonconvex optimization problems have found widespread applications in science and engineering. For instance, Doostmohammadian et al. [2] investigated the optimization of local nonconvex objective functions over time-varying networks based on a gradient tracking algorithm, and further explored optimizing nonconvex objective functions in multi-node networks under imperfect data-exchange links. Zhang et al. [1] pointed out that traditional optimization methods often lead to target-feature compression and information loss in motor imagery decoding, thereby reducing classification performance. To address the high dimensionality and small sample size of motor imagery signals, they proposed a nonconvex sparse regularization model constructed with the Cauchy function, which extracts target features more accurately across multiple datasets while effectively suppressing noise interference. In addition, Tiddeman and Ghahremani [3] combined wavelet transforms with principal component analysis to propose a class of principal component wavelet networks for solving linear inverse problems; by fully exploiting the symmetry of the wavelet transform during decomposition, they ensured effective image reconstruction. For more related works, one can see [4,5,6] and the references therein.
It is well known that recovering sparse signals from incomplete observations is an important research direction in practical applications. The core objective is to find the optimal sparse solution to a system of linear equations, which can be formulated as the following model [7]:
$$\min_{x}\; c\|x\|_0 + \frac{1}{2}\|Ax - b\|^2,$$
where A is the measurement matrix, b is the observed data, x is a sparse signal, $c > 0$ is a regularization parameter, and $\|\cdot\|_0$ denotes the $\ell_0$-norm. However, Chartrand and Staneva [8] pointed out that the above problem is fundamentally difficult to solve. To overcome this challenge, Zeng et al. [9] proposed a relaxed objective function in which the $\ell_0$ regularization is replaced by $\ell_{1/2}$ regularization, transforming the problem into a more tractable nonconvex optimization problem. This modification is therefore well suited to signal recovery problems, and it leads to the following two-block nonconvex optimization problem:
$$\min_{x}\; c\|x\|_{1/2}^{1/2} + \frac{1}{2}\|Ax - b\|^2, \qquad (1)$$
where $\|x\|_{1/2}^{1/2} = \sum_{i=1}^{n}|x_i|^{1/2}$. In general, one introduces an auxiliary variable y to reformulate problem (1) as follows:
$$\min_{x,y}\; c\|x\|_{1/2}^{1/2} + \frac{1}{2}\|y\|^2 \quad \mathrm{s.t.}\quad Ax - y = b. \qquad (2)$$
Zeng et al. [9] pointed out that an iterative half-thresholding algorithm can be used to solve the $\ell_{1/2}$ regularization problem, which was validated in the context of problem (2). Meanwhile, Chen and Selesnick [10] validated the performance of model (2) using an improved overlapping group shrinkage algorithm. Further related works can be found in [11,12].
In statistical optimization, certain penalty methods exhibit limitations, such as sensitivity to the data and biased estimation of significant variables [14]. To address these issues, Fan and Li [14] proposed the smoothly clipped absolute deviation (SCAD) penalty function. They developed optimization algorithms to solve non-concave penalized likelihood problems and demonstrated that this method possesses the asymptotic oracle property. Remarkably, with an appropriate choice of the regularization parameter, the results can achieve nearly the same performance as the known true model. The SCAD-penalized problem can be written conceptually as
$$\min_{x,y}\; \sum_{i=1}^{n} h_\kappa(|x_i|) + \frac{1}{2}\|y\|^2 \quad \mathrm{s.t.}\quad Ax - y = b, \qquad (3)$$
with $A \in \mathbb{R}^{m\times n}$, $y, b \in \mathbb{R}^m$, $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$, and the penalty function $h_\kappa$ in the objective; for its precise definition we refer readers to (26) later. As shown above, problems of the forms (2) and (3) can be generalized to the following nonconvex optimization problem:
$$\min_{x,y}\; f(x) + g(y) \quad \mathrm{s.t.}\quad Ax + y = b, \qquad (4)$$
where $f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, and $g: \mathbb{R}^m \to \mathbb{R}$ is a differentiable function whose gradient is L-Lipschitz continuous with $L > 0$. Here, $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Variants of model (4) have found applications in various fields, such as statistical learning [15,16,17], penalized zero-variance discriminant analysis [18] and image reconstruction [19,20].
It is well known that the alternating direction method of multipliers (ADMM) has gained widespread attention due to its balance between performance and efficiency. When the subproblems are independent, ADMM exhibits a unique symmetry. In fact, with appropriately designed update steps, this symmetry ensures that the convergence of ADMM is independent of the order in which the subproblems are updated [21]. In recent years, as nonconvex optimization problems have gained increasing attention, the convergence analysis of ADMM in nonconvex settings has become a research hotspot. Hong et al. [22], recognizing the strong empirical performance of ADMM on nonconvex problems but the lack of theoretical guarantees, not only established the convergence theory for nonconvex ADMM but also overcame the limitation on the number of variable blocks. Wang et al. [23] demonstrated that incorporating the Bregman distance into ADMM can effectively simplify the computation of subproblems, emphasizing the feasibility of ADMM in nonconvex settings. Ding et al. [24] proposed a class of Semi-Proximal ADMM for solving low-rank matrix recovery problems. In the presence of noisy matrix data, by minimizing the nuclear norm, they effectively addressed the issues of Gaussian noise and related mixed noise. Guo et al. [25] provided insights into solving large-scale nonconvex optimization problems using ADMM. For more related work, readers may refer to [26,27,28,29].
The inertial acceleration technique, derived from the heavy-ball method, utilizes information from previous iterations to construct affine combinations [30]. Inertial schemes can also employ different extrapolation strategies during the optimization process to enhance convergence speed. In their study of a general inertial proximal gradient method, Wu and Li [31] proposed two distinct extrapolation strategies to flexibly adjust the algorithm's convergence rate. Chen et al. [32] investigated an inertial proximal ADMM and established the global convergence of its iterates under appropriate assumptions. Chao et al. [33] further observed that embedding an inertial term into the y-subproblem can significantly improve convergence speed. Moreover, compared with the standard inertial update $\bar{x}^k = x^k + \eta(x^k - x^{k-1})$, Wang et al. [34] considered the alternative scheme $\bar{x}^k = x^k + \eta(x^k - \bar{x}^{k-1})$. This update preserves the acceleration effect of inertia while reducing the computational errors introduced by the inertial updates.
Unfortunately, the work of Wang et al. [34] only applied the inertial update to x. Inspired by [34], we propose a novel symmetrical inertial alternating direction method of multipliers with a proximal term (NIP-ADMM). Building upon Wang et al.'s inertial step, we introduce an additional inertial update for y and incorporate $\bar{y}^k$ into the x-subproblem; this form of inertial update treats the primal variables symmetrically and thereby achieves faster acceleration. To simplify the computation of the subproblems, we add a proximal term to the x-subproblem, which, under suitable conditions, turns the nonconvex subproblem into an approximate projection-type problem. Furthermore, since g is continuously differentiable with an L-Lipschitz gradient, $\nabla g$ is well defined, which allows us to abandon the traditional exact minimization of the y-subproblem and instead take a gradient descent step. This requires only gradient evaluations at each iteration, significantly reducing the computational complexity and offering substantial advantages when handling high-dimensional or large-scale data.
The structure of this paper is as follows. In Section 2, we review essential results required for further analysis. We present NIP-ADMM and analyze its convergence in Section 3. Numerical experiments on signal recovery and the SCAD problem in Section 4 highlight the benefits of the proximal and inertial techniques. Lastly, in Section 5, we present concluding remarks.

2. Preliminaries

In this section, we introduce key notations and definitions that are essential for the results to be developed and are utilized in the subsequent sections.
Assume $\langle x, y\rangle = x^T y$ and $\|x\| = \sqrt{\langle x, x\rangle}$. If a matrix Q is positive definite (positive semidefinite), we write $Q \succ 0$ ($Q \succeq 0$). Given an $n\times n$ matrix $Q \succeq 0$ and a vector $x \in \mathbb{R}^n$, let $\|x\|_Q := \sqrt{x^T Q x}$ denote the Q-norm of x. For a matrix G, we define $\lambda_{\min}(G)$ and $\lambda_{\max}(G)$ as the smallest and largest eigenvalues of $G^T G$, respectively. For $f: \mathbb{R}^n \to (-\infty, +\infty]$, the domain of f is defined as $\operatorname{dom} f = \{x \in \mathbb{R}^n \,|\, f(x) < +\infty\}$.
Definition 1. 
Let $S \subseteq \mathbb{R}^n$. The distance from a point $x \in \mathbb{R}^n$ to S is defined as $d(x, S) = \inf_{y \in S}\|y - x\|$. In particular, if $S = \emptyset$, then $d(x, S) = +\infty$.
Definition 2. 
For a differentiable convex function $F: \mathbb{R}^n \to \mathbb{R}$, the Bregman distance is defined by
$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q\rangle,$$
where $p, q \in \mathbb{R}^n$.
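As a simple illustration (not used elsewhere in the paper), taking $F(p) = \frac{1}{2}\|p\|^2$ gives
$$D_F(p, q) = \frac{1}{2}\|p\|^2 - \frac{1}{2}\|q\|^2 - \langle q,\, p - q\rangle = \frac{1}{2}\|p - q\|^2,$$
so the Bregman distance generalizes the squared Euclidean distance.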
Definition 3. 
Assume $f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function.
(i)
The Fréchet subdifferential of f at $x \in \operatorname{dom} f$ is denoted by $\hat\partial f(x)$ and defined as
$$\hat\partial f(x) = \Big\{ \bar{x} \in \mathbb{R}^n : \liminf_{y \to x,\, y \neq x} \frac{f(y) - f(x) - \langle \bar{x},\, y - x\rangle}{\|y - x\|} \ge 0 \Big\}.$$
In addition, we set $\hat\partial f(x) = \emptyset$ when $x \notin \operatorname{dom} f$.
(ii)
The limiting subdifferential of f at $x \in \operatorname{dom} f$ is written as $\partial f(x)$ and defined by
$$\partial f(x) = \big\{ \bar{x} \in \mathbb{R}^n : \exists\, x^k \to x,\ f(x^k) \to f(x),\ \hat{x}^k \in \hat\partial f(x^k),\ \hat{x}^k \to \bar{x} \big\}.$$
Proposition 1. 
The sub-differential of a lower semicontinuous function f possesses several fundamental and significant properties as follows:
(i)
From Definition 3, it follows that $\hat\partial f(x) \subseteq \partial f(x)$ for all $x \in \mathbb{R}^n$; both $\hat\partial f(x)$ and $\partial f(x)$ are closed sets.
(ii)
Suppose that $(x^k, y^k)$ is a sequence converging to $(x, y)$, that $f(x^k)$ converges to $f(x)$, and that $y^k \in \partial f(x^k)$. Then, by the definition of the subdifferential, we have $y \in \partial f(x)$.
(iii)
If x is a local minimizer of f, then $0 \in \partial f(x)$.
(iv)
Assuming that $g: \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, we have
$$\partial(f + g)(x) = \partial f(x) + \nabla g(x).$$
Definition 4. 
We call $(x^*, y^*, \lambda^*)$ a critical point of the augmented Lagrangian function $L_\beta(x, y, \lambda)$ if it satisfies the following conditions:
$$-A^T\lambda^* \in \partial f(x^*), \qquad -\lambda^* = \nabla g(y^*), \qquad Ax^* + y^* = b.$$
Definition 5 
([36]). (Kurdyka–Łojasiewicz property (KLP)) Let $f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function. The function f is said to satisfy the KLP at $\hat{p} \in \operatorname{dom}\partial f$ if there exist $\varsigma \in (0, +\infty]$, a neighborhood U of $\hat{p}$, and a function $\varphi \in Q_\varphi$, where $Q_\varphi$ denotes the class of concave continuous functions $\varphi: [0, \varsigma) \to [0, +\infty)$ that are continuously differentiable on $(0, \varsigma)$ with $\varphi(0) = 0$ and $\varphi' > 0$, such that for any $p \in U \cap \{p : f(\hat{p}) < f(p) < f(\hat{p}) + \varsigma\}$ the following inequality holds:
$$\varphi'\big(f(p) - f(\hat{p})\big)\, d\big(0, \partial f(p)\big) \ge 1.$$
If f satisfies the KLP at every point of $\operatorname{dom}\partial f$, then f is called a KL function.
Lemma 1 
([35]). Suppose that the matrix $B \in \mathbb{R}^{r\times p}$ is a non-zero matrix, and let $\mu_B$ denote the smallest positive eigenvalue of the matrix $BB^T$. Then, for each $u \in \mathbb{R}^p$, the following holds:
$$\|P_{\operatorname{Im}(B^T)}\, u\| \le \frac{1}{\sqrt{\mu_B}}\,\|Bu\|,$$
where $P_{\operatorname{Im}(B^T)}$ denotes the orthogonal projection onto the range of $B^T$.
Lemma 2 
([36]). Assume $B(x, y) = f(x) + g(y)$, where $f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ and $g: \mathbb{R}^m \to \mathbb{R}\cup\{+\infty\}$ are both proper lower semicontinuous functions. Then, for any $(x, y) \in \operatorname{dom} B = \operatorname{dom} f \times \operatorname{dom} g$, we have
$$\partial B(x, y) = \partial f(x) \times \partial g(y).$$
Lemma 3 
([37]). (Uniformized KLP) Let $\Omega$ be a compact set and let $Q_\varphi$ be as in Definition 5. If a proper lower semicontinuous function $f: \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is constant on $\Omega$ and satisfies the KLP at every point of $\Omega$, then there exist $\varrho > 0$, $\varsigma > 0$, and $\varphi \in Q_\varphi$ such that for any $\hat{x} \in \Omega$ and any $x \in \{x \in \mathbb{R}^n : d(x, \Omega) < \varrho\} \cap \{x : f(\hat{x}) < f(x) < f(\hat{x}) + \varsigma\}$, the following inequality is satisfied:
$$\varphi'\big(f(x) - f(\hat{x})\big)\, d\big(0, \partial f(x)\big) \ge 1.$$
Lemma 4 
([38]). If the function $c: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $\nabla c$ is Lipschitz continuous with constant $L \ge 0$, then for any $x, y \in \mathbb{R}^n$, the following result holds:
$$|c(y) - c(x) - \langle \nabla c(x),\, y - x\rangle| \le \frac{L}{2}\|y - x\|^2.$$
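For example (an illustration, not a statement from [38]), take $c(x) = \frac{1}{2}\|Mx\|^2$ for a matrix M. Then $\nabla c(x) = M^T M x$ is Lipschitz continuous with constant equal to the largest eigenvalue of $M^T M$, and the left-hand side above equals $\frac{1}{2}\|M(y - x)\|^2$ exactly, which is bounded by the right-hand side.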

3. Algorithm and Convergence Analysis

In this section, we first present the definition of the augmented Lagrangian function associated with problem (4) as follows:
$$L_\beta(x, y, \lambda) = f(x) + g(y) + \langle \lambda,\, Ax + y - b\rangle + \frac{\beta}{2}\|Ax + y - b\|^2,$$
where λ denotes the augmented Lagrange multiplier, and β > 0 is a penalty parameter. Following this, we propose the NIP-ADMM for solving the problem (4), the proposed algorithm is outlined below:
Remark 1.
(i)
In NIP-ADMM, the inertial parameters η and θ are both in ( 0 , 1 ] , and S is a positive semi-definite matrix.
(ii)
The update scheme for the y-subproblem adopts the gradient descent method, where $\nabla_y L_\beta$ is the gradient of $L_\beta$ with respect to y, and $\gamma$ is called the learning rate.
(iii)
The inertial structure we adopted employs a structurally balanced acceleration strategy. This update strategy is mathematically symmetric, with the only distinction being the values of the parameters η and θ.
According to Algorithm 1, the optimality conditions for NIP-ADMM are obtained as
$$0 \in \partial f(x^{k+1}) + A^T\lambda^k + \beta A^T(Ax^{k+1} + \bar{y}^k - b) + S(x^{k+1} - \bar{x}^k), \qquad 0 = \nabla g(y^k) + \lambda^k + \beta(Ax^{k+1} + y^k - b) + \frac{1}{\gamma}(y^{k+1} - y^k). \qquad (6)$$
Before concluding this section, we present the following fundamental assumptions, which are essential for the convergence analysis.
Algorithm 1: NIP-ADMM
Initialization: Input $x^1$, $y^1$, and $\lambda^1$; let $\bar{x}^0 = x^1$ and $\bar{y}^0 = y^1$. Given constants $\eta, \theta, \gamma, \beta$. Set $k = 1$.
While "not converged" Do
1 Compute $(\bar{x}^k, \bar{y}^k) = (x^k, y^k) + \theta(x^k - \bar{x}^{k-1}, 0) + \eta(0,\, y^k - \bar{y}^{k-1})$.
2 Compute $x^{k+1} \in \arg\min_x \big\{ L_\beta(x, \bar{y}^k, \lambda^k) + \frac{1}{2}\|x - \bar{x}^k\|_S^2 \big\}$.
3 Calculate $y^{k+1} = y^k - \gamma \nabla_y L_\beta(x^{k+1}, y^k, \lambda^k)$.
4 Update the dual variable $\lambda^{k+1} = \lambda^k + \beta(Ax^{k+1} + y^{k+1} - b)$.
5 Let $k = k + 1$.
End While
Output: $(x^{k+1}, y^{k+1}, \lambda^{k+1})$ as an (approximate) solution of problem (4).
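To make the update order of Algorithm 1 concrete, the following is a minimal NumPy sketch of one possible implementation for problem (4). It is only an illustration, not the authors' code (the experiments in Section 4 were run in MATLAB); the callables prox_x and grad_g, the stopping rule, and the default parameter values (mirroring the signal-recovery experiment of Section 4.1) are our own assumptions.

```python
import numpy as np

def nip_admm(prox_x, grad_g, A, b, x1, y1, lam1,
             theta=0.8, eta=0.75, gamma=0.3, beta=3.0,
             max_iter=500, tol=1e-6):
    """Sketch of NIP-ADMM (Algorithm 1) for min f(x) + g(y) s.t. Ax + y = b.

    prox_x(lam, y_bar, x_bar) must return a minimizer of
        f(x) + <lam, A x> + (beta/2)*||A x + y_bar - b||^2 + (1/2)*||x - x_bar||_S^2,
    i.e. it encapsulates the x-subproblem and the choice of S.
    grad_g(y) returns the gradient of the smooth term g.
    """
    x, y, lam = np.array(x1, float), np.array(y1, float), np.array(lam1, float)
    x_bar, y_bar = x.copy(), y.copy()          # bar x^0 = x^1, bar y^0 = y^1
    for _ in range(max_iter):
        # Step 1: symmetric inertial extrapolation of both primal blocks.
        x_bar = x + theta * (x - x_bar)
        y_bar = y + eta * (y - y_bar)
        # Step 2: proximal x-subproblem (delegated to the user-supplied solver).
        x_new = prox_x(lam, y_bar, x_bar)
        # Step 3: one gradient-descent step on L_beta with respect to y.
        y_new = y - gamma * (grad_g(y) + lam + beta * (A @ x_new + y - b))
        # Step 4: dual update.
        lam = lam + beta * (A @ x_new + y_new - b)
        # Step 5: advance the iterates; stop when successive changes are small.
        if max(np.linalg.norm(x_new - x), np.linalg.norm(y_new - y)) <= tol:
            x, y = x_new, y_new
            break
        x, y = x_new, y_new
    return x, y, lam
```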
Assumption 1. (i) $f: \mathbb{R}^{n} \to \mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function; $g: \mathbb{R}^{m} \to \mathbb{R}$ is continuously differentiable, and $\nabla g$ is Lipschitz continuous with Lipschitz constant $L > 0$.
(ii)
S is a positive semidefinite matrix.
(iii)
For convenience, we introduce the following symbols:
$$\zeta = (x, y, \lambda), \quad \zeta^k = (x^k, y^k, \lambda^k), \quad \zeta^* = (x^*, y^*, \lambda^*), \quad \xi = \frac{1}{\gamma} - \beta, \quad \hat\zeta^k = (x^k, y^k, \lambda^k, \bar{x}^k, x^{k-1}, y^{k-1}),$$
$$\sigma_0 = \frac{1}{\gamma} - \frac{L + \beta}{2} - \frac{2\xi^2}{\beta} - \frac{2(\xi + L)^2}{\beta}, \qquad \hat{L}_\beta(\hat\zeta^k) = L_\beta(\zeta^k) + \frac{2(\xi + L)^2}{\beta}\|y^k - y^{k-1}\|^2 + \frac{1}{2}\|\bar{x}^k - x^{k-1}\|_S^2.$$
(iv)
To analyze the monotonicity of { L ^ β ( ζ ^ k ) } , we set σ 0 > 0 .
Lemma 5. 
If Assumption 1 holds, then for any $k \ge 1$,
$$\hat{L}_\beta(x^{k+1}, y^{k+1}, \lambda^{k+1}, \bar{x}^{k+1}, x^k, y^k) \le \hat{L}_\beta(x^k, y^k, \lambda^k, \bar{x}^k, x^{k-1}, y^{k-1}) - \frac{1 - \eta^2}{2}\|x^k - \bar{x}^{k-1}\|_S^2 - \sigma_0\|y^{k+1} - y^k\|^2,$$
where η ( 0 , 1 ] is the inertial parameter in Algorithm 1.
Proof. 
According to the definition of the Lagrangian function, one gets
$$L_\beta(x^{k+1}, y^{k+1}, \lambda^{k+1}) - L_\beta(x^{k+1}, y^{k+1}, \lambda^k) = \langle \lambda^{k+1} - \lambda^k,\, Ax^{k+1} + y^{k+1} - b\rangle = \frac{1}{\beta}\|\lambda^{k+1} - \lambda^k\|^2, \qquad (8)$$
and we also have
$$L_\beta(x^{k+1}, y^{k+1}, \lambda^k) - L_\beta(x^{k+1}, y^k, \lambda^k) = g(y^{k+1}) - g(y^k) + \langle \lambda^k,\, y^{k+1} - y^k\rangle + \frac{\beta}{2}\|Ax^{k+1} + y^{k+1} - b\|^2 - \frac{\beta}{2}\|Ax^{k+1} + y^k - b\|^2. \qquad (9)$$
It follows from (6), (9) and Lemma 4 that
$$\begin{aligned} g(y^{k+1}) - g(y^k) + \langle \lambda^k,\, y^{k+1} - y^k\rangle &\le \langle \nabla g(y^k),\, y^{k+1} - y^k\rangle + \frac{L}{2}\|y^{k+1} - y^k\|^2 + \langle \lambda^k,\, y^{k+1} - y^k\rangle \\ &= \Big\langle -\lambda^{k+1} + \Big(\frac{1}{\gamma} - \beta\Big)(y^k - y^{k+1}),\, y^{k+1} - y^k\Big\rangle + \frac{L}{2}\|y^{k+1} - y^k\|^2 + \langle \lambda^k,\, y^{k+1} - y^k\rangle \\ &= \Big(\frac{L}{2} + \beta - \frac{1}{\gamma}\Big)\|y^{k+1} - y^k\|^2 - \langle \lambda^{k+1} - \lambda^k,\, y^{k+1} - y^k\rangle, \end{aligned}$$
and we get
$$\begin{aligned} \frac{\beta}{2}\|Ax^{k+1} + y^{k+1} - b\|^2 - \frac{\beta}{2}\|Ax^{k+1} + y^k - b\|^2 &= \frac{\beta}{2}\|Ax^{k+1} + y^{k+1} - b\|^2 - \frac{\beta}{2}\big\|(Ax^{k+1} + y^{k+1} - b) + (y^k - y^{k+1})\big\|^2 \\ &= -\frac{\beta}{2}\|y^k - y^{k+1}\|^2 - \beta\langle Ax^{k+1} + y^{k+1} - b,\, y^k - y^{k+1}\rangle \\ &= -\frac{\beta}{2}\|y^k - y^{k+1}\|^2 + \langle \lambda^k - \lambda^{k+1},\, y^k - y^{k+1}\rangle. \end{aligned}$$
Combining the above two estimates, one obtains
$$L_\beta(x^{k+1}, y^{k+1}, \lambda^k) - L_\beta(x^{k+1}, y^k, \lambda^k) \le \Big(\frac{L + \beta}{2} - \frac{1}{\gamma}\Big)\|y^{k+1} - y^k\|^2. \qquad (10)$$
Since x k + 1 is the optimal solution to the subproblem with respect to x, one knows that
$$\begin{aligned} L_\beta(x^{k+1}, y^k, \lambda^k) - L_\beta(x^k, y^k, \lambda^k) &\le \frac{1}{2}\|x^k - \bar{x}^k\|_S^2 - \frac{1}{2}\|x^{k+1} - \bar{x}^k\|_S^2 \\ &\le \frac{1}{2}\|x^k - \bar{x}^{k-1}\|_S^2 - \frac{1}{2}\|x^{k+1} - \bar{x}^k\|_S^2 - \frac{1 - \eta^2}{2}\|x^k - \bar{x}^{k-1}\|_S^2. \end{aligned} \qquad (11)$$
Noticing Algorithm 1 and (6), one can see
$$\Big(\frac{1}{\gamma} - \beta\Big)(y^k - y^{k+1}) = \nabla g(y^k) + \lambda^{k+1}.$$
Thus, it is natural to derive the following process:
$$\|\lambda^{k+1} - \lambda^k\|^2 = \Big\|\nabla g(y^k) - \nabla g(y^{k-1}) + \Big(\frac{1}{\gamma} - \beta\Big)(y^{k+1} - y^k) - \Big(\frac{1}{\gamma} - \beta\Big)(y^k - y^{k-1})\Big\|^2 \le 2\Big(L + \frac{1}{\gamma} - \beta\Big)^2\|y^k - y^{k-1}\|^2 + 2\Big(\frac{1}{\gamma} - \beta\Big)^2\|y^{k+1} - y^k\|^2. \qquad (12)$$
Combining (8) and Equations (10)–(12), one can draw the following conclusions:
$$\begin{aligned} & L_\beta(x^{k+1}, y^{k+1}, \lambda^{k+1}) + \frac{2(\xi + L)^2}{\beta}\|y^{k+1} - y^k\|^2 + \frac{1}{2}\|\bar{x}^{k+1} - x^k\|_S^2 \\ &\quad \le L_\beta(x^k, y^k, \lambda^k) + \frac{2(\xi + L)^2}{\beta}\|y^k - y^{k-1}\|^2 + \frac{1}{2}\|\bar{x}^k - x^{k-1}\|_S^2 - \frac{1 - \eta^2}{2}\|x^k - \bar{x}^{k-1}\|_S^2 - \sigma_0\|y^{k+1} - y^k\|^2, \end{aligned}$$
where $\xi = \frac{1}{\gamma} - \beta$ and $\sigma_0 = \frac{1}{\gamma} - \frac{L+\beta}{2} - \frac{2\xi^2}{\beta} - \frac{2(\xi+L)^2}{\beta}$, and we obtain the desired conclusion. □
According to Assumption 1 with σ 0 > 0 and η ( 0 , 1 ] , the monotonic non-increasing property of the sequence { L ^ β ( ζ ^ k ) } is guaranteed.
Lemma 6. 
If the sequence $\zeta^k := (x^k, y^k, \lambda^k)$ generated by Algorithm 1 is bounded, then we have
$$\sum_{k=1}^{\infty}\|\zeta^{k+1} - \zeta^k\|^2 < +\infty.$$
Proof. 
Since $\{\zeta^k\}$ is bounded, $\{\hat\zeta^k\}$ is also bounded. Hence it has an accumulation point, say $\hat\zeta^*$, and there exists a subsequence $\{\hat\zeta^{k_j}\}$ of $\{\hat\zeta^k\}$ such that
$$\liminf_{j \to \infty} \hat{L}_\beta(\hat\zeta^{k_j}) \ge \hat{L}_\beta(\hat\zeta^*),$$
which implies that $\{\hat{L}_\beta(\hat\zeta^{k_j})\}$ is bounded from below. Summing the inequality of Lemma 5 over $k \ge 2$, it follows that
$$\sum_{k=2}^{n}\sigma_0\|y^{k+1} - y^k\|^2 + \sum_{k=2}^{n}\frac{1 - \theta^2}{2}\|x^k - \bar{x}^{k-1}\|^2 \le \hat{L}_\beta(\hat\zeta^2) - \hat{L}_\beta(\hat\zeta^*).$$
Given $\sigma_0 > 0$, $\theta \in [0, 1)$, and that S is a positive semi-definite matrix, letting $n \to \infty$ one derives
$$\sum_{k=0}^{\infty}\sigma_0\|y^{k+1} - y^k\|^2 < \infty, \qquad \sum_{k=0}^{\infty}\frac{1 - \theta^2}{2}\|x^k - \bar{x}^{k-1}\|^2 < \infty. \qquad (16)$$
By the inertial relationship, the following conclusion can be obtained:
$$\begin{aligned} \|x^{k+1} - x^k\|^2 &= \|x^{k+1} - \bar{x}^k + \bar{x}^k - x^k\|^2 = \|x^{k+1} - \bar{x}^k + \theta(x^k - \bar{x}^{k-1})\|^2 \le 2\|x^{k+1} - \bar{x}^k\|^2 + 2\theta^2\|x^k - \bar{x}^{k-1}\|^2, \\ \|y^{k+1} - y^k\|^2 &= \|y^{k+1} - \bar{y}^k + \bar{y}^k - y^k\|^2 = \|y^{k+1} - \bar{y}^k + \eta(y^k - \bar{y}^{k-1})\|^2 \le 2\|y^{k+1} - \bar{y}^k\|^2 + 2\eta^2\|y^k - \bar{y}^{k-1}\|^2. \end{aligned} \qquad (17)$$
Combining (12), (16), and (17), we have
$$\sum_{k=0}^{\infty}\|x^{k+1} - x^k\|^2 < \infty, \qquad \sum_{k=0}^{\infty}\|y^{k+1} - y^k\|^2 < \infty, \qquad \sum_{k=0}^{\infty}\|\lambda^{k+1} - \lambda^k\|^2 < \infty,$$
and thus $\sum_{k}\|\zeta^{k+1} - \zeta^k\|^2 < +\infty$. □
Now we give subsequential convergence analysis of NIP-ADMM.
Theorem 1. (Subsequential Convergence) Suppose the sequence $\{\zeta^k\}$ generated by NIP-ADMM is bounded, and let M and $\hat{M}$ denote the sets of cluster points of $\{\zeta^k\}$ and $\{\hat\zeta^k\}$, respectively. Under the assumptions and conditions of Lemma 5, we have the following conclusions:
(i)
M and $\hat{M}$ are two non-empty compact sets, and $d(\zeta^k, M) \to 0$ and $d(\hat\zeta^k, \hat{M}) \to 0$ as $k \to \infty$.
(ii)
$\zeta^* \in M$ if and only if $\hat\zeta^* \in \hat{M}$.
(iii)
$M \subseteq \operatorname{crit} L_\beta$.
(iv)
The sequence $\{\hat{L}_\beta(\hat\zeta^k)\}$ converges, and $\hat{L}_\beta(\hat\zeta^*) = \inf_{k\in\mathbb{N}} \hat{L}_\beta(\hat\zeta^k) = \lim_{k\to\infty}\hat{L}_\beta(\hat\zeta^k)$.
Proof.
(i)
Based on the definitions of M and M ^ , the conclusion can be satisfied.
(ii)
Combining Lemma 5 with the definitions of $\zeta^*$ and $\hat\zeta^*$, we obtain the desired conclusion.
(iii)
Let $\zeta^* \in M$; then there is a subsequence $\{\zeta^{k_j}\}$ of $\{\zeta^k\}$ converging to $\zeta^*$. By Lemmas 5 and 6, $\|\zeta^{k+1} - \zeta^k\| \to 0$ as $k \to +\infty$, which implies $\lim_{j\to+\infty}\zeta^{k_j+1} = \zeta^*$. On one hand, noting that $x^{k+1}$ is the optimal solution to the x-subproblem, we have
$$f(x^{k+1}) + \langle \lambda^k,\, Ax^{k+1}\rangle + \frac{\beta}{2}\|Ax^{k+1} + \bar{y}^k - b\|^2 + \frac{1}{2}\|x^{k+1} - \bar{x}^k\|_S^2 \le f(x^*) + \langle \lambda^k,\, Ax^*\rangle + \frac{\beta}{2}\|Ax^* + \bar{y}^k - b\|^2 + \frac{1}{2}\|x^* - \bar{x}^k\|_S^2.$$
From Lemma 6, we know that $\lim_{k\to+\infty}\|x^{k+1} - \bar{x}^k\| = 0$, and combining this with $\lim_{j\to+\infty}\zeta^{k_j} = \lim_{j\to+\infty}\zeta^{k_j+1} = \zeta^*$, we conclude that $\limsup_{j\to+\infty} f(x^{k_j+1}) \le f(x^*)$. On the other hand, since $f(\cdot)$ is lower semicontinuous, $\liminf_{j\to+\infty} f(x^{k_j+1}) \ge f(x^*)$, so one gets
$$\lim_{j\to+\infty} f(x^{k_j+1}) = f(x^*). \qquad (19)$$
Moreover, by the closedness of $\partial f$, the continuity of $\nabla g$, and the optimality conditions (6) of NIP-ADMM with $k = k_j \to +\infty$, we assert that
$$-A^T\lambda^* \in \partial f(x^*), \qquad -\lambda^* = \nabla g(y^*), \qquad Ax^* + y^* = b,$$
and hence $\zeta^* \in \operatorname{crit} L_\beta$ by Definition 4.
(iv)
Let $\hat\zeta^* \in \hat{M}$, and let $\{\hat\zeta^{k_j}\}$ be a subsequence of $\{\hat\zeta^k\}$ converging to $\hat\zeta^*$. Combining the relations (14), (19), and the continuity of g, we have
$$\lim_{j\to+\infty} \hat{L}_\beta(\hat\zeta^{k_j}) = \hat{L}_\beta(\hat\zeta^*).$$
Since $\{\hat{L}_\beta(\hat\zeta^k)\}$ is monotonically non-increasing, it is convergent. Consequently, for any $\hat\zeta^* \in \hat{M}$,
$$\hat{L}_\beta(\hat\zeta^*) = \inf_{k\in\mathbb{N}} \hat{L}_\beta(\hat\zeta^k) = \lim_{k\to\infty}\hat{L}_\beta(\hat\zeta^k). \ \square$$
Based on the definition of $\hat{L}_\beta$ and the positive semidefiniteness of the matrix S, the following can be defined with $\zeta^k = (x^k, y^k, \lambda^k)$:
$$\begin{aligned} \epsilon_1^{k+1} &= A^T(\lambda^k - \lambda^{k+1}) + \beta A^T(y^{k+1} - \bar{y}^k) + S(\bar{x}^k - x^{k+1}), \\ \epsilon_2^{k+1} &= \nabla g(y^{k+1}) - \nabla g(y^k) + (\lambda^k - \lambda^{k+1}) + \beta(y^{k+1} - y^k) + \frac{1}{\gamma}(y^k - y^{k+1}) + \frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k), \\ \epsilon_3^{k+1} &= Ax^{k+1} + y^{k+1} - b, \qquad \epsilon_4^{k+1} = S(\bar{x}^{k+1} - x^k), \qquad \epsilon_5^{k+1} = -S(\bar{x}^{k+1} - x^k), \qquad \epsilon_6^{k+1} = -\frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k). \end{aligned}$$
Then, the following result can be obtained.
Lemma 7. 
Let $(\epsilon_1^{k+1}, \epsilon_2^{k+1}, \epsilon_3^{k+1}, \epsilon_4^{k+1}, \epsilon_5^{k+1}, \epsilon_6^{k+1}) \in \partial\hat{L}_\beta(\hat\zeta^{k+1})$. Then there exists $\psi > 0$ such that, for all $k > 1$,
$$d\big(0, \partial\hat{L}_\beta(\hat\zeta^{k+1})\big) \le \psi\big(\|y^{k+1} - y^k\| + \|x^{k+1} - \bar{x}^k\| + \|y^k - y^{k-1}\| + \|y^{k+1} - \bar{y}^k\|\big).$$
Proof. 
By the definition of $\hat{L}_\beta(\cdot)$ and $\hat\zeta^k = (x^k, y^k, \lambda^k, \bar{x}^k, x^{k-1}, y^{k-1})$, we can derive that
$$\begin{aligned} \partial_x \hat{L}_\beta(\hat\zeta^{k+1}) &= \partial f(x^{k+1}) + A^T\lambda^{k+1} + \beta A^T(Ax^{k+1} + y^{k+1} - b), \\ \nabla_y \hat{L}_\beta(\hat\zeta^{k+1}) &= \nabla g(y^{k+1}) + \lambda^{k+1} + \beta(Ax^{k+1} + y^{k+1} - b) + \frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k), \\ \nabla_\lambda \hat{L}_\beta(\hat\zeta^{k+1}) &= Ax^{k+1} + y^{k+1} - b, \\ \nabla_{\bar{x}} \hat{L}_\beta(\hat\zeta^{k+1}) &= S(\bar{x}^{k+1} - x^k), \qquad \nabla_{x^k} \hat{L}_\beta(\hat\zeta^{k+1}) = -S(\bar{x}^{k+1} - x^k), \qquad \nabla_{y^k} \hat{L}_\beta(\hat\zeta^{k+1}) = -\frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k). \end{aligned}$$
Combining the above expressions with the optimality conditions (6) of NIP-ADMM, we obtain
$$\begin{aligned} \epsilon_1^{k+1} &= A^T(\lambda^k - \lambda^{k+1}) + \beta A^T(y^{k+1} - \bar{y}^k) + S(\bar{x}^k - x^{k+1}), \\ \epsilon_2^{k+1} &= \nabla g(y^{k+1}) - \nabla g(y^k) + (\lambda^k - \lambda^{k+1}) + \beta(y^{k+1} - y^k) + \frac{1}{\gamma}(y^k - y^{k+1}) + \frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k), \\ \epsilon_3^{k+1} &= Ax^{k+1} + y^{k+1} - b, \qquad \epsilon_4^{k+1} = S(\bar{x}^{k+1} - x^k), \qquad \epsilon_5^{k+1} = -S(\bar{x}^{k+1} - x^k), \qquad \epsilon_6^{k+1} = -\frac{2(\xi + L)^2}{\beta}(y^{k+1} - y^k). \end{aligned}$$
It is easy to see from Lemma 2 that $(\epsilon_1^{k+1}, \epsilon_2^{k+1}, \epsilon_3^{k+1}, \epsilon_4^{k+1}, \epsilon_5^{k+1}, \epsilon_6^{k+1}) \in \partial\hat{L}_\beta(\hat\zeta^{k+1})$. Moreover, since $\nabla g(\cdot)$ is Lipschitz continuous with constant L, we get
$$\|\nabla g(y^{k+1}) - \nabla g(y^k)\| \le L\|y^{k+1} - y^k\|,$$
therefore, according to (20), there exists a positive real number $\psi_1$ such that
$$\|(\epsilon_1^{k+1}, \epsilon_2^{k+1}, \epsilon_3^{k+1}, \epsilon_4^{k+1}, \epsilon_5^{k+1}, \epsilon_6^{k+1})\| \le \psi_1\big(\|\lambda^{k+1} - \lambda^k\| + \|y^{k+1} - \bar{y}^k\| + \|x^{k+1} - \bar{x}^k\| + \|y^{k+1} - y^k\|\big).$$
Furthermore, combining this with (12), we know that there exists $\psi_2 > 0$ such that for $k > 1$:
$$\|\lambda^{k+1} - \lambda^k\| \le \psi_2\big(\|y^k - y^{k-1}\| + \|y^{k+1} - y^k\|\big),$$
thus, by selecting $\psi = \max\{\psi_1, \psi_2\}$ and $k > 1$, we can further conclude that
$$d\big(0, \partial\hat{L}_\beta(\hat\zeta^{k+1})\big) \le \|(\epsilon_1^{k+1}, \epsilon_2^{k+1}, \epsilon_3^{k+1}, \epsilon_4^{k+1}, \epsilon_5^{k+1}, \epsilon_6^{k+1})\| \le \psi\big(\|y^{k+1} - y^k\| + \|x^{k+1} - \bar{x}^k\| + \|y^k - y^{k-1}\| + \|y^{k+1} - \bar{y}^k\|\big),$$
this concludes the proof. □
Theorem 2. (Global convergence) Suppose the sequence $\{\zeta^k\}$ generated by NIP-ADMM is bounded and Assumption 1 holds. If $\hat{L}_\beta$ is a KL function, then
$$\sum_{k=1}^{\infty}\|\zeta^{k+1} - \zeta^k\| < +\infty.$$
Moreover, the sequence { ζ k } converges to a critical point of L β .
Proof. 
From Theorem 1, we know that $\lim_{k\to\infty}\hat{L}_\beta(\hat\zeta^k) = \hat{L}_\beta(\hat\zeta^*)$ for any $\hat\zeta^* \in \hat{M}$. The proof needs to consider the following two cases:
(i)
If there exists $k_0 > 1$ such that $\hat{L}_\beta(\hat\zeta^{k_0}) = \hat{L}_\beta(\hat\zeta^*)$, it follows from Lemma 1 and Lemma 5 that there exists a constant $K > 0$ such that, for all $k \ge k_0$,
$$K\big(\|x^k - \bar{x}^{k-1}\|^2 + \|y^{k+1} - y^k\|^2\big) \le \hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^{k+1}) \le \hat{L}_\beta(\hat\zeta^{k_0}) - \hat{L}_\beta(\hat\zeta^*).$$
It is clear that $K = \min\big\{\sigma_0,\ \frac{1-\eta^2}{2}\lambda_{\min}(S)\big\}$. As a result, we have $\sum_k\|y^{k+1} - y^k\| < +\infty$ and $\sum_k\|x^k - \bar{x}^{k-1}\| < +\infty$. Combining (12) and (17), it follows that $\sum_k\|x^{k+1} - x^k\| < +\infty$ and $\sum_k\|\lambda^{k+1} - \lambda^k\| < +\infty$. Finally, for any $k > k_0$, we conclude that $\sum_{k > k_0}\|\zeta^{k+1} - \zeta^k\| < +\infty$, and the result holds.
(ii)
Assume that the inequality $\hat{L}_\beta(\hat\zeta^k) > \hat{L}_\beta(\hat\zeta^*)$ holds for all $k > 0$. Since
$$\lim_{k\to\infty} d(\hat\zeta^k, \hat{M}) = 0,$$
it follows that for any ϵ 1 > 0 , there exists k 1 > 0 such that for all k k 1 , we have:
$$d(\hat\zeta^k, \hat{M}) < \epsilon_1.$$
Moreover, noting that
$$\lim_{k\to\infty} \hat{L}_\beta(\hat\zeta^k) = \hat{L}_\beta(\hat\zeta^*),$$
it implies that for any ϵ 2 > 0 , there exists k 2 > 0 such that for all k > k 2 , the inequality holds:
$$\hat{L}_\beta(\hat\zeta^k) < \hat{L}_\beta(\hat\zeta^*) + \epsilon_2.$$
Hence, for the given $\epsilon_1$ and $\epsilon_2$, when $k > \bar{k} := \max\{k_1, k_2\}$, we have
$$d(\hat\zeta^k, \hat{M}) < \epsilon_1, \qquad \hat{L}_\beta(\hat\zeta^*) < \hat{L}_\beta(\hat\zeta^k) < \hat{L}_\beta(\hat\zeta^*) + \epsilon_2.$$
Then, based on Lemma 3, for all $k > \bar{k}$ it can be deduced that
$$\varphi'\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big)\, d\big(0, \partial\hat{L}_\beta(\hat\zeta^k)\big) \ge 1.$$
Furthermore, using the concavity of $\varphi$, we derive the following:
$$\varphi\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big) - \varphi\big(\hat{L}_\beta(\hat\zeta^{k+1}) - \hat{L}_\beta(\hat\zeta^*)\big) \ge \varphi'\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big)\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^{k+1})\big).$$
Noting the fact that $\varphi'\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big) > 0$, together with the conclusion obtained in Lemma 7, we can infer that
$$\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^{k+1}) \le \frac{\varphi\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big) - \varphi\big(\hat{L}_\beta(\hat\zeta^{k+1}) - \hat{L}_\beta(\hat\zeta^*)\big)}{\varphi'\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big)} \le \Pi_{[\varphi],[k,k+1]}\,\psi\, T_{[k,k+1]}, \qquad (22)$$
where $\Pi_{[\varphi],[k,k+1]}$ denotes $\varphi\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big) - \varphi\big(\hat{L}_\beta(\hat\zeta^{k+1}) - \hat{L}_\beta(\hat\zeta^*)\big)$, and $T_{[k,k+1]}$ denotes $\|y^{k+1} - y^k\| + \|x^{k+1} - \bar{x}^k\| + \|y^k - y^{k-1}\| + \|y^{k+1} - \bar{y}^k\|$. Combining Lemma 5 (with $\phi := \min\{\sigma_0, \frac{1-\eta^2}{2}\lambda_{\min}(S)\}$ as in case (i)), we can rewrite (22) as follows
$$\phi\big(\|x^k - \bar{x}^{k-1}\|^2 + \|y^{k+1} - \bar{y}^k\|^2\big) \le \Pi_{[\varphi],[k,k+1]}\,\psi\, T_{[k,k+1]},$$
which, after taking square roots, can be equivalently expressed as
$$\big(\|x^k - \bar{x}^{k-1}\|^2 + \|y^{k+1} - \bar{y}^k\|^2\big)^{\frac{1}{2}} \le \Big(\frac{\psi}{\phi}\Big)^{\frac{1}{2}}\, T_{[k,k+1]}^{\frac{1}{2}}\, \Pi_{[\varphi],[k,k+1]}^{\frac{1}{2}}.
$$
Multiplying both sides by 6, we obtain
$$6\big(\|x^k - \bar{x}^{k-1}\|^2 + \|y^{k+1} - \bar{y}^k\|^2\big)^{\frac{1}{2}} \le 6\Big(\frac{\psi}{\phi}\Big)^{\frac{1}{2}}\, T_{[k,k+1]}^{\frac{1}{2}}\, \Pi_{[\varphi],[k,k+1]}^{\frac{1}{2}}.$$
Then, applying the Cauchy–Schwarz inequality, the fundamental inequality $2\sqrt{uv} \le u + v$, and the inertial relations, we can deduce that
$$\begin{aligned} 6\big(\|x^k - \bar{x}^{k-1}\| + \|y^{k+1} - \bar{y}^k\|\big) &\le 2\,T_{[k,k+1]}^{\frac{1}{2}}\Big(\frac{18\psi}{\phi}\,\Pi_{[\varphi],[k,k+1]}\Big)^{\frac{1}{2}} \le T_{[k,k+1]} + \frac{18\psi}{\phi}\,\Pi_{[\varphi],[k,k+1]} \\ &= \|y^{k+1} - y^k\| + \|x^{k+1} - \bar{x}^k\| + \|y^k - y^{k-1}\| + \|y^{k+1} - \bar{y}^k\| + \frac{18\psi}{\phi}\,\Pi_{[\varphi],[k,k+1]} \\ &\le 2\|y^{k+1} - \bar{y}^k\| + 2\|y^k - \bar{y}^{k-1}\| + \|x^{k+1} - \bar{x}^k\| + \|y^{k-1} - \bar{y}^{k-2}\| + \frac{18\psi}{\phi}\,\Pi_{[\varphi],[k,k+1]}. \end{aligned} \qquad (23)$$
Next, summing (23) from $k = p+1$ to $k = z$ and rearranging the terms, one gets
$$\begin{aligned} 5\sum_{k=p+3}^{z}\|x^k - \bar{x}^{k-1}\| + \sum_{k=p+3}^{z}\|y^{k+1} - \bar{y}^k\| \le{} & 3\|y^{p+1} - \bar{y}^p\| + \|y^{p+2} - \bar{y}^{p+1}\| + \|x^{p+1} - \bar{x}^p\| \\ & - 3\|y^{z+1} - \bar{y}^z\| - \|y^{z+2} - \bar{y}^{z+1}\| - \|x^{z+1} - \bar{x}^z\| + \frac{18\psi}{\phi}\,\Pi_{[\varphi],[p+1,z+1]}. \end{aligned}$$
Furthermore, since $0 \le \varphi\big(\hat{L}_\beta(\hat\zeta^k) - \hat{L}_\beta(\hat\zeta^*)\big)$ for all k, letting $z \to +\infty$ we conclude that
$$\sum_{k=p+1}^{+\infty}\big(5\|x^k - \bar{x}^{k-1}\| + \|y^{k+1} - \bar{y}^k\|\big) < +\infty,$$
which implies
$$\sum_{k=0}^{+\infty}\|x^k - \bar{x}^{k-1}\| < +\infty, \qquad \sum_{k=0}^{+\infty}\|y^k - \bar{y}^{k-1}\| < +\infty.$$
Based on (12) and (17), we can assert that
$$\sum_{k=0}^{+\infty}\|x^k - x^{k-1}\| < +\infty, \qquad \sum_{k=0}^{+\infty}\|y^k - y^{k-1}\| < +\infty, \qquad \sum_{k=0}^{+\infty}\|\lambda^k - \lambda^{k-1}\| < +\infty.$$
This demonstrates that { ζ k } forms a Cauchy sequence, which ensures its convergence. By applying Theorem 1, it follows that { ζ k } converges to a critical point of L β .

4. Numerical Simulations

In this section, we demonstrate the application of NIP-ADMM to signal recovery and SCAD penalty problems. To verify the effectiveness of the algorithm, we compare it with the Bregman modification of ADMM (BADMM) proposed by Wang et al. [23] and the inertial proximal ADMM (IPADMM) proposed by Chen et al. [32]. All codes were implemented in MATLAB 2024b and executed on a Windows 11 system equipped with an AMD Ryzen R9-9900X CPU.

4.1. Signal Recovery

In this subsection on signal recovery, we consider the previously mentioned model (2).
$$\min_{x,y}\; c\|x\|_{1/2}^{1/2} + \frac{1}{2}\|y\|^2 \quad \mathrm{s.t.}\quad Ax - y = b, \qquad (24)$$
where $\|x\|_{1/2}^{1/2} = \sum_{i=1}^{n}|x_i|^{1/2}$, and $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $y \in \mathbb{R}^m$, $x \in \mathbb{R}^n$. To evaluate the effectiveness of NIP-ADMM, we compare it with BADMM [23] and IPADMM [32]. We construct the following framework to solve problem (24), setting $S = e_0 I - \beta A^T A$, where I denotes the identity matrix:
$$\begin{cases} (\bar{x}^k, \bar{y}^k) = (x^k, y^k) + \theta(x^k - \bar{x}^{k-1}, 0) + \eta(0,\, y^k - \bar{y}^{k-1}), \\ x^{k+1} = H\Big(\dfrac{1}{e_0}\big[-A^T\lambda^k + \beta A^T(\bar{y}^k + b) + (e_0 I - \beta A^T A)\bar{x}^k\big],\ \dfrac{2c}{e_0}\Big), \\ y^{k+1} = y^k - \gamma\big(y^k - \lambda^k - \beta(Ax^{k+1} - y^k - b)\big), \\ \lambda^{k+1} = \lambda^k + \beta(Ax^{k+1} - y^{k+1} - b). \end{cases}$$
Here, H ( · , · ) represents the half-shrinkage operator proposed by Xu et al. [39], which is defined as
$$H(x, e) = \big(h_e(x_1), h_e(x_2), \ldots, h_e(x_n)\big)^T,$$
where the function h e ( x i ) for i = 1 , 2 , , n is defined by
$$h_e(x_i) = \begin{cases} \dfrac{2x_i}{3}\Big(1 + \cos\Big(\dfrac{2}{3}\Big(\pi - \arccos\Big(\dfrac{e}{8}\Big(\dfrac{|x_i|}{3}\Big)^{-\frac{3}{2}}\Big)\Big)\Big)\Big), & |x_i| > \dfrac{\sqrt[3]{54}}{4}\, e^{\frac{2}{3}}, \\ 0, & \text{otherwise}. \end{cases}$$
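For illustration, the componentwise rule above can be coded directly as in the following sketch; it transcribes the formula as reconstructed here (the function name half_shrink is ours, so any convention of [39] that differs should be checked against that reference).

```python
import numpy as np

def half_shrink(x, e):
    """Componentwise half-shrinkage operator H(x, e) as given above."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    thresh = (54.0 ** (1.0 / 3.0)) / 4.0 * e ** (2.0 / 3.0)
    mask = np.abs(x) > thresh               # below the threshold the output is 0
    xi = x[mask]
    phi = np.arccos((e / 8.0) * (np.abs(xi) / 3.0) ** (-1.5))
    out[mask] = (2.0 * xi / 3.0) * (1.0 + np.cos(2.0 / 3.0 * (np.pi - phi)))
    return out
```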
In this setup, the entries of matrix A are drawn from a standard normal distribution, with each column normalized. The true vector $x_0$ is initialized as a sparse vector containing at least 100 non-zero components. The initial values $\bar{x}^0$, $\bar{y}^0$, $x^0$, $\lambda^0$ are set to zero. To simulate the observation vector with added noise, we generate b as $b = Ax_0 + v$, where $v \sim N(0, 10^{-3}I)$. For the regularization parameter c, we compute
$$c = 0.1\,\|A^T b\|.$$
Based on Assumption 1, the parameters are chosen as $\beta = 3$ and $e_0 = 10$. The error trend is depicted as $\log_{10}(\|x^k - x^*\|)$ and $\log_{10}(\|y^k - y^*\|)$. At the $(k+1)$-th iteration, the primal residual is $r^{k+1} = Ax^{k+1} + y^{k+1} - b$, while the dual residual is $s^{k+1} = \beta A^T(x^{k+1} - x^k)$. Termination occurs when both conditions are met: $\|r^{k+1}\|_2 \le \sqrt{n}\times 10^{-4} + 10^{-3}\cdot\max\{\|Ax^k\|_2, \|y^k\|_2\}$ and $\|s^{k+1}\|_2 \le \sqrt{n}\times 10^{-4} + 10^{-3}\cdot\|A^T\lambda^k\|_2$. During the experiments, to satisfy Assumption 1, we set $\gamma = 0.3$. Table 1 shows that when $m = n = 1000$, selecting the inertial parameter values $\theta = 0.8$ and $\eta = 0.75$ produces satisfactory results; therefore, we also adopt $\theta = 0.8$ and $\eta = 0.75$ in subsequent experiments. The metrics include the number of iterations (Iter), the CPU running time (CPUT), and the objective function value (Obj). To better present the experimental results, Obj is reported to two decimal places and CPUT to four decimal places.
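For concreteness, one possible implementation of this residual-based stopping test is sketched below; the function name and argument list are ours, and the residual formulas simply follow the expressions stated above.

```python
import numpy as np

def stop_test(A, b, beta, x_new, x_prev, y_new, y_prev, lam_prev, n):
    """Primal/dual residual test used to terminate the signal-recovery runs."""
    r = A @ x_new + y_new - b                      # primal residual r^{k+1}
    s = beta * A.T @ (x_new - x_prev)              # dual residual s^{k+1}
    tol_pri = np.sqrt(n) * 1e-4 + 1e-3 * max(np.linalg.norm(A @ x_prev),
                                             np.linalg.norm(y_prev))
    tol_dual = np.sqrt(n) * 1e-4 + 1e-3 * np.linalg.norm(A.T @ lam_prev)
    return np.linalg.norm(r) <= tol_pri and np.linalg.norm(s) <= tol_dual
```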
The numerical results consistently demonstrate the superior performance of NIP-ADMM compared to BADMM and IPADMM (see Table 2): for $m = n = 1000$, NIP-ADMM converges faster in terms of both objective value and error reduction (see Figure 1); for $m = 3000$ and $n = 4000$, the inclusion of the inertial term further highlights its effectiveness (see Figure 2); and for larger-scale models with $m = n = 6000$, NIP-ADMM proves more suitable than BADMM and IPADMM for handling large-scale problems (see Figure 3).

4.2. SCAD Penalty Problem

We note that the smoothly clipped absolute deviation (SCAD) penalty problem in statistics can be formulated as the following model [14,31]:
$$\min_{x,y}\; \sum_{i=1}^{n} h_\kappa(|x_i|) + \frac{1}{2}\|y\|^2 \quad \mathrm{s.t.}\quad Ax - y = b, \qquad (25)$$
with $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $y \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, and the penalty function $h_\kappa(\cdot)$ in the objective defined as:
$$h_\kappa(\theta) = \begin{cases} \kappa\theta, & \theta \le \kappa, \\ \dfrac{-\theta^2 + 2c\kappa\theta - \kappa^2}{2(c - 1)}, & \kappa < \theta \le c\kappa, \\ \dfrac{(c + 1)\kappa^2}{2}, & \theta > c\kappa, \end{cases} \qquad (26)$$
where $c > 2$ and $\kappa > 0$, the knots of the quadratic spline function. As in the signal recovery subsection, we set $\psi(x) = \frac{1}{2}\|x\|_S^2$ and $\phi(y) = 0$, where $S = \mu I - \beta A^T A$. For problem (25), the x-subproblem can be expressed as
$$\begin{aligned} x^{k+1} &= \arg\min_{x}\ \sum_{i=1}^{n} h_\kappa(|x_i|) - \langle \lambda^k,\, Ax\rangle + \frac{\beta}{2}\|Ax - \bar{y}^k - b\|^2 + \frac{1}{2}\|x - \bar{x}^k\|_{\mu I - \beta A^T A}^2 \\ &= \arg\min_{x}\ \sum_{i=1}^{n} h_\kappa(|x_i|) + \frac{\mu}{2}\Big\|x - \frac{\beta A^T\big(\bar{y}^k + b + \frac{\lambda^k}{\beta}\big) + (\mu I - \beta A^T A)\bar{x}^k}{\mu}\Big\|^2. \end{aligned}$$
On the one hand, the x-subproblem can be equivalently formulated as:
$$\min_{x \in \mathbb{R}^n}\ \sum_{i=1}^{n} h_\kappa(|x_i|) + \frac{1}{2\nu}\|x - q\|^2.$$
On the other hand, under the condition that $1 + \nu \le c$, we can update x componentwise using the following rule [31]:
$$x_i := \begin{cases} \operatorname{sign}(q_i)\big(|q_i| - \kappa\nu\big)_+, & |q_i| \le (1 + \nu)\kappa, \\ \dfrac{(c - 1)q_i - \operatorname{sign}(q_i)\, c\kappa\nu}{c - 1 - \nu}, & (1 + \nu)\kappa < |q_i| \le c\kappa, \\ q_i, & |q_i| > c\kappa, \end{cases}$$
where $(\cdot)_+$ denotes the positive part operator, defined as $(x)_+ = \max(0, x)$. A code sketch of the penalty (26) and of this componentwise update is given after the scheme below. Applying NIP-ADMM to problem (25), one obtains
$$\begin{cases} \bar{x}^k = x^k + \theta(x^k - \bar{x}^{k-1}), \qquad \bar{y}^k = y^k + \eta(y^k - \bar{y}^{k-1}), \\ x^{k+1} = \arg\min_{x}\ \sum_{i=1}^{n} h_\kappa(|x_i|) + \dfrac{\mu}{2}\Big\|x - \dfrac{A^T\lambda^k + \beta A^T(\bar{y}^k + b) + (\mu I - \beta A^T A)\bar{x}^k}{\mu}\Big\|^2, \\ y^{k+1} = y^k - \gamma\big(y^k + \lambda^k - \beta(Ax^{k+1} - y^k - b)\big), \\ \lambda^{k+1} = \lambda^k - \beta(Ax^{k+1} - y^{k+1} - b). \end{cases}$$
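As an illustration of the two ingredients used above, the following sketch evaluates the SCAD penalty (26) and the componentwise update rule for the x-subproblem. The function names are ours, and the piecewise formulas follow the reconstructions given above (in particular the $2(c-1)$ denominator and the $c\kappa\nu$ term), so they should be checked against [14,31] before reuse.

```python
import numpy as np

def scad_penalty(theta, c=5.0, kappa=0.1):
    """SCAD penalty h_kappa(theta) of (26), evaluated elementwise for theta >= 0."""
    theta = np.asarray(theta, dtype=float)
    out = np.empty_like(theta)
    small = theta <= kappa
    mid = (theta > kappa) & (theta <= c * kappa)
    large = theta > c * kappa
    out[small] = kappa * theta[small]
    out[mid] = (-theta[mid] ** 2 + 2 * c * kappa * theta[mid] - kappa ** 2) / (2.0 * (c - 1.0))
    out[large] = (c + 1.0) * kappa ** 2 / 2.0
    return out

def scad_prox(q, nu, c=5.0, kappa=0.1):
    """Componentwise minimizer of h_kappa(|x|) + (1/(2*nu)) * (x - q)^2,
    following the piecewise rule stated above (assuming 1 + nu < c)."""
    q = np.asarray(q, dtype=float)
    x = np.where(np.abs(q) > c * kappa, q, 0.0)                   # |q| > c*kappa: keep q
    soft = np.sign(q) * np.maximum(np.abs(q) - kappa * nu, 0.0)   # soft-threshold branch
    x = np.where(np.abs(q) <= (1.0 + nu) * kappa, soft, x)
    mid = (np.abs(q) > (1.0 + nu) * kappa) & (np.abs(q) <= c * kappa)
    x = np.where(mid, ((c - 1.0) * q - np.sign(q) * c * kappa * nu) / (c - 1.0 - nu), x)
    return x
```

In the scheme above, $\nu$ corresponds to $1/\mu$ and q to the bracketed vector divided by $\mu$; these identifications are part of the sketch, not a statement from the paper.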
Similarly, the update scheme of BADMM can be represented by the following procedure:
$$\begin{cases} x^{k+1} = \arg\min_{x}\ \sum_{i=1}^{n} h_\kappa(|x_i|) + \dfrac{\mu}{2}\Big\|x - \dfrac{\beta A^T\big(y^k + b + \frac{\lambda^k}{\beta}\big) + (\mu I - \beta A^T A)x^k}{\mu}\Big\|^2, \\ y^{k+1} = \dfrac{1}{1 + \beta}\big(\beta(Ax^{k+1} - b) - \lambda^k\big), \\ \lambda^{k+1} = \lambda^k - \beta(Ax^{k+1} - y^{k+1} - b). \end{cases}$$
Applying IPADMM to model (25) yields the following iterative scheme:
$$\begin{cases} (\bar{x}^k, \bar{y}^k, \bar\lambda^k) = (x^k, y^k, \lambda^k) + \theta(x^k - x^{k-1},\, y^k - y^{k-1},\, \lambda^k - \lambda^{k-1}), \\ x^{k+1} = \arg\min_{x}\ \sum_{i=1}^{n} h_\kappa(|x_i|) + \dfrac{\mu}{2}\Big\|x - \dfrac{A^T\bar\lambda^k + \beta A^T(\bar{y}^k + b) + (\mu I - \beta A^T A)\bar{x}^k}{\mu}\Big\|^2, \\ \lambda^{k+1} = \bar\lambda^k - \beta(Ax^{k+1} - \bar{y}^k - b), \\ y^{k+1} = y^k - \gamma\big(y^k + \lambda^{k+1} - \beta(Ax^{k+1} - y^k - b)\big). \end{cases}$$
In this experiment, we generate a random m × n matrix A, and perform row and column normalization. Here, we choose to generate a vector z of dimension n with a sparsity ratio of 100 m . The vector b is represented as the sum of A z and a Gaussian noise vector with zero mean and variance 0.001 . The initial variables x 0 and y 0 are set as zero vectors, serving as the starting point for optimization. To improve numerical efficiency, in this experiment, we set c = 5 and κ = 0.1 . Under the condition that Assumption 1 is satisfied, we configure γ = 0.1 , β = 12 , and μ = 100 for NIP-ADMM and other algorithms. The stopping criterion for the updates is set as
$$\max\big\{\|x^{k+1} - x^k\|,\ \|y^{k+1} - y^k\|\big\} \le 10^{-2}.$$
In Table 3, we set m = n = 1000 . The results in the table support our choice of the inertial parameters η = 0.9 and θ = 0.9 . Under these conditions, the NIP-ADMM requires the fewest iterations and the least running time. Therefore, we selected the inertial parameters η = 0.9 and θ = 0.9 for the experiments.
To further investigate the convergence behavior of the algorithms, we plot the update curves of the objective function (left figure) and the iteration error (right figure) against the number of iterations for each algorithm under three different dimensions (see Figure 4, Figure 5, and Figure 6). The numerical results demonstrate that NIP-ADMM achieves nearly the same performance as IPADMM and BADMM but with significantly fewer iterations.
Figure 4. Comparison of convergence when m = 1000 and n = 1000: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 1000.
Figure 5. Comparison of convergence when m = 1500 and n = 1500: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 1500.
Figure 6. Comparison of convergence when m = 3000 and n = 3000: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 3000.
In Table 4, we present the test results of NIP-ADMM, BADMM and IPADMM under different dimensions. Although there are slight discrepancies in the Obj values between the two methods, by focusing on the metrics of Iter and CPUT, it is evident that NIP-ADMM demonstrates a significant advantage over BADMM and IPADMM. This advantage becomes even more pronounced in higher-dimensional scenarios.

5. Conclusion

The purpose of this paper is to propose a novel symmetrical inertial ADMM with a proximal term for solving nonconvex two-block optimization problems. Under certain conditions, if the auxiliary function satisfies the Kurdyka–Łojasiewicz property, the sequence generated by the algorithm converges globally to a stationary point. In numerical experiments, we apply the algorithm to signal recovery and SCAD penalty problems, and its superiority is validated. Notably, by tuning the inertial parameters, we identify a set of parameters that enhances the convergence speed of the algorithm.
Furthermore, we believe that future work could explore whether the convergence of the algorithm can be guaranteed when the objective function is non-separable. Additionally, it would be worthwhile to investigate whether introducing an inertial term into the update of the multiplier λ could further accelerate the convergence of the algorithm.

Author Contributions

Conceptualization, J.-h.L. and H.-y.L.; methodology, J.-h.L.; software, J.-h.L. and S.-y.L.; validation, J.-h.L. and H.-y.L.; writing—original draft preparation, J.-h.L. and H.-y.L.; writing—review and editing, J.-h.L. and H.-y.L.; visualization, J.-h.L, H.-y.L. and S.-y.L.; supervision, H.-y.L.; project administration, H.-y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Innovation Fund of Postgraduate, Sichuan University of Science & Engineering (Y2024340), the Scientific Research and Innovation Team Program of Sichuan University of Science and Engineering (SUSE652B002) and the Opening Project of Sichuan Province University Key Laboratory of Bridge Non-destruction Detecting and Engineering Computing (2023QZJ01).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, S.R.; Wang, Q.H.; Zhang, B.X.; Liang, Z.; Zhang, L.; Li, L.L.; Huang, G.; Zhang, Z.G.; Feng, B.; Yu, T.Y. Cauchy non-convex sparse feature selection method for the high-dimensional small-sample problem in motor imagery EEG decoding. Front. Neurosci. 2023, 17, 1292724.
  2. Doostmohammadian, M.; Gabidullina, Z.R.; Rabiee, H.R. Nonlinear perturbation-based non-convex optimization over time-varying networks. IEEE Trans. Netw. Sci. Eng. 2024, 11, 6461–6469.
  3. Tiddeman, B.; Ghahremani, M. Principal component wavelet networks for solving linear inverse problems. Symmetry. 2021, 13, 1083.
  4. Xia, Z.C.; Liu, Y.; Hu, C.; Jiang, H.J. Distributed nonconvex optimization subject to globally coupled constraints via collaborative neurodynamic optimization. Neural Netw. 2025, 184, 107027.
  5. Yu, G.; Fu, H.; Liu, Y.F. High-dimensional cost-constrained regression via nonconvex optimization. Technometrics. 2021, 64, 52–64.
  6. Merzbacher, C.; Mac Aodha, O.; Oyarzún, D.A. Bayesian optimization for design of multiscale biological circuits. ACS Synth. Biol. 2023, 12, 2073–2082.
  7. Kim, S.J.; Koh, K.; Lustig, M.; Boyd, S.; Gorinevsky, D. An interior-point method for large-scale l1-regularized least squares. IEEE J. Sel. Top. Signal Process. 2007, 1, 606–617.
  8. Chartrand, R.; Staneva, V. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems. 2008, 24, 035020.
  9. Zeng, J.S.; Lin, S.B.; Wang, Y.; Xu, Z.B. L1/2 regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 2014, 62, 2317–2329.
  10. Chen, P.Y.; Selesnick, I.W. Group-sparse signal denoising: non-convex regularization, convex optimization. IEEE Trans. Signal Process. 2014, 62, 3464–3478.
  11. Bai, Z.L. Sparse Bayesian learning for sparse signal recovery using l1/2-norm. Appl. Acoust. 2023, 207, 109340.
  12. Wang, C.; Yan, M.; Rahimi, Y.; Lou, Y.F. Accelerated schemes for the L1/L2 minimization. IEEE Trans. Signal Process. 2020, 68, 2660–2669.
  13. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
  14. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 2001, 96, 1348–1360.
  15. Bai, J.C.; Zhang, H.C.; Li, J.C. A parameterized proximal point algorithm for separable convex optimization. Optim. Lett. 2018, 12, 1589–1608.
  16. Wen, F.; Liu, P.L.; Liu, Y.P.; Qiu, R.C.; Yu, W.X. Robust sparse recovery in impulsive noise via lp-l1 optimization. IEEE Trans. Signal Process. 2017, 65, 105–118.
  17. Zhang, H.M.; Gao, J.B.; Qian, J.J.; Yang, J.; Xu, C.Y.; Zhang, B. Linear regression problem relaxations solved by nonconvex ADMM with convergence analysis. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 828–838.
  18. Ames, B.P.W.; Hong, M.Y. Alternating direction method of multipliers for penalized zero-variance discriminant analysis. Comput. Optim. Appl. 2016, 64, 725–754.
  19. Zietlow, C.; Lindner, J.K.N. ADMM-TGV image restoration for scientific applications with unbiased parameter choice. Numer. Algorithms. 2024, 97, 1481–1512.
  20. Bian, F.M.; Liang, J.W.; Zhang, X.Q. A stochastic alternating direction method of multipliers for non-smooth and non-convex optimization. Inverse Problems. 2021, 37, 075009.
  21. Parikh, N.; Boyd, S. Proximal Algorithms; Now Publishers: Braintree, MA, USA, 2014.
  22. Hong, M.Y.; Luo, Z.Q.; Razaviyayn, M. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 2016, 26, 337–364.
  23. Wang, F.H.; Xu, Z.B.; Xu, H.K. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv 2014, arXiv:1410.8625.
  24. Ding, W.; Shang, Y.; Jin, Z.; Fan, Y. Semi-proximal ADMM for primal and dual robust low-rank matrix restoration from corrupted observations. Symmetry. 2024, 16, 303.
  25. Guo, K.; Han, D.R.; Wu, T.T. Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints. Int. J. Comput. Math. 2016, 94, 1653–1669.
  26. Wang, Y.; Yin, W.T.; Zeng, J.S. Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 2019, 78, 29–63.
  27. Wang, F.H.; Cao, W.F.; Xu, Z.B. Convergence of multi-block Bregman ADMM for nonconvex composite problems. Sci. China Inf. Sci. 2018, 61, 122101.
  28. Barber, R.F.; Sidky, E.Y. Convergence for nonconvex ADMM, with applications to CT imaging. J. Mach. Learn. Res. 2024, 25, 1–46.
  29. Wang, X.F.; Yan, J.C.; Jin, B.; Li, W.H. Distributed and parallel ADMM for structured nonconvex optimization problem. IEEE Trans. Cybern. 2021, 51, 4540–4552.
  30. Alvarez, F.; Attouch, H. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 2001, 9, 3–11.
  31. Wu, Z.M.; Li, M. General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 2019, 73, 129–158.
  32. Chen, C.H.; Chan, R.H.; Ma, S.Q.; Yang, J.F. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 2015, 8, 2239–2267.
  33. Chao, M.T.; Zhang, Y.; Jian, J.B. An inertial proximal alternating direction method of multipliers for nonconvex optimization. Int. J. Comput. Math. 2020, 98, 1199–1217.
  34. Wang, X.Q.; Shao, H.; Liu, P.J.; Wu, T. An inertial proximal partially symmetric ADMM-based algorithm for linearly constrained multi-block nonconvex optimization problems with applications. J. Comput. Appl. Math. 2023, 420, 114821.
  35. Gonçalves, M.L.N.; Melo, J.G.; Monteiro, R.D.C. Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems. arXiv 2017, arXiv:1702.01850.
  36. Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457.
  37. Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494.
  38. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer: NY, USA, 2004.
  39. Xu, Z.B.; Chang, X.Y.; Xu, F.M.; Zhang, H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1013–1027.
Figure 1. Comparison of convergence when m = 1000 and n = 1000: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 1000.
Figure 2. Comparison of convergence when m = 3000 and n = 3000: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 3000.
Figure 3. Comparison of convergence when m = 6000 and n = 6000: (a) the objective value; (b) $\log\|x^k - x^*\|$ and $\log\|y^k - y^*\|$ under m = n = 6000.
Table 1. Numerical results of NIP-ADMM with different θ and η in the signal recovery experiment (m = n = 1000).
θ η Iter CPUT(s) θ η Iter CPUT(s)
0.2 0.2 75 2.2392 0.6 0.7 54 1.6014
0.3 0.2 78 2.3039 0.8 0.8 49 1.4583
0.3 0.3 69 1.9476 0.8 0.75 49 1.4309
0.5 0.5 60 1.7622 0.85 0.85 56 1.6445
0.6 0.6 56 1.6516 0.9 0.9 84 2.4614
Table 2. Comparison of iteration effect between NIP-ADMM, IPADMM and BADMM.
m n NIP-ADMM IPADMM BADMM
Iter CPUT(s) Obj Iter CPUT(s) Obj Iter CPUT(s) Obj
1000 1000 49 1.2698 19.36 78 2.1407 18.46 90 2.3407 20.14
1500 2000 44 4.9152 23.17 72 8.4978 22.12 76 8.5387 23.69
3000 3000 40 15.0464 21.02 57 22.3823 20.56 73 27.4115 21.18
3000 4000 55 34.0601 23.21 98 62.8206 23.11 76 48.3825 23.22
4000 5000 36 40.6110 24.02 53 61.7431 23.09 65 74.4521 24.03
4500 5500 40 61.7638 24.05 45 71.7627 23.79 67 102.6028 24.06
6000 6000 40 88.7702 24.99 48 108.3045 24.56 63 135.8133 25.00
Table 3. Numerical results of NIP-ADMM with different θ, η.
θ η Iter CPUT(s) θ η Iter CPUT(s)
0.2 0.2 196 2.0092 0.6 0.7 149 1.5017
0.3 0.2 187 1.9122 0.8 0.7 134 1.3693
0.3 0.3 181 1.8690 0.8 0.9 133 1.3503
0.4 0.5 170 1.7650 0.9 0.8 127 1.3467
0.5 0.5 159 1.6438 0.9 0.9 126 1.3100
Table 4. Comparison of iteration effect between NIP-ADMM, IPADMM and BADMM.
m n NIP-ADMM IPADMM BADMM
Iter CPUT(s) Obj Iter CPUT(s) Obj Iter CPUT(s) Obj
1000 1000 121 1.3366 10.91 213 2.3065 10.55 182 1.9632 10.55
1000 1300 115 1.9246 12.96 211 3.5724 12.48 174 2.9611 13.13
1500 1000 130 1.9270 8.92 228 3.2943 8.59 172 2.4709 7.49
1500 1300 140 3.0474 13.38 259 5.7832 12.90 215 4.7147 11.88
1500 1500 125 3.6104 13.43 230 6.6865 12.81 196 5.4584 12.71
1800 1500 146 4.6432 13.47 257 8.0396 12.94 209 6.1925 11.83
1800 2000 115 5.8341 15.00 210 10.7513 14.29 182 9.1033 14.69
2500 2000 142 8.9043 14.95 250 15.6397 14.29 201 12.2370 13.07
2900 2700 134 15.3647 17.70 245 28.4289 16.71 203 22.9945 16.50
3000 3000 125 17.1686 17.20 217 34.1575 16.34 188 25.0864 17.22
3500 3000 128 20.2808 16.87 234 37.1725 15.84 194 30.6876 15.94
3500 3500 123 24.5455 19.80 223 44.4163 18.69 200 39.2771 19.43