Hybrid Gauss-Newton Method for Convex Inclusion and Convex-Composite Optimization Problems

Submitted: 20 November 2025. Posted: 21 November 2025.


Abstract
In Banach space optimization, solving inclusion and convex-composite problems through iterative methods depends on various convergence conditions. In this work, we develop an extended semi-local convergence analysis for two algorithms used to generate sequences converging to solutions of inclusion and convex-composite optimization problems for Banach space-valued operators. The applicability of these algorithms is extended, with two benefits: weaker sufficient convergence criteria and tighter error estimates on the distances involved. These advantages are obtained at the same computational cost, since the Lipschitz constants used are special cases of the Lipschitz constant used in earlier studies. The implementation issues for these algorithms are also addressed by introducing hybrid algorithms.

1. Introduction

Let X and Y be Banach spaces, and let K represent a closed convex subset of Y. Two inclusion problems have been extensively studied [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. The first problem is defined as
$$F(x) \in K, \qquad (1)$$
where $F : X \to Y$ is a differentiable operator in the Fréchet sense. The second problem is a convex-composite optimization problem:
$$\min_{x \in X} (h \circ F)(x), \qquad (2)$$
where $h$ is a real-valued convex function on $Y$ and $F$ is as defined in (1). If $h(\cdot) = d(\cdot, K)$, the distance function related to $K$, and (2) is solvable, then (2) reduces to (1). Many optimization and mathematical programming problems can be reduced to solving (1) or (2) [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. Regarding problem (1), Robinson [25,26] developed the extended Newton method under the conditions that K is a closed convex cone and X is a reflexive space. Let $x_0 \in X$.

Algorithm $A(x_0)$:

Given $x_n$, $n = 0, 1, 2, \ldots$, compute $x_{n+1}$ as follows: First, define the set
$$D(x) := \{ d \in X : F(x) + F'(x)d \in K \} \quad \text{for each } x \in X.$$
If $D(x_n) \neq \emptyset$, pick $d_n \in D(x_n)$ satisfying $\|d_n\| = \min_{d \in D(x_n)} \|d\|$, and set $x_{n+1} = x_n + d_n$. In the case when $D(x) = \emptyset$ for some $x \in X$, the algorithm is not well-defined. It is said to be well-defined if at least one sequence is generated by it. Let $T_{x_0}$ be the convex process given by
$$T_{x_0} d = F'(x_0) d \quad \text{for all } d \in X.$$
Robinson in [25,26] provided two conditions:
$$\mathrm{Range}(T_{x_0}) = Y,$$
and $F'$ is Lipschitz continuous with modulus $H$, $T_{x_0}$ is normed with $\|T_{x_0}^{-1}\| < \infty$, and
$$\|x_1 - x_0\| \le \frac{1}{2 H \|T_{x_0}^{-1}\|}.$$
Under these conditions, Algorithm $A(x_0)$ generates a sequence $\{x_n\}$ converging to some $x^*$ solving (1). Numerous authors have presented related convergence results that lack the affine-invariant property for the operator $F$ [14]. Later, in the elegant work of Li and Ng [20], convergence criteria addressing this problem were introduced under weaker conditions. Their study also relied on the new concept of the weak-Robinson condition for convex processes, introduced in Section 2.1. The semi-local convergence of the two algorithms is presented in Section 2.2. The hybrid methods are studied in Section 2.3. The concluding remarks appear in Section 3, and the paper ends with a discussion in Section 4.
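To make the iteration above concrete, here is a minimal finite-dimensional sketch of Algorithm $A(x_0)$, assuming $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$ and the special case $K = \{0\}$, so that $D(x) = \{d : F(x) + F'(x)d = 0\}$ and the minimal-norm element of $D(x)$ is given by the Moore-Penrose pseudoinverse. The toy operator F below is hypothetical and used only for illustration.

```python
import numpy as np

# Minimal sketch of Algorithm A(x0) in the special case K = {0}:
# then D(x) = {d : F(x) + F'(x) d = 0}, and its minimal-norm element is
# d = -pinv(F'(x)) F(x).  The operator F below is a hypothetical toy example.

def F(x):                      # F : R^2 -> R^2
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def dF(x):                     # Frechet derivative F'(x), here a Jacobian matrix
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def algorithm_A(x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -np.linalg.pinv(dF(x)) @ F(x)   # minimal-norm d with F(x) + F'(x) d = 0
        x = x + d                           # x_{n+1} = x_n + d_n
        if np.linalg.norm(d) < tol:
            break
    return x

x_star = algorithm_A([1.0, 0.5])
print(x_star, F(x_star))                    # F(x*) should be (nearly) in K = {0}
```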

2. Materials and Methods

2.1. Weak-Robinson Condition and Properties of Convex Processes

Let $U(x, \rho)$ denote the open ball in X with center $x \in X$ and radius $\rho > 0$. Also, let $d(x, S)$ denote the distance from $x$ to a set $S$. We assume familiarity with the concept of a convex process $T : X \to 2^Y$ as defined by Rockafellar [27,28], as well as its properties. We list only some of them in order to keep the study as self-contained as possible. The following sets are needed:
$$D(T) = \{x \in X : Tx \neq \emptyset\}, \qquad R(T) = \bigcup\{Tx : x \in D(T)\},$$
and
$$T^{-1}y = \{x \in X : y \in Tx\} \quad \text{for all } y \in Y.$$
Proposition 1 
([28]). Let $B_1, B_2$ be convex processes with $D(B_1) = D(B_2) = X$ and $R(B_1) = Y$. Suppose that $\|B_1^{-1}\|\,\|B_2\| < 1$ and that $(B_1 + B_2)(x)$ is closed for all $x \in X$. Then, the following assertions hold:
$$R(B_1 + B_2) = Y$$
and
$$\|(B_1 + B_2)^{-1}\| \le \frac{\|B_1^{-1}\|}{1 - \|B_1^{-1}\|\,\|B_2\|}.$$
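For bounded linear operators on $\mathbb{R}^n$ (matrices), the assertions of Proposition 1 reduce to the familiar Banach perturbation bounds. The following quick numerical check, with made-up matrices, is only meant to illustrate the last estimate.

```python
import numpy as np

# Quick numerical check of Proposition 1 in the special case of matrices
# (bounded linear operators on R^n); the matrices below are made up.

rng = np.random.default_rng(0)
B1 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))     # invertible, so R(B1) = R^3
B2 = 0.05 * rng.standard_normal((3, 3))                # small perturbation

norm = lambda A: np.linalg.norm(A, 2)                  # operator (spectral) norm
assert norm(np.linalg.inv(B1)) * norm(B2) < 1          # hypothesis ||B1^{-1}|| ||B2|| < 1

lhs = norm(np.linalg.inv(B1 + B2))
rhs = norm(np.linalg.inv(B1)) / (1.0 - norm(np.linalg.inv(B1)) * norm(B2))
print(lhs, rhs, lhs <= rhs)                            # the perturbation bound should hold
```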
Next, we present Lipschitz-type conditions connected to convex processes for comparison.
Definition 1. 
Let $E : X \to L(X, Y)$, where $L(X, Y)$ denotes the space of linear operators from X to Y. Let $T : Y \to 2^Z$ be a convex process, where Z is also a Banach space. The pair $(T, E)$ is said to be center-Lipschitz continuous on $U(x_0, \rho_0)$ ($\rho_0 > 0$) if
$$\|T(E(x) - E(x_0))\| \le L_0 \|x - x_0\|$$
for all $x \in U(x_0, \rho_0)$ and some $L_0 > 0$. Set $\rho = \min\{\rho_0, \tfrac{1}{L_0}\}$.
Definition 2. 
Let $E$ and $T$ be as in Definition 1. Then, the pair $(T, E)$ is said to be restricted-Lipschitz continuous on $U(x_0, \rho)$ if
$$\|T(E(x) - E(y))\| \le L \|x - y\|$$
for all $x, y \in U(x_0, \rho)$ and some $L > 0$.
Definition 3. 
Let $E$ and $T$ be as in Definition 1. Then, the pair $(T, E)$ is said to be Lipschitz continuous on $U(x_0, \rho_0)$ if
$$\|T(E(x) - E(y))\| \le L_1 \|x - y\|$$
for all $x, y \in U(x_0, \rho_0)$ and some $L_1 > 0$.
Remark 1. 
It follows from these definitions, since $\rho \le \rho_0$, that
$$L_0 \le L_1$$
and
$$L \le L_1.$$
The convergence analysis in [4] used $L_1$. However, the proofs can be repeated with $L$ replacing $L_1$. Note that $L_0 = L_0(U(x_0, \rho_0))$ and $L_1 = L_1(U(x_0, \rho_0))$, while $L = L(L_0, U(x_0, \rho_0))$. We assume that
$$L_0 \le L.$$
Otherwise, i.e., if $L \le L_0$, the results hold with $L_0$ replacing $L$.
Definition 4. 
The problem (1) is said to satisfy the Weak-Robinson condition at $x_0$ on $U(x_0, \rho_0)$ if
$$F(x_0) \in R(T_{x_0}) \quad \text{and} \quad R(F'(x)) \subseteq R(T_{x_0}) \quad \text{for all } x \in U(x_0, \rho_0).$$
It follows that, for the problem (1), the Robinson condition at $x_0$ implies the Weak-Robinson condition at $x_0$ on X.
Proposition 2. 
Let $x_0 \in X$ and $r \in \left(0, \tfrac{1}{L_0}\right)$. Suppose that the problem (1) satisfies the Weak-Robinson condition at $x_0$ on $U(x_0, r)$ and that $(T_{x_0}^{-1}, F')$ is $L_0$-Lipschitz continuous on $U(x_0, r)$. Then, for all $x \in U(x_0, r)$, the following assertions hold:
$$R(T_{x_0}) = R(T_x), \qquad D(T_x^{-1} F'(x_0)) = X$$
and
$$\|T_x^{-1} F'(x_0)\| \le \frac{1}{1 - L_0 \|x - x_0\|}.$$
Additionally, if X is reflexive, then
$$D(x) \neq \emptyset.$$
Proof. 
Simply replace $L_1$ by the smaller constant $L_0$ that is actually needed in the proof of Proposition 2.3 in [4]. □

2.2. The Gauss-Newton Method for the Problem

We assume in this section that X is reflexive, h is a continuous convex function, and the set K is the set of minimum points defined as
$$K := \arg\min h.$$
For each $\Delta \in (0, +\infty]$ and $x \in X$, let $D_\Delta(x)$ be the set of all $d \in X$ such that $\|d\| \le \Delta$ and
$$h(F(x) + F'(x)d) = \min\{ h(F(x) + F'(x)d_0) : d_0 \in X,\ \|d_0\| \le \Delta \}.$$
Then, $d \in D_\Delta(x)$ if and only if $d$ solves the convex minimization problem
$$\min\{ h(F(x) + F'(x)d_0) : d_0 \in X,\ \|d_0\| \le \Delta \}.$$
Define
$$D_\Delta(x) = \{ d \in X : \|d\| \le \Delta,\ F(x) + F'(x)d \in K \}.$$
Let $r \in [1, +\infty)$, $x_0 \in X$, and $\Delta \in (0, +\infty]$. Then, the Gauss-Newton method for solving the problem (2) is given by the following algorithm:

Algorithm $A(r, \Delta, x_0)$:

Given $x_n$, $n = 0, 1, \ldots$, compute $x_{n+1}$ as follows: If
$$h(F(x_n)) = \min\{ h(F(x_n) + F'(x_n)d) : d \in X,\ \|d\| \le \Delta \},$$
then stop. Otherwise, select $d_n \in D_\Delta(x_n)$ so that $\|d_n\| \le r\, d(0, D_\Delta(x_n))$, and let $x_{n+1} = x_n + d_n$.
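The following rough sketch illustrates this iteration in a finite-dimensional setting, assuming $K$ is the nonnegative orthant and $h(y) = d(y, K)$. The operator G, the use of scipy's SLSQP solver, and the fact that the minimal-norm selection (the factor $r$) is not enforced are all simplifying assumptions of this illustration, not part of the algorithm's definition.

```python
import numpy as np
from scipy.optimize import minimize

# Rough sketch of one sweep of Algorithm A(r, Delta, x0) when K is the
# nonnegative orthant of R^m and h(y) = d(y, K); the operator G below and the
# choice of scipy's SLSQP solver are illustrative assumptions only.

def G(x):                                   # hypothetical operator G : R^2 -> R^2
    return np.array([x[0] + x[1] - 1.0, x[0] * x[1]])

def dG(x):                                  # its Jacobian G'(x)
    return np.array([[1.0, 1.0], [x[1], x[0]]])

def dist_to_K_sq(y):                        # squared distance of y to K = R^m_+
    return float(np.sum(np.minimum(y, 0.0) ** 2))

def gauss_newton_step(x, Delta=10.0):
    obj = lambda d: dist_to_K_sq(G(x) + dG(x) @ d)           # h(G(x)+G'(x)d)^2
    ball = {"type": "ineq", "fun": lambda d: Delta - np.linalg.norm(d)}
    res = minimize(obj, np.zeros_like(x), method="SLSQP", constraints=[ball])
    return res.x                            # some d_n in D_Delta(x_n)

x = np.array([-1.0, 2.0])
for _ in range(10):
    x = x + gauss_newton_step(x)            # x_{n+1} = x_n + d_n
print(x, G(x))                              # G(x) should end up (close to) in K
```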
Define the parameters $\beta$, $\alpha$ and $\alpha_1$ by
$$\beta = r\,\|T_{x_0}^{-1} F(x_0)\|, \qquad \alpha = r\big(1 + (r - 1)L\beta\big), \qquad \alpha_1 = r\big(1 + (r - 1)L_1\beta\big).$$
The majorizing functions are given, for all $t \ge 0$, by
$$g_0(t) = \beta - t + \frac{\alpha L_0}{2} t^2, \qquad g(t) = \beta - t + \frac{\alpha L}{2} t^2, \qquad g_1(t) = \beta - t + \frac{\alpha_1 L_1}{2} t^2.$$
It follows from (19), (7), and these definitions that
$$g_0(t) \le g_1(t)$$
and
$$g(t) \le g_1(t).$$
Define the majorizing sequences $\{s_n\}$, $\{t_n\}$ and $\{u_n\}$ by
$$s_0 = 0, \quad s_1 = \beta, \quad s_{n+1} = s_n - \frac{g(s_n)}{g_0'(s_n)},$$
$$t_0 = 0, \quad t_1 = \beta, \quad t_{n+1} = t_n - \frac{g(t_n)}{g'(t_n)},$$
and
$$u_0 = 0, \quad u_1 = \beta, \quad u_{n+1} = u_n - \frac{g_1(u_n)}{g_1'(u_n)}.$$
The sequence $\{u_n\}$ is the one used in [4]. Here, we use $\{t_n\}$, although $\{s_n\}$ can also be used according to the proof of Theorem 3.1. Next, a Kantorovich condition is provided for the convergence of the sequence $\{t_n\}$, and also of the sequence $\{s_n\}$.
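A small numerical sketch of the three majorizing sequences follows. The constants $\beta$, $L_0$, $L$, $L_1$ are made up, and $r = 1$ is chosen so that $\alpha = \alpha_1 = 1$; the run only illustrates the comparison $s_n \le t_n \le u_n$ discussed above.

```python
# Small numerical sketch comparing the majorizing sequences {s_n}, {t_n}, {u_n}.
# The sample constants below are made up for illustration; r = 1 is chosen so
# that alpha = alpha_1 = 1.

def majorizing(beta, alpha, L0, L, alpha1, L1, n_terms=8):
    g   = lambda t: beta - t + 0.5 * alpha  * L  * t**2   # g(t)
    g0p = lambda t: -1.0 + alpha  * L0 * t                # g_0'(t)
    gp  = lambda t: -1.0 + alpha  * L  * t                # g'(t)
    g1  = lambda t: beta - t + 0.5 * alpha1 * L1 * t**2   # g_1(t)
    g1p = lambda t: -1.0 + alpha1 * L1 * t                # g_1'(t)
    s = [0.0, beta]; t = [0.0, beta]; u = [0.0, beta]
    for _ in range(n_terms):
        s.append(s[-1] - g(s[-1]) / g0p(s[-1]))           # s_{n+1} = s_n - g(s_n)/g_0'(s_n)
        t.append(t[-1] - g(t[-1]) / gp(t[-1]))            # t_{n+1} = t_n - g(t_n)/g'(t_n)
        u.append(u[-1] - g1(u[-1]) / g1p(u[-1]))          # u_{n+1} = u_n - g_1(u_n)/g_1'(u_n)
    return s, t, u

# L0 <= L <= L1: the center/restricted constants are smaller than the full one.
s, t, u = majorizing(beta=0.2, alpha=1.0, L0=1.0, L=1.5, alpha1=1.0, L1=2.0)
for n, (sn, tn, un) in enumerate(zip(s, t, u)):
    print(n, round(sn, 8), round(tn, 8), round(un, 8))    # expect s_n <= t_n <= u_n
```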
Lemma 1. 
Suppose
$$\beta (r + 1) L \le 1.$$
Then, the sequence $\{t_n\}$ is increasingly convergent to $t^*$, the smallest of the two roots of the function $g$ guaranteed to exist by (22). Moreover, for
$$\theta = \frac{1 - L\beta - \sqrt{1 - 2L\beta - (r^2 - 1)(L\beta)^2}}{\beta r L} < 1,$$
we have
$$t_n = \frac{\sum_{j=0}^{2^n - 2} \theta^j}{\sum_{j=0}^{2^n - 1} \theta^j}\, t^*, \qquad n = 1, 2, \ldots.$$
Proof. 
Simply replace $L_1$, $g_1$ by $L$, $g$ in the proof of Lemma 3.1 in [4]. □
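For $r = 1$ (so that $\alpha = 1$), the closed form of Lemma 1 can be checked numerically against the Newton recursion for $g$; the values of $\beta$ and $L$ below are made up.

```python
import math

# Numerical check of the closed form of Lemma 1 in the case r = 1 (alpha = 1):
# t_{n+1} = t_n - g(t_n)/g'(t_n) with g(t) = beta - t + (L/2) t^2, and
# theta reduces to (1 - L*beta - sqrt(1 - 2*L*beta)) / (L*beta).
# The values of beta and L are made up for illustration.

beta, L = 0.2, 1.5
g  = lambda t: beta - t + 0.5 * L * t * t
gp = lambda t: -1.0 + L * t
theta = (1.0 - L * beta - math.sqrt(1.0 - 2.0 * L * beta)) / (L * beta)
t_star = (1.0 - math.sqrt(1.0 - 2.0 * L * beta)) / L      # smallest root of g

t = 0.0
for n in range(1, 6):
    t = t - g(t) / gp(t)                                   # Newton step on g
    m = 2 ** n
    closed = sum(theta**j for j in range(m - 1)) / sum(theta**j for j in range(m)) * t_star
    print(n, t, closed)                                    # the two columns should agree
```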
Remark 2. 
The Kantorovich condition in [4] is
$$\beta (r + 1) L_1 \le 1.$$
It follows that the Kantorovich condition of [4] implies the condition (22), but not necessarily vice versa, unless $L_1 = L$. Moreover,
$$\theta \le \theta_1,$$
where $\theta_1$ is given by (23) with $L$ replaced by $L_1$. Thus, the new results are at least as good as those in [4].
It is also worth noting that the improved results are obtained under the same computational effort, since in practice the computation of $L_1$ requires that of $L_0$ and $L$ as special cases. Next, we present the semi-local convergence result for Algorithm $A(r, \Delta, x_0)$.
Theorem 1. 
Suppose that the problem (1) satisfies the Weak-Robinson condition at $x_0$ on $U(x_0, t^*)$ and that $(T_{x_0}^{-1}, F')$ is $L_0$-center-Lipschitz continuous on $U(x_0, \tfrac{1}{L_0})$ and $L$-restricted-Lipschitz continuous on $U(x_0, t^*)$. Suppose that
$$\beta \le \min\left\{\Delta, \frac{1}{L(r + 1)}\right\}.$$
Then, any sequence $\{x_n\}$ generated by Algorithm $A(r, \Delta, x_0)$ converges to some $x^*$ solving (1). Moreover, the following assertions hold:
$$\|x_n - x_{n-1}\| \le t_n - t_{n-1}, \quad F(x_{n-1}) + F'(x_{n-1})(x_n - x_{n-1}) \in K, \qquad n = 1, 2, \ldots,$$
and
$$\|x^* - x_n\| \le \frac{\theta^{2^n - 1}}{\sum_{j=0}^{2^n - 1} \theta^j}\, t^*.$$
Proof. 
Simply replace $\theta_1$, $L_1$, $g_1$, $u_n$ by $\theta$, $L$, $g$, $t_n$, respectively, in the proof of Theorem 3.1 in [4]. □
Theorem 2. 
Suppose that the problem (1) satisfies the Weak-Robinson condition at $x_0$ on $U(x_0, t^*)$, that $L_0, L \in (0, +\infty)$, $\beta = \|T_{x_0}^{-1} F(x_0)\|$, $r = 1$, that $(T_{x_0}^{-1}, F')$ is $L_0$-center-Lipschitz continuous on $U(x_0, \tfrac{1}{L_0})$ and $L$-Lipschitz continuous on $U(x_0, \rho)$, and that $2\beta L \le 1$. Then, any sequence $\{x_n\}$ generated by Algorithm $A(x_0)$ converges to a solution $x^*$ of the problem (1). Moreover, the following error assertion holds:
$$\|x^* - x_n\| \le \frac{p^{2^n - 1}}{\sum_{j=0}^{2^n - 1} p^j}\, t^*,$$
where
$$p = \frac{1 - L\beta - \sqrt{1 - 2\beta L}}{L\beta}.$$
Proof. 
Replace $q_1$, $L_1$, $g_1$, $u_n$ by $p$, $L$, $g$, $t_n$ in the proof of Theorem 3.2 in [4]. □
Remark 3. 
(i)
Theorem 2 can be reduced to a weaker version of Robinson's results [25,26] for $\beta = \|T_{x_0}^{-1} F(x_0)\|$, $\rho_1 = \dfrac{1 - \sqrt{1 - 2H\|T_{x_0}^{-1}\|\beta}}{H\|T_{x_0}^{-1}\|}$, and $\|x_1 - x_0\| \le \dfrac{1}{2H\|T_{x_0}^{-1}\|}$.
(ii)
The Kantorovich conditions for the convergence of the sequence $\{t_n\}$ can be weakened further.
Indeed, let us demonstrate this in the case of Algorithm $A(r, x_0, \Delta)$ and Theorem 3.4. According to the proof of Theorem 3.5:
$$r_0 = 0, \quad r_1 = \beta, \quad r_2 = \beta + \frac{L_0 \beta^2}{2(1 - L_0 \beta)},$$
$$r_{n+2} = r_{n+1} + \frac{L (r_{n+1} - r_n)^2}{2(1 - L_0 r_{n+1})}.$$
The sequence majorizing $\{x_n\}$ in [4], i.e., $\{u_n\}$, can be written as
$$u_0 = 0, \quad u_1 = \beta, \quad u_{n+1} = u_n + \frac{L_1 (u_n - u_{n-1})^2}{2(1 - L_1 u_n)} = u_n - \frac{g_1(u_n)}{g_1'(u_n)}.$$
Moreover, the convergence conditions are given in [2] and [4], respectively, as
$$\lambda = \bar{L}\beta \le 1$$
and
$$\mu = 2 L_1 \beta r \le 1,$$
where $\bar{L} = \frac{1}{4}\left(4L_0 + \sqrt{L_0 L} + \sqrt{8L_0^2 + L_0 L}\right)$. Notice that
$$\mu \le 1 \;\Rightarrow\; \lambda \le 1$$
holds, and $\frac{\lambda}{\mu} \to 0$ as $\frac{L_0}{L_1}$ and $\frac{L}{L_1}$ approach $0$. Thus, the new convergence condition is infinitely weaker than the original Kantorovich condition (33). Moreover, the following hold:
$$0 \le r_{n+1} - r_n \le u_{n+1} - u_n$$
and
$$r^* = \lim_{n \to +\infty} r_n \le t^*.$$
Hence, the error bounds on the distances $\|x_{n+1} - x_n\|$ are at least as tight, and the information on the location of the solution is at least as precise. The results can also be rewritten using majorant functions replacing the Lipschitz constants $L_0$, $L$, and $L_1$ (see [2,3,4,5]).

2.3. The Implementation of the Algorithms

The iterates in both algorithms are computed by finding the expensive inverse $F'(x_n)^{-1}$. In the papers [3,4], we developed a hybrid method to address this difficulty. Specifically, we define the method
$$F(x_n) + F_1'(x_n)(x_{n+1} - x_n) = 0,$$
where $F_1'(x_n) = M B_k^{-1}(x_n)$, $k$ is a natural number, $M \in L(X, Y)$ is an invertible operator, $\Delta(x) = M^{-1}(M - F'(x))$, and $B_k = I + \Delta + \cdots + \Delta^k$. Here, $B M^{-1} = \lim_{k \to +\infty} B_k M^{-1} = F'^{-1}$.
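A minimal numerical sketch of this inverse-free device follows, assuming finite dimensions, the choice $M = F'(x_0)$ (as is made below in the text), and a hypothetical toy operator F. The action $B_k(x_n) M^{-1} v$ is evaluated by a Horner-type recursion that requires only one factorization of $M$ and matrix-vector products with $F'(x_n)$.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Minimal sketch of the hybrid, inverse-free step with M = F'(x_0): the exact
# F'(x_n)^{-1} is replaced by B_k(x_n) M^{-1}, and B_k(x_n) M^{-1} v is evaluated
# by the Horner-type recursion w <- M^{-1} v + Delta(x_n) w, where
# Delta(x) w = w - M^{-1} (F'(x) w).  The toy operator F below is hypothetical.

def F(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def dF(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def hybrid_solve(x0, k=3, n_steps=15):
    x = np.asarray(x0, dtype=float)
    lu_M = lu_factor(dF(x))                        # factor M = F'(x_0) once
    solve_M = lambda v: lu_solve(lu_M, v)
    for _ in range(n_steps):
        v = solve_M(F(x))                          # M^{-1} F(x_n)
        w = v.copy()
        for _ in range(k):                         # w = B_k(x_n) M^{-1} F(x_n)
            w = v + (w - solve_M(dF(x) @ w))       # Delta(x_n) w = w - M^{-1} F'(x_n) w
        x = x - w                                  # x_{n+1} = x_n - B_k M^{-1} F(x_n)
    return x

x_star = hybrid_solve([1.0, 0.5], k=3)
print(x_star, F(x_star))
```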
Similarly, we introduce hybrid methods to replace both algorithms. We demonstrate this with Algorithm $A(x_0)$; the method for $A(r, \Delta, x_0)$ follows along the same lines. We take $M = F'(x_0)$ for simplicity. As in [3], define the parameters
$$a = a(\|x - x_0\|) \in A,$$
$$\bar{b}_k = \frac{a(1 - a^k)}{1 - a}, \qquad b_k = \frac{1}{1 - \bar{b}_k},$$
$$b = \frac{1 - a}{1 - 2a}, \qquad \lambda(\|x - x_0\|) = \lambda_k(\|x - x_0\|) = \frac{b_k\, b\, a^{k+1}}{1 - a}.$$
Define the sequence $\{v_n\}$ by $v_0 = 0$, $v_1 = \beta$,
$$v_2 = \beta + \frac{\frac{L_0}{2}\beta^2 + \lambda(\beta)}{1 - L_0 \beta}, \qquad v_{n+2} = v_{n+1} + \frac{\frac{L}{2}(v_{n+1} - v_n)^2 + \lambda_k(v_{n+1})}{1 - L_0 v_{n+1}}.$$
The sequence $\{v_n\}$ will be shown to be majorizing for the sequence $\{x_n\}$ generated by the algorithm defined next.

Algorithm $A(x_0)$:

Given $x_n$, $n = 0, 1, \ldots$, compute $x_{n+1}$ as follows: First, define the set
$$D(x) := \{ d \in X : F(x) + F_1'(x) d \in K \} \quad \text{for each } x \in X.$$
If $D(x_n) \neq \emptyset$, pick $d_n \in D(x_n)$ satisfying $\|d_n\| = \min_{d \in D(x_n)} \|d\|$ and set $x_{n+1} = x_n + d_n$.
Let us provide a convergence criterion for the sequence $\{v_n\}$. Suppose that there exists $\rho \in [0, \tfrac{1}{L_0})$ such that, for each $n = 0, 1, 2, \ldots$,
$$v_n \le \rho.$$
It follows by (39) and (38) that $0 \le v_n \le v_{n+1} \le \rho$, and there exists $v^* \in [0, \rho]$ such that $v^* = \lim_{n \to \infty} v_n$. The limit point $v^*$ is the unique least upper bound of the sequence $\{v_n\}$. We can show the analog of Theorem 2 for the sequence $\{x_n\}$ generated by Algorithm $A(x_0)$.
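A small sketch of the sequence $\{v_n\}$ follows, treating $a$ as a constant bound with $a < \tfrac{1}{2}$ and using made-up values of $\beta$, $L_0$, $L$ and $k$; condition (39) asks that all the printed values remain below some $\rho < \tfrac{1}{L_0}$.

```python
# Small sketch of the majorizing sequence {v_n} for the hybrid algorithm.
# Here a is treated as a constant bound with a < 1/2, and the sample values of
# beta, L0, L, k are made up for illustration.

def v_sequence(beta, L0, L, a, k, n_terms=10):
    bk_bar = a * (1.0 - a**k) / (1.0 - a)        # \bar{b}_k
    bk = 1.0 / (1.0 - bk_bar)                    # b_k
    b = (1.0 - a) / (1.0 - 2.0 * a)              # b
    lam = bk * b * a**(k + 1) / (1.0 - a)        # lambda_k (constant for fixed a, k)
    v = [0.0, beta]
    v.append(beta + (0.5 * L0 * beta**2 + lam) / (1.0 - L0 * beta))
    while len(v) < n_terms:
        step = (0.5 * L * (v[-1] - v[-2])**2 + lam) / (1.0 - L0 * v[-1])
        v.append(v[-1] + step)
    return v

# Check that the printed values stay below some rho < 1/L0 over the horizon of interest.
print(v_sequence(beta=0.1, L0=1.0, L=1.5, a=0.2, k=4))
```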
Theorem 3. 
Suppose that the problem (1) satisfies the weak-Robinson condition at $x_0$ on $U(x_0, v^*)$, that $L_0, L \in (0, +\infty)$, $\beta = \|T_{x_0}^{-1} F(x_0)\|$, $r = 1$, that $(T_{x_0}^{-1}, F')$ is $L_0$-center-Lipschitz continuous on $U(x_0, \tfrac{1}{L_0})$ and $L$-Lipschitz continuous on $U(x_0, \rho)$, that
$$a < \frac{1}{2},$$
and that (39) holds. Then, any sequence $\{x_n\}$ generated by Algorithm $A(x_0)$ converges to a solution $x^*$ of the problem (1). Moreover, the following error estimate holds:
$$\|x^* - x_n\| \le v^* - v_n.$$
Proof. 
Notice that for each $x \in U(x_0, v^*)$,
$$\|B_k - I\| = \|\Delta + \Delta^2 + \cdots + \Delta^k\| \le \|\Delta\| + \|\Delta^2\| + \cdots + \|\Delta^k\| \le a + a^2 + \cdots + a^k = \frac{a(1 - a^k)}{1 - a} = \bar{b}_k < 1 \quad \left(\text{by } a < \tfrac{1}{2}\right).$$
So, $B_k^{-1} \in L(X, X)$ and $\|B_k^{-1}\| \le \frac{1}{1 - \bar{b}_k} = b_k$. Then, we can write
$$F_1'(x) = F'(x_0) B_k^{-1}(x) = F'(x_0)\big(B_k^{-1}(x) - B^{-1}(x) + B^{-1}(x)\big) = F'(x_0) B^{-1}(x) + F'(x_0)\big(B_k^{-1}(x) - B^{-1}(x)\big) = F'(x_0) B^{-1}(x) + F'(x_0) B_k^{-1}(x)\big(B(x) - B_k(x)\big) B^{-1}(x).$$
We also need the estimate
$$\big\|B_k^{-1}(x)\big(B(x) - B_k(x)\big) B^{-1}(x)\big\| \le b_k \cdot \frac{a^{k+1}}{1 - a} \cdot b = \frac{b_k\, b\, a^{k+1}}{1 - a} = \lambda_k(\|x - x_0\|).$$
By replacing $F'$ by $F_1'$ in the proof of Theorem 2 and using (43), we get
$$\|x_{n+1} - x_n\| \le v_{n+1} - v_n,$$
leading to the existence of $x^* \in U(x_0, v^*)$ solving (1). Then, from (44), we have, for each $i = 0, 1, 2, \ldots$,
$$\|x_{n+i} - x_n\| \le v_{n+i} - v_n.$$
Finally, by letting $i \to +\infty$ in (45), we obtain (41). □
Remark 4. 
(i)
It is worth noticing that the condition (39) is weaker than the Kantorovich condition $2\beta L \le 1$ used in Theorem 2, or (33), since both of these conditions imply (39), but not necessarily vice versa. Thus, (39) can replace these stronger conditions in Theorem 2, provided that the sequence $\{v_n\}$ (for $k = \infty$) replaces the sequence $\{t_n\}$.
(ii)
The results of Theorem 3 can be presented in a more general setting if $M$ is not necessarily chosen to be $F'(x_0)$: simply replace $E(x_0)$ by $M$ in Definition 1 and $F'(x_0)$ by $M$ in (12), where $M \in L(X, Y)$ is an invertible operator. Then, the conclusions of Theorem 3 hold in this more general setting.
(iii)
The results can be further extended if the Lipschitz continuity is replaced by the generalized continuity of the operator $F'$, along the lines of our work in [2,3,4,5].

3. Concluding Remarks

Remark 5. 
The results in this work can also be expressed using majorant functions instead of the Lipschitz constants $L_0$, $L$, and $L_1$. This approach allows for even tighter error estimates (see [1,2,3,4,5]).
For instance, the constant $L_0$ in Definition 1 can be replaced with a continuous and non-decreasing function $\varphi_0 : [0, +\infty) \to [0, +\infty)$. Consequently, for all $x \in U(x_0, \rho_0)$, we obtain
$$\|T(E(x) - E(x_0))\| \le \varphi_0(\|x - x_0\|).$$
If $\varphi_0(t) = L_0 t$, (46) reduces to the condition stated in Definition 1. Similarly, $L$ and $L_1$ in Definition 2 and Definition 3 can be replaced accordingly.
Let us introduce some parameters and real functions that play a role in the semi-local convergence of the Gauss-Newton algorithm under generalized continuity conditions used to control the derivative $f'$ of the operator $f$. Set $A = [0, +\infty)$.
Suppose:
(C1) There exist a function $\psi_0 : A \to A$, which is continuous and nondecreasing, and a parameter $\alpha > 0$ such that the equation $\alpha\,\psi_0(t) - 1 = 0$ has at least one positive solution. Denote the smallest such solution by $\xi$. Set $A_0 = [0, \xi)$.
(C2) There exists a function $\psi : A_0 \to A$ which is continuous and nondecreasing. Let $\beta \ge 0$ be a parameter. Define the scalar sequence $\{s_n\}$, for $c > 0$, by $s_0 = 0$, $s_1 = \beta$,
$$s_2 = s_1 + \frac{c \int_0^1 \psi_0\big((1 - \theta)(s_1 - s_0)\big)\, d\theta\, (s_1 - s_0)}{1 - \psi_0(s_1)},$$
and, for each $n = 1, 2, \ldots$, by
$$s_{n+2} = s_{n+1} + \frac{c \int_0^1 \psi\big((1 - \theta)(s_{n+1} - s_n)\big)\, d\theta\, (s_{n+1} - s_n)}{1 - \psi_0(s_{n+1})}.$$
The sequence $\{s_n\}$ will be shown to be majorizing in Theorem 4. But first, a convergence condition is provided for the sequence $\{s_n\}$.
(C3) There exists $\xi_0 \in [0, \xi)$ such that, for all $n = 0, 1, 2, \ldots$,
$$\psi_0(s_n) < 1 \quad \text{and} \quad s_n \le \xi_0.$$
It follows from this condition and the definition of the sequence $\{s_n\}$ that $0 \le s_n \le s_{n+1} \le \xi_0$ for each $n = 0, 1, 2, \ldots$, and that there exists $s^*$ such that $s^* = \lim_{n \to \infty} s_n$. It is known from calculus that $s^*$ is the unique least upper bound of the sequence $\{s_n\}$.
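A short sketch of the sequence $\{s_n\}$ of (C2), with the integral evaluated numerically, is given below; the choices of $\psi_0$, $\psi$, $\beta$ and $c$ are illustrative only, and scipy is assumed to be available.

```python
from scipy.integrate import quad

# Sketch of the majorizing sequence {s_n} from (C1)-(C2) under generalized
# continuity, with the integral evaluated numerically.  The choices of psi0,
# psi, beta and c below are illustrative only.

def s_sequence(beta, c, psi0, psi, n_terms=10):
    s = [0.0, beta]
    # s_2 uses psi0; subsequent terms use psi, per (C2).
    num = quad(lambda th: psi0((1.0 - th) * (s[1] - s[0])), 0.0, 1.0)[0] * (s[1] - s[0])
    s.append(s[1] + c * num / (1.0 - psi0(s[1])))
    while len(s) < n_terms:
        h = s[-1] - s[-2]
        num = quad(lambda th: psi((1.0 - th) * h), 0.0, 1.0)[0] * h
        s.append(s[-1] + c * num / (1.0 - psi0(s[-1])))
    return s

# Linear choices psi0(t) = L0*t, psi(t) = L*t reproduce the quadratic recursion
# of Remark 6(i); other continuous nondecreasing choices work the same way.
print(s_sequence(beta=0.2, c=1.0, psi0=lambda t: 1.0 * t, psi=lambda t: 1.5 * t))
```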
The scalar functions $\psi_0$ and $\psi$ are connected to the operators of the Gauss-Newton algorithm as follows. Let $c \in [1, +\infty)$, $\Delta \in (0, +\infty)$ and $x_0 \in X$ be such that $f(x_0) \in \mathrm{Range}(T_{x_0})$. Define $\beta$ by
$$\beta = c\, \|T_{x_0}^{-1} f(x_0)\|.$$
(C4) Let $T : Y \to 2^Z$ be a convex process, where Z is also a Banach space. The pair $(T^{-1}, f')$ is center-generalized continuous on $U(x_0, \rho_0)$ for some $\rho_0 > 0$, i.e., $\|T^{-1}(f'(x) - f'(x_0))\| \le \psi_0(\|x - x_0\|)$ for all $x \in U(x_0, \rho_0)$.
Set
$$\bar{\xi}_0 = \min\{\rho_0, \xi\}.$$
(C5) Let $f$ and $T$ be as in the condition (C4). The pair $(T^{-1}, f')$ is restricted generalized continuous on $U(x_0, \bar{\xi}_0)$, i.e., $\|T^{-1}(f'(x) - f'(y))\| \le \psi(\|x - y\|)$ for all $x, y \in U(x_0, \bar{\xi}_0)$.
(C6) $\beta \le \Delta$.
(C7) $U(x_0, s^*) \subseteq U(x_0, \rho_0)$.
Next, the main semi-local convergence result for the Gauss-Newton algorithm is shown under the conditions (C1)–(C7).
Theorem 4. 
Suppose that the inclusion (1) satisfies the weak-Robinson condition at $x_0$ on $U(x_0, s^*)$ and that the conditions (C1)–(C7) hold. Then, the Gauss-Newton Algorithm $A(c, \Delta, x_0)$ is well defined, and any sequence $\{x_n\}$ it generates converges to some $x^* \in U(x_0, s^*)$ such that $f(x^*) \in K$. Moreover, the following items hold for all $n = 1, 2, \ldots$:
$$\|x_n - x_{n-1}\| \le s_n - s_{n-1},$$
$$f(x_{n-1}) + f'(x_{n-1})(x_n - x_{n-1}) \in K$$
and
$$\|x^* - x_n\| \le s^* - s_n.$$
Proof. 
It follows by the weak-Robinson condition that
$$D(x_0) = T_{x_0}^{-1}(f(x_0)) \neq \emptyset.$$
So, by the definition of $\beta$, (47), (48) and the condition (C6), we get
$$c\, d(0, D(x_0)) = c\, \|T_{x_0}^{-1} f(x_0)\| = \beta = s_1 - s_0 \le \Delta.$$
By the reflexivity of X, (54) and since $c \ge 1$, there exists $d \in D(x_0)$ such that $\|d\| \le \beta \le \Delta$. Hence, $d(0, D_\Delta(x_0)) = d(0, D(x_0))$ and $D_\Delta(x_0) \neq \emptyset$. Thus, the iterate $x_1$ is well defined and $f(x_0) + f'(x_0)d_0 \in K$; so, (51) holds for $n = 1$.
Moreover, by (54) and the Gauss-Newton Algorithm $A(c, \Delta, x_0)$: $\|d_0\| \le c\, d(0, D(x_0)) \le s_1 - s_0$, i.e., $\|x_1 - x_0\| \le s_1 - s_0$. So, (50) holds for $n = 1$. Suppose that (50) and (51) hold for $n = 1, 2, \ldots, m$. Then, we get
$$\|x_m - x_0\| \le \|x_m - x_{m-1}\| + \|x_{m-1} - x_{m-2}\| + \cdots + \|x_1 - x_0\| \le (s_m - s_{m-1}) + (s_{m-1} - s_{m-2}) + \cdots + (s_1 - s_0) = s_m.$$
Similarly, we have $\|x_{m-1} - x_0\| \le s_{m-1} \le s_m$.
Then, by the condition (C7), $\theta x_m + (1 - \theta)x_{m-1} \in U(x_0, s^*) \subseteq B(x_0, \xi)$ for all $\theta \in [0, 1]$, so $D(x_m) \neq \emptyset$. Hence, the iterate $x_{m+1}$ exists and (51) holds for $n = m + 1$. Furthermore, for $\theta = 1$: $\|x_m - x_0\| \le s^* \le \xi$. By applying the condition (C4), $\|T_{x_0}^{-1}(f'(x_m) - f'(x_0))\| \le \psi_0(\|x_m - x_0\|) \le \psi_0(s^*) < 1$. Then, by the weak-Robinson condition, Proposition 2 is applicable with $x_m$ replacing $x$, leading to
$$D(T_{x_m}^{-1} f'(x_0)) = X$$
and
$$\|T_{x_m}^{-1} f'(x_0)\| \le \frac{1}{1 - \psi_0(\|x_m - x_0\|)} \le \frac{1}{1 - \psi_0(s_m)}.$$
Thus, we get
$$T_{x_0}^{-1}\left(\int_0^1 \big[f'\big(\theta x_m + (1 - \theta)x_{m-1}\big) - f'(x_{m-1})\big](x_m - x_{m-1})\, d\theta\right) \neq \emptyset,$$
and, by the induction hypothesis, (47) and the condition (C5),
$$\left\|T_{x_0}^{-1}\big(f(x_m) - f(x_{m-1}) - f'(x_{m-1})(x_m - x_{m-1})\big)\right\| \le \int_0^1 \psi\big((1 - \theta)(s_m - s_{m-1})\big)\, d\theta\, (s_m - s_{m-1}).$$
Notice that this estimate holds with $\psi$ replaced by $\psi_0$ if $m = 1$. Next, we must show that
$$\emptyset \neq T_{x_m}^{-1}\Big(f'(x_0)\, T_{x_0}^{-1}\big({-f(x_m)} + f(x_{m-1}) + f'(x_{m-1})(x_m - x_{m-1})\big)\Big) \subseteq D(x_m).$$
It follows from (55) and (56) that the set in the middle of (58) is non-empty. To show the rest of (58), set $w = -f(x_m) + f(x_{m-1}) + f'(x_{m-1})(x_m - x_{m-1})$ and let $d \in T_{x_m}^{-1}\big(f'(x_0)\, T_{x_0}^{-1}(w)\big)$, so $d \in T_{x_m}^{-1}(f'(x_0)v)$ for some $v \in T_{x_0}^{-1}(w)$. We must show $d \in D(x_m)$. But we have $f'(x_m)d \in f'(x_0)v + K$ and $f'(x_0)v \in w + K$.
So, $f'(x_m)d \in w + K + K = w + K$, since K is a cone. By the definition of $w$ and the induction hypothesis, it follows that
$$f(x_m) + f'(x_m)d \in f(x_{m-1}) + f'(x_{m-1})(x_m - x_{m-1}) + K \subseteq K + K = K.$$
Thus, $d \in D(x_m)$, showing the right-hand inclusion of the assertion (58). Hence, we can get from (56) and (58), in turn,
$$c\, d(0, D(x_m)) \le c\, \|T_{x_m}^{-1} f'(x_0)\| \left\|T_{x_0}^{-1}\big({-f(x_m)} + f(x_{m-1}) + f'(x_{m-1})(x_m - x_{m-1})\big)\right\| \le \frac{c \int_0^1 \psi\big((1 - \theta)(s_m - s_{m-1})\big)\, d\theta\, (s_m - s_{m-1})}{1 - \psi_0(s_m)} = s_{m+1} - s_m.$$
Notice that the sequence $\{s_{m+1} - s_m\}$ is decreasing, so $s_{m+1} - s_m \le s_1 - s_0 = \beta \le \Delta$. Thus, by (59), $d(0, D(x_m)) \le c\, d(0, D(x_m)) \le \Delta$, which together with $D(x_m) \neq \emptyset$ implies that there exists $d_0 \in X$ with $\|d_0\| \le \Delta$ such that $f(x_m) + f'(x_m)d_0 \in K$. Hence, we have $D_\Delta(x_m) \neq \emptyset$ and $d(0, D_\Delta(x_m)) = d(0, D(x_m))$. Thus, it follows by (59) that $\|x_{m+1} - x_m\| = \|d_m\| \le c\, d(0, D_\Delta(x_m)) \le s_{m+1} - s_m$. This completes the induction for (50) and (51). It follows that the sequence $\{x_m\}$ is fundamental in X (since $\{s_m\}$, being convergent, is also fundamental). Therefore, there exists $x^* \in X$ such that $\lim_{m \to +\infty} x_m = x^*$ and $f(x^*) \in K$. Finally, from (50) for $i = 0, 1, 2, \ldots$ and the triangle inequality, we obtain
$$\|x_{m+i} - x_m\| \le s_{m+i} - s_m,$$
leading to (52) as $i \to +\infty$. □
Remark 6. 
(i)
Let us specialize the functions $\psi_0$ and $\psi$ to $\psi_0(t) = L_0 t$ and $\psi(t) = L t$. Then, the condition (C3) becomes (C3)′: $L_0 s_n < 1$ and $s_n \le \frac{1}{L_0}$, since $\xi_0 = \frac{1}{L_0}$ by the condition (C1). Notice that, by the definition of the functions $\psi_0$ and $\psi$, the sequence $\{s_n\}$ becomes
$$s_0 = 0, \quad s_1 = \beta, \quad s_2 = s_1 + \frac{c L_0 (s_1 - s_0)^2}{2(1 - L_0 s_1)}, \quad s_{m+2} = s_{m+1} + \frac{c L (s_{m+1} - s_m)^2}{2(1 - L_0 s_{m+1})}, \qquad m = 1, 2, \ldots.$$
According to the proof of Theorem 4, the sequence $\{s_n\}$ given by (60) majorizes $\{x_n\}$, provided that the conditions (C3) and (C6) hold. These conditions are weaker than the Kantorovich-type condition (27). Indeed, this is the case, since
$$1 \le c \le \alpha = c\big(1 + (c - 1)L\beta\big),$$
so
$$\frac{c}{1 - L_0 s_n} \le \frac{\alpha}{1 - \alpha L s_n}.$$
Hence, the sequence $\{s_n\}$ can be replaced by the less tight sequence $\{t_n\}$ given by the formula (18). Notice that the sequence $\{t_n\}$ converges provided that $\beta \le \frac{1}{(c + 1)L}$ (see (27)). Consequently, the sequence $\{s_n\}$ given by (60) is tighter than the sequence $\{t_n\}$ used in [18], and the sufficient convergence conditions (C3) and (C6) are weaker than $\beta \le \frac{1}{(c + 1)L_1}$ used in [18]. We conclude that, under these choices of the functions $\psi_0$ and $\psi$, the results of Theorem 4 specialize to the ones in Theorem 1. Clearly, the same is true for Theorem 2 if we take $c = 1$ in Theorem 4.
(ii)
It is well known that generalized continuity provides even sharper bounds on the derivative than Lipschitz, Hölder, or other conditions.

4. Discussion

The applicability of two Gauss-Newton methods for solving inclusion and convex-composite optimization problems for Banach space-valued operators is extended using more precise majorizing sequences and under the same computational cost, providing weaker yet sufficient semi-local convergence criteria and more precise error estimates on the distances $\|x_{n+1} - x_n\|$ and $\|x^* - x_n\|$. Moreover, the implementation of these algorithms is addressed by developing the corresponding hybrid Gauss-Newton methods. The Lipschitz constants can be replaced by the generalized continuity of $F'$.
In future research, the ideas presented in this study shall be used to extend the applicability of other iterative methods used to solve these problems.

Author Contributions

All authors contributed equally.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable

Informed Consent Statement

Not applicable

Data Availability Statement

Not applicable

Acknowledgments

We would like to express our gratitude to the anonymous reviewers for their constructive criticism of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aragon Artacho, F.J.; Dontchev, A.L.; Gaydu, M.; Geoffroy, M.H.; Veliov, V.M. 2011. "Metric Regularity of Newton’s Iteration." SIAM Journal on Control and Optimization 49:339-362.
  2. Argyros, I.K. 2022. The Theory and Applications of Iteration Methods with Applications. Engineering Series, Second Edition. Boca Raton, FL: CRC Press/Taylor & Francis.
  3. Argyros, I.K.; George, S.; Shakhno, S.; Regmi, S.; Havdiak, M.; Argyros, M.I. 2024. "Asymptotically Newton-Type Methods Without Inverses for Solving Equations." Mathematics 12:1069. [CrossRef]
  4. Argyros, I.K.; George, S.; Regmi, S.; Argyros, C.I. 2024. "Hybrid Newton-like Inverse Free Algorithms for Solving Nonlinear Equations." Algorithms 17:154. [CrossRef]
  5. Argyros, I.K.; Shakhno, S.; Yarmola, H.; Regmi, S.; Shrestha, N. 2025. "Three step inverse free Kurchatov-like methods of convergence order close to four for equations." Eur. J. Math. Anal. 5:15-15.
  6. Argyros, I.K.; Shrestha, N. 2017. "Extending the local convergence analysis of Newton’s method." Commun. Appl. Nonlinear Anal. 24:49-60.
  7. Blum, L.; Cucker, F.; Shub, M.; Smale, S. 1998. Complexity and Real Computation. New York: Springer-Verlag.
  8. Burke, J.V.; Ferris, M.C. 1995. "A Gauss-Newton Method for Convex Composite Optimization." Mathematical Programming 71:179-194.
  9. Chen, J. 2008. "The Convergence Analysis of Inexact Gauss-Newton Methods for Nonlinear Problems." Computational Optimization and Applications 40(1):97-118.
  10. Chen, J.; Li, W. 2005. "Convergence of Gauss-Newton’s Method and Uniqueness of the Solution." Applied Mathematics and Computation 170(1):686-705.
  11. Dedieu, J.P.; Kim, M.H. 2002. "Newton’s Method for Analytic Systems of Equations with Constant Rank Derivatives." Journal of Complexity 18(1):187-209.
  12. Dennis Jr., J. E.; Schnabel, R.B. 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Classics in Applied Mathematics, vol. 16. Philadelphia, PA: SIAM.
  13. Deuflhard, P.; Heindl, G. 1979. "Affine Invariant Convergence Theorems for Newton’s Method and Extensions to Related Methods." SIAM Journal on Numerical Analysis 16:1-10.
  14. Deuflhard, P. 2004. Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms. Berlin; Heidelberg: Springer-Verlag.
  15. Dontchev, A.L.; Rockafellar, R.T. 2009. Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer Monographs in Mathematics. Dordrecht: Springer.
  16. Ferreira, O.P.; Goncalves, M.L.N.; Oliveira, P.R. 2013. "Convergence of the Gauss-Newton Method for Convex Composite Optimization Under a Majorant Condition." SIAM Journal on Optimization 23(3):1757-1783.
  17. Ferreira, O.P.; Goncalves, M.L.N.; Oliveira, P.R. 2011. "Local Convergence Analysis of the Gauss-Newton Method under a Majorant Condition." Journal of Complexity 27:111-125.
  18. Ferreira, O.P.; Goncalves, M.L.N.; Oliveira, P.R. 2011. "Local Convergence Analysis of Inexact Newton-Like Methods under Majorant Condition." Computational Optimization and Applications 48:1-21.
  19. Kantorovich, L.V.; Akilov, G.P. 1982. Functional Analysis. 2nd ed. Oxford: Pergamon Press. (translated from Russian by Howard L. Silcock).
  20. Li, C.; Ng, K.F. 2007. "Majorizing Functions and Convergence of the Gauss-Newton Method for Convex Composite Optimization." SIAM Journal on Optimization 18:613-642.
  21. Li, C.; Wang, X.H. 2002. "On Convergence of the Gauss-Newton Method for Convex Composite Optimization." Mathematical Programming 91:349-356.
  22. Li, C.; Ng, K.F. 2012. "Convergence Analysis of the Gauss-Newton Method for Convex Inclusion and Convex-Composite Optimization Problems." Journal of Mathematical Analysis and Applications 389:469-485.
  23. Nocedal, J.; Wright, S.J. 1999. Numerical Optimization. Springer-Verlag.
  24. Proinov, P.D. 2009. "General Local Convergence Theory for a Class of Iterative Processes and Its Applications to Newton’s Method." Journal of Complexity 25(1):38-62.
  25. Robinson, S.M. 1972. "Extension of Newton’s Method to Nonlinear Functions with Values in a Cone." Numerische Mathematik 19:341-347.
  26. Robinson, S.M. 1972. "Normed Convex Process." Transactions of the American Mathematical Society 174:127-140.
  27. Rockafellar, R.T. 1970. Convex Analysis. Princeton: Princeton University Press.
  28. Rockafellar, R.T. 1967. Monotone Processes of Convex and Concave Type. Memoirs of the American Mathematical Society, no. 77.
  29. Smale, S. 1986. "Newton’s Method Estimates from Data at One Point." In The Merging of Disciplines: New Directions in Pure, Applied, and Computational Mathematics, 185-196. New York: Springer.
  30. Stewart, G.W. 1969. "On the Continuity of the Generalized Inverse." SIAM Journal on Applied Mathematics 17:33-45.
  31. Wedin, P.A. 1973. "Perturbation Theory for Pseudo-Inverses." BIT 13:217-232.