Structural Results on the HMLasso


Submitted: 01 August 2025; Posted: 01 August 2025


Abstract
HMLasso (Lasso with High Missing Rate) is a useful technique for sparse regression when a high-dimensional design matrix contains a large amount of missing data. Solving HMLasso requires computing an appropriate positive semidefinite symmetric matrix. In this paper, we present two structural results on the HMLasso problem. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem.

1. Introduction

Let X be an $n \times p$ design matrix and let y be an n-dimensional response vector. Consider the standard linear regression model
$$y = X\beta + \epsilon, \tag{1}$$
where $\epsilon$ is a noise term. A popular modeling assumption is that the regression vector is sparse. The Lasso [2] (least absolute shrinkage and selection operator) is among the most popular procedures for estimating an unknown sparse regression vector in a high-dimensional linear model. The Lasso is formulated as the following $\ell_1$-penalized regression problem:
$$\min_{\beta} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1, \tag{2}$$
where $\alpha > 0$ is a regularization parameter and $\|\cdot\|_1$ (resp. $\|\cdot\|_2$) is the $\ell_1$ (resp. $\ell_2$) norm. Here, we consider the case where the design matrix X contains missing data. Missing data are prevalent and often unavoidable, affecting not only the representativeness and quality of the data but also the results of Lasso regression. It is therefore important to develop methods that can handle missing data. HMLasso [1] was proposed to address this issue effectively and is formulated as follows:
$$\min_{\Sigma} \; \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{s.t.} \quad \Sigma \succeq 0, \tag{3}$$
$$\min_{\beta} \; \frac{1}{2}\beta^{\top}\hat{\Sigma}\beta - \rho_{\mathrm{pair}}^{\top}\beta + \alpha\|\beta\|_1, \tag{4}$$
where $\odot$ denotes the Hadamard product, $\|\cdot\|_F$ the Frobenius norm, $S_{\mathrm{pair}}, W \in \mathbb{R}^{p\times p}$ and $\rho_{\mathrm{pair}} \in \mathbb{R}^{p}$ are defined from the data (see Section 2 for details), and $\hat{\Sigma} = \operatorname*{argmin}_{\Sigma \succeq 0} \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$ (i.e., $\hat{\Sigma}$ is the solution to problem (3)). It is known that problem (4) can be rewritten equivalently as the Lasso (2), and in the sparse regression literature the proximal gradient method is the algorithm most commonly used for minimizing a sum of two convex functions. Therefore, the key to solving HMLasso is how fast and how accurately problem (3) can be solved.
To address this challenge, we establish two structural results on the HMLasso problem. First, we show that the gradient of the objective function of (3) is Lipschitz continuous; second, we show that the objective function of (3) is strongly convex. These results guarantee that accelerated algorithms for strongly convex functions can be applied to solve (3). Finally, we conduct numerical experiments on the real dataset considered in [1]. The numerical results show that an accelerated algorithm for strongly convex functions is effective for solving problem (3).

2. Preliminaries

This section reviews basic definitions, facts, and notation that will be used throughout the paper.
For two matrices $A, B \in \mathbb{R}^{n\times p}$, their Hadamard product $A \odot B$ is defined by
$$(A \odot B)_{jk} := A_{jk}B_{jk}.$$
The Frobenius inner product is defined by $\langle A, B\rangle_F := \mathrm{tr}(A^{\top}B)$. The Frobenius norm of A is defined by
$$\|A\|_F := \sqrt{\langle A, A\rangle_F} = \sqrt{\sum_{j=1}^{n}\sum_{k=1}^{p} A_{jk}^2}.$$
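As a quick numerical illustration (not part of the original text), the following NumPy snippet checks the identities $\langle A, B\rangle_F = \mathrm{tr}(A^{\top}B) = \sum_{j,k} A_{jk}B_{jk}$ and $\|A\|_F = \sqrt{\langle A, A\rangle_F}$ on arbitrary example matrices.

```python
import numpy as np

# Hypothetical example data; any real matrices of matching shape would do.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

inner_trace = np.trace(A.T @ B)      # <A, B>_F via the trace
inner_entrywise = np.sum(A * B)      # <A, B>_F via the Hadamard product A ⊙ B
fro_norm = np.sqrt(np.sum(A * A))    # ||A||_F via <A, A>_F

assert np.isclose(inner_trace, inner_entrywise)
assert np.isclose(fro_norm, np.linalg.norm(A, "fro"))
```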
$S^p$ denotes the set of symmetric matrices of size $p \times p$. We write $A \succeq 0$ (resp. $A \succ 0$) to denote that $A \in S^p$ is positive semidefinite (resp. positive definite). $S_+^p$ denotes the set of positive semidefinite matrices.
Let $D \subseteq \mathbb{R}^{p\times p}$. The indicator function $\iota_D : \mathbb{R}^{p\times p} \to (-\infty, \infty]$ of D is defined by
$$\iota_D(B) := \begin{cases} 0 & (B \in D), \\ \infty & (\text{otherwise}). \end{cases}$$
Let $f : \mathbb{R}^{p\times p} \to (-\infty, \infty]$ be a proper and convex function. The proximal mapping $\mathrm{prox}_f$ of f is defined by
$$\mathrm{prox}_f(\Sigma) := \operatorname*{argmin}_{Z \in \mathbb{R}^{p\times p}} \left\{ f(Z) + \frac{1}{2}\|Z - \Sigma\|_F^2 \right\}.$$
Let $\Sigma \in S^p$. Then there exist a diagonal matrix $\Lambda \in \mathbb{R}^{p\times p}$ and a $p \times p$ orthogonal matrix U such that $\Sigma = U\Lambda U^{\top}$. Let $P_{S_+^p}$ be the metric projection from $S^p$ onto $S_+^p$. Then the following holds (see, for example, [3, Example 29.31] and [4, Theorem 6.3]):
$$P_{S_+^p}(\Sigma) = U\Lambda_+ U^{\top},$$
where $\Lambda_+$ is the diagonal matrix obtained from $\Lambda$ by setting the negative entries to 0.
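In practice, $P_{S_+^p}$ is computed directly from an eigendecomposition. The following NumPy sketch illustrates this; the function name project_psd is ours, and the symmetrization step is a standard numerical safeguard rather than part of the statement above.

```python
import numpy as np

def project_psd(Sigma: np.ndarray) -> np.ndarray:
    """Metric projection of a symmetric matrix onto S_+^p by eigenvalue clipping."""
    Sigma = 0.5 * (Sigma + Sigma.T)        # symmetrize to guard against round-off
    lam, U = np.linalg.eigh(Sigma)         # Sigma = U diag(lam) U^T
    lam_plus = np.clip(lam, 0.0, None)     # set negative eigenvalues to 0
    return (U * lam_plus) @ U.T            # U diag(lam_plus) U^T
```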
Let $X \in \mathbb{R}^{n\times p}$ be a design matrix. Set
$$I_{jk} := \{\, i : X_{ij} \text{ and } X_{ik} \text{ are observed} \,\}$$
and let $n_{jk}$ be the number of elements of $I_{jk}$. We define the matrices $S_{\mathrm{pair}}$ and W as follows:
$$(S_{\mathrm{pair}})_{jk} := \begin{cases} \dfrac{1}{n_{jk}} \displaystyle\sum_{i \in I_{jk}} x_{ij}x_{ik} & (I_{jk} \neq \emptyset), \\ 0 & (I_{jk} = \emptyset), \end{cases} \qquad W_{jk} := \frac{n_{jk}}{n}.$$
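For illustration, the following NumPy sketch computes $S_{\mathrm{pair}}$ and W from a design matrix whose missing entries are encoded as NaN. It implements the definitions above directly; any centering or scaling of the data used in [1] is not included, and the function name is ours.

```python
import numpy as np

def pairwise_moments(X: np.ndarray):
    """Compute (S_pair, W) from an n x p design matrix X with np.nan for missing entries."""
    n, _ = X.shape
    observed = ~np.isnan(X)                     # mask of observed entries
    X0 = np.where(observed, X, 0.0)             # missing entries replaced by 0

    n_jk = observed.T.astype(float) @ observed.astype(float)  # pairwise observation counts
    cross = X0.T @ X0                           # sums of x_ij * x_ik over jointly observed rows
    with np.errstate(invalid="ignore", divide="ignore"):
        S_pair = np.where(n_jk > 0, cross / n_jk, 0.0)        # average, or 0 if I_jk is empty
    W = n_jk / n                                # W_jk = n_jk / n
    return S_pair, W
```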
Let $L \in [0, \infty)$ and let $h : \mathbb{R}^{p\times p} \to \mathbb{R}$ be a differentiable function. The gradient $\nabla h$ of h is said to be L-Lipschitz continuous if
$$\|\nabla h(\Sigma_1) - \nabla h(\Sigma_2)\|_F \le L\,\|\Sigma_1 - \Sigma_2\|_F \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}).$$
This condition is often called L-smoothness in the literature. Let $h : \mathbb{R}^{p\times p} \to \mathbb{R}$ be L-smooth on $\mathbb{R}^{p\times p}$. Then we can upper bound the function h as
$$h(\Sigma_1) \le h(\Sigma_2) + \langle \nabla h(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F + \frac{L}{2}\|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p})$$
(see, for example, [5, Theorem 5.8] and [6, Theorem A.1]).
Let $\mu \in (0, \infty)$ and let $g : \mathbb{R}^{p\times p} \to \mathbb{R} \cup \{+\infty\}$. The function g is said to be $\mu$-strongly convex if, for each $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}$ and $\lambda \in (0, 1)$, we have
$$g(\lambda\Sigma_1 + (1-\lambda)\Sigma_2) \le \lambda g(\Sigma_1) + (1-\lambda)g(\Sigma_2) - \frac{\mu}{2}\lambda(1-\lambda)\|\Sigma_1 - \Sigma_2\|_F^2. \tag{9}$$
Suppose that g is differentiable. Then (9) is equivalent to the following condition:
$$\langle \nabla g(\Sigma_1) - \nabla g(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F \ge \mu\,\|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}). \tag{10}$$

3. Main Results

In this section, we present two structural results on the HMLasso problem (3).
Define $f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$. The gradient $\nabla f$ of f is given by
$$\nabla f(\Sigma) = \nabla\left(\frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2\right) = W \odot W \odot (\Sigma - S_{\mathrm{pair}}) \tag{11}$$
(see [1]).
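For reference, f and its gradient (11) amount to a few Hadamard products; a minimal NumPy sketch (function names ours):

```python
import numpy as np

def f_obj(Sigma: np.ndarray, W: np.ndarray, S_pair: np.ndarray) -> float:
    """Objective of problem (3): (1/2) ||W ⊙ (Sigma - S_pair)||_F^2."""
    R = W * (Sigma - S_pair)
    return 0.5 * float(np.sum(R * R))

def grad_f(Sigma: np.ndarray, W: np.ndarray, S_pair: np.ndarray) -> np.ndarray:
    """Gradient (11): ∇f(Sigma) = W ⊙ W ⊙ (Sigma - S_pair)."""
    return W * W * (Sigma - S_pair)
```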

3.1. Lipschitz continuity

The first structural result concerns the gradient of the objective function in (3): we show that $\nabla f$ is Lipschitz continuous.
Lemma 1. 
The gradient $\nabla f$ of f is Lipschitz continuous and its Lipschitz constant is $\|W \odot W\|_F$.
Proof. 
$$\begin{aligned}
\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 &= \|W \odot W \odot (\Sigma_1 - S_{\mathrm{pair}}) - W \odot W \odot (\Sigma_2 - S_{\mathrm{pair}})\|_F^2 \\
&= \|W \odot W \odot (\Sigma_1 - \Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2.
\end{aligned}$$
On the other hand,
$$\|W \odot W\|_F^2\,\|\Sigma_1 - \Sigma_2\|_F^2 = \underbrace{\sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4}_{=:\ \bar{\alpha}} \cdot \sum_{j=1}^{p}\sum_{k=1}^{p} (\Sigma_{1,jk} - \Sigma_{2,jk})^2.$$
This implies
$$\|W \odot W\|_F^2\,\|\Sigma_1 - \Sigma_2\|_F^2 - \|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} \left(\bar{\alpha} - W_{jk}^4\right)(\Sigma_{1,jk} - \Sigma_{2,jk})^2 \ge 0,$$
where the inequality follows from $\bar{\alpha} - W_{jk}^4 \ge 0$ for any j and k. Hence
$$\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F \le \|W \odot W\|_F\,\|\Sigma_1 - \Sigma_2\|_F.$$
   □

3.2. Strong monotonicity

We next consider the strong monotonicity of $\nabla f$, which yields the strong convexity of f. This is our second result.
Lemma 2. 
f is $\min_{l,m}(W \odot W)_{lm}$-strongly convex.
Proof. 
We show (10) with constant $\min_{l,m}(W \odot W)_{lm}$. Let $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}$. Then
$$\begin{aligned}
\langle \nabla f(\Sigma_1) - \nabla f(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F &= \langle W \odot W \odot (\Sigma_1 - \Sigma_2), \Sigma_1 - \Sigma_2\rangle_F = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^2\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2 \\
&\ge \sum_{j=1}^{p}\sum_{k=1}^{p} \min_{l,m}(W \odot W)_{lm}\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2 = \min_{l,m}(W \odot W)_{lm}\,\|\Sigma_1 - \Sigma_2\|_F^2,
\end{aligned}$$
where the inequality follows from $W_{jk}^2 \ge \min_{l,m}(W \odot W)_{lm}$ for any j and k. This implies that f is $\min_{l,m}(W \odot W)_{lm}$-strongly convex.    □
Remark 1. 
From Lemmas 1 and 2, the objective function of (3) is strongly convex and has a Lipschitz continuous gradient. It should be noted that faster convergence rates can be guaranteed for smooth and strongly convex objective functions than for merely convex ones (see, for example, [6]).
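In particular, both constants can be read off directly from W; a short sketch (function name ours):

```python
import numpy as np

def smoothness_and_strong_convexity(W: np.ndarray):
    """Constants from Lemmas 1 and 2 for f(Sigma) = (1/2)||W ⊙ (Sigma - S_pair)||_F^2."""
    L = float(np.linalg.norm(W * W, "fro"))   # Lipschitz constant of ∇f (Lemma 1)
    mu = float(np.min(W * W))                 # strong convexity constant of f (Lemma 2)
    return L, mu
```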

4. Numerical Experiments

In this section, we consider a strongly convex variant of FISTA [6, Algorithm 19] to solve problem (3).

4.1. Strongly convex FISTA

Let $f : \mathbb{R}^{p\times p} \to \mathbb{R}$ be an L-smooth and $\mu$-strongly convex function with $\mathrm{dom}(f) = \mathbb{R}^{p\times p}$, and let $h : \mathbb{R}^{p\times p} \to \mathbb{R} \cup \{\infty\}$ be a convex function with $\{\Sigma : h(\Sigma) < \infty\} \neq \emptyset$. We consider the problem of minimizing the sum of f and h:
$$\min_{\Sigma \in \mathbb{R}^{p\times p}} f(\Sigma) + h(\Sigma). \tag{12}$$
In this setting, forward-backward splitting strategies are classical methods for solving (12). In the context of acceleration methods, the fast iterative shrinkage-thresholding algorithm (FISTA) was introduced by Beck and Teboulle [7] based on the idea of forward-backward splitting. This topic is addressed in many references; we refer to [3,5,6].
Here, we focus on the following strongly convex variant of FISTA involving backtracking investigated in [6].
Algorithm 1: Strongly convex FISTA [6, Algorithm 19]
Input: An initial point $\Sigma_0 \in \mathbb{R}^{p\times p}$ and an initial estimate $L_0 > \mu$.
1: Initialize $\Phi_0 = \Sigma_0$, $t_0 = 0$, and some $\alpha > 1$.
2: for $k = 0, 1, \ldots$ do
3:    $L_{k+1} = L_k$
4:    while true do
5:        $q_{k+1} = \mu / L_{k+1}$
6:        $t_{k+1} = \dfrac{2t_k + 1 + \sqrt{4t_k + 4q_{k+1}t_k^2 + 1}}{2(1 - q_{k+1})}$
7:        set $\tau_k = \dfrac{(t_{k+1} - t_k)(1 + q_{k+1}t_k)}{t_{k+1} + 2q_{k+1}t_k t_{k+1} - q_{k+1}t_k^2}$ and $\delta_k = \dfrac{t_{k+1} - t_k}{1 + q_{k+1}t_{k+1}}$
8:        $\Psi_k = \Sigma_k + \tau_k(\Phi_k - \Sigma_k)$
9:        $\Sigma_{k+1} = \mathrm{prox}_{h/L_{k+1}}\left(\Psi_k - \frac{1}{L_{k+1}}\nabla f(\Psi_k)\right)$
10:       $\Phi_{k+1} = (1 - q_{k+1}\delta_k)\Phi_k + q_{k+1}\delta_k\Psi_k + \delta_k(\Sigma_{k+1} - \Psi_k)$
11:       if $f(\Sigma_{k+1}) \le f(\Psi_k) + \langle \nabla f(\Psi_k), \Sigma_{k+1} - \Psi_k\rangle_F + \frac{L_{k+1}}{2}\|\Sigma_{k+1} - \Psi_k\|_F^2$ holds then
12:           break  {k will be incremented.}
13:       else
14:           $L_{k+1} = \alpha L_{k+1}$  {Recompute new $L_{k+1}$.}
15:       end if
16:    end while
17: end for
Output: An approximate solution $\Sigma_{k+1}$
Remark 2. 
We demonstrate how strongly convex FISTA can be applied to (3). Set
$$f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{and} \quad h(\Sigma) := \iota_{S_+^p}(\Sigma).$$
In this case, problem (3) is a special instance of problem (12). The gradient $\nabla f$ can be computed by (11). Moreover, the proximal mapping $\mathrm{prox}_h$ is $P_{S_+^p}$ (see, for example, [3, Example 12.25]), and hence the computations in the algorithm are simple.
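To make this concrete, the following is a minimal Python sketch of Algorithm 1 applied to problem (3), with f, $\nabla f$, and $\mathrm{prox}_h$ specialized as above. It assumes a zero initial point (the experiments in Section 4.2 use random initial points) and the constants from Lemmas 1 and 2; the function names are ours, and the sketch is an illustration rather than the MATLAB implementation used in the experiments.

```python
import numpy as np

def project_psd(Sigma):
    """prox_h for h = indicator of S_+^p: projection by eigenvalue clipping."""
    Sigma = 0.5 * (Sigma + Sigma.T)
    lam, U = np.linalg.eigh(Sigma)
    return (U * np.clip(lam, 0.0, None)) @ U.T

def scfista_hmlasso(S_pair, W, L0=1.0, alpha=1.1, max_iter=500):
    """Sketch of Algorithm 1 (strongly convex FISTA with backtracking) for problem (3)."""
    p = S_pair.shape[0]
    f = lambda S: 0.5 * np.sum((W * (S - S_pair)) ** 2)          # objective of (3)
    grad_f = lambda S: W * W * (S - S_pair)                      # gradient (11)

    mu = np.min(W * W)                       # strong convexity constant (Lemma 2)
    Sigma = np.zeros((p, p))                 # initial point Sigma_0 (zero matrix here)
    Phi = Sigma.copy()
    t = 0.0
    L = max(L0, mu * (1.0 + 1e-12))          # the algorithm requires L_0 > mu

    for _ in range(max_iter):
        L_next = L
        while True:                          # backtracking search for L_{k+1}
            q = mu / L_next
            t_next = (2.0 * t + 1.0 + np.sqrt(4.0 * t + 4.0 * q * t**2 + 1.0)) / (2.0 * (1.0 - q))
            tau = (t_next - t) * (1.0 + q * t) / (t_next + 2.0 * q * t * t_next - q * t**2)
            delta = (t_next - t) / (1.0 + q * t_next)

            Psi = Sigma + tau * (Phi - Sigma)                     # extrapolated point
            Sigma_next = project_psd(Psi - grad_f(Psi) / L_next)  # prox_{h/L}(gradient step)
            Phi_next = (1.0 - q * delta) * Phi + q * delta * Psi + delta * (Sigma_next - Psi)

            diff = Sigma_next - Psi
            if f(Sigma_next) <= f(Psi) + np.sum(grad_f(Psi) * diff) + 0.5 * L_next * np.sum(diff**2):
                break                        # sufficient-decrease condition holds
            L_next = alpha * L_next          # otherwise increase the estimate of L

        Sigma, Phi, t, L = Sigma_next, Phi_next, t_next, L_next

    return Sigma
```

The per-iteration cost is dominated by one eigendecomposition of a $p \times p$ matrix for the projection, i.e., $O(p^3)$, plus a few Hadamard products.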

4.2. Residential Building Dataset

We compared the performance of the strongly convex FISTA (SCFISTA) and the alternating direction method of multipliers (ADMM) used in [1] for solving the HMLasso problem (3). All experiments were conducted on a PC with an Apple M1 Max CPU and 32 GB of RAM. Both methods were implemented in MATLAB (R2025a).
The numerical experiment uses the Residential Building dataset from the UCI Machine Learning Repository. The data consist of n = 300 samples and p = 27 variables. We set the average missing rates to 20%, 40%, 60%, and 80%. For the parameters $L_0$, $\alpha$, and $\mu$ in the algorithm, we set $L_0 = 1$, $\alpha = 1.1$, and $\mu = \min_{l,m}(W \odot W)_{lm}$. We chose 10 random initial points $\Sigma_0^{(i)} \in \mathbb{R}^{p\times p}$ ($i = 1, 2, \ldots, 10$), whose entries are drawn from a standard normal distribution. The figures report the following quantities:
$$D_k^{(i)} := \|\Sigma_k^{(i)} - \Sigma_{\mathrm{cvx}}\|_F \quad \text{and} \quad D_k := \frac{1}{10}\sum_{i=1}^{10} D_k^{(i)},$$
where $\Sigma_{\mathrm{cvx}}$ denotes the solution obtained by CVX and $\{\Sigma_k^{(i)}\}$ is the sequence generated from $\Sigma_0^{(i)}$ by SCFISTA and by ADMM, respectively.
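For readers working in Python, a reference solution analogous to $\Sigma_{\mathrm{cvx}}$ can be obtained with a generic conic modeling tool; the sketch below uses cvxpy as an assumed stand-in for the CVX setup used in the experiments, and the function names are ours.

```python
import numpy as np
import cvxpy as cp

def reference_solution(S_pair: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Solve problem (3) with a generic SDP solver (stand-in for Sigma_cvx)."""
    p = S_pair.shape[0]
    Sigma = cp.Variable((p, p), PSD=True)                              # constraint Sigma ⪰ 0
    objective = cp.Minimize(0.5 * cp.sum_squares(cp.multiply(W, Sigma - S_pair)))
    cp.Problem(objective).solve()
    return Sigma.value

def distance_to_solution(Sigma_k: np.ndarray, Sigma_cvx: np.ndarray) -> float:
    """D_k^(i) = ||Sigma_k^(i) - Sigma_cvx||_F, the quantity plotted in the figures."""
    return float(np.linalg.norm(Sigma_k - Sigma_cvx, "fro"))
```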
Figure 1 and Figure 2 show the relation between the distance to the solution and the iteration number. As these figures show, both SCFISTA and ADMM converge to the solution as the number of iterations increases. Moreover, SCFISTA, the accelerated algorithm for strongly convex functions, solved the HMLasso problem more efficiently.

5. Conclusions

In this paper, we presented two structural results on the HMLasso problem: the Lipschitz continuity of the gradient of the objective function and the strong convexity of the objective function. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem. Our numerical experiments suggest that accelerated algorithms for strongly convex functions are computationally attractive for this problem.

Author Contributions

Methodology, Writing - original draft, Shin-ya Matsushita; Software, Sasaki Hiromu. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by JSPS KAKENHI, Grant Number 23K03235.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to Professor Takayasu Yamaguchi of Akita Prefectural University for providing the initial inspiration for this research. This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Takada, M.; Fujisawa, H.; Nishikawa, T. HMLasso: Lasso with High Missing Rate. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019; pp. 3541–3547.
  2. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  3. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; CMS Books in Mathematics; Springer, 2017.
  4. Escalante, R.; Raydan, M. Alternating Projection Methods; SIAM: Philadelphia, PA, USA, 2011.
  5. Beck, A. First-Order Methods in Optimization; MOS-SIAM Series on Optimization; SIAM: Philadelphia, PA, USA, 2017.
  6. d’Aspremont, A.; Scieur, D.; Taylor, A. Acceleration Methods; Foundations and Trends in Optimization; Now Publishers, 2021.
  7. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
Figure 1. Numerical comparison of SCFISTA and ADMM at average missing rates of 20% (left) and 40% (right) for the design matrix in (3).
Figure 2. Numerical comparison of SCFISTA and ADMM at average missing rates of 60% (left) and 80% (right) for the design matrix in (3).