Structural Results on the HMLasso


Submitted: 01 August 2025; Posted: 01 August 2025


Abstract
HMLasso (Lasso with High Missing Rate) is a useful technique for sparse regression when a high-dimensional design matrix contains a large amount of missing data. Solving HMLasso requires computing an appropriate positive semidefinite symmetric matrix. In this paper, we present two structural results on the HMLasso problem. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem.

1. Introduction

Let X be an $n \times p$ design matrix and let y be an n-dimensional response vector. Consider the standard linear regression model
$$y = X\beta + \epsilon, \tag{1}$$
where $\epsilon$ is a noise term. A popular modeling assumption is that the regression vector is sparse. The Lasso [2] (least absolute shrinkage and selection operator) is among the most popular procedures for estimating an unknown sparse regression vector in a high-dimensional linear model. The Lasso is formulated as the following $\ell_1$-penalized regression problem:
$$\min_{\beta} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1, \tag{2}$$
where $\alpha > 0$ is a regularization parameter and $\|\cdot\|_1$ (resp. $\|\cdot\|_2$) is the $\ell_1$ (resp. $\ell_2$) norm. Here, we consider the case where the design matrix X contains missing data. Missing data are prevalent and often unavoidable, affecting not only the representativeness and quality of the data but also the results of Lasso regression. It is therefore important to develop methods that can handle missing data. HMLasso [1] was proposed to address this issue effectively and is formulated as follows:
$$\min_{\Sigma} \; \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{s.t.} \quad \Sigma \succeq 0, \tag{3}$$
$$\min_{\beta} \; \frac{1}{2}\beta^{\top}\hat{\Sigma}\beta - \rho_{\mathrm{pair}}^{\top}\beta + \alpha\|\beta\|_1, \tag{4}$$
where $\odot$ denotes the Hadamard product, $\|\cdot\|_F$ the Frobenius norm, $S_{\mathrm{pair}}, W \in \mathbb{R}^{p\times p}$ and $\rho_{\mathrm{pair}} \in \mathbb{R}^{p}$ are defined from the data (see Section 2 for details), and $\hat{\Sigma} = \operatorname*{argmin}_{\Sigma \succeq 0} \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$ (i.e., $\hat{\Sigma}$ is the solution to problem (3)). It is known that problem (4) can be rewritten equivalently as the Lasso (2), and in the sparse regression literature the proximal gradient method is the algorithm most commonly used for minimizing a sum of two convex functions. Therefore, the key to solving HMLasso is how fast and how accurately problem (3) can be solved.
To address this challenge, we establish two structural results on the HMLasso problem. First, we show that the gradient of the objective function of (3) is Lipschitz continuous; second, we show that the objective function of (3) is strongly convex. These results guarantee that accelerated algorithms for strongly convex functions can be applied to solve (3). Finally, we conduct numerical experiments on the real dataset considered in [1]. The numerical results show that an accelerated algorithm for strongly convex functions is effective for solving problem (3).

2. Preliminaries

This section reviews basic definitions, facts, and notation that will be used throughout the paper.
For two matrices $A, B \in \mathbb{R}^{n\times p}$, their Hadamard product $A \odot B$ is defined by
$$(A \odot B)_{jk} := A_{jk}B_{jk}.$$
The Frobenius inner product is defined by $\langle A, B\rangle_F := \mathrm{tr}(A^{\top}B)$. The Frobenius norm of A is defined by
$$\|A\|_F := \sqrt{\langle A, A\rangle_F} = \sqrt{\sum_{j=1}^{n}\sum_{k=1}^{p} A_{jk}^2}.$$
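As a quick numerical illustration (not part of the original text), the following NumPy snippet checks the identities $\langle A, B\rangle_F = \mathrm{tr}(A^{\top}B) = \sum_{j,k} A_{jk}B_{jk}$ and $\|A\|_F = \sqrt{\langle A, A\rangle_F}$ on arbitrary example matrices.

```python
import numpy as np

# Hypothetical example data; any real matrices of matching shape would do.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

inner_trace = np.trace(A.T @ B)      # <A, B>_F via the trace
inner_entrywise = np.sum(A * B)      # <A, B>_F via the Hadamard product A ⊙ B
fro_norm = np.sqrt(np.sum(A * A))    # ||A||_F via <A, A>_F

assert np.isclose(inner_trace, inner_entrywise)
assert np.isclose(fro_norm, np.linalg.norm(A, "fro"))
```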
$S^p$ denotes the set of symmetric matrices of size $p \times p$. We write $A \succeq 0$ (resp. $A \succ 0$) to denote that $A \in S^p$ is positive semidefinite (resp. positive definite). $S_+^p$ denotes the set of positive semidefinite matrices.
Let $D \subseteq \mathbb{R}^{p\times p}$. The indicator function $\iota_D : \mathbb{R}^{p\times p} \to (-\infty, \infty]$ of D is defined by
$$\iota_D(B) := \begin{cases} 0 & (B \in D), \\ \infty & (\text{otherwise}). \end{cases}$$
Let $f : \mathbb{R}^{p\times p} \to (-\infty, \infty]$ be a proper and convex function. The proximal mapping $\mathrm{prox}_f$ of f is defined by
$$\mathrm{prox}_f(\Sigma) := \operatorname*{argmin}_{Z \in \mathbb{R}^{p\times p}} \left\{ f(Z) + \frac{1}{2}\|Z - \Sigma\|_F^2 \right\}.$$
Let $\Sigma \in S^p$. Then there exist a diagonal matrix $\Lambda \in \mathbb{R}^{p\times p}$ and a $p \times p$ orthogonal matrix U such that $\Sigma = U\Lambda U^{\top}$. Let $P_{S_+^p}$ be the metric projection from $S^p$ onto $S_+^p$. Then the following holds (see, for example, [3, Example 29.31] and [4, Theorem 6.3]):
$$P_{S_+^p}(\Sigma) = U\Lambda_+ U^{\top},$$
where $\Lambda_+$ is the diagonal matrix obtained from $\Lambda$ by setting the negative entries to 0.
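In practice, $P_{S_+^p}$ is computed directly from an eigendecomposition. The following NumPy sketch illustrates this; the function name project_psd is ours, and the symmetrization step is a standard numerical safeguard rather than part of the statement above.

```python
import numpy as np

def project_psd(Sigma: np.ndarray) -> np.ndarray:
    """Metric projection of a symmetric matrix onto S_+^p by eigenvalue clipping."""
    Sigma = 0.5 * (Sigma + Sigma.T)        # symmetrize to guard against round-off
    lam, U = np.linalg.eigh(Sigma)         # Sigma = U diag(lam) U^T
    lam_plus = np.clip(lam, 0.0, None)     # set negative eigenvalues to 0
    return (U * lam_plus) @ U.T            # U diag(lam_plus) U^T
```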
Let $X \in \mathbb{R}^{n\times p}$ be a design matrix. Set
$$I_{jk} := \{\, i : X_{ij} \text{ and } X_{ik} \text{ are observed} \,\}$$
and let $n_{jk}$ be the number of elements of $I_{jk}$. We define the matrices $S_{\mathrm{pair}}$ and W as follows:
$$(S_{\mathrm{pair}})_{jk} := \begin{cases} \dfrac{1}{n_{jk}} \displaystyle\sum_{i \in I_{jk}} x_{ij}x_{ik} & (I_{jk} \neq \emptyset), \\ 0 & (I_{jk} = \emptyset), \end{cases} \qquad W_{jk} := \frac{n_{jk}}{n}.$$
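For illustration, the following NumPy sketch computes $S_{\mathrm{pair}}$ and W from a design matrix whose missing entries are encoded as NaN. It implements the definitions above directly; any centering or scaling of the data used in [1] is not included, and the function name is ours.

```python
import numpy as np

def pairwise_moments(X: np.ndarray):
    """Compute (S_pair, W) from an n x p design matrix X with np.nan for missing entries."""
    n, _ = X.shape
    observed = ~np.isnan(X)                     # mask of observed entries
    X0 = np.where(observed, X, 0.0)             # missing entries replaced by 0

    n_jk = observed.T.astype(float) @ observed.astype(float)  # pairwise observation counts
    cross = X0.T @ X0                           # sums of x_ij * x_ik over jointly observed rows
    with np.errstate(invalid="ignore", divide="ignore"):
        S_pair = np.where(n_jk > 0, cross / n_jk, 0.0)        # average, or 0 if I_jk is empty
    W = n_jk / n                                # W_jk = n_jk / n
    return S_pair, W
```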
Let $L \in [0, \infty)$ and let $h : \mathbb{R}^{p\times p} \to \mathbb{R}$ be a differentiable function. The gradient $\nabla h$ of h is said to be L-Lipschitz continuous if
$$\|\nabla h(\Sigma_1) - \nabla h(\Sigma_2)\|_F \le L\,\|\Sigma_1 - \Sigma_2\|_F \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}).$$
This condition is often called L-smoothness in the literature. Let $h : \mathbb{R}^{p\times p} \to \mathbb{R}$ be L-smooth on $\mathbb{R}^{p\times p}$. Then we can upper bound the function h as
$$h(\Sigma_1) \le h(\Sigma_2) + \langle \nabla h(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F + \frac{L}{2}\|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p})$$
(see, for example, [5, Theorem 5.8] and [6, Theorem A.1]).
Let $\mu \in (0, \infty)$ and let $g : \mathbb{R}^{p\times p} \to \mathbb{R} \cup \{+\infty\}$. The function g is said to be $\mu$-strongly convex if, for each $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}$ and $\lambda \in (0, 1)$, we have
$$g(\lambda\Sigma_1 + (1-\lambda)\Sigma_2) \le \lambda g(\Sigma_1) + (1-\lambda)g(\Sigma_2) - \frac{\mu}{2}\lambda(1-\lambda)\|\Sigma_1 - \Sigma_2\|_F^2. \tag{9}$$
Suppose that g is differentiable. Then (9) is equivalent to the following condition:
$$\langle \nabla g(\Sigma_1) - \nabla g(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F \ge \mu\,\|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}). \tag{10}$$

3. Main Results

In this section, we present two structural results on the HMLasso problem (3).
Define $f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$. The gradient $\nabla f$ of f is given by
$$\nabla f(\Sigma) = \nabla\left(\frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2\right) = W \odot W \odot (\Sigma - S_{\mathrm{pair}}) \tag{11}$$
(see [1]).
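For reference, f and its gradient (11) amount to a few Hadamard products; a minimal NumPy sketch (function names ours):

```python
import numpy as np

def f_obj(Sigma: np.ndarray, W: np.ndarray, S_pair: np.ndarray) -> float:
    """Objective of problem (3): (1/2) ||W ⊙ (Sigma - S_pair)||_F^2."""
    R = W * (Sigma - S_pair)
    return 0.5 * float(np.sum(R * R))

def grad_f(Sigma: np.ndarray, W: np.ndarray, S_pair: np.ndarray) -> np.ndarray:
    """Gradient (11): ∇f(Sigma) = W ⊙ W ⊙ (Sigma - S_pair)."""
    return W * W * (Sigma - S_pair)
```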

3.1. Lipschitz continuity

The first structural result concerns the gradient of the objective function in (3): we show that $\nabla f$ is Lipschitz continuous.
Lemma 1. 
The gradient $\nabla f$ of f is Lipschitz continuous and its Lipschitz constant is $\|W \odot W\|_F$.
Proof. 
$$\begin{aligned}
\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 &= \|W \odot W \odot (\Sigma_1 - S_{\mathrm{pair}}) - W \odot W \odot (\Sigma_2 - S_{\mathrm{pair}})\|_F^2 \\
&= \|W \odot W \odot (\Sigma_1 - \Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2.
\end{aligned}$$
On the other hand,
$$\|W \odot W\|_F^2\,\|\Sigma_1 - \Sigma_2\|_F^2 = \underbrace{\sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4}_{=:\ \bar{\alpha}} \cdot \sum_{j=1}^{p}\sum_{k=1}^{p} (\Sigma_{1,jk} - \Sigma_{2,jk})^2.$$
This implies
$$\|W \odot W\|_F^2\,\|\Sigma_1 - \Sigma_2\|_F^2 - \|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} \left(\bar{\alpha} - W_{jk}^4\right)(\Sigma_{1,jk} - \Sigma_{2,jk})^2 \ge 0,$$
where the inequality follows from $\bar{\alpha} - W_{jk}^4 \ge 0$ for any j and k. Hence
$$\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F \le \|W \odot W\|_F\,\|\Sigma_1 - \Sigma_2\|_F.$$
   □

3.2. Strong monotonicity

We next consider the strong monotonicity of $\nabla f$, which yields the strong convexity of f. This is our second result.
Lemma 2. 
f is $\min_{l,m}(W \odot W)_{lm}$-strongly convex.
Proof. 
We show (10) with constant $\min_{l,m}(W \odot W)_{lm}$. Let $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p\times p}$. Then
$$\begin{aligned}
\langle \nabla f(\Sigma_1) - \nabla f(\Sigma_2), \Sigma_1 - \Sigma_2\rangle_F &= \langle W \odot W \odot (\Sigma_1 - \Sigma_2), \Sigma_1 - \Sigma_2\rangle_F = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^2\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2 \\
&\ge \sum_{j=1}^{p}\sum_{k=1}^{p} \min_{l,m}(W \odot W)_{lm}\,(\Sigma_{1,jk} - \Sigma_{2,jk})^2 = \min_{l,m}(W \odot W)_{lm}\,\|\Sigma_1 - \Sigma_2\|_F^2,
\end{aligned}$$
where the inequality follows from $W_{jk}^2 \ge \min_{l,m}(W \odot W)_{lm}$ for any j and k. This implies that f is $\min_{l,m}(W \odot W)_{lm}$-strongly convex.    □
Remark 1. 
From Lemmas 1 and 2, the objective function of (3) is strongly convex and has a Lipschitz continuous gradient. It should be noted that faster convergence rates can be guaranteed for smooth and strongly convex objective functions than for merely convex ones (see, for example, [6]).
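In particular, both constants can be read off directly from W; a short sketch (function name ours):

```python
import numpy as np

def smoothness_and_strong_convexity(W: np.ndarray):
    """Constants from Lemmas 1 and 2 for f(Sigma) = (1/2)||W ⊙ (Sigma - S_pair)||_F^2."""
    L = float(np.linalg.norm(W * W, "fro"))   # Lipschitz constant of ∇f (Lemma 1)
    mu = float(np.min(W * W))                 # strong convexity constant of f (Lemma 2)
    return L, mu
```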

4. Numerical Experiments

In this section, we consider a strongly convex variant of FISTA [6, Algorithm 19] to solve problem (3).

4.1. Strongly convex FISTA

Let $f : \mathbb{R}^{p\times p} \to \mathbb{R}$ be an L-smooth and $\mu$-strongly convex function with $\mathrm{dom}(f) = \mathbb{R}^{p\times p}$, and let $h : \mathbb{R}^{p\times p} \to \mathbb{R} \cup \{\infty\}$ be a convex function with $\{\Sigma : h(\Sigma) < \infty\} \neq \emptyset$. We consider the problem of minimizing the sum of f and h:
$$\min_{\Sigma \in \mathbb{R}^{p\times p}} f(\Sigma) + h(\Sigma). \tag{12}$$
In this setting, forward-backward splitting strategies are classical methods for solving (12). In the context of acceleration methods, the fast iterative shrinkage-thresholding algorithm (FISTA) was introduced by Beck and Teboulle [7] based on the idea of forward-backward splitting. This topic is addressed in many references; we refer to [3,5,6].
Here, we focus on the following strongly convex variant of FISTA involving backtracking investigated in [6].
Algorithm 1: Strongly convex FISTA [6, Algorithm 19]
Input: An initial point $\Sigma_0 \in \mathbb{R}^{p\times p}$ and an initial estimate $L_0 > \mu$.
1: Initialize $\Phi_0 = \Sigma_0$, $t_0 = 0$, and some $\alpha > 1$.
2: for $k = 0, 1, \ldots$ do
3:    $L_{k+1} = L_k$
4:    while true do
5:        $q_{k+1} = \mu / L_{k+1}$
6:        $t_{k+1} = \dfrac{2t_k + 1 + \sqrt{4t_k + 4q_{k+1}t_k^2 + 1}}{2(1 - q_{k+1})}$
7:        set $\tau_k = \dfrac{(t_{k+1} - t_k)(1 + q_{k+1}t_k)}{t_{k+1} + 2q_{k+1}t_k t_{k+1} - q_{k+1}t_k^2}$ and $\delta_k = \dfrac{t_{k+1} - t_k}{1 + q_{k+1}t_{k+1}}$
8:        $\Psi_k = \Sigma_k + \tau_k(\Phi_k - \Sigma_k)$
9:        $\Sigma_{k+1} = \mathrm{prox}_{h/L_{k+1}}\left(\Psi_k - \frac{1}{L_{k+1}}\nabla f(\Psi_k)\right)$
10:       $\Phi_{k+1} = (1 - q_{k+1}\delta_k)\Phi_k + q_{k+1}\delta_k\Psi_k + \delta_k(\Sigma_{k+1} - \Psi_k)$
11:       if $f(\Sigma_{k+1}) \le f(\Psi_k) + \langle \nabla f(\Psi_k), \Sigma_{k+1} - \Psi_k\rangle_F + \frac{L_{k+1}}{2}\|\Sigma_{k+1} - \Psi_k\|_F^2$ holds then
12:           break  {k will be incremented.}
13:       else
14:           $L_{k+1} = \alpha L_{k+1}$  {Recompute new $L_{k+1}$.}
15:       end if
16:    end while
17: end for
Output: An approximate solution $\Sigma_{k+1}$
Remark 2. 
We demonstrate how strongly convex FISTA can be applied to (3). Set
$$f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{and} \quad h(\Sigma) := \iota_{S_+^p}(\Sigma).$$
In this case, problem (3) is a special instance of problem (12). The gradient $\nabla f$ can be computed by (11). Moreover, the proximal mapping $\mathrm{prox}_h$ is $P_{S_+^p}$ (see, for example, [3, Example 12.25]), and hence the computations in the algorithm are simple.
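To make this concrete, the following is a minimal Python sketch of Algorithm 1 applied to problem (3), with f, $\nabla f$, and $\mathrm{prox}_h$ specialized as above. It assumes a zero initial point (the experiments in Section 4.2 use random initial points) and the constants from Lemmas 1 and 2; the function names are ours, and the sketch is an illustration rather than the MATLAB implementation used in the experiments.

```python
import numpy as np

def project_psd(Sigma):
    """prox_h for h = indicator of S_+^p: projection by eigenvalue clipping."""
    Sigma = 0.5 * (Sigma + Sigma.T)
    lam, U = np.linalg.eigh(Sigma)
    return (U * np.clip(lam, 0.0, None)) @ U.T

def scfista_hmlasso(S_pair, W, L0=1.0, alpha=1.1, max_iter=500):
    """Sketch of Algorithm 1 (strongly convex FISTA with backtracking) for problem (3)."""
    p = S_pair.shape[0]
    f = lambda S: 0.5 * np.sum((W * (S - S_pair)) ** 2)          # objective of (3)
    grad_f = lambda S: W * W * (S - S_pair)                      # gradient (11)

    mu = np.min(W * W)                       # strong convexity constant (Lemma 2)
    Sigma = np.zeros((p, p))                 # initial point Sigma_0 (zero matrix here)
    Phi = Sigma.copy()
    t = 0.0
    L = max(L0, mu * (1.0 + 1e-12))          # the algorithm requires L_0 > mu

    for _ in range(max_iter):
        L_next = L
        while True:                          # backtracking search for L_{k+1}
            q = mu / L_next
            t_next = (2.0 * t + 1.0 + np.sqrt(4.0 * t + 4.0 * q * t**2 + 1.0)) / (2.0 * (1.0 - q))
            tau = (t_next - t) * (1.0 + q * t) / (t_next + 2.0 * q * t * t_next - q * t**2)
            delta = (t_next - t) / (1.0 + q * t_next)

            Psi = Sigma + tau * (Phi - Sigma)                     # extrapolated point
            Sigma_next = project_psd(Psi - grad_f(Psi) / L_next)  # prox_{h/L}(gradient step)
            Phi_next = (1.0 - q * delta) * Phi + q * delta * Psi + delta * (Sigma_next - Psi)

            diff = Sigma_next - Psi
            if f(Sigma_next) <= f(Psi) + np.sum(grad_f(Psi) * diff) + 0.5 * L_next * np.sum(diff**2):
                break                        # sufficient-decrease condition holds
            L_next = alpha * L_next          # otherwise increase the estimate of L

        Sigma, Phi, t, L = Sigma_next, Phi_next, t_next, L_next

    return Sigma
```

The per-iteration cost is dominated by one eigendecomposition of a $p \times p$ matrix for the projection, i.e., $O(p^3)$, plus a few Hadamard products.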

4.2. Residential Building Dataset

We compared the performance of the strongly convex FISTA (SCFISTA) and the alternating direction method of multipliers (ADMM) used in [1] for solving the HMLasso problem (3). All experiments were conducted on a PC with an Apple M1 Max CPU and 32 GB of RAM. Both methods were implemented in MATLAB (R2025a).
The numerical experiment uses the Residential Building dataset from the UCI Machine Learning Repository. The data consist of n = 300 samples and p = 27 variables. We set the average missing rates to 20%, 40%, 60%, and 80%. For the parameters $L_0$, $\alpha$, and $\mu$ in the algorithm, we set $L_0 = 1$, $\alpha = 1.1$, and $\mu = \min_{l,m}(W \odot W)_{lm}$. We chose 10 random initial points $\Sigma_0^{(i)} \in \mathbb{R}^{p\times p}$ ($i = 1, 2, \ldots, 10$), whose entries are drawn from a standard normal distribution. The figures report the following quantities:
$$D_k^{(i)} := \|\Sigma_k^{(i)} - \Sigma_{\mathrm{cvx}}\|_F \quad \text{and} \quad D_k := \frac{1}{10}\sum_{i=1}^{10} D_k^{(i)},$$
where $\Sigma_{\mathrm{cvx}}$ denotes the solution obtained by CVX and $\{\Sigma_k^{(i)}\}$ is the sequence generated from $\Sigma_0^{(i)}$ by SCFISTA and by ADMM, respectively.
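For readers working in Python, a reference solution analogous to $\Sigma_{\mathrm{cvx}}$ can be obtained with a generic conic modeling tool; the sketch below uses cvxpy as an assumed stand-in for the CVX setup used in the experiments, and the function names are ours.

```python
import numpy as np
import cvxpy as cp

def reference_solution(S_pair: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Solve problem (3) with a generic SDP solver (stand-in for Sigma_cvx)."""
    p = S_pair.shape[0]
    Sigma = cp.Variable((p, p), PSD=True)                              # constraint Sigma ⪰ 0
    objective = cp.Minimize(0.5 * cp.sum_squares(cp.multiply(W, Sigma - S_pair)))
    cp.Problem(objective).solve()
    return Sigma.value

def distance_to_solution(Sigma_k: np.ndarray, Sigma_cvx: np.ndarray) -> float:
    """D_k^(i) = ||Sigma_k^(i) - Sigma_cvx||_F, the quantity plotted in the figures."""
    return float(np.linalg.norm(Sigma_k - Sigma_cvx, "fro"))
```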
Figure 1 and Figure 2 show the relation between the distance to the solution and the iteration number. As these figures show, both SCFISTA and ADMM converge to the solution as the number of iterations increases. Moreover, SCFISTA, the accelerated algorithm for strongly convex functions, solved the HMLasso problem more efficiently.

5. Conclusions

In this paper, we presented two structural results on the HMLasso problem: the Lipschitz continuity of the gradient of the objective function and the strong convexity of the objective function. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem. Our numerical experiments suggest that accelerated algorithms for strongly convex functions are computationally attractive for this problem.

Author Contributions

Methodology, Writing - original draft, Shin-ya Matsushita; Software, Sasaki Hiromu. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by JSPS KAKENHI, Grant Number 23K03235.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to Professor Takayasu Yamaguchi of Akita Prefectural University for providing the initial inspiration for this research. This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Takada, M.; Fujisawa, H.; Nishikawa, T. HMLasso: Lasso with High Missing Rate. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019; pp. 3541–3547.
  2. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  3. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; CMS Books in Mathematics; Springer, 2017.
  4. Escalante, R.; Raydan, M. Alternating Projection Methods; SIAM: Philadelphia, PA, USA, 2011.
  5. Beck, A. First-Order Methods in Optimization; MOS-SIAM Series on Optimization; SIAM: Philadelphia, PA, USA, 2017.
  6. d’Aspremont, A.; Scieur, D.; Taylor, A. Acceleration Methods; Foundations and Trends in Optimization; Now Publishers, 2021.
  7. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
Figure 1. Numerical comparison of SCFISTA and ADMM at average missing rates of 20% (left) and 40% (right) for the design matrix in (3).
Figure 2. Numerical comparison of SCFISTA and ADMM at average missing rates of 60% (left) and 80% (right) for the design matrix in (3).