1. Introduction
Logistic regression models are a powerful and popular technique for modeling the relationship between the predictors and a categorical response variable. Let
$({x}_{1},{y}_{1}),\cdots ,({x}_{n},{y}_{n})$ be independent pairs of observed data, which are realizations of the random vector
$(X,Y)$, with
$p$-dimensional predictors
$X\in {\mathbb{R}}^{p}$ and univariate binary response variable
$Y\in \{0,1\}$.
$(X,Y)$ is assumed to satisfy the logistic regression model
$$\mathbb{P}(Y=1\mid X=x)=G({x}^{T}{\beta}^{0})=\frac{\exp ({x}^{T}{\beta}^{0})}{1+\exp ({x}^{T}{\beta}^{0})},\qquad (1)$$
where
${\beta}^{0}\in {\mathbb{R}}^{p}$ is a regression vector to be estimated. We are especially concerned with the sparse logistic regression problem in which the dimension
p is high and the sample size
n may be small, the so-called "small
n, large
p" framework, which is a variable selection problem for high-dimensional data.
When dealing with high-dimensional data, there are usually two important considerations: model sparsity and prediction accuracy. The Lasso [
1] was proposed to address these two objectives, since the Lasso can determine submodels with a moderate number of parameters that still fit the data adequately. Other similar methods include SCAD [
2], the elastic net [
3], the Dantzig selector [
4], and MCP [
5], among others. In high-dimensional logistic regression models, research on the Lasso ranges from asymptotic results, including the consistency and asymptotic distribution of the estimator, e.g., Huang et al. [
6], Bianco et al. [
7], to the non-asymptotic results, including the non-asymptotic oracle inequalities on the estimation and prediction errors, e.g., Abramovich et al. [
8], Huang et al. [
9] and Yin [
10].
In many applications, predictors can often be thought of as grouped. For example, in genome-wide association studies (GWAS), genes usually do not act individually; their effects are reflected in the covariation of several genes with each other. Similarly, in histologically normal epithelium (NlEpi) studies, we need to consider the non-linear effects of genes for microarray data. As with the Lasso, incorporating this grouping information in the modeling process should improve the interpretability and accuracy of the model. Yuan and Lin [
11] proposed an extension of the Lasso, called the Group Lasso, which imposes an
${L}_{2}$ penalty on individual groups of variables and then an
${L}_{1}$ penalty on the resulting block norms, rather than only an
${L}_{1}$ penalty on individual variables. Suppose
${x}_{i}$ and
${\beta}^{0}$ in model (
1) are divided into
g known groups, where we consider a partition
$\{{G}_{1},\cdots ,{G}_{g}\}$ of
$\{1,\cdots ,p\}$ into groups and denote the cardinality of a group
${G}_{l}$ by
$|{G}_{l}|$,
${x}_{i}={({x}_{i\left(1\right)}^{T},{x}_{i\left(2\right)}^{T},\cdots ,{x}_{i\left(g\right)}^{T})}^{T}$,
${\beta}^{0}={({\left({\beta}_{\left(1\right)}^{0}\right)}^{T},{\left({\beta}_{\left(2\right)}^{0}\right)}^{T},\cdots ,{\left({\beta}_{\left(g\right)}^{0}\right)}^{T})}^{T}$,
${x}_{i\left(l\right)}\in {\mathbb{R}}^{|{G}_{l}|}$,
${\beta}_{\left(l\right)}^{0}\in {\mathbb{R}}^{|{G}_{l}|}$. We wish to achieve sparsity at the level of groups, i.e., to estimate
${\beta}^{0}$ such that
${\beta}_{\left(l\right)}^{0}=0$ for some of the groups
$l\in \{1,\cdots ,g\}$. When using high-dimensional logistic regression models, the Group Lasso provides an estimator for
${\beta}^{0}$:
$${\widehat{\beta}}^{GL}=\underset{\beta \in {\mathbb{R}}^{p}}{\arg \min}\left\{\ell (\beta )+\lambda \sum_{l=1}^{g}{\omega}_{l}{\parallel {\beta}_{(l)}\parallel}_{2}\right\},\qquad (2)$$
where $\ell (\beta )$ is the negative log-likelihood defined in Section 2,
$\lambda \ge 0$ is a tuning parameter which controls the amount of penalization,
${\omega}_{l}=\sqrt{|{G}_{l}|}$ is used to normalize across groups of different sizes, and
${\parallel \cdot \parallel}_{2}$ denotes the
${L}_{2}$ norm of a vector. Meier et al. [
12] established the asymptotic consistency theory of the Group Lasso for logistic regression, Wang et al. [
13] analyzed the rates of convergence, Blazere et al. [
14] stated oracle inequalities and Kwemou [
15] studied non-asymptotic oracle inequalities. Other important references are the works of Nowakowski et al. [
16] and Zhang et al. [
17]. In terms of computational algorithms, Meier et al. [
12] applied the block coordinate descent algorithm of Tseng [
18] to the Group Lasso for logistic regression, and Breheny and Huang [
19] proposed the group descent algorithm. These approaches are fast enough to compute the exact coefficients at the selected values of
$\lambda $.
However, it is well known that for the Lasso (or the Group Lasso) in linear regression models, the optimal value of the tuning parameter
$\lambda $ depends on the unknown parameter
${\sigma}^{2}$, the homogeneous noise variance, whose accurate estimation is generally difficult when
$p\gg n$. To solve this problem, Belloni et al. [
20] proposed the square-root Lasso, which removes this unknown parameter by minimizing the square root of the empirical loss function, equivalently, by using a weighted score function. Bunea et al. [
21] extended the ideas behind the square-root Lasso to group selection and developed the Group square-root Lasso. Inspired by the Group square-root Lasso, we propose a new penalized weighted score function method, which replaces the original score function (i.e., the gradient of the negative log-likelihood function) with a weighted score function (Huang and Wang [
22]), to study sparse logistic regression with the Group Lasso penalty. We derive convergence rates for the estimation error and provide a direct choice of the tuning parameter. Moreover, we propose a modified block coordinate descent algorithm based on the weighted score function, which greatly reduces the computational cost.
The remainder of this paper is organized as follows. In
Section 2, we apply the idea behind the Group square-root Lasso to sparse logistic models and develop our method, the penalized weighted score function method. In
Section 3, we establish non-asymptotic bounds for our new estimator and a direct choice of the tuning parameter. In
Section 4, we present the weighted block coordinate descent algorithm. In
Section 5, numerical simulations show the advantages of our algorithm in terms of selection performance and computational time. In
Section 6, we analyze real gene and musk datasets to support the simulation and theoretical results.
Section 7 concludes our work. All proofs are given in the Appendix.
$\mathbf{Notation}:$ Throughout the paper, denote by $I=\{l:{\parallel {\beta}_{\left(l\right)}^{0}\parallel}_{2}\ne 0\}$ the set of non-zero groups of ${\beta}^{0}$ and let $s=\mathrm{card}(I)$ be the number of non-zero groups of ${\beta}^{0}$. For any $\delta \in {\mathbb{R}}^{p}$ and subset $I$, we denote by ${\delta}_{I}$ the vector that has the same coordinates as $\delta $ on $I$ and zero coordinates on the complement ${I}^{C}$ of $I$. For a function $f\left(\beta \right)\in \mathbb{R}$, we denote by $\nabla f\left(\beta \right)\in {\mathbb{R}}^{p}$ its gradient and by $\mathcal{H}\left(\beta \right)\in {\mathbb{R}}^{p\times p}$ its Hessian matrix at $\beta \in {\mathbb{R}}^{p}$. Define the ${L}_{q}$ norm of a vector $v$ as ${\parallel v\parallel}_{q}={({\sum}_{i}|{v}_{i}{|}^{q})}^{1/q}$, and for any vector $\beta \in {\mathbb{R}}^{p}$ with group structure, define the block norm of $\beta $ for any $0\le q\le \infty $ as ${\parallel \beta \parallel}_{2,q}={({\sum}_{l=1}^{g}{\parallel {\beta}_{\left(l\right)}\parallel}_{2}^{q})}^{1/q}$. In particular, ${\parallel \beta \parallel}_{2,0}={\sum}_{l=1}^{g}{\mathbf{1}}_{{\beta}_{\left(l\right)}\ne 0}$ counts the number of non-zero groups, ${\parallel \beta \parallel}_{2,1}={\sum}_{l=1}^{g}{\parallel {\beta}_{\left(l\right)}\parallel}_{2}$ is the Group Lasso penalty form, ${\parallel \beta \parallel}_{2,2}={\parallel \beta \parallel}_{2}$ is the ${L}_{2}$ norm, and ${\parallel \beta \parallel}_{2,\infty}={\max}_{l}{\parallel {\beta}_{\left(l\right)}\parallel}_{2}$ is the largest ${L}_{2}$ norm over all groups. Moreover, $\Phi \left(x\right)$ denotes the cumulative distribution function of the standard normal distribution.
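The block norms defined above can be sketched numerically; the following is a minimal illustration (the function name and the group encoding are our own, not from the paper):

```python
import numpy as np

def block_norm(beta, groups, q):
    """Block norm ||beta||_{2,q}: the L2 norm within each group,
    combined across groups by the Lq norm (q = 0 counts non-zero groups)."""
    norms = np.array([np.linalg.norm(beta[list(g)]) for g in groups])
    if q == 0:
        return int(np.count_nonzero(norms))
    if np.isinf(q):
        return float(norms.max())
    return float((norms ** q).sum() ** (1.0 / q))
```

For example, with two active groups out of three, `q = 1` sums the within-group $L_2$ norms (the Group Lasso penalty), while `q = 0` counts the active groups.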
2. Penalized weighted score function method
Recall model (
1), the loss function (i.e., the negative log-likelihood) is given by
$$\ell (\beta )=\frac{1}{n}\sum_{i=1}^{n}\left\{\log \left(1+\exp ({x}_{i}^{T}\beta )\right)-{y}_{i}{x}_{i}^{T}\beta \right\},$$
leading to the score function
$$\nabla \ell (\beta )=\frac{1}{n}\sum_{i=1}^{n}{x}_{i}\left(G({x}_{i}^{T}\beta )-{y}_{i}\right),\qquad G(t)=\frac{\exp (t)}{1+\exp (t)}.$$
Note that the solution
${\widehat{\beta}}^{GL}$ of model (
2) satisfies the KKT conditions
$$\nabla \ell {({\widehat{\beta}}^{GL})}_{(l)}=-\lambda {\omega}_{l}\frac{{\widehat{\beta}}_{(l)}^{GL}}{{\parallel {\widehat{\beta}}_{(l)}^{GL}\parallel}_{2}}\ \ \text{if}\ {\widehat{\beta}}_{(l)}^{GL}\ne 0,\qquad {\parallel \nabla \ell {({\widehat{\beta}}^{GL})}_{(l)}\parallel}_{2}\le \lambda {\omega}_{l}\ \ \text{if}\ {\widehat{\beta}}_{(l)}^{GL}=0,\qquad (3)$$
for all
$l=1,\cdots ,g$. The left side of equation (
3) is the score function for logistic regression with group structure, which shows that
${\widehat{\beta}}^{GL}$ is actually a penalized score function estimator. To obtain a good estimator, usually we require that the inequality
$\lambda {\omega}_{l}\ge c{\parallel \nabla \ell \left({\beta}^{0}\right)\parallel}_{2,\infty}$ for all
$l=1,\cdots ,g$ and some constant
$c\ge 1$ holds with high probability (Meier et al. [
12] and Kwemou [
15]). However, the random part
$G\left({x}_{i}^{T}{\beta}^{0}\right)-{y}_{i}$ for
$\nabla \ell \left({\beta}^{0}\right)$, the score function valued at
$\beta ={\beta}^{0}$, has variance
$G\left({x}_{i}^{T}{\beta}^{0}\right)(1-G\left({x}_{i}^{T}{\beta}^{0}\right))$, which is also the variance of the binary random variable
${Y}_{i}|{X}_{i}={x}_{i}$. Obviously, these binary noises are not homogeneous, unlike the noises in linear regression models, so a single tuning parameter for all of the coefficients is not a good choice.
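A short numerical check of this heterogeneity (a sketch; the helper names are ours):

```python
import math

def logistic(t):
    """G(t) = exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))

def binary_noise_variance(t):
    """Variance G(t)(1 - G(t)) of Y - G(t) at linear predictor t = x_i^T beta^0."""
    g = logistic(t)
    return g * (1.0 - g)

# Unlike the constant sigma^2 of linear regression, the variance
# shrinks as the linear predictor moves away from zero.
print(binary_noise_variance(0.0))  # 0.25
print(binary_noise_variance(3.0))  # about 0.045
```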
We apply the idea from the Group square-root Lasso to solve the above problem of choosing the tuning parameter, and develop our method as follows. Huang and Wang [
22] formed a class of root-consistent estimating functions by weighting the score function for logistic regression
where
$\psi (\cdot)$ is a weight function of
${x}_{i}^{T}\beta $. This requires choosing a suitable weight function to ensure that
$\nabla {\ell}_{\psi}\left(\beta \right)$ is almost integrable for
$\beta $. Then, replacing the score function in equation (
3) with the weighted score function, we develop a penalized weighted score function estimate
$\widehat{\beta}$, which is a solution of the following equation:
Let
${\ell}_{\psi}\left(\beta \right)$ be the loss function corresponding to the weighted score function (
4); then solving Equation (
5) is equivalent to solving the following optimization problem:
Our method is motivated by Bunea et al. [
21], who minimize the Group square-root Lasso criterion for the linear model:
where
$\mathbb{Y}\in {\mathbb{R}}^{n\times 1}$ and
$\mathbb{X}\in {\mathbb{R}}^{n\times p}$. When
$\parallel \mathbb{Y}-\mathbb{X}{\widehat{\beta}}^{GSL}{\parallel}_{2}$ is non-zero, the Group square-root Lasso estimator
${\widehat{\beta}}^{GSL}$ satisfies the KKT condition
Compared with the KKT conditions for the Group Lasso, the Group square-root Lasso adds the weight factor $(\sqrt{n}\,{\parallel \mathbb{Y}-\mathbb{X}{\widehat{\beta}}^{GSL}\parallel}_{2})^{-1}$, which implicitly estimates the homogeneous noise variance and thereby allows the tuning parameter $\lambda $ to be independent of it. Thus, the Group square-root Lasso can simultaneously estimate the grouped coefficients and directly choose the tuning parameter.
A drawback of the Group square-root Lasso is that it can directly select the tuning parameter only in linear regression models; in logistic regression models, there is no direct way to select the tuning parameter. The penalized weighted score function method fills this gap. We will discuss this in more detail in the next section.
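To make the construction concrete, here is a sketch of a weighted score of the form $\frac{1}{n}\sum_{i}\psi ({x}_{i}^{T}\beta ){x}_{i}(G({x}_{i}^{T}\beta )-{y}_{i})$; this specific form, the function names, and the identity-weight default are our assumptions for illustration (the weight function the paper actually adopts is specified later in Theorem 2):

```python
import numpy as np

def logistic_cdf(t):
    """G(t) = exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))

def weighted_score(beta, X, y, psi=lambda t: np.ones_like(t)):
    """Weighted score (1/n) * sum_i psi(x_i^T beta) * (G(x_i^T beta) - y_i) * x_i.

    With psi identically 1 this is the ordinary logistic score, i.e. the
    gradient of the negative log-likelihood; other choices of psi reweight
    each observation's contribution.
    """
    eta = X @ beta                    # linear predictors x_i^T beta
    resid = logistic_cdf(eta) - y     # G(x_i^T beta) - y_i
    return X.T @ (psi(eta) * resid) / X.shape[0]
```

With the identity weight, this reduces to $\nabla \ell (\beta )$, which can be verified against a finite-difference gradient of the negative log-likelihood.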
3. Statistical properties
In this section, we establish non-asymptotic oracle inequalities for the penalized weighted score function estimate and give a direct choice for the tuning parameter.
Throughout this paper, we consider a fixed design setting (i.e., ${x}_{1},\cdots ,{x}_{n}$ are considered deterministic), and we make the following assumptions:
- (A1)
There exists a positive constant $\mathcal{M}<\infty $ such that ${max}_{1\le i\le n}{max}_{1\le l\le g}\sqrt{{\sum}_{j\in {G}_{l}}{x}_{ij}^{2}}\le \mathcal{M}$.
- (A2)
The dimensions $n,p$ satisfy $n\le p=o\left({e}^{{n}^{1/3}}\right)$, and $p/\epsilon >2$ for $\epsilon \in (0,1)$.
- (A3)
There exists
$\mathcal{N}\left({\beta}^{0}\right)>0$ such that
- (A4)
Let ${\ell}_{\psi}(\cdot):{\mathbb{R}}^{p}\mapsto \mathbb{R}$ be a convex, three times differentiable function such that for all $u,v\in {\mathbb{R}}^{p}$, the function $g\left(t\right)={\ell}_{\psi}(u+tv)$ satisfies $|g'''\left(t\right)|\le {\tau}_{0}{\max}_{1\le i\le n}\left|{x}_{i}^{T}v\right|\,g''\left(t\right)$ for all $t\in \mathbb{R}$, where ${\tau}_{0}>0$ is a constant.
Assumption (A1) controls the bounds of the predictors, since real data are usually bounded. Assumption (A2) controls the sparsity of the data and the lower bound on the probability with which the non-asymptotic property holds. Assumption (A3) ensures that the variance of each component of
$\nabla {\ell}_{\psi}\left({\beta}^{0}\right)$ is bounded for a suitably chosen weight function
$\psi (\cdot)$. Assumption (A4) is similar to Proposition 1 of Bach [
23]. Under assumption (A4), we can obtain lower and upper Taylor expansions of the loss function
${\ell}_{\psi}(\cdot)$, from which we derive the non-asymptotic results.
Moreover, the restricted eigenvalue condition plays a key role in deriving oracle inequalities. For the Group Lasso problem in high-dimensional linear regression models, the oracle property under the group restricted eigenvalue condition was discussed by Hu et al. [
24] and extended to logistic regression models by Zhang et al. [
17]. To establish the desired group restricted eigenvalue condition, we introduce the following group restricted set
which is a grouped version of the restricted set
${\theta}_{\alpha}:=\{\vartheta \in {\mathbb{R}}^{p}\;:\;{\parallel {\vartheta}_{{I}^{C}}\parallel}_{1}\le \alpha {\parallel {\vartheta}_{I}\parallel}_{1}\}$ mentioned in Bickel et al. [
25], where
${W}_{I}$ is a diagonal matrix with the
jth diagonal element
${\omega}_{j}$ if
$j\in I$, and 0 otherwise. Based on the group restricted set (
8), we propose the following group restricted eigenvalue condition:
- (A5)
For some integer
s such that
$1<s<g$ and a positive number
$\alpha $, the following condition holds
where
${\mathcal{H}}_{\psi}\left({\beta}^{0}\right)$ is the Hessian matrix of
${\ell}_{\psi}\left({\beta}^{0}\right)$. Unlike the restricted eigenvalue condition of Bickel et al. [
25] for linear regression models, the group restricted eigenvalue condition for logistic regression replaces the
${L}_{2}$ norm by the block norm in the denominator and the Gram matrix by the Hessian matrix
${\mathcal{H}}_{\psi}\left({\beta}^{0}\right)$ in the numerator of (
9).
Remark 1.
The Hessian matrix of ${\ell}_{\psi}\left(\beta \right)$ is given by
Bach [
23] has already shown the Hessian matrix of
$\ell \left(\beta \right)$ is positive definite on some restricted sets. If the chosen weighted function
$\psi \left({x}_{i}^{T}\beta \right)$ makes the loss function
${\ell}_{\psi}\left(\beta \right)$ satisfy the assumption (A3),
${\mathcal{H}}_{\psi}\left(\beta \right)$ is also positive definite on the group restricted set (
8). Such weight functions do exist and will be described later. In addition, the group restricted eigenvalue condition effectively controls the estimation error, yielding an estimator with good statistical properties and reliable results.
Theorem 1.
Assume that (A1), (A2), (A3) and (A4) are satisfied. Let $\lambda <\frac{k(1-z)\mu (s,\alpha )}{4{\tau}_{0}\mathcal{M}s}$ with $z\in (0,1)$ and $k<{\min}_{1\le l\le g}{\omega}_{l}$, and let $\lambda $ be chosen such that
Then, with probability at least $1-\epsilon (1+o(1))$, we have the following:
1. The estimation error lies in the group restricted set: $\widehat{\beta}-{\beta}^{0}\in {\Theta}_{\alpha}$ with $\alpha =\frac{1+z}{1-z}$.
2. Under the group restricted eigenvalue condition (A5), the block norm estimation errors are
respectively, and the error of the loss function ${\ell}_{\psi}$ is
The non-asymptotic oracle inequalities for the true coefficient
${\beta}^{0}$ are provided in (
11) and (
12). Unfortunately, the parameter
$\mathcal{N}\left({\beta}^{0}\right)$ is influenced by the true coefficient
${\beta}^{0}$, so that the choice of
$\lambda $ also depends on
${\beta}^{0}$. Therefore, we will choose a suitable
$\psi \left({x}_{i}^{T}{\beta}^{0}\right)$ to solve this problem in the next theorem.
Theorem 2.
Choose the weight function in the following form
Under assumptions (A2) and (A3) we choose the tuning parameter as
Then, under the assumptions of Theorem 1, with probability at least $1-\epsilon (1+o(1))$ we have inequalities (11), (12) and (13).
Regarding Theorem 2, Yin [
10] discusses the order of
${\Phi}^{-1}(1-\frac{\epsilon}{2p})$ in (
15), proving that
${\Phi}^{-1}(1-\frac{\epsilon}{2p})\sim \mathcal{O}\left(\sqrt{\log (2p/\epsilon)}\right)$. When
$|{G}_{l}|=1$ for
$l=1,2,\cdots ,g$, our estimate
$\widehat{\beta}$ is a Lasso estimate and its theoretical properties have been well studied in Yin [
10].
Remark 2.
If $\psi \left({x}_{i}^{T}{\beta}^{0}\right)$ is given as in Theorem 2, the loss function, weighted score function and the Hessian matrix, respectively, are given by
Clearly, with a weight function of the form given in Theorem 2, the Hessian matrix is positive definite.
4. Weighted block coordinate descent algorithm
We apply the techniques of the block coordinate descent algorithm to the penalized weighted score function. Choose the weight function in the form of (
14) and set
$\beta =\widehat{\beta}+\zeta $, then a second-order Taylor expansion of the loss function
${\ell}_{\psi}\left(\beta \right)$ in equation (
6) gives
Now we consider minimizing
$\mathcal{D}(\widehat{\beta}+\zeta )$ with respect to the
lth group of penalized parameters; that is,
Inspired by Meier et al. [
12], we take the sub-matrix
${\mathcal{H}}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}$ to be of the form
${\mathcal{H}}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}={h}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}{I}_{\left(l\right)}$, choosing
${h}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}=-max\{\mathrm{diag}(-{\mathcal{H}}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}),{r}_{0}\}$, where
${r}_{0}$ is a lower bound to ensure convergence. Then, simplifying equation (
17) gives
This leads to the following equivalent equation
According to equation (
15) and Remark 2, we obtain:
If
$\parallel {\mathcal{H}}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}{\widehat{\beta}}_{\left(l\right)}-\nabla {\ell}_{\psi}{\left(\widehat{\beta}\right)}_{\left(l\right)}{\parallel}_{2}\le \lambda {\omega}_{l}$, the value of
$\zeta $ at the k-th iteration is given by
otherwise
If
${\zeta}_{\left(l\right)}^{\left(k\right)}\ne 0$, we use the Armijo rule of Tseng and Yun [
26] to select the step factor
${\sigma}^{\left(k\right)}$ as follows:
Armijo rule
Finally, the update direction is calculated from the gradient of the parameters, and the parameters are updated with the chosen step size
The weighted block coordinate gradient descent algorithm is given by
Table 1. In general, selecting the tuning parameter
$\lambda $ using the cross-validation method is complicated. As we can see from
Table 1, the algorithm eliminates the selection process for the tuning parameter
$\lambda {\omega}_{l}$. Given an initial value
${\widehat{\beta}}^{\left(0\right)}$, we can then iterate directly from
${\widehat{\beta}}^{\left(0\right)}$ until the iterates converge to the desired tolerance.
It is worth noting that we have given a direct choice (
15) for
$\lambda $ under a specific weight function
$\psi \left({x}_{i}^{T}{\beta}^{0}\right)$ given by (
14), so the weighted block coordinate gradient descent algorithm will be computationally faster than working iteratively on a fixed grid of tuning parameters
$\lambda $ (see Meier et al. [
12]). If other weight functions are chosen, the weighted block coordinate gradient descent algorithm can still be used to solve (
6), but then the tuning parameter
$\lambda $ depends on the unknown
${\beta}^{0}$, and cross-validation can be used to choose
$\lambda $.
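The group update above has a closed-form group soft-thresholding solution; the sketch below is our illustration of one such block update (function and variable names are ours, and the sign conventions assume the loss formulation rather than the paper's exact Table 1):

```python
import numpy as np

def block_update(beta_l, grad_l, h_l, lam_w_l):
    """One block coordinate gradient descent step for group l.

    Minimizes the quadratic model grad_l^T d + (h_l/2)*||d||^2 plus the
    group penalty lam_w_l * ||beta_l + d||_2; the minimizer is a group
    soft-thresholding of the gradient step.
    """
    u = h_l * beta_l - grad_l          # unpenalized target, scaled by h_l
    norm_u = np.linalg.norm(u)
    if norm_u <= lam_w_l:
        # The whole group is shrunk to zero: d = -beta_l.
        return -beta_l
    # Otherwise shrink the target toward zero by lam_w_l and rescale.
    new_beta_l = (1.0 - lam_w_l / norm_u) * u / h_l
    return new_beta_l - beta_l
```

The returned direction satisfies the subgradient stationarity condition of the penalized quadratic model, which is easy to verify numerically.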
5. Simulations
In this section, we use simulated datasets to evaluate the performance of the penalized weighted score function estimator. Meier [
12] implements the block coordinate gradient descent algorithm in the R package
grplasso. We modify
grplasso into a package named
wgrplasso, and use it to implement the weighted block coordinate gradient descent algorithm. We compare the performance of the
wgrplasso algorithm with the R package
grpreg developed by Breheny and Huang [
19] and the R package
gglasso developed by Yang and Zou [
27]. Three main aspects of model performance are considered: correctness of variable selection, accuracy of coefficient estimation, and running time of the algorithm. The evaluation indicators for the model include the following:
TP: the number of predicted non-zero values in the non-zero coefficient set when determining the model
TN: the number of predicted zero values in the zero coefficient set when determining the model
FP: the number of predicted non-zero values in the zero coefficient set when determining the model
FN: the number of predicted zero values in the non-zero coefficient set when determining the model
TPR: the ratio of predicted non-zero values in the non-zero coefficient set when determining the model, which is calculated by the following formulation:
Accur: the ratio of accurate predictions when determining the model, which is calculated by the following formulation:
Time: the running time of the algorithm.
BNE: the block norm of the estimation error, which is calculated by the following formulation:
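The selection indicators above can be sketched as follows (the function name is ours; Accur is computed as (TP+TN)/p, and BNE is omitted here since its exact block norm is given by the displayed formula):

```python
import numpy as np

def selection_metrics(beta_hat, beta_true):
    """Variable selection metrics used in the simulations."""
    pred_nz = beta_hat != 0
    true_nz = beta_true != 0
    tp = int(np.sum(pred_nz & true_nz))    # non-zeros correctly selected
    tn = int(np.sum(~pred_nz & ~true_nz))  # zeros correctly excluded
    fp = int(np.sum(pred_nz & ~true_nz))   # zeros wrongly selected
    fn = int(np.sum(~pred_nz & true_nz))   # non-zeros wrongly excluded
    tpr = tp / (tp + fn) if tp + fn else 0.0
    accur = (tp + tn) / beta_true.size
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "TPR": tpr, "Accur": accur}
```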
The sample size was 200. We set values of
$p=300$, 600, and 900, and generated 100 random datasets to repeat the simulation. We set
$\epsilon $ to 0.01 and 0.05 and uniformly specify the true non-zero coefficient parameters of the logistic regression models as
For the log odds $\eta $, we consider the following four different models.
(a) In Model I, the observed data X are assumed to be sampled from a multivariate normal distribution and the log odds $\eta $ is the linear case, where the data between groups are independent but the data within groups are correlated. We set the size of each group to 3 and assume that the data within each group obey ${X}_{i}\sim N(0,{\Sigma}_{i})$, where ${({\Sigma}_{i})}_{jk}=0.{5}^{|j-k|}$. Thus, the observed data can be defined as $X\sim N(0,\Sigma )$, where $\Sigma =\mathrm{diag}({\Sigma}_{1},\cdots ,{\Sigma}_{\frac{p}{3}})$.
(b) In Model II, the observed data X are assumed to be the sum of two uniform random vectors and the log odds $\eta $ is the linear case. Assume that the vectors ${Z}_{1},\cdots ,{Z}_{p}$ and W are generated independently from a uniform distribution on $[-1,1]$. Thus, the observed data can be defined as ${X}_{i}={Z}_{i}+W$.
The log odds
$\eta $ for Model I and Model II are then defined as follows
(c) In Model III, the observed data X are assumed to follow a standard multivariate normal distribution and the log odds $\eta $ is the additive case. Assuming that X obeys the $\frac{p}{3}$-dimensional standard normal distribution, the observed data can be defined as $X\sim N(0,{I}_{\frac{p}{3}})$.
(d) In Model IV, the observed data X are assumed to be the sum of two uniform random vectors and the log odds $\eta $ is the additive case. Assume that the $\frac{p}{3}$-dimensional vectors ${Z}_{1},\cdots ,{Z}_{\frac{p}{3}}$ and W are generated independently from a uniform distribution on $[-1,1]$. Thus, the observed data can be defined as ${X}_{i}={Z}_{i}+W$.
The log odds
$\eta $ for Model III and Model IV are then defined as follows
Then, the dataset for the response variable
Y was generated by the logistic regression models
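As a sketch of the Model I data generation under the settings above (n = 200, group size 3, AR(0.5) within-group correlation; the coefficient values below are placeholders, since the true non-zero coefficients are specified in the omitted display):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, gsize = 200, 300, 3

# Within-group AR(0.5) covariance; groups are independent of each other.
sigma_g = 0.5 ** np.abs(np.subtract.outer(np.arange(gsize), np.arange(gsize)))
chol = np.linalg.cholesky(sigma_g)

# Draw each group's columns from N(0, sigma_g).
X = np.hstack([rng.standard_normal((n, gsize)) @ chol.T for _ in range(p // gsize)])

# Placeholder sparse coefficient vector: first two groups active.
beta = np.zeros(p)
beta[:6] = 1.0

# Bernoulli responses from the logistic model P(Y=1|x) = G(x^T beta).
eta = X @ beta
prob = 1.0 / (1.0 + np.exp(-eta))
y = rng.binomial(1, prob)
```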
Table 2 shows the average simulation results of the three algorithms for the linear case, and Figure 1 plots the point-line plots of Model I and Model II for TPR, Accur, Time and BNE.
First, from the TPR perspective, all three algorithms show excellent selection results when the normal distribution assumption is adopted. However, when the uniform distribution assumption is used, the wgrplasso algorithm shows higher correct selection in the nonzero set than the other algorithms, and the wgrplasso algorithm is also more stable in terms of variance.
Second, from the Accur perspective, compared to the grpreg algorithm, the wgrplasso and gglasso algorithms maintain high selection accuracy under the normal distribution assumption. However, Accur is also affected by FP, and from the variance perspective the grpreg and gglasso algorithms are not stable enough in controlling FP. In addition, under the uniform distribution assumption, in terms of both selection accuracy and variance stability, the wgrplasso algorithm controls FP better, which makes it outperform the other algorithms in terms of Accur.
Third, from the Time perspective, using the wgrplasso algorithm saves a lot of time, both for the normal distribution assumption and the uniform distribution assumption.
Last, from the BNE perspective, under the normal distribution assumption the BNE values obtained by the wgrplasso and gglasso algorithms are similar and smaller than that of the grpreg algorithm. Under the uniform distribution assumption, the BNE obtained by the wgrplasso algorithm is smaller than those of the gglasso and grpreg algorithms, which means that the wgrplasso algorithm performs better.
Figure 1.
Average TPR, Accur, Time and BNE plots for 100 repetitions of the three algorithms in Model I and Model II.
Table 3 gives the simulation results of the three algorithms for the additive case, and Figure 2 plots the point-line plots of Model III and Model IV for TPR, Accur, Time and BNE.
The simulation results show that the grpreg and gglasso algorithms degrade in the additive case in terms of both TPR and Accur; the variances also show that these two algorithms do not select stably, and their computational time and BNE increase as well. However, wgrplasso obtains results in the additive case similar to those in the linear case and still maintains good selection. In terms of TPR, Accur and BNE, the wgrplasso algorithm performs better than the other algorithms, and its advantage in Time is even more obvious.
Figure 2.
Average TPR, Accur, Time and BNE plots for 100 repetitions of the three algorithms in Model III and Model IV.