Numerical comparisons between Bayesian and frequentist low-rank matrix completion: estimation accuracy and uncertainty quantification

In this paper we perform numerous numerical studies for the problem of low-rank matrix completion. We compare the Bayesian approaches and a recently introduced de-biased estimator, which provides a useful way to build confidence intervals for the entries of interest. From a theoretical viewpoint, the de-biased estimator comes with a sharp minimax-optimal rate of estimation error, whereas the Bayesian approach reaches this rate with an additional logarithmic factor. Our simulation studies show the interesting result that the de-biased estimator is only just as good as the Bayesian estimators. Moreover, the Bayesian approaches are much more stable and can outperform the de-biased estimator in the case of small samples. However, we also find that the confidence intervals produced by the de-biased estimator for an entry are significantly shorter than the corresponding credible intervals. These results suggest further theoretical studies on the estimation error and the concentration of Bayesian methods, as such results remain quite limited at present.


Introduction
The goal of low-rank matrix completion is to recover a low-rank matrix from a subset of its observed entries. This problem has recently received increased attention due to the emergence of several challenging applications, such as recommender systems (particularly the famous Netflix challenge [5]). Different approaches, from frequentist to Bayesian methods, have been developed for this problem, from both theoretical and computational points of view; see for example [1,3,6-9,13,15-17,21,22,24].

From a frequentist point of view, most of the recent methods are based on penalized optimization. A seminal result can be found in [7,8] for exact matrix completion (the noiseless case), further developed for the noisy case in [6,13,20]. Some efficient algorithms have also been studied; see for example [11,18,21]. More particularly, in the notable work [13], the authors studied nuclear-norm penalized estimators and provided reconstruction errors for their methods. Moreover, they also showed that these errors are minimax-optimal (up to a logarithmic factor). Note that the average quadratic error on the entries of a rank-$r$ matrix of size $m \times p$ from $n$ observations cannot be better than $r \max(m,p)/n$ [13].

More recently, in the remarkable work [9], de-biased estimators have been proposed for the problem of noisy low-rank matrix completion. The estimation accuracy of this de-biased estimator is sharp minimax-optimal, and the estimator provides a useful way to build confidence intervals for the entries of interest.

We observe
$$ Y_{ij} = M^*_{ij} + E_{ij}, \quad (i,j) \in \Omega, \qquad (1) $$
where $\Omega$ is a subset of indices $\{1,\dots,m\} \times \{1,\dots,p\}$ and $E_{ij} \sim \mathcal{N}(0,\sigma^2)$ is independently generated noise at location $(i,j)$. We assume that the data are missing uniformly at random. Then, the problem of estimating $M^*$ from the $n = |\Omega| < mp$ observations is called the (noisy) low-rank matrix completion problem.

Let $P_\Omega(\cdot) : \mathbb{R}^{m\times p} \to \mathbb{R}^{m\times p}$ be the orthogonal projection onto the set of matrices supported on the index set $\Omega$, that is, $(P_\Omega(A))_{ij} = A_{ij}$ if $(i,j)\in\Omega$ and $0$ otherwise.

Notations:
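As an illustration, the observation model above can be simulated in a few lines. This is a minimal sketch assuming the Bernoulli variant of uniform sampling, where each entry is observed independently with probability $1-\tau$ (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, r, sigma, tau = 100, 100, 2, 1.0, 0.2  # tau = missing rate

# Ground-truth rank-r matrix M* as a product of two i.i.d. N(0, 1) factors
U_star = rng.standard_normal((m, r))
V_star = rng.standard_normal((p, r))
M_star = U_star @ V_star.T

# Uniform sampling: each entry observed independently with prob. 1 - tau
mask = rng.random((m, p)) < 1 - tau          # the index set Omega

# Observed data: noisy entries on Omega, zeros elsewhere
Y = np.where(mask, M_star + sigma * rng.standard_normal((m, p)), 0.0)

n = mask.sum()  # number of observations, n = |Omega| < m * p
```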

For a matrix $A \in \mathbb{R}^{m\times p}$, $\|A\|_F = \sqrt{\mathrm{trace}(A^\top A)}$ denotes its Frobenius norm and $\|A\|_*$ denotes its nuclear norm (the sum of its singular values). We use $\mathrm{Id}_q$ to denote the identity matrix of dimension $q \times q$. Let $\hat{M}$ be either the solution of the following nuclear-norm regularization [18],
$$ \min_{Z \in \mathbb{R}^{m\times p}} \frac{1}{2}\, \|P_\Omega(Y - Z)\|_F^2 + \lambda \|Z\|_* , $$
or of the following factorized minimization [11],
$$ \min_{U \in \mathbb{R}^{m\times r},\, V \in \mathbb{R}^{p\times r}} \frac{1}{2}\, \|P_\Omega(Y - UV^\top)\|_F^2 + \frac{\lambda}{2} \left( \|U\|_F^2 + \|V\|_F^2 \right), $$
where $\lambda > 0$ is a tuning parameter.
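The nuclear-norm problem can be attacked with a soft-impute-style iteration in the spirit of [18]; the following is our illustrative simplification (function name, fixed iteration count, and absence of a convergence check are ours):

```python
import numpy as np

def soft_impute(Y, mask, lam, n_iters=200):
    """Iterative soft-thresholded SVD for the nuclear-norm problem:
       min_Z 0.5 * ||P_Omega(Y - Z)||_F^2 + lam * ||Z||_*"""
    Z = np.zeros_like(Y)
    for _ in range(n_iters):
        # Keep observed entries from Y, fill the rest with the current Z
        filled = np.where(mask, Y, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values by lam
        Z = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
    return Z
```

With `lam = 0` and a fully observed matrix, a single iteration simply reproduces `Y`; in practice $\lambda$ is tuned, e.g. by cross-validation.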

Given an estimator $\hat{M}$ as above, the de-biased estimator [9] is defined as
$$ M^{db} = \mathrm{Pr}_{\mathrm{rank}\text{-}r}\Big( \hat{M} + \frac{mp}{n}\, P_\Omega(Y - \hat{M}) \Big), $$
where $\mathrm{Pr}_{\mathrm{rank}\text{-}r}(B) = \arg\min_{A:\, \mathrm{rank}(A) \le r} \|A - B\|_F$ is the projection onto the set of rank-$r$ matrices.
where $U^{db} = \hat{U}(\Sigma + \lambda \mathrm{Id}_r)^{1/2}$ and $V^{db} = \hat{V}(\Sigma + \lambda \mathrm{Id}_r)^{1/2}$. Then, given a significance level $\alpha \in (0,1)$, the interval
$$ \Big[ M^{db}_{ij} - \Phi^{-1}(1-\alpha/2)\, \hat{v}_{ij}^{1/2},\; M^{db}_{ij} + \Phi^{-1}(1-\alpha/2)\, \hat{v}_{ij}^{1/2} \Big], \qquad \hat{v}_{ij} = \frac{\sigma^2}{\hat{p}} \Big( \|U^{db}_{i\cdot}\|_2^2 + \|V^{db}_{j\cdot}\|_2^2 \Big), \quad \hat{p} = \frac{n}{mp}, $$
is an asymptotic $(1-\alpha)$ confidence interval for the entry $M^*_{ij}$ [9].

The Bayesian estimator studied in [3,17] is given by
$$ \hat{M}_\lambda = \int M \, \hat{\rho}_\lambda(M \mid Y)\, \mathrm{d}M, \qquad \hat{\rho}_\lambda(M \mid Y) \propto L(Y \mid M)^\lambda\, \pi(M), $$
where $\hat{\rho}_\lambda(M \mid Y)$ is the (tempered) posterior and $L(Y \mid M)^\lambda$ is the likelihood raised to the power $\lambda$. Here $\lambda \in (0,1)$ is a tuning parameter and $\pi(M)$ is the prior distribution. A popular choice of prior in Bayesian matrix completion is to assign conditional Gaussian priors to $U \in \mathbb{R}^{m\times K}$ and $V \in \mathbb{R}^{p\times K}$, with $M = UV^\top$, for a fixed integer $K \le \min(m,p)$. More specifically, for $k \in \{1,\dots,K\}$, independently,
$$ U_{\cdot k} \sim \mathcal{N}(0, \gamma_k \mathrm{Id}_m), \qquad V_{\cdot k} \sim \mathcal{N}(0, \gamma_k \mathrm{Id}_p), $$
with a Gamma-type (conjugate) prior on the variances $\gamma_k$, where $\mathrm{Id}_q$ is the identity matrix of dimension $q \times q$ and $a, b$ are tuning parameters of that prior.
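The de-biasing step and the entrywise intervals described above can be sketched as follows. This is a simplified illustration, not the exact procedure of [9]: we use a balanced SVD factorization of the de-biased matrix in place of the $(\Sigma + \lambda\mathrm{Id}_r)^{1/2}$-scaled factors, and we assume the entrywise variance takes the form $\sigma^2/\hat{p}\,(\|U_{i\cdot}\|^2 + \|V_{j\cdot}\|^2)$:

```python
import numpy as np
from statistics import NormalDist

def debias_and_ci(M_hat, Y, mask, r, sigma, alpha=0.05):
    """Add back a rescaled residual on Omega, project onto rank-r matrices,
    then form entrywise normal confidence intervals (illustrative sketch)."""
    m, p = Y.shape
    p_hat = mask.sum() / (m * p)                 # estimated sampling rate
    Z = M_hat + (1.0 / p_hat) * np.where(mask, Y - M_hat, 0.0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    M_db = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # rank-r projection

    # Balanced rank-r factors of the de-biased matrix (our simplification)
    U_r = U[:, :r] * np.sqrt(s[:r])
    V_r = Vt[:r, :].T * np.sqrt(s[:r])
    var = sigma**2 / p_hat * (
        (U_r**2).sum(axis=1)[:, None] + (V_r**2).sum(axis=1)[None, :])
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * np.sqrt(var)
    return M_db, M_db - half, M_db + half
```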

This kind of prior is conjugate, so that the conditional posteriors can be derived explicitly in closed form, which allows the use of a Gibbs sampler; see [22] for details. Some reviews and discussion of low-rank factorization priors can be found in [1,2].
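For concreteness, one conditional update of such a Gibbs sampler can be sketched as follows. This assumes the conditional Gaussian factorization prior above with variances $\gamma_k$; it is our illustrative simplification, not the exact sampler of [22]:

```python
import numpy as np

def gibbs_update_U(Y, mask, V, gamma, sigma2, rng):
    """One conjugate Gaussian update for U (rows are independent given V).
    Prior: U[i, k] ~ N(0, gamma[k]); likelihood: Y[i, j] ~ N((U V^T)[i, j],
    sigma2) on the observed entries only."""
    m, K = Y.shape[0], V.shape[1]
    U = np.empty((m, K))
    prior_prec = np.diag(1.0 / gamma)
    for i in range(m):
        obs = mask[i]                    # observed columns in row i
        Vo = V[obs]                      # |obs| x K design matrix
        prec = prior_prec + Vo.T @ Vo / sigma2
        cov = np.linalg.inv(prec)
        mean = cov @ (Vo.T @ Y[i, obs]) / sigma2
        U[i] = rng.multivariate_normal(mean, cov)
    return U
```

The update for `V` is symmetric, and the variances $\gamma_k$ are refreshed from their conjugate conditional in the full sampler.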

Remark 2.
In the case where the rank r is not known, it is natural to take K as large as possible, e.g. K = min(m, p), but this may be computationally prohibitive if K is large.

Remark 3. The Bayesian estimator reaches the minimax rate $r \max(m,p)/n$ up to an additional logarithmic factor $\log(n \max(m,p))$. It is noted that, in [17], the rate is also reached, with an additional $\log(\min(m,p))$ factor, under a general sampling distribution; however, that paper considered some truncations of the prior.

For a given rank r, we propose to consider the following prior, called the fixed-rank prior, for k = 1, . . . , r. This prior is a simplified version of the above prior. We note that for K > r the Gibbs sampler for the fixed-rank prior will be faster than the Gibbs sampler for the above prior. Interestingly, however, the simulation results for the Bayesian estimator with this prior are slightly better than those based on the above prior at some points.

Remark 4. We remark that the theoretical estimation error for the Bayesian estimator with the fixed-rank prior given in (6) remains unchanged, following Corollary 4.2 in [3]. Here, we focus on the equal-tailed credible interval for an entry.
More precisely, the credible intervals are reported using the 89% equal-tailed intervals, as suggested by [14,19] for small posterior samples, as in our situation with 500 posterior samples. We note that to obtain 95% intervals, an effective posterior sample size of at least 10,000 is recommended [14], which is computationally prohibitive in our setting. Two series of experiments were conducted with simulated data:

• Setting I: In the first setting, we fix m = 100 and a rank-r matrix $M^*$ of size m × p is generated as the product $M^* = U^* V^{*\top}$ of two rank-r factors, where the entries of $U^* \in \mathbb{R}^{m\times r}$ and $V^* \in \mathbb{R}^{p\times r}$ are i.i.d. N(0, 1). With a missing rate τ = 20%, 50% or 80%, the entries of the matrix M* are observed using uniform sampling. This sampled set is then corrupted by noise as in (1), where the $E_{ij}$ are i.i.d. N(0, 1). We vary the other dimension by taking p = 100 and p = 1000, and the rank r is varied between r = 2 and r = 5.

• Setting II: The second series of simulations is similar to the first one, except that the matrix M* is no longer exactly rank-r but can be well approximated by a rank-r matrix: a perturbation built from two matrices A and B, whose entries are i.i.d. N(0, 1), is added to the rank-r product.

Remark 5. We note that for the second series of simulations, with approximately low-rank matrices, the theory of the de-biased estimator cannot be used, whereas the theoretical guarantees for the Bayesian estimators remain valid; see [3,17].

For each setup, we generate 50 data sets (simulation replicates) and report the average and the standard deviation of a measure of error of each estimator over the replicates. The behavior of an estimator (say $\hat{M}$) is evaluated through the mean squared error (MSE) per entry and the normalized mean squared error (NMSE),
$$ \mathrm{MSE} = \frac{1}{mp}\, \|\hat{M} - M^*\|_F^2, \qquad \mathrm{NMSE} = \frac{\|\hat{M} - M^*\|_F^2}{\|M^*\|_F^2}. $$
We also measure the prediction error by using
$$ \mathrm{Pred} = \frac{1}{|\bar{\Omega}|}\, \|P_{\bar{\Omega}}(\hat{M} - M^*)\|_F^2, $$
where $\bar{\Omega}$ is the set of unobserved entries.

We compare the de-biased estimator (denoted by 'd.b'), the Bayesian estimator with the above factorization prior (denoted by 'Bayes'), and the Bayesian estimator with the fixed-rank prior (denoted by 'f.Bayes'). From the results in Tables 1 and 2, it is clear that the de-biased estimator significantly outperforms the base estimator that it de-biases, whereas the de-biased estimator is only as good as the Bayesian methods in some cases.
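Under our reading of the elided formulas, i.e. MSE averaged over all entries, NMSE normalized by $\|M^*\|_F^2$, and prediction error averaged over the unobserved entries, these metrics can be computed as:

```python
import numpy as np

def completion_errors(M_hat, M_star, mask):
    """MSE per entry, NMSE, and prediction error on unobserved entries."""
    m, p = M_star.shape
    diff = M_hat - M_star
    mse = (diff**2).sum() / (m * p)                 # average over all entries
    nmse = (diff**2).sum() / (M_star**2).sum()      # normalized by ||M*||_F^2
    unobs = ~mask                                   # complement of Omega
    pred = (diff[unobs]**2).sum() / unobs.sum()
    return mse, nmse, pred
```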

More specifically, in Table 1, the de-biased estimator is comparable to the Bayesian estimators in the cases with high rates of observation (say a missing rate of τ = 20% or 50%).

In the case of the high missing rate τ = 80%, the de-biased estimator returns highly unstable results; this may be because its base estimator (here the ALS estimator) is unstable when the number of observations is small. This is also observed in the setting of approximately low-rank matrices, as in Tables 2 and 4. We further find that, for the setting where the true underlying matrix is only approximately low-rank (Tables 2 and 4), the 'Bayes' approach is slightly better than the 'f.Bayes' approach at some points.

This can be explained by the fact that the 'Bayes' approach employs a kind of approximate low-rank prior, through the Gamma prior on the variances of the factor matrices, and is thus able to adapt to approximate low-rankness.

We also examine uncertainty quantification in our experiments. More precisely, we report the 95% confidence intervals for the de-biased method. The credible intervals are reported using the 89% equal-tailed intervals, as suggested by [14,19] for small posterior samples, as in our situation with 500 posterior samples; we note again that to obtain 95% intervals, an effective posterior sample size of at least 10,000 is recommended [14]. A few examples with 10,000 posterior samples are given in Figure 1.
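The equal-tailed credible intervals reported here can be computed directly from the posterior draws of an entry; a minimal sketch (the function name is ours):

```python
import numpy as np

def equal_tailed_ci(samples, level=0.89):
    """Equal-tailed credible interval: cut (1 - level)/2 posterior mass
    from each tail of the posterior draws for one entry."""
    lo = (1.0 - level) / 2.0
    return np.quantile(samples, [lo, 1.0 - lo])
```

With `level=0.95` the same function returns the 95% intervals used with the larger posterior sample sizes.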

A noteworthy conclusion from the results in Tables 3 and 4 and Figure 1 is that the lengths of the 95% confidence intervals produced by the de-biased method are significantly shorter than those of the 89% or 95% credible intervals. We note that in all of our

In this paper, we provide extensive numerical comparisons between the de-biased estimator and Bayesian approaches to low-rank matrix completion. We have not compared with the variational Bayesian (VB) method in [16], whose theoretical guarantees are given in [3], even though this method is very popular for matrix completion with large datasets. This will be the objective of our future work. However, we would like to note that, in a preprint [2], the authors performed some comparisons between the Bayesian approach and the VB method. The message from their work is that we can expect VB to be more or less as accurate as Bayes, maybe slightly less, but that the credible intervals would be wrong (see e.g. Figure 3 in [2]).