Supervised strategies of multiblock data analysis: a unified approach and extensions

Within the framework of multiblock data analysis, a unified approach to supervised methods is discussed. It encompasses multiblock redundancy analysis (MB-RA) and multiblock partial least squares (MB-PLS) regression. Moreover, we develop new supervised strategies of multiblock data analysis, which can be seen as variants of one or the other of these two methods. They are respectively referred to as multiblock weighted redundancy analysis (MB-WRA) and multiblock weighted covariate analysis (MB-WCov). The four methods are based on the determination of latent variables associated with the various blocks of variables. They are derived from clear optimization criteria whose aim is to maximize either the sum of the covariances or the sum of the squared covariances between the latent variable associated with the response block of variables and the block latent variables associated with the various explanatory blocks of variables. We also propose indices to help better interpret the outcomes of the analyses. The methods are illustrated and compared on simulated and real datasets.


Introduction
With the advent of technology, several blocks of variables are very often collected in different areas to study phenomena of interest. For example, for the characterization and optimization of food products, several types of measurements can be made (sensory, instrumental, physico-chemical) and linked to consumer preferences. In ecology, we may be interested in exploring the relationships between the abundance of some species in different sites, on the one hand, and the variables describing those sites (environment, biodiversity, spatial situation, etc.), on the other hand.
We consider the setting where all the blocks of variables are measured on the same individuals, but the variables can be different from one block to another. For the purpose of predicting a block of variables from other blocks of variables, supervised methods are often used. Examples of such methods are Multiblock Redundancy Analysis (MB-RA) [1][2][3][4], Multiblock PLS (MB-PLS) regression [5][6][7], Hierarchical PLS (H-PLS) [6] and P-ComDim [8].
The purpose of this paper is to: (i) set up a unified approach for MB-RA and MB-PLS regression, (ii) propose new supervised strategies for the analysis of multiblock data, (iii) highlight the similarities and differences among all these methods, (iv) propose new indices for a better interpretation of the results of these supervised strategies.
Although self-contained, this paper can be seen as a follow-up to the unified framework for unsupervised strategies of multiblock data analysis [9].
We start by introducing Redundancy Analysis [10,11] and PLS2 regression analysis [12] using an original presentation. Then, from two different viewpoints, we extend these two strategies to the framework of multiblock data analysis. The first extension of RA leads to MB-RA, whereas the second extension gives a new strategy of analysis that can be seen as a variant of MB-RA. This new strategy of analysis is referred to as Multiblock Weighted Redundancy Analysis (MB-WRA). In the same vein, we consider a first extension of PLS2 regression, which leads to MB-PLS regression, and a second extension, which leads to an interesting variant of MB-PLS, which we refer to as Multiblock Weighted Covariate Analysis (MB-WCov). We also exhibit indices that reflect the importance of the latent variables determined at the successive stages and indices that highlight the contribution of the various explanatory blocks of variables to the determination of the global latent variables. All the methods of analysis are compared and illustrated using simulated and real datasets.
The paper is organized as follows. In section 2, we introduce the various methods of analysis. Then, in section 3, we illustrate and compare these methods on the basis of simulated and real datasets. We end the paper with a discussion and some concluding remarks.

Relationships between two datasets

Redundancy analysis
Let us denote by X (n × p) and Y (n × q) two datasets measured on the same individuals and assumed to be column-centered. The aim is to predict Y from X. Let us denote by P_X = X(X'X)^{-1}X' the orthogonal projector onto the space generated by the variables of X. We consider a latent variable associated with Y, u = Yν, where ν is the vector of loading weights, which is constrained to be of unit length (||ν|| = 1).
The projection of u onto the space generated by the variables of X is given by the latent variable t = P_X u. We seek u, and therefore t, so as to maximize the criterion: cov(u, t) = (1/n) u't = (1/n) ν'Y'P_X Yν. The rationale behind this maximization problem is clear enough: we seek a latent variable u in the Y-space and a latent variable t in the X-space that are as close as possible to each other.
It is clear that the vector ν that maximizes such a criterion is given by the eigenvector of Y'P_X Y associated with the largest eigenvalue. Thus, we are led to the same solution as redundancy analysis of X and Y. A Nonlinear Iterative PArtial Least Squares (NIPALS)-like algorithm to run this analysis is as follows: 0. Choose ν randomly and set ν = ν/||ν||; 1. Compute the Y-latent variable: u = Yν; 2. Compute the X-latent variable: t = P_X u; 3. Update ν: ν = Y't/||Y't||; 4. Iterate from step 1 until convergence, that is, until the criterion to be maximized ceases to increase by more than a pre-specified threshold (e.g., ε = 10^-8).
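As an illustration, the NIPALS-like iteration above can be sketched in NumPy. This is a hypothetical implementation, not the authors' code: the function name, the random initialization and the use of a pseudo-inverse for X'X are our own choices.

```python
import numpy as np

def redundancy_analysis_component(X, Y, tol=1e-8, max_iter=500, seed=0):
    """First pair of latent variables for redundancy analysis (sketch).

    Iterates the steps above: u = Y nu, t = P_X u, nu = Y't/||Y't||,
    until the criterion nu' Y' P_X Y nu stops increasing.
    """
    # Orthogonal projector onto the column space of X (pinv guards against
    # exact singularity, though the text warns about quasi-collinearity).
    P = X @ np.linalg.pinv(X.T @ X) @ X.T
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(Y.shape[1])
    nu /= np.linalg.norm(nu)
    prev = -np.inf
    for _ in range(max_iter):
        u = Y @ nu          # Y-latent variable
        t = P @ u           # X-latent variable (projection of u)
        crit = t @ t        # = nu' Y' P_X Y nu, since P is idempotent
        if crit - prev <= tol:
            break
        prev = crit
        nu = Y.T @ t
        nu /= np.linalg.norm(nu)
    return nu, u, t
```

At convergence, ν coincides (up to sign) with the leading eigenvector of Y'P_X Y, which is how the sketch can be checked against the eigenvalue formulation.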
We can notice that, since the projector P_X is idempotent (P_X² = P_X) and symmetric (P_X' = P_X), we have Y'P_X Y = (P_X Y)'(P_X Y). It follows that trace(Y'P_X Y) = n Σ_{j=1}^{q} var(ŷ_j), where ŷ_j = P_X y_j is the projection of the j-th variable y_j of Y onto the space spanned by the variables in X. In other words, trace(Y'P_X Y) reflects the total variance in Y explained by X. Similarly, ν'Y'P_X Yν = t't = n var(t). Therefore, an index of particular interest is: I = λ1/trace(Y'P_X Y), where λ1 is the largest eigenvalue of Y'P_X Y. This index reflects the variation in P_X Y recovered by the latent variable t. It ranges between 0 and 1. Obviously, the analysis seeks to maximize this index I.
It is worth noting that for the particular case where Y = [y] (a single variable), we are led to multiple linear regression.
Subsequent latent variables can be retrieved after a deflation of X and Y with respect to the X-latent variable t = P_X u. More precisely, this deflation procedure consists of subtracting from the datasets X and Y the information that has already been explained by the previous latent variables in X [13,14]. This information is computed by projecting the datasets X and Y onto these latent variables. This leads us to compute the Y-latent variables u^(1), u^(2), ... and, correspondingly, the X-latent variables t^(1), t^(2), etc. At each stage, h, we can compute the corresponding index I^(h). These indices can be plotted as a function of the stage, h, and interpreted similarly to the so-called scree diagram in Principal Components Analysis (PCA) [15] to select an appropriate number of latent variables to be retained from the analysis.
A prediction model of the variables in Y from X can be set up by regressing Y upon the latent variables t^(1), t^(2), ..., t^(A). The number of latent variables, A, to be introduced in the prediction model can be selected by a cross-validation procedure [16].
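The deflation step and the final regression step can be sketched as follows. These are illustrative helper functions, not taken from the paper; any least-squares routine would serve for the regression on the retained latent variables.

```python
import numpy as np

def deflate(M, t):
    """Remove from the dataset M the part explained by the latent variable t,
    i.e. project each column of M onto the orthogonal complement of t."""
    return M - np.outer(t, t) @ M / (t @ t)

def fit_latent_regression(T, Y):
    """Regress Y on the retained latent variables t^(1), ..., t^(A).

    T is n x A (columns are the latent variables).  Returns the matrix of
    regression coefficients B such that Y is approximated by T @ B.
    """
    B, *_ = np.linalg.lstsq(T, Y, rcond=None)
    return B
```

After deflation, every column of the deflated dataset is orthogonal to t, which is the property that makes the successive latent variables uncorrelated.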
The projector P_X involves the inversion of the matrix X'X. The advantage of this inversion is that the correlations as well as the variances of the variables in X are shaded off. Therefore, the analysis is targeted at recovering the variation in Y by means of the variables of X. However, a major drawback of the inversion of such a matrix is that, in case of high redundancy among the variables of X, the analysis is likely to lead to unstable results because of the well-known problem of quasi-collinearity [17].

PLS2 regression
We consider the same setting as in the previous section. By way of circumventing the problem of quasi-collinearity, we consider the operator W_X = XX' instead of the projector P_X = X(X'X)^{-1}X'. This means that the matrix (X'X)^{-1} is replaced by the identity matrix.
Thereafter, we seek u (and, consequently, t) so as to maximize the criterion: cov(u, t) = (1/n) u't = (1/n) ν'Y'XX'Yν. It follows that the vector ν that maximizes this quadratic form is given by the eigenvector of Y'XX'Y associated with the largest eigenvalue. Obviously, we are led to the same solution as PLS regression [12,18]. A NIPALS-like algorithm to run this method is the following: 0. Choose randomly the vector of loadings, ν, associated with Y and set ν = ν/||ν||;

1. Compute the latent variable, u, associated with Y: u = Yν; 2. Compute the latent variable, t, associated with X: t = W_X u; 3. Update the vector ν: ν = Y't/||Y't||; 4. Iterate from step 1 until convergence, that is, until the criterion to be maximized ceases to increase by more than a pre-specified threshold (e.g., ε = 10^-8).
The convergence of this algorithm is guaranteed by the fact that, at each step, the criterion to be maximized increases. Since this criterion is upper-bounded, it follows that the sequence of values of this criterion generated in the course of the iterations converges.
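Replacing the projector P_X by the operator W_X = XX' turns the previous sketch into a PLS2 iteration. Again, this is a hypothetical sketch with names and defaults of our own choosing.

```python
import numpy as np

def pls2_component(X, Y, tol=1e-8, max_iter=500, seed=0):
    """First pair of PLS2 latent variables (sketch): same loop as for
    redundancy analysis, with W_X = X X' in place of P_X, i.e. with
    (X'X)^{-1} replaced by the identity matrix."""
    W = X @ X.T
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(Y.shape[1])
    nu /= np.linalg.norm(nu)
    prev = -np.inf
    for _ in range(max_iter):
        u = Y @ nu
        t = W @ u                 # t = X X' u
        crit = nu @ (Y.T @ t)     # = nu' Y' X X' Y nu, the maximized criterion
        if crit - prev <= tol:
            break
        prev = crit
        nu = Y.T @ t
        nu /= np.linalg.norm(nu)
    return nu, u, t
```

As stated in the text, the limiting ν is the leading eigenvector of Y'XX'Y, so the iterative and eigendecomposition routes agree.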
Since PLS2 regression revolves around the eigenanalysis of the matrix Y'XX'Y, we can remark that trace(Y'XX'Y) = trace(X'YY'X) = n² Σ_{i=1}^{p} Σ_{j=1}^{q} cov²(x_i, y_j). Therefore, the quantity trace(Y'XX'Y) reflects the strength of the link between the variables in X and those in Y. This index was introduced by Robert and Escoufier [19,20] and it is tightly linked to the so-called RV coefficient, which is widely used in sensometrics and chemometrics [21][22][23][24][25]. Similarly, we have: ν'Y'XX'Yν = n² Σ_{i=1}^{p} cov²(x_i, u). Consequently, the index I = λ1/trace(Y'XX'Y), where λ1 is the largest eigenvalue of Y'XX'Y, reflects the proportion of covariation (i.e., trace(Y'XX'Y)) recovered by t (and u).
Subsequent latent variables can be determined after a deflation of X and Y with respect to the latent variables associated with X determined at the previous stages. Therefore, we obtain the Y-latent variables u^(1), u^(2), ... and the corresponding X-latent variables t^(1), t^(2), etc. At each stage, the index I^(h) can be computed. These indices can be plotted as a scree diagram. This yields a tool to help choose the number of latent variables to be retained.

Multiblock data analysis
We consider the multiblock setting where we have a dataset Y (n×q) to be predicted by K datasets X 1 , X 2 , ..., X K ; the dimension of X k (k = 1, 2, ..., K) is n × p k . All these datasets are measured on the same individuals and assumed to be column-centered.
Moreover, in order to set all the X-datasets on the same footing, they can be pre-scaled so as to have their norms equal to 1. This is achieved by dividing each dataset X_k by its norm ||X_k|| = √(trace(X_k'X_k)).
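This pre-processing can be written as a small illustrative helper (note that `np.linalg.norm` applied to a 2-D array computes exactly the Frobenius norm √(trace(X_k'X_k))):

```python
import numpy as np

def prescale_block(Xk):
    """Column-centre a block and divide it by its Frobenius norm
    ||X_k|| = sqrt(trace(X_k' X_k)), so that every block has norm 1."""
    Xk = Xk - Xk.mean(axis=0)
    return Xk / np.linalg.norm(Xk)  # Frobenius norm for a 2-D array
```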

Multiblock Redundancy Analysis
Let us denote by P_k = X_k(X_k'X_k)^{-1}X_k' the orthogonal projector onto the space generated by the variables of dataset X_k.
Starting with a Y-latent variable u = Yν (||ν|| = 1), we consider its orthogonal projection onto the space generated by the variables of X_k. Thus, we obtain t_k = P_k u, which defines the block latent variable associated with the dataset X_k (k = 1, 2, ..., K).
We seek u, and consequently t_k, so as to maximize: Σ_{k=1}^{K} cov(u, t_k) = (1/n) Σ_{k=1}^{K} ν'Y'P_k Yν. The rationale behind this problem is clear: we seek a latent variable u in the Y-space that is as close as possible to the X_k-spaces. The optimal vector ν of this quadratic form is given by the eigenvector of Y'(Σ_{k=1}^{K} P_k)Y associated with the largest eigenvalue. Clearly, we are led to MB-RA [2][3][4].
We can remark that the criterion to be maximized can also be written as: Σ_{k=1}^{K} cov(u, t_k) = cov(u, Σ_{k=1}^{K} t_k) = cov(u, t), where t = Σ_{k=1}^{K} t_k stands as the global latent variable. A NIPALS-like algorithm to solve the above maximization problem is as follows: 0. Choose randomly a vector ν and set ν = ν/||ν||; 1. Determine the latent variable associated with Y, u = Yν, and the block latent variables associated with the X_k: t_k = P_k u; 2. Compute the global latent variable: t = Σ_{k=1}^{K} t_k; 3. Update the vector ν: ν = Y't/||Y't||; 4. Iterate from step 1 until convergence.
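A hypothetical NumPy sketch of this MB-RA iteration follows; the block projectors are computed once, and a pseudo-inverse is used as a safeguard against singular X_k'X_k. Names are our own.

```python
import numpy as np

def mb_ra_component(X_blocks, Y, tol=1e-8, max_iter=500, seed=0):
    """First set of MB-RA latent variables (sketch of the algorithm above)."""
    # Orthogonal projectors P_k onto the column spaces of the blocks
    Ps = [Xk @ np.linalg.pinv(Xk.T @ Xk) @ Xk.T for Xk in X_blocks]
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(Y.shape[1])
    nu /= np.linalg.norm(nu)
    prev = -np.inf
    for _ in range(max_iter):
        u = Y @ nu                    # Y-latent variable
        tks = [P @ u for P in Ps]     # block latent variables t_k = P_k u
        t = sum(tks)                  # global latent variable t = sum_k t_k
        crit = u @ t                  # = nu' Y' (sum_k P_k) Y nu
        if crit - prev <= tol:
            break
        prev = crit
        nu = Y.T @ t
        nu /= np.linalg.norm(nu)
    return nu, u, t, tks
```

The fixed point can again be verified against the eigendecomposition of Y'(Σ_k P_k)Y.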
Each block latent variable can be written as t_k = X_k w_k, where w_k = (X_k'X_k)^{-1}X_k'u is the vector of block weights; for interpretational purposes, this latter vector could be standardized to unit length. Similarly, the global latent variable t can be written as t = Xw, where X = [X1, X2, ..., XK] and w = (w1', w2', ..., wK')'.
Again, for interpretational purposes, this latter vector could be standardized to unit length. Latent variables of order higher than one can be obtained by following the same strategy of analysis after deflating all the datasets with respect to the global latent variables associated with X determined at the previous stages. Therefore, we are led to computing the Y-latent variables u^(1), u^(2), ..., the X_k-block latent variables t_k^(1), t_k^(2), ... (k = 1, 2, ..., K) and the global latent variables t^(1), t^(2), etc.
As for the case of RA, an index I^(h) can be computed at each stage to reflect the variation in the datasets P_k Y (k = 1, 2, ..., K) recovered by t^(h). These indices can be plotted as a function of the number of latent variables, h, and interpreted as a scree diagram to help choose the number of latent variables to be retained.
At stage h, the quantity cont_k^(h) = λ_k^(h)/Σ_{l=1}^{K} λ_l^(h), with λ_k^(h) = u^(h)'t_k^(h), reflects the contribution of the block X_k to the determination of the components u^(h) and t^(h).

Multiblock weighted redundancy analysis (MB-WRA)
Let us consider the same setting as in the previous section. We apply the same centering and pre-scaling to the datasets.
We consider a variant of the maximization problem. Instead of maximizing the quantity Σ_{k=1}^{K} cov(u, P_k u), we propose to maximize the quantity Σ_{k=1}^{K} cov²(u, P_k u) under the constraint that ||ν|| = 1.
Obviously, the rationale behind this problem is exactly the same as previously, that is, seeking a direction in the Y-space that is as close as possible to the X_k-spaces. To solve this problem, let us use the Lagrangian method. The Lagrange expression associated with this maximization criterion is the following: L(ν, µ) = Σ_{k=1}^{K} (ν'Y'P_k Yν)² − 2µ(ν'ν − 1), where 2µ is the Lagrange multiplier associated with the constraint ||ν|| = 1 or, equivalently, ||ν||² = ν'ν = 1. By differentiating this Lagrange expression with respect to ν and setting the derivative to 0, we obtain: Σ_{k=1}^{K} (ν'Y'P_k Yν) Y'P_k Yν = µν. If we denote by λ_k the quantity λ_k = ν'Y'P_k Yν, we can write: Σ_{k=1}^{K} λ_k Y'P_k Yν = µν. Multiplying the two sides of this equality by ν' and setting ν'ν = 1, we obtain µ = Σ_{k=1}^{K} λ_k². From this, we can derive the stationary point as: ν = (Σ_{k=1}^{K} λ_k Y'P_k Yν)/µ. This suggests the following iterative algorithm: 0. Choose randomly the vector ν and set ν = ν/||ν||; 1. Compute λ_k = ν'Y'P_k Yν (k = 1, 2, ..., K); 2. Update ν: ν = Σ_{k=1}^{K} λ_k Y'P_k Yν; 3. ν = ν/||ν||; 4. Iterate from step 1 until convergence.
We show in the appendix that this algorithm converges. More precisely, we show that at each iteration the criterion to be maximized increases, and since this criterion is upper-bounded, the sequence of criterion values generated in the course of the algorithm converges.
Let us consider again the criterion that we sought to maximize. We have: Σ_{k=1}^{K} cov²(u, t_k) = (1/n) cov(u, Σ_{k=1}^{K} λ_k t_k) = (1/n) cov(u, t), where t = Σ_{k=1}^{K} λ_k t_k stands as the global latent variable and appears as a linear combination of the X_k-block latent variables t_k = P_k u. More precisely, since λ_k = u't_k = n × cov(u, t_k), it follows that t is proportional to the first PLS component of u upon t1, t2, ..., tK. These remarks suggest an alternative algorithm to solve the maximization problem above. Indeed, for fixed values of λ_k, this criterion can also be expressed as: ν'(Σ_{k=1}^{K} λ_k Y'P_k Y)ν. Therefore, it follows that for fixed values of λ_k, the optimal vector ν is given by the eigenvector of Σ_{k=1}^{K} λ_k Y'P_k Y associated with the largest eigenvalue. Conversely, for a fixed value of ν, λ_k is given by: λ_k = ν'Y'P_k Yν. The algorithm associated with this solution is the following: 0. Set λ_k = 1 for k = 1, 2, ..., K; 1. Set ν to the eigenvector of Σ_{k=1}^{K} λ_k Y'P_k Y associated with the largest eigenvalue; 2. Update λ_k = ν'Y'P_k Yν (k = 1, 2, ..., K); 3. Iterate from step 1 until convergence, that is, until the criterion to be maximized ceases to increase by more than a pre-specified threshold (e.g., ε = 10^-8).
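The alternating algorithm just described can be sketched as follows (illustrative names, not the authors' code). A convenient built-in sanity check is the identity λ_k = u't_k, which holds by construction at every fixed point.

```python
import numpy as np

def mb_wra_component(X_blocks, Y, tol=1e-10, max_iter=500):
    """First MB-WRA latent variables via the alternating algorithm above."""
    Ps = [Xk @ np.linalg.pinv(Xk.T @ Xk) @ Xk.T for Xk in X_blocks]
    As = [Y.T @ P @ Y for P in Ps]       # A_k = Y' P_k Y
    lam = np.ones(len(As))               # step 0: lambda_k = 1
    prev = -np.inf
    for _ in range(max_iter):
        # step 1: nu = leading eigenvector of sum_k lambda_k A_k
        w, V = np.linalg.eigh(sum(l * A for l, A in zip(lam, As)))
        nu = V[:, -1]
        # step 2: update the weights lambda_k = nu' A_k nu
        lam = np.array([nu @ A @ nu for A in As])
        crit = (lam ** 2).sum()          # criterion sum_k lambda_k^2
        if crit - prev <= tol:
            break
        prev = crit
    u = Y @ nu
    tks = [P @ u for P in Ps]
    t = sum(l * tk for l, tk in zip(lam, tks))   # t = sum_k lambda_k t_k
    return nu, lam, u, t, tks
```

The weights λ_k returned here are exactly the block contributions discussed below: each equals u't_k for the corresponding block.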
Subsequent latent variables of higher order can be determined by following the same strategy of analysis after a deflation of all the blocks of variables with respect to the global latent variable t.
Let us denote by t^(h) the global latent variable at stage h and by t_k^(h) (k = 1, 2, ..., K) its associated block latent variables. Similarly, we denote by u^(1), u^(2), ..., u^(h) the successive latent variables in the Y-space. Finally, we denote by λ_k^(h) the weight associated with the block X_k at stage h. The following indices can be very useful to better interpret the results.
The index I^(h) highlights the importance of the global component t^(h) in explaining the covariation between Y and the blocks X_k, and the index cont_k^(h) reflects the contribution of the block X_k to the determination of the latent variables t^(h) and u^(h).

Multiblock PLS regression
In order to counteract the problems of collinearity that arise from the inversion of the matrices X_k'X_k, we propose, as for the case of PLS regression, to replace the matrices (X_k'X_k)^{-1} by the identity matrix and, therefore, to consider the operators W_k = X_kX_k'. Starting from the Y-latent variable u = Yν (||ν|| = 1), we define the X_k-block latent variable as t_k = W_k u. Thereafter, we seek u, and therefore t_k, so as to maximize the criterion: Σ_{k=1}^{K} cov(u, t_k) = (1/n) Σ_{k=1}^{K} ν'Y'X_kX_k'Yν = n Σ_{k=1}^{K} ν'V_{Yk}V_{kY}ν, where V_{kY} = (1/n)X_k'Y is the covariance matrix between X_k and Y and V_{Yk} = V_{kY}'. Since Σ_{k=1}^{K} X_kX_k' = XX', we can also note that: Σ_{k=1}^{K} ν'Y'X_kX_k'Yν = ν'Y'XX'Yν, where X = [X1|X2|...|XK] is the dataset obtained by horizontally merging the datasets X1, X2, ..., XK. This entails that the optimal vector ν is the eigenvector of the matrix Y'XX'Y associated with the largest eigenvalue. In other words, the solution of MB-PLS regression amounts to performing PLS regression of Y upon X [6].
We can compute the index: I = λ/trace(Y'XX'Y), where λ = n × cov(u, t) = u't. This index reflects the proportion of covariation between Y, on the one hand, and the X_k (k = 1, 2, ..., K), on the other hand, that is explained by the global latent variable. Moreover, since we have λ = Σ_{k=1}^{K} λ_k, where λ_k = u't_k = n × cov(u, t_k), we can compute the indices cont_k = λ_k/λ, which reflect the contribution of each block of variables X_k to the determination of the global latent variable. In other words, these indices reflect the importance of each block of variables, X_k, in the global latent variable.
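Since the MB-PLS solution reduces to an eigendecomposition of Y'XX'Y, a sketch needs no iteration. This is an illustrative implementation with names of our own; cont_k is computed as λ_k/λ as above.

```python
import numpy as np

def mb_pls_component(X_blocks, Y):
    """First MB-PLS latent variables, obtained directly from the
    eigendecomposition of Y' X X' Y with X the merged dataset."""
    X = np.hstack(X_blocks)
    w, V = np.linalg.eigh(Y.T @ X @ X.T @ Y)
    nu = V[:, -1]                                 # loading vector, ||nu|| = 1
    u = Y @ nu                                    # Y-latent variable
    tks = [Xk @ (Xk.T @ u) for Xk in X_blocks]    # t_k = W_k u = X_k X_k' u
    t = sum(tks)                                  # global latent variable
    lam_k = np.array([u @ tk for tk in tks])      # lambda_k = u' t_k >= 0
    cont = lam_k / lam_k.sum()                    # contributions cont_k = lambda_k / lambda
    return nu, u, t, tks, cont
```

Because Σ_k W_k = XX', the global latent variable equals XX'u, which ties the blockwise computation back to ordinary PLS regression of Y on the merged X.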
Subsequent latent variables can be computed following the same strategy of analysis after deflation with respect to the global latent variables associated with X.

Multiblock Weighted Covariate Analysis (MB-WCov)
We consider the same setting and the same notations as in the previous section. As a variant of the maximization criterion that led us to introduce MB-PLS regression, namely Σ_{k=1}^{K} cov(u, t_k), we consider the criterion Σ_{k=1}^{K} cov²(u, t_k). The rationale behind these two criteria is the same, that is, exploring the covariation between, on the one hand, Y and, on the other hand, the blocks of variables X_k. The latter maximization criterion leads us to a strategy of analysis that we shall refer to as Multiblock Weighted Covariate Analysis (MB-WCov). Indeed, as with MB-WRA, this method of analysis explicitly exhibits weights that reflect the contribution of the X_k-blocks of variables to the determination of the latent variables at each stage.
To the Y-latent variable u = Yν (||ν|| = 1), we associate the variables t_k = W_k u, which are the block latent variables associated with the datasets X_k. Thereafter, we seek u so as to maximize: Σ_{k=1}^{K} cov²(u, t_k). The Lagrangian expression associated with this problem is: L(ν, µ) = Σ_{k=1}^{K} (ν'Y'W_k Yν)² − 2µ(ν'ν − 1), where 2µ is the Lagrange multiplier associated with the constraint ||ν|| = 1 or, equivalently, ||ν||² = ν'ν = 1. By differentiating this Lagrange expression with respect to ν and setting the derivative to 0, we obtain: Σ_{k=1}^{K} λ_k Y'W_k Yν = µν, with λ_k = ν'Y'W_k Yν. Multiplying the two sides of this equality by ν' and setting ν'ν = 1, we obtain µ = Σ_{k=1}^{K} λ_k². The stationary point is therefore given by ν = (Σ_{k=1}^{K} λ_k Y'W_k Yν)/µ. This suggests the following iterative algorithm: 0. Choose randomly the vector ν and set ν = ν/||ν||; 1. Compute λ_k = ν'Y'W_k Yν (k = 1, 2, ..., K); 2. Update ν: ν = Σ_{k=1}^{K} λ_k Y'W_k Yν; 3. ν = ν/||ν||; 4. Iterate from step 1 until convergence.
The convergence of this algorithm can be shown using very similar developments as for the case of MB-WRA (appendix).
The criterion that we sought to maximize can be written as: Σ_{k=1}^{K} cov²(u, t_k) = (1/n) Σ_{k=1}^{K} λ_k cov(u, t_k) = (1/n) cov(u, Σ_{k=1}^{K} λ_k t_k). This yields the global latent variable t = Σ_{k=1}^{K} λ_k t_k, with λ_k = n × cov(u, t_k). It follows that t is proportional to the first PLS component of u upon t1, t2, ..., tK. Thus, for fixed values of λ_k, the optimal vector ν is given by the eigenvector of Σ_{k=1}^{K} λ_k Y'W_k Y associated with the largest eigenvalue. Conversely, for a fixed value of ν, λ_k is given by: λ_k = ν'Y'W_k Yν. From these developments, we can propose an alternative algorithm for the resolution of the MB-WCov maximization criterion as follows.
0. Set λ_k = 1 for k = 1, 2, ..., K; 1. Set ν to the eigenvector of Σ_{k=1}^{K} λ_k Y'W_k Y associated with the largest eigenvalue; 2. Update λ_k = ν'Y'W_k Yν (k = 1, 2, ..., K); 3. Iterate from step 1 until convergence, that is, until the criterion to be maximized ceases to increase by more than a pre-specified threshold (e.g., ε = 10^-8). Table 1 sums up the four methods of multiblock data analysis discussed in this paper.
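The same alternating scheme, with W_k = X_kX_k' in place of the projectors P_k, can be sketched as follows (illustrative names, not the authors' code):

```python
import numpy as np

def mb_wcov_component(X_blocks, Y, tol=1e-10, max_iter=500):
    """First MB-WCov latent variables: same alternating scheme as MB-WRA,
    with W_k = X_k X_k' in place of the projectors P_k."""
    As = [Y.T @ (Xk @ Xk.T) @ Y for Xk in X_blocks]   # A_k = Y' W_k Y
    lam = np.ones(len(As))                            # step 0
    prev = -np.inf
    for _ in range(max_iter):
        w, V = np.linalg.eigh(sum(l * A for l, A in zip(lam, As)))
        nu = V[:, -1]                                 # step 1
        lam = np.array([nu @ A @ nu for A in As])     # step 2
        crit = (lam ** 2).sum()
        if crit - prev <= tol:
            break
        prev = crit
    u = Y @ nu
    tks = [Xk @ (Xk.T @ u) for Xk in X_blocks]        # t_k = W_k u
    t = sum(l * tk for l, tk in zip(lam, tks))        # t = sum_k lambda_k t_k
    return nu, lam, u, t, tks
```

Here the weights λ_k = ν'Y'W_kYν are squared norms (||X_k'Yν||²), hence non-negative, and again coincide with u't_k.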

Comparison of methods
It is clear that the four methods of multiblock data analysis discussed herein can be differentiated by two main features. The first key of differentiation is how the X_k-block latent variables, t_k, are computed from the Y-latent variable, u. Two options are offered to us, namely t_k = P_k u or t_k = W_k u. The first option leads to methods of analysis pertaining to redundancy analysis (i.e., MB-RA and MB-WRA). The second option leads to methods of analysis akin to PLS regression (i.e., MB-PLS and MB-WCov). The second key of differentiation between the methods is the relationship between the X_k-block latent variables, t_k, and the global latent variable, t.
This relationship directly stems from the optimization criterion used to determine the latent variables. More precisely, the criterion based on the covariance between t_k and u leads to t = Σ_{k=1}^{K} t_k, and the criterion based on the squared covariance between t_k and u leads to t = Σ_{k=1}^{K} λ_k t_k, where λ_k = u't_k. In other words, the global latent variable sums up the block latent variables: in the former case, t is proportional to the average of the block latent variables t_k (k = 1, 2, ..., K), and, in the latter case, t is proportional to the first PLS component of u upon the t_k (k = 1, 2, ..., K).

Illustrations
The supervised strategies of multiblock data analysis are illustrated and compared based on a simulation study and a real case study.

Simulation study
This simulation study is, to a large extent, similar to that of Westerhuis et al. [6].
It consists of considering two orthogonal variables, d1 and d2, four explanatory datasets, X1 to X4, and a response dataset, Y, all built from d1 and d2. Table 2 gives the correlations of the global latent variables t^(1) and t^(2) with the building variables d1 and d2. It also gives the contributions of the blocks of variables X1, X2, X3 and X4 to the determination of the components t^(1) and t^(2).
The first component, t^(1), obtained by means of MB-RA and MB-WRA is highly correlated with d2, and the second component, t^(2), is highly correlated with d1. MB-PLS and MB-WCov show the opposite pattern, in the sense that their respective first components are highly correlated with d1 and their respective second components are highly correlated with d2. This can be explained by the fact that, since the two methods pertaining to MB-RA do not take account of the variation (i.e., variances and correlations) within each predictive dataset, it is the variable d2 that takes the lead, because it appears in Y, on the one hand, and in X2, X3 and X4, on the other hand. Therefore, it appears as a common pattern to all the datasets but X1. This is a configuration favored by MB-RA and MB-WRA. By contrast, the variable d1 appears five times in the dataset X1, but since MB-RA and MB-WRA do not take account of the within variation, it is counted as a single variable. Contrariwise, this variable is the leading variable for MB-PLS and MB-WCov, because these methods of analysis take account of the variation within the predictive datasets in addition to their relationships with Y.
As regards the contributions of the datasets X 1 , X 2 , X 3 and X 4 to the determination of the component t (1) ( Table 2), we can see that, not surprisingly, this component is almost equally determined by X 2 , X 3 and X 4 insofar as MB-RA and MB-WRA are concerned.
With MB-PLS and MB-WCov, the first component is almost entirely determined by the block of variables X1. Similar conclusions regarding the second component, t^(2), can easily be drawn. All these conclusions are in line with the rationale governing the various methods, namely that the two methods pertaining to MB-RA do not take account of the within variation in the blocks of variables, whereas the two methods pertaining to MB-PLS do. Table 2: Simulated data: Correlations between the global latent variables t^(1) and t^(2) and the building variables d1 and d2, and contributions of the various blocks of variables to the determination of the global latent variables t^(1) and t^(2). In order to assess the prediction ability of the four methods of analysis, we divided the datasets into calibration sets with thirty observations and test sets with twenty observations. The former were used to set up prediction models. These models were applied to the predictive blocks of variables from the test data. The predictions thus obtained were compared to the actual values of Y by means of the root mean squared error of prediction (RMSEP, [16]). It can be seen in Figure 2, which shows the evolution of the RMSEP values as a function of the number of latent variables introduced in the models, that the four methods show more or less the same pattern. More precisely, the RMSEP sharply decreases when the second latent variable is introduced, then it decreases very slightly with the introduction of the third component. Thereafter, the various curves form a plateau or tend to increase slightly. It is worth noting that the smallest RMSEP value is obtained by means of MB-WRA with three components.

Case study: Potatoes data
The multiblock data used to compare the methods of analysis are described in more detail in [26]. The aim is to predict sensory attributes from measurement data. Twenty potato samples were analyzed after one month of storage and six additional samples after eight months of storage. A panel of assessors profiled the texture of cooked potatoes with respect to nine texture attributes. The sensory data were averaged across assessors, yielding a dataset Y (26 potato samples × 9 sensory attributes). The block X1 is given by the chemical analysis of the potato samples (14 variables), and a second block of variables, X2, concerns the uniaxial compression at six deformation rates (6 variables).
Each predictive dataset was column-centered and pre-scaled so as to have its norm equal to 1. The contributions of the blocks of variables to the determination of the first two global latent variables are given in Table 3. Globally, it appears that, for all the methods, the chemical dataset (X1) is the one that contributes most to the determination of the first two latent variables. Table 3: Potatoes data: Contributions of the blocks of variables X1 and X2 to the determination of the global latent variables t^(1) and t^(2).

Discussion

We have set up a unified approach for two supervised methods, namely MB-RA and MB-PLS regression. We have also proposed two new strategies for predicting a block of variables from other blocks of variables. These four methods are based on clear optimization criteria and can be differentiated according to two traits.

The first trait concerns the relationship between the Y-latent variable u and the X_k-block latent variables t_k. We have considered herein two types of relationships: (i) t_k = P_k u and (ii) t_k = W_k u. The first relationship leads to the methods pertaining to redundancy analysis, namely MB-RA and MB-WRA. Since this strategy of analysis involves the inversion of the matrices X_k'X_k, the variances and the correlations of the datasets X_k are shaded off. Therefore, these methods of analysis focus on recovering the variation of Y by means of the latent variables of the X_k datasets. However, in case of multicollinearity among the variables of X_k, the inversion of such matrices is likely to lead to unstable models. By considering the relationship (ii), we circumvent this problem and we are led to methods pertaining to PLS regression, namely MB-PLS regression and MB-WCov.
The second distinguishing trait is the relationship between the global latent variable t and its associated block latent variables t_k. Again, we have considered two kinds of relationships: (i) the global latent variable t is equal to the sum of the block latent variables t_k (t = Σ_{k=1}^{K} t_k) and (ii) the global latent variable t is a weighted linear combination of the block latent variables t_k (t = Σ_{k=1}^{K} λ_k t_k, with λ_k = u't_k). This latter relationship means that t is the first PLS component of u upon t1, t2, ..., tK. With the relationship (i), we are led to MB-RA and MB-PLS regression. With the relationship (ii), a specific weight, λ_k, is attached to each dataset X_k; these weights reflect the importance of each dataset in the computation of the latent variables t and u, and we obtain the methods MB-WRA and MB-WCov.
We have already noted that the relationship t = Σ_{k=1}^{K} λ_k t_k, with λ_k = u't_k, means that the latent variable t is proportional to the first PLS component of u upon the block components t_k. From this standpoint, it appears that MB-WCov bears some similarities to Multiblock Hierarchical PLS (MB-HPLS) [6], which basically enjoys the same property.
However, since MB-HPLS is not grounded on a clear optimization criterion, it suffers from convergence problems [6,27].
The fact that, with MB-WRA and MB-WCov, the global latent variable t is the first PLS component of the Y -latent variable u upon the block latent variables t 1 , t 2 , ..., t K suggests new interesting developments. Indeed, instead of the usual PLS regression, we could use a sparse PLS regression. This means that, at each stage, the blocks of variables that do not have a significant contribution to the determination of the latent components computed at the current stage will be discarded. As a consequence, we are led to parsimonious models that are easier to interpret without affecting the prediction ability.
Moving from the projectors P_k = X_k(X_k'X_k)^{-1}X_k' to the operators W_k = X_kX_k' was dictated by the need to circumvent the tricky problem of quasi-collinearity.
In effect, this corresponds to a drastic shrinkage of the matrices X_k'X_k towards the identity matrix. A softer shrinkage may consist in considering the operators P_{kγ} = X_k(γI + (1 − γ)X_k'X_k)^{-1}X_k', where γ is a tuning parameter lying between 0 and 1. This yields a continuum of strategies of analysis whose two extreme points (i.e., γ = 0 and γ = 1) are the methods of analysis discussed herein. In practice, the tuning parameter could be determined, together with the number of latent variables to be included in the model, by a cross-validation technique [28].
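The continuum operator can be sketched as follows (an illustrative helper of our own). The two endpoints recover the two families of methods: γ = 0 gives the projector P_k, γ = 1 gives W_k = X_kX_k'.

```python
import numpy as np

def shrunk_operator(Xk, gamma):
    """Continuum operator P_{k,gamma} = X_k (gamma I + (1-gamma) X_k'X_k)^{-1} X_k'.

    gamma = 0 recovers the projector P_k (redundancy-type methods) and
    gamma = 1 recovers W_k = X_k X_k' (PLS-type methods)."""
    p = Xk.shape[1]
    M = gamma * np.eye(p) + (1.0 - gamma) * Xk.T @ Xk   # shrunk version of X_k'X_k
    return Xk @ np.linalg.solve(M, Xk.T)                # solve instead of explicit inverse
```

Intermediate values of γ behave like a ridge regularization of the block covariance matrix, which is the usual motivation for such continuum approaches.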
We have also proposed indices for the interpretation of the results of these strategies of analysis. Among these indices, we have an index that indicates the proportion of covariation between the blocks of variables recovered at each stage. This index can give a hint regarding the number of latent variables to be retained. We have also proposed an index that highlights the contribution of each block of variables X_k to the determination of the global latent variable.
Appendix: convergence of the algorithm

This property is readily proven by remarking that, by virtue of the Cauchy-Schwarz inequality, the maximum (with respect to x, ||x|| = 1) of the function x'G(n)ν_n is achieved for x = G(n)ν_n/||G(n)ν_n|| = ν_{n+1}. By expanding the left term of inequality (26), it is easy to check that it is equal to 1.
Since the matrices A_k = Y'P_k Y are positive semi-definite, we have, by virtue of the Cauchy-Schwarz inequality, ν_{n+1}'A_k ν_n ≤ √(ν_{n+1}'A_k ν_{n+1}) √(ν_n'A_k ν_n).
Applying the Cauchy-Schwarz inequality again to bound the last term of inequality (27), then one final time, and combining inequalities (27), (28) and (29) while remarking that λ_k(n) = ν_n'A_k ν_n, it readily follows that the criterion does not decrease from iteration n to iteration n + 1. This is precisely the property that we aimed to prove.