4. Discussion
To our knowledge, ISM is the first approach that allows heterogeneous views to be transformed into a three-dimensional array to which NTF can be applied to extract consistent information from all views. Other recent methods, such as Regularized Multi-Manifold NMF [19] or Multi-View Clustering in Latent Embedding Space [5], propose new algorithms that insert terms in the NMF loss function to minimize the difference between the consensus components and the view-specific components, thus circumventing the embedding in a three-dimensional array. However, these algorithms have yet to be adopted by the Machine Learning community and optimized in terms of performance and convergence. In contrast, ISM consists of a workflow of proven algorithms, NMF and NTF, which are already optimized in terms of performance and convergence. Thus, ISM provides the Machine Learning community with a powerful and scalable tool.
NMF and NTF are known to produce more interpretable and meaningful factors, because the non-negativity of their loadings prevents factors from cancelling each other out. Similarly, ISM, as a workflow involving NMF and NTF steps, produces latent factors whose interpretation is greatly facilitated by the non-negativity of the attribute loadings that define them, as illustrated by the signature 915 data example.
Sparsity is an important element of the ISM workflow, which further facilitates the interpretation of latent factors. It is important to note that no sparsity parameter needs to be defined, as the hard threshold for latent factors is automatically selected as the reciprocal of the Herfindahl-Hirschman index. For factors with strongly positively skewed values, the use of the L2 norm for the denominator of the index can lead to excessively sparse factors, which in turn can lead to an overly large approximation error during embedding. Therefore, this threshold can be scaled down by a multiplicative factor to achieve a better mapping to each view, which can lead to greater consistency in the analysis, as long as the intrinsic nature of the embedding tensor is preserved, i.e. the embedding dimensions remain comparable in the different views. In our workflow implementation, the default value for the multiplicative factor was set to 0.8 after extensive testing with various data sets.
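The thresholding rule described above can be illustrated with a minimal sketch. It assumes that the reciprocal of the Herfindahl-Hirschman index is computed as the squared ratio of the L1 to the L2 norm of a factor, that this value is rounded to an effective number k of loadings, and that the threshold is the k-th largest loading multiplied by the scaling factor. The function name `hhi_sparsify` and the example values are hypothetical, and the actual ISM implementation may differ in its details.

```python
import numpy as np

def hhi_sparsify(factor, scale=0.8):
    """Zero out small loadings of one non-negative latent factor (illustrative)."""
    factor = np.asarray(factor, dtype=float)
    l1 = factor.sum()
    l2_sq = np.sum(factor ** 2)
    if l2_sq == 0.0:
        return factor.copy()  # all-zero factor: nothing to threshold
    # Reciprocal of the Herfindahl-Hirschman index, (L1 / L2)^2:
    # the effective number of non-negligible loadings.
    k = int(round(l1 ** 2 / l2_sq))
    k = min(max(k, 1), factor.size)
    # Assumed rule: threshold = k-th largest loading times the scaling factor.
    threshold = scale * np.sort(factor)[::-1][k - 1]
    return np.where(factor < threshold, 0.0, factor)

factor = np.array([10.0, 5.0, 4.5, 4.0, 0.0])
strict = hhi_sparsify(factor, scale=1.0)   # unscaled threshold
relaxed = hhi_sparsify(factor, scale=0.8)  # threshold scaled down by 0.8
```

In this toy example the unscaled threshold retains three loadings, whereas the scaled threshold also retains the fourth, which shows how the multiplicative factor relaxes the sparsity of skewed factors.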
The embedding size is an additional parameter that allows ISM to be tuned to a desired level of specificity across views. A large embedding size, such as the factorization rank multiplied by the number of views, allows each view to find its specificities in some dimensions of the embedding. In contrast, a small embedding size, such as the factorization rank, leads to more consensual latent factors with attributes from different views due to the scarcity of components in the latent space. This is in sharp contrast to other approaches to latent spaces such as GCCA or stepwise CPCs, which create a latent space that attempts to maximize the correspondence between views and filters out their specificities.
ISM intrinsic view loadings also enable the automatic weighting of the views within each latent factor. This allows the simultaneous analysis of views of very different sizes without the need for prior normalization to give each view the same importance, as is the case with Consensus PCA, for example.
It should be noted that the preliminary NMF in unit 1 of workflow 1 combines the data before applying NTF, which is reminiscent of the "attention" mechanism used in transformers before applying a light neural network [20]. This could explain why ISM can outperform NTF when applied to a multidimensional array, i.e. even if the data structure is suitable for the direct application of NTF, as shown by the clustering of marker genes achieved in the application example. This could also explain why, although NMF performance is close to ISM performance in terms of the purity index, ISM outperforms NMF in both examples in terms of number of recognized classes and, in the second example, by generating a better positioning of recognized cell types on the 2D map projection.
Like other latent space methods, ISM is not limited to the purpose of multi-view clustering. The ISM components, as well as the view-mapping matrix, can be used for data reduction on newly collected data (i.e. data that is not part of the data used to train/learn the model) by fixing these components in the ISM model.
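The principle of reducing new data with fixed components can be sketched with plain NMF from scikit-learn, used here only as a stand-in for the full ISM model. The data are synthetic and the rank of 3 is an arbitrary assumption. The `transform` method keeps the learned components fixed and estimates only the scores of the new observations.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X_train = rng.random((60, 12))  # synthetic non-negative training data
X_new = rng.random((5, 12))     # synthetic newly collected observations

# Learn the components on the training data only (rank 3 is an assumption).
model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W_train = model.fit_transform(X_train)

# Project the new observations: model.components_ stays fixed,
# only the scores of the new rows are estimated.
W_new = model.transform(X_new)
```

In ISM, the same idea would apply to the ISM components together with the view-mapping matrix, rather than to a single NMF component matrix.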
Data reduction for newly collected data is still feasible even if some of the views contained in the training data are missing, as the ISM parameters are compartmentalized by view, in contrast to latent factors provided by other latent space factorization methods.
ISM is not limited to views with non-negative data. Each mixed-signed view can be split into its positive part and the absolute value of its negative part, resulting in two different non-negative views, similar to the scheme applied by NTF to centered data [14].
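The split described above can be written in a few lines. The following is a minimal sketch on a small synthetic matrix; the function name `split_signed_view` is hypothetical.

```python
import numpy as np

def split_signed_view(X):
    """Split a mixed-signed view into two non-negative views."""
    X = np.asarray(X, dtype=float)
    positive_part = np.maximum(X, 0.0)   # keeps positive entries, zero elsewhere
    negative_part = np.maximum(-X, 0.0)  # absolute value of negative entries
    return positive_part, negative_part

X = np.array([[1.5, -2.0], [0.0, 3.0], [-0.5, -1.0]])
pos, neg = split_signed_view(X)
```

Both resulting views are non-negative, and the original view is recovered as their difference, so no information is lost by the split.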
An important limitation of ISM and of other multi-view latent space approaches is the required availability of multi-view data for all observations in the training set. For financial or logistical reasons, a particular view may be missing in a subset of the observations, and this subset is in turn dependent on the view under consideration. We are currently evaluating a variant of ISM that can process multi-view data with missing views. In this approach, sets of views that have enough common observations are integrated with ISM separately. By using the model parameters, the transformation into the latent ISM space can be expanded to all views over all observations belonging to the set, resulting in much larger transformed views than the original intersection would allow. This expansion process enables the integration of the ISM-transformed data from the different view sets, again using the ISM. For this reason, we call this variant the Integrated Latent Space Model, ILSM. Interestingly, a similar integrated latent space approach has already been proposed to study the influence of social networks on human behavior [21]. After masking a large number of views, the dataset of UCI digits was analyzed using ILSM. A more detailed description of the expansion process (Workflow S1, Figure S1) and promising results (Figure S2) can be found in the supplementary materials.
The ISM implementation relies on state-of-the-art algorithms that are already available in off-the-shelf NMF and NTF packages (sklearn.decomposition.NMF in scikit-learn 1.4.0; adnmtf on PyPI) and are invoked via a simple workflow implemented in a Jupyter Python notebook that is accessible to the vast majority of the Machine Learning community.