Preprint Article, Version 2. Preserved in Portico. This version is not peer-reviewed.

ISM: A New Space-Learning Model for Heterogenous Multi-view Data Reduction, Visualization and Clustering

Version 1 : Received: 18 February 2024 / Approved: 19 February 2024 / Online: 19 February 2024 (07:53:27 CET)
Version 2 : Received: 20 March 2024 / Approved: 21 March 2024 / Online: 21 March 2024 (09:15:49 CET)
Version 3 : Received: 9 April 2024 / Approved: 10 April 2024 / Online: 10 April 2024 (10:52:38 CEST)
Version 4 : Received: 16 April 2024 / Approved: 17 April 2024 / Online: 18 April 2024 (02:44:04 CEST)

How to cite: Fogel, P.; Boldina, G.; Augé, F.; Geissler, C.; Luta, G. ISM: A New Space-Learning Model for Heterogenous Multi-view Data Reduction, Visualization and Clustering. Preprints 2024, 2024021001. https://doi.org/10.20944/preprints202402.1001.v2

Abstract

We describe a new approach for integrating multiple views of data into a common latent space using non-negative tensor factorization (NTF). This approach, which we refer to as the "Integrated Sources Model" (ISM), consists of two main steps: embedding and analysis. In the embedding step, each view is transformed into a matrix with common non-negative components. In the analysis step, the transformed views are combined into a tensor and decomposed using NTF. Notably, ISM can be extended to process multi-view data sets with missing views. We illustrate the new approach using two examples, the UCI digit dataset and a public cell type gene signatures dataset, and show that multi-view clustering of digits, or of marker genes by their respective cell type, is better achieved with ISM than with other latent space approaches. We also show how the non-negativity and sparsity of the ISM model components enable straightforward interpretations, in contrast to latent factors of mixed signs. Finally, we present potential applications, currently under evaluation, to single-cell multi-omics, spatial mapping (including spatial imaging and spatial transcriptomics), and computational biology. ISM relies on state-of-the-art algorithms invoked via a simple workflow implemented in a Jupyter Python notebook.
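The two steps named in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation, and every function name in it (`nmf`, `khatri_rao`, `ntf`, `ism_sketch`) is hypothetical. It assumes that the embedding step can be approximated by a plain NMF of the concatenated views, with each view projected onto its own block of the shared components, and that the analysis step can be approximated by a non-negative CP decomposition fitted with multiplicative updates. The published workflow may differ in both steps, for example in how sparsity or missing views are handled.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Basic multiplicative-update NMF: X (n x p) ~ W (n x k) @ H (k x p)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I x r), (J x r) -> (I*J x r)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def ntf(T, r, n_iter=200, seed=0, eps=1e-9):
    """Non-negative CP decomposition of a 3-way tensor (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, r)) + eps for dim in T.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Unfold T along `mode`; columns run over the two remaining modes.
            unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            factors[mode] *= (unfolded @ kr) / (factors[mode] @ (kr.T @ kr) + eps)
    return factors

def ism_sketch(views, k_embed, rank):
    """Assumed simplification: embed the views, stack them, then factorize."""
    # Embedding: NMF of the concatenated views yields shared components H.
    X = np.hstack(views)
    _, H = nmf(X, k_embed)
    splits = np.cumsum([v.shape[1] for v in views])[:-1]
    H_blocks = np.split(H, splits, axis=1)
    # Each view becomes an (n x k_embed) non-negative matrix.
    embedded = [Xv @ Hv.T for Xv, Hv in zip(views, H_blocks)]
    # Analysis: stack into an (n x k_embed x n_views) tensor and apply NTF.
    T = np.stack(embedded, axis=2)
    return ntf(T, rank)

# Synthetic example: two non-negative views of the same 30 observations.
rng = np.random.default_rng(1)
views = [rng.random((30, 8)), rng.random((30, 5))]
A, B, C = ism_sketch(views, k_embed=4, rank=3)
print(A.shape, B.shape, C.shape)
```

In this sketch, `A` holds one non-negative row per observation and could feed a clustering or visualization step, `B` relates the latent components to the embedding dimensions, and `C` gives the weight of each view on each component.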

Keywords

Principal Component Analysis; Non-negative Matrix Factorization; Non-negative Tensor Factorization; Multi-view Clustering; Canonical Correlation Analysis; Common Principal Components; Multidimensional Scaling

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
