Version 1: Received: 18 February 2024 / Approved: 19 February 2024 / Online: 19 February 2024 (07:53:27 CET)
Version 2: Received: 20 March 2024 / Approved: 21 March 2024 / Online: 21 March 2024 (09:15:49 CET)
Version 3: Received: 9 April 2024 / Approved: 10 April 2024 / Online: 10 April 2024 (10:52:38 CEST)
Version 4: Received: 16 April 2024 / Approved: 17 April 2024 / Online: 18 April 2024 (02:44:04 CEST)
How to cite:
Fogel, P.; Boldina, G.; Augé, F.; Geissler, C.; Luta, G. ISM: A New Space-Learning Model for Heterogenous Multi-view Data Reduction, Visualization and Clustering. Preprints 2024, 2024021001. https://doi.org/10.20944/preprints202402.1001.v2
APA Style
Fogel, P., Boldina, G., Augé, F., Geissler, C., & Luta, G. (2024). ISM: A New Space-Learning Model for Heterogenous Multi-view Data Reduction, Visualization and Clustering. Preprints. https://doi.org/10.20944/preprints202402.1001.v2
Chicago/Turabian Style
Fogel, P., Christophe Geissler and George Luta. 2024 "ISM: A New Space-Learning Model for Heterogenous Multi-view Data Reduction, Visualization and Clustering" Preprints. https://doi.org/10.20944/preprints202402.1001.v2
Abstract
We describe a new approach for integrating multiple views of data into a common latent space using non-negative tensor factorization (NTF). This approach, which we refer to as the "Integrated Sources Model" (ISM), consists of two main steps: embedding and analysis. In the embedding step, each view is transformed into a matrix with common non-negative components. In the analysis step, the transformed views are combined into a tensor and decomposed using NTF. Notably, ISM can be extended to process multi-view data sets with missing views. We illustrate the new approach with two examples, the UCI digit dataset and a public cell type gene signatures dataset, and show that multi-view clustering of digits, or of marker genes by their respective cell type, is better achieved with ISM than with other latent space approaches. We also show how the non-negativity and sparsity of the ISM model components enable straightforward interpretations, in contrast to latent factors of mixed signs. Finally, we present potential applications, currently under evaluation, to single-cell multi-omics, spatial mapping (including spatial imaging and spatial transcriptomics), and computational biology. ISM relies on state-of-the-art algorithms invoked via a simple workflow implemented in a Jupyter Python notebook.
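The two steps described in the abstract can be pictured with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names (`nmf`, `embed_views`, `ntf`), the use of plain multiplicative updates, the choice to embed views by factorizing their concatenation, and the random toy data are all hypothetical choices made here for brevity.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Basic NMF (X ~ W @ H) with multiplicative updates; assumes X >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def embed_views(views, k, n_iter=200, eps=1e-9):
    """Embedding step (assumed variant): give every view an (n, k)
    non-negative matrix expressed on the same k components."""
    # Shared components come from factorizing the concatenated views.
    W, H = nmf(np.hstack(views), k, n_iter)
    splits = np.cumsum([v.shape[1] for v in views])[:-1]
    embedded = []
    for X_v, H_v in zip(views, np.split(H, splits, axis=1)):
        W_v = W.copy()
        # Refit the row loadings for this view with its block of H held fixed.
        for _ in range(n_iter):
            W_v *= (X_v @ H_v.T) / (W_v @ H_v @ H_v.T + eps)
        embedded.append(W_v)
    return np.stack(embedded, axis=2)  # tensor of shape (n, k, n_views)

def ntf(T, rank, n_iter=200, seed=0, eps=1e-9):
    """Analysis step: non-negative CP decomposition of a 3-way tensor,
    T[i, j, v] ~ sum_r A[i, r] * B[j, r] * C[v, r]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, rank)) for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum('ijk,jr,kr->ir', T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Toy data: two non-negative views of the same 30 observations.
rng = np.random.default_rng(42)
views = [rng.random((30, 8)), rng.random((30, 5))]

T = embed_views(views, k=3)   # embedding step
A, B, C = ntf(T, rank=3)      # analysis step
```

In this sketch, the rows of `A` play the role of observation coordinates in the common latent space, while `C` indicates how strongly each view loads on each component. All factors stay non-negative because multiplicative updates only rescale non-negative entries.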
Keywords
Principal Component Analysis; Non-negative Matrix Factorization; Non-negative Tensor Factorization; Multi-view Clustering; Canonical Correlation Analysis; Common Principal Components; Multi Dimensional Scaling
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.