Multi-View Clustering via Projection-Enhanced Bipartite Graph Learning and Consensus Fusion

Xun Liu; Qing-Wen Wang; Jiang-Feng Chen

doi:10.20944/preprints202604.1396.v1

Submitted: 20 April 2026

Posted: 21 April 2026


Abstract

Anchor-based bipartite graph methods provide linear scalability for multi-view clustering, but most of them construct graphs in the original feature space, where high dimensionality distorts the proximity between samples and anchors and degrades graph quality. In addition, the K-means step commonly used to discretize spectral embeddings produces different cluster assignments across random seeds. To address these limitations, this paper proposes Projection-Enhanced Bipartite Graph Learning (PEBGL), a unified framework that jointly performs subspace projection, bipartite graph construction, consensus graph fusion with adaptive view weighting, and discrete label assignment. Every subproblem admits a closed-form or deterministic solution, so the algorithm runs in linear time and produces reproducible cluster labels for any fixed initialization. Experiments on six benchmark datasets demonstrate that PEBGL achieves consistently competitive accuracy across all evaluation settings and improves over the strongest baseline by up to 4.8 percentage points. These results confirm the effectiveness and generality of the proposed framework.

Keywords: multi-view clustering; bipartite graph learning; consensus graph fusion; anchor-based method; spectral embedding

Subject: Physical Sciences - Mathematical Physics

1. Introduction

In many real-world applications, the same set of objects can be described by several different types of features. For example, an image can be represented by its color, texture, and shape at the same time. A web document can be described by both its text content and its link structure. A clinical patient may have genomic data, pathological images, and electronic health records. Each type of feature is called a view, and each view captures only part of the information about the data. No single view is likely to reveal the complete cluster structure on its own. Multi-view clustering aims to combine these different views to find meaningful groups of objects without any label information. This task has wide applications in areas such as social network analysis, medical informatics, remote sensing, and visual object categorization [1,2].

Among the various approaches to this task, graph-based methods have attracted considerable attention. The basic idea is to build a graph for each view, where nodes represent data samples and edge weights reflect pairwise similarities. The cluster structure is then extracted from the spectral properties of the graph. In classical spectral clustering [3], the eigenvectors of the graph Laplacian provide a continuous approximation of the discrete cluster indicators, and K-means is applied as a post-processing step to produce hard assignments. Multi-view extensions build one graph per view and try to learn a unified representation that combines information across views. Nie et al. [4] proposed a graph learning framework that assigns view weights automatically through self-optimization. Wang et al. [5] introduced a mutual reinforcement mechanism between individual view graphs and the consensus graph. Liang et al. [6] modeled both consistency and inconsistency across views, and Huang et al. [7] separated multi-view graphs into shared and diverse components. These methods have significantly improved clustering quality, but they all rely on full $N \times N$ similarity matrices. This requires at least $O (N^{2})$ storage and $O (N^{3})$ eigendecomposition cost, which becomes impractical when N exceeds a few thousand.

Dense pairwise graphs carry a heavy computational burden. To reduce it, researchers select a small set of r representative anchors with $r ≪ N$. Learning an $r \times N$ affinity matrix between these anchors and the samples yields a sparse bipartite structure, so that graph construction and spectral analysis both scale as $O (N r)$. Li et al. [8] initially adapted this concept for multi-view data. They construct a separate graph per view and concatenate them prior to spectral analysis. A related pipeline by Kang et al. [9] accelerates subspace clustering. Addressing a different issue, Nie et al. [10] designed a structured objective based on graph connectivity. This yields cluster labels directly, avoiding the K-means step altogether. Meanwhile, other studies [11,12,13] proposed fusion strategies without additional hyperparameters. These methods aggregate the individual bipartite graphs under a strict Laplacian rank constraint, which guarantees the consensus graph forms exactly c connected components.

A recent trend attempts to solve graph construction and clustering in a single step. Fang et al. [14] reconstruct subspaces to learn bipartite graphs for each view alongside a consensus graph. They also add label learning directly into this loop. Liu et al. [15] rely on decomposition. By splitting each anchor graph into a shared core and a separate residual, they penalize the residuals to reduce discrepancies across views. Yan et al. [16] push this idea further. They isolate consistency from diversity, applying sparsity rules so that only the consistent components shape the final graph. However, these methods all build graphs in the original feature space without modifying the data representation beforehand. Li et al. [17] take a different route by fusing similarities during the spectral embedding phase. By utilizing entropy weighting and spectral rotation, their model produces discrete labels without extra clustering steps. This orthogonal rotation mechanism to replace K-means was originally designed by Luo et al. [18] for multigraph clustering. Several recent bipartite graph models have since adopted this idea [14,17].

Tensors provide a parallel approach to capture high-order correlations. Xia et al. [19] formed a single tensor by stacking individual bipartite graphs. They then applied the tensor Schatten p-norm to guarantee low-rank consistency. Since full tensor decomposition is computationally expensive, Gu et al. [20] avoided it by learning a compact essential representation. Other studies broaden the scope of graph modeling. Zhao et al. [21] captured indirect relationships between samples. To maintain performance, they introduced a truncation mechanism that explicitly filters out low-quality graphs before fusion. Addressing a different challenge, Jiang et al. [22] extended the framework to handle unaligned data. Under this setting, the sample correspondences across views are unknown.

Despite this progress, two limitations remain in existing methods. The first limitation concerns the feature space in which bipartite graphs are constructed. Virtually all methods cited above learn anchor-sample affinities in the original feature space. When a view contains thousands of features, many of which are redundant or uninformative, pairwise distances suffer from the concentration phenomenon in high-dimensional spaces and become nearly indistinguishable. The resulting affinity values lose discriminative power, and the distorted affinities propagate through consensus fusion into the final partition. Although dimensionality reduction techniques such as PCA have long been available, integrating them into the bipartite graph learning objective while preserving the variance structure and maintaining closed-form updates remains unexplored.

The second limitation lies in the transition from continuous spectral embeddings to discrete cluster assignments. Three competing strategies coexist. The dominant approach applies K-means to the eigenvectors of the consensus Laplacian, which introduces nondeterminism through random seed selection and can yield different partitions across runs on the same graph. The constrained Laplacian rank approach [23] enforces exactly c connected components, yielding deterministic labels from the connectivity pattern, but it imposes a rigid structural requirement that may not suit data whose clusters overlap or vary in density. Spectral rotation [24] aligns the eigenvector matrix to a discrete indicator through orthogonal rotation, removing K-means entirely, yet it does not by itself ensure that the underlying graph possesses a clear connected-component structure. Each strategy sacrifices one desirable property to gain another, and combining their strengths within a unified objective remains an open question.

We propose Projection-Enhanced Bipartite Graph Learning (PEBGL) to resolve these limitations. Our approach places the entire learning pipeline within a single objective function. Rather than working in the original feature space, we apply an orthogonal projection to each view, which maps the data into a compact subspace while keeping the dominant variance intact. In this new space, anchor reconstruction under simplex constraints forms the bipartite graph. The fusion objective treats all views symmetrically. By applying an entropy penalty, the method learns adaptive weights for the individual graphs and fuses them into one consensus structure. We then compute a spectral embedding from its symmetric normalized Laplacian. An orthogonal rotation aligns this embedding directly with discrete cluster indicators. Here, the rotation acts merely as a structural regularizer rather than a forced decoding step. Final labels are read straight from the connected components of the converged graph. Every subproblem has a closed-form or deterministic update, and the time cost per iteration scales linearly with N. On six benchmark datasets, PEBGL reached the top accuracy on two and ranked second on three others. It yields deterministic output and maintains competitive running times.

The main contributions are summarized as follows.

(i): A unified objective that integrates variance-preserving PCA projection, bipartite graph construction, entropy-weighted consensus fusion, spectral embedding, and discrete rotation with overall linear time complexity.
(ii): A deterministic coclustering decoding strategy based on connected-component detection, which bypasses K-means post-processing and guarantees reproducible cluster labels for any fixed parameter configuration.
(iii): Closed-form or deterministic updates for every subproblem, validated by experiments on six benchmarks against seven recent methods.

The overall framework of PEBGL is depicted in Figure 1.

Table 1 summarizes the key symbols used throughout this paper. The set of $N \times c$ binary matrices in which each row contains exactly one unit entry is denoted by $Ind$. Let $\{X^{v}\}_{v = 1}^{V}$ denote a multi-view dataset with V views, where $X^{v} \in R^{N \times d_{v}}$ contains N samples described by $d_{v}$-dimensional features in the v-th view. The goal is to partition the N samples into c clusters by jointly exploiting all views.

The remainder of this paper is organized as follows. Section 2 establishes the mathematical preliminaries. Section 3 develops the proposed method with complete derivations. Section 4 reports experimental comparisons and analyses on six benchmark datasets. Section 5 concludes the paper.

2. Preliminaries

This section reviews the mathematical tools that the proposed method builds upon.

2.1. Graph Laplacian

Let W be a symmetric non-negative similarity matrix with degree matrix

D = diag (W 1)

. The symmetric normalized Laplacian is

L = I - D^{- 1 / 2} W D^{- 1 / 2} .

(1)

Lemma 1

(von Luxburg [25]). The multiplicity of the eigenvalue 0 of L equals the number of connected components in the graph associated with W.

When W encodes a bipartite graph between N samples and r anchors through a non-negative matrix Z, the augmented similarity matrix takes the block form

S = [\begin{matrix} 0 & Z^{⊤} \\ Z & 0 \end{matrix}],

(2)

and Lemma 1 applies to its normalized Laplacian

{\tilde{L}}_{S} = I - D_{S}^{- 1 / 2} S D_{S}^{- 1 / 2}

, where

D_{S}

is the diagonal degree matrix. If

{\tilde{L}}_{S}

has exactly c zero eigenvalues, then S possesses exactly c connected components, each defining a cluster. This observation motivates the coclustering decoding adopted in Section 3.
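Lemma 1 can be checked numerically on a toy bipartite graph. The following is a minimal NumPy sketch, not the authors' implementation: the function names `bipartite_laplacian` and `count_zero_eigenvalues`, the tolerance `tol`, the handling of zero-degree nodes, and the toy matrix `Z` are all illustrative assumptions.

```python
import numpy as np

def bipartite_laplacian(Z):
    """Normalized Laplacian of S = [[0, Z^T], [Z, 0]] for Z of shape (r, N)."""
    r, N = Z.shape
    S = np.zeros((N + r, N + r))
    S[:N, N:] = Z.T          # sample-to-anchor block
    S[N:, :N] = Z            # anchor-to-sample block
    deg = S.sum(axis=1)
    # Assumption: zero-degree nodes get a zero scaling factor instead of 1/sqrt(0).
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return np.eye(N + r) - inv_sqrt[:, None] * S * inv_sqrt[None, :]

def count_zero_eigenvalues(L, tol=1e-8):
    """Number of eigenvalues of the symmetric matrix L that are below tol."""
    return int(np.sum(np.linalg.eigvalsh(L) < tol))

# Toy graph: samples 0 and 1 attach only to anchor 0, samples 2 and 3 only to anchor 1.
Z = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
n_components = count_zero_eigenvalues(bipartite_laplacian(Z))
```

In this toy case the augmented graph splits into two star-shaped components, so the count of near-zero eigenvalues matches the number of clusters one would read off the graph.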

In the standard spectral clustering pipeline [3], the c eigenvectors of L associated with the smallest eigenvalues are collected into F with

F^{⊤} F = I_{c}

, and K-means is applied to the rows of F to produce cluster labels. A multi-view extension replaces the single Laplacian with a weighted combination

\sum_{v = 1}^{V} ω_{v} L^{v}

, and the embedding is obtained by minimizing

min_{F^{⊤} F = I_{c}} \sum_{v = 1}^{V} ω_{v} Tr (F^{⊤} L^{v} F) .

(3)

This formulation is invariant under relabeling of the view indices, provided the weights are permuted accordingly. Assigning adaptive weights and bypassing K-means post-processing are two design choices that the proposed method addresses.

2.2. Bipartite Graph

Full graph spectral clustering requires

O (N^{2})

storage and

O (N^{3})

eigendecomposition time. Bipartite graph methods avoid this bottleneck by introducing r representative anchor points with

r ≪ N

and learning an

r \times N

affinity matrix instead of an

N \times N

similarity matrix.

Given a data matrix X and an anchor matrix A, the bipartite graph Z is obtained by expressing each sample as a non-negative combination of the anchors [11,14,15],

min_{Z \geq 0, Z^{⊤} 1 = 1} {∥X^{⊤} - A^{⊤} Z∥}_{F}^{2} + α {∥Z∥}_{F}^{2},

(4)

where

α > 0

is a regularization parameter. The constraint

Z^{⊤} 1 = 1

requires each column of Z to lie on the probability simplex

Δ^{r - 1} = \{z \in R^{r} ∣ z \geq 0, 1^{⊤} z = 1\} .

(5)

The Euclidean projection of an arbitrary vector

y \in R^{r}

onto

Δ^{r - 1}

admits the closed-form solution

z_{i}^{*} = max (y_{i} - θ, 0)

, where the threshold

θ

is uniquely determined by

\sum_{i} max (y_{i} - θ, 0) = 1

and can be computed in

O (r log r)

time [26]. Graph construction, storage, and all subsequent operations on Z scale as

O (N r)

.
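The thresholding rule above can be sketched with the standard sort-based procedure. This is a minimal NumPy illustration under the assumption that the input is a single vector; the function name `project_simplex` is hypothetical and the code is not taken from the paper.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of a 1-D vector y onto the probability simplex."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, y.size + 1)
    # Largest (0-based) index rho such that u[rho] - (css[rho] - 1) / (rho + 1) > 0.
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)      # threshold from the sum-to-one condition
    return np.maximum(y - theta, 0.0)

z = project_simplex([0.8, 0.6, -0.2])
```

For this input the threshold works out to $θ = 0.2$, so the projection is $(0.6, 0.4, 0)$. The sort dominates the cost, which gives the $O (r log r)$ bound quoted above.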

In the multi-view setting, each view v produces a bipartite graph $Z^{v}$, and fusing $\{Z^{v}\}_{v = 1}^{V}$ into a consensus bipartite graph that captures the shared cluster structure is the central algorithmic challenge.

The orthogonality constraint

W^{⊤} W = I_{p}, W \in R^{n \times p},

(6)

requires that the columns of W are orthonormal. The projection matrix

W^{v}

satisfies

{(W^{v})}^{⊤} W^{v} = I_{d_{v}^{'}}

, and the rotation matrix R satisfies

R^{⊤} R = I_{c}

. Both constraints admit efficient solutions through eigenvalue decomposition and singular value decomposition, as detailed in Section 3.

2.3. Matrix Inequalities

The closed-form updates for the projection matrix, the spectral embedding, and the rotation matrix each rely on one of the following two inequalities.

Lemma 2

(Courant–Fischer min-max theorem [27]). Let

M \in R^{n \times n}

be a symmetric matrix with eigenvalues

λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

. For any W with

W^{⊤} W = I_{p}

and

p \leq n

,

Tr (W^{⊤} M W) \geq \sum_{k = 1}^{p} λ_{k},

(7)

and equality holds when the columns of W are the eigenvectors of M corresponding to

λ_{1}, \dots, λ_{p}

.

This result provides the global minimizer of

Tr (W^{⊤} M W)

subject to

W^{⊤} W = I_{p}

, which governs the optimal projection subspace for each view.
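As a quick numerical sanity check of Lemma 2, the sketch below compares the trace attained by the eigenvectors of the p smallest eigenvalues with the trace attained by a random orthonormal basis. The random test matrix, the seed, and the sizes are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

B = rng.normal(size=(n, n))
M = B + B.T                                  # arbitrary symmetric matrix

eigvals, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
W_opt = eigvecs[:, :p]                       # eigenvectors of the p smallest eigenvalues
trace_opt = np.trace(W_opt.T @ M @ W_opt)

# Any other matrix with orthonormal columns should give a trace that is at least as large.
W_rand, _ = np.linalg.qr(rng.normal(size=(n, p)))
trace_rand = np.trace(W_rand.T @ M @ W_rand)
```

Here `trace_opt` coincides with the sum of the p smallest eigenvalues, and `trace_rand` cannot fall below it.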

Lemma 3

(von Neumann trace inequality [28]). Let

A, B \in R^{m \times n}

with singular values

σ_{1} (A) \geq \dots \geq σ_{min (m, n)} (A)

and

σ_{1} (B) \geq \dots \geq σ_{min (m, n)} (B)

, respectively. Then

Tr (A^{⊤} B) \leq \sum_{i = 1}^{min (m, n)} σ_{i} (A) σ_{i} (B),

(8)

with equality when

A = U_{B} V_{B}^{⊤}

, where

B = U_{B} Σ_{B} V_{B}^{⊤}

is a singular value decomposition of B.

2.4. Spectral Rotation

The eigenvector matrix F obtained from a graph Laplacian is determined only up to multiplication by an arbitrary orthogonal matrix. Two bases F and

F Q

with

Q^{⊤} Q = I_{c}

span the same subspace and encode the same partition information, yet K-means applied to F and to

F Q

may yield different cluster assignments.

Spectral rotation [24] eliminates this ambiguity. Given F with

F^{⊤} F = I_{c}

and a discrete indicator matrix

Y \in Ind

, the joint optimization

max_{R^{⊤} R = I_{c}, Y \in Ind} Tr ({(F R)}^{⊤} Y)

(9)

aligns the rotated embedding $F R$ with Y by alternating two steps. With Y fixed, the optimal R is obtained by solving the orthogonal Procrustes problem via Lemma 3. With R fixed, the optimal Y is obtained by selecting the largest entry in each row of $F R$, a deterministic operation that requires no random initialization. The alternating procedure converges monotonically because each step maximizes the trace objective over its own feasible set and therefore never decreases it, while the objective is bounded above. This mechanism produces identical cluster labels across independent runs for any fixed input F, resolving the reproducibility issue associated with K-means post-processing.
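The alternation between the Procrustes step and the row-wise argmax can be sketched as follows. This is a minimal NumPy illustration, assuming an identity initialization for R and a fixed iteration cap; the function name `spectral_rotation`, the parameter `n_iter`, and the toy embedding are not from the paper.

```python
import numpy as np

def spectral_rotation(F, n_iter=50):
    """Alternate between the Procrustes update of R and the argmax update of Y."""
    N, c = F.shape
    R = np.eye(c)                                  # assumed initialization
    labels = np.argmax(F @ R, axis=1)
    for _ in range(n_iter):
        Y = np.zeros((N, c))
        Y[np.arange(N), labels] = 1.0              # indicator matrix for current labels
        U, _, Vt = np.linalg.svd(F.T @ Y)          # Procrustes: R = U V^T
        R = U @ Vt
        new_labels = np.argmax(F @ R, axis=1)      # row-wise largest entry of F R
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, R

# Toy embedding: an ideal two-cluster indicator, normalized and rotated by 30 degrees.
F0 = np.zeros((6, 2))
F0[:3, 0] = 1.0 / np.sqrt(3.0)
F0[3:, 1] = 1.0 / np.sqrt(3.0)
t = np.pi / 6
Q = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
labels, R = spectral_rotation(F0 @ Q)
```

No random numbers are drawn anywhere in the loop, so repeated calls on the same F return the same labels.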

3. Proposed Method

This section presents the PEBGL framework, develops each model component, and derives the alternating optimization procedure.

3.1. Projected Bipartite Graph Learning

Existing anchor-based methods construct bipartite graphs directly in the original feature space. When the feature dimensionality is high, pairwise distances between samples and anchors tend to concentrate, degrading the quality of the resulting affinity graphs. To mitigate this effect, an orthogonal projection is introduced for each view, mapping the original features into a compact subspace before graph construction.

For each view v, a matrix

W^{v} \in R^{d_{v} \times d_{v}^{'}}

with orthonormal columns, satisfying

{(W^{v})}^{⊤} W^{v} = I_{d_{v}^{'}}

, projects both the data matrix

X^{v}

and the anchor matrix

A^{v}

into a common low-dimensional space,

{\tilde{X}}^{v} = X^{v} W^{v}, {\tilde{A}}^{v} = A^{v} W^{v} .

(10)

The projected dimension

d_{v}^{'}

is determined individually for each view by retaining the leading principal components that account for at least 95% of the cumulative variance. A lower bound of c is imposed to preserve sufficient discriminative capacity. For high-dimensional small-sample datasets with

d_{v} ≫ N

, the projected dimension is capped at c to avoid an underdetermined linear system.
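The dimension-selection rule can be sketched as below. This is a hedged illustration rather than the authors' code: the function name `pca_projection` is hypothetical, the principal directions are computed from mean-centered data, and the condition $d_{v} ≫ N$ is approximated by the simple test `d > N`, which is an assumption since the text does not give a numeric criterion.

```python
import numpy as np

def pca_projection(X, c, var_ratio=0.95):
    """Return W with orthonormal columns spanning the leading principal directions of X."""
    N, d = X.shape
    Xc = X - X.mean(axis=0)                        # centering used only to find directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                   # proportional to per-component variance
    cum = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cum, var_ratio)) + 1   # smallest k reaching the variance ratio
    k = max(k, c)                                  # lower bound of c
    if d > N:                                      # assumed stand-in for d_v >> N
        k = c
    k = min(k, Vt.shape[0])                        # cannot exceed the available directions
    return Vt[:k].T                                # shape (d, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])
W = pca_projection(X, c=2)
X_proj = X @ W                                     # projected data as in (10)
```

In this synthetic example two directions carry almost all of the variance, so two columns are retained.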

In the general formulation,

W^{v}

appears as an optimization variable subject to

{(W^{v})}^{⊤} W^{v} = I_{d_{v}^{'}}

. With all other variables held fixed, the subproblem for

W^{v}

reduces to

min_{{(W^{v})}^{⊤} W^{v} = I_{d_{v}^{'}}} {∥X^{v} W^{v} - Z^{v ⊤} A^{v} W^{v}∥}_{F}^{2} .

(11)

Let $E^{v} = X^{v} - Z^{v ⊤} A^{v} \in R^{N \times d_{v}}$ denote the reconstruction residual. Using the identity ${∥ M ∥}_{F}^{2} = Tr (M^{⊤} M)$, the objective in (11) becomes ${∥ E^{v} W^{v} ∥}_{F}^{2} = Tr [{(W^{v})}^{⊤} {(E^{v})}^{⊤} E^{v} W^{v}]$. By the Courant–Fischer min-max theorem [27] (Lemma 2), the global minimizer is $W^{v} = [q_{1}, \dots, q_{d_{v}^{'}}]$, formed by the eigenvectors of ${(E^{v})}^{⊤} E^{v}$ corresponding to its $d_{v}^{'}$ smallest eigenvalues. Because the dominant eigenvectors of ${(E^{v})}^{⊤} E^{v}$ are governed by the principal subspace of $X^{v}$, precomputing $W^{v}$ via PCA on $X^{v}$ yields a close approximation to this minimizer while avoiding potential numerical difficulties in the alternating loop. Accordingly, $W^{v}$ is computed once and held fixed throughout the subsequent optimization.

A prerequisite for bipartite graph methods is the construction of semantically aligned anchors across views. Independent per-view anchor generation risks semantic misalignment, where the k-th anchor in one view corresponds to a different data region than the k-th anchor in another view. To ensure consistent anchor semantics, all views are concatenated into

X_{cat} = [X^{1}, X^{2}, \dots, X^{V}] \in R^{N \times D}

with

D = \sum_{v} d_{v}

, and K-means is performed on

X_{cat}

to obtain r global centroids. These centroids are then split along the feature dimension to recover view-specific anchor matrices

A^{v}

, guaranteeing that corresponding anchors share the same global origin across all views.
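The concatenate, cluster, and split procedure can be sketched as follows. The K-means routine here is a deliberately small Lloyd iteration written only for illustration; its random initialization, the `seed` and `n_iter` parameters, and the function names `kmeans_centroids` and `aligned_anchors` are assumptions, since the text does not specify which K-means variant is used.

```python
import numpy as np

def kmeans_centroids(X, r, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns r centroids of the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=r, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)                 # nearest centroid per sample
        for j in range(r):
            members = X[assign == j]
            if len(members) > 0:                   # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def aligned_anchors(views, r, seed=0):
    """Cluster the concatenated views, then split centroids back into per-view anchors."""
    dims = [X.shape[1] for X in views]
    X_cat = np.hstack(views)                       # shape (N, D) with D = sum of d_v
    centroids = kmeans_centroids(X_cat, r, seed=seed)
    split_points = np.cumsum(dims)[:-1]
    return np.split(centroids, split_points, axis=1)

rng = np.random.default_rng(1)
views = [rng.normal(size=(30, 4)), rng.normal(size=(30, 3))]
anchors = aligned_anchors(views, r=5)
```

Because every view's anchor matrix is a column slice of the same centroid matrix, the k-th row in each slice refers to the same global centroid.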

Given the projected data

{\tilde{X}}^{v}

and projected anchors

{\tilde{A}}^{v}

, a view-specific bipartite graph

Z^{v}

is constructed to encode the affinity between each sample and the r anchors. Following the anchor-based subspace learning paradigm [14,15], each projected sample is modeled as a non-negative linear combination of the projected anchors, leading to the reconstruction objective

min_{Z^{v}} {∥{\tilde{X}}^{v ⊤} - {\tilde{A}}^{v ⊤} Z^{v}∥}_{F}^{2} + α {∥Z^{v}∥}_{F}^{2}, s . t . Z^{v} \geq 0, Z^{v ⊤} 1 = 1,

(12)

where

α > 0

is a regularization parameter that prevents trivial solutions and promotes well-conditioned graphs. The non-negativity constraint

Z^{v} \geq 0

ensures that the affinity values are interpretable as membership weights, and the column-sum constraint

Z^{v ⊤} 1 = 1

normalizes each sample’s affinities to lie on the probability simplex. The entry

Z_{j k}^{v}

represents the reconstruction weight of the k-th sample with respect to the j-th anchor.

The reconstruction is performed in the projected space

R^{d_{v}^{'}}

rather than in the original space

R^{d_{v}}

. This distinction is essential when

d_{v}

is large, because the Frobenius norm in (12) measures distances through

d_{v}^{'}

-dimensional inner products, yielding more reliable affinity estimates by concentrating on the variance-preserving directions.

3.2. Consensus Graph Fusion

Each view provides a different perspective of the underlying cluster structure, and the view-specific bipartite graphs $\{Z^{v}\}_{v = 1}^{V}$ may exhibit both complementary patterns and view-specific artifacts. To aggregate cross-view information, a consensus bipartite graph P is introduced through the weighted discrepancy minimization

min_{P \geq 0} \sum_{v = 1}^{V} ω_{v} {∥P - Z^{v}∥}_{F}^{2},

(13)

where

ω_{v} > 0

denotes the adaptive weight for the v-th view. This formulation encourages P to serve as a centroid of the view-specific graphs in the Frobenius norm sense, with views of higher quality exerting greater influence through larger weights. The fusion objective in (13) treats all views with permutation symmetry, meaning that relabeling the view indices does not alter the optimization landscape. Unlike methods that fuse graphs at the spectral embedding level [17] or through direct concatenation [9], operating in the bipartite graph space preserves the

O (N r)

complexity throughout the fusion process.

Different views contribute unequally to the clustering task due to varying feature quality and relevance to the underlying structure. To adaptively balance view contributions, each view is assigned a weight

ω_{v} > 0

subject to

\sum_{v = 1}^{V} ω_{v} = 1

, and an entropy regularization term is introduced as

δ \sum_{v = 1}^{V} ω_{v} ln ω_{v},

(14)

where

δ > 0

controls the smoothness of the weight distribution. A large

δ

drives the weights toward the uniform distribution

ω_{v} = 1 / V

, while a small

δ

concentrates weight on the views with the lowest reconstruction cost. The entropy term prevents degenerate solutions where all weight collapses onto a single view, and it admits a closed-form softmax solution as derived in Section 3.4. The temperature parameter is set adaptively as

δ = \bar{h} / ln (V + 1)

, where

\bar{h} = (1 / V) \sum_{v = 1}^{V} h_{v}

denotes the mean per-view cost, thereby scaling the entropy regularization to the magnitude of the objective without manual tuning.

To extract the cluster structure encoded in the consensus bipartite graph P, the augmented bipartite graph is formed as

S = [\begin{matrix} 0 & P^{⊤} \\ P & 0 \end{matrix}] \in R^{(N + r) \times (N + r)} .

(15)

The symmetric normalized Laplacian of S is

L_{S} = I - D_{S}^{- 1 / 2} S D_{S}^{- 1 / 2}

, where

D_{S}

denotes the diagonal degree matrix. By Lemma 1, the number of connected components in S equals the multiplicity of the zero eigenvalue of

L_{S}

. Let

L_{P}

denote the

N \times N

symmetric normalized Laplacian obtained by restricting

L_{S}

to the sample nodes. A spectral embedding

\hat{F}

is extracted from the c eigenvectors of

L_{P}

associated with the smallest eigenvalues, subject to the orthonormality constraint

{\hat{F}}^{⊤} \hat{F} = I_{c}

. The spectral embedding term

Tr ({\hat{F}}^{⊤} L_{P} \hat{F})

(16)

encourages P to possess a clear block-diagonal structure conducive to clustering.

To bridge the gap between the continuous embedding

\hat{F}

and discrete cluster assignments, the spectral rotation framework [24] is adopted. An orthogonal matrix R is sought such that the rotated embedding

\hat{F} R

aligns closely with a discrete indicator matrix

Y \in Ind

. The alignment is measured by

max_{R^{⊤} R = I_{c}, Y \in Ind} Tr ({(\hat{F} R)}^{⊤} Y) .

(17)

The orthogonal rotation R absorbs the inherent rotational ambiguity of the spectral embedding, and the indicator matrix Y directly encodes cluster membership through the row-wise argmax of

\hat{F} R

, bypassing K-means entirely. This produces deterministic cluster labels for any fixed parameter configuration.

3.3. Objective Function

Assembling all components, the complete optimization problem is

\begin{aligned}
\min_{Z^{v}, P, \hat{F}, R, Y, ω} \quad & \sum_{v = 1}^{V} ω_{v} {∥{\tilde{X}}^{v ⊤} - {\tilde{A}}^{v ⊤} Z^{v}∥}_{F}^{2} + α \sum_{v = 1}^{V} ω_{v} {∥Z^{v}∥}_{F}^{2} + β \sum_{v = 1}^{V} ω_{v} {∥P - Z^{v}∥}_{F}^{2} \\
& + Tr ({\hat{F}}^{⊤} L_{P} \hat{F}) - η Tr ({(\hat{F} R)}^{⊤} Y) + δ \sum_{v = 1}^{V} ω_{v} \ln ω_{v} \\
s.t. \quad & Z^{v} \geq 0, \; Z^{v ⊤} 1 = 1, \; P \geq 0, \; {\hat{F}}^{⊤} \hat{F} = I_{c}, \; R^{⊤} R = I_{c}, \; Y \in Ind, \; \sum_{v = 1}^{V} ω_{v} = 1,
\end{aligned}

(18)

where

α

,

β

,

η > 0

are trade-off parameters. The six terms in (18) serve complementary roles. The reconstruction term and the Frobenius regularization together measure anchor-based approximation quality in the projected subspace while preventing trivial solutions for

Z^{v}

. The consensus fusion term couples all views through the shared graph P, and the spectral embedding term extracts the block-diagonal structure of P via its symmetric normalized Laplacian

L_{P}

. The discrete rotation term aligns the continuous embedding to cluster indicators, and the entropy term adaptively distributes view weights.

3.4. Optimization

Problem (18) is solved by alternating minimization. Each variable is updated in turn while all others are held fixed, and every subproblem admits either a closed-form solution or an efficient iterative procedure. The update rules are presented below as steps (A) through (E).

(A) Update $Z^{v}$ . With all other variables fixed, the positive weight $ω_{v}$ does not affect the location of the minimizer and can be dropped. The subproblem for the v-th view is

min_{Z^{v} \geq 0, Z^{v ⊤} 1 = 1} {∥{\tilde{X}}^{v ⊤} - {\tilde{A}}^{v ⊤} Z^{v}∥}_{F}^{2} + α {∥Z^{v}∥}_{F}^{2} + β {∥P - Z^{v}∥}_{F}^{2} .

(19)

Expanding each term using

{∥ M ∥}_{F}^{2} = Tr (M^{⊤} M)

yields

{∥{\tilde{X}}^{v ⊤} - {\tilde{A}}^{v ⊤} Z^{v}∥}_{F}^{2} = Tr ({\tilde{X}}^{v} {\tilde{X}}^{v ⊤}) - 2 Tr (Z^{v} {\tilde{X}}^{v} {\tilde{A}}^{v ⊤}) + Tr (Z^{v ⊤} {\tilde{A}}^{v} {\tilde{A}}^{v ⊤} Z^{v}),

(20)

α {∥Z^{v}∥}_{F}^{2} = α Tr (Z^{v} Z^{v ⊤}),

(21)

β {∥P - Z^{v}∥}_{F}^{2} = β Tr (P P^{⊤}) - 2 β Tr (P Z^{v ⊤}) + β Tr (Z^{v} Z^{v ⊤}) .

(22)

Collecting all terms that depend on

Z^{v}

and discarding constants, the objective reduces to

Tr ({\tilde{A}}^{v} {\tilde{A}}^{v ⊤} Z^{v} Z^{v ⊤}) + (α + β) Tr (Z^{v} Z^{v ⊤}) - 2 Tr (Z^{v} {\tilde{X}}^{v} {\tilde{A}}^{v ⊤}) - 2 β Tr (P Z^{v ⊤}) .

(23)

Taking the matrix derivative with respect to

Z^{v}

and applying the identities

\frac{\partial Tr (Z^{⊤} A Z)}{\partial Z} = 2 A Z, \frac{\partial Tr (B Z^{⊤})}{\partial Z} = B,

(24)

where A is symmetric, the stationarity condition becomes

2 {\tilde{A}}^{v} {\tilde{A}}^{v ⊤} Z^{v} + 2 (α + β) Z^{v} - 2 {\tilde{A}}^{v} {\tilde{X}}^{v ⊤} - 2 β P = 0 .

(25)

Dividing by 2 and rearranging,

[{\tilde{A}}^{v} {\tilde{A}}^{v ⊤} + (α + β) I_{r}] Z^{v} = {\tilde{A}}^{v} {\tilde{X}}^{v ⊤} + β P .

(26)

The coefficient matrix

Φ^{v} = {\tilde{A}}^{v} {\tilde{A}}^{v ⊤} + (α + β) I_{r}

(27)

is symmetric positive definite, since

{\tilde{A}}^{v} {\tilde{A}}^{v ⊤}

is positive semidefinite and

α + β > 0

. The unconstrained minimizer is therefore unique and given by

{\hat{Z}}^{v} = {(Φ^{v})}^{- 1} [{\tilde{A}}^{v} {\tilde{X}}^{v ⊤} + β P] .

(28)

The matrix

Φ^{v}

is of size

r \times r

with

r ≪ N

, so its Cholesky factorization costs

O (r^{3})

and the subsequent back-substitution costs

O (r^{2} N)

. The dominant cost is the product

{\tilde{A}}^{v} {\tilde{X}}^{v ⊤}

, which requires

O (r d_{v}^{'} N)

operations. The regularization term

β P

in the right-hand side of (28) draws each view-specific graph toward the shared consensus graph, with the strength of this effect governed by

β

.

The unconstrained solution ${\hat{Z}}^{v}$ generally violates the constraints $Z^{v} \geq 0$ and $Z^{v ⊤} 1 = 1$. Each column y of ${\hat{Z}}^{v}$ is therefore projected onto the probability simplex $Δ^{r - 1}$ defined in (5), yielding $z_{i}^{*} = max (y_{i} - θ, 0)$, where the threshold $θ$ satisfies $\sum_{i} max (y_{i} - θ, 0) = 1$ and is determined in $O (r log r)$ time [26].
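Step (A) can be sketched as a linear solve followed by a column-wise simplex projection. This is a minimal NumPy illustration under stated assumptions: `X_t` and `A_t` stand for the projected data and anchors, the function names are hypothetical, and NumPy's general solver is applied to the Cholesky factors because NumPy itself has no dedicated triangular solver.

```python
import numpy as np

def project_columns_to_simplex(M):
    """Project every column of M (shape r x N) onto the probability simplex."""
    r, N = M.shape
    U = -np.sort(-M, axis=0)                       # each column in descending order
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, r + 1)[:, None]
    cond = U - css / k > 0
    rho = r - 1 - np.argmax(cond[::-1], axis=0)    # last row index where cond holds
    theta = css[rho, np.arange(N)] / (rho + 1)     # one threshold per column
    return np.maximum(M - theta[None, :], 0.0)

def update_Z(X_t, A_t, P, alpha, beta):
    """Solve (26) for the unconstrained minimizer (28), then project its columns."""
    r = A_t.shape[0]
    Phi = A_t @ A_t.T + (alpha + beta) * np.eye(r)   # r x r, symmetric positive definite
    rhs = A_t @ X_t.T + beta * P                     # r x N right-hand side
    L = np.linalg.cholesky(Phi)                      # Phi = L L^T
    Z_hat = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    return project_columns_to_simplex(Z_hat)

rng = np.random.default_rng(0)
X_t = rng.normal(size=(20, 3))                     # N = 20 samples, projected dimension 3
A_t = rng.normal(size=(4, 3))                      # r = 4 anchors
P = np.full((4, 20), 0.25)                         # uniform consensus graph as a placeholder
Z = update_Z(X_t, A_t, P, alpha=1.0, beta=1.0)
```

The returned matrix satisfies both constraints by construction: every entry is non-negative and every column sums to one.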

(B) Update P. With all other variables fixed, the terms involving P are

min_{P \geq 0} β \sum_{v = 1}^{V} ω_{v} {∥P - Z^{v}∥}_{F}^{2} + Tr ({\hat{F}}^{⊤} L_{P} \hat{F}) .

(29)

A two-stage approach is adopted. The fusion term is minimized by expanding

\sum_{v} ω_{v} {∥ P - Z^{v} ∥}_{F}^{2}

and differentiating with respect to P,

2 (\sum_{v = 1}^{V} ω_{v}) P - 2 \sum_{v = 1}^{V} ω_{v} Z^{v} = 0,

(30)

which yields the weighted average

\bar{Z} = \frac{\sum_{v = 1}^{V} ω_{v} Z^{v}}{\sum_{v = 1}^{V} ω_{v}},

(31)

followed by projection onto the non-negative orthant. The spectral term is then incorporated through coclustering refinement [10,11]. The augmented graph S is formed as in (15) and its symmetric normalized Laplacian is

L_{S} = I - D_{S}^{- 1 / 2} S D_{S}^{- 1 / 2}

. The refinement solves

min_{P \geq 0, P^{⊤} 1 = 1} {∥P - \bar{Z}∥}_{F}^{2} + 2 λ Tr (F^{⊤} L_{S} F), s . t . F^{⊤} F = I_{c},

(32)

where

λ > 0

is automatically adjusted. Following the adaptive strategy of [11,14],

λ

is doubled when the number of near-zero eigenvalues of

L_{S}

is less than c and halved when it exceeds c, until the augmented graph possesses exactly c connected components. This mechanism eliminates a hyperparameter while imposing the desired cluster count.
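The bookkeeping around step (B) can be sketched as below: the weighted average of (31) with non-negative clipping, the doubling and halving rule for $λ$, and label decoding from connected components. The refinement problem itself is not implemented here. All function names, the union-find decoding, and the edge threshold `tol` are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def fuse_graphs(Z_list, w):
    """Weighted average of the view graphs, clipped to the non-negative orthant."""
    w = np.asarray(w, dtype=float)
    Z_bar = sum(wv * Zv for wv, Zv in zip(w, Z_list)) / w.sum()
    return np.maximum(Z_bar, 0.0)

def adjust_lambda(lam, n_zero, c):
    """Double lambda with too few near-zero eigenvalues, halve it with too many."""
    if n_zero < c:
        return 2.0 * lam
    if n_zero > c:
        return 0.5 * lam
    return lam

def component_labels(P, tol=1e-12):
    """Label samples by connected component of the bipartite graph P (r x N)."""
    r, N = P.shape
    parent = list(range(N + r))                    # nodes 0..N-1 samples, N..N+r-1 anchors

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]          # path halving
            i = parent[i]
        return i

    for j, i in zip(*np.nonzero(P > tol)):         # edge between anchor j and sample i
        ra, rb = find(int(i)), find(N + int(j))
        if ra != rb:
            parent[ra] = rb
    roots = [find(i) for i in range(N)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

Z1 = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
Z2 = np.array([[0.5, 1.0, 0.0, 0.0], [0.5, 0.0, 1.0, 1.0]])
P_bar = fuse_graphs([Z1, Z2], [0.25, 0.75])
labels = component_labels(Z1)
```

In the toy graphs, `Z1` has two disconnected blocks, whereas the fused graph `P_bar` is connected through sample 0; this is the situation in which the refinement would increase $λ$.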

(C) Update $\hat{F}$ . With all other variables fixed, the subproblem is

min_{{\hat{F}}^{⊤} \hat{F} = I_{c}} Tr ({\hat{F}}^{⊤} L_{P} \hat{F}) - η Tr ({(\hat{F} R)}^{⊤} Y),

(33)

which is equivalently written as

max_{{\hat{F}}^{⊤} \hat{F} = I_{c}} Tr [{\hat{F}}^{⊤} (- L_{P} \hat{F} + η Y R^{⊤})] .

(34)

Following Luo et al. [18], this is solved via iterative SVD. At the k-th inner step, form

N_{k} = - 2 L_{P} {\hat{F}}_{k} + η Y R^{⊤},

(35)

compute its thin SVD

N_{k} = U_{k} Σ_{k} V_{k}^{⊤}

, and update

{\hat{F}}_{k + 1} = U_{k} V_{k}^{⊤} .

(36)

By the von Neumann trace inequality, the update

{\hat{F}}_{k + 1} = U_{k} V_{k}^{⊤}

maximizes

Tr ({\hat{F}}^{⊤} N_{k})

over all matrices with orthonormal columns. This guarantees monotonic improvement at each inner step. The embedding

\hat{F}

is initialized from the spectral decomposition of P and evolves smoothly across outer iterations without re-initialization; two to three inner iterations suffice in practice.
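The inner loop of step (C) can be sketched in a few lines. This is a minimal NumPy illustration assuming a fixed number of inner iterations; the function name `update_F`, the parameter `n_inner`, and the random test inputs are hypothetical.

```python
import numpy as np

def update_F(L_P, F, Y, R, eta, n_inner=3):
    """Iterative SVD update of the embedding following (35) and (36)."""
    for _ in range(n_inner):
        N_k = -2.0 * L_P @ F + eta * Y @ R.T               # matrix of (35)
        U, _, Vt = np.linalg.svd(N_k, full_matrices=False)  # thin SVD
        F = U @ Vt                                          # update of (36)
    return F

rng = np.random.default_rng(0)
N, c = 8, 2
B = rng.normal(size=(N, N))
L_P = B @ B.T / N                                  # stand-in symmetric PSD matrix
F0, _ = np.linalg.qr(rng.normal(size=(N, c)))      # orthonormal starting point
Y = np.zeros((N, c))
Y[np.arange(N), np.arange(N) % c] = 1.0            # arbitrary indicator matrix
F_new = update_F(L_P, F0, Y, np.eye(c), eta=1.0)
```

Each iterate has orthonormal columns because it is a product of the two orthonormal factors of a thin SVD, so the constraint ${\hat{F}}^{⊤} \hat{F} = I_{c}$ is maintained without a separate normalization step.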

(D) Update R and Y. With all other variables fixed, the subproblem for R reduces to the orthogonal Procrustes problem

max_{R^{⊤} R = I_{c}} Tr (R^{⊤} {\hat{F}}^{⊤} Y) .

(37)

Let

{\hat{F}}^{⊤} Y = U_{R} Σ_{R} V_{R}^{⊤}

be the thin SVD. By the von Neumann trace inequality (Lemma 3),

Tr (R^{⊤} {\hat{F}}^{⊤} Y) = Tr (R^{⊤} U_{R} Σ_{R} V_{R}^{⊤}) \leq Tr (Σ_{R}),

(38)

and the upper bound is attained at

R^{*} = U_{R} V_{R}^{⊤} .

(39)

For the indicator matrix Y, the subproblem with R and $\hat{F}$ fixed is

max_{Y \in Ind} Tr (Y^{⊤} \hat{F} R) .

(40)

Let $F^{*} = \hat{F} R$. Since Y is an indicator matrix, the trace decomposes row-wise as $Tr (Y^{⊤} F^{*}) = \sum_{i = 1}^{N} F_{i, c_{i}}^{*}$, where $c_{i}$ indexes the unit entry in the i-th row of Y. Each row is independent, and the global maximum is achieved by

Y_{i j} = \begin{cases} 1 & \text{if } j = \arg \max_{k} F_{i k}^{*}, \\ 0 & \text{otherwise}. \end{cases}

(41)
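Both closed-form updates of step (D) are short enough to sketch directly. The function names below are hypothetical, and the tie-breaking behavior (the first maximal column wins) is a property of `np.argmax`, not something stated in the text; note also that the row-wise rule by itself does not prevent a cluster from becoming empty.

```python
import numpy as np

def update_rotation(F, Y):
    """Orthogonal Procrustes step (37)-(39): R* = U V^T from the SVD of F^T Y."""
    U, _, Vt = np.linalg.svd(F.T @ Y, full_matrices=False)
    return U @ Vt

def update_indicator(F, R):
    """Row-wise argmax rule (41) applied to F* = F R; returns Y and the label vector."""
    F_star = F @ R
    labels = np.argmax(F_star, axis=1)
    Y = np.zeros_like(F_star)
    Y[np.arange(F_star.shape[0]), labels] = 1.0
    return Y, labels

# Toy example: a scaled indicator matrix is a perfectly aligned embedding.
labels0 = np.array([0, 0, 1, 1, 2])
Y0 = np.eye(3)[labels0]
F_toy = Y0 / np.sqrt(Y0.sum(axis=0))               # orthonormal columns
R_toy = update_rotation(F_toy, Y0)
Y1, labels1 = update_indicator(F_toy, R_toy)
```

In this toy case the embedding is already aligned with the indicator, so the optimal rotation is the identity and the recovered labels equal `labels0`.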

(E) Update ω. Collecting all terms involving $ω_{v}$ from (18), the subproblem is

min_{ω_{v} > 0, \sum_{v} ω_{v} = 1} \sum_{v = 1}^{V} ω_{v} h_{v} + δ \sum_{v = 1}^{V} ω_{v} ln ω_{v},

(42)

where the per-view cost aggregating the reconstruction, regularization, and consensus terms of (18) is

h_{v} = {∥{\tilde{X}}^{v ⊤} - {\tilde{A}}^{v ⊤} Z^{v}∥}_{F}^{2} + α {∥Z^{v}∥}_{F}^{2} + β {∥P - Z^{v}∥}_{F}^{2} .

(43)

Introducing a Lagrange multiplier $μ$ for the constraint $\sum_{v} ω_{v} = 1$, the Lagrangian is

L (ω, μ) = \sum_{v = 1}^{V} ω_{v} h_{v} + δ \sum_{v = 1}^{V} ω_{v} ln ω_{v} - μ (\sum_{v = 1}^{V} ω_{v} - 1) .

(44)

Setting $\partial L / \partial ω_{v} = 0$ gives

h_{v} + δ (ln ω_{v} + 1) - μ = 0,

(45)

from which

ω_{v} = exp (\frac{μ - δ - h_{v}}{δ}) = exp (\frac{μ - δ}{δ}) exp (\frac{- h_{v}}{δ}) .

(46)

Substituting into $\sum_{v} ω_{v} = 1$ to eliminate $μ$ yields

exp (\frac{μ - δ}{δ}) = {[\sum_{k = 1}^{V} exp (\frac{- h_{k}}{δ})]}^{- 1} .

(47)

Substituting back into (46) gives the softmax solution

ω_{v} = \frac{exp (- h_{v} / δ)}{\sum_{k = 1}^{V} exp (- h_{k} / δ)} .

(48)

The objective (42) is strictly convex in $ω$ for any $δ > 0$ because its Hessian is $δ \cdot diag (1 / ω_{1}, \dots, 1 / ω_{V})$, which is positive definite. Therefore (48) is the unique global minimizer.
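The softmax solution (48) can be evaluated in a numerically stable way by shifting the costs before exponentiation, which leaves the ratio unchanged. The sketch below is illustrative: the function name is hypothetical, the default temperature uses $δ = \bar{h} / ln (V + 1)$ with $\bar{h}$ taken as the mean of the supplied costs (an assumption about how $\bar{h}$ is computed), and the costs are assumed to be positive.

```python
import numpy as np

def update_view_weights(h, delta=None):
    """Entropy-regularized view weights: w_v proportional to exp(-h_v / delta)."""
    h = np.asarray(h, dtype=float)
    if delta is None:
        # Assumed adaptive temperature: mean cost divided by ln(V + 1).
        delta = h.mean() / np.log(h.size + 1)
    z = -(h - h.min()) / delta                      # shift so the largest exponent is 0
    w = np.exp(z)
    return w / w.sum()

w_toy = update_view_weights([1.0, 2.0, 3.0], delta=1.0)
```

Views with a smaller cost $h_{v}$ receive a larger weight, and equal costs yield uniform weights $1 / V$.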

3.5. Algorithm Summary

The complete procedure is summarized in Algorithm 1.

Algorithm 1 PEBGL: Projection-Enhanced Bipartite Graph Learning.

Require: Multi-view data $\{X^{v}\}_{v = 1}^{V}$, number of clusters c, anchor count r, parameters $α$, $β$, $η$.
1: Generate joint anchors $\{A^{v}\}$ via concatenation and K-means with 30 restarts.
2: Compute PCA projections $W^{v}$; form ${\tilde{X}}^{v} = X^{v} W^{v}$ and ${\tilde{A}}^{v} = A^{v} W^{v}$.
3: Initialize $Z^{v}$ via distance-based simplex projection.
4: Initialize $P = (1 / V) \sum_{v} Z^{v}$.
5: Initialize $\hat{F}$ from the spectral embedding of P.
6: Initialize Y via K-means on $\hat{F}$; set $R = I_{c}$; set $ω_{v} = 1 / V$.
7: while not converged do
8:     Update $Z^{v}$ for each view via Equation (28) and simplex projection.
9:     Update P via Equation (31) and coclustering refinement (32).
10:     Update $\hat{F}$ via Equations (35) and (36) with 2–3 inner SVD iterations.
11:     Update R via Equation (39).
12:     Update Y via Equation (41).
13:     Update $ω$ via Equation (48) with $δ = \bar{h} / ln (V + 1)$.
14: end while
Ensure: Cluster labels from connected-component detection on P.

Remark 1.

The indicator matrix Y is initialized via K-means applied to the initial spectral embedding $\hat{F}$. Within the main loop, Y is updated exclusively by the row-wise argmax rule (41) and no further K-means invocation occurs. Although Y also provides cluster labels through discrete rotation, the experiments in Section 4 report results from connected-component detection on the converged P, which is entirely deterministic for a fixed random seed in the anchor generation stage.
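Connected-component decoding on a bipartite graph can be sketched with a small union-find structure. This is an illustrative sketch, not the authors' code: the function name and the threshold `tol` for treating an entry of P as an edge are assumptions, and P is taken to be the $r \times N$ anchor-by-sample matrix of Table 1.

```python
import numpy as np

def labels_from_bipartite(P, tol=0.0):
    """Label each sample by the connected component it belongs to in the bipartite graph P (r x N)."""
    r, N = P.shape
    parent = list(range(N + r))                     # nodes 0..N-1 are samples, N..N+r-1 are anchors

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving keeps the trees shallow
            x = parent[x]
        return x

    rows, cols = np.nonzero(P > tol)                # (anchor, sample) pairs joined by an edge
    for j, i in zip(rows.tolist(), cols.tolist()):
        root_s, root_a = find(i), find(N + j)
        if root_s != root_a:
            parent[root_s] = root_a
    roots = np.array([find(i) for i in range(N)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Toy graph: anchors 0 and 1 serve samples 0-1, anchor 2 serves samples 2-3.
P_toy = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
cc_labels = labels_from_bipartite(P_toy)
```

Given a fixed P, this decoding involves no random choices, which is the property Remark 1 relies on.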

3.6. Complexity

The per-iteration cost of Algorithm 1 is analyzed as follows. Updating $Z^{v}$ for a single view requires $O (r d_{v}^{'} N)$ operations, dominated by the matrix product ${\tilde{A}}^{v} {\tilde{X}}^{v ⊤}$, with the Cholesky factorization of the $r \times r$ matrix $Φ^{v}$ contributing $O (r^{3})$ and the column-wise simplex projection contributing $O (N r log r)$. Summing over V views gives $O (V r d_{max}^{'} N)$. The consensus graph update via the weighted average and coclustering refinement costs $O ((V + T_{P}) r N)$, where $T_{P}$ is the number of inner refinement steps. The spectral embedding, rotation, and weight updates collectively cost $O (N c^{2})$. The overall per-iteration complexity is therefore $O (V r d_{max}^{'} N)$, which is linear in N. The PCA preprocessing incurs a one-time cost of $O (\sum_{v} d_{v}^{2} N)$ outside the iterative loop.
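The $O (N r log r)$ term comes from sorting each of the N columns of length r. A minimal sketch of a sort-based Euclidean projection onto the probability simplex, in the style of Duchi et al. [26], is given below; the function name is hypothetical and the vectorized layout is one possible choice rather than the authors' implementation.

```python
import numpy as np

def project_columns_to_simplex(M):
    """Project every column of M (r x N) onto {z : z >= 0, sum(z) = 1}."""
    r, N = M.shape
    U = -np.sort(-M, axis=0)                        # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0                # running sums minus the target total 1
    k = np.arange(1, r + 1)[:, None]
    cond = U - css / k > 0                          # always true in the first row
    rho = r - 1 - np.argmax(cond[::-1, :], axis=0)  # last row index where cond holds
    theta = css[rho, np.arange(N)] / (rho + 1)      # per-column shift
    return np.maximum(M - theta[None, :], 0.0)

M_toy = np.array([[0.2, 2.0, 1.0],
                  [0.3, 0.0, 1.0],
                  [0.5, 0.0, 1.0]])
Z_toy = project_columns_to_simplex(M_toy)
```

A column already on the simplex is returned unchanged, the column (2, 0, 0) is mapped to (1, 0, 0), and the column (1, 1, 1) is mapped to the uniform vector.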

4. Experiments

All experiments are executed in MATLAB R2024a on a machine equipped with an AMD Ryzen 9 7945HX processor and 32 GB of RAM.

4.1. Datasets and Evaluation Metrics

Six publicly available multi-view datasets spanning image recognition, handwritten digit classification, and plant species identification are adopted. Table 2 summarizes their statistics.

Here N, V, and c denote the number of samples, views, and clusters, respectively, and $d_{v}$ is the feature dimension of the v-th view.

MSRCV1 [29] contains 210 images from 7 semantic categories, each described by 5 feature types including color moments, HOG descriptors, GIST features, LBP textures, and CENTRIST features.

Yale [30] consists of 165 face images of 15 individuals captured under varying illumination and expressions, represented by 3 high-dimensional views.

ORL [31] includes 400 face images from 40 subjects with variations in lighting, expression, and facial details, also described by 3 views with the same feature types as Yale.

Handwritten [32] contains 2000 images of handwritten digits 0 through 9, characterized by 6 feature types including Fourier coefficients, profile correlations, Karhunen–Loève coefficients, pixel averages, Zernike moments, and morphological features.

100Leaves [33] comprises 1600 plant leaf samples from 100 species described by 3 views of shape, fine-scale margin, and texture descriptors, all with 64-dimensional features.

Caltech101-7 [34] contains 1474 images from 7 object categories, described by 6 views including Gabor features, wavelet moments, CENTRIST features, HOG descriptors, GIST features, and LBP textures.

Three widely used external evaluation metrics are adopted. Accuracy [35] measures the proportion of correctly assigned samples after optimal label permutation via the Hungarian algorithm. Normalized Mutual Information [36] quantifies the mutual dependence between predicted and true labels, normalized to the range $[0, 1]$. Purity [37] assigns each cluster to its dominant class and computes the fraction of all samples that belong to the dominant class of their cluster. Higher values indicate better clustering quality for all three metrics.
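Accuracy and Purity can both be computed from the cluster-by-class contingency table. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the Hungarian step uses SciPy's `linear_sum_assignment`, which is one standard solver for the assignment problem and not necessarily the routine used in the experiments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(y_true, y_pred):
    """Count matrix C[p, t] = number of samples in predicted cluster p with true class t."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    C = np.zeros((p.max() + 1, t.max() + 1), dtype=int)
    np.add.at(C, (p, t), 1)
    return C

def clustering_accuracy(y_true, y_pred):
    """Fraction of samples matched under the best one-to-one cluster-to-class mapping."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / len(y_true)

def purity(y_true, y_pred):
    """Fraction of samples that belong to the dominant class of their cluster."""
    C = contingency(y_true, y_pred)
    return C.max(axis=1).sum() / len(y_true)

acc_toy = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
pur_split = purity([0, 0, 0, 0], [0, 0, 1, 1])
acc_split = clustering_accuracy([0, 0, 0, 0], [0, 0, 1, 1])
```

The last two lines illustrate how the metrics differ: splitting one class into two clusters keeps Purity at 1 but halves Accuracy, because the one-to-one mapping can match only one of the two clusters.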

4.2. Experimental Setup

PEBGL is compared against seven recent multi-view clustering methods based on anchor graphs or bipartite graph learning. GFSC [38] performs multi-graph fusion for multi-view spectral clustering. LMVSC [9] constructs anchor graphs per view independently and fuses them through concatenation. CDMGC [7] decomposes view graphs into consistent and diverse components for fusion. MSGL [13] develops structured graph learning from single-view to multi-view settings. SFMC [11] learns a parameter-free consensus bipartite graph with a Laplacian rank constraint. FPMVS-CAG [12] integrates anchor selection and graph construction without explicit hyperparameter tuning. DiBGF-MGC [16] separates bipartite graphs into consistency and diversity components through intra-view and inter-view constraints.

The results of all seven baseline methods are taken from [16], where each method was executed 10 times and the mean performance was recorded. PEBGL employs connected-component detection on the converged consensus bipartite graph for cluster assignment, so the output is deterministic for any given parameter configuration. Anchors are generated by K-means with 30 restarts, and the random seed is fixed for reproducibility.

For PEBGL, the hyperparameters are tuned via a two-stage grid search. In the first stage, $α$ and $β$ are swept over the logarithmic grids $\{10^{- 3}, \dots, 10^{3}\}$ and $\{10^{- 3}, \dots, 10^{2}\}$, respectively, at each candidate r to identify the promising region. In the second stage, a finer search is performed around the identified optimum. The anchor count r is selected from a dataset-dependent candidate set ranging from c to $10 c$. The rotation trade-off is fixed at $η = 1$. The entropy temperature $δ$ is set adaptively as $δ = \bar{h} / ln (V + 1)$, where $\bar{h}$ denotes the mean reconstruction cost across views. For high-dimensional small-sample datasets where $d_{v} ≫ N$, the PCA projection dimension is capped at c to avoid having fewer samples than features after projection; this applies to ORL in the present experiments.
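The first (coarse) stage of the search can be sketched as an exhaustive sweep over the two logarithmic grids at each candidate r. Everything below is hypothetical scaffolding: `evaluate` stands in for a full PEBGL run that returns a score such as ACC, and the toy scoring function only exists to make the sketch executable.

```python
import itertools
import numpy as np

alphas = 10.0 ** np.arange(-3, 4)                   # 1e-3, ..., 1e3 (seven values)
betas = 10.0 ** np.arange(-3, 3)                    # 1e-3, ..., 1e2 (six values)

def coarse_grid_search(evaluate, r_candidates):
    """Return (score, r, alpha, beta) with the highest score over the coarse grid."""
    best = None
    for r, a, b in itertools.product(r_candidates, alphas, betas):
        score = evaluate(r, a, b)                   # hypothetical callback: one clustering run
        if best is None or score > best[0]:
            best = (score, r, float(a), float(b))
    return best

# Toy stand-in for a clustering run, peaked at r = 20, alpha = 0.1, beta = 10.
def toy_score(r, a, b):
    return -abs(np.log10(a) + 1) - abs(np.log10(b) - 1) - abs(r - 20)

best = coarse_grid_search(toy_score, r_candidates=[10, 20, 30])
```

With these grids, each candidate r requires 7 × 6 = 42 runs in the coarse stage; the second stage would then refine around `best`.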

4.3. Clustering Results

Tables 3–5 present the clustering results measured by ACC, NMI, and Purity on six benchmark datasets. The best result in each row is highlighted in bold and the second best is underlined.

Tables 3–5 show that PEBGL attains the highest ACC and NMI on 100Leaves and Caltech101-7, two datasets that differ substantially in structure. 100Leaves involves 100 fine-grained plant species with low-dimensional views, while Caltech101-7 combines six feature types with dimensionality spanning from 40 to 1984. On 100Leaves, PEBGL surpasses DiBGF-MGC by 1.3, 0.3, and 0.7 percentage points in ACC, NMI, and Purity, respectively. On Caltech101-7, the lead reaches 4.8 and 2.7 points in ACC and NMI.

On MSRCV1, PEBGL ranks second across all three metrics, trailing DiBGF-MGC by only 1.4 points in ACC while outperforming the remaining baselines. On Handwritten, all top methods achieve above 95% ACC; PEBGL reaches 96.9%, within 2.2 points of DiBGF-MGC and 1.9 points of CDMGC. The small gap reflects the well-separated digit classes and balanced six-view configuration. On Yale, PEBGL secures the second-highest ACC, though the gap to DiBGF-MGC is larger at 4.1 points. Both Yale and ORL are high-dimensional face recognition benchmarks with thousands of features per view. On ORL, PEBGL ranks third in ACC; with 40 classes and only 10 samples per class, the anchor count $r = 40$ equals the number of classes, limiting the bipartite graph’s representational capacity.

In summary, across the 18 metric–dataset combinations, PEBGL achieves the best result five times and the second-best five times. These results confirm that the proposed framework delivers competitive clustering accuracy on datasets with diverse scales and feature configurations while maintaining deterministic output and linear time complexity.
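The tally of best and second-best results can be reproduced from the values in Tables 3–5. The short script below is only a bookkeeping aid: the numbers are copied from the tables, and the rank of PEBGL in a row is taken as one plus the number of baselines with a strictly higher score.

```python
import numpy as np

# Columns: GFSC, LMVSC, CDMGC, MSGL, SFMC, FPMVS-CAG, DiBGF-MGC, PEBGL
# Rows: 100Leaves, Caltech101-7, MSRCV1, Handwritten, Yale, ORL
acc = np.array([
    [39.9, 61.4, 86.3, 48.8, 70.9, 35.6, 87.2, 88.5],
    [49.3, 72.7, 73.6, 73.3, 65.3, 61.5, 80.1, 84.9],
    [71.4, 77.6, 69.1, 72.4, 81.0, 60.5, 83.3, 81.9],
    [70.8, 91.7, 98.8, 74.4, 97.9, 82.3, 99.1, 96.9],
    [51.4, 61.3, 69.5, 15.8, 58.8, 44.9, 75.6, 71.5],
    [56.4, 60.6, 79.0, 25.3, 61.5, 56.0, 79.2, 70.8],
])
nmi = np.array([
    [71.0, 80.7, 92.4, 56.2, 83.0, 70.3, 93.9, 94.2],
    [2.9, 51.9, 52.5, 52.5, 55.5, 57.0, 62.1, 64.8],
    [64.6, 66.9, 64.3, 57.3, 72.1, 55.6, 75.7, 74.5],
    [68.9, 84.4, 97.2, 75.1, 94.8, 79.2, 98.0, 93.3],
    [55.1, 79.7, 68.9, 12.7, 60.0, 51.6, 85.9, 70.6],
    [72.4, 78.1, 84.1, 45.9, 76.6, 76.3, 91.5, 83.8],
])
pur = np.array([
    [79.3, 70.1, 88.3, 60.0, 72.8, 36.9, 89.4, 90.1],
    [92.8, 75.2, 89.1, 77.3, 85.3, 86.6, 90.5, 89.4],
    [74.0, 77.6, 70.0, 76.7, 81.0, 61.9, 84.5, 81.9],
    [78.2, 91.7, 98.8, 88.1, 97.9, 82.3, 99.1, 96.9],
    [60.7, 71.3, 70.3, 66.1, 60.0, 47.3, 80.5, 71.5],
    [72.0, 70.5, 76.0, 43.3, 68.0, 60.0, 88.8, 73.5],
])

scores = np.vstack([acc, nmi, pur])                 # 18 metric-dataset rows
pebgl = scores[:, -1:]                              # last column, kept 2-D for broadcasting
rank = 1 + np.sum(scores[:, :-1] > pebgl, axis=1)   # 1 means best in the row
n_best = int(np.sum(rank == 1))
n_second = int(np.sum(rank == 2))
```

Running the script gives `n_best == 5` and `n_second == 5`, matching the counts stated above.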

4.4. Ablation Study

To evaluate the contribution of each module, four ablation variants are constructed. PEBGL-1 removes the PCA projection and constructs bipartite graphs directly in the original feature space. PEBGL-2 sets $β = 0$ so that each view-specific bipartite graph is learned independently without consensus feedback. PEBGL-3 replaces the connected-component decoding with K-means applied to the spectral embedding of the converged P. PEBGL-4 fixes $ω_{v} = 1 / V$ throughout the optimization, disabling adaptive view weighting. Table 6 reports the ACC of all variants on six datasets.

Removing the PCA projection (PEBGL-1) causes the largest drops on Yale and Caltech101-7, whose views have up to thousands of dimensions: ACC falls from 71.5% to 16.4% on Yale and from 84.9% to 42.9% on Caltech101-7. On 100Leaves, all three views are 64-dimensional, and ACC is unchanged at 88.5%. PEBGL-2 disables consensus fusion and yields the largest average degradation across all six datasets. PEBGL-3 replaces connected-component decoding with K-means, reducing ACC on every dataset. PEBGL-4 fixes uniform view weights and suffers the most on Caltech101-7, where the six views differ greatly in quality. Overall, consensus fusion contributes the most, followed by the decoding strategy and adaptive weighting, while projection is decisive only for high-dimensional views.
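The average degradation of each variant can be checked directly from Table 6. The snippet below copies the ACC values from the table and computes, for each variant, the mean ACC drop relative to the full model over the six datasets; it is a bookkeeping aid, not part of the method.

```python
import numpy as np

# ACC (%) from Table 6; columns: 100Leaves, Caltech101-7, MSRCV1, Handwritten, Yale, ORL
ablation_acc = {
    "PEBGL-1": [88.5, 42.9, 64.8, 84.9, 16.4, 70.3],
    "PEBGL-2": [42.1, 50.9, 72.4, 68.9, 67.9, 61.5],
    "PEBGL-3": [68.2, 53.3, 77.6, 94.9, 66.1, 62.0],
    "PEBGL-4": [86.2, 47.3, 76.2, 96.4, 66.7, 70.3],
    "PEBGL":   [88.5, 84.9, 81.9, 96.9, 71.5, 70.8],
}

full = np.array(ablation_acc["PEBGL"])
mean_drop = {
    name: float(np.mean(full - np.array(vals)))
    for name, vals in ablation_acc.items()
    if name != "PEBGL"
}
worst_variant = max(mean_drop, key=mean_drop.get)
```

The mean drops are about 21.8 points for PEBGL-2, 21.1 for PEBGL-1, 12.1 for PEBGL-3, and 8.6 for PEBGL-4, so `worst_variant` is `"PEBGL-2"`; the large PEBGL-1 average is driven mainly by Yale and Caltech101-7.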

4.5. Parameter Analysis

The sensitivity of PEBGL to its two primary hyperparameters $α$ and $β$ is investigated by varying them over $\{10^{- 3}, 10^{- 2}, 10^{- 1}, 1, 10, 10^{2}, 10^{3}\}$ and $\{10^{- 3}, 10^{- 2}, 10^{- 1}, 1, 10, 10^{2}\}$, respectively, while fixing r at its optimal value. Figure 2 visualizes ACC as 3D bar charts on six datasets.

The effect of the anchor count r on clustering accuracy is examined separately. With $α$ and $β$ fixed at their optimal values, r is varied over $\{c, 2 c, 3 c, 5 c, 7 c, 10 c\}$. Figure 3 reports the results.

As shown in Figure 2, PEBGL maintains stable ACC over a wide range of $α$ and $β$ on most datasets. On MSRCV1 and Handwritten, ACC remains above 60% and 80%, respectively, across several orders of magnitude, with degradation confined to extreme corners of the grid. Caltech101-7 shows a narrower optimal region concentrated at small $α$ and $β$, and performance drops rapidly outside this region. Yale and ORL exhibit moderate sensitivity, with higher ACC values concentrated along intermediate $α$.

Figure 3 reveals that the optimal anchor count varies across datasets and does not follow a uniform trend. On Handwritten, ACC increases steadily with r and stabilizes near $r = 80$. On 100Leaves, performance peaks near $r = 900$, a value substantially larger than the class count $c = 100$, reflecting the need for a large anchor set to represent 100 leaf species. On Yale, ORL, and Caltech101-7, ACC peaks at a specific r and declines when r grows further, suggesting that excessive anchors introduce redundancy rather than additional discriminative information. On MSRCV1 the pattern is similar, with the best ACC achieved at $r = 45$.

4.6. Behavior of the Consensus Graph

To examine how the consensus bipartite graph evolves during optimization, Figure 4 tracks the relative change ${∥P^{t} - P^{t - 1}∥}_{F} / {∥P^{t - 1}∥}_{F}$ across iterations on all six datasets.

As shown in Figure 4, the six datasets exhibit two distinct patterns. Yale, ORL, and Caltech101-7 start from a relatively large initial change above 0.3 and drop sharply within the first 10 iterations, after which the update magnitude remains near zero. MSRCV1, Handwritten, and 100Leaves begin with a much smaller initial change below 0.03 and decrease gradually over the 100-iteration horizon. The difference reflects the initialization quality of the consensus graph P, which already approximates the final solution on the latter three datasets. In all cases the relative change reaches a negligible level and the resulting cluster assignments remain unchanged over the final iterations, confirming that the algorithm produces stable output in practice.
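The quantity tracked in Figure 4 is straightforward to compute between two consecutive iterates. In the sketch below the function name and the small guard `eps` against a zero denominator are assumptions; the text does not state a stopping threshold, so none is hard-coded.

```python
import numpy as np

def relative_change(P_new, P_old, eps=1e-12):
    """Relative Frobenius change ||P_new - P_old||_F / ||P_old||_F."""
    denom = max(np.linalg.norm(P_old, "fro"), eps)  # eps (hypothetical) avoids division by zero
    return np.linalg.norm(P_new - P_old, "fro") / denom

P_prev = np.ones((2, 3))
P_next = P_prev.copy()
P_next[0, 0] = 2.0                                  # perturb a single entry by 1
change = relative_change(P_next, P_prev)
```

Here the numerator equals 1 and the denominator equals the square root of 6, so `change` is about 0.408; an unchanged graph gives exactly 0.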

4.7. Computational Cost

Table 7 reports the wall-clock running time of PEBGL on six datasets. For datasets with moderate feature dimensions the algorithm terminates within five seconds; MSRCV1 requires only 0.54 s and Caltech101-7 requires 2.11 s. The elevated times on Yale at 14.06 s and ORL at 14.91 s are attributable to the PCA preprocessing stage, which performs eigendecomposition of covariance matrices with dimensionality up to 6750; the main optimization loop itself stabilizes in under one second on these two datasets. On 100Leaves, the 20.02 s running time is driven by the large anchor count $r = 900$. Handwritten, despite having the largest sample size at $N = 2000$, completes in 4.89 s owing to its moderate feature dimensions ranging from 6 to 240. These timings are consistent with the $O (V r d_{max}^{'} N)$ per-iteration complexity derived in Section 3.6, under which the cost grows linearly in the number of samples and is otherwise governed by the anchor count and the projected dimension.

4.8. View Weight Analysis

To illustrate the behavior of the entropy-regularized adaptive weighting mechanism, Figure 5 traces the evolution of the view weights $ω_{v}$ across iterations on MSRCV1 and Handwritten, which have five and six views, respectively.

On MSRCV1, views 1, 4, and 5 collectively receive over 96% of the total weight at the final iteration, while views 2 and 3 are assigned weights below 0.02. This allocation reflects the feature characteristics of the dataset, as views 2 and 3 retain only 10 and 19 PCA dimensions after projection, suggesting limited discriminative capacity in these two views. The weight distribution stabilizes within approximately 15 iterations and shifts only slightly thereafter.

On Handwritten, five of the six views share comparable weights between 0.12 and 0.30, whereas view 5 is effectively suppressed with $ω_{5} \approx 3 \times 10^{- 5}$. The 47-dimensional Zernike moment feature of view 5 provides negligible complementary information beyond what the remaining five feature types already capture. In effect, the adaptive weighting acts as a soft view selection mechanism that concentrates the fusion on informative views without manual tuning. The entropy regularization prevents the weights from collapsing to a single-view solution, preserving meaningful contributions from multiple views.

5. Conclusions

This paper proposes PEBGL, a unified multi-view clustering framework. The method integrates PCA-based subspace projection, bipartite graph construction, consensus fusion with entropy-regularized view weighting, spectral embedding, and discrete rotation into a single objective. Before building the bipartite graph, each view is projected onto a low-dimensional subspace that retains the dominant variance, which helps reduce the distance concentration effect in high-dimensional feature spaces. The entropy regularization mechanism adjusts view weights automatically during optimization. The final cluster labels are obtained by detecting connected components on the converged consensus graph, without relying on K-means. Every subproblem in the optimization admits a closed-form or deterministic solution, and the overall complexity scales linearly with the number of samples. Experiments on six benchmark datasets against seven recent methods show that PEBGL achieves the best accuracy on two datasets and competitive results on the other four.

Author Contributions

Conceptualization, X.L. and J.-F.C.; methodology, X.L. and J.-F.C.; software, X.L. and J.-F.C.; validation, X.L. and J.-F.C.; formal analysis, X.L.; resources, Q.-W.W.; data curation, X.L.; writing—original draft preparation, Q.-W.W. and X.L.; writing—review and editing, Q.-W.W., X.L. and J.-F.C.; visualization, X.L. and J.-F.C.; supervision, Q.-W.W.; project administration, Q.-W.W.; funding acquisition, Q.-W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (No. 12371023).

Data Availability Statement

Data are contained within the paper. Readers can contact the authors to obtain the code for this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chao, G.; Sun, S.; Bi, J. A Survey on Multiview Clustering. IEEE Transactions on Artificial Intelligence 2021, 2, 146–168.
2. Yang, Y.; Wang, H. Multi-view clustering: A survey. Big Data Mining and Analytics 2018, 1, 83–107.
3. Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 2001, 14, 849–856.
4. Nie, F.; Li, J.; Li, X.; et al. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. Proceedings of the IJCAI 2016, 9, 1881–1887.
5. Wang, H.; Yang, Y.; Liu, B. GMC: Graph-Based Multi-View Clustering. IEEE Transactions on Knowledge and Data Engineering 2020, 32, 1116–1129.
6. Liang, Y.; Huang, D.; Wang, C.D. Consistency meets inconsistency: A unified graph learning framework for multi-view clustering. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM); IEEE: New York, NY, USA, 2019; pp. 1204–1209.
7. Huang, S.; Tsang, I.W.; Xu, Z.; Lv, J. Measuring Diversity in Graph Learning: A Unified Framework for Structured Multi-View Clustering. IEEE Transactions on Knowledge and Data Engineering 2022, 34, 5869–5883.
8. Li, Y.; Nie, F.; Huang, H.; Huang, J. Large-scale multi-view spectral clustering via bipartite graph. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
9. Kang, Z.; Zhou, W.; Zhao, Z.; Shao, J.; Han, M.; Xu, Z. Large-Scale Multi-View Subspace Clustering in Linear Time. Proceedings of the AAAI Conference on Artificial Intelligence 2020, 34, 4412–4419.
10. Nie, F.; Wang, X.; Deng, C.; Huang, H. Learning a Structured Optimal Bipartite Graph for Co-Clustering. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4132–4141.
11. Li, X.; Zhang, H.; Wang, R.; Nie, F. Multiview Clustering: A Scalable and Parameter-Free Bipartite Graph Fusion Method. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 44, 330–344.
12. Wang, S.; Liu, X.; Zhu, X.; Zhang, P.; Zhang, Y.; Gao, F.; Zhu, E. Fast Parameter-Free Multi-View Subspace Clustering With Consensus Anchor Guidance. IEEE Transactions on Image Processing 2022, 31, 556–568.
13. Kang, Z.; Lin, Z.; Zhu, X.; Xu, W. Structured Graph Learning for Scalable Subspace Clustering: From Single View to Multiview. IEEE Transactions on Cybernetics 2022, 52, 8976–8986.
14. Fang, S.G.; Huang, D.; Cai, X.S.; Wang, C.D.; He, C.; Tang, Y. Efficient Multi-View Clustering via Unified and Discrete Bipartite Graph Learning. IEEE Transactions on Neural Networks and Learning Systems 2024, 35, 11436–11447.
15. Liu, S.; Liao, Q.; Wang, S.; Liu, X.; Zhu, E. Robust and consistent anchor graph learning for multi-view clustering. IEEE Transactions on Knowledge and Data Engineering 2024, 36, 4207–4219.
16. Yan, W.; Zhao, X.; Yue, G.; Ren, J.; Xu, J.; Liu, Z.; Tang, C. Diversity-induced bipartite graph fusion for multiview graph clustering. IEEE Transactions on Emerging Topics in Computational Intelligence 2024, 8, 2592–2601.
17. Li, S.; Liu, K.; Zheng, M.; Bai, L. Multi-view spectral clustering algorithm based on bipartite graph and multi-feature similarity fusion. Neural Networks 2025, 194, 108177.
18. Luo, M.; Nie, F.; Chang, X.; Yang, Y.; Hauptmann, A.G.; Zheng, Q. Discrete Multi-Graph Clustering. IEEE Transactions on Image Processing 2019, 28, 4701–4712.
19. Xia, W.; Gao, Q.; Wang, Q.; Gao, X.; Ding, C.; Tao, D. Tensorized bipartite graph learning for multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 45, 5187–5202.
20. Gu, W.; Guo, J.; Wang, H.; Zhang, G.; Zhang, B.; Chen, J.; Cai, H. Efficient multi-view clustering via essential tensorized bipartite graph learning. IEEE Transactions on Emerging Topics in Computational Intelligence 2024, 9, 2952–2964.
21. Zhao, Z.; Wang, T.; Xin, H.; Wang, R.; Nie, F. Multi-view clustering via high-order bipartite graph fusion. Information Fusion 2025, 113, 102630.
22. Jiang, H.; Tao, H.; Jiang, Z.; Hou, C. Unaligned multi-view clustering via diversified anchor graph fusion. Pattern Recognition 2026, 170, 111977.
23. Nie, F.; Wang, X.; Jordan, M.; Huang, H. The constrained Laplacian rank algorithm for graph-based clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
24. Huang, J.; Nie, F.; Huang, H. Spectral rotation versus k-means in spectral clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013; pp. 431–437.
25. von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing 2007, 17, 395–416.
26. Duchi, J.; Shalev-Shwartz, S.; Singer, Y.; Chandra, T. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning (ICML ’08); ACM Press: New York, NY, USA, 2008; pp. 272–279.
27. Fan, K. On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations I. Proceedings of the National Academy of Sciences 1949, 35, 652–655.
28. Mirsky, L. A trace inequality of John von Neumann. Monatshefte für Mathematik 1975, 79, 303–306.
29. Lee, Y.J.; Grauman, K. Foreground focus: Unsupervised learning from partially matching images. International Journal of Computer Vision 2009, 85, 143–166.
30. Zhao, H.; Ding, Z.; Fu, Y. Multi-view clustering via deep matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
31. Cao, X.; Zhang, C.; Fu, H.; Liu, S.; Zhang, H. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 586–594.
32. Dua, D.; Graff, C. UCI Machine Learning Repository; School of Information and Computer Sciences, University of California: Irvine, CA, USA, 2019; Available online: http://archive.ics.uci.edu/ml.
33. Mallah, C.; Cope, J.; Orwell, J. Plant leaf classification using probabilistic integration of shape, texture and margin features. In Proceedings of the 10th IASTED International Conference on Signal Processing, Pattern Recognition and Applications; ACTA Press: Calgary, AB, Canada; pp. 45–54.
34. Nie, F.; Cai, G.; Li, X. Multi-view clustering and semi-supervised classification with adaptive neighbours. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
35. Xu, W.; Liu, X.; Gong, Y. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 267–273.
36. Strehl, A.; Ghosh, J. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 2002, 3, 583–617.
37. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
38. Kang, Z.; Shi, G.; Huang, S.; Chen, W.; Pu, X.; Zhou, J.T.; Xu, Z. Multi-graph fusion for multi-view spectral clustering. Knowledge-Based Systems 2020, 189, 105102.

Figure 1. Overview of the PEBGL framework. Each view $X^{v}$ is projected into a compact subspace via PCA projection $W^{v}$, within which view-specific bipartite graphs $Z^{v}$ are constructed through anchor-based reconstruction. The view-specific graphs are fused into a consensus graph P through entropy-regularized adaptive weighting $ω_{v}$, followed by coclustering refinement. A spectral embedding $\hat{F}$ is extracted from the normalized Laplacian of P and aligned to a discrete indicator Y through orthogonal rotation R, which serves as a structural regularizer during optimization. The final cluster labels are read from the connected components of the converged P.

Figure 2. Parameter sensitivity of ACC with respect to $α$ and $β$ on six datasets. (a) MSRCV1. (b) Yale. (c) ORL. (d) Handwritten. (e) 100Leaves. (f) Caltech101-7.

Figure 3. Effect of anchor count r on clustering ACC. (a) MSRCV1. (b) Yale. (c) ORL. (d) Handwritten. (e) 100Leaves. (f) Caltech101-7.

Figure 4. Relative change of the consensus graph P on six datasets. (a) MSRCV1. (b) Yale. (c) ORL. (d) Handwritten. (e) 100Leaves. (f) Caltech101-7.

Figure 5. Evolution of view weights $ω_{v}$ across iterations. (a) MSRCV1. (b) Handwritten.

Table 1. Summary of key notations.

Symbol	Description
$N, V, c, r$	Number of samples, views, clusters, and anchors
$d_{v}, d_{v}^{'}$	Original and projected feature dimension of view v
$X^{v}$	Feature matrix of view v, $X^{v} \in R^{N \times d_{v}}$
$A^{v}$	Anchor matrix of view v, $A^{v} \in R^{r \times d_{v}}$
$W^{v}$	Orthogonal projection matrix, ${(W^{v})}^{⊤} W^{v} = I_{d_{v}^{'}}$
${\tilde{X}}^{v}, {\tilde{A}}^{v}$	Projected data and anchor matrices
$Z^{v}$	Bipartite graph of view v, $Z^{v} \in R^{r \times N}$
P	Consensus bipartite graph, $P \in R^{r \times N}$
$\hat{F}$	Spectral embedding matrix, $\hat{F} \in R^{N \times c}$
R	Orthogonal rotation matrix, $R^{⊤} R = I_{c}$
Y	Discrete indicator matrix, $Y \in Ind$
$ω_{v}$	Adaptive weight for view v
$L_{P}$	Symmetric normalized Laplacian derived from P

Table 2. Descriptions of datasets.

Dataset	N	V	c	$d_{1}$	$d_{2}$	$d_{3}$	$d_{4}$	$d_{5}$	$d_{6}$
MSRCV1	210	5	7	24	576	512	256	254	–
Yale	165	3	15	4096	3304	6750	–	–	–
ORL	400	3	40	4096	3304	6750	–	–	–
Handwritten	2000	6	10	76	216	64	240	47	6
100Leaves	1600	3	100	64	64	64	–	–	–
Caltech101-7	1474	6	7	48	40	254	1984	512	928

Table 3. ACC (%) on six datasets.

Dataset	GFSC	LMVSC	CDMGC	MSGL	SFMC	FPMVS-CAG	DiBGF-MGC	PEBGL
100Leaves	39.9	61.4	86.3	48.8	70.9	35.6	87.2	88.5
Caltech101-7	49.3	72.7	73.6	73.3	65.3	61.5	80.1	84.9
MSRCV1	71.4	77.6	69.1	72.4	81.0	60.5	83.3	81.9
Handwritten	70.8	91.7	98.8	74.4	97.9	82.3	99.1	96.9
Yale	51.4	61.3	69.5	15.8	58.8	44.9	75.6	71.5
ORL	56.4	60.6	79.0	25.3	61.5	56.0	79.2	70.8

Table 4. NMI (%) on six datasets.

Dataset	GFSC	LMVSC	CDMGC	MSGL	SFMC	FPMVS-CAG	DiBGF-MGC	PEBGL
100Leaves	71.0	80.7	92.4	56.2	83.0	70.3	93.9	94.2
Caltech101-7	2.9	51.9	52.5	52.5	55.5	57.0	62.1	64.8
MSRCV1	64.6	66.9	64.3	57.3	72.1	55.6	75.7	74.5
Handwritten	68.9	84.4	97.2	75.1	94.8	79.2	98.0	93.3
Yale	55.1	79.7	68.9	12.7	60.0	51.6	85.9	70.6
ORL	72.4	78.1	84.1	45.9	76.6	76.3	91.5	83.8

Table 5. Purity (%) on six datasets.

Dataset	GFSC	LMVSC	CDMGC	MSGL	SFMC	FPMVS-CAG	DiBGF-MGC	PEBGL
100Leaves	79.3	70.1	88.3	60.0	72.8	36.9	89.4	90.1
Caltech101-7	92.8	75.2	89.1	77.3	85.3	86.6	90.5	89.4
MSRCV1	74.0	77.6	70.0	76.7	81.0	61.9	84.5	81.9
Handwritten	78.2	91.7	98.8	88.1	97.9	82.3	99.1	96.9
Yale	60.7	71.3	70.3	66.1	60.0	47.3	80.5	71.5
ORL	72.0	70.5	76.0	43.3	68.0	60.0	88.8	73.5

Table 6. Ablation study measured by ACC (%).

Variant	100Leaves	Caltech101-7	MSRCV1	Handwritten	Yale	ORL
PEBGL-1	88.5	42.9	64.8	84.9	16.4	70.3
PEBGL-2	42.1	50.9	72.4	68.9	67.9	61.5
PEBGL-3	68.2	53.3	77.6	94.9	66.1	62.0
PEBGL-4	86.2	47.3	76.2	96.4	66.7	70.3
PEBGL	88.5	84.9	81.9	96.9	71.5	70.8

Table 7. Running time (s) of PEBGL on six datasets.

Dataset	N	r	Time
MSRCV1	210	45	0.54
Yale	165	75	14.06
ORL	400	40	14.91
Handwritten	2000	80	4.89
100Leaves	1600	900	20.02
Caltech101-7	1474	35	2.11

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
