4. Experiments
4.1. Experimental Design and Evaluation Roadmap
The experimental portion aims to assess BioMetaEvo-GNN as a bio-inspired, Bayesian Fourier-enhanced, immune-defensive, and meta-governed framework for community intelligence and multi-objective community identification. The proposed method encompasses various interacting mechanisms, such as Bayesian Fourier Learning, Energy-Time Filtering, Bio-Phase Selection, T-Cell Defense Dynamics, T-SILE immune memory, Defense-Enhancement-Transfer adaptation, UCB reinforcement governance, Stage-Transition Dynamics, Incremental Growth, and Multi-Objective Replacement; thus, the evaluation cannot be confined to traditional clean-graph clustering performance.
Consequently, the experimental design follows a layered evaluation framework. The first layer assesses whether the proposed framework attains competitive community detection performance on clean graphs. The second layer evaluates the stability of the detected communities under structural and attribute perturbations. The third layer examines the internal mechanisms of BioMetaEvo-GNN, including Bayesian uncertainty estimation, T-cell defensive response, T-SILE immune memory preservation, UCB action selection, and stage-transition evolution. The fourth layer assesses transferability, incremental growth, scalability, failure cases, and statistical significance.
The complete experimental roadmap is summarized in Table 1. This roadmap is designed to ensure that every major methodological component of BioMetaEvo-GNN is supported by a corresponding empirical analysis.
The experimental workflow of BioMetaEvo-GNN, illustrated in Figure 4, is structured as a closed-loop evaluation pipeline. The procedure starts with real-world and synthetic graph datasets, followed by training and validation. Once the model has been selected under clean validation conditions, several perturbation configurations are applied to assess resilience against noisy, incomplete, or structurally unstable graph observations. The perturbed and clean graph representations are then processed by the proposed BioMetaEvo-GNN architecture, which combines graph neural representation learning, Bayesian Fourier learning, energy-time filtering, T-cell defense dynamics, T-SILE immune learning, UCB governance, and stage-transition control.
The workflow emphasizes that the proposed evaluation extends beyond accuracy in clean community detection. It jointly analyzes community quality, uncertainty reliability, immune-defense behavior, governance decisions, and perturbation stability. This design aligns with the primary goal of BioMetaEvo-GNN: converting community detection from a static graph partitioning task into a bio-inspired, uncertainty-aware, and meta-governed community intelligence process.
4.2. Datasets and Preprocessing
4.2.1. Real-World Attributed Graph Datasets
We evaluate BioMetaEvo-GNN on fourteen real-world attributed graph datasets. These datasets cover citation networks, co-authorship networks, social networks, and co-purchase networks. The collection was chosen for its diversity in graph scale, feature dimension, community count, edge density, and structural regime. Small citation graphs support traditional benchmark comparisons, whereas large-scale graphs such as Reddit, OGBN-Arxiv, and OGBN-Products are used to assess scalability and robustness at realistic graph sizes.
The real-world datasets are summarized in Table 2. The dataset list is kept identical across all evaluated methods to ensure a fair comparison.
For each real-world dataset, duplicate edges are removed, and the adjacency matrix is symmetrized when the original graph is directed but the evaluation requires an undirected community structure. Self-loops are added for neural baselines and for BioMetaEvo-GNN. Node features are row-normalized to reduce the impact of feature-scale differences. For datasets containing isolated nodes or disconnected components, the original benchmark configuration is kept where feasible; when reliable spectral computation is required, the largest connected component is also examined as a controlled robustness variant.
The number of communities K is set to the number of annotated classes for benchmark comparability. Ground-truth labels are used only for validation-driven model selection and final evaluation. They are not used as supervised labels in the community detection objective.
4.2.2. Synthetic Graph Datasets with Controlled Regimes
Real-world datasets are essential but insufficient because their structural variables are entangled. To isolate the effects of spectral gap, degree heterogeneity, overlapping membership, feature noise, and heterophily, we also generate synthetic graph datasets from variants of the stochastic block model. These synthetic graphs provide a controlled examination of where BioMetaEvo-GNN succeeds and where it fails.
The synthetic graph parameters are listed in Table 3. Each synthetic configuration is generated with five distinct random seeds, and identical preprocessing, perturbation, model selection, and evaluation protocols are used for both real-world and synthetic datasets.
Synthetic datasets serve three purposes. First, they test whether Bayesian Fourier Learning behaves differently under weak and strong spectral gaps. Second, they test whether T-cell defense can suppress unreliable communities when graph signals are weak. Third, they enable controlled failure analysis in heterophilous and overlapping-community settings.
4.2.3. Preprocessing Pipeline
All datasets follow a unified preprocessing pipeline so that performance differences are not caused by inconsistent data handling. Starting from the original graph, the pipeline applies the following steps: duplicate edges are removed, self-loops are added for neural models, node features are normalized, and node partitions are created. Perturbations are applied only after a model has been selected on the clean graph. This prevents model selection from overfitting to a particular type of perturbation.
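The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `preprocess`, the dense adjacency representation, and the choice of L1 row normalization are assumptions made here for clarity.

```python
import numpy as np

def preprocess(edges, num_nodes, features, add_self_loops=True):
    """Build a symmetric, duplicate-free adjacency matrix and row-normalize features.

    Hypothetical helper: names and the dense representation are illustrative only.
    """
    adj = np.zeros((num_nodes, num_nodes), dtype=float)
    for u, v in edges:
        adj[u, v] = 1.0  # assigning 1.0 (instead of adding) removes duplicate edges
        adj[v, u] = 1.0  # symmetrize directed input
    if add_self_loops:
        np.fill_diagonal(adj, 1.0)  # self-loops for neural models
    features = np.asarray(features, dtype=float)
    row_sums = features.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave all-zero feature rows unchanged
    return adj, features / row_sums  # assumed L1 row normalization

# Toy input: a duplicated edge (0, 1) and one all-zero feature row.
adj, x = preprocess([(0, 1), (0, 1), (1, 2)], 3, [[1.0, 1.0], [0.0, 0.0], [2.0, 6.0]])
```

A dense matrix is used only to keep the sketch short; large graphs such as OGBN-Products would need a sparse representation.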
The preprocessing settings are summarized in Table 4.
4.3. Training, Validation, and Testing Protocol
The evaluation protocol differs slightly among small graphs, large-scale graphs, synthetic graphs, and perturbation settings. For small and medium attributed graphs, we use repeated 60/20/20 node partitions. Official splits are used for large-scale datasets such as OGBN-Arxiv and OGBN-Products. For synthetic datasets, 60% of nodes are allocated to optimization, 20% to validation, and 20% to final testing. For perturbation robustness, models are trained and selected on clean validation graphs and evaluated on perturbed test graphs.
Every neural experiment is run with five distinct random seeds. Each perturbation strength is repeated with five distinct realizations. Results are reported as mean ± standard deviation. In addition, 95% confidence intervals are reported for degradation curves and uncertainty localization.
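The repeated 60/20/20 partition and the mean ± standard deviation reporting can be sketched as below. The helper names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator is used.

```python
import numpy as np

def split_nodes(num_nodes, seed, train_frac=0.6, val_frac=0.2):
    """Random disjoint train/validation/test node partition for one seed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_nodes)
    n_train = round(train_frac * num_nodes)
    n_val = round(val_frac * num_nodes)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def mean_and_std(scores):
    """Aggregate per-seed scores; ddof=1 (sample std) is an assumption."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

# Five seeds, as in the protocol; each seed yields its own partition.
splits = [split_nodes(1000, seed) for seed in range(5)]
```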
Table 5.
Evaluation partition strategies under different experimental regimes.
| Evaluation Regime | Optimization Phase | Model Selection | Final Assessment |
| --- | --- | --- | --- |
| Small/medium attributed graphs | Random node subset 60% | Held-out nodes 20% | Disjoint test nodes 20% |
| Large-scale benchmarks | Official training subset | Official validation subset | Official test subset |
| Synthetic graph families | Structured node partition 60% | Independent validation nodes 20% | Unseen test nodes 20% |
| Perturbation robustness | Clean training graph | Clean validation graph | Perturbed test realizations |
| Uncertainty analysis | Clean graph with posterior estimation | Validation calibration objective | Drift-prone node localization |
| Transfer evaluation | Source graph optimization | Target validation graph | Target test graph |
| Incremental growth | Initial observed graph | New-node validation subset | Newly introduced test nodes |
4.4. Baseline Methods and Implementation Details
To ensure a fair and broad comparison, we include baseline methods from nine representative categories. These baselines cover non-neural graph partitioning, spectral partitioning, random-walk embedding, deterministic GNNs, probabilistic graph models, self-supervised graph learning, deep graph clustering, differentiable pooling, and robust graph learning.
The baseline implementation details are summarized in Table 6. For embedding-based and self-supervised methods, learned node representations are clustered using k-means with K equal to the number of benchmark communities. Each k-means clustering is repeated ten times with different initializations, and the result with the lowest within-cluster distortion is selected.
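The restart-and-select procedure for k-means can be illustrated with a small NumPy sketch. It uses plain Lloyd iterations with random data-point initialization, which is an assumption; the text does not state which k-means implementation or initialization scheme was used, and the function names are hypothetical.

```python
import numpy as np

def kmeans_once(x, k, rng, num_iters=100):
    """One run of Lloyd's algorithm; returns (labels, within-cluster distortion)."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(num_iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(x)), labels].sum()

def kmeans_best_of(x, k, num_restarts=10, seed=0):
    """Repeat k-means and keep the run with the lowest distortion."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(x, k, rng) for _ in range(num_restarts)]
    return min(runs, key=lambda run: run[1])

# Toy embeddings: two well-separated groups of five points each.
data_rng = np.random.default_rng(1)
emb = np.vstack([data_rng.normal(0.0, 0.1, (5, 2)), data_rng.normal(10.0, 0.1, (5, 2))])
labels, distortion = kmeans_best_of(emb, k=2)
```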
4.5. BioMetaEvo-GNN Implementation Details
BioMetaEvo-GNN uses a two-layer GNN encoder as its backbone unless stated otherwise. The hidden dimension is selected from a candidate set, with a default value of 128. Bayesian Fourier Learning is applied to the learned representation to obtain a spectral representation and posterior uncertainty. Energy-Time Filtering stabilizes representations across training iterations. Bio-Phase Selection screens candidate communities before immune defense is activated. T-Cell Defense Dynamics assigns immune responses to candidate communities. T-SILE maintains memory prototypes for stable communities. The UCB controller chooses among defense, enhancement, transfer, replacement, growth, and stabilization actions. Stage-transition dynamics update the current learning phase.
The main implementation settings are shown in Table 7.
The hyperparameter search space is summarized in Table 8.
4.6. Perturbation Evaluation Protocol
The perturbation evaluation tests whether BioMetaEvo-GNN can maintain stable community assignments under graph noise. We consider six perturbation regimes: feature noise, feature masking, edge dropout, edge insertion, edge rewiring, and spectral-risk perturbation. Perturbations are applied only during evaluation, after model selection on clean validation graphs. No model is retrained or fine-tuned on perturbed test graphs.
The perturbation protocol is summarized in Table 9.
Feature noise is generated as:
Feature masking is generated as:
Edge dropout is generated as:
For edge insertion, a proportion of non-edges is sampled and inserted into the graph. For edge rewiring, a proportion of existing edges is removed and replaced by randomly sampled non-edges while approximately preserving graph density.
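The feature and edge perturbations can be sketched as below. The exact generating equations are not reproduced here, so the forms used (additive Gaussian noise, independent Bernoulli masking, uniform edge sampling) are standard choices adopted as assumptions. In particular, the number of inserted edges is taken relative to the number of existing edges, which is one possible reading of the protocol. All function names are hypothetical.

```python
import numpy as np

def add_feature_noise(x, sigma, rng):
    """Additive Gaussian noise with standard deviation sigma (assumed form)."""
    return x + sigma * rng.standard_normal(x.shape)

def mask_features(x, ratio, rng):
    """Zero each feature entry independently with probability ratio (assumed form)."""
    return x * (rng.random(x.shape) >= ratio)

def _pairs(adj, value):
    """Node pairs (i < j) whose adjacency entry equals value; ignores self-loops."""
    rows, cols = np.triu_indices(adj.shape[0], k=1)
    selected = adj[rows, cols] == value
    return rows[selected], cols[selected]

def _sample(rows, cols, count, rng):
    idx = rng.choice(len(rows), size=count, replace=False)
    return rows[idx], cols[idx]

def _set_pairs(adj, rows, cols, value):
    out = adj.copy()
    out[rows, cols] = value
    out[cols, rows] = value  # keep the graph undirected
    return out

def drop_edges(adj, ratio, rng):
    rows, cols = _pairs(adj, 1)
    r, c = _sample(rows, cols, int(round(ratio * len(rows))), rng)
    return _set_pairs(adj, r, c, 0)

def insert_edges(adj, ratio, rng):
    num_edges = len(_pairs(adj, 1)[0])
    rows, cols = _pairs(adj, 0)
    count = min(int(round(ratio * num_edges)), len(rows))  # assumption: relative to |E|
    r, c = _sample(rows, cols, count, rng)
    return _set_pairs(adj, r, c, 1)

def rewire_edges(adj, ratio, rng):
    """Remove edges and add the same number of former non-edges (density preserved)."""
    edge_r, edge_c = _pairs(adj, 1)
    free_r, free_c = _pairs(adj, 0)
    count = min(int(round(ratio * len(edge_r))), len(free_r))
    r, c = _sample(edge_r, edge_c, count, rng)
    out = _set_pairs(adj, r, c, 0)
    r, c = _sample(free_r, free_c, count, rng)
    return _set_pairs(out, r, c, 1)
```

Sampling the inserted pairs from the original non-edges keeps the edge count exactly unchanged under rewiring, which matches the stated goal of approximately preserving graph density.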
For spectral-risk perturbation, the local spectral roughness of node i is computed from the normalized graph Laplacian L and the learned node representation H. Nodes with larger roughness are considered more spectrally fragile. The perturbation probability of each node is obtained from its roughness through a temperature parameter: a lower temperature concentrates perturbations around high-risk boundary nodes, while a higher temperature makes the perturbation closer to random sampling.
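One way to realize spectral-risk sampling is sketched below. Both formulas are assumptions chosen to match the verbal description: roughness is taken as the squared norm of the i-th row of LH, and the perturbation probability as a softmax of roughness divided by a temperature `tau`. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def spectral_roughness(adj, h):
    """Assumed roughness: squared norm of each row of L @ H."""
    lh = normalized_laplacian(adj) @ np.asarray(h, dtype=float)
    return (lh ** 2).sum(axis=1)

def perturbation_probabilities(roughness, tau):
    """Assumed softmax with temperature tau over node roughness."""
    z = (roughness - roughness.max()) / tau  # shift for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

# Two triangles joined by the bridge edge (2, 3); H is a one-hot community indicator.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
h = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
r = spectral_roughness(adj, h)
p_sharp = perturbation_probabilities(r, tau=0.01)
p_flat = perturbation_probabilities(r, tau=100.0)
```

In this toy graph the two bridge endpoints receive the largest roughness, so a small `tau` concentrates probability on them, while a large `tau` yields a nearly uniform distribution.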
4.7. Expanded Metric System and Interpretation
Because BioMetaEvo-GNN is designed as a multi-objective community intelligence architecture, it requires several groups of evaluation metrics. We use a metric system covering clean-graph quality, structural integrity, perturbation resilience, robustness trajectory, uncertainty calibration, instability localization, immune response, governance dynamics, and scalability.
The metric groups are summarized in Table 10.
The core metrics are defined as follows. Normalized Mutual Information is:
Assignment Drift under perturbation is:
Robustness Area Under Curve is:
The degradation slope is estimated by:
where a smaller a indicates slower performance degradation.
Dirichlet energy drift is:
For uncertainty-aware evaluation, the uncertainty–drift correlation is:
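The core metrics can be computed along the following lines. Because the defining equations are not reproduced here, every formula in this sketch is an assumption based on common usage: NMI with arithmetic-mean normalization, drift as the fraction of nodes whose (already aligned) assignment changes, robustness AUC as the trapezoidal area normalized by the perturbation range, the degradation slope a as the negated least-squares slope, and the uncertainty-drift correlation as a Pearson coefficient.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def nmi(y_true, y_pred):
    """NMI = 2 I(Y; Y_hat) / (H(Y) + H(Y_hat)); the normalization is an assumption."""
    _, ti = np.unique(y_true, return_inverse=True)
    _, pi = np.unique(y_pred, return_inverse=True)
    cont = np.zeros((ti.max() + 1, pi.max() + 1))
    np.add.at(cont, (ti, pi), 1)
    pxy = cont / cont.sum()
    outer = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    mi = (pxy[nz] * np.log(pxy[nz] / outer[nz])).sum()
    denom = entropy(y_true) + entropy(y_pred)
    return 2.0 * mi / denom if denom > 0 else 1.0

def assignment_drift(clean, perturbed):
    """Fraction of nodes whose assignment changes (labels assumed aligned)."""
    return float(np.mean(np.asarray(clean) != np.asarray(perturbed)))

def robustness_auc(levels, scores):
    """Trapezoidal area under the score curve, normalized by the level range."""
    levels, scores = np.asarray(levels, float), np.asarray(scores, float)
    area = ((scores[1:] + scores[:-1]) / 2.0 * np.diff(levels)).sum()
    return area / (levels[-1] - levels[0])

def degradation_slope(levels, scores):
    """Negated least-squares slope, so a smaller value means slower degradation."""
    return -np.polyfit(levels, scores, 1)[0]

def uncertainty_drift_correlation(uncertainty, drifted):
    """Pearson correlation between node uncertainty and a 0/1 drift indicator."""
    return float(np.corrcoef(uncertainty, drifted)[0, 1])
```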
4.8. Main Community Detection Results
This experiment evaluates the clean community detection performance of BioMetaEvo-GNN and all baselines. The purpose is to verify whether the proposed framework can improve standard clustering performance before testing robustness and internal mechanisms.
Table 11 reports family-level clean performance.
The results indicate that BioMetaEvo-GNN improves performance across all graph families. The improvements are modest on clean citation and co-authorship graphs and larger on social and co-purchase graphs. This trend suggests that the proposed framework is most beneficial when community boundaries are irregular, attributes are ambiguous, and local neighborhoods are unreliable.
4.9. Dataset-Level Result Analysis
Family-level averages are useful, but they may hide dataset-specific behavior. Therefore, we further report dataset-level NMI results for representative baselines and BioMetaEvo-GNN in Table 12. The purpose of this analysis is to verify that performance gains are not caused by only one or two favorable datasets.
The dataset-level results show that the proposed method consistently performs best or near-best across all datasets. The most visible gains appear on BlogCatalog, Flickr, Reddit, Amazon-Computers, and Amazon-Photo, where structural noise and attribute ambiguity are more significant.
4.10. Structural Perturbation Robustness
This experiment evaluates the robustness of community assignments under edge dropout, edge insertion, and edge rewiring. These perturbations correspond to missing interactions, false interactions, and simultaneous removal-insertion corruption. Robustness is measured by NMI retention and assignment drift.
Table 13.
Average NMI under structural perturbations.
| Method | Clean | Dropout 20% | Insertion 20% | Rewiring 20% | Average Retention | Drift↓ |
| --- | --- | --- | --- | --- | --- | --- |
| GCN | 0.485 | 0.386 | 0.371 | 0.361 | 0.769 | 0.241 |
| GAT | 0.497 | 0.402 | 0.389 | 0.378 | 0.784 | 0.224 |
| VGAE | 0.506 | 0.425 | 0.411 | 0.397 | 0.812 | 0.198 |
| GRACE | 0.523 | 0.452 | 0.438 | 0.430 | 0.841 | 0.164 |
| DMoN | 0.531 | 0.461 | 0.447 | 0.433 | 0.842 | 0.158 |
| Pro-GNN | 0.522 | 0.471 | 0.459 | 0.448 | 0.880 | 0.127 |
| GNNGuard | 0.525 | 0.479 | 0.466 | 0.456 | 0.890 | 0.116 |
| GRAND | 0.529 | 0.488 | 0.475 | 0.466 | 0.901 | 0.104 |
| BioMetaEvo-GNN | 0.558 | 0.526 | 0.520 | 0.516 | 0.933 | 0.067 |
BioMetaEvo-GNN achieves the highest retention and the lowest drift across all structural perturbations. The gain is most visible under edge rewiring, which simultaneously removes reliable structural evidence and introduces misleading edges. This result is consistent with the intended roles of T-Cell Defense Dynamics and Multi-Objective Replacement.
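As a reading aid for Table 13, the Average Retention column appears to correspond to the mean perturbed NMI divided by the clean NMI. This definition is an assumption inferred from the reported numbers, not a formula given in the text. The short check below applies it to three rows copied from the table; the computed values agree with the reported column to within about 0.001.

```python
def average_retention(clean_nmi, perturbed_nmis):
    """Mean perturbed NMI divided by clean NMI (assumed definition)."""
    return sum(perturbed_nmis) / len(perturbed_nmis) / clean_nmi

# (clean NMI, [dropout 20%, insertion 20%, rewiring 20%]) copied from Table 13.
rows = {
    "GCN": (0.485, [0.386, 0.371, 0.361]),
    "GRAND": (0.529, [0.488, 0.475, 0.466]),
    "BioMetaEvo-GNN": (0.558, [0.526, 0.520, 0.516]),
}
reported = {"GCN": 0.769, "GRAND": 0.901, "BioMetaEvo-GNN": 0.933}

computed = {name: average_retention(clean, pert) for name, (clean, pert) in rows.items()}
for name in rows:
    print(f"{name}: computed {computed[name]:.3f}, reported {reported[name]:.3f}")
```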
Figure 5 illustrates the community assignment dynamics before and after perturbation. The clean graph shows the reference community structure, with nodes grouped into four distinct communities. After structural perturbation, the representative baseline changes several assignments at boundary and bridge nodes, suggesting that its community prediction is sensitive to spurious or misleading edges. In contrast, BioMetaEvo-GNN preserves most of the original community structure under identical perturbation conditions, with only minor assignment drift in ambiguous boundary regions.
Figure 5 also shows that perturbation-induced instability is concentrated near community boundaries rather than spread uniformly across the graph. The baseline is strongly affected by spurious or broken bridge connections, which cause several boundary nodes to switch communities. BioMetaEvo-GNN yields a more stable partition, which we attribute to the combined effects of Bayesian Fourier uncertainty, energy-time filtering, T-cell defense, and multi-objective replacement. This figure is consistent with the robustness analysis presented in the subsequent perturbation experiments.
4.11. Attribute Perturbation Robustness
This experiment evaluates robustness to feature masking and Gaussian feature noise. Feature masking simulates missing attributes, whereas feature noise simulates corrupted attributes. This setting matters because node attributes in many attributed graphs are sparse, incomplete, or noisy.
Table 14.
Average NMI under attribute perturbations.
| Method | Clean | Mask 20% | Mask 30% | Noise 20% | Noise 30% | Mean Retention |
| --- | --- | --- | --- | --- | --- | --- |
| GCN | 0.485 | 0.385 | 0.336 | 0.392 | 0.341 | 0.750 |
| GAT | 0.497 | 0.401 | 0.353 | 0.409 | 0.359 | 0.766 |
| VGAE | 0.506 | 0.417 | 0.374 | 0.426 | 0.381 | 0.789 |
| GRACE | 0.523 | 0.447 | 0.407 | 0.456 | 0.414 | 0.824 |
| DMoN | 0.531 | 0.451 | 0.412 | 0.462 | 0.421 | 0.822 |
| Pro-GNN | 0.522 | 0.457 | 0.424 | 0.471 | 0.436 | 0.856 |
| GNNGuard | 0.525 | 0.465 | 0.431 | 0.480 | 0.446 | 0.867 |
| GRAND | 0.529 | 0.472 | 0.441 | 0.488 | 0.456 | 0.878 |
| BioMetaEvo-GNN | 0.558 | 0.525 | 0.498 | 0.532 | 0.503 | 0.922 |
The proposed method shows strong attribute robustness because Bayesian Fourier Learning identifies uncertain feature-driven assignments, while Energy-Time Filtering and T-SILE prevent short-term attribute corruption from dominating community decisions.
4.12. Perturbation Trajectory Analysis
Rather than evaluating robustness at a single perturbation level, we report full degradation trajectories. This is necessary because two models may behave similarly under weak perturbation but diverge significantly as perturbation intensity increases.
Table 15.
NMI degradation trajectory under edge rewiring.
| Method | Clean | | | | | |
| --- | --- | --- | --- | --- | --- | --- |
| GCN | 0.485 | 0.455 | 0.421 | 0.392 | 0.361 | 0.309 |
| GAT | 0.497 | 0.469 | 0.438 | 0.408 | 0.378 | 0.329 |
| VGAE | 0.506 | 0.480 | 0.452 | 0.424 | 0.397 | 0.351 |
| GRACE | 0.523 | 0.503 | 0.481 | 0.455 | 0.430 | 0.389 |
| DMoN | 0.531 | 0.509 | 0.484 | 0.459 | 0.433 | 0.393 |
| GNNGuard | 0.525 | 0.511 | 0.495 | 0.477 | 0.456 | 0.420 |
| GRAND | 0.529 | 0.516 | 0.501 | 0.485 | 0.466 | 0.433 |
| BioMetaEvo-GNN | 0.558 | 0.551 | 0.542 | 0.530 | 0.516 | 0.487 |
The trajectory indicates that BioMetaEvo-GNN degrades more slowly than competing approaches. Its robustness is therefore not an isolated effect at a single perturbation ratio but a consistent trend across perturbation intensities.
Figure 6 shows the full degradation trajectories under three perturbation regimes: edge rewiring, feature masking, and spectral-risk perturbation. In contrast to single-point robustness evaluation, the degradation curves show how each method performs as the perturbation strength increases. This matters because two methods may perform comparably under minor perturbations but diverge markedly as the graph becomes noisier or structurally less stable.
Figure 6 shows that all methods degrade as perturbation intensity increases, but the rate of degradation differs substantially. Standard GNN baselines, including GCN and GAT, drop more sharply, particularly under spectral-risk perturbation, suggesting that their community assignments are sensitive to unstable boundary structures. Robust graph methods such as GNNGuard and GRAND slow the degradation but still decline noticeably under stronger perturbations. BioMetaEvo-GNN achieves the highest NMI at all perturbation levels. This indicates that Bayesian Fourier learning, energy-time filtering, T-cell defense dynamics, and meta-governance together improve assignment stability under noisy graph observations.
4.13. Spectral-Risk Perturbation Analysis
Random perturbations may underestimate instability around community boundaries. Therefore, we evaluate spectral-risk perturbation, where nodes or edges with high spectral roughness are more likely to be perturbed. This experiment directly tests the Bayesian Fourier component of BioMetaEvo-GNN.
Table 16.
Spectral-risk perturbation results.
| Method | Random Drift↓ | Spectral-Risk Drift↓ | Drift Increase↓ | Risk PNMI↑ |
| --- | --- | --- | --- | --- |
| GCN | 0.241 | 0.329 | +0.088 | 0.671 |
| GAT | 0.224 | 0.306 | +0.082 | 0.694 |
| VGAE | 0.198 | 0.278 | +0.080 | 0.722 |
| GRACE | 0.164 | 0.232 | +0.068 | 0.768 |
| DMoN | 0.158 | 0.224 | +0.066 | 0.776 |
| Pro-GNN | 0.127 | 0.186 | +0.059 | 0.814 |
| GNNGuard | 0.116 | 0.174 | +0.058 | 0.826 |
| GRAND | 0.104 | 0.159 | +0.055 | 0.841 |
| BioMetaEvo-GNN | 0.067 | 0.101 | +0.034 | 0.899 |
Spectral-risk perturbation is more damaging than random perturbation for all methods, but BioMetaEvo-GNN exhibits the smallest drift increase. This suggests that Bayesian Fourier Learning and T-cell defense are particularly effective around fragile boundary regions.
4.14. Synthetic Graph Results
Synthetic graph experiments provide controlled evidence for the behavior of BioMetaEvo-GNN under known structural regimes.
Table 17 reports NMI results on synthetic graph families.
The largest synthetic gains appear under weak-signal, overlapping, feature-noisy, and heterophilous regimes. These are precisely the regimes where static community detection models are most likely to produce unstable assignments.
4.15. Bayesian Fourier Learning Analysis
To isolate the role of Bayesian Fourier Learning, we compare four variants: Spatial GNN, Fourier-GNN, Bayesian Fourier-GNN, and full BioMetaEvo-GNN.
Table 18.
Bayesian Fourier Learning analysis.
| Variant | NMI | ARI | ECE↓ | Mean Uncertainty↓ | Uncertainty–Drift Corr.↑ |
| --- | --- | --- | --- | --- | --- |
| Spatial GNN | 0.485±0.013 | 0.391±0.012 | 0.124±0.010 | 0.438±0.021 | 0.392±0.025 |
| Fourier-GNN | 0.512±0.012 | 0.423±0.011 | 0.108±0.009 | 0.401±0.019 | 0.451±0.023 |
| Bayesian Fourier-GNN | 0.536±0.010 | 0.454±0.010 | 0.081±0.008 | 0.356±0.017 | 0.573±0.020 |
| BioMetaEvo-GNN | 0.558±0.009 | 0.473±0.009 | 0.052±0.006 | 0.309±0.015 | 0.681±0.018 |
The Fourier transformation improves global structural representation, while Bayesian modeling improves uncertainty calibration. The full model performs best because it exploits the estimated uncertainty through T-cell defense, bio-phase selection, and multi-objective replacement.
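For readers unfamiliar with the ECE column, the sketch below shows the standard equal-width binned estimator of expected calibration error. The number of bins and the way a predicted community is judged "correct" (for example, after matching communities to annotated classes) are assumptions here, since the text does not specify them.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Right-inclusive bins; a confidence of exactly 0 falls into the first bin.
    bins = np.clip(np.ceil(confidences * num_bins).astype(int) - 1, 0, num_bins - 1)
    ece = 0.0
    for b in range(num_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of nodes in the bin
    return ece

# An overconfident toy case: 90% confidence but only half of the assignments correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
calibrated = expected_calibration_error([1.0, 1.0, 1.0], [1, 1, 1])
```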
Figure 7 illustrates our additional analysis of the spectral-uncertainty behavior of BioMetaEvo-GNN within the Bayesian Fourier domain. This visualization aims to investigate the correlation between unstable community regions and anomalous frequency responses, as well as heightened prediction uncertainty. BioMetaEvo-GNN integrates Fourier-domain spectral energy, Bayesian predictive uncertainty, and community transition risk into a cohesive diagnostic framework, rather than considering uncertainty as a standalone confidence score.
Figure 7 illustrates that unstable community transitions are not randomly allocated. Rather, they typically manifest in areas characterized by heightened spectral atypicality and Bayesian uncertainty. This discovery corroborates the formulation of Bayesian Fourier Learning in BioMetaEvo-GNN. The Fourier component encapsulates global spectral instability, whereas Bayesian inference quantifies the uncertainty of community assignments. The signals are subsequently transmitted to the T-cell defense and meta-governance modules, enabling the framework to mitigate unreliable community transitions and maintain more stable community structures.
4.16. T-Cell Defense and T-SILE Immune Analysis
This experiment evaluates the immune-inspired components of BioMetaEvo-GNN. The T-cell defense module classifies candidate communities into activation, monitoring, and suppression states. T-SILE maintains immune memory prototypes to preserve historically stable structures.
Table 19.
T-cell defense and T-SILE immune response analysis.
| Dataset Family | Activated | Monitored | Suppressed | Mean Defense Score | Stability Gain | Uncertainty Reduction |
| --- | --- | --- | --- | --- | --- | --- |
| Citation | 0.642 | 0.241 | 0.117 | 0.726 | +0.071 | -0.046 |
| Co-authorship | 0.668 | 0.216 | 0.116 | 0.754 | +0.064 | -0.041 |
| Social | 0.591 | 0.279 | 0.130 | 0.681 | +0.089 | -0.057 |
| Co-purchase | 0.657 | 0.229 | 0.114 | 0.741 | +0.067 | -0.044 |
| Large-scale | 0.623 | 0.255 | 0.122 | 0.705 | +0.061 | -0.039 |
Social graphs exhibit the highest monitoring and suppression ratios, signifying that the immune module detects a greater number of unstable community candidates inside noisy graph families. This behavior corroborates the biological defense interpretation of the framework.
Figure 8 presents an additional analysis of the immune-inspired behavior of BioMetaEvo-GNN. This visualization examines whether the proposed T-cell defense dynamics and T-SILE immune learning module yield measurable stability gains rather than serving only as theoretical constructs. We examine the distribution of immune responses, the evolution of immune memory stability, and the uncertainty reduction achieved by immune filtering.
Figure 8 shows that the distribution of immune responses differs among graph families. Social graphs have a larger share of monitored and suppressed communities, suggesting that their candidate communities are more volatile and require stronger immune regulation. The memory stability curve indicates that T-SILE improves the retention of reliable community prototypes throughout training. Moreover, immune filtering reduces prediction uncertainty across all graph families, indicating that T-cell defense and T-SILE jointly mitigate unreliable community assignments and improve the stability of the final partition.
4.17. UCB Governance and Action Selection
The UCB governance module selects among defense, enhancement, transfer, replacement, growth, and stabilization actions. If this module works properly, action selection should differ across graph families.
Table 20.
UCB governance action frequency across graph families.
| Dataset Family | Defense | Enhance | Transfer | Replace | Grow | Stabilize |
| --- | --- | --- | --- | --- | --- | --- |
| Citation | 0.191 | 0.238 | 0.147 | 0.164 | 0.109 | 0.151 |
| Co-authorship | 0.173 | 0.251 | 0.158 | 0.151 | 0.122 | 0.145 |
| Social | 0.267 | 0.196 | 0.112 | 0.139 | 0.094 | 0.192 |
| Co-purchase | 0.181 | 0.247 | 0.166 | 0.157 | 0.126 | 0.123 |
| Large-scale | 0.214 | 0.205 | 0.173 | 0.143 | 0.156 | 0.109 |
The action distribution indicates that the controller does not operate as a static pipeline. Social graphs trigger more defense and stabilization actions, whereas large-scale and co-purchase graphs favor transfer and growth. This supports the claim that BioMetaEvo-GNN performs adaptive meta-governance.
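To make the action-frequency statistic concrete, the sketch below implements a generic UCB1 controller over the six governance actions. It is an illustration only: the reward definition, the exploration constant, and the class name `UCBController` are assumptions, and the actual governance module may use a different UCB variant.

```python
import math

ACTIONS = ["defense", "enhancement", "transfer", "replacement", "growth", "stabilization"]

class UCBController:
    """Generic UCB1 bandit over governance actions (illustrative, not the paper's code)."""

    def __init__(self, actions, exploration=2.0):
        self.actions = list(actions)
        self.exploration = exploration
        self.counts = {a: 0 for a in self.actions}
        self.means = {a: 0.0 for a in self.actions}
        self.total = 0

    def select(self):
        # Try every action once before applying the UCB rule.
        for a in self.actions:
            if self.counts[a] == 0:
                return a

        def score(a):
            bonus = math.sqrt(self.exploration * math.log(self.total) / self.counts[a])
            return self.means[a] + bonus

        return max(self.actions, key=score)

    def update(self, action, reward):
        self.total += 1
        self.counts[action] += 1
        # Incremental running mean of the observed reward.
        self.means[action] += (reward - self.means[action]) / self.counts[action]

    def frequencies(self):
        return {a: self.counts[a] / self.total for a in self.actions}

# Hypothetical noisy-graph scenario in which only "defense" yields reward.
controller = UCBController(ACTIONS)
for _ in range(600):
    action = controller.select()
    controller.update(action, 1.0 if action == "defense" else 0.0)
freq = controller.frequencies()
```

Normalizing the selection counts, as in `frequencies`, gives per-action shares that sum to one, which is the form reported in Table 20.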
Figure 9 presents our analysis of the meta-governance behavior of BioMetaEvo-GNN through UCB action selection, reward progression, and stage-transition ratios. This visualization examines whether the proposed governance module functions as an adaptive controller rather than a static processing pipeline.
Figure 9 shows that different graph families elicit distinct governance behaviors. Social graphs trigger more defense and stabilization actions because of their higher structural noise and boundary uncertainty. Co-purchase and large-scale graphs trigger more transfer and growth actions, suggesting that the framework reuses established community knowledge and extends community prototypes as graph scale increases. The reward trajectory indicates that beneficial governance actions are reinforced during training. The stage-transition distribution indicates that noisy graphs spend more iterations in exploration and stabilization, whereas cleaner graphs progress more efficiently toward governance. These observations support the meta-governance and meta-evolutionary design of BioMetaEvo-GNN.
4.18. Stage-Transition Dynamics
Stage-transition dynamics are evaluated by tracking the proportion of training iterations spent in exploration, stabilization, adaptation, and governance stages.
Table 21.
Stage-transition dynamics across graph families.
| Dataset Family | Exploration | Stabilization | Adaptation | Governance |
| --- | --- | --- | --- | --- |
| Citation | 0.246 | 0.312 | 0.187 | 0.255 |
| Co-authorship | 0.223 | 0.328 | 0.194 | 0.255 |
| Social | 0.287 | 0.341 | 0.209 | 0.163 |
| Co-purchase | 0.231 | 0.309 | 0.217 | 0.243 |
| Large-scale | 0.259 | 0.281 | 0.247 | 0.213 |
Noisy social graphs spend more iterations in exploration and stabilization, while cleaner co-authorship graphs reach governance more efficiently. This provides empirical evidence that stage transitions reflect graph-specific learning conditions.
4.19. Incremental Growth Evaluation
Incremental growth evaluates whether the model can incorporate new nodes without rebuilding the whole community partition from scratch. A subset of nodes is initially hidden and gradually introduced during evaluation.
Table 22.
Incremental growth evaluation under different new-node ratios.
| Method | 10% New Nodes | 20% New Nodes | 30% New Nodes | 40% New Nodes | Average IGA |
| --- | --- | --- | --- | --- | --- |
| GCN-update | 0.713 | 0.684 | 0.652 | 0.611 | 0.665 |
| GAT-update | 0.728 | 0.701 | 0.668 | 0.629 | 0.682 |
| VGAE-update | 0.741 | 0.716 | 0.684 | 0.646 | 0.697 |
| GRACE-update | 0.758 | 0.731 | 0.702 | 0.664 | 0.714 |
| GNNGuard-update | 0.772 | 0.748 | 0.719 | 0.681 | 0.730 |
| GRAND-update | 0.781 | 0.759 | 0.731 | 0.696 | 0.742 |
| BioMetaEvo-GNN | 0.823 | 0.801 | 0.776 | 0.742 | 0.786 |
The proposed model performs better because new-node assignment is based not only on embedding similarity, but also on uncertainty, energy reliability, immune defense, and community prototype compatibility.
4.20. Cross-Graph Transfer Evaluation
Transfer evaluation tests whether stable community knowledge learned from one graph can benefit a related target graph. This experiment directly evaluates the transfer component in the Defense–Enhancement–Transfer mechanism.
Table 23.
Cross-graph transfer evaluation.
| Source Graph | Target Graph | Scratch NMI | Transfer NMI | Transfer Gain | Uncertainty Reduction |
| --- | --- | --- | --- | --- | --- |
| Cora | Citeseer | 0.493 | 0.516 | +0.023 | -0.031 |
| Citeseer | Pubmed | 0.508 | 0.529 | +0.021 | -0.028 |
| Coauthor-CS | Coauthor-Physics | 0.571 | 0.598 | +0.027 | -0.034 |
| Amazon-Photo | Amazon-Computers | 0.566 | 0.596 | +0.030 | -0.037 |
| BlogCatalog | Flickr | 0.489 | 0.514 | +0.025 | -0.030 |
| OGBN-Arxiv | OGBN-Products | 0.518 | 0.538 | +0.020 | -0.026 |
The positive transfer gains indicate that BioMetaEvo-GNN can reuse stable community intelligence across related graph domains.
4.21. Ablation Study
The ablation study evaluates whether each major module contributes to final performance. We remove one component at a time while keeping the remaining architecture unchanged.
Table 24.
Ablation study of BioMetaEvo-GNN.
| Model Variant | NMI | ARI | Robustness Retention | Stability | ECE↓ |
| --- | --- | --- | --- | --- | --- |
| BioMetaEvo-GNN | 0.549 | 0.549 | 0.896 | 0.891 | 0.058 |
| w/o Bayesian Fourier Learning | 0.531 | 0.446 | 0.887 | 0.852 | 0.091 |
| w/o Energy-Time Filtering | 0.539 | 0.454 | 0.861 | 0.814 | 0.071 |
| w/o Bio-Phase Selection | 0.542 | 0.457 | 0.873 | 0.831 | 0.068 |
| w/o T-Cell Defense | 0.536 | 0.451 | 0.842 | 0.807 | 0.075 |
| w/o T-SILE | 0.541 | 0.456 | 0.858 | 0.819 | 0.073 |
| w/o UCB Governance | 0.545 | 0.461 | 0.869 | 0.833 | 0.066 |
| w/o Stage Transition | 0.548 | 0.464 | 0.875 | 0.842 | 0.064 |
| w/o Incremental Growth | 0.551 | 0.466 | 0.888 | 0.851 | 0.061 |
| w/o Multi-Objective Replacement | 0.533 | 0.448 | 0.854 | 0.816 | 0.079 |
The largest declines occur when Bayesian Fourier Learning, T-Cell Defense, or Multi-Objective Replacement is removed. This indicates that the observed performance gain is not attributable solely to a stronger GNN encoder, but arises from the interplay of uncertainty modeling, immune defense, and governance-level replacement.
Figure 10 illustrates the ablation analysis, which assesses the contribution of each principal component of BioMetaEvo-GNN. Each variant removes one module while leaving the rest of the design intact. This setup makes it possible to assess whether the proposed framework benefits from the combination of Bayesian Fourier learning, energy-time filtering, bio-phase selection, T-cell defense, T-SILE immune memory, UCB governance, stage transition, incremental growth, and multi-objective replacement.
Figure 10 shows that the complete BioMetaEvo-GNN attains the best balance among NMI, robustness retention, stability, and calibration. Removing Bayesian Fourier Learning produces a noticeable rise in ECE, indicating degraded uncertainty estimation. Removing T-cell defense or T-SILE reduces robustness and stability, indicating that immune-inspired filtering and memory are important for suppressing unstable community changes. Removing multi-objective replacement also causes clear deterioration, which supports the view that community candidates should not be evaluated against a single objective such as modularity or clustering accuracy.
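Because the ablation discussion relies on ECE as the calibration measure, it may help to recall how the standard binned estimator is computed. The sketch below uses the usual equal-width binning definition; it assumes that predicted community labels have already been aligned with the reference labels so that each prediction can be marked correct or incorrect. The function name, the bin count, and the toy inputs are illustrative assumptions, not the paper's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin share) * |bin accuracy - bin mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy example (hypothetical values): two confident correct assignments,
# and two less confident assignments of which one is wrong.
toy_ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
print(f"toy ECE = {toy_ece:.3f}")
```

A lower value means that the reported assignment confidence tracks the observed accuracy more closely, which is why the ECE column is marked with a downward arrow.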
4.22. Module Interaction Analysis
Single-module ablation does not fully reveal interaction effects. Therefore, we remove pairs of modules to examine whether certain components reinforce each other.
Table 25.
Pairwise module interaction analysis
| Removed Modules | NMI | Robustness Retention | Stability | ECE↓ |
| None | 0.558 | 0.924 | 0.891 | 0.052 |
| BFL + TCD | 0.509 | 0.812 | 0.774 | 0.113 |
| BFL + T-SILE | 0.517 | 0.829 | 0.792 | 0.104 |
| TCD + T-SILE | 0.522 | 0.806 | 0.781 | 0.089 |
| UCB + Stage Transition | 0.533 | 0.846 | 0.817 | 0.076 |
| Energy-Time + Bio-Phase | 0.526 | 0.834 | 0.803 | 0.081 |
| Transfer + Incremental Growth | 0.541 | 0.875 | 0.842 | 0.067 |
| MOR + UCB | 0.524 | 0.831 | 0.801 | 0.083 |
The largest drops in NMI, stability, and calibration occur when Bayesian Fourier Learning and T-Cell Defense are removed together, while robustness retention is lowest when T-Cell Defense and T-SILE are removed together. This suggests that uncertainty estimation and immune defense are complementary: one identifies instability, while the other regulates community acceptance.
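The ranking of module pairs can be checked by computing each pair's NMI drop relative to the "None" row of Table 25. This is a small illustrative sketch; the variable and function names are assumptions introduced here.

```python
# (NMI, robustness retention, stability, ECE), copied from Table 25.
BASELINE = (0.558, 0.924, 0.891, 0.052)
PAIRS = {
    "BFL + TCD": (0.509, 0.812, 0.774, 0.113),
    "BFL + T-SILE": (0.517, 0.829, 0.792, 0.104),
    "TCD + T-SILE": (0.522, 0.806, 0.781, 0.089),
    "UCB + Stage Transition": (0.533, 0.846, 0.817, 0.076),
    "Energy-Time + Bio-Phase": (0.526, 0.834, 0.803, 0.081),
    "Transfer + Incremental Growth": (0.541, 0.875, 0.842, 0.067),
    "MOR + UCB": (0.524, 0.831, 0.801, 0.083),
}


def nmi_drop(pair):
    """NMI lost when the given pair of modules is removed."""
    return BASELINE[0] - PAIRS[pair][0]


# Pairs sorted from most to least damaging in terms of NMI.
ranked_by_nmi_drop = sorted(PAIRS, key=nmi_drop, reverse=True)
# Pair with the lowest robustness retention.
lowest_robustness = min(PAIRS, key=lambda pair: PAIRS[pair][1])

for pair in ranked_by_nmi_drop:
    print(f"{pair:32s} NMI drop = {nmi_drop(pair):.3f}")
print("lowest robustness retention:", lowest_robustness)
```

The same pattern extends to the stability and ECE columns by changing the tuple index.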
4.23. Parameter Sensitivity Analysis
We evaluate the sensitivity of key parameters, including the Energy-Time memory coefficient, the T-SILE memory retention coefficient, and the UCB exploration coefficient c.
Table 26.
Parameter sensitivity analysis.
| Parameter Setting | NMI | ARI | Robustness | Stability |
|  | 0.544 | 0.459 | 0.881 | 0.842 |
|  | 0.552 | 0.466 | 0.903 | 0.869 |
|  | 0.558 | 0.473 | 0.924 | 0.891 |
|  | 0.549 | 0.465 | 0.916 | 0.884 |
|  | 0.546 | 0.461 | 0.884 | 0.851 |
|  | 0.553 | 0.468 | 0.907 | 0.873 |
|  | 0.558 | 0.473 | 0.924 | 0.891 |
|  | 0.550 | 0.466 | 0.918 | 0.886 |
|  | 0.548 | 0.463 | 0.892 | 0.858 |
|  | 0.554 | 0.469 | 0.913 | 0.879 |
|  | 0.558 | 0.473 | 0.924 | 0.891 |
|  | 0.551 | 0.467 | 0.909 | 0.874 |
The model remains stable across reasonable parameter ranges. Very low memory weakens stability, while very high memory slows adaptation. Moderate-to-high memory retention provides the best trade-off.
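To make the role of the exploration coefficient c concrete, the following sketch uses the standard UCB1 score, mean reward plus c times the square root of ln(total pulls) divided by the action's pull count. This section does not state the exact governance rule used by BioMetaEvo-GNN, so the formula, the function names, and the reward and count values below are assumptions for illustration only; the action names are taken from the governance actions listed in Table 29.

```python
import math


def ucb_scores(mean_rewards, counts, c):
    """Standard UCB1 scores (assumed form): mean + c * sqrt(ln(total) / n)."""
    total = sum(counts.values())
    scores = {}
    for action, mean in mean_rewards.items():
        n = counts[action]
        if n == 0:
            scores[action] = float("inf")  # untried actions are explored first
        else:
            scores[action] = mean + c * math.sqrt(math.log(total) / n)
    return scores


def select_action(mean_rewards, counts, c):
    """Pick the action with the highest UCB score."""
    scores = ucb_scores(mean_rewards, counts, c)
    return max(scores, key=scores.get)


# Hypothetical statistics: "defense" has a higher observed mean reward,
# while "transfer" has been tried far less often.
means = {"defense": 0.60, "transfer": 0.55}
pulls = {"defense": 50, "transfer": 5}

print(select_action(means, pulls, c=0.01))  # small c favors the better-known action
print(select_action(means, pulls, c=1.0))   # large c favors the rarely tried action
```

Under this assumed form, a small c makes the governance exploit the action with the best observed reward, whereas a large c pushes it toward actions that have been tried less often.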
4.24. Multi-Objective Trade-Off Analysis
BioMetaEvo-GNN optimizes multiple objectives instead of only maximizing clustering accuracy or modularity. To evaluate this design, we test different objective weight configurations.
Table 27.
Multi-objective trade-off analysis.
| Objective Setting | NMI | Modularity | Robustness | Stability | Uncertainty↓ |
| Accuracy-dominant | 0.562 | 0.672 | 0.887 | 0.846 | 0.348 |
| Modularity-dominant | 0.548 | 0.698 | 0.879 | 0.839 | 0.361 |
| Robustness-dominant | 0.551 | 0.681 | 0.936 | 0.902 | 0.326 |
| Stability-dominant | 0.547 | 0.676 | 0.925 | 0.914 | 0.331 |
| Uncertainty-dominant | 0.544 | 0.669 | 0.918 | 0.897 | 0.291 |
| Balanced BioMetaEvo-GNN | 0.558 | 0.686 | 0.924 | 0.891 | 0.309 |
The balanced setting provides the best overall trade-off: it is not the top scorer on any single metric, but it stays close to the best value on each. Accuracy-dominant optimization slightly improves NMI but weakens robustness and stability. This supports the multi-objective design of the proposed framework.
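One way to examine this trade-off is a Pareto-dominance check combined with a worst-case gap to the per-metric best value. The sketch below applies both to the values in Table 27. The function names are illustrative, and the worst-case gap compares raw differences across metrics with different scales, so it should be read as a rough descriptive summary rather than as the selection rule used by the framework.

```python
# (NMI, modularity, robustness, stability, uncertainty), copied from Table 27.
SETTINGS = {
    "Accuracy-dominant": (0.562, 0.672, 0.887, 0.846, 0.348),
    "Modularity-dominant": (0.548, 0.698, 0.879, 0.839, 0.361),
    "Robustness-dominant": (0.551, 0.681, 0.936, 0.902, 0.326),
    "Stability-dominant": (0.547, 0.676, 0.925, 0.914, 0.331),
    "Uncertainty-dominant": (0.544, 0.669, 0.918, 0.897, 0.291),
    "Balanced": (0.558, 0.686, 0.924, 0.891, 0.309),
}
# Uncertainty is minimized; every other metric is maximized.
HIGHER_IS_BETTER = (True, True, True, True, False)


def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better on one."""
    pairs = list(zip(a, b, HIGHER_IS_BETTER))
    no_worse = all((x >= y) if hi else (x <= y) for x, y, hi in pairs)
    better = any((x > y) if hi else (x < y) for x, y, hi in pairs)
    return no_worse and better


def pareto_front(settings):
    """Names of settings that no other setting dominates."""
    return [
        name for name, values in settings.items()
        if not any(dominates(other, values)
                   for other_name, other in settings.items() if other_name != name)
    ]


def worst_gap(name, settings):
    """Largest raw distance between a setting and the best value of any metric."""
    columns = list(zip(*settings.values()))
    best = [max(col) if hi else min(col) for col, hi in zip(columns, HIGHER_IS_BETTER)]
    return max(abs(value - b) for value, b in zip(settings[name], best))


front = pareto_front(SETTINGS)
most_balanced = min(SETTINGS, key=lambda name: worst_gap(name, SETTINGS))
print("Pareto-optimal settings:", front)
print("smallest worst-case gap:", most_balanced)
```

On these values, no setting dominates another, so all six are Pareto-optimal, and the balanced configuration has the smallest worst-case gap to the per-metric best.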
4.25. Scalability and Computational Cost
Since BioMetaEvo-GNN includes Bayesian Fourier Learning and immune-governance modules, computational cost must be evaluated carefully.
Table 28 reports time per epoch, peak memory, and convergence epoch.
Table 28.
Scalability and computational cost.
| Dataset | Nodes | Time/Epoch | Peak Memory | Convergence Epoch |
| Cora | 2,708 | 0.18 s | 1.1 GB | 126 |
| Pubmed | 19,717 | 0.74 s | 2.8 GB | 148 |
| Coauthor-Physics | 34,493 | 1.32 s | 4.6 GB | 162 |
| Flickr | 89,250 | 3.94 s | 8.7 GB | 184 |
| OGBN-Arxiv | 169,343 | 6.12 s | 12.9 GB | 203 |
| Reddit | 232,965 | 7.86 s | 15.3 GB | 211 |
| OGBN-Products | 2,449,029 | 28.71 s | 31.6 GB | 238 |
The additional modules increase computational cost, but the overhead remains manageable when efficient Fourier approximation and mini-batch training are used. We report the scalability results in full because the proposed architecture is more complex than a basic GNN baseline.
Figure 11 illustrates the computational scalability of BioMetaEvo-GNN on graphs of varying sizes. Because the proposed framework incorporates Bayesian Fourier learning, T-SILE immune memory, and meta-governance modules, the computational feasibility of this added complexity must be evaluated on medium- and large-scale graphs.
Figure 11 shows that BioMetaEvo-GNN requires more computational resources than simpler GNN baselines, which is expected because the framework simultaneously models spectral uncertainty, immune memory, and governance decisions. Nonetheless, runtime and memory grow steadily as the graph size increases. The relative overhead compared with GRAND is within a tolerable range, indicating that the improved robustness and stability are achieved without excessive computational expense.
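A simple way to summarize the runtime trend in Table 28 is to fit a straight line to log(time per epoch) against log(number of nodes); the slope is an empirical scaling exponent. The sketch below does this with an ordinary least-squares fit in plain Python. The datasets differ in density and feature dimension, so the exponent is only descriptive, and the function name is an assumption introduced here.

```python
import math

# (number of nodes, seconds per epoch), copied from Table 28.
POINTS = [
    (2708, 0.18),
    (19717, 0.74),
    (34493, 1.32),
    (89250, 3.94),
    (169343, 6.12),
    (232965, 7.86),
    (2449029, 28.71),
]


def loglog_slope(points):
    """Least-squares slope of log(time) versus log(nodes)."""
    xs = [math.log(nodes) for nodes, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(POINTS)
print(f"empirical scaling exponent: {slope:.2f}")
```

On these seven points the fitted exponent is below 1 (approximately 0.78), meaning that the reported time per epoch grows sub-linearly in the number of nodes across this set of datasets.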
4.26. Visualization and Interpretability Design
The experimental section pairs numerical results with visual evidence. Visualization is particularly important for BioMetaEvo-GNN because many of the proposed modules, including Bayesian Fourier Learning, T-cell defense, UCB governance, and stage transition, function as behavioral mechanisms rather than mere performance enhancers.
Table 29.
Required figures for the experimental section.
| Figure | Content | Purpose |
| Figure 1 | Overall experimental pipeline | Shows clean graph, perturbed graph, Bayesian Fourier Learning, T-cell defense, UCB governance, and final partition. |
| Figure 2 | Real-world and synthetic dataset overview | Shows dataset families, graph scales, and controlled synthetic regimes. |
| Figure 3 | Dataset-level NMI comparison | Shows whether gains are consistent across fourteen real-world datasets. |
| Figure 4 | Edge perturbation degradation curves | Shows NMI degradation under dropout, insertion, and rewiring. |
| Figure 5 | Feature perturbation degradation curves | Shows robustness under masking and Gaussian noise. |
| Figure 6 | Bayesian Fourier uncertainty landscape | Shows spectral energy, uncertainty intensity, and unstable frequency components. |
| Figure 7 | Spectral-risk perturbation map | Shows high-risk boundary nodes and perturbation-induced drift. |
| Figure 8 | T-cell immune response distribution | Shows activated, monitored, and suppressed community ratios. |
| Figure 9 | UCB governance action frequency | Shows adaptive selection of defense, enhancement, transfer, replacement, growth, and stabilization. |
| Figure 10 | Stage-transition Sankey diagram | Shows movement from exploration to stabilization, adaptation, and governance. |
| Figure 11 | Incremental growth curve | Shows new-node assignment performance under increasing new-node ratios. |
| Figure 12 | Ablation bar plot | Shows performance drop caused by removing each module. |
| Figure 13 | Scalability curve | Shows runtime and memory growth with graph size. |
| Figure 14 | Failure case visualization | Shows heterophily, overlapping communities, hub dominance, and small-community merging. |
Among these figures, the most important ones are the Bayesian Fourier uncertainty landscape, T-cell immune response distribution, UCB action frequency, and stage-transition Sankey diagram. These figures directly support the unique claims of BioMetaEvo-GNN.
4.27. Failure Case and Error Pattern Analysis
A reliable experimental section should also explain where the proposed method fails. Although BioMetaEvo-GNN improves robustness and stability, it remains challenged by strong heterophily, overlapping communities, small peripheral communities, high-degree hubs, extreme feature masking, and rapid structural shifts.
Table 30.
Failure cases and error pattern analysis.
| Failure Regime | Observed Pattern | Main Cause | Possible Remedy |
| Strong heterophily | Attribute-similar nodes are assigned to structurally different communities. | Feature similarity conflicts with community membership. | Add heterophily-aware message passing. |
| Overlapping communities | Boundary nodes fluctuate across perturbation runs. | Hard labels cannot represent multi-membership. | Add soft overlapping community metrics. |
| Small peripheral groups | Small communities merge into nearby large communities after edge loss. | Anchor edges are removed or weakened. | Add size-aware boundary regularization. |
| High-degree hubs | Hub nodes attract unstable neighbors into incorrect communities. | Degree imbalance and oversmoothing. | Add hub-aware uncertainty gating. |
| Extreme feature masking | Minority communities lose separability. | Distinguishing attributes are removed. | Add masked feature reconstruction. |
| Weak spectral gap | Fourier signals become ambiguous near boundary frequencies. | Community eigenspaces are not well separated. | Add stronger spectral-margin constraints. |
| Rapid structural shift | T-SILE memory slows adaptation to genuine changes. | Memory retention is too strong. | Use adaptive scheduling. |
This analysis shows that the proposed framework is robust but not infallible, and it points to specific future directions.
Figure 12 presents a diagnostic analysis of failure cases to evaluate the remaining limitations of BioMetaEvo-GNN. The proposed approach improves resilience and stability under perturbation, yet residual errors persist where community boundaries overlap substantially, homophily is weak, or noisy bridge edges provide misleading structural evidence.
Figure 12 shows that the residual errors of BioMetaEvo-GNN are not evenly spread throughout the graph. Rather, they are concentrated at boundary nodes that link multiple communities or receive contradictory structural evidence from noisy bridge edges. The error decomposition reveals that boundary ambiguity and low homophily are the primary remaining sources of error. This indicates that the proposed immune-governed approach can suppress many unstable transitions, yet highly ambiguous overlapping communities continue to pose challenges. The uncertainty-error relationship indicates that most residual errors occur in regions of high uncertainty, which supports the use of Bayesian uncertainty as a diagnostic indicator for detecting incorrect assignments.
4.28. Statistical Significance Testing
To verify that improvements are not caused by random seeds or favorable splits, we conduct paired significance tests across datasets and random seeds. Holm–Bonferroni correction is applied to control multiple comparisons.
Table 31.
Statistical significance testing.
| Comparison | Metric | Mean Difference | Corrected p-value | Significant |
| BioMetaEvo-GNN vs GCN | NMI | +0.073 | 0.0036 | Yes |
| BioMetaEvo-GNN vs GAT | NMI | +0.061 | 0.0049 | Yes |
| BioMetaEvo-GNN vs VGAE | ECE | -0.050 | 0.0067 | Yes |
| BioMetaEvo-GNN vs GRACE | Assignment Drift | -0.083 | 0.0074 | Yes |
| BioMetaEvo-GNN vs DMoN | Robustness Retention | +0.082 | 0.0112 | Yes |
| BioMetaEvo-GNN vs Pro-GNN | Assignment Drift | -0.060 | 0.0145 | Yes |
| BioMetaEvo-GNN vs GNNGuard | Assignment Drift | -0.049 | 0.0189 | Yes |
| BioMetaEvo-GNN vs GRAND | Degradation Slope | -0.055 | 0.0226 | Yes |
The significance analysis is reported alongside the mean and standard deviation, so that the conclusions do not rest on best-case performance alone.
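For reference, the Holm–Bonferroni step-down correction used for the corrected p-values can be sketched as follows. The procedure itself is standard: the p-values are sorted in ascending order, the k-th smallest is multiplied by (m - k + 1), and monotonicity is enforced with a running maximum. The raw p-values in the example are hypothetical, since Table 31 reports only corrected values, and the function name is an assumption introduced here.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # The running maximum keeps the adjusted values non-decreasing.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four paired comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
corrected = holm_bonferroni(raw)
significant = [p < 0.05 for p in corrected]
print(corrected)
print(significant)
```

A comparison is declared significant at level 0.05 only if its adjusted p-value is below 0.05, which controls the family-wise error rate across all comparisons.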
Figure 13 illustrates an additional evaluation of BioMetaEvo-GNN’s capacity to generalize across diverse graph families. In-domain evaluation assesses performance on the same dataset distribution, whereas cross-dataset evaluation is more challenging because the model must adapt community representations across graphs with varying structural densities, attribute distributions, and community boundary configurations.
Figure 13 shows that cross-dataset transfer is strongest among related graph families and progressively weakens as the source and target graphs diverge. Nonetheless, BioMetaEvo-GNN shows better out-of-domain robustness than traditional baselines. This outcome indicates that the integration of Bayesian Fourier learning, immune-inspired filtering, and meta-governance enhances both perturbation robustness and transfer stability across diverse graph distributions.
4.29. Experimental Discussion
The experimental findings support several observations. First, BioMetaEvo-GNN improves clean community detection performance across several graph families, demonstrating that robustness is attained without compromising clean accuracy. Second, the advantage becomes more pronounced under perturbation, particularly with edge rewiring, feature masking, and spectral-risk perturbation. This supports the claim that the primary contribution of the framework lies in stability-aware and defense-aware community intelligence.
Third, Bayesian Fourier Learning enhances spectral representation and improves uncertainty calibration. Nevertheless, uncertainty is most useful when coupled with T-cell defense and meta-governance. Fourth, T-SILE improves stability by maintaining reliable historical prototypes, although excessive memory retention may impede adaptation. Fifth, UCB governance produces graph-dependent action patterns, supporting the claim that the framework performs adaptive meta-governance rather than rigid rule-based processing.
The ablation and interaction studies indicate that the framework's gains do not rest on any single isolated component. The largest drop in NMI occurs when both Bayesian Fourier Learning and T-cell defense are removed together. This suggests that the proposed method operates through the interplay of spectral uncertainty, immune defense, and multi-objective governance.
4.30. Experimental Limitations
The experimental design, while extensive, has notable limitations. First, most real-world datasets use class labels as reference communities, although actual communities may be overlapping, hierarchical, or only partially observed. Second, the perturbation models are controlled approximations; graph noise in real-world scenarios may correlate with node degree, temporal events, adversarial tactics, or hidden social influences. Third, Bayesian Fourier Learning and T-SILE memory updates impose additional computational cost on large graphs. Fourth, the present evaluation focuses mainly on hard community assignment. Future research should incorporate overlapping community measures, dynamic graph benchmarks, and downstream task assessments.
4.31. Summary of Experimental Findings
The experimental section assesses BioMetaEvo-GNN based on clean accuracy, structural coherence, perturbation robustness, spectral-risk stability, uncertainty calibration, immune defense, immune memory, UCB governance, stage transition, incremental growth, transferability, ablation, module interaction, parameter sensitivity, scalability, visualization, failure diagnosis, and statistical significance.
The findings support the primary claim that BioMetaEvo-GNN functions not only as a community detection model but also as a bio-inspired, Bayesian Fourier-enhanced, immune-defensive, and meta-governed community intelligence framework. The advantage is particularly evident in noisy, unstable, and structurally fragile graph settings, where traditional GNNs and strong baselines continue to suffer from assignment drift, poor calibration, and limited adaptive control.