Submitted: 16 October 2024
Posted: 17 October 2024
Abstract
Keywords:
MSC: 68T07; 05C85; 05C40
1. Introduction
2. Graph Theory Fundamentals
2.1. Definitions
2.2. Bipartite Graphs in Large-Scale Applications
3. Classical and Neural Approaches to Graph Matching
3.1. Challenges of Classical Approaches in Graph Matching
3.2. Graph Neural Networks (GNNs)
4. HybridGNN: Architectural Design and Innovation
4.1. Architectural Components of HybridGNN
- Graph Attention Networks v2 (GATv2): The Graph Attention Network v2 (GATv2) is an enhanced version of the original GAT model, designed to overcome the limited expressiveness of GAT's attention computation, in which the ranking of attention coefficients is effectively static across query nodes. GATv2 introduces a more flexible attention mechanism that can rank and weight neighbors differently for each node, enabling it to more effectively capture complex relationships between nodes, even in noisy or highly intricate graphs, and thereby improving overall task performance. The core operation of GATv2 is represented as follows:

$$e(\mathbf{h}_i, \mathbf{h}_j) = \mathbf{a}^{\top}\,\mathrm{LeakyReLU}\!\left(\mathbf{W}\left[\mathbf{h}_i \,\|\, \mathbf{h}_j\right]\right),$$

where $\mathbf{h}_i$ and $\mathbf{h}_j$ represent the feature vectors of nodes $i$ and $j$, $\mathbf{W}$ is a learnable weight matrix, and $\mathbf{a}$ parameterizes the attention mechanism; the raw scores $e(\mathbf{h}_i, \mathbf{h}_j)$ are normalized over each neighborhood with a softmax to obtain the attention coefficients. Unlike the original GAT, which applies the attention vector before the non-linearity, GATv2 applies $\mathbf{a}$ after the LeakyReLU, significantly increasing its expressiveness and enabling the model to better handle complex and non-linear interactions in graph structures. This improvement is crucial for tasks that require a nuanced understanding of node relationships, especially in noisy or complex graph environments.
- SAGEConv Layers: By aggregating information from neighboring nodes, the SAGEConv layers generate more comprehensive node representations, facilitating a deeper understanding of both local and global graph structures [4]. This aggregation process enables the model to generalize effectively across diverse graph topologies.
- Graph Isomorphism Networks (GIN): The Graph Isomorphism Network (GIN) is a highly expressive GNN architecture specifically designed to capture detailed structural information from graphs by mimicking the Weisfeiler-Lehman (WL) graph isomorphism test [21]. GIN aggregates node features from neighboring nodes with a sum weighted by a learnable scalar, making it particularly effective for tasks requiring a fine-grained understanding of graph structure. This high level of discriminative power makes GIN especially suitable for graph matching tasks and other applications where subtle structural differences in the graph are critical. The GIN aggregation rule is defined as:

$$\mathbf{h}_v^{(k)} = \mathrm{MLP}^{(k)}\!\left(\left(1 + \epsilon^{(k)}\right)\mathbf{h}_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} \mathbf{h}_u^{(k-1)}\right),$$

where $\mathbf{h}_v^{(k)}$ represents the feature of node $v$ at the $k$-th layer, $\epsilon^{(k)}$ is a learnable parameter, and $\mathcal{N}(v)$ denotes the set of neighboring nodes of $v$. The Multi-Layer Perceptron (MLP) in GIN plays a crucial role in enhancing the model's expressiveness, enabling it to capture complex and intricate graph structures. The architecture's ability to differentiate between non-isomorphic graphs gives it a significant advantage in learning highly discriminative graph representations.
- Focal Loss: Focal Loss is a modified version of the standard cross-entropy loss function, specifically designed to address the challenges posed by class imbalance in classification tasks. In scenarios where certain classes dominate the dataset, models often become biased toward these majority classes, resulting in poor performance on underrepresented minority classes. Focal Loss mitigates this issue by reducing the relative loss contribution from easily classified examples and placing more emphasis on hard-to-classify instances. The Focal Loss function is mathematically defined as:

$$\mathrm{FL}(p_t) = -\alpha_t \left(1 - p_t\right)^{\gamma} \log\left(p_t\right),$$

where $p_t$ represents the predicted probability for the target class, $\alpha_t$ is a weighting factor for class $t$, and $\gamma$ is a focusing parameter that controls the down-weighting rate of easy examples. Increasing $\gamma$ places greater focus on difficult-to-classify examples, effectively forcing the model to pay more attention to minority or misclassified data points. This mechanism enhances the model's ability to perform well on imbalanced datasets, addressing a common limitation of traditional loss functions; a minimal implementation sketch is given after the component list below.
- Jumping Knowledge (JK): The JK mechanism integrates representations from multiple layers, allowing the model to retain and utilize information at various depths of the network [23]. This ensures that the model captures both shallow and deep patterns, enhancing its ability to handle graphs with multi-level complexities. A sketch combining all of these components follows this list.
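As a concrete illustration, the following is a minimal PyTorch Geometric sketch of how the five components above can be wired together. The class name `HybridGNNSketch`, the hidden width, head count, activation choices, and the use of a single SAGEConv layer are illustrative assumptions, not the exact configuration of HybridGNN (the full implementation appears in Appendix A).

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv, SAGEConv, GINConv, JumpingKnowledge

class HybridGNNSketch(torch.nn.Module):
    """Illustrative stack: GATv2 -> GATv2 (+skip) -> SAGE -> GIN -> JK."""
    def __init__(self, in_dim, hidden, num_classes, heads=4):
        super().__init__()
        # GATv2: dynamic attention, a^T LeakyReLU(W [h_i || h_j]) per edge
        self.gat1 = GATv2Conv(in_dim, hidden // heads, heads=heads)
        self.gat2 = GATv2Conv(hidden, hidden // heads, heads=heads)
        # SAGEConv: aggregates neighbor features, then a linear transform
        self.sage = SAGEConv(hidden, hidden)
        # GIN: (1 + eps) * h_v + sum over neighbors, followed by an MLP
        mlp = torch.nn.Sequential(
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden))
        self.gin = GINConv(mlp, train_eps=True)
        # Jumping Knowledge: concatenates representations from all depths
        self.jk = JumpingKnowledge(mode="cat")
        self.out = torch.nn.Linear(4 * hidden, num_classes)

    def forward(self, x, edge_index):
        h1 = F.elu(self.gat1(x, edge_index))
        h2 = F.elu(self.gat2(h1, edge_index)) + h1  # skip connection
        h3 = F.relu(self.sage(h2, edge_index))
        h4 = self.gin(h3, edge_index)
        h = self.jk([h1, h2, h3, h4])               # retain all depths
        return self.out(h)
```

The skip connection after the second GATv2 layer and the concatenating JK head mirror the design points described above; `hidden` must be divisible by `heads` for the dimensions to line up.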

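A minimal sketch of the Focal Loss described above, implementing $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$ directly; for brevity a scalar $\alpha$ stands in for the per-class $\alpha_t$ (a per-class tensor would be indexed by `targets`), and the default values are illustrative rather than the paper's tuned settings.

```python
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # log p_t: log-probability the model assigns to each true class
    log_pt = F.log_softmax(logits, dim=-1) \
              .gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # (1 - p_t)^gamma shrinks the loss of easy, high-confidence examples
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()
```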
5. Self-Supervised Learning for Graph-Based Tasks
5.1. Pseudo-Labeling Strategies for Bipartite Graphs
- Node Degree-Based Labeling: Nodes within each vertex set of the bipartite graph are first classified based on their degree. Nodes exhibiting similar degree values are grouped under the same pseudo-label, thereby capturing the local connectivity patterns [26].
- Neighborhood Disparity Adjustment: For each node, the feature vector similarity with its neighboring nodes is calculated. Nodes that exhibit significant disparity between their feature vectors and those of their neighbors are reclassified and assigned new pseudo-labels, reflecting their distinct roles in the graph [27].
- Maximum Matching Participation: Nodes that participate in the maximum matching of the bipartite graph are assigned unique pseudo-labels to highlight their central importance in maintaining the overall structure of the graph [18]. This ensures that critical nodes are treated differently, aligning the pseudo-labeling process with the structural properties of maximum matchings. A sketch applying all three rules follows this list.
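The following sketch applies the three rules in sequence on a NetworkX bipartite graph. The logarithmic degree binning, the disparity threshold, and the `features` dictionary (node to NumPy vector) are illustrative assumptions, not the exact procedure used in the paper.

```python
import networkx as nx
import numpy as np

def generate_pseudo_labels(G, top_nodes, features, disparity_thresh=0.5):
    # 1. Node degree-based labeling: bucket nodes with similar degrees
    labels = {v: int(np.log2(G.degree(v) + 1)) for v in G}
    next_label = max(labels.values()) + 1
    # 2. Neighborhood disparity adjustment: relabel nodes whose feature
    #    vector differs sharply from the mean of their neighbors'
    for v in G:
        nbrs = list(G.neighbors(v))
        if not nbrs:
            continue
        mean_nbr = np.mean([features[u] for u in nbrs], axis=0)
        if np.linalg.norm(features[v] - mean_nbr) > disparity_thresh:
            labels[v] = next_label
    # 3. Maximum matching participation: nodes covered by a maximum
    #    matching (Hopcroft-Karp) receive a dedicated pseudo-label
    matching = nx.bipartite.hopcroft_karp_matching(G, top_nodes)
    for v in matching:
        labels[v] = next_label + 1
    return labels
```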

5.2. Supervised Training of HybridGNN Using Pseudo-Labels


6. L2 Regularization and Early Stopping Mechanisms
6.1. L2 Regularization for Overfitting Prevention
6.2. F1-Score-Driven Early Stopping Mechanism
7. Dynamic Pseudo-Label Updating
- Initial Pseudo-Label Generation: Use a clustering method (e.g., K-means) to assign pseudo-labels to the nodes based on their embeddings.
- Confidence Score Calculation: At each epoch, compute the confidence of model predictions using softmax probabilities.
- Update Rule: For nodes where the confidence exceeds a predefined threshold (e.g., 70%), update their pseudo-labels to the predicted class.
- Iterative Refinement: This process is repeated over multiple epochs, ensuring that the pseudo-labels become more accurate as the model improves (a minimal sketch of the loop follows).
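A minimal sketch of this procedure, assuming node embeddings as a NumPy array and per-node logits from the model; the 70% threshold matches the example above, while the K-means settings and helper names are illustrative.

```python
import torch
from sklearn.cluster import KMeans

def init_pseudo_labels(embeddings, num_classes):
    # Step 1: seed pseudo-labels by clustering the node embeddings
    km = KMeans(n_clusters=num_classes, n_init=10, random_state=0)
    return torch.as_tensor(km.fit_predict(embeddings), dtype=torch.long)

def update_pseudo_labels(logits, pseudo_labels, threshold=0.7):
    # Steps 2-3: softmax confidence, then the thresholded update rule
    probs = torch.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    updated = pseudo_labels.clone()
    mask = conf > threshold          # only confident predictions update
    updated[mask] = pred[mask]
    return updated
```

Calling `update_pseudo_labels` once per epoch realizes the iterative refinement of the final step.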
8. Experimental Setup
8.1. Dataset Overview: Email Communication Network
8.2. DropEdge Data Augmentation

8.3. Optimization and Training Procedures for HybridGNN


9. Experimental Results and Analysis
9.1. Performance Comparison with Classical Algorithms
9.2. Metrics for Assessing HybridGNN Performance
- Accuracy: Accuracy was used to measure the proportion of edges correctly predicted as part of the maximum matching. This metric provides a straightforward indication of how well the model identifies true matches within the graph [28]. Figure 2 shows the accuracy of the HybridGNN model over 100 epochs: it improves steadily and eventually stabilizes near a high value, while the dashed orange line marks the 0.7 baseline, highlighting that the model performs well above it.
- F1-Score: The F1-score, the harmonic mean of precision and recall, was used to balance the trade-off between these two measures, especially in cases where the class distribution is imbalanced. It is particularly useful for evaluating the model's performance in binary classification tasks related to edge prediction [37]. A sketch of both metrics follows this list.
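As a minimal sketch, both metrics can be computed with scikit-learn from binary per-edge predictions; the helper name and the label encoding (1 = edge in the maximum matching) are illustrative assumptions.

```python
from sklearn.metrics import accuracy_score, f1_score

def matching_metrics(y_true, y_pred):
    # y_true / y_pred: binary labels per edge (1 = edge in the matching)
    return {
        "accuracy": accuracy_score(y_true, y_pred),  # fraction of edges correct
        "f1": f1_score(y_true, y_pred),  # harmonic mean of precision and recall
    }
```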
10. Time Complexity Analysis of the HybridGNN Model
10.1. Model Architecture Overview
- Input Layer: Node features of dimension $F$.
- First Two Layers: Two GATv2Conv layers with skip connections.
- Intermediate Layers: $L_{\mathrm{SAGE}}$ SAGEConv layers.
- Final Layer: One GINConv layer.
- Output Layer: A fully connected layer mapping to C output classes.
10.2. Time Complexity of Individual Layers
10.2.1. GATv2Conv Layers
- Attention Coefficients: The GATv2Conv layer computes attention coefficients for each edge in the graph. Since attention is computed for each of the $E$ edges and involves feature vectors of size $H$, the complexity of this step is $O(E \cdot H)$.
- Self-Attention and Linear Transformations: In addition to edge-level attention, the GATv2Conv layer applies self-attention and linear transformations to the features of each node. This step requires $O(H^2)$ operations per node, and for $N$ nodes, the total complexity is $O(N \cdot H^2)$.
- Attention Heads: GATv2Conv uses $K$ attention heads, each operating on a reduced feature dimension $H/K$. However, since the total feature dimensionality remains $H$, the complexity remains proportional to the full dimensionality. Therefore, the total complexity per layer is $O(E \cdot H + N \cdot H^2)$.
10.2.2. SAGEConv Layers
- Neighbor Aggregation: For each node, the features of its neighboring nodes are aggregated. Since there are $E$ edges in the graph and each edge involves an aggregation of node features of size $H$, the time complexity for this step is $O(E \cdot H)$.
- Linear Transformation: After aggregation, a linear transformation is applied to the node features. For $N$ nodes, each having a hidden dimension of size $H$, the total time complexity for this step is $O(N \cdot H^2)$.
10.2.3. GINConv Layer
- Neighbor Aggregation: For each node, features from its neighboring nodes are aggregated. Given that there are $E$ edges, and each edge contributes to an aggregation involving features of size $H$, the time complexity for this operation is $O(E \cdot H)$.
- Application of MLP: After aggregation, the MLP of depth $d_{\mathrm{MLP}}$ is applied to the features of the $N$ nodes. The time complexity for this operation is $O(d_{\mathrm{MLP}} \cdot N \cdot H^2)$.
10.3. Overall Time Complexity
- Two GATv2Conv layers, each with complexity $O(E \cdot H + N \cdot H^2)$
- $L_{\mathrm{SAGE}}$ SAGEConv layers, each with complexity $O(E \cdot H + N \cdot H^2)$
- One GINConv layer, with complexity $O(E \cdot H + d_{\mathrm{MLP}} \cdot N \cdot H^2)$
- One fully connected layer, with complexity $O(N \cdot H \cdot C)$; the terms are combined below
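Summing the per-layer costs above gives the overall forward-pass complexity (a consolidation using the symbols reconstructed in the preceding subsections, and treating $d_{\mathrm{MLP}}$ and $C$ as constants bounded by $H$):

$$O\!\Big(2\,(E H + N H^2) + L_{\mathrm{SAGE}}\,(E H + N H^2) + E H + d_{\mathrm{MLP}}\,N H^2 + N H C\Big) = O\big(L\,(E \cdot H + N \cdot H^2)\big),$$

where $L = L_{\mathrm{SAGE}} + 3$ is the total number of message-passing layers: the cost is linear in the number of edges $E$ and quadratic in the hidden width $H$.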
10.4. Additional Computational Considerations
10.4.1. Residual Connections
10.4.2. Batch Normalization and Activation Functions
10.4.3. DropEdge Data Augmentation
10.4.4. Pseudo-Label Updating
10.5. Backward Pass and Parameter Updates
10.6. Total Time Complexity per Epoch
10.7. Scalability Considerations
- Sampling Methods: Techniques like GraphSAGE’s neighbor sampling can reduce the number of edges processed per batch.
- Sparse Matrix Operations: Leveraging efficient sparse matrix computations can improve computational efficiency.
- Mini-Batch Training: Processing subsets of the graph can reduce memory requirements and computation per iteration. A sketch combining these techniques follows.
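For illustration, neighbor sampling and mini-batching come together in PyTorch Geometric's `NeighborLoader`, sketched below (sparse message passing is handled internally by the library); the fan-out and batch size are illustrative choices.

```python
from torch_geometric.loader import NeighborLoader

def make_loader(data, fanout=(10, 5), batch_size=512):
    # Sample at most 10 first-hop and 5 second-hop neighbors per seed
    # node, so each mini-batch touches a bounded subgraph rather than
    # all E edges of the full graph.
    return NeighborLoader(data, num_neighbors=list(fanout),
                          batch_size=batch_size, shuffle=True)
```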
10.8. Conclusion
11. Discussion
11.1. Analysis of Results
11.2. Challenges and Limitations of HybridGNN
11.3. Directions for Future Research
12. Concluding Remarks
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Full Code Implementation






References
- Edmonds, J. Paths, trees, and flowers. Canadian Journal of Mathematics 1965, 17, 449–467.
- Hopcroft, J.E.; Karp, R.M. An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing 1973, 2, 225–231.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv preprint arXiv:1710.10903 2017.
- Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.
- Wang, M.; Lin, Y.; Wang, S. The nature diagnosability of bubble-sort star graph networks under the PMC model and MM* model. Int. J. Eng. Appl. Sci 2017, 4, 55–60.
- Wang, S.; Wang, Y.; Wang, M. Connectivity and matching preclusion for leaf-sort graphs. Journal of Interconnection Networks 2019, 19, 1940007.
- Lin, Y.; Miller, M.; Simanjuntak, R. Edge-magic total labelings of wheels, fans and friendship graphs. Bulletin of the ICA 2002, 35, 89–98.
- Baca, M.; Lin, Y.; Miller, M.; Simanjuntak, R. New constructions of magic and antimagic graph labelings. Utilitas Mathematica 2001, 60, 229–239.
- Lin, Y. Face antimagic labelings of plane graphs $P_a^b$. Ars Combinatoria 2006, 80, 259–273.
- Alghamdi, J.; Luo, S.; Lin, Y. A comprehensive survey on machine learning approaches for fake news detection. Multimedia Tools and Applications 2024, 83, 51009–51067.
- Javed, M.; Lin, Y. iMER: Iterative process of entity relationship and business process model extraction from the requirements. Information and Software Technology 2021, 135, 106558.
- Satake, S.; Gu, Y.; Sakurai, K. Explicit Non-malleable Codes from Bipartite Graphs. International Workshop on the Arithmetic of Finite Fields. Springer, 2022, pp. 221–236.
- Wang, M.; Wang, S. Connectivity and diagnosability of center k-ary n-cubes. Discrete Applied Mathematics 2021, 294, 98–107.
- Wang, M.; Lin, Y.; Wang, S.; Wang, M. Sufficient conditions for graphs to be maximally 4-restricted edge connected. Australas. J Comb. 2018, 70, 123–136.
- Bondy, J.A.; Murty, U.S.R. Graph theory with applications; Vol. 290, Macmillan: London, 1976.
- West, D.B. Introduction to graph theory; Prentice Hall, 2001.
- Diestel, R. Graph theory; Springer, 2005.
- Lovász, L.; Plummer, M. Matching theory; American Mathematical Soc., 2009.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 2016.
- McCreesh, C.; Prosser, P. A guide to scalability experiments in combinatorial computing. ACM Journal on Experimental Algorithmics (JEA) 2020, 25, 1–32.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 2018.
- Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; others. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 2018.
- Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.i.; Jegelka, S. Representation learning on graphs with jumping knowledge networks. International Conference on Machine Learning, 2018, pp. 5453–5462.
- Jing, L.; Tian, Y. Self-supervised learning: A survey. arXiv preprint arXiv:2006.08218 2020.
- Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.; Leskovec, J. Strategies for pre-training graph neural networks. International Conference on Learning Representations, 2020.
- Zhao, J.; Zhou, H.; Liu, W. Degree-based classification in bipartite networks. Physical Review E 2006, 74, 056109.
- Jiang, S.; Wang, L.; Liu, X. Neighborhood feature disparity for node classification in graphs. IEEE Transactions on Knowledge and Data Engineering 2022.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep learning; MIT Press, 2016.
- Li, Q.; Ji, S.; Zhang, Y. Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering 2021.
- Ng, A.Y. Feature selection, L1 vs. L2 regularization, and rotational invariance. Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 78.
- Prechelt, L. Early stopping-but when? Neural Networks: Tricks of the Trade. Springer, 1998, pp. 55–69.
- Leskovec, J.; Sosic, R. SNAP: A general-purpose network analysis and graph mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 2014, 8, 1.
- Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory 1982, 28, 129–137.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
- Paszke, A.; Gross, S.; Chintala, S.; others. Automatic differentiation in PyTorch. Advances in Neural Information Processing Systems 2017.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful Are Graph Neural Networks? International Conference on Learning Representations, 2019.
- Sasaki, Y. The truth of the F-measure. International Workshop on Learning from Imbalanced Data Sets, 2007.
- Zhuang, C.; Zare, A.; Yu, Y. Local augmentation for graph neural networks. NeurIPS Graph Representation Learning Workshop, 2019.
- Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 1997, 30, 1145–1159.
- Ying, Z.; You, J.; Morris, C.; Ren, X.; Hamilton, W.L.; Leskovec, J. Hierarchical graph representation learning with differentiable pooling. Advances in Neural Information Processing Systems, 2018, pp. 4800–4810.
- Lee, J.; Lee, I.; Kang, J. Self-attention graph pooling. International Conference on Machine Learning, 2019, pp. 3734–3743.
- Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M.A. Representation learning for dynamic graphs: A survey. Journal of Machine Learning Research 2020, 21, 1–73.
- Qiu, J.; Chen, Q.; Dong, Y.; Zhang, J.; Yang, H.; Ding, M.; Wang, K.; Tang, J. Gcc: Graph contrastive coding for graph neural network pre-training. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1150–1160.
- Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. European Semantic Web Conference, 2018, pp. 593–607.
- Baldassarre, F.; Azizpour, H. Explainability techniques for graph convolutional networks. arXiv preprint arXiv:1905.13686 2019.


Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).