The proposed approach begins by formulating a novel block formation technique centered on the influencers in a heterogeneous network. The first task is to formulate a metric that identifies the prominent nodes in the multilayer network. Incoming links from active layers are given more weight, so the metric incorporates both the structural properties of the network and the importance of the node in each layer (node degree). An iterative process calculates the score of each node by propagating influence across layers until convergence is achieved. Blocks are then built around the identified influential nodes to preserve the interlayer dependencies. These blocks are used for further analysis involving link prediction and information diffusion.
3.1. Identify the Influencers in the Directed Heterogeneous Networks
In complex networks, nodes can be divided into two types: influencers and non-influencers. Influencers typically have more connections than non-influencers. In a directed network, an influencer is a central node with a high number of incoming edges. These nodes play a crucial role in shaping the dynamics of the complex network. Because influence is measured through incoming edges, the identification of influencers depends on the directions of the edges. Formally, an influencer is defined as follows:
Definition 1. In a multilayer network with L layers, an influencer is a node that exhibits the maximum centrality value among all nodes of its layer.
The presence or absence of influencers in the network can provide valuable information about the likelihood of link formation. For example, if two nodes have a common influencer, they may be more likely to form a link. Influencers are located in densely connected blocks and exhibit cohesive internal structures. Influencers act as bridges between different communities or clusters in the network, facilitating link formation between nodes that would otherwise be disconnected.
Influencers affect link prediction by driving the growth of the network. They attract new nodes to the network, leading to the formation of new links. In social networks that exhibit a follower-followee relationship, nodes tend to establish links with highly influential nodes [43]. In such networks, the node with more incoming edges represents an influencer, as it corresponds to an entity with more followers, such as a person, product, or web page.
The identification process considers the node-to-node connections [44] and the weights assigned to the different layers in the network. To proceed with the node-rank computation, the layer weights of the incoming edges originating from different layers are considered. The weight of a layer increases as its number of active nodes increases.
Definition 2. An active node v belonging to a layer is a node that has a high number of incoming edges from the other layers.
A layer that contains more active nodes becomes an active layer.
Definition 3. An active layer is a layer in L that has a high number of incoming edges from the other layers.
More weight is given to the active layers. The active nodes, along with the active layers, contribute to information flow and interactions within the network at a given point in time and are therefore of interest for rank computation. Initially, all layers are assumed to have equal weight, unless the ground truth of the dataset indicates otherwise. The iterative computation of the node rank proceeds as follows:
- (1) Compute the rank of all nodes by considering only the incoming edges from the same layer. The rank of a node v is the ratio of the incoming edges of v to the total number of edges. Initialize the rank of all nodes to 1/N, where N is the total number of nodes in the network.
- (2) Next, consider the layer weight, computed as the cumulative weight of the nodes in an individual layer. Initialize the weight of all layers to 1. The weight of a layer increases as the number of active nodes in the layer increases.
  - (a) The computed rank is used to re-compute the layer weights. The layer-weighted rank of node v is computed as the product of the layer weight (the weight of the layer containing v) and the rank of v in that layer.
- (3) To retain the interplay in a multilayer network, consider the inter-layer adjacency matrix for all nodes in V. The inter-layer adjacency matrix captures the interactions between nodes across different layers of the network. It quantifies how nodes in one layer affect nodes in another layer, thereby increasing the accuracy of the rank computation. The inter-layer adjacency matrix is computed as:
- (4) The iterative computation needs to converge, and the damping factor is used for this purpose. The damping factor d is the probability that a vertex will randomly follow another vertex. The value conventionally used for the damping factor is 0.85.
- (5) Thus, the rank of all nodes in the multilayer network is computed iteratively by combining the initial rank of all vertices with the product of the layer weight and the rank of every vertex, normalized by the damping factor, as:
- (6) The layer weight is updated with the update to the rank score as:
The equation computes the updated layer weight for a layer l by normalizing the sum of its PageRank scores. The updated layer weight is calculated by taking the weighted average of the PageRank scores of the nodes in layer l, where the weights are the relative rank scores of each node in that layer.
This normalization ensures that the layer weights collectively form a valid probability distribution, reflecting the influence or importance of nodes within the layer l. The goal is to maintain the probability interpretation of rank, where the sum of probabilities within a layer equals 1. This iterative process continues over all layers and nodes until the rank values converge. Nodes with higher rank scores are considered more influential in the network, and the top influencers are selected based on a user-specified threshold. For convenience, the proposed approach to influencer identification is referred to as m-PR from here on. As a case study, the algorithm is applied to the Florentine Family Marriage and Business Ties Data [45], a multilayer network formed with 16 nodes, as shown in Figure 3. Algorithm 1 illustrates the computation of the node rank in a multilayer network. The node ranks are computed for each node. Compared with the ground truth (Ver-PR), the ranking has improved, and the following observations were made:
Checking the convergence of the rank values involves comparing the new PageRank scores with the previous ones, which account for connections within the same layer and connections from different layers. Thus, the update phase visits both the intra-layer and the inter-layer connections, and the convergence check adds a pass over the node scores.
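Algorithm 1 Influencers’ Identification in Multilayer Network
Input: multilayer network and rank parameters; Output: set of influencers I
Initialize the rank of every node to 1/N and the weight of every layer to 1
converged ← false
while not converged do
    converged ← true
    for each layer do
        update the layer weight from the ranks of its nodes
    end for
    for each node v do
        for each node u with an edge into v do
            accumulate the layer-weighted rank contribution of u
        end for
        update the rank of v using the damping factor d
    end for
    for each node v do
        if the rank of v changed by more than the tolerance then
            converged ← false
        end if
    end for
end while
I ← nodes whose rank meets the user-specified threshold
return I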
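The iterative rank computation can be illustrated with a minimal Python sketch. It is not the exact m-PR update, whose equations are not reproduced here. It assumes a PageRank-style rule in which each node passes its rank, scaled by the weight of its own layer, along its outgoing edges, and in which the layer weight is the share of total rank held by the layer. The function names `multilayer_rank` and `top_influencers`, the tolerance `tol`, and the toy network are all hypothetical.

```python
from collections import defaultdict


def multilayer_rank(edges, layer_of, d=0.85, tol=1e-9, max_iter=500):
    """Layer-weighted PageRank-style ranking (an illustrative assumption).

    edges    : iterable of directed (source, target) pairs, within or across layers
    layer_of : dict mapping every node to its layer label
    """
    nodes = sorted(layer_of)
    n = len(nodes)
    out_links = defaultdict(list)
    for u, v in edges:
        out_links[u].append(v)

    rank = {v: 1.0 / n for v in nodes}                     # initial rank 1/N
    weight = {l: 1.0 for l in set(layer_of.values())}      # initial layer weight 1

    for _ in range(max_iter):
        # Each node shares its layer-weighted rank equally among its targets.
        incoming = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = weight[layer_of[u]] * rank[u] / len(targets)
                for v in targets:
                    incoming[v] += share

        # Damped update, then renormalize so the ranks sum to 1.
        new_rank = {v: (1.0 - d) / n + d * incoming[v] for v in nodes}
        total = sum(new_rank.values())
        new_rank = {v: r / total for v, r in new_rank.items()}

        # Layer weight = share of the total rank held by the layer's nodes.
        new_weight = dict.fromkeys(weight, 0.0)
        for v in nodes:
            new_weight[layer_of[v]] += new_rank[v]

        converged = all(abs(new_rank[v] - rank[v]) < tol for v in nodes)
        rank, weight = new_rank, new_weight
        if converged:
            break
    return rank, weight


def top_influencers(rank, k):
    """Return the k highest-ranked nodes."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical two-layer toy network.
layer_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
edges = [("a2", "a1"), ("a3", "a1"), ("b1", "a1"), ("b2", "a1"), ("a1", "b1")]
rank, weight = multilayer_rank(edges, layer_of)
```

The loop is capped by `max_iter` because this simplified update is not guaranteed to converge on every graph. The renormalization step keeps the ranks and the layer weights interpretable as probability distributions.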
Figure 2. Layers of Florentine family business (left) and family marriage relations (right).
Figure 3. Tabulation of ranks of nodes using Ver-PR and m-PR.
3.2. Formation of Block around the Influencer
Once the influencers are identified, the next goal is to build blocks around each of them. The nodes within the blocks are determined by their affinity for the influencers. The affinity is computed based on the strongest properties that the nodes share with the influencers. A node may thus become part of one or more blocks, resulting in overlapping blocks. Nodes with no affinity for the influencer may not be part of any block. This means that these nodes may not contribute to link prediction at this time. Since the influencers were identified across the layers, the distinguishing aspect of our block formation technique centred around the influencer captures and preserves the interplay between the layers.
Two properties determine this affinity:
- How strongly the node is connected to its neighbors in the network, which uncovers the local importance of the node.
- How similar the nodes are in terms of attributes and interactions, which uncovers the structure of the network.
These blocks offer a higher level of abstraction than individual nodes, as they group together nodes with similar connectivity patterns. By considering these blocks alongside individual nodes, link prediction algorithms can better capture the underlying structure of the network and achieve more accurate predictions.
Let $v$ be a neighbor of an influencer $I$. We want to know whether $v$ should be added to the block around $I$. The more strongly $v$ is connected to its neighbors, the stronger its local importance. This local importance is defined in terms of the connections that $v$ has with its neighbors. Let neigh($v$) denote the set of neighbors $\{u_1, u_2, \ldots\}$ of $v$. The strongest connection exists if $\{u_1, u_2, \ldots\} \cup \{v\}$ is maximally connected, that is, if the nodes form a complete subgraph containing every possible edge between them. However, the actual number of connections within $\{u_1, u_2, \ldots\} \cup \{v\}$ need not reach this maximum. The ratio of the actual number of connections to the maximum number of connections provides a measure of the local importance of $v$. Let $e_v$ denote the actual number of connections that $\{u_1, u_2, \ldots\}$ share among themselves, and let $k_v$ be the degree of $v$. The ratio is then the clustering coefficient, which in its standard form is given by the formula
$$CC(v) = \frac{2\, e_v}{k_v\,(k_v - 1)}$$
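As a worked illustration of this ratio, the following minimal Python sketch computes the local clustering coefficient in its standard undirected form, $2e/(k(k-1))$, where $e$ is the number of links among the neighbors of a node and $k$ is its degree. The adjacency-dictionary representation and the function name `clustering_coefficient` are assumptions made for the example.

```python
def clustering_coefficient(adj, v):
    """Local clustering coefficient of v in an undirected graph.

    adj maps each node to the set of its neighbors (assumed symmetric).
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no neighbor pair can be linked
    # Count each linked pair of neighbors exactly once.
    links = sum(1 for u in neighbors for w in neighbors if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(adj, "a")  # one linked pair (b, c) out of three
cc_b = clustering_coefficient(adj, "b")  # neighbors a and c are linked
```

Here node `a` has three neighbors with a single link among them, so its coefficient is one third, while node `b` sits in a closed triangle and attains the maximum value of 1.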
The algorithm for computing the correlation is elaborated in Algorithm 2.
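Algorithm 2 Compute Correlation
Input: node v and its degree, centrality, and clustering inputs; Output: correlation for each neighbor of v
count the number of edges within the neighborhood of v
compute the clustering coefficient, degree term, and centrality term of v
for each neighbor u of v do
    count the number of edges within the neighborhood of u
    compute the clustering coefficient, degree term, and centrality term of u
    compute the correlation of u
end for
return the correlations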
To select similar nodes based on attributes, the node centrality, degree, and clustering coefficient are considered. Nodes with high centrality act as bridges, connecting different parts of the network, and serve as hubs with many connections. This structural importance helps identify the similarity with the influencer. The degree quantifies the number of edges incident to a node, indicating its local prominence within the network. Thus, the attribute-based similarity (correlation) is a cumulative measure of the degree of the node, its centrality, and the clustering coefficients, together with two normalizing factors. The correlation is computed as:
Thus, a block is defined as follows:
Definition 4. In a multilayer network, a block B is a set of nodes built around an influencer I, such that each node has a high clustering coefficient with I and shares similar connectivity patterns within the network.
Thus, block formation is summarized as a three-step process:
- (1) Pick an influencer, I.
- (2) Among all the neighboring nodes of I, determine a node v that exhibits the strongest local property.
- (3) For all neighbors of v, compute the correlation with I.
The formation of blocks surrounding the influential nodes captures their influence on the overall network structure and modularity. Each block is distinguished by a unique property, so the nodes within a block exhibit more similar properties. Such blocks effectively capture the directed relationships. Since a block consists of nodes distributed across the layers, the interlayer dependency is preserved by considering the layer information [46]. The blocks capture both the global and the local information relevant to link prediction and information diffusion. Link prediction between the blocks therefore represents the prediction of future links between nodes that exhibit dissimilar characteristics. The information spread in the blocks evaluates how the spread of rumors can be regulated within blocks. The blocks, centered on influencers, also account for the information flow from influential nodes to their neighbors, which is used for both link prediction and information diffusion. This process is elaborated in Algorithm 3.
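Algorithm 3 Block Formation around Influencers
Input: multilayer network, I (set of influencers), desired number of blocks N; Output: block set B
Initialize the block set B
Initialize a block
for each influencer in I do
    start a block containing the influencer
    reset the strongest local property found so far
    for each neighbor v of the influencer do
        if the local property of v is the strongest found so far then
            record v as the strongest neighbor and add it to the block
        end if
    end for
    for each neighbor u of the strongest neighbor do
        if the correlation of u with the influencer is high then
            add u to the block
        end if
    end for
end for
collect the blocks in B
return B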
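The three-step block formation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Algorithm 3 itself. The local property is taken to be the standard undirected clustering coefficient. The paper's correlation measure is replaced by a stand-in, the Jaccard similarity of neighborhoods. The `threshold` parameter, the function names, and the toy graph are hypothetical.

```python
def clustering(adj, v):
    """Standard local clustering coefficient on an undirected adjacency dict."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbors for w in neighbors if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))


def jaccard(adj, a, b):
    """Stand-in similarity (assumption): overlap of the two neighborhoods."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def form_block(adj, influencer, threshold):
    """Build one block around an influencer following the three steps."""
    block = {influencer}                      # step 1: pick the influencer
    neighbors = sorted(adj[influencer])
    if not neighbors:
        return block
    # Step 2: neighbor with the strongest local property (ties: first by name).
    anchor = max(neighbors, key=lambda v: clustering(adj, v))
    block.add(anchor)
    # Step 3: admit neighbors of the anchor that are similar to the influencer.
    for u in adj[anchor]:
        if u != influencer and jaccard(adj, u, influencer) >= threshold:
            block.add(u)
    return block


# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
loose_block = form_block(adj, "a", threshold=0.2)
strict_block = form_block(adj, "a", threshold=0.3)
```

Running `form_block` once per influencer yields blocks that may overlap, since a node can be similar to more than one influencer. Nodes that never pass the threshold remain outside every block.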
3.3. Link Prediction between the Blocks
The future links between two blocks depend on the strength of the interconnections between the nodes of one block and the nodes of the other block. The strength is highest when every node of one block connects to every node of the other block. Since the underlying network is directed, the strength of the interconnection has both magnitude and direction.
Once the blocks are formed, we use the binomial distribution for likelihood estimation in link prediction. This choice rests on the assumption that the presence or absence of a link between two nodes is a binary outcome (link or no link) and that these outcomes are independent across different pairs of nodes. The Probability Mass Function (PMF) therefore gives the probability of observing a given number of links between two blocks:
$$P(X = k_{ij}) = \binom{n_{ij}}{k_{ij}}\, p_{ij}^{\,k_{ij}}\, (1 - p_{ij})^{\,n_{ij} - k_{ij}}$$
where:
- $X$ is the number of observed links between nodes in different blocks,
- $n_{ij}$ is the total number of possible links between nodes in block i and block j,
- $k_{ij}$ is the number of observed links,
- $p_{ij}$ is the probability of a link between nodes in block i and block j, calculated based on the likelihood.
Next, the maximum likelihood is computed for every pair of blocks using the probability mass function (PMF). The resulting combined likelihood depends on the observed and total possible connections, the strength of the connections, and the probability mass function. Algorithm 4 elaborates the link prediction process using maximum likelihood estimation.
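Algorithm 4 Link prediction between the blocks
Input: blocks; observed links; threshold; Output: probability values
for each pair of blocks do
    compute the likelihood of links between the pair using the PMF
end for
if the likelihood meets the threshold then
    return the probability values
end if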
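The binomial likelihood between two blocks can be illustrated with a short Python sketch. It assumes that the link probability is estimated by its maximum-likelihood value, the ratio of observed to possible directed links, and that the possible links are all ordered pairs from one block to the other. The function names and the toy data are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def block_link_likelihood(block_i, block_j, edges):
    """Directed link statistics from block_i to block_j.

    edges is a set of directed (source, target) pairs.
    Returns (observed links k, possible links n, MLE p = k/n, PMF at k).
    """
    pairs = [(u, v) for u in block_i for v in block_j if u != v]
    n = len(pairs)
    k = sum(1 for pair in pairs if pair in edges)
    p_hat = k / n if n else 0.0           # maximum-likelihood estimate of p
    return k, n, p_hat, binomial_pmf(k, n, p_hat)


# Hypothetical toy data: two blocks and three directed links.
block_1 = {"a", "b"}
block_2 = {"c", "d"}
edges = {("a", "c"), ("b", "c"), ("c", "a")}

forward = block_link_likelihood(block_1, block_2, edges)   # links 1 -> 2
backward = block_link_likelihood(block_2, block_1, edges)  # links 2 -> 1
```

Evaluating the pair in both orders reflects the point that the strength of the interconnection in a directed network has a direction as well as a magnitude: the two calls count different links and therefore return different estimates.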
Section 5.1 presents experimental evidence for the block formation and the link prediction process between the blocks on three different datasets.
3.4. Information Dissemination
For the spread of information across the network, the Linear Threshold Model (LTM) is considered as the underlying diffusion model. In LTM, each individual has a specific threshold, that is, a required number of neighbors that must adopt the behavior or innovation before the individual adopts it. The model assumes that individuals are influenced by their social connections and the behavior of their neighbors, and that the adoption process takes place gradually over time [47,48]. In the Linear Threshold Model, the adoption threshold remains fixed for each individual. This means the model does not account for the evolving beliefs or the changing social context of the individual. Once the threshold is crossed, the individual adopts the information, and this decision remains constant throughout the simulation. In reality, the threshold of an individual may adjust dynamically. For instance, exposure to multiple sources of information demands alignment with evolving beliefs, or changes in the opinions of social connections could lead to a change in threshold. Thus, a node becomes active if the fraction of its active neighbors exceeds a certain threshold.
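The threshold rule described above can be sketched in Python as follows. This is a minimal, synchronous version of the Linear Threshold Model that assumes unweighted influence, so a node activates when the fraction of its active in-neighbors strictly exceeds its fixed threshold. The function name `linear_threshold`, the step cap, and the toy network are hypothetical.

```python
def linear_threshold(in_neighbors, thresholds, seeds, max_steps=100):
    """Synchronous Linear Threshold diffusion with fixed thresholds.

    in_neighbors : dict mapping each node to the list of nodes that influence it
    thresholds   : dict mapping each node to its fixed adoption threshold
    seeds        : initially active nodes
    Returns (final active set, number of new activations at each step).
    """
    active = set(seeds)
    new_per_step = []
    for _ in range(max_steps):
        newly_active = set()
        for v, preds in in_neighbors.items():
            if v in active or not preds:
                continue
            fraction = sum(1 for u in preds if u in active) / len(preds)
            if fraction > thresholds[v]:      # threshold must be exceeded
                newly_active.add(v)
        if not newly_active:
            break                             # diffusion has stopped
        active |= newly_active                # activation is permanent
        new_per_step.append(len(newly_active))
    return active, new_per_step


# Hypothetical toy network: s is the seed, d only listens to c.
in_neighbors = {"s": [], "a": ["s"], "b": ["s", "a"], "c": ["b", "d"], "d": ["c"]}
thresholds = {"s": 0.4, "a": 0.4, "b": 0.4, "c": 0.4, "d": 1.0}
active, new_per_step = linear_threshold(in_neighbors, thresholds, seeds={"s"})
```

The list `new_per_step` records the number of new activations at each time step, which is one simple way to observe how quickly the information spreads. Node `d` never activates in this example because a threshold of 1.0 cannot be strictly exceeded.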
Consider the different blocks formed around the influencers, and the activation probability function for a node n within one such block. The activation probability is expressed in terms of three quantities: X, which denotes the node being in the aware state; Y, which denotes the size of the cluster; and the centrality of the node. The activation probability increases with node centrality and awareness.
Each node in the network is associated with an awareness level, representing the amount of information they have acquired about the topic being spread. This awareness level can vary from node to node. The spread of information occurs through interactions between nodes in the network. Nodes with higher awareness levels are more likely to influence their neighbors and propagate the information further. Nodes may have different thresholds for adopting or transmitting information based on their awareness levels. For example, a node with low awareness may require more exposure to the information before adopting it, while a node with high awareness may quickly adopt and transmit the information.
The normalization factors ensure that the result lies in the meaningful range [0, 1] [49]. The generating function for activation probability within a block represents the probability distribution of the number of nodes activated within the block. Let $x$ be a variable related to the activation probability of a node within a block. Following the standard form of a probability generating function, the generating function for diffusion within the block is computed as [50],
$$G(x) = \sum_{k} p_k\, x^{k}$$
where $p_k$ represents the probability that there are k activated nodes within the block and $x$ is the variable capturing the activation probability within the block.
Given the different blocks formed around the set of influencer nodes, this approach extends to defining activation probabilities and generating functions for each block. The overall generating function for the entire multilayer network can then be expressed as a combination of the individual generating functions of the blocks.
By focusing on each block individually, we compute the activation probability function according to the specific characteristics and dynamics within the block. Modeling diffusion within each block individually recognizes and accounts for the variations in activation probability across blocks. Considering each block in isolation enables the inclusion of block-specific factors in the activation probability function. Algorithm 5 illustrates this procedure.
Algorithm 5 Activation Probability and Generating Function for Blocks in a Multilayer Network
Input: Multilayer network with blocks
Output: Activation probability and generating function for each block
for each block (1 to l) do
    Initialize block-specific parameters
    Calculate intrinsic node characteristics X for each node in the block
    Calculate block-specific properties Y for the block
    Calculate social influence and connectivity features for each node in the block
    Activation Probability:
        Calculate the activation probability function
    Generating Function:
        Parameterize the activation probability
        Calculate the generating function
    Store the results for the block
end for
Overall Multilayer Diffusion:
    Combine the individual generating functions for all blocks
return Activation probabilities and generating functions for all blocks
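A generating function for the number of activated nodes in a block can be illustrated with the following minimal Python sketch. It relies on an assumption that the text does not state: the nodes of a block activate independently, each with its own activation probability. Under that assumption the generating function is the product of the factors $(1 - p_n + p_n x)$, and its coefficients give the probability of exactly k activated nodes. The function names and the example probabilities are hypothetical.

```python
def activation_count_distribution(probs):
    """Coefficients p_0..p_m of the generating function of the active count.

    probs holds one activation probability per node in the block.
    Independence between nodes is an assumption of this sketch.
    """
    coeffs = [1.0]                            # empty block: zero active nodes
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)           # this node stays inactive
            nxt[k + 1] += c * p               # this node becomes active
        coeffs = nxt
    return coeffs


def generating_function(coeffs, x):
    """Evaluate G(x) = sum_k p_k x^k."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def expected_active(coeffs):
    """Mean number of activated nodes, equal to G'(1)."""
    return sum(k * c for k, c in enumerate(coeffs))


# Hypothetical block of two nodes, each activating with probability 0.5.
coeffs = activation_count_distribution([0.5, 0.5])
total_probability = generating_function(coeffs, 1.0)
mean_active = expected_active(coeffs)
```

Evaluating the function at `x = 1` returns the total probability, which serves as a sanity check on the coefficients, and the derivative at `x = 1` gives the expected number of activated nodes in the block.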
Analyzing the speed and extent of information dissemination in a network requires an understanding of the spread rate. The spread rate is defined as the rate at which influence propagates through the network, measured as the number of new activations per unit of time. Thus, we calculate the spread rate through the nodes within the block using the activation probability as