Collaboration: People, Papers, Average Graphs, Durfee Squares and Metric Dimension

Melissa Holly

doi:10.20944/preprints202507.0495.v2

Submitted: 16 September 2025

Posted: 19 September 2025


Abstract

Utilizing several methods, this note shows that, in any collaboration network analysis, paper exclusion not only causes a loss of information but can also lead to incorrect interpretation of network structure, because the interpretation of vertex degree in the authors only graph is not well defined. Because the bipartite authors with papers graph is the actual social network, metric dimension is used to show that the relative distance structure of the bipartite graph is often defined by the structure of the papers, not that of the authors. Due to the NP-hard nature of metric dimension, methods that increase computational efficiency for the bipartite authors and papers graph are explored. With a departmental collaboration focus, public data for 245 professors from the mathematics, physics and biology departments of three U.S. public universities is analyzed, and network structure is compared using metric dimension. For each discipline, an average graph is defined, and average graphs are constructed from the collected data for the authors only structure, the bipartite authors with papers structure and the papers only structure. Social analysis of the collected data shows a 27% change in the total number of hubs, with different professors identified as hubs, when the authors only graphs are compared to the bipartite graphs, reiterating the need for paper inclusion in any collaboration study.

Keywords: collaboration network; network evolution; metric dimension; Durfee square; Durfee rank; average graph

Subject: Computer Science and Mathematics - Discrete Mathematics and Combinatorics

1. Introduction

Most collaboration network studies examine the authors only structure, although it is recognized that some information is lost when papers are excluded. This study shows that excluding papers can lead to misinterpretation of various aspects of the collaboration network structure, possibly making accurate modeling of network evolution impossible.

Utilizing graphs as sociograms, with a focus on the departmental collaboration of 245 professors, data was collected for three STEM departments at each of three U.S. public universities, where the average department size is similar to the size of Zachary’s karate club mentioned in several social network papers [25]. The smaller data set allows collaboration network analysis, such as an analysis of departmental hubs, not provided by larger data sets. To provide anonymity concerning the selected universities, the concept of an average graph is introduced, with data analysis given both for graphs based on the collected data and for the constructed average graphs. Used to compare the change in relative distance structure when papers are added to the authors only structure, metric dimension analysis shows that the authors only structure does not necessarily preserve the relative distance structure of the actual collaboration network. For any bipartite graph G, DM block resolving, which utilizes G’s distance matrix (DM), provides an accurate and more efficient method for finding the metric dimension of G, $\dim(G)$. Denote the bipartite authors and papers graph by $G_{AP}$, the authors only graph by $G_{A}$ and the papers only graph by $G_{P}$. Proposition 3 shows that DM block resolving can be utilized to find $\dim(G_{A})$ and $\dim(G_{P})$. Theorem 1 states that, given a specific result from DM block resolving, only three DM blocks need to be resolved in order to find $\dim(G_{AP})$. Propositions 8 and 9 provide more efficient methods for determining $\dim(G_{AP})$ under certain conditions, such as a comparison of Durfee ranks. This note appears to be the only social network analysis utilizing metric dimension to compare related social network structures, and the only one utilizing degree diagrams and the related Durfee rank of a graph.

In this study’s nine departments, an average of 50% of the faculty performed departmental collaboration between 2019 and 2023. Six of the nine department chairs did departmental collaboration, with three of the six acting as collaboration hubs. In comparing $G_{A}$ to the bipartite $G_{AP}$, it is $G_{AP}$, not $G_{A}$, that represents the actual social network. Although large vertex degree is interpreted differently in $G_{A}$ than in $G_{AP}$, a comparison of collaboration hubs in $G_{A}$ and $G_{AP}$ results in a 27% increase in the number of hubs, and different authors are identified as the hubs. A large degree in $G_{A}$ can indicate two different $G_{AP}$ structures, so its interpretation in $G_{A}$ is not well defined. This emphasizes the importance of paper inclusion and provides the foundation for the conjecture in Section 6.2 stating that accurately determining social network evolution models requires paper inclusion.

Although graphs were used to study networks before their paper, Harary and Norman’s 1953 article connecting graph theory to social network analysis [13] received serious attention. de Solla Price’s 1965 article [30] discusses the network structure of scientific papers based on the papers’ references. Another important social network study was conducted by Milgram in 1967 [23], in which randomly selected individuals were asked to forward a letter to a stockbroker in Boston. In 2001, Newman [24] utilized large scientific paper databases to study the collaboration structure of scientific research. Often used to compare networks, metric dimension was introduced independently by Slater in 1975 [33] and by Harary and Melter in 1976 [14]. Metric dimension was originally shown to be NP-complete in [6] and has more recently been shown to be NP-hard [15]. The metric dimension of complete bipartite graphs is given in [8], while [4] discusses $\dim(G)$ for regular bipartite graphs. No general formula or efficient exact method for the metric dimension of all bipartite graphs has been found. Metric dimension is an active research area, with examples of recent work found in [29,37], while [34] provides a 2023 survey of metric dimension results.

Because this note is written for a variety of possible readers ranging from sociologists to network analysts, explanations are kept as simple as possible although some basic graph theory knowledge is assumed. Section 2 provides graph theory notation and other possibly unfamiliar concepts important to this note. That section concludes with social network concepts and some results of other social network studies. Section 3 gives the data collection methods for this study. Vertex projection graphs based on the bipartite authors with papers graph are discussed in Section 4, while Section 5 discusses the structural specifics of the bipartite authors and papers graph. The social analysis in Section 6 interprets the data collected on departmental collaboration with respect to department chairs who act as hubs.

Since this study takes a close view of mathematics, physics and biology departments, the data analysis is presented through constructed average graphs in order to conceal which universities were used. The nine average graphs given in Section 7 are accompanied by brief discussions. The use of the degree diagram and the Durfee square, with its rank, as an analysis tool for bipartite graphs is discussed in Section 8. Section 9 covers the metric dimension analysis of the average graphs and methods for determining the metric dimension of the authors and papers bipartite graph. A review of this study’s results and possible future work concludes this paper in Section 10, which also discusses the challenges presented by paper inclusion in collaboration studies utilizing large databases.

2. Background Information

Whether this note’s readers are sociologists, statistical physicists or mathematicians, this note assumes basic graph theory familiarity as found in [9]. Utilized in many social analysis studies, a sociogram is a graph where vertices reflect people and/or social groups with edges representing vertex relationships. Graphs constructed from the collected data are referred to here as data derived graphs.

The Pigeonhole Principle states that given more pigeons than pigeonholes for the pigeons, and assuming that all of the pigeons find a pigeonhole, at least one pigeonhole contains more than one pigeon.

2.1. Graph Theory Notation

A proper subset A of set X is denoted by $A \subset X$, the cardinality of set X by $|X|$ and the inclusion of element x in X by $x \in X$. Graph G has vertex set $V(G)$, whose cardinality $|V(G)|$, or simply n, is called G’s order. Edge set $E(G)$ has cardinality $|E(G)|$, or m, which is G’s size. A vertex v in a specific graph G is denoted $v_{G}$. All G in this note are simple, so multiple edges and loops are excluded. Graph G isomorphic to graph H is denoted $G \cong H$. The diameter of a connected G, $\mathrm{diam}(G)$, is the longest distance found over all of G’s vertex pairs. The distance variety is the set of possible distances as determined by the diameter of connected G. For instance, suppose G’s diameter is 4. If the distances between a specific $v \in V(G)$ and all other $v_{i} \in V(G)$ are determined, this set of distances can only include 0 (from v to itself), 1, 2, 3 and 4, since the diameter is the greatest distance between any two vertices of G. Not all of these distances necessarily occur for every $v \in V(G)$; instead, these distances are the only possible distances for G. For vertices $v_{1}$ and $v_{2}$, let $d(v_{1}, v_{2})$ denote the distance between $v_{1}$ and $v_{2}$, and let $(v_{1}, v_{2})$ indicate an edge between vertices $v_{1}$ and $v_{2}$.
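The diameter and distance variety can be computed with one breadth-first search per vertex. The following is a minimal sketch, assuming a connected graph stored as a Python adjacency dictionary; the function names and the $P_{5}$ example are illustrative only. It also shows that a single vertex need not realize every distance in the variety.

```python
from collections import deque

def bfs_distances(adj, source):
    """Breadth-first search: distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Longest distance over all vertex pairs of a connected graph."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)

def distance_variety(adj):
    """Possible distances 0, 1, ..., diam(G) of a connected graph."""
    return set(range(diameter(adj) + 1))

def observed_distances(adj, v):
    """Distances that actually occur from v (a subset of the distance variety)."""
    return set(bfs_distances(adj, v).values())

# Hypothetical example: the path P_5 = 1-2-3-4-5 has diameter 4
p5 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(diameter(p5))                        # 4
print(sorted(distance_variety(p5)))        # [0, 1, 2, 3, 4]
print(sorted(observed_distances(p5, 3)))   # [0, 1, 2]
```

From the middle vertex 3, distances 3 and 4 never occur, although both belong to the distance variety of $P_{5}$.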

Complete graphs are denoted $K_{n}$, where n is the graph order, path graphs $P_{n}$ and cycle graphs $C_{n}$. Distinctly label the two vertex sets of bipartite G so that each set is clearly distinguished from the other, and call this labeling a distinguishing labeling. The goal of this labeling is to generate block matrices for the bipartite graph where possible. As an example, during this study, the exact number of authors who did departmental collaboration and the exact number of representative papers were initially unknown. However, for each department, it was a given that the number of authors was $< 100$. Thus, when department graphs were constructed, the distinguishing labeling gave authors labels $< 100$ and papers labels $> 100$.

The open neighborhood of vertex v is $N(v)$, while the closed neighborhood is $N[v]$. For $v_{1}, v_{2} \in V(G)$, $v_{1}$ adjacent to $v_{2}$ is denoted $v_{1} \sim v_{2}$. Given $v_{1}$ and $v_{2}$ with edge $(v_{1}, v_{2})$ between them, if vertex w is added between $v_{1}$ and $v_{2}$, creating edges $(v_{1}, w)$ and $(w, v_{2})$ in place of $(v_{1}, v_{2})$, then edge $(v_{1}, v_{2})$ is subdivided by w. In comparing graphs $G_{1}$ and $G_{2}$, if $G_{2}$ is $G_{1}$ with every edge subdivided, then $G_{2}$ is said to be a subdivided $G_{1}$. A subgraph of G is a collection of vertices and edges of G that themselves form a graph.

A vertex v with degree 1, denoted $\deg(v) = 1$, is called a pendant vertex. Define a pendant chain in a bipartite graph as a $P_{n}$ subgraph of three or more vertices in G that begins with a pendant vertex and ends with a vertex incident to at least 3 edges in G, with only degree 2 vertices between the pendant vertex and the ending vertex. The length of the pendant chain is the number of edges between the pendant vertex and the vertex with degree greater than 2. A $P_{n}$ is considered to have no pendant chains. The maximum degree of a vertex in $V(G)$ is denoted by $\Delta(G)$, and the maximum degree over a vertex set X is $\Delta(X)$. The minimum degree in G is $\delta(G)$.
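The pendant chain definition can be read as a walk: start at a pendant vertex, pass through degree 2 vertices, and stop at the first vertex that is not of degree 2. The sketch below is one hypothetical implementation of that reading, assuming an adjacency dictionary; the spider graph is a made-up example.

```python
def pendant_chains(adj):
    """Map each pendant vertex that starts a pendant chain to the chain's length.

    Starting at a degree-1 vertex, walk through degree-2 vertices until the
    first vertex that is not of degree 2. The walk is a pendant chain only if it
    stops at a vertex of degree >= 3 and has at least three vertices (length >= 2).
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    chains = {}
    for start in adj:
        if deg[start] != 1:
            continue
        prev, cur, length = start, adj[start][0], 1
        while deg[cur] == 2:
            # step to the neighbor of cur that the walk did not just come from
            nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
            prev, cur, length = cur, nxt, length + 1
        if deg[cur] >= 3 and length >= 2:
            chains[start] = length
    return chains

# Hypothetical spider: center 0 with legs 0-1-2, 0-3 and 0-4-5-6
spider = {0: [1, 3, 4], 1: [0, 2], 2: [1], 3: [0], 4: [0, 5], 5: [4, 6], 6: [5]}
print(pendant_chains(spider))  # {2: 2, 6: 3}; the leg 0-3 has only two vertices

# A path has no pendant chains: each walk ends at the other pendant vertex
p4 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(pendant_chains(p4))      # {}
```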

2.2. Distance Matrix and Common Neighbor Matrix

For any G with order n where $v_{i}, v_{j} \in V(G)$, the distance matrix DM is the symmetric $n \times n$ matrix indexed by $V(G)$. Element $d_{i,j}$ of DM contains the distance between $v_{i}$ and $v_{j}$ in G. A DM is only defined for connected graphs or for each connected component of a graph. Each $d_{i,i}$ is zero, reflecting the distance of each vertex to itself. In bipartite graphs, distances between elements of the same partite vertex set are all even, while distances between elements of the two distinct partite sets are all odd. Given a distinguishing labeling on bipartite $G_{AP}$ with partite sets A and P, G’s DM has blocks $a \times a$, $p \times p$, $a \times p$ and $p \times a$, where $a \in A$ and $p \in P$. Blocks $a \times a$ and $p \times p$ contain even distances, while the distances in the $a \times p$ and $p \times a$ blocks are odd.
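To see how a distinguishing labeling produces the block structure, the sketch below sorts vertices by label so that authors (labels below 100) precede papers (labels above 100), builds the DM by breadth-first search, and checks the even/odd block pattern. The tiny department is hypothetical and the graph is assumed connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Breadth-first search: distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_matrix(adj):
    """Rows and columns ordered by label, so authors (< 100) precede papers (> 100)."""
    order = sorted(adj)
    rows = []
    for u in order:
        dist = bfs_distances(adj, u)
        rows.append([dist[v] for v in order])
    return order, rows

def blocks_have_bipartite_parity(order, rows, is_author=lambda v: v < 100):
    """Check: even entries in the a x a and p x p blocks, odd in a x p and p x a."""
    for i, u in enumerate(order):
        for j, v in enumerate(order):
            same_side = is_author(u) == is_author(v)
            if (rows[i][j] % 2 == 0) != same_side:
                return False
    return True

# Hypothetical department: authors 1, 2, 3; paper 101 by {1, 2}, paper 102 by {2, 3}
adj = {1: [101], 2: [101, 102], 3: [102], 101: [1, 2], 102: [2, 3]}
order, dm = distance_matrix(adj)
print(order)   # [1, 2, 3, 101, 102]
for row in dm:
    print(row)
print(blocks_have_bipartite_parity(order, dm))  # True
```

The upper left $3 \times 3$ block holds author to author distances, and the lower right $2 \times 2$ block holds paper to paper distances.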

Given G, define the common neighbor matrix CNM as a symmetric $n \times n$ matrix with indices defined on $V(G)$, where element $c_{i,j} \in$ CNM at the intersection of $v_{i}, v_{j} \in V(G)$ is the number of common neighbors that $v_{i}$ shares with $v_{j}$. A CNM can be defined to include the common neighbors of vertices that are adjacent and/or those that are not adjacent. In this note, any CNM is assumed to be defined between nonadjacent vertices. Thus, for nonadjacent $v_{i}, v_{j} \in V(G)$, $c_{i,j} = |N(v_{i}) \cap N(v_{j})|$. The CNM based on nonadjacent vertices for any $K_{n}$ is an all zero matrix. For nonadjacent vertices, if G is bipartite and $c_{i,j}$ is nonzero, then $v_{i}$ and $v_{j}$ are in the same partite set. Note that any row (or column) sum can count a vertex more than once, since a vertex can be shared between vertices $v_{i}$ and $v_{j}$ and also shared between $v_{i}$ and vertex $v_{k}$.

In Section 4, the CNM of the bipartite authors with papers graph is used to construct projection graphs. Based on common neighbors of nonadjacent vertices, a similar matrix is the κ-th order vertex-adjacency matrix ${}^{v}A_{\kappa}$ as given in [17], here with $\kappa = 2$. This matrix is a binary matrix with a 1 when $v_{i}$ and $v_{j}$ have distance 2 and a zero otherwise. However, the CNM provides greater flexibility regarding the choice of adjacent or nonadjacent vertices, and it also provides the number of common neighbors shared by two vertices.
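A small example helps contrast the CNM with its binary counterpart. The sketch below computes the CNM over nonadjacent pairs only, derives the binary matrix (1 exactly where nonadjacent vertices share a neighbor, that is, where their distance is 2), and forms one plausible authors only projection by joining two authors who share at least one paper. Leaving the diagonal at zero, the function names and the data are assumptions for illustration; this is not the construction of Section 4 itself.

```python
from itertools import combinations

def common_neighbor_matrix(adj):
    """CNM over nonadjacent pairs; the diagonal and adjacent pairs stay 0 (assumption)."""
    order = sorted(adj)
    idx = {v: i for i, v in enumerate(order)}
    cnm = [[0] * len(order) for _ in order]
    for u, v in combinations(order, 2):
        if v in adj[u]:
            continue
        c = len(set(adj[u]) & set(adj[v]))
        cnm[idx[u]][idx[v]] = cnm[idx[v]][idx[u]] = c
    return order, cnm

def second_order_adjacency(cnm):
    """Binary matrix: 1 where nonadjacent vertices share a neighbor, i.e. distance 2."""
    return [[1 if c > 0 else 0 for c in row] for row in cnm]

# Hypothetical data: authors 1, 2, 3; paper 101 by {1, 2}; paper 102 by {1, 2, 3}
adj = {1: [101, 102], 2: [101, 102], 3: [102], 101: [1, 2], 102: [1, 2, 3]}
order, cnm = common_neighbor_matrix(adj)
va2 = second_order_adjacency(cnm)

# Hypothetical authors only projection: join two authors who share a paper
authors_only_edges = [
    (order[i], order[j])
    for i, j in combinations(range(len(order)), 2)
    if order[i] < 100 and order[j] < 100 and cnm[i][j] > 0
]
print(cnm[0][1])            # 2: authors 1 and 2 share papers 101 and 102
print(authors_only_edges)   # [(1, 2), (1, 3), (2, 3)]
```

The binary matrix records only that authors 1 and 2 are at distance 2, while the CNM also records that they share two papers.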

2.3. Average Graphs

The motivation for developing an average graph is the desire to preserve anonymity in this study while still presenting images of graph/network structure. An average graph is a graph defined on the statistical averages of a parameter set for a collection of graphs. This concept has meaning when the graphs in the collection are specifically related. Although the departments within each university in this study are related to each other by institution, the average graphs in Section 7 represent the averages of the data derived graphs for each discipline, as this aligns with this study’s objective.

The three data derived physics graphs (authors only, authors with papers and papers only graphs) differ significantly from those for mathematics and biology; but the three data derived physics graphs have similar structure to each other. The same can be said for the three data derived mathematics graphs and the three biology graphs.

In constructing this study’s average graphs, average graph order and size are used, along with average diameter, average degree and, as discussed later in this note, average metric dimension. Because many averages generate decimal values, ranges from the truncated value to the rounded value are used as targets in average graph construction. If an integer value results, then that value is used. Average degree is determined as the sum of the three graphs’ degree sequences divided by the sum of the graph orders, ± one standard deviation ($\pm\sigma$), with the target range based on the truncated value to rounded value range as with the other averages. For bipartite authors with papers graphs, the average number of papers and the average number of authors are calculated. The average number of components is determined, along with the average number of each component type, such as the average number of $P_{3}$ components. For the larger components that are distinct, an average large component (or components) is found that is average with respect to order, size, degree, diameter and metric dimension.
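The target ranges can be made concrete with a short computation. The sketch below is one reading of the procedure, assuming three hypothetical degree sequences for related graphs; using the population standard deviation of the pooled degrees for $\sigma$ and rounding halves upward are assumptions, since the note does not specify either.

```python
import math
from statistics import pstdev

def target_range(x):
    """Truncated value to rounded value, e.g. 2.67 -> (2, 3); an integer gives (x, x)."""
    # round half up explicitly; Python's round() rounds halves to even
    return (math.trunc(x), math.floor(x + 0.5))

def average_graph_targets(degree_sequences):
    """Targets for an average graph built from one discipline's data derived graphs."""
    orders = [len(seq) for seq in degree_sequences]
    sizes = [sum(seq) // 2 for seq in degree_sequences]   # degree sum = 2m
    pooled = [d for seq in degree_sequences for d in seq]
    avg_degree = sum(pooled) / sum(orders)
    sigma = pstdev(pooled)  # assumption: population sigma of the pooled degrees
    return {
        "order": target_range(sum(orders) / len(orders)),
        "size": target_range(sum(sizes) / len(sizes)),
        "degree": (avg_degree, sigma, target_range(avg_degree)),
    }

# Hypothetical degree sequences for three related graphs
targets = average_graph_targets([[3, 2, 2, 1], [2, 2, 2], [1, 1]])
print(targets["order"])      # (3, 3): an integer average is used as is
print(targets["size"])       # (2, 3): 8/3 is about 2.67
print(targets["degree"][2])  # (1, 2): 16/9 is about 1.78
```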

2.4. Durfee Square

The concept of Young diagrams, and their included Durfee squares, are used in [20,21] with respect to the degree sequence of a graph. In [21,32], these two concepts are utilized in connection to threshold graphs. The Durfee square in [1] is used for h-index enhancement, while in [31], the Durfee square is utilized in measurement of scholarly impact.

A Young diagram (also called a Young tableau, and related to the Ferrers diagram) gives a visual image of a non-negative, non-increasing integer partition. Imagine the number 4 as four horizontal squares reflecting 4+0. Then 3+1 can be visualized as

□□□
□

with the largest part as the top row and each subsequent row representing a part less than or equal to the part in the row above it.

The Durfee square of a Young diagram is the largest square of squares anchored by the upper left square of the diagram. The Durfee rank, $r_{\square}$, is the number of squares along one edge of the Durfee square, which is also the number of squares along the Durfee square diagonal ($r_{\square}$ is also called the Frobenius rank [10] or partition trace [20]). In [2], one type of Young diagram corner [image not reproduced] is referred to as an inner corner of the Young diagram, while another type [image not reproduced] is called an outer corner. Figure 1 displays five types of corners associated with $r_{\square}$, each of which relays different information regarding the large degree structure of a graph. Let $R_{r_{\square}}$ indicate the lowest (counting from the top) degree diagram row that includes $r_{\square}$, and let $\ell(R_{r_{\square}})$ denote the length of this row. The row above $R_{r_{\square}}$ is row $R_{r_{\square}-1}$, and the row below $R_{r_{\square}}$ is row $R_{r_{\square}+1}$. The table at the bottom of Figure 1 reflects the relationship of each $r_{\square}$ corner type to the degree diagram rows $R_{r_{\square}-1}$, $R_{r_{\square}}$ and $R_{r_{\square}+1}$.

Any graph with at least one edge has a degree sequence that is integral. An integer sequence from which a graph can be constructed is called graphic. In the same manner that 4 can be partitioned as either 2+2 or 3+1, a degree sequence can be viewed as a partition of twice a graph’s size, $2m$. A degree diagram is a Young diagram of a graph’s degree sequence, or of the degree sequence of a vertex subset, which might not be graphic. Certain requirements exist for an integer sequence to be graphic (see [9,21] for general information). There are two easy necessary conditions for a degree sequence to be graphic. First, the number of odd degree vertices must be even. Second, for a degree sequence X, $\Delta(X) < |X|$ is required. Let $r_{\square}(G)$ be the Durfee rank of G.
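The two quick conditions are easy to test, but they are only necessary. In the sketch below (function name hypothetical), the sequence 3, 3, 1, 1 passes both checks yet is not graphic: in a four vertex graph, two vertices of degree 3 are adjacent to every other vertex, so no vertex can have degree 1.

```python
def passes_quick_graphic_checks(seq):
    """Two necessary (not sufficient) conditions for a degree sequence to be graphic."""
    odd_count = sum(1 for d in seq if d % 2 == 1)
    return odd_count % 2 == 0 and max(seq) < len(seq)

print(passes_quick_graphic_checks([3, 1, 1, 1]))  # True: the star K_{1,3}
print(passes_quick_graphic_checks([2, 2, 1]))     # False: one odd degree
print(passes_quick_graphic_checks([4, 2, 1, 1]))  # False: max degree 4 is not < 4
print(passes_quick_graphic_checks([3, 3, 1, 1]))  # True, yet the sequence is not graphic
```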

As explained in the next subsection, the larger degrees in G impact G’s metric dimension. For G, $r_{\square}(G)$ reflects that there exist at least $r_{\square}(G)$ vertices that have degree at least $r_{\square}(G)$, thus giving a minimum value for the large degree structure in G. The identification of the $r_{\square}$ corner conveys additional large degree information, as shown by the table in Figure 1.
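Because the degree diagram lists degrees in non-increasing order, the Durfee rank is the largest r such that the r-th largest degree is at least r. The sketch below computes it from a degree list (function name hypothetical) and checks the "at least r vertices of degree at least r" reading on a few familiar graphs; the same function applies to the degree sequence of a vertex subset X.

```python
def durfee_rank(degrees):
    """Largest r such that the r-th largest degree is at least r."""
    r = 0
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d < i:
            break
        r = i
    return r

def has_r_vertices_of_degree_r(degrees):
    """At least r_square vertices have degree at least r_square."""
    r = durfee_rank(degrees)
    return sum(1 for d in degrees if d >= r) >= r

print(durfee_rank([4, 1, 1, 1, 1]))  # 1: the star K_{1,4}
print(durfee_rank([1, 2, 2, 1]))     # 2: the path P_4
print(durfee_rank([2] * 5))          # 2: the cycle C_5
print(durfee_rank([4] * 5))          # 4: the complete graph K_5
print(durfee_rank([3]))              # 1: a single-row diagram of a vertex subset
```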

Isolated vertices $K_{1}$ are excluded from this note, and they are excluded from the degree diagram concept. Given a vertex subset X of a connected G, $r_{\square}(X)$ is the Durfee rank of X’s degree sequence. Although the degree diagram of a vertex subset X can have a single row, any degree diagram of G must have at least two rows.

Consider a bipartite $G_{XY}$ with partite sets X and Y. The degree sum of either vertex set is $m = |E(G)|$; thus, the degree sequences of X and Y are each partitions of m. The degree sequence of either set is most likely not graphic, but placing each sequence in a degree diagram provides significant information regarding G. If the partite degree sequences are placed in degree diagrams, then the visual partition image of one diagram is simply a rearrangement of the squares in the other. As discussed later in this note, Figures 10 and 11 each display a bipartite $G_{AP}$ with the degree diagrams for the two vertex sets A and P.
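A hypothetical authors with papers graph shows both partite degree sequences partitioning the same m. The sketch below derives each sequence from a list of (author, paper) incidences, prints text degree diagrams with `#` in place of squares, and reports the Durfee rank of each side; the data and names are illustrative only.

```python
from collections import Counter

def durfee_rank(degrees):
    """Largest r such that the r-th largest degree is at least r."""
    r = 0
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d < i:
            break
        r = i
    return r

def degree_diagram(seq):
    """Text degree diagram: one row of '#' per degree, longest row first."""
    return "\n".join("#" * d for d in sorted(seq, reverse=True))

# Hypothetical (author, paper) incidences: authors < 100, papers > 100
edges = [(1, 101), (2, 101), (1, 102), (2, 102), (3, 102)]
m = len(edges)
deg_A = sorted(Counter(a for a, _ in edges).values(), reverse=True)  # [2, 2, 1]
deg_P = sorted(Counter(p for _, p in edges).values(), reverse=True)  # [3, 2]

print(m, sum(deg_A), sum(deg_P))  # 5 5 5: each side's degree sequence partitions m
print(degree_diagram(deg_A))
print(degree_diagram(deg_P))
print(durfee_rank(deg_A), durfee_rank(deg_P))  # 2 2
```

Both diagrams contain the same five squares, arranged as 2+2+1 for the authors and 3+2 for the papers.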

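Assume a connected G. If $r_{\square}(G) = 1$, then the second row of the degree diagram must contain a single square, since G has no isolated vertices, and the first row must contain one or more squares. Hence every vertex other than one maximum degree vertex is a pendant vertex. If the first row contains a single square, then G must be the bipartite $K_{2}$, which is the star graph $K_{1,n-1}$ with $n = 2$. If the first row contains more than one square, then, because G is connected and every other vertex is pendant, each pendant vertex must be adjacent to the maximum degree vertex, so G is again a star graph. Conversely, the degree diagram of any $K_{1,n-1}$ has rows of lengths $n-1, 1, \ldots, 1$, so its Durfee rank is 1. Thus, $r_{\square}(G) = 1$ if and only if $G \cong K_{1,n-1}$. This proves Proposition 1.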

Proposition 1.

A connected G has $r_{\square}(G) = 1$ if and only if G is a star graph, $K_{1,n-1}$, with $n \geq 2$.
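Proposition 1 can be spot-checked by brute force on small orders. The sketch below enumerates every labeled graph on n vertices for small n, keeps the connected ones, and confirms that a Durfee rank of 1 occurs exactly for stars. It is a sanity check under the stated bound on n, not a proof, and all names are hypothetical.

```python
from itertools import combinations

def durfee_rank(degrees):
    """Largest r such that the r-th largest degree is at least r."""
    r = 0
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d < i:
            break
        r = i
    return r

def is_connected(n, edges):
    """Depth-first search from vertex 0 over vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def check_proposition_1(max_n):
    """For every connected labeled graph on 2..max_n vertices: rank 1 iff star."""
    for n in range(2, max_n + 1):
        pairs = list(combinations(range(n), 2))
        for mask in range(1, 1 << len(pairs)):
            edges = [p for k, p in enumerate(pairs) if mask >> k & 1]
            if not is_connected(n, edges):
                continue
            deg = [0] * n
            for u, v in edges:
                deg[u] += 1
                deg[v] += 1
            # K_{1,n-1}: n-1 edges, all incident to one vertex of degree n-1
            is_star = len(edges) == n - 1 and max(deg) == n - 1
            if (durfee_rank(deg) == 1) != is_star:
                return False
    return True

print(check_proposition_1(6))  # True
```

The enumeration grows as $2^{n(n-1)/2}$, so this check is only practical for very small n.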

Although a complete characterization of simple G based on $r_{\square}(G)$ is beyond the scope of this note, if $r_{\square}(G) = 2$, then at least two vertices of G have degree at least 2. Graphs with $r_{\square}(G) = 2$ include $C_{n}$ and, for $n \geq 4$, $P_{n}$. There is more to explore in the Durfee square and Durfee rank concepts than what is contained in this note.

2.5. Metric Dimension

Introduced independently by Slater in [33] (1975) and by Harary and Melter in [14] (1976), the metric dimension of a graph has found numerous uses related to the comparison of network structures. In this note, two graphs with the same metric dimension are regarded as having a similar structure with respect to relative distance.

Suppose G is a simple graph with $v_{i}, v_{j} \in V(G)$, and let $d(v_{i}, v_{j})$ indicate the distance between vertices $v_{i}$ and $v_{j}$. Imagine an ordered subset W of vertices in G such that every vertex in G has a unique combination of distances to the members of W. Then W is called a resolving set of G. The fact that the elements in W are considered to be ordered is critical. The cardinality of a minimum resolving set is the metric dimension of G, $\dim(G)$. A resolving set W is also called a metric generator of G, and a resolving set of minimum cardinality is a metric basis of G [8]. There can be more than one resolving set W with minimum cardinality.
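The resolving set definition translates directly into code: compute each vertex’s ordered tuple of distances to the members of W and check that all tuples are distinct. The sketch below assumes a connected graph given as an adjacency dictionary; the tuple order follows the order of W, and the path example is illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Breadth-first search: distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def metric_representations(adj, W):
    """r(v | W): the ordered tuple of distances from v to each member of W."""
    dist = {w: bfs_distances(adj, w) for w in W}
    return {v: tuple(dist[w][v] for w in W) for v in adj}

def is_resolving(adj, W):
    """W resolves G exactly when all metric representations are distinct."""
    reps = metric_representations(adj, W)
    return len(set(reps.values())) == len(reps)

p4 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(metric_representations(p4, [1]))  # {1: (0,), 2: (1,), 3: (2,), 4: (3,)}
print(is_resolving(p4, [1]))            # True
print(is_resolving(p4, [2]))            # False: r(1 | W) = r(3 | W) = (1,)
```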

As an example, consider the graph given in Figure 2 along with two W sets, $W_{1}$ and $W_{2}$. Each vertex v has a distance vector $r(v \mid W_{i})$ (also known as the metric representation of v or metric code of v) that contains the distances from v to each of the members of that particular W. W is a resolving set if and only if the distance vectors of all v in G are distinct. Any vertex v in W has a unique $r(v \mid W)$, as its distance vector contains a zero at v’s position in W; since the remaining vertices of G also receive distinct distance vectors, both $W_{1}$ and $W_{2}$ in the figure are resolving sets of G. If vertex 2 is added to each W, W is still a resolving set, but it is not minimal. However, if any vertex is removed from either $W_{1}$ or $W_{2}$, then the distance vectors of the vertices not in the altered W are no longer unique, indicating that both $W_{1}$ and $W_{2}$ are minimal resolving sets for G. Because, as shown in Section 2.5.2, no resolving set of G has fewer than three elements, G has $\dim(G) = |W_{i}| = 3$, where $i \in \{1, 2\}$. Additional information on the metric dimension of a graph can be found in [8].

2.5.1. Known Metric Dimensions

The metric dimension of some graph families has been determined. All $P_{n}$ with $n \geq 2$ have metric dimension 1, $\dim(K_{n}) = n - 1$ for all $K_{n}$, and $\dim(C_{n}) = 2$ for $n \geq 3$. Bipartite star graphs with $n \geq 3$ have $\dim(K_{1,n-1}) = n - 2$, while $K_{1,1} = K_{2}$ has metric dimension 1. There exist additional graph families with known $\dim(G)$ not mentioned here, and finding the metric dimension remains an active area of research. From Proposition 1, if $r_{\square}(G) = 1$, then $G \cong K_{1,n-1}$, and $\dim(G) = n - 2$ for $n \geq 3$.
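These values can be confirmed on small instances by brute force, trying every subset W of size 0, 1, 2 and so on until one resolves the graph. Because reordering the members of W permutes every distance vector in the same way, unordered subsets suffice for testing distinctness. The sketch below uses exponential time and is only meant for tiny graphs; the generator names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Breadth-first search: distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def metric_dimension(adj):
    """Brute force: smallest k such that some k-subset W resolves G."""
    vertices = sorted(adj)
    dist = {v: bfs_distances(adj, v) for v in vertices}
    for k in range(len(vertices) + 1):
        for W in combinations(vertices, k):
            vectors = {tuple(dist[w][v] for w in W) for v in vertices}
            if len(vectors) == len(vertices):
                return k

def path(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def star(n):
    """K_{1,n-1} with center 0."""
    return {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

print(metric_dimension(path(6)))      # 1
print(metric_dimension(cycle(7)))     # 2
print(metric_dimension(complete(5)))  # 4 = n - 1
print(metric_dimension(star(6)))      # 4 = n - 2
print(metric_dimension(star(2)))      # 1: K_{1,1} = K_2 is the exception
```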

2.5.2. Distance Matrix and Metric Dimension

Every connected G has a well-defined metric dimension, $\dim(G)$. From a technical linear algebra standpoint, matrix columns represent vector space bases while rows map to the field. With respect to any DM, because a DM is symmetric, the metric dimension of G can be found utilizing either the columns or the rows of G’s DM [3]. Rows are utilized in this note because this seems more natural than using columns. The manner in which the DM is used to find the metric dimension of a graph is best explained with an example. Figure 2 contains the distance matrix DM for the displayed G. The goal is to find a minimum number of rows that provide a set of unique ordered combinations of distances in the selected rows’ column combinations. Any specific $v_{i}$ indexing a DM row is assumed to be in a W set, and the ordered column combinations of a collection of $v_{i}$ are the distance vectors from the $v_{i} \in W$ to the $v_{j} \in V(G)$. Alternatively, if columns are used, the vertices $v_{j}$ indexing the columns are in W, and the ordered row combinations are the distance vectors from the $v_{j}$ to the $v_{i}$. Note that any column combination that contains a zero in row $v_{i}$ is unique, as this combination indicates that $v_{i} \in W$.

In Figure 2, first notice that the DM row for pendant vertex 1 has a unique number of 1s and 2s compared to the other rows; pendant vertices often have such a distinctive set of distances. Second, notice that the row for vertex 2 contains all 1s except for the single zero, so getting unique column combinations with vertex 2’s row is difficult. Let $W = \{1, 2\}$ and consider the DM rows for vertices 1 and 2. For this W, the column combinations for vertices 3, 4 and 5 are $r(3 \mid W) = r(4 \mid W) = r(5 \mid W) = (2, 1)$, so W is not a resolving set. Now select the DM rows of 1 and 3, placing 1 and 3 in W. Compared to rows 1 and 2, rows 1 and 3 have one fewer repeated column combination. However, it is still the case that $r(4 \mid W) = r(5 \mid W) = (2, 1)$. So let either vertex 4 or vertex 5 be in W, making $|W| = 3$. Rows 1, 3 and 5 provide the set of unique column combinations that are the unique distance vectors $\{(0, 2, 2), (1, 1, 1), (2, 0, 1), (2, 1, 1), (2, 1, 0)\}$, so $W = \{1, 3, 5\}$ is a resolving set. Comparing all combinations of rows shows that no two rows suffice, so every minimum resolving set contains three elements and $\dim(G) = 3$. Note that the closed neighborhoods of vertices 3, 4 and 5 in Figure 2 are the same.
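The row selection procedure above can be replayed in code. The Figure 2 graph is not reproduced here, so the sketch assumes the graph implied by the distances in this subsection: pendant vertex 1 joined to vertex 2, vertex 2 adjacent to every other vertex, and vertices 3, 4 and 5 mutually adjacent. It recovers the repeated combinations for rows 1 and 2, the vectors for W = {1, 3, 5}, and then compares all row combinations of increasing size.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Breadth-first search: distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# Assumed reconstruction of the Figure 2 graph from the distances in the text
fig2 = {1: [2], 2: [1, 3, 4, 5], 3: [2, 4, 5], 4: [2, 3, 5], 5: [2, 3, 4]}
dm = {u: bfs_distances(fig2, u) for u in fig2}  # dm[u][v] = DM entry (row u, column v)

def vectors(W):
    """Column combinations of the DM rows indexed by W, i.e. r(v | W) for each v."""
    return {v: tuple(dm[w][v] for w in W) for v in sorted(fig2)}

def resolves(W):
    return len(set(vectors(W).values())) == len(fig2)

print(vectors((1, 2))[3], vectors((1, 2))[4], vectors((1, 2))[5])  # (2, 1) (2, 1) (2, 1)
print(sorted(vectors((1, 3, 5)).values()))
# [(0, 2, 2), (1, 1, 1), (2, 0, 1), (2, 1, 0), (2, 1, 1)]

# Compare all row combinations, smallest size first
dim, bases = None, []
for k in range(1, len(fig2) + 1):
    bases = [W for W in combinations(sorted(fig2), k) if resolves(W)]
    if bases:
        dim = k
        break
print(dim, len(bases))  # 3 7: every minimum W holds two of the twins 3, 4, 5
```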

2.5.3. Diameter and Metric Dimension

Since the diameter is the longest possible distance between any two vertices in G, the diameter determines the maximum distance variety over the vertices of G. Consider G with diameter 2, so $n > 2$. Then, excluding the zero in each DM row and utilizing $\mathrm{diam}(G)$, the only possible distances contained in G’s DM are 1 and 2. As the order of G increases, so does the length of the rows in DM. Excluding zero, let k be the number of distances in G’s distance variety, and let r be the number of selected DM rows, so r is also the number of zeros in the r rows. There are at most $k^{r} + r$ possible unique ordered combinations: at most $k^{r}$ combinations without a zero, plus the r columns indexed by W, each containing exactly one zero. In the example with $\mathrm{diam}(G) = 2$, for distances 1 and 2 and for 2 rows, there are at most $2^{2} + 2 = 6$ unique column combinations. Thus, by the Pigeonhole Principle, two rows cannot give the unique column combinations required by $\dim(G)$ if the rows are longer than 6.
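The counting argument gives a quick lower bound: for order n and k nonzero distances, at least the smallest r with $k^{r} + r \geq n$ rows are needed. The sketch below (function names hypothetical) encodes that bound for connected graphs with $n > 2$; for the order 5, diameter 2 graph of Figure 2 it gives 2, consistent with $\dim(G) = 3$.

```python
def max_distinct_vectors(k, r):
    """At most k**r zero-free vectors, plus one zero-containing vector per row vertex."""
    return k ** r + r

def min_rows_needed(n, k):
    """Smallest r with k**r + r >= n: a lower bound on dim(G) for order n, diameter k."""
    r = 1
    while max_distinct_vectors(k, r) < n:
        r += 1
    return r

print(max_distinct_vectors(2, 2))  # 6
print(min_rows_needed(6, 2))       # 2
print(min_rows_needed(7, 2))       # 3: two rows cannot resolve 7 vertices
print(min_rows_needed(5, 2))       # 2: consistent with dim = 3 for Figure 2
```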

Given a fixed graph order, as diameter decreases, the metric dimension tends to increase. For simple connected G, if G’s diameter is $n - 1$, then G is $P_{n}$ and $\dim(G) = 1$. If G’s diameter is 1, then $\dim(G) = \dim(K_{n}) = n - 1$.

2.5.4. Degree and Metric Dimension

Vertex degree in G also plays a significant role in

\dim (G)

because as general degree increases for a fixed order G, diameter tends to decrease due to more vertices becoming adjacent to each other. As diameter decreases, the variety of distances in G’s DM tends to decrease since the number of 1s increases in some rows. As the distance variety decreases, the number of DM rows required for unique column combinations tends to increase. In other words, for a fixed graph order, a general increase in vertex degree also tends to increase metric dimension.

2.5.5. Twin Vertices and Metric Dimension

Given distinct vertices $v_{1}$ and $v_{2}$ in G, if either $N(v_{1}) = N(v_{2})$ or $N[v_{1}] = N[v_{2}]$, then $v_{1}$ and $v_{2}$ are twin vertices [19,39], and the distances from $v_{1}$ and $v_{2}$ to every other vertex in G are the same. Therefore, any resolving set W must contain $v_{1}$ or $v_{2}$, since otherwise $r(v_{1} \mid W) = r(v_{2} \mid W)$. A twin set can have more than two vertices, all pairwise twins with the same distances to the rest of G. As noted above, in Figure 2, $N[3] = N[4] = N[5] = \{2, 3, 4, 5\}$, indicating that vertices 3, 4 and 5 form a twin set of three pairwise twin vertices. This forces two of the three twins to be in a minimum W. In other words, given x twin vertices in the same twin set, at least $x - 1$ of them must be in any resolving set of G. Any $K_{n}$ has a twin set of cardinality n, making $\dim(K_{n}) \geq n - 1$, which agrees with the known value $\dim(K_{n}) = n - 1$.
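Twin sets can be found by grouping vertices on their open neighborhoods and, separately, on their closed neighborhoods. The sketch below does this and sums $x - 1$ over the twin sets as a lower bound on $\dim(G)$. The Figure 2 graph is the one implied by the distances given in Section 2.5.2 (an assumption, since the figure is not reproduced), and the function names are hypothetical.

```python
from collections import defaultdict

def twin_sets(adj):
    """Group vertices with equal open neighborhoods or equal closed neighborhoods."""
    groups = defaultdict(set)
    for v, nbrs in adj.items():
        groups[("open", frozenset(nbrs))].add(v)
        groups[("closed", frozenset(nbrs) | {v})].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def twin_lower_bound(adj):
    """At least x - 1 vertices of every twin set of size x lie in any resolving set."""
    return sum(len(t) - 1 for t in twin_sets(adj))

# Assumed Figure 2 graph, plus K_4 and the star K_{1,3} for comparison
fig2 = {1: [2], 2: [1, 3, 4, 5], 3: [2, 4, 5], 4: [2, 3, 5], 5: [2, 3, 4]}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(twin_sets(fig2))         # [[3, 4, 5]]: closed twins
print(twin_lower_bound(fig2))  # 2
print(twin_lower_bound(k4))    # 3 = dim(K_4)
print(twin_lower_bound(star))  # 2 = dim(K_{1,3}): the leaves are open twins
```

For Figure 2, the twin bound of 2 falls one short of $\dim(G) = 3$; the remaining element of W comes from resolving vertex 2 against the last twin.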

2.6. Social Network Background

Considered to be social networks, collaboration networks have been one of the most active areas of research for the past couple of decades. In this note, the graphs formed by authors without papers, by authors with papers, and by papers without authors, are discussed since the papers, and their connected research activities, are the social groups. For this study, the assumption is made that departments have a physical existence where professors see each other on a regular basis, giving them the opportunity to discuss their research.

A research group is a collection of collaborating authors within the same organization while a research network includes collaborators from more than one organization [18]. Given these definitions, this note is focused on the departmental research groups that may exist in a research network. Collaboration can lead to a larger number of publications, career advancement and increased access to funds [36].

Milgram’s impactful 1960s study [23] involved 160 random individuals in Nebraska who were requested to forward a letter to one of Milgram’s Boston friends. A requirement of the forwarding was that the letter be sent only to people whom the sender knew on a first-name basis. Even though Milgram’s study was small in scale, Milgram’s first-name requirement has been used to justify utilizing collaboration networks, as opposed to networks of film actors, as representative social networks [24], because it is assumed that coauthors tend to know each other on a first-name basis. As mentioned in [24], some papers have very large laboratories as authors, so a first-name basis seems unlikely in those cases. The social network in any department is undoubtedly one where first names are known among its faculty.

Consider the different environments found in the three disciplines covered by this study. Both physics and biology can have complex physical laboratories, while the mathematician’s laboratory is typically paper and pen, marker and board, or one or more computers. This difference results in larger numbers of collaborators on biology and physics papers than on mathematics papers, as discussed in [24]. Although a paper may have many authors, the only authors considered in this study are those from the same university department. Any given author team may produce a number of papers, but in this note only a single paper that represents a distinct author collaboration structure is considered.

3. Data Collection Methods and Approach

This section discusses the data collection methods used in this study with a focus on the use of representative papers. The collected data’s purpose is to generate collaboration network graphs on which metric dimension is used to compare the structures.

3.1. Data Collection Methods

Five years of public information (2019 to 2023) as found on Google Scholar, ResearchGate and Web of Science is used for the professors in the mathematics, biology and physics departments of three U.S. public universities. Duplicate papers are excluded. Utilizing the same logic as given in [24], preprints are included in this study when they do not duplicate published papers.

Selection of the three United States public universities is based on the following common characteristics as determined directly from each university’s web site.

Total student enrollment is between 25,000 and 30,000, with a primary campus that includes a medical school and hospital. The primary campus is defined as the campus containing at least 70% of the student population.
The basic structure of all three discipline departments is fundamentally the same; in particular, each department has applied faculty who work on medically related mathematics in addition to the general research areas of that discipline.

The study focuses exclusively on collaboration among tenure-track faculty in the same department. Professors who only perform research (and have no teaching responsibilities) are excluded because not all of the departments have such positions. Any professor officially listed on the web as belonging to more than one department is also removed. In all nine departments, some faculty collaborate with medical school personnel as well as with members of their own departments; in such cases, the focus is exclusively on the collaborating authors within the studied department. Faculty membership in a department during the study’s five-year period is determined from various public sources, including the institutional affiliations listed on published papers.

It is assumed that the web sites are accurate and current. This assumption applies both to the university web sites and to those of the nine departments. It is also assumed that professors are accurately listed in their various research areas.

3.2. Data Approach - Representative Papers

As our objective is to analyze the collaboration structure and not the total amount of collaboration activity, only representative papers are utilized. In other words, if two department authors collaborate on 20 papers within the five year period, only one representative paper is recorded for the collaboration. However, if a third author in the department is periodically added to the collaborating team, then a second representative paper is documented. Thus each representative paper has a unique department collaboration authorship.
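The selection rule above can be sketched in code. This is a minimal sketch, not the study's actual procedure. It assumes each paper is supplied as a hypothetical `(paper_id, authors)` pair and that department membership is a known set of names. It keeps the first paper seen for each distinct departmental author set and skips papers with fewer than two department authors.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def representative_papers(
    papers: Iterable[Tuple[str, Set[str]]], department: Set[str]
) -> Dict[FrozenSet[str], str]:
    """Map each distinct departmental author team to one representative paper.

    Hypothetical helper: `papers` yields (paper_id, all_authors) pairs.
    Papers with fewer than two department authors are skipped.
    """
    reps: Dict[FrozenSet[str], str] = {}
    for paper_id, authors in papers:
        team = frozenset(a for a in authors if a in department)
        if len(team) >= 2 and team not in reps:
            reps[team] = paper_id  # first paper seen represents this team
    return reps


# Example: 'x' and 'y' are outside the department.
dept = {"a", "b", "c"}
papers = [
    ("p1", {"a", "b", "x"}),
    ("p2", {"a", "b"}),        # same departmental team as p1: not recorded
    ("p3", {"a", "b", "c"}),   # a third author added: a second representative paper
    ("p4", {"a", "y"}),        # only one department author: excluded
]
print(representative_papers(papers, dept))
```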

3.3. Collaboration Group Size

In this note, the size of a collaboration group is the number of professors from the same department who are authors on a representative paper, not the total number of authors on the actual paper. In almost all instances, the actual number of authors is greater than the number from the same department. Exceptions are typically found in the three mathematics departments, where the mathematicians from the same department are the only authors on the published paper.

4. People, Papers and Graphs

Let $G_{A}$ be a graph that contains only authors, where edges connect authors who collaborate on research papers. Denote a graph based only on representative papers as $G_{P}$, where an edge between two papers indicates that the papers have at least one author in common. In collaboration network studies, the social groups are the papers, so the actual social situation is the bipartite graph $G_{A P}$, constructed with author set A and related paper set P.

4.1. Projection Graphs

Graph $G_{A}$ is the graph discussed in most collaboration network analysis studies. It is a projection graph derived by projecting the author vertices $a \in A$ onto the papers $p \in P$ in the bipartite $G_{A P}$ [28]. Graph $G_{P}$ is the projection graph of the set of p onto the set of a in $G_{A P}$, so $G_{P}$ depicts the structure of the research groups identified by the representative papers in this note. Thus $V (G_{A}) = A \subset V (G_{A P})$ and $V (G_{P}) = P \subset V (G_{A P})$. Figure 3 depicts $G_{A}$, $G_{A P}$ and $G_{P}$ for a few large components based on this study’s collected data. Note that in some, but not all, of the depicted graphs, $G_{A P}$ is generated by subdividing every edge of $G_{A}$. Also note that for two of the graph trios, the $G_{A}$ are isomorphic to $K_{3}$ while their related $G_{A P}$ are quite different, as are the two $G_{P}$.

4.2. Construction of Projection Graphs from CNM

Let a and p be vertices in $V (G_{A P})$, with corresponding vertices $a_{G_{A}} \in V (G_{A})$ and $p_{G_{P}} \in V (G_{P})$. The open neighborhood of $a_{G_{A}}$ (or $p_{G_{P}}$) is the union of the neighborhoods of a’s neighbors in $G_{A P}$, less a (and the same for p), so duplicate vertices are eliminated. For a, let $p_{i} \in N (a)$, where $N (p_{i}) - a$ is a set of authors $a_{j}$ other than a (for p, let $a_{j} \in N (p)$, where $N (a_{j}) - p$ is the set of papers $p_{i}$ other than p). Thus,

$$\deg (a_{G_{A}}) = \left| \bigcup_{p_{i} \in N (a)} N (p_{i}) - a \right|,$$

and similarly for $p_{G_{P}}$. Hence, $N (a_{G_{A}})$ is determined by $N (a)$. Its size is the sum of $\deg (p_{i}) - 1$ over the $p_{i} \in N (a)$, less the number of repeated authors shared among the set of $p_{i}$.
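The following is a minimal sketch of this neighborhood-union construction. It assumes, hypothetically, that a $G_{A P}$ is stored as a Python dictionary mapping each paper to its set of departmental authors. The author projection follows the formula for $\deg (a_{G_{A}})$ directly. Because of the set union, a coauthor shared by several of a's papers is counted only once.

```python
from typing import Dict, Set

GAP = Dict[str, Set[str]]  # hypothetical layout: paper -> departmental authors


def project_authors(gap: GAP) -> Dict[str, Set[str]]:
    """G_A: N(a_GA) is the union of N(p_i) over p_i in N(a), less a."""
    papers_of: Dict[str, Set[str]] = {}
    for p, authors in gap.items():
        for a in authors:
            papers_of.setdefault(a, set()).add(p)
    return {a: set().union(*(gap[p] for p in ps)) - {a} for a, ps in papers_of.items()}


def project_papers(gap: GAP) -> Dict[str, Set[str]]:
    """G_P: two papers are adjacent when they share at least one author."""
    return {p: {q for q in gap if q != p and gap[p] & gap[q]} for p in gap}


gap = {"p1": {"a", "b"}, "p2": {"a", "b", "c"}, "p3": {"c", "d"}}
ga = project_authors(gap)
# (deg(p1) - 1) + (deg(p2) - 1) = 3, but b is shared by p1 and p2,
# so the degree of a in G_A is 2.
print(sorted(ga["a"]))  # ['b', 'c']
```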

Given any $G_{A P}$, its CNM based on vertex nonadjacency can be used to construct $G_{A}$ and $G_{P}$ as follows. For bipartite $G_{A P}$ with partite vertex sets A and P, define a graph $G_{A}$ on vertex set A where, for vertices $a_{i}, a_{j} \in A$, $a_{i} \sim a_{j}$ in $G_{A}$ if there is a nonzero value at entry $c_{i, j}$ of $G_{A P}$’s CNM. A similar graph $G_{P}$ can be defined for vertices $p_{i}, p_{j} \in P$ in $G_{A P}$. The count of the nonzero entries in any row of $G_{A P}$’s CNM then gives the degree of the vertex $a_{G_{A}}$ indexing that row, and similarly for any $p_{G_{P}} \in V (G_{P})$. The order of $G_{A}$ is $| A |$, where $A \subset V (G_{A P})$, and $| G_{P} | = | P |$, where $P \subset V (G_{A P})$. Thus $G_{A}$ and $G_{P}$ are the two projection graphs of $G_{A P}$, derived from $G_{A P}$’s CNM.
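The CNM route can be sketched as follows. This is a hedged illustration with three assumptions. First, the CNM entry $c_{i, j}$ for distinct vertices is taken to be the number of common neighbors of $v_{i}$ and $v_{j}$ in $G_{A P}$, with zeros on the diagonal. Second, author and paper labels are distinct strings. Third, the function names are hypothetical. Authors are indexed first, so the matrix has the block layout described below.

```python
from typing import Dict, List, Set, Tuple


def cnm(gap: Dict[str, Set[str]]) -> Tuple[List[str], int, List[List[int]]]:
    """Common neighbor matrix of bipartite G_AP, with authors indexed before papers."""
    authors = sorted(set().union(*gap.values()))
    papers = sorted(gap)
    nbrs: Dict[str, Set[str]] = {p: set(a) for p, a in gap.items()}
    for a in authors:
        nbrs[a] = {p for p in papers if a in gap[p]}
    order = authors + papers
    matrix = [[0 if u == v else len(nbrs[u] & nbrs[v]) for v in order] for u in order]
    return order, len(authors), matrix


def projections_from_cnm(
    order: List[str], n_authors: int, matrix: List[List[int]]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Nonzero entries in the a x a block give G_A; those in the p x p block give G_P."""
    n = len(order)
    ga = {order[i]: {order[j] for j in range(n_authors) if matrix[i][j]} for i in range(n_authors)}
    gp = {order[i]: {order[j] for j in range(n_authors, n) if matrix[i][j]} for i in range(n_authors, n)}
    return ga, gp


gap = {"p1": {"a", "b"}, "p2": {"a", "b", "c"}, "p3": {"c", "d"}}
order, k, m = cnm(gap)
ga, gp = projections_from_cnm(order, k, m)
# The number of nonzero entries in author a's row equals the degree of a in G_A.
print(sum(1 for x in m[order.index("a")] if x))
```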

Due to the bipartite nature of $G_{A P}$, suppose $G_{A P}$ is given a labeling that clearly separates the author set A from the paper set P (such as papers labeled $\geq 100$ and authors $< 100$), and the indices of the CNM are in numeric order. Then the CNM is a block matrix with $a \times a$, $a \times p$, $p \times a$ and $p \times p$ blocks. In a bipartite graph, two vertices with a common neighbor always lie in the same partite set, so an author and a paper never share a neighbor. Hence the $a \times p$ and $p \times a$ blocks in the CNM of a bipartite $G_{A P}$ are zero blocks. The DM of a $G_{A P}$ can also be used. In that case, $G_{A}$ (respectively $G_{P}$) is constructed from the vertex pairs in the $a \times a$ (respectively $p \times p$) block that are at distance 2.
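The distance matrix alternative can be sketched in the same way. Using the same hypothetical paper-to-authors dictionary, the code below runs a breadth-first search from each vertex on one side of $G_{A P}$. It joins two vertices of that side exactly when their distance in $G_{A P}$ is 2, so it should agree with the CNM construction. The helper names are illustrative only.

```python
from collections import deque
from typing import Dict, Set


def adjacency(gap: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets of bipartite G_AP (labels assumed distinct)."""
    adj: Dict[str, Set[str]] = {}
    for p, authors in gap.items():
        adj.setdefault(p, set()).update(authors)
        for a in authors:
            adj.setdefault(a, set()).add(p)
    return adj


def bfs_distances(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """One row of the distance matrix (DM), restricted to reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def project_by_distance_two(gap: Dict[str, Set[str]], side: Set[str]) -> Dict[str, Set[str]]:
    """Projection onto `side` (A or P): join vertices at distance 2 in G_AP."""
    adj = adjacency(gap)
    return {v: {u for u, d in bfs_distances(adj, v).items() if d == 2 and u in side} for v in side}


gap = {"p1": {"a", "b"}, "p2": {"a", "b", "c"}, "p3": {"c", "d"}}
authors = set().union(*gap.values())
print(sorted(project_by_distance_two(gap, authors)["c"]))  # ['a', 'b', 'd']
```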

5. Structural Specifics of the Authors with Papers Graph

The use of representative papers makes the structure of $G_{A P}$ very specific. However, the same concept of representative groups applies to other social group networks, such as actors and films, women and their participation in Southern US social groups [5], or the collaboration structure found in current large research paper databases [24]. Thus, the structure of $G_{A P}$ defined in this note applies to many similar social situations. Note that the only cycles and paths that can be a $G_{A P}$ are the even $C_{n}$ with $n \geq 6$ and the odd $P_{n}$ with $n \geq 3$.
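This observation rests on two constraints. Every paper must have at least two departmental authors, and no two papers may share the same author set. Both can be checked mechanically. The sketch below is illustrative only: the helper names are hypothetical and connectivity is not checked. It encodes $C_{n}$ and $P_{n}$ with alternating author and paper vertices, starting from an author, and tests which small cycles and paths qualify as a $G_{A P}$.

```python
from typing import Dict, Set

GAP = Dict[str, Set[str]]  # hypothetical layout: paper -> departmental authors


def is_valid_gap(gap: GAP) -> bool:
    """At least two department authors per paper, and distinct paper neighborhoods."""
    neighborhoods = [frozenset(authors) for authors in gap.values()]
    return all(len(n) >= 2 for n in neighborhoods) and len(set(neighborhoods)) == len(neighborhoods)


def cycle_gap(n: int) -> GAP:
    """Even C_n with alternating authors a_i and papers p_i."""
    k = n // 2
    return {f"p{i}": {f"a{i}", f"a{(i + 1) % k}"} for i in range(k)}


def path_gap(n: int) -> GAP:
    """P_n starting at an author (a0 p0 a1 p1 ...); a final paper keeps one author."""
    gap: GAP = {}
    for i in range(n // 2):
        gap[f"p{i}"] = {f"a{i}"} | ({f"a{i + 1}"} if 2 * i + 2 < n else set())
    return gap


# C_4 fails because its two papers have the same two authors.
print([n for n in (4, 6, 8, 10) if is_valid_gap(cycle_gap(n))])
# Even paths end in a pendant paper, which is not allowed.
print([n for n in range(2, 8) if is_valid_gap(path_gap(n))])
```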

Projection graphs $G_{A}$ and $G_{P}$ can be either bipartite or non-bipartite. Of the 18 data derived projection graphs, 22% of the $G_{A}$ largest components and 44% of the $G_{P}$ largest components are bipartite.

5.1. Pendants, Degrees and Neighborhoods

A paper in this study must have at least two faculty members as authors to be included in a $G_{A P}$. This requirement and the use of representative papers restrict the possible structures of these graphs. Below is a list of specific structural properties of a $G_{A P}$ large component and its related $G_{A}$ and $G_{P}$. Proofs of the following statements are left to the reader.

1.

Pendant vertices and pendant chains:

(a): In any $G_{A P}$ , only authors can be a pendant vertex.
(b): Any pendant author in a $G_{A P}$ is also pendant in its $G_{A}$ .
(c): Any pendant chain in a $G_{A P}$ includes at least one author and one paper.
(d): Any pendant paper in a $G_{P}$ is not pendant in the related $G_{A P}$ .
(e): Any pendant vertex in a $G_{P}$ must be in a pendant chain with minimum length of 3 in the related $G_{A P}$ .

2.

Degree and Durfee rank: Since only authors can be pendant vertices in a $G_{A P}$, the minimum possible degree for authors is 1, while that for papers is 2.

(a): For the set of paper vertices in a $G_{A P}$ , $2 \leq r_{□} (P) \leq Δ (G_{A P})$ .
(b): For author vertices in a $G_{A P}$ , $1 \leq r_{□} (A) \leq Δ (G_{A P})$ .
(c): $r_{□} (G_{A P})$ can be greater than either $r_{□} (A)$ or $r_{□} (P)$ .
(d): In a $G_{A P}$, vertex $a \in A$ always projects to a $p \in P$ that has at least one neighbor other than a in the $G_{A P}$.
(e): Vertex p can project to an a vertex that has no other neighbor (i.e. a is pendant).

3.

Neighborhoods: Each paper is a representative paper, so all papers in any $G_{A P}$ have unique neighborhoods.

(a): No paper vertex can be a twin of another paper vertex in a $G_{A P}$ .
(b): Author vertices can be in more than one twin pair.
(c): All bipartite $G_{A P}$ are planar due to the distinct neighborhoods of all p vertices.
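The sketch below assumes that the Durfee rank $r_{□}$ of a vertex set is the side length of the Durfee square of its degree sequence, that is, the largest k such that at least k vertices have degree at least k. That is an assumption based on the name. Under it, the rank can be computed as shown. The small hypothetical $G_{A P}$ in the example has paper neighborhoods {a1, a2, a3}, {a1, a4}, {a2, a5} and {a1, a2}. It illustrates statement 2(c): $r_{□} (A) = r_{□} (P) = 2$, while $r_{□} (G_{A P}) = 3$.

```python
from typing import Dict, Iterable, List, Set


def durfee_rank(degrees: Iterable[int]) -> int:
    """Largest k such that at least k of the degrees are >= k (Durfee square side)."""
    d = sorted(degrees, reverse=True)
    k = 0
    while k < len(d) and d[k] >= k + 1:
        k += 1
    return k


def partite_degrees(gap: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Degree sequences of A and P in a G_AP stored as paper -> authors."""
    authors = sorted(set().union(*gap.values()))
    return {
        "A": [sum(a in s for s in gap.values()) for a in authors],
        "P": [len(s) for s in gap.values()],
    }


gap = {"p1": {"a1", "a2", "a3"}, "p2": {"a1", "a4"}, "p3": {"a2", "a5"}, "p4": {"a1", "a2"}}
deg = partite_degrees(gap)
print(durfee_rank(deg["A"]), durfee_rank(deg["P"]), durfee_rank(deg["A"] + deg["P"]))  # 2 2 3
```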

5.2. Degree Projection

Consider the situation depicted in Figure 4, where author a has collaborated with other authors on six representative papers, shown as gray vertices. Every pair of these papers has vertex a as a common neighbor, so the six p vertices are pairwise nonadjacent vertices that share a common neighbor in $G_{A P}$. This gives each of these papers a degree of at least 5 in the related $G_{P}$, and places the six paper vertices in a $K_{6}$ subgraph of $G_{P}$. In other words, a high degree in one of the vertex sets of $G_{A P}$ is projected onto the vertices of the other set in the latter set’s projection graph. The degrees of papers $p_{2}, p_{3}, p_{4}$ and $p_{5}$ increase beyond 5, depending on the degrees of the other author vertices to which each paper is adjacent in $G_{A P}$.

Proposition 2.

Assume connected $G_{A P}$ has partite sets A of authors and P of representative papers, so $Δ (G_{A P}) = \max \{Δ (A), Δ (P)\}$. In the projection of P vertices onto the vertices in A, $Δ (A)$ generates a $K_{Δ (A)}$ subgraph in $G_{P}$; and in the projection of A onto P, $Δ (P)$ generates a $K_{Δ (P)}$ subgraph in $G_{A}$.

Proof.

Given a $G_{A P}$ as described, if $Δ (A) = x$, there exists a vertex $a \in A$ that is adjacent to x vertices $p \in P$. Call this set of x paper vertices $P_{x}$. Since all members of $P_{x}$ are adjacent to a, every pair of vertices in $P_{x}$ has a as a common neighbor. So each pair of vertices in $P_{x}$ has a nonzero value at its intersection element in $G_{A P}$’s CNM. Hence, there is a $K_{x}$ subgraph in $G_{P}$. The same reasoning applies to $Δ (P) = y$, producing a $K_{y}$ subgraph in $G_{A}$. □
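Proposition 2 is easy to spot-check numerically. The sketch below uses the same hypothetical paper-to-authors layout. It locates a maximum degree author and a maximum degree paper, then verifies that their neighborhoods induce complete subgraphs in $G_{P}$ and $G_{A}$ respectively. The first example mirrors the Figure 4 situation, with one author on six papers; the coauthors are invented for illustration.

```python
from itertools import combinations
from typing import Dict, Set


def project(gap: Dict[str, Set[str]]):
    """Return (G_A, G_P, papers per author) as dictionaries of sets."""
    papers_of: Dict[str, Set[str]] = {}
    for p, authors in gap.items():
        for a in authors:
            papers_of.setdefault(a, set()).add(p)
    ga = {a: set().union(*(gap[p] for p in ps)) - {a} for a, ps in papers_of.items()}
    gp = {p: {q for q in gap if q != p and gap[p] & gap[q]} for p in gap}
    return ga, gp, papers_of


def is_clique(graph: Dict[str, Set[str]], vertices) -> bool:
    return all(v in graph[u] for u, v in combinations(vertices, 2))


def proposition_2_holds(gap: Dict[str, Set[str]]) -> bool:
    ga, gp, papers_of = project(gap)
    a_max = max(papers_of, key=lambda a: len(papers_of[a]))  # an author of degree Δ(A)
    p_max = max(gap, key=lambda p: len(gap[p]))              # a paper of degree Δ(P)
    return is_clique(gp, papers_of[a_max]) and is_clique(ga, gap[p_max])


# Author "a" on six representative papers, each with a different coauthor.
star = {f"p{i}": {"a", f"b{i}"} for i in range(1, 7)}
mixed = {"p1": {"a1", "a2", "a3"}, "p2": {"a1", "a4"}, "p3": {"a2", "a5"}, "p4": {"a1", "a2"}}
print(proposition_2_holds(star), proposition_2_holds(mixed))
```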

If the size m of $G_{A P}$ is even, the degree sequence of either A or P (most often P) can consist entirely of 2s. When this occurs, projection yields a graph whose degree sequence is identical to that of the projected vertex set in $G_{A P}$. This is because degree projection creates a set of $K_{2}$ in the projection graph. In particular, when every paper has degree 2, each paper becomes a single $K_{2}$ joining its two authors, so $G_{A}$ has the same degree sequence as A. As an example, consider $C_{6}$ with three $a \in A$ and three $p \in P$. The degree sequences of A and P are identical, both being $[2, 2, 2]$, so $Δ (A) = Δ (P) = 2$. These sequences generate $G_{A}$ and $G_{P}$ isomorphic to $K_{3}$. This does not conflict with maximum degree generating a complete subgraph (Proposition 2), because three $K_{2}$ subgraphs are generated in both $G_{A}$ and $G_{P}$, and $K_{3}$ is also $C_{3}$. In fact, if $G_{A P}$ is an even $C_{n}$ with $n \geq 6$ (required for distinct $N (p)$), then $G_{A}$ and $G_{P}$ are isomorphic cycle graphs, each of order $\frac{n}{2}$, due to the all-2s degree sequences of both partite sets. If $G_{A P}$ is an odd $P_{n}$ with $n \geq 5$, then $| A | = | P | + 1$, and $G_{A}$ and $G_{P}$ are both path graphs due to $Δ (A) = Δ (P) = 2$.
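The cycle and path cases can be confirmed by direct projection. In this sketch, the helper names are hypothetical, and authors and papers alternate along the cycle or path, starting from an author. Projecting an even $C_{n}$ should yield two connected 2-regular graphs (cycles) of order $\frac{n}{2}$. Projecting an odd $P_{n}$ should yield two paths whose orders differ by one.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def cycle_gap(n: int) -> Graph:
    """Even C_n as G_AP: paper p_i joins authors a_i and a_(i+1 mod n/2)."""
    k = n // 2
    return {f"p{i}": {f"a{i}", f"a{(i + 1) % k}"} for i in range(k)}


def path_gap(n: int) -> Graph:
    """Odd P_n as G_AP: a0 p0 a1 p1 ... ending at an author."""
    return {f"p{i}": {f"a{i}", f"a{i + 1}"} for i in range(n // 2)}


def project(gap: Graph):
    """Return (G_A, G_P) as adjacency dictionaries."""
    papers_of: Dict[str, Set[str]] = {}
    for p, authors in gap.items():
        for a in authors:
            papers_of.setdefault(a, set()).add(p)
    ga = {a: set().union(*(gap[p] for p in ps)) - {a} for a, ps in papers_of.items()}
    gp = {p: {q for q in gap if q != p and gap[p] & gap[q]} for p in gap}
    return ga, gp


def degree_sequence(graph: Graph) -> List[int]:
    return sorted((len(nbrs) for nbrs in graph.values()), reverse=True)


def is_connected(graph: Graph) -> bool:
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for w in graph[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(graph)


for n in (6, 8, 10):
    ga, gp = project(cycle_gap(n))
    # Connected and 2-regular means a single cycle of order n/2 (K_3 = C_3 when n = 6).
    print(n, degree_sequence(ga), degree_sequence(gp), is_connected(ga) and is_connected(gp))
for n in (5, 7):
    ga, gp = project(path_gap(n))
    print(n, degree_sequence(ga), degree_sequence(gp))
```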

A compounding effect can occur in the projection graphs. Figure 5 displays two $G_{A P}$, with gray $p \in P$ and white $a \in A$, where each $G_{A P}$ is a $C_{8}$ with two chords. In each case, the A and P degree sequences are identical and the projection graphs are isomorphic. For both $G_{A P}$, $Δ (A) = Δ (P) = 3$. Yet $G_{A P} 1$ on the left has a $G_{A}$ (and isomorphic $G_{P}$) that is $K_{4}$, while $G_{A P} 2$ on the right has a $G_{A}$ (and $G_{P}$) that is $C_{4}$ with a chord. Metric dimension, $\dim (G)$, is discussed in greater detail later; note here that $\dim (G_{A P} 1) = 3$ while $\dim (G_{A P} 2) = 2$. This may seem to contradict Proposition 2, but it does not: in both instances, the projection of each maximum degree 3 vertex produces a $K_{3}$ subgraph in the projection graph. The difference arises from the structure of the neighborhoods, which reflects the different locations of the degree 3 vertices. In $G_{A P} 1$, author vertex b with degree 2 is adjacent to two degree 3 papers, so b is adjacent to a, c and d in $G_{A}$. By contrast, $G_{A P} 2$ contains no degree 2 vertex adjacent to two degree 3 vertices. Thus, its four degree 2 vertices have only two neighbors each in their respective projection graphs. Given the importance of Proposition 2, a large component example derived from the collected data follows.

For vertices $a_{i} \in A$ and $p_{j} \in P$, every edge of $G_{A P}$ joins one author to one paper, so

$$\sum_{i = 1}^{| A |} \deg (a_{i}) = \sum_{j = 1}^{| P |} \deg (p_{j}) = m,$$

where m is the size of the related $G_{A P}$. As with $G_{A P} 1$ in Figure 5, the data derived large component of a $G_{A P}$ in Figure 6 appears to contradict Proposition 2. However, because the degree sum of A equals the degree sum of P, the $K_{4}$ in $G_{A}$ is still much smaller than the $K_{6}$ found in $G_{P}$, due to $Δ (A) = 6$ but $Δ (P) = 3$. This particular $G_{P}$ contains two vertices with degree 8 and two vertices with degree 7, as shown later in Figure 11.

Remark 1.

Define $ϵ (G) = \frac{| E |}{| V |}$ as in [12], and let $δ (G)$ be the minimum degree of G. Excluding isolated vertices, Proposition 1.2.2 in [12] shows that large degree vertices are not scattered among vertices with smaller degrees. In other words, in any graph G with at least one edge, there exists a subgraph H, possibly with $H = G$, such that $δ (H) > ϵ (H) \geq ϵ (G)$. The proof of the proposition in [12] contains the following process. For G with at least one edge, construct an induced subgraph sequence $G = H_{0} \supseteq H_{1} \supseteq \dots$ as follows. Whenever some vertex $v_{i} \in V (H_{i})$ has $\deg (v_{i}) \leq ϵ (H_{i})$, it is deleted, giving $H_{i + 1} = H_{i} - v_{i}$. The process stops when no more $v_{i} \in V (H_{i})$ can be deleted. This results in $ϵ (H_{i + 1}) \geq ϵ (H_{i})$ for all i, and in an induced subgraph that contains the vertices with the larger degrees in G.
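The deletion process in Remark 1 translates directly into code. This sketch assumes a simple undirected graph stored as adjacency sets with at least one edge. It uses exact rational arithmetic (`fractions.Fraction`) so that comparing a degree with ϵ has no rounding issues. The toy graph, $K_{4}$ with a pendant path attached, is invented for illustration.

```python
from fractions import Fraction
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def epsilon(graph: Graph) -> Fraction:
    """ϵ(G) = |E| / |V| for a simple undirected graph."""
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return Fraction(edges, len(graph))


def dense_core(graph: Graph) -> Graph:
    """Delete vertices with degree <= ϵ(H_i) until none remain (process in Remark 1)."""
    if epsilon(graph) == 0:
        raise ValueError("the process requires at least one edge")
    h = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    while True:
        eps = epsilon(h)
        v = next((u for u in h if len(h[u]) <= eps), None)
        if v is None:
            return h  # now δ(H) > ϵ(H) >= ϵ(G)
        for w in h.pop(v):
            h[w].discard(v)


g: Graph = {1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 6}, 6: {5}}
core = dense_core(g)
print(sorted(core), epsilon(g), epsilon(core))
```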

Because the invariant $ϵ (G) = \frac{| E |}{| V |}$ reflects the ratio of graph size to graph order, $ϵ (G) < 1$ for a connected graph indicates a tree (and every forest has $ϵ (G) < 1$). If $ϵ (G) = 1$, then graph size equals graph order; one example is $C_{n}$. For $K_{n}$, $ϵ (K_{n}) = \frac{n - 1}{2}$. So for $K_{2} ≅ P_{2}$, $ϵ (K_{2}) = \frac{1}{2} < 1$, and for $K_{3} ≅ C_{3}$, $ϵ (K_{3}) = 1$.
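The value for $K_{n}$ follows from counting edges, since $K_{n}$ has $\binom{n}{2}$ edges on n vertices:

$$ϵ (K_{n}) = \frac{\binom{n}{2}}{n} = \frac{n (n - 1)}{2 n} = \frac{n - 1}{2}, \qquad ϵ (K_{2}) = \frac{1}{2}, \quad ϵ (K_{3}) = 1.$$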

The situation that produces the closest complete subgraphs in the projection graphs is when at least one of the partite sets of $G_{A P}$ has a degree sequence consisting of all 2s. A vertex $a \in A \subset V (G_{A P})$ can have degree 1, while the minimum degree of $p \in P \subset V (G_{A P})$ is 2. Consequently, if the degree sequence of A is all 2s, then the degree sequence of P is also all 2s, reflecting that $G_{A P}$ is an even cycle graph with $n \geq 6$. In this case, the only complete projection graph occurs when $G_{A P}$ is $C_{6}$ and both projection graphs are $K_{3} ≅ C_{3}$. For even $C_{n}$ with $n \geq 6$, the projection graphs are isomorphic cycles composed of $K_{2}$ subgraphs. If $G_{A P}$ is a subdivided $G_{A}$, then the degree sequence of P in $G_{A P}$ is all 2s and $Δ (A) \geq Δ (P)$. In this situation, the degree sequence of $G_{A}$ is identical to the degree sequence of A in $G_{A P}$, due to the all-2s P degree sequence.

Based on Proposition 2, suppose $Δ (A) = x$ and $Δ (P) = y$ with $x > y$. Can the maximal induced complete subgraph in $G_{A}$ then have greater order than the maximal induced complete subgraph in $G_{P}$? The answer is “yes” for the specific case that follows, where $Δ (A) > Δ (P)$ but the maximal complete subgraph in $G_{A}$ exceeds the order of the maximal complete subgraph in $G_{P}$.

Suppose $G_{A P}$ is a subdivided $G_{A} ≅ K_{| A |}$, where $Δ (A) = x$. The case $x = 3$ poses a “small graph” exception concerning the number of $K_{x}$ subgraphs in $G_{P}$, so consider it first; then $| A | = 4$. Here $G_{A P}$ is the subdivided $K_{4}$ and $G_{A} ≅ K_{4}$, so $G_{A}$ contains four $K_{3}$ subgraphs. $G_{P}$ has $| P | = 6$ and is 4-regular. By Proposition 2 there exist eight $K_{3}$ subgraphs, but these subgraphs do not form a $K_{4}$ or $K_{5}$. Instead, there exist three twin pairs in $G_{P}$. In this case, $3 = r_{□} (G_{A}) < r_{□} (G_{P}) = 4$ and $\dim (G_{A}) = \dim (G_{P})$.

Now, more generally, let $Δ (A) = x > 3$. Then in $G_{A P}$, $| A | = x + 1$, all $a_{i} \in A$ have degree x, and each pair in A shares a single paper. Thus $| P | = \frac{(x + 1) x}{2} = y$. Every p in $G_{P}$ has degree $Δ (G_{P}) = 2 (x - 1)$, and $G_{P}$ has $\frac{x (x + 1) (x - 1)}{2}$ edges. There exist $x + 1$ subgraphs $K_{x}$ in $G_{A} ≅ K_{x + 1}$. Based on Proposition 2, the y vertices $p_{j} \in V (G_{P})$ lie in $K_{x}$ subgraphs, but each p has degree $2 (x - 1)$, which is greater than $x - 1$. To compare these particular $G_{A}$ and $G_{P}$, consider ϵ as given in Remark 1. For $G_{A}$, $ϵ (G_{A}) = \frac{x}{2}$, while $ϵ (G_{P}) = x - 1$, and $\frac{x}{2} < x - 1$ for all $x > 3$. This shows that, proportionally, there are more vertices with larger degrees in $G_{P}$ than in $G_{A}$. However, the placement of the edges in $G_{A P}$ only allows an adjacent p pair to have at most $x - 1$ common neighbors. Thus, the largest complete subgraph in $G_{P}$ is $K_{x}$, which is smaller than $K_{x + 1} ≅ G_{A}$.
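The counts in the two preceding paragraphs can be checked by brute force on small cases. The sketch below is illustrative and limited to small x, because the clique and metric dimension searches are exhaustive. It builds the subdivided $K_{x + 1}$ as a $G_{A P}$, projects it, and reports the quantities needed to compare against the formulas above: order, degree, edge count, ϵ, Durfee rank (taken as the Durfee square side of the degree sequence) and largest clique. For x = 3 it also reports the triangle counts, twin pairs and metric dimensions. All helper names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def subdivided_complete(n: int) -> Graph:
    """G_AP for a subdivided K_n: one two-author paper per pair of authors."""
    return {f"p{i}_{j}": {f"a{i}", f"a{j}"} for i, j in combinations(range(n), 2)}


def project(gap: Graph):
    """Return (G_A, G_P) as adjacency dictionaries."""
    papers_of: Dict[str, Set[str]] = {}
    for p, authors in gap.items():
        for a in authors:
            papers_of.setdefault(a, set()).add(p)
    ga = {a: set().union(*(gap[p] for p in ps)) - {a} for a, ps in papers_of.items()}
    gp = {p: {q for q in gap if q != p and gap[p] & gap[q]} for p in gap}
    return ga, gp


def edges(g: Graph) -> int:
    return sum(len(n) for n in g.values()) // 2


def durfee_rank(degrees: List[int]) -> int:
    d = sorted(degrees, reverse=True)
    return max((k for k in range(1, len(d) + 1) if d[k - 1] >= k), default=0)


def is_clique(g: Graph, vs) -> bool:
    return all(v in g[u] for u, v in combinations(vs, 2))


def clique_number(g: Graph) -> int:
    """Exhaustive search from the largest size down; small graphs only."""
    vs = sorted(g)
    for k in range(len(vs), 0, -1):
        if any(is_clique(g, c) for c in combinations(vs, k)):
            return k
    return 0


def triangles(g: Graph) -> int:
    return sum(1 for c in combinations(sorted(g), 3) if is_clique(g, c))


def metric_dimension(g: Graph) -> int:
    """Smallest resolving set size by exhaustive search (connected g assumed)."""
    vs = sorted(g)
    dist = {}
    for s in vs:
        d, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    for k in range(len(vs) + 1):
        for w in combinations(vs, k):
            if len({tuple(dist[x][v] for x in w) for v in vs}) == len(vs):
                return k
    return len(vs)


def summary(x: int) -> Dict[str, object]:
    ga, gp = project(subdivided_complete(x + 1))
    return {
        "|P|": len(gp),
        "deg G_P": {len(n) for n in gp.values()},
        "|E(G_P)|": edges(gp),
        "eps G_A": Fraction(edges(ga), len(ga)),
        "eps G_P": Fraction(edges(gp), len(gp)),
        "r G_A": durfee_rank([len(n) for n in ga.values()]),
        "r G_P": durfee_rank([len(n) for n in gp.values()]),
        "clique G_A": clique_number(ga),
        "clique G_P": clique_number(gp),
    }


for x in (3, 4, 5):
    print(x, summary(x))
ga4, gp4 = project(subdivided_complete(4))
twins = [(u, v) for u, v in combinations(sorted(gp4), 2) if gp4[u] == gp4[v]]
print(triangles(ga4), triangles(gp4), len(twins), metric_dimension(ga4), metric_dimension(gp4))
```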

Also note that, for the two regular degrees, $x < 2 (x - 1)$, so $x = r_{□} (G_{A}) < r_{□} (G_{P}) = 2 (x - 1)$. A graph’s degree structure affects its metric dimension. So, compared to $G_{A}$’s DM, either an equal number or more rows of $G_{P}$’s DM are required to produce unique column combinations. Hence, $\dim (G_{A}) \leq \dim (G_{P})$.

Corollary 1.

Suppose connected $G_{A P}$ has partite sets A of authors and P of representative papers, and $G_{A P}$ is not a subdivided $K_{n}$. If $Δ (A) > Δ (P)$, then a maximal complete subgraph in $G_{A}$ has smaller order than a maximal complete subgraph in $G_{P}$. If $Δ (A) < Δ (P)$, then a maximal complete subgraph in $G_{P}$ has smaller order than a maximal complete subgraph in $G_{A}$.

Proof.

Suppose $Δ (A) > Δ (P)$, where $Δ (A) = x$ and $Δ (P) = y$. Assume that the vertex elimination process described in Remark 1 has been applied to connected $G_{A P}$. This generates a graph $G_{A P}^{*}$ containing only the larger degree vertices in $G_{A P}$. Let $a_{Δ}$ have degree x and $p_{Δ}$ have degree y. To create a maximal situation, let $a_{Δ} \sim p_{Δ}$ in $G_{A P}^{*}$ and $G_{A P}$. Denote the paper neighbors of $a_{Δ}$ by $p_{Δ i}$, where $1 \leq i \leq x$, and the author neighbors of $p_{Δ}$ by $a_{Δ j}$, where $1 \leq j \leq y$. To obtain the maximum degrees of $a_{Δ}$ and $p_{Δ}$ in their respective projection graphs, suppose it were possible that the neighbors of $a_{Δ}$ and $p_{Δ}$ shared no neighbors. Then

$$\deg_{G_{A}} (a_{Δ}) = \left( \sum_{i = 1}^{\deg (a_{Δ})} \deg (p_{Δ i}) \right) - \deg (a_{Δ}) \leq x y - x,$$

since each $\deg (p_{Δ i}) \leq y$, with equality when every $p_{Δ i}$ has degree y. Similarly, the maximum of $\deg_{G_{P}} (p_{Δ})$ is $y x - y$. Since $x > y$, $\deg_{G_{A}} (a_{Δ}) < \deg_{G_{P}} (p_{Δ})$ for all $a_{Δ} \in G_{A}$ and all $p_{Δ} \in G_{P}$. Similar logic shows $\deg_{G_{P}} (p_{Δ}) > \deg_{G_{A}} (a_{Δ})$.

Still assuming that all $a_{Δ}$ are adjacent to all $p_{Δ}$ in $G_{A P}^{*}$ (and $G_{A P}$) with $x > y$, now let the neighbors of $a_{Δ}$ and $p_{Δ}$ share neighbors. Let $N (a_{Δ})$ be a set of $p_{Δ i}$, where $N (p_{Δ i})$ is a set of $a_{Δ}$, so $| N (a_{Δ}) | > | N (p_{Δ}) |$. Thus the probability of shared neighbors for the smaller distinct neighborhoods of the $p_{Δ i}$ is greater than the probability of shared neighbors in the larger, possibly non-distinct, $a_{Δ i}$ neighborhoods. As a result, $\deg (a_{Δ})$ in $G_{A}$ is less than $x y - x$, which is already less than $x y - y$ for $p_{Δ}$. Thus, for $a_{Δ} \in G_{A}$, $| N (a_{Δ}) |$ in $G_{A}$ is less than $| N (p_{Δ}) |$ in $G_{P}$. Because all $p_{Δ i}$ are neighbors of $a_{Δ}$, they form a $K_{Δ (A)}$ subgraph in $G_{P}$, and the same holds for the $a_{Δ i}$ in $G_{A}$. Hence, the maximal complete subgraph in $G_{A}$ is smaller than the maximal complete subgraph in $G_{P}$.

When $x < y$, the larger neighborhoods are distinct. Thus, there is a greater probability of shared neighbors in the smaller $N (a_{Δ i})$, because author vertices can have non-distinct neighborhoods (twins are permitted) and $N (p_{Δ})$ contains more $a_{Δ i}$ in this case. This gives $p_{Δ}$ in $G_{P}$ a degree smaller than $x y - y$, which in this case is smaller than $x y - x$.

First assume $Δ (A) > Δ (P)$ in a $G_{A P}$ that is not a subdivided $K_{n}$. For the sake of contradiction, let $a_{Δ}$ in the associated $G_{A}$ lie in a larger maximal $K_{n}$ subgraph than $p_{Δ}$ in the maximal $K_{n}$ of the related $G_{P}$. This implies that each $p_{Δ}$ has more unique (non-shared) common neighbors than the $a_{Δ}$ in $G_{A P}$’s CNM. However, the probability of this situation in $G_{A P}$’s CNM is zero, because the $N (a_{Δ i})$ have a greater chance of shared neighbors in $G_{A P}$. Now assume that $Δ (A) < Δ (P)$ in $G_{A P}$, and that $a_{Δ}$ in the associated $G_{A}$ lies in a smaller maximal $K_{n}$ subgraph than $p_{Δ}$ in the maximal $K_{n}$ of the related $G_{P}$. Logic similar to the case $Δ (A) > Δ (P)$ again reveals a zero probability of this situation. □
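The bound used at the start of the proof rests on the inequality $\deg_{G_{A}} (a) \leq \sum_{p \in N (a)} (\deg (p) - 1)$. Equality holds exactly when the papers of a share no coauthor other than a. A minimal sketch that compares both sides on two hypothetical examples follows; the helper name is illustrative.

```python
from typing import Dict, Set, Tuple


def author_degree_and_bound(gap: Dict[str, Set[str]], a: str) -> Tuple[int, int]:
    """Return (degree of a in G_A, sum over a's papers of (deg(p) - 1))."""
    papers = [p for p, authors in gap.items() if a in authors]
    coauthors = set().union(*(gap[p] for p in papers)) - {a}
    return len(coauthors), sum(len(gap[p]) - 1 for p in papers)


# x = 3 papers of a, each with y = 3 authors and no shared coauthors: xy - x = 6.
disjoint = {"p1": {"a", "b1", "b2"}, "p2": {"a", "c1", "c2"}, "p3": {"a", "d1", "d2"}}
# Same shape, but b1 also appears on p2, so the bound is not attained.
shared = {"p1": {"a", "b1", "b2"}, "p2": {"a", "b1", "c2"}, "p3": {"a", "d1", "d2"}}
print(author_degree_and_bound(disjoint, "a"), author_degree_and_bound(shared, "a"))
```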

6. Social Aspects of the Data Collected

In this section, the data collected from the three university web sites are examined first. Each of the three disciplines is then discussed, with attention to department chairs as research focal points.

Assortative mixing occurs when members of social groups associate with each other based on specific characteristics. In this study, all professors have specific areas of focus within their general research area, as given on the department web site. As expected, with the exception of two papers, professors collaborate with other professors in the department who share their specific research area. The two exceptions are education papers, on which department professors whose research area is not education coauthor with the education researcher in their department. Although not given in this note, dendrograms based on distances successfully identified clear communities centered on research areas in $G_{A}$ and $G_{A P}$.

6.1. Analysis of University Data

Various characteristics of the three universities are collected from the institutions’ web sites, with averages (means) and standard deviations ($σ$) presented here. The focus is exclusively on each university’s primary campus. The average student to teacher ratio is 15:1 with $σ = 1.5$. The average total student population is 27,136 ($σ = 2,023$). Of the total student population, there is an average of 20,340 undergraduates ($σ = 1,064$), representing 75% of the student body, and 6,795 graduate students ($σ = 2,126$), representing 25%. There is an average of 22,857 full-time students ($σ = 1,219$), or 84%. The average in-state student population is 19,818 students ($σ = 7,639$), or 73%.
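The percentages follow from dividing each mean by the mean total enrollment. The quick check below uses only the means reported above; the category labels are ours.

```python
total = 27_136  # mean total enrollment
means = {"undergraduate": 20_340, "graduate": 6_795, "full-time": 22_857, "in-state": 19_818}
percent = {k: round(100 * v / total) for k, v in means.items()}
print(percent)
# The undergraduate and graduate means sum to 27,135, one short of the mean total,
# presumably because each mean is rounded separately.
print(means["undergraduate"] + means["graduate"])
```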

By discipline, Table 1 displays the mean department size along with the mean number and the mean percent of faculty performing departmental collaboration. Overall, six of the nine department chairs (67%) collaborate within their department.

6.2. Analysis of Discipline Data: Hubs Analysis

Define a hub as any author vertex whose degree is greater than the average department degree plus one standard deviation. A hubs analysis is done for both the nine $G_{A}$ and the nine $G_{A P}$ (with paper degrees excluded), with a focus on department chairs. Professors in a hub position may exert greater influence on research in a department [38].

In any $G_{A P}$, author vertices with larger degree indicate professors involved with a greater number of representative papers, which reflect distinct research groups in the department. In any $G_{A}$, the interpretation of vertices with greater degree is not well defined, as there are two possible meanings. In a $G_{A}$, a high degree author vertex a can reflect a professor associated with representative papers that have a larger number of departmental faculty authors. It can also reflect a professor associated with a large number of representative papers that may each have only a few authors.
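The ambiguity can be made concrete with two invented departments; names and data are hypothetical. In both, author a has degree 4 in $G_{A}$. In the first, a has a single five-author representative paper. In the second, a has four two-author representative papers. Only a’s degree in $G_{A P}$ separates the two cases.

```python
from typing import Dict, Set, Tuple


def degrees_of(gap: Dict[str, Set[str]], a: str) -> Tuple[int, int]:
    """Return (degree of a in G_A, degree of a in G_AP)."""
    papers = [authors for authors in gap.values() if a in authors]
    return len(set().union(*papers) - {a}), len(papers)


one_large_team = {"p1": {"a", "b", "c", "d", "e"}}
many_small_teams = {"p1": {"a", "b"}, "p2": {"a", "c"}, "p3": {"a", "d"}, "p4": {"a", "e"}}
print(degrees_of(one_large_team, "a"), degrees_of(many_small_teams, "a"))
```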

The data in Table 2 give the hubs analysis for the nine data derived $G_{A}$ and for the nine data derived $G_{A P}$. Overall, 33% of the department chairs are hubs in their departments. Any $G_{A P}$ depicts the actual social network, not its $G_{A}$ projection graph. In this analysis, the value defining the hubs is calculated separately for the nine $G_{A}$ and the nine $G_{A P}$. The $G_{A P}$ hub value is based on author degree only (paper degree is excluded). Due to degree projection, 33% of the 18 graphs examined in this analysis display a different set of hubs between the $G_{A}$ and the related $G_{A P}$.
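The following is a sketch of the hub rule under two assumptions: it uses the population standard deviation (`statistics.pstdev`), and it averages only over the authors present in the graph. The study may have used a different convention. The twelve-author department below is invented for illustration. Author degrees in $G_{A}$ come from projection, while $G_{A P}$ author degrees are simply paper counts, and the two versions of the rule pick out different hubs.

```python
from statistics import mean, pstdev
from typing import Dict, Set, Tuple


def hubs(degrees: Dict[str, int]) -> Set[str]:
    """Authors whose degree exceeds the mean plus one (population) standard deviation."""
    values = list(degrees.values())
    cutoff = mean(values) + pstdev(values)
    return {a for a, d in degrees.items() if d > cutoff}


def author_degrees(gap: Dict[str, Set[str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (degrees in G_A, author degrees in G_AP)."""
    ga_deg, gap_deg = {}, {}
    for a in set().union(*gap.values()):
        papers = [s for s in gap.values() if a in s]
        ga_deg[a] = len(set().union(*papers) - {a})
        gap_deg[a] = len(papers)
    return ga_deg, gap_deg


gap = {
    "p1": {"a", "b", "c", "d"}, "p2": {"a", "e", "f", "g"},  # a: two large teams
    "p3": {"h", "i"}, "p4": {"h", "j"}, "p5": {"h", "k"}, "p6": {"h", "l"},  # h: four pairs
}
ga_deg, gap_deg = author_degrees(gap)
print(hubs(ga_deg), hubs(gap_deg))  # {'a'} {'h'}
```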

Using $G_{A P}$ to assess the number of hubs increased the total number of hubs from 22 (see Table 2) to 28, a 27% increase. Although the total number of chairs acting as hubs remained the same, the number shifted from 1 to 0 for physics and from 1 to 2 for mathematics. In comparing the two table sections, notice that there is no difference in the Biology row. An examination of the three data derived biology $G_{A}$ and their related $G_{A P}$ reveals that, in all three cases, each $G_{A P}$ is a subdivided version of its $G_{A}$. Every representative paper then has exactly two departmental authors, so each author’s degree is the same in both graphs.

Focusing exclusively on the structure of $G_{A}$ can give misleading interpretations. This is shown by the different hub analysis results between the $G_{A}$ and the related $G_{A P}$, and by the fact that the interpretation of the large degree structure in a $G_{A}$ is not well defined. These differences can affect an analysis of a network’s evolution over time.

Conjecture 1.

Given an author and paper collaboration network structure, analysis of the projection graphs $G_{A}$ and $G_{P}$, together with the bipartite $G_{A P}$, is required to predict future network links or edges and general network evolution.

7. The Nine Average Graphs

This section covers the average graphs, which are derived from the 27 data derived graphs. For each discipline, the average $G_{A}$, average $G_{A P}$ and average $G_{P}$ are given and only briefly discussed. The discussion focuses on the large component, $L_{G}$, of each average graph, and only the metric dimension of each large component $L_{G}$ is displayed.

An important requirement governs the transition between the average $G_{A}$ and the average $G_{A P}$, and between the average $G_{P}$ and the average $G_{A P}$. Projecting the a in the average $G_{A P}$ onto the p in the average $G_{A P}$ must produce the average $G_{A}$ determined from the calculated average $G_{A}$ parameters. Similarly, projecting the $p \in V (G_{A P})$ onto the $a \in V (G_{A P})$ must generate the calculated average $G_{P}$.

7.1. Authors Only Average Graphs and Analysis

As expected, Figure 7 shows a similar number of components across all of the average graphs. Overall, 50% of the professors engage in departmental collaboration, and average department size differs little across disciplines, as displayed in Table 1. Compared to the results in [24], the percentage of vertices in the large components here is much lower.

In comparing the large components of the three average $G_{A}$ in Figure 7, notice that this component is much smaller for mathematics. The greater connectivity of the physics $G_{A}$ is explained by the fact that the physics $G_{A P}$’s large component in Figure 8 has 5 $K_{3}$ subgraphs, while the other two $G_{A P}$ contain a single $K_{3}$. Although $Δ (G_{A})$ is similar across the three disciplines’ large components, the physics $G_{A}$ has $r_{□} (G_{A}) = 3$, while mathematics and biology have $r_{□} (G_{A}) = 2$. As shown in the figure, for $L_{G_{A}}$, $\dim (G_{A}) = 2$ for all three disciplines.

7.2. Authors with Papers Average Graphs and Analysis

Figure 8 displays the three average $G_{A P}$ for this study. Consider $L_{G_{A P}}$ for the average physics $G_{A P}$. Notice that one author in the physics $L_{G_{A P}}$ has degree 5, while two other authors have degree 4. Since representative papers are used, the relatively high degrees of the physics author vertices indicate that these professors have significant variety in their research group construction. As depicted and previously mentioned, the average physics $G_{A P}$ structure in Figure 8 reflects the three data derived $G_{A P}$. In other words, all of the physics departments in this study display a high degree of variety in their departmental collaboration research groups. The three $G_{A P}$ have distinct $\dim (G_{A P})$ for their $L_{G_{A P}}$.

The total number of papers produced by all considered professors during the study’s time frame was determined. On average, the physics professors produced 3.5 times as many papers as professors in the other two disciplines. In two of the physics departments, the number of representative papers exceeded the number of authors by 150%, due to the variety in research group construction. Could this high level of productivity be due to the departmental collaboration style of the physics professors? Greater variety in research group construction allows for a broader range of skills and knowledge in collaborative research.

7.3. Papers Only Average Graphs and Analysis

Figure 9 clearly reflects the difference in departmental collaboration style between physics and the other two disciplines. Notice that $L_{G_{P}}$ has $\dim (G_{P}) = 1$ for mathematics, $\dim (G_{P}) = 4$ for physics and $\dim (G_{P}) = 3$ for biology.

8. The Degree Diagram and the Durfee Rank

Figure 10 displays a connected $G_{A P}$ along with the degree diagrams for $G_{A P}$’s vertex sets A and P. Although $Δ (A) = Δ (P) = 3$, set A has more vertices with degree 3 than P, so $r_{□} (A) = 3$ and $r_{□} (P) = 2$. One impact of the difference in the two $r_{□}$ is that $\dim (G_{A}) = 2$ and $\dim (G_{P}) = 3 = \dim (G_{A P})$. Can $G_{P}$ reflect the actual collaboration in a department more accurately than the related $G_{A}$? Notice that $| A | < | P |$ in this case.

Consider the large component $L_{G_{A P}}$ from a data derived $G_{A P}$ depicted in Figure 11 (also displayed in Figure 6). At the top far left is the degree diagram of the $G_{A P}$ degree sequence. $G_{A P}$ has $Δ (G_{A P}) = 6$ and $r_{□} (G_{A P}) = 4$. To the right of $G_{A P}$ are the degree diagrams for sets A and P in $G_{A P}$. Although the degree sequence for A is not graphic, each degree diagram clearly displays the degree relationships within each vertex set compared to the other set. Comparing the two degree diagrams, $Δ (A) = 6$ and $r_{□} (A) = 4$, while $Δ (P) = 3$ and $r_{□} (P) = 2$. Note that the degree diagram of $G_{A P}$ is merely the union of the degree diagrams for sets A and P, with $Δ (G_{A P}) = Δ (A)$ and $r_{□} (G_{A P}) = r_{□} (A)$ in this case.

Now examine $G_{A}$ and $G_{P}$ in the lower portion of Figure 11. Here $\Delta(G_{A}) = 5$ and $r_{\square}^{r}(G_{A}) = 3$, reflecting a decrease in these figures compared to those for set A in $G_{AP}$. For $G_{P}$, $\Delta(G_{P}) = 8$ compared to $\Delta(P) = 3$, and $r_{\square}^{p}(G_{P}) = 5$ versus $r_{\square}^{p}(P) = 2$. This significant change is due to degree projection in $G_{AP}$, where $\Delta(A) = 6$ generates a $K_{6}$ subgraph in $G_{P}$. Due to $|A| < |P|$ in $G_{AP}$ and the Pigeonhole Principle (the equal degree sums $\sum \deg(a) = \sum \deg(p)$ are spread over fewer authors than papers), all of the high degree vertices are authors, as shown by $r_{\square}^{o}(G_{AP}) = r_{\square}^{o}(A) = 4$ and $\Delta(P) = 3$. As a result, all of the papers coauthored by the high degree authors gain a combination of those high degrees in $G_{P}$.
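The degree projection described above can be checked on a small example. The following Python sketch builds the two one-mode projections of a bipartite author-paper edge list; the function name `project` and the edge list (one author a1 on six papers, each paper with one further coauthor) are hypothetical illustrations, not the Figure 11 data. It shows an author of degree 6 turning its six papers into a $K_{6}$ in $G_{P}$.

```python
from itertools import combinations


def project(edges, side):
    """One-mode projection of a bipartite author-paper graph.

    edges: iterable of (author, paper) pairs.
    side:  "A" joins two authors that share a paper (G_A);
           "P" joins two papers that share an author (G_P).
    Returns an adjacency dict {vertex: set of neighbors}.
    """
    groups = {}
    for author, paper in edges:
        # Collect the kept vertices around each vertex of the other side.
        key, kept = (paper, author) if side == "A" else (author, paper)
        groups.setdefault(key, set()).add(kept)
    adj = {}
    for members in groups.values():
        for v in members:
            adj.setdefault(v, set())
        # Every pair sharing a neighbor becomes an edge, so a vertex of
        # degree d on the other side contributes a K_d.
        for u, v in combinations(sorted(members), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical G_AP: author a1 wrote p1..p6, and paper p_i has a second
# author a_(i+1), so no paper is a pendant vertex.
edges = [("a1", f"p{i}") for i in range(1, 7)]
edges += [(f"a{i + 1}", f"p{i}") for i in range(1, 7)]

g_P = project(edges, "P")
g_A = project(edges, "A")

papers = [f"p{i}" for i in range(1, 7)]
is_k6 = all(v in g_P[u] for u, v in combinations(papers, 2))
print(is_k6, max(len(n) for n in g_P.values()))  # True 5
print(len(g_A["a1"]))                            # 6
```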

Because metric dimension is used to compare networks, and based on the change in the hubs analysis between some $G_{A}$ and their related $G_{AP}$, the next section examines the change in metric dimension given a $G_{AP}$ and its related $G_{A}$ and $G_{P}$.

9. The Metric Dimension of the Authors with Papers Graph

This section examines changes in the relative distance structure of $G_{A}$, $G_{P}$ and $G_{AP}$, and provides methods for finding $\dim(G_{AP})$. In this note, the metric dimension of a graph G with multiple components is taken to be the sum of the metric dimensions of its components. Although the average graphs all have multiple components, this section focuses exclusively on the changes in a single large component, treated here as a connected graph. First, the changes from $\dim(G_{AP})$ to $\dim(G_{A})$ and to $\dim(G_{P})$ in the average graphs are examined.

9.1. Changes Between the Three Data Derived Average Graphs

For each discipline, the metric dimensions of the large components $L_{G}$ of that discipline's three average graphs are compared. Unless otherwise stated, a reference to $G_{A}$ refers to the large component of $G_{A}$, and the same is true for $G_{AP}$ and $G_{P}$. The comparison begins with mathematics, followed by physics, and concludes with biology.

9.1.1. Metric Dimension in Average Mathematics Graphs:

With respect to the three average mathematics graphs, $\dim(G_{A}) = 2$, $\dim(G_{AP}) = 2$ and $\dim(G_{P}) = 1$. In this case, $\Delta(G_{AP}) = \Delta(P) = 3$, generating a $K_{3}$ in $G_{A}$. Thus the relative distance structures of $G_{A}$ and $G_{AP}$ are similar, and the projection generating $G_{P}$ is not structure preserving with respect to the relative distance structure of $G_{AP}$, even though $d_{G_{P}}(p_{i}, p_{j}) = \frac{1}{2} d_{G_{AP}}(p_{i}, p_{j})$ for $p_{i}, p_{j} \in P$.

9.1.2. Metric Dimension in Average Physics Graphs:

In examining the three average physics graphs, $\dim(G_{A}) = 2$, $\dim(G_{AP}) = 4$ and $\dim(G_{P}) = 4$, with $\Delta(G_{AP}) = \Delta(A) = 5$ producing a $K_{5}$ subgraph in $G_{P}$, where $\Delta(G_{P}) = 7$. In this case the distance structures of $G_{A}$ and $G_{AP}$ are different, while the distance structures of $G_{P}$ and $G_{AP}$ are similar. Here the projection generating $G_{A}$ does not preserve the relative distance structure of $G_{AP}$. This again reflects the importance of paper inclusion.

9.1.3. Metric Dimension in Average Biology Graphs:

The metric dimensions of the three average biology graphs are $\dim(G_{A}) = 2$, $\dim(G_{AP}) = 3$ and $\dim(G_{P}) = 3$. For biology, $\Delta(G_{AP}) = \Delta(A) = 4$, generating a $K_{4}$ in $G_{P}$ with $\Delta(G_{P}) = 5$. Similar to physics, the relative distance structure of $G_{AP}$ is that of $G_{P}$, not $G_{A}$.

9.2. Double Distance and Diameter

As mentioned, $d_{G_{AP}}(a_{i}, a_{j})$ is double $d_{G_{A}}(a_{i}, a_{j})$, and similarly $d_{G_{AP}}(p_{i}, p_{j})$ is double $d_{G_{P}}(p_{i}, p_{j})$. This double distance fact does not carry over to the diameters of $G_{A}$, $G_{P}$ and $G_{AP}$. In all cases, $\mathrm{diam}(G_{AP}) > \mathrm{diam}(G_{A})$ and $\mathrm{diam}(G_{AP}) > \mathrm{diam}(G_{P})$ due to the double distance relationship. In some cases, but not all, when the diameter in $G_{AP}$ is realized by an author to author path, $\mathrm{diam}(G_{AP}) = 2 \cdot \mathrm{diam}(G_{A})$. In all cases, $\frac{\mathrm{diam}(G_{AP})}{2} > \mathrm{diam}(G_{P})$ and $\mathrm{diam}(G_{A}) \geq \mathrm{diam}(G_{P})$, since no paper can be a pendant vertex in $G_{AP}$. Recall that, by affecting the possible distance variety in a graph's DM, diameter impacts the metric dimension of a graph. When the diameter is realized between two authors (a to a) or two papers (p to p), it is even; when it is realized between an author and a paper (a to p), it is odd.
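The double distance relationship and the diameter comparisons can be checked by breadth-first search. The following Python sketch uses a hypothetical $G_{AP}$, the path a1, p1, a2, p2, a3, p3, a4, builds both projections, asserts that every same-set distance in $G_{AP}$ is twice the projected distance, and prints the three diameters. The helper names are illustrative only.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Shortest-path distances from source in an unweighted connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build(edges):
    """Undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def project(bip, keep):
    """Projection onto `keep`: two kept vertices are adjacent when they share
    a neighbor (a common paper, or a common author) in the bipartite graph."""
    return {v: {w for mid in bip[v] for w in bip[mid] if w != v} for v in keep}


def diameter(adj):
    return max(max(bfs(adj, v).values()) for v in adj)


# Hypothetical G_AP: the path a1 - p1 - a2 - p2 - a3 - p3 - a4.
authors, papers = ["a1", "a2", "a3", "a4"], ["p1", "p2", "p3"]
g_AP = build([("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
              ("a3", "p2"), ("a3", "p3"), ("a4", "p3")])
g_A, g_P = project(g_AP, authors), project(g_AP, papers)

# Double distance: d_AP(u, v) = 2 * d_projection(u, v) within each set.
for group, proj in ((authors, g_A), (papers, g_P)):
    for u, v in combinations(group, 2):
        assert bfs(g_AP, u)[v] == 2 * bfs(proj, u)[v]

print(diameter(g_AP), diameter(g_A), diameter(g_P))  # 6 3 2
```

Here the diameter of $G_{AP}$ is realized by the author to author path from a1 to a4, so $\mathrm{diam}(G_{AP}) = 2 \cdot \mathrm{diam}(G_{A}) = 6$, the even value expected for an a to a diameter.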

9.3. Using the DM for Any Graph’s Metric Dimension

For any G with a labeling that generates distinct DM blocks, the blocks can be used to determine $\dim(G)$, resulting in a significant increase in computational efficiency. Finding a resolving set is relatively simple. However, finding a minimal resolving set is NP-hard [15], and an exhaustive search must, in the worst case, explore all candidate subsets of vertices. In terms of a graph's DM, combinations of rows must be compared in order to find a minimal number of rows that resolve G, that is, rows whose entries give every column a unique combination. Thus, being able to find $\dim(G)$ using matrix blocks gives significant computational efficiency. Theorem 1 states that, given a specific block resolving result, only three of the four DM blocks need to be resolved in order to find $\dim(G_{AP})$. The focus of this note is now on methods for finding $\dim(G_{AP})$.
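As a baseline for comparison, the following Python sketch finds $\dim(G)$ directly from a DM by exhaustive search over row subsets of increasing size, stopping at the first subset whose rows give every column a unique combination. It is a generic brute-force method, not the block method of this note, and its running time grows exponentially in the worst case; the function names and test graphs are illustrative.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        # Breadth-first search from s gives unweighted shortest-path lengths.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def metric_dimension(dm):
    """Return (dim(G), rows of one minimal resolving set W) by brute force."""
    n = len(dm)
    if n == 1:
        return 0, ()
    for k in range(1, n):
        for rows in combinations(range(n), k):
            # Column c's combination is its distance vector to the rows in W.
            combos = {tuple(dm[r][c] for r in rows) for c in range(n)}
            if len(combos) == n:
                return k, rows
    raise ValueError("a connected graph on n >= 2 vertices has dim <= n - 1")


def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def path(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}


for name, g in (("P_4", path(4)), ("C_4", cycle(4)), ("K_4", complete(4))):
    print(name, metric_dimension(distance_matrix(g, list(range(4)))))
# P_4 (1, (0,))
# C_4 (2, (0, 1))
# K_4 (3, (0, 1, 2))
```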

9.4. DM Block Resolving

Throughout the rest of this note, it is assumed that any $G_{AP}$ has a distinguishing labeling that generates blocks in its DM. As reflected in the DM blocks for any $G_{AP}$, because $G_{AP}$ is bipartite, distances between vertices in the same partite set are all even, while distances between vertices in different partite sets are odd.

DM block resolving is the process of using the portions of G's distance matrix (DM) rows (or columns) contained in the DM blocks to find the minimum number of rows in each block that produce unique column (or row) combinations.

The phrase "block resolving" is used when it is clear that the blocks are those in a DM. Block resolving can use either rows or columns, but it is critical to make the same choice for all blocks that are resolved. In this note, rows are used for block resolving, with the understanding that columns could be used instead.

All column combinations that contain a zero are unique, with the zero implying that the row index vertex is in a W set for the block. The minimum numbers of rows that give unique column combinations for the four blocks are denoted by $\dim(a \times a)$, $\dim(a \times p)$, $\dim(p \times a)$ and $\dim(p \times p)$.

An even block is a block that contains only even entries, so the $a \times a$ and $p \times p$ blocks are even. Analogously, the $a \times p$ and $p \times a$ blocks are odd blocks. The general terms for the minimum number of rows that resolve a block are $\dim(\mathrm{even})$ and $\dim(\mathrm{odd})$. The term $\dim(\mathrm{block})$ refers to minimally block resolving a general block, either even or odd.
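Block resolving restricts the search to one block at a time. In the following Python sketch, `block_dim(dm, rows, cols)` returns the fewest rows, chosen among the block's row indices, that give each of the block's columns a unique combination; following the convention of Proposition 6, it returns 0 when no choice of rows resolves the block. The example $G_{AP}$ (the path a1, p1, a2, p2, a3, p3, a4, labeled authors first) and the function names are hypothetical, and the four values are arranged as a $2 \times 2$ array $[[\dim(a \times a), \dim(a \times p)], [\dim(p \times a), \dim(p \times p)]]$.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def block_dim(dm, rows, cols):
    """dim(block) for the block with row indices `rows` and column indices `cols`.

    Returns the fewest rows (drawn from `rows`) whose entries give every
    column in `cols` a unique combination, or 0 if the block cannot be
    resolved (the convention used for dim(p x a) = 0 in Proposition 6).
    """
    for k in range(1, len(rows) + 1):
        for chosen in combinations(rows, k):
            combos = {tuple(dm[r][c] for r in chosen) for c in cols}
            if len(combos) == len(cols):
                return k
    return 0


# Hypothetical G_AP: the path a1 - p1 - a2 - p2 - a3 - p3 - a4.
edges = [("a1", "p1"), ("p1", "a2"), ("a2", "p2"), ("p2", "a3"),
         ("a3", "p3"), ("p3", "a4")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Labeling all authors before all papers splits the DM into four blocks.
order = ["a1", "a2", "a3", "a4", "p1", "p2", "p3"]
dm = distance_matrix(adj, order)
A, P = list(range(4)), list(range(4, 7))

box = [[block_dim(dm, A, A), block_dim(dm, A, P)],
       [block_dim(dm, P, A), block_dim(dm, P, P)]]
print(box)  # [[1, 1], [2, 1]]
```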

When using DM block resolving to find $\dim(G_{AP})$, the focus is on the even blocks, as these blocks have row and column indices from a single partite set and, as shown later, these blocks are also related to the projection graphs. The odd blocks are sometimes referred to as block extensions because, literally, these blocks extend the rows of the even blocks by giving the relative distance relationships to the vertices in the other partite set.

Prior to giving four example $G_{AP}$, selected for the variety of their block resolving results and result interpretations, the characteristics of the even and odd blocks are discussed.

9.5. DM Block Characteristics

Reference to a block row refers only to the portion of the DM row contained in the specific block being discussed.

9.5.1. Even Blocks:

Even blocks are always square and symmetric across their diagonal. The size of an even block equals the cardinality of the partite set whose vertices index it. The entries are all even, with a single zero in each row, but the entries in the $a \times a$ block and those in the $p \times p$ block are not necessarily the same sets of even numbers.

Even blocks can contain only even diameters. Because any graph has a single diameter value, an even diameter, together with the zeros, gives an even block greater resolving efficiency by providing greater distance variety than its extension block. An even diameter can appear in the $a \times a$ block only, in the $p \times p$ block only, or in both even blocks.

Recall that when row resolving, the zero indicates that that row’s index vertex is in an ordered W set, so any column combination with a zero is automatically unique. Thus the existence of the single zero automatically reduces the number of column combinations that need to be checked for uniqueness.

9.5.2. Odd Blocks:

Odd blocks are square only when $|A| = |P|$. The entries in the odd blocks are all odd, with 1s indicating the vertices in the open neighborhood of the row's index vertex. Thus, $\Delta(A)$ and $\Delta(P)$ impact the distance variety in the odd blocks, while $r_{\square}(A)$ and $r_{\square}(P)$, including their $r_{\square}$ type, indicate the number of rows that might have a larger number of 1s. The odd blocks are symmetric to each other across the diagonal of the DM, so the two odd blocks contain the same sets of odd integers. In other words, the rows of the $a \times p$ block are the columns of the $p \times a$ block and vice versa.

An odd diameter appears in both odd blocks and increases distance variety. As extensions of the even blocks, the odd blocks can restrict the use of the related even block vertices in a minimal W set for the $G_{AP}$.
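The block characteristics listed above can be verified mechanically for any $G_{AP}$ whose DM is labeled authors first. The following Python sketch checks, on a hypothetical $G_{AP}$ (the 6-cycle a1, p1, a2, p2, a3, p3 with an extra author a4 on paper p1), that the even blocks are symmetric and even with one zero per row, that the odd blocks contain only odd entries, that the 1s in each $a \times p$ row match the author's degree, and that the $a \times p$ block is the transpose of the $p \times a$ block. Function names are illustrative.

```python
from collections import deque


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def block(dm, rows, cols):
    """Sub-matrix of dm with the given row and column indices."""
    return [[dm[r][c] for c in cols] for r in rows]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical G_AP: the 6-cycle a1-p1-a2-p2-a3-p3-a1 plus author a4 on p1.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
         ("a3", "p3"), ("a1", "p3"), ("a4", "p1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

authors, papers = ["a1", "a2", "a3", "a4"], ["p1", "p2", "p3"]
dm = distance_matrix(adj, authors + papers)
A, P = range(len(authors)), range(len(authors), len(authors) + len(papers))

aa, pp = block(dm, A, A), block(dm, P, P)
ap, pa = block(dm, A, P), block(dm, P, A)

for even in (aa, pp):
    assert even == transpose(even)                       # symmetric
    assert all(x % 2 == 0 for row in even for x in row)  # even entries
    assert all(row.count(0) == 1 for row in even)        # one zero per row
assert all(x % 2 == 1 for row in ap + pa for x in row)   # odd entries
assert all(ap[i].count(1) == len(adj[authors[i]]) for i in A)  # 1s = open nbhd
assert ap == transpose(pa)                               # odd blocks mirror
print(len(ap), len(ap[0]))  # 4 3: odd blocks are square only if |A| = |P|
```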

9.6. DM Block Resolving Examples

When block resolving, $\dim(a \times a)$ and $\dim(p \times p)$ always need to be determined. Based on the results of these two metric dimensions, when $\dim(a \times a) \neq \dim(p \times p)$, resolving only one more block is required. Following are four simple example $G_{AP}$ with brief explanations of their block resolving and its interpretation.

The adjacency matrix of a $K_{n}$, $A(K_{n})$, has a distinct recognizable structure of all 1s except for the all-zero diagonal. The matrix $2 \cdot A(K_{n})$ is $A(K_{n})$ with the 1s replaced by 2s. When the $2 \cdot A(K_{n})$ structure is found as a subblock in a DM even block, it is called a $K_{n}$ double subblock. The importance of these subblocks relates back to Proposition 2 and the double distance relationship between $G_{AP}$ and its projection graphs.
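A $K_{n}$ double subblock can be searched for directly. The following Python sketch compares principal subblocks of an even block, from the largest size down, against $2 \cdot A(K_{n})$; the search is exhaustive and only meant for small examples. The hypothetical $G_{AP}$ has one author a1 on four papers p1 to p4, each with one further author, so $\Delta(A) = 4$ and $\Delta(P) = 2$; the function names are illustrative.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def double_kn(n):
    """The matrix 2 * A(K_n): zeros on the diagonal and 2s everywhere else."""
    return [[0 if i == j else 2 for j in range(n)] for i in range(n)]


def largest_double_subblock(dm, ids):
    """Largest subset of an even block's indices whose subblock is 2 * A(K_n)."""
    for n in range(len(ids), 0, -1):
        for sub in combinations(ids, n):
            if [[dm[i][j] for j in sub] for i in sub] == double_kn(n):
                return sub
    return ()


# Hypothetical G_AP: a1 wrote p1..p4 and paper p_i also has author a_(i+1).
edges = [("a1", f"p{i}") for i in range(1, 5)]
edges += [(f"a{i + 1}", f"p{i}") for i in range(1, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

order = [f"a{i}" for i in range(1, 6)] + [f"p{i}" for i in range(1, 5)]
dm = distance_matrix(adj, order)
A, P = list(range(5)), list(range(5, 9))

print(largest_double_subblock(dm, P))  # (5, 6, 7, 8): K_4 double subblock in p x p
print(largest_double_subblock(dm, A))  # (0, 1): K_2 double subblock in a x a
```

This matches the pattern described in Proposition 5: $\Delta(A) = 4$ gives a $K_{4}$ double subblock in $p \times p$, and $\Delta(P) = 2$ gives a $K_{2}$ double subblock in $a \times a$.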

A $\dim(\mathrm{block})$ box is a $2 \times 2$ box that displays the metric dimension of each DM block. Thus, the $\dim(\mathrm{block})$ box displays the relationship among the four $\dim(\mathrm{block})$ values, allowing easier comparison and overall interpretation.

9.6.1. Example 1:

Figure 12 displays a $G_{AP}$ with its DM and $\dim(\mathrm{block})$ box, where $\dim(a \times a) = \dim(a \times p)$ and $\dim(p \times p) = \dim(p \times a)$.

In examining the DM in the center of Figure 12 and the $\dim(\mathrm{block})$ box on the right, the entire rows for a and b have unique column combinations for all vertices in the graph, as reflected by $\dim(a \times a) = 2 = \dim(a \times p)$. There is a $K_{4}$ double subblock in $p \times p$, making $\dim(p \times p) = 3$. The row length in $p \times a$ is 5, but with only two distinct odd values, two rows give at most $2^{2} = 4$ distinct column combinations, so a row length of 5 requires 3 rows for $\dim(p \times a)$. Thus, $\dim(G_{AP}) = 2$ and $W = \{a, b\}$.
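The counting step in Example 1 is a general necessary condition: if every entry of a block takes one of v distinct values, then k rows can produce at most $v^{k}$ distinct column combinations, so at least the smallest k with $v^{k}$ greater than or equal to the number of columns is required. The Python sketch below computes that lower bound only; it does not compute $\dim(\mathrm{block})$ itself, and the function name is illustrative.

```python
def min_rows_lower_bound(num_columns, num_values):
    """Smallest k with num_values ** k >= num_columns.

    A necessary (not sufficient) number of rows for num_columns columns to
    receive distinct combinations when each entry takes one of num_values
    distinct values.
    """
    if num_values < 2:
        raise ValueError("need at least two distinct entry values")
    k = 0
    while num_values ** k < num_columns:
        k += 1
    return k


# Example 1: p x a rows of length 5 with two distinct odd entry values.
print(min_rows_lower_bound(5, 2))  # 3, since 2**2 = 4 < 5 <= 2**3 = 8
```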

9.6.2. Example 2:

Figure 13 displays another $G_{AP}$. Both even blocks contain a $K_{n}$ double subblock, and $\dim(\mathrm{odd})$ does not agree with the $\dim(\mathrm{even})$ of the block for which the odd block is an extension.

In this case, using $\dim(a \times a) = 2$ is not possible for a minimal W in $G_{AP}$, because the number of 1s in the extension block generates $\dim(a \times p) = 3$. Thus, a minimal W consists either of all three a vertices or of three p vertices, the choice of which depends on the rows that resolve $p \times a$. Note that because $\dim(a \times a) \neq \dim(p \times p)$, once these values are known, checking whether the smaller $\dim(\mathrm{even})$ provides $\dim(G_{AP})$ can be done by resolving only the extension block of the even block with the smaller $\dim(\mathrm{even})$. Here, $\dim(G_{AP}) = 3$ and $W = \{2, 3, 4\}$.

9.6.3. Example 3:

Example 3, shown in Figure 14, displays a $G_{AP}$ where $\dim(a \times a) = \dim(p \times p)$ but $\dim(\mathrm{odd})$ is greater than $\dim(\mathrm{even})$.

In this case, a minimal W must be constructed with both a and p vertices; for this example, $W = \{b, 2\}$, where both DM rows contain $\mathrm{diam}(G_{AP}) = 4$. Using three a or three p vertices, as dictated by $\dim(\mathrm{odd})$, resolves $G_{AP}$ but not minimally. Thus, when $\dim(a \times a) = \dim(p \times p)$, all four blocks should be resolved.

9.6.4. Example 4:

As shown in Figure 15, the $G_{AP}$ in Example 4 generates $\dim(a \times a) \neq \dim(p \times p)$ and $\dim(a \times p) \neq \dim(p \times a)$.

This example provides an additional situation where $\dim(a \times a) \neq \dim(p \times p)$ and resolving the extension block of the block with the smaller $\dim(\mathrm{even})$ provides $\dim(G_{AP})$ by ruling out $\dim(a \times a) = 2$. In other words, two a vertices do not resolve $G_{AP}$, but three a vertices give $\dim(G_{AP})$. Thus, $\dim(G_{AP}) = 3$ and $W = \{2, 3, 4\}$. Note that in this case $\dim(G_{AP}) \neq \dim(\mathrm{even})$, and that $r_{\square}^{p}(A) = 2$ and $r_{\square}^{i}(P) = 2$ but $r_{\square}^{o}(G_{AP}) = 3$. As mentioned, there is more to explore in the $r_{\square}$ concept than what is given in this note.

9.7. Relation of the DM Blocks to the Projection Graphs

The following proposition relates the DM blocks to the two projection graphs.

Proposition 3.

For the distance matrix (DM) and DM block resolving of $G_{AP}$ with authors set A and representative papers set P, where $G_{AP}$ has a distinguishing labeling, $\dim(a \times a) = \dim(G_{A})$ and $\dim(p \times p) = \dim(G_{P})$.

Proof.

Because $G_{A}$ is constructed by projecting $G_{AP}$ onto A through the vertices of P, for each $a_{i}, a_{j} \in V(G_{A}) \subset V(G_{AP})$, $\deg_{G_{A}}(a_{i})$ is the number of 2s in row $a_{i}$ of the $a \times a$ block of $G_{AP}$'s DM. In other words, if there is a 2 in the $a \times a$ block at the intersection of row $a_{i}$ and column $a_{j}$, then $a_{i} \sim a_{j}$ in $G_{A}$, because $d_{G_{AP}}(a_{i}, a_{j}) = 2 \cdot d_{G_{A}}(a_{i}, a_{j})$, indicating that $a_{i}$ shares a neighbor p with $a_{j}$ in $G_{AP}$. Thus, if all entries in the $a \times a$ block are multiplied by $\frac{1}{2}$, the resulting block is identical to $G_{A}$'s DM under the same vertex ordering. It follows that a combination of rows that minimally resolves $a \times a$ also minimally resolves $G_{A}$. Hence, $\dim(a \times a) = \dim(G_{A})$. Given $p_{i}, p_{j} \in V(G_{P}) \subset V(G_{AP})$, the same reasoning gives $\dim(p \times p) = \dim(G_{P})$. □
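Proposition 3 can be spot-checked numerically. The following Python sketch uses a hypothetical $G_{AP}$, the 8-cycle a1, p1, a2, p2, a3, p3, a4, p4, confirms that halving the $a \times a$ and $p \times p$ blocks reproduces the DMs of $G_{A}$ and $G_{P}$, and compares each even block's dimension with a brute-force dimension of the projection. A single example illustrates the statement; it is not a proof, and the helper names are illustrative.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def resolve_dim(dm, rows, cols):
    """Fewest rows (from `rows`) giving every column in `cols` a unique combination."""
    for k in range(1, len(rows) + 1):
        for chosen in combinations(rows, k):
            if len({tuple(dm[r][c] for r in chosen) for c in cols}) == len(cols):
                return k
    return 0


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def project(bip, keep):
    """Join two vertices of `keep` that share a neighbor in the bipartite graph."""
    return {v: {w for m in bip[v] for w in bip[m] if w != v} for v in keep}


authors, papers = ["a1", "a2", "a3", "a4"], ["p1", "p2", "p3", "p4"]
cycle8 = ["a1", "p1", "a2", "p2", "a3", "p3", "a4", "p4"]
g_AP = adjacency(zip(cycle8, cycle8[1:] + cycle8[:1]))

dm = distance_matrix(g_AP, authors + papers)
A, P = list(range(4)), list(range(4, 8))
dm_A = distance_matrix(project(g_AP, authors), authors)
dm_P = distance_matrix(project(g_AP, papers), papers)

# Halving an even block gives exactly the DM of the matching projection.
assert [[dm[i][j] // 2 for j in A] for i in A] == dm_A
assert [[dm[i][j] // 2 for j in P] for i in P] == dm_P

idx = list(range(4))
print(resolve_dim(dm, A, A), resolve_dim(dm_A, idx, idx))  # 2 2
print(resolve_dim(dm, P, P), resolve_dim(dm_P, idx, idx))  # 2 2
```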

Suppose that for a $G_{AP}$ and its related $G_{A}$ and $G_{P}$, $\dim(G_{A}) = x$ and $\dim(G_{P}) = y$, where x and y may or may not be equal. Due to the double distance relationship between $G_{AP}$ and its projection graphs, x vertices of A minimally resolve all $a \in A$; but x vertices of A do not necessarily resolve the vertices in P, because the double distance relationship does not apply to distances between A and P. Thus, x vertices do not necessarily minimally resolve $G_{AP}$. Likewise, given that $\dim(G_{P}) = y$, y vertices of P minimally resolve the vertices in P, but do not necessarily resolve set A nor minimally resolve $G_{AP}$.

Proposition 4.

Any minimal resolving set of $G_{A}$ minimally resolves the vertices $a_{i}, a_{j} \in A \subset V(G_{AP})$, but not necessarily the vertices in $P \subset V(G_{AP})$, and so not necessarily $G_{AP}$. Likewise, any minimal resolving set of $G_{P}$ minimally resolves the vertices $p_{i}, p_{j} \in P \subset V(G_{AP})$, but not necessarily the vertices in $A \subset V(G_{AP})$, and so not necessarily $G_{AP}$.

9.8. Complete Graph Double Subblocks

As stated previously, if a 2 is at the intersection of $v_{i}$ and $v_{j}$ in an even DM block, then these vertices are adjacent in the related projection graph. If the same $v_{i}$ and $v_{j}$ are found in a $K_{n}$ double subblock of an even DM block, then $v_{i}$ and $v_{j}$ are found in a $K_{n}$ subgraph of the related projection graph, thus impacting the metric dimension of the projection graph. Proof of the following proposition is given by the gray boxes of the DM in Figure 16.

Proposition 5.

Given the distance matrix (DM) for a $G_{AP}$ with authors set A where $\Delta(A) = x$ and representative papers set P with $\Delta(P) = y$, if $G_{AP}$ is given a distinguishing labeling that is consecutive around its even cycle subgraphs, then a $K_{y}$ double subblock is possible in the $a \times a$ block and a $K_{x}$ double subblock is possible in the $p \times p$ block.

9.9. Existence of Twin Pairs

The existence of twin pairs can affect the possible results for the DM blocks. Figure 16 shows that it is possible to have $\dim(p \times a) = 0$, reflecting the impact of twins on block resolving.

Proposition 6.

Given a $G_{AP}$ with authors set A and representative papers set P, and utilizing row block resolving of $G_{AP}$'s distance matrix (DM) and DM block resolving, $\dim(p \times a) = 0$ if and only if $G_{AP}$ has at least one twin pair.

Proof.

If $G_{AP}$ has at least one pair of twin a vertices, then by the definition of a twin pair, the rows of $a \times p$ indexed by the twin pair have identical entries; hence there exist duplicate columns in the $p \times a$ block, so $\dim(p \times a) = 0$.

When $\dim(p \times a) = 0$, there must exist duplicate columns in this block. Only the vertices in A can be twins, and any twin pair has identical rows in the $a \times p$ block; so twin vertices have identical columns in the $p \times a$ block and $\dim(p \times a) = 0$. If there are no twin vertices, then there are no identical columns in the $p \times a$ block, so it is resolvable and $\dim(p \times a) \neq 0$. □
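Proposition 6 can also be illustrated in code. The Python sketch below treats two vertices as twins when their open neighborhoods coincide (an assumption about the twin definition used here), finds the twin pairs of a hypothetical $G_{AP}$ in which authors a1 and a2 both wrote papers p1 and p2, and shows that the $p \times a$ block cannot be resolved (reported as 0, the convention of this note) while the $a \times p$ block can. The function names are illustrative.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """Distance matrix (DM) of a connected graph; row/column i is order[i]."""
    dm = []
    for s in order:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append([dist[v] for v in order])
    return dm


def block_dim(dm, rows, cols):
    """Fewest rows giving each column a unique combination; 0 if impossible."""
    for k in range(1, len(rows) + 1):
        for chosen in combinations(rows, k):
            if len({tuple(dm[r][c] for r in chosen) for c in cols}) == len(cols):
                return k
    return 0


def twin_pairs(adj, vertices):
    """Pairs of vertices with identical open neighborhoods."""
    return [(u, v) for u, v in combinations(vertices, 2) if adj[u] == adj[v]]


# Hypothetical G_AP: paper p1 = {a1, a2, a3} and paper p2 = {a1, a2, a4}.
edges = [("a1", "p1"), ("a2", "p1"), ("a3", "p1"),
         ("a1", "p2"), ("a2", "p2"), ("a4", "p2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

authors, papers = ["a1", "a2", "a3", "a4"], ["p1", "p2"]
dm = distance_matrix(adj, authors + papers)
A, P = list(range(4)), list(range(4, 6))

print(twin_pairs(adj, authors), twin_pairs(adj, papers))  # [('a1', 'a2')] []
print(block_dim(dm, P, A), block_dim(dm, A, P))            # 0 1
```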

Twin a vertices also generate duplicate column combinations in the $a \times a$ block because, except for the single zero, their rows have identical entries and the $a \times a$ block is symmetric about its diagonal. Because each representative paper has a unique neighborhood, no two papers can be twins. It follows that for a $G_{AP}$, the $a \times p$ block is always resolvable when using row block resolving. When $\dim(p \times a) = 0$, no W for the $G_{AP}$ can contain only p vertices. However, p vertices can be included with a vertices in a minimal W, where $\dim(G_{AP})$ is determined by $\dim(a \times a)$ and $\dim(a \times p)$.

Proposition 7.

Suppose that a $G_{AP}$ has authors set A and representative papers set P, where $G_{AP} \not\cong P_{3}$ and $G_{AP}$ is not a subdivided $K_{n}$. Utilizing $G_{AP}$'s distance matrix (DM) along with DM block resolving, if $\Delta(A) > \Delta(P)$, then $\dim(a \times a) \leq \dim(p \times p)$; and if $\Delta(A) < \Delta(P)$, then $\dim(a \times a) > \dim(p \times p)$.

Proof.

Let $\Delta(A) = x$ and $\Delta(P) = y$. From Proposition 2, when $\Delta(A) \neq \Delta(P)$, $\Delta(A)$ generates at least a $K_{x}$ subgraph in $G_{P}$ and $\Delta(P)$ generates at least a $K_{y}$ subgraph in $G_{A}$. Corollary 1 states that the $K_{n}$ subgraph generated by the smaller maximum partite set degree cannot exceed the $K_{n}$ subgraph generated by the other maximum partite set degree when $G_{AP}$ is not a subdivided $K_{n}$.

When $G_{AP}$ contains twins, then, other than the single zero in each $a \times a$ block row, the rows of twin a vertices contain identical distances that create blocks of identical column combinations where the other a vertices have columns indexed by the twins. Thus, when $x > y$, $\dim(a \times a)$ can exceed the metric dimension of the $K_{y}$ double subblock and reach equality with $\dim(p \times p)$, resulting in $\dim(a \times a) \leq \dim(p \times p)$. When $x < y$, because p vertices cannot be twins, equality does not occur. The $p \times p$ block has at least a $K_{x}$ double subblock while the $a \times a$ block has at least a $K_{y}$ double subblock, so $\dim(a \times a) > \dim(p \times p)$. □

Because metric dimension is defined across all of G's vertices, there exists no "efficiency" that could generate $\dim(G_{AP}) < \min\{\dim(G_{A}), \dim(G_{P})\}$. There is also no inefficiency that could produce $\dim(G_{AP}) > \max\{\dim(a \times a), \dim(a \times p)\}$ or $\dim(G_{AP}) > \max\{\dim(p \times p), \dim(p \times a)\}$. In other words, because $V(G_{AP})$ is the union of A and P, and both $\max\{\dim(a \times a), \dim(a \times p)\}$ and $\max\{\dim(p \times p), \dim(p \times a)\}$ minimally resolve the sets A and P across all of $V(G_{AP})$, $\dim(G_{AP}) \leq \max\{\dim(a \times a), \dim(a \times p)\}$ and $\dim(G_{AP}) \leq \max\{\dim(p \times p), \dim(p \times a)\}$. This proves Lemma 1.

Lemma 1.

Suppose bipartite $G_{AP}$ has authors set A and representative papers set P. Concerning the distance matrix (DM) for $G_{AP}$ and using DM block resolving, if $\dim(a \times a) \neq \dim(p \times p)$, then $\dim(\mathrm{odd}) \leq \max\{\dim(a \times a), \dim(p \times p)\}$.

Theorem 1 states that if $\dim(a \times a) \neq \dim(p \times p)$, then only three DM blocks need to be resolved.

Theorem 1.

For $G_{AP}$ with authors set A ($a \in A$) and representative papers set P ($p \in P$), with respect to block resolving $G_{AP}$'s distance matrix (DM), if $\dim(a \times a) \neq \dim(p \times p)$, then resolving only three blocks determines $\dim(G_{AP})$.

Proof.

Let $\dim(a \times a) = x$ and $\dim(p \times p) = y$, where $x \neq y$. Finding $\dim(G_{AP})$ requires determining $\dim(a \times a)$ and $\dim(p \times p)$. From Lemma 1, both $\dim(a \times p)$ and $\dim(p \times a)$ are at most $\max\{\dim(a \times a), \dim(p \times p)\}$. First let $x < y$. As the goal is to find a minimal value for $\dim(G_{AP})$, resolving $a \times p$ in this case gives the minimal value of a possible W for all vertices in $G_{AP}$. If $\dim(a \times p) \leq \dim(a \times a)$, then $\dim(G_{AP}) = \dim(a \times a)$; and if $\dim(a \times a) < \dim(a \times p) \leq \dim(p \times p)$, then $\dim(G_{AP}) = \dim(a \times p)$. If $a \times p$ contains identical rows, making $\dim(p \times a) = 0$ and indicating the existence of twins in $G_{AP}$, then $\dim(G_{AP}) = \max\{\dim(a \times a), \dim(a \times p)\}$. Now let $x > y$. Because $p \times a$ is the extension of $p \times p$, the same logic as when $x < y$ gives $\dim(G_{AP}) = \max\{\dim(p \times p), \dim(p \times a)\}$. In either case, only three blocks need to be resolved in order to find $\dim(G_{AP})$. □

The following proposition follows from Proposition 5, which shows the existence of the $K_{n}$ double subblocks.

Proposition 8.

Suppose $G_{AP}$ has authors set A and representative papers set P. If $\dim(G_{A}) = \dim(G_{P})$, then $\dim(G_{AP}) = \dim(G_{A}) = \dim(G_{P})$.

Proof.

Suppose $\dim(G_{A}) = \dim(G_{P}) = x$, so $\dim(\mathrm{even}) = x$. Let $x < y$. If $\dim(\mathrm{odd}) = x$, then it is a given that $\dim(G_{AP}) = x$. If $\dim(a \times p) = x$ and $\dim(p \times a) = y$, then $\dim(G_{AP}) = x$ because x vertices of A minimally resolve the graph. It follows that if $\dim(a \times p) = y$ and $\dim(p \times a) = x$, then $G_{AP}$ is minimally resolved by x vertices of P, so $\dim(G_{AP}) = x$. If $\dim(\mathrm{odd}) = y$, then a combination of a and p vertices whose numbers total x can minimally resolve $G_{AP}$, giving $\dim(G_{AP}) = x = \dim(G_{A}) = \dim(G_{P})$. □

Proposition 9 focuses on using maximum degree and $r_{\square}$ to find $\dim(G_{AP})$. Recall that $R_{r_{\square}}$ indicates the degree diagram row indicated by the $r_{\square}$ type and $\ell(R_{r_{\square}})$ denotes the length of this row. The table in Figure 1 is referenced in the following proof of Proposition 9.

Proposition 9.

Assume that $G_{AP}$ has authors set A and representative papers set P. If $\Delta(A) = \Delta(P)$, $r_{\square}(A) = r_{\square}(P)$ and the $r_{\square}$ corner type for both A and P is the same, then $\dim(G_{AP}) = \dim(G_{A}) = \dim(G_{P})$.

Proof.

Assume $\Delta(A) = \Delta(P) = x$. Let $G_{AP} \cong P_{n}$, where $n = |A| + |P|$ and n is odd. Because $\Delta(A) = \Delta(P)$ and $r_{\square}(A) = r_{\square}(P)$ with the same corner type, $|A| \geq 5$ and $|P| \geq 4$, $\Delta(A) = \Delta(P) = 2$, $r_{\square}^{r}(A) = r_{\square}^{r}(P) = 2$, and it is a given that $\dim(G_{AP}) = \dim(G_{A}) = \dim(G_{P}) = 1$.

Recall that for any $G_{AP}$, $\sum \deg(a) = \sum \deg(p)$. Because $\Delta(A) = \Delta(P) = x$, the degree diagrams for both A and P have the same top row. Because $r_{\square}(A) = r_{\square}(P)$ and both $r_{\square}$ have the same corner type, for both diagrams the rows above and below $R_{r_{\square}}$, together with row $R_{r_{\square}}$ itself, have the same general structure, as shown by the table in Figure 1. Deviation from similarity is controlled by the fact that $\sum \deg(a) = \sum \deg(p)$. Thus, A and P must have very similar or identical large degree structures. From Proposition 2, there exists a $K_{x}$ subgraph in the projection graphs. The similarity of the large degree structures forces the degree structures of the projection graphs to deviate from the $K_{x}$ subgraph structure in similar ways. Hence, if $\Delta(A) = \Delta(P)$, $r_{\square}(A) = r_{\square}(P)$ and the $r_{\square}$ corner type for both A and P is the same, then $\dim(G_{AP}) = \dim(G_{A}) = \dim(G_{P})$. □

10. Conclusions

10.1. Conclusions Regarding Social Aspects

Concerning the impact of paper exclusion, compared to studies utilizing large databases, the small data set of 245 professors allowed for detailed department level analysis. To minimize graph size yet reflect the collaboration structure, this study used representative papers that reflected unique research groups within the biology, mathematics and physics departments. Overall, 50% of the professors participated in departmental collaboration during this study’s timeframe.

Utilizing the average graph concept and constructing authors only graphs ($G_{A}$), papers only graphs ($G_{P}$) and bipartite authors with papers graphs ($G_{AP}$), this study found substantial differences when comparing the three average $G_{A}$ large components to the three average $G_{AP}$ large components. An analysis of department research hubs revealed a change of 27% between the $G_{A}$ hubs and the $G_{AP}$ hubs. Based on this fact, the conjecture in Section 6.2 states that excluding papers/research groups might make identifying the model for network evolution impossible.

Similar to other studies, when overall papers were considered, the physics departments generated 3.5 times as many papers as the other two departments. Comparing the three average $G_{AP}$ large components in Figure 8, the physics $G_{AP}$ displays significantly larger author vertex degree, and for two of the three physics departments, the number of representative papers, which align with unique research groups, outnumbered authors by 150%. These two results reflect that the physics professors in this study have greater variety in the construction of their research groups. Could the physics style of departmental collaboration result in higher paper/research output?

Compared to the results in [24], the percentage of vertices in this note's $G_{A}$ large components is much lower. At what investigation level does the order of the large component approach the 60% to 90+% inclusion range found in other studies involving larger data sets? Would exploring institutional collaboration create large components with higher percentages of inclusion?

10.2. Conclusions Regarding Network Analysis

Of the 18 $G_{A}$ and $G_{P}$ largest components constructed from collected data, 33% were bipartite. Results showed that the same $G_{A}$ structure resulted from very different $G_{AP}$ structures. This reflects that interpreting the related $G_{AP}$ from $G_{A}$ is not well defined, reiterating the need for paper inclusion.

This note showed that the distance matrix (DM) of any $G_{AP}$, along with DM block resolving, provides an accurate and more efficient method for finding the metric dimension, $\dim(G_{AP})$, of any $G_{AP}$. Metric dimension comparisons of $G_{A}$ and $G_{P}$ to the related $G_{AP}$ showed that paper structure significantly influences the network's distance metrics, giving greater foundation for the conjecture in Section 6.2. In other words, $\dim(G_{A})$ alone cannot be reliably used to predict $\dim(G_{AP})$.

Although DM block resolving provides an accurate and more computationally efficient method for finding the metric dimension for bipartite graphs, many networks are not bipartite. Can a method be developed, perhaps by a labeling methodology, that creates clear blocks in any graph’s DM?

The challenges faced with paper inclusion in the collaboration studies based on the large databases are significant. As done in this study, utilizing representative papers reduces the number of papers and accurately reflects unique research groups. However, the large number of authors on many papers poses a potential problem in defining the representative papers. Can a co-authorship “core” for papers be identified? Is there some other strategy that allows for paper inclusion without hindering collaboration network analysis due to enormous network order and size?

References

[1] Anderson TR, Hankin RKS, Killworth PD (2008) Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics 76(3):577-588.
[2] Andrews GE, Eriksson K (2004) Integer Partitions. Cambridge Univ. Press, Cambridge.
[3] Anuradha A, Amutha B (2019) A study on metric dimension of some families of graphs. AIP Conference Proceedings 2112(1):1-5.
[4] Baca M, Baskoro ET, Salman ANM, Saputro SW, Suprijanto D (2011) The metric dimension of regular bipartite graphs. Bulletin mathématiques de la Société des sciences mathématiques de Roumanie 54(102):15-28.
[5] Blanchard CG, Becker JV, Bristow AR (1979) Attitudes of Southern Women: Selected Group Comparisons. Psychology of Women Quarterly 1(2):160-171.
[6] Garey MR, Johnson DS (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY.
[7] Cartwright D, Harary F (1956) Structural balance: a generalization of Heider's Theory. The Psychological Review 63(5):277-293.
[8] Chartrand G, Eroh L, Johnson MA, Oellermann OR (2000) Resolvability in graphs and the metric dimension of a graph. Discrete Applied Mathematics 105:99-113.
[9] Chartrand G, Lesniak L, Zhang P (2016) Graphs & Digraphs, 6th Ed. CRC Press, Boca Raton, FL.
[10] Cummins CJ, King RC (1987) Young diagrams, supercharacters of OSp(M/N) and modification rules. Journal of Physics A: Mathematical and General 20:3103-3120.
[11] Diaz J, Pottonen O, Serna M, van Leeuwen EJ (2017) Complexity of metric dimension on planar graphs. Journal of Computer and System Sciences 83:132-158.
[12] Diestel R (2017) Graph Theory, 5th Ed. Springer: Graduate Texts in Mathematics series, Berlin.
[13] Harary F, Norman RZ (1953) Graph theory as a mathematical model in social science. Bulletin de l'Institut de recherches économiques et sociales 26(8).
[14] Harary F, Melter RA (1976) On the metric dimension of a graph. Ars Combinatoria 2:191–195.
[15] Hartung S, Nichterlein A (2013) On the parameterized and approximation hardness of metric dimension. 2013 IEEE Conference on Computational Complexity, Stanford University, Palo Alto, CA, 266–276.
[16] Heinze T, Shapira P, Rogers JD, Senker JM (2009) Organizational and institutional influences on creativity in scientific research. Research Policy 38:610-623.
[17] Janežič D, Miličević A, Nikolić S, Trinajstić N (2015) Graph-Theoretical Matrices in Chemistry, 2nd Ed. CRC Press, Taylor & Francis Group, Boca Raton, FL.
[18] Kyvik S, Reymert I (2017) Research collaboration in groups and networks: differences across academic fields. Scientometrics 113:951-967.
[19] Lovász L (2010) Graphs and Geometry. American Mathematical Society. Volume 65. Providence, RI.
[20] Merris R (2003) Combinatorics, 2nd Ed. Wiley Interscience, John Wiley & Sons, Inc. Hoboken, NJ.
[21] Merris R (2001) Graph Theory. Wiley Interscience, John Wiley & Sons, Inc. Hoboken, NJ.
[22] Merton RK (1968) The Matthew effect in science. Science 159(3810):56-63.
[23] Milgram S (1967) The small-world problem. Psychology Today 1(1, May):61-67.
[24] Newman M (2001) The structure of scientific collaboration networks. PNAS 98(2):404-409.
[25] Girvan M, Newman MEJ (2002) Community structure in social and biological networks. PNAS 99(12):7821-7826.
[26] Newman M (2004) Coauthorship networks and patterns of scientific collaboration. PNAS 101(1):5200-5205.
[27] Newman MEJ (2005) Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46(5):323–351.
[28] Newman M (2018) Networks, 2nd Ed. Oxford University Press, Oxford, England.
[29] Prabhu S, Jeba SR, Stephen S (2025) Metric dimension of a star fan graph. Scientific Reports 15(102):102-108.
[30] de Solla Price DJ (1965) Networks of scientific papers: the pattern of bibliographic references indicates the nature of the scientific front. Science 149(3683):510-515.
[31] Ruscio J, Seaman F, D'Oriano C, Stremlo E, Mahalchik K (2012) Measuring scholarly impact using modern citation-based indices. Measurement 10:123-146.
[32] Sciriha I, Farrugia S (2011) On the spectrum of threshold graphs. ISRN Discrete Mathematics 1-21.
[33] Slater PJ (1975) Leaves of trees. Proc. 6th Southeastern Conference on Combinatorics, Graph Theory, and Computing, Florida Atlantic Univ., Boca Raton, FL. Congressus Numerantium 14:549–559.
Tillquist RC, Frongillo RM, Lladser ME (2023) Getting the lay of the land in discrete space: a survey of metric dimension and its applications. SIAM Review 65(4):919-962. [CrossRef]
Trinajstić N (1992) Chemical Graph Theory. Taylor and Francis, LLC; Boca Raton, FL.
van Rijnsoever FJ, Hessels LK, Vandeberg RIJ (2008) A resource-based view on the interactions of university researchers. Research Policy 37:1255-1266. [CrossRef]
Tapendra BC, Dueck S (2025) The metric dimension of circulant graphs. Opuscula Mathematica 45(1):39-51. [CrossRef]
Wagner CS, Leydesdorff L (2005) Network structure, self-organization, and the growth of international collaboration in science. Research Policy 34:1608-1618. [CrossRef]
Wang J, Tian F, Liu Y, Pang J, Miao L (2023) On graphs of order n with metric dimension n-4. Graphs and Combinatorics 39(29):1-18. [CrossRef]
Watts DJ (2003) Six Degrees: The Science of a Connected Age. W.W. Norton & Co., New York, NY.
Watts DJ, Strogatz SH (1998) Collective dynamics of the ‘small-world’ networks. Nature 3393:440-442. [CrossRef]

Figure 1. Five types of $r_{\square}$ corners and a table with respect to degree diagram row length.
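Figure 1 is built on the degree diagram and its Durfee square. The sketch below makes that object concrete. It assumes the degree diagram is the Ferrers diagram of the degree sequence sorted in non-increasing order. It also assumes the Durfee square is the standard one for a partition: the largest $k \times k$ square in the upper-left corner, that is, $\max\{k : d_k \ge k\}$. The paper's own definition of $r_{\square}$ and of its corner types may differ. The function names `durfee_size` and `degree_diagram` are hypothetical.

```python
def durfee_size(degrees):
    """Side length of the Durfee square of a degree sequence.

    Assumption (hypothetical helper): the degree sequence is treated as an
    integer partition, so the size is max{k : d_k >= k} after sorting the
    positive degrees in non-increasing order.
    """
    d = sorted((x for x in degrees if x > 0), reverse=True)
    k = 0
    # Grow the square while the (k+1)-th longest row still has >= k+1 cells.
    while k < len(d) and d[k] >= k + 1:
        k += 1
    return k


def degree_diagram(degrees):
    """Render the degree diagram, marking Durfee-square cells with '#'."""
    d = sorted((x for x in degrees if x > 0), reverse=True)
    m = durfee_size(d)
    rows = []
    for i, length in enumerate(d):
        rows.append("".join("#" if (i < m and j < m) else "*" for j in range(length)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Degree sequence of a 4-cycle with one chord: (3, 3, 2, 2).
    print(durfee_size([2, 3, 3, 2]))
    print(degree_diagram([2, 3, 3, 2]))
```

Zero-degree vertices are dropped because they contribute no row to the diagram.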


Figure 2. Graph $G$, two resolving sets $W_{i}$ for $G$ and $G$'s distance matrix, $DM$.
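Figure 2 pairs resolving sets with a distance matrix. The sketch below shows how the two connect, following the standard definitions (Harary and Melter; Slater). An ordered set $W = (w_1, \dots, w_k)$ resolves $G$ when the vectors $(d(v, w_1), \dots, d(v, w_k))$ are distinct for all vertices $v$. The metric dimension is the smallest such $|W|$.

The code builds $DM$ by breadth-first search and then tries subsets in increasing size. This exhaustive search is exponential, in line with the NP-hardness of metric dimension. It is only an illustration, not the method used in the paper. All function names are hypothetical, and vertices are assumed to be labelled $0, \dots, n-1$ in a connected graph.

```python
from collections import deque
from itertools import combinations


def distance_matrix(n, edges):
    """All-pairs shortest-path lengths of an unweighted graph via BFS."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    inf = float("inf")  # unreachable pairs (graph assumed connected)
    dm = []
    for s in range(n):
        dist = [inf] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] == inf:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dm.append(dist)
    return dm


def is_resolving(dm, W):
    """True if every vertex has a distinct distance vector to the vertices of W."""
    reps = {tuple(dm[v][w] for w in W) for v in range(len(dm))}
    return len(reps) == len(dm)


def metric_dimension(dm):
    """Brute-force minimum resolving set: returns (size, first set found)."""
    n = len(dm)
    for k in range(0, n + 1):
        for W in combinations(range(n), k):
            if is_resolving(dm, W):
                return k, W
    return n, tuple(range(n))  # not reached: W = all vertices always resolves


if __name__ == "__main__":
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    dm = distance_matrix(4, cycle4)
    print(dm)
    print(metric_dimension(dm))
```

Reading the columns of $DM$ indexed by $W$ gives each vertex's representation directly. This is why the distance matrix is the natural object for checking a candidate resolving set.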


Figure 3. Examples of largest components from data-derived $G_{A}$, $G_{AP}$ and $G_{P}$.
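Figure 3 contrasts three graphs built from the same data. The sketch below derives $G_{A}$ and $G_{P}$ from the bipartite $G_{AP}$, assuming they are the usual simple, unweighted one-mode projections. In $G_{A}$, two authors are adjacent when they share a paper. In $G_{P}$, two papers are adjacent when they share an author.

The code also shows why excluding papers loses information. One joint paper and two joint papers by the same pair of authors produce an identical $G_{A}$, but different $G_{AP}$ and different author degrees in $G_{AP}$. The names `build_graphs` and `degree`, and the `"A…"`/`"P…"` labelling scheme, are hypothetical.

```python
from itertools import combinations


def build_graphs(papers):
    """Return (G_AP, G_A, G_P) as edge sets from {paper id: [author ids]}.

    Assumption: author ids and paper ids are distinct strings, and
    G_A, G_P are simple unweighted projections of the bipartite G_AP.
    """
    g_ap, g_a, g_p = set(), set(), set()
    papers_of = {}  # author id -> set of that author's papers
    for p, authors in papers.items():
        authors = sorted(set(authors))
        for a in authors:
            g_ap.add((a, p))  # author-paper incidence edge
            papers_of.setdefault(a, set()).add(p)
        for a, b in combinations(authors, 2):
            g_a.add((a, b))  # coauthors of paper p
    for ps in papers_of.values():
        for p, q in combinations(sorted(ps), 2):
            g_p.add((p, q))  # papers sharing this author
    return g_ap, g_a, g_p


def degree(edges, v):
    """Number of edges in the edge set incident with vertex v."""
    return sum(1 for e in edges if v in e)


if __name__ == "__main__":
    one_paper = {"P1": ["A1", "A2"]}
    two_papers = {"P1": ["A1", "A2"], "P2": ["A1", "A2"]}
    for data in (one_paper, two_papers):
        g_ap, g_a, g_p = build_graphs(data)
        print(sorted(g_a), degree(g_a, "A1"), degree(g_ap, "A1"), sorted(g_p))
```

In both cases `A1` has degree 1 in $G_{A}$. In $G_{AP}$, however, its degree is 1 or 2, depending on how many papers it coauthored.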


Figure 4. Example with a central author of degree 6 and the adjacent papers (shown in gray).

Figure 5. With respect to Proposition 2, two graphs $G_{AP}$, each shown with its related $G_{A}$, where $G_{P} \cong G_{A}$.


Figure 6. With respect to Proposition 2, a graph $G_{AP}$ with its related $G_{A}$ and $G_{P}$.
