Preprint
Article

This version is not peer-reviewed.

Collaboration: People, Papers, Average Graphs, Durfee Squares and Metric Dimension

Submitted: 16 September 2025

Posted: 19 September 2025


Abstract
Utilizing several methods, this note shows that, in any collaboration network analysis, excluding papers not only loses information but can also lead to incorrect interpretation of network structure, because the interpretation of vertex degree in the authors only graph is not well defined. Because the bipartite authors with papers graph is the actual social network, metric dimension is used to show that the relative distance structure of the bipartite graph is often defined by the structure of the papers, not that of the authors. Because computing metric dimension is NP-hard, methods that increase computational efficiency for the bipartite authors and papers graph are explored. With a departmental collaboration focus, public data for 245 professors from the mathematics, physics and biology departments of three U.S. public universities are analyzed, and network structures are compared using metric dimension. An average graph is defined and, by discipline, average graphs are constructed from the collected data for the authors only structure, for the bipartite authors with papers structure and for papers only. Social analysis of the collected data shows a 27% change in the total number of hubs, along with different professors being identified as hubs, when the authors only graphs are compared to the bipartite graphs, reiterating the need for paper inclusion in any collaboration study.

1. Introduction

Most collaboration network studies examine the authors only structure although it is recognized that some information is lost when papers are excluded. This study shows that excluding papers can lead to misinterpretation of various aspects of the collaboration network structure, possibly making accurate modeling of the network evolution impossible.
Utilizing graphs as sociograms, with a focus on the departmental collaboration of 245 professors, data was collected for three STEM departments at each of three U.S. public universities, where the average department size is similar to that of Zachary’s karate club, which is mentioned in several social network papers [25]. The smaller data set allows collaboration network analysis, such as an analysis of departmental hubs, that larger data sets do not provide. To provide anonymity concerning the selected universities, the concept of an average graph is introduced, with data analysis given both for graphs based on the collected data and for the constructed average graphs. Metric dimension is used to compare the change in relative distance structure when papers are added to the authors only structure; this analysis shows that the authors only structure does not necessarily preserve the relative distance structure of the actual collaboration network. For any bipartite graph $G$, DM block resolving, which utilizes $G$’s distance matrix (DM), provides an accurate but more efficient method for finding the metric dimension of $G$, $\dim(G)$. Denote the bipartite authors and papers graph by $G_{AP}$, the authors only graph by $G_A$ and the papers only graph by $G_P$. Proposition 3 shows that DM block resolving can be utilized to find $\dim(G_A)$ and $\dim(G_P)$. Theorem 1 states that, given a specific result from DM block resolving, only three DM blocks need to be resolved in order to find $\dim(G_{AP})$. Propositions 8 and 9 provide more efficient methods for determining $\dim(G_{AP})$ given certain conditions, such as a comparison of the Durfee rank of a graph. This note appears to be the only social network analysis utilizing metric dimension to compare related social network structures, and the only one utilizing degree diagrams and the related Durfee rank of a graph.
In this study’s nine departments, an average of 50% of the faculty engaged in departmental collaboration between 2019 and 2023. Six of the nine department chairs engaged in departmental collaboration, with three of the six acting as collaboration hubs. Of $G_A$ and the bipartite $G_{AP}$, it is $G_{AP}$ that represents the actual social network. Although large vertex degree is interpreted differently in a $G_A$ than in a $G_{AP}$, comparing the $G_A$ to the $G_{AP}$ on the basis of collaboration hubs results in a 27% increase in the number of hubs, and different authors are identified as the hubs. A large degree in the $G_A$ can indicate two different $G_{AP}$ structures, so the $G_A$ interpretation is not well defined. This emphasizes the importance of paper inclusion and gives foundation for the conjecture in Section 6.2, which states that accurately determining social network evolution models requires paper inclusion.
Although the use of graphs in studying networks preceded their paper, Harary and Norman’s 1953 article connecting graph theory to social network analysis [13] received serious attention. de Solla Price’s 1965 article [30] discusses the network structure of scientific papers based on the papers’ references. Another important social network study was conducted by Milgram in 1967 [23], in which randomly selected individuals were asked to forward a letter to a stock broker in Boston. In 2001 Newman [24] utilized large scientific paper databases to study the collaboration structure of scientific research. Often used to compare networks, metric dimension was introduced independently by Slater in 1975 [33] and by Harary and Melter in 1976 [14]. Metric dimension was originally shown to be NP-complete in [6] but has been more recently shown to be NP-hard [15]. The metric dimension of complete bipartite graphs is given in [8] while [4] discusses $\dim(G)$ for regular bipartite graphs. No exact method has been found for determining the metric dimension of all bipartite graphs. Metric dimension research is an active area, with examples of recent work found in [29,37], while [34] provides a 2023 survey of metric dimension results.
Because this note is written for a variety of possible readers ranging from sociologists to network analysts, explanations are kept as simple as possible although some basic graph theory knowledge is assumed. Section 2 provides graph theory notation and other possibly unfamiliar concepts important to this note. That section concludes with social network concepts and some results of other social network studies. Section 3 gives the data collection methods for this study. Vertex projection graphs based on the bipartite authors with papers graph are discussed in Section 4, while Section 5 discusses the structural specifics of the bipartite authors and papers graph. The social analysis in Section 6 interprets the data collected on departmental collaboration with respect to department chairs who act as hubs.
Since this study takes a close view of mathematics, physics and biology departments, the data analysis presentation focuses on concealment of which universities were used via average graph construction. The nine average graphs given in Section 7 are accompanied by brief discussions. The use of the degree diagram and the Durfee square with its rank as an analysis tool for bipartite graphs is discussed in Section 8. Section 9 covers the metric dimension analysis of the average graphs and methods for determining the metric dimension for the authors and papers bipartite graph. A review of this study’s results and possible future work concludes this paper in Section 10. This section also discusses the challenges presented by paper inclusion in collaboration studies utilizing large databases.

2. Background Information

Whether this note’s readers are sociologists, statistical physicists or mathematicians, this note assumes basic graph theory familiarity as found in [9]. Utilized in many social analysis studies, a sociogram is a graph where vertices reflect people and/or social groups with edges representing vertex relationships. Graphs constructed from the collected data are referred to here as data derived graphs.
The Pigeonhole Principle states that given more pigeons than pigeonholes for the pigeons, and assuming that all of the pigeons find a pigeonhole, at least one pigeonhole contains more than one pigeon.

2.1. Graph Theory Notation

A proper subset $A$ of set $X$ is denoted by $A \subset X$, with set cardinality given by $|X|$ and inclusion of element $x$ in $X$ by $x \in X$. Graph $G$ has vertex set $V(G)$ whose cardinality $|V(G)|$, or simply $n$, is called $G$’s order. Edge set $E(G)$ has cardinality $|E(G)|$, or $m$, which is $G$’s size. A vertex $v$ in a specific graph $G$ is $v \in G$. All $G$ in this note are simple, so multiple edges and loops are excluded. Graph $G$ isomorphic to graph $H$ is denoted by $G \cong H$. The diameter of a connected $G$, $\operatorname{diam}(G)$, is the greatest distance found over all of $G$’s vertex pairs. Distance variety is the set of possible distances as determined by connected $G$’s diameter. For instance, suppose $G$’s diameter is 4. If the distances between a specific $v \in V(G)$ and all other $v_i \in V(G)$ are determined, this set of distances can only include 0 (from $v$ to itself), 1, 2, 3 and 4, since the diameter is the greatest distance in $G$. Not all of these distances necessarily occur for every $v \in V(G)$; instead, these distances are the only possible distances for $G$. For vertices $v_1$ and $v_2$, let $d(v_1, v_2)$ denote the distance between $v_1$ and $v_2$ and let $(v_1, v_2)$ indicate an edge between vertices $v_1$ and $v_2$.
Complete graphs are denoted by $K_n$, where $n$ is the graph order, path graphs by $P_n$ and cycle graphs by $C_n$. Distinctly label the two vertex sets of bipartite $G$ so that each set is clearly distinguished from the other, and call this labeling a distinguishing labeling. The goal of this labeling is to generate block matrices for the bipartite graph where possible. As an example, during this study, the exact number of authors who engaged in departmental collaboration and the exact number of representative papers were initially unknown. However, for each department, it was a given that the number of authors was $< 100$. Thus, when department graphs were constructed, the distinguishing labeling gave authors a label $< 100$ and papers a label $> 100$.
The open neighborhood of vertex $v$ is $N(v)$ while the closed neighborhood is $N[v]$. For $v_1, v_2 \in V(G)$, $v_1$ adjacent to $v_2$ is $v_1 \sim v_2$. Given $v_1$ and $v_2$ with edge $(v_1, v_2)$ between them, if vertex $w$ is added between $v_1$ and $v_2$, replacing $(v_1, v_2)$ with edges $(v_1, w)$ and $(w, v_2)$, then edge $(v_1, v_2)$ is subdivided by $w$. In comparing graphs $G_1$ and $G_2$, if $G_2$ is $G_1$ with every edge subdivided, then $G_2$ is said to be a subdivided $G_1$. A subgraph in $G$ is a collection of vertices and edges of $G$ that form a graph.
Vertex $v$ with degree 1 is denoted by $\deg(v) = 1$, and $v$ is called a pendant vertex. Define a pendant chain in a bipartite graph $G$ as a $P_n$ subgraph of three or more vertices that begins with a pendant vertex and concludes with a vertex incident to at least 3 edges in $G$, with only degree 2 vertices between the pendant vertex and the concluding vertex. The length of the pendant chain is the number of edges between the pendant vertex and the vertex with degree greater than 2. A $P_n$ is considered to have no pendant chains. The maximum degree of a vertex in $V(G)$ is denoted by $\Delta(G)$ and the maximum degree of set $X$ is $\Delta(X)$. The minimum degree in $G$ is $\delta(G)$.

2.2. Distance Matrix and Common Neighbor Matrix

For any $G$ of order $n$ with $v_i, v_j \in V(G)$, the distance matrix DM is the symmetric $n \times n$ matrix indexed by $V(G)$. Element $d_{i,j}$ of DM contains the distance between $v_i$ and $v_j$ in $G$. A DM is only defined for connected graphs or for each connected component of a graph. Each $d_{i,i}$ is zero, reflecting the distance of each vertex to itself. In bipartite graphs, distances between elements of the same partite vertex set are all even while distances between elements of the two distinct partite sets are all odd. Given a distinguishing labeling on bipartite $G_{AP}$ with partite sets $A$ and $P$, $G_{AP}$’s DM has blocks $a \times a$, $p \times p$, $a \times p$ and $p \times a$ where $a \in A$ and $p \in P$. Blocks $a \times a$ and $p \times p$ contain even distances while the distances in the $a \times p$ and $p \times a$ blocks are odd.
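The block structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's procedure: the small $G_{AP}$ below is hypothetical, the helper name `distance_matrix` is an assumption, and the distinguishing labeling follows the convention of author labels below 100 and paper labels above 100.

```python
from collections import deque

def distance_matrix(adj):
    """All-pairs distances of a connected graph, found by BFS from each vertex.

    adj maps each vertex label to the set of its neighbors.
    Returns (labels, dm) with dm[i][j] the distance between labels[i] and labels[j].
    """
    labels = sorted(adj)
    index = {v: i for i, v in enumerate(labels)}
    dm = [[None] * len(labels) for _ in labels]
    for source in labels:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            dm[index[source]][index[target]] = d
    return labels, dm

# Hypothetical G_AP: paper 101 has authors 1 and 2; paper 102 has authors 2 and 3.
edges = [(1, 101), (2, 101), (2, 102), (3, 102)]
adj = {}
for a, p in edges:
    adj.setdefault(a, set()).add(p)
    adj.setdefault(p, set()).add(a)

labels, dm = distance_matrix(adj)
n_authors = sum(1 for v in labels if v < 100)

# Sorted labels place authors first, so the DM splits into blocks.
author_block = [row[:n_authors] for row in dm[:n_authors]]   # even distances
paper_block = [row[n_authors:] for row in dm[n_authors:]]    # even distances
mixed_block = [row[n_authors:] for row in dm[:n_authors]]    # odd distances

print(author_block)
print(paper_block)
print(mixed_block)
```

Because the labels are sorted before the matrix is built, the author rows and columns come first and the four blocks can be read off by slicing.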
Given $G$, define the common neighbor matrix CNM as a symmetric $n \times n$ matrix with indices defined on $V(G)$ where element $c_{i,j} \in \mathrm{CNM}$ at the intersection of $v_i, v_j \in V(G)$ is the number of common neighbors that $v_i$ shares with $v_j$. A CNM can be defined to include the common neighbors of vertices that are adjacent and/or those that are not adjacent. In this note, every CNM is defined on nonadjacent vertices. Thus, for nonadjacent $v_i, v_j \in V(G)$, $c_{i,j} = |N(v_i) \cap N(v_j)|$. The CNM based on nonadjacent vertices for any $K_n$ is an all zero matrix. For nonadjacent vertices, if $G$ is bipartite and $c_{i,j}$ is non-zero, then $v_i$ and $v_j$ are in the same partite set. Note that any row (or column) sum can count a vertex more than once, since a vertex can be a common neighbor of $v_i$ and $v_j$ and also a common neighbor of $v_i$ and $v_k$.
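As a hedged sketch of the nonadjacent-vertex CNM, the following Python computes $c_{i,j} = |N(v_i) \cap N(v_j)|$ for a small hypothetical $G_{AP}$. The function name and the example graph are assumptions made for illustration. Setting the diagonal and all entries for adjacent pairs to zero is also an assumption, since the text defines entries only for nonadjacent pairs.

```python
def common_neighbor_matrix(adj):
    """CNM on nonadjacent vertex pairs; adj maps a vertex to its neighbor set."""
    labels = sorted(adj)
    n = len(labels)
    cnm = [[0] * n for _ in range(n)]
    for i, u in enumerate(labels):
        for j, w in enumerate(labels):
            # Diagonal and adjacent pairs are left at zero (assumption).
            if i != j and w not in adj[u]:
                cnm[i][j] = len(adj[u] & adj[w])
    return labels, cnm

# Hypothetical G_AP: authors 1, 2, 3 and papers 101, 102.
g_ap = {1: {101}, 2: {101, 102}, 3: {102}, 101: {1, 2}, 102: {2, 3}}
labels, cnm = common_neighbor_matrix(g_ap)
for label, row in zip(labels, cnm):
    print(label, row)

# For a complete graph every pair is adjacent, so the CNM is all zero.
k3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
_, cnm_k3 = common_neighbor_matrix(k3)
```

In the example, every non-zero entry joins two authors or two papers, in line with the same-partite-set observation for bipartite graphs.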
In Section 4, the CNM of the bipartite authors with papers graph is used to construct projection graphs. Based on common neighbors of nonadjacent vertices, a similar matrix is the κ-th order vertex-adjacency matrix A κ v as given in [17] for κ = 2 . This matrix is a binary matrix with a 1 when v i and v j have distance 2 and a zero otherwise. However, the CNM provides greater flexibility regarding the choice of adjacent or nonadjacent vertices, plus providing the number of common neighbors shared by two vertices.

2.3. Average Graphs

The motivation for developing an average graph derives from the desire for both anonymity in this study yet using graph/network structure images. An average graph is a graph defined on the statistical averages of a parameter set for a collection of graphs. This concept has meaning when the graphs in the collection of graphs are specifically related. Although the departments within each university in this study are related to each other by institution, the average graphs in Section 7 represent the averages of the data derived graphs for each discipline as this aligns with this study’s objective.
The three data derived physics graphs (authors only, authors with papers and papers only graphs) differ significantly from those for mathematics and biology; but the three data derived physics graphs have similar structure to each other. The same can be said for the three data derived mathematics graphs and the three biology graphs.
In constructing this study’s average graphs, average graph order and size are used, along with average diameter, average degree and, as discussed later in this note, average metric dimension. Because many averages are not integers, the range from the truncated value to the rounded value is used as the target in average graph construction. If an average is an integer, then that value is used. Average degree is determined as the sum of the three graphs’ degree sequences divided by the sum of the graph orders, $\pm$ one standard deviation ($\pm\sigma$), with the target range running from the truncated value to the rounded value as with the other averages. For bipartite authors with papers graphs, the average number of papers and the average number of authors are calculated. The average number of components is determined, along with the average number of each component type, such as the average number of $P_3$ components. For the larger components that are distinct, an average large component (or components) is found that is average with respect to order, size, degree, diameter and metric dimension.
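The pooled average degree and the truncation-to-rounded target range can be illustrated as follows. This is only a sketch under stated assumptions: the three degree sequences are invented, the function names are hypothetical, and rounding half up is assumed because the text does not state a rounding rule. The standard deviation step is omitted because the text does not say whether a population or sample deviation is meant.

```python
import math

def pooled_average_degree(degree_sequences):
    """Sum of all degrees divided by the sum of the graph orders."""
    total_degree = sum(sum(seq) for seq in degree_sequences)
    total_order = sum(len(seq) for seq in degree_sequences)
    return total_degree / total_order

def target_range(value):
    """Range from the truncated value to the rounded value (half rounds up)."""
    low = math.trunc(value)
    high = math.floor(value + 0.5)
    return (low, high)

# Hypothetical degree sequences for three graphs of one discipline.
sequences = [[2, 2, 1, 1], [3, 1, 1, 1], [2, 2, 2]]
average = pooled_average_degree(sequences)   # 18 / 11
print(average, target_range(average))
print(target_range(4.0))                     # an integer average is its own target
```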

2.4. Durfee Square

The concept of Young diagrams, and their included Durfee squares, are used in [20,21] with respect to the degree sequence of a graph. In [21,32], these two concepts are utilized in connection to threshold graphs. The Durfee square in [1] is used for h-index enhancement, while in [31], the Durfee square is utilized in measurement of scholarly impact.
A Young diagram (also called a Young tableau and related to the Ferrers diagram) gives a visual image of a partition of an integer into non-negative, non-increasing parts. Imagine the number 4 as a row of four squares reflecting 4+0. Then 3+1 can be visualized as a row of three squares above a row of one square, with a maximal partition integer as the top row and each subsequent row representing a partition integer less than or equal to that of the row above it.
The Durfee square of a Young diagram is the largest square of squares anchored by the upper left square of the diagram. The Durfee rank, $r$, is the number of squares along one edge of the Durfee square, which is also the number of squares along the Durfee square diagonal ($r$ is also called the Frobenius rank [10] or partition trace [20]). In [2], one shape of Young diagram corner is referred to as an inner corner while the other is called an outer corner (the two inline corner images are not reproduced here). Figure 1 displays five types of corners associated with $r$, each of which relays different information regarding the large degree structure of a graph. Let $R_r$ indicate the lowest (from the top) degree diagram row that includes $r$, and let $\ell(R_r)$ denote the length of this row. The row above $R_r$ is row $R_{r-1}$ and the row below $R_r$ is row $R_{r+1}$. The table at the bottom of Figure 1 reflects the relationship of each $r$ corner type with respect to the degree diagram rows $R_{r-1}$, $R_r$ and $R_{r+1}$.
Any graph with at least one edge has a degree sequence that is integral. An integer sequence from which a graph can be constructed is called graphic. In the same manner that 4 can be partitioned as either 2+2 or 3+1, a degree sequence can be viewed as a partition of twice a graph’s size, or $2m$. A degree diagram is a Young diagram of a graph’s degree sequence, or of the degree sequence of a vertex subset, which might not be graphic. Certain requirements exist for an integer sequence to be graphic (see [9,21] for general information). Two conditions that any graphic degree sequence must satisfy are easy to check. First, the number of odd degree vertices must be even. Also, for a degree sequence $X$, $\Delta(X) < |X|$ is required. Let $r(G)$ be the Durfee rank of $G$.
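The two easy checks can be written as a short Python predicate. The function name is hypothetical, and the sketch tests necessary conditions only: a sequence that fails is not graphic, but a sequence that passes may still fail to be graphic. For instance, (3, 3, 1, 1) passes both checks, yet two vertices of degree 3 on four vertices would force every other vertex to have degree at least 2.

```python
def passes_easy_graphic_checks(degrees):
    """Necessary (not sufficient) conditions for a degree sequence to be graphic."""
    odd_count = sum(1 for d in degrees if d % 2 == 1)
    even_number_of_odd_degrees = odd_count % 2 == 0
    max_below_length = max(degrees) < len(degrees)   # Delta(X) < |X|
    return even_number_of_odd_degrees and max_below_length

print(passes_easy_graphic_checks([2, 2, 1, 1]))   # degree sequence of P_4
print(passes_easy_graphic_checks([3, 1, 1]))      # fails both checks
print(passes_easy_graphic_checks([3, 3, 1, 1]))   # passes, but is not graphic
```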
As explained in the next subsection, the larger degrees in $G$ impact $G$’s metric dimension. For $G$, $r(G)$ indicates that there exist at least $r(G)$ vertices that have degree at least $r(G)$, thus giving a minimum value for the large degree structure in $G$. The identification of the $r$ corner conveys additional large degree information as shown by the table in Figure 1.
Isolated vertices $K_1$ are excluded from this note, and they are excluded from the degree diagram concept. Given a vertex subset $X$ of a connected $G$, $r(X)$ is the Durfee rank of $X$’s degree sequence. Although the degree diagram of a vertex subset $X$ can have a single row, any degree diagram of $G$ must have at least two rows.
Consider a bipartite $G_{XY}$ with partite sets $X$ and $Y$. The degree sum of either vertex set is $m = |E(G_{XY})|$; thus, the degree sequences of $X$ and $Y$ are each partitions of $m$. The degree sequence of either set is most likely not graphic, but placing each sequence in a degree diagram provides significant information regarding $G_{XY}$. If the partite degree sequences are placed in degree diagrams, then the visual partition image of one diagram is simply a rearrangement of the squares in the other partition image. Discussed later in this note, Figures 10 and 11 each display a bipartite $G_{AP}$ with the degree diagrams for the two vertex sets $A$ and $P$.
Assume a connected $G$. If $r(G) = 1$, then the second row of the degree diagram must contain a single square, and the first row must contain one or more squares. If the first row contains a single square, then $G$ must be the bipartite $K_2$, which is the star graph $K_{1,n-1}$ with $n = 2$. If the first row contains more than one square, then $G$ is again a star graph. Thus, $r(G) = 1$ if and only if $G \cong K_{1,n-1}$. This proves Proposition 1.
Proposition 1.
A connected $G$ has $r(G) = 1$ if and only if $G$ is a star graph, $K_{1,n-1}$, with $n \geq 2$.
Although complete characterization of simple $G$ based on $r(G)$ is beyond the scope of this note, if $r(G) = 2$, then at least two vertices have degree at least 2, so $\Delta(G) \geq 2$. Graphs with $r(G) = 2$ include $C_n$ and, for $n \geq 4$, $P_n$. There is more to explore in the Durfee square and Durfee rank concepts than what is contained in this note.
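The Durfee rank of a degree sequence is simple to compute, and the following Python sketch (with the hypothetical function name `durfee_rank`) illustrates it on the graph families named above. It assumes the usual definition: the largest $r$ such that the $r$ largest entries of the non-increasing sequence are all at least $r$.

```python
def durfee_rank(degrees):
    """Side length of the Durfee square of the degree diagram of `degrees`."""
    rank = 0
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d >= i:
            rank = i      # row i is long enough to extend the square
        else:
            break
    return rank

print(durfee_rank([4, 1, 1, 1, 1]))   # star K_{1,4}
print(durfee_rank([1, 1]))            # K_2, the star with n = 2
print(durfee_rank([2, 2, 2, 2, 2]))   # cycle C_5
print(durfee_rank([2, 2, 1, 1]))      # path P_4
print(durfee_rank([2, 1, 1]))         # path P_3, which is the star K_{1,2}
```

The same function applies to the degree sequence of a single partite set of a bipartite graph, since it does not require the sequence to be graphic.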

2.5. Metric Dimension

Introduced independently by Slater in [33] (1975) and Harary and Melter in [14] (1976), the metric dimension of a graph has found numerous uses related to the comparison of network structures. If two graphs have the same metric dimension, then, based on relative distance, the two graphs have a similar structure.
Suppose $G$ is a simple graph with $v_i, v_j \in V(G)$, and let $d(v_i, v_j)$ indicate the distance between vertices $v_i$ and $v_j$. Imagine an ordered subset $W$ of vertices in $G$ such that every vertex in $G$ has a unique combination of distances to the members of $W$. Then $W$ is called a resolving set of $G$. The fact that the elements in $W$ are considered to be ordered is critical. The cardinality of a minimum resolving set is the metric dimension of $G$, $\dim(G)$. Set $W$ is a metric generator of $G$, and a $W$ of minimum cardinality is a metric basis of $G$ [8]. There can be more than one resolving set $W$ with minimum cardinality.
As an example, consider the graph given in Figure 2 along with two $W$ sets, $W_1$ and $W_2$. Each vertex $v$ has a distance vector, $r(v|W_i)$ (also known as the metric representation of $v$ or metric code of $v$), that contains the distances from $v$ to each of the members of that particular $W$. $W$ is a resolving set if and only if the distance vectors of all $v$ in $G$ are unique. Any vertex $v$ in $W$ has a unique $r(v|W)$, as a zero is in $v$’s distance vector at $v$’s position in $W$. Both $W_1$ and $W_2$ in the figure are resolving sets of $G$. If vertex 2 is added to either $W$, $W$ is still a resolving set but it is not minimal. However, if any vertex is removed from either $W_1$ or $W_2$, then the distance vectors for the vertices not in the altered $W$ are no longer unique, indicating that both $W_1$ and $W_2$ are minimal resolving sets for $G$; and $G$ has $\dim(G) = |W_i| = 3$ where $i \in \{1, 2\}$. Additional information on the metric dimension of a graph can be found in [8].

2.5.1. Known Metric Dimensions

The metric dimension of some graph families has been determined. All $P_n$ have metric dimension 1, $\dim(K_n) = n - 1$ for all $K_n$ and $\dim(C_n) = 2$. Bipartite star graphs have $\dim(K_{1,n-1}) = n - 2$ for $n \geq 3$. There exist additional graph families with known $\dim(G)$ not mentioned here, and finding the metric dimension remains an active area of research. From Proposition 1, if $r(G) = 1$ and $n \geq 3$, then $G \cong K_{1,n-1}$ and $\dim(G) = n - 2$.

2.5.2. Distance Matrix and Metric Dimension

All $G$ have $\dim(G)$. From a technical linear algebra standpoint, matrix columns represent vector space bases while rows map to the field. With respect to any DM, because a DM is symmetric, the metric dimension of $G$ can be found utilizing either the columns or the rows of $G$’s DM [3]. Rows are utilized in this note because this seems more natural than using columns. The manner in which the DM is used to find the metric dimension of a graph is best explained with an example. Figure 2 contains the distance matrix DM for the displayed $G$. The goal is to find a minimum number of rows such that, within the selected rows, the ordered combination of distances in each column is unique. Any specific $v_i$ indexing a selected DM row is assumed to be in a $W$ set, and the ordered column combinations of a collection of $v_i$ are the distance vectors from the $v_i \in W$ to the $v_j \in V(G)$. Alternatively, if columns are used, the vertices $v_j$ indexing the columns are in $W$ and the ordered row combinations are the distance vectors from the $v_j$ to the $v_i$. Note that any column combination that contains a zero in row $v_i$ is unique, as the zero indicates that the column belongs to $v_i \in W$.
In Figure 2, first notice that the DM row for pendant vertex 1 has a unique number of 1s and 2s compared to the other rows. A unique set of distances is often true for pendant vertices. Second, notice that the row for vertex 2 contains all 1s except for the single zero, so getting unique column combinations with vertex 2’s row is difficult. Let $W = \{1, 2\}$ and consider the DM rows for vertices 1 and 2. For this $W$, the column combinations for vertices 3, 4 and 5 are $r(3|W) = r(4|W) = r(5|W) = (2, 1)$, so $W$ is not a resolving set. Now select the DM rows of 1 and 3, placing 1 and 3 in $W$. Compared to rows 1 and 2, rows 1 and 3 have one less repeated column combination. However, still $r(4|W) = r(5|W) = (2, 1)$. So let either vertex 4 or 5 be in $W$, making $|W| = 3$. Rows 1, 3 and 5 provide the set of unique column combinations that are the unique distance vectors $\{(0, 2, 2), (1, 1, 1), (2, 0, 1), (2, 1, 1), (2, 1, 0)\}$, so $W = \{1, 3, 5\}$ is a resolving set. Comparing all combinations of rows, no set of fewer than three rows is resolving, so $\dim(G) = 3$. Note that the closed neighborhoods of vertices 3, 4 and 5 in Figure 2 are the same.
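The row-selection procedure can be checked by exhaustive search on small graphs. The Python sketch below is a brute-force illustration, not the DM block resolving method of this note, and it is exponential in the graph order. The edge list is an assumption: Figure 2 is not reproduced here, so the graph is reconstructed from the description in the text (vertex 1 pendant on vertex 2, vertex 2 adjacent to every other vertex, and vertices 3, 4 and 5 mutually adjacent). All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Distances from source to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def distance_vector(dist, v, W):
    """r(v|W): ordered distances from v to the members of W."""
    return tuple(dist[w][v] for w in W)

def is_resolving(dist, W):
    """True when every vertex has a distinct distance vector with respect to W."""
    vectors = {distance_vector(dist, v, W) for v in dist}
    return len(vectors) == len(dist)

def metric_dimension(adj):
    """Smallest resolving set found by trying subsets in increasing size."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    vertices = sorted(adj)
    for k in range(1, len(vertices) + 1):
        for W in combinations(vertices, k):
            if is_resolving(dist, W):
                return k, W

# Graph reconstructed from the text's description of Figure 2 (assumption).
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
figure2 = {}
for u, w in edges:
    figure2.setdefault(u, set()).add(w)
    figure2.setdefault(w, set()).add(u)

dist = {v: bfs_distances(figure2, v) for v in figure2}
dim, basis = metric_dimension(figure2)
vectors_135 = [distance_vector(dist, v, (1, 3, 5)) for v in sorted(figure2)]
print(dim, basis)
print(vectors_135)
```

The search returns the first minimum resolving set in lexicographic order, which need not be the set $\{1, 3, 5\}$ used in the text, since several minimum resolving sets can exist.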

2.5.3. Diameter and Metric Dimension

Since the diameter is the greatest distance between any two vertices in $G$, the diameter determines the maximum distance variety over the vertices of $G$. Consider $G$ with diameter 2, so that the order of $G$ is greater than 2. Then, excluding the zero in each DM row and utilizing $\operatorname{diam}(G)$, the only possible distances contained in $G$’s DM are 1 and 2. As the order of $G$ increases, so does the length of the rows in DM. In this subsection only, and excluding zero, let $n$ be the number of distances in $G$’s distance variety and let $r$ be the number of DM rows, so $r$ is also the number of zeros in the $r$ rows. There exist a maximum of $n^r + r$ possible unique ordered combinations. In the example with $\operatorname{diam}(G) = 2$, for distances 1 and 2 and for 2 rows, there exist at most $2^2 + 2 = 6$ unique column combinations. Thus, two rows cannot give the unique column combinations required by $\dim(G)$ if the rows are longer than 6.
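The counting argument gives a quick lower bound on the number of DM rows needed, sketched here in Python with hypothetical function names. The sketch assumes the distance variety excluding zero has exactly $\operatorname{diam}(G)$ values, namely $1, \ldots, \operatorname{diam}(G)$, and it returns only a lower bound: the true metric dimension can be larger.

```python
def max_unique_combinations(n_distances, r_rows):
    """Upper limit n^r + r on distinct column combinations in r DM rows."""
    return n_distances ** r_rows + r_rows

def min_rows_needed(order, diameter):
    """Smallest r with diameter^r + r >= order (a lower bound on dim(G))."""
    r = 1
    while max_unique_combinations(diameter, r) < order:
        r += 1
    return r

print(max_unique_combinations(2, 2))   # the diameter 2, two row example
print(min_rows_needed(6, 2))           # rows of length 6 can still fit in 2 rows
print(min_rows_needed(7, 2))           # rows longer than 6 need a third row
print(min_rows_needed(5, 1))           # diameter 1 is K_5, where the bound is n - 1
```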
Given a fixed graph order, as diameter decreases, the metric dimension tends to increase. For simple connected $G$ of order $n$, if $G$’s diameter is $n - 1$, then $G$ is $P_n$ and $\dim(G) = 1$. If $G$’s diameter is 1, then $\dim(G) = \dim(K_n) = n - 1$.

2.5.4. Degree and Metric Dimension

Vertex degree in $G$ also plays a significant role in $\dim(G)$ because, as degrees generally increase for a $G$ of fixed order, diameter tends to decrease due to more vertices becoming adjacent to each other. As diameter decreases, the variety of distances in $G$’s DM tends to decrease since the number of 1s increases in some rows. As the distance variety decreases, the number of DM rows required for unique column combinations tends to increase. In other words, for a fixed graph order, a general increase in vertex degree also tends to increase metric dimension.

2.5.5. Twin Vertices and Metric Dimension

Given distinct vertices $v_1$ and $v_2$ in $G$, if either $N(v_1) = N(v_2)$ or $N[v_1] = N[v_2]$, then $v_1$ and $v_2$ are twin vertices [19,39], and the distances from $v_1$ and $v_2$ to each other vertex in $G$ are the same. Therefore, either $v_1$ or $v_2$ must be in every resolving set $W$. A twin set can have more than two vertices, all of which have the same set of distances. As noted above, in Figure 2, $N[3] = N[4] = N[5] = \{2, 3, 4, 5\}$, indicating that this vertex set contains three twin pairs. This forces two of the three twins to be in a minimum $W$. In other words, given $x$ twin vertices in the same twin set, at least $x - 1$ of the vertices must be in any resolving set $W$ for $G$. Any $K_n$ has a twin set of cardinality $n$, making $\dim(K_n) = n - 1$ as is known.
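Twin sets can be found by grouping vertices on their open and closed neighborhoods, as in the following Python sketch. The function names are hypothetical, and the five-vertex graph is the reconstruction of Figure 2 from the text's description, which is an assumption. Summing $x - 1$ over the twin sets gives a lower bound on $\dim(G)$, not its exact value.

```python
def twin_sets(adj):
    """Groups of two or more vertices sharing an open or a closed neighborhood."""
    open_groups, closed_groups = {}, {}
    for v, nbrs in adj.items():
        open_groups.setdefault(frozenset(nbrs), []).append(v)
        closed_groups.setdefault(frozenset(nbrs | {v}), []).append(v)
    groups = list(open_groups.values()) + list(closed_groups.values())
    return sorted(sorted(g) for g in groups if len(g) > 1)

def twin_lower_bound(adj):
    """Each twin set of x vertices forces x - 1 of them into any resolving set."""
    return sum(len(g) - 1 for g in twin_sets(adj))

# Reconstruction of the Figure 2 graph (assumption).
figure2 = {1: {2}, 2: {1, 3, 4, 5}, 3: {2, 4, 5}, 4: {2, 3, 5}, 5: {2, 3, 4}}
print(twin_sets(figure2), twin_lower_bound(figure2))

# Star K_{1,3}: the three leaves share the open neighborhood {0}.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(twin_sets(star), twin_lower_bound(star))

# K_4: all four vertices share one closed neighborhood.
k4 = {v: {0, 1, 2, 3} - {v} for v in range(4)}
print(twin_sets(k4), twin_lower_bound(k4))
```

For the Figure 2 reconstruction the bound is 2 while $\dim(G) = 3$, which shows that twins alone do not always determine the metric dimension.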

2.6. Social Network Background

Considered to be social networks, collaboration networks have been one of the most active areas of research for the past couple of decades. In this note, the graphs formed by authors without papers, by authors with papers, and by papers without authors, are discussed since the papers, and their connected research activities, are the social groups. For this study, the assumption is made that departments have a physical existence where professors see each other on a regular basis, giving them the opportunity to discuss their research.
A research group is a collection of collaborating authors within the same organization while a research network includes collaborators from more than one organization [18]. Given these definitions, this note is focused on the departmental research groups that may exist in a research network. Collaboration can lead to a larger number of publications, career advancement and increased access to funds [36].
Milgram’s impactful 1960s study [23] involved 160 random individuals in Nebraska who were requested to forward a letter to one of Milgram’s Boston friends. A requirement to the forwarding was that the letter be sent only to people who the sender knew on a first-name basis. Even though Milgram’s study was on a small scale, Milgram’s requirement of first-name basis has been used to justify utilizing collaboration networks as representative social networks as opposed to film actors in films [24] because it is assumed that coauthors tend to know each other on a first-name basis. As mentioned in [24], some papers have very large laboratories as authors, so a first-name basis seems unlikely in those cases. The social network in any department is undoubtedly one where first names are known among its faculty.
Consider the different environments found in the three disciplines covered by this study. Both physics and biology can have complex physical laboratories while the mathematician’s laboratory is typically paper and pen, or marker and board, or one or more computers. This difference results in a larger total number of collaborators for biology and physics compared to mathematics papers as discussed in [24]. Although a paper may have many authors, the only authors considered in this study are those from the same university department. Any given author team may produce a number of papers; but in this note, only a single paper that represents a distinct author collaboration structure is considered.

3. Data Collection Methods and Approach

This section discusses the data collection methods used in this study with a focus on the use of representative papers. The collected data’s purpose is to generate collaboration network graphs on which metric dimension is used to compare the structures.

3.1. Data Collection Methods

Five years of public information (2019 to 2023) as found on Google Scholar, ResearchGate and Web of Science is used for the professors in the mathematics, biology and physics departments of three U.S. public universities. Duplicate papers are excluded. Utilizing the same logic as given in [24], preprints are included in this study when they do not duplicate published papers.
Selection of the three United States public universities is based on the following common characteristics as determined directly from each university’s web site.
  • Total student enrollment is between 25,000 and 30,000 with a primary campus that includes a medical school and hospital. Primary campus is defined as containing at least 70% of the student population.
  • The basic structure of all three discipline departments is fundamentally the same; so each department has an applied faculty who work on medically related mathematics along with general research areas for that discipline.
Collaboration focuses exclusively on tenure track faculty in the same department. Professors who perform only research (have no teaching responsibilities) are eliminated because not all of the departments have these positions. Any professor officially listed on the web as being in more than one department is removed. In all nine of the departments, some faculty collaborate with both medical school personnel as well as members of their departments. In this case, the focus is exclusively on the collaborating authors within the studied department. Department inclusion of faculty during the study’s five year period is determined from various public sources including institutional reference on published papers.
It is assumed that the web sites are accurate and current. This assumption is applied to both the university web sites as well as those for the nine departments. The assumption is made that professors are accurately listed in the various research areas.

3.2. Data Approach - Representative Papers

As our objective is to analyze the collaboration structure and not the total amount of collaboration activity, only representative papers are utilized. In other words, if two department authors collaborate on 20 papers within the five year period, only one representative paper is recorded for the collaboration. However, if a third author in the department is periodically added to the collaborating team, then a second representative paper is documented. Thus each representative paper has a unique department collaboration authorship.
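The reduction to representative papers can be sketched as follows. This is only an illustrative sketch: the function name `representative_papers`, the sample author labels and the department roster are hypothetical, and the requirement of at least two department authors is taken from the later discussion of the authors with papers graph.

```python
from typing import Iterable


def representative_papers(
    papers: Iterable[Iterable[str]], department: set[str]
) -> list[frozenset[str]]:
    """Keep one representative paper per distinct set of department co-authors."""
    seen: set[frozenset[str]] = set()
    reps: list[frozenset[str]] = []
    for authors in papers:
        # Only authors from the studied department count toward the team.
        team = frozenset(a for a in authors if a in department)
        # A collaboration needs at least two department authors.
        if len(team) >= 2 and team not in seen:
            seen.add(team)
            reps.append(team)
    return reps


# Hypothetical data: X, Y and Z are authors outside the department.
department = {"A", "B", "C"}
raw = [["A", "B", "X"], ["A", "B"], ["A", "B", "C", "Y"], ["C", "Z"]]
reps = representative_papers(raw, department)
```

Here the two papers by A and B collapse to one representative paper, the paper that adds C yields a second one, and the paper with a single department author is dropped.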

3.3. Collaboration Group Size

The collaboration group in this note is the number of professors in the same department who are authors on a representative paper, not the total number of authors on an actual paper. In almost all instances, the actual number of authors is greater than the number who are from the same department. Exceptions to the last statement are typically found in the three math departments where the mathematicians from the same department are the only authors on the published paper.

4. People, Papers and Graphs

Let G A be a graph that contains only authors where edges connect the authors collaborating together on research papers. Denote a graph based only on representative papers as G P where an edge between two papers reflects that the papers have at least one author in common. In collaboration network studies, because the social groups are the papers, the actual social situation is the bipartite graph, G A P , constructed with author set A and related paper set P.

4.1. Projection Graphs

Graph $G_A$ is the graph discussed in most collaboration network analysis studies. This graph is a projection graph derived by projecting the author vertices $a \in A$ onto the papers $p \in P$ in the bipartite $G_{AP}$ [28]. Graph $G_P$ is the projection graph of the set of $p$ onto the set of $a$ in $G_{AP}$; so $G_P$ depicts the structure of the research groups identified by the representative papers in this note. Thus $V(G_A)$ is $A \subset V(G_{AP})$ and $V(G_P)$ is $P \subset V(G_{AP})$. Figure 3 depicts $G_A$, $G_{AP}$ and $G_P$ of a few large components based on this study’s collected data. Note that in some of the depicted graphs, but not all, $G_{AP}$ is generated by subdividing every edge of $G_A$. Also note that for two of the graph trios, the $G_A$ are isomorphic $K_3$ while their related $G_{AP}$ are quite different, as are the two $G_P$.

4.2. Construction of Projection Graphs from CNM

Let $a$ and $p$ be vertices in $V(G_{AP})$, with $a_{G_A} \in V(G_A)$ and $p_{G_P} \in V(G_P)$ the corresponding vertices in the projection graphs. The open neighborhood of $a_{G_A}$ (or $p_{G_P}$) is the set union of the neighborhoods of $a$’s neighbors in $G_{AP}$, less $a$ itself (and the same for $p$), so duplicate vertices are eliminated. For $a$, let $p_i \in N(a)$, where $N(p_i) - a$ is a set of $a_j$ less $a$ (for $p$, let $a_j \in N(p)$, where $N(a_j) - p$ is the set of $p_i$ less $p$). Thus, $\deg(a_{G_A}) = \left| \bigcup_{p_i \in N(a)} N(p_i) - a \right|$, and similarly for $p_{G_P}$. Hence, $N(a_{G_A})$ is based on $N(a)$ and the degree of each $p_i \in N(a)$, less $a$ and less the number of neighbors shared among the set of $p_i$.
Given any $G_{AP}$, its CNM based on vertex nonadjacency can be utilized to construct $G_A$ and $G_P$ as follows. For bipartite $G_{AP}$ with partite vertex sets $A$ and $P$, define a graph $G_A$ on vertex set $A$ where vertices $a_i, a_j \in A$ are adjacent in $G_A$ if there is a non-zero value at entry $c_{i,j}$ of $G_{AP}$’s CNM. A similar graph $G_P$ can be defined for vertices $p_i, p_j \in P$ in $G_{AP}$. The count of the non-zero entries in any row of $G_{AP}$’s CNM then gives the degree of the vertex $a_{G_A}$ indexing the CNM row, and similarly for any $p_{G_P} \in G_P$. The order of $G_A$ is $|A|$ where $A \subset V(G_{AP})$, and $|G_P| = |P|$ where $P \subset V(G_{AP})$. $G_A$ and $G_P$ are the two projection graphs of $G_{AP}$ derived from $G_{AP}$’s CNM.
Thus, due to the bipartite nature of $G_{AP}$, if a distinguishing labeling is given to $G_{AP}$ that clearly separates the set of $a \in A$ from the set of $p \in P$ (such as papers labeled $\ge 100$ and authors labeled $< 100$), and the indices of the CNM are in numeric order, then the CNM is a block matrix with $a \times a$, $a \times p$, $p \times a$ and $p \times p$ blocks. Because an author and a paper can never share a common neighbor in a bipartite graph, the $a \times p$ and $p \times a$ blocks in the CNM for bipartite $G_{AP}$ are all zero blocks. Note that the DM of a $G_{AP}$ can also be used, where $G_A$ (and $G_P$) is constructed from the vertex pairs in the $a \times a$ (respectively $p \times p$) block with distance 2.
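A minimal sketch of this construction follows. It assumes that a CNM entry counts the common neighbors of two distinct, nonadjacent vertices and is zero otherwise; the helper names and the small four-author example are hypothetical and are not taken from the collected data.

```python
def build_adjacency(n_authors, papers):
    """Adjacency matrix of G_AP; authors get indices 0..n_authors-1 and
    papers the indices after them (a distinguishing labeling)."""
    n = n_authors + len(papers)
    adj = [[0] * n for _ in range(n)]
    for k, team in enumerate(papers):
        p = n_authors + k
        for a in team:
            adj[a][p] = adj[p][a] = 1
    return adj


def common_neighbor_matrix(adj):
    """Assumed CNM: entry (i, j) counts the common neighbors of distinct,
    nonadjacent vertices i and j; every other entry is zero."""
    n = len(adj)
    cnm = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and not adj[i][j]:
                cnm[i][j] = sum(adj[i][k] * adj[k][j] for k in range(n))
    return cnm


def block_edges(cnm, lo, hi):
    """Edges of a projection graph read from one diagonal block of the CNM."""
    return {(i, j) for i in range(lo, hi) for j in range(i + 1, hi) if cnm[i][j]}


# Hypothetical example: 4 authors, representative papers {0, 1, 2} and {2, 3}.
n_authors = 4
papers = [{0, 1, 2}, {2, 3}]
cnm = common_neighbor_matrix(build_adjacency(n_authors, papers))

ga_edges = block_edges(cnm, 0, n_authors)                 # authors only graph
gp_edges = block_edges(cnm, n_authors, len(cnm))          # papers only graph
ap_block = [row[n_authors:] for row in cnm[:n_authors]]   # a x p block
deg_a2 = sum(1 for value in cnm[2] if value)              # degree of author 2 in G_A
```

With authors indexed before papers, the two diagonal blocks give the projection graphs directly and the off-diagonal block is all zeros, as described above.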

5. Structural Specifics of the Authors with Papers Graph

The use of representative papers makes the structure of the $G_{AP}$ very specific. However, the concept of representative groups carries over to any social group network, such as actors and films, women and their participation in Southern US social groups [5], or the collaboration structure found in current large research paper databases [24]. Thus, the defined structure of the $G_{AP}$ in this note applies to many similar social situations. Note that among cycles and paths, only even $C_n$ with $n \ge 6$ and odd $P_n$ with $n \ge 3$ are $G_{AP}$.
Projection graphs G A and G P can be either bipartite or non-bipartite. In the 18 data derived projection graphs, 22% of the G A largest components and 44% of the G P largest components are bipartite.

5.1. Pendants, Degrees and Neighborhoods

The papers in this study require at least two faculty members for them to be considered in a G A P . This requirement and the use of representative papers restricts the possibilities for the structure of these graphs. Below is a list of the specific structural aspects of a G A P large component and its related G A and G P . Proofs of the following statements are left to the reader.
1.
Pendant vertices and pendant chains:
(a)
In any G A P , only authors can be a pendant vertex.
(b)
Any pendant author in a G A P is also pendant in its G A .
(c)
Any pendant chain in a G A P includes at least one author and one paper.
(d)
Any pendant paper in a G P is not pendant in the related G A P .
(e)
Any pendant vertex in a G P must be in a pendant chain with minimum length of 3 in the related G A P .
2.
Degree and Durfee rank: Since only authors can be pendant vertices in a G A P , the minimum possible degree for authors is 1 while that of papers is 2.
(a)
For the set of paper vertices in a $G_{AP}$, $2 \le r(P) \le \Delta(G_{AP})$.
(b)
For author vertices in a $G_{AP}$, $1 \le r(A) \le \Delta(G_{AP})$.
(c)
r ( G A P ) can be greater than either r ( A ) or r ( P ) .
(d)
In a $G_{AP}$, vertex $a \in A$ always projects to a $p \in P$ that has at least one neighbor in the $G_{AP}$.
(e)
Vertex p can project to an a vertex that has no other neighbor (i.e. a is pendant).
3.
Neighborhoods: Each paper is a representative paper resulting in unique neighborhoods for all papers in any G A P .
(a)
No paper vertex can be a twin of another paper vertex in a G A P .
(b)
Author vertices can be in more than one twin pair.
(c)
All bipartite G A P are planar due to the distinct neighborhoods of all p vertices.

5.2. Degree Projection

Consider the situation depicted in Figure 4 where author a has collaborated with other authors on six representative papers shown as gray vertices. Then each paper has vertex a as a common neighbor with the other five papers; so the p vertices are nonadjacent common neighbors of each other. This gives each of the papers in the related G P at least a degree of 5, and places the six paper vertices in a K 6 subgraph in G P . In other words, a high degree in one of the vertex sets of G A P is projected onto the vertices of the other set in the latter set’s projection graph. The degrees for papers p 2 , p 3 , p 4 and p 5 increase past 5 depending on the degree of the other author vertices to which each paper is adjacent in G A P .
Proposition 2.
Assume connected $G_{AP}$ has partite sets $A$ of authors and $P$ of representative papers so $\Delta(G_{AP}) = \max\{\Delta(A), \Delta(P)\}$. In the projection of $P$ vertices to the vertices in $A$, $\Delta(A)$ generates a $K_{\Delta(A)}$ subgraph in $G_P$; and by projection of $A$ onto $P$, $\Delta(P)$ generates a $K_{\Delta(P)}$ subgraph in $G_A$.
Proof. 
Given a $G_{AP}$ as described, if $\Delta(A) = x$, there exists a vertex $a \in A$ that is adjacent to $x$ vertices $p \in P$. Call this set of $x$ paper vertices $P_x$. Since the members of $P_x$ share $a$ as a neighbor, every pair of vertices in $P_x$ has $a$ as a common neighbor. So each pair of vertices in $P_x$ has a nonzero value at their intersection element in $G_{AP}$’s CNM. Hence, there is a $K_x$ subgraph in $G_P$. The same reasoning applies to $\Delta(P) = y$ producing a $K_y$ subgraph in $G_A$. □
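The statement of Proposition 2 can be checked on a small example. The sketch below is illustrative only: the four papers and the helper names `project` and `is_clique` are hypothetical, and the projections are built directly from shared authors and shared papers rather than from the CNM.

```python
from itertools import combinations


def project(papers):
    """Return the edge sets of G_A and G_P for a list of author sets."""
    ga = {frozenset(pair) for team in papers for pair in combinations(sorted(team), 2)}
    gp = {
        frozenset((i, j))
        for i, j in combinations(range(len(papers)), 2)
        if papers[i] & papers[j]
    }
    return ga, gp


def is_clique(vertices, edges):
    """True when every pair of the given vertices is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


# Hypothetical G_AP with Delta(A) = 3 (author "a") and Delta(P) = 3 (paper 3).
papers = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c", "d"}]
ga, gp = project(papers)

authors = set().union(*papers)
papers_of = {a: [i for i, team in enumerate(papers) if a in team] for a in authors}
top_author = max(papers_of, key=lambda a: len(papers_of[a]))
top_paper = max(range(len(papers)), key=lambda i: len(papers[i]))

papers_form_clique = is_clique(papers_of[top_author], gp)   # K_3 in G_P
authors_form_clique = is_clique(papers[top_paper], ga)      # K_3 in G_A
```

The three papers of the maximum degree author are pairwise adjacent in $G_P$, and the three authors of the maximum degree paper are pairwise adjacent in $G_A$.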
If $G_{AP}$’s size $m$ is even, the degree sequence of either $A$ or $P$ (most often $P$) can be a collection of all 2s. When this occurs, projection results in a projection graph whose degree sequence is identical to that of the corresponding vertex set in $G_{AP}$. This is due to degree projection creating a set of $K_2$ in the projection graph. As an example, consider $C_6$ with three $a \in A$ and three $p \in P$. The degree sequence for $A$ is identical to that of $P$ and both sequences are $[2, 2, 2]$ so $\Delta(A) = \Delta(P) = 2$. These sequences generate $G_A$ and $G_P$ isomorphic to $K_3$. This does not conflict with the maximum degree generating a complete subgraph of matching order, because three $K_2$ subgraphs are generated in both $G_A$ and $G_P$, and $K_3$ is also $C_3$. In fact, if $G_{AP}$ is an even $C_n$ with $n \ge 6$ (required for distinct $N(p)$), then $G_A$ and $G_P$ are isomorphic cycle graphs each with order $\frac{n}{2}$ due to the all 2s degree sequences of both partite sets. If $G_{AP}$ is an odd $P_n$ with $n \ge 5$, then $|A| = |P| + 1$, and $G_A$ and $G_P$ are both path graphs due to $\Delta(A) = \Delta(P) = 2$.
There is a compounding effect that can occur in the projection graphs. Figure 5 displays two G A P with gray p P and white a A , where each G A P is a C 8 with two chords. In both cases for that specific G A P , the A and P degree sequences are identical and the projection graphs are isomorphic. For both G A P , Δ ( A ) = Δ ( P ) = 3 yet G A P 1 on the left has a G A (and isomorphic G P ) that is K 4 while G A P 2 on the right has a G A (and also G P ) that is C 4 with a chord. Although metric dimension, dim ( G ) , is discussed later in greater detail, dim ( G A P 1 ) = 3 while dim ( G A P 2 ) = 2 . Although this may seem like a contradiction to Proposition 2, it is not. In both instances, the projection of the maximum degree 3 vertex produces K 3 subgraphs in the projection graph. The difference is due to the disparity in the structure of the neighborhoods due to the different locations of the degree 3 vertices. In G A P 1 , author vertex b with degree 2 is adjacent to two degree 3 papers so b is adjacent to a, c and d in G A . On the other hand, G A P 2 contains no degree 2 vertex adjacent to two degree 3 vertices. Thus, the four degree 2 vertices have only two neighbors in their respective projection graphs. Due to the importance of Proposition 2, next is a large component example derived from the collected data.
For $a_i \in A$ and $p_j \in P$, it is a fact that $\sum_{i=1}^{|A|} \deg(a_i) = \sum_{j=1}^{|P|} \deg(p_j) = m$, where $m$ is the size of the related $G_{AP}$. As with $G_{AP_1}$ in Figure 5, the data derived large component of a $G_{AP}$ in Figure 6 appears to contradict Proposition 2. However, due to the degree sum of $A$ equaling the degree sum of $P$, the $K_4$ in $G_A$ is still much smaller than the $K_6$ found in $G_P$ due to $\Delta(A) = 6$ but $\Delta(P) = 3$. This particular $G_P$ contains two vertices with degree 8 and two vertices with degree 7 as shown later in Figure 11.
Remark 1.
Define $\epsilon(G) = \frac{|E|}{|V|}$ as in [12] and let $\delta(G)$ be the minimum degree of $G$. Excluding isolated vertices, Proposition 1.2.2 in [12] shows that large degree vertices are not scattered among vertices with smaller degrees. In other words, in any graph with at least one edge there exists a subgraph $H$, where $H = G$ may be true, such that $\delta(H) > \epsilon(H) \ge \epsilon(G)$. The proof of the proposition in [12] contains the following process. For $G$ with at least one edge, construct an induced subgraph sequence $G = H_0 \supseteq H_1 \supseteq \cdots$ such that any $v_i \in V(H_i)$ with $\deg(v_i) \le \epsilon(H_i)$ is deleted and $H_{i+1} = H_i - v_i$. The process stops when there are no more $v_i \in V(H_i)$ that can be deleted. This results in $\epsilon(H_{i+1}) \ge \epsilon(H_i)$ for all $i$, and an induced subgraph that contains the vertices with the larger degrees in $G$.
Because invariant $\epsilon(G) = \frac{|E|}{|V|}$ reflects the proportion of graph size to graph order, $\epsilon(G) < 1$ for a connected $G$ indicates a tree (every forest also has $\epsilon(G) < 1$). If $\epsilon(G) = 1$, then graph size equals graph order, one example of which is $C_n$. For $K_n$, $\epsilon(K_n) = \frac{n-1}{2}$; so for $K_2 \cong P_2$, $\epsilon(K_2) = \frac{1}{2} < 1$ and for $K_3 \cong C_3$, $\epsilon(K_3) = 1$.
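The deletion process in Remark 1 can be sketched as below. This is an assumption-laden illustration rather than the procedure of [12] verbatim: the function name `dense_subgraph` is hypothetical, the vertex deleted at each step is chosen as one of minimum degree among the deletable vertices, and the example graph (a $K_4$ with a two-vertex tail) is invented.

```python
def dense_subgraph(vertices, edges):
    """Repeatedly delete one vertex v with deg(v) <= epsilon(H) until no such
    vertex remains; returns the vertex and edge sets of the final subgraph H."""
    vs = set(vertices)
    es = {frozenset(e) for e in edges}
    if not es:
        raise ValueError("the graph needs at least one edge")
    while True:
        eps = len(es) / len(vs)
        deg = {v: 0 for v in vs}
        for e in es:
            for v in e:
                deg[v] += 1
        removable = [v for v in vs if deg[v] <= eps]
        if not removable:
            return vs, es
        # Any deletable vertex is allowed; a minimum degree one is chosen here.
        v = min(removable, key=lambda u: deg[u])
        vs.remove(v)
        es = {e for e in es if v not in e}


# Hypothetical example: K_4 on {0, 1, 2, 3} with the tail 0 - 4 - 5 attached.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
g_vertices = range(6)
g_edges = k4 + [(0, 4), (4, 5)]

h_vertices, h_edges = dense_subgraph(g_vertices, g_edges)
eps_g = len(g_edges) / 6
eps_h = len(h_edges) / len(h_vertices)
min_deg_h = min(sum(v in e for e in h_edges) for v in h_vertices)
```

In this example the tail is removed first, leaving the $K_4$ as $H$, and the values satisfy $\delta(H) > \epsilon(H) \ge \epsilon(G)$.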
The situation that produces the closest complete subgraphs in the projection graphs is when at least one of the partite sets of $G_{AP}$ has a degree sequence that consists of all 2s. Because vertex $a \in A \subset V(G_{AP})$ can have degree 1 and the minimum degree of $p \in P \subset V(G_{AP})$ is 2, if the degree sequence of $A$ is all 2s, then the degree sequence of $P$ is also all 2s reflecting that $G_{AP}$ is an even cycle graph with $n \ge 6$. In this case, the only complete projection graph is when $G_{AP}$ is $C_6$ and both projection graphs are $K_3 \cong C_3$. For even $C_n$ with $n \ge 6$, the projection graphs are isomorphic cycles composed of $K_2$ subgraphs. If $G_{AP}$ is a subdivided $G_A$, then the degree sequence of $P$ in $G_{AP}$ is all 2s and $\Delta(A) \ge \Delta(P)$. For this situation, the degree sequence of $G_A$ is identical to the degree sequence of $A$ in $G_{AP}$ due to the all-2s $P$ degree sequence.
Based on Proposition 2, if Δ ( A ) = x and Δ ( P ) = y where x > y , can the maximal induced complete subgraph in G A have greater order than the maximal induced complete subgraph in G P ? The answer is “yes” for a specific case that follows where Δ ( A ) > Δ ( P ) but the maximal complete subgraph in G A exceeds the order of the complete subgraph in G P .
Suppose $G_{AP}$ is a subdivided $G_A \cong K_{|A|}$ where $\Delta(A) = x$. Because it poses a “small graph” exception concerning the number of $K_x$ subgraphs in $G_P$, first let $x = 3$ so $|A| = 4$. Then $G_{AP}$ is the subdivided $K_4$ and $G_A \cong K_4$, so $G_A$ contains four $K_3$ subgraphs. $G_P$ has $|P| = 6$, each $p \in V(G_P)$ has degree 4 (so $G_P$ is regular), and by Proposition 2, there exist eight $K_3$ subgraphs but these subgraphs do not form a $K_4$ or $K_5$. Instead there exist three sets of twin pairs in $G_P$. In this case, $3 = r(G_A) < r(G_P) = 4$ and $\dim(G_A) = \dim(G_P)$. Now more generally, let $\Delta(A) = x > 3$. Then in $G_{AP}$, $|A| = x + 1$, all $a_i \in A$ have degree $x$ and each pair in set $A$ shares a single paper. Thus, $|P| = \frac{(x+1)x}{2} = y$, and $G_P$ has $\Delta(G_P) = 2(x-1)$ for all $p$ and has $\frac{x(x+1)(x-1)}{2}$ edges. There exist $x + 1$ $K_x$ subgraphs in $G_A \cong K_{x+1}$. Based on Proposition 2, the $y$ vertices $p_j \in V(G_P)$ are in $K_x$ subgraphs but each $p$ has degree $2(x-1)$, which is greater than degree $x - 1$. To compare these particular $G_A$ and $G_P$, consider $\epsilon$ given in Remark 1. For the $G_A$, $\epsilon(G_A) = \frac{x}{2}$ while $\epsilon(G_P) = x - 1$; and $\frac{x}{2} < x - 1$ for all $x > 3$. This shows that proportionally, there are more vertices with larger degrees in $G_P$ than in $G_A$. However, the placement of the edges in $G_{AP}$ only allows a $p$ pair to have at most $x - 1$ common neighbors. Thus, the largest complete subgraph in $G_P$ is $K_x$, which is smaller than $K_{x+1} \cong G_A$.
Also note that for the two regular degrees, $x < 2(x-1)$ so $x = r(G_A) < r(G_P) = 2(x-1)$. A graph’s degree structure affects its metric dimension; so compared to $G_A$’s DM, either equal or more rows in $G_P$’s DM are required to produce unique column combinations. Hence, $\dim(G_A) \le \dim(G_P)$.
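The counts for the subdivided complete graph can be verified numerically for a small case. The sketch below uses $x = 4$ and is only illustrative: the helper names are hypothetical, the largest clique is found by brute force, and the Durfee rank is assumed to be the largest $r$ such that at least $r$ vertices have degree at least $r$.

```python
from itertools import combinations


def subdivided_complete(x):
    """Papers of the subdivided K_{x+1}: one two-author paper per author pair."""
    return [frozenset(pair) for pair in combinations(range(x + 1), 2)]


def paper_graph(papers):
    """Adjacency sets of G_P: two papers are adjacent when they share an author."""
    adj = {i: set() for i in range(len(papers))}
    for i, j in combinations(range(len(papers)), 2):
        if papers[i] & papers[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def durfee_rank(degrees):
    """Assumed definition: largest r with at least r degrees that are >= r."""
    ds = sorted(degrees, reverse=True)
    return max((k for k in range(1, len(ds) + 1) if ds[k - 1] >= k), default=0)


def clique_number(adj):
    """Order of a largest complete subgraph, by brute force (small graphs only)."""
    verts = list(adj)
    for k in range(len(verts), 0, -1):
        for cand in combinations(verts, k):
            if all(v in adj[u] for u, v in combinations(cand, 2)):
                return k
    return 0


x = 4
papers = subdivided_complete(x)
gp = paper_graph(papers)
gp_degrees = [len(nbrs) for nbrs in gp.values()]
gp_size = sum(gp_degrees) // 2
gp_rank = durfee_rank(gp_degrees)
ga_rank = durfee_rank([x] * (x + 1))   # G_A is K_{x+1}, every degree equals x
gp_clique = clique_number(gp)
```

For $x = 4$ this gives ten papers of degree 6, 30 edges in $G_P$, Durfee ranks 4 and 6, and a largest complete subgraph $K_4$ in $G_P$ against $K_5 \cong G_A$.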
Corollary 1.
Suppose connected G A P has partite sets A of authors and set P of representative papers and G A P is not a subdivided K n . If Δ ( A ) > Δ ( P ) , then a maximal complete subgraph in G A has smaller order than a maximal complete subgraph in G P . If Δ ( A ) < Δ ( P ) , then a maximal complete subgraph in G P has smaller order than a maximal complete subgraph in G A .
Proof. 
Suppose $\Delta(A) > \Delta(P)$ where $\Delta(A) = x$ and $\Delta(P) = y$. Assume that the vertex elimination process described in Remark 1 has been done on connected $G_{AP}$ generating a graph $G_{AP}^*$ containing only the larger degree vertices in $G_{AP}$. Let $a_\Delta$ have degree $x$ and $p_\Delta$ have degree $y$. To create a maximal situation, let $a_\Delta$ be adjacent to $p_\Delta$ in $G_{AP}^*$ and $G_{AP}$. Denote the set of $p_\Delta$ neighbors of $a_\Delta$ as $p_{\Delta i}$ where $1 \le i \le x$, and the $a_\Delta$ neighbors of $p_\Delta$ as $a_{\Delta j}$ with $1 \le j \le y$. To get maximum degrees of $a_\Delta$ and $p_\Delta$ in their respective projection graphs, if it were possible that the neighbors of $a_\Delta$ and $p_\Delta$ shared no neighbors, then the maximum $\deg_{G_A}(a_\Delta) = \sum_{i=1}^{\deg(a_\Delta)} \deg(p_{\Delta i}) - \deg(a_\Delta) = xy - x$, and similarly for $p_\Delta$ making the maximum $\deg_{G_P}(p_\Delta) = yx - y$. Since $x > y$, $\deg_{G_A}(a_\Delta) < \deg_{G_P}(p_\Delta)$ for all $a_\Delta \in G_A$ and all $p_\Delta \in G_P$. Similar logic shows $\deg_{G_P}(p_\Delta) > \deg_{G_A}(a_\Delta)$.
Still assuming that all $a_\Delta$ are adjacent to all $p_\Delta$ in $G_{AP}^*$ (and $G_{AP}$) where $x > y$, now let the neighbors of $a_\Delta$ and $p_\Delta$ share neighbors, and let $N(a_\Delta)$ be a set of $p_{\Delta i}$ where $N(p_{\Delta i})$ is a set of $a_\Delta$, so $|N(a_\Delta)| > |N(p_\Delta)|$. Thus the probability of shared neighbors for the smaller distinct neighborhoods of $p_{\Delta i}$ is greater than the probability of shared neighbors in the larger possibly non-distinct $a_{\Delta i}$ neighborhoods, resulting in $\deg(a_\Delta)$ in $G_A$ being less than $xy - x$, which is already less than $xy - y$ for the $p_\Delta$. Thus for $a_\Delta \in G_A$, $|N(a_\Delta)|$ in $G_A$ is less than $|N(p_\Delta)|$ in $G_P$. Because all $p_{\Delta i}$ are neighbors of $a_\Delta$, they form a $K_{\Delta(A)}$ subgraph in $G_P$, and the same for the $a_{\Delta i}$ in $G_A$. Hence, the maximal complete subgraph in $G_A$ is then smaller than the maximal complete subgraph in $G_P$.
When $x < y$, the larger neighborhoods are distinct. Thus, there exists a greater probability of shared neighbors in the smaller $N(a_{\Delta i})$ because the vertices can have non-distinct neighborhoods (twins are permitted) and $N(p_\Delta)$ have more $a_{\Delta i}$ in this case. This gives the $p_\Delta$ in $G_P$ a degree smaller than $xy - y$, which is smaller than $xy - x$ in this case.
First assume Δ ( A ) > Δ ( P ) in G A P that is not a subdivided K n . For the sake of contradiction let the a Δ in the associated G A be in a larger maximal K n subgraph than the p Δ in the maximal K n of the related G P . This implies that each p Δ has more unique (non-shared) common neighbors compared to the a Δ in G A P ’s CNM. However, the probability of the last statement referring to G A P ’s CNM is zero due to the N ( a Δ i ) having a greater chance of shared neighbors in G A P . Now assume that Δ ( A ) < Δ ( P ) in G A P and that the a Δ in the associated G A are in a smaller maximal K n subgraph than the p Δ in the maximal K n of the related G P . Using similar logic as when Δ ( A ) > Δ ( P ) again reveals a zero probability of this situation. □

6. Social Aspects of the Data Collected

In this section, the data collected from the three university web sites is examined first, followed by discussing each of the three disciplines regarding the department chairs as research focal points.
Assortative mixing occurs when members of social groups associate with each other based on specific characteristics. In this study, all professors have specific areas of focus within their general research area as given on the department web site. As expected, with the exception of two papers, professors collaborate with other professors in the department who share their specific research area. The two exceptions are education papers where department professors who do not have education as their research area, coauthor with the education researcher in their department. Although not given in this note, dendrograms based on distances successfully identified clear communities in the G A and G A P centered on research areas.

6.1. Analysis of University Data

Various characteristics of the three universities are collected from the institutions’ web sites with averages (means) and standard deviations (σ) presented here. Focus is exclusively on each university’s primary campus. The average student to teacher ratio is 15:1 with σ = 1.5. The average total student population is 27,136 (σ = 2,023). Of the total student population, there is an average of 20,340 undergraduates (σ = 1,064) representing 75% of the student body, and 6,795 graduate students (σ = 2,126) for 25%. There is an average of 22,857 full-time students (σ = 1,219) or 84%. The average in-state student population is 19,818 students (σ = 7,639) or 73%.
By discipline, Table 1 displays the mean department size along with the mean number and the mean percent of faculty performing departmental collaboration. Overall, six of the nine department chairs (67%) collaborate within their department.

6.2. Analysis of Discipline Data: Hubs Analysis

Define a hub as any author vertex whose degree is greater than the average department degree plus one standard deviation. An analysis of hubs is done for both the nine G A and the nine G A P (with paper degrees excluded) with a focus on department chairs. Professors in a hub position may exert greater influence as far as research in a department is concerned [38].
In any G A P , author vertices with larger degree indicate professors involved with a greater number of representative papers that reflect distinct research groups in the department. In any G A , the interpretation of vertices with the greater degree is not well defined as there exist two possible meanings. In a G A , a high degree a can reflect professors associated with either representative papers that have a larger number of departmental faculty authors, or reflect a large number of representative papers that may have only a few authors.
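The hub definition, and the way the same authors can rank differently in $G_A$ and $G_{AP}$, can be sketched as follows. The four papers are invented for illustration, the helper name `hubs` is hypothetical, and the use of the population standard deviation over collaborating authors only is an assumption, since the text does not specify either choice.

```python
from itertools import combinations
from statistics import mean, pstdev


def hubs(degrees):
    """Authors whose degree exceeds the mean degree plus one standard deviation
    (population standard deviation assumed)."""
    values = list(degrees.values())
    threshold = mean(values) + pstdev(values)
    return {v for v, d in degrees.items() if d > threshold}


# Hypothetical department: one four-author paper, and author "e" on three
# different two-author papers.
papers = [{"a", "b", "c", "d"}, {"e", "f"}, {"e", "g"}, {"e", "h"}]
authors = set().union(*papers)

# Author degree in G_AP: number of representative papers.
gap_degree = {a: sum(a in team for team in papers) for a in authors}

# Author degree in G_A: number of distinct departmental coauthors.
coauthors = {a: set() for a in authors}
for team in papers:
    for u, v in combinations(team, 2):
        coauthors[u].add(v)
        coauthors[v].add(u)
ga_degree = {a: len(c) for a, c in coauthors.items()}

hubs_ga = hubs(ga_degree)
hubs_gap = hubs(gap_degree)
```

In $G_A$ the authors of the single large paper have the same degree as author e, so no vertex exceeds the threshold, while in $G_{AP}$ author e alone is a hub.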
The data in Table 2 gives the hubs analysis for the nine data derived G A and for the nine data derived G A P . Overall, 33% of the department chairs are hubs in their departments. Any G A P depicts the actual social network, not its G A projection graph. In this analysis, the value defining the hubs is calculated separately for the nine G A and the nine G A P . The G A P hub value is based on the author degree only (paper degree is excluded). Due to degree projection, in this analysis, 33% of the 18 graphs examined display a different set of hubs between the G A and the related G A P .
Utilizing the G A P to assess the number of hubs produced an increase in total number of hubs from 22 (see Table 2) to 28 (27% increase). Although the total number of chairs acting as hubs remained the same, the number shifted from 1 to zero for physics and from 1 to 2 for mathematics. In comparing the two table sections, notice that there is no difference for the Biology row. An examination of the three data derived biology G A to their related G A P reveals that each G A P is a subdivided version of the G A in all three cases.
Focusing exclusively on the structure of G A can give misleading interpretations as shown by the different hub analysis results between the G A and the related G A P and the fact that the interpretation of the large degree structure in a G A is not well defined. These differences can impact an analysis of a network’s evolution over time.
Conjecture 1.
Given an author and paper collaboration network structure, analysis of projection graphs G A and G P , plus bipartite G A P , is required in the prediction of future network links or edges and general network evolution.

7. The Nine Average Graphs

Derived from the 27 data derived graphs, the average graphs are covered in this section. For each discipline, the average G A , average G A P and average G P are given, and are only briefly discussed. Focus in the discussion is on the large component, L G , of each average graph. Only the metric dimension for the large component L G of each average graph is displayed.
An important requirement in the transition between the average $G_A$ and the average $G_{AP}$, and between the average $G_P$ and the average $G_{AP}$, is that projecting the $a$ in the average $G_{AP}$ onto the $p$ in the average $G_{AP}$ must produce the average $G_A$ determined from calculating the average $G_A$ parameters; and similarly projecting the $p \in V(G_{AP})$ onto the $a \in V(G_{AP})$ must generate the calculated average $G_P$.

7.1. Authors Only Average Graphs and Analysis

As expected, in Figure 7, the number of components in all of the average graphs is similar since overall, 50% of the professors do departmental collaboration with little difference between average discipline department size as displayed in Table 1. Compared to the results in [24], the percent of vertices in the large components here is much less.
In comparing the large components of the three average G A in Figure 7, notice that this component is much smaller for mathematics. The greater connectivity of the physics G A is explained by the fact that the physics G A P ’s large component in Figure 8 has 5 K 3 subgraphs while the other two G A P contain a single K 3 . Although Δ ( G A ) is close for the three disciplines’ large components, the physics G A has r ( G A ) = 3 while math and biology have r ( G A ) = 2 . As shown in the figure, for the L G A , dim ( G A ) = 2 for all three disciplines.

7.2. Authors with Papers Average Graphs and Analysis

Figure 8 displays the three average G A P for this study. Consider the L G A P for the average physics G A P . Notice that one author in the physics L G A P has degree 5 while two other authors have degree 4. The relatively high degrees for the physics author vertices indicate that these professors have significant variety in their research group construction since representative papers are utilized. As depicted and previously mentioned, the average physics G A P structure in Figure 8 reflects the three data derived G A P . In other words, all of the physics departments in this study display a high amount of variety in their departmental collaboration research groups. The three G A P have distinct dim ( G A P ) for their L G A P .
The total number of papers produced during the study’s timeframe by all considered professors was determined. On average, the physics professors produced 3.5 times as many papers as the other two research areas. In two of the physics departments, the number of representative papers outnumbered the authors by 150% due to the variety in the research group construction. Could the high level of productivity be due to the departmental collaboration style of the physics professors? Greater variety in research group construction allows for a broader range of skills and knowledge in collaborative research.

7.3. Papers Only Average Graphs and Analysis

Figure 9 clearly reflects the departmental collaboration style difference between physics and the other two research areas. Notice that the L G P has dim ( G P ) = 1 for math, dim ( G P ) = 4 for physics and dim ( G P ) = 3 for biology.

8. The Degree Diagram and the Durfee Rank

Figure 10 displays a connected G A P along with the degree diagrams for G A P ’s vertex sets A and P. Although Δ ( A ) = Δ ( P ) = 3 , set A has more vertices with degree 3 than P; so r o ( A ) = 3 and r i ( P ) = 2 . One impact of the difference in the two r is that dim ( G A ) = 2 and dim ( G P ) = 3 = dim ( G A P ) . Can G P more accurately reflect the actual collaboration in a department instead of the related G A ? Notice that | A | < | P | in this case.
Consider the large component L G A P from a data derived G A P depicted in Figure 11 (also displayed in Figure 6). On the top far left is the degree diagram of the G A P degree sequence. Set V ( G A P ) has Δ ( G A P ) = 6 and r o ( G A P ) = 4 . On the right side of G A P are the degree diagrams for sets A and P in G A P . Although the degree sequence for A is not graphic, each degree diagram clearly displays the degree relationships within each of the vertex sets compared to the other set. In comparing the two degree diagrams, Δ ( A ) = 6 and r o ( A ) = 4 while Δ ( P ) = 3 and r p ( P ) = 2 . Note that the degree diagram of G A P is merely the union of the degree diagrams for sets A and P with Δ ( G A P ) = Δ ( A ) and r o ( G A P ) = r o ( A ) in this case.
Now examine G A and G P in the lower portion of Figure 11. Here Δ ( G A ) = 5 and r r ( G A ) = 3 reflecting a decrease in these figures compared to those for set A in G A P . For G P , Δ ( G P ) = 8 compared to Δ ( P ) = 3 and r p ( G P ) = 5 versus r p ( P ) = 2 . This significant change is due to degree projection in G A P where Δ ( A ) = 6 generates a K 6 subgraph in G P . Due to | A | < | P | in G A P and the Pigeonhole Principle, all of the high degree vertices are authors as shown by r o ( G A P ) = r o ( A ) = 4 and Δ ( P ) = 3 . This results in all of the papers that are coauthored by the high degree authors gaining a combination of the high degrees in G P .
Because metric dimension is used to compare networks, and based on the change in the hubs analysis between some G A and their related G A P , the next section examines the change in metric dimension given a G A P and its related G A and G P .

9. The Metric Dimension of the Authors with Papers Graph

This section examines changes in the relative distance structure of $G_A$, $G_P$ and $G_{AP}$, and provides methods for finding $\dim(G_{AP})$. The metric dimension of $G$ with multiple components is the sum of the metric dimension of each component in $G$. Although the average graphs all have multiple components, this section is focused exclusively on the changes in any single large component treated here as a connected graph. First the change from $\dim(G_{AP})$ to $\dim(G_A)$ and to $\dim(G_P)$ for the average graphs is examined.

9.1. Changes Between the Three Data Derived Average Graphs

For each discipline, the metric dimension of that discipline’s three average graphs’ L G are compared. Unless otherwise stated, reference to G A is referring to the large component of G A ; and the same is true for G A P and G P . The comparison begins with the mathematics discipline followed by physics and concludes with biology.

9.1.1. Metric Dimension in Average Mathematics Graphs:

With respect to the three average mathematics graphs, $\dim(G_A) = 2$, $\dim(G_{AP}) = 2$ and $\dim(G_P) = 1$. In this case, $\Delta(G_{AP}) = \Delta(P) = 3$ generating a $K_3$ in $G_A$. Thus the relative distance structures of $G_A$ and $G_{AP}$ are similar, and the projection generating $G_P$ is not structure preserving with respect to the relative distance structure of $G_{AP}$ even though $d_{G_P}(p_i, p_j) = \frac{1}{2} d_{G_{AP}}(p_i, p_j)$ where $p_i, p_j \in P$.

9.1.2. Metric Dimension in Average Physics Graphs:

In examining the three average physics graphs, dim ( G A ) = 2 , dim ( G A P ) = 4 and dim ( G P ) = 4 with Δ ( G A P ) = Δ ( A ) = 5 producing a K 5 subgraph in G P where Δ ( G P ) = 7 . In this case the distance structures of G A and G A P are different while the distance structures of G P and G A P are similar. Here the projection generating G A does not preserve the relative distance structure of G A P . This again reflects the importance of paper inclusion.

9.1.3. Metric Dimension in Average Biology Graphs:

The metric dimensions of the three average biology graphs are dim ( G A ) = 2 , dim ( G A P ) = 3 and dim ( G P ) = 3 . For biology, Δ ( G A P ) = Δ ( A ) = 4 generating a K 4 in G P with Δ ( G P ) = 5 . Similar to physics, the relative distance structure of G A P is that of G P , not G A .

9.2. Double Distance and Diameter

As mentioned, d G A P ( a i , a j ) , is double d G A ( a i , a j ) ; and similarly for d G A P ( p i , p j ) and d G P ( p i , p j ) . The double distance fact does not apply to the diameter of G A , G P and G A P . In all cases, diam ( G A P ) > diam ( G A ) and diam ( G A P ) > diam ( G P ) due to the double distance relationship. In some cases, but not all, when the diameter in G A P is an author to author path, diam ( G A P ) = 2 · diam ( G A ) . In all cases, diam ( G A P ) 2 > diam ( G P ) and diam ( G A ) diam ( G P ) since no paper can be a pendant vertex in G A P . Recall that, by affecting the possible distance variety in a graph’s DM, diameter impacts metric dimension of a graph. When the diameter is a to a, or p to p, it is even; and when the diameter is a to p, it is odd.
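The double distance relationship between author pairs can be checked with breadth-first search on a small example. The sketch below is illustrative only: the three papers and the vertex labels are hypothetical, and only the author to author case is verified.

```python
from collections import deque
from itertools import combinations


def bfs(adj, src):
    """Shortest path lengths from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical connected G_AP given as paper -> set of department authors.
papers = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"c", "d", "e"}}
authors = sorted(set().union(*papers.values()))

gap = {v: set() for v in list(papers) + authors}   # bipartite G_AP
ga = {a: set() for a in authors}                   # projection G_A
for p, team in papers.items():
    for a in team:
        gap[a].add(p)
        gap[p].add(a)
    for u, v in combinations(team, 2):
        ga[u].add(v)
        ga[v].add(u)

dist_ap = {a: bfs(gap, a) for a in authors}
dist_a = {a: bfs(ga, a) for a in authors}
doubled = all(
    dist_ap[u][v] == 2 * dist_a[u][v] for u, v in combinations(authors, 2)
)
diam_ap = max(max(bfs(gap, v).values()) for v in gap)
diam_a = max(max(d.values()) for d in dist_a.values())
```

In this example every author to author distance in $G_{AP}$ is twice the distance in $G_A$, and the diameter of $G_{AP}$ is realized by an author to author path.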

9.3. Using the DM for Any Graph’s Metric Dimension

For any $G$ with a labeling that generates distinct DM blocks, the blocks can be used to determine $\dim(G)$, resulting in a significant increase in computational efficiency. Finding a resolving set is relatively simple. However, finding a minimal resolving set is NP-hard [15], as all possibilities must be explored. In terms of a graph’s DM, all rows must be compared to all other rows in order to find a minimal number of rows that resolve $G$ with unique column combinations. Thus, being able to find $\dim(G)$ using matrix blocks gives significant computational efficiency. Theorem 1 states that, given a specific matrix block resolving result, only three of the four DM blocks need to be resolved in order to find $\dim(G_{AP})$. The focus of this note is now on methods for finding $\dim(G_{AP})$.
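To make the row comparison concrete, the sketch below finds a smallest set of DM rows with unique column combinations by exhaustive search. It is an illustrative baseline only, with exponential worst-case cost, and is not the block method developed in this note; the function name `metric_dimension` and the example matrix (the DM of a five-vertex path) are assumptions.

```python
from itertools import combinations


def metric_dimension(dm):
    """Brute-force search over row subsets of a distance matrix.

    Returns (k, rows): the smallest number of rows whose column
    combinations are all distinct, and one such tuple of row indices.
    """
    n = len(dm)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            columns = {tuple(dm[r][c] for r in rows) for c in range(n)}
            if len(columns) == n:  # every vertex has a unique distance vector
                return k, rows
    raise ValueError("no row subset resolves the matrix")


# Hypothetical example: DM of the path v0 - v3 - v1 - v4 - v2,
# with rows and columns ordered v0, v1, v2, v3, v4.
dm = [
    [0, 2, 4, 1, 3],
    [2, 0, 2, 1, 1],
    [4, 2, 0, 3, 1],
    [1, 1, 3, 0, 2],
    [3, 1, 1, 2, 0],
]

print(metric_dimension(dm))  # (1, (0,)): an end vertex resolves a path
```

The search tries every subset of each size before moving to the next size, which is the cost that block resolving aims to reduce.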

9.4. DM Block Resolving

Throughout the rest of this note, it is assumed that any $G_{AP}$ has a distinguishing labeling that generates blocks in its DM. As reflected in the DM blocks for any $G_{AP}$, because $G_{AP}$ is bipartite, distances between vertices in the same partite set are all even, while those between the two partite sets are odd.
DM block resolving is the process of using the portions of $G$’s distance matrix (DM) rows (or columns) contained in the DM blocks to find the minimum number of rows in each block that produces unique column (or row) combinations.
The phrase “block resolving” is used when it is clear that the blocks are those in a DM. When block resolving, either rows or columns can be used, but it is critical to use the same choice for all blocks that are resolved. In this note, rows are used for block resolving, and it is assumed that the reader understands that the choice of columns also exists.
All column combinations that contain a zero are unique, with the zero implying that the row index vertex is in a $W$ set for the block. The minimum number of rows that gives unique column combinations for a block is denoted by $\dim(a \times a)$, $\dim(a \times p)$, $\dim(p \times a)$ and $\dim(p \times p)$, respectively.
An even block is a block that contains only even entries, so the $a \times a$ and $p \times p$ blocks are even. Analogously, the $a \times p$ and $p \times a$ blocks are odd blocks. General terms referring to the minimum number of rows that resolve a block are $\dim(\mathrm{even})$ and $\dim(\mathrm{odd})$. The term $\dim(\mathrm{block})$ refers to minimally resolving a general block, either even or odd.
When using DM block resolving to find $\dim(G_{AP})$, the focus is on the even blocks, as these blocks have row and column indices from a particular partite set and, as shown later, are also related to the projection graphs. The odd blocks are sometimes referred to as block extensions because these blocks extend the rows of the even blocks by giving the relative distance relationships to the vertices in the other partite set.
Prior to giving four example $G_{AP}$, selected for the variety of their block resolving results and result interpretations, the characteristics of the even and odd blocks are discussed.
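As a minimal sketch of row block resolving under the definitions above, the function below returns the smallest number of block rows that give unique column combinations. Returning 0 for a block with duplicate columns is an assumed convention, chosen to match the way $\dim(p \times a) = 0$ is used later for twin pairs. The names `block_dim`, `a_idx` and `p_idx`, and the five-vertex example (authors a, b, c and papers 1, 2 on a path), are hypothetical.

```python
from itertools import combinations


def block_dim(dm, row_idx, col_idx):
    """Minimum number of rows from row_idx whose entries, restricted to the
    columns in col_idx, give pairwise distinct column combinations.
    Returns 0 when no set of block rows separates all block columns."""
    for k in range(1, len(row_idx) + 1):
        for rows in combinations(row_idx, k):
            columns = {tuple(dm[r][c] for r in rows) for c in col_idx}
            if len(columns) == len(col_idx):
                return k
    return 0


# DM of the path a - 1 - b - 2 - c, rows and columns ordered a, b, c, 1, 2.
dm = [
    [0, 2, 4, 1, 3],
    [2, 0, 2, 1, 1],
    [4, 2, 0, 3, 1],
    [1, 1, 3, 0, 2],
    [3, 1, 1, 2, 0],
]
a_idx, p_idx = [0, 1, 2], [3, 4]

# A plain dictionary standing in for the 2 x 2 box of block values.
dim_box = {
    "a x a": block_dim(dm, a_idx, a_idx),
    "a x p": block_dim(dm, a_idx, p_idx),
    "p x a": block_dim(dm, p_idx, a_idx),
    "p x p": block_dim(dm, p_idx, p_idx),
}
print(dim_box)  # {'a x a': 1, 'a x p': 1, 'p x a': 2, 'p x p': 1}
```

For this path, the single row of the end author a resolves both the $a \times a$ block and its extension $a \times p$, while the $p \times a$ block needs both paper rows.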

9.5. DM Block Characteristics

Reference to a block row refers only to the portion of the DM row contained in the specific block being discussed.

9.5.1. Even Blocks:

Even blocks are always square and symmetric across their diagonal. The size of an even block is the cardinality of the partite set whose vertices index the block. The entries are all even, with a single zero in each row, but the $a \times a$ and $p \times p$ blocks do not necessarily contain the same set of even numbers.
Even blocks contain only even diameters. Because any graph has only one diameter value, an even diameter, together with the zeros, gives an even block greater distance variety, and hence greater resolving efficiency, than its extension block. An even diameter can occur in the $a \times a$ block only, in the $p \times p$ block only, or in both even blocks.
Recall that when row resolving, the zero indicates that the row’s index vertex is in an ordered $W$ set, so any column combination with a zero is automatically unique. Thus the single zero in each row reduces the number of column combinations that need to be checked for uniqueness.

9.5.2. Odd Blocks

Odd blocks are square only when $|A| = |P|$. The entries in the odd blocks are all odd, with 1s indicating the vertices in the open neighborhood of the row’s index vertex. Thus, $\Delta(A)$ and $\Delta(P)$ impact the distance variety in the odd blocks, while $r(A)$ and $r(P)$, including their $r$ type, indicate the number of rows that might have a larger number of 1s. The odd blocks are symmetric to each other across the diagonal of the DM, so the two odd blocks contain the same sets of odd integers. In other words, the rows of the $a \times p$ block are the columns of the $p \times a$ block and vice versa.
An odd diameter is in both odd blocks and increases distance variety. As extensions of the even blocks, the odd blocks can restrict the use of the related even block vertices in a minimal $W$ set for the $G_{AP}$.

9.6. DM Block Resolving Examples

When block resolving, $\dim(a \times a)$ and $\dim(p \times p)$ always need to be determined. Based on the results of these two metric dimensions, when $\dim(a \times a) \neq \dim(p \times p)$, resolving only one more block is required. Following are four simple example $G_{AP}$ with brief explanations of their block resolving and its interpretation.
The adjacency matrix of $K_n$, $A(K_n)$, has a distinct, recognizable structure of all 1s except for the all-zero diagonal. The matrix $2 \cdot A(K_n)$ is $A(K_n)$ with the 1s replaced by 2s. When the $2 \cdot A(K_n)$ structure is found as a subblock in a DM even block, it is called a $K_n$ double subblock. The importance of these subblocks relates back to Proposition 2 and the double distance relationship between $G_{AP}$ and its projection graphs.
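A $K_n$ double subblock can be located mechanically. The sketch below is a brute-force illustration rather than an efficient clique algorithm: it returns a largest index set of an even block whose pairwise entries all equal 2. The function name `largest_double_subblock` and the example $a \times a$ block (from a hypothetical $G_{AP}$ in which paper 1 has authors a, b, c and paper 2 has authors c, d) are assumptions.

```python
from itertools import combinations


def largest_double_subblock(block):
    """Return indices of a largest K_n double subblock (n >= 2) in a square
    even DM block, i.e. a set of indices whose pairwise entries all equal 2.
    Returns an empty tuple when the block contains no entry equal to 2."""
    n = len(block)
    for k in range(n, 1, -1):
        for idx in combinations(range(n), k):
            if all(block[i][j] == 2 for i, j in combinations(idx, 2)):
                return idx
    return ()


# a x a block for authors a, b, c, d where paper 1 = {a, b, c}, paper 2 = {c, d}.
a_block = [
    [0, 2, 2, 4],
    [2, 0, 2, 4],
    [2, 2, 0, 2],
    [4, 4, 2, 0],
]
print(largest_double_subblock(a_block))  # (0, 1, 2): a K_3 double subblock
```

Here the three authors of paper 1 form the $K_3$ double subblock, matching the $K_3$ that the same paper generates in the authors only projection.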
A $\dim(\mathrm{block})$ box is a $2 \times 2$ box that displays the metric dimension of each DM block. Thus, the box displays the relationship among the four $\dim(\mathrm{block})$ values, allowing easier comparison and overall interpretation.

9.6.1. Example 1:

Figure 12 displays a $G_{AP}$ with its DM and $\dim(\mathrm{block})$ box, where $\dim(a \times a) = \dim(a \times p)$ and $\dim(p \times p) = \dim(p \times a)$.
In examining the DM in Figure 12’s center and the $\dim(\mathrm{block})$ box on the right, the entire rows for $a$ and $b$ have unique column combinations for all vertices in the graph, as reflected by $\dim(a \times a) = 2 = \dim(a \times p)$. There is a $K_4$ double subblock in $p \times p$, making $\dim(p \times p) = 3$. The row length in $p \times a$ is 5, but resolving two odd digits in two rows gives only $2^2 = 4$ unique column combinations, so the row length of 5 requires 3 rows for $\dim(\mathrm{block})$. Thus, $\dim(G_{AP}) = 2$ and $W = \{a, b\}$.

9.6.2. Example 2:

Figure 13 displays another $G_{AP}$. Both even blocks contain a $K_n$ double subblock, and $\dim(\mathrm{odd})$ does not agree with the $\dim(\mathrm{even})$ of the block for which the odd block is an extension.
In this case, utilizing $\dim(a \times a) = 2$ is not possible for a minimal $W$ in $G_{AP}$ because the number of 1s in the extension block generates $\dim(a \times p) = 3$. Thus, a minimal $W$ is either all three $a$ or three $p$, the choice of which depends on the rows that resolve $p \times a$. Note that because $\dim(a \times a) \neq \dim(p \times p)$, once these values are known, checking whether the smaller $\dim(\mathrm{even})$ provides $\dim(G_{AP})$ can be done by resolving only the extension block for the block with the smallest $\dim(\mathrm{even})$. Here, $\dim(G_{AP}) = 3$ and $W = \{2, 3, 4\}$.

9.6.3. Example 3:

Example 3, shown in Figure 14, displays a $G_{AP}$ where $\dim(a \times a) = \dim(p \times p)$ but $\dim(\mathrm{odd})$ is greater than $\dim(\mathrm{even})$.
In this case, a minimal $W$ must be constructed with both $a$ and $p$ vertices; for this example, $W = \{b, 2\}$, where both DM rows contain $\operatorname{diam}(G_{AP}) = 4$. Utilizing three $a$ or three $p$, as dictated by $\dim(\mathrm{odd})$, resolves $G_{AP}$ but not minimally. Thus, when $\dim(a \times a) = \dim(p \times p)$, all four blocks should be resolved.

9.6.4. Example 4:

As shown in Figure 15, the $G_{AP}$ in Example 4 generates $\dim(a \times a) \neq \dim(p \times p)$ and $\dim(a \times p) \neq \dim(p \times a)$.
This example provides an additional situation where $\dim(a \times a) \neq \dim(p \times p)$ and resolving the extension block for the block with the smaller $\dim(\mathrm{even})$ provides $\dim(G_{AP})$ by restricting $\dim(a \times a) = 2$. In other words, two $a$ vertices do not resolve $G_{AP}$, but three $a$ give $\dim(G_{AP})$. Thus, $\dim(G_{AP}) = 3$ and $W = \{2, 3, 4\}$. Note that in this case $\dim(G_{AP}) \neq \dim(\mathrm{even})$ and that $r_p(A) = 2$, $r_i(P) = 2$ but $r_o(G_{AP}) = 3$. As mentioned, there is more to explore in the $r$ concept than what is given in this note.

9.7. Relation of the DM Blocks to the Projection Graphs

The following proposition relates the DM blocks to the two projection graphs.
Proposition 3.
For the distance matrix (DM) and DM block resolving of $G_{AP}$ with authors set $A$ and representative papers set $P$, where $G_{AP}$ has a distinguishing labeling, $\dim(a \times a) = \dim(G_A)$ and $\dim(p \times p) = \dim(G_P)$.
Proof. 
Because $G_A$ is constructed by projecting the vertices in $A$ onto the vertices of $P$ in the related $G_{AP}$, for each $a_i, a_j \in V(G_A) \subset V(G_{AP})$, $\deg_{G_A}(a_i)$ is the number of 2s in row $a_i$ of the $a \times a$ block of $G_{AP}$’s DM. In other words, if there is a 2 in the $a \times a$ block at the intersection of row $a_i$ and column $a_j$, then $a_i \sim a_j$ in $G_A$ because $d_{G_{AP}}(a_i, a_j) = 2 \cdot d_{G_A}(a_i, a_j)$, indicating that $a_i$ shares a neighbor $p$ with $a_j$ in $G_{AP}$. Thus, if all entries in the $a \times a$ block are multiplied by $\frac{1}{2}$, the resultant block is $G_A$’s DM. It follows that a combination of rows that minimally resolves $a \times a$ also minimally resolves $G_A$. Hence, $\dim(a \times a) = \dim(G_A)$. Given $p_i, p_j \in V(G_P) \subset V(G_{AP})$, the same reasoning applies, resulting in $\dim(p \times p) = \dim(G_P)$. □
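Proposition 3 can be spot-checked numerically. The following sketch assumes a small hypothetical $G_{AP}$ (paper p1 with authors a, b, c and paper p2 with authors c, d) and hypothetical helper names. It halves the $a \times a$ block, compares the result with the DM of $G_A$ computed directly from the projection, and then compares the minimum number of resolving rows for each matrix.

```python
from collections import deque
from itertools import combinations


def distance_matrix(adj, order):
    """All-pairs shortest-path distances (BFS), rows and columns in `order`."""
    matrix = []
    for source in order:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist[v] for v in order])
    return matrix


def min_resolving_rows(matrix):
    """Smallest number of rows of a square matrix with distinct column combinations."""
    n = len(matrix)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            if len({tuple(matrix[r][c] for r in rows) for c in range(n)}) == n:
                return k
    return 0


authors = ["a", "b", "c", "d"]
paper_authors = {"p1": ["a", "b", "c"], "p2": ["c", "d"]}

# Build the bipartite G_AP and its authors only projection G_A.
g_ap = {v: set() for v in authors + list(paper_authors)}
g_a = {v: set() for v in authors}
for paper, members in paper_authors.items():
    for a in members:
        g_ap[a].add(paper)
        g_ap[paper].add(a)
    for u, v in combinations(members, 2):
        g_a[u].add(v)
        g_a[v].add(u)

dm_ap = distance_matrix(g_ap, authors + list(paper_authors))
a_block = [row[:len(authors)] for row in dm_ap[:len(authors)]]
halved = [[entry // 2 for entry in row] for row in a_block]
dm_a = distance_matrix(g_a, authors)

assert halved == dm_a  # the halved a x a block is the DM of G_A
print(min_resolving_rows(a_block), min_resolving_rows(dm_a))  # 2 2
```

This checks one example only and is not a substitute for the proof.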
Suppose that for a $G_{AP}$ and its related $G_A$ and $G_P$, $\dim(G_A) = x$ and $\dim(G_P) = y$, where $x$ and $y$ may or may not be equal. Due to the double distance relationship between $G_{AP}$ and its projection graphs, $x$ vertices of $A$ minimally resolve all $a \in A$; but $x$ vertices of $A$ do not necessarily minimally resolve the vertices in $A$ relative to those in $P$ because the double distance relationship does not apply. Thus, $x$ vertices do not necessarily minimally resolve $G_{AP}$. Given that $\dim(G_P) = y$, $y$ vertices of $P$ minimally resolve the vertices in $P$, but do not necessarily minimally resolve the set $A$ or $G_{AP}$.
Proposition 4.
Any minimal resolving set of $G_A$ minimally resolves the vertices $a_i, a_j \in A \subset V(G_{AP})$ but not necessarily the vertices in $P \subset V(G_{AP})$, and so not necessarily $G_{AP}$. Likewise, any minimal resolving set of $G_P$ minimally resolves the vertices $p_i, p_j \in P \subset V(G_{AP})$ but not necessarily the vertices in $A \subset V(G_{AP})$, and so not necessarily $G_{AP}$.

9.8. Complete Graph Double Subblocks

As stated previously, if a 2 is at the DM intersection of $v_i$ and $v_j$, then these vertices are adjacent in the related projection graph. If the same $v_i$ and $v_j$ are found in a $K_n$ double subblock of an even DM block, then $v_i$ and $v_j$ are found in a $K_n$ subgraph in the related projection graph, thus impacting the metric dimension of the projection graph. Proof of the following proposition is given by the gray boxes of the DM in Figure 16.
Proposition 5.
Given the distance matrix (DM) for a $G_{AP}$ with authors set $A$, where $\Delta(A) = x$, and representative papers set $P$, where $\Delta(P) = y$, if $G_{AP}$ is given a distinguishing labeling that is consecutive around its even cycle subgraphs, then a $K_y$ double subblock is possible in the $a \times a$ block and a $K_x$ double subblock is possible in the $p \times p$ block.

9.9. Existence of Twin Pairs

The existence of twin pairs can affect the possible results for the DM blocks. Figure 16 shows that it is possible to have $\dim(p \times a) = 0$, reflecting the impact of twins on block resolving.
Proposition 6.
Given a $G_{AP}$ with authors set $A$ and representative papers set $P$, and utilizing row block resolving of $G_{AP}$’s distance matrix (DM), $\dim(p \times a) = 0$ if and only if $G_{AP}$ has at least one twin pair.
Proof. 
If $G_{AP}$ has at least one twin pair of $a$ vertices, then by the definition of a twin pair, the rows in $a \times p$ indexed by the twin pair have duplicate entries, and there exist duplicate columns in the $p \times a$ block, so $\dim(p \times a) = 0$.
Conversely, when $\dim(p \times a) = 0$, there must exist duplicate columns in this block. Only the vertices in $A$ can be twins, and any twin pair has identical rows in the $a \times p$ block, so twin vertices have identical columns in the $p \times a$ block and $\dim(p \times a) = 0$. If there are no twin vertices, then there are no identical columns in the $p \times a$ block, so it is resolvable and $\dim(p \times a) \neq 0$. □
Twin $a$ vertices also generate duplicate column combinations in the $a \times a$ block because, except for the single zero, their rows have duplicate entries and the $a \times a$ block is symmetric along its diagonal. With their need for unique neighborhoods, no two papers can be twins. It follows that for a $G_{AP}$, the $a \times p$ block is always resolvable when using row block resolving. When $\dim(p \times a) = 0$, no $W$ for the $G_{AP}$ can contain only $p$ vertices. However, $p$ vertices can be included with $a$ vertices in a minimal $W$ where $\dim(G_{AP})$ is determined by $\dim(a \times a)$ and $\dim(a \times p)$.
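The effect of a twin pair on the $p \times a$ block can be seen directly. In the sketch below, the small $G_{AP}$ (paper p1 with authors a, b, c and paper p2 with authors c, d) and the function names are hypothetical. Twins are taken to be vertices with identical neighborhoods, which is an assumption about the definition used earlier in this note.

```python
from itertools import combinations


def twin_pairs(adj, side):
    """Pairs of vertices in `side` that have identical neighborhoods."""
    return [(u, v) for u, v in combinations(side, 2) if adj[u] == adj[v]]


def has_duplicate_columns(block):
    """True when two columns of the block are identical, so that no set of
    block rows can give unique column combinations."""
    columns = list(zip(*block))
    return len(set(columns)) < len(columns)


authors, papers = ["a", "b", "c", "d"], ["p1", "p2"]
g_ap = {
    "a": {"p1"}, "b": {"p1"}, "c": {"p1", "p2"}, "d": {"p2"},
    "p1": {"a", "b", "c"}, "p2": {"c", "d"},
}

# p x a block of the DM of g_ap: rows p1, p2; columns a, b, c, d.
p_by_a = [
    [1, 1, 1, 3],
    [3, 3, 1, 1],
]
# a x p block: the transpose of the p x a block.
a_by_p = [list(column) for column in zip(*p_by_a)]

print(twin_pairs(g_ap, authors))      # [('a', 'b')]
print(twin_pairs(g_ap, papers))       # []
print(has_duplicate_columns(p_by_a))  # True
print(has_duplicate_columns(a_by_p))  # False
```

Authors a and b share paper p1 as their only neighbor, so their columns in the $p \times a$ block coincide, while the $a \times p$ block remains resolvable because the two papers have different author sets.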
Proposition 7.
Suppose that a $G_{AP}$ has authors set $A$ and representative papers set $P$, where $G_{AP} \not\cong P_3$ and $G_{AP}$ is not a subdivided $K_n$. Utilizing $G_{AP}$’s distance matrix (DM) along with DM block resolving, if $\Delta(A) > \Delta(P)$, then $\dim(a \times a) \le \dim(p \times p)$; and if $\Delta(A) < \Delta(P)$, then $\dim(a \times a) > \dim(p \times p)$.
Proof. 
Let $\Delta(A) = x$ and $\Delta(P) = y$. From Proposition 2, when $\Delta(A) \neq \Delta(P)$, $\Delta(A)$ generates at least a $K_x$ subgraph in $G_P$ and $\Delta(P)$ generates at least a $K_y$ subgraph in $G_A$. Corollary 1 states that the $K_n$ subgraph generated by the smaller maximum partite set degree cannot exceed the $K_n$ subgraph generated by the other maximum partite set degree when $G_{AP}$ is not a subdivided $K_n$.
When $G_{AP}$ contains twins, other than the single zero in each $a \times a$ block row, the rows of twin $a$ vertices contain identical distances that create blocks of identical column combinations where the other $a$ vertices have columns indexed by the twins. Thus, when $x > y$, $\dim(a \times a)$ in this case can exceed the metric dimension of the $K_y$ double subblock and reach equality with $\dim(p \times p)$, resulting in $\dim(a \times a) \le \dim(p \times p)$. When $x < y$, because $p$ vertices cannot be twins, equality does not occur. The $p \times p$ block has at least a $K_x$ double subblock while the $a \times a$ block has at least a $K_y$ double subblock, so $\dim(a \times a) > \dim(p \times p)$. □
Because metric dimension is defined across all of $G$’s vertices, there exists no “efficiency” that could generate $\dim(G_{AP}) < \min\{\dim(G_A), \dim(G_P)\}$. There is also no inefficiency that could produce $\dim(G_{AP}) > \max\{\dim(a \times a), \dim(a \times p)\}$ or $\dim(G_{AP}) > \max\{\dim(p \times p), \dim(p \times a)\}$. In other words, because $V(G_{AP})$ is the union of $A$ and $P$, and both $\max\{\dim(a \times a), \dim(a \times p)\}$ and $\max\{\dim(p \times p), \dim(p \times a)\}$ minimally resolve the sets $A$ and $P$ across all of $V(G_{AP})$, $\dim(G_{AP}) \le \max\{\dim(a \times a), \dim(a \times p)\}$ and $\dim(G_{AP}) \le \max\{\dim(p \times p), \dim(p \times a)\}$. This proves Lemma 1.
Lemma 1.
Suppose bipartite $G_{AP}$ has authors set $A$ and representative papers set $P$. Concerning the distance matrix (DM) for $G_{AP}$ and using DM block resolving, if $\dim(a \times a) \neq \dim(p \times p)$, then $\dim(\mathrm{odd}) \le \max\{\dim(a \times a), \dim(p \times p)\}$.
Theorem 1 states that if $\dim(a \times a) \neq \dim(p \times p)$, then only three DM blocks need to be resolved.
Theorem 1.
For $G_{AP}$ with authors $a \in A$ and representative papers $p \in P$, with respect to block resolving $G_{AP}$’s distance matrix (DM), if $\dim(a \times a) \neq \dim(p \times p)$, then resolving only three blocks determines $\dim(G_{AP})$.
Proof. 
Let $\dim(a \times a) = x$ and $\dim(p \times p) = y$, where $x \neq y$. It is a given that finding $\dim(G_{AP})$ requires determining $\dim(a \times a)$ and $\dim(p \times p)$. From Lemma 1, $\{\dim(a \times p), \dim(p \times a)\} \le \max\{\dim(a \times a), \dim(p \times p)\}$. First let $x < y$. As the goal is to find a minimal value for $\dim(G_{AP})$, resolving $a \times p$ in this case gives the minimal value of a possible $W$ for all vertices in $G_{AP}$. If $\dim(a \times p) \le \dim(a \times a)$, then $\dim(G_{AP}) = \dim(a \times a)$; and if $\dim(a \times a) < \dim(a \times p) \le \dim(p \times p)$, then $\dim(G_{AP}) = \dim(a \times p)$. If $a \times p$ contains identical rows, making $\dim(p \times a) = 0$ and indicating the existence of twins in $G_{AP}$, then $\dim(G_{AP}) = \max\{\dim(a \times a), \dim(a \times p)\}$. Now let $x > y$. Because $p \times a$ is the extension of $p \times p$, and using the same logic as when $x < y$, $\dim(G_{AP}) = \max\{\dim(p \times p), \dim(p \times a)\}$. In either case for $x$ and $y$, only three blocks need to be resolved in order to find $\dim(G_{AP})$. □
The following proposition follows from Proposition 5, which shows the existence of the $K_n$ double subblocks.
Proposition 8.
Suppose $G_{AP}$ has authors set $A$ and representative papers set $P$. If $\dim(G_A) = \dim(G_P)$, then $\dim(G_{AP}) = \dim(G_A) = \dim(G_P)$.
Proof. 
Suppose $\dim(G_A) = \dim(G_P) = x$, so $\dim(\mathrm{even}) = x$. Let $x < y$. If $\dim(\mathrm{odd}) = x$, then it is a given that $\dim(G_{AP}) = x$. If $\dim(a \times p) = x$ and $\dim(p \times a) = y$, then $\dim(G_{AP}) = x$ because $x$ vertices of $A$ minimally resolve the graph. It follows that if $\dim(a \times p) = y$ and $\dim(p \times a) = x$, then $G_{AP}$ is minimally resolved by $x$ vertices of $P$, so $\dim(G_{AP}) = x$. If $\dim(\mathrm{odd}) = y$, then a combination of $a$ and $p$ vertices whose numbers total $x$ can minimally resolve $G_{AP}$, giving $\dim(G_{AP}) = x = \dim(G_A) = \dim(G_P)$. □
Proposition 9 is focused on using maximum degree and r to find dim ( G A P ) . Recall that R r indicates the degree diagram row indicated by the r type and ( R r ) denotes the length of this row. The table in Figure 1 is referenced in the following proof of Proposition 9.
Proposition 9.
Assume that $G_{AP}$ has authors set $A$ and representative papers set $P$. If $\Delta(A) = \Delta(P)$, $r(A) = r(P)$ and the $r$ corner type for both $A$ and $P$ is the same, then $\dim(G_{AP}) = \dim(G_A) = \dim(G_P)$.
Proof. 
Assume Δ ( A ) = Δ ( P ) = x . Let G A P P n where n = | A | + | P | and n is odd. Because Δ ( A ) = Δ ( P ) and r ( A ) = r ( P ) with the same corner type, | A | 5 and | P | 4 , Δ ( A ) = Δ ( P ) = 2 , r r ( A ) = r r ( P ) = 2 and it is a given that dim ( G A P ) = dim ( G A ) = dim ( G P ) = 1 .
Recall that for any $G_{AP}$, $\sum_{a \in A} \deg(a) = \sum_{p \in P} \deg(p)$. Because $\Delta(A) = \Delta(P) = x$, the degree diagrams for both $A$ and $P$ have the same top row. Because $r(A) = r(P)$ and both $r$ have the same corner type, for both diagrams, the rows above and below $R_r$, plus row $R_r$ itself, have the same general structure, as shown by the table in Figure 1. Deviation from similarity is controlled by the fact that $\sum_{a \in A} \deg(a) = \sum_{p \in P} \deg(p)$. Thus, $A$ and $P$ must have very similar or identical large degree structures. From Proposition 2 there exists a $K_x$ subgraph in the projection graphs. The similarity of the large degree structures forces the degree structures of the projection graphs to deviate from the $K_x$ subgraph structure in similar ways. Hence, if $\Delta(A) = \Delta(P)$, $r(A) = r(P)$ and the $r$ corner type for both $A$ and $P$ is the same, then $\dim(G_{AP}) = \dim(G_A) = \dim(G_P)$. □

10. Conclusions

10.1. Conclusions Regarding Social Aspects

Concerning the impact of paper exclusion, compared to studies utilizing large databases, the small data set of 245 professors allowed for detailed department level analysis. To minimize graph size yet reflect the collaboration structure, this study used representative papers that reflected unique research groups within the biology, mathematics and physics departments. Overall, 50% of the professors participated in departmental collaboration during this study’s timeframe.
Utilizing the average graph concept and constructing authors only graphs ($G_A$), papers only graphs ($G_P$), and bipartite authors with papers graphs ($G_{AP}$), this study found substantial differences when comparing the three average $G_A$ large components to the three average $G_{AP}$ large components. An analysis of department research hubs revealed a change of 27% between the $G_A$ hubs and the $G_{AP}$ hubs. Based on this fact, the conjecture in Section 6.2 states that excluding papers/research groups might make identifying the model for network evolution impossible.
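One reading of the 27% figure that is consistent with the hub totals in Table 2 is the relative change in the total number of hubs across the three disciplines. The text does not spell out the calculation, so the following worked arithmetic is an assumption about how the figure was obtained:

```latex
\[
\frac{(11 + 9 + 8) - (9 + 5 + 8)}{9 + 5 + 8}
  = \frac{28 - 22}{22}
  \approx 0.273
  \approx 27\%.
\]
```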
Similar to other studies, when overall papers were considered, the physics departments generated 3.5 times as many papers as the other two departments. Comparing the three average G A P large components in Figure 8, the physics G A P displays significantly larger author vertex degree, and for two of the three physics departments, the number of representative papers, that align with unique research groups, outnumbered authors by 150%. These two results reflect that the physics professors in this study have greater variety in the construction of their research groups. Could the physics style of departmental collaboration result in higher paper/research output?
Compared to the results in [24], the percentage of vertices in this note’s $G_A$ large components is much lower. At what investigation level does the order of the large component get close to the 60% to 90+% inclusion range found in other studies involving larger data sets? Would exploring institutional collaboration create large components with higher percentages of inclusion?

10.2. Conclusions Regarding Network Analysis

Of the 18 $G_A$ and $G_P$ largest components constructed from collected data, 33% were bipartite. Results showed that the same $G_A$ structure resulted from very different $G_{AP}$ structures. This reflects that $G_A$ interpretation of the related $G_{AP}$ is not well defined, reiterating the need for paper inclusion.
This note showed that the distance matrix (DM) of any $G_{AP}$, along with DM block resolving, provides an accurate and more efficient method for finding the metric dimension, $\dim(G_{AP})$, of any $G_{AP}$. Metric dimension comparisons of the $G_A$ and $G_P$ to the related $G_{AP}$ showed that paper structure significantly influences the network’s distance metrics, giving greater foundation for the conjecture in Section 6.2. In other words, $\dim(G_A)$ alone cannot be reliably used to predict $\dim(G_{AP})$.
Although DM block resolving provides an accurate and more computationally efficient method for finding the metric dimension for bipartite graphs, many networks are not bipartite. Can a method be developed, perhaps by a labeling methodology, that creates clear blocks in any graph’s DM?
The challenges faced with paper inclusion in the collaboration studies based on the large databases are significant. As done in this study, utilizing representative papers reduces the number of papers and accurately reflects unique research groups. However, the large number of authors on many papers poses a potential problem in defining the representative papers. Can a co-authorship “core” for papers be identified? Is there some other strategy that allows for paper inclusion without hindering collaboration network analysis due to enormous network order and size?

References

  1. Anderson TR, Hankin RKS, Killworth PD (2008) Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics 76(3):577-588.
  2. Andrews GE, Eriksson K (2004) Integer Partitions. Cambridge Univ. Press, Cambridge.
  3. Anuradha A, Amutha B (2019) A study on metric dimension of some families of graphs. AIP Conference Proceedings 2112(1):1-5.
  4. Baca M, Baskoro ET, Salman ANM, Saputro SW, Suprijanto D (2011) The metric dimension of regular bipartite graphs. Bulletin mathématique de la Société des Sciences Mathématiques de Roumanie 54(102):15-28.
  5. Blanchard CG, Becker JV, Bristow AR (1979) Attitudes of Southern Women: Selected Group Comparisons. Psychology of Women Quarterly 1(2):160-171.
  6. Garey MR, Johnson DS (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY.
  7. Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s Theory. The Psychological Review 63(5):277-293.
  8. Chartrand G, Eroh L, Johnson MA, Oellermann OR (2000) Resolvability in graphs and the metric dimension of a graph. Discrete Applied Mathematics 105:99-113.
  9. Chartrand G, Lesniak L, Zhang P (2016) Graphs & Digraphs, 6th Ed. CRC Press, Boca Raton, FL.
  10. Cummins CJ, King RC (1987) Young diagrams, supercharacters of OSp(M/N) and modification rules. Journal of Physics A: Mathematical and General 20:3103-3120.
  11. Diaz J, Pottonen O, Serna M, van Leeuwen EJ (2017) Complexity of metric dimension on planar graphs. Journal of Computer and System Sciences 83:132-158.
  12. Diestel R (2017) Graph Theory, 5th Ed. Springer: Graduate Texts in Mathematics series, Berlin.
  13. Harary F, Norman RZ (1953) Graph theory as a mathematical model in social science. Bulletin de l’Institut de recherches économiques et sociales 26(8).
  14. Harary F, Melter RA (1976) On the metric dimension of a graph. Ars Combinatoria 2:191-195.
  15. Hartung S, Nichterlein A (2013) On the parameterized and approximation hardness of metric dimension. 2013 IEEE Conference on Computational Complexity, Stanford University, Palo Alto, CA. 266-276.
  16. Heinze T, Shapira P, Rogers JD, Senker JM (2009) Organizational and institutional influences on creativity in scientific research. Research Policy 38:610-623.
  17. Janežič D, Miličević A, Nikolić S, Trinajstić N (2015) Graph-Theoretical Matrices in Chemistry, 2nd Ed. CRC Press, Taylor & Francis Group, Boca Raton, FL.
  18. Kyvik S, Reymert I (2017) Research collaboration in groups and networks: differences across academic fields. Scientometrics 113:951-967.
  19. Lovász L (2010) Graphs and Geometry. American Mathematical Society. Volume 65. Providence, RI.
  20. Merris R (2003) Combinatorics, 2nd Ed. Wiley Interscience, John Wiley & Sons, Inc. Hoboken, NJ.
  21. Merris R (2001) Graph Theory. Wiley Interscience, John Wiley & Sons, Inc. Hoboken, NJ.
  22. Merton RK (1968) The Matthew effect in science. Science 159(3810):56-63.
  23. Milgram S (1967) The small-world problem. Psychology Today 1(1, May):61-67.
  24. Newman M (2001) The structure of scientific collaboration networks. PNAS 98(2):404-409.
  25. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. PNAS 99(12):7821-7826.
  26. Newman M (2004) Coauthorship networks and patterns of scientific collaboration. PNAS 101(1):5200-5205.
  27. Newman MEJ (2005) Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46(5):323-351.
  28. Newman M (2018) Networks, 2nd Ed. Oxford Press. Oxford, England.
  29. Prabhu S, Jeba SR, Stephen S (2025) Metric dimension of a star fan graph. Scientific Reports 15(102). 102-108.
  30. de Solla Price DJ (1965) Networks of scientific papers: the pattern of bibliographic references indicates the nature of the scientific front. Science 149(3683):510-515.
  31. Ruscio J, Seaman F, D’Oriano C, Stremlo E, Mahalchik K (2012) Measuring scholarly impact using modern citation-based indices. Measurement 10:123-146.
  32. Sciriha I, Farrugia S (2011) On the spectrum of threshold graphs. ISRN Discrete Mathematics 1-21.
  33. Slater PJ (1975) Leaves of trees (Proc. 6th Southeastern Conference on Combinatorics, Graph Theory, and Computing, Florida Atlantic Univ., Boca Raton, FL). Congressus Numerantium 14:549-559.
  34. Tillquist RC, Frongillo RM, Lladser ME (2023) Getting the lay of the land in discrete space: a survey of metric dimension and its applications. SIAM Review 65(4):919-962.
  35. Trinajstić N (1992) Chemical Graph Theory. Taylor and Francis, LLC; Boca Raton, FL.
  36. van Rijnsoever FJ, Hessels LK, Vandeberg RIJ (2008) A resource-based view on the interactions of university researchers. Research Policy 37:1255-1266.
  37. Tapendra BC, Dueck S (2025) The metric dimension of circulant graphs. Opuscula Mathematica 45(1):39-51.
  38. Wagner CS, Leydesdorff L (2005) Network structure, self-organization, and the growth of international collaboration in science. Research Policy 34:1608-1618.
  39. Wang J, Tian F, Liu Y, Pang J, Miao L (2023) On graphs of order n with metric dimension n-4. Graphs and Combinatorics 39(29):1-18.
  40. Watts DJ (2003) Six Degrees: The Science of a Connected Age. W.W. Norton & Co., New York, NY.
  41. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440-442.
Figure 1. Five types of $r$ corners and a table with respect to degree diagram row length.
Figure 2. Graph $G$, two $W_i$ resolving sets for $G$ and $G$’s distance matrix, DM.
Figure 3. Examples of largest components from data derived $G_A$, $G_{AP}$ and $G_P$.
Figure 4. Example with central degree 6 author and adjacent gray papers.
Figure 5. With respect to Proposition 2, two $G_{AP}$, with the $G_A$ related to each $G_{AP}$, where $G_P \cong G_A$.
Figure 6. With respect to Proposition 2, a $G_{AP}$ with related $G_A$ and $G_P$.
Figure 7. The average $G_A$ graphs for the three disciplines.
Figure 8. The average $G_{AP}$ graphs for the three disciplines.
Figure 9. The average $G_P$ graphs for the three disciplines.
Figure 10. $G_{AP}$ with maximum degree 3 in both $A$ and $P$ vertex sets.
Figure 11. Top: Data derived $G_{AP}$ and the degree diagrams for $G_{AP}$, vertex set $A$ and set $P$. Bottom: The related $G_A$ and $G_P$ with degree diagrams.
Figure 12. Example 1 for DM block resolving that displays $G_{AP}$, $G_{AP}$’s DM and the DM’s $\dim(\mathrm{block})$ box.
Figure 13. Example 2 for DM block resolving.
Figure 14. Example 3 for DM block resolving.
Figure 15. Example 4 for DM block resolving.
Figure 16. A $G_{AP}$ where $|A| > |P|$ and $\dim(p \times a) = 0$ due to twin pair $\{b, c\}$.
Table 1. Faculty collaboration information for the nine departments.
Discipline    Mean # faculty    Mean # (%) collaborating
Math          29                14 (48%)
Physics       26                13 (50%)
Biology       27                14 (52%)
Table 2. $G_A$ and $G_{AP}$ hub analysis based on collected data from the nine departments.
Discipline    $G_A$ total # hubs    $G_A$ chair as hub    $G_{AP}$ total # hubs    $G_{AP}$ chair as hub
Math          9                     1                     11                       2
Physics       5                     1                     9                        0
Biology       8                     1                     8                        1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.