1. Introduction
New knowledge, ideas, and innovations arise from the development of scientific cooperation. Scientific cooperation is the joint activity of scientists aimed at creating and verifying new knowledge. Its results include scientific publications, joint scientific projects, and conferences, seminars, and other scientific events. The increase in the productivity of individual scientists and scientific teams is a factor that affects the development of innovations in the region and the state as a whole. A current direction of scientometrics is identifying the influence of demographic, social, and gender differences on publishing productivity. In works [1, 2], it was determined that the form and intensity of scientific cooperation affect publishing productivity and the creation of innovations [3]. This process is significantly influenced by the peculiarities of the social space in which scientific teams cooperate. It can be assumed that gender is one of the factors that shape patterns of scientific collaboration. The impact of gender differences on publication productivity and citation of scientific publications is described in [4]. In work [5], it was found that gender-heterogeneous working groups produce scientific results of higher quality. However, forming such groups is complicated by natural gender homophily [6]. The tendency to collaborate with peers also manifests itself in citations of scientific publications. In work [7], it was shown that scientists tend to cite publications by authors of the same gender as themselves. Gender-based questions about homophily in research are described in works [8, 9].
Respect for human dignity, equality, and human rights are critical values of the EU and other countries with a high human development index. An essential condition for ensuring these values is the implementation of a policy of gender equality and the elimination of gender gaps. Therefore, in recent decades, the influence of gender differences among performers on the composition of scientific project teams has tended to decrease. In particular, work [10] indicated that the influence of gender differences on scientific publication productivity is decreasing in current conditions, especially among young scientists, and that gender differences in the productivity of scientific activity have been disappearing recently. A few decades ago, the number of scientific publications with male authors significantly exceeded that of female authors, but this trend has now changed. For a long time, it was difficult for women to obtain positions in science, since this field was almost entirely male [11]. Moreover, even as gender representativeness in STEM education and science grew, this process was accompanied by increased gender differences in productivity and influence [12].
The prevailing situation is that there are fewer females than males in the higher academic ranks. In work [13], it is indicated that females with high individual scientific results significantly influence the productivity of their scientific group. In work [14], it is indicated that this is influenced by the higher emotional intelligence of females compared to males. Ensuring gender diversity in educational and scientific spaces is complex and multifaceted. Some aspects of gender diversity policy in university networks are described in [15]. It is important to note that gender representativeness can differ between areas of science. In work [16], the results of the work of 150,000 mathematicians were studied. It was shown that females publish less early in their careers and leave research faster than males. As a result, top mathematics journals publish fewer articles authored by women. A similar trend can be observed in computer science; however, this is a separate research task.
Even though overcoming gender gaps is one of the priorities in developed countries, questions remain as to whether scientific publications with different gender compositions of authors are cited differently and, if so, what this could be connected with. To answer these questions, a method is needed that can effectively evaluate citation impact. Traditionally, citation impact is defined as the number of times subsequent publications cite a publication.
One of the methods that can be used to evaluate the scientific publication productivity or citation impact of a scientist is the PageRank method [17]. The traditional purpose of the PageRank method is to determine the influence of a user in a social network or to evaluate the importance of web pages. Each network user or page is assigned a real number that measures importance or reputation. The larger this number, the higher the importance [18]. There are modifications of the PageRank method for calculating the productivity of scientific activities, citation indices, the reputation of scientific journals, etc. The classical PageRank method uses only edge relations and does not consider higher-order structures, particularly subgraphs. One of the modifications of the PageRank method, described in [19], includes higher-order structures in the calculation. In work [19], it is shown that this approach ranks social network users better. This approach makes sense because citation networks tend to have a complex structure, which can be taken into account when assessing the impact of citations in practice. However, it is challenging to use this method in real time: a dynamic change in the structure of the citation network requires recalculating the scores, which is cumbersome.
In [20], an iterative method for calculating PageRank is proposed, which simplifies the rating calculation. In general, the PageRank method takes into account all the information about all the citations of the network authors when evaluating. In contrast, the h-index [21] and its analogs, such as the i10-index, g-index, etc., lose the citations outside the core when calculating the productivity of scientific activity. The work [22] describes a method of calculating the scientific productivity of collective subjects (universities, scientific institutes, departments, faculties, etc.) based on the Time-Weighted PageRank Method with Citation Intensity (TWPR-CI). It is shown that the advantage of the TWPR-CI method is the higher sensitivity of the scientific productivity estimates for new collective subjects, averaged over the first ten years of observation. This sensitivity is essential and can be used for citation impact evaluation, especially for recently published works. However, the number of citations of new publications may be small, in which case the results of this method will not differ from those of the classic PageRank method.
An analysis of the continuity of research in intergender scientific cooperation [23] allows a better understanding of how scientists of different genders are involved in joint scientific projects. Well-known methods of researching patterns of scientific cooperation and choosing scientists for the organization of projects [23, 24] can also be used to study the influence of gender on scientific interaction. Also, the methods described in works [25, 26, 27, 28, 29, 30] can be used to evaluate the productivity of scientific activity, management, and competence-based selection of project executors using a gender approach. The work [31] describes a thorough study of the impact of gender inequality on scientific careers in different countries. It found that the increase in female participation in science over the past 60 years has been accompanied by a widening of the gender gap in both scientific productivity and impact.
The article hypothesizes that the citation impact of scientific publications depends on the gender composition of their authors. If this effect is detected, it may mean that the gender composition of scientific teams working on joint research affects their scientific publication productivity. This trend may differ between countries and areas of scientific research and may change over time. Accordingly, the article's goal is to analyze the citation impact of scientific publications by authors with different gender compositions. The article does not suggest that biases are conscious; biases may depend on other socioeconomic and cultural factors, but the analysis allows existing inequalities to be revealed. Identified differences in the citation of scientific publications are not a sign of discrimination based on gender but an indicator that captures the current state of publication activity.
The Citation Network Dataset (version 13), which contains more than 5 million scientific publications and 48 million citations [32] collected from databases such as DBLP [33], ACM [34], Microsoft Academic Graph [35], and others, was investigated. The construction of the database is described in more detail in [36]. The following research stages were implemented:
The citation impact is calculated for each scientific publication in the citation network. For this, a method based on the number of citations of scientific publications was used, as well as the PageRank method [37, 38].
All publications are divided into eight classes according to the gender composition of their authors. The class of a publication is determined from the genders of its authors, which are identified by a dedicated service that determines the gender of a person by their first name.
To establish the dependence of the citation impact of scientific publications on the gender composition of their authors, the results obtained for the eight classes are compared with each other. Special attention is also paid to the citation impact scores of scientific publications by authors from different countries and to the change in these scores over time.
Researching the influence of gender differences on scientific publication productivity is relevant for the development of innovations and scientific production in general. The identified gender inequality in academia should be eliminated both at the level of the higher education or scientific research institution and at the state level. An increase in the scientific publishing activity of the authors contributes to the growth of the scientific productivity of the institutions with which these authors are affiliated. The described study continues the research published in works [22, 38].
2. Methods and Data
2.1. Basic Terms and Concepts
The following terms and concepts are used in the publication. Citation impact is determined by the number of times subsequent publications cite a publication. This study used the PageRank method to calculate the citation impact of scientific publications; the resulting value is called the PageRank citation impact. The traditional method of counting the total number of citations was also used to evaluate the impact of scientific publications.
The work focuses on calculating the citation impact of scientific publications with different gender compositions of authors: male, female, and mixed. This is important for understanding the regional distribution by country and the change over time in the intensity of citation of such publications.
Patterns of the gender composition of authors were identified. Each pattern corresponds to a specific class in which scientific publications were included, and each of these classes is studied separately. The citation impact of scientific publications by authors from different countries is evaluated using open data collected over a long period. This makes it possible to investigate the change in the citation impact of scientific publications of different classes over time. The amount of data is also sufficient to analyze the citation impact of scientific publications separately for different countries.
The work examines eight patterns of the gender composition of authors of scientific publications. It is assumed that each article corresponds to exactly one pattern and that the citation impact scores of articles with different patterns will differ. Accordingly, all scientific publications are divided into eight classes (subsets), one for each pattern. Let $S = \{s_1, s_2, \dots, s_n\}$ be the set of scientists, where $n$ is the number of scientists. Let $P = \{p_1, p_2, \dots, p_m\}$ be the set of scientific publications published by scientists from set $S$, where $m$ is the number of scientific publications. With each publication $p_j \in P$, one or more authors of this publication are associated. We set the authorship function $A$, which is determined by the set of pairs $(p_j, s_i)$, $p_j \in P$, $s_i \in S$, such that $s_i$ is an author of $p_j$. Let us also set the function $g: S \to \{\mathrm{F}, \mathrm{M}\}$, which determines the gender of each scientist from the set $S$. Then, for each publication $p_j$ with $k$ authors, define the tuple of the genders of its authors in byline order: $G_j = (g(s_{j_1}), g(s_{j_2}), \dots, g(s_{j_k}))$.
If for a scientific publication $p_j$ with $k \ge 2$ authors $g(s_{j_l}) = \mathrm{F}$ for all $l = 1, \dots, k$, then all authors of the publication are women, and the publication belongs to the pattern "Fff". If $k = 1$ and $g(s_{j_1}) = \mathrm{F}$, then the publication belongs to the pattern "F". If $k \ge 2$ and $g(s_{j_l}) = \mathrm{M}$ for all $l = 1, \dots, k$, then all authors of the publication are male and, accordingly, the publication belongs to the "Mmm" pattern; if $k = 1$ and $g(s_{j_1}) = \mathrm{M}$, the publication belongs to the "M" pattern. Other patterns are described in Figure 1. A capital letter at the beginning of the pattern's name indicates the gender of the first author of the scientific publication: F for female, M for male. The analysis of the classes of scientific publications corresponding to the specified patterns is sufficient for the study.
It should be noted that the gender composition of a publication is determined using a service that identifies the gender of its authors. Publications with an uncertain gender composition, for which the service cannot identify the gender of at least one author with sufficient accuracy, are numerous and are considered separately. It should also be understood that the obtained results may have some deviations, since some of the authors may identify as non-binary, which cannot be determined from a first name.
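The assignment of a publication to a pattern can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `gender_pattern`, the labels `"uncertain"` and `"F-mixed"`/`"M-mixed"`, and the encoding of genders as `"F"`, `"M"`, or `None` are assumptions. Only the four patterns named in the text ("F", "Fff", "M", "Mmm") are distinguished explicitly; the remaining patterns are defined in Figure 1 and are collapsed here into a placeholder keyed by the first author's gender.

```python
from typing import Optional, Sequence


def gender_pattern(genders: Sequence[Optional[str]]) -> str:
    """Return a pattern label for one publication.

    `genders` lists the authors' genders in byline order: "F", "M",
    or None when the gender could not be identified.
    """
    if not genders:
        raise ValueError("a publication needs at least one author")
    if any(g not in ("F", "M", None) for g in genders):
        raise ValueError("genders must be 'F', 'M', or None")
    # One unidentified author makes the whole composition uncertain.
    if any(g is None for g in genders):
        return "uncertain"
    if len(genders) == 1:
        return genders[0]  # single-author patterns "F" and "M"
    if all(g == "F" for g in genders):
        return "Fff"
    if all(g == "M" for g in genders):
        return "Mmm"
    # Hypothetical placeholder for the mixed patterns of Figure 1:
    # only the first author's gender is kept.
    return f"{genders[0]}-mixed"
```

For example, `gender_pattern(["M", "F", "M"])` yields the placeholder `"M-mixed"`, which a fuller implementation would map to the corresponding pattern from Figure 1.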
2.2. The Assessment of Citation Impact and PageRank Citation Impact of Scientific Publications
To calculate the citation impact of a scientific publication, the number of citations of this publication in other scientific publications is counted. This indicator shows the influence of a scientific publication: the higher the citation impact, the greater the influence. Let $c_j$ be the citation impact score of scientific publication $p_j$, $j = 1, \dots, m$, where $m$ is the number of publications. This indicator shows only the total number of citations, but it quantifies the interest in this publication among other authors.
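Counting citations amounts to computing the in-degree of each node of the citation network. A minimal Python sketch is given below; the input format (a dictionary that maps a publication identifier to the identifiers it cites) and the function name `citation_counts` are assumptions made for illustration, and ignoring duplicate references and self-citations is a design choice of this sketch, not something stated in the text.

```python
from collections import Counter
from typing import Dict, Iterable


def citation_counts(references: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count how many distinct publications cite each publication.

    `references` maps a publication id to the ids in its reference list.
    """
    counts: Counter = Counter()
    for citing, cited in references.items():
        for target in set(cited):      # a repeated reference counts once
            if target != citing:       # self-citations are ignored
                counts[target] += 1
    # Include publications that are never cited (score 0).
    publications = set(references) | set(counts)
    return {p: counts[p] for p in publications}
```

With `references = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}`, publication `p3` is cited twice, `p2` once, and `p1` not at all.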
The PageRank method is also used to evaluate the influence of scientific publications. This method determines the impact of a scientific publication in comparison with the other publications under consideration. According to the PageRank method, the scalar evaluation of the citation impact of a scientific publication $p_j$ is calculated according to the formula:

$$r_j = \alpha \sum_{i=1}^{m} a_{ij} r_i, \quad j = 1, \dots, m, \tag{1}$$

where $r_j$ is the PageRank citation impact score of scientific publication $p_j$, $m$ is the number of publications, $a_{ij}$ is the coefficient that determines the presence of scientific publication $p_j$ in the list of citations of publication $p_i$, $i, j = 1, \dots, m$, and $\alpha$ is a coefficient that ensures the existence of a non-trivial solution of the system of linear algebraic equations (1).
As a result of applying formula (1), a homogeneous system of linear algebraic equations is constructed:

$$B r = 0, \tag{2}$$

where $B$ is the matrix of coefficients of the system of the form

$$B = \alpha A^{\mathsf{T}} - E,$$

where $A = (a_{ij})$, $E$ is the identity matrix, and $r = (r_1, \dots, r_m)^{\mathsf{T}}$ is the column vector of unknown scores. For the system of algebraic equations (1) to have a non-trivial solution, the matrix $B$ must be degenerate, i.e., $\det B = 0$.
Let us define a subset of the Cartesian product $C \subseteq P \times P$, which determines the citation of publications: $(p_i, p_j) \in C$ if publication $p_i$ cites publication $p_j$. The set of scientific publications cited by a given publication $p_i$ is denoted by $C_i$. The coefficients of system (1) are determined by the formulas:

$$a_{ij} = b_{ij} d_i, \qquad b_{ij} = \begin{cases} 1, & p_j \in C_i, \\ 0, & p_j \notin C_i, \end{cases} \tag{3}$$

$$d_i = \frac{1}{|C_i|}, \tag{4}$$

where $b_{ij}$ is the indicator of the presence of publication $p_j$ in the list of references of publication $p_i$, and $d_i$ is the inverse of the total number of citations in publication $p_i$.
After finding the scores, it is advisable to normalize them according to formula (5), where $r_j$ is the PageRank citation impact score of scientific publication $p_j$, and $\hat{r}_j$ is the normalized PageRank citation impact score of scientific publication $p_j$, $j = 1, \dots, m$.
The more citations a scientific publication has over time, the higher its citation impact. Therefore, the citation impact of a scientific publication can be evaluated by counting the number of citations of this publication. The advantage of calculating the citation impact score of a scientific publication using the PageRank method is that this method evaluates the influence of a scientific publication by its citations relative to the citations of other scientific publications.
The citation base of scientific publications in the Citation Network Dataset (ver. 13) was analyzed, and a citation network was built. Next, the citation impact scores based on the number of citations and the PageRank citation impact scores were calculated for all scientific publications. To find the PageRank citation impact scores, it is necessary to solve the high-dimensional system of linear algebraic equations (2). The iterative Gauss-Seidel method is used to find an approximate solution of this system. At step zero, the PageRank citation impact scores of all scientific publications are equal to 1. At the $k$-th step, the PageRank citation impact score of each publication is found by the formula:

$$r_j^{(k)} = \alpha \left( \sum_{i=1}^{j-1} a_{ij} r_i^{(k)} + \sum_{i=j+1}^{m} a_{ij} r_i^{(k-1)} \right), \quad j = 1, \dots, m, \tag{6}$$

where $r_j^{(k)}$ is the approximate value of the PageRank citation impact of publication $p_j$ at the $k$-th step, $r_i^{(k-1)}$ is the approximate value of the PageRank citation impact of publication $p_i$ at the $(k-1)$-th step, and the coefficients $a_{ij}$ are calculated according to formulas (3), (4).
After each step, the maximum relative change in the PageRank citation impact scores of scientific publications is calculated according to the formula:

$$\delta^{(k)} = \max_{1 \le j \le m} \frac{\left| r_j^{(k)} - r_j^{(k-1)} \right|}{r_j^{(k)}}, \tag{7}$$

where $\delta^{(k)}$ is the maximum relative change in the PageRank citation impact scores of scientific publications at the $k$-th step. The iterative method stops if $\delta^{(k)} < \varepsilon$, where $\varepsilon$ is a small number that is specified in advance. After that, the values are normalized according to formula (5).
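The iterative process described above can be illustrated with a short Python sketch. It follows the same outline (all scores start at 1, scores are updated in place so that later publications already see the new values, and iteration stops when the maximum relative change falls below a preset small number), but it is not the authors' implementation. In particular, it swaps in the standard damped PageRank update with a damping factor of 0.85, which is an assumption made here so that scores stay positive on acyclic citation graphs; the function name, the input format, and the default parameters are likewise hypothetical, and the final normalization step is omitted.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank_gauss_seidel(
    references: Dict[str, Iterable[str]],
    damping: float = 0.85,   # assumed value, not taken from the text
    eps: float = 1e-8,       # stopping threshold for the relative change
    max_steps: int = 1000,
) -> Dict[str, float]:
    """Damped PageRank scores computed with in-place (Gauss-Seidel) sweeps.

    `references` maps a publication id to the ids in its reference list.
    """
    if not 0.0 <= damping < 1.0:
        raise ValueError("damping must lie in [0, 1)")
    pubs = sorted(set(references) | {c for cs in references.values() for c in cs})
    # citing[j] holds (i, 1/|C_i|) for every publication i that cites j.
    citing: Dict[str, List[Tuple[str, float]]] = {p: [] for p in pubs}
    for i, cited in references.items():
        targets = set(cited) - {i}          # drop duplicates and self-citations
        for j in targets:
            citing[j].append((i, 1.0 / len(targets)))

    scores = {p: 1.0 for p in pubs}         # step zero: every score equals 1
    for _ in range(max_steps):
        max_rel_change = 0.0
        for j in pubs:
            new = (1.0 - damping) + damping * sum(w * scores[i] for i, w in citing[j])
            max_rel_change = max(max_rel_change, abs(new - scores[j]) / new)
            scores[j] = new                 # later publications use this new value
        if max_rel_change < eps:
            break
    return scores
```

On the three-publication network `{"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}`, the sketch ranks `p3` above `p2` and `p2` above `p1`, in line with the citation counts. Unlike plain counting, a citation from `p1` contributes only half of its weight to each of its two references.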
A method for determining the gender composition of authors of scientific publications is proposed. The conceptual diagram of the method is shown in Figure 1. The method consists of three stages.
At the preparatory stage, the PageRank citation impact scores and the citation impact scores by the number of citations are calculated for each scientific publication.
In the first stage, the gender of the authors is determined by their first names using the genderize.io service [39]. This service determines, with a stated accuracy, whether a given first name belongs to a male or a female. The first name of each author is used to determine the author's gender. If, according to the genderize.io service, the name is male and the identification accuracy exceeds the threshold of 0.9, the author is identified as male. If the name is female and the identification accuracy exceeds the threshold of 0.9, the author is identified as female. If the identification accuracy is less than 0.9, the author's gender is considered undetermined. The threshold is chosen empirically, since the gender of the author should be identified as accurately as possible. As already indicated, a small share of the authors whom the genderize.io service identifies as male or female may be non-binary; this cannot be determined from a first name.
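The thresholding rule of this stage can be sketched in Python as follows. The sketch deliberately makes no network calls: it assumes that a name lookup has already produced a dictionary with a `gender` field (`"male"`, `"female"`, or empty) and a `probability` field, which is an assumption about the shape of the service's result. The function name `identify_gender` is hypothetical, and treating an accuracy of exactly 0.9 as undetermined is a choice of this sketch, since the text does not cover that boundary case.

```python
from typing import Optional

THRESHOLD = 0.9  # identification accuracy threshold stated in the text


def identify_gender(result: dict, threshold: float = THRESHOLD) -> Optional[str]:
    """Map one name-lookup result to "F", "M", or None (undetermined)."""
    gender = result.get("gender")
    probability = result.get("probability") or 0.0
    # The gender is accepted only when the accuracy exceeds the threshold.
    if gender not in ("female", "male") or probability <= threshold:
        return None
    return "F" if gender == "female" else "M"
```

For instance, a result of `{"gender": "male", "probability": 0.6}` is mapped to `None`, so any publication with this author falls into the subset with an uncertain gender composition.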
In the second stage, the set of scientific publications with the known gender of the authors is divided into eight subsets (Table 1). If the gender of at least one of the authors could not be determined, the article belongs to the subset with an uncertain gender composition of authors. Each author of a scientific publication has a specific affiliation. Accordingly, a publication is attributed to the countries of the higher education or scientific institutions with which its authors are affiliated.
From the Citation Network Dataset, the scientific publications affiliated with a list of countries with different gender parity scores according to the Global Gender Gap Report 2022 [40] were selected. This is necessary to check whether the citation impact scores of scientific publications by authors from certain countries correlate with the gender parity scores of these countries according to the Global Gender Gap Report 2022.
Also, to establish the dynamics of changes over time in the citation impact scores of scientific publications of different countries, the scores were calculated for the two patterns with purely male and purely female authors.
The Jupyter Notebook environment and the Python programming language were used for scientometric analysis and data set processing.
4. Discussion
4.1. Findings
The estimates of citation impact may, to some extent, reflect the productivity of the authors of these publications. The more an author's publications are cited, the more the author is published in the best scientific journals. Accordingly, such an author will have faster career growth in science, will be invited more often to participate in scientific projects, etc. There is a "closed circle" effect here: if an author's publications are poorly cited, the career growth of such an author will be slower.
Since two performance assessment methods were used, the correlation coefficient between all assessments was calculated for their comparison. The correlation coefficient calculated between the estimates by the PageRank method and the number of citations equals 0.754. The correlation coefficient was also calculated for non-zero scores, equal to 0.647. This makes it possible to argue that the methods provide related but not functionally dependent estimates. Since relative evaluations are used for comparison, the different number of scientific publications from different patterns affects the evaluation result.
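A comparison of this kind can be sketched in Python as follows. The text does not state which correlation coefficient was used; the sketch assumes the Pearson coefficient. The function names are hypothetical, and interpreting "non-zero scores" as pairs in which both scores are non-zero is also an assumption.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def drop_zero_pairs(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Keep only the pairs in which both scores are non-zero."""
    kept = [(a, b) for a, b in zip(x, y) if a != 0 and b != 0]
    return [a for a, _ in kept], [b for _, b in kept]
```

Applying `pearson` to the PageRank scores and the citation counts of the same publications, first on all pairs and then on the output of `drop_zero_pairs`, corresponds to the two coefficients reported above.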
The participation of females in science is known to be complicated, mainly by pregnancy, the need to devote more time to raising children, and the greater representation of males in the management of scientific projects. Even a short-term pause in scientific activity can affect the dynamics of career growth, the publication of high-quality scientific papers, research in scientific projects, etc. This problem can become more acute depending on the culture and the socioeconomic status of a country. Accordingly, it depends on how gender equality is ensured in the country.
Based on the results, it can be concluded that scientific publications with male authors are cited more. Accordingly, their scientific publication productivity will be higher. It is established that the citation impact of a scientific publication depends on the gender composition of its authors. This means that the gender composition of scientific teams working on joint research affects their scientific publication productivity. Considering the advantage of publications with a male composition over publications with a female composition, we can conclude that gender inequality exists. That is, the scientific publication productivity of female authors under these conditions will be lower than that of male authors.
However, the dynamics of the advantage scores of the subsets corresponding to the defined patterns for the top ten countries by publication representation in the Citation Network Dataset show an overall improvement in gender equality in science.
The citation impact scores of scientific publications by authors from certain countries were compared with the gender parity scores of these countries according to the Global Gender Gap Report 2022 [40]. It was established that the correlation coefficient is -0.168, which indicates a weak negative correlation. This can be explained by the fact that the gender parity score refers to all aspects that affect gender equality in a country. In this study, only the aspect of scientific activity is considered, particularly one of its components: publication activity and citation of scientific publications. In addition, many other socioeconomic and cultural factors influence the equal representation of females and males in science and their scientific results.
4.2. Limitations and Future Research Lines
A limitation of the study is that most publications in the Citation Network Dataset relate to the subject area of natural sciences. Accordingly, scientific publications in the social sciences or humanities are less well represented. It is possible that, for publications outside the natural sciences, the citation impact scores of scientific publications will differ from those calculated in this research. Also, note that the number of citations of scientific publications in some countries may influence the results obtained.
Another limitation is the impossibility of identifying non-binary authors, since authors were identified as male or female based on their first names.
The more citations a given article receives over time, the higher its influence and the higher the author's productivity. Accordingly, one of the directions of future research is the assessment of aspects of the organization of project teams with different gender compositions on the productivity of each team member and the team's results as a whole. Also, an essential aspect of future research is to show the dynamics of changes in the evaluations of the preferences of subsets according to the corresponding patterns. In addition, the specified patterns can be considered patterns of scientific collaborations. This can be singled out as a separate indicator for assessing gender equality in scientific activity in different countries, regions, universities, etc. The research aims to inform countries, universities, and scientific institutes of problems related to gender gaps in science and to find ways to overcome them.
5. Conclusions
The work analyzed the citation impact of scientific publications by authors with different gender compositions. The PageRank method and the number of citations were used to evaluate the citation impact of scientific publications. The citation impact of publications is estimated for different countries for eight subsets of publications that correspond to the patterns of the gender composition of their authors. The citation impact score is also calculated for the case when the gender composition of the authors of a scientific publication cannot be identified. The advantage scores of the subsets corresponding to different patterns are calculated.
Based on the Citation Network Dataset, the results of the citation impact evaluation indicate that the citation impact of publications with a male composition of authors prevails over the citation impact of publications with a female composition. That is, articles by mainly male authors are cited more than articles with a mixed or female composition of authors. The analysis of the dynamics of these advantages indicates that, in the last decade, the influence of the gender composition of the authors on citation impact decreased. This may be the result of gender equality policies in many countries. However, the obtained results still confirm the existence of gender inequality in science, which may result from cultural and socioeconomic factors or natural homophily.
The obtained results can be considered more broadly. Author groups are often stable, and the same author groups publish different publications in their field. This means that the obtained citation impact scores of scientific publications with different gender compositions of authors correspond to an assessment of the productivity of different gender patterns of scientists in scientific collaborations in different countries. This is important for intensifying the debate on ensuring gender equality and overcoming gender gaps in science. An increase in the scientific publishing activity of the authors contributes to the growth of the scientific productivity of the institutions with which these authors are affiliated. The obtained results do not imply discrimination based on gender; they indicate the peculiarities of citing scientific publications with different gender compositions of authors. The intensity of citations of such publications can be influenced by various socioeconomic, cultural, and other factors.
Appendix A (Table A1, Table A2, and Table A3) presents the cardinality of the subsets of publications that correspond to the patterns of their gender composition, as well as the average normalized PageRank scores and the citation impact of scientific publications by the number of citations for countries with more than 100 affiliated authors.