1. Introduction
Vehicle trajectory analysis has become essential to address mobility problems in complex urban environments, where traffic and congestion present increasing challenges. The implementation of artificial intelligence and data mining in this field makes it possible to identify travel patterns from large volumes of data, facilitating the understanding of traffic flows and their relationship with road infrastructure [1,2]. This type of analysis supports traffic planners and managers in making data-driven decisions, helping to reduce congestion and optimize the use of road networks [3].
In addition, intelligent transportation systems, which employ advanced spatial analysis tools, enable real-time traffic monitoring, detecting critical points and assessing safety conditions in various areas of the city [4]. This constant monitoring enables a timely response to road infrastructure problems, while improving road safety and maintenance [5]. In turn, spatial data processing techniques have applications that go beyond transportation, providing value in areas such as massive data analysis and the study of consumer behavior, extending the impact of these advances to various sectors [6,7].
The analysis of scientific production in emerging areas, such as intelligent transportation systems and the study of vehicle trajectories, provides a better understanding of the trends and impact of these fields in science and technology. Bibliometric studies provide a valuable framework for observing how research in these topics has evolved, revealing patterns of collaboration, citations, and relevance that reflect their growing importance in the scientific community.
Bibliometrics is a discipline that has grown considerably within the scientific community in recent years. Eugene Garfield, with the establishment of the Institute for Scientific Information (ISI) in the 1960s, initiated the systematic measurement of articles, journals, researchers, and institutions [8]. Bibliometric research examines authorship, publication, citations, and content by applying quantitative measures to a body (corpus) of literature [9]. Currently, scientific articles are stored and indexed in large scientific databases, which makes it possible to measure attributes such as keywords, number of citations, number of authors, author collaboration and impact, and annual scientific production. The underlying idea is that a work receiving more citations within a scientific field is taken to be more important, of higher quality, and more notable [10]. The rationale is as follows: authors cite other papers because of their connection with the central theme of their own research. Since authors are free to choose which articles to cite and tend to include only the most relevant ones, frequently cited articles can be taken as an indication of impact or importance within their field. This information can be leveraged by various institutions, as it describes both individual and aggregate impact. It can therefore support the recruitment of faculty or the design of research strategies in universities and research councils. Bibliometric studies can also document the history of a given topic and reveal its scope and trends, which helps new researchers gauge the impact of a research topic within their field [11]. This type of analysis is made possible by the availability of large bibliographic databases such as Scopus and Web of Science. These indexing services are an important resource for evaluation processes in academia.
Scopus is a bibliographic database that collects citations and abstracts from a wide variety of neutral sources. These resources are carefully selected by independent experts who are recognized leaders in their respective disciplinary fields. Scopus offers researchers a range of discovery and analysis tools. This platform not only facilitates the search and retrieval of relevant information, but also promotes collaboration and the exchange of ideas between individuals and institutions in the scientific community. With a broad scope, Scopus indexes content from more than 7000 publishers, covering a diversity of disciplines. In addition, it hosts a vast data collection, with more than 91 million records, including more than 94000 affiliation profiles and the contribution of more than 17 million authors.
At a macroscopic level, it is possible to define metrics that are common to many journals and useful to different stakeholders. However, some characteristics change from one context or discipline to another, and researchers and journals perform unevenly. In recent years, the number of journals and their publication frequency have both increased, possibly owing to the gradual expansion of the academic sector in several countries over the last decade. In addition, scientific disciplines differ in their publication practices. It is therefore important to study the characteristics of each discipline and of equivalent topics in order to provide a meaningful classification of bibliometric parameters.
The objective of this paper is to analyze the metadata of all articles indexed in the Scopus bibliographic database that address “algorithms or methods for GPS trajectory clustering”. The samples returned by the database were manually filtered to exclude all articles that are not part of the field of study. This article provides useful information on the main journals that publish articles on this topic, as well as on the evolution of the field over time. Other aspects are also discussed, such as the most cited authors, the research areas in which these articles are most often published, the number of publications per year, strategic diagrams of the impact of the topics, and the thematic evolution.
The bibliometric analysis is presented graphically using VOSviewer, a software tool for creating, visualizing, and exploring maps based on network data [12], including maps of citations, sources, and authors. In addition, the bibliometrix package for the R programming language and its graphical interface biblioshiny, developed by Aria and Cuccurullo [13], are used to analyze the geographical distribution of corresponding authors, the most cited articles, the main keywords, the main publication sources, the strategic diagrams of the keywords, and the thematic evolution of the keywords. Both tools are open source, which gives the researcher access to all of their functionalities, such as the analysis of the most cited articles and of co-authorship.
The remainder of the paper is structured as follows. Section 2 describes the materials and methods used in the analysis. Section 3 details the data under analysis and the main findings obtained with bibliometrix and its graphical interface biblioshiny, together with the VOSviewer analysis of the selected indicators. Finally, Section 4 draws the main conclusions and outlines possible lines of research that can be derived from the analysis.
2. Materials and Methods
To analyze the scientific production in “GPS Trajectory Clustering”, a detailed methodological approach was developed that integrated bibliometric analysis and specialized data visualization tools. First, 559 articles were extracted from the Scopus database, selected through a rigorous filtering process that excluded those works without direct relevance to the research topic. The collection of these articles considered publications from 2002 to 2023 and covered papers that included both general analyses and specific developments in GPS trajectory clustering algorithms.
Visualization and analysis of bibliographic networks were performed using VOSviewer [12], a specialized tool that allows mapping networks of co-citation, author collaboration and thematic distribution. This software facilitated the creation of structural graphs showing the relationships between the most relevant articles, journals and authors in the field of study. Likewise, the analysis was complemented with the use of bibliometrix and its interface biblioshiny, developed in the R language, which enabled the evaluation of keywords, thematic evolution and the identification of emerging trends. Both programs, freely available and open source, offer the advantage of replicability and allow other researchers to extend the application of this method to similar studies.
To quantify the concentration of variables, such as the distribution of authors, countries and research areas, Shannon entropy was used to evaluate the homogeneity of data dispersion. The values obtained were used to interpret the patterns of concentration in authorship and international collaborations, as well as the diversification of topics in the literature on GPS trajectories.
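As an illustration of this entropy-based measure, the following Python sketch computes a normalized Shannon entropy from category counts. The normalization by the maximum entropy (the logarithm of the number of categories) is an assumption made here so that values fall between 0 and 1, as the values in Table 9 do; the exact formula used in the study is not stated. The function name and the sample counts are hypothetical.

```python
import math


def normalized_shannon_entropy(counts):
    """Return H / H_max in [0, 1] for a sequence of non-negative category counts."""
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0  # entropy is undefined or trivially zero in these cases
    # Shannon entropy H = -sum(p_i * ln p_i), skipping empty categories
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Assumption: normalize by the maximum entropy ln(k) of k categories
    return h / math.log(k)


# Hypothetical article counts per country (illustrative only, not the study data)
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
print(round(normalized_shannon_entropy(even), 4))
print(round(normalized_shannon_entropy(skewed), 4))
```

Under this normalization, a value close to 1 indicates that records are spread evenly across categories, while a value close to 0 indicates that a few categories dominate.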
4. Conclusions
This analysis shows that GPS trajectory clustering combines urban planning with the study of the effects that vehicles have on streets, roads or highways. This would not be possible without the Global Positioning System (GPS), or without trajectory clustering algorithms such as TraClus, K-means, and Tra-DBSCAN, among others. This article contributes a bibliometric analysis of clustering algorithms and methods for GPS trajectories based on 559 articles indexed in Scopus. These records yielded significant results regarding the relationships between keywords, authors, and citations, among others. It was also found that some important articles are not indexed in Scopus, for example, “Time-focused clustering of trajectories of moving objects” by Nanni and Pedreschi [35], which other bibliographic sources report as highly cited.
Table 9 shows a high concentration of authors from China, although a diversity of countries is still represented. In addition, Figure 8 shows that the citations between articles are closely related, possibly indicating that the topic of study is consolidating.
Although there is a wide variety of clustering algorithms for trajectories, very few literature reviews address how they work or which fields of research they can target. The only one that does so, to a limited extent, is Yuan et al. [36], whose analysis of trajectory clustering algorithms is framed in a general context. According to the bibliometric review, the study of GPS data obtained from vehicles can help solve both road and urban problems. In this line, there is still a lack of studies that provide starting guidelines for new researchers who wish to enter the field of GPS trajectory clustering, for example: identifying which roads, air or sea spaces are the most suitable for the rapid mobility of multimodal means of transport; planning the routes of modern urbanizations, or of those under construction, to reduce vehicular traffic; analyzing which patterns cause traffic accidents in order to avoid them; determining which routes are the most feasible for autonomous vehicles; and establishing safe roads, streets or highways for people using other means of transportation such as bicycles, scooters, and skateboards. Finally, reviews could also cover trajectory clustering algorithms applied to other areas, such as the mobility or migration of animals and people, the trajectories of robots and unmanned aerial vehicles, and hurricane trajectory analysis. In this respect, there are almost no papers indicating the trend of clustering algorithms or methods for GPS trajectories in these fields of research.
Figure 1.
Strategic diagram of the KeyWords Plus, generated with bibliometrix. Source: Scopus.
Figure 2.
Strategic diagram of the author’s keywords, generated with bibliometrix. Source: Scopus.
Figure 3.
Strategic diagram of the author’s keywords Plus, generated with bibliometrix. Source: Scopus.
Figure 4.
Strategic diagram of the author’s keywords, generated with bibliometrix. Source: Scopus.
Figure 5.
Map of word clouds in titles and abstracts (full count), generated with VOSviewer. Source: Scopus.
Figure 6.
Map of word clouds in titles and abstracts (binary count), generated with VOSviewer. Source: Scopus.
Figure 7.
Cloud map of journals where articles on “GPS trajectory clustering” were published, generated with VOSviewer. Source: Scopus.
Figure 8.
Cloud maps were published from authors’ journals with papers on “GPS trajectory clustering”, generated with VOSviewer. Source: Scopus.
Table 1.
Main areas of research assigned to the sample papers. Source: Scopus.
| Research areas | Records | % of 1094 |
| --- | --- | --- |
| Computer science | 391 | 35.74% |
| Engineering | 176 | 16.09% |
| Social sciences | 125 | 11.43% |
| Mathematics | 123 | 11.24% |
| Earth and planetary sciences | 69 | 6.31% |
| Total of the 5 main research areas | 884 | 80.80% |
Table 2.
Number of articles published per year. Source: Scopus.
| Years | Items | Annual growth rate |
| --- | --- | --- |
| 2002 | 2 | - |
| 2003 | 2 | 0.00% |
| 2004 | 1 | -50.00% |
| 2005 | 0 | -100.00% |
| 2006 | 0 | - |
| 2007 | 1 | - |
| 2008 | 2 | 100.00% |
| 2009 | 11 | 450.00% |
| 2010 | 12 | 9.09% |
| 2011 | 12 | 0.00% |
| 2012 | 15 | 25.00% |
| 2013 | 21 | 40.00% |
| 2014 | 30 | 42.86% |
| 2015 | 31 | 3.33% |
| 2016 | 37 | 19.35% |
| 2017 | 48 | 29.73% |
| 2018 | 55 | 14.58% |
| 2019 | 68 | 23.64% |
| 2020 | 57 | -16.18% |
| 2021 | 52 | -8.77% |
| 2022 | 60 | 15.38% |
| 2023 | 42 | -30.00% |
| Total | 559 | 15.6% |
Table 3.
Ten countries of corresponding authors. Source: Scopus.
| Country | Articles | Frequency | SCP | MCP | MCP Ratio |
| --- | --- | --- | --- | --- | --- |
| China | 203 | 36.3% | 159 | 44 | 21.7% |
| USA | 29 | 5.2% | 17 | 12 | 41.4% |
| India | 16 | 2.9% | 14 | 2 | 12.5% |
| Italy | 13 | 2.3% | 12 | 1 | 7.7% |
| Korea | 11 | 2.0% | 8 | 3 | 27.3% |
| Portugal | 8 | 1.4% | 4 | 4 | 50.0% |
| Japan | 6 | 1.1% | 5 | 1 | 16.7% |
| Australia | 5 | 0.9% | 3 | 2 | 40.0% |
| France | 5 | 0.9% | 2 | 3 | 60.0% |
| Germany | 5 | 0.9% | 3 | 2 | 40.0% |
| Total 10 countries | 301 | 53.9% | 227 | 74 | 31.7% |
Table 4.
Top ten total citations by country. Source: Scopus.
| Country | Total citations | Average citations of articles |
| --- | --- | --- |
| China | 3908 | 19.30 |
| USA | 1004 | 34.60 |
| Turkey | 400 | 400.00 |
| Italy | 202 | 15.50 |
| Hong Kong | 173 | 34.60 |
| Switzerland | 173 | 34.60 |
| Greece | 167 | 33.40 |
| Spain | 156 | 39.00 |
| France | 127 | 25.40 |
| Australia | 118 | 23.60 |
| Total (all countries) | 7138 | 21.92 |
Table 5.
The ten most relevant sources. Source: Scopus.
| Sources | Articles | Type |
| --- | --- | --- |
| Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 38 | Conference Proceedings |
| ISPRS International Journal of Geo-Information | 17 | Journal |
| IEEE Access | 15 | Journal |
| GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems | 13 | Conference Proceedings |
| ACM International Conference Proceeding Series | 12 | Conference Proceedings |
| International Journal of Geographical Information Science | 11 | Journal |
| IEEE Transactions on Intelligent Transportation Systems | 8 | Journal |
| International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives | 8 | Journal |
| Jiaotong Yunshu Xitong Gongcheng Yu Xinxi/Journal of Transportation Systems Engineering and Information Technology | 7 | Journal |
| Transactions in GIS | 7 | Journal |
Table 6.
The ten most cited articles, ordered descending by number of citations. Source: Scopus.
| Author (year) and Title | Source | Citations |
| --- | --- | --- |
| Yuan et al. (2010). T-drive: driving directions based on taxi trajectories [15]. | GIS: International Conference on Advances in Geographic Information Systems | 884 |
| Abul et al. (2008). Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases [16]. | 2008 IEEE 24th International Conference on Data Engineering | 400 |
| Jing Yuan et al. (2013). T-Drive: Enhancing Driving Directions with Taxi Drivers’ Intelligence [17]. | IEEE Xplore | 348 |
| Tang et al. (2015). Uncovering urban human mobility from large scale taxi GPS data [18]. | Physica A: Statistical Mechanics and its Applications | 232 |
| Schroedl et al. (2004). Mining GPS Traces for Map Refinement [19]. | Data Mining and Knowledge Discovery | 197 |
| Guo et al. (2012). Discovering Spatial Patterns in Origin-Destination Mobility Data [20]. | Transactions in GIS | 145 |
| Abul et al. (2010). Anonymization of moving objects databases by clustering and perturbation [21]. | Information Systems | 144 |
| Li et al. (2010). Incremental Clustering for Trajectories [22]. | Springer Berlin Heidelberg | 132 |
| Chen et al. (2018). TripImputor: Real-Time Imputing Taxi Trip Purpose Leveraging Multi-Sourced Urban Data [23]. | IEEE Transactions on Intelligent Transportation Systems | 125 |
| Monreale et al. (2010). Movement data anonymity through generalization [24]. | Transactions on Data Privacy | 125 |
Table 7.
Most productive authors. Source: Scopus.
| Authors | Institution | Articles |
| --- | --- | --- |
| Wang Haoyu | Yunnan University, Kunming, China | 16 |
| Li Jinhong | North China University of Technology, Beijing, China | 15 |
| Li Xue | Shandong University of Science and Technology, Qingdao, China | 12 |
| Liu Yizhi | Hunan University of Science and Technology, Xiangtan, China | 12 |
| Li Qing | Shandong University of Science and Technology, Qingdao, China | 11 |
Table 8.
Main keywords. Source: Scopus.
| Author keywords | Articles | Keywords plus | Articles |
| --- | --- | --- | --- |
| clustering | 67 | trajectories | 278 |
| trajectory | 34 | clustering algorithms | 169 |
| trajectory clustering | 27 | global positioning system | 119 |
| gps | 26 | data mining | 105 |
| gps trajectory | 24 | taxicabs | 88 |
| dbscan | 22 | cluster analysis | 75 |
| data mining | 21 | roads and streets | 70 |
| gps data | 21 | gps trajectories | 61 |
| gps trajectories | 20 | trajectory clustering | 61 |
| big data | 14 | gps | 57 |
Table 9.
Entropic concentration index (H) of the selected variables. Source: Scopus.
| Variable | H |
| --- | --- |
| Authors | 0.9665 |
| Sources | 0.9211 |
| Countries | 0.5375 |
| Areas of research | 0.6682 |
| Article citations | 0.8169 |
Table 10.
Observed distribution of the number of authors who wrote a given number of articles and adjusted values of Lotka’s law. Source: Scopus.
| Number of Articles | Authors | Observed frequency | Adjusted frequency |
| --- | --- | --- | --- |
| 1 | 1080 | 0.7627 | 0.7630 |
| 2 | 184 | 0.1299 | 0.1300 |
| 3 | 64 | 0.0452 | 0.0450 |
| 4 | 35 | 0.0247 | 0.0250 |
| 5 | 17 | 0.0120 | 0.0120 |
| 6 | 12 | 0.0085 | 0.0080 |
| 7 | 6 | 0.0042 | 0.0040 |
| 8 | 6 | 0.0042 | 0.0040 |
| 9 | 2 | 0.0014 | 0.0010 |
| 10 | 2 | 0.0014 | 0.0010 |
| 11 | 4 | 0.0028 | 0.0030 |
| 12 | 2 | 0.0014 | 0.0010 |
| 15 | 1 | 0.0007 | 0.0010 |
| 16 | 1 | 0.0007 | 0.0010 |