Bibliometric Knowledge Mapping of e-commerce platform operation on data mining

In the digital economy era, the e-commerce platform has evolved into a data platform ecosystem built around data resources and data mining technology systems, and the most typical applications of big data are concentrated in e-commerce. E-commerce companies should first grasp the interactive relationship among three major factors: data, technology and innovation. Because e-commerce platform operation is a multidisciplinary research field, it is not easy for researchers to obtain a panoramic view of its knowledge structure. A knowledge graph is a kind of graph that takes a knowledge domain as its object and shows the development process and structural relationships of knowledge. It is not only a visual knowledge map but also a serialized knowledge pedigree, and it gives researchers a quantitative method for describing academic status and development trends. The purpose of this research is to help researchers understand the key knowledge, evolutionary trends and research frontiers of current research. This study applies CiteSpace bibliometric analysis to data from the Web of Science database and finds that: 1) the development of the research field has gone through three stages, and some representative key scholars and key documents have been recognized; 2) the knowledge maps of document co-citation and keyword co-occurrence show the research hotspots; 3) the results of burst detection and central node analysis reveal research frontiers and development trends. Today, the visualization of big data brings different challenges. The abstraction between the world and today's data visualization occurs when the data is captured. Every user sees their own visualization data generated by standardized calculations.
At the same time, there are still many controversies in the theoretical model, structure and structural dimensions. This is the direction that future researchers need to further study.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 21 December 2020 | doi:10.20944/preprints202012.0529.v1. © 2020 by the author(s). Distributed under a Creative Commons CC BY license.
Corresponding author: Hongxia LI, No.19, Xuefu Ave, Nanan District, Chongqing, P.R.China 400067, +8602362769347, lihongxia@ctbu.edu.cn

An e-commerce platform is a kind of virtual "market". With its wireless connectivity and powerful information processing ability, it has formed a new transaction structure that greatly enhances the value creation and value capture ability of all participants.
With the e-commerce platform, buyers and sellers can complete transactions on products or services regardless of space and time, which increases the frequency of transactions and reduces transaction costs. On an e-commerce platform, analyzing clickstream data is important for improving the design and the user-friendliness of the site. Data warehouse and data mining techniques are used to process data such as clickstreams and users' personal information. There is a growing preference for online data which, when properly extracted and processed, can yield accurate and timely indicators on various economic themes. It is not easy for researchers in the field of e-commerce to get a panoramic view of the structure, evolution and key nodes of knowledge. Before bibliometric methods and tools such as CiteSpace, researchers who wanted quick access to a panoramic view of a discipline or field of study relied primarily on peer-reviewed articles or anthologies. One obvious limitation of this approach is that it is bounded by the knowledge and subjective judgment of peers: it lacks objectivity, in many cases does not fully reveal key literature and emerging research hotspots, and easily causes controversy (Peng, Zhu, Wu, 2020; Latini, 2019). The emergence of the CiteSpace analysis tool offers researchers another possibility: moving from purely subjective judgment to subjective judgment combined with objective measurement. This study is guided by this idea.
CiteSpace supports quantitative analysis of scientific and technical literature, describing, evaluating and predicting the status and trends of research fields by mathematical and statistical methods. This paper uses two common literature analysis tools, CiteSpace and VOSviewer, to study the key path of knowledge evolution in the field of e-commerce, to reveal important knowledge inflection points in the field, to analyze the potential dynamic mechanisms of evolution, and to explore the frontiers of development by drawing a series of visual maps.

2. Data Sources and Research Methodology
2.1 Data Sources
In this study, we used the Web of Science (WOS) as the data source. WOS integrates the Science Citation Index (SCI), the Social Sciences Citation Index (SSCI), the Arts & Humanities Citation Index (A&HCI), and the Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH). It is among the most comprehensive databases providing citation index data. Unlike EBSCO, Springer, Wiley-Blackwell, and other large databases, the WOS database covers all source journals of SSCI, and its sub-databases enjoy high authority in academia. The retrieval query for this data collection was "Topic = E-Commerce platform operations AND data mining", and duplicate literature was filtered out. After searching the four WOS databases (SSCI, A&HCI, CPCI-SSH, SCI), we obtained 2586 relevant bibliographic records. We then restricted the records to three document types (article, proceedings paper and review) and obtained 2518 bibliographic records. The data cover the period from 2010 to 2020 (the start year of data collection in the database is 2010). The retrieval date is November 19, 2020. In this paper, CiteSpace is used for annual publication statistics; for the analysis of authors, institutions, countries and keywords; for merging synonymous keywords and counting high-frequency keywords; and for generating the word matrix and co-occurrence matrix. With the visualization software CiteSpace, the key node literature is identified, the knowledge map is drawn, and research hotspots, research frontiers and research evolution are analyzed with the clustering algorithm and the burst term detection function.
When this study used CiteSpace to generate a knowledge map, the parameters were set according to the following steps: 1) the reference data were imported into the CiteSpace software; 2) the parameter values were set in the software to "TimeSlicing=2010-2020", "YearsPerSlice=1", "TermSource" left at its default, "TermType=BurstTerms", "Pruning=Minimum Spanning Tree", and "NodeType" selected flexibly as required by each analysis. The year 2020 in the time range is limited by the data collection period, that is, it covers data up to 18 November 2020.
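The record cleaning described in this subsection (removing duplicates and keeping only articles, proceedings papers and reviews from 2010 to 2020) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for this study: the function `clean_records` and the record layout with the keys `UT` (record identifier), `DT` (document type) and `PY` (publication year) are hypothetical and chosen only for the example.

```python
# Hypothetical record layout: each record is a dict with "UT" (unique record
# identifier), "DT" (document type) and "PY" (publication year). These keys
# are assumptions made for illustration only.
KEPT_TYPES = {"article", "proceedings paper", "review"}


def clean_records(records, first_year=2010, last_year=2020):
    """Drop duplicate records, then keep the three document types in range."""
    seen = set()
    kept = []
    for rec in records:
        if rec["UT"] in seen:  # duplicate record, skip it
            continue
        seen.add(rec["UT"])
        doc_type = rec["DT"].strip().lower()
        if doc_type in KEPT_TYPES and first_year <= rec["PY"] <= last_year:
            kept.append(rec)
    return kept


sample = [
    {"UT": "A1", "DT": "Article", "PY": 2015},
    {"UT": "A1", "DT": "Article", "PY": 2015},             # duplicate
    {"UT": "B2", "DT": "Editorial Material", "PY": 2016},  # excluded type
    {"UT": "C3", "DT": "Review", "PY": 2020},
]
cleaned = clean_records(sample)
print([rec["UT"] for rec in cleaned])
```

In this toy input, the duplicate and the editorial item are dropped, which mirrors the reduction from the raw result set to the filtered set described above.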

Research Methodology
VOSviewer is a Java-based knowledge mapping software developed by the Centre for Science and Technology Studies (CWTS) at Leiden University in the Netherlands. It visualizes the structure, evolution, cooperation and other relations in a body of literature. This paper uses VOSviewer to statistically analyze the publication years and keywords of the literature related to the operation of e-commerce platforms, and merges synonymous keywords to compute high-frequency keywords. CiteSpace is a multidimensional, time-sliced and dynamic knowledge visualization tool developed by Professor Chen Chaomei (2006) of Drexel University in the United States. The software converts abstract data into graphical representations (Chen and Liu, 2005); by analyzing and interpreting the graphics, users can see the relevant information in a research field and grasp its structural relations, evolutionary patterns and other knowledge characteristics. CiteSpace version 5.7 is used to analyze the literature selected in this article. First, the key node literature is identified by co-citation analysis, and the research hotspots and frontiers of e-commerce platform operation are analyzed by clustering and burst detection. Second, knowledge maps are drawn to analyze the research evolution trend in the field of e-commerce platform operation. Finally, the coupling relationship of the literature is used to analyze the internal relationships between disciplines.
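The keyword statistics mentioned above (merging synonymous keywords, counting high-frequency keywords and building a co-occurrence matrix) can be illustrated with a short sketch. It is not the internal procedure of VOSviewer or CiteSpace; the synonym table `SYNONYMS`, the helper names and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym table: maps variant spellings to one canonical keyword.
SYNONYMS = {"electronic commerce": "e-commerce", "ecommerce": "e-commerce"}


def normalize(keyword):
    """Lower-case a keyword and replace it by its canonical form, if any."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def keyword_statistics(keyword_lists):
    """Return (keyword frequency, keyword-pair co-occurrence) counters."""
    freq = Counter()
    cooc = Counter()
    for keywords in keyword_lists:
        unique = sorted({normalize(k) for k in keywords})
        freq.update(unique)                   # one count per paper
        cooc.update(combinations(unique, 2))  # each unordered pair once per paper
    return freq, cooc


papers = [
    ["E-commerce", "Data mining"],
    ["Electronic commerce", "Big data", "Data mining"],
    ["ecommerce", "Big data"],
]
freq, cooc = keyword_statistics(papers)
print(freq.most_common(3))
print(cooc[("data mining", "e-commerce")])
```

Sorting the keywords of each paper before pairing keeps every pair in a single orientation, so the sparse counter can be read as the upper triangle of a symmetric co-occurrence matrix.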

3. Data Analysis
3.1 Literature Analysis
3.1.1 Annual Volume Analysis
To some extent, the overall annual distribution of the literature reflects the research status of the field at a particular stage. In addition, the total number of papers published each year is an important indicator of scientific research output. We used CiteSpace to extract annual data from the WOS records, analyzed the overall research trends in the field of e-commerce, and obtained a time trend graph of the number of documents (Figure 1).
According to the trend of the number of publications, the development of e-commerce research is divided into three stages: 1) Initial stage; 2) Slow stage;

Institutional and Country Analysis
We extracted institutional data from WOS and ranked institutions by citation count (Figure 2, Table 2). The top-ranked institution is Hong Kong Polytech Univ in Cluster #1.5, with a citation count of 4. It is followed by Shandong Univ Finance & Econ (Cluster #0.5, citation count 3), Jinan Univ (Cluster #0.5, 2), Tongji Univ (Cluster #-0.5, 2), Univ N Carolina (Cluster #-0.5, 2), Fudan Univ (Cluster #1.5, 2), RMIT Univ (Cluster #1.5, 2), Chinese Culture Univ (Cluster #6.5, 2), Harbin Inst Technol (Cluster #0.5, 2) and Univ Memphis (Cluster #-0.5, 1).

Key Scholars Analysis
Key scholars lead the way in a field of research. How often an author is cited reveals the attention the author receives in the field and serves as a measure of the author's influence. Based on the previous analysis, the field has been in a rapidly evolving stage since 2015, and an analysis of highly productive authors can help researchers find the latest representative authors in the field of e-commerce (Figure 3, Table 3). The top-ranked author by centrality is WANG Z in Cluster #0.5, with a centrality of 4. The following nine authors all belong to Cluster #-0.5 and each have a centrality of 3: CHONG AYL, SHEN B, LU QH, YU Y, CHOI TM, LI L, LIU Z, ZHANG J and CHEN X.


High Citation Literature Analysis
Highly cited papers reflect the degree and focus of academic attention in a field. In current research evaluation, citation frequency is an index for measuring the importance of a paper. This paper extracts citation data from WOS and obtains the 10 papers with the highest citation frequency, as shown in Table 4.

Research Hotspots Analysis
By analyzing recent hotspots and frontiers, researchers can understand the latest research trends and identify research issues in the current stage. Based on the last development stage, we selected 2010-2020 data as the source data for hotspot and frontier analysis (2020 data cover less than one year and are not included in the analysis). In CiteSpace, "TimeSlicing=2010-2020", "Node type=Reference" and "Top=25" are set to obtain a visual knowledge map of document co-citation (Figure 4). The map has 235 nodes and 469 links, with density 0.0171, modularity Q = 0.5974 (a Q-value above 0.3 is generally considered to indicate a significant cluster structure) and Silhouette = 0.6035 (an S-value above 0.5 is generally considered to indicate reasonable clustering). Each node in the figure is drawn as a tree ring. The larger the node, the higher the overall co-citation frequency. Within the tree ring, blue represents earlier years and red represents the most recent years, and the thickness of each ring is directly related to how often the document was co-cited in that year. A crimson center indicates that the co-cited document is a burst document. Some nodes have purple outer rings, indicating that the co-cited document has relatively high centrality. A link between two co-cited references means that the two references often appear in the same article. The thicker the link, the higher the frequency of co-occurrence.
Clustering refers to the use of clustering algorithms to group collections of physical or abstract objects into classes composed of similar objects (Chen, C. and Leydesdorff, 2014). Clustering analysis is based on the similarity of the analyzed objects, and the log-likelihood ratio (LLR) algorithm is used in this paper. Hotspot clustering takes advantage of the fact that co-citation network clustering relies on link relationships rather than node attributes, so it can cluster sample spaces of arbitrary shape and converge to a globally optimal solution. The Q value and the S value are two key measures of the overall structure of the network. Keyword phrases are extracted from the titles in a particular cluster and used as the cluster label to describe and identify the cluster.
From Summary Figure 2, we can see that the research hotspots in this field are concentrated in the following items. The item ranked highest by citation count is e-commerce, with a count of 813. It is followed by model (282), electronic commerce (273), big data (246), performance (221), impact (177), technology (127), system (125), adoption (124) and innovation (120). The key article in each cluster block is the node shown in Figure 4. According to Table 5, the research objects in the field of e-commerce research are e-commerce, model, electronic commerce, big data, impact, performance, technology, adoption, innovation and system.
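The network statistics quoted for Figure 4 can be made concrete with a small worked sketch. The density of an undirected network with n nodes and m links is 2m / (n(n - 1)), and the modularity Q of a partition is the sum over clusters of (share of links inside the cluster) minus (share of link ends attached to the cluster) squared. The code below assumes the standard Newman definition of Q on an unweighted graph; it is an illustration, not CiteSpace's implementation, and the toy graph is hypothetical.

```python
from collections import defaultdict


def density(n_nodes, n_links):
    """Density of an undirected network: actual links / possible links."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> cluster label.
    """
    m = len(edges)
    internal = defaultdict(int)    # links with both ends in the same cluster
    degree_sum = defaultdict(int)  # total degree of the nodes in each cluster
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Density from the node and link counts reported for Figure 4.
print(round(density(235, 469), 4))  # 0.0171

# Toy graph (hypothetical): two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(toy_edges, toy_clusters), 4))
```

The first call reproduces the reported density from the reported node and link counts, which is a quick consistency check on the figures in the text.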
To examine the literature further, we used the Burst function and set the minimum burst duration to 1 year. We obtained the top 25 burst keywords, which reveal different stages and future trends in e-commerce. In the figure, Keywords lists the burst keywords, Year is the year in which the keyword first appears in the retrieved records, Strength is the burst strength, and Begin and End are the years in which the burst starts and ends. The interval between the start and end years is marked by small red rectangular blocks, as shown in Figure 6.

Document Co-citation Analysis
Document Co-citation Analysis (DCA) can be used to identify key literature and research frontiers in a research field (Chen, C., Ibekwe-SanJuan, F., & Hou, J., 2010). Co-citation refers to two articles being cited together in the same article. Generally speaking, the frequency of co-citation changes over time, so a literature network based on co-citation relationships is a dynamic structure. Co-citation analysis therefore has dynamic analysis capabilities and is suitable for analyzing the evolution of knowledge in a discipline or field of study (Zeng Zhishen, 2008). In CiteSpace, "Cited Reference" was used as the node type, with LBY=-1 (no limit on look-back years), a time slice of 1 year, and Top N=30. The document co-citation map of e-commerce was obtained by using Pathfinder pruning and displayed in time-zone mode (Figure 7). According to Figure 7, the co-citation relationships in the field of e-commerce in 2007-2020 are very close, and the citation contribution rate is high in the period 2015-2019. Two highly cited documents, by Chen A and Bowen Jeff, were cited 15 times before 2011. Between 2013 and 2015, six highly cited documents, by Berthold Jonathan, Siddiqui AW, Li L, Agarwal NK, Tan KH and Hagiu A, were cited 126 times. In 2016-2019, seven highly cited documents, by Bedane B, Hopp T, Cachon GP, Ahmed E, Alahyari H, Shen B and Lin XL, were cited 178 times.
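The co-citation relationship defined above can be counted directly from reference lists: every pair of references that appears in the same citing article adds one to that pair's co-citation frequency. The sketch below is a minimal illustration with hypothetical identifiers (`P1`, `R1`, and so on); it does not reproduce CiteSpace's time slicing or Pathfinder pruning.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited by the same article."""
    counts = Counter()
    for refs in reference_lists:
        # A reference listed twice in one article still counts once.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing articles and their reference lists.
citing_articles = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R1"],
    "P3": ["R2", "R3"],
}
counts = cocitation_counts(citing_articles.values())
print(counts.most_common())
```

Running the same count separately for each publication year of the citing articles would give the year-by-year link strengths that make the co-citation network a dynamic structure.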

Research Frontier and Trend Analysis
Research frontiers refer to "a set of sudden dynamic concepts and potential research issues" (Chen, C. and Leydesdorff, 2014). In CiteSpace, Kleinberg's burst detection algorithm is used to identify cited articles (burst items) that have attracted close attention from the academic community in a specific period; they are marked with red rings inside the nodes. The longer the citation burst of a node lasts, the thicker its red ring. Burst literature has two dimensions: burst value and burst time. Literature with a high burst value has received particular attention in the corresponding time interval and to some extent represents the research frontier of the field. The burst time interval reflects the dynamic change of frontier hotspots in a discipline or research field. To display the latest research frontier information, this paper extracted the burst literature with the latest burst time (2020), whose authors represent the frontiers of e-commerce research, as shown in Figure 8.
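To make the idea of burst detection more tangible, the following sketch implements a simplified two-state version of Kleinberg's batched model: in each time slice an item is either in a baseline state or in a burst state with a higher expected citation rate, entering the burst state has a fixed cost, and the cheapest state sequence is found by dynamic programming. This is an illustration under stated assumptions, not CiteSpace's implementation; the function name `burst_states`, the parameter values `s` and `gamma`, and the sample counts are hypothetical.

```python
import math


def burst_states(hits, totals, s=2.0, gamma=1.0):
    """Label each time slice 0 (baseline) or 1 (burst).

    hits[t]: citations to the item in slice t; totals[t]: all citations in
    slice t. Assumes 0 < sum(hits) < sum(totals).
    """
    n = len(hits)
    p0 = sum(hits) / sum(totals)  # baseline rate
    p1 = min(s * p0, 0.9999)      # elevated rate in the burst state

    def fit_cost(p, r, d):
        # Negative log-likelihood of r hits among d events; the binomial
        # coefficient is omitted because it is the same for both states.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    up_cost = gamma * math.log(n)  # cost of moving from baseline to burst
    best = [0.0, math.inf]         # the sequence starts in the baseline state
    back = []
    for r, d in zip(hits, totals):
        new_best, pointers = [], []
        for state, p in enumerate((p0, p1)):
            from_0 = best[0] + (up_cost if state == 1 else 0.0)
            from_1 = best[1]       # leaving or staying in burst is free
            pointers.append(0 if from_0 <= from_1 else 1)
            new_best.append(min(from_0, from_1) + fit_cost(p, r, d))
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the final slice.
    state = 0 if best[0] <= best[1] else 1
    path = []
    for pointers in reversed(back):
        path.append(state)
        state = pointers[state]
    return path[::-1]


# Hypothetical yearly counts: the item is cited far more often in years 4 and 5.
states = burst_states([1, 1, 1, 10, 12, 1, 1], [100] * 7)
print(states)
```

The slices labeled 1 form the burst interval (its begin and end years), and the saving in cost relative to staying in the baseline state throughout plays the role of the burst strength.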

Disciplinary dual-map Overlay Analysis
Disciplinary dual-map overlay analysis uses the coupling relationship of the literature to analyze the internal relationships between disciplines. Figure 9 is obtained by using the z-score algorithm to overlay and simplify the literature data of e-commerce. The journal dual-map overlay shows the research hotspots shared by the two areas, where the thickness of the lines indicates the degree of connectivity between disciplines. Part 1 of the figure consists of the subject relationships of the citing literature, and Part 2 consists of the subject relationships of the cited literature, as shown in Figure 9.

Figure 9: Disciplinary Dual-Map Overlay of E-commerce
Figure 9 shows that the research on e-commerce in Part 1 focuses on computer science, economics, management, and business, which provide a developing knowledge base for e-commerce research. Part 2 shows that the research literature of e-commerce is cited in psychology, education, sociology, health science and biometrics. It is less distributed in physical education and more often cited in chemical science.

4. Results and discussions
4.1 Results
In this paper, we used bibliometric methods to make a statistical analysis of annual publications, research institutions, key authors, keywords, and citations. We used literature citation analysis, a keyword clustering algorithm, a burst detection algorithm and the z-score algorithm to detect research hotspots, research frontiers and evolutionary trends in the field of e-commerce, providing key knowledge references for relevant researchers.
According to the statistics on the number of papers, the development of e-commerce research is divided into three stages: 1) initial stage: before 2000, it was the initial research phase, with at most a few papers published each year. In terms of publishing institutions and countries, research institutions in the field of e-commerce are mainly universities, mostly from China. China is the core country in this research field, accounting for 62.3%, far more than any other country, which is related to China being the world's largest e-commerce market.
Based on the authors' statistical results and citation data, WANG Z, CHONG AYL, and SHEN B are highly cited authors in this field. Bedane B, Hopp T, Cachon GP, Ahmed E, Alahyari H, Shen B and Lin XL are highly productive writers in the latest phase. According to citation data and knowledge maps, scholars can access key articles and quickly master the basics of the field.
The results of the research hotspot and frontier detection show that the highest-ranked items in the field of e-commerce research are e-commerce, model, electronic commerce, big data, impact, performance, technology, adoption, innovation and system. The results also show that EDI, electronic commerce, electronic data interchange, information technology and intelligent agent are the latest research frontiers. The key nodes selected by centrality show the evolution path of research development, verify the previous analysis results, further reveal the research trends, and provide researchers with a clear, panoramic view of e-commerce.
A literature measurement method based on a knowledge map. In this paper, a visual analysis method based on the knowledge map is used. In bibliometrics, the knowledge map is a chart that takes a knowledge domain as its object and shows the relationship between the development process and the structure of knowledge (Chen, C. and Leydesdorff, 2014). It has the dual nature of "graph" and "spectrum": it is not only a visual knowledge map but also a serialized knowledge family tree, showing many complex relationships between knowledge units or groups of knowledge, such as networks, structures, interactions, intersections, evolution or derivation. In the internet age, scientific communication and technological innovation have entered a faster and more efficient pace. In this context, visualization can be used to display bibliometric analysis results, so as to show research trends and context more clearly. Literature measurement based on the knowledge map is a new scientometric method. The traditional approach carries out a simple statistical analysis of the literature, whereas the new method digs deeper into it: it uses deep learning methods to identify entities in the literature and their relationships and displays them as knowledge maps. This new approach opens up more possibilities for knowledge research.

Discussion
The following is a discussion of research topics and the views of leading scholars in the field of e-commerce. The keyword clustering diagrams in Figures 4 and 5 show that the research topics of e-commerce mainly concern the following. Factors that affect the operation of the e-commerce platform. The literature shows that many factors affect the operation and transactions of e-commerce. The mode of e-commerce platform operation, business performance, technological innovation and big data are the main factors.
Data mining practice and impact in e-commerce research. One way for e-commerce to bridge the gap from academic to business data mining (Kohavi and Provost, 2001) is to build a vertical bridge and complete the data mining chain from collection to cleanup, mining, action, and validation. The e-commerce and data mining architecture built by Ansari et al. (2001) gives academia the unique ability to collect more data than is normally available for data mining projects. From the very beginning, important consideration was given to data transformation and analysis needs. This contrasts with one of the challenges of e-commerce business intelligence, where analysis is carried out after the fact. In these cases, there is often a gap between the potential value of the analysis and the actual realized value, either because of the limited data collected or because the data must be converted before it can be mined efficiently (Kohavi, Rothleder, and Simoudis, 2002). Analyzing data is a complex task (Berry and Linoff, 1997, 2000) because it requires experience and expertise. Unlike traditional e-commerce, data mining is oriented not only toward business but also toward artificial intelligence and big data. E-commerce has entered the "big data era".
The theoretical model basis for e-commerce platform operation. On the basis of dynamic capabilities theory, the theory of the dynamic capability of technological innovation, the Technology Acceptance Model and the RFM model, researchers have gradually refined the theoretical framework of e-commerce platform operation. Teece and Pisano put forward the concept of dynamic capability (DC) from a resource-based perspective. In long-term competition, a company must continuously develop dynamic capabilities to maintain its competitive advantage. Through these processes, companies continuously consolidate, reconfigure, update, and re-create resources and capabilities in response to changing environments, so as to achieve and maintain competitive advantage. The citation frequency of this document is the highest in the statistical look-back period, and it provides a reference for many researchers. According to Zhang Lin et al. (2018), the dynamic capability of technological innovation can be understood as a combination of multiple elements: it is dynamic and requires enterprises to accumulate experience and knowledge to form this combination. It enhances the technology of enterprises and thus their competitiveness. The combination of elements includes technological innovation input ability, technological innovation output ability, technological innovation transformation ability, technological innovation realization mode and technological innovation management ability. The authors built a knowledge structure diagram of the dynamic capability of technological innovation and verified its rationality. The Technology Acceptance Model (TAM) (Davis, 1989) is one of the core streams of research on users' intention to use a system and is used by many scholars. TAM assumes that higher perceived ease of use (PEOU) and perceived usefulness (PU) lead to a greater intention to use the system. It is a widely accepted theory of users' technology acceptance. The RFM (Recency, Frequency, and Monetary) model is a behavior-based model that analyzes the behavior of online customers, such as browsing history, purchase history and transaction history, and then makes forecasts based on the behavior recorded in the database (Hughes, 1996; Yeh et al., 2009).
In this model, recency represents the length of time since the last purchase, frequency represents the number of purchases over a specified period of time, and monetary value represents the amount of money spent over that period. A great deal of research has been done on the analysis and forecasting of customer buying behavior, resulting in a variety of models with high predictive effectiveness, particularly in non-contractual online business environments (Gupta et al., 2006; Van den Poel and Buckinx, 2005). Together, these theories build an impact model for e-commerce platform operations.
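The three RFM quantities (recency, frequency and the amount spent) can be computed from a transaction log as in the following sketch. It is a minimal illustration, not a model taken from the cited studies; the function `rfm`, the tuple layout `(customer_id, purchase_date, amount)` and the sample orders are hypothetical.

```python
from datetime import date


def rfm(transactions, as_of):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: the reference date from which recency is measured.
    """
    summary = {}
    for customer, day, amount in transactions:
        rec = summary.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)  # most recent purchase date
        rec["frequency"] += 1                # number of purchases
        rec["monetary"] += amount            # total amount spent
    return {
        customer: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in summary.items()
    }


# Hypothetical orders within one observation period.
orders = [
    ("c1", date(2020, 10, 1), 30.0),
    ("c1", date(2020, 11, 10), 20.0),
    ("c2", date(2020, 6, 15), 100.0),
]
scores = rfm(orders, as_of=date(2020, 11, 19))
print(scores["c1"])
```

In practice the three values are usually binned into score bands before being used to segment customers or to predict repeat purchases; the binning rule is a design choice left out of this sketch.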

Conclusions and limitations
Combining research hotspot and cutting-edge detection with keyword burst detection results and important literature analysis, it can be concluded that innovation, information technology, and big data are important frontier trends in E-commerce. In addition, the social impact of e-commerce participation in different disciplines is also different. Researchers can conduct quantitative comparative studies in the future to provide references for the overall development of e-commerce. With the help of different research methods, research perspectives and research paths, researchers further expand the interdisciplinary research field and enrich the research in this field.
The globalization of society has brought many opportunities, but it has also brought a series of problems. With the rapid development of globalization, these problems have become common problems facing all countries in the world. E-commerce has become a hot research area for scholars. Existing research has been fruitful and has produced some representative scholars and theoretical articles. However, the results of the development analysis show that the scope of research objects needs to be expanded, and there are many disputes over the theoretical model, structure and structural dimensions of e-commerce research. There is still a lot of room for development, and further research is needed. Future researchers need to enrich and refine e-commerce theory and research results through collaboration, integration and expansion, taking into account the social characteristics of globalization and information technology. Research on e-commerce platform operation and data mining is a multidisciplinary, cross-cutting field involving behavioral science, computer science, psychology, education, economics and social science, among others. These disciplines enrich the research field with their different research methods, viewpoints and paths, so that the disciplines, methods and perspectives are diversified.
Bibliometrics is a scientific and effective method of library and information science research, which is used for the statistics, description and prediction of academic status and development trends. The quantity and quality of scientific literature is also a measure of technical level. The research in this paper is based on the WOS database, and the research results are relatively objective and fair. However, due to database limitations, the data are not comprehensive enough, and some databases do not support bulk downloads of bibliographic records. Therefore, there may be some deviations in the results of the study. Future research can consider using data mining technology to expand the scope of source data collection and improve the quantity and quality of