1. Introduction
Analyzing document data as a time series has been a commonly employed approach in the past [
1,
2,
3,
4,
5,
6]. One notable advantage of representing documents as time series data is the ability to quantitatively capture long-range correlations among various components of the documents [
7,
8,
9,
10,
11,
12]. This advantage stems from the application of mathematical tools in time series analysis, such as autocorrelation functions [
13,
14,
15], waiting time distributions [
16,
17], and similar methods. For instance, autocorrelation functions can be utilized to determine the significance of words within a given document [
13]. Words exhibiting strong long-range autocorrelation across a text are often considered to be closely related to the document’s central theme [
13,
14,
15,
16,
17].
In a previous study [
17], we utilized univariate Hawkes processes to model word occurrence patterns in texts, demonstrating their effectiveness in capturing the dynamic, long-range correlations between different positions in a considered text. Hawkes processes are a type of stochastic process designed to model events occurring over time, influenced by prior occurrences [
18]. They are particularly effective at capturing self-exciting phenomena, where the occurrence of one event increases the probability of subsequent events, whether in the short-term or long-term [
19,
20,
21,
22,
23,
24,
25,
26,
27,
28]. This characteristic makes Hawkes processes highly suitable for describing the occurrences of significant words in a text that are associated with specific concepts or ideas. Such key words often reappear during the explanation of crucial concepts. In other words, important words related to the subject matter of the text exhibit self-excitatory behavior, correlating with their own past occurrence signals. As a result, their patterns can be effectively modeled using Hawkes processes. This was confirmed in our previous study [
17], which showed that many keywords in famous academic books follow such patterns.
However, the methodology proposed in the previous study [
17] has certain limitations. Specifically, it employed univariate Hawkes processes, which can capture autocorrelation—measuring how a word’s occurrence signal is correlated with itself over different time lags. However, this approach cannot account for cross-correlation, which quantifies the correlation between the occurrence signals of two different words as a function of a time lag. It is common for significant concepts in documents to be explained from multiple perspectives, using several key terms. In such cases, these key terms should be interrelated or correlated, such that they mutually influence one another’s appearance within the document.
To characterize such documents, the model used should be capable of capturing the mutual correlations among keywords. This requirement underscores the necessity of multivariate Hawkes processes [
29,
30,
31] for text modeling. As will be elaborated later, multivariate Hawkes processes can effectively represent multiple, mutually correlated stochastic processes, such as the occurrence signals of several keywords within a text. The objective of this study is to employ multivariate Hawkes processes to model documents. To the best of our knowledge, this represents the first application of multivariate Hawkes processes to document analysis. Furthermore, it will be demonstrated that employing multivariate Hawkes processes, as opposed to univariate ones, significantly enhances the accuracy in describing word occurrence signals. Moreover, we discovered that Hawkes graphs, [
32] which are graphical representations of multivariate Hawkes processes, are highly effective in intuitively conveying the contents of analyzed documents. These graphs sensitively depict the interrelationships among multiple keywords appearing in the document. From the graph, it is possible to identify a central keyword that is closely tied to the document’s theme, as well as several other keywords used to describe or introduce the central concept. While Hawkes graphs may appear similar to diagrams depicting word co-occurrence networks [
33,
34], they are more advantageous as they incorporate information about causal relationships and dependencies among keywords. Thus, another significant benefit of utilizing the multivariate Hawkes process lies in its ability to provide an intuitive representation of document content through Hawkes graphs.
The remainder of this paper is organized as follows. In the next section, we outline the methodology, describing the characteristics of the univariate and multivariate Hawkes processes employed, including their kernel functions and log-likelihood functions. Additionally, we explain how word occurrence signals were extracted from each document and how these signals were modeled using the univariate and multivariate Hawkes processes. This section also details the simulation procedures used to validate our modeling approach with both types of Hawkes processes. The subsequent section presents our results, highlighting the advantages of multivariate Hawkes processes over univariate ones. This section also elaborates on the construction and practical aspects of Hawkes graphs. Finally, in the concluding section, we summarize our findings and propose directions for future research.
2. Methodology
2.1. Converting Text as Time Series Data
To treat written texts as time series data, we assign a serial number to every sentence in a considered document and assign a time role to this sentence number [
13,
14,
15,
16,
17]. The occurrence signal of a considered word is then defined as
which is a binary variable expressing word occurrence event, and accumulated value of
, i.e.,
becomes a counting process for the word occurrence event, i.e.,
represents the number of occurrences of a considered word along the documents. We treat
and
as time series data and seek a stochastic process that can well describe the behavior of
and
. As already clarified, for words that are not directly connected to the theme of a given document and therefore that do not exhibit any dynamic correlations, their occurrences can be accurately modeled using either a homogeneous Poisson process or an inhomogeneous Poisson process [
13,
14,
15,
16]. In this paper, we investigate stochastic processes that characterize
and
for significant words which are closely related to the document’s theme and therefore have long-range dynamic correlations.
2.2. Maximum Likelihood Estimation of Hawkes Processes
To illustrate our methodology, we briefly introduce the intensity and likelihood functions of univariate and multivariate Hawkes processes, which we consider to be suitable stochastic models for explaining the observed word occurrence signals, Equation (1), for real words in the analyzed text.
The univariate Hawkes process is mathematically defined by its intensity function,
, representing the conditional event rate (word occurrenc rate) at time
For a process with an exponential decay kernel, the intensity function is given as [
35,
36]:
where
denotes the baseline intensity,
represents the times of previous events (
), and
and
are positive parameters of the exponential decay function. Given that event times
are observed within a time interval
, where
denotes the total number of sentences in the text, the log-likelihood function of the univariate Hawkes process is expressed as [
35,
36]:
In the framework of maximum likelihood estimation (MLE), , and are treated as fitting parameters to adapt the univariate Hawkes process to the observed sequence of event times.
The multivariate Hawkes process expands upon the univariate Hawkes process by introducing multiple dimensions, allowing it to capture interactions among various types of events. In our case, we anticipate that the multivariate Hawkes process can effectively model the cross-correlation between word occurrence signals of different words within a given text. If we select
words as the focus of our analysis and consider the
-th word, the intensity function of the multivariate Hawkes process for events of type
(i.e., the intensity function for the occurrence of the
-th word) is defined as [
36,
37]:
where
is the baseline intensity for events of type
,
is the total number of event types (the total number of analyzed words),
represents the time of the
-th event of type
(i.e., the
-th occurrence of the
-th word),
denotes the influence strength of events of type
on those of type
, and
is the decay rate of the excitation. The log-likelihood function for the multivariate Hawkes process is expressed as:
Here,
represents the number of occurrences of events of type
,
is the baseline intensity vector, defined as
, while
and
are matrices comprising the fitting parameters introduced in Equation (5). Equation (6) can be derived in the following manner. According to the general theory of point processes, the likelihood function of the multivariate Hawkes process is given by [
38] (Chapter 7):
where
is defined by Equation (5). By substituting Equation (5) into Equation (7), taking the logarithm of both sides, and performing further integral calculations, we obtain the log-likelihood function given in Equation (6).
In the framework of MLE for univariate Hawkes processes, the goal is to find optimized parameters , and that maximize the log-likelihood function, Equation (4), given a list of event times . Similarly, for multivariate Hawkes processes, the objective is to optimize the vector and matrices and to maximize the log-likelihood function, Equation (6), given lists of event times for each type of event, for . In practical MLE procedures, we used the minimize() function from the Python library scipy.optimize to determine the optimal parameter values by minimizing and The L-BFGS-B (Limited-memory Broyden–Fletcher–Goldfarb–Shanno with Box constraints) method was employed to define upper and lower bounds for the fitting parameters, ensuring that the solutions remained within the specified ranges.
2.3. Selecting Important Words in Used Texts
Six texts employed in this study are listed in
Table 1. One of the six documents is a famous historical pamphlet (“Common Sense” by Thomas Paine) ; the other five are well-known academic books. They are chosen so as to represent wide range of written texts, In the table, short names of each book and some information are also shown. The preface, contents and index pages were deleted before starting the text preprocessing because they may act as noise and may affect the final results.
We select 20 important words from each of the six texts to analyze their occurrence signals by using the univariate and the multivariate Hawkes processes. This means that 20 pairs of Equations, (3) and (4), are considered for modeling the 20 words with univariate Hawkes processes while we set
in Equations (5) and (6) for modeling with the multivariate Hawkes process. The selected 20 important words having long-range dynamic correlations are listed in
Table 2. Note that only nouns are included in the table because we intended to investigate the causal relationships between concepts represented by the nouns in the document under consideration (e.g., word “A” is used to describe word “B”, etc.). This investigation was performed based on the correlations derived from analysis using the multivariate Hawkes process.
The accuracy of representing word occurrence signals using the multivariate Hawkes process improves as the number of analyzed words increases. This is because a larger set of words allows for the consideration of all possible event causes that mutually excite one another. In most documents, key concepts are explained from multiple perspectives, utilizing numerous words to convey complex ideas. Such texts often exhibit correlations where many words influence each other.
In a multivariate Hawkes process, each dimension represents the occurrence signal of a single word. Consequently, limiting the process to only 20 dimensions restricts its ability to capture correlations among words, making it insufficient for describing the cross-correlations commonly found in texts. However, the maximum likelihood estimation (MLE) of a multivariate Hawkes process requires extensive convergence calculations, making it computationally intensive and practical only for cases with fewer than 20 dimensions. Therefore, establishing clear selection criteria for the 20 key words, as outlined below, is critical to this study.
First, we calculated the autocorrelation functions (ACFs) for all words that appear more than 50 times in each text, using the formula:
where
represents a mean value of
. Then these ACFs were fitted by a Kohlausch-Williams-Watts (KWW) function:
where
(a relaxation time) and
(a shape parameter) are fitting parameters that satisfy
and
. The KWW function has been demonstrated to effectively represent real ACFs of important words with long-range dynamic correlations [
13,
14,
15,
16,
17]. Next, we evaluated the mean relaxation times
for each word using the equation:
where
denotes the gamma function. Finally, we select top 20 words in terms of
as the key words to be analyzed for each document. A large mean relaxation time,
, indicates that a word exhibits both strong dynamic correlations and long-range memory, making it relevant to the key concept or theme of the text. Therefore, we consider words with large
to be significant. In
Table 2, the values of the fitting parameters,
and
, and the mean relaxation time,
, are shown for each of the 20 words.
Figure 1 shows the word occurrence signal
, the calculated ACF and the fitted KWW function for a picked word “formation” in the Darwin text.
2.4. Validation of Modeling with Hawkes Processes
In this study, we use univariate and multivariate Hawkes processes to model word occurrence signals in the considered texts. The validity of the modeling is confirmed by following procedures.
To confirm the validity of using the univariate Hawkes process, we first evaluated the optimal values of the fitting parameters , , and in Equations (3) and (4) using the maximum likelihood estimation (MLE). By the principle of the maximum likelihood, the univariate Hawkes process with the obtained parameter values is expected to best replicate the occurrence signals of the word in question. Using the optimized parameter values, we generated virtual word occurrence signals for the top 20 important words by simulating each of the 20 univariate Hawkes processes. These simulated word occurrence signals were compared to the observed signals in the real texts. The comparison mainly focused on the autocorrelation functions (ACFs): if the ACF of the simulated signal for a given word closely matches the ACF of the real signal, we conclude that the modeling using the univariate Hawkes process is effective.
To confirm the validity of the multivariate Hawkes process, we followed a similar approach. First, we determined the parameter values of the 20-dimensional multivariate Hawkes process using MLE given the word occurrence signals of the top 20 important words. Next, we simulated the 20-dimensional Hawkes process to generate virtual word occurrence signals for these 20 words. Finally, two types of ACFs—calculated from the simulated signals and the real signals for each of the 20 words—were compared to assess the effectiveness of the modeling using the multivariate Hawkes process.
3. Results and Discussion
3.1. Validity Confirmation of Hawkes Processes
As outlined above, we primarily compare the characteristic quantities derived from simulated signals with those obtained from word occurrence signals observed in real texts. The comparisons focused on the following four quantities:
The total number of occurrences of the word throughout the text.
The relaxation time of the ACF, derived from the fitting parameter of the KWW function (Equation (9)).
The shape parameter of the ACF, also obtained from the fitting parameter of the KWW function.
The Bayesian Information Criterion (BIC), calculated during the fitting of the KWW function to ACFs.
The results of these comparisons are displayed in plots, where the vertical axis represents one of the four characteristic quantities calculated from simulated signals, while the horizontal axis represents the corresponding quantity calculated from observed occurrence signals in the real text.
Figure 2 through
Figure 7 present the results of these comparisons. Plots (a)-(d) in Figures 2 through 7 illustrate the comparison between the univariate Hawkes processes and the actual word occurrence signals, whereas plots (e)-(h) depict the comparison between the multivariate Hawkes processes and the actual word occurrence signals. Word occurrence signals were simulated 20 times for both univariate and multivariate Hawkes processes, resulting in 20 vertical values for each characteristic quantity corresponding to each word. The upper end of the error bars represents the maximum value among these 20 features, while the lower end denotes the minimum value. The vertical positions of the blue circles in Figures 2 through 7 indicate the averages of these 20 values.
If the simulated signal for a given word closely matches the real signal, the four characteristic quantities across plots (a)-(h) should align near or directly along the line. Thus, we can conclude that the modeling using univariate or multivariate Hawkes processes is effective if the blue circles are positioned along the straight line indicating direct proportionality. To assess the degree of linear correspondence between the vertical and horizontal quantities, we calculated correlation coefficients, which are displayed in the titles of the plots in Figures 2 through 7. Consequently, the validation criterion is how closely the correlation coefficient approaches 1.
Figure 2.
Characteristics comparing real and simulated word occurrence signals. In all plots, the horizontal axis represents characteristics of the real signals in the Darwin text, while the vertical axis represents characteristics derived from the simulated signals. Characteristics derived from simulated signals generated from univariate Hawkes processes are shown in plots (a)–(d), while those from the 20-dimensional multivariate Hawkes process are shown in plots (e)–(h). The red lines represent the function .
Figure 2.
Characteristics comparing real and simulated word occurrence signals. In all plots, the horizontal axis represents characteristics of the real signals in the Darwin text, while the vertical axis represents characteristics derived from the simulated signals. Characteristics derived from simulated signals generated from univariate Hawkes processes are shown in plots (a)–(d), while those from the 20-dimensional multivariate Hawkes process are shown in plots (e)–(h). The red lines represent the function .
Figure 3.
The same meaning plots as in
Figure 2 for Einstein text.
Figure 3.
The same meaning plots as in
Figure 2 for Einstein text.
Figure 4.
The same meaning plots as in
Figure 2 for Faraday text.
Figure 4.
The same meaning plots as in
Figure 2 for Faraday text.
Figure 5.
The same meaning plots as in
Figure 2 for Freud text.
Figure 5.
The same meaning plots as in
Figure 2 for Freud text.
Figure 6.
Similar plots to those in
Figure 2 are presented for the Paine text, except that an 8-dimensional multivariate Hawkes process was used to calculate the vertical quantities in plots (e)–(h).
Figure 6.
Similar plots to those in
Figure 2 are presented for the Paine text, except that an 8-dimensional multivariate Hawkes process was used to calculate the vertical quantities in plots (e)–(h).
Figure 7.
The same meaning plots as in
Figure 2 for Plato text.
Figure 7.
The same meaning plots as in
Figure 2 for Plato text.
Our primary focus is to evaluate the extent to which the effectiveness of modeling improves when transitioning from the univariate Hawkes process to the multivariate Hawkes process. To assess this, we compared the description accuracies of the univariate Hawkes process and the multivariate Hawkes process across Figures 2 through 7: comparisons between plots (a) and (e) for the number of occurrences, between plots (b) and (f) for the relaxation time , between plots (c) and (g) for the shape parameter , and between plots (d) and (h) for the Bayesian Information Criterion (BIC).
These comparisons clearly demonstrate that the multivariate Hawkes process significantly enhances modeling effectiveness compared to the univariate Hawkes process. This improvement is evidenced by the scatter plots of the four characteristic quantities being distributed closer to the straight line
in the case of the multivariate Hawkes process, with correlation coefficients approaching 1. Indeed,
Table 3 confirms that the correlation coefficient for each quantity is closer to 1 when modeling with the multivariate Hawkes process is applied.
This result is unsurprising, given the nature of the document, which presents concepts each of which is explained by using multiple key terms. These key terms are frequently repeated throughout the explanation, creating inherent correlations among them. As a result, the multivariate Hawkes process effectively captures these relationships.
In contrast, the univariate Hawkes process is limited to capturing self-correlation, meaning it only reflects the correlations found within the occurrence signal of a single word.
3.2. Hawkes Graphs
Hawkes graphs are directed graphs that intuitively represent multivariate Hawkes processes. In these graphs, each node corresponds to a specific type of event in the multivariate Hawkes process, while each edge indicates how one type of event (represented at the root of the arrow) influences or enhances another type of event (located at the tip of the arrow). Examples of Hawkes graphs are presented in
Figure 8 and
Figure 9, which illustrate the multivariate Hawkes processes optimized to model word occurrence signals in the Paine text and the Plato text, respectively. Each node represents the occurrence events of an important word, and its corresponding background intensity,
, as defined in Equation (5), is also displayed within the node.
In a multivariate Hawkes process, various types of events mutually excite one another, and the magnitude of their mutual excitation is quantified by the following coefficient [
32]:
This coefficient measures how much the event of type-
enhances the occurrence of the type-
event. In
Figure 8 and
Figure 9, the values of
are displayed alongside the edges. To improve readability, edges with
are not shown in the graphs.
By constructing a Hawkes graph for a given text using the optimized parameter values of the multivariate Hawkes process, it becomes easier to infer the central concepts or notions of the text. For instance, based on
Figure 8, we identify “time” as one of the central notions in the Paine text. This conclusion is drawn from observing that five edges associated with the “time” node are incoming arrows, while only one edge flows outward from this node. The predominance of incoming arrows indicates that the concepts represented by the source nodes are predominantly used to describe the notion of “time.”
3.3. Finding Important Notions in Texts
The idea for inferring the central notions of a text from the Hawkes graph, as mentioned above, can be refined as follows. The difference between the inflows into the
-th node and the outflows from the same node is defined as:
This quantity intuitively represents the net extent to which the notion corresponding to the -th node is explained by other important notions.
The values of
are displayed in
Table 4, which ranks the importance of each notion in descending order for each text. The most important notion for each text—represented by the top-ranked word in
Table 4—can be considered appropriately identified based on the content of each text, as will be discussed below.
Clearly, the notion of “species” in the Darwin text, “relativity” in the Einstein text, and “dream” in the Freud text are all closely tied to the main themes of the respective documents, serving as the central concepts each text seeks to explain. In the Faraday text, the principles of chemistry and physics are explored through the fascinating subject of candles. The most significant topic in the Faraday text is the investigation of the substances that constitute candles and how they transform during combustion. Thus, it is reasonable to identify “substance” as the most critical concept in the text. The Paine text is a groundbreaking pamphlet published during the American Revolution, emphasizing the urgency of declaring independence. In this text, the author employs the notion of “time” as a motivational framework for driving transformative action. Hence, “time” is regarded as central because the author explains the nature of the current “time” within this context. In the Plato text, the notion of “man” takes center stage because the author’s philosophical reflections on justice, governance, and enlightenment are deeply rooted in his understanding of human nature and elaborated upon through this perspective.
3.4. Advantages of Analyzing Texts with Multivariate Hawkes Processes
Based on the results outlined so far, the following advantages of utilizing the multivariate Hawkes process have been established:
As demonstrated in Figures 2 through 7, the accuracy of modeling word occurrence signals in documents is significantly improved, making it more reliable when using the multivariate Hawkes process compared to the univariate one.
A Hawkes graph can be generated from the parameter values obtained through the analysis using the multivariate Hawkes process. This facilitates an intuitive understanding of the relationships among the concepts that emerge in the document.
The importance of each concept identified in a text can be assessed using the optimized parameters of the multivariate Hawkes process. The most significant concepts suggested for each text analyzed in this study are confirmed to be valid when considering the content of the text.
4. Conclusions
The occurrence signals of important words in six texts (one historical pamphlet and five renowned academic books) were modeled using univariate and multivariate Hawkes processes. The modeling procedure for the univariate Hawkes process was conducted as follows. First, optimized parameter values for the univariate Hawkes process were determined using maximum likelihood estimation (MLE) based on the occurrence signals of a given word in a considered text. Next, word occurrence signals were generated by simulating the univariate Hawkes process with optimized parameters. Finally, the validity of the univariate Hawkes process modeling was confirmed by verifying that the four characteristic quantities derived from the simulated word occurrence signals closely matched those obtained from the real signals observed in the text.
The modeling procedure for the multivariate Hawkes process followed a similar approach. First, optimized parameters for the -dimensional multivariate Hawkes process were evaluated using MLE based on the occurrence signals of important words, where represents the number of important words analyzed simultaneously with the multivariate Hawkes process. In this study, the typical value of was set to be 20. Next, occurrence signals of important words were simultaneously simulated using the multivariate Hawkes process with the optimized parameters. Finally, the validity of the multivariate Hawkes process modeling was confirmed by ensuring that the four characteristic quantities derived from the simulated word occurrence signals closely matched those obtained from the real signals observed in the text.
By validating the modeling with univariate and multivariate Hawkes processes, we observed that the correlation coefficients of the four characteristic quantities between real and simulated signals are notably closer to one in the case of the multivariate Hawkes process compared to the univariate one. This result demonstrates that the multivariate Hawkes process provides greater accuracy in describing word occurrence signals.
Additional advantages of the multivariate Hawkes process include its ability to intuitively represent the relationships among concepts described in a document through a Hawkes graph and to evaluate the importance of words in the text based on their relationships. For instance, the most important words in the analyzed documents were inferred using the optimized parameters of multivariate Hawkes processes, and these findings were confirmed to be valid.
Potential directions for future research utilizing the multivariate Hawkes process, which proved useful in this study, include:
Enhancing the accuracy of word occurrence signal descriptions by increasing the number of important words analyzed simultaneously.
Treating the matrix (defined by Equation (11)) as an adjacency matrix and applying graph theory methods, such as spectral clustering.
Since increasing the dimensionality of the multivariate Hawkes process poses challenges related to the computational cost of the MLE method, these directions are deferred to future studies.
Author Contributions
For research articles with several authors, a short paragraph specifying their individual contributions must be provided. The following statements should be used “Conceptualization, H.O.; methodology, H.O.; software, H.O.; validation, Y.H., K.O. and M.K.; formal analysis, M.K.; investigation, H.O.; resources, H.O.; data curation, K.O.; writing—original draft preparation, H.O.; writing—review and editing, Y.H.; visualization, H.O. and K.O.; supervision, H.O.; project administration, H.O. and M.K.; funding acquisition, H.O. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by JSPS KAKENHI, grant number 24K15198.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Datasets and the major source codes used in this study are available in the Open Science Framework repository at
https://osf.io/6d8sf/.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| MLE |
Maximum Likelihood Estimation |
| ACF |
Autocorrelation Function |
| BIC |
Bayesian Information Criterion |
| KWW |
Kohlausch-Williams-Watts |
References
- Pawlowski, A. Time-Series Analysis in Linguistics. Application of the Arima Method to Some Cases of Spoken Polish. J. Quant. Linguist. 1997, 4, 203–221. [Google Scholar] [CrossRef]
- Pawlowski, A. Language in the Line vs. Language in the Mass: On the Efficiency of Sequential Modelling in the Analysis of Rhythm. J. Quant. Linguist. 1999, 6, 70–77. [Google Scholar] [CrossRef]
- Pawlowski, A. Modelling of Sequential Structures in Text. In Handbooks of Linguistics and Communication Science; Walter deGruyter: Berlin, Germany, 2005; pp. 738–750. [Google Scholar]
- Pawlowski, A.; Eder, M. Sequential Structures in “Dalimil’s Chronicle”; Mikros, G.K., Macutek, J., Eds.; Walter de Gruyter: Berlin, Germany, 2015; pp. 147–170. [Google Scholar] [CrossRef]
- Altmann, E.G.; Pierrehumbert, J.B.; Motter, A.E. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE 2009, 4, e7678. [Google Scholar] [CrossRef]
- Tanaka-Ishii, K.; Bunde, A. Long-range memory in literary texts: On the universal clustering of the rare words. PLoS ONE 2016, 11, e0164658. [Google Scholar] [CrossRef]
- Schenkel, A.; Zhang, J.; Zhang, Y. Long range correlation in human writings. Fractals 1993, 1, 47–57. [Google Scholar] [CrossRef]
- Ebeling, W.; Pöschel, T. Entropy and long-range correlations in literary english. Europhys. Lett. 1994, 26, 241. [Google Scholar] [CrossRef]
- Montemurro, M.A.; Pury, P.A. Long-range fractal correlations in literary corpora. Fractals 2002, 10, 451–461. [Google Scholar] [CrossRef]
- Alvarez-Lacalle, E.; Dorow, B.; Eckmann, J.P.; Moses, E. Hierarchical structures induce long-range dynamic correlations in written texts. Proc. Natl. Acad. Sci. USA 2006, 103, 7956–7961. [Google Scholar] [CrossRef]
- Altmann, E.G.; Cristadoro, G.; Esposti, M.D. On the origin of long-range correlations in texts. Proc. Natl. Acad. Sci. USA 2012, 109, 11582–11587. [Google Scholar] [CrossRef]
- Chatzigeorgiou, M.; Constantoudis, V.; Diakonos, F.; Karamanos, K.; Papadimitriou, C.; Kalimeri, M.; Papageorgiou, H. Multifractal correlations in natural language written texts: Effects of language family and long word statistics. Physica. A 2017, 469, 173–182. [Google Scholar] [CrossRef]
- Ogura, H.; Amano, H.; Kondo, M. Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function. J. Data Anal. Inf. Process 2019, 7, 46–73. [Google Scholar] [CrossRef]
- Ogura, H.; Amano, H.; Kondo, M. Origin of Dynamic Correlations of Words in Written Texts. J. Data Anal. Inf. Process. 2019, 7, 228–249. [Google Scholar] [CrossRef]
- Ogura, H.; Amano, H.; Kondo, M. Simulation of pseudo-text synthesis for generating words with long-range dynamic correlations. SN Appl. Sci. 2020, 2, 1387. [Google Scholar] [CrossRef]
- Ogura, H.; Hanada, Y.; Amano, H.; Kondo, M. A stochastic model of word occurrences in hierarchically structured written texts. SN Appl. Sci. 2022, 4, 77. [Google Scholar] [CrossRef]
- Ogura, H.; Hanada, Y.; Amano, H.; Kondo, M. Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes. Entropy 2022, 24, 858. [Google Scholar] [CrossRef]
- Hawkes, A.G. Spectra of Some Self-Exciting and Mutually Exciting Point Processes. Biometrika 1971, 58, 83–90. [Google Scholar] [CrossRef]
- Ogata, Y. Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 1988, 83, 9–27. [Google Scholar] [CrossRef]
- Ogata, Y. Seismicity analysis through point-process modeling: A review. Pure Appl. Geophys. 1999, 155, 471–507. [Google Scholar] [CrossRef]
- Zhuang, J.; Ogata, Y.; Vere-Jones, D. Stochastic declustering of space-time earthquake occurrences. J. Amer. Statist. Soc. 2002, 97, 369–380. [Google Scholar] [CrossRef]
- Truccolo, W.; Eden, U.T.; Fellows, M.R.; Donoghue, J.P.; Brown, E.N. A Point Process Framework for Relating Neural Spiking Activity to SpikingHistory, Neural Ensemble, and Extrinsic Covariate Effects. J. Neurophysiol. 2005, 93, 1074–1089. [Google Scholar] [CrossRef]
- Reynaud-Bouret, P.; Rivoirard, V.; Tuleau-Malot, C. Inference of functional connectivity in Neurosciences via Hawkes processes. In Proceedings of the 1st IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013. [Google Scholar] [CrossRef]
- Gerhard, F.; Deger, M.; Truccolo, W. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs. PLoS Comput. Biol. 2017, 13, e1005390. [Google Scholar] [CrossRef] [PubMed]
- Bacry, E.; Mastromatteo, I.; Muzy, J. Hawkes Processes in Market. Microstruct. Liq. 2015, 1, 1550005. [Google Scholar] [CrossRef]
- Rizoiu, M.A.; Lee, Y.; Mishra, S.; Xie, L. Hawkes processes for events in social media. In Frontiers of Multimedia. In Frontiers of Multimedia Research, 1st ed.; Chang, S. 2017, Ed.; Association for Computing Machinery and Morgan & Claypool: New York, USA, 2017. [Google Scholar] [CrossRef]
- Palmowski, Z.; Puchalska, D. Modeling social media contagion using Hawkes processes. J. Pol. Math. Soc. 2021, 49, 65–83. [Google Scholar] [CrossRef]
- Chiang, W.H.; Liu, X.; Mohler, G. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates. Int. J. Forecast. 2022, 38, 505–520. [Google Scholar] [CrossRef]
- Embrechts, P.; Liniger, T. ; Lin. L. Multivariate Hawkes processes: an application to financial data. J. Appl. Probab. 2011, 48, 367–378. [Google Scholar] [CrossRef]
- Bowsher, C. G. Modelling security market events in continuous time: Intensity based, multivariate point process models, J. Econom 2007, 141, 876–912. [Google Scholar] [CrossRef]
- Yang, S. Y.; Liu, A. ; Chen, J;. Hawkes, A. Applications of a multivariate Hawkes process to joint modeling of sentiment and market return events, Quant. Finance 2018, 18(2), 295–310. [CrossRef]
- Embrechts, P.; Kirchner, M. Hawkes graphs, Theory Probab. Its Appl. 62, 132–156. [CrossRef]
- Osman, A. H.; Barukub, O. M. , Graph-Based Text Representation and Matching: A Review of the State of the Art and Future Challenges, IEEE Access 2020, 8, 87562–87583. [CrossRef]
- Edited by Segev, E. Semantic Network Analysis in Social Sciences, 1st ed.; Taylor & Francis, New York, USA, 2022.
- Laub, P. J.; Taimre, T.; Pollett. P.; Taimre, T. Hawkes Processes, arXiv:1507.02822. 2015. [CrossRef]
- Laub, P. J.; Lee, Y.; Pollett, K. P.; Taimre, T. Hawkes Models and Their Applications, arXiv:2405.10527. 2024. arXiv:2405.10527. 2024.
- Lindström,A.; Lindgren, S.; Sainudiin, R. Hawkes Processes on Social and Mass Media, International Conference on Data Technologies and Applications 2023, https://uu.diva-portal.org/smash/get/diva2:1827189/FULLTEXT01.pdf.
- Daley, D. J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I :Elementary Theory and Methods, 2nd ed.; Springer-Verlag, New York, USA, 2003.
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).