4. Results
Table 2 presents the descriptive statistics for the Apple stock. Over the analyzed period, September 2020 to September 2023, the mean closing price is 160.79 dollars and the mean opening price is 160.43 dollars. The mean description polarity is 0.11 and the mean title polarity is 0.07, both positive scores. The maximum closing price in the analyzed period was 196.45 dollars, while the minimum was 125.02 dollars.
Table 3 presents the descriptive statistics for the Microsoft stock. The mean opening price is 282.28 dollars and the mean closing price is 282.3 dollars. Over the analyzed period, the mean sentiment score was 0.069 for the title and 0.11 for the description, both positive. The minimum opening price was 219.85 dollars, while the maximum was 361.75 dollars.
The descriptive statistics for the Tesla stock are presented in Table 4. The mean opening price over the analyzed period was 194.25 dollars and the mean closing price was 193.89 dollars. The mean title polarity is 0.064, while the mean description polarity is 0.094. The minimum opening price was 103 dollars and the maximum was 296.04 dollars, with a standard deviation of 47.28.
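As an illustration only, the summary measures reported in the descriptive tables can be computed with a few lines of Python. The sketch below uses the standard library; the function name `describe` and the sample values are hypothetical and are not taken from the study's data. Whether the reported standard deviations use the sample (n - 1) or population formula is not stated, so the sample version here is an assumption.

```python
import statistics

def describe(values):
    """Return the summary measures shown in a descriptive-statistics table."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }

# Hypothetical opening prices, for illustration only.
sample_prices = [10.0, 12.0, 14.0]
summary = describe(sample_prices)
print(summary)
```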
Figure 1 presents line graphs of all the analyzed variables for the Apple stock: closing prices (first-differenced), description length, description polarity, opening prices (first-differenced), title length and title polarity. The title polarity shows a noticeably higher variance than the description polarity.
Figure 2 presents line graphs of all the analyzed variables for the Microsoft stock: closing prices (first-differenced), description length, description polarity, opening prices (first-differenced), title length and title polarity. The description polarity has a higher variance than the title polarity, while the stock prices register visible signals in the last part of the first quarter and the middle part of the last quarter.
Figure 3 presents line graphs of all the analyzed variables for the Tesla stock: closing prices (first-differenced), description length, description polarity, opening prices (first-differenced), title length and title polarity. The title polarity scores show a higher variance than the description polarity scores.
2. Unit root test – test for the absence of nonstationarity
Tables 5 to 7 present the results computed under the augmented Dickey–Fuller (ADF) test procedure. The tested hypotheses are:
H0: A unit root is present in a time series sample.
H1: The time-series sample has no unit root; thus, it is stationary.
In Table 5, we can observe that the p-values for the description polarity, description length, title polarity and title length series for the Apple stock are below the 0.05 significance level without differencing; we can thus reject the null hypothesis and conclude that these series are stationary. For the closing and opening prices, the p-values became significant only after first differencing.
In Table 6, we can observe that the p-values for the description polarity, description length, title polarity and title length series for the Microsoft stock are below the 0.05 significance level without differencing; we can thus reject the null hypothesis and conclude that these series are stationary. For the closing and opening prices, the p-values were non-significant, so we first-differenced the opening and closing price series.
In Table 7, we can observe that the p-values for the description polarity, description length, title polarity and title length series for the Tesla stock are below the 0.05 significance level without differencing; we can thus reject the null hypothesis and conclude that these series are stationary. For the closing and opening prices, the p-values were non-significant, so we first-differenced the opening and closing price series.
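The decision rule behind the unit root tests can be illustrated with a simplified Dickey–Fuller regression. The sketch below is a simplification under stated assumptions, not the procedure used in the study: it omits the augmentation lags of the full ADF test, uses only the Python standard library, and compares the statistic with the commonly tabulated asymptotic 5% critical value of about -2.86 for a regression with a constant and no trend. The simulated random walk and all function names are hypothetical.

```python
import math
import random

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant and no trend

def slope_t_stat(x, y):
    """t statistic of b in the simple regression y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)

def first_difference(series):
    """Return the series of changes y_t - y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def dickey_fuller_stat(series):
    """t statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    return slope_t_stat(series[:-1], first_difference(series))

# Hypothetical price series: a simulated random walk (unit root by construction).
rng = random.Random(0)
prices = [150.0]
for _ in range(500):
    prices.append(prices[-1] + rng.gauss(0, 1))

level_stat = dickey_fuller_stat(prices)
diff_stat = dickey_fuller_stat(first_difference(prices))

# H0 (unit root) is rejected when the statistic falls below the critical value.
print("levels:", round(level_stat, 2), "reject H0:", level_stat < DF_CRITICAL_5PCT)
print("first differences:", round(diff_stat, 2), "reject H0:", diff_stat < DF_CRITICAL_5PCT)
```

In this simulated case, the differenced series yields a far more negative statistic than the level series, which mirrors the pattern reported for the opening and closing prices.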
3. Correlation matrix
Before fitting a model to our data, we computed the Pearson correlations; the results for the Apple variables are presented in Table 8. The title polarity series is significantly and positively correlated with the closing prices (at the 0.1 significance level).
Table 9 presents the Pearson correlations among the title and description variables and the stock prices. The only significant correlation was found between description polarity and opening prices, and it has a negative sign.
Table 10 presents the Pearson correlations among the title and description variables and the closing prices. The only significant correlation is between title polarity and opening prices, and it has a negative sign.
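A Pearson correlation and its significance test can be sketched as follows. This is a minimal illustration rather than the computation used in the study: the function name and the sample values are hypothetical, and the p-value relies on a normal approximation to the Student t distribution, which is reasonable only for large samples such as multi-year daily series.

```python
import math

def pearson_test(x, y):
    """Return (r, t statistic, approximate two-sided p-value)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Under H0 (zero correlation), t follows a Student t with n - 2 d.f.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution; adequate only for large n.
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return r, t, p_approx

# Hypothetical polarity scores and price changes, for illustration only.
polarity = [-0.2, -0.1, 0.0, 0.1, 0.2]
price_change = [-1.0, 1.0, 0.0, 3.0, 2.0]
r, t, p_approx = pearson_test(polarity, price_change)
print(round(r, 3), round(t, 3))
```

The sign of `r` gives the direction of the relationship, and the correlation is declared significant when the p-value falls below the chosen significance level.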
We further tested for heteroskedasticity before fitting a regression model to our data.
Heteroskedasticity
Table 11 presents the results of the Glejser test, for the following hypotheses:
H0: The residuals of the linear regression are homoscedastic.
H1: The residuals of the linear regression are heteroscedastic.
Because the p-value for the effect of the title polarity variable on the residuals of the equation is higher than 0.05, we conclude that there is no evidence of heteroskedasticity.
Because the p-value for the effect of the description polarity variable on the residuals of the equation is higher than 0.05, we conclude that there is no evidence of heteroskedasticity.
In Table 13, the p-value for the effect of the description polarity variable on the residuals of the equation is higher than 0.05; we therefore conclude that there is no evidence of heteroskedasticity.
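The mechanics of the Glejser test can be sketched in a few lines: fit the regression, take the absolute residuals, regress them on the explanatory variable, and test whether that slope differs from zero. The sketch below assumes the simplest auxiliary form, |residual| = c + d·x, with a single regressor; the function names and the simulated data are hypothetical and do not reproduce the study's series.

```python
import math
import random

def simple_ols(x, y):
    """Fit y = a + b*x; return (a, b, t statistic of b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, b / math.sqrt(rss / (n - 2) / sxx)

def glejser_t(x, y):
    """t statistic of x in the auxiliary regression |residual| = c + d*x."""
    a, b, _ = simple_ols(x, y)
    abs_resid = [abs(yi - a - b * xi) for xi, yi in zip(x, y)]
    return simple_ols(x, abs_resid)[2]

rng = random.Random(1)
x = [i / 300 for i in range(1, 301)]
# Hypothetical data: noise spread is constant in y_homo and grows with x in y_hetero.
y_homo = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
y_hetero = [1.0 + 2.0 * xi + xi * rng.gauss(0, 1) for xi in x]

t_homo, t_hetero = glejser_t(x, y_homo), glejser_t(x, y_hetero)
print("homoscedastic case:", round(t_homo, 2))
print("heteroscedastic case:", round(t_hetero, 2))
```

A slope that is not significantly different from zero in the auxiliary regression, as reported in the tables above, gives no evidence against H0.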
4. Wavelet coherence
Because our closing and opening prices were not stationary, we estimated and plotted the wavelet coherence for each of the analyzed stocks. As stated in [29], “continuous wavelet cross-correlation provides a time–scale distribution of the correlation between two signals, whereas continuous wavelet coherence provides a qualitative estimator of the temporal evolution of the degree of linearity of the relationship between two signals on a given scale”.
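For reference, a commonly used definition of wavelet coherence is sketched below. The notation is an assumption and is not taken from this paper: $W_x$ and $W_y$ denote the continuous wavelet transforms of the two series, $W_{xy} = W_x W_y^{*}$ their cross-wavelet transform, $S$ a smoothing operator in time and scale, $s$ the scale and $\tau$ the time position.

```latex
R^{2}(s,\tau) =
  \frac{\left| S\!\left( s^{-1} W_{xy}(s,\tau) \right) \right|^{2}}
       {S\!\left( s^{-1} \left| W_{x}(s,\tau) \right|^{2} \right)
        \, S\!\left( s^{-1} \left| W_{y}(s,\tau) \right|^{2} \right)},
\qquad 0 \le R^{2}(s,\tau) \le 1,
\qquad
\phi_{xy}(s,\tau) = \arg\!\left( S\!\left( s^{-1} W_{xy}(s,\tau) \right) \right).
```

Values of $R^{2}$ close to 1 indicate a strong local linear relationship at that scale and time, and the phase difference $\phi_{xy}$ is what the arrows in a coherence plot display.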
Figure 4 plots the wavelet coherence between the closing prices and the title polarity. Within the cone of influence, there are three regions of coherent oscillation behavior. The first lies between periods 32 and 64, in the first time interval, between news titles 100 and 300. In this region, the arrow direction points to a lagging A index, meaning that over news titles 100 to 300 the closing prices determined the general media opinion expressed through the news titles, with a lag of 32 to 64 periods, where a period corresponds to 6 to 8 hours. In the second coherence region, the closing prices also lead the media opinion, represented by the title polarity, with a lag of 4 to 16 periods. In the third significant coherence region, the variables are in phase and the news title sentiment leads the closing prices of the Apple stock, with a lag of 8 to 20 periods.
Figure 5 presents the wavelet coherence between the description polarity and the opening prices. Three regions of coherent oscillation behavior can be observed in the cone of influence. In the first significant coherence region, within the first 50 to 100 days, the variables are in phase and the description polarity score leads the opening prices, with a lag of 4 to 16 periods (in this wavelet coherence analysis, a period is approximately one day). In the second coherence region, located between days 200 and 220, the description polarity score is lagging, meaning that the opening prices influence the description score, with a lag of 4 periods (approx. 4 days). In the third coherence region, between days 250 and 300, the description polarity score is again lagging, so the opening prices determine the description polarity score, with a lag of 4 to 16 days.
Figure 6 plots the wavelet coherence between the opening prices and the title polarity. Within the cone of influence there is no coherent oscillation behavior, except for one coherence region that originates outside the cone of influence, in the first time period; its arrow direction indicates that the closing prices were leading the title polarity, with a lag of 8 to 16 periods (here, a period is approx. 1 day).
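Lead and lag relationships in a coherence plot are read from phase arrows. One common conversion, sketched below, turns a phase difference at a given oscillation period into a time shift: lag = (phase / 2π) × period. The function name is hypothetical, and which series leads for a given arrow direction depends on the plotting software's convention, which is not specified here.

```python
import math

def phase_to_lag(phase_radians, period):
    """Time shift implied by a phase difference at one oscillation period."""
    return phase_radians / (2 * math.pi) * period

# A quarter-cycle phase difference at a period of 16 days (illustrative values).
quarter_cycle_lag = phase_to_lag(math.pi / 2, 16)
print(quarter_cycle_lag)
```

Under this conversion, a phase difference of zero corresponds to series that move together with no shift, and the implied lag can never exceed the period at which it is measured.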
Regressions
Because the Glejser test found no evidence of heteroskedasticity in the residuals in any of the three stock cases analyzed, different types of regression were computed; the most significant ones are presented and analyzed.
Table 14 presents the results of the linear, quadratic and cubic regressions, with the closing prices transformed into a stationary series by first differencing. The cubic regression has the highest R-squared value, 0.13. Figure 7 plots the three regressions.
The equation for the cubic regression is given in Equation (6).
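The comparison of linear, quadratic and cubic fits can be sketched as a least-squares polynomial regression of increasing degree. The example below is an illustration under assumptions: the regressor is taken to be a polarity score, the response a differenced price, the data are simulated, and all function names are hypothetical. It solves the normal equations directly, which is adequate for low degrees and a regressor bounded in [-1, 1].

```python
import random

def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree]."""
    powers = range(degree + 1)
    xtx = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    return solve_linear_system(xtx, xty)

def r_squared(x, y, coeffs):
    """Share of the variance of y explained by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    fitted = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss

# Hypothetical polarity scores and differenced prices, for illustration only.
rng = random.Random(3)
polarity = [i / 10 - 1 for i in range(21)]
price_change = [0.5 * p ** 3 - 0.2 * p + rng.gauss(0, 0.05) for p in polarity]

r2_by_degree = {
    d: r_squared(polarity, price_change, fit_polynomial(polarity, price_change, d))
    for d in (1, 2, 3)
}
print(r2_by_degree)
```

Because the three models are nested, the in-sample R-squared cannot decrease as the degree rises, so a comparison on adjusted R-squared or on held-out data is a common complement when choosing among them.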
Next, an autoregression with an exogenous factor is estimated, following Equation (4).
Table 15 presents the parameters of the autoregressive model with an exogenous factor. Both parameter estimates are positive, and the title polarity coefficient (0.41) is higher than the lagged closing price coefficient (0.014). Table 16 reports the goodness of fit; the R-squared is 0.004.
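An autoregressive model with an exogenous factor can be estimated by ordinary least squares, as sketched below. Equation (4) is not reproduced in this section, so the specification used here, y_t = c + φ·y_{t-1} + β·x_t + ε_t with an intercept, one lag of the price and contemporaneous polarity, is an assumption. The simulated data, the true parameter values and the function names are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def fit_arx(y, x):
    """OLS estimates (c, phi, beta) for y_t = c + phi*y_{t-1} + beta*x_t."""
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    target = y[1:]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical series simulated with c = 0.5, phi = 0.9 and beta = 2.0.
rng = random.Random(2)
polarity = [rng.uniform(-1, 1) for _ in range(400)]
price = [0.0]
for t in range(1, 400):
    price.append(0.5 + 0.9 * price[-1] + 2.0 * polarity[t] + rng.gauss(0, 0.1))

c_hat, phi_hat, beta_hat = fit_arx(price, polarity)
print(round(c_hat, 3), round(phi_hat, 3), round(beta_hat, 3))
```

In this form, φ measures the persistence of the price series and β the marginal effect of the polarity score, which are the two coefficients compared in the parameter tables.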
Table 17 presents the results of the linear, quadratic and cubic regressions, with the closing prices transformed into a stationary series by first differencing. The cubic regression has the highest R-squared value, 0.23. Figure 8 plots the three regressions.
The equation for the cubic regression is given in Equation (7).
Table 18 presents the results of the linear, quadratic and cubic regressions, with the closing prices transformed into a stationary series by first differencing. The cubic regression has the highest R-squared value, 0.05. Figure 9 plots the three regressions.
Table 19 presents the parameters of the autoregressive model with an exogenous factor. Both parameter estimates are positive, and the title polarity coefficient (16.99) is higher than the lagged closing price coefficient (0.902). Table 20 reports the goodness of fit; the R-squared is 0.555.
The resulting equation for the autoregressive model with an exogenous factor is presented in Equation (8).