Forecasting Models
This paper develops a time series forecasting model based on ARIMA, the Auto-Regressive Integrated Moving Average. The model family includes AR, MA, ARMA, and ARIMA, and a list of candidate models is developed from these regression types using their parameters. The major parameters of the ARIMA model are p, d, and q, where p is the autoregressive order, which indicates how many past days are correlated with today's value; d is the order of differencing applied to the series; and q is the moving-average order, the number of lagged error terms included in the model.
Real-world data tends to be non-stationary. A signal is said to be stationary if its statistical properties, such as mean, standard deviation, and trend, do not change over time. To check whether a time series is stationary, we use the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the time series contains a unit root and is non-stationary. The results of the ADF test for each of the imputation datasets are given in the sections below.
Dataset Analysis and ADF Test Results
The ADF test results for each imputation dataset are summarized below. Each entry reports the test statistic with its p-value in parentheses. The first difference is taken to try to make the time series stationary; the second difference is not suggested, as its p-value is zero (over-differencing).

| Dataset | Original series | First difference | Second difference |
|---|---|---|---|
| No imputation | -5.450 (p = 2.55e-06) | -10.403 (p = 1.88e-18) | -21.152 (p = 0.0) |
| Mean imputation | -5.393 (p = 3.49e-06) | -10.073 (p = 1.23e-17) | -21.617 (p = 0.0) |
| Median imputation | -5.363 (p = 4.042e-06) | -10.075 (p = 1.223e-17) | -21.686 (p = 0.0) |
| Mode imputation | -5.227 (p = 7.73e-06) | -10.258 (p = 4.31e-18) | -22.121 (p = 0.0) |
| Linear interpolation | -5.390 (p = 3.53e-06) | -10.072 (p = 1.24e-17) | -21.440 (p = 0.0) |
Auto Regression and Moving Average model
In time series forecasting, the autoregressive moving average model of order (p, q), denoted as ARMA(p, q), is a popular approach. The ARMA(p, q) model combines the autoregressive (AR) model of order p and the moving average (MA) model of order q. The ARMA(p, q) model assumes that the value of the time series at a given point is linearly dependent on the previous p values of the series and the previous q error terms. The formula for the ARMA(p, q) model is as follows:

X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t
In this formula:
X_t represents the value of the time series at time t.
c is the intercept or constant term.
\phi_1, \ldots, \phi_p are the coefficients of the autoregressive terms that capture the relationship between the current and previous values.
X_{t-1}, \ldots, X_{t-p} represent the lagged values of the time series.
\theta_1, \ldots, \theta_q are the coefficients of the moving average terms that capture the relationship between the current value and the previous error terms.
\varepsilon_{t-1}, \ldots, \varepsilon_{t-q} represent the lagged error terms of the time series.
\varepsilon_t is the error term at time t, which represents the random fluctuations or noise in the series.
To estimate the parameters (\phi_i and \theta_j) and the intercept (c) of the ARMA(p, q) model, various estimation techniques can be used, such as maximum likelihood estimation.
Once the parameters are estimated, the ARMA(p, q) model can be used for forecasting by substituting the lagged values and lagged error terms of the time series into the formula to predict future values.
Note that the ARMA(p, q) model assumes stationarity of the time series, and it is a flexible model that can capture both autoregressive and moving average components in the data.
Seasonal Auto-Regressive Models
The Seasonal Autoregressive Integrated Moving Average (SARIMA) model is a time series forecasting model that extends the Autoregressive Integrated Moving Average (ARIMA) model to account for seasonality. SARIMA combines the components of ARIMA with seasonal differencing and seasonal autoregressive and moving average terms.
The SARIMA(p, d, q)(P, D, Q, s) model is defined by the following components, written in terms of the backshift operator B (B X_t = X_{t-1}):

Autoregressive (AR) component, AR(p): \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p
Integrated (I) component, I(d): (1 - B)^d
Moving Average (MA) component, MA(q): \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q
Seasonal Autoregressive (SAR) component, SAR(P): \Phi(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}
Seasonal Moving Average (SMA) component, SMA(Q): \Theta(B^s) = 1 + \Theta_1 B^s + \Theta_2 B^{2s} + \cdots + \Theta_Q B^{Qs}

Combined with the seasonal differencing term (1 - B^s)^D, the full model is:

\phi(B) \Phi(B^s) (1 - B)^d (1 - B^s)^D X_t = \theta(B) \Theta(B^s) \varepsilon_t

where:
X_t is the observed time series at time t
\varepsilon_t is the error term (also known as the residual) at time t
p, d, q are the non-seasonal AR, I, and MA orders, respectively
P, D, Q are the seasonal SAR, I, and SMA orders, respectively
s is the seasonal period or frequency (e.g., 12 for monthly data, 4 for quarterly data, etc.)
\phi_1, \ldots, \phi_p are the non-seasonal autoregressive coefficients
\theta_1, \ldots, \theta_q are the non-seasonal moving average coefficients
\Phi_1, \ldots, \Phi_P are the seasonal autoregressive coefficients
\Theta_1, \ldots, \Theta_Q are the seasonal moving average coefficients