1. Introduction
Bordetella pertussis (B. pertussis) continues to be a significant public health concern globally [1, 2] despite the availability of effective vaccines. Periodic B. pertussis outbreaks have been documented in South Africa, with seasonal variations impacting the case count. Time series data analysis is essential for describing and understanding disease trends, predicting future epidemics, and implementing effective public health interventions.
Time series analysis serves as a valuable tool in disease surveillance and epidemiology, offering multiple methods for assessing and forecasting the incidence of illnesses. Support vector machines [3], decomposition methods, and ARIMA modelling are frequently used in epidemiology. These techniques assess interventions, detect outbreaks, and describe disease dynamics [4]. Furthermore, spectral analysis and the maximum entropy method have been applied to examine statistical trends of infectious diseases across multiple countries [5].
In the analysis of discrete illness counts, integer-valued autoregressive models demonstrate advantages compared to traditional Box-Jenkins models [6]. Time-domain and frequency-domain techniques, such as autocorrelation functions and spectrum analysis, are widely used; wavelet analysis represents a hybrid technique [7]. When selecting techniques, it is important to consider factors such as scale, scope, disease characteristics, and surveillance objectives in conjunction with the algorithm's performance [8].
Recent studies have detailed the various mechanisms underlying the resurgence and spread of B. pertussis. Across all age categories, B. pertussis infections are significantly affected by weather variability, particularly temperature and humidity [9]. Population size and density influence persistence, and spatial-temporal studies have demonstrated the evolving patterns of disease peaks and regional expansion [10]. Socio-environmental factors such as temperature, socioeconomic level, and school calendars influence incidence [11]. Additionally, seasonal forcing and immune boosting establish cyclical behaviour and chaotic dynamics [12].
Vaccination has lengthened the periodic dynamics of the disease. For example, high vaccination coverage against B. pertussis can suppress the annual cycle and lengthen the long-term periodicity of pertussis. Moreover, prior research has shown that vaccination coverage, vaccine effectiveness, pathogen evolution, and demographic changes significantly influence the transmission dynamics of pertussis, with some studies reporting prolonged epidemic cycles [15]. Comprehensive surveillance is necessary to enhance the understanding of B. pertussis dynamics and anticipate future outbreaks.
Models such as ARIMA and SARIMA have made effective forecasting of communicable diseases possible. These models deliver early warnings for diseases such as dengue, tuberculosis, malaria, and COVID-19 while also offering reliable epidemic forecasts [16, 17]. For example, SARIMA models have demonstrated effectiveness in identifying seasonal patterns of diseases, including hand, foot, and mouth disease [18]. Furthermore, SARIMA models exhibit similarities to dynamic linear models when long-term longitudinal data are utilised [19].
In the context of diseases characterised by long-range dependencies, such as hemorrhagic fever, advanced models like SARFIMA have demonstrated superior predictive capabilities [20]. Time series analytic approaches serve as valuable tools for public health surveillance, enabling authorities to implement preventative measures and mitigate the impact of disease outbreaks [21].
Recent research has examined various predictive models for B. pertussis incidence. Contemporary trends in short-term B. pertussis data indicate the feasibility of employing time series analysis techniques, specifically ARIMA and exponential smoothing models [10, 22] (Zeng et al., 2016; Wang et al., 2018). Hybrid methodologies that integrate wavelet transformations or ARIMA with neural networks have demonstrated enhanced accuracy [22].
Incorporating additional data sources, including Google Trends, improves conventional modelling techniques [23]. Numerous studies, including those conducted by Raycheva et al. (2020) and Zhang et al. (2019), have identified increased incidence and seasonal trends [10, 24]. In addition, advanced models integrating age-structured and particle-filtering methodologies have been created to elucidate diminishing immunity and assess therapeutic interventions [25, 26]. This allows advanced forecasting systems to evaluate potential strategies for reducing the burden of B. pertussis and to provide early warnings, facilitating informed public health decision-making.
Despite extensive research on B. pertussis epidemiology and forecasting methods, there remain gaps in integrating advanced time series models with socio-environmental variables. Previous studies have primarily utilised conventional statistical models like ARIMA and SARIMA, which, while effective, have limitations in accurately representing the complex nonlinear relationships characteristic of disease transmission. Moreover, few studies have explored B. pertussis trend analysis in South Africa, highlighting the need for localised forecasting methods. This study seeks to address these deficiencies by employing various forecasting methodologies, including hybrid models that integrate machine learning techniques. It aims to (1) analyse historical B. pertussis trends and identify seasonal patterns, (2) compare the predictive accuracy of various time series models, and (3) evaluate the influence of climate on B. pertussis incidence in Tshwane.
2. Materials and Methods
2.1. Study Design
This investigation employed a retrospective time series analysis to examine B. pertussis incidence patterns and evaluate the predictive capabilities of various forecasting models. The methodological approach integrated traditional statistical techniques with machine learning algorithms to enhance disease prediction accuracy.
2.2. Data Collection
B. pertussis surveillance data were obtained from public and private health facilities in the Tshwane Health District, South Africa.
2.3. Inclusion and Exclusion Criteria
The study included all laboratory-confirmed and clinically diagnosed B. pertussis cases reported in Tshwane between 2015 and 2019. Records were excluded if they were duplicates or contained unverified clinical diagnoses.
2.4. Data Processing and Analysis
All datasets underwent standardisation into a uniform monthly aggregation format. Missing values were addressed through linear interpolation and multiple imputation techniques. To mitigate reporting errors, extreme values were identified using z-scores and boxplot analyses.
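The preprocessing steps described above can be illustrated with a minimal sketch in Python using pandas. It is not the study's actual pipeline: the function names, the z-score threshold of 3, and the 1.5 × IQR boxplot rule are assumptions chosen for illustration, and multiple imputation is omitted.

```python
import numpy as np
import pandas as pd


def monthly_counts(report_dates) -> pd.Series:
    """Aggregate one-date-per-case records into monthly case counts.

    Months inside the observed range that have no records receive a count of 0.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(report_dates))
    per_case = pd.Series(1, index=dates).sort_index()
    return per_case.resample("MS").sum().astype(float)


def interpolate_missing(series: pd.Series) -> pd.Series:
    """Fill missing monthly values by linear interpolation between neighbours."""
    return series.interpolate(method="linear")


def flag_extremes(series: pd.Series, z_threshold: float = 3.0) -> pd.DataFrame:
    """Flag extreme values by z-score and by the boxplot (1.5 x IQR) rule.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    z = (series - series.mean()) / series.std(ddof=0)
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    outside_whiskers = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return pd.DataFrame(
        {"z_score": z, "z_flag": z.abs() > z_threshold, "boxplot_flag": outside_whiskers}
    )
```

Flagged months would then be reviewed rather than dropped automatically, since a genuine outbreak can also produce an extreme count.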
2.5. Time Series Modelling
The time series data were analysed through Seasonal-Trend Decomposition with LOESS (STL), which enabled the decomposition into trend, seasonal, and residual components. The Augmented Dickey-Fuller (ADF) test was employed to evaluate stationarity. Four forecasting approaches were considered: Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), Holt-Winters Exponential Smoothing, and a Hybrid Machine Learning Model combining LSTM and ARIMA. Model selection was determined based on the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were utilised to assess predictive accuracy. The statistical and computational analyses were performed using several software tools, including Python (with libraries such as pandas, statsmodels, and TensorFlow for LSTM models), Stata for statistical validation, and R for visualisation and time series decomposition.
To evaluate the models' performance, a rolling-window cross-validation method was used. Additionally, residual analyses were carried out to assess the models' goodness of fit and the reliability of their forecasts.
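A rolling-window evaluation of this kind can be sketched as follows. The window length, horizon, and step are hypothetical parameters, and the seasonal-naive forecaster is only a placeholder for whichever fitted model is being assessed; none of these choices is taken from the study.

```python
from typing import Callable

import numpy as np

Forecaster = Callable[[np.ndarray, int], np.ndarray]


def seasonal_naive(history: np.ndarray, horizon: int, season: int = 12) -> np.ndarray:
    """Placeholder forecaster: repeat the value observed one season earlier."""
    last_season = history[-season:]
    return np.array([last_season[h % season] for h in range(horizon)])


def rolling_window_cv(
    values,
    forecaster: Forecaster,
    window: int = 36,
    horizon: int = 1,
    step: int = 1,
) -> dict:
    """Slide a fixed-length training window forward and pool the forecast errors.

    Each fold trains on `window` consecutive points and is scored on the
    `horizon` points that immediately follow.
    """
    values = np.asarray(values, dtype=float)
    fold_errors = []
    for start in range(0, len(values) - window - horizon + 1, step):
        train = values[start : start + window]
        actual = values[start + window : start + window + horizon]
        fold_errors.append(actual - forecaster(train, horizon))
    if not fold_errors:
        raise ValueError("series is too short for the requested window and horizon")
    errors = np.concatenate(fold_errors)
    return {
        "n_folds": len(fold_errors),
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors**2))),
    }
```

Calling `rolling_window_cv(monthly_series, seasonal_naive, window=36, horizon=3)` on a 60-month series would score 22 overlapping folds; replacing `seasonal_naive` with a function that refits a model on each window gives the corresponding scores for that model.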
4. Discussion
The analysis of B. pertussis trends in the Tshwane Health District over five years confirms the hypothesis of public health experts: B. pertussis persists and demonstrates a distinct seasonal pattern, with peaks observed in spring and winter. This pattern corresponds with global observations, suggesting that B. pertussis cases typically rise during colder months [27]. The significance of these findings is that, despite vaccination initiatives, B. pertussis persists in predictable cycles, highlighting the necessity for more strategic, data-informed interventions.
One of this study's key contributions is its use of ARMA(4) modelling, which successfully captured short-term fluctuations in B. pertussis incidence. The cubic trend model provided a reliable approach to long-term forecasting. Together, these models offer a window into the future, helping policymakers anticipate outbreaks rather than react to them.
Our modelling approach further develops existing B. pertussis forecasting studies, which have predominantly used SARIMA models for their capacity to incorporate seasonality [10]. Our findings indicate that ARMA-based models can yield comparably robust short-term forecasts, especially when integrating external factors like vaccination coverage and climate data [26]. This offers a chance to enhance public health forecasting instruments by incorporating real-time surveillance data.
The current era necessitates an increased emphasis on predictive analytics in disease control efforts. Incorporating real-time Google Trends data and surveillance trends could facilitate the development of early warning systems, enabling public health officials to prepare for seasonal surges in B. pertussis. This concept has been examined in other infectious disease modelling studies, where integrating Google Trends or electronic health records has improved outbreak predictions [23]. Predicting the timing and location of case spikes allows preemptive intervention to prevent outbreak escalation.
The findings underscore the significance of strategically timing vaccination efforts. The increase in B. pertussis in well-vaccinated populations is partially linked to waning immunity and the rise of pertactin-deficient strains (Heininger et al., 2024). Our study confirms the existence of seasonal peaks, suggesting that administering booster doses prior to high-incidence periods could enhance herd immunity during critical times [25].