Symmetry-Aware Simulation and Modelling of Noise-Robust Electric Load Forecasting Using Hybrid MMPF-NARX and GA/PSO Optimization

Stylianos Pappas; Alexandros Gazis; Nikos E. Mastorakis

doi:10.20944/preprints202606.1680.v1

Submitted: 22 June 2026

Posted: 23 June 2026


Abstract

Reliable electric load forecasting is an important engineering problem for power-system planning, grid stability, and mission-critical energy management. This paper presents a symmetry-aware simulation and modelling framework for medium-range electric load forecasting under noisy and uncertain operating conditions. The proposed approach combines the Multi-Model Partitioning Filter (MMPF) with Nonlinear Autoregressive Exogenous (NARX) submodels, while two adaptive optimization strategies, Genetic Algorithm-based Resource Allocation (GARA) and Particle Swarm Optimization (PSO), are used to optimize the contribution weights of the parallel predictors. The modelling process uses real commercial power-system data and evaluates the forecasting framework over two-month horizons. To simulate realistic engineering disturbances, correlated symmetric Gaussian noise is injected into the testing phase under moderate and heavy noise scenarios. The cyclic symmetry of temporal variables, such as hour and month, is preserved through unit-circle encoding, while the symmetry and asymmetry of the residual error distributions are examined through scatter-plot analysis. The results show that both GARA and PSO improve the robustness of the hybrid MMPF-NARX model, but PSO consistently achieves lower MAPE values, smoother convergence, and lower computational burden. The optimal configuration is obtained with nine NARX submodels, beyond which additional model complexity offers no meaningful performance gain. Overall, the study shows that symmetry-aware modelling, adaptive optimization, and noise-based simulation can support more reliable forecasting in modern power-system engineering applications.

Keywords: adaptive weight allocation; electric load forecasting; genetic algorithm (GA); hybrid forecasting framework; medium-range forecasting; multi-model partitioning filter (MMPF); noise robustness; nonlinear autoregressive exogenous (NARX); particle swarm optimization (PSO); power systems optimization

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

The accurate prediction of electric load is vital to the dependability of present-day power systems. When forecasts go wrong, operators incur losses from the unnecessary activation of reserves, higher production costs, and reduced reliability [1,2]. The growing use of renewable energy sources and demand-side variability can increase the nonlinear characteristics of modern grids. This is the main reason for the increasing demand for forecasting methods that are adaptive and robust [3,4].

Classical approaches, for example ARIMA models, have been widely used in load forecasting in the past but are no longer sufficient. Since they are mostly linear models, they perform poorly on the nonlinear and asymmetrical stochastic oscillations of modern energy demand [1,5]. Many hybrid forecasting models have been proposed to address these issues. One example is the combination of artificial neural networks with multi-variable linear regression and support vector regression via fuzzy logic [6], as also presented in [3]; this combination achieves better results than any of its parts. Another study [4] delivers a hybrid framework that uses intelligent optimization algorithms to fine-tune the relevant weights and to select key features that enhance prediction accuracy [7]. Comparative analyses demonstrate that tree-based machine learning models, namely random forests and gradient boosting, and recurrent neural networks outperform traditional ARIMA models under nonlinear load patterns [5,8].

To handle symmetrical and asymmetrical uncertainties in modern energy demand, ensemble-based and probabilistic forecasting methods have gained significant prominence. Recent research has shown that combining a large set of candidate forecasts, each generated with different parameter settings, improves scalability and precision, especially in medium-range forecasting [9,10]. The same framework can therefore be extended to integrate various predictors such as Random Forests and Gradient Boosting Decision Trees. Moreover, recent studies such as [11] have suggested that using methods like Quantile Regression and Kernel Density Estimation to build full probability distributions of the load forecast provides a more reliable risk evaluation than classic point forecasts [12,13]. In addition, the study [14] expands this notion and suggests that early economic warning signs and industrial indicators should be taken into consideration when building hybrid networks. These features are important because they help maintain high accuracy when the patterns shift near critical load turning points [15].

Another important aspect is the adoption of advanced state-space models, together with careful data preparation, to ensure stable operation in smart grids. Recent studies show that conditional hidden semi-Markov models can manage uncertain demand behavior by monitoring and detecting random changes in electricity demand. As shown in [16,17], these models split the system into different operating states and make it possible to examine how long each state lasts. Although they perform strongly, they depend heavily on in-depth data cleaning, preprocessing, and measures against overfitting to ensure good training. A typical example is the application of a Gaussian filter to smart meter data in order to remove noise caused by outliers or transmission errors [1,18]. As evident from the above-mentioned examples [9,11,14,16], most contemporary power networks focus on reliability and fault-tolerant operation, combining sound design principles with continuous and flexible optimization during all cycles of operation and processing.

On the one hand, Particle Swarm Optimization (PSO) is used extensively in industry and academia alike to tune forecasting models. It is commonly combined with ARIMA models [1], Artificial Intelligence (AI) models such as Neural Networks (NN) [19,20,21], and support vector machines (SVM) [22,23,24]. Most of this research shows how PSO can improve convergence and forecasting accuracy while handling large parameter spaces efficiently. PSO-tuned artificial NNs are of particular interest, as they can achieve better short-term forecasts than traditional NN structures [19,20]. Additionally, PSO can be fused with SVM to obtain models with better generalization and more stable predictions [22,24]. For example, some such systems have been implemented with grey models and wavelet-based methods that aim to improve results for data with nonlinear behavior [25]. Lastly, PSO is also used in long-term load forecasting [26] and parameter estimation in power systems [27], as it is well regarded for its simplicity, its small number of hyperparameters, and its rapid convergence [28].

On the other hand, even though PSO is one of the best-known and most widely applied optimization methods in hybrid forecasting, its use inside probabilistic multi-model filtering structures such as the Multi-Model Partitioning Filter (MMPF) has received little attention. In particular, the hybrid MMPF-NARX architecture has not been studied extensively, although it raises interesting questions, such as how to assign different weights to several nonlinear submodels that run in parallel. Genetic Algorithms (GA) have been applied to this kind of adaptive optimization in many works, such as [29,30]. GAs rely on crossover and mutation operators that provide global search capabilities while at the same time introducing additional structural complexity and affecting the convergence dynamics. In contrast, PSO uses a velocity/position update mechanism that is often associated with smoother convergence behavior, as indicated in many forecasting works, such as [19,22,25].

Based on the above, our paper presents two methods, one GA-based and one PSO-based, that are fused with the same MMPF-NARX architecture for adaptive weight optimization [31,32]. We evaluate the proposed system on real commercial power system data [33]. Specifically, we assess the robustness of both methods when correlated Gaussian noise is introduced during the testing phase at moderate and heavy levels, in order to simulate realistic conditions on modern commercial grids.

Our results are of interest because they are potentially applicable to future mission-critical energy environments. Data quality is addressed through an adaptive filtering pre-processing stage that enhances model resilience. Our comparison goes beyond forecasting accuracy (MAPE) alone and also covers convergence behavior, e.g., the % Training MAPE per iteration, and performance consistency across configurations with 7 to 10 NARX sub-filters. In this way, we examine whether PSO achieves a measurable benefit over GA when both operate within the same probabilistic hybrid forecasting framework.

2. Methods

2.1. Multi-Model Partitioning Filter (MMPF)

The Multi-Model Partitioning Filter (MMPF) is an adaptive estimation method developed for systems with nonlinear behavior and time-varying dynamics [34]. Rather than relying on a single global predictor, MMPF runs several models in parallel, each of which describes a local view of the system’s behavior. The main idea behind MMPF is thus to partition the prediction task among different candidate models, loosely similar to the map stage of the MapReduce paradigm. The main difference is that the model outputs are not simply aggregated: the contribution of each model is adjusted continuously according to how well it performs. The outputs are combined with an adaptive weighting rule that determines the weight of each model, i.e., the percentage with which it affects the final estimate at each step [35]. As such, the goal of MMPF is to determine how much each model influences the final forecast, rather than to perform a generic aggregation.

The weighting process is probabilistic. At each iteration, the prediction error of every submodel is evaluated against the real value. Based on this residual information, a symmetric likelihood function is calculated and used to update the posterior probability associated with each model [36]. The weights are normalized to ensure that their sum remains equal to one, preserving numerical stability and interpretability.

The optimal Minimum Mean Square Error (MMSE) estimate is given by:

\hat{x} (k / k) = \sum_{i = 1}^{M} \hat{x} (k / k; θ_{i}) p (θ_{i} / k)

(1)

The probabilities are calculated online in a recursive manner as follows:

p (θ_{i} / k) = \frac{L (k / k; θ_{i}) p (θ_{i} / k - 1)}{\sum_{j = 1}^{M} L (k / k; θ_{j}) p (θ_{j} / k - 1)}

(2)

where,

L (k / k; θ_{i}) = {|P_{z} (k / k - 1; θ_{i})|}^{- \frac{1}{2}} \cdot e^{- \frac{1}{2} {\tilde{z}}^{T} (k / k - 1; θ_{i}) P_{z}^{- 1} (k / k - 1; θ_{i}) \tilde{z} (k / k - 1; θ_{i})}

(3)

and P_{z}^{- 1} is calculated by the EKF equations.

The basic structure, MMPF, with Extended Kalman Filter (EKF) implementing either ARMA or ARIMA models for forecasting purposes, is analytically presented in [37,38,39]. It has also been effectively utilized for adaptive electric load forecasting using empirical data [40], integrating it with Support Vector Machines (SVM), demonstrating substantial enhancements in predictive accuracy [41,42].
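The recursive weighting described in this section can be illustrated with a minimal sketch. It assumes a scalar measurement, the standard Gaussian form of the likelihood, and a discrete set of M models; the function names and all numbers are hypothetical and are not taken from the paper's implementation.

```python
import math


def gaussian_likelihood(residual, variance):
    """Gaussian likelihood of a scalar residual with innovation variance P_z.

    Assumption: standard Gaussian form, up to a constant that cancels
    when the posteriors are normalized.
    """
    return variance ** -0.5 * math.exp(-0.5 * residual ** 2 / variance)


def update_posteriors(priors, residuals, variances):
    """Recursive update: prior times likelihood, renormalized to sum to one."""
    unnormalized = [
        p * gaussian_likelihood(r, v)
        for p, r, v in zip(priors, residuals, variances)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        # All likelihoods underflowed; keep the previous probabilities.
        return list(priors)
    return [u / total for u in unnormalized]


def mmse_estimate(estimates, posteriors):
    """Posterior-weighted sum of the submodel estimates."""
    return sum(x * p for x, p in zip(estimates, posteriors))


# Hypothetical example with three submodels and equal prior probabilities.
priors = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
estimates = [100.0, 104.0, 110.0]   # one-step predictions of each submodel
measurement = 103.0                 # observed load
residuals = [measurement - x for x in estimates]
variances = [4.0, 4.0, 4.0]         # innovation variances (assumed equal)

posteriors = update_posteriors(priors, residuals, variances)
combined = mmse_estimate(estimates, posteriors)
```

In this example the second submodel has the smallest residual, so it receives the largest posterior probability and dominates the combined estimate.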

2.2. MMPF – NARX

In the proposed method, the ARMA models implemented in every EKF in Figure 1 are substituted by Nonlinear Autoregressive Exogenous (NARX) models. These models work as nonlinear predictors inside the MMPF (see Figure 2).

The NARX subfilters are implemented in parallel to save computational time. They aim to approximate the nonlinear relationship between the current and past electric load values, using both autoregressive and external input variables.

The general form of each symmetric NARX submodel is expressed as:

{\tilde{y}}_{i} (k) = F_{i} (y (k - 1), y (k - 2), \dots, y (k - p), u (k - 1), \dots u (k - q))

(4)

where F_i (·) represents the nonlinear mapping of the ith submodel, and p and q are the autoregressive and exogenous input orders, respectively. The electric load predicted at time instant k by the ith NARX submodel of the MMPF-NARX is denoted by {\tilde{y}}_{i} (k).

In this case study, the exogenous input vector u(k) is defined as:

u (k) = [\sin (\frac{2 π D_{hour} (k)}{24}), \cos (\frac{2 π D_{hour} (k)}{24}), D_{week} (k), \sin (\frac{2 π D_{month} (k)}{12}), \cos (\frac{2 π D_{month} (k)}{12}), {\bar{y}}_{(k - 6, k)}]

(5)

where,

D_hour (k) = the current hour of the day (1–24)

D_month (k) = the month of the year (1–12)

both expressed on a unit circle.

This encoding prevents discontinuities (e.g., from 23:00 to 0:00, or from the 12th month to the 1st). It effectively models the diurnal pattern of energy consumption.

D_week (k) = the day of the week (1–7)

{\bar{y}}_{(k - 6, k)} = the moving average of the load over the past six hours, defined as:

{\bar{y}}_{(k - 6, k)} = \frac{1}{6} \sum_{j = k - 6}^{k - 1} y (j)

Lastly, each NARX submodel minimizes its own mean-squared error over the N training samples:

J_{i} = \frac{1}{N} \sum_{k = 1}^{N} {(y (k) - {\tilde{y}}_{i} (k))}^{2}

(6)
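The construction of the exogenous input vector can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the month is assumed to be encoded with a period of 12 so that December and January end up adjacent on the unit circle.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic variable onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def exogenous_vector(hour, weekday, month, recent_loads):
    """Build u(k) from hour (1-24), weekday (1-7), month (1-12) and the
    six previous hourly loads (the moving-average term)."""
    if len(recent_loads) != 6:
        raise ValueError("recent_loads must hold exactly six values")
    sin_h, cos_h = cyclic_encode(hour, 24)
    sin_m, cos_m = cyclic_encode(month, 12)   # assumed period of 12
    moving_average = sum(recent_loads) / 6.0
    return [sin_h, cos_h, float(weekday), sin_m, cos_m, moving_average]


def encoded_distance(a, b, period):
    """Euclidean distance between two encoded values on the unit circle."""
    sin_a, cos_a = cyclic_encode(a, period)
    sin_b, cos_b = cyclic_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hypothetical sample: 18:00 on day 3 of the week in July.
u = exogenous_vector(18, 3, 7, [410.0, 420.0, 430.0, 440.0, 450.0, 460.0])

# Wrap-around steps are as short as any other one-step difference.
hour_wrap = encoded_distance(24, 1, 24)
hour_mid = encoded_distance(12, 13, 24)
month_wrap = encoded_distance(12, 1, 12)
month_mid = encoded_distance(6, 7, 12)
```

With this encoding, the step from hour 24 to hour 1 has the same length as the step from hour 12 to hour 13, which is the property that removes the artificial jump at midnight and at the turn of the year.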

2.3. GARA

To improve the adaptive weight selection of the MMPF-NARX structure, a Genetic Algorithm-based Resource Allocation (GARA) mechanism is added. The Genetic Algorithm works with different candidate weight vectors. It optimizes them by minimizing a fitness function based on forecasting error [43]. Through selection, crossover, and mutation procedures, the population gradually moves toward an optimal combination of NARX submodels, refining the probabilistic weighting provided by the MMPF [44].

The input to the genetic algorithm is the overall weighted output as estimated by the MMPF-NARX.

Each chromosome represents a candidate weight vector:

W = [w_{1}, w_{2}, \dots w_{M}]

(7)

with \sum_{i = 1}^{M} w_{i} = 1 and w_{i} \in [0, 1], where M denotes the number of NARX submodels (7–10 depending on the configuration).

The first chromosome is initialized from the final vector of posterior probabilities as it is estimated by the MMPF:

w^{1} = w^{M M P F - N A R X} = [p (θ_{1} / k) \dots p (θ_{M} / k)]

(8)

with \sum_{i = 1}^{M} w_{i} = 1 and w_{i} \in [0, 1].

The fitness function measuring the forecasting accuracy is given by:

f (W) = \frac{1}{N} \sum_{k = 1}^{N} {|y (k) - \sum_{i = 1}^{M} w_{i} {\tilde{y}}_{i} (k)|}^{2}

(9)

The Genetic Algorithm evolves the population via selection, crossover, and mutation until convergence. The mutation rate is defined as:

μ_{t} = μ_{0} (1 - \frac{t}{t_{m a x}}) + ε

(10)

where,

μ_t = the mutation rate at generation t
μ₀ = the initial mutation rate
t_max = the maximum number of generations
ε = the residual mutation term, ensuring minimal diversity near convergence

In this study, μ₀ is set to 0.1. This corresponds to a moderate mutation rate of 10%, which helps to avoid early convergence and keeps a good level of exploration. The weights assigned to each NARX submodel are optimized adaptively, so t_max was chosen as 40. Finally, ε is set to 0.003 to introduce a small stochastic disturbance in order to maintain diversity near convergence and avoid stagnation when fitness stops improving, without reducing stability.

The optimal weight vector is given by:

W^{*} = \arg \min_{W} f (W)

(11)

and the final optimized forecast is expressed by

\hat{y} (k / k) = \sum_{i = 1}^{M} w_{i}^{*} {\tilde{y}}_{i} (k)

(12)

where w_{i}^{*} represents the optimal weight assigned to the ith NARX submodel.
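A compact sketch of a GA-based weight search of this kind is given below. The fitness function and the decaying mutation rate follow Eqs. (9) and (10). The population size, tournament selection, arithmetic crossover, Gaussian mutation step and elitism are assumptions made for illustration, since the text does not specify these operators. All data are hypothetical.

```python
import random


def fitness(weights, predictions, targets):
    """Eq. (9): mean squared error of the weighted combination.
    predictions[i][k] is the output of submodel i at time k."""
    total = 0.0
    for k, target in enumerate(targets):
        combined = sum(w * p[k] for w, p in zip(weights, predictions))
        total += (target - combined) ** 2
    return total / len(targets)


def normalize(weights):
    """Clip to [0, 1] and rescale so that the weights sum to one."""
    clipped = [min(max(w, 0.0), 1.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def mutation_rate(t, t_max, mu0=0.1, eps=0.003):
    """Eq. (10): linearly decaying mutation rate with a residual term."""
    return mu0 * (1.0 - t / t_max) + eps


def run_ga(predictions, targets, seed_weights, pop_size=20, t_max=40, rng=None):
    rng = rng or random.Random(0)
    m = len(seed_weights)

    def score(w):
        return fitness(w, predictions, targets)

    # First chromosome: the MMPF posterior probabilities (Eq. (8)).
    population = [normalize(seed_weights)]
    population += [normalize([rng.random() for _ in range(m)])
                   for _ in range(pop_size - 1)]
    for t in range(t_max):
        mu = mutation_rate(t, t_max)
        children = [min(population, key=score)]   # elitism (assumption)
        while len(children) < pop_size:
            # Binary tournament selection (assumption).
            a = min(rng.sample(population, 2), key=score)
            b = min(rng.sample(population, 2), key=score)
            alpha = rng.random()                  # arithmetic crossover
            child = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
            child = [w + rng.gauss(0.0, 0.1) if rng.random() < mu else w
                     for w in child]
            children.append(normalize(child))
        population = children
    return min(population, key=score)


# Hypothetical data: three submodels with constant offsets from the target.
targets = [100.0, 110.0, 120.0, 115.0, 105.0]
predictions = [[y + 6.0 for y in targets],
               [y - 4.0 for y in targets],
               [y + 10.0 for y in targets]]
seed = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
best = run_ga(predictions, targets, seed)
```

Because the best chromosome is carried over unchanged in every generation, the returned weight vector can never be worse than the MMPF seed under this sketch.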

2.4. PSO

In order to examine whether swarm-based optimization offers measurable advantages over evolutionary resource allocation, the GARA mechanism is replaced by a Particle Swarm Optimization layer while preserving the same hybrid MMPF–NARX structure.

The input to the Particle Swarm Optimization (PSO) algorithm is the weighted prediction structure produced by the MMPF–NARX framework. Each particle represents a candidate weight vector:

W = [w_{1}, w_{2}, \dots w_{M}]

(13)

with \sum_{i = 1}^{M} w_{i} = 1 and w_{i} \in [0, 1], where M denotes the number of NARX submodels (7–10 depending on the configuration).

In this work, the swarm consists of N_p = 25 particles, which is sufficient given the relatively low dimensionality of the weight vector. To preserve consistency with the probabilistic structure of the MMPF–NARX framework, a hybrid initialization strategy is adopted:

The first particle is initialized using the posterior probability vector estimated by the MMPF:

x_{1}^{0} = w^{M M P F - N A R X} = [p (θ_{1} / k) \dots p (θ_{M} / k)]

(14)

The remaining particles x_{j}^{0} are randomly generated with w_{i} ~ U(0,1) within the admissible region and subsequently normalized to satisfy the unity-sum constraint \sum_{i = 1}^{M} w_{i} = 1, ensuring that each particle represents a valid convex combination.

Initial velocities are randomly assigned within a small symmetric interval, v_{j}^{0} ~ U(-v_max, v_max), where v_max was set to 0.1 to prevent excessive oscillatory behaviour and ensure stable exploration within the bounded weight space.

The forecasting accuracy is evaluated using the same objective function defined for the GA approach, equation (9).

At iteration t, the velocity of the j-th particle is updated as:

v_{j}^{t + 1} = ω v_{j}^{t} + c_{1} r_{1} (p_{j}^{t} - x_{j}^{t}) + c_{2} r_{2} (g^{t} - x_{j}^{t})

(15)

where:

ω = the inertia weight
c₁, c₂ = the acceleration coefficients
r₁, r₂ ~ U(0,1) = uniformly distributed random numbers
p_{j}^{t} = the personal best position of particle j
g^{t} = the global best position of the swarm

For this work, ω = 0.7, c₁ = c₂=1.5, ensuring a balanced exploration and exploitation trade-off.

As such, the position of each particle is updated according to

x_{j}^{t + 1} = x_{j}^{t} + v_{j}^{t + 1}

(16)

After every update, the negative weights are clipped to zero, the weights greater than one are reduced to one, and the vector is renormalized as:

w_{i}^{t + 1} = \frac{w_{i}^{t + 1}}{\sum_{i = 1}^{M} w_{i}^{t + 1}}

(17)

The PSO algorithm terminates when either the maximum number of iterations (t_{max} = 40) is reached or when the improvement in the global best fitness, | f_{best}^{t + 1} - f_{best}^{t} |, becomes less than 10^{- 6}. This tolerance ensures convergence stability without enforcing excessive computational effort.

The final optimal weight vector and the final optimal forecast are given by equations (11) and (12) as for the GARA case.
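The PSO layer described in this section can be sketched as follows, using the stated settings (25 particles, ω = 0.7, c₁ = c₂ = 1.5, v_max = 0.1, 40 iterations, tolerance 10⁻⁶). The function names and the data are hypothetical, and details that the text leaves open, such as updating the global best inside the particle loop, are implementation choices of this sketch.

```python
import random


def mse(weights, predictions, targets):
    """Objective of Eq. (9): mean squared error of the weighted forecast."""
    total = 0.0
    for k, target in enumerate(targets):
        combined = sum(w * p[k] for w, p in zip(weights, predictions))
        total += (target - combined) ** 2
    return total / len(targets)


def project(weights):
    """Clip to [0, 1], then renormalize to unit sum (Eq. (17))."""
    clipped = [min(max(w, 0.0), 1.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def run_pso(predictions, targets, seed_weights, n_particles=25, t_max=40,
            omega=0.7, c1=1.5, c2=1.5, v_max=0.1, tol=1e-6, rng=None):
    rng = rng or random.Random(0)
    m = len(seed_weights)

    def cost(w):
        return mse(w, predictions, targets)

    # First particle: MMPF posterior probabilities; the rest are random.
    positions = [project(seed_weights)]
    positions += [project([rng.random() for _ in range(m)])
                  for _ in range(n_particles - 1)]
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(m)]
                  for _ in range(n_particles)]
    best_pos = [list(p) for p in positions]
    best_cost = [cost(p) for p in positions]
    g = min(range(n_particles), key=best_cost.__getitem__)
    global_pos, global_cost = list(best_pos[g]), best_cost[g]

    for _ in range(t_max):
        previous = global_cost
        for j in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update (Eq. (15)) and position update (Eq. (16)).
            velocities[j] = [
                omega * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
                for v, x, pb, gb in zip(velocities[j], positions[j],
                                        best_pos[j], global_pos)
            ]
            positions[j] = project(
                [x + v for x, v in zip(positions[j], velocities[j])])
            c = cost(positions[j])
            if c < best_cost[j]:
                best_pos[j], best_cost[j] = list(positions[j]), c
                if c < global_cost:
                    global_pos, global_cost = list(positions[j]), c
        if abs(previous - global_cost) < tol:
            break   # improvement below tolerance
    return global_pos, global_cost


# Hypothetical data: three submodels with constant offsets from the target.
targets = [100.0, 110.0, 120.0, 115.0, 105.0]
predictions = [[y + 6.0 for y in targets],
               [y - 4.0 for y in targets],
               [y + 10.0 for y in targets]]
seed = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
weights, error = run_pso(predictions, targets, seed)
```

Note that a literal reading of the stopping rule also ends the search when an iteration brings no improvement at all, which is how the sketch behaves.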

2.5. Data Preparation

The dataset covers the period from January 2020 through December 2025. It is arranged in chronological order to keep the time structure of the load data. The data are split sequentially. Approximately 60% is used to train the MMPF–NARX submodels, 20% is used for validation, and the remaining 20% is reserved for testing and forecast evaluation.

To check how the proposed model behaves when its input contains anomalous measurements, a disturbance was added only to the test subset. Training and validation were kept free of this disturbance, so that the deployed model never learns from noisy, incomplete, or otherwise corrupted data. Correlated Gaussian noise was added in the testing phase to simulate realistic measurement errors and sensor uncertainty. The full data flow of the proposed MMPF-NARX framework, including the transition from weighted model outputs to the GARA or PSO optimization stage, is summarized in Figure 3.

Before the raw load data are fed into the MMPF-NARX framework, a data cleaning process takes place. Real power-grid data are often corrupted or contain missing values due to sensor failures, communication lag, or maintenance. To ensure the accuracy of the forecasting models, we implement a two-step pre-processing strategy.

The traditional 3σ (three-sigma) rule assumes a static mean and standard deviation. However, electric load is non-stationary and follows daily and seasonal patterns. Therefore, we adopt an Improved 3σ Criterion based on an Exponentially Weighted Moving Average (EWMA).

Instead of a fixed global mean, we calculate a local moving average that adapts to the load’s trend. If a measurement z (k) satisfies the following condition, it is flagged as an outlier:

| z (k) - μ_{local} (k) | > 3 σ_{local} (k)

(18)

where μ_{local} and σ_{local} represent the adaptive mean and standard deviation at time k. As such, natural peak loads (midday peaks, for example) are not treated as errors, while technical faults or spikes are removed [45].
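An EWMA-based three-sigma check of this kind can be sketched as follows. The smoothing factor `alpha`, the warm-up length and the recursive variance formula are assumptions chosen for illustration; the paper does not state the values it uses, and the sample series is hypothetical.

```python
import math


def ewma_outliers(series, alpha=0.2, warmup=5):
    """Flag z(k) when |z(k) - mu_local(k)| > 3 * sigma_local(k).

    mu_local and sigma_local are exponentially weighted estimates built
    from the points accepted so far. Flagged points do not update the
    statistics, so a spike cannot inflate the local standard deviation.
    """
    mean = series[0]
    variance = 0.0
    flags = [False]
    for k in range(1, len(series)):
        deviation = series[k] - mean
        sigma = math.sqrt(variance)
        is_outlier = (k >= warmup and sigma > 0.0
                      and abs(deviation) > 3.0 * sigma)
        flags.append(is_outlier)
        if not is_outlier:
            # Standard incremental EWMA updates of mean and variance.
            mean += alpha * deviation
            variance = (1.0 - alpha) * (variance + alpha * deviation ** 2)
    return flags


# Hypothetical hourly load with one transmission spike at index 8.
load = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0,
        250.0, 100.0, 99.0]
flags = ewma_outliers(load)
```

In this example only the spike at index 8 is flagged, while the ordinary fluctuations of one or two units stay inside the adaptive band.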

When outliers are removed or missing intervals are found, gaps appear in the time series that need to be filled. We use Cubic Spline Interpolation for this purpose. The cubic spline connects the data points using third-order polynomials, which guarantees the continuity of both the first and second derivatives of the load curve. The continuity of the signal is very important for the NARX subfilters, as it allows the internal Kalman filters to keep updating the subfilter state smoothly and thus prevents the prediction error from exploding due to discontinuities in the artificially filled data [46,47].

The forecasting horizon is medium-range, i.e., two months. Three periods were selected: January–February (Period 1), April–May (Period 2), and August–September (Period 3). These periods exhibit different seasonal patterns. They also allow a transparent assessment under medium-term operational conditions.

To test the model’s robustness two levels of correlated noise were selected:

Moderate correlated noise: correlation coefficient $ρ = 0.95$ , noise ratio = 0.25
Heavy correlated noise: correlation coefficient $ρ = 0.98$ , noise ratio = 0.50

Each of these scenarios simulates data distortion at a different level. Together they show how performance changes as the conditions become more challenging.
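One possible way to generate such a disturbance is a first-order autoregressive Gaussian process, sketched below. This is an assumption, since the text does not specify how the correlated noise is produced; in particular, the noise ratio is interpreted here as the ratio of the noise standard deviation to the standard deviation of the load. All names and data are hypothetical.

```python
import math
import random
import statistics


def correlated_noise(n, rho, rng):
    """Unit-variance AR(1) Gaussian noise with lag-one correlation rho."""
    noise = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho ** 2)   # keeps the variance equal to one
    for _ in range(n - 1):
        noise.append(rho * noise[-1] + scale * rng.gauss(0.0, 1.0))
    return noise


def add_correlated_noise(load, rho, noise_ratio, seed=0):
    """Add AR(1) noise scaled to noise_ratio times the load's std (assumed)."""
    rng = random.Random(seed)
    sigma = noise_ratio * statistics.pstdev(load)
    noise = correlated_noise(len(load), rho, rng)
    return [y + sigma * e for y, e in zip(load, noise)]


def lag1_autocorrelation(x):
    """Sample lag-one autocorrelation, used only to check the generator."""
    mean = sum(x) / len(x)
    num = sum((x[k] - mean) * (x[k + 1] - mean) for k in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Hypothetical daily-shaped load, repeated for ten days.
load = [400.0 + 80.0 * math.sin(2.0 * math.pi * h / 24.0) for h in range(240)]
moderate = add_correlated_noise(load, rho=0.95, noise_ratio=0.25)
heavy = add_correlated_noise(load, rho=0.98, noise_ratio=0.50)

# Sanity check of the correlation on a long realization.
r = lag1_autocorrelation(correlated_noise(20000, 0.95, random.Random(1)))
```

Under this interpretation, the heavy scenario differs from the moderate one both in amplitude (twice the noise ratio) and in persistence (a higher correlation coefficient, so disturbances decay more slowly).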

The proposed framework’s arrangement was also tested by altering the number of parallel NARX submodels. Configurations consisting of 7, 8, 9, and 10 submodels were implemented and evaluated under each noise level. This comparison shows how the number of submodels, i.e., the proposed method’s complexity, affects forecasting accuracy, robustness characteristics, and error behavior when measurement disturbances are present.

To assess the accuracy of the predictions, the Mean Absolute Percentage Error (MAPE) is calculated as follows:

MAPE = \frac{100 %}{n} \sum_{i = 1}^{n} |\frac{A_{i} - P_{i}}{A_{i}}|

(19)

where A_i is the actual value, P_i is the predicted value, and n is the total number of samples.
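As a small worked example, Eq. (19) can be computed as follows. The function name and the sample values are illustrative only.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, as in Eq. (19)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Two samples, each with a 2% absolute error.
example = mape([100.0, 200.0], [98.0, 204.0])
```

Both samples deviate by 2% of the actual value, so the resulting MAPE is 2%. Since the actual load appears in the denominator, the metric is only meaningful when the load never reaches zero.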

The percentage Training MAPE per iteration is used to examine the convergence behavior of the optimization algorithms.

3. Results

As mentioned above, noise was injected into the real data. Figure 4 and Figure 5 depict the effect of moderate and heavy noise on the data series. Similarly, Figure 6 and Figure 7 present the forecasting performance of the PSO-based model under moderate and heavy noise, while Figure 8 and Figure 9 present the corresponding forecasting performance of the GARA-based model under the same noise conditions. Moreover, Figure 10, Figure 11, and Figure 12 show the PSO error scatter plots for Period 1, Period 2, and Period 3 under moderate noise, while Figure 13, Figure 14, and Figure 15 show the same PSO error behavior under heavy noise. Furthermore, Figure 16, Figure 17, and Figure 18 present the GARA error scatter plots for the three periods under moderate noise, while Figure 19, Figure 20, and Figure 21 present the corresponding GARA error scatter plots under heavy noise. Lastly, Figure 22, Figure 23, and Figure 24 compare the fitness training error per iteration of both methods under moderate noise, while Figure 25, Figure 26, and Figure 27 compare their fitness training error per iteration under heavy noise.

The two methods were tested for day-ahead prediction over a time frame of two months. Three different forecasting periods were selected. Period 1 corresponds to June–July 2025, Period 2 to April–May 2025, and Period 3 to August–September 2025.

4. Discussion

4.1. Comparative Evaluation of Optimization Strategies

The experimental results presented in Table 1 show that both GARA and PSO give stable and reliable performance inside the hybrid MMPF-NARX model. In all test cases, both methods achieved low MAPE values, which supports their suitability for medium-term load forecasting under complex commercial grid conditions.

However, a closer examination of the metrics reveals that the proposed PSO method maintains a consistently better performance in all configurations. Specifically:

Depending on the noise level and the number of submodels, the relative improvement in % MAPE of PSO over GARA ranges from 2.4% to 14.8%.
In the “Medium Noise” optimal configuration (i.e., in Table 1 for N=9), PSO reached a relative improvement of 6.0%, i.e., a MAPE of 2.02% for PSO compared to 2.15% for GARA, as indicated in Table 1.
Similarly, in the “Heavy Noise” optimal configuration (i.e., in Table 1 for N=9), PSO achieved a 2.4% relative improvement in forecasting accuracy.
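The 6.0% figure quoted for N = 9 can be checked from the two MAPE values, assuming that it denotes the relative reduction in MAPE of PSO with respect to GARA. The helper name below is hypothetical.

```python
def relative_improvement(baseline_mape, improved_mape):
    """Relative reduction of the MAPE, in percent of the baseline."""
    return 100.0 * (baseline_mape - improved_mape) / baseline_mape


# MAPE values quoted in the text for N = 9: GARA 2.15%, PSO 2.02%.
gain = relative_improvement(2.15, 2.02)
gain_rounded = round(gain, 1)
```

Under this reading, (2.15 - 2.02) / 2.15 is approximately 0.0605, which matches the reported 6.0% after rounding.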

In conclusion, our findings indicate that PSO provides a more efficient mechanism for fine-tuning the submodel weights. This is particularly important when higher noise levels occur. GARA nevertheless remains a robust resource allocation method under noise.

4.2. Non-Monotonic Relationship Between Filter Complexity and Accuracy

An important finding of our study is that the relationship between the number of parallel NARX subfilters (N) and the forecasting accuracy is not monotonic. As presented in Table 1, increased complexity is associated with neither a sudden shift nor a continuous reduction in error. On the contrary, Table 2 suggests that hybrid models can exhibit three different types of behavior.

As evident from Table 2, when the number of NARX subfilters exceeds the optimal value (in this case, N = 9), the method tends to over-parameterize or to start modeling noise outliers as signal. This leads to reduced forecasting performance, because noise passes through the system into the forecast.

4.3. Robustness and Practical Considerations

In terms of robustness and practical considerations, the MMPF-NARX framework performed well across the forecasting periods and noise levels tested. The performance gap between PSO and GARA under heavy noise conditions suggests that a swarm-based optimizer is the more resilient choice. This could be beneficial for critical operations in energy environments characterized by a high degree of data uncertainty. Lastly, the smoother convergence curves of PSO, together with its better accuracy, indicate that the proposed system can serve as a rapid prototype or an efficient alternative in real-time applications that require high-precision forecasts despite sensor-related uncertainties.

4.4. Error Scatter Analysis and Symmetric Distribution Characteristics

The error scatter plots (Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, and Figure 21) provide a deeper insight into the nature of the residuals produced by each optimization method. The differences observed are the following.

PSO precision and concentration:
- The PSO error scatter plots (Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, and Figure 15) are tightly grouped.
- Their symmetric spread is smaller than that of GARA.
- PSO particles tend to move (“swarm”) quickly toward the global best solution.
- This happens because the particles follow their own experience as well as the group’s.
- This swarm movement keeps forecasts close to each other and reduces variation in errors.
GARA balance and diversity:
- The GARA error scatter plots (Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, and Figure 21) show a wider symmetric spread (greater variance).
- They appear more symmetrically balanced about the horizontal axis than the PSO ones.
- The mutation process creates diversity in the Genetic Algorithm and prevents its collapse into one solution.
- As a result, the errors are more uniformly and symmetrically spread.
- Generally, they have larger absolute values than the respective PSO ones, although over time they are less systematically biased.
- This property can help avoid cumulative bias over longer-term operational horizons.

The differences mentioned above can be explained by the bias-variance tradeoff [48,49,50]. Because PSO concentrates more on exploitation, its errors are smaller and its forecasts more accurate [51]. On the other hand, GARA focuses on exploration and population diversity via crossover and mutation. As a result, its forecasts have more balanced errors but are not as accurate as those of PSO.

For mission-critical applications, this difference is essential [52]. The tighter PSO error scatter is more appropriate for problems that require the minimization of the absolute error [53]. Nonetheless, the balanced nature of GARA indicates that it may drift less, as its errors largely neutralize each other over time. Both methods are useful, but in different circumstances [54]. PSO provides high instantaneous accuracy, while GARA delivers less biased and robust performance over time.

4.5. Optimization Dynamics and Convergence Behavior

Another important factor is the convergence behavior of the GARA and PSO algorithms, as presented in the “Fitness Training Error per Iteration” curves of Figure 22, Figure 23, Figure 24, Figure 25, Figure 26, and Figure 27. These curves provide insight into the internal dynamics of each optimization strategy. They show that the error initially drops quickly and that the final MAPE values of the two methods are almost identical by the 40th iteration [55]. These results confirm that both approaches are capable of identifying a suitable weight distribution for the hybrid MMPF-NARX framework.

Moreover, because both methods share an initialization based on the MMPF posterior probabilities, they start from a similar error baseline. However, their convergence paths differ distinctly during the intermediate iterations.

PSO Smoothness and Stability
- Smoother and more stable convergence curves toward the global minimum.
- The error decreases more consistently.
- PSO updates particles using velocity and momentum [56], as described earlier.
- Each particle moves toward its own best position and the global best position, so the swarm moves smoothly.
- This prevents large fluctuations in the error curve.

GARA Oscillatory Behavior
- Between the 10th and the 30th generation, the convergence path is more oscillatory.
- The peaks indicate that the error occasionally rises sharply and then falls again.
- This behavior reflects the stochastic nature of the Genetic Algorithm [57].
- The mutation operator maintains diversity and prevents the population from settling in any one region for long, which helps explore better solutions in the search space. It also produces spikes in the training error.

The distinction between the convergence curves reflects the classic exploration-exploitation trade-off [57]. GARA’s spikes indicate aggressive exploration, which sacrifices current accuracy in favor of searching a wider region of the parameter space. Conversely, PSO exploits the available information more steadily, which produces smoother behavior [58,59,60].
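
The contrast between the two update mechanisms can be sketched in a few lines. This is a hedged illustration only: the paper's own velocity rule (Eq. 15) is not reproduced here, so the code assumes the standard inertia-weight PSO update and a simple Gaussian mutation, and the coefficients `inertia`, `c1`, `c2`, `rate`, and `scale` are hypothetical values rather than the settings used in the experiments.

```python
import numpy as np

def pso_step(positions, velocities, personal_best, global_best, rng,
             inertia=0.7, c1=1.5, c2=1.5):
    """One standard inertia-weight PSO update (illustrative coefficients)."""
    r1 = rng.random(positions.shape)
    r2 = rng.random(positions.shape)
    velocities = (inertia * velocities
                  + c1 * r1 * (personal_best - positions)   # pull toward own best
                  + c2 * r2 * (global_best - positions))    # pull toward swarm best
    return positions + velocities, velocities

def ga_mutate(population, rng, rate=0.1, scale=0.2):
    """Gaussian mutation applied to a random subset of genes."""
    mask = rng.random(population.shape) < rate
    return population + mask * rng.normal(0.0, scale, population.shape)

rng = np.random.default_rng(0)
swarm = rng.random((20, 9))      # 20 candidates, 9 submodel weights (assumed sizes)
vel = np.zeros_like(swarm)

# Treat the first candidate as the current global best for this illustration.
new_swarm, new_vel = pso_step(swarm, vel, swarm.copy(), swarm[0], rng)
mutated = ga_mutate(swarm, rng)
```

With zero initial velocity, every PSO candidate moves along the direction of the global best, which gives a directed and therefore smoother trajectory. The mutation step instead perturbs randomly chosen genes without reference to any best solution, which is one plausible source of the spikes seen in the GARA curves.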

4.6. Computational Burden

The computational burden was assessed by observing the error trajectories in Figure 22, Figure 23, Figure 24, Figure 25, Figure 26, and Figure 27. Although both algorithms were executed for 30 iterations, the effective convergence (i.e., the point where $\Delta \mathrm{Fitness} < 10^{-4}$) occurred significantly earlier for PSO. Mathematically, this is due to the velocity update rule (Eq. 15), which acts as a first-order momentum filter, accelerating the search toward the global optimum. In contrast, GARA’s computational burden is increased by the Selection Operator, which requires sorting the entire population $P$ in every generation, adding an $O(P \log P)$ overhead that becomes significant in real-time forecasting cycles. These computational differences are summarized in Table 3, which compares GARA and PSO in terms of complexity per iteration, convergence speed, FLOPs, memory overhead, and stability.
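
The effective-convergence criterion can be checked mechanically on a recorded fitness curve. The sketch below is one possible reading of the criterion: it assumes that convergence is declared once the absolute change in fitness stays below the tolerance for a few consecutive iterations. The `patience` parameter and the two example curves are hypothetical and are not taken from the figures.

```python
import numpy as np

def effective_convergence(fitness_history, tol=1e-4, patience=3):
    """Return the 1-based iteration at which |delta fitness| < tol starts to hold
    for `patience` consecutive steps, or None if that never happens."""
    history = np.asarray(fitness_history, dtype=float)
    deltas = np.abs(np.diff(history))
    run = 0
    for i, delta in enumerate(deltas):
        run = run + 1 if delta < tol else 0
        if run == patience:
            # The flat run began `patience - 1` steps earlier.
            return i - patience + 2
    return None

# Hypothetical curves: one settles directly, one spikes before settling.
smooth_curve = [5.0, 3.0, 2.5, 2.5, 2.5, 2.5]
spiky_curve = [5.0, 3.0, 2.5, 2.5, 3.2, 2.5, 2.5, 2.5, 2.5]

smooth_iter = effective_convergence(smooth_curve)
spiky_iter = effective_convergence(spiky_curve)
print(smooth_iter, spiky_iter)
```

Requiring several consecutive small changes avoids declaring convergence on a brief plateau that is later interrupted by a spike, which matters for oscillatory curves.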

For GARA, the cost per iteration is higher due to sorting and stochastic sampling:

$$C_{GARA} \approx N_{pop} \times (C_{eval} + C_{select} + C_{cross} + C_{mut}) \tag{20}$$

where $C_{select}$ involves sorting the population, typically:

$$O(P \log P) \tag{21}$$

For PSO, the cost is lower because it only involves vector arithmetic:

$$C_{PSO} \approx N_{pop} \times (C_{eval} + C_{vel\_update} + C_{pos\_update}) \tag{22}$$

where the velocity and position updates are simple $O(1)$ vector additions.
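
As a short worked step, Eqs. (20) and (22) can be compared directly. The sketch below assumes that both optimizers use the same population size $N_{pop}$ and the same per-candidate evaluation cost $C_{eval}$; neither assumption is stated explicitly for the cost model, so the result is indicative only.

```latex
\begin{align*}
C_{GARA} - C_{PSO} &\approx N_{pop}\left[(C_{select} + C_{cross} + C_{mut}) - (C_{vel\_update} + C_{pos\_update})\right] \\
\frac{C_{GARA}}{C_{PSO}} &\approx \frac{C_{eval} + C_{select} + C_{cross} + C_{mut}}{C_{eval} + C_{vel\_update} + C_{pos\_update}}
\end{align*}
```

Under these assumptions the evaluation cost cancels in the difference, so the per-iteration gap depends only on the operator costs. The ratio, in turn, approaches one when $C_{eval}$ dominates, which suggests that the practical advantage of PSO would then come mainly from needing fewer iterations to converge.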

The fact that both methods reach almost the same final training error supports the robustness of the MMPF-NARX structure. Because of its lower computational load and smoother behavior, PSO may be preferable for real-time energy management systems. GARA, on the other hand, remains effective for the long-term allocation of resources.

5. Conclusions

This article presents a framework for simulating and modelling power load forecasting under uncertain conditions with symmetric, noisy data. A Multi-Model Partitioning Filter is combined with NARX submodels, with GARA and PSO used as adaptive optimisation strategies to determine the final contribution weights of the parallel predictors.
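
The role of the contribution weights can be illustrated with a small sketch. It assumes that the final forecast is a weighted sum of the parallel submodel forecasts and that the weights are non-negative and normalised to sum to one, in the spirit of posterior probabilities. The function name `combine_forecasts` and the numbers are hypothetical.

```python
import numpy as np

def combine_forecasts(submodel_forecasts, weights):
    """Weighted sum of parallel submodel forecasts.

    Weights are clipped to be non-negative and normalised to sum to one
    (an assumption made for this sketch).
    """
    forecasts = np.asarray(submodel_forecasts, dtype=float)  # shape (n_models, n_steps)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    if w.sum() == 0:
        raise ValueError("at least one weight must be positive")
    w = w / w.sum()
    return w @ forecasts

# Three hypothetical submodels forecasting two time steps.
forecasts = [[100.0, 110.0],
             [104.0, 114.0],
             [96.0, 106.0]]
combined = combine_forecasts(forecasts, weights=[2.0, 1.0, 1.0])
print(combined)
```

In this setting an optimiser such as GARA or PSO would search over the weight vector, and the combined forecast is what the fitness function evaluates.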

Our work aligns with simulation and modelling in engineering, as it evaluates the forecasting framework under perturbation experiments. To approximate realistic measurement uncertainty, sensor error, and imperfect grid conditions, the test data were corrupted with correlated Gaussian noise. Cyclic temporal variables such as hour and month were encoded using unit-circle representation, and forecast residual symmetry and asymmetry were analysed through error scatter plots.
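
Both ideas in this paragraph can be sketched briefly. The code below assumes a sine and cosine pair for the unit-circle encoding and a first-order autoregressive process for the correlated Gaussian noise. Interpreting the noise ratio as a fraction of the standard deviation of the load is an assumption of this sketch, as are the function names and the toy load profile.

```python
import numpy as np

def encode_cyclic(values, period):
    """Map a cyclic variable onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def correlated_gaussian_noise(n, rho, sigma, rng):
    """AR(1) Gaussian noise with lag-1 correlation rho and stationary std sigma."""
    innovations = rng.normal(0.0, sigma * np.sqrt(1.0 - rho**2), n)
    noise = np.empty(n)
    noise[0] = rng.normal(0.0, sigma)
    for t in range(1, n):
        noise[t] = rho * noise[t - 1] + innovations[t]
    return noise

# Hours 0..23; months 1..12 would use encode_cyclic(month - 1, period=12).
hour_sin, hour_cos = encode_cyclic(np.arange(24), period=24)

# On the circle, hour 23 is as close to hour 0 as hour 1 is.
gap_23_0 = np.hypot(hour_sin[23] - hour_sin[0], hour_cos[23] - hour_cos[0])
gap_0_1 = np.hypot(hour_sin[1] - hour_sin[0], hour_cos[1] - hour_cos[0])

rng = np.random.default_rng(42)
load = 100.0 + 10.0 * hour_sin   # hypothetical daily load profile
noise = correlated_gaussian_noise(5000, rho=0.95, sigma=0.25 * load.std(), rng=rng)
lag1 = np.corrcoef(noise[:-1], noise[1:])[0, 1]
```

The encoding removes the artificial jump between the last and first value of a cycle, and the autoregressive form keeps consecutive noise samples correlated instead of independent.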

Results show that both GARA and PSO improve the robustness of the MMPF–NARX framework. PSO performed better overall across most configurations, achieving lower MAPE, smoother convergence curves, and lower computational cost, making it well suited to real-time or mission-critical energy applications. GARA remains valuable as a strong evolutionary method, particularly where broader exploration of the weight space is beneficial.

Another interesting finding is that forecasting accuracy does not improve simply by increasing the number of NARX submodels. Nine submodels produced the best performance, and adding more increased model complexity without meaningful benefit. In some cases, additional submodels actually generalised worse under strong noise. This suggests the framework must balance model diversity against over-parameterisation.

In summary, symmetry-aware modelling, adaptive optimisation, and noise-based simulation provide a reliable foundation for electric load forecasting in modern power-system engineering. Future work should test the framework on larger datasets, demand profiles incorporating renewable energy, and real-time smart grid deployments, and should examine more explicitly the relationship between residual symmetry, forecasting bias, and long-term operational stability.

Author Contributions

All authors have read and agreed to the published version of the manuscript and contributed equally in all stages of this manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available upon request.

Acknowledgments

None.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

This paper proposes a forecasting scheme that combines the MMPF and NARX models, as summarized in Figure A1 below:

Figure A1. Hybrid MMPF–NARX forecasting architecture with GARA or PSO optimization.

Lastly, Table A1 presents a comparative analysis of recent studies, indicating for each the approach, its main concept, its strengths, and its limitations.

Table A1. Comparison based on approaches, main contribution, strengths, and limitations of recent research.

Ref	Approach	Main Concept	Strength	Limitation / Gap
[1,2]	ARIMA / ARIMA + PSO	Linear time-series forecasting with parameter tuning	Clear structure, easy implementation	Limited performance under nonlinear demand
[8]	Random Forests, Gradient Boosting, RNN	Machine learning for nonlinear load patterns	Superior to classical ARIMA	Higher computational cost
[9]	Ensemble probabilistic forecasting	A combination of multiple forecasts	Improved scalability and precision	Model management complexity
[11]	Quantile Regression + KDE	Full probability distribution modeling	Better risk evaluation	Requires reliable density estimation
[14]	Hybrid with economic indicators	Use of early warning and industrial metrics	Improved performance near turning points	Depends on external data quality
[16]	Conditional hidden semi-Markov models	State-duration-based load modeling	Captures dynamic operating states	Sensitive to preprocessing
[18]	Gaussian filtering of smart meter data	Noise reduction before training	Cleaner input data	Additional preprocessing step
[19,20,21]	PSO + ANN	Swarm optimization for neural networks	Higher short-term accuracy	Possible local stagnation
[21,22]	GA-based MMPF weight optimization	Evolutionary adaptive filtering	Strong global search	Oscillatory convergence behavior
[22,23,24]	PSO + SVM	Swarm-based SVM tuning	Improved generalization	Increased training time
[25]	PSO + Grey / Wavelet models	Hybrid nonlinear forecasting	Better nonlinear handling	Layered model complexity
[26]	PSO for long-term forecasting	Swarm tuning for extended horizon	Stable convergence	Parameter sensitivity
[27]	PSO in power system parameter estimation	Optimization of system cost functions	Efficient high-dimensional search	Initialization sensitivity
[28]	PSO theoretical analysis	Exploration-exploitation balance	Simple structure	Trade-off not fully eliminated
[38,39]	Bias-variance analysis	Theoretical explanation of trade-off	Conceptual clarity	Does not prescribe the optimal method
[61]	ANN + Regression + SVR (Fuzzy logic)	A hybrid combination of statistical and AI predictors	Better than standalone models	Increased model complexity
[62]	Hybrid with intelligent optimization	Optimization for weight tuning and feature selection	Improved accuracy	Depends on optimizer efficiency

References

Marrero, L.; Garcia-Santander, L.; Carrizo, D.; Ulloa, F. An application of load forecasting based on ARIMA models and particle swarm optimization. In Proceedings of the 2019 11th International Symposium on Advanced Topics in Electrical Engineering (ATEE), 2019. [Google Scholar] [CrossRef]
Xiao, L.; Xiao, L. Combined modeling for electrical load forecasting with particle swarm optimization. In Proceedings of the 2014 IEEE Workshop on Electronics, Computer and Applications, 2014; pp. 395–400. [Google Scholar] [CrossRef]
Shiblee, M.F.H.; Koukaras, P. Short-term load forecasting in the Greek power distribution system: A comparative study of gradient boosting and deep learning models. Energies 2025, 18, 5060. [Google Scholar] [CrossRef]
Gong, J.; Qu, Z.; Zhu, Z.; Xu, H.; Yang, Q. Ensemble models of TCN-LSTM-LightGBM based on ensemble learning methods for short-term electrical load forecasting. Energy 2025, 318, 134757. [Google Scholar] [CrossRef]
Dakheel, F.; Çevik, M. Optimizing smart grid load forecasting via a hybrid long short-term memory-XGBoost framework: Enhancing accuracy, robustness, and energy management. Energies 2025, 18, 2842. [Google Scholar] [CrossRef]
Li, K.; Yuan, L.; Qian, F.; Song, L.; Wu, X.; Wang, L.; Dai, J.; Shen, L. Short-Term Load Forecasting for Electricity Spot Markets Across Different Seasons Based on a Hybrid VMD-LSTM-Random Forest Model. Energies 2025, 18, 6097. [Google Scholar] [CrossRef]
Rani, S.; Ahmed, U.; Mahmood, A.; Manzoor, S.; Razzaq, S.; Jamil, U.; Zia, M.F.; Benbouzid, M. Dual Attention-Empowered Bidirectional Long Short-Term Memory Network for Short-Term Electric Load Forecasting. Energy Technol. 2026, 14, e202500914. [Google Scholar] [CrossRef]
Prashanthi, P.; Priyadarsini, K. A comparative study of the performance of machine learning based load forecasting methods. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), 2021; pp. 132–136. [Google Scholar] [CrossRef]
Landgraf, A.J. An ensemble approach to GEFCom2017 probabilistic load forecasting. Int. J. Forecast. 2019, 35, 1432–1438. [Google Scholar] [CrossRef]
Gupta, T.; Bhatia, R.; Sharma, S. An ensemble-based enhanced short and medium term load forecasting using optimized missing value imputation. Sci. Rep. 2025, 15, 21857. [Google Scholar] [CrossRef]
Wang, S.; Wang, S.; Wang, D. Combined probability density model for medium term load forecasting based on quantile regression and kernel density estimation. Energy Procedia 2019, 158, 6446–6451. [Google Scholar] [CrossRef]
Liu, Y.; Mei, F.; Zhang, J.; Dai, X.; Li, W. Power Load Probabilistic Prediction Based on Multi-Value Quantile Regression and Timing Fusion Ensemble Learning Model. Entropy 2026, 28, 329. [Google Scholar] [CrossRef] [PubMed]
Shah, S.A.H.; Ahmed, U.; Bilal, M.; Khan, A.R.; Razzaq, S.; Aziz, I.; Mahmood, A. Improved electric load forecasting using quantile long short-term memory network with dual attention mechanism. Energy Rep. 2025, 13, 2343–2353. [Google Scholar] [CrossRef]
Wang, Y.; Wang, Z.; Li, X.; Wang, Z.; Liu, R.; Tong, X. Medium and long-term load intelligent forecasting method based on the comprehensive and multiple early warning indicators. J. Phys. Conf. Ser. 2020, 1449, 012124. [Google Scholar] [CrossRef]
Wang, X.; Wang, Z.; Yang, K.; Song, Z.; Bian, C.; Feng, J.; Deng, C. A survey on deep learning for cellular traffic prediction. Intell. Comput. 2024, 3, 0054. [Google Scholar] [CrossRef]
Ji, Y.; Buechler, E.; Rajagopal, R. Data-driven load modeling and forecasting of residential appliances. In Proceedings of the 2021 IEEE Power & Energy Society General Meeting (PESGM), 2021; pp. 1–1. [Google Scholar] [CrossRef]
Wu, Z.; Wang, C.; Wu, J.; Wang, X.; Li, M.; Dong, Y.; Zhu, H.; Zhang, H. Dynamic adaptive modeling for non-intrusive load monitoring with unknown loads. Energy Build. 2025, 329, 115246. [Google Scholar] [CrossRef]
Rai, S.; De, M. Analysis of classical and machine learning based short-term and mid-term load forecasting for smart grid. Int. J. Sustain. Energy 2021, 40, 821–839. [Google Scholar] [CrossRef]
Abdullah, A.G.; Suranegara, G.M.; Hakim, D.L. Hybrid PSO-ANN Application for Improved Accuracy of Short Term Load Forecasting. WSEAS Trans. POWER Syst. 2014, 9, 446–451. Available online: https://wseas.com/journals/articles.php?id=5330 (accessed on 1 June 2026).
Shafiei Chafi, Z.; Afrakhte, H. Short-term load forecasting using neural network and particle swarm optimization (PSO) algorithm. Math. Probl. Eng. 2021, 2021, 1–10. [Google Scholar] [CrossRef]
Ozerdem, O.C.; Olaniyi, E.O.; Oyedotun, O.K. Short term load forecasting using particle swarm optimization neural network. Procedia Comput. Sci. 2017, 120, 382–393. [Google Scholar] [CrossRef]
Jiang, D. Study on short-term load forecasting method based on the PSO and SVM model. Int. J. Control Autom. 2015, 8, 181–188. [Google Scholar] [CrossRef]
Ji, G.; Li, S.; Shi, Z.; Zhang, X.; Zhao, W. Regional power load forecasting based on PSOSVM. In Proceedings of the 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC), 2018; pp. 1685–1688. [Google Scholar] [CrossRef]
Ren, G.; Wen, S.; Yan, Z.; Hu, R.; Zeng, Z.; Cao, Y. Power load forecasting based on support vector machine and particle swarm optimization. In Proceedings of the 2016 12th World Congress on Intelligent Control and Automation (WCICA), 2016; pp. 2003–2008. [Google Scholar] [CrossRef]
Bahrami, S.; Hooshmand, R.; Parastegari, M. Short term electric load forecasting by wavelet transform and grey model improved by PSO (particle swarm optimization) algorithm. Energy 2014, 72, 434–442. [Google Scholar] [CrossRef]
AlRashidi, M.; EL-Naggar, K. Long term electric load forecasting based on particle swarm optimization. Appl. Energy 2010, 87, 320–326. [Google Scholar] [CrossRef]
Alrashidi, M.R.; El-Naggar, K.M.; Al-Othman, A.K. Particle swarm optimization based approach for estimating the fuel-cost function parameters of thermal power plants with valve loading effects. Electr. Power Compon. Syst. 2009, 37, 1219–1230. [Google Scholar] [CrossRef]
Coello Coello, C.A.; Reyes-Sierra, M. Multi-objective particle swarm optimizers: A survey of the state-of-the-Art. Int. J. Comput. Intell. Res. 2006, 2. Available online: https://www.researchgate.net/publication/216301306_Multi-Objective_Particle_Swarm_Optimizers_A_Survey_of_the_State-of-the-Art (accessed on 1 June 2026). [CrossRef]
Beligiannis, G.; Skarlas, L.; Likothanassis, S. A generic applied evolutionary hybrid technique. IEEE Signal Process. Mag. 2004, 23, 28–38. [Google Scholar] [CrossRef]
Katsikas, S.; Likothanassis, S.; Beligiannis, G.; Berkeris, K.; Fotakis, D. Genetically determined variable structure multiple model estimation. IEEE Trans. Signal Process. 2001, 49, 2253–2261. [Google Scholar] [CrossRef]
Mustaffa, Z.; Sulaiman, M.H. Advanced forecasting of building energy loads with XGBoost and metaheuristic algorithms integration. Energy Storage Sav. 2025. [Google Scholar] [CrossRef]
Camacho-Villalón, C.L.; Stützle, T.; Dorigo, M. Designing new metaheuristics: manual versus automatic approaches. Intell. Comput. 2023, 2, 0048. [Google Scholar] [CrossRef]
Parvathareddy, S.; Yahya, A.; Amuhaya, L.; Samikannu, R.; Suglo, R.S. A hybrid machine learning and optimization framework for energy forecasting and management. Results Eng. 2025, 26, 105425. [Google Scholar] [CrossRef]
Saber, M.; Ouahi, M.; Motahhir, S.; El Akchioui, N. Neural Network-Aided Linear Parameter-Varying Modeling and Finite-Time Control for Autonomous Driving. Intell. Comput. 2026, 5, 0227. [Google Scholar] [CrossRef]
Hammam, I.M.; El-Kharbotly, A.K.; Sadek, Y.M. Adaptive demand forecasting framework with weighted ensemble of regression and machine learning models along life cycle variability. Sci. Rep. 2025, 15, 38482. [Google Scholar] [CrossRef] [PubMed]
Avinash, G.; Mishra, S. Bayesian model averaging based deep learning forecasts of inpatient bed occupancy in mental health facilities. Sci. Rep. 2025, 15, 38294. [Google Scholar] [CrossRef] [PubMed]
Pappas, S.; Ekonomou, L.; Karampelas, P.; Katsikas, S.; Liatsis, P. Modeling of the grounding resistance variation using ARMA models. Simul. Model. Pract. Theory 2008, 16, 560–570. [Google Scholar] [CrossRef]
Pappas, S.; Ekonomou, L.; Karamousantas, D.; Chatzarakis, G.; Katsikas, S.; Liatsis, P. Electricity demand loads modeling using autoregressive moving average (ARMA) models. Energy 2008, 33, 1353–1360. [Google Scholar] [CrossRef]
Pappas, S.S.; Karamousantas, D.C.; Chatzarakis, G.E.; Pappas, C.S. A New Hybrid Forecasting Strategy Applied to Mean Hourly Wind Speed Time Series. J. Wind Energy 2014, 2014, 683939. [Google Scholar] [CrossRef]
Pappas, S.S.; Ekonomou, L.; Moussas, V.C.; Karampelas, P.; Katsikas, S.K. Adaptive load forecasting of the Hellenic electric grid. J. Zhejiang Univ.-Sci. A 2008, 9, 1724–1730. [Google Scholar] [CrossRef]
Pappas, S.S.; Adam, S. Prediction of the long-term electrical energy consumption in Greece using adaptive algorithms. WSEAS Trans. Power Syst. 2018, 13, 291–299. [Google Scholar]
Pappas, S. Application and comparison of evolutionary techniques for forecasting the Hellenic grid electricity load. Int. J. Power Energy Res. 2017, 1, 139–149. [Google Scholar] [CrossRef]
Sakib, M.; Siddiqui, T.; Mustajab, S.; Alotaibi, R.M.; Alshareef, N.M.; Khan, M.Z. An ensemble deep learning framework for energy demand forecasting using genetic algorithm-based feature selection. PLoS ONE 2025, 20, e0310465. [Google Scholar] [CrossRef] [PubMed]
Tsoulos, I.G.; Charilogis, V.; Tsalikakis, D. Introducing a new genetic operator based on differential evolution for the effective training of neural networks. Computers 2025, 14, 125. [Google Scholar] [CrossRef]
Montgomery, D.C. Introduction to statistical quality control; John Wiley & Sons, 2019; Available online: https://www.wiley.com/en-us/shop/general-introductory-industrial-engineering/introduction-to-statistical-quality-control-8th-edition-p-9781119399308 (accessed on 1 June 2026).
Zhang, Z.; Xu, W.; Gong, Q. Short-term power load forecasting based on particle swarm optimization long short-term memory neural network. In Proceedings of the 2023 IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), 2023; pp. 412–416. [Google Scholar] [CrossRef]
Zhou, F.; Zhou, H.; Li, Z.; Zhao, K. Multi-step ahead short-term electricity load forecasting using VMD-TCN and error correction strategy. Energies 2022, 15, 5375. [Google Scholar] [CrossRef]
Valentini, G.; Dietterich, T. Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods. J. Mach. Learn. Res. 2004, 5, 725–775. [Google Scholar] [CrossRef] [PubMed]
Geman, S.; Bienenstock, E.; Doursat, R. Neural networks and the bias/Variance dilemma. Neural Comput. 1992, 4, 1–58. [Google Scholar] [CrossRef]
Hosseini, E.; Møller, J.K.; Banaei, M.; Ebrahimy, R. Optimal weights and Heuristic Meta-Models: Learning-based multi-model strategies for energy forecasting. Array 2025, 100550. [Google Scholar] [CrossRef]
Wu, C.; Zhang, Z. Surfing information: The challenge of intelligent decision-making. Intell. Comput. 2023, 2, 0041. [Google Scholar] [CrossRef]
Wang, S.; Nikfar, M.; Agar, J.C.; Liu, Y. Stacked Deep Learning Models for Fast Approximations of Steady-State Navier–Stokes Equations for Low Re Flow. Intell. Comput. 2024, 3, 0093. [Google Scholar] [CrossRef]
Tindall, J.R.; Rogage, K.; Doukari, O. Optimising deep learning for smart building energy prediction: A particle swarm approach with real-world validation. J. Build. Eng. 2025, 114652. [Google Scholar] [CrossRef]
Lengler, J.; Offermann, T. Diversity-Preserving Exploitation of Crossover. In Proceedings of the 18th ACM/SIGEVO Conference on Foundations of Genetic Algorithms; 2025; pp. 154–165. [Google Scholar] [CrossRef]
Khan, S.; Mazhar, T.; Shahzad, T.; Ali, T.; Ayaz, M.; Ghadi, Y.Y.; Aggoune, E.H.M.; Hamam, H. Optimizing load demand forecasting in educational buildings using quantum-inspired particle swarm optimization (QPSO) with recurrent neural networks (RNNs): a seasonal approach. Sci. Rep. 2025, 15, 19349. [Google Scholar] [CrossRef] [PubMed]
Yang, B.; Zhang, Q.L. Parallelizing a modified particle swarm optimizer (PSO). Adv. Mater. Res. 2010, 163-167, 2404–2409. [Google Scholar] [CrossRef]
Goldberg, D.E. Genetic algorithms in search, optimization, and machine learning; Addison-Wesley Professional, 1989; Available online: https://dl.acm.org/doi/10.5555/534133 (accessed on 1 June 2026).
Slowik, A.; Kwasnicka, H. Evolutionary algorithms and their applications to engineering problems. Neural Comput. Appl. 2020, 32, 12363–12379. [Google Scholar] [CrossRef]
Osaba, E.; Villar-Rodriguez, E.; Del Ser, J.; Nebro, A.J.; Molina, D.; LaTorre, A.; Ponnuthurai, N.S.; Coello, C.C.A.; Herrera, F. A tutorial on the design, experimentation and application of metaheuristic algorithms to real-world optimization problems. Swarm Evol. Comput. 2021, 64, 100888. [Google Scholar] [CrossRef]
Črepinšek, M.; Liu, S.; Mernik, M. Exploration and exploitation in evolutionary algorithms. ACM Comput. Surv. 2013, 45, 1–33. [Google Scholar] [CrossRef]
Keitsch, K.A.; Bruckner, T. Modular electrical demand forecasting framework - A novel hybrid model approach. In Proceedings of the 2016 13th International Multi-Conference on Systems, Signals & Devices (SSD), 2016; pp. 454–458. [Google Scholar] [CrossRef]
Ghiasi, M.; Irani Jam, M.; Teimourian, M.; Zarrabi, H.; Yousefi, N. A new prediction model of electricity load based on hybrid forecast engine. Int. J. Ambient Energy 2017, 40, 179–186. [Google Scholar] [CrossRef]

Figure 1. MMPF basic Structure.

Figure 2. MMPF-NARX Architecture.

Figure 3. MMPF-NARX. The final weighted outputs become the input to either the GARA or the PSO for further processing and improvement.

Figure 4. Moderate Noise Scenario, correlation coefficient ρ = 0.95, noise ratio = 0.25.

Figure 5. Heavy Noise Scenario, correlation coefficient ρ = 0.98, noise ratio = 0.5.

Figure 6. Forecasting performance using PSO under the moderate noise scenario.

Figure 7. Forecasting performance using PSO under the heavy noise scenario.

Figure 8. Forecasting performance using GARA under the moderate noise scenario.

Figure 9. Forecasting performance using GARA under the heavy noise scenario.

Figure 10. Moderate Noise. PSO Error Scatter Plot, Period 1.

Figure 11. Moderate Noise. PSO Error Scatter Plot, Period 2.

Figure 12. Moderate Noise. PSO Error Scatter Plot, Period 3.

Figure 13. Heavy Noise. PSO Error Scatter Plot, Period 1.

Figure 14. Heavy Noise. PSO Error Scatter Plot, Period 2.

Figure 15. Heavy Noise. PSO Error Scatter Plot, Period 3.

Figure 16. Moderate Noise. GARA Error Scatter Plot, Period 1.

Figure 17. Moderate Noise. GARA Error Scatter Plot, Period 2.

Figure 18. Moderate Noise. GARA Error Scatter Plot, Period 3.

Figure 19. Heavy Noise. GARA Error Scatter Plot, Period 1.

Figure 20. Heavy Noise. GARA Error Scatter Plot, Period 2.

Figure 21. Heavy Noise. GARA Error Scatter Plot, Period 3.

Figure 22. Comparison of Fitness Training Error per Iteration for both methods. Moderate Noise Scenario, Period 1.

Figure 23. Comparison of Fitness Training Error per Iteration for both methods. Moderate Noise Scenario, Period 2.

Figure 24. Comparison of Fitness Training Error per Iteration for both methods. Moderate Noise Scenario, Period 3.

Figure 25. Comparison of Fitness Training Error per Iteration for both methods. Heavy Noise Scenario, Period 1.

Figure 26. Comparison of Fitness Training Error per Iteration for both methods. Heavy Noise Scenario, Period 2.

Figure 27. Comparison of Fitness Training Error per Iteration for both methods. Heavy Noise Scenario, Period 3.

Table 1. An overview of our MAPE errors for all of the employed methods (including different noise scenarios).

Noise Scenario	# of NARX models (N)	MMPF-NARX with GARA (MAPE)	MMPF-NARX with PSO (MAPE)	Behavior
Moderate	7	2.70%	2.30%	Under-Capacity
Moderate	8	2.42%	2.08%	Improving
Moderate	9	2.15%	2.02%	Optimal
Moderate	10	2.17%	2.06%	Plateau
Heavy	7	3.1%	2.88%	Under-Capacity
Heavy	8	2.88%	2.58%	Improving
Heavy	9	2.55%	2.49%	Optimal
Heavy	10	2.61%	2.53%	Plateau
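
The selection of the number of submodels can be reproduced directly from the values in Table 1. The sketch below transcribes the MAPE values from the table and picks the N with the lowest MAPE for each scenario and optimizer; the dictionary layout and the helper name `best_n` are illustrative choices.

```python
# MAPE values (%) transcribed from Table 1; inner keys are the number of NARX submodels N.
# "moderate" refers to the moderate (medium) noise scenario.
mape = {
    "moderate": {"GARA": {7: 2.70, 8: 2.42, 9: 2.15, 10: 2.17},
                 "PSO":  {7: 2.30, 8: 2.08, 9: 2.02, 10: 2.06}},
    "heavy":    {"GARA": {7: 3.10, 8: 2.88, 9: 2.55, 10: 2.61},
                 "PSO":  {7: 2.88, 8: 2.58, 9: 2.49, 10: 2.53}},
}

def best_n(scores):
    """Return the N with the lowest MAPE."""
    return min(scores, key=scores.get)

summary = {}
for scenario, methods in mape.items():
    for method, scores in methods.items():
        summary[(scenario, method)] = best_n(scores)

# Relative MAPE reduction of PSO versus GARA at each N, in percent.
relative_gain = {
    (scenario, n): 100.0 * (methods["GARA"][n] - methods["PSO"][n]) / methods["GARA"][n]
    for scenario, methods in mape.items()
    for n in methods["GARA"]
}

for key, n in summary.items():
    print(key, "best N =", n)
```

The same dictionary can be used to inspect how the gap between the two optimizers changes with N.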

Table 2. An overview of the NARX models (N) on forecasting performance, i.e., a description of the behavior based on different cases.

Case	# of NARX models (N)	Behavior Description
1. Complexity Deficit	7, 8	Higher error rates are observed. The model cannot fully capture nonlinear oscillations and measurement noise. The number of subfilters is not enough.
2. Optimal Complexity	9	Best performance for both optimization methods. The diversity of subfilters is well balanced. This gives a more accurate representation of load dynamics.
3. Performance Plateau and Degradation	10	Error increases slightly (e.g., MAPE rises from 2.49% to 2.53% under heavy noise with PSO). Extra filters add unnecessary parameters. This may cause overfitting and reduce generalization ability.

Table 3. Computational Burden and Convergence Efficiency.

Metric	GARA	PSO	Improvement (%)
Complexity per Iteration	$O(P \log P + P \cdot L)$	$O(P \cdot L)$	~25% Faster
Avg. Iterations to Convergence	26-30	14-18	~45% Faster
Floating Point Ops (FLOPs)	High (Sorting/ Randomizing)	Low (Vector Addition)	-
Memory Overhead	Moderate (Population buffer)	Minimal (Velocity vectors)	-
Stability (Figure 25, Figure 26 and Figure 27)	Stochastic (High Variance)	Deterministic (Low Var.)	-

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
