
A Multi-Step Ensemble Approach for Energy Community Day-ahead Net Load Point and Probabilistic Forecasting

This version is not peer-reviewed; a peer-reviewed article of this preprint also exists.

Submitted: 08 January 2024 | Posted: 09 January 2024
Abstract
The incorporation of renewable energy systems into the world energy system has been steadily increasing in recent years. In the building sector, the usual consumers are increasingly becoming prosumers, and the trend is that energy communities, whose households share the electricity they produce, will grow in number in the future. Another observed tendency is for the aggregator (the entity that manages the community) to trade the net community energy in public energy markets. To accomplish economically sound transactions, accurate and reliable forecasts of the day-ahead community net energy must be available. These can be obtained using an ensemble of multi-step shallow artificial neural networks, with prediction intervals obtained by the covariance algorithm. Using real data from a small energy community of four houses located in the south of Portugal, one can verify that the deterministic and probabilistic performance of the proposed approach is at least similar to, and typically better than, that of complex, deep models.
Keywords: 
Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

The increasing incorporation of renewable energy systems (RES) into the world energy system, replacing traditional thermal power stations, has brought important advantages in terms of reducing atmospheric pollutants. In the building sector, residential end users are moving from simple energy consumers to active prosumers (i.e., those who both consume and produce energy), proactively managing their energy demand through energy production systems and, in some cases, energy storage systems, such as battery energy storage (ES) and/or electric vehicles (EVs). Surplus electricity can, in most cases, be traded with grid companies.
This completely decentralized approach, however, also has disadvantages. From the grid point of view, it can introduce instability, resulting in increased interactions and unpredictable fluctuations; from the prosumers' point of view, it is not as beneficial as expected since, generally, there is a large difference between buying and selling electricity tariffs. For these reasons, the concept of sharing and exchanging electricity between prosumers has emerged as a promising solution for achieving collective benefits. Energy communities, made up of interconnected loads and distributed energy resources (DERs), can employ energy management strategies that maximize community self-consumption and reduce dependence on the central grid. Different energy sharing strategies are discussed in [1], where it is assumed that each community member has an intelligent home energy management system (HEMS) managing its own energy flows, under the supervision of an intelligent aggregator which controls the whole energy community and is also responsible for the community's interaction with the central grid.
Nowadays, in many countries all over the world, the production and sale of electricity are traded under competitive rules in free markets. Electricity demand and price forecasting are therefore main targets for the agents and companies involved in the electricity markets. In particular, short-term forecasting, i.e., hourly forecasts for the day ahead, has been extensively studied in the literature. Excellent reviews on demand and price forecasting can be found in [2,3] and [4,5], respectively. Works on load demand and price forecasts can be found in [6] for several energy markets, in [7] for the Spanish market, and in [8] for the Portuguese energy market.
This paper assumes the scenario of an energy community managed by an aggregator. Additionally, it is considered that the aggregator buys the electricity needed by the community in an energy market, such as the Iberian Electricity Market (MIBEL). The day-ahead market of MIBEL is the platform where electricity is transacted for delivery on the day after the negotiation. Prices are set for each of the 24 hours of each day and for each of the 365 or 366 days of each year. Every day, the daily market platform is active until 12:00 when, at the close of the session, electricity prices for the following day are presented. Before this deadline, the purchase and sale proposals which determine the electricity price for each hour of the next day must be submitted.
To efficiently manage the community, the aggregator needs good forecasts of the day-ahead net energy or power required, as well as of the future selling/buying prices. This paper deals only with the day-ahead net power and assumes that day-ahead prices are obtained by other techniques. As the net power is the difference between the load demand and the electricity produced, good forecasts of both variables are required. In addition, if the aggregator manages several energy communities, these forecasts should be extended to the aggregation of all foreseen day-ahead net energies.
Existing load and generation forecasts can be classified, in terms of forecasting results, into two categories: point forecasts and probabilistic forecasts [9]. Traditional point forecasting provides a single estimate for future power, which is unable to adequately quantify the uncertainty associated with the actual variable. Probabilistic forecasting, on the other hand, is receiving considerable research attention due to its superior ability to capture future demand uncertainty. Probabilistic prediction can better describe the uncertain components contained in the deterministic prediction results in three forms: prediction intervals (PI), quantiles, and probability density functions (PDF) [10]. Forecasting methods based on PIs are trained by optimizing an error-based loss function, and the output of the trained prediction model is used to construct the PIs [11]. Traditional PI methods, such as Bayesian [12], Delta [13], Bootstrap [14] and mean-variance estimation (MVE) [15], construct the corresponding PIs sequentially. In this paper, we shall be using the covariance method, introduced in [16] and discussed later on.
Specifically for load demand point forecasting, the works [17-19] can be highlighted. In the first, data from four residential buildings, sampled at 15-minute intervals, are employed. Three forecasting models are compared: i) the Persistence-based Auto-regressive model, ii) the Seasonal Persistence-based Regressive model, and iii) the Seasonal Persistence-based Neural Network model, as well as mixtures of these three methods.
Kiprijanovska et al. [18] propose a deep residual neural network for forecasting, which integrates multiple sources of information by extracting features from (i) contextual data (weather, calendar) and (ii) the historical load of the particular household and of all households present in the dataset. The authors compared their approach with other techniques on 925 households of the Pecan Street database [20], with data re-sampled at a 1-hour interval. Day-ahead forecasts are computed at 10:00 of the previous day. The proposed approach achieves excellent results in terms of RMSE, MAE and R².
Pirbazari et al. [19] employed hourly data from a public set of 300 Australian houses, available from 2010 to 2013. This set consisted of load demand and electricity generation for the three hundred houses, which, for analysis, were divided into 6 different groups of 50 houses. For forecasting, they propose an approach that first considers the most influential factors and, second, benefits from an ensemble learning method where one Gradient Boosted Regression Tree algorithm is combined with several Sequence-to-Sequence Long Short-Term Memory (LSTM) networks. The ensemble technique obtained the best 24-hour-ahead results, for both load and generation, when compared to other isolated and ensemble approaches.
Focusing now on PV generation point forecasting, the work [21] should be mentioned. The authors propose an LSTM-attention-embedding model based on Bayesian optimization to predict the day-ahead PV power output. The effectiveness of the proposed model is verified on two actual PV power plants in one area of China. The comparative experimental results show that the performance of the proposed model is significantly improved compared to LSTM neural networks, Back-Propagation Neural Networks (BPNN), Support Vector Regression machines (SVR) and persistence models.
The works cited above deal with point forecasting. We shall now move to a probabilistic forecasting survey.
Vilar et al. [6] propose two procedures to obtain prediction intervals for electricity demand (as well as price) based on functional data. The proposed procedures are related to one-day-ahead pointwise forecasts. The first method uses a nonparametric autoregressive model and the second a partial linear semi-parametric model, in which exogenous scalar covariates are incorporated in a linear way. In both cases, the proposed procedures for the construction of the prediction intervals use residual-based bootstrap algorithms, which also allow obtaining estimates of the prediction density. The entire year 2012 of the total load demand in Spain is employed, with an hourly sampling time. It was found that, using the Winkler score, the second method obtains better results.
Zhang and co-workers [9] propose a probability density prediction method for day-ahead load based on a monotone composite quantile regression neural network and Gaussian kernel function (MCQRNNG). Taking real load data carrying quantile crossing from Ottawa in Canada, Baden-Württemberg in Germany and a certain region in China, the performance of the MCQRNNG model is verified from three aspects: point, interval, and probability prediction. Results show that the proposed model significantly outperforms the comparison models by effectively eliminating quantile crossing in power load forecasting.
Regarding PV generation, [22,23] should be mentioned. The former compares the suitability of a non-parametric and three parametric distributions in the characterization of prediction intervals of photovoltaic power forecasts with high confidence levels. The calculations were done for one year of day-ahead forecasts of hourly power generation of 432 PV systems. The results showed that, in general, the non-parametric distribution assumption for the forecast error yielded the best prediction intervals.
Mei et al. [23] propose an LSTM-Quantile Regression Averaging (QRA)-based nonparametric probabilistic forecasting model for PV output power. A set of independent LSTM deterministic forecasting models is trained using historical PV data and Numerical Weather Prediction (NWP) data. Nonparametric probabilistic forecasts are generated by integrating the independent LSTM prediction models with QRA. In comparison with the benchmark methods, LSTM-QRA has higher prediction performance due to the better forecasting accuracy of the independent deterministic forecasts.
Finally, [24] seems to be the only publication looking at the net load forecasting of residential microgrids. The authors propose data-driven net load uncertainty quantification fusion mechanisms for cloud-based energy storage management with renewable energy integration. Firstly, a fusion model is developed using SVR, LSTM, and Convolutional Neural Network - Gated Recurrent Unit (CNN-GRU) algorithms to estimate day-ahead load and PV power forecasting errors. After that, two mechanisms are proposed to determine the day-ahead net load error. In the first mechanism, the net load error is directly forecasted, while in the second it is derived from the forecast errors of load and PV power. The uncertainty is analyzed with various probability confidence intervals. However, those are not prediction intervals and, this way, the actual data do not necessarily lie within those bounds. Additionally, the forecasts are computed at the end of one day for the next 24 hours, i.e., the prediction horizon (PH) is only 24 hours, and they cannot be used for energy markets.
The present paper addresses the issues identified in the last reference. It allows obtaining, for the community(ies) managed by an aggregator, excellent day-ahead hourly point forecasts of the net load, whose robustness can be assessed by prediction intervals obtained at a user-specified confidence level. This allows the aggregator to make more informed decisions regarding energy buying (and/or selling).
This is achieved by the use of robust ensemble forecasting models, which will be described in Section 2. The data employed and the results obtained are presented in Section 3. They are discussed in Section 4 and compared with results obtained by other works. Section 5 concludes the paper.

2. Material and Methods

The design of robust ensemble forecasting models has been discussed elsewhere (please see [25]) and will only be summarized here.
To obtain “good” single models from a set of acquired or existent data, three sub-steps must be followed:
  • data selection – this is a pre-processing stage where, from the existing data, suitable sets (training, testing, validation, …) must be constructed.
  • features/delays selection - given the above data sets, the “best” set of inputs and network topology should be determined.
  • parameter estimation - given the data sets constructed and using the inputs/delays and network topology chosen, the “best” network parameters should be determined.
These are solved by the application of a model design framework, composed of two existing tools. The first, denoted as ApproxHull, performs data selection, from the data available for design. Feature and topology search are obtained by the evolutionary part of MOGA (Multi-Objective Genetic Algorithm), while parameter estimation is performed by the gradient part of MOGA.

2.1. Data Selection

To design data-driven models, it is mandatory that the samples enclosing the whole input-output range where the underlying process is supposed to operate be present in the training set. To determine such samples, called convex hull (CH) points, out of the whole dataset, convex hull algorithms can be applied.
As standard convex hull algorithms suffer from both exaggerated time and space complexity for high dimensions, a randomized approximation convex hull algorithm, denoted as ApproxHull, proposed in [26], is employed.
In a prior step, before determining the CH points, ApproxHull eliminates replicas and linear combinations of samples/features. After having identified the CH points, ApproxHull generates the training, test and validation sets to be used by MOGA, according to user specifications, but mandatorily incorporating the CH points in the training set.
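ApproxHull itself is described in [26]. Purely as an illustration of the underlying idea, and not of ApproxHull's actual algorithm, the sketch below approximates convex hull membership by retaining the extreme samples along random projection directions and forces those samples into the training split; all names are ours, and the validation split is omitted for brevity.

```python
import numpy as np

def approx_hull_split(X, train_frac=0.7, n_directions=500, seed=0):
    """Illustrative ApproxHull-style split: samples that are extreme along
    random directions approximate the convex hull (CH) of the data and are
    mandatorily placed in the training set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random unit directions; the argmax/argmin of each projection is a CH vertex.
    D = rng.normal(size=(n_directions, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    proj = X @ D.T                                         # (n, n_directions)
    hull_idx = np.unique(np.concatenate([proj.argmax(0), proj.argmin(0)]))
    rest = np.setdiff1d(np.arange(n), hull_idx)
    rng.shuffle(rest)
    n_train = max(int(train_frac * n), len(hull_idx))
    train_idx = np.concatenate([hull_idx, rest[:n_train - len(hull_idx)]])
    test_idx = rest[n_train - len(hull_idx):]
    return train_idx, test_idx
```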

2.2. MOGA

This framework is described in detail in [27], and it will be briefly discussed here.
MOGA designs static or dynamic ANN (Artificial Neural Network) models, for classification, approximation or forecasting purposes.

2.2.1. Parameter Separability

MOGA employs models that are linear-nonlinearly separable in the parameters [28]. The output of this type of model, at sample k, is given as:

$$\hat{y}(x_k, w) = u_0 + \sum_{i=1}^{n} u_i\, \varphi_i(x_k, v_i) = \varphi(x_k, v)\, u \tag{1}$$

In (1), $x_k$ is the kth input sample, $\varphi$ is the basis functions vector, u is the (linear) output weights vector and v represents the nonlinear parameters. For simplicity, we shall assume here only one hidden layer, so that v is composed of n vectors of parameters, one per neuron: $v = \left[ v_1 \ldots v_n \right]^T$. This class of models comprises Multilayer Perceptrons, Radial Basis Function (RBF) networks, B-Spline and ASMOD models, Wavelet networks, and Mamdani, Takagi and Takagi-Sugeno fuzzy models (satisfying certain assumptions) [29].
In the case described here, RBF models will be employed. The basis functions employed by RBFs are typically Gaussian functions:

$$\varphi_i(x_k, v_i) = e^{-\frac{\left\| x_k - c_i \right\|_2^2}{2 \sigma_i^2}}, \tag{2}$$

which means that the nonlinear parameters of each neuron are constituted by a centre vector, $c_i$, with as many components as the dimensions of the input, and a spread parameter, $\sigma_i$.
Associated with each model, there are always heuristics that can be used to obtain initial values for the parameters. For RBFs, centre values can be obtained randomly from the input data, or from the range of the input data. Alternatively, clustering algorithms can be employed. Initial spreads can be chosen randomly, or using several heuristics, such as:

$$\sigma = \frac{z_{max}}{\sqrt{2n}}, \tag{3}$$

where $z_{max}$ represents the maximum distance between centres and n represents the number of neurons.
For the general case, the model parameters can thus be divided into linear (u) and nonlinear (v) parameters:

$$w = \begin{bmatrix} u \\ v \end{bmatrix}, \tag{4}$$

and this separability can be exploited in the training algorithms. For a set of input patterns X, training the model means finding the values of w such that the following criterion is minimized:

$$\Omega(X, w) = \frac{\left\| y - \hat{y}(X, w) \right\|_2^2}{2} \tag{5}$$
where y is the target vector, $\hat{y}$ is the RBF output vector, and $\left\| \cdot \right\|_2$ denotes the Euclidean norm. Replacing (1) in (5), we have:

$$\Omega(X, w) = \frac{\left\| y - \Gamma(X, v)\, u \right\|_2^2}{2}, \tag{6}$$

where $\Gamma(X, v) = \left[ \varphi(x_1, v) \ldots \varphi(x_m, v) \right]^T$, m being the number of patterns in the training set. As (6) is a linear problem in u, its global optimal solution is given as:

$$u^* = \Gamma^+(X, v)\, y, \tag{7}$$

where the symbol '+' denotes the pseudo-inverse operation. Replacing (7) in (6), we obtain a new criterion, which depends only on the nonlinear parameters, v:

$$\Psi(X, v) = \frac{\left\| y - \Gamma(X, v)\, \Gamma^+(X, v)\, y \right\|_2^2}{2} \tag{8}$$
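As a concrete illustration of this separability (a minimal sketch of ours, not MOGA's implementation), the snippet below builds the basis output matrix $\Gamma$ using the Gaussian functions (2), obtains the optimal linear weights through the pseudo-inverse (7), and evaluates the reduced criterion (8):

```python
import numpy as np

def rbf_design_matrix(X, centres, spreads):
    """Gamma(X, v): Gaussian basis outputs, eq. (2), plus a bias column for u0."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)  # squared distances
    G = np.exp(-d2 / (2.0 * spreads ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), G])

def optimal_linear_weights(X, y, centres, spreads):
    """u* = Gamma^+ y, eq. (7); also returns the reduced criterion Psi, eq. (8)."""
    G = rbf_design_matrix(X, centres, spreads)
    u = np.linalg.pinv(G) @ y
    r = y - G @ u
    return u, 0.5 * (r @ r)
```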

2.2.2. Training Algorithm

Any gradient algorithm can be used to minimize (6) or (8). For nonlinear least-squares problems, the Levenberg-Marquardt (LM) method [30,31] is recognized as the state-of-the-art method, as it exploits the sum-of-squares characteristics of the problem. The LM search direction at iteration k, $p_k$, is given as the solution of:

$$\left( J_k^T J_k + \lambda_k I \right) p_k = -g_k, \tag{9}$$

where $g_k$ is the gradient vector:

$$g_k = \frac{\partial \Omega_k}{\partial w_k} \quad \text{or} \quad \frac{\partial \Psi_k}{\partial v_k}, \tag{10}$$

and $J_k$ is the Jacobian matrix:

$$J_k = \frac{\partial \hat{y}_k}{\partial w_k} \quad \text{or} \quad \frac{\partial \hat{y}_k}{\partial v_k}. \tag{11}$$

The parameter λ is a regularization parameter, which enables the search direction to vary between the steepest-descent and the Gauss-Newton directions. The gradient and Jacobian of criterion (8) can be obtained by: a) first computing (7); b) replacing the optimal values in the linear parameters; and c) determining the gradient and Jacobian in the usual way.
In MOGA, the training algorithm terminates after a predefined number of iterations, or using an early-stopping technique [32].
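A generic LM iteration can be sketched as follows, under an interface of our own (MOGA's implementation additionally exploits the parameter separability of Section 2.2.1):

```python
import numpy as np

def lm_step(residual_fn, jacobian_fn, v, lam):
    """One Levenberg-Marquardt update, cf. eq. (9).
    residual_fn(v) -> e, the residual vector y - yhat(v);
    jacobian_fn(v) -> J, the Jacobian de/dv (i.e., -d yhat/dv)."""
    e = residual_fn(v)
    J = jacobian_fn(v)
    g = J.T @ e                           # gradient of 0.5 * ||e||^2
    p = np.linalg.solve(J.T @ J + lam * np.eye(len(v)), -g)
    v_trial = v + p
    # Classic damping schedule: accept the step and relax lambda on
    # improvement, otherwise reject it and increase lambda.
    if residual_fn(v_trial) @ residual_fn(v_trial) < e @ e:
        return v_trial, lam / 10.0
    return v, lam * 10.0
```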

2.2.3. The evolutionary part

MOGA evolves ANN (in the case of the present paper, RBF) structures, whose parameters separate, each structure being trained by the LM algorithm minimizing (8). We shall be using multi-step-ahead forecasting, and this way each model must be iterated PH times to predict the evolution of a variable within a selected Prediction Horizon (PH). At each step, the model inputs evolve with time. Consider a Nonlinear Auto-Regressive model with eXogenous inputs (NARX), with just one input, x, for simplicity:

$$\hat{y}(k+1|k) = f\left( y(k - d_{o_1}), \ldots, y(k - d_{o_n}),\; x(k - d_{i_1}), \ldots, x(k - d_{i_m}) \right), \tag{12}$$

where $\hat{y}(k+1|k)$ denotes the prediction for time-step k+1 given the measured data at time k, and $d_{o_j}$ and $d_{i_j}$ denote the jth delays of the output and input variables, respectively. This represents the 1-step-ahead prediction within a prediction horizon. As (12) is iterated over the PH, some or all of the indices on the right-hand side of (12) will become larger than k, which means that there are no measured values, and the corresponding forecasts must be employed instead. What has been said for NARX models is also valid for NAR models (models with no exogenous inputs).
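The iteration just described can be sketched as follows; this is a minimal illustration assuming a hypothetical one-step model callable, with the future exogenous values supplied externally (in the paper, by their own forecasting models):

```python
import numpy as np

def narx_forecast(model, y_hist, x_hist, x_future, d_out, d_in, PH):
    """Iterate a 1-step NARX model, eq. (12), over PH steps. When an output
    index exceeds the last measured instant k, the earlier forecast is fed
    back in place of the missing measurement."""
    y = list(y_hist)                      # measured outputs up to instant k
    x = list(x_hist) + list(x_future)     # exogenous input, extended into the future
    k = len(y_hist) - 1
    preds = []
    for s in range(1, PH + 1):
        y_lags = [y[k + s - d] for d in d_out]   # forecasts are reused when s > d
        x_lags = [x[k + s - d] for d in d_in]
        y_next = model(np.array(y_lags + x_lags))
        y.append(y_next)
        preds.append(y_next)
    return np.array(preds)
```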
The evolutionary part of MOGA evolves a population of ANN structures. Each topology consists of the number of neurons in the single hidden layer (for an RBF model) and the model inputs, or features. MOGA assumes that the number of neurons must be within a user-specified bound, n ∈ [nm, nM]. Additionally, one needs to select the features to use for a specific model, i.e., input selection must be performed. In MOGA it is assumed that, from a total number q of available features, denoted as F, each model must select the most representative d features within a user-specified interval, d ∈ [dm, dM], dM ≤ q, dm representing the minimum number of features, and dM its maximum number.
The operation of MOGA is a typical evolutionary procedure, with the operators conveniently changed to implement topology and feature selection. We shall refer the reader to publication [27] regarding a more detailed explanation of MOGA.
MOGA is a multi-objective optimizer, which means that the optimization objectives and goals need to be defined. Typical objectives are Root-Mean-Square Errors (RMSE) evaluated on the training set (ρtr) or on the test set (ρte), as well as the model complexity, #(v) (the number of nonlinear parameters), or the norm of the linear parameters ($\left\| u \right\|_2$). For forecasting applications, one further criterion is used to assess performance. Assuming a time-series sim, a subset of the design data with p data points is considered. For each point, model (12) is used to make predictions up to PH steps ahead. Then an error matrix is built:

$$E(sim, PH) = \begin{bmatrix} e(1,1) & \cdots & e(1, PH) \\ \vdots & \ddots & \vdots \\ e(p-PH, 1) & \cdots & e(p-PH, PH) \end{bmatrix}, \tag{13}$$

where e(i,j) is the model forecasting error taken from instant i of sim, at step j within the prediction horizon. Denoting by $\rho_{sim}(\cdot, i)$ the RMS function operating over the ith column of matrix E, the forecasting performance criterion is the sum of the RMS values of the columns of E:

$$\rho_{sim}(PH) = \sum_{i=1}^{PH} \rho_{sim}\left( E(sim, PH)(\cdot, i) \right) \tag{14}$$
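The construction of (13)-(14) can be sketched as follows (the `forecast_fn(history, PH)` interface, returning the PH forecasts issued at the end of `history`, is an assumption of ours):

```python
import numpy as np

def forecasting_criterion(forecast_fn, series, PH):
    """Builds the error matrix E of eq. (13) over a simulation time-series and
    returns the criterion of eq. (14): the sum of the per-step RMS errors."""
    p = len(series)
    E = np.empty((p - PH, PH))
    for i in range(p - PH):
        preds = forecast_fn(series[:i + 1], PH)
        E[i, :] = series[i + 1:i + 1 + PH] - preds
    rms_per_step = np.sqrt((E ** 2).mean(axis=0))   # rho_sim over each column
    return rms_per_step.sum(), E
```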
Notice that in the MOGA formulation every performance criterion can be minimized or set as a restriction to be met.
After having formulated the optimization problem, and after setting other hyperparameters, such as the number of elements in the population (npop), the number of iterations (niter), and the genetic algorithm parameters (proportion of random immigrants, selective pressure, crossover rate and survival rate), the hybrid evolutionary-gradient method is executed.
Each element in the population corresponds to a certain RBF structure. As the model is nonlinear, a gradient algorithm such as the LM algorithm minimizing (6) is only guaranteed to converge to a local minimum. For this reason, each RBF model is trained a user-specified number of times, starting with different initial values for the nonlinear parameters. MOGA allows the initial centres to be chosen using the heuristics mentioned in Section 2.2.1, or using an adaptive clustering algorithm [33].
After having executed the specified number of iterations, we have performance values for npop × niter different models. As the problem is multi-objective, a subset of these models corresponds to non-dominated models (nd), or Pareto solutions. If one or more objectives are set as restrictions, then a subset of nd, denoted as preferential solutions, pref, corresponds to the non-dominated solutions that meet the goals.
The performance of MOGA models is assessed on the non-dominated models set, or on the preferential models set. If a single solution is sought, it is chosen based on the objective values of those model sets, on performance criteria applied to the validation set, and possibly on other criteria.
When the analysis of the solutions provided by the MOGA requires the process to be repeated, the problem definition steps should be revised. In this case, two major actions can be conducted: input space redefinition by removing or adding one or more features (variables and lagged input terms in the case of forecasting problems) and improving the trade-off surface coverage by changing objectives or redefining goals. This results in a smaller search space in a subsequent run of the MOGA, possibly achieving a faster convergence and better approximation of the Pareto front.
Typically, for a specific problem, an initial MOGA execution is performed, minimizing all objectives. Then a second execution is run, where typically some objectives are set as restrictions.

2.3. Model Ensemble

According to the previous discussion, the output of MOGA is not a single model but sets of non-dominated RBFs or, possibly, preferential models. The models within these sets are typically powerful models, all with different structures, which can be used to form a stacking ensemble.
Several combination approaches have been applied to ensemble models. The most usual is the mean, against which other alternatives are compared. The authors of [34] propose four approaches: the median, weights proportional to the probability of success, weights calculated by variance minimization, and weights calculated from the eigenvector of the covariance matrix of forecast errors. Other alternatives are even more complex. For instance, in [35] a Bayesian combination approach for combining ANN and linear regression predictors has been used, and in [36] a combination of weather forecasts from different sources using aggregated forecasts from exponential reweighting is proposed.
To be employed in computationally intensive tasks, a computationally simple technique for combining the outputs of the model ensemble must be used. As, in a few exceptional cases, MOGA-generated models have been found whose outputs can be considered outliers, the ensemble output is obtained as a median, not as a mean. The ensemble output, using a set of models denoted as outset, is therefore given as:

$$\breve{y} = \begin{cases} \operatorname{median}\left( \hat{y}_i, i \in outset \right), & n \text{ odd} \\ \hat{y}_{m_1}: \hat{y}_{m_1}, \hat{y}_{m_2} \text{ are the middle values of the sorted } outset, & n \text{ even}, \end{cases} \tag{15}$$

n being the number of elements of outset. It was argued in [25] that 25 is a convenient number of elements for outset, and that its elements should be selected using a Pareto concept. Specifically, the following sequence of procedures should be followed for model design:
i) Use ApproxHull to select the datasets from the available design data.
ii) Formulate MOGA, in a first step, minimizing the different objectives.
iii) Analyze the MOGA results, restrict some objectives and redefine the input space and/or the model topology.
iv) Use the preferential set of models obtained from the second MOGA execution for the next steps.
v) Determine $me_{pref}$ using

$$me_{pref} = \left\{ m \in m_{pref} : nw(m) \le \breve{m}_{nw}^{pref} \right\} \cap \left\{ m \in m_{pref} : fore(m) \le \breve{m}_{fore}^{pref} \right\}, \tag{16}$$

where $m_{pref}$ is the set of preferential models obtained in the 2nd MOGA execution; $nw(m_{pref})$ is the set of the linear weight norms of the models in $m_{pref}$, $\breve{m}_{nw}^{pref} = \operatorname{median}\left( nw(m_{pref}) \right)$ being its median; and $fore(m_{pref})$ is the set of the sums of the forecast errors over the prediction time-series obtained for all models in $m_{pref}$, computed using (14), $\breve{m}_{fore}^{pref} = \operatorname{median}\left( fore(m_{pref}) \right)$ being its median.
vi) From $me_{pref}$ obtain $me_{par}$ using:

$$me_{par} = nd\left( fore(me_{pref}), nw(me_{pref}) \right) : \#(nd) = 25 \tag{17}$$

This set is obtained iteratively, by adding non-dominated models, considering nw(.) and fore(.) as the two criteria, until 25 models are found (a sketch is given after this list). Initially, the set of models to be inspected, $m_i$, i = 1, is initialized to $me_{pref}$, and $me_{par}$ to an empty set. In each iteration, both criteria are applied to $m_i$; the non-dominated solutions found are added to $me_{par}$ and removed from $m_i$.
vii) If the model that should be designed is a NARX model, use $me_{par}$ for the exogenous variables.
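A minimal sketch of the ensemble combination (15) and of the iterative non-dominated selection of step vi follows; the handling of the even-n case of (15), where one of the two middle values is kept, is our reading of the formula:

```python
import numpy as np

def ensemble_output(preds):
    """Eq. (15): median of the ensemble outputs; for an even-sized outset,
    one of the two middle values of the sorted set is kept."""
    s = np.sort(np.asarray(preds))
    n = len(s)
    return s[(n - 1) // 2] if n % 2 == 0 else float(np.median(s))

def pareto_select(nw, fore, target=25):
    """Step vi: iteratively peel non-dominated fronts, with nw(.) and fore(.)
    as the two criteria, until at least `target` models are collected."""
    remaining = list(range(len(nw)))
    chosen = []
    while len(chosen) < target and remaining:
        front = [i for i in remaining
                 if not any(nw[j] <= nw[i] and fore[j] <= fore[i] and
                            (nw[j] < nw[i] or fore[j] < fore[i])
                            for j in remaining)]
        chosen.extend(front)
        remaining = [i for i in remaining if i not in front]
    return chosen[:target]   # truncating to exactly `target` is our choice
```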

2.4. Robust Models

The recent inclusion of smart grids and renewable integration requirements has the drawback of increasing the uncertainty of future prices, supply and demand. Academics and practitioners alike have realized that probabilistic electricity price, load and production forecasting is now more important for energy systems planning and operations than ever before [37,38].
Predictive modeling assumes that all observations are generated by a data generating function f(·) combined with additive noise:

$$y_k = f(z_k) = \hat{y}_k + \varepsilon_k,$$

where $\hat{y}_k$ is the point forecast of the variable at time k, obtained using data available at an earlier point in time, and $\varepsilon_k$ is a data noise term, responsible for introducing uncertainty in the model. This uncertainty can be decomposed into three terms: training data uncertainty, related to how representative the training data are with respect to the whole operational range of the model input; model misspecification, related to the fact that the model, even with its optimal parameters and data conditions, can only approximate the real underlying function generating the data; and parameter uncertainty, related to the fact that the parameters employed in the model might not be the optimal ones. Notice that ApproxHull tries to minimize the first type of uncertainty, the evolutionary part of MOGA the second, and the gradient part of MOGA the third.
The most common extension from point to probabilistic forecasts is the construction of prediction intervals (PIs). A number of methods can be used for this purpose; the most popular consider both the point forecast and the corresponding error. A (1 − α) PI implies that the interval contains the true value with a probability of (1 − α). Transferring this idea to the calculation of quantiles leads to τ = α/2 for the lower bound and τ = 1 − α/2 for the upper bound [39]. For instance, we need to calculate the 5% and 95% quantiles to yield a 90% PI if the distance between the two quantiles is considered, which is the level used in this paper. We will denote by $\overline{\hat{y}}_k$ and $\underline{\hat{y}}_k$ the upper and lower bounds, respectively.
There are diverse ways to compute the PIs; the interested reader can consult [11], which describes different possibilities. In this work, the covariance method introduced in [16] is employed. We use the notation $\Gamma(X, v)$, introduced before, to denote the output matrix of the last nonlinear layer, which depends on the input matrix, X, and on the nonlinear parameters, v. The total prediction variance for each prediction $\hat{y}_k$ can therefore be obtained as:

$$\sigma_{tot}^2 = \sigma_{\varepsilon}^2 \left( 1 + \varphi^T(x_k, v) \left( \Gamma^T(Z_{tr}, v)\, \Gamma(Z_{tr}, v) \right)^{-1} \varphi(x_k, v) \right), \tag{18}$$

where $Z_{tr}$ is the input training matrix, and the data noise variance, $\sigma_{\varepsilon}^2$, can be estimated as:

$$\sigma_{\varepsilon}^2 = \frac{\sum_{i=1}^{N} e_i^2}{N - p} = \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}(z_i) \right)^2}{N - p}, \tag{19}$$

where N denotes the total number of samples and p the number of model parameters. The bounds are assumed to be symmetric and given as:

$$\underline{\hat{y}}_k = \hat{y}_k - \Delta\hat{y}_k \le y_k \le \hat{y}_k + \Delta\hat{y}_k = \overline{\hat{y}}_k \tag{20}$$

$$\Delta\hat{y}_k = t_{1-\frac{\alpha}{2},\, N-p}\; \sigma_{tot}, \tag{21}$$

where $t_{1-\frac{\alpha}{2},\, N-p}$ represents the (1 − α/2) quantile of a Student's t-distribution with N − p degrees of freedom.
In particular, we can use in (18) the optimal parameter values obtained after training, $\Gamma(Z_{tr}, v^*)$ and $\varphi(x_k, v^*)$. Notice that, due to the training of the model in MOGA, these two quantities are available (or can be obtained indirectly) if $x_k$ belongs either to the test or validation sets. In other words, computing (18) is not computationally expensive.
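Equations (18)-(21) translate directly into code; the sketch below (our own helper, assuming SciPy for the Student's t quantile) computes the PI for one query point:

```python
import numpy as np
from scipy.stats import t as student_t

def prediction_interval(G_train, phi_k, y_train, yhat_train, yhat_k,
                        n_params, alpha=0.10):
    """Covariance-method PI, eqs. (18)-(21). G_train is Gamma(Z_tr, v*) and
    phi_k the basis output vector phi(x_k, v*) for the query input x_k."""
    N = len(y_train)
    s2_eps = ((y_train - yhat_train) ** 2).sum() / (N - n_params)     # eq. (19)
    lever = phi_k @ np.linalg.solve(G_train.T @ G_train, phi_k)
    s_tot = np.sqrt(s2_eps * (1.0 + lever))                           # eq. (18)
    delta = student_t.ppf(1.0 - alpha / 2.0, N - n_params) * s_tot    # eq. (21)
    return yhat_k - delta, yhat_k + delta                             # eq. (20)
```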

2.5. Robust Ensemble Forecasting models

For every time instant k, we are interested in determining the PH forecasts $\hat{y}(k+s|k)$, s ∈ [1 ... PH], together with the corresponding bounds $\Delta\hat{y}(k+s|k)$, s ∈ [1 ... PH].
As we are dealing with an ensemble of models, and using the notation introduced in (15), we shall denote the forecasts as $\breve{y}(k+s|k)$, to signify that, at each step, we use the median value of the ensemble of values, computed using (15).
In the same way, the corresponding bound will be denoted as $\Delta\breve{y}(k+s|k)$, and $\varphi(x_k, v)$ and $\Gamma(Z_{tr}, v)$ in (18) will be denoted as $\breve{\varphi}(x_k, \breve{v})$ and $\breve{\Gamma}(\breve{Z}_{tr}, \breve{v})$. Therefore, (18)-(21) should be replaced by:

$$\breve{\sigma}_{tot}^2(s) = \breve{\sigma}_{\varepsilon}^2(s) \left( 1 + \breve{\varphi}^{(s)T}(x_{k+s}, \breve{v}^{(s)}) \left( \breve{\Gamma}^T(\breve{Z}_{tr}, \breve{v}^{(s)})\, \breve{\Gamma}(\breve{Z}_{tr}, \breve{v}^{(s)}) \right)^{-1} \breve{\varphi}^{(s)}(x_{k+s}, \breve{v}^{(s)}) \right) \tag{23}$$

$$\breve{\sigma}_{\varepsilon}^2(s) = \frac{\sum_{k=1}^{N} \breve{e}^2(k+s)}{N - \breve{p}} = \frac{\sum_{k=1}^{N} \left( y(k+s) - \breve{y}(k+s|k) \right)^2}{N - \breve{p}} \tag{24}$$

$$\underline{\breve{y}}(k+s|k) = \breve{y}(k+s|k) - \Delta\breve{y}(k+s|k) \le y(k+s) \le \breve{y}(k+s|k) + \Delta\breve{y}(k+s|k) = \overline{\breve{y}}(k+s|k) \tag{25}$$

$$\Delta\breve{y}(k+s|k) = t_{1-\frac{\alpha}{2},\, N-\breve{p}}\; \breve{\sigma}_{tot}(s) \tag{26}$$

2.6. Performance criteria

To assess the quality of the models, both in terms of robustness and in terms of the quality of approximation, different criteria will be used.
In this paper, the following criteria for assessing the approximation quality will be employed: the Root-Mean-Square Error (RMSE) (27), the Mean Absolute Error (MAE) (28), the Mean Absolute Percentage Error (MAPE) (29) and the Coefficient of Determination, or R-square (R²) (30). As we would like to assess the performance of the estimation not only for the next time-step but along the whole PH, all the different indices are averaged over the Prediction Horizon. For all indices, N is the number of samples considered.

$$\overline{RMSE} = \frac{1}{PH} \sum_{s=1}^{PH} RMSE(s), \qquad RMSE(s) = \sqrt{\frac{\sum_{k=1}^{N} \left( y(k+s) - \hat{y}(k+s|k) \right)^2}{N}} \tag{27}$$

$$\overline{MAE} = \frac{1}{PH} \sum_{s=1}^{PH} MAE(s), \qquad MAE(s) = \frac{\sum_{k=1}^{N} \left| y(k+s) - \hat{y}(k+s|k) \right|}{N} \tag{28}$$

$$\overline{MAPE} = \frac{1}{PH} \sum_{s=1}^{PH} MAPE(s), \qquad MAPE(s) = \frac{1}{N} \sum_{k=1}^{N} \left| \frac{y(k+s) - \hat{y}(k+s|k)}{y(k+s)} \right| \tag{29}$$

$$\bar{\bar{y}}(s) = \frac{1}{N} \sum_{k=1}^{N} y(k+s), \quad SS_{Res}(s) = \sum_{k=1}^{N} \left( y(k+s) - \hat{y}(k+s|k) \right)^2, \quad SS_{Tot}(s) = \sum_{k=1}^{N} \left( y(k+s) - \bar{\bar{y}}(s) \right)^2,$$
$$R^2(s) = 1 - \frac{SS_{Res}(s)}{SS_{Tot}(s)}, \qquad \overline{R^2} = \frac{1}{PH} \sum_{s=1}^{PH} R^2(s) \tag{30}$$
Regarding the robustness of the solutions, some measures have been proposed. One of the simplest is the Average Width (AW):

$$AW(s) = \frac{1}{N} \sum_{k=1}^{N} W_k(s), \tag{31}$$

where the width is given as:

$$W_k(s) = \overline{\breve{y}}(k+s|k) - \underline{\breve{y}}(k+s|k). \tag{32}$$

In (32), $\overline{\breve{y}}(k+s|k)$ and $\underline{\breve{y}}(k+s|k)$ are, respectively, the upper and lower bounds, introduced in (25), of the model output for time k+s, given the data available up to time k.
A more sophisticated measure of interval width is the Prediction Interval Normalized Average Width (PINAW), defined as:

$$PINAW(s) = \frac{1}{R} AW(s) = \frac{1}{NR} \sum_{k=1}^{N} \left( \overline{\breve{y}}(k+s|k) - \underline{\breve{y}}(k+s|k) \right), \tag{33}$$

where R is the range of the whole prediction set:

$$R = \max(y) - \min(y). \tag{34}$$
On the other hand, if we are interested in the magnitude of the violations, we can use the Interval-based Mean Error (IME), also called the PI Normalized Average Deviation (PINAD):

$$PINAD(s) = \frac{1}{N} \sum_{k=1}^{N} IME_k(s), \tag{35}$$

$$IME_k(s) = \begin{cases} \dfrac{\underline{\breve{y}}(k+s|k) - y(k+s)}{R}, & y(k+s) < \underline{\breve{y}}(k+s|k) \\ 0, & \underline{\breve{y}}(k+s|k) \le y(k+s) \le \overline{\breve{y}}(k+s|k) \\ \dfrac{y(k+s) - \overline{\breve{y}}(k+s|k)}{R}, & y(k+s) > \overline{\breve{y}}(k+s|k) \end{cases} \tag{36}$$
To assess the coverage, we can use the Prediction Interval Coverage Probability (PICP). The PICP indicates whether the interval given by the bounds $\left[ \underline{\breve{y}}(k+s|k), \overline{\breve{y}}(k+s|k) \right]$ contains the real value of y(k+s):

$$PICP(s) = \frac{1}{N} \sum_{k=1}^{N} c_k(s), \tag{37}$$

where

$$c_k(s) = \begin{cases} 1, & \text{if } \underline{\breve{y}}(k+s|k) \le y(k+s) \le \overline{\breve{y}}(k+s|k) \\ 0, & \text{otherwise} \end{cases} \tag{38}$$
Another metric often used to assess both the reliability and the sharpness of the PIs is the Winkler Score [40]. As can be seen, it penalizes the interval width with the magnitude of the violations, weighted proportionally to the confidence level:

$$WS(s) = \frac{1}{N} \sum_{k=1}^{N} WS_k(s), \tag{39}$$

$$WS_k(s) = \begin{cases} AW(s) + \frac{2}{\alpha} \left( \underline{\breve{y}}(k+s|k) - y(k+s) \right), & y(k+s) < \underline{\breve{y}}(k+s|k) \\ AW(s), & \underline{\breve{y}}(k+s|k) \le y(k+s) \le \overline{\breve{y}}(k+s|k) \\ AW(s) + \frac{2}{\alpha} \left( y(k+s) - \overline{\breve{y}}(k+s|k) \right), & y(k+s) > \overline{\breve{y}}(k+s|k) \end{cases} \tag{40}$$
As we would like to assess the robustness of the estimation not only for one step ahead but along the whole PH, all these criteria must be averaged over the PH:

$$\overline{AW} = \frac{1}{PH} \sum_{s=1}^{PH} AW(s) \tag{41}$$

$$\overline{PINAW} = \frac{1}{PH} \sum_{s=1}^{PH} PINAW(s) \tag{42}$$

$$\overline{PINAD} = \frac{1}{PH} \sum_{s=1}^{PH} PINAD(s) \tag{43}$$

$$\overline{PICP} = \frac{1}{PH} \sum_{s=1}^{PH} PICP(s) \tag{44}$$

$$\overline{WS} = \frac{1}{PH} \sum_{s=1}^{PH} WS(s) \tag{45}$$
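For one step s, the robustness measures (31), (33), (35), (37) and (39)-(40) can be computed as in the sketch below (a helper of our own; averaging over the PH, as in (41)-(45), is then a simple mean over the per-step values):

```python
import numpy as np

def robustness_metrics(y, lower, upper, alpha=0.10):
    """PI quality measures for one step s: AW (31), PINAW (33), PINAD (35),
    PICP (37) and the paper's AW-based Winkler score (39)-(40)."""
    R = y.max() - y.min()                  # range of the prediction set, eq. (34)
    aw = (upper - lower).mean()
    picp = ((y >= lower) & (y <= upper)).mean()
    viol = np.where(y < lower, lower - y, np.where(y > upper, y - upper, 0.0))
    pinad = (viol / R).mean()
    ws = aw + (2.0 / alpha) * viol.mean()
    return {"AW": aw, "PINAW": aw / R, "PINAD": pinad, "PICP": picp, "WS": ws}
```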

3. Results

3.1. The Data

Data acquired from an energy community situated in the Algarve region of Portugal, covering two time periods, were used. Although much smaller sampling intervals were originally used, the data have been down-sampled to intervals of one hour. The first period contained data from 30 November 2021 to 17 March 2022 (2583 samples), and the second period comprised data from 22 March 2022 to 3 June 2022 (1762 samples). The houses and the data acquisition system are described in [41,42], and the reader is encouraged to consult these references. The data used here can be freely downloaded from [43].
The data used here are weather data, namely atmospheric air temperature (T) and global solar radiation (R), power demand (PD) and PV power generated (PG). Additional data used is a codification of the type of day (DE). This characterizes each day of the week and the occurrence and severity of holidays based on the day they occur, as may be consulted in [8,44]. In Table 1, the regular-day column shows the coding for the days of the week when these are not holidays. The following column presents the values encoded when there is a holiday and, finally, the special column shows the values that substitute the regular-day value in two special cases: Mondays when Tuesday is a holiday, and Fridays when Thursday is a holiday.
To help the aggregator regarding the purchase/sale of electricity from/to the market, at the sample related to the 11h-12h period of each day there is the need to forecast the community power demand and power generated for the next day, i.e., from hour 0 to hour 24, in 1-hour intervals. The former can be modelled as an ensemble of NARX models, for house h:

$$\breve{P}^D(k+1|k, h) = \breve{M}_h^D \left( \overline{P}^D(k, h), \overline{T}(k), Code(k) \right) \tag{46}$$

The overline symbol in the last equation denotes a set of delayed values of the corresponding variable. As four houses are considered in the community, four different ensemble models can be used, and the summation of their outputs used to determine the predicted power demand of the community:

$$\breve{P}^D(k+1|k, All) = \sum_{h=1}^{4} \breve{P}^D(k+1|k, h) \tag{47}$$

An alternative is to employ a single ensemble model, fed with the sum of the four demands:

$$\breve{P}^D(k+1|k, Com) = \breve{M}_{Com}^D \left( \sum_{h=1}^{4} \overline{P}^D(k, h), \overline{T}(k), Code(k) \right) \tag{48}$$

The power generated can also be modelled as a NARX model:

$$\breve{P}^G(k+1|k) = \breve{M}^G \left( \overline{P}^G(k), \overline{R}(k), \overline{T}(k) \right) \tag{49}$$

As can be seen, (46) and (48) use the air temperature as exogenous variable, and (49) the air temperature and the solar radiation. This means that the evolution of these variables must also be predicted. The solar radiation can be modelled as an ensemble of NAR models:

$$\breve{R}(k+1|k) = \breve{M}^R \left( \overline{R}(k) \right) \tag{50}$$

while the air temperature is modelled as:

$$\breve{T}(k+1|k) = \breve{M}^T \left( \overline{T}(k) \right) \tag{51}$$
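The resulting forecasting chain (weather NAR ensembles first, power NARX ensembles afterwards) can be sketched as follows; the `models`/`hist` containers and the `forecast` interface are hypothetical, introduced only to make the data flow of (46)-(51) explicit:

```python
def day_ahead_net_load(models, hist, steps=36):
    """Hedged sketch of the forecasting chain of eqs. (46)-(51): the NAR
    weather ensembles run first, and the NARX power ensembles then consume
    those forecasts. Not the paper's actual API."""
    T_fc = models["T"].forecast(hist["T"], steps=steps)                # eq. (51)
    R_fc = models["R"].forecast(hist["R"], steps=steps)                # eq. (50)
    PG_fc = models["PG"].forecast(hist["PG"], exog=(R_fc, T_fc),
                                  steps=steps)                         # eq. (49)
    PD_fc = sum(models["PD"][h].forecast(hist["PD"][h],
                                         exog=(T_fc, hist["code"]),
                                         steps=steps)
                for h in range(4))                                     # eqs. (46)-(47)
    return PD_fc - PG_fc   # net load offered to the day-ahead market
```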
The evolution of the data, for the two periods, is illustrated in Figures 1-8.
As described earlier, several models will be designed: four power demand models, one for each house, plus an additional one for the community; two weather models; and one power generation model. The procedure described in Section 2.2 and detailed in [25] is followed and will be briefly summarized here. For each model, two executions of MOGA were performed, the first only minimizing four objectives, ρtr, ρte, #(v) and ρsim(PH); in the second execution, some objectives are set as goals.
For all problems, MOGA was parameterized with the following values:
  • Prediction Horizon: thirty-six steps (36 h).
  • Number of neurons: n ∈ [2, 20];
  • Initial parameter values: OAKM [33];
  • Number of training trials: five, best compromise solution.
  • Termination criterion: early stopping, with a maximum number of iterations of fifty.
  • Number of generations: 100.
  • Population size: 100.
  • Proportion of random immigrants: 0.10.
  • Crossover rate: 0.70.
As already mentioned, for each model, two executions of MOGA are performed, the results of the first being analyzed to constrain the objectives, and ranges of neurons and inputs of the second execution.
For the NARX models, the number of admissible inputs ranged from 1 to 30, while for the NAR model the range employed was from 1 to 20.
The next figure illustrates the range of forecasts intended and the delays employed. Every day, a few minutes before 12:00, the average power for the 11h-12h period is computed. Let us denote the index of this sample by k (in red). Using the delayed average powers of the previous 29 indices [k-1 ... k-29] (shown in green), the average powers for the next day must be forecast. Therefore, using a multi-step forecast model, 36 steps-ahead values need to be forecast (in cyan), of which only the last 24 steps are required (darker cyan).
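In index terms, this timing can be sketched as follows (our own convention, assuming the 11h-12h average carries the hour-11 index):

```python
# Timing arithmetic for one daily run.
k = 11                                    # index of the 11h-12h sample (red)
lags = list(range(k - 1, k - 30, -1))     # 29 delayed samples [k-1 ... k-29] (green)
horizon = [k + s for s in range(1, 37)]   # 36 steps ahead (cyan)
day_ahead = horizon[12:]                  # last 24 steps: hours 0-23 of the next day
```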
The forecasting performance will be assessed using a prediction time-series, from 19-Jan-2022 00:30:00, to 25-Jan-2022 23:30:00, with 168 samples.

3.2. Solar Radiation models

The solar radiation model is a NAR model. Table 2 shows statistics corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. For this model and for the others, the results relate to the 2nd MOGA execution. Unless explicitly specified, scaled values, belonging to the interval [-1, 1], are employed; this enables a comparison of the performance between different variables. The MAPE value shown is in %.
As can be concluded by analyzing Table 2 and Figures 10-17, the PICP does not vary significantly along the PH and is always greater than 0.9. The other robustness measures increase along the prediction horizon, but not significantly. Comparing AW and PINAW, they are related by a scale factor, corresponding to the range of the forecasted solar radiation along the PH, which remains nearly constant. Although the number of bound violations remains nearly constant over the PH (please see the PICP, Figure 12 left), the magnitude of the violations changes with the step ahead. This can be confirmed by the PINAD evolution (Figure 11 right) and by the Winkler score evolution (Figure 11 left), which penalizes the magnitude of the violations twenty-fold (2/α, with α = 0.1).
The RMSE, MAE and MAPE also change along the PH, typically increasing with the steps ahead. However, this increase is not large, and the indices even decrease occasionally. Analyzing the evolution of R², we can see that an excellent fit is obtained and maintained throughout the PH. This can also be seen in Figure 15, left and right. The right figure confirms that the excellent performance is maintained throughout the PH, both in terms of fitting and in terms of prediction bounds and violations. Finally, Figure 17 focuses on our goal, which is to inspect the day-ahead predictions obtained with data available up to 12:00 of the previous day. As can be seen, only twelve small violations occurred, which results in a PICP of 93% over those 7 days.

3.3. Atmospheric Air Temperature models

The air temperature model is also a NAR model. Table 3 shows statistics corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.1.
If Table 3 is compared with Table 2, and Figures 10-14 with Figures A.1.1-A.1.5, it can be observed that every performance index, apart from the PICP, is worse for the air temperature. One reason for this behavior might be that the solar radiation time-series is much smoother than the atmospheric temperature one or, in other words, the latter has more high-frequency content. This can be confirmed by examining Figure 18, which presents a snapshot of the air temperature time-series.
The blue line represents the time-series, sampled at 1 s, and the crosses the 1-hour averages. It is clear that there is substantial high-frequency content in the time-series. The 1-step-ahead forecasted signals are presented in Figure 19. Looking at the right graph, only seven violations are present, corresponding to a PICP of 95%.
Looking at the violations for the steps ahead shown in Figure 20, their number remains nearly constant with the PH. The next figure shows the air temperature forecast for the next day.

3.4. Power Generation models

Power generation is modelled as a NARX model. Table 4 shows the same statistics, corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.2.
The results for power generation are similar (actually slightly better) to those for solar radiation, as the most important exogenous variable for power generation is precisely the solar radiation.

3.5. Load demand models

These are NARX models. First, we shall address the load demand of each house individually. Then, using the four models, the load demand of the community will be evaluated using (47). Finally, the community load demand with only one model, using (48), will be considered.

3.5.1. Monophasic House 1

Table 5 shows the same statistics, corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.3.
The load demand for house 1 (and, as will be shown later, for all houses) obtains the worst results, both in terms of fitting and in terms of robustness. To maintain the PICP around (actually above) 0.9, the average width increases dramatically (please see Table 5 and the figures in Appendix A.3). In contrast with the other variables analyzed before, the Winkler score is close to AW, which means that the penalties due to the magnitude of the violations do not contribute significantly to WS.
The load demand time-series of a single house is quite volatile, as the electric appliances are a) not operated every day and b) not used on the same schedule. For this reason, it is exceedingly difficult to obtain good forecasts.

3.5.2. Monophasic House 2

Table 6 shows the same statistics, corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.4.
Comparing the results of MP2 with those of MP1, MP2 performs better: the average interval widths are smaller, although the average value of WS is similar, as it increases significantly along the PH. The fitting is also better for MP2.

3.5.3. Triphasic House 1

Table 7 shows the same statistics, corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.5.
The comments made for the last case also apply to TP1. It should be noticed, however, that the fitting ($\overline{R^2}$) is much worse.

3.5.4. Triphasic House 2

Table 8 shows the same statistics, corresponding to the mean of the employed performance criteria, averaged over the Prediction Horizon. The evolution of the performance criteria can be seen in Appendix A.6.
The comments made for TP1 also apply to TP2. It should be noticed, however, that the fitting ($\overline{R^2}$) of TP2 is much better.

3.5.5. Community using (47)

Assume, without loss of generality, that our energy community has only two houses, whose loads are represented as Y ( 1 ) and Y ( 2 ) . We know that:
$$\sigma_{\varepsilon}^2(1+2) = \sigma_{\varepsilon}^2(1) + \sigma_{\varepsilon}^2(2) + 2\, Cov\left( Y^{(1)}, Y^{(2)} \right), \tag{52}$$

where Cov denotes the covariance between two signals. As it has been found that the four loads are not cross-correlated, we shall assume:

$$\sigma_{\varepsilon}^2(1+2) = \sigma_{\varepsilon}^2(1) + \sigma_{\varepsilon}^2(2). \tag{53}$$
The standard error of the fit at y ^ k is given as:
$$\breve{\varphi}_k^T(x_k, \breve{v}) \left( \breve{\Gamma}^T(\breve{Z}_{tr}, \breve{v})\, \breve{\Gamma}(\breve{Z}_{tr}, \breve{v}) \right)^{-1} \breve{\varphi}_k(x_k, \breve{v}). \tag{54}$$
For the case of two houses:
$$\breve{\Gamma}^{(1+2)}\left( \breve{Z}_{tr}, \breve{v}^{(1+2)} \right) = \left[ \breve{\Gamma}^{(1)}\left( \breve{Z}_{tr}^{(1)}, \breve{v}^{(1)} \right) \;\; \breve{\Gamma}^{(2)}\left( \breve{Z}_{tr}^{(2)}, \breve{v}^{(2)} \right) \right], \tag{55}$$

and using

$$\breve{\varphi}_k^{(1+2)}\left( x_k, \breve{v}^{(1+2)} \right) = \begin{bmatrix} \breve{\varphi}_k^{(1)}\left( x_k, \breve{v}^{(1)} \right) \\ \breve{\varphi}_k^{(2)}\left( x_k, \breve{v}^{(2)} \right) \end{bmatrix}, \tag{56}$$
the standard error of the fit at $\hat{y}_k$ is given as:

$$\breve{\varphi}_k^{(1+2)T}\left( x_k, \breve{v}^{(1+2)} \right) \left( \breve{\Gamma}^{(1+2)T}\left( \breve{Z}_{tr}, \breve{v}^{(1+2)} \right)\, \breve{\Gamma}^{(1+2)}\left( \breve{Z}_{tr}, \breve{v}^{(1+2)} \right) \right)^{-1} \breve{\varphi}_k^{(1+2)}\left( x_k, \breve{v}^{(1+2)} \right)$$