Applying Multi-Output Random Forest Models to Electricity Price Forecast

Predicting electricity prices is a very important issue in modern society, because the associated decision process under uncertainty requires accurate forecasts for the economic agents involved. In this paper, we apply Random Forests, an ensemble extension of decision trees, to the prediction of electricity prices in Spain, with the novelty of modeling prices jointly with demand. The purpose is to achieve greater accuracy than with univariate-response Random Forests, particularly in price prediction, and to understand the effect of the input variables (lagged values of price and demand, current production levels of the available energy sources) on the two outputs jointly. The results are very encouraging, showing a significant increase in price prediction accuracy. In addition, interesting methodological challenges arise concerning the appropriate choice of the relative weights of price and demand in the joint modeling, and a new procedure to obtain the variable importance ranking is proposed. The R package partykit, which supports multivariate Random Forests, has been used.


Introduction
In any developed society, energy is a primary resource. Energy supply can be considered essential, as it ensures wellness, stability and development.
Nowadays, in a global and interconnected society, energy supply can be considered a market in which countries and public and private companies buy and sell energy according to their needs. The energy market involves three key elements: the generation of electricity, its transport (transmission and distribution), and its sale to the consumer.
For energy generation, forecasting has become indispensable. The emergence of renewable energies (especially due to the policy applied in Spain since 2007) and their tendency to become the main source of energy are an additional source of difficulty for traditional energy producers when adjusting their production. Traditional energy production includes thermal power plants and combined cycle plants, which are much more polluting than renewable energies such as wind farms or solar energy. In the Spanish electricity market, renewable energies are part of the Special Regime [1], and facilities that produce renewable energy generally have a maximum installed capacity of 50 MW.
Polluting forms of energy production are currently used to cover the demand not met by renewable sources. Due to the variability of renewable resources (such as wind), a reliable energy production system must rely on thermal power plants and combined cycle plants, which can adjust their output almost instantly when necessary.
Since energy cannot be stored in large quantities, energy producers have to schedule their production according to the variability of the other producers. This scheduling is essential to ensure that production covers demand; it also allows producers to optimize their resources and become more competitive. This is why demand forecasting is so important.
The Spanish energy market is especially complex since it sets energy prices through a "pool market": the price is fixed at the offer of the last (marginal) producer needed to cover demand. This means that, although some producers offer their energy at 0 €/MWh, they are still paid the market price for it as long as the price of the last energy used is not 0 €/MWh [2]. For this reason, renewable energy producers offer their energy at 0 €/MWh, while the rest of the producers set their prices according to demand. This explains why renewable energies are always chosen to cover demand. Therefore, price forecasting is also a major issue for energy producers and thus for the energy market.
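The pool clearing rule described above can be illustrated with a minimal merit-order sketch in Python. It is a simplified illustration under stated assumptions, not the actual OMIE matching algorithm: each offer is assumed to be a single block (producer, quantity in MWh, price in €/MWh), block bids, complex conditions and network constraints are ignored, and the function name and offer figures are hypothetical.

```python
from typing import List, Tuple

Offer = Tuple[str, float, float]  # (producer, quantity in MWh, price in EUR/MWh)


def clear_pool(offers: List[Offer], demand: float) -> Tuple[float, List[str]]:
    """Simplified merit-order clearing of a single hourly pool.

    Offers are accepted from cheapest to most expensive until demand is
    covered. Every accepted producer is paid the price of the last
    (marginal) accepted offer, which is the clearing price.
    """
    remaining = demand
    dispatched: List[str] = []
    clearing_price = 0.0
    for producer, quantity, price in sorted(offers, key=lambda offer: offer[2]):
        if remaining <= 0:
            break  # demand already covered; more expensive offers are not needed
        dispatched.append(producer)
        clearing_price = price  # the most recently accepted offer is the marginal one
        remaining -= quantity
    if remaining > 0:
        raise ValueError("the offers cannot cover the demand")
    return clearing_price, dispatched


if __name__ == "__main__":
    # Hypothetical offers for one hour (illustrative numbers only)
    offers = [
        ("wind", 8000.0, 0.0),
        ("nuclear", 7000.0, 5.0),
        ("combined_cycle", 10000.0, 55.0),
        ("coal", 6000.0, 45.0),
    ]
    price, units = clear_pool(offers, demand=20000.0)
    print(price, units)  # 45.0 ['wind', 'nuclear', 'coal']
```

In this toy case wind is paid 45 €/MWh although it offered 0 €/MWh; with a demand of 5000 MWh, wind alone would cover it and the clearing price would be 0 €/MWh.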
Although current models for price and demand forecasting are well developed and mature, there is still room for new research. New approaches can provide new accurate models for price and demand forecasting, a better understanding of the energy market and steady improvement of existing models. Some current models for price and demand forecasting are based on the ARMA-ARIMA methodology [3][4][5][6][7][8][9][10][11][12]. Others incorporate exponential smoothing [12][13][14] or data mining techniques [15][16][17][18][19][20][21][22]. These analyses are performed for short, medium and even long terms, and separately for each variable (price and demand), since the most important explanatory variables differ between the two.
Only the work of Amjady and Daraeepour [23] deals with the joint study of price and demand, using an iterative neural network procedure, and provides results for different electricity markets, including the Spanish one.
When building a forecasting model, it is important to take the following into account:
• The input variables used for the analysis should be appropriate for the specific response. Variable importance measures are often used to determine which variables should be included in the models.
• Forecasting is often performed for very short horizons (e.g. one hour or one day ahead), so predictions should be obtained quickly.
The approach in this study consists of the use of decision tree algorithms (Random Forests) [24] and multivariate analysis, i.e. the joint analysis and one-hour-ahead forecasting of price and demand. One of the key points of the method is the selection of the explanatory variables. In the new Spanish context, where renewable energies have been extensively introduced into the market since 2007, the identification and evaluation of important variables through a variable ranking is crucial, and Random Forests (RF from now on) provide such a ranking. Moreover, prices are clearly load dependent, but in the new regulatory scenario, load patterns (customer behavior) should also be affected by electricity prices. This is the main reason for using the multi-output approach based on RF.
This allows us to test other models for the energy market, to take advantage of the correlation between the responses (price and demand), and to find relationships between the responses and the input variables of the market, which may prove useful to develop or improve other models. In addition, the predictive performance of multivariate RF is tested as an alternative to univariate RF models [19]. For this paper, hourly data from the Spanish energy market for 2013 and 2014 have been used [2]. There are three kinds of energy market variables: calendar variables (related to the date, hour and type of day), present and lagged values of price and demand, and energy production variables, i.e. the MWh of each kind of energy consumed in each hour.
The rest of the paper is structured as follows: the theoretical framework, including the main concepts of RF, multi-output analysis and the importance measures available from RF, is introduced in Section 2; Section 3 presents the main features of the Spanish electricity market used in the study; in Section 4, the main results in terms of input variable selection and short-term predictions are assessed and compared with the univariate framework. Finally, concluding remarks are presented.

Theoretical Framework
This article focuses on regression-tree-based models, in particular RF models, and especially on the more recent development of multi-output RF models. A brief description of their main features is provided below.

Random Forests
RF is a tree-based method for classification and regression consisting of an ensemble of individual decision trees. The trees used as base learners in RF can be of different types (e.g. CART [25], C4.5 [26], or Conditional Inference trees [27]). In this paper, the individual trees are Conditional Inference Trees (CI Trees), since the algorithm provided by Hothorn (party and partykit libraries in R [28]) allows for multivariate, multi-output analysis.
As Hothorn and Zeileis [27] write, CART and C4.5 have two fundamental problems: overfitting, and a selection bias towards covariates with many possible splits, which leads to a biased importance ranking. In non-CI trees, overfitting is avoided by pruning the trees; however, the bias (induced by maximizing a splitting criterion over all possible splits simultaneously) is not so easy to eliminate. CI Trees overcome this problem with a statistical approach that takes distributional properties into account, measuring in a first step the association between the responses and the covariates. This means that the recursive binary partitioning and the stopping criterion rely on multiple testing procedures that determine whether a significant association exists between any of the covariates and the response. Here, in a similar fashion to independence tests for contingency tables, the association between the sign of the model residuals and each covariate is measured by a P-value derived from a permutation test (the null hypothesis being independence between each covariate and the response variable). This implementation reduces bias and overfitting, and trees of different sizes (depths) are created depending on the pre-specified significance level α. A brief description of CI Tree modelling, based on [27], follows. Input variables and responses may be measured on arbitrary scales:
• Response variable Y (possibly multivariate; in this paper, a bivariate response).
• Covariate vector $X = (X_1, \dots, X_m)$, taken from a sample space $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_m$. The covariates are the input variables of the model.
The conditional distribution of the response Y given the covariates X depends on a function f of the covariates:
$$D(Y \mid X) = D(Y \mid X_1, \dots, X_m) = D\big(Y \mid f(X_1, \dots, X_m)\big).$$
Binary partitioning is implemented using a case weight vector $\mathbf{w} = (w_1, \dots, w_n)$, where n is the sample size. Each node of the tree is represented by its own vector of case weights, whose elements are non-zero when the corresponding observations $(\mathbf{Y}_i, X_{1i}, \dots, X_{mi})$ belong to the node and zero otherwise.
For each node, with case weights $\mathbf{w}$, the global null hypothesis that none of the covariates $X_1, \dots, X_m$ is associated with the response Y is tested. If this hypothesis cannot be rejected, the splitting stops. Otherwise, the covariate $X_{j^*}$ with the strongest association to Y is selected.
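A split point $A^*$ is then chosen to divide the values of $X_{j^*}$ (for an ordered covariate) into two subsets, left $= \{X_{j^*} \le A^*\}$ and right $= \{X_{j^*} > A^*\}$. This creates two new case weight vectors, $\mathbf{w}_{\text{left}}$ and $\mathbf{w}_{\text{right}}$, with $w_{\text{left},i} = w_i\, I(X_{j^*i} \le A^*)$ and $w_{\text{right},i} = w_i\, I(X_{j^*i} > A^*)$. These steps are repeated recursively until the null hypothesis can no longer be rejected, at which point the algorithm stops.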
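The case-weight update above can be sketched in a few lines of Python. This is an illustrative helper (the name split_case_weights is hypothetical, not a partykit function), assuming an ordered covariate and a cut point A* that has already been chosen.

```python
from typing import List, Sequence, Tuple


def split_case_weights(
    w: Sequence[float], x: Sequence[float], a_star: float
) -> Tuple[List[float], List[float]]:
    """Split a node's case weights on an ordered covariate at cut point a_star.

    Observations outside the node have weight 0 and stay at 0 in both
    children; observations inside keep their weight in exactly one child.
    """
    w_left = [wi if xi <= a_star else 0.0 for wi, xi in zip(w, x)]
    w_right = [wi if xi > a_star else 0.0 for wi, xi in zip(w, x)]
    return w_left, w_right


# Node containing observations 0, 1 and 3 (observation 2 belongs to another node)
w_left, w_right = split_case_weights([1.0, 1.0, 0.0, 1.0], [2.0, 5.0, 1.0, 3.0], a_star=3.0)
print(w_left)   # [1.0, 0.0, 0.0, 1.0]
print(w_right)  # [0.0, 1.0, 0.0, 0.0]
```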
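The association between Y and the covariate $X_j$ is measured, for the test, by a linear statistic $T_j$ whose expression is [27]:
$$T_j(\mathcal{L}_n, \mathbf{w}) = \operatorname{vec}\left( \sum_{i=1}^{n} w_i \, g_j(X_{ji}) \, h\big(\mathbf{Y}_i, (\mathbf{Y}_1, \dots, \mathbf{Y}_n)\big)^{\top} \right) \in \mathbb{R}^{p_j q} \qquad (1)$$
where:
• $\mathcal{L}_n = \{(\mathbf{Y}_i, X_{1i}, \dots, X_{mi});\ i = 1, \dots, n\}$ is a learning sample, possibly with some covariates missing.
• $g_j : \mathcal{X}_j \to \mathbb{R}^{p_j}$ is a non-random transformation of the covariate $X_j$.
• $h : \mathcal{Y} \times \mathcal{Y}^n \to \mathbb{R}^{q}$ is the influence function, which depends on the responses $(\mathbf{Y}_1, \dots, \mathbf{Y}_n)$ in a permutation-symmetric way.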
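To make Eq. (1) concrete, the following Python sketch computes $T_j$ for a scalar covariate ($p_j = 1$) and a bivariate response ($q = 2$), assuming the identity transformations $g_j(x) = x$ and $h(\mathbf{Y}_i) = \mathbf{Y}_i$. The permutation test that follows is a simplified Monte Carlo illustration, not the partykit implementation: it uses the squared distance of $T_j$ from its expectation under permutation as test statistic (unscaled, so responses with larger units dominate it) and assumes all observations belong to the node. partykit instead standardizes $T_j$ and offers several ways of computing its null distribution (the testtype option). Function names are hypothetical.

```python
import random
from typing import List, Sequence


def linear_statistic(
    w: Sequence[float], x: Sequence[float], y: Sequence[Sequence[float]]
) -> List[float]:
    """T_j = sum_i w_i * g(x_i) * h(y_i) with g(x) = x and h(y) = y.

    For a scalar covariate and a q-variate response, T_j has length q.
    """
    q = len(y[0])
    return [sum(wi * xi * yi[k] for wi, xi, yi in zip(w, x, y)) for k in range(q)]


def permutation_pvalue(
    x: Sequence[float], y: Sequence[Sequence[float]], n_perm: int = 999, seed: int = 0
) -> float:
    """Monte Carlo p-value for H0: x independent of y (all w_i = 1)."""
    n, q = len(x), len(y[0])
    w = [1.0] * n
    # Expectation of T_j over permutations of x: (sum of x) * (mean of y)
    mu = [sum(x) * sum(yi[k] for yi in y) / n for k in range(q)]

    def distance(xs: Sequence[float]) -> float:
        t = linear_statistic(w, xs, y)
        return sum((tk - mk) ** 2 for tk, mk in zip(t, mu))

    observed = distance(x)
    rng = random.Random(seed)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if distance(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


x = [float(v) for v in range(20)]
y_related = [[2.0 * v + 1.0, -v] for v in x]  # both responses depend on x
print(permutation_pvalue(x, y_related))  # small p-value: independence is rejected
```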
The step forward from individual CI Trees to an ensemble is the RF-CI algorithm. In RF, each tree (base learner) in the forest is grown using two sources of randomness, which decreases the correlation between trees and yields a more reliable algorithm [29]. The two sources are:
• The sample used to build each tree is randomly drawn from the training dataset.
• The variables considered at each split are randomly selected from the full set of available input variables. Because the algorithm is not allowed to consider every predictor, weaker predictors can appear in the top splits, which prevents the trees from being very similar and thus from producing highly correlated predictions.
Thus, the trees can be regarded as nearly independent. When many nearly independent trees are combined, the risk of biased decisions or overfitting decreases greatly, and so does the variance of the predictions. As a consequence, RF, in particular CI-based RF, has been recognized as an effective machine learning method that provides accurate predictions for many classification and regression problems.
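The two sources of randomness can be sketched as follows. This is a hypothetical illustration, not partykit code: the rows for each tree are drawn here as a subsample of 63.2% of the observations without replacement (a bootstrap sample, drawn with replacement, is the classical alternative), a fresh set of mtry candidate variables is drawn at every split, and the variable names are placeholders.

```python
import random
from typing import List, Sequence


def draw_tree_rows(n_obs: int, rng: random.Random, fraction: float = 0.632) -> List[int]:
    """Indices of the observations used to grow one tree (no replacement)."""
    return rng.sample(range(n_obs), round(fraction * n_obs))


def draw_split_candidates(variables: Sequence[str], mtry: int, rng: random.Random) -> List[str]:
    """Variables one split is allowed to consider; drawn anew at every split."""
    return rng.sample(list(variables), mtry)


rng = random.Random(42)
inputs = [f"x{k}" for k in range(14)]  # hypothetical names for p = 14 inputs
rows = draw_tree_rows(3000, rng)
first_split = draw_split_candidates(inputs, 7, rng)
second_split = draw_split_candidates(inputs, 7, rng)
print(len(rows), first_split, second_split)
```

Because the candidate set changes from split to split, a strong predictor is sometimes unavailable, which is what lets weaker predictors enter the top splits and decorrelates the trees.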

Specific Issues for Multi-Output Analysis
There are two general approaches to solving multi-output pattern recognition problems: transforming the problem into multiple single-output problems, or adapting a pattern recognition algorithm so that it directly handles multi-output data [30]. When datasets are large, fitting a regression or classification model becomes very expensive in terms of computational resources. Therefore, when several models need to be obtained from the same dataset, it can be useful to perform a single multivariate analysis with two or more responses (outputs) instead of two or more separate univariate analyses; with two responses, the computation time is then nearly halved.
More importantly, when the predictions are cross-correlated, training a coherent multi-output model can potentially increase predictive performance compared to training multiple disjoint models [31].
In this paper the research focuses on a multi-output regression problem, since both price and demand are continuous variables.When predicting, the univariate (single output) and multivariate (multi-output) approaches will be compared in this research, focusing not only on the accuracy of predictions but also on computing times.
In general terms, multi-output models based on RF build trees whose recursive binary splits use the variables that explain both response variables. In this approach, the influence of each variable has to be tested, for each response variable, with the independence tests used in CI trees. For this reason, the mathematical complexity of multivariate analysis is higher than that of the univariate case.
As the response Y is multivariate, each observation $\mathbf{Y}_i$ contains two or more responses. Therefore, the influence function h in Eq. (1) depends on the multivariate response (demand and price in our research).
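For a bivariate response, the vector $T_j$ of Eq. (1) can be summarized by a quadratic form, which is what partykit's quadratic test statistic (the teststat="quad" option) refers to. Following [27], $c_{quad} = (T_j - \mu_j)^{\top} \Sigma_j^{+} (T_j - \mu_j)$, where $\mu_j$ and $\Sigma_j$ are the conditional expectation and covariance of $T_j$ under permutation. The Python sketch below is a simplified illustration, assuming a scalar covariate, identity $g_j$ and h, all observations in the node ($w_i = 1$) and a non-singular covariance, so that an ordinary 2×2 inverse replaces the Moore-Penrose inverse; the function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def quadratic_statistic(x: Sequence[float], y: Sequence[Tuple[float, float]]) -> float:
    """c_quad = (T - mu)' Sigma^{-1} (T - mu) for a scalar x and bivariate y."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    y_mean = [sum(yi[k] for yi in y) / n for k in range(2)]

    # Linear statistic T_j with g(x) = x and h(y) = y
    t = [sum(xi * yi[k] for xi, yi in zip(x, y)) for k in range(2)]
    # Conditional expectation of T_j under permutation
    mu = [sum_x * y_mean[k] for k in range(2)]
    # V(h) = (1/n) * sum_i (y_i - mean)(y_i - mean)'
    v = [[sum((yi[a] - y_mean[a]) * (yi[b] - y_mean[b]) for yi in y) / n
          for b in range(2)] for a in range(2)]
    # Conditional covariance: V(h) times a scalar factor that depends on x only
    factor = (n * sum_x2 - sum_x ** 2) / (n - 1)
    s = [[factor * v[a][b] for b in range(2)] for a in range(2)]

    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular covariance; a generalized inverse would be needed")
    s_inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = [t[0] - mu[0], t[1] - mu[1]]
    return sum(d[a] * s_inv[a][b] * d[b] for a in range(2) for b in range(2))


x = [1.0, 2.0, 3.0, 4.0]
y = [(1.0, 0.0), (2.0, 1.0), (4.0, 0.0), (3.0, 1.0)]  # hypothetical bivariate responses
print(quadratic_statistic(x, y))  # approximately 2.52
```

Because the inverse covariance standardizes each direction, $c_{quad}$ is unchanged when a response is rescaled or shifted, so responses measured in very different units enter the test on a comparable footing. According to [27], under the null hypothesis it is asymptotically χ² distributed with degrees of freedom equal to the rank of $\Sigma_j$.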

Variable Importance in a Multi-Output Environment: A New Proposal
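A summary of the importance of each input variable can be obtained using the Mean Squared Error (MSE) criterion. Grömping [32] uses the Out-of-Bag (OOB) concept and a permutation-based test to evaluate the MSE reduction. As explained in [19], for each tree in the forest, built with a learning data set (usually about two-thirds of the observations), the values of a variable $X_j$ that has been used to build the tree are randomly permuted in the OOB data set (about one-third of the observations), and a new value of the OOB MSE is calculated. The importance of the variable is computed from the differences between MSE and MSE_permuted according to the expression
$$\overline{\Delta \mathrm{MSE}}_j = \frac{1}{B} \sum_{b=1}^{B} \Delta \mathrm{MSE}_{jb},$$
which is an average over all the trees (B) of the forest in which the variable $X_j$ has been used, and where
$$\Delta \mathrm{MSE}_{jb} = \mathrm{MSE}^{\text{permuted}}_{jb} - \mathrm{MSE}_{b}$$
is the OOB MSE of tree b after permuting $X_j$ minus its OOB MSE with the original values. Afterwards, $\overline{\Delta \mathrm{MSE}}_j$ is normalized by its standard error, and the final value of the importance metric is obtained as follows:
$$\%\mathrm{IncMSE}_j = \frac{\overline{\Delta \mathrm{MSE}}_j}{\hat{\sigma}_j / \sqrt{B}},$$
where $\hat{\sigma}_j$ is the standard deviation of the $\Delta \mathrm{MSE}_{jb}$ over the B trees. If the variable $X_j$ has no predictive importance for the response, $\overline{\Delta \mathrm{MSE}}_j$ is almost zero; therefore, the higher the value of $\%\mathrm{IncMSE}_j$, the more important the variable.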
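The normalized permutation importance above can be computed as follows, given the per-tree OOB MSE values before and after permuting $X_j$. This is a minimal Python sketch with a hypothetical function name; it assumes the per-tree MSE values are already available and uses the sample standard deviation to obtain the standard error.

```python
import math
import statistics
from typing import Sequence


def inc_mse(mse_oob: Sequence[float], mse_permuted: Sequence[float]) -> float:
    """%IncMSE_j: mean per-tree MSE increase divided by its standard error.

    mse_oob[b] and mse_permuted[b] are the OOB MSE of tree b before and after
    permuting X_j; only the B trees in which X_j is used should be passed.
    """
    diffs = [after - before for before, after in zip(mse_oob, mse_permuted)]
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all per-tree differences are equal; standard error is zero")
    return mean_diff / std_err


print(inc_mse([1.0, 1.0, 1.0, 1.0], [2.0, 3.0, 2.0, 3.0]))  # 3*sqrt(3), about 5.196
```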
However, due to the novelty of the algorithm and the early stage of development of the partykit library [33], there are no specific commands to evaluate variable importance, for either the univariate response or the multivariate response.Additionally, it is difficult to access each tree in the RF, thus complicating the use of the approach proposed by Grömping [32], mentioned above.
For these reasons, a more pragmatic approach has been developed and is proposed here: one explanatory variable at a time is permuted, with both responses modeled simultaneously, and the change in the MSE of the whole RF is evaluated. In fact, this can be considered a generalization of the previous algorithm. It is important to note that the results of this method are not the same as those obtained by applying the permutation to each tree individually rather than to the whole forest.
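The proposed whole-forest procedure can be sketched as below. The sketch is hypothetical: fit_and_score stands for any routine that fits the (multi-output) forest on a data set and returns its OOB MSE for the chosen response or joint function; the procedure itself only relies on that number. The permuted column is used when building the forest, so the forest is refitted once per input variable. The toy scorer in the usage part is a least-squares line standing in for a real forest, only to make the example runnable.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def whole_forest_importance(
    rows: List[Row],
    inputs: Sequence[str],
    fit_and_score: Callable[[List[Row]], float],
    seed: int = 0,
) -> Dict[str, float]:
    """Change in the forest's error when each input is permuted before fitting.

    Positive values: the variable is useful (permuting it hurts accuracy).
    Negative values: the model does better without the variable's information,
    so the variable is a candidate for removal.
    """
    rng = random.Random(seed)
    baseline = fit_and_score(rows)
    changes: Dict[str, float] = {}
    for name in inputs:
        values = [row[name] for row in rows]
        rng.shuffle(values)  # break the link between this input and the responses
        permuted = [{**row, name: value} for row, value in zip(rows, values)]
        changes[name] = fit_and_score(permuted) - baseline
    return changes


def toy_fit_and_score(rows: List[Row]) -> float:
    """Stand-in for 'fit a forest and return its OOB MSE': least-squares line y ~ x1."""
    xs = [r["x1"] for r in rows]
    ys = [r["y"] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys)) / len(xs)


data = [{"x1": float(i), "x2": float(i % 3), "y": 2.0 * i} for i in range(20)]
print(whole_forest_importance(data, ["x1", "x2"], toy_fit_and_score))
# x1 gets a large positive change; x2 (unused by the toy model) gets 0.0
```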
On the other hand, since a multivariate analysis is performed, the MSE is useful for evaluating the error of each response variable separately, but a joint error cannot be computed appropriately unless a function of both response variables is defined. This motivates the following proposal of the study: the definition of a joint response function involving both price and demand.
Since price and demand differ by about three orders of magnitude, and price is harder to predict than demand, price error is expected to have more influence than demand error when a price-demand dependent function is used. This should determine the relative weights of the two outputs. For these reasons, the price-demand dependent function proposed as the response variable in the analysis of this paper is the one given in Eq. (2). This function brings price and demand to a similar order of magnitude and makes price the most important response variable (log10 in the denominator smooths the error for demand). Evaluating a joint MSE with this function has some implications:
• The MSE can be smaller when an input (explanatory) variable is removed. This means that removing a specific input variable improves price prediction, allowing price-related input variables to appear more often. This improvement can occur for price and demand simultaneously, or at the expense of less accurate demand prediction. It happens especially when no highly important demand-related input variables are removed (removing very important demand predictors increases the joint MSE).
• Some variables can be important for price or demand when evaluated separately, and yet removing them one by one does not cause a loss of accuracy. The reason is that other, less important variables are able to maintain the quality of the predictions when only one important variable is permuted.
• Removing some of the inputs that are less relevant in the joint (multivariate) analysis can improve price prediction and may also improve demand prediction. However, in some cases this removal could cause a loss of accuracy in demand prediction larger than the corresponding improvement in price prediction, averaged over the full sample. This behavior is not frequent but should be borne in mind.
In Section 4.1, the variable importance analysis and its results in terms of MSE are presented first for each variable separately, in two univariate analyses, and then for the joint analysis using the response function defined in Eq. (2); conclusions can then be drawn straightforwardly.

Adjusting Tuning Parameters for The Study
The main tuning parameters of an RF algorithm are the number of trees in the forest (ntree), the number of variables (mtry) randomly chosen as candidates at each split of the individual trees, and the depth of each individual tree. An adequate selection of these parameters can significantly improve predictive accuracy, but the optimal values depend on the case study.
In this section the importance and influence of these tuning parameters have been studied considering that input and response variables correspond to the same time point.
The algorithm used in our approach is included in the R "partykit" library [33].The RF-CI algorithm allows the user to choose the type of test statistic to be applied and how to compute the distribution of the test statistic.For this data set and multivariate framework, a computational study has determined that the best results are obtained using teststat="quad" and testtype="Teststatistic".
All plots and charts have been created using the R "ggplot2" [34] library, since "partykit" does not include plot functions yet due to its early stage of development.

Depth of Individual Trees
The depth of each individual tree can either be adjusted manually or chosen automatically by the algorithm. When trees grow too deep, there is a risk of overfitting. As mentioned before, the parameter α, specified when building the algorithm, is the significance level of the input-output independence tests and is directly related to the depth of the trees: the higher the value of α, the easier it is to reject independence and thus to perform a split, which results in deeper trees [27]. In this paper, α has been set to 0.05, a standard value for this parameter that keeps the trees from growing too deep.

Number of Trees in The Forest (ntree)
The number of trees in the ensemble has a direct influence on prediction accuracy: the higher the number of trees, the smaller the error. However, this trend is asymptotic: once the number of trees is large enough, adding more trees does not result in a significant improvement in the predictions, while it does increase computing time. For this reason, the number of trees is set as a trade-off between computing time and predictive performance.
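The asymptotic behavior described above, and the benefit of decorrelating the trees, are captured by a standard result for averaging: for B identically distributed tree predictions with variance σ² and pairwise correlation ρ ≥ 0, the variance of their average is ρσ² + (1 − ρ)σ²/B. The Python sketch below evaluates this formula and checks it by simulation; the values of σ² and ρ are hypothetical and the function names are illustrative.

```python
import random
import statistics


def ensemble_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the mean of n_trees identically distributed predictions
    with variance sigma2 and pairwise correlation rho (rho >= 0)."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


def simulated_variance(rho: float, n_trees: int, n_rep: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check with unit-variance predictions built as
    sqrt(rho) * shared noise + sqrt(1 - rho) * tree-specific noise."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)
        preds = [rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
                 for _ in range(n_trees)]
        means.append(sum(preds) / n_trees)
    return statistics.pvariance(means)


for b in (1, 50, 100, 150, 200):
    print(b, round(ensemble_variance(1.0, 0.1, b), 4))
# prints 1.0, 0.118, 0.109, 0.106, 0.1045 for B = 1, 50, 100, 150, 200
```

With these hypothetical values, going from 150 to 200 trees lowers the variance by only 0.0015, while the floor ρσ² can only be reduced by making the trees less correlated.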
Before studying how the number of trees in an RF-CI affects the error, the way the error is computed must be defined. In this section, the standard Mean Squared Error (MSE) of the Out-Of-Bag (OOB) predictions for each response (output variable) has been used to evaluate prediction accuracy.
Each tree uses around two-thirds (63.2%) of the observations. The remaining observations are referred to as OOB. The response for the ith observation can be predicted using each of the trees for which this observation is OOB. The accuracy of an RF prediction can then be estimated from these OOB data as in [32]:
$$\mathrm{MSE}_{OOB} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i^{OOB} \right)^2,$$
where n is the sample size, $y_i$ is the actual value of the ith observation and $\hat{y}_i^{OOB}$ is the average prediction for the ith observation from all trees for which this observation has been OOB.
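The 63.2% figure corresponds to bootstrap sampling: the probability that a given observation appears at least once in a bootstrap sample of size n is 1 − (1 − 1/n)^n, which tends to 1 − 1/e ≈ 0.632. The Python sketch below evaluates this probability and computes the OOB MSE defined above from per-tree predictions. The data layout (tree_preds[b][i], inbag[b][i]) is an assumption made for illustration, and observations that are in-bag for every tree are skipped, a practical detail not discussed in the text.

```python
import math
from typing import Sequence


def inbag_probability(n: int) -> float:
    """Probability that an observation is drawn at least once in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def oob_mse(
    y: Sequence[float],
    tree_preds: Sequence[Sequence[float]],
    inbag: Sequence[Sequence[bool]],
) -> float:
    """MSE of OOB predictions: each observation is predicted by averaging
    only the trees that did not use it for training."""
    squared_errors = []
    for i, y_i in enumerate(y):
        oob_preds = [preds[i] for preds, used in zip(tree_preds, inbag) if not used[i]]
        if oob_preds:  # skip observations that were in-bag for every tree
            y_hat = sum(oob_preds) / len(oob_preds)
            squared_errors.append((y_i - y_hat) ** 2)
    return sum(squared_errors) / len(squared_errors)


print(round(inbag_probability(3000), 4), round(1.0 - 1.0 / math.e, 4))  # both about 0.632
```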
The analysis for the selection of the number of trees in the forest has been performed using a subset of 3000 observations (10% of the full database). The other parameters have been adjusted automatically by the algorithm. The resulting OOB-MSE values for demand and price are shown in Figure 1 and Figure 2, respectively. As can be observed, for both responses the decrease in the error starts to stabilize at approximately 100 trees; for example, the difference in error between 150 and 200 trees is small. Seeking a trade-off solution, the number of trees used for predictions has finally been set at 150.

Number of variables randomly selected to be considered at each split (mtry) in RF
To choose the value of the mtry parameter it is necessary to consider the correlation between the input variables.With highly correlated input variables it is preferable to use a small value [29].Traditionally, mtry=√p for classification forests and mtry=p/3 for regression forests (where p is the total number of input variables) [32].
On the other hand, if there are many irrelevant input variables, a larger value of mtry is needed to obtain better predictions. In this study, some input variables initially seem to be irrelevant (or at least of relatively minor importance). An input variable that is highly uncorrelated with the remaining inputs could be very important, due to its unique behavior in the analysis, or not important at all if it is not related to the response.
Figure 3 shows the Pearson correlation between the input variables. As can be observed in Figure 3, the correlations between the input variables are generally low, and it is plausible (a priori) that some input variables are of low relevance; therefore, the mtry parameter could be chosen higher than recommended [32]. With a larger mtry, RF-CI is likely to stop splitting before the weak predictors (input variables) come into play [32].
An analysis has been performed to determine the optimal value for the mtry parameter. Since the number of input variables is p = 14, the range studied for the mtry parameter has been from 5 to 10. This analysis has been performed with an RF-CI of 50 trees, 3000 observations and OOB predictions, and it has been replicated 10 times.
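For the p = 14 inputs used here, the usual rules of thumb quoted above can be evaluated directly. The Python sketch below follows the defaults used by the R randomForest package (floor(p/3), at least 1, for regression; floor(√p) for classification); the function name is hypothetical. It shows that the explored range of 5 to 10 lies above the regression default.

```python
import math


def default_mtry(p: int, regression: bool = True) -> int:
    """Rule-of-thumb number of candidate variables per split."""
    if regression:
        return max(math.floor(p / 3), 1)
    return math.floor(math.sqrt(p))


p = 14
print(default_mtry(p), default_mtry(p, regression=False))  # 4 3
print(list(range(5, 11)))  # mtry values explored in the study: [5, 6, 7, 8, 9, 10]
```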
The evolution of the OOB-MSE for demand and price as mtry grows is displayed in Figures 4 and 5. When mtry equals 7, the smallest demand prediction error and a substantial reduction in the price prediction error are obtained. Thus, the value of the mtry parameter has been set at 7, once again seeking the best trade-off: the minimum error for demand and a reasonably small, though not minimal, error for price.
In addition, the contemporaneous correlation between price and demand is 0.4512. This value is not high enough to establish a priori that multivariate analysis will result in more accurate predictions than separate univariate analyses (as mentioned above, highly correlated responses can lead to better predictions in a multivariate analysis).

Application to the Spanish Electricity Market
The Spanish electricity market database used includes hourly data from 2013 and 2014. The target variables are the clearing price and the demand at time t. The explanatory variables include lagged prices (capturing short and daily periodicities), demand, and the energy introduced into the market by the different technologies (nuclear, coal, fuel gas, combined cycle, hydraulic, wind and total special regime). In addition, calendar variables (type of day, day of the week, hour of the day and month), whose categories carry information on the different price and demand patterns, have been considered. The generation structure has been included in the database to evaluate whether the proposed methodology is able to capture market behavior; for example, some technologies enter generation when demand is high and lead to high prices. The values of these variables are obtained from REE [1] and OMIE [2]. Table 1 summarizes them.

Variable Importance Analysis
As commented previously, a new algorithm to compute the variable importance ranking has been implemented because, as far as the authors know, none is available in the multi-output framework of RF-CI trees. It can be considered a generalization of the one proposed in [32]. The main differences are the following: A) an explanatory variable is randomly permuted when building the whole RF (with both responses), and the increase in the global MSE with respect to the MSE without permutation is computed. The greater the increase in the MSE, the more important the variable. Conversely, the greater the decrease in the MSE, the less important the variable, which could then be removed from the analysis. B) A joint function of the responses (price and demand) is defined, and the MSE is computed for this bivariate response function.
In our study, Eq. (2) has been used as the joint function, but other functions could be tried in the future.
The computational study involves an RF-CI of 100 trees, 3000 observations, mtry = 7 and α = 0.05, and OOB predictions have been used to evaluate variable importance. The values of the tuning parameters have been selected according to the study described in Section 2. The proposed methodology has been implemented in both the univariate and the multivariate framework of RF-CI to assess experimentally the consistency of the resulting rankings. Figures 6 through 8 show the results, in terms of the variation of the MSE of the whole RF, for univariate price, univariate demand and the joint function (Eq. 2), respectively. In these figures, the dashed vertical lines represent the MSE of the RF-CI when all predictors keep their true values. Predictors to the right of the dashed vertical line are significant, and the higher the MSE value, the more important the variable: if this variable is randomly permuted when building the forest, the MSE increases. In contrast, predictors to the left of the dashed vertical line have a negative effect on the MSE: if these variables are randomly permuted, the MSE decreases and better prediction accuracy is obtained. This result suggests that eliminating these variables from the analysis improves the prediction. The values in Table 2 complement the information displayed in Figures 6 through 8. For example, when the explanatory variable price.t1 is randomly permuted, the OOB-MSE for price increases to its highest value, 35.91. Thus, the one-hour lagged price (price.t1) is the most important variable for accurately predicting the price. For demand, the most important variables are those related to the energy produced by the different technologies and used to cover demand instantly, i.e. renewable energies, combined cycle and hydraulic, as well as some calendar variables such as the hour and the day type. Therefore, when the responses are considered separately (the univariate-output approach), the input variables strongly related to one response are less important for the other (for example, in RF-CI lagged prices are the least important variables for demand, and hydraulic production is the least important for price).
Figures 6 through 8, together with the exact values in Table 2, indicate that the multivariate variable importance analysis selects as the most important input variables those that explain each response separately (lagged values of price for price, or hour for demand) and those that provide a good explanation of both response variables, for example the energy produced by combined cycles (comb.cyc).
In summary, the most important variables for both price and demand are the one-hour, two-hour and 24-hour lagged prices, and the energy produced by combined cycles and coal (comb.cyc and carb, respectively). The rest of the variables have no strong influence (for better or worse) on their own, except for renewable energy production (reg.especial), day type (day.type) and the energy produced by fuel gas plants (fuel.gas). These variables have a slightly negative influence on the joint prediction when price is selected as the most important response variable in the multi-output analysis.
It should be highlighted that, due to the definition of the joint importance function, joint variable importance is very similar to that of univariate price.If another joint function had been defined, for example, giving equal importance to price and demand, the results would have been different.
The previous results, obtained from RF-CI, can be compared with those provided by non-conditional RF-CART evaluated for each response separately, presented in Figures 9 and 10 for price and demand, respectively. In these cases, the function varImpPlot from the R package randomForest has been used to evaluate variable importance, and the impurity-based measure IncNodePurity is displayed. For regression forests, this measure quantifies, for each explanatory variable, the total decrease in node impurity (measured by the residual sum of squares) from splits on that variable, averaged over all trees of the forest. The variable importance results for RF-CART are quite similar, which supports the approach proposed and followed for RF-CI.
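For regression, the impurity decrease produced by a single split is the residual sum of squares (RSS) of the parent node minus the RSS of its two children; IncNodePurity accumulates these decreases over all splits on a variable and averages them over the trees. A minimal Python sketch for one split follows (helper names and data are hypothetical).

```python
from typing import Sequence


def rss(values: Sequence[float]) -> float:
    """Residual sum of squares around the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def impurity_decrease(x: Sequence[float], y: Sequence[float], cut: float) -> float:
    """RSS decrease obtained by splitting a node on x <= cut versus x > cut."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    return rss(list(y)) - rss(left) - rss(right)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 5.0, 5.0]
print(impurity_decrease(x, y, cut=2.0))  # 16.0: the split separates the two groups perfectly
print(impurity_decrease(x, y, cut=1.0))  # about 5.333: a worse split
```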

Forecasting One Hour Ahead
In this section, the forecasting capabilities of multivariate RF-CI have been tested performing one hour-ahead forecasts for both price and demand.The multivariate and univariate approaches are compared.In the multivariate analysis, RF with conditional inference trees as base learners are used (RF-CI); however, in the univariate framework, RF-CART have been built as well.
The input variables are the set of exogenous variables previously defined in Section 3 and the lagged responses, including two new ones: the one-hour lagged predicted price and predicted demand. It is worth mentioning that the input variables are the same for both forecasting processes (price and demand), as is the case in the multi-output algorithm of partykit.
The strategy for performing predictions is to eliminate from the analysis those predictors identified as having a negative influence on the joint MSE (i.e., predictors whose elimination increases accuracy).
To allow comparison with similar analyses based on alternative models, the error metric has been changed to the MAPE, which is easier to interpret as a measure of forecast accuracy.
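The MAPE (Mean Absolute Percentage Error) is defined as follows, where $y_t$ is the observed value at hour $t$, $\hat{y}_t$ its forecast and $N$ the number of forecast hours:
$$\mathrm{MAPE}=\frac{100}{N}\sum_{t=1}^{N}\frac{|y_t-\hat{y}_t|}{|y_t|}$$
Since price (and, by extension, the price-demand function defined previously) is zero for many observations, the MAPE cannot be used directly, as it would require dividing by zero. Therefore, the so-called fixed MAPE has been used for price and for the price-demand function, with the mean of the observed values in the denominator:
$$\mathrm{MAPE}_{\mathrm{fixed}}=\frac{100}{N}\sum_{t=1}^{N}\frac{|y_t-\hat{y}_t|}{\bar{y}},\qquad \bar{y}=\frac{1}{N}\sum_{t=1}^{N}y_t$$
Once again, the joint function defined in Eq. (2) has been used, as well as the individual price and demand forecasts.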
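The two error measures can be written in a few lines of Python. This is a minimal sketch with hypothetical function names (mape, fixed_mape). It shows why the classical MAPE breaks down for hours with zero price, while the fixed MAPE, which divides every absolute error by the mean of the observed values, remains defined as long as that mean is non-zero.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Classical MAPE in percent; undefined when any observed value is zero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ZeroDivisionError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def fixed_mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fixed MAPE in percent: every absolute error is divided by the mean
    of the observed values, so individual zero observations are allowed."""
    _check(actual, predicted)
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ZeroDivisionError("fixed MAPE is undefined when the observed mean is zero")
    return 100.0 * sum(abs(a - p) for a, p in zip(actual, predicted)) / (len(actual) * mean_actual)
```

Equivalently, the fixed MAPE is 100 times the mean absolute error divided by the mean observed value, which makes it comparable across series with different levels.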
Because of its randomized construction, an RF-CI trained repeatedly on the same training set may produce slightly different outputs. Using different training sets introduces more variability in the results; however, the results hardly change from one set of bootstrap samples to another, i.e., the stabilizing effect of RF provides the robustness sought. Both the best and the worst forecasting results are reported below for comparison with other results, in particular those published for the Spanish market.
For all analyses, RF with 150 trees, mtry = 7 and α = 0.05 have been used (as selected in previous sections). The full set of observations (hourly data for the Spanish electricity market in 2013 and 2014, 17520 records) has been split into a training set (12270 observations) and a test set (5250 observations). The length of the training set has not been optimized; its influence on prediction accuracy will be analyzed in future work.
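The split sizes are consistent with two full years of hourly data (2 × 365 × 24 = 17520 records). The text does not state how the observations were assigned to each set. The Python sketch below assumes a chronological split, in which the first 12270 hours are used for training and the remaining 5250 for testing, a common choice in forecasting because it keeps future information out of the training set. The function name chronological_split is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

HOURS_PER_YEAR = 365 * 24        # 2013 and 2014 are not leap years
N_TOTAL = 2 * HOURS_PER_YEAR     # 17520 hourly records
N_TRAIN = 12270
N_TEST = N_TOTAL - N_TRAIN       # 5250


def chronological_split(records: Sequence[T], n_train: int) -> Tuple[List[T], List[T]]:
    """Assumed split: first n_train records for training, the rest for testing.

    No shuffling is done, so every test hour comes after every training hour.
    """
    if not 0 < n_train < len(records):
        raise ValueError("each set must contain at least one record")
    return list(records[:n_train]), list(records[n_train:])
```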
The prediction performance of the models is summarized in Table 3. Removing the variables that have a negative effect on the joint function error (as identified in the variable importance analysis of subsection 4.1) reduces both price and demand forecasting errors (row "Multivariate RF-CI NA", in bold in Table 3). This means that, when the multivariate analysis is carried out appropriately, selecting input variables can improve the forecasts of both responses. In our case, the variables related to renewable energy production, day type and fuel gas energy production have been removed, which allows other variables with a minor influence on the joint function to be selected more often and thus improves the forecasting accuracy of the algorithm.
Note also that removing other variables with a minor influence can improve price forecasts while worsening demand forecasts.
Finally, RF-CI and RF-CART have been used to perform univariate analyses (labelled Univariate in Table 3) for comparison with the multivariate RF-CI analysis. The comparison shows that the results are quite similar for both techniques (RF-CI and RF-CART) and slightly different from those of the multivariate analysis.
Univariate RF-CART systematically yields slightly higher errors than univariate RF-CI. The best forecasting results come from multivariate RF-CI when the input variables are first selected by variable importance analysis, and they are quite similar to those of univariate RF-CI for demand; in fact, the MAPE obtained in the multivariate analysis for demand (2.8299) lies between the best (2.7943) and the worst (2.8676) univariate RF-CI results.
In general terms, the results presented in Table 3 are similar. However, performing two univariate analyses doubles the computing time, since a single multivariate analysis takes about as long as one univariate analysis. As the forecasts have almost the same accuracy, the multivariate analysis can be considered more appropriate.
It is also noted that the multivariate forecasts are more accurate for demand than for price, which agrees with results widely reported in the literature.
Comparing the results of this research with those of studies of the Spanish electricity market based on tree-based models [19] and on the simultaneous prediction of load and price [23], the results are very promising.
For price alone, the analysis carried out in [19] for the Spanish electricity market in 2011 reported a mean MAPE of 6.02% for the third week of August (168 hours), obtained with RF-CART models. Amjady and Daraeepour [23] report MAPEs of 4.22%, 4.39%, 5.55% and 5.66% for four specific weeks of 2002 in the Spanish system, using an algorithm that clearly outperforms the other methods considered, whose mean MAPEs range from 6.76% to 9.96%. This comparison, presented in [23], includes both time series and machine learning-based models.
Regarding load prediction, [23] reports MAPEs of 0.99%, 1.10%, 1.02% and 1.08% for the same weeks of 2002 in the Spanish market, and 1.57% and 2.11% for January 2004 and July 2004, respectively, in the New York electricity market. These values indicate better accuracy than the other methods summarized in that paper, whose MAPEs range from 1.82% to 3.55%.
Overall, as discussed above, the accuracy of the results is good and of the same order of magnitude as that of other data mining models, although the methodology clearly needs further improvement, and the selection of tuning parameters has been identified as essential to ensure that the algorithm is reliable.
Since the computational effort needed to conduct two univariate analyses (270 minutes) is twice that of a single multivariate one (135 minutes), the latter is preferable.

Conclusions
RF-CI and decision tree algorithms prove to be a powerful, reliable and useful tool for data exploration, understanding and prediction. The results show that the methodology proposed and incorporated in the algorithm is able to identify the main drivers of price and demand in a meaningful way. From the variable importance assessment, the following conclusions can be drawn:
• Owing to the number of input variables and, in some cases, to their correlations, some input variables can be removed without affecting prediction accuracy.
• Price. The most important variables for price are one hour lagged price, combined cycle energy production and hour. In most cases, removing a single variable does not produce a significant change, and sometimes means a small …
• Demand. The most important variables for demand are renewable energy production, hour, day type and combined cycle energy production. Lagged prices are not important for demand. Demand seems to be less stable than price when a single variable is removed, but the algorithm still provides good predictions using the remaining input variables.
• Joint prediction. Due to the definition of the joint prediction function, its behavior is very similar to that of price. Removing some less influential variables allows others (previously masked by them) to be selected more often, which improves the predictions. For joint prediction, the most important variables are those that are important for both price and demand. Note, however, that price.t1 is not important for demand but extremely important for price, and it appears as the most important input variable for joint prediction.
Although renewable energy production is an important input variable for demand, its importance for joint prediction is minor; in fact, removing it improves the results. This can be explained by the high correlation between wind energy production and renewable energy production, which allows the algorithm to use wind energy as a covariate instead of renewable energies without losing accuracy. This behavior highlights the importance of studying the correlation between input variables.
The analysis of variable importance and correlations is recommended, since it allows the identification of input variables that reduce prediction accuracy. In future work, different joint functions should be tried.
Regarding forecast accuracy, the main conclusions can be summarized as follows:
• The best results have been obtained using multivariate RF-CI combined with a prior selection of input variables (i.e., removing those variables that decrease forecast accuracy). In this case, RF-CI emerges as a competitor to traditional forecasting algorithms, such as ARIMA models, and provides accuracy similar to that of other machine learning methods.
• Using all variables in multivariate RF-CI provides similar results, with a slight loss of accuracy, especially for demand. Univariate analysis performs similarly for demand but worse for price, where the advantage of the multivariate approach is larger.
• Since the computational effort needed to conduct two univariate analyses is twice that of a single multivariate one, the latter is preferable. In addition, the selection of tuning parameters has been identified as essential to ensure that the algorithm is reliable.
Overall, the results leave room for further methodological research and computational experiments on important aspects of the algorithm, such as the length of the training set and a meaningful choice of the joint function.

Figure 1. OOB-MSE for demand versus the number of trees in the forest.

Figure 2. OOB-MSE for price versus the number of trees in the forest.

Figure 6. Variable importance analysis for price.

Figure 7. Variable importance analysis for demand.

Figure 8. Variable importance analysis for the joint price-demand function.

Figure 9. Variable importance analysis for price using RF-CART.

Figure 10. Variable importance analysis for demand using RF-CART.

Table 1. Variables included in the database.

Table 2. Comparison of mean squared errors.

Table 3. Comparison of MAPE values for multivariate and univariate modelling.