Submitted:
29 July 2024
Posted:
30 July 2024
Abstract

Keywords:
1. Introduction
- In general, “ensemble learning” exhibits superior predictive ability compared with individual ML algorithms.
- Most of the previous works reviewed did not complete, or did not clearly describe, the ML preprocessing and data-analysis stage, which is fundamental to the development of ML algorithms.
- Few of the reported articles provide information on the number of points in the collected climate database (Table 1). This is a critical aspect, since the optimal training of ML algorithms depends strongly on the size and quality of the database. In addition, all of the reviewed work estimates solar radiation with a prediction horizon of one hour or more.
- None of the studies examined has applied the homogeneous ensemble algorithm known as Histogram-based Gradient Boosting (HGB) to predict solar radiation.
- None of the manuscripts proposes a comparative analysis of the prediction performance of the Voting and Stacking ensemble techniques built by combining homogeneous ensembles based on sequential and parallel learning (Table 1).
- A comparative analysis will be performed to select the subset of input features that best fits the characteristics of the climate data, employing five ML algorithms to reduce the dimensionality of the database.
- Histogram-based Gradient Boosting (HGB) is adopted for the first time to predict solar irradiance in a tropical climate with a time horizon of 1 minute.
- A new tool for solar radiation prediction is proposed, based on a heterogeneous ensemble learning algorithm that combines the homogeneous learners with the best performance.
- The prediction performance of the nine ensemble learning algorithms is evaluated with the MAE, MSE, RMSE, MAPE, and R2 metrics.
2. Description of Ensemble Learning Algorithms
2.1. Parallel Homogeneous Ensemble
2.1.1. Random Forest (RF)
2.1.2. Extremely Randomized Trees (ET)
2.2. Sequential Homogeneous Ensemble
2.2.1. Adaptive Boosting (AB)
2.2.2. Gradient Boosting (GB)
2.2.3. Extreme Gradient Boosting (XGB)
2.2.4. LightGBM (LGBM)
2.2.5. Histogram-Based Gradient Boosting (HGB)
2.3. Heterogeneous Ensemble Learning
2.3.1. Voting
2.3.2. Stacked Generalization
3. Materials and Methods
3.1. Data Collection
3.2. Data Preprocessing and Analysis
- The Arc. Int. and Heat D-D parameters were eliminated from the dataset, since they reported constant values (no variability).
- In the Pearson correlation matrix, pairs of correlated input predictors can be identified from correlation coefficients above 0.8 or below -0.8 (collinearity): Wind Chill, Heat Index, THW Index, and Cool D-D are each correlated with Temp Out; Wind Run is associated with Wind Speed; In EMC is related to In Hum; In Hum and In Temp are correlated, as are In Dew and Dew Pt.; and Rain is correlated with Rain Rate. In parallel, many measured meteorological parameters are strongly linearly correlated with their registered High (Hi) and Low values. This is likely because a very short update interval was set for the DAQ (one reading per minute), so for many parameters the measured value and the high/low registers do not differ. Consequently, the following input predictors were removed from the dataset to prevent the collinearity from propagating into the feature-subset selection process and possibly biasing the evaluation metrics: Wind Chill, Heat Index, THW Index, Cool D-D, Wind Speed, In EMC, In Hum, In Dew, Rain Rate, Hi Temp, Low Temp, Hi Speed, and Hi Solar Rad. (stored high and low values).
- The wind direction (Wind Dir) and high wind direction (Hi Dir) parameters could have some influence on solar radiation, based on Figure 5a-b. They were therefore converted to numerical values with the dummy-encoding technique and included in the correlation matrix under the labels WD_N, WD_NE, WD_NNE, HD_N, HD_NE, and HD_NNE. Figure 7 shows that WD_NNE and HD_N have an effect on solar radiation.
- The Solar Energy parameter is computed from solar radiation, so its collinearity is structural. Consequently, Solar Energy was not included in the dataset used for the feature-selection process.
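The preprocessing steps described above (dropping constant columns, dummy-encoding wind direction, and flagging collinear pairs with |r| > 0.8) can be sketched as follows; the column names match those of the station dataset, but the data and the collinear pair are synthetic for illustration.

```python
# Sketch of the preprocessing stage: drop constant columns, one-hot
# ("dummy") encode wind direction, and flag predictor pairs whose
# Pearson correlation exceeds |0.8|. Data is synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Temp Out": rng.normal(28, 3, 500),
    "Arc.Int": 1.0,                                # constant -> dropped
    "Wind Dir": rng.choice(["N", "NNE", "NE"], 500),
})
df["Heat Index"] = df["Temp Out"] * 1.1 + rng.normal(0, 0.1, 500)  # collinear pair

# 1) drop columns with no variability
df = df.loc[:, df.nunique() > 1]

# 2) dummy-encode the categorical wind direction
df = pd.get_dummies(df, columns=["Wind Dir"], prefix="WD")

# 3) flag highly correlated numeric pairs (|r| > 0.8)
corr = df.corr(numeric_only=True)
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and abs(corr.loc[a, b]) > 0.8]
print(high)  # expect the (Heat Index, Temp Out) pair to be flagged
```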
3.3. Splitting the Dataset
3.4. Standardization of the Dataset
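Sections 3.3 and 3.4 can be sketched together as below. The 80/20 split ratio is an assumption for illustration only; the key point is that the scaler statistics are fitted on the training split alone, so no test-set information leaks into standardization.

```python
# Sketch of splitting and standardization: hold out a test set, then
# standardize using statistics fitted on the training split only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).uniform(size=(100, 8))
y = X.sum(axis=1)

# 80/20 split (illustrative ratio, not stated in the text above)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_tr)        # fit on training data only
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
print(X_tr_s.mean(axis=0).round(6))        # ~0 per feature on the training split
```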
3.5. Feature Selection
3.5.1. The Pearson Coefficient
3.5.2. Recursive Feature Elimination (RFE)
3.5.3. SelectKBest (SKBest)
3.5.4. Sequential Feature Selection (SFS)
3.6. Training Process
3.7. Evaluation Metrics
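A short sketch of the five evaluation metrics (MAE, MSE, RMSE, MAPE, R2) computed with scikit-learn on a toy prediction vector; the manuscript's metric definitions are assumed to match scikit-learn's.

```python
# The five metrics used to evaluate the ensembles, on toy data.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

y_true = np.array([600.0, 650.0, 700.0, 500.0])   # measured irradiance [W/m2]
y_pred = np.array([590.0, 660.0, 680.0, 520.0])   # predicted irradiance [W/m2]

mae = mean_absolute_error(y_true, y_pred)               # [W/m2]
mse = mean_squared_error(y_true, y_pred)                # [W2/m4]
rmse = np.sqrt(mse)                                     # [W/m2]
mape = 100 * mean_absolute_percentage_error(y_true, y_pred)  # [%]
r2 = r2_score(y_true, y_pred)                           # [-]
print(mae, mse, rmse, mape, r2)
```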
4. Discussion and Results
4.1. Homogeneous Ensemble Learning
4.2. Heterogeneous Ensemble Learning
4.3. Generalization Capability
5. Conclusions
- Solar radiation measurements were distributed as follows: approximately 75.67% were captured when the wind was blowing from the north (N), 23.75% correspond to the north-northeast (NNE) direction, and only 0.09% were taken with the wind from the northeast (NE). The maximum average solar radiation, 676.45 W/m2, was recorded between 12:30 p.m. and 1:30 p.m. (the sixth hour of the daily solar sample).
- Recursive Feature Elimination (RFE) with Random Forest (RF) as the external model was the best method for selecting the subset of input features for the training process, outperforming the Pearson, univariate (SelectKBest), and Sequential Feature Selection (SFS) methods in terms of the R2 score.
- Overall, the Stacking ensemble built by combining Random Forest (RF), Extra Trees (ET), Gradient Boosting (GB), and Histogram-based Gradient Boosting (HGB) in the first layer, with linear regression in the second layer, provides the best accuracy and prediction performance (MSE = 3218.265, RMSE = 56.730, MAE = 29.872, MAPE = 10.60, R2 = 0.9645). However, it is heavily penalized by the computational cost of training, especially in the first layer. Therefore, when computational cost is a critical constraint, the homogeneous ensemble Histogram-based Gradient Boosting (HGB) is an excellent alternative: it offers metrics similar to Stacking (MSE = 3308.874, RMSE = 57.523, MAE = 30.839, MAPE = 10.7, R2 = 0.9631) and requires the lowest computational cost.
- In general, the ensemble learning algorithms developed here proved to be a powerful tool for predicting global solar radiation in Santo Domingo, in the Caribbean region, which is characterized by a tropical climate. They captured the tendency of solar radiation effectively and with excellent accuracy.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- UNFCCC. Conference of the Parties (COP): Adoption of the Paris Agreement, Proposal by the President. Paris Climate Change Conference, November 2015, COP 21.
- COP28 UN Climate Change Conference - United Arab Emirates | UNFCCC. Available online: https://unfccc.int/cop28 (accessed on 9 June 2024).
- IEA. Renewables 2023: Analysis and Forecast to 2028; IEA: Paris, 2024.
- Comisión Nacional de Energía (CNE). Plan Energético Nacional 2022-2036; Santo Domingo, D.N., 2022.
- Consultoría Jurídica del Poder Ejecutivo. Ley Núm. 57-07 sobre Incentivo al Desarrollo de Fuentes Renovables de Energía y de sus Regímenes Especiales; 10416, 2007.
- Consultoría Jurídica del Poder Ejecutivo. Ley Núm. 1-12 que Establece la Estrategia Nacional de Desarrollo 2030; 10656, 2012.
- Kumar, D.S.; Yagli, G.M.; Kashyap, M.; Srinivasan, D. Solar Irradiance Resource and Forecasting: A Comprehensive Review. IET Renew. Power Gener. 2020, 14, 1641–1656.
- Panda, S.; Dhaka, R.K.; Panda, B.; Pradhan, A.; Jena, C.; Nanda, L. A Review on Application of Machine Learning in Solar Energy Photovoltaic Generation Prediction. Proc. Int. Conf. Electron. Renew. Syst. (ICEARS) 2022, 1180–1184.
- Krishnan, N.; Kumar, K.R.; Inda, C.S. How Solar Radiation Forecasting Impacts the Utilization of Solar Energy: A Critical Review. J. Clean. Prod. 2023, 388, 135860.
- Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine Learning Methods for Solar Radiation Forecasting: A Review. Renew. Energy 2017, 105, 569–582.
- Guerrero, J.M.; Ponci, F.; Leligou, H.C.; Peñalvo-López, E.; Psomopoulos, C.S.; Sudharshan, K.; Naveen, C.; Vishnuram, P.; Venkata, D.; Krishna, S.; et al. Systematic Review on Impact of Different Irradiance Forecasting Techniques for Solar Energy Prediction. Energies 2022, 15, 6267.
- Rahimi, N.; Park, S.; Choi, W.; Oh, B.; Kim, S.; Cho, Y.H.; Ahn, S.; Chong, C.; Kim, D.; Jin, C.; et al. A Comprehensive Review on Ensemble Solar Power Forecasting Algorithms. J. Electr. Eng. Technol. 2023, 18, 719–733.
- Raza, M.Q.; Nadarajah, M.; Ekanayake, C. On Recent Advances in PV Output Power Forecast. Sol. Energy 2016, 136, 125–144.
- Kunapuli, G. Ensemble Methods for Machine Learning; Olstein, K., Miller, K., Eds.; Manning Publications: Shelter Island, NY, 2023; ISBN 9781617297137.
- Hassan, M.A.; Khalil, A.; Kaseb, S.; Kassem, M.A. Exploring the Potential of Tree-Based Ensemble Methods in Solar Radiation Modeling. Appl. Energy 2017, 203, 897–916.
- Benali, L.; Notton, G.; Fouilloy, A.; Voyant, C.; Dizene, R. Solar Radiation Forecasting Using Artificial Neural Network and Random Forest Methods: Application to Normal Beam, Horizontal Diffuse and Global Components. Renew. Energy 2019, 132, 871–884.
- Park, J.; Moon, J.; Jung, S.; Hwang, E. Multistep-Ahead Solar Radiation Forecasting Scheme Based on the Light Gradient Boosting Machine: A Case Study of Jeju Island. Remote Sens. 2020, 12, 2271.
- Lee, J.; Wang, W.; Harrou, F.; Sun, Y. Reliable Solar Irradiance Prediction Using Ensemble Learning-Based Models: A Comparative Study. Energy Convers. Manag. 2020, 208, 112582.
- Kumari, P.; Toshniwal, D. Extreme Gradient Boosting and Deep Neural Network Based Ensemble Learning Approach to Forecast Hourly Solar Irradiance. J. Clean. Prod. 2021, 279, 123285.
- Huang, L.; Kang, J.; Wan, M.; Fang, L.; Zhang, C.; Zeng, Z. Solar Radiation Prediction Using Different Machine Learning Algorithms and Implications for Extreme Climate Events. Front. Earth Sci. 2021, 9, 596860.
- Alam, M.S.; Al-Ismail, F.S.; Hossain, M.S.; Rahman, S.M. Ensemble Machine-Learning Models for Accurate Prediction of Solar Irradiation in Bangladesh. Processes 2023, 11, 908.
- Solano, E.S.; Affonso, C.M. Solar Irradiation Forecasting Using Ensemble Voting Based on Machine Learning Algorithms. Sustainability 2023, 15, 7943.
- Mohammed, A.; Kora, R. A Comprehensive Review on Ensemble Deep Learning: Opportunities and Challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774.
- González, S.; García, S.; Del Ser, J.; Rokach, L.; Herrera, F. A Practical Tutorial on Bagging and Boosting Based Ensembles for Machine Learning: Algorithms, Software Tools, Performance Study, Practical Perspectives and Opportunities. Inf. Fusion 2020, 64, 205–237.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Zhang, Y.; Liu, J.; Shen, W. A Review of Ensemble Learning Algorithms Used in Remote Sensing Applications. Appl. Sci. 2022, 12.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
- Khan, A.A.; Chaudhari, O.; Chandra, R. A Review of Ensemble Learning and Data Augmentation Models for Class Imbalanced Problems: Combination, Implementation and Evaluation. Expert Syst. Appl. 2024, 244, 122778.
- Freund, Y.; Schapire, R.E. Experiments with a New Boosting Algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML); 1996; Vol. 96, pp. 148–156.
- Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
- Friedman, J.H. Stochastic Gradient Boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
- Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, 2009; ISBN 9780387848587.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM, 13 August 2016; pp. 785–794.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- scikit-learn documentation: Histogram-Based Gradient Boosting Regression Tree.
- Wolpert, D.H. Stacked Generalization. Neural Networks 1992, 5, 241–259.
- Li, Y.; Chen, W. A Comparative Performance Assessment of Ensemble Learning for Credit Scoring. Mathematics 2020, 8, 1–19.
- Ruiz-Valero, L.; Arranz, B.; Faxas-Guzmán, J.; Flores-Sasso, V.; Medina-Lagrange, O.; Ferreira, J. Monitoring of a Living Wall System in Santo Domingo, Dominican Republic, as a Strategy to Reduce the Urban Heat Island. Buildings 2023, 13, 1222.
- Pena, J.C.; Gordillo, G. Photovoltaic Energy in the Dominican Republic: Current Status, Policies, Currently Implemented Projects, and Plans for the Future. Int. J. Energy Environ. Econ. 2020, 26, 270–284.
- The World Bank (2020). Solar Resource Maps of Dominican Republic. Source: Global Solar Atlas 2.0; solar resource data: Solargis. Available online: https://solargis.com/maps-and-gis-data/download/dominican-republic (accessed on 6 June 2024).
- Elgeldawi, E.; Sayed, A.; Galal, A.R.; Zaki, A.M. Hyperparameter Tuning for Machine Learning Algorithms Used for Arabic Sentiment Analysis. Informatics 2021, 8, 79.













| Refs. | Location | Feature subset selected | Ensemble algorithms tested | Time horizon | Data points | Period | Metrics | Complete preprocessing stage |
|---|---|---|---|---|---|---|---|---|
| 15 | Cairo, Ma'an, Ghardaia, Tataouine, Tan-Tan | 4 | BG, GB, RF, SVM*, MLP-NN | 1 hour | 71,499 | 2010 to 2013 | MBE, R2, RMSE | ✗ |
| | | 3 | BG, GB, RF, SVM*, MLP-NN | 1 day | 7,906 | | | |
| 16 | Odeillo (France) | | SP, MLP-NN, RF* | 1 to 6 hours | 10,559 | 3 years | MAE, RMSE, nRMSE, nMAE | ✗ |
| 17 | Jeju Island (South Korea) | 330 | LGBM, RF, GB, DNN | 1 hour | 32,134 | 2011 to 2018 | MBE, RMSE, MAE, NRMSE | ✗ |
| 18 | California, Texas, Washington, Florida, Pennsylvania, Minnesota | 9 | BS, BG, RF, GRF*, SVM, GPR | 1 hour | - | A year (TMY3) | RMSE, MAPE, R2 | ✗ |
| 19 | New Delhi, Jaipur, Gangtok | 8 | Stacking* (XGB+DNN) | 1 hour | - | 2005 to 2014 | RMSE, MBE, R2 | ✓ |
| 20 | Bangladesh | 7 | GB*, AGB, RF, BG | - | 3,060 | 1999 to 2017 | MAPE, RMSE, MAE, R2 | ✗ |
| 21 | Ganzhou | 10 | GB*, XGB*, AB, RF, SVM, ELM, DT, KNN, MLR, RBFNN, BPNN | 1 day | 13,100 | 1980 to 2016 | RMSE, MAE, R2 | ✗ |
| | | | | 1 month | 432 | | | |
| 22 | El Salvador (Brazil) | 9 | Voting*, XGB, RF, CatBoost, AdaBoost | 1 to 12 hours | | | MAE, MAPE, RMSE, R2 | ✓ |
| This work | Santo Domingo | 8 | RF, ET, GB, XGB, HGB*, LGBM, Voting, Stacking* | 1 min | 78,536 | 5 months (2022) | MSE, MAE, RMSE, R2, MAPE | ✓ |
| Model | Processors | Memory | Graphics Card | Hard Disk |
|---|---|---|---|---|
| Dell OptiPlex 7000 | 12th Gen Intel Core i7-12700 | 32 GB DDR4 | Intel integrated graphics | 1 TB PCIe NVMe |
| # | Parameter/Feature | Description | Range | Accuracy (±) |
|---|---|---|---|---|
| 1 | Date | Month/day | | 8 s/month |
| 2 | Time | 24-hour clock | | 8 s/month |
| 3 | Temp Out | Outside (ambient) temperature | -40°C to 65°C | 0.3°C |
| 4 | Hi Temp | High outside temperature recorded over a given period | | |
| 5 | Low Temp | Low outside temperature recorded over a given period | | |
| 6 | In Temp | Inside temperature (sensor located at the console) | 0°C to 60°C | 0.3°C |
| 7 | Out Hum | Outside relative humidity | 1% to 100% | 2% RH |
| 8 | In Hum | Inside relative humidity at the console | 1% to 100% | 2% RH |
| 9 | Dew Pt. | Dew point | -76°C to 54°C | 1°C |
| 10 | In Dew | Inside dew point at the console | -76°C to 54°C | 1°C |
| 11 | Wind Speed | Speed of the outside local wind | 0 to 809 m/s | > 1 m/s |
| 12 | Hi Speed | High wind speed recorded in the configured period | | |
| 13 | Wind Dir | Wind direction | 0° to 360° | 3° |
| 14 | Hi Dir | High wind direction recorded over a given period | | |
| 15 | Wind Run | The "amount" of wind passing the station per unit time | | |
| 16 | Wind Chill | Apparent-temperature index calculated from wind speed and air temperature | -79°C to 57°C | 1°C |
| 17 | Heat Index | Apparent-temperature index combining temperature and relative humidity to estimate perceived heat | -40°C to 74°C | 1°C |
| 18 | THW Index | Uses temperature, humidity, and wind to estimate an apparent-temperature index | -68°C to 74°C | 2°C |
| 19 | THSW Index | Combines temperature, humidity, sun, and wind to estimate an apparent-temperature index (feels-like, out in the sun) | | |
| 20 | Bar | Barometric pressure | 540 to 1100 mb | 1.0 mb |
| 21 | Rain | Amount of rainfall (daily/monthly/yearly) | to 6553 mm | > 4% |
| 22 | Rain Rate | Rainfall intensity | to 2438 mm/h | > 5% |
| 23 | Solar Rad. | Solar radiation, including both direct and diffuse components | 0 to 1800 W/m2 | 5% FS |
| 24 | Hi Solar Rad. | High solar radiation recorded over a given period | | |
| 25 | Solar Energy | Solar radiation accumulated over time | | |
| 26 | Heat D-D | Heating degree days | | |
| 27 | Cool D-D | Cooling degree days | | |
| 28 | In Heat | Inside heat index at the console location | -40°C to 74°C | |
| 29 | In EMC | Inside equilibrium moisture content | | |
| 30 | In Density | Inside air density at the console location | 1 to 1.4 kg/m3 | 2% FS |
| 31 | ET | Evapotranspiration: the amount of water vapor returned to the air in a specific area through both evaporation and transpiration | to 1999.9 mm | > 5% |
| 32 | Wind Samp | Number of wind-speed samples in the "Arc. Int." interval | | |
| 33 | Wind Tx | RF channel for wind data | | |
| 34 | ISS Recept | RF reception (%) | | |
| 35 | Arc. Int. | Archival interval in minutes | | |
| Selection method | Subset of features selected | Characteristics | Ensemble algorithm | R2 score (test set) |
|---|---|---|---|---|
| Pearson | Temp Out, Out Hum, Dew Pt., THSW Index, Bar, Rain, In Temp, In Density | < -0.1 | GB | 0.924 |
| | | | AGB | 0.822 |
| | | | XGB | 0.958 |
| | | | ET | 0.958 |
| | | | RF | 0.954 |
| RFE | In Temp, In Density, Out Hum, Bar, THSW Index, Wind Speed, Dew Pt., Temp Out | External ML algorithm = RF; RF = {n_estimators: 350, criterion: squared_error, max_depth: 15, max_features: sqrt} | GB | 0.930 |
| | | | AGB | 0.843 |
| | | | XGB | 0.962 |
| | | | ET | 0.964 |
| | | | RF | 0.960 |
| SKBest | Temp Out, Out Hum, Dew Pt., Wind Speed, THSW Index, Bar, Rain, In Temp | Score function = regression; number of features to select = 8 | GB | 0.929 |
| | | | AGB | 0.843 |
| | | | XGB | 0.962 |
| | | | ET | 0.964 |
| | | | RF | 0.959 |
| SFS-FW | Temp Out, Out Hum, Dew Pt., Wind Speed, THSW Index, In Density, WD_NNE, HD_N | External ML algorithm = LR; direction = forward; scoring = R2; cross-validation = KFold {folds: 5, shuffle: no} | GB | 0.930 |
| | | | AGB | 0.833 |
| | | | XGB | 0.962 |
| | | | ET | 0.961 |
| | | | RF | 0.957 |
| SFS-BW | Temp Out, Out Hum, Dew Pt., Wind Speed, THSW Index, Bar, In Temp, In Density | External ML algorithm = LR; direction = backward; scoring = R2; cross-validation = KFold {folds: 5, shuffle: no} | GB | 0.930 |
| | | | AGB | 0.833 |
| | | | XGB | 0.962 |
| | | | ET | 0.962 |
| | | | RF | 0.957 |
| Algorithm | Iterations (n_iter)/cores | Selected hyperparameters | Computational cost (s) |
|---|---|---|---|
| RF | 1000/8 | n_estimators: 1160, max_features: 8, min_samples_leaf: 7, max_depth: 17, min_samples_split: 10 | 51605.280 |
| ET | 1000/8 | n_estimators: 630, min_samples_split: 10, min_samples_leaf: 1, max_depth: 23, max_features: 8 | 126830.010 |
| AGB | 250/8 | n_estimators: 100, loss: exponential, learning_rate: 0.201 | 10877.330 |
| GB | 1500/8 | n_estimators: 2200, min_weight_fraction_leaf: 0, min_samples_split: 250, min_samples_leaf: 40, max_leaf_nodes: 10, max_features: 8, max_depth: 18, loss: huber, learning_rate: 0.101, criterion: friedman_mse, alpha: 0.210, tol: 1e-06, subsample: 0.1 | 16221.750 |
| XGB | 1500/8 | tree_method: hist, n_estimators: 2600, subsample: 0.9, scale_pos_weight: 0.05, reg_lambda: 0.89, reg_alpha: 0.2, min_child_weight: 10, max_depth: 5, learning_rate: 0.01, gamma: 0.05, colsample_bytree: 0.79 | 56941.400 |
| HGB | 1500/8 | quantile: 1, min_samples_leaf: 49, max_iter: 680, max_depth: 5, loss: absolute_error, learning_rate: 0.101, l2_regularization: 0.0 | 10296.756 |
| LGBM | 1500/8 | n_estimators: 2200, boosting_type: dart, subsample_freq: 4, subsample: 0.5, reg_lambda: 2.40, reg_alpha: 0.0, num_leaves: 31, min_sum_hessian_in_leaf: 19, min_data_in_leaf: 21, max_depth: 10, max_bin: 70, learning_rate: 0.1, colsample_bytree: 0.5, bagging_seed: 96, bagging_freq: 6, bagging_fraction: 0.3, objective: regression, force_row_wise: True | 126833.010 |
| Voting | -/8 | Average of the outputs of {HGB, ET, GB, RF} | 900.541 |
| Stacking | -/8 | Layer 1: {HGB, ET, GB, RF}; Layer 2: {LinearRegression}; cross-validation: KFold {5 folds, no shuffle} | 2100.780 |
Evaluation metrics on the test set:

| Ensemble learning | MSE [W2/m4] | RMSE [W/m2] | MAE [W/m2] | MAPE [%] | R2 [-] |
|---|---|---|---|---|---|
| RF | 4243.296 | 65.141 | 33.745 | 9.20 | 0.9538 |
| ET | 3795.275 | 61.606 | 30.722 | 8.40 | 0.9584 |
| XGB | 3515.760 | 59.294 | 33.460 | 12.90 | 0.9608 |
| AGB | 8739.339 | 93.484 | 70.992 | 49.11 | 0.9027 |
| GB | 3499.137 | 59.154 | 31.977 | 11.8 | 0.9610 |
| HGB | 3308.874 | 57.523 | 30.839 | 10.7 | 0.9631 |
| LGBM | 3494.692 | 59.116 | 33.883 | 16.00 | 0.9611 |
| Stacking | 3218.265 | 56.730 | 29.872 | 10.60 | 0.9645 |
| Voting | 3346.470 | 57.849 | 29.220 | 10.40 | 0.9627 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
