Submitted: 18 September 2024
Posted: 20 September 2024
Abstract

Keywords:
1. Introduction
1.1. Research Questions and Hypotheses
- RQ1: Which features (e.g., previously consumed power, temperature, irradiation) are most influential in predicting power consumption in academic vs. industrial buildings?
- RQ2: To what extent do climatic conditions (oceanic vs. continental Mediterranean) influence the predictive accuracy of power consumption models in different types of buildings?
- RQ3: What impact do the frequency of data collection (15-minute intervals for the Academic building vs. hourly for the Industrial building) and the split between train, validation, and test sets have on the performance of these machine learning models?
- RQ4: How does the choice of kernel in SVM (Radial vs. Sigmoid) affect the model's ability to capture non-linear relationships in power consumption data from different building types?
- RQ5: How does the accuracy of power consumption predictions vary among different machine learning models (RF, SVM with Radial and Sigmoid kernels, DNN) when applied to buildings with distinct architectural functions (Academic vs. Industrial)?
- H1: Humidity and occupancy rates will be the most influential features in predicting power consumption in the Academic building, while temperature and equipment usage will be more critical in the Industrial building.
- H2: Climatic conditions will have a more significant impact on prediction accuracy in the Industrial building than in the Academic building, due to the extreme temperature fluctuations typical of continental Mediterranean climates.
- H3: The prediction accuracy of power consumption models will be higher for the Academic building, which has a higher frequency of data collection (15-minute intervals), than for the Industrial building, where data are collected hourly.
- H4: The Radial kernel in SVM will provide better predictive accuracy for power consumption in both buildings than the Sigmoid kernel, due to its superior ability to model complex, non-linear relationships in the data.
- H5: The DNN model will outperform RF and SVM in predicting power consumption in both the Academic and Industrial buildings due to its ability to capture complex non-linear relationships.
2. Methodology
2.1. Data Collection
2.2. Data Preprocessing
2.2.1. Data Cleaning
2.2.2. Outlier Identification
2.2.3. Normalization
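The input descriptions in Section 3.3 state that the fractional features are normalized between 0 and 1, which matches min-max scaling. The sketch below assumes min-max scaling whose minimum and maximum are fitted on the training portion only and then applied unchanged to validation and test data; the function names and sample values are hypothetical, not taken from the study.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) observed in the training data."""
    return min(train), max(train)


def apply_min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        # Constant training data: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical consumption readings in kW (not data from the study).
train_kw = [10.0, 20.0, 30.0, 50.0]
test_kw = [25.0, 60.0]

lo, hi = fit_min_max(train_kw)
print(apply_min_max(train_kw, lo, hi))  # [0.0, 0.25, 0.5, 1.0]
print(apply_min_max(test_kw, lo, hi))   # [0.375, 1.25]
```

Values outside the training range (such as 60 kW here) map outside [0, 1]; whether such values were clipped in the study is not stated.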
2.2.4. Feature Selection
2.2.5. Data Division
- Case 1: approximately 72% of the data allocated to training, 18% to validation, and 10% to testing.
- Case 2: approximately 80% of the data allocated to training, 10% to validation, and 10% to testing.
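The two splits are related: the Case 1 proportions are what result from first holding out 10% of the data for testing and then dividing the remaining 90% into 80% training and 20% validation (0.9 × 0.8 = 0.72 and 0.9 × 0.2 = 0.18), although the text does not say the split was built this way. The sketch below assumes a contiguous, chronological split; the ordering, the helper name `chronological_split`, and the sample data are assumptions, not details from the study.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    series: Sequence[T], train_frac: float, val_frac: float
) -> Tuple[List[T], List[T], List[T]]:
    """Split a time-ordered series into contiguous train, validation and test blocks.

    The test block receives every sample left after train and validation.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    n = len(series)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = list(series[:n_train])
    val = list(series[n_train:n_train + n_val])
    test = list(series[n_train + n_val:])
    return train, val, test


samples = list(range(1000))  # stand-in for 1,000 time-ordered samples
case1 = chronological_split(samples, 0.72, 0.18)
case2 = chronological_split(samples, 0.80, 0.10)
print([len(part) for part in case1])  # [720, 180, 100]
print([len(part) for part in case2])  # [800, 100, 100]
```

A contiguous split keeps later samples out of training, which matters here because the Pprev 24h and Pprev 48h inputs are lagged copies of the target.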
2.3. Prediction Models
- RF is particularly useful due to its ability to handle multiple features and capture complex patterns. RF constructs an ensemble of decision trees, each trained on a random bootstrap sample of the dataset with a random subset of features considered at each split; averaging the trees' outputs not only enhances accuracy but also helps mitigate overfitting. RF results are also relatively easy to interpret, since feature-importance scores show which features are most influential in the predictions.
- SVM is powerful for prediction tasks because it handles both linear and non-linear relationships in data. In classification, SVM seeks the optimal hyperplane that maximizes the margin between classes; in its regression form (Support Vector Regression, SVR), it seeks the flattest function that fits the data within a tolerance margin ε. It is effective with high-dimensional datasets and provides robust solutions. The choice of kernel function is crucial in SVM, as it defines how the data are implicitly mapped into a higher-dimensional feature space, and it strongly affects SVM’s predictive power and generalization ability.
  - The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel or simply the Radial kernel, is popular due to its efficiency in modeling non-linear relationships. However, it may overfit small or noisy datasets.
  - The Sigmoid kernel is another option that offers flexibility in modeling various relationships, but it may underperform compared to the RBF kernel in highly non-linear settings.
- DNN is a deep learning architecture consisting of multiple layers of interconnected neurons. DNNs are particularly effective at learning complex, non-linear representations of data, which makes them suitable for tasks involving intricate patterns and relationships. One of their primary advantages is the ability to learn relevant features automatically from the data. Due to its effectiveness and efficiency, the Rectified Linear Unit (ReLU) activation function is chosen. ReLU speeds up training, mitigates the vanishing gradient problem because its gradient is 1 for all positive inputs, and produces sparse activations that may help limit overfitting.
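RQ4 and H4 hinge on the difference between the two SVM kernels. In the parameterization documented by scikit-learn, the RBF kernel is K(x, x′) = exp(−γ‖x − x′‖²) and the sigmoid kernel is K(x, x′) = tanh(γ⟨x, x′⟩ + r), with r called `coef0`. The sketch below evaluates both on a few hypothetical normalized input vectors; the `gamma` and `coef0` values are illustrative assumptions, since the study's hyperparameters and software are not stated in this extract.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Radial (Gaussian) kernel: exp(-gamma * ||x - y||^2), in (0, 1] for gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x: Vector, y: Vector, gamma: float, coef0: float) -> float:
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0), in (-1, 1)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def gram_matrix(
    points: Sequence[Vector], kernel: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


# Hypothetical normalized inputs: (Pprev 24h, holiday, base power, working hours).
X = [(0.8, 0.0, 1.0, 1.0), (0.2, 1.0, 1.0, 0.0), (0.5, 0.0, 0.0, 1.0)]

K_rbf = gram_matrix(X, lambda p, q: rbf_kernel(p, q, gamma=0.5))
K_sig = gram_matrix(X, lambda p, q: sigmoid_kernel(p, q, gamma=0.5, coef0=-1.0))

for row in K_rbf:
    print([round(v, 3) for v in row])
for row in K_sig:
    print([round(v, 3) for v in row])
```

The RBF similarity decays smoothly with distance and equals 1 on the diagonal, whereas the sigmoid kernel can be negative (here for the second and third points) and is not positive semi-definite for every (`gamma`, `coef0`) choice. In scikit-learn the two settings correspond to `SVR(kernel="rbf", gamma=...)` and `SVR(kernel="sigmoid", gamma=..., coef0=...)`.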
2.4. Evaluation Metrics of the Models
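The metric definitions are not reproduced in this extract, but the discussion reports RMSE, MAE, MAPE, R², and CV. The sketch below uses the standard definitions of the first four and assumes that CV denotes the coefficient of variation of the RMSE, i.e. RMSE divided by the mean observed consumption, in percent; that reading, the function name, and the sample values are assumptions.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Point-forecast error metrics; CV is taken as CV(RMSE) in percent (assumption)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    rmse = math.sqrt(ss_res / n)
    return {
        "RMSE": rmse,
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined if any actual value is 0.
        "MAPE": 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n,
        # R² compares the residual error with the variance of the observations.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "CV": 100.0 * rmse / mean_actual,
    }


# Hypothetical observations and forecasts in kW.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]
for name, value in regression_metrics(observed, forecast).items():
    print(f"{name}: {value:.3f}")
```

Because MAPE and CV are relative to the consumption level, they allow a fairer comparison between buildings whose mean loads differ than RMSE or MAE alone.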
3. Case Study
3.1. Buildings Description
3.2. Data Analysis and Correlations
3.2.1. Pearson Correlation
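The correlation table reports coefficients separately for working hours and off-hours, and for workdays and holidays. Below is a minimal sketch of the Pearson coefficient r = cov(x, y) / (σx σy), computed only on the samples selected by a boolean mask; the mask-based conditioning is one plausible way to obtain such a table, and all names and values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r = cov(x, y) / (std(x) * std(y)); NaN if either series is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def pearson_where(x: Sequence[float], y: Sequence[float], mask: Sequence[bool]) -> float:
    """Pearson r restricted to the samples where mask is True (e.g. working hours)."""
    pairs = [(a, b) for a, b, keep in zip(x, y, mask) if keep]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])


# Hypothetical hourly samples: air temperature (°C), consumption (kW), working-hours flag.
temp = [12.0, 15.0, 18.0, 21.0, 9.0, 8.0, 10.0]
power = [40.0, 46.0, 50.0, 57.0, 20.0, 22.0, 21.0]
working = [True, True, True, True, False, False, False]

print(round(pearson_where(temp, power, working), 3))
print(round(pearson_where(temp, power, [not w for w in working]), 3))
```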
3.3. Input Data
- Pprev 24h. The power consumption at the same time of day 24 hours earlier, normalized between 0 and 1.
- Pprev 48h. The power consumption at the same time of day 48 hours earlier, normalized between 0 and 1.
- Holiday. Boolean value indicating whether the time to be predicted falls on a holiday (‘1’) or not (‘0’).
- Base power. Boolean value indicating whether there is a base load at the time to be predicted (‘1’) or not (‘0’).
- Working hours. Boolean value indicating whether the time to be predicted falls within working hours (‘1’) or not (‘0’).
- Air temperature. The air temperature at that time, normalized between 0 and 1.
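Because the Academic data are sampled every 15 minutes and the Industrial data hourly, the same lag corresponds to a different number of samples: 24 h is 96 steps for the Academic building and 24 steps for the Industrial building (48 h gives 192 and 48). A minimal sketch of building the Pprev features, assuming a gap-free, evenly spaced series; the function names and data are hypothetical.

```python
from typing import List, Optional, Sequence


def lag_steps(hours: int, minutes_per_step: int) -> int:
    """Number of samples spanning `hours` at the given sampling interval."""
    total_minutes = hours * 60
    if total_minutes % minutes_per_step:
        raise ValueError("lag must be a whole number of sampling steps")
    return total_minutes // minutes_per_step


def lag_feature(series: Sequence[float], hours: int, minutes_per_step: int) -> List[Optional[float]]:
    """Value observed `hours` earlier at each step; None where no history exists yet."""
    lag = lag_steps(hours, minutes_per_step)
    n = len(series)
    if lag >= n:
        return [None] * n
    return [None] * lag + list(series[: n - lag])


print(lag_steps(24, 15), lag_steps(48, 15))  # 96 192  (Academic, 15-min data)
print(lag_steps(24, 60), lag_steps(48, 60))  # 24 48   (Industrial, hourly data)

# Hypothetical hourly consumption over three days (kW).
hourly_kw = [float(20 + (h % 24)) for h in range(72)]
pprev_24h = lag_feature(hourly_kw, 24, 60)
pprev_48h = lag_feature(hourly_kw, 48, 60)
print(pprev_24h[23], pprev_24h[24], pprev_48h[48])  # None 20.0 20.0
```

Rows that still contain `None` would be dropped before training, and the lagged values would then be normalized between 0 and 1 as described in the list above.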
4. Experimental Results and Discussion
4.1. Analysis of Results
4.2. Discussion
- During the analysis of consumption data and its correlation with environmental factors, it was observed that the features in the Basics package, which are directly related to occupancy rate and equipment usage, carry the most weight for both the Academic and Industrial buildings. In general, environmental factors showed low correlation with consumption. However, in the Industrial building, air temperature is highly correlated with power consumption during working hours in certain months, which justifies its use as a feature in this context.
- As previously noted, air temperature and power consumption are highly correlated during specific months and working hours in the Industrial building, a pattern not observed in the Academic building. However, when the Basics+AirTemp configuration is evaluated with metrics such as RMSE, MAE, and R², it outperforms the Basics configuration only in Case 2 with the SVM-Radial model, and in many instances it performs even worse than the Persistent Model.
- Overall, better RMSE values were achieved in the Academic building compared to the Industrial one. Additionally, lower MAPE and CV values were observed, along with higher R² scores, indicating that the models performed more effectively in the environment with 15-minute data intervals than in the one with hourly intervals. Furthermore, the absolute error distributions in the Academic context were much more concentrated around zero.
- In all cases studied across both buildings except Case 2 in the Industrial building, the SVM-Radial model outperformed the SVM-Sigmoid model. The absolute error distribution clearly indicates better performance of SVM-Radial in the Academic building; although the difference is less pronounced in the Industrial building, a higher concentration of errors around zero is still observed.
- Firstly, in the Academic building, the evaluation metrics show similar behavior for the DNN and RF models. However, the RF model generally performs better, with the lowest RMSE and a narrow, sharply peaked absolute error distribution. Secondly, in the Industrial building, the DNN model does not perform as well as the SVM-Radial or RF models, with RF being slightly superior.
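The discussion benchmarks the models against the Persistent Model (PM), which this extract does not define. A common persistence baseline, assumed here, repeats the value observed a fixed interval earlier; with a 24 h interval its forecast coincides with the Pprev 24h input. The sketch computes the PM forecast and compares its RMSE with that of a stand-in model; the names and data are hypothetical.

```python
import math
from typing import List, Sequence


def persistence_forecast(series: Sequence[float], lag_steps: int) -> List[float]:
    """Forecast series[t] as series[t - lag_steps] for every t >= lag_steps."""
    if not 0 < lag_steps < len(series):
        raise ValueError("lag_steps must be positive and shorter than the series")
    return list(series[: len(series) - lag_steps])


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hourly consumption (kW): day 2 repeats day 1 shifted up by 1 kW.
day1 = [float(10 + h % 12) for h in range(24)]
day2 = [v + 1.0 for v in day1]
series = day1 + day2

lag = 24  # one day at hourly resolution
actual = series[lag:]
pm_pred = persistence_forecast(series, lag)
model_pred = [v + 0.5 for v in day1]  # stand-in for a trained model's output

print(rmse(actual, pm_pred), rmse(actual, model_pred))  # 1.0 0.5
```

A model is only useful to the extent that it improves on this baseline, which is why results worse than PM, as reported for several Basics+AirTemp runs, count against that configuration.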
5. Summary and Conclusions
Author Contributions
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| ANN | Artificial Neural Network |
| CNN | Convolutional Neural Network |
| DNN | Deep Neural Network |
| IoT | Internet of Things |
| LSTM | Long Short-Term Memory |
| ML | Machine Learning |
| PM | Persistent Model |
| RF | Random Forest |
| SVM | Support Vector Machine |
| ZEB | Zero-Emission Building |









| Raw data | Length | Frequency | Days | Min. (kW) | Max. (kW) | Mean (kW) | Std. dev. (kW) |
|---|---|---|---|---|---|---|---|
| Academic | 92,160 | 15 min | 960 | 0 | 10,410 | 20.24 | 40.87 |

| Clean data | Length | Frequency | Days | Min. (kW) | Max. (kW) | Mean (kW) | Std. dev. (kW) |
|---|---|---|---|---|---|---|---|
| Academic | 69,216 | 15 min | 721 | 0.15 | 64.66 | 23.65 | 14.74 |
| Industrial | 8,759 | 1 h | 365 | 10 | 119 | 41.82 | 23.21 |

| Feature | Academic: working hours | Academic: off-hours | Industrial: working hours | Industrial: off-hours |
|---|---|---|---|---|
| Temperature | 0.205508 | 0.012566 | -0.196301 | -0.177106 |
| Irradiation | 0.176214 | 0 | 0.033515 | -0.033712 |

| Feature | Academic: workday | Academic: holiday | Industrial: workday | Industrial: holiday |
|---|---|---|---|---|
| Pprev 24 h | 0.816096 | 0.926576 | 0.887726 | 0.154234 |
| Pprev 48 h | 0.710268 | 0.915227 | 0.854311 | 0.272240 |

| Label | Type | Academic: Basics | Academic: Basics +48h | Industrial: Basics | Industrial: Basics +48h | Industrial: Basics +AirTemp | Industrial: Compl. |
|---|---|---|---|---|---|---|---|
| Pprev 24 h | Fractional | • | • | • | • | • | • |
| Holiday | Boolean | • | • | • | • | • | • |
| Base Power | Boolean | • | • | • | • | • | • |
| Work hours | Boolean | • | • | • | • | • | • |
| Air temp. | Fractional | | | | | • | • |
| Pprev 48 h | Fractional | | • | | • | | • |


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).