Submitted: 27 September 2023
Posted: 28 September 2023
Abstract

Keywords:
1. Introduction
- To adapt to uncertainty, we propose two modules for robust scheduling. The first combines deep learning-based prediction techniques with stochastic optimization methods; the second is an online data augmentation strategy with model pre-training and fine-tuning stages.
- To share rewards among buildings, we use multi-agent PPO, with one agent representing each building. In addition, we provide an ensemble of reinforcement learning and optimization methods.
- The proposed method won first place in the NeurIPS Challenge Competition. We conduct extensive experiments on real-world scenarios, and the results demonstrate the effectiveness of the proposed framework.
2. Problem Statement
- (1a)
- The electricity drawn from the national grid is bounded below by zero and has no upper bound.
- (1b)
- The output of each electricity generation device, such as solar generation, is bounded between a lower limit and an upper limit.
- (1c)
- The battery/storage charging power at timestamp t is bounded by an upper charging limit, and the discharging power by an upper discharging limit.
- (1d)
- The state of charge (SoC) is bounded between a lower value and an upper value; the second equation describes how the SoC is updated.
- (1e)
- This equation keeps the power grid stable: the sum of power generation equals the sum of power consumption.
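The constraints (1a) to (1e) described above can be sketched as a feasibility check for a single building. This is a minimal sketch under stated assumptions, not the paper's formulation: the names (`Limits`, `is_feasible`, `grid`, `gen`, `charge`, `discharge`, `load`), the non-negativity of charging and discharging, and the lossless SoC update are all illustrative choices that do not come from the original text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Limits:
    """Hypothetical per-building limits (names are illustrative only)."""
    gen_min: float        # lower bound of generation, constraint (1b)
    gen_max: float        # upper bound of generation, constraint (1b)
    charge_max: float     # upper limit for charging, constraint (1c)
    discharge_max: float  # upper limit for discharging, constraint (1c)
    soc_min: float        # lower SoC value, constraint (1d)
    soc_max: float        # upper SoC value, constraint (1d)


def is_feasible(
    grid: Sequence[float],
    gen: Sequence[float],
    charge: Sequence[float],
    discharge: Sequence[float],
    load: Sequence[float],
    soc0: float,
    lim: Limits,
    tol: float = 1e-9,
) -> bool:
    """Check a schedule against sketched versions of (1a) to (1e)."""
    soc = soc0
    for t in range(len(load)):
        # (1a) grid electricity is bounded below by zero, no upper bound.
        if grid[t] < -tol:
            return False
        # (1b) generation stays within its lower and upper bounds.
        if not (lim.gen_min - tol <= gen[t] <= lim.gen_max + tol):
            return False
        # (1c) charging and discharging respect their upper limits
        # (non-negativity is an assumption of this sketch).
        if not (-tol <= charge[t] <= lim.charge_max + tol):
            return False
        if not (-tol <= discharge[t] <= lim.discharge_max + tol):
            return False
        # (1d) SoC update, assumed lossless here, then the SoC bounds.
        soc = soc + charge[t] - discharge[t]
        if not (lim.soc_min - tol <= soc <= lim.soc_max + tol):
            return False
        # (1e) power balance: total supply equals total consumption.
        supply = grid[t] + gen[t] + discharge[t]
        demand = load[t] + charge[t]
        if abs(supply - demand) > tol:
            return False
    return True
```

A real storage model would typically include charging and discharging efficiencies in the SoC update; they are omitted here only to keep the sketch short.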
3. Framework
3.1. Feature Engineering
- The user loads of the past months;
- The electricity generation of the past months;
- The direct and diffuse solar irradiance;
- Detailed time information, including the hour of the day, the day of the week, and the day of the month;
- The forecast weather information, including humidity, temperature, and so on;
- The key components detailed before;
- The predictions of user load and electricity generation;
- The number of solar generation units in each building;
- The efficiency and capacity of the storage in each building;
- Market prices, including the prices of electricity and carbon.
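As an illustration of how the forecasting features listed above might be assembled into one input row, here is a minimal sketch. The function names, the feature keys, and the choice of three lagged values are hypothetical; the original text does not specify how the features are encoded.

```python
from datetime import datetime
from typing import Dict, Sequence


def time_features(ts: datetime) -> Dict[str, float]:
    """Hour of the day, day of the week (Monday = 0), and day of the month."""
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),
        "day_of_month": ts.day,
    }


def forecasting_features(
    ts: datetime,
    past_load: Sequence[float],
    past_generation: Sequence[float],
    weather: Dict[str, float],
    n_lags: int = 3,
) -> Dict[str, float]:
    """Build one feature row from history, time, and weather forecasts.

    `n_lags` (an assumption of this sketch) is the number of most recent
    load and generation values to include.
    """
    if len(past_load) < n_lags or len(past_generation) < n_lags:
        raise ValueError("not enough history for the requested lags")
    row: Dict[str, float] = dict(time_features(ts))
    for k in range(1, n_lags + 1):
        row[f"load_lag_{k}"] = past_load[-k]
        row[f"generation_lag_{k}"] = past_generation[-k]
    # Weather forecasts (for example humidity and temperature) and solar
    # irradiance are passed through under the keys supplied by the caller.
    row.update(weather)
    return row
```

For example, `forecasting_features(datetime(2023, 9, 27, 14), load_history, generation_history, {"temperature": 21.5})` returns a flat dictionary that can be fed to any of the tabular or sequence models.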
3.2. Deep Learning-based Forecasting Model
3.3. Reinforcement Learning
3.4. Optimization
3.4.1. Stochastic Optimization

3.4.2. Online Data Augmentation
Pre-training/Fine-tuning Scheme
Rolling-horizon Feedback Correction
4. Experiments
4.1. Experiment Setup
4.1.1. Dataset
4.1.2. Metric
4.1.3. Baseline
- RBC: Rule-Based Control method. We tested several strategies and selected the best one: charging the battery by 10% of its capacity between 10 a.m. and 2 p.m., followed by discharging it by the same amount between 4 p.m. and 8 p.m.
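The RBC rule above can be sketched as a function of the hour of the day. This is a minimal sketch with assumptions that are not stated in the original: hourly control steps, the 10% applied at every step inside a window (rather than spread over the whole window), and half-open windows [10, 14) and [16, 20). The function name `rbc_action` is hypothetical.

```python
def rbc_action(hour: int, rate: float = 0.10) -> float:
    """Return the battery action as a fraction of capacity.

    Positive values charge the battery, negative values discharge it.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 10 <= hour < 14:   # 10 a.m. to 2 p.m.: charge
        return rate
    if 16 <= hour < 20:   # 4 p.m. to 8 p.m.: discharge by the same amount
        return -rate
    return 0.0            # otherwise leave the battery idle
```

Under these assumptions the charged and discharged amounts cancel over a day, so the battery returns to its starting state of charge if no limits are hit.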
4.1.4. Implementations
4.2. Results
4.3. Ablation Studies
4.3.1. Analysis of Online Data Augmentation
4.3.2. Analysis of Forecasting Models
4.3.3. Analysis of Stochastic Optimization
5. Conclusion

Overall performance by method:

| Methods | Average Cost | Emission | Price | Grid |
|---|---|---|---|---|
| RBC | 0.921 | 0.964 | 0.817 | 0.982 |
| MPC | 0.861 | 0.921 | 0.746 | 0.916 |
| AMPC | 0.827 | 0.859 | 0.750 | 0.872 |
| ES | 0.812 | 0.863 | 0.748 | 0.827 |
| SAC | 0.834 | 0.859 | 0.737 | 0.905 |
| MAPPO | 0.810 | 0.877 | 0.726 | 0.826 |
| Optimization | 0.804 | 0.871 | 0.719 | 0.822 |
| Ensemble | 0.801 | 0.864 | 0.718 | 0.821 |

| Forecast Model | Online Update | Dispatch: Average | Dispatch: Time (s) | Forecast WMAPE: Load | Forecast WMAPE: Solar |
|---|---|---|---|---|---|
| Linear | ✘ | 0.878 | 7.96 | 42.12% | 27.25% |
| GBDT | ✘ | 0.875 | 8.17 | 44.70% | 10.74% |
| RNN | ✘ | 0.876 | 9.30 | 45.97% | 10.66% |
| Transformer | ✘ | 0.879 | 10.64 | 45.25% | 10.60% |
| Linear | ✓ Self-Adaptive Linear Correction | 0.871 | 8.17 | 39.35% | 21.23% |
| GBDT | ✓ Self-Adaptive Linear Correction | 0.868 | 8.99 | 39.48% | 9.38% |
| RNN | ✓ Self-Adaptive Linear Correction | 0.866 | 10.01 | 39.29% | 9.25% |
| Transformer | ✓ Self-Adaptive Linear Correction | 0.869 | 11.03 | 39.86% | 9.12% |
| RNN | ✓ Online Fine-tuning | 0.862 | 11.45 | 38.98% | 9.01% |
| Transformer | ✓ Online Fine-tuning | 0.864 | 12.15 | 39.30% | 9.07% |
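The forecast columns above report WMAPE. The original text here does not spell out the formula, so the following is a minimal sketch assuming the common definition: the sum of absolute errors divided by the sum of absolute actual values. The function name `wmape` is illustrative.

```python
from typing import Sequence


def wmape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted mean absolute percentage error, returned as a fraction."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    denominator = sum(abs(a) for a in actual)
    if denominator == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denominator
```

Unlike a plain MAPE, this ratio stays well defined when individual actual values are zero, which matters for solar generation at night.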
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).