Preprint
Article

This version is not peer-reviewed.

Supervised Imitation Learning for Optimal Setpoint Trajectory Prediction in Energy Management under Dynamic Electricity Pricing

Submitted: 17 February 2026

Posted: 25 February 2026


Abstract
Energy management systems under dynamic electricity pricing require fast, cost-optimal control of flexible loads such as heating, ventilation, and air conditioning (HVAC) systems and refrigeration units. Mixed-Integer Linear Programming (MILP) can compute theoretically optimal control trajectories, but its practical use is limited: the optimization is computationally expensive, which restricts real-time applicability, and it depends on accurate forecasts of electrical loads and other relevant time-series signals, including disturbances. This paper proposes a supervised imitation learning (IL) framework that learns to imitate MILP-optimal setpoint trajectories for a conventional proportional (P) controller using only electricity price signals and temporal features. The IL model predicts setpoint trajectories in an open-loop manner, without direct state feedback, and a subsequent conventional P-controller provides closed-loop robustness, forming a two-stage control structure. The approach is validated for electrical load shifting of a refrigeration system in an industrial warehouse, including a systematic benchmark of multiple IL models. MILP achieves a cost reduction of 21.07% relative to the baseline and serves as a theoretical upper bound. Among the IL models, sequence-based architectures achieve the highest savings: the Transformer and Long Short-Term Memory (LSTM) models closely approximate MILP behavior, reaching 19.33% and 19.28%, respectively. A closed-loop reinforcement learning (RL) controller, included as an additional benchmark, achieves 19.69% savings, while heuristic strategies reach at most 14.43%. From a computational perspective, IL models enable fast training and real-time inference: Transformer inference requires 526 ns per prediction, compared to 22.8 s for a single MILP optimization. This makes the proposed approach well suited for real-time and edge computing applications. Overall, the results demonstrate that the proposed supervised IL approach can achieve near-optimal control performance with substantially reduced computational effort, providing a scalable and cost-efficient solution for energy management.
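The two-stage control structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function `price_based_setpoints` is a hypothetical stand-in for the trained IL model (a simple linear price-to-setpoint map, not a Transformer or LSTM). The first-order thermal model in `simulate`, the temperature band, the controller gain, and all numeric parameters are assumptions chosen only for illustration.

```python
from typing import List, Tuple


def price_based_setpoints(prices: List[float],
                          t_min: float = -22.0,
                          t_max: float = -18.0) -> List[float]:
    """Hypothetical stand-in for the IL model (stage 1, open loop).

    Maps each price linearly into an assumed temperature band:
    cheapest hour -> coldest setpoint (pre-cooling),
    most expensive hour -> warmest setpoint.
    """
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0  # avoid division by zero for flat prices
    return [t_min + (p - lo) / span * (t_max - t_min) for p in prices]


def p_controller(setpoint: float, temperature: float,
                 gain: float = 2.0, p_max: float = 10.0) -> float:
    """Proportional controller (stage 2, closed loop).

    Returns cooling power, clipped to [0, p_max]. The error is positive
    when the room is warmer than the setpoint.
    """
    error = temperature - setpoint
    return max(0.0, min(p_max, gain * error))


def simulate(prices: List[float], t0: float = -20.0,
             t_ambient: float = 5.0, leak: float = 0.02,
             cool: float = 0.1) -> Tuple[List[float], List[float], float]:
    """Run both stages on an assumed first-order thermal model."""
    setpoints = price_based_setpoints(prices)  # no state feedback here
    temp = t0
    temps, powers = [], []
    for sp in setpoints:
        power = p_controller(sp, temp)  # state feedback enters here
        # Assumed dynamics: heat leaks in from ambient, cooling removes it.
        temp += leak * (t_ambient - temp) - cool * power
        temps.append(temp)
        powers.append(power)
    cost = sum(p * q for p, q in zip(prices, powers))
    return temps, powers, cost


if __name__ == "__main__":
    temps, powers, cost = simulate([10.0, 30.0, 20.0, 15.0])
    print(temps, powers, cost)
```

In this sketch only the second stage observes the temperature, which mirrors the division of labor in the abstract: the setpoint trajectory depends solely on prices, and the P-controller absorbs deviations of the actual state.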
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated