Supervised Imitation Learning for Optimal Setpoint Trajectory Prediction in Energy Management under Dynamic Electricity Pricing

Philipp Wohlgenannt; Vinzent Vetter; Lukas Moosbrugger; Mohan Kolhe; Elias Eder; Peter Kepplinger

doi:10.20944/preprints202602.1302.v1

Submitted: 17 February 2026

Posted: 25 February 2026


Abstract

Energy management systems operating under dynamic electricity pricing require fast, cost-optimal control strategies for flexible loads such as heating, ventilation, and air conditioning (HVAC) systems and refrigeration units. Mixed-Integer Linear Programming (MILP) can compute theoretically optimal control trajectories, but its practical use is limited by two factors: the optimization is computationally expensive, which restricts real-time applicability, and it depends on accurate forecasts of electrical loads and other relevant time-series signals, including disturbances. This paper proposes a supervised imitation learning (IL) framework that learns to imitate MILP-optimal setpoint trajectories for a conventional proportional (P) controller using only electricity price signals and temporal features. The IL model predicts setpoint trajectories in an open-loop manner, without direct state feedback, and a downstream conventional P-controller provides closed-loop robustness, yielding a two-stage control structure. The approach is validated on electrical load shifting for a refrigeration system in an industrial warehouse, including a systematic benchmark of multiple IL models. MILP achieves a cost reduction of 21.07% relative to the baseline and serves as a theoretical upper bound. Among the IL models, sequence-based architectures achieve the highest savings: the Transformer and Long Short-Term Memory (LSTM) models closely approximate MILP behavior, reaching 19.33% and 19.28%, respectively. A closed-loop reinforcement learning (RL) controller, included as an additional benchmark, achieves 19.69% savings, while heuristic strategies reach at most 14.43%. From a computational perspective, IL models enable fast training and real-time inference, with Transformer inference requiring 526 ns per prediction compared to 22.8 s for a single MILP optimization, which makes the proposed approach well suited for real-time and edge computing applications. Overall, the results demonstrate that the proposed supervised IL approach achieves near-optimal control performance with substantially reduced computational effort, providing a scalable and cost-efficient solution for energy management.
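The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The setpoint predictor is a hypothetical price-threshold rule standing in for a trained IL model, and the first-order thermal model, the temperature limits, the controller gain, and all function and parameter names are assumptions chosen only to make the example runnable.

```python
import math


def temporal_features(hour):
    """Cyclic encoding of the hour of day (an assumed feature choice)."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def predict_setpoint(price, hour, price_mean, t_min=-22.0, t_max=-18.0):
    """Stage 1 (open loop): hypothetical stand-in for the trained IL model.

    It sees only the price and time features, never the measured temperature.
    A trained sequence model would replace this threshold rule.
    """
    _sin_h, _cos_h = temporal_features(hour)  # unused by this placeholder rule
    # Cheap electricity: pre-cool (low setpoint). Expensive: coast (high setpoint).
    return t_min if price < price_mean else t_max


def p_controller(setpoint, temperature, gain=2.0, p_max=10.0):
    """Stage 2 (closed loop): proportional controller with output clipping."""
    error = temperature - setpoint  # positive when the room is too warm
    return min(max(gain * error, 0.0), p_max)


def simulate(prices, t0=-20.0, t_ambient=10.0, leak=0.02, cool_eff=0.1):
    """Run both stages on an assumed first-order thermal model, one step per hour."""
    price_mean = sum(prices) / len(prices)
    temp, cost, temps = t0, 0.0, []
    for k, price in enumerate(prices):
        setpoint = predict_setpoint(price, k % 24, price_mean)
        power = p_controller(setpoint, temp)
        # Heat leaks in from ambient; cooling power removes it.
        temp += leak * (t_ambient - temp) - cool_eff * power
        cost += price * power
        temps.append(temp)
    return temps, cost


if __name__ == "__main__":
    day_prices = [0.1] * 12 + [0.3] * 12  # illustrative prices, not data from the paper
    temperatures, total_cost = simulate(day_prices)
    print(len(temperatures), total_cost)
```

In this sketch the predictor never reads the temperature, so any deviation of the plant from the planned trajectory is handled only by the P-controller. This mirrors the division of roles the abstract describes.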

Keywords: imitation learning; supervised learning; optimal control; energy management; dynamic electricity pricing; transformer; industrial warehouse

Subject: Engineering - Control and Systems Engineering

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
