Optimal control determines the actions that optimize a performance objective by solving a sequential decision-making problem. The preceding section highlighted the need to compare S-MPC and AR-LSTM, the two primary approaches to optimal control applied to EM for hybrid MG control, and the possibility of combining them. Some components are shared by both methods, while others are controller-specific. These formulation differences make it difficult to compare and combine the two approaches, necessitating a conceptual analysis. Such an analysis helps identify the primary methods for optimal control and establishes a common ground for a comprehensive classification. The sections that follow detail the most important aspects of these control methods.
2.1. Approach
There are typically two ways to approach an optimal control problem: by employing the receding horizon principle inherent to S-MPC, or by formalizing the problem as an AR-LSTM.
S-MPC is a control strategy that uses a mathematical model of the controlled system to predict its future behavior and optimize a control signal over a finite time horizon. At each time step, the control signal is updated based on the current state of the system and the predictions made by the model. It is widely used in industrial control applications, such as process control, automotive control, and robotics, where it is important to account for the dynamics of the controlled system and optimize performance over a prediction horizon. At each time step $k$ in S-MPC, switching logic handles the multi-mode operation of the accumulators that fully describe the controller model at the current time. Then, the trajectories of the future state $x$ and input $u$ are optimized over a prediction horizon $N_p$ based on the explicit representation of an objective function $J$ and a controller model $F$. Here, $J$ expresses the minimization of the imported energy and the maximization of the exported energy. The constraints $H$ are also introduced explicitly in the optimization problem. The objective function, model, and constraints may also depend on the model outputs $y$ and time-invariant parameters $p$. In addition, $r$ is the reference variable representing the PV, load data, and zero along the prediction horizon $N_p$. The weighting coefficients $w_y$ and $w_{\Delta u}$ reflect the relative significance of the output tracking errors and penalize relatively large variations in $u$, respectively. Only the initial control input from the optimized trajectory is implemented [10].
Figure 1a depicts the full S-MPC procedure.
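Collecting these elements, the optimization solved at each time step can be summarized in a generic tracking form. The following is a representative sketch using the notation above (the output map $g$ is introduced here only for illustration); the exact cost and constraint structure of [10] may differ in detail:

$$
\begin{aligned}
\min_{u} \quad & J = \sum_{i=1}^{N_p} \Big( \big\| w_y \,\big( r_{k+i} - y_{k+i} \big) \big\|^2 + \big\| w_{\Delta u}\, \Delta u_{k+i-1} \big\|^2 \Big) \\
\text{s.t.} \quad & x_{k+i} = F\big(x_{k+i-1},\, u_{k+i-1},\, p\big), \\
& y_{k+i} = g\big(x_{k+i},\, p\big), \\
& H\big(x_{k+i},\, u_{k+i-1},\, y_{k+i},\, p\big) \le 0, \qquad i = 1,\dots,N_p
\end{aligned}
$$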
In the application of S-MPC to EM for MGs, the state vector $x$ represents the state of charge (SOC) of the accumulators, such as the battery, fuel tank, and water tank, and the model output $y$ represents the imported and exported energy flows, such as grid to load, PV to grid, and the battery (charging + discharging). Depending on whether or not the controller model employs physical insights, the set of time-invariant parameters $p$ may or may not represent the physical properties of the MG.
In contrast to RNN-LSTM, AR models are not neural network architectures. Rather, they are statistical models that identify dependencies and patterns within a time series based on its own lagged values. An AR model predicts the future values of a variable from its historical values and the coefficients estimated during model training. In other words, AR models are a statistical modeling technique that assumes a variable's current value is a function of its previous values, and they are used frequently for time series analysis and forecasting. AR models can therefore be viewed as linear regression in which the predictors are the values of the same variable at prior times [16]. Within the context of control systems or reinforcement learning, AR models can be used to model the system's dynamics. By estimating the AR coefficients, the model can predict future states or observations, and these predictions can then be fed into control algorithms or reinforcement learning agents to optimize control signals or decision-making. Unlike neural network architectures, AR models are not adaptive by nature: estimating the AR coefficients requires training on historical data, and their performance may degrade if the underlying dynamics of the system change significantly over time.
An AR model of order $q$ can be represented mathematically by the following equation [16]:

$$ y_k = c + \phi_1 y_{k-1} + \phi_2 y_{k-2} + \cdots + \phi_q y_{k-q} + \varepsilon_k $$

where $y_k$ represents the value of the time series at time $k$, and $c$ is a constant term or intercept. The AR model coefficients, i.e., the weights associated with the previous values of the time series, are denoted by $\phi_1, \phi_2, \dots, \phi_q$. The terms $y_{k-1}, y_{k-2}, \dots, y_{k-q}$ represent the lagged values of the time series at time points $k-1, k-2, \dots, k-q$, respectively. Finally, $\varepsilon_k$ is the error term or random noise at time $k$, representing the portion of the data the model cannot explain.
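As a concrete illustration, an AR($q$) model can be fitted by ordinary least squares on a lag matrix. The minimal Python sketch below (variable names such as `series` and `order` are chosen here for illustration, not taken from this paper) estimates $c$ and $\phi_1, \dots, \phi_q$ and produces a one-step-ahead forecast:

```python
import numpy as np

def fit_ar(series, order):
    """Estimate AR(q) coefficients [c, phi_1, ..., phi_q] by least squares."""
    n, q = len(series), order
    # Regression targets: y_k for k = q, ..., n-1.
    y = series[q:]
    # Design matrix rows: [1, y_{k-1}, y_{k-2}, ..., y_{k-q}].
    X = np.column_stack([np.ones(n - q)] +
                        [series[q - i:n - i] for i in range(1, q + 1)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast_one_step(series, coeffs):
    """One-step-ahead prediction: c + sum_i phi_i * y_{k-i}."""
    q = len(coeffs) - 1
    lags = series[-1:-q - 1:-1]          # [y_{k-1}, ..., y_{k-q}]
    return coeffs[0] + np.dot(coeffs[1:], lags)

# Example: fit an AR(3) model to a synthetic load series.
rng = np.random.default_rng(0)
load = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
theta = fit_ar(load, order=3)
print("next value estimate:", forecast_one_step(load, theta))
```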
RNN-LSTM is a type of neural network that is well suited to processing sequential data. Unlike feed-forward neural networks, it has loops that allow information to be passed from one step of a sequence to the next. Employing an RNN-LSTM involves selecting an appropriate network architecture, an optimization algorithm for training the network, and a suitable set of hyper-parameters. An RNN-LSTM is an extension of a feed-forward neural network with internal memory. It is recurrent in nature because it performs the same function for every data input, while the output for the current input depends on the previous computation. After the output has been generated, it is duplicated and sent back into the recurrent network [46]. For decision-making, the network considers both the current input and the output it learned from the previous input. As shown in Figure 1b, the input vector of an LSTM network at time step $k$ is $x_k$, and $h_k$ represents the output vector passed through the network between time steps $k$ and $k+1$. Three gates update and control the cell state in an LSTM network: the forget gate, the input gate, and the output gate. The gates are activated by hyperbolic tangent and sigmoid functions. Given new information entering the network, the forget gate determines which cell-state information to discard. Given new input information, the input gate determines what new information to encode into the cell state. Using the output vector $h_k$, the output gate controls what information encoded in the cell state is passed to the network as input in the subsequent time step.
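The gate computations can be made concrete with a minimal single-cell sketch. The weight names below (`W["f"]`, `W["i"]`, `W["c"]`, `W["o"]` and the corresponding biases) are generic placeholders rather than the notation of Figure 1b:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_k, h_prev, c_prev, W, b):
    """One LSTM time step: returns the new hidden state h_k and cell state c_k.

    Each gate weight W[g] multiplies the concatenation [h_{k-1}, x_k].
    """
    z = np.concatenate([h_prev, x_k])
    f = sigmoid(W["f"] @ z + b["f"])      # forget gate: what to discard
    i = sigmoid(W["i"] @ z + b["i"])      # input gate: what to encode
    g = np.tanh(W["c"] @ z + b["c"])      # candidate cell-state update
    o = sigmoid(W["o"] @ z + b["o"])      # output gate: what to expose
    c_k = f * c_prev + i * g              # updated cell state
    h_k = o * np.tanh(c_k)                # updated hidden/output vector
    return h_k, c_k

# Example with 2 inputs and 3 hidden units.
rng = np.random.default_rng(1)
n_in, n_h = 2, 3
W = {g: rng.standard_normal((n_h, n_h + n_in)) * 0.1 for g in "fico"}
b = {g: np.zeros(n_h) for g in "fico"}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_cell_step(rng.standard_normal(n_in), h, c, W, b)
print(h)
```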
In the mathematical modeling of RNN-LSTM, the current state can be expressed as:

$$ h_k = f(h_{k-1}, x_k) $$

where $h_k$ represents the current state, $h_{k-1}$ the previous state, and $x_k$ the current input. Because the input neuron has already applied its transformations to the previous input, we operate on a state of the previous input rather than the input itself. Each successive input is therefore referred to as a time step.

Considering the simplest form of an RNN-LSTM, where the activation function is tanh, the weight at the recurrent neuron is $w_h$, and the weight at the input neuron is $w_x$, the equation for the state at time $k$ can be written as follows [46]:

$$ h_k = \tanh(w_h h_{k-1} + w_x x_k) $$
In this instance, the recurrent neuron considers only the previous state; for longer sequences, the equation may involve multiple such states. After the final state has been calculated, the output can be generated. Once the current state has been computed, the output state is calculated as follows [46]:

$$ y_k = w_y h_k $$

where $y_k$ is the output state and $w_y$ is the weight at the output state. This process is represented in Figure 2.
First, the network takes $x_0$ from the input sequence and outputs $h_0$, which, together with $x_1$, forms the input for the subsequent step. Likewise, $h_1$ and $x_2$ are the inputs for the step after that, and so on. Consequently, the network remembers the context throughout training.
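These recurrences translate directly into a short forward-pass sketch. The scalar weights below mirror $w_h$, $w_x$, and $w_y$ from the equations above; the input sequence is synthetic:

```python
import numpy as np

def rnn_forward(xs, w_h, w_x, w_y, h0=0.0):
    """Unroll the recurrence h_k = tanh(w_h*h_{k-1} + w_x*x_k),
    y_k = w_y*h_k over an input sequence xs."""
    h, ys = h0, []
    for x in xs:
        h = np.tanh(w_h * h + w_x * x)   # state update: current input + previous state
        ys.append(w_y * h)               # output at this time step
    return np.array(ys)

xs = np.sin(np.linspace(0, 6, 10))       # toy input sequence
print(rnn_forward(xs, w_h=0.5, w_x=1.0, w_y=2.0))
```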
A cost function quantifies how well a neural network performs with respect to the training samples and the expected outputs. It may also depend on factors such as weights and biases. It is a single value, not a vector, because it evaluates the overall performance of the network; the goal during training is to minimize this value. The cost function for a typical RNN-LSTM is the sum of the losses at each time step [47]:

$$ J(\theta) = \sum_{k=1}^{T} L(\hat{y}_k, y_k) $$

where $\theta$ represents the parameters of the RNN, $T$ the length of the input sequence, $\hat{y}_k$ the predicted output, and $y_k$ the actual output at time step $k$. $L$ is the loss function quantifying the difference between the predicted and actual outputs. Using gradient descent or a comparable optimization algorithm, the RNN's trainable parameters are adjusted to minimize the cost function. The objective is to find the parameters that minimize the loss over all time steps, yielding an RNN that can accurately predict the output for a given input sequence.
2.2. Solution method
By analyzing the control processes illustrated in
Figure 1a,b, it is possible to identify a number of expressions with total or partial equivalence between the two methods.
S-MPC can be solved implicitly, by performing switching logic, forecasting, and solving a dynamic optimization problem at each time step, or explicitly, by learning a control policy from data generated by an S-MPC with some type of function approximation. Consequently, implicit S-MPC has a higher online computational cost because every control step requires state estimation and dynamic optimization. The optimization problem in S-MPC is typically solved using numerical optimization techniques such as nonlinear programming or quadratic programming (QP); in this paper, QP is used. The solution of the optimization problem over the prediction horizon provides the optimal control signal. At each time step, the first component of the optimal control signal is applied to the system, and the process is repeated with updated state and prediction horizon values. S-MPC thus requires the solution of an optimization problem at each time step, which can be computationally expensive for large systems.
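To make the receding-horizon QP concrete, the sketch below solves a toy battery-dispatch problem for one control step using the cvxpy modeling library. The dynamics, bounds, and load profile are illustrative placeholders, not the controller model of this paper:

```python
import cvxpy as cp
import numpy as np

# Illustrative data over a prediction horizon of N steps.
N = 24
net_load = np.sin(np.linspace(0, 2 * np.pi, N)) + 1.0   # load minus PV (kW), placeholder
soc0, soc_min, soc_max = 0.5, 0.1, 0.9                   # battery state of charge (fraction)
p_max, eta, dt_over_cap = 2.0, 0.95, 0.02                # power limit, efficiency, step/capacity

u = cp.Variable(N)          # battery power: > 0 discharge, < 0 charge
soc = cp.Variable(N + 1)    # state trajectory along the horizon
grid = net_load - u         # energy imported from the grid at each step

constraints = [soc[0] == soc0, cp.abs(u) <= p_max,
               soc[1:] == soc[:-1] - eta * dt_over_cap * u,
               soc_min <= soc, soc <= soc_max]

# QP objective: penalize grid imports and large input moves (cf. w_y, w_du).
cost = cp.sum_squares(grid) + 0.1 * cp.sum_squares(cp.diff(u))
cp.Problem(cp.Minimize(cost), constraints).solve()

# Receding horizon: apply only the first optimized input, then re-solve.
print("apply u[0] =", float(u.value[0]))
```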
The training process for AR-LSTM involves back-propagation through time (BPTT), a variation of the back-propagation algorithm that accounts for temporal dependencies in the data. During training, the RNN is unrolled for a predetermined number of time steps, and gradients are calculated at each step. The RNN's weights are then updated based on the gradients accumulated across all time steps. The most prevalent optimization algorithm for training RNNs is gradient descent, which iteratively updates the weights in the direction of the negative gradient of the loss function [46]. However, standard gradient descent is susceptible to issues such as vanishing gradients, in which the gradients become extremely small and the weights effectively stop updating. Several variants of gradient descent, such as the adaptive algorithms AdaGrad, RMSProp, and Adam, have been developed to mitigate such issues [48].
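As an illustration of BPTT with an adaptive optimizer, the following PyTorch sketch trains a small LSTM one-step-ahead forecaster with Adam; the architecture and data are toy placeholders rather than the model used in this work:

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Small LSTM that maps an input sequence to one-step-ahead predictions."""
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, T, 1)
        h, _ = self.lstm(x)               # unrolled over all T time steps
        return self.head(h)               # prediction at every time step

# Toy data: predict the next sample of a sine wave.
t = torch.linspace(0, 20, 401)
series = torch.sin(t)
x = series[:-1].view(1, -1, 1)            # inputs  y_0 ... y_{T-1}
y = series[1:].view(1, -1, 1)             # targets y_1 ... y_T

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)           # mean of the per-time-step losses
    loss.backward()                       # BPTT through the unrolled LSTM
    opt.step()                            # Adam update
print("final loss:", float(loss))
```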