1. Introduction
Optimization approaches have emerged as essential tools for solving complex problems across disciplines. Unlike traditional linear models, nonlinear optimization (NOPT) methods can capture the intricate, interdependent relationships inherent in real-world scenarios [1]. These techniques are particularly valuable in fields such as engineering, economics, and operations research, where they enable the formulation and solution of models that more accurately reflect the underlying dynamics [2]. By leveraging advanced algorithms and computational solvers, NOPT supports improved decision-making and implementation, thereby enhancing efficiency and effectiveness in tackling multifaceted challenges. As research and technology continue to evolve, its significance in achieving optimal outcomes across diverse applications becomes increasingly evident [3,4]. Nonetheless, NOPT poses salient challenges. First, data variability and noisy input measurements yield erroneous and fluctuating solutions. Second, nonlinear constraints greatly complicate the task of achieving optimal outputs [5]. Third, system scalability must be considered.
Data variability and noisy samples, in particular, are known to degrade stochastic measurements and increase errors in NOPT [6]. Unwanted effects in the data not only reduce solution quality but also complicate the computation, making it more difficult to choose suitable optimization parameters [7]. This instability greatly impedes the optimization process, rendering the algorithm vulnerable to external effects and significantly reducing its overall efficiency [8]. Besides, the intricacies of nonlinear constraints may yield outcomes that are either infeasible or suboptimal [9]. NOPT solvers may also converge slowly and tend to become trapped in local minima, which is problematic when both speed and accuracy are crucial [10]. Hence, optimization techniques become impractical for large-scale applications [11], and as the number of variables grows, scalability becomes a significant hindrance, underscoring the pressing need for specialized software and longer processing times [12]. Consequently, it is important to handle large optimization problems, reduce runtime, and tame the inherent complexity of noisy inputs and nonlinear constraints [11]. Indeed, many NOPT tasks are nondeterministic polynomial-time hard (NP-hard), making exact solutions for large instances elusive, since no polynomial-time algorithm is known that performs well without introducing errors into the final output [13]. Additionally, some NOPT tasks involve non-convex nonlinear programming (NLP), which is especially challenging because it combines non-convex and integer functions [14].
Typically, NOPT is solved through mathematical programming or other classical techniques, which can effectively handle non-linearities and discontinuities [9]. Customized strategies are also implemented to refine the iterative search [15]. Gradient-based techniques, mostly built on descent methods, have likewise proven able to cope with nonlinear and convex constraints [16]. Similarly, decomposition methods reduce complexity by splitting the optimization into more manageable subproblems [17]. Additionally, search approaches and metaheuristics are crucial for balancing exploration and exploitation [18], which enhances efficiency in finding optimal outputs. However, conventional methods often converge to solutions of limited practical use, especially in stochastic and noisy environments with high uncertainty and intrinsic data variability, which can reduce their accuracy [19].
Nowadays, artificial neural networks (ANNs) employ supervised learning to tackle nonlinear and stochastic problems through regression tasks. Using data-driven strategies, these networks learn complex patterns and make accurate predictions even under considerable uncertainty [20]. Commonly, ANN-based approaches employ automatic differentiation (AD), a computational technique for evaluating function derivatives efficiently and accurately. Unlike numerical differentiation, which can suffer from precision issues, or symbolic differentiation, which can be computationally expensive, AD breaks functions down into elementary operations with known derivatives and applies the chain rule systematically [21]. This ensures derivative calculations that are exact to machine precision and enables computing loss-function gradients with respect to network parameters, which is essential for gradient-based optimization algorithms such as back-propagation.
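A minimal sketch of the forward-mode flavor of this idea, using a hand-rolled dual-number class (illustrative only; production AD frameworks are far more general), shows how decomposing a function into elementary operations with known derivatives yields a derivative exact to machine precision:

```python
import math

class Dual:
    """Dual number val + dot*eps with eps**2 = 0; carries a value and its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule applied automatically by the elementary operation
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # elementary operation with a known derivative (chain rule baked in)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x * sin(x)] at x = 2.0
x = Dual(2.0, 1.0)   # seed the derivative dx/dx = 1
y = x * sin(x)       # y.dot now holds sin(2) + 2*cos(2), exact to machine precision
```

The same mechanism, applied in reverse mode over a network's computational graph, is what back-propagation implements.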
Recently, physics-informed neural networks (PINNs) have emerged as an effective ANN-based optimization technique. Designed to align training with relevant physical principles, they have proven successful in various NOPT applications [22]. Commonly, the Karush-Kuhn-Tucker (KKT) conditions are used to represent constraints and integrate them into the network's cost function during supervised training [23]. Additionally, a novel approach for integrating constraints via Runge-Kutta (RK) schemes in unsupervised training has been proposed in [24]. Nevertheless, deploying these networks is demanding, particularly when it comes to defining suitable loss functions, choosing hyperparameters, and keeping computations fast while training complex systems [25]. Moreover, despite their remarkable capabilities, PINNs' ability to generalize to nonlinear optimization problems remains limited [26].
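As a hedged illustration of this constraint-in-the-loss idea (a textbook quadratic-penalty sketch, not the exact formulation of any cited work; the objective, constraint, and coefficients are invented for the example), an inequality constraint can be folded into a scalar training loss and minimized by gradient descent:

```python
def loss_and_grad(x, mu):
    """Penalized loss for: minimize (x - 3)^2 subject to x <= 1."""
    f, df = (x - 3.0) ** 2, 2.0 * (x - 3.0)
    # constraint g(x) = x - 1 <= 0, penalized quadratically when violated
    viol = max(0.0, x - 1.0)
    pen, dpen = mu * viol ** 2, 2.0 * mu * viol
    return f + pen, df + dpen

x, mu, lr = 0.0, 100.0, 0.005
for _ in range(2000):
    _, g = loss_and_grad(x, mu)
    x -= lr * g
# x settles near the constrained optimum x = 1 (a quadratic penalty leaves
# a small residual violation; larger mu shrinks it)
```

PINN-style training applies the same principle, with network parameters in place of `x` and physics residuals in place of `viol`.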
In this paper, we present a novel regularized PINN framework, termed RPINN, as an optimization tool for both supervised and unsupervised data-driven NOPT scenarios. In doing so, we address three key NOPT issues. First, we handle data variability and noisy input measurements by adapting custom activation functions and regularization penalties within an ANN scheme. Second, we effectively integrate nonlinear constraints into the network architecture, adhering to the principles of the model physics; specifically, we utilize the network weights and/or learned features within a functional composition framework to determine the NOPT variables. Third, our ANN-based strategy employs AD training, which favors system scalability and computational time through batch-based back-propagation. Experimental results on both supervised and unsupervised data-driven NOPT tasks confirm that our proposal is robust and competitive against state-of-the-art optimization approaches. The primary advantage of our proposal lies in its stability against noisy input measurements, making it particularly valuable in contexts with fluctuating information. Furthermore, because RPINN is ANN-based, it offers flexibility in network architecture.
The agenda for this paper is as follows: Section 2 summarizes the related work. Section 3 describes the materials and methods. Section 5 and Section 6 depict the experiments and discuss the results. Lastly, Section 7 outlines the conclusions and future work.
2. Related Work
Several studies have shown that mathematical programming has become a crucial tool in numerical optimization. A notable example is the analysis in [9], which employs a sequential linear programming algorithm to address nonlinearities and discontinuities. In this context, the simplex method proves essential: a classic technique for solving linear programming problems through iterative adjustments of solutions within a feasible set [27]. Similarly, the study in [15] explores a solution via quadratic programming (QP). Mixed-integer programming (MIP), in turn, is an optimization strategy combining integer and continuous variables, widely used for difficult problems [28], with the branch-and-cut (BC) algorithm commonly employed to find the best solution [29]. Furthermore, second-order cone programming (SOCP) facilitates effective solutions for problems involving linear and quadratic constraints [30]. Recent studies, such as [31], examine semidefinite programming (SDP), and the work in [32] uses convexification techniques. Likewise, exponential programming (EXP) models NOPT objectives and constraints through exponential functions [33]. Additionally, power cone programming (PCP) is considered for modeling product and square relationships [34]. Yet, these classical methods face challenges such as scalability, computation time, convergence, and practical precision, underscoring their inherent complexity and limitations. Furthermore, the use of relaxations or approximations affects optimization accuracy [35].
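To make the linear-programming building block concrete, the following minimal sketch (assuming SciPy's `linprog` with the HiGHS backend is available; the problem data are invented for the example, not taken from any cited study) solves a small LP:

```python
from scipy.optimize import linprog

# maximize x1 + 2*x2  subject to  x1 + x2 <= 3,  x1, x2 >= 0
# linprog minimizes, so the objective is negated
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0]]
b_ub = [3.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
# optimum puts all weight on x2: x = (0, 3), objective value -6
```

Classical NOPT pipelines often reduce to sequences of such subproblems (e.g., sequential linear programming), which is where the limitations on nonlinearity and scale discussed above arise.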
On the other hand, the efficiency and precision of gradient methods in identifying optimal solutions highlight their relevance for practical optimization tasks. The work in [36] uses the Dai-Liao conjugate gradient method and hyperplane projections to achieve global convergence when solving nonlinear equations. In addition, [37] tackles non-convexity via a set of starting points. Moreover, nonlinear decomposition using linear programming (LP) and gradient descent has also been proposed [38]. Further, the work in [39] examined Newton-based search to deal with convergence issues in poorly conditioned systems. Also, a semismooth Newton technique was applied for optimization in Hilbert spaces [40]. For noisy problems, the authors in [41] use piecewise polynomial interpolation and box reformulations, along with an interior-point (IP) method. The authors in [42] tackle similar problems with integrated penalty techniques. Overall, gradient methods are effective for NOPT tasks, but they struggle to converge and are expensive to run in noisy and nonlinear settings [43]. Besides, choosing the best learning rate can be challenging, and they risk getting stuck in local minima [44]. As seen in [45], it is also important to ensure at least first-order differentiability when using techniques like the conjugate gradient, IP, and Newton-based approaches.
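As a minimal illustration of the Newton-based search mentioned above (a textbook scalar version, not the cited semismooth or conjugate-gradient variants; the residual function is invented for the example), note how the method exploits first-order differentiability and converges quadratically near a root:

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0; needs f differentiable and df(x) != 0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # local linearization: solve f(x) + f'(x)*step = 0
        x -= step
        if abs(step) < tol:
            break
    return x

# solve x^3 - 2 = 0, i.e., find the cube root of 2
root = newton(lambda x: x ** 3 - 2.0, lambda x: 3.0 * x ** 2, x0=1.0)
```

The same iteration diverges or stalls when the derivative is near-singular or the residual is noisy, which is precisely the poorly conditioned regime the cited works address.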
Of note, most available optimization solvers are based on the classical approaches mentioned above. Among them, Clarabel stands out for its versatility across a wide variety of problems, yet it still faces significant challenges in areas such as MIP [46]. Gurobi is renowned for its proficiency in MIP thanks to its extensive range of techniques, including simplex and IP methods; however, as proprietary software, it may be unsuitable in situations requiring license flexibility [47]. Mosek is efficient with the IP approach, but its support for MIP is relatively limited and its aptitude for NLP remains under debate, which could hinder developers who prefer open-source solutions [48]. Xpress specializes in solving MIP and offers conditional support for NLP, but is a closed-license alternative [49]. In turn, SCS, leveraging its open-source status, promotes adaptability and collaborative development, although its limitations in NLP reduce its effectiveness in certain optimization areas [50]. IPOPT excels at solving NLP problems, and its open access allows for flexibility [51].
Now, in this multifaceted optimization environment, the integration of tools such as MATPOWER, GEKKO, and CVXPY significantly expands the available options. MATPOWER is essential for solving energy-system problems and supports solvers like Gurobi, Xpress, and IPOPT for linear, mixed-integer, and nonlinear programming [52,53,54]. GEKKO specializes in dynamic systems and nonlinear models, offering a holistic, open-source Python platform [55,56]. CVXPY is an open-source modeling language for convex optimization problems embedded in Python; it allows problems to be expressed naturally, mirroring the mathematical formulation rather than conforming to the restrictive standard form required by solvers [57,58].
Table 1 summarizes the mentioned solvers.
Recently, ANNs have positioned themselves as fundamental optimization tools by incorporating deep learning techniques, effectively addressing the complexity and non-linearities of various problems. Conventional ANNs employ supervised learning to tackle nonlinear and stochastic problems through regression tasks; to this end, historical data or solutions precomputed by specialized NOPT tools are used to train the networks [60]. This approach enables ANNs to learn complex patterns and make accurate predictions even under significant uncertainty [20]. As outlined in the introduction, such approaches rely on AD to obtain machine-precision gradients of the loss with respect to the network parameters, enabling gradient-based search with back-propagation [21]. The work in [61] combines quasi-Newton methods and ANNs for NOPT. Furthermore, the authors in [60] utilize deep learning to solve optimal flow problems. Similarly, the work in [62] introduces an integrated training technique that, while effective, requires larger neural networks and presents generalization challenges. Concurrently, [63] uses elastic layers and incremental training as optimization-based solvers. Furthermore, the method in [64] combines convex relaxation with graph neural networks.
Besides, PINNs have recently emerged as powerful optimization tools, proving effective in various NOPT applications by integrating relevant physical principles within ANNs [22]. The KKT conditions are applied to formulate constraints that are incorporated into an ANN's cost function during supervised training [23]. In [65], a PINN framework is detailed that penalizes constraint violations in the loss function. The study in [66] proposes a loss function that combines errors from differential and algebraic states with normative equation violations. Additionally, a novel strategy has been proposed to include constraints in unsupervised training using an RK-based technique [24]. Nevertheless, full ANN- and PINN-based approaches face challenges such as optimality degradation. In response, advanced alternatives like [67] have emerged, integrating system constraints into the cost function and applying penalties for violations. Furthermore, [68] introduces an algorithm to address nonlinear problems modeled by partial differential equations with noisy data through Bayesian physics-informed neural networks (B-PINNs). Additionally, [69] proposes a parametric differential-equation-based approach holding functional connections to enhance the robustness and accuracy of PINNs. In turn, [70] presents a truncated Fourier decomposition, termed Modal-PINNs, to optimize the reconstruction of periodic signals. However, these alternatives often lack adequate precision, generalization capability, and scalability [71]. Finally, supervised data is usually required, complicating their application in various NOPT scenarios.
7. Conclusions
We introduce a novel regularized physics-informed neural network framework, termed RPINN, which represents a significant advancement in addressing the challenges of nonlinear constrained optimization. By integrating custom activation functions and regularization penalties within an ANN architecture, RPINN effectively handles data variability and noisy inputs. Moreover, incorporating physics principles into the network architecture allows the optimization variables to be computed from network weights and learned features, leading to competitive performance against state-of-the-art solvers. Furthermore, the use of automatic differentiation for training enhances scalability and reduces computation time, making RPINN a robust solution for various NOPT tasks. Experimental results covered two scenarios involving supervised and unsupervised datasets.
The uniform mixture model experiments (supervised constrained NOPT) demonstrate RPINN's robustness to data variability and noisy samples. For noise-free data, both RPINN and the IPOPT solver achieved similar results owing to the convex nature of the problem. In scenarios with noisy inputs, however, RPINN significantly outperformed IPOPT. Leveraging the Huber loss function, RPINN showed greater robustness against noise by effectively regularizing the network weights. This yielded more accurate and stable output predictions than IPOPT, whose l2-norm-based objective was more sensitive to outliers. The concentrated RPINN weight distributions show that the model recovered the main output dynamics even under noise, as reflected in the lower mean absolute percentage error across all signal-to-noise ratio values.
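The robustness argument can be seen directly from the loss shapes. The following sketch (a generic Huber definition applied to an illustrative residual vector, not the paper's actual data or threshold) shows how Huber damps an outlier that would dominate an l2 objective:

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta,
                    0.5 * r ** 2,
                    delta * (a - 0.5 * delta))

residuals = np.array([0.1, -0.2, 8.0])   # last entry plays the outlier
l2 = 0.5 * residuals ** 2                # l2 penalizes the outlier as 32.0
hb = huber(residuals)                    # Huber caps its growth at 7.5
# small residuals are treated identically; only the outlier is damped
```

Because the gradient of the Huber branch is constant for large residuals, a single noisy sample cannot dominate the weight updates, which is the stabilizing effect described above.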
Then, the gas-powered system results (unsupervised constrained optimization) highlight the RPINN framework's capability to manage complex nonlinear constraints under varying gas-demand conditions. Compared to IPOPT, RPINN showed consistent performance with small variations in the mean absolute percentage error (MAPE), especially when gas demand exceeded the source's maximum capacity. While IPOPT achieved lower MAPE in terms of node balance and Weymouth constraints, its precision fluctuated significantly with data variability. In contrast, RPINN maintained stable performance, ensuring compliance with physical constraints such as the Weymouth equation and compression ratio limits. The custom penalty functions within RPINN facilitated this stability, proving particularly valuable where traditional methods struggled with outliers and extreme values. Overall, RPINN offered a robust, scalable solution with reduced prediction times.
As future work, the authors plan to include Bayesian hyperparameter optimization for RPINN fine-tuning [77]. We will also explore normalized and information-theoretic learning-based losses as ways to deal with noisy inputs and complicated constraints [78,79]. Finally, Bayesian PINNs and graph neural networks will be coupled with our RPINN to enhance representation learning [68,80].
Figure 1.
Classical optimization pipeline for NOPT.
Figure 2.
Regularized physics-informed neural network for data-driven nonlinear constrained optimization: main sketch.
Figure 4.
Optimizing gas-powered systems. An eight-node gas network is studied. The diagram depicts the nodes as points, and the arrows indicate flow direction. The trapezoidal shapes represent the pressure compressors.
Figure 5.
RPINN pipeline for the uniform mixture model-based NOPT.
Figure 6.
RPINN pipeline for the gas-powered system-based NOPT.
Figure 7.
RPINN uniform mixture model-based NOPT results. First row: SNR. Second row: SNR. Third row: noise-free. Left: output prediction. Right: weight distribution. Green: target. Red: noisy target. Black: RPINN. Blue: IPOPT.
Figure 8.
Uniform mixture model MAPE results. Left: output error. Right: weights error. (N): noise-free. (-1), (3), and (5) stand for the SNR values.
Figure 9.
Gas-powered system regularized loss illustration. Left: node balance and Weymouth penalties based on the conventional Huber loss. Middle: compression factor limit constraint using our Huber-based enhancement (see Eq. 10). Right: gas-powered system custom penalty evolution (blue: Weymouth equality constraint; orange: compression ratio limit constraint).
Figure 10.
Gas-powered system objective cost and constraint compliance MAPE results. Upper left: node balance. Upper right: Weymouth constraint. Bottom left: compression ratio constraint. Bottom right: cost difference (objective function) between RPINN and IPOPT.
Figure 11.
Gas-powered system bound constraint MAPE results. The star symbol on this graph denotes the defined limits for each of the sources, compressors, pipelines, and pressures as well as their behavior. The number on the x-axis indicates the node to which the information belongs. MMSCFD: Million standard cubic feet per day. psia: pounds per square inch absolute.
Figure 12.
RPINN vs. IPOPT computational cost results. The graph compares solution times for the test data between the classical technique (IPOPT, in blue) and our strategy (RPINN, in green). On the left, the training times are shown, while on the right, the prediction times are displayed.
Table 1.
State-of-the-art solvers for optimization. (*) Except mixed-integer SDP. (**) Features available with the licensed version only.
Solver | LP | QP | SOCP | SDP | EXP | PCP | MIP | NLP | Strategy | Open source | Software
Clarabel [46] | ✓ | ✓ | ✓ | ✓ | ✓ | x | x | x | IP | ✓ | CVXPY
Gurobi [47] | ✓ | ✓ | ✓ | x | x | x | ✓ | x | IP, Simplex, BC | x | MATPOWER, CVXPY
Mosek [48] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓* | x | IP | x | MATPOWER, CVXPY
Xpress [49] | ✓ | ✓ | ✓ | x | x | x | ✓ | ✓** | IP, Simplex, BC | x | CVXPY
SCS [50,59] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | x | IP | ✓ | CVXPY
IPOPT [51] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | IP | ✓ | MATPOWER, GEKKO
Table 2.
RPINN details for the uniform mixture model-based NOPT. : batch-size for AD-based back-propagation. Param. #: number of trainable parameters. Total # of parameters: 30.
Layer name | Type | Output shape | Param. #
Input | InputLayer | (, 5) | 0
Dense_1 | Dense(SELU) | (, 5) | 25
Dense_2 | Dense(SELU, l1-max-constraint) | (, 1) | 5
Table 3.
RPINN architecture details for the gas-powered system NOPT.
: batch-size for AD-based back-propagation. Source switching, unsupply gas switching, custom dense, and bounded dense stand for specific switching, limited, and scaled layers, as explained in
Section 4.2. Param. #: number of trainable parameters. Total # of parameters: 11855.
Layer name | Type | Output shape | Param. #
Input | InputLayer | (, 8) | 0
Dense_1 | Dense(SELU) | (, 236) | 2124
Dense_2 | Dense(SELU) | (, 8) | 1896
Source switching | CustomDense | (, 1) | 1
BatchNormalization_1 | BatchNormalization | (, 236) | 944
BatchNormalization_2 | BatchNormalization | (, 8) | 32
Partial flows | BoundedDense | (, 50) | 1422
Unsupply gas switching | CustomDense | (, 8) | 0
Flow prediction | Concatenate | (, 59) | 0
Dense_3 | Dense(SELU) | (, 236) | 2124
BatchNormalization_3 | BatchNormalization | (, 236) | 944
Pressure prediction | BoundedDense | (, 8) | 1896
Node balance | CustomDense | (, 8) | 472
Weymouth | CustomDense | (, 14) | 0