Submitted:
22 July 2025
Posted:
23 July 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Classical Mathematical Approaches
2.2. Metaheuristic Approaches
2.3. Machine Learning and Reinforcement Learning Approaches
3. Problem Formulation
3.1. Convex Economic Dispatch Problem
3.1.1. Power Balance Constraint
3.1.2. Generator Capacity Constraints
3.2. Nonconvex Economic Dispatch Problems
3.2.1. Valve-Point Effects
3.2.2. Prohibited Operating Zones
3.3. Additional Operational Constraints
3.3.1. Spinning Reserve Constraint
3.3.2. Ramp Rate Constraints
3.4. Constraint Handling Approaches
3.4.1. Slack Variable Method for Equality Constraints
3.4.2. Penalty Function Approach for Inequality Constraints
4. Group Relative Policy Optimization
4.1. Core Principles
- Group-based Learning: Instead of training a single policy, GRPO maintains a population of policies that learn collaboratively.
- Relative Performance Assessment: Policies are evaluated not only on their absolute performance but also on their performance relative to other policies in the group.
- Trust Region Optimization: Policy updates are constrained to prevent excessive deviations from the established consensus of the group.
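The idea of relative performance assessment can be illustrated with a minimal sketch. The paper's own advantage formula is not reproduced in this outline, so the mean and standard-deviation normalization below is an assumption, and the function name `relative_advantages` and the three cost values are hypothetical.

```python
import numpy as np

def relative_advantages(costs, eps=1e-8):
    """Score each group member relative to the rest of the group.

    Assumption: rewards are negative costs, standardized by the group
    mean and standard deviation. This is an illustrative choice only.
    """
    costs = np.asarray(costs, dtype=float)
    rewards = -costs  # lower cost means higher reward
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical costs ($/h) for a group of three candidates.
adv = relative_advantages([32500.0, 32450.0, 32600.0])
print(adv)
```

Under this normalization the advantages sum to approximately zero, so a candidate is rewarded only for beating the group average, not for its absolute cost.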
4.2. The GRPO Framework
4.2.1. Policy Population
- Multiple starting points in the solution space, reducing the risk of being trapped in local optima
- Diverse exploration strategies that collectively cover more of the solution space
- Robustness against individual policy failures through information sharing
4.2.2. Relative Advantage Estimation
4.2.3. Elite Reference Set
- Providing reference strategies for other policies to learn from
- Stabilizing the learning process by preserving successful approaches
- Guiding the exploration toward promising regions of the solution space
4.2.4. Group Trust Region
- Improved stability during learning
- Prevention of catastrophic forgetting
- Balanced exploration and exploitation across the policy population
4.3. Learning Process
- Experience Collection: Each policy interacts with the environment independently
- Relative Performance Evaluation: Policies are assessed against the group
- Elite Selection: Top-performing policies are identified (typically 20-30%)
- Policy Update: All policies are updated based on their experience and elite influence
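The four steps above can be sketched as a single iteration over a population of candidate vectors. This is a minimal sketch under stated assumptions, not the authors' implementation: the update rule (pull toward the elite mean, add Gaussian noise, keep elites unchanged), the function name `grpo_step`, and the toy `sphere` cost are all hypothetical. The defaults 0.3, 0.5 and 0.05 mirror the elite percentage, elite influence and initial noise level listed in Section 6.1.

```python
import numpy as np

def grpo_step(population, cost_fn, rng,
              elite_frac=0.3, elite_influence=0.5, noise=0.05):
    """One illustrative iteration over a population of candidate vectors."""
    # 1. Experience collection: every candidate is evaluated independently.
    costs = np.array([cost_fn(x) for x in population])
    # 2. Relative performance evaluation: rank candidates within the group.
    order = np.argsort(costs)
    # 3. Elite selection: keep the lowest-cost fraction of the group.
    n_elite = max(1, int(round(elite_frac * len(population))))
    elite_idx = order[:n_elite]
    elites = population[elite_idx].copy()
    # 4. Policy update (assumed rule): pull toward the elite mean, then perturb.
    elite_mean = elites.mean(axis=0)
    updated = population + elite_influence * (elite_mean - population)
    updated += noise * rng.standard_normal(population.shape)
    updated[elite_idx] = elites  # assumed elitism: elites survive unchanged
    return updated, float(costs[order[0]])

def sphere(x):
    """Toy stand-in cost; a real run would evaluate the dispatch cost."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
population = rng.uniform(-5.0, 5.0, size=(50, 3))
history = []
for _ in range(20):
    population, best = grpo_step(population, sphere, rng)
    history.append(best)
```

Because the elites are copied back unchanged in this sketch, the best cost recorded in `history` cannot increase from one iteration to the next.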
4.4. Mathematical Formulation
4.4.1. Policy Parameterization
4.4.2. Relative Advantage Function
4.4.3. Objective Function
4.4.4. Group-Based Trust Region
4.4.5. Elite Selection and Update
4.5. Adaptive Hyperparameter Tuning
4.5.1. Adaptive Learning Rate
4.5.2. Adaptive Exploration
4.6. Theoretical Properties
4.6.1. Improved Sample Efficiency
4.6.2. Enhanced Exploration
4.6.3. Stronger Convergence Guarantees
4.6.4. Performance Bounds
4.7. Key Advantages of GRPO
4.7.1. Enhanced Exploration
4.7.2. Improved Stability
4.7.3. Robustness to Local Optima
4.7.4. Efficient Knowledge Sharing
5. GRPO Implementation for Economic Dispatch Problem
5.1. Problem Representation
5.1.1. State and Action Spaces
- Power demand
- Current generator outputs
- System constraints including prohibited operating zones
- Spinning reserve requirements
5.2. GRPO Architecture for EDP
5.2.1. Policy Population Design
5.2.2. Smart Initialization Strategy
5.3. Constraint Handling Mechanisms
5.3.1. Prohibited Operating Zones
- Detection: For each generator i, we check whether the current power output falls within any of its prohibited zones.
- Repair: If a violation is detected, we adjust the power output to the nearest allowed region, chosen from the set of allowed operating regions for generator i.
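A minimal sketch of the detection and repair steps follows. It assumes each prohibited zone is an open interval `(low, high)` whose bounds are themselves allowed, and that ties are resolved toward the lower bound; the function names and the example zone values are hypothetical and are not taken from the test systems.

```python
def in_prohibited_zone(p, zones):
    """Return the (low, high) zone that strictly contains p, or None."""
    for low, high in zones:
        if low < p < high:
            return (low, high)
    return None

def repair_to_nearest_bound(p, zones):
    """Move p to the nearest edge of the zone it violates (assumed rule)."""
    zone = in_prohibited_zone(p, zones)
    if zone is None:
        return p  # already in an allowed region
    low, high = zone
    return low if (p - low) <= (high - p) else high

# Hypothetical zones (MW) for one generator.
zones = [(185.0, 225.0), (305.0, 335.0)]
repaired_low = repair_to_nearest_bound(200.0, zones)   # closer to 185
repaired_high = repair_to_nearest_bound(220.0, zones)  # closer to 225
unchanged = repair_to_nearest_bound(250.0, zones)      # not in any zone
```

Snapping to a zone boundary keeps the repair cheap, at the price of placing many repaired candidates exactly on a boundary.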
5.3.2. Power Balance Constraint
- Calculate the current imbalance between total generation and power demand
- Identify adjustable generators based on their cost efficiency and available capacity
- Allocate the imbalance among the adjustable generators, prioritizing those with lower cost coefficients when increasing power and those with higher cost coefficients when decreasing power
- After each adjustment, verify that the generator remains outside prohibited zones; if not, find the nearest valid operating point
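The allocation steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each unit is ranked by a single hypothetical cost coefficient, transmission losses are ignored, and the final prohibited-zone re-check is left out to keep the example short. The function name `balance_power` and all numeric values are hypothetical.

```python
def balance_power(p, demand, p_min, p_max, cost_coeff, tol=1e-9):
    """Spread the generation/demand imbalance over units, cheapest first."""
    p = list(p)
    imbalance = demand - sum(p)  # positive: generation must increase
    if imbalance > 0:
        # Raise output on the cheapest units first.
        order = sorted(range(len(p)), key=lambda i: cost_coeff[i])
    else:
        # Lower output on the most expensive units first.
        order = sorted(range(len(p)), key=lambda i: -cost_coeff[i])
    for i in order:
        if abs(imbalance) <= tol:
            break
        if imbalance > 0:
            delta = min(imbalance, p_max[i] - p[i])  # limited by headroom
        else:
            delta = max(imbalance, p_min[i] - p[i])  # limited by footroom
        p[i] += delta
        imbalance -= delta
    return p, imbalance

# Hypothetical three-unit system (MW).
p_min, p_max = [50.0, 50.0, 50.0], [120.0, 200.0, 150.0]
cost = [3.0, 1.0, 2.0]
raised, left_up = balance_power([100.0, 100.0, 100.0], 350.0, p_min, p_max, cost)
lowered, left_down = balance_power([100.0, 100.0, 100.0], 280.0, p_min, p_max, cost)
```

A non-zero returned imbalance signals that the capacity limits alone made the demand unreachable, which a caller would need to handle separately.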
5.3.3. Spinning Reserve Constraint
5.4. GRPO Learning Process for EDP
5.4.1. Candidate Generation
5.4.2. Evaluation and Elite Selection
5.4.3. Policy Update Mechanism
5.5. Adaptive Mechanisms
5.5.1. Adaptive Noise Level
5.5.2. Solution Caching
5.5.3. Final Power Balance Adjustment
Algorithm 1. GRPO for Economic Dispatch Problem
Algorithm 2. SmartInitialization for GRPO-EDP
Algorithm 3. Constraint Handling for GRPO-EDP
6. Experimental Results and Discussion
6.1. Experimental Setup
| Parameter | Symbol | Value | Description |
|---|---|---|---|
| Population size | K | 50 | Number of candidate solutions |
| Maximum iterations | T | 200 | Maximum number of iterations |
| Initial noise level | | 0.05 | Initial exploration magnitude |
| Noise decay rate | | 0.98 | Rate of exploration reduction |
| Minimum noise | | 0.001 | Lower bound for noise level |
| Elite percentage | E | 0.3 | Proportion of elite solutions |
| Elite influence | | 0.5 | Weight for elite-based update |
| Early stopping threshold | | 32450 | Cost threshold for early termination |
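To make the interaction of the three noise parameters concrete, the sketch below assumes a geometric schedule in which the noise level is multiplied by the decay rate once per iteration and clipped at the minimum. The schedule form and the function name `noise_level` are assumptions; only the values 0.05, 0.98 and 0.001 come from the table.

```python
def noise_level(t, initial=0.05, decay=0.98, minimum=0.001):
    """Noise magnitude at iteration t under an assumed geometric decay."""
    return max(minimum, initial * decay ** t)

# First iteration at which the floor becomes active.
first_floor = next(t for t in range(1000) if noise_level(t) == 0.001)
print(first_floor)
```

Under this assumed schedule the floor is reached only at iteration 194, that is, near the end of the 200-iteration budget, so the minimum noise would matter only in the last few iterations.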
6.2. Test Cases
6.2.1. Performance on 15-Unit System
6.3. Comparative Analysis
6.4. GRPO Performance on Larger-Scale Systems: 30, 60, 90 Units
6.4.1. Performance on 30-Unit System
6.4.2. Performance on 60-Unit System
6.4.3. Performance on 90-Unit System
6.5. Comparative Analysis of Solution Quality
7. Discussion
7.1. Convergence Behavior and Learning Stability
7.2. Constraint Handling and Feasibility
7.3. Scalability and Generalizability
7.4. Practical Implications and Future Work
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Fang, X.; Misra, S.; Xue, G.; Yang, D. Smart Grid—The New and Improved Power Grid: A Survey. IEEE Communications Surveys & Tutorials 2012, 14, 944–980.
2. Siano, P. Demand response and smart grids—A survey. Renewable and Sustainable Energy Reviews 2014, 30, 461–478.
3. Wood, A.J.; Wollenberg, B.F.; Sheblé, G.B. Power Generation, Operation, and Control, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013.
4. Chaturvedi, K.; Pandit, M.; Srivastava, L. Self-Organizing Hierarchical Particle Swarm Optimization for Nonconvex Economic Dispatch. IEEE Transactions on Power Systems 2008, 23, 1079–1087.
5. Zia, M.F.; Elbouchikhi, E.; Benbouzid, M. Microgrids energy management systems: A critical review on methods, solutions, and prospects. Applied Energy 2018, 222, 1033–1055.
6. Secui, D.C. A new modified artificial bee colony algorithm for the economic dispatch problem. Energy Conversion and Management 2015, 89, 43–62.
7. Pradhan, M.; Roy, P.K.; Pal, T. Grey wolf optimization applied to economic load dispatch problems. International Journal of Electrical Power & Energy Systems 2016, 83, 325–334.
8. Mohamed, A.E.A.W.; Abido, M.A.; Ali, A. Economic dispatch solution using chaotic particle swarm optimization algorithm. Energy 2017, 118, 861–874.
9. Li, S.; Gong, W.; Yan, X.; Hu, C.; Bai, D.; Wang, L.; Gao, L. A comprehensive review of hybrid meta-heuristic optimization algorithms for solving economic dispatch problems. Applied Soft Computing 2020, 92, 106311.
10. Yang, T.; Zhao, L.; Li, W. Deep Reinforcement Learning Based Approach for Solving Economic Dispatch Problems. IEEE Access 2019, 7, 120641–120649.
11. Richardson, K.; Sabharwal, A. Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability. Proceedings of the AAAI Conference on Artificial Intelligence 2022, 36, 11209–11219.
12. Visutarrom, T.; Chiang, T.C. Economic dispatch using metaheuristics: Algorithms, problems, and solutions. Applied Soft Computing 2024, 150, 110891.
13. Elsayed, W.; Hegazy, Y.; Bendary, F.; El-Bages, M. A review on accuracy issues related to solving the non-convex economic dispatch problem. Electric Power Systems Research 2016, 141, 325–332.
14. Chiang, C.L. Improved Genetic Algorithm for Power Economic Dispatch of Units With Valve-Point Effects and Multiple Fuels. IEEE Transactions on Power Systems 2005, 20, 1690–1699.
15. Walters, D.; Sheble, G. Genetic algorithm solution of economic dispatch with valve point loading. IEEE Transactions on Power Systems 1993, 8, 1325–1332.
16. Qu, B.; Zhu, Y.; Jiao, Y.; Wu, M.; Suganthan, P.; Liang, J. A survey on multi-objective evolutionary algorithms for the solution of the environmental/economic dispatch problems. Swarm and Evolutionary Computation 2018, 38, 1–11.
17. Dhillon, J.; K. Jain, S. Multi-Objective Generation and Emission Dispatch Using NSGA-II. International Journal of Engineering and Technology 2011, 3, 460–466.
18. Gaing, Z.L. Particle swarm optimization to solving the economic dispatch considering the generator constraints. IEEE Transactions on Power Systems 2003, 18, 1187–1195.
19. Abbas, G.; Gu, J.; Farooq, U.; Asad, M.U.; El-Hawary, M. Solution of an Economic Dispatch Problem Through Particle Swarm Optimization: A Detailed Survey - Part I. IEEE Access 2017, 5, 15105–15141.
20. Chen, X. Novel dual-population adaptive differential evolution algorithm for large-scale multi-fuel economic dispatch with valve-point effects. Energy 2020, 203, 117874.
21. Goni, M.O.F.; Nahiduzzaman, M.; Anower, M.S.; Kamwa, I.; Muyeen, S. Integration of machine learning with economic energy scheduling. International Journal of Electrical Power & Energy Systems 2022, 142, 108343.
22. Visutarrom, T.; Chiang, T.C.; Konak, A.; Kulturel-Konak, S. Reinforcement Learning-Based Differential Evolution for Solving Economic Dispatch Problems. In Proceedings of the 2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM). IEEE; 2020; pp. 913–917.
23. Sage, M.; Zhao, Y.F. Deep reinforcement learning for economic battery dispatch: A comprehensive comparison of algorithms and experiment design choices. Journal of Energy Storage 2025, 115, 115428.
24. Chen, M.; Shen, Z.; Wang, L.; Zhang, G. Intelligent Energy Scheduling in Renewable Integrated Microgrid With Bidirectional Electricity-to-Hydrogen Conversion. IEEE Transactions on Network Science and Engineering 2022, 9, 2212–2223.
25. Zhan, J.; Wu, Q.; Guo, C.; Zhou, X. Fast λ-Iteration method for economic dispatch with prohibited operating zones. IEEE Transactions on Power Systems 2013, 29, 990–991.
26. Yalcinoz, T.; Altun, H.; Hasan, U. Constrained economic dispatch with prohibited operating zones: a Hopfield neural network approach. In Proceedings of the 2000 10th Mediterranean Electrotechnical Conference. Information Technology and Electrotechnology for the Mediterranean Countries. Proceedings. MeleCon 2000 (Cat. No. 00CH37099). IEEE, Vol. 2; 2000; pp. 570–573.
27. Su, C.T.; Chiou, G.J. An enhanced Hopfield model for economic dispatch considering prohibited zones. Electric Power Systems Research 1997, 42, 72–76.
28. Neto, J.X.V.; de Andrade Bernert, D.L.; dos Santos Coelho, L. Improved quantum-inspired evolutionary algorithm with diversity information applied to economic dispatch problem with prohibited operating zones. Energy Conversion and Management 2011, 52, 8–14.
29. Khoa, T.H.; Vasant, P.M.; Singh, M.S.B.; Dieu, V.N. Swarm based mean-variance mapping optimization for convex and non-convex economic dispatch problems. Memetic Computing 2016, 9, 91–108.
30. Su, C.T. Nonconvex Power Economic Dispatch by Improved Genetic Algorithm with Multiplier Updating Method. Electric Power Components and Systems 2004, 32, 257–273.






| Generator | Power Output (MW) | Generator | Power Output (MW) |
|---|---|---|---|
| 1 | 422.7000 | 9 | 25.0000 |
| 2 | 454.9785 | 10 | 20.0257 |
| 3 | 130.0000 | 11 | 20.0000 |
| 4 | 130.0000 | 12 | 22.5082 |
| 5 | 388.0750 | 13 | 25.0000 |
| 6 | 456.7136 | 14 | 15.0000 |
| 7 | 465.0000 | 15 | 15.0000 |
| 8 | 60.0000 | | |
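As a quick consistency check on the dispatch table above, the fifteen reported outputs can be summed and compared against the system demand. The demand value is not stated in this outline, so the sketch only computes the total; the variable names are hypothetical.

```python
# Reported outputs (MW) for generators 1 to 15, copied from the table above.
outputs_mw = [
    422.7000, 454.9785, 130.0000, 130.0000, 388.0750,
    456.7136, 465.0000, 60.0000, 25.0000, 20.0257,
    20.0000, 22.5082, 25.0000, 15.0000, 15.0000,
]

total_mw = sum(outputs_mw)
print(f"Total generation: {total_mw:.4f} MW")
```

The total comes to about 2650.001 MW, so the reported schedule would satisfy a lossless power balance for a demand of roughly 2650 MW to within about 0.001 MW.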
| Method | Best Cost ($/h) | Average Cost ($/h) | CPU Time (s) |
|---|---|---|---|
| GA [18] | 33,113.00 | 33,228.00 | 49.31 |
| PSO [18] | 32,858.00 | 33,039.00 | 26.59 |
| λ-iterative [25] | 32,704.45 | - | - |
| IHNN [26] | 32,858.00 | - | - |
| EHNN [27] | 32,555.00* | - | - |
| EP [27] | 32,715.94 | - | - |
| QEA [28] | 32,576.45 | - | - |
| IQEA [28] | 32,574.03 | - | - |
| MVMO [29] | 32,569.54 | 32,572.37 | 9.651 |
| Swarm-based MVMO [29] | 32,563.58 | 32,565.03 | 10.258 |
| GRPO (Proposed) | 32,421.67 | 32,456.37 | 13.545 |

*The solution from EHNN is not fully feasible, with 0.8 MW of unallocated power.
| Method | No. of units | Min cost (best) ($) | Average cost ($) | CPU time (s) |
|---|---|---|---|---|
| MVMO [29] | 30 | 65,086.3370 | 65,090.2023 | 17.051 |
| MVMO [29] | 60 | 130,170.8046 | 130,175.0956 | 30.030 |
| MVMO [29] | 90 | 195,258.6600 | 195,263.5962 | 41.574 |
| Swarm-based MVMO [29] | 30 | 65,086.2051 | 65,089.2153 | 18.096 |
| Swarm-based MVMO [29] | 60 | 130,170.7797 | 130,175.0130 | 31.325 |
| Swarm-based MVMO [29] | 90 | 195,258.4951 | 195,263.5819 | 43.633 |
| CGA [30] | 30 | - | 65,784.740 | 275.73 |
| CGA [30] | 60 | - | 131,992.310 | 563.81 |
| CGA [30] | 90 | - | 198,831.690 | 940.93 |
| IGAMUM [30] | 30 | - | 65,089.954 | 79.80 |
| IGAMUM [30] | 60 | - | 130,180.030 | 162.58 |
| IGAMUM [30] | 90 | - | 195,274.060 | 255.45 |
| GRPO (Proposed) | 30 | 64,558.0856 | 64,593.320 | 37.16 |
| GRPO (Proposed) | 60 | 129,217.270 | 129,249.667 | 79.98 |
| GRPO (Proposed) | 90 | 193,936.087 | 194,055.320 | 138.14 |
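The size of the cost gap in the table above is easier to judge as a relative reduction. The sketch below compares the GRPO best cost with the lower of the two best costs reported for [29] at each system size; the dictionary layout and the key names `ref29` and `grpo` are hypothetical, while the numbers are copied from the table.

```python
# Best costs ($) per system size, taken from the table above.
best_cost = {
    30: {"ref29": 65086.2051, "grpo": 64558.0856},
    60: {"ref29": 130170.7797, "grpo": 129217.270},
    90: {"ref29": 195258.4951, "grpo": 193936.087},
}

# Relative reduction of the GRPO best cost, in percent of the reference cost.
reduction_pct = {
    n: 100.0 * (v["ref29"] - v["grpo"]) / v["ref29"]
    for n, v in best_cost.items()
}

for n, pct in reduction_pct.items():
    print(f"{n} units: {pct:.3f} % lower best cost")
```

On these figures the reduction stays below one percent at every size and shrinks slightly as the system grows, while the table also shows GRPO needing roughly two to three times the CPU time of the [29] methods.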
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).