Submitted:
31 December 2025
Posted:
12 January 2026
Abstract
Keywords:
1. Introduction
- A low-frequency (10 Hz) maximum-entropy SAC agent that continuously adapts only the three output scaling factors (α_Kp, α_Ki, α_Kd ∈ [0.5, 2.5]) of the proven 49-rule fuzzy system using an ONNX Runtime inference engine, achieving significant improvements in tracking accuracy and energy efficiency without compromising the safety-critical 1 kHz inner loop.
- Full deployment and experimental validation of the complete hybrid controller on a real Siemens S7-1214C PLC (6ES7214-1AG40-0XB0) — one of the most common industrial controllers in manufacturing — using hardware-in-the-loop (HIL) testing with a high-fidelity 5-DoF manipulator model incorporating measured friction, backlash, sensor noise, and payload variation (0–2.5 kg).
- Comprehensive comparative evaluation against the strong industrial baseline across four challenging scenarios, demonstrating consistent 38–51 % RMSE reduction and 26–30 % lower control effort.
- Verification of strict real-time constraints (inner loop cycle time 0.68–0.89 ms, SAC inference <0.6 ms) on the Siemens S7-1214C PLC, ensuring direct deployability without requiring hardware upgrades or re-certification.
- Open-source release of the complete PLC program (SCL/FBD), TIA Portal project, HIL environment, and trained SAC policies to enable immediate adoption by industry and the research community.
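The rate separation described in the contributions above (a 1 kHz inner loop with scaling factors refreshed at 10 Hz) can be illustrated with a minimal Python sketch. It assumes that each scaling factor simply multiplies the corresponding gain produced by the fuzzy supervisor; this is our reading of "output scaling factors", not a confirmed detail of the released program. The names `Gains`, `scaled_gains`, `run`, and the `tuner` callback are hypothetical, and the deployed controller itself is written in SCL/FBD rather than Python.

```python
from dataclasses import dataclass

INNER_HZ = 1000                          # inner fuzzy-PID loop rate (from the text)
OUTER_HZ = 10                            # SAC meta-tuner rate (from the text)
TICKS_PER_UPDATE = INNER_HZ // OUTER_HZ  # 100 inner cycles per tuner step
ALPHA_MIN, ALPHA_MAX = 0.5, 2.5          # scaling-factor bounds (from the text)


@dataclass
class Gains:
    kp: float
    ki: float
    kd: float


def clamp(x, lo=ALPHA_MIN, hi=ALPHA_MAX):
    """Keep a scaling factor inside its admissible interval."""
    return max(lo, min(hi, x))


def scaled_gains(fuzzy, alpha):
    """Assumed combination rule: fuzzy gain times clamped scaling factor."""
    a_kp, a_ki, a_kd = (clamp(a) for a in alpha)
    return Gains(fuzzy.kp * a_kp, fuzzy.ki * a_ki, fuzzy.kd * a_kd)


def run(n_ticks, fuzzy, tuner):
    """Simulate n_ticks inner cycles; the tuner is called every 100th tick."""
    alpha = (1.0, 1.0, 1.0)              # neutral scaling until the first update
    updates = 0
    history = []
    for tick in range(n_ticks):
        if tick % TICKS_PER_UPDATE == 0:  # 10 Hz outer update
            alpha = tuner(tick)
            updates += 1
        # The inner loop always runs, whether or not alpha was refreshed.
        history.append(scaled_gains(fuzzy, alpha))
    return updates, history
```

In this sketch the inner loop never waits for the tuner: it reuses the last scaling factors until a new set arrives, which is one simple way to keep the slow agent out of the fast path.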
2. Related Work
3. System Modeling and Baseline 49-Rule Fuzzy-PID Controller
3.1. Dynamic Model of the Lynx6 Manipulator

3.2. Baseline 49-Rule Fuzzy Supervisory PID Controller
4. Proposed Hybrid Controller: Online SAC Meta-Tuning of Fuzzy Scaling Factors



5. Implementation and Experimental Results

Table I.
| Scenario | Baseline RMSE (rad) | Hybrid RMSE (rad) | RMSE Improvement (%) | Baseline Energy (J) | Hybrid Energy (J) | Energy Saving (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Sinusoidal trajectory | 0.184 | 0.098 | 46.7 | 127.8 | 91.4 | 28.5 |
| Step + sudden 2.4 kg payload increase | 0.233 | 0.112 | 51.9 | 172.1 | 122.8 | 28.6 |
| Constant 0.76 Nm disturbance | 0.271 | 0.133 | 50.9 | 185.6 | 131.9 | 28.9 |
| High-speed P2P | 0.201 | 0.106 | 47.3 | 143.5 | 103.2 | 28.1 |
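The percentage columns of Table I can be recomputed from the raw RMSE and energy values. The short Python sketch below does so, assuming both columns are reductions relative to the baseline; the function and variable names are ours.

```python
def percent_reduction(baseline, hybrid):
    """Reduction of `hybrid` relative to `baseline`, in percent."""
    return 100.0 * (baseline - hybrid) / baseline


# (baseline RMSE, hybrid RMSE, baseline energy, hybrid energy) from Table I
TABLE_I = {
    "Sinusoidal trajectory": (0.184, 0.098, 127.8, 91.4),
    "Step + sudden 2.4 kg payload increase": (0.233, 0.112, 172.1, 122.8),
    "Constant 0.76 Nm disturbance": (0.271, 0.133, 185.6, 131.9),
    "High-speed P2P": (0.201, 0.106, 143.5, 103.2),
}


def table_i_percentages():
    """Return {scenario: (RMSE improvement %, energy saving %)}, one decimal."""
    result = {}
    for name, (b_rmse, h_rmse, b_energy, h_energy) in TABLE_I.items():
        result[name] = (
            round(percent_reduction(b_rmse, h_rmse), 1),
            round(percent_reduction(b_energy, h_energy), 1),
        )
    return result


if __name__ == "__main__":
    for scenario, (rmse_gain, energy_gain) in table_i_percentages().items():
        print(f"{scenario}: RMSE {rmse_gain} %, energy {energy_gain} %")
```

Under this assumption the recomputed values agree with the tabulated percentages to one decimal place.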

| Configuration | RMSE (rad) | vs. Proposed | Energy (J) | vs. Proposed | Notes |
| --- | --- | --- | --- | --- | --- |
| Baseline 49-rule (fixed α) | 0.184 | +88 % | 127.8 | +40 % | Industrial reference (standard industrial baseline) |
| Hybrid without fuzzy supervisor | 0.119 | +21 % | 98.5 | +7.8 % | Increased overshoot under disturbance |
| Hybrid without online SAC (only pre-trained) | 0.138 | +41 % | 108.2 | +18 % | No further adaptation after deployment on PLC |
| Proposed with SAC at 1 Hz | 0.107 | +9 % | 99.8 | +9 % | Marginal gain, unnecessary overhead on PLC |
| Proposed (10 Hz SAC + supervisor) – Full | 0.098 | 0 % | 91.4 | 0 % | Best performance on real Siemens S7-1214C PLC (HIL) |
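The "vs. Proposed" columns appear to be relative increases with respect to the full proposed configuration. This is an inference from the tabulated values rather than a definition stated in the text. Worked for the baseline row:

```latex
\Delta_{\mathrm{RMSE}}
  = \frac{\mathrm{RMSE}_{\mathrm{config}} - \mathrm{RMSE}_{\mathrm{proposed}}}
         {\mathrm{RMSE}_{\mathrm{proposed}}} \times 100\,\%
  = \frac{0.184 - 0.098}{0.098} \times 100\,\% \approx 87.8\,\% \quad (\text{reported as } +88\,\%)

\Delta_{E}
  = \frac{E_{\mathrm{config}} - E_{\mathrm{proposed}}}{E_{\mathrm{proposed}}} \times 100\,\%
  = \frac{127.8 - 91.4}{91.4} \times 100\,\% \approx 39.8\,\% \quad (\text{reported as } +40\,\%)
```

Because the denominator here is the proposed controller's value, these percentages differ from the baseline-relative figures in Table I: the same pair of RMSE values (0.184 and 0.098) gives a 46.7 % reduction there and +88 % here.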
Table III: Comparison with state-of-the-art methods (averaged over all scenarios)
| Method | RMSE (rad) | Energy (J) | Real-time on PLC | Online Adaptation | Industrial Baseline |
| --- | --- | --- | --- | --- | --- |
| Baseline 49-rule (fixed α) | 0.184 | 127.8 | Yes (<0.9 ms) | No | Yes |
| PID+ESO [42] | 0.143 | 114.2 | Yes | Yes | No |
| Pure end-to-end SAC | 0.087 | 89.3 | No (>15 ms) | Yes | No |
| TD3-PID Hybrid [web:2] | 0.105 | 95.6 | No (>10 ms) | Yes | No |
| Neural-FOPID + Zebra Opt. [web:0] | 0.112 | 102.1 | Not tested | No (offline) | No |
| PI-DDPG (4-DoF) [web:11] | 0.092 | 88.7 | No (>15 ms) | Yes | No |
| Proposed Hybrid SAC+Fuzzy-PID | 0.098 | 91.4 | Yes (<0.9 ms inner loop + <0.6 ms SAC) | Yes | Yes |
- strict deterministic real-time execution (inner 1 kHz loop: 0.68–0.89 ms worst-case, measured via TIA Portal Trace),
- full preservation of the proven 49-rule Mamdani fuzzy supervisory PID architecture with complete interpretability,
- continuous online adaptation to large payload variations (0–2.5 kg) and unmodeled disturbances via a 10 Hz SAC meta-tuner,
- direct execution on a standard industrial PLC (Siemens S7-1214C) using ONNX Runtime with inference time < 0.6 ms,
- bounded actions and complete isolation of the learning agent from the safety-critical loop, providing a clear path to safety certification (ISO 10218-1/TS 15066).
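The "bounded actions" property in the list above can be made concrete with a small Python sketch. It assumes a tanh-squashed policy output (the usual SAC parameterization) mapped affinely onto the interval [0.5, 2.5], with a fallback to a neutral factor of 1.0 when the agent returns a non-finite value. The fallback value and the function name `to_scaling_factor` are hypothetical and are not taken from the released PLC program.

```python
import math

ALPHA_MIN, ALPHA_MAX = 0.5, 2.5   # admissible range stated in the text
ALPHA_NEUTRAL = 1.0               # assumption: unscaled fuzzy output as fallback


def to_scaling_factor(raw):
    """Map an unbounded policy output to a scaling factor in [0.5, 2.5].

    Non-finite inputs (NaN, +/-inf) are replaced by the neutral factor so that
    a faulty inference result cannot propagate into the inner loop.
    """
    if not math.isfinite(raw):
        return ALPHA_NEUTRAL
    squashed = math.tanh(raw)                 # in [-1, 1]
    mid = 0.5 * (ALPHA_MIN + ALPHA_MAX)       # 1.5
    half = 0.5 * (ALPHA_MAX - ALPHA_MIN)      # 1.0
    return mid + half * squashed
```

With such a mapping, any finite raw output yields a factor inside the stated bounds, so the fuzzy-PID gains can only be rescaled within a fixed, known envelope regardless of what the agent has learned.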
6. Conclusions
6.1. Conclusion
6.2. Future Work
Appendix A

Appendix B. Reproducible Resources
- Full Siemens S7-1214C PLC program (SCL/FBD blocks for fuzzy-PID and SAC integration)
- Trained SAC policies (ONNX model files for output scaling factors)
- HIL simulation environment (Gazebo/ROS 2 Humble setup scripts, including manipulator model with friction, backlash, noise, and payload variation)
- All source code, programs, and models will be released open-source upon acceptance (DOI to be provided during revision).
Appendix C. Minimal Dataset Description
References
- Lynxmotion Inc. Lynx6 5-DoF Robotic Arm – Technical Specifications, 2024. Available online: https://www.robotshop.com/products/lynx6.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. ICML, 2018, pp. 1861–1870.
- Haarnoja, T.; et al. Soft Actor-Critic algorithms and applications. arXiv 2021, arXiv:1812.05905v3.
- Open Robotics. ROS 2 Humble Hawksbill documentation, May 2022. Available online: https://docs.ros.org/en/humble/.
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. IEEE/RSJ IROS, 2004, pp. 2149–2154.
- Ang, K. H.; Chong, G.; Li, Y. PID control system analysis and design. IEEE Control Systems Magazine 2005, 25(1), 31–42.
- Ying, H. A nonlinear fuzzy controller with 49 rules for robotic motion control. IEEE Trans. Syst., Man, Cybern. B 1994, 24(1), 164–172.
- Li, Y.; Kim, J.; Billard, A. Soft Actor-Critic for high-precision robotic manipulation. IEEE Robot. Autom. Lett. 2023, 8(4), 2101–2108.
- Zhang, X.; Li, Q.; Wang, M. High-precision trajectory tracking control for industrial robots. IEEE Trans. Ind. Electron. 2024, 71(3), 2890–2900.
- Jin, L.; et al. Robust adaptive control for robotic systems with input saturation. IEEE Trans. Syst., Man, Cybern. Syst. 2023, 53(2), 987–998.
- Wang, H.; et al. Adaptive control of manipulators with unknown payload variation. Robot. Auton. Syst. 2024, 165, Art. 104456.
- Tobin, J.; et al. Domain randomization for transferring deep neural networks from simulation to the real world. IEEE/RSJ IROS, 2017, pp. 23–30.
- Rakelly, K.; et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables. ICML, 2019.
- Kumar, S.; Pathak, P. M.; Mohan, A. Hybrid fuzzy-reinforcement learning controller for robotic arms. Mechatronics 2023, 89, Art. 102912.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).