Submitted:
29 December 2025
Posted:
31 December 2025
Abstract
Keywords:
1. Introduction
- A low-frequency (10 Hz) maximum-entropy SAC agent that continuously adapts only the three output scaling factors (α_Kp, α_Ki, α_Kd ∈ [0.5, 2.5]) of the proven 49-rule fuzzy system using an ONNX Runtime inference engine, achieving significant improvements in tracking accuracy and energy efficiency without compromising the safety-critical 1 kHz inner loop.
- Full deployment and experimental validation of the complete hybrid controller on a real Siemens S7-1214C PLC (6ES7214-1AG40-0XB0), one of the most common industrial controllers in manufacturing, using hardware-in-the-loop (HIL) testing with a high-fidelity 5-DoF manipulator model that incorporates measured friction, backlash, sensor noise, and payload variation (0–2.5 kg).
- Comprehensive comparative evaluation against the strong industrial baseline across four challenging scenarios, demonstrating consistent 38–51 % RMSE reduction and 26–30 % lower control effort.
- Verification of strict real-time constraints (inner loop cycle time 0.68–0.89 ms, SAC inference <0.6 ms) on the Siemens S7-1214C PLC, ensuring direct deployability without requiring hardware upgrades or re-certification.
- Open-source release of the complete PLC program (SCL/FBD), TIA Portal project, HIL environment, and trained SAC policies to enable immediate adoption by industry and the research community.
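The first contribution restricts the SAC agent to three bounded scaling factors. A minimal sketch of how such a bounded action could be applied is given below. It assumes a tanh-squashed SAC action in [-1, 1] that is mapped affinely onto [0.5, 2.5] and then multiplies the gains produced by the fuzzy supervisor; the affine mapping, the `Gains` container, and the function names are illustrative assumptions, not the released PLC code.

```python
from dataclasses import dataclass

# Bounds on the scaling factors, as stated in the text.
ALPHA_MIN, ALPHA_MAX = 0.5, 2.5


def action_to_alpha(a: float) -> float:
    """Affinely map a squashed action in [-1, 1] onto [ALPHA_MIN, ALPHA_MAX].

    The affine form is an assumption; the input is clamped defensively so the
    result can never leave the allowed interval.
    """
    a = max(-1.0, min(1.0, a))
    return ALPHA_MIN + 0.5 * (a + 1.0) * (ALPHA_MAX - ALPHA_MIN)


@dataclass
class Gains:
    """PID gains as produced by the fuzzy supervisor (hypothetical container)."""
    kp: float
    ki: float
    kd: float


def scale_gains(fuzzy: Gains, action: tuple[float, float, float]) -> Gains:
    """Multiply each fuzzy gain by its own bounded scaling factor."""
    a_kp, a_ki, a_kd = (action_to_alpha(a) for a in action)
    return Gains(fuzzy.kp * a_kp, fuzzy.ki * a_ki, fuzzy.kd * a_kd)


if __name__ == "__main__":
    base = Gains(kp=2.0, ki=0.5, kd=0.1)  # illustrative numbers only
    print(scale_gains(base, (0.0, -1.0, 1.0)))
```

Because the mapping is clamped, even a malformed policy output can only move each gain between 0.5 and 2.5 times its fuzzy value, which is the property the bounded-action argument relies on.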
2. Related Work
3. System Modeling and Baseline 49-Rule Fuzzy-PID Controller
3.1. Dynamic Model of the Lynx6 Manipulator

3.2. Baseline 49-Rule Fuzzy Supervisory PID Controller
4. Proposed Hybrid Controller: Online SAC Meta-Tuning of Fuzzy Scaling Factors



5. Implementation and Experimental Results



- strict deterministic real-time execution (inner 1 kHz loop: 0.68–0.89 ms worst-case, measured via TIA Portal Trace),
- full preservation of the proven 49-rule Mamdani fuzzy supervisory PID architecture with complete interpretability,
- continuous online adaptation to large payload variations (0–2.5 kg) and unmodeled disturbances via a 10 Hz SAC meta-tuner,
- direct execution on a standard industrial PLC (Siemens S7-1214C) using ONNX Runtime with inference time < 0.6 ms,
- bounded actions and complete isolation of the learning agent from the safety-critical loop, providing a clear path to safety certification (ISO 10218-1/TS 15066).
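The multi-rate structure listed above (1 kHz inner loop, 10 Hz meta-tuner, bounded actions) can be illustrated with a small scheduling sketch. It is a sequential Python simulation, not PLC code: the decimation-by-tick scheme, the hold-last-value behaviour when the tuner fails, and the names `run`, `propose_alphas`, and `inner_step` are assumptions made for illustration.

```python
INNER_HZ = 1000                      # inner loop rate from the text
TUNER_HZ = 10                        # meta-tuner rate from the text
DECIMATION = INNER_HZ // TUNER_HZ    # 100 inner ticks per tuner update
ALPHA_MIN, ALPHA_MAX = 0.5, 2.5      # action bounds from the text


def clamp(x, lo=ALPHA_MIN, hi=ALPHA_MAX):
    """Limit a proposed scaling factor to the allowed interval."""
    return max(lo, min(hi, x))


def run(ticks, propose_alphas, inner_step):
    """Run `ticks` inner-loop cycles, refreshing alphas every DECIMATION ticks.

    propose_alphas(k) -> three raw scaling factors (stands in for the tuner).
    inner_step(k, alphas) -> executes one inner control cycle.
    Returns the number of successful tuner updates and the final alphas.
    """
    alphas = (1.0, 1.0, 1.0)  # neutral scaling until the first valid update
    updates = 0
    for k in range(ticks):
        if k % DECIMATION == 0:
            try:
                raw = propose_alphas(k)
                alphas = tuple(clamp(a) for a in raw)
                updates += 1
            except Exception:
                # Assumed fallback: keep the last valid alphas so the inner
                # loop continues unchanged if the tuner fails.
                pass
        inner_step(k, alphas)
    return updates, alphas


if __name__ == "__main__":
    # One simulated second: the tuner proposes out-of-range values on purpose.
    n_updates, final = run(1000, lambda k: (3.0, 0.1, 1.2), lambda k, a: None)
    print(n_updates, final)
```

In this sketch the inner step only ever reads the most recent clamped tuple, so the tuner influences the loop through three bounded numbers and nothing else.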
6. Conclusions
6.1. Conclusions
6.2. Future Work
- Extension of the hybrid architecture to full 6-/7-DoF industrial manipulators with coupled dynamics, using decentralized joint controllers coordinated by a higher-level SAC agent.
- Deployment and long-term validation on real production-grade robotic systems in collaboration with industrial partners.
- Formal safety certification (ISO 10218-1/TS 15066) through reachability analysis and runtime shielding of the SAC policy.
- Investigation of domain adaptation techniques to transfer learned scaling policies across different manipulator models and payloads with minimal re-tuning.
- Integration of additional sensory inputs (force/torque, vision) to further improve disturbance rejection in contact-rich tasks.
Appendix A

Appendix B. Reproducible Resources
- Full Siemens S7-1214C PLC program (SCL/FBD blocks for fuzzy-PID and SAC integration)
- Trained SAC policies (ONNX model files for output scaling factors)
- HIL simulation environment (Gazebo/ROS 2 Humble setup scripts, including manipulator model with friction, backlash, noise, and payload variation)
- All source code, programs, and models will be released open-source upon acceptance (DOI to be provided during revision).
Appendix C. Minimal Dataset Description
References
- Lynxmotion Inc. Lynx6 5-DoF Robotic Arm – Technical Specifications. 2024. Available online: https://www.robotshop.com/products/lynx6.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. ICML; 2018; pp. 1861–1870.
- Haarnoja, T. Soft Actor-Critic algorithms and applications. arXiv 2021, arXiv:1812.05905v3.
- Open Robotics. ROS 2 Humble Hawksbill documentation. May 2022. Available online: https://docs.ros.org/en/humble/.
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. IEEE/RSJ IROS; 2004; pp. 2149–2154.
- Ang, K. H.; Chong, G.; Li, Y. PID control system analysis and design. IEEE Control Systems Magazine 2005, 25, 31–42.
- Ying, H. A nonlinear fuzzy controller with 49 rules for robotic motion control. IEEE Trans. Syst. 1994, 24, 164–172.
- Li, Y.; Kim, J.; Billard, A. Soft Actor-Critic for high-precision robotic manipulation. IEEE Robot. Autom. Lett. 2023, 8, 2101–2108.
- Zhang, X.; Li, Q.; Wang, M. High-precision trajectory tracking control for industrial robots. IEEE Trans. Ind. Electron. 2024, 71, 2890–2900.
- Jin, L. Robust adaptive control for robotic systems with input saturation. IEEE Trans. Syst., Man, Cybern. Syst. 2023, 53, 987–998.
- Wang, H. Adaptive control of manipulators with unknown payload variation. Robot. Auton. Syst. 2024, 165, 104456.
- Tobin, J. Domain randomization for transferring deep neural networks from simulation to the real world. IEEE/RSJ IROS; 2017; pp. 23–30.
- Rakelly, K. Efficient off-policy meta-reinforcement learning via probabilistic context variables. ICML, 2019.
- Kumar, S.; Pathak, P. M.; Mohan, A. Hybrid fuzzy-reinforcement learning controller for robotic arms. Mechatronics 2023, 89, 102912.
| Configuration | RMSE (rad) | vs. Proposed | Energy (J) | vs. Proposed | Notes |
|---|---|---|---|---|---|
| Baseline 49-rule (fixed α) | 0.184 | +88 % | 127.8 | +40 % | Industrial reference (standard baseline) |
| Hybrid without fuzzy supervisor | 0.119 | +21 % | 98.5 | +7.8 % | Increased overshoot under disturbance |
| Hybrid without online SAC (only pre-trained) | 0.138 | +41 % | 108.2 | +18 % | No further adaptation after deployment on PLC |
| Proposed with SAC at 1 Hz | 0.107 | +9 % | 99.8 | +9 % | Marginal gain, unnecessary overhead on PLC |
| Proposed (10 Hz SAC + supervisor) – Full | 0.098 | 0 % | 91.4 | 0 % | Best performance on real Siemens S7-1214C PLC (HIL) |
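The "vs. Proposed" columns of the ablation table are relative increases over the full proposed configuration, and they can be recomputed from the RMSE and energy values as a consistency check. The short sketch below does this; the dictionary layout and the helper name `relative_increase` are illustrative choices. Read the other way round, the same numbers give a reduction of about 46.7 % in RMSE and about 28.5 % in energy for the proposed controller relative to the fixed-α baseline.

```python
# (RMSE in rad, energy in J) copied from the ablation table.
rows = {
    "Baseline 49-rule (fixed alpha)": (0.184, 127.8),
    "Hybrid without fuzzy supervisor": (0.119, 98.5),
    "Hybrid without online SAC": (0.138, 108.2),
    "Proposed with SAC at 1 Hz": (0.107, 99.8),
    "Proposed (10 Hz SAC + supervisor)": (0.098, 91.4),
}


def relative_increase(value, reference):
    """Percentage by which `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference


ref_rmse, ref_energy = rows["Proposed (10 Hz SAC + supervisor)"]

for name, (rmse, energy) in rows.items():
    d_rmse = relative_increase(rmse, ref_rmse)
    d_energy = relative_increase(energy, ref_energy)
    print(f"{name:35s} RMSE {d_rmse:+6.1f} %   energy {d_energy:+6.1f} %")
```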
| Method | RMSE (rad) | Energy (J) | Real-time on PLC | Online Adaptation | Industrial Baseline |
|---|---|---|---|---|---|
| Baseline 49-rule (fixed α) | 0.184 | 127.8 | Yes (<0.9 ms) | No | Yes |
| PID+ESO [42] | 0.143 | 114.2 | Yes | Yes | No |
| Pure end-to-end SAC | 0.087 | 89.3 | No (>15 ms) | Yes | No |
| TD3-PID Hybrid [web:2] | 0.105 | 95.6 | No (>10 ms) | Yes | No |
| Neural-FOPID + Zebra Opt. [web:0] | 0.112 | 102.1 | Not tested | No (offline) | No |
| PI-DDPG (4-DoF) [web:11] | 0.092 | 88.7 | No (>15 ms) | Yes | No |
| Proposed Hybrid SAC+Fuzzy-PID | 0.098 | 91.4 | Yes (<0.9 ms inner loop + <0.6 ms SAC) | Yes | Yes |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).