Preprint
Article

This version is not peer-reviewed.

Conservative Risk-Sensitive Reinforcement Learning for Reliable Decision-Making Under Uncertainty

Submitted: 03 April 2026
Posted: 07 April 2026


Abstract
This paper addresses complex decision-making scenarios characterized by high uncertainty and costly errors, proposing a risk-sensitive, decision-oriented reinforcement learning method. It focuses on the reliability problems caused by tail instability in the reward distribution and by out-of-distribution actions under offline data conditions. Methodologically, the decision process is modeled in a Markov framework, with the return distribution as the learning object so that value information under adverse conditions is retained. On this basis, a conditional value-at-risk (CVaR) metric is introduced to explicitly characterize and suppress tail risk, ensuring that policy optimization no longer relies solely on expected returns. To mitigate estimation bias and over-extrapolation in offline learning, conservative constraints based on the behavioral distribution are further incorporated: by limiting the deviation between the learned policy and the implicit behavior distribution in the data, expansion onto out-of-distribution actions is suppressed and policy updates become more controllable. The overall framework unifies risk measurement and conservative learning in a single optimization objective, yielding a policy learning mechanism that balances return and safety. Comparative experiments show that the method achieves superior overall performance in average return, tail-reward robustness, and safety-related indicators, validating the joint modeling of risk-sensitive objectives and conservative constraints and providing an auditable, adjustable risk-control approach for highly reliable intelligent decision-making systems.
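To make the abstract's objective concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of the two ingredients it names: an empirical CVaR estimate over sampled returns, and a combined objective that blends expected return with CVaR while penalizing KL divergence from the behavior distribution. The weights `lam`, `rho`, and the tail level `alpha` are illustrative hyperparameters assumed here, not values from the paper.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of sampled returns."""
    sorted_r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_r))))  # size of the lower tail
    return sorted_r[:k].mean()

def risk_conservative_objective(returns, pi, beta, lam=0.5, rho=1.0, alpha=0.1):
    """Blend expected return with CVaR, minus a KL penalty that keeps the
    policy's action distribution `pi` close to the behavior distribution `beta`
    (both given here as discrete probability vectors for illustration)."""
    pi, beta = np.asarray(pi, dtype=float), np.asarray(beta, dtype=float)
    kl = np.sum(pi * np.log(pi / beta))  # D_KL(pi || beta), penalizes OOD actions
    r = np.asarray(returns, dtype=float)
    return (1 - lam) * r.mean() + lam * cvar(r, alpha) - rho * kl
```

Maximizing this objective trades off average return against tail performance via `lam`, while `rho` controls how conservatively the policy stays within the support of the offline data.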
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

