Xinyi Sheng


Topic
Enhancing Control Reliability in Reinforcement Learning
Many reinforcement learning (RL) systems are designed to maximize average performance, but in real-world settings, what happens in a single run matters more than the theoretical average. A simple example: if an agent repeatedly invests based on a coin toss, the expected outcome may look good on paper, yet in reality most runs end in failure. This highlights a major gap between expected results and actual outcomes, especially in complex, uncertain environments. We need learning systems that are reliable not just in theory, but in every run.
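A short simulation makes this gap concrete. The payoffs and horizon below are assumed purely for illustration (they are not taken from any specific experiment): the expected value of the gamble grows each round, while the typical individual run shrinks toward zero.

```python
import numpy as np

# Minimal sketch of the coin-toss gamble described above. Payoffs are assumed:
# each round, wealth is multiplied by 1.5 on heads and by 0.6 on tails (50/50).
rng = np.random.default_rng(0)
up, down, p = 1.5, 0.6, 0.5
n_runs, n_rounds = 100_000, 100

# Ensemble (expected-value) growth per round: 0.5*1.5 + 0.5*0.6 = 1.05 > 1,
# so "on paper" the average outcome grows exponentially.
expected_final = (p * up + (1 - p) * down) ** n_rounds

# Time-average growth rate per round: 0.5*ln(1.5) + 0.5*ln(0.6) ~ -0.053 < 0,
# so a typical single run decays toward zero despite the positive expectation.
time_avg_growth = p * np.log(up) + (1 - p) * np.log(down)

factors = rng.choice([up, down], size=(n_runs, n_rounds), p=[p, 1 - p])
final_wealth = factors.prod(axis=1)

print(f"expected final wealth (ensemble average): {expected_final:.1f}")
print(f"time-average growth rate per round:       {time_avg_growth:+.3f}")
print(f"median final wealth across runs:          {np.median(final_wealth):.4f}")
print(f"fraction of runs that lost money:         {(final_wealth < 1.0).mean():.2%}")
```

The median run ends far below its starting wealth even though the expectation grows, which is exactly the mismatch between average-case and single-run behavior that motivates this work.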
We propose a new way to train RL agents that shifts the focus from optimizing only the expected return to also accounting for long-term, trajectory-level performance. Our method estimates a “time-average growth rate” directly from what the agent experiences, without needing a model of the environment. It works as a simple add-on to standard RL methods and adds little extra complexity.
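To make the add-on idea concrete, here is a minimal sketch of one way such a signal could be estimated and blended into a standard objective. The function names, the wealth-like trajectory variable `x`, and the blending weight `eta` are illustrative assumptions, not the method's actual interface.

```python
import numpy as np

def time_average_growth_rate(x: np.ndarray) -> float:
    """Empirical per-step growth rate of a positive, multiplicative quantity x_t:
    (1/T) * sum_t log(x_{t+1}/x_t) = (1/T) * log(x_T / x_0),
    estimated purely from observed data, with no model of the environment."""
    x = np.asarray(x, dtype=float)
    T = len(x) - 1
    return float(np.log(x[-1] / x[0]) / T)

def discounted_return(rewards: np.ndarray, gamma: float = 0.99) -> float:
    """Standard expected-return objective used by most RL algorithms."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

def augmented_objective(rewards, x, gamma: float = 0.99, eta: float = 1.0) -> float:
    """Blend the usual episode return with the trajectory-level growth-rate estimate.
    The resulting scalar can replace the episode return in any policy-gradient
    update (e.g., REINFORCE), so it acts as a drop-in add-on, not a new algorithm."""
    return discounted_return(rewards, gamma) + eta * time_average_growth_rate(x)

# Example: per-step rewards look fine, but the wealth-like signal is shrinking,
# so the growth-rate term pulls the objective down.
rewards = np.array([1.0, 1.0, 1.0, 1.0])
wealth = np.array([1.0, 1.2, 0.9, 0.7, 0.5])
print(augmented_objective(rewards, wealth))
```

In this sketch the growth-rate term penalizes trajectories whose long-run multiplicative behavior is poor even when step-wise rewards appear acceptable, which is the kind of trajectory-level correction described above.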
This approach leads to more reliable and stable behavior, especially in noisy or unpredictable environments. It aligns learning with how systems actually behave over time, not just on average. The method is easy to integrate into existing RL frameworks, with no need for additional modeling or assumptions.
Other techniques that improve reliability often need detailed knowledge of the environment or are limited to very specific types of problems. Our method is general, flexible, and works directly from data, which makes it a practical and scalable solution for real-world control systems.

