Research

Brief introduction:

We introduce a novel formulation of risk-sensitive RL in partially observable environments with hindsight observation, a setting that arises in various application scenarios but has lacked theoretical investigation. We develop the first provably sample-efficient algorithm tailored for this setting and show through rigorous analysis that our algorithm matches or outperforms existing regret bounds when the environment degenerates to simpler settings. We adopt novel analytical techniques and validate the theoretical findings through numerical experiments. This work will be presented at ICML 2024.

We summarize our findings in this paper, and you can find the numerical experiments in this GitHub repository.

To cite our work:

Tonghe Zhang, Yu Chen, and Longbo Huang. Provably Efficient Partially Observable Risk-sensitive Reinforcement Learning with Hindsight Observation. In International Conference on Machine Learning (ICML). PMLR, 2024.