Exploring state transition uncertainty in variational reinforcement learning

Jen-Tzung Chien, Wei Lin Liao, Issam El Naqa

Research output: Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract

Model-free agents in reinforcement learning (RL) generally perform well but are inefficient to train when data are sparse. A practical solution is to incorporate a model-based module into the model-free agent. The state transition can be learned to predict the next state from the current state and action at each time step. This paper presents a new learning representation for variational RL by introducing a transition uncertainty critic based on a variational encoder-decoder network, in which the uncertainty of structured state transitions is encoded in a model-based agent. In particular, an action-gating mechanism is used to learn and decode the trajectory of actions and state transitions in the latent variable space. Transition uncertainty maximizing exploration (TUME) is performed according to an entropy search, using an intrinsic reward based on the uncertainty measure corresponding to different states and actions. A dedicated latent variable model with a penalty using the bias of the state-action value is developed. Experiments on Cart Pole and a dialogue system show that the proposed TUME performs considerably better than other exploration methods for reinforcement learning.
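The abstract does not give the exact form of the intrinsic reward, so the following is only a minimal sketch of the general idea of an entropy-based exploration bonus. It assumes the transition model outputs a diagonal Gaussian over a latent next state, and uses that distribution's entropy as the uncertainty measure. The function names, the weight `beta`, and the example log-variances are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_entropy(log_var):
    """Differential entropy (nats) of a diagonal Gaussian,
    given its per-dimension log-variances."""
    d = len(log_var)
    return 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + 0.5 * sum(log_var)


def augmented_reward(extrinsic_reward, log_var, beta=0.1):
    """Extrinsic reward plus an entropy-based exploration bonus.
    `beta` is a hypothetical trade-off weight (an assumption)."""
    return extrinsic_reward + beta * gaussian_entropy(log_var)


# Hypothetical predicted log-variances of the latent next state
# for two different state-action pairs.
confident = [-2.0, -2.0]  # low transition uncertainty
uncertain = [0.5, 0.5]    # high transition uncertainty

r_confident = augmented_reward(1.0, confident)
r_uncertain = augmented_reward(1.0, uncertain)
```

With the same extrinsic reward, the state-action pair whose predicted transition is more uncertain receives the larger total reward, which is what pushes the agent toward less-explored transitions.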

Original language: English
Title of host publication: 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 1527-1531
Number of pages: 5
ISBN (Electronic): 9789082797053
DOIs
Publication status: Published - 24 Jan 2021
Event: 28th European Signal Processing Conference, EUSIPCO 2020 - Amsterdam, Netherlands
Duration: 24 Aug 2020 to 28 Aug 2020

Publication series

Name: European Signal Processing Conference
2021-January
ISSN (Print): 2219-5491

Conference

Conference: 28th European Signal Processing Conference, EUSIPCO 2020
Country: Netherlands
City: Amsterdam
Period: 24/08/20 to 28/08/20
