Model-free agent in reinforcement learning (RL) generally performs well but inefficient in training process with sparse data. A practical solution is to incorporate a model-based module in model-free agent. State transition can be learned to make desirable prediction of next state based on current state and action at each time step. This paper presents a new learning representation for variational RL by introducing the so-called transition uncertainty critic based on the variational encoder-decoder network where the uncertainty of structured state transition is encoded in a model-based agent. In particular, an action-gating mechanism is carried out to learn and decode the trajectory of actions and state transitions in latent variable space. The transition uncertainty maximizing exploration (TUME) is performed according to the entropy search by using the intrinsic reward based on the uncertainty measure corresponding to different states and actions. A dedicate latent variable model with a penalty using the bias of state-action value is developed. Experiments on Cart Pole and dialogue system show that the proposed TUME considerably performs better than the other exploration methods for reinforcement learning.