Deep learning has achieved great success in many real-world applications. For speech and language processing, recurrent neural networks are trained to characterize sequential patterns and extract temporal information from dynamic states that evolve through time and are stored as an internal memory. The conventional simple transition function, which uses only input-to-hidden and hidden-to-hidden weights, is insufficient for this purpose. To strengthen the learning capability, it is crucial to explore the diversity of latent structure in sequential signals and to learn the stochastic trajectory of signal transitions so as to improve sequential prediction. This paper proposes stochastic modeling of transitions in deep sequential learning. Our idea is to enhance the latent variable representation by discovering Markov state transitions in sequential data with a K-state long short-term memory (LSTM) model. Such a latent state machine is capable of learning the complicated latent semantics in highly structured and heterogeneous sequential data. The Gumbel-softmax is introduced to implement the stochastic learning procedure with discrete states. Experimental results on visual and text language modeling illustrate the merit of the proposed stochastic transitions in sequential prediction with a limited number of parameters.
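As a rough illustration of how a Gumbel-softmax relaxation can select among K discrete states, the following minimal sketch draws a relaxed one-hot sample from a vector of state logits. It is not the paper's implementation: the function name `gumbel_softmax`, the `temperature` argument, and the example logits are assumptions made for illustration, and the sketch uses only the Python standard library rather than a deep learning framework.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) discrete states.

    Hypothetical helper for illustration; not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Assumed example: K = 4 latent states with arbitrary logits.
    state_logits = [0.5, 1.0, -0.3, 0.2]
    weights = gumbel_softmax(state_logits, temperature=0.5)
    print(weights)
```

The returned weights are non-negative and sum to one. As the temperature approaches zero, the sample approaches a one-hot vector, that is, a hard choice of a single state, while larger temperatures give smoother mixtures. In a framework with automatic differentiation, the same computation stays differentiable with respect to the logits, which is what makes training with discrete states feasible.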