Investment decision making is a complicated process that is difficult to analyze and imitate. Given large volumes of trading records embodying rich expert knowledge in the financial domain, extracting the underlying decision logic and cloning the trading strategies are also quite challenging. In this paper, an agent-based reinforcement learning (RL) system is proposed to mimic professional trading strategies. The continuous Markov decision process (MDP) formulation in RL closely resembles trading decision making over financial time series. With RL components specifically designed for financial applications, including states, actions, and rewards, a policy gradient method can successfully imitate an expert's strategies. To improve the convergence of the RL agent in such a highly dynamic environment, a pre-trained model based on supervised learning is transferred to the deep policy networks. Experimental results show that the proposed system reproduces about 80% of the trading decisions in both the training and testing stages. By examining the tradeoff between exploration and model updating, this paper fine-tunes the system parameters to obtain reasonable results. Finally, an advanced strategy is proposed that dynamically adjusts the amount of exploration in each episode to achieve better results.
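The pipeline described above, supervised pre-training of a policy followed by policy-gradient fine-tuning that rewards agreement with the expert, can be sketched as follows. This is a minimal toy illustration, not the paper's actual system: the linear softmax policy, the synthetic two-action "expert" trajectory, and the 0/1 matching reward are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4-dim state features, 3 actions (e.g. buy/hold/sell).
n_features, n_actions = 4, 3
W = rng.normal(scale=0.1, size=(n_features, n_actions))  # linear policy weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Synthetic "expert" trajectory: states plus the expert's chosen actions
# (here a simple linear rule, purely for illustration).
states = rng.normal(size=(200, n_features))
expert_actions = (states @ rng.normal(size=n_features) > 0).astype(int)

lr = 0.5

# 1) Supervised pre-training (behavior cloning): minimize cross-entropy
#    between the policy and the expert's recorded actions.
for _ in range(200):
    grad = np.zeros_like(W)
    for s, a in zip(states, expert_actions):
        p = softmax(s @ W)
        p[a] -= 1.0                      # d(cross-entropy)/d(logits)
        grad += np.outer(s, p)
    W -= lr * grad / len(states)

# 2) Policy-gradient fine-tuning (REINFORCE), initialized from the
#    pre-trained weights: reward 1 when the sampled action matches
#    the expert's decision, 0 otherwise.
for _ in range(50):
    grad = np.zeros_like(W)
    for s, a_exp in zip(states, expert_actions):
        p = softmax(s @ W)
        a = rng.choice(n_actions, p=p)   # exploration via sampling
        r = 1.0 if a == a_exp else 0.0
        grad += r * np.outer(s, np.eye(n_actions)[a] - p)  # r * grad log pi(a|s)
    W += lr * grad / len(states)         # gradient ascent on expected reward

# Fraction of expert decisions reproduced by the greedy policy.
match = np.mean([softmax(s @ W).argmax() == a
                 for s, a in zip(states, expert_actions)])
```

The supervised step gives the policy a sensible starting point, so the subsequent REINFORCE updates start from a high-reward region instead of exploring from scratch, which mirrors the transfer idea in the abstract.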