Image-based virtual try-on systems have attracted considerable research attention. The task is challenging because synthesizing try-on images requires estimating 3D transformations from 2D images, which is an ill-posed problem. As a result, most previous virtual try-on systems fail on difficult cases, e.g., body occlusions, clothing wrinkles, and hair details. Moreover, existing systems require users to upload an image of the target pose, which is not user-friendly. In this paper, we address these challenges by proposing FashionOn, a novel network that synthesizes images of a user wearing different clothes in arbitrary poses, giving comprehensive information about how well the clothes suit the user. Specifically, given a user image, an in-shop clothing image, and a target pose (which can be manipulated arbitrarily via its joint points), FashionOn learns to synthesize the try-on image in three stages: pose-guided parsing translation, segmentation region coloring, and salient region refinement. Extensive experiments demonstrate that FashionOn preserves clothing details (e.g., logo, pleat, lace) and resolves the body occlusion problem, thereby achieving state-of-the-art virtual try-on performance both qualitatively and quantitatively.
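To make the data flow concrete, the following is a minimal sketch of how the three stages named above could be chained, and of how a target pose given as joint points could be edited directly instead of being taken from an uploaded image. It is an illustration under stated assumptions, not the FashionOn implementation: the pose format (a mapping from joint name to pixel coordinates), the helper `move_joint`, the class `TryOnPipeline`, and the inputs passed to each stage are all hypothetical, and each stage is a placeholder callable standing in for a learned network.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Joint = Tuple[float, float]   # (x, y) in pixel coordinates (assumed format)
Pose = Dict[str, Joint]       # joint name -> position (assumed format)


def move_joint(pose: Pose, name: str, dx: float, dy: float) -> Pose:
    """Return a copy of `pose` with one joint shifted by (dx, dy)."""
    if name not in pose:
        raise KeyError(f"unknown joint: {name}")
    x, y = pose[name]
    new_pose = dict(pose)     # leave the caller's pose untouched
    new_pose[name] = (x + dx, y + dy)
    return new_pose


@dataclass
class TryOnPipeline:
    """Chains three placeholder stages; which inputs each stage
    receives is an assumption made for this sketch."""
    parse_translate: Callable[[Any, Any, Pose], Any]  # pose-guided parsing translation
    color_regions: Callable[[Any, Any, Any], Any]     # segmentation region coloring
    refine_salient: Callable[[Any, Any, Any], Any]    # salient region refinement

    def __call__(self, user_image: Any, clothing: Any, pose: Pose) -> Any:
        # Stage 1: predict the body-part layout in the target pose.
        parsing = self.parse_translate(user_image, clothing, pose)
        # Stage 2: fill the predicted regions with appearance.
        coarse = self.color_regions(parsing, user_image, clothing)
        # Stage 3: refine detail-heavy regions of the coarse result.
        return self.refine_salient(coarse, user_image, clothing)
```

In use, a caller would start from an existing pose, adjust joints with `move_joint`, and pass the edited pose to the pipeline together with the user image and the in-shop clothing image.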