Human detection and tracking are important for user-friendly human-robot interaction: the robot should find the user autonomously and keep its attention on the user in a human-like manner. This paper presents the design and experimental study of robust human detection and tracking through the fusion of several modalities of sensory information. The multi-modal interaction design combines visual, audio, and laser scanner data for reliable detection and tracking of a user of interest. During tracking motion, an obstacle avoidance behavior is activated whenever required to ensure safety. Furthermore, the user can direct the robot, by speech command, to interact with another user. Experimental results show that the robot robustly tracks a person under complex scenarios.
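The multi-modal fusion described above can be illustrated with a minimal sketch. The snippet below combines bearing estimates from hypothetical vision, laser, and audio detectors into a single direction to the user via a confidence-weighted circular mean; the detector names, weights, and fusion rule are illustrative assumptions, not the paper's actual method.

```python
# Sketch of multi-modal fusion for user tracking: bearings from a face
# detector (vision), a leg detector (laser scanner), and sound-source
# localization (audio) are combined as a confidence-weighted circular mean.
# All sensor names and confidence values here are illustrative assumptions.
from dataclasses import dataclass
from math import atan2, sin, cos

@dataclass
class Observation:
    bearing_rad: float   # direction to the user in the robot frame
    confidence: float    # 0..1 reliability score from the detector

def fuse_bearings(observations: list[Observation]) -> float:
    """Confidence-weighted circular mean of bearing observations."""
    sx = sum(o.confidence * cos(o.bearing_rad) for o in observations)
    sy = sum(o.confidence * sin(o.bearing_rad) for o in observations)
    return atan2(sy, sx)

# Example: vision and laser roughly agree; the noisier audio estimate is
# down-weighted, so the fused bearing stays close to the visual/laser pair.
readings = [
    Observation(0.10, 0.9),   # face detector (vision)
    Observation(0.12, 0.8),   # leg detector (laser scanner)
    Observation(0.40, 0.3),   # sound-source localization (audio)
]
fused = fuse_bearings(readings)
```

A circular mean is used rather than a plain average so that bearings near the ±π wrap-around fuse correctly.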