Many robotic vision systems suffer from unsatisfactory recognition rates and high computational loads, so the design of a fast and sufficiently accurate robotic vision system requires urgent attention. This paper presents a novel image processing architecture for visual attention and tracking applications, built from a series of low-level image processing units. The proposed system segments objects of interest and keeps tracking them in a manner that mimics human vision. These simplification and tracking schemes help improve both the speed and the accuracy of high-level vision applications such as object detection and recognition. Furthermore, the proposed system can operate at low frame rates and can adapt itself to changing environments, which often cause problems in practical robotic applications. Experimental results show that the system is robust against pose translation, camera motion, motion blur, and temporary occlusion.