Developing parallel programs for multicore systems poses many debugging challenges. Many researchers have succeeded in detecting parallel faults in the background with hardware assistance. However, reproducing the faulty circumstances after a fault has occurred remains an urgent issue. Tracing the causality between events is a popular solution in current multicore systems, but it is limited by on-chip storage and tracing bandwidth. As a result, an intelligent record and replay system is key to future multicore debugging. This paper proposes IMITATOR for both trace compression and deterministic replay. In contrast to most other record and replay systems, IMITATOR introduces an additional phase, the refining phase, between the record and replay phases to significantly reduce recorder overhead while enabling faster replay. Results with the SPLASH2 benchmark on a 32-core system show that IMITATOR can (a) significantly reduce trace size through its trace refining techniques (to 16% of the native trace) and (b) achieve a replay speed that is on average 1.96 times faster than a replayer using the Sigrace scheme.