Motion estimation (ME) in the latest High Efficiency Video Coding (HEVC) standard adopts the quadtree coding structure and prediction unit (PU) sizes of up to 64×64 to improve the coding gain. However, compared with previous standards, these techniques pose serious design problems in complexity, data dependency, external memory bandwidth, and on-chip buffer size, especially for real-time ultrahigh-definition video coding. To solve these problems, this paper proposes an efficient ME design with a joint algorithm and architecture optimization. To reduce complexity, we propose a predictive integer ME (IME) algorithm that uses a statistical analysis to select the most probable search directions and steps, reducing the number of search points by 90.5%. We also employ a PU size-dependent fractional ME (FME) algorithm that reduces the interpolation filtering by 62.4% compared with the reference software. To resolve the data dependency between the two stages, we cascade the IME and FME computations via interlaced scheduling and propose an early motion vector prediction candidate approach. We use this scheduling with a 16×16 processing unit to compute the partial matching costs of all PUs that share the same 16×16 current block in an interlaced order, and we share their common reference block to reduce the on-chip buffer size and off-chip memory bandwidth. The bandwidth is further reduced by a cache that uses double Z-scan indexed addressing, which also simplifies the cache controller. An implementation in a Taiwan Semiconductor Manufacturing Company 90-nm CMOS process supports real-time encoding of 4K×2K video at 60 frames/s when operated at 270 MHz, with 778.7k logic gates and 17.4 KB of on-chip memory.
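As a rough illustration of the Z-scan ordering mentioned in the abstract, the sketch below computes a plain single-level Z-order (Morton) index by interleaving the bits of a block's x and y coordinates. The abstract does not describe how the "double" Z-scan addressing is built, so the function names `z_index` and `z_scan_order`, the bit width, and the choice of placing x in the even bit positions are all assumptions made for this example, not the paper's cache design.

```python
def z_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order index.

    Assumption for illustration: x occupies the even bit positions
    and y the odd ones.
    """
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (2 * i)       # bit i of x -> bit 2i
        idx |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y -> bit 2i + 1
    return idx


def z_scan_order(width: int, height: int) -> list[tuple[int, int]]:
    """Return all (x, y) block coordinates of a grid sorted in Z-scan order."""
    coords = [(x, y) for y in range(height) for x in range(width)]
    return sorted(coords, key=lambda p: z_index(p[0], p[1]))
```

For a 2×2 grid of blocks, `z_scan_order(2, 2)` visits (0, 0), (1, 0), (0, 1), (1, 1). Blocks that are close in two dimensions tend to receive nearby indices, which is the general property that makes Z-order indexing attractive for block-based caches.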
|Number of pages|12|
|Journal|IEEE Transactions on Circuits and Systems for Video Technology|
|State|Published - 1 Sep 2015|
- Motion estimation
- VLSI architecture