Exploiting fine-grain parallelism in the H.264 deblocking filter by operation reordering

Tsung Hsi Weng*, Chung-Ping Chung

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


In the H.264 video compression standard, the deblocking filter accounts for about one-third of all computation in the decoder. As many-core architectures become the prevailing trend in system design, computation time can be reduced if the deblocking filter's operations are apportioned appropriately among multiple processing elements. In this study, we use a four-pixel-long boundary as the basic unit for analyzing and exploiting the available parallelism. Compared with the two-dimensional (2D) wavefront order for deblocking 1920×1080- and 1080×1920-pixel frames, the proposed design achieves speedups of 1.92 and 2.44 times, respectively, given an unlimited number of processing elements. Compared with our previous design, it achieves speedups of 1.25 and 1.13 times, respectively. In addition, as the frame size grows, the approach requires extra time proportional only to the square root of the increase in frame size (at a fixed width-to-height ratio), extending the range of video sizes for which real-time deblocking is practical.
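
The square-root scaling can be illustrated with a small sketch. It is not the design described in the article: it assumes a simplified macroblock-level wavefront in which each 16×16 macroblock waits only for its left and top neighbours, and the function names `wavefront_steps` and `scaled_steps` are hypothetical. Under that assumption the critical path grows with width plus height, so quadrupling the frame area at a fixed aspect ratio only doubles the number of steps.

```python
import math

MB_SIZE = 16  # H.264 macroblock edge length in pixels


def wavefront_steps(width_px, height_px):
    """Critical-path length, in macroblock steps, of an ASSUMED wavefront.

    Assumption (not taken from the article): macroblock (x, y) depends only
    on its left and top neighbours, so it can start at step x + y.
    """
    cols = math.ceil(width_px / MB_SIZE)
    rows = math.ceil(height_px / MB_SIZE)
    # The bottom-right macroblock starts at step (cols - 1) + (rows - 1),
    # so the whole frame needs cols + rows - 1 steps.
    return cols + rows - 1


def scaled_steps(width_px, height_px, area_factor):
    """Steps after growing the frame area by area_factor at fixed aspect ratio."""
    side = math.sqrt(area_factor)  # each dimension grows by sqrt(area_factor)
    return wavefront_steps(round(width_px * side), round(height_px * side))


if __name__ == "__main__":
    base = wavefront_steps(1920, 1080)
    bigger = scaled_steps(1920, 1080, 4)  # 3840x2160, four times the area
    print(base, bigger, bigger / base)
```

With unlimited processing elements, this model needs 187 steps for a 1920×1080 frame and 374 steps for a 3840×2160 frame: twice the time for four times the pixels.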

Original language: English
Pages (from-to): 76-87
Number of pages: 12
Journal: Future Generation Computer Systems
State: Published - 1 Jan 2014


Keywords

  • Data intensive
  • Deblocking
  • H.264
  • Many-core architecture
  • Parallelization
