This paper presents an efficient and scalable design for histogram-based bilateral filtering (BF) and joint BF (JBF) by memory reduction methods and architecture design techniques to solve the problems of high memory cost, high computational complexity, high bandwidth, and large range table. The presented memory reduction methods exploit the progressive computing characteristics to reduce the memory cost to 0.003%-0.020%, as compared with the original approach. Furthermore, the architecture design techniques adopt range domain parallelism and take advantage of the computing order and the numerical properties to solve the complexity, bandwidth, and range-table problems. The example design with a 90-nm complementary metal-oxide-semiconductor process can deliver the throughput to 124 Mpixels/s with 356-K gate counts and 23-KB on-chip memory.
- Bilateral filtering (BF)
- integral histogram (IH)
- very-large-scale-integration (VLSI) design