TY - JOUR
T1 - A memory-efficient realization of cyclic convolution and its application to discrete cosine transform
AU - Chen, Hun Chen
AU - Guo, Jiun-In
AU - Chang, Tian-Sheuan
AU - Jen, Chein Wei
PY - 2005/3/1
Y1 - 2005/3/1
N2 - This paper presents a memory-efficient approach to realizing cyclic convolution and its application to the discrete cosine transform (DCT). We adopt distributed arithmetic (DA) computation, exploit the symmetry of the DCT coefficients to merge elements of the DCT kernel matrix, separate the kernel into two perfect cyclic forms, and partition the ROM contents into groups. Together these steps enable an efficient realization of a one-dimensional (1-D) N-point DCT kernel using (N - 1)/2 adders or subtractors, one small ROM module, a barrel shifter, and ((N - 1)/2) + 1 accumulators. The key feature of the proposed memory-efficient design technique is that the ROM contents of the conventional DA approach are rearranged into several groups so that all elements in a group can be accessed simultaneously when accumulating all the DCT outputs, which increases ROM utilization. For a 1-D seven-point DCT with 16-bit coefficients, the proposed design reduces the delay-area product by more than 57% compared with existing DA-based designs. Finally, a 1-D DCT chip was implemented to demonstrate the efficiency of the proposed approach.
AB - This paper presents a memory-efficient approach to realizing cyclic convolution and its application to the discrete cosine transform (DCT). We adopt distributed arithmetic (DA) computation, exploit the symmetry of the DCT coefficients to merge elements of the DCT kernel matrix, separate the kernel into two perfect cyclic forms, and partition the ROM contents into groups. Together these steps enable an efficient realization of a one-dimensional (1-D) N-point DCT kernel using (N - 1)/2 adders or subtractors, one small ROM module, a barrel shifter, and ((N - 1)/2) + 1 accumulators. The key feature of the proposed memory-efficient design technique is that the ROM contents of the conventional DA approach are rearranged into several groups so that all elements in a group can be accessed simultaneously when accumulating all the DCT outputs, which increases ROM utilization. For a 1-D seven-point DCT with 16-bit coefficients, the proposed design reduces the delay-area product by more than 57% compared with existing DA-based designs. Finally, a 1-D DCT chip was implemented to demonstrate the efficiency of the proposed approach.
KW - Cyclic convolution
KW - Discrete cosine transform (DCT)
KW - Distributed arithmetic
UR - http://www.scopus.com/inward/record.url?scp=15244339187&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2004.842608
DO - 10.1109/TCSVT.2004.842608
M3 - Article
AN - SCOPUS:15244339187
VL - 15
SP - 445
EP - 453
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
SN - 1051-8215
IS - 3
ER -