Distributed arithmetic (DA) has been widely used to implement inner product computations with a fixed input. Conventional ROM-based DA suffers from large ROM requirements. A new DA algorithm is proposed that expands the fixed input instead of the variable input into bit level as in ROM-based DA. Thus the new DA algorithm can take advantage of shared partial sum-of-products and sparse nonzero bits in the fixed input to reduce the number of computations. Unlike ROM-based DA that stores the precomputed results the new DA algorithm uses a predefined structure to compute results. When applied to a 1-D eight-point DCT system the new DA algorithm only needs 30% of hardware area and has faster speed as compared with ROM-based DA. To illustrate the efficiency of the proposed algorithm a 2-D IDCT chip was implemented using 0.8 μm SPDM CMOS technology. The chip with size 4575 × 5525 μm can deliver a processing rate of 50 M pixels per second.