Vector quantization (VQ) for real-time video/image coding must often process a large amount of data within a given timing constraint. However, a pre-processing energy transformation can reduce the computational complexity and memory requirements inherent in a VQ algorithm, leading to a very efficient ASIC architecture for real-time video/image coding. This paper first presents several algorithm-level optimizations that take computation and memory requirements into account so that a better overall solution can be achieved. In the architecture design, the distributed arithmetic (DA) technique replaces the traditional multiplier in the energy computation. By exploiting pipelining and parallelism, the required computations are realized in a very regular structure. The final results show that a single chip containing 128 codevectors can be achieved for multi-stage or full-search VQ coding with reasonable area and I/O pin count.
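
As a rough illustration of how distributed arithmetic can stand in for a multiplier, the sketch below evaluates a fixed-coefficient inner product using only a lookup table, shifts, and additions. It is a minimal software model under stated assumptions, not the architecture described here: the function names `build_da_lut` and `da_inner_product`, the unsigned 8-bit input width, and the example coefficients are all hypothetical, and the actual energy computation may use different coefficients and word lengths.

```python
def build_da_lut(coeffs):
    """Precompute the DA table: entry `addr` holds the sum of coeffs[k]
    for every bit k that is set in `addr` (2**len(coeffs) entries)."""
    n = len(coeffs)
    lut = [0] * (1 << n)
    for addr in range(1 << n):
        lut[addr] = sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
    return lut


def da_inner_product(lut, samples, bits=8):
    """Compute sum(coeffs[k] * samples[k]) without any multiplication.

    Assumes each sample is an unsigned integer that fits in `bits` bits.
    For each bit position b, bit b of every sample forms a table address;
    the table output is weighted by 2**b with a shift and accumulated.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k  # gather bit b of sample k
        acc += lut[addr] << b            # shift-and-add replaces multiply
    return acc


# Hypothetical coefficients and input samples, for illustration only.
coeffs = [3, -1, 4, 2]
samples = [17, 200, 5, 255]
lut = build_da_lut(coeffs)
result = da_inner_product(lut, samples)
```

The table grows as 2^K for K coefficients, so a hardware design would typically keep K small or split the table; that trade-off is a general property of DA rather than a detail taken from this work.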