New systolic arrays for matrix multiplication

In this paper, three new systolic arrays for matrix multiplication are proposed. The first array completes a matrix multiplication in 3n-2 clock cycles, the minimum among the known structures, using n^2 processing elements (PEs). This is achieved by applying a new input data flow and deposition scheme. The second array is derived by combining this data flow technique with Blahut's simple matrix multiplication algorithm. It not only matches the minimum processing time of 3n-2 clock cycles but also has the least area complexity, about n^2/2 PEs. The third array is obtained by further modifying the input data flow patterns of the second array, which reduces the processing time to 2.5n-2 clock cycles. The proposed architectures perform better than the known structures according to several standard performance measures.
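To make the quoted figures concrete, the short Python sketch below tabulates the cycle and PE counts stated above for a given matrix size n. It is only an illustration of the arithmetic, not a model of the arrays themselves. The function name `abstract_figures` and the dictionary layout are assumptions made for this example, and the PE count of the third array is left as `None` because it is not stated here.

```python
def abstract_figures(n):
    """Cycle and PE counts quoted above for an n x n matrix product.

    Hypothetical helper for illustration; it only evaluates the stated formulas.
    """
    return {
        "array 1": {"cycles": 3 * n - 2, "pes": n * n},
        # The text says "about n^2/2", so this value is approximate.
        "array 2": {"cycles": 3 * n - 2, "pes": n * n / 2},
        # The PE count of the third array is not stated, so it is left unset.
        "array 3": {"cycles": 2.5 * n - 2, "pes": None},
    }


# Example: tabulate the figures for an 8 x 8 product.
for name, figures in abstract_figures(8).items():
    print(name, figures["cycles"], figures["pes"])
```

For n = 8, the first two arrays take 22 clock cycles and the third takes 18, while the approximate PE count drops from 64 to 32 between the first and second arrays.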

