The authors present a new hardware-efficient group distributed arithmetic (GDA) design approach for the one-dimensional (1-D) discrete Fourier transform (DFT). The approach adopts distributed arithmetic (DA) computation and exploits the properties of cyclic convolution to realise the 1-D N-point DFT efficiently using small ROM modules, a barrel shifter, and N accumulators. The proposed GDA design rearranges the contents of the ROM into several groups so that all the elements in a group can be accessed simultaneously when accumulating the DFT outputs, which increases ROM utilisation. Moreover, combining the symmetry of the DFT coefficients with the proposed GDA design means that only half the ROM contents need to be stored, which halves the ROM size. A long-length DFT formulated as a cyclic convolution is realised by permuting the rows and columns of the matrix so that the long-length cyclic convolution is partitioned directly into short ones; the resulting short-length DFTs can then be realised efficiently by the proposed GDA design at low hardware cost. This design approach is termed the 'block-based group distributed arithmetic approach'. Compared with existing systolic array designs and DA-based designs, the proposed GDA design reduces the delay-area product by 29% to 68%, based on a 0.35 μm CMOS cell library.
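To illustrate why cyclic convolution suits a DA realisation, the following Python sketch computes an N-point cyclic convolution with a single look-up table whose address is rotated once per output. It is a minimal software model of generic bit-serial DA, not the authors' GDA grouping or their ROM-halving scheme. The function names, the restriction to unsigned integer inputs, and the reading of the barrel shifter as an address rotator are all assumptions made for illustration.

```python
def build_rom(h):
    """ROM[a] = sum of h[(-j) mod N] over every bit j set in address a.

    The reversed indexing makes a right-rotation of the address
    correspond to moving from output k to output k + 1.
    """
    n = len(h)
    return [
        sum(h[(-j) % n] for j in range(n) if (a >> j) & 1)
        for a in range(1 << n)
    ]


def rotate_right(addr, k, n):
    """Rotate an n-bit address right by k positions (barrel-shifter model)."""
    k %= n
    mask = (1 << n) - 1
    return ((addr >> k) | (addr << (n - k))) & mask


def da_cyclic_convolution(h, x, word_bits):
    """Bit-serial DA model of y[k] = sum_n h[(k - n) mod N] * x[n].

    Assumes every x[n] is an unsigned integer below 2**word_bits.
    """
    n = len(h)
    rom = build_rom(h)
    acc = [0] * n  # one accumulator per output
    for b in range(word_bits):
        # Address formed from bit b of every input sample.
        addr = 0
        for i, sample in enumerate(x):
            addr |= ((sample >> b) & 1) << i
        # The same ROM serves every output; only the address rotation differs.
        for k in range(n):
            acc[k] += rom[rotate_right(addr, k, n)] << b
    return acc


def direct_cyclic_convolution(h, x):
    """Reference result computed straight from the definition."""
    n = len(h)
    return [sum(h[(k - i) % n] * x[i] for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    h = [3, -1, 4, 2]
    x = [5, 0, 7, 2]
    print(da_cyclic_convolution(h, x, word_bits=3))  # [41, 17, 45, 9]
    print(direct_cyclic_convolution(h, x))           # [41, 17, 45, 9]
```

Because every output reads the same table through a rotated address, the coefficient set is stored only once in this model. The table here still has 2^N entries, which is why practical designs, including the partitioning into short convolutions described above, keep N small per ROM module.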