Modern GPGPU's have enabled massively parallel computing with programmability that can exploit the highly parallel nature of LDPC decoding. Previous works customized the design on a GPGPU towards specific execution attributes of a particular LDPC decoding matrix. Supporting different LDPC decoding matrices requires either substantial rework on the current program, or a brand new parallel design. This paper proposes two unified designs that can achieve high performance for both regular and irregular LDPC decoding on a GPGPU. The first design introduces a node-based scheme with a versatile translation array mechanism that can efficiently handle the complex data access patterns of different LDPC decoding matrices. The second design proposes an edge-based parallel paradigm that uses more intuitive data layout. More edges than nodes in a Tanner graph also give the edge-based design higher computation parallelism when there are limited concurrent codewords. With the proposed unified designs, designers can be ignorant of the types of LDPC matrices and achieve high performance LDPC decoding. The experiments on a GTX 470 GPGPU have demonstrated up to 134.56x runtime improvement, when compared with designs on a high-end CPU. The maximum throughput can reach 80.25 Mbps. When compared with the previous customized designs, the proposed systematic designs can reach better performance while relieving the effort of customization.
- C.4 performance of systems
- D.2.2 design tools and techniques