This paper proposes a virtual cluster architecture that executes multi-cluster VLIW programs on a reduced number of clusters in a time-sharing fashion. Interleaving the sub-VLIWs hides instruction latencies significantly, which gives the proposed virtual cluster three advantages: (1) reduced forwarding complexity in the processor datapath, (2) an improved programming model for further code optimizations, and (3) support for composite instructions without any extra functional unit. In our experiments with a 4-cluster VLIW DSP, the 28 forwarding paths inside a cluster are completely eliminated, which saves 21.71% of the delay and 17.56% of the silicon area. Moreover, because of its improved programming model, the virtual cluster has been verified to achieve better efficiency in code size and execution time on various DSP kernels.
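The time-sharing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the round-robin issue order, the function names (`issue_schedule`, `issue_gap`, `needs_forwarding`), and the single-physical-cluster configuration are all assumptions made for illustration. The sketch shows why interleaving can remove the need for forwarding: if consecutive instructions of the same virtual cluster are issued several cycles apart, a result whose latency fits within that gap is already available when the next instruction of that virtual cluster reads it.

```python
from typing import List, Tuple

# (cycle, physical_cluster, virtual_cluster, word_index)
Slot = Tuple[int, int, int, int]


def issue_schedule(num_words: int, virtual_clusters: int,
                   physical_clusters: int) -> List[Slot]:
    """Issue each VLIW word's sub-VLIWs round-robin on the physical clusters.

    Hypothetical model: one word with `virtual_clusters` sub-VLIWs takes
    virtual_clusters // physical_clusters consecutive cycles.
    """
    if virtual_clusters % physical_clusters != 0:
        raise ValueError("virtual_clusters must be a multiple of physical_clusters")
    slots_per_word = virtual_clusters // physical_clusters
    schedule: List[Slot] = []
    for word in range(num_words):
        for slot in range(slots_per_word):
            cycle = word * slots_per_word + slot
            for phys in range(physical_clusters):
                virt = slot * physical_clusters + phys
                schedule.append((cycle, phys, virt, word))
    return schedule


def issue_gap(schedule: List[Slot], virtual_cluster: int) -> int:
    """Smallest cycle distance between consecutive issues of one virtual cluster."""
    cycles = [c for c, _, v, _ in schedule if v == virtual_cluster]
    if len(cycles) < 2:
        raise ValueError("need at least two words to measure a gap")
    return min(b - a for a, b in zip(cycles, cycles[1:]))


def needs_forwarding(result_latency: int, gap: int) -> bool:
    """Assumed rule: forwarding is needed only if the result is not yet
    readable when the same virtual cluster issues its next instruction."""
    return result_latency > gap


# Four virtual clusters time-shared on one physical cluster (assumed setup).
shared = issue_schedule(num_words=2, virtual_clusters=4, physical_clusters=1)
gap_shared = issue_gap(shared, virtual_cluster=0)

# Conventional case: four virtual clusters on four physical clusters.
direct = issue_schedule(num_words=2, virtual_clusters=4, physical_clusters=4)
gap_direct = issue_gap(direct, virtual_cluster=0)
```

Under these assumptions, an operation with a 3-cycle result latency would need forwarding in the conventional case (`needs_forwarding(3, gap_direct)`) but not in the time-shared case (`needs_forwarding(3, gap_shared)`), at the cost of each VLIW word occupying four issue cycles.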
|Number of pages|4|
|Journal|Proceedings - IEEE International Symposium on Circuits and Systems|
|State|Published - 27 Sep 2007|
|Event|2007 IEEE International Symposium on Circuits and Systems, ISCAS 2007 - New Orleans, LA, United States|
|Duration|27 May 2007 → 30 May 2007|