Latency-tolerant virtual cluster architecture for VLIW DSP

Pi Chen Hsiao*, Tay Jyi Lin, Chih-Wei Liu, Chein Wei Jen

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-review


This paper proposes a virtual cluster architecture that executes multi-cluster VLIW programs on a reduced number of clusters in a time-sharing fashion. The interleaved sub-VLIWs hide instruction latencies significantly, which gives the proposed virtual cluster three advantages: (1) reduced forwarding complexity in the processor datapath, (2) an improved programming model for further code optimizations, and (3) support for composite instructions without any extra functional unit. In our experiments with a 4-cluster VLIW DSP, the 28 forwarding paths inside a cluster are completely eliminated, which saves 21.71% in delay and 17.56% in silicon area. Moreover, thanks to its improved programming model, the virtual cluster has been verified to achieve better code size and execution time for various DSP kernels.
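
The time-sharing idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the round-robin issue order, the function names `time_share` and `issue_distance`, and the representation of a sub-VLIW as a plain string are all assumptions made for illustration. The sketch shows why interleaving hides latency: when the sub-VLIWs of several logical clusters take turns on one physical cluster, two consecutive sub-VLIWs of the same logical cluster are issued several cycles apart.

```python
def time_share(program, physical_clusters=1):
    """Flatten a multi-cluster VLIW program into a round-robin issue schedule.

    program: list of VLIW bundles; each bundle holds one sub-VLIW
             (here just a string) per logical cluster.
    Returns a list of cycles; each cycle is a list of
    (bundle_index, logical_cluster, sub_vliw) tuples.
    Assumption: logical clusters are served in index order, round-robin.
    """
    schedule = []
    for b, bundle in enumerate(program):
        if len(bundle) % physical_clusters != 0:
            raise ValueError("logical clusters must divide evenly")
        # Each cycle issues as many sub-VLIWs as there are physical clusters.
        for start in range(0, len(bundle), physical_clusters):
            schedule.append(
                [(b, c, bundle[c]) for c in range(start, start + physical_clusters)]
            )
    return schedule


def issue_distance(schedule, cluster):
    """Cycles between consecutive sub-VLIWs of one logical cluster."""
    cycles = [
        t for t, slot in enumerate(schedule)
        for (_, c, _) in slot if c == cluster
    ]
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


# Hypothetical 4-cluster program with three VLIW bundles.
program = [[f"b{b}.c{c}" for c in range(4)] for b in range(3)]

one_cluster = time_share(program, physical_clusters=1)
print(issue_distance(one_cluster, cluster=0))  # -> [4, 4]
```

Under these assumptions, a result produced by one sub-VLIW has four cycles to reach the register file before the next sub-VLIW of the same logical cluster issues, which is the intuition behind removing forwarding paths. Calling `time_share(program, physical_clusters=2)` halves that distance to two cycles.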

Original language: English
Article number: 4253436
Pages (from-to): 3506-3509
Number of pages: 4
Journal: Proceedings - IEEE International Symposium on Circuits and Systems
State: Published - 27 Sep 2007
Event: 2007 IEEE International Symposium on Circuits and Systems, ISCAS 2007 - New Orleans, LA, United States
Duration: 27 May 2007 to 30 May 2007
