Reducing memory penalty by a programmable prefetch engine for on-chip caches

Tien-Fu Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Prefetching has been shown to be one of several effective approaches that can tolerate large memory latencies. Hardware-based prefetching schemes handle prefetching at run time without compiler intervention, whereas software-directed prefetching inserts prefetch instructions into the code through static data analysis. In this paper, we consider a prefetch engine called Hare, which handles prefetches at run time and is built alongside the data pipeline in the on-chip data cache of high-performance processors. The key design feature is that the engine is programmable by user code, so that software-prefetching techniques can also be employed to exploit the benefits of prefetching. The engine launches prefetches ahead of the current execution point, controlled by the program counter. We evaluate the proposed scheme by trace-driven simulation and consider area and cycle-time factors in assessing cost-effectiveness. Our performance results show that the prefetch engine can significantly reduce the data access penalty with little prefetching overhead.
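The abstract does not detail Hare's internal design, but the idea of a run-time engine that launches prefetches ahead of execution under program-counter control can be illustrated with a minimal sketch. The following is a hypothetical PC-indexed stride prefetcher driven by an address trace, in the spirit of trace-driven evaluation of hardware prefetching; the class and field names are illustrative, not taken from the paper.

```python
# Illustrative sketch (not the paper's actual Hare design): a minimal
# PC-indexed stride prefetcher driven by an address trace.

class StridePrefetcher:
    """Reference prediction table keyed by the program counter (PC).

    Each entry remembers the last address a load instruction touched and
    the stride between its last two accesses; once the same stride is
    observed twice, the next address is prefetched ahead of execution.
    """
    def __init__(self):
        self.table = {}        # PC -> (last_addr, stride)
        self.prefetches = []   # addresses the engine would prefetch

    def access(self, pc, addr):
        if pc in self.table:
            last_addr, stride = self.table[pc]
            new_stride = addr - last_addr
            if new_stride == stride and stride != 0:
                # Stride confirmed: launch a prefetch one step ahead.
                self.prefetches.append(addr + stride)
            self.table[pc] = (addr, new_stride)
        else:
            self.table[pc] = (addr, 0)

# Trace-driven usage: one load (PC 0x40) walking an array with stride 8.
engine = StridePrefetcher()
for addr in range(0x1000, 0x1000 + 5 * 8, 8):
    engine.access(0x40, addr)

print([hex(a) for a in engine.prefetches])  # → ['0x1018', '0x1020', '0x1028']
```

A software-programmable engine such as Hare would, per the abstract, let user code configure this run-time behavior rather than fixing it entirely in hardware.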

Original language: English
Pages (from-to): 121-130
Number of pages: 10
Journal: Microprocessors and Microsystems
Issue number: 2
State: Published - 1 Jan 1997


Keywords

  • Compiler optimization
  • Data prefetch
  • Programmable engine
  • Software prefetch

