Effective Hardware-Based Data Prefetching for High-Performance Processors

Tien-Fu Chen, Jean-Loup Baer

Research output: Contribution to journal › Article › peer-review



Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a Reference Prediction Table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is the preferred one in terms of cost-performance.
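The core RPT mechanism described above can be illustrated with a short software model. The sketch below is illustrative only, not the paper's exact design: the state names and transition rules follow the stride-prediction idea in the abstract (track each load/store instruction's last address and stride, and prefetch one iteration ahead once the stride is confirmed), but details such as the transition on a misprediction from the steady state are simplified assumptions, and a real RPT is a small associative hardware table rather than a Python dictionary.

```python
# Simplified software model of a Reference Prediction Table (RPT).
# Each entry is indexed by the PC of a load/store and tracks the last
# data address, the observed stride, and a small prediction state.

class RPTEntry:
    def __init__(self, addr):
        self.prev_addr = addr
        self.stride = 0
        self.state = "initial"  # initial | transient | steady | no_pred

class ReferencePredictionTable:
    def __init__(self):
        self.table = {}  # PC -> RPTEntry; hardware would use a small cache

    def access(self, pc, addr):
        """Record one data access; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = RPTEntry(addr)
            return None

        stride = addr - entry.prev_addr
        correct = (stride == entry.stride)
        old_state = entry.state

        # Simplified four-state transition function (an assumption,
        # not a transcription of the paper's exact state machine).
        if old_state == "initial":
            entry.state = "steady" if correct else "transient"
        elif old_state == "transient":
            entry.state = "steady" if correct else "no_pred"
        elif old_state == "steady":
            entry.state = "steady" if correct else "initial"
        else:  # no_pred
            entry.state = "transient" if correct else "no_pred"

        # Relearn the stride on a misprediction, except out of "steady".
        if not correct and old_state != "steady":
            entry.stride = stride
        entry.prev_addr = addr

        # Basic scheme: once the stride is confirmed, prefetch the
        # address expected one loop iteration ahead.
        if entry.state == "steady":
            return addr + entry.stride
        return None
```

For a load walking an array of 8-byte elements, the table needs two accesses to learn the stride; from the third access on, it issues prefetches one iteration ahead (e.g. addresses 0, 8, 16 lead to a prefetch of 24).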

Original language: English
Pages (from-to): 609-623
Number of pages: 15
Journal: IEEE Transactions on Computers
Issue number: 5
State: Published - 1 Jan 1995


  • branch prediction
  • cycle-by-cycle simulations
  • data cache
  • hardware function unit
  • prefetching
  • reference prediction

