Prefetching is one of several effective approaches for tolerating large memory latencies. In this paper, we consider a prefetch engine called Hare, which handles prefetches at run time and is built alongside the data pipeline of the on-chip data cache in high-performance processors. The key design feature is programmability, so that software-prefetching techniques can also be employed to exploit the benefits of prefetching. The engine always launches prefetches ahead of the current execution point, which is tracked by the program counter. We evaluate the proposed scheme with trace-driven simulation, accounting for area and cycle-time factors to assess cost-effectiveness. Our results show that the prefetch engine significantly reduces the data access penalty with little prefetching overhead.
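The idea of launching prefetches a fixed distance ahead of the current access, as the engine does relative to the program counter, can be sketched with conventional software prefetching. The following is a minimal illustration, not Hare's actual mechanism: it uses the GCC/Clang `__builtin_prefetch` builtin, and the function name and `PREFETCH_DISTANCE` value are hypothetical choices for the example.

```c
#include <stddef.h>

/* Illustrative only: issue a prefetch hint PREFETCH_DISTANCE elements
 * ahead of the current access, so the cache line is (ideally) resident
 * by the time the loop reaches it. The distance would be tuned to the
 * memory latency and per-iteration work on a real machine. */
#define PREFETCH_DISTANCE 16

long sum_with_prefetch(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* args: address, rw (0 = read), locality (1 = low reuse) */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        }
        sum += a[i];
    }
    return sum;
}
```

A programmable engine like the one described moves this hint generation out of the instruction stream, avoiding the instruction overhead that inline prefetch instructions add to the loop body.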
Number of pages: 6
Journal: Proceedings of the Annual International Symposium on Microarchitecture
State: Published - 1 Dec 1995
Event: Proceedings of the 1995 28th Annual International Symposium on Microarchitecture - Ann Arbor, MI, USA
Duration: 29 Nov 1995 → 1 Dec 1995