What is it about?
This paper alleviates the front-end bottleneck in server applications through instruction prefetching. Using a hardware-software co-design approach, we first perform lightweight static analysis on programs and partition their call graphs into Bundles. A hardware prefetcher then prefetches instructions at the coarse granularity of whole Bundles (10-100 KB of instruction footprint each). Increasing prefetch depth in this way improves timeliness and coverage while maintaining accuracy, yielding an average IPC improvement of 6.6% across a range of popular server applications.
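To make the idea of partitioning a call graph into Bundles more concrete, here is a minimal Python sketch. It is an illustration only and not the algorithm from the paper: the greedy depth-first grouping, the function name `partition_into_bundles`, the `max_bytes` cap, and the example call graph and sizes are all assumptions made for this summary.

```python
def partition_into_bundles(call_graph, sizes, root, max_bytes):
    """Group functions into bundles by walking the call graph depth-first.

    Hypothetical heuristic (an assumption, not the paper's method): keep
    adding functions to the current bundle until the next one would push
    its instruction footprint past max_bytes, then start a new bundle.
    """
    bundles = []
    current, current_bytes = [], 0
    visited = set()
    stack = [root]
    while stack:
        fn = stack.pop()
        if fn in visited:
            continue
        visited.add(fn)
        # Close the current bundle if this function would overflow it.
        if current and current_bytes + sizes[fn] > max_bytes:
            bundles.append(current)
            current, current_bytes = [], 0
        current.append(fn)
        current_bytes += sizes[fn]
        # Push callees in reverse so they are visited in listed order.
        stack.extend(reversed(call_graph.get(fn, [])))
    if current:
        bundles.append(current)
    return bundles


# Made-up call graph (caller -> callees) and code sizes in bytes.
KB = 1024
example_graph = {"main": ["parse", "handle"], "parse": ["lex"], "handle": ["db"]}
example_sizes = {"main": 40 * KB, "parse": 30 * KB, "lex": 50 * KB,
                 "handle": 60 * KB, "db": 20 * KB}
example_bundles = partition_into_bundles(
    example_graph, example_sizes, "main", max_bytes=100 * KB
)
```

In a scheme like this, a hardware prefetcher could fetch an entire bundle's instructions when execution reaches the first function of that bundle, which is the coarse-grained behavior the summary describes. The real static analysis and Bundle selection criteria are described in the original paper.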
Why is it important?
Large-scale servers are economically and practically crucial. The key innovation is a departure from the fine-grained approach of prior work: coarse-grained prefetching exploits repetition in the instruction stream at scale, which allows a greater prefetch depth and, in turn, better coverage and timeliness.
Read the Original
This page is a summary of: Hierarchical Prefetching: A Software-Hardware Instruction Prefetcher for Server Applications, March 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3676641.3716260.