RapidMRC: Approximating L2 Miss Rate Curves on IBM POWER5 Systems for Online Optimizations

Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs provide a way to dynamically identify optimal partition sizes when partitioning the shared cache of a multicore processor. L2 MRCs are also useful for other online optimizations, such as predicting how much of the cache can be dynamically powered down, managing bus bandwidth contention due to cache misses, guiding co-scheduling algorithms in making optimal use of a shared cache, and informing dynamic compiler optimizations.
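As an illustration of how MRCs might guide partition sizing, the sketch below picks the split of a shared cache between two applications that minimizes their combined miss rate. It is a minimal sketch, not the method used in this work: the function name `best_partition`, the "cache units" granularity, the summed-miss-rate objective, and the MRC values are all hypothetical.

```python
def best_partition(mrc_a, mrc_b, total_units):
    """Return (units_a, units_b, combined_miss_rate) for the best split.

    mrc_x[k] is the miss rate of application x when given k cache units.
    Index 0 is unused; each application receives at least one unit.
    The objective (sum of the two miss rates) is an assumption made
    for illustration only.
    """
    best = None
    for units_a in range(1, total_units):
        units_b = total_units - units_a
        cost = mrc_a[units_a] + mrc_b[units_b]
        if best is None or cost < best[2]:
            best = (units_a, units_b, cost)
    return best


# Hypothetical MRCs (misses per thousand instructions) for 1..8 cache units.
mrc_a = [None, 40.0, 22.0, 10.0, 9.0, 8.5, 8.2, 8.0, 8.0]      # cache-sensitive
mrc_b = [None, 12.0, 11.5, 11.0, 10.8, 10.6, 10.5, 10.4, 10.4]  # cache-insensitive

units_a, units_b, cost = best_partition(mrc_a, mrc_b, 8)
print(units_a, units_b, cost)
```

With these made-up curves, the search gives most of the cache to the application whose miss rate falls steeply with size, which is the kind of decision an online MRC makes possible.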

Generating L2 MRCs in software has generally been assumed to be expensive, and several researchers have therefore proposed adding hardware support to generate them online. As a result, L2 MRCs have seen limited use in online optimizations.

We have developed a low-overhead software technique, RapidMRC, to obtain L2 MRCs online on IBM POWER5 processors. It exploits features available in the processors' performance monitoring units, so no changes to the application source code or binaries are required. We consider the technique to be low-overhead because it requires a single probing period of roughly 221 million processor cycles (147 ms), followed by 124 million cycles (83 ms) to process the data. We demonstrate the accuracy of our technique by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We show that RapidMRC can be applied to sizing cache partitions to help achieve performance improvements of up to 27% for some applications.
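To make the idea of turning a trace of memory accesses into an MRC concrete, the following is a minimal sketch based on LRU stack distances (Mattson's stack algorithm). The abstract does not specify how the collected data is processed, so this is an assumption chosen for illustration: it models a fully associative LRU cache, and the function name `miss_rate_curve` and the trace are hypothetical.

```python
def miss_rate_curve(trace, max_lines):
    """Return miss rates for cache sizes 1..max_lines (in cache lines).

    Assumes a fully associative LRU cache. An access at stack distance d
    hits in any cache holding at least d lines; first-time (cold)
    accesses miss at every size.
    """
    if not trace:
        return [0.0] * max_lines

    stack = []                    # LRU stack, most recently used first
    hist = [0] * (max_lines + 1)  # hist[d] = accesses at stack distance d

    for line in trace:
        if line in stack:
            distance = stack.index(line) + 1  # 1-based stack distance
            stack.remove(line)
            if distance <= max_lines:
                hist[distance] += 1
        stack.insert(0, line)     # accessed line becomes most recent

    total = len(trace)
    curve = []
    hits = 0
    for size in range(1, max_lines + 1):
        hits += hist[size]        # cumulative hits for a cache of this size
        curve.append((total - hits) / total)
    return curve


# Hypothetical trace of cache-line identifiers.
trace = list("ABCABCDA")
print(miss_rate_curve(trace, 4))
```

A single pass over the trace yields the miss rate for every cache size at once, which is why a stack-distance approach is attractive when the curve must be produced quickly.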

Greg Steffan
Last modified: Mon Dec 22 10:26:27 EST 2008