Generating L2 miss rate curves (MRCs) in software has generally been assumed to be expensive, and hence several researchers have proposed adding hardware support to generate them online. As a result, the use of L2 MRCs for online optimizations has been limited.
We have developed RapidMRC, a low-overhead software technique that obtains L2 MRCs online on IBM POWER5 processors. It exploits features available in the processors' performance monitoring units, so no changes to the application source code or binaries are required. We consider the technique low overhead because it requires a single probing period of roughly 221 million processor cycles (147 ms), followed by 124 million cycles (83 ms) to process the data. We demonstrate its accuracy by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We also show that RapidMRC can be applied to sizing cache partitions, helping to achieve performance improvements of up to 27% for some applications.
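For readers unfamiliar with MRCs, the following is a minimal sketch of how a miss rate curve can be derived from a trace of cache-line addresses using LRU stack distances. It is an illustration under simplifying assumptions, not our implementation: it assumes a fully associative LRU cache and a trace already held in memory, it does not show how such a trace would be captured from the performance monitoring unit, and the name `miss_rate_curve` and its parameters are hypothetical.

```python
from collections import OrderedDict


def miss_rate_curve(trace, max_lines):
    """Return miss rates for fully associative LRU caches of 1..max_lines lines.

    trace: sequence of cache-line identifiers (illustrative input format).
    The result's element i is the miss rate of a cache holding i + 1 lines.
    """
    total = len(trace)
    if total == 0:
        return [0.0] * max_lines

    stack = OrderedDict()                    # LRU stack; most recent entry is last
    hits_at_distance = [0] * (max_lines + 1)

    for line in trace:
        if line in stack:
            # Stack distance: position of the line counted from the most
            # recently used end (1 = most recently used).
            keys = list(stack)
            distance = len(keys) - keys.index(line)
            if distance <= max_lines:
                hits_at_distance[distance] += 1
            stack.move_to_end(line)
        else:
            stack[line] = True               # first touch: a miss at every size

    # A cache of `size` lines hits every access whose distance is <= size.
    curve = []
    hits = 0
    for size in range(1, max_lines + 1):
        hits += hits_at_distance[size]
        curve.append((total - hits) / total)
    return curve
```

For example, the cyclic trace `[1, 2, 3, 1, 2, 3]` misses on every access with one or two lines of cache but on only the three first touches with three lines. The linear scan used to find each stack distance keeps the sketch short; it is not meant to reflect the cost of a tuned implementation.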