Tag Elimination Assignment
We will be working with SimpleScalar's out-of-order (OOO) core simulator, sim-outorder. Most of the changes will happen in ruu_issue(), the function that simulates the OOO scheduler using register dependencies. We will go over how it works in the lectures.
Mirroring what would be a typical exploratory effort, we will follow these steps:
1. Is there potential for our candidate technique to even work? What measurements would help us decide whether the idea is worth exploring further? Recall that it is easier to show that something does not hold than that it does (that is, to prove that something is not true rather than that it is). Mirroring the paper, we first want to characterize instructions by how many operands they need to wait for when they first enter the scheduler. Let's create a graph where the X axis lists the workloads. No need to try multiple scheduler sizes; let's pick 128 entries. The Y axis is the fraction of instructions, up to 100%, split into three categories: no need to wait on any operand, need to wait on 1 operand, and need to wait on 2 operands. Show these as stacked bars. One way to collect these counts is sketched below.
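A minimal sketch of the instrumentation, assuming sim-outorder's standard RUU_station fields (idep_ready[], MAX_IDEPS) and hooked in at the point in dispatch where an instruction's RUU entry has just been filled in; the counter and function names here are hypothetical:

    /* Histogram of how many source operands were unready at dispatch.
       Unused input slots are marked ready by ruu_link_idep, so counting
       the unready ones gives the waiting-operand count directly. */
    static counter_t wait_hist[3];   /* waiting on 0, 1, or 2 operands */

    static void
    count_unready_operands(struct RUU_station *rs)
    {
      int i, unready = 0;
      for (i = 0; i < MAX_IDEPS; i++)
        if (!rs->idep_ready[i])
          unready++;
      if (unready > 2)
        unready = 2;                 /* at most 2 register inputs */
      wait_hist[unready]++;
    }

At the end of simulation, dump wait_hist and divide by the total instruction count to get the fractions for the stacked bars.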
Any time we use simulation we are creating a model of a hypothetical machine. A critical consideration is correctness, and then accuracy. Let's talk about correctness. How do we know that the model we created is correct, that is, that it properly models what it is supposed to? In our case, the question becomes: how do we know that the measurements we are collecting are correct? We need some validation test. However, "validation" is too strong a word here, as it means making sure the model is absolutely correct, which is hard to do. So let's instead talk about increasing our confidence that the measurements are correct.
For this purpose you will have to create a set of synthetic workloads (microbenchmarks) for which you know what the measurements should look like. It's up to you to figure out what these should be. I would suggest creating at least 3: 1. worst case, 2. best case, 3. "average case". Deliverable 1: the microbenchmark code and an explanation of why you think they are good validation microbenchmarks. One possible shape for such a kernel is sketched below.
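For illustration only (the actual microbenchmarks are yours to design), here is the kind of kernel whose expected measurement is easy to reason about: a tight dependence chain in which each instruction consumes the result of the immediately preceding one, so most instructions should enter the scheduler waiting on exactly one operand. Whether the chain survives to the scheduler depends on the compiler, so inspect the generated assembly:

    /* chain.c: hypothetical dependence-chain kernel. Each add depends on
       the previous one, so the "wait on 1 operand" bucket should dominate
       (modulo loop-overhead instructions). Build with -O0 or check the
       assembly to make sure the chain is not optimized away. */
    int main(void)
    {
      volatile long v = 1;
      long x = v;                /* defeat constant folding */
      long i;
      for (i = 0; i < 100000000L; i++)
        x += 1;                  /* waits only on the previous add */
      return (int)x;
    }

A "best case" variant might interleave several independent accumulators so inputs are long since ready at dispatch; a "worst case" variant might feed each instruction two freshly produced values.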
Deliverable 2: the above-referenced graph for the workloads we have provided. No need to run the full 1B instructions; feel free to do just 100M for all workloads except gcc. For gcc, run the full 1B.
2. Predicting the last tag: assuming the first set of experiments provides enough motivation to move forward, let's now try to build a predictor for the last tag. We will not be changing how the scheduler works; we will just check whether, and with what accuracy, we can predict which of the two operands an instruction should wait for. Report results as a graph where X lists the benchmarks and Y is accuracy (up to 100%). Let's focus only on a 16K-entry predictor, which ought to be similar to what is described in the paper. Use 8 bits of global history, and use the lower 14 bits of the instruction address (after we omit the lower bits that are always zero). XOR those bits with the upper 16 bits of the instruction address first, and then index the table. The entries should be 2-bit saturating counters. One plausible reading of this scheme is sketched below.
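A minimal sketch of the predictor, assuming SimpleScalar's md_addr_t type. The exact way the 8 history bits are folded in is not pinned down above, so this XORs them into the low index bits (gshare-style), which is one reasonable interpretation; the function names are hypothetical:

    #define PRED_ENTRIES (16 * 1024)          /* 16K entries -> 14-bit index */
    #define HIST_MASK    0xff                 /* 8 bits of global history */

    static unsigned char pred[PRED_ENTRIES];  /* 2-bit saturating counters */
    static unsigned int ghist;                /* global history register */

    /* Drop the always-zero low PC bits, XOR with the upper 16 bits of the
       PC, fold in the history, and mask to a 14-bit index. The shift of 3
       assumes PISA's 8-byte instructions; adjust for your target ISA. */
    static unsigned int
    pred_index(md_addr_t pc)
    {
      unsigned int idx = (unsigned int)(pc >> 3) ^ (unsigned int)(pc >> 16);
      idx ^= ghist & HIST_MASK;
      return idx & (PRED_ENTRIES - 1);
    }

    /* Predict which operand arrives last (0 or 1): the counter's MSB. */
    static int
    pred_lookup(md_addr_t pc)
    {
      return pred[pred_index(pc)] >> 1;
    }

    /* Train with the observed outcome, then shift it into the history. */
    static void
    pred_update(md_addr_t pc, int outcome)
    {
      unsigned int idx = pred_index(pc);
      if (outcome) { if (pred[idx] < 3) pred[idx]++; }
      else         { if (pred[idx] > 0) pred[idx]--; }
      ghist = (ghist << 1) | (outcome & 1);
    }

Compare pred_lookup's answer against the operand that actually arrived last to accumulate the accuracy numbers for the graph.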
Deliverable 3: the accuracy measurements graph. In addition, instead of just reporting correct/wrong, what other categories do you think would be useful?
3. We will not ask you to change the simulator to model execution with the technique; this may be trickier than it looks. We may revisit it in a later assignment.