Computer Organization

Implementing a Processor: Multi-cycle Implementation

Andreas Moshovos

Spring 2005

 

 

MULTICYCLE IMPLEMENTATION: The Datapath

 

As with the single-cycle implementation, our processor will consist of two cooperating units: the datapath and the control. We will first design the datapath and then the control. The key difference here is that the execution of a single instruction will take multiple cycles to complete. Accordingly, the datapath has to change a bit. The key change is the introduction of temporary registers that hold the values produced in each cycle. This is best understood by looking at the schematic for the datapath. For the time being, please ignore the details and focus on the grey boxes. These are the new registers:

 

The following temporary registers are introduced:

 

1.    IR or Instruction Register: This holds the instruction encoding after it is read from memory. A register is needed because we will use a single memory device for both data and instructions. Accordingly, the memory's output may change during the execution of an instruction (for example, a load will read data from memory).

2.    R1 and R2: These are used to temporarily hold the register values read from the register file.

3.    AluOut: This is used to temporarily hold the result calculated by the ALU.

4.    MDR or Memory Data Register: holds the value returned from memory so that it can later be written into the register file.
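
The state introduced so far can be collected in one place. The following is a minimal Python sketch, not the actual hardware description. The field names are chosen here for illustration, the register file size of 4 follows from the 2-bit register fields used later in these notes, and the memory size of 256 words is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Datapath:
    """State of the multi-cycle datapath (illustrative model only)."""
    pc: int = 0        # program counter
    ir: int = 0        # IR: instruction encoding read from memory
    r1: int = 0        # R1: first value read from the register file
    r2: int = 0        # R2: second value read from the register file
    alu_out: int = 0   # AluOut: result calculated by the ALU
    mdr: int = 0       # MDR: value returned from memory
    # Register file: 4 entries, since register fields are 2 bits wide.
    rf: list = field(default_factory=lambda: [0] * 4)
    # Single memory for instructions and data (size is an assumption).
    mem: list = field(default_factory=lambda: [0] * 256)

dp = Datapath()
```

Keeping the temporary registers as separate fields mirrors the hardware: a value produced in one cycle is only visible in a later cycle if it was latched into one of them.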

 

 

Observe that in our datapath there is now a single memory device, one ALU and no additional adders.

 

Let’s see how this datapath was derived. We will explain what happens cycle by cycle. The first two cycles are the same for all instructions since we need to fetch the instruction from memory and then decode it (i.e., the control has to look at the opcode and decide what to do next).

 

-----

CYCLE 1: Fetching the Instruction and Incrementing the PC

 

The first step in executing an instruction is fetching it from memory. For this we have to send the value of the PC register to the address lines of the memory device. Assuming that the memory will respond within this first cycle, we want to store the returned value (this is the encoding of the instruction that we should execute). To do this we need to take the value from the memory’s output and write it into the IR. These steps are possible as shown below:

 

Because we may access the same memory device to perform a load or a store (read and write respectively), a MUX is needed at the address input so that it is possible to send either the PC or another address. So, during the first cycle we will be reading the instruction encoding from memory. This is also a good time to calculate PC = PC + 1, as all instructions use this value (even branches require PC + 1 as part of their target calculation). Here’s how this is done:

 

In parallel with the memory access, we send the PC value through the ALU1 mux to the ALU. As the second input to the ALU we send the number 1 (input 001 of MUX ALU2). Finally, we set ALUop to 000 (addition). As a result, the ALU will calculate PC + 1. By setting PCWrite to 1, at the end of the current clock cycle (negative edge), PC will change and will become PC + 1.

 

CYCLE 1 SUMMARY: In summary the following actions take place during the first cycle. This is often called the FETCH cycle.

      [IR] = Mem[ [PC] ]

      [PC] = [PC] + 1
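
The two FETCH actions can be sketched in Python as follows. This is an illustration only: the dictionary layout and the example memory contents are hypothetical, and PC wrap-around is not modeled.

```python
def fetch(state):
    """Cycle 1: IR = Mem[PC] and PC = PC + 1, both within the same cycle."""
    new_ir = state["mem"][state["pc"]]   # memory read addressed by the PC
    new_pc = state["pc"] + 1             # ALU adds the constant 1 (ALUop = 000)
    # Both registers are written at the end of the cycle, so update together.
    state["ir"], state["pc"] = new_ir, new_pc
    return state

# Hypothetical memory image: the word at address 2 is some instruction encoding.
state = {"pc": 2, "ir": 0, "mem": [0x00, 0x00, 0xA4, 0x00]}
fetch(state)
```

Both new values are computed from the old PC before either register is updated, which is what the hardware does when the memory read and the ALU addition proceed in parallel.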

 

-------

CYCLE 2: Decoding the instruction and reading from the register file

 

During the second cycle, the control looks at the instruction opcode in order to decide what should happen during the next cycle. Because many instructions use the registers specified in fields R1 and R2 of the instruction, we also read these registers from the register file. Note that some instructions do not use R1 or R2. In this case, we will have read registers that we do not need. While this is extra work, we literally had nothing better to do during the second cycle. So, it is OK in hardware to perform actions that may be useful and later ignore the results if they are not needed. This is permissible as long as the extraneous work does not change any machine state in an irreversible way (reads do not change the register values, so they are OK).

 

It is important to note that during the 2nd cycle the datapath cannot take actions that depend on the actual instruction being executed. This is because we assume that the control needs a full cycle to decode the opcode and decide what needs to happen next. For our simple instruction set this is probably a pessimistic assumption. It is not pessimistic for other architectures that have many more instructions.

 

 

Note that because the R1 and R2 fields always appear at the same bit locations, it is possible to blindly use them and access the register file even though the control has not yet had enough time to check the actual opcode.

 

Schematically, here’s what happens in the datapath:

Thus at the end of the 2nd cycle, registers R1 and R2 are loaded with the values held by the registers identified by the instruction bit fields R1 and R2 respectively.

 

CYCLE 2 SUMMARY:

      [R1] = RF[ [IR[7..6]] ]

      [R2] = RF[ [IR[5..4]] ]

      Instruction Decode
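
The register reads of the second cycle amount to extracting two fixed bit fields from IR and using them as indices into the register file. A small Python sketch, with hypothetical register contents and a hypothetical instruction encoding:

```python
def decode(ir, rf):
    """Cycle 2: read the registers named by IR bits 7..6 and 5..4."""
    r1_field = (ir >> 6) & 0b11   # IR bits 7..6
    r2_field = (ir >> 4) & 0b11   # IR bits 5..4
    # The reads happen regardless of the opcode; unused values are ignored later.
    return rf[r1_field], rf[r2_field]

rf = [10, 20, 30, 40]               # hypothetical register contents
r1, r2 = decode(0b10_01_0000, rf)   # field R1 = 2, field R2 = 1
```

Note that the function never inspects the opcode bits, which matches the point above: the fields sit at fixed positions, so the reads can start before decoding finishes.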

 

 

-----

 

CYCLE 3 and after

 

What happens after cycle 2 depends on the actual instruction.  Accordingly we will consider each instruction in turn.

 

 

*** ADD, SUB and NAND

 

The execution of these three instructions proceeds in two additional steps:

 

In cycle 3 we perform the operation specified by the instruction and at the end of the cycle store the result into AluOut. In cycle 4 we write the result into the register file:

CYCLE 3

CYCLE 4
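
As a rough sketch of these two cycles in Python: the 8-bit data width and the choice of the register named by IR bits 7..6 as the destination are assumptions made for this example and are not stated in the text above. The function and table names are also hypothetical.

```python
MASK = 0xFF  # assumed 8-bit data width

ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "NAND": lambda a, b: ~(a & b) & MASK,
}

def execute_alu(op, ir, r1, r2, rf):
    """Cycles 3 and 4 of ADD, SUB and NAND."""
    alu_out = ALU_OPS[op](r1, r2)   # cycle 3: AluOut = R1 op R2
    dest = (ir >> 6) & 0b11         # assumed destination: IR bits 7..6
    rf[dest] = alu_out              # cycle 4: write AluOut into the register file
    return alu_out

rf = [0, 5, 3, 0]
ir = 0b01_10_0000                   # hypothetical encoding: fields R1 = 1, R2 = 2
result = execute_alu("ADD", ir, rf[1], rf[2], rf)
```

The result passes through alu_out before reaching the register file because the ALU and the register-file write are used in different cycles.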

 

 

*** SHIFT

 

Shift is almost identical to ADD, SUB and NAND. The only difference is that during cycle 3 we do not use register R2 but the Imm3 field from IR:

 

CYCLE 3

CYCLE 4

 

*** ORI

 

ORI uses an implied source/destination register operand, K1. Accordingly, the register we read in cycle 2 may not be the right one. For this reason, in cycle 3 we have to access the register file again and read K1. Then in cycle 4 we can perform the OR in the ALU, and in cycle 5 we write the result into the register file:

 

CYCLE 3

CYCLE 4

CYCLE 5
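
The three ORI cycles can be sketched as below. The register-file index used for K1 is an assumption, and the immediate is passed in as a plain argument because its position in the instruction encoding is not given in this section.

```python
K1 = 1  # assumed register-file index of K1

def ori(rf, imm):
    """Cycles 3 to 5 of ORI."""
    r1 = rf[K1]          # cycle 3: read K1 from the register file
    alu_out = r1 | imm   # cycle 4: OR in the ALU, result latched in AluOut
    rf[K1] = alu_out     # cycle 5: write the result back into K1
    return alu_out

rf = [0, 0b0101, 0, 0]
value = ori(rf, 0b0011)
```

The extra cycle compared to ADD comes entirely from the second register-file read.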

 

*** LOAD

 

For a LOAD instruction we will be accessing memory during cycle 3 and storing the returned value into MDR. Then in cycle 4 we can write this value into the register file.

 

CYCLE 3

CYCLE 4
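
A possible Python sketch of the two LOAD cycles follows. This section does not say where the address comes from or which register receives the value. The sketch assumes the address is the value held in R2 and the destination is the register named by IR bits 7..6. Both are assumptions.

```python
def load(ir, r2, mem, rf):
    """Cycles 3 and 4 of LOAD (operand roles are assumed, see above)."""
    mdr = mem[r2]             # cycle 3: MDR = Mem[R2]
    dest = (ir >> 6) & 0b11   # assumed destination: IR bits 7..6
    rf[dest] = mdr            # cycle 4: write MDR into the register file
    return mdr

mem = [0, 0, 0, 99]
rf = [0, 0, 3, 0]
ir = 0b00_10_0000             # hypothetical encoding: fields R1 = 0, R2 = 2
loaded = load(ir, rf[2], mem, rf)
```

MDR is needed for the same reason as IR: the memory output is only valid during the cycle in which the memory is read.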

 

*** STORE

 

For a STORE instruction we will be accessing memory during cycle 3 to write the value into memory:

 

CYCLE 3
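
The single STORE cycle can be sketched the same way. As with LOAD, the operand roles are assumptions: the value held in R1 is taken as the data and the value held in R2 as the address.

```python
def store(r1, r2, mem):
    """Cycle 3 of STORE: Mem[R2] = R1 (operand roles are assumed)."""
    mem[r2] = r1
    return mem

mem = [0, 0, 0, 0]
store(42, 3, mem)
```

No cycle 4 is needed because nothing is written back into the register file.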

 

*** BRANCHES

 

For branch instructions, in cycle 3 we will calculate PC + 1 + Sign-Extended(Imm4) and, at the end of the cycle, depending on whether the condition is true or not, we will write this value into the PC. Recall that during cycle 1 we changed the PC to PC’ = PC + 1, so now we need to calculate PC’ + Sign-Extended(Imm4):

CYCLE 3
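
The target calculation relies on sign extension of the 4-bit immediate. Here is a small Python sketch; the function names and the example values are hypothetical, and PC wrap-around is not modeled.

```python
def sign_extend_imm4(imm4):
    """Interpret a 4-bit field as a two's complement number (-8..7)."""
    imm4 &= 0xF
    return imm4 - 16 if imm4 & 0x8 else imm4

def branch(pc_after_fetch, imm4, pc_write):
    """Cycle 3 of a branch; pc_after_fetch is PC' = PC + 1 from cycle 1."""
    target = pc_after_fetch + sign_extend_imm4(imm4)
    # The PC only changes when PCWrite is 1; otherwise it stays at PC + 1.
    return target if pc_write else pc_after_fetch

taken = branch(11, 0b1110, True)       # Imm4 = 1110 is -2
not_taken = branch(11, 0b1110, False)
```

Because the PC was already incremented in cycle 1, the ALU needs only one addition in cycle 3, which is why no extra adder is required.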

 

The decision on whether to change the PC is made by the control. The decision is enforced through the PCWrite signal (1 to write the branch target into the PC, 0 to keep PC + 1).
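
Putting the per-instruction descriptions together gives the number of cycles each instruction takes. The cycle counts below are read off the sections above. The example program is a hypothetical instruction mix used only to show the arithmetic.

```python
# Total cycles per instruction, including the two common cycles (fetch, decode).
CYCLES = {
    "ADD": 4, "SUB": 4, "NAND": 4,
    "SHIFT": 4,
    "ORI": 5,
    "LOAD": 4,
    "STORE": 3,
    "BRANCH": 3,
}

def total_cycles(program):
    """Sum the cycles taken by a sequence of instruction names."""
    return sum(CYCLES[op] for op in program)

program = ["LOAD", "ADD", "ORI", "STORE", "BRANCH"]   # hypothetical mix
cycles = total_cycles(program)
```

For this mix the total is 4 + 4 + 5 + 3 + 3 = 19 cycles, or 3.8 cycles per instruction on average.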