Lecture 2

Andreas Moshovos

Jan 2024

Another Example of Function Calls, Caller- & Calee-Saved Registers

We will use an example function to illustrate the concepts of caller- and callee-saved registers and how functions need to handle registers across calls. Let’s start with the general concept and then see how it applies to a specific example.

As we know, the CPU has two mechanisms for storing values: registers and memory. Our discussion here is about the conventions that apply to registers when one function calls another. The question boils down to the following:

1. If I am a caller and I make a call, what can I expect to happen to the value held by each register? Will it change or will it remain the same?

2. If I am the callee, what expectations does whoever called me have for each register value? In other words, can I go ahead and use and overwrite a register without worry?

So, the scenario we are looking at is what happens to registers across a call, as depicted below. We have a caller function A() which will make a call to another function B().

Our discussion revolves around what are the expectations and responsibilities of A() and B() around the call to B().

The Caller: A() has to ask itself: “Is there a register that, BEFORE I make the call to B(), has a value that I might use AFTER the call returns?” In the figure above, r8 and r9 are set to 10 and 20 before the call to B(). After the call returns, A() uses only r8, expecting that it will have the same value as it did before the call.

The Callee: Once B() is called it has to ask itself: “Can I go ahead and use a register, say r16, and overwrite it? Or does whoever called me expect register r16 not to change?”

Also relevant to our discussion is that neither A() nor B() can assume that it knows which registers the other function may use at any point, which values it cares about, etc. Recall that B() could be called from other functions and that A() and B() could be written by different people. Long story short, to make things work there is a set of conventions that apply to registers across calls.

Some registers are CALLER-SAVED whereas others are CALLEE-SAVED.

The convention is as follows:

If a register is caller-saved, then it is up to a caller to preserve it across calls.

If a register is a callee-saved register, then it is up to the callee to ensure that the register keeps its original value before the callee returns.

How do functions preserve registers? On the stack. A figure helps:

Here we assume that r8 is a caller-saved register, whereas r16 is a callee-saved register. For r8, A(), being the caller, saves the value before the call and restores it after the call returns. So, to A(), r8 after the call ends up having the same value it did before the call was made. Now, B() could have changed r8, but A() neither needs to know nor cares, since it “protected” the value of r8 it cared about.

Similarly, we assume that r16 is a callee-saved register, which means that B() has to assume that whoever called it (A() in our example) expects r16’s value after B() returns to be exactly what it was before the call. How does B() achieve this? It saves r16 at the beginning and restores it before returning. So, to whichever function called B(), it looks as if r16 never changed.

Notice that A() does not know for sure that B() will overwrite R8, nor does B() know for sure whether A() cares for R16’s value. By convention they have to assume the “worst” and save and restore the registers accordingly.

Where do functions save register values to and restore them from? On the stack :) In our example A(), the caller, will push r8 on the stack just before the call, and pop it after the call. Similarly, B() will push r16’s value on the stack as soon as it starts executing, and will restore it just before returning. Notice that in either case, whichever function pushed a value on the stack also popped it before returning. So, the convention that every function removes whatever it added to the stack before it returns is abided by.
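The save-and-restore pattern can be sketched in C. This is only a toy model, not Nios II code: the arrays reg and stack_mem, the helpers push() and pop(), and the values 1234 and 5678 are made up for illustration. A() plays the caller that protects r8, and B() plays the callee that protects r16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: 32 registers and a small stack of words. */
enum { NUM_REGS = 32, STACK_WORDS = 64 };

static int32_t reg[NUM_REGS];
static int32_t stack_mem[STACK_WORDS];
static int sp = STACK_WORDS;    /* index of the top word; the stack grows downward */

static void push(int32_t value) {
    assert(sp > 0);             /* stack overflow check */
    stack_mem[--sp] = value;
}

static int32_t pop(void) {
    assert(sp < STACK_WORDS);   /* stack underflow check */
    return stack_mem[sp++];
}

/* B(): the callee. It wants to use r16 (callee-saved) and also overwrites r8. */
static void B(void) {
    push(reg[16]);              /* callee saves r16 on entry */
    reg[16] = 1234;             /* now B() may overwrite r16 */
    reg[8] = 5678;              /* r8 is caller-saved, so B() may overwrite it freely */
    reg[16] = pop();            /* callee restores r16 just before returning */
}

/* A(): the caller. It needs r8 after the call, so it protects it. */
static int32_t A(void) {
    reg[8] = 10;
    reg[9] = 20;                /* not needed after the call, so not saved */
    push(reg[8]);               /* caller saves r8 just before the call */
    B();
    reg[8] = pop();             /* caller restores r8 after the call */
    return reg[8];              /* the value set before the call */
}
```

Calling A() leaves r8 holding 10 again even though B() overwrote it, and whatever value r16 held before the call is still there afterwards. Each function pops exactly what it pushed, so sp ends where it started.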

For our purposes, it suffices to be aware that the following conventions hold:

R2-R15 are CALLER-SAVED

R16-R23 are CALLEE-SAVED

I will leave this here for the time being: keep in mind that a function may be both a caller and a callee. In our example, some other function called A(), which in turn calls B(). So, at the time A() is called it is a callee, but it also becomes a caller when it calls B(). For this reason, a function must preserve all callee-saved registers it may overwrite and, across the calls it makes, all caller-saved registers whose values it needs to “survive” the call.

OK, lots of information. Let’s try to absorb the concept by going over an example that forces us to put this information to use.

An Example

Let’s try to implement the following C function:

int add7(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {

int sum;

sum = add5(a3, a4, a5, a6, a7);

sum = sum + add2(a1, a2);

return sum;

}

There are definitely more efficient ways of implementing a function that adds 7 numbers. That is not the point of this example. The goal is for us to see in practice the calling conventions.
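To have a reference result to check the assembly against, here is a compilable version of the function. The bodies of add5() and add2() are assumptions: the lecture never shows them, so they are taken here to return the plain sum of their arguments. The function add7_example() is a hypothetical usage check.

```c
#include <assert.h>

/* Assumed helpers: plain sums of their arguments (not shown in the lecture). */
int add5(int a, int b, int c, int d, int e) { return a + b + c + d + e; }
int add2(int a, int b) { return a + b; }

int add7(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    int sum;
    sum = add5(a3, a4, a5, a6, a7);
    sum = sum + add2(a1, a2);
    return sum;
}

/* Hypothetical usage check: 1 + 2 + ... + 7 is 28. */
void add7_example(void) {
    assert(add7(1, 2, 3, 4, 5, 6, 7) == 28);
}
```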

We will be looking at how the stack is used as we are implementing the code.

What add7() expects: Let’s first see where add7() expects its input arguments to be. That is, when add7() is called by whichever function calls it, where can add7() find a1 through a7?

By convention we know that the first four arguments will be in r4 through r7, and any subsequent arguments will be on the stack with the 5th argument on top.

So, just before the first instruction of add7() is executed we can assume that:

Point in time #1: ADD7 just got called. We haven’t yet executed any instructions of ADD7.

R4=a1, R5=a2, R6=a3, R7=a4, 0(sp)=a5, 4(sp)=a6, 8(sp)=a7

So, this is what ADD7() expects the registers and the stack to look like when it gets called. It is the responsibility of the function that calls it to make sure that this is the case. If not, too bad, things will break.
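The rule for where each argument lives at function entry can be written down as a small C function. This is a sketch of the convention as stated in these notes (first four arguments in r4 through r7, the rest on the stack starting at 0(sp)); the type arg_location and the function locate_argument() are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Where the callee finds an argument at function entry. */
struct arg_location {
    int in_register;  /* 1: the argument is in a register; 0: it is on the stack */
    int index;        /* register number, or byte offset from sp at entry */
};

/* arg_number is 1-based: 1 is the first argument. */
struct arg_location locate_argument(int arg_number) {
    struct arg_location where;
    assert(arg_number >= 1);
    if (arg_number <= 4) {
        where.in_register = 1;
        where.index = 4 + (arg_number - 1);   /* r4, r5, r6, r7 */
    } else {
        where.in_register = 0;
        where.index = 4 * (arg_number - 5);   /* 0(sp), 4(sp), 8(sp), ... */
    }
    return where;
}
```

For add7(), locate_argument(3) reports register 6 (a3 is in r6) and locate_argument(7) reports stack offset 8 (a7 is at 8(sp)), matching the list at point in time #1.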

In the diagram below, SP is the stack pointer register (R27). It contains a 32-bit value which is to be interpreted as an address in memory. All addresses from SP upward (SP itself and anything higher) are part of the stack. Anything lower is not part of the stack. The actual value in SP does not matter, because all actions on the stack are relative to SP.

What ADD7 can assume the stack contains when it is called.

The first question ADD7() should ask itself is whether it calls any other functions. Why? Because RA (R31) upon entry contains the address that ADD7() is supposed to return to. Recall that CALL writes the return address (PC+4) into R31 and RET just does PC=R31. For a function that calls no other, RA can remain untouched. However, here’s the conundrum for ADD7(): as the code is written it will call ADD5 and ADD2. It will do that using CALL instructions, and those CALL instructions will overwrite R31. Somehow R31 needs to be preserved.

We can treat R31 as a callee-saved register. ADD7() is the callee and is about to execute. R31 contains its return address, and ADD7() knows that it will overwrite it. So, it has to save and restore it as was shown in our example above for R16.
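The problem with RA can be seen in a few lines of C that model only PC and RA. This is a sketch under assumptions: the struct cpu, the helper names, and the addresses 0x2000 and 0x3000 are invented for illustration. Only the two rules come from the text: CALL writes PC+4 into RA, and RET copies RA into PC.

```c
#include <assert.h>
#include <stdint.h>

struct cpu { uint32_t pc; uint32_t ra; };

/* CALL: remember where to come back to, then jump to the target. */
void do_call(struct cpu *c, uint32_t target) { c->ra = c->pc + 4; c->pc = target; }

/* RET: jump to whatever RA holds. */
void do_ret(struct cpu *c) { c->pc = c->ra; }

/* Returns the PC reached when a function called from caller_pc makes one
   nested call at inner_call_pc and then returns, with or without saving RA. */
uint32_t return_target(uint32_t caller_pc, uint32_t inner_call_pc, int save_ra) {
    struct cpu c = { caller_pc, 0 };
    uint32_t saved = 0;
    do_call(&c, 0x2000);          /* enter the function: RA = caller_pc + 4 */
    if (save_ra) saved = c.ra;    /* stands in for pushing RA on the stack */
    c.pc = inner_call_pc;         /* run up to the nested CALL */
    do_call(&c, 0x3000);          /* the nested CALL overwrites RA */
    do_ret(&c);                   /* the nested function returns */
    if (save_ra) c.ra = saved;    /* stands in for popping RA */
    do_ret(&c);                   /* our own RET */
    return c.pc;
}
```

With save_ra set, a function called from 0x1000 returns to 0x1004 as intended. Without it, the final RET jumps to the instruction after the function's own nested CALL, so the function never gets back to its caller.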

Accordingly, we can start filling in the blanks of add7()’s implementation by pushing RA on the stack upon entry and popping it from the stack (thus restoring the value it had upon entry) prior to returning:

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

ADD7 BODY CODE GOES HERE

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

So, just after we pushed RA on the stack (Point in time #2) the stack looks like this:

Next we need to implement this statement:

sum = add5(a3, a4, a5, a6, a7);

This is a call, and add7() will be the caller. Before moving ahead with the implementation, add7() needs to consider whether there are any values in caller-saved registers that it needs to survive the call to add5().

The caller-saved registers are r2-r15. Presently, the only register values add7() knows about are the ones holding its arguments, that is r4 through r7. Of course, all registers have values. However, add7() did not initialize any of the registers, so the only ones it knows about are those holding arguments.

Let’s take a closer look at the code now to see what comes after the call to add5():

sum = add5(a3, a4, a5, a6, a7);

sum = sum + add2(a1, a2);

After the call to add5 we will call add2() where we need to pass a1 and a2 as arguments.

OK, back to before the call to add5(). Where are those two values? They are respectively in r4 and r5 since they are the original arguments to add7(). So, before we call add5() r4 and r5 contain values (a1 and a2) that we will need after the call to add5() returns.

So, r4 and r5 need to be preserved across the call to add5(). These are caller-saved registers like r8 in our earlier example. So, we need to push them on the stack before the call to add5() and read them back after add5() returns to us. Here’s the code for the first half of this, the push:

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

addi sp, sp, -8

stw r4, 0(sp) # PUSH a1

stw r5, 4(sp) # PUSH a2

# Point in time #3

CODE GOES HERE

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

Let’s see what the stack looks like after we pushed a1 and a2 onto it:

Now we are ready to prepare for the call to add5(). Here we have to consider what add5() expects.

sum = add5(a3, a4, a5, a6, a7);

Add5() expects that a3 will be in r4, a4 in r5, a5 in r6, a6 in r7, and a7 on the top of the stack at 0(sp). Why? Because they are its parameters and it expects that the caller will follow the convention. Accordingly, add7(), being the caller at this point, has to take care of this. It will have to move values into r4 through r7 and also push a7 on the stack.

So add7() will have to:

· Copy a3 from r6 to r4

· Copy a4 from r7 to r5

· Copy a5 from the stack to r6

· Copy a6 from the stack to r7

· Read a7 from the stack and push a copy of it on the stack.

In the code below, we first place a3 through a6 in r4 through r7, and then push a7 on the stack after first reading it from the stack:

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

addi sp, sp, -8

stw r4, 0(sp) # PUSH a1

stw r5, 4(sp) # PUSH a2

# Point in time #3

mov r4, r6

mov r5, r7

ldw r6, 12(sp)

ldw r7, 16(sp)

ldw r2, 20(sp) # read a7 from the stack

addi sp, sp, -4 # push a7 on the stack

stw r2, 0(sp)

#point in Time #4

CODE GOES HERE

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

In the code, we used r2 to read a7 from the stack and then push it back on the top. R2 is a caller-saved register. So, add7() can overwrite it without breaking the convention: whoever called add7() would have saved r2 if they cared about its value surviving the call to add7().
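The offsets 12(sp), 16(sp) and 20(sp) used to reach a5, a6 and a7 follow from simple arithmetic: the stack grows toward lower addresses, so every byte add7() pushes moves the words that were already on the stack that much further from sp. A minimal C sketch of that arithmetic, with offset_from_current_sp() being a hypothetical helper name:

```c
#include <assert.h>

/* Offset, relative to the current sp, of a word that sat at
   entry_offset(sp) when add7() was entered, after add7() has
   pushed bytes_pushed bytes of its own. */
int offset_from_current_sp(int entry_offset, int bytes_pushed) {
    assert(entry_offset >= 0 && bytes_pushed >= 0);
    assert(entry_offset % 4 == 0 && bytes_pushed % 4 == 0);  /* whole words only */
    return entry_offset + bytes_pushed;
}
```

At point in time #3, add7() has pushed 12 bytes (4 for ra, 8 for a1 and a2), so a5, a6 and a7, which were at 0(sp), 4(sp) and 8(sp) on entry, are now at 12(sp), 16(sp) and 20(sp).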

After pushing a7 on the stack (point in time #4) the stack looks like this:

Now we can call add5:

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

addi sp, sp, -8

stw r4, 0(sp) # PUSH a1

stw r5, 4(sp) # PUSH a2

# Point in time #3

mov r4, r6

mov r5, r7

ldw r6, 12(sp)

ldw r7, 16(sp)

ldw r2, 20(sp) # read a7 from the stack

addi sp, sp, -4 # push a7 on the stack

stw r2, 0(sp)

#point in Time #4

call add5

CODE GOES HERE

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

After add5() returns to add7(), its return value will be in r2. That can be add7()’s sum variable from this point on.

At this point, we could go and clean the stack since we no longer need the top value (which was used to hold the 5th argument to add5()). We could do this, or we can promise to do so before we return. So, for the time being, we need to remember that we allocated another 12 bytes on the stack (8 for a1 and a2, 4 for the copy of a7) that we have not yet written code to remove; we already have code for removing the 4 bytes pushed for preserving ra.

Now we can write the code to call add2(). Again add7() is the caller and is making a call. What add7() has to ask is whether there is any value in a caller-saved register that it will need after the call to add2(). Stop here and think about it before reading further. Is there a value in any of the registers r2-r15 that add7() will need to preserve across the call to add2()?

Look at our decision to keep sum in r2. Do we care for this value? Our plan is to add to it the value that add2() returns. So, yes, we do care that sum keeps its value across the call to add2(). Since it is presently in r2 we have no expectation that it will be preserved. Actually, in this case it is a given that it will not since add2() will overwrite r2 to return a value to us.

So, similar to r8 in our first example, we need to preserve r2 on the stack across the call to add2().

Recall that we already have space on the top of the stack that we used for passing the 5th argument to add5(). That word at the top served its purpose, so we can repurpose it to save r2 across the call to add2().

Also, add2() expects that r4 will be a1 and r5 will be a2. So, before we call add2(), we will need to do the following:

· Store r2’s value on the top of the stack. We reuse the word on the top of the stack we used for our call to add5(). So, we will not be adjusting the stack.

· Read a1 from where we saved it on the stack before the call to add5() and write the value into r4

· Read a2 from where we saved it on the stack before the call to add5() and write the value into r5

Here’s the code. Note that the stack is as we left it at “point in time #4”:

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

addi sp, sp, -8

stw r4, 0(sp) # PUSH a1

stw r5, 4(sp) # PUSH a2

# Point in time #3

mov r4, r6

mov r5, r7

ldw r6, 12(sp)

ldw r7, 16(sp)

ldw r2, 20(sp) # read a7 from the stack

addi sp, sp, -4 # push a7 on the stack

stw r2, 0(sp)

#point in Time #4

call add5

stw r2, 0(sp) # save r2 on the top of the stack

# we reuse the word we allocated for passing

# the 5th argument to add5() before

ldw r4, 4(sp) # 1st argument is a1 (read it from the stack)

ldw r5, 8(sp) # 2nd argument is a2 as above

# point in time #5

call add2

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

The stack at this point is as follows. We haven’t allocated any more space, however, we reused the top to save r2 which now is the return value of add5():

Finally, we can add code for:

1. Updating sum: the return value of add2() plus the return value of add5() that we saved on the stack.

2. Removing the 12 bytes we allocated on the stack

.text

add7:
#PUSH RA

addi sp, sp, -4

stw ra, 0(sp)

#POINT in time #2

addi sp, sp, -8

stw r4, 0(sp) # PUSH a1

stw r5, 4(sp) # PUSH a2

# Point in time #3

mov r4, r6

mov r5, r7

ldw r6, 12(sp)

ldw r7, 16(sp)

ldw r2, 20(sp) # read a7 from the stack

addi sp, sp, -4 # push a7 on the stack

stw r2, 0(sp)

#point in Time #4

call add5

stw r2, 0(sp) # save r2 on the top of the stack

# we reuse the word we allocated for passing

# the 5th argument to add5() before

ldw r4, 4(sp) # 1st argument is a1 (read it from the stack)

ldw r5, 8(sp) # 2nd argument is a2 as above

# point in time #5

call add2

# return value is in r2

ldw r4, 0(sp) # recover the return value of add5()

add r2, r2, r4 # sum+=return value of add2()

addi sp, sp, +12 # remove the space allocated on the stack

#point in time #6

# the stack is back to what it was at point #2

epilogue:

#POP RA

ldw ra, 0(sp)

addi sp, sp, +4

ret

That’s it!

Now we can go back if we wish and aggregate all changes. In total, the maximum space in the stack that add7() ever needed was 16 bytes. So, we could change the first addi to -16 and have a single addi +16 at the end. We can then adjust all offsets in the ldw and stw instructions accordingly. This is an optimization and it is not required for correctness.

.text

add7:
#PUSH RA

addi sp, sp, -16

stw ra, 12(sp)

#POINT in time #2

stw r4, 4(sp) # PUSH a1

stw r5, 8(sp) # PUSH a2

# Point in time #3

mov r4, r6

mov r5, r7

ldw r6, 16(sp)

ldw r7, 20(sp)

ldw r2, 24(sp) # read a7 from the stack

stw r2, 0(sp)

#point in Time #4

call add5

stw r2, 0(sp) # save r2 on the top of the stack

# we reuse the word we allocated for passing

# the 5th argument to add5() before

ldw r4, 4(sp) # 1st argument is a1 (read it from the stack)

ldw r5, 8(sp) # 2nd argument is a2 as above

# point in time #5

call add2

# return value is in r2

ldw r4, 0(sp) # recover the return value of add5()

add r2, r2, r4 # sum+=return value of add2()

#point in time #6

# the stack is back to what it was at point #2

epilogue:

#POP RA

ldw ra, 12(sp) # ra was saved at 12(sp) in this version

addi sp, sp, +16

ret
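One way to convince ourselves that the single-frame version works is to replay it, instruction by instruction, on a toy machine written in C. This is a sketch under assumptions, not a Nios II simulator: the arrays reg and mem, the helpers ldw(), stw() and call(), the placeholder value 0x0BAD, the pretend return address 0x1000, and the bodies of add5() and add2() (plain sums that also trash r4 through r7) are all invented for illustration. The function call_add7() plays the role of whoever calls add7(). The replay restores ra from 12(sp), the slot where the prologue saved it.

```c
#include <assert.h>
#include <stdint.h>

enum { SP = 27, RA = 31, MEM_WORDS = 64 };

static int32_t reg[32];          /* r0..r31 of the toy machine */
static int32_t mem[MEM_WORDS];   /* toy memory holding only the stack */

/* Word index for offset(base); traps on bad or misaligned addresses. */
static int word_index(int base, int offset) {
    int32_t addr = reg[base] + offset;
    assert(addr >= 0 && addr < 4 * MEM_WORDS && addr % 4 == 0);
    return addr / 4;
}
static int32_t ldw(int base, int offset) { return mem[word_index(base, offset)]; }
static void stw(int32_t value, int base, int offset) { mem[word_index(base, offset)] = value; }

/* call overwrites ra. A real call writes PC+4; a placeholder is enough here. */
static void call(void (*target)(void)) { reg[RA] = 0x0BAD; target(); }

/* Assumed callees: plain sums, result in r2. They also trash r4-r7,
   which the convention allows for caller-saved registers. */
static void add5(void) {
    reg[2] = reg[4] + reg[5] + reg[6] + reg[7] + ldw(SP, 0);
    reg[4] = reg[5] = reg[6] = reg[7] = -1;
}
static void add2(void) {
    reg[2] = reg[4] + reg[5];
    reg[4] = reg[5] = -1;
}

static void add7(void) {
    reg[SP] -= 16;            /* addi sp, sp, -16 */
    stw(reg[RA], SP, 12);     /* stw ra, 12(sp) */
    stw(reg[4], SP, 4);       /* stw r4, 4(sp)   save a1 */
    stw(reg[5], SP, 8);       /* stw r5, 8(sp)   save a2 */
    reg[4] = reg[6];          /* mov r4, r6      a3 */
    reg[5] = reg[7];          /* mov r5, r7      a4 */
    reg[6] = ldw(SP, 16);     /* ldw r6, 16(sp)  a5 */
    reg[7] = ldw(SP, 20);     /* ldw r7, 20(sp)  a6 */
    reg[2] = ldw(SP, 24);     /* ldw r2, 24(sp)  a7 */
    stw(reg[2], SP, 0);       /* stw r2, 0(sp)   5th argument of add5 */
    call(add5);               /* call add5 */
    stw(reg[2], SP, 0);       /* stw r2, 0(sp)   save the result of add5 */
    reg[4] = ldw(SP, 4);      /* ldw r4, 4(sp)   a1 */
    reg[5] = ldw(SP, 8);      /* ldw r5, 8(sp)   a2 */
    call(add2);               /* call add2 */
    reg[4] = ldw(SP, 0);      /* ldw r4, 0(sp)   recover the result of add5 */
    reg[2] = reg[2] + reg[4]; /* add r2, r2, r4 */
    reg[RA] = ldw(SP, 12);    /* ldw ra, 12(sp) */
    reg[SP] += 16;            /* addi sp, sp, +16 */
}

/* Plays the role of add7's caller: sets up the arguments, then checks the convention. */
int32_t call_add7(int32_t a1, int32_t a2, int32_t a3, int32_t a4,
                  int32_t a5, int32_t a6, int32_t a7) {
    int32_t entry_sp;
    reg[SP] = 4 * MEM_WORDS;  /* empty stack */
    reg[SP] -= 12;            /* push a5, a6, a7 with a5 on top */
    stw(a5, SP, 0);
    stw(a6, SP, 4);
    stw(a7, SP, 8);
    reg[4] = a1; reg[5] = a2; reg[6] = a3; reg[7] = a4;
    reg[RA] = 0x1000;         /* pretend return address */
    entry_sp = reg[SP];
    add7();
    assert(reg[SP] == entry_sp);   /* add7 removed everything it pushed */
    assert(reg[RA] == 0x1000);     /* add7 restored its return address */
    reg[SP] += 12;
    return reg[2];
}
```

Calling call_add7(1, 2, 3, 4, 5, 6, 7) yields 28, with sp and ra back at their entry values when add7() returns. Changing the ra restore in the replay to read from 0(sp) makes the second assertion fail, because 0(sp) holds the saved result of add5() at that point.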