Andreas Moshovos
Spring 2007
Revised Winter 2024
For Loops and Arrays
In this section we will be implementing in assembly the following C pseudo-code:
short arr[5] = { 1, 2, 3, 4, 5 }; // an array of half-word values (16 bits each)
short n = 5; // the number of elements in the array
short sum = 0;
short i;
for (i = 0; i < n; i++)
    sum = sum + arr[i];
How are arrays implemented at the machine level? They are typically implemented by allocating the elements one after the other in memory. So, in our example, if arr[0] is at memory location 0x1000 then arr[1] will be at 0x1002, arr[2] at 0x1004 and, in general, arr[i] will be at 0x1000 + i x 2. We use “2” because these are short numbers, or half-words in NIOS II terminology, and each takes two bytes in memory.
So, now in memory we will have the following:
Offset from where arr starts in memory | Byte value | What variable this corresponds to
+0  | 0x01 | arr[0] lower byte
+1  | 0x00 | arr[0] upper byte
+2  | 0x02 | arr[1] lower byte
+3  | 0x00 | arr[1] upper byte
+4  | 0x03 | arr[2] lower byte
+5  | 0x00 | arr[2] upper byte
+6  | 0x04 | arr[3] lower byte
+7  | 0x00 | arr[3] upper byte
+8  | 0x05 | arr[4] lower byte
+9  | 0x00 | arr[4] upper byte
+10 | 0x05 | n lower byte
+11 | 0x00 | n upper byte
+12 | 0x00 | sum lower byte
+13 | 0x00 | sum upper byte
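As a rough sketch of how this layout comes about, the C fragment below builds the same 14-byte image by hand, storing each 16-bit value with its lower byte first (NIOS II is little-endian). The names store_hword_le and build_image are made up for this illustration and are not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value at mem[offset], lower byte first (little-endian). */
void store_hword_le(uint8_t *mem, size_t offset, uint16_t value)
{
    mem[offset]     = (uint8_t)(value & 0xFF);         /* lower byte */
    mem[offset + 1] = (uint8_t)((value >> 8) & 0xFF);  /* upper byte */
}

/* Fill a 14-byte image: arr[0..4] at +0..+9, n at +10, sum at +12. */
void build_image(uint8_t mem[14])
{
    static const uint16_t arr[5] = { 1, 2, 3, 4, 5 };
    size_t i;

    for (i = 0; i < 5; i++)
        store_hword_le(mem, 2 * i, arr[i]);  /* element i starts at offset 2 x i */
    store_hword_le(mem, 10, 5);              /* n   */
    store_hword_le(mem, 12, 0);              /* sum */
}
```

After calling build_image, mem[k] holds the byte value listed at offset +k.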
Generally, if we have a one-dimensional array a whose elements are of type TYPE, then element a[i] is at address &a[0] + sizeof(TYPE) x i.
Note that the C expression &a[0] is the address of the first array element. Instead of &a[0], in C we can typically write just “a”. (Note: C pointer arithmetic scales automatically, so the C expression &a[0] + i already refers to address &a[0] + i x sizeof(a[0]). In assembly we have to do this multiplication ourselves.)
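The two ways of writing the element address can be compared with a small C sketch. The function names are hypothetical, and the conversion to uintptr_t assumes an ordinary flat address space where pointers convert to integers in the obvious way.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of a[i] computed by hand: base address + sizeof(element) x i. */
uintptr_t element_address(const short *a, size_t i)
{
    return (uintptr_t)a + sizeof(a[0]) * i;
}

/* The same address via C pointer arithmetic; the compiler scales i for us. */
uintptr_t element_address_ptr(const short *a, size_t i)
{
    return (uintptr_t)(a + i);
}
```

Both functions should return the same value as (uintptr_t)&a[i] for any valid index i.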
Writing the assembly code:
Let us first declare our variables:
.data
.align 1
arr: .hword 1, 2, 3, 4, 5
n:   .hword 5
sum: .hword 0
Turning to the code, let us first review the execution semantics of the C for statement. In general, a C for statement comprises four parts:
for (INIT; COND; POST)
    BODY
In pseudo-code, a C for statement is equivalent to the following:
STEP1 INIT
STEP2 if (COND is not true) we are done
STEP3 BODY
STEP4 POST
STEP5 GO TO STEP2
That is:
The INIT part is executed once at the beginning. We then test the condition (COND). If the condition is not TRUE, we are done and skip the rest of the for statement. Otherwise, we execute the BODY portion, followed by the POST portion. We then go back to testing the condition and repeat these steps until the condition stops being TRUE.
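The five steps can be written directly in C using only if and goto, which is close to what the assembly version will look like. This is a sketch using the summation loop from the start of this section; the function name sum_steps and the labels test and done are chosen here for illustration.

```c
/* Sum n half-word elements using only if and goto, mirroring STEP1..STEP5. */
short sum_steps(const short *arr, short n)
{
    short sum = 0;
    short i;

    i = 0;                          /* STEP1: INIT */
test:
    if (!(i < n))                   /* STEP2: if COND is not true we are done */
        goto done;
    sum = (short)(sum + arr[i]);    /* STEP3: BODY */
    i = (short)(i + 1);             /* STEP4: POST */
    goto test;                      /* STEP5: go to STEP2 */
done:
    return sum;
}
```

Note that when n is 0 the BODY never runs, because the condition is tested before the first iteration.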
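We can visualize this with the following “decision diagram”:
[Figure: control flow graph of the for statement. INIT leads to the COND test; the true arrow leads to BODY, then POST, then back to COND; the false arrow leaves the loop.]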
This diagram is formally referred to as a “control flow graph”. The boxes show what actions the code should take, whereas the arrows show how execution should flow among them. The arrows that have a label (true or false in our example) correspond to conditional branches. As a side note, a typical compiler uses such internal constructs to translate code into assembly. If you are interested in learning more, I suggest you look up static single assignment in the context of compiler optimizations.
Now, let’s return to our specific example.
In our loop we have:
INIT → i = 0
COND → i < n
BODY → sum = sum + arr[i]
POST → i = i + 1
Let’s write each of them in turn.
For starters let us use a register for holding variable i. Let’s use r8 for this. Then INIT becomes:
add r8, r0, r0        # r8 = 0
Now, let’s move to STEP2. Our condition requires comparing the current values of i and n. Assuming that n does not change in value while our loop executes (and it shouldn’t), we can load n into a register once, before the loop starts:
movia r9, n
ldh r9, 0(r9)
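Parenthetically, this is a shorter piece of code that has the same effect. Since movia is a pseudo-instruction that expands into two instructions, the following version needs two instructions instead of three:
movhi r9, %hiadj(n)
ldh r9, %lo(n)(r9)    # keep n’s value in r9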
The loop should continue while i < n, so we test for the reverse condition, i >= n, and jump out of the loop when it holds:
bge r8, r9, endloop
Let’s defer writing the BODY section for the time being and focus on the POST section instead:
addi r8, r8, 1        # r8 = r8 + 1
The almost complete code is then as follows:
.text
forloop:
        add   r8, r0, r0         # INIT: i = 0
        movia r9, n
        ldh   r9, 0(r9)          # r9 = n
loop:   bge   r8, r9, endloop    # reverse of COND: exit when i >= n
        # BODY GOES HERE
        addi  r8, r8, 1          # POST: i = i + 1
        br    loop
endloop:
The next challenging part is writing the BODY section. Let’s use register r10 to keep the running sum and at the end we will write this value into the sum variable in memory. So, we should initialize r10 to 0 in the beginning and at the end of the loop write its value into the sum variable.
Now we can focus on implementing the BODY part.
We can rewrite sum = sum + arr[i] as:
tmp = arr[i];
sum = sum + tmp;
Assuming that tmp will be held in a register, we now have to devise a way of reading arr[i] into that register.
To implement the statement tmp = arr[i] we need to be able to access all array elements one after the other.
To access arr[i] we need to access the half-word at memory location “arr + i x 2” (see the previous discussion about how arrays are laid out in memory). The code for that is:
movia r11, arr        # r11 = &arr[0] (the address where arr[0] is stored in memory)
add r11, r11, r8      # r11 = &arr[0] + i
add r11, r11, r8      # r11 = &arr[0] + i + i = &arr[0] + 2 x i
ldh r12, 0(r11)       # r12 = arr[i]
add r10, r10, r12     # r10 = r10 + arr[i]
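To see why adding i twice reaches the right element, the address computation can be modelled in C with a byte pointer standing in for r11. This is only an illustrative sketch: the function name load_element is made up, the variables are named after the registers they model, and it assumes sizeof(short) is 2, as on NIOS II.

```c
#include <stdint.h>
#include <string.h>

/* Model of reading arr[i]: start at &arr[0], add i twice, load a half-word. */
short load_element(const short *arr, int32_t r8 /* holds i */)
{
    const unsigned char *r11 = (const unsigned char *)arr;  /* r11 = &arr[0]         */
    short r12;

    r11 = r11 + r8;                 /* r11 = &arr[0] + i     (byte address) */
    r11 = r11 + r8;                 /* r11 = &arr[0] + 2 x i (byte address) */
    memcpy(&r12, r11, sizeof r12);  /* load the half-word stored at 0(r11)  */
    return r12;
}
```

A byte pointer is used because, unlike a short pointer, adding i to it advances the address by exactly i bytes, which is what the add instruction does to a register.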
The complete code for the for loop is as follows:
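.text
forloop:
        add   r8, r0, r0         # INIT: i = 0
        add   r10, r0, r0        # running sum = 0
        movia r9, n
        ldh   r9, 0(r9)          # r9 = n
loop:   bge   r8, r9, endloop    # exit when i >= n
        movia r11, arr           # r11 = &arr[0]
        add   r11, r11, r8
        add   r11, r11, r8       # r11 = &arr[0] + 2 x i
        ldh   r12, 0(r11)        # r12 = arr[i]
        add   r10, r10, r12      # running sum = running sum + arr[i]
        addi  r8, r8, 1          # POST: i = i + 1
        br    loop
endloop:
        movia r11, sum
        sth   r10, 0(r11)        # write the sum into memory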
While Loops
A while loop takes the following general form:
while (COND)
{
BODY
}
First we test COND. If it is TRUE, we execute the BODY. This process is repeated until COND evaluates to FALSE. A while loop is equivalent to a for loop without the INIT and POST sections.
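As a brief illustration, here is the summation loop from the For Loops section rewritten as a while loop. This is a sketch with a made-up function name, sum_while: what used to be INIT moves in front of the loop, and what used to be POST moves to the end of the BODY.

```c
/* The summation loop as a while loop. */
short sum_while(const short *arr, short n)
{
    short sum = 0;
    short i = 0;                      /* was INIT in the for loop */

    while (i < n) {                   /* COND, tested before every iteration */
        sum = (short)(sum + arr[i]);  /* BODY */
        i++;                          /* was POST in the for loop */
    }
    return sum;
}
```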
Do-While Loops
Do-while loops take the following form:
do
{
BODY
} while (COND);
We first execute the BODY and then test the CONDITION. This process is repeated as long as the CONDITION evaluates to TRUE.
Here’s an example:
i = 0;
do
{
sum = sum + arr[i];
i++;
} while (i < n);
This code assumes that n is at least 1, because the BODY is executed once before the condition is tested for the first time.
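The effect of that assumption can be checked with a small C sketch that wraps the example in a function (the name sum_do_while is hypothetical).

```c
/* Do-while version of the summation loop: BODY runs before COND is tested. */
short sum_do_while(const short *arr, short n)
{
    short sum = 0;
    short i = 0;

    do {
        sum = (short)(sum + arr[i]);  /* runs at least once, even if n is 0 */
        i++;
    } while (i < n);                  /* COND is tested after the BODY */
    return sum;
}
```

With arr holding 1, 2, 3, 4, 5, calling sum_do_while(arr, 5) gives 15, but sum_do_while(arr, 0) gives 1 rather than 0, since arr[0] is added before the condition is first tested.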
Here’s the assembly implementation:
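.text
        add   r8, r0, r0         # i = 0
        add   r10, r0, r0        # running sum = 0
        movia r9, n
        ldh   r9, 0(r9)          # r9 = n
doloop:
        movia r11, arr           # r11 = &arr[0]
        add   r11, r11, r8
        add   r11, r11, r8       # r11 = &arr[0] + 2 x i
        ldh   r12, 0(r11)        # r12 = arr[i]
        add   r10, r10, r12      # running sum = running sum + arr[i]
        addi  r8, r8, 1          # i = i + 1
        blt   r8, r9, doloop     # repeat while i < n
endloop:
        movia r11, sum
        sth   r10, 0(r11)        # write the sum into memory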