For loops and arrays – more examples
Andreas Moshovos, Jan 2024
Let’s look at another example of a for loop going over an array:
#define N 10 // elements in the array
#define T 5  // T stands for threshold
int a[N] = { /* some values */ };
int sum = 0;
int i;
for (i = 0; i < N; i++)
   if (a[i] >= T) sum += a[i];
The above code adds up all elements of a[] whose value is greater than or equal to the threshold T.
The control flow diagram for the code is as follows:
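As a textual companion to the diagram, the same control flow can be written in C with explicit labels and goto, one label per block. The block names match the labels used in the assembly later in this note. This is only a sketch: the function sum_at_least and its parameters are hypothetical, and goto is used here just to expose the blocks, not as recommended C style.

```c
/* Same loop as above, with each control-flow block made explicit.
   sum_at_least is a hypothetical name used only for this sketch. */
int sum_at_least(const int a[], int n, int t)
{
    int sum = 0;
    int i = 0;              /* INIT */

cond:
    if (i >= n)             /* exit test: leave once i reaches n */
        goto after;

    /* body: skip the add when a[i] < t */
    if (a[i] < t)
        goto post;

    /* addsum: */
    sum += a[i];

post:
    i++;                    /* increment, then re-test the condition */
    goto cond;

after:
    return sum;
}
```

For example, with the ten values {1, 7, 5, 3, 9, 4, 5, 2, 8, 0} and t = 5, it returns 7 + 5 + 9 + 5 + 8 = 34.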
As programmers, we have to decide where to place the variables. Since a[] is an array, it will be placed in memory. We can declare it with assembler directives like the following:
.data
a: .word list_of_values_goes_here
In the rest of this note we will assume that a[] has been declared and initialized, and we will refer to it by the label “a”. As we discussed, from this point on “a” stands for the memory address where a[0] is stored. For example, a[0] could be at 0x200000. Since every element is a word (4B), a[1] will be at 0x200004, a[2] at 0x200008, and so on. In general, a[i] occupies the 4 bytes starting at address a + 4 x i.
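The address rule can be written as a small C function that treats addresses as plain 32-bit unsigned numbers, as on NIOS II. The helper names element_address and element_address_adds are hypothetical, and the base 0x200000 is the same example value used above. The second function computes 4 x i using only additions (i + i, then doubled), which is the approach the assembly below takes.

```c
#include <stdint.h>

/* Byte address of a[i] when a[] holds 4-byte words starting at base. */
uint32_t element_address(uint32_t base, uint32_t i)
{
    return base + 4u * i;   /* each element occupies 4 bytes */
}

/* Same result using only additions. */
uint32_t element_address_adds(uint32_t base, uint32_t i)
{
    uint32_t t = i + i;     /* t = 2 x i */
    t = t + t;              /* t = 4 x i */
    return base + t;        /* a + 4 x i */
}
```

For example, element_address(0x200000, 2) returns 0x200008, matching the a[2] example above.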
Let’s now decide that i will be stored in r2 and sum in r8.
The code is as follows:
example:
movi r8, 0 # sum = 0
movi r2, 0 # i = 0, INIT of loop
cond:
movi r4, 10 # r4 = N
bge r2, r4, after # if i >= N we are done
body:
# if a[i] < T we skip the add
movi r5, 5 # r5 = T
# let’s find where a[i] is
movia r6, a
add r7, r2, r2 # r7 = 2xi
add r7, r7, r7 # r7 = 4xi
add r6, r6, r7 # r6 = a + 4 x i / address where a[i] is
ldw r7, 0(r6) # r7 = a[i] / read a[i] from memory
blt r7, r5, post # if a[i] < T do not add to sum
addsum:
add r8, r8, r7 # sum += a[i]
post:
addi r2, r2, 1 # i++
br cond
after: # the code after the loop continues here
That’s all. Now, can we find any instructions that execute in the loop but whose output register does not change in value across iterations? Look at the movi r4, 10 in the cond block. It always assigns 10 to r4, and it does so once per iteration. What if we move this instruction just before the loop? Will the final values calculated by the loop change? This is what we’ve done below:
example:
movi r8, 0 # sum = 0
movi r2, 0 # i = 0, INIT of loop
movi r4, 10 # r4 = N // moved to INIT block
cond:
bge r2, r4, after # if i >= N we are done
body:
# if a[i] < T we skip the add
movi r5, 5 # r5 = T
# let’s find where a[i] is
movia r6, a
add r7, r2, r2 # r7 = 2xi
add r7, r7, r7 # r7 = 4xi
add r6, r6, r7 # r6 = a + 4 x i / address where a[i] is
ldw r7, 0(r6) # r7 = a[i] / read a[i] from memory
blt r7, r5, post # if a[i] < T do not add to sum
addsum:
add r8, r8, r7 # sum += a[i]
post:
addi r2, r2, 1 # i++
br cond
after: # the code after the loop continues here
Even with this change, the loop produces exactly the same values. Why would we want to move such instructions out of the loop? Because, while the code produces the same results, it does so by executing fewer instructions: the movi now runs once before the loop instead of once per pass through the cond block. Executing fewer instructions means the code is faster and uses less energy, and usually both of these are advantages. Such instructions are referred to as “loop invariant”, meaning their results do not change across loop iterations.
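The saving can be counted. The cond block runs once for every iteration plus one final time when the test fails and the loop exits, so an instruction placed in it executes N + 1 times, while a hoisted copy executes exactly once. The sketch below just counts condition evaluations; the helper name cond_evaluations is hypothetical.

```c
/* Count how many times the loop condition i < n is evaluated. */
int cond_evaluations(int n)
{
    int count = 0;
    int i = 0;
    for (;;) {
        count++;            /* one pass through the cond block */
        if (!(i < n))       /* exit when the condition fails */
            break;
        i++;                /* body and post would run here */
    }
    return count;
}
```

With N = 10, cond_evaluations(10) returns 11, so hoisting movi r4, 10 out of cond removes 10 executed instructions from this loop.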
Can you spot other such instructions? HINT: this is not a wild goose chase. There are more.
Revisiting how we access the array
Let us revisit the C code we wrote for summing some of the elements of the array:
int i;
for (i = 0; i < N; i++)
   if (a[i] >= T) sum += a[i];
Note that in this loop i increments by 1 at every iteration. However, the way the code is written, we calculate where in memory a[i] is from scratch at every iteration. We do not take advantage of the fact that we are going over the array elements one after the other in sequence: we know that next(i) = current(i) + 1, so the address of the next element is simply the address of the current element plus 4. Given this observation, we can instead write the loop as follows, using a pointer (p) to walk through the elements of the array:
int i;
int *p;
p = &a[0];
for (i = 0; i < N; i++, p++)
   if (*p >= T) sum += *p;
The pointer p is just a 32b value that we intend to use as a memory address (here, to load values). It is 32b because in NIOS II the memory address space has 2^32 bytes and all addresses are 32b. In other architectures pointers can have a different bit length.
The statement p = &a[0] can also be written as p = a. In both cases, we assign to variable p the address of the first element of array a.
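A quick C check of two facts used here: p = a and p = &a[0] produce the same address, and p + 1 lies sizeof(int) bytes past p (4 on NIOS II, which is why the assembly adds 4 to p). The function names same_start and pointer_step_bytes are hypothetical, chosen for this sketch.

```c
#include <stddef.h>

/* Returns 1 if p = a and p = &a[0] produce the same address. */
int same_start(void)
{
    int a[10] = {0};
    int *p1 = a;        /* the array name converts to a pointer to a[0] */
    int *p2 = &a[0];    /* explicit address of the first element */
    return p1 == p2;
}

/* Number of bytes between p and p + 1 when p points to int. */
size_t pointer_step_bytes(void)
{
    int a[10] = {0};
    int *p = a;
    /* Converting to char * measures the distance in bytes. */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

In C, p++ on an int pointer advances by one element, and the compiler turns that into an increment of sizeof(int) bytes.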
Let’s see what the pointer-based loop above translates to. Below we assume that sum is in r8, i in r9, and p in r10:
movia r10, a # p = a
movi r8, 0 # sum = 0
movi r9, 0 # i = 0
movi r11, 10 # r11 = N
movi r12, 5 # r12 = T
cond:
bge r9, r11, after # if i >= N we are done
body:
ldw r13, 0(r10) # r13 = *p // we are reading a[i] into r13
blt r13, r12, post # if a[i] < T then skip the sum
usum:
add r8, r8, r13 # sum += r13
post:
addi r9, r9, 1 # i++
addi r10, r10, 4 # p++, p points to int so we increment by sizeof(int) = 4
br cond
after: # the code after the loop continues here
In the above code, we use p to access the elements of a[] one after the other: p points to the next element to process. We start with p pointing to a[0], and at every iteration we increment it by 4 so that it points to the next element in order.
We can further optimize the code by getting rid of the i variable and using p and the size of the array to check whether we have processed all elements:
int *p, *p_last;
p = &a[0];
p_last = p + N; // address immediately after the last element of a[], since there are N elements in it
for (; p < p_last; p++)
   if (*p >= T) sum += *p;
and in assembly:
movia r10, a # p = a
movi r8, 0 # sum = 0
addi r9, r10, 40 # p_last = a + 10 elements x 4B per element
movi r12, 5 # r12 = 5 / T
cond:
bgeu r10, r9, after # if NOT p < p_last we are done (unsigned compare: addresses are unsigned)
body:
ldw r13, 0(r10) # r13 = *p // we are reading a[i] into r13
blt r13, r12, post # if a[i] < T then skip the sum
usum:
add r8, r8, r13 # sum += r13
post:
addi r10, r10, 4 # p++, p points to int so we increment by sizeof(int) = 4
br cond
after: # the code after the loop continues here