Lecture 17.1
Andreas Moshovos
Fall 2013
Code Races
Consider the following C code, where a variable in memory (cnt) is continuously decremented by a program and incremented whenever an interrupt happens:
int cnt = 9;

interrupt:               /* runs whenever an interrupt occurs */
    cnt++;

program:                 /* the main program loops forever */
    while (1) cnt--;
If we could observe the values that cnt takes, we might see the sequence 8 followed by 9. This is possible when the program executes cnt--, decrementing cnt from 9 to 8, and an interrupt then increments it back to 9. Similarly, it is possible to observe the sequence 10, 11, 10: two interrupts happen, incrementing cnt to 10 and then 11, and afterward the program decrements it to 10.
If you execute this program, you may also observe the sequence 10, 11, 8. How is that possible? To understand how this can happen we have to look at the way the C program gets implemented at the machine code level. Here’s how:
.data
cnt:    .word 9

.section .exceptions, "ax"
iHdlr:
H1:     ldw   r9, 0(r8)       # read cnt (the handler assumes r8 still holds its address)
H2:     addi  r9, r9, 1       # increment the copy in r9
H3:     stw   r9, 0(r8)       # write the new value back to memory
        subi  ea, ea, 4       # back up ea so the interrupted instruction re-executes
        eret

.text
main:
        movia r8, cnt         # r8 = address of cnt
        movi  r9, 1
        wrctl ctl0, r9        # PIE = 1: enable interrupts
wait:
I1:     ldw   r11, 0(r8)      # read cnt
I2:     addi  r11, r11, -1    # decrement the copy in r11
I3:     stw   r11, 0(r8)      # write the new value back to memory
        br    wait
The first three instructions of main load the address of cnt into r8 and then enable interrupts. Instructions I1-I3 read cnt, decrement it, and store the updated value back to memory. Instructions H1-H3 read cnt, increment it, and update the value in memory. Since H1-H3 are inside the interrupt handler, where interrupts are disabled, they always execute without interruption. This is not the case for I1-I3: they execute outside the interrupt handler, where an interrupt can occur between any two instructions. As a result, an interrupt can occur between I1 and I2, between I2 and I3, or between any other pair of instructions that are adjacent in execution order.
Here’s one possible sequence of events:
I1 reads cnt from memory and keeps a *copy* of the value in r11. r11 is now 9.
An interrupt happens, and cnt gets incremented to 10. Notice that the copy in r11 is not updated: r11 holds a snapshot of cnt taken before the interrupt happened. Execution resumes at I2, and r11 becomes 8. Then another interrupt happens, and cnt gets incremented to 11. So cnt has now taken the values 10 and then 11. After the second interrupt, the program continues at I3, which updates cnt to 8.
Here’s a list of the events in time that cause the 10, 11, and 8 sequence:
I1 → r11 = 9
Interrupt:
H1, H2, H3 → cnt = 10
I2 → r11 = 8
Interrupt:
H1, H2, H3 → cnt = 11
I3 → cnt = r11 = 8
The root of the problem is that while cnt-- in the C program looks like an *atomic* operation, in reality it is not. An atomic operation is one that, once started, executes to completion without any possible intervention in between. So, if cnt-- were atomic, it would always decrement cnt by 1 and update the value in memory before *any* other instruction could read or modify cnt. However, cnt-- is not implemented as an atomic operation but as a sequence of three instructions, a “read-modify-write” sequence. This sequence reads a copy of the value, modifies the *copy*, and then updates the original value in memory.
H1-H3 are also a “read-modify-write” sequence, but they do behave as an atomic one, since interrupts are disabled inside the interrupt handler. Some processors provide atomic instructions that programs can use. NIOS II does not, so one has to explicitly disable interrupts whenever there is a possibility of problems like the one described here.
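One way to remove the race on NIOS II, then, is for the main program itself to disable interrupts around its read-modify-write sequence. The following is only a sketch of that fix (the choice of r10 as a scratch register is an assumption, not part of the original example): the status register is saved, the PIE bit is cleared for the duration of I1-I3, and the saved value is restored afterwards.

wait:
        rdctl r10, ctl0       # save the current status register
        wrctl ctl0, r0        # PIE = 0: interrupts disabled
I1:     ldw   r11, 0(r8)      # read cnt
I2:     addi  r11, r11, -1    # decrement the copy
I3:     stw   r11, 0(r8)      # write the new value back
        wrctl ctl0, r10       # restore the status register: interrupts enabled again
        br    wait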
In general terms, the above code is an example of a “race”: the main program reads and updates a variable that another program, here the interrupt handler, may also read and update, and there is no mechanism to coordinate the actions of the two. The general problem is that of synchronization across multiple programs (or threads, as they are often called). There are synchronization mechanisms that you can learn more about in the operating systems course.
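As a taste of what such a mechanism looks like, here is a minimal sketch in C, assuming POSIX threads stand in for the two programs (the names incrementer and decrementer are illustrative, not from the example above). A mutex makes each read-modify-write sequence execute as a unit, so the two can no longer interleave:

#include <pthread.h>

int cnt = 9;
pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;

void *incrementer(void *arg) {           /* plays the interrupt handler's role */
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&cnt_lock);   /* no one else may touch cnt ...     */
        cnt++;
        pthread_mutex_unlock(&cnt_lock); /* ... until the update is complete  */
    }
    return NULL;
}

void *decrementer(void *arg) {           /* plays the main program's role */
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&cnt_lock);
        cnt--;
        pthread_mutex_unlock(&cnt_lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, incrementer, NULL);
    decrementer(NULL);                   /* never returns */
}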
As we mentioned, NIOS II does not implement atomic operations. Other architectures do; however, using them typically comes at the expense of performance. In the most straightforward implementation, while an atomic operation executes, everything else in the processor and the devices has to wait for it to complete. This limits concurrency and hence performance.
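For illustration only, here is what the same counter looks like with C11 atomics, a language-level facility that maps to such atomic instructions where the hardware provides them (the NIOS II core discussed above does not); atomic_fetch_add and atomic_fetch_sub each perform the entire read-modify-write indivisibly:

#include <stdatomic.h>

atomic_int cnt = 9;

void on_interrupt(void) {        /* hypothetical handler: atomic cnt++ */
    atomic_fetch_add(&cnt, 1);   /* read, add 1, write back as one indivisible step */
}

void program_loop(void) {        /* atomic cnt--: no torn read-modify-write */
    for (;;)
        atomic_fetch_sub(&cnt, 1);
}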