Lecture 2

Andreas Moshovos

Spring 2008, Spring 2013

Input/Output Devices Continued: The Serial Interface (UART = Universal Asynchronous Receiver Transmitter)

In previous lectures we have seen the Parallel Interface device, which provides a number of external connections that can be accessed in parallel (simultaneously, as a group). The parallel interface device did not require any means of synchronization when reading from or writing to it; that is, we assumed that all transactions completed instantaneously. In this lecture we will discuss another device, the serial interface, where transactions are not instantaneous. Instead, it takes time for the serial device to complete each of our requests. This allows us to discuss a new topic that relates to I/O devices: synchronization. We will first explain the programming interface of the UART device and then discuss how it actually communicates with external devices.

The UART device:

The NIOS II system on the DE2 board, as configured for our course, includes several serial port devices called UARTs. For practical reasons, we will focus on the JTAG UART. This is a UART implemented over the JTAG interface, which in turn is implemented over the USB connection with the host PC. The other UARTs provide a direct physical connection. In this lecture we will ignore the fact that the JTAG UART is implemented over the USB connection and pretend that it has a dedicated physical connector. Each UART contains a receiver and a transmitter. The JTAG UART receiver can be used to receive characters (one byte each) from the PC, while the transmitter can be used to send characters to the PC. In the system you will be using, a terminal emulation program on the PC displays the characters it receives and transmits characters typed on the PC’s keyboard to the DE2. The JTAG UART is a simplified version of other UARTs. Other UART interfaces contain several additional registers, which we will not cover; these control various aspects of the communication, such as its rate, data types, etc.

The JTAG UART interface starts at address 0x10001000 and comprises the following 32-bit memory mapped registers:

1. Receiver Register, RR at distance +0 from the base, i.e., location 0x10001000.

2. Transmitter Register, TR at distance +0 from the base, i.e., location 0x10001000 (not a typo, it *is* the same address as the RR).

3. Control and Status Register, CSR at distance +4 from the base, i.e., at location 0x10001004.

The format of these registers is as follows:

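[Figure: JTAG UART register formats. RR: bits 31-16 = number of data elements available in the receive buffer, bit 15 = data valid, bits 7-0 = data received. TR: bits 7-0 = data to send. CSR: bits 31-16 = slots available in the transmit buffer, plus the WIP, RIP, EWI, and ERI bits (and the AC bit discussed later).]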

We’ll discuss the various fields as needed while explaining how to program the device.

There are two transactions (i.e., operations) that can be asked of the UART: (1) send a character, or (2) receive a character. In a typical setting, the UART is connected via a few wires to an external device that is also capable of sending and receiving characters (such as the PC in the lab). The RR and TR are used for receiving and sending characters, respectively (let us assume that characters are 8 bits for simplicity, but note that other UARTs have a control register that can be used to configure different character lengths). Both map onto the same memory address: reading from this location accesses the RR, while writing to it accesses the TR. To send a character, we write it to the TR; to receive a character that was sent to us, we read the RR. However, communication is a bit more complicated than that. Specifically:

1. Sending a character takes considerable time compared to the CPU’s processing speed (several thousand CPU cycles). Hence, we must be able to wait while the UART is busy sending before we attempt to send another character.

2. We should read a character from the RR only if one has been received from the external source. While a character is waiting to be read, the receiver may receive additional characters. The receiver has a buffer where these characters are kept while they wait to be read by the NIOS II. If the buffer is full when a new character arrives, either an existing one is lost or the new one is dropped. Either way, we must read characters at least as fast as they arrive; otherwise, some of them will be lost.

Sending a character:

Let’s look at sending, i.e., transmitting characters using the JTAG UART. The pseudo algorithm is as follows:

    while (transmitter is busy) wait;
    TR = character we want to transmit;

How do we determine whether the transmitter is busy? That information is contained in the Control and Status Register (CSR). Specifically, when we read this register, bits 16 through 31 contain a number. If that number is non-zero, the transmitter can accept our request. If it is zero, the transmitter is busy and we must wait and try again later. Why is there a number and not just a bit saying busy/idle? The transmitter contains a small FIFO queue for accepting requests, and the number reported in the upper bits of the CSR tells us how many of its slots are currently empty. As we write values to the TR, we fill the FIFO queue; as characters get transmitted, the queue drains. The number of FIFO slots is determined at design time. The FIFO allows us to quickly hand off a number of characters to the transmitter without having to wait after each one before handing off the next.
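To see why the CSR reports a count rather than a single busy bit, here is a small C model of the transmit FIFO. It is only a sketch: the names are hypothetical, and the capacity of 64 slots is an assumption (the real size is fixed when the hardware is designed). Reading the model’s CSR returns the number of free slots in bits 31-16, each write to TR uses up a slot, and each character the transmitter finishes sending frees one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed capacity; the real value is fixed when the hardware is generated. */
#define TX_FIFO_CAPACITY 64u

/* Hypothetical model: only the number of queued characters is tracked. */
typedef struct {
    unsigned used;   /* characters written to TR but not yet sent */
} tx_fifo_model;

/* Value a CSR read would return: free slots in bits 31..16. */
uint32_t model_csr(const tx_fifo_model *f) {
    return (uint32_t)(TX_FIFO_CAPACITY - f->used) << 16;
}

/* A write to TR takes one slot; the model refuses it when no slot is free,
   which is why putchar checks the CSR before writing. */
bool model_write_tr(tx_fifo_model *f) {
    if (f->used == TX_FIFO_CAPACITY)
        return false;
    f->used++;
    return true;
}

/* The transmitter finishes sending the oldest queued character. */
void model_drain_one(tx_fifo_model *f) {
    if (f->used > 0)
        f->used--;
}
```

With an empty FIFO, the program can hand off 64 characters back to back before it ever has to wait; after that, each character the transmitter sends makes room for exactly one more.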

Here’s a subroutine for sending the character in register r4. We are using caller-saved registers to avoid having to save/restore within the subroutine:

.equ JTAG_UART_BASE, 0x10001000
.equ JTAG_UART_RR, 0
.equ JTAG_UART_TR, 0
.equ JTAG_UART_CSR, 4

.text
putchar:
    movia r8, JTAG_UART_BASE
putchar_wait:
    ldwio r2, JTAG_UART_CSR(r8)   # read CSR into r2
    srli  r2, r2, 16              # keep only the upper 16 bits (free transmit slots)
    beq   r2, r0, putchar_wait    # as long as the upper 16 bits are zero, keep trying
    stwio r4, JTAG_UART_TR(r8)    # write the character to TR; it enters the transmit FIFO
    ret
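For comparison, here is a sketch of the same busy-wait in C. The function name jtag_uart_putc and the word-offset macros are hypothetical. The register pointer is passed in so the routine can be exercised against an ordinary array; on the board it would be called with (volatile uint32_t *)0x10001000, and the volatile qualifier forces a fresh load of the CSR on every loop iteration.

```c
#include <stdint.h>

/* Word offsets from the JTAG UART base (0x10001000 on the board). */
#define JTAG_UART_DATA 0   /* RR on reads, TR on writes (byte offset 0) */
#define JTAG_UART_CSR  1   /* control/status register (byte offset 4) */

/* Hypothetical C counterpart of putchar: spin until the CSR reports at
   least one free transmit slot, then write the character to TR. */
void jtag_uart_putc(volatile uint32_t *uart, uint8_t c) {
    while ((uart[JTAG_UART_CSR] >> 16) == 0)
        ;                       /* busy-wait: no free slot yet */
    uart[JTAG_UART_DATA] = c;   /* queue the character in the transmit FIFO */
}
```

One caveat: ldwio and stwio bypass the data cache, whereas plain C loads and stores on a Nios II with a data cache may not, so real code on such a processor needs cache-bypassing accesses. This sketch ignores that detail.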

Receiving a character:

To receive a character we must wait until one is received (in our setup, the PC has to send one to the DE2). How do we know whether a character was received? Well, we have to ask. Both receiving the character and asking whether one is available are done by accessing register RR. The 32 bits of register RR contain all this information. Specifically:

1. Bits 0-7 contain the character, if one was received.

2. Bit 15 is 1 if a character was received. In this case, the character appears in bits 0-7. If this bit is zero, the value in bits 0-7 is meaningless.

3. Bits 31 through 16 contain additional information whether you want it or not. They contain the number of additional characters that have been received and are waiting in the incoming buffer. The UART contains a FIFO where the receiver places characters as it receives them. If this number is non-zero, then more characters are still available in the receive FIFO.
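The three RR fields can be pulled apart with shifts and masks. A minimal C sketch (the helper names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for a 32-bit value read from the JTAG UART's RR. */
static inline uint8_t rr_char(uint32_t rr) {
    return (uint8_t)(rr & 0xFFu);          /* bits 7..0: the character */
}
static inline bool rr_valid(uint32_t rr) {
    return (rr & 0x8000u) != 0;            /* bit 15: character present */
}
static inline uint16_t rr_remaining(uint32_t rr) {
    return (uint16_t)(rr >> 16);           /* bits 31..16: still waiting */
}
```

For example, a read that returns 0x00038041 means a valid character 'A' (0x41) was delivered and 3 more are waiting in the receive FIFO.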

Here’s the subroutine for reading a character from the JTAG UART. The character is returned in r2:

.equ JTAG_UART_BASE, 0x10001000
.equ JTAG_UART_RR, 0
.equ JTAG_UART_TR, 0
.equ JTAG_UART_CSR, 4

.text
getchar:
    movia r8, JTAG_UART_BASE
getchar_wait:
    ldwio r2, JTAG_UART_RR(r8)    # read RR into r2
    andi  r10, r2, 0x8000         # isolate bit 15 in r10; r2 still holds the character, if any
    beq   r10, r0, getchar_wait   # if bit 15 was zero, there was no character: keep trying
    andi  r2, r2, 0xff            # a character was received: keep only bits 7-0 in r2
    ret
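A C sketch of the same loop, under the same kind of assumptions: the name jtag_uart_getc is hypothetical, the register pointer is passed in so the code can run against an ordinary array, and data-cache effects are ignored. Like the assembly, it keeps the value from the read that found bit 15 set instead of reading RR a second time.

```c
#include <stdint.h>

#define JTAG_UART_DATA 0   /* word offset of RR/TR from the base */

/* Hypothetical C counterpart of getchar: spin until bit 15 of RR says a
   character is present, then return bits 7..0 of that same read. */
uint8_t jtag_uart_getc(volatile uint32_t *uart) {
    uint32_t rr;
    do {
        rr = uart[JTAG_UART_DATA];
    } while ((rr & 0x8000u) == 0);   /* no character yet: keep polling */
    return (uint8_t)(rr & 0xFFu);
}
```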

Bits EWI and ERI stand for Enable Write and Enable Read Interrupts. Bits WIP and RIP stand for Write and Read Interrupt Pending respectively. More on these when we cover interrupts.

We don’t need to worry about bit AC for regular communication. The JTAG UART serves other purposes as well: the other party (the host PC in this case) can use it to probe the FPGA hardware on the DE2 board. The AC bit indicates that such an inspection took place. If you are interested in more details, read the Altera manual.

Polling:

In both subroutines we used a busy-wait loop where we continuously probe the RR or the CSR until the receiver or the transmitter becomes available. Such loops are called busy-wait loops because the processor remains busy (i.e., executes instructions that communicate with the device) while doing no productive work; it is simply waiting for the device to become available. This style of communication with devices is called POLLING. In polling we continuously probe the device until it becomes available or completes our request. Polling is simple, but uses processor resources ineffectively. Polling becomes cumbersome at best when more than one device is involved, as we have to write our code so that it continuously probes several devices. In some cases, it is simply inappropriate. For example, consider what would happen if Windows or X Windows used polling for accepting input from the mouse: everything would freeze until you moved the mouse and then freeze again when you stopped moving it.

In real life, polling would be the equivalent of handing a piece of work to someone else and then repeatedly knocking on their door asking: “Are you done yet?”. Besides being completely inefficient, that will quickly help you make a lot of friends. There is an alternative, which requires additional support at the hardware level; we will cover it in a later lecture. It corresponds to the real-life scenario where you hand off a task and then do something else while expecting to be notified when the task is completed.

As a final example, we present the echo routine, which simply sends back every character it receives:

.equ JTAG_UART_BASE, 0x10001000
.equ JTAG_UART_RR, 0
.equ JTAG_UART_TR, 0
.equ JTAG_UART_CSR, 4

.text
echo:
    movia r8, JTAG_UART_BASE
waitr:
    ldwio r2, JTAG_UART_RR(r8)    # read RR into r2
    andi  r9, r2, 0x8000          # isolate bit 15 in r9; r2 still holds the character, if any
    beq   r9, r0, waitr           # if bit 15 was zero, there was no character: keep trying
    andi  r2, r2, 0xff            # a character was received: keep only bits 7-0 in r2
waitt:
    ldwio r9, JTAG_UART_CSR(r8)   # read CSR into r9
    srli  r9, r9, 16              # keep only the upper 16 bits (free transmit slots)
    beq   r9, r0, waitt           # as long as the upper 16 bits are zero, keep trying
    stwio r2, JTAG_UART_TR(r8)    # send the character back: write it to TR (transmit FIFO)
    br    waitr                   # life is interesting, keep doing what you do
    ret                           # never reached; this is for show

The other UART

Besides the JTAG UART, our board has a regular UART as well. It uses the standard physical serial interface (the JTAG UART is encapsulated under the USB physical interface). The regular, or “RS-232”, UART occupies two words starting at address 0x10001010. The name RS-232 refers to the standard that describes the specifications of this interface (physical dimensions, voltage levels, protocol, etc.).

It too has three registers, with the RR and the TR sharing address 0x10001010 and the CSR at 0x10001014. Here’s their format:

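[Figure: RS-232 UART register formats. RR: bits 31-16 = number of data elements available in the receive buffer, low-order bits = data received. TR: data to send. CSR: bits 31-16 = slots available in the transmit buffer, plus the WIP, RIP, EWI, and ERI bits. The figure also shows a PE bit, discussed below.]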

It’s almost identical to the JTAG UART. To read a character, one polls the RR register. Unlike the JTAG UART, there is no valid bit here. The upper 16 bits of the RR register still tell us how many characters are waiting for us in the receive buffer. Polling here is done in two steps. First we use an ldhio to read just the upper 16 bits. If these are non-zero, there is a character that we can read. Then we read the lower 16 bits to get the character (the example below uses an ldwio, which reads the whole word, lower 16 bits included). Reading the lower 16 bits “consumes” the character, and the UART automatically decrements the value in the upper 16 bits. Here’s an example that waits for one character and then returns it in r2:

getchar:
    movia r8, 0x10001010          # r8 now contains the base address
getchar_wait:
    ldhio r2, 2(r8)               # read the upper 16 bits. Recall NIOS II is little-endian, so bits 31-16 are at address (base + 2)
    beq   r2, r0, getchar_wait    # if this is 0, no data is available
    ldwio r2, 0(r8)               # read the data; this decrements the count in the upper 16 bits
    andi  r2, r2, 0xff            # keep just the lower 8 bits and mask out the rest
    ret
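The reason offset 2 holds bits 31-16 is the little-endian byte order. The following C sketch (helper names hypothetical) stores a 32-bit register value into memory least significant byte first, as the Nios II does, and then performs 16-bit loads at offsets 0 and 2. Unlike ldhio, the sketch does not sign-extend.

```c
#include <stdint.h>

/* Lay out a 32-bit value the way a little-endian machine stores it:
   byte 0 holds bits 7..0, byte 3 holds bits 31..24. */
void store_le32(uint8_t mem[4], uint32_t value) {
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

/* 16-bit little-endian load from mem + offset, i.e. what a halfword load
   at (base + offset) returns, here zero-extended. */
uint16_t load_le16(const uint8_t *mem, unsigned offset) {
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}
```

For an RR value of 0x00030041 (three characters waiting, data 0x41), the halfword at offset 2 is 0x0003 and the halfword at offset 0 is 0x0041.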

What is bit PE? It stands for “Parity Error”. Parity is an extra bit that is communicated over the serial connection and is used to guard against possible transmission errors. The device is configured for ODD parity, which means that the parity bit is set so that the total number of 1s in the data bits plus the parity bit is always odd. For example, if we want to send the value 00001111, which has four 1s, we add a parity bit of 1; the total number of 1s is then 5, which is odd, and the value transmitted over the wire is 00001111 1. If the value is 00000111, we add a parity bit of 0 and send 00000111 0; the number of 1s here is 3, which is odd. Parity guards against a single error: if one bit is sensed incorrectly, the count of 1s becomes even and we detect it.
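Computing the odd parity bit, and checking it on the receiving side, takes only a few lines. A C sketch (function names hypothetical):

```c
#include <stdint.h>

/* Odd parity: pick the extra bit so that the 8 data bits plus the parity
   bit contain an odd number of 1s. */
unsigned odd_parity_bit(uint8_t data) {
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1u;
    return (ones % 2 == 0) ? 1u : 0u;   /* even count of 1s: add a 1 */
}

/* Receiver-side check: nonzero when data plus parity bit contain an even
   number of 1s, i.e. an odd number of bits were flipped in transit. */
int odd_parity_error(uint8_t data, unsigned parity_bit) {
    return odd_parity_bit(data) != (parity_bit & 1u);
}
```

For example, odd_parity_bit(0x0F) is 1 and odd_parity_bit(0x07) is 0, matching the examples above. If 00001111 with parity bit 1 arrives as 00001100 with parity bit 1, two bits were flipped and odd_parity_error reports no error: parity detects any odd number of flipped bits, but not an even number.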

How about sending a character? This is identical to the JTAG UART except for the base address:

putchar:
    movia r8, 0x10001010
putchar_wait:
    ldwio r2, 4(r8)               # read the CSR (offset 4) into r2
    srli  r2, r2, 16              # keep only the upper 16 bits (free transmit slots)
    beq   r2, r0, putchar_wait    # as long as the upper 16 bits are zero, keep trying
    stwio r4, 0(r8)               # write the character to TR; it enters the transmit FIFO
    ret

As with the JTAG UART, bits EWI and ERI stand for Enable Write and Enable Read Interrupts, and bits WIP and RIP stand for Write and Read Interrupt Pending, respectively. More on these when we cover interrupts.

How serial, asynchronous communication works:

At the communication link level the serial device uses the following protocol for sending/receiving characters. Each character is represented as a stream of bits. Specifically, each character is represented in the following format:

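[Figure: one character on the line over time: idle, START bit, data bits D0 (lsb) through D7, then STOP/idle. Each bit occupies one bit cell. Vertical axis: voltage level; horizontal axis: time.]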

The actual bit pattern of the character appears in bits D0 through D7, least significant bit first. There is a preamble START bit with the value 0 and a postfix STOP bit with the value 1. Each bit is sent by holding the communication line at the corresponding voltage level for a pre-specified duration; this duration is the bit cell shown in the figure.
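The order in which the bits go out on the line can be made concrete with a short C sketch. The function name is hypothetical, and it assumes 8 data bits and no parity bit, as in the figure.

```c
#include <stdint.h>

#define FRAME_BITS 10   /* START + 8 data bits + STOP, no parity */

/* Fill bits[] with the line levels for one character, in the order they
   are sent: START (0), D0 (lsb) ... D7, STOP (1). */
void build_frame(uint8_t c, uint8_t bits[FRAME_BITS]) {
    bits[0] = 0;                          /* START bit */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (c >> i) & 1u;      /* D0 first: least significant bit */
    bits[9] = 1;                          /* STOP bit, same level as idle */
}
```

For 'A' (0x41 = 01000001) the line carries 0, then 1 0 0 0 0 0 1 0, then 1: START, the data bits least significant first, and STOP.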

BAUD RATE: It’s defined as the number of “bit cells” that fit within 1 second. For example, at 9600 baud the bit time is 1/9600 s ≈ 104.17 microseconds. Thus it takes at least (8 + 1 + 1) x 104.17 microseconds ≈ 1.042 milliseconds to send a full byte (the two +1s are for the start and stop bits). Note that the baud rate differs from the effective bandwidth because of the overhead of the start and stop bits: only 8 of every 10 bit cells carry data, so 9600 baud delivers at most 960 bytes (7680 data bits) per second.
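The same arithmetic in C (helper names hypothetical), assuming frames of 1 start bit, 8 data bits, and 1 stop bit:

```c
/* Timing helpers for frames of 10 bit cells (START + 8 data + STOP). */
double bit_time_us(double baud)     { return 1e6 / baud; }          /* one bit cell */
double byte_time_ms(double baud)    { return 10.0 * 1e3 / baud; }   /* one frame */
double data_bits_per_s(double baud) { return baud * 8.0 / 10.0; }   /* payload only */
```

At 9600 baud these give about 104.17 microseconds per bit, about 1.042 ms per byte, and 7680 data bits (960 bytes) per second.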

Ideally, the transmitter and the receiver would use identical time references (e.g., a clock) for communicating. Identical means both same frequency and same phase (i.e., transitions happen at the same time on both sides). This way they could agree on exactly where each bit starts and thus communicate without any errors. Communication in this case would be very simple: the receiver takes a single sample at the center of each bit cell and thus reconstructs the data byte transmitted.

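IDEAL SCENARIO: TRANSMITTER AND RECEIVER USE EXACTLY THE SAME TIME REFERENCES, HENCE THEY AGREE ON WHERE BIT CELLS START:

[Figure: the idle, START, D0 (lsb), ..., STOP/idle waveform (vertical axis: voltage level), with the receiver taking one sample at the center of each bit cell.]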

However, the receiver and the transmitter do not share a common time reference. Instead, each uses its own local time reference. This is highly practical, since sharing a time reference would require additional wires and would be very hard to do anyway over potentially long wires. However, it introduces two difficulties:

1. The frequencies of the two time references may differ.

2. The phases (i.e., the points in time where the transitions from 0 to 1 happen) of the two time references will most likely be different.

REALISTIC SCENARIO: THE TRANSMITTER AND RECEIVER USE THEIR OWN TIME REFERENCES. THERE IS A DIFFERENCE IN FREQUENCY AND IN PHASE:

[Figure: two copies of the idle, START, D0 (lsb), ..., STOP/idle waveform (vertical axis: voltage level, horizontal axis: time). Top: what the transmitter uses. Bottom: what the receiver thinks/uses. The phase difference and the frequency differences between the two are marked.]

To compensate for the two problems we use the start and stop bits and the receiver uses over-sampling. These are explained in what follows:

In the previous figure the differences in frequency are exaggerated. In practice, by the end of a character there should be a difference of no more than 20% of a bit cell between where the stop bit is and where the receiver thinks it is. Here’s why: to compensate for the first difficulty, the RS-232C standard (the one used for the common serial port) requires that the baud rates used by the two communicating devices differ by no more than 2%. Even so, since we need at least 10 bits to transmit a byte, and even if we assume that the two time references are initially phase synchronized (i.e., both devices agree on where the start bit starts), by the end there may be a difference of up to 10 x 0.02 = 20% of a bit cell in where they think the center of the stop bit is.
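The 10 x 0.02 estimate can be refined slightly. Assuming both sides agree on where the START bit begins and the receiver’s bit cell is 2% longer than the transmitter’s, the receiver’s idea of the center of bit k (k = 0 for START through k = 9 for STOP) is off by (k + 0.5) x 0.02 bit cells. A C sketch of that calculation (function name hypothetical):

```c
/* Offset, in transmitter bit cells, between where the receiver samples
   bit k (k = 0 START, 1..8 data, 9 STOP) and the true center of that cell,
   when the receiver's bit cell is (1 + mismatch) times the transmitter's
   and both sides agree on where START begins. */
double sample_offset(int k, double mismatch) {
    double true_center = k + 0.5;
    double receiver_center = (k + 0.5) * (1.0 + mismatch);
    return receiver_center - true_center;   /* equals (k + 0.5) * mismatch */
}
```

For the STOP bit this gives 0.19 of a bit cell, just under the 20% bound, and well inside the half cell the sample can drift before it lands in a neighboring bit.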

To compensate for the second problem (phase difference), the serial interface uses the START and STOP bits. Note that the STOP and START bits have different logical values. This way, whenever a new character is transmitted, the line makes a transition from 1 (STOP or idle) to 0 (START), and it returns to 1 at the STOP bit at the end of the character. Thus, the START and STOP bits serve as a means of initial synchronization. The receiver waits until it detects a 1 to 0 transition and interprets this as the beginning of the START bit. Then it uses over-sampling to deal with differences in bit time and phase.

The receiver regenerates the transmitted value by over-sampling its input. That is, rather than taking a single sample per bit time, it takes several and uses them to detect the 1 to 0 transition from the stop/idle level to the start bit. Once this transition is located, the receiver can take a single sample per bit, carefully timed so that it falls at the center of each bit cell.

For example, if the receiver takes 16 samples per bit cell, then it should be able to detect the stop-to-start transition within 1/16 of the bit cell time in the worst case, assuming identical time reference frequencies, or within 1/16 x 1.02 of the transmitter’s bit cell time, assuming that the receiver’s time reference is 2% slower than the transmitter’s. Once the beginning of the start bit is detected, the receiver takes samples at what it thinks is the center of each bit cell. The first sample should be taken 24 sampling-clock cycles after the edge (at 16x over-sampling, 16 samples take us past the start bit and another 8 reach the middle of D0). The second sample should be taken after 24 + 16 cycles, and in general the sample for bit Di (i = 0, ..., 7) is taken at 24 + i x 16 cycles.
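The procedure can be sketched in C as a receiver that works on a recorded waveform rather than a live pin (the function name and the waveform representation are hypothetical). It finds the 1-to-0 edge at the start of the START bit, samples each data bit Di at 24 + 16i samples after the edge, and, as an extra check, samples the center of the STOP bit and rejects the frame if it is not 1. Like the description above, it takes exactly one sample per bit.

```c
#include <stdint.h>

#define OVERSAMPLE 16   /* samples per bit cell */

/* line[0..n-1]: recorded line levels, OVERSAMPLE samples per bit cell
   (1 = idle/STOP level, 0 = START level). Returns the decoded byte, or -1
   if no START edge is found, the frame is cut off, or STOP reads as 0. */
int uart_receive(const uint8_t *line, int n) {
    int edge = -1;
    for (int t = 1; t < n; t++) {           /* look for the 1 -> 0 transition */
        if (line[t - 1] == 1 && line[t] == 0) {
            edge = t;
            break;
        }
    }
    if (edge < 0)
        return -1;

    int byte = 0;
    for (int i = 0; i < 8; i++) {
        /* center of Di: 16 samples past START, 8 more to mid-cell, 16 per bit */
        int t = edge + 24 + i * OVERSAMPLE;
        if (t >= n)
            return -1;
        byte |= (line[t] & 1) << i;         /* D0 is the least significant bit */
    }
    int stop = edge + 24 + 8 * OVERSAMPLE;  /* center of the STOP bit */
    if (stop >= n || line[stop] != 1)
        return -1;                          /* framing error */
    return byte;
}
```

A frame recorded at 16 samples per bit decodes back to the original byte; if the sample at the center of the STOP bit is corrupted to 0, the function returns -1 instead.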

Even with these measures in place, communication errors can occur. These are referred to as FRAME (framing) errors; a typical symptom is that the receiver finds a 0 where it expects the STOP bit. To further reduce the possibility of undetected errors, serial communication often uses an additional parity bit, which can be used to detect single-bit errors.