1. Introduction

This document describes the design of a synchronous write-through cache memory that implements the least recently used (LRU) algorithm for efficient data access. In the LRU algorithm, the content of the least recently used cell is replaced whenever a new entry is written into the cache, while the content of the most recently used cell remains untouched. The cache chip consists of an address path and a data path. The address path is responsible for mapping the global memory address (typically of external DRAM) to the physical location of the data in the cache. The data path returns the content of that physical location to the requesting device (typically a processor or controller). The address path is implemented as a CAM (Content Addressable Memory) block, which is able to search its content in a single clock cycle. The data path is implemented as an SRAM (Static Random Access Memory) block. The overall architecture of the cache chip integrates the two blocks, which reduces hardware and improves the performance of the cache chip.

2. System Functionality

Figure 2.1 illustrates the overall block diagram of the cache chip. The processor is assumed to have an address space of 16 bits. The address bits (15:0) are latched with Flip Flops before they are sent to the CAM and Decoder units. Of the address bits, the 14 most significant bits are sent to the CAM unit and the remaining 2 bits are decoded into 4 select lines that choose the desired SRAM block through the SRAM Column MUX/DEMUX block. For instance, suppose the data for address “AB72” (hexadecimal) is to be stored in the Cache. The 14 most significant bits are separated, i.e. “AB70” with the two least significant bits dropped, and the remaining 2 bits “10” (binary) are decoded. However, the data at “AB72” is not brought in alone; 4 words are brought in at a time. In this case, the data from “AB70”, “AB71”, “AB72” and “AB73” will all be brought into the Cache. Thus one row of the CAM (out of 256 rows) points to 4 blocks of SRAM holding consecutive words, and the 2 least significant bits of the address select among them. The Column MUX/DEMUX block is in charge of accessing the data blocks of the SRAM. Since the DATA port of the Cache is bidirectional, tri-state buffers are used to interface the Cache to the processor. To reduce hardware, the SENSE AMPS for the SRAM are placed after the COLUMN MUX/DEMUX. For read operations, the CAM (which contains the 14 most significant bits of the address) selects the appropriate SRAM row, and the 2 least significant bits choose the appropriate word, which is returned to the processor. Therefore the total CAM & SRAM sizes are:

· CAM: 256 Rows X 14 Bits

· SRAM: 256 Rows X 4 Blocks X 32 Bits
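
The address split and the sizes above can be checked with a short Python sketch. This is only an illustration of the arithmetic described in this section, not part of the chip design; the function names `split_address` and `block_addresses` are hypothetical, and the tag is shown as a right-shifted 14-bit value, which is one possible way of writing “the 14 most significant bits”.

```python
ADDR_BITS = 16      # processor address width (from the text)
OFFSET_BITS = 2     # 2 least significant bits select one of 4 words
TAG_BITS = ADDR_BITS - OFFSET_BITS  # 14 bits stored in the CAM
ROWS = 256
WORDS_PER_ROW = 1 << OFFSET_BITS    # 4 SRAM blocks per row
WORD_BITS = 32

CAM_BITS = ROWS * TAG_BITS                      # 256 x 14
SRAM_BITS = ROWS * WORDS_PER_ROW * WORD_BITS    # 256 x 4 x 32


def split_address(addr):
    """Return (tag, word offset, one-hot block select) for a 16-bit address."""
    tag = addr >> OFFSET_BITS                 # 14 most significant bits
    offset = addr & (WORDS_PER_ROW - 1)       # 2 least significant bits
    select = 1 << offset                      # 2-to-4 decode, one line active
    return tag, offset, select


def block_addresses(addr):
    """Return the 4 consecutive word addresses fetched together with addr."""
    base = addr & ~(WORDS_PER_ROW - 1) & 0xFFFF
    return [base + i for i in range(WORDS_PER_ROW)]


tag, offset, select = split_address(0xAB72)
print(f"tag={tag:#06x} offset={offset:02b} select={select:04b}")
print([f"{a:04X}" for a in block_addresses(0xAB72)])
print(CAM_BITS, "CAM bits,", SRAM_BITS // 8, "SRAM bytes")
```

For address “AB72” the offset is “10” and the four words fetched are “AB70” through “AB73”, matching the example in the text.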

Figure 2.1: System Block Diagram

When an element in the Cache must be replaced, an LRU algorithm is employed. A custom LRU CAM block has been designed whose CAM elements are made of T-Flip Flops connected in the form of a saturating 5-Bit counter. This ensures that no data in the Cache which has been accessed within the last 32 clock cycles is replaced in preference to data which has not been accessed for longer. This LRU CAM block also has 256 rows, and keeps track of accesses made to the Cache. When an element in the Cache needs to be replaced, the LRU CAM is searched with bit patterns starting from “11111”. If a row contains this value, that row has not been accessed by the processor for at least 32 cycles and can be replaced. If no hits are detected, the next search pattern is “1111X”, which matches any row that has not been accessed for at least 30 cycles. This continues, with one more don't-care bit each time, until a hit is found, and that row is replaced. It is possible that multiple rows have not been accessed for the same number of cycles, in which case multiple hits come from the CAM although only one is needed. To resolve this, an LRU DECISION block has been designed. This block is a small state machine which scans the HIT LINES of the LRU CAM and selects the first one that it encounters. Although this is a multi-clock operation, it only takes place when main memory must be accessed, which is itself a multi-clock-cycle operation. Thus, the speed of the Cache is not compromised.
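
The search described above can be sketched in Python as follows. This is a behavioral illustration only, under stated assumptions: every counter counts up once per `tick` and saturates at 31, the pattern sequence is continued down to “XXXXX” so that a hit is always found, and the LRU DECISION block is modeled as picking the lowest-numbered hit line. The names `tick`, `search_patterns`, `matches` and `select_victim` are hypothetical.

```python
COUNTER_BITS = 5
COUNTER_MAX = (1 << COUNTER_BITS) - 1  # 31, the saturation value "11111"


def tick(counters, accessed_row=None):
    """One cycle: the accessed row resets to 0, all others count up and saturate."""
    return [0 if row == accessed_row else min(count + 1, COUNTER_MAX)
            for row, count in enumerate(counters)]


def search_patterns():
    """Patterns in search order: "11111", "1111X", ..., "XXXXX" (X = don't care)."""
    return ["1" * (COUNTER_BITS - k) + "X" * k for k in range(COUNTER_BITS + 1)]


def matches(value, pattern):
    """Ternary CAM compare of a counter value against one search pattern."""
    bits = format(value, f"0{COUNTER_BITS}b")
    return all(p == "X" or p == b for p, b in zip(pattern, bits))


def select_victim(counters):
    """Return (row, pattern) for the row chosen for replacement."""
    for pattern in search_patterns():
        hits = [row for row, count in enumerate(counters) if matches(count, pattern)]
        if hits:
            return hits[0], pattern  # LRU DECISION: first hit line encountered
    raise ValueError("no rows to search")


print(select_victim([3, 30, 12, 31, 31]))  # two rows hold "11111"; the first is taken
print(select_victim([3, 30, 12, 29]))      # no "11111"; "1111X" matches the row holding 30
```

Because each relaxed pattern matches a range of counter values, the row selected is not always the strictly oldest one; it is the first row found within the oldest matching range.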

The overall behavior of the Cache chip during READ/WRITE cycles is as follows:

READ – HIT

1. The CAM is searched for the address given by the processor, and the address is found in one of the rows.

2. The 2 least significant bits of the address are used to select the appropriate SRAM block.

3. The data is read from SRAM and sent back to the processor.

4. The corresponding LRU CAM ROW COUNTER is reset to “00000”.

READ – MISS

1. The CAM is searched for the address given by the processor; the address is not found.

2. The processor is notified and the LRU CAM is searched with patterns to find the LRU element.

3. Meanwhile, the 4 words of data corresponding to the 14 most significant bits of the address are brought from main memory.

4. The data from the main memory is also sent to the processor.

5. The LRU DECISION block selects the LRU CAM row; the new address is written into that row, and the 4 new words are stored in the same row of the SRAM.

6. The corresponding LRU CAM ROW COUNTER is reset to “00000”.

WRITE – HIT

1. The CAM is searched for the address given by the processor, and the address is found in one of the rows.

2. The 2 least significant bits of the address are used to select the appropriate SRAM block.

3. The new data coming from the processor is written into the SRAM.

4. Meanwhile, the data in main memory is also updated. This is necessary because the Cache is a write-through Cache.

5. The corresponding LRU CAM ROW COUNTER is reset to “00000”.

WRITE – MISS

1. The CAM is searched for the address given by the processor; the address is not found.

2. The processor is notified and the LRU CAM is searched with patterns to find the LRU element.

3. Meanwhile, main memory is updated with the new data, and the 4 words corresponding to the 14 most significant bits of the address are brought from main memory.

4. The LRU DECISION block selects the LRU CAM row; the new address is written into that row, and the 4 new words are stored in the same row of the SRAM.

5. The corresponding LRU CAM ROW COUNTER is reset to “00000”.
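
The four cases above can be summarized in a small behavioral model. The Python sketch below is not the chip implementation; it only mirrors the listed steps under several assumptions: main memory is a plain dictionary, each access counts as one clock tick for the LRU counters, all counters start saturated so that empty rows are filled first, and the victim search is the pattern search with the first hit selected. The class and method names are hypothetical.

```python
class WriteThroughCache:
    """Behavioral model of the READ/WRITE hit and miss sequences."""

    AGE_MAX = 31  # 5-bit saturating LRU counter

    def __init__(self, main_memory, rows=256):
        self.mem = main_memory                      # stand-in for main memory
        self.tags = [None] * rows                   # address CAM (14-bit tags)
        self.data = [[0] * 4 for _ in range(rows)]  # SRAM: 4 words per row
        self.age = [self.AGE_MAX] * rows            # LRU CAM row counters

    def _find(self, addr):
        """Search the CAM; return the matching row or None on a miss."""
        tag = addr >> 2
        return self.tags.index(tag) if tag in self.tags else None

    def _victim(self):
        """Pattern search "11111", "1111X", ... taking the first hit line."""
        for k in range(6):
            threshold = 32 - (1 << k)  # smallest value matched with k don't-cares
            for row, age in enumerate(self.age):
                if age >= threshold:
                    return row

    def _fill(self, addr):
        """Replace the victim row with the 4-word block containing addr."""
        row = self._victim()
        base = addr & ~0b11
        self.tags[row] = addr >> 2
        self.data[row] = [self.mem.get(base + i, 0) for i in range(4)]
        return row

    def _touch(self, row):
        """Reset the accessed row's counter; every other counter counts up."""
        self.age = [0 if r == row else min(a + 1, self.AGE_MAX)
                    for r, a in enumerate(self.age)]

    def read(self, addr):
        """Return (word, hit)."""
        row = self._find(addr)
        hit = row is not None
        if not hit:
            row = self._fill(addr)        # READ MISS: fetch block, replace LRU row
        self._touch(row)
        return self.data[row][addr & 0b11], hit

    def write(self, addr, value):
        """Write one word; return True on a hit."""
        self.mem[addr] = value            # write-through: main memory always updated
        row = self._find(addr)
        hit = row is not None
        if hit:
            self.data[row][addr & 0b11] = value
        else:
            row = self._fill(addr)        # WRITE MISS: fetch the updated block
        self._touch(row)
        return hit


memory = {0xAB70: 10, 0xAB71: 11, 0xAB72: 12, 0xAB73: 13}
cache = WriteThroughCache(memory, rows=4)
print(cache.read(0xAB72))       # miss: the whole block AB70..AB73 is loaded
print(cache.read(0xAB73))       # hit: same row, different word
print(cache.write(0xAB71, 99))  # hit: SRAM and main memory both updated
```

The model treats a write miss as allocating a row, following step 4 of the WRITE MISS sequence.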

In terms of division of tasks, here is an estimate:

· LRU Block Circuitry: Shahriar

· CAM Block Circuitry: Oleksiy

· SRAM Column Circuitry: Jen

· Peripheral Circuitry for LRU, CAM, SRAM: Cintia

· System Integration: All team members