Notes on Caches

Andreas Moshovos

Spring 2007

 

Please refer to the book for a complete treatment of caches. Here we review a few examples that explain some of the underlying concepts.

 

Cache Indexing

 

A cache is organized as a two-dimensional array of blocks. The rows are called sets, and the columns are called ways. At the one extreme there is only one set. This is a fully-associative cache. In this case the number of ways equals the number of blocks. At the other extreme there is a single column. This is a direct-mapped cache. In this case, the number of sets equals the number of blocks. Any configuration in between is called an N-way set-associative cache where N is the number of ways. Notice that the number of sets is not specified. We can derived it given the total cache capacity.

 

Given an address from the CPU we need to index the cache. By this we mean, selecting the set in which the address may be cached. The common indexing scheme partitions the incoming address into three fields. Starting from the MSB these are the TAG, SET and OFFSET:

 

TAG

SET

OFFSET

 

<- --------------------------------  Address bits ------------------------------ ->

 

For a fully-associative cache the set field does not exist. This is because there is only one set. For the direct-mapped cache, if the set width is S bits it holds that 2^S = #Blocks.

 

Let’s go over a few examples.

 

Example #1

 

1MB cache, with 64B blocks, 4-way Set-Associative, write-through with LRU replacement. Assume a 36-bit byte-addressable address space.

 

In this case, there are 2^36 different byte addresses.

 

It’s good practice to try to write the parameters as a power of two where possible. So we have Capacity = 1MB = 2^20 bytes, Block Size = 64B = 2^6, associativity = 4 = 2^2.

Notice that the capacity refers to the total number of data bytes that the cache can store. It does not include the overhead bits required by the tags, valid bits and LRU bits.

 

From the size of the blocks we can immediately derive the width of the offset field. It holds that offset_width = lg(Block Size), where lg is the logarithm base two. So, in this case OFFw = 6 bits.

 

To find the width of the SET field we need to determine the number of rows. To do that we need to determine the number of blocks. We are given that this is a 4-way set-associative cache, hence each set has four blocks.

 

It holds:

 

#Blocks = Capacity / BlockSize = 2^20 / 2^6 = 2^14. There are 2^14 blocks.

 

#Sets = #Blocks / #ways = 2^14 / 2^2 = 2^12. There are 2^12 sets, hence the SET field is 12 bits wide.

 

So the indexing is done as follows:

 

<- --- 36 – 12 – 6 --- ->

<- ------- 12 -------- ->

<- -------- 6 -------- ->

TAG

SET

OFFSET

 

To access the cache we first use the 12 SET bits to select one of the 2^12 sets. Then we compare the TAG field with the TAG fields of the four blocks that belong to the set. If a match is found, we finally use the OFFSET field to select the appropriate byte out of the 64 stored in the block.

 

Example #2

 

36KB, 9-way set-associative cache, with 8-byte blocks, write-back with LRU replacement and a 2^24 address space (byte addressable).

 

We immediately know that OFF width = 3 bits (from the block size).

 

#blocks = Capacity / block size = 9 * 4K / 2^3 = 9 * 2^12 / 2^3 = 9 x 2^9.

 

#sets = #blocks / #ways = 9 x 2^9 / 9 = 2^9.

 

So the SET field is 9 bits wide.

 

It follows that that TAG field is 24 – 9 – 3 bits wide.

 

Example #3

 

If we have 7KB of data cache and assuming 128B blocks what can the associativity be?

 

Let’s first see how many blocks are there:

 

#blocks = Capacity / block size = 7 x 2^10 / 2^7 = 7 x 2^3.

 

We can organize the blocks into one set. This would be a Fully-Associative cache.

 

Can we organize the blocks into two sets? Each set would have 7 x 2^3/2 = 7x2^2 blocks. Yes we can. This is a 28-way set-associative cache.

 

Can we organize the blocks into four sets? Notice that the number of sets *has* to be a power of two since we assume that we use a portion of the address to index the sets directly (note that there are advanced designs that use different functions but we will not discuss those in this course). So, if we had four sets, each set would have 7x2^3/2^2 = 7x2 blocks. Yes, we can. This is a 14-way set-associative cache.

 

How about 8 sets? Each set would have 7 blocks. This is a 7-way set-associative cache.

 

How about 16 sets? Each set would have 4.5 blocks. This is not possible.

 

How about 32? Each set would have have 1 3/4 blocks. This is not possible.

 

How about 64? No, there are only 56 blocks.

 

Storage Requirements

 

How many storage bits are required to implement a 256KB cache, with 16B blocks, that is 4-way set-associative, uses write-back, LRU replacement and assuming a 2^36 byte-addressable address space?

 

Bits are required for:

 

1.    The data

2.    The tags

3.    The valid bits

4.    The dirty bits

5.    The LRU bits

 

Data bits = 256K x 8 = 2^18 x 2^3 = 2^21 bits = 2Mbits

 

To calculate the rest we need to figure out how the cache is indexed.

 

#blocks = 2^18 / 2^4 = 2^14

#sets = 2^14 / 2^2 = 2^12

 

So, the TAG field is 36-12-4 = 18 bits wide.

 

Tag bits = #blocks x TAG field width = 2^14 * 18 bits

 

There is one valid bit and one dirty bit per blocks

 

Valid Bits = 2^14 and Dirty Bits = 2^14

 

The LRU bits are associated with the set. They tell us in which order the blocks within the set were accessed. As we explained we need at least lg(ways!) bits to encode this information. So:

 

LRUbits = 2^12 x lg(4!)

 

Determining Hits and Misses

 

Let us assume that a program accesses the following addresses:

 

$200, $204, $208, $20C, $2F4, $2F0, $200, $204, $218, $21C, $24C, $2F4, repeated two times

 

Given a cache with 8 four byte blocks determine how many hits and misses will occur.

 

#1 Direct-Mapped Cache

 

Since the block size is four bytes the lower two bits are the offset within the block. These are show in italics.

Since there are 8 blocks, and this is a direct-mapped cache, there are 8 sets. Hence, the next 3 bits are the set. These are underlined.

The rest are the TAG show in normal font.

 

ADDRESS IN HEX

ADDRESS IN BINARY

HIT/MISS

Action

$200

0010 0000 0000

Miss

Bring Block 0010 0000 00XX from memory into set 000

$204

0010 0000 0100

Miss

Bring block 0010 0000 01XX from memory into set 001

$208

0010 0000 1000

Miss

Bring block 0010 0000 10XX from memory into set 010

$20C

0010 0000 1100

Miss

Bring block 0010 0000 11XX from memory into set 011

$2F4

0010 1111 0100

Miss

Bring block 0010 1111 01XX from memory into set 101

$2F0

0010 1111 0000

Miss

Bring block 0010 1111 00XX from memory into set 100

$200

0010 0000 0000

Hit

Set 000

$204

0010 0000 0100

Hit

Set 001

$218

0010 0001 1000

Miss

Bring block 0010 0001 10XX from memory into set 110

$21C

0010 0001 1100

Miss

Bring block 0010 0001 11XX from memory into set 111

$24C

0010 0100 1100

Miss

Bring block 0010 0100 11XX from memory into set 011

Replace block 0010 0000 11XX

$2F4

0010 1111 0100

Hit

Set 101

$200

0010 0000 0000

Hit

Set 000

$204

0010 0000 0100

Hit

Set 001

$208

0010 0000 1000

Hit

Set 010

$20C

0010 0000 1100

Miss

Bring block 0010 0000 11XX from memory into set 011

Replace block 0010 0100 11XX

$2F4

0010 1111 0100

Hit

Set 101

$2F0

0010 1111 0000

Hit

Set 100

$200

0010 0000 0000

Hit

Set 000

$204

0010 0000 0100

Hit

Set 001

$218

0010 0001 1000

Hit

Set 110

$21C

0010 0001 1100

Hit

Set 111

$24C

0010 0100 1100

Miss

Bring block 0010 0100 11XX from memory into set 011

Replace block 0010 0000 11XX

$2F4

0010 1111 0100

Hit

Set 101

 

#2 Fully-Associative Cache

In this case there is only one set with 8 blocks.

 

ADDRESS IN HEX

ADDRESS IN BINARY

LRU Order

HIT/MISS

Action

$200

0010 0000 0000

$200

Miss

Bring Block 0010 0000 00XX from memory

$204

0010 0000 0100

$204, $200

Miss

Bring block 0010 0000 01XX from memory

$208

0010 0000 1000

$208, $204, $200

Miss

Bring block 0010 0000 10XX from memory

$20C

0010 0000 1100

$20C, $208, $204, $200

Miss

Bring block 0010 0000 11XX from memory

$2F4

0010 1111 0100

$2F4, $20C, $208, $204, $200

Miss

Bring block 0010 1111 01XX from memory

$2F0

0010 1111 0000

$2F0, $2F4, $20C, $208, $204, $200

Miss

Bring block 0010 1111 00XX from memory

$200

0010 0000 0000

$200, $2F0, $2F4, $20C, $208, $204

Hit

 

$204

0010 0000 0100

$204, $200, $2F0, $2F4, $20C, $208

Hit

 

$218

0010 0001 1000

$218, $204, $200, $2F0, $2F4, $20C, $208

Miss

Bring block 0010 0001 10XX from memory

$21C

0010 0001 1100

$21C, $218, $204, $200, $2F0, $2F4, $20C, $208

Miss

Bring block 0010 0001 11XX from memory

$24C

0010 0100 1100

$24C, $21C, $218, $204, $200, $2F0, $2F4, $20C

Miss

Bring block 0010 0100 11XX from memory

Replace block containing $208

$2F4

0010 1111 0100

$2F4, $24C, $21C, $218, $204, $200, $2F0, $20C

Hit

 

$200

0010 0000 0000

$200, $2F4, $24C, $21C, $218, $204, $2F0, $20C

Hit

 

$204

0010 0000 0100

$204, $200, $2F4, $24C, $21C, $218, $2F0, $20C

Hit

 

$208

0010 0000 1000

$208, $204, $200, $2F4, $24C, $21C, $218, $2F0

Miss

Bring block 0010 0000 10XX from memory

Replace block containing $20C

$20C

0010 0000 1100

$20C, $208, $204, $200, $2F4, $24C, $21C, $218

Miss

Bring block 0010 0000 11XX from memory

Replace block containing $2F0

$2F4

0010 1111 0100

$2F4, $20C, $208, $204, $200, $24C, $21C, $218

Hit

 

$2F0

0010 1111 0000

$2F0, $2F4, $20C, $208, $204, $200, $24C, $21C

Miss

Bring block 0010 1111 01XX from memory

Replace block containing $218

$200

0010 0000 0000

$200, $2F0, $2F4, $20C, $208, $204, $24C, $21C

Hit

 

$204

0010 0000 0100

$204, $200, $2F0, $2F4, $20C, $208, $24C, $21C

Hit

 

$218

0010 0001 1000

$218, $204, $200, $2F0, $2F4, $20C, $208, $24C

Miss

Bring block 0010 0001 10XX from memory

Replace block containing $21C

$21C

0010 0001 1100

$21C, $218, $204, $200, $2F0, $2F4, $20C, $208

Miss

Bring block 0010 0001 11XX from memory

Replace block containing $24C

$24C

0010 0100 1100

$24C, $21C, $218, $204, $200, $2F0, $2F4, $20C

Miss

Bring block 0010 0100 11XX from memory

Replace block containing $208

$2F4

0010 1111 0100

$2F4, $24C, $21C, $218, $204, $200, $2F0, $2F4

Miss

Bring block 0010 0100 11XX from memory

Replace block contaning $20C