In a cache and memory system with the following characteristics:
Direct Mapped
16 KByte
32 Byte Cache Lines (Blocks)
32 bit addresses, byte addressable
How many bits are the Tag, Index and Offset fields?
Offset field: 5 bits
Index field: 9 bits
Tag field: 18 bits
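The field widths come from two logarithms. The offset selects a byte within a line, the index selects one of the cache's lines, and the tag is whatever remains. Below is a minimal Python sketch that assumes power-of-two sizes. The helper name `field_widths` is hypothetical.

```python
def field_widths(cache_bytes: int, line_bytes: int, addr_bits: int):
    """Return (tag, index, offset) widths for a direct-mapped, byte-addressable cache.

    Assumes cache_bytes and line_bytes are powers of two.
    """
    offset = line_bytes.bit_length() - 1    # log2(32) = 5
    lines = cache_bytes // line_bytes       # 16384 / 32 = 512 lines
    index = lines.bit_length() - 1          # log2(512) = 9
    tag = addr_bits - index - offset        # remaining high-order bits
    return tag, index, offset


print(field_widths(16 * 1024, 32, 32))  # (18, 9, 5)
```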
A memory system has the following performance characteristics:
Cache Tag Check time: 1ns
Cache Read Time: 1ns
Cache Line Size: 64 bytes
Memory Access time (Time to start memory operation): 10 ns
Memory Transfer time: 1ns/memory word
Memory Word: 16 bytes
The processor is pipelined and has a clock frequency of 500 MHz. The memory operation occupies one pipeline stage; if it takes longer, the pipeline stalls.
How much time in nanoseconds does each pipeline stage take?
2 ns (one clock period: 1 / 500 MHz = 2 ns)
If a memory operation hits in the cache, does it stall the pipeline?
No. A hit takes 1 ns (tag check) + 1 ns (cache read) = 2 ns, which fits in one 2 ns stage.
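Both timing answers can be checked with a few lines of arithmetic. This is a sketch only. Like the answer above, it assumes the 1 ns tag check and the 1 ns cache read happen back to back. The constant names are illustrative.

```python
CLOCK_HZ = 500e6        # 500 MHz processor clock
TAG_CHECK_NS = 1
CACHE_READ_NS = 1

stage_ns = 1e9 / CLOCK_HZ                # one pipeline stage = one clock period
hit_ns = TAG_CHECK_NS + CACHE_READ_NS    # assumed: tag check, then read
stalls_on_hit = hit_ns > stage_ns        # stall only if the hit overruns the stage

print(stage_ns, hit_ns, stalls_on_hit)  # 2.0 2 False
```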
How long does it take to service a memory operation if it misses in the cache?
15 ns
Tag check time + memory access (start) time + transfer time for the memory words in one line = 1 + 10 + (64/16)*1 = 15 ns
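Here is the same miss-time formula as a short sketch. It follows the answer's accounting: one tag check, the memory start-up time, and one transfer per 16-byte memory word. The constant names are illustrative.

```python
TAG_CHECK_NS = 1
MEM_ACCESS_NS = 10           # time to start the memory operation
TRANSFER_NS_PER_WORD = 1
LINE_BYTES = 64
MEM_WORD_BYTES = 16

words_per_line = LINE_BYTES // MEM_WORD_BYTES   # 4 memory words fill one line
miss_ns = TAG_CHECK_NS + MEM_ACCESS_NS + words_per_line * TRANSFER_NS_PER_WORD

print(words_per_line, miss_ns)  # 4 15
```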
What is the memory stall time for a program that performs 100 memory reads, of which 90 hit in the cache and 10 miss? Memory stall time is defined as the amount of time the pipeline spends stalled due to memory operations.
140 ns
90 hits -> 0 stall; 10 misses -> (15+1-2)*10 = 140 ns stall time
Note: The +1 rounds the 15 ns miss time up to 16 ns, because stall time must be a whole number of 2 ns clock cycles. The -2 is because the first 2 ns are part of the normal (unstalled) cycle.
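The rounding in the note can be made explicit. The miss time is rounded up to whole clock cycles. One cycle is then subtracted, because the operation would occupy that stage anyway. The 90 hits contribute nothing, so only the miss count is passed in. This is a sketch, and the function name is hypothetical.

```python
import math


def stall_ns(misses: int, miss_ns: int, cycle_ns: int) -> int:
    """Pipeline stall time when each miss takes miss_ns and hits never stall."""
    cycles = math.ceil(miss_ns / cycle_ns)           # 15 ns -> 8 whole 2 ns cycles (16 ns)
    stall_per_miss = cycles * cycle_ns - cycle_ns    # first cycle is the normal stage
    return misses * stall_per_miss


print(stall_ns(10, 15, 2))  # 140
```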
Is memory word order (big/little endianness) defined by the Instruction Set Architecture or the Processor Architecture?
Instruction Set Architecture
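Byte order only matters once a multi-byte word is laid out in memory. The short Python illustration below uses the built-in `int.to_bytes` with an arbitrary 32-bit value to show both orders.

```python
word = 0x12345678

big = word.to_bytes(4, "big")        # most significant byte at the lowest address
little = word.to_bytes(4, "little")  # least significant byte at the lowest address

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
```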
Which type of memory is faster to access?
SRAM
Which type of memory requires refresh?
DRAM
Which type of memory requires more power?
SRAM
How many transistors and capacitors does a DRAM cell need? SRAM cell?
DRAM: 1 Capacitor and 1 Transistor
SRAM: 0 Capacitors and 6 Transistors
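Given the following cache configuration and contents (2-way set associative: 4 sets of 2 blocks, 4 bytes per block). Each operation below starts from this initial state.
TAG (bits) | SET (bits) | OFFSET (bits) |
---|---|---|
6 | 2 | 2 |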
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
0 | 0 | 010110 | A5 06 72 B4 | 0 | 0 |
0 | 1 | 010010 | A5 06 72 B4 | 0 | 0 |
1 | 0 | 010110 | 3A 59 BC 94 | 0 | 1 |
1 | 1 | 010111 | 3A 59 BC 94 | 1 | 1 |
2 | 0 | 111111 | FF FF FF FF | 1 | 0 |
2 | 1 | 110011 | 19 FD E4 FF | 1 | 0 |
3 | 0 | 100100 | A4 4A 56 65 | 1 | 1 |
3 | 1 | 000011 | A4 4A 56 65 | 0 | 1 |
Operation: CPU reads from $360
Hit (H) or Miss (M): M
Fetch from memory? Yes, from $360 to $363
Result:
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
0 | 0 | 110110 | MM MM MM MM | 1 | 0 |
Operation: CPU writes $FF to $24E
Hit (H) or Miss (M): H
Fetch from memory? No
Result:
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
3 | 0 | 100100 | A4 FF 56 65 | 1 | 1 |
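The hit/miss decision comes from splitting the 10-bit address into the 6/2/2 fields. Below is a minimal sketch of the $24E write that models only set 3, block 0 from the initial table. It assumes write-back behavior, which the Dirty column implies. The `decode` helper is hypothetical.

```python
TAG_BITS, SET_BITS, OFFSET_BITS = 6, 2, 2


def decode(addr: int):
    """Split a 10-bit byte address into (tag, set, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + SET_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, set_index, offset


# Set 3, block 0 from the initial table; data[i] holds byte i, so the
# table's "Data (3..0)" column A4 4A 56 65 is this list read backwards.
block = {"tag": 0b100100, "valid": 1, "dirty": 1, "data": [0x65, 0x56, 0x4A, 0xA4]}

tag, set_index, offset = decode(0x24E)
# Only set 3, block 0 is modeled here.
hit = set_index == 3 and block["valid"] == 1 and block["tag"] == tag
if hit:
    block["data"][offset] = 0xFF   # write-back: update the cached byte only
    block["dirty"] = 1             # memory stays stale until the block is evicted

print(f"{tag:06b}", set_index, offset, hit)                    # 100100 3 2 True
print(" ".join(f"{b:02X}" for b in reversed(block["data"])))   # A4 FF 56 65
```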
Operation: CPU reads from $265
Hit (H) or Miss (M): M
Fetch from memory? Yes, from $264 to $267
Result:
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
1 | 0 | 100110 | MM MM MM MM | 1 | 0 |
Operation: CPU reads from $338
Hit (H) or Miss (M): H
Fetch from memory? No
Result:
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
2 | 1 | 110011 | 19 FD E4 FF | 1 | 0 |
Operation: CPU reads from $376
Hit (H) or Miss (M): M
Fetch from memory? Yes, from $374 to $377
Result:
Set | Block# | Tag (binary) | Data (3..0) | Valid | Dirty |
---|---|---|---|---|---|
1 | 0 | 110111 | MM MM MM MM | 1 | 0 |
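The three read questions and the $360 example all follow the same lookup:

1. Decode the address.
2. Compare the tag against every valid block in the selected set.
3. On a miss, fill an invalid block and fetch the whole 4-byte line.

The sketch below replays each read against the initial table and treats each question as independent. The `read` helper is hypothetical. Because the table shows no LRU information, a miss in a set with no invalid block is not modeled.

```python
OFFSET_BITS, SET_BITS = 2, 2

# Initial (tag, valid) pairs from the table, indexed [set][block#].
INITIAL = [
    [("010110", 0), ("010010", 0)],
    [("010110", 0), ("010111", 1)],
    [("111111", 1), ("110011", 1)],
    [("100100", 1), ("000011", 0)],
]


def read(addr: int):
    """Look up one read against the initial state (each question is independent)."""
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = format(addr >> (OFFSET_BITS + SET_BITS), "06b")
    ways = INITIAL[set_index]
    for block, (t, valid) in enumerate(ways):
        if valid and t == tag:
            return "H", set_index, block, tag, None
    # Miss: fill an invalid block (None if the set has none; not modeled here).
    victim = next((b for b, (_, valid) in enumerate(ways) if not valid), None)
    first = addr & ~((1 << OFFSET_BITS) - 1)          # start of the 4-byte line
    return "M", set_index, victim, tag, (first, first + (1 << OFFSET_BITS) - 1)


for addr in (0x360, 0x265, 0x338, 0x376):
    result, set_index, block, tag, fetch = read(addr)
    span = f"fetch ${fetch[0]:03X}-${fetch[1]:03X}" if fetch else "no fetch"
    print(f"${addr:03X}: {result} set {set_index} block {block} tag {tag} ({span})")
```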
Give a block diagram for a 2M x 16 memory using 256K x 8 SRAM chips connecting to a processor with a 24 bit address and a byte-addressable memory space (no cache is present). The memory should be connected starting at address $000000. You should show which address lines are used and their connections to the chips. You do not need to give details of the inner working of the memory chips or any bus synchronization information, but the addressing should be handled in enough detail to make your circuit obvious. Assume each chip has connections as shown in the diagram at right.
Need 2 chips ‘wide’ to match the 16-bit data bus (16/8 = 2). Need 8 chips ‘deep’ to get the memory size (2M/256K = 8), for 16 chips in total.
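The memory holds 2M 16-bit words = 4 MB, so it spans 22 byte-address bits (bits 0-21). The LSB (bit 0) is not connected to the chips (16-bit data but byte-addressable). Bits 1-3 select one of the 8 rows of chips through a 3-to-8 decoder (demux). The remaining 18 bits (10+8, bits 4-21) go to the address inputs of every chip (256K = 2^18). Bits 22-23 must be zero for this memory to respond (detected with a decoder or a NOR gate that enables the row decoder).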
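The address split in this answer can be checked in code. The sketch below follows the answer's layout: A1-A3 go to the row decoder, A4-A21 go to the chips, and A22-A23 must be 0. `map_address` is a hypothetical helper, not part of any real tool.

```python
CHIP_WORDS, CHIP_BITS = 256 * 1024, 8          # 256K x 8 SRAM
MEM_WORDS, MEM_BITS = 2 * 1024 * 1024, 16      # 2M x 16 memory

CHIPS_WIDE = MEM_BITS // CHIP_BITS             # 2 chips per row
CHIPS_DEEP = MEM_WORDS // CHIP_WORDS           # 8 rows


def map_address(addr: int):
    """Map a 24-bit byte address to (chip row, address sent to the chips).

    Returns None when A22 or A23 is 1, i.e. the NOR gate disables the decoder.
    """
    if addr >> 22:
        return None
    # A0 is not connected to the chips: it picks a byte of the 16-bit word.
    row = (addr >> 1) & (CHIPS_DEEP - 1)          # A1-A3 -> 3-to-8 decoder
    chip_addr = (addr >> 4) & (CHIP_WORDS - 1)    # A4-A21 -> 18 chip address lines
    return row, chip_addr


print(CHIPS_WIDE, CHIPS_DEEP, CHIPS_WIDE * CHIPS_DEEP)  # 2 8 16
print(map_address(0x000000), map_address(0x000002), map_address(0x3FFFFF))
# (0, 0) (1, 0) (7, 262143)
print(map_address(0x400000))  # None
```

Another option is to send A19-A21 to the decoder and A1-A18 to the chips. That would work equally well. Each chip row would then occupy one contiguous 512 KB range, instead of consecutive words being interleaved across rows.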
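Briefly: Why do many advanced processors have separate data and instruction caches?
Separate caching strategies can be used for instructions and for data;
Two internal buses let an instruction fetch and a data access proceed at the same time;
Supports instruction prefetching and pipelining.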
Fill in the following table for a processor cache where the processor has a 32-bit word address and the cache has 512 blocks of 16 words.
512 = 2^9; 16 = 2^4
Mapping Method | Tag Size (bits) | Block (or Set) Address Size (bits) | Word Select Address Size (bits) |
---|---|---|---|
Direct |   |   |   |
Fully Associative |   |   |   |
2-way Set Associative |   |   |   |
4-way Set Associative |   |   |   |
Address | Miss or Hit | Cache Set | Block # in Set | Reasoning |
---|---|---|---|---|
$F002 |   |   |   |   |
$A202 |   |   |   |   |
$B202 |   |   |   |   |
$F00C |   |   |   |   |
$E002 |   |   |   |   |
$F012 |   |   |   |   |
$0002 |   |   |   |   |
$F002 |   |   |   |   |
$0006 |   |   |   |   |
$F006 |   |   |   |   |
$FB02 |   |   |   |   |
Mapping Method | Tag Size (bits) | Block (or Set) Address Size (bits) | Word Select Address Size (bits) |
---|---|---|---|
Direct | 32-13=19 | 9 | 4 |
Fully Associative | 32-4=28 | 0 | 4 |
2-way Set Associative | 32-12=20 | 8 | 4 |
4-way Set Associative | 32-11=21 | 7 | 4 |
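Each row of the table uses the same rule. Four bits select one of the 16 words, the set field indexes 512 / ways sets, and the tag takes the rest of the 32 bits. The sketch below treats fully associative as a single set of 512 ways. The helper name is hypothetical.

```python
def widths(addr_bits: int, blocks: int, words_per_block: int, ways: int):
    """Return (tag, set, word-select) bit widths; sizes must be powers of two."""
    word_bits = words_per_block.bit_length() - 1   # log2(16) = 4
    sets = blocks // ways
    set_bits = sets.bit_length() - 1               # log2(number of sets); 1 set -> 0 bits
    return addr_bits - set_bits - word_bits, set_bits, word_bits


for name, ways in [("Direct", 1), ("Fully Associative", 512),
                   ("2-way Set Associative", 2), ("4-way Set Associative", 4)]:
    print(name, widths(32, 512, 16, ways))
```

Running it prints (19, 9, 4), (28, 0, 4), (20, 8, 4) and (21, 7, 4), which matches the table.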
Address | Miss or Hit | Cache Set | Block # in Set | Reasoning |
---|---|---|---|---|
$F002 | M | 00 | 0 | Middle 2 hex digits give set |
$A202 | M | 20 | 0 | New tag |
$B202 | M | 20 | 1 | New tag |
$F00C | H | 00 | 0 |   |
$E002 | M | 00 | 1 | New tag |
$F012 | M | 01 | 0 | New tag |
$0002 | M | 00 | 0 | New tag; LRU replacement evicts tag F from block 0 |
$F002 | M | 00 | 1 | Tag F was evicted by $0002; LRU replacement evicts tag E from block 1 |
$0006 | H | 00 | 0 |   |
$F006 | H | 00 | 1 |   |
$FB02 | M | B0 | 0 | New tag |
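The trace answers are consistent with the following cache, inferred from the answers because the question's own statement is not shown here:

- 2-way set associative, on 16-bit addresses.
- Tag: the first hex digit.
- Set: the middle two hex digits, as the Reasoning column says.
- Offset: the last hex digit.
- LRU replacement, starting from an empty cache.

The simulation sketch below works under those assumptions. `simulate` is a hypothetical helper.

```python
def simulate(trace, ways=2):
    """Set-associative LRU simulation; returns (hit/miss, set, block#) per access."""
    cache = {}       # set index -> list of [tag, last_use] (None = empty block)
    results = []
    for time, addr in enumerate(trace):
        tag, set_index = addr >> 12, (addr >> 4) & 0xFF   # assumed field split
        blocks = cache.setdefault(set_index, [None] * ways)
        hit = next((b for b, e in enumerate(blocks) if e and e[0] == tag), None)
        if hit is not None:
            blocks[hit][1] = time                          # refresh LRU age
            results.append(("H", set_index, hit))
            continue
        empty = next((b for b, e in enumerate(blocks) if e is None), None)
        if empty is not None:
            victim = empty
        else:
            victim = min(range(ways), key=lambda b: blocks[b][1])  # least recently used
        blocks[victim] = [tag, time]
        results.append(("M", set_index, victim))
    return results


TRACE = [0xF002, 0xA202, 0xB202, 0xF00C, 0xE002, 0xF012,
         0x0002, 0xF002, 0x0006, 0xF006, 0xFB02]
for addr, (result, set_index, block) in zip(TRACE, simulate(TRACE)):
    print(f"${addr:04X} {result} {set_index:02X} {block}")
```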