Assignment #4

You are asked to think about the on- and off-chip memory system for our DaDianNao-like accelerator. Let's restrict attention to CNNs, where the layers are convolutional, pooling, and fully-connected. Assume you have an external memory interface that can provide X bytes/cycle. For example, a DDR3-1600 memory system can provide, at peak, 1600 M transfers/sec x 8 B/transfer = 12,800,000,000 bytes/sec (12.8 GB/sec). Assuming a 1 GHz operating frequency for our accelerator, that translates into just 12.8 bytes/cycle.
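The bandwidth arithmetic above can be sketched as a small helper. This is only an illustration: the function name and its parameters are made up for this example, and it assumes decimal units (1 MT/s = 10^6 transfers per second, 1 MHz = 10^6 cycles per second) and a single 64-bit (8-byte) channel. Under those assumptions DDR3-1600 feeding a 1 GHz accelerator works out to 12.8 bytes/cycle.

```c
#include <assert.h>

/* Peak off-chip bandwidth expressed in bytes per accelerator cycle.
 * All names here are hypothetical, chosen for this sketch only.
 *   mega_transfers_per_sec: e.g. 1600 for DDR3-1600
 *   bytes_per_transfer:     8 for one 64-bit channel (assumed)
 *   core_mhz:               accelerator clock in MHz, e.g. 1000 for 1 GHz */
double peak_bytes_per_cycle(double mega_transfers_per_sec,
                            double bytes_per_transfer,
                            double core_mhz)
{
    assert(core_mhz > 0.0);
    /* The 10^6 factors in MT/s and MHz cancel out. */
    return mega_transfers_per_sec * bytes_per_transfer / core_mhz;
}
```

Calling peak_bytes_per_cycle(1600, 8, 1000) gives the peak figure for the example above. Sustained bandwidth on a real memory system will be lower than this peak.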

You are asked to think about strategies for allocating your on-chip memory so as to reduce off-chip traffic as much as possible and keep the execution cores busy as much as possible. We will provide you with the architecture of a few CNNs shortly. You will have to calculate how much traffic will be needed.
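As a starting point for the traffic calculation, here is a rough sketch of a lower bound for one convolutional layer. It assumes every input, weight, and output value crosses the off-chip interface exactly once (no re-fetching, no reuse across layers), which is optimistic. The struct, its field names, and the helper functions are hypothetical and are not part of any provided code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one convolutional layer. */
typedef struct {
    uint64_t in_ch;            /* input channels  */
    uint64_t in_h;             /* input height    */
    uint64_t in_w;             /* input width     */
    uint64_t out_ch;           /* output channels (number of filters) */
    uint64_t out_h;            /* output height   */
    uint64_t out_w;            /* output width    */
    uint64_t k_h;              /* filter height   */
    uint64_t k_w;              /* filter width    */
    uint64_t bytes_per_value;  /* e.g. 2 for 16-bit fixed point (assumed) */
} conv_layer_t;

uint64_t conv_weight_bytes(const conv_layer_t *l)
{
    return l->out_ch * l->in_ch * l->k_h * l->k_w * l->bytes_per_value;
}

uint64_t conv_input_bytes(const conv_layer_t *l)
{
    return l->in_ch * l->in_h * l->in_w * l->bytes_per_value;
}

uint64_t conv_output_bytes(const conv_layer_t *l)
{
    return l->out_ch * l->out_h * l->out_w * l->bytes_per_value;
}

/* Lower bound: each array moves across the off-chip interface once. */
uint64_t conv_compulsory_traffic_bytes(const conv_layer_t *l)
{
    return conv_weight_bytes(l) + conv_input_bytes(l) + conv_output_bytes(l);
}

/* Cycles needed just to move the data at a given peak bandwidth. */
double min_transfer_cycles(uint64_t traffic_bytes, double bytes_per_cycle)
{
    assert(bytes_per_cycle > 0.0);
    return (double)traffic_bytes / bytes_per_cycle;
}
```

Comparing min_transfer_cycles() against the cycles the execution cores need for the same layer indicates whether the layer is bandwidth bound under these assumptions. Any array that has to be fetched more than once because it does not fit on chip adds to this bound.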

Assignment #3

Due Thursday, March 8, before class. Submission link will be provided in due time. Do not e-mail.

Description

Assignment #2

Due Thursday, February 15, before class. Submission link will be provided in due time. Do not e-mail.

Repeat Assignment #1 but use Intel's PIN tool.

Assignment #1

Due Thursday, February 1, before class. Submission link will be provided in due time. Do not e-mail.

Getting to know the SimpleScalar Simulator

Read the lab0 and lab4 handouts from ECE552. These will introduce you to the SimpleScalar simulator. Our goal here is to modify the cache simulation module to implement a different replacement policy. The cache module is implemented in cache.c. The simplest simulator that uses it is sim-cache.c.

Part A: Modify cache.c to add a “not MRU” replacement policy. You will have to modify the cache_access() function and potentially others. For example, check whether you need to change cache_probe() too. Not MRU replaces one cache block at random, excluding the MRU block. Run sim-cache on the go and gcc traces, first with LRU replacement and then with your notMRU policy.
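The victim choice for not MRU can be sketched in isolation as follows. This is a standalone illustration, not SimpleScalar code: it assumes the ways of a set are numbered 0 to assoc-1 and that the index of the MRU way is already known. The function name is hypothetical, and mapping the idea onto the data structures in cache.c is left to you.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a victim way at random among all ways of a set except mru_way.
 * Hypothetical helper; does not use SimpleScalar's data structures. */
int not_mru_victim(int assoc, int mru_way)
{
    assert(assoc >= 1);
    assert(mru_way >= 0 && mru_way < assoc);

    /* Direct-mapped: the only block is also the MRU, so it must go. */
    if (assoc == 1)
        return 0;

    /* Draw one of the assoc-1 non-MRU candidates, then skip over the
     * MRU slot. rand() % n has a slight modulo bias, which is
     * negligible for small associativities. */
    int pick = rand() % (assoc - 1);
    return (pick >= mru_way) ? pick + 1 : pick;
}
```

Drawing from assoc-1 candidates and shifting past the MRU index avoids a retry loop and never returns the MRU way.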

Part B: Read the following paper: Adaptive Insertion Policies for High-Performance Caching, M. K. Qureshi et al., IEEE/ACM Intl. Symposium on Computer Architecture.

Modify cache.c to implement DIP. No need to implement set dueling.
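One possible reading of "DIP without set dueling" is to implement the bimodal insertion policy (BIP) from the paper on its own; that reading is an assumption, so check it against the paper and the course instructions. The sketch below models a single set as a recency-ordered array and is not SimpleScalar code. The type and function names are hypothetical, and it uses a deterministic miss counter (one MRU insertion every 32 misses) as a stand-in for a probabilistic choice with epsilon = 1/32.

```c
#include <assert.h>
#include <string.h>

#define BIP_MAX_ASSOC 16
#define BIP_THROTTLE  32   /* insert at MRU once every 32 misses (assumed) */

/* One cache set as a recency stack: tags[0] is MRU, tags[assoc-1] is LRU.
 * A tag of -1 marks an invalid way. Hypothetical model for illustration. */
typedef struct {
    int assoc;
    int tags[BIP_MAX_ASSOC];
    unsigned miss_count;
} bip_set_t;

void bip_init(bip_set_t *s, int assoc)
{
    assert(assoc >= 1 && assoc <= BIP_MAX_ASSOC);
    s->assoc = assoc;
    s->miss_count = 0;
    for (int i = 0; i < assoc; i++)
        s->tags[i] = -1;
}

/* Move the block at recency position pos to the MRU position. */
static void move_to_mru(bip_set_t *s, int pos)
{
    int tag = s->tags[pos];
    memmove(&s->tags[1], &s->tags[0], (size_t)pos * sizeof(int));
    s->tags[0] = tag;
}

/* Access one tag. Returns 1 on a hit, 0 on a miss. */
int bip_access(bip_set_t *s, int tag)
{
    assert(tag >= 0);
    for (int i = 0; i < s->assoc; i++) {
        if (s->tags[i] == tag) {
            move_to_mru(s, i);       /* hits are promoted as in LRU */
            return 1;
        }
    }
    /* Miss: evict the LRU block and leave the new block in the LRU
     * position, so it is the next victim unless it is reused first.
     * Invalid ways are not treated specially in this sketch. */
    int lru = s->assoc - 1;
    s->tags[lru] = tag;
    s->miss_count++;
    /* Infrequently, insert at MRU instead so the set can adapt. */
    if (s->miss_count % BIP_THROTTLE == 0)
        move_to_mru(s, lru);
    return 0;
}
```

With this model, a block that is never reused only ever occupies the LRU slot, while a block that hits once is promoted and then ages like any block under LRU.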