ACA

ECE1773: Advanced Computer Architecture, Fall 2007
ECE, Univ. of Toronto

Instructor: Andreas Moshovos, EA310, x6-7373, moshovos@eecg.toronto.edu
Lectures: Monday 12-2 BA4164 & Thursday 2-4 WB130

Communication: Use e-mail as much as possible. The subject should start with “ACA:”.

Office hours: Stop by anytime (preferred method); if I’m available we can talk. Otherwise, make an appointment through e-mail.

PROJECT PRESENTATION SCHEDULE

If there is a conflict with the schedule please e-mail me ASAP.

Final report due on December 21st, 11:59pm EST.

Please submit via e-mail with the header: “ACA: Final report”

Project Presentations:

We will meet on Thursday, December 13 and Friday, December 14, starting at 12:30pm.

The presentations will be held in Pratt 266.

Please bring your own laptop; a projector will be provided.

Project Proposal and Report Requirements

How to use the EIO traces that were provided on the CD:

The EIO traces on the CD are compressed and are meant to be used with the simulator given in the myss directory.

That simulator is a modified version of Simplescalar.

Before using it, make sure to edit the Makefile and remove all references to “condor_compile”.

Then run “make config-pisa”.

If you compile on Cygwin, you may need to add -lintl -liconv to the LIBS macro in the Makefile because of recent changes in the libraries.

If you get a “config.h not found” error, run “ln -s target-pisa/config.h .”

Lecture Notes

1, What is this Course About - Technology - Course Outline - Expectations

Readings:

Required:

(a) Read this before the next lecture: Micro-architectural Innovations: Boosting Processor Performance Beyond Technology Scaling, A. Moshovos and G. S. Sohi, IEEE Proceedings, Jan. 2001.

(b) Reference for the Simplescalar toolset: Simplescalar report. You are not expected to read this in one go. Use it as a reference.

(c) Read this before the end of the course: The Task of the Referee, Alan Jay Smith, IEEE Computer, 1990.

Optional:

(a) Preliminary discussion of the logical design of an electronic computing instrument, Arthur W. Burks, Herman H. Goldstine and John von Neumann, Inst. for Advanced Study, Princeton, N. J., 1946.

(b) Strong Inference, John R. Platt, Science, 1964.

2, Pipelining and Precise Interrupts

Readings:

(a) Implementing Precise Interrupts in Pipelined Processors, J. E. Smith and A. Pleszkun, IEEE Transactions on Computers, May 1988. Required.

(b) Optimizing Pipelines for Power and Performance, V. Srinivasan, D. Brooks, M. Gschwind and P. Bose, in the Proceedings of the ACM/IEEE Annual Symposium on Microarchitecture, Nov. 2002. Optional.

3, Superscalar Execution

4, Control Flow Prediction part #1

Part #2 is now included in the preceding link.

5, Introduction to OOO Execution and Register Renaming

6, Renaming and Scheduling

Readings: Complexity-Effective Superscalar Processors, S. Palacharla, N. Jouppi and J. E. Smith, Proceedings of the Annual International Symposium on Computer Architecture, 1997.

7, Scheduling Optimizations

Readings:
A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors, Masahiro Goshima, Kengo Nishino, Yasuhiko Nakashima, Shin-ichiro Mori, Toshiaki Kitamura, and Shinji Tomita, MICRO 2001.

8, Simplescalar’s OOO Timing Simulator

9, Very Long Instruction Word Architectures

10, Instruction Supply and Load/Store Scheduling

Homeworks

Who are you? Due Thursday, Oct. 4
Using Simplescalar’s Functional Simulator: Understanding the power of simulation and the need for “validating” the results it produces. Due Thursday, Oct. 11th, at the beginning of the lecture.

You will need these files and the Simplescalar simulator source code.

Other relevant files:

a. Please install Cygwin on a Windows machine. Visit www.cygwin.com.

b. GCC port for Simplescalar. Installs under /usr/local.

c. MIPS ISA reference. Note that Simplescalar implements a modified MIPS-I instruction set architecture.

d. Simplescalar report.

2. (a) Read the TAGE branch predictor paper and summarize it in at most two pages: http://www.irisa.fr/caps/people/seznec/L-TAGE.pdf

(b) Using sim-safe.c, study the accuracy of a BTB. The BTB should be indexed using the PC of branches and should return the taken target address of the branch. Do not use tags. Report accuracy only for those branches that are taken. Vary the size from 1 to 1024 entries in power-of-two steps. Study only direct-mapped BTBs. Use cc1.lit.ss from hw1 for this study. Run it as follows: cc1.lit.ss -O2 gcc.i
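As a rough illustration of the experiment described in (b), here is a minimal standalone Python model of a tagless, direct-mapped BTB. It is not Simplescalar code and is not the expected solution, which belongs in sim-safe.c. The function name btb_taken_accuracy, the (pc, taken, target) trace format, the 8-byte instruction size used to form the index, and the choice to update the table on every taken branch are all assumptions made for this sketch.

```python
def btb_taken_accuracy(trace, num_entries, inst_bytes=8):
    """Return the fraction of taken branches whose target the BTB predicts.

    trace       -- iterable of (pc, taken, target) tuples (hypothetical format)
    num_entries -- BTB size; must be a power of two
    inst_bytes  -- assumed instruction size, used to drop the low PC bits
    """
    if num_entries <= 0 or num_entries & (num_entries - 1):
        raise ValueError("num_entries must be a power of two")
    shift = inst_bytes.bit_length() - 1   # log2 of the instruction size
    table = [None] * num_entries          # no tags: each entry holds only a target
    taken_count = 0
    correct = 0
    for pc, taken, target in trace:
        if not taken:
            continue                      # only taken branches are counted
        index = (pc >> shift) & (num_entries - 1)
        taken_count += 1
        if table[index] == target:
            correct += 1
        table[index] = target             # assumed policy: update on every taken branch
    return correct / taken_count if taken_count else 0.0


def sweep_sizes(trace):
    """Accuracy for every power-of-two size from 1 to 1024 entries."""
    trace = list(trace)                   # allow the trace to be reused per size
    return {size: btb_taken_accuracy(trace, size)
            for size in (1 << k for k in range(11))}
```

Because there are no tags, two branches that map to the same entry silently overwrite each other's targets, which is the effect the size sweep is meant to expose.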

3. Using sim-outorder, Simplescalar’s timing simulator, measure how many operands are ready for instructions that enter the scheduler (done in ruu_dispatch). Collect the following statistics:

1. A graph where the Y axis is a percentage of all dynamic instructions. The graph should report the percentage of instructions that have no ready operand, 1 operand ready or 2 operands ready.

2. A graph where the Y axis is again a percentage of dynamic instructions. The graph should show the instructions that have 0, 1 or 2 source operands and those that produce a result into a register.

Collect these statistics for the cc1.lit.ss benchmark, run as: cc1.lit.ss -O2 gcc.i
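The first graph reduces to a small histogram computation once the simulator has recorded, for each dispatched instruction, how many of its operands were ready. The Python sketch below shows only that post-processing step. It assumes the counts have already been collected (for example, printed from ruu_dispatch); the function name ready_operand_percentages and the input format are hypothetical. How to classify instructions with fewer than two source operands is left to whoever produces the counts.

```python
from collections import Counter


def ready_operand_percentages(ready_counts):
    """Percentage of dynamic instructions with 0, 1, or 2 ready operands.

    ready_counts -- iterable with one integer (0, 1, or 2) per dispatched
                    instruction (hypothetical format, produced elsewhere)
    """
    tally = Counter(ready_counts)
    unexpected = set(tally) - {0, 1, 2}
    if unexpected:
        raise ValueError(f"unexpected ready-operand counts: {sorted(unexpected)}")
    total = sum(tally.values())
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    # Counter returns 0 for missing keys, so every bucket is always reported.
    return {n: 100.0 * tally[n] / total for n in (0, 1, 2)}
```

The three percentages sum to 100, so they can be plotted directly as the bars of the first graph.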

PROJECT

Here’s a list of suggested papers:

A Case for MLP-Aware Cache Replacement, M. K. Qureshi et al., ISCA 2006.
Increasing the Size of Atomic Instruction Blocks using Control Flow Assertions, S. Patel, MICRO 2000.
Selective value prediction, B. Calder et al., ISCA 1999.
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth, Bracy et al., MICRO 2004.
Efficient Dynamic Scheduling Through Tag Elimination, Dan Ernst and Todd Austin, ISCA 2002.
Cache decay: exploiting generational behavior to reduce cache leakage power, S. Kaxiras et al., ISCA 2001.
Scalable Store-Load Forwarding via Store Queue Index Prediction, S. Stone et al., MICRO 2005.
NUCA: A Non-Uniform Cache Access Architecture for Wire-Delay Dominated On-Chip Caches, Changkyu Kim, Doug Burger, Stephen W. Keckler, ASPLOS 2002.
Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching, Eric Rotenberg, Steve Bennett, James E. Smith, MICRO.