ACA

ECE1773: Advanced Computer Architecture, Fall 2006
ECE, Univ. of Toronto

Instructor: Andreas Moshovos, EA310, x6-7373, moshovos@eecg.toronto.edu
Lectures: Tuesday 1-3 BA2139 & Friday 3-5 BA4164

Communication: Use e-mail as much as possible. The subject should start with “ACA:”.

Office hours: Stop by anytime (preferred method); if I’m available, we can talk. Otherwise, make an appointment through e-mail.

ROOM CHANGE: I was informed that our Tuesday room will not be available during the first two weeks of December.

We are going to meet as follows:

Tuesday, Dec. 5 1 - 3 pm HA 316 (Haultain Building)

Tuesday, Dec. 12 1 - 3 pm BA1230

Lecture Notes

1. What is this Course About - Technology - Course Outline - Expectations

Readings:

(a) Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, Arthur W. Burks, Herman H. Goldstine, John von Neumann, Inst. for Advanced Study, Princeton, N. J., 1946.

(b) The Task of the Referee, Alan Jay Smith, IEEE Computer, 1990.

(c) Strong Inference, John R. Platt, Science, 1964.

2. Performance Metrics, Summarizing Performance, Benchmarking

Readings:

(a) Characterizing Computer Performance with a Single Number, J. E. Smith, Communications of the ACM, Oct. 1988.

3. Instruction Set Architecture

Readings:

(a) Compilers and Computer Architecture, W. A. Wulf, IEEE Computer, July 1981.

(b) A Characterization of Processor Performance in the VAX-11/780, J. S. Emer and D. W. Clark, in the Proceedings of the 11th Annual ACM/IEEE International Symposium on Computer Architecture, 1984.

4. Pipelining and Precise Interrupts

Readings:

(a) Implementing Precise Interrupts in Pipelined Processors, J. E. Smith and A. R. Pleszkun, IEEE Transactions on Computers, May 1988.

(b) Optimizing Pipelines for Power and Performance, V. Srinivasan, D. Brooks, M. Gschwind and P. Bose, in the Proceedings of the ACM/IEEE Annual Symposium on Microarchitecture, Nov. 2002.

(c) The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas, in the Proceedings of the Annual ACM/IEEE International Symposium on Computer Architecture, June 2002.

5. Superscalar and Out-of-Order Execution

Readings:

(a) The MIPS R10000 Processor, Kenneth C. Yeager, IEEE Micro, 1996.

(b) The Alpha 21264 Processor, R. E. Kessler, IEEE Micro, 1999.

(c) Power5 System Microarchitecture, B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, J. B. Joyner, IBM Journal of Research and Development, July/Sept. 2005.

Optional – Overview Papers:

(d) The Microarchitecture of Superscalar Processors, J. E. Smith and G. S. Sohi, Proceedings of the IEEE, Aug. 1995.

(e) Microarchitectural Innovations: Boosting Processor Performance Beyond Technology Scaling, A. Moshovos and G. S. Sohi, Proceedings of the IEEE, Jan. 2001.

6. Control Flow Speculation

Readings:

(a) Alternative Implementations of Two-Level Branch Predictors, Yeh and Patt, ISCA, 1992.

(b) Combining Branch Predictors, S. McFarling, DEC WRL Technical Report, 1993.

(c) Dynamic History-Length Fitting: A Third Level of Adaptivity for Branch Prediction, Juan, Sanjeevan, Navarro, ISCA, 1998.

7. Instruction Supply and Memory Dataflow

Readings:

(a) Optimization of Instruction Fetch Mechanisms for High Issue Rates, Thomas M. Conte, Kishore N. Menezes, Patrick M. Mills, Burzin A. Patel, ISCA, 1995.

(b) Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors, Andreas Moshovos and Gurindar S. Sohi, Proc. of HPCA-6, Feb. 2000.

8. VLIW

9. SIMD: Array Processors, Vector Processors and Multimedia Extensions

Readings:

(a) AltiVec Extension to PowerPC Accelerates Media Processing, K. Diefendorff, P. K. Dubey, R. Hochsprung and H. Scales, IEEE MICRO, 2000.

(b) Vector Unit Architecture for Emotion Synthesis, Atsushi Kunimatsu, Nobuhiro Ide, Toshinori Sato, Yukio Endo, Hiroaki Murakami, Takayuki Kamei, Masashi Hirano, Fujio Ishihara, Haruyuki Tago, Masaaki Oka, Akio Ohba, Teiji Yutaka, Toyoshi Okada, Masakazu Suzuoki, IEEE MICRO, March/April 2000.

Optional Readings:

(c) The CRAY-1 Computer System, R. M. Russell, Communications of the ACM, January 1978.

(d) The Illiac IV System, W. J. Bouknight, S. A. Denenberg, D. A. McIntyre, J. M. Randall, A. H. Sameh, D. L. Slotnick, Proceedings of the IEEE, April 1972.

10. The Simplescalar Out-of-Order Simulator

11. On-Chip Caches

12. Paper Discussion, Friday, November 17

GUIDELINES FOR PAPER PRESENTATIONS

· Presenter: Jeremy → Dan Ernst and Todd Austin, “Efficient Dynamic Scheduling Through Tag Elimination,” ACM/IEEE 29th International Symposium on Computer Architecture (ISCA-2002), May 2002.

· Presenter: Andrija → Dan Ernst, Andrew Hamel, and Todd Austin, “Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay,” ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.

13. Paper Discussion, Friday, November 24

· Presenter: → E. Rotenberg et al., “Trace Processors”, ACM/IEEE 30th International Symposium on Microarchitecture, 1997.

· Presenter: → Z. Purser et al., “A Study of Slipstream Processors”, ACM/IEEE 33rd International Symposium on Microarchitecture, 2000.

14. Paper Discussion, Tuesday, November 28

T. N. Vijaykumar et al., “Transient-Fault Recovery via Simultaneous Multithreading”, 29th ISCA, 2002.

T. Austin, “DIVA: A Dynamic Approach to Microprocessor Verification,” Journal of Instruction-Level Parallelism, extended version of the paper that appeared in MICRO-32, 1999.

15. Paper Discussion, Tuesday December 5

S. Gopal et al., “Speculative Versioning Cache”, HPCA, 1998.

B. Fields et al., “Focusing Processor Policies via Critical Path Prediction”, ISCA 2001.

Homeworks

0. Who are you? Due Friday, Sept. 28

1. The PISA, MIPS-like instruction set and the role of an optimizing compiler, Due Tuesday, Oct. 10.

a. Please install Cygwin on a Windows machine. Visit www.cygwin.com.

b. GCC port for Simplescalar. Installs under /usr/local.

c. MIPS ISA reference. Note that Simplescalar implements a modified MIPS-I instruction set architecture.

d. mytest.c

e. Simplescalar report.

2. Using Simplescalar’s Functional Simulator: Understanding the power of simulation and the need for “validating” the results it produces. Due Tuesday, Oct. 17.

You will need these files and the Simplescalar simulator source code.

3. Issue Width and Window: How do they affect performance? Due Tuesday, Nov. 7.

Additional files needed:

1. fppp binary

2. fppp input

3. gcc input

How to run the benchmarks:

1. gcc: gcc.ss -O3 -funroll-loops -finline-functions regclass.i

2. fppp: fppp.ss < natoms.in
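As a rough illustration of how the two benchmark command lines above might be combined with an issue-width and window-size sweep, the sketch below only prints candidate command lines; it does not run anything. It assumes the timing simulator is Simplescalar's sim-outorder and that the window is set through its `-ruu:size` option alongside `-issue:width`. The sweep values and the helper names (`build_command`, `sweep`) are hypothetical and not part of the assignment.

```python
import shlex
from itertools import product

# Benchmark arguments exactly as listed in the homework; the second element
# is the file redirected to standard input (None when there is none).
BENCHMARKS = {
    "gcc": (["gcc.ss", "-O3", "-funroll-loops", "-finline-functions", "regclass.i"], None),
    "fppp": (["fppp.ss"], "natoms.in"),
}


def build_command(bench, issue_width, window_size, simulator="sim-outorder"):
    """Return one shell command line for a given issue width and window size.

    Assumption: the simulator is sim-outorder, configured through its
    -issue:width and -ruu:size options.
    """
    argv, stdin_file = BENCHMARKS[bench]
    cmd = [simulator, "-issue:width", str(issue_width),
           "-ruu:size", str(window_size)] + argv
    line = " ".join(shlex.quote(arg) for arg in cmd)
    if stdin_file is not None:
        line += " < " + shlex.quote(stdin_file)
    return line


def sweep(widths=(1, 2, 4, 8), windows=(16, 32, 64, 128)):
    """List every benchmark/width/window combination (hypothetical values)."""
    return [build_command(bench, width, window)
            for bench in BENCHMARKS
            for width, window in product(widths, windows)]


if __name__ == "__main__":
    for line in sweep():
        print(line)
```

Printing the commands first makes it easy to check the sweep before committing to long simulations. In practice other widths (fetch, decode, commit) and the load/store queue size may need to scale with the issue width; that choice is left to the experiment design.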

4. Project