ECE1773: Advanced Computer Architecture, Fall 2006
ECE,
Instructor: Andreas Moshovos, EA310, x6-7373, moshovos@eecg.toronto.edu
Lectures: Tuesday 1-3 BA2139 & Friday 3-5 BA4164
Communication: Use e-mail as much as possible. The subject line should start with “ACA:”.
Office hours: Stop by anytime (preferred method); if I’m available we can talk. Otherwise, make an appointment through e-mail.
ROOM CHANGE: I was informed that our Tuesday room will not be
available during the first two weeks of December.
We are going to meet as follows:
Tuesday, Dec. 5, 1-3 pm, HA 316
Tuesday, Dec. 12, 1-3 pm, BA1230
Lecture Notes
1. What is this Course About - Technology - Course Outline - Expectations
(a) Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, Arthur W. Burks, Herman H. Goldstine, and John von Neumann, Institute for Advanced Study, Princeton, N.J., 1946.
(b) The Task of the Referee, Alan Jay Smith, IEEE Computer, 1990.
(c) Strong Inference, John R. Platt, Science, 1964.
2. Performance Metrics, Summarizing Performance, Benchmarking
(a) Characterizing Computer Performance with
a Single Number, J. E. Smith, Communications of the ACM, Oct. 1988.
3. Instruction Set Architecture
(a) Compilers and Computer
Architecture, W. A. Wulf, IEEE Computer, July 1981.
(b) A Characterization of Processor Performance in the VAX-11/780, J. S. Emer and D. W. Clark, in the Proceedings of the 11th Annual ACM/IEEE International Symposium on Computer Architecture, 1984.
4. Pipelining and Precise Interrupts
(a) Implementing Precise Interrupts in Pipelined Processors, J. E. Smith and A. R. Pleszkun, IEEE Transactions on Computers, May 1988.
(b) Optimizing Pipelines for Power and Performance, V. Srinivasan, D. Brooks, M. Gschwind and P. Bose, in the Proceedings of the ACM/IEEE Annual Symposium on Microarchitecture, Nov. 2002.
(c) The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas, in the Proceedings of the Annual ACM/IEEE International Symposium on Computer Architecture, June 2002.
5. Superscalar and Out-of-Order Execution
(a) The MIPS R10000 Processor, Kenneth C. Yeager, IEEE Micro, 1996.
(b) The Alpha 21264 Processor, R. E. Kessler,
IEEE Micro, 1999.
(c) Power5 System Microarchitecture, B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, J. B. Joyner, IBM Journal of Research and Development, July/Sept. 2005.
Optional: Overview Papers
(d) The Microarchitecture of Superscalar Processors, J. E. Smith and G. S. Sohi, Proceedings of the IEEE, Aug. 1995.
(e) Microarchitectural Innovations: Boosting Processor Performance Beyond Technology Scaling, A. Moshovos and G. S. Sohi, Proceedings of the IEEE, Jan. 2001.
6. Branch Prediction
(a) Alternative Implementations of Two-Level Branch Predictors, Yeh and Patt, ISCA, 1992.
(b) Combining Branch Predictors, S. McFarling, DEC WRL Technical Report, 1993.
(c) Dynamic History Length Fitting: A Third Level of Adaptivity for Branch Prediction, Juan, Sanjeevan, Navarro, ISCA, 1998.
7. Instruction Supply and Memory Dataflow
(a) Optimization of Instruction Fetch
Mechanisms for High Issue Rates, Thomas M. Conte, Kishore N. Menezes,
Patrick M. Mills, Burzin A. Patel, ISCA, 1995.
(b) Memory Dependence Speculation
Tradeoffs in Centralized, Continuous-Window Superscalar Processors, Andreas
Moshovos and Gurindar S. Sohi, Proc. of HPCA-6, Feb. 2000.
8. VLIW
9. SIMD: Array Processors, Vector Processors and
Multimedia Extensions
(a) AltiVec Extension to PowerPC Accelerates Media Processing, K. Diefendorff, P. K. Dubey, R. Hochsprung and H. Scales, IEEE MICRO, 2002.
(b) Vector Unit Architecture for Emotion
Synthesis, Atsushi Kunimatsu, Nobuhiro Ide, Toshinori Sato, Yukio Endo,
Hiroaki Murakami, Takayuki Kamei, Masashi Hirano, Fujio Ishihara, Haruyuki
Tago, Masaaki Oka, Akio Ohba, Teiji Yutaka, Toyoshi Okada, Masakazu Suzuoki,
IEEE MICRO, March/April 2000.
Optional
(c) The CRAY-1 Computer System, R. M. Russell, Communications of the ACM, January 1978.
(d) The Illiac IV System, W. J. Bouknight, S. A. Denenberg, D. A. McIntyre, J. M. Randall, A. H. Sameh, D. L. Slotnick, Proceedings of the IEEE, April 1972.
10. The Simplescalar Out-of-Order Simulator
11. On-Chip Caches
12. Paper Discussion, Friday, November 17
GUIDELINES FOR PAPER PRESENTATIONS
· Presenter: Jeremy → Dan Ernst and Todd Austin, “Efficient Dynamic Scheduling Through Tag Elimination,” ACM/IEEE 29th International Symposium on Computer Architecture (ISCA-2002), May 2002.
· Presenter: Andrija → Dan Ernst, Andrew Hamel, and Todd Austin, “Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay,” ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.
13. Paper Discussion, Friday, November 24
· Presenter: → E. Rotenberg et al., “Trace Processors,” ACM/IEEE 30th International Symposium on Microarchitecture, 1997.
· Presenter: → Z. Purser et al., “A Study of Slipstream Processors,” ACM/IEEE 33rd International Symposium on Microarchitecture, 2000.
14. Paper Discussion, Tuesday, November 28
T. N. Vijaykumar et al., “Transient-Fault Recovery via Simultaneous Multithreading,” 29th ISCA, 2002.
T. Austin, “DIVA: A Dynamic Approach to Microprocessor Verification,” Journal of Instruction Level Parallelism. Extended version of the paper that appeared in MICRO-32, 1999.
15. Paper Discussion, Tuesday, December 5
S. Gopal et al., “Speculative Versioning Cache,” HPCA, 1998.
B. Fields et al., “Focusing Processor Policies via Critical Path Prediction,” ISCA, 2001.
Homeworks
0. Who are you? Due Friday, Sept. 28
1. The PISA, MIPS-like instruction set and the role of an optimizing compiler. Due Tuesday, Oct. 10.
a. Please install Cygwin on a Windows machine. Visit www.cygwin.com.
b. GCC port for Simplescalar. Installs under /usr/local.
c. MIPS ISA reference. Note that Simplescalar implements a modified MIPS-I instruction set architecture.
d. mytest.c
2. Using Simplescalar’s Functional Simulator: Understanding the power of simulation and the need for “validating” the results so collected. Due Tuesday, Oct. 17.
You will need these files and the Simplescalar simulator source code.
3. Issue Width and Window: How do they affect performance? Due Tuesday, Nov. 7.
Additional files needed:
1. fppp binary
2. fppp input
3. gcc input
How to run the benchmarks:
1. gcc: gcc.ss -O3 -funroll-loops -finline-functions regclass.i
2. fppp: fppp.ss < natoms.in
4. Project