International Conference on 
Parallel Architectures and Compilation Techniques
Toronto, Canada
October 25-29, 2008


The Seventeenth International Conference on
Parallel Architectures and Compilation Techniques (PACT)

Holiday Inn on King
Toronto, Canada
October 25-29, 2008

Special Event


Visit the CN Tower including a Reception at the Horizons Café and Dinner at the 360 Restaurant on Tuesday evening.


Technical Program



Sunday, October 26

17:30 – 19:00



Monday, October 27

8:00 - 8:30


8:30 - 9:30

Keynote 1: Norm Rubin, AMD - GPU Evolution: Will Graphics Morph Into Compute?



9:30 - 10:00


10:00 - 12:00

Session 1: Compilation

Outer-Loop Vectorization - Revisited for Short SIMD Architectures, Dorit Nuzman and Ayal Zaks

Redundancy Elimination Revisited, Keith Cooper, Jason Eckhardt and Ken Kennedy

Exploiting Loop-Dependent Stream Reuse for Stream Processors, Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers, Gen Li and Guibin Wang

Feature Selection and Policy Optimization for 3D Instruction Placement using Reinforcement Learning, Katherine Coons, Behnam Robatmili, Matthew Taylor, Doug Burger and Kathryn McKinley

12:00 – 1:30

Lunch (provided)

1:30 – 3:00

Session 2: CMP Architecture Design

Core Cannibalization Architecture: Improving Lifetime Chip Performance for Multicore Processors in the Presence of Hard Faults, Bogdan Romanescu and Daniel Sorin

Pangaea: A Tightly-Coupled IA32 Heterogeneous Chip Multiprocessor, Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur Khandelwal Groen, Hong Jiang, and Hong Wang

Skewed Redundancy, Gordon Bell and Mikko Lipasti

3:00 – 3:30


3:30 – 5:00

Session 3A: Analyzing Applications

The PARSEC Benchmark Suite: Characterization and Architectural Implications, Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh and Kai Li

Visualizing Potential Parallelism in Sequential Programs, Graham Price, John Giacomoni and Manish Vachharajani

Characterizing and Modeling the Behavior of Context Switch Misses, Fang Liu, Fei Guo and Yan Solihin

Session 3B: I/O Optimizations

MCAMP: Communication Optimization on Massively Parallel Machines with Hierarchical Scratch-pad Memory, Hiroshige Hayashizaki, Yutaka Sugawara, Mary Inaba and Kei Hiraki

Profiler and Compiler Assisted Adaptive I/O Prefetching for Shared Storage Caches, Seung Woo Son, Sai Prashanth Muralidhara, Ozcan Ozturk, Mahmut Kandemir, Ibrahim Kolcu and Mustafa Karakoy

Optimizing One-Sided Communication of Multiple Disjoint Memory Regions, Costin Iancu

Tuesday, October 28

8:00 - 8:30


8:30 - 9:30

Keynote 2: Saman Amarasinghe, MIT - (How) Can Programmers Conquer the Multicore Menace?

9:30 - 10:00


10:00 - 11:30

Session 4: Multicore Memory Hierarchy Design (Part 1)

Distributed Cooperative Caching, Enric Herrero Abellanas, José González González and Ramon Canal Corretger

Scalable and Reliable Communication for Hardware Transactional Memory, Seth Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar and Rajeev Balasubramonian

Improving Support for Locality and Fine-Grain Sharing in Chip Multiprocessors, Hemayet Hossain, Sandhya Dwarkadas and Michael Huang

11:30 – 1:00

Lunch (provided)

1:00 – 2:30

Session 5: Reconfigurable Architecture Optimization

Edge-centric Modulo Scheduling for Coarse-Grained Reconfigurable Architectures, Hyunchul Park, Kevin Fan, Scott Mahlke, Taewook Oh, Heeseok Kim and Hong-seok Kim

Chip multi-processor global power management with multi-optimization power-saving strategies, Ke Meng and Russ Joseph

Multitasking Workload Scheduling on Flexible-Core Chip Multiprocessors, Divya P. Gulati, Changkyu Kim, Simha Sethumadhavan, Stephen W. Keckler and Doug Burger

2:30 – 3:00


3:00 – 4:30

Session 6: Multicore Memory Hierarchy Design (Part 2)

Leveraging On-Chip Networks for Cache Migration in Chip Multiprocessors, Noel Eisley, Li-Shiuan Peh and Li Shang

Adaptive Insertion Policies for Managing Shared Caches on CMPs, Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Simon Steely and Joel Emer

Analysis and Approximation of Optimal Co-Scheduling on Chip Multiprocessors, Yunlian Jiang, Xipeng Shen, Jie Chen and Rahul Tripathi

4:30 – 5:00


5:00 – 6:00

Session 7: Multithreading Improvements

An Adaptive Resource Partitioning Algorithm for SMT Processors, Huaping Wang, Israel Koren and C. Mani Krishna

Meeting Points: Using Thread Criticality to Adapt Multicore Hardware to Parallel Regions, Qiong Cai, Jose Gonzalez, Ryan Rakvic, Grigorios Magklis, Pedro Chaparro and Antonio Gonzalez

6:30 – 9:30

Special Event: Visit the CN Tower

Reception at the Horizons Café and Dinner at the 360 Restaurant

Wednesday, October 29

8:00 - 8:30

Coffee Break

8:30 – 10:00

Session 8: Middleware and Runtime Systems

Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores, Matthew Curtis-Maury, Ankur Shah, Filip Blagojevic, Dimitrios S. Nikolopoulos, Bronis R. de Supinski and Martin Schulz

Mars: A MapReduce Framework on Graphics Processors, Bingsheng He, Wenbin Fang, Qiong Luo, Naga Govindaraju and Tuyong Wang

Multi-mode Energy Management for Multi-tier Server Clusters, Tibor Horvath and Kevin Skadron

10:00 – 10:30


10:30 – 12:00

Session 9: Programming the Memory Hierarchy

A Tuning Framework for Software-Managed Memory Hierarchies, Manman Ren, Ji Young Park, Mike Houston, Alex Aiken and William Dally

Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture, Marc Gonzalez, Nikola Vujic, Alexandre E. Eichenberger, Tong Chen, Xavier Martorell, Eduard Ayguadé, Zehra Sura, Tao Zhang, Kevin O'Brien and Kathryn O'Brien

COMIC: A Coherent Shared Memory Interface for Cell BE, Jaejin Lee, Sangmin Seo, Chihun Kim, Junghyun Kim, Posung Chun, Zehra Sura, Jungwon Kim and SangYong Han



Keynote #1: Norm Rubin, AMD - GPU Evolution: Will Graphics Morph Into Compute?


In the last several years, GPU devices have started to evolve into supercomputers. New non-graphics features are rapidly appearing, along with new, more general programming languages. One reason for the quick pace of change is that games and hardware evolve together: hardware vendors review the most popular games, looking for places to add hardware, while game developers review new hardware, looking for places to add more realism. Today, we see both GPU devices and games moving from a model of "looks real" to one of "acts real". One consequence of "acts real" is that evaluating physics, simulations, and artificial intelligence on a GPU is becoming an element of future game programs.


We will review the differences between a CPU and a GPU. Then we will describe the hardware changes added to the current generation of AMD graphics processors, including the introduction of traditional compute operations such as double precision, scatter/gather, and local memory. Along with new features, we have added new metrics like performance/watt and performance/dollar. The current AMD GPU processor delivers 9 gigaflops/watt and 5 gigaflops/dollar. For the last two generations, each AMD GPU has provided double the performance/watt of the prior machine. We believe the software community needs to become more aware of these metrics and to appreciate them.


Because this has been a process of co-evolution rather than radical change, current GPU devices have retained a number of odd-sounding transitional features, including fixed-function hardware such as memory systems that can do filtering, depth buffers, a rasterizer, and the like. Each of these remains today because it is important for graphics performance.


Software on GPU devices also shows transitional features. As AI- and physics-driven virtual reality starts to become important, development frameworks have started to shift; for example, graphics APIs have added compute shaders.


Finally, there has been a set of transitional programs implemented by graphics programmers whose only real connection with graphics is that the result is rendered. One early example is ToyShop, which contains a weak physical simulation of rain on a window (it looks great, but its random number generator would not pass any kind of statistical test). A more recent, better-acting program is March of the Froblins, an AI program related to robotic path calculations. It both simulates large crowds of independent creatures and shows how massively parallel compute can benefit character-centric entertainment.


Bio: Dr. Rubin is a Fellow at AMD, where he is the main architect of the AMD/ATI graphics compiler. He has built commercial compilers for processors ranging from embedded (ARM) to desktop (HP, Alpha) and supercomputer (KSR). He has published numerous papers on compiler design. Norm holds a PhD from the Courant Institute of NYU. Besides his work on compilers, he is well known for his work on compiler-related parts of the tool chain, such as binary translators and dynamic optimizers.


Keynote #2: Saman Amarasinghe, MIT - (How) Can Programmers Conquer the Multicore Menace?


The era of exponential improvement in processor performance, a byproduct of Moore’s Law, is over. Multicores are here to stay. While architects have known how to build parallel processors for over half a century, the main stumbling block has been the difficulty of programming them. In the first part of the talk, I will discuss the path to multicores, address why parallel programming has been such a difficult problem to solve, and speculate on our ability to crack it this time around.


One promising approach to parallel programming is the use of novel programming language techniques: ones that reduce the burden on programmers while simultaneously increasing the compiler's ability to achieve good parallel performance. In the second part of the talk, I will introduce StreamIt, a language and compiler specifically designed to expose and exploit the inherent parallelism in "streaming applications" such as audio, video, and network processing. StreamIt provides novel high-level representations that improve programmer productivity within the streaming domain. By exposing the communication patterns of the program, StreamIt allows the compiler to perform aggressive transformations and effectively utilize parallel resources. StreamIt is ideally suited for multicore architectures; recent experiments on a 16-core machine demonstrate an 11x speedup over a single core.


Bio: Saman P. Amarasinghe is an Associate Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). He currently leads the Commit compiler group and was co-leader of the MIT Raw project. Under Saman's guidance, the Commit group developed the StreamIt language and compiler for the streaming domain, Superword Level Parallelism for multimedia extensions, the DynamoRIO dynamic instrumentation system, Program Shepherding to protect programs against external attacks, and Convergent Scheduling and Meta Optimization, which use machine learning techniques to simplify the design and improve the quality of compiler optimizations. His research interests are in discovering novel approaches to improve the performance of modern computer systems and make them more secure without unduly increasing the complexity faced by end users, application developers, compiler writers, or computer architects. Saman was also the founder of Determina Corporation, which productized Program Shepherding. Saman received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively.