ECE 1747H Parallel Programming
Fall 2022


Cristiana Amza
BA 4142
(416) 946-0299
amza at


Seyed Ali Jokar


WB 219

Class Time:

Thursday 3:00-5:00 PM

Project List:


Office Hours:

Fridays 2:00-3:00 PM

Important Information about Project


- Sep. 14, 2022:  Website Launched.

Please enroll on Piazza at:


This is an intermediate graduate course in parallel programming. In the first part of the course, we will briefly introduce the architecture of parallel systems and the concepts of data dependencies and races. The three most commonly used parallel programming paradigms (shared memory, distributed memory, and data parallel) will then be examined in detail. An overview of automatic parallelization of programs and the use of parallel processing in related domains, such as parallel and distributed database transaction processing, will also be given.
In the second part of the course, selected research topics will be examined through student-led discussions of relevant research papers. A research-intensive project in an area related to program parallelization is a fundamental part of the course. Projects can be done individually or in small teams of two or three people, and the project outcome will be presented in a class session at the end of the semester. A list of suggested research projects has been posted (project_suggestions.txt). Students are also encouraged to propose their own projects and discuss them with me. Please also read:
Class Goals and Advice from Instructor


Textbooks and Pre-requisites

There is no required textbook for the class. You should be fine with the lectures and papers posted on this site. However, here are some suggestions for additional reading:

Parallel Programming in C with MPI and OpenMP
by Michael J. Quinn

Threads Primer: A Guide to Multithreaded Programming
by Bil Lewis, Daniel J. Berg

Concurrency Control and Recovery in Database Systems
by Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (a free online edition is available for download as a PDF)

A basic understanding of operating system principles and computer architecture, plus some knowledge of network programming, will be helpful. These are not strict prerequisites, though; most of the necessary material will be covered in class.

Help for Project

Sample project report: Link to Sample Project Report.

Help for Programming Assignments

Pthread program examples have been posted here: Code Examples.

Please also consult this pthread reference manual

Programming Assignments



Here is a list of the papers that students will present. Please select the paper you would like to present using this link. SELECTION IS FIRST COME, FIRST SERVED.




Due Dates

Sep 15

Introduction Slides-part1 Slides-part2



Sep 22

Parallel Programming and Optimizations Pthreads OpenMP Slides



Sep 29 (1)

Other Parallel Programming Paradigms MPI

Tutorial on MapReduce

Sep 29 (2)

Parallel Programming and Optimizations Project ideas: TM/Games
Brain Pipeline

1. Locality Aware Dynamic Load Management for Massively Multiplayer Games, Jin Chen, Baohua Wu, Margaret Delap, Bjorn Knutsson, Honghui Lu and Cristiana Amza, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2005), June 2005. slides

Paper summaries

Oct 6

Lock Synchronization and Optimization


2. Parallelization and Performance of Interactive Multiplayer Game Servers, Ahmed Abdelkhalek and Angelos Bilas. In Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), April 2004

3. Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-peer Games, Ashwin Bharambe, John R. Douceur, Jacob R. Lorch, Thomas Moscibroda, Jeffrey Pang, Srinivasan Seshan, and Xinyu Zhuang, SIGCOMM, 2008.

4. Algorithms for scalable synchronization on shared-memory multiprocessors, John M. Mellor-Crummey and Michael L. Scott. ACM Transactions on Computer Systems, 9(1):21-65, February 1991.

Paper summaries

Oct 13

Software Distributed Shared Memory

5. Memory Coherence in Shared Virtual Memory Systems, Kai Li and Paul Hudak, ACM Transactions on Computer Systems, 7(4), November 1989. ivy91.pdf

6. Implementation and Performance of Munin. John Carter, John Bennett, and Willy Zwaenepoel

7. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems, P. Keleher, A.L. Cox, S. Dwarkadas and W. Zwaenepoel, OSDI '94.

Paper summaries

Oct 20  

Programming Paradigms for GPUs: CUDA and OpenCL


8. OpenMP for Networks of SMPs, Y.C. Hu, H. Lu, A.L. Cox, and W. Zwaenepoel, Journal of Parallel and Distributed Computing, vol. 60 (12), pp. 1512-1530, December 2000.

9. A Performance Study of General-Purpose Applications on Graphics Processors Using CUDA, Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Kevin Skadron, JPDC, Volume 68, Issue 10 (General-Purpose Processing Using Graphics Processing Units), October 2008.

10. From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming, Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson, and Jack Dongarra, Parallel Computing 38, no. 8 (2012): 391-407.

11. A massively parallel adaptive fast-multipole method on heterogeneous architectures, Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros, IEEE Supercomputing 2009.

Paper summaries

Oct 27

Multithreading vs Event-Driven model for Server Code

12. Flash: An Efficient and Portable Web Server, Vivek S. Pai, Peter Druschel, Willy Zwaenepoel, USENIX Annual Technical Conference, 1999.

13. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services, Matt Welsh, David Culler, and Eric Brewer. Presented at the Eighteenth Symposium on Operating Systems Principles (SOSP'01), Lake Louise, Canada, October 24, 2001.

14. Adaptive Overload Control for Busy Internet Servers, Matt Welsh and David Culler. In Proceedings of the 4th USENIX Conference on Internet Technologies and Systems (USITS'03), March 2003.

Paper summaries

Nov 3

Dynamic Content Scheduling and TM

15. Lazy Asynchronous I/O for Event-Driven Servers, Khaled Elmeleegy, Anupam Chanda, Alan L. Cox and Willy Zwaenepoel, in Proceedings of the USENIX 2004 Annual Technical Conference.

16. Software Transactional Memory for Dynamic-Sized Data Structures, Maurice Herlihy, Victor Luchangco, Mark Moir, William N. Scherer III, PODC 2003

17. Conflict-aware scheduling for dynamic content applications, Cristiana Amza, Alan Cox and Willy Zwaenepoel, Usenix USITS 2003.

Paper summaries

Nov 10

Reading week; no class

Nov 17

Transactional Memory

Intel Haswell TSX

TSX Overview Slides

18. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime, Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Benjamin Hertzberg. PPoPP 2006

19. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, NSDI 2012.

20. Exploiting Distributed Version Consistency in a Transactional Memory Cluster, Kaloian Manassiev, Madalin Mihailescu and Cristiana Amza. PPoPP 2006

Paper summaries



Project class presentation



Final project report due (just the report and any links, not the code, please)