ECE 1747H Parallel Programming
Fall 2014


Cristiana Amza
BA 4142
(416) 946-0299


Sahel Sharify
BA 4187


BA 4164

Class Time:

Mondays 3:00-5:00 PM

Project List:


Office Hours:

Mondays 1:00-3:00 PM (BA 4142)

Class Project Description


- Oct. 22, 2014:  Please find the updated time of project class presentations below.

- Sep. 11, 2014:  Papers are assigned. You may find your names in the table.

- Sep. 9, 2014:  Welcome to this class. This Web page contains the main info for this course. The first item on our agenda is choosing a paper to present. As mentioned yesterday in class, please e-mail me and my TA (Sahel Sharify) your top three paper selections, in order of preference. I would also appreciate it if you could go to the doodle link that is posted here and fill in three or more paper selections. Hopefully, with all this info, we can assign each of you a paper to present. Thank you to the people who have already done this. Papers will be assigned in FIFO order of our receiving your preferences, and the doodle will be used for people we have difficulty assigning after the first round of e-mails is in. A final paper assignment, with each student's name assigned to a paper and possibly some slight reshuffling of the paper presentation schedule, will be posted after all selections are in. Thank you so much for your collaboration on this, and I look forward to teaching you in this class!

- Sep. 8, 2014:  You may sign up for paper presentations @ Presentation Poll.



This course is an intermediate graduate course in the area of parallel programming. In the first part of the course, we will briefly introduce the architecture of parallel systems and the concept of data dependencies/races. The three most commonly used parallel programming paradigms (shared memory, distributed memory and data parallel) will then be examined in detail. An overview of automatic parallelization of programs and the use of parallel processing in related domains, such as parallel and distributed database transaction processing, will also be given.
In the second part of the course, selected research topics will be examined. This part of the course consists of student-led discussions of relevant research papers. A research-intensive group project in an area related to program parallelization is a fundamental part of the course. The projects can be done individually or in small teams of two or three people. The project outcome will be presented in a class session at the end of the semester. A list of suggested research projects has been posted (project_suggestions.txt). Students are also encouraged to propose their own projects and discuss them with me. Please also read:
Class Goals and Advice from Instructor


Textbooks and Pre-requisites

There is no required textbook for the class. You should be fine with the lectures and papers posted on this site. However, here are some suggestions for additional reading:

Parallel Programming in C with MPI and OpenMP
by Michael J. Quinn

Threads Primer: A Guide to Multithreaded Programming
by Bil Lewis, Daniel J. Berg

Concurrency Control and Recovery in Database Systems
by Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (a free on-line edition is available for download in .pdf)

It would help to have a basic understanding of operating system principles, basic architecture, and some knowledge of network programming. These are not strict pre-requisites, though; most of the necessary material will be covered in class.

Help for Project

Sample project report: Link to Sample Project Report.

Help for Programming Assignments

Pthread program examples have been posted here: Code Examples.

Please also consult this pthread How to use guide and this pthread reference manual

Programming Assignments

The first programming assignment is out: Assignment 1.

MPI program examples have been posted here: Code Examples. Please also consult this MPI How to use guide  







Due Dates

Sep 8

Intro and project suggestion Slides-part1 Slides-part2



Sep 15

Parallel Programming and Optimizations Pthreads OpenMP Slides



Sep 22

Parallel Programming and Optimizations Project ideas: TM/Games
Answer to Challenge
Brain Pipeline

1. Locality Aware Dynamic Load Management for Massively Multiplayer Games, Jin Chen, Baohua Wu, Margaret Delap, Bjorn Knutsson, Honghui Lu and Cristiana Amza, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2005), June 2005
Tanzim Mokammel

Paper summaries

Sep 29

Lock Synchronization and Optimization


2. Parallelization and Performance of Interactive Multiplayer Game Servers, Ahmed Abdelkhalek and Angelos Bilas. In Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), April 2004
Peter Goodman

3. Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-peer Games, Ashwin Bharambe, John R. Douceur, Jacob R. Lorch, Thomas Moscibroda, Jeffrey Pang, Srinivasan Seshan, and Xinyu Zhuang, SIGCOMM, 2008.
Yasser Khan

Paper summaries

Oct 6

Distributed Applications and Environments

Informal oral project proposals.


4. Algorithms for scalable synchronization on shared-memory multiprocessors, John M. Mellor-Crummey and Michael L. Scott. ACM Transactions on Computer Systems, 9 (1):21-65, February 1991.

Tutorial on MapReduce

5. CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications
Kevin Murray

Proposal & Paper summaries

Oct 20

Software Distributed Shared Memory

Informal oral project proposals (contd).


6. Memory Coherence in Shared Virtual Memory Systems, Kai Li, Paul Hudak, 1991 ivy91.pdf
Hatif Sattar

7. Implementation and Performance of Munin. John Carter, John Bennett, and Willy Zwaenepoel
Zohaib Alam

8. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems, P. Keleher, A.L. Cox, S. Dwarkadas and W. Zwaenepoel, OSDI '94.
Haoyuan Liu

Short Introduction to Event-Driven Servers by Instructor

Proposal & Paper summaries

Oct 27

Programming Paradigms for new Environments: OpenMP on Clusters, CUDA, and OpenCL for GPUs

9. OpenMP for Networks of SMPs, Y.C. Hu, H. Lu, A.L. Cox, and W. Zwaenepoel, Journal of Parallel and Distributed Computing, vol. 60 (12), pp. 1512-1530, December 2000
Dhaval Miyan

10. From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming, Du Peng, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson, and Jack Dongarra, Parallel Computing 38, no. 8 (2012): 391-407.
Wenbo Ren

Paper summaries

Nov 3 

Multithreading vs Event-Driven model for Server Code


11. A Performance Study of General-Purpose Applications On Graphics Processors Using CUDA, Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Kevin Skadron, JPDC, Volume 68, Issue 10, General-Purpose Processing Using Graphics Processing Units, October 2008.
Xiaoyong Jiang

12. Productivity of GPUs Under Different Programming Paradigms, Maria Malik, Teng Li, Umar Sharif, Rabia Shahid, Tarek El-Ghazawi, Greg Newby, Concurrency and Computation: Practice and Experience 2012.
Gary Chaw

13. Porting a neuro-imaging application to a CPU-GPU cluster, R. S. Nakhjavani, S. Sharify, A. B. Hashemi, A. W. Lu, C. Amza, and S. Strother, High Performance Computing & Simulation (HPCS), International Conference on, pp. 137-145. IEEE, 2014.
Andrew Lee

Paper summaries

Nov 10

Advanced Synchronization Mechanisms in Multiprocessor and Distributed Systems

13. Flash: An Efficient and Portable Web Server, Vivek S. Pai, Peter Druschel, Willy Zwaenepoel, USENIX Annual Technical Conference, 1999.
Zhixu Han

14. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services, presented at the Eighteenth Symposium on Operating Systems Principles (SOSP'01), Lake Louise, Canada, October 24, 2001.
Alan Ng

15. Adaptive Overload Control for Busy Internet Servers, Matt Welsh and David Culler. In Proceedings of the 4th USENIX Conference on Internet Technologies and Systems (USITS'03), March 2003.
Dustin Kut Moy Cheung

16. Lazy Asynchronous I/O for Event Driven Servers, Khaled Elmeleegy, Anupam Chanda, Alan L. Cox and Willy Zwaenepoel, in Proceedings of the USENIX 2004 Annual Technical Conference.
Optional reading

Paper summaries

Nov 17

Nonblocking Synchronization and TM

17. Code Transformations to Improve Memory Parallelism, Vijay S. Pai and Sarita Adve
David Carney

18. Read Copy Update: Using Execution History to Solve Concurrency Problems, McKenney P. and Slingwine J.
Francis Deslauriers

19. Software Transactional Memory for Dynamic-Sized Data Structures, Maurice Herlihy, Victor Luchangco, Mark Moir, William N. Scherer III. PODC 2003
Yi Xin Wang

Paper summaries

Nov 24

Nonblocking Synchronization and TM

20. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime, Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Benjamin Hertzberg. PPoPP 2006
Venkatesh Mahadevan

21. Exploiting Distributed Version Consistency in a Transactional Memory Cluster, Kaloian Manassiev, Madalin Mihailescu and Cristiana Amza. PPoPP 2006
Ahmadul Hassan

Paper summaries

Dec 1

Transactional Memory

Intel Haswell TSX

TSX Overview Slides

22. A Case for Staged Database Systems, Stavros Harizopoulos and Anastassia Ailamaki
Yunlei Zhang

23. Transactional Memory Support for Scalable and Transparent Parallelization of Multiplayer Games, Daniel Lupei, Bogdan Simion, Don Pinto, Matthew Misler, Mihai Burcea, William Krick and Cristiana Amza.
Abhishek Rudra

24. Scheduling Support for Transactional Memory Contention Management, Maldonado et al. PPoPP 2010

Paper summaries

Dec 4


Project class presentation

Time: 10 a.m. - 4 p.m., Location: BA 2135


Dec 22

Final project report due (by e-mail to me, just the report please)