Advances in Distributed Systems
ECE 1746, Fall 2004
University of Toronto
Instructor: Ashvin Goel
Course Time: Tuesday, 1-3 pm
Course Room: Galbraith Building (GB) 120
Start Date: Sept 14, 2004
Course Description
The exponential growth of Internet services demonstrates the importance
and potential of large-scale distributed systems. Today, Web services
allow online shopping for virtually any product, from cheap second-hand
items to expensive art collections. Content delivery networks can
potentially speed up these services by cleverly caching Web pages.
Peer-to-peer applications allow sharing of content in ways that are
making industry nervous about its profit margins. Multimedia services
provide streaming delivery of audio and video. The list of new classes
of distributed applications that are becoming ubiquitous seems endless:
cluster computing, grid computing, game services, pervasive computing,
etc. In this scenario, a fundamental challenge is to provide scalable,
secure and robust services in the presence of best-effort communication
and unreliable nodes.
This graduate-level course focuses on distributed computing from a
systems software perspective. Students are expected to read and
critique recent research papers that cover some of the distributed
applications mentioned above and span areas such as operating systems
and networks. They are also expected to work on a
research project and make a presentation.
While there are no specific prerequisites for this course, students who
have taken undergraduate courses in operating systems, networks and
distributed systems will have an edge.
Textbooks
There are no required textbooks for this course. The optional
textbook is Distributed Systems: Concepts and Design (Third
Edition), by George Coulouris, Jean Dollimore and Tim Kindberg.
Published by Addison-Wesley, 2001. ISBN 0-201-61918-0.
Mailing List
Please subscribe to the class mailing list by joining this group. You
will need a Yahoo account, although Yahoo will forward the group
messages to any email address of your choice. The instructor will use
this group to send out assignments and reminders. All students who
subscribe to the group can send email to the group. The group is not
moderated. If you have a specific question for the instructor, please
send an email to the instructor directly. For the first week of classes,
you can join the group directly. After that you will need approval from
the instructor.
Grading Policy
Grades will be based on four components: the class presentation
(including the questions prepared for the discussion), the class project
and its presentation, the assignments, and class participation. There
will be no final exam in this course. The grading breakdown is as
follows:
- Class presentation: 20%
- Class project: 40%
- Assignments: 20%
- Class participation: 20%
Note: If a student is unable to attend a class, he or she will
lose 2% for non-participation. No exceptions.
Class Presentation
Each week this class will cover a group of papers that focuses on a
specific aspect of distributed systems. Students are expected to read
all the papers in the group that will be presented (the number of
presentations depends on the number of students in class). At the
beginning of the term, each paper will be assigned to a student who
will be presenting the paper. Presentations will be limited to 15
minutes.
More details about the presentation format. Please read very carefully.
Class Project
A major component of this course is devoted to a term-long project. The
topic of the final project is largely up to you, but to help you choose
a project, a sample list of projects is provided below. This list should
help students determine whether their own projects are of reasonable
size and scope.
More details about the project format.
Please read very carefully.
Project Ideas
Here is a list of project ideas.
Assignments
The instructor will assign short assignments at the end of some
classes. These assignments, which will consist of one or two questions
that have to be answered, will typically be a follow up to the
discussion in the class and will help students get a better grasp of
the material.
Assignments will be sent to students by email as well as posted on
this web site. They will be due at the next class. Students are expected
to submit a hard copy of the assignment. Please use typed text. Two to
four assignments will be given during the term.
Assignment 1
Example Review for Paper 1
Example Review for Paper 2
Assignment 2
Answer for Assignment 2
Readings
This is a tentative list. If a link to a paper is missing, please use a
search engine to find the paper.
Week 1: Introduction (Sept 14)
- Introduction to Distributed Systems.
Instructor.
- Efficient Reading of Papers in Science and Technology. Michael J.
Hanson, Dylan J. McNamee.
- How (and How Not) to Write
a Good Systems Paper. Roy Levin, David
D. Redell. Operating Systems Review 17(3), July 1983.
Week 2: Fault Tolerance (Sept 21)
- Path-Based
Failure and Evolution Management. Mike Y. Chen, Anthony Accardi,
Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox, and Eric Brewer.
NSDI 2004. Student
Presenter: Tomasz Czajkowski.
- FUSE:
Lightweight Guaranteed Distributed Failure Notification. John
Dunagan, Nicholas J. A. Harvey, Michael B. Jones, Dejan Kostic, Marvin
Theimer, and Alec Wolman. OSDI 2004. Student Presenter: Thomas Liu.
Optional papers:
- Using Fault
Injection and Modeling to Evaluate the Performability of Cluster-Based
Services. Kiran Nagaraja, Xiaoyan Li, Ricardo Bianchini, Richard P.
Martin, and Thu D. Nguyen. USITS 2003.
- A Microrebootable
System-Design, Implementation, and Evaluation. George Candea,
Shinichi Kawamoto, Yuichi Fujiki, Greg
Friedman, and Armando Fox. OSDI 2004.
- Why
Do Internet Services Fail, and What Can Be Done About It? David
Oppenheimer, Archana Ganapathi, and David A.
Patterson. USITS 2003. Slides.
Week 3: Naming (Sept 28)
- Network-Sensitive
Service Discovery. An-Cheng Huang and Peter
Steenkiste. NSDI 2004.
Student Presenter: Mehrdad Ariannejad.
- The
Design and Implementation of a Next Generation Name
Service for the Internet. Venugopalan Ramasubramanian, Emin Gun
Sirer.
SIGCOMM 2004. Student Presenter: Frank
Plavec.
Optional papers:
- A Layered Naming Architecture for the Internet. Hari Balakrishnan,
Karthik Lakshminarayanan, Sylvia Ratnasamy, Scott Shenker, Ion Stoica,
Michael Walfish. SIGCOMM 2004.
- Untangling
the Web from DNS. Michael Walfish, Hari Balakrishnan,
and Scott Shenker. NSDI 2004.
Week 4: File and Storage Systems (Oct 5)
- Google
File System. Sanjay Ghemawat, Howard Gobioff, Shun-Tak
Leung. SOSP 2003. Student Presenter: Rita
Chiu.
- Secure
Untrusted Data Repository. Jinyuan Li, Maxwell Krohn,
David Mazières, and Dennis Shasha. OSDI 2004. Student Presenter:
Zheng Li.
Optional papers:
- Explicit
Control in the Batch-Aware Distributed File System. John
Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau,
and Miron Livny. NSDI 2004.
Week 5: Resource Management (Oct 12)
- Resource Overbooking and Application Profiling in Shared Hosting
Platforms. Bhuvan Urgaonkar, Prashant Shenoy, and Timothy Roscoe.
OSDI 2002. Student Presenter: Antonio Wang.
- Adaptive
Overload Control for Busy Internet Servers. Matt Welsh
and David Culler. USITS 2003. Student Presenter: Chuan Wu.
Optional papers:
- Integrated
Resource Management for Cluster-based Internet
Services. Kai Shen, Hong Tang, Tao Yang, Lingkun Chu. OSDI
2002.
- SHARP: An Architecture for Secure Resource Peering. Yun Fu, Jeffrey
Chase, Brent Chun, Stephen Schwab, and Amin Vahdat. SOSP 2003.
Week 6: Replication (Oct 19)
- FARSITE:
Federated, Available, and Reliable Storage for an
Incompletely Trusted Environment.
Atul Adya, William J. Bolosky,
Miguel
Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell,
Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer. OSDI 2002.
Student Presenter: Jing Su.
- Consistent
and Automatic Replica Regeneration. Haifeng Yu, Amin
Vahdat. NSDI 2004. Student Presenter: Kevin
Yuen.
- The Dangers of Replication and a Solution. J. Gray, P. Helland, P.
O'Neil, and D. Shasha. SIGMOD 1996. Student Presenter: Kaloian Manassiev.
Week 7: Recovery (Oct 26)
- TimeLine:
A High Performance Archive for a Distributed Object
Store. Chuang-Hue Moh and Barbara Liskov. NSDI 2004. Student
Presenter: Vinod Muthusamy.
- Undo for Operators: Building an Undoable E-mail Store. Aaron Brown
and David Patterson. USENIX 2003. Student Presenter: Mark Jackman.
Optional papers:
- Self-Repairing
Computers. Armando Fox and David Patterson. Scientific American
2004.
Week 8: Automated Management (Nov 2)
- Total
Recall: System Support for Automated Availability
Management. Ranjita Bhagwan, Kiran Tati, Yu-Chung Cheng, Stefan
Savage,
and Geoffrey M. Voelker. NSDI 2004. Student Presenter: Thomas Liu.
- Automatic Misconfiguration Troubleshooting with PeerPressure.
Helen J. Wang, John Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang. OSDI
2004. Student Presenter: Mahsa Moallem.
Optional papers:
- Understanding and Dealing
with Operator Mistakes in Internet
Services. Kiran Nagaraja, Fabio Oliveira, Ricardo Bianchini,
Richard P.
Martin, and Thu D. Nguyen. OSDI 2004.
- Correlating Instrumentation Data to System States: A Building
Block for Automated Diagnosis and Control. Ira Cohen, Jeff Chase,
Moises Goldszmidt, Terence Kelly, and Julie Symons. OSDI 2004.
- Using Magpie for Request Extraction and Workload Modelling. Paul
Barham, Austin Donnelly, Rebecca Isaacs, Richard Mortier. OSDI 2004.
- STRIDER:
A Black-box, State-based Approach to Change and
Configuration Management and Support. Yi-Min Wang, Chad Verbowski,
John Dunagan, Yu Chen, Helen
J. Wang, Chun Yuan, and Zheng Zhang. LISA 2003.
Week 9: Network Performance (Nov 9)
- Vivaldi:
A Decentralized Network Coordinate System. Frank Dabek,
Russ Cox, Frans Kaashoek, Robert Morris. SIGCOMM 2004. Student
Presenter: Mehrdad Ariannejad.
- Locating
Internet Bottlenecks: Algorithms, Measurements and
Implications. Ningning Hu, Li Erran Li, Zhuoqing Morley Mao, Peter
Steenkiste, Jia Wang. SIGCOMM 2004. Student Presenter: Dapeng Gao.
Optional papers:
- The
Effectiveness of Request Redirection on CDN Robustness. Limin
Wang, Vivek Pai, and Larry Peterson. OSDI 2002.
Week 10: Peer-to-Peer Networks (Nov 16)
- Chord: A
Scalable Peer-to-Peer Lookup Service for Internet
Applications. Ion Stoica, Robert Morris, David Karger, M. Frans
Kaashoek, Hari Balakrishnan. SIGCOMM 2001. Student Presenter: Gregory Hartl.
- Making
Gnutella-like P2P Systems Scalable. Yatin Chawathe, Sylvia
Ratnasamy, Lee Breslau, Nick Lanham, Scott Shenker. SIGCOMM 2003.
Student Presenter: Trevor Armstrong.
Optional papers:
- Handling Churn in a DHT. Sean Rhea, Dennis Geels, Timothy Roscoe,
John Kubiatowicz. USENIX 2004.
- Modeling and Performance Analysis of BitTorrent-Like
Peer-to-Peer Networks. Dongyu Qiu, R. Srikant. SIGCOMM 2004.
Week 11: Fairness (Nov 23)
- Sprite:
A Simple, Cheat-Proof, Credit-Based System for Mobile Ad-Hoc Networks. Sheng
Zhong, Jiang Chen, Yang Richard Yang. Infocom 2003. Student Presenter: Alex Varshavsky.
- Performance Analysis of the CONFIDANT Protocol
(Cooperation Of Nodes: Fairness In Dynamic Adhoc
NeTworks). Sonja Buchegger, Jean-Yves Le Boudec. MobiHoc 2002.
Student Presenter: Alex Varshavsky.
Week 12: Multicast (Nov 30)
- The
Feasibility of Supporting Large-Scale Live Streaming
Applications with Dynamic Application End-Points. Kunwadee
Sripanidkulchai, Aditya Ganjam, Bruce Maggs, Hui Zhang. SIGCOMM 2004.
Student Presenter: Nazar Abbaz.
- SplitStream: High-Bandwidth Multicast in Cooperative
Environments. Miguel Castro, Peter Druschel, Anne-Marie Kermarrec,
Animesh Nandi, Antony Rowstron, Atul Singh. SOSP 2003. Student
Presenter: Mea Wang.
Optional papers:
- Bullet: High
Bandwidth Data Dissemination Using an Overlay Mesh.
Dejan Kostic, Adolfo Rodriguez, Jeannie Albrecht, Amin Vahdat. SOSP
2003.
Week 13: Project Presentation (Dec 9)