Instructor: Ashvin Goel
Course Number: ECE1781H
Course Time: Fri, 1-3 pm
Course Room: BAB025
Start Date: Sept 13, 2013

Dependable Software Systems

ECE1781, Fall 2013
University of Toronto


Project Ideas

Some suggested projects are described below. Please talk to the instructor for more details about any of them, and make sure to get the instructor's confirmation before starting a project.

It is important for you to have thought about the following questions regarding each project before starting any design and implementation: 1) what problem are you addressing, 2) what is interesting/novel about your approach, 3) what metrics and testing method will you use for evaluation, and 4) what results do you expect from the evaluation.

  1. Finding Kernel Bugs with Dynamic Binary Instrumentation

    The goal of this project is to find memory-related bugs in the Linux kernel using a binary instrumentation system. A binary instrumentation system enables monitoring and manipulating every instruction in an executing binary. Binary instrumentation systems have been used for developing bug-finding and security tools. For example, Memcheck uses binary instrumentation to detect various types of memory errors dynamically, such as accessing memory after it has been freed. The instructor's group has been developing a binary instrumentation system called Granary for the Linux operating system. In this project, your aim is to detect memory errors, such as accessing undefined values and dereferencing undefined pointers.
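
    As a rough illustration of the kind of checking involved, the sketch below models shadow memory in plain Python: every address carries a state, and each instrumented allocation, free, read, or write updates or checks that state. All names here (`ShadowMemory`, `on_alloc`, and so on) are hypothetical and are not part of Granary or Memcheck. Real tools are considerably more precise than this model, for example in how they track and report undefined values.

```python
# Hypothetical shadow-memory model; illustrative only, not Granary's or Memcheck's API.
from enum import Enum


class State(Enum):
    UNALLOCATED = 0  # never allocated
    UNDEFINED = 1    # allocated but not yet written
    DEFINED = 2      # allocated and written
    FREED = 3        # was allocated, then freed


class ShadowMemory:
    def __init__(self):
        self.state = {}   # address -> State
        self.errors = []  # (kind, address, state name)

    def on_alloc(self, base, size):
        for addr in range(base, base + size):
            self.state[addr] = State.UNDEFINED

    def on_free(self, base, size):
        for addr in range(base, base + size):
            self.state[addr] = State.FREED

    def on_write(self, addr):
        s = self.state.get(addr, State.UNALLOCATED)
        if s in (State.UNALLOCATED, State.FREED):
            self.errors.append(("invalid write", addr, s.name))
        else:
            self.state[addr] = State.DEFINED

    def on_read(self, addr):
        s = self.state.get(addr, State.UNALLOCATED)
        if s is not State.DEFINED:
            self.errors.append(("invalid read", addr, s.name))


shadow = ShadowMemory()
shadow.on_alloc(0x1000, 4)
shadow.on_read(0x1000)     # flagged: value is still undefined
shadow.on_write(0x1000)
shadow.on_read(0x1000)     # fine: value is now defined
shadow.on_free(0x1000, 4)
shadow.on_read(0x1000)     # flagged: access after free
for err in shadow.errors:
    print(err)
```

    In a kernel setting, the `on_*` hooks would be driven by the instrumentation system rather than called by hand, and the per-address dictionary would have to be replaced by a far more compact shadow representation.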
  2. Device Driver Isolation with Dynamic Binary Instrumentation

    The goal of this project is to isolate device drivers in the Linux kernel using a binary instrumentation system. Device drivers constitute the majority of code in most operating systems and tend to be the least reliable or trustworthy because they are developed by third-party developers. In this project, your aim is to use the Granary binary instrumentation system (described in the previous project) to isolate drivers so that they cannot divert control flow to arbitrary operating system code or access arbitrary kernel data, similar to the byte-granularity isolation paper. In particular, you should ensure that critical kernel data structures cannot be directly modified by driver code.
  3. Race Detection for OS kernels

    The goal of this project is to detect races in the Linux kernel. You can choose an existing algorithm that we have discussed in class (e.g., Lockset, DataCollider, CTrigger) and use tools such as the Granary binary instrumentation system (described in the first project) or LLVM to instrument the kernel. What test framework will you use to trigger races? How will you detect races? Is it easy to replicate bugs that are found? Will the system be used in production or offline?
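
    To make the lockset idea concrete, here is a minimal sketch of the basic refinement step in plain Python: for each shared variable, keep the set of locks that were held on every access so far, and warn when that set becomes empty. The class and method names are hypothetical, and the sketch omits the refinements a practical detector needs (for example, handling initialization and read-only sharing), so it will report more false positives than a real tool.

```python
# Basic lockset refinement; a simplified sketch with hypothetical names.
class LocksetDetector:
    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.warnings = []    # variables with an empty candidate set

    def acquire(self, tid, lock):
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid, lock):
        self.held.setdefault(tid, set()).discard(lock)

    def access(self, tid, var):
        locks = self.held.get(tid, set())
        if var not in self.candidates:
            # First access: every lock held now is a candidate.
            self.candidates[var] = set(locks)
        else:
            # Later accesses: keep only locks held this time as well.
            self.candidates[var] &= locks
        if not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)


d = LocksetDetector()

# "counter" is accessed under lock A by thread 1 and under lock B by thread 2.
d.acquire(1, "A"); d.access(1, "counter"); d.release(1, "A")
d.acquire(2, "B"); d.access(2, "counter"); d.release(2, "B")

# "x" is consistently protected by lock A.
d.acquire(1, "A"); d.access(1, "x"); d.release(1, "A")
d.acquire(2, "A"); d.access(2, "x"); d.release(2, "A")

print(d.warnings)
```

    Only `counter` ends up with an empty candidate set, because no single lock protected both of its accesses. In the kernel, the `acquire`, `release`, and `access` events would come from instrumented lock operations and memory accesses.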
  4. Bug Detection Using Symbolic Execution

    Several papers in the reading list use symbolic execution for detecting bugs (KLEE, Execution Synthesis). In this project, you will use an available symbolic execution tool called S2E to detect bugs in some simple programs.
  5. File-System Aware Storage Virtualization

    A storage virtualization system abstracts the physical location of data by presenting logically contiguous storage (e.g., a partition) that maps to physically separated storage (e.g., a disk array or disks on machines in a cluster). This separation allows greater flexibility for storage management (e.g., storage migration) and can help improve reliability and performance.

    The goal of this project is to allow storage virtualization systems to take advantage of file system semantics. Today, storage virtualization systems are unaware of the software running above them. Hints from file systems and storage applications can help improve the reliability and performance of storage systems. For example, a file system could provide a hint that some blocks are critical and thus should be multiply replicated (or not deduplicated) or other blocks are unlikely to be read in the near future and can be evicted from the storage cache. In this project, you will choose some hints that can be provided by the file system layer to improve the reliability of the storage system. The instructor's group can provide code that will help you get started. How will you evaluate such a system?
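
    As one possible shape for such a hint, the sketch below lets the file system tag a block as critical on write, and the virtualization layer then keeps extra replicas of that block. Everything here is an assumption made for illustration: the `Hint` values, the replica counts, the placement rule, and the in-memory "devices" are hypothetical and are unrelated to the code the instructor's group can provide.

```python
# Hint-driven replication in a toy virtual volume; all names and policies are assumed.
from enum import Enum


class Hint(Enum):
    NORMAL = 1
    CRITICAL = 2  # e.g., file system metadata


REPLICAS = {Hint.NORMAL: 1, Hint.CRITICAL: 3}  # assumed policy


class HintedVolume:
    def __init__(self, num_devices):
        # Each device is modelled as a dict: block number -> data.
        self.devices = [dict() for _ in range(num_devices)]

    def write(self, block, data, hint=Hint.NORMAL):
        # Drop stale copies first, in case the hint changed since the last write.
        for dev in self.devices:
            dev.pop(block, None)
        copies = min(REPLICAS[hint], len(self.devices))
        for i in range(copies):
            self.devices[(block + i) % len(self.devices)][block] = data

    def read(self, block):
        # Try the primary location first, then any device holding a replica.
        for i in range(len(self.devices)):
            dev = self.devices[(block + i) % len(self.devices)]
            if block in dev:
                return dev[block]
        raise KeyError(block)

    def fail_device(self, index):
        # Simulate losing every block on one device.
        self.devices[index].clear()


vol = HintedVolume(3)
vol.write(0, b"superblock", Hint.CRITICAL)
vol.write(1, b"file data")
vol.fail_device(0)
print(vol.read(0))  # still readable from a replica
```

    An evaluation could then inject device failures or block corruption and compare how much data survives with and without hints, against the extra space and write traffic the replicas cost.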
  6. Improving the Reliability of File Systems With Online Consistency Checking

    The goal of this project is to ensure that file system and kernel bugs do not cause corruption of file system metadata. As a result, an offline file system check does not have to be run even if the kernel or the file system is arbitrarily buggy. The instructor's group has worked on this project and shown the feasibility of this approach for the Linux ext3 and btrfs file systems. In this project, the same technique would be applied to a file system designed specifically for flash devices such as the Linux F2FS file system. Talk to the instructor for details.
  7. Improving the Reliability of Databases With Online Consistency Checking

    The goal of this project is to ensure that database bugs do not cause corruption of a database image. The instructor's group has worked on a related project for file systems (described in the previous project) and shown the feasibility of this approach for the Linux ext3 file system. Some students have previously started applying this approach to the open-source SQLite database system. In this project, you will extend and complete this work. Talk to the instructor for details.
  8. N-Version File Systems

    The goal of this project is to improve the reliability of file systems in the face of hardware and file system bugs. One option is to take advantage of the fact that different file systems handle failures differently. As a result, a simple fault tolerance method would be to replicate all file system operations to two different file systems and detect errors based on comparing the outputs of the operations.
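
    The replicate-and-compare idea can be sketched with two toy in-memory "file systems" behind a common interface. The classes below are hypothetical stand-ins, and the bug in `TruncatingFS` is injected purely for the demonstration. A real implementation would sit at a layer such as the VFS or FUSE and would have to decide which outputs are legitimately allowed to differ between file systems.

```python
# Replicate each operation to several backends and compare results; toy sketch.
class DictFS:
    """Toy in-memory file system."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)

    def read(self, path):
        return self.files[path]


class TruncatingFS(DictFS):
    """Same interface with an injected bug: silently drops data past 8 bytes."""

    def write(self, path, data):
        self.files[path] = data[:8]
        return len(data)


class DivergenceError(Exception):
    pass


class NVersionFS:
    def __init__(self, *backends):
        self.backends = backends

    def _call(self, op, *args):
        # Run the same operation on every backend and compare the outputs.
        results = [getattr(b, op)(*args) for b in self.backends]
        if any(r != results[0] for r in results[1:]):
            raise DivergenceError(f"{op}{args}: {results}")
        return results[0]

    def write(self, path, data):
        return self._call("write", path, data)

    def read(self, path):
        return self._call("read", path)


fs = NVersionFS(DictFS(), TruncatingFS())
fs.write("/a", b"short")
print(fs.read("/a"))            # both backends agree
fs.write("/b", b"0123456789")   # the buggy backend silently truncates
try:
    fs.read("/b")
except DivergenceError as e:
    print("divergence detected:", e)
```

    With two backends, a mismatch shows that something is wrong but not which copy is correct; choosing the right answer automatically would need a third version or some other tie-breaker.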
  9. Recovery via Restarting Applications

    The "Microreboot" paper described a method by which parts of an application are rebooted to allow recovery of the application. This approach gets rid of faulty state in the application. In this project, you will choose an application and implement a recovery via "reboot" method for it. You need to make sure that the persistent data in the application is not lost. For example, for a content download application (e.g., BitTorrent), the music repository must not be lost. Similarly, for an instant messaging application (e.g., gaim), the received messages should not be lost. How fine is your reboot granularity? Can you tune it? How often is reboot possible? What types of faults or bugs can the reboot handle? How does the reboot affect user perception? Would you change the application design based on your experience with micro-reboot based restarting?
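
    The separation between restartable components and persistent data can be sketched as follows, using the instant messaging example. The names (`MessageLog`, `ChatSession`, `Supervisor`) are hypothetical, the fault is injected for the demonstration, and the in-memory log merely stands in for storage that would really live on disk. This is a simplified illustration of the general idea and not the mechanism from the paper.

```python
# Restart one component while keeping persistent data outside it; toy sketch.
class MessageLog:
    """Stands in for persistent storage that outlives any component restart."""

    def __init__(self):
        self.messages = []


class ChatSession:
    """Restartable component: its own fields are volatile."""

    def __init__(self, log):
        self.log = log
        self.unread = 0  # volatile, safe to lose on restart

    def receive(self, text):
        if text is None:  # injected fault for the demo
            raise RuntimeError("corrupt message")
        self.log.messages.append(text)  # persist before updating volatile state
        self.unread += 1


class Supervisor:
    def __init__(self, factory):
        self.factory = factory
        self.component = factory()
        self.restarts = 0

    def call(self, method, *args):
        try:
            return getattr(self.component, method)(*args)
        except Exception:
            # "Reboot" only this component; persistent data is untouched.
            self.component = self.factory()
            self.restarts += 1
            return None


log = MessageLog()
sup = Supervisor(lambda: ChatSession(log))
sup.call("receive", "hi")
sup.call("receive", None)   # triggers a component restart
sup.call("receive", "bye")
print(log.messages, sup.restarts, sup.component.unread)
```

    The received messages survive the restart, while the unread counter is reset. Deciding which state belongs on each side of that line is a large part of the design work in this project.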
  10. Application-Level Undo and Recovery

    The "Undo for Operators" paper implemented an undoable email service. In general, their application-level undo and recovery service requires applications whose operations have well-defined semantics and can be serialized. Another example that satisfies these criteria is a calendar service. Can you think of other such applications? Choose an application and implement an undoable service for that application. Describe the properties of this undoable application. How does application-specific recovery improve on generic recovery as described in the "Exploring Failure Transparency" paper?
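
    For the calendar example, one simple way to get undo from serialized operations is to keep an ordered log and rebuild the state by replaying it, so that undoing an operation means dropping it from the log and replaying the rest. The sketch below assumes a tiny calendar with three hypothetical verbs (`add`, `move`, `cancel`). It is an illustration under those assumptions and does not reproduce the paper's design, which also has to deal with effects that have already become visible outside the service.

```python
# Undo by replaying a log of serialized operations; toy calendar with assumed verbs.
class UndoableCalendar:
    VERBS = ("add", "move", "cancel")

    def __init__(self):
        self.log = []     # ordered (op_id, verb, args) records
        self.next_id = 0

    def do(self, verb, *args):
        if verb not in self.VERBS:
            raise ValueError(f"unknown operation: {verb}")
        op_id = self.next_id
        self.next_id += 1
        self.log.append((op_id, verb, args))
        return op_id

    def undo(self, op_id):
        # Remove the operation; later operations are kept and replayed.
        self.log = [entry for entry in self.log if entry[0] != op_id]

    def events(self):
        # Rebuild the current state from scratch by replaying the log.
        state = {}
        for _, verb, args in self.log:
            if verb == "add":
                title, slot = args
                state[title] = slot
            elif verb == "move":
                title, slot = args
                if title in state:  # moving a missing event is a no-op
                    state[title] = slot
            elif verb == "cancel":
                (title,) = args
                state.pop(title, None)
        return state


cal = UndoableCalendar()
cal.do("add", "standup", "Mon 9:00")
mistake = cal.do("cancel", "standup")
cal.do("add", "review", "Tue 14:00")
cal.undo(mistake)
print(cal.events())
```

    After the undo, both `standup` and `review` are present: the mistaken cancellation is gone, and the later `add` is preserved. Replay only works because each operation has well-defined behaviour even when an earlier operation it depended on has been removed, which is why `move` and `cancel` are written to tolerate a missing event.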