Non-exhaustive Project List ---------------------------- As I have also stated in class, any project in the Parallel Processing and Distributed Systems area including multiprocessors, clusters and wide-area platforms such as Planetlab, can be selected as a Parallel Programming project. Topics in software for these platforms may be as varied as Middleware, Aspect Orientation, Multiprocessor OS, transaction processing, compiler parallelization or runtime optimization. You can either optimize the system or optimize a particular application. It could be something that may help your research along as well as be a good fit for this course. As long as the project forces you to either reason about and optimize a parallel/distributed system or actually program in a parallel/distributed way, your project will be a good fit for this class. If all else fails, come talk to me about it once you have formed a team and I can give you some more suggestions. Apart from pthreads, MPI and Open-MP, There will be a couple of software packages available for you. Some sophisticated parallel programming software packages will be provided. These include the libTM transactional memory system (source) and the TreadMarks software Distributed Shared Memory System (library). This is intended only for people interested in computer architecture parallelism or optimizing programs to run better on parallel architectures. libTM is a software transaction memory system. You can use it to run a transaction processing workload like the applications that come with it. TreadMarks is a software DSM using which you can parallelize an application to run on a network of workstations. The reason why I cannot give you the TreadMarks source code to play with is that TreadMarks is a commercial system, while libTM is public domain code. Examples of projects: 1. You have an application that you would like to parallelize. For instance, if your Department is Physics, you may have a mollecular dynamics application that you would like to speed up by running on either a multiprocessor or a cluster of workstations. You should first reason about and detect the data races by understanding the application code and then introduce synchronization in this application by using pthreads, or explicitly exchange data between parallel processing units by using MPI, or insert Open-MP pragmas in the application code. Finally, you can use the TreadMarks library to parallelize you application on a cluster of workstations. You run the application sequentially and in parallel and document the speedup or absence thereof. If speedup is non-existant or worse, i.e., your application runs much slower in parallel then you should document your detection-of-overhead-causes process. There is a sample presentation pointer at the end of this document for a project in this category. 2. There are "famous" applications out there that have not been parallelized. For instance, the source code for the Quake game is available. You can try to first run Quake and amuse yourselves :-). Then you download the source and parallelize Quake by using multithreading or TreadMarks. If you want to tackle this project, please read the Quake parallelization paper that is on the reading list. You may not want to do it the same way they did it (just because they were not very successful). The code for the paper "1. Locality Aware Dynamic Load Management for Massively Multiplayer Games" is available for you to reimplement and extend. 3. You take a set of small applications that are already parallelized (e.g., SOR and other existing benchmarks that I will provide or you can look up yourselves) and rewrite the applications to improve locality with the goal to improve their cache behavior. 4. You may want to implement a distributed locking scheme with reader-writer locking either by following the Mellor-Crummey Scott paper provided in the reading list or by another distributed scheme (e.g., the one used in most Software DSM systems described in the TreadMarks paper provided). You will need to do some network programming for this project. A sample of network communication using sockets will be provided. Then you can use your distributed locking scheme to implement locking on top of a cluster of workstations. Document the performance of your locks with and without lock contention. 5. You can implement the bare-bones of software transactional memory, on a single machine. For this project, you will need to think of a way to handle detecting data races and undoing modifications when a data race is detected. You may do this by understanding the libTM code and reusing some of its code for "undo"-ing modifications. 6. You can compare the multi-threaded versus event-driven versus other models such as lazy asynchronous I/O (please see the Asynchronous I/O paper in the reading list) for a simple implementation of a server or server proxy code such as a web server skeleton or an open source database (MySQL) or a simple proxy code of your own. 7. You can implement a database query result cache that sits inside a proxy on top of a MySQL database. The cache will need to do "parallel processing" to handle incoming connections so you will need to use either multithreading or event-driven programming. An example of a benchmark will be provided. For this project you will document the query cache hit rate and the improvement or lack thereof that it brings. You need to argue that you understand why you did or did not get an improvement. 8. Implement a simple scheme for an Open-MP interface over networks of workstations scheme (please see the related paper in the reading list). For simplicity you can assume that you have an application without writes or where consistency does not matter (such as in the Travelling Salesman Problem) and you just want to parallelize some of the code. Your scheme should implement a fork/join model over a cluster of workstations. 9. Comparing the parallelization of a problem in CUDA on a GPU versus OpenMP on a multi-core and/or an FPGA (or all three). For example: Parallelizing an N-body simulation. Original N-Body-Simulation code from https://github.com/jagoosw/Gravitational-N-Body-Simulation J. Rodmann, D. Hampf, T. Hasenohr, L. Humbert, W. Riede, F. Sproll, and P. Wagner, "Precise orbit propagation for space debris objects using the hermite integration scheme" 2017 10. Paralelizing a Maze solver. Original Maze solver code "c++ program to solve a 2-D maze", https://github.com/zakjwalton/mazeSolver, 2021.