Non-exhaustive Project List
			 ----------------------------

As I have also stated in class, any project in the Parallel Processing
and Distributed Systems area including multiprocessors, clusters and
wide-area platforms such as Planetlab, can be selected as a Parallel
Programming project.  Topics in software for these platforms may be as
varied as Middleware, Aspect Orientation, Multiprocessor OS,
transaction processing, compiler parallelization or runtime
optimization.

You can either optimize the system or optimize a particular
application.  It could be something that may help your research along
as well as be a good fit for this course.  As long as the project
forces you to either reason about and optimize a parallel/distributed
system or actually program in a parallel/distributed way, your project
will be a good fit for this class.  If all else fails, come talk to me
about it once you have formed a team and I can give you some more
suggestions.

Apart from pthreads, MPI and Open-MP, There will be a couple of
software packages available for you.  Some sophisticated parallel
programming software packages will be provided.  These include the
libTM transactional memory system (source) and the TreadMarks
software Distributed Shared Memory System (library).  This
is intended only for people interested in computer architecture
parallelism or optimizing programs to run better on parallel
architectures.

libTM is a software transaction memory system.  You can use it to run a
transaction processing workload like the applications that come with it.

TreadMarks is a software DSM using which you can parallelize an
application to run on a network of workstations.  The reason why I
cannot give you the TreadMarks source code to play with is that
TreadMarks is a commercial system, while libTM is public domain code.

Examples of projects:

1. You have an application that you would like to parallelize.
For instance, if your Department is Physics, you may have a mollecular
dynamics application that you would like to speed up by running on
either a multiprocessor or a cluster of workstations.
You should first reason about and detect the data races by understanding 
the application code and then introduce synchronization in this application 
by using pthreads, or explicitly exchange data between parallel processing 
units by using MPI, or insert Open-MP pragmas in the application code.
Finally, you can use the TreadMarks library to parallelize you application
on a cluster of workstations. You run the application sequentially and
in parallel and document the speedup or absence thereof. If speedup
is non-existant or worse, i.e., your application runs much slower in
parallel then you should document your detection-of-overhead-causes process.
There is a sample presentation pointer at the end of this document
for a project in this category.


2. There are "famous" applications out there that have not been 
parallelized. For instance, the source code for the Quake game is
available. You can try to first run Quake and amuse yourselves :-).
Then you download the source and parallelize Quake by using multithreading
or TreadMarks. If you want to tackle this project, please read the
Quake parallelization paper that is on the reading list. You may not
want to do it the same way they did it 
(just because they were not very successful).
The code for the paper "1. Locality Aware Dynamic Load Management for
Massively Multiplayer Games" is available for you to reimplement and extend.

3. You take a set of small applications that are already parallelized
(e.g., SOR and other existing benchmarks that I will provide or you can
look up yourselves) and rewrite the applications to improve locality with
the goal to improve their cache behavior.

4. You may want to implement a distributed locking scheme with reader-writer
locking either by following the Mellor-Crummey Scott paper provided
in the reading list or by another distributed scheme (e.g., the one 
used in most Software DSM systems described in the TreadMarks paper
provided). You will need to do some network programming for this
project. A sample of network communication using sockets will be provided.
Then you can use your distributed locking scheme to implement locking on
top of a cluster of workstations.
Document the performance of your locks with and without lock contention.

5. You can implement the bare-bones of software transactional memory,
on a single machine. For this project, you will need to think of a way
to handle detecting data races and undoing modifications when a data
race is detected.  You may do this by understanding the libTM code and
reusing some of its code for "undo"-ing modifications.

6. You can compare the multi-threaded versus event-driven versus other
models such as lazy asynchronous I/O (please see the Asynchronous I/O
paper in the reading list) for a simple implementation of a server or
server proxy code such as a web server skeleton or an open source
database (MySQL) or a simple proxy code of your own.

7. You can implement a database query result cache that sits inside a
proxy on top of a MySQL database. The cache will need to do "parallel
processing" to handle incoming connections so you will need to use
either multithreading or event-driven programming.  An example of a benchmark
will be provided. For this project you will document the query cache hit rate
and the improvement or lack thereof that it brings. You need to argue that you
understand why you did or did not get an improvement.

8. Implement a simple scheme for an Open-MP interface over networks 
of workstations scheme (please see the related paper in the reading list).
For simplicity you can assume that you have an application without writes
or where consistency does not matter (such as in the Travelling Salesman
Problem) and you just want to parallelize some of the code. 
Your scheme should implement a fork/join model over a cluster of workstations.

9. Comparing the parallelization of a problem in CUDA on a GPU versus OpenMP
on a multi-core and/or an FPGA (or all three).

For example: Parallelizing an N-body simulation.
Original N-Body-Simulation code from https://github.com/jagoosw/Gravitational-N-Body-Simulation

J. Rodmann, D. Hampf, T. Hasenohr, L. Humbert, W. Riede, F. Sproll, and
P. Wagner, "Precise orbit propagation for space debris objects using the
hermite integration scheme" 2017

10. Paralelizing a Maze solver.
Original Maze solver code "c++ program to solve a 2-D maze", https://github.com/zakjwalton/mazeSolver, 2021.