ECE454, Fall 2025
University of Toronto
Instructors: Ashvin Goel, Ding Yuan
Assigned: Nov 13, Due: Nov 30, 11:59 PM
The TAs for this lab are: Zhihao Lin, Guozhen Ding
OptsRus will now use all of its knowledge about performance optimizations to handle its biggest challenge to date: optimizing a game simulator. Given C code from the client, OptsRus will parallelize the code to exploit multicore machines, as well as perform other optimizations to improve performance.
You will be optimizing Conway's "Game of Life" (GoL), a famous, simple algorithm for computing cellular automata. GoL uses simple rules to generate complex and interesting behavior. You can use the GoL emulator to see how the game works (start by reading the "Explanation"). You can also read about the history and the details of GoL on Wikipedia.
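For reference, the standard GoL rules are: a live cell with two or three live neighbours stays alive, a dead cell with exactly three live neighbours becomes alive, and every other cell is dead in the next generation. The sketch below computes one generation in C. The function names, the one-byte-per-cell layout, and the wrap-around treatment of the board edges are assumptions made for illustration; the provided sequential code is the authority on how the reference implementation stores the board and handles its borders.

```c
#include <stddef.h>

/* Count the live neighbours of cell (r, c) on an n x n board.
 * ASSUMPTION: the board wraps around at its edges (a torus). Check the
 * provided sequential code before relying on this. */
static int count_neighbours(const unsigned char *board, int n, int r, int c)
{
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0)
                continue;                 /* skip the cell itself */
            int rr = (r + dr + n) % n;    /* wrap the row index */
            int cc = (c + dc + n) % n;    /* wrap the column index */
            count += board[(size_t)rr * n + cc];
        }
    }
    return count;
}

/* Compute one generation: read every cell from `in`, write it to `out`.
 * Cells hold 0 (dead) or 1 (alive). `in` and `out` must not overlap. */
void gol_step(const unsigned char *in, unsigned char *out, int n)
{
    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) {
            int k = count_neighbours(in, n, r, c);
            int alive = in[(size_t)r * n + c];
            /* birth on exactly 3 neighbours, survival on 2 or 3 */
            out[(size_t)r * n + c] = (unsigned char)((k == 3) || (alive && k == 2));
        }
    }
}
```

Calling gol_step repeatedly while swapping the two buffers advances the board one generation at a time. This straightforward form is a baseline for understanding the rules, not an optimized implementation.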
Start by copying the gol.tar.gz file from the shared directory /cad2/ece454f/hw5/ on the UG machines into a protected directory in your UG home directory. Then run the commands:
tar xzvf gol.tar.gz
cd src
make
The make command will generate two programs, initboard and gol. The initboard program creates input files for the gol game program. The initboard program is run as follows:
initboard num_rows num_cols > input.pbm
This command will create an image file of size num_rows x num_cols in the simple, monochrome, portable bitmap (pbm) format. The initboard program generates files with random data. Create a few small image files and look at their contents to understand the file format. You can use the small image files for debugging your program. We have also provided some larger input image files in the inputs directory.
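If you want to load a board in a small test harness of your own, a reader along the following lines may be a useful starting point. It is only a sketch: it assumes the files use the plain-text "P1" variant of pbm (a "P1" header, two dimensions, then one '0' or '1' character per cell separated by optional whitespace) and that they contain no comment lines. The function name read_plain_pbm is hypothetical, and the order of the two header dimensions should be confirmed against a file generated by initboard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a plain-text pbm image from `f`.
 * On success, returns a malloc'd array of (*d1) * (*d2) cells, each 0 or 1,
 * and stores the two header dimensions in the order they appear in the file.
 * Returns NULL on a malformed file or allocation failure.
 * ASSUMPTION: "P1" header, no '#' comment lines. */
unsigned char *read_plain_pbm(FILE *f, int *d1, int *d2)
{
    if (fscanf(f, "P1 %d %d", d1, d2) != 2 || *d1 <= 0 || *d2 <= 0)
        return NULL;

    size_t cells = (size_t)*d1 * (size_t)*d2;
    unsigned char *board = malloc(cells);
    if (board == NULL)
        return NULL;

    for (size_t i = 0; i < cells; i++) {
        char ch;
        /* the leading space in the format skips whitespace between cells */
        if (fscanf(f, " %c", &ch) != 1 || (ch != '0' && ch != '1')) {
            free(board);
            return NULL;
        }
        board[i] = (unsigned char)(ch - '0');
    }
    return board;
}
```

The caller owns the returned buffer and should free it when done. Because the reader skips whitespace between cells, it accepts both one cell per line and several cells per line.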
The gol program is run as follows:
gol num_iterations input.pbm output.pbm
The program will run num_iterations of GoL on the input.pbm file to generate the output.pbm file. This program takes two optional arguments:
-s: run the original sequential code.
-v: run both your code and the sequential code and verify that the output produced by your code is the same as the output produced by the sequential code.
If you want to visualize a pbm image file, you can convert it to a jpg file that you can view in a browser as follows:
convert file.pbm file.jpg
Visualizing the file can potentially help you think of opportunities for optimization.
Your main task in this lab is to speed up the gol program using all the methods you have learned in the course. You can parallelize the program and use any number of threads. You are also free to use any optimizations, including rewriting the entire program, editing the Makefile and/or the compilation flags, etc. Here are the rules for this lab:
Given an input file and any number of iterations, your program must produce the same output file as the original unmodified gol program.
Your program must be faster than the reference implementation.
The code must be your own, i.e., you cannot directly incorporate GoL-specific acceleration code or libraries written by others. However, you can study them and come up with a similar implementation. You must be able to explain exactly what your code is doing when asked by a TA.
Your program must be able to execute successfully on both the UG lab machines and our tester machine.
You must submit all the C source files (e.g., you can't submit pre-processed LLVM IR code).
For this lab, you should measure the "wall clock" run time of the gol program using the /usr/bin/time command. This command will measure the run time of the entire gol program, including reading and writing the image files, program initialization, thread creation, and any other overheads associated with parallelization.
We will evaluate your program by running 10,000 GoL iterations on the inputs/1k.pbm file as follows:
/usr/bin/time -f "%e" ./gol 10000 inputs/1k.pbm outputs/1k.pbm
You can also measure the run time of the original program by using the -s option to the gol program.
If you find that it takes too long to run the program using these inputs, you should experiment and debug your program using fewer generations or smaller input files. However, we will use the command shown above for evaluation.
Your optimized program works correctly if its output is the same as the output of the original program. You can test for correctness in two ways. You can run the gol program using the -v option to validate the output of your program. This option runs the original sequential program and so the program will run slowly for large inputs.
Alternatively, you can run the original version of the program (as described above) once to generate the correct output file and then compare the output files. We have provided one output file 1k_verify_out.pbm that you can use to verify your program as follows:
cmp outputs/1k.pbm outputs/1k_verify_out.pbm
The cmp program generates no output if the files match. Otherwise, it outputs the byte and line number at which the files first differ, in which case you can use the diff program to see the differences between the files.
We will test the correctness of your code using several input files of varying sizes and initial configurations. Be sure to test and debug your program on a variety of input files of different sizes.
We will be using an automated marker for this lab and the Lab 2 marking scheme. The total marks consist of a non-competitive and a competitive portion.
If your code can achieve a modest level of performance speedup (as shown on the automarker website), which we call the threshold speedup, the TA will assign 70% marks for this portion.
The threshold speedup for this portion of the lab will be based on the average mark of all submissions in the first week of last year. You can expect a threshold speedup requirement in the range of 100x.
The second, competitive portion is designed to provide you with opportunities to apply various performance optimizations.
You can assume that the dimensions of all input files are N x N, where N is a power of two. The image files in the inputs directory are in this format.
You can assume that the minimum image size is 32 x 32 and the maximum image size is 10000 x 10000. However, your program should exit gracefully for a smaller or a larger image size.
Make sure to modify your team information in your teaminfo.txt file.
You must not modify the inputs/*.pbm files or the outputs/1k_verify_out.pbm file.
Your source code directory must be named src, and your submission must be named gol.tar.gz.
You are allowed to modify any file, but your code must compile and execute using the same command line arguments as the reference implementation.
You must not cache the final output of execution across multiple runs. If needed, you may pre-compute internal states or hash tables.
For more details about the lab, read the Lab 5 FAQ.
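The size limits stated above can be enforced with a small check before any board memory is allocated. The sketch below is one way to do it; the names MIN_DIM, MAX_DIM, and dims_supported are hypothetical, and where the check belongs depends on how your program loads the input.

```c
/* Size limits taken from the assumptions listed in this handout. */
enum { MIN_DIM = 32, MAX_DIM = 10000 };

/* Returns 1 if both dimensions are within the supported range, 0 otherwise. */
int dims_supported(int rows, int cols)
{
    return rows >= MIN_DIM && rows <= MAX_DIM
        && cols >= MIN_DIM && cols <= MAX_DIM;
}
```

If dims_supported returns 0, the program can print a short message to stderr and return a nonzero exit status instead of continuing with an unsupported board.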
For this lab, you need to submit a plain text file called report.txt that contains a short description (no more than 150 words) of how your program and its optimizations work, and what other optimizations you tried.
When you have completed the lab, remove any input or output files that you have created. Then, please submit (on a UG machine) the report file and a tarball of all your source code, including the Makefile to compile and run your GoL solution, as follows:
cd src
make clean
cd ..
tar -zcvf gol.tar.gz src/
submitece454f 5 gol.tar.gz report.txt
Once you submit your work using the submitece454f command, your submission will be placed in an automarker queue so that your code can be run by our automarker. After the automarker runs your code, it makes your speedup score available on our web site. This web site's URL will be provided on Piazza. The automarker will only assign marks if your program runs successfully.
You can submit your solution as many times as you like before the submission deadline. We will use your last submission before the deadline for grading.