ECE454, Fall 2025
University of Toronto
Instructors: Ashvin Goel, Ding Yuan
Q: Is this lab competitive?
A: Yes.
Q: How should we debug our program?
A: Use the initboard program to generate small images, since small images make your program much easier to debug. The program fills each file with random data, so keep the files you generate and reuse the same ones across debugging runs to get reproducible results.
Q: The starter code appears to run GoL on a toroidal world (edges loop around). This is not the usual configuration for GoL. Should we implement GoL for a toroidal world?
A: Yes, please implement GoL for a toroidal world.
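As an illustration of what "toroidal" means for the neighbour count, here is a minimal sketch. It assumes a row-major board with one byte per cell (1 = alive, 0 = dead) and the standard Conway rules; the function names and the board layout are hypothetical and are not taken from the starter code.

```c
#include <stddef.h>

/* Read one cell, wrapping the coordinates around the edges.
 * Adding nrows/ncols before the % keeps the index non-negative
 * for r and c as low as -1. */
static inline int cell_at(const unsigned char *board, int nrows, int ncols,
                          int r, int c)
{
    int wr = (r + nrows) % nrows;
    int wc = (c + ncols) % ncols;
    return board[(size_t)wr * (size_t)ncols + (size_t)wc];
}

/* Count the live cells among the 8 neighbours of (r, c) on a torus. */
int count_neighbors(const unsigned char *board, int nrows, int ncols,
                    int r, int c)
{
    int count = 0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++)
            if (dr != 0 || dc != 0)
                count += cell_at(board, nrows, ncols, r + dr, c + dc);
    return count;
}

/* Standard Conway rules: a live cell survives with 2 or 3 live
 * neighbours, and a dead cell becomes alive with exactly 3. */
unsigned char next_state(unsigned char alive, int neighbors)
{
    return (unsigned char)(neighbors == 3 || (alive && neighbors == 2));
}
```

This version is meant for checking correctness on small images. The two modulo operations per lookup are slow, so a fast solution would likely handle the edge rows and columns separately.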
Q: The starter code accepts several optional arguments. If I write my code from scratch, do I need to support them?
A: Your code does not need to support the optional arguments. It should run with the required "num_generations inputfile outputfile" arguments. Your code also doesn't need to support the "-" argument for printing to stdout.
Q: If the leaderboard shows a successful grade, does it mean that our code passes the correctness check?
A: Yes.
Q: How do you evaluate the report?
A: We refer to the report only if we suspect plagiarism.
Q: Is there a GPU on the testing server?
A: No.
Q: Can I use SIMD/AVX2 instructions?
A: Yes. But remember that if your C code is structured well, the compiler can often generate SIMD code, so you will have to work hard to beat it.
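As a rough illustration of "structured well", the loop below has a simple trip count, touches contiguous memory, has no branches, and uses restrict to promise that the arrays do not overlap. Compilers such as GCC and Clang can typically vectorize this kind of loop at higher optimization levels. The function name and the one-byte-per-cell layout are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* For each column, add up the cells in the row above, the current row,
 * and the row below. Each value is 0 or 1, so the sum fits in a byte. */
void column_sums(const uint8_t *restrict up,
                 const uint8_t *restrict mid,
                 const uint8_t *restrict down,
                 uint8_t *restrict out,
                 size_t n)
{
    for (size_t c = 0; c < n; c++)
        out[c] = (uint8_t)(up[c] + mid[c] + down[c]);
}
```

You can check whether the loop was actually vectorized by inspecting the generated assembly or the compiler's vectorization report.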
Q: Can you give us some hints?
A: Here are some suggestions:
Start with some easy algorithms. Make sure you have a working version before trying any complex algorithms.
Don't use complex algorithms (e.g., HashLife). Some of them are hard to parallelize efficiently, and they only run faster than simpler algorithms on inputs much larger than our evaluation input.
Here is a good article (but please don't copy and paste the code that is provided).
Many regions of the initial image will converge to a stable state. You can maintain an active list to track the active regions so you don't need to compute on the stable regions.
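One simple way to picture the active-list idea is to track activity per row. A row can only change in the next generation if it, or one of its two neighbouring rows (wrapping around the edges), changed in the generation just computed. The sketch below assumes you record a changed flag per row while computing a generation; the function name and the row granularity are hypothetical, and real solutions often track smaller regions.

```c
#include <stdbool.h>

/* changed[r] is true if any cell in row r flipped during the generation
 * that was just computed. Fills active[r] with whether row r must be
 * recomputed in the next generation, and returns the number of active
 * rows. Rows wrap around because the world is toroidal. */
int next_active_rows(const bool *changed, bool *active, int nrows)
{
    int num_active = 0;
    for (int r = 0; r < nrows; r++) {
        int up = (r + nrows - 1) % nrows;
        int down = (r + 1) % nrows;
        active[r] = changed[up] || changed[r] || changed[down];
        num_active += active[r];
    }
    return num_active;
}
```

In the first generation every row has to be treated as active, since nothing is known yet about which regions are stable.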
Q: Do you have any tips for multi-threading?
A: Here are some suggestions:
Make sure to improve the performance of your single-threaded solution before using multi-threading. We have seen optimized single-threaded solutions (including algorithm-level optimizations) providing a speedup of up to 300-400x! On the other hand, many fast single-threaded algorithms cannot be easily parallelized. So, remember to strike a balance between good single-threaded performance and scalability with more cores.
Try to avoid accessing the same memory from different CPUs, since each CPU has private caches and cache coherence is expensive. You can use the sched_setaffinity system call or the pthread_setaffinity_np pthread library call to specify the CPU on which a thread runs.
Use __attribute__((aligned(number))) to avoid false sharing of cache lines, e.g., by aligning each thread's private data to the cache line size.
If you use an active list, you will need to balance the workload across different threads.
Remember to test your code using the automarker since the testing machine may have a different number of CPUs than the UG machines.
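To make the alignment suggestion above concrete, here is a minimal sketch of per-thread data laid out so that no two threads write to the same cache line. The 64-byte line size, the thread count, and the struct and field names are assumptions for illustration; check the actual cache line size of the target machine.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size, common on x86-64 */
#define NUM_THREADS 4  /* hypothetical thread count */

/* Aligning the type to the cache line size also rounds its size up to a
 * multiple of that alignment, so consecutive array elements never share
 * a cache line. */
struct thread_stats {
    uint64_t cells_updated;
    uint64_t cells_changed;
} __attribute__((aligned(CACHE_LINE)));

/* One slot per thread; thread i only ever writes stats[i]. */
struct thread_stats stats[NUM_THREADS];

_Static_assert(sizeof(struct thread_stats) == CACHE_LINE,
               "each slot should occupy exactly one cache line");

/* Called by thread tid after it updates a cell. */
void record_update(int tid, int changed)
{
    stats[tid].cells_updated++;
    stats[tid].cells_changed += (uint64_t)(changed != 0);
}
```

Without the attribute, the four 16-byte structs would fit in a single 64-byte line, and every update by one thread would invalidate that line in the other CPUs' caches.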
Q: What should we do to be competitive?
A: Here are some suggestions:
Here is a good starting point. This site covers a lot of algorithms and provides a brief introduction to how they work. Some of these algorithms will take a long time to implement and optimize. Make sure not to copy any code.
Look at this blog post. It covers several algorithms with varying complexity, and it also includes a benchmark across different algorithms.
The algorithms by Tony Finch and Alex Hensel are fast and parallelizable, but they are hard to understand and implement.