Computer Systems Programming

ECE454, Fall 2025
University of Toronto
Instructors: Ashvin Goel, Ding Yuan

Lab 2: Memory Performance

Assigned: Sept 18, Due: Oct 12, 11:59 PM

The TAs for this lab are: Ruibin Li, Eric Xu

Introduction

After great success with their previous client, OptsRus has a second client: a virtual reality headset startup. The startup is co-founded by a group of hardware geeks: those who like to design electrical circuits and integrate sensors. The VR headset prototype hardware is almost ready but lacks a high-performance software image rendering engine. The hardware engineers have already written functionally correct code in C but need your help to supercharge the performance and efficiency of their code.

The rendering engine's input is a preprocessed time-series data set representing a list of object manipulation actions. The actions are applied one after another to a 2D object in a bitmap image so that the object appears to move relative to the viewer. To generate smooth and realistic animations, sensor data points are oversampled at 1500 Hz, which is 25 times the normal screen refresh rate of 60 frames/s.
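As a quick check of these numbers, the ratio of the sensor rate to the refresh rate gives the number of actions that make up each displayed frame. The second figure simply applies the 10,000-input maximum stated under Assumptions and Requirements:

```latex
\frac{1500\ \text{samples/s}}{60\ \text{frames/s}} = 25\ \text{actions per frame},
\qquad
\left\lfloor \frac{10\,000\ \text{actions}}{25\ \text{actions/frame}} \right\rfloor = 400\ \text{frames}
```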

The diagram below shows all the possible object manipulation actions. Your goal is to process all of these actions and output rendered images for display at 60 frames/s, i.e., one frame for every 25 actions.

[Figure: Object Manipulation Actions]

Setup

Start by copying the hw2.tar.gz file from the shared directory /cad2/ece454f/hw2/ on the UG machines into a protected directory in your UG home directory. Then run the command:

tar xzvf hw2.tar.gz

This unpacks several files into the directory. The ONLY file you will modify and hand in is implementation.c; you are prohibited from modifying any other file. In implementation.c, insert the requested identifying information right away so that you don't forget.

Compilation

The lab uses CMake, an open-source, cross-platform build system, to manage the source code. Unlike the simple projects you have seen before, the project does not come with a ready-made Makefile; instead, CMake generates one based on your machine's configuration. To configure the project, run:

> cd <project directory> // Navigate to the lab assignment directory
> mkdir bin && cd bin    // Make a new bin directory, then navigate in it
> cmake ../              // Use cmake to generate Makefile automatically

After these configuration steps, the generated Makefile is in the bin directory. Run make there, and an executable named ECE454_Lab2 should appear in the bin folder.

Performance Measurement Tools

To gain insight into the cache behavior of your implementation, you can use the perf tool to read the processor's hardware performance counters. For example, to count the L1 data cache load misses generated by your program foo, run:

perf stat -e L1-dcache-load-misses ./foo

You can view the list of all performance counters that you can monitor by running:

perf list

You can monitor multiple counters at once by passing several -e options on the same command line. perf has many other features, which you can learn about by running:

perf --help

Among other things, perf can also monitor TLB misses and other more advanced events. Check out this short write-up about perf.

If you have successfully completed Lab 1, you should be familiar with the gprof and gcov tools, which can help pinpoint bottlenecks in your program when you run it locally.

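Note: To use these tools with this project, you will need to pass additional command-line options to cmake when generating the Makefile. For instance, extra compiler flags such as -pg (needed by gprof) can usually be supplied through CMake's standard CMAKE_C_FLAGS variable, e.g., cmake -DCMAKE_C_FLAGS=-pg ../. You can find more examples on Stack Overflow.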

Rendering Engine

Your task in this lab is to optimize the rendering engine code that we have provided. This section gives an overview of that code.

Input Files

The rendering engine takes two input files. The first is a standard 24-bit square bitmap image. The second is a list of processed sensor values stored in a CSV (comma-separated values) file.

In the bitmap image, white pixels (RGB = 255,255,255) are the background, and non-white pixels are considered part of an object. You can generate your own image files using Microsoft Paint on Windows or GIMP on Linux. After drawing your own square image, export it in 24-bit bitmap format. If you are using GIMP, check the "Do not write color space information" option under compatibility options, then select "R8 G8 B8" under 24 bits in the advanced options. To help you get started, the lab package includes a tiny 10x10 pixel bitmap image named object_2D.bmp.
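The background rule above is easy to express in code. The sketch below finds the bounding box of the object (all non-white pixels). It assumes the pixel layout described under Data Structures (3 interleaved bytes per pixel), with rows stored back to back and no padding; object_bounding_box() and is_background() are hypothetical helpers, not part of the provided code.

```c
#include <stdbool.h>
#include <stddef.h>

/* A pixel is background when it is pure white (RGB = 255,255,255);
 * anything else is treated as part of the object. */
static bool is_background(const unsigned char *px)
{
    return px[0] == 255 && px[1] == 255 && px[2] == 255;
}

/* Hypothetical helper: finds the smallest rectangle that contains every
 * non-white pixel. Assumes 3 bytes per pixel and rows stored one after
 * another with no padding. Returns false if the image is entirely white. */
static bool object_bounding_box(const unsigned char *buf,
                                unsigned int width, unsigned int height,
                                unsigned int *min_row, unsigned int *max_row,
                                unsigned int *min_col, unsigned int *max_col)
{
    bool found = false;
    for (unsigned int r = 0; r < height; r++) {
        const unsigned char *row = buf + (size_t)r * width * 3;
        for (unsigned int c = 0; c < width; c++) {
            if (is_background(row + (size_t)c * 3))
                continue;
            if (!found) {
                /* Rows are scanned top to bottom, so the first object
                 * pixel fixes min_row for good. */
                *min_row = *max_row = r;
                *min_col = *max_col = c;
                found = true;
            } else {
                if (r > *max_row) *max_row = r;
                if (c < *min_col) *min_col = c;
                if (c > *max_col) *max_col = c;
            }
        }
    }
    return found;
}
```

The scan reads the buffer strictly in memory order, which is the cache-friendly way to touch every pixel once.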

The list of processed sensor values can be viewed as a list of key-value pairs. The key represents a basic object manipulation action, and the value is the magnitude of the specified action. An example sensor value input file is shown below:

W,6   // shift object up by 6 pixels
A,5   // shift object left by 5 pixels
S,4   // shift object down by 4 pixels
D,3   // shift object right by 3 pixels
CW,2  // rotate entire image clockwise by 180 degrees
CCW,1 // rotate entire image counter-clockwise by 90 degrees
MX,1  // mirror entire image along the X-axis in the middle
MY,0  // mirror entire image along the Y-axis in the middle

Data Structures

The input bitmap image has already been parsed for you. The image pixel data is stored in the frame_buffer data structure:

unsigned char *frame_buffer; // [R1][G1][B1][R2][G2][B2]...
unsigned int width, height;  // dimension in number of pixels
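To make the layout concrete: with three bytes per pixel, pixel (row, col) starts at byte (row * width + col) * 3. The sketch below uses that offset to rotate a square image 90 degrees clockwise into a separate buffer. It assumes row-major storage with row 0 at the top and no row padding; pixel_offset() and rotate_cw_90() are hypothetical names, and the direction convention should be checked against the reference code in implementation.c.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of pixel (row, col) in an interleaved RGB buffer,
 * assuming row-major order, 3 bytes per pixel and no row padding. */
static size_t pixel_offset(unsigned int row, unsigned int col, unsigned int width)
{
    return ((size_t)row * width + col) * 3;
}

/* Hypothetical helper: rotates a square n x n image 90 degrees clockwise
 * into dst, which must not overlap src. Treats row 0 as the top row; if
 * the reference code treats row 0 as the bottom, CW and CCW swap. */
static void rotate_cw_90(const unsigned char *src, unsigned char *dst, unsigned int n)
{
    for (unsigned int r = 0; r < n; r++) {
        for (unsigned int c = 0; c < n; c++) {
            /* dst[r][c] = src[n-1-c][r]: dst is written sequentially, but
             * src is read down a column, n * 3 bytes apart per read. */
            memcpy(dst + pixel_offset(r, c, n),
                   src + pixel_offset(n - 1 - c, r, n), 3);
        }
    }
}
```

Note the access pattern: for a 10,000-pixel-wide image, consecutive reads from src are 30,000 bytes apart, so each one lands on a different cache line. This is the kind of behavior the perf cache-miss counters expose, and processing the image in small square tiles is a common way to improve locality.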

The processed sensor values input file has also been parsed for you and stored in an array of key-value pairs. The array has enough storage for at least 10,000 key-value pairs. As mentioned earlier, the key represents the basic object manipulation action, and the value represents the magnitude. It is stored in the following data structure:

// KV Data Structure Definition
struct kv {
    char *key; int value;
};

// KV Data Structure Array
struct kv sensor_values[10240];
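Because key is a C string, dispatching on it needs string comparison rather than a switch statement. The sketch below decodes one struct kv entry according to the meanings given in the example input file. The action_kind enum, struct action and decode_action() are hypothetical; struct kv is repeated only so the sketch compiles on its own, and the shift signs assume row 0 is the top of the image (so "up" is a negative row offset).

```c
#include <string.h>

struct kv {
    char *key; int value;
};

enum action_kind { ACT_SHIFT, ACT_ROTATE, ACT_MIRROR_X, ACT_MIRROR_Y, ACT_UNKNOWN };

struct action {
    enum action_kind kind;
    int dx, dy;         /* pixel offsets for ACT_SHIFT (dy < 0 means up) */
    int quarter_turns;  /* clockwise quarter turns in [0, 3] for ACT_ROTATE */
};

/* Hypothetical decoder for one sensor entry: W/A/S/D shift the object,
 * CW/CCW rotate in 90-degree steps, MX/MY mirror the image. */
static struct action decode_action(const struct kv *entry)
{
    struct action a = { ACT_UNKNOWN, 0, 0, 0 };
    const char *k = entry->key;
    int v = entry->value;

    if (strcmp(k, "W") == 0)      { a.kind = ACT_SHIFT; a.dy = -v; }
    else if (strcmp(k, "S") == 0) { a.kind = ACT_SHIFT; a.dy = v; }
    else if (strcmp(k, "A") == 0) { a.kind = ACT_SHIFT; a.dx = -v; }
    else if (strcmp(k, "D") == 0) { a.kind = ACT_SHIFT; a.dx = v; }
    else if (strcmp(k, "CW") == 0 || strcmp(k, "CCW") == 0) {
        /* CCW by v equals CW by -v; four quarter turns are the identity,
         * so reduce modulo 4 and keep the result non-negative. */
        int turns = (strcmp(k, "CW") == 0) ? v : -v;
        a.kind = ACT_ROTATE;
        a.quarter_turns = ((turns % 4) + 4) % 4;
    }
    else if (strcmp(k, "MX") == 0) a.kind = ACT_MIRROR_X;
    else if (strcmp(k, "MY") == 0) a.kind = ACT_MIRROR_Y;
    return a;
}
```

For example, the entry CCW,1 decodes to three clockwise quarter turns, and CW,2 to two.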

Key Function and Definitions

In this lab, you are only allowed to modify a single file (implementation.c). This file has three important functions. Please make sure to follow the instructions shown below regarding these functions.

print_team_info()

This function prints the team information to stdout. Since you will be doing the lab individually, your team is just you. It is called at program startup, and the auto-marking system uses this information. Please modify the fields shown below before starting the lab. Failing to modify this information, or modifying it incorrectly, will result in a zero mark for the assignment. All submitted solutions will be checked for plagiarism against each other, against solutions from previous years, and against their GitHub counterparts.

// Please modify this field with something interesting
char team_name[] = "default-name";

// Please fill in your information
char student1_first_name[] = "john";
char student1_last_name[] = "doe";
char student1_student_number[] = "0000000000";

implementation_driver()

This function is the main entry point to your code; all the available data is passed to your implementation through it. You should not modify the prototype of this function. To help you get started, implementation_driver() currently contains a naive but working solution. You are free to change the implementation of this function, and to modify or delete anything else in this file, except for the print_team_info() function mentioned above. Make sure that implementation_driver() is always reachable from main.

The prototype of the function is the following:

void implementation_driver(
    struct kv *sensor_values,
    int sensor_values_count,
    unsigned char *frame_buffer,
    unsigned int width,
    unsigned int height,
    bool grading_mode
);

verifyFrame()

You must call this function for each frame you are required to output, so that the correctness of your implementation can be verified. Make sure you pass in valid arguments of the correct types; otherwise the program will report an error. The prototype of the function is the following:

void verifyFrame(unsigned char *frame_buffer,
                 unsigned int width, unsigned int height,
                 bool grading_mode);
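Putting the two functions together: since 25 actions make up one frame, a driver loop applies the actions in order and calls verifyFrame() after every 25th one. The sketch below shows only that control flow. driver_sketch() is a hypothetical name, applying the action is left as a comment, and verifyFrame() is replaced by a counting stub so the example compiles on its own; in your implementation.c, call the real verifyFrame() and keep the implementation_driver() prototype unchanged.

```c
#include <stdbool.h>

struct kv { char *key; int value; };

/* Stand-in for the framework's verifyFrame() so this sketch compiles on
 * its own. In the lab, verifyFrame() is provided for you; do not define it
 * in implementation.c. This stub only counts how often it is called. */
static int frames_verified = 0;
static void verifyFrame(unsigned char *frame_buffer, unsigned int width,
                        unsigned int height, bool grading_mode)
{
    (void)frame_buffer; (void)width; (void)height; (void)grading_mode;
    frames_verified++;
}

#define ACTIONS_PER_FRAME 25  /* 1500 Hz sensor rate / 60 frames per second */

/* Hypothetical driver skeleton: apply each action in order, and after every
 * 25th action hand the current frame to verifyFrame(). Trailing actions
 * that do not complete a frame produce no output. */
static void driver_sketch(struct kv *sensor_values, int sensor_values_count,
                          unsigned char *frame_buffer, unsigned int width,
                          unsigned int height, bool grading_mode)
{
    for (int i = 0; i < sensor_values_count; i++) {
        /* ... apply sensor_values[i] to frame_buffer here ... */
        (void)sensor_values[i];
        if ((i + 1) % ACTIONS_PER_FRAME == 0)
            verifyFrame(frame_buffer, width, height, grading_mode);
    }
}
```

With 60 actions, for instance, the loop verifies two frames (after actions 25 and 50) and silently drops the last 10.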

Coding Rules

The coding rules for this lab are simple. You may write any code you want, provided it satisfies the following:

Your submission modifies only the implementation.c file. You are not allowed to modify the CMakeLists.txt file, so you cannot use it to get around this restriction.
Your code does not interfere with, or attempt to alter, the time measurement mechanism.
Your submitted code does not print additional information to stdout or stderr.

This year we have added two new rules:

You may not use any assembly in your code.
You may not use any GNU compiler attributes.

The automarker (see below) will not immediately reject code that uses assembly or compiler attributes, but we will scan your code after the deadline, and you may receive a zero if you violate these rules.

Evaluation

The lab is evaluated with grading mode turned on. Grading mode is controlled by the -g command-line flag (see the example terminal output below) and enables instrumentation code that measures the total number of clock cycles used by the implementation_driver() function. When you evaluate your implementation using the commands below, you should see output similar to the following.

$ cd <project directory>
$ ./bin/ECE454_Lab2 -g -f sensor_values.csv -i object_2D.bmp
Loading input sensor input from file: sensor_values.csv
Loading initial 2D object bmp image from file: object_2D.bmp
********************************************************************************
Team Information:
   team_name: default-name
   student1_first_name: john
   student1_last_name: doe
   student1_student_number: 0000000000
********************************************************************************
Performance Results:
   Number of cpu cycles consumed by the reference implementation: 124374
   Number of cpu cycles consumed by your implementation: 125073
   Optimization Speedup Ratio (nearest integer): 1
********************************************************************************
SUCCESS: frame #0 is the same compared to the reference implementation

We evaluate your code using an automarker that runs on a single dedicated machine. This machine has server-grade hardware (an Intel Xeon E5-2430 CPU), similar to what cloud providers use, whereas the lab machines have desktop Intel Core i7-4790 CPUs. After the automarker runs your code, it posts your speedup score on our web site, whose URL will be provided on Piazza.

In most cases, your speedup score should appear on our web site shortly after you submit. However, the web site may delay showing scores for up to 24 hours after submission to discourage overly frequent submissions. If you notice any irregularities with the web site, let us know via a Piazza post. We will post the most up-to-date status of the automarker on Piazza.

We recommend using the UG lab machines to test your code for correctness and to compare its performance against your own previous solutions. However, your final grade will be based on running your code on our dedicated machine, i.e., on the marks shown on our web site. These marks are generally accurate, but they may be adjusted if any cheating is observed. We will not use any hidden or extra tests for marking.

Marking Scheme

This course lets you practice trading off your grade against your time. Don't be too stressed that there are people better than you at optimizing programs. However, you should be worried if you cannot meet the minimum acceptable performance target described below.

Non-Competitive Portion - 70%

The non-competitive portion should be fairly easy for anyone who puts a reasonable amount of effort into this lab. If your code achieves a modest speedup (as shown on the automarker web site), which we call the threshold speedup, the TAs will assign the full 70% for this portion. The lab offers several opportunities for performance optimizations, and it is straightforward to achieve at least a 50x speedup over the reference solution.

The threshold speedup for this portion of the lab will be based on the average mark of all submissions in the first week of last year; for example, you can expect a threshold speedup of around 125x.

The TAs reserve the right to decrease the threshold speedup needed to obtain the 70% mark if many students are struggling even after putting in a reasonable amount of effort in this lab (as determined by the TAs).

Competitive Portion - 30%

The competitive portion marks (30% of total marks) will be assigned using this formula:

mark = sqrt((your speedup - worst speedup) / (top speedup - worst speedup))
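As a worked example with made-up numbers (not actual class results), suppose the worst qualifying speedup is 125x, the top speedup is 425x, and yours is 200x:

```latex
\text{mark} = \sqrt{\frac{200 - 125}{425 - 125}} = \sqrt{\frac{75}{300}} = \sqrt{0.25} = 0.5
```

Reading the mark as the fraction of the competitive portion earned, this would correspond to 0.5 × 30% = 15% of the lab grade. Because the square root is concave, each extra unit of speedup is worth more near the worst speedup than near the top.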

The top speedup is the highest speedup obtained by any student in the class. The worst speedup is the lowest speedup obtained by a student in the class that is still above the threshold speedup.

The input used to evaluate your solution is intentionally not provided, to prevent students from hard-coding solutions optimized for that input. However, you should expect a large image (around 10,000 x 10,000 pixels) and a long list of commands.
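To get a sense of scale, assuming the 24-bit format described under Input Files (3 bytes per pixel), one frame of roughly 10,000 x 10,000 pixels occupies:

```latex
10\,000 \times 10\,000\ \text{pixels} \times 3\ \text{bytes/pixel}
  = 3 \times 10^{8}\ \text{bytes} \approx 300\ \text{MB} \approx 286\ \text{MiB}
```

That is far larger than any on-chip cache, so every full pass over the frame streams from main memory. How many such passes your code makes per frame, and in what order it touches memory, is therefore likely to dominate its running time.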

Assumptions and Requirements

When we evaluate your solution using automarker, you can (should) assume the following:

The object will always be visible and will never be shifted off the image frame.
You don't need to output incomplete frames (frames consisting of fewer than 25 object manipulation actions).
Your solution only needs to handle square images with sizes up to 10,000 pixels in width and height.
Your solution only needs to process a maximum of 10,000 sensor value inputs.
Your solution will be evaluated multiple times, and the average speedup will be taken.
Since we evaluate your code multiple times, it is possible to cache data or outputs from earlier runs and reuse them in later runs. However, you must not do this to gain an advantage; doing so will be considered cheating. A safe rule is to avoid global variables that are initialized in the first run and whose (cached) values are then reused to speed up later runs.
As long as your code compiles with modifications only in the implementation.c file and you have not attempted to cheat, your solution is most likely valid. If in doubt, ask the TAs by e-mail or on Piazza.
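One way to follow the no-caching rule is to keep any scratch memory local to a single call. The sketch below mirrors an image across its horizontal middle line (one possible reading of the MX action) using a temporary row buffer that is allocated and freed inside the call, so no state carries over between evaluation runs. mirror_x_in_place() is a hypothetical helper, and the mirror direction should be checked against the reference implementation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: swap the top and bottom rows, moving inward, so the
 * image is mirrored across its horizontal middle line. Assumes 3 bytes per
 * pixel and no row padding. Returns false only if allocation fails. */
static bool mirror_x_in_place(unsigned char *buf, unsigned int width, unsigned int height)
{
    if (width == 0 || height < 2)
        return true;  /* nothing to swap */

    size_t row_bytes = (size_t)width * 3;
    unsigned char *tmp = malloc(row_bytes);  /* per-call scratch, not a global */
    if (tmp == NULL)
        return false;

    for (unsigned int top = 0, bottom = height - 1; top < bottom; top++, bottom--) {
        unsigned char *a = buf + (size_t)top * row_bytes;
        unsigned char *b = buf + (size_t)bottom * row_bytes;
        memcpy(tmp, a, row_bytes);
        memcpy(a, b, row_bytes);
        memcpy(b, tmp, row_bytes);
    }

    free(tmp);  /* released before returning: nothing survives this call */
    return true;
}
```

Swapping whole rows with memcpy keeps every access sequential within a row.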

FAQ

For more details about the lab, read the Lab 2 FAQ.

Submission

After completing the lab, submit your solution from a UG machine as follows:

submitece454f 2 implementation.c

Make sure to include your identifying information in the print_team_info() function. Remove any extraneous print statements.

Once you submit your work using the submitece454f command, your submission will be placed in an automarker queue so that your code can be run by our automarker. The automarker will only assign marks if your program runs successfully.

You can submit your solution as many times as you like before the submission deadline. We will use your last submission before the deadline for grading.