Computer Systems Programming

ECE454, Fall 2025
University of Toronto
Instructors: Ashvin Goel, Ding Yuan

Lab 2: Memory Performance

Assigned: Sept 18, Due: Oct 12, 11:59 PM

The TAs for this lab are: Ruibin Li, Eric Xu

Introduction

After great success with their previous client, OptsRus has a second client: a virtual reality headset startup. The startup is co-founded by a group of hardware geeks: those who like to design electrical circuits and integrate sensors. The VR headset prototype hardware is almost ready but lacks a high-performance software image rendering engine. The hardware engineers have already written functionally correct code in C but need your help to supercharge the performance and efficiency of their code.

The rendering engine's input is a preprocessed time-series data set representing a list of object manipulation actions. Each action is consecutively applied over a 2D object in a bitmap image such that the object appears to move with respect to the viewer. To generate smooth and realistic visual animations, sensor data points are over-sampled at 1500Hz or 25x normal screen refresh rate (60 frames/s).

The diagram below shows all the possible object manipulation actions. Your goal is to process all these object manipulation actions and output rendered images for the display at 60 frames/s.

Object Manipulation Actions

Implementation Overview

Fundamentally, the rendering engine you are asked to optimize is simple in design. This section will briefly describe all important parts you need to know for this lab. A well-documented trivial reference implementation is provided to help you get started.

Input Files

There are two input files. The first one is a standard 24-bit bitmap square image file. The second one is a list of processed sensor values stored in a CSV (comma separated value) file.

For the bitmap image file, white pixels (RGB=255,255,255) are the background, and non-white pixels are considered part of an object. You can generate your own image files using Microsoft Paint on Windows or GIMP on Linux. After drawing your own square image, export it in the 24-bit bitmap format. If you are using GIMP, make sure to check the "Do not write color space information" option under compatibility options. Then, under advanced options, select "R8 G8 B8" under 24 bits. We have packaged a tiny 10x10 pixel bitmap image named object_2D.bmp in the lab assignment package to help you get started.

The list of processed sensor values can be viewed as a list of key-value pairs. The key represents a basic object manipulation action, and the value is the magnitude of the specified action. An example sensor value input file is shown below:

W,6   // shift object up by 6 pixels
A,5   // shift object left by 5 pixels
S,4   // shift object down by 4 pixels
D,3   // shift object right by 3 pixels
CW,2  // rotate entire image clockwise by 180 degrees
CCW,1 // rotate entire image counter-clockwise by 90 degrees
MX,1  // mirror entire image along the X-axis in the middle
MY,0  // mirror entire image along the Y-axis in the middle
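The CW and CCW lines above suggest that the value counts 90-degree turns (CW,2 is 180 degrees, CCW,1 is 90 degrees). Under that reading, a run of consecutive rotation actions can be folded into a single net rotation. The following is a minimal sketch; net_quarter_turns is a hypothetical helper name, not part of the provided code, and the folding only applies when no other action sits between the rotations.

```c
/* Hypothetical helper: fold clockwise and counter-clockwise quarter-turns
 * into one net clockwise rotation in the range 0..3.
 * Assumes each unit of value is one 90-degree turn. */
static int net_quarter_turns(int cw_turns, int ccw_turns)
{
    int net = (cw_turns - ccw_turns) % 4;
    /* In C, % can yield a negative result, so shift it back into 0..3. */
    return net < 0 ? net + 4 : net;
}
```

For example, CW,2 followed by CCW,1 leaves one net clockwise quarter-turn, so only one 90-degree rotation would need to be performed.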

Data Structures

Frame Buffer and Dimension

The input bitmap image has already been parsed for you. The image pixel data is stored in the following data structure:

unsigned char *frame_buffer; // [R1][G1][B1][R2][G2][B2]...
unsigned int width, height;  // dimension in number of pixels
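To illustrate the layout, the sketch below locates one pixel in the buffer and tests whether it is background. It assumes pixels are stored row by row (row-major) with 3 bytes per pixel in R, G, B order; the row-major ordering is an assumption that should be checked against the reference implementation. The helper names pixel_offset and is_background are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of the pixel at (row, col),
 * assuming row-major storage with 3 bytes (R, G, B) per pixel. */
static inline size_t pixel_offset(unsigned int width,
                                  unsigned int row, unsigned int col)
{
    return ((size_t)row * width + col) * 3;
}

/* Hypothetical helper: true when the pixel is pure white (background). */
static inline bool is_background(const unsigned char *frame_buffer,
                                 unsigned int width,
                                 unsigned int row, unsigned int col)
{
    const unsigned char *p = frame_buffer + pixel_offset(width, row, col);
    return p[0] == 255 && p[1] == 255 && p[2] == 255;
}
```

Computing the offset in size_t avoids overflow of 32-bit arithmetic on very large images.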

Sensor Values

The processed sensor values input file has also been parsed for you and stored in a key-value pair array. The array has enough storage for at least 10,000 key-value pairs. As mentioned earlier, the key represents the basic object manipulation action, and the value represents the magnitude. It is stored in the following data structure:

// KV Data Structure Definition
struct kv {
    char *key; int value;
};

// KV Data Structure Array
struct kv sensor_values[10240];
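Because the key is a string, each action has to be identified by string comparison. A minimal sketch of walking the array follows; the enum action, classify_key, and total_magnitude names are hypothetical and used only for illustration, and the key strings are taken from the example input file above.

```c
#include <string.h>

struct kv {
    char *key; int value;
};

/* Hypothetical action codes; the names are illustrative only. */
enum action {
    ACT_UP, ACT_LEFT, ACT_DOWN, ACT_RIGHT,
    ACT_CW, ACT_CCW, ACT_MX, ACT_MY, ACT_UNKNOWN
};

/* Map a key string from the sensor file to an action code. */
static enum action classify_key(const char *key)
{
    if (strcmp(key, "W") == 0)   return ACT_UP;
    if (strcmp(key, "A") == 0)   return ACT_LEFT;
    if (strcmp(key, "S") == 0)   return ACT_DOWN;
    if (strcmp(key, "D") == 0)   return ACT_RIGHT;
    if (strcmp(key, "CW") == 0)  return ACT_CW;
    if (strcmp(key, "CCW") == 0) return ACT_CCW;
    if (strcmp(key, "MX") == 0)  return ACT_MX;
    if (strcmp(key, "MY") == 0)  return ACT_MY;
    return ACT_UNKNOWN;
}

/* Sum the magnitudes of every entry that matches one action. */
static int total_magnitude(const struct kv *values, int count,
                           enum action wanted)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        if (classify_key(values[i].key) == wanted)
            total += values[i].value;
    }
    return total;
}
```

Translating each key to an integer code once keeps the string comparisons out of any per-pixel loops.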

Key Function and Definitions

In this lab, you are only allowed to modify a single file (implementation.c). This file has three important functions. Please make sure to follow the instructions shown below regarding these functions.

print_team_info()

This function prints the team information to stdout. Since you will be doing the lab individually, your team is just you. It is called upon program startup, and its output is used by the auto-marking system. Please modify the fields shown below before starting the lab. Failing to modify this information, or modifying it incorrectly, will result in a zero mark for the assignment. All versions of submitted solutions, including those from previous years and their GitHub counterparts, will be compared for plagiarism.

// Please modify this field with something interesting
char team_name[] = "default-name";

// Please fill in your information
char student1_first_name[] = "john";
char student1_last_name[] = "doe";
char student1_student_number[] = "0000000000";

implementation_driver()

This function is the main entry point to your code. All the available data is passed to your implementation via this function. You should not modify the prototype of this function. Currently, a naive but working solution of the lab is provided in implementation_driver() to help you get started. However, you are free to modify the implementation of this function and to modify or delete anything else in this file except for the print_team_info() function mentioned above. Please make sure that the implementation_driver() function is always reachable from main.

The prototype of the function is the following:

void implementation_driver(
    struct kv *sensor_values,
    int sensor_values_count,
    unsigned char *frame_buffer,
    unsigned int width,
    unsigned int height,
    bool grading_mode
);

verifyFrame()

You must call this function for each frame you are required to output to verify the correctness of your implementation. Before you call this function, please make sure you pass in valid data of the correct type. Failing to perform this step will generate an error in the program. The prototype of the function is the following:

void verifyFrame(unsigned char *frame_buffer,
                 unsigned int width, unsigned int height,
                 bool grading_mode);
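Since the sensor data is sampled at 25x the display rate, one plausible structure is to output a frame after every 25 actions. The sketch below shows that loop under this assumption; check the reference implementation for the exact frame boundaries and for how leftover actions are handled. The names verify_frame_stub, apply_action, driver_sketch, and ACTIONS_PER_FRAME are hypothetical; verify_frame_stub is only a stand-in that counts calls so the sketch can run on its own, and real code would call verifyFrame() instead.

```c
#include <stdbool.h>

struct kv {
    char *key; int value;
};

#define ACTIONS_PER_FRAME 25 /* 1500 Hz samples / 60 frames per second */

/* Stand-in for the lab's verifyFrame(); it only counts how often it runs. */
static int frames_verified = 0;
static void verify_frame_stub(unsigned char *frame_buffer,
                              unsigned int width, unsigned int height,
                              bool grading_mode)
{
    (void)frame_buffer; (void)width; (void)height; (void)grading_mode;
    frames_verified++;
}

/* Hypothetical per-action step; a real one would transform frame_buffer. */
static void apply_action(const struct kv *action, unsigned char *frame_buffer,
                         unsigned int width, unsigned int height)
{
    (void)action; (void)frame_buffer; (void)width; (void)height;
}

/* Same parameter list as implementation_driver(), under a different name. */
static void driver_sketch(struct kv *sensor_values, int sensor_values_count,
                          unsigned char *frame_buffer,
                          unsigned int width, unsigned int height,
                          bool grading_mode)
{
    for (int i = 0; i < sensor_values_count; i++) {
        apply_action(&sensor_values[i], frame_buffer, width, height);
        /* Emit one frame after each complete group of 25 actions. */
        if ((i + 1) % ACTIONS_PER_FRAME == 0)
            verify_frame_stub(frame_buffer, width, height, grading_mode);
    }
}
```

With this structure, only the state of the image at each frame boundary has to be correct, which leaves room to combine the actions within a group before touching the pixels.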

Performance Measurement Tools

Perf Tool

To gain insight into the cache-related behavior of your implementation, you can use the perf tool to access the hardware performance counters of the processor. For example, to output the first-level cache misses generated by your program foo, you would execute:

perf stat -e L1-dcache-load-misses foo

You can view the list of all performance counters that you can monitor by running:

perf list

You can monitor multiple counters at once by using multiple -e options in one command line. perf has many other features that you can learn about as follows:

perf --help

For example, you can monitor TLB misses or other more advanced events. Check out this short write-up about perf.

Gprof & GCov

If you have successfully completed Lab 1, you should be familiar with the gprof and the gcov tools. They can help pinpoint the bottleneck in your program when the program is run locally.

Note: To configure these tools for use within your project, you will need to provide additional cmake command-line options when generating the Makefile. You can find examples on Stack Overflow.

Setup

Initial Setup

Start by copying the hw2.tar.gz file from the shared directory /cad2/ece454f/hw2/ on the UG machines into a protected directory within your UG home directory. Then run the command:

tar xzvf hw2.tar.gz

This will cause several files to be unpacked into the directory. The ONLY file you will be modifying and handing in is implementation.c. You are prohibited from modifying other files. In implementation.c, please insert the requested identifying information. Do this right away so that you don't forget.

Compilation

The lab assignment utilizes the open-source, cross-platform CMake packaging system to manage the source code. Unlike the simple projects you have seen before, CMake generates the Makefile based on your computer configuration. The instructions to compile the project are shown below:

> cd <project directory> // Navigate to the lab assignment directory
> mkdir bin && cd bin    // Make a new bin directory, then navigate into it
> cmake ../              // Use cmake to generate Makefile automatically

After these simple configuration steps, the make file is automatically generated. Simply run make and an executable named ECE454_Lab2 should appear in the bin folder.

Coding Rules

The coding rules are simple. You may write any code you want, provided it satisfies the following:

This year we have added two new rules:

You will not get an immediate error if you use assembly or compiler attributes with the automarker (see below), but we will scan your code after the deadline and you may receive zero if you violate these rules.

Evaluation

The lab is evaluated when the grading mode is turned on. The grading mode is controlled via the -g command line flag (see example terminal output below). The evaluation turns on instrumentation code to measure the total clock cycles used by the implementation_driver() function. When you evaluate your implementation using the commands below, you should see output similar to the output shown below.

$ cd <project directory>
$ ./bin/ECE454_Lab2 -g -f sensor_values.csv -i object_2D.bmp
Loading input sensor input from file: sensor_values.csv
Loading initial 2D object bmp image from file: object_2D.bmp
********************************************************************************
Team Information:
   team_name: default-name
   student1_first_name: john
   student1_last_name: doe
   student1_student_number: 0000000000
********************************************************************************
Performance Results:
   Number of cpu cycles consumed by the reference implementation: 124374
   Number of cpu cycles consumed by your implementation: 125073
   Optimization Speedup Ratio (nearest integer): 1
********************************************************************************
SUCCESS: frame #0 is the same compared to the reference implementation

Marking Scheme

Note: This course allows you to practice trading your grades and time. Don't be too stressed that there are people better than you at optimizing the program. However, you should be worried if you cannot meet the minimal and acceptable performance targets as described below.

Non-Competitive Portion - 70%

The non-competitive portion should be fairly easy for anyone who puts a reasonable amount of effort into this lab. If you achieve a certain level of performance speedup, which we call the threshold speedup, you will receive the full 70% for this portion. This lab is designed to provide you with several opportunities for implementing performance optimizations. One can trivially achieve at least a 50x performance speedup compared to the reference solution.

The threshold speedup for this portion of the lab will be based on the average mark of all submissions in the first week of last year (e.g., you can expect a threshold speedup requirement in the range of 125x).

The TAs reserve the right to decrease the threshold speedup needed to obtain the 70% mark if many students are struggling even after putting in a reasonable amount of effort in this lab (as determined by the TAs).

Competitive Portion - 30%

The competitive portion marks (30% of total marks) will be assigned using this formula:

mark = sqrt((your speedup - worst speedup) / (top speedup - worst speedup))

The top speedup is the highest speedup obtained by a student in the class. The worst speedup is the lowest speedup that is greater than the threshold speedup obtained by a student in the class.
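As a worked example with purely hypothetical numbers (not actual class results), suppose the worst speedup above the threshold is 125x, the top speedup is 425x, and your speedup is 200x:

```latex
\text{mark} = \sqrt{\frac{200 - 125}{425 - 125}}
            = \sqrt{\frac{75}{300}}
            = \sqrt{0.25}
            = 0.5
```

Assuming the formula gives the fraction of the competitive portion earned, this would be half of the 30%, or 15% of the lab mark. Because of the square root, a speedup only a quarter of the way from the worst to the top already earns half of the competitive marks.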

The input we use to evaluate your solution is not provided on purpose. It is to prevent students from hard coding optimized solutions for the input. However, you should expect the image size to be large (somewhere around 10,000 x 10,000) with a long list of commands.

Assignment Assumptions

In the automarker, you can/should assume the following when writing your algorithm:

Submission

When you have completed the lab, you will hand in exactly one file, implementation.c, that contains your solution. You submit your assignment by typing:

submitece454f 2 implementation.c

on one of the UG machines. Make sure you have included your identifying information in the print_team_info() function. Remove any extraneous print statements. You can submit your solution as many times as you like before the submission deadline.

Once you submit your work using the submitece454f command, your submission will be placed in an automarker queue so that your code can be run by our automarker. Our automarker runs on a single dedicated machine. This machine has server-grade hardware (Intel Xeon E5-2430 CPU), similar to what cloud computing providers use, unlike the lab machines (Intel Core i7 4790 CPU). After the automarker runs your code, it makes your speedup score available on our web site. This web site's URL will be provided on Piazza.

In most cases, your speedup score should be reflected on our web site within a short period of time after your submission. However, the web site may delay showing the speedup scores for up to 24 hours after submission to prevent students from submitting too frequently. If you notice any irregularities with the web site, let us know via a Piazza post. We will provide the most up-to-date status of the automarker on Piazza.

We recommend using the UG lab machines to test your code for correctness and to gauge relative performance against your own previous solutions. However, we will assign your final grades based on running your code on our dedicated machine and your marks on our web site. These marks are generally accurate but they may be adjusted if any cheating is observed. We will not use any hidden or extra tests for marking.