University of Toronto
CSC467F Compilers and Interpreters

Course Project

As part of your requirements for this course you will implement the compiler for a simplified programming language. Features of the language include scalar types, arrays, if and loop statements, functions and procedures with parameters. The compiler should generate assembly language code for a stack-based machine the interpreter of which will be given.

Assignment Schedule

All assignments are to be submitted on their due dates before class or tutorial. Late assignments will not be accepted unless an extension has been granted in advance by the TA or instructor; depending upon the circumstances, a late penalty of 10% a day may be assessed in such cases. If possible, solution code for phases 1 and 2 will be made available on the due date.

Assignments may be done by teams of students with 3 to 5 students per team. Team lists should be handed in or emailed to the TA at least a week before phase 1 is due; any students who are not a member of any team should contact the TA before the first tutorial on September 22. All students on a team will get the same mark for the assignment they hand in.

Project Testing

For the part of the assignments that involve modification of the project compiler, you will need to do test runs to check that you have implemented the features required for the particular assignment. As a minimum these tests should illustrate:

  1. Correct handling of normal cases
  2. Correct handling of tricky or difficult cases
  3. Any error handling that you have implemented.
Specific testing requirements for each assignment will be discussed in tutorial. However it is important that your documentation includes a description of your testing strategy.

What to hand in

For every assignment you will submit the following:

  1. Well written documentation for the work you did on the assignment. Documentation will be worth at least 15% of the mark for every assignment. This percentage may be increased for some assignments. Documentation should discuss design, implementation, and testing in a clear, concise manner; diagrams and other visual aids are welcome.
  2. All the code files involved in each assignment. Note: this includes starter files and files unchanged during the current phase; the marker should be able to simply invoke make (included in the submission as well) to compile your programs.
Both documentation and code are to be submitted electronically (tarred and gzipped) by the deadline using the submit facility of the ecf systems. Hard copy of code or documentation is not necessary.

The Assignments

Assignment 1, due September 30 [5%]. Write each of the following in CSC467 Source Language. Store them in text files (Unix format) and submit along with docs and code.

  1. a program that does not use functions or procedures, but does use variables, arrays, block structures (conditionals, loops), assignments, input, and output
  2. a program using nested loops and conditionals, simple functions and procedures with and without parameters, and at least one instance of a non-procedural non-program level scope statement
  3. a program using recursive functions with parameters, plus a procedure or function declared within another that features a non-trivial access of variables at a lower lexical level
Note: this part of the assignment is not worth a great deal, its purpose is to help the TA determine if you understand the source language specs; plus it is a good exercise to get you going on the assigment.

Also: using flex, build a lexical analyzer for recognizing the tokens of the language grammar. Files with the basic declarations will be given and should not be modified. Your lexical analyzer should store appropriate information for each token identified into a tracing file, and any error recognized into an error file.

Submit all source code (not just scanner.l), plus a short document describing how you dealt with any special problems encountered (the basic working of flex itself does not have to be described).

Assignment 2, due October 14 [10%]. Using the lexical analyzer from assignment 1 and bison build a parser for the translation grammar. This parser should include the proper operator precedence and associativity so that it is LR(1). Your parser should stop at the first parse error, after reporting it. If the program is parsed successfully, your parser should (by default) not write anything. No semantic actions are required in the bison file. A solution to phase 1 (a correctly working scanner) will be made available with the starter code for those who would prefer not to use their own.

A set of files containing a skeleton parser will be provided. Write a new parser to replace the skeleton (the skeleton will contain the basic structure and declarations you need, but otherwise will not be a good model for your solution). A correct parser should not have any shift/reduce or reduce/reduce conflicts. You may not change the programming language. Documentation will need to be a little more in depth than for phase 1, since you are implicitly defining the shape of your AST with your bison file, but again there is no need to explain things that can be assumed from the basic workings of bison. A solution to the assignment (i.e. a correctly working parser) will be made available by October 14. You may not turn in this assignment after the solution has been handed out.

Assignment 3, due November 14 [15%]. Using the solution parser for phase 2 (your own or the one provided), you are to do the following:

  1. Modify the parser to create an Abstract Syntax Tree (AST) and implement a dump AST module that outputs the AST in a human-readable form. A definition of the data structures for the AST plus a table of expected dump output strings will be provided.
  2. Construct a symbol table module and a semantics module that check all language constructs including arrays, functions and procedures. These modules should not interface with the parser directly but should process the abstract syntax tree produced by the parser. NOTE that you may have to modify the AST data structures for semantic analysis.
Documentation for this phase is quite important, since here you have a great deal of freedom in how the compiler is implemented. Note that this assignment is very coding-intensive so budget your time accordingly.

Assignment 4, due December 2 [15%]. Design code generation templates for selected statements in the CSC467 Source Language. (A code generation template is a generic description of the machine code that will be generated for a statement).

Construct a code generator. Add it to the compiler and test it. You may not change the pseudo machine interpreter. Electronically submit all source code, and hand in relevant documentation.


Frank Van Bussel
Last modified November, 2005