University of Toronto
CSC467F Compilers and Interpreters Fall 2005

Phase 4: Code Generation

In phase 4 we complete the compiler by implementing the code generation. The code is compiled to a virtual machine which is linked directly to the compiler; unless the suppressExecution flag is set, the compiled code is then run. As part of this phase code generation templates for the major constructs in the language have to be included with the documentation.

Starter files

The starter code is available for download from this web site in a single file, starter4.tar.gz. Once the file is downloaded into your working directory you can use gunzip and tar to extract the source files; these are:

In this phase you should add a code generation module (which I suggest naming codegen.c), an appropriate header file, and any other files you deem necessary. main.c and makefile will have the be modified to allow linking and running of the virtual machine etc. It is very likely that your symbol table module will as well have to be somewhat modified to work with the code generator. Other files dealing with AST generation etc. may or may not need to be modified; the machine source code, on the other hand, should not be touched.

Generating the code

See the virtual machine web page for a description of the virtual machine's architecture. The machine's memory is contained in a globally visible array of sixteen bit integers named memory; during code generation you will be filling in the lower portion of this array with the compiled code, while during execution the higher portion of the array will be used for the virtual machine's memory. The virtual machine is stack based, with all calculations taking place in a stack in the machine's memory.

A special feature of the machine is its ability to handle scoped local variables in displays. Along with two registers pointing at the next instruction and next free word in the memory it maintains an array of registers holding addresses to the current displays for all active scopes. All variables are accessed via these displays, and when your program enters the nth scope level it has to set the nth display register to the address of the activation record for the local data of this scope level. Note: when setting the address of the nth display register the previous value should be saved; it is possible for e.g. recursive function calls to be at the same scope level as other still active code blocks. This previous value should be restored when the scope is left.

Layout of the activation records. Activation records should be kept as blocks of memory in something like the following form:
         ________________________________
      / | dynamically allocated          |    \
simple  |________________________________|<-.  \
scope   | variable, constant: value      |  |   function
      \ | array: holds address to ----------'    or
        | previous display at this level |      procedure
        |--------------------------------|      scope
        | function parameters,           |     /
        | return address, value          |    /
        |________________________________|
Since the number of variables (including arrays) is known at compile time, the size of the block for any particular scope is the same for all instances of that scope. Code for initialization of variables and allocation for arrays can be treated as the first statements of the scope to be executed, e.g.
begin
    integer: i := 2*n
    integer: a[n^2]
  ...
end
is equivalent to
begin
    integer: i
    integer (array): a

    i := 2*n
    (size of a) := n^2
    (allocate memory for a by resetting stack pointer)
    (value of a) := (address of first word in this block)
  ...
end
Note that the offset of variables within their activation records will have to be maintained somewhere, such as the symbol table, or declaration list element holding other info about the variable.

There are various ways of handling the code for function bodies. The easiest, and the one I recommend, is to place it where it occurs within a block of code, with an unconditional branch jumping over it. This way the address is known when the function/procedure is called, so no fixups have to be done later. Some (relatively small) increase in efficiency is effected by placing the code for all functions together either before or after the main block, but this will require making a second pass through the compiled code to fix up the addresses for the function calls.

For a complete listing of the code blocks that need to be generated see the code generation rules web page.

Code generation templates and other documentation

As part of the specifications for this phase, code generation templates should be written up for the following productions (and no others):

  1. IF statement with ELSEIF and ELSE
  2. RETURN from function with value
  3. Array declaration
The idea behind the templates is fairly simple: in a schematic form using plain English plus some of our virtual machine's assembly language provide a generic description of the machine code that will be generated for a statement, expression, etc. It is important that care is taken to maintain information about the program counter and the display registers. Here are a few examples to give you an idea of an acceptable format:
Type of Construct Grammar Description Code Generation Template
integer literal integer
PUSH integer;
sum expr1 + expr2
(code for expr1);
(code for expr2);
ADD;
assignment to scalar variable var := expr
ADDR LL(var) ON(var);
(code for expr);
STORE;
if w/o else if expr then stmt end
(code for expr);
PUSH (address immediately following stmt);
BF;
(code for stmt);

Other documentation. As well, some documentation should be provided briefly describing basic structure of your code generation routines, and whatever changes had to be made to your parser, AST, semantic routines, and symbol table.

What to hand in

Electronic versions of your documentation and all source code files (including e.g. globalvars.c and common.h whether they are modified or not) should be tarred and gzipped together and submitted via ECF's submit facility as assignment 4 on the due date, by one member of your group. If you experience any problems with submission, notify me at fvb@cs) as soon as possible. Note: Faculty of Engineering rules prevent us from giving a general extension, but I am certainly willing to consider extra time on a group by group basis.


Frank Van Bussel
Last modified on November, 2005