University of Toronto
CSC467F Compilers and Interpreters Fall 2005

Phase 1: Lexical Analysis

In phase 1 you are required to hand in several programs written in the CSC467 compiler source language, and to implement the basic lexical analysis for the compiler using flex. What is expected for the source programs is given in the general description of the project; here we describe the lexical analysis aspect of phase 1 in more detail.

General Idea

In phase 1 you will implement two aspects of the lexical analyzer: the trace option (as given in the compiler man page) and error handling. Both require that the input stream be properly divided into the tokens belonging to the source language. In phase 2 you will complete the implementation of lexical analysis by adding the interface to the parser module. This will not require much work if phase 1 is done properly.

Starter files

The starter code is available for download from this web site in a single file, starter1.tar.gz. Once the file is downloaded into your working directory you can use gunzip and tar to extract the source files; these are:

The starter code will compile (using make compiler467), but the resulting scanner merely strips whitespace and comments, echoing everything else in the input file to standard output. Your job is to modify the given scanner so that it correctly identifies the tokens of our source language, checks for lexical errors in the source code, and, if the -Tn (trace scanner) option is set, outputs a trace of the scan (the format is given below). By default, if no compiler options are set and no errors are detected, your completed compiler should not give any output whatsoever at this stage.

In this phase of the project, only the scanner.l file should be modified. During subsequent phases both main.c and makefile will require modification, and you may add material to globalvars.c and common.h as you think necessary.

Format for the trace output

When the -Tn (trace scanning) switch is activated in the compiler, the compiler will output a single line for each token read, of the form:
TOKEN tokenval: tokentext

where tokenval is the integer value that the lexical analyzer will return when the given token is found, and tokentext is the text in the source language file that matched the token. tokenval should be different for every distinct token, while tokentext should be a verbatim printout of the text found in the file.

Tracing is active if and only if the global variable traceScanner has been set to "TRUE" (1). Output is sent to the globally visible FILE * variable traceFile. If the token read is valid, no other action should be taken by the lexical analyzer at this stage except to return with a non-zero value.
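
As an illustration of the trace rule above, here is a minimal sketch in plain C. The helper name traceToken is hypothetical and not part of the starter code; the definitions of traceScanner and traceFile below are stand-ins so the fragment compiles on its own (in the project they are assumed to come from the starter files).

```c
#include <stdio.h>

/* Stand-ins for the globals described in the handout. In the real
   project these are assumed to be provided by the starter code. */
int traceScanner = 0;
FILE *traceFile = NULL;

/* Hypothetical helper (not in the starter code): print one trace line
   when tracing is on, then hand the token value back to the caller. */
int traceToken(int tokenval, const char *tokentext)
{
    if (traceScanner && traceFile != NULL) {
        /* Format from the handout: TOKEN tokenval: tokentext */
        fprintf(traceFile, "TOKEN %d: %s\n", tokenval, tokentext);
    }
    return tokenval;
}
```

Inside a flex action this could be used as return traceToken(SOME_TOKEN, yytext); where SOME_TOKEN stands for whatever non-zero value you assign to that token and yytext is the matched text supplied by flex.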

Error Checking

In phase 1 you will also implement lexical level error checking. If an error is detected at this stage lexical analysis is terminated with the message:
LEXICAL ERROR, LINE #: your error message.

This line should be output to the globally visible errorFile. If all output is going to the console, try to make sure that the error message appears on a separate line. The line number is available from the variable yyline declared in the scanner file. As well as giving the error message, your lexical analyzer should also set the global variable errorOccurred. The analyzer can be exited at this point by returning 0 or by calling the flex-supplied function yyterminate(); under no circumstances should you call C's exit() function (the calling routine may have some cleanup to do as well).
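
The error-handling steps above can be sketched in plain C as follows. The helper name lexicalError is hypothetical, the global definitions are stand-ins so the fragment compiles on its own, and the fallback to stderr when errorFile is unset is an assumption made for this sketch only.

```c
#include <stdio.h>

/* Stand-ins for the variables named in the handout; in the project
   they are assumed to be declared by the starter code. */
int yyline = 1;
int errorOccurred = 0;
FILE *errorFile = NULL;

/* Hypothetical helper (not in the starter code): write the error line,
   set the error flag, and return 0 so the caller can stop scanning. */
int lexicalError(const char *message)
{
    /* Assumption for this sketch: fall back to stderr if errorFile is unset. */
    FILE *out = (errorFile != NULL) ? errorFile : stderr;

    fprintf(out, "LEXICAL ERROR, LINE %d: %s\n", yyline, message);
    errorOccurred = 1;
    return 0;   /* never call exit(); the caller may have cleanup to do */
}
```

A flex action could then end with return lexicalError("your error message"); which reports the error and leaves the scanner by returning 0, as described above.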

The contents of the error message are at your discretion, but it should be informative enough to find and fix the error in the source code. There are not many errors that can be caught at the lexical level; those you should be looking for are:

Certain other errors, such as malformed identifiers, could be caught here, or by the parser (the "interpretation" of the error will be affected by the stage at which the error is caught). The only one of these errors you are responsible for at this stage is the following:

What to hand in

The documentation for phase 1 should be very short, basically stating who is in your group and what your testing strategy for the scanner was, plus explanations of any problematic or complicated aspects of the scanner. Electronic versions of your compiler source code (including the makefile) and external documentation should be tarred and gzipped together and submitted via ECF's submit facility as assignment 1 before class. Every group should nominate one member to do the submission (this need not be the same person for each phase); if different packages are submitted by different people in the same group, notify me (at fvb@cs) as soon as possible, specifying which of the submissions you want me to test.


Frank Van Bussel
Last modified in September 2005