This text outlines the general design of the compiler.

C is basically made up of declarations.  DECL.C is responsible
for parsing all declarations.  There are two basic types of
declarations, data declarations and function declarations.

---------------------------------------------------------------------
Data declarations are parsed, and each results in a structure of type
SYM which holds all the information about the symbol being
declared.  Symbols are generally put in either a hash table or a linked
list, depending on the scope of the symbol.  Symbol management is mostly
done in SYMBOL.C.

Because typing is very important,
each symbol is accompanied by a linked list of TYP structures which
determines the type of the symbol.  This information is used for
type compatibility and auto-casting purposes.  This is all
fairly straightforward; the most confusing part of the TYP structure
is the 'val_flag' field.  In general this will be zero, which indicates
that to get the value of the variable you must load it from the
corresponding address.  When it is set to one (as it is, for
example, for arrays and structs) it means that the address of the
variable is its value.
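
The meaning of 'val_flag' can be sketched roughly as follows.  This is
only an illustration: 'val_flag' is the one field named above, and every
other field, enum and function name here is a made-up assumption, not
the compiler's actual TYP layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified TYP node.  Only val_flag comes from
   the text; the other fields are assumptions for illustration. */
enum basic_type { BT_INT, BT_POINTER, BT_ARRAY, BT_STRUCT };

typedef struct typ {
    enum basic_type type;  /* kind of this link in the type chain */
    int val_flag;          /* 0: load the value from the address,
                              1: the address itself is the value */
    long size;             /* size in bytes */
    struct typ *btp;       /* next link, e.g. the element type */
} TYP;

/* Returns nonzero when using a symbol of type tp in an expression
   requires a load from the symbol's address. */
int needs_load(const TYP *tp)
{
    assert(tp != NULL);
    return tp->val_flag == 0;
}
```

For something like 'int a[10]' the array link would carry val_flag 1, so
'a' evaluates to its own address, while the 'int' element link carries
val_flag 0 and must be loaded.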

A variable may optionally be assigned an initial value.  If
the variable is declared outside all functions, the procedures in
INIT.C will be called to grab the initial values.  Initial values will
be put in a linked list of type 'struct decldata' which is also
part of the SYM structure.  When the program has been completely
parsed the symbol table will be traversed and all this initial
data will be dumped to the output file.
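
As a rough sketch of this collect-now, dump-later scheme: only the names
SYM and 'struct decldata' come from the text.  The fields, the helper
names and the 'dd' output directive below are all assumptions, and the
symbol table is shown as a plain linked list for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layouts; one integer initializer per decldata node. */
struct decldata {
    struct decldata *link;
    long value;
};

typedef struct sym {
    const char *name;
    struct decldata *init;  /* head of initializer list, NULL if none */
    struct sym *next;       /* next symbol in this (assumed) list */
} SYM;

/* Append one initial value to a symbol.  Returns 0, or -1 if out of
   memory. */
int add_init(SYM *sp, long value)
{
    struct decldata *node, **pp;

    assert(sp != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->link = NULL;
    node->value = value;
    for (pp = &sp->init; *pp != NULL; pp = &(*pp)->link)
        ;                   /* find the end of the list */
    *pp = node;
    return 0;
}

/* After parsing: walk the symbols and write out all initial data.
   Returns the number of values written. */
int dump_inits(const SYM *table, FILE *out)
{
    int count = 0;

    for (; table != NULL; table = table->next) {
        const struct decldata *d = table->init;
        if (d == NULL)
            continue;
        fprintf(out, "%s:\n", table->name);
        for (; d != NULL; d = d->link) {
            fprintf(out, "\tdd\t%ld\n", d->value);
            count++;
        }
    }
    return count;
}
```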

When a variable inside a function receives an initial value, it
will be treated very similarly to any other kind of assignment
statement.


---------------------------------------------------------------------
Function declaration is done in several stages.  When a function
declaration is located, control is transferred to FUNC.C, which handles
the entire function declaration.  FUNC.C calls DECL.C to parse
arguments and it calls STMT.C to parse the function body.

In the first stage the input is parsed by STMT.C and put into a 
list of SNODE structures.  The SNODE list
is basically an internal representation of the program flow.

Expressions are a kind of statement.  In addition they may be
evaluated in conjunction with a more complex statement; for
example, the 'for' statement has three expressions which control
the looping.  Each expression is parsed into a binary tree of type
ENODE and then placed in an appropriate field of the SNODE of the
corresponding statement.
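
To make the relationship concrete, here is a minimal sketch of how a
'for' statement might hang its three expression trees off one SNODE.
Only the names SNODE and ENODE come from the text; the fields, enums and
helper functions are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum e_op { EN_ICON, EN_VAR, EN_ADD, EN_LT, EN_ASSIGN };
enum s_kind { ST_EXPR, ST_FOR };

/* Hypothetical expression node: a binary tree. */
typedef struct enode {
    enum e_op op;
    long value;                  /* constant, or variable id for EN_VAR */
    struct enode *left, *right;  /* NULL for leaves */
} ENODE;

/* Hypothetical statement node: a list link plus expression slots. */
typedef struct snode {
    enum s_kind kind;
    struct snode *next;          /* following statement */
    ENODE *init, *cond, *step;   /* the three 'for' expressions */
    struct snode *body;          /* loop body */
} SNODE;

/* Allocate one expression node; returns NULL if out of memory. */
ENODE *mk_enode(enum e_op op, long value, ENODE *left, ENODE *right)
{
    ENODE *e = malloc(sizeof *e);

    if (e != NULL) {
        e->op = op;
        e->value = value;
        e->left = left;
        e->right = right;
    }
    return e;
}

/* Fill in an SNODE so that it describes a 'for' statement. */
void set_for(SNODE *s, ENODE *init, ENODE *cond, ENODE *step, SNODE *body)
{
    assert(s != NULL);
    s->kind = ST_FOR;
    s->next = NULL;
    s->init = init;
    s->cond = cond;
    s->step = step;
    s->body = body;
}

/* Count the nodes in an expression tree. */
int enode_count(const ENODE *e)
{
    return e ? 1 + enode_count(e->left) + enode_count(e->right) : 0;
}
```

With these, the condition 'i < 10' becomes an EN_LT node with an EN_VAR
leaf on the left and an EN_ICON leaf on the right, stored in the 'cond'
slot of the statement.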

The basic unit of compilation is a function.  When the SNODE/ENODE
structs have been created for a function, OPTIMIZE.C is
called to fold constants and get rid of algebraic identities like
adds of zero or multiplies by zero or one.  Then ANALYZE.C is called to
determine what values should be stuffed into registers.  Basically,
the more often a value is used, the more likely it is to get a register.
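
The folding step can be sketched like this.  It is not the code in
OPTIMIZE.C, just a minimal illustration of the idea on a cut-down ENODE
with only integer constants, variables, add and multiply; all field and
function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum e_op { EN_ICON, EN_VAR, EN_ADD, EN_MUL };

typedef struct enode {
    enum e_op op;
    long value;                  /* constant value when op == EN_ICON */
    struct enode *left, *right;
} ENODE;

static int is_const(const ENODE *e, long v)
{
    return e->op == EN_ICON && e->value == v;
}

/* Simplify a tree bottom-up and return the (possibly different) root.
   Dropped nodes are not freed here; overflow is ignored. */
ENODE *fold(ENODE *e)
{
    if (e == NULL || e->op == EN_ICON || e->op == EN_VAR)
        return e;
    e->left = fold(e->left);
    e->right = fold(e->right);
    assert(e->left != NULL && e->right != NULL);

    /* constant op constant: compute it now */
    if (e->left->op == EN_ICON && e->right->op == EN_ICON) {
        long a = e->left->value, b = e->right->value;
        e->value = (e->op == EN_ADD) ? a + b : a * b;
        e->op = EN_ICON;
        e->left = e->right = NULL;
        return e;
    }
    if (e->op == EN_ADD) {
        if (is_const(e->left, 0)) return e->right;   /* 0 + x -> x */
        if (is_const(e->right, 0)) return e->left;   /* x + 0 -> x */
    } else {
        if (is_const(e->left, 1)) return e->right;   /* 1 * x -> x */
        if (is_const(e->right, 1)) return e->left;   /* x * 1 -> x */
        if (is_const(e->left, 0)) return e->left;    /* 0 * x -> 0 */
        if (is_const(e->right, 0)) return e->right;  /* x * 0 -> 0 */
    }
    return e;
}
```

Note that replacing 'x * 0' by zero is only safe when x has no side
effects, which holds in this sketch because the only leaves are
constants and plain variables.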

Once these stages of optimization are done, the root node of the
SNODE list is passed to GSTMT386.C.  GSTMT386.C follows
the SNODE links and generates code in the form of a list of ICODE 
structures.

When an expression is to be evaluated, control is transferred to
GEXPR386.C, which generates the code for the expression.  In this
stage individual variables get the intermediate form of an AMODE
structure, which describes the processor addressing mode required
to access the variable.  The
result of this stage is a doubly-linked list of ICODE structures
which define the assembly statements.  At this point PEEP386.C
is called to do very basic optimizations, like turning compares into
tests or moves of an immediate zero into subtractions.  Once
the routines in PEEP386.C have completed their job, the ICODE
list is processed to generate the assembler output.  The
main loop for this is also in PEEP386.C, but the actual output of
code is done by OUTAS386.C.
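
A peephole pass of this kind might look roughly like the sketch below.
The exact patterns PEEP386.C matches are not spelled out here, so the
two rewrites shown ('cmp reg,0' to 'test reg,reg' and 'mov reg,0' to
'sub reg,reg') are my reading of the description, and all fields and
enum names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_MOV, OP_SUB, OP_CMP, OP_TEST };
enum am_mode { AM_REG, AM_IMMED };

/* Hypothetical addressing-mode descriptor. */
typedef struct amode {
    enum am_mode mode;
    long value;               /* register number or immediate value */
} AMODE;

/* Hypothetical instruction node in a doubly-linked list. */
typedef struct icode {
    enum opcode op;
    AMODE dst, src;
    struct icode *fwd, *back;
} ICODE;

static int is_zero_imm(const AMODE *a)
{
    return a->mode == AM_IMMED && a->value == 0;
}

/* Rewrite instructions in place.  head must be the first node of the
   list.  Returns the number of instructions changed. */
int peephole(ICODE *head)
{
    int changed = 0;
    ICODE *ip;

    assert(head == NULL || head->back == NULL);
    for (ip = head; ip != NULL; ip = ip->fwd) {
        if (ip->dst.mode != AM_REG || !is_zero_imm(&ip->src))
            continue;
        if (ip->op == OP_CMP)
            ip->op = OP_TEST;   /* cmp reg,0 -> test reg,reg */
        else if (ip->op == OP_MOV)
            ip->op = OP_SUB;    /* mov reg,0 -> sub reg,reg */
        else
            continue;
        ip->src = ip->dst;      /* second operand becomes the register */
        changed++;
    }
    return changed;
}
```

One caveat with the second rewrite: 'sub' changes the flags and 'mov'
does not, so a real pass has to be sure the flags are not needed at that
point.  The sketch ignores this.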
---------------------------------------------------------------------

At the bottom of the compiler is a backbone made up of PREPROC.C
and GETSYM.C.  GETSYM.C reads in successive lines, then calls PREPROC.C
to evaluate any and all preprocessor directives and to do macro
substitutions.  After this GETSYM.C breaks things up into
tokens of various types such as identifiers, strings and numbers.  
GETSYM.C will call SEARCHKW.C any time it locates an identifier;
SEARCHKW.C will turn the identifier into a token denoting a reserved
word if it matches one of the reserved words.
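
The keyword lookup can be pictured with the sketch below.  How
SEARCHKW.C actually searches is not stated here; a binary search over a
sorted table is simply one reasonable choice, and the token names and
the tiny keyword list are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; TK_ID means "ordinary identifier". */
enum token { TK_ID, TK_FOR, TK_IF, TK_INT, TK_RETURN, TK_WHILE };

struct keyword {
    const char *word;
    enum token tok;
};

/* Must stay sorted by word (strcmp order) for bsearch to work. */
static const struct keyword keywords[] = {
    { "for", TK_FOR },
    { "if", TK_IF },
    { "int", TK_INT },
    { "return", TK_RETURN },
    { "while", TK_WHILE },
};

static int cmp_kw(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->word);
}

/* Returns the reserved-word token for ident, or TK_ID if ident is not
   a reserved word. */
enum token search_keyword(const char *ident)
{
    const struct keyword *kw;

    assert(ident != NULL);
    kw = bsearch(ident, keywords,
                 sizeof keywords / sizeof keywords[0],
                 sizeof keywords[0], cmp_kw);
    return kw ? kw->tok : TK_ID;
}
```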

One other important piece is MEMMGT.C, which implements a fairly
simple memory management scheme.  There are basically two groups
of memory, the 'global heap' and the 'local heap'.  The global heap is
resident for the entire compile, but the local heap is flushed
at the end of code generation for every function.  This cuts down
somewhat on the memory needed by the compiler.  Because by the time
we hit the actual MALLOC and FREE routines we are dealing with
fairly constant-size blocks, we don't have to worry about whether
the compiler used to compile this code has an adequate memory
management algorithm.  (Borland C does not; in an assembler I
wrote I once had 100K of free memory, but it was fragmented into
non-contiguous 8-byte chunks before the end of the assembly and was
totally unusable.)
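
The two-heap idea can be sketched as a simple block allocator.  This is
not MEMMGT.C itself; the block size, type names and function names are
assumptions.  The point it illustrates is that malloc and free only ever
see blocks of one size, and that a whole heap is released in one go.

```c
#include <assert.h>
#include <stdlib.h>

/* Alignment unit: big enough for any ordinary object. */
typedef union { long double ld; long long ll; void *p; } ALIGN;

#define BLOCK_UNITS 256                       /* assumed block size */
#define BLOCK_BYTES (BLOCK_UNITS * sizeof(ALIGN))

typedef struct block {
    struct block *next;
    size_t used;                 /* bytes handed out from data so far */
    ALIGN data[BLOCK_UNITS];
} BLOCK;

typedef struct heap {
    BLOCK *blocks;               /* newest block first */
} HEAP;

/* Carve size bytes out of heap h, grabbing a new fixed-size block from
   malloc when the current one is full.  Returns NULL on failure or if
   the request cannot fit in one block. */
void *heap_alloc(HEAP *h, size_t size)
{
    size_t need;
    BLOCK *b;
    void *p;

    assert(h != NULL);
    if (size == 0 || size > BLOCK_BYTES)
        return NULL;
    need = (size + sizeof(ALIGN) - 1) / sizeof(ALIGN) * sizeof(ALIGN);
    b = h->blocks;
    if (b == NULL || BLOCK_BYTES - b->used < need) {
        b = malloc(sizeof *b);   /* always the same size */
        if (b == NULL)
            return NULL;
        b->used = 0;
        b->next = h->blocks;
        h->blocks = b;
    }
    p = (char *)b->data + b->used;
    b->used += need;
    return p;
}

/* Release every block of a heap at once, e.g. the local heap at the
   end of code generation for a function. */
void heap_flush(HEAP *h)
{
    assert(h != NULL);
    while (h->blocks != NULL) {
        BLOCK *next = h->blocks->next;
        free(h->blocks);
        h->blocks = next;
    }
}
```

A compiler built this way would keep one HEAP for the global heap and
one for the local heap, and call heap_flush on the local one after each
function.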

