
               THE CALC PLUS CLASS LIBRARY

                  by Vladimir Schipunov,

                   Copyright (c) 1996.

              Version 1.0, April 2, 1996



                        CONTENTS

          1.      What is CalcPlus library.
          2.      Distribution and warranty notice.
          3.      Installation and running examples.
          4.      How it works.
          4.1     Lexical analyzer.
          4.2     Class CType hierarchy.
          4.3     Class Expression hierarchy.
          4.4     Language and YACC rules.
          4.5     Interface with C++.
          4.6     Modifying source code.
          5.      Known bugs and problems.
          6.      Appendix. Language description.


  1. What is CalcPlus library.
  ----------------------------

  The CalcPlus library is a C++ class library that lets you embed your own
  programming language in a C++ project. Almost any complex C++ application
  needs to be tuned by some external description, e.g. an INI file. CalcPlus
  generalizes this approach: any algorithm or constant needed by the
  application can be moved into a separate text file. When the process
  reaches a key point, it calls a function or procedure stored in that file;
  the interpreter runs it and control returns to the C++ code. The library
  contains an interpreter that understands a simple, nameless procedural
  language. Bi-directional communication between C++ and the interpreted
  code is available.

  If you develop a C++ project and want to give users access to the
  application's algorithms, constants or schemes, then you can probably make
  use of the CalcPlus class library.

  The version of the language that comes with the library supports functions,
  procedures, blocks, a preprocessor, global and local variables and
  constants, and if/for/while statements. Each variable can hold a value of
  type nil, bool, long, float, string or date. Type definitions and arrays
  are allowed. Functions and procedures may be recursive. New functions
  written in C++ can easily be added to the language. The syntax of the
  language can be modified by changing the YACC rules. The interpreter is
  fast enough to be helpful for many tasks.

  The interpreter was successfully used in the server application of
  Btrieve-based financial software. Parts of the C++ code that changed often
  at customers' request were moved into a separate file, which was
  interpreted by CalcPlus.

  Another application of the library was emulation of the Clipper machine
  (except "code blocks"); this made it possible to debug C extensions written
  for Clipper in a normal C++ environment. (Actually, the interpreter has a
  lot in common with Clipper and runs at the same speed on a 16-bit platform;
  on a 32-bit platform it is faster.)

  The interpreter was written in 1995 over approximately 3 months. Parts of
  older C++ and YACC code were used. This is the first freeware release.
  The library is intended to be compiler/OS independent: you should be able
  to compile it on any OS with any C++ compiler; YACC is required. Templates,
  exception handling and RTTI are not used, for compatibility with older
  compilers.


  2. Distribution and warranty notice.
  ------------------------------------

  The CalcPlus Class Library is freeware: you may use, modify and redistribute
  it on condition that the copyright notice is not removed from the source
  code.

  NO WARRANTY OF ANY KIND, YOU USE THIS SOFTWARE AT YOUR OWN RISK.

  Author:

    Vladimir Schipunov, 25 years,

    email:  vschipun@cammail1.attmail.com
    phone:  1-908-2716881

  Any comments, suggestions, extensions and bug reports are very welcome.


  3. Installation and running examples.
  -------------------------------------

  To install the class library, unzip the file calcplus.zip.
  On UNIX platforms you will probably need the options -a -L for unzip;
  they convert file names to lower case and convert DOS text with
  Carriage-Return-Line-Feed to UNIX text with Line-Feed.

  DOS:    pkunzip calcplus.zip
  UNIX:   unzip -a -L calcplus.zip

  This version of the archive contains the following files:

  calcexpr.h          |
  calclex.h           |
  calctype.h          |
  yycalc.h            |
  calcexpr.cpp        |   C++ source code
  calclex.cpp         |
  calctype.cpp        |
  calcplus.cpp        |
  calclib.cpp         |
  yycalc.yac      YACC source code
  calc.mak        makefile for DOS
  gcalc.mak       makefile for UNIX
  readme          short description of the archive
  calcplus.txt    this file
  hello           Hello, world!
  example         selfcheck and example of program
  prime           example of program
  pi              example of program


  To build the interpreter you will need YACC and a C++ compiler.

  DOS:  Command line options for some widely used C++ compilers for DOS are
        written in the file calc.mak. Uncomment or change CC and LINK to your
        C++ compiler and specify your version of YACC if necessary.
        Run make utility:

        make -f calc.mak

  UNIX: gcalc.mak is the makefile for GNU C++, simply run:

        make -f gcalc.mak

  The library was carefully tested with many versions of popular compilers:
  Borland, Watcom, Microsoft, Zortech, GNU. Generally, you should not have
  problems building the interpreter. However, if you do have problems,
  please contact me.

  The first thing you should do after building the interpreter is to check it:

      a)  calc example

          This is the primary check that the interpreter works correctly.
          File 'example' contains most of the syntax constructions the
          interpreter understands. If the build works correctly, the output
          should be:

          {0,1,4,9}
          {0,1,4,9,{1,2,3}}
          {0,1,9,{1,2,3}}
          1 7 255
          {a,b,c}
          {a,{d,e,f},c}
          {a,{d,TRUE,f},c}
          3 ab 1
          exiting...

      b)  calc pi [Number of iterations]

          If the interpreter works correctly you should see
          a value close to the number PI.

      c)  calc prime [Upper limit]

          This test calculates prime numbers below
          the upper limit, 1000 by default.

  Command line switch /d can be added to trace the program, e.g.:

          calc prime 100 /d

  Note that the interpreter uses recursive algorithms and requires enough
  space on the stack. So, if you get the runtime error 'stack overflow',
  increase the stack size.

  If the interpreter works correctly, you can begin adapting it to your own
  tasks. Here is a general description of the library.


  4. How it works.
  ----------------

  First of all, the interpreter is designed using YACC (Yet Another
  Compiler-Compiler). A hand-written lexical analyzer takes input from the
  file(s); the yyparse() function processes the input and builds the program
  as a tree of instructions. Each node of the tree has its own value;
  execution goes from the child nodes to the root.

  So, CalcPlus consists of:

      lexical analyzer,
      yacc parser,
      basic types hierarchy,
      hierarchy of language instructions.


  4.1. Lexical analyzer.
  ----------------------

  In general, function YLex::yylex() is the traditional translator of the
  input stream into tokens for the yacc parser. Class YLex takes care of
  token analysis and performs simple preprocessing. However, it has some
  special features, listed below.

  Tokens are divided into two groups: the first contains keywords, signs of
  arithmetic operations, etc.; the second contains tokens returned by
  descendants of the YLex class. The overridable method YLex::__name() may
  find a word in the lists of already defined symbols: functions,
  procedures, structures, variables or constants. In the first case
  lx*** tokens are used, in the second yy***.

  Several simple container classes provide storage and linear search
  of objects and of references to them. There is also a stack container.
  Objects stored in the stack are destroyed when they are popped, so often
  only references to the objects are pushed onto the stack.

  Preprocessing is performed by pushing onto a stack of input streams.
  Only the 'include', 'define', 'ifdef' and 'endif' directives are supported.
  An input stream is an 'ifstream' for file input and an 'istrstream' for
  the preprocessor.
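
  The stack-of-streams idea can be sketched in a few lines. This is only an
  illustration, not the code of YLex: the names InputStack and Expand are
  invented here, std::istringstream stands in for 'istrstream', and macro
  bodies are assumed to be plain text keyed by identifier. A self-referential
  define would loop forever in this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical reader: the top of the stack is the stream being read now.
class InputStack {
public:
    // Start reading from a new source; the previous one resumes afterwards.
    void Push(const std::string& text) {
        streams_.push(std::unique_ptr<std::istream>(new std::istringstream(text)));
    }
    // Fetch the next character, dropping exhausted streams.
    // Returns false when every stream has been consumed.
    bool Get(char& c) {
        while (!streams_.empty()) {
            if (streams_.top()->get(c)) return true;
            streams_.pop();
        }
        return false;
    }
private:
    std::stack<std::unique_ptr<std::istream>> streams_;
};

// Replace every identifier found in 'macros' by its body. The body is not
// copied to the output directly: it is pushed as a new input stream, so
// macros used inside a body are expanded by the same loop.
std::string Expand(const std::string& text,
                   const std::map<std::string, std::string>& macros) {
    InputStack in;
    in.Push(text);
    std::string out, word;
    for (;;) {
        char c = 0;
        bool more = in.Get(c);
        if (more && (std::isalnum(static_cast<unsigned char>(c)) || c == '_')) {
            word += c;                              // still inside an identifier
            continue;
        }
        auto it = macros.find(word);
        if (!word.empty() && it != macros.end()) {
            if (more) in.Push(std::string(1, c));   // re-read the delimiter later
            in.Push(it->second);                    // read the macro body first
            word.clear();
            continue;
        }
        out += word;                                // ordinary word: copy it
        word.clear();
        if (!more) break;
        out += c;                                   // copy the delimiter
    }
    return out;
}
```

  With PI defined as 3.14 and AREA defined as PI*r*r, Expand applied to
  "x := AREA;" gives "x := 3.14*r*r;".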

  The symbol '->' is treated as 'implication'. When the interpreter
  finds the statement 'A -> B' it treats it as 'if A then B end'.

  Strings delimited by either '...' or "..." are allowed.

  Comments are in C/C++ style:

          //  This is comment
          /*  Another comment */

  The case of letters is insignificant.

  In all other respects the lexical analyzer acts like any other yylex
  function.


  4.2. Class CType hierarchy.
  ---------------------------

  Class CType is the base class for all classes corresponding to the CalcPlus
  types. They are: CNil, CBool, CLong, CDouble, CString, CDate, CArray.
  These types may be used when writing a program for the interpreter. When
  yylex() finds an immediate value in the program, it allocates an object of
  the appropriate type. For instance, the following constants correspond to
  these types:

      2       CLong
      1.2     CDouble
      true    CBool
      'aaa'   CString
      {1,2,3} CArray

  New types may be added simply by adding a new class inherited from the
  abstract base class CType. Values of new types may be returned by C++
  functions, or yyparse() may be changed to translate input tokens into
  instances of the new class. Class CType has a good number of pure virtual
  methods; some of them may actually be implemented as dummies.

  First of all, the class should provide its identification. A unique numeric
  identifier is taken from the enumeration in file calctype.h. This
  identifier should be passed to the constructor of CType and returned by
  method type(). Method name() is the symbolic identification of the class.

  Method copy() returns a pointer to a copy of the object allocated
  by operator new. Usually this is a call to the copy constructor:

      CType* copy() const { return new CNewClass( *this ); }

  Input and output methods should be overridden as well. They are:

      void print( ostream& ) const;
      void get( istream& );

  Other important methods are comparison and assignment:

      CType& operator =( const CType& t );
      int compare( const CType& ) const;

  To provide a standard implementation of these operations for simple
  objects represented by a sequence of bytes, methods data(), size() and
  ptr() are used. These methods give access to the binary data of an object.
  The only difference between data() and ptr() is that data() is a const
  method, and it should be preferred to ptr() in most cases.

      const void *data() const;
      void *ptr();
      int size() const;
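
  How a byte-wise default can serve all simple types may be sketched as
  follows. The classes Value and Long are invented stand-ins, not CType and
  CLong; the enumeration values, the method name assign() (used here instead
  of operator=) and the convention that compare() returns 0 for equal values
  are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>

enum TypeId { idNil, idLong };      // stand-in for the enumeration in calctype.h

class Value {                       // hypothetical stand-in for CType
public:
    explicit Value(TypeId id) : id_(id) {}
    virtual ~Value() {}
    TypeId type() const { return id_; }
    virtual const char* name() const = 0;
    virtual Value* copy() const = 0;            // caller owns the result
    virtual void print(std::ostream&) const = 0;
    virtual const void* data() const = 0;       // read-only view of the bytes
    virtual void* ptr() = 0;                    // writable view of the bytes
    virtual std::size_t size() const = 0;

    // Default comparison for simple types: 0 when the bytes are identical.
    virtual int compare(const Value& other) const {
        if (type() != other.type() || size() != other.size()) return 1;
        return std::memcmp(data(), other.data(), size()) == 0 ? 0 : 1;
    }
    // Default assignment for simple types: copy the bytes.
    virtual bool assign(const Value& other) {
        if (this == &other) return true;
        if (type() != other.type() || size() != other.size()) return false;
        std::memcpy(ptr(), other.data(), size());
        return true;
    }
private:
    TypeId id_;
};

class Long : public Value {         // a simple type gets both defaults for free
public:
    explicit Long(long v = 0) : Value(idLong), value_(v) {}
    const char* name() const override { return "long"; }
    Value* copy() const override { return new Long(*this); }
    void print(std::ostream& os) const override { os << value_; }
    const void* data() const override { return &value_; }
    void* ptr() override { return &value_; }
    std::size_t size() const override { return sizeof value_; }
private:
    long value_;
};
```

  A type that owns memory, such as an array, cannot rely on these defaults
  and has to override comparison and assignment itself.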

  Class CArray is the only non-primitive data type. It implements the
  data() and ptr() methods to return a null pointer and overrides the
  comparison and assignment methods. Arrays may be indexed by strings. Field
  CArray::structure points to an array of CStrings; operator[](const char*)
  looks for the given string in that index array and returns a reference
  to the element of the array that has the same position as the matching
  element of CArray::structure. String indexes are used with structure types:

      struct abc {a,b,c};
      abc a;
      echo a.b;

  This is translated as a CArray indexed by the CArray {'a','b','c'}
  and echo a['b']. Such definitions may be useful, though they are not
  strict. The index search is not optimized for speed.
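
  The string index can be pictured with a small stand-alone class. Record
  is a hypothetical name, and the elements are std::string here only to keep
  the sketch short (in the library they are CType objects); the lookup is the
  same unoptimized linear search.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a CArray that carries a 'structure' index.
class Record {
public:
    explicit Record(const std::vector<std::string>& fields)
        : structure_(fields), values_(fields.size()) {}

    // Linear search in the index array; the element at the same position
    // in the value array is returned.
    std::string& operator[](const std::string& field) {
        for (std::size_t i = 0; i < structure_.size(); ++i)
            if (structure_[i] == field) return values_[i];
        throw std::out_of_range("no such field: " + field);
    }

    // Plain positional access to the same storage.
    std::string& at(std::size_t i) { return values_.at(i); }

private:
    std::vector<std::string> structure_;   // field names, e.g. {"a","b","c"}
    std::vector<std::string> values_;      // one value per field name
};
```

  After Record a({"a","b","c"}), the expression a["b"] refers to the same
  element as a.at(1).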

  Classes of the CType hierarchy are relatively simple. For details of
  the implementation refer to the source code in file calctype.cpp.


  4.3. Class Expression hierarchy.
  --------------------------------

  Classes of the Expression hierarchy are the main part of the interpreter.
  After parsing, all lexical input is converted into a tree whose nodes are
  instances of classes derived from Expression.

  Each node, being an instance of Expression, has:

      1) field 'flags', which shows the current state of the node
      2) field 'v', which is the pointer to the value of the node
      3) method 'Calc', which is called every time the node is
         calculated

  Let us see what the process actually does during interpretation. The
  recursive method Expression::Calculate runs first on the child nodes. If
  the execution of the child nodes was not interrupted by setting non-zero
  flags, method Calc() of the node is called. Calc() refers to the values of
  its child nodes. For instance, if we have derived class Addition from the
  abstract class Expression, then method Addition::Calc() could look like:

      void Addition::Calc()
      {
          *v = *child[0]->v + *child[1]->v;
      }

  More precisely, we should check the types of the arguments, because this
  is an interpreter and there is no type checking at compile time.
  So the method should rather be:

      void Addition::Calc()
      {
          if( child[0]->type()!=idLong ||
              child[1]->type()!=idLong )
          {
              flags = exError;
              return;
          }
          delete v;
          v = new CLong;
          (CLong&)(*v) = (CLong&)(*child[0]->v) + (CLong&)(*child[1]->v);
      }

  The process is controlled by the state of the bit flags of the nodes. There
  are normal flags, like the indication of a function return or an exit from
  a while/for statement, and flags showing that a runtime error occurred.
  Flags are copied from the children to the parents. Analysis of the state
  of the flags can show the location of the node where the error occurred.
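
  The evaluation order and the propagation of flags can be shown on a
  reduced tree. The sketch below is not the Expression class: Node and its
  descendants are invented names, values are plain long instead of CType
  pointers, and exNone is an assumed name for "no flags set".

```cpp
#include <cassert>
#include <memory>
#include <vector>

enum Flags { exNone = 0, exError = 1 };   // hypothetical flag values

struct Node {
    int flags = exNone;                         // state of the node
    long v = 0;                                 // value of the node
    std::vector<std::unique_ptr<Node>> child;   // owned child nodes

    virtual ~Node() {}
    void Add(Node* n) { child.emplace_back(n); }   // takes ownership of n

    // Children are calculated first; a non-zero flag raised by a child is
    // copied to the parent and stops the calculation of this node.
    void Calculate() {
        flags = exNone;
        for (auto& c : child) {
            c->Calculate();
            if (c->flags) { flags = c->flags; return; }
        }
        Calc();
    }
    virtual void Calc() = 0;
};

struct Immediate : Node {
    explicit Immediate(long x) { v = x; }
    void Calc() override {}                     // value is already known
};

struct Addition : Node {
    void Calc() override { v = child[0]->v + child[1]->v; }
};

struct Division : Node {
    void Calc() override {
        if (child[1]->v == 0) { flags = exError; return; }   // runtime error
        v = child[0]->v / child[1]->v;
    }
};
```

  Calculating the tree for 2 + 6/3 leaves 4 in the root; in the tree for
  1 + 1/0 the error flag raised by the division reaches the root and the
  addition is never performed.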

  Every statement written in the input language is translated into an
  instance of the corresponding class of the Expression hierarchy. Here is
  a picture of the hierarchy.


      Expression                      base class
      |
      |-------XImmediate              immediate value
      |       |
      |       +-------XEndl           line-feed constant
      |
      |-------XBreak                  exit from for/while
      |
      |-------XAr1                    unary arithmetic
      |       |
      |       +-------XBool1          unary boolean
      |
      |-------XAr2                    binary arithmetic
      |       |
      |       +-------XComparison     comparison
      |       |
      |       +-------XBool2          binary boolean
      |
      |-------XVariable               variable
      |
      |-------XEcho                   output on cout
      |
      |-------XConditional            if ... then ... else ... end
      |
      |-------XLoop                   like 'continue' in C
      |
      |-------XWhile                  while/for
      |
      |-------XBlock                  begin ... end
      |       |
      |       +-------XFunction       func/proc ... end
      |               |
      |               +-------XUserFunction       external C++ functions
      |
      |-------XCall                   function call: f(1,2,3)
      |       |
      |       +-------XDynamic        function by name: &('f')(1,2,3)
      |
      |-------XReturn                 return expr
      |
      +-------XSet                    arrays: {1,2,{1,2,3}}


  Though C++ is a much easier language than English, and a full description
  of the algorithms and methods used can be found in file calcexpr.cpp, in
  the next paragraphs we will discuss some non-obvious points of the
  architecture of the Expression hierarchy.

  Class Var is used to store the values of variables. There may be many
  references (nodes of the tree) to a variable in different expressions;
  each reference is an instance of class XVariable. Class XVariable has
  the fields:

      PrintObj* obj;
      CType** ptr;
      int ref;

  Field obj is used for debugging output, ptr for setting a new value of
  the variable. Field ref is used as a flag for passing an argument to a
  function by reference. This field is set by class XCall temporarily, while
  the function runs.

  Method XVariable::Calc() acts differently depending on the number of
  child nodes. If the number of children is zero, the variable is used
  inside another expression, e.g. (x+y). If the node of class XVariable
  has exactly one child node, this is an assignment: x := expr. When the
  number of child nodes is two, this is an array element inside an
  expression: (x[i]+y[j]). The only case left is three child nodes, which is
  an assignment to an array element: x[i] := expr. Here child[0] is
  considered to be the array, child[1] the index in the array, and child[2]
  the expression being assigned.

  Class XBlock is a composite expression consisting of a number of
  subexpressions. The corresponding statement in the input language is:

      begin
          expr1
          expr2
          ...
          exprn
      end

  Every block limits the visibility of the variables defined inside it.
  That is why class XBlock has the fields vars, funcs and structs.
  Actually, the list of functions is used only in the global context.

  A function is a block; its arguments are local variables. A function has
  only one child node of class XBlock, which is the body of the function.

  Just as XVariable refers to Var, XCall refers to XFunction.
  The arguments of the function are child nodes of XCall. Method
  XCall::TieArgs() is called twice. The first call assigns the values of the
  arguments to the local variables of XFunction. The second call assigns
  values back to the arguments passed by reference.
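
  The two calls can be sketched as copy-in before the body and copy-out
  after it. The code below only illustrates that idea: the struct Argument,
  the free function TieArgs with a boolean phase parameter and the sample
  body "x := x+1; return x" are assumptions, not the interface of XCall.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Argument {
    long* caller_var;   // variable in the calling expression
    bool  by_ref;       // was the argument written as *x ?
};

// after_call == false: copy argument values into the function's locals.
// after_call == true:  copy locals back, but only for by-reference arguments.
void TieArgs(std::vector<long>& locals,
             const std::vector<Argument>& args, bool after_call) {
    assert(locals.size() == args.size());
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (!after_call)
            locals[i] = *args[i].caller_var;
        else if (args[i].by_ref)
            *args[i].caller_var = locals[i];
    }
}

// Simulates calling a function whose body is: x := x+1; return x;
long CallIncrement(long& x, bool by_ref) {
    std::vector<long> locals(1);
    std::vector<Argument> args{{&x, by_ref}};
    TieArgs(locals, args, false);   // first call: values in
    locals[0] += 1;                 // body of the function
    long result = locals[0];
    TieArgs(locals, args, true);    // second call: values back
    return result;
}
```

  Starting from x equal to 0, a call by reference returns 1 and leaves x
  equal to 1; a following call by value returns 2 and leaves x unchanged.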

  It is easy to see that keeping temporary results of calculation in the
  nodes of the tree does not, by itself, allow recursive function calls.
  To solve this problem, method Expression::Recursion is used. There is a
  stack of pointers to values. On a signal, all subnodes of the function
  put their values onto the stack. This action is synchronized with the
  passing of arguments in method XCall::TieArgs().
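
  Why the stack is needed can be seen on a factorial whose only local
  variable lives in a single, shared slot, the way a node value does. The
  struct FactNodes and the 'save' switch are invented for this sketch; the
  library saves the values of all subnodes, not just one.

```cpp
#include <cassert>
#include <stack>

struct FactNodes {
    long n = 0;               // the one shared slot holding local variable n
    std::stack<long> saved;   // values parked while a nested call runs

    // Computes n! recursively. With save == false the nested call
    // overwrites the caller's n and the result is wrong.
    long Call(long arg, bool save) {
        if (save) saved.push(n);       // park the caller's value
        n = arg;                       // tie the argument to the local
        long result = 1;
        if (n > 1) {
            long rest = Call(n - 1, save);
            result = n * rest;         // n is read after the nested call
        }
        if (save) {                    // restore the caller's value
            n = saved.top();
            saved.pop();
        }
        return result;
    }
};
```

  With saving enabled Call(4, true) returns 24. Without it every level
  sees the value left by the innermost call, and Call(4, false) returns 1.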

  There is no separate class XFor. Class XWhile provides both types of
  iterations: for and while.

  Most of the other classes are self-explanatory.


  4.4. Language and YACC rules.
  -----------------------------

  Class CalcPlus is derived from class YLex. It overrides method yyparse()
  for YACC parsing of the tokens from the input stream. The language is
  described by a set of rules for YACC; generally, every rule simply
  translates its arguments into the appropriate instance of a class of the
  Expression hierarchy. A stack mechanism is used for correct context
  handling. Each recursive syntax construction has a corresponding stack
  container in class CalcPlus. They are:

      LexStack Blocks;    //  blocks
      LexStack Calls;     //  function calls
      LexStack Cond1;     //  'if'    part of the condition
      LexStack Cond2;     //  'else'  part of the condition
      LexStack Sets;      //  sets
      LexStack Idx;       //  array indexes

  The often used definitions XBEG, XEND and XSEQ are intended for handling
  the current block context. When a new variable is defined, it is stored in
  the list of variables of the current block. The if/else/for/while
  statements have implicit blocks inside them:

      if a then
          a:=b;
      end;

  This is actually translated as:

      if a then
          begin
              // local variables may be defined here
              a:=b;
          end;
      end;
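
  The effect of nested blocks on name lookup can be sketched with a stack
  of symbol tables. The class Scopes and its methods are hypothetical and
  hold plain long values; in the library the lists belong to XBlock and the
  stack is the LexStack Blocks.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of block context: one symbol table per open block.
class Scopes {
public:
    void Begin() { blocks_.emplace_back(); }     // 'begin', or an implicit block
    void End() {                                 // 'end': locals disappear
        assert(!blocks_.empty());
        blocks_.pop_back();
    }
    // A new variable is stored in the list of the current (innermost) block.
    void Define(const std::string& name, long value) {
        assert(!blocks_.empty());
        blocks_.back()[name] = value;
    }
    // Search from the innermost block outwards; nullptr if not visible.
    const long* Find(const std::string& name) const {
        for (auto b = blocks_.rbegin(); b != blocks_.rend(); ++b) {
            auto it = b->find(name);
            if (it != b->end()) return &it->second;
        }
        return nullptr;
    }
private:
    std::vector<std::map<std::string, long>> blocks_;
};
```

  A variable defined inside the implicit block of an 'if' is found while
  that block is open and is no longer visible after its 'end'.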

  Class CalcPlus overrides method __name and searches for symbols that are
  already defined. This makes syntax analysis easier. Method Link uses
  a recursive tree search to connect XCall nodes with the corresponding
  XFunction. Simple diagnostics are performed.

  Method UserSym() is not implemented. It was initially added to allow
  various extensions. For example, we can change method __name to translate
  symbols beginning with the character '@' as lxUser:

      Token CalcPlus::__name()
      {
          Token t = YLex::__name();
          if( t == lxName && *Lex == '@' )
          {
              Expression* e = new XImmediate( new CString( Lex+1 ));
              YYLVAL( e );
              return lxUser;
          }
          ...
      }

  When function yyparse() gets such a token it calls method UserSym().
  There are two possible calls: with one or with two arguments. One argument
  is passed if the token is detected on the right side of an assignment, two
  arguments if the token is on the left. A possible implementation of
  method UserSym is:

      Expression* CalcPlus::UserSym( Expression *e1, Expression *e2 )
      {
          Expression *e = new XEcho;
          if( e1 ) e->Add( e1 );
          if( e2 ) e->Add( e2 );
          return e;
      }

  So, the statement "@Hello := ' World!';" will print: Hello World!


  4.5. Interface with C++.
  ------------------------

  It is possible both to call C++ code from interpreter code
  and to call interpreter functions and procedures from C++.

  There are a number of definitions at the end of file calcexpr.h
  that help to write C++ functions visible from the interpreter.
  Let's look at the implementation of functions EMPTY and GETENV:

      USER_FUNC( Empty )      //  Is value empty?
          DEF_ARGX( 0, x )    //  We don't know the type of argument
          RETURNS( Bool )     //  Function returns TRUE or FALSE
          ret = x.empty();    //  Getting the result
      USER_END                //  Done

      USER_FUNC( Getenv )                 //  Reading environment variable
          DEF_ARGV( 0, var, String )      //  Expecting string argument
          RETURNS( String )               //  Result will be the string also
          const char* s = getenv( var );  //  Calling C function
          ret = s ? s : "";               //  Check if var is not in env.
      USER_END                            //  Done

  Functions and procedures must be registered before running
  the interpreter to make them visible from the program. Function
  UserLib, defined in module calclib.cpp, performs the registration:

      void UserLib()
      {
          RegFunc( "EMPTY", Empty );
          RegFunc( "GETENV", Getenv );

          //
          //  Other functions
          //
      }

  If a function takes more than one argument, the number of arguments
  should be passed as the third parameter. For procedures DEF_PROC and
  RegProc are used.
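
  What registration amounts to can be sketched as a table from upper-case
  names to function pointers and argument counts. Everything below is an
  assumption made for illustration: RegFuncSketch, CallSketch, the UserFn
  signature working on strings and the sample function Concat are not part
  of the library, whose RegFunc works with the USER_FUNC definitions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical signature: arguments and result are plain strings here.
typedef std::string (*UserFn)(const std::vector<std::string>& args);

struct Entry { UserFn fn; std::size_t argc; };

static std::map<std::string, Entry>& Registry() {
    static std::map<std::string, Entry> table;
    return table;
}

// The language ignores case, so names are stored and looked up in upper case.
static std::string Upper(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// The argument count defaults to one and is given explicitly otherwise.
void RegFuncSketch(const std::string& name, UserFn fn, std::size_t argc = 1) {
    assert(fn != nullptr);
    Registry()[Upper(name)] = Entry{fn, argc};
}

// Returns false for an unknown name or a wrong number of arguments.
bool CallSketch(const std::string& name,
                const std::vector<std::string>& args, std::string& result) {
    auto it = Registry().find(Upper(name));
    if (it == Registry().end() || it->second.argc != args.size()) return false;
    result = it->second.fn(args);
    return true;
}

// Two sample functions to register.
std::string Empty(const std::vector<std::string>& a) {
    return a[0].empty() ? "TRUE" : "FALSE";
}
std::string Concat(const std::vector<std::string>& a) { return a[0] + a[1]; }
```

  After RegFuncSketch("EMPTY", Empty) and RegFuncSketch("CONCAT", Concat, 2)
  both functions can be called by name in any letter case, and a call with
  the wrong number of arguments is rejected.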

  Calling an interpreter function from C++ code is illustrated in file
  calcplus.cpp. The interpreter function 'atexit' is called when the program
  finishes. Method CalcPlus::Call() takes as arguments a pointer to the
  function name, the number of arguments, and pointers to the CType
  arguments:

      if( calc.Global->funcs( "atexit" ))
      {
          CString s("exiting... ");
          calc.Call( "atexit", 1, &s );
      }


  4.6. Modifying source code.
  ---------------------------

  If you are going to use the library in your project, then most
  likely you will have to change its source code for your own needs.
  There are different ways of modifying the source code, and you should
  choose the best one. Which one is best depends on how serious the
  changes you need are.

  The simplest way to extend the library is to add new functions visible
  from the interpreter. This can be done by modifying file calclib.cpp.

  Another easy modification is to change the language syntax,
  see file yycalc.yac.

  A more difficult solution may require changing the CType and Expression
  hierarchies. In this case you should override the necessary methods and
  probably change the YACC rules. Actually, the whole CType hierarchy can be
  replaced by your own hierarchy, if you already have something like that in
  your project.

  As an example of changes in the source code, let us consider the steps
  necessary to implement big number arithmetic:

      a)  We need a class derived from CType, which will store, print, and
          calculate very big numbers (hundreds of significant digits).

      b)  Method Calc() of classes XAr1, XAr2, XComparison should be changed
          to be able to handle big numbers.

      c)  YLex::yylex() must detect a big number in the input stream and
          return the corresponding token. This can also be done by adding
          a special conversion function.

      d)  CalcPlus::yyparse() must generate new XImmediate( new BigNumber )
          when such a token is detected.

  After we have done these steps, hopefully, big numbers arithmetic
  will be available from the interpreter's programs.
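
  For step a), the core of such a type could look like the sketch below.
  It is stand-alone on purpose: BigNumber here is not derived from CType,
  handles only non-negative decimal numbers and implements only addition;
  the method names str() and print() are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <string>

// Hypothetical big number: decimal digits, most significant digit first.
class BigNumber {
public:
    explicit BigNumber(const std::string& digits = "0") : digits_(digits) {
        assert(!digits_.empty());
    }

    // Schoolbook addition, digit by digit from the right, with carry.
    BigNumber operator+(const BigNumber& rhs) const {
        std::string out;
        int carry = 0;
        auto a = digits_.rbegin();
        auto b = rhs.digits_.rbegin();
        while (a != digits_.rend() || b != rhs.digits_.rend() || carry) {
            int sum = carry;
            if (a != digits_.rend()) sum += *a++ - '0';
            if (b != rhs.digits_.rend()) sum += *b++ - '0';
            out.push_back(static_cast<char>('0' + sum % 10));
            carry = sum / 10;
        }
        std::reverse(out.begin(), out.end());   // digits were produced backwards
        return BigNumber(out);
    }

    void print(std::ostream& os) const { os << digits_; }
    const std::string& str() const { return digits_; }

private:
    std::string digits_;
};
```

  Adding 1 to a number made of thirty nines gives 1 followed by thirty
  zeros, which no built-in integer type of the time could hold.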


  5. Known bugs and problems.
  ---------------------------

  The biggest known problem is obvious: error diagnostics are too simple.
  So a user with little programming experience may have a lot of problems
  trying to write a program for the interpreter.

  The parser reports line number 0 when EOF is detected inside an unclosed
  block. So, line number 0 means the last line of the file.

  Complex recursive define directives may not work properly.
  There has been no real reason yet to develop a full built-in preprocessor.

  Passing arguments to a function by reference is not entirely correct.
  There were problems when operator throw was used in C++ code called
  from inside such a function. However, this problem can easily be avoided
  by adding a flag to the variable saying that it has passed its value to
  the function, so operator delete must not be used on the pointer to the
  value.

  If you find more errors, bugs or problems, please let me know.


  6. Appendix. Language description.
  ----------------------------------

  This is an informal description of the CalcPlus interpreter's language.
  Most of the syntax looks and works like the same syntax in other languages.
  The language has a lot in common with C and Clipper.

  Like C:

      Module is the unit of compilation.
      Program can consist of more than one module.
      Start symbol is MAIN if not redefined.
      File can include other files by #include 'filename' directive.
      Global variables may be defined in the module context.
      Semicolon ';' is the separator between statements.
      Sign '!' is the logical NOT.

  Not like C:

      Case is insignificant.
      Preprocessor has only 'define', 'ifdef', 'endif' directives.
      No logical operations available for preprocessor.
      There are no static variables.
      Assignment is ':='.
      Strings may be delimited by either (') or ("): 'str1', "str2".
      A single equals sign is used for comparison: if a=b then ... end;
      EXIT and LOOP keywords are the same as 'break' and 'continue' in C.
      OR, AND keywords used instead of ||, &&.
      ARRAY, ADEL, AADD should be used for array access.
      ARGC, ARGV functions provide access to the command line arguments.
      '<>' used instead of '!=' for logical not_equal.

  Functions and procedures must be declared before they are called; declaring
  the arguments is not required. Functions and procedures may contain
  a return statement. The default return value is NIL.

      func a;
      proc b;

      ...
          var x:=a(1,2,3);
          b('abc');
      ...

      func a(x,y,z)
          return x+y+z;
      end;

      proc b(x)
          echo x,endl;
      end;

  Arguments preceded by the sign '*' are passed to the function by reference.
  Arrays are always passed by reference. The result of the program below
  will be 1 2 3 3:

      func a(x)
          x:=x+1;
          return x;
      end;
      proc main()
          var x:=0;
          echo a(*x),' ',a(*x),' ';
          echo a( x),' ',a( x);
      end;
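
  For readers who think in C++, the same program can be written with a
  reference parameter and a value parameter. This is only an analogy, with
  invented function names; in the interpreter the choice is made at the call
  site with '*', not in the declaration of the function.

```cpp
#include <cassert>
#include <sstream>
#include <string>

long ByRef(long& x)  { x = x + 1; return x; }   // like calling a(*x)
long ByValue(long x) { x = x + 1; return x; }   // like calling a( x)

// Builds the same output as the 'main' procedure above.
std::string Demo() {
    long x = 0;
    std::ostringstream out;
    // Separate statements keep the order of the calls explicit.
    out << ByRef(x) << ' ';     // x becomes 1
    out << ByRef(x) << ' ';     // x becomes 2
    out << ByValue(x) << ' ';   // returns 3, x stays 2
    out << ByValue(x);          // returns 3 again
    return out.str();           // "1 2 3 3"
}
```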


  Blocks are allowed anywhere inside a function or procedure:

      begin
          expr1;
          expr2;
          ...
          exprn;
      end;

  Variables, constants and structures may be declared anywhere in the
  program; they are visible only inside the block where they were defined.
  There is no type checking. Structures are actually arrays indexed by
  arrays of strings.

      var     count := 1;
      const   pi := 3.14;
      struct  abc {a,b,c};
      abc     test;

  Output is performed by the ECHO statement followed by a list of
  expressions. ENDL is the Line-Feed constant.

      echo '2*2=',2*2,endl;

  All control structures (begin, if, for, while, func, proc) must be
  closed by the keyword 'end'. Variables defined inside such structures
  are local to them. The STEP keyword may be omitted in a FOR statement;
  the step is 1 by default.

      if a=b then
          f1();
      else
          f2();
      end;

      for i:=1 to 10 step 2 do
          echo i*i,' ';
      end;

      while a>b do
          b := b+1;
      end;

  All the arithmetic operations are usual:  ((2+3)-1)*6/2.


  <EOF>
  -----
