Norstrulde Language 84
Quick build and test instructions:
$ make
$ cd source
$ ./hello_world
hello, world
$ ./factorial 5
120
Here are some facts about Language 84, its compiler, and its runtime:
- The compiler is written in Language 84; it is "self-hosting".
- The compiler produces C code. Lots of it.
- The syntax is LL(1).
- Parsing is not sensitive to indentation.
- Mutually-recursive variable bindings are not supported.
- The parser uses mutual recursion; for example, parse_applications calls
parse_expr and vice versa. See syntax.84.
- Variable assignment is not supported.
- Record component access does not involve any string processing at run
time. See module_fetch() in runtime.c.
- Function arguments and local variables are stored on a stack.
- The runtime has exactly one general-purpose register.
- Control flow operators include: If, Cond, And, Or, When, Match, Switch.
- Tail-call optimization is only performed when the tail-call is prefixed
by the Goto keyword.
- There are no looping operators. Recursion and Goto must be used instead.
- Because of the lexical rules regarding capitalization, every word
comprised of all-uppercase or all-lowercase letters is an acceptable
identifier ("if" and "let" are acceptable variable names).
- Each .84 source file defines a single value to be constructed at
program initialization time.
- Partial application of functions is not supported but the $ operator
familiar to Haskell programmers is supported.
- There is exactly one kind of data structure that is not immutable and it
is rarely used by the compiler.
- The runtime has no garbage collector.
- There is no type system yet but that may change in the future.
- There is no object system.
- There is no function overloading.
- There are no first-class continuations and there is no exception system
but such control effects may be supported in the future in some form.
- Local scope blocks grow top-down; global scope blocks grow bottom-up.
- The In keyword separates local scope levels and each scope level may
contain any number of Let, Define, and Do forms.
- Cyclical package dependencies are statically detected and forbidden.
- Unbound variable references are statically detected and forbidden.
- Standard algorithms and data structures used by the compiler include:
- Tarjan's algorithm for finding strongly connected components.
- Insertion sort.
- Recursive-descent parsing.
- Precedence climbing for parsing infix expressions.
- Lambda-lifting.
- Immutable 2-3 (balanced) trees.
- Immutable lists.
- The generated C code is very similar to machine code; it doesn't use
loops, function definitions, or complex nested expressions; and it only
uses a few local variables. When the compiler is modified to emit
machine code, GCC will only be needed to build the runtime and to
perform linking. Hence, overall compile times will become much shorter.
- The runtime only emits error messages when requested by the compiled
program. Therefore, a compiled program that has bug may silently fail
and exit with nonzero status. Be sure to check exit statuses.
- When debugging, it's useful to set a breakpoint on the runtime's halt()
function.
To build the programs, you only need GCC and GNU Make. There are a few
different build targets for the compiler and for some sample programs. Please
refer to source/Makefile.