How to write a (tiny) compiler in a weekend

Gigahertz-class processors, fast 128 MB memories and hard discs storing tens of gigabytes have made a joke of what used to be hotly debated optimisation and memory management issues. But there may still be some reason to take a last nostalgic look at what would have been an interesting project some time ago. For example, an old, inexpensive installation (which did not warrant a powerful processor) may need a single fast routine, and its programmer may not be proficient enough in machine code to write it.

Writing a 3 kilobyte compiler is bound to attract the wrath of fanatics in the structured programming community, and is unlikely to appeal to anybody (very reasonably) expecting every possible facility. Such readers may prefer the site's less controversial, more recent sections, as may anyone who expects to be offered several ways of getting at the same result. By now, you are probably well aware that this site is fervently minimalist!

Now that you have been warned, here are the main compiler shortcuts:

The only variable type supported is the two-byte integer. Boolean variables have a value of zero if false, and non-zero otherwise. Strings and arrays are implemented by the programmer directly accessing memory.
The expression analyser is a major thorn in the side of anybody contemplating a mini-compiler write-up. It involves two mutually recursive procedures that parse terms and factors and eventually produce a reverse Polish form. It is an interesting programming exercise, but a tall order for a 3 kilobyte compiler. One solution is to omit it! The simple expressions allowed are detailed shortly.
Calls can be made to any assembly routines available; see below for the kind of routines you would want to have. Procedures are allowed too, but without local variables or parameters: the latter have to be passed explicitly by the programmer, through global variables. Recursion is allowed, but not very useful, since only global variables are implemented. Doing otherwise would complicate stack handling and variable access too much for the purposes of this project. Besides, recursion, although sometimes most convenient, is not entirely indispensable.
An If .. Condition .. Then .. Block .. Else .. Block .. EndIf construct is supported. A block consists of one or more commands separated by a suitable delimiter.
The While .. Condition .. Block .. Whilend loop can provide all kinds of iteration, at least if the (possible) counter variable is handled explicitly. Forward references (for example, a jump past a Whilend or EndIf that has not been compiled yet) require a two-pass compilation.
Additional commands can be implemented according to their frequency of use.
Error reporting is very basic: the program to be compiled can be debugged on the resident interpreter (whose limited speed can occasionally be a problem).
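For comparison, here is roughly what the omitted expression analyser involves. This is a minimal Python sketch, not the author's code: it assumes single-character operands, the operators + - * / and parentheses, and does no error checking. As described in the list above, two mutually recursive procedures do the work: expr handles + and -, while term handles * and / and treats a bracketed sub-expression as a factor by calling expr again.

```python
def to_rpn(infix: str) -> str:
    """Translate an infix expression into reverse Polish notation.

    Assumes single-character operands, the operators + - * /,
    parentheses, no spaces and well-formed input (no error checks).
    """
    pos = 0     # index of the next unread character
    out = []    # RPN output, built left to right

    def expr() -> None:
        # expr := term { ('+' | '-') term }
        nonlocal pos
        term()
        while pos < len(infix) and infix[pos] in "+-":
            op = infix[pos]
            pos += 1
            term()
            out.append(op)      # an operator follows both its operands

    def term() -> None:
        # term := factor { ('*' | '/') factor }; a factor is an operand
        # or a bracketed expr, handled inline, hence the mutual recursion.
        nonlocal pos
        op = None
        while True:
            if infix[pos] == "(":
                pos += 1        # skip '('
                expr()
                pos += 1        # skip ')'
            else:
                out.append(infix[pos])
                pos += 1
            if op:
                out.append(op)  # emit the pending * or / after its operands
            if pos >= len(infix) or infix[pos] not in "*/":
                break
            op = infix[pos]
            pos += 1

    expr()
    return "".join(out)


print(to_rpn("A*(B+C)-D"))  # prints ABC+*D-
```

Once an expression is in reverse Polish form, generating code is a simple stack walk: push each operand, and let each operator combine the top two values. Restricting every statement to a single operator, as this compiler does, hands that job back to the programmer.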
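Forward references appear as soon as a loop is compiled: the conditional jump that leaves a While loop must point past Whilend, whose address is not yet known. The Python sketch below shows one way two passes resolve this. The mnemonics (TEST, JZ, JMP), the function name compile_while and the rule that every instruction, including each compiled body statement, takes exactly one address are all assumptions for illustration. A one-pass compiler could instead leave a hole and patch it once the address is known (backpatching); the sketch sticks to two passes.

```python
def compile_while(cond: str, body: list[str], start: int = 0) -> list[str]:
    """Compile 'While cond .. body .. Whilend' for a hypothetical machine
    in which every instruction, including each compiled body statement,
    occupies exactly one address."""
    # Code with symbolic labels. TEST sets a flag from the condition and
    # JZ jumps if it is false (zero). 'JZ end' is a forward reference:
    # the address of 'end' is unknown until the whole body is generated.
    code = [("label", "top"), ("TEST", cond), ("JZ", "end")]
    code += [(stmt, None) for stmt in body]
    code += [("JMP", "top"), ("label", "end")]

    # Pass 1: count addresses and record where each label falls.
    labels = {}
    addr = start
    for op, arg in code:
        if op == "label":
            labels[arg] = addr      # a label occupies no space
        else:
            addr += 1

    # Pass 2: emit the instructions, replacing label names by addresses.
    out = []
    for op, arg in code:
        if op == "label":
            continue
        if op in ("JZ", "JMP"):
            out.append(f"{op} {labels[arg]}")
        elif arg is None:
            out.append(op)          # placeholder for a compiled statement
        else:
            out.append(f"{op} {arg}")
    return out


for line in compile_while("A>B", ["A=A-B", "C=C+1"]):
    print(line)
```

Here the first pass places 'end' at address 5, so the second pass can emit JZ 5 ahead of the body it jumps over. Nested loops would also need unique label names, which this sketch does not handle.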

Compared to high-level languages (and processors!) offering 500+ commands, this looks terribly limited. However, the mini-compiler is not intended to produce stand-alone programs, only a fast routine for a program that is otherwise undemanding in terms of speed. The fair comparison is with having to write that section in machine code! Interfacing between the main program and the fast routine is by the primitive method of directly accessing common memory.

Here are typical applications:

Fast copying of a block of memory.
Fast searching for a string in memory.
Fast sorting.
Interfacing: detecting a pulse of very short width.
Capturing, storing and processing a large number of fast samples.
Image processing, screen animation and simple computer games.

Here are the simple expressions possible:

A=1
A=B+C
B=A Or C
C=A>B
*A=*B
A=*B
*A=B

The star operator accesses memory indirectly: *A is the memory location whose address is held in A (a pointer dereference, in C parlance). Arithmetic operations also include subtraction, multiplication, integer division and remainder. Logical operations also feature And, Xor and Not. Relational operations also comprise less than, equal and not equal. More complicated expressions have to be broken into simpler ones; for example, A=(B+C)*D becomes T=B+C followed by A=T*D, with T a spare variable. (Think of this as a preamble to machine code programming!)
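Because every statement holds at most one operator, compiling it needs no expression tree: each form expands into a fixed template of two or three instructions. The Python sketch below illustrates this for a hypothetical accumulator machine. The mnemonics (LD, ST, ADD, CMPGT and so on), the operand notation (#n for a constant, (A) for the location A points to) and the function names are all invented for the example, and remainder is left out because its keyword is not specified.

```python
import re

# Hypothetical accumulator-machine mnemonics, one per operator.
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV",
       "and": "AND", "or": "OR", "xor": "XOR",
       ">": "CMPGT", "<": "CMPLT", "=": "CMPEQ", "<>": "CMPNE"}

# Names/keywords, numbers, the two-character '<>', single-character symbols.
TOKEN = re.compile(r"[A-Za-z]\w*|\d+|<>|[-+*/<>=]")


def operand(tokens: list[str]) -> str:
    """Consume one operand: '#n' for a constant, 'A' for a variable,
    '(A)' for the memory location whose address A holds (written *A)."""
    tok = tokens.pop(0)
    if tok == "*":
        return f"({tokens.pop(0)})"
    if tok.isdigit():
        return f"#{tok}"
    return tok


def compile_stmt(stmt: str) -> list[str]:
    """Expand one single-operator assignment into a fixed template."""
    dest, rhs = (part.strip() for part in stmt.split("=", 1))
    target = f"({dest[1:].strip()})" if dest.startswith("*") else dest
    tokens = TOKEN.findall(rhs)
    if tokens[0].lower() == "not":           # C=Not A
        tokens.pop(0)
        code = [f"LD {operand(tokens)}", "NOT"]
    else:
        code = [f"LD {operand(tokens)}"]     # A=1, A=B or A=*B
        if tokens:                           # A=B+C, B=A Or C, C=A>B ...
            op = tokens.pop(0).lower()
            code.append(f"{OPS[op]} {operand(tokens)}")
    code.append(f"ST {target}")
    return code


for s in ["A=1", "A=B+C", "B=A Or C", "C=A>B", "*A=*B", "A=*B", "*A=B"]:
    print(f"{s:10} -> {'; '.join(compile_stmt(s))}")
```

A leading star marks a pointer only where an operand is expected, so A=B*C is still a multiplication. The compiler's output is therefore close to what a machine code programmer would write statement by statement, which is much of the reason for its speed.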

Having severely restricted the language to obtain a tiny compiler that produces fast code, it would be a pity to waste that speed in IO calls. OS input-output routines have to cater for graphics, different devices, colours, character sets, screen resolutions, windows of different sizes at different locations, scrolling, compatibility and so on. Understandably, they are not particularly fast. If only a limited range of options is needed, it is certainly possible to improve on their speed.

Everything that is nice in organised programming has been eradicated: are the results any good? Well, a routine was compiled to capture and store samples on a slow board. The compiled executable could handle 50,000 samples per second. That is nowhere near the 165,000 samples per second achieved by handcrafted code, but a clear improvement on the 400 samples per second managed by the high-level language interpreter. After all, an interpreter spends most of its time deciding what to do, not actually doing it.
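Putting the quoted rates side by side (simple arithmetic on the figures above, no new measurements):

```latex
\begin{aligned}
\text{compiled vs. interpreted:} &\quad \frac{50\,000}{400} = 125\times \\
\text{handcrafted vs. compiled:} &\quad \frac{165\,000}{50\,000} = 3.3\times \\
\text{time per sample:} &\quad \frac{1}{400}\,\mathrm{s} = 2.5\,\mathrm{ms},\quad
  \frac{1}{50\,000}\,\mathrm{s} = 20\,\mu\mathrm{s},\quad
  \frac{1}{165\,000}\,\mathrm{s} \approx 6.1\,\mu\mathrm{s}
\end{aligned}
```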

Since fast hardware has made the question largely academic, this section will not look in depth at why compiled code and hand-written code differ so much. Most of the time, it is quite acceptable to trade efficiency for ease of programming. However, here are the principal compiler drawbacks:

Often, a value is written to a location that already contains that value! For example, the high-order byte of a two-byte counter only needs to be updated once every 256 increments.
Other compilers may carry out a full four-byte addition when a number between 1 and 8 only needs to be incremented by one. Defining byte, int and longint types does not necessarily solve the problem: if an implicit cast between types is needed, overall performance suffers.
Using registers for often-used variables cannot be done effectively without human help: the number of times a loop will be executed may not be known until run time, and keeping statistics on variable access at run time would defeat the purpose, since it takes longer than it saves.
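The first drawback is easy to see in a simulation. In this Python sketch (the Memory class and both increment functions are hypothetical), a two-byte counter is stored low byte first and every byte write is counted. The naive version stores both bytes on each increment, as a simple compiler would; the handcrafted version touches the high byte only when the low byte wraps round to zero.

```python
class Memory:
    """Simulated byte-addressed memory that counts every byte written."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.writes = 0

    def store(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFF
        self.writes += 1


def increment_naive(mem: Memory, addr: int) -> None:
    """Compiler style: form the full 16-bit sum, then store both bytes,
    even when the high byte is unchanged."""
    value = mem.data[addr] | (mem.data[addr + 1] << 8)
    value = (value + 1) & 0xFFFF
    mem.store(addr, value)
    mem.store(addr + 1, value >> 8)


def increment_handcrafted(mem: Memory, addr: int) -> None:
    """Hand-coded style: bump the low byte, and touch the high byte only
    when the low byte wraps round to zero (a carry)."""
    low = (mem.data[addr] + 1) & 0xFF
    mem.store(addr, low)
    if low == 0:
        mem.store(addr + 1, mem.data[addr + 1] + 1)


naive, hand = Memory(2), Memory(2)
for _ in range(1000):
    increment_naive(naive, 0)
    increment_handcrafted(hand, 0)
print(naive.data == hand.data, naive.writes, hand.writes)  # True 2000 1003
```

Over 1,000 increments both counters reach the same value, but the handcrafted version makes 1,003 byte writes instead of 2,000, since the high byte changes only at counts 256, 512 and 768.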

Strictly speaking, the 3 kilobyte figure is only attained in a high-level language that tokenises keywords before storing them to disc or memory, so that no space is needed after them. (Apart from the loss of readability, the only ill effect of this is that a variable such as 'Forever' will clash with the keyword 'For'.)
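The clash can be reproduced with a tokeniser that recognises a keyword wherever a new token starts, without waiting for a space. In this Python sketch the keyword table is a hypothetical subset, not the interpreter's real one, and longer keywords are tried first so that Whilend is not read as While followed by 'nd'.

```python
# Hypothetical keyword table; a real tokenising interpreter has its own.
KEYWORDS = ["While", "Whilend", "If", "Then", "Else", "EndIf", "For"]
BY_LENGTH = sorted(KEYWORDS, key=len, reverse=True)  # longest match first


def tokenise(line: str) -> list[tuple[str, str]]:
    """Split a line into (kind, text) pairs. A keyword is recognised
    wherever a new token starts, with no space required after it."""
    tokens = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
            continue
        kw = next((k for k in BY_LENGTH if line.startswith(k, i)), None)
        if kw:
            tokens.append(("KW", kw))
            i += len(kw)
        elif ch.isalpha():
            j = i + 1                        # identifier: letter, then alnum
            while j < len(line) and line[j].isalnum():
                j += 1
            tokens.append(("ID", line[i:j]))
            i = j
        elif ch.isdigit():
            j = i + 1                        # number: run of digits
            while j < len(line) and line[j].isdigit():
                j += 1
            tokens.append(("NUM", line[i:j]))
            i = j
        else:
            tokens.append(("SYM", ch))
            i += 1
    return tokens


print(tokenise("Forever=1"))
# [('KW', 'For'), ('ID', 'ever'), ('SYM', '='), ('NUM', '1')]
```

The line Forever=1 comes out as the keyword For followed by a variable called ever, which is exactly the clash described above; the price of dropping the space is that no variable name may begin with a keyword.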