The input to the compiler should be at level one, combining the documentation with the program. The compiler produces two outputs: one is the executable program, the other is the page description level.
Now a fundamental glitch with markup languages like LaTeX, or page description languages like PostScript, is that the backing programming language is crappy. Thus, by combining all three levels, one can make use of a good, top-flight general-purpose language for the whole lot.
The compiler should produce code for a VM, à la Java. This has the additional advantage that when the compiler produces a page description output, the language component will be in VM form, which is easy to interpret. The page description can then be interpreted, resulting in an image which is just slapped directly onto the printer or screen.
I'm typing this as an ASCII text file at the moment. The atomic unit is an ASCII character. The tools I have to manipulate this document with (Emacs, AWK, Perl, sed, flex, bison, etc.) can all cope with the ASCII atom.
I can also create binary atoms (float, double, int), which unfortunately do not necessarily port to other machines, but the ASCII atoms will. (Unless it is a die-hard IBM machine that still, heaven forfend, speaks EBCDIC.)
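The portability problem with binary atoms can be seen in a small sketch. It uses Python's standard `struct` module; the value 1027 is picked arbitrarily for illustration. The same integer is packed in two byte orders and as ASCII digits, and only the ASCII form reads back the same regardless of byte order.

```python
import struct

value = 1027  # 0x00000403, an arbitrary example value

little = struct.pack("<i", value)   # 32-bit int, little-endian byte order
big = struct.pack(">i", value)      # 32-bit int, big-endian byte order
text = str(value).encode("ascii")   # the same number as ASCII atoms

# Reading the little-endian bytes as if they were big-endian
# silently yields a different number.
misread = struct.unpack(">i", little)[0]

# The ASCII form has no byte order to get wrong.
recovered = int(text.decode("ascii"))
```

Here `little` and `big` hold the same four bytes in opposite order, which is exactly the kind of mismatch a mapping between types would have to resolve.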
Binary atoms, while having many advantages, have no general-purpose tools for their manipulation.
However, the ASCII atom is also a limitation on the tools I use. As I propose to shatter the mould with this language, not merely stretch it unpleasantly, I propose that the available atoms be vastly increased. Way back when, there were compelling reasons for using ASCII: simplicity, compactness, small memories, byte-sized words (on the Z80, 6502, 8086, etc.). Many of these reasons have departed. Memory and disk space are now enormous. Modern CPUs stall and choke when forced to eat humble bytes.
I propose that each atom be an object. Portability can be maintained across computers by defining mappings between types.
What do the marks of a markup language do?
They, in a user friendly manner, ascribe an object type to a segment of text.
Thus the language is no longer written in ASCII and parsed, but is a sequence of objects.
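As a rough sketch of what "a sequence of objects" could mean, here is a marked-up sentence held as typed objects rather than as ASCII with embedded marks. The class `TextObject`, its fields, and the kind names are all made up for illustration; nothing here is Chamois itself.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    kind: str   # the object type a mark would have ascribed (hypothetical names)
    text: str   # the segment of text the mark covered


# In ASCII markup this might be: Plain text \emph{stressed} more text
document = [
    TextObject("plain", "Plain text "),
    TextObject("emphasis", "stressed"),
    TextObject("plain", " more text"),
]


def plain_text(objects):
    """Recover the unmarked text by concatenating the segments."""
    return "".join(o.text for o in objects)
```

No parsing step is needed to find out which segment is emphasised; the type is carried by the object.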
So how do we get from the ASCII world to the next? (Apart from dying, of course.)
Two routes....
Let's contemplate the second route: create an object editor.
Thought one. Emacs is very extensible; could I do it in Emacs Lisp? But Emacs hasn't got very good OOP facilities.
Thought two. Emacs is best at manipulating ASCII. Maybe I should keep clear.
Thought three. Each object needs some form of graphic representation. For example the "if" object would "look" something like :- if expression then statement; statement; else statement; endif;
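A minimal sketch of an object that knows how to "look": the class `IfObject`, its fields, and the indentation are assumptions for illustration, with statements kept as plain strings to stay short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IfObject:
    condition: str
    then_body: List[str]
    else_body: List[str] = field(default_factory=list)

    def render(self, indent="  "):
        """Produce the textual appearance of this object."""
        lines = [f"if {self.condition} then"]
        lines += [f"{indent}{s};" for s in self.then_body]
        if self.else_body:
            lines.append("else")
            lines += [f"{indent}{s};" for s in self.else_body]
        lines.append("endif;")
        return "\n".join(lines)


node = IfObject("x > 0", ["a", "b"], ["c"])
rendered = node.render()
```

The editor would store the object and derive the keywords and layout from it, rather than storing the keywords as text.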
Should I skip to a graphical environment like X to do the editing, or stick to a text (ASCII) environment? X is very complex. This is a first pass to try things out; eventually the object editor would be written in Chamois itself.
Objects will be stored in Binary form.
Decisions.
The object editor will use automatic object name completion. A switch will toggle between object creation and text entry.
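Object name completion could be as simple as a prefix search over a sorted list of known object names. This is only a sketch; the function `complete` and the sample names are hypothetical.

```python
import bisect


def complete(prefix, names):
    """Return all names starting with prefix. `names` must be sorted."""
    # In a sorted list, every name sharing the prefix sits in one
    # contiguous run that begins at the prefix's insertion point.
    start = bisect.bisect_left(names, prefix)
    matches = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


names = sorted(["if", "integer", "image", "while", "paragraph"])
```

If `complete` returns exactly one name, the editor can create that object straight away; otherwise it can offer the list.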
Object in-memory format: [ObjectId|InstVar|InstVar|...]
InstVar is [ObjectId|PtrToObject] if the size of the object > N_local or the object is of variable size. Otherwise it is [ObjectId|ObjectVal].
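The inline-versus-pointer rule can be sketched as follows. The threshold value of 8 bytes, the list standing in for the heap, and the names `InstVar` fields and `make_inst_var` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

N_LOCAL = 8  # assumed threshold in bytes; the notes do not fix a value


@dataclass
class InstVar:
    object_id: int
    value: Optional[bytes] = None   # [ObjectId|ObjectVal]: stored inline
    pointer: Optional[int] = None   # [ObjectId|PtrToObject]: index into heap


def make_inst_var(object_id, payload, heap, variable_size=False):
    """Store small fixed-size payloads inline, everything else on the heap."""
    if variable_size or len(payload) > N_LOCAL:
        heap.append(payload)
        return InstVar(object_id, pointer=len(heap) - 1)
    return InstVar(object_id, value=payload)


heap = []
small = make_inst_var(1, b"\x00\x00\x00\x2a", heap)
large = make_inst_var(2, b"x" * 100, heap)
string = make_inst_var(3, b"hi", heap, variable_size=True)
```

Note that `string` goes to the heap despite being short, because a variable-size object cannot be given a fixed inline slot.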
Reference counting on the fingers of one thumb (each object is referenced from at most one place) makes GC and heap management a lot easier.
Basic objects are described in a trivial-to-parse text file. This is just a bootstrapping trick until I have sufficient sophistication to handle a proper class Object.
Must be a stream of objects, seekable to the N'th object. How about (objectClass|handle)*? The operating system view will be two files, the first is the stream of fixed sized class/ Hmm. Some objects are container objects. How do I handle that?
If my reference counting on the fingers of one thumb is applied, all data structures are just trees, so a simple traverse can be used.
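A sketch of that simple traverse: because every object has a single owner, a container and its contents form a tree, and a pre-order walk turns it into a flat stream. The `Node` class and the idea of writing a child count with each entry are assumptions made here, not something the notes specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    class_id: str
    children: List["Node"] = field(default_factory=list)


def flatten(root):
    """Pre-order walk: each object is emitted before the objects it contains."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        # The child count lets a reader rebuild the nesting from the stream.
        out.append((node.class_id, len(node.children)))
        # Push children reversed so they pop in their original order.
        stack.extend(reversed(node.children))
    return out


doc = Node("document", [Node("paragraph", [Node("text")]), Node("if")])
stream = flatten(doc)
```

With shared references the structure would be a graph, and the walk would need a visited set to avoid emitting an object twice.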
Do I really need a seekable stream of objects? To what extent can I make the internal and external representation the same? If I can, then mmap-style I/O becomes trivial. ASCII has this advantage; can I keep it? Yes, via the handle mechanism. During program execution things can grow messy; when memory runs low, GC will clean up. On saving, GC is called in to tidy up and compact both the memory and the handle space. On opening the file, the handle space is perfectly orderly and doesn't need to be stored. It is simply the order in which the objects are stored.
On opening a file, it is a stream of uniform objects. A simple flat array from here to there of lots.
The file constitutes the current physical location of the object.
To make a file full of objects "live", one needs to build the "handles" table.
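Building the handles table on opening might look like the sketch below. It assumes every stored object is one fixed-size record of two 32-bit fields (class id, value); that record layout, and the names `build_handles` and `load`, are invented for illustration.

```python
import struct

# Assumed record layout: (class id, value), both unsigned 32-bit, little-endian.
RECORD = struct.Struct("<II")


def build_handles(data):
    """Handle N is simply the N'th record; map each handle to a byte offset."""
    count = len(data) // RECORD.size
    return [i * RECORD.size for i in range(count)]


def load(data, handles, handle):
    """Fetch an object through its handle."""
    return RECORD.unpack_from(data, handles[handle])


data = RECORD.pack(1, 10) + RECORD.pack(2, 20) + RECORD.pack(1, 30)
handles = build_handles(data)
```

Since the table is derived purely from storage order, it never needs to be written to the file. Once objects start moving in memory, only the table entries change, not the handles held by other objects.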
Comments, queries and conversation.