Towards implementing Leather.

As often happens, the idea becomes a bit clearer when implemented. An Abstracted class is merely an additional level of indirection between object creation and object use. Instead of tying type references directly to type declarations, it is done indirectly via the abstract type. This provides the flexibility to use any suitable declared type where the abstract type is used. The fact that the compiler can check calling interfaces and run the verification routines gaurantees that the flexibility doesn't come at the expense of safety.

You can call it John's law :-

All advances in computing have arisen through the creation of an additional level of indirection, the trick is to work out which indirection is actually useful.

Some more philosophy on the new language.

I have also been putting a lot of other features of programming languages to the following test. "Does the feature force the programmer to specify the mechanism or the semantics?" My idea here is that the programmer should say what he means, and not tell the compiler how to do its job.

For example :- parameter passing mechanisms. If the programmer has to choose whether he passes an array by reference or by value, he is specifying the mechanism. If the programmer states that the parameter is input only, or output only or will be modified, he is specifying the semantics, and the compiler is free to choose the most appropriate mechanism to ensure the semantics are met efficiently.

I believe this approach will lead to a much more elegant language. For example, take the classic confusion and mess surrounding the humble assignment.

The assignment operator problem.

When we say "a := b;" we have a mess of semantics in the various languages.

In most languages the semantics for integer and floating point types is that the contents of b is copied into a.

The semantics of assignment is inconsistent when applied to nonscalars.

In C & C++, If a is a pointer and b and array, this is equivalent to "a = &(b[0])". In Pascal it would be a syntax error.
In C & Pascal, if b is a structure, the whole structure is copied, including any pointers and arrays. However, pointers are not followed, thus only a "shallow" copy is made. Actor provides an additional operator "deepcopy" that will perform a deep copy of any structure, copying all subobjects as well. In C++ a "copy" constructor is called. If one isn't explicitly defined, an implicitly defined one does a shallow copy.

Assignment is used when manipulation of name binding is meant.

Consider this piece of Pascal :-

      temp := b;
      b := a;
      a := temp;

Here we mean to swap the labels on the objects in a and b. The mechanism we use is copying, what we actually meant was that the compiler should no longer consider 'b' to refer to what it was, but to refer to what 'a' was referring to. And vice versa.

Thus looking at the average program, it is certainly not clear what the implications of an assignment statement is. Not only does it often depend on structure/non-structure item context, but also on the fine implementation details of the object. If the object is some form of container, the assignment may mean shallow or deep copy, depending on the details.

The assignment operator solutions.

I propose that copy ALWAYS means deep copy. I also propose a new assignment-like operator which I will call "handover". Instead of copying the value of 'b' to a, it will handover the object in 'b' to 'a'. Thus 'b' loses its binding and the 'a' takes it over. Thus the swap becomes

      b handover to temp;
      a handover to b; 
      temp handover to a;

At the end of this sequence, 'temp' is unbound and it is illegal to refer to it.

Now please be patient and see where this takes us. It takes us into a world were memory errors disappear and garbage collection comes for free.

Pointers rethought.

Consider pointers. Pointers are integer labels/names of objects that exist in a global namespace. Just as 'temp' is a string of characters providing a means of referring to an object, a pointer is an integer that provides a means of referring to an object.

Think of this in terms of my semantics vs mechanism criterion. A C style pointer specifies the mechanism of a direct memory address. I say we should be concerned with the semantics. The semantics is we wish to be able to manipulate an integer name for an object.

Pascal attempts to safeguard this be removing the "integer" semantics. In C you can perform very useful, but unsafe integer arithmetic on pointers. Pascal bans this.

Nonetheless both Pascal and C pointers are labels in a GLOBAL namespace. A pointer created anywhere in the program is valid everywhere else.

Now any programmer worth his salt keeps the number of global variables to a minimum, because they are :-

Dangerous. They can create highly non-local bugs.
The hard to manage.
They pollute/clog the name space.

Yet practically all languages force us to use pointers which are global integer names.

In addition to this, pointers create aliases to objects. In the presence of pointers, objects can have a myriad of different names. Thus it is highly unsafe to dispose of a object as it is highly non-trivial to establish whether it is still in use or not. Some languages provide sophisticated garbage collection mechanisms to cope with this.

Cleaning up the semantics of assignment allows resolution of the problems of pointers.

I propose to resolve these problems by imposing the well-known database rule. "One fact in one place." I propose that all objects should be atomic in the sense that they can only be in one place, accessed via one name at a time.

This will be made possible by the "assignment is deepcopy" rule and the "handover" operator. There will be no address-of operator. There will be no way of creating an alias.

Now a lot of highly useful algorithms make sophisticated use of pointers, and at first sight it would seem as if I have lost a vast amount by imposing this restriction. I will now show that I have lost nothing and in fact gained garbage collection for free.

Clearly the language should provide some form of indexed container. (An array being the archetype). Now the "one object one place" rule will insist that if we place an object into a container, we "handover" the object to the container. Thus the object is now in the container and is no where else. The handover operation ensures that the variable holding the object previously, is now unbound, and the object is no longer accessible via that route. The only way of operating on that object is via the container.

Now the integer index of the object in the container provides all the functionality of a traditional pointer, but in a local, not global namespace.

However, and this is the cute bit, we need to know the name of the container and the integer index to access our object. Thus as soon as we move out of scope for that container, the container AND ALL THAT IT CONTAINS, becomes garbage and can be collected.

To make this a bit more concrete. Think of a C pointer dereference operator. It is the unary operator *. In my scheme it becomes a binary operator operating on the pointer and some container. In C the container is implicitly the global memory container.

Ok, so there are some holes in this scheme...

At first sight you may say, "Oh no! He has come up with the old reference counting idea, and we all know that doesn't work because of circular references."

To a certain extent yes, but in fact I have come up with something more draconian that that. I insist that the reference count MUST be forced to be 0 or 1. This creates a different realm for the programmer. It some senses it is more limiting, but in fact it can be "godelized" to be as flexible as any other language. ("godelized", after Kurt Godel, who mapped symbolic logic to the natural numbers to create meta-proofs of mind-boggling implications. Indexed containers permit "godelization" of objects.)

What this rule does do, is create an object world that is far more like the familiar spatial universe we ourselves exist in. I can be in my office or I can be at home, I can never be in both. Thus when writing programs under this draconian constraint I am operating in an environment similar to the every day one.

The only class of memory leak that still remains is if the programmer is stupid enough to put the container in it self... (Ye olde circular reference, but with the extra spin that the container must disappear into itself!)

Hmm. This gives rise to a new name for this bug, the Klein bottle bug! (The Klein bottle is a bizarrity borrowed from topology, it is a bottle whose inside is the same side as its outside. You get one by taping the edges of two moebius strips together in a 4-dimensional space.)

Still, the compiler should be able to pick up most obvious occurences of Klein bottle formation, but I suspect in general it would still be possible.

Also the "this is in that" graph should be a nice tree it should be quite efficient to check for such an event. Maybe one can do some form of garbage collection sweep on program exit and find if you have Klein bottles lying around and if you do have Klein bottles, you can switch on checks for bottle formation next time you run the program.

There are also situations where some form of garbage collected container could be useful, but at least this scheme can allow the garbage collection process to work in a much smaller domain.

The implications of this scheme in the presence of multi-threading have not been thought out properly. I suspect that it will result in far fewer race conditions, but the I still need to work on the meaning of 'one object one name' in the presence of multithreading.

Outdoing the functional language folk.

As far as I can see the main claims to fame of the functional languages are the following two points :-

Functional languages 'ala lambda calculus, have a very tidy flow of cause and effect, and hence are easy to reason about and create proofs of correctness. A reasonable suggestion is then made :- if it is easy to prove that it is correct, it is easier to write correct programs.
Functional languages allow higher order functions.
Lazy evaluation enables efficient computation of functions that have difficult to compute arguments.

Side-effects create hairy cause and effect, but the hair can be contained.

All functional languages I have seen so far, have destroyed the tidy flow of cause and effect by allowing functions with side effects.

Thus in my language I propose regain the original purity by enforcing side effect free functions. For efficiency sake, procedures will be available for operations requiring side-effects, but at least one will be secure in the knowledge that all expressions are side effect free.

Note :- there is no reason why one cannot use efficient side effect creating operations like assignment within a side effect free function, if and only if those effects are local to the function and disappear on exit of the function.

Consistent OOPL can treat functions as just another Object.

If the object model is made formally consistent and the language is reflexive then a function is merely yet another object. Thus a consistent OOP language should always allow higher order functions.

Lazy evaluation is dangerous, but can be made safe.

Lazy evaluation, as I understand it, basically comes down to "call-by-need" parameter passing. For a long time I have resisted the idea of "call-by-need", as it triggers the evaluation of a piece of code, (the calling parameter expression), at an arbitarily later time (the first parameter reference). This, I felt sure, would create ample room for all kinds of very nasty bugs. However I can also see that certain computations would be greatly facilitated by Lazy Evaluation.

The resolution of the dilemma is to restrict Lazy evaluation to parameters of my "pure" functions. If "call-by-need" was allowed in procedures with side-effects, the results of the parameter expressions could be altered by side-effects in the procedure AND side-effects in the evaluation of the parameter expressions themseleves. This would make for incredibly hairy flow of cause and effect. If lazy evaluation was restricted to side-effect free functions, then one is gauranteed that the result of evaluating the parameter expression would be the same no matter when it is actually performed.

Parameter passing revisited.

Two things about what I have said above affect the concept of parameter passing. One is the existence of pure functions, and the other is the concept of 'one object, one place'.

The 'one object, one place' rule would suggest that parameter passing be a 'handover' operation. In fact it must be one of the following options :-

'hand into procedure',
'hand in and hand out',
'hand out of procedure'.

The requirement for pure functions would be met by 'hand in and forbid any changes and then hand out'. This would be the only semantics allowed to function parameters. Note that the compiler need not formally go through the work of unbinding and rebinding, the compiler is free to choose the exact mechanism, so long as the semantics hold.

For procedures parameters can be one of :-

Hand in.
Input only. (Hand in and forbid changes and hand back.)
Modify. (Hand in and hand back)
Hand out. (Output only.)

The operation of placing an object into a container would require a "hand in" parameter. For example :-

      myStack.push( a);
      a.doSomething; // This is illegal as a is now unbound.

Note the difference between Input only and call-by-value. Call by value, is equivalent to shallow copy and hand-in. I object to this on two grounds :-

It allows the modification of the input parameter, creating a false expectation that the modification will be carried back to the calling routine.
Shallow copies allow modication of objects pointed to by the object passed. Thus a false expectation is created that the called routine cannot change the object passed.

I could get around objection 2 by making call-by-value deepcopy and hand-in, but objection 1 would still remain.

Comments, queries and conversation.

This page hosted by Get your own Free Home Page