Functionality on Different Levels of Abstraction

4 May 2009

Warning: This article may contain trace amounts of three different programming languages, compiler internals, object oriented method dispatchers, bitwise binary arithmetic on pointer variables and term rewriting systems (which are cool).

Update: To put this first: dispatching strategies similar to the one proposed here were considered in the implementation of Pragmatic Smalltalk / the Etoile LanguageKit. The current implementation, which inlines parts of the dispatch into the message-sending (calling) method, was chosen to allow for very fast numeric computations and indeed performs very well in that respect when compared to other scripting languages. Seen from the "mathematical operation performance" perspective, the decision is very reasonable.

The implementation of programming languages can probably be considered a comparatively mature field. For a few months now, I have been exceptionally lucky to be able to follow the development of such a language implementation from the front row. From the development of David's LanguageKit framework (and the accompanying Pragmatic Smalltalk language), there's one specific design lesson that I believe to be very important:

Every piece of functionality can be implemented on very different levels of abstraction. Think carefully where it belongs.

An example: In Smalltalk implementations, ifTrue: messages to objects aren't dispatched but simply converted into a conditional branch at the assembler or byte code level. The straightforward way to implement this would be to change the code that generates byte code message sends¹ and make it produce a conditional branch instead.

However, LanguageKit takes a much more elegant (imho) approach, which decouples the implementation of message sends from their optimization. Before a program's AST is converted into byte code, special optimizations can be done on the AST. An optimization on the AST is similar to a term rewriting rule²: When a specific kind of AST node is encountered in the AST, it can be replaced by whatever other AST node the rule likes. So, the LowerIfTrue rule can be loaded into the compiler, which replaces LKMessageSend nodes with LKIfStatement nodes if the message send nodes send ifTrue:-like messages.

So this basically changed LanguageKit in such a way that:

The code on byte code generation level can stay very straightforward and clear. No exceptions and corner cases for ifTrue: and friends there.
There's a mechanism for pluggable rules ("LKPlugins") which change the AST, and the ifTrue: optimization is implemented in terms of it.
Minor drawback: LanguageKit needs to support the additional LKIfStatement AST node, which is otherwise not required for Smalltalk.
Big advantage: The ifTrue: optimization is now implemented in terms of an AST transformation. This way, you don't need to understand the internals of the specific byte code language. (Maintainability!)
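
To make the idea of an AST rewrite rule a bit more concrete, here is a minimal sketch in plain C. It is not LanguageKit code: the Node struct, the NodeKind values and the functions lower_if_true and rewrite are all made up for illustration and only mirror the roles of LKMessageSend, LKIfStatement and the LowerIfTrue rule. The sketch also rewrites nodes in place, which is just the simplest thing to write down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified AST. The names only mirror the roles
 * of LKMessageSend and LKIfStatement, not their real interfaces. */
typedef enum { NODE_MESSAGE_SEND, NODE_IF_STATEMENT, NODE_OTHER } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *selector;   /* only meaningful for message sends */
    struct Node *receiver;  /* message send: receiver, if statement: condition */
    struct Node *argument;  /* message send: block, if statement: then-branch */
} Node;

/* The rewrite rule: a send of ifTrue: becomes an if statement.
 * Every other node is returned unchanged. */
Node *lower_if_true(Node *node)
{
    assert(node != NULL);
    if (node->kind == NODE_MESSAGE_SEND
        && node->selector != NULL
        && strcmp(node->selector, "ifTrue:") == 0) {
        node->kind = NODE_IF_STATEMENT;
        node->selector = NULL;
    }
    return node;
}

/* Apply a rule bottom-up to a whole tree: children first, then the node. */
Node *rewrite(Node *node, Node *(*rule)(Node *))
{
    if (node == NULL)
        return NULL;
    node->receiver = rewrite(node->receiver, rule);
    node->argument = rewrite(node->argument, rule);
    return rule(node);
}
```

The point of the sketch is that the code generator never has to know about ifTrue:. It only sees if-statement nodes, and the rule can be added or removed without touching it.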

Just like in this example, it looks to me like a general design principle that you should implement things at the highest possible level of abstraction. In dynamic languages like Pragmatic Smalltalk and Objective-C, this is ultimately even outside the actual compiler: in the runtime system. (Note that Objective-C's objc_msg_sendv is implemented in its runtime, not in the compiler.)

In that respect, I'm still not entirely happy with CodeGenLexicalScope::MessageSend in LanguageKit: It's a design philosophy of Pragmatic Smalltalk to use the Objective-C runtime and become a "native citizen" there. So it's obviously a good idea to call the Objective-C method dispatcher. However, unlike in Objective-C, everything is an object in Pragmatic Smalltalk, including integers, which leads to the usual dynamic language small-integer pointer hackery³. Because of this, extra care must be taken to avoid sending Objective-C messages to small integers via objc_msg_sendv. To be more precise, it needs to be checked whether the receiver is a small int. If it is, a different kind of dispatch is done.
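
The pointer hackery from footnote 3 can be written down in a few lines of C. This is only a sketch of the encoding described there, (i<<1)+1 with the lowest bit as the tag. The type name Ref and the three helper functions are hypothetical and not taken from LanguageKit.

```c
#include <assert.h>
#include <stdint.h>

/* A reference is either a real (aligned) pointer or a tagged small int. */
typedef uintptr_t Ref;

/* The lowest bit is set for small ints and clear for aligned pointers. */
int is_small_int(Ref r)
{
    return (r & 1u) != 0;
}

/* Encode i as (i << 1) + 1. The shift is done on the unsigned type so it
 * is well defined. Range checking is omitted. */
Ref tag_small_int(intptr_t i)
{
    return ((uintptr_t)i << 1) | 1u;
}

/* Decode again. Right-shifting a negative signed value is implementation
 * defined in C. Mainstream compilers shift arithmetically. */
intptr_t untag_small_int(Ref r)
{
    assert(is_small_int(r));
    return (intptr_t)r >> 1;
}
```

With this encoding, the integer 21 is represented by the value 43, and the check that has to happen before a message send is the single bit test in is_small_int.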

I don't understand much about LLVM, but looking at CodeGenLexicalScope::MessageSend (I assume you're now taking a look at it, too), I'd guess that this can be implemented more elegantly by simply calling a specifically written pragmaticst_msg_sendv dispatcher, which is written in plain old C and relies on objc_msg_sendv in the generic case. This way, there wouldn't be so many basic blocks (so many generated byte code operations) every time a Pragmatic Smalltalk program calls a method. As a result, there would be less generated code, and pragmaticst_msg_sendv could stay in cache much longer than these basic blocks. But the biggest advantage to me is that it's understandable without understanding LLVM. (Maintainability!)
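
Roughly, such a dispatcher could look like the following sketch. Everything in it is an assumption made for illustration: the signature, the use of C strings as selectors, the "plus:" selector, and above all generic_send_stub, which only stands in for the real call into the Objective-C runtime so that the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Ref;   /* an object pointer or a tagged small int */

int generic_sends = 0;   /* counts calls to the stand-in below */

/* Stand-in for the generic Objective-C dispatch (objc_msg_sendv in the
 * text). A real dispatcher would call into the runtime here. */
Ref generic_send_stub(Ref receiver, const char *selector, Ref argument)
{
    (void)receiver; (void)selector; (void)argument;
    generic_sends++;
    return 0;
}

/* Stage for small integers. Only a hypothetical "plus:" is handled. */
Ref small_int_send(Ref receiver, const char *selector, Ref argument)
{
    assert(receiver & 1u);
    if (strcmp(selector, "plus:") == 0 && (argument & 1u)) {
        /* (2a+1) + (2b+1) - 1 == 2(a+b) + 1, so the sum is tagged again.
         * Overflow handling is omitted. */
        return receiver + argument - 1u;
    }
    return 0;            /* everything else is left out of the sketch */
}

/* The two-staged dispatch: small ints first, the generic case otherwise. */
Ref pragmaticst_msg_send_sketch(Ref receiver, const char *selector, Ref argument)
{
    if (receiver & 1u)
        return small_int_send(receiver, selector, argument);
    return generic_send_stub(receiver, selector, argument);
}
```

The generated code for a message send would then shrink to a single call to this function, and the small-int check would live in one place that can be read without knowing LLVM.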

The dispatch process would then be explicitly two-staged. However, looking at the current implementation, it is two-staged anyway. (This becomes clear if you refactor the generated byte code out into a dispatch function (in your head :-)).) The basic observation is that Pragmatic Smalltalk can't live directly in the Objective-C runtime. There has to be a dispatch mechanism wrapped around it which handles small integers. (It doesn't matter whether you call it that or not.)

An additional benefit would be the following: pragmaticst_msg_sendv could dispatch to the instance methods of a special NilClass when the receiver is nil, thereby making it possible to have an isNil method⁴.
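
The three stages described in footnote 4 can be sketched as three entry points that fall through to each other. The function names and the Handler enum are hypothetical, and each stage only reports who would handle the message instead of actually invoking a method.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Ref;   /* an object pointer, nil (0) or a tagged small int */

/* Who ends up handling the message. */
typedef enum { BY_SMALL_INT, BY_NIL, BY_OBJC } Handler;

/* Stage (3): plain Objective-C dispatch. Objective-C message sends enter
 * here. A small int must never get this far. */
Handler dispatch_stage3(Ref receiver)
{
    assert((receiver & 1u) == 0);
    return BY_OBJC;
}

/* Stage (2): nil goes to the special nil class. Smalltalk sends whose
 * receiver is known not to be a small int enter here. */
Handler dispatch_stage2(Ref receiver)
{
    if (receiver == 0)
        return BY_NIL;
    return dispatch_stage3(receiver);
}

/* Stage (1): small ints. All other Smalltalk sends enter here. */
Handler dispatch_stage1(Ref receiver)
{
    if (receiver & 1u)
        return BY_SMALL_INT;
    return dispatch_stage2(receiver);
}
```

A compiler with type inference would pick the entry point per call site, so a send to a receiver that cannot be a small int skips the bit test of stage (1).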

It's interesting how software evolves sometimes. Sometimes you're just a tiny step away from a familiar concept, then the code collapses to something that looks trivial, but no one can see any more how large it was before. The big paradox: The more reflective work you put into it, the smaller it gets⁵.

References

1. In this case these are actually LLVM language message sends.

2. I recently realized how cool these are, now I start seeing them everywhere. :-)

3. A reference to an integer i below 2³¹ is not a pointer to a real memory location but the value (i<<1)+1. The least significant bit indicates that it's a small integer. There are no collisions with real pointers because these are usually word-aligned.

4. For that to work with the type inference that checks whether a variable can be a small int, the dispatch process would actually need to be three-staged, like (1) dispatch to smallints (2) dispatch to nil (3) dispatch to ObjC objects. Objective-C message sends enter the dispatch process at stage (3), Smalltalk message sends that can't be small ints enter at stage (2), other Smalltalk message sends need to start at stage (1).

5. Note the fatal negation: The less reflective work you put into it, the bigger it will become. The bigger it is, the better it looks (to your non-technical boss?).