Functionality on Different Levels of Abstraction
Warning: This article may contain trace amounts of three different programming languages, compiler internals, object oriented method dispatchers, bitwise binary arithmetic on pointer variables and term rewriting systems (which are cool).
Update: To put this first, dispatching strategies similar to the one proposed here have been taken into account in the implementation of Pragmatic Smalltalk / the Etoile LanguageKit. The current implementation, which inlines parts of the dispatch into the message-sending (calling) method, was chosen to allow very fast numeric computation, and it does indeed perform very well in that respect compared to other scripting languages. Seen from the "mathematical operation performance" perspective, the decision is very reasonable.
The implementation of programming languages can probably be considered a comparatively mature field. For a few months now, I have been exceptionally lucky to be able to follow the development of such a language implementation from the front row. From the development of David's LanguageKit framework (and the accompanying Pragmatic Smalltalk language), there's one specific design lesson that I believe to be very important:
Every piece of functionality can be implemented on very different levels of abstraction. Think carefully where it belongs.
An example: In Smalltalk implementations, ifTrue: messages to objects
aren't dispatched but simply converted into a conditional branch at
the assembler or byte code level. The straightforward way to
implement this would be to change the code that generates byte code
message sends<sup>1</sup> and make it produce a conditional branch instead.
However, LanguageKit takes a much more elegant (imho) approach there,
which decouples the implementation of message sends from their
optimization. Before a program's AST is converted into byte code,
special optimizations can be done on the AST. An optimization on the
AST is similar to a term rewriting rule<sup>2</sup>: When a specific kind of
AST node is encountered in the AST, it can be replaced by whatever
other AST node the rule likes. So, the LowerIfTrue rule can be loaded
into the compiler, which replaces
LKMessageSend nodes with LKIfStatement nodes if the message send nodes send ifTrue: messages.
So this basically changed LanguageKit in such a way that:
- The code on byte code generation level can stay very straightforward and clear. No exceptions and corner cases for ifTrue: and friends there.
- There's a mechanism for pluggable rules ("LKPlugins") that change the AST, and the ifTrue: optimization is implemented in terms of it.
- Minor drawback: LanguageKit needs to support the additional LKIfStatement AST node, which is otherwise not required for Smalltalk.
- Big advantage: The ifTrue: optimization is now implemented in terms of an AST transformation. This way, you don't need to understand the internals of the specific byte code language. (Maintainability!)
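To make the rewriting idea a bit more concrete, here is a minimal sketch in C of what such a rule could look like. Everything in it is invented for illustration: the Node struct, lower_if_true and apply_rule are hypothetical names, and LanguageKit's real AST nodes are Objective-C classes whose plugin interface is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified AST. The real LKMessageSend and
   LKIfStatement nodes are Objective-C classes, not C structs. */
typedef enum { NODE_MESSAGE_SEND, NODE_IF_STATEMENT, NODE_OTHER } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *selector;   /* only meaningful for NODE_MESSAGE_SEND */
    struct Node *receiver;  /* message receiver, or the if condition */
    struct Node *argument;  /* message argument, or the "then" block */
} Node;

/* The rewrite rule: a send of ifTrue: is turned into an if statement
   in place. Returns 1 if the node was rewritten, 0 otherwise. */
int lower_if_true(Node *node)
{
    assert(node != NULL);
    if (node->kind != NODE_MESSAGE_SEND || node->selector == NULL)
        return 0;
    if (strcmp(node->selector, "ifTrue:") != 0)
        return 0;
    node->kind = NODE_IF_STATEMENT;
    node->selector = NULL;
    return 1;
}

/* Apply the rule bottom-up to a whole tree and count the rewrites.
   The code generator that runs afterwards never sees ifTrue: sends. */
int apply_rule(Node *node)
{
    if (node == NULL)
        return 0;
    int rewrites = apply_rule(node->receiver) + apply_rule(node->argument);
    return rewrites + lower_if_true(node);
}
```

The point of the sketch is only the separation of concerns: the rule knows about ifTrue:, the code generator only needs to know how to emit an if statement.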
As this example suggests, it looks to me like a general design principle that you should implement things on the highest abstraction level possible. In dynamic languages like Pragmatic Smalltalk and Objective-C, this is ultimately even outside the actual compiler: in the runtime system. (Note that Objective-C's objc_msg_sendv is implemented in its runtime, not in the compiler.)
In that respect, I'm still not entirely happy with
CodeGenLexicalScope::MessageSend in LanguageKit: It's a design
philosophy of Pragmatic Smalltalk to use the Objective-C runtime and
become a "native citizen" there. So it's obviously a good idea to call
the Objective-C method dispatcher. However, unlike in Objective-C,
everything is an object in Pragmatic Smalltalk, including integers,
which leads to the usual dynamic-language small-integer pointer
hackery<sup>3</sup>. Because of this, extra caution is needed to avoid
sending Objective-C messages to small integers via
objc_msg_sendv. To be more precise, every message send has to
check whether the receiver is a small int. If it is, a
different kind of dispatch is done.
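The tagging scheme from footnote 3 and the check it requires can be sketched in a few lines of C. The type Ref and the helper names are invented for this sketch and are not taken from LanguageKit; the sketch also uses the full machine word rather than the 31 bits mentioned in the footnote, and it assumes the integer fits into one bit less than a word.

```c
#include <assert.h>
#include <stdint.h>

/* A reference is either a real, word-aligned pointer (lowest bit 0)
   or a tagged small integer (lowest bit 1). Hypothetical type name. */
typedef uintptr_t Ref;

/* The check that has to happen before every message send. */
int is_small_int(Ref r)
{
    return (r & 1u) != 0;
}

/* Encode i as (i << 1) + 1. The shift is done on the unsigned type
   so that negative values do not cause undefined behaviour. */
Ref box_small_int(intptr_t i)
{
    return ((uintptr_t)i << 1) | 1u;
}

/* Decode again: drop the tag bit, then halve. Subtracting 1 first
   makes the value even, so the division is exact for negatives too. */
intptr_t unbox_small_int(Ref r)
{
    assert(is_small_int(r));
    return ((intptr_t)r - 1) / 2;
}
```

For example, box_small_int(21) yields the reference 43, while the address of any ordinary aligned object has its lowest bit clear and therefore fails is_small_int.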
I don't understand much about LLVM, but looking at
CodeGenLexicalScope::MessageSend (I assume you now also take a look at
it), I'd guess that this can be implemented more elegantly by simply
calling a specially written pragmaticst_msg_sendv
dispatcher, which is written in plain old C and relies on
objc_msg_sendv in the generic case. This way, there
wouldn't be so many basic blocks (so many generated byte code
operations) every time a Pragmatic Smalltalk program calls a method ==>
pragmaticst_msg_sendv can stay in cache much longer than
these basic blocks, and there is less generated code. But the biggest advantage to
me is, it's understandable without understanding LLVM.
The dispatch process would then be explicitly two-staged. However, looking at the current implementation, it is two-staged anyway. (This becomes clear if you refactor the generated byte code out into a dispatch function (in your head :-)).) The basic observation is that Pragmatic Smalltalk can't live directly in the Objective-C runtime; there has to be a wrapped-around dispatch mechanism which handles small integers. (It doesn't matter if you call it like this or not.)
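A two-staged dispatcher of this kind could look roughly like the following C sketch. It is only an illustration under loud assumptions: generic_send is a stub standing in for the runtime's dispatcher (the real objc_msg_sendv signature is deliberately not reproduced), selectors are plain strings, messages take exactly one argument, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Ref;  /* tagged small integer or word-aligned pointer */

/* Stub for the generic case. In the proposal this would be the
   Objective-C runtime's dispatcher; here it only counts its calls
   and hands the receiver back. */
int generic_send_calls = 0;
Ref generic_send(Ref receiver, const char *selector, Ref argument)
{
    (void)selector;
    (void)argument;
    generic_send_calls++;
    return receiver;
}

/* Stage 1: small integers never reach the object runtime. Only "+"
   with a small-int argument is handled here, and overflow is ignored.
   Anything else returns 0 (nil); a real dispatcher would presumably
   promote the integer to a proper number object instead. */
Ref small_int_send(Ref receiver, const char *selector, Ref argument)
{
    assert(receiver & 1u);
    if (strcmp(selector, "+") == 0 && (argument & 1u)) {
        intptr_t a = ((intptr_t)receiver - 1) / 2;
        intptr_t b = ((intptr_t)argument - 1) / 2;
        return ((uintptr_t)(a + b) << 1) | 1u;
    }
    return 0;
}

/* The wrapper every Smalltalk message send would call. */
Ref pragmaticst_msg_send_sketch(Ref receiver, const char *selector, Ref argument)
{
    if (receiver & 1u)
        return small_int_send(receiver, selector, argument);
    return generic_send(receiver, selector, argument);  /* stage 2 */
}
```

With this shape, the generated code for a message send shrinks to a single call, and the small-integer special case lives in one place that can be read without knowing the code generator.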
An additional benefit would be the following:
pragmaticst_msg_sendv could dispatch to a special
NilClasses' instance methods when the receiver is
nil, thereby allowing to have an
It's interesting how software evolves sometimes. Sometimes you're just a tiny step away from a familiar concept, then the code collapses to something that looks trivial, but no one can see any more how large it was before. The big paradox: The more reflective work you put into it, the smaller it gets<sup>5</sup>.
1. In this case these are actually LLVM language message sends.
2. I recently realized how cool these are, now I start seeing them everywhere. :-)
3. A reference to an integer i below 2<sup>31</sup> is not a pointer to a real memory location but (i<<1)+1. The least significant bit indicates that it's a small integer. There are no collisions with real pointers because these are usually word-aligned.
4. For that to work with the type inference that checks whether a variable can be a small int, the dispatch process would actually need to be three-staged, like (1) dispatch to smallints (2) dispatch to nil (3) dispatch to ObjC objects. Objective-C message sends enter the dispatch process at stage (3), Smalltalk message sends that can't be small ints enter at stage (2), other Smalltalk message sends need to start at stage (1).
5. Note the fatal negation: The less reflective work you put into it, the bigger it will become. The bigger it is, the better it looks (to your non-technical boss?).