Stacks, call frames, dynamism, and tracing JIT

While reading through the excellent (but discontinued) blog Squawks of the Parrot, I came upon a comment from a post about garbage collection:



quoted from Buk:


The thought that struck me was the C concept of automatic variables. They come into existence at exactly the level of scope that they need to, and they are automatically cleaned up when they go out of scope. So simple. Garbage Collection becomes a simple SP+n operation.


The problem is that they *can't* exists beyond their scope. The old "Don't return references to stack vars problem". And that's when it struck me, why bother GCing *everything*?


Why not allocate local/lexical scoped variables from a "stack" and only promote them to a "referenced variable heap" if the code actually takes a reference to them?



Of course, he or she then proposes:



quoted from Buk:


The testing for which type any given variable is would be restricted to code generation time.



Interesting, in fact, in that I actually implemented this, as described in the old SNAP VM post, "Interlude: the LIFO Heap". However, I don't test "which type" the variable is at code generation time - that testing is done at run time. As Dan pointed out in a reply to the comment, figuring this out at code generation time is difficult due to the existence of ccc. Instead, what I do is observe when a local variable could be captured - which can only be done if a continuation is captured.


Of course, any continuation capture will - at least for now - prevent stack-like behavior from call frames "below" the continuation capture point. Still, we don't lose anything in that case, and we gain something if continuations are not captured.


(i'm also working on a generational gc in a branch of hl; a side effect of this gc is that stack-like behavior for continuation functions/call frames can be "recovered", so to speak, when it is determined that a captured continuation that has been preventing deallocation has become garbage.)


Thus - if we were to drink Steve Yegge's Kool-Aid - instead of trying to force this glob of logic into the compiler, where we'll be forced to do nothing less than whole-program optimization, we just look at the run-time behavior. Instead of figuring out - which can go wrong in ways that are hard to test - we just watch.


And one new trend I've noticed in recent JIT techniques is, at its core, just watching. It's a technique called tracing (good introduction here).


Now, tracing is rather a simple concept. It just watches the interpreter loop going through the program, recording the bytecode sequence into a "trace". Then, when the interpreter notices a loop - usually by detecting a backward jump - the tracing JIT takes the recorded trace and compiles it, then tells the interpreter to go away and run the compiled trace. Of course, the compiled code has to do some checks - loops are usually not infinite so it has to handle loop exits, etc.


The neat thing is that tracing effectively gives function inlining "for free". You see, the tracing JIT keeps on observing even when function calls are made. Function calls are "invisible" - the tracing JIT just doesn't care about them.


One thing I noticed is that many of the earlier papers for tracing JIT were for building JVM's, although tracing has also spread to JavaScript implementations. In fact, I think that tracing will end up being more beneficial to dynamic languages - such as JavaScript - than to VM's that inherently assume a statically-typed language - such as Java. It has to do with the fact that dynamic languages usually just watch the types of data as it flows during run-time, rather than trying to figure them out at compilation.


For instance, consider two optimizations - constant propagation and type inference. No, don't do them at compile time - let's try doing them at run time! Let's watch during interpretation where constants go, let's watch which actual types the program ends up working with.


How? Easy - just mark your data with tags saying "this piece of data, we know it was constant because it was given as a constant in traced code", or "this piece of data, we know its type because we did a polymorphic invocation some time ago, and we wouldn't be going through this code if it were some other type." Just watch what happens during interpretation. We get more information about how the program is actually used, rather than making do with what it theoretically could be used for. We generate code based on the tags - if we encounter a "branch if false" bytecode, and we watched its input, and we know it's a constant, then we can just generate the best amount of code: nothing at all. It doesn't matter if the constant was true or false - the interpreter will automatically and correctly handle the truthness for us, and we just keep on watching the interpreter work.


So: JIT is now implicitly optimizing, because we can just watch the program. Then, once we have a record of the program's operation, we just "fast-forward" over the bits we know are given (i.e. constants, known types) and splice together the bits that aren't.


I think, then, that with tracing JIT, dynamic languages can gather some of the fruits of static languages, while still remaining openly dynamic.

No comments: