I’m considering reimplementing part of my Java parser library, hadrian. At the lowest level, hadrian is essentially a virtual machine with an unusual instruction set and it parses input by interpreting parser bytecodes. One thing I’ve been thinking about is replacing this simple interpreter with something a bit more clever, for instance dynamically generated java bytecode. Today I decided to play around with this approach to see how complicated that would be and how much performance there is to gain by replacing a simple bytecode interpreter loop with a dynamically generated interpreter.
Instead of actually replacing the hadrian interpreter I thought I’d do something simpler. I made a class, Handler, with one method for each letter in the alphabet, each one doing something trivial:
class Handler {
private int value;
public void doA() { value += 1; }
public void doB() { value -= 1; }
public void doC() { value += 2; }
...
}
The exercise was: given a piece of “code” in the form of a text string containing capital letters, how do you most efficiently “interpret” this string and call the corresponding methods on a handler object. Or, to be more specific, how do you efficiently interpret this string not just once but many many times. If this can be done efficiently then it should be possible to apply the same technique to a real bytecode interpreter that does something useful, like the one in hadrian.
The straightforward solution is to use an interpreter loop that dispatches on each letter and calls the corresponding method on the handler:
void interpret(char[] code, Handler h) {
for (int i = 0; i < code.length; i++) {
switch (code[i]) {
case 'A': h.doA(); break;
case 'B': h.doB(); break;
case 'C': h.doC(); break;
...
}
}
}
That’s very similar to what hadrian currently has and it’s actually relatively efficient. It should be possible to do better though.
Java allows you to generate and load classes at runtime, but to be able to use this I had to write a small classfile assembler. There are frameworks out there for generating classfiles but I wanted something absolutely minimal, something that did just what I needed and nothing else. I found the classfile specification and wrote some code that generated a legal but empty class file that could be loaded dynamically. Surprisingly, it only took around 200 lines of code to get that working. Adding a simple bytecode assembler to that, so that I could generate methods, got it up to around 600 lines of code. That was still a lot less code than I had expected.
The simplest approach I could think of was to generate code that consisted of a sequence of calls to static methods, one static method for each “opcode” or in this case letter in the alphabet. Given a set of methods:
static void handleA(Handler h) { h.a(); }
static void handleB(Handler h) { h.b(); }
static void handleB(Handler h) { h.b(); }
I would, for a string such as "KBQ..."
, generate code corresponding to:
void run(Handler h) {
handleK(h);
handleB(h);
handleQ(h);
...
}
This means that instead of dispatching while running the code, you dispatch while generating code:
for (int i = 0; i < code.length; i++) {
gen.aload(0); // Load first argument
String name = "handle" + code[i];
Method method = getDeclaredMethod(name, Handler.class);
gen.invokestatic(method); // Call handler
}
I tried this and compared it with the simple interpreter it is, perhaps not surprisingly, quite a bit faster than the simple interpreter. On short strings, around 10-30 letters, it is around twice as fast as the interpreter. On longer strings, say 200-300 letters, it is 4-5 times faster. Nice.
I though it might be interesting to see where the theoretical limit was for this so the next thing I tried was to remove all overhead. Instead of generating calls, I generated code that performed the operations directly without going through any methods:
void run(Handler h) {
h.value += 6;
h.value -= 1;
h.value += 9;
...
}
This was a bit more complicated to implement but I got it up and running and surprisingly, it was exactly as fast as the previous experiment. Even though there were two calls for each opcode in the previous example, one to handleX
and one to h.doX
, the VM apparently inlines everything. This means that you can generate efficient straight-line code simply by having a static method for each bytecode, generating a sequence of calls to those methods and then leaving the rest to the VM. And generating classfiles containing straightforward code like that can be done efficiently in very few lines of java code. If I do end up reimplementing part of hadrian, I’m sure I’ll use this technique.
I also played around with various ways of adding arguments to the generated code, like:
void run(Handler h, int x, int y) {
handleK(h, x, y);
handleB(h, x, y);
handleQ(h, x, y);
...
}
It turns out that as soon as you add arguments, each opcode becomes twice as expensive to execute. I tried the same with the interpreter and there was no notable change in performance. But it should be possible to pack any state you need to carry around into the one argument that can be passed around without losing performance.
All in all I would say that this technique has a lot of potential, and I suspect it might have other applications than bytecode interpreters. I’m currently considering cleaning up the code and publishing it on google project hosting.
Update
This code now lives at googlecode, along with an example. The interface is pretty straightforward, especially for expressing bytecode. Here’s how this code:
public int call(int x) {
while (x < 15)
x = Main.twice(x);
return x;
}
can be expressed directly in bytecode:
Code call = classFile.addMethod("call", new Code(file) {{
// --- Labels ---
Label loopStart = new Label();
Label loopEnd = new Label();
// --- Code ---
bind(loopStart);
push(15);
iload(1);
isub();
iflt(loopEnd);
iload(1);
invokestatic(Main.class.getDeclaredMethod("twice", Integer.TYPE));
istore(1);
ghoto(loopStart);
bind(loopEnd);
iload(1);
ireturn();
}});
call.returnType(Integer.TYPE);
call.argumentTypes(Integer.TYPE);
The only annoyance is that some opcode names like return
and goto
are reserved so I have to use names like rethurn
and ghoto
, following the “silent h” renaming convention.
One Response to Code Generation