glaforge edited this page Jan 19, 2011 · 4 revisions

What's this project?

This project provides an AST transformation for Groovy which allows you to write the body of a method as JVM bytecode instructions. The JVM bytecode format used here is very close to the one used by the ASM project (which is the library used by Groovy to generate bytecode).

Who is it for?

This project is a good tool for anyone who wants to understand the internals of the JVM. It provides a simple way to generate bytecode without having to write ASM bytecode visitor code. Thanks to Groovy AST transformations, we can provide a DSL for bytecode generation that avoids most of the boilerplate. This makes the project well suited to students taking compiler courses, and to programmers who write compilers for JVM languages.

Another possibility is to extend the Groovy language itself where metaprogramming isn't sufficient. This use is limited, however, by where you can put bytecode: only in method bodies. It is not allowed at class level, nor in closures. This means you can't generate a full class with this DSL: at most, you can write each method body as bytecode, but Groovy will still add its own methods and metaclasses automatically.

Last but not least, this AST transformation can also be used to write highly optimized code where Groovy is too slow. However:

  • Never think you're smarter than a compiler.
  • Even if you are, limit bytecode to critical sections: not everyone understands bytecode, and it is easy to write unmaintainable code.
  • Never forget that you can mix Java code and Groovy code, and that most performance problems can be solved this way.

How do I start?

First, you'll need the groovy-all jar on your classpath. This AST transformation uses features that are only available in this package. Furthermore, Groovy embeds the ASM library in a repackaged way, so you wouldn't be able to run the AST transformation with the regular ASM library as a dependency.

Second, this is not where you'll find documentation about the JVM bytecode. We strongly encourage you to read the ASM User Guide which contains excellent information about the JVM bytecode instructions.

Then, the easiest way to get started is to write Java code and dump the .class files to see what bytecode has been generated. There are several ways to do this, including plugins for Eclipse or IntelliJ IDEA, or the javap tool. We recommend the plugins: because they use the ASM representation, what they show can be used almost directly in the DSL.

Step-by-step example

To illustrate the use of bytecode in Groovy classes, we'll write a simple example: the Fibonacci function. Unless you know your JVM bytecode perfectly, start by writing a Java function which does the computation:

public int fib(int i) {
   return i<2?i:fib(i-2)+fib(i-1);
}
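
If you want to try this step yourself, the method needs an enclosing class. The sketch below is one minimal way to do it; the class name Bidon is taken from the bytecode listing on this page, while the main method is an addition for illustration only. After compiling with javac, running "javap -c Bidon" prints a disassembly of fib (in javap's own notation, which differs from the ASM-style listing produced by the IDE plugins).

```java
// Bidon.java: the class name matches the one that appears in the bytecode listing.
class Bidon {
    // Same computation as the snippet above.
    public int fib(int i) {
        return i < 2 ? i : fib(i - 2) + fib(i - 1);
    }

    // Added for illustration only: lets you run the class directly.
    public static void main(String[] args) {
        System.out.println(new Bidon().fib(40)); // prints 102334155
    }
}
```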

Now, use the plugin (or javap) to display the JVM bytecode instructions:

   L0
    LINENUMBER 38 L0
    ILOAD 1
    ICONST_2
    IF_ICMPGE L1
    ILOAD 1
    GOTO L2
   L1
   FRAME SAME
    ALOAD 0
    ILOAD 1
    ICONST_2
    ISUB
    INVOKEVIRTUAL Bidon.fib (I)I
    ALOAD 0
    ILOAD 1
    ICONST_1
    ISUB
    INVOKEVIRTUAL Bidon.fib (I)I
    IADD
   L2
   FRAME SAME1 I
    IRETURN
   L3
    LOCALVARIABLE this LBidon; L0 L3 0
    LOCALVARIABLE i I L0 L3 1
    MAXSTACK = 4
    MAXLOCALS = 2

Let's do some code cleanup. First, you can drop the last part of the listing (the LOCALVARIABLE entries), which is only needed by debuggers. MAXSTACK and MAXLOCALS are computed automatically, so you don't need them either. You can remove the LINENUMBER instructions too.

   L0
    ILOAD 1
    ICONST_2
    IF_ICMPGE L1
    ILOAD 1
    GOTO L2
   L1
   FRAME SAME
    ALOAD 0
    ILOAD 1
    ICONST_2
    ISUB
    INVOKEVIRTUAL Bidon.fib (I)I
    ALOAD 0
    ILOAD 1
    ICONST_1
    ISUB
    INVOKEVIRTUAL Bidon.fib (I)I
    IADD
   L2
   FRAME SAME1 I
    IRETURN

Another thing you can do is remove everything related to frames. Stack map frames were introduced with Java 6 class files, and Groovy is compatible with JDK 1.5, so they are not necessary here. Then convert the keywords to lowercase:

   l0
    iload 1
    iconst_2
    if_icmpge l1
    iload 1
    goto l2
   l1
    aload 0
    iload 1
    iconst_2
    isub
    invokevirtual Bidon.fib (I)I
    aload 0
    iload 1
    iconst_1
    isub
    invokevirtual Bidon.fib (I)I
    iadd
   l2
    ireturn
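
Before finishing the conversion, it can help to see what the listing above does to the operand stack. The following is only a rough sketch in plain Java, not how the JVM is implemented: the class name FibStackSketch and the helper isub are hypothetical, the method is static (so the "aload 0" that pushes "this" has no counterpart), and each recursive call simply gets its own stack object to stand in for a new frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: replays the fib listing with an explicit operand stack.
final class FibStackSketch {

    // Like isub: pops two ints and pushes (value below the top) - (top value).
    private static void isub(Deque<Integer> stack) {
        int top = stack.pop();
        int below = stack.pop();
        stack.push(below - top);
    }

    static int fib(int i) {
        Deque<Integer> stack = new ArrayDeque<>(); // one operand stack per call
        stack.push(i);                        // iload 1
        stack.push(2);                        // iconst_2
        int top = stack.pop();
        int below = stack.pop();
        if (below >= top) {                   // if_icmpge l1 (jump when i >= 2)
            // l1
            stack.push(i);                    // iload 1
            stack.push(2);                    // iconst_2
            isub(stack);                      // isub
            stack.push(fib(stack.pop()));     // invokevirtual Bidon.fib (I)I
            stack.push(i);                    // iload 1
            stack.push(1);                    // iconst_1
            isub(stack);                      // isub
            stack.push(fib(stack.pop()));     // invokevirtual Bidon.fib (I)I
            int b = stack.pop();              // iadd pops two values...
            int a = stack.pop();
            stack.push(a + b);                // ...and pushes their sum
        } else {
            stack.push(i);                    // iload 1, then goto l2
        }
        // l2
        return stack.pop();                   // ireturn
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // prints 55
    }
}
```

Both branches reach l2 with exactly one int on the stack, which is what ireturn expects.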

We're almost there. When an instruction takes multiple arguments, as invokevirtual does, separate them with commas. Class names and method signatures must also be written as strings:

   l0
    iload 1
    iconst_2
    if_icmpge l1
    iload 1
    _goto l2
   l1
    aload 0
    iload 1
    iconst_2
    isub
    invokevirtual 'Bidon.fib', '(I)I'
    aload 0
    iload 1
    iconst_1
    isub
    invokevirtual 'Bidon.fib', '(I)I'
    iadd
   l2
    ireturn

Finally, note that we replaced "goto", which is a reserved keyword in Groovy, with "_goto". This AST transformation requires four instructions to be escaped:

  • goto becomes _goto
  • return becomes vreturn (for void return)
  • instanceof becomes _instanceof
  • new becomes _new

Labels are defined using the "l[0-9]+" syntax: l0, l1, l2, ... You may have noticed that the first one is not necessary, since no instruction jumps to it. So you'll end up with this Groovy script:

@groovyx.ast.bytecode.Bytecode
int fib(int i) {
    iload 1
    iconst_2
    if_icmpge l1
    iload 1
    _goto l2
   l1
    aload 0
    iload 1
    iconst_2
    isub
    invokevirtual 'Bidon.fib', '(I)I'
    aload 0
    iload 1
    iconst_1
    isub
    invokevirtual 'Bidon.fib', '(I)I'
    iadd
   l2
    ireturn
}
println fib(40)

Now, there's something that may look strange to you: the Fibonacci function is recursive, and in ASM bytecode, you must reference the class name in the "invokevirtual" instruction. Here, my Java class was named "Bidon", so the fully qualified method name is "Bidon.fib". However, in a Groovy script, you don't know the name of the class that's being generated. We solve this by allowing the shortcut syntax ".fib", where the fully qualified name of the class is omitted. At compile time, the AST transformation replaces it with the name of the enclosing class, which here is the script class. So this is the final script:

@groovyx.ast.bytecode.Bytecode
int fib(int i) {
    iload 1
    iconst_2
    if_icmpge l1
    iload 1
    _goto l2
   l1
    aload 0
    iload 1
    iconst_2
    isub
    invokevirtual '.fib', '(I)I'
    aload 0
    iload 1
    iconst_1
    isub
    invokevirtual '.fib', '(I)I'
    iadd
   l2
    ireturn
}
println fib(40)

If you run the script, you'll get the following output: 102334155. Well done, you've written your first bytecode instructions thanks to this Groovy DSL.

Alternatively, a friendlier syntax is supported for method invocation. You don't need to remember the notation for internal class names and descriptors; instead, you can use the following syntax:

// usual syntax
invokevirtual ".method", "([DLjava/lang/String;)[I"
// improved syntax
invokevirtual method(double[], String) >> int[]

Another example using invokestatic:

// usual syntax
invokestatic "DummyClassWithWeirdMethod.method", "([DLjava/lang/String;)[I"
// improved syntax
invokestatic DummyClassWithWeirdMethod.method(double[], String) >> int[]

When a method returns void, you can use void as the return type in this syntax.
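
If you still need the raw descriptor strings of the usual syntax, you don't have to build them by hand. As a small sketch, the standard java.lang.invoke.MethodType class (Java 7+) can produce them from class literals; the class name DescriptorDemo and the helper descriptor are made up for this example and are not part of this project.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: builds JVM method descriptors from class literals.
final class DescriptorDemo {

    // Return type first, then the parameter types in declaration order.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int[] method(double[], String)
        System.out.println(descriptor(int[].class, double[].class, String.class)); // ([DLjava/lang/String;)[I
        // int fib(int)
        System.out.println(descriptor(int.class, int.class));                      // (I)I
        // a method with no parameters that returns void
        System.out.println(descriptor(void.class));                                // ()V
    }
}
```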

The same approach applies to getstatic / putstatic and getfield / putfield:

getstatic age >> int
putfield SomePerson.name >> String

Going beyond

We strongly encourage you to download the source code of the project and look at the test cases, which serve as documentation since this page is far from complete. Those test cases cover many constructs, from primitive type casts to try/catch/finally blocks.

DSL syntax for various bytecode instructions

Most of the bytecode instructions are usable just as they are displayed in the bytecode outline plugin, as long as you convert them to lowercase. However, some constructs are slightly different:

tableswitch / lookupswitch

Those instructions require parentheses:

tableswitch(
  0: l2,
  1: l1,
  default: l3)
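
Following the same workflow as in the step-by-step example, you can write a Java switch and inspect its bytecode to get a starting point. The sketch below is only an illustration (the class name SwitchDemo and the method label are invented here); javac usually emits a tableswitch for dense case values like these and a lookupswitch for sparse ones, but check the actual output of your compiler.

```java
// Hypothetical example: a dense switch to disassemble and adapt to the DSL.
final class SwitchDemo {

    static String label(int key) {
        switch (key) {
            case 0:  return "zero";
            case 1:  return "one";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(label(0)); // prints zero
        System.out.println(label(1)); // prints one
        System.out.println(label(7)); // prints other
    }
}
```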

Groovy keywords

  • goto becomes _goto
  • return becomes vreturn (for void return)
  • instanceof becomes _instanceof
  • new becomes _new

ldc

ASM automatically deals with the wide variants of LDC. However, the arguments to ldc must be explicitly typed. For example:

ldc 2.0d

will work, while:

ldc 2.0

won't. That's because the default type for decimal literals in Groovy is BigDecimal, not Double. If you want to push a float instead of a double, just write 2.0f.