DEV Community

Tomasz Wegrzanowski


100 Languages Speedrun: Episode 77: JVM Assembly with Jasmin

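The Java Virtual Machine (JVM) works in a weird way. First there's some source code in Java or Kotlin or whatnot. It gets compiled into JVM bytecode (.class files) as an intermediate form, and JVM Assembly is a human-readable text form of that bytecode. Then, at run time, the JVM interprets that bytecode or compiles it into actual executable machine code.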

In a way, that bytecode level is not strictly necessary. Android, for example, uses a different flow (it converts the bytecode into its own Dex format), and not even the same one between versions.

Anyway, let's see what classic JVM Assembly for a regular JVM looks like. The JDK doesn't include an assembler for human-readable JVM assembly, so I'll be using Jasmin. There are a few other JVM assemblers with slight differences, but none of the differences matter for our simple use case.

Hello, World!

It's going to be helpful if you know some basics about the JVM, either from Java or from one of the other JVM languages. But if not, I'll still try to explain everything step by step.

Let's start with Hello, World! Here's Hello.j:

.class public Hello
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
  .limit locals 1
  .limit stack 2
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "Hello, World!"
  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
  return
.end method

We can compile it with jasmin into Hello.class:

$ jasmin Hello.j
Generated: Hello.class

And then run it by passing the class name to java:

$ java Hello
Hello, World!

In case it's confusing: the java command is not about Java the language, it's only about the Java Virtual Machine. The javac command is the one that deals with Java the language.

So what's going on? Let's first look at the structure of the Hello.j file:

  • code needs to be in a class, and the class name should generally match the file name - for Hello.j we start by defining .class public Hello
  • every class has a superclass, and the default one is Object, internally known as java/lang/Object
  • we then define a method main - this is what runs when you run the class from the command line. .method public static methodname ... .end method is a method definition, and inside it are the instructions that run when the method is called
  • if there were multiple methods, we'd have multiple .method ... .end method blocks
  • declaring a method as .method public static means it's a class method, not bound to any specific instance
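For comparison, here is roughly the Java source that Hello.j corresponds to. Treat it as a sketch: javac won't necessarily produce byte-for-byte the same class file. Still, the class, the implicit Object superclass, and the static main method line up with the directives above.

```java
// Roughly the Java source that Hello.j corresponds to.
// The superclass is implicitly java.lang.Object, like .super java/lang/Object.
class Hello {
    // Compiles to a static method with descriptor main([Ljava/lang/String;)V
    public static void main(String[] args) {
        // getstatic System.out, ldc "Hello, World!", invokevirtual println(String)
        System.out.println("Hello, World!");
    }
}
```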

Name Mangling

So far that makes sense. The first question you might have is what the hell are those names:

  • main([Ljava/lang/String;)V
  • Ljava/io/PrintStream;
  • java/io/PrintStream/println(Ljava/lang/String;)V

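That's the "name mangling". In Java and a few other languages, you can have multiple functions or methods with the same name, as long as their parameter types are different, so the types have to be encoded into the name the JVM sees. (The JVM specification calls these type strings field descriptors and method descriptors.)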

I'm not sure what the easiest way to figure out the "mangled" name is.
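If a JDK is at hand, one option is to let Java compute the type part, which the JVM specification calls a method descriptor. java.lang.invoke.MethodType can render a method type as a descriptor string. Here is a minimal sketch (Descriptors is a made-up class name for illustration):

```java
import java.lang.invoke.MethodType;

class Descriptors {
    // Builds the JVM method descriptor for a return type and parameter types.
    static String of(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(of(void.class, String[].class)); // ([Ljava/lang/String;)V
        System.out.println(of(void.class, String.class));   // (Ljava/lang/String;)V
        System.out.println(of(void.class, int.class));      // (I)V
    }
}
```

The class and method name still have to be prepended by hand to get the form Jasmin expects, as in java/io/PrintStream/println(Ljava/lang/String;)V.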

For example, main([Ljava/lang/String;)V means void main(java.lang.String[]):

  • name main goes first
  • then ( starts argument list
  • [ means array of
  • Ljava/lang/String; means object of type java.lang.String - the semicolon is there to show where the name finishes
  • ) closes arguments list
  • V after it means it returns void, that is, nothing
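The same rules can be turned into a small decoder. This is a hypothetical helper (DescriptorDecoder is not a JDK class). It handles the primitive type letters, [ for arrays, and L...; class names from the JVM specification's descriptor grammar, and turns a mangled method name back into Java-like notation:

```java
class DescriptorDecoder {
    private final String s;
    private int pos;

    private DescriptorDecoder(String s, int pos) { this.s = s; this.pos = pos; }

    // Decodes one type starting at pos and advances pos past it.
    private String type() {
        char c = s.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";          // only valid as a return type
            case '[': return type() + "[]";   // array of whatever type follows
            case 'L': {                       // class name, terminated by ';'
                int end = s.indexOf(';', pos);
                String name = s.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default: throw new IllegalArgumentException("bad descriptor char: " + c);
        }
    }

    // Turns e.g. "main([Ljava/lang/String;)V" into "void main(java.lang.String[])".
    static String decodeMethod(String mangled) {
        int open = mangled.indexOf('(');
        String name = mangled.substring(0, open);
        DescriptorDecoder d = new DescriptorDecoder(mangled, open + 1);
        StringBuilder params = new StringBuilder();
        while (d.s.charAt(d.pos) != ')') {
            if (params.length() > 0) params.append(", ");
            params.append(d.type());
        }
        d.pos++; // skip ')'
        return d.type() + " " + name + "(" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(decodeMethod("main([Ljava/lang/String;)V"));
        System.out.println(decodeMethod("fib(I)I"));
    }
}
```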

Name mangling depends on the language we use. Other JVM languages use Java-compatible mangling for the kinds of things Java supports, but have to come up with their own name mangling schemes for any extra features they need. Inside the JVM these mangled names are all just flat strings, but Jasmin still needs them in exactly the right form to set everything up properly, so we need to follow Java name mangling rules here.

Inside the Hello, World! method

OK, let's look inside the method.

First we start with some .limit definitions.

  .limit locals 1
  .limit stack 2

These specify how many local variables the method has, and how much stack space it needs at most. The JVM is a stack machine, so most of the instructions push things onto the stack, or pop them off the stack to perform various operations.

I'm reasonably sure Jasmin could do this calculation automatically for us, because the JVM certainly checks it: its bytecode verifier will refuse to load a class if you specify numbers that are too low.
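To give a rough idea of what that calculation involves, here is a sketch that walks straight-line code and tracks the stack depth. The stack effects are hard-coded for the four Hello, World! instructions, which is an assumption of this toy. A real verifier derives each effect from the opcode and the method descriptor, and also has to follow branches.

```java
import java.util.List;
import java.util.Map;

class MaxStack {
    // Net stack effect (pushed minus popped) of the instructions used in Hello.j.
    // invokevirtual println(Ljava/lang/String;)V pops the PrintStream and the String.
    static final Map<String, Integer> EFFECT = Map.of(
        "getstatic", +1,
        "ldc", +1,
        "invokevirtual", -2,
        "return", 0
    );

    // Returns the deepest the stack gets while executing the instructions in order.
    static int maxDepth(List<String> opcodes) {
        int depth = 0, max = 0;
        for (String op : opcodes) {
            Integer effect = EFFECT.get(op);
            if (effect == null) throw new IllegalArgumentException("unknown opcode: " + op);
            depth += effect;
            if (depth < 0) throw new IllegalStateException("stack underflow at " + op);
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // Prints 2, matching .limit stack 2 in Hello.j
        System.out.println(maxDepth(List.of("getstatic", "ldc", "invokevirtual", "return")));
    }
}
```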

Now we need to push some values on stack:

  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "Hello, World!"

getstatic gets the value of a static field of a certain type - in this case java.lang.System.out of type java.io.PrintStream - and pushes a reference to it on top of the stack.

Then ldc "Hello, World!" loads a constant and pushes a reference to it on top of the stack.

Then we call a method:

 invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V 

invokevirtual calls a virtual method - that is, one that can potentially be overridden by a subclass. We need to specify the full mangled name of the method we're calling. The JVM knows how many arguments to take from the top of the stack based on the method's type signature. It also pops the object the method is called on (here System.out), which is why that was pushed first.
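To make "virtual" concrete, here is a small Java sketch (the Animal, Dog, and Dispatch classes are made up for illustration). Inside speak, the call a.sound() is compiled to an invokevirtual naming Animal/sound()Ljava/lang/String;. At run time the JVM still picks the implementation from the object's actual class.

```java
class Animal {
    String sound() { return "..."; }
}

class Dog extends Animal {
    @Override
    String sound() { return "Woof"; } // overrides Animal.sound()
}

class Dispatch {
    // The static type is Animal, so javac emits invokevirtual Animal/sound()Ljava/lang/String;
    static String speak(Animal a) {
        return a.sound();
    }

    public static void main(String[] args) {
        System.out.println(speak(new Animal())); // ...
        System.out.println(speak(new Dog()));    // Woof
    }
}
```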

Finally:

return returns from the method.

As a minor technical point, all class, method, and type names, constant strings, and so on, are stored in a "constant pool", not in the bytecode itself. The bytecode actually refers to "constant #7 from the constant pool" or similar. Writing it that way by hand would be really tedious, so Jasmin does at least that much for us.
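Here is a very simplified model of that idea, as a sketch. ConstantPool is a hypothetical class that deduplicates entries and hands out 1-based indices. The real class-file pool also starts numbering at 1, but its entries are typed (Utf8, String, Class, Fieldref, Methodref, and so on), and long and double entries take two slots. This toy models none of that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConstantPool {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    // Returns the index of a constant, adding it on first use. Indices start at 1.
    int add(String constant) {
        Integer existing = index.get(constant);
        if (existing != null) return existing;
        entries.add(constant);
        int i = entries.size(); // 1-based index of the entry just added
        index.put(constant, i);
        return i;
    }

    String get(int i) { return entries.get(i - 1); }

    public static void main(String[] args) {
        ConstantPool pool = new ConstantPool();
        int out = pool.add("java/lang/System.out:Ljava/io/PrintStream;");
        int hello = pool.add("Hello, World!");
        // Reusing a constant returns the existing index instead of adding a new entry.
        int again = pool.add("java/lang/System.out:Ljava/io/PrintStream;");
        System.out.println("getstatic #" + out + ", ldc #" + hello + ", reused #" + again);
    }
}
```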

Loop

Let's do something slightly more complicated, a method that loops some number of times and prints numbers as it does so.

Here's a loop printing values 1 to 10:

.class public Loop
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
  .limit locals 1
  .limit stack 2
  iconst_1 ; push value 1 on stack
  istore 0 ; save that to local variable #0
loop:
  iload 0 ; push local #0 to stack
  bipush 10 ; push byte value 10 on stack
  if_icmpgt end_loop ; if local #0 > 10, goto end_loop
  getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
  iload 0 ; load local #0
  invokevirtual java/io/PrintStream/println(I)V ; print it
  iinc 0 1 ; increase local variable #0 by 1
  goto loop
end_loop:
  return
.end method

Let's give it a go:

$ jasmin Loop.j
Generated: Loop.class
$ java Loop
1
2
3
4
5
6
7
8
9
10

There are a few new opcodes:

  • iconst_1 - pushes 1 on the stack - a few numbers (-1 to 5) are so common they got their own opcodes
  • bipush 10 - pushes 10 on the stack - slightly bigger numbers that fit in a signed 8-bit byte are pushed with bipush, there's also sipush for 16-bit values, and anything bigger is loaded from the constant pool
  • istore 0 and iload 0 - store and load local variables
  • iinc 0 1 - increment local variable #0 by 1 - the increment can be negative as well, to decrement
  • goto label - jump to label
  • if_icmpgt label - pop the top two values and jump to label if the one pushed first is greater than the one pushed second (icmpgt stands for Integer CoMPare Greater Than); here that means "if local #0 > 10"
  • notice that the method we're calling changed from java/io/PrintStream/println(Ljava/lang/String;)V (print a string) to java/io/PrintStream/println(I)V (print an int) - these are totally separate and unrelated methods as far as the JVM is concerned. In Java or Kotlin we'd just write println(...) and the compiler would figure out which of them we meant, but that has to be disambiguated before the JVM gets to it
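The rule for pushing an int constant can be written down as a small function. This is a sketch of how javac usually chooses, and the helper name pushFor is made up. Values from -1 to 5 use iconst_m1 through iconst_5. Other values that fit in a signed byte use bipush, values that fit in a signed short use sipush, and everything else uses ldc from the constant pool.

```java
class PushInsn {
    // Picks the shortest instruction that can push the int constant v.
    static String pushFor(int v) {
        if (v == -1) return "iconst_m1";
        if (v >= 0 && v <= 5) return "iconst_" + v;
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return "bipush " + v;
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return "sipush " + v;
        return "ldc " + v; // stored as an Integer entry in the constant pool
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 10, 100, 1000, 100000}) {
            System.out.println(v + " -> " + pushFor(v));
        }
    }
}
```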

FizzBuzz

We now have almost everything we need to create FizzBuzz.

.class public FizzBuzz
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
  .limit locals 1
  .limit stack 2
  iconst_0 ; push value 0 on stack
  istore 0 ; save that to local variable #0
loop:
  iinc 0 1 ; increase local variable #0 by 1
  iload 0 ; push local #0 to stack
  bipush 100 ; push byte value 100 on stack
  if_icmpgt end_loop ; if local #0 > 100, goto end_loop
  ; divisible by 15?
  iload 0
  bipush 15
  irem
  ifeq fizzbuzz
  ; divisible by 5?
  iload 0
  iconst_5
  irem
  ifeq buzz
  ; divisible by 3?
  iload 0
  iconst_3
  irem
  ifeq fizz
print_number:
  getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
  iload 0 ; load local #0
  invokevirtual java/io/PrintStream/println(I)V ; print it
  goto loop
fizz:
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "Fizz"
  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
  goto loop
buzz:
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "Buzz"
  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
  goto loop
fizzbuzz:
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "FizzBuzz"
  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
  goto loop
end_loop:
  return
.end method

The only new operations we'll need are:

  • iconst_X opcodes go up to 5 so we can use optimized iconst_3 and iconst_5 - but for bigger numbers we need bipush 15
  • irem is the integer remainder (%) operation
  • ifeq and the other ifXX opcodes compare the top of the stack with 0, while if_icmpXX opcodes compare two integer values
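For comparison, here is the same logic in Java. It is a sketch that mirrors the Jasmin version's order of checks (15 first, then 5, then 3). The fizzBuzz helper is introduced here so the per-number logic can be tested on its own, whereas the Jasmin code has it inline.

```java
class FizzBuzz {
    // Mirrors the three "iload 0 / push N / irem / ifeq label" checks, in the same order.
    static String fizzBuzz(int i) {
        if (i % 15 == 0) return "FizzBuzz";
        if (i % 5 == 0) return "Buzz";
        if (i % 3 == 0) return "Fizz";
        return Integer.toString(i); // the print_number branch
    }

    public static void main(String[] args) {
        // Same range as the Jasmin loop, which increments first and exits once i > 100.
        for (int i = 1; i <= 100; i++) {
            System.out.println(fizzBuzz(i));
        }
    }
}
```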

Fibonacci

Let's do the next usual thing, and define Fibonacci, calculated recursively with an equivalent of a public static int fib(int n) function.

.class public Fib
.super java/lang/Object

.method public static fib(I)I
  .limit stack 3
  iload_0
  iconst_2
  if_icmple small_fib
big_fib:
  iload_0
  iconst_1
  isub
  invokestatic Fib/fib(I)I ; push fib(i-1) to stack
  iload_0
  iconst_2
  isub
  invokestatic Fib/fib(I)I ; push fib(i-2) to stack
  iadd
  ireturn ; return fib(i-1) + fib(i-2)
small_fib:
  iconst_1
  ireturn ; return 1
.end method

.method public static main([Ljava/lang/String;)V
  .limit locals 1
  .limit stack 2
  iconst_1 ; push value 1 on stack
  istore 0 ; save that to local variable #0
loop:
  iload 0 ; push local #0 to stack
  bipush 30 ; push byte value 30 on stack
  if_icmpgt end_loop ; if local #0 > 30, goto end_loop
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc "fib("
  invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print "fib("
  getstatic java/lang/System/out Ljava/io/PrintStream;
  iload 0 ; load local #0
  invokevirtual java/io/PrintStream/print(I)V ; print i
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ldc ")="
  invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print ")="
  getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
  iload 0 ; load local #0
  invokestatic Fib/fib(I)I
  invokevirtual java/io/PrintStream/println(I)V ; print fib(i)
  iinc 0 1 ; increase local variable #0 by 1
  goto loop
end_loop:
  return
.end method
$ jasmin Fib.j
Generated: Fib.class
$ java Fib
fib(1)=1
fib(2)=1
fib(3)=2
fib(4)=3
fib(5)=5
fib(6)=8
fib(7)=13
fib(8)=21
fib(9)=34
fib(10)=55
fib(11)=89
fib(12)=144
fib(13)=233
fib(14)=377
fib(15)=610
fib(16)=987
fib(17)=1597
fib(18)=2584
fib(19)=4181
fib(20)=6765
fib(21)=10946
fib(22)=17711
fib(23)=28657
fib(24)=46368
fib(25)=75025
fib(26)=121393
fib(27)=196418
fib(28)=317811
fib(29)=514229
fib(30)=832040

Let's go through it step by step:

  • the main function has a loop, with various print(String), print(int), and println(int) calls in it
  • invokestatic Fib/fib(I)I invokes static function int fib(int) in class Fib - the one we're currently in
  • inside fib we do recursive calls to invokestatic Fib/fib(I)I
  • iload_0 pushes the method's first argument on the stack (arguments are stored in the first local variable slots, so they share the same numbering); iload_0 is just a shorter one-byte form of iload 0
  • iadd and isub do integer addition and subtraction
  • ireturn returns an integer
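Here is roughly the Java source for the same class, as a sketch. It keeps the Jasmin version's base case: if_icmple small_fib returns 1 for any n <= 2, including 0 and negative n. javac would not necessarily emit identical bytecode. For example, it typically tests the inverted condition (if_icmpgt) to jump over the body of the if.

```java
class Fib {
    // Same base case as the Jasmin code: n <= 2 returns 1.
    static int fib(int n) {
        if (n <= 2) return 1;
        return fib(n - 1) + fib(n - 2); // two invokestatic Fib/fib(I)I calls, then iadd
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 30; i++) {
            System.out.print("fib(");
            System.out.print(i);          // print(I)V
            System.out.print(")=");
            System.out.println(fib(i));   // println(I)V
        }
    }
}
```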

Java Disassembler javap

A popular related tool is the Java disassembler javap, which ships with the JDK and can turn a .class file back into JVM assembly. Unfortunately javap and jasmin don't really agree on the details much:

$ javap -c Fib.class
Compiled from "Fib.j"
public class Fib {
  public static int fib(int);
    Code:
       0: iload_0
       1: iconst_2
       2: if_icmple     19
       5: iload_0
       6: iconst_1
       7: isub
       8: invokestatic  #21                 // Method fib:(I)I
      11: iload_0
      12: iconst_2
      13: isub
      14: invokestatic  #21                 // Method fib:(I)I
      17: iadd
      18: ireturn
      19: iconst_1
      20: ireturn

  public static void main(java.lang.String[]);
    Code:
       0: iconst_1
       1: istore        0
       3: iload         0
       5: bipush        30
       7: if_icmpgt     54
      10: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      13: ldc           #32                 // String fib(
      15: invokevirtual #24                 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
      18: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      21: iload         0
      23: invokevirtual #29                 // Method java/io/PrintStream.print:(I)V
      26: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      29: ldc           #14                 // String )=
      31: invokevirtual #24                 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
      34: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      37: iload         0
      39: invokestatic  #21                 // Method fib:(I)I
      42: invokevirtual #9                  // Method java/io/PrintStream.println:(I)V
      45: iinc_w        0, 1
      51: goto          3
      54: return
}

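As you can see, javap prints method headers with demangled, Java-style names. It shows explicit references to the constant pool by number (#21 and so on). Some instructions also look different: Jasmin encoded iinc 0 1 in its wide form, which javap prints as iinc_w 0, 1.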
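The #15-style numbers are indices into the class file's constant pool. To show where that pool lives, here is a small sketch based on the JVM specification's class file format. A class file begins with the magic number 0xCAFEBABE, then minor and major version numbers, then constant_pool_count. Valid pool indices run from 1 to constant_pool_count - 1. ClassHeader is a hypothetical helper that reads only those first fields:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ClassHeader {
    // Reads magic, version, and constant_pool_count from the start of a .class file.
    static String describe(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();               // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();     // u2 minor_version
            int major = in.readUnsignedShort();     // u2 major_version
            int poolCount = in.readUnsignedShort(); // u2 constant_pool_count
            // Index 0 is unused, so valid indices are 1 .. poolCount - 1.
            return "version " + major + "." + minor
                    + ", constant pool indices 1.." + (poolCount - 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassHeader Fib.class
        System.out.println(describe(Files.readAllBytes(Path.of(args[0]))));
    }
}
```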

Person Class

It would make no sense to end this episode without defining a small class. For this let's just define Person with two string fields (name, surname), and one toString method. We'll also have static main to test it.

I put comments inside the code. For non-static methods, this is passed as an extra first argument, so from the JVM's point of view the local variables of a method with two arguments might look like this:

  • local 0 - this
  • local 1 - first argument
  • local 2 - second argument
  • local 3 - first local variable
  • local 4 - second local variable
.class public Person
.super java/lang/Object

.field public name Ljava/lang/String;
.field public surname Ljava/lang/String;

.method public <init>(Ljava/lang/String;Ljava/lang/String;)V
  .limit locals 4
  .limit stack 4
  ; local 0 - this
  ; local 1 - argument name
  ; local 2 - argument surname
  ; call super this.<init>();
  aload_0
  invokespecial java/lang/Object/<init>()V
  ; this.name = argument_name
  aload_0
  aload_1
  putfield Person/name Ljava/lang/String;
  ; this.surname = argument_surname
  aload_0
  aload_2
  putfield Person/surname Ljava/lang/String;
  return
.end method

.method public toString()Ljava/lang/String;
  .limit locals 4
  .limit stack 4
  ; local 0 - this
  ; push this.name
  aload_0
  getfield Person/name Ljava/lang/String;
  ; push " "
  ldc " "
  ; call String.concat, getting: this.name + " "
  invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;
  ; push this.surname
  aload_0
  getfield Person/surname Ljava/lang/String;
  ; call String.concat, getting: this.name + " " + this.surname
  invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;
  areturn
.end method

.method public static main([Ljava/lang/String;)V
  .limit locals 4
  .limit stack 4
  ; local Person a = new Person("Alice", "Smith")
  new Person
  dup
  ldc "Alice"
  ldc "Smith"
  invokespecial Person/<init>(Ljava/lang/String;Ljava/lang/String;)V
  astore_1
  getstatic java/lang/System/out Ljava/io/PrintStream;
  ; push a.toString()
  aload_1
  invokevirtual Person/toString()Ljava/lang/String;
  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
  return
.end method
$ jasmin Person.j
Generated: Person.class
$ java Person
Alice Smith
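For comparison, here is roughly the Java source for the same class, as a sketch. In real Java code, name + " " + surname would normally be compiled differently: with a StringBuilder on older JDKs, or with invokedynamic on JDK 9 and later. The explicit concat calls below are written only to mirror the Jasmin version.

```java
class Person {
    public String name;
    public String surname;

    // <init>(Ljava/lang/String;Ljava/lang/String;)V - this is local 0, the arguments are locals 1 and 2
    public Person(String name, String surname) {
        super();                // aload_0, invokespecial java/lang/Object/<init>()V
        this.name = name;       // aload_0, aload_1, putfield Person/name
        this.surname = surname; // aload_0, aload_2, putfield Person/surname
    }

    @Override
    public String toString() {
        // Two String.concat calls, as in the Jasmin version
        return name.concat(" ").concat(surname);
    }

    public static void main(String[] args) {
        Person a = new Person("Alice", "Smith"); // new, dup, ldc, ldc, invokespecial
        System.out.println(a.toString());
    }
}
```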

Should you use JVM Assembly?

It's meant for human use even less than regular assembly, so definitely not.

There's an additional problem: unlike regular assembly or LLVM assembly, where there's some fully supported standard format, Jasmin is a third-party program, and different JVM assemblers and disassemblers disagree on many details. There are also some newer assemblers and disassemblers like Krakatau you could try instead. Krakatau's syntax differs from both Jasmin's and javap's.

It could be helpful to have a general idea of how it works if you're developing a new language for the JVM, but that's about it.

Another way to familiarize yourself with JVM assembly is the Godbolt Compiler Explorer site, but that just compiles your code (Java, Kotlin, etc.) and runs javap on the output, so you can do that locally too.

Code

All code examples for the series will be in this repository.

Code for the JVM Assembly with Jasmin episode is available here.
