Posted on Mar 18, 2021

Kinx Library - JIT, Just In Time Compilation

Hello, everybody!

The scripting language Kinx has been published as version 1.0.0. Its concept is "Looks like JavaScript, feels like Ruby, and it is a script language fitting in C programmers."

However, I realized I had never written an article about the JIT library even though it is a key feature, so this post is about the JIT library.

If you are interested in JIT in Kinx, please see the JIT Compiler and Native documentation for details.

Introduction

You can use JIT compilation in Kinx to improve performance. There are two ways to do it:

Use the native keyword.
Use an abstracted assembler library.

I have introduced native in earlier articles, for example in the Mandelbrot benchmark. It is a good starting point for improving performance, so this article also starts by trying native.

When Not Using JIT

First, without JIT, you would normally write the code like this:

```
function fib(n) {
    if (n <= 3) return n;
    return fib(n-2) + fib(n-1);
}
```

This is very simple. Next, let's modify this code to improve its performance with JIT.
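To get a feel for how much work this benchmark does, here is the same function transcribed to C. This is a sketch for illustration, not code from Kinx; the global counter `fib_calls` is a hypothetical addition that is not part of the original function.

```c
#include <stdint.h>

/* Number of times fib() has been entered; added for illustration only. */
static uint64_t fib_calls = 0;

/* Same recurrence as the Kinx example: fib(n) = n for n <= 3,
   otherwise fib(n-2) + fib(n-1). */
static int64_t fib(int64_t n)
{
    fib_calls++;
    if (n <= 3) return n;
    return fib(n - 2) + fib(n - 1);
}
```

Because fib(n) returns n for n <= 3, it produces the Fibonacci numbers shifted by one (fib(36) = 24157817 is the 37th Fibonacci number), and a single fib(36) call makes 18,454,929 calls in total. The benchmark is therefore dominated by the cost of function calls.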

When Using JIT

`native` Keyword

The first attempt is to replace `function` with `native`. The `native` keyword compiles the function directly to native code.

```
native fib(n) {
    if (n <= 3) return n;
    return fib(n-2) + fib(n-1);
}
```

On Windows, this generates the assembly code below. It is a little long because it includes type checks and exception checks. The advantage is that it is easy to write; the disadvantages are that it can generate somewhat redundant code and that it comes with some limitations. The extra code looks redundant, but it is necessary.

```
fib: (native-base:0x1b96340010)
0: 53 push rbx
1: 56 push rsi
2: 57 push rdi
3: 41 57 push r15
5: 41 56 push r14
7: 41 55 push r13
9: 55 push rbp
a: 41 54 push r12
c: 48 8b d9 mov rbx, rcx
f: 48 8b f2 mov rsi, rdx
12: 49 8b f8 mov rdi, r8
15: 4c 8b 8c 24 a8 fd ff ff mov r9, [rsp-0x258]
1d: 48 81 ec 58 02 00 00 sub rsp, 0x258
24: 48 8b 46 08 mov rax, [rsi+0x8]
28: 48 83 c0 01 add rax, 0x1
2c: 48 89 46 08 mov [rsi+0x8], rax
30: 48 3d 00 04 00 00 cmp rax, 0x400
36: 72 2b jb 0x63
38: 48 c7 43 20 01 00 00 00 mov qword [rbx+0x20], 0x1
40: 48 c7 43 28 06 00 00 00 mov qword [rbx+0x28], 0x6
48: 48 c7 c0 00 00 00 00 mov rax, 0x0
4f: 48 81 c4 58 02 00 00 add rsp, 0x258
56: 41 5c pop r12
58: 5d pop rbp
59: 41 5d pop r13
5b: 41 5e pop r14
5d: 41 5f pop r15
5f: 5f pop rdi
60: 5e pop rsi
61: 5b pop rbx
62: c3 ret
63: 48 83 be 18 01 00 00 01 cmp qword [rsi+0x118], 0x1
6b: 0f 85 30 01 00 00 jnz 0x1a1
71: 4c 8b 4e 18 mov r9, [rsi+0x18]
75: 4c 89 4c 24 20 mov [rsp+0x20], r9
7a: 4c 8b 74 24 20 mov r14, [rsp+0x20]
7f: 4c 89 f0 mov rax, r14
82: 48 83 f8 03 cmp rax, 0x3
86: 7f 1c jg 0xa4
88: 4c 8b 74 24 20 mov r14, [rsp+0x20]
8d: 4c 89 f0 mov rax, r14
90: 48 81 c4 58 02 00 00 add rsp, 0x258
97: 41 5c pop r12
99: 5d pop rbp
9a: 41 5d pop r13
9c: 41 5e pop r14
9e: 41 5f pop r15
a0: 5f pop rdi
a1: 5e pop rsi
a2: 5b pop rbx
a3: c3 ret
a4: 4c 8b 74 24 20 mov r14, [rsp+0x20]
a9: 49 8d 46 fe lea rax, [r14-0x2]
ad: 48 89 44 24 40 mov [rsp+0x40], rax
b2: 48 c7 84 24 40 01 00 00 01 00 00 00 mov qword [rsp+0x140], 0x1
be: 48 8b 4e 10 mov rcx, [rsi+0x10]
c2: 48 89 d8 mov rax, rbx
c5: 4c 8b 4e 08 mov r9, [rsi+0x8]
c9: 4c 89 4c 24 30 mov [rsp+0x30], r9
ce: 48 89 4c 24 38 mov [rsp+0x38], rcx
d3: 48 8d 54 24 28 lea rdx, [rsp+0x28]
d8: 49 89 ca mov r10, rcx
db: 48 89 c1 mov rcx, rax
de: 41 ff d2 call r10
e1: 49 89 c6 mov r14, rax
e4: 48 8b 43 20 mov rax, [rbx+0x20]
e8: 48 83 f8 00 cmp rax, 0x0
ec: 74 1b jz 0x109
ee: 48 c7 c0 00 00 00 00 mov rax, 0x0
f5: 48 81 c4 58 02 00 00 add rsp, 0x258
fc: 41 5c pop r12
fe: 5d pop rbp
ff: 41 5d pop r13
101: 41 5e pop r14
103: 41 5f pop r15
105: 5f pop rdi
106: 5e pop rsi
107: 5b pop rbx
108: c3 ret
109: 4c 8b 6c 24 20 mov r13, [rsp+0x20]
10e: 49 8d 45 ff lea rax, [r13-0x1]
112: 48 89 44 24 40 mov [rsp+0x40], rax
117: 48 c7 84 24 40 01 00 00 01 00 00 00 mov qword [rsp+0x140], 0x1
123: 48 8b 4e 10 mov rcx, [rsi+0x10]
127: 48 89 d8 mov rax, rbx
12a: 4c 8b 4e 08 mov r9, [rsi+0x8]
12e: 4c 89 4c 24 30 mov [rsp+0x30], r9
133: 48 89 4c 24 38 mov [rsp+0x38], rcx
138: 48 8d 54 24 28 lea rdx, [rsp+0x28]
13d: 49 89 ca mov r10, rcx
140: 48 89 c1 mov rcx, rax
143: 41 ff d2 call r10
146: 49 89 c5 mov r13, rax
149: 48 8b 43 20 mov rax, [rbx+0x20]
14d: 48 83 f8 00 cmp rax, 0x0
151: 74 1b jz 0x16e
153: 48 c7 c0 00 00 00 00 mov rax, 0x0
15a: 48 81 c4 58 02 00 00 add rsp, 0x258
161: 41 5c pop r12
163: 5d pop rbp
164: 41 5d pop r13
166: 41 5e pop r14
168: 41 5f pop r15
16a: 5f pop rdi
16b: 5e pop rsi
16c: 5b pop rbx
16d: c3 ret
16e: 4b 8d 04 2e lea rax, [r14+r13]
172: 48 81 c4 58 02 00 00 add rsp, 0x258
179: 41 5c pop r12
17b: 5d pop rbp
17c: 41 5d pop r13
17e: 41 5e pop r14
180: 41 5f pop r15
182: 5f pop rdi
183: 5e pop rsi
184: 5b pop rbx
185: c3 ret
186: 48 c7 c0 00 00 00 00 mov rax, 0x0
18d: 48 81 c4 58 02 00 00 add rsp, 0x258
194: 41 5c pop r12
196: 5d pop rbp
197: 41 5d pop r13
199: 41 5e pop r14
19b: 41 5f pop r15
19d: 5f pop rdi
19e: 5e pop rsi
19f: 5b pop rbx
1a0: c3 ret
1a1: 48 c7 43 20 01 00 00 00 mov qword [rbx+0x20], 0x1
1a9: 48 c7 43 28 07 00 00 00 mov qword [rbx+0x28], 0x7
1b1: 48 c7 c0 00 00 00 00 mov rax, 0x0
1b8: 48 81 c4 58 02 00 00 add rsp, 0x258
1bf: 41 5c pop r12
1c1: 5d pop rbp
1c2: 41 5d pop r13
1c4: 41 5e pop r14
1c6: 41 5f pop r15
1c8: 5f pop rdi
1c9: 5e pop rsi
1ca: 5b pop rbx
1cb: c3 ret
```
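The checks in the disassembly above can be read as three guards around the actual fib logic: a call-depth counter at [rsi+0x8] that is incremented on entry and compared with 0x400 (otherwise exception code 6 is stored at [rbx+0x28]), a type tag at [rsi+0x118] that must equal 1 (otherwise exception code 7), and a test of the flag at [rbx+0x20] after every recursive call. The C model below is our interpretation of that listing, not Kinx source code. Every struct, field, and constant name is hypothetical, and reading tag 1 as "integer" is an assumption.

```c
#include <stdint.h>

/* Hypothetical model of the guards seen in the native disassembly.
   Names are ours; the comments give the instructions each line mirrors. */
enum { ARG_TYPE_INT = 1 };               /* cmp qword [rsi+0x118], 0x1 (assumed: integer) */
enum { EXC_DEPTH = 6, EXC_TYPE = 7 };    /* values stored to [rbx+0x28] */
enum { MAX_DEPTH = 0x400 };              /* cmp rax, 0x400 ; jb */

typedef struct {
    int64_t has_exception;   /* [rbx+0x20] */
    int64_t exception_code;  /* [rbx+0x28] */
} native_ctx;

typedef struct {
    int64_t depth;           /* [rsi+0x8]   */
    int64_t arg;             /* [rsi+0x18]  */
    int64_t arg_type;        /* [rsi+0x118] */
} native_args;

static int64_t guarded_fib(native_ctx *ctx, native_args *a)
{
    /* Guard 1: call depth, incremented on entry and limited to 0x400. */
    a->depth += 1;
    if (a->depth >= MAX_DEPTH) {
        ctx->has_exception = 1;
        ctx->exception_code = EXC_DEPTH;
        return 0;
    }
    /* Guard 2: the argument must carry type tag 1. */
    if (a->arg_type != ARG_TYPE_INT) {
        ctx->has_exception = 1;
        ctx->exception_code = EXC_TYPE;
        return 0;
    }

    int64_t n = a->arg;
    if (n <= 3) return n;                /* cmp rax, 0x3 ; jg (signed compare) */

    /* Each callee gets a fresh argument block that inherits the depth. */
    native_args sub = { a->depth, n - 2, ARG_TYPE_INT };
    int64_t lhs = guarded_fib(ctx, &sub);
    if (ctx->has_exception) return 0;    /* Guard 3: propagate the exception */

    sub = (native_args){ a->depth, n - 1, ARG_TYPE_INT };
    int64_t rhs = guarded_fib(ctx, &sub);
    if (ctx->has_exception) return 0;    /* Guard 3 again */

    return lhs + rhs;                    /* lea rax, [r14+r13] */
}
```

These guards run on every call, which plausibly accounts for part of the gap between the native and JIT library timings in the benchmark below.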

Abstracted Assembler Library

Kinx also has a JIT library that provides an abstracted assembler.
To make the library available, write `using Jit;` at the top of the source code.

Let's write fib with it like this.

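```
using Jit;

var c = new Jit.Compiler();
var entry1 = c.enter();                 /* the argument n arrives in S0 */
var jump0 = c.ge(Jit.S0, Jit.IMM(3));   /* jump to l1 if n >= 3 */
c.ret(Jit.S0);                          /* otherwise return n */
var l1 = c.label();
c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
c.call(entry1);                         /* R0 = fib(n-2) */
c.mov(Jit.S1, Jit.R0);                  /* keep it in S1 */
c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
c.call(entry1);                         /* R0 = fib(n-1) */
c.add(Jit.R0, Jit.R0, Jit.S1);
c.ret(Jit.R0);
jump0.setLabel(l1);                     /* resolve the forward jump */
var code = c.generate();
```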

You can see the generated code with code.dump(). Here it is.

```
0: 53 push rbx
1: 56 push rsi
2: 57 push rdi
3: 48 8b d9 mov rbx, rcx
6: 48 8b f2 mov rsi, rdx
9: 49 8b f8 mov rdi, r8
c: 4c 8b 4c 24 d0 mov r9, [rsp-0x30]
11: 48 83 ec 30 sub rsp, 0x30
15: 48 83 fb 03 cmp rbx, 0x3
19: 73 0b jae 0x26
1b: 48 89 d8 mov rax, rbx
1e: 48 83 c4 30 add rsp, 0x30
22: 5f pop rdi
23: 5e pop rsi
24: 5b pop rbx
25: c3 ret
26: 48 8d 43 fe lea rax, [rbx-0x2]
2a: 48 89 c1 mov rcx, rax
2d: e8 ce ff ff ff call 0x0
32: 48 89 c6 mov rsi, rax
35: 48 8d 43 ff lea rax, [rbx-0x1]
39: 48 89 c1 mov rcx, rax
3c: e8 bf ff ff ff call 0x0
41: 48 03 c6 add rax, rsi
44: 48 83 c4 30 add rsp, 0x30
48: 5f pop rdi
49: 5e pop rsi
4a: 5b pop rbx
4b: c3 ret
```

This is much simpler than the native version. That is no surprise, because the output is exactly what you wrote: there is no type check and no exception check. The advantage is that it can generate simple, high-performance code; the disadvantage is that you have to take care of everything yourself.
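To make "exactly what you wrote" concrete, here is a C transcription of the Jit.Compiler program with one statement per emitted instruction. It is a sketch for illustration; `jit_fib` and the local variable names are ours. The register mapping is read off the x64 dump: the argument arrives in rcx and is copied to rbx (S0), each call result is moved from rax (R0) into rsi (S1), and R0 is passed as the argument of every c.call.

```c
#include <stdint.h>

/* C transcription of the Jit.Compiler program: S0 -> s0, R0 -> r0, S1 -> s1. */
static int64_t jit_fib(int64_t s0)
{
    int64_t r0, s1;

    /* c.ge(Jit.S0, Jit.IMM(3)) is emitted as "cmp rbx, 0x3; jae",
       i.e. an unsigned comparison. */
    if ((uint64_t)s0 >= 3) goto l1;
    return s0;                /* c.ret(Jit.S0) */

l1:
    r0 = s0 - 2;              /* c.sub(Jit.R0, Jit.S0, Jit.IMM(2)) */
    r0 = jit_fib(r0);         /* c.call(entry1): R0 in, result back in R0 */
    s1 = r0;                  /* c.mov(Jit.S1, Jit.R0); S1 is callee-saved */
    r0 = s0 - 1;              /* c.sub(Jit.R0, Jit.S0, Jit.IMM(1)) */
    r0 = jit_fib(r0);         /* c.call(entry1) */
    r0 = r0 + s1;             /* c.add(Jit.R0, Jit.R0, Jit.S1) */
    return r0;                /* c.ret(Jit.R0) */
}
```

Note that the base case here is n < 3 rather than n <= 3, which yields the same values. If the dump's jae is read correctly, a negative argument looks like a huge unsigned number, so the function keeps recursing until the stack overflows; nothing in the generated code guards against that, which is what "you have to take care of everything" means in practice.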

Put It All Together and Benchmark

Let's put it all together and benchmark it.

```
using Jit;

/* ------------------------------------------------------------------------
    JIT
------------------------------------------------------------------------ */
var c = new Jit.Compiler();
var entry1 = c.enter();
    var jump0 = c.ge(Jit.S0, Jit.IMM(3));
    c.ret(Jit.S0);
    var l1 = c.label();
    c.sub(Jit.R0, Jit.S0, Jit.IMM(2));
    c.call(entry1);
    c.mov(Jit.S1, Jit.R0);
    c.sub(Jit.R0, Jit.S0, Jit.IMM(1));
    c.call(entry1);
    c.add(Jit.R0, Jit.R0, Jit.S1);
    c.ret(Jit.R0);
jump0.setLabel(l1);
var code = c.generate();    /* compiled before the timer starts */

var n = 36;
var tmr = new SystemTimer();
var r = code.run(n);
var elapsed = tmr.elapsed();
System.println("[elapsed:%8.3f] JIT lib fib(%2d) = %d" % elapsed % n % r);

/* ------------------------------------------------------------------------
    native
------------------------------------------------------------------------ */
native fibn(n) {
    if (n <= 3) return n;
    return fibn(n-2) + fibn(n-1);
}
tmr.restart();
r = fibn(n);
elapsed = tmr.elapsed();
System.println("[elapsed:%8.3f] native fib(%2d) = %d" % elapsed % n % r);

/* ------------------------------------------------------------------------
    normal case
----------------------------------------------------------------------------- */
function fib(n) {
    if (n <= 3) return n;
    return fib(n-2) + fib(n-1);
}
tmr.restart();
r = fib(n);
elapsed = tmr.elapsed();
System.println("[elapsed:%8.3f] function fib(%2d) = %d" % elapsed % n % r);
```

Here is the result.

```
[elapsed: 0.074] JIT lib fib(36) = 24157817
[elapsed: 0.158] native fib(36) = 24157817
[elapsed: 2.472] function fib(36) = 24157817
```
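Expressed as speedups, these are simple ratios of the single run above (individual timings will vary from run to run):

$$
\frac{2.472}{0.074} \approx 33.4 \;\text{(JIT lib vs. function)}, \qquad
\frac{2.472}{0.158} \approx 15.6 \;\text{(native vs. function)}, \qquad
\frac{0.158}{0.074} \approx 2.1 \;\text{(JIT lib vs. native)}
$$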

Comparison

By the way, below are the result of the same program compiled with cl -O2 (excluding compilation time) and the result of my C interpreter with x64 JIT compilation (kcs -j). Based on these, I feel the JIT library's result is roughly on par with compiled C once compilation time is taken into account.

```
[elapsed: 0.049] fib(36) = 24157817   // => cl -O2
[elapsed: 0.094] fib(36) = 24157817   // => kcs -j
```
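The article does not show the C source behind the cl -O2 number, so the harness below is a hypothetical sketch of how such a baseline could be measured, not the author's code. Like the Kinx script, it times only the call itself. `time_fib` is our name, and `clock()` is chosen because it is in the C standard library; whether it reports CPU time or wall-clock time depends on the platform.

```c
#include <stdint.h>
#include <time.h>

/* Same function as the Kinx benchmark. */
static int64_t fib(int64_t n)
{
    if (n <= 3) return n;
    return fib(n - 2) + fib(n - 1);
}

/* Runs fib(n) once and returns the elapsed time in seconds.
   The result is written to *out so the computation stays observable. */
static double time_fib(int64_t n, int64_t *out)
{
    clock_t start = clock();
    *out = fib(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_fib(36, &r)` and printing the result with `printf("[elapsed:%8.3f] fib(%2d) = %lld\n", ...)` gives output in the same shape as the Kinx script.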

Conclusion

native is very simple to use, but it has some limitations. The JIT library can be very useful in specific situations. See the JIT Compiler and Native documentation for details.

I hope you find a use case for this library and that it helps you.

DEV Community
