Use Case / Experiment TsundereChen
Test Environment • Vagrant, Ubuntu/16.04 • Benchmark results on the host OS and the guest OS are really close, so I use a VM to get the results
 (BTW, it's really easy to get your VM dirty)
Test Version • CPython2 2.7.12 • CPython3 3.5.2 • PyPy2 5.1.2 (Installed from apt-get) • PyPy2 5.6.0 (Compiled from source) • PyPy3 5.7.1 (Compiled from source)
Why only CPython & PyPy? • Cython • You'll need to learn Cython's syntax, which mixes C and Python • Jython • The latest version, Jython 2.7.0, was released in May 2015, so it's outdated
Some notes • PyPy3 is still in beta, so it's no surprise if it's slower than CPython 3 • Not every module runs faster on PyPy than on CPython; there will be samples later
Why is PyPy3 in beta? • CPython and PyPy are developed in different ways • CPython • Focuses on Python 3; Python 2 is only maintained when security issues pop up • PyPy • Focuses on PyPy2; PyPy3 is also updated, but it's not the main line of development
How to run PyPy
PyPy Installation • pypy.org/download.html • If you download binary • run bin/pypy
PyPy Installation • If you want to compile PyPy from scratch • First, install dependencies • http://doc.pypy.org/en/latest/build.html • Then, cd to pypy/goal
PyPy Installation • No-JIT • <Python/PyPy> ../../rpython/bin/rpython --opt=2 • JIT-Enabled • <Python/PyPy> ../../rpython/bin/rpython --opt=jit
PyPy Installation • Notice • Compiling PyPy takes a lot of time, and compiling it with the JIT enabled takes even more • It usually takes 30 minutes or more • You need at least 4 GB of RAM to compile it on a 64-bit machine; make sure you have enough, or the build may be killed by the system
Mandelbrot — For Fun
Benchmark result
gcbench
json_bench
django_template
nqueens
regex_v8
richards
scimark
sqlalchemy_declarative
sqlalchemy_imperative
So, why do we still need CPython?
Not every case should use PyPy • For example, for the code below, CPython is faster than PyPy
 myStr = ""
 for x in xrange(1, 10**6):
     myStr += str(x)
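A hedged sketch of this pattern in Python 3 syntax (range instead of xrange), appending str(x) on each iteration. The function names are made up for illustration. Building a string with repeated += is a case PyPy's performance notes describe as slow, and collecting the pieces and joining them once is the commonly suggested alternative; no timings are claimed here, since they depend on the interpreter and version.

```python
def concat_in_loop(n):
    # Builds the string with repeated +=, like the loop on the slide.
    result = ""
    for x in range(1, n):
        result += str(x)
    return result


def concat_with_join(n):
    # Produces the same string, but joins all pieces in one step.
    return "".join(str(x) for x in range(1, n))


if __name__ == "__main__":
    # Both variants must agree; time them yourself on each interpreter.
    assert concat_in_loop(1000) == concat_with_join(1000)
```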
Can <package> run on PyPy? http://packages.pypy.org/
Enough benchmark, let's get to DSL
Example • Language: Brainf*ck • 8 commands
 +  mem[ptr] += 1
 -  mem[ptr] -= 1
 <  ptr -= 1
 >  ptr += 1
 ,  input()
 .  print()
 [  while(mem[ptr]){
 ]  }
Repo for Brainf*ck experiment • https://github.com/TsundereChen/bf_to_py
Our Goal • Build a Brainf*ck Interpreter • Build a Brainf*ck to Python translator, and compile it with PyPy
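Interpreter • Just read in the file, and execute the commands one by one • But we can add a JIT here
What to do to add JIT • We need to find "Reds" and "Greens" • Greens -> Variables that define the instruction being executed (e.g. the program counter and the program) • Reds -> The data being manipulated (e.g. the memory tape)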
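A minimal sketch of such an interpreter in plain Python, not the code from the repo. It assumes 8-bit wrapping cells, a 30000-cell tape, and that reading past the end of the input yields 0; the names build_bracket_map and run are chosen for illustration.

```python
def build_bracket_map(program):
    """Map each '[' position to its matching ']' position and back."""
    stack, bracket_map = [], {}
    for pos, cmd in enumerate(program):
        if cmd == "[":
            stack.append(pos)
        elif cmd == "]":
            start = stack.pop()
            bracket_map[start] = pos
            bracket_map[pos] = start
    return bracket_map


def run(program, input_text="", tape_size=30000):
    """Execute a Brainf*ck program and return its output as a string."""
    bracket_map = build_bracket_map(program)
    mem = [0] * tape_size
    ptr = pc = 0
    inp = iter(input_text)
    out = []
    while pc < len(program):
        cmd = program[pc]
        if cmd == "+":
            mem[ptr] = (mem[ptr] + 1) % 256
        elif cmd == "-":
            mem[ptr] = (mem[ptr] - 1) % 256
        elif cmd == ">":
            ptr += 1
        elif cmd == "<":
            ptr -= 1
        elif cmd == ".":
            out.append(chr(mem[ptr]))
        elif cmd == ",":
            mem[ptr] = ord(next(inp, "\0")) % 256
        elif cmd == "[" and mem[ptr] == 0:
            pc = bracket_map[pc]   # skip the loop body
        elif cmd == "]" and mem[ptr] != 0:
            pc = bracket_map[pc]   # jump back to the matching '['
        pc += 1
    return "".join(out)
```

For example, run("++++++++[>++++++++<-]>+.") builds 8 * 8 + 1 = 65 in the second cell and outputs the character with that code.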
What to do to add JIT • from rpython.rlib.jit import JitDriver • jitdriver = JitDriver(greens=[], reds=[]) • and add jit_merge_point to your main loop
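A hedged sketch of how the driver might be wired into a main loop, loosely following the PyPy interpreter tutorial listed in the references. The choice of greens and reds, the variable names, and the fallback stub class are assumptions for illustration; the stub only exists so the sketch also runs on a plain Python interpreter without the RPython toolchain. Input and output commands are left out to keep it short.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    # Stand-in used when RPython is not installed. It does nothing.
    class JitDriver(object):
        def __init__(self, greens, reds):
            self.greens, self.reds = greens, reds

        def jit_merge_point(self, **live_vars):
            pass

# Greens identify the position in the Brainf*ck program.
# Reds are the values the program manipulates while running.
jitdriver = JitDriver(greens=["pc", "program", "bracket_map"],
                      reds=["ptr", "mem"])


def mainloop(program, bracket_map):
    pc = ptr = 0
    mem = [0] * 30000
    while pc < len(program):
        # Tell the JIT where one iteration of the interpreter loop starts.
        jitdriver.jit_merge_point(pc=pc, program=program,
                                  bracket_map=bracket_map,
                                  ptr=ptr, mem=mem)
        cmd = program[pc]
        if cmd == "+":
            mem[ptr] += 1
        elif cmd == "-":
            mem[ptr] -= 1
        elif cmd == ">":
            ptr += 1
        elif cmd == "<":
            ptr -= 1
        elif cmd == "[" and mem[ptr] == 0:
            pc = bracket_map[pc]
        elif cmd == "]" and mem[ptr] != 0:
            pc = bracket_map[pc]
        pc += 1
    return mem
```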
Difference • Hm... not very good, right? Look at the seconds: it's 2.73… vs. 2.61…
JIT is not enough...
 How about some opts
Optimize • Speed up loops • Every loop has to look up the matching bracket's address in a dictionary, but the dictionary is static, so we can move the lookup into a function marked with the @elidable decorator to speed it up
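A small sketch of the elidable lookup, assuming the bracket positions are stored in a dictionary called bracket_map. The helper name get_matching_bracket is hypothetical, and the fallback decorator is only there so the snippet runs without RPython installed.

```python
try:
    from rpython.rlib.jit import elidable
except ImportError:
    def elidable(func):
        # Stand-in for plain Python: return the function unchanged.
        return func


@elidable
def get_matching_bracket(bracket_map, pc):
    # The map never changes after parsing, so the same arguments always
    # give the same result. The decorator tells the JIT it may rely on that.
    return bracket_map[pc]
```

In the main loop, the direct dictionary access would then become pc = get_matching_bracket(bracket_map, pc).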
Difference Hmm.... Better
Okay...enough interpreter
 Let's talk about compiler
Basic Knowledge • It reads in a Brainf*ck file and turns it into IR • Then you can choose to optimize the IR • Finally, it turns the IR into Python code, which is compiled with PyPy to generate a binary file • Brainf*ck Code -> IR -> Python Code -> Binary File
Architecture • ir.py -> Brainf*ck to IR and IR to Python • trans.py -> Main program • python trans.py <input> <output> <optmode> • optmode 1 turns optimization on, 0 turns it off • opt.py -> Optimization tricks
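To make the IR-to-Python step concrete, here is a rough sketch. The IR format, a list of (op, count) pairs, and the function name ir_to_python are assumptions for illustration and are not taken from ir.py in the repo. The input command is omitted for brevity.

```python
def ir_to_python(ir):
    """Emit Python source text for a list of (op, count) IR instructions."""
    lines = ["mem = [0] * 30000", "p = 0", "out = []"]
    indent = 0
    for op, count in ir:
        pad = "    " * indent
        if op == "+":
            lines.append(pad + "mem[p] += %d" % count)
        elif op == "-":
            lines.append(pad + "mem[p] -= %d" % count)
        elif op == ">":
            lines.append(pad + "p += %d" % count)
        elif op == "<":
            lines.append(pad + "p -= %d" % count)
        elif op == ".":
            lines.append(pad + "out.append(chr(mem[p]))")
        elif op == "[":
            lines.append(pad + "while mem[p]:")
            indent += 1
            # Keeps the generated code valid even if the loop body is empty.
            lines.append("    " * indent + "pass")
        elif op == "]":
            indent -= 1
    return "\n".join(lines)


if __name__ == "__main__":
    # IR for "+++[>++<-]" after contraction.
    demo = [("+", 3), ("[", 1), (">", 1), ("+", 2),
            ("<", 1), ("-", 1), ("]", 1)]
    print(ir_to_python(demo))
```

The generated text can be written to a file and then translated, or executed directly with exec() while experimenting.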
Optimizations • opt_contract (Contract) • A sequence like "+++++" means we have to do "mem[p] += 1" five times • But because we have IR, we can change it to a single "mem[p] += 5" instruction • This trick applies to "+", "-", ">" and "<"
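A possible shape for this pass, as a sketch only. It assumes a hypothetical IR of (op, count) pairs; the parse helper and this opt_contract body are illustrative and may differ from opt.py in the repo.

```python
def parse(source):
    """Turn Brainf*ck source into a list of (op, count) pairs."""
    return [(ch, 1) for ch in source if ch in "+-<>,.[]"]


def opt_contract(ir):
    """Merge runs of the same + - < > command into one counted instruction."""
    out = []
    for op, count in ir:
        if op in "+-<>" and out and out[-1][0] == op:
            # Same command as the previous one: add the counts together.
            out[-1] = (op, out[-1][1] + count)
        else:
            out.append((op, count))
    return out
```

With this IR, opt_contract(parse("+++++")) yields a single ("+", 5) instruction, while brackets and I/O commands are never merged.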
Optimizations • opt_clearloop (Clear Loop) • A command like [-] means while (mem[p]), do mem[p] -= 1 • We know what the result is, so we can set mem[p] to zero directly
 mem[p] = 0
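One way to express the clear-loop rewrite, sketched over a hypothetical IR of (op, count) pairs. The "clear" instruction name is an assumption for illustration, not necessarily what the repo uses.

```python
def opt_clearloop(ir):
    """Replace the three-instruction pattern [-] with a single 'clear'."""
    pattern = [("[", 1), ("-", 1), ("]", 1)]
    out = []
    i = 0
    while i < len(ir):
        if ir[i:i + 3] == pattern:
            out.append(("clear", 0))   # later emitted as: mem[p] = 0
            i += 3
        else:
            out.append(ir[i])
            i += 1
    return out
```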
Optimizations • opt_multiloop & opt_copyloop (Multiplication and Copy) • A command like [->+>+<<] adds mem[p]'s value to mem[p+1] and mem[p+2] (a copy when both start at zero), and sets mem[p] to zero • Since we know what it does, we can replace the loop with direct assignments
Optimizations • opt_multiloop & opt_copyloop (Multiplication and Copy) • The same trick applies to [->++<], which becomes
 mem[p+1] += 2 * mem[p] and mem[p] = 0 • This is multiplication
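A sketch of how such loops could be recognized, under stated assumptions: the loop body contains only + - < >, ends at the cell where it started, and decrements that cell by exactly one per iteration. The function names are hypothetical and not taken from the repo.

```python
def analyze_simple_loop(body):
    """Return {offset: factor} for a loop body such as '->+>+<<', or None."""
    if any(ch not in "+-<>" for ch in body):
        return None
    offset = 0
    deltas = {}
    for ch in body:
        if ch == ">":
            offset += 1
        elif ch == "<":
            offset -= 1
        else:
            deltas[offset] = deltas.get(offset, 0) + (1 if ch == "+" else -1)
    # The loop must return to its start and count the start cell down by 1.
    if offset != 0 or deltas.get(0) != -1:
        return None
    del deltas[0]
    return deltas


def apply_loop(mem, p, factors):
    """Do in one step what the loop would do over mem[p] iterations."""
    for offset, factor in factors.items():
        mem[p + offset] += factor * mem[p]
    mem[p] = 0
```

A factor of 1 for every target corresponds to the copy loop, and any other factor to the multiplication loop.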
Optimizations • opt_offsetops (Operation Offsets) • In Brainf*ck, a pointer indicates where we are now, and it usually moves a lot • What if we compute the offset for each instruction directly, so we don't need to move the pointer around?
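The offset idea can be sketched for a straight-line block of + - < > commands as follows. The return format, a list of (offset, delta) updates plus one final pointer move, is an assumption made for this example rather than the repo's actual IR.

```python
def opt_offsetops(block):
    """Turn straight-line + - < > code into offset-based updates.

    Returns (updates, move): updates is a list of (offset, delta) pairs
    sorted by offset, and move is the single pointer adjustment at the end.
    """
    offset = 0
    deltas = {}
    for ch in block:
        if ch == ">":
            offset += 1
        elif ch == "<":
            offset -= 1
        elif ch in "+-":
            deltas[offset] = deltas.get(offset, 0) + (1 if ch == "+" else -1)
    updates = sorted((off, d) for off, d in deltas.items() if d != 0)
    return updates, offset
```

Each update would be emitted as mem[p + offset] += delta, followed by one p += move, so the pointer is only adjusted once per block.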
Optimizations • opt_cancel (Cancel Instructions) • ++++-->>+-<<< does the same thing as ++< • So why waste time on these instructions?
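A compact sketch of cancellation that works directly on the source text, using a stack so that pairs exposed by an earlier cancellation are removed as well. The OPPOSITE table and this function body are illustrative, not the repo's implementation.

```python
OPPOSITE = {"+": "-", "-": "+", "<": ">", ">": "<"}


def opt_cancel(source):
    """Drop adjacent command pairs that undo each other, such as +- or ><."""
    out = []
    for ch in source:
        if out and OPPOSITE.get(ch) == out[-1]:
            out.pop()          # this command undoes the previous one
        else:
            out.append(ch)
    return "".join(out)
```

Applied to the string from the slide, opt_cancel("++++-->>+-<<<") returns "++<".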
Can it run faster? Yes: JIT
Result
Great! So I'll use JIT from now on
Wait a sec... • Not every case can use JIT • Because the JIT needs warm-up and analysis
 Warm-up may take more time than your code actually runs • And it's important not to include the warm-up time when you do benchmarking
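One simple way to keep warm-up out of a measurement, as a sketch: call the function a few times first, then time only the later calls. The helper name and the default counts are arbitrary choices for illustration, and time.perf_counter requires Python 3.3 or newer.

```python
import time


def benchmark(func, warmup=5, runs=20):
    """Return average seconds per call, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        func()                 # untimed calls, so a JIT can compile hot loops
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(benchmark(lambda: sum(range(100000))))
```

How many warm-up calls are enough depends on the workload, so it is worth checking that the timed results have stabilized.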
Wait a sec... • And do you really need JIT? • It may cost a lot to bring a JIT into a project • Sometimes buying more servers is a better choice than bringing a JIT into your project
Wait a sec... • But if you have analyzed your project and know how difficult it is to bring PyPy and JIT into it, then you're good to go! • BTW, the executable built with JIT enabled is bigger than the one built without JIT
Question?
References • Tutorial: Writing an Interpreter with PyPy, part 1 • https://morepypy.blogspot.tw/2011/04/tutorial-writing-interpreter-with-pypy.html • PyPy - Tutorial for Brainf*ck Interpreter • http://wdv4758h.github.io/posts/2015/01/pypy-tutorial-for-brainfuck-interpreter/ • matslina/bfoptimization • https://github.com/matslina/bfoptimization/ • Virtual Machine Constructions for Dummies • https://www.slideshare.net/jserv/vm-construct

PyCon TW 2017 - PyPy's approach to construct domain-specific language runtime -Part 2
