Feature #16837: Can we make Ruby 3.0 as fast as Ruby 2.7 with the new assertions?

Added by k0kubun (Takashi Kokubun) over 5 years ago. Updated over 5 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:98174]

Description

Problem

How can we make Ruby 3.0 as fast as (or faster than) Ruby 2.7?

Background

Possible approaches

I have no strong preference yet. Here are some random ideas:

  • Optimize the assertion code somehow
  • Enable the new assertions only on CIs, at least ones in hot spots
    • Not sure which places have large impact on Optcarrot yet
  • Make some other not-so-important assertions CI-only to offset the impact from new ones
  • Provide .so for an assertion-enabled mode? (ko1's idea)

I hope people will add more ideas as comments on this ticket.


Related issues 1 (0 open, 1 closed)

Updated by k0kubun (Takashi Kokubun) over 5 years ago Actions #1

  • Tracker changed from Bug to Feature
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)

Updated by k0kubun (Takashi Kokubun) over 5 years ago Actions #2

  • Description updated (diff)

Updated by k0kubun (Takashi Kokubun) over 5 years ago Actions #3

  • Description updated (diff)

Updated by shyouhei (Shyouhei Urabe) over 5 years ago Actions #4 [ruby-core:98182]

I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with -DNDEBUG. This has been the standard C practice since long before Ruby's birth.

Updated by shyouhei (Shyouhei Urabe) over 5 years ago Actions #5 [ruby-core:98183]

Some analysis of the slowdown.

Looking at the generated binary and perf output, the slowdown is because some functions are no longer inlined. It might depend on the compiler, but for me rb_array_len() is one such victim:

zsh % gdb -batch -ex 'file miniruby' -ex 'disassemble rb_array_len'
Dump of assembler code for function rb_array_len:
   0x0000000000295540 <+0>:     push   %rbx
   0x0000000000295541 <+1>:     mov    %rdi,%rbx
   0x0000000000295544 <+4>:     test   $0x7,%bl
   0x0000000000295547 <+7>:     jne    0x2955be <rb_array_len+126>
   0x0000000000295549 <+9>:     mov    %rbx,%rax
   0x000000000029554c <+12>:    and    $0xfffffffffffffff7,%rax
   0x0000000000295550 <+16>:    je     0x2955be <rb_array_len+126>
   0x0000000000295552 <+18>:    mov    (%rbx),%rax
   0x0000000000295555 <+21>:    mov    %eax,%edx
   0x0000000000295557 <+23>:    and    $0x1f,%edx
   0x000000000029555a <+26>:    mov    $0x7,%ecx
   0x000000000029555f <+31>:    cmp    $0x7,%edx
   0x0000000000295562 <+34>:    jne    0x295585 <rb_array_len+69>
   0x0000000000295564 <+36>:    test   $0x2000,%eax       ;; <- This is `RB_FL_ANY_RAW(a, RARRAY_EMBED_FLAG)`
   0x0000000000295569 <+41>:    jne    0x295571 <rb_array_len+49>
   0x000000000029556b <+43>:    mov    0x10(%rbx),%rax    ;; <-
   0x000000000029556f <+47>:    pop    %rbx               ;; <- This is `return RARRAY(a)->as.heap.len;`
   0x0000000000295570 <+48>:    retq                      ;; <-
   0x0000000000295571 <+49>:    cmp    $0x7,%ecx
   0x0000000000295574 <+52>:    jne    0x2955a2 <rb_array_len+98>
   0x0000000000295576 <+54>:    test   $0x2000,%eax
   0x000000000029557b <+59>:    je     0x2955ea <rb_array_len+170>
   0x000000000029557d <+61>:    shr    $0xf,%eax          ;; <-
   0x0000000000295580 <+64>:    and    $0x3,%eax          ;; <- This is `return RARRAY_EMBED_LEN(a);`
   0x0000000000295583 <+67>:    pop    %rbx               ;; <-
   0x0000000000295584 <+68>:    retq                      ;; <-
   0x0000000000295585 <+69>:    mov    %rbx,%rdi
   0x0000000000295588 <+72>:    mov    $0x7,%esi
   0x000000000029558d <+77>:    callq  0xcaea2 <rb_check_type>
   0x0000000000295592 <+82>:    mov    (%rbx),%rax
   0x0000000000295595 <+85>:    mov    %eax,%ecx
   0x0000000000295597 <+87>:    and    $0x1f,%ecx
   0x000000000029559a <+90>:    cmp    $0x1b,%rcx
   0x000000000029559e <+94>:    jne    0x295564 <rb_array_len+36>
   0x00000000002955a0 <+96>:    jmp    0x2955cb <rb_array_len+139>
   0x00000000002955a2 <+98>:    mov    %rbx,%rdi
   0x00000000002955a5 <+101>:   mov    $0x7,%esi
   0x00000000002955aa <+106>:   callq  0xcaea2 <rb_check_type>
   0x00000000002955af <+111>:   mov    (%rbx),%rax
   0x00000000002955b2 <+114>:   mov    %eax,%ecx
   0x00000000002955b4 <+116>:   and    $0x1f,%ecx
   0x00000000002955b7 <+119>:   cmp    $0x1b,%ecx
   0x00000000002955ba <+122>:   jne    0x295576 <rb_array_len+54>
   0x00000000002955bc <+124>:   jmp    0x2955cb <rb_array_len+139>
   0x00000000002955be <+126>:   mov    %rbx,%rdi
   0x00000000002955c1 <+129>:   mov    $0x7,%esi
   0x00000000002955c6 <+134>:   callq  0xcaea2 <rb_check_type>
   0x00000000002955cb <+139>:   lea    0x142fe(%rip),%rdi        # 0x2a98d0
   0x00000000002955d2 <+146>:   lea    0x1432f(%rip),%rdx        # 0x2a9908
   0x00000000002955d9 <+153>:   lea    0x14337(%rip),%rcx        # 0x2a9917
   0x00000000002955e0 <+160>:   mov    $0xea,%esi
   0x00000000002955e5 <+165>:   callq  0xcad86 <rb_assert_failure>
   0x00000000002955ea <+170>:   lea    0x14338(%rip),%rdi        # 0x2a9929
   0x00000000002955f1 <+177>:   lea    0x1436d(%rip),%rdx        # 0x2a9965
   0x00000000002955f8 <+184>:   lea    0x14377(%rip),%rcx        # 0x2a9976
   0x00000000002955ff <+191>:   mov    $0x79,%esi
   0x0000000000295604 <+196>:   callq  0xcad86 <rb_assert_failure>
End of assembler dump.

Here, assertions practically never fail. This means their jumps are predicted essentially 100% of the time (almost no-ops), so the checks themselves do not slow things down. The problem is the unreachable branches. If you can read the assembly, you can see that almost 2/3 of the above function is never reached. Those branches blow up the generated binary significantly. rb_array_len is thus now considered too big to be inlined, by my compiler at least.

An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.
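For illustration only, here is a minimal sketch of what that ad-hoc remedy might look like with GCC/Clang attributes. It is not code from the Ruby tree: every demo_* and DEMO_* name is hypothetical, and the failure handler is merely a stand-in for rb_assert_failure. The accessor keeps its assertions but asks the compiler to inline it regardless of size, while the failure path is marked cold and noreturn.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for rb_assert_failure(): reports and never returns. */
__attribute__((__noreturn__, __cold__))
static void demo_assert_failure(const char *file, int line, const char *expr)
{
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    abort();
}

/* Hypothetical assertion macro; the failure branch is the rarely taken one. */
#define DEMO_ASSERT(expr) \
    ((expr) ? (void)0 : demo_assert_failure(__FILE__, __LINE__, #expr))

/* Hypothetical, much simplified array header. */
struct demo_array { long len; };

/* Force inlining even though the assertions enlarge the function body. */
__attribute__((__always_inline__))
static inline long demo_array_len(const struct demo_array *a)
{
    DEMO_ASSERT(a != NULL);
    DEMO_ASSERT(a->len >= 0);
    return a->len;
}
```

Whether such an attribute actually helps would have to be measured per compiler and target, which is exactly the concern raised above.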

Updated by shyouhei (Shyouhei Urabe) over 5 years ago Actions #6 [ruby-core:98184]

If you recompile everything using ./configure cppflags=-DNDEBUG, then those assertions are eliminated, to let compilers inline rb_array_len again.
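As a small sketch of why this works, the standard assert macro expands to nothing when NDEBUG is defined at the point where <assert.h> is included, so the checked expression is not even evaluated. The function names below are hypothetical; the example defines NDEBUG in the source only to imitate passing -DNDEBUG, and then re-includes <assert.h> without it to show the contrast.

```c
#include <stddef.h>

#ifndef NDEBUG
# define NDEBUG            /* same effect as passing -DNDEBUG to the compiler */
#endif
#include <assert.h>

static int side_effects = 0;

/* Hypothetical check with an observable side effect. */
static int touch(void) { side_effects++; return 1; }

static int checked_off(void)
{
    assert(touch());       /* expands to ((void)0); touch() is never called */
    return side_effects;
}

#undef NDEBUG
#include <assert.h>        /* assert is redefined on each inclusion */

static int checked_on(void)
{
    assert(touch());       /* a real check now; touch() runs */
    return side_effects;
}
```

Because the disabled assertion leaves no code behind, the function body shrinks, which is what lets the compiler consider it for inlining again.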

Updated by shevegen (Robert A. Heiler) over 5 years ago Actions #7 [ruby-core:98185]

I have a question concerning one point mentioned above.

k0kubun wrote:

Provide .so for an assertion-enabled mode? (ko1's idea)

Could someone briefly explain the general idea behind this? I assume for a .so
file the ruby user would have to require/load that file, but what may be the
perceived benefits/disadvantages for doing so?

Updated by k0kubun (Takashi Kokubun) over 5 years ago Actions #8 [ruby-core:98194]

I would like to suggest that if a user really favors speed over sanity checks, they should just compile everything with -DNDEBUG. This has been the standard C practice since long before Ruby's birth.

Got it. I'll consider using -DNDEBUG in benchmark servers at least. Also maybe it's worth noting it in NEWS for those who package Ruby for performance-sensitive usages?

An obvious ad-hoc remedy is to supply __attribute__((__always_inline__)) for everything. But I don't think that's a good idea, because what is inlined and what is not depends very much on compilers, versions, target architectures, and almost everything.

Agreed. While it's not a good idea to always inline everything, some may be worth a consideration though.

I assume for a .so file the ruby user would have to require/load that file

His idea was to install the .so file to Ruby prefix by default and add a --debug-xxx option to load it.

Updated by k0kubun (Takashi Kokubun) over 5 years ago Actions #9

  • Related to Bug #16840: Decrease in Hash#[]= performance with object keys added

Updated by nobu (Nobuyoshi Nakada) over 5 years ago Actions #10 [ruby-core:98212]

It is not only assertions; some optimizations can no longer be applied.

For instance, rb_str_new_cstr was defined as follows in 2.7,

#define rb_str_new_cstr(str) RB_GNUC_EXTENSION_BLOCK( \
    (__builtin_constant_p(str)) ? \
        rb_str_new_static((str), (long)strlen(str)) : \
        rb_str_new_cstr(str) \
)

and rb_str_new_cstr("...") has been expected to be compiled as rb_str_new_static("...", 3).

The below is the master version.

static inline VALUE
ruby3_str_new_cstr(const char *str)
{
    if /* constexpr */ (! RUBY3_CONSTANT_P(str)) {
        return rb_str_new_cstr(str);
    }
    else {
        long len = ruby3_strlen(str);
        return rb_str_new_static(str, len);
    }
}

As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here, _static function is never used (in Apple clang 11.0.3 and gcc 10.1.0-RC-20200430_0).

I'm uncertain how much this particular case affects overall performance, though there might be more un-optimizations like it.

Updated by shyouhei (Shyouhei Urabe) over 5 years ago Actions #11 [ruby-core:98214]

nobu (Nobuyoshi Nakada) wrote in #note-10:

As str is an argument variable and RUBY3_CONSTANT_P(str) is always false here,

Well, thank you for pointing this out. As I wrote in include/ruby/3/constant_p.h, you can apply __builtin_constant_p to an inline function argument. I thought that RUBY3_CONSTANT_P(str) was not always false. However https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html says:

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC never returns 1 when you call the inline function with a string constant or ...

In the particular case of ruby3_str_new_cstr(), the argument is a string, so there is no chance of it being treated as a constant. The current code is in fact wrong. We have to fix it.
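The difference between the two forms can be sketched as follows. This is a simplified illustration with hypothetical demo_* names standing in for rb_str_new_static and rb_str_new_cstr, and it assumes a GCC-compatible compiler that provides __builtin_constant_p. In the macro form the builtin is applied to the caller's string literal itself; in the inline-function form it is applied to the parameter, which per the GCC documentation quoted above is not reported as constant for string arguments.

```c
#include <assert.h>
#include <string.h>

/* Counts how often the "constant string" fast path is taken. */
static int static_path_hits = 0;

/* Hypothetical stand-in for rb_str_new_static(): length known up front. */
static long demo_new_static(const char *s, long len)
{
    (void)s;
    static_path_hits++;
    return len;
}

/* Hypothetical stand-in for rb_str_new_cstr(): length computed at run time. */
static long demo_new_cstr(const char *s)
{
    return (long)strlen(s);
}

/* Macro form (2.7 style): the builtin sees the literal at the call site. */
#define DEMO_NEW_MACRO(str) \
    (__builtin_constant_p(str) ? \
        demo_new_static((str), (long)strlen(str)) : \
        demo_new_cstr(str))

/* Inline-function form: the builtin only sees the parameter `str`. */
static inline long demo_new_inline(const char *str)
{
    if (__builtin_constant_p(str)) {
        return demo_new_static(str, (long)strlen(str));
    }
    return demo_new_cstr(str);
}
```

Both forms return the same length; what differs is which path they are able to choose, which can be observed through static_path_hits.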

Updated by naruse (Yui NARUSE) over 5 years ago Actions #12 [ruby-core:98264]

I want Ruby 2.8/3.0 to be faster than 2.7 by default.
NDEBUG is not acceptable.
I think Microsoft's _DEBUG approach is more reasonable.

Updated by shyouhei (Shyouhei Urabe) over 5 years ago Actions #13 [ruby-core:98277]

naruse (Yui NARUSE) wrote in #note-12:

NDEBUG is not acceptable.

NDEBUG is not my invention. Please file a bug report to upstream (ISO/IEC JTC1/SC22/WG14).

I'm not against defining it by default, though.

Updated by ko1 (Koichi Sasada) over 5 years ago Actions #14

  • Status changed from Open to Closed

Applied in changeset git|21991e6ca59274e41a472b5256bd3245f6596c90.


Use RUBY_DEBUG instead of NDEBUG

Assertions in header files slow down the interpreter, so they should be
turned off by default (simple make). To enable them, define a macro
RUBY_DEBUG=1 (e.g. make cppflags=-DRUBY_DEBUG) or use #define at
the very beginning of the file. Note that even if NDEBUG=1 is defined,
RUBY_DEBUG=1 enables all assertions.
[Feature #16837]
related: https://github.com/ruby/ruby/pull/3120

assert() lines in MRI *.c are not disabled even if RUBY_DEBUG=0;
they can be disabled with NDEBUG=1. So please consider using
RUBY_ASSERT() if you want to disable them when RUBY_DEBUG=0.
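The opt-in scheme described in this commit message could be sketched roughly as below. This is not Ruby's actual RUBY_ASSERT implementation; DEMO_DEBUG, DEMO_ASSERT and the other demo_* names are hypothetical, chosen only to show an assertion macro that is off by default, is enabled by a dedicated debug macro, and does not consult NDEBUG at all.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch, off by default; enable with -DDEMO_DEBUG=1. */
#ifndef DEMO_DEBUG
# define DEMO_DEBUG 0
#endif

/* Counts how many checks were actually evaluated. */
static int demo_checks_run = 0;

#if DEMO_DEBUG
/* Hypothetical failure handler, only needed when checks are compiled in. */
static void demo_assert_fail(const char *file, int line, const char *expr)
{
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    abort();
}
# define DEMO_ASSERT(expr) \
    (demo_checks_run++, \
     (expr) ? (void)0 : demo_assert_fail(__FILE__, __LINE__, #expr))
#else
/* Default build: the check disappears regardless of NDEBUG. */
# define DEMO_ASSERT(expr) ((void)0)
#endif

static long demo_div(long a, long b)
{
    DEMO_ASSERT(b != 0);
    return a / b;
}
```

In a default build the macro expands to nothing, so demo_div carries no extra branches; compiling with -DDEMO_DEBUG=1 brings the check back even if NDEBUG is also defined.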
