Project

General

Profile

Actions

Misc #15007

open

Let all Init_xxx and extension APIs frequently called from init code paths be considered cold

Misc #15007: Let all Init_xxx and extension APIs frequently called from init code paths be considered cold

Added by methodmissing (Lourens Naudé) about 7 years ago. Updated over 1 year ago.

Status:
Assigned
[ruby-core:88548]

Description

References Github PR https://github.com/ruby/ruby/pull/1934

Why?

An incremental extraction from PR https://github.com/ruby/ruby/pull/1922, specifically addressing the feedback from Yui Naruse in https://github.com/ruby/ruby/pull/1922#issuecomment-413796710

The Linux kernel, PHP 7 and other projects use the hot and cold function attributes to help with better code layout.

I noticed Ruby is very much CPU frontend bound (not feeding instructions into the CPU pipelines as fast as it maybe could) and therefore even most micro benchmarks have a high CPI (cycles per instruction) rate. This PR is part of a larger chunk of work I'd like to do around improving CPU frontend throughput and can take a stab at formally writing up those ideas if there's any interest from the community. I don't know.

Implementation

This PR has an exclusive focus on having the Init_xxx functions for the core classes and those bundled in ext being flagged to be optimized for size as they're called only once at runtime.

The GCC specific cold function attribute works in the following way (from GCC docs):

The cold attribute is used to inform the compiler that a function is unlikely executed. The function is optimized for size rather than speed and on many targets it is placed into special subsection of the text section so all cold functions appears close together improving code locality of non-cold parts of program. The paths leading to call of cold functions within code are marked as unlikely by the branch prediction mechanism. It is thus useful to mark functions used to handle unlikely conditions, such as perror, as cold to improve optimization of hot functions that do call marked functions in rare occasions. When profile feedback is available, via -fprofile-use, hot functions are automatically detected and this attribute is ignored. 

By declaring a function as cold when defined we get the following benefits:

  • No-op on platforms that does not support the attribute
  • Size optimization of cold functions with a smaller footprint in the instruction cache
  • Therefore CPU frontend throughput increases due to a lower ratio of instruction cache misses and a lower ITLB overhead - see original chunky PR VS then trunk
  • This effect can further be amplified in future work with the hot attribute

Extension APIs flagged as cold

These are and should typically only be called on extension init, and thus safe to optimize for size as well.

  • void rb_define_method_id(VALUE, ID, VALUE (*)(ANYARGS), int));
  • void rb_undef(VALUE, ID));
  • void rb_define_protected_method(VALUE, const char*, VALUE (*)(ANYARGS), int));
  • void rb_define_private_method(VALUE, const char*, VALUE (*)(ANYARGS), int));
  • void rb_define_singleton_method(VALUE, const char*, VALUE(*)(ANYARGS), int));
  • void rb_define_alloc_func(VALUE, rb_alloc_func_t));
  • void rb_undef_alloc_func(VALUE));
  • VALUE rb_define_class(const char*,VALUE));
  • VALUE rb_define_module(const char*));
  • VALUE rb_define_class_under(VALUE, const char*, VALUE));
  • VALUE rb_define_module_under(VALUE, const char*));
  • void rb_define_variable(const char*,VALUE*));
  • void rb_define_virtual_variable(const char*,VALUE(*)(ANYARGS),void(*)(ANYARGS)));
  • void rb_define_hooked_variable(const char*,VALUE*,VALUE(*)(ANYARGS),void(*)(ANYARGS)));
  • void rb_define_readonly_variable(const char*,const VALUE*));
  • void rb_define_const(VALUE,const char*,VALUE));
  • void rb_define_global_const(const char*,VALUE));
  • void rb_define_method(VALUE,const char*,VALUE(*)(ANYARGS),int));
  • (void rb_define_module_function(VALUE,const char*,VALUE(*)(ANYARGS),int));
  • void rb_define_global_function(const char*,VALUE(*)(ANYARGS),int));
  • void rb_undef_method(VALUE,const char*));
  • void rb_define_alias(VALUE,const char*,const char*));
  • void rb_define_attr(VALUE,const char*,int,int));
  • void rb_global_variable(VALUE*));
  • void rb_gc_register_mark_object(VALUE));
  • void rb_gc_register_address(VALUE*));
  • void rb_gc_unregister_address(VALUE*));

Text segment reductions

Small changes (3144 bytes reduction of the text segment) because this is incremental groundwork and and initial low risk PR.

this branch:

lourens@CarbonX1:~/src/ruby/ruby$ size ruby text data bss dec hex	filename 3462153 21056 71344	3554553 363cf9	ruby 

trunk:

lourens@CarbonX1:~/src/ruby/trunk$ size ruby text data bss dec hex	filename 3465297 21056 71344	3557697 364941	ruby 

Diffs for individual object files: https://www.diffchecker.com/T0GVzX1q

Default text.unlikely section where init functions are moved to:

lourens@CarbonX1:~/src/ruby/ruby$ readelf -S vm.o There are 34 section headers, starting at offset 0x2a04f8: Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0 [ 1] .text PROGBITS 0000000000000000 00000040 000000000001c37f 0000000000000000 AX 0 0 16 [ 2] .rela.text RELA 0000000000000000 00114100 000000000000a7d0 0000000000000018 I 31 1 8 [ 3] .data PROGBITS 0000000000000000 0001c3c0 0000000000000030 0000000000000000 WA 0 0 16 [ 4] .bss NOBITS 0000000000000000 0001c400 00000000000002b0 0000000000000000 WA 0 0 32 [ 5] .rodata.str1.8 PROGBITS 0000000000000000 0001c400 0000000000000d6f 0000000000000001 AMS 0 0 8 [ 6] .text.unlikely PROGBITS 0000000000000000 0001d16f <<<<<<<<<<<<<<< 0000000000001aa9 0000000000000000 AX 0 0 1 

The relocations for vm.o:

lourens@CarbonX1:~/src/ruby/ruby$ ld -M vm.o --- truncated --- .text 0x0000000000400120 0x1de2f *(.text.unlikely .text.*_unlikely .text.unlikely.*) .text.unlikely 0x0000000000400120 0x1aa9 vm.o 0x000000000040038f rb_define_alloc_func 0x00000000004003bf rb_undef_alloc_func 0x00000000004003c5 Init_Method 0x0000000000400512 Init_vm_eval 0x00000000004007a1 Init_eval_method 0x0000000000400a54 rb_undef 0x0000000000400c1d Init_VM 0x000000000040185f Init_BareVM 0x0000000000401b16 Init_vm_objects 0x0000000000401b61 Init_top_self *(.text.exit .text.exit.*) *(.text.startup .text.startup.*) *(.text.hot .text.hot.*) *(.text .stub .text.* .gnu.linkonce.t.*) *fill* 0x0000000000401bc9 0x7 .text 0x0000000000401bd0 0x1c37f vm.o 0x00000000004022f0 rb_f_notimplement 0x0000000000404780 rb_vm_ep_local_ep 0x00000000004047b0 rb_vm_frame_block_handler 0x00000000004047e0 rb_vm_cref_new_toplevel 0x0000000000404870 rb_vm_block_ep_update 0x0000000000404890 ruby_vm_special_exception_copy 0x0000000000406960 rb_ec_stack_overflow 0x00000000004069c0 rb_vm_push_frame 0x0000000000406b20 rb_vm_pop_frame 0x0000000000406b30 rb_error_arity 0x0000000000407180 rb_vm_frame_method_entry 0x00000000004075e0 rb_vm_rewrite_cref 0x00000000004076f0 rb_simple_iseq_p 0x0000000000407700 rb_vm_opt_struct_aref 0x0000000000407730 rb_vm_opt_struct_aset 0x0000000000407750 rb_clear_constant_cache --- truncated --- 

I also dabbled with the idea of an INITFUNC macro that also places the Init_xxx functions into a text.init section as the kernel does for a possible future optimization of stripping out ELF sections for setup / init specific functions. I don't think that makes sense for now and possibly only interesting for mruby or embedded.

Possible next units of work

Cold code specific

TLB (translation lookaside buffer) specific

  • Further ITLB overhead investigation
  • Ruby binaries built with O3 and debug symbols come in at just short of 18MB, or roughly 9 hugepages on linux. PHP core developers were able to squeeze a few % by remapping code to hugepages on supported systems - http://developers-club.com/posts/270685/ . Implementation here

Bytecode specific

  • The Intel Tracing Task API is very well suited for the instruction sequences YARV generates and to infer better per instruction CPU utilization and identify any stalls (frontend, backend, branches etc.) to drive further work.

Updated by Eregon (Benoit Daloze) about 7 years ago Actions #1 [ruby-core:88550]

What is "original chunky PR" VS "then trunk"?
Which one is original and which one with your patch?
The first one seems better.

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #2 [ruby-core:88551]

Eregon (Benoit Daloze) wrote:

What is "original chunky PR" VS "then trunk"?
Which one is original and which one with your patch?
The first one seems better.

First one is from the changeset in https://github.com/ruby/ruby/pull/1922, but it's gotten too large to review, hence the split into smaller incremental changes. Sorry for that confusion.

Updated by shyouhei (Shyouhei Urabe) about 7 years ago Actions #3 [ruby-core:88559]

Interesting.

Re: label attributes. JFYI I once tried them https://github.com/ruby/ruby/pull/1805/files
(answer, bitblt, and all the trace_* instructions are marked cold.)
It did change generated binary to arrange branches, but I saw no performance gains.
Maybe this was because CPUs are much smarter than human beings to detect which branch is hot.

Updated by normalperson (Eric Wong) about 7 years ago Actions #4 [ruby-core:88566]

Thank you for taking the time to do this!

The GCC specific cold function attribute works in the following way (from GCC docs):

Since it's gcc (and clang); can we just use something like what
Linux uses:

#define __coldattribute((cold))

Instead of our ugly convention of: COLD( required prototype goes here );

It would allow us to be more DRY by avoiding prototypes for
static functions.

By declaring a function as cold when defined we get the following benefits:

  • No-op on platforms that does not support the attribute
  • Size optimization of cold functions with a smaller footprint in the instruction cache

Very much agreed. I've always hated the icache footprint from -O3
since so much of our code is rarely executed.

Another benefit is it helps document code paths as to which ones
are most critical :>

Extension APIs flagged as cold

These are and should typically only be called on extension init, and thus safe to optimize for size as well.

  • void rb_define_method(VALUE,const char*,VALUE(*)(ANYARGS),int));

Does this affect startup performance? (with --disable=gems)

  • Ruby binaries built with O3 and debug symbols come in at
    just short of 18MB, or roughly 9 hugepages on linux. PHP core
    developers were able to squeeze a few % by remapping code to
    hugepages on supported systems -
    http://developers-club.com/posts/270685/ . Implementation
    here

Cool. While THP is a disaster for CoW and anonymous mappings;
it makes perfect sense for large read-only mappings

Now, the question is; shouldn't this be done in ld-linux.so
(part of glibc) instead?

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #5 [ruby-core:88574]

normalperson (Eric Wong) wrote:

Thank you for taking the time to do this!

The GCC specific cold function attribute works in the following way (from GCC docs):

Since it's gcc (and clang); can we just use something like what
Linux uses:

#define __coldattribute((cold))

Instead of our ugly convention of: COLD( required prototype goes here );

Initial pass had that, but I decided to line up with recommendations from the guidelines. Not great for human parsing, especially when nested with NORETURN etc.

It would allow us to be more DRY by avoiding prototypes for
static functions.

By declaring a function as cold when defined we get the following benefits:

  • No-op on platforms that does not support the attribute
  • Size optimization of cold functions with a smaller footprint in the instruction cache

Very much agreed. I've always hated the icache footprint from -O3
since so much of our code is rarely executed.

Another benefit is it helps document code paths as to which ones
are most critical :>

Extension APIs flagged as cold

These are and should typically only be called on extension init, and thus safe to optimize for size as well.

  • void rb_define_method(VALUE,const char*,VALUE(*)(ANYARGS),int));

Does this affect startup performance? (with --disable=gems)

Great point, I'll investigate.

  • Ruby binaries built with O3 and debug symbols come in at
    just short of 18MB, or roughly 9 hugepages on linux. PHP core
    developers were able to squeeze a few % by remapping code to
    hugepages on supported systems -
    http://developers-club.com/posts/270685/ . Implementation
    here

Cool. While THP is a disaster for CoW and anonymous mappings;
it makes perfect sense for large read-only mappings

Yeah, does terrible things - the way this would work is reserving 20MB worth more or less with /proc/sys/vm/nr_hugepages (1 value of 10 in there) just for the ruby binary.

Now, the question is; shouldn't this be done in ld-linux.so
(part of glibc) instead?

Possible, but there's already https://github.com/libhugetlbfs/libhugetlbfs (more specifically https://github.com/libhugetlbfs/libhugetlbfs/blob/master/HOWTO#L370 ) Maybe an --enable-libhugetlbfs config option. That also exposes an API for mapping bytecode to huge pages, if it's possible to have a reserved heap for iseqs. Anyways, drifting off topic ...

Updated by ko1 (Koichi Sasada) about 7 years ago Actions #6 [ruby-core:88615]

I agree to introduce COLD attribute.


TLB (translation lookaside buffer) specific

Do PHP people specify THP with DWARF knowledge?

Updated by Hanmac (Hans Mackowiak) about 7 years ago Actions #7 [ruby-core:88618]

what happens when i would call such cold function outside of Init_xxx ?

what functions in my own binding should be using cold?
like for example i have a own _attr_method function that mimic a attribute by setting a Get/Set function

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #8 [ruby-core:88620]

ko1 (Koichi Sasada) wrote:

I agree to introduce COLD attribute.

I was wondering about Eric's suggestion about the convention for using it though. I like the Kernel style of COLDFUNC void some_func() or COLD void some_func() more than this style we have now enforced in the contributor docs:

COLDFUNC(void Init_Array(void)); void Init_Array(void) 

Also thoughts on an explicit INITFUNC macro for extension authors + documentation in the extensions guide as a forward looking contract for init functions?


TLB (translation lookaside buffer) specific

Do PHP people specify THP with DWARF knowledge?

THP turned off, explicitly maps the read only sections to some of a pool of pre-allocated huge pages - around 10 x 2MB for ruby to have some buffer. as binaries with O3 and debug symbols are weighing in @ 18MB now. I have a work in progress branch with CFLAGS -B /usr/share/libhugetlbfs, -Wl,--hugetlbfs-align but unfortunately having issues with the -B flag + gcc 7.3 (bug in Bionic). I got likwid up and running yesterday (master supports recent kabylakes) and have ballbark figures of optcarrot being around 5% Instruction TLB bound, with a miss overhead estimate of around 30 cycles.

For this COLD attribute change, I still need to:

  • Assert no negative effect on startup performance (with --disable=gems)
  • Another pass through likwid with the icache group (and pin to a specific CPU core to reduce noise) with optcarrot
  • Ditto for Intel VTune
  • Add the results to the original PR

The goal with this change is to introduce a mechanism for better code layout, then follow up with further incremental work to reduce the instruction cache footprint and improve branch prediction of callsites for rb_bug and other super rare invocations.

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #9 [ruby-core:88621]

Hanmac (Hans Mackowiak) wrote:

what happens when i would call such cold function outside of Init_xxx ?

I considered primarily only core for this change and carefully looked @ usage of the extension API in Init_xxx functions, but for sure there's going to be edges in the wild with other extensions. I only chose to override the public API functions that you won't call much in a hot path either (or shouldn't). Also optimized for size VS performance, sometimes a function with smaller code size can execute faster than an optimized one.

what functions in my own binding should be using cold?
like for example i have a own _attr_method function that mimic a attribute by setting a Get/Set function

I can document likely candidates (what to look for, to consider - most would have a strong system specific or exceptions error path, depending on what's being wrapped) in the extensions guide as part of this change, if that works ...

Updated by naruse (Yui NARUSE) about 7 years ago Actions #10 [ruby-core:88646]

methodmissing (Lourens Naudé) wrote:

ko1 (Koichi Sasada) wrote:

I agree to introduce COLD attribute.

I was wondering about Eric's suggestion about the convention for using it though. I like the Kernel style of COLDFUNC void some_func() or COLD void some_func() more than this style we have now enforced in the contributor docs:

COLDFUNC(void Init_Array(void)); void Init_Array(void) 

It uses function style macro because some function attribute of MSVC are valid only if written as suffix.
Therefore about coldfunc, just constant style is ok.

Follwing patch or just use new macro other than RUBY_FUNC_ATTRIBUTE..

diff --git a/configure.ac b/configure.ac index a48eda1532..a6cd0b4928 100644 --- a/configure.ac +++ b/configure.ac @@ -1302,6 +1302,7 @@ AS_IF([test "$rb_cv_have_alignof" != no], [ RUBY_FUNC_ATTRIBUTE(__const__, CONSTFUNC) RUBY_FUNC_ATTRIBUTE(__pure__, PUREFUNC) +RUBY_FUNC_ATTRIBUTE(__cold__, COLDFUNC) RUBY_FUNC_ATTRIBUTE(__noreturn__, NORETURN) RUBY_FUNC_ATTRIBUTE(__deprecated__, DEPRECATED) RUBY_FUNC_ATTRIBUTE(__deprecated__("by "@%:@n), DEPRECATED_BY(n,x), rb_cv_func_deprecated_by) diff --git a/tool/m4/ruby_decl_attribute.m4 b/tool/m4/ruby_decl_attribute.m4 index f15947c0ab..87f3242a96 100644 --- a/tool/m4/ruby_decl_attribute.m4 +++ b/tool/m4/ruby_decl_attribute.m4 @@ -35,6 +35,10 @@ done ])]) AS_IF([test "$rbcv" != x], [ RUBY_DEFINE_IF(m4_ifval([$4],[${rbcv_cond}]), attrib[](attrib_params)[], $rbcv) + AS_CASE(["$mac"], + ["__attribute__ ((attrib_code)) x"|"__declspec(attrib_code) x"], [ + RUBY_DEFINE_IF(m4_ifval([$4],[${rbcv_cond}]), attrib[_PREFIX], ${rbcv% x}) + ]) ]) m4_ifval([$4], [unset rbcv_cond]) dnl m4_popdef([attrib_params])dnl 

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #11 [ruby-core:88655]

naruse (Yui NARUSE) wrote:

methodmissing (Lourens Naudé) wrote:

ko1 (Koichi Sasada) wrote:

I agree to introduce COLD attribute.

I was wondering about Eric's suggestion about the convention for using it though. I like the Kernel style of COLDFUNC void some_func() or COLD void some_func() more than this style we have now enforced in the contributor docs:

COLDFUNC(void Init_Array(void)); void Init_Array(void) 

It uses function style macro because some function attribute of MSVC are valid only if written as suffix.
Therefore about coldfunc, just constant style is ok.

Thanks for clarifying the context there. I looked @ MSVC attributes / annotations and cold (or anything similar) is not supported, thus for cold and hot attributes constant style as prefix only should be fine.

Any preferences on these for constant style attribute annotation?

  • COLD - clearly expressed, but not namespaced and there's a possibility for bindings already having such a macro defined
  • COLDFUNC - as above, but less chance of collision
  • RB_COLD - as above, but aligns better with existing macros (RB_ prefix)
  • FUNC_COLDCALL - for alignment with FUNC_FASTCALL (although very much specific to VM code paths only, and MSVC specific)

Follwing patch or just use new macro other than RUBY_FUNC_ATTRIBUTE..

diff --git a/configure.ac b/configure.ac index a48eda1532..a6cd0b4928 100644 --- a/configure.ac +++ b/configure.ac @@ -1302,6 +1302,7 @@ AS_IF([test "$rb_cv_have_alignof" != no], [ RUBY_FUNC_ATTRIBUTE(__const__, CONSTFUNC) RUBY_FUNC_ATTRIBUTE(__pure__, PUREFUNC) +RUBY_FUNC_ATTRIBUTE(__cold__, COLDFUNC) RUBY_FUNC_ATTRIBUTE(__noreturn__, NORETURN) RUBY_FUNC_ATTRIBUTE(__deprecated__, DEPRECATED) RUBY_FUNC_ATTRIBUTE(__deprecated__("by "@%:@n), DEPRECATED_BY(n,x), rb_cv_func_deprecated_by) diff --git a/tool/m4/ruby_decl_attribute.m4 b/tool/m4/ruby_decl_attribute.m4 index f15947c0ab..87f3242a96 100644 --- a/tool/m4/ruby_decl_attribute.m4 +++ b/tool/m4/ruby_decl_attribute.m4 @@ -35,6 +35,10 @@ done ])]) AS_IF([test "$rbcv" != x], [ RUBY_DEFINE_IF(m4_ifval([$4],[${rbcv_cond}]), attrib[](attrib_params)[], $rbcv) + AS_CASE(["$mac"], + ["__attribute__ ((attrib_code)) x"|"__declspec(attrib_code) x"], [ + RUBY_DEFINE_IF(m4_ifval([$4],[${rbcv_cond}]), attrib[_PREFIX], ${rbcv% x}) + ]) ]) m4_ifval([$4], [unset rbcv_cond]) dnl m4_popdef([attrib_params])dnl 

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #12 [ruby-core:88748]

ko1 (Koichi Sasada) wrote:

I agree to introduce COLD attribute.


TLB (translation lookaside buffer) specific

Do PHP people specify THP with DWARF knowledge?

Somewhat off topic, but fits with the general direction here.

I have a proof of concept going with mapping the text section to hugepages after a few evenings of interesting segfaults. It uses a small custom linker script and an API that first tries to map text with:

  • MAP_HUGETLB - explicit request, requires hubgetlbfs mount and pages reserved explicitly ahead of time, will fail if that's not true
  • MADV_HUGEPAGE - fallback option, implicit request, kernel will map it implicit if not aligned to hugepage boundary, or right away if aligned to hugepage boundary (I implemented with the 2MB alignment for my system). Requires a kernel compiled with THP support (most are) and THP enabled (likewise)

It's easy to check for the support of either and also very easy to back out of the hugepage mapping attempt and just fallback to 4k pages.

The text section of trunk with -O3 is around 3.3MB, meaning it maps to 1 x 2MB huge page and then the rest to 4k standard pages. What's interesting here is the possibility of NOT mapping all of the text section, but only a more concise hot section into a single hugepage, effectively removing all ITLB overhead for the hot code paths. And further prune down the text footprint of rarely invoked functions with COLDFUNC. Anyways, just some thoughts.

Updated by normalperson (Eric Wong) about 7 years ago Actions #13 [ruby-core:88749]

wrote:

  • MADV_HUGEPAGE - fallback option, implicit request, kernel
    will map it implicit if not aligned to hugepage boundary, or
    right away if aligned to hugepage boundary (I implemented with
    the 2MB alignment for my system). Requires a kernel compiled
    with THP support (most are) and THP enabled (likewise)

Crap, I think that conflicts with the usage of
PR_SET_THP_DISABLE (for CoW-friendliness) since r63253 in trunk.
MAP_HUGETLB will still work, though.

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #14 [ruby-core:88758]

normalperson (Eric Wong) wrote:

wrote:

  • MADV_HUGEPAGE - fallback option, implicit request, kernel
    will map it implicit if not aligned to hugepage boundary, or
    right away if aligned to hugepage boundary (I implemented with
    the 2MB alignment for my system). Requires a kernel compiled
    with THP support (most are) and THP enabled (likewise)

Crap, I think that conflicts with the usage of
PR_SET_THP_DISABLE (for CoW-friendliness) since r63253 in trunk.
MAP_HUGETLB will still work, though.

Yes, explains behaviour observed through smaps - correct, MAP_HUGETLB works. Will clean it up during the week and submit as a new issue.

Updated by ko1 (Koichi Sasada) about 7 years ago Actions #15 [ruby-core:89703]

I agree to introduce cold attribute (I have no idea about the naming).

rb_bug or rb_warn (etc) should be cold functions because they should not be called.

However, I'm not sure rb_define_... and so on because it can affect startup time for big Ruby application. We need to specify with benchmark results.

Updated by methodmissing (Lourens Naudé) about 7 years ago Actions #16 [ruby-core:89706]

ko1 (Koichi Sasada) wrote:

I agree to introduce cold attribute (I have no idea about the naming).

rb_bug or rb_warn (etc) should be cold functions because they should not be called.

However, I'm not sure rb_define_... and so on because it can affect startup time for big Ruby application. We need to specify with benchmark results.

Yes, I agree - in retrospect rb_define functions would affect startup time, especially for very large Rails applications. I was waiting on further input from core for guidance as it's very easy to fall into a trap and go too wide with cold attributes. I've been using clang a lot more lately as well and think the following scope with proper benchmark qualification could work well:

  • Support clang and gcc versions with cold attr support (I did not see any support for MSVC but need to confirm again)
  • rb_bug and rb_warn (1 first class internal and 1 first class user API call)
  • Possibly rb_memerror and a few similar extreme edge case ones that cannot be used for flow control
  • That way there's also a better chance of slight branch prediction improvements along with code relocation

Does that seem reasonable as a unit of work to you?

Updated by ko1 (Koichi Sasada) about 7 years ago Actions #17 [ruby-core:89721]

Does that seem reasonable as a unit of work to you?

Perfect!

Note that we will release ruby 2.6 in Dec. If you finish (3) "Possibly rb_memerror...", it is easy to introduce. We can continue further improvement on next release.

Updated by methodmissing (Lourens Naudé) almost 7 years ago Actions #18 [ruby-core:89937]

ko1 (Koichi Sasada) wrote:

Does that seem reasonable as a unit of work to you?

Perfect!

Note that we will release ruby 2.6 in Dec. If you finish (3) "Possibly rb_memerror...", it is easy to introduce. We can continue further improvement on next release.

I added some investigation and conclusions with gcc and clang to the reduced scope PR https://github.com/ruby/ruby/pull/2005

Updated by methodmissing (Lourens Naudé) almost 7 years ago Actions #19 [ruby-core:90119]

methodmissing (Lourens Naudé) wrote:

ko1 (Koichi Sasada) wrote:

Does that seem reasonable as a unit of work to you?

Perfect!

Note that we will release ruby 2.6 in Dec. If you finish (3) "Possibly rb_memerror...", it is easy to introduce. We can continue further improvement on next release.

I added some investigation and conclusions with gcc and clang to the reduced scope PR https://github.com/ruby/ruby/pull/2005

I'm having a hard time getting MSVC to pass for rb_memerror https://ci.appveyor.com/project/ruby/ruby/builds/20571215/job/wc9ha9eg36xjiv70 with error c:\projects\ruby\include\ruby/intern.h(482) : error C4028: formal parameter 1 different from declaration (https://github.com/ruby/ruby/pull/2005)

Any thoughts much appreciated.

Updated by methodmissing (Lourens Naudé) almost 7 years ago Actions #20 [ruby-core:90272]

methodmissing (Lourens Naudé) wrote:

methodmissing (Lourens Naudé) wrote:

ko1 (Koichi Sasada) wrote:

Does that seem reasonable as a unit of work to you?

Perfect!

Note that we will release ruby 2.6 in Dec. If you finish (3) "Possibly rb_memerror...", it is easy to introduce. We can continue further improvement on next release.

I added some investigation and conclusions with gcc and clang to the reduced scope PR https://github.com/ruby/ruby/pull/2005

I'm having a hard time getting MSVC to pass for rb_memerror https://ci.appveyor.com/project/ruby/ruby/builds/20571215/job/wc9ha9eg36xjiv70 with error c:\projects\ruby\include\ruby/intern.h(482) : error C4028: formal parameter 1 different from declaration (https://github.com/ruby/ruby/pull/2005)

Any thoughts much appreciated.

Fallback to a definition in defines.h for clang and GCC 4.3.0+ only as MSVC didn't like the NORETURN + COLDFUNC combo for rb_memerror.

https://github.com/ruby/ruby/pull/2005/files#diff-b6f26affdb29910802b4d97a901fe689R92

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago Actions #23

  • Status changed from Open to Assigned
Actions

Also available in: PDF Atom