Project

General

Profile

Actions

Bug #17497

closed

Ractor performance issue

Bug #17497: Ractor performance issue

Added by marcandre (Marc-Andre Lafortune) almost 5 years ago. Updated over 4 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]
[ruby-core:101840]

Description

There's a strange performance issue with Ractor (at least on MacOS, didn't run on other OS).

I ran a benchmark doing 3 different types of work:

  • "fib": method calls (naive fibonacci calculation)
  • "cpu": (0...1000).inject(:+)
  • "sleep": call sleep

I get the kind of results I was excepting for the fib and for sleeping, but the results for the "cpu" workload show a problem.

It is so slow that my pure Ruby backport (using Threads) is 65x faster 😮 on my Mac Pro (despite having 6 cores). Expected results would be 6x slower, so in that case Ractor is 400x slower than it should 😿

On my MacBook (2 cores) the results are not as bad, the cpu workload is 3x faster with my pure-Ruby backport (only) instead of ~2x slower, so the factor is 6x too slow.

$ gem install backports Successfully installed backports-3.20.0 1 gem installed $ ruby ractor_test.rb <internal:ractor>:267: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues. fib: 110 ms | cpu: 22900 ms | sleep: 206 ms $ B=t ruby ractor_test.rb Using pure Ruby implementation fib: 652 ms | cpu: 337 ms | sleep: 209 ms 

Notice the sleep run takes similar time, which is good, and fib is ~6x faster on my 6-core CPU (and ~2x faster on my 2-core MacBook), again that's good as the pure ruby version uses Threads and thus runs with a single GVL.

The cpu version is the problem.

Script is here: https://gist.github.com/marcandre/bfed626e538a3d0fc7cad38dc026cf0e

Updated by MSP-Greg (Greg L) almost 5 years ago Actions #1 [ruby-core:101889]

Using various 2021-01-03 versions of master, got the following times, a few were averaged by eye:

—————————— Windows 10 mingw —————————— fib | cpu | sleep native 41 ms | 940 ms | 220 ms ruby 345 ms | 205 ms | 243 ms —————————— WSL2/Ubuntu 20.04 ————————— fib | cpu | sleep native 40 ms | 530 ms | 202 ms ruby 330 ms | 150 ms | 203 ms —————————— Windows 10 mswin —————————— fib | cpu | sleep native 88 ms | 1080 ms | 215 ms ruby 690 ms | 290 ms | 235 ms 

New, fast desktop. Always interested in comparing the three platforms/OS's. Ubuntu usually wins...

Updated by marcandre (Marc-Andre Lafortune) almost 5 years ago Actions #2 [ruby-core:101890]

Thanks for running this on other platforms.

From the numbers, it looks like you are running on an 8-core machine, right? If so, the "fib" and "sleep" numbers are what should be expected, but "cpu" is 3-5x slower on Ractor when it should be 8x times faster...

Updated by MSP-Greg (Greg L) almost 5 years ago Actions #3 [ruby-core:101893]

10 core i9. I've set up enough systems in my life (I used DOS); prefer new systems to last a while...

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #4 [ruby-core:101926]

Thank you for the report. Let me investigate more.

(just curious) why is the name "CPU"?

Updated by marcandre (Marc-Andre Lafortune) almost 5 years ago Actions #5 [ruby-core:101928]

ko1 (Koichi Sasada) wrote in #note-4:

Thank you for the report. Let me investigate more.

(just curious) why is the name "CPU"?

I wanted to compare CPU-bound processes (which should benefit from Ractor) to IO-bound processes (which should have similar benchmarks if my backport isn't too inefficient). I used sleep instead of IO because I'm lazy 😅. It's only when I saw the issue that I tested other CPU-bound methods and added "fib".

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #6 [ruby-core:101930]

flash report;

Warning[:experimental] = false if defined? Warning[] def task_inject (1..10_000_000).inject(:+) end alias task task_inject # p method(:task) MODE = (ARGV.shift || :r_parallel).to_sym TN = 4 case MODE when :serial TN.times{ task } when :r_serial exit(1) unless defined? Ractor TN.times{ Ractor.new{ task }.take } when :r_parallel exit(1) unless defined? Ractor TN.times.map{ Ractor.new{ task } }.each{|r| r.take} else raise end print "%4d" % GC.count 

and

 user system total real serial/26_mini 0 0.000000 0.000248 1.318308 ( 1.318555) serial/27_mini 0 0.000000 0.000627 1.209881 ( 1.209730) serial/master_mini 0 0.000000 0.000430 1.997904 ( 1.997656) serial/miniruby 0 0.000000 0.000254 1.723801 ( 1.723786) serial/26_ruby 0 0.000000 0.000481 1.256867 ( 1.256746) serial/27_ruby 0 0.000000 0.008709 1.098332 ( 1.098257) serial/master_ruby 0 0.000000 0.000312 1.915706 ( 1.916034) serial/ruby 0 0.000000 0.000288 1.921821 ( 1.921793) r_serial/26_mini N/A r_serial/27_mini N/A r_serial/master_mini 1 0.000000 0.000388 2.460095 ( 2.460922) r_serial/miniruby 1 0.000000 0.000359 2.784072 ( 2.784779) r_serial/26_ruby N/A r_serial/27_ruby N/A r_serial/master_ruby 1 0.000000 0.000216 2.690338 ( 2.690321) r_serial/ruby 1 0.000000 0.000237 2.982560 ( 2.983885) r_parallel/26_mini N/A r_parallel/27_mini N/A r_parallel/master_mini 1 0.000000 0.000210 23.172113 ( 6.316598) r_parallel/miniruby 1 0.000000 0.000248 25.933848 ( 7.054210) r_parallel/26_ruby N/A r_parallel/27_ruby N/A r_parallel/master_ruby 1 0.000000 0.000214 25.243151 ( 6.805798) r_parallel/ruby 1 0.000000 0.000181 28.647737 ( 7.991565) 
  • on serial execution, master is x2 slower than 2.6/2.7
  • on serial execution with quiet ractor (multi-ractor-mode), master is -x2.5 times slower than 2.6/2.7
  • on parallel execution with ractos, master is x7 slower than 2.6/2.7

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #7 [ruby-core:101931]

master_ruby and ruby was same, so not needed.

with version information:

26_mini ruby 2.6.7p148 (2020-06-14 revision 67884) [x86_64-linux] 27_mini ruby 2.7.3p139 (2020-10-11 revision d1ba554551) [x86_64-linux] miniruby ruby 3.1.0dev (2021-01-05T07:50:00Z master e91160f757) [x86_64-linux] 26_ruby ruby 2.6.7p148 (2020-06-14 revision 67884) [x86_64-linux] 27_ruby ruby 2.7.3p139 (2020-10-11 revision d1ba554551) [x86_64-linux] ruby ruby 3.1.0dev (2021-01-05T07:50:00Z master e91160f757) [x86_64-linux] method: task_range_inject user system total real serial/26_mini 0 0.000330 0.000000 1.334362 ( 1.334173) serial/27_mini 0 0.000401 0.000000 1.111259 ( 1.111157) serial/miniruby 0 0.000388 0.000000 1.803708 ( 1.803610) serial/26_ruby 0 0.000283 0.000000 1.274182 ( 1.274051) serial/27_ruby 0 0.000243 0.000000 1.102547 ( 1.102493) serial/ruby 0 0.000340 0.000000 1.892771 ( 1.892611) r_serial/26_mini N/A r_serial/27_mini N/A r_serial/miniruby 1 0.000271 0.000000 2.484844 ( 2.485867) r_serial/26_ruby N/A r_serial/27_ruby N/A r_serial/ruby 1 0.000308 0.000000 2.736252 ( 2.737059) r_parallel/26_mini N/A r_parallel/27_mini N/A r_parallel/miniruby 1 0.000232 0.000000 20.958254 ( 5.828039) r_parallel/26_ruby N/A r_parallel/27_ruby N/A r_parallel/ruby 1 0.000396 0.000000 22.323984 ( 6.165640) 

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #8 [ruby-core:101932]

with perf (with --call-graph dwarf) option, I may figure out the big difference:

master: - 63.72% 6.23% miniruby miniruby [.] inject_op_i - 57.49% inject_op_i - 55.13% rb_funcallv_public - rb_call (inlined) - 26.94% rb_call0 + 17.69% rb_callable_method_entry_with_refinements + 3.30% stack_check (inlined) 2.75% rb_method_call_status (inlined) + 1.03% rb_class_of (inlined) - 25.91% rb_vm_call0 - vm_call0_body (inlined) + 13.27% vm_call0_cfunc (inlined) 0.50% vm_passed_block_handler (inlined) 0.50% rb_vm_check_ints (inlined) 1.21% rb_current_execution_context (inlined) 0.59% rb_vm_call_kw 1.73% rb_sym2id 0.63% rb_enum_values_pack + 4.38% _start + 0.53% 0x564cbc05b517 ruby 2.7: - 46.93% 9.09% miniruby miniruby [.] inject_op_i - 37.85% inject_op_i - 34.18% rb_funcallv_with_cc - 23.15% vm_call0_body + 17.07% vm_call0_cfunc (inlined) 0.81% rb_vm_check_ints (inlined) + 4.22% vm_search_method (inlined) 2.94% rb_sym2id 0.73% rb_enum_values_pack + 8.13% _start 

Updated by marcandre (Marc-Andre Lafortune) almost 5 years ago Actions #9 [ruby-core:101934]

Just to be clear, there may be two different issues:

  1. Ruby 2.x vs Ruby 3.0 performance regression. This can be important to figure out, but is not why I opened this issue.

  2. Threads vs Ractor in Ruby 3.0. My tests are all in Ruby 3.0. This is what this issue is about.

Updated by inversion (Yura Babak) almost 5 years ago Actions #10 [ruby-core:101963]

I also made 2 posts about strange performance testing results (with sources) and some conclusions.
In my case, 2 ractors work 3-times longer than doing the same payload in the main thread.

https://www.reddit.com/r/ruby/comments/kpmt73/ruby_30_ractors_performance_test_strange_results/
https://www.reddit.com/r/ruby/comments/krq5xe/when_ruby3_ractors_are_good_and_when_not_yet/

The test:

require 'digest/sha2' N = 1_500_000 def workload N.times do |n| Digest::SHA2.base64digest(n.to_s) end end puts 'in the main thread' t_start = Time.now workload puts "Total: %.3f" % (Time.now - t_start) puts worker1 = Ractor.new do Ractor.receive print '['; workload; ']' end worker2 = Ractor.new do Ractor.receive print '['; workload; ']' end puts 'in 2 ractors' t_start = Time.now worker1.send 'start' worker2.send 'start' print '=' print worker1.take print worker2.take puts puts "Total: %.3f" % (Time.now - t_start) 

Updated by keithrbennett (Keith Bennett) almost 5 years ago Actions #11 [ruby-core:102141]

I too have seen strange results testing ractors. I used the code at https://github.com/keithrbennett/keithrbennett-ractor-test/blob/master/my_ractor.rb to do some arbitrary but predictable work. I have a 24-core Ryzen 9 CPU, and I compared using 1 ractor with using 24. With 24, htop reported that all the CPU's were at 100% most of the time, yet the elapsed time using 24 CPU's was only about a third less than when using 1 CPU. Also, the CPU's seemed to be working collectively about ten times harder with 24 CPU's. Here is the program output:

1 CPU: Many HTOP readings are < 100% for all CPU's time ractor/my_ractor.rb ruby '*.rb' Running the following command to find all filespecs to process: find -L ruby -type f -name '*.rb' -print Processing 8218 files in 1 slices, whose sizes are: [8218] ractor/my_ractor.rb ruby '*.rb' 2513.90s user 6.75s system 99% cpu 42:03.01 total 24 CPU's: % time ractor/my_ractor.rb ruby '*.rb' ; espeak finished Running the following command to find all filespecs to process: find -L ruby -type f -name '*.rb' -print Processing 8218 files in 24 slices, whose sizes are: [343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 343, 329] ractor/my_ractor.rb ruby '*.rb' 22986.42s user 14.98s system 1134% cpu 33:47.96 total 

(In the command, ruby refers to the directory in which I've cloned the Github Ruby repo.)

Here is the current content of the test program:

#!/usr/bin/env ruby require 'amazing_print' require 'etc' require 'set' require 'shellwords' require 'yaml' raise "This script requires Ruby version 3 or later." unless RUBY_VERSION.split('.').first.to_i >= 3 # An instance of this parser class is created for each ractor. class RactorParser attr_reader :dictionary_words def initialize(dictionary_words) @dictionary_words = dictionary_words end def parse(filespecs) filespecs.inject(Set.new) do |found_words, filespec| found_words | process_one_file(filespec) end end private def word?(string) dictionary_words.include?(string) end private def strip_punctuation(string) punctuation_regex = /[[:punct:]]/ string.gsub(punctuation_regex, ' ') end private def file_lines(filespec) command = "strings #{Shellwords.escape(filespec)}" text = `#{command}` strip_punctuation(text).split("\n") end private def line_words(line) line.split.map(&:downcase).select { |text| word?(text) } end private def process_one_file(filespec) file_words = Set.new file_lines(filespec).each do |line| line_words(line).each { |word| file_words << word } end # puts "Found #{file_words.count} words in #{filespec}." file_words end end class Main BASEDIR = ARGV[0] || '.' FILEMASK = ARGV[1] CPU_COUNT = Etc.nprocessors def call check_arg_count slices = get_filespec_slices ractors = create_and_populate_ractors(slices) all_words = collate_ractor_results(ractors) yaml = all_words.to_a.sort.to_yaml File.write('ractor-words.yaml', yaml) puts "Words are in ractor-words.yaml." end private def check_arg_count if ARGV.length > 2 puts "Syntax is ractor [base_directory] [filemask], and filemask must be quoted so that the shell does not expand it." exit -1 end end private def collate_ractor_results(ractors) ractors.inject(Set.new) do |all_words, ractor| all_words | ractor.take end end private def get_filespec_slices all_filespecs = find_all_filespecs slice_size = (all_filespecs.size / CPU_COUNT) + 1 # slice_size = all_filespecs.size # use this line instead of previous to test with 1 ractor slices = all_filespecs.each_slice(slice_size).to_a puts "Processing #{all_filespecs.size} files in #{slices.size} slices, whose sizes are:\n#{slices.map(&:size).inspect}" slices end private def create_and_populate_ractors(slices) words = File.readlines('/usr/share/dict/words').map(&:chomp).map(&:downcase).sort slices.map do |slice| ractor = Ractor.new do filespecs = Ractor.receive dictionary_words = Ractor.receive RactorParser.new(dictionary_words).parse(filespecs) end ractor.send(slice) ractor.send(words) ractor end end private def find_all_filespecs filemask = FILEMASK ? %Q{-name '#{FILEMASK}'} : '' command = "find -L #{BASEDIR} -type f #{filemask} -print" puts "Running the following command to find all filespecs to process: #{command}" `#{command}`.split("\n") end end Main.new.call 

Updated by keithrbennett (Keith Bennett) almost 5 years ago Actions #12 [ruby-core:102270]

I've updated the software I used to measure this, and moved it to https://github.com/keithrbennett/keithrbennett-ractor-test.

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #13

  • Status changed from Open to Closed

Applied in changeset git|1ecda213668644d656eb0d60654737482447dd92.


global call-cache cache table for rb_funcall*

rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.

Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.

This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]

Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #14 [ruby-core:102286]

quoted from https://github.com/ruby/ruby/pull/4129#issuecomment-769613184


call the following methods as a task:

def task_range_inject (1..20_000_000).inject(:+) end 

with

  • 4 times sequentially
  • 4 times sequentially with a sleeping ractor
  • 4 ractors in parallel

on

26_mini ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] 27_mini ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux] master_mini ruby 3.1.0dev (2021-01-29T05:27:53Z master 9241211538) [x86_64-linux] miniruby ruby 3.1.0dev (2021-01-29T06:21:39Z gh-4129 f996e15ff6) [x86_64-linux] 26_ruby ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux] 27_ruby ruby 2.7.3p139 (2020-10-11 revision d1ba554551) [x86_64-linux] master_ruby ruby 3.1.0dev (2021-01-29T05:27:53Z master 9241211538) [x86_64-linux] ruby ruby 3.1.0dev (2021-01-29T06:21:39Z gh-4129 f996e15ff6) [x86_64-linux] 

result:

 user system total real serial/26_mini 0 0.000145 0.000039 2.782685 ( 2.783691) serial/27_mini 0 0.000133 0.000036 2.320257 ( 2.320305) serial/master_mini 0 0.000141 0.000039 3.756926 ( 3.756963) serial/miniruby 0 0.000136 0.000037 2.598088 ( 2.598126) serial/26_ruby 0 0.000143 0.000038 2.695175 ( 2.704443) serial/27_ruby 0 0.000139 0.000038 2.391679 ( 2.401067) serial/master_ruby 0 0.000139 0.000038 4.391577 ( 4.391626) serial/ruby 0 0.000128 0.000035 3.109923 ( 3.109991) r_serial/26_mini N/A r_serial/27_mini N/A r_serial/master_mini 1 0.000146 0.000040 5.133049 ( 5.133056) r_serial/miniruby 1 0.000133 0.000037 2.597336 ( 2.597300) r_serial/26_ruby N/A r_serial/27_ruby N/A r_serial/master_ruby 1 0.000147 0.000040 5.910876 ( 5.910907) r_serial/ruby 1 0.000135 0.000037 2.875752 ( 2.875727) r_parallel/26_mini N/A r_parallel/27_mini N/A r_parallel/master_mini 1 0.000100 0.000028 39.297123 ( 10.160359) r_parallel/miniruby 1 0.000110 0.000030 2.703400 ( 0.695634) r_parallel/26_ruby N/A r_parallel/27_ruby N/A r_parallel/master_ruby 1 0.000122 0.000034 38.941810 ( 10.072950) r_parallel/ruby 1 0.000131 0.000036 2.980137 ( 0.757672) 

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #15 [ruby-core:102287]

inversion (Yura Babak) wrote in #note-10:

I also made 2 posts about strange performance testing results (with sources) and some conclusions.
In my case, 2 ractors work 3-times longer than doing the same payload in the main thread.

With digest benchmark:

 user system total real serial/26_ruby 0 0.000239 0.000079 2.497317 ( 2.503279) serial/27_ruby 0 0.000972 0.000324 2.306552 ( 2.310275) serial/master_ruby 0 0.000293 0.000098 3.776824 ( 3.776623) serial/ruby 0 0.000190 0.000063 2.668395 ( 2.668287) r_serial/26_ruby N/A r_serial/27_ruby N/A r_serial/master_ruby 1 0.000407 0.000000 5.579597 ( 5.579969) r_serial/ruby 1 0.000476 0.000000 2.682626 ( 2.683627) r_parallel/26_ruby N/A r_parallel/27_ruby N/A r_parallel/master_ruby 1 0.000242 0.000000 48.298173 ( 12.959255) r_parallel/ruby 1 0.000242 0.000000 3.782164 ( 0.984832) 

seems solved.

Updated by ko1 (Koichi Sasada) almost 5 years ago Actions #16 [ruby-core:102288]

keithrbennett (Keith Bennett) wrote in #note-11:

I too have seen strange results testing ractors. I used the code at https://github.com/keithrbennett/keithrbennett-ractor-test/blob/master/my_ractor.rb to do some arbitrary but predictable work. I have a 24-core Ryzen 9 CPU, and I compared using 1 ractor with using 24. With 24, htop reported that all the CPU's were at 100% most of the time, yet the elapsed time using 24 CPU's was only about a third less than when using 1 CPU. Also, the CPU's seemed to be working collectively about ten times harder with 24 CPU's. Here is the program output:

could you check it again?

Updated by keithrbennett (Keith Bennett) almost 5 years ago Actions #17 [ruby-core:102339]

@ko1 (Koichi Sasada) - My apologies for not responding sooner. I guess I have not configured this forum correctly to receive notifications, I'll look into that.

I've tested my benchmark against Ruby head, and performance with multiple cores seem to have degraded. Perhaps I have made an error in my approaches, I don't know. I will paste my results below. My OS is "Ubuntu 20.04.2 LTS" (Kubuntu).

In case it's useful, I've made my script easier to use; it automatically tests and compares 1 ractor with (CPU_count) ractors. You can find it at https://github.com/keithrbennett/keithrbennett-ractor-test/blob/master/ractor-file-strings-test.rb. Information about configuring it, what it does, etc., is included in comments at the top of the script.

 Small Data Set: Ruby 3.0.0: 1 CPU 24 CPU's Factor ---------------------------------------------------------------- User 6.75500 52.34200 7.74863 System 0.00000 0.02800 0.00415 Total 6.83900 52.40700 7.66296 Real 6.83300 4.10500 0.60076 2021-01-31 Ruby head (ruby 3.1.0dev (2021-01-31T09:48:28Z master 22b8ddfd10) [x86_64-linux]): 1 CPU 24 CPU's Factor ---------------------------------------------------------------- User 6.18000 56.27400 9.10583 System 0.00400 0.02800 0.00453 Total 6.26800 56.34200 8.98883 Real 6.26100 4.26200 0.68072 ================================================================ Larger Data Set: Ruby 3.0.0: 1 CPU 24 CPU's Factor ---------------------------------------------------------------- User 51.01000 499.67200 9.79557 System 0.04000 0.25900 0.00508 Total 51.32300 500.12900 9.74473 Real 51.31200 45.56600 0.88802 2021-01-31 Ruby head (ruby 3.1.0dev (2021-01-31T09:48:28Z master 22b8ddfd10) [x86_64-linux]): 1 CPU 24 CPU's Factor ---------------------------------------------------------------- User 47.08900 486.34400 10.32819 System 0.03200 0.20300 0.00431 Total 47.39500 486.74800 10.27003 Real 47.38400 43.95100 0.92755 

Updated by ko1 (Koichi Sasada) over 4 years ago Actions #18 [ruby-core:102467]

  • Status changed from Closed to Assigned

keithrbennett (Keith Bennett) wrote in #note-17:

I've tested my benchmark against Ruby head, and performance with multiple cores seem to have degraded. Perhaps I have made an error in my approaches, I don't know. I will paste my results below. My OS is "Ubuntu 20.04.2 LTS" (Kubuntu).

I confirmed with the following script

WORDS = Ractor.make_shareable File.readlines('/usr/share/dict/words').map(&:chomp).map(&:downcase).sort def try File.readlines(__dir__ + '/compar.c').each{|line| line.split.map(&:downcase).select { |text| WORDS.include? text } } end Warning[:experimental] = false require 'benchmark' Benchmark.bm{|x| x.report{ 4.times{try} } x.report{ 4.times.map{ Ractor.new{ try } }.each(&:take) } } __END__ user system total real 4.501388 0.001541 4.502929 ( 4.502980) 16.763446 0.000018 16.763464 ( 4.335964) 

It compare with sequential 4 times try method and 4 times try methods on ractors in parallel.
To compare with real, 4.5 vs 4.3 sec. It is not slow, but not first with 4 cores.

The reason seems WORDS.include? text. I'll investigate more.

Updated by ko1 (Koichi Sasada) over 4 years ago Actions #19

  • Status changed from Assigned to Closed

Applied in changeset git|813fe4c256f89babebb8ab53821ae5eb6bb138c6.


opt_equality_by_mid for rb_equal_opt

This patch improves the performance of sequential and parallel
execution of rb_equal() (and rb_eql()).
[Bug #17497]

rb_equal_opt (and rb_eql_opt) does not have own cd and it waste
a time to initialize cd. This patch introduces opt_equality_by_mid()
to check equality without cd.

Furthermore, current master uses "static" cd on rb_equal_opt
(and rb_eql_opt) and it hurts CPU caches on multi-thread execution.
Now they are gone so there are no bottleneck on parallel execution.

Updated by ko1 (Koichi Sasada) over 4 years ago Actions #20 [ruby-core:102474]

maybe git|813fe4c256f89babebb8ab53821ae5eb6bb138c6 solved the issue.
could you check it?

Updated by ko1 (Koichi Sasada) over 4 years ago Actions #21

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN to 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: REQUIRED

Updated by keithrbennett (Keith Bennett) over 4 years ago Actions #22 [ruby-core:102510]

Koichi -

Huge improvement! Thank you!

I installed Ruby head and now have the following output from ruby -v:

ruby 3.1.0dev (2021-02-15T09:29:35Z master 37b90bcdc1) [x86_64-linux] 

I made minor modifications to your script (see https://gist.github.com/keithrbennett/18f10124354d62eb8ba5feafaa9b39dc) and then ran it in the Ruby project root directory and got the following results:

On my Linux (Kubuntu 20.04.2) desktop:

Measuring first sequentially on main ractor and then with 24 ractors: user system total real 14.969907 0.003891 14.973798 ( 14.977699) 29.087580 0.051934 29.139514 ( 1.243316) 0.515 User time difference factor 12.047 Real time difference factor 

And then on my 2015 Mac:

Measuring first sequentially on main ractor and then with 4 ractors: user system total real 10.477194 0.047028 10.524222 ( 10.605862) 18.226199 0.068098 18.294297 ( 5.101498) 0.575 User time difference factor 2.079 Real time difference factor 

It's interesting that the real time difference factor on both machines is so close to ((the number of CPU's and ractors) / 2.0).

The original script I used to test (at https://github.com/keithrbennett/keithrbennett-ractor-test/blob/master/ractor-file-strings-test.rb) was not very good at distributing work among the ractors equally, and this made the real time observations less reliable, since the real time was really the real time of the longest running ractor. Your script is much better in that way. It would be interesting to test more parts of the standard library though, such as the Set instantiations and merges I had used; if I have time I'll look into that.


P.S. Sorry it took so long to respond; given that notifications don't seem to work, I need to develop a habit of manually checking here every day.

Updated by naruse (Yui NARUSE) over 4 years ago Actions #23 [ruby-core:102825]

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: REQUIRED to 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: DONE

ruby_3_0 de6072a22edbaab3793cf7f976cc9e0118d0df40 merged revision(s) abdc634f64a440afcdc7f23c9757d27aab4db8a9,083c5f08ec4e95c9b75810d46f933928327a5ab3,1ecda213668644d656eb0d60654737482447dd92,813fe4c256f89babebb8ab53821ae5eb6bb138c6.

Actions

Also available in: PDF Atom