This patch speeds up setting the backref match object by avoiding some memcopies. Take the following code for example:

    "hello world" =~ /hello/
    p $~

When the RE matches the string, we have to set the match object in the backref global. Previously, we would allocate a match object [1] and use rb_reg_region_copy [2] to make a deep copy of the stack-allocated re_registers struct [3] into the newly created Ruby object. This could trigger GC [4] and would allocate new memory.
This patch instead makes a shallow copy of the re_registers struct onto the match object. That lets the match object manage the re_registers pointer, and it avoids some calls to xmalloc and some manual memory copying.
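The difference between the two strategies can be sketched in plain C. This is only an illustration under stated assumptions: `regs_t` is a simplified stand-in whose fields mirror re_registers, and `regs_deep_copy`, `regs_move`, and `regs_free` are hypothetical helper names, not functions from re.c. Clearing the source struct after the shallow copy is a choice made for this sketch so that only one owner frees the arrays; the patch itself may handle ownership differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for re_registers (not the definition used by Ruby). */
typedef struct {
    int allocated;  /* capacity of the beg/end arrays */
    int num_regs;   /* number of capture groups in use */
    long *beg;      /* start offset of each group */
    long *end;      /* end offset of each group */
} regs_t;

/* Deep copy: allocate fresh arrays for `to` and copy every offset.
 * Costs two allocations plus two memcpy calls per match.
 * Returns 0 on success, -1 if allocation fails. */
int
regs_deep_copy(regs_t *to, const regs_t *from)
{
    assert(from->num_regs > 0);
    size_t bytes = (size_t)from->num_regs * sizeof(long);
    long *beg = malloc(bytes);
    long *end = malloc(bytes);
    if (beg == NULL || end == NULL) {
        free(beg);
        free(end);
        return -1;
    }
    memcpy(beg, from->beg, bytes);
    memcpy(end, from->end, bytes);
    to->allocated = from->num_regs;
    to->num_regs = from->num_regs;
    to->beg = beg;
    to->end = end;
    return 0;
}

/* Shallow copy with ownership transfer: copy only the struct fields, so
 * `to` points at the arrays `from` already allocated. No allocation and no
 * per-element copying. `from` is cleared so the arrays have a single owner. */
void
regs_move(regs_t *to, regs_t *from)
{
    *to = *from;
    from->allocated = 0;
    from->num_regs = 0;
    from->beg = NULL;
    from->end = NULL;
}

/* Release the arrays owned by `r`; safe to call on a cleared struct. */
void
regs_free(regs_t *r)
{
    free(r->beg);
    free(r->end);
    r->allocated = 0;
    r->num_regs = 0;
    r->beg = NULL;
    r->end = NULL;
}
```

In this sketch, a caller that has filled a stack-allocated `regs_t` would call `regs_move(&dest, &src)` to hand the arrays to the long-lived object, and later call `regs_free(&dest)` exactly once. With `regs_deep_copy`, both the source and the destination must be freed separately.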
Before this patch:

    $ ruby -v test.rb
    ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
    Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
    Warming up --------------------------------------
                  re hit   345.401k i/100ms
                 re miss   673.584k i/100ms
    Calculating -------------------------------------
                  re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
                 re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s

After this patch:

    $ ./ruby -v test.rb
    ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
    Warming up --------------------------------------
                  re hit   419.578k i/100ms
                 re miss   673.251k i/100ms
    Calculating -------------------------------------
                  re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
                 re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s

Matches get about 22% faster (3.452M to 4.201M i/s), while misses stay at the same speed (6.736M vs. 6.716M i/s, within the reported error).

[1] https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
[2] https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
[3] https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
[4] https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981