Feature #17115: Optimize String#casecmp? for ASCII strings
Status: Open
Assignee: -
Target version: -
Description
Patch: https://github.com/ruby/ruby/pull/3369
casecmp? is something of a performance trap: it is much slower than a case-insensitive regexp match or a plain casecmp == 0.
    require 'benchmark/ips'

    str = "Connection"
    cmp = "connection"
    Benchmark.ips do |x|
      # As posted, the pattern is "foo", so this report measures a failed match against str.
      x.report('/\A\z/i.match?') { /\Afoo\Z/i.match?(str) }
      x.report('casecmp?') { cmp.casecmp?(str) }
      x.report('casecmp') { cmp.casecmp(str) == 0 }
      x.compare!
    end

    Calculating -------------------------------------
          /\A\z/i.match?     11.447M (± 1.3%) i/s -     57.814M in   5.051489s
                casecmp?      6.197M (± 0.9%) i/s -     31.138M in   5.025252s
                 casecmp     12.753M (± 1.2%) i/s -     64.636M in   5.069195s

    Comparison:
                 casecmp: 12752791.6 i/s
          /\A\z/i.match?: 11446996.1 i/s - 1.11x  (± 0.00) slower
                casecmp?:  6196886.0 i/s - 2.06x  (± 0.00) slower

This is because, unlike the other two, casecmp? applies Unicode case folding.
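For readers less familiar with the distinction, the following small sketch shows the behavioural difference that the slowdown pays for. The variable names are illustrative only; the non-ASCII strings are the ones used in the Ruby documentation for these methods.

```ruby
lower = "\u{e4 f6 fc}" # "äöü"
upper = "\u{c4 d6 dc}" # "ÄÖÜ"

# casecmp folds only A-Z/a-z, so these non-ASCII strings compare as different.
ascii_fold_equal = lower.casecmp(upper) == 0

# casecmp? applies Unicode case folding, so the same strings compare as equal.
unicode_fold_equal = lower.casecmp?(upper)

# For pure ASCII input, both approaches agree.
ascii_agree = ("Connection".casecmp("connection") == 0) &&
              "Connection".casecmp?("connection")

puts ascii_fold_equal    # false
puts unicode_fold_equal  # true
puts ascii_agree         # true
```

When both operands are known to be ASCII, the Unicode folding step cannot change the result, which is the case the proposed optimization targets.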
However, there are cases where a fast case-insensitive equality check on strings known to be ASCII is useful, for instance when matching HTTP headers.
This patch checks whether both strings use a single-byte encoding and, if so, does a simple iterative comparison with TOLOWER(). This makes casecmp? slightly faster than casecmp == 0 when both strings are ASCII.
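The idea can be sketched in plain Ruby. This is only an illustration of the fast path described above, not the patch itself (which is written in C inside the String implementation). The method names `ascii_casecmp?` and `ascii_downcase` are hypothetical, and `ascii_only?` is used here as an assumed stand-in for the patch's single-byte encoding check.

```ruby
# Hypothetical helper: lower-cases a single byte if it is in 'A'..'Z',
# roughly what TOLOWER() does for ASCII input.
def ascii_downcase(byte)
  byte.between?(0x41, 0x5A) ? byte + 0x20 : byte
end

# Hypothetical helper illustrating the fast path; not the actual patch.
def ascii_casecmp?(a, b)
  # Assumption: ascii_only? approximates the patch's single-byte check.
  # Anything else falls back to the Unicode-aware built-in.
  return a.casecmp?(b) unless a.ascii_only? && b.ascii_only?

  # Strings of different length can never be equal under ASCII folding.
  return false unless a.bytesize == b.bytesize

  # Simple iterative comparison, byte by byte.
  a.each_byte.with_index do |byte, i|
    return false unless ascii_downcase(byte) == ascii_downcase(b.getbyte(i))
  end
  true
end
```

For example, `ascii_casecmp?("Connection", "connection")` takes the byte loop, while a pair containing non-ASCII characters is handed to the built-in `casecmp?` unchanged.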
|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|casecmp-1               |     11.618M|   10.757M|
|                        |       1.08x|         -|
|casecmp-10              |      1.849M|    1.723M|
|                        |       1.07x|         -|
|casecmp-100             |    204.490k|  186.798k|
|                        |       1.09x|         -|
|casecmp-1000            |     20.413k|   20.184k|
|                        |       1.01x|         -|
|casecmp-nonascii1       |     19.541M|   20.100M|
|                        |           -|     1.03x|
|casecmp-nonascii10      |     19.489M|   19.914M|
|                        |           -|     1.02x|
|casecmp-nonascii100     |     19.479M|   20.155M|
|                        |           -|     1.03x|
|casecmp-nonascii1000    |     19.462M|   20.064M|
|                        |           -|     1.03x|
|casecmp_p-1             |      2.214M|   12.030M|
|                        |           -|     5.43x|
|casecmp_p-10            |      1.373M|    2.150M|
|                        |           -|     1.57x|
|casecmp_p-100           |    249.292k|  231.041k|
|                        |       1.08x|         -|
|casecmp_p-1000          |     16.173k|   23.592k|
|                        |           -|     1.46x|
|casecmp_p-nonascii1     |    651.921k|  650.572k|
|                        |       1.00x|         -|
|casecmp_p-nonascii10    |    108.253k|  109.006k|
|                        |           -|     1.01x|
|casecmp_p-nonascii100   |     11.749k|   11.889k|
|                        |           -|     1.01x|
|casecmp_p-nonascii1000  |      1.140k|    1.138k|
Updated by byroot (Jean Boussier) over 5 years ago
- Description updated (diff)
Updated by Dan0042 (Daniel DeLorme) over 5 years ago
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
- Description updated (diff)