Skip to content

Conversation

@naitoh
Copy link
Contributor

@naitoh naitoh commented Mar 2, 2024

Why

rexml/lib/rexml/source.rb

Lines 208 to 234 in 370666e

def readline
str = @source.readline(@line_break)
if @pending_buffer
if str.nil?
str = @pending_buffer
else
str = @pending_buffer + str
end
@pending_buffer = nil
end
return nil if str.nil?
if @to_utf
decode(str)
else
str.force_encoding(::Encoding::UTF_8) if @force_utf8
str
end
end
def encoding_updated
case @encoding
when "UTF-16BE", "UTF-16LE"
@source.binmode
@source.set_encoding(@encoding, @encoding)
end
@line_break = encode(">")

Because @line_break = encode(">"), the end of @scanner << readline is one of the following.

  1. ">"
  2. "X>"
  3. "X" (eof)

This will not be matched by additional reads in the following cases.

  • @source.match("<?")
  • @source.match("--")
  • @source.match("DOCTYPE")

In the following cases, additional reads may result in a match, but explicitly prohibiting such a specification with a comment makes the string length check unnecessary.

  • @source.match(">>")
  • @source.match(">X")

Benchmark

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22] Calculating ------------------------------------- before after before(YJIT) after(YJIT) dom 10.689 10.736 18.484 18.108 i/s - 100.000 times in 9.355754s 9.314792s 5.409984s 5.522527s sax 30.793 31.583 52.965 52.641 i/s - 100.000 times in 3.247486s 3.166258s 1.888036s 1.899660s pull 36.308 37.182 63.773 64.669 i/s - 100.000 times in 2.754203s 2.689440s 1.568069s 1.546325s stream 34.936 35.991 56.830 57.729 i/s - 100.000 times in 2.862361s 2.778467s 1.759632s 1.732238s Comparison: dom before(YJIT): 18.5 i/s after(YJIT): 18.1 i/s - 1.02x slower after: 10.7 i/s - 1.72x slower before: 10.7 i/s - 1.73x slower sax before(YJIT): 53.0 i/s after(YJIT): 52.6 i/s - 1.01x slower after: 31.6 i/s - 1.68x slower before: 30.8 i/s - 1.72x slower pull after(YJIT): 64.7 i/s before(YJIT): 63.8 i/s - 1.01x slower after: 37.2 i/s - 1.74x slower before: 36.3 i/s - 1.78x slower stream after(YJIT): 57.7 i/s before(YJIT): 56.8 i/s - 1.02x slower after: 36.0 i/s - 1.60x slower before: 34.9 i/s - 1.65x slower 
  • YJIT=ON : 0.98x - 1.02x faster
  • YJIT=OFF : 1.00x - 1.03x faster
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

end
end

# Note: "When specifying a string for 'pattern', it must not include '>' except in the following format."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Note: "When specifying a string for 'pattern', it must not include '>' except in the following format."
# Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
@kou kou changed the title Remove unnecessary string length comparisons in the case of string comparisons source: Remove unnecessary string length comparisons in the case of string comparisons Mar 3, 2024
…tring comparisons ## Why https://github.com/ruby/rexml/blob/370666e314816b57ecd5878e757224c3b6bc93f5/lib/rexml/source.rb#L208-L234 Because `@line_break = encode(">")`, the end of `@scanner << readline` is one of the following. 1. ">" 2. "X>" 3. "X" (eof) This will not be matched by additional reads in the following cases. - `@source.match("<?")` - `@source.match("--")` - `@source.match("DOCTYPE")` In the following cases, additional reads may result in a match, but explicitly prohibiting such a specification with a comment makes the string length check unnecessary. - `@source.match(">>")` - `@source.match(">X")` ## Benchmark ``` RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22] Calculating ------------------------------------- before after before(YJIT) after(YJIT) dom 10.689 10.736 18.484 18.108 i/s - 100.000 times in 9.355754s 9.314792s 5.409984s 5.522527s sax 30.793 31.583 52.965 52.641 i/s - 100.000 times in 3.247486s 3.166258s 1.888036s 1.899660s pull 36.308 37.182 63.773 64.669 i/s - 100.000 times in 2.754203s 2.689440s 1.568069s 1.546325s stream 34.936 35.991 56.830 57.729 i/s - 100.000 times in 2.862361s 2.778467s 1.759632s 1.732238s Comparison: dom before(YJIT): 18.5 i/s after(YJIT): 18.1 i/s - 1.02x slower after: 10.7 i/s - 1.72x slower before: 10.7 i/s - 1.73x slower sax before(YJIT): 53.0 i/s after(YJIT): 52.6 i/s - 1.01x slower after: 31.6 i/s - 1.68x slower before: 30.8 i/s - 1.72x slower pull after(YJIT): 64.7 i/s before(YJIT): 63.8 i/s - 1.01x slower after: 37.2 i/s - 1.74x slower before: 36.3 i/s - 1.78x slower stream after(YJIT): 57.7 i/s before(YJIT): 56.8 i/s - 1.02x slower after: 36.0 i/s - 1.60x slower before: 34.9 i/s - 1.65x slower ``` - YJIT=ON : 0.98x - 1.02x faster - YJIT=OFF : 1.00x - 1.03x faster
@naitoh naitoh force-pushed the del_bytesize_check branch from 84ba8bd to 261e10b Compare March 3, 2024 09:17
@naitoh naitoh requested a review from kou March 3, 2024 09:17
@kou kou merged commit 19975fe into ruby:master Mar 3, 2024
@naitoh naitoh deleted the del_bytesize_check branch March 3, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants