@naitoh naitoh commented Aug 20, 2024

Why?

The Stream parser expands character references and predefined entity references, but doesn't expand user-defined entity references.

Change

  • text_stream_unnormalize.rb
    $LOAD_PATH.unshift(File.expand_path("lib"))
    require 'rexml/document'
    require 'rexml/parsers/sax2parser'
    require 'rexml/parsers/pullparser'
    require 'rexml/parsers/streamparser'
    require 'rexml/streamlistener'

    xml = <<EOS
    <!DOCTYPE foo [
    <!ENTITY la "1234">
    <!ENTITY lala "--&la;--">
    <!ENTITY lalal "&la;&la;">
    ]><root><la>&la;</la><lala>&lala;</lala><a>&lt;P&gt; &lt;I&gt; &lt;B&gt; Text &lt;/B&gt; &lt;/I&gt;</a><b>test&#8482;</b></root>
    EOS

    class StListener
      include REXML::StreamListener

      def text(text)
        puts text
      end
    end

    puts "REXML(DOM)"
    REXML::Document.new(xml).elements.each("/root/*") {|element| puts element.text}

    puts ""
    puts "REXML(Pull)"
    parser = REXML::Parsers::PullParser.new(xml)
    while parser.has_next?
      event = parser.pull
      case event.event_type
      when :text
        puts event[1]
      end
    end

    puts ""
    puts "REXML(Stream)"
    parser = REXML::Parsers::StreamParser.new(xml, StListener.new).parse

    puts ""
    puts "REXML(SAX)"
    sax = REXML::Parsers::SAX2Parser.new(xml)
    sax.listen(:characters) {|x| puts x }
    sax.parse

Before (master)

    $ ruby text_stream_unnormalize.rb
    REXML(DOM)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

    REXML(Pull)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

    REXML(Stream)
    &la; #<= This
    &lala; #<= This
    <P> <I> <B> Text </B> </I>
    test™

    REXML(SAX)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

After (This PR)

    $ ruby text_stream_unnormalize.rb
    REXML(DOM)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

    REXML(Pull)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

    REXML(Stream)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™

    REXML(SAX)
    1234
    --1234--
    <P> <I> <B> Text </B> </I>
    test™
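
The comparison above can also be checked programmatically. The following is a minimal sketch, not part of this PR: `StreamTextRecorder`, `dom_texts`, and `stream_texts` are hypothetical helper names introduced only for illustration. It collects the text seen by the DOM and by the Stream parser so the two lists can be compared side by side.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener (not from the PR): records each text event
# instead of printing it.
class StreamTextRecorder
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
  end

  def text(text)
    @texts << text
  end
end

# Text of each child of <root>, as seen through the DOM.
def dom_texts(xml)
  REXML::Document.new(xml).elements.to_a("/root/*").map(&:text)
end

# Text events reported by the Stream parser for the same document.
def stream_texts(xml)
  recorder = StreamTextRecorder.new
  REXML::Parsers::StreamParser.new(xml, recorder).parse
  recorder.texts
end

xml = <<EOS
<!DOCTYPE foo [
<!ENTITY la "1234">
<!ENTITY lala "--&la;--">
]><root><la>&la;</la><lala>&lala;</lala></root>
EOS

puts "DOM:    #{dom_texts(xml).inspect}"
puts "Stream: #{stream_texts(xml).inspect}"
```

Whether the Stream line shows `1234` or `&la;` depends on whether the installed REXML includes this change, so no expected output is given for it here.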
naitoh added 4 commits August 21, 2024 07:41
…eferences for "text"
## Why?

See:
- ruby#187
- ruby#195

## Change

- Supported `REXML::Security.entity_expansion_limit=` in Stream parser
- Supported `REXML::Security.entity_expansion_text_limit=` in Stream parser
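
As an illustration of the first setting, here is a small sketch (not taken from the commit) that parses a document with the Stream parser under a temporarily lowered `REXML::Security.entity_expansion_limit`. `TextCollector` and `stream_texts_with_limit` are hypothetical names. The sketch assumes that an exceeded limit surfaces as a `RuntimeError` or a subclass of it, and it restores the previous limit afterwards because the setting is global.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener: collects every text event.
class TextCollector
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
  end

  def text(text)
    @texts << text
  end
end

# Parses xml with the Stream parser under a temporary expansion limit.
# Returns [texts, error]; error is nil when parsing completed.
def stream_texts_with_limit(xml, limit)
  saved = REXML::Security.entity_expansion_limit
  REXML::Security.entity_expansion_limit = limit
  listener = TextCollector.new
  error = nil
  begin
    REXML::Parsers::StreamParser.new(xml, listener).parse
  rescue RuntimeError => e
    # Assumption: a limit violation is reported as a RuntimeError
    # (REXML::ParseException is also a RuntimeError subclass).
    error = e
  end
  [listener.texts, error]
ensure
  # The limit is process-wide, so always put the old value back.
  REXML::Security.entity_expansion_limit = saved
end

xml = <<EOS
<!DOCTYPE foo [
<!ENTITY la "1234">
<!ENTITY lala "--&la;--">
]><root><la>&la;</la><lala>&lala;</lala></root>
EOS

texts, error = stream_texts_with_limit(xml, 1)
puts texts.inspect
puts(error ? "aborted: #{error.message}" : "parsed within the limit")
```

On a REXML build without this change the Stream parser does not expand `&la;` at all, so the limit may never be reached there. `REXML::Security.entity_expansion_text_limit=` is the other setter named in the commit and is not exercised in this sketch.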
## Why?

Because `StreamParser#entity_expansion_count` was added.
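
A brief sketch of reading that counter after a parse follows. `NullListener` and `stream_entity_expansion_count` are hypothetical names used only for illustration, and the `respond_to?` guard is there because older REXML releases do not define `StreamParser#entity_expansion_count`.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener that ignores every event.
class NullListener
  include REXML::StreamListener
end

# Parses xml with the Stream parser and returns the parser's entity
# expansion count, or nil when the installed REXML lacks the method.
def stream_entity_expansion_count(xml)
  parser = REXML::Parsers::StreamParser.new(xml, NullListener.new)
  parser.parse
  return nil unless parser.respond_to?(:entity_expansion_count)

  parser.entity_expansion_count
end

xml = <<EOS
<!DOCTYPE foo [
<!ENTITY la "1234">
<!ENTITY lala "--&la;--">
]><root><la>&la;</la><lala>&lala;</lala></root>
EOS

puts stream_entity_expansion_count(xml).inspect
```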
@naitoh naitoh marked this pull request as ready for review August 20, 2024 23:04
@naitoh naitoh requested a review from kou August 21, 2024 00:54
@kou kou merged commit 6109e01 into ruby:master Aug 21, 2024
kou commented Aug 21, 2024

Thanks.

@naitoh naitoh deleted the fix_stream_text_unnormalize branch August 21, 2024 07:10