Project

General

Profile

Actions

Bug #19468

closed

Ruby 3.2: net/http sets UTF-8 encoding for binary responses

Bug #19468: Ruby 3.2: net/http sets UTF-8 encoding for binary responses

Added by romuloceccon (Rômulo Ceccon) over 2 years ago. Updated over 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
[ruby-core:112630]

Description

net/http on Ruby 3.2 has changed the encoding of binary responses from SSL connected hosts (non-SSL connections are not affected):

# req.rb require 'openssl' require 'net/http' puts "openssl ext: #{OpenSSL::VERSION}" puts "openssl lib: #{OpenSSL::OPENSSL_VERSION}" puts "net-protocol: #{Net::Protocol::VERSION}" puts "net-http: #{Net::HTTP::VERSION}" puts Net::HTTP.get(URI(ARGV.first)).encoding 

Ruby 3.1 (with updated net-protocol and net-http libs):

$ ruby -v ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] $ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf openssl ext: 3.0.0 openssl lib: OpenSSL 1.1.1n 15 Mar 2022 net-protocol: 0.2.1 net-http: 0.3.2 ASCII-8BIT # <== CORRECT 

Ruby 3.2 (latest git revision):

$ ruby -v ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux] $ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf openssl ext: 3.1.0 openssl lib: OpenSSL 1.1.1n 15 Mar 2022 net-protocol: 0.2.1 net-http: 0.3.2 UTF-8 # <== WRONG 

I've tracked the problem down to the SSL socket call at https://github.com/ruby/ruby/blob/9557c8edf2dcf18fdece066c596a71696b2f2b30/lib/net/protocol.rb#L218.

The string returned has the encoding set to ASCII-8BIT, but #ascii_only? also always reports true, even when there are non-ascii bytes. This seems to be a bug, and is the probably cause of the change in behavior in net/http. On Ruby 3.1 concatenating the result of reading the SSL socket to a UTF-8 string produces an ASCII-8BIT string. On Ruby 3.2 the concatenation produces a UTF-8 string.

Here's a program demonstrating the behavior of the SSL socket:

# ssltest.rb require 'openssl' require 'uri' url = URI(ARGV.first) path = url.path path += '?' + url.query if url.query req = "GET #{path} HTTP/1.1\r\nHost: #{url.hostname}\r\nAccept: */*\r\n\r\n" sock = OpenSSL::SSL::SSLSocket.open(url.hostname, url.port || HTTPS.default_https_port) sock.connect sock.write(req) sleep(1) loop do sleep(0.1) b = ''.b r = sock.read_nonblock(1024 * 16, b, exception: false) break unless String === r p [r.bytesize, r.encoding.to_s, r.ascii_only?] end 

Ruby 3.1:

$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf [475, "ASCII-8BIT", true] [16384, "ASCII-8BIT", false] # <== always false (except HTTP header): CORRECT [16384, "ASCII-8BIT", false] ... [13927, "ASCII-8BIT", false] 

Ruby 3.2:

$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf [475, "ASCII-8BIT", true] [16384, "ASCII-8BIT", true] # <== always true: WRONG [16384, "ASCII-8BIT", true] ... [13927, "ASCII-8BIT", true] 

Updated by romuloceccon (Rômulo Ceccon) over 2 years ago Actions #1 [ruby-core:112677]

Another way to reproduce the problem:

$ ruby require "zlib" p Zlib::Inflate.new.inflate(Zlib.deflate("\u3042".b)).ascii_only? p Zlib::Inflate.new.inflate(Zlib.deflate("\u3042".b), buffer: "".b).ascii_only? ^D false # <== correct true # <== wrong 

Bisecting points to https://github.com/ruby/ruby/commit/b0b9f7201a as the offending commit.

This patch fixes the issue with both zlib and openssl:

diff --git a/string.c b/string.c index 6773ed2d86..95f023232a 100644 --- a/string.c +++ b/string.c @@ -2460,6 +2460,7 @@ rb_str_modify_expand(VALUE str, long expand) else if (expand > 0) { RESIZE_CAPA_TERM(str, len + expand, termlen); } + ENC_CODERANGE_CLEAR(str);  } /* As rb_str_modify(), but don't clear coderange */ 

I could fix zlib's rb_inflate_inflate (https://github.com/ruby/ruby/blob/65ab2c1ef2/ext/zlib/zlib.c#L2167) and openssl's ossl_ssl_read_internal (https://github.com/ruby/ruby/blob/65ab2c1ef2/ext/openssl/ossl_ssl.c#L1929) instead, but I don't fully understand the reasoning of the change.

Updated by byroot (Jean Boussier) over 2 years ago Actions #2 [ruby-core:112678]

This patch fixes the issue with both zlib and openssl:

Interesting, that you for the bug report. One problem though is that the point of https://github.com/ruby/ruby/commit/b0b9f7201a, as to not clear the coderange as often when it wasn't required.

Expanding the string capacity shouldn't change the coderange.

So the missing CODERANGE_CLEAR is likely in whatever function calls this one. I'll try to see if I can track it down.

Updated by byroot (Jean Boussier) over 2 years ago Actions #3 [ruby-core:112679]

Actually, the assumption in https://github.com/ruby/ruby/commit/b0b9f7201a might have been incorrect. The function is called rb_str_modify_expand and rb_str_modify does clear the coderange, so it is somewhat legitimate for extension to assume it include what rb_str_modify does. SO your patch seem correct to me.

Updated by byroot (Jean Boussier) over 2 years ago Actions #4

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED

Updated by romuloceccon (Rômulo Ceccon) over 2 years ago Actions #5

  • Status changed from Open to Closed

Applied in changeset git|d78ae78fd76e556e281a743c75bea4c0bb81ed8c.


rb_str_modify_expand: clear the string coderange

[Bug #19468]

b0b9f7201acab05c2a3ad92c3043a1f01df3e17f errornously stopped
clearing the coderange.

Since rb_str_modify clears it, rb_str_modify_expand
should too.

Updated by naruse (Yui NARUSE) over 2 years ago Actions #6 [ruby-core:112925]

  • Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE

ruby_3_2 b309c246ee70926d593d3857e1625202e2d0f67b merged revision(s) d78ae78fd76e556e281a743c75bea4c0bb81ed8c.

Actions

Also available in: PDF Atom