Bug #20189: `rb_str_resize` does not clear coderange when expanding - Ruby - Ruby Issue Tracking System


Bug #20189: `rb_str_resize` does not clear coderange when expanding

Added by tompng (tomoya ishida) almost 2 years ago. Updated almost 2 years ago.

Status: Open

Assignee:

Target version:

ruby -v: ruby 3.4.0dev (2024-01-09T07:07:19Z master db476cc71c) [x86_64-linux]

Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: REQUIRED

[ruby-core:116226]

Description

Expanding a string in some encodings (UTF-16, UTF-32) can change its coderange to either valid or broken, but `rb_str_resize` does not clear the coderange when expanding.

This causes bugs in C extension libraries that use `rb_str_resize`.

```ruby
# Example for stringio
require 'stringio'

s = StringIO.new("\0".encode('UTF-16LE'))
s.truncate(1); s.truncate(2); s.string.valid_encoding? #=> true
s.truncate(1); s.string.valid_encoding?; s.truncate(2); s.string.valid_encoding? #=> false (expect to be true)
```
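The failure mode can be modelled without any C. The sketch below is a hypothetical pure-Ruby stand-in: the class `CachedCoderangeBuffer` and its `clear_on_expand:` flag are invented for illustration and are not part of Ruby or StringIO. It caches a valid/broken answer the way a String caches its coderange, always forgets it when shrinking, and forgets it on expansion only when `clear_on_expand: true`. That split is an assumption read off the issue title, which says only the expanding path fails to clear.

```ruby
# Hypothetical model, not MRI's implementation: a byte buffer that caches
# its validity (a stand-in for String's cached coderange).
class CachedCoderangeBuffer
  def initialize(str, clear_on_expand:)
    @bytes = str.b                # raw bytes (ASCII-8BIT copy)
    @encoding = str.encoding
    @clear_on_expand = clear_on_expand
    @cached_valid = nil           # nil = not scanned yet ("unknown")
  end

  # Scans once, then keeps answering from the cache.
  def valid_encoding?
    if @cached_valid.nil?
      @cached_valid = @bytes.dup.force_encoding(@encoding).valid_encoding?
    end
    @cached_valid
  end

  # Truncates or NUL-pads to len bytes (similar to what StringIO#truncate does).
  def resize(len)
    old_len = @bytes.bytesize
    if len < old_len
      @bytes = @bytes.byteslice(0, len)
      @cached_valid = nil         # shrinking invalidates the cache
    elsif len > old_len
      @bytes += "\0" * (len - old_len)
      # Buggy variant keeps the stale cache; fixed variant drops it.
      @cached_valid = nil if @clear_on_expand
    end
    self
  end
end

results = [false, true].map do |clear|
  buf = CachedCoderangeBuffer.new("\0".encode("UTF-16LE"), clear_on_expand: clear)
  buf.resize(1)
  buf.valid_encoding?             # 1 byte: broken, and now cached
  buf.resize(2).valid_encoding?   # 2 bytes: should be valid again
end
p results # => [false, true]
```

With `clear_on_expand: false`, the "broken" answer cached at 1 byte survives the resize back to 2 bytes, which mirrors the `#=> false` in the StringIO example; clearing on expansion restores the expected `true`.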

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
#1 [ruby-core:116227]

Does this happen only with wide-char encodings?

Updated by tompng (tomoya ishida) almost 2 years ago
#2 [ruby-core:116228]

I think so. A Shift_JIS character never ends with a null byte, and other encodings seem to be the same.

```ruby
Encoding.list.select {|e|
  256.times.any? do |first_byte|
    a = first_byte.chr
    b = a + "\0"
    # only one of \x??\x00 and \x?? is valid
    a.force_encoding(e).valid_encoding? != b.force_encoding(e).valid_encoding?
  end
}
# => [#<Encoding:UTF-16BE>, #<Encoding:UTF-16LE>]
```

Apart from UTF-16, there seems to be no string analogous to "表" in Shift_JIS ("\x95\x5c") for which "\x??\x00" is valid but "\x??" alone is not.
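The check above appends exactly one NUL byte, so UTF-32, which needs the byte size to be a multiple of 4, can never appear in its result even though the description lists it. A generalized sketch, assuming a resize may add up to `max_pad` bytes at once (the method name `nul_padding_sensitive_encodings` is made up for this example, and skipping dummy encodings is an extra assumption), also reports UTF-32BE and UTF-32LE once `max_pad` reaches 3:

```ruby
# Hypothetical helper: encodings in which appending 1..max_pad NUL bytes to
# some one-byte string flips valid_encoding?. Dummy encodings are skipped.
def nul_padding_sensitive_encodings(max_pad)
  Encoding.list.reject(&:dummy?).select do |enc|
    256.times.any? do |byte|
      base_valid = byte.chr.b.force_encoding(enc).valid_encoding?
      (1..max_pad).any? do |pad|
        padded = byte.chr.b + ("\0" * pad)   # fresh ASCII-8BIT string
        padded.force_encoding(enc).valid_encoding? != base_valid
      end
    end
  end
end

single = nul_padding_sensitive_encodings(1)      # same question as the snippet above
upto_three = nul_padding_sensitive_encodings(3)  # growth by up to 3 bytes
p upto_three - single
```

Every new string is built from scratch here, so this only probes how validity depends on length; it does not exercise `rb_str_resize` itself.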

I opened a pull request https://github.com/ruby/ruby/pull/9552

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
#3 [ruby-core:116229]

Seems to be caused by b0b9f7201acab05c2a3ad92c3043a1f01df3e17f.

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
#4

Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: REQUIRED

Updated by byroot (Jean Boussier) almost 2 years ago
#5 [ruby-core:116231]

Expanding string in some encoding (utf16 utf32) can change coderange to either valid or broken,

I must admit I'm not very familiar with wide char encodings, but this surprises me a bit. Ruby strings should always have their terminator, so I don't see how expanding a string would change their interpretation.

Updated by Eregon (Benoit Daloze) almost 2 years ago
#6 [ruby-core:116240]

byroot (Jean Boussier) wrote in #note-5:

I must admit I'm not very familiar with wide char encodings, but this surprises me a bit. Ruby strings should always have their terminator, so I don't see how expanding a string would change their interpretation.

It's because in UTF-16, a string whose byte size is not a multiple of 2 is CR_BROKEN, and likewise in UTF-32 when it is not a multiple of 4.
Since rb_str_resize() changes String#bytesize, that condition can change:

```
irb(main):002:0> "a".force_encoding(Encoding::UTF_16LE).valid_encoding?
=> false
irb(main):003:0> "a\x00".force_encoding(Encoding::UTF_16LE).valid_encoding?
=> true
```
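The same size rule can be checked at every length, not only 1 and 2 bytes. `resize_bytes` below is a hypothetical helper (not a Ruby or C API) that reproduces only the byte-level effect of a resize, truncating or padding with NUL bytes. Since it always returns a new String, no stale coderange is involved, so the output reflects validity by size alone.

```ruby
# Hypothetical helper: truncates or NUL-pads str to len bytes and returns a
# fresh String in str's encoding (so nothing is cached from before).
def resize_bytes(str, len)
  raw = str.b
  resized =
    if len <= raw.bytesize
      raw.byteslice(0, len)
    else
      raw + ("\0" * (len - raw.bytesize))
    end
  resized.force_encoding(str.encoding)
end

utf16 = "a".encode(Encoding::UTF_16LE) # "a\x00"
utf32 = "a".encode(Encoding::UTF_32LE) # "a\x00\x00\x00"

# Only even sizes are valid in UTF-16LE, only multiples of 4 in UTF-32LE.
utf16_valid = (1..4).map { |n| resize_bytes(utf16, n).valid_encoding? }
utf32_valid = (1..4).map { |n| resize_bytes(utf32, n).valid_encoding? }
p utf16_valid # => [false, true, false, true]
p utf32_valid # => [false, false, false, true]
```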
