Bug #16842: `inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable - Ruby - Ruby Issue Tracking System

Added by sawa (Tsuyoshi Sawada) over 5 years ago. Updated over 3 years ago.

Status: Closed
Assignee: duerst (Martin Dürst)
Target version:
ruby -v: ruby 2.8.0dev (2020-05-09T13:24:57Z master 889b0fe46f) [x86_64-linux]
Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN

[ruby-core:98231]

Description

The UTF-8 character U+0085 (NEXT LINE) is not printable, but `inspect` prints the character verbatim (within double quotation marks):

    0x85.chr(Encoding::UTF_8).match?(/\p{print}/) # => false
    0x85.chr(Encoding::UTF_8).inspect # => "\" \"" (the raw U+0085 character sits between the escaped quotes)

My understanding is that non-printable characters are not printed verbatim with `inspect`:

    "\n".match?(/\p{print}/) # => false
    "\n".inspect # => "\"\\n\""

while printable characters are:

    "a".match?(/\p{print}/) # => true
    "a".inspect # => "\"a\""

I ran the following script, and found that U+0085 is the only character within the range U+0000 to U+FFFF that behaves like this.

    def verbatim?(char)
      !char.inspect.start_with?(%r{\"\\[a-z]})
    end

    def printable?(char)
      char.match?(/\p{print}/)
    end

    (0x0000..0xffff).each do |i|
      begin
        char = i.chr(Encoding::UTF_8)
      rescue RangeError
        next
      end
      puts '%#x' % i unless verbatim?(char) == printable?(char)
    end

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
#1 [ruby-core:102611]

Status changed from Open to Assigned
Assignee set to duerst (Martin Dürst)

Behavior here seems to be dependent on the encoding:

    $ LC_ALL=C ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
    "\"\\u0085\""
    $ LC_ALL=en_US.UTF-8 ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
    "\"\xC2\x85\""
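The locale dependence shown above can also be reproduced inside a single process. The following is a minimal sketch, assuming that `String#inspect` builds its result in the default internal encoding (or in the default external encoding when no internal one is set) and escapes any character the result encoding cannot represent. `inspect_under` is a hypothetical helper name used only for illustration; it is not part of Ruby.

```ruby
# Hypothetical helper: call String#inspect while a given default external
# encoding is temporarily in effect, then restore the previous settings.
def inspect_under(str, external)
  saved_ext, saved_int, saved_verbose =
    Encoding.default_external, Encoding.default_internal, $VERBOSE
  $VERBOSE = nil                    # silence the "setting Encoding.default_external" warning
  Encoding.default_external = external
  Encoding.default_internal = nil   # make inspect fall back to the external encoding
  str.inspect
ensure
  Encoding.default_external = saved_ext
  Encoding.default_internal = saved_int
  $VERBOSE = saved_verbose
end

nel = 0x85.chr(Encoding::UTF_8)

# With a US-ASCII result encoding, U+0085 cannot be emitted verbatim.
p inspect_under(nel, Encoding::US_ASCII) # => "\"\\u0085\""

# With a UTF-8 result encoding, the output depends on whether the Ruby
# build contains the fix discussed in this ticket, so no output is listed.
p inspect_under(nel, Encoding::UTF_8).b
```

Appending `.b` in the last line mirrors the commands above: it makes the raw bytes visible, so a verbatim U+0085 shows up as `\xC2\x85` rather than as an invisible character.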

I've submitted a pull request to fix the behavior, though the implementation is rather crude: https://github.com/ruby/ruby/pull/4229

@duerst (Martin Dürst) Is there a better fix that handles the Unicode properties differently?

Updated by naruse (Yui NARUSE) over 4 years ago
#2 [ruby-core:102613]

U+0085 is categorized as Print in Ruby because Oniguruma has historically treated it that way.
https://moriyoshi.hatenablog.com/entry/20090307/1236410006

I'm neutral about the change, but I would like it to include a detailed comment or a link to this ticket.

Updated by jeremyevans (Jeremy Evans) over 3 years ago
#3

Status changed from Assigned to Closed

Applied in changeset git|49517b3bb436456407e0ee099c7442f3ab5ac53d.

Fix inspect for unicode codepoint 0x85

This is an inelegant hack, by manually checking for this specific
code point in rb_str_inspect. Some testing indicates that this is
the only code point affected.

It's possible a better fix would be inside of lower-level encoding
code, such that rb_enc_isprint would return false and not true for
codepoint 0x85.

Fixes [Bug #16842]
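
One way to check whether a particular Ruby build includes this change is to test whether `inspect` still emits the raw character. This is a minimal sketch; `nel_escaped_by_inspect?` is a hypothetical helper name, and the result depends on the Ruby version and, as noted earlier in the thread, on the locale encoding.

```ruby
# Returns true when String#inspect escapes U+0085 (NEXT LINE) and false
# when the raw character appears in the inspect output.
def nel_escaped_by_inspect?
  nel = 0x85.chr(Encoding::UTF_8)
  !nel.inspect.include?(nel)
end

puts "Ruby #{RUBY_VERSION}: U+0085 escaped by inspect? #{nel_escaped_by_inspect?}"
```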
