Correct way to encode mixed width text in Unicode?

Question

From what I have read, the fullwidth glyphs in Unicode are provided solely for backward compatibility and lossless round-tripping with legacy standards such as Shift-JIS. The rationale seems to be that Unicode views width as a presentational issue that is better handled by the renderer based on linguistic context, and use of such characters is generally discouraged. In some cases no compatibility character is provided at all: for example, there are no fullwidth left/right single/double quotation marks, because no legacy encoding contains both forms.

Unicode recommends in the same document:

Ambiguous quotation marks are generally resolved to wide when they enclose and are adjacent to a wide character, and to narrow otherwise.

However, there are cases where the width is tricky to resolve, and current fonts and implementations sometimes render them incorrectly:

他们一致认为，目前最大的敌人无疑是“N问题”，即Nostalgia，思乡病。

“Make a wish! Make a wish!”琳琳和盼盼喊。

The term “char kway teow” is a transliteration of the Chinese characters “炒粿條”.

教授昨天讲了：“Hamlet的原文其实是Polonius (II.ii.) ‘Though this be madness, yet there is method in‘t.’“。

It seems that the recommended algorithm fails in such cases (some quotation marks are rendered with wrong/inconsistent width), and such cases may just be too complex for an algorithm to render without intricate and fragile rulesets for the language itself.
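To illustrate the kind of inconsistency described above, here is a minimal Python sketch of one naive reading of the quoted rule. It is not an official algorithm: the helper names (`is_wide`, `resolve_quote_widths`) are hypothetical, and the interpretation (a quote is wide only if the character it encloses is Wide or Fullwidth) is an assumption. Only `unicodedata.east_asian_width` comes from the standard library.

```python
import unicodedata

# Curly quotation marks; all four have East_Asian_Width = Ambiguous (A).
OPENING = {"\u2018", "\u201C"}
CLOSING = {"\u2019", "\u201D"}


def is_wide(ch):
    """True if the character's East_Asian_Width is Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def resolve_quote_widths(text):
    """Return a list of (index, quote, 'wide' | 'narrow') tuples.

    Assumed reading of the rule: look only at the enclosed neighbour,
    i.e. the next character for an opening quote and the previous
    character for a closing quote.
    """
    results = []
    for i, ch in enumerate(text):
        if ch in OPENING:
            inner = text[i + 1] if i + 1 < len(text) else ""
        elif ch in CLOSING:
            inner = text[i - 1] if i > 0 else ""
        else:
            continue
        width = "wide" if inner and is_wide(inner) else "narrow"
        results.append((i, ch, width))
    return results


sample = "最大的敌人无疑是“N问题”，即Nostalgia"
for index, quote, width in resolve_quote_widths(sample):
    print(index, quote, width)
```

Under this reading, the opening quote before "N" resolves to narrow while the closing quote after "题" resolves to wide, so the pair is mismatched, which is the inconsistency seen in the first example.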

My question is thus: is there a common way to hint, in plain text, at the width of an ambiguous-width character, perhaps with a Unicode variation selector or something like RLM?

Hi. Welcome to GDSE. What is this going to be used for, and what exactly is wrong with the example you supplied? It's not clear what problem you are trying to solve. Also, in graphic design, we're generally not limited to using plain text or even monospace fonts, so this seems like an odd thing to want to solve, and is possibly not really related to graphic design. To be honest, at the moment this question reads a bit like an XY Problem.
– Billy Kerr, Commented Jan 28, 2024 at 13:48
@BillyKerr Thanks for the comment. The examples render inconsistently, e.g. with a fullwidth left quote and a halfwidth right quote, and I am seeking a solution for this in Unicode-encoded text. This question is more about the information-encoding aspect, to which presentation is an orthogonal concern (in the same way that bidirectional typesetting can be done using Inkscape alone, yet Unicode bidi still exists for a different reason). I posted it here for lack of a more appropriate site, but please feel free to move it somewhere more suitable.
– SuibianP, Commented Jan 28, 2024 at 14:30
Hmm... Perhaps Stack Overflow? There are Unicode questions which have been asked there. I'm not a mod, so I don't have the power to move this for you.
– Billy Kerr, Commented Jan 28, 2024 at 14:46

SuibianP · Accepted Answer · 2024-03-14 01:52:03Z

Variation sequences for this purpose were proposed in L2/23-212R, approved, and scheduled for release in Unicode 16.0 in September 2024:

2018 FE00; non-fullwidth form; # LEFT SINGLE QUOTATION MARK
2018 FE01; right-justified fullwidth form; # LEFT SINGLE QUOTATION MARK
2019 FE00; non-fullwidth form; # RIGHT SINGLE QUOTATION MARK
2019 FE01; left-justified fullwidth form; # RIGHT SINGLE QUOTATION MARK
201C FE00; non-fullwidth form; # LEFT DOUBLE QUOTATION MARK
201C FE01; right-justified fullwidth form; # LEFT DOUBLE QUOTATION MARK
201D FE00; non-fullwidth form; # RIGHT DOUBLE QUOTATION MARK
201D FE01; left-justified fullwidth form; # RIGHT DOUBLE QUOTATION MARK

Credit: https://corp.unicode.org/pipermail/unicode/2024-March/010814.html
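As a rough illustration of how the sequences listed in this answer could be written into plain text, here is a minimal Python sketch. The helper `tag_quotes` is hypothetical and not part of any library; it simply appends U+FE00 (non-fullwidth form) or U+FE01 (justified fullwidth form) after each curly quotation mark. Whether the hint is honoured depends on the font and renderer, which I am assuming here rather than asserting.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1: non-fullwidth form
VS2 = "\uFE01"  # VARIATION SELECTOR-2: justified fullwidth form

# The four quotation marks covered by the sequences listed above.
QUOTES = {"\u2018", "\u2019", "\u201C", "\u201D"}


def tag_quotes(text, fullwidth):
    """Append a width-hinting variation selector to every curly quote.

    Hypothetical helper: an existing VS1/VS2 directly after a quote is
    replaced, so applying the function twice does not double the selector.
    Selectors elsewhere in the text are left untouched.
    """
    selector = VS2 if fullwidth else VS1
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        out.append(ch)
        i += 1
        if ch in QUOTES:
            # Skip a width selector that is already attached to this quote.
            if i < len(text) and text[i] in (VS1, VS2):
                i += 1
            out.append(selector)
    return "".join(out)


# Hint that the quotes around the English phrase should stay narrow.
hinted = tag_quotes("“Make a wish! Make a wish!”", fullwidth=False)
print([f"U+{ord(c):04X}" for c in hinted[:2]])
```

In real mixed text the choice would be made per quotation mark rather than per string; the single `fullwidth` flag is only a simplification for the sketch.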
