Incorrect display of Chr() function return

Question

I have got a new work PC (it has a higher build of Windows 10, the same highest version of O365), configured it pretty much the same as my previous one (exactly like that in terms of region and language) and I have run into a strange issue with Excel VBA special characters display. I hae multiple macros for our French users and they encode the French special characters using the Chr() function. However on the new PC, these display as � in all userforms (including messagebox). When I debug.print the function return, it yields some other nonsensical result like "Mo", "Tr" or again �. Even when I tell the macro to open a Word document and type the result there, it is typically �.

The issue is only limited to my new PC and only to the Chr() function - if I use the ChrW() function, the results display correctly. I can use ChrW() for future, however I am rather unwilling to redo all the existing macros...

example:

Sub TEST() MsgBox "Sélectionnez" & vbCrLf & "S" & Chr(233) & "lectionnez" & vbCrLf & "S" & ChrW(233) & "lectionnez" & vbCrLf End Sub

EDIT:

On advice, I ran the suspect position from the string through AscW() function and it returns -3. I am not that familiar with hexadecimal notation, but when interpreting this result, I understood that negative numbers may be encoded by the same ones as certain positive ones, so -3 corresponds to FFFD, which in turn can be translated also as 65533. Further research indicates U+65533 is a "replacement character" or � and Google returns to me questions where the answer is typically "encoding issue".

Also i ran a quick loop to map 1 to 65533 to their results in ChrW(), Chr() and Char() in a Worksheet. The break occurs at 128, from which Chr() and Char() return � up until 255 ASCII limit. However, notably, ChrW() returns no character from 128 to 160 (€ to nonbreakable space), which I find rather peculiar. Also a wild guess: 128 is just above the half of 255, however that is probably just a coincidence as Char(256) returns #VALUE!, not �, also I don't see how this would influence the ChrW() result. Also, I ran the results from each three functions through Code() and interestingly enough, from 160, this starts returning 0 for ChrW() (for the others, predictably, it starts for their � return, so already at 128).

Please post some examples and screenshots, without that it's difficult to help.Do you have the same version of Windows and Office in both PCs? — Máté Juhász
– Máté Juhász, Commented Jan 12, 2021 at 10:12
Modified - both PCs have Windows 10 but the new one has a higher build. O365 is up to date on both PCs. — Eleshar
– Eleshar, Commented Jan 12, 2021 at 10:33
What is the code for the character that is actually in that position? In particular, for the second character, what would be returned by something like ?AscW(Mid(theString,2,1)) — Ron Rosenfeld
– Ron Rosenfeld, Commented Jan 13, 2021 at 1:11
@RonRosenfeld - updated; TLDR: AscW() identifies it for what it is, i.e. a black diamond of death. — Eleshar
– Eleshar, Commented Jan 13, 2021 at 17:09
Based on what you've posted in your recent edits, I wonder if your new computer is using a code page that does not include some characters. You can check your Windows code page by executing chcp from the command prompt. (Type command into the windows search bar (below the start menu). Then type chcp after the prompt and see what returns. — Ron Rosenfeld
– Ron Rosenfeld, Commented Jan 13, 2021 at 20:09

Ron Rosenfeld · Accepted Answer · 2021-01-14 12:18:10Z

2

I can reproduce your issue. Hopefully yours is caused by the same and reversing it will fix it for you.

Navigate to Settings -- Time and Language -- Language -- Administrative Language Settings -- Change system locale

DEselect Beta: Use Unicode UTF-8 for worldwide language support

screenshot shows it selected. Please DEselect

Maybe this is why it is labelled Beta

edited Jan 14, 2021 at 12:18

answered Jan 14, 2021 at 12:12

Ron Rosenfeld

9,7243 gold badges16 silver badges18 bronze badges

It's labelled "Beta" because programs and programmers make poor assumptions about legacy codepages being in use...

grawity
– grawity

2021-01-14 12:35:25 +00:00
Commented Jan 14, 2021 at 12:35
@user1686 I still find it surprising that making that selection would cause these kinds of problems. I also noted that my keyboard layout changed upon making that selection. However, and this is on me, I did not investigate the ramifications of this selection as I was merely trying to duplicate the original poster's issue.

Ron Rosenfeld
– Ron Rosenfeld

2021-01-14 12:45:30 +00:00
Commented Jan 14, 2021 at 12:45
Well, the gist of the OP's issue, literally, is that Chr() is a function for converting bytes into characters according to the currently active legacy codepage, so you get one result if the codepage is set to Windows-1252 ("English") and you get a different result if the codepage is set to UTF-8. After all, that's the whole point of codepages.

grawity
– grawity

2021-01-14 13:18:33 +00:00
Commented Jan 14, 2021 at 13:18

Add a comment |

grawity · Accepted Answer · 2021-01-14 14:27:11Z

I think Ron's answer is correct regarding the practical Windows configuration change needed, but doesn't really talk much about why it has this effect.

Chr() converts byte values according to the system's currently selected "ANSI codepage", usually one of the Windows-125x codepages depending on region (e.g. cp1252 for Western, cp1257 for Baltic).

ChrW() instead takes Unicode BMP codepoints, which are two-byte values in the range 0..65535 and have the same mapping everywhere regardless of region.

The two functions only happen to match in the 0..127 range, because both Windows-125x and Unicode are based on the same 7-bit ASCII and define this range in exactly the same way. But beyond that range, they're indeed two different codepages.

The Windows-125x codepages used by Chr() are single-byte codepages covering the values 0..255 (minus the "control" range 0..31). That's why Chr(256) returns #VALUE! – it literally has no meaning for single-byte codepages, whereas ChrW(256) in Unicode is just ordinary U+0100 Ā.

So what's the actual reason the old Chr() stopped working?

As mentioned, Chr() uses a region-specific codepage which Windows calls the "ANSI codepage". The ANSI codepage is considered a legacy feature, kept around for compatibility with programs originally written for Windows 95/98 (which weren't Unicode-based).

(Many Windows APIs actually have two sets of nearly identical functions, one for "ANSI" and one for "Unicode", e.g. MessageBoxA() accepts text in the legacy ANSI-codepage while MessageBoxW() accepts text in Unicode. Similarly CreateFileA() vs CreateFileW(), etc.)

The big change in Windows 10 (i.e. the checkbox that Ron Rosenfeld shows in the other answer) is that the "ANSI" functions no longer use Windows-125x – they use UTF-8. The UTF-8 codepage is another way of working with Unicode codepoints, and it is not single-byte – it is variable-width, with a character being anywhere from 1 byte to 4 bytes.

The first 128 bytes of UTF-8 (0..127) still deliberately match ASCII, so you still get the same results as with Windows-125x. However, the other half (bytes 128..255) have a completely different meaning – they are used for multi-byte sequences, and have no meaning when used alone.

For example, é in UTF-8 is two bytes, 195 followed by 169. If you use Chr(195) + Chr(169), you should probably get an "é" back. But when either of them is used alone, that's an incomplete sequence; the UTF-8 decoder gives you an "�" because the single byte has no meaning by itself.

The reason why Microsoft added this new feature is that the "ANSI" functions are becoming less useful for compatibility with old software, whereas UTF-8 can be very useful for modern software that's ported from other operating systems (such as Linux), as those programs tend to use UTF-8 as the only codepage.

However, notably, ChrW() returns no character from 128 to 160 (€ to nonbreakable space), which I find rather peculiar

That's normal – as mentioned above, ChrW() uses Unicode mapping, and Unicode has no characters in this range.

Windows-1252 (Western) is based on the ISO 8859-1 codepage, with one difference: the original ISO 8859 reserves the values 128..159 (0x80..0x9F) for a second control characters range, which has no visible characters. However, Microsoft does in fact put additional graphical characters in this range, which is what you get when calling Chr(128) and so on.

The first two Unicode blocks, covering 0..255 in total (or rather U+0000..U+00FF), are also based on the ISO 8859-1 – therefore codepoints 128..159 aka U+0080..U+009F (part of Latin-1 Supplement block) contain the same invisible control characters.

The issue is only limited to my new PC and only to the Chr() function - if I use the ChrW() function, the results display correctly. I can use ChrW() for future, however I am rather unwilling to redo all the existing macros...

You should really be using ChrW().

Because it takes Unicode codepoints, it will always return the same result no matter what your Windows region settings are. It means your macros will continue to work for people who for some reason need their OS to be set to Cyrillic, or Baltic, or Greek, or Turkish.

(Old programs with embedded assumptions about regional settings have caused me and my coworkers a lot of headache, and I'm a bit vocal about this as a result. I strongly believe this nonsense should not continue into the 2020s – after all, Windows itself has already been Unicode-based for more than two decades.

We've had people's names mangled on official documents because of codepage mismatch. We've had to tell people to change their regional settings to something completely wrong for their language, just to make a legacy program work. We have a program which requires YYYY/MM/DD dates, and we have a program which requires YYYY-MM-DD dates, and we needed to find a way to run both on the same computer. Dealing with programmers who are rather unwilling to improve is not the slightest bit of fun.)

Thank you for this very comprehensive explanation of the Why — Ron Rosenfeld
– Ron Rosenfeld, Commented Jan 14, 2021 at 18:58
Very nice explanation! So in other words, the Chr() function does not have a clear/transparent return, depends on system settings and should be avoided. I wonder why I have never read about that anywhere. — Eleshar
– Eleshar, Commented Jan 14, 2021 at 22:32
Also preach on the last two paragraphs. I am no IT person and not even that long in the workforce and this feels like constant pain. — Eleshar
– Eleshar, Commented Jan 14, 2021 at 22:47

Stack Exchange Network

Incorrect display of Chr() function return

2 Answers 2

You must log in to answer this question.

Hot Network Questions

Incorrect display of Chr() function return

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions