
I have a .txt file saved in UTF-8 format without a BOM. It contains an 'é' character.

How does notepad.exe determine that it is UTF-8 encoded?

Other .txt files containing only bytes below 0x80 (plain 7-bit ASCII) are opened with "ANSI" encoding.
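The behavior described above is consistent with a simple rule. Below is a hypothetical Python sketch of it, not Notepad's actual code. Pure 7-bit ASCII is byte-for-byte identical in UTF-8 and in the ANSI code pages, so there is nothing to choose between. A file that contains bytes of 0x80 or above and still decodes as strict UTF-8 is very likely UTF-8. The function name `guess_encoding` and the returned labels are illustrative assumptions.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic for illustration; not Notepad's real algorithm."""
    # A BOM is an explicit marker, so check it first (its bytes are >= 0x80).
    if data.startswith(UTF8_BOM):
        return "UTF-8 (BOM)"
    # Pure ASCII reads the same in UTF-8 and ANSI; fall back to the legacy default.
    if all(b < 0x80 for b in data):
        return "ANSI"
    try:
        # Strict decoding raises on any malformed or truncated UTF-8 sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"
    return "UTF-8"


if __name__ == "__main__":
    print(guess_encoding(b"plain ASCII text"))     # ANSI
    print(guess_encoding("café".encode("utf-8")))  # UTF-8
    print(guess_encoding("café".encode("cp1252"))) # ANSI
```

For example, 'é' is the two bytes C3 A9 in UTF-8 but the single byte E9 in Windows-1252. E9 is a UTF-8 lead byte that must be followed by two continuation bytes, so a lone E9 at the end of a file, or followed by ASCII, fails strict UTF-8 decoding.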

Answer

According to Raymond Chen:

Some files come up strange in Notepad

[...] When faced with a file that lacks a special prefix, Notepad is forced to guess which of those two encodings the file actually uses. The function that does this work is IsTextUnicode, which studies a chunk of bytes and does some statistical analysis to come up with a guess.

And as the documentation notes, “Absolute certainty is not guaranteed.” Short strings are most likely to be misdetected.

(Related follow-up blog post.)
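To make "statistical analysis" concrete, here is a toy Python sketch of one kind of test such a function might run when guessing UTF-16LE. It is an assumption for illustration only, not the real IsTextUnicode algorithm. Latin-script text in UTF-16LE has a 0x00 high byte in almost every two-byte code unit, so zero bytes cluster at odd offsets. The name `looks_like_utf16le` and the 0.5 threshold are hypothetical.

```python
def looks_like_utf16le(data: bytes, threshold: float = 0.5) -> bool:
    """Toy statistical test; NOT the real IsTextUnicode algorithm."""
    # Complete UTF-16 data always has an even number of bytes.
    if len(data) < 2 or len(data) % 2:
        return False
    high = data[1::2]  # high byte of each little-endian code unit
    low = data[0::2]   # low byte of each code unit
    high_zero = high.count(0) / len(high)
    low_zero = low.count(0) / len(low)
    # Latin-script UTF-16LE: many zero high bytes, few zero low bytes.
    return high_zero - low_zero >= threshold


if __name__ == "__main__":
    print(looks_like_utf16le("Notepad".encode("utf-16-le")))  # True
    print(looks_like_utf16le(b"Notepad!"))                    # False
    print(looks_like_utf16le("日本語".encode("utf-16-le")))   # False
```

The last case shows why no such guess is certain: Japanese text in UTF-16LE contains no zero bytes, so this toy check rejects it even though it really is UTF-16LE. Short inputs make things worse, since a handful of bytes gives the statistics little to work with.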

  • Thanks for the Raymond Chen links, @user1686. I looked into this in years past but never came across his articles. I wonder whether Excel itself uses the IsTextUnicode() function when opening a TXT file. Commented May 9, 2023 at 3:34
