I have some encoded values that I believe are UTF-8. I don't really know whether they are UTF-8, because other online tools and the usual steps for decoding UTF-8 are not working, but one open-source tool is the only one that works for me. The actual plain text is in Korean.

The problem is that the tool does not work for more than 100 words or for larger strings, and it takes a long time even for around 50-60 words. Since the tool is open source, I want to run it on my local system, if that is possible, so that I can work faster and without any character limit.

Tool link: https://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder

You can also check the parent directory of this tool (remove the last segment from the URL), where other files, such as libraries, are also present.

I want to learn why all the other decoders on the internet fail on my strings and only this tool succeeds. Also, how can I run this tool locally, if that is possible? I have a very large amount of data.

Here is the sample data.

  1. 박서연
  2. 김신ìž
  3. 유ì€ì„œ
  4. 박미연
  5. 김민ì˜
  6. 김민ì˜
  7. ì´íš¨ì§„
  8. 최유빈
  9. 박미연
  10. 유ì€ì„œ

FYI, these encoded strings are names in Korean. My final goal is to get the Korean plain text, not a translation into another language.

  • For me, simply pasting your text into the built-in Windows version of Notepad, saving it as ANSI-encoded, and then reopening the file seems to fix it. I get the following: 박서연, 김신자, 유은서, 박미연, 김민영, 김민영, 이효진, 최유빈, 박미연, 유은서 Commented May 2, 2024 at 20:01
  • OK, so remember: you encode bytes into text strings, and decode text strings into bytes. If you are seeing text that is not exclusively 0-9 and A-F, then you are looking at something that has already been encoded. In this case the bytes were subjected to the wrong encoding. Commented May 2, 2024 at 21:31

1 Answer

What you have appears to be UTF-8 text that has been misread as ANSI (most likely Windows-1252), so each byte of a multi-byte Korean character shows up as a separate Western character. It potentially came from a text file that is missing the UTF-8 BOM (byte order mark).
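To illustrate how this kind of corruption can arise, here is a minimal Python sketch. It assumes the "ANSI" code page involved is Windows-1252, which matches the characters in the sample but is not confirmed by the question. The function name is hypothetical.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate the corruption: store text as UTF-8, then read it as Windows-1252."""
    utf8_bytes = text.encode("utf-8")   # each Korean syllable becomes 3 bytes
    # Assumption: the reader used Windows-1252, so every byte turns into
    # one Western character instead of being combined into a syllable.
    return utf8_bytes.decode("cp1252")


garbled = misread_utf8_as_cp1252("박서연")
print(garbled)        # the same garbled string as item 1 of the sample data
print(len(garbled))   # 9 characters: 3 syllables x 3 bytes each
```

Some names would make this sketch raise `UnicodeDecodeError`, because a few byte values (for example 0x90) have no character in Python's `cp1252` codec. That is probably also why some of the sample lines look shorter than expected.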

For me, pasting your text into the built-in Windows version of Notepad, saving it as ANSI-encoded, and then reopening the file seems to fix it. I get the following: 박서연, 김신자, 유은서, 박미연, 김민영, 김민영, 이효진, 최유빈, 박미연, 유은서

Simply doing this is enough to get Windows to look at the text again and detect that it is properly encoded.

An alternative is Notepad++. Set the encoding to ANSI and paste your text; it will look like garbage. Then set the encoding to UTF-8, and the Korean text appears.
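For a very large amount of data, the same ANSI-then-UTF-8 round trip can be scripted. The following is a rough Python sketch, not a definitive tool: it assumes the wrong code page was Windows-1252 and that the garbled text is stored in a UTF-8 file. The names `fix_mojibake` and `fix_file` are made up for illustration.

```python
def fix_mojibake(text: str) -> str:
    """Turn garbled characters back into bytes, then decode them as UTF-8."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Python's cp1252 codec has no mapping for bytes 0x81, 0x8D,
            # 0x8F, 0x90 and 0x9D. Such bytes typically survive as the
            # control characters U+0081 etc., which latin-1 maps back 1:1.
            raw += ch.encode("latin-1")
    return raw.decode("utf-8")


def fix_file(src: str, dst: str) -> None:
    """Repair a whole file line by line (both files are read/written as UTF-8)."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(fix_mojibake(line))
```

A call such as `fix_file("names_garbled.txt", "names_fixed.txt")` (hypothetical file names) would then process the whole file at once. If some bytes were lost when the text was copied, the affected line cannot be decoded and `fix_mojibake` raises `UnicodeDecodeError`; in that case it is safer to run the script on the original file rather than on pasted text.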

  • Wait, what? It was this easy? It works perfectly, just like I wanted!!! And here I was trying to host the tool on localhost, trying to re-program the tool, trying to find other tools, and what not!!! Man, you are a lifesaver. I learned something new today. I will surely research this in detail. Thanks a lot, brother!!! <3 Commented May 3, 2024 at 3:25
