
I am on Windows 7 with an NTFS filesystem. I have file names and directory names like:

KispÃ¡l Ã©s a Borz - 02 - TÃ¶kÃ©letes Helyettes

I want to transform them to:

Kispál és a Borz - 02 - Tökéletes Helyettes 

The filesystem can store file names like フリー百科事典, so it surely has Unicode support.

As I imagine the story, a long time ago the names were perfect. Then they were transferred from a UTF-8 filesystem to a Latin-1 one, then back to this Unicode-capable filesystem. In theory all the information is still there, and I could write a program in C to fix these characters, but I assume someone somewhere has already done it.
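The suspected round trip is easy to reproduce, which also suggests it is reversible. A minimal Python sketch, assuming the intermediate step really was Latin-1 (ISO-8859-1) and not some other single-byte code page:

```python
# Reproduce the damage: UTF-8 bytes misread as Latin-1, one character per byte.
original = "Tökéletes"
damaged = original.encode("utf-8").decode("latin-1")
print(damaged)  # TÃ¶kÃ©letes

# Reverse it: turn each character back into one byte, then decode as UTF-8.
restored = damaged.encode("latin-1").decode("utf-8")
print(restored == original)  # True
```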

Do you know any utility that can do the transformation?

3 Answers

2

I wrote a C / C++ hybrid which does the translation part (does not rename anything, just converts bad byte sequences to good ones). You can download it using the link at the end of this post.

The input file is decoded as a UTF-8 stream into a sequence of Unicode code points, which is then NOT converted to any other code page. All of the code points are below 256, and they represent the byte sequence of the original UTF-8 string, so I just write them to the output as bytes. The result is a correct UTF-8 string. This is not yet a complete application for my problem, but it is the core of the solution.
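The same idea can be sketched in a few lines of Python. This is only an illustration of the technique described above, not the code from the linked program, and `fix_mojibake` is a hypothetical name. It assumes the damage is exactly one round of UTF-8 bytes read as Latin-1; text damaged through Windows-1252 may contain code points above 255 and would be rejected here.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as Latin-1' (hypothetical helper)."""
    code_points = [ord(ch) for ch in text]
    if any(cp > 255 for cp in code_points):
        raise ValueError("code point above 255: not Latin-1 mojibake")
    raw = bytes(code_points)    # each code point becomes exactly one byte
    return raw.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8


print(fix_mojibake("KispÃ¡l Ã©s a Borz - 02 - TÃ¶kÃ©letes Helyettes"))
```

A name that is already correct, such as "Kispál", normally fails the final decode step instead of being silently changed, and plain ASCII passes through untouched.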

The program is written and tested under Linux, but should work on any OS. Usage example:

    nil@hippy:~/playground/c++$ g++ utf8decode.cpp -o utf8decode
    nil@hippy:~/playground/c++$ cat > file
    KispÃ¡l Ã©s a Borz - 02 - TÃ¶kÃ©letes Helyettes
    nil@hippy:~/playground/c++$ cat file | ./utf8decode
    Kispál és a Borz - 02 - Tökéletes Helyettes
    Characters found: 48
    nil@hippy:~/playground/c++$
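The filter only converts text; applying the same conversion to names on disk needs a directory walk. A rough Python sketch under the same Latin-1 assumption follows. `fix_name` and `rename_tree` are hypothetical names, and the sketch has not been tried on the Windows 7 setup from the question.

```python
import os


def fix_name(name: str) -> str:
    """Return the repaired name, or the name unchanged if it does not look damaged."""
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def rename_tree(root: str, dry_run: bool = True) -> list:
    """Repair file and directory names below root; return (old, new) path pairs."""
    changes = []
    # topdown=False visits children before their parent directory,
    # so renaming a directory never invalidates paths still to be visited.
    for folder, dirs, files in os.walk(root, topdown=False):
        for name in dirs + files:
            fixed = fix_name(name)
            if fixed == name:
                continue
            old = os.path.join(folder, name)
            new = os.path.join(folder, fixed)
            if os.path.exists(new):
                continue  # never overwrite an existing entry
            changes.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return changes
```

Calling `rename_tree(path)` with the default `dry_run=True` only lists what would change, which is worth reviewing before running it again with `dry_run=False`.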

I had written a UTF-8 character counter before and modified it for this, so I didn't write the whole program in an hour. Source: http://pastebin.com/Hy7tVt5A http://pastebin.com/NFJUP0R5

1

My problem was that Windows 10 Explorer was not showing Unicode filenames correctly. The name was in Unicode, but garbage was shown on the screen. The answer was that the problem went away when I rebooted.

1

Let me elaborate on the answer given by dinar qurbanov. To fix file name encoding in Total Commander v7 or higher, use the Multi-Rename Tool (Ctrl+M).

In there you'll find a folder-like button. Click it and select 'Edit names' to get a text file containing the file names. After fixing them with any tool or editor you like, paste them back and close the editor.
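For the "fix them with any tool" step, one option is a small script that repairs each line of the exported name list and leaves everything else alone. The Python sketch below assumes the names were damaged by reading UTF-8 bytes as Latin-1 and that the list has one name per line; `fix_name_list` is a hypothetical helper, and the encoding of the file Total Commander produces is not covered here.

```python
def fix_name_list(text: str) -> str:
    """Repair each line separately; keep lines that do not look damaged."""
    fixed = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as U+0085, which can occur inside damaged names.
    for line in text.split("\n"):
        try:
            fixed.append(line.encode("latin-1").decode("utf-8"))
        except (UnicodeEncodeError, UnicodeDecodeError):
            fixed.append(line)
    return "\n".join(fixed)


names = "KispÃ¡l.mp3\nフリー.txt\nplain.txt"
print(fix_name_list(names))
```

The number and order of lines stay the same, so each repaired name still lines up with the entry it came from.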


  • "Let me elaborate on the answer given by dinar qurbanov" - Your elaboration should have been submitted as a comment, so that the author of the answer could consider improving it. Commented Jun 30, 2017 at 18:58
  • Unfortunately, I do not have enough reputation to comment. Commented Jul 1, 2017 at 10:04
  • Commentary doesn't belong in an answer, and it shouldn't be submitted as an answer. Take those statements how you want to. Your inability to submit a comment has nothing to do with submitting a quality answer that doesn't include commentary. Here is the thing: if you submit commentary as an answer, you will never earn enough reputation to actually submit commentary. Commented Jul 1, 2017 at 14:41
