On my Linux machine I found old files (from 2004 at the latest, possibly older), so they may date back to the Win9x days. Maybe they came from an old FAT drive or an old Samba share.

The umlauts in the file names are encoded very strangely. Examples:

$ ls -la
insgesamt 1862
drwx------ 10 user users      11 Feb 15  2006 .
drwx------ 11 user users      11 Dez  2  2004 ..
-rw-------  1 user users 1796429 Apr 13  2004 'Geb'$'\344''udeplan.jpg'
drwx------  2 user users      17 Feb 15  2006 'K'$'\374''che'

The names should be Gebäudeplan.jpg and Küche.

This does not seem to be ISO-8859-15, ANSI or similar. Indeed, the hex values seem to be greater than 256.

I have tried multiple options with convmv and detox but nothing seems to fit.

I would like to scan my entire hard disk for similar files and fix their names (convert them to UTF-8).

  • (You should be surprised if they were greater than or equal to 256.) The numbers are less than 256. In $'\344' the number is in octal; \377 would be 255. Commented Dec 21, 2020 at 7:20
  • What exactly have you tried with convmv and detox? Please edit and be specific. Commented Dec 21, 2020 at 7:34
  • @KamilMaciorowski Good point, thanks! I think I forgot the -r flag. I used convmv -f iso-8859-15 -t utf8 . which only processed the current directory. So yes, it seems it is ISO-8859-15! Commented Dec 21, 2020 at 7:42
  • I thought of -f cp1250 and tested it successfully. Anyway, convmv is the tool. Commented Dec 21, 2020 at 7:43
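
To illustrate the point from the comments: the \344 and \374 that ls prints are octal escapes for single bytes (0xE4 and 0xFC), which are ä and ü in ISO-8859-15. Below is a minimal Python sketch, not a description of how convmv works internally, that finds names which are not valid UTF-8 and works out what they would be renamed to. The function names fix_name and plan_renames and the default source encoding are assumptions made for illustration. The code only plans renames and changes nothing on disk.

```python
import os
from typing import Iterator, Optional, Tuple


def fix_name(raw: bytes, source_encoding: str = "iso-8859-15") -> Optional[bytes]:
    """Return raw re-encoded as UTF-8, or None if raw is already valid UTF-8.

    source_encoding is an assumption: it must match whatever legacy
    encoding the names were originally written in.
    """
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8 (this includes plain ASCII names)
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


def plan_renames(
    root: bytes, source_encoding: str = "iso-8859-15"
) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (old_path, new_path) pairs below root without renaming anything.

    Passing root as bytes makes os.walk return raw byte names, so broken
    names are seen exactly as stored. Walking bottom-up means a directory
    is listed only after everything inside it.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = fix_name(name, source_encoding)
            if fixed is not None:
                yield os.path.join(dirpath, name), os.path.join(dirpath, fixed)


if __name__ == "__main__":
    # Octal 344 is the same byte as hex E4, well below 256.
    assert 0o344 == 0xE4
    for old, new in plan_renames(b"."):
        print(old, "->", new)
```

In practice convmv already covers this: convmv -f iso-8859-15 -t utf8 -r . walks the tree recursively and only lists what it would rename, and it performs the renames only when --notest is added. For these two names -f cp1250 gives the same result because ä and ü sit at the same byte values in both encodings.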
