2

I want to do a very simple thing: Copy a file.
Copy is a simple (and fundamental) command. It should be easy, but it seems to be complicated when it comes to Unicode filenames (using English XP, cmd.exe, and a .cmd script).
I have managed to create a .cmd file containing the Unicode filenames, as follows:

    :: To create the final .cmd script
    ::
    set SRCE=D:\_cmd\cpp
    set DEST=H%SRCE:~1%
    cmd.exe /U /c DIR /A:-D /s /b "%SRCE%" >"SRCE.UTF16"
    cmd.exe /U /c DIR /A:-D /s /b "%DEST%" >"DEST.UTF16"
    ConvUTF.exe 1628 "DEST.UTF16" "DEST.UTF-8"
    ConvUTF.exe 1628 "SRCE.UTF16" "SRCE.UTF-8"
    :: Then, with `sed.exe`, `diff.exe`, and `ConvUTF.exe` again...
    :: the resulting UTF-8 (or UTF16) .cmd file looks like this...
    ::
    copy "D:\_cmd\cpp\ā.क.test" "H:\_cmd\cpp\"

The copy command works fine when I run it directly at the command prompt, but fails when it is run from the .cmd script.
The UTF-8 .cmd errors out with: The system cannot find the file specified
The UTF16 .cmd doesn't get past the first NULL-byte (of the first character), and just exits.

Is there some way to do it from a .cmd script? (I want to use the cmd.exe shell)
Perhaps there is a utility program which can be called from my .cmd...
All suggestions are welcome.

PS. To clarify the main issue... I don't care about how Unicode filenames display in the console window (that just doesn't happen for most non-Latin-based letters in the cmd.exe window)... I am only interested in being able to copy a file which has Unicode letters in its filename, via a batch .cmd "script".

10
  • Can you tell us what your desired output is from this? If all you want to do is copy a file, why not just use a batch file? If you copied with a batch file, all you would need to do is xcopy c:\somefile.txt z:\somefile.txt /u /y exit Commented Aug 1, 2010 at 5:44
  • @typoknig: Yes, xcopy c:\somefile.txt z:\somefile.txt /u /y works, because your example uses filenames whose "letters" do not range beyond a single-byte char-set. My problem arises when I want to copy a file with Unicode characters in its filename.. by Unicode, I mean characters which have a Unicode CodePoint greater than 127 (hex 7F) .. as per my example above: "ā.क.test".. Commented Aug 1, 2010 at 6:38
  • Very interesting question. I tried (on Vista) copy abcऊ.txt to kk\defऊ.txt in a command window and this got pasted as copy abc?.txt to kk\def?.txt. Very much interested in what the deal is with this. Commented Aug 1, 2010 at 7:56
  • @Zabba: The fact that it shows "?" is not really (necessarily) a problem. It typically only means that the particular character is not known to the current console font. However, the underlying character is still there and the copy will succeed (it does on my setup, but there may be other issues involved). So, I don't have a problem with that point (it is a pain, but not a show-stopper)... The issue I have is that the copy fails when the command is called from within a .cmd script. It doesn't seem to handle any type of Unicode: UTF-8 and UTF16 both fail.. and UTF16 is native to Windows! Commented Aug 1, 2010 at 8:16
  • have you tried another copy command called robocopy? Commented Aug 1, 2010 at 8:45

1 Answer

3

Save the UTF-8 batch file without a BOM at the start, because the BOM will trip up cmd. Also, cmd isn't really Unicode-aware when it comes to batch files. You should put

chcp 65001 

into the batch at the beginning to switch the code page to UTF-8, which should enable your Unicode characters to be read and processed correctly. The only downside is that this change persists even after the batch file has exited, so you're left with the shell in UTF-8. You can save the previous code page and restore it at the end if this poses a problem.
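
Saving and restoring the code page could look roughly like the following. This is an untested sketch, not a definitive script: the variable name OLDCP is made up for illustration, the parsing assumes that chcp prints the code page number after a colon (the surrounding text is localized, and some locales append a trailing dot), and the file itself is assumed to be saved as UTF-8 without a BOM. The copy line is the one from the question.

```batch
@echo off
rem Remember the currently active code page before changing it.
rem chcp prints something like "Active code page: 437", so take the part after the colon.
for /f "tokens=2 delims=:" %%P in ('chcp') do set "OLDCP=%%P"

rem Remove the leading space left by the split, and a trailing dot if the locale adds one.
set "OLDCP=%OLDCP: =%"
set "OLDCP=%OLDCP:.=%"

rem Switch to UTF-8 so the following lines of this file are read as UTF-8.
chcp 65001 >nul

copy "D:\_cmd\cpp\ā.क.test" "H:\_cmd\cpp\"

rem Put the shell back on the code page it had before the script ran.
chcp %OLDCP% >nul
```

The code page is captured before the switch so that the restore still works when the script is started from a shell that was not using the system default.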

Also, changing the console font to a TrueType font might help, since several internal commands are known to exhibit Unicode problems with raster fonts (a reason why for /f over dir is such a stupid idea generally).

3
  • Thanks Joey... My main computer just died. I've bought a new box. I've installed Ubuntu... Like the Maginot Line, I'll just go around the obstruction! Unicode problems are now solved :) Commented Aug 31, 2010 at 13:06
  • @orth: Well, good if it works. Still, Linux or any Unix-like system has its own share of weird Unicode problems. For example, everything from the locale to the program to the terminal emulator, GNU screen, SSH clients or whatever else comes between the computer and the user (yes, that may involve many levels and on some occasions many machines) has to use UTF-8, otherwise it breaks. Often it's not obvious where it breaks either. I think I prefer a system that supports Unicode in its core and not just as an afterthought as a quick hack. Commented Aug 31, 2010 at 14:46
  • Genius! Even saving the BAT file with BOM will stuff up the chcp command. Have to use UTF no BOM and use that chcp command . Thanks mate! Commented Nov 30, 2024 at 5:43
