How to use Unicode characters in the Windows command line?

We have a project in Team Foundation Server (TFS) that has a non-English character (š) in its name. While scripting a few build-related things we stumbled upon a problem: we can't pass the letter š to the command-line tools. The command prompt, or something else along the way, mangles it, and the tf.exe utility can't find the specified project.

I've tried different formats for the .bat file (ANSI, UTF-8 with and without BOM) as well as scripting it in JavaScript (which is Unicode inherently) - but no luck. How do I execute a program and pass it a Unicode command line?


My background: I have used Unicode input/output in a console for years (and do so a lot, daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts and limitations:

  • CMD and the “console” are unrelated things. CMD.exe is just one of the programs that are ready to “work inside” a console (“console applications”).
  • AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode characters whichever codepage is active.
  • The Windows console has A LOT of support for Unicode, but it is not perfect (just “good enough”; see below).
  • chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in Microsoft's C runtime library (or uses a different CRTL), it will not work reliably.
  • I work in cp1252. As I already said: to input/output Unicode in a console, one does not need to set the codepage.
The details

  • To read/write Unicode to a console, an application (or its C runtime library) must be smart enough to use the Console-I/O API rather than the File-I/O API.
  • Likewise, to read Unicode command-line arguments, an application (or its C runtime library) must be smart enough to use the corresponding API.
  • Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000). Only simple text rendering is supported, so European (and some East Asian) languages should work fine as long as one uses precomposed forms. [There is some minor fine print here for East Asian languages and for the characters U+0000, U+0001, U+30FB.]
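The rendering limits in the last point can be checked before text is sent to a console. The following is a minimal Python sketch, not something from the answer itself; the helper name `console_safe` is hypothetical. It normalizes text to its precomposed form (NFC) and reports characters that sit outside the BMP, or combining marks that remain because Unicode defines no precomposed form for them.

```python
import unicodedata

def console_safe(text):
    """Return (normalized_text, problems) for console display.

    `problems` lists characters the classic console is unlikely to
    render: anything outside the BMP, or combining marks left over
    after NFC normalization.
    """
    # NFC merges a base letter and a combining mark into one precomposed
    # character whenever Unicode defines one (s + caron -> U+0161).
    normalized = unicodedata.normalize("NFC", text)
    problems = []
    for ch in normalized:
        if ord(ch) > 0xFFFF:
            problems.append((ch, "outside the BMP"))
        elif unicodedata.combining(ch):
            problems.append((ch, "combining mark with no precomposed form"))
    return normalized, problems
```

For example, `console_safe("s\u030c")` yields the single precomposed character U+0161 with an empty problem list, while an emoji or a letter such as x followed by a combining caron is reported as a problem.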
Practical considerations

  • The defaults on Windows are not very helpful. For the best experience, one should tune 3 pieces of configuration:

  • For output: console font. For best results, I recommend my builds. (The installation instructions are present there — and also listed in other answers on this page.)
  • For input: capable keyboard layout. For best results, I recommend my layouts.
  • For input: allow HEX input of Unicode.
  • One more gotcha with “Pasting” into a console application (very technical):

  • HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown, so many applications are not ready to see a character on KeyUp. (Only applicable to applications using the Console-I/O API.)
  • Conclusion: many applications do not react to HEX input events.
  • Moreover, what happens with a “Pasted” character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*), then it is delivered as an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
  • However, the “other” characters are delivered by emulating HEX input.
  • Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via the console's UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)

    One should also keep in mind that the “alternative, 'more capable' consoles” for Windows are not consoles at all. They do not support the Console-I/O APIs, so programs which rely on these APIs will not function. (Programs which use only “File-I/O APIs to the console filehandles” will work fine, though.)

    One example of such a non-console is a part of Microsoft's PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.

    Summary

  • Set the font and keyboard layout (and, optionally, allow HEX input).

  • Use only programs which go through the Console-I/O APIs and accept Unicode command-line arguments. For example, any Cygwin-compiled program should be fine. As I already said, CMD is fine too.
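To illustrate the second point, here is a minimal sketch in Python 3 (my choice of language for the illustration; the answer above does not mention it). On Windows, Python 3 starts child processes through the wide-character CreateProcessW API and fills sys.argv from the wide command line, so a character such as š should survive whichever codepage is active. The child below is a second Python interpreter standing in for the real tool; whether tf.exe itself reads its arguments through the Unicode API is an open assumption this sketch cannot settle.

```python
import subprocess
import sys

# Child program: writes its first argument back as UTF-8 bytes, so the
# result does not depend on the console codepage.
CHILD = "import sys; sys.stdout.buffer.write(sys.argv[1].encode('utf-8'))"

def roundtrip_argument(arg):
    """Launch a child process with `arg` and return what the child received."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, arg],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

If `roundtrip_argument("\u0161")` returns the same string, the argument reached the child intact; replacing the child command with the real tool is the part that depends on that tool's own argument handling.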


  • Try:

    chcp 65001
    

    which will change the code page to UTF-8. Also, you need to use the Lucida Console font.


    I had the same problem (I'm from the Czech Republic). I have an English installation of Windows, and I have to work with files on a shared drive. The paths to the files include Czech-specific characters.

    The solution that works for me is:

    In the batch file, change the code page.

    My batch file:

    chcp 1250
    copy "O:VEŘEJNÉŽŽŽŽŽŽŽ.xls" c:temp
    

    The batch file has to be saved in CP 1250.

    Note that the console will not show characters correctly, but it will understand them...
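    Why the file encoding and the chcp value have to agree can be modelled in a few lines of Python. This is only a sketch of the byte-level effect, assuming the batch file's bytes are interpreted under the active code page; the function name `read_as` is hypothetical, and CP 437 is used as an example of a typical English-install console code page.

```python
def read_as(text, saved_in, active_codepage):
    """Show how `text`, saved in one encoding, looks when its bytes are
    interpreted under a different active code page."""
    raw = text.encode(saved_in)  # bytes as stored in the .bat file
    # errors="replace" keeps the demo going if a byte is undefined
    return raw.decode(active_codepage, errors="replace")

name = "VE\u0158EJN\u00c9"  # "VEŘEJNÉ"

# File saved in CP 1250 and "chcp 1250" in effect: the name is intact.
matching = read_as(name, "cp1250", "cp1250")

# File saved in CP 1250 but the console still in CP 437: the accented
# letters turn into different characters, so the path no longer matches.
mismatched = read_as(name, "cp1250", "cp437")
```

    The same effect explains the š from the question: saved as a single byte 0x9A in CP 1252 and read under CP 437, it becomes Ü.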
