Unicode tip #9 - Console Output
12/11/2008 12:29:22 PM
First the bad news: console I/O does not support reading (UTF-16) Unicode strings, and writing only supports AnsiStrings. As soon as you call Write or Writeln, the contents of a (Unicode) string are converted to AnsiString where needed and then written to the output.
This also means that any Text file I/O that has to handle Unicode needs to be rewritten using streams or other techniques. However, since a UTF8String is also an AnsiString (with code page 65001 specified), there is a good workaround for writing to console output, provided you set the console code page to UTF-8 and use a font that can display the Unicode characters (Lucida Console, for example):
program ConsoleUTF8;
{$APPTYPE CONSOLE}
uses Windows; // needed for SetConsoleOutputCP and CP_UTF8
begin
  SetConsoleOutputCP(CP_UTF8); // set the console output code page to UTF-8 (65001)
//  Writeln(UTF8String('[наименование проекта]')); // normal Unicode String, now "????"
  Writeln(AnsiString('[Ð½Ð°Ð¸Ð¼ÐµÐ½Ð¾Ð²Ð°Ð½Ð¸Ðµ Ð¿Ñ€Ð¾ÐµÐºÑ‚Ð°]')); // UTF-8 Cyrillic "hack": the UTF-8 bytes typed as ANSI characters
end.
This will produce Cyrillic characters on the standard output if you use Lucida Console as the font (just try it: copy and paste into the Delphi 2009 IDE). Note that Lucida Console cannot display all Unicode characters: Chinese and the Clef are not shown, but at least Cyrillic characters display without problems.
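If you would rather not paste pre-encoded bytes into the source, one possible alternative is to do the UTF-8 conversion at run time and hand the bytes to the console yourself. The sketch below is not from the book: WriteUtf8Line is a hypothetical helper name, and it assumes the source file is saved in a Unicode encoding so the Cyrillic literal survives. It only uses TEncoding.UTF8 from SysUtils and the Windows API calls GetStdHandle, WriteFile and SetConsoleOutputCP.

```delphi
program ConsoleUtf8Runtime;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper (not part of the original tip): encodes a UTF-16
// string as UTF-8 and writes the raw bytes to standard output, so the
// AnsiString conversion done by Writeln never gets involved.
procedure WriteUtf8Line(const S: string);
var
  Bytes: TBytes;
  Written: DWORD;
begin
  Bytes := TEncoding.UTF8.GetBytes(S + sLineBreak);
  if Length(Bytes) > 0 then
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Bytes[0], Length(Bytes),
      Written, nil);
end;

begin
  // Same precondition as before: the console must interpret bytes as UTF-8.
  SetConsoleOutputCP(CP_UTF8);
  WriteUtf8Line('[наименование проекта]');
end.
```

The font restriction still applies: the bytes arrive as valid UTF-8, but the console can only draw the characters its font contains.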
Note that I do not have to write the BOM to the output. You may want to write it if you plan to save the console output to a text file and read it afterwards: that way you can pick a different font later and also see the Chinese or Clef characters without problems, provided they were written as UTF-8.
As long as we convert UTF-16 Unicode Strings to UTF-8 before writing to Text files, and don’t forget to use the UTF-8 BOM as prefix, this will work fine for writing files with Unicode UTF-8 output.
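As an illustration of the "BOM first, then UTF-8 bytes" idea, here is a minimal sketch that takes the stream route mentioned earlier instead of Text file I/O. SaveAsUtf8 and the file name output.txt are made up for this example; TEncoding.UTF8.GetPreamble and GetBytes come from SysUtils, TFileStream from Classes.

```delphi
program Utf8FileSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes the UTF-8 BOM followed by the UTF-8
// encoding of Text to a new file (an existing file is overwritten).
procedure SaveAsUtf8(const FileName, Text: string);
var
  Stream: TFileStream;
  Bom, Data: TBytes;
begin
  Bom := TEncoding.UTF8.GetPreamble;     // the UTF-8 BOM bytes EF BB BF
  Data := TEncoding.UTF8.GetBytes(Text); // UTF-16 to UTF-8 conversion
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Bom) > 0 then
      Stream.WriteBuffer(Bom[0], Length(Bom));
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
end;

begin
  SaveAsUtf8('output.txt', '[наименование проекта]' + sLineBreak);
end.
```

Because the bytes go straight from a TBytes array into the stream, no AnsiString conversion can get in the way, which is the main reason to prefer this over Writeln for Unicode output.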
This tip is the 9th in a series of Unicode tips taken from my Delphi 2009 Development Essentials book published earlier this week on Lulu.com.