The GHC 6.12.1 release candidate will be out shortly, and it includes a rewritten I/O library with Unicode support. Here’s what you need to know to make sure your applications and libraries continue to work with GHC 6.12.1.
We expect the release candidate phase to last a couple of weeks or so, depending on how many problems arise, after which 6.12.1 will be released. However, 6.12 is not currently scheduled to become part of the Haskell Platform until the next platform release, due around February 2010, so package authors have a grace period for testing before 6.12.1 becomes more widely used.
Console and text I/O
If you are reading from or writing to the console, or reading/writing text files in the local encoding, use the System.IO functions for text I/O (openFile, readFile, hGetContents, putStr, etc.), and you will automatically benefit from the new Unicode support. Text written will be encoded according to the current locale (or the code page on Windows), and text read will be decoded accordingly.
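To see which encoding a program picks up by default, a small sketch like the following can be run under different locales. It assumes base 4.2 or later (the version that ships with GHC 6.12), where System.IO exports hGetEncoding and localeEncoding; the describe helper is hypothetical.

```haskell
import System.IO

-- Hypothetical helper: describe the encoding a Handle is currently using.
-- hGetEncoding returns Nothing when the Handle is in binary mode.
describe :: String -> Handle -> IO String
describe name h = do
  enc <- hGetEncoding h
  return (name ++ ": " ++ maybe "binary mode (no encoding)" show enc)

main :: IO ()
main = do
  -- The encoding derived from the current locale (code page on Windows).
  putStrLn ("locale encoding: " ++ show localeEncoding)
  -- Unless the program changes them, the standard Handles use that encoding.
  mapM_ (\(name, h) -> describe name h >>= putStrLn)
        [("stdin", stdin), ("stdout", stdout), ("stderr", stderr)]
```

On a Unix system, running it once with LANG=C and once with a UTF-8 locale should typically report different encodings.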
If you need to use a particular encoding (e.g. UTF-8), then the hSetEncoding function lets you set the encoding on a Handle, e.g.
hSetEncoding stdout utf8
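As a sketch of what hSetEncoding changes, the code below writes the same character through Handles set to different encodings and inspects the bytes that reach the file. The encodedBytes helper and the temporary file name are hypothetical; Data.ByteString comes from the bytestring package and removeFile from the directory package, both of which ship with GHC.

```haskell
import System.IO
import System.Directory (removeFile)
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical helper: write a String through a text-mode Handle that uses
-- the given encoding, and return the bytes that end up in the file.
encodedBytes :: TextEncoding -> String -> IO [Word8]
encodedBytes enc s = do
  (path, h) <- openTempFile "." "encoding-demo.txt"
  hSetEncoding h enc
  hPutStr h s
  hClose h
  bytes <- B.readFile path
  removeFile path
  return (B.unpack bytes)

main :: IO ()
main = do
  -- "\233" is U+00E9 (e with acute accent): two bytes in UTF-8, one in Latin-1.
  encodedBytes utf8 "\233" >>= print    -- [195,169]
  encodedBytes latin1 "\233" >>= print  -- [233]
  -- Fix stdout to UTF-8 whatever the locale, then print non-ASCII text.
  hSetEncoding stdout utf8
  putStrLn "\955x. x \8594 \233"
```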
If you’re reading or writing binary data, or for some other reason you want to bypass the Unicode encoding/decoding that the IO library now does, you have two options:
- Use openBinaryFile or hSetBinaryMode to put the Handle into binary mode. No encoding/decoding or newline translation will be done.
- Use hGetBuf/hPutBuf, or the I/O operations provided by Data.ByteString, which all operate with binary data.
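A minimal sketch of the first option: a Handle opened with openBinaryFile writes each Char in the range 0 to 255 as exactly one byte, so every byte value, including '\n' and '\r', survives unchanged; the file is then read back as raw bytes with Data.ByteString. The file name bytes.bin is a hypothetical scratch file.

```haskell
import System.IO
import System.Directory (removeFile)
import qualified Data.ByteString as B
import Data.Char (chr)

-- Write every byte value 0..255 through a binary-mode Handle, then read the
-- file back as raw bytes. "bytes.bin" is a hypothetical scratch file.
roundTrip :: IO B.ByteString
roundTrip = do
  let path = "bytes.bin"
  h <- openBinaryFile path WriteMode
  -- In binary mode each Char in 0..255 becomes one byte: no encoding,
  -- and no "\n" -> "\r\n" translation on Windows.
  hPutStr h (map chr [0 .. 255])
  hClose h
  bytes <- B.readFile path
  removeFile path
  return bytes

main :: IO ()
main = do
  bytes <- roundTrip
  print (B.length bytes)                -- 256
  print (B.unpack bytes == [0 .. 255])  -- True
```

The same effect can be had on a Handle that is already open, such as stdout, with hSetBinaryMode stdout True.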
If you’re using the utf8-string package, some of its operations may now give incorrect results:
- The operations in System.IO.UTF8 wrap the corresponding System.IO operations with UTF-8 encoding or decoding. Because a Handle in text mode now does its own encoding and decoding, these operations will read or write garbage unless the underlying Handle is in binary mode. For example, if you want to use System.IO.UTF8.print, call hSetBinaryMode stdout True first. Better still, just use System.IO.print directly. If you need to fix the encoding to UTF-8 rather than using the locale encoding, call hSetEncoding handle utf8.
- The rest of the operations in utf8-string will continue to work as before.
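The garbage described above comes from encoding twice. The sketch below does not depend on utf8-string itself; instead a hypothetical encodeUtf8Chars helper mimics what a UTF-8 output wrapper does, turning each Char into one Char per UTF-8 byte. Written through a binary-mode Handle those Chars become the correct bytes, but a text-mode UTF-8 Handle encodes them a second time.

```haskell
import System.IO
import System.Directory (removeFile)
import qualified Data.ByteString as B
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper mimicking a UTF-8 output wrapper: each Char becomes
-- one Char per UTF-8 byte, so every resulting Char is in the range 0..255.
encodeUtf8Chars :: String -> String
encodeUtf8Chars = concatMap (map chr . toBytes . ord)
  where
    cont n = 0x80 .|. (n .&. 0x3F)  -- continuation byte: 10xxxxxx
    toBytes n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. shiftR n 6, cont n]
      | n < 0x10000 = [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
      | otherwise   = [0xF0 .|. shiftR n 18, cont (shiftR n 12), cont (shiftR n 6), cont n]

-- Write the pre-encoded text to a temporary file, with the Handle either in
-- binary mode or in text mode with UTF-8 encoding; return the bytes on disk.
bytesWritten :: Bool -> String -> IO [Word8]
bytesWritten binary s = do
  (path, h) <- openTempFile "." "utf8-demo.txt"
  if binary then hSetBinaryMode h True else hSetEncoding h utf8
  hPutStr h (encodeUtf8Chars s)
  hClose h
  bytes <- B.readFile path
  removeFile path
  return (B.unpack bytes)

main :: IO ()
main = do
  -- "\233" is U+00E9, e with acute accent.
  bytesWritten True "\233" >>= print   -- [195,169]: correct UTF-8
  bytesWritten False "\233" >>= print  -- [195,131,194,169]: encoded twice
```

Either fix from the list above avoids the second encoding: put the Handle in binary mode before using the wrapper, or drop the wrapper and let hSetEncoding handle utf8 do the encoding once.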
There is a new API for newline translation in System.IO. By default, Handles in text mode translate newlines to or from the native representation for the current platform, that is, “\r\n” on Windows and “\n” on other platforms. You can change this default using hSetNewlineMode, for example to be able to read a file with either Windows or Unix line-ending conventions:
hSetNewlineMode handle universalNewlineMode
where universalNewlineMode translates from “\r\n” to “\n” on input, leaving “\n” alone, and translates “\n” to the native newline representation on output.
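To illustrate, the sketch below writes a file with mixed line endings through a binary-mode Handle (so nothing is translated on the way out) and reads it back under two newline modes. The readWith helper and the temporary file name are hypothetical; NewlineMode, universalNewlineMode and noNewlineTranslation are exported by System.IO.

```haskell
import System.IO
import System.Directory (removeFile)

-- Hypothetical helper: write mixed "\r\n" and "\n" line endings verbatim,
-- then read the file back in text mode using the given NewlineMode.
readWith :: NewlineMode -> IO String
readWith mode = do
  (path, out) <- openBinaryTempFile "." "newlines.txt"
  hPutStr out "first\r\nsecond\nthird\r\n"
  hClose out
  h <- openFile path ReadMode
  hSetNewlineMode h mode
  s <- hGetContents h
  -- hGetContents is lazy: force the whole string before closing the Handle.
  length s `seq` hClose h
  removeFile path
  return s

main :: IO ()
main = do
  readWith universalNewlineMode >>= print  -- "first\nsecond\nthird\n"
  readWith noNewlineTranslation >>= print  -- "first\r\nsecond\nthird\r\n"
```

With noNewlineTranslation the characters pass through untouched in both directions, so on output “\n” is written as-is on every platform.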