Character Encodings

What is encoding?

We will not attempt to answer this question in detail; this section only presents the practical ramifications of the different character encodings. For historical background, see this article or do a Google search on character encoding. We assume that, as a TeX user, you already know that non-ASCII characters in a document require you to pay attention to the character encoding, and possibly to use a special package that recognizes the text you input.

The basic problem at hand is that characters represented on disk as bytes can be interpreted differently, depending on which encoding the reader assumes. Hence, opening a file that is UTF-8 encoded under the assumption that it is Latin-1 encoded can lead to a loss of information; some characters may not be read or displayed correctly, and if the document is saved in this state, the data is permanently lost. TeX works around this problem by using {\"u} for the "ü" character, since ASCII characters are essentially a lowest common denominator among most of the existing character sets.
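
The mismatch described above can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper name misread and the sample word "für" are made up for this example and have nothing to do with BibDesk itself. It encodes a string with one encoding, decodes the bytes with another, and then shows that the ASCII-only TeX form reads the same under every encoding tried.

```python
def misread(text, actual, assumed):
    """Encode text with `actual`, then decode the bytes as if they were `assumed`.

    Hypothetical helper for illustration only. Bytes that are invalid in the
    assumed encoding are replaced with U+FFFD, the replacement character.
    """
    return text.encode(actual).decode(assumed, errors="replace")


original = "für"

# UTF-8 bytes read as Latin-1: the two bytes of "ü" become two wrong characters.
as_latin1 = misread(original, "utf-8", "latin-1")

# Latin-1 bytes read as UTF-8: the single byte for "ü" is not valid UTF-8,
# so it is replaced and the original character cannot be recovered.
as_utf8 = misread(original, "latin-1", "utf-8")

# The TeX form uses only ASCII characters, so every encoding agrees on it.
tex_form = r'f{\"u}r'
tex_views = {
    enc: tex_form.encode("ascii").decode(enc)
    for enc in ("utf-8", "latin-1", "mac_roman")
}

print(as_latin1)
print(as_utf8)
print(tex_views)
```

Saving either of the misread strings back to disk is the step at which the damage becomes permanent.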

What encoding should I use?

In general, pick an encoding that will be compatible with all systems the document will be used on. On Macintosh systems in Western countries, the default encoding is MacRoman; this is understood by all Apple computers, but users on other platforms may have difficulty reading the files (generally because their software assumes a different encoding). The UTF-8 encoding is a superset of ASCII, and is a more modern choice that supports extended character sets; it also works with XeTeX, a Unicode-based TeX system for Mac OS X. Windows computers (or users thereof) generally assume Windows Latin 1 or ISO Latin 1 (8859-1) encoding; both are widely used character encodings for Western European scripts, but they are incompatible with UTF-8 for anything outside ASCII.
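
To make the compatibility trade-off concrete, the following Python sketch compares the bytes that each of the encodings mentioned above produces. The codec names are those of Python's standard library ("mac_roman" for MacRoman, "cp1252" for Windows Latin 1, "latin-1" for ISO 8859-1); the function name byte_forms and the sample strings are assumptions chosen for illustration.

```python
# Python codec names for the encodings discussed above.
ENCODINGS = ("utf-8", "mac_roman", "cp1252", "latin-1")


def byte_forms(text):
    """Return the bytes `text` is stored as under each encoding (illustrative helper)."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


# Pure ASCII text is stored identically in all four encodings.
ascii_forms = byte_forms("Knuth")

# A single accented character is stored differently.
umlaut_forms = byte_forms("ü")

for enc, raw in umlaut_forms.items():
    print(f"{enc:10} {raw.hex()}")
```

The ASCII string yields the same bytes everywhere, whereas "ü" is stored as two bytes in UTF-8, one byte in the two Latin 1 variants, and a different single byte in MacRoman. This is why a file containing only ASCII can be exchanged safely, while a file with accented characters must be opened with the encoding it was saved in.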

The "Non-lossy ASCII" encoding provided by BibDesk should not be used unless you have a very specific need for that encoding; it is not compatible with TeX accented character sequences, but some users need it for other reasons.