COMPUTER: Internationalization

Friday, January 9, 2009

Internationalization

First, a word on what internationalization is not: it does not mean using your native language to name identifiers and write comments.

Coding character set
You should limit yourself to the ASCII character set in C source files. The printable ASCII characters are represented by the bytes hexadecimal 20 to 7e.
Non-ASCII characters (e.g. ISO Latin characters above hexadecimal 7f) are not portable in identifier names: C89 does not allow them, and later standards leave raw extended characters implementation-defined, even if some compilers accept such identifiers.
Comments, strings and character literals should not contain non-ASCII characters.
Non-ASCII characters and control characters (hexadecimal 0 to 1f, with the exception of newline, carriage return, tab and form feed) may confuse editors or may even lead to unexpected results with some compilers (e.g. ^Z misinterpreted as end-of-file, or the most significant bit stripped from non-ASCII characters). Unlike a mail message, a C source file has no standard way to declare its encoding.
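The byte ranges above are easy to check mechanically. The following is a minimal sketch of such a check, not part of the original guideline; the function name first_suspicious_byte is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that is
 * neither printable ASCII (0x20..0x7e) nor newline, carriage return,
 * tab or form feed. Returns len if the whole buffer is clean. */
size_t first_suspicious_byte(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c >= 0x20 && c <= 0x7e)
            continue;               /* printable ASCII */
        if (c == '\n' || c == '\r' || c == '\t' || c == '\f')
            continue;               /* tolerated control characters */
        return i;                   /* non-ASCII or other control byte */
    }
    return len;
}
```

Running this over the contents of a source file before check-in would flag a stray ^Z (hexadecimal 1a) or an ISO Latin byte at its offset.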
C offers convenient ASCII escape notations for non-ASCII characters, e.g. '\xe8'.
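As a small illustration of the escape notation, here is a sketch that assumes the program's execution character set is ISO 8859-1 (Latin-1); the variable names are invented for the example. One pitfall worth showing: a hexadecimal escape consumes every hex digit that follows it, so the literal has to be split when the next character is a digit or one of a to f.

```c
#include <string.h>

/* 'e' with grave accent in ISO 8859-1, written so the source stays ASCII.
 * Whether the value is negative depends on the signedness of char. */
static const char e_grave = '\xe8';

/* Safe: 'm' is not a hex digit, so the escape ends after "e8". */
static const char creme[] = "cr\xe8me";

/* "f\xe9e" would be read as the single escape \xe9e, so the literal is
 * split and the compiler concatenates the two parts. */
static const char fee[] = "f\xe9" "e";

/* An octal escape takes at most three digits and needs no split here. */
static const char fee_octal[] = "f\351e";
```

Both fee and fee_octal hold the same three bytes followed by the terminating null.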

Coding language
To keep the sources globally readable, use English, the lingua franca of computer science, exclusively. As with other parts of the C coding guidelines: the smaller the scope, the less strict the rule.
Customizing your program's user interface to a local language (i.e. internationalization in the stricter sense) can be done with locales and e.g. the gettext() family of functions. These functions let you use compiled text databases for the different ISO language codes while keeping your C sources readable, i.e. the (English) strings are still present in the code.
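A minimal sketch of how this might look with the gettext() interface from <libintl.h> (part of glibc; other systems may need a separate libintl). The text domain "i18n_demo" and the catalog directory are placeholders chosen for this example, not names from any real package.

```c
#include <libintl.h>
#include <locale.h>

/* Placeholder text domain and catalog directory (assumptions). */
#define DEMO_DOMAIN    "i18n_demo"
#define DEMO_LOCALEDIR "/usr/share/locale"

/* Select the user's locale and tell gettext where the compiled
 * message catalogs (.mo files) for this program live. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(DEMO_DOMAIN, DEMO_LOCALEDIR);
    textdomain(DEMO_DOMAIN);
}

/* The English string stays in the source. gettext() returns its
 * translation if a catalog for the current language has one, and
 * the original string otherwise. */
const char *greeting(void)
{
    return gettext("Hello, world!");
}
```

Because gettext() falls back to the string it was given, the program still works, in English, when no catalog is installed.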
Note that internationalization also covers local formats for things like dates and money.
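For local formats, the standard C library already provides locale-aware facilities. The sketch below uses strftime() with the %x conversion and localeconv(); the two wrapper function names are invented for illustration.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: formats the date in *when using the date
 * representation of the current LC_TIME locale. Returns the number
 * of characters written, or 0 if the buffer is too small. */
size_t format_local_date(char *out, size_t outsize, const struct tm *when)
{
    return strftime(out, outsize, "%x", when);
}

/* Hypothetical wrapper: returns the decimal-point string of the
 * current LC_NUMERIC locale. */
const char *local_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

The result depends on the locale selected with setlocale(); in the default "C" locale, %x is equivalent to %m/%d/%y. The structure returned by localeconv() also carries monetary fields such as currency_symbol and mon_decimal_point.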
