Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken. If a program is written to respect the user’s
choice of conventions, it will follow the conventions that the user
prefers. GNU Smalltalk provides two packages to help you do so. The
I18N package covers both internationalization and multilingualization;
the lighter-weight Iconv package covers only the latter, since
multilingualization is a prerequisite for correct internationalization.
Multilingualizing software means programming it to be able to support
languages from every part of the world. In particular, it includes
understanding multi-byte character sets (such as UTF-8) and Unicode
characters whose code point (the equivalent of the ASCII value) is
above 127. To this end, GNU Smalltalk provides the UnicodeString class,
which stores its data as 32-bit Unicode values. In addition, Character
supports all of the more than one million code points available in
Unicode.
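The following minimal sketch uses only the base classes described
above, with no package loaded. It assumes a GNU Smalltalk 3.x image in
which UnicodeString inherits the with:with: constructor from its
collection superclasses and characters answer their Unicode value via
codePoint. It builds a two-character UnicodeString from code points and
reads one of them back.

```smalltalk
"Build a two-character UnicodeString from Unicode code points.
 Only base classes are used; no package needs to be loaded."
| s |
s := UnicodeString
    with: (Character codePoint: 65)      "LATIN CAPITAL LETTER A"
    with: (Character codePoint: 279).    "LATIN SMALL LETTER E WITH DOT ABOVE"

"Each element keeps its full Unicode value."
s size printNl.
(s at: 2) codePoint printNl.
```

Because UnicodeString stores 32-bit values, the second element keeps
code point 279 even though that value does not fit in a single byte.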
Loading the I18N package improves this support through the
EncodedStream class(13), which interprets and transcodes non-ASCII
Unicode characters. This support is mostly transparent, because the
base classes Character, UnicodeCharacter and UnicodeString are enhanced
to use it. Sending asString or printString to an instance of Character
or UnicodeString converts Unicode characters so that they are printed
correctly in the current locale. For example, ‘$<279> printNl’ prints a
small Latin letter ‘e’ with a dot above when the I18N package is
loaded.
Conversely, you can convert String or ByteArray objects to Unicode with
a single method call. If the current locale’s encoding is UTF-8,
‘#[196 151] asUnicodeString’ returns a Unicode string containing the
same character as above, the small Latin letter ‘e’ with a dot above.
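Both directions can be tried with a short sketch. It assumes the I18N
package is installed (PackageLoader fileInPackage: is the usual way to
load a package into the running image) and, for the second part, that
the current locale’s encoding is UTF-8. The bytes 196 and 151 are the
UTF-8 encoding of U+0117, that is, code point 279.

```smalltalk
| u |
"Load I18N so that Character, UnicodeCharacter and UnicodeString
 use EncodedStream to transcode non-ASCII characters."
PackageLoader fileInPackage: 'I18N'.

"Print the character whose Unicode code point is 279 (U+0117)."
$<279> printNl.

"Decode the UTF-8 bytes of the same character into a UnicodeString.
 Assumption: the current locale's encoding is UTF-8."
u := #[196 151] asUnicodeString.
u size printNl.
u first codePoint printNl.
```

Under a locale with a different encoding, the same two bytes would be
interpreted according to that encoding instead.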
The implementation of multilingualization support is not yet complete.
For example, methods such as asLowercase, asUppercase and isLetter do
not yet recognize Unicode characters.
You need to exercise some care, or your program will be buggy when
Unicode characters are used. In particular, Characters must not be
compared with ==(14) and should be printed on a Stream with display:
rather than nextPut:. Also, Characters need to be created with the
class method codePoint: if you are referring to their Unicode value;
codePoint: is also the only method for creating characters that is
accepted by the ANSI Standard for Smalltalk. The method value:, on the
other hand, should be used if you are referring to a byte in a
particular encoding. This subtle difference means that, for example,
the last two of the following examples will fail:
"Correct. Use #value: with Strings, #codePoint: with UnicodeString." String with: (Character value: 65) String with: (Character value: 128) UnicodeString with: (Character codePoint: 65) UnicodeString with: (Character codePoint: 128) "Correct. Only works for characters in the 0-127 range, which may be considered as defensive programming." String with: (Character codePoint: 65) "Dubious, and only works for characters in the 0-127 range. With UnicodeString, probably you always want #codePoint:." UnicodeString with: (Character value: 65) "Fails, we try to use a high character in a String" String with: (Character codePoint: 128) "Fails, we try to use an encoding in a Unicode string" UnicodeString with: (Character value: 128)
Internationalizing software, on the other hand, means programming it to
be able to adapt to the user’s favorite conventions. These conventions
can get quite complex; for example, the user might specify the locale
‘espana-castellano’ for most purposes, but the locale ‘usa-english’ for
currency formatting: this might make sense if the user is a
Spanish-speaking American, working in Spanish, but representing
monetary amounts in US dollars. You can see that this system is simple
but, at the same time, very complete. This manual, however, is not the
right place for a thorough discussion of how a user would set up their
system for these conventions; for more information, refer to your
operating system’s manual or to the GNU C library’s manual.
GNU Smalltalk inherits from ISO C the concept of a locale, that is, a
collection of conventions, one for each purpose. Each of these purposes
is mapped to a Smalltalk class defined by the I18N package; these
classes form a small hierarchy with class Locale as its root:
  * LcNumeric formats numbers; LcMonetary and LcMonetaryISO format
    currency amounts.
  * LcTime formats dates and times.
  * LcMessages translates your program’s output. Of course, the package
    can’t automatically translate your program’s output messages into
    other languages; the only way you can support output in the user’s
    favorite language is to translate these messages by hand. The
    package does, though, provide methods to easily handle translations
    into multiple languages.
Basic usage of the I18N package involves a single selector, the
question mark (?), which is a rarely used yet valid character for a
Smalltalk binary message. The meaning of the question mark selector is
“How do you say … under your convention?”. You can send ? either to a
specific instance of a subclass of Locale, or to the class itself; in
the latter case, the rules for the default locale (which is specified
via environment variables) apply. You might say, for example,
‘LcTime ? Date today’ or ‘germanMonetaryLocale ? account balance’. This
syntax can be confusing at first, but turns out to be convenient
because of its consistency and overall simplicity.
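A minimal sketch of sending ? to the classes themselves, so that the
rules of the default locale apply. It assumes the I18N package is
installed; that LcNumeric accepts an arbitrary Number such as a Float
is an assumption, and since the result depends on the user’s
environment, no output is shown.

```smalltalk
PackageLoader fileInPackage: 'I18N'.

"The locale classes live in the I18N namespace, hence the qualified
 names. Sending ? to the class uses the default locale, which is taken
 from the environment."
(I18N.LcTime ? Date today) displayNl.

"Assumption: LcNumeric formats any Number, including a Float."
(I18N.LcNumeric ? 1234567.89) displayNl.
```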
Here is how ? works for different classes:

LcTime
     Format a date, a time or a timestamp (DateTime object).
LcNumeric
     Format a number.
LcMonetary
     Format a monetary value together with its currency symbol.
LcMonetaryISO
     Format a monetary value together with its ISO currency symbol.
LcMessages
     Answer an LcMessagesDomain that retrieves translations from the
     specified file.
LcMessagesDomain
     Retrieve the translation of the given string.(15)
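The last two entries work together: LcMessages answers a domain, and
the domain translates strings. In this sketch, 'myapp' is a
hypothetical name standing for an MO catalog installed for the current
locale; whether LcMessages expects a catalog name or a path to the MO
file is an assumption to check in the Library Reference.

```smalltalk
| domain |
PackageLoader fileInPackage: 'I18N'.

"Class-side ? on LcMessages answers an LcMessagesDomain for the
 catalog. 'myapp' is a hypothetical catalog name."
domain := I18N.LcMessages ? 'myapp'.

"Instance-side ? on the domain answers the translated string
 (? is not understood by the LcMessagesDomain class itself)."
(domain ? 'Hello, world!') displayNl.
```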
These two packages provide much more functionality, including more
advanced formatting options, support for Unicode, and conversion to and
from several character sets. For more information, refer to
“Multilingual and international support with Iconv and I18N” in the GNU
Smalltalk Library Reference.
As an aside, the package represents locales exactly as the C library
does, which has many advantages: the burden of maintaining locale data
is removed from GNU Smalltalk’s maintainers; GNU Smalltalk’s users do
not need to keep two copies of the same data; and finally, the end user
is guaranteed that different internationalized programs assume uniform
conventions.
In addition, the representation of translated strings is the standard
MO file format adopted by the GNU gettext library.
(13) All the classes mentioned in this section reside in the I18N
namespace.

(14) Character equality with = will be as fast as with ==.

(15) The ? method does not apply to the LcMessagesDomain class itself,
but only to its instances. This is because LcMessagesDomain is not a
subclass of Locale.