Modern systems support the notion of locales: a way to tell the
system about the local character set and language. The ISO C standard
defines a default "C" locale, which is an environment that is
typical of what many C programmers are used to.
Once upon a time, the locale setting affected regexp matching, but this is no longer true (see Regexp Ranges and Locales: A Long Sad Story).
Locales can affect record splitting. For the normal case of ‘RS =
"\n"’, the locale is largely irrelevant. For other single-character
record separators, setting ‘LC_ALL=C’ in the environment will
give you much better performance when reading records. Otherwise,
gawk has to make several function calls, per input
character, to find the record terminator.
Locales can affect how dates and times are formatted (see Time Functions). For example, a common way to abbreviate the date September
4, 2015, in the United States is “9/4/15.” In many countries in
Europe, however, it is abbreviated “4.9.15.” Thus, the ‘%x’
specification in a "US" locale might produce ‘9/4/15’,
while in a "EUROPE" locale, it might produce ‘4.9.15’.
According to POSIX, string comparison is also affected by locales (similar to regular expressions). The details are presented in String Comparison Based on Locale Collating Order.
Finally, the locale affects the value of the decimal point character
used when gawk parses input data. This is discussed in detail
in Conversion of Strings and Numbers.