Where you are can matter when it comes to converting between numbers and
strings. The local character set and language (the locale) can affect
numeric formats. In particular, for awk programs, it affects the decimal
point character and the thousands-separator character. The "C" locale
uses the period character (‘.’) as the decimal point and has no
thousands separator. Most English-language locales also use the period
as the decimal point, and many of them use the comma (‘,’) as the
thousands separator. However, many (if not most) European and
non-English locales use the comma as the decimal point character.
European locales often use either a space or a period as the thousands
separator, if they have one.
The POSIX standard says that awk always uses the period as the decimal
point when reading the awk program source code, and for command-line
variable assignments (see Other Command-Line Arguments). However, when
interpreting input data, for print and printf output, and for
number-to-string conversion, the local decimal point character is used.
(d.c.) In all cases, numbers in source code and in input data cannot
have a thousands separator. Here are some examples indicating the
difference in behavior, on a GNU/Linux system:
    $ export POSIXLY_CORRECT=1                        # Force POSIX behavior
    $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3.14159
    $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3,14159
    $ echo 4,321 | gawk '{ print $1 + 1 }'
    -| 5
    $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
    -| 5,321
The en_DK.utf-8 locale is for English in Denmark, where the comma acts
as the decimal point separator. In the normal "C" locale, gawk treats
‘4,321’ as 4, while in the Danish locale, it’s treated as the full
number including the fractional part, 4.321.
Some earlier versions of gawk fully complied with this aspect of the
standard. However, many users in non-English locales complained about
this behavior, because their data used a period as the decimal point, so
the default behavior was restored to use a period as the decimal point
character. You can use the --use-lc-numeric option (see Command-Line
Options) to force gawk to use the locale’s decimal point character.
(gawk also uses the locale’s decimal point character when in POSIX mode,
either via --posix or the POSIXLY_CORRECT environment variable, as shown
previously.)
Table 6.1 describes the cases in which the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.
Feature | Default | --posix or --use-lc-numeric |
---|---|---|
%'g | Use locale | Use locale |
%g | Use period | Use locale |
Input | Use period | Use locale |
strtonum() | Use period | Use locale |
Finally, modern-day formal standards and the IEEE standard floating-point
representation can have an unusual but important effect on the way gawk
converts some special string values to numbers. The details are
presented in Standards Versus Existing Practice.