Historically, awk
has converted any nonnumeric-looking string
to the numeric value zero, when required. Furthermore, the original
definition of the language and the original POSIX standards specified that
awk
only understands decimal numbers (base 10), and not octal
(base 8) or hexadecimal numbers (base 16).
Changes in the language of the
2001 and 2004 POSIX standards can be interpreted to imply that awk
should support additional features. These features are:
0xDEADBEEF
). (Note: data values, not
source code constants.)
The first problem is that both of these are clear changes to historical practice:
gawk
maintainer feels that supporting hexadecimal
floating-point values, in particular, is ugly, and was never intended by the
original designers to be part of the language.
The second problem is that the gawk
maintainer feels that this
interpretation of the standard, which required a certain amount of
“language lawyering” to arrive at in the first place, was not even
intended by the standard developers. In other words, “We see how you
got where you are, but we don’t think that that’s where you want to be.”
Recognizing these issues, but attempting to provide compatibility
with the earlier versions of the standard,
the 2008 POSIX standard added explicit wording to allow, but not require,
that awk
support hexadecimal floating-point values and
special values for “not a number” and infinity.
Although the gawk
maintainer continues to feel that
providing those features is inadvisable,
nevertheless, on systems that support IEEE floating point, it seems
reasonable to provide some way to support NaN and infinity values.
The solution implemented in gawk
is as follows:
gawk
becomes
“hands off.” String values are passed directly to the system library’s
strtod()
function, and if it successfully returns a numeric value,
that is what’s used.105
By definition, the results are not portable across
different systems. They are also a little surprising:
$ echo nanny | gawk --posix '{ print $1 + 0 }' -| nan $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }' -| 3735928559
gawk
interprets the four string values
‘+inf’,
‘-inf’,
‘+nan’,
and
‘-nan’
specially, producing the corresponding special numeric values.
The leading sign acts a signal to gawk
(and the user)
that the value is really numeric. Hexadecimal floating point is
not supported (unless you also use --non-decimal-data,
which is not recommended). For example:
$ echo nanny | gawk '{ print $1 + 0 }' -| 0 $ echo +nan | gawk '{ print $1 + 0 }' -| +nan $ echo 0xDeadBeef | gawk '{ print $1 + 0 }' -| 0
gawk
ignores case in the four special values.
Thus, ‘+nan’ and ‘+NaN’ are the same.
Besides handling input, gawk
also needs to print “correct” values on
output when a value is either NaN or infinity. Starting with version
4.2.2, for such values gawk
prints one of the four strings
just described: ‘+inf’, ‘-inf’, ‘+nan’, or ‘-nan’.
Similarly, in POSIX mode, gawk
prints the result of
the system’s C printf()
function using the %g
format string
for the value, whatever that may be.
NOTE: The sign used for NaN values can vary! The result depends upon both the underlying system architecture and the underlying library used to format NaN values. In particular, it’s possible to get different results for the same function call depending upon whether or not
gawk
is running in MPFR mode (-M) or not. Caveat Emptor!