Scalar objects in awk
(variables, array elements, and fields)
are dynamically typed. This means their type can change as the
program runs, from untyped before any use,(36) to string
or number, and then from string to number or number to string, as the
program progresses. (gawk
also provides regexp-typed scalars,
but let’s ignore that for now; see Strongly Typed Regexp Constants.)
You can’t do much with untyped variables, other than tell that they
are untyped. The following program tests a against "" and 0; the test
succeeds when a has never been assigned a value. It also uses the
built-in typeof() function (not presented yet; see Getting Type
Information) to show a’s type:
$ gawk 'BEGIN { print (a == "" && a == 0 ?
> "a is untyped" : "a has a type!") ; print typeof(a) }'
-| a is untyped
-| unassigned
A scalar has numeric type when assigned a numeric value, such as from a numeric constant, or from another scalar with numeric type:
$ gawk 'BEGIN { a = 42 ; print typeof(a)
> b = a ; print typeof(b) }'
-| number
-| number
Similarly, a scalar has string type when assigned a string value, such as from a string constant, or from another scalar with string type:
$ gawk 'BEGIN { a = "forty two" ; print typeof(a)
> b = a ; print typeof(b) }'
-| string
-| string
So far, this is all simple and straightforward. What happens, though,
when awk
has to process data from a user? Let’s start with
field data. What should the following command produce as output?
echo hello | awk '{ printf("%s %s < 42\n", $1, ($1 < 42 ? "is" : "is not")) }'
Since ‘hello’ is alphabetic data, awk can only do a string
comparison. Internally, it converts 42 into "42" and compares
the two string values "hello" and "42". Here’s the result:
$ echo hello | awk '{ printf("%s %s < 42\n", $1,
> ($1 < 42 ? "is" : "is not")) }'
-| hello is not < 42
However, what happens when data from a user looks like a number?
On the one hand, in reality, the input data consists of characters, not
binary numeric
values. But, on the other hand, the data looks numeric, and awk
really ought to treat it as such. And indeed, it does:
$ echo 37 | awk '{ printf("%s %s < 42\n", $1,
> ($1 < 42 ? "is" : "is not")) }'
-| 37 is < 42
Here are the rules for when awk
treats data as a number, and for when it treats data as a string.
The POSIX standard uses the term numeric string for input data that looks numeric. The ‘37’ in the previous example is a numeric string. So what is the type of a numeric string? Answer: numeric.
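As a rough illustration of what "looks numeric" means, the following sketch feeds four sample lines (chosen here for illustration only) to awk and compares each whole record against the number 37. A record can only equal 37 if awk treats it as a number, because a string comparison of, say, "+37.0" with "37" would fail. The shell variable out is just a convenience for capturing the result.

```shell
# Compare each input record against the numeric constant 37.
# Records that look numeric are compared as numbers; others as strings.
out=$(printf '%s\n' '37' '+37.0' '3.7e1' '37abc' |
      awk '{ print $0, ($0 == 37 ? "numeric string" : "plain string") }')
printf '%s\n' "$out"
# Prints:
#   37 numeric string
#   +37.0 numeric string
#   3.7e1 numeric string
#   37abc plain string
```

The last record starts with digits but has trailing letters, so it does not qualify as a numeric string and is compared as text.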
The type of a variable is important because the types of two variables determine how they are compared. Variable typing follows these definitions and rules:
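* A numeric constant or the result of a numeric operation has the
  numeric attribute.

* A string constant or the result of a string operation has the
  string attribute.

* Fields, getline input, FILENAME, ARGV elements, ENVIRON elements,
  and the elements of an array created by match(), split(), and
  patsplit() that are numeric strings have the strnum attribute.(37)
  Otherwise, they have the string attribute. Uninitialized variables
  also have the strnum attribute.

* Attributes propagate across assignments but are not changed by
  any use.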
The last rule is particularly important. In the following program,
a
has numeric type, even though it is later used in a string
operation:
BEGIN {
    a = 12.345
    b = a " is a cute number"
    print b
}
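As a further sketch of how an attribute survives assignment and later use, this time starting from a field rather than a constant, the following command copies a numeric-looking field into a variable, uses that variable in a string concatenation, and then compares it with a number. The variable names x and s and the input value 10 are illustrative choices only.

```shell
# x receives the strnum attribute from $1; concatenating x into s
# does not turn x into a string, so the comparison stays numeric.
out=$(echo 10 | awk '{
    x = $1              # x inherits the attribute of the field
    s = x " apples"     # string use of x; x itself is unchanged
    print (x < 9 ? "string comparison" : "numeric comparison")
}')
printf '%s\n' "$out"
# Prints: numeric comparison
```

Had x been compared as a string, "10" would sort before "9" and the first branch would have been taken.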
When two operands are compared, either string comparison or numeric comparison may be used. This depends upon the attributes of the operands, according to the following symmetric matrix:
        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------
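The matrix can be exercised with a short sketch. It assumes the input line ‘10’, so that $1 is a strnum, and uses the constants 9 and "9" because the numeric and string orderings of 10 and 9 disagree: 10 < 9 is false as numbers, but "10" < "9" is true as strings. The labels printed are illustrative only.

```shell
# One comparison per combination of operand attributes.
# A result of 0 means a numeric comparison was done, 1 a string comparison.
out=$(echo 10 | awk '{
    print "strnum < numeric:", ($1 < 9)       # numeric comparison
    print "strnum < string:", ($1 < "9")      # string comparison
    print "numeric < string:", (10 < "9")     # string comparison
    print "string < string:", ("10" < "9")    # string comparison
}')
printf '%s\n' "$out"
# Prints:
#   strnum < numeric: 0
#   strnum < string: 1
#   numeric < string: 1
#   string < string: 1
```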
The basic idea is that user input that looks numeric—and only
user input—should be treated as numeric, even though it is actually
made of characters and is therefore also a string.
Thus, for example, the string constant " +3.14", when it appears in
program source code, is a string (even though it looks numeric) and
is never treated as a number for comparison purposes.
In short, when one operand is a “pure” string, such as a string
constant, then a string comparison is performed. Otherwise, a
numeric comparison is performed.
(The primary difference between a number and a strnum is that
for strnums gawk
preserves the original string value that
the scalar had when it came in.)
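A minimal sketch of that difference: printing a numeric-looking field directly shows the text exactly as it arrived, while adding zero yields a plain number that is formatted afresh on output. The input values here are illustrative assumptions.

```shell
# Each output line shows a field as it arrived, then its numeric value.
out=$(echo ' +3.14 1e2' | awk '{
    print $1, $1 + 0    # original text, then the number it represents
    print $2, $2 + 0
}')
printf '%s\n' "$out"
# Prints:
#   +3.14 3.14
#   1e2 100
```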
This point bears additional emphasis: Input that looks numeric is numeric. All other input is treated as strings.
Thus, the six-character input string ‘ +3.14’ receives the
strnum attribute. In contrast, the eight characters
" +3.14"
appearing in program text comprise a string constant.
The following examples print ‘1’ when the comparison between
the two different constants is true, and ‘0’ otherwise:
$ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    True
-| 1
$ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     False
-| 0
$ echo ' +3.14' | awk '{ print($0 == "3.14") }'      False
-| 0
$ echo ' +3.14' | awk '{ print($0 == 3.14) }'        True
-| 1
$ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    False
-| 0
$ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     True
-| 1
$ echo ' +3.14' | awk '{ print($1 == "3.14") }'      False
-| 0
$ echo ' +3.14' | awk '{ print($1 == 3.14) }'        True
-| 1
You can see the type of an input field (or other user input)
using typeof():
$ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
-| string strnum
(36) gawk calls this unassigned, as the following example shows.

(37) Thus, a POSIX numeric string and gawk’s strnum are the same thing.