getline
Here are some miscellaneous points about getline that you should bear in mind:

When getline changes the value of $0 and NF, awk does not automatically jump to the start of the program and start testing the new record against every pattern. However, the new record is tested against any subsequent rules.
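
As a small illustration of this point, the sketch below feeds two records to a three-rule program whose middle rule calls getline. The input and the rule layout are made up for the example.

```shell
# Feed two records to a three-rule program; the middle rule calls getline.
out=$(printf 'one\ntwo\n' | awk '
/two/   { print "rule 1 matched: " $0 }   # only ever tested against "one"
NR == 1 { getline }                       # replaces $0 with "two"; no restart
/two/   { print "rule 3 matched: " $0 }   # a later rule does see the new record
')
printf '%s\n' "$out"    # prints: rule 3 matched: two
```

Rule 1 never fires for "two" because that record arrived through getline after rule 1 had already been tested for the current cycle.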

Some very old awk implementations limit the number of pipelines that an awk program may have open to just one. In gawk, there is no such limit. You can open as many pipelines (and coprocesses) as the underlying operating system permits.
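
A minimal sketch of two pipelines held open at the same time, assuming an awk without the one-pipeline limit. The two echo commands are placeholders chosen for the example.

```shell
out=$(awk 'BEGIN {
    letters = "echo a; echo b"     # first command pipeline
    digits  = "echo 1; echo 2"     # second pipeline, open at the same time
    # Alternate reads between the two pipelines until one runs dry.
    while ((letters | getline x) > 0 && (digits | getline y) > 0)
        print x, y
    close(letters)
    close(digits)
}')
printf '%s\n' "$out"
```

Closing each command when done keeps the number of open pipelines small, which matters on systems with low per-process limits.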

An interesting side effect occurs if you use getline without a redirection inside a BEGIN rule. Because an unredirected getline reads from the command-line data files, the first getline call causes awk to set the value of FILENAME. Normally, FILENAME does not have a value inside BEGIN rules, because you have not yet started to process the command-line data files. (d.c.) (See The BEGIN and END Special Patterns; also see Built-in Variables That Convey Information.)
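
The following sketch shows FILENAME before and after an unredirected getline in a BEGIN rule. The scratch file is created only for the example, and the value seen before the getline may differ between awk versions.

```shell
tmp=$(mktemp)                      # scratch data file for the example
printf 'first line\n' > "$tmp"
out=$(awk 'BEGIN {
    before = FILENAME              # normally has no value inside BEGIN
    getline                        # unredirected: reads from the data file
    printf "before=[%s] after=[%s]\n", before, FILENAME
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

After the getline, FILENAME holds the name of the data file even though no main rule has run yet.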

Using FILENAME with getline (‘getline < FILENAME’) is likely to be a source of confusion. awk opens a separate input stream from the current input file. However, by not using a variable, $0 and NF are still updated. If you’re doing this, it’s probably by accident, and you should reconsider what it is you’re trying to accomplish.
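
To see why this is confusing, consider the sketch below, which uses a three-line scratch file invented for the example.

```shell
tmp=$(mktemp)
printf 'A\nB\nC\n' > "$tmp"
out=$(awk '
NR == 3 {
    # The main input is at record 3 ("C"), but this opens a second,
    # independent stream on the same file, which starts at the top.
    if ((getline < FILENAME) > 0)
        print "record " NR " became: " $0
}' "$tmp")
printf '%s\n' "$out"    # prints: record 3 became: A
rm -f "$tmp"
```

NR still says 3, yet $0 now holds the first line of the file, because the redirected getline replaced the record without advancing the main input.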

Summary of getline Variants presents a table summarizing the getline variants and which variables they can affect. It is worth noting that those variants that do not use redirection can cause FILENAME to be updated if they cause awk to start reading a new input file.
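
A short sketch of the FILENAME update, assuming two one-line data files created just for the example:

```shell
dir=$(mktemp -d)
printf 'a1\n' > "$dir/one.txt"
printf 'b1\n' > "$dir/two.txt"
out=$(awk '{
    before = FILENAME
    if ((getline) > 0) {           # runs past the end of one.txt
        which = (before == FILENAME) ? "same file" : "new file"
        print which ": " $0
    }
}' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$out"    # prints: new file: b1
rm -rf "$dir"
```

The unredirected getline exhausts the first file, so awk moves on to the second one and updates FILENAME as part of the same call.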

getline is not a statement (unlike print); it’s an expression. It has a result value, and can be used as part of a larger expression, in control statements, and so on. Here are examples of the “read until EOF/error” idiom:

    while ("sort FILE" | getline line > 0)
        print line

    while (getline line < "file.txt" > 0)
        print line

If you need to test the error code for being less than zero, you need to enclose getline in parentheses, to avoid it being interpreted as input redirection:
if ((getline VAR) < 0) print "Read error";

It is, in fact, best to parenthesize calls to getline in all control expressions, as some versions of awk require this. Thus, the previous examples are best written this way:

    while (("sort FILE" | getline line) > 0)
        print line

    while ((getline line < "file.txt") > 0)
        print line
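
The parenthesized forms can be tried end to end with a sketch like the following. The command string and the missing path are placeholders chosen for the example; the path is assumed not to exist.

```shell
out=$(awk 'BEGIN {
    cmd = "(echo b; echo a) | sort"
    while ((cmd | getline line) > 0)     # loop ends on EOF (0) or error (< 0)
        print line
    close(cmd)

    missing = "/no-such-dir/no-such-file"   # hypothetical path, assumed absent
    if ((getline line < missing) < 0)        # negative result signals an error
        print "read error"
}')
printf '%s\n' "$out"
```

Comparing against zero in both directions only works because the parentheses keep the comparison operator from being read as part of a redirection.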

If the variable being assigned is an expression with side effects, different versions of awk behave differently upon encountering end-of-file. Some versions don’t evaluate the expression; many versions (including gawk) do. Here is an example, courtesy of Duncan Moore:

    BEGIN {
        system("echo 1 > f")
        while ((getline a[++c] < "f") > 0) { }
        print c
    }

Here, the side effect is the ‘++c’. Is c incremented if end-of-file is encountered before the element in a is assigned? Although getline is not written with function-call parentheses, gawk evaluates the expression ‘a[++c]’ before attempting to read from f. However, some versions of awk only evaluate the expression once they know that there is a string value to be assigned.
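
One way to sidestep the difference (a suggestion, not something the text above prescribes) is to read into a plain variable and apply the side effect only after getline reports success. The sketch below assumes a one-line scratch file; the names file, line, a, and c are illustrative.

```shell
tmp=$(mktemp)
printf '1\n' > "$tmp"
out=$(awk -v file="$tmp" 'BEGIN {
    # Read into a plain variable first; bump the index only after
    # getline has reported that a record was actually read.
    while ((getline line < file) > 0)
        a[++c] = line
    close(file)
    print c
}')
printf '%s\n' "$out"    # prints: 1
rm -f "$tmp"
```

Because ‘++c’ now runs only inside the loop body, the count no longer depends on when a given awk evaluates the target of getline.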