The following is an alphabetical list of variables that awk
sets automatically on certain occasions in order to provide
information to your program.
The variables that are specific to gawk
are marked with a pound
sign (‘#’). These variables are gawk
extensions. In other
awk
implementations or if gawk
is in compatibility
mode (see Command-Line Options), they are not special:
ARGC
, ARGV
The command-line arguments available to awk
programs are stored in
an array called ARGV
. ARGC
is the number of command-line
arguments present. See Other Command-Line Arguments.
Unlike most awk
arrays,
ARGV
is indexed from 0 to ARGC
− 1.
In the following example:
    $ awk 'BEGIN {
    >        for (i = 0; i < ARGC; i++)
    >            print ARGV[i]
    >      }' inventory-shipped mail-list
    -| awk
    -| inventory-shipped
    -| mail-list
ARGV[0]
contains ‘awk’, ARGV[1]
contains ‘inventory-shipped’, and ARGV[2]
contains
‘mail-list’. The value of ARGC
is three, one more than the
index of the last element in ARGV
, because the elements are numbered
from zero.
The names ARGC
and ARGV
, as well as the convention of indexing
the array from 0 to ARGC
− 1, are derived from the C language’s
method of accessing command-line arguments.
The value of ARGV[0]
can vary from system to system.
Also, you should note that the program text is not included in
ARGV
, nor are any of awk
’s command-line options.
See Using ARGC
and ARGV
for information
about how awk
uses these variables.
(d.c.)
ARGIND #
The index in ARGV
of the current file being processed.
Every time gawk
opens a new data file for processing, it sets
ARGIND
to the index in ARGV
of the file name.
When gawk
is processing the input files,
‘FILENAME == ARGV[ARGIND]’ is always true.
This variable is useful in file processing; it allows you to tell how far along you are in the list of data files as well as to distinguish between successive instances of the same file name on the command line.
While you can change the value of ARGIND
within your awk
program, gawk
automatically sets it to a new value when it
opens the next file.
ENVIRON
An associative array containing the values of the environment. The array
indices are the environment variable names; the elements are the values of
the particular environment variables. For example,
ENVIRON["HOME"]
might be /home/arnold
.
For POSIX awk
, changing this array does not affect the
environment passed on to any programs that awk
may spawn via
redirection or the system()
function.
However, beginning with version 4.2, if not in POSIX
compatibility mode, gawk
does update its own environment when
ENVIRON
is changed, thus changing the environment seen by programs
that it creates. You should therefore be especially careful if you
modify ENVIRON["PATH"]
, which is the search path for finding
executable programs.
This can also affect the running gawk
program, since some of the
built-in functions may pay attention to certain environment variables.
The most notable instance of this is mktime()
(see Time Functions), which pays attention to the value of the TZ
environment
variable on many systems.
Some operating systems may not have environment variables.
On such systems, the ENVIRON
array is empty (except for
ENVIRON["AWKPATH"]
and
ENVIRON["AWKLIBPATH"]
;
see The AWKPATH
Environment Variable and
The AWKLIBPATH
Environment Variable).
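A short sketch of reading ENVIRON follows. The variable name `GREETING` is hypothetical and is set only for the one command. Testing with the `in` operator first avoids creating an empty element for a variable that is not set:

```shell
# GREETING is a made-up environment variable, exported to this one awk run.
out=$(GREETING=hello awk 'BEGIN {
    if ("GREETING" in ENVIRON)                 # index is the variable name
        print "GREETING is", ENVIRON["GREETING"]
    else
        print "GREETING is not set"
}')
echo "$out"
```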
ERRNO #
If a system error occurs during a redirection for getline
, during
a read for getline
, or during a close()
operation, then
ERRNO
contains a string describing the error.
In addition, gawk
clears ERRNO
before opening each
command-line input file. This enables checking if the file is readable
inside a BEGINFILE
pattern (see The BEGINFILE
and ENDFILE
Special Patterns).
Otherwise, ERRNO
works similarly to the C variable errno
.
Except for the case just mentioned, gawk
never clears
it (sets it to zero or ""
). Thus, you should only expect its
value to be meaningful when an I/O operation returns a failure value,
such as getline
returning −1. You are, of course, free
to clear it yourself before doing an I/O operation.
If the value of ERRNO
corresponds to a system error in the C
errno
variable, then PROCINFO["errno"]
will be set to the value
of errno
. For non-system errors, PROCINFO["errno"]
will
be zero.
FILENAME
The name of the current input file. When no data files are listed
on the command line, awk
reads from the standard input and
FILENAME
is set to "-"
. FILENAME
changes each
time a new file is read (see Reading Input Files). Inside a BEGIN
rule, the value of FILENAME
is ""
, because there are no input
files being processed yet.[42] (d.c.) Note, though,
that using getline
(see Explicit Input with getline
) inside a BEGIN
rule
can give FILENAME
a value.
FNR
The current record number in the current file. awk
increments
FNR
each time it reads a new record (see How Input Is Split into Records).
awk
resets FNR
to zero each time it starts a new
input file.
NF
The number of fields in the current input record.
NF
is set each time a new record is read, when a new field is
created, or when $0
changes (see Examining Fields).
Unlike most of the variables described in this subsection,
assigning a value to NF
has the potential to affect
awk
’s internal workings. In particular, assignments
to NF
can be used to create fields in or remove fields from the
current record. See Changing the Contents of a Field.
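A small sketch of both effects, using made-up input: lowering NF discards trailing fields and rebuilds the record, and assigning to a field beyond NF raises it.

```shell
out=$(printf 'a b c d\n' | awk '{
    NF = 2        # drop fields 3 and 4; $0 is rebuilt from fields 1 and 2
    print
    $5 = "e"      # create field 5; fields 3 and 4 come into being as empty
    print NF
}')
echo "$out"
```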
FUNCTAB #
An array whose indices and corresponding values are the names of all the built-in, user-defined, and extension functions in the program.
NOTE: Attempting to use the
delete
statement with the FUNCTAB
array causes a fatal error. Any attempt to assign to an element of FUNCTAB
also causes a fatal error.
NR
The number of input records awk
has processed since
the beginning of the program’s execution
(see How Input Is Split into Records).
awk
increments NR
each time it reads a new record.
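The difference between NR and FNR shows up as soon as there is more than one input file. This sketch assumes `mktemp -d` is available; the file names `first` and `second` are arbitrary:

```shell
# Two scratch files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first"
printf 'c\n'    > "$dir/second"

# FNR restarts for each file; NR keeps counting across files.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' first second)
echo "$out"
rm -rf "$dir"
```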
PROCINFO #
The elements of this array provide access to information about the
running awk
program.
The following elements (listed alphabetically)
are guaranteed to be available:
PROCINFO["argv"]
The PROCINFO["argv"]
array contains all of the command-line arguments
(after glob expansion and redirection processing on platforms where that must
be done manually by the program) with subscripts ranging from 0 through
argc
− 1. For example, PROCINFO["argv"][0]
will contain
the name by which gawk
was invoked. Here is an example of how this
feature may be used:
    gawk '
    BEGIN {
        for (i = 0; i < length(PROCINFO["argv"]); i++)
            print i, PROCINFO["argv"][i]
    }'
Please note that this differs from the standard ARGV
array which does
not include command-line arguments that have already been processed by
gawk
(see Using ARGC
and ARGV
).
PROCINFO["egid"]
The value of the getegid()
system call.
PROCINFO["errno"]
The value of the C errno
variable when ERRNO
is set to
the associated error message.
PROCINFO["euid"]
The value of the geteuid()
system call.
PROCINFO["FS"]
This is
"FS"
if field splitting with FS
is in effect,
"FIELDWIDTHS"
if field splitting with FIELDWIDTHS
is in effect,
"FPAT"
if field matching with FPAT
is in effect,
or "API"
if field splitting is controlled by an API input parser.
PROCINFO["gid"]
The value of the getgid()
system call.
PROCINFO["identifiers"]
A subarray, indexed by the names of all identifiers used in the text of
the awk
program. An identifier is simply the name of a variable
(be it scalar or array), built-in function, user-defined function, or
extension function. For each identifier, the value of the element is
one of the following:
"array"
The identifier is an array.
"builtin"
The identifier is a built-in function.
"extension"
The identifier is an extension function loaded via
@load
or -l.
"scalar"
The identifier is a scalar.
"untyped"
The identifier is untyped (could be used as a scalar or an array;
gawk
doesn’t know yet).
"user"
The identifier is a user-defined function.
The values indicate what gawk
knows about the identifiers
after it has finished parsing the program; they are not updated
while the program runs.
PROCINFO["platform"]
This element gives a string indicating the platform for which
gawk
was compiled. The value will be one of the following:
"mingw"
Microsoft Windows, using MinGW.
"os390"
OS/390 (also known as z/OS).
"posix"
GNU/Linux, Cygwin, macOS, and legacy Unix systems.
"vms"
OpenVMS.
PROCINFO["pgrpid"]
The process group ID of the current process.
PROCINFO["pid"]
The process ID of the current process.
PROCINFO["pma"]
The version of the PMA memory allocator compiled into gawk
.
This element will not be present if the PMA allocator is not available
for use. See Preserving Data Between Runs.
PROCINFO["ppid"]
The parent process ID of the current process.
PROCINFO["strftime"]
The default time format string for strftime()
.
Assigning a new value to this element changes the default.
See Time Functions.
PROCINFO["uid"]
The value of the getuid()
system call.
PROCINFO["version"]
The version of gawk
.
The following additional elements in the array
are available to provide information about the MPFR and GMP libraries
if your version of gawk
supports arbitrary-precision arithmetic
(see Arithmetic and Arbitrary-Precision Arithmetic with gawk
):
PROCINFO["gmp_version"]
The version of the GNU MP library.
PROCINFO["mpfr_version"]
The version of the GNU MPFR library.
PROCINFO["prec_max"]
The maximum precision supported by MPFR.
PROCINFO["prec_min"]
The minimum precision required by MPFR.
The following additional elements in the array are available to provide
information about the version of the extension API, if your version
of gawk
supports dynamic loading of extension functions
(see Writing Extensions for gawk
):
PROCINFO["api_major"]
The major version of the extension API.
PROCINFO["api_minor"]
The minor version of the extension API.
On some systems, there may be elements in the array, "group1"
through "groupN"
for some N. N is the number of
supplementary groups that the process has. Use the in
operator
to test for these elements
(see Referring to an Array Element).
The following elements allow you to change gawk
’s behavior:
PROCINFO["BUFFERPIPE"]
If this element exists, all output to pipelines becomes buffered. See Speeding Up Pipe Output.
PROCINFO["command", "BUFFERPIPE"]
Make output to command buffered. See Speeding Up Pipe Output.
PROCINFO["NONFATAL"]
If this element exists, then I/O errors for all redirections become nonfatal. See Enabling Nonfatal Output.
PROCINFO["name", "NONFATAL"]
Make I/O errors for name be nonfatal. See Enabling Nonfatal Output.
PROCINFO["command", "pty"]
For two-way communication to command, use a pseudo-tty instead of setting up a two-way pipe. See Two-Way Communications with Another Process for more information.
PROCINFO["input_name", "READ_TIMEOUT"]
Set a timeout for reading from input redirection input_name. See Reading Input with a Timeout for more information.
PROCINFO["input_name", "RETRY"]
If an I/O error that may be retried occurs when reading data from
input_name, and this array entry exists, then getline
returns
−2 instead of following the default behavior of returning −1
and configuring input_name to return no further data. An I/O error
that may be retried is one where errno
has the value EAGAIN
,
EWOULDBLOCK
, EINTR
, or ETIMEDOUT
. This may be useful
in conjunction with PROCINFO["input_name", "READ_TIMEOUT"]
or situations where a file descriptor has been configured to behave in
a non-blocking fashion.
See Retrying Reads After Certain Input Errors for more information.
PROCINFO["sorted_in"]
If this element exists in PROCINFO
, its value controls the
order in which array indices will be processed by
‘for (indx in array)’ loops.
This is an advanced feature, so we defer the
full description until later; see
Using Predefined Array Scanning Orders with gawk
.
RLENGTH
The length of the substring matched by the
match()
function
(see String-Manipulation Functions).
RLENGTH
is set by invoking the match()
function. Its value
is the length of the matched string, or −1 if no match is found.
RSTART
The start index in characters of the substring that is matched by the
match()
function
(see String-Manipulation Functions).
RSTART
is set by invoking the match()
function. Its value
is the position of the string where the matched substring starts, or zero
if no match was found.
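A short sketch of how RSTART and RLENGTH work together with substr(), using a made-up string. The second call shows the values left behind by a failed match:

```shell
out=$(awk 'BEGIN {
    s = "foobarbaz"
    if (match(s, /ba+r/))                  # "bar" starts at character 4
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    match(s, /xyz/)                        # no match anywhere in s
    print RSTART, RLENGTH
}')
echo "$out"
```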
RT #
The input text that matched the text denoted by RS
,
the record separator. It is set every time a record is read.
SYMTAB #
An array whose indices are the names of all defined global variables and
arrays in the program. SYMTAB
makes gawk
’s symbol table
visible to the awk
programmer. It is built as gawk
parses the program and is complete before the program starts to run.
The array may be used for indirect access to read or write the value of a variable:
    foo = 5
    SYMTAB["foo"] = 4
    print foo    # prints 4
The isarray()
function (see Getting Type Information) may be used to test
if an element in SYMTAB
is an array.
Also, you may not use the delete
statement with the
SYMTAB
array.
Prior to version 5.0 of gawk
, you could
use an index for SYMTAB
that was not a predefined identifier:
    SYMTAB["xxx"] = 5
    print SYMTAB["xxx"]
This no longer works, instead producing a fatal error, as it led to rampant confusion.
The SYMTAB
array is more interesting than it looks. Andrew Schorr
points out that it effectively gives awk
data pointers. Consider his
example:
    # Indirect multiply of any variable by amount, return result

    function multiply(variable, amount)
    {
        return SYMTAB[variable] *= amount
    }
You would use it like this:
    BEGIN {
        answer = 10.5
        multiply("answer", 4)
        print "The answer is", answer
    }
When run, this produces:
    $ gawk -f answer.awk
    -| The answer is 42
NOTE: In order to avoid severe time-travel paradoxes,[43] neither
FUNCTAB
nor SYMTAB
is available as an element within the SYMTAB
array.
Changing NR and FNR

Assigning to NR changes the value from which awk continues counting, as this example shows:

    $ echo '1
    > 2
    > 3
    > 4' | awk 'NR == 2 { NR = 17 }
    > { print NR }'
    -| 1
    -| 17
    -| 18
    -| 19
[42] Some early implementations of Unix
awk
initialized FILENAME
to "-"
, even if there
were data files to be processed. This behavior was incorrect and should
not be relied upon in your programs.
[43] Not to mention difficult implementation issues.