This section describes an advanced, gawk-specific extension.
Often, you may wish to defer the choice of function to call until runtime. For example, you may have different kinds of records, each of which should be processed differently.
Normally, you would have to use a series of if-else statements to decide which function to call. By using indirect function calls, you can specify the name of the function to call as a string variable, and then call the function. Let’s look at an example.
Suppose you have a file with your test scores for the classes you are taking, and you wish to get the sum and the average of your test scores. The first field is the class name. The following fields are the functions to call to process the data, up to a “marker” field ‘data:’. Following the marker, to the end of the record, are the various numeric test scores.
Here is the initial file:
    Biology_101 sum average data: 87.0 92.4 78.5 94.9
    Chemistry_305 sum average data: 75.2 98.3 94.7 88.2
    English_401 sum average data: 100.0 95.6 87.1 93.4
To process the data, you might write initially:
    {
        class = $1
        for (i = 2; $i != "data:"; i++) {
            if ($i == "sum")
                sum()       # processes the whole record
            else if ($i == "average")
                average()
            ...             # and so on
        }
    }
This style of programming works, but can be awkward. With indirect function calls, you tell gawk to use the value of a variable as the name of the function to call.
The syntax is similar to that of a regular function call: an identifier immediately followed by an opening parenthesis, any arguments, and then a closing parenthesis, with the addition of a leading ‘@’ character:
    the_function = "sum"
    result = @the_function()   # calls the sum() function
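As a minimal, self-contained sketch of this syntax, the following program picks the function to call from a list of names. The functions double() and square() and the array names are hypothetical, introduced only for illustration; they are not part of the test-score example.

```awk
# Hypothetical helper functions, defined only for this illustration.
function double(x) { return 2 * x }
function square(x) { return x * x }

BEGIN {
    n = split("double square", names, " ")
    for (i = 1; i <= n; i++) {
        the_function = names[i]
        # The leading @ tells gawk to call the function whose name
        # is stored in the_function.
        printf("%s(7) = %d\n", the_function, @the_function(7))
    }
}
```

Run with gawk, this should print double(7) = 14 followed by square(7) = 49. Adding another operation then only requires defining the function and adding its name to the list.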
Here is a full program that processes the previously shown data, using indirect function calls:
    # indirectcall.awk --- Demonstrate indirect function calls

    # average --- return the average of the values in fields $first - $last

    function average(first, last,   sum, i)
    {
        sum = 0;
        for (i = first; i <= last; i++)
            sum += $i

        return sum / (last - first + 1)
    }

    # sum --- return the sum of the values in fields $first - $last

    function sum(first, last,   ret, i)
    {
        ret = 0;
        for (i = first; i <= last; i++)
            ret += $i

        return ret
    }
These two functions expect to work on fields; thus, the parameters first and last indicate where in the fields to start and end. Otherwise, they perform the expected computations and are not unusual:
    # For each record, print the class name and the requested statistics
    {
        class_name = $1
        gsub(/_/, " ", class_name)  # Replace _ with spaces

        # find start
        for (i = 1; i <= NF; i++) {
            if ($i == "data:") {
                start = i + 1
                break
            }
        }

        printf("%s:\n", class_name)
        for (i = 2; $i != "data:"; i++) {
            the_function = $i
            printf("\t%s: <%s>\n", $i, @the_function(start, NF) "")
        }
        print ""
    }
This is the main processing for each record. It prints the class name (with underscores replaced with spaces). It then finds the start of the actual data, saving it in start. The last part of the code loops through each function name (from $2 up to the marker, ‘data:’), calling the function named by the field. The indirect function call itself occurs as a parameter in the call to printf. (The printf format string uses ‘%s’ as the format specifier so that we can use functions that return strings, as well as numbers. Note that the result from the indirect call is concatenated with the empty string, in order to force it to be a string value.)
Here is the result of running the program:
    $ gawk -f indirectcall.awk class_data1
    -| Biology 101:
    -|     sum: <352.8>
    -|     average: <88.2>
    -|
    -| Chemistry 305:
    -|     sum: <356.4>
    -|     average: <89.1>
    -|
    -| English 401:
    -|     sum: <376.1>
    -|     average: <94.025>
The ability to use indirect function calls is more powerful than you may think at first. The C and C++ languages provide “function pointers,” which are a mechanism for calling a function chosen at runtime. One of the most well-known uses of this ability is the C qsort() function, which sorts an array using the famous “quicksort” algorithm (see the Wikipedia article for more information). To use this function, you supply a pointer to a comparison function. This mechanism allows you to sort arbitrary data in an arbitrary fashion.
We can do something similar using gawk, like this:
    # quicksort.awk --- Quicksort algorithm, with user-supplied
    #                   comparison function

    # quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
    #               or almost any algorithms or computer science text.

    function quicksort(data, left, right, less_than,    i, last)
    {
        if (left >= right)  # do nothing if array contains fewer
            return          # than two elements

        quicksort_swap(data, left, int((left + right) / 2))
        last = left
        for (i = left + 1; i <= right; i++)
            if (@less_than(data[i], data[left]))
                quicksort_swap(data, ++last, i)
        quicksort_swap(data, left, last)
        quicksort(data, left, last - 1, less_than)
        quicksort(data, last + 1, right, less_than)
    }

    # quicksort_swap --- helper function for quicksort, should really be inline

    function quicksort_swap(data, i, j,      temp)
    {
        temp = data[i]
        data[i] = data[j]
        data[j] = temp
    }
The quicksort() function receives the data array, the starting and ending indices to sort (left and right), and the name of a function that performs a “less than” comparison. It then implements the quicksort algorithm.
To make use of the sorting function, we return to our previous example. The first thing to do is write some comparison functions:
    # num_lt --- do a numeric less than comparison

    function num_lt(left, right)
    {
        return ((left + 0) < (right + 0))
    }

    # num_ge --- do a numeric greater than or equal to comparison

    function num_ge(left, right)
    {
        return ((left + 0) >= (right + 0))
    }
The num_ge() function is needed to perform a descending sort; when used to perform a “less than” test, it actually does the opposite (greater than or equal to), which yields data sorted in descending order.
Next comes a sorting function. It is parameterized with the starting and ending field numbers and the comparison function. It builds an array with the data and calls quicksort() appropriately, and then formats the results as a single string:
    # do_sort --- sort the data according to `compare'
    #             and return it as a string

    function do_sort(first, last, compare,      data, i, retval)
    {
        delete data
        for (i = 1; first <= last; first++) {
            data[i] = $first
            i++
        }

        quicksort(data, 1, i-1, compare)

        retval = data[1]
        for (i = 2; i in data; i++)
            retval = retval " " data[i]

        return retval
    }
Finally, the two sorting functions call do_sort(), passing in the names of the two comparison functions:
    # sort --- sort the data in ascending order and return it as a string

    function sort(first, last)
    {
        return do_sort(first, last, "num_lt")
    }

    # rsort --- sort the data in descending order and return it as a string

    function rsort(first, last)
    {
        return do_sort(first, last, "num_ge")
    }
Here is an extended version of the data file:
    Biology_101 sum average sort rsort data: 87.0 92.4 78.5 94.9
    Chemistry_305 sum average sort rsort data: 75.2 98.3 94.7 88.2
    English_401 sum average sort rsort data: 100.0 95.6 87.1 93.4
Finally, here are the results when the enhanced program is run:
    $ gawk -f quicksort.awk -f indirectcall.awk class_data2
    -| Biology 101:
    -|     sum: <352.8>
    -|     average: <88.2>
    -|     sort: <78.5 87.0 92.4 94.9>
    -|     rsort: <94.9 92.4 87.0 78.5>
    -|
    -| Chemistry 305:
    -|     sum: <356.4>
    -|     average: <89.1>
    -|     sort: <75.2 88.2 94.7 98.3>
    -|     rsort: <98.3 94.7 88.2 75.2>
    -|
    -| English 401:
    -|     sum: <376.1>
    -|     average: <94.025>
    -|     sort: <87.1 93.4 95.6 100.0>
    -|     rsort: <100.0 95.6 93.4 87.1>
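The comparison function does not have to be numeric. The following sketch passes string-based comparison functions by name to a sort routine. To stay self-contained it uses a simple insertion sort rather than quicksort(); the names isort(), str_lt(), len_lt(), and join() are hypothetical and appear only in this illustration.

```awk
# isort --- hypothetical insertion sort over data[1..n];
#           less_than holds the name of the comparison function

function isort(data, n, less_than,    i, j, temp)
{
    for (i = 2; i <= n; i++) {
        temp = data[i]
        # shift larger elements right until temp's position is found
        for (j = i - 1; j >= 1 && @less_than(temp, data[j]); j--)
            data[j + 1] = data[j]
        data[j + 1] = temp
    }
}

# str_lt --- compare as strings (concatenating "" forces string values)

function str_lt(left, right)
{
    return ((left "") < (right ""))
}

# len_lt --- compare by string length

function len_lt(left, right)
{
    return (length(left) < length(right))
}

# join --- return data[1..n] as one space-separated string

function join(data, n,    i, result)
{
    result = data[1]
    for (i = 2; i <= n; i++)
        result = result " " data[i]
    return result
}

BEGIN {
    n = split("pear fig banana kiwi", fruit, " ")
    isort(fruit, n, "str_lt")
    print join(fruit, n)

    n = split("pear fig banana kiwi", fruit, " ")
    isort(fruit, n, "len_lt")
    print join(fruit, n)
}
```

The first line printed should be banana fig kiwi pear, and the second fig pear kiwi banana. Only the name passed to isort() changes between the two calls; the sort routine itself is untouched.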
Another example where indirect function calls are useful can be found in processing arrays. This is described in Traversing Arrays of Arrays.
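As a rough sketch of that idea (not the code from the referenced section), the function below visits every scalar element of a possibly nested array and hands it to a callback chosen by name. The names walk(), show(), and tally(), and the global variable total, are assumptions made for this illustration.

```awk
# walk --- hypothetical helper: call the function named by process
#          on each scalar element, descending into subarrays

function walk(arr, name, process,    i)
{
    for (i in arr) {
        if (isarray(arr[i]))
            walk(arr[i], name "[" i "]", process)
        else
            @process(name "[" i "]", arr[i])
    }
}

# show --- callback that prints one element

function show(label, value)
{
    printf("%s = %s\n", label, value)
}

# tally --- callback that adds each value to the global total

function tally(label, value)
{
    total += value
}

BEGIN {
    scores["bio"][1] = 87.0
    scores["bio"][2] = 92.4
    scores["chem"][1] = 75.2

    walk(scores, "scores", "show")    # element order is not guaranteed
    walk(scores, "scores", "tally")
    print "total:", total
}
```

The traversal code is written once; what happens to each element is decided by the function name supplied in the third argument.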
Remember that you must supply a leading ‘@’ in front of an indirect function call.
Starting with version 4.1.2 of gawk, indirect function calls may also be used with built-in functions and with extension functions (see Writing Extensions for gawk). There are some limitations when calling built-in functions indirectly, as follows.
- You cannot pass a regular expression constant to a built-in function through an indirect function call. This applies to the sub(), gsub(), gensub(), match(), split(), and patsplit() functions. However, you can pass a strongly typed regexp constant (see Strongly Typed Regexp Constants).
- If calling sub() or gsub(), you may only pass two arguments, since those functions are unusual in that they update their third argument. This means that $0 will be updated.
- When called indirectly, built-in functions cannot use $0 as a default parameter; you must supply an argument instead. For example, you must pass an argument to length() if calling it indirectly.
- Calling a built-in function indirectly with the wrong number of arguments causes a fatal error; for example, calling length() with two arguments. These errors are found at runtime instead of when gawk parses your program, since gawk doesn’t know until runtime if you have passed the correct number of arguments or not.
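A minimal sketch of calling built-in functions indirectly, assuming gawk 4.1.2 or later. The variable name fn is arbitrary. Each call supplies its arguments explicitly, and with the expected count, in line with the limitations above.

```awk
BEGIN {
    fn = "length"
    # An explicit argument is required here; an indirect call
    # cannot fall back on $0 as the default.
    print @fn("indirect")

    fn = "toupper"
    print @fn("gawk")

    fn = "substr"
    print @fn("indirect", 3, 4)
}
```

This should print 8, GAWK, and dire, one per line.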
gawk does its best to make indirect function calls efficient. For example, in the following case:

    for (i = 1; i <= n; i++)
        @the_function()

gawk looks up the actual function to call only once.