awk
Programs ¶You may produce execution traces of your awk
programs.
This is done by passing the option --profile to gawk
.
When gawk
has finished running, it creates a profile of your program in a file
named awkprof.out. Because it is profiling, it also executes up to 45% slower than
gawk
normally does.
As shown in the following example,
the --profile option can be used to change the name of the file
where gawk
will write the profile:
gawk --profile=myprog.prof -f myprog.awk data1 data2
In the preceding example, gawk
places the profile in
myprog.prof instead of in awkprof.out.
Here is a sample session showing a simple awk
program,
its input data, and the results from running gawk
with the
--profile option. First, the awk
program:
BEGIN { print "First BEGIN rule" } END { print "First END rule" } /foo/ { print "matched /foo/, gosh" for (i = 1; i <= 3; i++) sing() } { if (/foo/) print "if is true" else print "else is true" } BEGIN { print "Second BEGIN rule" } END { print "Second END rule" } function sing( dummy) { print "I gotta be me!" }
Following is the input data:
foo bar baz foo junk
Here is the awkprof.out that results from running the
gawk
profiler on this program and data (this example also
illustrates that awk
programmers sometimes get up very early
in the morning to work):
# gawk profile, created Mon Sep 29 05:16:21 2014 # BEGIN rule(s) BEGIN { 1 print "First BEGIN rule" } BEGIN { 1 print "Second BEGIN rule" } # Rule(s) 5 /foo/ { # 2 2 print "matched /foo/, gosh" 6 for (i = 1; i <= 3; i++) { 6 sing() } } 5 { 5 if (/foo/) { # 2 2 print "if is true" 3 } else { 3 print "else is true" } } # END rule(s) END { 1 print "First END rule" } END { 1 print "Second END rule" } # Functions, listed alphabetically 6 function sing(dummy) { 6 print "I gotta be me!" }
This example illustrates many of the basic features of profiling output. They are as follows:
BEGIN
rules,
BEGINFILE
rules,
pattern–action rules,
ENDFILE
rules, END
rules, and functions, listed
alphabetically.
Multiple BEGIN
and END
rules retain their
separate identities, as do
multiple BEGINFILE
and ENDFILE
rules.
if
-else
statement shows how many times
the condition was tested.
To the right of the opening left brace for the if
’s body
is a count showing how many times the condition was true.
The count for the else
indicates how many times the test failed.
for
or while
) shows how many times the loop test was executed.
(Because of this, you can’t just look at the count on the first
statement in a rule to determine how many times the rule was executed.
If the first statement is a loop, the count is misleading.)
function
keyword indicates how many times the function was called.
The counts next to the statements in the body show how many times
those statements were executed.
if
, else
, or loop is only a single statement.
print
and printf
only when
the print
or printf
statement is followed by a redirection.
Similarly, if
the target of a redirection isn’t a scalar, it gets parenthesized.
gawk
supplies leading comments in
front of the BEGIN
and END
rules,
the BEGINFILE
and ENDFILE
rules,
the pattern–action rules, and the functions.
awk
namespace are listed first, in alphabetical order. Then come the
functions in namespaces. The namespaces are listed in alphabetical order,
and the functions within each namespace are listed alphabetically.
The profiled version of your program may not look exactly like what you
typed when you wrote it. This is because gawk
creates the
profiled version by “pretty-printing” its internal representation of
the program. The advantage to this is that gawk
can produce
a standard representation.
Also, things such as:
/foo/
come out as:
/foo/ { print }
which is correct, but possibly unexpected. (If a program uses both ‘print $0’ and plain ‘print’, that distinction is retained.)
Besides creating profiles when a program has completed,
gawk
can produce a profile while it is running.
This is useful if your awk
program goes into an
infinite loop and you want to see what has been executed.
To use this feature, run gawk
with the --profile
option in the background:
$ gawk --profile -f myprog & [1] 13992
The shell prints a job number and process ID number; in this case, 13992.
Use the kill
command to send the USR1
signal
to gawk
:
$ kill -USR1 13992
As usual, the profiled version of the program is written to awkprof.out, or to a different file if one was specified with the --profile option.
Along with the regular profile, as shown earlier, the profile file includes a trace of any active functions:
# Function Call Stack: # 3. baz # 2. bar # 1. foo # -- main --
You may send gawk
the USR1
signal as many times as you like.
Each time, the profile and function call trace are appended to the output
profile file.
If you use the HUP
signal instead of the USR1
signal,
gawk
produces the profile and the function call trace and then exits.
When gawk
runs on MS-Windows systems, it uses the
INT
and QUIT
signals for producing the profile, and in
the case of the INT
signal, gawk
exits. This is
because these systems don’t support the kill
command, so the
only signals you can deliver to a program are those generated by the
keyboard. The INT
signal is generated by the
Ctrl-c or Ctrl-BREAK key, while the
QUIT
signal is generated by the Ctrl-\ key.
Finally, gawk
also accepts another option, --pretty-print.
When called this way, gawk
“pretty-prints” the program into
awkprof.out, without any execution counts.
NOTE: Once upon a time, the --pretty-print option would also run your program. This is no longer the case.
There is a significant difference between the output created when
profiling, and that created when pretty-printing. Pretty-printed output
preserves the original comments that were in the program, although their
placement may not correspond exactly to their original locations in the
source code. However, no comments should be lost.
Also, gawk
does the best it can to preserve
the distinction between comments at the end of a statement and comments
on lines by themselves. This isn’t always perfect, though.
However, as a deliberate design decision, profiling output omits the original program’s comments. This allows you to focus on the execution count data and helps you avoid the temptation to use the profiler for pretty-printing.
Additionally, pretty-printed output does not have the leading indentation that the profiling output does. This makes it easy to pretty-print your code once development is completed, and then use the result as the final version of your program.
Because the internal representation of your program is formatted to
recreate an awk
program, profiling and pretty-printing
automatically disable gawk
’s default optimizations.
Profiling and pretty-printing also preserve the original format of numeric constants; if you used an octal or hexadecimal value in your source code, it will appear that way in the output.