B.4.4 What To Do If You Think There Is A Performance Issue

If you think that gawk is too slow at doing a particular task, you should investigate before sending in a bug report. Here are the steps to follow:

  1. Run gawk with the --profile option (see Command-Line Options) to see what your program is doing. It may be that you have written it in an inefficient manner. For example, you may be doing something for every record that needs to be done only once per file. (Use a BEGINFILE rule; see The BEGINFILE and ENDFILE Special Patterns.) Or you may be doing something for every file that needs to be done only once per run of the program. (Use a BEGIN rule; see The BEGIN and END Special Patterns.)
  2. If profiling at the awk level doesn’t help, then you will need to compile gawk itself for profiling at the C language level.

    To do that, start with the latest released version of gawk. Unpack the source code in a new directory, and configure it:

    $ tar -xpzvf gawk-X.Y.Z.tar.gz
    ...                                Output omitted
    $ cd gawk-X.Y.Z
    $ ./configure
    ...                                Output omitted
    
  3. Edit the files Makefile and support/Makefile. Change every instance of -O2 or -O to -pg. This causes gawk to be compiled for profiling.
  4. Compile the program by running the make command:

    $ make
    ...                                Output omitted
    
  5. Run the freshly compiled gawk on a real program, using real data. Using an artificial program to try to time one particular feature of gawk is useless; real awk programs generally spend most of their time doing I/O, not computing. If you want to prove that something is slow, it must be done using a real program and real data.

    Use a data file that is large enough for the statistical profiling to measure where gawk spends its time. It should be at least 100 megabytes in size.

    $ ./gawk -f realprogram.awk realdata > /dev/null
    
  6. When done, you should have a file in the current directory named gmon.out. Run the command ‘gprof gawk gmon.out > gprof.out’.
  7. Submit a bug report explaining what you think is slow. Include the gprof.out file with it.

    Preferably, you should also submit the program and the data, or else indicate where to get the data if the file is large.

  8. If you have not submitted your program and data, be prepared to apply patches and rerun the profiling in order to see if the patches were effective.

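The Makefile edit in step 3 can be done by hand in any editor. If you prefer to script it, the following is one possible sketch, not part of the gawk distribution. The function name enable_profiling_flags is hypothetical, and the sed expressions assume that -O2 and -O appear as ordinary compiler flags followed by whitespace or the end of a line. Each original file is kept with a .orig suffix so that you can review the change:

```shell
#!/bin/sh
# Sketch only: replace -O2 and -O with -pg in each Makefile named as an
# argument, keeping a backup copy in FILE.orig.
enable_profiling_flags() {
    for makefile in "$@"; do
        cp "$makefile" "$makefile.orig" || return 1
        # First -O2, then a bare -O at end of line, then a bare -O
        # followed by whitespace (the whitespace is preserved).
        sed -e 's/-O2/-pg/g' \
            -e 's/-O$/-pg/' \
            -e 's/-O\([[:space:]]\)/-pg\1/g' \
            "$makefile.orig" > "$makefile" || return 1
    done
}

# Typical use from the top of the configured gawk source tree:
#   enable_profiling_flags Makefile support/Makefile
#   diff Makefile.orig Makefile      # review what changed
```

Check the diff output before running make; if a Makefile uses an optimization flag in some other form, edit it by hand instead.
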
If you are incapable or unwilling to do the steps listed above, then you will just have to live with gawk as it is.