11.5 Exercises

  1. Rewrite cut.awk (see Cutting Out Fields and Columns) using split() with "" as the separator.
  2. In Searching for Regular Expressions in Files, we mentioned that ‘egrep -i’ could be simulated in versions of awk without IGNORECASE by using tolower() on the line and the pattern. In a footnote there, we also mentioned that this solution has a bug: the translated line is output, and not the original one. Fix this problem.
  3. POSIX versions of grep accept a -F option which causes grep to match fixed strings. Add support for this to egrep.awk.
  4. Similarly, POSIX versions of grep allow you to provide multiple patterns to match, in either of two ways. You may provide a quoted string on the command line where the patterns are separated by newlines. Or you may use the -f option to provide a file containing the patterns, one per line. Implement both of these features.
  5. The POSIX version of id takes options that control which information is printed. Modify the awk version (see Printing Out User Information) to accept the same arguments and perform in the same way.
  6. The split.awk program (see Splitting a Large File into Pieces) assumes that letters are contiguous in the character set, which isn’t true for EBCDIC systems. Fix this problem. (Hint: Consider a different way to work through the alphabet, without relying on ord() and chr().)
  7. In uniq.awk (see Printing Nonduplicated Lines of Text), the logic for choosing which lines to print represents a state machine, which is “a device which can be in one of a set number of stable conditions depending on its previous condition and on the present values of its inputs.”(84) Brian Kernighan suggests that “an alternative approach to state machines is to just read the input into an array, then use indexing. It’s almost always easier code, and for most inputs where you would use this, just as fast.” Rewrite the logic to follow this suggestion.
  8. Why can’t the wc.awk program (see Counting Things) just use the value of FNR in endfile()? Hint: Examine the code in Noting Data file Boundaries.
  9. Manipulation of individual characters in the translate program (see Transliterating Characters) is painful using standard awk functions. Given that gawk can split strings into individual characters using "" as the separator, how might you use this feature to simplify the program?
  10. The extract.awk program (see Extracting Programs from Texinfo Source Files) was written before gawk had the gensub() function. Use it to simplify the code.
  11. Compare the performance of the awksed.awk program (see A Simple Stream Editor) with the more straightforward:
    BEGIN {
        pat = ARGV[1]
        repl = ARGV[2]
        ARGV[1] = ARGV[2] = ""
    }
    
    { gsub(pat, repl); print }
    
  12. What are the advantages and disadvantages of awksed.awk versus the real sed utility?
  13. In An Easy Way to Use Library Functions, we mentioned that things become considerably simpler if the pathto() function does not try to save the line it reads with getline when testing whether a file is accessible for use with the main program. What problem does this shortcut engender, though?
  14. As an additional example of the idea that it is not always necessary to add new features to a program, consider the idea of having two files in a directory in the search path:
    default.awk

    This file contains a set of default library functions, such as getopt() and assert().

    site.awk

    This file contains library functions that are specific to a site or installation; i.e., locally developed functions. Having a separate file allows default.awk to change with new gawk releases, without requiring the system administrator to update it each time by adding the local functions.

    One user suggested that gawk be modified to automatically read these files upon startup. Instead, it would be very simple to modify igawk to do this. Since igawk can process nested @include directives, default.awk could simply contain @include directives for the desired library functions. Make this change.

  15. Modify anagram.awk (see Finding Anagrams from a Dictionary) to avoid the use of the external sort utility.
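
As a hint for exercises 1 and 9, here is a minimal sketch of the character-splitting feature they mention. It is not a solution to either exercise. The shell function name reverse_chars and the awk variable names are hypothetical, chosen only for illustration. The sketch assumes that awk is gawk or another implementation that treats "" as "split into individual characters"; this is a common extension rather than something POSIX guarantees.

```shell
# Reverse a string by splitting it into single characters.
# ASSUMPTION: awk here is gawk (or another awk that supports split()
# with "" as the separator); this is an extension, not POSIX behavior.
reverse_chars() {
    awk -v s="$1" 'BEGIN {
        n = split(s, chars, "")     # chars[1] .. chars[n] hold one character each
        for (i = n; i >= 1; i--)    # walk the array from last element to first
            out = out chars[i]
        print out
    }'
}

reverse_chars "gawk"
```

Once the characters are in an array, character-by-character work becomes plain array indexing, which is usually easier to read than repeated calls to substr().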

Footnotes

(84)

This definition is from https://www.lexico.com/en/definition/state_machine.