The sed
utility is a stream editor, a program that reads a
stream of data, makes changes to it, and passes it on.
It is often used to make global changes to a large file or to a stream
of data generated by a pipeline of commands.
Although sed
is a complicated program in its own right, its most common
use is to perform global substitutions in the middle of a pipeline:
command1 < orig.data | sed 's/old/new/g' | command2 > result
Here, ‘s/old/new/g’ tells sed
to look for the regexp
‘old’ on each input line and globally replace it with the text
‘new’ (i.e., all the occurrences on a line). This is similar to
awk
’s gsub()
function
(see String-Manipulation Functions).
The following program, awksed.awk, accepts at least two command-line arguments: the pattern to look for and the text to replace it with. Any additional arguments are treated as data file names to process. If none are provided, the standard input is used:
# awksed.awk --- do s/foo/bar/g using just print # Thanks to Michael Brennan for the idea function usage() { print "usage: awksed pat repl [files...]" > "/dev/stderr" exit 1 }
BEGIN { # validate arguments if (ARGC < 3) usage()
RS = ARGV[1] ORS = ARGV[2] # don't use arguments as files ARGV[1] = ARGV[2] = "" }
# look ma, no hands! { if (RT == "") printf "%s", $0 else print }
The program relies on gawk
’s ability to have RS
be a regexp,
as well as on the setting of RT
to the actual text that terminates the
record (see How Input Is Split into Records).
The idea is to have RS
be the pattern to look for. gawk
automatically sets $0
to the text between matches of the pattern.
This is text that we want to keep, unmodified. Then, by setting ORS
to the replacement text, a simple print
statement outputs the
text we want to keep, followed by the replacement text.
There is one wrinkle to this scheme, which is what to do if the last record
doesn’t end with text that matches RS
. Using a print
statement unconditionally prints the replacement text, which is not correct.
However, if the file did not end in text that matches RS
, RT
is set to the null string. In this case, we can print $0
using
printf
(see Using printf
Statements for Fancier Printing).
The BEGIN
rule handles the setup, checking for the right number
of arguments and calling usage()
if there is a problem. Then it sets
RS
and ORS
from the command-line arguments and sets
ARGV[1]
and ARGV[2]
to the null string, so that they are
not treated as file names
(see Using ARGC
and ARGV
).
The usage()
function prints an error message and exits.
Finally, the single rule handles the printing scheme outlined earlier,
using print
or printf
as appropriate, depending upon the
value of RT
.