Consider the following:
echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the sub() function to make a change to the input
record. (sub() replaces the first instance of any text matched
by the first argument with the string provided as the second argument;
see String-Manipulation Functions.) Here, the regexp /a+/ indicates “one
or more ‘a’ characters,” and the replacement text is ‘<A>’.
The input contains four ‘a’ characters. awk (and POSIX) regular
expressions always match the leftmost, longest sequence of input
characters that can match. Thus, all four ‘a’ characters are
replaced with ‘<A>’ in this example:
$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd
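The rule has two parts, and "leftmost" takes priority over "longest." As a small sketch (the input string abaaaa is made up for illustration and does not come from the text above), the same sub() call applied to a line where a longer run of ‘a’ characters appears later still replaces only the earlier, shorter run:

```shell
# The first 'a' starts the leftmost possible match, so it wins even
# though a longer run of four 'a' characters follows the 'b'.
echo abaaaa | awk '{ sub(/a+/, "<A>"); print }'
# prints: <A>baaaa
```

Among all matches that begin at the leftmost possible position, the longest one is chosen; a longer match that begins further right is never preferred.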
For simple match/no-match tests, this is not so important. But when doing
text matching and substitutions with the match(), sub(), gsub(),
and gensub() functions, it is very important.
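The following sketch shows how the rule surfaces in three of these functions. The sample inputs are invented for illustration, and the alternation case assumes an awk whose regexp matcher follows the POSIX leftmost-longest rule described above:

```shell
# match() sets RSTART and RLENGTH to the start and length of the match;
# the whole run of four 'a' characters is reported.
echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }'
# prints: 1 4

# With alternation, the longest alternative is used, not the first one
# listed: /ab|abcd/ matches all of "abcd".
echo abcd | awk '{ sub(/ab|abcd/, "<X>"); print }'
# prints: <X>

# gsub() applies the same rule at each successive match; its return
# value is the number of substitutions made.
echo aaaabaa | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }'
# prints: 2 <A>b<A>
```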
Understanding this principle is also important for regexp-based record
and field splitting (see How Input Is Split into Records,
and also see Specifying How Fields Are Separated).
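As a brief illustration for field splitting (the separator regexp and the input line here are assumptions chosen for the example), a field separator of a comma followed by any number of spaces consumes every space after the comma, so the second field begins at the first nonspace character:

```shell
# FS is the regexp ", *". The longest match takes the comma and all
# three spaces, leaving no leading whitespace in $2.
echo 'a,   b' | awk -F', *' '{ print "[" $2 "]" }'
# prints: [b]
```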