This section describes a gawk-specific feature.
As we saw in the previous section, regexp constants (/…/) hold a strange position in the awk language. In most contexts, they act like an expression: ‘$0 ~ /…/’. In other contexts, they denote only a regexp to be matched. In no case are they really a “first class citizen” of the language. That is, you cannot define a scalar variable whose type is “regexp” in the same sense that you can define a variable to be a number or a string:
    num = 42        # Numeric variable
    str = "hi"      # String variable
    re = /foo/      # Wrong! re is the result of $0 ~ /foo/
For a number of more advanced use cases, it would be nice to have regexp constants that are strongly typed; in other words, that denote a regexp useful for matching, and not an expression.
gawk provides this feature. A strongly typed regexp constant looks almost like a regular regexp constant, except that it is preceded by an ‘@’ sign:
    re = @/foo/     # Regexp variable
Strongly typed regexp constants cannot be used everywhere that a regular regexp constant can, because this would make the language even more confusing. Instead, you may use them only in certain contexts:
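- In the case part of a switch statement (see The switch Statement).
- As an argument to one of the built-in functions that accept regexp constants: gensub(), gsub(), match(), patsplit(), split(), and sub() (see String-Manipulation Functions).
- On the righthand side of an assignment to a variable, as in ‘some_var = @/foo/’. In this case, the type of some_var is regexp. Additionally, some_var can be used with ‘~’ and ‘!~’, passed to one of the built-in functions listed above, or passed as a parameter to a user-defined function.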
You may use the -v option (see Command-Line Options) to assign a strongly-typed regexp constant to a variable on the command line, like so:
gawk -v pattern='@/something(interesting)+/' ...
You may also make such assignments as regular command-line arguments (see Other Command-Line Arguments).
You may use the typeof() built-in function (see Getting Type Information) to determine if a variable or function parameter is a regexp variable.
The true power of this feature comes from the ability to create variables that have regexp type. Such variables can be passed on to user-defined functions, without the confusing aspects of computed regular expressions created from strings or string constants. They may also be passed through indirect function calls (see Indirect Function Calls) and on to the built-in functions that accept regexp constants.
When used in numeric conversions, strongly typed regexp variables convert to zero. When used in string conversions, they convert to the string value of the original regexp text.
There is an additional, interesting corner case. When used as the third argument to sub() or gsub(), they retain their type. Thus, if you have something like this:
    re = @/don't panic/
    sub(/don't/, "do", re)
    print typeof(re), re
then re retains its type, but now attempts to match the string ‘do panic’. This provides a (very indirect) way to create regexp-typed variables at runtime.