A group in a regular expression can match a (possibly empty) substring of the string that the regular expression as a whole matched. The matcher remembers the beginning and end of the substring matched by each group.
To find out what the groups matched, pass a non-null regs argument to a GNU matching or searching function (see GNU Matching and GNU Searching), i.e., the address of a structure of this type, as defined in regex.h:
    struct re_registers
    {
      unsigned num_regs;
      regoff_t *start;
      regoff_t *end;
    };
Except for (possibly) the num_regs’th element (see below), the ith element of the start and end arrays records information about the ith group in the pattern. (They’re declared as C pointers, but this is only because not all C compilers accept zero-length arrays; conceptually, it is simplest to think of them as arrays.)
The start and end arrays are allocated in one of two ways.
The simplest and perhaps most useful is to let the matcher (re)allocate enough space to record information for all the groups in the regular expression. If re_set_registers is not called before searching or matching, then the matcher allocates two arrays each of 1 + re_nsub elements (re_nsub is another field in the pattern buffer; see GNU Pattern Buffers). The extra element is set to -1. Then on subsequent calls with the same pattern buffer and regs arguments, the matcher reallocates more space if necessary.
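As a minimal sketch of the matcher-allocated case, the helper below compiles a pattern, matches it, and reads one group's registers. It assumes glibc, where defining _GNU_SOURCE before the first include exposes the GNU functions in regex.h, and it selects POSIX extended syntax so that ‘(’ and ‘)’ act as the group operators. The name match_group and its error convention are hypothetical. Releasing the arrays with free is likewise an assumption, based on the matcher obtaining them from malloc.

```c
#define _GNU_SOURCE   /* assumption: glibc; must precede every #include */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match PATTERN at the start of STR and store the
   registers of group GROUP in *START and *END.  Returns the length of
   the match, -1 if there is no match, or -2 on any other failure.  */
static int
match_group (const char *pattern, const char *str, unsigned group,
             regoff_t *start, regoff_t *end)
{
  struct re_pattern_buffer pat;
  struct re_registers regs;
  int len;

  assert (start != NULL && end != NULL);

  /* Zeroed structures: nothing has been allocated yet.  */
  memset (&pat, 0, sizeof pat);
  memset (&regs, 0, sizeof regs);

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &pat) != NULL)
    return -2;

  /* re_set_registers was not called, so the matcher allocates
     regs.start and regs.end on its own.  */
  len = re_match (&pat, str, (regoff_t) strlen (str), 0, &regs);
  if (len >= 0)
    {
      if (group <= pat.re_nsub)
        {
          *start = regs.start[group];
          *end = regs.end[group];
        }
      else
        len = -2;               /* no such group in this pattern */
    }

  free (regs.start);            /* both stay NULL if nothing matched */
  free (regs.end);
  regfree (&pat);
  return len;
}
```

Calling match_group ("((a)(b))", "ab", 3, &s, &e) would be expected to return 2 and leave 1 in s and 2 in e.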
The function:
    void re_set_registers (struct re_pattern_buffer *buffer,
                           struct re_registers *regs,
                           size_t num_regs,
                           regoff_t *starts, regoff_t *ends)
sets regs to hold num_regs registers, storing them in starts and ends. Subsequent matches using buffer and regs will use this memory for recording register information. starts and ends must be allocated with malloc, and must each be at least num_regs * sizeof (regoff_t) bytes long.
If num_regs is zero, then subsequent matches should allocate their own register data.
Unless this function is called, the first search or match using buffer will allocate its own register data, without freeing the old data.
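The caller-allocated case can be sketched in the same way. The example assumes glibc (with _GNU_SOURCE defined before the first include) and POSIX extended syntax. The helper count_matched_groups is hypothetical, and sizing the arrays at re_nsub + 1 elements is a choice made for this illustration rather than a documented requirement.

```c
#define _GNU_SOURCE   /* assumption: glibc; must precede every #include */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return how many groups of PATTERN took part in a
   match at the start of STR, or -1 if there was no match or an error.  */
static int
count_matched_groups (const char *pattern, const char *str)
{
  struct re_pattern_buffer pat;
  struct re_registers regs;
  regoff_t *starts, *ends;
  size_t n, i;
  int count = -1;

  assert (pattern != NULL && str != NULL);
  memset (&pat, 0, sizeof pat);

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &pat) != NULL)
    return -1;

  /* One register for the whole match plus one per group.  */
  n = pat.re_nsub + 1;
  starts = malloc (n * sizeof *starts);
  ends = malloc (n * sizeof *ends);
  if (starts != NULL && ends != NULL)
    {
      /* From here on, matches with PAT and REGS record into our arrays.  */
      re_set_registers (&pat, &regs, (unsigned) n, starts, ends);
      if (re_match (&pat, str, (regoff_t) strlen (str), 0, &regs) >= 0)
        {
          count = 0;
          for (i = 1; i < n; i++)
            if (regs.start[i] != -1)
              count++;
        }
    }

  free (starts);                /* the caller owns these arrays */
  free (ends);
  regfree (&pat);
  return count;
}
```

Because the arrays belong to the caller here, the caller also decides when to free them; the matcher only writes into them.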
The following examples illustrate the information recorded in the re_registers structure. (In all of them, ‘(’ represents the open-group and ‘)’ the close-group operator. The first character in the string being matched, called string below, is at index 0.)
If the regular expression has an i-th group that matches a substring of string, then the function sets regs->start[i] to the index in string where the substring matched by the i-th group begins, and regs->end[i] to the index just beyond that substring’s end. The function sets regs->start[0] and regs->end[0] to analogous information about the entire pattern.
For example, when you match ‘((a)(b))’ against ‘ab’, you get:
    0 in regs->start[0] and 2 in regs->end[0]
    0 in regs->start[1] and 2 in regs->end[1]
    0 in regs->start[2] and 1 in regs->end[2]
    1 in regs->start[3] and 2 in regs->end[3]
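If a group matches more than once (as it might if followed by, e.g., a repetition operator), then the function reports the information about what the group last matched.
For example, when you match the pattern ‘(a)*’ against the string ‘aa’, you get:
    0 in regs->start[0] and 2 in regs->end[0]
    1 in regs->start[1] and 2 in regs->end[1]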
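If the i-th group does not participate in a successful match, e.g., because a repetition operator allows zero repetitions of it, then the function sets regs->start[i] and regs->end[i] to -1.
For example, when you match the pattern ‘(a)*b’ against the string ‘b’, you get:
    0 in regs->start[0] and 1 in regs->end[0]
    -1 in regs->start[1] and -1 in regs->end[1]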
If the i-th group matches a zero-length string, then the function sets regs->start[i] and regs->end[i] to the index just beyond that zero-length string.
For example, when you match the pattern ‘(a*)b’ against the string ‘b’, you get:
    0 in regs->start[0] and 1 in regs->end[0]
    0 in regs->start[1] and 0 in regs->end[1]
If an i-th group contains a j-th group and the function reports a match of the i-th group, then it records in regs->start[j] and regs->end[j] the last match (if it matched) of the j-th group.
For example, when you match the pattern ‘((a*)b)*’ against the string ‘abb’, group 2 last matches the empty string, so its registers record that empty match:
    0 in regs->start[0] and 3 in regs->end[0]
    2 in regs->start[1] and 3 in regs->end[1]
    2 in regs->start[2] and 2 in regs->end[2]
When you match the pattern ‘((a)*b)*’ against the string ‘abb’, group 2 doesn’t participate in the last match, so you get:
    0 in regs->start[0] and 3 in regs->end[0]
    2 in regs->start[1] and 3 in regs->end[1]
    0 in regs->start[2] and 1 in regs->end[2]
If an i-th group contains a j-th group and the function sets regs->start[i] and regs->end[i] to -1, then it also sets regs->start[j] and regs->end[j] to -1.
For example, when you match the pattern ‘((a)*b)*c’ against the string ‘c’, you get:
    0 in regs->start[0] and 1 in regs->end[0]
    -1 in regs->start[1] and -1 in regs->end[1]
    -1 in regs->start[2] and -1 in regs->end[2]
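The register conventions above can be turned into a small routine that extracts the text a group matched. The sketch assumes glibc (with _GNU_SOURCE defined before the first include) and POSIX extended syntax. The helper group_text is hypothetical. It treats -1 as "did not participate" and an equal start and end as an empty match, which keeps the two cases distinguishable for the caller.

```c
#define _GNU_SOURCE   /* assumption: glibc; must precede every #include */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the text matched by
   group GROUP when PATTERN is matched at the start of STR.  Returns NULL
   if there is no match, the group did not participate, or on error.  */
static char *
group_text (const char *pattern, const char *str, unsigned group)
{
  struct re_pattern_buffer pat;
  struct re_registers regs;
  char *copy = NULL;

  assert (pattern != NULL && str != NULL);
  memset (&pat, 0, sizeof pat);
  memset (&regs, 0, sizeof regs);

  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);
  if (re_compile_pattern (pattern, strlen (pattern), &pat) != NULL)
    return NULL;

  if (re_match (&pat, str, (regoff_t) strlen (str), 0, &regs) >= 0
      && group <= pat.re_nsub
      && regs.start[group] != -1)
    {
      /* end is the index just beyond the substring, so the length is
         end - start; it is 0 for a zero-length match.  */
      size_t len = (size_t) (regs.end[group] - regs.start[group]);
      copy = malloc (len + 1);
      if (copy != NULL)
        {
          memcpy (copy, str + regs.start[group], len);
          copy[len] = '\0';
        }
    }

  free (regs.start);            /* assumption: matcher used malloc */
  free (regs.end);
  regfree (&pat);
  return copy;
}
```

With this helper, group 1 of ‘(a*)b’ matched against ‘b’ would come back as an empty string, while group 1 of ‘(a)*b’ matched against ‘b’ would come back as NULL.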