The Emacs regexp implementation, like many of its kind, is generally robust but occasionally causes trouble in one of two ways: matching may run out of internal stack space and signal an error, or it may take a long time to complete. The advice below will make these symptoms less likely and help alleviate problems that do arise.
Anchor regexps at the beginning of a line, string or buffer using zero-width assertions (‘^’ and ‘\`’). This takes advantage of fast paths in the implementation and can avoid futile matching attempts. Other zero-width assertions may also bring benefits by causing a match to fail early.
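As a minimal sketch of anchoring, the function below tests whether a string begins with a ‘key=’ prefix. The function name and the prefix format are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-starts-with-key-p (string)
  "Return non-nil if STRING begins with a word followed by `='."
  ;; The leading \\` anchors the regexp at the start of the string, so a
  ;; failed match need not be retried at every later position.
  (string-match-p "\\`[[:alnum:]_]+=" string))

(my-starts-with-key-p "name=value")    ; matches at index 0
(my-starts-with-key-p "  name=value")  ; nil: the anchor rules it out at once
```

Without the anchor, the second call would have to try the regexp at each position of the string before finding a match further in.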
(It is a trade-off: successfully matched or-patterns run faster with the most frequently matched pattern first.)
Be especially careful with nested repetitions: they can easily result in very slow matching in the presence of ambiguities. For example, ‘\(?:a*b*\)+c’ will take a long time attempting to match even a moderately long string of ‘a’s before failing. The equivalent ‘\(?:a\|b\)*c’ is much faster, and ‘[ab]*c’ better still.
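As a small check of the equivalence claimed above, the sketch below compares the three regexps on short inputs. The variable and function names are hypothetical, and the inputs are kept short deliberately, since the first regexp slows down sharply on long strings that fail to match.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-nested-re "\\(?:a*b*\\)+c"
  "Nested repetition: ambiguous, so failure triggers heavy backtracking.")
(defvar my-or-re "\\(?:a\\|b\\)*c"
  "Single repetition of an or-pattern with mutually exclusive branches.")
(defvar my-bracket-re "[ab]*c"
  "Bracket expression: no or-pattern at all.")

(defun my-all-agree-p (string)
  "Return non-nil if the three regexps give the same result on STRING."
  (let ((results (mapcar (lambda (re) (string-match-p re string))
                         (list my-nested-re my-or-re my-bracket-re))))
    (and (equal (nth 0 results) (nth 1 results))
         (equal (nth 1 results) (nth 2 results)))))

(my-all-agree-p "aabbac")  ; all three match
(my-all-agree-p "aab")     ; all three fail: there is no `c'
```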
Consider using rx (see The rx Structured Regexp Notation); it can optimize some or-patterns automatically and will never introduce capturing groups unless explicitly requested.
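The following sketch shows one way such an rx form might look. The constant, the function, and the ‘key=value’ format are hypothetical; the point is only that the single capture is requested with ‘group’, while the or-pattern is grouped without capturing.

```elisp
(require 'rx)

;; Hypothetical regexp: a key, `=', then any run of `a's and `b's.
(defconst my-key-value-re
  (rx bos
      (group (+ (any alnum "_")))  ; capture 1, requested explicitly
      "="
      (* (or "a" "b"))             ; grouped for repetition, not captured
      eos)
  "Regexp matching a hypothetical key=value string.")

(defun my-key-of (string)
  "Return the key part of STRING, or nil if STRING does not match."
  (when (string-match my-key-value-re string)
    (match-string 1 string)))

(my-key-of "mode=abba")  ; returns "mode"
(my-key-of "mode=abc")   ; returns nil
```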
If you run into regexp stack overflow despite following the above advice, don’t be afraid of performing the matching in multiple function calls, each using a simpler regexp where backtracking can more easily be contained.
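As one possible illustration of splitting the work, the sketch below extracts the text between two delimiter lines using two simple searches instead of a single regexp that spans the whole block. The function name and the ‘BEGIN’/‘END’ delimiters are hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-block-body (string)
  "Return the text between the first BEGIN line and the next END line.
Return nil if STRING contains no such block."
  ;; A single regexp such as "BEGIN\\(\\(?:.\\|\n\\)*?\\)END" has to
  ;; repeat an or-pattern over the whole body, which may consume a lot
  ;; of stack on a long block.  Two simple searches avoid that.
  (let ((start (string-match "^BEGIN\n" string)))
    (when start
      (let* ((body-start (match-end 0))
             (body-end (string-match "^END$" string body-start)))
        (when body-end
          (substring string body-start body-end))))))

(my-block-body "x\nBEGIN\nfoo\nbar\nEND\ny")  ; returns "foo\nbar\n"
(my-block-body "no block here")               ; returns nil
```

Each regexp here is anchored and contains no repetition, so backtracking stays bounded no matter how long the body is.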