sentence-end
The symbol sentence-end is bound to the pattern that marks the end of a sentence. What should this regular expression be?
Clearly, a sentence may be ended by a period, a question mark, or an exclamation mark. Indeed, in English, only clauses that end with one of those three characters should be considered the end of a sentence. This means that the pattern should include the character set:
[.?!]
However, we do not want forward-sentence merely to jump to a period, a question mark, or an exclamation mark, because such a character might be used in the middle of a sentence. A period, for example, is used after abbreviations. So other information is needed.
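A quick check in an Emacs Lisp session (for example, in the *scratch* buffer) shows the problem. The sample sentence is made up for illustration; the search uses only the bare character set:

```elisp
;; Searching for the bare character set stops at the period
;; that follows the abbreviation "Dr.", not at the end of the
;; first sentence.
(string-match "[.?!]" "Dr. Smith left.  Then he ran.")
;; ⇒ 2
```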
According to convention, you type two spaces after every sentence, but only one space after a period, a question mark, or an exclamation mark in the body of a sentence. So a period, a question mark, or an exclamation mark followed by two spaces is a good indicator of an end of sentence. However, in a file, the two spaces may instead be a tab or the end of a line. This means that the regular expression should include these three items as alternatives.
This group of alternatives will look like this:
\\($\\|	\\|  \\)
       ^   ^^
      TAB  SPC
Here, ‘$’ indicates the end of the line, and I have pointed out where the tab and two spaces are inserted in the expression. Both are inserted by putting the actual characters into the expression.
Two backslashes, ‘\\’, are required before the parentheses and vertical bars: the first backslash quotes the following backslash in the Emacs Lisp string, and the second tells the regular expression matcher that the following character, the parenthesis or the vertical bar, is special.
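The following sketch checks this group of alternatives on a made-up sample string. For visibility it writes the tab as "\t" inside the Lisp string rather than as a literal tab; the Lisp reader turns both into the same single character.

```elisp
;; In a Lisp string, "\\(" reads as two characters, a backslash
;; and an open parenthesis; the regexp matcher then treats that
;; pair as the start of a group.
(length "\\(")
;; ⇒ 2

;; The period after "Dr" is followed by only one space, so it is
;; skipped; the period after "left" is followed by two spaces.
(string-match "[.?!]\\($\\|\t\\|  \\)"
              "Dr. Smith left.  Then he ran.")
;; ⇒ 14

;; At the end of the string, the `$' alternative matches instead.
(string-match "[.?!]\\($\\|\t\\|  \\)" "Dr. Smith left.")
;; ⇒ 14
```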
Also, a sentence may be followed by one or more carriage returns, like this:
[
]*
Like tabs and spaces, a carriage return is inserted into a regular expression by inserting it literally. The asterisk indicates that the RET is repeated zero or more times.
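As a sketch on an invented sample string, here the newline is written "\n" inside the brackets, which the Lisp reader turns into the same character as a literal line break:

```elisp
;; `$' matches just before the first newline, and [\n]* then
;; absorbs all three newlines, so the match ends at "Next".
(let ((text "End.\n\n\nNext"))
  (list (string-match "[.?!]\\($\\|\t\\|  \\)[\n]*" text)
        (match-end 0)))
;; ⇒ (3 7)
```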
But a sentence end does not consist only of a period, a question mark, or an exclamation mark followed by appropriate space: a closing quotation mark or a closing brace of some kind may precede the space. Indeed, more than one such mark or brace may precede the space. These require an expression that looks like this:
[]\"')}]*
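In this expression, the first ‘]’ is the first character inside the brackets, so it is treated as a member of the set rather than as the bracket that closes it. The second character is ‘"’, which is preceded by a ‘\’ so that the Lisp reader does not take the ‘"’ as the end of the string. The last three characters are ‘'’, ‘)’, and ‘}’.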
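Two small checks illustrate these points; the sample strings are invented for the purpose:

```elisp
;; A `]' placed first inside the brackets is an ordinary member
;; of the set rather than the end of the set.
(string-match "[]]" "a]b")
;; ⇒ 1

;; A closing quotation mark may sit between the period and the
;; two spaces.  Inside the Lisp string, \" stands for a plain ".
(let ((text "He said \"Stop.\"  Then he left."))
  (list (string-match "[.?!][]\"')}]*  " text)
        (match-end 0)))
;; ⇒ (13 17)
```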
All this suggests what the regular expression pattern for matching the end of a sentence should be; and, indeed, if we evaluate sentence-end we find that it returns the following value:
sentence-end ⇒ "[.?!][]\"')}]*\\($\\|	\\|  \\)[
]*"
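To see the pattern at work with forward-sentence, one can bind sentence-end to it temporarily. This is a sketch: the constant name my-old-sentence-end is hypothetical, the sample text is invented, and "\t" and "\n" stand in for the literal tab and newline shown in the value above.

```elisp
;; Hypothetical name for the older value shown above.
(defconst my-old-sentence-end
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[\n]*"
  "Sentence-end pattern as displayed in the text above.")

;; Bind `sentence-end' temporarily and move over one sentence.
;; `forward-sentence' skips the abbreviation "Dr." and leaves
;; point just after the period that ends "left.", before the
;; two spaces.
(with-temp-buffer
  (insert "Dr. Smith left.  Then he ran.")
  (goto-char (point-min))
  (let ((sentence-end my-old-sentence-end))
    (forward-sentence))
  (point))
;; ⇒ 16
```

Buffer positions start at 1, so position 16 is the spot right after the period.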
(Well, not in GNU Emacs 22; that is because of an effort to make the process simpler and to handle more glyphs and languages. When the value of sentence-end is nil, Emacs uses the value returned by the function sentence-end instead. (Here is a use of the difference between a value and a function in Emacs Lisp.) The function returns a value constructed from the variables sentence-end-base, sentence-end-double-space, sentence-end-without-period, and sentence-end-without-space. The critical variable is sentence-end-base; its global value is similar to the one described above, but it also contains two additional quotation marks. These have differing degrees of curliness. The sentence-end-without-period variable, when non-nil, tells Emacs that a sentence may end without a period, as in Thai text.)
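A small sketch shows the variable and the function side by side: binding the variable sentence-end to nil makes the function sentence-end build its pattern from the other variables. The sample string is invented, and the results assume the default values of sentence-end-without-period and sentence-end-without-space, as in a plain "emacs -Q" session.

```elisp
;; Requiring two spaces skips the abbreviation "Dr.".
(let ((sentence-end nil)
      (sentence-end-double-space t))
  (string-match (sentence-end) "Dr. Smith left.  Then"))
;; ⇒ 14

;; Allowing a single space makes "Dr." look like a sentence end.
(let ((sentence-end nil)
      (sentence-end-double-space nil))
  (string-match (sentence-end) "Dr. Smith left.  Then"))
;; ⇒ 2
```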