SMIE comes with a predefined lexical analyzer which uses syntax tables
in the following way: any sequence of characters that have word or
symbol syntax is considered a token, and so is any sequence of
characters that have punctuation syntax. This default lexer is
often a good starting point but is rarely actually correct for any given
language. For example, it will consider "2,+3" to be composed of
three tokens: "2", ",+", and "3".
To describe the lexing rules of your language to SMIE, you need two
functions: one to fetch the next token, and another to fetch the
previous token.  Those functions will usually first skip whitespace
and comments and then look at the next chunk of text to see if it is
a special token.  If so, the function should skip the token and
return a description of it.  Usually this is simply the string
extracted from the buffer, but it can be anything you want.  For
example:
(defvar sample-keywords-regexp
  (regexp-opt '("+" "*" "," ";" ">" ">=" "<" "<=" ":=" "=")))

(defun sample-smie-forward-token ()
  (forward-comment (point-max))           ;skip whitespace and comments
  (cond
   ((looking-at sample-keywords-regexp)   ;a known operator or keyword
    (goto-char (match-end 0))
    (match-string-no-properties 0))
   (t (buffer-substring-no-properties     ;otherwise a word/symbol token
       (point)
       (progn (skip-syntax-forward "w_")
              (point))))))

(defun sample-smie-backward-token ()
  (forward-comment (- (point)))           ;skip backward over whitespace/comments
  (cond
   ((looking-back sample-keywords-regexp (- (point) 2) t)
    (goto-char (match-beginning 0))
    (match-string-no-properties 0))
   (t (buffer-substring-no-properties
       (point)
       (progn (skip-syntax-backward "w_")
              (point))))))
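With these definitions evaluated, the forward lexer can be exercised
the same way; each call returns the next token.  Again a sketch,
relying on the standard syntax table for the word characters:

(with-temp-buffer
  (insert "x >= 2")
  (goto-char (point-min))
  (list (sample-smie-forward-token)
        (sample-smie-forward-token)
        (sample-smie-forward-token)))
;; => ("x" ">=" "2")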
Notice how those lexers return the empty string when in front of
parentheses. This is because SMIE automatically takes care of the
parentheses defined in the syntax table. More specifically, if the
lexer returns nil or an empty string, SMIE tries to handle the
corresponding text as a sexp according to the syntax tables.
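For instance, with point just before an open parenthesis, neither
branch of the sample lexer above moves point, so it returns the empty
string and SMIE parses the parenthesized text as a sexp instead (a
sketch using the sample functions above):

(with-temp-buffer
  (insert "(a + b)")
  (goto-char (point-min))
  (sample-smie-forward-token))
;; => ""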