When built with the tree-sitter library (see Parsing Program Source), Emacs can parse program source and produce a syntax tree, which the indentation commands can then use as a guide. For maximum flexibility, a major mode can write a custom indentation function that queries the syntax tree and indents accordingly for each language, but that is a lot of work. It is more convenient to use the simple indentation engine described below: the major mode then only needs to write some indentation rules, and the engine takes care of the rest.
To enable the parser-based indentation engine, either set
treesit-simple-indent-rules and call treesit-major-mode-setup, or,
equivalently, set the value of indent-line-function to treesit-indent.
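A minimal sketch of the first approach follows. Several names are hypothetical and used only for illustration: the grammar name mylang, its node types "block" and "}", the mode mylang-ts-mode, and the variables mylang-ts-mode-indent-offset and mylang-ts-mode--indent-rules. A real mode would use the node types of its own grammar.

```elisp
(require 'treesit)

(defvar mylang-ts-mode-indent-offset 4
  "Hypothetical number of columns per indentation step.")

(defvar mylang-ts-mode--indent-rules
  '((mylang
     ;; Hypothetical node types; check them against the real grammar.
     ;; A closing brace aligns with the line where its block starts.
     ;; This rule comes first because "}" also has a "block" parent.
     ((node-is "}") parent-bol 0)
     ;; Other children of a block are indented one step from that line.
     ((parent-is "block") parent-bol mylang-ts-mode-indent-offset)))
  "Hypothetical indentation rules, keyed by language symbol.")

(define-derived-mode mylang-ts-mode prog-mode "MyLang"
  "Hypothetical major mode using the parser-based indentation engine."
  ;; Only set things up if the `mylang' grammar is actually available.
  (when (treesit-ready-p 'mylang)
    (treesit-parser-create 'mylang)
    (setq-local treesit-simple-indent-rules mylang-ts-mode--indent-rules)
    ;; Because the rules above are non-nil, this sets
    ;; `indent-line-function' to `treesit-indent', among other things.
    (treesit-major-mode-setup)))
```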
The variable treesit-indent-function stores the actual function called
by treesit-indent. By default, its value is treesit-simple-indent.
Other, more complex indentation engines might be added in the future.
The buffer-local variable treesit-simple-indent-rules stores
indentation rules for every language. It is an alist with elements of
the form (language . rules), where language is a language symbol and
rules is a list with elements of the form (matcher anchor offset).
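The following sketch makes the shape of this alist concrete. It shows rules for two languages, as a mode whose buffers embed one language in another might have. It also defines a hypothetical helper, my-describe-rules, which looks up one language's rules and splits each rule into its three parts. The language symbols, node types, and offsets are illustrative assumptions.

```elisp
(require 'cl-lib)

(defvar my-example-rules
  ;; Each element is (LANGUAGE . RULES).  Because RULES is a list,
  ;; this is the same as writing (LANGUAGE RULE1 RULE2 ...).
  '((html ((node-is "end_tag") parent-bol 0)
          (catch-all parent-bol 2))
    (css  ((node-is "}") parent-bol 0)
          (catch-all parent-bol 4)))
  "Hypothetical value for `treesit-simple-indent-rules'.")

(defun my-describe-rules (language rules)
  "Return LANGUAGE's rules from the alist RULES as plists.
Each (MATCHER ANCHOR OFFSET) rule becomes
\(:matcher MATCHER :anchor ANCHOR :offset OFFSET)."
  (mapcar (lambda (rule)
            (cl-destructuring-bind (matcher anchor offset) rule
              (list :matcher matcher :anchor anchor :offset offset)))
          (alist-get language rules)))

;; (my-describe-rules 'css my-example-rules)
;; => ((:matcher (node-is "}") :anchor parent-bol :offset 0)
;;     (:matcher catch-all :anchor parent-bol :offset 4))
```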
To indent a line, Emacs tries the rules for the current language in
order. For each rule, it passes the tree-sitter node at the beginning
of the current line (described in detail below) to matcher; if matcher
returns non-nil, this rule is applicable and the remaining rules are
not tried. Emacs then passes the node to anchor, which returns a buffer
position. Emacs takes the column number of that position, adds offset
to it, and the result is the indentation column for the current line.
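The final arithmetic step can be sketched in plain Emacs Lisp. The helper my-indent-column is hypothetical and not part of treesit. The real engine also handles details that this sketch omits, such as finding the anchor from the syntax tree.

```elisp
(defun my-indent-column (anchor offset)
  "Return the column of buffer position ANCHOR plus OFFSET."
  (+ (save-excursion
       (goto-char anchor)
       (current-column))
     offset))

;; Usage: indent the second line relative to the "if" on the first line.
(with-temp-buffer
  (setq-local indent-tabs-mode nil)
  (insert "  if (x) {\nfoo();\n")
  (goto-char (point-min))
  (skip-chars-forward " ")
  (let ((anchor (point)))                          ; column 2
    (forward-line 1)
    (indent-line-to (my-indent-column anchor 4))   ; 2 + 4 = 6
    (buffer-substring (line-beginning-position) (line-end-position))))
;; => "      foo();"
```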
The matcher and anchor are functions, and Emacs provides convenient defaults for them.
Each matcher or anchor is a function that takes three
arguments: node, parent, and bol. The argument
bol is the buffer position whose indentation is required: the
position of the first non-whitespace character after the beginning of
the line. The argument node is the largest node that starts at
that position (and is not a root node); and parent is the parent
of node. However, when that position is in whitespace or inside a
multi-line string, no node can start at that position, so node is
nil. In that case, parent is the smallest node that spans that
position.
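A custom matcher only needs to accept these three arguments. The hypothetical matcher below, my-inside-string-p, relies on the behavior just described: inside a multi-line string, node is nil and parent spans bol. It assumes that the grammar's string node types contain the word "string", which varies between grammars.

```elisp
(require 'treesit)

(defun my-inside-string-p (node parent _bol)
  "Hypothetical matcher: non-nil if BOL is inside a multi-line string.
No node starts at BOL in that case, so NODE is nil and PARENT is the
smallest node spanning BOL; check whether PARENT's type mentions
\"string\" (an assumption, since type names vary between grammars)."
  (and (null node)
       parent
       (string-match-p "string" (treesit-node-type parent))))

;; A hypothetical rule using it, keeping the previous line's indentation:
;;   (my-inside-string-p prev-line 0)
```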
matcher should return non-nil if the rule is applicable, and anchor
should return a buffer position.
offset can be an integer, a variable whose value is an integer, or a function that returns an integer. If it is a function, it is passed node, parent, and bol, like matchers and anchors.
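The sketch below shows the three kinds of offset. The variable my-indent-offset, the function my-offset, the language mylang, and the node types are hypothetical. my-offset looks only at the text at bol, so it does not need the syntax tree.

```elisp
(defvar my-indent-offset 4
  "Hypothetical number of columns per indentation step.")

(defun my-offset (_node _parent bol)
  "Hypothetical offset function, called with NODE, PARENT and BOL.
Return 0 if the line starts with a closing bracket at BOL, and
`my-indent-offset' otherwise."
  (save-excursion
    (goto-char bol)
    (if (looking-at-p "[])}]") 0 my-indent-offset)))

;; The three kinds of offset, in hypothetical rules for `mylang':
;;   ((parent-is "comment") prev-line 0)                  ; an integer
;;   ((parent-is "block") parent-bol my-indent-offset)    ; a variable
;;   ((parent-is "argument_list") parent-bol my-offset)   ; a function
```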
The variable treesit-simple-indent-presets is a list of defaults for
matchers and anchors in treesit-simple-indent-rules. Each entry either
is, or (when given its own arguments) returns, a function that takes
3 arguments: node, parent, and bol. The available default functions
are:
no-node
This matcher is a function that is called with 3 arguments:
node, parent, and bol. It returns non-nil, indicating a match, if
node is nil, i.e., there is no node that starts at bol. This is the
case when bol is on an empty line or inside a multi-line string, etc.
parent-is
This matcher is a function of one argument, type; it returns a
function that is called with 3 arguments: node, parent, and bol, and
returns non-nil (i.e., a match) if parent’s type matches regexp type.
node-is
This matcher is a function of one argument, type; it returns a
function that is called with 3 arguments: node, parent, and bol, and
returns non-nil if node’s type matches regexp type.
field-is
This matcher is a function of one argument, name; it returns a
function that is called with 3 arguments: node, parent, and bol, and
returns non-nil if node’s field name in parent matches regexp name.
query
This matcher is a function of one argument, query; it returns a
function that is called with 3 arguments: node, parent, and bol, and
returns non-nil if querying parent with query captures node (see
Pattern Matching Tree-sitter Nodes).
match
This matcher is a function of 5 arguments: node-type, parent-type,
node-field, node-index-min, and node-index-max. It returns a function
that is called with 3 arguments: node, parent, and bol, and returns
non-nil if node’s type matches regexp node-type, parent’s type matches
regexp parent-type, node’s field name in parent matches regexp
node-field, and node’s index among its siblings is between
node-index-min and node-index-max. If the value of an argument is nil,
this matcher doesn’t check that argument. For example, to match the
first child where parent is argument_list, use
(match nil "argument_list" nil 0 0)
In addition, node-type can be the special value null, which matches
when the value of node is nil.
n-p-gp
Short for “node-parent-grandparent”, this matcher is a function of 3
arguments: node-type, parent-type, and grandparent-type. It returns a
function that is called with 3 arguments: node, parent, and bol, and
returns non-nil if: (1) node-type matches node’s type, and (2)
parent-type matches parent’s type, and (3) grandparent-type matches
parent’s parent’s type. If any of node-type, parent-type, and
grandparent-type is nil, this matcher doesn’t check it.
comment-end
This matcher is a function that is called with 3 arguments: node,
parent, and bol, and returns non-nil if point is before a
comment-ending token. Comment-ending tokens are defined by regexp
comment-end-skip.
catch-all
This matcher is a function that is called with 3 arguments: node,
parent, and bol. It always returns non-nil, indicating a match.
first-sibling
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of the first child of parent.
nth-sibling
This anchor is a function of two arguments: n, and an optional
argument named. It returns a function that is called with 3
arguments: node, parent, and bol, and returns the start of the nth
child of parent. If named is non-nil, only named children are counted
(see named node).
parent
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of parent.
grand-parent
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of parent’s parent.
great-grand-parent
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of parent’s parent’s parent.
parent-bol
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the position of the first non-whitespace character on the line where parent starts.
standalone-parent
This anchor is a function that is called with 3 arguments: node, parent, and bol. It finds the first ancestor node (parent, grandparent, etc.) of node that starts on its own line, and returns the start of that node. “Starting on its own line” means that only whitespace characters precede the node’s start on the line where it starts.
prev-sibling
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of the previous sibling of node.
no-indent
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of node.
prev-line
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the position of the first non-whitespace character on the previous line.
column-0
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the beginning of the current line, which is at column 0.
comment-start
This anchor is a function that is called with 3 arguments: node,
parent, and bol, and returns the position after the comment-start
token. Comment-start tokens are defined by regular expression
comment-start-skip. This function assumes parent is the comment node.
prev-adaptive-prefix
This anchor is a function that is called with 3 arguments: node,
parent, and bol. It tries to match adaptive-fill-regexp to the text at
the beginning of the previous non-empty line. If there is a match,
this function returns the end of the match; otherwise it returns nil.
However, if the current line begins with a prefix (e.g., ‘-’), it
returns the beginning of the prefix of the previous line instead, so
that the two prefixes align. This anchor is useful for
indent-relative-like indentation of block comments.
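The presets are meant to be combined. The sketch below is a rule list for a C-like language that uses several of them. The node types and field name ("translation_unit", "compound_statement", "argument_list", "comment", "consequence") are assumptions modeled on a C grammar and should be checked against the real grammar, for instance with treesit-inspect-mode. The variable name my-c-like-indent-rules is hypothetical.

```elisp
(defvar my-c-like-indent-rules
  '((c
     ;; Rules are tried in order; the first matching rule is used.
     ;; Top-level constructs start at column 0.
     ((parent-is "translation_unit") column-0 0)
     ;; A closing brace lines up with the line that starts its construct.
     ;; This must precede the next rule, because the brace's parent is
     ;; also a compound_statement.
     ((node-is "}") standalone-parent 0)
     ;; Statements inside braces are indented one step from that line.
     ((parent-is "compound_statement") standalone-parent 4)
     ;; The body of an "if" that starts on its own line.
     ((field-is "consequence") parent-bol 4)
     ;; Later arguments line up with child 1, the first one after "(".
     ((parent-is "argument_list") (nth-sibling 1) 0)
     ;; Continuation lines of a block comment follow the previous prefix.
     ((parent-is "comment") prev-adaptive-prefix 0)
     ;; No node starts at bol (e.g., inside a string): keep previous line.
     (no-node prev-line 0)
     ;; Fallback when nothing else matched.
     (catch-all parent-bol 0)))
  "Hypothetical indentation rules modeled loosely on a C grammar.")
```

A mode would install this value with (setq-local treesit-simple-indent-rules my-c-like-indent-rules) before calling treesit-major-mode-setup. More specific rules must come before general ones, because the engine stops at the first rule whose matcher succeeds.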
Here are some utility functions that can help when writing parser-based indentation rules.
The command treesit-check-indent checks the current buffer’s indentation against major mode mode. It indents the current buffer according to mode, compares the result with the current indentation, and pops up a buffer showing the differences. The correct (target) indentation is shown in green, and the current indentation in red.
It is also helpful to use treesit-inspect-mode (see Tree-sitter
Language Grammar) when writing indentation rules.