Wisent Parser Development

Wisent (the European Bison ;-) is an Emacs Lisp implementation of the GNU Compiler Compiler Bison.

This manual describes how to use Wisent to develop grammars for programming languages, and how to use grammars to parse language source in Emacs buffers.

It also describes how Wisent is used with the Semantic tool set described in the (semantic)Semantic Manual.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being “A GNU Manual,” and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled “GNU Free Documentation License”.

(a) The FSF’s Back-Cover Text is: “You have the freedom to copy and modify this GNU manual.”

1 Wisent Overview

Wisent (the European Bison) is an implementation in Emacs Lisp of the GNU Compiler Compiler Bison. Its code is a port of the C code of GNU Bison 1.28 & 1.31.

For more details on the basic concepts for understanding Wisent, it is worthwhile to read the (bison)Bison Manual.

Wisent can generate compilers compatible with the Semantic tool set. See the (semantic)Semantic Manual.

It benefits from these Bison features:

It uses a fast but not so space-efficient encoding for the parse tables, described in Corbett’s PhD thesis from Berkeley:

Static Semantics in Compiler Error Recovery
June 1985, Report No. UCB/CSD 85/251.

For generating the lookahead sets, Wisent uses the well-known technique of F. DeRemer and T. Pennello described in:

Efficient Computation of LALR(1) Look-Ahead Sets
October 1982, ACM TOPLAS Vol 4 No 4, 615–49, https://doi.org/10.1145/69622.357187.

Wisent resolves shift/reduce conflicts using operator precedence and associativity.

Parser error recovery is accomplished using rules which match the special token error.

Nevertheless there are some fundamental differences between Bison and Wisent.

Wisent is intended to be used in Emacs. It reads and produces Emacs Lisp data structures. All the additional code used in grammars is Emacs Lisp code.

Contrary to Bison, Wisent does not generate a parser which combines Emacs Lisp code and grammar constructs. They exist separately. Wisent reads the grammar from a Lisp data structure and then generates grammar constructs as tables. Afterward, the derived tables can be included and byte-compiled in separate Emacs Lisp files, and be used at a later time by Wisent’s parser engine.

Wisent allows multiple start nonterminals, and allows the parsing function to be called for a particular start nonterminal. This is particularly useful, for example, for parsing a region of an Emacs buffer. Semantic heavily depends on the availability of this feature.

2 Wisent Grammar

In order for Wisent to parse a language, the language must be described by a context-free grammar, that is, a grammar specified as rules that can be applied regardless of context. For more information, see (bison)Language and Grammar, in the Bison manual.

The formal grammar is formulated using terminal and nonterminal items. Terminals can be Emacs Lisp symbols or characters, and nonterminals are symbols only.

Terminals (also known as tokens) represent the lexical elements of the language, like numbers, strings, etc.

For example ‘PLUS’ can represent the operator ‘+’.

Nonterminal symbols are described by rules:

RESULT ≡ COMPONENTS…

‘RESULT’ is a nonterminal that this rule describes and ‘COMPONENTS’ are various terminals and nonterminals that are put together by this rule.

For example, this rule:

exp ≡ exp PLUS exp

says that two groupings of type ‘exp’, with a ‘PLUS’ token in between, can be combined into a larger grouping of type ‘exp’.

Grammar format
Example
Compiling a grammar
Conflicts

2.1 Grammar format

To be accepted by Wisent, a context-free grammar must follow a particular format: it must be represented as an Emacs Lisp list of the form:

(terminals assocs . non-terminals)

terminals

Is the list of terminal symbols used in the grammar.

assocs

Specifies the associativity of terminals. It is nil when there is no associativity defined, or an alist of (assoc-type . assoc-value) elements.

assoc-type must be one of the default-prec, nonassoc, left or right symbols. When assoc-type is default-prec, assoc-value must be nil or t (the default). Otherwise it is a list of tokens which must have been previously declared in terminals.

For details, see (bison)Contextual Precedence, in the Bison manual.

non-terminals

Is the list of nonterminal definitions. Each definition has the form:

(nonterm . rules)

Where nonterm is the nonterminal symbol defined and rules the list of rules that describe this nonterminal. Each rule is a list:

(components [precedence] [action])

Where:

components

Is a list of various terminals and nonterminals that are put together by this rule.

For example,

(exp ((exp ?+ exp))          ;; exp: exp '+' exp
     )                       ;;    ;

says that two groupings of type ‘exp’, with a ‘+’ token in between, can be combined into a larger grouping of type ‘exp’.

By convention, a nonterminal symbol should be in lower case, such as ‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols should be upper case to distinguish them from nonterminals: for example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or ‘RETURN’. A terminal symbol that represents a particular keyword in the language is conventionally the same as that keyword converted to upper case. The terminal symbol error is reserved for error recovery.

Mid-rule actions can be scattered among the components. Usually only one action, at the end of the rule, is provided (see action).

If components in a rule is nil, it means that the rule can match the empty string. For example, here is how to define a comma-separated sequence of zero or more ‘exp’ groupings:

(expseq  (nil)               ;; expseq: ;; empty
         ((expseq1))         ;;       | expseq1
         )                   ;;       ;

(expseq1 ((exp))             ;; expseq1: exp
         ((expseq1 ?, exp))  ;;        | expseq1 ',' exp
         )                   ;;        ;

precedence

Assigns to the rule the precedence of the given terminal item, overriding the precedence that would otherwise be deduced for it, namely that of the last terminal in the rule. Notice that only terminals declared in assocs have a precedence level. The altered rule precedence then affects how conflicts involving that rule are resolved.

precedence is an optional vector of one terminal item.

Here is how precedence solves the problem of unary minus. First, declare a precedence for a fictitious terminal symbol named UMINUS. There are no tokens of this type, but the symbol serves to stand for its precedence:

…
((default-prec t) ;; This is the default
 (left ?+ ?-)
 (left ?*)
 (left UMINUS))

Now the precedence of UMINUS can be used in specific rules:

(exp    …                    ;; exp:    …
         ((exp ?- exp))      ;;         | exp '-' exp
        …                    ;;         …
         ((?- exp) [UMINUS]) ;;         | '-' exp %prec UMINUS
        …                    ;;         …
        )                    ;;         ;

If you forget to append [UMINUS] to the rule for unary minus, Wisent silently assumes that minus has its usual precedence. This kind of problem can be tricky to debug, since one typically discovers the mistake only by testing the code.

Using the (default-prec nil) declaration makes it easier to discover this kind of problem systematically. It causes rules that lack a precedence modifier to have no precedence, even if the last terminal symbol mentioned in their components has a declared precedence.

If (default-prec nil) is in effect, you must specify precedence for all rules that participate in precedence conflict resolution. Then you will see any shift/reduce conflict until you tell Wisent how to resolve it, either by changing your grammar or by adding an explicit precedence. This will probably add declarations to the grammar, but it helps to protect against incorrect rule precedences.

The effect of (default-prec nil) can be reversed by giving (default-prec t), which is the default.

For more details, see (bison)Contextual Precedence, in the Bison manual.

It is important to understand that assocs declarations define associativity but also assign a precedence level to terminals. All terminals declared in the same left, right or nonassoc association get the same precedence level. The precedence level is increased at each new association.

On the other hand, precedence explicitly assigns the precedence level of the given terminal to a rule.
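As a rough illustration of how precedence levels follow from the order of the associations, the following sketch walks an assocs list and numbers each left, right or nonassoc association in turn. The function my-assoc-precedence-levels is hypothetical and is not part of Wisent; it only restates the description above, and the numbering starting at 1 is an assumption of the sketch.

```elisp
;; A sketch only: `my-assoc-precedence-levels' is a hypothetical helper,
;; not a Wisent function.  It mirrors the rule described above: every
;; left, right or nonassoc association opens a new, higher level.
(defun my-assoc-precedence-levels (assocs)
  "Return a list of (TERMINAL ASSOC-TYPE LEVEL) computed from ASSOCS."
  (let ((level 0)
        (result nil))
    (dolist (entry assocs)
      ;; `default-prec' entries do not define a precedence level.
      (when (memq (car entry) '(left right nonassoc))
        (setq level (1+ level))
        (dolist (terminal (cdr entry))
          (push (list terminal (car entry) level) result))))
    (nreverse result)))

(my-assoc-precedence-levels
 '((default-prec t)
   (left ?+ ?-)
   (left ?*)
   (left UMINUS)))
```

In this sketch ‘+’ and ‘-’ share the lowest level, ‘*’ gets the next one, and UMINUS the highest, which is why a rule tagged [UMINUS] binds more tightly than multiplication.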

action

An action is an optional Emacs Lisp function call, like this:

(identity $1)

The result of an action determines the semantic value of a rule.

From an implementation standpoint, the function call will be embedded in a lambda expression, and several useful local variables will be defined:

$n: Where n is a positive integer. Like in Bison, the value of $n is the semantic value of the nth element of components, starting from 1. It can be of any Lisp data type.
$regionn: Where n is a positive integer. For each $n variable defined there is a corresponding $regionn variable. Its value is a pair (start-pos . end-pos) that represents the start and end positions (in the lexical input stream) of the $n value. It can be nil when the component positions are not available, for example for an empty string component.
$region: Its value is the leftmost and rightmost positions of input data matched by all components in the rule. This is a pair (leftmost-pos . rightmost-pos). It can be nil when component positions are not available.
$nterm: This variable is initialized with the nonterminal symbol (nonterm) the rule belongs to. It could be useful to improve error reporting or debugging. It is also used to automatically provide incremental re-parse entry points for Semantic tags (see How to use Wisent with Semantic).
$action: The value of $action is the symbolic name of the current semantic action (see Debugging semantic actions).

When an action is not specified, a default one is supplied: (identity $1). This means that the default semantic value of a rule is the value of its first component. The exception is a rule matching the empty string, for which the default action is to return nil.
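To make the local variables above more concrete, here is a minimal sketch that binds them by hand for the rule (exp ?+ exp) and then evaluates an action body. This is not how Wisent builds its lambda expressions; the function name my-demo-action and all the values and positions are made up for the illustration.

```elisp
;; A sketch only, not Wisent's actual implementation: the variables are
;; bound by hand to show what an action sees for the rule (exp ?+ exp).
;; `my-demo-action' is a hypothetical name; values and positions are
;; invented.
(defun my-demo-action ()
  "Evaluate a sample action body with hand-made $N bindings."
  (let (($1 2)   ($region1 '(1 . 2))    ; first exp
        ($2 "+") ($region2 '(3 . 4))    ; the ?+ token
        ($3 40)  ($region3 '(5 . 7))    ; second exp
        ($region '(1 . 7))              ; span of the whole rule
        ($nterm 'exp))                  ; nonterminal being reduced
    ;; The action body proper: any Emacs Lisp function call.
    (list $nterm (+ $1 $3) $region)))

(my-demo-action)
;; => (exp 42 (1 . 7))
```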

2.2 Example

Here is an example to parse simple infix arithmetic expressions. See (bison)Infix Calc, in the Bison manual for details.

'(
  ;; Terminals
  (NUM)

  ;; Terminal associativity & precedence
  ((nonassoc ?=)
   (left ?- ?+)
   (left ?* ?/)
   (left NEG)
   (right ?^))

  ;; Rules
  (input
   ((line))
   ((input line)
    (format "%s %s" $1 $2))
   )

  (line
   ((?;)
    (progn ";"))
   ((exp ?;)
    (format "%s;" $1))
   ((error ?;)
    (progn "Error;"))
   )

  (exp
   ((NUM)
    (string-to-number $1))
   ((exp ?= exp)
    (= $1 $3))
   ((exp ?+ exp)
    (+ $1 $3))
   ((exp ?- exp)
    (- $1 $3))
   ((exp ?* exp)
    (* $1 $3))
   ((exp ?/ exp)
    (/ $1 $3))
   ((?- exp) [NEG]
    (- $2))
   ((exp ?^ exp)
    (expt $1 $3))
   ((?\( exp ?\))
    (progn $2))
   )
  )

In the bison-like WY format (see How to use Wisent with Semantic) the grammar looks like this:

%token <number> NUM

%nonassoc '=' ;; comparison
%left '-' '+'
%left '*' '/'
%left NEG     ;; negation--unary minus
%right '^'    ;; exponentiation

%%

input:
    line
  | input line
    (format "%s %s" $1 $2)
  ;

line:
    ';'
    {";"}
  | exp ';'
    (format "%s;" $1)
  | error ';'
    {"Error;"}
  ;

exp:
    NUM
    (string-to-number $1)
  | exp '=' exp
    (= $1 $3)
  | exp '+' exp
    (+ $1 $3)
  | exp '-' exp
    (- $1 $3)
  | exp '*' exp
    (* $1 $3)
  | exp '/' exp
    (/ $1 $3)
  | '-' exp %prec NEG
    (- $2)
  | exp '^' exp
    (expt $1 $3)
  | '(' exp ')'
    {$2}
  ;

%%

2.3 Compiling a grammar

After providing a context-free grammar in a suitable format, it must be translated into a set of tables (an automaton) that will be used to derive the parser. Like Bison, Wisent requires grammars to be LALR(1).

A grammar is LALR(1) if it is possible to tell how to parse any portion of an input string with just a single token of look-ahead: the look-ahead token. See (bison)Language and Grammar, in the Bison manual for more information.

Grammar translation (compilation) is achieved by the function:

Function: wisent-compile-grammar grammar &optional start-list

Compile grammar and return an LALR(1) automaton.

Optional argument start-list is a list of start symbols (nonterminals). If nil the first nonterminal defined in the grammar is the default start symbol. If start-list contains only one element, it defines the start symbol. If start-list contains more than one element, all are defined as potential start symbols, unless wisent-single-start-flag is non-nil. In that case the first element of start-list defines the start symbol and others are ignored.

The LALR(1) automaton is a vector of the form:

[actions gotos starts functions]

actions: A state/token matrix telling the parser what to do at every state based on the current look-ahead token. That is shift, reduce, accept or error. See also Wisent Parsing.
gotos: A state/nonterminal matrix telling the parser the next state to go to after reducing with each rule.
starts: An alist which maps the allowed start symbols (nonterminals) to lexical tokens that will be first shifted into the parser stack.
functions: An obarray of semantic action symbols. A semantic action is actually an Emacs Lisp function (lambda expression).
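The following sketch compiles a toy grammar with two start symbols and looks at the resulting vector. It assumes that the Wisent libraries bundled with Emacs can be loaded under the feature name used below; the grammar and the variable my-demo-automaton are invented for the example.

```elisp
;; Assumption: the Wisent grammar compiler shipped with Emacs is
;; available under this feature name.
(require 'semantic/wisent/comp)

;; `my-demo-automaton' is a hypothetical name used only here.
(defvar my-demo-automaton
  (wisent-compile-grammar
   '((NUM)                                    ; terminals
     ((left ?+))                              ; assocs
     (line ((exp ?\;)))                       ; non-terminals
     (exp  ((NUM)        (string-to-number $1))
           ((exp ?+ exp) (+ $1 $3))))
   '(line exp))                               ; start-list
  "Automaton compiled from a toy grammar with two start symbols.")

;; Number of slots in the automaton vector.
(length my-demo-automaton)

;; Start symbols recorded in the starts slot.
(mapcar #'car (aref my-demo-automaton 2))
```

According to the description above, the first form should report four slots and the second should list both line and exp.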

2.4 Conflicts

Normally, a grammar should produce an automaton where at each state the parser has only one action to do (see Wisent Parsing).

In certain cases, a grammar can produce an automaton where, at some states, more than one action is possible. Such a grammar is ambiguous, and generates conflicts.

The parser can’t be driven by an automaton which isn’t completely deterministic, that is which contains conflicts. It is necessary to resolve the conflicts to eliminate them. Wisent resolves conflicts like Bison does.

There are two sorts of conflicts:

shift/reduce conflicts

When either a shift or a reduction would be valid at the same state.

Such conflicts are resolved by choosing to shift, unless otherwise directed by operator precedence declarations. See (bison)Shift/Reduce, in the Bison manual for more information.

reduce/reduce conflicts

That occurs if there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.

Such conflicts are resolved by choosing to use the rule that appears first in the grammar, but it is very risky to rely on this. Every reduce/reduce conflict must be studied and usually eliminated. See (bison)Reduce/Reduce, in the Bison manual for more information.

Grammar debugging
Understanding the automaton

2.4.1 Grammar debugging

To help in writing a new grammar, wisent-compile-grammar can produce a verbose report containing a detailed description of the grammar and parser (equivalent to what Bison reports with the --verbose option).

To enable the verbose report you can set to non-nil the variable:

Option: wisent-verbose-flag: non-nil means to report verbose information on generated parser.

Or interactively use the command:

Command: wisent-toggle-verbose-flag: Toggle whether to report verbose information on generated parser.

The verbose report is printed in the temporary buffer *wisent-log* when running interactively, or in file wisent.output when running in batch mode. Different reports are separated from each other by a line like this:

*** Wisent source-file - 2002-06-27 17:33

where source-file is the name of the Emacs Lisp file from which the grammar was read. See Understanding the automaton, for details on the verbose report.

Please Note

To help debugging the grammar compiler itself, you can set this variable to print the content of some internal data structures:

Variable: wisent-debug-flag: non-nil means enable some debug stuff.

2.4.2 Understanding the automaton

This section (taken from the manual of Bison 1.49) describes how to use the verbose report printed by wisent-compile-grammar to understand the generated automaton, and to tune or fix a grammar.

We will use the following example:

(let ((wisent-verbose-flag t)) ;; Print a verbose report!
  (wisent-compile-grammar
   '((NUM STR)                          ; %token NUM STR

     ((left ?+ ?-)                      ; %left '+' '-';
      (left ?*))                        ; %left '*'

     (exp                               ; exp:
      ((exp ?+ exp))                    ;    exp '+' exp
      ((exp ?- exp))                    ;  | exp '-' exp
      ((exp ?* exp))                    ;  | exp '*' exp
      ((exp ?/ exp))                    ;  | exp '/' exp
      ((NUM))                           ;  | NUM
      )                                 ;  ;

     (useless                           ; useless:
      ((STR))                           ;    STR
      )                                 ;  ;
     )
   'nil)                                ; no %start declarations
  )

When evaluating the above expression, grammar compilation first issues the following two clear messages:

Grammar contains 1 useless nonterminals and 1 useless rules
Grammar contains 7 shift/reduce conflicts

The *wisent-log* buffer details things!

The first section reports conflicts that were solved using precedence and/or associativity:

Conflict in state 7 between rule 1 and token '+' resolved as reduce.
Conflict in state 7 between rule 1 and token '-' resolved as reduce.
Conflict in state 7 between rule 1 and token '*' resolved as shift.
Conflict in state 8 between rule 2 and token '+' resolved as reduce.
Conflict in state 8 between rule 2 and token '-' resolved as reduce.
Conflict in state 8 between rule 2 and token '*' resolved as shift.
Conflict in state 9 between rule 3 and token '+' resolved as reduce.
Conflict in state 9 between rule 3 and token '-' resolved as reduce.
Conflict in state 9 between rule 3 and token '*' resolved as reduce.
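The resolutions reported above follow the usual Bison rule: the precedence of the rule is compared with that of the look-ahead token, and associativity decides when both are equal. The sketch below restates that rule; my-resolve-shift-reduce is a hypothetical helper written for this illustration, not the code Wisent uses, and the level numbers are assumptions that only preserve the order of the declarations.

```elisp
;; A sketch only: `my-resolve-shift-reduce' is a hypothetical helper.
;; RULE-PREC and TOKEN-PREC are (ASSOC-TYPE . LEVEL) pairs, or nil when
;; no precedence has been declared.
(defun my-resolve-shift-reduce (rule-prec token-prec)
  "Return `shift', `reduce', `error' or `unresolved' for a conflict."
  (cond
   ;; Precedence is needed on both sides; otherwise the conflict is
   ;; left unresolved (it is then reported, and the parser shifts).
   ((not (and rule-prec token-prec)) 'unresolved)
   ((> (cdr rule-prec) (cdr token-prec)) 'reduce)
   ((< (cdr rule-prec) (cdr token-prec)) 'shift)
   ;; Same level: associativity decides.
   ((eq (car token-prec) 'left) 'reduce)
   ((eq (car token-prec) 'right) 'shift)
   (t 'error)))                               ; nonassoc

(let ((plus  '(left . 1))                     ; from (left ?+ ?-)
      (times '(left . 2))                     ; from (left ?*)
      (slash nil))                            ; ?/ has no precedence
  (list (my-resolve-shift-reduce plus plus)   ; state 7, token '+'
        (my-resolve-shift-reduce plus times)  ; state 7, token '*'
        (my-resolve-shift-reduce times times) ; state 9, token '*'
        (my-resolve-shift-reduce plus slash))) ; state 7, token '/'
;; => (reduce shift reduce unresolved)
```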

The next section reports useless tokens, nonterminals and rules (note that useless tokens might be used by the scanner):

Useless nonterminals:

   useless


Terminals which are not used:

   STR


Useless rules:

#6     useless: STR;

The next section lists states that still have conflicts:

State 7 contains 1 shift/reduce conflict.
State 8 contains 1 shift/reduce conflict.
State 9 contains 1 shift/reduce conflict.
State 10 contains 4 shift/reduce conflicts.

The next section reproduces the grammar used:

Grammar

  Number, Rule
  1       exp -> exp '+' exp
  2       exp -> exp '-' exp
  3       exp -> exp '*' exp
  4       exp -> exp '/' exp
  5       exp -> NUM

And reports the uses of the symbols:

Terminals, with rules where they appear

$EOI (-1)
error (1)
NUM (2) 5
STR (3) 6
'+' (4) 1
'-' (5) 2
'*' (6) 3
'/' (7) 4


Nonterminals, with rules where they appear

exp (8)
    on left: 1 2 3 4 5, on right: 1 2 3 4

The report then details the automaton itself, describing each state with its set of items, also known as pointed rules. Each item is a production rule together with a point (marked by ‘.’) that marks the position of the input cursor.

state 0

    NUM shift, and go to state 1

    exp go to state 2

State 0 corresponds to being at the very beginning of the parsing, in the initial rule, right before the start symbol (‘exp’). When the parser returns to this state right after having reduced a rule that produced an ‘exp’, it jumps to state 2. If there is no such transition on a nonterminal symbol, and the lookahead is a ‘NUM’, then this token is shifted on the parse stack, and the control flow jumps to state 1. Any other lookahead triggers a parse error.

In the state 1...

state 1

    exp  ->  NUM .   (rule 5)

    $default    reduce using rule 5 (exp)

the rule 5, ‘exp: NUM;’, is completed. Whatever the lookahead (‘$default’), the parser will reduce it. If it was coming from state 0, then, after this reduction it will return to state 0, and will jump to state 2 (‘exp: go to state 2’).

state 2

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    $EOI        shift, and go to state 11
    '+' shift, and go to state 3
    '-' shift, and go to state 4
    '*' shift, and go to state 5
    '/' shift, and go to state 6

In state 2, the automaton can only shift a symbol. For instance, because of the item ‘exp -> exp . '+' exp’, if the lookahead is ‘+’, it will be shifted on the parse stack, and the automaton control will jump to state 3, corresponding to the item ‘exp -> exp '+' . exp’:

state 3

    exp  ->  exp '+' . exp   (rule 1)

    NUM shift, and go to state 1

    exp go to state 7

Since there is no default action, any other token than those listed above will trigger a parse error.

The interpretation of states 4 to 6 is straightforward:

state 4

    exp  ->  exp '-' . exp   (rule 2)

    NUM shift, and go to state 1

    exp go to state 8



state 5

    exp  ->  exp '*' . exp   (rule 3)

    NUM shift, and go to state 1

    exp go to state 9



state 6

    exp  ->  exp '/' . exp   (rule 4)

    NUM shift, and go to state 1

    exp go to state 10

As was announced at the beginning of the report, ‘State 7 contains 1 shift/reduce conflict.’:

state 7

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '/' [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)

Indeed, there are two actions associated to the lookahead ‘/’: either shifting (and going to state 6), or reducing rule 1. The conflict means that either the grammar is ambiguous, or the parser lacks information to make the right decision. Indeed the grammar is ambiguous, as, since we did not specify the precedence of ‘/’, the sentence ‘NUM + NUM / NUM’ can be parsed as ‘NUM + (NUM / NUM)’, which corresponds to shifting ‘/’, or as ‘(NUM + NUM) / NUM’, which corresponds to reducing rule 1.

Because in LALR(1) parsing only a single decision can be made, Wisent arbitrarily chose to disable the reduction, see Conflicts. Discarded actions are reported in between square brackets.

Note that all the previous states had a single possible action: either shifting the next token and going to the corresponding state, or reducing a single rule. In the other cases, i.e., when shifting and reducing are both possible or when several reductions are possible, the lookahead is required to select the action. State 7 is one such state: if the lookahead is ‘*’ or ‘/’ then the action is shifting, otherwise the action is reducing rule 1. In other words, the first two items, corresponding to rule 1, are not eligible when the lookahead is ‘*’, since we specified that ‘*’ has higher precedence than ‘+’. More generally, some items are eligible only with some set of possible lookaheads.

States 8 to 10 are similar:

state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp '-' exp .   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '/' [reduce using rule 2 (exp)]
    $default    reduce using rule 2 (exp)


state 9

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp '*' exp .   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '/' shift, and go to state 6

    '/' [reduce using rule 3 (exp)]
    $default    reduce using rule 3 (exp)


state 10

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)
    exp  ->  exp '/' exp .   (rule 4)

    '+' shift, and go to state 3
    '-' shift, and go to state 4
    '*' shift, and go to state 5
    '/' shift, and go to state 6

    '+' [reduce using rule 4 (exp)]
    '-' [reduce using rule 4 (exp)]
    '*' [reduce using rule 4 (exp)]
    '/' [reduce using rule 4 (exp)]
    $default    reduce using rule 4 (exp)

Observe that state 10 contains conflicts due to the lack of precedence of ‘/’ with respect to ‘+’, ‘-’, and ‘*’, but also because the associativity of ‘/’ is not specified.

Finally, the state 11 (plus 12) is named the final state, or the accepting state:

state 11

    $EOI        shift, and go to state 12



state 12

    $default    accept

The end of input is shifted (‘$EOI shift,’) and the parser exits successfully (‘go to state 12’, which accepts).

3 Wisent Parsing

Wisent’s parser is what is called a bottom-up or shift-reduce parser, which repeatedly performs one of these actions:

shift: That is, pushes the value of the last lexical token read (the look-ahead token) onto a value stack, and reads a new one.
reduce: That is, replaces a nonterminal by its semantic value. The values of the components which form the right hand side of a rule are popped from the value stack and reduced by the semantic action of this rule. The result is pushed back on top of the value stack.

The parser will stop on:

accept: When all input has been successfully parsed. The semantic value of the start nonterminal is on top of the value stack.
error: When a syntax error (an unexpected token in input) has been detected. At this point the parser issues an error message and either stops or calls a recovery routine to try to resume parsing.

The above elementary actions are driven by the LALR(1) automaton built by wisent-compile-grammar from a context-free grammar.
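The effect of shift and reduce on the value stack can be replayed by hand. The sketch below does so for the input ‘1 + 2’; the helpers my-shift and my-reduce are hypothetical, and the sequence of steps is written out explicitly, whereas in Wisent it is dictated by the automaton.

```elisp
;; A sketch only: these helpers are hypothetical and hand-driven; in
;; Wisent the order of shifts and reductions comes from the automaton.
(defvar my-value-stack nil
  "Value stack of the sketch, most recent value first.")

(defun my-shift (value)
  "Push VALUE, the value of the look-ahead token, on the value stack."
  (push value my-value-stack))

(defun my-reduce (count action)
  "Pop COUNT values, call ACTION on them, and push back the result."
  (let ((args nil))
    (dotimes (_ count)
      (push (pop my-value-stack) args))       ; ARGS ends up in rule order
    (push (apply action args) my-value-stack)))

(defun my-parse-1-plus-2 ()
  "Replay the steps for the input 1 + 2 and return the final value."
  (setq my-value-stack nil)
  (my-shift "1")
  (my-reduce 1 #'string-to-number)            ; exp: NUM
  (my-shift "+")
  (my-shift "2")
  (my-reduce 1 #'string-to-number)            ; exp: NUM
  (my-reduce 3 (lambda (left _op right)       ; exp: exp '+' exp
                 (+ left right)))
  (car my-value-stack))                       ; accept: value on top

(my-parse-1-plus-2)
;; => 3
```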

Wisent’s parser is entered by calling the function:

Function: wisent-parse automaton lexer &optional error start

Parse input using the automaton specified in automaton.

automaton: Is an LALR(1) automaton generated by wisent-compile-grammar (see Wisent Grammar).
lexer: Is a function with no argument called by the parser to obtain the next terminal (token) in input (see What the parser must receive).
error: Is an optional reporting function called when a parse error occurs. It receives a message string to report. It defaults to the function wisent-message (see The error reporting function).
start: Specify the start symbol (nonterminal) used by the parser as its goal. It defaults to the start symbol defined in the grammar (see Wisent Grammar).
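Putting the pieces together, a call to wisent-parse might look like the following sketch. It assumes the Wisent libraries bundled with Emacs load under the feature names shown; the grammar, the token list and every my-calc- name are invented for the example. The lexer simply hands out tokens from a list, so no buffer is involved.

```elisp
;; Assumption: the Wisent libraries shipped with Emacs are available
;; under these feature names.  All `my-calc-' names are hypothetical.
(require 'semantic/wisent/comp)
(require 'semantic/wisent/wisent)

(defvar my-calc-automaton
  (wisent-compile-grammar
   '((NUM)
     ((left ?+) (left ?*))
     (exp ((NUM)        (string-to-number $1))
          ((exp ?+ exp) (+ $1 $3))
          ((exp ?* exp) (* $1 $3)))))
  "Automaton for a tiny expression grammar.")

(defvar my-calc-tokens nil
  "Tokens still to be delivered by `my-calc-lexer'.")

(defun my-calc-lexer ()
  "Return the next token, or the end-of-input token when none is left."
  (or (pop my-calc-tokens)
      (list wisent-eoi-term)))

(defun my-calc-parse (tokens)
  "Parse TOKENS, a list of lexical tokens, and return the result."
  (setq my-calc-tokens tokens)
  (wisent-parse my-calc-automaton #'my-calc-lexer))

;; Tokens for the input 1 + 2 * 3.
(my-calc-parse (list '(NUM "1") (list ?+ "+") '(NUM "2")
                     (list ?* "*") '(NUM "3")))
```

If the grammar behaves as described in this manual, ‘*’ binds more tightly than ‘+’ and the call should return 7.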

The following two normal hooks permit some useful processing to be done just before parsing starts and just after the parser has terminated, respectively.

Variable: wisent-pre-parse-hook: Normal hook run just before entering the LR parser engine.

Variable: wisent-post-parse-hook: Normal hook run just after the LR parser engine terminated.

What the parser must receive
Variables and macros useful in grammar actions.
The error reporting function
Error recovery
Debugging semantic actions

3.1 What the parser must receive

It is important to understand that the parser does not parse characters, but lexical tokens, and does not know anything about characters in text streams!

Reading input data to produce lexical tokens is performed by a lexer (also called a scanner) in a lexical analysis step, before the syntax analysis step performed by the parser. The parser automatically calls the lexer when it needs the next token to parse.

A Wisent lexer is an Emacs Lisp function with no argument. It must return a valid lexical token of the form:

(token-class value [start . end])

token-class: Is a category of lexical token identifying a terminal as specified in the grammar (see Wisent Grammar). It can be a symbol or a character literal.
value: Is the value of the lexical token. It can be of any valid Emacs Lisp data type.
start, end: Are the optional beginning and ending positions of value in the input stream.

When there are no more tokens to read the lexer must return the token (list wisent-eoi-term) to each request.

Variable: wisent-eoi-term: Predefined constant, End-Of-Input terminal symbol.
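As an illustration of the token format, here is a minimal sketch of a lexer that scans a string rather than a buffer. The names my-lex-input, my-lex-pos and my-string-lexer are hypothetical; only wisent-eoi-term comes from Wisent, and the sketch assumes it is provided by the feature required below. Positions are zero-based string indices here, whereas buffer positions would start at 1.

```elisp
;; Assumption: `wisent-eoi-term' is provided by this feature.
(require 'semantic/wisent/wisent)

;; All `my-' names are hypothetical and exist only for this sketch.
(defvar my-lex-input "" "String scanned by `my-string-lexer'.")
(defvar my-lex-pos 0 "Current index into `my-lex-input'.")

(defun my-string-lexer ()
  "Return the next token of `my-lex-input' as (CLASS VALUE START . END)."
  ;; Skip blanks found exactly at the current position.
  (when (eq (string-match "[ \t\n]+" my-lex-input my-lex-pos) my-lex-pos)
    (setq my-lex-pos (match-end 0)))
  (cond
   ;; Nothing left: answer every further request with end of input.
   ((>= my-lex-pos (length my-lex-input))
    (list wisent-eoi-term))
   ;; A run of digits becomes a NUM token whose value is the text.
   ((eq (string-match "[0-9]+" my-lex-input my-lex-pos) my-lex-pos)
    (let ((start my-lex-pos))
      (setq my-lex-pos (match-end 0))
      (cons 'NUM (cons (match-string 0 my-lex-input)
                       (cons start my-lex-pos)))))
   ;; Any other character is its own token class.
   (t
    (let ((start my-lex-pos))
      (setq my-lex-pos (1+ start))
      (cons (aref my-lex-input start)
            (cons (substring my-lex-input start my-lex-pos)
                  (cons start my-lex-pos)))))))

(setq my-lex-input "12 + 3"
      my-lex-pos 0)
(my-string-lexer)
;; => (NUM "12" 0 . 2)
```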

wisent-lex is an example of a lexer that reads lexical tokens produced by a Semantic lexer, and translates them into lexical tokens suitable to the Wisent parser. See also The Wisent Lex lexer.

To call the lexer in a semantic action use the function wisent-lexer. See also Variables and macros useful in grammar actions.

3.2 Variables and macros useful in grammar actions.

Variable: wisent-input: The last token read. This variable only has meaning in the scope of wisent-parse.

Function: wisent-lexer: Obtain the next terminal in input.

Function: wisent-region &rest positions: Return the start/end positions of the region including positions. Each element of positions is a pair (start-pos . end-pos) or nil. The returned value is the pair (min-start-pos . max-end-pos) or nil if no positions are available.
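For instance, wisent-region can combine the positions of several components, some of which may be missing. The positions below are invented, and the sketch assumes the function is provided by the feature required on the first line.

```elisp
;; Assumption: `wisent-region' is provided by this feature.
(require 'semantic/wisent/wisent)

;; Three components; the middle one has no position information.
(wisent-region '(10 . 14) nil '(17 . 25))
;; => (10 . 25)

;; No position information at all.
(wisent-region nil nil)
;; => nil
```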

3.3 The error reporting function

When the parser encounters a syntax error it calls a user-defined function. It must be an Emacs Lisp function with one argument: a string containing the message to report.

By default the parser uses this function to report error messages:

Function: wisent-message string &rest args: Print a one-line message if wisent-parse-verbose-flag is set. Pass string and args arguments to message.

Please Note:

wisent-message uses the following function to print lexical tokens:

Function: wisent-token-to-string token: Return a printed representation of lexical token token.

The general printed form of a lexical token is:

token(value)@location

To control the verbosity of the parser you can set to non-nil this variable:

Option: wisent-parse-verbose-flag: non-nil means to issue more messages while parsing.

Or interactively use the command:

Command: wisent-parse-toggle-verbose-flag: Toggle whether to issue more messages while parsing.

When the error reporting function is entered the variable wisent-input contains the unexpected token as returned by the lexer.

The error reporting function can be called from a semantic action too using the special macro wisent-error. When called from a semantic action entered by error recovery (see Error recovery) the value of the variable wisent-recovering is non-nil.

3.4 Error recovery

The error recovery mechanism of Wisent’s parser conforms to the one Bison uses. See (bison)Error Recovery, in the Bison manual for details.

To recover from a syntax error you must write rules to recognize the special token error. This is a terminal symbol that is automatically defined and reserved for error handling.

When the parser encounters a syntax error, it pops the state stack until it finds a state that allows shifting the error token. After it has been shifted, if the old look-ahead token is not acceptable to be shifted next, the parser reads tokens and discards them until it finds a token which is acceptable.

Strategies for error recovery depend on the choice of error rules in the grammar. A simple and useful strategy is simply to skip the rest of the current statement if an error is detected:

(statement (( error ?; )) ;; on error, skip until ';' is read
           )

It is also useful to recover to the matching close-delimiter of an opening-delimiter that has already been parsed:

(primary (( ?{ expr  ?} ))
         (( ?{ error ?} ))
         …
         )
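The first strategy can be tried out with a small grammar. The following sketch assumes the Wisent libraries bundled with Emacs load under the feature names shown; the grammar is adapted from the infix example of this manual, and every my-rec- name is hypothetical. A custom error reporting function collects the messages instead of displaying them.

```elisp
;; Assumption: the Wisent libraries shipped with Emacs are available
;; under these feature names.  All `my-rec-' names are hypothetical.
(require 'semantic/wisent/comp)
(require 'semantic/wisent/wisent)

(defvar my-rec-automaton
  (wisent-compile-grammar
   '((NUM)
     ((left ?+))
     (line ((exp ?\;)    (format "%s;" $1))
           ((error ?\;)  (progn "Error;")))   ; on error, skip to ';'
     (exp  ((NUM)        (string-to-number $1))
           ((exp ?+ exp) (+ $1 $3)))))
  "Automaton for a toy grammar with one error rule.")

(defvar my-rec-tokens nil "Tokens still to be read.")
(defvar my-rec-messages nil "Messages passed to the error function.")

(defun my-rec-lexer ()
  "Return the next token, or the end-of-input token when none is left."
  (or (pop my-rec-tokens) (list wisent-eoi-term)))

(defun my-rec-parse (tokens)
  "Parse TOKENS; return the result and the collected error messages."
  (setq my-rec-tokens tokens
        my-rec-messages nil)
  (let ((value (wisent-parse my-rec-automaton #'my-rec-lexer
                             (lambda (msg)
                               (push msg my-rec-messages)))))
    (list value (reverse my-rec-messages))))

;; Tokens for the malformed input 1 + + 2 ;
(my-rec-parse (list '(NUM "1") (list ?+ "+") (list ?+ "+")
                    '(NUM "2") (list ?\; ";")))
```

If recovery proceeds as described above, the second ‘+’ triggers a syntax error, the tokens up to ‘;’ are discarded, and the value of the error rule, "Error;", becomes the result, accompanied by at least one collected message.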

Note that error recovery rules may have actions, just as any other rules can. Here are some predefined hooks, variables, functions or macros, useful in such actions:

Variable: wisent-nerrs: The number of parse errors encountered so far.

Variable: wisent-recovering: non-nil means that the parser is recovering. This variable only has meaning in the scope of wisent-parse.

Function: wisent-error msg

Call the user supplied error reporting function with message msg (see The error reporting function).

For an example of use, See wisent-skip-token.

Function: wisent-errok

Resume generating error messages immediately for subsequent syntax errors.

The parser suppresses error messages for syntax errors that happen shortly after the first, until three consecutive input tokens have been successfully shifted.

Calling wisent-errok in an action makes error messages resume immediately. No error messages will be suppressed if you call it in an error rule’s action.

For an example of use, see wisent-skip-token.
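The suppression rule can be pictured with a small counter. The sketch below is a toy model written for this explanation, not code taken from Wisent: the function my-toy-count-reports and the event symbols error, shift and errok are hypothetical, and the threshold of three comes from the description above.

```elisp
;; Toy model of error-message suppression; all names are hypothetical.
(defun my-toy-count-reports (events)
  "Return how many errors in EVENTS would be reported.
EVENTS is a list of the symbols `error', `shift' and `errok'."
  (let ((suppress 0)               ; shifts still needed before reporting
        (reported 0))
    (dolist (event events reported)
      (cond
       ((eq event 'error)
        ;; Report only when no earlier error is still being suppressed.
        (when (zerop suppress)
          (setq reported (1+ reported)))
        (setq suppress 3))
       ((eq event 'shift)
        (when (> suppress 0)
          (setq suppress (1- suppress))))
       ((eq event 'errok)
        ;; Models the effect of calling wisent-errok in an action.
        (setq suppress 0))))))

(my-toy-count-reports '(error shift error))  ; => 1
(my-toy-count-reports '(error errok error))  ; => 2
```

In the first call the second error is silenced because only one token was shifted since the first error; in the second call the errok event lifts the suppression at once.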

Function: wisent-clearin

Discard the current lookahead token. This will cause a new lexical token to be read.

In an error rule’s action the previous lookahead token is reanalyzed immediately. wisent-clearin may be called to clear this token.

For example, suppose that on a parse error, an error handling routine is called that advances the input stream to some point where parsing should once again commence. The next symbol returned by the lexical scanner is probably correct. The previous lookahead token ought to be discarded with wisent-clearin.

For an example of use, see wisent-skip-token.

Function: wisent-abort: Abort parsing and save the lookahead token.

Function: wisent-set-region start end

Change the region of text matched by the current nonterminal. start and end are respectively the beginning and end positions of the region occupied by the group of components associated with this nonterminal. If start or end are not valid positions, the region is set to nil.

For an example of use, see wisent-skip-token.

Variable: wisent-discarding-token-functions

List of functions to be called when discarding a lexical token. These functions receive the discarded lexical token. When the parser encounters unexpected tokens, it can discard them, as directed by the error recovery rules: either when the parser reads tokens until one is found that can be shifted, or when a semantic action calls the function wisent-skip-token or wisent-skip-block. For language-specific hooks, make sure you define this as a local hook.

For example, in Semantic, this hook is set to the function wisent-collect-unmatched-syntax to collect unmatched lexical tokens (see Useful functions).
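A language mode could install its own function on this hook in a similar way. The following is a minimal sketch: my-discarded-tokens, my-log-discarded-token and my-mode-setup are hypothetical names, and the defvar of the hook is there only so that the example can be evaluated without loading Wisent.

```elisp
;; Declared here only so the sketch runs without Wisent loaded;
;; the real variable is defined by Wisent itself.
(defvar wisent-discarding-token-functions nil)

(defvar-local my-discarded-tokens nil
  "Hypothetical log of the tokens discarded in the current buffer.")

(defun my-log-discarded-token (token)
  "Record the discarded lexical TOKEN in `my-discarded-tokens'."
  (push token my-discarded-tokens))

(defun my-mode-setup ()
  "Install the logger as a buffer-local hook function."
  ;; The final t makes the hook local to the current buffer.
  (add-hook 'wisent-discarding-token-functions
            #'my-log-discarded-token nil t))
```

Calling my-mode-setup from a major mode hook keeps the logger out of buffers that use other grammars.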

Function: wisent-skip-token

Skip the lookahead token in order to resume parsing. Return nil. Must be used in error recovery semantic actions.

It typically looks like this:

(wisent-message "%s: skip %s" $action
                (wisent-token-to-string wisent-input))
(run-hook-with-args
 'wisent-discarding-token-functions wisent-input)
(wisent-clearin)
(wisent-errok)

Function: wisent-skip-block

Safely skip a block in order to resume parsing. Return nil. Must be used in error recovery semantic actions.

A block is data between an open-delimiter (syntax class ‘(’) and a matching close-delimiter (syntax class ‘)’):

(a parenthesized block)
[a block between brackets]
{a block between braces}

The following example uses wisent-skip-block to safely skip a block delimited by ‘LBRACE’ ({) and ‘RBRACE’ (}) tokens, when a syntax error occurs in ‘other-components’:

(block ((LBRACE other-components RBRACE))
       ((LBRACE RBRACE))
       ((LBRACE error)
        (wisent-skip-block))
       )
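The notion of a block, as delimited by syntax classes, can be explored with the standard Emacs function scan-lists. The sketch below only illustrates how Emacs finds a matching close-delimiter; it is not how wisent-skip-block is implemented, and the helper my-toy-block-end is a hypothetical name.

```elisp
;; Hypothetical helper illustrating blocks defined by syntax classes.
(defun my-toy-block-end (text open)
  "Return the position just after the block that starts at OPEN in TEXT.
OPEN is the buffer position of an open-delimiter once TEXT has been
inserted in a temporary buffer."
  (with-temp-buffer
    (insert text)
    (with-syntax-table (make-syntax-table)
      ;; Make sure braces have open/close delimiter syntax.
      (modify-syntax-entry ?{ "(}")
      (modify-syntax-entry ?} "){")
      ;; Scan forward over one balanced block.
      (scan-lists open 1 0))))

(my-toy-block-end "x {a (b) c} y" 3)  ; => 12
```

When the delimiters are unbalanced, scan-lists signals a scan-error instead of returning a position.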

3.5 Debugging semantic actions

Each semantic action is represented by a symbol interned in an obarray that is part of the LALR(1) automaton (see Compiling a grammar). Calling symbol-function on a semantic action symbol returns the semantic action lambda expression.

A semantic action symbol name has the form nonterminal:index, where nonterminal is the name of the nonterminal symbol the action belongs to, and index is an action sequence number within the scope of nonterminal. For example, this nonterminal definition:

input:
   line                     [input:0]
 | input line
   (format "%s %s" $1 $2)   [input:1]
 ;

will produce two semantic actions and their associated symbols:

input:0: A default action that returns $1.
input:1: An action that returns (format "%s %s" $1 $2).
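How symbols in a private obarray carry function definitions can be tried out independently of Wisent. The sketch below is an illustration under assumptions: my-toy-actions and my-toy-action are hypothetical names, the lambdas take their values as plain arguments (real semantic actions use a different calling convention), and obarray-make requires Emacs 25.1 or later.

```elisp
;; A private obarray standing in for the one inside an automaton.
(defvar my-toy-actions (obarray-make)
  "Hypothetical obarray holding toy semantic action symbols.")

;; Symbols named after the nonterminal:index convention.
(fset (intern "input:0" my-toy-actions)
      (lambda (v1) v1))
(fset (intern "input:1" my-toy-actions)
      (lambda (v1 v2) (format "%s %s" v1 v2)))

(defun my-toy-action (name)
  "Return the function stored under NAME in `my-toy-actions', or nil."
  (let ((symbol (intern-soft name my-toy-actions)))
    (and symbol (symbol-function symbol))))

(funcall (my-toy-action "input:1") "a" "b")  ; => "a b"
```

Because the symbols live in their own obarray, they do not clash with symbols of the same name in the global obarray.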

Debugging uses the Lisp debugger to investigate what is happening during execution of semantic actions. Three commands are available to debug semantic actions. They receive two arguments:

The automaton that contains the semantic action.
The semantic action symbol.

Command: wisent-debug-on-entry automaton function: Request automaton’s function to invoke debugger each time it is called. function must be a semantic action symbol that exists in automaton.

Command: wisent-cancel-debug-on-entry automaton function: Undo effect of wisent-debug-on-entry on automaton’s function. function must be a semantic action symbol that exists in automaton.

Command: wisent-debug-show-entry automaton function: Show the source of automaton’s semantic action function. function must be a semantic action symbol that exists in automaton.

4 How to use Wisent with Semantic

This section presents how the Wisent parser can be used to produce tags for the Semantic tool set.

Semantic tags form a hierarchy of Emacs Lisp data structures that describes a program in a way independent of programming languages. Tags map program declarations, like functions, methods, variables, data types, classes, includes, grammar rules, etc.

To use the Wisent parser with Semantic you have to define your grammar in WY form, a grammar format very close to the one used by Bison.

Please see (grammar-fw)Semantic Grammar Framework Manual, for more information on Semantic grammars.

Grammar styles
The Wisent Lex lexer

4.1 Grammar styles

Semantic parsing heavily depends on how the grammar is written. There are mainly two styles for writing a Wisent grammar intended to be used with the Semantic tool set: the Iterative style and the Bison style. Each one has pros and cons, and in certain cases it can be worth mixing the two styles!

Iterative style
Bison style
Mixed style
Start nonterminals
Useful functions

4.1.1 Iterative style

The iterative style is the preferred style to use with Semantic. It relies on an iterative parser back-end mechanism which parses start nonterminals one at a time and automagically skips unexpected lexical tokens in input.

Compared to rule-based iterative functions (see Bison style), iterative parsers are better in that they can handle obscure errors more cleanly.

Each start nonterminal must produce a raw tag by calling a TAG-like grammar macro with appropriate parameters. See also Start nonterminals.

Then, each parsing iteration automatically translates a raw tag into expanded tags, updating the raw tag structure with internal properties and buffer related data.

After parsing completes, it results in a tree of expanded tags.

The following example is a snippet of the iterative style Java grammar provided in the Semantic distribution in the file semantic/wisent/java-tags.wy.

…
;; Alternate entry points
;;    - Needed by partial re-parse
%start formal_parameter
…
;;    - Needed by EXPANDFULL clauses
%start formal_parameters
…

formal_parameter_list
  : PAREN_BLOCK
    (EXPANDFULL $1 formal_parameters)
  ;

formal_parameters
  : LPAREN
    ()
  | RPAREN
    ()
  | formal_parameter COMMA
  | formal_parameter RPAREN
  ;

formal_parameter
  : formal_parameter_modifier_opt type variable_declarator_id
    (VARIABLE-TAG $3 $2 nil :typemodifiers $1)
  ;

It shows the use of the EXPANDFULL grammar macro to parse a ‘PAREN_BLOCK’ which contains a ‘formal_parameter_list’. EXPANDFULL tells the parser to recursively parse ‘formal_parameters’ inside ‘PAREN_BLOCK’. The parser iterates until it has digested all available input data inside the ‘PAREN_BLOCK’, trying to match any of the ‘formal_parameters’ rules:

‘LPAREN’
‘RPAREN’
‘formal_parameter COMMA’
‘formal_parameter RPAREN’

At each iteration it will return a ‘formal_parameter’ raw tag, or nil to skip unwanted (single ‘LPAREN’ or ‘RPAREN’ for example) or unexpected input data. Those raw tags will be automatically expanded by the iterative back-end parser.

4.1.2 Bison style

What we call the Bison style is the traditional style of Bison’s grammars. Compared to iterative style, it is not straightforward to use grammars written in Bison style in Semantic, mainly because such grammars are designed to parse the whole input data in one pass and don’t use the iterative parser back-end mechanism (see Iterative style). With Bison style the parser is called once to parse the grammar start nonterminal.

The following example is a snippet of the Bison style Java grammar provided in the Semantic distribution in the file semantic/wisent/java.wy.

%start formal_parameter
…

formal_parameter_list
  : formal_parameter_list COMMA formal_parameter
    (cons $3 $1)
  | formal_parameter
    (list $1)
  ;

formal_parameter
  : formal_parameter_modifier_opt type variable_declarator_id
    (EXPANDTAG
     (VARIABLE-TAG $3 $2 nil :typemodifiers $1)
     )
  ;

The first consequence is that syntax errors are not automatically handled by Semantic. Thus, it is necessary to explicitly handle them at the grammar level, providing error recovery rules to skip unexpected input data.

The second consequence is that the iterative parser can’t do automatic tag expansion, except for the start nonterminal value. It is necessary to explicitly expand tags from the relevant semantic actions by calling the grammar macro EXPANDTAG with a raw tag as parameter. See also Start nonterminals, for incremental re-parse considerations.

4.1.3 Mixed style

%start grammar
;; Reparse
%start prologue epilogue declaration nonterminal rule
…

%%

grammar:
    prologue
  | epilogue
  | declaration
  | nonterminal
  | PERCENT_PERCENT
  ;
…

nonterminal:
    SYMBOL COLON rules SEMI
    (TAG $1 'nonterminal :children $3)
  ;

rules:
    lifo_rules
    (apply 'nconc (nreverse $1))
  ;

lifo_rules:
    lifo_rules OR rule
    (cons $3 $1)
  | rule
    (list $1)
  ;

rule:
    rhs
    (let* ((rhs $1)
           name type comps prec action elt)
      …
      (EXPANDTAG
       (TAG name 'rule :type type :value comps :prec prec :expr action)
       ))
  ;

This example shows how iterative and Bison styles can be combined in the same grammar to obtain a good compromise between grammar complexity and an efficient parsing strategy in an interactive environment.

‘nonterminal’ is parsed using iterative style via the main ‘grammar’ rule. The semantic action uses the TAG macro to produce a raw tag, automagically expanded by Semantic.

But the ‘rules’ part is parsed in Bison style! Why?

Rule delimiters are the colon (:), that follows the nonterminal name, and a final semicolon (;). Unfortunately these delimiters are not of the open-paren/close-paren type, and Emacs’s syntactic analyzer can’t easily isolate data between them to produce a ‘RULES_PART’ parenthesis-block-like lexical token. Consequently it is not possible to use EXPANDFULL to iterate in ‘RULES_PART’, like this:

nonterminal:
    SYMBOL COLON rules SEMI
    (TAG $1 'nonterminal :children $3)
  ;

rules:
    RULES_PART  ;; Map a parenthesis-block-like lexical token
    (EXPANDFULL $1 'rules)
  ;

rules:
    COLON
    ()
  | OR
    ()
  | SEMI
    ()
  | rhs
    (let* ((rhs $1)
           name type comps prec action elt)
      …
      (TAG name 'rule :type type :value comps :prec prec :expr action)
      )
  ;

In such cases, when it is difficult for Emacs to obtain parenthesis-block-like lexical tokens, the best solution is to use the traditional Bison style with error recovery!

In some extreme cases, it can also be convenient to extend the lexer, to deliver new lexical tokens, to simplify the grammar.
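The list handling done by the ‘lifo_rules’ and ‘rules’ actions of the mixed style grammar can be replayed in plain Emacs Lisp. This is a sketch under assumptions: my-toy-collect-rules is a hypothetical helper, and it assumes that the semantic value of each ‘rule’ is a list, as the use of nconc in the grammar suggests.

```elisp
;; Hypothetical replay of the `lifo_rules' and `rules' actions.
(defun my-toy-collect-rules (rule-values)
  "Combine RULE-VALUES the way the grammar actions would.
RULE-VALUES holds the semantic value of each `rule', in source order.
Each value is assumed to be a freshly allocated list."
  ;; lifo_rules: rule               => (list $1)
  (let ((acc (list (car rule-values))))
    ;; lifo_rules: lifo_rules OR rule => (cons $3 $1)
    (dolist (value (cdr rule-values))
      (setq acc (cons value acc)))
    ;; rules: lifo_rules            => (apply 'nconc (nreverse $1))
    (apply #'nconc (nreverse acc))))

(my-toy-collect-rules (list (list 'r1) (list 'r2a 'r2b) (list 'r3)))
;; => (r1 r2a r2b r3)
```

Consing onto the front keeps each reduction cheap; the single nreverse at the end restores source order before the lists are spliced together.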

4.1.4 Start nonterminals

When you write a grammar for Semantic, it is important to carefully indicate the start nonterminals. Each one defines an entry point in the grammar, and after parsing its semantic value is returned to the back-end iterative engine. Consequently:

The semantic value of a start nonterminal must be produced by a TAG-like grammar macro.

Start nonterminals are declared by %start statements. When nothing is specified the first nonterminal that appears in the grammar is the start nonterminal.

Generally, the following nonterminals must be declared as start symbols:

The main grammar entry point: Of course!
Nonterminals passed to EXPAND/EXPANDFULL: These grammar macros recursively parse a part of input data, based on rules of the given nonterminal.

For example, the following will parse ‘PAREN_BLOCK’ data using the ‘formal_parameters’ rules:
```
formal_parameter_list
  : PAREN_BLOCK
    (EXPANDFULL $1 formal_parameters)
  ;
```
The semantic value of ‘formal_parameters’ becomes the value of the EXPANDFULL expression. It is a list of Semantic tags spliced in the tags tree.

Because the automaton must know that ‘formal_parameters’ is a start symbol, you must declare it like this:
```
%start formal_parameters
```

The EXPANDFULL macro has an important side effect, related to the incremental re-parse mechanism of Semantic: the nonterminal symbol parameter passed to EXPANDFULL also becomes the reparse-symbol property of the tag returned by the EXPANDFULL expression.

When the buffer data mapped by a tag is modified, Semantic schedules an incremental re-parse of that data, using the tag’s reparse-symbol property as start nonterminal.

The rules associated to such start symbols must be carefully reviewed to ensure that the incremental parser will work!

Things are a little bit different when the grammar is written in Bison style.

The reparse-symbol property is set to the nonterminal symbol to which the rule that explicitly uses EXPANDTAG belongs.

For example:

rule:
    rhs
    (let* ((rhs $1)
           name type comps prec action elt)
      …
      (EXPANDTAG
       (TAG name 'rule :type type :value comps :prec prec :expr action)
       ))
  ;

This sets the reparse-symbol property of the expanded tag to ‘rule’. An important consequence is that:

Every nonterminal having any rule that calls EXPANDTAG in a semantic action should be declared as a start symbol!

4.1.5 Useful functions

Here is a description of some predefined functions it might be useful to know when writing new code to use Wisent in Semantic:

Function: wisent-collect-unmatched-syntax input

Add input lexical token to the cache of unmatched tokens, in variable semantic-unmatched-syntax-cache.

See implementation of the function wisent-skip-token in Error recovery, for an example of use.

4.2 The Wisent Lex lexer

The lexical analysis step of Semantic is performed by the general function semantic-lex. For more information, see (semantic-langdev)Semantic Language Development.

semantic-lex produces lexical tokens of the form:

(token-class start . end)

token-class: Is a symbol that identifies a lexical token class, like symbol, string, number, or PAREN_BLOCK.
start, end: Are the start and end positions of mapped data in the input buffer.

The Wisent parser doesn’t depend on the nature of the analyzed input stream (buffer, string, etc.), and requires that lexical tokens have a different form (see What the parser must receive):

(token-class value [start . end])

wisent-lex is the default Wisent lexer used in Semantic.

Function: wisent-lex

Return the next available lexical token in Wisent’s form.

The variable wisent-lex-istream contains the list of lexical tokens produced by semantic-lex. Pop the next token available and convert it to a form suitable for the Wisent parser.

Mapping of lexical tokens as produced by semantic-lex into equivalent Wisent lexical tokens is straightforward:

(token-class start . end)
     ⇒ (token-class value start . end)

value is the input buffer-substring from start to end.
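The mapping can be written as a small conversion function. The sketch below is an illustration, not the code of wisent-lex: my-toy-wisent-token is a hypothetical name, and it only converts one token against the current buffer, leaving aside the handling of wisent-lex-istream.

```elisp
;; Hypothetical converter from the semantic-lex form to the Wisent form.
(defun my-toy-wisent-token (token)
  "Convert TOKEN from (CLASS START . END) to (CLASS VALUE START . END).
VALUE is the text of the current buffer between START and END."
  (let ((class (car token))
        (start (cadr token))
        (end (cddr token)))
    (cons class
          (cons (buffer-substring start end)
                (cons start end)))))

(with-temp-buffer
  (insert "foo bar")
  (my-toy-wisent-token '(symbol 1 . 4)))
;; => (symbol "foo" 1 . 4)
```

The start and end positions are kept as the tail of the new token, so the parser can still relate a value to its location in the buffer.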

Appendix A GNU Free Documentation License

Version 1.3, 3 November 2008

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
https://fsf.org/

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

PREAMBLE
The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.

The “publisher” means any person or entity that distributes copies of the Document to the public.

A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.
COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
4. Preserve all the copyright notices of the Document.
5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
8. Include an unaltered copy of this License.
9. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
11. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
13. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
14. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
15. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”
COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.
FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See https://www.gnu.org/licenses/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Document.
RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

“Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

  Copyright (C)  year  your name.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.3
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with…Texts.” line with this:

    with the Invariant Sections being list their titles, with
    the Front-Cover Texts being list, and with the Back-Cover Texts
    being list.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Index

	Index Entry	Section

$
	`$action`:	Grammar format
	`$N`:	Grammar format
	`$nterm`:	Grammar format
	`$region`:	Grammar format
	`$regionn`:	Grammar format

A
	accept:	Wisent Parsing
	ambiguous grammar:	Conflicts
	associativity:	Grammar format
	automaton:	Compiling a grammar

B
	bottom-up parser:	Wisent Parsing

C
	compiling a grammar:	Compiling a grammar
	conflicts resolution:	Conflicts
	context-free grammar:	Wisent Grammar

D
	debugging semantic actions:	Debugging actions
	deterministic automaton:	Conflicts

E
	error recovery:	Error recovery
	error recovery actions:	Error recovery
	error recovery strategy:	Error recovery
	error reporting:	Report errors
	error token:	Error recovery
	expanded tag:	Iterative style
	`EXPANDFULL`:	Iterative style

G
	grammar bison style:	Bison style
	grammar coding conventions:	Grammar format
	grammar compilation:	Compiling a grammar
	grammar conflicts:	Conflicts
	grammar debugging:	Grammar Debugging
	grammar example:	Example
	grammar format:	Grammar format
	grammar iterative style:	Iterative style
	grammar mixed style:	Mixed style
	grammar styles:	Grammar styles
	grammar verbose description:	Grammar Debugging

I
	incremental re-parse:	Start nonterminals

L
	LALR(1) grammar:	Compiling a grammar
	lexer:	Writing a lexer
	lexical analysis:	Writing a lexer
	lexical token mapping:	Wisent Lex
	lexical tokens:	Writing a lexer
	look-ahead token:	Compiling a grammar

M
	middle-rule actions:	Grammar format

N
	nonterminal:	Wisent Grammar

P
	precedence level:	Grammar format

R
	raw tag:	Iterative style
	reduce:	Wisent Parsing
	reduce/reduce conflicts:	Conflicts
	reparse-symbol:	Start nonterminals
	`reparse-symbol` property:	Start nonterminals
	rule:	Wisent Grammar

S
	scanner:	Writing a lexer
	semantic action symbols:	Debugging actions
	semantic actions:	Grammar format
	`semantic-lex`:	Wisent Lex
	shift:	Wisent Parsing
	shift-reduce parser:	Wisent Parsing
	shift/reduce conflicts:	Conflicts
	start nonterminals:	Start nonterminals
	syntax error:	Wisent Parsing

T
	table-driven parser:	Wisent Parsing
	tags:	Wisent Semantic
	terminal:	Wisent Grammar
	token:	Wisent Grammar

U
	understanding the automaton:	Understanding the automaton

W
	`wisent-abort`:	Error recovery
	`wisent-cancel-debug-on-entry`:	Debugging actions
	`wisent-clearin`:	Error recovery
	`wisent-collect-unmatched-syntax`:	Useful functions
	`wisent-compile-grammar`:	Compiling a grammar
	`wisent-debug-flag`:	Grammar Debugging
	`wisent-debug-on-entry`:	Debugging actions
	`wisent-debug-show-entry`:	Debugging actions
	`wisent-discarding-token-functions`:	Error recovery
	`wisent-eoi-term`:	Writing a lexer
	`wisent-errok`:	Error recovery
	`wisent-error`:	Error recovery
	`wisent-input`:	Actions goodies
	`wisent-lex`:	Wisent Lex
	`wisent-lex-istream`:	Wisent Lex
	`wisent-lexer`:	Actions goodies
	`wisent-message`:	Report errors
	`wisent-nerrs`:	Error recovery
	`wisent-parse`:	Wisent Parsing
	`wisent-parse-toggle-verbose-flag`:	Report errors
	`wisent-parse-verbose-flag`:	Report errors
	`wisent-post-parse-hook`:	Wisent Parsing
	`wisent-pre-parse-hook`:	Wisent Parsing
	`wisent-recovering`:	Error recovery
	`wisent-region`:	Actions goodies
	`wisent-set-region`:	Error recovery
	`wisent-single-start-flag`:	Compiling a grammar
	`wisent-skip-block`:	Error recovery
	`wisent-skip-token`:	Error recovery
	`wisent-toggle-verbose-flag`:	Grammar Debugging
	`wisent-token-to-string`:	Report errors
	`wisent-verbose-flag`:	Grammar Debugging
	WY grammar format:	Wisent Semantic

Table of Contents

1 Wisent Overview

2 Wisent Grammar

2.1 Grammar format

2.2 Example

2.3 Compiling a grammar

2.4 Conflicts

2.4.1 Grammar debugging

2.4.2 Understanding the automaton

3 Wisent Parsing

3.1 What the parser must receive

3.2 Variables and macros useful in grammar actions.

3.3 The error reporting function

3.4 Error recovery

3.5 Debugging semantic actions

4 How to use Wisent with Semantic

4.1 Grammar styles

4.1.1 Iterative style

4.1.2 Bison style

4.1.3 Mixed style

4.1.4 Start nonterminals

4.1.5 Useful functions

4.2 The Wisent Lex lexer

Appendix A GNU Free Documentation License

ADDENDUM: How to use this License for your documents

Index