The parser buffer mechanism facilitates construction of parsers for complex grammars. It does this by providing an input stream with unbounded buffering and backtracking. The amount of buffering is under program control. The stream can backtrack to any position in the buffer.
The mechanism defines two data types: the parser buffer and the parser-buffer pointer. A parser buffer is like an input port with buffering and backtracking. A parser-buffer pointer is a pointer into the stream of characters provided by a parser buffer.
Note that all of the procedures defined here consider a parser buffer to contain a stream of Unicode characters.
There are several constructors for parser buffers:
Returns a parser buffer that buffers characters read from textual-input-port.
Returns a parser buffer that buffers the characters in the argument substring. This is equivalent to creating a string input port and calling textual-input-port->parser-buffer, but it runs faster and uses less memory.
Like substring->parser-buffer but buffers the entire string.
Returns a parser buffer that buffers the characters returned by calling source. Source is a procedure of three arguments: a string, a start index, and an end index (in other words, a substring specifier). Each time source is called, it writes some characters in the substring, and returns the number of characters written. When there are no more characters available, it returns zero. It must not return zero in any other circumstance.
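The source protocol described above can be sketched as follows. This is only an illustration: make-string-source is a hypothetical helper, not part of the library, and it assumes the caller always passes a non-empty substring range, so that a return value of zero can only mean that the input is exhausted.

```scheme
;; Hypothetical helper: builds a source procedure that hands out the
;; characters of TEXT a chunk at a time.
(define (make-string-source text)
  (let ((position 0))                       ; next unread index in TEXT
    (lambda (string start end)
      ;; Copy as many characters as fit in [start, end), limited by
      ;; what is left in TEXT.
      (let ((count (min (- end start)
                        (- (string-length text) position))))
        (do ((i 0 (+ i 1)))
            ((= i count))
          (string-set! string
                       (+ start i)
                       (string-ref text (+ position i))))
        (set! position (+ position count))
        ;; Zero is returned only when TEXT is exhausted (assuming a
        ;; non-empty range was requested).
        count))))
```

A procedure built this way could be passed to the source-based constructor, which the MIT/GNU Scheme manual names source->parser-buffer; that name does not appear in the text above and is stated here as an assumption.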
Parser buffers and parser-buffer pointers may be distinguished from other objects:
Returns #t if object is a parser buffer, otherwise returns #f.
Returns #t if object is a parser-buffer pointer, otherwise returns #f.
Characters can be read from a parser buffer much as they can be read from an input port. The parser buffer maintains an internal pointer indicating its current position in the input stream. Additionally, the buffer remembers all characters that were previously read, and can look at characters arbitrarily far ahead in the stream. It is this buffering capability that facilitates complex matching and backtracking.
Returns the next character in buffer, advancing the internal pointer past that character. If there are no more characters available, returns #f and leaves the internal pointer unchanged.
Returns the next character in buffer, or #f if no characters are available. Leaves the internal pointer unchanged.
Returns a character in buffer. Index is a non-negative integer specifying the character to be returned. If index is zero, returns the next available character; if it is one, returns the character after that, and so on. If index specifies a position after the last character in buffer, returns #f. Leaves the internal pointer unchanged.
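A minimal sketch of the three reading operations follows. The procedure names string->parser-buffer, read-parser-buffer-char, peek-parser-buffer-char, and parser-buffer-ref are taken from the MIT/GNU Scheme manual and are assumptions here, since the definition lines are not shown in this text.

```scheme
;; Assumed names: string->parser-buffer, peek-parser-buffer-char,
;; read-parser-buffer-char, parser-buffer-ref.
(define read-demo-buffer (string->parser-buffer "abc"))

;; Peeking does not move the internal pointer.
(define first-peek (peek-parser-buffer-char read-demo-buffer))   ; #\a

;; Reading returns the same character and advances past it.
(define first-read (read-parser-buffer-char read-demo-buffer))   ; #\a

;; Index 0 is the next unread character, index 1 the one after it.
(define next-char (parser-buffer-ref read-demo-buffer 0))        ; #\b
(define char-after (parser-buffer-ref read-demo-buffer 1))       ; #\c

;; An index past the last character yields #f.
(define past-end (parser-buffer-ref read-demo-buffer 2))         ; #f
```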
The internal pointer of a parser buffer can be read or written:
Returns a parser-buffer pointer object corresponding to the internal pointer of buffer.
Sets the internal pointer of buffer to the position specified by pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer’s characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
Returns a newly-allocated string consisting of all of the characters in buffer that fall between pointer and buffer’s internal pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer’s characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
Discards all characters in buffer that have already been read; in other words, all characters prior to the internal pointer. After this operation has completed, it is no longer possible to move the internal pointer backwards past the current position by calling set-parser-buffer-pointer!.
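The pointer operations combine into a simple save, consume, and backtrack pattern, sketched below. The names string->parser-buffer, read-parser-buffer-char, peek-parser-buffer-char, and get-parser-buffer-tail come from the MIT/GNU Scheme manual and are assumptions here, as their definition lines are not shown in this text.

```scheme
;; Assumed names: string->parser-buffer, read-parser-buffer-char,
;; peek-parser-buffer-char, get-parser-buffer-tail.
(define pointer-demo-buffer (string->parser-buffer "let x"))

;; Remember where this attempt started.
(define attempt-start (get-parser-buffer-pointer pointer-demo-buffer))

;; Consume three characters.
(read-parser-buffer-char pointer-demo-buffer)
(read-parser-buffer-char pointer-demo-buffer)
(read-parser-buffer-char pointer-demo-buffer)

;; The text between the saved pointer and the internal pointer.
(define consumed-text
  (get-parser-buffer-tail pointer-demo-buffer attempt-start))    ; "let"

;; Backtrack: the next read starts again at the saved position.
(set-parser-buffer-pointer! pointer-demo-buffer attempt-start)
(define char-after-backtrack
  (peek-parser-buffer-char pointer-demo-buffer))                 ; #\l
```

Calling discard-parser-buffer-head! once a construct has been fully parsed would release the characters before the internal pointer, at the cost of invalidating any saved pointers into that region.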
The next rather large set of procedures does conditional matching against the contents of a parser buffer. All matching is performed relative to the buffer’s internal pointer, so the first character to be matched against is the next character that would be returned by peek-parser-buffer-char. The returned value is always #t for a successful match, and #f otherwise. For procedures whose names do not end in ‘-no-advance’, a successful match also moves the internal pointer of the buffer forward to the end of the matched text; otherwise the internal pointer is unchanged.
Each of these procedures compares a single character in buffer to char. The basic comparison match-parser-buffer-char compares the character to char using char=?. The procedures whose names contain the ‘-ci’ modifier do case-insensitive comparison (i.e. they use char-ci=?). The procedures whose names contain the ‘not-’ modifier are successful if the character doesn’t match char.
These procedures compare the next character in buffer against char-set using char-in-set?.
These procedures match string against buffer’s contents. The ‘-ci’ procedures do case-insensitive matching.
These procedures match the specified substring against buffer’s contents. The ‘-ci’ procedures do case-insensitive matching.
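The advance and no-advance behaviour can be illustrated with a short sketch. Only match-parser-buffer-char and peek-parser-buffer-char are named in the text above; string->parser-buffer, match-parser-buffer-string, and match-parser-buffer-string-no-advance are names from the MIT/GNU Scheme manual and are assumptions here.

```scheme
;; Assumed names: string->parser-buffer, match-parser-buffer-string,
;; match-parser-buffer-string-no-advance.
(define match-demo-buffer (string->parser-buffer "if x"))

;; A failed match returns #f and leaves the internal pointer alone.
(define matched-else
  (match-parser-buffer-string match-demo-buffer "else"))          ; #f

;; A '-no-advance' match succeeds without consuming anything.
(define saw-if
  (match-parser-buffer-string-no-advance match-demo-buffer "if")) ; #t

;; The plain variant succeeds and moves past the matched text.
(define matched-if
  (match-parser-buffer-string match-demo-buffer "if"))            ; #t

;; Matching continues from the new position.
(define matched-space
  (match-parser-buffer-char match-demo-buffer #\space))           ; #t
(define char-after-matches
  (peek-parser-buffer-char match-demo-buffer))                    ; #\x
```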
The remaining procedures provide information that can be used to identify locations in a parser buffer’s stream.
Returns a string describing the location of pointer in terms of its character and line indexes. The resulting string is meant to be presented to an end user to direct their attention to a feature in the input stream. In this string, the indexes are presented as one-based numbers.
Pointer may alternatively be a parser buffer, in which case it is equivalent to having specified the buffer’s internal pointer.
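One plausible use of the location string is in a parse-error message, sketched below. The name parser-buffer-position-string is taken from the MIT/GNU Scheme manual and is an assumption here, as are string->parser-buffer and read-parser-buffer-char; parse-failure-message is a hypothetical helper. The exact wording of the location string is not specified above, so no expected output is shown.

```scheme
;; Assumed names: parser-buffer-position-string,
;; string->parser-buffer, read-parser-buffer-char.
;; Hypothetical helper: formats a message for the end user.
(define (parse-failure-message buffer expected)
  ;; Passing the buffer itself reports its internal pointer.
  (string-append "Expected " expected " at "
                 (parser-buffer-position-string buffer)))

(define position-demo-buffer (string->parser-buffer "ab\ncd"))

;; Move past "ab" and the newline, so the position is on line two.
(read-parser-buffer-char position-demo-buffer)
(read-parser-buffer-char position-demo-buffer)
(read-parser-buffer-char position-demo-buffer)

(define failure-message
  (parse-failure-message position-demo-buffer "a digit"))
```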