The lexical syntax determines how a character sequence is split into a sequence of lexemes, omitting non-significant portions such as comments and whitespace. The character sequence is assumed to be text according to the Unicode standard. Some of the lexemes, such as identifiers, representations of number objects, and strings, are syntactic data in the datum syntax, and thus represent objects. Besides the formal account of the syntax, this section also describes what datum values are represented by these syntactic data.
The lexical syntax, in the description of comments, contains a forward reference to datum, which is described as part of the datum syntax. Being comments, however, these datums do not play a significant role in the syntax.
Case is significant except in representations of booleans, number objects, and in hexadecimal numbers specifying Unicode scalar values. For example, #x1A and #X1a are equivalent. The identifier Foo is, however, distinct from the identifier FOO.
Interlexeme-space may occur on either side of any lexeme, but not within a lexeme.
Identifiers, the dot (.), numbers, characters, and booleans must be terminated by a delimiter or by the end of the input.
lexeme ::= identifier | boolean | number | character | string
    | ( | ) | [ | ] | #( | ' | ` | , | ,@ | .
    | #' | #` | #, | #,@
delimiter ::= ( | ) | [ | ] | " | ; | # | whitespace
((UNFINISHED))
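The termination rule can be made concrete with a delimiter test that follows the delimiter production above. The following is a minimal sketch in Java, Kawa's host language. Delimiters and its methods are hypothetical helpers, not part of Kawa's reader. The sketch approximates whitespace with Character.isSpaceChar, which covers the Zs, Zl, and Zp categories, plus the control characters listed in the whitespace grammar.

```java
final class Delimiters {
    // The explicit delimiter characters: ( ) [ ] " ; #
    private static final String DELIMITER_CHARS = "()[]\";#";
    // Whitespace characters outside Zs/Zl/Zp: tab, LF, VT, FF, CR, NEL
    private static final String CONTROL_WHITESPACE = "\t\n\u000B\f\r\u0085";

    /** True if the code point may end an identifier, number, character, or boolean. */
    static boolean isDelimiter(int cp) {
        return DELIMITER_CHARS.indexOf(cp) >= 0
            || CONTROL_WHITESPACE.indexOf(cp) >= 0
            || Character.isSpaceChar(cp); // categories Zs, Zl, Zp (includes plain space)
    }

    /** True if a lexeme ending just before index end is followed by a delimiter or end of input. */
    static boolean isProperlyTerminated(String src, int end) {
        return end == src.length() || isDelimiter(src.codePointAt(end));
    }
}
```

With this check, abc in abc(def) is properly terminated by the parenthesis. By contrast, 123 followed directly by abc is not, so a reader would reject 123abc rather than split it into two lexemes.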
Line endings are significant in Scheme in single-line comments and within string literals. In Scheme source code, any of the line endings in line-ending marks the end of a line. Moreover, the two-character line endings carriage-return linefeed and carriage-return next-line each count as a single line ending. In a string literal, a line-ending not preceded by a \ stands for a linefeed character, which is the standard line-ending character of Scheme.
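These rules can be shown with a helper that rewrites every line-ending as a single linefeed, treating carriage-return linefeed and carriage-return next-line as one ending each. This is a minimal Java sketch under that assumption. LineEndings, normalize, and countLines are hypothetical names; this is not the Kawa reader's actual implementation.

```java
final class LineEndings {
    private static final char LF = '\n';
    private static final char CR = '\r';
    private static final char NEL = (char) 0x85;   // next-line
    private static final char LS = (char) 0x2028;  // line-separator

    /** Rewrites every line-ending from the grammar as a single linefeed. */
    static String normalize(String src) {
        StringBuilder out = new StringBuilder(src.length());
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == CR) {
                // CR LF and CR NEL each count as a single line ending
                if (i + 1 < src.length() && (src.charAt(i + 1) == LF || src.charAt(i + 1) == NEL)) {
                    i++;
                }
                out.append(LF);
            } else if (c == NEL || c == LS) {
                out.append(LF);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Counts lines after normalization: one more than the number of line endings. */
    static int countLines(String src) {
        String s = normalize(src);
        int lines = 1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == LF) lines++;
        }
        return lines;
    }
}
```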
intraline-whitespace ::= space | character-tabulation
whitespace ::= intraline-whitespace | linefeed | line-tabulation | form-feed
    | carriage-return | next-line
    | any character whose category is Zs, Zl, or Zp
line-ending ::= linefeed | carriage-return | carriage-return linefeed
    | next-line | carriage-return next-line | line-separator
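The whitespace productions map directly onto Java's Unicode category constants. The following minimal sketch classifies a code point according to the grammar above. SchemeWhitespace is a hypothetical helper name and is not taken from Kawa.

```java
final class SchemeWhitespace {
    /** intraline-whitespace: space or character-tabulation. */
    static boolean isIntralineWhitespace(int cp) {
        return cp == ' ' || cp == '\t';
    }

    /** whitespace, following the grammar above. */
    static boolean isWhitespace(int cp) {
        if (isIntralineWhitespace(cp)) return true;
        switch (cp) {
            case 0x0A: // linefeed
            case 0x0B: // line-tabulation
            case 0x0C: // form-feed
            case 0x0D: // carriage-return
            case 0x85: // next-line
                return true;
            default:
                int type = Character.getType(cp);
                return type == Character.SPACE_SEPARATOR        // Zs
                    || type == Character.LINE_SEPARATOR         // Zl
                    || type == Character.PARAGRAPH_SEPARATOR;   // Zp
        }
    }
}
```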
comment ::= ; all subsequent characters up to a line-ending or paragraph-separator
    | nested-comment
    | #; interlexeme-space datum
    | shebang-comment
nested-comment ::= #| comment-text comment-cont* |#
comment-text ::= character sequence not containing #| or |#
comment-cont ::= nested-comment comment-text
atmosphere ::= whitespace | comment
interlexeme-space ::= atmosphere*
As a special case, the characters #!/ are treated as starting a comment, but only at the beginning of a file. These characters are used on Unix systems as a shebang interpreter directive. The Kawa reader skips the entire line. If the last non-whitespace character is \ (backslash), then the following line is also skipped, and so on.
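The skipping rule can be sketched in Java as follows. The sketch assumes the shebang lines end in plain linefeeds and does not handle the other line-ending forms. ShebangSkipper and skipShebang are hypothetical names, not Kawa's API.

```java
final class ShebangSkipper {
    /** Returns the index just past the skipped shebang line(s), or 0 if src has no shebang. */
    static int skipShebang(String src) {
        if (!src.startsWith("#!/")) return 0;  // only recognized at the very start of the file
        int pos = 0;
        while (pos < src.length()) {
            int eol = src.indexOf('\n', pos);
            int end = (eol < 0) ? src.length() : eol;
            String line = src.substring(pos, end).stripTrailing();
            pos = (eol < 0) ? src.length() : eol + 1;
            // A trailing backslash (ignoring trailing whitespace) continues the skip
            if (!line.endsWith("\\")) break;
        }
        return pos;
    }
}
```

For example, given the two lines #!/bin/sh \ and exec kawa "$0" followed by (foo), both shebang lines are skipped and reading resumes at (foo).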
Whitespace characters are spaces, linefeeds, carriage returns, character tabulations, form feeds, line tabulations, and any other character whose category is Zs, Zl, or Zp. Whitespace is used for improved readability and as necessary to separate lexemes from each other. Whitespace may occur between any two lexemes, but not within a lexeme. Whitespace may also occur inside a string, where it is significant.
The lexical syntax includes several comment forms. In all cases, comments are invisible to Scheme, except that they act as delimiters, so, for example, a comment cannot appear in the middle of an identifier or representation of a number object.
A semicolon (;) indicates the start of a line comment. The comment continues to the end of the line on which the semicolon appears.
Another way to indicate a comment is to prefix a datum with #;, possibly with interlexeme-space before the datum. The comment consists of the comment prefix #; and the datum together. This notation is useful for “commenting out” sections of code.
Block comments may be indicated with properly nested #| and |# pairs.
#|
   The FACT procedure computes the factorial
   of a non-negative integer.
|#
(define fact
  (lambda (n)
    ;; base case
    (if (= n 0)
        #;(= n 1)
        1       ; identity of *
        (* n (fact (- n 1))))))
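Because #| and |# pairs nest, skipping a block comment requires a depth counter rather than a search for the first |#. The following minimal Java sketch follows the nested-comment production. CommentSkipper and skipNestedComment are hypothetical names and are not taken from Kawa's source.

```java
final class CommentSkipper {
    /** Given "#|" at index pos, returns the index just past its matching "|#". */
    static int skipNestedComment(String src, int pos) {
        if (!src.startsWith("#|", pos)) {
            throw new IllegalArgumentException("no #| at index " + pos);
        }
        int depth = 0;
        int i = pos;
        while (i < src.length()) {
            if (src.startsWith("#|", i)) {        // an inner comment opens
                depth++;
                i += 2;
            } else if (src.startsWith("|#", i)) { // the innermost open comment closes
                depth--;
                i += 2;
                if (depth == 0) return i;
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated block comment");
    }
}
```

Applied to #| a #| b |# c |#(x), the skipper consumes both levels of comment and stops at (x).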
identifier ::= initial subsequent* | peculiar-identifier
initial ::= constituent | special-initial | inline-hex-escape
letter ::= a | b | c | ... | z | A | B | C | ... | Z
constituent ::= letter
    | any character whose Unicode scalar value is greater than
      127, and whose category is Lu, Ll, Lt, Lm, Lo, Mn,
      Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co
special-initial ::= ! | $ | % | & | * | / | < | = | > | ? | ^ | _ | ~
subsequent ::= initial | digit
    | any character whose category is Nd, Mc, or Me
    | special-subsequent
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
oct-digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
hex-digit ::= digit | a | A | b | B | c | C | d | D | e | E | f | F
special-subsequent ::= + | - | . | @
escape-sequence ::= inline-hex-escape
    | \ character-except-x
    | multi-escape-sequence
inline-hex-escape ::= \x hex-scalar-value ;
hex-scalar-value ::= hex-digit+
multi-escape-sequence ::= | symbol-element* |
symbol-element ::= any character except | or \
    | inline-hex-escape
    | mnemonic-escape
    | \|
character-except-x ::= any character except x
peculiar-identifier ::= + | - | ... | -> subsequent*
Most identifiers allowed by other programming languages are also acceptable to Scheme. In general, a sequence of letters, digits, and “extended alphabetic characters” is an identifier when it begins with a character that cannot begin a representation of a number object. In addition, +, -, and ... are identifiers, as is a sequence of letters, digits, and extended alphabetic characters that begins with the two-character sequence ->. Here are some examples of identifiers:
lambda q soup list->vector + V17a <= a34kTMNs ->- the-word-recursion-has-many-meanings
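For the ASCII subset of the grammar, the identifier rule can be checked mechanically. The following minimal Java sketch makes several simplifying assumptions. It ignores Unicode constituents and escape sequences, and it omits the colon from special-initial, as the production above does. IdentifierCheck is a hypothetical name.

```java
final class IdentifierCheck {
    // special-initial characters from the grammar above (no colon)
    private static final String SPECIAL_INITIAL = "!$%&*/<=>?^_~";
    // special-subsequent characters
    private static final String SPECIAL_SUBSEQUENT = "+-.@";

    static boolean isInitial(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || SPECIAL_INITIAL.indexOf(c) >= 0;
    }

    static boolean isSubsequent(char c) {
        return isInitial(c) || (c >= '0' && c <= '9') || SPECIAL_SUBSEQUENT.indexOf(c) >= 0;
    }

    /** ASCII-only check of: initial subsequent* | peculiar-identifier. */
    static boolean isIdentifier(String s) {
        // peculiar-identifier: + | - | ... | -> subsequent*
        if (s.equals("+") || s.equals("-") || s.equals("...")) return true;
        int start;
        if (s.startsWith("->")) {
            start = 2;
        } else if (!s.isEmpty() && isInitial(s.charAt(0))) {
            start = 1;
        } else {
            return false;  // e.g. a leading digit, which could begin a number
        }
        for (int i = start; i < s.length(); i++) {
            if (!isSubsequent(s.charAt(i))) return false;
        }
        return true;
    }
}
```

Every example in the list above passes this check. Strings such as 1abc, -x, and .foo are rejected because their first character could begin a number or is not an initial.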
Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:
! $ % & * + - . / < = > ? @ ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than 127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within identifiers. In addition, any character can be used within an identifier when specified using an escape-sequence. For example, the identifier H\x65;llo is the same as the identifier Hello.
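The inline-hex-escape form \x hex-scalar-value ; can be expanded as shown in the following minimal Java sketch. It assumes the input is the raw source text of one identifier. It does not reject surrogate code points, which are not Unicode scalar values. InlineHexEscape and expand are hypothetical names.

```java
final class InlineHexEscape {
    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    /** Replaces each \xHH...; escape with the character it denotes. */
    static String expand(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && i + 1 < src.length() && src.charAt(i + 1) == 'x') {
                int semi = src.indexOf(';', i + 2);
                if (semi < 0 || semi == i + 2) {
                    throw new IllegalArgumentException("malformed inline hex escape at " + i);
                }
                String hex = src.substring(i + 2, semi);
                for (char h : hex.toCharArray()) {
                    // hex-digit+ only; case is not significant in hex digits
                    if (HEX_DIGITS.indexOf(h) < 0) {
                        throw new IllegalArgumentException("bad hex digit '" + h + "'");
                    }
                }
                out.appendCodePoint(Integer.parseInt(hex, 16));
                i = semi + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, expand applied to H\x65;llo yields Hello, because hexadecimal 65 is the code point of e.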
Kawa supports two additional non-R6RS ways of making identifiers using special characters, both taken from Common Lisp: any character (except x) following a backslash is treated as if it were a letter, as is any character between a pair of vertical bars.
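The two Common Lisp-style forms can be illustrated by decoding an identifier's source text into its name. This is a simplified Java sketch, not Kawa's reader. It treats \c as the literal character c and text between vertical bars as literal, with \| standing for a bar. It passes \x...; escapes through unchanged for separate hex-escape handling. LispStyleEscapes and decode are hypothetical names.

```java
final class LispStyleEscapes {
    /** Decodes backslash and |...| escapes in an identifier's source text. */
    static String decode(String src) {
        StringBuilder out = new StringBuilder();
        boolean inBars = false;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '|') {
                inBars = !inBars;  // toggles |...| quoting; the bars themselves are dropped
            } else if (c == '\\' && i + 1 < src.length() && src.charAt(i + 1) != 'x') {
                out.append(src.charAt(++i));  // \c stands for the literal character c
            } else {
                out.append(c);  // ordinary character, or the start of an inline hex escape
            }
        }
        if (inBars) throw new IllegalArgumentException("unterminated |...| in identifier");
        return out.toString();
    }
}
```

Under these assumptions, a\ b and |a b| both denote the three-character name a b.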
Identifiers have two uses within Scheme programs:
Any identifier may be used as a variable or as a syntactic keyword.
When an identifier appears as a literal or within a literal, it is being used to denote a symbol.
In contrast with older versions of Scheme, the syntax distinguishes between upper and lower case in identifiers and in characters specified via their names, but not in numbers, nor in inline hex escapes used in the syntax of identifiers, characters, or strings. The following directives give explicit control over case folding.
These directives may appear anywhere comments are permitted and are treated as comments, except that they affect the reading of subsequent data. The #!fold-case directive causes the read procedure to case-fold (as if by string-foldcase) each identifier and character name subsequently read from the same port. The #!no-fold-case directive causes the read procedure to return to the default, non-folding behavior.
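The directives act as a per-port switch that stays in effect until the next directive. The following Java sketch models that switch over a list of already-split identifier tokens. It uses toLowerCase(Locale.ROOT) as an approximation of string-foldcase, and the two differ for some characters. FoldCaseReader and readIdentifiers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FoldCaseReader {
    /** Returns the identifiers as read, honoring #!fold-case and #!no-fold-case. */
    static List<String> readIdentifiers(List<String> tokens) {
        boolean fold = false;  // default: case is significant
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.equals("#!fold-case")) {
                fold = true;       // directive is consumed like a comment
            } else if (t.equals("#!no-fold-case")) {
                fold = false;
            } else {
                // approximation of string-foldcase
                out.add(fold ? t.toLowerCase(Locale.ROOT) : t);
            }
        }
        return out;
    }
}
```

Under this model, the tokens Foo, #!fold-case, Foo, BAR, #!no-fold-case, Baz read as Foo, foo, bar, Baz.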
Note that the colon (:) is treated specially for colon notation in Kawa Scheme, though it is a special-initial in standard Scheme (R6RS).
((INCOMPLETE))
number ::= ((TODO)) | quantity
decimal ::= digit+ optional-exponent
    | . digit+ optional-exponent
    | digit+ . digit+ optional-exponent
optional-exponent ::= empty
    | exponent-marker optional-sign digit+
exponent-marker ::= e | s | f | d | l
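Because these decimal productions contain no recursion, they can be written as a single regular expression. The following minimal Java sketch applies the grammar literally. For example, it rejects 12., which the productions above do not cover. It also assumes optional-sign means an optional + or -, and it matches case-insensitively because case is not significant in numbers. DecimalSyntax is a hypothetical name.

```java
import java.util.regex.Pattern;

final class DecimalSyntax {
    // digit+ | . digit+ | digit+ . digit+, then an optional exponent-marker optional-sign digit+
    private static final Pattern DECIMAL = Pattern.compile(
        "(?:[0-9]+|\\.[0-9]+|[0-9]+\\.[0-9]+)(?:[esfdl][+-]?[0-9]+)?",
        Pattern.CASE_INSENSITIVE);

    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }
}
```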
The letter used for the exponent in a floating-point literal determines its type:
e
    Returns a gnu.math.DFloat, for example 12e2. Note this matches the default when there is no exponent-marker.
s or f
    Returns a primitive float (or java.lang.Float when boxed as an object), for example 12s2 or 12f2.
d
    Returns a primitive double (or java.lang.Double when boxed), for example 12d2.
l
    Returns a java.math.BigDecimal, for example 12l2.
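The marker-to-type mapping can be imitated with standard Java types. The following sketch makes one significant substitution, labeled here and in the code: it returns a java.lang.Double for e and for literals without a marker, where Kawa returns a gnu.math.DFloat. That keeps the example free of the Kawa runtime. ExponentMarker and parseDecimal are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.Locale;

final class ExponentMarker {
    /** Parses a decimal literal and boxes it according to its exponent marker. */
    static Object parseDecimal(String text) {
        String s = text.toLowerCase(Locale.ROOT);  // case is not significant in numbers
        int i = 0;
        while (i < s.length() && ((s.charAt(i) >= '0' && s.charAt(i) <= '9') || s.charAt(i) == '.')) {
            i++;
        }
        if (i == s.length()) {
            // No exponent marker: same as 'e' (Kawa: gnu.math.DFloat; Double stands in here)
            return Double.parseDouble(s);
        }
        char marker = s.charAt(i);
        // Rewrite the marker as 'e' so the standard Java parsers accept the exponent
        String normalized = s.substring(0, i) + "e" + s.substring(i + 1);
        switch (marker) {
            case 'e': return Double.parseDouble(normalized);  // Kawa: gnu.math.DFloat
            case 's':
            case 'f': return Float.parseFloat(normalized);    // boxed java.lang.Float
            case 'd': return Double.parseDouble(normalized);  // boxed java.lang.Double
            case 'l': return new BigDecimal(normalized);      // java.math.BigDecimal
            default: throw new NumberFormatException("unknown exponent marker: " + marker);
        }
    }
}
```

For example, 12f2 yields a Float equal to 1200, and 12l2 yields a BigDecimal numerically equal to 1200.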