Kawa supports two syntaxes of string literals: the traditional, portable, double-quote-delimited literals like "this"; and extended SRFI-109 quasi-literals like &{this}.
string ::= " string-element* "
string-element ::= any character other than " or \
    | mnemonic-escape
    | \"
    | \\
    | \ intraline-whitespace* line-ending intraline-whitespace*
    | inline-hex-escape
mnemonic-escape ::= \a | \b | \t | \n | \r | ... (see below)
A string is written as a sequence of characters enclosed within quotation marks ("). Within a string literal, various escape sequences represent characters other than themselves. Escape sequences always start with a backslash (\):
\a
    Alarm (bell), #\x0007.
\b
    Backspace, #\x0008.
\e
    Escape, #\x001B.
\f
    Form feed, #\x000C.
\n
    Linefeed (newline), #\x000A.
\r
    Return, #\x000D.
\t
    Character tabulation, #\x0009.
\v
    Vertical tab, #\x000B.
\C-x
\^x
    The character whose scalar value is the scalar value of x masked (ANDed) with #x9F. This is an alternative way to write the ASCII control characters: for example, "\C-m" or "\^m" is the same as "\r" (the character #\x000D). As a special case, \^? is rubout (delete), \x7f;.
\x hex-scalar-value ;
\X hex-scalar-value ;
    A hex encoding that gives the scalar value of a character.
\ oct-digit+
    At most three octal digits that give the scalar value of a character. (Historical, for C compatibility.)
\u hex-digit+
    Exactly four hex digits that give the scalar value of a character. (Historical, for Java compatibility.)
\M-x
    (Historical, for Emacs Lisp.) Set the meta bit (high bit of a single byte) of the following character x.
\|
    Vertical line, #\x007C. (Not useful for string literals, but useful for symbols.)
\"
    Double quote, #\x0022.
\\
    Backslash, #\x005C.
\ intraline-whitespace* line-ending intraline-whitespace*
    Nothing (ignored). Allows you to split up a long string over multiple lines; ignoring initial whitespace on the continuation lines allows you to indent them.
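To make the escape table concrete, here is a small Python model (not Kawa's reader) that decodes a subset of the escapes in the body of a "..." literal: the mnemonic escapes, \C-x and \^x with the #x9F mask and the \^? special case, and \x...; hex escapes. The function name decode_escapes is hypothetical, and the octal, \u, \M- and line-continuation escapes are deliberately left out.

```python
# Mnemonic escapes from the table, mapped to the characters they denote.
MNEMONIC = {
    "a": "\x07", "b": "\x08", "e": "\x1b", "f": "\x0c", "n": "\n",
    "r": "\r", "t": "\t", "v": "\x0b", "|": "|", '"': '"', "\\": "\\",
}


def decode_escapes(body: str) -> str:
    """Decode a subset of escapes in the body of a "..." literal."""
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c != "\\":
            out.append(c)  # ordinary characters stand for themselves
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in MNEMONIC:
            out.append(MNEMONIC[nxt])
            i += 2
        elif body.startswith("C-", i + 1):
            # \C-x: scalar value of x ANDed with #x9F
            out.append(chr(ord(body[i + 3]) & 0x9F))
            i += 4
        elif nxt == "^":
            # \^x: same mask, except that \^? is rubout (delete)
            x = body[i + 2]
            out.append("\x7f" if x == "?" else chr(ord(x) & 0x9F))
            i += 3
        elif nxt in "xX":
            # \x<hex>; or \X<hex>; : hex scalar value terminated by ";"
            end = body.index(";", i)
            out.append(chr(int(body[i + 2:end], 16)))
            i = end + 1
        else:
            raise ValueError("escape not modeled in this sketch: \\" + nxt)
    return "".join(out)
```

For example, "m" is #x6D, and #x6D ANDed with #x9F is #x0D, which is why \C-m denotes the return character.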
Except for a line ending, any character outside of an escape sequence stands for itself in the string literal. A line ending which is preceded by \ intraline-whitespace* expands to nothing (along with any intraline-whitespace that follows it at the start of the next line), and can be used to indent strings for improved legibility. Any other line ending has the same effect as inserting a \n character into the string.
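The line-ending rules can likewise be sketched in Python, as a pre-pass that leaves the other escapes untouched. This hypothetical helper assumes intraline whitespace means spaces and tabs; it removes each backslash continuation together with the whitespace between the backslash and the line ending and the whitespace after it, and turns any remaining CR or CR LF line ending into a single \n. An escaped backslash \\ is matched first so that it is not mistaken for the start of a continuation.

```python
import re

# Either an escaped backslash, or a continuation: "\", optional spaces/tabs,
# a line ending, then optional spaces/tabs at the start of the next line.
CONTINUATION_OR_BACKSLASH = re.compile(r"\\\\|\\[ \t]*(?:\r\n|\r|\n)[ \t]*")


def process_line_endings(source: str) -> str:
    # Keep "\\" as is; drop every continuation sequence.
    text = CONTINUATION_OR_BACKSLASH.sub(
        lambda m: m.group() if m.group() == "\\\\" else "", source)
    # Any other line ending acts like a single "\n".
    return re.sub(r"\r\n|\r", "\n", text)
```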
Examples:
"The word \"recursion\" has many meanings."
"Another example:\ntwo lines of text"
"Here’s text \
   containing just one line"
"\x03B1; is named GREEK SMALL LETTER ALPHA."
The following syntax is a string template (also called a string quasi-literal or “here document”):
&{Hello &[name]!}
Assuming the variable name evaluates to "John", the example evaluates to "Hello John!".
The Kawa reader converts the above example to:
($string$ "Hello " $<<$ name $>>$ "!")
See the SRFI-109 specification for details.
extended-string-literal ::= &{ [initial-ignored] string-literal-part* }
string-literal-part ::= any character except &, { or }
    | { string-literal-part* }
    | char-ref
    | entity-ref
    | special-escape
    | enclosed-part
You can use the plain "string" syntax for longer multiline strings, but &{string} has various advantages. The syntax is less error-prone because the start-delimiter is different from the end-delimiter. Also note that nested braces are allowed: a right brace } is only an end-delimiter if it is unbalanced, so you would seldom need to escape it:
&{This has a {braced} section.} ⇒ "This has a {braced} section."
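A rough Python sketch of the brace-balancing rule, with a hypothetical function name: it tracks nesting depth and treats a } as the end-delimiter only when the depth is zero. It does not look inside enclosed &[...] expressions, so a brace inside, say, a string argument there would confuse it.

```python
def split_extended_literal(source: str) -> tuple:
    """Return (body, rest) for text that starts with "&{"."""
    if not source.startswith("&{"):
        raise ValueError("not an extended string literal")
    depth = 0
    for i in range(2, len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1            # a nested, balanced brace opens
        elif ch == "}":
            if depth == 0:
                # unbalanced right brace: this is the end-delimiter
                return source[2:i], source[i + 1:]
            depth -= 1            # closes a nested brace; stays in the text
    raise ValueError("unterminated extended string literal")
```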
The escape character used for special characters is &. This is compatible with XML syntax and XML literals.
char-ref ::= &# digit+ ;
    | &#x hex-digit+ ;
entity-ref ::= & char-or-entity-name ;
char-or-entity-name ::= tagname
You can use the standard XML syntax for character references, using either decimal or hexadecimal values. The following string has two instances of the ASCII escape character, as either decimal 27 or hex 1B:
&{&#27;&#x1B;} ⇒ "\e\e"
You can also use the predefined XML entity names:
&{&amp; &lt; &gt; &quot; &apos;} ⇒ "& < > \" '"
In addition, &lbrace; and &rbrace; can be used for left and right curly brace, though you don’t need them for balanced braces:
&{ &rbrace;_&lbrace; / {_} } ⇒ " }_{ / {_} "
You can use the standard XML entity names. For example:
&{L&aelig;rdals&oslash;yri} ⇒ "Lærdalsøyri"
You can also use the standard R7RS character names null, alarm, backspace, tab, newline, return, escape, space, and delete. For example:
&{&escape;&space;}
The syntax &name; is actually syntactic sugar (specifically reader syntax) for the variable reference $entity$:name.
Hence you can also define your own entity names:
(define $entity$:crnl "\r\n")
&{&crnl;} ⇒ "\r\n"
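The character and entity references can be approximated in Python as below. The table holds the predefined XML entities, &lbrace; and &rbrace;, and the R7RS character names listed above; as an assumption, other named entities fall back to Python's html.entities.name2codepoint table (HTML 4 names), which may not match the set Kawa recognizes. The optional user_entities dictionary is a hypothetical stand-in for (define $entity$:name ...) definitions.

```python
import re
from html.entities import name2codepoint

# Predefined XML entities, the brace names, and the R7RS character names.
ENTITIES = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'",
    "lbrace": "{", "rbrace": "}",
    "null": "\x00", "alarm": "\x07", "backspace": "\x08", "tab": "\t",
    "newline": "\n", "return": "\r", "escape": "\x1b", "space": " ",
    "delete": "\x7f",
}

# &#x<hex>;  |  &#<decimal>;  |  &<name>;
REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z][\w.-]*);")


def resolve_refs(body, user_entities=None):
    table = dict(ENTITIES)
    if user_entities:
        table.update(user_entities)  # like (define $entity$:name ...)

    def replace(m):
        if m.group(1):
            return chr(int(m.group(1), 16))
        if m.group(2):
            return chr(int(m.group(2)))
        name = m.group(3)
        if name in table:
            return table[name]
        # Assumption: other names resolve like HTML 4 entities; unknown
        # names raise KeyError.
        return chr(name2codepoint[name])

    return REF.sub(replace, body)
```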
initial-ignored ::= intraline-whitespace* line-ending intraline-whitespace* &|
special-escape ::= intraline-whitespace* &|
    | & nested-comment
    | &- intraline-whitespace* line-ending
A line-ending directly in the text becomes a newline, as in a simple string literal:
(string-capitalize &{one two three
uno dos tres
}) ⇒ "One Two Three\nUno Dos Tres\n"
However, you have extra control over layout. If the string is in a nested expression, it is confusing (and ugly) if the string cannot be indented to match the surrounding context. The indentation marker &| is used to mark the end of insignificant initial whitespace. The &| characters and all the preceding whitespace are removed. In addition, the marker also suppresses an initial newline. Specifically, when the initial left-brace is followed by optional (invisible) intraline-whitespace, then a newline, then optional intraline-whitespace (the indentation), and finally the indentation marker &|, all of that is removed from the output. Otherwise the &| only removes initial intraline-whitespace on the same line (and itself).
(write (string-capitalize &{
     &|one two three
     &|uno dos tres
}) out)
    ⇒ prints "One Two Three\nUno Dos Tres\n"
As a matter of style, all of the indentation lines should line up. It is an error if there are any non-whitespace characters between the previous newline and the indentation marker. It is also an error to write an indentation marker before the first newline in the literal.
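A hypothetical Python helper can model the indentation marker on the text between &{ and }, assuming \n line endings: it first drops an initial-ignored prefix (whitespace, a newline, the indentation, and &|), then removes &| together with the whitespace before it at the start of each later line. It does not report the error cases just described.

```python
import re


def apply_indentation_markers(body: str) -> str:
    """Model "&|" handling on the text between "&{" and "}"."""
    # initial-ignored: whitespace, newline, indentation and "&|" right after
    # the opening brace are all dropped, suppressing the initial newline.
    body = re.sub(r"\A[ \t]*\n[ \t]*&\|", "", body)
    # Elsewhere, "&|" at the start of a line is removed together with the
    # whitespace before it. Error cases are not checked here.
    return re.sub(r"(?m)^[ \t]*&\|", "", body)
```

Applied to the body of the write example above, it yields "one two three\nuno dos tres\n"; for this input, Python's str.title then plays the role of string-capitalize.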
The line-continuation marker &- is used to suppress a newline:
&{abc&-
 def} ⇒ "abc def"
You can write a #|...|#-style comment following a &. This could be useful for annotation, or line numbers:
&{&#|line 1|#one two
&#|line 2|# three
&#|line 3|#uno dos tres
} ⇒ "one two\n three\nuno dos tres\n"
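The other two special escapes can be sketched the same way, again assuming \n line endings: &- with optional trailing whitespace and its line ending is deleted, and &#|...|# comments are removed. Unlike real #|...|# comments, this regular-expression version does not handle nesting, and the function name is hypothetical.

```python
import re

# "&-", optional spaces/tabs, then the line ending: removed, so the newline is
# suppressed (whitespace at the start of the next line is kept).
LINE_CONTINUATION = re.compile(r"&-[ \t]*\n")
# "&#|...|#" comment, non-nested only in this sketch.
COMMENT = re.compile(r"&#\|.*?\|#", re.DOTALL)


def strip_special_escapes(body: str) -> str:
    body = COMMENT.sub("", body)
    return LINE_CONTINUATION.sub("", body)
```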
enclosed-part ::= & enclosed-modifier* [ expression* ]
    | & enclosed-modifier* ( expression+ )
An embedded expression has the form &[expression]. It is evaluated, the result converted to a string (as by display), and the result added to the result string. (If there are multiple expressions, they are all evaluated and the corresponding strings inserted in the result.)
&{Hello &[(string-capitalize name)]!}
You can leave out the square brackets when the expression is a parenthesized expression:
&{Hello &(string-capitalize name)!}
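The following Python sketch is only an analogy for how &[...] expands; the helper names and the restriction to bare variable names inside the brackets are assumptions, since Kawa accepts arbitrary expressions there. It splits a template body into literal text and enclosed parts, roughly like the reader's ($string$ "Hello " $<<$ name $>>$ "!") form, and then converts each value with str, standing in for display.

```python
import re

# &[ ... ] with whitespace-separated bare variable names inside (an assumption
# of this sketch; Kawa evaluates arbitrary expressions here).
ENCLOSED = re.compile(r"&\[([^\]]*)\]")


def split_template(body: str) -> list:
    """Split a template body into literal strings and ("expr", names) parts."""
    parts = []
    pos = 0
    for m in ENCLOSED.finditer(body):
        if m.start() > pos:
            parts.append(body[pos:m.start()])
        parts.append(("expr", m.group(1).split()))
        pos = m.end()
    if pos < len(body):
        parts.append(body[pos:])
    return parts


def expand(parts: list, env: dict) -> str:
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
        else:
            # Every expression is evaluated (here: looked up) and converted
            # with str, standing in for display; all results are inserted.
            out.extend(str(env[name]) for name in part[1])
    return "".join(out)
```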
enclosed-modifier ::= ~ format-specifier-after-tilde
Using format allows finer-grained control over the output, but a problem is that the association between format specifiers and data expressions is positional, which is hard to read and error-prone. A better solution places the specifier adjacent to the data expression:
&{The response was &~,2f(* 100.0 (/ responses total))%.}
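Python's f-strings are a loose analogy, not Kawa's format: the format spec sits right next to the value it formats, just as ~,2f sits next to its expression in the template above, instead of being paired with an argument by position. The function name is hypothetical.

```python
def response_message(responses: int, total: int) -> str:
    # ".2f" is attached to the value it formats, like ~,2f in the template,
    # rather than being paired with an argument by position.
    return f"The response was {100.0 * (responses / total):.2f}%."
```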
The following escape forms are equivalent to the corresponding forms without the ~fmt-spec, except that the expression(s) are formatted using format:
&~fmt-spec[expression*]
Again, using parentheses like this:
&~fmt-spec(expression+)
is equivalent to:
&~fmt-spec[(expression+)]
The syntax of format specifications is arcane, but it allows you to do some pretty neat things in a compact space. For example, to include "_" between each element of the array arr you can use the ~{...~} format specifiers:
(define arr [5 6 7])
&{&~{&[arr]&~^_&~}} ⇒ "5_6_7"
If no format is specified for an enclosed expression, that is equivalent to a ~a format specifier, so this is equivalent to:
&{&~{&~a[arr]&~^_&~}} ⇒ "5_6_7"
which is in turn equivalent to:
(format #f "~{~a~^_~}" arr)
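For readers less familiar with format, this Python sketch spells out what ~{~a~^_~} does: it iterates over the list, displays each element (~a), and exits before the separator once no elements remain (~^). The function name and the sep parameter are hypothetical.

```python
def join_like_format(arr, sep="_") -> str:
    """Spell out what (format #f "~{~a~^_~}" arr) does for a list."""
    out = []
    for i, x in enumerate(arr):      # ~{ ... ~} iterates over the elements
        out.append(str(x))           # ~a displays the element
        if i == len(arr) - 1:        # ~^ exits when no elements remain
            break
        out.append(sep)              # literal "_" between elements
    return "".join(out)
```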
The fine print that makes this work: if there are multiple expressions in a &[...] with no format specifier, then there is an implicit ~a for each expression. On the other hand, if there is an explicit format specifier, it is not repeated for each enclosed expression: it appears exactly once in the effective format string, whether there are zero, one, or many expressions.
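The fine print can be modeled as building an effective format string plus an argument list. In this hypothetical Python sketch, each enclosed part is a (spec, expressions) pair whose spec is None when no specifier is given; literal text is escaped by doubling any tilde, which is an assumption about how literal text must be treated in a format string rather than something stated above.

```python
def effective_format(parts):
    """Return (format_string, expressions) for a list of template parts.

    A part is either literal text, or a (spec, expressions) pair for an
    enclosed "&..." form, where spec is None when no ~ specifier is given.
    """
    fmt, args = [], []
    for part in parts:
        if isinstance(part, str):
            # Assumption: literal tildes are doubled so format prints them.
            fmt.append(part.replace("~", "~~"))
            continue
        spec, exprs = part
        if spec is None:
            fmt.append("~a" * len(exprs))  # implicit ~a for each expression
        else:
            fmt.append(spec)               # explicit spec appears exactly once
        args.extend(exprs)
    return "".join(fmt), args
```

With the parts of &{&~{&[arr]&~^_&~}}, that is ("~{", []), (None, ["arr"]), ("~^", []), "_" and ("~}", []), it produces "~{~a~^_~}" with the single argument arr.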