Strings are sequences of characters. The length of a string is the number of characters that it contains, as an exact non-negative integer. The valid indices of a string are the exact non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.
Strings are implemented as a sequence of 16-bit char values, even though they are semantically a sequence of 32-bit Unicode code points. A character whose value is greater than #xffff is represented using two surrogate characters. This implementation allows for natural interoperability with Java APIs. However, it makes certain operations (indexing or counting based on character counts) difficult to implement efficiently. Luckily, one rarely needs to index or count based on character counts; alternatives are discussed below.
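As a small illustration of this character-based view, the following sketch uses a string that contains one character above #xffff. The string value is made up for this example, and the results shown follow from the definitions of string-length and string-ref given in this chapter.

```scheme
;; U+1F600 lies above #xffff, so internally it occupies two 16-bit chars,
;; but it still counts as a single character at the Scheme level.
(define s "a\x1F600;z")

(string-length s)                  ; ⇒ 3 (characters, not 16-bit chars)
(char->integer (string-ref s 1))   ; ⇒ 128512, i.e. #x1F600
(string-ref s 2)                   ; ⇒ #\z
```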
There are different kinds of strings:
An istring is immutable: it is fixed and cannot be modified. On the other hand, indexing (e.g. string-ref) is efficient (constant-time), while indexing of other string implementations takes time proportional to the index. String literals are istrings, as are the return values of most of the procedures in this chapter. An istring is an instance of the gnu.lists.IString class.
An mstring is mutable: you can replace individual characters (using string-set!). You can also change the mstring’s length by inserting or removing characters (using string-append! or string-replace!). An mstring is an instance of the gnu.lists.FString class.
Any other object that implements the java.lang.CharSequence interface is also a string. This includes standard Java java.lang.String and java.lang.StringBuilder objects.
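The distinction between the kinds of strings can be observed directly. This is a minimal sketch that relies only on procedures documented in this chapter (istring?, string-copy, string-append!); the variable names are made up for illustration.

```scheme
(define lit "abc")                  ; a string literal is an istring
(define buf (string-copy "abc"))    ; string-copy returns an mstring

(istring? lit)                      ; ⇒ #t
(istring? buf)                      ; ⇒ #f
(string? buf)                       ; ⇒ #t

(string-append! buf "def")          ; grows the mstring in place
buf                                 ; ⇒ "abcdef"
```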
Some of the procedures that operate on strings ignore the difference between upper and lower case. The names of the versions that ignore case end with “-ci” (for “case insensitive”).
Compatibility: Many of the following procedures (for example string-append) return an immutable istring in Kawa, but return a “freshly allocated” mutable string in standard Scheme (including R7RS) as well as in most Scheme implementations (including previous versions of Kawa). To get the “compatibility mode” versions of those procedures (which return mstrings), invoke Kawa with one of the --r5rs, --r6rs, or --r7rs options, or import a standard library such as (scheme base).
The type of string objects. The underlying type is the interface java.lang.CharSequence. Immutable strings are gnu.lists.IString or java.lang.String, while mutable strings are gnu.lists.FString.
Procedure: string?
obj
Return #t if obj is a string, #f otherwise.
Procedure: istring?
obj
Return #t if obj is an istring (an immutable, constant-time-indexable string); #f otherwise.
Procedure: string
char
…
Return a string composed of the arguments. This is analogous to list.
Compatibility: The result is an istring, except in compatibility mode, when it is a newly allocated mstring.
Procedure: string-length
string
Return the number of characters in the given string as an exact integer object.
Performance note: If the string is not an istring, then calling string-length may take time proportional to the length of the string, because of the need to scan for surrogate pairs.
Procedure: string-ref
string
k
k must be a valid index of string. The string-ref procedure returns character k of string using zero-origin indexing.
Performance note: If the string is not an istring, then calling string-ref may take time proportional to k because of the need to check for surrogate pairs. An alternative is to use string-cursor-ref. If iterating through a string, use string-for-each.
Procedure: string-null?
string
Is string the empty string? Same result as (= (string-length string) 0) but executes in O(1) time.
Procedure: string-every
pred
string
[start
end
]
Procedure: string-any
pred
string
[start
end
]
Checks to see if every/any character in string satisfies pred, proceeding from left (index start) to right (index end). These procedures are short-circuiting: if pred returns false, string-every does not call pred on subsequent characters; if pred returns true, string-any does not call pred on subsequent characters. Both procedures are “witness-generating”:
If string-every is given an empty interval (with start = end), it returns #t.
If string-every returns true for a non-empty interval (with start < end), the returned true value is the one returned by the final call to the predicate on (string-ref string (- end 1)).
If string-any returns true, the returned true value is the one returned by the predicate.
Note: The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).
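The “witness-generating” behavior is easiest to see with predicates that return something other than #t. The sketch below assumes a Kawa release that provides string-every and string-any as documented here; the helper predicates are made up for illustration.

```scheme
;; Hypothetical predicates that return the character itself as the witness.
(define (digit-or-false c) (and (char-numeric? c) c))
(define (letter-or-false c) (and (char-alphabetic? c) c))

(string-any digit-or-false "ab7c9")     ; ⇒ #\7 (first true result)
(string-every letter-or-false "abc")    ; ⇒ #\c (result of the final call)
(string-any char-numeric? "abc")        ; ⇒ #f
(string-every char-numeric? "")         ; ⇒ #t (empty interval)
```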
Procedure: string-tabulate
proc
len
Constructs a string of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the string. The procedure proc accepts an exact integer as its argument and returns a character. The order in which proc is called on those indexes is not specified.
Rationale: Although string-unfold is more general, string-tabulate is likely to run faster for the common special case it implements.
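For instance, a short run of consecutive letters can be built from the index alone. This is a sketch assuming string-tabulate is available as documented here; the helper name nth-letter is made up for the example.

```scheme
;; Hypothetical helper: the i-th lower-case letter, counting from #\a.
(define (nth-letter i)
  (integer->char (+ i (char->integer #\a))))

(string-tabulate nth-letter 5)    ; ⇒ "abcde"
(string-tabulate nth-letter 0)    ; ⇒ ""
```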
Procedure: string-unfold
stop?
mapper
successor
seed
[base
make-final
]
Procedure: string-unfold-right
stop?
mapper
successor
seed
[base
make-final
]
This is a fundamental and powerful constructor for strings.
successor is used to generate a series of “seed” values from the initial seed: seed, (successor seed), (successor² seed), (successor³ seed), ...
stop? tells us when to stop: when it returns true when applied to one of these seed values.
mapper maps each seed value to the corresponding character(s) in the result string, which are assembled into that string in left-to-right order. It is an error for mapper to return anything other than a character or string.
base is the optional initial/leftmost portion of the constructed string, which defaults to the empty string "". It is an error if base is anything other than a character or string.
make-final is applied to the terminal seed value (on which stop? returns true) to produce the final/rightmost portion of the constructed string. It defaults to (lambda (x) ""). It is an error for make-final to return anything other than a character or string.
string-unfold-right is the same as string-unfold except the results of mapper are assembled into the string in right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string.
You can use string-unfold to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth. Examples:
    (define (port->string p)
      (string-unfold eof-object? values
                     (lambda (x) (read-char p))
                     (read-char p)))
    (define (list->string lis)
      (string-unfold null? car cdr lis))
    (define (string-tabulate f size)
      (string-unfold (lambda (i) (= i size)) f (lambda (i) (+ i 1)) 0))
To map f over a list lis, producing a string:
    (string-unfold null? (compose f car) cdr lis)
Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses. That is, given operations knull?, kar, kdr, kons, and knil satisfying
    (kons (kar x) (kdr x)) = x and (knull? knil) = #t
then
    (string-fold-right kons knil (string-unfold knull? kar kdr x)) = x
and
    (string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.
This combinator pattern is sometimes called an “anamorphism.”
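The difference between the two assembly orders can be seen by unfolding the decimal digits of a number. This is a sketch assuming string-unfold and string-unfold-right behave as documented here; last-digit and drop-digit are hypothetical helpers introduced only for this example.

```scheme
;; Hypothetical helper: map the last decimal digit of n to a character.
(define (last-digit n)
  (integer->char (+ (char->integer #\0) (remainder n 10))))

;; Hypothetical helper: the next seed, with the last digit removed.
(define (drop-digit n) (quotient n 10))

;; Left-to-right assembly yields the digits least-significant first.
(string-unfold zero? last-digit drop-digit 1234)         ; ⇒ "4321"

;; Right-to-left assembly yields the usual reading order.
(string-unfold-right zero? last-digit drop-digit 1234)   ; ⇒ "1234"
```

Note that a seed of 0 stops immediately and produces the empty string, so a real number formatter would need to treat zero separately.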
Procedure: substring
string
start
end
string must be a string, and start and end must be exact integer objects satisfying:
    0 <= start <= end <= (string-length string)
The substring procedure returns a newly allocated string formed from the characters of string beginning with index start (inclusive) and ending with index end (exclusive).
Procedure: string-take
string
nchars
Procedure: string-drop
string
nchars
Procedure: string-take-right
string
nchars
Procedure: string-drop-right
string
nchars
string-take returns an immutable string containing the first nchars of string; string-drop returns a string containing all but the first nchars of string. string-take-right returns a string containing the last nchars of string; string-drop-right returns a string containing all but the last nchars of string.
    (string-take "Pete Szilagyi" 6) ⇒ "Pete S"
    (string-drop "Pete Szilagyi" 6) ⇒ "zilagyi"
    (string-take-right "Beta rules" 5) ⇒ "rules"
    (string-drop-right "Beta rules" 5) ⇒ "Beta "
It is an error to take or drop more characters than are in the string:
    (string-take "foo" 37) ⇒ error
Procedure: string-pad
string
len
[char
start
end
]
Procedure: string-pad-right
string
len
[char
start
end
]
Returns an istring of length len composed of the characters drawn from the given subrange of string, padded on the left (right) by as many occurrences of the character char as needed. If string has more than len chars, it is truncated on the left (right) to length len. The char defaults to #\space.
    (string-pad "325" 5) ⇒ "  325"
    (string-pad "71325" 5) ⇒ "71325"
    (string-pad "8871325" 5) ⇒ "71325"
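The right-hand variant and an explicit padding character behave symmetrically. The following sketch assumes string-pad and string-pad-right are available as documented here; the inputs are made up for illustration.

```scheme
(string-pad-right "325" 5)        ; ⇒ "325  " (padded on the right)
(string-pad-right "8871325" 5)    ; ⇒ "88713" (truncated on the right)
(string-pad "42" 5 #\0)           ; ⇒ "00042" (explicit padding character)
```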
Procedure: string-trim
string
[pred
start
end
]
Procedure: string-trim-right
string
[pred
start
end
]
Procedure: string-trim-both
string
[pred
start
end
]
Returns an istring obtained from the given subrange of string by skipping over all characters on the left / on the right / on both sides that satisfy the second argument pred; pred defaults to char-whitespace?.
    (string-trim-both "  The outlook wasn't brilliant,  \n\r") ⇒ "The outlook wasn't brilliant,"
Procedure: string=?
string
1
string
2
string
3
…
Return #t if the strings are the same length and contain the same characters in the same positions. Otherwise, the string=? procedure returns #f.
    (string=? "Straße" "Strasse") ⇒ #f
Procedure: string<?
string
1
string
2
string
3
…
Procedure: string>?
string
1
string
2
string
3
…
Procedure: string<=?
string
1
string
2
string
3
…
Procedure: string>=?
string
1
string
2
string
3
…
These procedures return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing. These predicates are required to be transitive.
These procedures are the lexicographic extensions to strings of the corresponding orderings on characters. For example, string<? is the lexicographic ordering on strings induced by the ordering char<? on characters. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be lexicographically less than the longer string.
    (string<? "z" "ß") ⇒ #t
    (string<? "z" "zz") ⇒ #t
    (string<? "z" "Z") ⇒ #f
Procedure: string-ci=?
string
1
string
2
string
3
…
Procedure: string-ci<?
string
1
string
2
string
3
…
Procedure: string-ci>?
string
1
string
2
string
3
…
Procedure: string-ci<=?
string
1
string
2
string
3
…
Procedure: string-ci>=?
string
1
string
2
string
3
…
These procedures are similar to string=?, etc., but behave as if they applied string-foldcase to their arguments before invoking the corresponding procedures without -ci.
    (string-ci<? "z" "Z") ⇒ #f
    (string-ci=? "z" "Z") ⇒ #t
    (string-ci=? "Straße" "Strasse") ⇒ #t
    (string-ci=? "Straße" "STRASSE") ⇒ #t
    (string-ci=? "ΧΑΟΣ" "χαοσ") ⇒ #t
The list->string procedure returns an istring formed from the characters in list, in order. It is an error if any element of list is not a character.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: reverse-list->string
list
An efficient implementation of (compose list->string reverse):
    (reverse-list->string '(#\a #\B #\c)) ⇒ "cBa"
This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also string-concatenate-reverse for the “chunked” variant.)
Procedure: string->list
string
[start
[end
]]
The string->list procedure returns a newly allocated list of the characters of string between start and end, in order. The string->list and list->string procedures are inverses so far as equal? is concerned.
Procedure: vector->string
vector
[start
[end
]]
The vector->string procedure returns a newly allocated string of the objects contained in the elements of vector between start and end. It is an error if any element of vector between start and end is not a character, or is a character forbidden in strings.
    (vector->string #(#\1 #\2 #\3)) ⇒ "123"
    (vector->string #(#\1 #\2 #\3 #\4 #\5) 2 4) ⇒ "34"
Procedure: string->vector
string
[start
[end
]]
The string->vector procedure returns a newly created vector initialized to the elements of the string string between start and end.
    (string->vector "ABC") ⇒ #(#\A #\B #\C)
    (string->vector "ABCDE" 1 3) ⇒ #(#\B #\C)
Procedure: string-upcase
string
Procedure: string-downcase
string
Procedure: string-titlecase
string
Procedure: string-foldcase
string
These procedures take a string argument and return a string result. They are defined in terms of Unicode’s locale-independent case mappings from Unicode scalar-value sequences to scalar-value sequences. In particular, the length of the result string can be different from the length of the input string. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.
The string-upcase procedure converts a string to upper case; string-downcase converts a string to lower case. The string-foldcase procedure converts the string to its case-folded counterpart, using the full case-folding mapping, but without the special mappings for Turkic languages. The string-titlecase procedure converts the first cased character of each word to title case, and downcases all other cased characters.
    (string-upcase "Hi") ⇒ "HI"
    (string-downcase "Hi") ⇒ "hi"
    (string-foldcase "Hi") ⇒ "hi"
    (string-upcase "Straße") ⇒ "STRASSE"
    (string-downcase "Straße") ⇒ "straße"
    (string-foldcase "Straße") ⇒ "strasse"
    (string-downcase "STRASSE") ⇒ "strasse"
    (string-downcase "Σ") ⇒ "σ"
    ; Chi Alpha Omicron Sigma:
    (string-upcase "ΧΑΟΣ") ⇒ "ΧΑΟΣ"
    (string-downcase "ΧΑΟΣ") ⇒ "χαος"
    (string-downcase "ΧΑΟΣΣ") ⇒ "χαοσς"
    (string-downcase "ΧΑΟΣ Σ") ⇒ "χαος σ"
    (string-foldcase "ΧΑΟΣΣ") ⇒ "χαοσσ"
    (string-upcase "χαος") ⇒ "ΧΑΟΣ"
    (string-upcase "χαοσ") ⇒ "ΧΑΟΣ"
    (string-titlecase "kNock KNoCK") ⇒ "Knock Knock"
    (string-titlecase "who's there?") ⇒ "Who's There?"
    (string-titlecase "r6rs") ⇒ "R6rs"
    (string-titlecase "R6RS") ⇒ "R6rs"
Since these procedures are locale-independent, they may not be appropriate for some locales.
Kawa Note: The implementation of string-titlecase does not correctly handle the case where an initial character needs to be converted to multiple characters, such as “LATIN SMALL LIGATURE FL”, which should be converted to the two letters "Fl".
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: string-normalize-nfd
string
Procedure: string-normalize-nfkd
string
Procedure: string-normalize-nfc
string
Procedure: string-normalize-nfkc
string
These procedures take a string argument and return a string result, which is the input string normalized to Unicode normalization form D, KD, C, or KC, respectively. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.
    (string-normalize-nfd "\xE9;") ⇒ "\x65;\x301;"
    (string-normalize-nfc "\xE9;") ⇒ "\xE9;"
    (string-normalize-nfd "\x65;\x301;") ⇒ "\x65;\x301;"
    (string-normalize-nfc "\x65;\x301;") ⇒ "\xE9;"
Procedure: string-prefix-length
string
1
string
2
[start
1
end
1
start
2
end
2
]
Procedure: string-suffix-length
string
1
string
2
[start
1
end
1
start
2
end
2
]
Return the length of the longest common prefix/suffix of string1 and string2. For prefixes, this is equivalent to their “mismatch index” (relative to the start indexes).
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
Procedure: string-prefix?
string
1
string
2
[start
1
end
1
start
2
end
2
]
Procedure: string-suffix?
string
1
string
2
[start
1
end
1
start
2
end
2
]
Is string1 a prefix/suffix of string2?
The optional start/end indexes restrict the comparison to the indicated substrings of string1 and string2.
Procedure: string-index
string
pred
[start
end
]
Procedure: string-index-right
string
pred
[start
end
]
Procedure: string-skip
string
pred
[start
end
]
Procedure: string-skip-right
string
pred
[start
end
]
string-index searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate pred. string-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.
The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of “fencepost” errors: when searching right-to-left, the first index considered is (- end 1), whereas when searching left-to-right, the first index considered is start. That is, the start/end indexes describe the same half-open interval [start,end) in these procedures that they do in other string procedures.
The -skip functions are similar, but use the complement of the criterion: they search for the first char that doesn’t satisfy pred. To skip over initial whitespace, for example, say
    (substring string
               (or (string-skip string char-whitespace?)
                   (string-length string))
               (string-length string))
These functions can be trivially composed with string-take and string-drop to produce take-while, drop-while, span, and break procedures without loss of efficiency.
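As one possible reading of that composition, take-while and drop-while can be sketched as below. The names string-take-while and string-drop-while are hypothetical helpers, not procedures provided by Kawa, and the sketch assumes string-skip and string-index return indexes as described in this section.

```scheme
;; Hypothetical helper: keep the leading run of characters satisfying pred.
(define (string-take-while s pred)
  (string-take s (or (string-skip s pred) (string-length s))))

;; Hypothetical helper: drop the leading run of characters satisfying pred.
(define (string-drop-while s pred)
  (string-drop s (or (string-skip s pred) (string-length s))))

(string-index "hello world" char-whitespace?)   ; ⇒ 5
(string-take-while "123abc" char-numeric?)      ; ⇒ "123"
(string-drop-while "123abc" char-numeric?)      ; ⇒ "abc"
(string-take-while "123" char-numeric?)         ; ⇒ "123" (string-skip found no mismatch)
```

The (or ... (string-length s)) fallback covers the case where every character satisfies pred and string-skip returns #f.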
Procedure: string-contains
string
1
string
2
[start
1
end
1
start
2
end
2
]
Procedure: string-contains-right
string
1
string
2
[start
1
end
1
start
2
end
2
]
Does the substring of string1 specified by start1 and end1 contain the sequence of characters given by the substring of string2 specified by start2 and end2?
Returns #f if there is no match. If start2 = end2, string-contains returns start1 but string-contains-right returns end1. Otherwise returns the index in string1 for the first character of the first/last match; that index lies within the half-open interval [start1,end1), and the match lies entirely within the [start1,end1) range of string1.
    (string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek"
      ⇒ 15
Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
Procedure: string-append
string
…
Returns a string whose characters form the concatenation of the given strings.
Compatibility: The result is an istring, except in compatibility mode, when it is an mstring.
Procedure: string-concatenate
string-list
Concatenates the elements of string-list together into a single istring.
Rationale: Some implementations of Scheme limit the number of arguments that may be passed to an n-ary procedure, so the (apply string-append string-list) idiom, which is otherwise equivalent to using this procedure, is not as portable.
Procedure: string-concatenate-reverse
string-list
[final-string
[end
]]
With no optional arguments, calling this procedure is equivalent to (string-concatenate (reverse string-list)). If the optional argument final-string is specified, it is effectively consed onto the beginning of string-list before performing the list-reverse and string-concatenate operations.
If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing
    (string-concatenate (reverse (cons (substring final-string 0 end) string-list)))
For example:
    (string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7) ⇒ "Hello, I must be going."
Rationale: This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, and wish to convert the accumulated data into a single string when done. The optional end argument accommodates that use case when final-string is a partially full mutable string, and is allowed (for uniformity) when final-string is an immutable string.
Procedure: string-join
string-list
[delimiter
[grammar
]]
This procedure is a simple unparser; it pastes strings together using the delimiter string, returning an istring.
The string-list is a list of strings. The delimiter is the string used to delimit elements; it defaults to a single space " ".
The grammar argument is a symbol that determines how the delimiter is used, and defaults to 'infix. It is an error for grammar to be any symbol other than these four:
'infix
An infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty string.
'strict-infix
Means the same as 'infix if the string-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)
'suffix
Means a suffix or terminator grammar: insert the delimiter after every list element.
'prefix
Means a prefix grammar: insert the delimiter before every list element.
    (string-join '("foo" "bar" "baz")) ⇒ "foo bar baz"
    (string-join '("foo" "bar" "baz") "") ⇒ "foobarbaz"
    (string-join '("foo" "bar" "baz") ":") ⇒ "foo:bar:baz"
    (string-join '("foo" "bar" "baz") ":" 'suffix) ⇒ "foo:bar:baz:"
    ;; Infix grammar is ambiguous wrt empty list vs. empty string:
    (string-join '() ":") ⇒ ""
    (string-join '("") ":") ⇒ ""
    ;; Suffix and prefix grammars are not:
    (string-join '() ":" 'suffix) ⇒ ""
    (string-join '("") ":" 'suffix) ⇒ ":"
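The examples above do not cover the 'prefix grammar, so here is a brief sketch of it alongside 'suffix, assuming string-join behaves as documented in this section. The path and key/value strings are made up for illustration.

```scheme
;; 'prefix puts the delimiter before every element.
(string-join '("usr" "local" "bin") "/" 'prefix)   ; ⇒ "/usr/local/bin"

;; 'suffix puts the delimiter after every element.
(string-join '("a=1" "b=2") ";" 'suffix)           ; ⇒ "a=1;b=2;"

;; With a single element, the default 'infix grammar inserts nothing.
(string-join '("x") ", ")                          ; ⇒ "x"
```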
Procedure: string-replace
string
1
string
2
start
1
end
1
[start
2
end
2
]
Returns
    (string-append (substring string1 0 start1)
                   (substring string2 start2 end2)
                   (substring string1 end1 (string-length string1)))
That is, the segment of characters in string1 from start1 to end1 is replaced by the segment of characters in string2 from start2 to end2. If start1 = end1, this simply splices the characters drawn from string2 into string1 at that position.
Examples:
    (string-replace "The TCL programmer endured daily ridicule."
                    "another miserable perl drone" 4 7 8 22)
      ⇒ "The miserable perl programmer endured daily ridicule."
    (string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
      ⇒ "It's lots of fun to code it up in Scheme."
    (define (string-insert s i t) (string-replace s t i i))
    (string-insert "It's easy to code it up in Scheme." 5 "really ")
      ⇒ "It's really easy to code it up in Scheme."
    (define (string-set s i c) (string-replace s (string c) i (+ i 1)))
    (string-set "String-ref runs in O(n) time." 19 #\1)
      ⇒ "String-ref runs in O(1) time."
Also see string-append! and string-replace! for destructive changes to a mutable string.
Procedure: string-fold
kons
knil
string
[start
end
]
Procedure: string-fold-right
kons
knil
string
[start
end
]
These are the fundamental iterators for strings.
The string-fold procedure maps the kons procedure across the given string from left to right (writing string[i] for character i of string):
    (... (kons string[2] (kons string[1] (kons string[0] knil))))
In other words, string-fold obeys the (tail) recursion
    (string-fold kons knil string start end) = (string-fold kons (kons string[start] knil) string start+1 end)
The string-fold-right procedure maps kons across the given string from right to left:
    (kons string[0] (... (kons string[end-3] (kons string[end-2] (kons string[end-1] knil)))))
obeying the (tail) recursion
    (string-fold-right kons knil string start end) = (string-fold-right kons (kons string[end-1] knil) string start end-1)
Examples:
    ;;; Convert a string to a list of chars.
    (string-fold-right cons '() string)
    ;;; Count the number of lower-case characters in a string.
    (string-fold (lambda (c count) (if (char-lower-case? c) (+ count 1) count)) 0 string)
The string-fold-right combinator is sometimes called a “catamorphism.”
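The direction of each fold shows up in the order of the accumulated list. This is a small sketch on concrete inputs, assuming string-fold and string-fold-right behave as documented here; count-a is a made-up helper.

```scheme
;; Left-to-right fold: the last character ends up at the head of the list.
(string-fold cons '() "abc")                  ; ⇒ (#\c #\b #\a)

;; Right-to-left fold: the original order is preserved.
(string-fold-right cons '() "abc")            ; ⇒ (#\a #\b #\c)

;; Reversing a string is therefore a left fold followed by list->string.
(list->string (string-fold cons '() "abc"))   ; ⇒ "cba"

;; Hypothetical helper: count occurrences of #\a.
(define (count-a c n) (if (char=? c #\a) (+ n 1) n))
(string-fold count-a 0 "banana")              ; ⇒ 3
```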
Procedure: string-for-each
proc
string
1
string
2
…
Procedure: string-for-each
proc
string
1
[start
[end
]]
The strings must all have the same length. proc should accept as many arguments as there are strings.
The start-end variant is provided for compatibility with the SRFI-13 version. (In that case start and end count Unicode scalar values (character values), not Java 16-bit char values.)
The string-for-each procedure applies proc element-wise to the characters of the strings for its side effects, in order from the first characters to the last. proc is always called in the same dynamic environment as string-for-each itself.
Analogous to for-each.
    (let ((v '()))
      (string-for-each
        (lambda (c) (set! v (cons (char->integer c) v)))
        "abcde")
      v) ⇒ (101 100 99 98 97)
Performance note: The compiler generates efficient code for string-for-each. If proc is a lambda expression, it is inlined.
Procedure: string-map
proc
string
1
string
2
…
The string-map procedure applies proc element-wise to the elements of the strings and returns a string of the results, in order. It is an error if proc does not accept as many arguments as there are strings, or return other than a single character or a string. If more than one string is given and not all strings have the same length, string-map terminates when the shortest string runs out. The dynamic order in which proc is applied to the elements of the strings is unspecified.
    (string-map char-foldcase "AbdEgH") ⇒ "abdegh"
    (string-map
      (lambda (c) (integer->char (+ 1 (char->integer c))))
      "HAL") ⇒ "IBM"
    (string-map
      (lambda (c k)
        ((if (eqv? k #\u) char-upcase char-downcase) c))
      "studlycaps xxx"
      "ululululul") ⇒ "StUdLyCaPs"
Traditionally the result of proc had to be a character, but Kawa (and SRFI-140) allows the result to be a string.
Performance note: The string-map procedure has not been optimized (mainly because it is not very useful): the characters are boxed, and the proc is not inlined even if it is a lambda expression.
Procedure: string-map-index
proc
string
[start
end
]
Calls proc on each valid index of the specified substring, converts the results of those calls into strings, and returns the concatenation of those strings. It is an error for proc to return anything other than a character or string. The dynamic order in which proc is called on the indexes is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to string-map-index has returned, then string-map-index returns a string with unspecified contents; the string-map-index procedure itself does not mutate those strings.
Procedure: string-for-each-index
proc
string
[start
end
]
Calls proc on each valid index of the specified substring, in increasing order, discarding the results of those calls. This is simply a safe and correct way to loop over a substring.
Example:
    (let ((txt "abcde")
          (v '()))
      (string-for-each-index
        (lambda (cur) (set! v (cons (char->integer (string-ref txt cur)) v)))
        txt)
      v) ⇒ (101 100 99 98 97)
Procedure: string-count
string
pred
[start
end
]
Returns a count of the number of characters in the specified substring of string that satisfy the predicate pred.
Procedure: string-filter
pred
string
[start
end
]
Procedure: string-remove
pred
string
[start
end
]
Return an immutable string consisting of only selected characters, in order: string-filter selects only the characters that satisfy pred; string-remove selects only the characters that do not satisfy pred.
Procedure: string-repeat
string-or-character
len
Create an istring by repeating the first argument len times. If the first argument is a character, it is as if it were wrapped with the string constructor. We can define string-repeat in terms of the more general xsubstring procedure:
    (define (string-repeat S N)
      (let ((T (if (char? S) (string S) S)))
        (xsubstring T 0 (* N (string-length T)))))
Procedure: xsubstring
string
[from
to
[start
end
]]
This is an extended substring procedure that implements replicated copying of a substring. The string is a string; start and end are optional arguments that specify a substring of string, defaulting to 0 and the length of string. This substring is conceptually replicated both up and down the index space, in both the positive and negative directions. For example, if string is "abcdefg", start is 3, and end is 6, then we have the conceptual bidirectionally-infinite string
    ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
    ... -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9 ...
xsubstring returns the substring of the string beginning at index from, and ending at to. It is an error if from is greater than to.
If from and to are missing they default to 0 and from+(end-start), respectively. This variant is a generalization of using substring, but unlike substring never shares substructures that would retain characters or sequences of characters that are substructures of its first argument or previously allocated objects.
You can use xsubstring to perform a variety of tasks:
To rotate a string left:
(xsubstring "abcdef" 2 8) ⇒ "cdefab"
To rotate a string right:
(xsubstring "abcdef" -2 4) ⇒ "efabcd"
To replicate a string:
(xsubstring "abc" 0 7) ⇒ "abcabca"
Note that:
The from/to arguments give a half-open range containing the characters from index from up to, but not including, index to.
The from/to indexes are not expressed in the index space of string. They refer instead to the replicated index space of the substring defined by string, start, and end.
It is an error if start = end, unless from = to, which is allowed as a special case.
Procedure: string-split
string
delimiter
[grammar
limit
start
end
]
Returns a list of strings representing the words contained in the substring of string from start (inclusive) to end (exclusive). The delimiter is a string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on "\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list is a list of strings, each of which contains a single character.
The grammar is a symbol with the same meaning as in the string-join procedure. If it is infix, which is the default, processing is done as described above, except an empty string produces the empty list; if grammar is strict-infix, then an empty string signals an error. The values prefix and suffix cause a leading/trailing empty string in the result to be suppressed.
If limit is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (so the result will have at most limit+1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.
To split on a regular expression, you can use SRFI 115’s regexp-split procedure.
The following procedures create a mutable string, i.e. one that you can modify.
Procedure: make-string
[k
[char
]]
Return a newly allocated mstring of k characters, where k defaults to 0. If char is given, then all elements of the string are initialized to char, otherwise the contents of the string are unspecified.
The 1-argument version is deprecated as poor style, except when k is 0.
Rationale: In many languages the most common pattern for mutable strings is to allocate an empty string and incrementally append to it. It seems natural to initialize the string with (make-string), rather than (make-string 0).
To return an immutable string that repeats a character char k times, use string-repeat.
This is as in R7RS, except the result is variable-size and we allow leaving out k when it is zero.
Procedure: string-copy
string
[start
[end
]]
Returns a newly allocated mutable (mstring) copy of the part of the given string between start and end.
The following procedures modify a mutable string.
Procedure: string-set!
string
k
char
This procedure stores char in element k of string.
    (define s1 (make-string 3 #\*))
    (define s2 "***")
    (string-set! s1 0 #\?) ⇒ void
    s1 ⇒ "?**"
    (string-set! s2 0 #\?) ⇒ error
    (string-set! (symbol->string 'immutable) 0 #\?) ⇒ error
Performance note: Calling string-set! may take time proportional to the length of the string: first it must scan for the right position, like string-ref does. Then if the new character requires using a surrogate pair (and the old one doesn’t) then we have to make room in the string, possibly re-allocating a new char array. Alternatively, if the old character requires using a surrogate pair (and the new one doesn’t) then following characters need to be moved.
The function string-set! is deprecated: it is inefficient, and it very seldom does the correct thing. Instead, you can construct a string with string-append!.
Procedure: string-append! string value …
The string must be a mutable string, such as one returned by make-string or string-copy. The string-append! procedure extends string by appending each value (in order) to the end of string. Each value should be a character or a string.
Performance note: The compiler converts a call with multiple values to multiple string-append! calls. If a value is known to be a character, then no boxing (object-allocation) is needed.
The following example shows how to efficiently process a string using string-for-each and incrementally “build” a result string using string-append!.
    (define (translate-space-to-newline str::string)::string
      (let ((result (make-string 0)))
        (string-for-each
          (lambda (ch)
            (string-append! result
                            (if (char=? ch #\Space) #\Newline ch)))
          str)
        result))
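A second sketch of the same build-up pattern, this time passing several values in one string-append! call. The helper name join-with-commas is hypothetical and not part of Kawa.

```scheme
;; Join a list of strings, separating the items with ", ".
(define (join-with-commas items)
  (let ((result (make-string 0)))
    (for-each
      (lambda (item)
        ;; Add a separator before every item except the first.
        (if (> (string-length result) 0)
            (string-append! result #\, #\space))
        (string-append! result item))
      items)
    result))

(define joined (join-with-commas '("a" "b" "c")))  ; "a, b, c"
```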
Procedure: string-copy! to at from [start [end]]
Copies the characters of the string from that are between start and end into the string to, starting at index at. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
This is equivalent to (and implemented as):
    (string-replace! to at (+ at (- end start)) from start end)

    (define a "12345")
    (define b (string-copy "abcde"))
    (string-copy! b 1 a 0 2)
    b ⇒ "a12de"
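To make the stated equivalence concrete, the sketch below performs the same update once with string-copy! and once with the corresponding string-replace! call. The names b1 and b2 are illustrative.

```scheme
(define a "12345")
(define b1 (string-copy "abcde"))
(define b2 (string-copy "abcde"))

;; Copy characters 0..2 of a into b1, starting at index 1.
(string-copy! b1 1 a 0 2)

;; The same operation spelled out with string-replace!:
;; at = 1, start = 0, end = 2, so dst-end = (+ 1 (- 2 0)) = 3.
(string-replace! b2 1 (+ 1 (- 2 0)) a 0 2)

;; Both b1 and b2 now hold "a12de".
```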
Procedure: string-replace! dst dst-start dst-end src [src-start [src-end]]
Replaces the characters of string dst (between dst-start and dst-end) with the characters of src (between src-start and src-end). The number of characters from src may be different from the number replaced in dst, so the string may grow or contract. The special case where dst-start is equal to dst-end corresponds to insertion; the case where src-start is equal to src-end corresponds to deletion. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. (This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.)
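The insertion, deletion, and general replacement cases can be sketched as follows, assuming the argument order given in the signature. The variable s is illustrative.

```scheme
(define s (string-copy "abcdef"))

;; Insertion: dst-start equals dst-end, so nothing is removed.
(string-replace! s 2 2 "XY")         ; s is now "abXYcdef"

;; Deletion: the source range is empty, so nothing is added.
(string-replace! s 2 4 "" 0 0)       ; s is back to "abcdef"

;; General case: replace "abc" with characters 1..3 of "12345".
(string-replace! s 0 3 "12345" 1 3)  ; s is now "23def"
```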
Procedure: string-fill! string fill [start [end]]
The string-fill! procedure stores fill in the elements of string between start and end. It is an error if fill is not a character or is forbidden in strings.
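A short sketch of a partial fill, assuming start is inclusive and end is exclusive as for the other procedures in this chapter. The variable masked is illustrative.

```scheme
(define masked (string-copy "abcdef"))
(string-fill! masked #\- 1 4)   ; overwrite indexes 1, 2 and 3
;; masked now holds "a---ef"
```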
Using function-call syntax with strings is convenient and efficient. However, it has some “gotchas”.
We will use the following example string:
(! str1 "Smile \x1f603;!")
or if you’re brave:
(! str1 "Smile 😃!")
This is "Smile " followed by an emoticon (“smiling face with open mouth”) followed by "!". The emoticon has scalar value \x1f603; it is not in the 16-bit Basic Multilingual Plane, and so it must be encoded by a surrogate pair (#\xd83d followed by #\xde03).
The number of scalar values (characters) is 8, while the number of 16-bit code units (chars) is 9. The java.lang.CharSequence:length method counts chars. Both the length and the string-length procedures count characters. Thus:
    (length str1) ⇒ 8
    (string-length str1) ⇒ 8
    (str1:length) ⇒ 9
Counting chars is a constant-time operation (since the count is stored in the data structure). Counting characters depends on the representation used: In general it may take time proportional to the length of the string, since it has to subtract one for each surrogate pair; however the istring type (gnu.lists.IString class) uses an extra structure so it can count characters in constant time.
Similarly we can index the string in 3 ways:
    (str1 1) ⇒ #\m :: character
    (string-ref str1 1) ⇒ #\m :: character
    (str1:charAt 1) ⇒ #\m :: char
Using function-call syntax with a string as the “function” and a single integer argument is the same as using string-ref.
Things become interesting when we reach the emoticon:
    (str1 6) ⇒ #\😃 :: character
    (str1:charAt 6) ⇒ #\d83d :: char
Both string-ref and the function-call syntax return the real character, while the charAt method returns a partial character (one half of the surrogate pair).
(str1 7) ⇒ #\! :: character
(str1:charAt 7) ⇒ #\de03 :: char
(str1 8) ⇒ throws StringIndexOutOfBoundsException
(str1:charAt 8) ⇒ #\! :: char
You can index a string with a list of integer indexes, most commonly a range:
    (str [i ...])
is basically the same as:
    (string (str i) ...)
Generally when working with strings it is best to work with substrings rather than individual characters:
    (str [start <: end])
This is equivalent to invoking the substring procedure:
    (substring str start end)
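Applied to the example string, the range form and the substring procedure should give the same result. This is a sketch under the equivalence stated above; it binds str1 with define so that it stands alone, and the names word and word2 are illustrative.

```scheme
(define str1 "Smile \x1f603;!")

;; Range indexing: characters 0 up to (but not including) 5.
(define word (str1 [0 <: 5]))        ; "Smile"

;; The equivalent call to substring.
(define word2 (substring str1 0 5))  ; "Smile"
```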
Indexing into a string (using for example string-ref) is in general inefficient because of the possible presence of surrogate pairs. Hence given an index i, access normally requires linearly scanning the string until we have seen i characters.
The string-cursor API is defined in terms of abstract “cursor values”, which point to a position in the string. This avoids the linear scan.
Typical usage is:
    (let* ((str whatever)
           (end (string-cursor-end str)))
      (do ((sc::string-cursor (string-cursor-start str)
                              (string-cursor-next str sc)))
          ((string-cursor>=? sc end))
        (let ((ch (string-cursor-ref str sc)))
          (do-something-with ch))))
Alternatively, the following may be marginally faster:
    (let* ((str whatever)
           (end (string-cursor-end str)))
      (do ((sc::string-cursor (string-cursor-start str)
                              (string-cursor-next-quick sc)))
          ((string-cursor>=? sc end))
        (let ((ch (string-cursor-ref str sc)))
          (if (not (char=? ch #\ignorable-char))
              (do-something-with ch)))))
The API is non-standard, but is based on that in Chibi Scheme.
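As a concrete instance of the typical-usage loop, the sketch below counts the characters of a string by stepping a cursor from start to end. The procedure name count-chars is hypothetical; it uses only the cursor procedures described in this section.

```scheme
;; Count characters (scalar values) by walking the string once.
(define (count-chars str::string)
  (let ((end (string-cursor-end str)))
    (do ((sc::string-cursor (string-cursor-start str)
                            (string-cursor-next str sc))
         (n 0 (+ n 1)))
        ((string-cursor>=? sc end) n))))

(define smile-count (count-chars "Smile \x1f603;!"))  ; 8
```

Each call to string-cursor-next steps over a whole character, so the surrogate pair of the emoticon is counted once.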
Type: string-cursor
An abstract position (index) in a string. Implemented as a primitive int which counts the number of preceding code units (16-bit char values).
Procedure: string-cursor-start str
Returns a cursor for the start of the string. The result is always 0, cast to a string-cursor.
Procedure: string-cursor-end str
Returns a cursor for the end of the string, one past the last valid character. Implemented as (as string-cursor (invoke str 'length)).
Procedure: string-cursor-ref str cursor
Return the character at the cursor. If the cursor points to the second char of a surrogate pair, returns #\ignorable-char.
Procedure: string-cursor-next string cursor [count]
Return the cursor position count (default 1) character positions forwards beyond cursor. For each count this may add either 1 or 2 (if pointing at a surrogate pair) to the cursor.
Procedure: string-cursor-next-quick cursor
Increment cursor by one raw char position, even if cursor points to the start of a surrogate pair. (In that case the next string-cursor-ref will return #\ignorable-char.) Same as (+ cursor 1) but with the string-cursor type.
Procedure: string-cursor-prev string cursor [count]
Return the cursor position count (default 1) character positions backwards before cursor.
Procedure: substring-cursor string [start [end]]
Create a substring of the section of string between the cursors start and end.
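A sketch of how substring-cursor combines with a cursor scan: the loop advances to the first space (or the end of the string) and returns everything before it. The procedure name first-word is hypothetical.

```scheme
;; Return the part of str before the first space, or all of str
;; if it contains no space.
(define (first-word str::string)
  (let ((end (string-cursor-end str)))
    (do ((sc::string-cursor (string-cursor-start str)
                            (string-cursor-next str sc)))
        ;; Stop at the end, or at the first space character.
        ((or (string-cursor>=? sc end)
             (char=? (string-cursor-ref str sc) #\space))
         (substring-cursor str (string-cursor-start str) sc)))))

(define w (first-word "Smile \x1f603;!"))  ; "Smile"
```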
Procedure: string-cursor<? cursor1 cursor2
Procedure: string-cursor<=? cursor1 cursor2
Procedure: string-cursor=? cursor1 cursor2
Procedure: string-cursor>=? cursor1 cursor2
Procedure: string-cursor>? cursor1 cursor2
Is the position of cursor1 respectively before, before or same, same, after or same, or after, as cursor2.
Performance note: Implemented as the corresponding int comparison.
Procedure: string-cursor-for-each proc string [start [end]]
Apply the procedure proc to each character position in string between the cursors start and end.