String Internals (Guile Reference Manual)

6.6.5.15 String Internals

Guile stores each string in memory as a contiguous array of Unicode code points along with an associated set of attributes. If all of the code points of a string have an integer value between 0 and 255 inclusive, the code point array is stored as one byte per code point: it is stored as an ISO-8859-1 (aka Latin-1) string. If any of the code points of the string has an integer value greater than 255, the code point array is stored as four bytes per code point: it is stored as a UTF-32 string.

Conversion between the one-byte-per-code-point and four-bytes-per-code-point representations happens automatically as necessary.
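As a rough illustration of this automatic conversion, the following sketch uses string-bytes-per-char (one of the debugging procedures in this section) to watch a string widen. The variable names are arbitrary, and the behavior shown reflects the current implementation, which may change between releases.

```scheme
;; A freshly made string of Latin-1 characters uses the narrow form.
(define s (make-string 3 #\a))
(string-bytes-per-char s)             ; evaluates to 1

;; Storing a code point above 255 (here U+03BB, Greek small lambda)
;; makes Guile switch the string to the wide form.
(string-set! s 0 (integer->char #x3bb))
(string-bytes-per-char s)             ; evaluates to 4

;; Combining a narrow string with a wide one yields a wide result.
(define t (string-append "abc" (string (integer->char #x3bb))))
(string-bytes-per-char t)             ; evaluates to 4
```

No explicit conversion call appears anywhere in the sketch; the change of representation is a side effect of ordinary string operations.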

No API is provided to set the internal representation of strings; however, there is a pair of procedures available to query it. These are debugging procedures. Using them in production code is discouraged, since the details of Guile’s internal representation of strings may change from release to release.

Scheme Procedure: string-bytes-per-char str
C Function: scm_string_bytes_per_char (str)

Return the number of bytes used to encode a Unicode code point in string str. The result is one or four.
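A minimal sketch of the 255/256 boundary, intended for experimentation at the REPL. The strings are built with integer->char so that the example does not depend on the source file's encoding.

```scheme
;; Code point 255 (U+00FF) is the largest value that still fits
;; the one-byte-per-code-point representation.
(string-bytes-per-char (string (integer->char 255)))   ; evaluates to 1

;; Code point 256 (U+0100) forces four bytes per code point.
(string-bytes-per-char (string (integer->char 256)))   ; evaluates to 4

;; A single wide character widens the whole string,
;; not only the character that needs it.
(string-bytes-per-char
 (string #\a #\b (integer->char 256)))                 ; evaluates to 4
```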

Scheme Procedure: %string-dump str
C Function: scm_sys_string_dump (str)

Returns an association list containing debugging information for str. The association list has the following entries.

string: The string itself.
start: The start index of the string within its stringbuf.
length: The length of the string.
shared: If this string is a substring, its parent string; otherwise, #f.
read-only: #t if the string is read-only.
stringbuf-chars: A new string containing this string’s stringbuf’s characters.
stringbuf-length: The number of characters in this stringbuf.
stringbuf-shared: #t if this stringbuf is shared.
stringbuf-wide: #t if this stringbuf’s characters are stored in a 32-bit buffer, or #f if they are stored in an 8-bit buffer.
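The sketch below shows one way to look up individual entries of the association list. It assumes the keys are symbols named as in the list above. dump-ref is a hypothetical helper defined here for illustration and is not part of Guile. Since these are debugging procedures, the exact set of entries may differ between releases.

```scheme
;; dump-ref is a hypothetical helper, not part of Guile: it fetches
;; one entry from the association list returned by %string-dump.
(define (dump-ref str key)
  (assq-ref (%string-dump str) key))

(define narrow "hello")
(define wide (string-append "hello " (string (integer->char #x3bb))))

(dump-ref narrow 'length)           ; evaluates to 5
(dump-ref narrow 'stringbuf-wide)   ; evaluates to #f
(dump-ref wide 'stringbuf-wide)     ; evaluates to #t

;; Print every entry of the dump, one per line.
(for-each (lambda (entry)
            (display (car entry))
            (display ": ")
            (write (cdr entry))
            (newline))
          (%string-dump narrow))
```

Looking entries up by key, rather than by position in the list, keeps such debugging code working if entries are added or reordered.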