This section documents the finer points of Guile’s handling of Unicode byte order marks (BOMs). A byte order mark (U+FEFF) is typically found at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably determine the byte order. Occasionally, a BOM is found at the start of a UTF-8 stream, but this is much less common and not generally recommended.
Guile attempts to handle BOMs automatically, and in accordance with the
recommendations of the Unicode Standard, when the port encoding is set
to UTF-8, UTF-16, or UTF-32. In brief, Guile automatically writes a BOM
at the start of a UTF-16 or UTF-32 stream, and automatically consumes
one from the start of a UTF-8, UTF-16, or UTF-32 stream.
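The automatic behavior can be observed with in-memory ports. The following is a minimal sketch, assuming Guile 2.2 or later (for the (ice-9 textual-ports) module); encode-string and decode-bytes are hypothetical helper names introduced only for this illustration.

```scheme
(use-modules (ice-9 binary-ports)   ; open-bytevector-input-port, open-bytevector-output-port
             (ice-9 textual-ports)  ; get-string-all
             (rnrs bytevectors))    ; bytevector-length

;; Hypothetical helper: write STR to an in-memory port whose encoding
;; is ENCODING, and return the resulting bytes.
(define (encode-string str encoding)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port encoding)
      (display str port)
      (force-output port)
      (get-bytes))))

;; Hypothetical helper: read all of BV as text through a port whose
;; encoding is ENCODING.
(define (decode-bytes bv encoding)
  (let ((port (open-bytevector-input-port bv)))
    (set-port-encoding! port encoding)
    (get-string-all port)))

;; With "UTF-16", the output begins with a two-byte BOM, followed by
;; two bytes for each of the two characters.
(define utf16-bytes (encode-string "hi" "UTF-16"))
(display (bytevector-length utf16-bytes))
(newline)

;; Reading the same bytes back as "UTF-16" consumes the BOM, so only
;; the original text is returned.
(write (decode-bytes utf16-bytes "UTF-16"))
(newline)
```

The byte length is printed rather than the bytes themselves, since the sketch does not depend on which byte order Guile picks for the output.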
As specified in the Unicode Standard, a BOM is only handled specially at
the start of a stream, and only if the port encoding is set to UTF-8,
UTF-16, or UTF-32. If the port encoding is set to UTF-16BE, UTF-16LE,
UTF-32BE, or UTF-32LE, then BOMs are not handled specially, and none of
the special handling described in this section applies.
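The difference between the generic and the endianness-specific encodings can be sketched by decoding the same bytes twice. This example assumes Guile 2.2 or later; decode-bytes is a hypothetical helper name, not part of Guile.

```scheme
(use-modules (ice-9 binary-ports)   ; open-bytevector-input-port
             (ice-9 textual-ports)) ; get-string-all

;; Hypothetical helper: read all of BV as text through a port whose
;; encoding is ENCODING.
(define (decode-bytes bv encoding)
  (let ((port (open-bytevector-input-port bv)))
    (set-port-encoding! port encoding)
    (get-string-all port)))

;; U+FEFF followed by "a", encoded as big-endian UTF-16.
(define bom-then-a #vu8(#xFE #xFF #x00 #x61))

;; "UTF-16": the leading U+FEFF is treated as a BOM and consumed.
(define generic (decode-bytes bom-then-a "UTF-16"))

;; "UTF-16BE": no special handling, so U+FEFF is returned as an
;; ordinary character in front of "a".
(define explicit (decode-bytes bom-then-a "UTF-16BE"))

(write (list (string-length generic) (string-length explicit)))
(newline)
```

The first string should contain one character and the second two, the extra character being U+FEFF itself.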
To control the byte order of an output stream explicitly, set the port
encoding to UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE, and explicitly
write a BOM (#\xFEFF) if desired.
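As a minimal sketch of explicit byte-order control, the following writes little-endian UTF-16 with a hand-written BOM to an in-memory port. The procedure name write-utf16le-with-bom is hypothetical and exists only for this illustration.

```scheme
(use-modules (ice-9 binary-ports))  ; open-bytevector-output-port

;; Hypothetical helper: encode STR as UTF-16LE, preceded by an
;; explicitly written BOM, and return the resulting bytes.
(define (write-utf16le-with-bom str)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      ;; An endianness-specific encoding: Guile adds no BOM of its own.
      (set-port-encoding! port "UTF-16LE")
      ;; Write the BOM by hand, as an ordinary character.
      (write-char #\xFEFF port)
      (display str port)
      (force-output port)
      (get-bytes))))

(write (write-utf16le-with-bom "hi"))
(newline)
```

Because the encoding names the byte order, the BOM is encoded like any other character and appears as the bytes FF FE at the start of the output.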
If set-port-encoding! is called in the middle of a stream, Guile
treats this as a new logical “start of stream” for purposes of BOM
handling, and will forget about any BOMs that had previously been seen.
Therefore, it may choose a different byte order than had been used
previously. This is intended to support multiple logical text streams
embedded within a larger binary stream.
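The effect of calling set-port-encoding! mid-stream can be sketched with two logical UTF-16 streams placed back to back, each with its own BOM and a different byte order. The byte layout below is invented for illustration.

```scheme
(use-modules (ice-9 binary-ports))  ; open-bytevector-input-port

;; Two logical text streams in one byte sequence.
(define two-streams
  #vu8(#xFF #xFE #x61 #x00     ; little-endian BOM, then "a"
       #xFE #xFF #x00 #x62))   ; big-endian BOM, then "b"

(define port (open-bytevector-input-port two-streams))

;; First logical stream: the BOM at the start selects the byte order.
(set-port-encoding! port "UTF-16")
(define first-char (read-char port))

;; Setting the encoding again marks a new logical start of stream, so
;; the next read looks for a BOM again and may pick another byte order.
(set-port-encoding! port "UTF-16")
(define second-char (read-char port))

(write (list first-char second-char))
(newline)
```

Without the second call to set-port-encoding!, the second BOM would not be at the start of a stream and so would not be handled specially.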
Unlike with set-port-encoding!, if a byte order had already been chosen
for the port, it will remain in effect after a seek, and cannot be
changed by the presence of a BOM. Seeks anywhere other than the
beginning of a file clear the “start of stream” flags.