Textual input and output on Guile ports is layered on top of binary operations. To this end, each port has an associated character encoding that controls how bytes read from the port are converted to characters, and how characters written to the port are converted to bytes.
The procedure (port-encoding port) returns, as a string, the character encoding that port uses to interpret its input and output.
The procedure (set-port-encoding! port enc) sets the character encoding that will be used to interpret I/O to port. enc is a string containing the name of an encoding. Valid encoding names are those defined by IANA, for example "UTF-8" or "ISO-8859-1".
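The following is a minimal sketch. It assumes Guile 2.2 or later and the R6RS modules (rnrs io ports) and (rnrs bytevectors). It decodes the same two bytes under two encodings chosen with set-port-encoding!. The helper decode-first-char is hypothetical and exists only for this illustration.

```scheme
;; Sketch, assuming Guile 2.2 or later.
(use-modules (rnrs io ports)      ; open-bytevector-input-port
             (rnrs bytevectors))  ; u8-list->bytevector

;; #xC3 #xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
(define sample-bytes (u8-list->bytevector '(#xC3 #xA9)))

;; Hypothetical helper: open a fresh port on SAMPLE-BYTES, set its encoding
;; to ENC, and return (first-character reported-encoding).
(define (decode-first-char enc)
  (let ((port (open-bytevector-input-port sample-bytes)))
    (set-port-encoding! port enc)
    (let* ((reported (port-encoding port))
           (ch (read-char port)))
      (close-port port)
      (list ch reported))))

(define as-utf8 (decode-first-char "UTF-8"))
(define as-latin1 (decode-first-char "ISO-8859-1"))

;; UTF-8 combines both bytes into one character.  ISO-8859-1 maps the
;; first byte on its own to the code point with the same value.
(display (char->integer (car as-utf8)))   (newline) ; 233 (U+00E9)
(display (cadr as-utf8))                  (newline) ; UTF-8
(display (char->integer (car as-latin1))) (newline) ; 195 (U+00C3)
```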
When ports are created, they are assigned an encoding. The usual process to determine the initial encoding for a port is to take the value of the %default-port-encoding fluid.
The fluid %default-port-encoding contains the name of the encoding to be used by default for newly created ports (see Fluids and Dynamic States). As a special case, the value #f is equivalent to "ISO-8859-1".
The %default-port-encoding itself defaults to the encoding appropriate for the current locale, if setlocale has been called. See Locales for more on locales and when you might need to call setlocale explicitly.
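The following is a minimal sketch of how the fluid affects new ports. It assumes Guile 2.2 or later, running on a Unix-like system where "/dev/null" can be opened for writing. with-fluids rebinds %default-port-encoding only for the dynamic extent of its body. The helper new-file-port-encoding is hypothetical.

```scheme
;; Sketch, assuming Guile 2.2+ and a writable "/dev/null" (Unix-like system).
;; Hypothetical helper: report the encoding a freshly opened file port gets.
(define (new-file-port-encoding)
  (let* ((port (open-output-file "/dev/null"))
         (enc (port-encoding port)))
    (close-port port)
    enc))

;; Ports opened inside with-fluids pick up the temporary default.
(define latin1-enc
  (with-fluids ((%default-port-encoding "ISO-8859-1"))
    (new-file-port-encoding)))

(define utf8-enc
  (with-fluids ((%default-port-encoding "UTF-8"))
    (new-file-port-encoding)))

;; The special case: #f behaves like "ISO-8859-1".
(define false-enc
  (with-fluids ((%default-port-encoding #f))
    (new-file-port-encoding)))

(display latin1-enc) (newline) ; ISO-8859-1
(display utf8-enc)   (newline) ; UTF-8
(display false-enc)  (newline) ; ISO-8859-1
```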
Some port types have other ways of determining their initial encodings. String ports, for example, default to the UTF-8 encoding, in order to be able to represent all characters regardless of the current locale. File ports can optionally sniff their file for a coding: declaration; see File Ports. Binary ports might be initialized to the ISO-8859-1 encoding, in which each codepoint between 0 and 255 corresponds to a byte with that value.
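The following is a minimal sketch that contrasts a string port with a binary port. It assumes Guile 2.2 or later. The binary port's initial encoding is printed but not checked, because the text above only says that it might be ISO-8859-1.

```scheme
;; Sketch, assuming Guile 2.2 or later.
(use-modules (rnrs io ports)      ; open-bytevector-input-port
             (rnrs bytevectors))  ; u8-list->bytevector

;; U+03BB (Greek small letter lambda) cannot be represented in ISO-8859-1.
(define lambda-char (integer->char #x3BB))

;; Even while the default is ISO-8859-1, a string port reports UTF-8 and
;; reads the character back intact.
(define string-port-result
  (with-fluids ((%default-port-encoding "ISO-8859-1"))
    (let ((port (open-input-string (string lambda-char))))
      (list (port-encoding port) (read-char port)))))

(display (car string-port-result)) (newline)                       ; UTF-8
(display (char=? (cadr string-port-result) lambda-char)) (newline) ; #t

;; A binary port: print whatever initial encoding this Guile assigns.
(define binary-port
  (open-bytevector-input-port (u8-list->bytevector '(#xE9))))
(display (port-encoding binary-port)) (newline)
```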
Currently, ports only work with non-modal encodings. Most encodings are non-modal, meaning that the conversion of bytes to a string doesn’t depend on its context: the same byte sequence will always decode to the same string. A couple of modal encodings are in common use, like ISO-2022-JP and ISO-2022-KR, and they are not yet supported.
Each port also has an associated conversion strategy, which determines what to do when a Guile character can’t be converted to the port’s encoded character representation for output. There are three possible strategies: to raise an error, to replace the character with a hex escape, or to replace the character with a substitute character. Port conversion strategies are also used when decoding characters from an input port.
The procedure (port-conversion-strategy port) returns the behavior of port when outputting a character that is not representable in the port’s current encoding. If port is #f, then the current default behavior is returned. New ports will have this default behavior when they are created.
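The following is a minimal sketch of querying conversion strategies. It assumes Guile 2.2 or later and a writable "/dev/null" on a Unix-like system. A file port opened while %default-port-conversion-strategy is rebound starts out with the rebound value.

```scheme
;; Sketch, assuming Guile 2.2+ and a writable "/dev/null" (Unix-like system).
;; The current default, i.e. the strategy new ports will start with.
(define default-strategy (port-conversion-strategy #f))

;; A port opened while the default is rebound starts with that strategy.
(define inherited-strategy
  (with-fluids ((%default-port-conversion-strategy 'escape))
    (let* ((port (open-output-file "/dev/null"))
           (strategy (port-conversion-strategy port)))
      (close-port port)
      strategy)))

(display default-strategy) (newline)   ; substitute, unless changed earlier
(display inherited-strategy) (newline) ; escape
```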
The procedure (set-port-conversion-strategy! port sym) sets the behavior of Guile when outputting a character that is not representable in the port’s current encoding, or when Guile encounters a decoding error when trying to read a character. sym can be one of error, substitute, or escape.
If port is an open port, the conversion error behavior is set for that port. If it is #f, it is set as the default behavior for any future ports that get created in this thread.
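The following is a minimal sketch that assumes Guile 2.2 or later. Setting the strategy on one port leaves the default unchanged, and therefore also leaves future ports unchanged. The name out-port is only illustrative.

```scheme
;; Sketch, assuming Guile 2.2 or later.
(define default-before (port-conversion-strategy #f))

;; Change the strategy of this one port only.
(define out-port (open-output-string))
(set-port-conversion-strategy! out-port 'error)

(display (port-conversion-strategy out-port)) (newline) ; error
;; The default, and therefore future ports, are unaffected.
(display (eq? default-before (port-conversion-strategy #f))) (newline) ; #t
```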
As with port encodings, there is a fluid which determines the initial conversion strategy for a port.
The fluid %default-port-conversion-strategy defines the conversion strategy for newly created ports, and also for other conversion routines such as scm_to_stringn, scm_from_stringn, string->pointer, and pointer->string.
Its value must be one of the symbols described above, with the same semantics: error, substitute, or escape.

When Guile starts, its value is substitute.
Note that (set-port-conversion-strategy! #f sym) is equivalent to (fluid-set! %default-port-conversion-strategy sym).
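The following is a minimal sketch of that equivalence, assuming Guile 2.2 or later. The hypothetical helper default-after wraps each update in with-fluids. This confines the change to the dynamic extent of with-fluids, so the surrounding default survives.

```scheme
;; Sketch, assuming Guile 2.2 or later.
;; Default in effect outside the experiment, for comparison afterwards.
(define outside-before (port-conversion-strategy #f))

;; Hypothetical helper: run THUNK with a temporary default of substitute,
;; then report the default as THUNK left it.
(define (default-after thunk)
  (with-fluids ((%default-port-conversion-strategy 'substitute))
    (thunk)
    (port-conversion-strategy #f)))

(define via-setter
  (default-after
   (lambda () (set-port-conversion-strategy! #f 'escape))))

(define via-fluid
  (default-after
   (lambda () (fluid-set! %default-port-conversion-strategy 'escape))))

(display (list via-setter via-fluid)) (newline) ; (escape escape)
;; with-fluids restored the outer binding on exit.
(display (eq? outside-before (port-conversion-strategy #f))) (newline) ; #t
```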
As mentioned above, for an output port there are three possible port conversion strategies. The error strategy will throw an error when a nonconvertible character is encountered. The substitute strategy will replace nonconvertible characters with a question mark (‘?’). Finally, the escape strategy will print nonconvertible characters as a hex escape, using the escaping that is recognized by Guile’s string syntax. Note that if the port’s encoding is a Unicode encoding, like UTF-8, then encoding errors are impossible.
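The following is a minimal sketch that compares the three output strategies, assuming Guile 2.2 or later. It writes "a", U+03BB and "b" through a binary port set to ISO-8859-1, which has no representation for U+03BB. The helper encode-with is hypothetical. The escape result is only checked for a leading backslash, since the exact form of the hex escape is not spelled out here.

```scheme
;; Sketch, assuming Guile 2.2 or later.
(use-modules (rnrs io ports)      ; open-bytevector-output-port
             (rnrs bytevectors))  ; utf8->string

;; "a", U+03BB (not representable in ISO-8859-1), "b".
(define text (string #\a (integer->char #x3BB) #\b))

;; Hypothetical helper: write TEXT through a binary port using encoding ENC
;; and conversion strategy STRATEGY.  Returns the bytes produced, decoded
;; as UTF-8 for display, or the symbol failed if an error was raised.
(define (encode-with enc strategy)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port enc)
      (set-port-conversion-strategy! port strategy)
      (catch #t
        (lambda ()
          (display text port)
          (force-output port)
          (utf8->string (get-bytes)))
        (lambda args 'failed)))))

(display (encode-with "ISO-8859-1" 'substitute)) (newline) ; a?b
(write (encode-with "ISO-8859-1" 'escape)) (newline)       ; a, a backslash escape, b
(display (encode-with "ISO-8859-1" 'error)) (newline)      ; failed
;; With a Unicode encoding the error strategy never triggers.
(display (string=? (encode-with "UTF-8" 'error) text)) (newline) ; #t
```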
For an input port, the error strategy will cause Guile to throw an error if it encounters an invalid byte sequence, such as might happen if you tried to read ISO-8859-1 text as UTF-8. The error is thrown before advancing the read position. The substitute strategy will replace the bad bytes with a U+FFFD replacement character, in accordance with Unicode recommendations. When reading from an input port, the escape strategy is treated as if it were error.
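The following is a minimal sketch of the input side, assuming Guile 2.2 or later. The hypothetical helper decode-utf8 reads ISO-8859-1 bytes as UTF-8. In that position, the byte #xE9 does not form a valid UTF-8 sequence. For the substitute strategy, the sketch only checks that U+FFFD is present, not how many replacement characters appear.

```scheme
;; Sketch, assuming Guile 2.2 or later.
(use-modules (rnrs io ports)      ; open-bytevector-input-port
             (rnrs bytevectors))  ; u8-list->bytevector

;; "c", e-acute, "!" encoded as ISO-8859-1.  The byte #xE9 is not followed
;; by UTF-8 continuation bytes, so it is invalid when read as UTF-8.
(define latin1-bytes (u8-list->bytevector '(#x63 #xE9 #x21)))

;; Hypothetical helper: read every character of BYTES as UTF-8 under
;; STRATEGY.  Returns the decoded string, or the symbol failed on error.
(define (decode-utf8 bytes strategy)
  (let ((port (open-bytevector-input-port bytes)))
    (set-port-encoding! port "UTF-8")
    (set-port-conversion-strategy! port strategy)
    (catch #t
      (lambda ()
        (let loop ((chars '()))
          (let ((c (read-char port)))
            (if (eof-object? c)
                (list->string (reverse chars))
                (loop (cons c chars))))))
      (lambda args 'failed))))

(define substituted (decode-utf8 latin1-bytes 'substitute))
;; The bad byte shows up as U+FFFD; "c" and "!" survive.
(display (and (string-index substituted (integer->char #xFFFD)) #t)) (newline) ; #t
(display (decode-utf8 latin1-bytes 'error)) (newline)  ; failed
(display (decode-utf8 latin1-bytes 'escape)) (newline) ; failed (treated as error)
```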