Warning: This is the manual of the legacy Guile 2.0 series. You may want to read the manual of the current stable series instead.
Scheme source code files are usually encoded in ASCII or UTF-8, but the
built-in reader can interpret other character encodings as well. When
Guile loads Scheme source code, it uses the file-encoding
procedure (described below) to try to guess the encoding of the file.
In the absence of any hints, UTF-8 is assumed. One way to provide a
hint about the encoding of a source file is to place a coding
declaration in the top 500 characters of the file.
A coding declaration has the form coding: XXXXXX, where XXXXXX is the
name of a character encoding in which the source code file has been
encoded. The coding declaration must appear in a Scheme comment. It
can either be a semicolon-initiated comment, or the first block #!
comment in the file.
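As an illustration of the semicolon form, a source file might begin as in the sketch below. The file name greet.scm and the procedure greet are hypothetical; the Emacs-style -*- markers around the declaration are optional decoration, since only the coding: XXXXXX text matters here.

```scheme
;;; greet.scm -*- coding: iso-8859-1 -*-
;;
;; The declaration above sits in a semicolon comment near the top of
;; the file.  It could instead be placed in the first block comment:
;;   #!
;;   coding: iso-8859-1
;;   !#

;; Hypothetical procedure, present only so the file has some content.
(define (greet name)
  (string-append "Hello, " name "!"))
```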
The name of the character encoding in the coding declaration is
typically lower case and contains only letters, numbers, and hyphens,
as recognized by set-port-encoding! (see set-port-encoding!). Common
examples of character encoding names are utf-8 and iso-8859-1, as
defined by IANA. Thus, the coding declaration is mostly compatible
with Emacs.
However, there are some differences between the encoding names
recognized by Emacs and those defined by IANA, the latter being
essentially a subset of the former. For instance, latin-1 is a valid
encoding name for Emacs but not according to the IANA standard, which
Guile follows; instead, you should use iso-8859-1, which is both
understood by Emacs and registered by IANA. (IANA writes it in upper
case, Emacs wants it in lower case, and Guile is case insensitive.)
For source code, only a subset of all possible character encodings can
be interpreted by the built-in source code reader. Only those
character encodings in which ASCII text appears unmodified can be
used. This includes UTF-8 and ISO-8859-1 through ISO-8859-15. The
multi-byte character encodings UTF-16 and UTF-32 may not be used
because they are not compatible with ASCII.
There might be a scenario in which one would want to read non-ASCII
code from a port, such as with the procedure read, instead of with
load. If the port’s character encoding is the same as the encoding of
the code to be read by the port, no other special handling is
necessary. The port will automatically do the character encoding
conversion. The procedures setlocale and set-port-encoding! are used
to set port encodings (see Ports).
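As a minimal sketch of reading from a port whose encoding is already known, the following writes one datum to a file as ISO-8859-1 and reads it back with read. The file name encoding-demo.scm and the variable names are hypothetical, and ISO-8859-1 is chosen only for illustration.

```scheme
;; Hypothetical file name, used only for this illustration.
(define demo-path "encoding-demo.scm")

;; A string with one non-ASCII character (e with acute accent, U+00E9),
;; built without a non-ASCII literal in this source text.
(define sample (string #\c #\a #\f (integer->char #xe9)))

;; Write the string as a Scheme datum through an ISO-8859-1 port.
(call-with-output-file demo-path
  (lambda (out)
    (set-port-encoding! out "ISO-8859-1")
    (write sample out)))

;; Read it back.  The port converts the bytes to characters because
;; its encoding matches the encoding of the file.
(define datum
  (call-with-input-file demo-path
    (lambda (in)
      (set-port-encoding! in "ISO-8859-1")
      (read in))))
```

After this runs, datum should be a string equal to sample.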
To read code of unknown character encoding from a port, three steps
are needed. First, set the port’s character encoding to ISO-8859-1
using set-port-encoding!. Then, use the procedure file-encoding,
described below, to scan for a coding declaration when reading from
the port. As a side effect, it rewinds the port after its scan is
complete. After that, set the port’s character encoding to the
encoding returned by file-encoding, if any, again by using
set-port-encoding!. Then the code can be read as normal.
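The three steps can be sketched as a small helper. The name read-with-declared-encoding is hypothetical, and falling back to UTF-8 when no declaration is found is an assumption made here to mirror the default that load uses; other fallbacks are equally possible.

```scheme
;; Hypothetical helper: read the first datum from the file at PATH,
;; honoring a coding declaration if one is present.
(define (read-with-declared-encoding path)
  (let ((port (open-input-file path)))
    ;; Step 1: start with ISO-8859-1 so any byte sequence is readable.
    (set-port-encoding! port "ISO-8859-1")
    ;; Step 2: scan for a coding declaration; the port is rewound.
    (let ((enc (file-encoding port)))
      ;; Step 3: switch to the declared encoding.  Falling back to
      ;; UTF-8 is an assumption of this sketch.
      (set-port-encoding! port (or enc "UTF-8"))
      (let ((datum (read port)))
        (close-port port)
        datum))))
```

A call such as (read-with-declared-encoding "some-file.scm") would then return the first datum of that file, decoded according to its declaration.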
Alternatively, one can use the #:guess-encoding keyword argument of
open-file and related procedures. See File Ports.
Scheme Procedure: file-encoding port
C Function: scm_file_encoding (port)
Attempt to scan the first few hundred bytes from the port for
hints about its character encoding. Return a string containing the
encoding name or #f if the encoding cannot be determined. The
port is rewound.
Currently, the only supported method is to look for an Emacs-like
character coding declaration (see how Emacs recognizes file encoding
in The GNU Emacs Reference Manual). The coding declaration is of the
form coding: XXXXX and must appear in a Scheme comment. Additional
heuristics may be added in the future.
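To see what file-encoding returns for a given file, a one-line wrapper is enough. The name declared-encoding is hypothetical; the wrapper simply opens the file and passes the port to file-encoding.

```scheme
;; Hypothetical wrapper: return the encoding named in PATH's coding
;; declaration as a string, or #f if no declaration is found.
(define (declared-encoding path)
  (call-with-input-file path file-encoding))
```

Since the letter case of the returned name may differ from the case used in the file, comparing it with string-ci=? rather than string=? is the safer choice.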