The unibyte and multibyte text representations use different
character codes. The valid character codes for unibyte representation
range from 0 to #xFF
(255)—the values that can fit in one
byte. The valid character codes for multibyte representation range
from 0 to #x3FFFFF
. In this code space, values 0 through
#x7F
(127) are for ASCII characters, and values
#x80
(128) through #x3FFF7F
(4194175) are for
non-ASCII characters.
Emacs character codes are a superset of the Unicode standard.
Values 0 through #x10FFFF
(1114111) correspond to Unicode
characters of the same codepoint; values #x110000
(1114112)
through #x3FFF7F
(4194175) represent characters that are not
unified with Unicode; and values #x3FFF80
(4194176) through
#x3FFFFF
(4194303) represent eight-bit raw bytes.
This returns t
if charcode is a valid character, and
nil
otherwise.
(characterp 65) ⇒ t
(characterp 4194303) ⇒ t
(characterp 4194304) ⇒ nil
This function returns the largest value that a valid character
codepoint can have in Emacs. If the optional argument unicode
is non-nil
, it returns the largest character codepoint defined
by the Unicode Standard (which is smaller than the maximum codepoint
supported by Emacs).
(characterp (max-char)) ⇒ t
(characterp (1+ (max-char))) ⇒ nil
This function returns the character whose Unicode name is string.
If ignore-case is non-nil
, case is ignored in string.
This function returns nil
if string does not name a character.
;; U+03A3 (= (char-from-name "GREEK CAPITAL LETTER SIGMA") #x03A3) ⇒ t
This function returns the byte at character position pos in the current buffer. If the current buffer is unibyte, this is literally the byte at that position. If the buffer is multibyte, byte values of ASCII characters are the same as character codepoints, whereas eight-bit raw bytes are converted to their 8-bit codes. The function signals an error if the character at pos is non-ASCII.
The optional argument string means to get a byte value from that string instead of the current buffer.