Next: Licenses, Previous: The wchar_t mess, Up: GNU libunistring
The char32_t problem
In response to the wchar_t mess described in the previous section,
ISO C 11 introduces two new types: char32_t and char16_t.
char32_t is a type like wchar_t, with the added guarantee that it is 32 bits
wide. So, it is a type that is appropriate for encoding a Unicode character.
It is meant to resolve the problems of the 16-bit wide wchar_t on AIX and
Windows platforms, and allow a saner programming model for wide character
strings across all platforms.
char16_t is a type like wchar_t, with the added guarantee that it is 16 bits
wide. It is meant to allow porting programs that use the broken wide character
strings programming model from Windows to all platforms. Of course, no one
needs this.
These types are accompanied by a syntax for defining wide string literals with
these element types: u"..." for char16_t and U"..." for char32_t.
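As a minimal sketch of the literal syntax, the following C fragment defines one
string with each element type and counts its elements. The helper names
c16_length and c32_length are hypothetical and exist only for this
illustration; the surrogate-pair remark assumes an implementation that encodes
char16_t literals as UTF-16 (one that defines __STDC_UTF_16__).

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* u"..." yields an array of char16_t, U"..." an array of char32_t.
   The text is "gr", U+00FC, U+00DF, written with universal character
   names so that the source file encoding does not matter. */
static const char16_t greeting16[] = u"gr\u00FC\u00DF";
static const char32_t greeting32[] = U"gr\u00FC\u00DF";

/* Hypothetical helper: number of char32_t elements before the
   terminating zero. */
size_t c32_length(const char32_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: number of char16_t elements before the
   terminating zero. Assuming UTF-16 literals, a character outside the
   Basic Multilingual Plane occupies two elements (a surrogate pair). */
size_t c16_length(const char16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

With these definitions, c32_length(U"\U0001F600") is 1, while
c16_length(u"\U0001F600") is 2 under the UTF-16 assumption: with char16_t, one
element is not always one character.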
So far, so good. What the ISO C designers forgot is to provide standardized C
library functions that operate on these wide character strings. They
standardized only the most basic functions, mbrtoc32 and c32rtomb, which are
analogous to mbrtowc and wcrtomb, respectively. For the rest, GNU gnulib
(https://www.gnu.org/software/gnulib/) provides the functions:
mbstoc32s – like mbstowcs, c32stombs – like wcstombs.
c32isalnum, c32isalpha, etc. – like iswalnum, iswalpha, etc.
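To illustrate what the standardized mbrtoc32 provides on its own, here is a
rough sketch of a whole-string conversion built on top of it. The function
name my_mbs_to_c32s is hypothetical; it is not the gnulib mbstoc32s
implementation, only an approximation of the same idea. It interprets the input
in the current locale's encoding and does not zero-terminate the output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, not part of ISO C or gnulib.
   Converts the NUL-terminated multibyte string src to at most destlen
   char32_t elements in dest. Returns the number of elements written,
   or (size_t)-1 if src contains an invalid or incomplete sequence. */
size_t my_mbs_to_c32s(char32_t *dest, const char *src, size_t destlen)
{
    mbstate_t state;
    const char *end;
    size_t count = 0;

    assert(dest != NULL && src != NULL);
    memset(&state, 0, sizeof state);    /* initial conversion state */
    end = src + strlen(src) + 1;        /* include the terminating NUL */

    while (count < destlen) {
        char32_t c;
        size_t ret = mbrtoc32(&c, src, (size_t)(end - src), &state);

        if (ret == (size_t)-1 || ret == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete input */
        if (ret == 0)
            break;                      /* reached the terminating NUL */
        dest[count++] = c;
        if (ret != (size_t)-3)          /* -3: no input bytes consumed */
            src += ret;
    }
    return count;
}
```

A caller would typically invoke setlocale(LC_ALL, "") first, so that the
multibyte input is interpreted in the user's locale rather than in the "C"
locale.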
Still, this API has two problems:
The char32_t encoding is locale dependent and undocumented. This means that if
you want to know any property of a char32_t character other than the
properties defined by <wctype.h> (such as whether it’s a dash, currency
symbol, paragraph separator, or similar), you have to convert it to char *
encoding first, by use of the function c32rtomb.
Even on platforms where wchar_t is 32 bits wide, the char32_t encoding may be
different from the wchar_t encoding.
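The conversion step named in the first problem can be sketched with the
standard c32rtomb function. The wrapper name my_c32_to_mb is hypothetical, and
the resulting bytes are in the current locale's encoding, which is not
necessarily UTF-8.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, not part of ISO C or gnulib.
   Converts the single char32_t character c to the multibyte encoding of
   the current locale. buf must have room for MB_LEN_MAX + 1 bytes.
   Returns the number of bytes written (excluding the terminating NUL),
   or (size_t)-1 if c cannot be represented in this locale. */
size_t my_c32_to_mb(char *buf, char32_t c)
{
    mbstate_t state;
    size_t ret;

    assert(buf != NULL);
    memset(&state, 0, sizeof state);    /* initial conversion state */

    ret = c32rtomb(buf, c, &state);
    if (ret == (size_t)-1)
        return ret;                     /* not representable */
    buf[ret] = '\0';
    return ret;
}
```

For example, with char buf[MB_LEN_MAX + 1], the call my_c32_to_mb(buf, U'A')
stores the one-byte string "A" and returns 1.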