Every Unicode character or code point has a general category assigned to it. This classification is important for most algorithms that work on Unicode text.
The GNU libunistring library provides two kinds of API for working with
general categories. The object-oriented API uses a variable to denote
every predefined general category value or combination thereof. The
low-level API uses a bit mask instead. The advantage of the object-oriented
API is that if only a few predefined general category values are used,
the data tables are relatively small. When you combine general category
values (using uc_general_category_or, uc_general_category_and, or
uc_general_category_and_not), or when you use the low-level bit masks,
a big table is used that holds the complete general category
information for all Unicode characters.