Next: The bit mask API for general category, Up: General category [Contents][Index]
This data type denotes a general category value. It is an immediate type that can be copied by simple assignment, without involving memory allocation. It is not an array type.
The following are the predefined general category values. Additional general categories may be added in the future.
The UC_CATEGORY_* constants reflect the systematic general category values assigned by the Unicode Consortium, whereas the other UC_* macros are aliases, for use when readable code is preferred.
This represents the general category “Letter”.
This represents the general category “Letter, uppercase”.
This represents the general category “Letter, lowercase”.
This represents the general category “Letter, titlecase”.
This represents the general category “Letter, modifier”.
This represents the general category “Letter, other”.
This represents the general category “Mark”.
This represents the general category “Mark, nonspacing”.
This represents the general category “Mark, spacing combining”.
This represents the general category “Mark, enclosing”.
This represents the general category “Number”.
This represents the general category “Number, decimal digit”.
This represents the general category “Number, letter”.
This represents the general category “Number, other”.
This represents the general category “Punctuation”.
This represents the general category “Punctuation, connector”.
This represents the general category “Punctuation, dash”.
This represents the general category “Punctuation, open”, a.k.a. “start punctuation”.
This represents the general category “Punctuation, close”, a.k.a. “end punctuation”.
This represents the general category “Punctuation, initial quote”.
This represents the general category “Punctuation, final quote”.
This represents the general category “Punctuation, other”.
This represents the general category “Symbol”.
This represents the general category “Symbol, math”.
This represents the general category “Symbol, currency”.
This represents the general category “Symbol, modifier”.
This represents the general category “Symbol, other”.
This represents the general category “Separator”.
This represents the general category “Separator, space”.
This represents the general category “Separator, line”.
This represents the general category “Separator, paragraph”.
This represents the general category “Other”.
This represents the general category “Other, control”.
This represents the general category “Other, format”.
This represents the general category “Other, surrogate”. All code points in this category are invalid characters.
This represents the general category “Other, private use”.
This represents the general category “Other, not assigned”. Some code points in this category are invalid characters.
The following functions combine general categories, as in a Boolean algebra, except that there is no ‘not’ operation.
Returns the union of two general categories. This corresponds to the union of the two sets of characters.
Returns the intersection of two general categories as bit masks. This does not correspond to the intersection of the two sets of characters.
Returns the intersection of a general category with the complement of a second general category, as bit masks. This does not correspond to the intersection with complement, when viewing the categories as sets of characters.
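The three combining operations can be pictured with a toy model in which a category is nothing more than a bit mask. This is only a sketch under that assumption: the type `toy_category_t`, the `TOY_*` bit assignments, and the `toy_category_*` functions are hypothetical names invented for illustration, and the library's real type and bit values may differ.

```c
#include <stdint.h>

/* Toy model: a category is just a bit mask (an assumption made for
   illustration, not the library's actual representation). */
typedef uint32_t toy_category_t;

/* Hypothetical bit assignments for a few fine-grained categories. */
enum {
  TOY_LU = 1 << 0,                    /* Letter, uppercase */
  TOY_LL = 1 << 1,                    /* Letter, lowercase */
  TOY_LT = 1 << 2,                    /* Letter, titlecase */
  TOY_ND = 1 << 3,                    /* Number, decimal digit */
  TOY_L  = TOY_LU | TOY_LL | TOY_LT   /* aggregate of the letter bits above */
};

/* Union: every bit set in either mask. */
static toy_category_t
toy_category_or (toy_category_t a, toy_category_t b)
{
  return a | b;
}

/* Intersection of the masks: only bits set in both. */
static toy_category_t
toy_category_and (toy_category_t a, toy_category_t b)
{
  return a & b;
}

/* Bits of a that are not set in b.  This stands in for the missing
   'not' operation when a complement relative to a is sufficient. */
static toy_category_t
toy_category_and_not (toy_category_t a, toy_category_t b)
{
  return a & ~b;
}
```

In this model, `toy_category_and_not (TOY_L, TOY_LU)` leaves the lowercase and titlecase bits, which shows how a relative complement can be formed without a general ‘not’ operation.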
The following functions associate general categories with their name.
Returns the name of a general category, more precisely, the abbreviated name. Returns NULL if the general category corresponds to a bit mask that does not have a name.
Returns the long name of a general category. Returns NULL if the general category corresponds to a bit mask that does not have a name.
Returns the general category given by name, e.g. "Lu", or by long name, e.g. "Uppercase Letter".
This lookup treats spaces, underscores, and hyphens interchangeably as word separators and is case-insensitive.
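The loose matching used by the name lookup can be sketched as a comparison that folds ASCII case and maps underscores and hyphens to spaces. This is one plausible reading of the rule, not the library's code: `toy_fold` and `toy_names_match` are hypothetical helpers, and the sketch assumes the three separators are interchangeable rather than dropped entirely.

```c
#include <ctype.h>
#include <stdbool.h>

/* Folds one character: '_' and '-' become ' ', and ASCII letters are
   lowercased.  (Assumed normalization, for illustration only.) */
static int
toy_fold (char c)
{
  if (c == '_' || c == '-')
    return ' ';
  return tolower ((unsigned char) c);
}

/* Returns true if the two names are equal after folding each character. */
static bool
toy_names_match (const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (toy_fold (*a) != toy_fold (*b))
        return false;
      a++;
      b++;
    }
  /* Both strings must end at the same position. */
  return *a == *b;
}
```

Under these assumptions, "Uppercase Letter", "uppercase_letter", and "UPPERCASE-LETTER" all compare equal, while "Lu" and "Ll" do not.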
The following functions view general categories as sets of Unicode characters.
Returns the general category of a Unicode character.
This function uses a big table.
Tests whether a Unicode character belongs to a given category. The category argument can be a predefined general category or the combination of several predefined general categories.
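Viewing categories as sets of characters, a membership test against a combined category amounts to checking whether the character's own category is among those combined. The sketch below illustrates this for a handful of ASCII ranges only; `demo_category_of`, `demo_is_category`, and the `DEMO_*` masks are hypothetical names, and a small range check stands in for the large table the real function uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for a few categories (illustration only). */
enum {
  DEMO_LU    = 1 << 0,   /* Letter, uppercase */
  DEMO_LL    = 1 << 1,   /* Letter, lowercase */
  DEMO_ND    = 1 << 2,   /* Number, decimal digit */
  DEMO_OTHER = 1 << 3    /* placeholder for everything not modelled here */
};

/* Toy classifier: handles ASCII letters and digits only.  A real
   implementation would consult a table covering all of Unicode. */
static uint32_t
demo_category_of (uint32_t uc)
{
  if (uc >= 'A' && uc <= 'Z')
    return DEMO_LU;
  if (uc >= 'a' && uc <= 'z')
    return DEMO_LL;
  if (uc >= '0' && uc <= '9')
    return DEMO_ND;
  return DEMO_OTHER;
}

/* Tests whether uc belongs to any of the categories combined in mask. */
static bool
demo_is_category (uint32_t uc, uint32_t mask)
{
  return (demo_category_of (uc) & mask) != 0;
}
```

For example, `demo_is_category ('A', DEMO_LU | DEMO_LL)` is true because the mask combines both letter cases, whereas the same mask rejects `'7'`.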