This section describes a lower-level API. The grapheme cluster break property is defined in Unicode Standard Annex #29, section “Grapheme Cluster Boundaries” (see https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries). It is used to determine the grapheme cluster breaks in a string.
The following are the possible values of the grapheme cluster break property. More values may be added in the future.
The following function looks up the grapheme cluster break property of a character.
Function: int uc_graphemeclusterbreak_property (ucs4_t uc)
Returns the Grapheme_Cluster_Break property of a Unicode character.
The following function determines whether there is a grapheme cluster break between two Unicode characters. It is the primitive upon which the higher-level functions in the previous section are directly based.
Function: bool uc_is_grapheme_break (ucs4_t a, ucs4_t b)
Returns true if there is a grapheme cluster boundary between Unicode characters a and b.
There is always a grapheme cluster break at the start or end of text. You can specify zero for a or b to indicate start of text or end of text, respectively.
This implements the extended (not legacy) grapheme cluster rules described in the Unicode standard, because the standard says that they are preferred.
Note that this function does not handle the case when three or more consecutive characters are needed to determine the boundary. This is the case in particular with syllables in Indic scripts or emojis. Use uc_grapheme_breaks for such cases.
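As an illustration of how a pairwise predicate of this kind can drive a scan over a string, here is a minimal sketch. It is not the library's implementation. The names toy_is_grapheme_break, mark_breaks and count_clusters are hypothetical, and the toy predicate covers only a few simplified rules (start or end of text, CR LF, control characters, and the combining marks U+0300..U+036F) so that the example compiles without the library. In real code one would presumably pass uc_is_grapheme_break instead, assuming ucs4_t is a 32-bit unsigned integer type. Like any pairwise scan, the sketch cannot handle boundaries that depend on three or more consecutive characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Type of a pairwise break predicate (hypothetical name). */
typedef bool (*pair_break_fn) (uint32_t a, uint32_t b);

/* Hypothetical stand-in for the real predicate; toy rules only. */
bool
toy_is_grapheme_break (uint32_t a, uint32_t b)
{
  if (a == 0 || b == 0)
    return true;                /* zero marks start or end of text */
  if (a == 0x000D && b == 0x000A)
    return false;               /* CR LF is never split */
  if (a < 0x20 || b < 0x20)
    return true;                /* break around C0 control characters */
  if (b >= 0x0300 && b <= 0x036F)
    return false;               /* combining mark extends the cluster */
  return true;                  /* otherwise, break everywhere */
}

/* Sets p[i] to 1 if a grapheme cluster starts at s[i], else to 0.
   Only adjacent pairs are examined.  */
void
mark_breaks (const uint32_t *s, size_t n, char *p, pair_break_fn is_break)
{
  assert (n == 0 || (s != NULL && p != NULL));
  for (size_t i = 0; i < n; i++)
    {
      uint32_t prev = (i == 0 ? 0 : s[i - 1]);  /* 0 = start of text */
      p[i] = (is_break (prev, s[i]) ? 1 : 0);
    }
}

/* Counts the clusters in s[0..n-1] by counting cluster starts.  */
size_t
count_clusters (const uint32_t *s, size_t n, pair_break_fn is_break)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t prev = (i == 0 ? 0 : s[i - 1]);
      if (is_break (prev, s[i]))
        count++;
    }
  return count;
}
```

For the five-character sequence 'e', U+0301, CR, LF, 'x', the toy predicate marks cluster starts at indices 0, 2 and 4, which gives three clusters.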