The following functions find a single boundary between grapheme clusters in a string.
Returns the start of the next grapheme cluster following s,
or end if no grapheme cluster break is encountered before it.
Returns NULL if and only if s == end.
Note that these functions do not handle the case where a character
outside the range between s and end is needed to determine the boundary.
This is the case in particular with syllables in Indic scripts and with emoji.
Use the _grapheme_breaks functions for such cases.
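The return convention above lends itself to a simple forward loop. The following is a minimal sketch, not part of the library: count_clusters, grapheme_next_fn and ascii_next are hypothetical names, and the callback type is assumed to match the way <unigbrk.h> declares u8_grapheme_next (two const uint8_t * arguments, const uint8_t * result). The stand-in ascii_next exists only so that the loop can be tried without linking the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback with the shape assumed for u8_grapheme_next: returns the
   start of the next cluster, 'end' when no further break is found,
   and NULL if and only if s == end.  */
typedef const uint8_t *(*grapheme_next_fn) (const uint8_t *s,
                                            const uint8_t *end);

/* Counts the grapheme clusters in [s, end) by stepping from one
   boundary to the next until the callback returns NULL.  */
size_t
count_clusters (const uint8_t *s, const uint8_t *end, grapheme_next_fn next)
{
  size_t count = 0;

  assert (s <= end);
  while ((s = next (s, end)) != NULL)
    count++;
  return count;
}

/* Hypothetical stand-in for trying the loop without the library:
   treats every unit as its own cluster, which is only adequate for
   plain ASCII text that contains no CR LF pairs.  */
const uint8_t *
ascii_next (const uint8_t *s, const uint8_t *end)
{
  return s == end ? NULL : s + 1;
}
```

With the library linked, passing u8_grapheme_next as the third argument should give the real cluster count, subject to the limitation described above for text whose boundaries depend on characters outside the range.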
Returns the start of the grapheme cluster preceding s, or
start if no grapheme cluster break is encountered before it.
Returns NULL if and only if s == start.
Note that these functions do not handle the case where a character
outside the range between start and s is needed to determine the boundary.
This is the case in particular with syllables in Indic scripts and with emoji.
Use the _grapheme_breaks functions for such cases.
Note also that these functions work only on well-formed Unicode strings.
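A typical use of the backward variant is deleting the last user-perceived character of a buffer. The sketch below illustrates this under assumptions: length_without_last_cluster, grapheme_prev_fn and ascii_prev are hypothetical names, and the callback type is assumed to match the declaration of u8_grapheme_prev in <unigbrk.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback with the shape assumed for u8_grapheme_prev: returns the
   start of the cluster preceding s, 'start' when no earlier break is
   found, and NULL if and only if s == start.  */
typedef const uint8_t *(*grapheme_prev_fn) (const uint8_t *s,
                                            const uint8_t *start);

/* Returns the length, in units, that [start, end) has after its last
   grapheme cluster is dropped.  An empty range stays empty.  */
size_t
length_without_last_cluster (const uint8_t *start, const uint8_t *end,
                             grapheme_prev_fn prev)
{
  const uint8_t *last;

  assert (start <= end);
  last = prev (end, start);     /* NULL only when end == start */
  return last == NULL ? 0 : (size_t) (last - start);
}

/* Hypothetical stand-in for trying the helper without the library:
   steps back one unit at a time, which is only adequate for plain
   ASCII text that contains no CR LF pairs.  */
const uint8_t *
ascii_prev (const uint8_t *s, const uint8_t *start)
{
  return s == start ? NULL : s - 1;
}
```

With the library linked, u8_grapheme_prev would be passed in place of ascii_prev. The input must be well-formed, as noted above.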
The following functions determine all of the grapheme cluster boundaries in a string.
Determines the grapheme cluster break points in s, an array of
n units, and stores the result at p[0..n-1].
p[i] = 1 means that there is a grapheme cluster boundary between
s[i-1] and s[i].
p[i] = 0 means that s[i-1] and s[i] are part of the same grapheme cluster.
p[0] is always set to 1, because there is always a
grapheme cluster break at the start of the text.
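Once the array p has been filled, it can be read without further calls into the library. The helpers below are a minimal sketch of how the flags might be consumed. count_from_breaks and cluster_length are hypothetical names, and p is taken to be an array of char with one entry per unit, which is how the _grapheme_breaks functions are assumed to declare it.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the grapheme clusters described by p[0..n-1]: every entry
   set to 1 marks the first unit of a cluster.  */
size_t
count_from_breaks (const char *p, size_t n)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (p[i])
      count++;
  return count;
}

/* Returns the length, in units, of the cluster that starts at index i.
   The cluster extends up to the next entry set to 1, or up to n.  */
size_t
cluster_length (const char *p, size_t n, size_t i)
{
  size_t j;

  assert (i < n && p[i]);
  for (j = i + 1; j < n && !p[j]; j++)
    ;
  return j - i;
}
```

For example, the flags 1 0 0 1 1 0 describe three clusters of 3, 1 and 2 units. Because the whole string is analysed at once, this approach avoids the range limitation of the single-boundary functions.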
In addition to the above variants for UTF-8, UTF-16, and UTF-32 strings,
<unigbrk.h> provides another variant: uc_grapheme_breaks.
This is similar to u32_grapheme_breaks, but it accepts any
characters which may not be represented in UTF-32, such as control
characters.