

1.4 Choice of in-memory representation of strings

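There are three ways of representing strings in the memory of a running program: as ‘char *’ strings in the locale encoding; as Unicode strings in UTF-8, UTF-16, or UTF-32 (‘uint8_t *’, ‘uint16_t *’, ‘uint32_t *’); or as wide strings (‘wchar_t *’).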
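To make the representations concrete, the following sketch writes the same four-character string "café" (U+0063 U+0061 U+0066 U+00E9) in each of them and notes how many code units each form needs. The variable names and the UNITS macro are made up for illustration. The ‘char *’ form ASSUMES an ISO-8859-1 locale; in another locale encoding the bytes would differ. The width of ‘wchar_t’ is platform-dependent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* "café" in each representation. Every array ends with a terminating
   zero, which is not counted as a code unit. */

/* char *: locale encoding. ASSUMPTION: ISO-8859-1 locale, where
   U+00E9 is the single byte 0xE9. */
const char cafe_locale[] = "caf\xe9";                             /* 4 bytes */

/* Unicode strings: UTF-8, UTF-16 and UTF-32 code units. */
const uint8_t  cafe_u8[]  = { 0x63, 0x61, 0x66, 0xC3, 0xA9, 0 };  /* 5 units */
const uint16_t cafe_u16[] = { 0x0063, 0x0061, 0x0066, 0x00E9, 0 }; /* 4 units */
const uint32_t cafe_u32[] = { 0x63, 0x61, 0x66, 0xE9, 0 };        /* 4 units */

/* wchar_t *: wide string; wchar_t is typically 32 bits on glibc
   and 16 bits on Windows. */
const wchar_t cafe_wide[] = L"caf\u00e9";                         /* 4 units */

/* Number of code units in a zero-terminated array, excluding the terminator. */
#define UNITS(arr) (sizeof (arr) / sizeof (arr)[0] - 1)
```

Only the UTF-8 form spends two units on ‘é’; in this example the UTF-16 and UTF-32 forms need one unit per character.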

Of course, a ‘char *’ string can, in some cases, be encoded in UTF-8. Choose the data type according to what you can guarantee about the string’s encoding: if it is in the locale encoding, or if you don’t know how it is encoded, use ‘char *’. If, on the other hand, you can guarantee that it is UTF-8 encoded, you can use the UTF-8 string type, ‘uint8_t *’, for it.
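One way to obtain part of the guarantee needed for ‘uint8_t *’ is to check that the bytes are well-formed UTF-8. The sketch below is a minimal checker. utf8_is_valid is a hypothetical helper written for illustration, not a libunistring function; libunistring’s own u8_check (which returns NULL for a well-formed string) serves this purpose. A passing check shows only that the bytes are valid UTF-8, not that whoever produced them meant UTF-8: plain ASCII, for instance, is valid in many encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring: returns true if the
   n bytes at s are well-formed UTF-8 (no overlong forms, no surrogates,
   nothing above U+10FFFF). */
bool utf8_is_valid (const uint8_t *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      uint8_t c = s[i];
      size_t len;       /* length of the sequence starting at s[i] */
      uint32_t cp;      /* code point being decoded */
      uint32_t min;     /* smallest code point allowed for this length */

      if (c < 0x80)
        {
          i++;          /* ASCII: one byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; min = 0x10000; }
      else
        return false;   /* stray continuation byte or invalid lead byte */

      if (n - i < len)
        return false;   /* truncated sequence */
      for (size_t k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;   /* expected a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;   /* overlong form, out of range, or surrogate */
      i += len;
    }
  return true;
}
```

With this check, the ISO-8859-1 bytes of "café" (0x63 0x61 0x66 0xE9) are rejected, because 0xE9 starts a three-byte sequence that never completes. Those bytes therefore belong in a ‘char *’.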

The five types ‘char *’, ‘uint8_t *’, ‘uint16_t *’, ‘uint32_t *’, and ‘wchar_t *’ are incompatible types at the C level. Therefore, ‘gcc -Wall’ will produce a warning if, by mistake, your code mixes these types. When you use GNU libunistring, even a warning about a mismatch between ‘char *’ and ‘uint8_t *’ signals a bug in your code; do not try to silence it with a cast.
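The sketch below shows how this plays out in practice. count_u8_chars, greeting, and greeting_chars are hypothetical names chosen for illustration. The idea is to give UTF-8 data the type ‘uint8_t *’ from the moment it is created, so that every call site type-checks without a cast. The mismatching call appears only in a comment, because it is exactly the kind of code that should not be written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function: counts the characters in a well-formed UTF-8
   string by counting the bytes that are not continuation bytes (10xxxxxx). */
size_t count_u8_chars (const uint8_t *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ((s[i] & 0xC0) != 0x80)
      count++;
  return count;
}

/* The data is declared with the UTF-8 type from the start ("café"),
   so passing it to count_u8_chars needs no cast. */
static const uint8_t greeting[] = { 'c', 'a', 'f', 0xC3, 0xA9 };

size_t greeting_chars (void)
{
  return count_u8_chars (greeting, sizeof greeting);
}

/* By contrast, a call such as
     const char *p = "caf\xc3\xa9";
     count_u8_chars (p, 5);
   typically makes 'gcc -Wall' emit a -Wpointer-sign warning ("pointer
   targets ... differ in signedness"). The fix is to change the type of p,
   not to add a cast. */
```

Here greeting_chars () returns 4, while sizeof greeting is 5.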

