Some plugins link against the libextractor_common library, which
provides common abstractions needed by many plugins. This section
documents this internal API for plugin developers. Note that the headers
for this library are (intentionally) not installed: we do not consider
this API stable and it should hence only be used by plugins that are
built and shipped with GNU libextractor. Third-party plugins should
not use it.
convert_numeric.h defines various conversion functions for numbers (in particular, byte-order conversion for floating point numbers).
unzip.h defines an API for accessing compressed files.
pack.h provides an interpreter for unpacking structs of integer numbers from streams and converting from big or little endian to host byte order at the same time.
convert.h provides a function for character set conversion described below.
Various GNU libextractor plugins make use of the internal convert.h header, which defines a function EXTRACTOR_common_convert_to_utf8 that converts text from any supported character set to UTF-8. This conversion is important since the linked list of keywords that is returned by GNU libextractor is expected to contain only UTF-8 strings. Naturally, proper conversion may not always be possible since some file formats fail to specify the character set. In that case, it is often better not to convert at all.
The arguments to EXTRACTOR_common_convert_to_utf8 are the input string (which does not have to be zero-terminated), the length of the input string, and the name of the character set (which must be zero-terminated). Which character sets are supported depends on the platform; a list can generally be obtained using the iconv -l command. The return value from EXTRACTOR_common_convert_to_utf8 is a zero-terminated string in UTF-8 format. The caller is responsible for freeing the string, so storing the string in the keyword list is acceptable.
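Since the convert.h header is not installed, the following is only a sketch of how a function with the argument order described above could be built on top of iconv. The name sketch_convert_to_utf8 is hypothetical, and its error handling (returning NULL on failure) is an assumption of this example, not a statement about how EXTRACTOR_common_convert_to_utf8 behaves.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration only: convert len bytes of input
   (not necessarily zero-terminated) from charset to a newly allocated,
   zero-terminated UTF-8 string.  Returns NULL on any failure; that is a
   choice made for this sketch.  The caller must free the result. */
static char *
sketch_convert_to_utf8 (const char *input, size_t len, const char *charset)
{
  assert (input != NULL && charset != NULL);

  iconv_t cd = iconv_open ("UTF-8", charset);
  if (cd == (iconv_t) -1)
    return NULL;                /* character set unknown on this platform */

  /* Assumption: four output bytes per input byte is enough, since one
     UTF-8 code point takes at most four bytes. */
  size_t out_size = 4 * len + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) input;   /* iconv takes a non-const char ** */
  size_t in_left = len;
  char *out_ptr = out;
  size_t out_left = out_size - 1;  /* reserve room for the terminator */

  if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }
  *out_ptr = '\0';
  iconv_close (cd);
  return out;
}
```

A caller would pass, for example, the four bytes of "café" encoded in ISO-8859-1 together with the length 4 and the character set name "ISO-8859-1", and later release the returned string with free.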