(libidn.info.gz) Utility Functions
3 Utility Functions
*******************
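The rest of this library makes extensive use of Unicode characters. To
interface the library with the outside world, your application may need
to convert strings between Unicode encodings (UTF-8 and UCS-4),
normalize them, and transcode them to and from the locale's character
set. This chapter describes the helper functions provided for these
tasks.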
3.1 Header file `stringprep.h'
==============================
To use the functions explained in this chapter, you need to include the
file `stringprep.h' using:
#include <stringprep.h>
3.2 Unicode Encoding Transformation
===================================
stringprep_unichar_to_utf8
--------------------------
-- Function: int stringprep_unichar_to_utf8 (uint32_t C, char * OUTBUF)
C: an ISO 10646 character code.
OUTBUF: output buffer, must have at least 6 bytes of space. If
`NULL', the length will be computed and returned and nothing will
be written to `outbuf'.
Converts a single character to UTF-8.
*Return value:* number of bytes written.
stringprep_utf8_to_unichar
--------------------------
-- Function: uint32_t stringprep_utf8_to_unichar (const char * P)
P: a pointer to a Unicode character encoded as UTF-8
Converts a sequence of bytes encoded as UTF-8 to a Unicode
character. If `p' does not point to a valid UTF-8 encoded
character, results are undefined.
*Return value:* the resulting character.
stringprep_ucs4_to_utf8
-----------------------
-- Function: char * stringprep_ucs4_to_utf8 (const uint32_t * STR,
ssize_t LEN, size_t * ITEMS_READ, size_t * ITEMS_WRITTEN)
STR: a UCS-4 encoded string
LEN: the maximum length of `str' to use. If `len' < 0, then the
string is terminated with a 0 character.
ITEMS_READ: location to store the number of characters read, or
`NULL'.
ITEMS_WRITTEN: location to store the number of bytes written, or
`NULL'. The stored value does not include the trailing 0 byte.
Convert a string from a 32-bit fixed-width representation (UCS-4)
to UTF-8. The result will be terminated with a 0 byte.
*Return value:* a pointer to a newly allocated UTF-8 string. This
value must be deallocated by the caller. If an error occurs,
`NULL' will be returned.
stringprep_utf8_to_ucs4
-----------------------
-- Function: uint32_t * stringprep_utf8_to_ucs4 (const char * STR,
ssize_t LEN, size_t * ITEMS_WRITTEN)
STR: a UTF-8 encoded string
LEN: the maximum length of `str' to use. If `len' < 0, then the
string is nul-terminated.
ITEMS_WRITTEN: location to store the number of characters in the
result, or `NULL'.
Convert a string from UTF-8 to a 32-bit fixed-width representation
(UCS-4), assuming valid UTF-8 input. This function does no error
checking on the input.
*Return value:* a pointer to a newly allocated UCS-4 string. This
value must be deallocated by the caller.
3.3 Unicode Normalization
=========================
stringprep_ucs4_nfkc_normalize
------------------------------
-- Function: uint32_t * stringprep_ucs4_nfkc_normalize (const uint32_t
* STR, ssize_t LEN)
STR: a Unicode string.
LEN: length of `str' array, or -1 if `str' is nul-terminated.
Converts a UCS-4 string into NFKC normalized form; see
`stringprep_utf8_nfkc_normalize()' for more information.
*Return value:* a newly allocated Unicode string, that is the NFKC
normalized form of `str'.
stringprep_utf8_nfkc_normalize
------------------------------
-- Function: char * stringprep_utf8_nfkc_normalize (const char * STR,
ssize_t LEN)
STR: a UTF-8 encoded string.
LEN: length of `str', in bytes, or -1 if `str' is nul-terminated.
Converts a string into a normalized form, standardizing such issues
as whether a character with an accent is represented as a base
character and combining accent or as a single precomposed
character.
The normalization mode is NFKC (ALL COMPOSE). It standardizes
differences that do not affect the text content, such as the
above-mentioned accent representation. It standardizes the
"compatibility" characters in Unicode, such as SUPERSCRIPT THREE to
the standard forms (in this case DIGIT THREE). Formatting
information may be lost, but for most text operations such
characters should be considered the same. It returns a result with
composed forms rather than a maximally decomposed form.
*Return value:* a newly allocated string, that is the NFKC
normalized form of `str'.
3.4 Character Set Conversion
============================
stringprep_locale_charset
-------------------------
-- Function: const char * stringprep_locale_charset (void)
Find out the current locale's charset. The function respects the
CHARSET environment variable, but typically uses nl_langinfo(CODESET)
when it is supported. It falls back on "ASCII" if CHARSET isn't set
and nl_langinfo isn't supported or doesn't return anything.
Note that this function returns the application's locale's preferred
charset (or the thread's locale's preferred charset, if your system
supports thread-specific locales). It does not report the encoding
that other programs or the system may be using. Thus, if you receive
data from external sources, you cannot in general use this function to
guess what charset it is encoded in. Instead, use stringprep_convert to
convert from the external representation into the charset returned by
this function, so that the data is in the locale encoding.
*Return value:* the character set used by the current locale. It
never returns NULL; "ASCII" is used as a fallback.
stringprep_convert
------------------
-- Function: char * stringprep_convert (const char * STR, const char *
TO_CODESET, const char * FROM_CODESET)
STR: input zero-terminated string.
TO_CODESET: name of destination character set.
FROM_CODESET: name of origin character set, as used by `str'.
Convert the string from one character set to another using the
system's `iconv()' function.
*Return value:* Returns a newly allocated zero-terminated string
which is `str' transcoded into `to_codeset'.
stringprep_locale_to_utf8
-------------------------
-- Function: char * stringprep_locale_to_utf8 (const char * STR)
STR: input zero-terminated string.
Convert a string encoded in the locale's character set into UTF-8 by
using `stringprep_convert()'.
*Return value:* Returns a newly allocated zero-terminated string
which is `str' transcoded into UTF-8.
stringprep_utf8_to_locale
-------------------------
-- Function: char * stringprep_utf8_to_locale (const char * STR)
STR: input zero-terminated string.
Convert a string encoded in UTF-8 into the locale's character set by
using `stringprep_convert()'.
*Return value:* Returns a newly allocated zero-terminated string
which is `str' transcoded into the locale's character set.