(libidn.info.gz) Utility Functions

 
 3 Utility Functions
 *******************
 
 The rest of this library makes extensive use of Unicode characters.  In
 order to interface this library with the outside world, your
 application may need to make various Unicode transformations.
 
 3.1 Header file `stringprep.h'
 ==============================
 
 To use the functions explained in this chapter, you need to include the
 file `stringprep.h' using:
 
      #include <stringprep.h>
 
 3.2 Unicode Encoding Transformation
 ===================================
 
 stringprep_unichar_to_utf8
 --------------------------
 
  -- Function: int stringprep_unichar_to_utf8 (uint32_t C, char * OUTBUF)
      C: an ISO 10646 character code
 
      OUTBUF: output buffer, must have at least 6 bytes of space.  If
      `NULL', the length will be computed and returned and nothing will
      be written to `outbuf'.
 
      Converts a single character to UTF-8.
 
      *Return value:* number of bytes written.
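 
      The 6-byte minimum for OUTBUF matches the longest sequence of the
      original ISO 10646 UTF-8 scheme, which covers 31-bit values.  The
      following is a standalone sketch of that encoding under the same
      calling convention (a `NULL' buffer only computes the length).
      It is not libidn's source, and `utf8_encode_sketch' is a
      hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, not libidn's code: encode one code point with the
   original ISO 10646 UTF-8 scheme (1 to 6 bytes).  If outbuf is NULL, only
   the length is computed.  Returns the number of bytes in the encoding. */
int utf8_encode_sketch(uint32_t c, char *outbuf)
{
    int len;
    unsigned char first;   /* marker bits of the lead byte */

    if (c < 0x80)           { first = 0x00; len = 1; }
    else if (c < 0x800)     { first = 0xC0; len = 2; }
    else if (c < 0x10000)   { first = 0xE0; len = 3; }
    else if (c < 0x200000)  { first = 0xF0; len = 4; }
    else if (c < 0x4000000) { first = 0xF8; len = 5; }
    else                    { first = 0xFC; len = 6; }

    if (outbuf != NULL) {
        /* Fill continuation bytes from the end, 6 payload bits each. */
        for (int i = len - 1; i > 0; i--) {
            outbuf[i] = (char)((c & 0x3F) | 0x80);
            c >>= 6;
        }
        /* Whatever remains fits next to the lead-byte marker. */
        outbuf[0] = (char)(c | first);
    }
    return len;
}
```

      With this sketch, U+20AC (EURO SIGN) yields the three bytes
      E2 82 AC, and passing `NULL' for the buffer returns 3 without
      writing anything.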
 
 stringprep_utf8_to_unichar
 --------------------------
 
  -- Function: uint32_t stringprep_utf8_to_unichar (const char * P)
      P: a pointer to a Unicode character encoded as UTF-8
 
      Converts a sequence of bytes encoded as UTF-8 to a Unicode
      character.  If `p' does not point to a valid UTF-8 encoded
      character, results are undefined.
 
      *Return value:* the resulting character.
 
 stringprep_ucs4_to_utf8
 -----------------------
 
  -- Function: char * stringprep_ucs4_to_utf8 (const uint32_t * STR,
           ssize_t LEN, size_t * ITEMS_READ, size_t * ITEMS_WRITTEN)
      STR: a UCS-4 encoded string
 
      LEN: the maximum length of `str' to use. If `len' < 0, then the
      string is terminated with a 0 character.
 
      ITEMS_READ: location to store number of characters read, or
      `NULL'.
 
      ITEMS_WRITTEN: location to store number of bytes written, or
      `NULL'.  The value stored here does not include the trailing 0
      byte.
 
      Convert a string from a 32-bit fixed-width representation as UCS-4
      to UTF-8.  The result will be terminated with a 0 byte.
 
      *Return value:* a pointer to a newly allocated UTF-8 string.  This
      value must be deallocated by the caller.  If an error occurs,
      `NULL' will be returned.
 
 stringprep_utf8_to_ucs4
 -----------------------
 
  -- Function: uint32_t * stringprep_utf8_to_ucs4 (const char * STR,
           ssize_t LEN, size_t * ITEMS_WRITTEN)
      STR: a UTF-8 encoded string
 
      LEN: the maximum length of `str' to use. If `len' < 0, then the
      string is nul-terminated.
 
      ITEMS_WRITTEN: location to store the number of characters in the
      result, or `NULL'.
 
      Convert a string from UTF-8 to a 32-bit fixed width representation
      as UCS-4, assuming valid UTF-8 input.  This function does no error
      checking on the input.
 
      *Return value:* a pointer to a newly allocated UCS-4 string.  This
      value must be deallocated by the caller.
 
 3.3 Unicode Normalization
 =========================
 
 stringprep_ucs4_nfkc_normalize
 ------------------------------
 
  -- Function: uint32_t * stringprep_ucs4_nfkc_normalize (const uint32_t
           * STR, ssize_t LEN)
      STR: a Unicode string.
 
      LEN: length of `str' array, or -1 if `str' is nul-terminated.
 
      Converts a UCS-4 string into canonical form; see
      `stringprep_utf8_nfkc_normalize()' for more information.
 
      *Return value:* a newly allocated Unicode string that is the
      NFKC-normalized form of `str'.
 
 stringprep_utf8_nfkc_normalize
 ------------------------------
 
  -- Function: char * stringprep_utf8_nfkc_normalize (const char * STR,
           ssize_t LEN)
      STR: a UTF-8 encoded string.
 
      LEN: length of `str', in bytes, or -1 if `str' is nul-terminated.
 
      Converts a string into canonical form, standardizing such issues
      as whether a character with an accent is represented as a base
      character and combining accent or as a single precomposed
      character.
 
      The normalization mode is NFKC (ALL COMPOSE).  It standardizes
      differences that do not affect the text content, such as the
      above-mentioned accent representation.  It also maps Unicode
      "compatibility" characters, such as SUPERSCRIPT THREE, to their
      standard forms (in this case DIGIT THREE).  Formatting
      information may be lost, but for most text operations such
      characters should be considered the same.  The result uses
      composed forms rather than a maximally decomposed form.
 
      *Return value:* a newly allocated string that is the
      NFKC-normalized form of `str'.
 
 3.4 Character Set Conversion
 ============================
 
 stringprep_locale_charset
 -------------------------
 
  -- Function: const char * stringprep_locale_charset (void)
      Find out the current locale charset.  The function respects the
      CHARSET environment variable, but typically uses
      nl_langinfo(CODESET) when it is supported.  It falls back on
      "ASCII" if CHARSET isn't set and nl_langinfo isn't supported or
      doesn't return anything.
 
      Note that this function returns the application's locale's
      preferred charset (or the thread's locale's preferred charset, if
      your system supports thread-specific locales).  It does not return
      what the system may be using.  Thus, if you receive data from
      external sources, you cannot in general use this function to guess
      what charset it is encoded in.  To get data in the locale
      encoding, use `stringprep_convert' to convert from the external
      representation into the charset returned by this function.
 
      *Return value:* the character set used by the current locale.
      This function never returns NULL; it uses "ASCII" as a fallback.
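 
      The lookup order described above can be sketched with standard
      POSIX calls.  This is a hypothetical stand-in named
      `locale_charset_sketch', not libidn's code, and it assumes a
      system where `nl_langinfo(CODESET)' is available.

```c
#include <langinfo.h>
#include <stdlib.h>

/* Hypothetical stand-in, not libidn's code: CHARSET environment variable
   first, then nl_langinfo(CODESET), then "ASCII" as the last resort. */
const char *locale_charset_sketch(void)
{
    const char *charset = getenv("CHARSET");
    if (charset != NULL && *charset != '\0')
        return charset;

    /* Reports the codeset of the locale selected with setlocale(); a
       program that never calls setlocale() runs in the "C" locale. */
    charset = nl_langinfo(CODESET);
    if (charset != NULL && *charset != '\0')
        return charset;

    return "ASCII";
}
```

      Because the answer depends on the program's locale, an application
      that wants the user's configured charset would typically call
      `setlocale(LC_ALL, "")' before asking.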
 
 stringprep_convert
 ------------------
 
  -- Function: char * stringprep_convert (const char * STR, const char *
           TO_CODESET, const char * FROM_CODESET)
      STR: input zero-terminated string.
 
      TO_CODESET: name of destination character set.
 
      FROM_CODESET: name of origin character set, as used by `str'.
 
      Convert the string from one character set to another using the
      system's `iconv()' function.
 
      *Return value:* Returns a newly allocated zero-terminated string
      which is `str' transcoded into `to_codeset'.
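 
      To illustrate what a conversion built on `iconv()' involves, here
      is a minimal standalone sketch.  It is not libidn's
      implementation; `convert_sketch' is a hypothetical name, the
      buffer-growth strategy is an arbitrary choice, and the extra flush
      call that stateful encodings need is left out for brevity.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, not libidn's code: transcode a nul-terminated
   string with iconv().  Returns a malloc()ed, zero-terminated result,
   or NULL on failure. */
char *convert_sketch(const char *str, const char *to_codeset,
                     const char *from_codeset)
{
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;                    /* conversion not supported */

    char *out = NULL;
    char *in = (char *)str;             /* iconv() does not modify the input */
    size_t inleft = strlen(str);
    size_t cap = inleft + 16;           /* initial guess, doubled on E2BIG */
    size_t used = 0;
    int done = 0;

    while (!done) {
        char *grown = realloc(out, cap);
        if (grown == NULL)
            goto fail;
        out = grown;

        char *dst = out + used;
        size_t outleft = cap - used - 4;   /* keep 4 bytes for the terminator */
        size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
        used = (size_t)(dst - out);

        if (rc != (size_t)-1)
            done = 1;                   /* all input consumed */
        else if (errno == E2BIG)
            cap *= 2;                   /* output full: grow and continue */
        else
            goto fail;                  /* EILSEQ or EINVAL: bad input */
    }

    memset(out + used, 0, 4);           /* wide enough for 32-bit encodings */
    iconv_close(cd);
    return out;

fail:
    free(out);
    iconv_close(cd);
    return NULL;
}
```

      A call such as `convert_sketch (input, "UTF-8", "ISO-8859-1")'
      would transcode Latin-1 input to UTF-8 on systems whose `iconv()'
      knows both names; the caller releases the result with `free()'.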
 
 stringprep_locale_to_utf8
 -------------------------
 
  -- Function: char * stringprep_locale_to_utf8 (const char * STR)
      STR: input zero-terminated string.
 
      Convert a string encoded in the locale's character set into UTF-8
      by using `stringprep_convert()'.
 
      *Return value:* Returns a newly allocated zero-terminated string
      which is `str' transcoded into UTF-8.
 
 stringprep_utf8_to_locale
 -------------------------
 
  -- Function: char * stringprep_utf8_to_locale (const char * STR)
      STR: input zero-terminated string.
 
      Convert a string encoded in UTF-8 into the locale's character set
      by using `stringprep_convert()'.
 
      *Return value:* Returns a newly allocated zero-terminated string
      which is `str' transcoded into the locale's character set.
 