(libidn.info.gz) Invoking idn
10 Invoking idn
***************
10.1 Name
=========
GNU Libidn (idn) - Internationalized Domain Names command line tool
10.2 Description
================
`idn' allows internationalized string preparation (`stringprep'),
encoding and decoding of punycode data, and IDNA ToASCII/ToUnicode
operations to be performed on the command line.
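As a rough illustration of what these three kinds of operation compute,
the following sketch uses Python's standard library instead of `idn'
itself. It assumes Python's built-in `punycode' and `idna' codecs and
the helper `encodings.idna.nameprep', which follow the IDNA 2003 rules;
the variable names are illustrative only and are not part of Libidn.

```python
from encodings.idna import nameprep

label = "R\u00e4ksm\u00f6rg\u00e5s"  # "Räksmörgås"

# stringprep with the Nameprep profile: case folding plus NFKC normalization.
prepared = nameprep(label)

# Punycode encoding and decoding of a single label (no "xn--" prefix).
encoded = prepared.encode("punycode")
decoded = encoded.decode("punycode")

# IDNA ToASCII and ToUnicode on a whole domain name.
ace = (prepared + ".se").encode("idna")
unicode_name = ace.decode("idna")

print(prepared)      # räksmörgås
print(encoded)       # b'rksmrgs-5wao1o'
print(ace)           # b'xn--rksmrgs-5wao1o.se'
print(unicode_name)  # räksmörgås.se
```

The ACE form printed here is the same one shown for `räksmörgås.se' in
the Examples section.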
If strings are specified on the command line, they are used as input
and the computed output is printed to standard output (`stdout'). If no
strings are specified on the command line, the program reads data, line
by line, from standard input (`stdin') and prints the computed output
to standard output. The processing performed (e.g., ToASCII or
Punycode encoding) is selected by options. If any error is
encountered, execution of the application is aborted.
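The input rule described above (command-line strings take precedence,
otherwise standard input is read line by line, and the first error
stops the run) can be sketched as follows. This is not Libidn's code:
`iter_inputs', `run' and `to_ascii' are hypothetical names, and Python's
`idna' codec stands in for the real conversion.

```python
import io
import sys

def iter_inputs(args, stdin):
    """Yield the command-line strings if any were given, else lines of stdin."""
    if args:
        yield from args
    else:
        for line in stdin:
            yield line.rstrip("\n")

def to_ascii(text):
    """Stand-in conversion: IDNA ToASCII via Python's built-in codec."""
    return text.encode("idna").decode("ascii")

def run(args, stdin, convert, out):
    """Write one converted line per input; an exception from convert aborts."""
    for text in iter_inputs(args, stdin):
        out.write(convert(text) + "\n")

# Strings on the command line are used directly.
run(["r\u00e4ksm\u00f6rg\u00e5s.se"], io.StringIO(""), to_ascii, sys.stdout)
# With no strings, each line of standard input is processed in turn.
run([], io.StringIO("r\u00e4ksm\u00f6rg\u00e5s.se\nexample.org\n"), to_ascii, sys.stdout)
```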
All strings are expected to be encoded in the preferred charset used
by your locale. Use `--debug' to find out what this charset is. You
can override the charset by setting the environment variable `CHARSET'.
To process a string that starts with `-', for example `-foo', use
`--' to signal the end of parameters, as in `idn --quiet -a -- -foo'.
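The `--' convention is common to many option parsers. The sketch below
uses Python's `argparse' to show the effect; the parser here is a
hypothetical stand-in that declares only two of the options, not the
parser `idn' actually uses.

```python
import argparse

# Hypothetical miniature of the idn command line, for illustration only.
parser = argparse.ArgumentParser(prog="idn-sketch")
parser.add_argument("-a", "--idna-to-ascii", action="store_true")
parser.add_argument("--quiet", action="store_true")
parser.add_argument("strings", nargs="*")

# Everything after "--" is treated as an input string, even "-foo".
ns = parser.parse_args(["--quiet", "-a", "--", "-foo"])
print(ns.quiet, ns.idna_to_ascii, ns.strings)  # True True ['-foo']
```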
10.3 Options
============
`idn' recognizes these options:
  -h, --help               Print help and exit
  -V, --version            Print version and exit
  -s, --stringprep         Prepare string according to nameprep profile
  -d, --punycode-decode    Decode Punycode
  -e, --punycode-encode    Encode Punycode
  -a, --idna-to-ascii      Convert to ACE according to IDNA (default mode)
  -u, --idna-to-unicode    Convert from ACE according to IDNA
      --allow-unassigned   Toggle IDNA AllowUnassigned flag (default off)
      --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag (default off)
      --no-tld             Don't check string for TLD specific rules
                           Only for --idna-to-ascii and --idna-to-unicode
  -n, --nfkc               Normalize string according to Unicode v3.2 NFKC
  -p, --profile=STRING     Use specified stringprep profile instead
                           Valid stringprep profiles: `Nameprep',
                           `iSCSI', `Nodeprep', `Resourceprep',
                           `trace', `SASLprep'
      --debug              Print debugging information
      --quiet              Silent operation
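To give a feel for what the `--nfkc' option computes, here is a small
sketch using Python's `unicodedata.ucd_3_2_0' object, which exposes the
Unicode 3.2.0 database. It illustrates NFKC itself, not the `idn'
implementation; the sample characters are chosen arbitrarily.

```python
import unicodedata

# Normalization bound to the Unicode 3.2.0 character database.
nfkc_32 = unicodedata.ucd_3_2_0.normalize

ligature = nfkc_32("NFKC", "\ufb01")    # LATIN SMALL LIGATURE FI
circled = nfkc_32("NFKC", "\u2460")     # CIRCLED DIGIT ONE
composed = nfkc_32("NFKC", "a\u0308")   # "a" + COMBINING DIAERESIS

print(ligature)               # fi
print(circled)                # 1
print(composed == "\u00e4")   # True
```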
10.4 Environment Variables
==========================
The `CHARSET' environment variable can be used to override the character
set used for decoding incoming data (i.e., on the command line or
on the standard input stream) and for encoding data written to standard
output. If your system is set up correctly, however, the application
will detect the character set automatically. Example usage:
$ CHARSET=ISO-8859-1 idn --punycode-encode
...
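Why the charset matters can be seen by comparing the bytes of one
string in two encodings. The sketch below also mimics the override
rule described above; `effective_charset' is a hypothetical helper
written for this illustration, and Python's `locale' module stands in
for whatever mechanism `idn' uses to query the locale.

```python
import locale

def effective_charset(environ):
    """Return CHARSET from environ if set, else the locale's preferred charset."""
    return environ.get("CHARSET") or locale.getpreferredencoding(False)

text = "r\u00e4ksm\u00f6rg\u00e5s"
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

charset = effective_charset({"CHARSET": "ISO-8859-1"})
print(charset)                               # ISO-8859-1
print(len(latin1_bytes), len(utf8_bytes))    # 10 13
print(latin1_bytes.decode(charset) == text)  # True
```

The same ten characters occupy ten bytes in `ISO-8859-1' but thirteen
in `UTF-8', so the declared charset has to match the bytes actually
supplied.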
10.5 Examples
=============
Standard usage, reading input from standard input:
jas@latte:~$ idn
libidn 0.3.5
Copyright 2002, 2003 Simon Josefsson.
GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Libidn under the terms of
the GNU Lesser General Public License. For more information
about these matters, see the file named COPYING.LIB.
Type each input string on a line by itself, terminated by a newline character.
räksmörgås.se
xn--rksmrgs-5wao1o.se
jas@latte:~$
Reading input from the command line, and disabling copyright and license
information:
jas@latte:~$ idn --quiet räksmörgås.se blåbærgrød.no
xn--rksmrgs-5wao1o.se
xn--blbrgrd-fxak7p.no
jas@latte:~$
Accessing a specific StringPrep profile directly:
jas@latte:~$ idn --quiet --profile=SASLprep --stringprep teßtª
teßta
jas@latte:~$
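The mapping and normalization steps of a profile such as SASLprep can
be approximated with Python's `stringprep' tables. This is a partial
sketch under stated assumptions: it covers only the mapping and NFKC
steps of RFC 4013, omits the prohibited-character and bidirectional
checks, and `saslprep_map_and_normalize' is a hypothetical name, not a
Libidn function.

```python
import stringprep
import unicodedata

def saslprep_map_and_normalize(text):
    """Apply the SASLprep mapping step, then NFKC (Unicode 3.2). No checks."""
    out = []
    for ch in text:
        if stringprep.in_table_b1(ch):    # "commonly mapped to nothing"
            continue
        if stringprep.in_table_c12(ch):   # non-ASCII space becomes U+0020
            ch = " "
        out.append(ch)
    return unicodedata.ucd_3_2_0.normalize("NFKC", "".join(out))

print(saslprep_map_and_normalize("I\u00adX"))  # IX  (soft hyphen removed)
print(saslprep_map_and_normalize("\u00aa"))    # a   (ordinal indicator)
print(saslprep_map_and_normalize("a\u00a0b"))  # a b (no-break space)
```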
10.6 Troubleshooting
====================
Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult. The reason is that most
systems encode character data in more than one character encoding,
e.g., using `UTF-8' together with `ISO-8859-1' or `ISO-2022-JP'. This
problem is likely to persist until a single character
encoding emerges as the evolutionary winner, or (more likely, at least
to some extent) forever.
The first step in troubleshooting character encoding problems with
Libidn is to use the `--debug' parameter to find out which character
set encoding `idn' believes your locale uses.
jas@latte:~$ idn --debug --quiet ""
system locale uses charset `UTF-8'.
jas@latte:~$
If it prints `ANSI_X3.4-1968' (i.e., `US-ASCII'), this indicates that you
have not configured your locale properly. To configure the locale, you
can, for example, use `LANG=sv_SE.UTF-8; export LANG' at a `/bin/sh'
prompt to set up your locale for a Swedish environment using `UTF-8'
as the encoding.
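The following sketch shows why an `ANSI_X3.4-1968' locale is a problem
for internationalized names. It relies on Python's codec registry,
which treats that name as an alias for ASCII; it does not call `idn'.

```python
import codecs

charset = "ANSI_X3.4-1968"
canonical = codecs.lookup(charset).name
print(canonical)  # ascii

# A 7-bit charset cannot represent the non-ASCII letters of the name.
try:
    "r\u00e4ksm\u00f6rg\u00e5s".encode(charset)
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False
```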
Sometimes `idn' appears to be unable to translate from your system
locale into `UTF-8' (which is used internally), and you get an error
like the following:
jas@latte:~$ idn --quiet foo
idn: could not convert from ISO-8859-1 to UTF-8.
jas@latte:~$
The simplest explanation is that you haven't installed the `iconv'
conversion tools. `iconv' is available as a standalone library in GNU
Libiconv (`http://www.gnu.org/software/libiconv/'). On many GNU/Linux
systems, this library is part of the system, but you may have to
install additional packages (e.g., `glibc-locale' for Debian) to be
able to use it.
Another explanation is that the error is correct and you are feeding
`idn' invalid data. This can happen inadvertently if you are not
careful with the character set encodings you use. For example, if your
shell runs in an `ISO-8859-1' environment and you invoke `idn' with the
`CHARSET' environment variable as follows, you will feed it
`ISO-8859-1' characters but force it to believe they are `UTF-8'.
Naturally this will lead to an error, unless the byte sequences happen
to be parsable as `UTF-8'. Note that even if you don't get an error,
the output may be incorrect in this situation, because `ISO-8859-1' and
`UTF-8' do not in general encode the same characters as the same byte
sequences.
jas@latte:~$ idn --quiet --debug ""
system locale uses charset `ISO-8859-1'.
jas@latte:~$ CHARSET=UTF-8 idn --quiet --debug räksmörgås
system locale uses charset `UTF-8'.
input[0] = U+0072
input[1] = U+4af3
input[2] = U+006d
input[3] = U+1b29e5
input[4] = U+0073
output[0] = U+0078
output[1] = U+006e
output[2] = U+002d
output[3] = U+002d
output[4] = U+0072
output[5] = U+006d
output[6] = U+0073
output[7] = U+002d
output[8] = U+0068
output[9] = U+0069
output[10] = U+0036
output[11] = U+0064
output[12] = U+0035
output[13] = U+0039
output[14] = U+0037
output[15] = U+0035
output[16] = U+0035
output[17] = U+0032
output[18] = U+0061
xn--rms-hi6d597552a
jas@latte:~$
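The odd code points in this transcript can be reproduced by hand. The
sketch below assumes (this is inferred from the output above, not
stated anywhere in this manual) that the decoder trusted each `UTF-8'
lead byte and did not validate the continuation bytes that follow;
`lenient_utf8_decode' is a hypothetical helper written for this
illustration.

```python
from typing import List

def lenient_utf8_decode(data: bytes) -> List[int]:
    """Decode UTF-8 by lead byte only, without checking continuation bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b >> 5 == 0b110:        # lead byte of a 2-byte sequence
            n, cp = 1, b & 0x1F
        elif b >> 4 == 0b1110:     # lead byte of a 3-byte sequence
            n, cp = 2, b & 0x0F
        elif b >> 3 == 0b11110:    # lead byte of a 4-byte sequence
            n, cp = 3, b & 0x07
        else:                      # ASCII or stray byte: taken as is
            n, cp = 0, b
        for c in data[i + 1:i + 1 + n]:
            cp = (cp << 6) | (c & 0x3F)
        out.append(cp)
        i += 1 + n
    return out

latin1 = "r\u00e4ksm\u00f6rg\u00e5s".encode("iso-8859-1")
points = ["U+%04x" % cp for cp in lenient_utf8_decode(latin1)]
print(points)  # ['U+0072', 'U+4af3', 'U+006d', 'U+1b29e5', 'U+0073']

# A strict decoder rejects the same bytes instead of guessing.
try:
    latin1.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False
```

The byte 0xE4 (`ä' in `ISO-8859-1') looks like the start of a three-byte
sequence and swallows `ks', while 0xF6 (`ö') looks like the start of a
four-byte sequence and swallows `rg' and `å', which is how ten input
characters collapse into five.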
The moral here is to forget about `CHARSET' (configure your
locales properly instead) unless you know what you are doing. If
you do want to use it, do so carefully, after verifying with `--debug'
that you get the desired results.