(libidn.info.gz) On Label Separators

 
 Appendix B On Label Separators
 ******************************
 
 Some strings contain characters whose NFKC-normalized form contains the
 ASCII dot (0x2E, ".").  Examples of such characters are U+2024 (ONE
 DOT LEADER) and U+248C (DIGIT FIVE FULL STOP).  Such strings have the
 interesting property that their IDNA ToASCII output contains
 embedded dots.  For example:
 
      ToASCII (hi U+248C com) = hi5.com
      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
 
    These examples demonstrate the two general cases.  In the first, the
 ASCII dot is part of an output that does not begin with the IDN prefix
 `xn--'.  In the second, the dot is part of an output that does begin
 with `xn--'.
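
    The two cases can be reproduced with a short sketch.  It uses
 Python's standard library (`unicodedata' and `encodings.idna') rather
 than Libidn, on the assumption that its RFC 3490 ToASCII behaves like
 the one described here for these inputs.  The variable names are
 illustrative only.

```python
import unicodedata
from encodings.idna import ToASCII  # IDNA 2003 (RFC 3490), one label at a time

# Both characters normalize under NFKC to text that contains an ASCII dot.
assert unicodedata.normalize("NFKC", "\u2024") == "."
assert unicodedata.normalize("NFKC", "\u248c") == "5."

# Each input is a single label, yet the output carries an embedded dot.
plain = ToASCII("hi\u248ccom")                             # no `xn--' prefix
prefixed = ToASCII("r\u00e4ksm\u00f6rg\u00e5s\u2024com")   # `xn--' prefix

print(plain)
print(prefixed)
```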
 
    The input strings are, from the DNS point of view, single labels.
 The IDNA algorithm translates one label at a time, so each output is
 expected to be a single label as well.  What is important here is to
 make sure the DNS resolver receives the correct query.  The DNS
 protocol does not use the dot to delimit labels on the wire; rather, it
 uses length-value pairs.  Thus the correct queries would be for
 `{7}hi5.com' and `{22}xn--rksmrgs.com-l8as9u' respectively.
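
    To make the length-value encoding concrete, here is a minimal
 sketch of how labels are laid out on the wire.  The helper names
 `encode_label' and `encode_name' are hypothetical and not part of
 Libidn.  The sketch also appends the zero-length root label that ends
 every name on the wire, which the `{N}' notation above leaves out.

```python
def encode_label(label):
    """Encode one DNS label as a length octet followed by the label bytes."""
    if not 0 < len(label) < 64:
        raise ValueError("a DNS label must be 1 to 63 octets long")
    return bytes([len(label)]) + label


def encode_name(labels):
    """Concatenate the encoded labels and append the empty root label."""
    return b"".join(encode_label(label) for label in labels) + b"\x00"


# One 7-octet label whose content happens to include a dot ...
single = encode_name([b"hi5.com"])
# ... versus two labels, which is what splitting at the dot would send.
split = encode_name([b"hi5", b"com"])

print(single)
print(split)
```

    The two byte strings differ, so a resolver that splits at the
 embedded dot asks a different question than one that keeps the label
 whole.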
 
    Some implementations (1) have decided that these input strings are
 potentially confusing for the user.  The string `hi U+248C com' looks
 like `hi5.com' on systems that support Unicode properly.  These
 implementations do not follow RFC 3490; they treat the dot produced by
 normalization as a label separator.  They yield:
 
      ToASCII (hi U+248C com) = hi5.com
      ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
 
    The DNS queries they perform are for `{3}hi5{3}com' and
 `{18}xn--rksmrgs-5wao1o{3}com' respectively.  Arguably, this leads to a
 better user experience, and suggests that the IDNA specification is
 sub-optimal in this area.
 
 B.1 Recommended Workaround
 ==========================
 
 It has been suggested to normalize the entire input string using NFKC
 before passing it to IDNA ToASCII.  You may use
 `stringprep_utf8_nfkc_normalize' or `stringprep_ucs4_nfkc_normalize'
 for this.  This appears to lead to behaviour similar to that of
 IE/Firefox, which would avoid the problem, but this needs to be
 confirmed.  Feel free to discuss the issue with us.
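
    The idea can be sketched as follows.  This is not Libidn code: it
 uses Python's `unicodedata.normalize' in place of
 `stringprep_utf8_nfkc_normalize' and Python's `encodings.idna.ToASCII'
 in place of the `idna_*' functions.  The helper name
 `to_ascii_after_nfkc' is hypothetical, and splitting only at the ASCII
 dot is an assumption made to keep the example small.

```python
import unicodedata
from encodings.idna import ToASCII  # IDNA 2003 (RFC 3490), one label at a time


def to_ascii_after_nfkc(name):
    """Hypothetical helper: NFKC-normalize the whole name, then convert labels.

    Normalizing first turns characters such as U+2024 and U+248C into text
    containing ASCII dots, so those dots separate labels before ToASCII runs.
    Empty labels (for example from a trailing dot) are not handled here and
    make ToASCII raise an error.
    """
    normalized = unicodedata.normalize("NFKC", name)
    labels = normalized.split(".")
    return b".".join(ToASCII(label) for label in labels)


first = to_ascii_after_nfkc("hi\u248ccom")
second = to_ascii_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com")

print(first)
print(second)
```

    With this ordering, the second example is converted as the two
 labels `xn--rksmrgs-5wao1o' and `com' instead of one label with an
 embedded dot.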
 
    Alternative workarounds are being considered.  Eventually Libidn may
 add a new flag to the `idna_*' functions that implements a recommended
 way to work around this problem.
 
    ---------- Footnotes ----------
 
    (1) Notably Microsoft's Internet Explorer and Mozilla's Firefox, but
 not Apple's Safari.
 