(gawk.info.gz) POSIX Floating Point Problems

 
 D.3.3 Standards Versus Existing Practice
 ----------------------------------------
 
 Historically, `awk' has converted any non-numeric-looking string to the
 numeric value zero whenever a number is required.  Furthermore, the original
 definition of the language and the original POSIX standards specified
 that `awk' only understands decimal numbers (base 10), and not octal
 (base 8) or hexadecimal numbers (base 16).
 
    Changes in the language of the 2001 and 2004 POSIX standards can be
 interpreted to imply that `awk' should support additional features.
 These features are:
 
    * Interpretation of floating point data values specified in
      hexadecimal notation (`0xDEADBEEF'). (Note: data values, _not_
      source code constants.)
 
    * Support for the special IEEE 754 floating point values "Not A
      Number" (NaN), positive Infinity ("inf") and negative Infinity
      ("-inf").  In particular, the format for these values is as
       specified by the ISO 1999 C standard, which ignores case, allows
       machine-dependent additional characters after the `nan', and
       accepts either `inf' or `infinity'.
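 
    As a rough sketch of that format (not `gawk' code), the following
 Python pattern matches whole strings written in the special forms that
 the C99 `strtod()' function recognizes.  The names `SPECIAL' and
 `is_special' are made up for this illustration, and the pattern assumes
 that the additional characters after `nan' appear as a parenthesized
 run of letters, digits, and underscores, as C99 describes.
 
```python
import re

# Optional sign, then INF or INFINITY, or NAN optionally followed by a
# parenthesized run of letters, digits, and underscores.  Case is ignored.
SPECIAL = re.compile(
    r"[+-]?(?:inf(?:inity)?|nan(?:\([0-9A-Za-z_]*\))?)",
    re.IGNORECASE,
)


def is_special(text):
    """Return True if the whole string is one of the special forms."""
    return SPECIAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["NaN", "-Infinity", "nan(0x7ff)", "nanny", "infinit"]:
        print(sample, is_special(sample))
```
 
    Note that `strtod()' itself converts the longest valid _prefix_ of
 its input, so a string such as `nanny' fails this whole-string test
 even though `strtod()' would still consume its leading `nan'.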
 
    The first problem is that both of these features are clear changes to
 historical practice:
 
    * The `gawk' maintainer feels that supporting hexadecimal floating
      point values, in particular, is ugly, and was never intended by the
      original designers to be part of the language.
 
    * Allowing completely alphabetic strings to have valid numeric
      values is also a very severe departure from historical practice.
 
    The second problem is that the `gawk' maintainer feels that this
 interpretation of the standard, which requires a certain amount of
 "language lawyering" to arrive at in the first place, was not even
 intended by the standard developers.  In other words, "we see how you
 got where you are, but we don't think that that's where you want to be."
 
    The 2008 POSIX standard added explicit wording to allow, but not
 require, that `awk' support hexadecimal floating point values and
 special values for "Not A Number" and infinity.
 
    Although the `gawk' maintainer continues to feel that providing
 those features is inadvisable, on systems that support IEEE floating
 point it seems reasonable to provide _some_ way to support NaN and
 Infinity values.  The solution implemented in `gawk' is as follows:
 
    * With the `--posix' command-line option, `gawk' becomes "hands
      off." String values are passed directly to the system library's
      `strtod()' function, and if it successfully returns a numeric
      value, that is what's used.(1) By definition, the results are not
      portable across different systems.  They are also a little
      surprising:
 
           $ echo nanny | gawk --posix '{ print $1 + 0 }'
           -| nan
           $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }'
           -| 3735928559
 
    * Without `--posix', `gawk' interprets the four strings `+inf',
      `-inf', `+nan', and `-nan' specially, producing the corresponding
      special numeric values.  The leading sign acts as a signal to `gawk'
      (and the user) that the value is really numeric.  Hexadecimal
      floating point is not supported (unless you also use
      `--non-decimal-data', which is _not_ recommended). For example:
 
           $ echo nanny | gawk '{ print $1 + 0 }'
           -| 0
           $ echo +nan | gawk '{ print $1 + 0 }'
           -| nan
           $ echo 0xDeadBeef | gawk '{ print $1 + 0 }'
           -| 0
 
      `gawk' does ignore case in the four special values.  Thus `+nan'
      and `+NaN' are the same.
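 
    The contrast between the two modes can be sketched outside of
 `gawk'.  The Python below is only a model, not `gawk''s implementation.
 `posix_mode' and `default_mode' are hypothetical names.  The first
 calls the C library's `strtod()' through `ctypes', which assumes a
 POSIX-like system whose `strtod()' follows C99.  The second assumes
 that strings other than the four special ones contribute their leading
 decimal prefix, or zero if there is none.
 
```python
import ctypes
import ctypes.util
import math
import re

# Load the C library.  Assumption: a POSIX-like system.  If find_library()
# returns None, CDLL(None) still exposes the libc symbols of the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.strtod.restype = ctypes.c_double
_libc.strtod.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p)]

# The four special strings named in the text, compared without regard to case.
_MAGIC = {
    "+inf": math.inf,
    "-inf": -math.inf,
    "+nan": math.nan,
    "-nan": -math.nan,
}

# Assumed shape of a leading decimal number: sign, digits, fraction, exponent.
_DECIMAL = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def posix_mode(text):
    """Model of `--posix': pass the string to strtod() and use its result."""
    end = ctypes.c_char_p()
    return _libc.strtod(text.encode(), ctypes.byref(end))


def default_mode(text):
    """Model of the default: four signed special strings, else decimal only."""
    magic = _MAGIC.get(text.lower())
    if magic is not None:
        return magic
    match = _DECIMAL.match(text)
    return float(match.group()) if match else 0.0


if __name__ == "__main__":
    for sample in ["nanny", "+nan", "0xDeadBeef"]:
        print(sample, posix_mode(sample), default_mode(sample))
```
 
    On a system with a C99 `strtod()', the `posix_mode' results should
 mirror the `--posix' transcript above and the `default_mode' results
 the transcript without it.  Python prints the numbers as floats, so the
 formatting differs from `gawk''s output.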
 
    ---------- Footnotes ----------
 
    (1) You asked for it, you got it.
 