(libc.info.gz) Rounding

Info Catalog (libc.info.gz) Floating Point Errors (libc.info.gz) Arithmetic (libc.info.gz) Control Functions
 
 20.6 Rounding Modes
 ===================
 
 Floating-point calculations are carried out internally with extra
 precision, and then rounded to fit into the destination type.  This
 ensures that results are as precise as the input data.  IEEE 754 defines
 four possible rounding modes:
 
 Round to nearest.
      This is the default mode.  It should be used unless there is a
      specific need for one of the others.  In this mode results are
      rounded to the nearest representable value.  If the result is
      midway between two representable values, the even representable is
      chosen.  "Even" here means the lowest-order bit is zero.  This
      rounding mode prevents statistical bias and guarantees numeric
      stability: round-off errors in a lengthy calculation will remain
      smaller than half of 'FLT_EPSILON'.
 
 Round toward plus Infinity.
      All results are rounded to the smallest representable value which
      is greater than the result.
 
 Round toward minus Infinity.
      All results are rounded to the largest representable value which is
      less than the result.
 
 Round toward zero.
      All results are rounded to the largest representable value whose
      magnitude is less than that of the result.  In other words, if the
      result is negative it is rounded up; if it is positive, it is
      rounded down.
 
 'fenv.h' defines constants which you can use to refer to the various
 rounding modes.  Each one will be defined if and only if the FPU
 supports the corresponding rounding mode.
 
 'FE_TONEAREST'
      Round to nearest.
 
 'FE_UPWARD'
      Round toward +oo.
 
 'FE_DOWNWARD'
      Round toward -oo.
 
 'FE_TOWARDZERO'
      Round toward zero.
 
    Underflow is an unusual case.  Normally, IEEE 754 floating point
 numbers are always normalized ( Floating Point Concepts).
 Numbers smaller than 2^r (where r is the minimum exponent,
 'FLT_MIN_RADIX-1' for FLOAT) cannot be represented as normalized
 numbers.  Rounding all such numbers to zero or 2^r would cause some
 algorithms to fail at 0.  Therefore, they are left in denormalized form.
 That produces loss of precision, since some bits of the mantissa are
 stolen to indicate the decimal point.
 
    If a result is too small to be represented as a denormalized number,
 it is rounded to zero.  However, the sign of the result is preserved; if
 the calculation was negative, the result is "negative zero".  Negative
 zero can also result from some operations on infinity, such as 4/-oo.
 
    At any time one of the above four rounding modes is selected.  You
 can find out which one with this function:
 
  -- Function: int fegetround (void)
      Preliminary: | MT-Safe | AS-Safe | AC-Safe |  POSIX Safety
      Concepts.
 
      Returns the currently selected rounding mode, represented by one of
      the values of the defined rounding mode macros.
 
 To change the rounding mode, use this function:
 
  -- Function: int fesetround (int ROUND)
      Preliminary: | MT-Safe | AS-Safe | AC-Safe |  POSIX Safety
      Concepts.
 
      Changes the currently selected rounding mode to ROUND.  If ROUND
      does not correspond to one of the supported rounding modes nothing
      is changed.  'fesetround' returns zero if it changed the rounding
      mode, a nonzero value if the mode is not supported.
 
    You should avoid changing the rounding mode if possible.  It can be
 an expensive operation; also, some hardware requires you to compile your
 program differently for it to work.  The resulting code may run slower.
 See your compiler documentation for details.
 
Info Catalog (libc.info.gz) Floating Point Errors (libc.info.gz) Arithmetic (libc.info.gz) Control Functions
automatically generated by info2html