(gawk.info.gz) Conversion
Info Catalog
(gawk.info.gz) Variables
(gawk.info.gz) Values
6.1.4 Conversion of Strings and Numbers
---------------------------------------
Strings are converted to numbers and numbers are converted to strings,
if the context of the `awk' program demands it. For example, if the
value of either `foo' or `bar' in the expression `foo + bar' happens to
be a string, it is converted to a number before the addition is
performed. If numeric values appear in string concatenation, they are
converted to strings. Consider the following:
     two = 2; three = 3
     print (two three) + 4
This prints the (numeric) value 27. The numeric values of the
variables `two' and `three' are converted to strings and concatenated
together. The resulting string is converted back to the number 23, to
which 4 is then added.
If, for some reason, you need to force a number to be converted to a
string, concatenate that number with the empty string, `""'. To force
a string to be converted to a number, add zero to that string. A
string is converted to a number by interpreting any numeric prefix of
the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to
1000, and `"25fix"' has a numeric value of 25. Strings that can't be
interpreted as valid numbers convert to zero.
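The conversion rules above can be checked with a short session. This is a minimal sketch, not part of the original manual: it assumes a POSIX `awk' is on the search path (`gawk' is expected to behave the same way here), and the shell variable name `force_out' is made up for the illustration.

```shell
# force_out is a hypothetical variable name used only for this demo.
force_out=$(awk 'BEGIN {
    print "2.5" + 0        # the whole string is numeric: 2.5
    print "1e3" + 0        # exponent notation is understood: 1000
    print "25fix" + 0      # only the leading numeric prefix counts: 25
    print "fix25" + 0      # no numeric prefix at all: 0
    # Concatenating "" forces both operands to be strings, so the
    # comparison is done as text and "10" sorts before "9".
    print ((10 "") < (9 ""))
}')
echo "$force_out"
```

The last line prints 1 (true), which would not be the case if the two values were still being compared as numbers.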
The exact manner in which numbers are converted into strings is
controlled by the `awk' built-in variable `CONVFMT' (see Built-in
Variables). Numbers are converted using the `sprintf()' function
with `CONVFMT' as the format specifier (see String Functions).
`CONVFMT''s default value is `"%.6g"', which prints a value with at
most six significant digits. For some applications, you might want to
change it to specify more precision. On most modern machines, 17
digits is usually enough to capture a floating-point number's value
exactly.(1)
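As a rough illustration of the effect of `CONVFMT' on precision, the sketch below converts the same value to a string twice, once with the default format and once with 17 significant digits. It assumes a POSIX `awk' using IEEE double-precision arithmetic; the names `x', `y', and `prec_out' are chosen only for the example.

```shell
# prec_out is a hypothetical variable name used only for this demo.
prec_out=$(awk 'BEGIN {
    x = 0.1
    a = x ""               # converted with the default CONVFMT, "%.6g"
    CONVFMT = "%.17g"      # ask for 17 significant digits
    y = 0.1
    b = y ""               # converted with the new CONVFMT
    print a
    print b
}')
echo "$prec_out"
```

With the default format the string is `0.1'; with 17 digits the stored binary approximation becomes visible as `0.10000000000000001'.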
Strange results can occur if you set `CONVFMT' to a string that
doesn't tell `sprintf()' how to format floating-point numbers in a
useful way. For example, if you forget the `%' in the format, `awk'
converts all numbers to the same constant string.
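The following sketch shows what the missing `%' does in practice. It assumes a POSIX `awk' run without lint checking (some implementations may warn about such a format when asked to); the names `x', `y', and `bad_out' are illustrative only.

```shell
# bad_out is a hypothetical variable name used only for this demo.
bad_out=$(awk 'BEGIN {
    CONVFMT = "2.2f"       # the leading % has been forgotten
    x = 3.14
    y = 42.5
    print (x "")           # the format holds no conversion, so the
    print (y "")           # literal text is produced for every value
}')
echo "$bad_out"
```

Both lines come out as the constant string `2.2f', regardless of the number being converted.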
As a special case, if a number is an integer, then the result of
converting it to a string is _always_ an integer, no matter what the
value of `CONVFMT' may be. Given the following code fragment:
     CONVFMT = "%2.2f"
     a = 12
     b = a ""
`b' has the value `"12"', not `"12.00"'. (d.c.)
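To see the special case next to the ordinary one, the fragment can be run alongside a nonintegral value. This is a sketch assuming a POSIX `awk'; the extra variables `x' and `c' and the shell variable `int_out' are additions made for the illustration.

```shell
# int_out is a hypothetical variable name used only for this demo.
int_out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""               # integral value: CONVFMT is not applied
    x = 12.5
    c = x ""               # nonintegral value: CONVFMT is applied
    print b
    print c
}')
echo "$int_out"
```

The integer is converted to `12', while 12.5 is converted to `12.50'.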
Prior to the POSIX standard, `awk' used the value of `OFMT' for
converting numbers to strings. `OFMT' specifies the output format to
use when printing numbers with `print'. `CONVFMT' was introduced in
order to separate the semantics of conversion from the semantics of
printing. Both `CONVFMT' and `OFMT' have the same default value:
`"%.6g"'. In the vast majority of cases, old `awk' programs do not
change their behavior. However, these semantics for `OFMT' are
something to keep in mind if you must port your new-style program to
older implementations of `awk'. Rather than changing your programs, we
recommend that you simply port `gawk' itself. See Print, for more
information on the `print' statement.
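The separation between the two variables can be observed by giving them different values. The sketch below assumes an `awk' that follows the POSIX rules; the formats and the names `x', `s', and `fmt_out' are picked only for the demonstration.

```shell
# fmt_out is a hypothetical variable name used only for this demo.
fmt_out=$(awk 'BEGIN {
    OFMT = "%.2f"          # used when print outputs a number
    CONVFMT = "%.4f"       # used when a number is converted to a string
    x = 3.14159265
    print x                # a number is printed: OFMT applies
    s = x ""               # conversion by concatenation: CONVFMT applies
    print s                # a string is printed as is
}')
echo "$fmt_out"
```

The first line of output is `3.14' and the second is `3.1416'. An implementation that predates `CONVFMT' would be expected to use `OFMT' for both.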
And, once again, where you are can matter when it comes to converting
between numbers and strings. In Locales, we mentioned that the
local character set and language (the locale) can affect how `gawk'
matches characters. The locale also affects numeric formats. In
particular, for `awk' programs, it affects the decimal point character.
The `"C"' locale, and most English-language locales, use the period
character (`.') as the decimal point. However, many (if not most)
European and non-English locales use the comma (`,') as the decimal
point character.
The POSIX standard says that `awk' always uses the period as the
decimal point when reading the `awk' program source code, and for
command-line variable assignments (see Other Arguments). However,
when interpreting input data, for `print' and `printf' output, and for
number-to-string conversion, the local decimal point character is used.
Here are some examples indicating the difference in behavior, on a
GNU/Linux system, when `gawk' uses the locale's decimal point character
as the standard specifies:
     $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3.14159
     $ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3,14159
     $ echo 4,321 | gawk '{ print $1 + 1 }'
     -| 5
     $ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
     -| 5,321
The `en_DK' locale is for English in Denmark, where the comma acts as
the decimal point separator. In the normal `"C"' locale, `gawk' treats
`4,321' as `4', while in the Danish locale, it's treated as the full
number, 4.321.
Some earlier versions of `gawk' fully complied with this aspect of
the standard. However, many users in non-English locales complained
about this behavior, since their data used a period as the decimal
point, so the default behavior was restored to use a period as the
decimal point character. You can use the `--use-lc-numeric' option
(see Options) to force `gawk' to use the locale's decimal point
character. (`gawk' also uses the locale's decimal point character when
in POSIX mode, either via `--posix', or the `POSIXLY_CORRECT'
environment variable.)
Table 6.1 describes the cases in which the locale's decimal point
character is used and when a period is used. Some of these features
have not been described yet.
Feature        Default       `--posix' or `--use-lc-numeric'
------------------------------------------------------------
`%'g'          Use locale    Use locale
`%g'           Use period    Use locale
Input          Use period    Use locale
`strtonum()'   Use period    Use locale
Table 6.1: Locale Decimal Point versus A Period
Finally, modern-day formal standards and IEEE standard floating-point
representation can have an unusual but important effect on the way
`gawk' converts some special string values to numbers. The details are
presented in POSIX Floating Point Problems.
---------- Footnotes ----------
(1) Pathological cases can require up to 752 digits (!), but we
doubt that you need to worry about this.