(gawk.info.gz) Field Splitting Summary
4.5.5 Field-Splitting Summary
-----------------------------
It is important to remember that when you assign a string constant as
the value of `FS', it undergoes normal `awk' string processing. For
example, with Unix `awk' and `gawk', the assignment `FS = "\.."'
assigns the character string `".."' to `FS' (the backslash is
stripped). This creates a regexp meaning "fields are separated by
occurrences of any two characters." If instead you want fields to be
separated by a literal period followed by any single character, use
`FS = "\\.."'.
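As a rough illustration of the escaping rule above, the following shell sketch runs two made-up sample records (not taken from the original text) through `awk'. The variable names `escaped' and `any_two' are hypothetical and exist only to capture the output.

```shell
# FS = "\\.." reaches the regexp engine as \.. : a literal period followed
# by any single character.
escaped=$(printf 'one.Xtwo.Ythree\n' |
    awk 'BEGIN { FS = "\\.." } { print $1 "|" $2 "|" $3 }')
echo "$escaped"     # one|two|three

# FS = ".." is what is left once the single backslash has been stripped:
# any two characters act as a separator.
any_two=$(printf 'abcde\n' | awk 'BEGIN { FS = ".." } { print NF ": " $NF }')
echo "$any_two"     # 3: e
```

In the second command the separators `ab' and `cd' leave two empty fields in front of the final `e', which is why `NF' is 3.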
The following table summarizes how fields are split, based on the
value of `FS' (`==' means "is equal to"):
`FS == " "'
     Fields are separated by runs of whitespace. Leading and trailing
     whitespace are ignored. This is the default.
`FS == ANY OTHER SINGLE CHARACTER'
     Fields are separated by each occurrence of the character. Multiple
     successive occurrences delimit empty fields, as do leading and
     trailing occurrences. The character can even be a regexp
     metacharacter; it does not need to be escaped.
`FS == REGEXP'
     Fields are separated by occurrences of characters that match
     REGEXP. Leading and trailing matches of REGEXP delimit empty
     fields.
`FS == ""'
     Each individual character in the record becomes a separate field.
     (This is a `gawk' extension; it is not specified by the POSIX
     standard.)
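The cases in the table can be tried from the shell. This is only a sketch: the sample records and the variable names (`default_nf', `colon_nf', `pipe_field', `regexp_nf') are invented for illustration, and the last command is skipped when `gawk' is not installed because an empty `FS' is a `gawk' extension.

```shell
# FS == " " (default): runs of whitespace separate fields, and leading
# and trailing whitespace is ignored.
default_nf=$(printf '  a   b  \n' | awk '{ print NF }')
echo "$default_nf"      # 2

# FS == single character: every occurrence delimits a field, so leading,
# trailing, and doubled separators produce empty fields.
colon_nf=$(printf ':a::b:\n' | awk 'BEGIN { FS = ":" } { print NF }')
echo "$colon_nf"        # 5

# A single regexp metacharacter is taken literally; no escaping is needed.
pipe_field=$(printf 'a|b|c\n' | awk 'BEGIN { FS = "|" } { print $2 }')
echo "$pipe_field"      # b

# FS == regexp: each match of the regexp is one separator.
regexp_nf=$(printf 'a12b345c\n' | awk 'BEGIN { FS = "[0-9]+" } { print NF }')
echo "$regexp_nf"       # 3

# FS == "" is a gawk extension, so only try it when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    printf 'abc\n' | gawk 'BEGIN { FS = "" } { print NF }'    # 3
fi
```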
Advanced Notes: Changing `FS' Does Not Affect the Fields
--------------------------------------------------------
According to the POSIX standard, `awk' is supposed to behave as if each
record is split into fields at the time it is read. In particular,
this means that if you change the value of `FS' after a record is read,
the value of the fields (i.e., how they were split) should reflect the
old value of `FS', not the new one.
However, many older implementations of `awk' do not work this way.
Instead, they defer splitting the fields until a field is actually
referenced. The fields are split using the _current_ value of `FS'!
(d.c.) This behavior can be difficult to diagnose. The following
example illustrates the difference between the two methods. (The
`sed'(1) command prints just the first line of `/etc/passwd'.)
     sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
which usually prints:
     root
on an incorrect implementation of `awk', while `gawk' prints something
like:
     root:nSijPlPhZZwgE:0:0:Root:/:
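One way to avoid depending on when the split happens is to set `FS' before the first record is read. The sketch below uses made-up records in place of `/etc/passwd', and the variable names `begin_out' and `option_out' are hypothetical. The last command is shown without an expected result, because its output depends on whether the `awk' in use follows the POSIX rule described above.

```shell
# Set FS in a BEGIN rule so it is in place before the first record is read.
begin_out=$(printf 'root:x:0:0\n' | awk 'BEGIN { FS = ":" } { print $1 }')
echo "$begin_out"   # root

# The -F command-line option has the same effect.
option_out=$(printf 'root:x:0:0\n' | awk -F: '{ print $1 }')
echo "$option_out"  # root

# Assigning FS inside a rule is meant to affect only the records read
# afterwards; compare the first and second output lines on your awk.
printf 'root:x:0:0\ndaemon:x:1:1\n' | awk '{ FS = ":" ; print $1 }'
```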
Advanced Notes: `FS' and `IGNORECASE'
-------------------------------------
The `IGNORECASE' variable (see User-modified) affects field
splitting _only_ when the value of `FS' is a regexp. It has no effect
when `FS' is a single character, even if that character is a letter.
Thus, in the following code:
     FS = "c"
     IGNORECASE = 1
     $0 = "aCa"
     print $1
The output is `aCa'. If you really want to split fields on an
alphabetic character while ignoring case, use a regexp that does it
for you, e.g., `FS = "[c]"'. In this case, `IGNORECASE' takes
effect.
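The regexp form can be checked from the shell, as sketched below. `IGNORECASE' is specific to `gawk', so that command is skipped when `gawk' is not installed. The first command is an alternative that is not part of the original text: it assumes you are willing to list both cases in a bracket expression, which works in any POSIX `awk'. The variable name `portable' is hypothetical.

```shell
# Portable alternative (an assumption, not from the manual text): list both
# cases explicitly in a bracket expression.
portable=$(awk 'BEGIN { FS = "[cC]"; $0 = "aCa"; print $1 }')
echo "$portable"    # a

# gawk only: a bracket expression makes FS a regexp, so IGNORECASE applies
# and the uppercase C is treated as a separator.
if command -v gawk >/dev/null 2>&1; then
    gawk 'BEGIN { FS = "[c]"; IGNORECASE = 1; $0 = "aCa"; print $1 }'    # a
fi
```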
---------- Footnotes ----------
(1) The `sed' utility is a "stream editor." Its behavior is also
defined by the POSIX standard.