 
 4.5.5 Field-Splitting Summary
 -----------------------------
 
 It is important to remember that when you assign a string constant as
 the value of `FS', it undergoes normal `awk' string processing.  For
 example, with Unix `awk' and `gawk', the assignment `FS = "\.."'
 assigns the character string `".."' to `FS' (the backslash is
 stripped).  This creates a regexp meaning "fields are separated by
 occurrences of any two characters."  If instead you want fields to be
 separated by a literal period followed by any single character, use `FS
 = "\\.."'.
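 The difference is easy to see from the shell.  The following is a
 small sketch that works with any POSIX `awk'; the record `one.two.three'
 is made up for illustration.  It assigns `".."' directly rather than
 `"\.."', so the result does not depend on how a particular `awk'
 treats the escape sequence `\.'.

```shell
# FS = ".." (what "\.." becomes after string processing) is a regexp
# matching any two characters.  Every pair of characters is a separator,
# so only the leftover final character survives as a nonempty field.
echo 'one.two.three' | awk 'BEGIN { FS = ".." } { print NF ": [" $NF "]" }'
# prints: 7: [e]

# FS = "\\.." reaches the regexp matcher as \.. , that is, a literal
# period followed by any single character.
echo 'one.two.three' | awk 'BEGIN { FS = "\\.." } { print NF ": " $1 "|" $2 "|" $3 }'
# prints: 3: one|wo|hree
```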
 
    The following table summarizes how fields are split, based on the
 value of `FS' (`==' means "is equal to"):
 
 `FS == " "'
      Fields are separated by runs of whitespace.  Leading and trailing
      whitespace are ignored.  This is the default.
 
 `FS == ANY OTHER SINGLE CHARACTER'
      Fields are separated by each occurrence of the character.  Multiple
      successive occurrences delimit empty fields, as do leading and
      trailing occurrences.  The character can even be a regexp
      metacharacter; it does not need to be escaped.
 
 `FS == REGEXP'
      Fields are separated by occurrences of characters that match
      REGEXP.  Leading and trailing matches of REGEXP delimit empty
      fields.
 
 `FS == ""'
      Each individual character in the record becomes a separate field.
      (This is a `gawk' extension; it is not specified by the POSIX
      standard.)
 
 Advanced Notes: Changing `FS' Does Not Affect the Fields
 --------------------------------------------------------
 
 According to the POSIX standard, `awk' is supposed to behave as if each
 record is split into fields at the time it is read.  In particular,
 this means that if you change the value of `FS' after a record is read,
 the value of the fields (i.e., how they were split) should reflect the
 old value of `FS', not the new one.
 
    However, many older implementations of `awk' do not work this way.
 Instead, they defer splitting the fields until a field is actually
 referenced.  The fields are split using the _current_ value of `FS'!
 (d.c.)  This behavior can be difficult to diagnose. The following
 example illustrates the difference between the two methods.  (The
 `sed'(1) command prints just the first line of `/etc/passwd'.)
 
      sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
 
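 On an incorrect implementation of `awk', this usually prints:

      root

 By contrast, `gawk' has already split the record using the old value of
 `FS' (whitespace).  It therefore prints the first whitespace-separated
 field, which here is the entire line, something like: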
 
      root:nSijPlPhZZwgE:0:0:Root:/:
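 To get the intended result in any conforming `awk', set `FS' before the
 record is read, or re-split the current record after changing `FS'.
 The following is a minimal sketch.  The sample record stands in for a
 line of `/etc/passwd' and is made up for illustration.

```shell
# Sample record standing in for the first line of /etc/passwd.
rec='root:x:0:0:Root:/:'

# Set FS before any record is read, in a BEGIN rule ...
echo "$rec" | awk 'BEGIN { FS = ":" } { print $1 }'     # root

# ... or with the -F option on the command line.
echo "$rec" | awk -F: '{ print $1 }'                    # root

# If FS must change inside a rule, assigning to $0 re-splits the
# current record using the new value of FS.
echo "$rec" | awk '{ FS = ":"; $0 = $0; print $1 }'     # root
```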
 
 Advanced Notes: `FS' and `IGNORECASE'
 -------------------------------------
 
 The `IGNORECASE' variable (see User-modified) affects field
 splitting _only_ when the value of `FS' is a regexp.  It has no effect
 when `FS' is a single character, even if that character is a letter.
 Thus, in the following code:
 
      BEGIN {
          FS = "c"
          IGNORECASE = 1
          $0 = "aCa"
          print $1
      }
 
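 The output is `aCa', because the single character `c' in `FS' is
 matched literally and does not match the `C' in the record.  If you
 really want to split fields on an alphabetic character while ignoring
 case, use a regexp that will do it for you, for example `FS = "[c]"'.
 In this case, `IGNORECASE' takes effect.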
 
    ---------- Footnotes ----------
 
    (1) The `sed' utility is a "stream editor."  Its behavior is also
 defined by the POSIX standard.