(gawk.info.gz) Case-sensitivity

 
 3.6 Case Sensitivity in Matching
 ================================
 
 Case is normally significant in regular expressions, both when matching
 ordinary characters (i.e., not metacharacters) and inside bracket
 expressions.  Thus, a `w' in a regular expression matches only a
 lowercase `w' and not an uppercase `W'.
 
    The simplest way to do a case-independent match is to use a bracket
 expression--for example, `[Ww]'.  However, this can be cumbersome if
 you need to use it often, and it can make the regular expressions harder
 to read.  There are two alternatives that you might prefer.
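 
    As a minimal sketch of the bracket-expression approach, the following
 command counts input lines that contain `wednesday' with either case of
 the first letter.  The sample input and the shell variable `count' are
 made up for illustration.  Note that only the first letter is covered;
 a fully case-independent pattern would need a bracket expression for
 every letter, which is where this approach becomes cumbersome.

```shell
# Count lines containing "Wednesday" or "wednesday" (first letter only).
count=$(printf 'Wednesday\nwednesday\nTuesday\n' |
        awk '/[Ww]ednesday/ { n++ } END { print n + 0 }')
echo "$count"
```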
 
    One way to perform a case-insensitive match at a particular point in
 the program is to convert the data to a single case, using the
 `tolower()' or `toupper()' built-in string functions (which we haven't
 discussed yet; see String Functions).  For example:
 
      tolower($1) ~ /foo/  { ... }
 
 converts the first field to lowercase before matching against it.  This
 works in any POSIX-compliant `awk'.
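 
    A runnable version of the rule above might look like the following.
 The three input lines and the shell variable `matches' are invented for
 the example; only the first field is folded to lowercase, so a `foo' in
 a later field does not count.

```shell
# Print the first field of each line whose first field contains "foo"
# in any mixture of upper- and lowercase.
matches=$(printf 'Foo bar\nFOOD court\nbaz foo\n' |
          awk 'tolower($1) ~ /foo/ { print $1 }')
printf '%s\n' "$matches"
```

 The original record is left unchanged; `tolower()' returns a converted
 copy, so `$1' is still printed with its original case.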
 
    Another method, specific to `gawk', is to set the variable
 `IGNORECASE' to a nonzero value (see Built-in Variables).  When
 `IGNORECASE' is not zero, _all_ regexp and string operations ignore
 case.  Changing the value of `IGNORECASE' dynamically controls the
 case-sensitivity of the program as it runs.  Case is significant by
 default because `IGNORECASE' (like most variables) is initialized to
 zero:
 
      x = "aB"
      if (x ~ /ab/) ...   # this test will fail
 
      IGNORECASE = 1
      if (x ~ /ab/) ...   # now it will succeed
 
    In general, you cannot use `IGNORECASE' to make certain rules
 case-insensitive and other rules case-sensitive, because there is no
 straightforward way to set `IGNORECASE' just for the pattern of a
 particular rule.(1) To do this, use either bracket expressions or
 `tolower()'.  However, one thing you can do only with `IGNORECASE' is
 dynamically turn case-sensitivity on or off for all the rules at once.
 
    `IGNORECASE' can be set on the command line or in a `BEGIN' rule
 (see Other Arguments; also see Using BEGIN/END).  Setting
 `IGNORECASE' from the command line is a way to make a program
 case-insensitive without having to edit it.
 
    Both regexp and string comparison operations are affected by
 `IGNORECASE'.
 
    In multibyte locales, the equivalences between upper- and lowercase
 characters are tested based on the wide-character values of the
 locale's character set.  Otherwise, the characters are tested based on
 the ISO-8859-1 (ISO Latin-1) character set.  This character set is a
 superset of the traditional 128 ASCII characters, which also provides a
 number of characters suitable for use with European languages.(2)
 
    The value of `IGNORECASE' has no effect if `gawk' is in
 compatibility mode (see Options).  Case is always significant in
 compatibility mode.
 
    ---------- Footnotes ----------
 
    (1) Experienced C and C++ programmers will note that it is possible,
 using something like `IGNORECASE = 1 && /foObAr/ { ... }' and
 `IGNORECASE = 0 || /foobar/ { ... }'.  However, this is somewhat
 obscure and we don't recommend it.
 
    (2) If you don't understand this, don't worry about it; it just
 means that `gawk' does the right thing.
 