(gawk.info.gz) Escape Sequences

 
 3.2 Escape Sequences
 ====================
 
 Some characters cannot be included literally in string constants
 (`"foo"') or regexp constants (`/foo/').  Instead, they should be
 represented with "escape sequences", which are character sequences
 beginning with a backslash (`\').  One use of an escape sequence is to
 include a double-quote character in a string constant.  Because a plain
 double quote ends the string, you must use `\"' to represent an actual
 double-quote character as a part of the string.  For example:
 
      $ awk 'BEGIN { print "He said \"hi!\" to her." }'
      -| He said "hi!" to her.
 
    The backslash character itself is another character that cannot be
 included normally; you must write `\\' to put one backslash in the
 string or regexp.  Thus, the string whose contents are the two
 characters `"' and `\' must be written `"\"\\"'.
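 
    As a quick check of the rule above, the following sketch builds the
 two-character string just described and prints it together with its
 length.  The variable names `out' and `s' are only illustrative, and
 the example assumes a POSIX-style `awk' is available on the search path:

```shell
# Build the string whose contents are the two characters " and \,
# then print the string and its length.
out=$(awk 'BEGIN { s = "\"\\"; print s; print length(s) }')
printf '%s\n' "$out"
# prints:
#   "\
#   2
```

    The length of 2 confirms that each escape sequence contributes a
 single character to the string.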
 
    Other escape sequences represent unprintable characters such as TAB
 or newline.  While there is nothing to stop you from entering most
 unprintable characters directly in a string constant or regexp constant,
 they are hard to see in the program text and easy to get wrong.
 
    The following table lists all the escape sequences used in `awk' and
 what they represent. Unless noted otherwise, all these escape sequences
 apply to both string constants and regexp constants:
 
 `\\'
      A literal backslash, `\'.
 
 `\a'
      The "alert" character, `Ctrl-g', ASCII code 7 (BEL).  (This
      usually makes some sort of audible noise.)
 
 `\b'
      Backspace, `Ctrl-h', ASCII code 8 (BS).
 
 `\f'
      Formfeed, `Ctrl-l', ASCII code 12 (FF).
 
 `\n'
      Newline, `Ctrl-j', ASCII code 10 (LF).
 
 `\r'
      Carriage return, `Ctrl-m', ASCII code 13 (CR).
 
 `\t'
      Horizontal TAB, `Ctrl-i', ASCII code 9 (HT).
 
 `\v'
      Vertical tab, `Ctrl-k', ASCII code 11 (VT).
 
 `\NNN'
      The octal value NNN, where NNN stands for 1 to 3 digits between
      `0' and `7'.  For example, the code for the ASCII ESC (escape)
      character is `\033'.
 
 `\xHH...'
      The hexadecimal value HH, where HH stands for a sequence of
      hexadecimal digits (`0'-`9', and either `A'-`F' or `a'-`f').  Like
      the same construct in ISO C, the escape sequence continues until
      the first nonhexadecimal digit is seen. (c.e.)  However, using
      more than two hexadecimal digits produces undefined results. (The
      `\x' escape sequence is not allowed in POSIX `awk'.)
 
 `\/'
      A literal slash (necessary for regexp constants only).  This
      sequence is used when you want to write a regexp constant that
      contains a slash. Because the regexp is delimited by slashes, you
      need to escape the slash that is part of the pattern, in order to
      tell `awk' to keep processing the rest of the regexp.
 
 `\"'
      A literal double quote (necessary for string constants only).
      This sequence is used when you want to write a string constant
      that contains a double quote. Because the string is delimited by
      double quotes, you need to escape the quote that is part of the
      string, in order to tell `awk' to keep processing the rest of the
      string.
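 
    A few of the sequences from the table can be tried out as follows.
 This is only a sketch: the variable names `out' and `slash' are
 illustrative, and the `\x' form is left out on purpose because it is
 not allowed in POSIX `awk':

```shell
# \t is a TAB, \101 is the octal code for "A", \" is a double quote.
out=$(awk 'BEGIN { printf "a\tb\n\101\n\"q\"\n" }')
printf '%s\n' "$out"
# prints:
#   a<TAB>b
#   A
#   "q"

# \/ puts a literal slash inside a regexp constant.
slash=$(echo 'usr/bin' | awk '/usr\/bin/ { print "match" }')
printf '%s\n' "$slash"
# prints: match
```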
 
    In `gawk', a number of additional two-character sequences that begin
 with a backslash have special meaning in regexps.  See GNU Regexp
 Operators.
 
    In a regexp, a backslash before any character that is not in the
 previous list and not listed in GNU Regexp Operators means
 that the next character should be taken literally, even if it would
 normally be a regexp operator.  For example, `/a\+b/' matches the three
 characters `a+b'.
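 
    The difference between `/a\+b/' and the unescaped `/a+b/' can be
 seen by running both against the same two input lines.  This is a
 small sketch; the variable names `escaped' and `plain' are
 illustrative only:

```shell
# With the backslash, + is an ordinary character: only "a+b" matches.
escaped=$(printf 'a+b\naab\n' | awk '/a\+b/ { print }')
printf '%s\n' "$escaped"
# prints: a+b

# Without the backslash, + is the "one or more" operator:
# only "aab" contains one or more "a" followed directly by "b".
plain=$(printf 'a+b\naab\n' | awk '/a+b/ { print }')
printf '%s\n' "$plain"
# prints: aab
```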
 
    For complete portability, do not use a backslash before any
 character not shown in the previous list.
 
    To summarize:
 
    * The escape sequences in the table above are always processed first,
      for both string constants and regexp constants. This happens very
      early, as soon as `awk' reads your program.
 
    * `gawk' processes both regexp constants and dynamic regexps (see
      Computed Regexps) for the special operators listed in GNU
      Regexp Operators.
 
    * A backslash before any other character means to treat that
      character literally.
 
 Advanced Notes: Backslash Before Regular Characters
 ---------------------------------------------------
 
 If you place a backslash in a string constant before something that is
 not one of the characters previously listed, POSIX `awk' purposely
 leaves what happens as undefined.  There are two choices:
 
 Strip the backslash out
      This is what Brian Kernighan's `awk' and `gawk' both do.  For
      example, `"a\qc"' is the same as `"aqc"'.  (Because this is such
      an easy bug both to introduce and to miss, `gawk' warns you about
      it.  Consider `FS = "[ \t]+\|[ \t]+"' to use vertical bars
      surrounded by whitespace as the field separator.  There should be
      two backslashes in the string: `FS = "[ \t]+\\|[ \t]+"'.)
 
 Leave the backslash alone
      Some other `awk' implementations do this.  In such
      implementations, typing `"a\qc"' is the same as typing `"a\\qc"'.
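 
    The correctly escaped field separator can be tried on a sample
 line, as in the sketch below.  The input text and the variable name
 `out' are made up for illustration.  Only the two-backslash form is
 shown, because the result of the one-backslash form depends on the
 `awk' implementation in use:

```shell
# The string "[ \t]+\\|[ \t]+" reaches the regexp matcher as
# [ \t]+\|[ \t]+ : whitespace, a literal vertical bar, whitespace.
out=$(echo 'one  |  two | three' |
      awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print NF ": " $2 }')
printf '%s\n' "$out"
# prints: 3: two
```

    The record splits into three fields, which shows that the vertical
 bar was treated as a literal character and not as the alternation
 operator.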
 
 Advanced Notes: Escape Sequences for Metacharacters
 ---------------------------------------------------
 
 Suppose you use an octal or hexadecimal escape to represent a regexp
 metacharacter.  (See Regexp Operators.)  Does `awk' treat the
 character as a literal character or as a regexp operator?
 
    Historically, such characters were taken literally.  (d.c.)
 However, the POSIX standard indicates that they should be treated as
 real metacharacters, which is what `gawk' does.  In compatibility mode
 (see Options), `gawk' treats the characters represented by octal
 and hexadecimal escape sequences literally when used in regexp
 constants. Thus, `/a\52b/' is equivalent to `/a\*b/'.
 