(gawk.info.gz) Filetrans Function

Info Catalog (gawk.info.gz) Data File Management (gawk.info.gz) Rewind Function
 
 12.3.1 Noting Data File Boundaries
 ----------------------------------
 
 The `BEGIN' and `END' rules are each executed exactly once at the
 beginning and end of your `awk' program, respectively (see
 BEGIN/END).  We (the `gawk' authors) once had a user who mistakenly
 thought that the `BEGIN' rule is executed at the beginning of each data
 file and the `END' rule is executed at the end of each data file.
 
    When informed that this was not the case, the user requested that we
 add new special patterns to `gawk', named `BEGIN_FILE' and `END_FILE',
 that would have the desired behavior.  He even supplied us the code to
 do so.
 
    Adding these special patterns to `gawk' wasn't necessary; the job
 can be done cleanly in `awk' itself, as illustrated by the following
 library program.  It arranges to call two user-supplied functions,
 `beginfile()' and `endfile()', at the beginning and end of each data
 file.  Besides solving the problem in only nine(!) lines of code, it
 does so _portably_; this works with any implementation of `awk':
 
      # transfile.awk
      #
      # Give the user a hook for filename transitions
      #
      # The user must supply functions beginfile() and endfile()
      # that each take the name of the file being started or
      # finished, respectively.
 
      FILENAME != _oldfilename \
      {
          if (_oldfilename != "")
              endfile(_oldfilename)
          _oldfilename = FILENAME
          beginfile(FILENAME)
      }
 
      END   { endfile(FILENAME) }
 
    This file must be loaded before the user's "main" program, so that
 the rule it supplies is executed first.
 
    This rule relies on `awk''s `FILENAME' variable that automatically
 changes for each new data file.  The current file name is saved in a
 private variable, `_oldfilename'.  If `FILENAME' does not equal
 `_oldfilename', then a new data file is being processed and it is
 necessary to call `endfile()' for the old file.  Because `endfile()'
 should only be called if a file has been processed, the program first
 checks to make sure that `_oldfilename' is not the null string.  The
 program then assigns the current file name to `_oldfilename' and calls
 `beginfile()' for the file.  Because, like all `awk' variables,
 `_oldfilename' is initialized to the null string, this rule executes
 correctly even for the first data file.
 
    The program also supplies an `END' rule to do the final processing
 for the last file.  Because this `END' rule comes before any `END' rules
 supplied in the "main" program, `endfile()' is called first.  Once
 again the value of multiple `BEGIN' and `END' rules should be clear.
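
    As a minimal sketch of how the pieces fit together, the following
 shell script saves the rule above in `transfile.awk', supplies a
 hypothetical "main" program whose `beginfile()' and `endfile()' merely
 print a message, and runs both over two small data files.  The names
 `main.awk', `a.txt', and `b.txt' and the function bodies are assumptions
 made for this illustration only:

```shell
#!/bin/sh
# Work in a scratch directory so FILENAME values stay short and predictable.
dir=$(mktemp -d)
cd "$dir" || exit 1

# The library rule from the text, saved to its own file.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
EOF

# A hypothetical "main" program supplying the two required functions.
cat > main.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
{ print "  " FILENAME ": " $0 }
EOF

# Two small, made-up data files.
printf 'a1\na2\n' > a.txt
printf 'b1\n' > b.txt

# The library is named first so that its rule runs before the main rules.
out=$(awk -f transfile.awk -f main.awk a.txt b.txt)
printf '%s\n' "$out"
```

    With a POSIX-conforming `awk', the script prints:

      begin a.txt
        a.txt: a1
        a.txt: a2
      end a.txt
      begin b.txt
        b.txt: b1
      end b.txt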
 
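    If the same data file occurs twice in a row on the command line, then
 `endfile()' and `beginfile()' are not executed at the end of the first
 pass and at the beginning of the second pass, because `FILENAME' does
 not change between them.  The following version solves the problem by
 testing `FNR == 1', which is true for the first record of every data
 file: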
 
      # ftrans.awk --- handle data file transitions
      #
      # user supplies beginfile() and endfile() functions
 
      FNR == 1 {
          if (_filename_ != "")
              endfile(_filename_)
          _filename_ = FILENAME
          beginfile(FILENAME)
      }
 
      END  { endfile(_filename_) }
 
    See Wc Program, which shows how this library function can be used and
 how it simplifies writing the main program.
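
    The difference between the two versions can be checked with a small
 experiment.  The sketch below is only an illustration: it names the same
 (made-up) file `data.txt' twice on the command line and uses a
 hypothetical `main.awk' whose functions just print a message:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# First version: compares FILENAME with the saved name.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
EOF

# Second version: FNR is reset to 1 at the start of every data file.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END  { endfile(_filename_) }
EOF

# Hypothetical user functions; they only report that they were called.
cat > main.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
EOF

printf 'x\ny\n' > data.txt

# Same file twice in a row, once with each library version.
old=$(awk -f transfile.awk -f main.awk data.txt data.txt)
new=$(awk -f ftrans.awk -f main.awk data.txt data.txt)

echo "FILENAME comparison:"; printf '%s\n' "$old"
echo "FNR == 1 test:";       printf '%s\n' "$new"
```

    The first version reports a single `begin'/`end' pair, because
 `FILENAME' never changes.  The second reports two pairs, one for each
 pass over the file.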
 
 Advanced Notes: So Why Does `gawk' have `BEGINFILE' and `ENDFILE'?
 ------------------------------------------------------------------
 
 You are probably wondering, if `beginfile()' and `endfile()' functions
 can do the job, why does `gawk' have `BEGINFILE' and `ENDFILE' patterns
 (see BEGINFILE/ENDFILE)?
 
    Good question.  Normally, if `awk' cannot open a file, this causes
 an immediate fatal error.  In this case, there is no way for a
 user-defined function to deal with the problem, since the mechanism for
 calling it relies on the file being open and at the first record.  Thus,
 the main reason for `BEGINFILE' is to give you a "hook" to catch files
 that cannot be processed.  `ENDFILE' exists for symmetry, and because
 it provides an easy way to do per-file cleanup processing.
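
    As a hedged illustration of that "hook", the following sketch uses
 `BEGINFILE' together with `gawk''s `ERRNO' variable and the `nextfile'
 statement to skip a file that cannot be opened instead of dying with a
 fatal error.  The file names `missing.txt' and `good.txt' are made up
 for the example, and the script does nothing if `gawk' is not
 installed:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# Only good.txt exists; missing.txt is deliberately never created.
printf 'hello\n' > good.txt

# BEGINFILE and ENDFILE are gawk extensions, so guard on gawk being present.
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk '
        BEGINFILE {
            # ERRNO is non-empty here when the file could not be opened.
            if (ERRNO != "") {
                print "skipping " FILENAME
                nextfile    # move on instead of failing fatally
            }
        }
        { print FILENAME ": " $0 }
        ENDFILE { print "done with " FILENAME }
    ' missing.txt good.txt)
    printf '%s\n' "$out"
fi
```

    Run with `gawk', this prints:

      skipping missing.txt
      good.txt: hello
      done with good.txt

 No `ENDFILE' rule runs for the skipped file, since `nextfile' was
 executed from within `BEGINFILE'.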
 