(gawk.info.gz) Filetrans Function
12.3.1 Noting Data File Boundaries
----------------------------------
The `BEGIN' and `END' rules are each executed exactly once at the
beginning and end of your `awk' program, respectively (see
BEGIN/END). We (the `gawk' authors) once had a user who mistakenly
thought that the `BEGIN' rule is executed at the beginning of each data
file and the `END' rule is executed at the end of each data file.
When informed that this was not the case, the user requested that we
add new special patterns to `gawk', named `BEGIN_FILE' and `END_FILE',
that would have the desired behavior. He even supplied us the code to
do so.
Adding these special patterns to `gawk' wasn't necessary; the job
can be done cleanly in `awk' itself, as illustrated by the following
library program. It arranges to call two user-supplied functions,
`beginfile()' and `endfile()', at the beginning and end of each data
file. Besides solving the problem in only nine(!) lines of code, it
does so _portably_; this works with any implementation of `awk':

     # transfile.awk
     #
     # Give the user a hook for filename transitions
     #
     # The user must supply functions beginfile() and endfile()
     # that each take the name of the file being started or
     # finished, respectively.

     FILENAME != _oldfilename \
     {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }

     END   { endfile(FILENAME) }

This file must be loaded before the user's "main" program, so that
the rule it supplies is executed first.
This rule relies on `awk''s `FILENAME' variable that automatically
changes for each new data file. The current file name is saved in a
private variable, `_oldfilename'. If `FILENAME' does not equal
`_oldfilename', then a new data file is being processed and it is
necessary to call `endfile()' for the old file. Because `endfile()'
should only be called if a file has been processed, the program first
checks to make sure that `_oldfilename' is not the null string. The
program then assigns the current file name to `_oldfilename' and calls
`beginfile()' for the file. Because, like all `awk' variables,
`_oldfilename' is initialized to the null string, this rule executes
correctly even for the first data file.
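
The behavior described above can be tried out with a small shell
session. This is only a sketch: the data files `one.txt' and `two.txt',
the file `main.awk', and the bodies of `beginfile()' and `endfile()'
are hypothetical stand-ins for a real "main" program. Note that the
library file is named first on the command line, so its rule runs
before the rule in `main.awk':

```shell
# Work in a scratch directory; all file names here are hypothetical.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\n' > one.txt
printf 'c\n'    > two.txt

# The library rule from the text, saved as transfile.awk.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME) }
EOF

# A stand-in "main" program that supplies the two hook functions.
cat > main.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
{ print "  " $0 }
EOF

# Load the library first, then the main program.
out=$(awk -f transfile.awk -f main.awk one.txt two.txt)
printf '%s\n' "$out"
```

Each data file's records should appear bracketed by a `begin' line and
an `end' line for that file.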
The program also supplies an `END' rule to do the final processing
for the last file. Because this `END' rule comes before any `END' rules
supplied in the "main" program, `endfile()' is called first. Once
again the value of multiple `BEGIN' and `END' rules should be clear.
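If the same data file occurs twice in a row on the command line, then
`FILENAME' does not change between the two passes, so `endfile()' and
`beginfile()' are not executed at the end of the first pass and at the
beginning of the second pass. The following version solves the problem
by testing `FNR', which is reset to one at the start of each data file: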

     # ftrans.awk --- handle data file transitions
     #
     # user supplies beginfile() and endfile() functions

     FNR == 1 {
         if (_filename_ != "")
             endfile(_filename_)
         _filename_ = FILENAME
         beginfile(FILENAME)
     }

     END   { endfile(_filename_) }

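
To see the difference between the two versions, one can name the same
file twice on the command line and compare the results. The following
is a sketch under assumed names: `data.txt' and `hooks.awk' are
hypothetical, and the hook functions merely print a trace line.

```shell
# Work in a scratch directory; all file names here are hypothetical.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\n' > data.txt

# First version: compares FILENAME with the saved name.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME) }
EOF

# Second version: fires on the first record of every file.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}
END { endfile(_filename_) }
EOF

# Trace-only hook functions.
cat > hooks.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
EOF

old=$(awk -f transfile.awk -f hooks.awk data.txt data.txt)
new=$(awk -f ftrans.awk    -f hooks.awk data.txt data.txt)

printf 'transfile.awk:\n%s\n' "$old"
printf 'ftrans.awk:\n%s\n'    "$new"
```

The first version should report a single `begin'/`end' pair, while the
second reports one pair for each pass over the file.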
See Wc Program, which shows how this library function can be used and
how it simplifies writing the main program.
Advanced Notes: So Why Does `gawk' Have `BEGINFILE' and `ENDFILE'?
------------------------------------------------------------------
You are probably wondering, if `beginfile()' and `endfile()' functions
can do the job, why does `gawk' have `BEGINFILE' and `ENDFILE' patterns
(see BEGINFILE/ENDFILE)?
Good question. Normally, if `awk' cannot open a file, this causes
an immediate fatal error. In this case, there is no way for a
user-defined function to deal with the problem, since the mechanism for
calling it relies on the file being open and at the first record. Thus,
the main reason for `BEGINFILE' is to give you a "hook" to catch files
that cannot be processed. `ENDFILE' exists for symmetry, and because
it provides an easy way to do per-file cleanup processing.
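
As a rough illustration of that "hook," the following sketch uses
`gawk''s `ERRNO' variable and the `nextfile' statement inside a
`BEGINFILE' rule to skip a file that cannot be opened instead of
stopping with a fatal error. It requires `gawk' and does nothing if
`gawk' is not installed; the file names `good.txt' and `missing.txt',
the counter `n', and the wording of the messages are assumptions made
for the example.

```shell
# Work in a scratch directory; missing.txt is deliberately not created.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\n' > good.txt

if command -v gawk >/dev/null 2>&1; then
    out=$(gawk '
        BEGINFILE {
            # ERRNO is non-empty when the file could not be opened.
            if (ERRNO != "") {
                print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
                nextfile
            }
            n = 0               # per-file record counter
        }
        { n++ }
        ENDFILE { print FILENAME ": " n " records" }
    ' missing.txt good.txt)
    printf '%s\n' "$out"
fi
```

Here the warning about the unreadable file goes to standard error, and
processing continues with the next file on the command line.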