(gawk.info.gz) Getopt Function
12.4 Processing Command-Line Options
====================================
Most utilities on POSIX-compatible systems take options on the command
line that can be used to change the way a program behaves. `awk' is an
example of such a program (see Options). Often, options take
"arguments"; i.e., data that the program needs to correctly obey the
command-line option. For example, `awk''s `-F' option requires a
string to use as the field separator. The first occurrence on the
command line of either `--' or a string that does not begin with `-'
ends the options.
Modern Unix systems provide a C function named `getopt()' for
processing command-line arguments. The programmer provides a string
describing the one-letter options. If an option requires an argument,
it is followed in the string with a colon. `getopt()' is also passed
the count and values of the command-line arguments and is called in a
loop. `getopt()' processes the command-line arguments for option
letters. Each time around the loop, it returns a single character
representing the next option letter that it finds, or `?' if it finds
an invalid option. When it returns -1, there are no options left on
the command line.
When using `getopt()', options that do not take arguments can be
grouped together. Furthermore, options that take arguments require
that the argument be present. The argument can immediately follow the
option letter, or it can be a separate command-line argument.
Given a hypothetical program that takes three command-line options,
`-a', `-b', and `-c', where `-b' requires an argument, all of the
following are valid ways of invoking the program:
     prog -a -b foo -c data1 data2 data3
     prog -ac -bfoo -- data1 data2 data3
     prog -acbfoo data1 data2 data3
Notice that when an option's argument is attached directly to the
option letter, the rest of that command-line argument is taken to be
the option's argument. In this example, `-acbfoo' indicates that all
of the `-a', `-b', and `-c' options were supplied, and that `foo' is
the argument to the `-b' option.
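The way a cluster such as `-acbfoo' breaks apart can be sketched in a
few lines of `awk' run from the shell. This is only an illustration
under simplifying assumptions, not a full `getopt()': the option string
`ab:c' and the variable names are chosen here for the example, a single
command-line argument is examined, and invalid letters are not
diagnosed.

```shell
out=$(awk 'BEGIN {
    options = "ab:c"      # assumed option string: -b takes an argument
    arg = "-acbfoo"       # one clustered command-line argument

    # start at position 2 to skip the leading "-"
    for (i = 2; i <= length(arg); i++) {
        c = substr(arg, i, 1)
        p = index(options, c)
        if (p > 0 && substr(options, p + 1, 1) == ":") {
            # everything after this letter is the argument
            printf "option -%s, argument <%s>\n", c, substr(arg, i + 1)
            break
        }
        printf "option -%s\n", c
    }
}')
printf '%s\n' "$out"
```

The loop stops as soon as it meets a letter that takes an argument,
because the remaining characters are no longer option letters.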
`getopt()' provides four external variables that the programmer can
use:
`optind'
     The index in the argument value array (`argv') where the first
     nonoption command-line argument can be found.
`optarg'
     The string value of the argument to an option.
`opterr'
     Usually `getopt()' prints an error message when it finds an invalid
     option. Setting `opterr' to zero disables this feature. (An
     application might want to print its own error message.)
`optopt'
     The letter representing the command-line option.
The following C fragment shows how `getopt()' might process
command-line arguments for `awk':
     int
     main(int argc, char *argv[])
     {
         ...
         /* print our own message */
         opterr = 0;
         while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
             switch (c) {
             case 'f':    /* file */
                 ...
                 break;
             case 'F':    /* field separator */
                 ...
                 break;
             case 'v':    /* variable assignment */
                 ...
                 break;
             case 'W':    /* extension */
                 ...
                 break;
             case '?':
             default:
                 usage();
                 break;
             }
         }
         ...
     }
As a side point, `gawk' actually uses the GNU `getopt_long()'
function to process both normal and GNU-style long options (see
Options).
The abstraction provided by `getopt()' is just as useful in `awk'
programs. Following is an `awk' version of `getopt()'. This function
highlights one of the greatest weaknesses in `awk', which is that it is
very poor at manipulating single characters. Repeated calls to
`substr()' are necessary for accessing individual characters (see
String Functions).(1)
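As a small illustration of the point about single characters, the
following sketch visits each character of a string with `substr()'.
The string `-abc' and the variable names are arbitrary choices made for
this example.

```shell
out=$(awk 'BEGIN {
    s = "-abc"
    # awk has no character type and no string subscripting, so each
    # character is fetched as a one-character substring
    for (i = 1; i <= length(s); i++)
        printf "%d %s\n", i, substr(s, i, 1)
}')
printf '%s\n' "$out"
```

Positions in `awk' strings start at one, which is why an option letter
that follows a leading `-' is found at position two.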
The discussion that follows walks through the code a bit at a time:
     # getopt.awk --- Do C library getopt(3) function in awk
     # External variables:
     #    Optind -- index in ARGV of first nonoption argument
     #    Optarg -- string value of argument to current option
     #    Opterr -- if nonzero, print our own diagnostic
     #    Optopt -- current option letter
     # Returns:
     #    -1     at end of options
     #    "?"    for unrecognized option
     #    <c>    a character representing the current option
     # Private Data:
     #    _opti  -- index in multi-flag option, e.g., -abc
The function starts out with comments presenting a list of the
global variables it uses, what the return values are, what they mean,
and any global variables that are "private" to this library function.
Such documentation is essential for any program, and particularly for
library functions.
The `getopt()' function first checks that it was indeed called with
a string of options (the `options' parameter). If `options' has a zero
length, `getopt()' immediately returns -1:
     function getopt(argc, argv, options,    thisopt, i)
     {
         if (length(options) == 0)    # no options given
             return -1
         if (argv[Optind] == "--") {  # all done
             Optind++
             _opti = 0
             return -1
         } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
             _opti = 0
             return -1
         }
The next thing to check for is the end of the options. A `--' ends
the command-line options, as does any command-line argument that does
not begin with a `-'. `Optind' is used to step through the array of
command-line arguments; it retains its value across calls to
`getopt()', because it is a global variable.
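The end-of-options test in the code above can be tried on its own. The
sketch below applies the same two checks to a handful of sample
strings. The sample strings, the use of `split()' to build a stand-in
argument array, and the `verdict' variable are all assumptions made for
the example; the sketch also assumes an `awk' that supports POSIX
character classes such as `[:space:]'.

```shell
out=$(awk 'BEGIN {
    # stand-in for ARGV[1], ARGV[2], ...
    n = split("-a -- - file -:x", argv, " ")
    for (i = 1; i <= n; i++) {
        if (argv[i] == "--")
            verdict = "ends options (and is skipped)"
        else if (argv[i] !~ /^-[^:[:space:]]/)
            verdict = "ends options"
        else
            verdict = "option"
        printf "%s => %s\n", argv[i], verdict
    }
}')
printf '%s\n' "$out"
```

A lone `-' and a string such as `-:x' both fail the pattern, so they are
treated as ordinary arguments rather than as options.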
The regular expression that is used, `/^-[^:[:space:]]/', checks for
a `-' followed by anything that is not whitespace and not a colon. If
the current command-line argument does not match this pattern, it is
not an option, and it ends option processing. Continuing on:
         if (_opti == 0)
             _opti = 2
         thisopt = substr(argv[Optind], _opti, 1)
         Optopt = thisopt
         i = index(options, thisopt)
         if (i == 0) {
             if (Opterr)
                 printf("%c -- invalid option\n",
                                       thisopt) > "/dev/stderr"
             if (_opti >= length(argv[Optind])) {
                 Optind++
                 _opti = 0
             } else
                 _opti++
             return "?"
         }
The `_opti' variable tracks the position in the current command-line
argument (`argv[Optind]'). If multiple options are grouped together
with one `-' (e.g., `-abx'), it is necessary to return them to the user
one at a time.
If `_opti' is equal to zero, it is set to two, which is the index in
the string of the next character to look at (we skip the `-', which is
at position one). The variable `thisopt' holds the character, obtained
with `substr()'. It is saved in `Optopt' for the main program to use.
If `thisopt' is not in the `options' string, then it is an invalid
option. If `Opterr' is nonzero, `getopt()' prints an error message on
the standard error that is similar to the message from the C version of
`getopt()'.
Because the option is invalid, it is necessary to skip it and move
on to the next option character. If `_opti' is greater than or equal
to the length of the current command-line argument, it is necessary to
move on to the next argument, so `Optind' is incremented and `_opti' is
reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
incremented.
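The way `Optind' and `_opti' step through the arguments can be traced
in isolation. The sketch below keeps only the cursor movement and
leaves out all validity checking; the sample arguments `-ab' and `-c'
and the stand-in `argv' array built with `split()' are assumptions made
for the example.

```shell
out=$(awk 'BEGIN {
    n = split("-ab -c", argv, " ")   # stand-in for ARGV[1], ARGV[2]
    Optind = 1
    _opti = 0
    while (Optind <= n) {
        if (_opti == 0)
            _opti = 2                # skip the leading "-"
        printf "argv[%d] position %d: %s\n",
               Optind, _opti, substr(argv[Optind], _opti, 1)
        if (_opti >= length(argv[Optind])) {
            Optind++                 # this argument is used up
            _opti = 0
        } else
            _opti++                  # more letters in this argument
    }
}')
printf '%s\n' "$out"
```

Each pass through the loop corresponds to one call of `getopt()': the
position advances within `-ab' first, and only then does `Optind' move
on to `-c'.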
In any case, because the option is invalid, `getopt()' returns `"?"'.
The main program can examine `Optopt' if it needs to know what the
invalid option letter actually is. Continuing on:
         if (substr(options, i + 1, 1) == ":") {
             # get option argument
             if (length(substr(argv[Optind], _opti + 1)) > 0)
                 Optarg = substr(argv[Optind], _opti + 1)
             else
                 Optarg = argv[++Optind]
             _opti = 0
         } else
             Optarg = ""
If the option requires an argument, the option letter is followed by
a colon in the `options' string. If there are remaining characters in
the current command-line argument (`argv[Optind]'), then the rest of
that string is assigned to `Optarg'. Otherwise, the next command-line
argument is used (`-xFOO' versus `-x FOO'). In either case, `_opti' is
reset to zero, because there are no more characters left to examine in
the current command-line argument. Continuing:
         if (_opti == 0 || _opti >= length(argv[Optind])) {
             Optind++
             _opti = 0
         } else
             _opti++
         return thisopt
     }
Finally, if `_opti' is either zero or greater than or equal to the
length of the current command-line argument, it means this element in
`argv' is through being processed, so `Optind' is incremented to point
to the next element in `argv'. If neither condition is true, then only
`_opti' is incremented, so that the next option letter can be processed
on the next call to `getopt()'.
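The two spellings of an option argument mentioned earlier, `-xFOO' and
`-x FOO', can be compared with a sketch that isolates just the
argument-extraction step. The helper function `show()' and its use of
`split()' to fake an argument array are assumptions made for this
example; the variables are declared local to the helper so that each
call starts fresh.

```shell
out=$(awk 'BEGIN {
    show("-xFOO")     # argument attached to the option letter
    show("-x FOO")    # argument given as the next word
}

function show(cmdline,    argv, n, Optind, _opti, Optarg)
{
    n = split(cmdline, argv, " ")
    Optind = 1
    _opti = 2         # the option letter sits right after the "-"
    if (length(substr(argv[Optind], _opti + 1)) > 0)
        Optarg = substr(argv[Optind], _opti + 1)
    else
        Optarg = argv[++Optind]
    printf "%s: Optarg = <%s>, Optind = %d\n", cmdline, Optarg, Optind
}')
printf '%s\n' "$out"
```

The value of `Optind' printed here is the one in effect before the
final increment at the end of `getopt()', which then moves past the
element that was just consumed.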
The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
`Opterr' is set to one, since the default behavior is for `getopt()' to
print a diagnostic message upon seeing an invalid option. `Optind' is
set to one, since there's no reason to look at the program name, which
is in `ARGV[0]':
     BEGIN {
         Opterr = 1    # default is to diagnose
         Optind = 1    # skip ARGV[0]
         # test program
         if (_getopt_test) {
             while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
                 printf("c = <%c>, optarg = <%s>\n",
                                            _go_c, Optarg)
             printf("non-option arguments:\n")
             for (; Optind < ARGC; Optind++)
                 printf("\tARGV[%d] = <%s>\n",
                                         Optind, ARGV[Optind])
         }
     }
The rest of the `BEGIN' rule is a simple test program. Here is the
result of two sample runs of the test program:
     $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
     -| c = <a>, optarg = <>
     -| c = <c>, optarg = <>
     -| c = <b>, optarg = <ARG>
     -| non-option arguments:
     -|         ARGV[3] = <bax>
     -|         ARGV[4] = <-x>
     $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
     -| c = <a>, optarg = <>
     error--> x -- invalid option
     -| c = <?>, optarg = <>
     -| non-option arguments:
     -|         ARGV[4] = <xyz>
     -|         ARGV[5] = <abc>
In both runs, the first `--' terminates the arguments to `awk', so
that it does not try to interpret the `-a', etc., as its own options.
NOTE: After `getopt()' is through, it is the responsibility of the
user-level code to clear out all the elements of `ARGV' from 1 to
`Optind - 1', so that `awk' does not try to process the command-line
options as file names. (`ARGV[Optind]' itself is the first nonoption
argument and must be kept.)
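A sketch of that cleanup follows. To stay short it does not call
`getopt()'; instead `Optind' is set by hand to the value that `getopt()'
would leave for these particular arguments (an assumption of the
example), and the data file is a temporary file created just for the
demonstration. Only the elements before `ARGV[Optind]' are cleared,
because `ARGV[Optind]' is the first nonoption argument.

```shell
tmp=$(mktemp)
printf 'hello\n' > "$tmp"

out=$(awk -- 'BEGIN {
    Optind = 3          # assumed: what getopt() leaves after -a and -c
    for (i = 1; i < Optind; i++)
        ARGV[i] = ""    # awk skips ARGV elements that are empty strings
}
{ print "read: " $0 }' -a -c "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```

Without the loop in the `BEGIN' rule, `awk' would try to open `-a' as a
data file once the rule finished.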
Several of the sample programs (see Sample Programs) use `getopt()'
to process their arguments.
---------- Footnotes ----------
(1) This function was written before `gawk' acquired the ability to
split strings into single characters using `""' as the separator. We
have left it alone, since using `substr()' is more portable.