C.3.1 A Minimal Introduction to `gawk' Internals
------------------------------------------------
The truth is that `gawk' was not designed for simple extensibility.
The facilities for adding functions using shared libraries work, but
are something of a "bag on the side." Thus, this tour is brief and
simplistic; would-be `gawk' hackers are encouraged to spend some time
reading the source code before trying to write extensions based on the
material presented here. Of particular note are the files `awk.h',
`builtin.c', and `eval.c'. Reading `awkgram.y' in order to see how the
parse tree is built would also be of use.
With the disclaimers out of the way, the following types, structure
members, functions, and macros are declared in `awk.h' and are of use
when writing extensions. The next minor node shows how they are used:
`AWKNUM'
An `AWKNUM' is the internal type of `awk' floating-point numbers.
Typically, it is a C `double'.
`NODE'
Just about everything is done using objects of type `NODE'. These
contain both strings and numbers, as well as variables and arrays.
`AWKNUM force_number(NODE *n)'
This macro forces a value to be numeric. It returns the actual
numeric value contained in the node. It may end up calling an
internal `gawk' function.
`void force_string(NODE *n)'
This macro guarantees that a `NODE''s string value is current. It
may end up calling an internal `gawk' function. It also
guarantees that the string is zero-terminated.
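Conceptually, a node can carry both a number and a string form, and `force_string' only converts when the string form is stale. The hypothetical model below illustrates that idea in plain C. `struct toy_value', `TOY_STRCUR', and `toy_force_string' are invented names, the sketch assumes a converted string is cached until the number changes, and gawk's real conversion logic is more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value with a number and a lazily built string form.
   The struct, field, and flag names are invented for this sketch. */
#define TOY_STRCUR 1   /* set when `str' matches `num' */

struct toy_value {
    double num;
    char *str;         /* NUL-terminated once TOY_STRCUR is set */
    int flags;
};

/* Make the string form current, converting only if it is stale.
   Uses "%.6g" (awk's default CONVFMT); returns NULL if out of memory. */
const char *toy_force_string(struct toy_value *v)
{
    char buf[64];
    int len;

    assert(v != NULL);
    if (v->flags & TOY_STRCUR)
        return v->str;                      /* cached: nothing to do */

    len = snprintf(buf, sizeof buf, "%.6g", v->num);
    assert(len >= 0 && (size_t) len < sizeof buf);
    free(v->str);                           /* discard the stale string */
    v->str = malloc((size_t) len + 1);
    if (v->str == NULL)
        return NULL;
    memcpy(v->str, buf, (size_t) len + 1);  /* copies the '\0' too */
    v->flags |= TOY_STRCUR;
    return v->str;
}
```

Calling `toy_force_string' twice without changing `num' returns the same cached pointer; clearing `TOY_STRCUR' after changing `num' forces a fresh conversion.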
`size_t get_curfunc_arg_count(void)'
This function returns the actual number of parameters passed to
the current function. Inside the code of an extension this can be
used to determine the maximum index which is safe to use with
`stack_ptr'. If this value is greater than `tree->param_cnt', the
function was called incorrectly from the `awk' program.
*Caution:* This function is new as of `gawk' 3.1.4.
`n->param_cnt'
Inside an extension function, this is the maximum number of
expected parameters, as set by the `make_builtin' function.
`n->stptr'
`n->stlen'
The data and length of a `NODE''s string value, respectively. The
string is _not_ guaranteed to be zero-terminated. If you need to
pass the string value to a C library function, save the character at
`n->stptr[n->stlen]', assign `'\0'' to it, call the routine, and
then restore the saved character.
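The save, terminate, and restore sequence might look like this. `struct str_value' is a hypothetical stand-in holding just the two fields described above, and `counted_strtol' is an invented helper; the sketch assumes the byte at `stptr[stlen]' exists and is writable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the two NODE fields described
   above; the real NODE in awk.h has many more members. */
struct str_value {
    char *stptr;    /* string data, not necessarily '\0'-terminated */
    size_t stlen;   /* number of valid bytes at stptr */
};

/* Parse a counted string with strtol(), which needs a C string.
   Assumes the byte at stptr[stlen] exists and may be written. */
long counted_strtol(struct str_value *v)
{
    char save;
    long result;

    assert(v != NULL && v->stptr != NULL);
    save = v->stptr[v->stlen];      /* remember the byte past the end */
    v->stptr[v->stlen] = '\0';      /* terminate temporarily */
    result = strtol(v->stptr, NULL, 10);
    v->stptr[v->stlen] = save;      /* put the original byte back */
    return result;
}
```

For example, with a buffer holding "12345" and `stlen' equal to 2, `counted_strtol' returns 12 and the buffer is left unchanged.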
`n->type'
The type of the `NODE'. This is a C `enum'. Values should be
either `Node_var' or `Node_var_array' for function parameters.
`n->vname'
The "variable name" of a node. This is not of much use inside
externally written extensions.
`void assoc_clear(NODE *n)'
Clears the associative array pointed to by `n'. Make sure that
`n->type == Node_var_array' first.
`NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)'
Finds, and installs if necessary, array elements. `symbol' is the
array, `subs' is the subscript. This is usually a value created
with `tmp_string' (see below). `reference' should be `TRUE' if it
is an error to use the value before it is created. Typically,
`FALSE' is the correct value to use from extension functions.
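The "find, and install if necessary" contract, which returns a pointer to the element's value slot so that the caller can assign through it, can be modeled with a toy array. This is a hypothetical illustration only: `struct toy_array' and `toy_lookup' are invented, the `double' slots stand in for `NODE *' slots, and gawk's real arrays are hash tables rather than a linear list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ELEMS 16

/* Toy associative array (hypothetical, not gawk's implementation). */
struct toy_array {
    char *keys[TOY_MAX_ELEMS];
    double values[TOY_MAX_ELEMS];
    size_t count;
};

/* Return a pointer to the value slot for `subs', creating a new
   element (initialized to 0) if the subscript is not yet present.
   Returns NULL only if the toy table is full or out of memory. */
double *toy_lookup(struct toy_array *arr, const char *subs)
{
    size_t i;

    assert(arr != NULL && subs != NULL);
    for (i = 0; i < arr->count; i++)
        if (strcmp(arr->keys[i], subs) == 0)
            return &arr->values[i];          /* existing element */

    if (arr->count == TOY_MAX_ELEMS)
        return NULL;
    arr->keys[arr->count] = malloc(strlen(subs) + 1);
    if (arr->keys[arr->count] == NULL)
        return NULL;
    strcpy(arr->keys[arr->count], subs);
    arr->values[arr->count] = 0.0;           /* install a new element */
    return &arr->values[arr->count++];
}
```

Because a slot pointer is returned, one lookup serves both reading and assignment (`*toy_lookup(&a, "x") = 42.0;'), which is the same reason `assoc_lookup' returns a `NODE **'.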
`NODE *make_string(char *s, size_t len)'
Take a C string and turn it into a pointer to a `NODE' that can be
stored appropriately. This is permanent storage; understanding of
`gawk' memory management is helpful.
`NODE *make_number(AWKNUM val)'
Take an `AWKNUM' and turn it into a pointer to a `NODE' that can
be stored appropriately. This is permanent storage; understanding
of `gawk' memory management is helpful.
`NODE *tmp_string(char *s, size_t len)'
Take a C string and turn it into a pointer to a `NODE' that can be
stored appropriately. This is temporary storage; understanding of
`gawk' memory management is helpful.
`NODE *tmp_number(AWKNUM val)'
Take an `AWKNUM' and turn it into a pointer to a `NODE' that can
be stored appropriately. This is temporary storage; understanding
of `gawk' memory management is helpful.
`NODE *dupnode(NODE *n)'
Duplicate a node. In most cases, this increments an internal
reference count instead of actually duplicating the entire `NODE';
understanding of `gawk' memory management is helpful.
`void free_temp(NODE *n)'
This macro releases the memory associated with a `NODE' allocated
with `tmp_string' or `tmp_number'. Understanding of `gawk' memory
management is helpful.
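The reference-counting idea mentioned for `dupnode' can be illustrated with a hypothetical node type. `struct rc_node', `rc_make', `rc_dup', and `rc_free' are invented for this sketch and do not reflect gawk's actual `NODE' layout or allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted node; it only models the idea that
   "duplicating" can share storage, and is not gawk's NODE layout. */
struct rc_node {
    int refcnt;
    char *str;
};

/* Create a node holding a private copy of `s', with one reference. */
struct rc_node *rc_make(const char *s)
{
    struct rc_node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->str = malloc(strlen(s) + 1);
    if (n->str == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->str, s);
    n->refcnt = 1;
    return n;
}

/* "Duplicate" by bumping the count instead of copying the string. */
struct rc_node *rc_dup(struct rc_node *n)
{
    assert(n != NULL && n->refcnt > 0);
    n->refcnt++;
    return n;
}

/* Drop one reference; memory is released only when the last one goes.
   Returns 1 if the node was freed, 0 otherwise. */
int rc_free(struct rc_node *n)
{
    assert(n != NULL && n->refcnt > 0);
    if (--n->refcnt > 0)
        return 0;
    free(n->str);
    free(n);
    return 1;
}
```

The design choice being illustrated is that a "copy" costs only an increment, at the price of every holder having to release its reference exactly once.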
`void make_builtin(char *name, NODE *(*func)(NODE *), int count)'
Register the C function pointed to by `func' as a new built-in
function `name'. `name' is a regular C string. `count' is the
maximum number of arguments that the function takes. The function
should be written in the following manner:
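/* do_xxx --- do xxx function for gawk */
NODE *
do_xxx(NODE *tree)
{
    /* ... fetch arguments, do the work, call set_value() ... */
    return tmp_number((AWKNUM) 0);  /* dummy; awk sees the set_value() result */
}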
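Conceptually, `make_builtin' records a name, a function pointer, and a maximum argument count so that calls from `awk' can be dispatched later. A hypothetical toy version of such a table follows. `toy_register', `toy_call', and `do_sum' are illustrative names only, and real gawk extension functions receive a `NODE *' and report results through `set_value' rather than returning a `double'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified model of a built-in function table. */
typedef double (*toy_func)(const double *args, int nargs);

struct toy_builtin {
    const char *name;
    toy_func func;
    int max_args;      /* plays the role of make_builtin's `count' */
};

#define TOY_MAX_BUILTINS 8
static struct toy_builtin toy_table[TOY_MAX_BUILTINS];
static int toy_nbuiltins;

/* Register `func' under `name'; returns 0 on success, -1 if full. */
int toy_register(const char *name, toy_func func, int max_args)
{
    assert(name != NULL && func != NULL);
    if (toy_nbuiltins == TOY_MAX_BUILTINS)
        return -1;
    toy_table[toy_nbuiltins].name = name;
    toy_table[toy_nbuiltins].func = func;
    toy_table[toy_nbuiltins].max_args = max_args;
    toy_nbuiltins++;
    return 0;
}

/* Find `name' and call it, rejecting calls with too many arguments.
   Returns 0 on success and stores the function's result in *result. */
int toy_call(const char *name, const double *args, int nargs, double *result)
{
    int i;

    for (i = 0; i < toy_nbuiltins; i++) {
        if (strcmp(toy_table[i].name, name) != 0)
            continue;
        if (nargs > toy_table[i].max_args)
            return -1;              /* called incorrectly */
        *result = toy_table[i].func(args, nargs);
        return 0;
    }
    return -1;                      /* no such function */
}

/* Example "extension function": the sum of its arguments. */
double do_sum(const double *args, int nargs)
{
    double total = 0.0;
    int i;

    for (i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

After `toy_register("sum", do_sum, 2)', a call with two arguments succeeds, while a call with three is rejected, much as `get_curfunc_arg_count' lets a real extension detect extra arguments.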
`NODE *get_argument(NODE *tree, int i)'
This function is called from within a C extension function to get
the `i'-th argument from the function call. The first argument is
argument zero.
`NODE *get_actual_argument(NODE *tree, unsigned int i, int optional, int wantarray)'
This function retrieves a particular argument `i'. `wantarray' is
`TRUE' if the argument should be an array, `FALSE' otherwise. If
`optional' is `TRUE', the argument need not have been supplied;
if it wasn't, the return value is `NULL'. It is a fatal error if
`optional' is `FALSE' and the argument was not provided.
*Caution:* This function is new as of `gawk' 3.1.4.
`get_scalar_argument(t, i, opt)'
This is a convenience macro that calls `get_actual_argument' with
`wantarray' set to `FALSE'.
*Caution:* This macro is new as of `gawk' 3.1.4.
`get_array_argument(t, i, opt)'
This is a convenience macro that calls `get_actual_argument' with
`wantarray' set to `TRUE'.
*Caution:* This macro is new as of `gawk' 3.1.4.
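The argument-fetching contract described above (a missing optional argument yields `NULL', a missing required one is fatal) can be modeled outside gawk. Everything here is hypothetical: `struct toy_arg', `toy_fatal', and the `toy_' functions and macros are invented, the macros assume `wantarray' is passed as `FALSE' or `TRUE' respectively, and the scalar/array mismatch checks are an assumption about sensible behavior, not a transcription of gawk's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical argument model: each actual argument is a scalar or an array. */
enum toy_kind { TOY_SCALAR, TOY_ARRAY };

struct toy_arg {
    enum toy_kind kind;
};

static void toy_fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(2);
}

/* Missing optional argument -> NULL; missing required argument or
   wrong kind of argument -> fatal error (the process exits). */
struct toy_arg *toy_get_actual_argument(struct toy_arg *args, unsigned int nargs,
                                        unsigned int i, int optional, int wantarray)
{
    assert(args != NULL || nargs == 0);
    if (i >= nargs) {
        if (!optional)
            toy_fatal("required argument was not supplied");
        return NULL;
    }
    if (wantarray && args[i].kind != TOY_ARRAY)
        toy_fatal("attempt to use a scalar as an array");
    if (!wantarray && args[i].kind != TOY_SCALAR)
        toy_fatal("attempt to use an array as a scalar");
    return &args[i];
}

/* Convenience wrappers in the spirit of get_scalar_argument and
   get_array_argument (hypothetical names). */
#define toy_get_scalar_argument(a, n, i, opt) \
    toy_get_actual_argument((a), (n), (i), (opt), 0)
#define toy_get_array_argument(a, n, i, opt) \
    toy_get_actual_argument((a), (n), (i), (opt), 1)
```

With two arguments supplied, asking for an optional third one returns `NULL', while asking for a required third one terminates the program.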
`void set_value(NODE *tree)'
This function is called from within a C extension function to set
the return value from the extension function. This value is what
the `awk' program sees as the return value from the new `awk'
function.
`void update_ERRNO(void)'
This function is called from within a C extension function to set
the value of `gawk''s `ERRNO' variable, based on the current value
of the C `errno' variable. It is provided as a convenience.
`void update_ERRNO_saved(int errno_saved)'
This function is called from within a C extension function to set
the value of `gawk''s `ERRNO' variable, based on the saved value
of the C `errno' variable provided as the argument. It is
provided as a convenience.
*Caution:* This function is new as of `gawk' 3.1.5.
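The reason to pass a saved value is that `errno' can be overwritten by any later library call, including the ones an extension makes to report a problem. A hypothetical sketch follows: `toy_ERRNO' stands in for the `awk'-level `ERRNO' variable, `toy_update_ERRNO_saved' is assumed to store the `strerror' text for the saved value, and `toy_try_open' is an invented caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gawk's ERRNO variable. */
static char toy_ERRNO[256];

/* Model of update_ERRNO_saved(): record the message for a saved errno. */
static void toy_update_ERRNO_saved(int errno_saved)
{
    snprintf(toy_ERRNO, sizeof toy_ERRNO, "%s", strerror(errno_saved));
}

/* Try to open `path'; on failure, capture errno immediately, because
   the fprintf() below may change it before ERRNO is updated. */
int toy_try_open(const char *path)
{
    FILE *fp;
    int errno_saved;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp != NULL) {
        fclose(fp);
        toy_ERRNO[0] = '\0';
        return 0;
    }
    errno_saved = errno;                         /* capture it right away */
    fprintf(stderr, "cannot open %s\n", path);   /* may change errno */
    toy_update_ERRNO_saved(errno_saved);
    return -1;
}
```

On a POSIX system, opening a path in a nonexistent directory fails with `ENOENT', and `toy_ERRNO' then holds the matching message even though `fprintf' ran in between.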
`void register_deferred_variable(const char *name, NODE *(*load_func)(void))'
This function is called to register a function to be called when a
reference to an undefined variable with the given name is
encountered. The callback function will never be called if the
variable exists already, so, unless the calling code is running at
program startup, it should first check whether a variable of the
given name already exists. The argument function must return a
pointer to a `NODE' containing the newly created variable. This
function is used to implement the built-in `ENVIRON' and `PROCINFO'
variables, so you can refer to them for examples.
*Caution:* This function is new as of `gawk' 3.1.5.
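The deferred-variable pattern can be sketched with a hypothetical registry: a loader is registered under a name and runs only on the first reference, after which the variable exists and the loader is never called again. The `toy_' names and `load_answer' are invented for illustration; the real mechanism and its `ENVIRON' and `PROCINFO' users are in the gawk sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of lazily created variables (names invented). */
typedef int (*toy_loader)(void);

#define TOY_MAX_VARS 8

struct toy_var {
    const char *name;
    int value;
    int defined;         /* nonzero once the variable exists */
    toy_loader loader;   /* run on first reference if not yet defined */
};

static struct toy_var toy_vars[TOY_MAX_VARS];
static size_t toy_nvars;

/* Register a loader for `name'; returns 0 on success, -1 if full. */
int toy_register_deferred(const char *name, toy_loader load)
{
    assert(name != NULL && load != NULL);
    if (toy_nvars == TOY_MAX_VARS)
        return -1;
    toy_vars[toy_nvars].name = name;
    toy_vars[toy_nvars].defined = 0;
    toy_vars[toy_nvars].loader = load;
    toy_nvars++;
    return 0;
}

/* Look up `name', invoking its loader the first time it is referenced.
   Returns NULL for names that are neither defined nor deferred. */
struct toy_var *toy_reference(const char *name)
{
    size_t i;

    for (i = 0; i < toy_nvars; i++) {
        if (strcmp(toy_vars[i].name, name) != 0)
            continue;
        if (!toy_vars[i].defined) {
            toy_vars[i].value = toy_vars[i].loader();
            toy_vars[i].defined = 1;   /* loader never runs again */
        }
        return &toy_vars[i];
    }
    return NULL;
}

static int toy_load_count;

/* Example loader that records how often it runs. */
int load_answer(void)
{
    toy_load_count++;
    return 42;
}
```

Registering `load_answer' under "ANSWER" costs nothing until the first reference; repeated references reuse the value created on that first call.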
`void register_open_hook(void *(*open_func)(IOBUF *))'
This function is called to register a function to be called
whenever a new data file is opened, leading to the creation of an
`IOBUF' structure in `iop_alloc'. After creating the new `IOBUF',
`iop_alloc' calls each open hook in reverse order of registration
(so the most recently registered hook is called first) until one
returns non-NULL. That value is assigned to the `IOBUF''s `opaque'
field (which will presumably point to a structure containing
additional state associated with the input processing), and no
further open hooks are called.
The function called will most likely want to set the `IOBUF'
`get_record' method to indicate that future input records should
be retrieved by calling that method instead of using the standard
`gawk' input processing.
The function will probably also want to set the `IOBUF'
`close_func' method to be called when the file is closed, to clean
up any state associated with the input.
Finally, hook functions should be prepared to receive an `IOBUF'
structure where the `fd' field is set to `INVALID_HANDLE', meaning
that `gawk' was not able to open the file itself. In this case,
the hook function must be able to successfully open the file and
place a valid file descriptor there.
Currently, for example, the hook function facility is used to
implement the XML parser shared library extension. For more info,
please look in `awk.h' and in `io.c'.
*Caution:* This function is new as of `gawk' 3.1.5.
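The dispatch order described for open hooks (newest hook first, stopping at the first non-NULL result) is sketched below. `struct toy_iobuf' keeps only an `fd' and an `opaque' field, and all `toy_' names and the two sample hooks are hypothetical; the real `IOBUF' and `iop_alloc' are in `awk.h' and `io.c'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced IOBUF: only the fields needed here. */
struct toy_iobuf {
    int fd;
    void *opaque;
};

typedef void *(*toy_open_hook)(struct toy_iobuf *);

#define TOY_MAX_HOOKS 4
static toy_open_hook toy_hooks[TOY_MAX_HOOKS];
static int toy_nhooks;

/* Append a hook; returns 0 on success, -1 if the table is full. */
int toy_register_open_hook(toy_open_hook hook)
{
    assert(hook != NULL);
    if (toy_nhooks == TOY_MAX_HOOKS)
        return -1;
    toy_hooks[toy_nhooks++] = hook;
    return 0;
}

/* Call hooks newest-first; the first non-NULL result is stored in
   `opaque' and no further hooks are called. */
void toy_run_open_hooks(struct toy_iobuf *iop)
{
    int i;

    iop->opaque = NULL;
    for (i = toy_nhooks - 1; i >= 0; i--) {
        void *state = toy_hooks[i](iop);
        if (state != NULL) {
            iop->opaque = state;
            break;
        }
    }
}

static int toy_state_a, toy_state_b;   /* placeholder per-file state */
static int toy_calls_a, toy_calls_b;

/* Two sample hooks that both claim every file. */
void *hook_a(struct toy_iobuf *iop)
{
    (void) iop;
    toy_calls_a++;
    return &toy_state_a;
}

void *hook_b(struct toy_iobuf *iop)
{
    (void) iop;
    toy_calls_b++;
    return &toy_state_b;
}
```

If `hook_a' is registered before `hook_b', only `hook_b' runs, and its state ends up in `opaque'.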
An argument that is supposed to be an array needs to be handled with
some extra code, in case the array being passed in is actually from a
function parameter.
In versions of `gawk' up to and including 3.1.2, the following
boilerplate code shows how to do this:
NODE *the_arg;

the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */

/* if a parameter, get it off the stack */
if (the_arg->type == Node_param_list)
    the_arg = stack_ptr[the_arg->param_cnt];

/* parameter referenced an array, get it */
if (the_arg->type == Node_array_ref)
    the_arg = the_arg->orig_array;

/* check type */
if (the_arg->type != Node_var && the_arg->type != Node_var_array)
    fatal("newfunc: third argument is not an array");

/* force it to be an array, if necessary, clear it */
the_arg->type = Node_var_array;
assoc_clear(the_arg);
For versions 3.1.3 and later, the internals changed. In particular,
the interface was actually _simplified_ drastically. The following
boilerplate code now suffices:
NODE *the_arg;
the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */
/* force it to be an array: */
the_arg = get_array(the_arg);
/* if necessary, clear it: */
assoc_clear(the_arg);
As of version 3.1.4, the internals improved again, and became even
simpler:
NODE *the_arg;
the_arg = get_array_argument(tree, 2, FALSE); /* assume need 3rd arg, 0-based */
Again, you should spend time studying the `gawk' internals; don't
just blindly copy this code.