(gawk.info.gz) Finding The Bug

 14.2.2 Finding The Bug
 ----------------------
 
 Let's say that we are having a problem using (a faulty version of)
 `uniq.awk' in the "field-skipping" mode, and it doesn't seem to be
 detecting lines that should compare as identical when the first field
 is skipped, such as:
 
      awk is a wonderful program!
      gawk is a wonderful program!
 
    This could happen if we were thinking, as a C programmer might, that
 the fields in a record are numbered starting from zero, so instead of
 the lines:
 
      clast = join(alast, fcount+1, n)
      cline = join(aline, fcount+1, m)
 
 we wrote:
 
      clast = join(alast, fcount, n)
      cline = join(aline, fcount, m)
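
    The effect of the two variants can be reproduced outside of
 `uniq.awk'.  The following is only a sketch under stated assumptions:
 the shell function `compare' and the variable `first' are made up for
 this illustration, `join()' is a simplified stand-in for the library
 function of the same name, and `fcount' is assumed to be 1 (skip one
 field).

```shell
#!/bin/sh
# Sketch only: compare() and the -v variable "first" are hypothetical
# helpers; join() is a simplified stand-in for the library function.
compare() {
    # $1 = index of the first array element to keep
    printf '%s\n' 'awk is a wonderful program!' \
                  'gawk is a wonderful program!' |
    awk -v first="$1" '
    function join(array, start, end,    result, i) {
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result " " array[i]
        return result
    }
    {
        n = split($0, parts)          # elements are numbered from 1
        print join(parts, first, n)
    }'
}

fcount=1                              # assumed: skip one field
buggy=$(compare "$fcount")            # starts at field 1: nothing skipped
fixed=$(compare $((fcount + 1)))      # starts at field 2: first field skipped

printf 'buggy:\n%s\nfixed:\n%s\n' "$buggy" "$fixed"
```

 With the faulty start index the two joined lines still differ in their
 first word; with `fcount+1' they come out identical.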
 
    The first thing we usually want to do when trying to investigate a
 problem like this is to put a breakpoint in the program so that we can
 watch it at work and catch what it is doing wrong.  A reasonable spot
 for a breakpoint in `uniq.awk' is at the beginning of the function
 `are_equal()', which compares the current line with the previous one.
 To set the breakpoint, use the `b' (breakpoint) command:
 
      dgawk> b are_equal
      -| Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64
 
    The debugger tells us the file and line number where the breakpoint
 is.  Now type `r' or `run' and the program runs until it hits the
 breakpoint for the first time:
 
      dgawk> r
      -| Starting program:
      -| Stopping in Rule ...
      -| Breakpoint 1, are_equal(n, m, clast, cline, alast, aline)
               at `awklib/eg/prog/uniq.awk':64
      -| 64          if (fcount == 0 && charcount == 0)
      dgawk>
 
    Now we can look at what's going on inside our program.  First of all,
 let's see how we got to where we are.  At the prompt, we type `bt'
 (short for "backtrace"), and `dgawk' responds with a listing of the
 current stack frames:
 
      dgawk> bt
      -| #0  are_equal(n, m, clast, cline, alast, aline)
               at `awklib/eg/prog/uniq.awk':64
      -| #1  in main() at `awklib/eg/prog/uniq.awk':89
 
    This tells us that `are_equal()' was called by the main program at
 line 89 of `uniq.awk'.  (This is not a big surprise, since this is the
 only call to `are_equal()' in the program, but in more complex
 programs, knowing who called a function and with what parameters can be
 the key to finding the source of the problem.)
 
    Now that we're in `are_equal()', we can start looking at the values
 of some variables.  Let's say we type `p n' (`p' is short for "print").
 We would expect to see the value of `n', a parameter to `are_equal()'.
 Actually, `dgawk' gives us:
 
      dgawk> p n
      -| n = untyped variable
 
 In this case, `n' is an uninitialized local variable, since the
 function was called without arguments (see Function Calls).
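
    The same behavior can be observed in a plain `awk' program.  This
 is a minimal sketch; the function `describe()' is hypothetical and is
 not part of `uniq.awk'.  It is called with no arguments, so its
 parameter `n' acts as an uninitialized local variable.

```shell
#!/bin/sh
# Sketch: an awk function called with fewer arguments than it has
# parameters.  describe() is a hypothetical helper for illustration.
result=$(awk '
function describe(n) {
    # An uninitialized value compares equal to both "" and 0.
    if (n == "" && n == 0)
        return "uninitialized"
    return "has a value"
}
BEGIN { print describe() }
')
echo "$result"
```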
 
    A more useful variable to display might be the current record:
 
      dgawk> p $0
      -| $0 = string ("gawk is a wonderful program!")
 
 This might be a bit puzzling at first since this is the second line of
 our test input above.  Let's look at `NR':
 
      dgawk> p NR
      -| NR = number (2)
 
 So we can see that `are_equal()' was only called for the second record
 of the file.  Of course, this is because our program contained a rule
 for `NR == 1':
 
      NR == 1 {
          last = $0
          next
      }
 
    OK, let's just check that that rule worked correctly:
 
      dgawk> p last
      -| last = string ("awk is a wonderful program!")
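
    The interaction between the `NR == 1' rule and the rules after it
 can be sketched in isolation.  The three-line input and the message
 text below are invented for the illustration; only the shape of the
 first rule is taken from the program.

```shell
#!/bin/sh
# Sketch: the NR == 1 rule saves the first record and skips the
# remaining rules, so the comparison rule first runs at NR == 2.
# The input lines and the printed message are made up.
trace=$(printf '%s\n' one two three | awk '
NR == 1 {
    last = $0
    next
}
{
    print "NR=" NR ": comparing \"" $0 "\" with \"" last "\""
    last = $0
}
')
echo "$trace"
```

 The first comparison reported is for record 2, with `last' already
 holding record 1.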
 
    Everything we have done so far has verified that the program has
 worked as planned, up to and including the call to `are_equal()', so
 the problem must be inside this function.  To investigate further, we
 must begin "stepping through" the lines of `are_equal()'.  We start by
 typing `n' (for "next"):
 
      dgawk> n
      -| 67          if (fcount > 0) {
 
    This tells us that `gawk' is now ready to execute line 67, which
 decides whether to give the lines the special "field skipping" treatment
 indicated by the `-f' command-line option.  (Notice that we skipped
 from where we were before at line 64 to here, since the condition in
 line 64
 
      if (fcount == 0 && charcount == 0)
 
 was false.)
 
    Continuing to step, we now get to the splitting of the current and
 last records:
 
      dgawk> n
      -| 68              n = split(last, alast)
      dgawk> n
      -| 69              m = split($0, aline)
 
    At this point, we should be curious to see what our records were
 split into, so we try to look:
 
      dgawk> p n m alast aline
      -| n = number (5)
      -| m = number (5)
      -| alast = array, 5 elements
      -| aline = array, 5 elements
 
 (The `p' command can take more than one argument, similar to `awk''s
 `print' statement.)
 
    This is kind of disappointing, though.  All we found out is that
 there are five elements in each of our arrays.  Useful enough (we now
 know that none of the words were accidentally left out), but what if we
 want to see inside the array?
 
    The first choice would be to use subscripts:
 
      dgawk> p alast[0]
      -| "0" not in array `alast'
 
 Oops!
 
      dgawk> p alast[1]
      -| alast["1"] = string ("awk")
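
    The missing element 0 is not specific to the debugger: `split()'
 numbers the elements it creates starting at 1.  A small sketch, using
 the `in' operator so that the test itself does not create an element
 (the array name `parts' is chosen for the illustration):

```shell
#!/bin/sh
# Sketch: split() numbers the elements it creates starting at 1,
# so index 0 is never present in the resulting array.
indices=$(echo 'awk is a wonderful program!' | awk '
{
    n = split($0, parts)
    print "0 in parts: " ((0 in parts) ? "yes" : "no")
    print "1 in parts: " ((1 in parts) ? "yes" : "no")
    print "n = " n
}
')
echo "$indices"
```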
 
    This would be kind of slow for a 100-member array, though, so
 `dgawk' provides a shortcut (reminiscent of another language not to be
 mentioned):
 
      dgawk> p @alast
      -| alast["1"] = string ("awk")
      -| alast["2"] = string ("is")
      -| alast["3"] = string ("a")
      -| alast["4"] = string ("wonderful")
      -| alast["5"] = string ("program!")
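
    Outside the debugger, a comparable listing can be produced with a
 loop over the indices returned by `split()'.  This is only a sketch
 that loosely imitates the layout shown above; it is not how `dgawk'
 produces its output.

```shell
#!/bin/sh
# Sketch: print every element of an array created by split(), in
# index order, loosely imitating the listing printed by "p @alast".
dump=$(echo 'awk is a wonderful program!' | awk '
{
    n = split($0, alast)
    for (i = 1; i <= n; i++)
        printf "alast[\"%d\"] = string (\"%s\")\n", i, alast[i]
}
')
echo "$dump"
```

 The loop runs from 1 to `n' rather than using `for (i in alast)',
 because the order of a `for ... in' traversal is not defined.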
 
    It looks like we got this far OK.  Let's take another step or two:
 
      dgawk> n
      -| 70              clast = join(alast, fcount, n)
      dgawk> n
      -| 71              cline = join(aline, fcount, m)
 
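    Well, here we are at our error (sorry to spoil the suspense).  What
 we had in mind was to join the fields starting from the second one to
 make the virtual record to compare, and if the first field were numbered
 zero, this would work.  Since `split()' numbers the elements from one,
 however, starting the join at `fcount' skips nothing when one field is
 to be skipped.  Let's look at what we've got: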
 
      dgawk> p cline clast
      -| cline = string ("gawk is a wonderful program!")
      -| clast = string ("awk is a wonderful program!")
 
    Hey, those look pretty familiar!  They're just our original,
 unaltered, input records.  A little thinking (the human brain is still
 the best debugging tool), and we realize that we were off by one!
 
    We get out of `dgawk':
 
      dgawk> q
      -| The program is running. Exit anyway (y/n)? y
 
 Then we open the program in an editor and restore the correct lines:
 
      clast = join(alast, fcount+1, n)
      cline = join(aline, fcount+1, m)
 
 and problem solved!
 