(gawk.info.gz) Two-way I/O

 11.3 Two-Way Communications with Another Process
 ================================================
 
      From: brennan@whidbey.com (Mike Brennan)
      Newsgroups: comp.lang.awk
      Subject: Re: Learn the SECRET to Attract Women Easily
      Date: 4 Aug 1997 17:34:46 GMT
      Message-ID: <5s53rm$eca@news.whidbey.com>
 
      On 3 Aug 1997 13:17:43 GMT, Want More Dates???
      <tracy78@kilgrona.com> wrote:
      >Learn the SECRET to Attract Women Easily
      >
      >The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
 
      The scent of awk programmers is a lot more attractive to women than
      the scent of perl programmers.
      --
      Mike Brennan
 
    It is often useful to be able to send data to a separate program for
 processing and then read the result.  This can always be done with
 temporary files:
 
      # Write the data for processing
      tempfile = ("mydata." PROCINFO["pid"])
      while (NOT DONE WITH DATA)
          print DATA | ("subprogram > " tempfile)
      close("subprogram > " tempfile)
 
      # Read the results, remove tempfile when done
      while ((getline newdata < tempfile) > 0)
          PROCESS newdata APPROPRIATELY
      close(tempfile)
      system("rm " tempfile)
 
 This works, but not elegantly.  Among other things, it requires that
 the program be run in a directory that cannot be shared among users;
 for example, `/tmp' will not do, as another user might happen to be
 using a temporary file with the same name.
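
    As a concrete but hedged illustration of the temporary-file approach,
 the sketch below is one runnable instance of the outline above.  It
 assumes that the `mktemp' utility is installed and uses it to obtain a
 uniquely named file, which is one possible way around the name
 collision problem; `sort' stands in for the subprogram, and the names
 `cmd', `fruit', and `line' are purely illustrative:

```sh
out=$(gawk '
BEGIN {
    # Assumption: mktemp(1) exists and prints the name of a new, unique file
    if (("mktemp" | getline tempfile) <= 0) {
        print "mktemp failed" > "/dev/stderr"
        exit 1
    }
    close("mktemp")

    # Write the data for processing
    cmd = "LC_ALL=C sort > " tempfile
    n = split("pear apple fig", fruit, " ")
    for (i = 1; i <= n; i++)
        print fruit[i] | cmd
    close(cmd)              # wait until sort has written the file

    # Read the results, remove tempfile when done
    while ((getline line < tempfile) > 0)
        print "sorted:", line
    close(tempfile)
    system("rm -f " tempfile)
}')
printf '%s\n' "$out"
```

 Even in this form, the program still needs two passes over the data
 and some cleanup code.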
 
    However, with `gawk', it is possible to open a _two-way_ pipe to
 another process.  The second process is termed a "coprocess", since it
 runs in parallel with `gawk'.  The two-way connection is created using
 the `|&' operator (borrowed from the Korn shell, `ksh'):(1)
 
      do {
          print DATA |& "subprogram"
          "subprogram" |& getline results
      } while (DATA LEFT TO PROCESS)
      close("subprogram")
 
    The first time an I/O operation is executed using the `|&' operator,
 `gawk' creates a two-way pipeline to a child process that runs the
 other program.  Output created with `print' or `printf' is written to
 the program's standard input, and output from the program's standard
 output can be read by the `gawk' program using `getline'.  As is the
 case with processes started by `|', the subprogram can be any program,
 or pipeline of programs, that can be started by the shell.
 
    There are some cautionary items to be aware of:
 
    * As the code inside `gawk' currently stands, the coprocess's
      standard error goes to the same place as the standard error of
      the parent `gawk'.  It is not possible to read the child's
      standard error separately.
 
    * I/O buffering may be a problem.  `gawk' automatically flushes all
      output down the pipe to the coprocess.  However, if the coprocess
      does not flush its output, `gawk' may hang when doing a `getline'
      in order to read the coprocess's results.  This could lead to a
      situation known as "deadlock", where each process is waiting for
      the other one to do something.
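
    The buffering caution can be made concrete with a small sketch.  In
 the shell command below, the coprocess is a second `gawk' that calls
 `fflush()' after every line it prints, so each reply reaches the parent
 before the parent's next `getline'.  The variable names `cmd', `words',
 and `reply' are illustrative only; the coprocess command is passed in
 with `-v' merely to keep the shell quoting simple:

```sh
out=$(gawk -v cmd="gawk '{ print toupper(\$0); fflush() }'" '
BEGIN {
    n = split("alpha beta gamma", words, " ")
    for (i = 1; i <= n; i++) {
        print words[i] |& cmd               # send one request line
        if ((cmd |& getline reply) > 0)     # read the matching reply
            print words[i], "->", reply
    }
    close(cmd)
}')
printf '%s\n' "$out"
```

 Without the call to `fflush()' in the coprocess, its output to the pipe
 would most likely be block buffered, and the first `getline' in the
 parent could wait forever.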
 
    It is possible to close just one end of the two-way pipe to a
 coprocess, by supplying a second argument to the `close()' function of
 either `"to"' or `"from"' (see Close Files And Pipes).  These
 strings tell `gawk' to close the end of the pipe that sends data to the
 coprocess or the end that reads from it, respectively.
 
    This is particularly necessary in order to use the system `sort'
 utility as part of a coprocess; `sort' must read _all_ of its input
 data before it can produce any output.  The `sort' program does not
 receive an end-of-file indication until `gawk' closes the write end of
 the pipe.
 
    When you have finished writing data to the `sort' utility, you can
 close the `"to"' end of the pipe, and then start reading sorted data
 via `getline'.  For example:
 
      BEGIN {
          command = "LC_ALL=C sort"
          n = split("abcdefghijklmnopqrstuvwxyz", a, "")
 
          for (i = n; i > 0; i--)
              print a[i] |& command
          close(command, "to")
 
          while ((command |& getline line) > 0)
              print "got", line
          close(command)
      }
 
    This program writes the letters of the alphabet in reverse order, one
 per line, down the two-way pipe to `sort'.  It then closes the write
 end of the pipe, so that `sort' receives an end-of-file indication.
 This causes `sort' to sort the data and write the sorted data back to
 the `gawk' program.  Once all of the data has been read, `gawk'
 terminates the coprocess and exits.
 
    As a side note, the assignment `LC_ALL=C' in the `sort' command
 ensures traditional Unix (ASCII) sorting from `sort'.
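
    Because the coprocess may be a pipeline of programs, the same
 pattern extends to more than one filter.  The following sketch, which
 assumes the standard `sort' and `uniq' utilities, counts repeated
 items; the names `cmd', `items', and `f' are illustrative only:

```sh
out=$(gawk '
BEGIN {
    cmd = "LC_ALL=C sort | uniq -c"
    n = split("b a b c a b", items, " ")

    for (i = 1; i <= n; i++)
        print items[i] |& cmd
    close(cmd, "to")            # the pipeline now sees end-of-file

    while ((cmd |& getline line) > 0) {
        split(line, f, " ")     # f[1] is the count, f[2] is the item
        print f[2], f[1]
    }
    close(cmd)
}')
printf '%s\n' "$out"
```

 As in the previous example, nothing can be read back until the `"to"'
 end has been closed, since `sort' is the first stage of the pipeline.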
 
    You may also use pseudo-ttys (ptys) for two-way communication
 instead of pipes, if your system supports them.  This is done on a
 per-command basis, by setting a special element in the `PROCINFO' array
 (see Auto-set), like so:
 
      command = "sort -nr"           # command, save in convenience variable
      PROCINFO[command, "pty"] = 1   # update PROCINFO
      print ... |& command           # start two-way pipe
      ...
 
 Using ptys usually avoids the buffer deadlock issues described earlier,
 at some loss in performance.  If your system does not have ptys, or if
 all the system's ptys are in use, `gawk' automatically falls back to
 using regular pipes.
 
    ---------- Footnotes ----------
 
    (1) This is very different from the same operator in the C shell.
 