(gawkinet.info.gz) Web page

 
 2.7 Reading a Web Page
 ======================
 
 Retrieving a web page from a web server is as simple as retrieving
 email from an email server. We only have to use a similar, but not
 identical, protocol and a different port. The name of the protocol is
 HyperText Transfer Protocol (HTTP) and the port number is usually 80.
 As in the preceding node, ask your administrator about the name of your
 local web server or proxy web server and its port number for HTTP
 requests.
 
    The following program takes a rather crude approach to retrieving
 a web page. It uses the prehistoric syntax of HTTP 0.9, which almost
 all web servers still support. Note that the program does not contact
 the web server directly. It sends the request to the local proxy
 server, whose name you substitute for `PROXY' in the special file
 name, and the proxy in turn contacts `www.yahoo.com':
 
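      BEGIN {
        # HTTP lines end with CR LF, for reading and writing alike
        RS = ORS = "\r\n"
        # Replace PROXY with the name of your local proxy server
        HttpService = "/inet/tcp/0/PROXY/80"
        # HTTP 0.9 request: the method and the address, nothing else
        print "GET http://www.yahoo.com"     |& HttpService
        # Print every line of the response until the server closes
        while ((HttpService |& getline) > 0)
           print $0
        close(HttpService)
      }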
 
    Again, lines are separated by a redefined `RS' and `ORS'.  The `GET'
 request that we send to the server is the only kind of HTTP request
 that existed when the web was created in the early 1990s.  HTTP calls
 this `GET' request a "method," which tells the service to transmit a
 web page (here the home page of the Yahoo! search engine). Version 1.0
 added the request methods `HEAD' and `POST'. Version 1.1(1) added
 further request methods, among them `OPTIONS', `PUT', `DELETE', and
 `TRACE'.  You can fill in any valid web address, and the program prints
 the HTML code of that page to your screen.
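 
    The newer request syntax can be sketched in the same style. The
 following variation, offered only as an illustration, sends a `HEAD'
 request in HTTP 1.0 form: the request line names the protocol version,
 optional header lines follow, and an empty line ends the request. It
 assumes a direct connection to a placeholder host, `www.example.com',
 instead of a proxy; substitute a server you are allowed to query:
 
      BEGIN {
        RS = ORS = "\r\n"
        # www.example.com is a placeholder host (an assumption)
        HttpService = "/inet/tcp/0/www.example.com/80"
        print "HEAD / HTTP/1.0"          |& HttpService
        print "Host: www.example.com"    |& HttpService
        # An empty line marks the end of the request
        print ""                         |& HttpService
        while ((HttpService |& getline) > 0)
           print $0
        close(HttpService)
      }
 
    Because `HEAD' asks only for the header, the program should print
 a status line and header lines, but no HTML.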
 
    Notice the similarity between the responses of the POP and HTTP
 services. First, you get a header that is terminated by an empty line,
 and then you get the body of the page in HTML.  The lines of the
 headers also have the same form as in POP. There is the name of a
 parameter, then a colon, and finally the value of that parameter.
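 
    The "name, colon, value" form makes header lines easy to take apart.
 The following is a minimal sketch, not a complete HTTP client. It
 assumes an HTTP 1.0 request to a placeholder host, `www.example.com',
 for which the response starts with a status line followed by the
 header lines. The variable names `Status' and `Header' are chosen here
 for illustration only:
 
      BEGIN {
        RS = ORS = "\r\n"
        # www.example.com is a placeholder host (an assumption)
        HttpService = "/inet/tcp/0/www.example.com/80"
        print "GET / HTTP/1.0"           |& HttpService
        print "Host: www.example.com"    |& HttpService
        print ""                         |& HttpService
        # The first line of the response is the status line
        if ((HttpService |& getline) > 0)
          Status = $0
        # Header lines follow, up to the first empty line
        while ((HttpService |& getline) > 0 && $0 != "") {
          colon = index($0, ":")
          if (colon > 0) {
            name  = tolower(substr($0, 1, colon - 1))
            value = substr($0, colon + 1)
            sub(/^[ \t]+/, "", value)    # drop blanks after the colon
            Header[name] = value
          }
        }
        close(HttpService)               # the body is not read here
        print Status
        for (name in Header)
          print name " = " Header[name]
      }
 
    Header names are folded to lowercase because HTTP treats them as
 case-insensitive. A header that occurs more than once overwrites its
 earlier value in this sketch.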
 
    Images (`.png' or `.gif' files) can also be retrieved this way, but
 then you get binary data that should be redirected into a file. Another
 application is calling a CGI (Common Gateway Interface) script on some
 server. CGI scripts are used when the contents of a web page are not
 constant, but are generated on demand each time the page is requested.
 For example, to get a detailed report about the current
 quotes of Motorola stock shares, call a CGI script at Yahoo! with the
 following:
 
      get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
      print get |& HttpService
 
    You can also request weather reports this way.
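 
    For the image case mentioned earlier, the body has to reach a file
 unchanged. The following sketch shows one way this might be done. The
 host `www.example.com', the path `/logo.png', and the output file name
 are all placeholders. It skips the header and then copies each record
 together with `RT', the text that actually matched `RS', so that no
 extra line terminator is appended after the last record:
 
      BEGIN {
        RS = "\r\n"
        ORS = "\r\n"
        # Host, path, and file name are placeholders (assumptions)
        HttpService = "/inet/tcp/0/www.example.com/80"
        OutFile = "logo.png"
        print "GET /logo.png HTTP/1.0"   |& HttpService
        print "Host: www.example.com"    |& HttpService
        print ""                         |& HttpService
        # Skip the status line and header, up to the first empty line
        while ((HttpService |& getline) > 0 && $0 != "")
          ;
        # Copy the body; RT is empty if the last record has no CR LF
        while ((HttpService |& getline) > 0)
          printf "%s%s", $0, RT > OutFile
        close(HttpService)
        close(OutFile)
      }
 
    In a multibyte locale, running `gawk' with the `-b' option, which
 treats characters as bytes, may be needed to keep binary data intact.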
 
    ---------- Footnotes ----------
 
    (1) Version 1.0 of HTTP was defined in RFC 1945.  HTTP 1.1 was
 initially specified in RFC 2068. In June 1999, RFC 2068 was made
 obsolete by RFC 2616, an update without any substantial changes.
 