NAME
    LaBrea::Tarpit::Get

SYNOPSIS
      use LaBrea::Tarpit::Get;

      ($rv,$host,$port,$path)=parse_http_URL($url);
      ($handle,$host,$port,$path)=open_http(*S,$url);
      $rv=parse_http_response(\$buffer,\%response);
      $rv=short_response($url,\%response,\%content,$timeout);
      $line = make_line($url,$err,\%content);
      $rv = not_hour($file);
      $rv = not_day($file);
      $rv=auto_update($url,$file,$cur_ver,$timeout);

DESCRIPTION - LaBrea::Tarpit::Get
    This module connects to a web site running
    LaBrea::Tarpit::Report::html_report.plx and retrieves a short_report as
    described in LaBrea::Tarpit::Report.

    Run "examples/web_scan.pl" from a cron job hourly or daily to update the
    statistics from all known sites running LaBrea::Tarpit. A report can then
    be generated showing the activity worldwide.

     # MIN HOUR DAY MONTH DAYOFWEEK   COMMAND
     30 * * * * ./web_scan.pl ./other_sites.txt ./tmp/site_stats

    See: LaBrea::Tarpit::Report::other_sites

    ($handle,$host,$port,$path)= parse_http_URL($url);
      Separate an http URL into its components

        input:        URL of the form
              http://www.foo.com[:8080]/file.html

        https:// service is not supported

        returns: (undef, error message)
                      or
                 (file_handle,hostname,port,path)
              where port and path may be empty
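
      The splitting described above can be sketched as follows. This is an
      illustration only, not the module's own code: the name
      "sketch_split_http_url" and the error strings are hypothetical, and
      the sketch returns just (host, port, path) on success or
      (undef, message) on failure.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not the module's implementation.
# Splits an http URL into (host, port, path); port and path may be empty.
sub sketch_split_http_url {
    my ($url) = @_;
    # https:// is rejected, as stated in the description above
    return (undef, 'https:// service is not supported')
        if $url =~ m!^https://!i;
    # host runs up to ':' or '/', followed by optional :port and /path
    if ($url =~ m!^http://([^/:]+)(?::(\d+))?(/.*)?$!i) {
        my ($host, $port, $path) = ($1, $2, $3);
        return ($host,
                defined $port ? $port : '',
                defined $path ? $path : '');
    }
    return (undef, 'not an http URL');
}

my @parts = sketch_split_http_url('http://www.foo.com:8080/file.html');
print join(' | ', @parts), "\n";   # prints: www.foo.com | 8080 | /file.html
```

      An empty port is left empty here; per the open_http description, the
      default port of 80 is applied when the connection is opened.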

    ($handle,$host,$port,$path)=open_http(*S,$url);
      Open connection to http target

        input:        *S,$url [default port = 80]
        returns:      (undef, error)  on error

                      (file_handle,
                       hostname,
                       port,
                       path )         on success

    $rv=parse_http_response(\$buffer,\%response);
      Parse an http server response into a hash of headers.

        i.e.  (representative, will vary)

        rc             => 200
        msg            => OK
        date           => Wed, 24 Apr 2002 21:46:30 GMT
        server         => Apache/1.3.22
        protocol       => HTTP/1.1
        content-type   => text/plain
        content-length => 92
        last-modified  => Wed, 24 Apr 2002 21:46:34 GMT
        expires        => Wed, 24 Apr 2002 21:47:04 GMT
        connection     => close
        content        => (complete text buffer)

        input:        \$text_in, \%response
        returns:      true on success, %response filled
                      false on failure

        NOTE:         $response{rc}   (server response code)
                      $response{msg}  (server message)
                      are ALWAYS filled with something.
                      In the case of server failure, the
                      cause of the failure will be inserted
                      into $response{msg} and undef returned.
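
      A minimal sketch of this kind of parsing is shown below. It is not
      the module's implementation: the name "sketch_parse_response" and the
      error strings are hypothetical, and it assumes that only a 200
      response counts as success and that "content" holds the body that
      follows the headers.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's implementation.
# Fills %$response from the raw text in $$buffer; returns true on success.
sub sketch_parse_response {
    my ($buffer, $response) = @_;
    %$response = (rc => 0, msg => 'no response');   # rc and msg always filled
    # split the headers from the body at the first blank line
    my ($head, $body) = split /\r?\n\r?\n/, $$buffer, 2;
    return unless defined $head;
    my ($status, @headers) = split /\r?\n/, $head;
    unless (defined $status
            && $status =~ m!^(HTTP/\d+\.\d+)\s+(\d{3})\s*(.*)$!) {
        $response->{msg} = 'malformed status line';
        return;
    }
    @{$response}{qw(protocol rc msg)} = ($1, $2, $3);
    for my $line (@headers) {
        # header names are stored lower-cased, as in the listing above
        if ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $response->{lc $1} = $2;
        }
    }
    $response->{content} = defined $body ? $body : '';
    return $response->{rc} == 200;   # assumption: only 200 is success
}

my $raw = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
        . "Content-Length: 5\r\n\r\nhello";
my %response;
my $rv = sketch_parse_response(\$raw, \%response);
print "$response{rc} $response{msg} $response{'content-type'}\n";
# prints: 200 OK text/plain
```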

    $rv=short_response($url,\%response,\%content,$timeout);
      Fetch the short report from "$url" and place the headers in
      "%response", the content, parsed, in "%content". Optional "$timeout",
      default is 60 seconds.

      %response contains http headers

      %content contains key => value pairs

        LaBrea        => version
        Tarpit        => version
        Report        => version
        Util          => version
        now           => seconds since epoch (local)
        tz            => time zone (e.g. -0700)
        threads       => number of threads
        total_IPs     => total IPs
        bw            => bandwidth

        input:        URL,    # complete url
                 e.g. www.foo.com/html_report.plx
                      \%response,
                      \%content,
                      $timeout  # optional, default 60 seconds

        returns:      false on success
                      error message on failure

    $line = make_line($url,$err,\%content);
      Make a line of text summarizing the short report, where "$err" is the
      return value from "short_response".

        Format:

        url threads total_IPs bw time tz version:nn:nn:nn
          or
        url error message

    $rv = not_hour($file);
      Check if the file has been accessed this hour.

        input:        path/to/file
        returns:      true if not accessed this hour
                      false if accessed this hour
                      or non-existent or not readable

    $rv = not_day($file);
      Check if the file has been accessed this day.

        input:        path/to/file
        returns:      true, not accessed this day
                      false if accessed this day
                      or non-existent or not readable
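
      The hour and day checks above can be sketched as follows. This is an
      illustration only: the names "sketch_not_period", "sketch_not_hour"
      and "sketch_not_day" are hypothetical, and the sketch assumes that
      "accessed" refers to the file's access time (atime) compared in local
      time.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sketch; assumes "accessed" means the file's atime.
# $fields is how many leading (year, day-of-year, hour) fields must match:
# 3 compares down to the hour, 2 compares down to the day.
sub sketch_not_period {
    my ($file, $fields) = @_;
    return 0 unless -e $file && -r _;            # missing or unreadable: false
    my $atime = (stat _)[8];
    my @then  = (localtime($atime))[5, 7, 2];    # year, day of year, hour
    my @now   = (localtime(time()))[5, 7, 2];
    for my $i (0 .. $fields - 1) {
        return 1 if $then[$i] != $now[$i];       # different period: true
    }
    return 0;                                    # same period: false
}

sub sketch_not_hour { return sketch_not_period($_[0], 3) }
sub sketch_not_day  { return sketch_not_period($_[0], 2) }

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
utime time() - 2 * 86400, time(), $file;   # pretend last access was 2 days ago
print sketch_not_hour($file), sketch_not_day($file), "\n";   # prints: 11
```

      A false return covers both "already accessed in this period" and
      "file missing or unreadable", so a caller that needs to tell the two
      apart has to test for the file separately.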

    $rv=auto_update($url,$file,$cur_ver,$timeout);
      Update the 'other_sites.txt' file from $url on a daily basis only.

        input:  url,          # complete url to 'other_sites.txt'
                              # http://scans.bizsystems.net/other_sites.txt

                file,         # path to your 'other_sites.txt'

                cur_ver,      # optional current version
                              # the current file will be opened and scanned
                              # if this is not supplied

                timeout       # wait for http response
                              # default 60 seconds
        returns:      false on success or no update needed
                      error msg on failure

EXPORT_OK
            parse_http_URL
            open_http
            parse_http_response
            short_response
            make_line
            not_hour
            not_day
            auto_update

COPYRIGHT
    Copyright 2002 - 2004, Michael Robinton & BizSystems. This program is
    free software; you can redistribute it and/or modify it under the terms
    of the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

AUTHOR
    Michael Robinton, michael@bizsystems.com

SEE ALSO
    perl(1), LaBrea::Tarpit(3), LaBrea::Codes(3), LaBrea::Tarpit::Report(3),
    LaBrea::Tarpit::Util(3), LaBrea::Tarpit::DShield(3)

