Web Client Programming Using Perl
Many users of Linux are initially attracted to the platform for its powerful web server capabilities. Developing a web site on the Linux platform is a very satisfying experience. After the fun of designing and developing the site is over, it moves to production. Once in production, a web site typically has uptime requirements; to ensure these are met, the site must be monitored. I use a monitoring system that requests pages from a number of web sites, several times a day. If any server does not serve a web page, it is retried. If again there is no response, I am alerted via pager. Using this system, I can respond to an outage even before a user can report a problem. It takes only a little Perl and Linux to make this happen.
Linux and Perl were the obvious choices for a web server monitoring and alerting facility. Linux provides the stability and flexibility in the platform, and Perl, using the LWP bundle of modules (also known as libwww), provides an excellent means to work with HTTP requests.
I will explain how to compile/install the components needed, how to create a simple Perl script that will “HTTP ping” a server (HTTPping.pl), how to send a page to a pager using Perl, and how to glue it all together into an industrial-strength web site monitoring solution (Monitor.pl).
The wide availability of modules that extend Perl's functionality is one of the language's strong suits. In the case of fetching web pages, there are several modules that could be used. LWP is the clear choice, because it is a fantastic set of modules giving Perl powerful control over HTTP and HTML. Before you can begin exploring LWP and how you can use it to monitor a web site, you must install it.
The LWP modules depend on several other modules. These may not be installed on your system. If not, they must be downloaded and installed. Table 1 contains the exact version I installed, in the order I installed them. Except for SSLeay (more on that later), you can download all these packages from CPAN (Comprehensive Perl Archive Network). If you are new to Perl, it would be a good idea to browse around CPAN for a bit. It contains an enormous number of support modules for Perl that can save a lot of programming time and costs.
Once you have downloaded the tar.gz files, they must be expanded using the tar command:
tar -zxvpf MIME-Base64-2.11.tar.gz
This will uncompress and unpack the package. Do this for all archives. When finished, you will have a directory full of the original archives and a new folder for each archive. Each folder contains the module's installer; run the installer for each package. Again, using the MIME::Base64 module as the example, change (using cd) to the HTML-Parser-2.22 directory and type the following commands:
perl Makefile.PL make make test make installThis is the typical way of installing Perl modules. However, packages vary, and you should always read the README or INSTALL file and other documentation that comes with any module.
If you plan to HTTP ping servers that use SSL (secure sockets layer), you must compile and install SSLeay and Crypt-SSLeay. SSLeay is a set of programs that provides the cryptographic routines needed for SSL. Crypt-SSLeay is the Perl module that serves as a wrapper for SSLeay. I suggest you read the documentation that comes with SSLeay. It will help especially if you run into any problems when compiling.
Execute the following commands in the directory to which you uncompressed the SSLeay-0.6.6b.tar.gz archive to install SSLeay:
./Configure linux-elf make depend make make rehash make test make install
Three important things about SSLeay are:
Don't try to use the newer versions of SSLeay. They will not work with Crypt-SSleay and the LWP bundle. The last stable version is SSLeay-0.6.6b.tar.gz.
Second, SSLeay leaks memory. If you choose to use SSLeay and LWP to repeatedly HTTP ping a site that uses SSL, you will probably experience some problems with memory leaks. By calling HTTPping.pl from Monitor.pl, the script executes, pinging the site once, then terminates; this reduces the potential for a memory leak to accumulate, thereby causing your application to crash.
Since SSLeay uses RSA, you should read the SSLeay FAQ (see references) to determine if you can legally use SSLeay and if you need a license from RSA.
HTTPPing.pl sends a HEAD request to the web server to be monitored. The server responds by sending only the HTTP headers. If you are testing only if the server is up, this is all you need to know. It is a good idea to understand the basic HTTP methods, GET, POST and HEAD, when beginning web-client programming. To see how the HEAD command works, we can emulate a browser using telnet. I have a web server running on the machine that I use for development. If you do not have a server running on your local machine, replace localhost with the name of a production web server.
$ telnet www.cpan.org 80Trying 127.0.0.1... Connected to localhost. Escape character is '^]'.
Type the following, then press “enter” twice:
HEAD / HTTP/1.0This is the output returned by my server.
HTTP/1.1 200 OK Date: Fri, 21 May 1999 03:18:26 GMT Server: Apache/1.3.6 (Unix) (Red Hat/Linux) Last Modified: Wed, 0 1999 21:17:54 GMT ETag: "41803-799-370bcb82" Accept-Ranges: bytes Content-Length: 1945 Connection: close Content-Type: text/htmlThis block of text contains the HTTP headers for the requested file. It can be automated using Perl and LWP as shown in Listing 1.
The first line of the script determines which interpreter to use to run the script. If Perl is in a different location on your system, you will need to change your script accordingly. HTTPPing.pl accepts two arguments: the URL to HTTP ping and a debug flag. These arguments are placed into variables. LWP::UserAgent and an HTTP::Request objects are created. The Request is passed to the UserAgent, and a response is returned. The response is examined to see if the server is up. If the site is up, the script exits with a status of 1; if not, it exits with a status of 0. Optionally, determined by the debug parameter, the status of the site is verbosely stated as “up” or “down”.
To ping a site, enter the command:
$ perl HTTPPing.pl http://localhost/ 1 http://localhost is up.
If you have a site that uses SSL and a user name and password, you could use:
$ perl HTTPPing.pl https://username:password@localhost/ 1 https://username:password@localhost/ is up.
Now that you can HTTP ping your site, let's see how to send the alert page. Once that is accomplished, you can tie HTTPPing.pl and the SendPage subroutine together by calling them both from a Perl script running as a service.
Several options are available for sending a page from a Perl script: the Simple Network Paging Protocol (SNPP), TAP (telocator alphanumeric protocol) and proprietary paging interfaces. Many pager services allow SNPP access. Also, a Perl module, Net::SNPP, can make sending a page very simple.
My paging service provides a web-based paging interface. It is an HTML form which collects the recipient's pager PIN and the message, and puts (POST) that information in a CGI script at my paging service provider. Since this is a web-based interface, LWP is a natural for the job. The form requires three values to be posted: the command type, the PIN and the message. The command type is constant for sending pages to my type of pager; the other two values are passed in as arguments.
Listing 2. SendPage Subroutine using Skytel Web Site
Note that the request object allows you to create a POST request as well as the already familiar HEAD request. This subroutine is the one I use in my Monitor.pl script. Most paging service providers have web-based paging services (see Resources).
Alternately, if your paging service provider gives you access to an SNPP server, consider using that protocol. Part of the libnet bundle is the Net::SNPP module. It is a great tool for sending pages using the SNPP (RFC 1861). It also has the appeal of being open-standards-based. However, some SNPP servers are so slow that many times, the above method will be superior. Using the Net::SNPP module to send a page is shown in Listing 3.
Integrate HTTPPing.pl and whichever SendPage routine you chose. Monitor.pl repeatedly calls the HTTPPing.pl script at regular intervals. If HTTPPing.pl detects that the site is down, it retries the site to verify. If the site is verified as down, a page is sent. After sending the page, Monitor.pl continues to send HTTP pings to the site. When the site comes back up, another page is sent.
Below is an example of starting Monitor.pl, giving the URL to ping, the pager pin and the delays as arguments. The last argument is an optional debug flag that will show you what is going on inside the script.
perl Monitor.pl http://localhost 1234567 120 120 1
Once Monitor.pl is started, it will patiently test your site at an interval you choose for as long as you like. It is a good idea to start such a script during system boot.
Copy HTTPPing.pl and Monitor.pl to the /bin directory, and change their permissions to make them executable. This permits them to be run by the system at system boot.
chmod +x Monitor.pl chmod +x HTTPping.pl
You can start the Monitor.pl script when your system boots by taking advantage of a special script that is run every time your system boots. The /etc/rc.d/rc.local script is run by the system after all services have been started. By adding the following line to the end of your rc.local file, the Monitor.pl will be started at boot time.
Monitor.pl http://localhost 1234567 120 120 0 &Then use a tool such as tksysv to add it to the run levels you want it to start in. On my system, I boot to run level 3, so I added it to the list of the programs to start in run level 3. This makes the monitoring station even more robust by enabling the monitoring scripts to start up at boot time. See Linux Journal issue 55 for a good introduction to init and services.
The solid server performance of Linux and the rich and easy-to-use features of the LWP bundle and Perl would make it very easy to extend HTTPPing.pl and Monitor.pl to perform more complex web server checks. You could handle cookies and perform simulated user input into forms periodically to test applications. The possibilities of HTTP client programming are limitless with the LWP modules.
Robb Hill is the Senior Webmaster at the American Red Cross National Headquarters, where he is responsible for the technical aspects of the public and Intranet web servers. If you have questions, he can be reached at vortextube@earthlink.net.