FastCGI: Persistent Applications for Your Web Server
The Common Gateway Interface is nearly as old as the web server itself. NCSA added the CGI 1.0 specification to version 1.0 of the granddaddy of all web servers, httpd. CGI 1.1, the current specification, was added to the 1.2 release. Every popular web server package developed since then has incorporated CGI as a way—usually the way—for web-borne visitors to access server-based executables.
CGI works well with small or infrequently used programs whose sole function is to respond to one-time requests, such as processing simple information from HTML forms. There's no sense in clogging up your memory or process table with small applications invoked only a few times an hour.
The opposite is true, however, for complex or frequently used programs. Your web server can slow to a crawl if your site depends on a script with a long initialization process; in particular, one that involves connecting to a database or reading and structuring information from large text files. Speed issues are even more critical for small sites with servers that also process mail, FTP or DNS requests.
CGI applications must be launched anew with each invocation, a limitation that leads to two problems. First, the hardware and operating system have to deal with the overhead of creating a process for every CGI request. Second, CGI scripts can't keep persistent variables or data structures; these must be rebuilt with each invocation.
Open Market's FastCGI interface is one way to overcome CGI's limitations. FastCGI applications are invoked via URLs just like their CGI counterparts. The difference is that they're persistent; they function like servers within a server. FastCGI offers benefits in three areas:
Speed: a FastCGI script goes through only one process-creation cycle. Data initialization and database connections are done only once. A further benefit of FastCGI is that it can connect to processes running on remote machines, taking the processing burden off the main server.
Persistence: even if you don't have access to an SQL server, FastCGI enables many database-like functions by storing your complex methods, objects and variables in RAM. Data can be stored across sessions, providing a workaround for the statelessness of HTTP connections.
Process management: Apache's implementation of FastCGI gives the server daemon the ability to take care of FastCGI applications, automatically restarting them should they die off. Other servers may share this ability; my experience is limited to Apache.
FastCGI is not the first—nor, I'm sure, will it be the last—approach to move beyond CGI. Most web servers have APIs that allow developers to write new functions right into the server. Doug MacEachern's mod_perl module allows the Perl runtime library to be compiled directly into Apache, giving hackers the ability to write server modules entirely in Perl.
I prefer FastCGI to its alternatives for four reasons:
It's not language-specific. mod_perl and proprietary APIs all dictate that the developer use a certain programming language. FastCGI applications can currently be written in Perl, C/C++, Java or Python, and the standard is flexible enough that other languages could be added in the future.
It's not server-specific. Actually, the implementations of FastCGI are server-specific, but the standard is not tied to one software package. FastCGI is currently supported on Apache, Roxen, Stronghold, and Zeus; a commercial variation is available for Netscape and Microsoft servers from Fast Engines, http://www.fastserv.com/.
FastCGI applications don't run in the server's name space. If a FastCGI application dies, it doesn't take the server down with it. Also, since FastCGI scripts run as separate processes, they don't increase the size of the server executable.
It's scalable. FastCGI scripts can be configured to run remotely via a TCP/IP connection, providing a method for load sharing.
My server platform is a fairly standard Linux box: a 120MHz Pentium with 64MB RAM running the 2.0.27 kernel. If you already get decent performance from your hardware, you won't have any trouble with FastCGI.
As to software, I use Apache and Perl; the material below is unabashedly biased in their favor. If you want to do your coding in C/C++, Tcl, Java or Python, or if you want to use different server software, I suggest you visit http://fastcgi.idle.com/ for further information. On the other hand, most of the coding hints I'll be providing are applicable to any language.
To use Perl and Apache, you'll need to do some recompiling. Apache needs to be rebuilt with the FastCGI module. You'll also have to compile a Perl module. A few months ago you would have needed to rebuild Perl (I'll provide instructions in case you need or want to do so), but now you can probably get by on your current Perl build. Even if you're not an accomplished C programmer, however, the compilation process is fairly painless. Here's what you'll need:
A C compiler of recent vintage: I've used gcc 2.7.2.1 without any problems.
The Apache source code: I use Apache 1.3.0 (1.3.1 is the current revision). Apache comes with most Linux distributions, or you can download it from http://www.apache.org/.
The Perl 5.004 or 5.005 source code: I strongly recommend that you upgrade if you're using an older version. If nothing else, it's a good opportunity to take a peek at the new and improved Perl home page at http://www.perl.com/.
The mod_fastcgi source code: The current version (as of September 3, 1998) is 2.0.17. It's compatible with Apache 1.3.1 and is available at http://fastcgi.idle.com/fastcgi.tar.gz.
Documentation and sample scripts: These are available with the FastCGI Developer's Kit, links to which are provided at http://fastcgi.idle.com/.
Sven Verdoolaege's FCGI.pm (http://www.perl.com/CPAN/modules/by-module/FCGI/) handles the Perl-to-FastCGI interaction; this is the module on which my examples are based. Alternatively, you can use Lincoln Stein's CGI::Fast module included in the standard Perl distribution (in which case you'll need to tweak my example code a bit).
You may need AT&T's freely distributed Safe/Fast I/O (sfio) libraries, available from http://www.research.att.com/sw/tools/sfio/. Until last June, Perl needed to be rebuilt with sfio to be able to handle FastCGI I/O streams. The new versions of FCGI.pm work without sfio (at least I've had no trouble), but some posts to the fastcgi-developers mailing list suggest that there may still be a few kinks in the new module. My recommendation is that you try FastCGI on a stock Perl build before resorting to building it with sfio.
Once you've gathered the necessary source code, you'll be ready to spend some quality make time. Compilation is done in two segments: Perl and Apache. Either can be done first, but within each segment the steps have to be completed in a certain order.
Unless you know you need to recompile the Perl binary, you can skip down to the “Compiling FCGI.pm” section. If, however, you do need to recompile Perl, it's helpful to know a few things.
To begin the Perl compilation process, unpack and build sfio; the README will tell how. You'll also have to update your $PATH to include one of the newly created subdirectories; this is somewhat unusual, but required.
Second, build, test and install Perl. It can take a while to work through Larry Wall's Configure script, but there are a few items for which you should not choose the default answer:
To the question “Directories to use for library searches”, answer $* $sfio/lib (where $sfio is the directory in which you unpacked sfio). The default answer to the next question, “Any additional libraries?”, should now include -lsfio.
To the question “Any additional cc flags?”, answer $* -I$sfio/include.
To the question “Any additional ld flags (not including libraries)?”, answer $* -L$sfio/lib.
To the question “Use the experimental PerlIO abstraction layer?” answer yes.
To the question “perl5 can use the sfio library, but it is experimental. You seem to have sfio available, do you want to try using it?”, answer yes.
I've never had trouble re-compiling Perl in this way, but the fastcgi-developers mailing list archive (available from http://www.findmail.com/list/fastcgi-developers/) has plenty of messages from people who have. A fairly complete set of directions for recompiling Perl with sfio can be found at http://fastcgi.idle.com/fcgi2.0b2.1/doc/fcgi-perl.htm.
Regardless of whether you had to recompile Perl, you'll need to unpack, build, test and install FCGI.pm according to the instructions provided with the source code. You can be fairly sure you're on the right track if FCGI.pm passes make test.
Before dealing with the Apache distribution, unpack the mod_fastcgi source code. Read the INSTALL file, which details the two ways to configure, compile, and install Apache. The first, the Apache Autoconf-style Interface (APACI), is new to version 1.3. The second is the tried-and-true manual configuration we all know and love. Then unpack Apache's source code and configure it for the FastCGI interface:
Copy or move the mod_fastcgi distribution directory to <apache_dir>/src/modules/fastcgi.
Configure Apache using either APACI or the manual method as detailed in the mod_fastcgi INSTALL file. I'm pretty accustomed to dealing with the Configuration file, so I usually do it the old way.
If you're using Apache 1.2.x or 1.3.x and you're not running out of RAM, try uncommenting the line containing mod_rewrite. This is a tremendous extension to Apache that allows it to parse incoming URLs as regular expressions. See the sidebar, “Health and Beauty the Rewrite Way”, for my line of reasoning in this regard.
Run make.
Apache gets all of its runtime directives from three files found in $apache/conf: access.conf, httpd.conf and srm.conf. Following the suggestion given by Ben and Peter Laurie in Apache: The Definitive Guide, I put all directives in httpd.conf and don't use the other two files. If you use all three files, the configuration changes will occur in srm.conf.
Before doing any configuration, you'll need to read the documentation included with the mod_fastcgi source code. The docs/mod_fastcgi.html document is somewhat dated, but still very useful for getting you started. No author is listed, but I'd gladly buy him or her a beer for putting together a truly excellent resource, and thereby making my job much simpler.
Let me also say that you should have more than a passing familiarity with httpd.conf before altering it. Take a good look at the documentation that comes with the source code or buy yourself a copy of the Lauries' book.
The FastCGI configuration directives (see sidebar “Configuring Apache for FastCGI”) allow you to accomplish two essential tasks.
First, define the local pathway to the FastCGI applications using the AppClass directive and/or the remote connection host and port number via ExternalAppClass. AppClass is responsible for starting and managing processes that run locally.
Second, associate your FastCGI applications with the proper handler or MIME type so that dealings with these files are handled by mod_fastcgi. Associate the handler “fastcgi-script” with a file or files based on location (SetHandler) or file extension (AddHandler). Alternately, you can associate the MIME type application/x-httpd-fcgi with a file or files based on location (ForceType) or file extension (AddType).
Note that your FastCGI applications cannot go into the normal CGI directory specified by ScriptAlias. Apache's way of assigning priorities leads it to attempt to handle any and all files in the CGI directory with the standard CGI module, which won't work with FastCGI applications.
In many ways, writing FastCGI scripts is not very different from traditional CGI programming. You must specify a Content-type (typically, “text/html”) if you're providing content. You can use Location and Status to specify redirects or other HTTP messages. Also, you have normal access to the %ENV hash.
From within scripts, STDIN and STDOUT can be accessed, but only in standard ways. The FastCGI library manipulates those data streams quite heavily; you can print without trouble, but more advanced operations will fail. You can't, for example, send a reference to a typeglob (a symbol table entry) of STDOUT (\*STDOUT) to a forked process. In fact, FastCGI is fairly scornful of forking, and I haven't heard any reports at all from someone trying to run it on a thread-enabled version of Perl 5.005.
The main difference, structurally speaking, between CGI and FastCGI scripts is that the main body of code is placed within a while loop, one which hopefully never ends. The basic structure of a FastCGI script is pretty much the same regardless of its task:
Initialize variables and connections to databases, daemons, etc.
Do the loop.
Provide for cleanup so you can exit gracefully when needed.
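The three steps can be sketched in Perl roughly as follows. This is a minimal illustration rather than the rotate.fcg listing: the handle_request routine and the request counter are hypothetical names, and the loop uses FCGI.pm's procedural FCGI::accept() call, loaded at run time so the sketch still compiles where the module is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. Initialize once: anything set up here survives across requests.
my $started  = time;
my $requests = 0;
$| = 1;    # flush STDOUT after every print

# Hypothetical handler: builds one complete response as a string.
sub handle_request {
    $requests++;
    return "Content-type: text/html\r\n\r\n"
         . "<p>Request $requests since process start at $started</p>\n";
}

# 2. Do the loop: each pass through the loop serves one request.
if (eval { require FCGI; 1 }) {
    while (FCGI::accept() >= 0) {
        print handle_request();
    }
}

# 3. Clean up: close files, disconnect from databases, and so on.
```

Because $requests lives outside the loop, it keeps counting for the life of the process, which is exactly what an ordinary CGI script cannot do.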
Although FastCGI will force few substantive changes in your code, it will likely change your perspective on what makes a good script. Some of the lessons I've learned while developing FastCGI applications are:
Think clean. Typical CGI scripts don't need to be excessively concerned with memory leaks or sloppy variable scoping. FastCGI scripts, since they're persistent, have to keep a tighter rein on things.
Think big. We're used to thinking of CGI scripts as fast one-timers that should define the fewest functions necessary to get the job done. With FastCGI, it's usually better to have lots of functionality in one script; you have easier access to shared data and fewer PIDs littering your process table. I try to use the main script (the one specified in httpd.conf) as a distribution center, jobbing out all the real work to modules. Doing so makes it easy to extend the main script's functionality with just an extra line or two of code; all your tweaking can be done on the module.
Think long-term. You want your process to keep running, so it's wise to not let your script die() or croak(). Catch the return value of any statement whose failure might prove fatal (such as open()) and rely on error messages and flow control to keep the loop running.
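As a rough illustration of the long-term point, the sketch below shows two defensive habits: checking the return value of open() instead of dying, and wrapping risky code in eval. The helper names read_lines and safely are hypothetical, and the three-argument open with a lexical filehandle assumes a more recent Perl than 5.004.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file's lines, or an empty list plus a
# logged warning if the file cannot be opened. It never calls die().
sub read_lines {
    my ($path) = @_;
    my $fh;
    unless (open $fh, '<', $path) {
        warn "cannot open $path: $!\n";
        return;
    }
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Hypothetical wrapper: runs a code reference and traps any die() inside
# it, so one bad request cannot end the whole process.
sub safely {
    my ($code) = @_;
    my $result;
    my $ok = eval { $result = $code->(); 1 };
    unless ($ok) {
        warn "request failed: $@";
        return;
    }
    return $result;
}
```

Inside the request loop, a failed read_lines() call can then lead to an error page and a jump to the next request rather than a dead process.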
Webmasters of commercial sites hate to admit it, but getting advertisements on-line is an increasingly unavoidable fact of the job. If you have multiple sponsors in a rotation, or if your sponsors each have multiple ads, there's no way to hardcode the ad into a page stored on disk. Of course, this is true for any information likely to be presented on a rotating basis: news, current specials or random links.
The rotate.fcg script shown in Listing 1 provides a bare-bones approach to meeting that need. It provides a persistent array of ad information that can be inserted wherever you choose on any disk-based document. It also allows the ad array to be updated without having to re-start the script (although this technique won't work if you're running multiple instances of the script).
Based on the Apache configuration shown in the Apache sidebar, the URL to invoke the script is http://www.yoursite.com/fastcgi-bin/rotate.fcg?page.html, where “page.html” is the name of a document into which you'd like to insert an ad. page.html can contain one or more instances of an HTML comment that serves as a placeholder for the ad:
<!-- Ad Here -->
Using an HTML comment in this capacity means that the document will display correctly, even if you have no ad to put there yet.
The script's opening section scopes and initializes all variables to be used for the life of the process. Three things are worthy of note in this section. First, since we initialize @ads outside the loop, it will stay persistent for the life of the script. Second, we need to initialize the %ENV hash ourselves, lest we find it empty later on down the line. Third, we set $| to a non-zero number, because we want to flush STDOUT every time the script is invoked.
Right before the script enters the main loop, it initializes the array of ads by calling the initialize routine. This routine reads a text file of the sort shown in Listing 2. The data for each sponsor are temporarily put into the %sponsor hash, formatted into HTML and pushed into the @ads array. If the text file can't be opened, the routine returns an empty array, allowing the script to run anyway.
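A routine of this kind might look something like the sketch below. The file format is an assumption, since Listing 2 is not reproduced here: each sponsor is taken to be a block of "key: value" lines (name, url, image) separated from the next by a blank line. The format_ad helper and the HTML it emits are likewise hypothetical, and the ad file is treated as trusted input.

```perl
use strict;
use warnings;

# Turn one sponsor record into an HTML fragment (assumed layout).
sub format_ad {
    my (%sponsor) = @_;
    return qq{<a href="$sponsor{url}"><img src="$sponsor{image}" alt="$sponsor{name}"></a>};
}

# Read the ad file and return a list of HTML fragments. Returns an empty
# list if the file cannot be opened, so the caller can carry on without ads.
sub initialize {
    my ($path) = @_;
    open my $fh, '<', $path or return ();
    local $/ = '';                       # paragraph mode: one record per read
    my @ads;
    while (my $record = <$fh>) {
        # Collect "key: value" pairs from the record into a hash.
        my %sponsor = $record =~ /^(\w+):[ \t]*(.*?)[ \t]*$/mg;
        # Skip records that lack any of the assumed fields.
        next if grep { !defined $sponsor{$_} } qw(name url image);
        push @ads, format_ad(%sponsor);
    }
    close $fh;
    return @ads;
}
```

Called once before the loop, as in my @ads = initialize($ad_file), it leaves the formatted ads in memory for every later request.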
The main action takes place in the loop labeled REQUEST. The while command is the only place the script interacts explicitly with FCGI.pm. It's also the only substantive difference between a FastCGI script and a traditional one. Regardless of the language you use for FastCGI programming, a loop like this one will be the structure in which you frame the script's main process.
Once in the loop, the first task is to allow the webmaster to re-initialize the ad array on the fly. In the example script, this is accomplished by placing a request to http://www.yoursite.com/fastcgi-bin/rotate.fcg?reload. To provide a little security, the script allows re-initialization only from the web server. If you're running multiple instances of a script, you'll have to accomplish this by some other means: restarting Apache with kill -USR1, reloading the data file if its timestamp has changed, etc.
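The timestamp approach could be sketched as follows. This is not part of the example script; the needs_reload name and the $last_loaded variable are hypothetical. Since every process checks the file for itself, the technique should also work when several instances are running.

```perl
use strict;
use warnings;

my $last_loaded = 0;    # modification time of the ad file when last read

# Returns 1 when the file at $path has changed since the last check and
# remembers the new timestamp. A missing file counts as unchanged.
sub needs_reload {
    my ($path) = @_;
    my $mtime = (stat $path)[9];     # element 9 of stat() is the mtime
    return 0 unless defined $mtime;
    return 0 if $mtime <= $last_loaded;
    $last_loaded = $mtime;
    return 1;
}
```

At the top of the request loop, a line along the lines of "reload the ad array if needs_reload($ad_file)" would then replace the explicit ?reload request.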
If you used a script like this to run current news headlines, it would be easy to post new updates to your site several times each day by adding them to the text file and re-initializing the array.
The loop's second task is to make sure that the requested file can be opened. If it can't, the script calls a routine (not included in my example) that would send off a “File Not Found” message. By providing its own error message, the script can recover gracefully from a bad request without having to die off. If it is available, the requested document is assigned to the @doc array.
Next, an ad is pulled off the front of the @ads array, assigned to $ad, then pushed to the back of the array. The script retains a copy of the ad, even though it's been put back in the array.
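The rotation amounts to a shift followed by a push. A minimal sketch, with placeholder ads and a hypothetical next_ad wrapper:

```perl
use strict;
use warnings;

my @ads = ('<b>Ad A</b>', '<b>Ad B</b>', '<b>Ad C</b>');   # placeholder ads

# Take the ad at the front of the queue and put it back at the end.
sub next_ad {
    return unless @ads;          # nothing to rotate
    my $ad = shift @ads;
    push @ads, $ad;
    return $ad;
}
```

Successive calls walk through the array in order and start over after the last ad, so each sponsor gets an equal share of requests.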
Fourth, the script cycles through the document looking for any instances of <!-- Ad Here -->. When it finds one, it substitutes the $ad for it. If the text file containing the ads is empty or unopenable, or if the requested page has no place for an ad, no substitutions are made.
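The substitution step can be illustrated with a short sketch. The insert_ad name is hypothetical; it works on a copy of the document lines and leaves them untouched when there is no ad to insert.

```perl
use strict;
use warnings;

# Replace every placeholder comment in the document with the given ad.
# With no ad (undef or empty), the lines are returned unchanged.
sub insert_ad {
    my ($ad, @doc) = @_;
    return @doc unless defined $ad && length $ad;
    s/<!-- Ad Here -->/$ad/g for @doc;
    return @doc;
}
```

The /g modifier takes care of lines that hold more than one placeholder.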
Finally, the script prints the appropriate HTTP header, sends off the document and heads back to the front of the loop to wait for the next request.
My example script doesn't tackle many of the tasks at which FastCGI excels: persistent database connections, format translation (e.g., SGML to HTML) or providing common HTML page headers and footers. At the site I manage, I use FastCGI to do all these things and more.
I've found that a FastCGI application can perform its duties, including multiple SQL queries, and deliver a page on the fly only slightly slower than the server can deliver static documents. On a 10Mbps LAN connection the speed difference is perceptible, but just barely, and only if I'm looking for it. Over a 128Kbps or slower connection, I notice no difference.
I still use CGI to perform simple, infrequently needed tasks. A CGI script doesn't hog system resources for very long. For complex, frequently invoked tasks, FastCGI provides a great combination of flexibility and speed.
The two listings referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue55/2607.tgz.
Paul Heinlein (heinlein@teleport.com) lives with his family near Portland, Oregon and is Webmaster at http://www.computerbits.com/. When he and his daughter aren't playing CD-ROM-based games, Paul indulges his odd hankering for Lutheran theology and hymnody.