Embperl: Modern Templates
Earlier this year, I described mod_perl, a module for the Apache web server that embeds a full version of Perl inside Apache. Not only does this allow you to write CGI-style programs that overcome CGI's bottleneck problems, but it also gives you access to Apache's innards, letting you configure your server in many new ways. A number of developers have begun to take advantage of this flexibility, configuring Apache in new and clever ways.
One such clever idea is Embperl, written by Gerald Richter (richter@dev.ecos.de). Embperl allows you to create hybrid pages of HTML and Perl. As we have seen in several previous columns, templates allow designers and programmers to modify their respective parts of a web site without getting in each other's way. If the programmer wants to modify the logic, he or she can do so by modifying the Perl parts of a template. By the same token, designers can modify the look and feel of a page without having to ask the programmer to change a few print statements in a CGI program.
Embperl is but one of several template systems available for mod_perl. Another contender for this role is ePerl, about which I have read quite a bit, but haven't yet had a chance to try. Another solution, which uses Perl but doesn't depend on mod_perl or Apache, is Text::Template, a module I have used in previous columns when discussing templates. Finally, PHP is an embedded scripting language that resembles C and Perl in many ways, and is designed to be interspersed with HTML inside of documents. To find more information about all of these, including URLs, see Resources.
Before we can use Embperl, it's important to understand how HTTP requests and responses are formed, and how a web server performs its job. When you click on a web page link, your browser connects to the host name in the URL and sends a short request to the server. The request consists of a verb (typically GET or POST), the name of the document being requested, and the version of HTTP that the browser supports. For example, to request the root document from a web server, a browser will typically send
GET / HTTP/1.0
to the server. It is the server's responsibility to handle the request, responding with an error message or a document. Depending on which version of HTTP the browser is running, the server might return multiple documents in the same response, demand some sort of user authentication before continuing, or redirect the user's browser to a different URL.
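To make the shape of a request concrete, here is a minimal sketch in plain Perl that builds the request text shown above and splits a request line back into its verb, document name and HTTP version. The function names build_request and parse_request_line are my own for illustration and are not part of Apache, mod_perl or Embperl; no network connection is made.

```perl
use strict;
use warnings;

# Build the text a browser sends for a simple request.
# Lines end in CRLF, and a blank line ends the request headers.
sub build_request {
    my ($method, $path, $version) = @_;
    $version ||= '1.0';
    return "$method $path HTTP/$version\r\n\r\n";
}

# Split a request line into its three parts: verb, document, version.
# Returns an empty list if the line does not look like a request.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the trailing line ending, if any
    my ($method, $path, $version) =
        $line =~ m{\A(\S+)\s+(\S+)\s+HTTP/(\d+\.\d+)\z}
        or return;
    return ($method, $path, $version);
}

my ($verb, $doc, $http) = parse_request_line("GET / HTTP/1.0\r\n");
print "verb=$verb document=$doc version=$http\n";
```

Run on its own, this prints verb=GET document=/ version=1.0, which are the three pieces the server looks at before deciding how to respond.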
In many cases, though, the server will not return a document at all. Instead, it will run a program, returning the program's output, rather than its contents. This is how CGI programs work: the server is configured such that all files in a certain directory are treated as programs, rather than documents to be retrieved verbatim. (Indeed, security concerns arise when users can retrieve programs' contents, rather than seeing their output.) As far as the browser is concerned, it requested a document and received one in response. The magic happens on the server side, where the program is executed and produces its output.
A price is paid for CGI programs, above and beyond their execution times: because web servers fork a separate process for each CGI program, and Perl (and other popular scripting languages) can have a long start-up time, it often takes longer for the program to get started than for it to actually run.
For this reason, each web server has developed its own native API that allows programs to bind more closely to the server's internal code than would be possible with CGI. Netscape's NSAPI and Microsoft's ISAPI are two examples of such proprietary systems, and Apache's mod_perl is an example of how similar functionality can be given to Perl programmers. With mod_perl installed in your server, operations speed up tremendously, because the server compiles the program once, rather than each time it is run. In addition, because the program never requires creating a separate process, the overhead associated with executing such programs is relatively low.
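The compile-once idea can be sketched in a few lines of ordinary Perl. This is only a toy model of the principle, assuming a cache keyed by program name; it is not how mod_perl itself is implemented, and the names cached_program and compile_count are hypothetical.

```perl
use strict;
use warnings;

my %compiled;             # program name => code reference
my $compile_count = 0;    # how many times we actually compiled

# Return a code reference for $source, compiling it only on first use.
sub cached_program {
    my ($name, $source) = @_;
    unless ($compiled{$name}) {
        $compile_count++;
        $compiled{$name} = eval "sub { $source }"
            or die "cannot compile $name: $@";
    }
    return $compiled{$name};
}

sub compile_count { return $compile_count }

# Three "requests" for the same program trigger a single compilation.
for my $request (1 .. 3) {
    my $program = cached_program('hello', 'return "Hello, request $_[0]"');
    print $program->($request), "\n";
}
print "compiled ", compile_count(), " time(s)\n";
```

The last line reports one compilation for three requests. Under plain CGI, by contrast, every request would pay for a new process and a fresh compilation.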
Mod_perl is perhaps best known for allowing programmers to write very fast CGI-like programs. However, since Apache's internals are available via mod_perl, it is possible to write Perl programs that change one or more steps in Apache's processing of outgoing documents. These can range from the mundane to the fancy; in Embperl's case, we are setting a special PerlHandler for particular documents. In the Apache world, a “handler” is a program that does something special with the files in a directory before returning them to the HTTP client. You can think of a handler as a middleman between Apache and the file; the handler grabs the file and modifies it as necessary, handing the finished product to Apache. Apache then takes this finished product and returns it to the user's browser in the HTTP response.
Note that installing Embperl can be a bit tricky. The documentation is generally good and describes all of the steps necessary to install it on your own computer. I have installed it several times and found that each time required several tries before I managed to follow the directions correctly. I will describe the procedure here in some detail, but you might want to look at the FAQ file that comes with Embperl for more information. (Many of the following instructions are based on that FAQ.)
Before you begin, you should install, or update to, the latest versions of several packages: LWP (the library for Web client programming), HTML::HeadParser (used for parsing HTML document heads), CGI.pm (the super-module that handles everything having to do with CGI), and MIME::Base64 (which encodes and decodes Base64, the encoding used in the MIME standard). All of these are available from CPAN (see Resources).
Both mod_perl and Apache must be recompiled in order to get Embperl running. It is possible for Embperl to run as an external CGI program, rather than from within mod_perl, but you will then lose the speed benefits of mod_perl. I strongly suggest going the mod_perl route, unless you are using a web server other than Apache or would rather not recompile things just now.
For starters, then, you will need the source for Apache (from http://www.apache.org/), mod_perl (from CPAN, at http://www.perl.com/CPAN/) and Embperl (also from CPAN, as HTML-Embperl-1.0.0). On my machine, these packages were named as follows:
HTML-Embperl-1.0.0.tar.gz
apache_1.3.0.tar.gz
mod_perl-1.12.tar.gz
I am sure that newer versions of these programs will be available by the time you read this article. However, you should be able to follow this discussion by updating the version numbers as appropriate.
First, unpack all of the files using the command:
for file in *gz; do tar -zxvf "$file"; done
Before you can start to compile the components, you will have to set some of the configurations and modify Makefiles. First of all, go into the mod_perl directory and edit src/modules/perl/Makefile:
/downloads/mod_perl-1.12/src/modules/perl/Makefile
You will have to make three changes to this file. First, add HTML::Embperl to the definition of STATIC_EXTS that will be grabbed by the mod_perl configuration system. That is, edit the line (line 98, in mod_perl-1.12):
#STATIC_EXTS = Apache Apache::Constants
and change it to:
#STATIC_EXTS = Apache Apache::Constants HTML::Embperl
Next, look for the line that begins with OBJS= (line 131 in mod_perl-1.12). Just before that line, define the variable EPDIR so that it points to your Embperl build directory. For instance, assuming that we are building Embperl in /downloads/HTML-Embperl-1.0.0, we will set it to:
EPDIR=/downloads/HTML-Embperl-1.0.0
We will now modify the OBJS variable such that it creates the object files for Embperl as well as mod_perl:
OBJS=$(PERLSRC:.c=.o) $(EPDIR)/Embperl.o \
    $(EPDIR)/epmain.o $(EPDIR)/epio.o \
    $(EPDIR)/epeval.o $(EPDIR)/epcmd.o \
    $(EPDIR)/epchar.o $(EPDIR)/eputil.o
Don't forget to put a backslash at the end of each continued line, so that make doesn't think the continuation lines should stand on their own.
The hardest part is over. All we have to do now is configure and compile the various components. Make sure to do them in the right order, though, or things might not work correctly.
First, enter the mod_perl directory and create the Makefile using the standard Perl command perl Makefile.PL.
If you want some or all of mod_perl's capabilities, now is the time to specify that. I tend to activate all of them (except for two that need explicit activation), so I enter perl Makefile.PL EVERYTHING=1. This will begin the mod_perl configuration process. You will be asked if you want to use the Apache source code in the parallel directory, and then if you want mod_perl to build httpd for you. Answer “yes” to both questions.
When the configuration script has finished running, go into the Embperl directory (/downloads/HTML-Embperl-1.0.0) and configure the module using the same command:
perl Makefile.PL
Once again, the system will perform a variety of configurations. You will be asked if you want Embperl to support Apache, and then if it should use the Apache source code in the parallel directory. Again, answer “yes” to both questions. Finally, you will be asked for a path name to the copy of httpd that will be used in testing. Check that the default is correct, and correct it if necessary.
Now we can actually create Embperl by typing make in its directory. After the compilation is complete, switch back to the mod_perl directory and create mod_perl and Apache by typing make.
Congratulations. You should now have working copies of Apache, mod_perl and Embperl. At this point, we could run make install in each of the three directories to install the software, or we can test Embperl. If you are interested in testing your Embperl compilation without installing it, I suggest that you read the FAQ. The directions are not that difficult to follow, but they are more complex than I can describe in the space provided here.
Now that Embperl is part of your copy of Apache, what can you do with it? Not much at this point, since we have not yet defined the handler for our Embperl files. Now we will have to modify Apache's configuration files, which might be in a number of possible places. When I installed Apache, I accepted the default installation locations (under /usr/local/apache), and my path names will reflect that.
In order to get Embperl working, we will need to modify two of Apache's configuration files. (Each of the three files can actually contain any of the configuration directives, but certain items are traditionally put in certain files.) I told the server to map URLs beginning with /embperl to /usr/local/apache/share/embperl by adding the following line to the srm.conf file:
Alias /embperl /usr/local/apache/share/embperl
Next, I told Apache to install Embperl as the handler for that directory. As I mentioned above, this means that HTML::Embperl will be called each time Apache is asked to retrieve a document from /embperl. The file will be read from disk, handled by Embperl, and finally given to Apache, which returns it to the user's computer. I added the following to the access.conf file:
<Location /embperl>
    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI
</Location>
Once we have installed these changes, we restart Apache with:
/usr/local/apache/sbin/apachectl restart
Embperl files look just like HTML files, with a minor difference: square brackets signify special sections of code, which are interpreted separately. In other words, you can put stock HTML files in an Embperl directory, although I would tend to advise against doing so, because of the additional overhead involved. Why force Embperl to look at a file unnecessarily? For that reason, some sites have decided to use a special suffix—.htmpl, perhaps—and then to configure Apache so that all files with that suffix, regardless of directory, are interpreted. That allows HTML files to be mixed in with their Embperl counterparts.
The following file, when retrieved from within a directory defined for Embperl, will print the current time:
<HTML>
<Head><Title>Current time</Title></Head>
<Body>
    <P>This is Embperl</P>
    <!-- Below are the square brackets -->
    <P>[+ localtime(time) +]</P>
</Body>
</HTML>
Retrieving this file from an Embperl directory will produce the same output as the following short CGI program:
#!/usr/bin/perl -w

use strict;
use diagnostics;
use CGI;

# Create a new instance of CGI
my $query = new CGI;

# Send a MIME header
print $query->header("text/html");

# Send the HTML
print $query->start_html(-title => "This is Embperl");
print scalar localtime(time);
print $query->end_html;
However, Embperl has several advantages over a CGI program. For one, running it under mod_perl gives it a distinct speed advantage. Of course, we could modify our CGI program and/or Apache configuration so that the program would run under Apache::Registry, the mod_perl module that handles CGI-like programs.
The biggest advantage, though, is the clean separation between static and dynamic content. No longer does the programmer become the bottleneck, slowing down design and content changes—now the site's designers and editors can modify the HTML, so long as they stay away from the Perl inside square brackets. There will obviously be times when the embedded Perl code will affect the design, and the programmer can be included in such cases. But for the most part, such a separation allows everyone to do what they do best.
We have already seen one form of the Embperl brackets, namely [+ and +]. Anything in square-plus brackets is evaluated as Perl code, with the results inserted into the HTML document and passed to the browser. Remember that the result of evaluating a Perl variable or string is the value of that variable or string. It's very common for square-plus brackets to contain a single variable name, whose contents are inserted into the document at the indicated point. Don't use the print function to insert things into the Embperl document, because print sends output to STDOUT, and then returns a result indicating whether it was successful. Each set of square-plus brackets can contain as much or as little Perl as you might like, although most Embperl programmers seem to prefer keeping the lines short.
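The point about print is easy to check in ordinary Perl, outside of Embperl. The following sketch, which uses a hypothetical helper named print_and_capture and an in-memory filehandle, keeps apart what print writes and what print returns. I am assuming only standard Perl behavior here, not anything specific to Embperl.

```perl
use strict;
use warnings;

# Capture what print writes and what print returns, separately.
sub print_and_capture {
    my ($text) = @_;
    my $written = '';
    open my $fh, '>', \$written
        or die "cannot open in-memory handle: $!";
    my $returned = print {$fh} $text;    # print's own return value
    close $fh;
    return ($written, $returned);
}

my ($written, $returned) = print_and_capture('Hello');
print "written:  $written\n";     # the text sent to the filehandle
print "returned: $returned\n";    # a true value, not the text
```

The value of a print expression is a success flag, so a square-plus block whose last expression is a print would have that flag, rather than the text, as its result.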
Output from square-plus brackets is placed directly into the file without any additional formatting. If you want something to be in paragraph tags, boldface, italics or a different font, it is your responsibility to make sure that happens. Most often, you will want to surround the square-plus brackets with the appropriate HTML tags, so that the resulting output will be correctly formatted. That is, you could make a variable's value italic by saying
[+ "<i>$variable</i>" +]
but, for the sake of maintenance and separating static and dynamic content, it's better to say:
<i>[+ $variable +]</i>
What if you don't want the results of your code to be inserted into the document? You could end each set of Perl expressions with the empty string, as in:
[+ $counter++; &get_user_info($id); "" +]
which will insert the empty string into the document, since it was the last element to be evaluated. But a better solution would be square-minus brackets ([- and -]), which do that for you automatically. For example:
<HTML>
<Head><Title>Current time</Title></Head>
<Body>
    <P>This is Embperl</P>
    <!-- Square-plus brackets -->
    <P>[+ localtime(time) +]</P>
    <!-- Square-minus brackets -->
    <P>[- localtime(time) -]</P>
</Body>
</HTML>
Output from the above Embperl document will look the same as the one without square-minus brackets, since the results of operations performed in square-minus brackets are not inserted into the HTML. This is useful when making variable assignments, as well as when importing other Perl modules. For example:
<HTML>
<Head><Title>Print user information</Title></Head>
[- $user = $ENV{"REMOTE_USER"}; -]
<Body>
    <P>This is Embperl</P>
    <P>[+ &print_user_profile($user) +]</P>
</Body>
</HTML>
As you can see, variable assignments are kept across square brackets, meaning that you can assign a variable in one block and refer to it later. Variables are global by default, but you can use Perl's “my” convention to create temporary variables, which go out of scope at the end of the block.
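To see how square-plus and square-minus blocks can share variables, consider the following toy expander written in plain Perl. It is a rough sketch of the behavior described so far, not Embperl's actual implementation: the names eval_block and expand and the scratch package Toy::Document are my own, and the pattern matching is far cruder than what a real template engine needs.

```perl
use strict;
use warnings;

# Evaluate one block of Perl in a scratch package, so that variables set
# in one block remain visible to later blocks of the same document.
sub eval_block {
    my ($code) = @_;
    my $value = eval "package Toy::Document; no strict; no warnings; $code";
    die "error in block [$code]: $@" if $@;
    return $value;
}

# Expand a template: [- ... -] runs code and inserts nothing, while
# [+ ... +] runs code and inserts the value of its last expression.
sub expand {
    my ($template) = @_;
    $template =~ s{\[([-+])(.*?)\1\]}{
        my ($kind, $code) = ($1, $2);
        my $value = eval_block($code);
        $kind eq '+' && defined $value ? $value : '';
    }gse;
    return $template;
}

print expand('[- $user = "reuven" -]<P>Hello, [+ $user +]</P>'), "\n";
```

This prints <P>Hello, reuven</P>. The square-minus block leaves nothing behind in the output, yet the variable it set is still available to the square-plus block that follows.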
One of the nice things about mod_perl is that it compiles programs once, caching them for future invocations. Not only do you save the overhead of forking a new process, but the program runs much faster since it does not have to be recompiled on each request. In many cases, you want variable values to remain intact across several invocations of a program. Such persistence allows you to log into a database server only once, keeping a connection open through the duration of many HTTP requests.
This raises the question of what happens to variables you define in an Embperl document—do they also keep their values across invocations, or do they disappear? The answer is that each Embperl document is processed in its own package, and the variables defined in that package are reset by default upon each invocation. However, variables defined in other packages are kept across invocations. The following Embperl document demonstrates how this works:
<HTML>
<Head><Title>Current time</Title></Head>
[- $counter++; -]
[- $remain::counter++; -]
<Body>
    <P>This is Embperl</P>
    <P>Counter: [+ $counter; +]</P>
    <P>remain::counter: [+ $remain::counter; +]</P>
</Body>
</HTML>
If you try this on your system, you may well discover that $counter always remains at 1, while $remain::counter is incremented with each invocation. However, if you are running more than a single copy of Apache, $remain::counter probably jumps around, as if several different copies of it were being incremented. This is indeed the case, since each copy of Apache is running its own copy of mod_perl and Embperl. If you rely on persistent variables across invocations, remember that a given user might connect to more than one copy of Apache, and you cannot rely on the same copy always being available to the same user.
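The jumping counter is easier to picture with a small simulation. In the sketch below, each Apache child is modeled as a key in a hash, the document's own counter starts fresh on every request, and the persistent counter is kept separately for each child. The function serve_request and the child names A and B are invented for the example, and nothing here talks to Apache.

```perl
use strict;
use warnings;

my %children;    # child id => that child's private persistent counter

# Simulate one request served by the given Apache child process.
sub serve_request {
    my ($child) = @_;
    my $counter = 0;                         # document package: starts fresh
    $counter++;
    my $persistent = ++$children{$child};    # other package: kept per child
    return ($counter, $persistent);
}

# Requests land on whichever child happens to be free.
for my $child (qw(A A B A B)) {
    my ($counter, $persistent) = serve_request($child);
    print "child $child: counter=$counter remain::counter=$persistent\n";
}
```

In this run the persistent counter reads 1, 2, 1, 3, 2 as requests move between the two children, while the document's own counter stays at 1 throughout.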
However, persistent variables can be useful when making connections to something other than the user's computer. In particular, DBI (the Perl database interface) can take advantage of this with the Apache::DBI module. This module opens a connection to a database server when it is first invoked, and then continues to use that connection throughout the life of the Apache process, immediately sending each query to the database server. Because the persistence is between Apache and the database server, it works regardless of whether a user connects to the same httpd process each time.
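The caching idea behind a persistent database connection can be sketched without a database at all. The code below is an illustration of the principle only and is not taken from Apache::DBI: open_connection is a hypothetical stand-in for an expensive login, and the data source name dbi:mysql:test is just a sample string.

```perl
use strict;
use warnings;

my %handles;       # data source name => cached connection
my $opened = 0;    # how many real connections were made

# Hypothetical stand-in for an expensive database login.
sub open_connection {
    my ($dsn) = @_;
    $opened++;
    return { dsn => $dsn, id => $opened };
}

# Return the cached connection for $dsn, connecting only on first use.
sub connection_for {
    my ($dsn) = @_;
    $handles{$dsn} ||= open_connection($dsn);
    return $handles{$dsn};
}

sub connections_opened { return $opened }

connection_for('dbi:mysql:test') for 1 .. 5;    # five "requests"
print "connections opened: ", connections_opened(), "\n";
```

Five requests result in a single connection. Since each Apache child would hold its own cache, a site running several children would open one connection per child rather than one per request.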
When defining subroutines inside of Embperl documents, it's probably best to use another kind of square brackets, with exclamation points as the special characters. Square-bang brackets ([! !]) are the same as square-minus brackets, except that the Perl code contained within is executed only upon the document's first invocation. If you are running Embperl under mod_perl, defining subroutines inside of square-bang brackets means they will be defined and compiled a single time, further increasing the speed of your program.
Finally, we come to square-dollar brackets ([$ $]), which allow you to enter Embperl meta-commands. These meta-commands, as you might imagine from the name, are actually part of a small programming language with which you can tell Embperl what to do.
Meta-commands allow you to make sections of HTML and Perl conditional, or to loop over them a given number of times. The same tasks could be performed inside of a normal Embperl block, since Perl is a full-fledged programming language and can handle conditionals and looping just fine. But by using the Embperl meta-commands, you can place even more HTML outside of the Perl blocks, making the Perl blocks somewhat smaller and easier to read.
For example, let's say we run a web site that requires registration. Assuming we have a function called &is_registered that returns “true” or “false”, depending on whether a user is registered with our system, we could print an appropriate greeting with the following code:
[+ &is_registered($user_id) ? "You are known" : "You are unknown" +]
Once you start to deal with the formatting associated with those strings, the menus you might want to display for new users and the personalized home pages that registered users should see, the block of Perl inside of square-plus brackets becomes quite large. It's thus easier to use square-dollar brackets and Embperl meta-commands:
[$ if &is_registered($user_id) $]
    You are known, your registered home page:
    [+ &output_home_page($user_id) +]
[$ else $]
    Welcome, new user! We would like to ask you a few questions:
    [+ &output_questionnaire +]
[$ endif $]
The above, which I have indented in the style of a programming language, is easier to understand than a large block of Perl code. It is also more easily understood and modified by non-programmers on your site, who can clearly see the difference between HTML and other items.
Embperl has many features, far too many to describe here. My favorite feature is its ability to create HTML tables automatically, filling them in as necessary. Embperl looks for the beginning of an HTML table, marked with a <TABLE> tag, before filling it in. In order to do this, Embperl expects you to use a number of “magic” variables within the table. You can set the exact behavior with Embperl's $tabmode, but the basic idea is that within a table, $row (a magic variable) begins at 0 and increments until it reaches $maxrow (another magic variable). When an expression within the table returns “undefined”, Embperl exits from the table loop and stops incrementing $row. We can thus get a nicely formatted listing of environment variables with this code:
<HTML>
<Head><Title>Environment</Title></Head>
<Body>
[- @keys = sort keys %ENV -]
<Table border=2>
    <tr>
        [- $index = $row -]
        <td>[+ $keys[$row] +] </td>
        <td>[+ $ENV{$keys[$index]} +] </td>
    </tr>
</Table>
</Body>
</HTML>
Notice how we first defined the @keys array outside of the table definition. We then used $row (which is incremented automatically by Embperl) to retrieve each element from @keys.
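The row-repeating rule can be modeled in plain Perl as a loop that stops at the first undefined value. This is my own simplified sketch of the behavior described above, not Embperl's code: the function fill_table is hypothetical, its $max_rows argument stands in for the magic row limit, the sample data is invented, and the real behavior also depends on $tabmode.

```perl
use strict;
use warnings;

# Repeat a row, calling $cells->($row) for row 0, 1, 2 and so on.
# Stop when any cell is undefined or when $max_rows rows were produced.
sub fill_table {
    my ($cells, $max_rows) = @_;
    my @rows;
    for (my $row = 0; $row < $max_rows; $row++) {
        my @values = $cells->($row);
        last if grep { !defined $_ } @values;
        push @rows,
            '<tr>' . join('', map { "<td>$_</td>" } @values) . '</tr>';
    }
    return join "\n", '<Table border=2>', @rows, '</Table>';
}

my %env  = (HOME => '/home/reuven', SHELL => '/bin/sh');    # sample data
my @keys = sort keys %env;

print fill_table(
    sub {
        my ($row) = @_;
        my $key = $keys[$row];
        return ($key, defined $key ? $env{$key} : undef);
    },
    100,
), "\n";
```

With two keys in the sample hash, the loop emits two rows and then stops, because asking for a third key yields an undefined value.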
Using the magic table fill-in procedure can be extremely powerful, but it requires you to change your programming style somewhat. Nevertheless, the potential uses for it in database applications are tremendous, since it greatly cuts down on the amount of necessary coding.
If you look at the list of environment variables, you might notice QUERY_STRING is unset. When invoking programs, QUERY_STRING is normally set by appending a question mark (?) and a string to a URL, but there is no reason why we cannot use the same syntax with Embperl documents, as in http://localhost/embperl/env.html?foo.
If the above environment-printing Embperl file is called env.html, then invoking it with the foo parameter should give QUERY_STRING a value.
Indeed, we can even use Embperl documents as the “action” of an HTML form. Embperl places the submitted form values in the %fdat hash, so Perl blocks within our document can retrieve those values, use them, and even construct a document based on them.
Embperl does require a slightly different style of programming than is usual in Perl. Typically, Perl is written in blocks of code, with each block returning a value. Embperl is much terser, with pairs of square brackets occurring much more often than Perl's curly braces. Of course, the style presented in the Embperl documentation and the above examples does not have to be your own; you can put entire Perl programs between square brackets.
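Embperl fills in %fdat for you, but it may help to see roughly what turning a query string into a hash involves. The following is a hedged sketch in plain Perl, with hypothetical function names url_decode and parse_form_data and an invented sample query string; it is not the code Embperl uses.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Turn "name=value&name2=value2" into a hash of form fields.
# A bare name with no "=" gets the empty string as its value.
sub parse_form_data {
    my ($query) = @_;
    my %fields;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $fields{ url_decode($name) } =
            url_decode(defined $value ? $value : '');
    }
    return %fields;
}

my %form = parse_form_data('name=Reuven+Lerner&city=Haifa&note=100%25');
print "$_ = $form{$_}\n" for sort keys %form;
```

A query string of just foo, as in the env.html example, would produce a single field named foo with an empty value.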
The trend seems to be toward using templates and databases to design web sites, with more and more products appearing on the market that claim to do such things. The combination of Linux, Apache, mod_perl and Embperl not only makes for a cost-effective solution, but also a powerful combination of programming tools that is hard to beat. Next month, we will look at Embperl a bit more closely, and learn how we can use it with databases to easily create personalized home pages.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. In his spare time, he cooks, reads and volunteers with educational projects in his community. You can reach him at reuven@netvision.net.il.