Building Sites with Mason
When we speak of dynamically generated web content, most people immediately think of CGI, the “common gateway interface”. CGI is portable across web servers, languages and operating systems, but that portability comes at the expense of efficiency. For example, every invocation of a CGI program written in Perl results in the creation of a new process, in which the program must be compiled and then interpreted.
Programmers willing to trade portability for speed have a number of options at their disposal. Many Perl programmers have chosen to use mod_perl, which makes it possible to modify Apache's behavior using Perl modules. A Perl module invoked by mod_perl runs inside the Apache process, removing the overhead of starting a separate CGI process. mod_perl also caches the compiled version of each module it invokes, eliminating the need to compile it every time it is run. The result is a dramatic improvement in speed, as well as the flexibility to modify Apache quickly and easily using Perl.
For all its power, mod_perl has never appealed to me for creating large, dynamic web sites. True, the increase in speed is extremely impressive, and it is not that hard to work with. As we have seen in several previous editions of this column, writing a mod_perl module can be quite easy, and it integrates smoothly into a larger web site.
However, when a site needs to create a large quantity of dynamic output, much of which is written and designed by non-programmers, mod_perl's power is hampered by the need to use dozens or hundreds of modules, each servicing only a single URL or directory. The solution to this problem is to integrate mod_perl with templates, which intersperse HTML-formatted text with Perl code. We have looked at templates on several previous occasions and have seen their power and flexibility.
This month, we will look at Mason, a mod_perl module written by Jonathan Swartz, which attempts to solve many of these problems. It uses templates and encourages the use of separate “components”, which can be built up to create a large, dynamically generated site. Because these components exist in separate files, Mason offers additional advantages:
It caches components, providing a bigger speed boost than that from simple templates.
It provides a complete debugging and previewing environment.
It produces output files.
I first heard about Mason nearly two years ago, and kept telling myself I would look at it one day. I finally took a serious look at Mason several months ago, and was extremely impressed with what I saw—enough that I expect to do most of my future development with it.
The number of publicly distributed Mason components is still relatively small, which makes it seem like a poorer development environment than Zope and commercial template solutions. However, this situation already appears to be changing, and the number of components will most likely increase significantly during the coming months and years.
While Mason can work as a CGI program, it works best and most easily with mod_perl. (See Resources for information on obtaining and installing Apache and mod_perl.) Retrieve the latest version of HTML::Mason from CPAN and follow the same procedure as you would for any other module, detailed in the INSTALL file that comes with it.
However, using Mason is more complicated than simply saying “use HTML::Mason”. Because it works via mod_perl, which is part of Apache, the Mason configuration must be performed when Apache starts up. This is accomplished most easily by using a PerlRequire statement in the Apache configuration file, normally called httpd.conf:
PerlRequire /usr/local/apache/conf/mason.pl
The above statement tells Apache to execute the Perl statements in /usr/local/apache/conf/mason.pl when it first starts up. Any Perl modules imported or variables declared in mason.pl are placed into a section of memory shared across all Apache processes. This can mean a substantial memory savings, since Perl and mod_perl consume a large amount of memory, and most web servers run prefork child Apache processes.
At the very least, mason.pl must create three different objects:
The Mason parser ($parser), which turns each Mason component into a Perl subroutine.
The Mason interpreter ($interp), which executes the subroutines that were created by the parser. When creating the interpreter (HTML::Mason::Interp), we name two directories in which the interpreter reads and writes information: the Mason “component root” where the Mason components sit, and the Mason “data directory” in which caching and debugging information are stored. I normally set the component root to /usr/local/apache/mason and the data directory to /usr/local/apache/masondata.
The Mason ApacheHandler ($ah), which handles the HTTP request and generates a response.
Warning: for reasons that are not entirely clear, Mason cannot handle symbolic links. Specifying a symbolic link as a directory name will lead to mysterious “File not found” errors. On my system, /usr/local is a symbolic link to /quam/local; while the Mason documentation does mention this quirk, it was not explicit enough to spare me more than half an hour of installation trouble.
A bare-bones mason.pl is shown in Listing 1. Notice how mason.pl defines the subroutine HTML::Mason::handler, which is invoked once for each incoming HTTP request. In this way, Mason is able to handle each HTTP request; the ApacheHandler takes the request and hands it to the Interpreter, which then reads a compiled component from the cache or parses it as necessary.
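As a rough sketch only, a minimal mason.pl built from the three objects described above might look like the following. It assumes the Parser, Interp and ApacheHandler constructors and the handle_request method as they existed in Mason releases of this era, and it reuses the component root and data directory named earlier. Later releases changed this interface, so check the documentation that comes with your installed version.

```perl
# mason.pl: a minimal sketch under the assumptions stated above,
# not a copy of Listing 1.
package HTML::Mason;

use strict;
use HTML::Mason;                  # core Mason classes
use HTML::Mason::ApacheHandler;   # glue between mod_perl and Mason

# Turns each component into a Perl subroutine
my $parser = HTML::Mason::Parser->new;

# Executes the compiled components
my $interp = HTML::Mason::Interp->new(
    parser    => $parser,
    comp_root => '/usr/local/apache/mason',
    data_dir  => '/usr/local/apache/masondata',
);

# Accepts each HTTP request and hands it to the interpreter
my $ah = HTML::Mason::ApacheHandler->new(interp => $interp);

# Invoked by mod_perl once for each incoming request
sub handler {
    my ($r) = @_;
    return $ah->handle_request($r);
}

1;
```

Because the file is loaded with PerlRequire, it must end with a true value, hence the final 1.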
More advanced Mason installations use mason.pl to define all sorts of additional behavior. For example, Mason comes with a previewer/debugger component, making it possible to trace through the execution of a component and its subcomponents. It is also possible to define different ApacheHandler objects, one for each type of browser or request type.
Once our mason.pl file is installed, we must tell Apache to let HTML::Mason handle some incoming requests, rather than using the default Apache handlers. This is where we connect the component root to the Apache Handler. For example, if the component root is /usr/local/apache/mason, we can say the following:
Alias /mason /usr/local/apache/mason
<Location /mason>
    SetHandler perl-script
    PerlHandler HTML::Mason
</Location>
The Alias directive tells Apache to translate every URL beginning with /mason to the Mason component root, /usr/local/apache/mason. The <Location> section tells Apache that every URL beginning with /mason should then be handled by the mod_perl handler HTML::Mason.
In the Mason universe, a “component” can return either HTML or a value. The former usually consists of HTML templates or template fragments, while the latter consists of subroutines and other code which are invoked by templates. All components share the same syntax, which should be familiar to anyone who has used a template system.
Perl code can be placed inside a component, bracketed by <% and %>. Any returned value is inserted into the component, replacing the Perl code that created it. For example, the following component (time.html) will display the current time of day each time it is invoked:
<HTML>
<Head><Title>Current time</Title></Head>
<Body>
<H1>Current time</H1>
<P>The current time is: <% scalar localtime %></P>
</Body>
</HTML>
I put the above into the file time.html and placed it in the component root directory. Immediately after doing so, I was able to go to the URL /mason/time.html and get the current time.
Mason supports two other types of Perl sections, which can be useful in different contexts. A % in the first column of a Mason component forces the entire line to be interpreted as Perl, rather than literally. This is best used for control structures (such as loops and if-then statements) that produce text strings, as in the following:
<HTML>
<Head><Title>Current time</Title></Head>
<Body>
<H1>Months</H1>
% foreach my $month (qw(Jan Feb Mar Apr May Jun
%                       Jul Aug Sep Oct Nov Dec))
% {
<P><% $month %></P>
% }
</Body>
</HTML>
As you can see, the <% %> construct works in all contexts. In addition, lexicals declared at the top level of one Perl segment can be used within any other Perl segment.
Finally, long runs of Perl can be placed inside <%perl> blocks. This is best for doing heavy-duty computation, rather than simply retrieving variable values. For example:
<HTML>
<Head><Title>Current time</Title></Head>
<Body>
<H1>Months</H1>
<%perl>
my @months = qw(January February March April May June July
                August September October November December);
</%perl>
<P>The current month is <% $months[(localtime)[4]] %>.</P>
</Body>
</HTML>
Once again, notice how the lexical (my) variable declared in the <%perl> section is available in the following <% %> section.
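The three kinds of Perl sections can be combined in a single component. The following sketch (its contents and the greetings are invented for illustration) computes a value in a <%perl> block and then uses %-prefixed lines to choose which HTML to emit:

```mason
<HTML>
<Head><Title>Greeting</Title></Head>
<Body>
<%perl>
# Element 2 of localtime's list is the hour, from 0 to 23
my $hour = (localtime)[2];
</%perl>
% if ($hour < 12) {
<P>Good morning!</P>
% } elsif ($hour < 18) {
<P>Good afternoon!</P>
% } else {
<P>Good evening!</P>
% }
<P>It is now <% $hour %>:00 or later.</P>
</Body>
</HTML>
```

The lexical $hour is declared once in the <%perl> block and is then visible both in the %-lines and in the final <% %> expression.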
Experienced users of Text::Template and other Perl templating modules are probably not very impressed at this point. After all, there are dozens of ways to create templates of this sort, and many work with mod_perl for extra speed.
However, Mason's template syntax includes provisions for invoking other components, much as one subroutine might invoke another. (Indeed, since the Mason parser turns each component into a subroutine, this is not an incorrect analogy.) In some ways, this is like having a heavy-duty server-side include system, allowing you to standardize headers and footers. However, because components can return values as well as HTML output, and because Mason makes it possible to pass arguments to a component, things can get far more interesting.
One component can invoke another component with the special <& &> syntax. For example, the following invokes the component subcomp:
<& subcomp &>
Any HTML produced by subcomp is placed at the point where it was invoked, much like a server-side include. Each HTML page generated by a Mason site can consist of one, five, 10, 20 or more components. In this way, it is possible to assemble a page from individual elements—beginning with headers and footers and moving on to tables and pull-down menus. For example, here is a header component:
<!-- begin component: header.comp -->
<Body bgcolor="#FFFFFF">
<H1>This is a header</H1>
<!-- end component: header.comp -->
And here is a footer component:
<!-- begin component: footer.comp -->
<address>
<a href="mailto:reuven@lerner.co.il">
reuven@lerner.co.il</a>
</address>
<!-- end component: footer.comp -->
Finally, here is a top-level component in which the header and footer come from the above components:
<HTML>
<Head><Title>Title</Title></Head>
<& header.comp &>
<P>This is the body</P>
<& footer.comp &>
</HTML>
Notice that I gave file extensions of “comp” rather than “html” to the header and footer. This is simply a convention that enables me to differentiate between top-level components (which have .html extensions) and lower-level fragments.
Also, notice how I begin and end each lower-level component with HTML comments that indicate where it begins and ends. This provides a primitive type of debugging (expanded by the Mason previewer/debugger component) that lets me see where things are happening, simply by viewing a component's HTML source code.
The above examples of header and footer components are good for simple sites. However, it would be more useful if our header and footer components could take arguments, allowing us to modify parts of their content as necessary.
Mason indeed allows components to send and receive arguments, giving an extra level of flexibility. To pass arguments to an invoked component, place a comma between the component's name and a list of name,value pairs. For example:
<& header, "address" => 'president@whitehouse.gov' &>
Components receive passed arguments in special <%args> sections, traditionally placed at the bottom of a component file. An <%args> section declares the component's arguments, each with an optional default value that is used if the caller does not pass one. An argument without a default value is mandatory. For example, the following <%args> section declares the $name and $address variables; $name has no default value and so must be passed, while $address has a default value of reuven@lerner.co.il:
<%args>
$name
$address => 'reuven@lerner.co.il'
</%args>
We can rewrite footer.comp in this way:
<!-- begin component: footer.comp -->
<address>
<a href="mailto:<% $address %>"><% $name ? $name : $address %></a>
</address>
<%args>
$name => ""
$address => 'reuven@lerner.co.il'
</%args>
<!-- end component: footer.comp -->
Finally, we can rewrite the top-level component, output.html, to pass a name while leaving the address at its default value:
<HTML>
<Head><Title>Title</Title></Head>
<& header.comp &>
<P>This is the body</P>
<& footer.comp, "name" => 'Reuven' &>
</HTML>
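The header can be given an argument in the same way. As a sketch, assuming a hypothetical $title argument that is not part of the original header.comp, the header might become:

```mason
<!-- begin component: header.comp -->
<Body bgcolor="#FFFFFF">
<H1><% $title %></H1>
<%args>
$title => 'This is a header'
</%args>
<!-- end component: header.comp -->
```

A top-level component could then write <& header.comp, "title" => 'Welcome' &> to override the headline, or keep the plain <& header.comp &> call to get the default text.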
Experienced mod_perl programmers might like the idea of the components Mason provides. However, there are times when it is easiest to accomplish something by reaching into the guts of Apache and working with the mod_perl request object, traditionally called $r.
Mason provides each component with a copy of $r, so we can work with the internals of the server. For example, we can send an HTTP Content-type of “text/html” by using the content_type method:
$r->content_type("text/html");
Because <%perl> sections are invoked before the actual HTTP headers are returned, Mason components can modify all response headers in this way, including working with HTTP cookies.
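For instance, a component could set a cookie along these lines. This is only a sketch: header_out is the mod_perl 1.x request method for setting a response header, and the cookie name, value and path are invented for illustration.

```mason
<%perl>
# Runs before the headers are sent, so both calls take effect
$r->content_type("text/html");
$r->header_out('Set-Cookie' => 'visited=yes; path=/mason');
</%perl>
<HTML>
<Head><Title>Cookie example</Title></Head>
<Body>
<P>A cookie named "visited" was sent along with this page.</P>
</Body>
</HTML>
```

Keeping the <%perl> section at the very top of the component makes it obvious that the header changes happen before any output.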
A similar object, called $m, is specific to Mason and allows us to invoke methods having to do with Mason components and development. For example, we can retrieve the contents of a component with the $m->scomp method. The manual page at HTML::Mason::Devel lists many more methods that can be invoked on $m.
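As a small illustration, assuming the footer.comp component shown earlier sits in the same directory, $m->scomp can capture a component's output in a variable instead of sending it to the browser:

```mason
<%perl>
# scomp takes the same arguments as <& &>, but returns the output as a string
my $footer = $m->scomp('footer.comp', name => 'Reuven');
my $size   = length $footer;
</%perl>
<P>The footer is <% $size %> characters long.</P>
<% $footer %>
```

Once the output is in a string, the calling component is free to inspect or modify it before deciding whether to display it.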
Mason gives us two sections, <%init> and <%once>, in which to run Perl code at the beginning of a component's execution.
An <%init> section is evaluated before any <%perl> sections and before any other Perl code in the component. This gives the component a chance to define variables and retrieve information about its environment. In effect, <%init> behaves like a <%perl> section at the very top of the component, except that it can be placed anywhere in the file. Traditionally, <%init> sections are placed near the bottom, along with the other special sections.
An <%init> section is evaluated each time a component is invoked. However, there are items that need to be created only the first time a component is invoked, rather than every time. Such items can be put in a <%once> section. Lexicals and subroutines declared within a <%once> section remain throughout the life of the component, making them particularly useful for initializing the component's state. However, <%once> sections are not evaluated within the context of a Mason request, which means they cannot invoke other components.
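The difference between the two sections can be seen in a small sketch such as the following, in which the counter variable is invented for illustration:

```mason
<P>This server process has run this component <% $count %> time(s).</P>
<%once>
# Evaluated a single time, when the component is first loaded
my $count = 0;
</%once>
<%init>
# Evaluated on every request, before the rest of the component
$count++;
</%init>
```

Because each Apache child process loads its own copy of the component, every child keeps a separate count; reloading the page may therefore show different numbers depending on which child handles the request.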
Mason components that connect to a relational database with Perl's DBI often use a combination of <%once>, <%init> and $m to reuse database handles. We can do the following, for example, as suggested in the Mason documentation:
<%once>
my $dbh;    # Declare $dbh only once
</%once>
<%init>
# If this is the first time we're running,
# connect to the database
if ($m->current_comp->first_time) {
    $dbh = DBI->connect("DBI:mysql:$database:localhost",
                        $username, $password)
        || die qq{DBI error from connect: "$DBI::errstr"};
}
</%init>
While Mason components can create headers and footers using the <& &> syntax we saw above, it becomes cumbersome to put such sections inside each top-level component we create. For this reason, Mason supports two special kinds of components, one called autohandler and the other dhandler.
If an autohandler component exists, it is invoked before each component in the directory. That is, the autohandler is invoked and can produce HTML output of its own before retrieving the component that was actually requested, with $m->call_next. For example, the following autohandler will put a uniform title and footer on each document in its directory:
<HTML>
<Head><Title>Welcome to our site!</Title></Head>
<Body>
<% $m->call_next %>
<hr>
<address>webmaster@example.com</address>
</Body>
</HTML>
dhandler, by contrast, is invoked if a component does not exist. In some ways, this allows us to rewrite the “404--No such file” error message that web sites often produce.
While autohandlers normally influence only their own directories, dhandlers affect all subdirectories. Thus, a dhandler in /foo will affect all documents in /foo/bar, but not in /bar. However, an autohandler in /foo will not affect items in either /foo/bar or /bar.
Now that we have seen how Mason can work for some simple tasks, let's look at some components I wrote for creating slide shows. Such presentations will not have the fancy wipes and graphics available with Microsoft's PowerPoint, but are more than adequate for most technically oriented groups.
The slide show component consists of an autohandler, a dhandler and one or more slides (text files) written in HTML. Each slide consists of a piece of HTML that will be stuck inside the <Body>. For example, the following could be a slide:
<H1>Short Presentation</H1>
<P>This is my short presentation.</P>
Inside the autohandler (Listing 2; see Resources) we have a <%once> section that defines several constants we will reuse, as well as @slides, an array containing the list of slides. For example, here is the value of @slides from a talk I recently gave:
my @slides = qw(start whoami free-software just-in-time
                databases mysql postgresql cgi mod_perl
                templates text::template minivend
                minivend-example mason mason-example
                mason-autohandler php jsp zope acs xml
                conclusion);
By reordering the file names within @slides, I change the order of my presentation, and by adding elements to @slides or removing them, I can change the length of the presentation.
The autohandler uses $m->scomp, described earlier, to retrieve the HTML associated with a slide. It uses this to retrieve any headline (in <H1> tags) it might find within the slide and uses the headline in the <Title> tag.
In addition, the autohandler produces links for the “previous” and “next” slides. We do this by getting the index of the current slide and retrieving the names from the array:
my $previous_slide = $slides[$current_slide_index - 1] || $slides[0];
my $next_slide = $slides[$current_slide_index + 1] || $slides[0];
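It is worth seeing how these two expressions behave at the ends of the array. On the first slide, the index -1 refers to the last element in Perl, so “previous” wraps around to the final slide. On the last slide, the index runs past the end, the lookup yields undef, and the || fallback selects the first slide. The following standalone sketch, using a shortened slide list and a hypothetical neighbors function, shows this outside of Mason:

```perl
use strict;
use warnings;

# A shortened slide list, for illustration only
my @slides = qw(start whoami conclusion);

# The same two expressions as in the autohandler, wrapped in a function
sub neighbors {
    my ($current_slide_index) = @_;
    my $previous_slide = $slides[$current_slide_index - 1] || $slides[0];
    my $next_slide     = $slides[$current_slide_index + 1] || $slides[0];
    return ($previous_slide, $next_slide);
}

for my $index (0 .. $#slides) {
    my ($previous, $next) = neighbors($index);
    print "$slides[$index]: previous=$previous next=$next\n";
}
```

In other words, the presentation forms a loop: going back from the first slide lands on the last one, and going forward from the last slide returns to the first.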
Once we have the names of the previous and next slides, we can retrieve their headlines, making for attractive “previous” and “next” links:
# Grab the headline from the next component
my $next_headline = $next_slide;
my $next_contents = $m->scomp($next_slide);
# Non-greedy match, so only the first headline is captured
if ($next_contents =~ m|<H1>(.+?)</H1>|is) {
    $next_headline = $1;
}
One of the nice things about using this autohandler for slides is that it allows me to reorder or modify a talk by shifting the names of the files.
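The headline extraction can also be tried outside of Mason. The following sketch wraps the pattern match in a hypothetical headline function that takes the slide's HTML as a plain string, standing in for the output of $m->scomp. It uses a non-greedy match so that only the first headline is captured, and it falls back to the slide's name when no <H1> is present:

```perl
use strict;
use warnings;

# Return the text of the first <H1> element, or the fallback if there is none
sub headline {
    my ($html, $fallback) = @_;
    return $html =~ m|<H1>(.+?)</H1>|is ? $1 : $fallback;
}

my $slide = "<H1>Short Presentation</H1>\n<P>This is my short presentation.</P>";

print headline($slide, 'start'), "\n";
print headline('<P>No headline here</P>', 'whoami'), "\n";
```

The /i modifier lets the pattern match <h1> as well as <H1>, and /s lets a headline span more than one line.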
In addition to the autohandler, I installed a dhandler to take care of mistaken filenames:
<HTML>
<Head><Title>Error: No such page</Title></Head>
<Body BGCOLOR="#FFFFFF">
<P>Sorry, but the page <i><% $r->filename() %></i> does not exist.</P>
</Body>
</HTML>
Mason provides an environment balanced nicely between simple, easy-to-use templates and the complex, powerful underpinnings of mod_perl. If you ever considered using mod_perl on your site, but were scared away by the complexity, consider looking into Mason. Not only is Mason free software—a good thing, for a variety of reasons—but it is a proven tool that makes web development significantly easier than many of its counterparts. I hope to do much development in Mason over the coming months, and hope to share many of my experiences and code as I grow to enjoy this new tool.
Reuven M. Lerner, an Internet and Web consultant, recently moved to Modi'in, Israel following his marriage to Shira Friedman-Lerner. His book Core Perl will be published by Prentice-Hall in the spring. Reuven can be reached at reuven@lerner.co.il. The ATF home page, including archives and discussion forums, is at http://www.lerner.co.il/atf/.