Choosing Tools
If someone asked you to name the best car on the market, you'd probably tell them the answer depends on who will use the car. After all, a family of eight living in Manhattan probably needs a different type of vehicle from a single hacker living in rural North Dakota. The same is true for programming languages and development toolkits. Each has its place and is appropriate for solving different sorts of problems.
Although this might seem obvious, many programmers believe the language or toolkit they use is so good it should be used to solve all problems, all of the time. As the old saying goes, if your only tool is a hammer, every problem looks like a nail. No programming language is the best fit for all problems, which is why experienced programmers know and use a variety of languages and constantly are learning new ones.
Until only a few years ago, programmers were largely concerned with optimizing their programs for speed and memory usage. After all, processors were relatively slow, and RAM was fairly expensive, so any program that didn't try to squeeze the most out of the hardware was seen as bloatware.
Today, however, we are blessed with cheap, fast computers and cheap, plentiful RAM. This means that software engineers can and should use languages that encourage rapid development and long-term program maintenance. I'm not against optimizing programs for speed or memory usage, but such optimizations are less important than creating stable, maintainable software quickly and easily.
About two years ago, I decided I would devote a long series of columns to four basic web development technologies: mod_perl/Mason, J2EE, Zope and OpenACS. These technologies are not only interesting and useful but thought-provoking as well, providing new perspectives and ideas for web developers. And although occasional arguments arise among these communities about which product is superior, the fact is that each tries to solve a slightly different type of problem.
This month, we take some time to summarize the ideas and frameworks that we've explored over the last few years. I don't expect that everyone reading my column will jump to use all of the technologies I name here; however, I do hope to provide even the most die-hard aficionados with some food for thought.
Apache is deservedly one of the poster children of the Open Source movement. It is reliable, highly configurable, well documented, stable and extensible. You can do amazing things with Apache and can customize it in any number of ways to fit your needs. If you are writing a web application that must execute as quickly as possible, you can write a new module in C that seamlessly hooks into Apache.
Although C programs execute quickly and Apache libraries (now known as the Apache Portable Runtime) provide a great deal of useful infrastructure and support for module authors, development in C is slower and more prone to bugs than working with a high-level language such as Perl or Python. So it shouldn't come as any surprise that there are Apache modules that embed these languages inside of the Apache server. mod_perl allows you to write Apache modules in Perl, rather than C, giving you nearly unlimited control over your server with all of the speed and flexibility of Perl.
And indeed, mod_perl comes to mind whenever someone asks me to create a high-performance web application, particularly one that involves text processing or a relational database. I could write the module in C, but why bother? There are times when it makes sense to code in C, but I've generally found mod_perl to be fast enough for even high-powered applications.
Of course, the wonder of mod_perl diminishes somewhat when graphic designers enter the scene. Designers have no interest in modifying program code whenever they want to change the style (or content) on a given page, and giving them access to the source code of Perl modules is asking for trouble. Thus, dozens of different templating systems are available, each of which allows you to mix Perl and HTML in a slightly different way. One of the most popular is Mason, which has been used on a large number of heavy-duty publishing sites for years.
Mason is indeed a wonderful tool, and it provides an excellent trade-off between rapid development (thanks to Perl), easy maintenance (thanks to Mason) and fast execution (thanks to mod_perl). The Mason e-mail list is a source of useful information and support, and the package maintainers have done an admirable job of improving it steadily over time. Configuring, using and debugging modern versions of Mason make the versions that I first used several years ago appear primitive in comparison.
At the same time, Mason is an infrastructure and framework for creating your own applications. True, you easily can use Apache::Session to generate cookies and associate users with a unique ID, but anything having to do with user registration, groups and permissions, let alone full-fledged applications, is your responsibility to implement. For some projects, this is just fine, because it gives you the flexibility you may need. But the fifth time you find yourself building a system for creating and managing users, groups and permissions, you may decide you need something with a bit more infrastructure.
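To give a sense of how little the session layer itself covers, here is a minimal sketch in Python of the "generate a cookie and associate the user with a unique ID" step. It is an illustration only, not Apache::Session's API: the names SESSIONS, start_session and load_session are invented for this example, and the in-memory dictionary stands in for whatever persistent store a real site would use.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store; a real site would persist this.
SESSIONS = {}


def start_session(user_data=None):
    """Create a session and return (session_id, Set-Cookie header value)."""
    session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
    SESSIONS[session_id] = dict(user_data or {})
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return session_id, cookie["session_id"].OutputString()


def load_session(cookie_header):
    """Return the session dict for an incoming Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)
```

Everything beyond this point (registration forms, group membership, permission checks) is exactly the part that an infrastructure-only framework leaves for you to write.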
Sun has been pushing Java as a server-side solution for several years now, and J2EE (Java 2, Enterprise Edition) is the umbrella for a variety of technologies that are meant to help developers create such solutions. Servlets are classes that execute code on a server; JavaServer Pages (JSPs) are Java/HTML templates that are compiled into servlets on the fly. JDBC allows you to access the database, and Enterprise JavaBeans provide you with transactions and automatic relational-to-object mapping. Entering the world of Java requires learning a huge number of acronyms and technologies, as well as learning the various versions for different standards.
I've been working with Java at various times since it was first released, and on nearly every occasion, I find myself wanting to get excited about it but being unable to do so. Java isn't bad, per se, and the different technologies it brings to the table are all rather impressive. Servlets are easy to write; JSPs (and especially the custom tags you can create for JSPs) are a mature and impressive templating system, and JDBC provides everything you would ever want in a database interface. And although EJB is undoubtedly overkill for most projects, it is extremely useful for the big enterprise development groups that Sun is targeting. In addition, multiple implementations, including fine open-source application servers and tools, are impressive and encouraging.
Indeed, Java seems to be the “big company” of the web development world. It gets things done reliably; it has an enormous array of talent at its disposal and follows a huge number of standards; oodles of development tools are available, and a lot of people are using Java. But the overhead associated with Java projects is too large for my liking. Simply learning which version of which standard goes with which version of which Jakarta subproject can take a fairly long time. Just as it's typically more fun to work at a small company than a large one, I find it more interesting to program in Perl or Python than in Java.
Moreover, J2EE suffers from problems similar to those I described with mod_perl and Mason, namely the fact that it's purely infrastructure, without any attention paid to built-in applications. Developers can create amazing things but must reinvent the wheel for each project.
Perhaps my favorite part of the Java world is the attention to maintainable and reliable software. A fair number of testing and development tools, such as Ant, Cactus, JUnit and log4j, make it possible (and even relatively straightforward) for programmers to create and manage comprehensive testing of software before it is released.
So, is Java a good choice for web development? I would argue that the larger your project, the more seriously you should consider Java. But for the typical basic web application that small shops work on, the overhead associated with development is too great to ignore.
Zope is clear proof that open-source software does more than imitate its proprietary competition. Zope combines an object database with a multiprotocol server, hooking them together with a rich set of objects and a slick web-based management interface. Zope is innovative, clever, a pleasure to work with and one of the rare open-source projects designed with end users, not just hackers, in mind. Graphic designers love to hear that they can revert to any previous version of a document by using the undo feature in the web-based management interface.
Zope has a number of programming interfaces, each of which trades off simplicity for power. You can create simple DTML templates and Python scripts, use the fascinating ZPT templates that completely separate programs from the display logic, or you can go all the way and create a new Zope product. Zope products are where the real power is, and because each product is a class, you can create multiple instances of your product at different URLs. Because objects inherit via the URL hierarchy (acquisition) in addition to their native object hierarchy, the permissions, behavior or look and feel of a product instance can vary according to its URL.
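The idea behind acquisition can be sketched in a few lines of plain Python. This is a toy model, not Zope's implementation (which is considerably richer): the Node class, the acquire method and the traverse function are all invented for this illustration. It shows only the core idea that an object found at a given URL looks for a setting on itself first and then on each of its containers, up to the root.

```python
class Node:
    """A toy object that lives at one path segment of a URL."""

    def __init__(self, name, parent=None, **attrs):
        self.name = name
        self.parent = parent
        self.children = {}
        self.attrs = attrs
        if parent is not None:
            parent.children[name] = self

    def acquire(self, key):
        """Look up key here, then in each container up to the root."""
        node = self
        while node is not None:
            if key in node.attrs:
                return node.attrs[key]
            node = node.parent
        raise KeyError(key)


def traverse(root, url_path):
    """Walk a URL path such as '/news/sports' starting at the root node."""
    node = root
    for segment in url_path.strip("/").split("/"):
        if segment:
            node = node.children[segment]
    return node


# Hypothetical site layout: settings on a folder apply to what is below it.
root = Node("", theme="plain", can_edit=False)
news = Node("news", root, theme="newspaper")
sports = Node("sports", news)
staff = Node("staff", root, can_edit=True)

print(traverse(root, "/news/sports").acquire("theme"))  # newspaper
print(traverse(root, "/staff").acquire("can_edit"))     # True
```

In this sketch, the object at /news/sports defines no theme of its own, so it picks up the one set on /news. Moving the same object under a different folder would change its look and feel without touching its code.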
So far, it sounds like Zope is the best thing that happened to the Web since HTTP. And indeed, the growing number of Zope hackers means a large number of products are available for free download, as well as a growing number of commercial products that use Zope as their underlying infrastructure.
However, Zope has a few problems, the first and biggest one being that the learning curve can be rather steep. Even if you're an experienced web developer, Zope requires that you re-learn nearly all of the concepts from scratch, changing almost all of the habits you've acquired over the years. This isn't necessarily a bad thing, as Zope implements it so well, but it can be a surprise and a reason to be wary, simply because using Zope inevitably will slow things down during the initial startup period.
The other issue I have with Zope is its object database. Object databases historically have had a lot of problems, but ZODB appears to be bucking that trend nicely. At the same time, relational databases are still pretty standard, and people expect (and often need) to work with them. In theory, this isn't a problem. Zope's built-in ZSQL methods allow you to do fascinating things with relational database queries without thinking very hard. The problem then is that your data is split across two different locations: ZODB and your relational database. I like to keep all of my data in one central location, which means this split can frustrate me somewhat.
There is also the issue of speed. Zope's sophisticated permissions and acquisition mechanism is probably faster than you or I could implement on our own, but it still can be relatively sluggish. The official Zope solution for this problem is ZEO (Zope Enterprise Objects), which allows multiple Zope servers to access a single object database. This apparently scales to one million hits per day, which is more than adequate for most of the sites I work on. But exceptionally large sites may need to worry about how quickly Zope operates or, alternatively, consider investing in some high-end hardware for the central ZODB server.
Finally, Zope products tend to be relatively independent. The good news is that this allows developers to work in parallel, without slowing each other down. The bad news is that things are not as unified as they could be, with repeated functionality and a lack of coordination. This may be inevitable in an open-source project of this magnitude, but it can be frustrating at times.
Over the last year or two, Zope Corporation has been pushing the use of Zope for content management, rather than for application development. Of course, any decent content management system needs to be modified and reprogrammed for the needs of the customer, so the difference isn't that pronounced. Although Zope is not the only open-source content management system on the market, it is undoubtedly one of the most sophisticated, as well as one of the most mature.
In my own work, I pitch Zope to clients whose projects will involve a fair amount of tricky development, to those that require a relatively easy-to-use interface or to those that require content management. I continue to be impressed by it and look forward to working with Zope quite a bit in the years to come.
OpenACS began as a sophisticated data model for community web sites, along with a large number of web/database templates written in Tcl. Over time, it has grown into a much larger project with a number of facets: independent packages that can upgrade both programs and the data model, the ability to work seamlessly with either Oracle or PostgreSQL and a sophisticated templating system that separates programs from the HTML output. And, OpenACS comes with a huge number of prebuilt applications that do about everything you would want for a community web site, from weblogs to fora and FAQs to an ecommerce store. With nothing more than your web browser, you can create a site in very little time.
And indeed, I find myself recommending OpenACS again and again to nonprofits that want to create on-line communities, reach out to their constituents, conduct discussions among the members or publicize information easily, without needing to know much in the way of technology.
That said, OpenACS has a number of issues. First and foremost is the learning curve. Zope's learning curve is difficult because there are so many technologies to understand. OpenACS has a much simpler model, but it stores absolutely everything in a relational database. This means the data is all in one place, but relational databases are notoriously bad at dealing with hierarchies, and all of the clever OpenACS developers in the world cannot mask that.
Thus, learning to work with OpenACS requires that you learn how to implement a simple object inheritance system and the extensive API that allows you to do it. If you haven't ever written stored procedures or worked with databases containing dozens or hundreds of tables, then you may be overwhelmed by the knowledge necessary to work with OpenACS.
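A small sketch shows why hierarchies take extra work in a relational database. It uses Python's built-in sqlite3 module with an in-memory database. The object_types table and the type names in it are made up for this illustration and are not the actual OpenACS data model. The sketch also assumes a database that supports recursive queries, which not every database offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical type table: each row names its parent type (NULL at the root).
conn.executescript("""
    CREATE TABLE object_types (
        name   TEXT PRIMARY KEY,
        parent TEXT REFERENCES object_types(name)
    );
    INSERT INTO object_types VALUES ('object', NULL);
    INSERT INTO object_types VALUES ('party', 'object');
    INSERT INTO object_types VALUES ('person', 'party');
    INSERT INTO object_types VALUES ('user', 'person');
""")


def ancestors(conn, type_name):
    """Return type_name followed by its supertypes, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(name, parent, depth) AS (
            SELECT name, parent, 0 FROM object_types WHERE name = ?
            UNION ALL
            SELECT t.name, t.parent, chain.depth + 1
            FROM object_types AS t
            JOIN chain ON t.name = chain.parent
        )
        SELECT name FROM chain ORDER BY depth
    """, (type_name,))
    return [row[0] for row in rows]


print(ancestors(conn, "user"))  # ['user', 'person', 'party', 'object']
```

In an object-oriented language, the same question is answered by the language itself. Here, even finding a type's ancestors needs a dedicated query, and looking up inherited attributes or permissions would need more of the same.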
OpenACS also suffers from having little documentation for developers and none for end users. OpenACS is admittedly a complex system that can be difficult to describe to nontechnical people, but it can be maddening to find nothing to help with that. To the community's credit, the main openacs.org site was remodeled and rewritten shortly before I wrote this article, and it seems to have made some positive headway in this direction.
Finally, I find it somewhat ironic that OpenACS has become increasingly sluggish over time. Granted, this is because the latest version (4.x, as of this writing) is far more clever about users, groups and permissions than its predecessors, and checking these things with each HTTP request takes time. In addition, the developers know many optimizations still can be made, such that each request doesn't require quite so many database queries.
Several days before I wrote this article, a new report appeared on the Web describing how Yahoo has settled on PHP as a web programming environment. I personally prefer to work in other languages and wouldn't relish the idea of rewriting all of Yahoo in a new language. But for Yahoo's particular needs, it seems like PHP is indeed a good choice. I give them credit for considering all of the options, weighing the pluses and minuses and coming to a conclusion that fit their needs.
As I said at the beginning of this article, I am a firm believer in finding a technology that meets the needs of the problem at hand. As a developer, this means I'm constantly forced to learn new languages, technologies and techniques. At the same time, this means my clients can get a solution that's appropriate for their problems, and I gain broader skills and depth as a software engineer.
The fact that these systems are available free of charge via the Internet means that the only things stopping you from trying them are time and some effort. I strongly encourage you to find the time to work with them; you and the people you work with will undoubtedly enjoy the results.
email: reuven@lerner.co.il
Reuven M. Lerner is a consultant specializing in web/database applications and open-source software. His book, Core Perl, was published in January 2002 by Prentice Hall. Reuven lives in Modi'in, Israel, with his wife and daughter.