Data Modeling with Alzabo
Over the last few months, we have been looking at server-side Java programming from a variety of perspectives. From servlets to JSPs to the Enhydra application server, we've seen several different ways to create dynamic, database-driven web sites using open-source Java technologies.
I had originally planned to continue in that vein this month, looking at Enhydra's intriguing DODS object-to-relational modeling software. DODS provides a high-level Java abstraction layer for tables in a relational database. DODS methods are translated automatically into the appropriate SQL, which is then handed to the database. The result: you see Java objects and methods, your database sees tables and SQL, and everyone is happy.
Unfortunately, the help and goodwill expressed by folks at Lutris (the corporate backer of Enhydra) were no match for the Israeli customs service and our local branch of FedEx. The CD and book with additional explanations of DODS sit in a warehouse as I write this, forcing me to take a short detour from my original plan.
However, investigating DODS for this month's article revived my interest in the subject of object-relational mappings. One of the most interesting and easy-to-use tools that I've seen for this purpose is Alzabo, a set of Perl modules that allows server-side Perl programmers to wrap their relational database schemata inside of an object. (The project is named for a creature in the science fiction work of Gene Wolfe.) I was quite impressed by what I saw and believe that many Perl programmers will be equally happy to discover such a powerful tool.
Programmers have reaped many benefits by working with objects, from reusability to inheritance to encapsulation. But while programmers have taken to object-oriented programming in droves, object databases have been less popular for a variety of reasons. Instead, relational databases have become increasingly popular over the last few years, with huge quantities of data being placed within them. The problem, then, is how we can model our data as objects, while storing them as tables.
One possibility is to model each table as a class, each table column as an instance variable and each table row as an instance of that class. But anyone who has tried this quickly discovers it is easier said than done, particularly when creating web applications—how can we join two tables? What happens when two programs modify the same row in memory and only later commit those changes to the database? How can we ensure that changes to our class definition are reflected in the database and vice versa?
Another possibility is to read an entire table into an object instance, modifying the object and writing it out when a particular method is invoked. This works pretty well for small tables, but what happens when your tables become several megabytes (or gigabytes or terabytes) in size? Your boss might be willing to buy more memory for the web server but not if you're wasting it all reading entire tables into memory! Besides, modeling tables inside of your object means you also have to create a decent locking mechanism, complete with commits and rollbacks—something that most programmers are equipped to do.
We can easily dismiss these problems when working on a small application. But as applications and databases scale up, we want to ensure that things will work as expected. This is particularly true when creating an object-to-relational mapping system, such as Alzabo. One of my employees and I created a simple object-to-relational mapping middleware layer last year and were very happy with what we had done—until we found that we hadn't taken nearly enough corner cases into account, ending up with a mess of exceptions and default values.
Luckily for the Perl programmers among us, Dave Rolsky took the time to sit down and map out all of these problems, as well as many others. Alzabo gives us an object-oriented middleware layer that removes our need to interact directly with a database.
But Alzabo does more than provide a high-level interface to your database. It also gives you a programmatic way to modify your database schema definitions, including a browser-based table creation and maintenance tool that creates SQL for you automatically. Moreover, Alzabo can take an existing database and reverse-engineer it, allowing you to use Alzabo with existing databases as well as new ones.
Like most Perl modules, Alzabo is available for download from CPAN. However, installing Alzabo can be more complicated than other modules, simply because Alzabo depends on many modules. Not only does Alzabo require the use of DBI (for database access) and either DBD::mysql or DBD::Pg (for PostgreSQL), but the browser-based schema-creation tool uses HTML::Mason, which in turn requires mod_perl. If all of these are installed on your system, then installing Alzabo should be relatively straightforward.
I was able to install Alzabo without too much difficulty, using the CPAN modules to download and install automatically each of the prerequisites and then Alzabo itself.
I accepted the default values for almost all of the questions asked during the software's configuration and installation, with the exception of the .mhtml suffix that Alzabo assumes you use for Mason components. I normally give Mason components the simple .html suffix; because my Apache configuration didn't know what to do with the .mhtml extension, it sent them as Content-type text/plain, displaying the Mason component's source code in my browser window. Changing the suffix of the installed Mason components to .html worked on my computer, but I could have modified my Mason or Apache configuration just as easily.
Alzabo tracks each schema in its own directory, called /usr/local/alzabo by default. Inside of this directory is a schemata directory, with a single subdirectory for each of the database schemata that Alzabo is modeling. For example, the appointments schemata would be in /usr/local/alzabo/schemas/appointments.
There were two small hitches in my Alzabo installation that I had to fix. First, I had to change the permissions /usr/local/alzabo so that my web user could read and write to it. Secondly, I had to modify my PostgreSQL startup script to include the -i option, so that clients could connect via the network. By default, most PostgreSQL installations (including RPM versions) do not turn on -i, meaning that even the most liberal configuration in pg_hba.conf (the PostgreSQL host access control file) will fail to work. While you normally can connect to PostgreSQL without the network using UNIX sockets, Alzabo always specifies a hostname, which in turn requires a network connection even on the local computer.
To install the web-based schema generator, at least one directory under your Apache server must be controlled by HTML::Mason. The Alzabo installation script will create a new/alzabo subdirectory there, along with the Mason components that create and modify the schema definitions that you create. My workstation, for instance, has all of its Mason components in /usr/local/apache/mason, which is mapped to URLs beginning with /mason. The web part of my Alzabo installation is thus in /usr/local/apache/mason/alzabo, accessible via the URL /mason/alzabo. If you have not done so already, you may wish to tell Apache (via the DirectoryIndex directive) that index.mhtml is an acceptable index page for a directory.
Now that we have installed Alzabo, let's create a simple database schema using the browser-based design tool. Admittedly this is not as slick as commercial or client-side tools, but it does the job rather well.
Begin by creating a new schema (known in PostgreSQL and MySQL parlance as a database) to which you must give a name. The schema must be a legitimate database name within either PostgreSQL or MySQL. I choose to work with PostgreSQL because of its built-in referential integrity, foreign keys, views and triggers, as well as a more standard dialect of SQL and the ability to write stored procedures in a variety of languages.
Let's create a simple phone book and appointment calendar using Alzabo. We will keep track of people we know, their addresses and telephone numbers, and appointments we have scheduled with them. Using this database, we can learn about the people with whom we're meeting on a given day or about all of the appointments with a given person.
To create this schema, we point our web browser at the URL alzabo/schema under the Mason directory we mentioned earlier (on my computer, I pointed the browser to http://localhost/mason/alzabo/schema.) This brings up the schema creation/editing page that allows us to edit an existing schema, create a new one or reverse-engineer an existing one. While the last option is the most interesting, allowing you to access legacy databases using Alzabo, we will create a new schema. I entered the name (I chose addressbook, for lack of a better idea) and indicated that we wish to use PostgreSQL as our back-end database.
After clicking on “submit”, several possibilities were presented: I could add a new table to this schema, delete the entire schema or examine the SQL that Alzabo will generate automatically. Right now, of course, there isn't any SQL to display. Over time, we will see this SQL grow considerably.
However, because Alzabo has not created any SQL doesn't mean that no work has been done on the back end. Indeed, Alzabo automatically created the addressbook directory within /usr/local/alzabo/schemas, containing three files: addressbook.create.alz and addressbook.runtime.alz (both are stored in a binary format) and addressbook.rdbms, which contains the single word PostgreSQL. In this way, Alzabo tracks the database server in which the schema is stored.
Once inside the addressbook schema, I added a “People” table by entering “People” in the “add a table” text field and clicking on “submit”. (PostgreSQL ignores case in table and column names, but I like Joe Celko's convention of initial caps for Table Names, all lowercase for column names and all caps for SQL RESERVED WORDS.)
Within my People table, I created columns, each of a different data type. Alzabo offers a menu of potential data types, but we can enter our own if we want; this can be particularly useful in PostgreSQL, which allows us to create our own data types.
I generally prefer to work with synthetic primary keys in such a table, giving each row its own value. In PostgreSQL, we accomplish this using the SERIAL data type. But you will notice that no such data type exists in the Alzabo selection list. You might be tempted to indicate that this is an INTEGER column and to mark the “sequenced” check box at the bottom of the column editor. Doing so, however, will create an INTEGER column, as well as a totally unrelated PostgreSQL sequence object. Rather, to get a synthetic primary key you must manually enter SERIAL in the text field below the <select> list of column types.
An additional check box lets you indicate if a column is the primary key and automatically marks it with “pk” in column listings. And a third check box allows you to indicate if a column may contain NULL values, a subtle way of reminding new database designers that NULLs complicate life and should be avoided whenever possible.
To create a foreign key (REFERENCES) or CHECK clause, add it in the “attributes” text fields toward the bottom of the HTML form. Remember that you're only modeling the schema in Perl at this point, meaning that you will be free to add and remove such clauses in the future without having to send ALTER TABLE queries to the database. You also can create indices on one or more columns using the Alzabo editor.
You can use the Alzabo table and column editors to create many tables and columns, moving between them using a set of hierarchical menus and listings. The Alzabo display even places “<” and “>” marks next to each column, allowing you to move them relative to each other within a particular definition.
As you work with the browser-based schema editor, I suggest that you occasionally preview the SQL that Alzabo generates. Not only will this ensure that Alzabo is doing the right thing (as we saw with the SERIAL column), but it will give you a better sense of the low-level details your schema is creating.
After you have finished creating the schema, use the “execute SQL” button from within the “SQL preview” page to send your SQL to the database server. If the database server returns any errors, Alzabo will produce a lengthy and detailed error message describing what happened.
In some cases, you may need to fix your table or column definitions, while in others you may need to ensure that the server is running with the correct permissions. Also ensure that you have defined a PostgreSQL user (created with the command-line createuser program) whose name matches the username under which Apache runs, unless you explicitly name another user in the HTML form.
Once you execute the SQL from within the schema editor, you have two ways to access the data. You may, of course, access it directly using DBI (or a similar interface from another language), creating and executing SQL queries.
For example, let's assume that I have created my addressbook schema with the following table:
CREATE TABLE People ( person_id SERIAL NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, birthday DATE NOT NULL, PRIMARY KEY(person_id) );
In order to make things a bit more interesting, let's populate our table with some values:
INSERT INTO People (first_name, last_name, birthday) VALUES ('Reuven', 'Lerner', '1970-Jul-14'); INSERT INTO People (first_name, last_name, birthday) VALUES ('Atara Margalit', 'Lerner-Friedman', '2000-Dec-16');Listing 1 contains a simple Perl program that uses DBI to retrieve the names (and birthdays) of people in our addressbook who match the SQL pattern entered on the command line. (SQL patterns are much simpler than UNIX regular expressions—there are only two characters: % matches zero or more characters and _ matches exactly one character.)
We retrieve the user's input on the command line and place % signs before and after it to ensure that the string will match, regardless of where it occurs in the first_name or last_name column. Then we connect to the database, turning on AutoCommit (as the DBI documentation encourages us to do) and activating the RaiseError and PrintError diagnostic aids.
Finally, we create our SQL query in the $sql variable, making sure to use placeholders (“?”) instead of directly interpolating variables. Not only does this reduce the risk of someone messing with our SQL, but some database drivers will take advantage of our placeholders in subsequent queries, giving us a speed boost.
Let's rewrite this program using Alzabo instead of straight DBI. We won't write the SQL ourselves or connect to the database ourselves. Rather, we will create a new schema object, naming the schema that we created with Alzabo's interactive tool. This object has a number of methods that let us perform many of the tasks for which we would otherwise use DBI.
As you can see from Listing 2, there are not many differences between the two versions until we connect to the data source. In the DBI version of the program, we connected to the data source itself with DBI->connect. In Alzabo, however, we connect to a schema, which is presumably attached to a database, and assign it to the object $schema.
Listing 2. retrieve-birthday-alzabo.pl, an Alzabo implementation of the program in Listing 1.
Using $schema, we retrieve a table object associated with one of our tables:
my $people = $schema->table("People");
Now that we have an object mapped to our People table, we can retrieve selected rows from the table. The easiest way to retrieve rows is with the rows_where method. This returns a single object of type Alzabo::Runtime::RowCursor:
my $row_cursor = $people->rows_where (where => [[$people->column('first_name'), 'LIKE', $look_for_name], 'or', [$people->column('last_name'), 'LIKE', $look_for_name]]);Alzabo's WHERE clauses usually consist of a three-element list: a column object, a comparison operator and a value or second column object. We can compare the first_name column for equality with Zaphod with:
where => [$table->column('first_name'), '=', 'Zaphod']In Listing 2, we have made this a bit more complicated, linking two array references with the OR boolean operator:
where => [[$people->column('first_name'), 'LIKE', $look_for_name], 'or', [$people->column('last_name'), 'LIKE', $look_for_name]]Alzabo is smart enough to realize that the first and third elements of its WHERE clause are array references, and it turns the above code into the appropriate SQL.
Once we have our RowCursor object, we iterate through each row with the next_row method:
while (my $row = $row_cursor->next_row) { my $first_name = $row->select('first_name'); my $last_name = $row->select('last_name'); my $birthday = $row->select('birthday'); print "$first_name $last_name (birthday: $birthday)\n"; $rows_returned++; }
If Alzabo simply provided a set of methods that create SQL, it wouldn't be a very powerful tool. However, Alzabo provides caching and exception-handling as part of its suite of tools, making it easier in some ways to work with databases.
Alzabo's caching functionality keeps a table in memory rather than returning to the database server each time we request a value from it. Obviously, caching isn't appropriate for tables that change on a regular basis, but for tables that rarely change, you can activate the cache and enjoy a nice boost in speed.
You can activate caching by loading the Alzabo::ObjectCache module in your program. The RowCursor object, which we used to retrieve rows in Listing 2, returns Row objects with each iteration of the next_row method. See the documentation for Alzabo::Runtime::Row and Alzabo::ObjectCache for information about the different kinds of caches available to you, as well as the issues associated with them.
Alzabo also uses Perl's built-in exception-handling system, meaning that it invokes “die” if something goes wrong. Therefore, you should wrap your Alzabo-using programs (or individual calls within them) in “eval” blocks:
# Try to run this code eval { my $row_cursor = $people->rows_where( where => [[$people->column('first_name'), 'LIKE', $look_for_name], 'or', [$people->column('last_name'), 'LIKE', $look_for_name]]); };
You can find out if something went wrong by checking the special Perl variable $@, which is set if an error occurs within the previous eval. But Alzabo uses the Exception::Class object (available from CPAN) for more sophisticated exception-handling in Perl. The $@ variable isn't set to a text string describing the error, rather it is set to an instance of the appropriate exception class. You can thus test $@ with UNIVERSAL::isa to determine just what kind of object it is and what kind of problem occurred within your code. The Mason component common/exception, installed under the alzabo directory in your Mason-controlled Apache content directory, demonstrates how to do this in detail.
There are obviously costs associated with Alzabo, as with any tool that tries to bridge the object-relational gap. For starters, SQL is a fairly standard means for working with relational databases. Using Alzabo means you will be moving away from that standard and toward a different solution that is incompatible with anything else. I'm not opposed to new ways of doing things, and there are a number of significant advantages to using Alzabo. That said, I'm always cautious about doing old, standard things in new, nonstandard ways.
While I normally prefer to create my tables using handcrafted SQL, that technique doesn't scale above 10 or 20 tables without forcing me to scroll wildly within my Emacs buffer. Alzabo's web-based schema design tool does make it easier to keep track of a large number of tables to create relations between them and modify them. I recently spent half an hour trying to remember how Oracle's syntax was different from that of PostgreSQL and would have greatly benefitted from a tool like Alzabo.
As we saw earlier, creating complex queries based on equality isn't difficult within Alzabo, even when those queries include OR and AND operators. The Alzabo::Runtime::Table object includes a function method, which is meant for executing arbitrary SQL functions. However, I found it difficult, and in some cases impossible, to create Alzabo WHERE clauses that would let me create an SQL query based on multiple function calls. I admit that I'm relatively new to Alzabo and only tried it for an hour or two, but a query that took me 20 seconds to write in SQL shouldn't take much longer than that in Alzabo.
One of the more difficult issues when mapping objects to relational databases has to do with joins. Joins make a lot of sense when working with tables, but the meaning is less obvious when working with objects. Alzabo does have some built-in support for joins, but it is marked as being largely new and experimental.
Finally, there is also a speed trade-off associated with any middleware layer. The speed differences between Listings 1 and 2 were quite noticeable when I executed them from the command line, owing in no small part to the fact that using Alzabo imports a large number of Perl classes. In a mod_perl environment (where Alzabo is designed to shine), the speed differences will be much smaller, since much of the time is spent loading different modules from disk. Because mod_perl compiles programs only once before executing them, the speed difference between Alzabo and raw DBI calls is probably not that great.
Alzabo provides a relatively simple way to wrap objects around your relational database tables. There is a lot of good news here: the data-modeling tool is quite sophisticated, there is a large amount of nice functionality, the methods largely make sense, and the documentation is vast and generally well written. As the Open Source community has long said, using an existing, battle-tested and open tool is almost always better than rolling a new, proprietary package that solves the same problem.
But wrapping relational database tables inside of objects is always fraught with danger and problems, and Alzabo is no exception: joins are still clunky, and it's not clear how to create some queries. Alzabo isn't at fault here; it's an inherent problem when working with two technologies that see the world in different ways.
It's certainly clear that I'll be using Alzabo in the future for some of my server-side programs, particularly those that need more sophisticated caching and exception-handling than I could otherwise provide.
Next month, customs permitting, we will return to our tour of server-side Java, comparing Enhydra's DODS package with Alzabo and its kin.
email: reuven@lerner.co.il
Reuven M. Lerner owns a small consulting firm specializing in web and internet technologies. He lives with his wife Shira and daughter Atara Margalit in Modi'in, Israel. You can reach him at reuven@lerner.co.il or on the ATF home page, http://www.lerner.co.il/atf.