Data Modeling with DODS
Relational databases form the backbone of most serious web applications because they make data storage and retrieval easy, safe and flexible. This setup normally works wonderfully, until developers begin to write their programs using objects, which have a completely different paradigm. Is there any way for us to close the gap between the object and relational worlds?
Actually, many ways exist to map relational databases to objects and methods, and most programmers have found themselves writing such systems on their own. As we saw last month, Perl programmers can get some assistance from Alzabo, a module that helps programmers to create tables and then provides a method-based interface to manipulate them.
This month, we will look at DODS (Data Object Design Studio), a tool similar to Alzabo in spirit, except that it is aimed at Java programmers. DODS is a central part of Enhydra, whose upcoming version (Enhydra Enterprise) is expected to be the first open-source application server to implement J2EE (Java 2, Enterprise Edition).
Right now, Enhydra Enterprise is still in beta testing, and while the DODS support appears to have improved since the first versions, I was told by David Young, the Enhydra evangelist at Lutris, that DODS from Enhydra 3.x is more stable. So that I could try things more easily, Lutris sent me a copy of EAS (Enhydra Application Service), their commercially supported and enhanced version of Enhydra.
I'm not entirely sure what the differences are between EAS and the open-source Enhydra server; enhydra.org indicates that EAS is “based on” Enhydra, but the differences between EAS and buying a copy of Enhydra are not at all obvious. I will assume that the copy of EAS I installed is largely similar to Enhydra 3.x, but this might not be an accurate assumption.
DODS, like the Perl Alzabo system we discussed last month, has two goals: to help design databases with a high-end tool and to provide a set of objects and methods for working with that database. While Alzabo is a server-side web application, DODS is a client-side application written in Java that lets you define and edit your database definitions.
The goal of DODS is to create parallel SQL definitions and Java classes that describe the same relational database. You then feed the SQL definitions into your database and use the Java classes to access them.
Moreover, DODS is designed to work with different databases; at present, it works with PostgreSQL, MySQL, Sybase and Oracle, but it may work with more in the future. Since the actual SQL queries are written by an object middleware layer, this means you can move your Enhydra programs from one database to another without having to rewrite them. In reality, of course, things are more complicated. For example, Enhydra's support of PostgreSQL is not very impressive, ignoring the SERIAL data type (which is really just a sequence) and unaware of referential integrity constrations, such as foreign keys. Nevertheless, the goal is an admirable one, and I'm looking forward to seeing how Enhydra 4.x handles this. As time goes on, I expect that DODS will continue to improve its support for different databases, creating appropriate queries for individual SQL dialects.
Let's create a simple web/database application using Enhydra and DODS to demonstrate how it will work. I'll use PostgreSQL in my examples, both because it's an excellent open-source database and because DODS supports it. Our example, as has been the case for the last few months, will be a simple weblog (or blog) application that displays entries in a database table in reverse chronological order. Writing such a program is not particularly difficult, making it all the more attractive as a test of DODS and Enhydra.
Our first stop will be the Enhydra appwizard, which creates the directories and skeleton files for our basic Enhydra application. The appwizard is in $ENHYDRA/bin, where ENHYDRA is an environment variable naming the location of your Enhydra installation. (When I installed it from CD using RPMs for my Red Hat Linux box, ENHYDRA was set to /usr/local/lutris-enhydra3.5.2.)
On the first appwizard screen, I was allowed to choose between a standard web application and an Enhydra super servlet. I chose the latter. On the next screen, I chose an HTML project (rather than a wireless WML project), named the project “blog” and put it in the il.co.lerner package class. I also accepted the default home for Enhydra applications, ~/enhydraApps/. I chose not to associate a copyright with my source code and then clicked Finish, which created 18 new files in ~/enhydraApps/blog.
Now that I've created the skeleton for my application, I'll modify the default Welcome page that comes with Enhydra. We will have to do this in two parts; first, we'll modify the HTML file Welcome.html, which my computer placed in ~/enhydraApps/blog/src/il/co/lerner/presentation/Welcome.html.
Note that this file consists not just of HTML, but of tags that will be processed by XMLC (see At the Forge in the August 2001 issue of Linux Journal to see just what that means). We'll change it to display the latest information from our weblog, rather than the standard page, as you can see in Listing 1. The only difference between an XMLC document and a standard page of HTML is we mark the parts we want to modify inside of <span> tags, with an id attribute. For example:
<p><b><span id="date">Date</span><b></p> <p><span id="text">Text</span></p>
Listing 1. Welcome.html, the XMLC Input File We'll Use to Display Our Weblog
If we display this file literally in our web browser, we'll see the words Date and Text. But users will not retrieve this HTML document directly. Rather, XMLC will turn our document into a Java class. We will then use the WelcomePresentation class to create an instance of the document, using automatically generated methods to modify the date and text sections.
WelcomePresentation, however, will fill in the date and text sections with information that it retrieved from a relational database table. We will have to create the table, as well as populate it with some information, before we can continue.
This is where DODS enters the picture. The DODS program, which is in $ENHYDRA/bin/dods, is another graphical client-side program written in Java. When working with DODS, we have to remember that we're using a bridge between two different paradigms, and the terminology may often be a bit weird.
DODS begins with a package, which is a container for all of the tables and other attributes that we will create. By default, as you can see from the initial DODS window, the package is named root; I changed this to blog by clicking on the root folder and selecting Package from the Edit menu.
We will create a database table (BlogEntries) with two attributes: the date and the contents. (These match the two id tags that are in our version of Welcome.html.) We first add a new table to BlogEntries, using the Insert menu, choosing “data object” and giving it the name BlogEntries.
Next, we must add two fields (date and text) to our table. In order to do this, click on the BlogEntries name in the upper left-hand corner of the DODS window, and use the Insert menu to add each new attribute. Both of our attributes will be of type “varchar”, meaning that we'll treat them both as text. And while this will work perfectly well for our purposes, it's unfortunate that DODS doesn't support the PostgreSQL TIMESTAMP data type, a tool that allows us to perform all sorts of clever and sophisticated manipulations on dates and times. So we'll store the dates in text-only format, knowing that we can use ORDER BY to retrieve them in order—and not much more than that.
Because we want to ensure maximum speed for our web/database application, and because we'll be inserting text much less frequently than we retrieve it, we will tell DODS to make both of our fields indexable and queryable. The former will modify the SQL defintion, ensuring that an index is created for the attribute. The latter will create additional methods that allow us to retrieve information based on this column.
Finally, we choose PostgreSQL from the Database menu, telling DODS to create PostgreSQL-style SQL, rather than any other database.
Once we're done creating our table with DODS, we export it into DOML (using File Save as), an XML format that describes our tables and can be used to generate both the SQL and Java. Make sure to save the DOML file into the source directory within your project and package; thus, I put it in blog/src/il/co/lerner. Once our DOML file is complete (see Listing 2), we can turn it into Java and SQL with the build all command from the File menu. Building everything results in the creation of a number of files within the data directory, so when you're asked where you want to build things, select the data directory in which you stuck the blog.doml file. A window will pop up, describing what DODS is doing at each moment. If everything is successful, you can exit from DODS.
Listing 2. blog.doml, the Automatically Generated File that DODS Creates
Running build all from DODS turns the DOML file into a set of files under the data directory. This directory now contains not just a Makefile (previously empty), but also the blog subdirectory in which the following four Java classes now exist: BlogEntriesDO, a data-object class similar to Enhydra's presentation objects for displaying information; BlogEntriesDataStruct, which actually contains the data from our table; BlogEntriesDOI, an interface for the BlogEntriesDO class; and BlogEntriesQuery, which allows us to query fields we've described as queryable.
In addition to the Java source code that was created, we also have several files containing SQL. In particular, the data directory contains create_tables.sql, which we can use as input to create our database.
To do this, we use the CREATEDB command, which is normally executed by an authorized PostgreSQL user from the UNIX shell. (This does not necessarily mean the root user; check your PostgreSQL installation documentation to see how to create PostgreSQL superusers.)
We thus say
CREATEDB blog
We can then query that database interactively with
psql blogTo create our tables using the automatically generated DODS script, we use the \i command from within psql:
\i /home/reuven/enhydraApps/blog/src/il/co/lerner/ data/create_tables.sqlYou should see several CREATE messages, and then the psql prompt once again. Using the \d command, we see that DODS actually didn't create a table named BlogEntries. Rather, it created two tables, one named objectid and the other (primary) table named newdbtable, which is the default. The objectid table comes in place of a PostgreSQL sequence, demonstrating one limitation of this sort of generic tool, and has a column “next” that indicates which ID comes next. We thus insert information into the newdbtable table, adding a new row to objectid each time we do so.
Let's add several items to our table so that we'll then be able to retrieve them:
INSERT INTO newdbtable (entrydate, text, objectid, objectversion) VALUES (NOW(), 'First blog entry', 1, 1); INSERT INTO objectid ( next ) VALUES ( '2' ); INSERT INTO newdbtable (entrydate, text, objectid, objectversion) VALUES (NOW(), 'Second blog entry', 2, 1); INSERT INTO objectid ( next ) VALUES ( '3' );
Now that we have two blog entries inserted, we can exit psql and go onto modifying our Java class.
Our WelcomePresentation.java class is where most of the magic happens. It creates instances of both Welcome.html and of our database objects, retrieves the query results and then inserts those results into the HTML file.
After you modify WelcomePresentation.java, as shown in Listing 3 [available at ftp.linuxjournal.com/pub/lj/listings/issue91/5426.tgz], run a make from the top-level directory for this project. Enhydra will compile your Java classes, double-check that everything is where it should be and get your application ready to run.
Notice in Listing 3 that we had to modify the definition of the “run” method such that it returns two new exceptions: NonUniqueQueryException and DataObjectException. These are generated by the various data objects that we've created, and since we aren't going to catch these exceptions, we must indicate to the caller that we may raise them.
Listing 3 uses the Enhydra QueryBuilder to create an SQL query using methods created by DODS. We first create an instance of BlogEntriesQuery, one of the automatically created classes:
BlogEntriesQuery blogq = new BlogEntriesQuery();
We want to retrieve all rows until now, in reverse order by entryDate:
blogq.setQueryEntrydate("NOW()", QueryBuilder.LESS_THAN); blogq.addOrderByEntrydate(false);There are also methods for adding WHERE clauses to our SQL query, letting us create arbitrarily complex SQL queries.
Finally, we retrieve an array of matching rows, each of which is represented by a BlogEntriesDO object:
BlogEntriesDO[] blogEntries = blogq.getDOArray();
We're only going to display the most recent one, so we will simply get the first element of our array. We use the method in our “welcome” object, created by XMLC, to insert the appropriate text in our document:
welcome.setTextDate(blogEntries[0].getEntrydate()); welcome.setTextText(blogEntries[0].getText());Once we have modified WelcomePresentation.java, we create the application by running make from our top-level project directory. If you see any errors in your Java program, you can correct them and rerun make as often as you want.
Theoretically, you could now run the application by changing into the output subdirectory and running ./start. But our application will fail if we do this, since it doesn't yet know where to look for the PostgreSQL .jar file. In addition, it's useful to get full debugging output from Enhydra (or any application) when we're first using it, so that we can identify and fix problems more quickly.
We must modify three files in order to get things to work. First, we need to modify $ENHYDRA/bin/multiserver by adding a reference to the PostgreSQL JDBC driver's .jar file. To do this, we simply modify the multiserver (which is a shell script that invokes a Java program), changing the lines under the comment “build up classpath” to the following:
# Where is the PostgreSQL JDBC .jar file? PG_JDBC=/usr/share/pgsql/jdbc7.1-1.2.jar if [ "X${CLASSPATH}" = "X" ] ; then NEWCP=${ENHYDRA_CLASSES}${PS}${PG_JDBC} else NEWCP=${ENHYDRA_CLASSES}${PS}${CLASSPATH}${PS} ${PG_JDBC} fi
Next, we modify blog.conf. Every Enhydra project has a configuration file that tells the system which database to use, as well as a number of other properties. In my particular case, the configuration file is blog/output/conf/blog.conf and consists of a lot of name-value pairs for my application.
We must modify several parts of the “Database manager” section in order to point to our programs. You can see the full section, as it needs to be, in Listing 4 [available at ftp.linuxjournal.com/pub/lj/listings/issue91/5426.tgz].
Finally, we modify servlet.conf. Although this doesn't need to be modified, I find it useful to turn on DEBUG mode by modifying the following two definitions:
Server.LogToFile[] = EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, INFO, DEBUG Server.LogToStderr[] = EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, INFO, DEBUG
The most important thing to realize about blog.conf and servlet.conf is that they are regenerated every time you do a top-level make. So once you have modified them in this way, never do a top-level make again. You will be quite sorry (as I have been) if you do so. Rather, do a make from within the presentation directory.
Once you have modified the configuration in this way, you can go into ~/enhydraApps/blog/output and run ./start. You will see the server start up, plus a fair amount of debugging information if you activated DEBUG in servlet.conf and logging in blog.conf.
You can check out your creation by pointing your web browser to point 9000, the default port for Enhydra applications: http://localhost:9000/. If all is well, you should see the output from our weblog in your web browser.
DODS is better than Alzabo at providing a nice object layer above the SQL. Moreover, it seems to provide a better and more stable set of methods for creating queries and working with their results. However, DODS suffers from several of the same problems as Alzabo and other relational-object mapping tools. Some problems include having to learn a new way to execute all of those SQL queries you've been working with for years and complex queries that can be frustrating to write, since you have to rephrase them as method calls. Also, the generic nature of a tool like DODS means that you won't get the specific benefits of your favorite database. In the case of PostgreSQL, DODS didn't seem to take foreign keys or sequences into account, both of which would have made for more solid table designs.
DODS works well when coupled with the rest of Enhydra, though. As with XMLC and super servlets, I found DODS to be somewhat overwhelming and clunky at first, and then useful and clever. As a first stab at things, DODS in Enhydra is a good attempt to bridge the gap between the object and relational worlds. I look forward to the final version of Enhydra Enterprise, which will undoubtedly push things ahead even further.
Next month, we'll look at an increasingly standard way to not only bridge the relational and object worlds, but also to add transactional capabilities to our server-side Java applications. Enterprise JavaBeans represent services and data for web applications and have become increasingly popular among web developers who want to be able to find objects, work with them and store them to a database, without having to think too hard.
email: reuven@lerner.co.il
Reuven M. Lerner (reuven@lerner.co.il) owns a small consulting firm specializing in web and internet technologies. He lives with his wife Shira and daughter Atara Margalit in Modi'in, Israel. You also can reach him on the ATF home page, www.lerner.co.il/atf.