The Lack of a Small Unified Database
Many desktop users do not have the skills or, frankly, any real need to install and manage a multiuser database server. They still need, however, to use and exchange SQL databases. Unfortunately, many developers fail to see this situation as a problem and dismiss it, at least initially, with any combination of these responses:
It now is so easy to install PostgreSQL, MySQL or any other RDBMS server, desktop users should install one of them.
Program X also is available for Windows/Mac/whatever, so just install it.
If anything, we only need to improve the interface and/or the documentation to do the above.
Nothing technically is wrong with these arguments, but they simply don't apply to the niche sector of soho and corporate users. Nobody in this sector is whining because installing a server requires two or one mouse click instead of zero. But there is a huge difference between a perfect, completely documented install wizard for any SQL server and what those users currently are getting.
Everybody says how great file sharing is. Okay, so answer this one: how can a Linux/KOffice geek share his book or recipe database with his aunt, who is running Windows/OO.o? How can an employee send a product database from his MAC/OO.o desktop to a potential corporate customer running Solaris/KOffice? What if the receiver has no root password or permission to install extra software, the standard situation in most offices? In general, in the real world, the "just install and configure this" attitude doesn't make sense and doesn't help information to flow freely. This attitude can be just as impractical, if not impossible, as having to install a new font or print server simply to open a text file.
Currently, free software users are missing a single-file SQL standard format, which may be a tar or ZIP archive, that contains everything needed by a generic frontend to let people work: schemas, data, indexes, forms structures and so on. Such databases could be copied immediately, uploaded to a Web server or sent by e-mail, the same as any other file. Users would have the certainty that the receiver immediately could access all the data, queries and forms, even if they might look different. Above all, it would be great if such a file format became an OASIS standard, because it would make it much easier to accept in corporate or government scenarios.
In the text/spreadsheet/presentation space, the Right Thing already is happening. The two most popular free office software suites, OO.o and KOffice, are converging on the same default file format, which is an OASIS standard. This means being able to write, read and share such documents today between OO.o KOffice and tomorrow with any other OASIS-compliant application--transparently. This level of standardization also gives much more credibility and strength to Free Software.
Wouldn't it be really great and isn't it time to do the same thing for simple SQL databases? Without, of course, preventing anybody who wants a full blown RDBMS daemon from using it? Today, such databases are not covered or influenced by OASIS. I say that they should be. The rest of this article proposes a way to achieve this goal.
The first two applications that should converge on this database standard are OpenOffice.org (OO.o) and KOffice. OO.o has data sources, meaning it can connect to external RDBMS servers and can use single-file databases in dBase format, whose features simply are too limited. When I started to investigate this matter, I learned that OO.o developers already have begun working on improving support for a server-less database engine. Standardization, however, simply isn't among their objectives right now, though. What they anticipate today to include in OO.o 2.0 (alpha snapshot available here) is a database file format that is XML-based and that contains everything except the actual data (forms, reports, queries and administrative information).
When I asked, I was told that the most probable file format choice is HSQLDB, mainly because it supports more features than its competitors do. Personally, I am against this choice for four reasons. The first is performance (read more here and here), especially considering that OO.o doesn't need to remain as heavy as it is today. The second reason is HSQLDB requires Java, and I don't like the idea of depending on third-party elements, as it makes it more likely that these single-file DBs don't work in practice when moved from PC to PC. The third reason is many other application and languages in the free software arena have partially converged on something else (more on this in a moment), so I think OO.o should be a good community citizen and follow suit. Hence, my fourth reason: by not proposing a portable standard in OO.o, OO.o users would be in the position of saying to everybody else "yes, we are using this so called "free" software, but if you want to share small databases transparently, please force yourself to freely install OO.o, HSQLDB/Java or any combination of the above".
The database interface in KOffice is called Kexi. Kexi is an integrated environment for creating database schemas and for inserting, querying and processing data. It can run without KDE, on UNIX, MS Windows and Mac OS X. Kexi already creates self-contained database files (.kexi). Queries and other metadata are stored inside the database itself, in special hidden tables named kexi__*. Such metadata is of both visual (column widths, detailed cell formatting) and functional natures (constraints, error messages). This is quite different from what happens today in OO.o, where metadata is stored in an XML package while the data itself is stored somewhere else.
Obviously, using the Microsoft Access format, .mdb, is not an option. Why remain dependent on yet another proprietary format that could change overnight? As far as I know, the best solution is SQLite, a lean and mean database engine already available in KDE and GNOME--it's embedded in Kexi and gnome-db, respectively--and built from less than 30K lines of C code. Using SQLite, databases can be designed and used with a standalone, public domain browser on Linux, Windows and Mac OS X. Wrappers in all popular languages already exist; even PHP 5.0.0 already embeds SQLite, version 2.8.14.
For the record, it currently is possible to use SQLite databases as external data sources in OO.o using an ODBC driver. Detailed instructions are available both in English and French.
Another reason to go for this engine might be the SQLite license, making it suitable for inclusion in proprietary products. The main drawback of SQLite today is the lack of some functions, such as ALTER TABLE, check constraints and referential integrity. However, the addition of these function, starting with ALTER TABLE, is planned to happen over the next months. Last but not least, the small amount of code involved makes it easy to become familiar with SQLite and add new features. D. Richard Hipp, the author of SQLite, is very interested in seeing this project succeed.
Both developers and end users can contribute to make this portable database dream come true. The first thing to do is help embed SQLite in OO.o. A possible approach to this goal is discussed here. Other things you should know are the "Basic UNO" and "Database Access" sections of the Developer's Guide, as well as the relevant APIs of SQLite. The list to join is dev@dba.openoffice.org. The developers eagerly are awaiting your support on this particular sub-project.
The hardest part, of course, is to make a complete standard of all this. Once OO.o embeds SQLite, what still would be missing from making it possible to exchange databases and forms directly among the users of OO.o, KOffice and, later, everybody else? Above all, do the KOffice and OO.o teams have the will to converge fully on achieving this standard?
After hearing my arguments, OO.o developers agreed that SQLite indeed might be a better bet than HSQLDB, even if it will require additional time to implement. Sun already has a strong interest in standards, so I hope it will support this proposal of mine. As far as KOffice is concerned, all its core developers have the goal that "all KOffice applications either follow the OASIS format or help to define new OASIS definitions for formats that are currently not standardized". Looking good, no? Of course, SQLite is only the database engine part. The greatest work is to define and use in the same way all the other associated information--how reports and forms are represented and recorded, how the query is stored, and so on. Theoretically, it should be possible to converge on some XML user interface description--perhaps UIML, another OASIS standard.
Another issue that needs to be dealt with is the current difference in file format approaches. In Kexi, everything is added to the actual database in separate tables. In OO.o, a ZIP archive with administrative XML streams is used. As of current plans, OO.o 2.0 is supposed to embed the actual data into that archive. Also, OO.o forms and reports are real OO.o Writer documents generated by a wizard that contain a macro that, when the document is opened, fills it with the data. Consequently, either Kexi or OO.o should change the top-level file format itself (ZIP archive or SQLite DB) and then agree on a common formats for queries and forms. Initially, the choice seemed to me to be already made, even if not all the interested parties had realized it yet. If the OO.o format has become an open OASIS standard, and KOffice is adopting it as its own native format and Kexi is part of KOffice, then what's left? The standard forms will be OASIS text documents, as they already are in OO.o. But, this is my initial opinion. What matters is that one common format does comes out and that it is given the rank of official standard. To me, SQLite seems to be the best way to do it.
First of all, go here and vote for an SQLite driver in OO.o. Also, vote to add support for OO.o forms in Kexi. Even if you don't use any of them yourself, you might receive or send an SQLite database some day, so the more widespread the standard is, the better. Above all, start using SQLite today. Remember, it also works on Windows and Mac. And databases, because they are self-contained, can be moved in a snap from machine to machine. Of course, report any bugs you find so as to help speed up development.
Marco Fioretti is a hardware systems engineer interested in free software both as an EDA platform and, as the current leader of the RULE Project, as an efficient desktop. Marco lives with his family in Rome, Italy.