The Python DB-API
Many people use Python because, like other scripting languages, it is a portable, platform-independent and general-purpose language that can perform the same tasks as the database-centric, proprietary 4GL tools supplied by database vendors. Like 4GL tools, Python lets you write programs that access, display and update the information in the database with minimal effort. Unlike many 4GLs, Python also gives you a variety of other capabilities, such as parsing HTML, making socket connections and encrypting data.
Possible applications for the Python DB-API include:
Many web sites construct pages on the fly to display information requested by the user, such as selections from the products offered in a catalog or from the documents available in a library. Doing this requires CGI scripts that can extract the desired information from a database and render it as HTML.
An Intranet application might use the Tkinter module and the DB-API to provide a graphical user interface for browsing through a local database, such as accounts receivable or a customer list.
Python programs can be used to analyze data by computing statistical properties of the data.
Python programs can form a testing framework for programs that modify the database, in order to verify that a particular integrity constraint is maintained.
There are lots of commercial and freeware databases available, and most of them provide Structured Query Language (SQL) for retrieving and adding information (see Resources). However, while most databases have SQL in common, the details of how to perform an SQL operation vary. The individuals who wrote the Python database modules invented their own interfaces, and the resulting proliferation of different Python modules caused problems: no two of them were exactly alike, so if you decided to switch to a new database product, you had to rewrite all the code that retrieved and inserted data.
To solve the problem, a Special Interest Group (SIG) for databases was formed. People interested in using Python for a given application form a SIG of their own: they meet on an Internet mailing list, where they discuss the topic, exchange ideas, write code (and debug it) and eventually produce a finished product. (Sounds a lot like the development process for the Linux kernel, doesn't it?)
After some discussion, the Database SIG produced a specification for a consistent interface to relational databases—the DB-API. Thanks to this specification, there's only one interface to learn. Porting code to a different database product is much simpler, often requiring the change of only a few lines.
The database modules written before the Database SIG are still around and don't match the specification—the mSQL module is the most commonly used one. These modules will eventually be rewritten to comply with the DB-API; it's just a matter of the maintainers finding the time to do it.
A relational database is made up of one or more tables. Each table is divided into columns and rows. A column contains items of a similar type, such as customer IDs or prices, and a row contains a single data item, with a value for each column. A single row is also called a tuple or a relation, which is where the term “relational database” originates.
For an example database, we'll use a small table designed to track the attendees for a series of seminars. (See Listing 1.) The Seminars table lists the seminars being offered; an example row is (1, Python Programming, 200, 15). Each row contains a unique identifying ID number (1, in this case), the seminar's title (Python Programming), its cost ($200), and the number of places still open (15). The Attendees table lists the name of each attendee, the seminar that he or she wishes to attend and whether the fee has been paid. If someone wants to attend more than one seminar, there will be more than one row with that person's name, with each row having a different seminar number and payment status.
The examples below use the soliddb module, which supports accessing SOLID databases from Python. SOLID is a product from Solidtech that was reviewed by Bradley Willson in LJ, September, 1997. I'm not trying to cover CGI or Tkinter programming, so only the commands to access the database are presented here, in the same manner as if typed directly into the Python interpreter.
To begin, the program must first import the appropriate Python module for connecting to the database product being used. By convention, all database modules compliant with the Python DB-API have names that end in “db”, e.g., soliddb and oracledb.
The next step is to create an object that represents a database connection. The object has the same name as the module. The information required to open a connection, and its format, varies for different databases. Usually, it includes a user name and password, and some indication of how to find the database server, such as a TCP/IP hostname. If you're using the free trial version of SOLID, UNIX pipes are the only method available to connect to the server, so the code is:
>>> import soliddb >>> db = soliddb.soliddb('UPipe SOLID', 'amk', 'mypassword') >>> db <Solid object at 809bf10>
Next, you should create a cursor object. A cursor object acts as a handle for a given SQL query; it allows retrieval of one or more rows of the result, until all the matching rows have been processed. For simple applications that do not need more than one query at a time, it's not necessary to use a cursor object because database objects support all the same methods as cursor objects. We'll deliberately use cursor objects in the following example. (For more on beginning SQL, see At the Forge by Reuven Lerner in LJ, October, 1997.)
Cursor objects provide an execute() statement that accepts a string containing an SQL statement to be performed. This, in turn causes the database server to create a set of rows that match the query.
The results are retrieved by calling a method whose name begins with fetch, which returns one or more matching rows or “None” if there are no more rows to retrieve. The fetchone() method always returns a single row, while fetchmany() returns a small number of rows and fetchall() returns all the rows that match.
For example, to list all the seminars being offered, do the following:
>>> cursor = db.cursor() >>> # List all the seminars >>> cursor.execute('select * from Seminars') >>> cursor.fetchall( [(4, 'Web Commerce', 300.0, 26), (1, 'Python Programming', 200.0, 15), (3, 'Socket Programming', 475.0, 7), (2, 'Intro to Linux', 100.0, 32), ]
A row is represented as a tuple, so the first row returned is:
(4, 'Web Commerce', 300.0, 26)Notice that the rows aren't returned in sorted order; to do that, the query has to be slightly different (just add order by ID). Because they return multiple rows, the fetchmany() and fetchall() methods return a list of tuples. It's also possible to manually iterate through the results using the fetchone() method and looping until it returns “None”, as in this example which lists all the attendees for seminar 1:
>>> cursor.execute ( 'select * from Attendees where seminar=1') >>> while (1): ... attendee = cursor.fetchone() ... if attendee == None: break ... print attendee ... ('Albert', 1, 'no') ('Beth', 1, 'yes') ('Elaine', 1, 'yes')SQL also lets you write queries that operate on multiple tables, as in this query, which lists the seminars that Albert will be attending:
>>> cursor.execute("""select Seminars.title ... from Seminars, Attendees ... where Attendees.name = 'Albert' ... and Seminars.ID = Attendees.seminar""") >>≫ cursor.fetchall() [('Python Programming',), ('Web Commerce',)]Now that we can get information out of the database, it's time to start modifying it by adding new information. Changes are made by using the SQL insert and update statements. Just like queries, the SQL statement is passed to the execute() method of a cursor object.
Before showing how to add information, there's one subtlety to be noted that occurs when a task requires several different SQL commands to complete. Consider adding an attendee to a given seminar. This requires two steps. In one step, a row must be added to the Attendees table giving the person's name, the ID of the seminar they'll be attending and whether or not they've paid. In the other step, the places_left value for this seminar should be decreased by one, because there's room for one less person. SQL has no way to combine two commands, so this requires two execute() calls. But what if something happens and the second command isn't executed—perhaps, because the computer crashed, the network died or there was a typo in the Python program? The database is now inconsistent: an attendee has been added, but the places_left column for that seminar is now wrong.
Most databases offer transactions as a solution for this problem. A transaction is a group of commands: either all of them are executed, or none of them are. Programs can issue several SQL commands as part of a transaction and then commit them, (i.e., tell the database to apply all these changes simultaneously). Alternatively, the program can decide that something's wrong and roll back the transaction without making the changes.
For databases that support transactions, the Python interface silently starts a transaction when the cursor is created. The commit() method commits the updates made using that cursor, and the rollback() method discards them. Each method then starts a new transaction. Some databases don't have transactions, but simply apply all changes as they're executed. On these databases, commit() does nothing, but you should still call it in order to be compatible with those databases that do support transactions.
Listing 2 is a Python function that tries to get all this right by committing the transaction once both operations have been performed. Calling this function is simple:
addAttendee('George', 4, 'yes')
We can verify that the change was performed by checking the listing for seminar #4, and listing its attendees. This produces the following output:
Seminars: 4 'Web Commerce' 300.0 25 Attendees: Albert 4 no Dale 4 yes Felix 4 no George 4 yesNote that this function is still buggy if more than one process or thread tries to execute it at the same time. Database programming can be potentially quite complex.
With this standardized interface, it's not difficult to write all kinds of Python programs that act as easy-to-use front ends to a database.
Andrew Kuchling works as a web site developer for Magnet Interactive in Washington, D.C. One of his past projects was a sizable commercial site that was implemented using Python on top of an Illustra database. He can be reached via e-mail at akuchling@acm.org.