Writing CGI Scripts in Python
The Python Reference Manual abstract describes Python as:
A simple yet powerful, interpreted programming language that bridges the gap between C and shell programming, and is thus ideally suited for “throw-away programming” and rapid prototyping. Its syntax is put together from constructs borrowed from a variety of other languages; most prominent are influences from ABC, C Modula-3 and Icon ... Python is available for various operating systems, amongst which are several flavors of Unix (including Linux), the Apple Macintosh O.S., MS-DOS, MS Windows 3.1, Windows NT, and OS/2.
It should also be noted that Python is an object-oriented language. You can write classes like in C++ or Java. I use Python every time the common guy would use Perl [“common guy”? Sheesh!—Ed, who is a die-hard Perl fan]. It has about the same functionality, while being far more readable. But it should not be restricted to a scripting language—a lot of people are using it for complete applications. It's also a perfect glue language, like Tcl, because it's easy to add new modules (written in C) to it. It can also be embedded in C applications.
Listing 1 shows my very first Python script. It's still used on a file server at the office. It deletes ~*.tmp left everywhere by buggy MS Windows applications. It's not really the common Hello World program (we'll see one later), and maybe it's not the most efficient way to do the job, but it demonstrates several features of the language:
Variables: Variables don't have to be declared.
Recursivity: See the ScanDir() function.
Platform independence: The os module provides constants for current directory, parent directory, and so on. See os.curdir, os.pardir...
for statement and lists: In Python, for works differently than in C. os.listdir() returns a list of files. For example, in Listing 1, at each iteration of:
for p in files
p will become the value of the next element in the list. So, if files is a triplet with values:
['lib', 'include', 'src' ]
The first time, p will be 'lib'. At the second iteration, it will be 'include', and so on, until it has gone through the whole list.
Arrays and indices: An array can be referred with one or two indices. One index is used to get a single element from the array (like in C). Two indices can be used to get a subset of the array. The first index gives the from element, while the second one gives the to element. For example, if a is an array containing the values 'abcdef', a[ 2 : 4 ] will return 'cd'. There are defaults for both indices: they default to from the start and to to the end respectively. Examples: a[ 2 : ] will return 'cdef'. Negative indices can be used to count from the end; a[ -2 ] will return 'e'. See the Python Tutorial at the Python web site (http://www.Python.org/) to learn more about arrays.
Blocks: Unlike C or Pascal-derived languages, there are no Start-Block or End-Block separators. Python works only with indentations (and the “:” character).
One of my colleagues in the firmware department recently had some problems debugging a TCP/IP application he is writing. There is a server application running in an embedded system, and a client application running on a PC. He was stuck for two days with a protocol problem, and didn't even know if the problem came from the client or from the server. Every test version meant recompiling, eventually downloading the code in the embedded system, and so on. In addition, it's not always easy to debug a device that doesn't even have a screen—you get the point.
So, after we discussed his problems, I decided to write little Python test programs to test his applications. In less than a quarter of an hour, I had tested his server application. This included writing a Python script and running it on a console of a Linux box, concurrently to tcpdump. Since the problem didn't come from the server, I wrote another program to test his client application. This script masqueraded the server, and we immediately discovered the problem. My colleague was very impressed by the short time it took me to write those two scripts, so I gave him a copy of the Python Tutorial.
Some simple scripts using sockets can be found in Listing 2a and2b. They are from the Python Library Reference.
My company sells time and attendance software in a client/server environment. Supported platforms include Unix and NT. The biggest problem with time and attendance is that, although general functionalities are the same for all our customers, they all have special specific rules. That's why the software department is considering the inclusion of the Python interpreter in their software. It would allow on-site customization, and it is available on all our platforms.
The Python home page is located at http://www.python.org/. On the site there is a list of mirror sites, and the current distribution of Python.
A tutorial and other documents including the Language Reference, Library Reference, a guide on how to extend and embed the interpreter and a FAQ can be found in the doc directory of the Python Home Page (http://www.python.org/doc/).
Two books will soon be available about Python:
Programming Python, by Mark Lutz, O'Reilly and Associates Publishers.
Internet Programming with Python, by Aaron Watters, Guido Van Rossum (the author of the language) and James Ahlstrom, from MIS Press/Henry Holt Publishers. See http://www.python.org/python/arwbook.html.
And finally, there's a newsgroup devoted to Python: comp.lang.python.
In the following text, I will assume that you run your own HTTP daemon locally. My preference is Apache, but any server will do the work, if properly configured.
And of course, you should have installed Python on your system. You'll need to configure it to use the gdbm module, since it's used in count.py.
For the examples of scripts which interface with a relational database, I've used PostGres95 (and its contributed Python module, PyGres95). PostGres95 is available from http://www.ki.net/postgres95/. PyGres95 is available from http://zen.via.ecp.fr/via_dvpt/products/pygres.html.
To understand the following text, you should know how to write an HTML page, have a general idea of how CGI works, and have a little background with C programming.
Listing 3, helloworld.py, is our first script. It's very simple. Run from the command line, it will print an HTML document. But you should copy it to your cgi-bin directory, then call it from your browser with the URL http://localhost/cgi-bin/script.py.
This script displays a little message and the local time. Here, you need to note only one thing: the script must send a header describing the contents of the document. This is done by the means of the Content-type header. Common values include text/html, text/plain, image/gif or image/jpeg. The header is terminated by a blank line. It is used by the client browser, and won't appear in the generated page. And, as you'll see, the script is executed, and not just displayed in the browser. Everything printed to sys.stdout by the script will be sent to the client, while error messages will go to an error log (/usr/local/etc/httpd/logs/error_log, if you are using Apache).
Listing 4 is the well-known Count script written in Python. This is used to display a graphical counter of the number of times that a particular page has been accessed.
This script imports a module called cgi, which I'll describe later. It's used to retrieve the URL parameter passed to the script. This script interfaces with gdbm (which must be included in the modules list when Python is configured) to store { URL ; access count } couples.
This is our first introduction to Python dictionaries. A dictionary is generally referred to as an “associative array” in the literature. It means that you can access arrays by keys instead of indices. For example, if you want to handle an e-mail address book, with couples like these:
"Michel", "Michel.Vanaken@ping.be" "Veronique", "Vero@home.sweet.home"
Here is how you should retrieve the address of Michel in C and in Python:
struct { char *key ; char *addr ; } email[ MAX ] ; int i ; for( i=0 ; i<MAX ; i++ ) { if( strcmp( email[ i ].key, "Michel" ) = 0 ) { printf( "%s\n", email[ i ].addr ) ; break ; } } if( i = MAX ) { printf( "Not found\n" ) ; } if email.has_key( "Michel" ) : print email[ "Michel" ] else : print "Not found"
Adding an entry with Python is also very easy :
email[ "Homer" ] = \ "HSimpson@Springfield.power_plant.com"
adds an entry if Homer is not a valid key, and overwrites the old value if it is already present.
We see that Content-type here is image/x-bitmap (since the browser is waiting for an <img src=...>).
Of course, the bitmaps aren't very pretty (I drew them with a paint package, saved them as xbm files, then used a lot of keyboard macros and M-Kill/Yank rectangles in Emacs). The goal of this script is not to reinvent the wheel, but to allow readers to compare it with other versions widely available on the Net in different languages.
In order to use this script, the gdbm database must be created. Change the current directory to your cgi-bin directory, run Python, and type:
import gdbm gdbm.open( "counters.gdbm", "n", 0666 )
and exit Python with Ctrl-D.
It should also be noted that the xbm file created by this script is bad. It contains an extraneous byte (added in the print_footer() function), in order to simplify the print_digit_values() function (in this version, there are no tests for commas).
Before putting your CGI scripts on-line, you should be sure that they're really clean, by testing them carefully, especially in near bounds or out of bounds conditions. A script that crashes in the middle of its job can cause large problems, like data inconsistency in a database application. You can eliminate most of the problems by running your script from the command line; then testing it from your HTTP daemon.
First, you have to remember that Python is an interpreted language. This means that several syntax errors will not be discovered until run time. You must be sure your script has been tested in every part of the control flow. You can do that by generating parameter sets that you will hardcode at the beginning of your script.
Then, be sure that incorrect input cannot lead to an incorrect behaviour of your script. Don't expect that all parameters received by your script will be meaningful. They can be corrupted during communication, or some hacker could try to obtain more data then normally allowed.
Listing 5 shows a different version of our Hello World script and demonstrates the following features:
Tuples: Tuples are arrays consisting of a number of values separated by commas. Ouput tuples are enclosed in parenthesis. The localtime() function returns a tuple which can be assigned in one variable (that becomes a tuple). Or as in this script, individual elements of the tuple can be assigned at one time to several variables.
The elif (“else if”) statement: Listing 5 has two syntax errors that are not detected when the interpreter loads the script, but will crash it when executed. It will crash at Christmas, because there is a call to a Christmas() function which has not been defined, and it will crash again at the New Year's Day, because in addition to “Happy New Year!”, it tries to print a “Max” variable which doesn't exist (due perhaps to a cut-and-paste from a script intended to wish someone happy birthday?). Here is what you'll find in the error_log file if the script is accessed on Christmas:
Traceback (innermost last): File "/cgi-bin/buggy.py", line 59, in ? Main() File "/cgi-bin/buggy.py", line 53, in Main Christmas() NameError: Christmas
The fact that the script seems to execute normally (especially on New Year's Day, since everything that should have been printed is actually printed) can be a pitfall. The script has actually crashed!
Of course, in this script, crashing is not a big problem. But in an Intranet application, it could be very harmful. Imagine, for example, a script that displays a message saying it has updated your stock database, but has in fact crashed immediately after giving the message. The user thinks everything is going well, but the data have not been updated.
Let's get back to Listing 4. We've already seen that the generated xbm is not good; but maybe there are other problems. What happens if:
The script is called with:
<img src="http://localhost/cgi-bin/count.py?\ http://localhost/index.html">
instead of:
<img src="http://localhost/cgi-bin/count.py? _url=_http://localhost/index.html">?
The database file counters.gdbm does not exist?
The access count exceeds 9999?
I suggest you try these, and try your own solutions. For the last situation in the list—the access count exceeds 9999—there are several solutions; I suggest modifying the DIGITS value if the incremented value in the inc_counter() function has a length that exceeds DIGITS. How would you see the generated file if your web browser displays nothing? Maybe you could add the following code, replace the call to CGImain() with TSTmain() and run the script from the command line:
def TSTmain() : ####### url = "http://localhost/test.html" counter = get_put_counter( url ) print_header() print_digits_values( counter ) print_footer()
Listing 6 shows the HTML source for a form we are going to discuss for the remainder of this article. It allows the user to enter some values to perform a query on a database. The action parameter of the form should be adapted to your needs. For a real application, you should replace localhost by the fully qualified name of your host. The name of the script should also be adapted to call the right thing. Note that the HTML code defines a hidden field (TableName).
Let's start with a script that just echoes values entered by the user (see Listing 7). You'll see that, even if you leave the form empty, two parameters are displayed. The first one is (TableName), a hidden parameter in our form, and the second one is the value of the Submit button (which is also a field). Notice that:
CGI module imported by our scripts is used to parse the input sent by an HTML form. It works with GET and POST methods.
cgi.SvFormContentDict() builds a dictionary with:
{ field name ; field value }
couples corresponding to the data encoded by the user.
cgi.escape() is used to convert special characters into their HTML escape sequence (for example, < becomes <).
Now we are going to look at more “real life” scripts that could be used in an Intranet application.
We are going to use PostGres95. It must be installed and configured properly. I won't explain that process here, since it would require a lot of additional text. But two things should be mentioned:
The “user” which is used when a CGI script is run on your system must have access to PostGres95, and to the database being queried.
The connect() function used in the following scripts may need to be adapted to work on your system. Mine doesn't need any parameters, since everything works with the default settings I've configured.
See the PostGres95 manual for more information.
The PyGres95 modules offer the same interface as the LIBPQ API, which is also described in the PostGres95 manual. You should know that there is a connect() function used to connect to the database, and a query() function that receives an SQL string as a parameter.
Listing 8 shows a script that will handle queries on a customer database which has a structure similar to what the fields might be in a query form. The script will connect to the database, build an SQL command, query the database, and finally, display the results in a table that is built on the fly for each request. Of course the SQL statements here are very simple, but scripts could be written to do anything.
This script is not very practical. We'd have to write specific code for every table we want to use. The script of Listing 9 implements a general query on any single PostGres95 table/view from an HTML form. This means that it will work for any query where you need a subset of a table. It could work for customers (as in our example), providers or articles. The main difference from the former script is the build_query() function:
The script now implements the following behaviour: a query made on a numeric field will require an exact match, while a query made on a text field will be considered as ending with a wildcard. This means that numeric fields are considered to be IDs, and that it's not possible, for example, to use it to search articles with a value between $500 and $1,000. But it can be used to search a personal database for all names beginning with “Van”.
Restriction: to determine the type of a field, we'll consider it numeric if its name ends in “num”. This is because all data sent to a CGI script is seen as text. Of course, you could parse the value to see if it's numeric or not. But it's not always a good choice. If you want to search for all telephone numbers beginning with “800”, our script will look for an exact match if it thinks it's a numeric field, and it will find nothing. Of course, you can also completely rewrite the build_query() function to fit your needs.
The script needs to know on which table it should perform the query. That's why our form contains an invisible field called TableName. It must be set to the name of the desired table.
The form field names must be the same as the table field names, because the script uses them to perform the query. But, of course, the labels displayed on the user input form can be anything.
And finally, the script contains several lines that can be commented or uncommented to enable or disable some debug strings in the resulting page (e.g., as the query string).
There are several powerful features of Python that weren't discussed in this article. Python supports exception handling, as in C++ or Modula-3. This can be useful to trap errors in CGI scripting. It's even possible to write a script with a function to send a bug report by e-mail to its author when it detects an unexpected error. And of course, you can write your own classes.
For CGI scripting, although we didn't use them in our sample scripts, some additional features are available. On the Python home page, you'll find code to embed the Python interpreter in Apache. And Apache itself comes with optional modules that interact with PostGres95. But PostGres95 is not the only database available—among others, there is a module for Oracle.
Now, if you want to try Python, the first thing to do is read the Python Tutorial (see Documentation and Availability), then print a copy of the Python Library Reference manual. Then, you should try to reach simple goals—like deleting all ~*.tmp files older than one day, for example.
Michel Vanakan is a 32-year-old software engineer and part-time network administrator. His interests include fantasy and Sci Fi books and games, walks in the wilderness and flights with light aircraft. He can be reached at Michel.Vanaken@ping.be.