Creating a Web-based BBS, Part 1
For a period of time last year, the buzzword on the Web was “community”. Everyone wanted to build a virtual community, allowing people to interact on-line in much the same way as they interact in real life.
While virtual communities were (and are) overhyped, it is true that the Internet has produced a number of such on-line groups, many of whose members have never met in person. If you're reading this magazine, you probably participate in at least one e-mail list, chat system or Usenet newsgroup. Indeed, Linux would probably not be the success it is today were it not for communities of developers and users sharing information with each other via the Internet.
Several methods are available for creating an on-line community, beginning with the oldest and best-known, an e-mail list. Setting up a mailing list is relatively easy, and only minimal resources are necessary to keep a list running. Another popular option is a Usenet newsgroup, which uses a similar format but a different distribution mechanism than e-mail.
Still another option is a web-based bulletin board system. While such systems are neither as flexible nor as powerful as Usenet or e-mail lists, they do offer a number of advantages. They are resistant to spam, can be easily integrated into other aspects of a web site and give visitors to the web site a chance to participate in discussion without having to register. Many commercial web sites now offer bulletin boards for their users, in the hopes of turning their site into a truly interactive and two-way experience, rather than another distribution medium for their content.
Starting in this issue, we will take a three-part look at how to create a simple bulletin board system of our own. This project was suggested by reader Dwight Johnson and also influenced by my creation of an “At the Forge” home page that will include examples of the programs presented in these columns, as well as a central place for readers to discuss the programs.
This month, we will look at the basic guts of the bulletin board system to be used on the ATF site. As you will see, I have decided to keep the software and the BBS very simple, without certain advanced features such as hierarchies and threading. However, it should not be difficult to add these features to the software, or to use this as a base for a more advanced system. Next month, we will add enough features to make this a serviceable BBS. Finally, in the third part of this series, we will look at ways in which we can add a number of useful features to the system.
The first consideration is the look and feel of the BBS, since that will force our hand on a number of other issues. As I indicated above, it is my goal to keep this software as simple as possible. I decided to keep discussions in a non-hierarchical manner. Each message belongs to a single thread within the BBS. We will not keep track of replies or allow sub-threads. Messages within a thread will be presented in chronological order, from the newest message to the oldest.
The user will thus have several possible options at any given point: starting a new thread, posting a new message to an existing thread, listing the current threads, or looking through the messages in one thread.
While I briefly considered storing messages in ASCII text files, I quickly decided to use a relational database. A database makes it easier to handle future expansion, since more features can be provided by adding one or more columns to a table. Databases also free us from having to worry about file formats, locking and other problems which inevitably occur when we use ASCII text files.
My database of choice is the “mostly free” MySQL. The programs will be written in Perl and will use Perl's database interface, known as DBI. See the “Resources” sidebar for pointers to information about any or all of these.
If you have been following this column over the last few months, you may be surprised to see that I have implemented it using simple CGI programs. I could have used mod_perl, a module that embeds a Perl binary inside of the Apache HTTP server. I could also have used HTML::Embperl, the templating language we explored in this column's previous two installments.
However, reality is often the compelling factor and the web space provider I use has not yet installed mod_perl. These programs should run just fine under Apache::Registry, the modules for mod_perl that provide emulation of the CGI standard.
If we are going to store information in a relational database, the first technical decision involves the database itself. What information do we want to store, and how do we want to store it?
Because we are storing messages and threads, I designed the system with two tables, ATFThreads and ATFMessages. Each message, including information about the author and the posting date, is stored in ATFMessages. Each message in the table points to a single thread in ATFThreads, allowing us to sort messages by thread.
Here, for instance, is the definition of ATFThreads:
CREATE TABLE ATFThreads ( id SMALLINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subject VARCHAR(255) NOT NULL, author VARCHAR(60) NOT NULL, email VARCHAR(60) NOT NULL, text TEXT NOT NULL, date DATETIME NOT NULL, UNIQUE(subject) );
Each thread is stored in a single row of the database, uniquely identified by its id column, which we define to be a SMALLINT UNSIGNED. (We are thus allowed 65,535 different topics, which should suffice for now.) By declaring the column to be AUTO_INCREMENT, we are asking MySQL to give the id column a new value each time we insert a new row. By declaring it to be the PRIMARY KEY, we indicate that the id column will uniquely identify a row.
The other columns are fairly self-explanatory: subject contains the subject of the thread, while author and email contain the thread creator's name and e-mail address, respectively.
Each thread has an opening message that starts the discussion; it is stored in the text column in a column of type TEXT. TEXT fields can contain amounts of text larger than the 255-character maximum given to us by VARCHAR columns. VARCHAR columns are stripped of trailing whitespace, sparing us from at least one housekeeping chore when working with the database.
Finally, we give each thread a date column in which we record the creation date and time with a DATETIME element. We also ensure that the human-readable subject line for the thread is unique with the UNIQUE keyword at the end of the table definition. This prevents us from having two threads named “Problems with MySQL”, for example.
Now that we have seen how to create ATFThreads, we can define ATFMessages. The two are quite similar, the main difference being a reference to a thread ID:
CREATE TABLE ATFMessages ( id MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, thread SMALLINT UNSIGNED NOT NULL, subject VARCHAR(60) NOT NULL DEFAULT "No subject", date DATETIME NOT NULL, author VARCHAR(60) NOT NULL DEFAULT "Mr. Nobody", email VARCHAR(60) NOT NULL DEFAULT "atf@lerner.co.il", text TEXT NOT NULL );
Once again, we create a column with an auto-incrementing primary key named id. Different tables can have identically named keys just as different hashes can. If we are referring to both tables in a single query, we can distinguish between the two by using the table.column syntax, as in ATFMessages.id and ATFThreads.id.
Notice how we have used the DEFAULT keyword to assign default values to each of the elements. Truth be told, the way the database-handling programs are written makes it unlikely we will ever see these defaults. (Empty strings are passed to the database as empty strings, rather than as NULL values. To get a true NULL, we must pass an undefined scalar.) However, it is always a good idea to build multiple checks into your programs just in case one of the other levels does not work in the way you expected. This can also help us track down problems; if we notice that many users are identified as “Mr. Nobody”, we can assume something has gone wrong with our posting software.
We can create the tables by entering the above SQL commands at the interactive mysql prompt. Once they have been created, we are ready to start working on the programs.
The programs in this project share a number of elements. Each starts with a series of use statements:
use strict; use diagnostics; use CGI; use CGI::Carp qw(fatalsToBrowser); use DBI;
The first, use strict, prods us into making our variable references explicit, either by creating them as lexicals (with the my statement) or with the use vars statement. I have chosen to create all variables as lexicals, but if you were interested in putting common variable definitions into an external file, you might want to consider making them globals.
Next, we invoke use diagnostics, which tells Perl to give us information from the perldiag manual page if and when there are problems with our program. I find use diagnostics to be an invaluable debugging tool when working with web applications, since it often points to a foolish mistake I have made. This, along with use strict and the -w flag, makes programming in Perl much less error-prone.
We then load the CGI::Carp module, which overrides the built-in Carp module with routines of its own that make for more accurate messages in our HTTP server's error log. We also import CGI::Carp::fatalsToBrowser, which sends an error message to the user's browser if and when an error occurs. This allows us to use the standard die statement without having to worry about whether we have already sent the HTTP “Content-type” header. Sending a message to the user's browser without such a header almost always causes an error message to be displayed.
Each program in the BBS also defines a number of variables: $database, $server, $port, $username and $password. These variables are used to log into the database with DBI; by setting them at the top of the program, you can modify them as necessary without having to change hard-coded strings.
Each program also turns off buffering, so that information is sent to the user's browser as soon as the program sends it to the appropriate file handle. Normally, saying
print "<P>Hello</P>";
does not send <P>Hello</P> to the user's browser. Rather, it places the string in a buffer. When the buffer is filled, its contents are shipped off to the user's browser. This is more efficient, since the computer can copy a lot of data at once, rather than spending its time entering and exiting from the routines that handle file operations. However, it also means the user must wait to see results. We can turn off buffering by setting the built-in Perl variable $|:
$| = 1;Finally, each program connects to the database with the standard DBI routine:
my $dbh = DBI->connect("DBI:mysql:$database:$server:$port", $username, $password);If the connection succeeds, we receive a database handle (dbh) in return and store it in $dbh. If $dbh is false, however, we should report an error, since it means the connection did not work:
die "DBI error from connect:", $DBI::errstr unless $dbh;We can do the same thing when preparing a query. The result from $dbh->prepare is a statement handle (sth). When defined, $sth is an object that itself accepts methods. When $sth is undefined, the statement preparation failed:
my $sth = $dbh->prepare($sql); die "DBI error with prepare: ", $sth->errstr unless $sth;We can execute our statement with $sth->execute, which works in much the same way as $dbh->prepare. The difference is that the result code is a simple value, rather than an object:
my $result = $sth->execute;In some programs, we test the value of $result and use die to report an error:
die "DBI error with execute: ", $sth->errstr unless $result;In others, we use $result to decide whether to continue with the program or to print a more user-friendly error message:
if ($result) { # do something } else { # indicate an error }Finally, we always disconnect from the database at the end of our programs:
$dbh->disconnect;This is not truly necessary, since DBI and Perl close all such connections when the program exits. However, if you are running with -w, a message will be inserted into your error log each time a program exits without disconnecting from the database nicely. We do this in order to keep our error log free of spurious details.
Since each message must belong to a thread, we will first look at how a thread is created. A thread is no more than a single row in the ATFThreads table, so our thread-creation program will be fairly simple.
The three listings referred to in this article are available for anonymous download at ftp.linuxjournal.com/pub/lj/listings/issue57/3193.tgz. They are not printed here due to space considerations.
Add-thread.pl (Listing 1 in the archive file) uses the contents of an HTML form to insert a new row into ATFThreads. However, it performs some additional manipulation as well, to ensure that the data will be retrievable in a useful way.
We can use either single or double quotes around text strings in SQL queries. Double quotes are used by DBI for parameters and thus exclude the possibility of using quotation marks. We therefore use single quotes around our text strings. However, this raises the issue of how to pass single quotes to the program. A simple solution is to perform a substitution on each of the text strings generated by the user. For example:
$value{"subject"} = $query->param("subject"); $value{"subject"} =~ s/\'/\'\'/g;
We can do even better by using the built-in $dbh->quote method, which quotes a text string for us. $dbh->quote decides whether to use single or double quotes and also handles special characters, such as quotation marks and question marks, with ease. We use a foreach loop to quote each of the elements:
# Get the form parameters foreach my $element (qw(subject text author email)) { $value{$element} = $dbh->quote($query->param($element)); }Once we have done this, we can be sure that $value{$element} is suitable for insertion into the database.
We also perform several substitutions on the “text” HTML element, which contains the text that starts the thread. To begin, we remove all HTML tags, so as to prevent people from linking to all sorts of crazy sites. While it might be desirable to allow people to include HTML in their postings, it could also lead to chaos if formatting commands were inserted. I decided to be slightly draconian and disallowed all HTML. We do that by removing everything between < and >:
$text =~ s/<.*?>//sg;
Notice how we use Perl's non-greedy operator *? instead of * to remove the HTML tag. If we were to use * and the line had two HTML tags, Perl would remove everything from the first < through the final >. We use the /s modifier to tell Perl that . includes all characters, including new lines. Without /s, \n would not be included in ., which means a two-line tag such as
<a href="http://www.cnn.com/">would be ignored.
We then make sure new lines are treated correctly, first removing multiple new lines and then replacing them with HTML paragraph markers:
$text =~ s/\r\n/\n/g; $text =~ s/\r/\n/g; $text =~ s|\n\n|</P>\n<\P>|gi;
Once it has performed all of these tasks, add-thread.pl creates the SQL query that will insert the new thread into ATFThreads:
my $sql = "INSERT INTO ATFThreads "; $sql .= " (subject, text, author, email, date) "; $sql .= "VALUES ($values, NOW())";We insert the date of the thread for future use, but also so that we can sort the threads in the order of their creation.
The program which lists threads, appropriately named list-threads.pl (Listing 2 in the archive file), uses a SELECT query to retrieve all of the rows from ATFThreads:
my $sql = "SELECT id,subject FROM ATFThreads ORDER BY subject";
After performing an $sth->execute, it checks to see how many rows were returned. If none were returned, we indicate that no threads have yet been created. If threads exist, we iterate through the results with $sth->fetchrow, which places the query result into @row. We can pull out the elements of @row and print a list:
if ($sth->rows) { print "<ul>\n"; while (my @row = $sth->fetchrow) { print "<li> "; print "<a href=\"/cgi-bin/view-thread.pl?"; print "$row[0]\">$row[1]</a>\n"; } print "</ul>\n"; $sth->finish; }Users are presented with an alphabetical list of thread titles, each of which is a hyperlink to view-thread.pl (Listing 3 in the archive file), described below. As you can see, the argument to view-thread.pl is the id value of the thread, the defined primary key.
Next month, we will complete the design and implementation of our basic BBS, adding the ability to create messages and search through the system. Until then, consider dropping by a working implementation of this software at http://www.lerner.co.il/atf/, where you can trade ideas with other readers of this column.
Reuven M. Lerner is an Internet and Web consultant living in Haifa, Israel, who has been using the Web since early 1993. His book Core Perl will be published by Prentice-Hall in the spring. He can be reached at reuven@lerner.co.il. The ATF home page, including archives and discussion forums, is at http://www.lerner.co.il/atf/.