Internet Servers in Perl
In my previous article in Issue #35 of Linux Journal, I wrote about the socket library functions in Perl with an emphasis on writing Internet client programs. Perl is also a good language for Internet servers, not only because of the socket capabilities and the ease of dealing with files and data, but because it also has a special mode for improving security. In this article I cover several aspects of writing Perl servers, including how to use the basic socket functions, how to best handle multiple connections, asynchronous communication and security issues. In the process we'll develop a simple Internet server similar to fingerd that works through the Web.
Socket communication may be either connection-oriented or connectionless. Connection-oriented protocols, like the Internet's Transmission Control Protocol (TCP), establish a link between client and server before exchanging any data. Connectionless protocols, like the User Datagram Protocol (UDP), simply read or write data, specifying the client or server address each time. Most servers use a connection-oriented scheme, and we use this approach in our example server (see Listing 1). However, I discuss the connectionless approach below.
Any Internet server, from the simplest to the most complicated, first uses the two functions socket and bind to establish an identifiable communications endpoint. The server uses socket to create a socket with the desired type and protocol. Recall the syntax for this function is:
socket SOCKET, DOMAIN, TYPE, PROTOCOL
SOCKET here is a Perl file handle initialized by the call to socket. For Internet TCP applications DOMAIN is AF_INET and TYPE is SOCK_STREAM. The Perl 5 Socket package defines the constants AF_INET and SOCK_STREAM as well as other socket-related constants and functions; refer to the previous article for details. PROTOCOL is the protocol number, which for TCP is the value that getprotobyname returns for the name tcp.
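As a minimal sketch of the call just described, the following creates a TCP socket and reports its file descriptor. The lexical handle $server (the snippets in this article use bareword handles such as S) and the fallback to the IPPROTO_TCP constant are assumptions of this sketch.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);   # AF_INET, SOCK_STREAM and friends

# In scalar context getprotobyname returns the protocol number.
# Falling back to IPPROTO_TCP is an assumption of this sketch, for
# systems where the protocols database cannot be read.
my $proto = getprotobyname('tcp') || IPPROTO_TCP;

# socket() opens the handle; $server is a name chosen for this example.
socket(my $server, AF_INET, SOCK_STREAM, $proto)
    or die "socket: $!";

print "TCP socket open on file descriptor ", fileno($server), "\n";
```

Checking the return value of socket matters in a server, since a failure here usually means the process has run out of file descriptors.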
An Internet server must bind a network address to the socket with the bind function, a step also referred to as “naming the socket”. This process specifies the network address to which a client must connect to start communicating with the server. A client can bind an address too, but this is not usually necessary in connection-oriented clients. The syntax of bind is:
bind SOCKET, NAME
The SOCKET argument is still the file handle created by the call to socket. NAME is the address that is bound to the socket. The contents of this argument can be quite complicated (again, refer to the previous article for details). For versions of Perl from 5.002 on, a function in the Socket package called sockaddr_in returns a value for the NAME argument given a port number and an Internet host address. If you're writing something like an FTP or HTTP server, you can use the reserved “well-known” port number (see the file /etc/services for these numbers). Otherwise, any positive 16-bit integer will suffice as long as it is not one of the reserved numbers; on Unix systems, ports below 1024 can be bound only by root. For servers the special argument INADDR_ANY can be used for the Internet address, which tells the kernel to accept connections arriving on any of the host's network addresses.
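The naming step might look like the sketch below. Port 0, which asks the kernel for any free port, is an assumption made so the example runs anywhere; a real server would bind the port its clients expect. The variable names are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

# Port 0 is an assumption for this sketch: it asks the kernel for any
# free port. A real server binds its published port instead.
my $port  = 0;
my $proto = getprotobyname('tcp') || IPPROTO_TCP;

socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";

# sockaddr_in packs the port and host address into the NAME argument.
bind($server, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";

# getsockname returns the bound name in the same packed form, so
# unpack_sockaddr_in can recover the port the kernel assigned.
my ($bound_port, $bound_ip) = unpack_sockaddr_in(getsockname($server));
print "bound to ", inet_ntoa($bound_ip), ", port $bound_port\n";
```

With INADDR_ANY the reported address is 0.0.0.0, meaning the socket is not tied to one particular interface.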
For connection-oriented servers like our example program we now can use the listen function to tell the operating system that we'll accept connections on the socket. This function looks like this:
listen SOCKET, QUEUESIZE
We all know what SOCKET is by now. QUEUESIZE specifies the number of attempted connections that can be kept waiting; the symbol SOMAXCONN is the maximum for this argument (5 on many older systems, considerably larger on current ones). This lets the server handle several near-simultaneous connection requests, a crucial feature for HTTP servers or daemons like inetd.
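To see the queue at work, the sketch below calls listen and then connects a stand-in client from the same process before the server has called accept. The loopback address, port 0 and the variable names are assumptions chosen so the example is self-contained.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

my $proto = getprotobyname('tcp') || IPPROTO_TCP;

# Loopback address and port 0 (any free port) are demo choices only.
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!";

# Tell the operating system to queue incoming connection requests.
listen($server, SOMAXCONN) or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($server));

# A stand-in client: its request is queued by the kernel even though
# the server has not called accept yet.
socket(my $client, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

print "requested queue size: ", SOMAXCONN, "\n";
print "client connected before the server called accept\n";
```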
Now a client program could attempt to connect to the server, but we need more code to actually create the link. For many servers, the accept function is called, typically in a loop of some sort, directly after listen. The syntax of accept is:
accept NEWSOCKET, GENERICSOCKET
This function opens NEWSOCKET, a file handle that you can read from or write to in order to communicate with the connecting client. GENERICSOCKET is any open, named socket. For our server, this is the named socket we've already created with socket and bind. accept returns the address of the connecting client in the same packed form as the NAME argument to bind.
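Here is a small sketch of accept in action. So that it runs on its own, the client is a second socket in the same process; that arrangement, the loopback address, port 0 and all variable names are assumptions of the example rather than part of Listing 1.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);
use IO::Handle;                       # for autoflush on socket handles

my $proto = getprotobyname('tcp') || IPPROTO_TCP;

# The named, listening socket (loopback and port 0 are demo choices).
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!";
listen($server, SOMAXCONN)                       or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($server));

# Stand-in client so the example does not wait for an outside connection.
socket(my $client, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

# accept opens $conn for this one client and returns the client's
# packed address, which unpack_sockaddr_in splits into port and host.
my $peer = accept(my $conn, $server) or die "accept: $!";
my ($peer_port, $peer_ip) = unpack_sockaddr_in($peer);

# $conn is an ordinary file handle: print to it, read from it.
$conn->autoflush(1);
print $conn "hello from the server\n";

my $line = <$client>;
print "client at ", inet_ntoa($peer_ip), ":$peer_port received: $line";

close $conn;
close $client;
close $server;
```

The listening socket stays open after accept returns, so a real server would loop back and call accept again for the next client.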
Note that the accept call waits until a connection request arrives, so no processing can occur until it completes. This usually poses no problem since it matches the way most servers work: they wait for a request and then service it. Sometimes, though, an application might perform other tasks, like calculation or system monitoring, that can't be stopped to wait for client connections. If so, communication can be done asynchronously—that is, processing can be interrupted temporarily using a signal handler to make the socket connection and to process the client's request. I don't cover this in detail since that requires a lengthy digression into the fcntl system call and signal handlers, but Listing 2 illustrates the basic idea.
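A simpler alternative to signal-driven communication, and not the technique of Listing 2, is to poll the listening socket with the four-argument form of select between units of other work. The sketch below assumes a helper named connection_pending (a hypothetical name) and a stand-in client in the same process.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

# Return true if a connection request is waiting on $sock. A $timeout
# of 0 makes select poll and return at once instead of blocking.
sub connection_pending {
    my ($sock, $timeout) = @_;
    my $rbits = '';
    vec($rbits, fileno($sock), 1) = 1;      # watch this one descriptor
    my $nfound = select($rbits, undef, undef, $timeout);
    return $nfound > 0;
}

my $proto = getprotobyname('tcp') || IPPROTO_TCP;
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!";
listen($server, SOMAXCONN)                       or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($server));

# Nobody has connected yet, so the poll reports nothing to accept.
my $before = connection_pending($server, 0);

socket(my $client, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
connect($client, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";

# Allow up to one second for the request to register, then accept it.
my $after = connection_pending($server, 1);
if ($after) {
    accept(my $conn, $server) or die "accept: $!";
    close $conn;
}

print "pending before connect: ", ($before ? "yes" : "no"),
      "; after connect: ", ($after ? "yes" : "no"), "\n";
```

A program with other work to do would call connection_pending with a timeout of 0 at convenient points and call accept only when it returns true.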
The Internet's main connectionless protocol is called UDP, or User Datagram Protocol. A datagram contains all of the information required to send it to the right place, so no connection has to be set up first. UDP does not guarantee reliability; extra user code must deal with problems caused by packets that don't make it to their destinations. For a connectionless server, listen and accept are not needed. A connectionless client usually does need to use bind so that a valid return address gets passed to the server in the client's data packets, but we won't worry about the client side here. To use UDP on our socket rather than TCP, we simply replace the socket argument SOCK_STREAM with SOCK_DGRAM and the getprotobyname argument tcp with udp.
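The two substitutions amount to the following sketch of a UDP socket setup. Port 0, the lexical handle and the fallback to the IPPROTO_UDP constant are assumptions made so the example runs anywhere.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_UDP);

# udp instead of tcp; the constant is a fallback assumed for systems
# where the protocols database cannot be read.
my $proto = getprotobyname('udp') || IPPROTO_UDP;

# SOCK_DGRAM instead of SOCK_STREAM.
socket(my $sock, AF_INET, SOCK_DGRAM, $proto) or die "socket: $!";

# Naming the socket works exactly as it does for TCP (port 0 = any
# free port, chosen for this demo only).
bind($sock, sockaddr_in(0, INADDR_ANY)) or die "bind: $!";

my ($port) = unpack_sockaddr_in(getsockname($sock));
print "UDP socket ready on port $port; no listen or accept required\n";
```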
In C we use the system functions sendto and recvfrom to send data between client and server with UDP, but Perl doesn't implement these directly. Instead, Perl uses send and recv for both connection-oriented and connectionless protocols. After setting up the socket with socket and bind, a connectionless server would usually call recv:
recv SOCKET, SCALAR, LEN, FLAGS
This function blocks until data becomes available on SOCKET, then reads up to LEN bytes into the scalar variable SCALAR. FLAGS are the same flags as for the recv system call. recv returns the address of the client, which can then be used to send information back with the send function:
send SOCKET, MSG, FLAGS, TO
TO is the client address. The socket code in the simplest connectionless server would look something like this:
socket(S, AF_INET, SOCK_DGRAM, getprotobyname('udp'));
bind(S, sockaddr_in($port, INADDR_ANY));
$cli_addr = recv S, $request, 80, 0;
send S, $message, 0, $cli_addr;
Now back to our TCP server. Remember I mentioned earlier that several connection requests can get queued up so the server can respond to each in turn. This might be inefficient (and probably annoying to the client user) if the server does something that takes a significant amount of time, like querying a database or running an external program. To get around this problem, many servers fork a new process to handle a request once they accept a connection. Look at our example server code for details. The only slightly tricky part is the CHLD signal handler used to clean up zombie processes.
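The fork-per-connection pattern, including a CHLD handler, can be sketched as follows. This is not Listing 1: the serve function, its connection limit (there so the demo terminates) and the two stand-in clients are assumptions made for illustration.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);
use POSIX qw(WNOHANG);
use Errno qw(EINTR);
use IO::Handle;

$| = 1;    # flush our own output before forking so children don't repeat it

# Reap finished children so they don't linger as zombies. local keeps
# the handler from clobbering $! and $? in the code it interrupts.
$SIG{CHLD} = sub {
    local ($!, $?);
    1 while waitpid(-1, WNOHANG) > 0;
};

# Accept $limit connections on $server, forking a child for each one.
# The limit is an assumption for the demo; a real server loops forever.
sub serve {
    my ($server, $limit) = @_;
    my $served = 0;
    while ($served < $limit) {
        my $peer = accept(my $conn, $server);
        unless ($peer) {
            next if $! == EINTR;    # interrupted by SIGCHLD; try again
            die "accept: $!";
        }
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {            # child: handle this client, then exit
            close $server;
            $conn->autoflush(1);
            print $conn "handled by process $$\n";
            close $conn;
            exit 0;
        }
        close $conn;                # parent: the child owns the connection
        $served++;
    }
    return $served;
}

my $proto = getprotobyname('tcp') || IPPROTO_TCP;
socket(my $server, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!";
listen($server, SOMAXCONN)                       or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($server));

# Two stand-in clients connect first; their requests wait in the queue.
my @clients;
for (1 .. 2) {
    socket(my $c, AF_INET, SOCK_STREAM, $proto) or die "socket: $!";
    connect($c, sockaddr_in($port, INADDR_LOOPBACK)) or die "connect: $!";
    push @clients, $c;
}

my $served  = serve($server, 2);
my @replies = map { scalar readline($_) } @clients;
print "served $served connections\n", @replies;
```

The parent closes its copy of the connection right after the fork; otherwise the client would not see end-of-file when the child finishes, and the parent would slowly run out of file descriptors.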
Servers often run as setuid or setgid programs, meaning the processes have the privileges of the user or group that owns the executable file regardless of who runs the program. At the very least, a server program will run under your own user ID. Since anyone can, in principle, use an Internet server, you can see security is of the utmost importance. You must make sure the server does not give privileged access to important system files or your own confidential data. Usually this requires checks on environment variables, file privileges, external program execution, etc., so that it's hard to be thorough. Fortunately, Perl helps us out here with its taint mode, a mode that checks for common security violations. The -T command line switch turns on this mode, so we just add this to the “shebang” line at the top of the script.
The exec function in the example server might cause security concerns for at least two reasons. First, executing an external program implies the use of the PATH environment variable. This variable is considered to be tainted until we set it explicitly in the script, since it could be modified to cause the execution of a program other than the one we intended. Second, we separate the arguments to exec into the program name and the argument list, which prevents exec from calling the shell to do metacharacter substitutions. If these modifications were not made, taint mode would send warnings to the terminal and stop the program (in fact, that's how I found these problems). Keep in mind taint mode does not guarantee security, but it does make it much easier to identify well-known problems.
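Both precautions can be sketched as below. The program name finger, the PATH value, the user-name pattern and the function names are assumptions of this example, not taken from Listing 1. Run it with perl -T to turn on taint checking; the script itself only builds the command and does not call exec_finger.

```perl
use strict;
use warnings;

# ${^TAINT} is true when the script runs under -T.
print "taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

# Set PATH explicitly so it is no longer tainted, and drop other
# variables that can change how a shell or program behaves.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Accept only plain user names. A value captured by a successful
# pattern match is untainted, so the pattern should be strict.
sub clean_username {
    my ($raw) = @_;
    return $raw =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,31})\z/ ? $1 : undef;
}

# Build the command as a list: program name first, arguments after.
# Returns an empty list if the name is rejected.
sub finger_command {
    my ($raw) = @_;
    my $user = clean_username($raw);
    return unless defined $user;
    return ('finger', $user);
}

# Replace this process with the program. Because exec is given a list,
# no shell is involved and metacharacters are never interpreted.
sub exec_finger {
    my @cmd = finger_command(@_) or die "rejected user name\n";
    exec { $cmd[0] } @cmd or die "exec $cmd[0]: $!";
}

my @cmd = finger_command('mwm');
my @bad = finger_command('mwm; cat /etc/passwd');

print "would run: @cmd\n";
print "hostile input rejected\n" unless @bad;
```

A server would call exec_finger in the child process after reading the requested name from the client.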
Network servers are among the most complex pieces of software, which is to say, you should by no means consider this article a comprehensive treatment of the subject. Still, you'll be surprised to find how many of the elements of our simple example program show up in even large, complicated servers. Perl does reduce some of the complexity though, since you already have convenient tools at hand to do the hard parts, like parsing protocols and manipulating files. Even if you ultimately decide to write the program in C or some other compiled language, Perl can't be surpassed for prototyping server applications. The price is right too, but I don't need to convince Linux users of the value of “free” software.
Mike Mull writes software to simulate sub-microscopic objects. Stranger still, people pay him to do this. Mike thinks Linux is nifty. His favorite programming project is his 2-year-old son, Nathan. Mike can be reached at mwm@cts.com.