Network Administration with AWK

by Juergen Kahrs

What does the scripting language AWK have to do with networking? In the May 1996 LJ, Ian Gordon introduced us to AWK and demonstrated how to solve common problems with this scripting language that is part of Linux and every UNIX-compatible operating system. He summarized:

If your main concern is getting a working program written as quickly as possible, you probably do not want to wrestle with C or C++ for a week to perfect the most efficient algorithm. By trading off the speed advantages and control features of C (or another compiled language) for ease of use, gawk lets you get the job done quickly and relatively painlessly.

With this kind of efficiency in mind, it would be nice to also access network services with short AWK scripts. However, standard AWK has no functions for networking, and most AWK users would probably object to the introduction of such functions. AWK should stay the small, simple and powerful language it is now. Release 3.1 of GNU AWK does not introduce special functions for socket access (as Perl and C have), but uses a special file name for it. By treating network connections like files, even novices can write web clients with a few lines of AWK.

Finding Who is Logged In

Let's look at an example. It asks the finger service of your local machine if a particular user is logged in.

BEGIN {
 NetService = "/inet/tcp/0/localhost/finger"
 print "name" |& NetService
 while ((NetService |& getline) > 0)
   print $0
 close(NetService)
}

Store this script in a file named finger.awk and let GNU AWK 3.1 execute it by typing gawk -f finger.awk. The strange pipe symbol, |&, is the second and last addition to the AWK language needed for networking. When communicating over a network, we have to use |& instead of simply |.

After telling the service on the machine which user it is looking for, the program repeatedly reads lines that form the reply. When no more lines are received (because the service has closed the connection), the program closes the socket before finishing. Try replacing name by your login name or the name of someone else logged in. If you want a list of all users currently logged in, replace name by an empty string (""). Also, change localhost to another machine name in your local network; doing so allows you to watch who is logged in on machines at remote locations.
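
The special file name packs the whole connection into one string of the form /inet/protocol/local-port/remote-host/remote-port, where a local port of 0 leaves the choice to the system. As a minimal sketch, assuming gawk 3.1 or later is installed as gawk and a finger service is reachable, the script above can be wrapped in a shell function so that user and host become arguments. The names inet_tcp_name and finger_query are made up for this illustration and are not part of gawk.

```shell
# Hypothetical helper: build the gawk special file name for a TCP client.
# Fields: /inet/<protocol>/<local port>/<remote host>/<remote port>
inet_tcp_name() {
  printf '/inet/tcp/0/%s/%s\n' "$1" "$2"
}

# Hypothetical wrapper around the finger example.
# $1 = user name (may be empty), $2 = host (defaults to localhost).
finger_query() {
  gawk -v user="$1" -v svc="$(inet_tcp_name "${2:-localhost}" finger)" '
    BEGIN {
      print user |& svc                  # send the request line
      while ((svc |& getline line) > 0)  # read until the service closes the connection
        print line
      close(svc)
    }'
}
```

Calling finger_query "" localhost would then ask for all users currently logged in, as described above.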

The Coke Machine

Okay, this is not really an exciting application. The result you get is identical to the one you get by typing finger name@localhost at the shell prompt. So, let's try a really useful application. Today, many Coke machines are connected to the Internet. A short list of such machines can be found at http://www5.biostr.washington.edu/~jsp/coke.html. There, you see that the way to access them is identical to what we did in our first (and not so exciting) example—a finger request. Let us take the first Coke machine from the list and ask the machine which kinds of soft drinks are available there.

BEGIN {
 NetService = "/inet/tcp/0/cs.wisc.edu/finger"
 print "coke" |& NetService
 while ((NetService |& getline) > 0)
   print $0
 close(NetService)
}

Usually you get a reply with information on the different flavours of Coke and root beer currently available. If you have an account there, you can also order a drink. Many other machines of this kind are connected to the Internet. (See Resources.)

Both examples shown would work even if we deleted the final close command, because the operating system closes any open connection by default when a script reaches the end of execution. In order to avoid portability problems, we always close connections explicitly.

The Weather in Germany

Unlike the Coke machine service, most web services transmit HTML pages across the Internet with a protocol named HTTP. To most people, this is the real Internet. Can we access the real Internet with GNU AWK? Certainly. We just have to make sure we connect to port 80 of the web server instead of the finger port. This way, we can connect to the Yahoo machine and let it tell us the weather conditions at the place we live.

BEGIN {
 NetService = "/inet/tcp/0/
 print "GET http://weather.yahoo.com/forecast/Bremen_DL_c.html" |&
   NetService
 while ((NetService |& getline) > 0)
   print $0
 close(NetService)
}

Before starting this script, make sure you know which proxy server your provider uses and insert its name into the second line. If you do not use a proxy, insert the name of the web server (weather.yahoo.com). The result is the HTML content of the web page. It is up to your scripts to bring it into a more readable form or to extract the details of interest for further processing.
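
How the reply is brought into a more readable form depends on the page, but a first step might look like the following sketch. It assumes gawk 3.1 or later for the fetch and any POSIX awk for the cleanup. The function names http_get and strip_tags are hypothetical, and the cleanup only handles tags that open and close on the same line.

```shell
# Hypothetical wrapper: send the same kind of GET request as above.
# $1 = host to connect to on port 80 (your proxy, or the web server itself)
# $2 = URL to request
http_get() {
  gawk -v svc="/inet/tcp/0/$1/80" -v url="$2" '
    BEGIN {
      print "GET " url |& svc
      while ((svc |& getline line) > 0)
        print line
      close(svc)
    }'
}

# Hypothetical filter: remove HTML tags and blank lines from stdin.
strip_tags() {
  awk '{
    gsub(/<[^>]*>/, "")            # drop tags that open and close on one line
    gsub(/^[ \t]+|[ \t]+$/, "")    # trim leading and trailing blanks
    if ($0 != "") print
  }'
}
```

Without a proxy, a call such as http_get weather.yahoo.com http://weather.yahoo.com/forecast/Bremen_DL_c.html | strip_tags would print only the text of the page.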

Reading the Ticker

Sometimes we are not really interested in viewing a web page. Imagine a web robot (or agent) that looks at the quotes of the Motorola stock shares every 15 minutes and sends you an e-mail if the price hits a certain limit. A command-line call that is executed every 15 minutes is easily written and stored in a shell script. Sending an e-mail that depends on the content of a data file is just as straightforward to write as a shell script. Here is a script that reads the ticker for you:

BEGIN {
 NetService = "/inet/tcp/0/
 print "GET http://quote.yahoo.com/q?s=MOT&d=v1" |&
   NetService
 while ((NetService |& getline) > 0)
   print $0
 close(NetService)
}

Again, you must insert your proxy's name into the second line. During execution, the script sends a request to Yahoo's quote server; you should redirect the resulting web page to a file. With a grep command, the price can then be extracted from the HTML text and compared to the limit.
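
The extraction and comparison can also be done in AWK rather than grep. The sketch below is only an illustration: the function name price_alert is hypothetical, and it assumes that the text fed to it has already been stripped of tags and that the first decimal number on the first line mentioning the symbol is the price. The real page layout may well differ.

```shell
# Hypothetical helper: read text on stdin and report when the price
# of a symbol has reached a limit.
# $1 = ticker symbol, $2 = limit
price_alert() {
  awk -v sym="$1" -v limit="$2" '
    index($0, sym) && match($0, /[0-9]+\.[0-9]+/) {
      price = substr($0, RSTART, RLENGTH) + 0   # first decimal number on the line
      if (price >= limit + 0)
        print sym " reached " price
      found = 1
      exit
    }
    END { if (!found) exit 1 }                  # no quote found at all
  '
}
```

The function prints one line when the limit is reached and nothing otherwise, so a shell script can decide whether to send mail by testing for non-empty output.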

Advanced Applications

In these examples, we have seen how useful applications can be built on the same simple framework. This framework represents only a small fraction of what can be done with GNU AWK's networking device. Both TCP and UDP connections are available and both clients and servers can be written. More of the advanced applications can be seen in the small manual that supplements the official documentation distributed with the GNU AWK sources. (See Resources.)
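
A server uses the same kind of special file name with the fields rearranged: the local port names the port to listen on, and 0 for the remote host and remote port means any client is accepted. The following is a minimal sketch, assuming gawk 3.1 or later and a free port 8888; the function names inet_server_name and time_server are hypothetical.

```shell
# Hypothetical helper: build the special file name for a server.
# $1 = protocol (tcp or udp), $2 = local port to listen on
inet_server_name() {
  printf '/inet/%s/%s/0/0\n' "$1" "$2"
}

# Hypothetical one-shot server: send the current time to the first
# client that connects, then exit. $1 = port (defaults to 8888).
time_server() {
  gawk -v svc="$(inet_server_name tcp "${1:-8888}")" '
    BEGIN {
      # The first use of svc waits until a client connects.
      print strftime() |& svc
      close(svc)
    }'
}
```

With time_server running in one terminal, a client that reads one line from /inet/tcp/0/localhost/8888 in the same way as the finger example should receive the time stamp.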

Treating network connections like files is not a feature unique to GNU AWK. When TCP/IP was integrated into BSD UNIX in the early 80s, the creators of the socket API originally intended networking connections to appear as special files even to the user. But networking turned out to have many special cases which could not be handled in a uniform way with file handling. Later, the Portal File System approach was integrated into BSD UNIX. Portals are similar to GNU AWK's special file but are integrated as a file system into the operating system. This works well because the user can even establish connections at the shell prompt. The most recent implementation of the Korn shell (ksh93) provides virtually the same concept (/dev/tcp) at the shell level. None of these approaches has gained wide acceptance among users. Even Richard Stevens' article on Portals (see Resources) has not changed this.

One other approach to networking at the shell level that has gained some acceptance during the past year is the tool netcat. Originally a kind of UNIX hacker tool, it simply binds the standard input and output of a process to a network connection. It knows TCP as well as UDP, can behave as a server and allows “port scanning”, i.e., checking if there are servers listening at certain ports. This tool is simple to use and powerful, but some of the comments in the source code are quite unprofessional. Seldom have I seen such a large number of indecent curses, foolish hype and pure ignorance in a source file. Recently, netcat has been ported to Windows NT. To a humble user of NT, such a tool is like a long-awaited revelation.

Microsoft Windows

Back to pure AWKism and the different forms this belief takes on. On which platforms other than Linux is the networking feature of GNU AWK 3.1 available? It should work on all UNIX systems that comply with the XPG4 rules; this includes every UNIX that has a significant market share. Although the exact release date for GNU AWK 3.1 has not been set, this new feature should also work on Microsoft Windows 95 and NT as a part of the Cygwin tool set as soon as both are out. Cygwin is a UNIX-compatible programming environment that runs on top of Microsoft's Win32 API. It is currently available only as a beta release, but is already able to compile its own set of sources.

When this article was written, compilation of GNU AWK 3.0.3 worked fine, but 3.1 caused problems. If you intend to compile the sources in this environment, be prepared to experience some trouble. Most importantly, avoid compiling on the same machine you are using for networking with GNU AWK. In case you have only one machine available, reboot between compiling and testing. As of release B20 of the Cygwin tool set, clients and servers written in GNU AWK worked on Windows 95 but no server worked on NT 4.0 SP3. As of release B20.2, the compiler supports linking the file gawk.exe statically with all needed dynamic libraries. This would allow for distributing the GNU AWK interpreter as one single executable, but this executable does not work. Those problems should be solved by the time you read this; therefore, networking on Windows 95 should work.

Trade-Offs

We have seen that network access through a special device is good enough for many useful applications, but there are advanced features we have to trade off for this convenient access method. Some things are simply not possible within the easy-to-use framework AWK employs, namely:

  • broadcasting

  • non-blocking read

  • timeout

  • forking server processes

In spite of the lack of these advanced features, advanced applications such as a prototype “web server” or a “mobile agent” have been implemented in GNU AWK. If you need, and can handle, features like broadcasting or non-blocking read, you should use Perl or C instead of AWK.

Resources

Juergen Kahrs (juergen.kahrs@t-online.de) has used AWK on MS-DOS for time series analysis, statistical analysis and graphical presentation, mostly of neurological data. In 1996 he switched to Linux and enjoyed seeing his old scripts still running in a more efficient programming environment. He has come to peace with the fact that AWK will never be mainstream and enjoys seeing C programmers spend nights chasing NULL-pointer accesses. Juergen did the initial work for integrating TCP/IP support into gawk.
