Languages
The great thing about Unix is that you can write programs to do just about anything you want. The down side is that you may have to write a program every time you want to get anything done. A Unix system administrator—and the average Linux user is his own administrator—is faced with a seemingly never-ending stream of small jobs which are too tedious for human hands but too infrequent for a large programming effort.
Most of the programs I write are run once and thrown away. A significantly smaller number see action as often as once a week. I can count the number of programs run every day on one hand. Obviously, I can't afford to spend much time on any one program. I need a language in which it's easy to develop, easy to debug, and easy to extend. C, the traditional Unix programming language, doesn't offer any of these. In this article I'll introduce a language which does.
Scheme is closely related to Lisp, a language whose name once stood for “LISt Processing”. Lisp first saw the light of day in 1958; unfortunately, it has not become a stellar commercial success since then. In fact, common knowledge says that Lisp and its sister language Scheme are bloated and slow. While that may have been true in the bad old days when every programmer wrote in assembler and toggled the program into the computer's front panel with his teeth, today it's more important to maximize programmer productivity than to minimize machine cost. Scheme advances this goal by providing the programmer with a flexible but safe language which allows him to operate at a higher level than he would with C.
How exactly does it do that, you ask? First, and most importantly, Scheme provides automatic memory management. A C programmer must explicitly allocate and de-allocate every object he uses. If the program allocates more than he de-allocates, memory will leak and be wasted. If the program de-allocates too often, the program will behave incorrectly or even crash. Thanks to a process known as “garbage collection”, a Scheme programmer need only concern himself with allocation. He allocates an object when he needs it, and the Scheme runtime system frees it when the object is no longer needed.
Scheme provides a richer selection of data types than C does. While the C programmer has only numbers, characters, arrays, and pointers to choose from, the Scheme programmer has at his disposal numbers, characters, arrays, strings, lists, association lists, functions, closures, ports, and booleans. In addition, Scheme and C disagree on how to handle typing information: C assigns a type to each variable, and each variable may hold only values of that type. Scheme assigns no type to variables, but identifies each value with a type. One of the benefits of this approach is that it allows “polymorphism”, which means that one function can take arguments of many types. For example, you might have one function that could search for a word or a list of words.
An arguably peripheral issue plays an important part in making Scheme such a wonderful language for development: Scheme is available in both interpreted and compiled implementations. My experience with interpreted languages shows that development with an interpreted language leads to faster prototyping and debugging of the finished product. After two years programming in interpreted languages (mostly Perl and Scheme/Lisp), I cannot tolerate the edit-compile-run cycle that plagues the C programmer. With most Scheme interpreters, you can simply reload the particular function definition that you have changed. You have the full capabilities of the language at your disposal from the debugger, and you can even modify a running program!
Finally, a Scheme program is usually safer and more robust than its C counterpart. The Scheme primitives are type-safe—unlike the C primitives which will let you add a string, a character, and an integer, or cast any number to a pointer and then dereference it—and the Scheme environment provides significantly better error-checking than any C compiler could.
Why does C allow these deficiencies to exist? Do Scheme's conveniences come for free? Of course not.
As with most things in computer science, you are given three options: fast, cheap, and correct (you may pick two). The most common Scheme implementation, an interpreter, suffers from slowness and slightly inflated memory usage. Scheme compilers produce much faster code at the expense of larger executables and decreased (or totally removed) error checking. Which two of the attributes you pick depends on which two you need. Most programs don't need to be blindingly fast, but you can make Scheme fast if you need it.
Another drawback stems directly from the overwhelming popularity of C. Most external libraries and system interfaces are available as C-linkable libraries. Scheme users have little or no access to such libraries. My favorite Scheme implementation's solution to this is its Foreign Function Interface (FFI). An FFI allows a Scheme program to access variables and functions written in another language. Here's a short Scheme program that uses the FFI provided by Bigloo, a version of Scheme, to access the C function “printf” and the global system error variable “errno”:
(module ffi-example (foreign (int errno "errno") (int printf (string . foreign) "printf")) (main show-errno)) (define (show-errno argv) (printf "The value of errno is %d" errno) (newline))
Scheme can allow C the use of its functions through similar directives. With a decent FFI, Scheme and C programs can share data and interfaces as freely as two C programs.
Even the slowest Scheme interpreter is adequately fast for most of my day-to-day programs. Most Unix programs spend most of their time waiting for I/O to complete, and mine are no exception. The few programs that must be as computationally efficient as possible gain respectable increases in speed from using a Scheme compiler. In some cases, a program compiled by the Bigloo compiler at maximum optimization ran exactly as fast as the C equivalent compiled with “gcc -O2”. A trivial example is provided below. (See the table and two program listings).
time language 0.72 gcc -O2 0.72 Bigloo -unsafe 1.03 Bigloo 2.92 SCM/compiled 3.00 Scheme->C 79.04 Scheme48 90.30 SCM 91.76 Perl5 109.04 GNU awk 174.36 Perl4
Bigloo Version
(module optest (main main)) (define (main argv) (let ((b (string->integer (cadr argv))) (j 0)) (do ((i 1 (+ i 1))) ((> i b)) (if (even? i) (set! j (+ j 1)) (set! j (- j 1)))) (display j) (newline)))
int main(int argc, char *argv[]) { int i = 0, j = 0, b; b = atoi(argv[1]); while (i++ < b) { i % 2 ? j++ : j-; } printf("%d\n", j); return(j); }
Now that I've convinced you that Scheme isn't too expensive, I'd like to introduce you to programming in Scheme. The best reference for this is the “Revised Revised Revised Revised Report on Scheme” (R4RS), the great-great-grandson of the original Scheme language definition. The 1990 IEEE standard for Scheme is number 1178. However, I'll attempt to show the joy of Scheme programming with a few examples.
First, let me explain that Scheme departs from the esthetic norm for computer languages. Quite unlike a C or Fortran program—which is organized as a series of statements, usually one per line—a Scheme program consists of a series of parenthesized lists, “S-expressions” Each S-expression may define a new function or variable, invoke a function, or may simply be a literal data list. That's part of the genius of Scheme: program code and data are virtually indistinguishable. S-expressions can contain other S-expressions (which may contain other S-expressions, etc.). That means that Scheme's equivalent of statements may contain other statements, and that Scheme lists may contain other lists. To truly understand Scheme, you must be comfortable with recursive relationships like that.
Unlike most statement-oriented languages, Scheme has no operators. All actions are handled by functions or special forms, both of which exist superficially as S-expressions. C's “+” operator appears in Scheme as the “+” function. Functions are invoked by placing their names at the beginning of a code S-expression. For example, (+ 2 2) produces the number 4. The S-expression (display (+ 2 2)) prints the number 4 on the standard output. Some other S-expressions:
(define some-variable 12) (define (some-function argument) (display argument)) (- some-variable 1) (display (* some-variable 2))
Most people new to Scheme dislike the parentheses at first, but grow to enjoy them. Scheme's syntax is more regular than that of state-based languages, more conducive to language-sensitive editing, and easier to manipulate with macros. (Macros are beyond the scope of this article.)
Scheme provides the familiar “if” expression for conditional execution of code. One difference between C's if statement and Scheme's if statement is that the Scheme “if” statement returns a value. In fact, all Scheme statements return a value. This property of Lisp, coupled with the purported inefficiency of Lisp systems, caused some wit to comment, “Lisp programmers know the value of everything and the cost of nothing.”
This function prints an “s” if the number passed to it is not 1.
(define (plural-print-s num) (display (if (eq? num 1) "" "s")))
Another conditional statement is the “cond” form; rather than a simple either-or choice, “cond” evaluates each of several tests and executes the corresponding expression of the first one to evaluate to true.
(define (print-type object) (cond ((number? object) (display "number")) ((string? object) (display "string")) ((list? object) (display "list")) (else (display "I don't know that type"))))
Most C programmers will be familiar with the “for” and “while” iterative loops for repetition. Recursion is a little less-known in the C world. Scheme provides several very powerful methods of repetition in both iterative and recursive forms.
This recursive function prints a list of numbers with each number incremented by one.
(define (print-list+1 arglist) (if (pair? arglist) (begin (display (+ 1 (car arglist))) (newline) (print-list+1 (cdr arglist)))))
(car and cdr are functions that return the first element of a list and a list minus its first element, respectively.)
Recognizing that applying an operation to each element of a list is a common operation, Scheme provides the “map” function. Here is the same program written using “map”:
(define (print-list+1 arglist) (map (lambda (arg) (display (+ 1 arg)) (newline)) arglist))
The “lambda” form is similar to a function definition but the resulting function has no name. Although this function may not be called by name, it may be passed to other functions that take functions as arguments (confused yet?). In this case, “map” takes as its first argument a function. It then applies that function to each element of its second argument, which must be a list.
Scheme also provides the “do” loop, which functions almost identically to C's “for” loop.
Because of its simplicity and simple syntax, Scheme has become a favorite of language implementors, and as a result, dozens of Scheme implementations are available for free. These may be divided into two categories: interpreters and compilers. (Well, more correctly, the compilers usually include both compilers and interpreters.)
My favorite compiler is Bigloo. Written by Manuel Serrano of France's INRIA, Bigloo compiles Scheme code to efficient C, which is then compiled by gcc into executable code. Because of this method, Bigloo is both portable and open to improvements in the system C compiler. As mentioned earlier, programs compiled by Bigloo usually run within a few percent of the C equivalent program. Manuel, currently in the throes of his doctoral dissertation, may be reached at Manuel.Serrano@inria.fr.
Other compilers include the widely used Scheme->C, produced by Joel Bartlett (who may still be reachable as bartlett@decwrl.dec.com) at DEC's Western Research Lab. Programs compiled by Scheme->C aren't quite as fast at simple tasks as those compiled by Bigloo, but Scheme->C sports a much more advanced garbage collector. For programs with large data sets, Scheme->C may be a wiser choice.
Chez Scheme is also available for Linux. Chez is a venerable compiler written by R. Kent Dybvig (dybvig@cs.indiana.edu ), one of the authors of the de facto Scheme standard (R4RS). Chez Scheme is not free.
Several Scheme interpreters have gained popularity in recent months. One of the hottest recent products is STk, an interpreter which makes John Ousterhout's Tk easy-to-use widget set available from an object-oriented dialect of Scheme. STk isn't the best performer among the interpreter crowd, but it easily bests Ousterhout's Tcl.
Aubrey Jaffer's SCM, one of the most mature Scheme interpreters, offers small size, high speed, and a growing library of extensions. These modules include POSIX interfaces, socket I/O, and a curses screen management library. SCM's author also maintains a library of helpful Scheme functions in a package called SLIB. I use SLIB in several of my code examples. The author, Aubrey Jaffer, may be reached as jaffer@ai.mit.edui.
The Free Software Foundation recently began an effort to provide a standard scripting and application extension language for their products. This language, called GUILE for some acronymic reason that escapes me now, will be based upon the SCM interpreter. To find out more about GUILE, send mail to gel-request@cygnus.com (that's not a misspelling!).
Finally, a group of Scheme enthusiasts at MIT (Scheme's birthplace) have undertaken an ambitious project to make Scheme as practical at Unix scripting as the Bourne and Korn shells. Their Scheme shell (scsh) combines the superior Scheme language with the powerful process/pipe-based data flow mechanism of the Unix shells.
Here are some program examples to give you a feel for programming in Scheme.
Program 1 is the standard fibonacci algorithm, and Program 2 is the standard recursive factorial. These should work under any Scheme implementation. Listing 3 shows an implementation of the tried-and-true Unix->DOS file conversion utility for the SCM interpreter and SLIB, its library package. It reads a file of linefeed-terminated lines and outputs a file whose lines are terminated by carriage returns and newlines. Pay special attention to the definition of the function chomp; this function splits a string into a list of characters, filters out all the unwanted characters, and collapses the list back into a string. Listing 4 shows a definition of chomp that corresponds more closely to what a C programmer would write. Note the striking difference in complexity.
In Listing 5 is a program which takes a list of files on the command line and arranges them into disk-sized groups. This process will be familiar to those of you who installed Linux from floppy disks. The program illustrates the power of Scheme's list-manipulation procedures. map and for-each both execute a function for every member of a list. find-if returns the first element of a list that matches specified condition. These and other list techniques are usually implemented in C with cumbersome, error-prone loops.
Every language has its place in the programmer's toolkit. I don't use Scheme for every task—most of the time I use Perl, and sometimes I even use C. However, Scheme lets me write error-free programs in less time, with less effort and less pain, than almost any other language would. It provides me with many facilities that I would have to write for myself if using some other language, and unlike some other very high level languages, can be compiled to blindingly fast native code.
I maintain an archive of Scheme interpreters and compilers pre-compiled for Linux. To retrieve one of these, ftp to ftp.mindspring.com and look in the directory named /users/rsanders/lang. You will find these files:
bigloo-bin.tar.gz The Bigloo compiler version
bigloo-elf-bin.tar.gz Bigloo with ELF shared libraries
scheme2c-bin.tar.gz The Scheme->C compiler from DEC
scheme2c-elf-bin.tar.gz Scheme->C with ELF shared libraries
scm-bin.tar.gz Aubrey Jaffer's SCM interpeter
slib.tar.gz Jaffer's SLIB library
stk-bin.tar.gz The Tk-compatible Scheme interpreter
If your system is capable of compiling and running ELF binaries, then I suggest you use the packages that contain ELF shared libraries. The use of these shared libraries can reduce application startup time and size by up to 180 KB.
Revised Revised Revised Revised Report on the Algorithmic Language Scheme (R4RS) - William Clinger, Jonathan Rees et al. Postscript and DVI versions available via anonymous FTP from swiss-ftp.ai.mit.edu in /archive/scheme-reports. Also available in HTML form from swiss-ftp.ai.mit.edu as /archive/scm/HTML/r4rs*. The Usenet newsgroups comp.lang.scheme and comp.lang.lisp. Introductory books (from the comp.lang.scheme FAQ):
Daniel P. Friedman and M. Felleisen.The Little LISPerMIT Press (Cambridge, MA), 3rd printing, 1989.ISBN 0-262-56038-0. Science Research Associates (Chicago),3rd ed, 1989. 206 pages.
Good for a quick introduction. Uses Scheme instead of Common Lisp. (The book uses a dialect of Scheme with footnotes about translating to Scheme or Common Lisp. The footnotes won't allow a non-expert to use Common Lisp for the advanced chapters because of the complexity.)
Brian Harvey and Matthew WrightSimply Scheme: Introducing Computer ScienceMIT Press, Cambridge,MA, 1994. 583 pages.ISBN 0-262-08226-8. $49.95.
This book is ideal for students with little or no previous exposure to programming. The book is designed to be used before SICP (the authors call it a SICP “prequel”), and makes Scheme fun by sheltering the students from potentially confusing technical details. Unlike Pascal or C, the emphasis is on ideas, not obscure matters of syntax and arbitrary rules of style. High schools who have shied away from using Scheme because they found SICP to be too challenging should consider using this book instead.
The text gradually and gently introduces students to some of the key concepts of programming in Scheme. It starts off with functions and function composition and continues with the notion of functions as data (first-class functions) and programs that write programs (higher-order functions). Since the complexity of the language is hidden, students can get involved in some of the more interesting and fun aspects of the language earlier than in other texts. Then the book progresses through the more complicated concepts of lambda, recursion, data abstraction and procedural abstraction, and concludes with sequential techniques, but with careful attention to topics students often find difficult. There are five chapters on recursion alone! There's also a pitfalls section at the end of most chapters to help students recognize and avoid common errors. The book uses several programs as examples, including a tic-tac-toe program, a pattern matcher, a miniature spreadsheet, and a simple database program. Source code for the programs is available by anonymous ftp from anarres.cs.berkeley.edu:/pub/scheme/, or for $10 on IBM or Macintosh diskettes from the publisher.
Harold Abelson and Gerald Jay Sussman, with Julie Sussman.Structure and Interpretation of Computer ProgramsMIT Press (Cambridge, MA) and McGraw-Hill (New York), 1985.542 pages. ISBN 0-262-01077-1 $55.
The teacher's manual, which is also available from MIT Press (ISBN 0-262-51046-4 $20), does NOT contain solutions to the exercises, but does contain hints on teaching with the book.
Starts off introductory, but rapidly gets into powerful Lisp-particular constructs, such as using closures and engines, building interpreters, compilers and object-oriented systems. Often referred to by its acronym, SICP, which is pronounced “Sick-Pee”. This is the classical text for teaching program design using Scheme, and everybody should read it at least once. MIT problem sets are available from the repositories, and materials from Gustavus Adolphus College are available from ftp.gac.edu:/pub/SICP/.
George Springer and Daniel P. FriedmanScheme and the Art of ProgrammingMIT Press and McGraw Hill,1990, 596 pages. ISBN 0-262-19288-8, $50.
Introduces basic concepts of programming in Scheme. Also deals with object oriented programming, co-routining, and continuations. Gives numerous examples. Has more of an emphasis on teaching Scheme than SICP, and can be seen as an alternative to SICP. Source code from the chapters is available from ftp.cs.indiana.edu:/pub/scheme-repository/lit/sap/
Robert Sanders (rsanders@mindspring.com) works as the senior engineer at MindSpring Enterprises, an Atlanta-based Internet Service Provider. Ever since he escaped his role as dosemu's author, he's been hard at work studying computer languages and their implementations. He likes Linux.