Compile C Faster on Linux
lcc is a small, fast C compiler now available on Linux. A perfectly good C compiler, gcc, comes with Linux. Why would anyone bother installing a second one? Because the two compilers make different tradeoffs, so they suit different stages of the development cycle. gcc has many targets and users, and it includes an ambitious optimizer. lcc is 75% smaller (more when counting source code), compiles more quickly, and helps prevent some porting bugs.
For those who have always wanted to customize or extend their compiler, our recent book, A Retargetable C Compiler: Design and Implementation, tours lcc's source code in detail and thus offers especially thorough documentation. Pointers to lcc's source code, executables, book, and authors appear at the end of this article.
lcc is fast. gcc implements a more ambitious global code optimizer, so it emits better code, particularly with full optimization options, but global optimization takes time and space. lcc implements a few low-cost, high-yield optimizations that collaborate to yield respectable code in a hurry.
For example, lcc compiles itself in 36 seconds on a 90 megahertz Pentium running Linux. gcc takes 68 seconds to compile the same program (the lcc source) with the default compiler options, and 130 seconds with the highest level of optimization. Code quality varied less. gcc's default code took 36 seconds to reprocess this input, just like lcc's code. gcc's best code (that is, with optimization level 3) runs in 30 seconds, about 20% faster. This is only a single data point, and both compilers evolve constantly, so your mileage may vary. Naturally, one can save time by using lcc for development and optimizing with gcc for the final release build.
Indeed, compiling code with two different compilers helps expose portability bugs. If a program is useful and if the source code is available, sooner or later someone will try to port it to another machine, or compile it with another compiler, or both. With a new machine or compiler, glitches are not uncommon. Which of the following solutions will net you less unwanted e-mail? For you to find and erase these blots while the code is fresh in your mind? Or for the porter to get diagnostics much later, about non-standard source code?
lcc follows the ANSI standard faithfully and implements no extension. Indeed, one option directs lcc to warn about a variety of C constructs that are valid but give undefined results, and thus can behave differently on a different machine or with a different compiler. Some programmers use lcc mainly for its strict-ANSI option, which helps them keep their code portable.
Like gcc, lcc can be configured as a cross-compiler that runs on one machine and compiles code for another. Cross-compilers can simplify life for programmers with multiple target platforms. lcc takes this notion a step further than most cross-compilers: we can, and typically do, link code generators for several machines into each version of the compiler.
For example, we maintain code generators for the MIPS, SPARC, and X86 architectures. We both work on and generate code for multiple platforms, so it's handy to be able to generate code for any target from any machine. We usually fold all three code generators into all compiler executables. A run-time option tells lcc which target to generate code for. If you don't maintain code for multiple targets, you're free to use an lcc that includes just one code generator, saving roughly 50KB for each code generator omitted.
lcc is small. lcc's Linux executable with one code generator is 232 KB, and its text segment is 192 KB. Both figures for the corresponding phase of gcc (cc1) exceed a megabyte. lcc's small size contributes to its speed, especially on modest systems. A compact program benefits those who wish to modify the compiler. Most developers will use pre-built executables for lcc; they will never examine or even recompile the source code. But the Linux community particularly prizes the availability of source code, partly because it allows users to customize their programs or adapt them for other purposes.
When configured with the Linux PC code generator lcc is 12,000 lines of C source code. gcc's root directory—without the target-specific description files—holds 240,000 lines. Surely, some of this material is not part of the compiler proper, but the separation is not immediately apparent to those who haven't browsed gcc's source recently. The machine-specific module is the part most often changed, because new target machines come along more often than, say, new source languages. The lcc target-specific module for the Linux PC is 1200 lines, and half of that repeats boilerplate declarations or supports the debugger, so the actual code generator is under 600 lines. The target-specific modules for gcc average about 3000 lines. These comparisons illustrate the fact that the two compilers embody different trade-offs and that neither beats the other at everything: gcc can emit better code and offers many options, while lcc is easier to comprehend but is otherwise less ambitious.
gcc and lcc use retargetable code generators driven in part by formal specifications of the target machine, just as a parser can be driven by a formal grammar of its input language. gcc's code generator is based in part on techniques that one of us (Fraser) originated in the late 1970's. lcc uses a different technique that is simpler but somewhat less flexible.
Its compact source code helps people adapt lcc. An early adaptation by one of us (Hanson) injected profiling code that counts executions for a profiler that comes with lcc. For example, the lines
for (<1965>r = 0; <15720>r < 8; <15720>r++) if (<15720>rows[r] && <5508>up[r-c+7] && <3420>down[r+c]) { ...
annotate source code from the lcc test suite's implementation of a program that finds all ways to place eight queens on a chessboard such that none of them attacks any other. The numbers in angle brackets report how many times the following code fragment was executed, which can differ within one line. This profiler would have been a big project without lcc, but it was modest one with it. Other adaptations of lcc include interpreters (www.cit.gu.edu.au/~sosic/papers/sigplan92.ps.Z), code generators for multiple targets (ftp://ftp.cs.princeton.edu/pub/lcc/contrib/), a C++ compiler, programmable debuggers that can debug across a network (http://www.cs.purdue.edu/homes/nr/ldb/), and a retargetable optimizing linker (http://www.cs.princeton.edu/~mff/mld/). A group at Stanford University has adapted lcc for use with a global optimizer (suif.stanford.edu/suif/suif.html). At least some of these efforts chose lcc over gcc because lcc's small size made it seem easier to comprehend and change. Many of these projects were begun before the lcc book was done; we expect even more adaptations now that extensive documentation is available.
Most developers will use pre-built executables for lcc and never study the source code. The Linux community, however, expects source, and lcc provides an annotated version of most of its code in the form of a book. lcc's annotations, like its small size, are designed to help developers modify lcc.
lcc is written as what Knuth has termed a “literate program”, which interleaves the source code with prose explanations. Two programs process this input. One program extracts just the C source code, which can be compiled with any C compiler. The other program processes both the prose and the code and emits the typescript for the lcc book. We generate the book and the compiler from a single source because it's too easy for multiple sources to get out of sync with one another.
A brief fragment of the chapter on the X86 code generator demonstrates literate programming:
Static locals get a generated name to avoid other static locals of the same name:
<X86 defsymbol>= if (p->scope > LOCAL && p->sclass == STATIC) p->x.name = stringf("L%d", genlabel(1));Generated symbols already have a unique numeric name. Defsymbol simply prefixes a letter to make a valid assembler identifier:
<X86 defsymbol>+= else if (p->generated) p->x.name = stringf("L%s",p->name);
Each of the two displays above consists of a “fragment label” in angle brackets and a “fragment” of C code. The fragment label names the piece of the C program being described (here the version of the routine defsymbol for the X86). The += in the second fragment says that the second code fragment is appended to the first.
This example is necessarily tiny, but it shows how literate programming allows one to build up a complex program a bit at a time, explaining it on the way. The lcc distribution includes conventional C code that can be modified as usual, but when some explanation would help, one can easily get it from the annotated code in the book.
Not shown in this sample are page numbers in each fragment that point to adjoining fragments, and miniature indices in the page margin that point to the page that defines each identifier that's being used. Many readers have identified these mini-indices as especially helpful.
lcc's C source code and Linux executables are available for anonymous ftp at URL ftp://ftp.cs.princeton.edu/pub/lcc/. It's about a megabyte, so it can be downloaded using even, say, a 14.4kbaud modem in about 10 minutes. The package includes Dennis Ritchie's preprocessor for ANSI C, but lcc is also used with gcc's preprocessor. Like gcc, lcc emits assembler code for the standard Linux assembler, debugger, and C library, so the package does not include any of these. A sub-directory collects code generators and other companion software contributed by others. The package describes mailing lists for communicating with others working on, and with, lcc.
Our book about lcc, A Retargetable C Compiler: Design and Implementation, (ISBN 0-8053-1670-1) is available from Addison Wesley at 800-447-2226 and from other sources listed on lcc's home page (http://www.cs.princeton.edu/software/lcc/).
lcc is free for non-commercial use. The lcc book amounts to a single-user license for lcc, so some have arranged commercial use by simply including a copy of the book with their product (and charging for it); the publisher offers substantial discounts. Other arrangements are possible.
Chris Fraser (cwf@research.att.com) has been writing compilers since 1974. He earned a Ph.D. in computer science at Yale in 1977 and does computing research at AT&T Bell Laboratories in Murray Hill, New Jersey.
Dave Hanson (drh@cs.princeton.edu) is Professor of Computer Science at Princeton University. His research interests include programming language design and implementation, software engineering, and programming environments. His Web is at: www.cs.princeton.edu/~drh/.