Typesetting with groff Macros
“In the beginning was the word.” And from the wordy primordial void there soon arose the blank page, the toner cartridge and the now ceaseless human craving for print. If you have a desire to look good in print, or just need to knock out a memo, term paper or letter to mom, you should know about groff. groff is a rich yet accessible set of document formatting tools and is available as standard equipment on every Linux system. groff can help take your words and typeset them beautifully on the printed page.
groff refers specifically to the updated GNU version of troff (that venerable document formatting system developed for UNIX in the prehistoric era, before the Internet, compact disc and microwave popcorn). Traditional troff was first written in the early 1970s by Joseph Ossanna at Bell Labs, rewritten a few years later by Brian Kernighan and designed for the computers and typesetting equipment available at the time. The GNU version of troff (first called gtroff, now simply groff) was written in the early 1990s by James Clark. While remaining compatible with traditional troff, groff offers several key enhancements that make it easier to use, more powerful and less limited than the program it supersedes. GNU groff is actively maintained and continues to evolve. In addition to Linux and other UNIX/UNIX-like systems, ports of groff are available for most of the other platforms out there. This ubiquity and open-source freedom lets you publish and share your documents portably and freely among platforms.
Using groff's macro capabilities for generating printed output is the focus of this article. It should also be mentioned that groff serves as the formatting engine for the on-line manual pages produced by the man command. If you need a sample of the typesetting prowess of groff, simply generate a printed manual page with the -t option to man:
man -t troff >troff.man.ps
This will produce a PostScript version of the manual page for troff, which you can then view on-screen with one of the PostScript previewers (gv, mgv), print directly with a PostScript printer or print to a non-PostScript printer using a PostScript interpreter such as Ghostscript. (You should really take a look at this man page, by the way. It provides a thorough summary of all the additional features available in GNU groff, with more detail than presented here.)
groff offers all the niceties of computerized typesetting, including automatic ligatures, kerning, hyphenation and end-of-sentence spacing. groff also provides low-level control over all aspects of page layout by means of typesetting commands embedded into an otherwise plain text file. Most often these commands—or, in groff parlance, requests—are specified with a period in the first column of the line containing the command. For example, the following snippet of document has embedded commands for increasing the left indent and decreasing the current line length:
This is an example of a groff document.
.in +0.5i
.ll -0.5i
When formatted by groff, the text continuing here will appear
indented by one-half inch from both of the previous margins.
Although it is possible to format a document completely using such “raw” groff requests, it is more typical for end users to work with a collection of predefined macros that encapsulate sequences of raw requests into single commands. For example, if we wanted to create a macro for the block indent commands in the previous snippet, it might look like this:
.de Bi
.in +0.5i
.ll -0.5i
..

The .de request begins the definition of our macro named Bi, and the double period on the last line marks the end. Invoking a macro within a document follows the same syntax as using a raw request (the name of the macro follows on a line with a period in the first column). Our new macro used in a document would look like:
This is another section of my groff document.
.Bi
Oh boy, now the text continuing here is indented from both margins!

If at some later time we want to increase the block indent to three-quarters of an inch, we need only change the macro definition. All instances of Bi throughout the document will then format with the new dimensions.
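A block indent usually needs a way back out again. The following is only a minimal sketch of a companion macro: the name Be is our own invention (it is not part of any standard macro package), and it assumes the amounts match those used in Bi.

```groff
.\" Be: end a block indent begun with Bi (hypothetical companion macro)
.de Be
.in -0.5i
.ll +0.5i
..
```

Placing .Be on a line by itself after the indented text would restore both margins to their previous positions.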
So far, we haven't seen a whole lot here to get excited about. One of the limitations of traditional troff is that the names of all commands, macros and other variables are limited to two characters. Two measly characters? As mentioned earlier, troff was developed in the veritable stone age of computing, when every bit mattered, and succinctness was sublime. While the developers of troff and the standard macro packages have done their best to devise naming schemes that are as mnemonic as possible within this two-character constraint, the resulting interface is about as user-friendly as 80x86 assembly language (which at least uses three characters for most of its instruction set!).
Fortunately, GNU groff eliminates this two-character naming limitation. For both the macro developer and the end user, the most significant enhancement of groff is that all names, including macros, numbers, strings, fonts and environments, can be of arbitrary length. groff also allows for the aliasing of troff commands, macros and variables to provide alternative names for existing ones. We will exploit this feature heavily through the rest of the article. In fact, let's begin right now by aliasing the groff alias command itself:
.als ALIAS als
We can now use this command to provide a set of longer names for other key groff commands:
.ALIAS MACRO de
.ALIAS NUMBER nr
.ALIAS STRING ds

Sure, your old-time, hard-core troff jocks will gnash their teeth at the syntactic sugar. But the rest of us will have an easier time figuring out what in Sam Hill some macro is doing when we get back to work on it after a long and pleasurable weekend, or some other lapse into real life.
Listing 1 demonstrates an obligatory “Hello, world!” program implemented with groff macros. In it we can see that groff offers commands for creating user-defined variables of type number and string, and that these can be used within the macros we develop. Note the addition of more command aliases and number registers in this more practical example:
Listing 1. “Hello World Program”
.\"(this is a traditional troff comment)
.\"
\# (this is a gnu and improved comment!)
\#
\# define additional aliases:
.ALIAS BRKFILL br
.ALIAS SKIP sp
.ALIAS NEED ne
.ALIAS TINDENT ti
\#
\# define number registers:
.NUMBER #PARINDENT 0.5i
.NUMBER #PARSKIP 0.8v
.NUMBER #ORPHANS 2
\#
\# define user interface
\# tag for a new paragraph:
.MACRO <p> __END__
. BRKFILL \" break and spread pending output
. SKIP \\n[#PARSKIP]u \" paragraph prespace
. NEED \\n[#ORPHANS]v-1v+1u \" orphan control
. TINDENT \\n[#PARINDENT]u \" indent 1st line
.__END__
As may be clear from our selection of alias names and in-line comments, the macro definition of <p> provides a markup tag for a new paragraph with the following features when formatted by groff:
completes formatting and forces output of any pending line currently in process
creates vertical prespace for the paragraph to follow by the value in the #PARSKIP variable
controls orphans by keeping a minimum of #ORPHANS lines together
temporarily indents the first line of the paragraph by the value in the #PARINDENT variable
In a document, the new tag is used like this:

.<p>
This is my new paragraph. Notice how groff lets me create HTML-like tags.
.<p>
Here is my next paragraph...

Although this example is simplified from a final implementation, it demonstrates how we can export a user interface built up from basic groff macros and create structured markup tags for our documents. Notice also that another macro file could alternatively define the <p> macro when publishing the same document to the Web:
.MACRO <p> __END__
<p>
.__END__

A macro name can be any string of any characters, and groff is case sensitive. In our example named <p>, the angle brackets have no special meaning; they are just part of the macro name we have devised to simulate an HTML-like tag.
We should, however, look more closely at the definition of the macro given above. Recall that the .MACRO command itself is an alias we have given to the raw groff request .de. This command accepts two arguments: the first is the name of the macro (here <p>); the second is an optional termination label (here __END__). Any arbitrary string may be used to mark the end of a macro definition. We use __END__ in these examples, but one could also use <<< or *****, or any other convention that helps to improve the readability of a collection of macro commands in a file.
Listing 1 also demonstrates different forms of comments. The first form (.\") with a period in the first column actually functions as an undefined request, with the effect that the entire line is silently ignored. The second form (\#) is a GNU groff extension and ignores everything on the line beyond the comment, including the newline. The third form (\") can be used on the same line as groff commands and ignores everything on the line beyond the comment, not including the newline. If one were to use this last form of comment (\") on a line by itself, and without a period in the first column, groff would interpret the newline and generally convert it into a space or new line (depending on fill mode) in the output. Unintended spaces and blank lines can be a source of misery and anguish, especially to the novice macro developer trying to figure out why extra space is creeping into the document. Generally, the GNU form of comment is preferable for single-line comments, while the traditional form is required for comments following on the same line as groff commands.
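The three comment forms can be seen side by side in the short fragment below. It is only an illustrative sketch using the raw sp request rather than our aliases; the comment text itself is ours.

```groff
.\" Traditional comment: the whole line is ignored.
\# GNU comment: the line and its trailing newline are both discarded.
.sp 1v \" Trailing comment: everything after the backslash-quote is ignored.
```

Only the third line has any effect on the output, adding one line of vertical space.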
Finally, you probably noticed that while groff command syntax requires a period in the first column, the name of the command itself may be indented to any level on the same line. By using logically indented source code, together with comments, you will greatly improve the readability of the code for yourself and others in future generations of groff users to come. (The preceding comment is a public service announcement as required by the Surgeon General of Computer Science and is based on extensive scientific evidence that such conventions will prolong the life expectancy of your source code.)
Groff has a set of about 50 predefined variables called number registers. These are the internal gauges of groff's typesetting machinery. While processing an input file, groff maintains these registers with the current value of such variables as page number, position on page and point size. Number registers are in a separate namespace from strings and macros, and are aliased with their own alias command, as in the following:
.ALIAS ALIASNR aln
.ALIASNR _PTSIZE .s
.ALIASNR _LEADING .v
In this example, we first alias the command for aliasing numbers, adapting the methodology we used earlier. Then, we alias the read-only registers for the current point size and vertical line spacing, choosing to use the traditional typesetting terminology—“leading”—for the latter. Although not required, the above example also demonstrates the use of a specific convention we follow, to prefix aliases for system variables with an “_” (underscore character).
You can, of course, follow your own heart in these matters. But the use of a naming convention may help to distinguish the variables themselves from the names of the commands that set the variables, such as:
.ALIAS PTSIZE ps
.ALIAS LEADING vs
These might be used in a macro as follows:
.MACRO <fontsize:> __END__
. PTSIZE \\$1
. IFELSE "\\$2"" \{\
. LEADING ( \\n[_PTSIZE]p * 120/100 )
. \}
. ELSE \{\
. LEADING \\$2
. \}
.__END__

With usage in a document:
.<p>
A message to the world:
.<p>
.<fontsize:> 18p
Is groff great or what?

The first line of the macro sets the current point size to the value of the first argument to the macro. The second line introduces a compound if/else statement, using groff's string comparison syntax for the logical test. If the second argument is empty, the leading is set by taking the value of the point size now in the numeric register _PTSIZE, and increasing it by 20%. Otherwise, the leading is set to the value provided by the second argument.
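The macro relies on two command names, IFELSE and ELSE, that are not among the aliases defined so far. Presumably they follow the same pattern as the others; a minimal sketch, assuming they stand for groff's if-else request pair ie and el, would be:

```groff
.\" Assumed aliases for the if-else request pair used in <fontsize:>
.als IFELSE ie
.als ELSE el
```

These lines would need to appear in the macro file before the definition of <fontsize:> is first used.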
Parentheses in a numeric expression permit the use of spaces within the expression. Otherwise, in the example above, we would need to use the less legible form without any spaces:
.LEADING \\n[_PTSIZE]p*120/100
Numeric expressions are evaluated strictly left to right. There are no operator precedence rules, so parentheses are required to change the order of evaluation.
All arithmetic operations and number registers are ultimately integer based. Groff internally translates all dimensional measurements into machine units (based on 72,000 units per inch for PostScript devices), providing a functional “illusion” of fractional dimensions and point sizes. This allows us to specify decimal terms such as 8.5i and 11.5p, which, in fact, evaluate to 612,000 and 11,500 machine units respectively. Numeric values can be specified in any of the units shown in Table 1.
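The conversion to machine units is easy to observe. The sketch below assumes the PostScript device (groff -Tps) and uses two register names, WIDTH and SIZE, invented here for illustration; the tm request writes its message to standard error.

```groff
.\" Run with: groff -Tps thisfile >/dev/null
.nr WIDTH 8.5i
.nr SIZE 11.5p
.tm 8.5i is \n[WIDTH] units
.tm 11.5p is \n[SIZE] units
```

On the PostScript device this should report 612000 and 11500 units. Other output devices use a different resolution and will report different values.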
In practice, groff's internal use of integral math can have significant consequences for the macro developer. Consider what would happen if the expression above were instead stated:
.LEADING (\\n[_PTSIZE]p * (120/100))
Using integer division, the parenthetical term of 120/100 would evaluate to one and the entire expression would then evaluate to the current point size, and not 20% larger as intended.
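The effect of integer division and left-to-right evaluation can be checked with a few throwaway registers. This is a sketch only; the register names A, B and C are ours, and the results are written to standard error.

```groff
.\" multiply first, then divide: 10 * 120 = 1200, 1200 / 100 = 12
.nr A (10 * 120 / 100)
.\" divide first: 120 / 100 truncates to 1, so the result is 10
.nr B (10 * (120 / 100))
.\" no precedence rules: (1 + 2) * 3 = 9, not 7
.nr C (1 + 2 * 3)
.tm A=\n[A] B=\n[B] C=\n[C]
```

The message reads A=12 B=10 C=9.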
As it turns out, not all predefined number registers are, in fact, numeric. For example, the name of the current file being processed is in the read-only register .F:
.ALIAS MESSAGE tm
.ALIASNR _LINE .c
.ALIASNR _FILE .F
.MESSAGE Currently processing file \n[_FILE], line \n[_LINE].
Although both variables are evaluated using the syntax for number registers, _FILE returns the name of the current file as a string. Despite this anomaly, groff permits only numeric expressions in user-defined number registers. The example here, by the way, is one means of inserting debugging messages in your macro file during development. The .tm request—aliased above to .MESSAGE—sends any text that follows to the standard error stream.
Observant readers may be wondering why the syntax for evaluating the number registers inside the <p> macro has two backslashes (e.g., \\n[#PARSKIP]u), rather than one (e.g., \n[_LINE]) as shown above. The difference is subtle but important.
The reason for using two backslashes inside macro definitions is that we usually don't want the expression inside the macro to be evaluated at the time the macro is first read. Rather, we would like the expression to be evaluated every time the macro is played back. A double backslash is groff's escape sequence for the backslash character itself, providing the means of getting a single backslash to print in your output. When groff is reading in a macro for the first time—in what is called “copy mode”—it interprets everything as it usually does, including escape sequences. So when a double backslash is encountered in a macro definition, groff converts it to the single backslash the sequence represents. Then, whenever the macro is played back, the single backslash remaining is interpreted in the usual manner.
We could instead define the macro with a single backslash, such as:

.MACRO <p>
.SKIP \n[#PARSKIP]u
\# etcetera

But this macro would always execute with the amount of paragraph prespace specified in the variable #PARSKIP at the time the macro was first read. You would be stuck with the same #PARSKIP for your whole document. By using two backslashes, as in our original definition of <p>, we can dynamically change the #PARSKIP variable anywhere in the document and as often as we like, for example:
\# user interface for setting parskip:
.MACRO <parskip:> __END__
. NUMBER #PARSKIP \\$1
.__END__
\#
\# tighten spacing between paragraphs:
.<parskip:> 0.4v

The new setting will now affect the format of all instances of the <p> macro that follow.
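The difference between the two forms can be verified with a pair of test macros. This sketch uses raw requests and names of our own choosing (DEMO, EARLY and LATE); the messages go to standard error.

```groff
.nr DEMO 1
.\" Single backslash: the register is read once, while the macro is defined.
.de EARLY
.tm EARLY sees \n[DEMO]
..
.\" Double backslash: the register is read each time the macro is called.
.de LATE
.tm LATE sees \\n[DEMO]
..
.nr DEMO 2
.EARLY
.LATE
```

EARLY reports 1, the value in effect when it was defined, while LATE reports 2, the value in effect when it is called.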
As we could expect, groff offers a useful extension in this area as well. The “\E” sequence represents an escape character that will not be interpreted in copy mode. So, our <p> macro could just as easily be written:
.MACRO <p>
.SKIP \En[#PARSKIP]u
\# etcetera
The “\E” sequence will provide the same result as the “\\” double backslash sequence.
We have shown above that the groff syntax for evaluating a number register with names of arbitrary length is
\n[anyname]
Similarly, the syntax for referencing other kinds of names is

\*[anyname]    string
\f[anyname]    font
\[anyname]     special character

groff only has scalar variables, lacking compound structures or subscripted arrays. But it is possible to combine definitions and numeric variables in such a way as to fake the effects of compound data types. Here, we will demonstrate a “pseudo-array” that may come in handy for your bag of macro tricks.
Consider the following string definitions for days of the week:
.STRING $DAYNAME1 Sunday
.STRING $DAYNAME2 Monday
\# etcetera
.STRING $DAYNAME7 Saturday
groff provides a number register representing the current day of the week as a numeric value 1 through 7, and, of course, we alias it again to fit in with our scheme:
.ALIASNR _DOW dw

Now, we can initialize a variable with the current dayname using the pseudo-array of strings we defined above as follows:

.STRING $TODAY \*[$DAYNAME\n[_DOW]]

Anytime we need the name of the current day in a macro or document, we need only use the string variable $TODAY:

.<p>
Thank goodness it's \*[$TODAY].

Human reaction to this message will likely be most favorable when the _DOW variable evaluates to 6.
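The same trick works with any register that yields a small integer. As a further sketch, here is a month-name pseudo-array written with raw requests; the $MONTHNAME and $THISMONTH strings are our own names, while mo is the built-in register holding the current month as a number from 1 through 12.

```groff
.\" Pseudo-array of month names (string names are illustrative only)
.ds $MONTHNAME1 January
.ds $MONTHNAME2 February
.ds $MONTHNAME3 March
.ds $MONTHNAME4 April
.ds $MONTHNAME5 May
.ds $MONTHNAME6 June
.ds $MONTHNAME7 July
.ds $MONTHNAME8 August
.ds $MONTHNAME9 September
.ds $MONTHNAME10 October
.ds $MONTHNAME11 November
.ds $MONTHNAME12 December
.\" The inner register reference is expanded first, selecting one string.
.ds $THISMONTH \*[$MONTHNAME\n[mo]]
.tm This month is \*[$THISMONTH].
```

The nested reference is resolved from the inside out: the month number is appended to $MONTHNAME, and the resulting name is then looked up as a string.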
groff also has an extension that enables the use of looping constructs within macros. Together with pseudo-arrays, this feature gives you considerably more power and flexibility over traditional troff, which only has if/else branching logic. Following the example above, if you needed a macro to create a header for a list of tab-separated columns for each weekday, (Monday through Friday) you might cobble up something like the following:
.ALIAS TABSET ta
.ALIAS WHILE while
.MACRO <weekdays> __END__
. NUMBER IX 1 1
. NOFILL
. TABSET T .75iC
. WHILE \\n[IX]<6 \{\
\\*[$DAYNAME\\n+[IX]]\c
. \}
. FILL
.__END__
In the example above, the TABSET command makes use of groff's “T” extension for repeating tabs, set here every 3/4 of an inch. The loop test variable, “IX”, demonstrates the auto-increment syntax with a number register (the “+” sign in the \\n+[IX] expression). This has the effect of preincrementing the variable, so the first time through the loop IX will evaluate as 2, printing “Monday” from the pseudo-array $DAYNAME. Finally, the printed line is terminated with the \c escape sequence, to continue output on the current line without inserting the new line that would otherwise be inserted in nofill mode.
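The <weekdays> macro also uses the names NOFILL and FILL, which do not appear among the aliases shown. A plausible definition, assuming they stand for the fill-mode requests nf and fi, is sketched here:

```groff
.\" Assumed aliases for the fill-mode requests used in <weekdays>
.als NOFILL nf
.als FILL fi
```

With these in place, and the $DAYNAME strings defined, a line containing only .<weekdays> in the document would invoke the macro.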
groff's implementation of “while loops” includes .break and .continue statements. These give groff more of the flow-of-control facilities of a complete programming language. Although you probably won't be using groff for solving multiple regression problems, groff's while loops do make it easier to write macros for, say, printing columns of address labels on precut forms, without using an external processor.
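A small loop shows both statements at work. This is a sketch using raw requests at the top level of a file (so single backslashes suffice); the register name COUNT is ours, and the messages are written to standard error.

```groff
.\" Report the values 1 to 5, skipping 3.
.nr COUNT 0 1
.while 1 \{\
.  if \n+[COUNT]>5 .break
.  if \n[COUNT]=3 .continue
.  tm COUNT is \n[COUNT]
.\}
```

The loop condition is always true, so the loop ends only when break is reached; continue skips the message for the value 3 and goes straight to the next pass. The messages report 1, 2, 4 and 5.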
The basic page layout model for groff is in keeping with groff's minimalism. The only dimensions groff requires are the vertical length of the page, the hard left margin and the horizontal length available for printable lines. Each of these dimensions is set with its own request that we alias below:
.ALIAS PGLENGTH pl
.ALIAS PGOFFSET po
.ALIAS LNLENGTH ll
Usually, though, we need a document to have other layout parameters, such as a top and bottom margin, possibly with running headers and/or footers. All these may be configured using groff's trap mechanism in combination with additional parameters and macros for the page transition that we devise.
Let's imagine that we would like a top and bottom margin of one inch for our main body text and a centered page number one-half inch from the bottom of every page. In addition, we want to define these parameters so they will work whether we are using letter, legal or A4 sized paper. The first step is to define our own set of number registers to hold all of our layout parameters:
\# parameters with default settings:
.NUMBER #PAGELENGTH 11.0i
.NUMBER #PAGEWIDTH 8.5i
.NUMBER #LEFTMARGIN 1.0i
.NUMBER #RIGHTMARGIN 1.0i
.NUMBER #TOPMARGIN 1.0i
.NUMBER #BOTMARGIN 1.0i
.NUMBER #FOOTMARGIN 0.5i
\#
\# layout initialization macro:
.MACRO SET_LAYOUT __END__
. PGLENGTH \\n[#PAGELENGTH]u
. PGOFFSET \\n[#LEFTMARGIN]u
. LNLENGTH \
\\n[#PAGEWIDTH]u-\\n[#LEFTMARGIN]u-\\n[#RIGHTMARGIN]u
.__END__
\#
\# initialize layout with defaults:
.SET_LAYOUT
The next step is to write our page transition macros and put them into position with the trap mechanism. The following snippet demonstrates:
\# some more aliases:
.ALIAS CENTER ce
.ALIAS RIGHT rj
.ALIAS NEWPAGE bp
.ALIAS SETTRAP wh
\#
\# macro for header:
.MACRO MYHEADER __END__
' SKIP |\\n[#TOPMARGIN]u
.__END__
\#
\# macro for footer:
.MACRO MYFOOTER __END__
' SKIP |(\\n[#PAGELENGTH]u - \\n[#FOOTMARGIN]u)
. CENTER
\\n[_PAGE]
' NEWPAGE
.__END__
\#
\# position to invoke header/footer:
.SETTRAP 0 MYHEADER
.SETTRAP 0-\n[#BOTMARGIN]u MYFOOTER

The example shows two macros, MYHEADER and MYFOOTER, which define the actions taken at the top of the page (position 0) and at the bottom margin (-1.0i). These macros also use the no-break control character, the apostrophe ('), in place of the period for groff commands that would otherwise force pending output out immediately.
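The footer macro refers to a register named _PAGE, which is not among the aliases shown. Following the underscore convention for system registers, it was presumably defined along these lines; this is our assumption, written here with the raw aln request:

```groff
.\" Assumed alias: % is the built-in register holding the current page number
.aln _PAGE %
```

The alias would need to be in place before the first page transition invokes MYFOOTER.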
The page layout parameters are defined with default values, but here we will create a user interface for changing the paper size:
.MACRO <papersize:> __END__
. IFELSE "\\$1"letter" \{\
. NUMBER #PAGELENGTH 792p
. NUMBER #PAGEWIDTH 612p
. \}
. ELSE .IFELSE "\\$1"a4" \{\
. NUMBER #PAGELENGTH 842p
. NUMBER #PAGEWIDTH 595p
. \}
. ELSE .IFELSE "\\$1"legal" \{\
. NUMBER #PAGELENGTH 1008p
. NUMBER #PAGEWIDTH 612p
. \}
. ELSE \{\
. MESSAGE \
Missing or unrecognized papersize,\
file \\n[_FILE], line \\n[_LINE]
. \}
. \" re-initialize layout:
. SET_LAYOUT
.__END__
The end user can now set the paper size of the document, which initializes the margins accordingly. Putting together the elements we have looked at so far, we have defined an interface to groff that allows the end user to create a document that looks like this:
.<papersize:> a4
.<fontsize:> 11.5p
.<parskip:> 0.8v
.<p>
This document is typeset by groff!
Just like the vi editor and gcc compiler, groff is one of the mainstay classics in the standard UNIX/Linux environment. We have seen just a few ways of using groff's extensive macro capabilities to define markup and page layout interfaces that readily turn plain text files into typeset-quality print.
The features covered here are by no means the whole story. For example, groff also includes native facilities for drawing lines, curves, circles, ellipses and polygons with shaded filling. And, this does not even begin to cover groff's suite of preprocessors for graphs (grap), pictures (pic), equations (eqn), tables (tbl) and bibliographic references (refer). As is customary with GNU and Linux software, groff comes with thorough and high-quality documentation. (See Resources for more information.) And there are, of course, active mailing lists for staying current with groff and interacting with its user community.
This article has been aimed at the creation of short documents, but groff is capable of printing works of any length. In fact, groff is likely the typesetter used in the publication of your favorite O'Reilly title. For tour-de-force examples of groff in action, not to mention some of the best books on UNIX programming ever published, see any of the series by W. Richard Stevens. (The late Dr. Stevens is quoted at the beginning of this article from his colophon to UNIX Network Programming, Volume 2, Prentice-Hall PTR, 1999.) Much like the C programming language born of the same era, groff has an enduring and powerful minimalism that continues to lend itself well to typesetting tasks of all sizes. And if you should hear of reports suggesting groff's demise, just remember, some folks used to make similar claims about UNIX as well!
Wayne Marshall (guinix@yahoo.com) is a UNIX programmer and technical consultant currently living in Guinea, West Africa. He enjoys traveling, hiking, photography, Africa, strong black tea, popcorn and baking cookies.