What's GNU

by Arnold Robbins
groff

This month's column discusses groff, the GNU version of troff. Explaining troff in full detail can take (and has taken!) more than one book. For now, we'll provide a little bit of history and an overview of what groff is, what the input tends to look like, and how you would use it.


What is troff?

While there are many WYSIWYG word processing programs out there, some quite powerful and others usable and freely available, many long-time “power users” still prefer text formatters like troff and TeX for the control they give. Another advantage of these programs is that you can edit the input with any text editor, even ed or vi over a 2400 baud modem connection, or on a laptop system that can't support X windows.

nroff and troff are the Unix text formatters. They are essentially twins; each accepts the same input “language”. The difference is in the output they produce. nroff was designed to produce output for devices with fixed-width and fixed-size characters, such as terminals and line printers. troff was designed for photo-typesetters. nroff simply ignores requests that it cannot honor. From now on, we will follow the time-honored convention of referring to both programs as troff, to make things simpler.

troff was written at Bell Laboratories by the late Joseph Ossanna. It was modeled after the text formatters of the time, notably one named runoff. (runoff was written by Jerry Saltzer for the CTSS system at MIT, running on a modified IBM 7094, in the mid-1960s.) Interestingly enough, nroff was written first; the name stood for “new runoff”. Later, when the research group acquired a photo-typesetter, nroff was enhanced to deal with the newly acquired capabilities, and thus troff was born. In the early 1980s, after the death of Mr. Ossanna, Brian Kernighan took over troff, cleaned it up, and enhanced it. The troff language is now frozen; it will not evolve further.

troff's Capabilities

Input to troff is a mixture of text and formatting commands. You might think of this as “what you want to say” and “how you want to say it.” Typically, commands are on separate lines by themselves. troff is able to distinguish commands from text, since command lines begin with a dot, or period. Special tricks have to be used to get troff to treat a line that begins with a dot as real text.

There are a large number of commands in troff. Some of the more important commands are for the following tasks:

  • Filling. This means putting as many words of input text on one output line as possible.

  • Adjusting. This means padding lines with blanks so that the margins on both sides are even. Book and magazine text is typically both filled and adjusted.

  • Font changes. Printed text is often in multiple fonts. Italics are often used for emphasis. Bold text is used for strong emphasis and for headings. troff supports at least four fonts normally, with the ability to easily add others.

  • Size changes. Photo-typesetters give you the ability to print characters at different sizes. Most text is set in 10-point type, where one point is 1/72 of an inch. Text can be made smaller or larger as needed. For example, footnotes are often set in a smaller point size than normal text.

  • Margin control. troff gives you separate commands to control the size of all four margins on the piece of paper. This is typically done using a combination of the

    • line length, how many characters or inches of text can fit on a line,

    • page offset, how far to the right to shift the entire line, and

    • indentation, how far left or right from the beginning of the line to actually place text. E.g., in a book, the first line of a paragraph is often indented 1/2 an inch.

  • Centering. Any number of input lines can be centered in the output text.

  • Line drawing. troff can draw horizontal and vertical lines, as well as arbitrary curves.

  • Horizontal and Vertical Motions. You can move text up, down, left, or right by an arbitrary amount. Consider subscripts and superscripts in mathematical formulae, or footnote indicators, which are often raised half a line and set in a smaller point size.
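Filling and adjusting are the heart of the formatter. The following Python sketch shows the idea on a fixed-width device, roughly the way nroff sees the page: words are packed greedily onto lines no wider than a given number of characters, and every line except the last is then padded with extra blanks. The function names are hypothetical, and real troff works in machine units with proportional fonts and hyphenation, so treat this only as an illustration of the two steps.

```python
def fill(words, width):
    """Greedily pack words onto lines of at most `width` characters."""
    lines, current = [], []
    for word in words:
        # Length of the line if this word were appended after one blank.
        candidate = len(" ".join(current + [word]))
        if current and candidate > width:
            lines.append(current)   # line is full; start a new one
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def adjust(words, width):
    """Spread extra blanks between words so the line is exactly `width` wide."""
    if len(words) == 1:
        return words[0]             # nothing to pad between
    gaps = len(words) - 1
    blanks = width - sum(len(w) for w in words)
    base, extra = divmod(blanks, gaps)
    out = ""
    for i, word in enumerate(words[:-1]):
        # The leftmost `extra` gaps receive one additional blank.
        out += word + " " * (base + (1 if i < extra else 0))
    return out + words[-1]


def format_paragraph(text, width):
    """Fill, then adjust every line except the last (left ragged)."""
    lines = fill(text.split(), width)
    if not lines:
        return []
    adjusted = [adjust(words, width) for words in lines[:-1]]
    return adjusted + [" ".join(lines[-1])]
```

With a width of 20, "Book and magazine text is typically both filled and adjusted." comes out as three lines of exactly 20 characters followed by a short last line, just as a filled and adjusted paragraph in a book would.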

You can add comments to your troff source. They begin with \" and continue to the end of the line. We will be using comments in our examples, to help explain what is going on.

Many of these facilities are available both as standalone commands and as in-line escape sequences. For example, to change to a bold font, one might have text like this:

        Here is some regular text.
        .ft B  \" now switch to bold
        This is bold.
        .ft    \" switch back to earlier font
        This will be regular again.

You can do the same thing with in-line escape sequences. For font changes, you use \f followed by either a single-letter font name, or a ( and a two-letter font name. A similar example would be:

        Here is some regular text. \fBThis is bold.\fP This will
        be regular again.

The letter P is special. It means to use the previous font.
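To make the escape-sequence rules concrete, here is a small Python sketch (the function font_runs is hypothetical) that splits a line into runs of text, each tagged with its font. It assumes well-formed input and a starting font named R, and it treats P by swapping back to the font that was in use before the last change.

```python
def font_runs(text, start_font="R"):
    """Split text into (font, text) runs, honoring \\fX, \\f(XX and \\fP."""
    runs = []
    current, previous = start_font, start_font
    buf = ""
    i = 0
    while i < len(text):
        if text.startswith("\\f", i):
            # Flush whatever was collected in the current font.
            if buf:
                runs.append((current, buf))
                buf = ""
            if text[i + 2] == "(":
                name = text[i + 3:i + 5]   # two-letter font name
                i += 5
            else:
                name = text[i + 2]         # one-letter font name
                i += 3
            if name == "P":
                # P means "previous font": swap current and previous.
                current, previous = previous, current
            else:
                current, previous = name, current
        else:
            buf += text[i]
            i += 1
    if buf:
        runs.append((current, buf))
    return runs
```

Applied to the one-line example above, the sketch yields a regular run, a bold run for "This is bold.", and a regular run again for the rest.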

troff provides two nice features: strings and number registers. A string is shorthand for some text. For example, if you don't want to type “the Linux Operating System” over and over again, you could define a string LX and then use the string in your text. This feature can save a lot of typing.

        .\" ds means define string
        .ds LX the Linux Operating System
        If you are new to \*(LX, then you should subscribe to
        \fILinux Journal\fP. It covers \*(LX in great detail,
        month after month.
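String interpolation is essentially text substitution. This Python sketch, with hypothetical helper names, records .ds definitions and expands both the one-character form \*X and the two-character form \*(XX. An undefined string expands to nothing here; the sketch ignores copy mode, escapes inside definitions, and the many other details troff handles.

```python
import re

# \*X uses a one-character name; \*(XX uses a two-character name.
STRING_REF = re.compile(r"\\\*(?:\((..)|(.))")


def define_string(strings, request):
    """Handle a '.ds NAME value' request line."""
    _, name, value = request.split(" ", 2)
    strings[name] = value


def interpolate(strings, line):
    """Replace every string reference in `line` with its definition."""
    def expand(match):
        name = match.group(1) or match.group(2)
        return strings.get(name, "")   # undefined strings expand to nothing
    return STRING_REF.sub(expand, line)
```

After define_string(strings, ".ds LX the Linux Operating System"), interpolating "If you are new to \*(LX, then" gives "If you are new to the Linux Operating System, then".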

Number registers are like variables in programming languages. They can contain numeric values. They can also be used in an “auto-increment” and “auto-decrement” fashion. This means that with each use, the value goes up by one or down by one. Why would you need such a thing? Think about automatically numbering chapters, sections, and subsections, as well as figures and footnotes. You would have a register for the chapter number, another for the section number, and so on. With each new chapter, the section register is reset to one. With each new section, the subsection register is also reset to one, and so on.
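In troff, a request such as .nr x 0 1 sets register x to 0 with an increment of 1, and \n+x interpolates x after incrementing it (\n-x after decrementing it). The Python sketch below models that behavior and the chapter/section reset logic just described; the class names are hypothetical, and the sketch resets the lower levels to zero so that the next auto-increment yields one.

```python
class Register:
    """A troff-style number register with an auto-increment step."""

    def __init__(self, value=0, step=1):
        self.value, self.step = value, step

    def incr(self):          # like \n+x
        self.value += self.step
        return self.value

    def decr(self):          # like \n-x
        self.value -= self.step
        return self.value


class Numbering:
    """Chapter/section/subsection counters; a new heading resets lower levels."""

    def __init__(self, depth=3):
        self.regs = [Register() for _ in range(depth)]

    def start(self, level):
        """Start a heading at `level` (0 = chapter) and return its number."""
        self.regs[level].incr()
        for reg in self.regs[level + 1:]:
            reg.value = 0    # so the next lower-level heading becomes 1
        return ".".join(str(r.value) for r in self.regs[:level + 1])
```

Starting a chapter, two sections, a subsection, and then a new chapter and section produces 1, 1.1, 1.2, 1.2.1, 2, and 2.1.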

Macros

As you are hopefully beginning to see, troff provides you all the mechanisms you need for complete control over your document's format. Unfortunately, this is often more control than you need. Writing documents using bare troff, while possible, can be quite painful. It is very much like programming in assembly language: you have complete control, and when the result works it works really well, but there is an awful lot of detail to keep track of, and it can be tedious and difficult.

To make it easier for regular users to manage the detail, troff allows you to define macros. A macro is like a subroutine in a programming language. You can group commands together to perform a larger task, and use the macro name, instead of writing out the entire sequence of commands each time.

Consider starting a new paragraph. You have to do the following tasks:

  1. See if there is room left on the page for at least two lines of text.

  2. Skip a space after the last paragraph.

  3. Indent the first line of text by 1/2 an inch.

You could write the commands to do this over and over again. But that is (a) tedious and (b) a hassle to update if you change how you do paragraphing. Instead, you can define a macro, say .P, to do this for you.

        .de P   \" DEfine paragraph macro
        .ne 3   \" NEed room for three lines: the space plus two of text
        .sp     \" SPace down one line
        .ti .5i \" Temporary Indent .5 inches
        ..      \" end the macro definition

Then, in your text, you just put .P on a line by itself wherever you want a paragraph.

        .ds LX the Linux Operating System
        .ds LJ \fILinux Journal\fP
        If you are new to \*(LX, then you should subscribe to
        \*(LJ. It covers \*(LX in great detail,
        month after month.
        .P
        And even if you are an experienced user of \*(LX,
        \*(LJ will bring you valuable tips and tricks to keep
        your Linux system up and running.
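Macro definition and use can be modeled as a table of named bodies. The following Python sketch, with hypothetical function names, collects .de ... .. definitions, strips \" comments, and replaces each call line with the stored body. It deliberately ignores macro arguments, nested calls, and troff's copy mode.

```python
def strip_comment(line):
    """Drop a troff \\" comment and any blanks before it."""
    idx = line.find('\\"')
    return (line[:idx] if idx >= 0 else line).rstrip()


def expand_macros(lines):
    """Collect .de definitions and replace each macro call with its body."""
    macros, output = {}, []
    defining = None
    for raw in lines:
        line = strip_comment(raw)
        if defining is not None:
            if line == "..":
                defining = None              # end of the definition
            else:
                macros[defining].append(line)
        elif line.startswith(".de "):
            defining = line.split()[1]       # macro name follows .de
            macros[defining] = []
        elif line.startswith(".") and line[1:] in macros:
            output.extend(macros[line[1:]])  # substitute the body
        else:
            output.append(line)
    return output
```

Run over the .P definition followed by two paragraphs of text, the sketch emits the first paragraph, then .ne 3, .sp, and .ti .5i, then the second paragraph.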

One of the less attractive features of standard troff is that command, macro, string, and register names are limited to no more than two characters. Fortunately, groff allows longer names, with a different syntax for accessing them. In fact, groff has many nice extensions over troff, making the task of writing macro packages considerably easier. The extensions are documented in the gtroff(1) man page.

Popular Macro Packages

There are a number of popular troff macro packages. Using them is like programming in FORTRAN; it beats the heck out of assembly language, but it's not as nice as C or Modula-3.

The common macro packages are:

  • -ms - Manuscript macros. Originated in V7, popular on Berkeley Unix.

  • -man - Manual Page macros.

  • -mm - Memorandum Macros. Very powerful macros, popular on System V.

  • -me - Berkeley Technical Paper macros. An ugly package.

  • -mdoc - The new Document Macros from Berkeley.

  • -mandoc - A package that figures out dynamically if you want -man or -mdoc.

groff will support these directly if you have them, particularly using the -C compatibility mode option. It also has its own version of many of these packages.

The -ms and -mm packages are the most portable ones to use. -mm has many more features than -ms, which makes it harder to learn. In the long run, though, the effort is worth it, because you can do so much.

Preprocessors For troff

Over the years, it was found that macros helped, but that there were some things that were just too difficult to do in bare troff, even with macro packages. The approach that was developed was to write a “little language” that solved a particular task, and to pre-process the language into raw troff. The common pre-processors are:

  • tbl - formats tables

  • eqn - formats equations

  • pic - formats pictures (diagrams)

  • grap - formats graphs

As an example, here is part of a table from a reference card I worked on:

        .TS
        tab(~);
        lfB l lfB l.
        abs~absolute value~int~integer part
        acos~arc cosine~log~natural logarithm
        asin~arc sine~sin~sine
        atan~arc tangent~sinh~hyperbolic sine
        cos~cosine~sqrt~square root
        cosh~hyperbolic cosine~tan~tangent
        .TE

We'll explain it line by line. First, tbl only looks at lines between .TS (table start) and .TE (table end). Everything else is left alone. This makes it easy to use tbl in a pipeline with the other preprocessors. The first line sets the tab character to ~. Normally, tabs in the input separate each column of the table. For this table, a ~ is used to make it easier to mark off the columns.

Next comes a control line that describes the layout of the data lines. l means left justified, r means right justified, and c means centered. All the columns in this table are left justified. The first and third columns also use a different font, given by f followed by the font name; here, B selects the bold font. The period at the end marks the last control line.

In this example, there is only one control line, so it is applied to all the data lines. For more complicated tables, you have one control line per data line, with the last control line applying to any remaining data lines.
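To show how the options, control line, and data lines fit together, here is a Python sketch that renders the table above as plain text. It is a toy built on several assumptions: it understands only the tab() option, a single control line, the l, r, and c key letters, and one-letter font names after f, and it reports the font of each column instead of actually changing fonts. The function names are hypothetical.

```python
import re


def parse_format(spec):
    """Parse a control line like 'lfB l lfB l.' into (justify, font) pairs."""
    spec = spec.rstrip(".")          # the last control line ends with a period
    columns = []
    for key in spec.split():
        m = re.fullmatch(r"([lrc])(?:f(\w))?", key)
        if m is None:
            raise ValueError(f"unsupported key letter: {key}")
        columns.append((m.group(1), m.group(2) or "R"))
    return columns


def render_table(lines):
    """Render the lines from .TS to .TE as plain text, plus column fonts."""
    tab = "\t"                                # default column separator
    body = lines[1:lines.index(".TE")]        # skip .TS, stop at .TE
    m = re.match(r"tab\((.)\);", body[0])
    if m:
        tab = m.group(1)
        body = body[1:]
    fmt = parse_format(body[0])
    rows = [row.split(tab) for row in body[1:]]
    widths = [max(len(r[i]) for r in rows) for i in range(len(fmt))]
    align = {"l": str.ljust, "r": str.rjust, "c": str.center}
    out = []
    for row in rows:
        cells = [align[j](cell, w) for (j, _), cell, w in zip(fmt, row, widths)]
        out.append("  ".join(cells).rstrip())
    return out, [font for _, font in fmt]
```

Feeding it the first two data lines of the reference-card table gives two aligned rows and the column fonts B, R, B, R.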

The other preprocessors are similar in functionality. grap is actually a preprocessor for pic.

Typically, the commands are used in a pipeline:

        grap doc.tr | pic | tbl | eqn | troff -mm -Tps > doc.ps

The actual usage will vary from machine to machine; we'll see below how to run groff.
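Each preprocessor can sit in one pipeline because it rewrites only its own delimited regions and copies everything else through. This Python sketch models that contract with hypothetical placeholder translations; the delimiters are the usual ones (.PS/.PE for pic, .TS/.TE for tbl, .EQ/.EN for eqn), but the "output" is a troff comment line rather than real troff.

```python
from functools import reduce


def make_preprocessor(start, end, translate):
    """Build a filter that rewrites only the lines between start and end."""
    def run(lines):
        out, region = [], None
        for line in lines:
            if region is None and line.startswith(start):
                region = []                    # entering our region
            elif region is not None and line.startswith(end):
                out.extend(translate(region))  # emit translated region
                region = None
            elif region is not None:
                region.append(line)
            else:
                out.append(line)               # not ours: pass through
        return out
    return run


def placeholder(tool):
    # Hypothetical stand-in for the real translation into raw troff.
    return lambda region: [f'.\\" {tool} output for {len(region)} line(s)']


pic = make_preprocessor(".PS", ".PE", placeholder("pic"))
tbl = make_preprocessor(".TS", ".TE", placeholder("tbl"))
eqn = make_preprocessor(".EQ", ".EN", placeholder("eqn"))


def pipeline(lines, *stages):
    """Feed the output of each stage into the next, like a shell pipe."""
    return reduce(lambda acc, stage: stage(acc), stages, lines)
```

Running a document containing one table and one equation through pipeline(doc, pic, tbl, eqn) leaves the ordinary text untouched and replaces each region with its tool's placeholder.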

Output Devices

nroff was originally designed for terminals and line printers, which are devices with fixed-width characters. troff was designed for the Wang CAT photo-typesetter, which could have up to four different fonts available, in many different point sizes.

Around 1980, Brian Kernighan revamped troff to create ditroff, the device-independent troff. This version accepted an enhanced language, and generated ASCII output that described the motions the output device should make around the page, the size and placement of characters, and so on. Then, to add a new output device (laser printer, photo-typesetter, or whatever), you would write a post-processor for the device-independent output that would correctly drive your device. Recent versions of troff are the device-independent version, usually with support for PostScript(TM) output.
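A post-processor is essentially an interpreter for that intermediate language. The following Python sketch handles a small subset of the one-letter commands (p for a new page, s for point size, f for font position, H and V for absolute motion, h and v for relative motion, and c to print one character), one command per line. Real device-independent output packs commands together and has many more of them (line endings, drawing, device controls), and positions are in the device's machine units; the function name is hypothetical, and resetting the position at each new page is a simplification of this sketch.

```python
def interpret(commands):
    """Run a subset of device-independent troff commands, one per line.

    Returns (page, x, y, font, size, char) for every character placed,
    which a real post-processor would turn into operations for its device.
    """
    state = {"page": 0, "x": 0, "y": 0, "font": 1, "size": 10}
    placed = []
    for cmd in commands:
        op, arg = cmd[0], cmd[1:]
        if op == "p":                      # begin page N
            state.update(page=int(arg), x=0, y=0)
        elif op == "s":                    # point size
            state["size"] = int(arg)
        elif op == "f":                    # font position
            state["font"] = int(arg)
        elif op == "H":                    # absolute horizontal position
            state["x"] = int(arg)
        elif op == "V":                    # absolute vertical position
            state["y"] = int(arg)
        elif op == "h":                    # relative horizontal motion
            state["x"] += int(arg)
        elif op == "v":                    # relative vertical motion
            state["y"] += int(arg)
        elif op == "c":
            # Print one character; in this sketch printing does not move
            # the current position, only the motion commands do.
            placed.append((state["page"], state["x"], state["y"],
                           state["font"], state["size"], arg))
        else:
            raise ValueError(f"command not handled by this sketch: {cmd}")
    return placed
```

Adding a new device then means replacing the list of placements with whatever that device needs, which is exactly the division of labor ditroff introduced.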

The ditroff saga can be found in Bell Labs Computing Science Technical Report #97. You can get PostScript for this report via anonymous ftp to netlib.att.com. Change to /netlib/att/cs/cstr, use binary mode, and retrieve 97.ps.Z.

GNU troff

Now that you know what troff is, we'll discuss the specifics of the GNU version, groff. groff is written in C++. This is somewhat unusual; most GNU programs are written in C. To compile it, you need a C++ compiler. The GNU C++ compiler, g++, will usually compile it without much trouble. You will need both g++ and the GNU C++ library, libg++, to compile the groff suite of programs.

The programs in the suite are:

        gtroff - the actual troff clone
        gtbl - the tbl clone
        geqn - the eqn clone
        gpic - the pic clone
        groff - the driver for the other programs

There is no grap clone. Anyone who wishes to write one should contact gnu@prep.ai.mit.edu.

groff has a large number of extensions over Unix troff. In particular, groff supports long names for commands, strings, and registers, and has many additional commands. It also has a compatibility mode, where all the extensions are turned off. This is occasionally necessary when using macro packages meant for original troff.

The groff pre-processors described above cannot be used with original troff; they take advantage of groff's extensions.

groff uses the ditroff model of post-processors for different devices, with the same intermediate format. By default, groff generates PostScript output. The other most useful output format is plain ASCII. This is, in fact, how nroff is provided: by a shell script that calls groff -Tascii (i.e., the output type [-T] is ASCII). An interesting output type is TeX DVI, which can be used on many older laser printers that do not support PostScript. groff comes with two previewers for X windows, using different density fonts (75 and 100 dots per inch).

groff comes with a number of macro packages. It has its own version of the -man macros. The -mgs package is the GNU version of -ms, and -mgm is the GNU version of -mm. These should be used in preference to the original packages, since they can also take advantage of the groff extensions. The Berkeley -me, -mdoc, and -mandoc packages are themselves freely distributable, and are included with the groff distribution.

What is really nice about groff is that it is like lint for your troff documents. The programs check *everything*. Many things that Unix troff silently ignores, groff will warn you about. Often there are subtle errors in your files, and groff will help you catch the problems. Every once in a while, though, there really is no problem, and you need to use compatibility mode instead.

Unfortunately, the one major gap in the groff distribution is that there is no comprehensive manual. The Tenth Edition Research Unix Programmer's Manual describes troff and its friends in detail. groff is based on this specification. Additional information can be found in the man pages that come with groff.

Information about pic can be obtained via anonymous ftp from the same site and directory mentioned above, in the file 116.ps.Z. A description of grap can be found in 114.ps.Z.

Summary

GNU troff, groff, is a powerful, complete implementation of the troff software suite. If you will be doing anything with troff, it is definitely the version to get. It generates PostScript by default, will find bugs in your documents, and supports all popular macro packages. The source code is available on prep.ai.mit.edu in /pub/gnu, in the file groff-1.09.tar.gz. It should be found on all GNU mirror sites as well.

Editorial

Every once in a while, it is a worthwhile exercise to step back and think about the free software you use with Linux, day in and day out. The Linux kernel is only one part of it. There are literally hundreds of utility programs, the majority of which were produced by Free Software Foundation staff and volunteers. The GNU General Public License, whose terms cover the utilities and the Linux kernel, came from the FSF. Linux is testimony to the idea that freely distributable software can be usable, and of high quality. Linux would never have happened had it *not* been free, and had there not been the GNU utilities to complete the picture.

Free Software Foundation Information

It is only good sportsmanship and fair play to “give something back” to the organization that has done so much for you: the FSF. You can help further the cause of the FSF in a number of ways, both directly and indirectly.

If you are a programmer or a writer, or both, the FSF has software *and* documentation that needs to be written. Serious volunteers are always welcome.

If you want to help support the FSF monetarily, you can do that too. You can buy software and/or documentation from them. The FSF sells tapes and CD-ROMs with their software on them. You probably already have most of the software, but you may wish to have the printed documentation that goes with it. The GNU manuals are nicely printed and bound, and are not that expensive. Buying software and manuals directly contributes to the production of more high-quality free software.

In the U.S., you can make tax-deductible donations to the FSF. It is considered a non-profit organization under U.S. law. This also helps.

Indirectly, you can choose to buy your Linux distributions from resellers who state that they give a percentage to the FSF. If your favorite distributor does not do this, then ask them *why* they don't, and encourage them to do so.

Consider what you can do to help the FSF, and then do it!

Arnold Robbins is a professional programmer and semi-professional author. He has been doing volunteer work for the GNU project since 1987 and working with Unix and Unix-like systems since 1981.
