Book Excerpt: Linux Programming by Example, Part 1

by Arnold Robbins

This article is excerpted from Chapter 7 of my book Linux Programming by Example: The Fundamentals, published by Prentice Hall (ISBN: 0-13-142964-7). Chapters 1 through 6 cover the standard APIs for argument parsing, dynamic memory management, file and directory access, retrieving file information (stat(), lstat()), working with users and groups and also date and time information and sorting.

V7 code uses so called K&R C, where functions are declared without an argument list. For such declarations, we show the corresponding ANSI C style declaration on the right between underscore marks [_Collects needed info_].

Putting It All Together: ls

The V7 ls command nicely ties together everything we've seen so far. It uses almost all of the APIs we've covered, touching on many aspects of Unix programming: memory allocation, file metadata, dates and times, user names, directory reading and sorting.

1. V7 ls Options

In comparison to modern versions of ls, the V7 ls accepted only a handful of options, and the meaning of some of them is different for V7 than for current ls. The options are as follows:

  • -a: Print all directory entries. Without this, don't print . and .. Interestingly enough, V7 ls ignores only . and .. , while V1 through V6 ignore any file whose name begins with a period. This latter behavior is the default in modern versions of ls as well.

  • -c: Use the inode change time, instead of the modification time, with -t or -l.

  • -d: For directory arguments, print information about the directory itself, not its contents.

  • -f: Force each argument to be read as a directory, and print the name found in each slot. This option disables -l, -r, -s and -t and enables -a. (This option apparently existed for filesystem debugging and repair.)

  • -g: For ls -l, use the group name instead of the user name.

  • -i: Print the inode number in the first column along with the filename or the long listing.

  • -l: Provide the familiar long format output. Note, however, that V7 ls -l printed only the user name, not the user and group names together.

  • -r: Reverse the sort order, be it alphabetic for filenames or by time.

  • -s: Print the size of the file in 512-byte blocks. The V7 ls (1) manpage states that "indirect blocks"--blocks used by the filesystem for locating the data blocks of large files--are also included in the computation, but, as we shall see, this statement was incorrect.

  • -t: Sort the output by modification time, most recent first, instead of by name.

  • -u: Use the access time instead of the modification time with -t and/or -l.

The biggest differences between V7 ls and modern ls concern the -a option and the -l option. Modern systems omit all dot files unless -a is given, and they include both user and group names in the -l long listing. On modern systems, -g is taken to mean print only the group name, and -o means print only the user name. For what it's worth, GNU ls has over 50 options!

2. V7 ls Code

The file /usr/src/cmd/ls.c in the V7 distribution contains the code. It is all of 425 lines long.


  1  /*
  2   * list file or directory
  3   */
  4
  5  #include <sys/param.h>
  6  #include <sys/stat.h>
  7  #include <sys/dir.h>
  8  #include <stdio.h>
  9
 10  #define NFILES  1024
 11  FILE    *pwdf, *dirf;
 12  char    stdbuf[BUFSIZ];
 13
 14  struct lbuf {                    _Collects needed info_
 15      union {
 16          char    lname[15];
 17          char    *namep;
 18      } ln;
 19      char    ltype;
 20      short   lnum;
 21      short   lflags;
 22      short   lnl;
 23      short   luid;
 24      short   lgid;
 25      long    lsize;
 26      long    lmtime;
 27  };
 28
 29  int aflg, dflg, lflg, sflg, tflg, uflg, iflg, fflg, gflg, cflg;
 30  int rflg    = 1;
 31  long    year;             _Global variables: auto init to 0_
 32  int flags;
 33  int lastuid = -1;
 34  char    tbuf[16];
 35  long    tblocks;
 36  int statreq;
 37  struct  lbuf    *flist[NFILES];
 38  struct  lbuf    **lastp = flist;
 39  struct  lbuf    **firstp = flist;
 40  char    *dotp   = ".";
 41
 42  char    *makename();      _char *makename(char *dir, char *file);_
 43  struct  lbuf *gstat();    _struct lbuf *gstat(char *file, int argfl);_
 44  char    *ctime();         _char *ctime(time_t *t);_
 45  long    nblock();         _long nblock(long size);_
 46
 47  #define ISARG   0100000

The program starts with file inclusions (lines 5-8) and variable declarations. The struct lbuf (lines 14-27) encapsulates the parts of the struct stat that are of interest to ls. We see later how this structure is filled.

The variables aflg, dflg, and so on (lines 29 and 30) all indicate the presence of the corresponding option. This variable naming style is typical of V7 code. The flist, lastp, and firstp variables (lines 37-39) represent the files that ls reports information about. Note that flist is a fixed-size array, allowing no more than 1024 files to be processed. We see shortly how all these variables are used.

After the variable declarations come function declarations (lines 42-45), and then the definition of ISARG, which distinguishes a file named on the command line from a file found when a directory is read.


 49  main(argc, argv)                        _int main(int argc, char **argv)_
 50  char *argv[];
 51  {
 52      int i;
 53      register struct lbuf *ep, **ep1;   _Variable and function declarations_
 54      register struct lbuf **slastp;
 55      struct lbuf **epp;
 56      struct lbuf lb;
 57      char *t;
 58      int compar();
 59
 60      setbuf(stdout, stdbuf);
 61      time(&lb.lmtime);                   _Get current time_
 62      year = lb.lmtime - 6L*30L*24L*60L*60L; /* 6 months ago */

The main() function starts by declaring variables and functions (lines 52-58), setting the buffer for standard output, retrieving the time of day (lines 60-61), and computing the seconds-since-the-Epoch value for approximately six months ago (line 62). Note that all the constants have the L suffix, indicating the use of long arithmetic.


 63      if (--argc > 0 && *argv[1] == '-') {
 64          argv++;                   _Parse options_
 65          while (*++*argv) switch (**argv) {
 66
 67          case 'a':                 _All directory entries_
 68              aflg++;
 69              continue;
 70
 71          case 's':                 _Size in blocks_
 72              sflg++;
 73              statreq++;
 74              continue;
 75
 76          case 'd':                 _Directory info, not contents_
 77              dflg++;
 78              continue;
 79
 80          case 'g':                 _Group name instead of user name_
 81              gflg++;
 82              continue;
 83
 84          case 'l':                 _Long listing_
 85              lflg++;
 86              statreq++;
 87              continue;
 88
 89          case 'r':                 _Reverse sort order_
 90              rflg = -1;
 91              continue;
 92
 93          case 't':                 _Sort by time, not name_
 94              tflg++;
 95              statreq++;
 96              continue;
 97
 98          case 'u':               _Access time, not modification time_
 99              uflg++;
100              continue;
101
102          case 'c':               _Inode change time, not modification time_
103              cflg++;
104              continue;
105
106          case 'i':               _Include inode number_
107              iflg++;
108              continue;
109
110          case 'f':               _Force reading each arg as directory_
111              fflg++;
112              continue;
113
114          default:                _Ignore unknown option letters_
115              continue;
116          }
117          argc--;
118      }

Lines 63-118 parse the command-line options. Note the manual parsing code getopt() hadn't been invented yet. The statreq variable is set to true when an option requires the use of the stat() system call.

Avoiding an unnecessary stat() call on each file is a big performance win. The stat() call was particularly expensive, because it could involve a disk seek to the inode location, a disk read to read the inode, and then a disk seek back to the location of the directory contents (in order to continue reading directory entries).

Modern systems have the inodes in groups, spread out throughout a filesystem instead of clustered together at the front. This makes a noticeable performance improvement. Nevertheless, stat() calls are still not free; you should use them as needed, but not any more than that.


119      if (fflg) {                  _-f overrides -l, -s, -t, adds -a_
120          aflg++;
121          lflg = 0;
122          sflg = 0;
123          tflg = 0;
124          statreq = 0;
125      }
126      if(lflg) {                      _Open password or group file_
127          t = "/etc/passwd";
128          if(gflg)
129              t = "/etc/group";
130          pwdf = fopen(t, "r");
131      }
132      if (argc==0) {                   _Use current dir if no args_
133          argc++;
134          argv = &dotp - 1;
135      }

Lines 119-125 handle the -f option, turning off -l, -s, -t, and statreq. Lines 126-131 handle -l, setting the file to be read for user or group information. Remember that the V7 ls shows only one or the other, not both.

If no arguments are left, lines 132-135 set up argv such that it points at a string representing the current directory. The assignment argv = &dotp - 1 is valid, although unusual. The - 1 compensates for the ++argv on line 137. This avoids special case code for argc == 1 in the main part of the program.


136      for (i=0; i < argc; i++) {        _Get info about each file_
137          if ((ep = gstat(*++argv, 1))==NULL)
138              continue;
139          ep->ln.namep = *argv;
140          ep->lflags |= ISARG;
141      }
142      qsort(firstp, lastp - firstp, sizeof *lastp, compar);
143      slastp = lastp;
144      for (epp=firstp; epp<slastp; epp++) {    _Main code, see text_
145          ep = *epp;
146          if (ep->ltype=='d' && dflg==0 || fflg) {
147              if (argc>1)
148                  printf("\n%s:\n", ep->ln.namep);
149              lastp = slastp;
150              readdir(ep->ln.namep);
151              if (fflg==0)
152                  qsort(slastp,lastp - slastp,sizeof *lastp,compar);
153              if (lflg || sflg)
154                  printf("total %D\n", tblocks);
155              for (ep1=slastp; ep1<lastp; ep1++)
156                  pentry(*ep1);
157          } else
158              pentry(ep);
159      }
160      exit(0);
161  }                                _End of main()_

Lines 136-141 loop over the arguments, gathering information about each one. The second argument to gstat() is a boolean: true if the name is a command-line argument, false otherwise. Line 140 adds the ISARG flag to the lflags field for each command-line argument.

The gstat() function adds each new struct lbuf into the global flist array (line 137). It also updates the lastp global pointer to point into this array at the current last element.

Lines 142-143 sort the array using qsort() and save the current value of lastp in slastp. Lines 144-159 loop over each element in the array, printing file or directory info as appropriate.

The code for directories deserves further explication:

  • if (ep->ltype=='d' && dflg==0 || fflg) ... Line 146. If the file type is directory and if -d was not provided or if -f was, then ls has to read the directory instead of printing information about the directory itself.

  • if (argc>1) printf("\n%s:\n", ep->ln.namep) Lines 147-148. Print the directory name and a colon if multiple files were named on the command line.

  • lastp = slastp; readdir(ep->ln.namep) Lines 149-150. Reset lastp from slastp. The flist array acts as a two-level stack of filenames. The command-line arguments are kept in firstp through slastp - 1. When readdir() reads a directory, it puts the struct lbuf structures for the directory contents onto the stack, starting at slastp and going through lastp. This is illustrated in Figure 7.1.


           firstp                      slastp             lastp
              |                          |                  |
              V                          V                  V
          +--------+--------+--------+--------+--------+--------+
          | struct | struct | struct | struct | struct | struct |  flist array
          | lbuf * | lbuf * | lbuf * | lbuf * | lbuf * | lbuf * |
          +--------+--------+--------+--------+--------+--------+

          |<--- From command line -->|<---- From readdir() ---->|

Figure 7.1: The flist Array as a Two-Level Stack

  • if (fflg==0) qsort(slastp,lastp - slastp,sizeof *lastp,compar) Lines 151-152. Sort the subdirectory entries if -f is not in effect.

  • if (lflg || sflg) printf("total %D\n", tblocks) Lines 153-154. Print the total number of blocks used by files in the directory, for -l or -s. This total is kept in the variable tblocks, which is reset for each directory. The %D format string for printf() is equivalent to %ld on modern systems; it means print a long integer. (V7 also had %ld, see line 192.)

  • for (ep1=slastp; ep1>lastp; ep1++) pentry(*ep1) Lines 155-156. Print the information about each file in the subdirectory. Note that the V7 ls descends only one level in a directory tree. It lacks the modern -R recursive option.


163  pentry(ap)                                _void pentry(struct lbuf *ap)_
164  struct lbuf *ap;
165  {
166      struct { char dminor, dmajor;};       _Unused historical artifact_
167      register t;                           _from V6 ls_
168      register struct lbuf *p;
169      register char *cp;
170
171      p = ap;
172      if (p->lnum == -1)
173          return;
174      if (iflg)
175          printf("%5u ", p->lnum);          _Inode number_
176      if (sflg)
177      printf("%4D ", nblock(p->lsize));     _Size in blocks_

The pentry() routine prints information about a file. Lines 172-173 check whether the lnum field is 1, and return if so. When p->lnum == -1 is true, the struct lbuf is not valid. Otherwise, this field is the file's inode number.

Lines 174-175 print the inode number if -i is in effect. Lines 176-177 print the total number of blocks if -s is in effect. (As we see below, this number may not be accurate.)


178      if (lflg) {                              _Long listing:_
179          putchar(p->ltype);                     _-- File type_
180          pmode(p->lflags);                      _-- Permissions_
181          printf("%2d ", p->lnl);                _-- Link count_
182          t = p->luid;
183          if(gflg)
184              t = p->lgid;
185          if (getname(t, tbuf)==0)
186              printf("%-6.6s", tbuf);            _-- User or group_
187          else
188              printf("%-6d", t);
189          if (p->ltype=='b' || p->ltype=='c')    _-- Device: major/minor_
190              printf("%3d,%3d", major((int)p->lsize), minor((int)p->lsize));
191          else
192              printf("%7ld", p->lsize);          _-- Size in bytes_
193          cp = ctime(&p->lmtime);
194          if(p->lmtime < year)                   _-- Modification time_
195              printf(" %-7.7s %-4.4s ", cp+4, cp+20); else
196              printf(" %-12.12s ", cp+4);
197      }
198      if (p->lflags&ISARG)                       _-- Filename_
199          printf("%s\n", p->ln.namep);
200      else
201          printf("%.14s\n", p->ln.lname);
202  }

Lines 178-197 handle the -l option. Lines 179-181 print the file's type, permissions, and number of links. Lines 182-184 set t to the user ID or the group ID, based on the -g option. Lines 185-188 retrieve the corresponding name and print it if available. Otherwise, the program prints the numeric value.

Lines 189-192 check whether the file is a block or character device. If it is, they print the major and minor device numbers, extracted with the major() and minor() macros. Otherwise, they print the file's size.

Lines 193-196 print the time of interest. If it's older than six months, the code prints the month, day, and year. Otherwise, it prints the month, day, and time (*note Ctime::, for the format of ctime()'s result).

Finally, lines 198-201 print the filename. For a command-line argument, we know it's a zero-terminated string, and %s can be used. For a file read from a directory, it may not be zero-terminated, and thus an explicit precision, %.14s, must be used.


204  getname(uid, buf)                      _int getname(int uid, char buf[])_
205  int uid;
206  char buf[];
207  {
208      int j, c, n, i;
209
210      if (uid==lastuid)                  _Simple caching, see text_
211          return(0);
212      if(pwdf == NULL)                   _Safety check_
213          return(-1);
214      rewind(pwdf);                      _Start at front of file_
215      lastuid = -1;
216      do {
217          i = 0;                         _Index in buf array_
218          j = 0;                         _Counts fields in line_
219          n = 0;                            _Converts numeric value_
220          while((c=fgetc(pwdf)) != '\n') {  _Read lines_
221              if (c==EOF)
222                  return(-1);
223              if (c==':') {                 _Count fields_
224                  j++;
225                  c = '0';
226              }
227              if (j==0)                     _First field is name_
228                  buf[i++] = c;
229              if (j==2)                     _Third field is numeric ID_
230                  n = n*10 + c - '0';
231          }
232      } while (n != uid);                   _Keep searching until ID found_
233      buf[i++] = '\0';
234      lastuid = uid;
235      return(0);
236  }

The getname() function converts a user or group ID number into the corresponding name. It implements a simple caching scheme; if the passed-in uid is the same as the global variable lastuid, then the function returns 0, for OK; the buffer will already contain the name (lines 210-211). lastuid is initialized to -1 (line 33), so this test fails the first time getname() is called.

pwdf is already open on either /etc/passwd or /etc/group (see lines 126-130). The code here checks that the open succeeded and returns -1 if it didn't (lines 212-213).

Surprisingly, ls does not use getpwuid() or getgrgid(). Instead, it takes advantage of the facts that the format of /etc/passwd and /etc/group is identical for the first three fields (name, password, numeric ID) and that both use a colon as separator.

Lines 216-232 implement a linear search through the file. j counts the number of colons seen so far: 0 for the name and 2 for the ID number. Thus, while scanning the line, it fills in both the name and the ID number.

Lines 233-235 terminate the name buffer, set the global lastuid to the found ID number, and return 0 for OK.

In the second half of this article, we will finish our tour through the V7 ls code file.

Arnold Robbins is a professional programmer and technical author. He is the long-time maintainer of gawk, the GNU Project's version of the Awk programming language. He is the author of numerous well-known technical books, such as Linux Programming by Example: The Fundamentals, recently published by Prentice Hall, as well as Unix In A Nutshell, Learning the vi Editor and Effective awk Programming, published by O'Reilly Media Inc. He is happily married with four wonderful children. Arnold is an amateur Talmudist and particularly enjoys studying the Jerusalem Talmud. He can be reached at arnold@skeeve.com.

Load Disqus comments