Book Excerpt: Linux Programming by Example, Part 1
This article is excerpted from Chapter 7 of my book Linux Programming by Example: The Fundamentals, published by Prentice Hall (ISBN: 0-13-142964-7). Chapters 1 through 6 cover the standard APIs for argument parsing, dynamic memory management, file and directory access, retrieving file information (stat(), lstat()), working with users and groups and also date and time information and sorting.
V7 code uses so called K&R C, where functions are declared without an argument list. For such declarations, we show the corresponding ANSI C style declaration on the right between underscore marks [_Collects needed info_].
The V7 ls command nicely ties together everything we've seen so far. It uses almost all of the APIs we've covered, touching on many aspects of Unix programming: memory allocation, file metadata, dates and times, user names, directory reading and sorting.
1. V7 ls Options
In comparison to modern versions of ls, the V7 ls accepted only a handful of options, and the meaning of some of them is different for V7 than for current ls. The options are as follows:
-a: Print all directory entries. Without this, don't print . and .. Interestingly enough, V7 ls ignores only . and .. , while V1 through V6 ignore any file whose name begins with a period. This latter behavior is the default in modern versions of ls as well.
-c: Use the inode change time, instead of the modification time, with -t or -l.
-d: For directory arguments, print information about the directory itself, not its contents.
-f: Force each argument to be read as a directory, and print the name found in each slot. This option disables -l, -r, -s and -t and enables -a. (This option apparently existed for filesystem debugging and repair.)
-g: For ls -l, use the group name instead of the user name.
-i: Print the inode number in the first column along with the filename or the long listing.
-l: Provide the familiar long format output. Note, however, that V7 ls -l printed only the user name, not the user and group names together.
-r: Reverse the sort order, be it alphabetic for filenames or by time.
-s: Print the size of the file in 512-byte blocks. The V7 ls (1) manpage states that "indirect blocks"--blocks used by the filesystem for locating the data blocks of large files--are also included in the computation, but, as we shall see, this statement was incorrect.
-t: Sort the output by modification time, most recent first, instead of by name.
-u: Use the access time instead of the modification time with -t and/or -l.
The biggest differences between V7 ls and modern ls concern the -a option and the -l option. Modern systems omit all dot files unless -a is given, and they include both user and group names in the -l long listing. On modern systems, -g is taken to mean print only the group name, and -o means print only the user name. For what it's worth, GNU ls has over 50 options!
2. V7 ls Code
The file /usr/src/cmd/ls.c in the V7 distribution contains the code. It is all of 425 lines long.
1 /* 2 * list file or directory 3 */ 4 5 #include <sys/param.h> 6 #include <sys/stat.h> 7 #include <sys/dir.h> 8 #include <stdio.h> 9 10 #define NFILES 1024 11 FILE *pwdf, *dirf; 12 char stdbuf[BUFSIZ]; 13 14 struct lbuf { _Collects needed info_ 15 union { 16 char lname[15]; 17 char *namep; 18 } ln; 19 char ltype; 20 short lnum; 21 short lflags; 22 short lnl; 23 short luid; 24 short lgid; 25 long lsize; 26 long lmtime; 27 }; 28 29 int aflg, dflg, lflg, sflg, tflg, uflg, iflg, fflg, gflg, cflg; 30 int rflg = 1; 31 long year; _Global variables: auto init to 0_ 32 int flags; 33 int lastuid = -1; 34 char tbuf[16]; 35 long tblocks; 36 int statreq; 37 struct lbuf *flist[NFILES]; 38 struct lbuf **lastp = flist; 39 struct lbuf **firstp = flist; 40 char *dotp = "."; 41 42 char *makename(); _char *makename(char *dir, char *file);_ 43 struct lbuf *gstat(); _struct lbuf *gstat(char *file, int argfl);_ 44 char *ctime(); _char *ctime(time_t *t);_ 45 long nblock(); _long nblock(long size);_ 46 47 #define ISARG 0100000
The program starts with file inclusions (lines 5-8) and variable declarations. The struct lbuf (lines 14-27) encapsulates the parts of the struct stat that are of interest to ls. We see later how this structure is filled.
The variables aflg, dflg, and so on (lines 29 and 30) all indicate the presence of the corresponding option. This variable naming style is typical of V7 code. The flist, lastp, and firstp variables (lines 37-39) represent the files that ls reports information about. Note that flist is a fixed-size array, allowing no more than 1024 files to be processed. We see shortly how all these variables are used.
After the variable declarations come function declarations (lines 42-45), and then the definition of ISARG, which distinguishes a file named on the command line from a file found when a directory is read.
49 main(argc, argv) _int main(int argc, char **argv)_ 50 char *argv[]; 51 { 52 int i; 53 register struct lbuf *ep, **ep1; _Variable and function declarations_ 54 register struct lbuf **slastp; 55 struct lbuf **epp; 56 struct lbuf lb; 57 char *t; 58 int compar(); 59 60 setbuf(stdout, stdbuf); 61 time(&lb.lmtime); _Get current time_ 62 year = lb.lmtime - 6L*30L*24L*60L*60L; /* 6 months ago */
The main() function starts by declaring variables and functions (lines 52-58), setting the buffer for standard output, retrieving the time of day (lines 60-61), and computing the seconds-since-the-Epoch value for approximately six months ago (line 62). Note that all the constants have the L suffix, indicating the use of long arithmetic.
63 if (--argc > 0 && *argv[1] == '-') { 64 argv++; _Parse options_ 65 while (*++*argv) switch (**argv) { 66 67 case 'a': _All directory entries_ 68 aflg++; 69 continue; 70 71 case 's': _Size in blocks_ 72 sflg++; 73 statreq++; 74 continue; 75 76 case 'd': _Directory info, not contents_ 77 dflg++; 78 continue; 79 80 case 'g': _Group name instead of user name_ 81 gflg++; 82 continue; 83 84 case 'l': _Long listing_ 85 lflg++; 86 statreq++; 87 continue; 88 89 case 'r': _Reverse sort order_ 90 rflg = -1; 91 continue; 92 93 case 't': _Sort by time, not name_ 94 tflg++; 95 statreq++; 96 continue; 97 98 case 'u': _Access time, not modification time_ 99 uflg++; 100 continue; 101 102 case 'c': _Inode change time, not modification time_ 103 cflg++; 104 continue; 105 106 case 'i': _Include inode number_ 107 iflg++; 108 continue; 109 110 case 'f': _Force reading each arg as directory_ 111 fflg++; 112 continue; 113 114 default: _Ignore unknown option letters_ 115 continue; 116 } 117 argc--; 118 }
Lines 63-118 parse the command-line options. Note the manual parsing code getopt() hadn't been invented yet. The statreq variable is set to true when an option requires the use of the stat() system call.
Avoiding an unnecessary stat() call on each file is a big performance win. The stat() call was particularly expensive, because it could involve a disk seek to the inode location, a disk read to read the inode, and then a disk seek back to the location of the directory contents (in order to continue reading directory entries).
Modern systems have the inodes in groups, spread out throughout a filesystem instead of clustered together at the front. This makes a noticeable performance improvement. Nevertheless, stat() calls are still not free; you should use them as needed, but not any more than that.
119 if (fflg) { _-f overrides -l, -s, -t, adds -a_ 120 aflg++; 121 lflg = 0; 122 sflg = 0; 123 tflg = 0; 124 statreq = 0; 125 } 126 if(lflg) { _Open password or group file_ 127 t = "/etc/passwd"; 128 if(gflg) 129 t = "/etc/group"; 130 pwdf = fopen(t, "r"); 131 } 132 if (argc==0) { _Use current dir if no args_ 133 argc++; 134 argv = &dotp - 1; 135 }
Lines 119-125 handle the -f option, turning off -l, -s, -t, and statreq. Lines 126-131 handle -l, setting the file to be read for user or group information. Remember that the V7 ls shows only one or the other, not both.
If no arguments are left, lines 132-135 set up argv such that it points at a string representing the current directory. The assignment argv = &dotp - 1 is valid, although unusual. The - 1 compensates for the ++argv on line 137. This avoids special case code for argc == 1 in the main part of the program.
136 for (i=0; i < argc; i++) { _Get info about each file_ 137 if ((ep = gstat(*++argv, 1))==NULL) 138 continue; 139 ep->ln.namep = *argv; 140 ep->lflags |= ISARG; 141 } 142 qsort(firstp, lastp - firstp, sizeof *lastp, compar); 143 slastp = lastp; 144 for (epp=firstp; epp<slastp; epp++) { _Main code, see text_ 145 ep = *epp; 146 if (ep->ltype=='d' && dflg==0 || fflg) { 147 if (argc>1) 148 printf("\n%s:\n", ep->ln.namep); 149 lastp = slastp; 150 readdir(ep->ln.namep); 151 if (fflg==0) 152 qsort(slastp,lastp - slastp,sizeof *lastp,compar); 153 if (lflg || sflg) 154 printf("total %D\n", tblocks); 155 for (ep1=slastp; ep1<lastp; ep1++) 156 pentry(*ep1); 157 } else 158 pentry(ep); 159 } 160 exit(0); 161 } _End of main()_
Lines 136-141 loop over the arguments, gathering information about each one. The second argument to gstat() is a boolean: true if the name is a command-line argument, false otherwise. Line 140 adds the ISARG flag to the lflags field for each command-line argument.
The gstat() function adds each new struct lbuf into the global flist array (line 137). It also updates the lastp global pointer to point into this array at the current last element.
Lines 142-143 sort the array using qsort() and save the current value of lastp in slastp. Lines 144-159 loop over each element in the array, printing file or directory info as appropriate.
The code for directories deserves further explication:
if (ep->ltype=='d' && dflg==0 || fflg) ... Line 146. If the file type is directory and if -d was not provided or if -f was, then ls has to read the directory instead of printing information about the directory itself.
if (argc>1) printf("\n%s:\n", ep->ln.namep) Lines 147-148. Print the directory name and a colon if multiple files were named on the command line.
lastp = slastp; readdir(ep->ln.namep) Lines 149-150. Reset lastp from slastp. The flist array acts as a two-level stack of filenames. The command-line arguments are kept in firstp through slastp - 1. When readdir() reads a directory, it puts the struct lbuf structures for the directory contents onto the stack, starting at slastp and going through lastp. This is illustrated in Figure 7.1.
firstp slastp lastp | | | V V V +--------+--------+--------+--------+--------+--------+ | struct | struct | struct | struct | struct | struct | flist array | lbuf * | lbuf * | lbuf * | lbuf * | lbuf * | lbuf * | +--------+--------+--------+--------+--------+--------+ |<--- From command line -->|<---- From readdir() ---->|
Figure 7.1: The flist Array as a Two-Level Stack
if (fflg==0) qsort(slastp,lastp - slastp,sizeof *lastp,compar) Lines 151-152. Sort the subdirectory entries if -f is not in effect.
if (lflg || sflg) printf("total %D\n", tblocks) Lines 153-154. Print the total number of blocks used by files in the directory, for -l or -s. This total is kept in the variable tblocks, which is reset for each directory. The %D format string for printf() is equivalent to %ld on modern systems; it means print a long integer. (V7 also had %ld, see line 192.)
for (ep1=slastp; ep1>lastp; ep1++) pentry(*ep1) Lines 155-156. Print the information about each file in the subdirectory. Note that the V7 ls descends only one level in a directory tree. It lacks the modern -R recursive option.
163 pentry(ap) _void pentry(struct lbuf *ap)_ 164 struct lbuf *ap; 165 { 166 struct { char dminor, dmajor;}; _Unused historical artifact_ 167 register t; _from V6 ls_ 168 register struct lbuf *p; 169 register char *cp; 170 171 p = ap; 172 if (p->lnum == -1) 173 return; 174 if (iflg) 175 printf("%5u ", p->lnum); _Inode number_ 176 if (sflg) 177 printf("%4D ", nblock(p->lsize)); _Size in blocks_
The pentry() routine prints information about a file. Lines 172-173 check whether the lnum field is 1, and return if so. When p->lnum == -1 is true, the struct lbuf is not valid. Otherwise, this field is the file's inode number.
Lines 174-175 print the inode number if -i is in effect. Lines 176-177 print the total number of blocks if -s is in effect. (As we see below, this number may not be accurate.)
178 if (lflg) { _Long listing:_ 179 putchar(p->ltype); _-- File type_ 180 pmode(p->lflags); _-- Permissions_ 181 printf("%2d ", p->lnl); _-- Link count_ 182 t = p->luid; 183 if(gflg) 184 t = p->lgid; 185 if (getname(t, tbuf)==0) 186 printf("%-6.6s", tbuf); _-- User or group_ 187 else 188 printf("%-6d", t); 189 if (p->ltype=='b' || p->ltype=='c') _-- Device: major/minor_ 190 printf("%3d,%3d", major((int)p->lsize), minor((int)p->lsize)); 191 else 192 printf("%7ld", p->lsize); _-- Size in bytes_ 193 cp = ctime(&p->lmtime); 194 if(p->lmtime < year) _-- Modification time_ 195 printf(" %-7.7s %-4.4s ", cp+4, cp+20); else 196 printf(" %-12.12s ", cp+4); 197 } 198 if (p->lflags&ISARG) _-- Filename_ 199 printf("%s\n", p->ln.namep); 200 else 201 printf("%.14s\n", p->ln.lname); 202 }
Lines 178-197 handle the -l option. Lines 179-181 print the file's type, permissions, and number of links. Lines 182-184 set t to the user ID or the group ID, based on the -g option. Lines 185-188 retrieve the corresponding name and print it if available. Otherwise, the program prints the numeric value.
Lines 189-192 check whether the file is a block or character device. If it is, they print the major and minor device numbers, extracted with the major() and minor() macros. Otherwise, they print the file's size.
Lines 193-196 print the time of interest. If it's older than six months, the code prints the month, day, and year. Otherwise, it prints the month, day, and time (*note Ctime::, for the format of ctime()'s result).
Finally, lines 198-201 print the filename. For a command-line argument, we know it's a zero-terminated string, and %s can be used. For a file read from a directory, it may not be zero-terminated, and thus an explicit precision, %.14s, must be used.
204 getname(uid, buf) _int getname(int uid, char buf[])_ 205 int uid; 206 char buf[]; 207 { 208 int j, c, n, i; 209 210 if (uid==lastuid) _Simple caching, see text_ 211 return(0); 212 if(pwdf == NULL) _Safety check_ 213 return(-1); 214 rewind(pwdf); _Start at front of file_ 215 lastuid = -1; 216 do { 217 i = 0; _Index in buf array_ 218 j = 0; _Counts fields in line_ 219 n = 0; _Converts numeric value_ 220 while((c=fgetc(pwdf)) != '\n') { _Read lines_ 221 if (c==EOF) 222 return(-1); 223 if (c==':') { _Count fields_ 224 j++; 225 c = '0'; 226 } 227 if (j==0) _First field is name_ 228 buf[i++] = c; 229 if (j==2) _Third field is numeric ID_ 230 n = n*10 + c - '0'; 231 } 232 } while (n != uid); _Keep searching until ID found_ 233 buf[i++] = '\0'; 234 lastuid = uid; 235 return(0); 236 }
The getname() function converts a user or group ID number into the corresponding name. It implements a simple caching scheme; if the passed-in uid is the same as the global variable lastuid, then the function returns 0, for OK; the buffer will already contain the name (lines 210-211). lastuid is initialized to -1 (line 33), so this test fails the first time getname() is called.
pwdf is already open on either /etc/passwd or /etc/group (see lines 126-130). The code here checks that the open succeeded and returns -1 if it didn't (lines 212-213).
Surprisingly, ls does not use getpwuid() or getgrgid(). Instead, it takes advantage of the facts that the format of /etc/passwd and /etc/group is identical for the first three fields (name, password, numeric ID) and that both use a colon as separator.
Lines 216-232 implement a linear search through the file. j counts the number of colons seen so far: 0 for the name and 2 for the ID number. Thus, while scanning the line, it fills in both the name and the ID number.
Lines 233-235 terminate the name buffer, set the global lastuid to the found ID number, and return 0 for OK.
In the second half of this article, we will finish our tour through the V7 ls code file.
Arnold Robbins is a professional programmer and technical author. He is the long-time maintainer of gawk, the GNU Project's version of the Awk programming language. He is the author of numerous well-known technical books, such as Linux Programming by Example: The Fundamentals, recently published by Prentice Hall, as well as Unix In A Nutshell, Learning the vi Editor and Effective awk Programming, published by O'Reilly Media Inc. He is happily married with four wonderful children. Arnold is an amateur Talmudist and particularly enjoys studying the Jerusalem Talmud. He can be reached at arnold@skeeve.com.