The Linux /proc Filesystem as a Programmers' Tool
One of the best investments in time that a person with a keen interest in operating systems can make is to explore them from multiple angles. Operating system installation and configuration can go a long way toward solidifying the concepts that one reads about in system administration and networking texts. Electronic references in the form of system manual (man) pages and the Linux info(1) facility are available for instant consultation. groups.google.com, with its archive of 20-plus years of Usenet postings, can provide expert guidance on issues ranging from DNS configuration to memory interleaving. However, I feel one other approach stands apart when it comes to exploring operating systems--programming them.
My entry into systems programming was guided by my desire to understand further the operating systems I was working with daily as a contract UNIX and, later, Linux system administrator. The result of this was ifchk, a packet sniffer detector I wrote in C and released in June of 2003. ifchk initially was written under IRIX and then ported to Linux, mostly under the 2.4 kernel. The current ifchk revision, beta 4, recently was released and beta 5 is on the way.
My work on ifchk has allowed me to examine programmatically several areas of operating system functionality. Examples include the Linux netlink(7) and rtnetlink(7) facilities, device control--that is, network interfaces--via ioctl(2), signals and proc, the process filesystem. Proc and its ability to display a wide array of data concerning the runtime state of a system are the focus of our discussion here.
Before we begin to talk about the proc filesystem as a programming facility, we need to establish what it actually is. The proc filesystem is a pseudo-filesystem rooted at /proc that contains user-accessible objects that pertain to the runtime state of the kernel and, by extension, the executing processes that run on top of it. "Pseudo" is used because the proc filesystem exists only as a reflection of the in-memory kernel data structures it displays. This is why most files and directories within /proc are 0 bytes in size.
Broadly speaking, a directory listing of /proc reveals two main file groups. Each numerically named directory within /proc corresponds to the process ID (PID) of a process currently executing on the system. The following line of ls output illustrates this:
dr-xr-xr-x 3 noorg noorg 0 Apr 16 23:24 19636
Directory 19636 corresponds to PID 19636, a current bash shell session. These per-process directories contain both subdirectories and regular files that further elaborate on the runtime attributes of a given process. The proc(5) manual page discusses these process attributes at length.
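The numeric-name convention is easy to test for in code. The following is a minimal sketch, not taken from ifchk; the helper names is_pid_name() and count_process_dirs() are hypothetical. It walks a directory with opendir(3)/readdir(3) and counts the entries whose names consist only of digits:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if name consists only of decimal digits (a PID-style name). */
static int is_pid_name(const char *name)
{
    if (name == NULL || *name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/* Count the numerically named entries under procRoot; -1 on error. */
static int count_process_dirs(const char *procRoot)
{
    DIR *dir = opendir(procRoot);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    while ((entry = readdir(dir)) != NULL) {
        if (is_pid_name(entry->d_name))
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_process_dirs( "/proc" ) on a running system should return roughly the number of processes visible to the caller at that instant; the figure changes as processes come and go.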
The second file group within /proc is the non-numerically named directories and regular files that describe some aspect of kernel operation. As an example, the file /proc/version contains revision information relevant to the running kernel image.
Proc files are either read-only or read-write. The /proc/version file above is an example of a read-only file. Its contents are viewable by way of cat(1), and they remain static while the system is powered up and accessible to users. Read-write files, however, allow for both the display and modification of the runtime state of the kernel. /proc/sys/net/ipv4/ip_forward is one such file. Using cat(1) on this file reveals if the system is forwarding IP datagrams between network interfaces--the file contains a 1--or not--the file contains a 0. In echo(1)ing 1 or 0 to this file, that is, writing to the file, we can enable or disable the kernel's ability to forward packets without having to build and boot a new kernel image. This works for many other proc files with read-write permissions.
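The same read and write operations can be performed from C with ordinary stdio calls. This is a rough sketch rather than anything from ifchk; read_proc_int() and write_proc_int() are hypothetical helper names, and the sketch assumes the file holds a single decimal integer, as the forwarding flag does:

```c
#include <stdio.h>

/* Read a single integer from a proc-style file. Returns 0 on success. */
static int read_proc_int(const char *path, int *value)
{
    FILE *file = fopen(path, "r");
    int ok;

    if (file == NULL) {
        perror("fopen");
        return -1;
    }
    ok = (fscanf(file, "%d", value) == 1);
    if (fclose(file) != 0 || !ok)
        return -1;
    return 0;
}

/* Write a single integer to a proc-style file. Returns 0 on success. */
static int write_proc_int(const char *path, int value)
{
    FILE *file = fopen(path, "w");
    int ok;

    if (file == NULL) {
        perror("fopen");
        return -1;
    }
    ok = (fprintf(file, "%d\n", value) > 0);
    /* Buffered data reaches the kernel at flush time, so check fclose(). */
    if (fclose(file) != 0 || !ok)
        return -1;
    return 0;
}
```

For instance, read_proc_int( "/proc/sys/net/ipv4/ip_forward", &value ) plays the role of cat(1) above. The corresponding write_proc_int() call typically succeeds only when run as root.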
Readers are invited to explore the /proc directory of an available system and consult the proc(5) manual page to solidify their understanding of the above discussion.
With the above foundation in place, we now turn our attention to using the contents of /proc to examine programmatically an aspect of the running kernel. A word of note: although the sample file ifmetric.txt discussed below contains all of the code we are referencing, additional understanding can be gleaned by downloading and building ifchk and by walking through its code, via gdb(1).
One area of ifchk functionality deals with the display of counters that describe transmitted and received packets across all network interfaces attached to the system. ifchk does this by parsing and processing /proc/net/dev. Although the file contains all manner of statistics concerning network interfaces, we are interested mainly in the extraction of packet count data.
ifMetric() is the function that implements all of the above functionality. It can be found in the file, ifmetric.txt, that is excerpted below. ifMetric() also can be found within the ifchk distribution in ~/ifchk-0.95b4/linux.c.
The ifMetric() function is defined as follows:
int ifMetric( struct ifList *list );
*list is a pointer to the head of a linked list of structs of type ifList. Each node describes characteristics of an interface present on the system. ifList is defined in ~/ifchk-0.95b4/linux.h as such:
struct ifList
{
    int flags;              /* Interface flags. */
    char data[DATASZ];      /* Interface name/unit number, e.g., "eth0". */
    struct ifList *next;    /* Pointer to next struct ifList. */
};
In getting started, our first inclination might be to open the /proc/net/dev file and begin searching for the relevant packet count data. There is, however, some file reconnaissance work to be done in between these two steps. Program failure due to maliciously crafted input is a staple of security advisories these days. We cannot make any assumptions about external program input, regardless of its source. As we will see, there is another security-related reason why we sequence the above file access operations as we do. With all of the above in mind, let's get to work.
540  int ifMetric( struct ifList *list )
541  {
542      struct stat fileAttrs;       /* /proc/net/dev file attributes. */
543      struct stat linkAttrs;       /* /proc/net/dev link attributes. */
544      char *filePath = NULL;       /* Path to /proc/net/dev file. */
545      FILE *file = NULL;           /* File pointer for fopen(). */
546      int status = 0;              /* /proc/net/dev file close() status. */
547      char buf[1024] = "";         /* Character buffer. */
548      struct ifList *cur = NULL;   /* Current node in linked list. */
549      int conv = 0;                /* sscanf() string match count. */
550      long long rxPackets = 0;     /* Received packet count. */
551      long long txPackets = 0;     /* Transmitted packet count. */
552      char *delim = NULL;          /* Ptr to matched character in string. */
553      unsigned int ifIndex = 0;    /* Interface index. */
554      mode_t mode = 0;             /* /proc/net/dev file permissions. */

558      cur = list;
559      if( cur -> data == NULL )
560      {
561          fprintf( stderr, "ifchk: ERROR: interface list is empty\n" );
562          return (-1);
563      }
564      filePath = "/proc/net/dev";
565      if( ( lstat( filePath, &linkAttrs ) ) != 0 )
566      {
567          perror( "lstat" );
568          return (-1);
569      }
570      if( S_ISLNK( linkAttrs.st_mode ) )
571      {
572          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is a symbolic link\n" );
573          return (-1);
574      }
Before we open /proc/net/dev, we must be sure that the file is not actually a symbolic link (symlink). If passed a symlink, which our proc file shouldn't be, fopen(3) would follow that link to what it points to. Our subsequent fstat(2) call, using the file descriptor that fileno(3) extracts from the stream returned by fopen(3), would return data on the wrong file. To protect against this, we lstat(2) /proc/net/dev and then check if it is a symlink, by using the S_ISLNK POSIX macro. If S_ISLNK finds a symlink, we print an error message with fprintf(3) and return -1, signifying failure. Otherwise, we continue on.
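Pulled out of ifMetric(), the symlink test fits in a few lines. The helper below is a hypothetical sketch (is_symlink() is not an ifchk function) showing why lstat(2) rather than stat(2) is the call to use: lstat(2) reports on the link itself, whereas stat(2) would report on the link's target.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if path is a symlink, 0 if it is not, -1 if lstat(2) fails. */
static int is_symlink(const char *path)
{
    struct stat linkAttrs;

    /* lstat(2) does not follow the final path component. */
    if (lstat(path, &linkAttrs) != 0) {
        perror("lstat");
        return -1;
    }
    return S_ISLNK(linkAttrs.st_mode) ? 1 : 0;
}
```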
578      if( ( file = fopen( "/proc/net/dev", "r" ) ) == NULL )
579      {
580          perror( "fopen" );
581          return (-1);
582      }
Presuming that S_ISLNK didn't find a symlink, we next open the /proc/net/dev file by calling fopen(3). If our call succeeds, fopen(3) returns a FILE pointer for use with future operations on /proc/net/dev. If our call fails, we call perror(3) and return -1, signifying failure. Notice that we check the return value of the fopen(3) call. Checking function return values is critical, to say the least.
586      if( ( fstat( fileno( file ), &fileAttrs ) ) != 0 )
587      {
588          perror( "fstat" );
589          return (-1);
590      }
One of the first things I think about when writing code is how it might fail. Additionally, I also think about how I can minimize the impact of failure by failing gracefully through programming defensively. Doing this is crucial, I feel, especially in light of the heightened state of concern regarding system security and the potential impact of security compromises.
Given this, we must continue to probe /proc/net/dev, via fstat(2), to see if the file exhibits what constitutes plausible attributes for a read-only proc file. These attributes include its file type outside of being a symlink, size in bytes and disk blocks, user and group ownership status and permissions. The following attribute criteria are based on what I have seen for /proc/net/dev on the bulk of Linux systems I have worked on.
file type:        regular
size in bytes:    0
size in blocks:   0
file ownership:   root:root
file permissions: 0444 (-r--r--r--)
From the perspective of a default ifchk build, our /proc/net/dev file must conform to the above attribute criteria. However, a simple modification to the ifchk code, if required, can accommodate site policies and so on that differ from the above.
If successful, fstat(2) fills in fileAttrs with /proc/net/dev attributes. In the event of failure, we relay a descriptive error message to the user via a call to perror(3). We then return -1, signifying failure. Examining fileAttrs in gdb(1), we see that fstat(2) has the following to say about /proc/net/dev (some struct members were removed from GDB output, as they did not add to the discussion):
(gdb) print fileAttrs
$1 = {..., st_mode = 33060, st_uid = 0, st_gid = 0, st_size = 0, st_blocks = 0, ...}
In the discussion above, I alluded to an additional security-related reason behind the sequencing of our /proc/net/dev access procedures. To recap, our handling of /proc/net/dev--outside of lstat(2) and our not wanting to follow symlinks--currently consists of calling fopen(3) followed by fstat(2). We just as easily could have achieved the same result via a call to stat(2), not fstat(2), and then fopen(3), as stat(2) returns the same file attribute data as fstat(2) does. So, why use the former sequence--fopen(3), fstat(2)--instead of the latter one--stat(2), fopen(3)? Because we wish to avoid a race condition. The stat(2), fopen(3) sequence creates the possibility that the file we referenced during the stat(2) call could have been substituted with another one with possibly different attributes or even different contents once we got to the fopen(3) call. In this situation, we'd think we were calling fopen(3) on the same file we just called stat(2) on but, alas, not. The danger of this, I think, is obvious.
591      if( ( ( linkAttrs.st_ino ) != ( fileAttrs.st_ino ) ) ||
592          ( ( linkAttrs.st_dev ) != ( fileAttrs.st_dev ) ) )
593      {
594          fprintf( stderr, "ifchk: ERROR: /proc/net/dev file attribute inconsistency\n" );
595          return (-1);
596      }
As an added measure in checking that we are working with the proc file we think we are, we compare the inode numbers, st_ino, and resident filesystem, st_dev, that both lstat(2) and fstat(2) reported in their /proc/net/dev attribute checks. If all is well, the value of linkAttrs.st_ino should equal fileAttrs.st_ino and the value of linkAttrs.st_dev should equal the value of fileAttrs.st_dev. If we're okay here, we continue on. If not, we report an attribute consistency error via fprintf(3) and return -1, signifying failure.
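The whole access sequence (lstat(2), fopen(3), fstat(2) on the open descriptor, then the inode and device comparison) can be condensed into one function. This is a sketch under my own assumptions, not the ifchk code; same_file() and open_verified() are hypothetical names. Unlike the excerpt, it closes the stream on its failure paths:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if both stat results name the same inode on the same device. */
static int same_file(const struct stat *a, const struct stat *b)
{
    return a->st_ino == b->st_ino && a->st_dev == b->st_dev;
}

/*
 * Open path for reading only if it is not a symlink and the object we
 * opened is the one lstat(2) saw. On success, fileAttrs holds the
 * attributes of the open file. Returns NULL on any failure.
 */
static FILE *open_verified(const char *path, struct stat *fileAttrs)
{
    struct stat linkAttrs;
    FILE *file;

    if (lstat(path, &linkAttrs) != 0 || S_ISLNK(linkAttrs.st_mode))
        return NULL;
    if ((file = fopen(path, "r")) == NULL)
        return NULL;
    /* fstat(2) inspects the open descriptor, not the path name, so a
     * file swapped in after fopen(3) cannot influence the result. */
    if (fstat(fileno(file), fileAttrs) != 0 ||
        !same_file(&linkAttrs, fileAttrs)) {
        fclose(file);
        return NULL;
    }
    return file;
}
```

If the path were replaced between the lstat(2) and fopen(3) calls, the two stat structures would most likely disagree on st_ino or st_dev, and the function would refuse the file.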
600      if( ! ( S_ISREG( fileAttrs.st_mode ) ) )
601      {
602          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is not a regular file\n" );
603          return (-1);
604      }
S_ISREG is a POSIX macro that checks to see if its argument is a regular file and not, say, a directory. If it is, we continue on. If not, we print an error message via fprintf(3), and return -1, signifying failure. At this point, you might be asking why we need the lstat(2)/S_ISLNK symlink test above if we're testing for file type here. Referring back to the symlink test above should help to answer that question.
608      if( ( ( fileAttrs.st_size ) || ( fileAttrs.st_blocks ) ) != 0 )
609      {
610          fprintf( stderr, "ifchk: ERROR: /proc/net/dev file size is greater than 0\n" );
611          return (-1);
612      }
Is /proc/net/dev 0 bytes in length and does it occupy 0 disk blocks? If we see zero for both byte count and disk blocks, we continue on. If not, we print an error message via fprintf(3) and return -1, signifying failure. Notice that only one of the two file tests has to fail for the program to fail.
616      if( ( ( fileAttrs.st_uid ) || ( fileAttrs.st_gid ) ) != 0 )
617      {
618          fprintf( stderr, "ifchk: ERROR: /proc/net/dev is not owned by UID 0, GID 0\n" );
619          return (-1);
620      }
Is /proc/net/dev owned by user root and group root? Here again, only one of the two tests has to fail for the program to fail and return -1, signifying failure.
624      if( ( mode = fileAttrs.st_mode & ALLPERMS ) != MODEMASK )
625      {
626          fprintf( stderr, "ifchk: ERROR: /proc/net/dev permissions are not mode 0444\n" );
627          return (-1);
628      }
Is /proc/net/dev mode 0444--read-only user, group and other? ALLPERMS, defined in the /usr/include/sys/stat.h system header file, is a mask that defines all file permissions, or mode 07777. MODEMASK, defined within ~/ifchk-0.95b4/linux.h, is a mask that defines user, group and other read permissions, or mode 0444.
By bitwise ANDing fileAttrs.st_mode with ALLPERMS and then comparing that result to MODEMASK, we can see if /proc/net/dev is mode 0444. If it is, we continue execution. If not, we print an error message via fprintf(3) and return -1, signifying failure. This concludes our /proc/net/dev file attribute tests. However, before ifchk can work with the file, we must examine its internal contents.
Testing the internal content structure or format of /proc/net/dev required that I define criteria under which ifchk would accept or reject the proc file. As Linux has progressed, the number of fields--bytes, packets, errs--in /proc/net/dev has changed. All of the /proc/net/dev files I have seen share an identical content structure to the file below; output to far right was truncated, due to space limitations:
Inter-|   Receive                                                |  Transmit ...
 face |bytes    packets errs drop fifo frame compressed multicast|bytes ...
    lo:   34230     586    0    0    0     0          0         0    34230 ...
  eth0:22476180  208548    0    0    0     0          0         0 52718375 ...
Two lines of headers are followed by lines of per-interface statistics. Older versions of the file do not contain the compressed field. In the name of simplicity, I decided that /proc/net/dev files that did not contain this field would be rejected by ifchk. With that foundation in place, we now begin the process of examining /proc/net/dev internally.
632      if( ! fgets(buf, sizeof( buf ), file) )
633      {
634          perror( "fgets" );
635          return (-1);
636      }
637      if( ! fgets(buf, sizeof( buf ), file) )
638      {
639          perror( "fgets" );
640          return (-1);
641      }

645      if( ( strstr( buf, "compressed" ) ) == NULL )
646      {
647          fprintf( stderr, "ifchk: ERROR: /proc/net/dev header format is not supported\n" );
648          return (-1);
649      }
We make two identical fgets(3) calls in a row to read the first and second line of headers from /proc/net/dev. Each fgets(3) call results in an overwrite of what previously was in buf. As a result, buf now contains the second header line. We then check to see if the second line of headers contains the compressed field.
If compressed is located in buf, our strstr(3) call succeeds and we have what looks like a usable /proc/net/dev file. If compressed cannot be located in buf, we print an error message via fprintf(3) and return -1, signifying failure. With this, all of our testing, which began by looking at file attributes, is done.
The remainder of our counter output code takes the form of a while loop that handles the processing and output of data for each interface. It iterates as many times as there are interfaces on the system.
653      printf( "*** Network Interface Metrics ***\n" );
654      printf( "Name Index RX-OK TX-OK\n" );

659      while( fgets( buf, sizeof( buf ), file ) )
660      {

664          if( ( strstr( buf, cur -> data ) ) != NULL )
665          {
666              delim = strchr( buf, ':' );

670              if( *( delim + 1 ) == ' ' )
671              {
672                  conv = sscanf( buf,
673                  "%*s %*Lu %Lu %*lu %*lu %*lu %*lu %*lu %*lu %*Lu %Lu %*lu %*lu %*lu %*lu %*lu %*lu",
674                  &rxPackets, &txPackets );
675              }
676              else
677              {
678                  conv = sscanf( buf,
679                  "%*s %Lu %*lu %*lu %*lu %*lu %*lu %*lu %*Lu %Lu %*lu %*lu %*lu %*lu %*lu %*lu",
680                  &rxPackets, &txPackets );
681              }
682          }
We call fgets(3) to read the next line of the /proc/net/dev file into buf. We then call strstr(3) to check that the interface name in cur -> data, ifmetric.txt:558:, matches the interface name in the /proc/net/dev line we just read in. Interface statistics lines in /proc/net/dev begin with an interface name, followed by a colon, followed by a count of bytes received on that interface. In some cases, there is whitespace between the colon and the received bytes count (for example, eth0: 6571407), and in other cases there is not (eth0:12795779).
In order to parse the line correctly in either of these cases, we use pointer arithmetic to test for the existence of this whitespace. If whitespace does exist, we enter the if() statement on line 670. Here, %*s consumes only the interface name and colon, so the format string carries an extra %*Lu to skip the received bytes count. If there is no whitespace, we enter the else block on line 676. Here, %*s consumes the name, the colon and the received bytes count as a single token, so no extra specifier is needed.
In either case, the counters for both received and transmitted packets are copied by sscanf(3) from /proc/net/dev to the variables rxPackets and txPackets for later output.
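A different way to cope with the optional whitespace, shown here only as an alternative sketch and not as what ifchk does, is to split the line at the colon first and hand sscanf(3) only the counters that follow it. With the name out of the way, one format string covers both layouts. parse_dev_line() is a hypothetical name, and the sketch uses the standard %llu specifier with unsigned long long variables:

```c
#include <stdio.h>
#include <string.h>

/*
 * Split one /proc/net/dev statistics line into the interface name and
 * its received/transmitted packet counts. Returns 0 on success.
 */
static int parse_dev_line(const char *line, char *name, size_t nameSize,
                          unsigned long long *rxPackets,
                          unsigned long long *txPackets)
{
    const char *colon = strchr(line, ':');
    const char *start = line;
    size_t len;

    if (colon == NULL)
        return -1;                      /* Not a statistics line. */
    while (*start == ' ' || *start == '\t')
        start++;                        /* Skip leading padding. */
    len = (size_t)(colon - start);
    if (len == 0 || len >= nameSize)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    /* Skip rx bytes, read rx packets, skip the six remaining receive
     * fields and tx bytes, then read tx packets. */
    if (sscanf(colon + 1,
               "%*u %llu %*u %*u %*u %*u %*u %*u %*u %llu",
               rxPackets, txPackets) != 2)
        return -1;
    return 0;
}
```

Because the name is extracted exactly, the caller can compare it with strcmp(3) rather than strstr(3), which avoids a substring match such as eth1 against eth10.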
683          else
684          {
685              fprintf( stderr, "ifchk: ERROR: current metrics do not describe current interface %s\n",
686              cur -> data );
687              return (-1);
688          }
If we compare the interface name in cur -> data to the name in /proc/net/dev and find a mismatch, we print an error message via fprintf(3) and return -1, signifying failure.
692          if( conv != 2 )
693          {
694              fprintf( stderr, "ifchk: ERROR: /proc/net/dev parse error\n" );
695              return (-1);
696          }
If successful, our above sscanf(3) call returns the number of matched items. As a result, the variable conv should equal 2 for rxPackets and txPackets. If not, we print an error message via fprintf(3) and return -1, signifying failure.
697          if( ( ifIndex = if_nametoindex( cur -> data ) ) == 0 )
698          {
699              perror( "if_nametoindex" );
700              return (-1);
701          }
Next, we call if_nametoindex(), passing it the interface name in cur -> data and, if successful, store the integer interface index in ifIndex. If the call fails, we handle it as usual. An interface index is a positive integer that the kernel assigns to each interface present on the system.
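In isolation, the name-to-index lookup looks like the sketch below. lookup_if_index() is a hypothetical wrapper name; if_nametoindex() itself is declared in <net/if.h> and returns 0 when no interface with the given name exists, which is why 0 can double as the error indicator.

```c
#include <net/if.h>
#include <stdio.h>

/* Store the kernel's index for ifName in *ifIndex. Returns 0 on success. */
static int lookup_if_index(const char *ifName, unsigned int *ifIndex)
{
    unsigned int index = if_nametoindex(ifName);

    if (index == 0) {
        /* Valid indexes are positive, so 0 signals failure. */
        perror("if_nametoindex");
        return -1;
    }
    *ifIndex = index;
    return 0;
}
```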
702          printf( "%-7s %-7d %-13Lu %-13Lu\n", cur -> data, ifIndex, rxPackets, txPackets );
703
704          conv = 0;
705          if( cur -> next != NULL )
706          {
707              cur = cur -> next;
708          }
709      }
Having built a line of counter output for the current interface, we print it. We then return to the beginning of the while loop and continue or, having reached our loop termination condition, exit the loop.
713      if( ( status = fclose( file ) != 0 ) )
714      {
715          perror( "fclose" );
716          return (-1);
717      }

721      if( ( writeLog( LOGINFO, pw -> pw_name, NULLSTATE ) ) != 0 )
722      {
723          fprintf( stderr, "ifchk: ERROR: could not pass logging message to syslogd\n" );
724          return (-1);
725      }
726      return (0);
727  }
Having exited the while loop, we call fclose(3), which corresponds to our fopen(3) call on line 578. We then call our logging function to log that an interface counter dump was performed.
With all processing done, ifchk produces the following packet count output on a system with two interfaces:
*** Network Interface Metrics ***
Name    Index   RX-OK         TX-OK
lo      1       104           104
eth0    3       1280903       1162571<--.
^       ^       ^                       |
|       |       |[from /proc/net/dev]---'
|       |
|       |[from if_nametoindex()]
|
|[from cur -> data]
The process filesystem provides all who make use of it with a wealth of system-level information. The ability to manipulate all manner of runtime state information by using file-level system calls and commands, such as cat(1) and echo(1), makes proc a high-priority candidate for inclusion in anyone's Linux toolkit.
Joshua Birnbaum began his system administration career in 1994. An addiction to SGI led to Sun and then to Linux. From there, he broadened his horizons by branching out into contract sysadmin, public speaking, UNIX/Linux systems programming and now writing for magazines. He can be reached at engineer@noorg.org.