Loadable Kernel Module Programming and System Call Interception
Modern CPUs can run in two modes: kernel mode and user mode. When a CPU runs in kernel mode, an extended set of instructions is allowed, as is free access to anywhere in memory and device registers. Interrupt drivers and operating system services run in kernel mode. In contrast, when the CPU runs in user mode, only a restricted set of instructions is allowed, and the CPU has a restricted view of the memory and devices. Library functions and user programs run in user mode. Kernel and user mode together form the basis for security and reliability in modern operating systems.
Programs spend most of their time in user mode and switch to kernel mode only when they need an operating system service. Operating system services are offered through system calls. System calls are “gates” into the kernel implemented with software interrupts. Software interrupts are interrupts produced by a program and processed in kernel mode by the operating system.
The operating system maintains a “system call table” that has pointers to the functions that implement the system calls inside the kernel. From the program's point of view, this list of system calls provides a well-defined interface to the operating system services. You can obtain a list of the different system calls by looking at the file /usr/include/sys/syscall.h. In Linux, this file includes the file /usr/include/bits/syscall.h.
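To make the idea of a numbered system call interface concrete, here is a minimal sketch that requests services by number through the C library's generic syscall() entry point instead of the usual wrappers. The helper names raw_getpid and raw_write are ours, chosen for illustration, and the sketch assumes a Linux system whose sys/syscall.h defines SYS_getpid and SYS_write.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid, SYS_write: system call numbers */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our process ID by system call
 * number, bypassing the getpid() library wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: write a buffer to a file descriptor through the raw
 * write system call. Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const char *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_getpid() should return the same value as getpid(), since both end up in the same kernel entry; only the route through the library differs.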
Loadable modules are pieces of code that can be loaded and unloaded into the kernel on demand. Loadable modules add extra functionality to the kernel without the need of rebooting the machine. For example, it is common in Linux to use loadable modules for new device drivers. The alternative to loadable modules is a monolithic kernel where new functionality is added directly into the kernel code. Monolithic kernels have the disadvantage of needing to be rebuilt and reinstalled every time new functionality is added.
Kernel programming can be difficult not only because of the intrinsic complexity but also because of the long debugging cycle. Debugging an operating system may require installing a new kernel and rebooting the machine in every cycle. We strongly recommend using loadable modules in kernel development because a) there is no need to rebuild the kernel or to reboot the machine more often than necessary; and b) since the end user does not need to replace/rebuild the existing kernel, the user is more likely to install the new functionality.
Loadable module support within the Linux kernel facilitates the interception of system calls, as the examples below demonstrate. We assume the reader is familiar with C programming.
Operating systems provide entry points through system calls that allow user-level processes to request services from the kernel. It is important to distinguish between system calls and library functions. Library functions are linked to the program and tend to be more portable since they are not bound to the kernel implementation. However, many library functions use system calls to perform various tasks within the system kernel. To illustrate, consider this C program that opens a file and prints its contents:
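#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *myfile;
    char tempstring[1024];

    if (!(myfile = fopen("/etc/passwd", "r"))) {
        fprintf(stderr, "Could not open file\n");
        exit(1);
    }

    /* Read whitespace-delimited tokens until fscanf can no longer match one;
       the width limit keeps a long token from overflowing tempstring. */
    while (fscanf(myfile, "%1023s", tempstring) == 1)
        fprintf(stdout, "%s\n", tempstring);

    fclose(myfile);
    exit(0);
}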
Within the program, we used the fopen function call to open the /etc/passwd file. Note, however, that fopen is not a system call; it calls the open system call internally to do the real I/O. To list the system calls a program invokes, use the strace program. Assuming you have compiled the above program as a.out by running gcc example1.c, running strace ./a.out shows every system call that a.out makes.
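For comparison, the following sketch does a similar job using the open, read, write and close system call wrappers directly, which are roughly the calls strace reports underneath fopen and fprintf. The function name copy_file_to_fd is hypothetical, and the sketch copies raw bytes rather than whitespace-delimited tokens, so it is an illustration of the layering rather than an exact equivalent of the program above.

```c
#include <fcntl.h>     /* open() */
#include <unistd.h>    /* read(), write(), close() */

/* Hypothetical helper: copy the contents of path to the descriptor out_fd.
 * Returns the number of bytes copied, or -1 if any system call fails. */
long copy_file_to_fd(const char *path, int out_fd)
{
    char buf[1024];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;                      /* open failed, as fopen would */

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;

        /* write() may accept fewer bytes than requested, so loop. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(fd);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

Calling copy_file_to_fd("/etc/passwd", 1) would send the file to standard output; the buffering that stdio normally provides is what this version gives up.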
The kernel performs its permission checks using the user ID of the process that invokes the system call. So, if a regular user were to run the above program with /etc/shadow (which is not readable by regular users) as the parameter to fopen, the open would fail and so would fopen, causing the if condition above to evaluate to true and the Could not open file error message to be printed.
Assume that we want to intercept the exit system call and print a message on the console when any process invokes it. In order to do this, we have to write our own fake exit system call, then make the kernel call our fake exit function instead of the original exit call. At the end of our fake exit call, we can invoke the original exit call. In order to do this, we must manipulate the system call table array (sys_call_table). Take a look at /usr/src/linux/arch/i386/kernel/entry.S (assuming you are on an i386 architecture). This file contains a list of all the system calls implemented within the kernel and their position within the sys_call_table array.
Armed with the sys_call_table array, we can manipulate it to make the sys_exit entry point to our new fake exit call. We must store a pointer to the original sys_exit call and call it when we are done printing our message to the console. Source code to implement the above is as shown in Listing 1.
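The save-and-replace step can be tried safely in user space before touching a real kernel. The sketch below is only an analogy, not the module from Listing 1: demo_call_table, NR_DEMO_EXIT, demo_exit and the other names are hypothetical stand-ins for sys_call_table, the exit slot and sys_exit, and nothing here runs in kernel mode.

```c
#include <stdio.h>

#define NR_DEMO_EXIT 1                  /* hypothetical slot number */

typedef int (*demo_call_t)(int);

/* Stand-in for the original sys_exit: just hands back the error code. */
static int demo_exit(int error_code)
{
    return error_code;
}

/* Stand-in for sys_call_table: an array of function pointers. */
static demo_call_t demo_call_table[2] = { NULL, demo_exit };

static demo_call_t original_exit;       /* saved pointer to the real handler */
static int hook_hits;                   /* how many calls we intercepted */

/* Our fake handler: report the call, then chain to the original. */
static int fake_exit(int error_code)
{
    hook_hits++;
    printf("HEY! sys_exit called with error_code=%d\n", error_code);
    return original_exit(error_code);
}

/* What a module's init routine would do: save the pointer, then swap it. */
void install_hook(void)
{
    original_exit = demo_call_table[NR_DEMO_EXIT];
    demo_call_table[NR_DEMO_EXIT] = fake_exit;
}

/* What a module's cleanup routine would do: restore the saved pointer. */
void remove_hook(void)
{
    if (original_exit != NULL)
        demo_call_table[NR_DEMO_EXIT] = original_exit;
}

/* Stand-in for the kernel's dispatcher: index the table and call. */
int dispatch(int nr, int arg)
{
    return demo_call_table[nr](arg);
}

int hook_count(void)
{
    return hook_hits;
}
```

After install_hook(), every dispatch(NR_DEMO_EXIT, code) prints the message and still returns what the original handler returns. Restoring the pointer on removal matters just as much in a real module, since a table entry left pointing at unloaded code would crash the kernel.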
Compile the program shown in Listing 1 by invoking gcc: gcc -Wall -DMODULE -D__KERNEL__ -DLINUX -c example2.c. This gives us our example2.o module. To insert this module into the kernel, do this as root: insmod example2.o. Now, make sure you are on the console (printk messages go to the kernel log, which is normally echoed to the console and can also be read with dmesg), and run any program that uses the exit system call. For example, ls should print: HEY! sys_exit called with error_code=0.
Next, try to invoke ls with a file that does not exist; this should cause ls to call the exit system call with an argument other than 0. Therefore, ls somefilethatdoesnotexist should print: HEY! sys_exit called with error_code=1.
In order to list all the modules loaded, use lsmod. To remove the module, run rmmod example2.
“root-kits”
After a machine is compromised, malicious users tend to replace commonly used programs with trojan horses (programs that execute malicious instructions in addition to their normal functions). Packages of such trojan horses are widely distributed over the Internet and are easily accessible by anyone. Therefore, it becomes important to protect programs from being replaced by malicious users.
In order to protect against such problems, our next example involves the interception of various system calls, most importantly sys_execve, to check the hash of the program to be executed against a known hash present in a database file. The program is denied execution if the hashes do not match, and such an attempt is logged. One way to implement this is seen in the following steps:
1. Intercept sys_execve and look up the inode of the file being executed, then compare it with the inodes of the files present in the hash database. Inodes are data structures that contain information about files in the file system. Since each file has a unique inode within its file system, we can be certain of our comparison results. If no match is found, call the original sys_execve and return. However, if a match is found, compute the hash of the program and then compare it with the hash present in the hash database. If they match, call the original sys_execve and return. If they do not match, log the attempt and return an error.
2. Intercept sys_delete_module. If called with our module name as the parameter, return an error, so that our module cannot be removed.
3. Intercept sys_create_module and return an error. No more modules can be inserted because we do not want any malicious module to be able to intercept the sys_execve described in step 1.
4. Intercept sys_open to prevent our hash database and log file from being opened for writing.
5. Intercept sys_unlink to prevent deletion of the hash database and log file.
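The decision made in the sys_execve step can be sketched in user space as follows. Everything here is an assumption made for illustration: the struct hash_entry layout, the function names, the use of stat() to obtain the inode, and above all the hash itself, which is FNV-1a, a simple non-cryptographic function standing in for whatever hash a real implementation would use. A kernel module would work on kernel inode structures rather than call stat().

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One hypothetical database record: a protected file and its known hash. */
struct hash_entry {
    dev_t    dev;    /* device holding the file (inodes are per file system) */
    ino_t    ino;    /* inode number of the protected file */
    uint64_t hash;   /* known-good hash of its contents */
};

/* FNV-1a over the file contents; a stand-in, NOT a cryptographic hash.
 * Returns 0 on success and stores the hash in *out, or -1 on I/O error. */
int file_hash(const char *path, uint64_t *out)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV-1a 64-bit offset basis */
    FILE *f = fopen(path, "rb");
    int c, err;

    if (!f)
        return -1;
    while ((c = getc(f)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= 0x100000001b3ULL;              /* FNV-1a 64-bit prime */
    }
    err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = h;
    return 0;
}

/* Returns 1 if execution should proceed, 0 if it should be denied. */
int exec_allowed(const char *path, const struct hash_entry *db, size_t n)
{
    struct stat st;
    uint64_t h;
    size_t i;

    if (stat(path, &st) != 0)
        return 1;   /* let the original execve report the error */

    for (i = 0; i < n; i++) {
        if (db[i].dev == st.st_dev && db[i].ino == st.st_ino) {
            /* Protected file: allow only if the contents still match. */
            if (file_hash(path, &h) != 0)
                return 0;
            return h == db[i].hash;
        }
    }
    return 1;       /* inode not in the database: file is not protected */
}
```

A caller would log the attempt and return an error when exec_allowed() yields 0, and fall through to the original sys_execve otherwise. Matching on the device as well as the inode number is a choice of this sketch, made because inode numbers repeat across file systems.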
Note that the above does not offer complete protection, but it is a simple first-step implementation. For example, a malicious user may modify kernel symbols in /dev/kmem or use raw device access to the hard disk, and bypass open to write to the hash database file. Also, since our implementation is only a loadable module, a malicious user can alter our /etc/rc.d files and stop our module from being loaded the next time the machine is rebooted. In addition, various other system calls exist that could cause our hash database and log files to be altered or deleted.
At this time, it becomes important to acknowledge the possibility of loadable module support being misused by a malicious user. For example, the sys_execve function call can be intercepted to invoke a trojan program instead of the one intended, and system calls such as read and write can be intercepted to perform keystroke logging. Therefore, the flexibility and power of loadable kernel modules can be misused by malicious users who may have gained access to the system. See Resources for a web site that has details of this example along with complete source code.
Nitesh Dhanjani is a graduate student at Purdue University. His interests are operating systems, networking and security. He has performed security audits and reviews for various firms including Ernst & Young LLP, and offers consulting services in his spare time. He can be reached at dhanjani@dhanjani.com.
Gustavo Rodriguez-Rivera is a Visiting Assistant Professor at Purdue University and is also software architect for Geodesic Systems. His interests are operating systems, networking and memory management. He can be reached at grr@cs.purdue.edu.