Remote Debugging of Loadable Kernel Modules with kgdb: a Knowledge-based Article for Getting Started
As many kernel developers and hackers have known for years, loadable/unloadable kernel modules (like user-space applications) are almost never bug-free. With the continuing use and development of loadable modules growing, due in fact to the obvious benefits of the mechanism (lean kernels, reduction of kernel recompiles/reboots, etc.), developers are in an increasing need for robust debugging tools capable of aiding in the identification of problem code. Traditionally, module developers have used various debugging techniques to help identify problematic code. These techniques have included:
printk statements around suspected areas of failure (probably the most useful)
Oops analysis (also quite useful)
Magic Key combinations (for recovery of system hangs, displaying register contents, etc.)
kgdb is a kernel patch that, once applied, allows for the use of the familiar gdb interface for source-level debugging of a running Linux kernel. The process requires the use of two machines. One machine runs the kernel being debugged while the other runs the gdb session. Communication between the running kernel and gdb transpires via a serial cable connecting the two machines.
The kgdb patch supplies the kernel with a debugging stub. This stub uses the gdb remote serial protocol to communicate with gdb through a serial driver interface (also supplied by the patch). This patch is applied to the kernel on the machine that will run the gdb session (the development machine) where it is recompiled. The newly compiled boot image is then copied to the other machine (the target machine) where it is configured as the bootable kernel. When a reboot into the transferred kernel is complete, the target machine can then be configured to halt and await a remote connection from a gdb session on the development machine. When this connection is established, the target machine's kernel can then be debugged (single-stepping, issuing of breakpoints, data examination, etc.) through gdb on the development machine as if it were a user-space application.
The first step is to download the kgdb patch for your kernel version. A patch can be obtained at http://kgdb.sourceforge.net/. As of this writing, patches only exist for the following kernels:
2.4.0-test9
2.4.0-test4 (kernel used for this article)
2.4.0-test1
2.3.99-pre6
2.2.17
2.2.12
Once you have obtained the patch, copy the patch to the kernel source directory on the development machine, and apply.
patch -p1 < patchfile(remember, this is the kernel that will eventually turn on the target machine)
The patched kernel must now be recompiled. It is assumed here that /usr/src/linux/.config exists and accurately reflects your current kernel configuration. Navigate to the source directory (if you aren't there already), and do a make menuconfig. From the main menu navigate to and select kernel hacking. You should now see an option for Remote (serial) debugging with gdb. Make sure this option is selected and then exit, saving your configuration. Next, do a make clean followed by a make bzImage (or whatever image you usually make).
The recompile adds a documentation file called gdb-serial.txt to your system. This file can be found in /usr/src/linux/Documentation/i386 and includes a step-by-step description of what needs to transpire next. Basically, here are the highlights.
The newly compiled kernel image (e.g., bzImage) is copied to the target machine where it is configured for boot. For example, the image may be copied to /boot/vmlinuz-target (or whatever you want to call it) followed by an added entry in lilo.conf:
image= /boot/vmlinuz-target label=target_kernel read-only root=/dev/hda1
Next, run LILO at the command line, and reboot into the new kernel.
On the development machine, navigate to /usr/src/linux/arch/i386/kernel. Here you will find an executable called gdbstart. Copy this program to the target machine. gdbstart is responsible for configuring the target machine's serial port (from user space) for communication with gdb on the development machine. The program then calls a process ioctl that activates the serial driver interface to the debugging stub. This driver effectively halts the target system until gdb on the development machine issues a continuance to resume execution.
Next, decide which serial port (i.e., ttyS0 or ttyS1) is to be used as well as a baud rate for communication (e.g., /dev/ttyS0 with a baud rate of 38,400).
Connect the two machines with a null modem serial cable. Be sure to connect the cable to the serial ports you have designated in the above step.
Run the gdbstart program on the target machine with the following parameters (or whatever port and data rate you decide upon):
gdbstart - s 38400 - t /dev/ttyS0The program will execute and pause, awaiting a remote connection from the development machine.
Alternatively, the documentation suggests creating a script on the target machine to deliberately call gdbstart with user-defined parameters.
The documentation next instructs you to create a .gdbinit file in /usr/src/linux on the development machine. Included in this file is a macro (called rmt) that is used to supply gdb with the information it needs to initiate the remote protocol. Edit this information to reflect the com port and data rate you have designated for communication between gdb and the target machine.
Now, navigate to /usr/src/linux on the development machine, and run gdb vmlinux. Once you receive a gdb prompt enter rmt, which informs gdb that it is connecting to a remote target (via the serial port and data rate specified in the .gdbinit file).
You should now see something that resembles Listing 1.
Listing 1. gdb Connecting to a Remote Target
You can now issue step commands, set breakpoints, etc. Issuing a continue to gdb will return the target kernel to a running state. The kernel will continue to run until it encounters a defined breakpoint, an interrupt, a signal, a segment violation, etc., at which point control is returned to gdb on the development machine.
In order for us to use kgdb to debug loadable modules, we must understand how the remote kernel communicates with gdb on the development machine. Remember, the mechanism's default communication between gdb on the development machine and the debugging stub on the target machine transpires via a serial driver interface (called gdbserial.o). In order for the debugging process to begin, two things must happen from within this driver. First, the set_debug_traps( ) is initiated. This function is defined in the debugging stub and informs the remote kernel that all breakpoints, error conditions and other exception handling is to be intercepted and handled by gdb. Secondly, the serial driver must call the function breakpoint( ). This function is also defined in the debugging stub and is used to initiate the communication by issuing a breakpoint interrupt:
asm( " int $3");
Since gdb is now configured to intercept such a condition (i.e., the set_debug_traps( ) call), the kernel on the target machine halts and transfers control to gdb on the development machine. It is from this point that the user may begin normal debugging such as single-stepping, issuing of breakpoints, stack tracing, etc. However, if we were to begin stepping from this initial point, the code we would be examining would be in gdbserial immediately following the call to the debugging stub's breakpoint( ) (since this is where program execution has halted).
For example, Listing 2 is the excerpt of gdbserial in which the two calls to the previously explained stub functions are called.
Listing 2. An Excerpt of gdbserial
Now, if we initiate another debug session and issue a step command after gdb receives the stub's breakpoint interrupt, we would step to the next line of code in gdbserial after the breakpoint( ) is made (which should be gdb_null in the example in Listing 3).
Listing 3. Sample gdb Dialogue with Breakingpoints
As was mentioned before, gdb can return the remote kernel to a running state by issuing a continue command. If this is done, gdb patiently waits until the remote kernel returns control by issuing some sort of exception (such as a user-defined breakpoint, segmentation fault, etc.).
For a better understanding of how the entire process works review the debugging stub code found in /usr/src/linux/arch/i386/kernel/gdbstub.c and the serial driver interface, which can be found in /usr/src/linux/drivers/char/gdbserial.c.
We now have almost all the information we need in order begin using the kgdb mechanism to debug loadable modules. Remember, the important information we must retain from kgdb's communication process in order to initiate module debugging is: 1) at the beginning of the debug session, the debugging stub informs gdb on the development machine that it is responsible for intercepting and handling all exceptions from the remote target kernel for example, gdbserial's initial call to set_debug"traps( ); 2) the initial debug process begins with the serial interface's call to the debugging stub's breakpoint( ) function; 3) gdb can return the remote kernel to a running state but will regain control once the remote kernel issues any type of exception.
Additionally, we must consider that because the module will be loaded on the target machine and the gdb session runs on another machine, gdb will have no idea where in the target machine's memory the module code will be loaded. We must therefore determine this location and inform gdb of its whereabouts before the debugging of the module can begin.
During the kernel-building process, the kernel produces a file that maps addresses in memory to function names for the modules/drivers that are built during the compile process. This file is usually placed in the root of the kernel source directory (i.e., /usr/src/linux/System.map) and is used by the kernel to access those compiled devices properly in the correct memory location. However, at the construction of that file the kernel is unaware of where in memory a particular module may be loaded at a later time.
Fortunately, we can determine the memory location for modularized code during its load process. This is accomplished by using insmod with the -m parameter that informs insmod to produce a load map. This map informs us of where in memory the object-code sections reside. Ultimately, we must locate this information in order to inform gdb on our development machine of where our module's object code resides on the target machine. To illustrate, let's consider the module code shown in Listing 4, which we will refer to as the simple module.
As you can see, the simple module contains just enough functionality to to identify its memory address on the target machine when loaded.
On the development machine compile the module:
gcc -c -O2 -g simple.c
Be sure to include the -g option during compilation in order to enable debugging symbols for gdb.
Copy the compiled object code to the target machine.
Install the object code into the kernel using insmod's -m parameter:
insmod -m simple.oThis, of course, loads the module and produces a similar load map as shown in Listing 5.
Listing 5. Memory Map of Kernel Module Size and Location
Record the hex address of the .text section (0xc480004c in our example) for use later. This section represents the beginning of the module code in memory. We will use this value to inform the development machine running gdb of the module's whereabouts in memory.
Unload the module:
rmmod simple
We are almost ready to begin debugging a module. But before we proceed, we have to alleviate another possible issue. In gdb we will be using the add-symbol-file to inform the debugger of the memory location for our module's object code on the target machine. However, gdb 5.0 and previous versions have had problems in correctly calculating addresses using the add-symbol-file command (the problem surrounds the issue of a module's global variables). The problem has been corrected in developmental versions of gdb. It is therefore recommended that you use a developmental version of gdb to debug modules with kgdb. The gdb version used for the remainder of this article is a developmental gdb built for Red Hat 6.2. For more information regarding this issue, visit http://kgdb.sourceforge.net/.
The last step we must account for is the configuration of our module to communicate with the remote gdb session. This is a relatively simple task. We now know that when gdb on the development machine makes the initial contact with the target machine, the gdbserial interface issues a call to the debugging stub's set_debug_traps( ) function. This function, as you recall, instructs gdb to perform all exception handling for the target kernel. The serial interface then issues a call to the stub's breakpoint( ) function, which turns control over to gdb. At this point we can inform the remote kernel to resume normal operations by issuing a continue command from gdb.
With the target kernel now configured to return control to the remote gdb session whenever an exception is triggered, we can modify our module code to guarantee that such an event will arise. Consider the following modifications made to the simple module in Listing 6.
Listing 6. Modified Sample Module to Include Further Remote Control
As you can see by the code in Listing 6, we have added a breakpoint interrupt that is to be called (in this example) when the module is both loaded and unloaded from the kernel. These interrupts will return control to the remote gdb session, thus halting execution of the module at those points. Let's try it out.
On the development machine, recompile the module after adding the BREAKPOINT code:
gcc -c -O2 -g simple.c
Copy the newly compiled object code to the target machine.
Initiate a remote debug session on the target machine by running the gdbstart program.
On the development machine, navigate to /usr/src/linux and run gdb vmlinux. Remember to use a developmental version of the debugger as described in the previous section.
Once prompted in gdb, type rmt to initiate contact with the remote machine.
With the previously recorded hex address, use add-symbol-file to instruct gdb of the modules location in memory:
add-symbol-file /root/simple.o 0xc480004c
You may be wondering at this point how we can assign this address space when the module is not currently loaded. Or a better question may be, ``How do we know that the kernel will load the module into the exact same location in memory?'' We can make this assumption because the kernel very often will load the object code into the same memory segment as before. While this is not written in stone it does happen frequently enough to render this method quite reliable.
Return control to the remote kernel by issuing a continue command from gdb.
On the target machine, install the module:
insmod simple.oWhen insmod invokes the init_module modules, our breakpoint interrupt is called and returns control to gdb. This allows us to step through the remainder of the init_module as if it were a user-space application as shown in Listing 7.
Listing 7. User Space gdb Dialogue Running Modified Simple Module
Note that this sequence reflects an example of stopping the module's load process. The module could be installed with the insmod -m parameter again to verify memory placement of the object code if other module functionality (other than the initialization process) was to be debugged (e.g., file operation functions: open, read, write, close, ioctl; driver resource allocation/deallocation, etc.).
Return control to the remote kernel from gdb (i.e., continue).
On the target machine, remove the simple module:
rmmod simple
This of course issues the cleanup_module function, which in turn invokes another of our breakpoints, returning control to gdb on the development machine:
(gdb) c Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. 0xc4800068 in cleanup_module () at simple.c:38 38 } // end function init_module (gdb)
Return control to the remote kernel and stop the debug session.
Thanks go to all those contributors at http://kgdb.sourceforge.net/ and http://oss.sgi.com/ for their continuing work on kgdb.
A special thanks goes to kgdb's original author, David Grothe, and to Amit S. Kale for his outstanding ongoing work and enhancements to the system. Thanks guys.
James Lamphere has a BA in music history from Eastern Washington University and is currently working on an interdisciplinary masters degree in computer science at Eastern Washington University specializing in operating system-level development. He is a graduate instructor/system administrator in the computer science department at EWU.