The Perl Debugger
This article is a tutorial about the Perl5 source debugger and assumes that the reader has written at least one simple Perl program. It is best read in front of a computer, following along with a copy of the code, available at Linux Journal's FTP site (see Resources). The version of Perl that I use is perl5.004_1, which comes with the Perl debugger level 1. I've noticed some subtle differences between this and earlier versions of the debugger. If something discussed here doesn't work for you, consider upgrading.
The Perl programming language is being used increasingly on the World Wide Web as the back end to Common Gateway Interface (CGI) forms and interactive web pages, as well as for automated scripts for maintaining web sites and Unix servers in general. As a result, more and more users are beginning to learn Perl.
Conceptually, a debugger is a tool that allows the programmer a greater degree of control over the execution of the program without having to physically insert code that provides this control. A debugger allows the programmer to step through the program code, line by line if necessary. It allows peeks into the contents of the variables of the program, as well as into the stack, which is basically the list of functions (known as subroutines in Perl parlance) that have been called in order to get from the main part of the program to the current point of execution.
There are many different debuggers. Some, such as dbx or gdb, are separate programs that can be used to debug programs written in languages such as C, C++, Modula-2 or FORTRAN. (gdb, for instance, can handle C, C++ and Modula-2, according to its man page, which on my system dates from 1991, so by now it may cover FORTRAN.) Programming environments from Borland, Microsoft and others may have debugging capabilities built into their windowing environment.
Invoking the Perl debugger is as easy as invoking Perl itself. All one needs to do is provide the -d option when invoking the Perl interpreter, like this:
perl -d perlscript.pl
Perl has also been ported to Win32 systems and can be invoked similarly. If your system supports the #! syntax for scripts, you can have this as the first line of your Perl script (assuming you keep the Perl interpreter in /usr/bin):
#!/usr/bin/perl -d
This option isn't supported under Win32 systems (that I know of), but there are ways to simulate it. See the appropriate documentation.
The Perl debugger can also run under Emacs, creating an integrated programming environment that is similar to products from Borland or others.
Generally, when using a program written in Perl, you are invoking the program with the Perl interpreter. A Perl compiler is on the horizon, but will not be directly covered by this article. (The most logical way to debug code intended for the Perl compiler is to use the standard debugger until the code is “bug free”, then compile it.)
Under normal conditions, the Perl interpreter will read in the Perl script and will do a certain amount of compilation, turning your Perl code into some highly-optimized instructions, which are then interpreted. When using the debugger, extra Perl code is inserted into your code before it is handed off to the interpreter. Also, a library file, called in current releases perl5db.pl, is required in your Perl script. This final script is interpreted, resulting in the program running in the debugging environment.
When programming in Perl, you should probably always use the warning flag. Use this just as you would use the debug flag, as follows:
perl -w
When you are getting strange results from your program, you should definitely use the warning flag. The warning flag causes Perl to issue warnings regarding your code. These warnings are about things which are not fatal, but may cause problems. You can view the warnings as critiques of your coding style. Common warnings are those indicating that a certain variable has been used only once (perhaps a typo), or that a used package can't be found (maybe the package is not available on your system or is installed incorrectly).
Perl doesn't make you specify function prototypes and allows you to create variables at any point, so you don't have the advantages of type checking, although with Perl 5 you can optionally have type checking for subroutines.
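To see the warning flag at work, here is a minimal sketch. The script and its variable names are hypothetical, not part of the article's listings. The last line misspells $total as $totl, which is exactly the kind of slip that Perl accepts silently without -w.

```perl
#!/usr/bin/perl -w
# Hypothetical script, not one of the article's listings.
# The last line misspells $total as $totl.
$total = 41;
$total++;
print "total is $totl\n";
```

Run with -w, perl should complain that the name main::totl is used only once and, at run time, that an uninitialized value was used. The exact wording of the messages varies between Perl versions.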
This tutorial covers the debugger commands that I've found the most useful. The perldebug man page has a complete list of commands.
The most important command that can be entered into the debugger is h, which prints out a help screen. This tends to scroll off the screen, so type h h to see the help screen better formatted to fit your screen. Or, you can type |h, which will pipe the output of the command h into a pager, such as more or less. You can define what pager to use by setting the PAGER environment variable to whatever pager you prefer. I prefer using less. (You can actually do this from within the debugger by typing:
$ENV{'PAGER'} = "/usr/bin/less"
at the debugger prompt.) This piping mechanism works with more than just the help command, so if you ever do something and the result moves off your screen, try prepending it with a pipe. You can get help with individual commands by typing h command.
Now let's look at actual examples of using the debugger. We'll start with the simple snippet of Perl code in Listing 1, called p1.pl. Notice that we are executing the code with the -d option to perl, invoking the debugger. Upon invoking the script under the debugger, we'll see the following:
Loading DB routines from perl5db.pl version 1
Emacs support available.
Enter h or h h for help.
main::(./p1.pl:3):      if(0) {
  DB<1>
The debugger has suspended the normal execution of p1.pl, and is waiting for a command. Notice that we are given some information concerning where we are in the text of the program. The string main::(./p1.pl:3): tells us that we are in the main part of the Perl code, that the program we are executing is ./p1.pl and that we are at line three of the code. If we were in the middle of another Perl package, that package name would be listed here. We are also shown that line three is if(0) {. When we see code on a line, we have not yet executed it; rather it is this line in the code that is about to be executed. The next line, DB<1>, is a prompt at which to enter the first command to the debugger. If you enter a command and wish to repeat it, you can enter ! comnum, where comnum is the command number you wish to repeat.
We can see more of the surrounding script by typing l and pressing enter. Be careful not to put white space before this or any other commands. Doing so tells the debugger that what follows is not a command. Instead, the debugger will try to execute the code as normal Perl code and will evaluate it in the current context of the program being debugged. The debugger will do the same thing for any input it doesn't recognize as a debugger command. Using the character ; (semicolon) to end the command is optional.
Entering l (letter l for list) causes the following lines to appear on the screen:
3==>    if(0) {
4:          print "Can't get here!\n";
5       }
6
7:      while ($i < 10) {
8:          $i++;
9       }
10
11:     if($i >= 9) {
12:         print "Hello, world!\n";
  DB<1>
Notice the arrow, ==>. This represents the current line of code. In this case, it is line 3 and is the first actual line of Perl code. Notice also that all the lines which actually have executable code on them are labeled with a : (colon) after the line number. This is important, because later on when we get into breakpoints and action points, we will only be able to set them at these lines.
Entering l again yields this output:
13      }
14
15:     exit 0;
  DB<1>
The l without any arguments reveals the next window of Perl code. Subsequent usage reveals the next window and the next. There is an internal line pointer that gets incremented one window each time l is used. To back up a window, type - (hyphen) and press enter, then press l again.
There are also arguments to the l command, dealing with various ways of specifying what lines are printed based on their line numbers. We will use some of them as we need them. Similar to l is w, which prints out windows of program text. See the perldebug man page for details.
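For readers without the FTP copy at hand, Listing 1 can be pieced together from the two l listings shown above. Lines 3 through 15 below follow the debugger's output. Lines 1 and 2 never appear in that output, so what is shown for them is an assumption. In particular, the original first line apparently carries the -d option, while this sketch leaves -d to the command line (perl -d p1.pl).

```perl
#!/usr/bin/perl
# lines 1-2 are assumed; lines 3-15 follow the debugger's listing
if(0) {
    print "Can't get here!\n";
}

while ($i < 10) {
    $i++;
}

if($i >= 9) {
    print "Hello, world!\n";
}

exit 0;
```

The line numbers matter here: the debugger reports positions by line, so keep the blank lines in place if you type this in.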
There are two ways to execute the code. We know that the current line is 3 and is an if statement. The first method, s, is to step through the code, statement by statement. The other method is n, for next, which similarly steps through the code; however, in the case where the current statement is a subroutine call (as opposed to a built-in function or some sort of variable assignment), n will treat the subroutine as though it were a built-in function and will step over the subroutine, as if it is an atomic command. In contrast, s will enter the subroutine and step through every line of the subroutine. It will do the same for any subroutines encountered within the first one. This can be annoying when we know that the subroutine is working correctly—hence the n command. For this simple example, where we have no subroutines, n has the same effect as s. After entering s or n, we can simply press enter, and the debugger will reissue the last s or the last n command. This is useful to get through lines of code quickly. Pressing s displays the following:
main::(./p1.pl:7): while ($i < 10) {
Notice that we've skipped from line 3 to line 7. Enter l 3+4. This lists line 3 and the four lines after it. We skipped to line 7 because the conditional in line 3, if(0), is false. So the then part of the conditional is skipped and, because there is no else part, execution moves on to the next statement, the while loop on line 7.
Notice that there is a variable in the code, $i. We know that the body of the while loop will be executed until $i is greater than or equal to 10. (Enter l 7+10 to see the body of the while loop.)
So what value does $i have now? Type p $i. The print command is p; without an argument, it will print the contents of the magic Perl variable $_. Any valid Perl expression is a valid argument to p. Because anything that the debugger doesn't recognize as a debugger command is evaluated as Perl code, you could also type print instead of p. Don't worry about having redirected standard output to something other than your screen. The debugger will take care to ensure that you'll see some output. But typing p is quicker than typing print, and as any good programmer knows, laziness is one of the “programmer's virtues”, the other two being hubris and impatience (Larry Wall, see Resources).
Typing p $i results in nothing. No, we didn't do anything wrong. $i hasn't been set to anything, so it gets the default value of nothing. Type s (or just press enter). Try p $i again. It should print the number 1. Press enter again and type p $i again. Now, we could continue this, but we know that we will keep spinning in this while loop until the conditional returns false, which won't happen until $i is no longer less than 10. And, as I said before, impatience is another programmer's virtue, so we'll rush things along a bit. Enter $i = 8, then press enter again. Do it one more time, and we've broken free of the loop.
The last conditional checks that $i is at least equal to 9. Because it now is, the then portion of the if statement will get executed. Note that we could have set $i back to 2 before we execute the final if statement. The result would have been an execution that under normal conditions (i.e., without using the debugger) could never have occurred (assuming the computer is working properly, and no bits in memory get fiddled).
As any good first program should, our first debug program prints Hello, world! to the screen. Notice that even under the debugger, this happens as it should. Pressing enter one more time terminates the program.
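Since the first example has no subroutines, s and n behave identically in it. To see the difference, a small practice script helps. The one below is a hypothetical sketch, not one of the article's listings, and the subroutine name double is made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical practice script, not one of the article's listings.

sub double {
    my($n) = @_;
    return $n * 2;
}

$x = double(21);    # here s enters double(), n runs it as one step
print "x is $x\n";
```

Run it under perl -d. At the assignment to $x, entering s should take you to the first line inside double, while entering n should take you straight to the print statement.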
The code in Listing 2 is a more complex piece of code with a bug in it. It should print out every regular file in the current directory and all subdirectories, recursively. Right now, it only prints the files in the current directory and doesn't seem to delve into further subdirectories.
Execute this program in a directory with a few subdirectories and place files and further subdirectories in these subdirectories to create a small but diverse hierarchy.
The output of this code (once the bug gets fixed) from the directories I ran it in, looked like this:
./file1
./dir1.0/file1
./dir1.0/file2
./dir1.0/file3
./dir1.0/dir1.1/file1
./dir1.0/dir1.1/file2
./dir1.0/dir1.1/file3
./dir2.0/file1
./dir2.0/file2
./dir2.0/file3
./dir2.0/dir2.1/file1
./dir2.0/dir2.1/file2
./dir3.0/file1
There is one more variation of the list code command, l. It is the ability to list the code of a subroutine, by typing l sub, where sub is the subroutine name.
Running the code in Listing 2 returns:
Loading DB routines from perl5db.pl version 1
Emacs support available.
Enter h or h h for help.
main::(./p2.pl:3):      require 5.001;
  DB<1>
Entering l searchdir allows us to see the text of searchdir, which is the meat of this program.
22      sub searchdir { # takes directory as argument
23:         my($dir) = @_;
24:         my(@files, @subdirs);
25
26:         opendir(DIR,$dir) or die "Can't open \"
27:     $dir\" for reading: $!\n";
28
29:         while(defined($_ = readdir(DIR))) {
30:             /^\./ and next; # if file begins with '.', skip
31
32              ### SUBTLE HINT ###
As you can see, I left a subtle hint. The bug is that I deleted an important line at this point.
If we were to step through every line of code in a subroutine that is supposed to be recursive, it would take all day. As I mentioned before, the code in Listing 2 seems only to list the files in the current directory, and it ignores the files in any subdirectories. Since the code only prints the files in the current, initial directory, maybe the recursive calls aren't working. Invoke the Listing 2 code under the debugger.
Now, set a breakpoint. A breakpoint is a way to tell the debugger that we want normal execution of the program until it gets to a specific point in the code. To specify where the debugger should stop, we insert a breakpoint. In the Perl debugger, there are two basic ways to insert a breakpoint. The first is by line number, with the syntax b linenum. If linenum is omitted, the breakpoint is inserted at the next line about to be executed. However, we can also specify breakpoints by subroutine, by typing b sub, where sub is the subroutine name. Both forms of breakpointing take an optional second argument, a Perl conditional. If the conditional evaluates to true when the flow of execution reaches the breakpoint, the debugger will stop at the breakpoint; otherwise, it will continue. This gives greater control of execution.
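To try a conditional breakpoint on something smaller than Listing 2, here is a hypothetical practice script. It is not one of the article's listings, and the variable names are made up.

```perl
#!/usr/bin/perl
# Hypothetical practice script, not one of the article's listings.
foreach $n (1 .. 5) {
    $square = $n * $n;
    print "$n squared is $square\n";
}
```

Under perl -d, entering b 5 $n == 3 and then c should let the first two passes through the loop run normally and stop at line 5 on the third pass. At that prompt, p $n should print 3.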
For now we'll set a break at the searchdir subroutine with b searchdir. Once the breakpoint is set, we'll just execute until we hit the subroutine. To do this, enter c (for continue).
Looking at the code in Listing 2, we can see that the first call to searchdir comes in the main code. This seems to work fine, or else nothing would be printed out. Press c again to continue to the next invocation of searchdir, which occurs in the searchdir routine.
We wish to know what is in the $dir variable, which represents the directory that will be searched for files and subdirectories. Specifically, we want to know the contents of this variable each time we cycle through the code. We can do this by setting an action. By looking at the program listing, we see that by line 25, the variable $dir has been assigned. So, set an action at line 25 in this way:
a 25 print "dir is $dir\n"
Now, whenever line 25 comes around, the print command will be executed. Note that for the a command, the line number is optional and defaults to the next line to be executed.
Pressing c will execute the code until we come across a breakpoint, executing action points that are set along the way. In our example, pressing c continuously will yield the following:
main::(../p2.pl:3):     require 5.001;
  DB<1> b searchdir
  DB<2> a 25 print "dir is $dir\n"
  DB<3> c
main::searchdir(../p2.pl:23):       my($dir) = @_;
  DB<3> c
dir is .
main::searchdir(../p2.pl:23):       my($dir) = @_;
  DB<3> c
dir is dir1.0
main::searchdir(../p2.pl:23):       my($dir) = @_;
  DB<3> c
dir is dir2.0
main::searchdir(../p2.pl:23):       my($dir) = @_;
  DB<3> c
dir is dir3.0
file1
file1
file1
file1
DB::fake::(/usr/lib/perl5/perl5db.pl:2043):
2043:   "Debugged program terminated. Use `q' to quit or `R' to restart.";
  DB<3>
Note that older versions of the debugger don't output the last line as listed here, but instead exit the debugger. This newer version is nice because when the program has finished it still lets you have control so that you can restart the program.
It still seems that we aren't getting into any subdirectories. Enter D and A to clear all breakpoints and actions, respectively, and enter R to restart. Or, in older debugger versions, simply restart the program to begin again.
We now know that the searchdir subroutine isn't being called for any subdirectories except the first level ones. Looking back at the text of the program, notice in lines 44 through 46 that the only time the searchdir subroutine is called recursively is when there is something in the @subdirs list. Put an action at line 42 that will print the $dir and @subdirs variables by entering:
a 42 print "in $dir is @subdirs \n"
Now, put a breakpoint at line 12 to prevent the program from outputting to our screen (b 12), then enter c. This will tell us all the subdirectories that our program thinks are in the directory.
main::(../p2.pl:3):     require 5.001;
  DB<1> a 42 print "in $dir is @subdirs \n"
  DB<2> b 12
  DB<3> c
in . is dir1.0 dir2.0 dir3.0
in dir1.0 is
in dir2.0 is
in dir3.0 is
main::(../p2.pl:12):    foreach (@files) {
  DB<3>
This program sees that there are directories in “.”, but not in any of the subdirectories within “.”. Since we are printing out the value of @subdirs at line 42, we know that @subdirs has no elements in it. (Notice that when listing line 42, there is the letter “a” after the line number and a colon. This tells us that there is an action point here.) So nothing is being assigned to @subdirs in line 37, although something should be whenever the current file (as held in $_) is a directory. If it is, it should be pushed onto the @subdirs list. This is not happening.
One error I've committed (intentionally, of course) is on line 38. There is no catch-all “else” statement. I should probably put an error statement here. Instead of doing this, let's put in another action point. Reinitialize the program so that all points are cleared and enter the following:
a 34 if( ! -f $_ and ! -d $_ ) { print "in $dir: $_ is weird!\n" }
b 12
c
which reveals:
main::(../p2.pl:3):     require 5.001;
  DB<1> a 34 if( ! -f $_ and ! -d $_ ) { print "in $dir: $_ is weird!\n" }
  DB<2> b 12
  DB<3> c
in dir1.0: dir1.1 is weird!
in dir1.0: dir2.1 is weird!
in dir1.0: file2 is weird!
in dir1.0: file3 is weird!
in dir2.0: dir2.1 is weird!
in dir2.0: dir1.1 is weird!
in dir2.0: file2 is weird!
in dir2.0: file3 is weird!
main::(../p2.pl:12):    foreach (@files) {
  DB<3>
While the program can read (through the readdir call on line 29) that dir1.1 is a file of some type in dir1.0, the file test (the -f construct) on dir1.1 says that it is not.
It would be nice to halt the execution at a point (line 34) where we have a problem. We can use the conditional breakpoint that I mentioned earlier to do this. Reinitialize or restart the debugger, and enter:
b 34 ( ! -f $_ and ! -d $_ )
c
p
p $dir
You'll get output that looks like this:
main::(../p2.pl:3):     require 5.001;
  DB<1> b 34 ( ! -f $_ and ! -d $_ )
  DB<2> c
main::searchdir(../p2.pl:34):       if( -f $_) { # if its a file...
  DB<2> p
dir1.1
  DB<2> p $dir
dir1.0
  DB<3>
The first line sets the breakpoint, the next c executes the program until the breakpoint stops it. The p prints the contents of the variable $_ and the last command, p $dir prints out $dir. So, dir1.1 is a file in dir1.0, but the file tests (-d and -f) don't admit that it exists, and therefore dir1.1 is not being inserted into @subdirs (if it's a directory) or into @files (if it's a file).
Now that we are back at a prompt, we could inspect all sorts of variables, subroutines or any other Perl construct. To save you from banging your heads against your monitors, and thus saving both your heads and your monitors, I'll tell you what is wrong.
All programs have something known as the current working directory (CWD). By default, the CWD is the directory where the program starts. Any and all file accesses (such as file tests or file and directory openings) are made relative to the CWD. At no time does our program change its CWD. But the values returned by the readdir call on line 29 are simply file names relative to the directory that readdir is reading (which is in $dir). So, when we do the readdir, $_ gets assigned a string representing a file (or directory) within the directory in $dir (which is why it's called a subdirectory). But when running the -f and -d file tests, they look for $_ in the context of the CWD. But it isn't in the CWD, it's in the directory represented by $dir. The moral of the story is that we should be working with $dir/$_, not just $_. So the string
###SUBTLE HINT###
should be replaced by
$_ = "$dir/$_"; # make all path names absolute
That sums it up. Our problem was that we were dealing with names relative to $dir, not with paths that are valid from the CWD.
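The effect is easy to reproduce outside Listing 2. The following is a minimal sketch, assuming a throwaway directory tree built with File::Temp. The directory names are borrowed from the article's output, while the %bare and %full hashes are made up for illustration. It runs the same -d test twice on each name that readdir returns, once on the bare name and once on the name prefixed with $dir.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build a throwaway hierarchy, top/dir1.0/dir1.1, and make "top" the CWD.
my $orig = getcwd();
my $top  = tempdir(CLEANUP => 1);
mkdir "$top/dir1.0", 0755        or die "mkdir: $!\n";
mkdir "$top/dir1.0/dir1.1", 0755 or die "mkdir: $!\n";
chdir $top                       or die "chdir: $!\n";

my $dir = "dir1.0";
my(%bare, %full);

opendir(DIR, $dir) or die "Can't open \"$dir\" for reading: $!\n";
while (defined(my $name = readdir(DIR))) {
    $name =~ /^\./ and next;                   # skip names beginning with '.'
    $bare{$name} = (-d $name)        ? 1 : 0;  # looked up relative to the CWD
    $full{$name} = (-d "$dir/$name") ? 1 : 0;  # looked up inside $dir
    print "$name: bare test says $bare{$name}, prefixed test says $full{$name}\n";
}
closedir(DIR);

chdir $orig or die "chdir: $!\n";              # leave the temp dir so it can be removed
```

For dir1.1 the bare test should report 0 and the prefixed test 1, which is the same symptom the conditional breakpoint exposed.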
Putting it back into our example, we need to check dir1.0/dir1.1, not dir1.1. To check to make sure that this is what we want, we can put in another action point. Try typing:
a 34 $_ = "$dir/$_"
c
In effect this temporarily places the corrective measure into our code. Action points are the first item on the line to be evaluated. You should now see the proper results of the execution of the program:
  DB<1> a 34 $_ = "$dir/$_"
  DB<2> c
./file1
./dir1.0/file1
./dir1.0/file2
./dir1.0/file3
./dir1.0/dir1.1/file1
./dir1.0/dir1.1/file2
./dir1.0/dir1.1/file3
./dir2.0/file1
./dir2.0/file2
./dir2.0/file3
./dir2.0/dir2.1/file1
./dir2.0/dir2.1/file2
./dir3.0/file1
DB::fake::(/usr/lib/perl5/perl5db.pl:2043):
2043:   "Debugged program terminated. Use `q' to quit or `R' to restart.";
  DB<2>
Now that we've got the recursive call debugged, let's play with the calling stack a bit. Giving the command T will display the current calling stack. The calling stack is a list of the subroutines which have been called between the current point in execution and the beginning of execution. In other words, if the main portion of the code executes subroutine “a”, which in turn executes subroutine “b”, which calls “c”, then pressing “T” while in the middle of subroutine “c” outputs a list going from “c” all the way back to “main”.
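The same idea can be sketched in plain Perl with the built-in caller function, which reports one frame of the calling stack at a time. This is only an illustration of what a calling stack is, not a description of how the T command is implemented. The subroutine names sub_a, sub_b and sub_c are hypothetical stand-ins for “a”, “b” and “c” above.

```perl
use strict;
use warnings;

# sub_c walks the calling stack from itself back toward the main code.
sub sub_c {
    my @trace;
    my $depth = 0;
    # caller($depth) returns an empty list once we run out of frames.
    while (my @frame = caller($depth)) {
        my($package, $file, $line, $sub) = @frame;
        push @trace, "$sub called from file '$file' line $line";
        $depth++;
    }
    return @trace;
}
sub sub_b { return sub_c() }
sub sub_a { return sub_b() }

my @stack = sub_a();
print "$_\n" foreach @stack;
```

The list starts with sub_c and ends with sub_a, the subroutine that the main code called, which mirrors the order in which T prints its frames.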
Start up the program and enter the following commands (omit the second one if you have fixed the bug we discovered in the last section):
b 34 ( $_ =~ /file2$/)
a 34 $_ = "$dir/$_"
c
These commands set a breakpoint that will only stop execution if the value of the variable $_ ends with the string file2. Effectively, this code will halt execution at arbitrary points in the program. Press T and you'll get this:
@ = main::searchdir('./dir1.0/file2') called from file '../p2.pl' line 45
@ = main::searchdir(.) called from file '../p2.pl' line 10
Enter c, then T again:
@ = main::searchdir('./dir1.0/dir1.1/file2') called from file '../p2.pl' line 45
@ = main::searchdir(undef) called from file '../p2.pl' line 45
@ = main::searchdir(.) called from file '../p2.pl' line 10
Do it once more:
@ = main::searchdir('./dir2.0/file2') called from file '../p2.pl' line 45
@ = main::searchdir(.) called from file '../p2.pl' line 10
You can go on, if you so desire, but I think we have enough data from the arbitrary stack dumps we've taken.
We see here which subroutines were called, the debugger's best guess of which arguments were passed to the subroutine and which line of which file the subroutine was called from. Since the lines begin with @ = , we know that searchdir was called in a list context, so it is expected to return a list. If it had been called in a scalar context, we'd see $ = instead.
I say “best guess of what arguments were passed” because in Perl, the arguments to subroutines are placed into the @_ magic list. However, manipulating @_ (or $_) in the body of the subroutine is allowed and even encouraged. When a T is entered, the stack trace is printed out, and the current value of @_ is printed as the arguments to the subroutine. So when @_ is changed, the trace doesn't reflect what was actually passed as arguments to the subroutine.
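A minimal sketch of how @_ drifts away from what was originally passed. The subroutine name remaining_args is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical subroutine: it consumes its first argument with shift.
sub remaining_args {
    my $first = shift;      # removes the first element of @_
    return scalar(@_);      # number of arguments still in @_
}

my $left = remaining_args('a', 'b', 'c');
print "passed 3 arguments, \@_ now holds $left\n";
```

A stack trace taken after the shift has only the remaining elements of @_ to work with, which is why the arguments it shows are a best guess.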
Well, by now you must be thinking, “Gosh, this Perl debugger is so keen that with it I can end world hunger, learn to play the piano and increase my productivity by 300%!” Well, this is the right attitude. You are now displaying the third programmer's virtue, hubris. However, there are some warnings.
Race conditions are the scourge of the programmer. Race conditions are bugs that occur only under certain circumstances. These circumstances usually involve the time at which certain events correlate with other events. Using the debugger to debug these situations is not always possible, because the act of using the debugger may change the timing of the events in the program. This can cause a symptom to occur without the debugger, but while using the debugger, the symptom may disappear. The bug isn't gone, it just isn't being “tickled”.
There really isn't any stock method to get rid of race conditions. Usually, an intense analysis of the algorithms is necessary. Finite-state diagrams may also be useful, if you have the patience for it.
When writing code that involves more than one process (for example, if your code uses a “fork” system call or its equivalent), using the debugger becomes very difficult. This is because when the fork occurs, you are left with two (or more) processes, all running under the debugger. But since the debugger is interactive, you have to interact with every process. The result is that you have to individually deal with each process, controlling each execution. All the processes will want to read debug commands from the controlling terminal, but only one at a time will be able to do so. The other(s) will block, waiting for the first to complete. When it does, another process will complete. Incidentally, we can't know for sure which process will be first. This is an example of the above-mentioned race condition.
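For reference, this is the shape of program the paragraph above has in mind. It is a minimal sketch, not taken from the article's listings, of a script that forks once and has the parent wait for the child.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child process: under perl -d it would want debugger commands too.
    print "child $$ running\n";
    exit 0;
}

# Parent process: wait for the child and collect its exit status.
waitpid($pid, 0);
my $status = $? >> 8;
print "parent $$ saw child $pid exit with status $status\n";
```

Run normally, the two processes finish without any help. Run under perl -d, both sides of the fork are being debugged, and both compete for the same terminal.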
The final concern with using the debugger is compilation. Because the debugger is actually just debugging code inserted into your script, it is necessary that your script be compilable. That is, there should be no syntax errors.
Jeremy Impson is a Senior Computer Science student at Syracuse University, in Syracuse, NY, studying Operating Systems. He spent the past summer working for IBM Global Services in Poughkeepsie, NY. He's been playing with Linux since Spring 1995, and has been hacking Perl just as long. Outside of computing and sleeping, he spends time studying history and cooking up strange recipes. You can reach him at jdimpson@acm.org.