Debugging C programs is often extremely challenging. The direct pointer manipulations permitted by the language give rise to bugs that can't happen in most other computer languages. Often, these bugs manifest themselves in strange ways, such as the program printing interesting messages like "core dump" or "bus error" with no additional information. This is the price you pay for the efficiency and low-level control that the C language provides.
Debugging is a big subject, and we can only scratch the surface here. In general, here are three approaches you can use for debugging:
When you find a bug, put print statements in the code most likely to have caused it so that you can monitor the values of variables that may not be what they should be.
Add lots of assert statements so that when something goes
wrong the program halts right away instead of continuing. If you don't know
about assert, do "man assert". We will talk more about this later in the
course.
Use a debugger to find out where your code went wrong.
These approaches are not mutually exclusive and almost every programmer uses a combination of all three (plus others). The first two methods are pretty self-explanatory. The third needs a bit more explanation, which we provide below. You can also do "man gdb" and/or "info gdb" to get much more information.
GDB stands for the GNU Debugger. It is an environment in which you can run a C program in a way that makes bugs much easier to identify.
To use gdb, do the following:
Compile your program with the -g flag e.g.
gcc -Wall -Wstrict-prototypes -ansi -pedantic -g myprog.c -o myprog
(Note that we're using a lot of warning options as well, which are the
"-Wall -Wstrict-prototypes -ansi -pedantic" options; these force
the compiler to complain if your code isn't ANSI-compliant or if it has other
suspicious features. It's a good habit to always use these options.) The
"-g" option puts debugging information into the executable. Most
importantly, it records the names of your functions and variables and which
source code line each piece of machine code came from, so gdb can show you the
source code (which it reads from your .c file) as the program executes (we'll
see how below).
Type gdb myprog (for the example above). This will start
the interactive debugger. It's basically an interpreter-like environment in
which you can run your program line-by-line and do useful debugging tasks as
well.
When in the debugger, you have a choice of lots of commands. Type "help" to get a list of command categories. Here are some of the most important ones:
run: runs the program.
where: tells you where you are in the program when you have stopped at
some point. Also tells you the calling history of the program up to that
point (i.e. which functions have been called to get you where you
are).
p <variable>: prints the value of <variable>.
break <file>:<line>: causes the program to stop at a
particular line in a particular source code file.
break <function>: causes the program to stop when entering a
particular function.
n: executes the next statement and then stops. This command will
not enter a new function while you're inside a function. Instead, it
goes to the next statement in the current function.
s: executes the next statement, possibly entering a new function, and
then stops.
l: lists lines in a source code file.
c: continues executing the program.
q: exits (quits) gdb.
Several of these commands have longer names that you can use as well:
print for p, next for n,
step for s, list for l,
cont for c, and quit for
q.
For more information about any of these, type help
cmdname at the gdb prompt, where cmdname is the name of
one of the commands listed above.
Let's say that you're running a C program and it core dumps. The error
message you get is unlikely to be helpful; it will probably be something like
segmentation violation (core dumped). First, let's identify
what that cryptic phrase means. A "segmentation violation" means that your
program tried to access memory that it wasn't allowed to. Since Unix is a
multitasking operating system, each process lives in its own little world,
with its own little hunk of memory that it's allowed to play with. The
operating system knows what hunk belongs to your process and what doesn't; if
your process tries to access memory that it doesn't have the right to access,
then it violates the (memory) segment boundaries and you get a segmentation
violation, which (normally) causes your program to abort. A "core dump"
refers to the fact that by default, a "core" file will be "dumped" into the
directory from which you ran your program. The file is actually called
"core" and can be very large (several megabytes or more). That's because
it's a dump of what the memory contained when your program crashed. It is
possible to use the core file to debug your program, but there are much
easier ways to debug, so we won't cover that here. Most Unix shells
(i.e. the command interpreter, such as bash) let you restrict the size
of core dumps. In bash, the command "ulimit -c 0" sets the limit
to zero bytes, in which case no core file is dumped; you can put it in
your initialization file (.bashrc for bash) to make this
the default. Ask your local Unix guru for more information on this.
OK. Now what you need to know is where the segmentation violation
occurred. To do this, compile your program with the "-g" option
described above, start up gdb with "gdb myprog" (where
"myprog" is the name of your program), and then type "run" at the gdb
prompt. [NOTE: if your program needs command-line arguments, you
should supply them after the "run" command,
e.g. "run arg1 arg2 arg3"; gdb passes them on to your program.] This will
run your program until the segmentation violation occurs. Gdb will tell you
that the segmentation violation occurred and then wait for your command. It
will look something like this:
Program received signal SIGSEGV, Segmentation fault.
0x4006cb26 in free () from /lib/libc.so.6
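This means that the segmentation violation (also known as a segmentation
fault or segfault for short) occurred in the library function
"free". This is weird; does this mean that there is a bug in
"free"? Almost certainly not. Instead, your program did
something bad that caused "free" to fail, such as freeing the same
pointer twice, passing it a pointer that did not come from malloc,
or writing past the end of an allocated block and corrupting the
bookkeeping data that free relies on. (Freeing a NULL pointer is not
the culprit: the C standard says that free(NULL) does nothing.)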
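Type "where" and you will get a stack backtrace. This
is probably the single most useful thing you can have when something goes
wrong. A stack backtrace is a list of the functions that have been called but
have not yet returned, most recent first, along with their arguments and
(where debugging information is available) their source file and line number.
It looks something like this: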
(gdb) where
#0 0x4006cb26 in free () from /lib/libc.so.6
#1 0x4006ca0d in free () from /lib/libc.so.6
#2 0x8048951 in board_updater (array=0x8049bd0, ncells=2) at 1dCA2.c:148
#3 0x80486be in main (argc=3, argv=0xbffff7b4) at 1dCA2.c:44
#4 0x40035a52 in __libc_start_main () from /lib/libc.so.6
The stack is a data structure which holds information about
functions which have partially finished executing. When a function calls
another function, information about the new function being called is "pushed"
onto the stack. This includes information such as the arguments to the
function, the contents of local variables, etc. This information is referred
to as a stack frame. When the function has finished its work, its frame is
"popped" off the stack and execution returns to the previous stack frame, which
belongs to the function that called it. Reading the above backtrace from the
bottom up, we see that the function __libc_start_main called main, which
called board_updater, which called free. Frame #1 is also labeled
free: either free called itself, or, quite possibly, it called an internal C
library routine that gdb can only label as free because the library has no
debugging information. In this case, the functions
__libc_start_main and free are C library functions
which you didn't write. main is the good old main function that
you write in every C program. What seems to have happened here is that
something went wrong in the board_updater function, and gdb even
tells you the file and line of the call to free (1dCA2.c, line 148); that
information is available because of the "-g" option. You
should look at that line, and perhaps set a breakpoint there:
break 1dCA2.c:148
Now, when you run the program again, gdb will stop it on that line, and
you will be able to print out the values of any relevant variables before
free is called.
There is much, much more to debugging than I have time to go into here, but this should get you started. Reading the gdb info documentation (type "info gdb" at the shell prompt) will be a good place to go for more information, as will asking your Unix guru friends.
There exist programs which serve as graphical front-ends to
gdb. The one we recommend is called ddd (for
Data Display Debugger). Learning a graphical
debugger can take time, but it's well worth it because it makes it much
easier to interact with your program (setting breakpoints, looking at code as
it executes, etc.). Describing ddd in detail is beyond the
scope of this document, but if you're interested, type "man ddd" at the Unix
shell prompt or visit the ddd home page for much more information.