Crashes, Core Dumps, and GDB
"My MUD crashes. What do I do? Someone said there's a core file that I can debug, but I can't find it. And how do I debug a core file?"
It's every MUD or talker developer's worst nightmare. You're coding along, adding lots of new features to your code. Then it begins to crash. Worse, it begins to crash about a week later. And you have no idea what's wrong.
This isn't intended to be an article on how to debug your crashes. That's generally a skill that needs to be developed with experience and intimate knowledge of your code base (if getting intimate with your code base scares you, now's the time to quit).
Ninety percent of fixing a crash is finding where it crashes and understanding why. This article explains how to generate and debug a core file, and tries to explain what it all means. Other debugging strategies will be described in future articles.
Excuse me for being pedantic, but here is some background before we jump into it all.
Why does a MUD or Talker crash?
Simply put: you went where you were not supposed to. An application will crash if it attempts to access memory that it doesn't own. This could be memory that belongs to another application, or unallocated memory.
Why would my program do that? Well...
- Trying to take a long walk off a short array. If your array is of length 100, don't be accessing item number 1429.
- Dereferencing a NULL pointer. If a pointer is NULL, it has the value 0x00. Trying to use it causes your program to try to read from memory address 0x00, which is undoubtedly not owned by your program.
- Corrupted pointers. If you've got memory corruption going on, who knows what's happening. Some other part of your program may be writing to memory it's not supposed to, causing this corruption.
- Weird arguments to system calls and library functions. They aren't perfect. If you give them odd arguments, such as a bad pointer passed to strcpy(), they might die on you.
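The first two causes in the list above can often be caught with an explicit check before the access. The following is only a minimal sketch: get_item() is a hypothetical helper invented for illustration, not something from any particular MUD code base.

```c
#include <stdlib.h>   /* NULL, size_t */

/* Hypothetical helper: copy items[index] into *out only if the access is safe.
 * Returns 0 on success, -1 if a pointer is NULL or the index is out of range. */
static int get_item(const int *items, size_t count, size_t index, int *out)
{
    if (items == NULL || out == NULL)
        return -1;          /* would have been a NULL pointer dereference */
    if (index >= count)
        return -1;          /* would have walked off the end of the array */
    *out = items[index];
    return 0;
}
```

With an array of length 100, asking get_item() for index 1429 returns -1 instead of reading memory the program doesn't own. Checks like this don't help with corrupted pointers, which look valid but point somewhere wrong.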
So what happens when I crash?
When your program has done something naughty, the kernel sends it a SIGSEGV signal, also known as a 'segmentation fault'. This leaves your program two options:
- Let the operating system handle it. Most simple programs fit into this category; they have nothing special to handle SIGSEGV. As a result the operating system takes over for them: the OS will terminate the program and then generate a core file -- more info below.
- Handle the signal yourself. Some programs install handlers for the SIGSEGV signal. The handler will do something special, like restart your MUD whenever it crashes, or try to dump out some logging information, etc. You won't get a core file though -- although there are some ways around this.
Debugging your core file
Debugging your core file with gdb allows you to see what line your code crashed at, and the values of any variables at the time of the crash.
There are a few things you have to ensure before you have debuggable core files:
- Make sure you're compiling with the -g flag. This ensures that debugging symbols are inserted into your binary executable. These debugging symbols help the debugger find its way around.
- Make sure you haven't changed the source since your compile. Although this happens a lot, if the source has been changed after compiling and running your MUD or talker, you'll get odd results. Make sure the source is the same source as the code that crashed.
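One more thing worth checking, which the list above doesn't cover: on many Unix systems the core file size limit (what "ulimit -c" shows in the shell) defaults to 0, and then no core file is written at all. As a hedged sketch, a server could raise its own soft limit at startup. enable_core_dumps() is a hypothetical name; getrlimit() and setrlimit() are the standard POSIX calls.

```c
#include <sys/resource.h>

/* Raise this process's soft core-file size limit up to its hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* the soft limit may go as high as the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is 0, this changes nothing and the limit has to be raised outside the program, for example by whoever administers the machine. Where the core file ends up, and what it is called, also varies between systems.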
Simplest use of GDB? Find out what line your MUD or talker crashed on. Go into your src directory, or where your MUD or talker binary sits. Then type:
gdb <name of binary> core
Once in GDB, you'll see the line of code that your program crashed at. You can also use the bt (backtrace) command to see the function calls leading up to that line.
For more information on GDB, here is an excellent resource: Debugging "C" and "C++" Programs Using "gdb".
How do I force a core file?
Well, what if your MUD or talker handles SIGSEGV? You won't get a core file, which also means you have nothing to debug should you crash. There's got to be a way around that, right?
But of course. At the top of your signal handler, you can insert this line of code:
if (fork()) abort();
What does that do? Simply put, the fork() system call clones your process into two simultaneously running copies, called the parent and the child. In the child, the fork() call returns 0, while in the parent, the fork() call returns the process id of the child -- nonzero.
If fork() returns nonzero, i.e. in the parent, abort() gets called. abort() is a library function that raises SIGABRT, which terminates the program and forces a core dump! (If fork() fails it returns -1, which is also nonzero, so the process still aborts.)
You can debug as normal. Keep in mind that you aborted from inside your signal handler, so you may have to use the up command in GDB to move to the right frame.
Let's wrap it up
So hopefully, with the knowledge of how to debug those pesky crashes, you can find the problem faster and give it the good old quick fix, leaving you more time to write more code and, of course, add more bugs. ;)
Happy coding!
About the author
Ardant has been involved in the mud and talker community since 1997. His experience as head developer and co-owner of the Enchantment Under the Sea telnet talker proves invaluable in his operations director role at Dune Internet.