GDB for Multi-threaded C: Hunting Race Conditions Without Losing Your Mind

Debugging single-threaded C in GDB is mostly mechanical: set a breakpoint, step, print, repeat. Multi-threaded C is a different game. Bugs vanish under the debugger. Stack traces lie. Variables seem to change value between two prints.

These are the GDB techniques that actually help.

Know Where You Are

First, see all threads:

(gdb) info threads
  Id   Target Id            Frame
* 1    Thread 0x7ff... "app"  __futex_abstimed_wait
  2    Thread 0x7ff... "worker-1" do_work () at worker.c:42
  3    Thread 0x7ff... "worker-2" __libc_recv () at recv.c:27

The * marks the current thread. Switch with:

(gdb) thread 3
(gdb) bt

A backtrace from every thread — the single most useful command for multi-threaded freeze investigations:

(gdb) thread apply all bt

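Look for wait chains: thread A waiting on a lock held by B, B waiting on a lock held by C. If the chain loops back to A, that is a deadlock. If it ends in a thread blocked on something external, such as the network, the other threads are starved rather than deadlocked.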

Naming Threads

Unnamed threads are unreadable in info threads. Set names from the thread itself:

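#define _GNU_SOURCE   /* pthread_setname_np is a GNU extension */
#include <pthread.h>
pthread_setname_np(pthread_self(), "net-rx");

On Linux the limit is 16 bytes including the terminating NUL, so 15 visible characters; longer names fail with ERANGE. Now info threads shows net-rx instead of app.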
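Because an over-long name is rejected rather than shortened, a small wrapper that truncates first can be convenient. This is a minimal sketch assuming Linux with glibc; set_thread_name and THREAD_NAME_MAX are hypothetical names introduced here.

```c
#define _GNU_SOURCE          /* must come before any #include */
#include <pthread.h>
#include <stdio.h>

/* Linux stores thread names in 16 bytes, terminating NUL included. */
#define THREAD_NAME_MAX 16

/* Name the calling thread, truncating to what the kernel accepts.
 * Returns 0 on success or an errno value from pthread_setname_np. */
int set_thread_name(const char *name) {
    char buf[THREAD_NAME_MAX];
    snprintf(buf, sizeof buf, "%s", name);   /* truncates and NUL-terminates */
    return pthread_setname_np(pthread_self(), buf);
}
```

Call it as the first statement of each thread's start routine, for example set_thread_name("net-rx"), so the name is in place before the thread can block anywhere interesting.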

Conditional Breakpoints by Thread

A breakpoint that fires on every thread is useless when only one is interesting. Pin it:

(gdb) break worker.c:42 thread 3
(gdb) break worker.c:42 thread 3 if pid == 1234

The second form combines thread selection with a data condition.

Non-Stop Mode

By default GDB runs in all-stop mode: when any thread hits a breakpoint, every thread stops. That's safe but distorts timing, and the bug may not reproduce while the other threads are paused.

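Turn on non-stop mode before starting or attaching to the program:

(gdb) set pagination off
(gdb) set non-stop on
(gdb) run

GDB releases older than 7.8 also needed set target-async on. Newer releases enable async mode by default and have deprecated that setting.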

Now only the thread that hit the breakpoint stops. Others keep running. Resume the stopped thread without affecting others:

(gdb) thread 3
(gdb) continue&

This exposes races that vanish with all-stop mode — the so-called Heisenbug effect.

Watchpoints

A breakpoint stops on a line. A watchpoint stops on a value.

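(gdb) watch counter
(gdb) watch -l ptr->state
(gdb) rwatch counter
(gdb) awatch counter

watch stops when a write changes the value. watch -l (short for -location) watches the address the expression currently evaluates to, so it keeps working after ptr is reassigned or goes out of scope. rwatch stops on reads, and awatch stops on reads and writes.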

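For a corrupted variable mystery: watch corrupted_field, and GDB stops the moment any thread writes to it. The current thread at the stop is the perpetrator.

Hardware watchpoints are limited by the CPU's debug registers (four on x86). When GDB falls back to a software watchpoint it single-steps the program, which can be hundreds of times slower, and in a multi-threaded program it may miss changes made by threads other than the current one. For race hunting, make sure GDB reports "Hardware watchpoint" when you set it.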
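To practice identifying the writer, a tiny target with one legitimate and one rogue write is enough. The sketch below is hypothetical (struct job, owner, rogue and run_job_demo are illustration names). The two threads run one after the other so that the sequence of stops is reproducible.

```c
#include <pthread.h>

struct job {
    pthread_mutex_t lock;
    int state;          /* the field to put a watchpoint on */
};

/* Well-behaved writer: updates state while holding the lock. */
static void *owner(void *arg) {
    struct job *j = arg;
    pthread_mutex_lock(&j->lock);
    j->state = 1;
    pthread_mutex_unlock(&j->lock);
    return NULL;
}

/* Buggy writer: clobbers state without taking the lock. */
static void *rogue(void *arg) {
    struct job *j = arg;
    j->state = -1;
    return NULL;
}

/* Runs owner, then rogue, and returns the final value of state. */
int run_job_demo(void) {
    struct job j = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t t;
    pthread_create(&t, NULL, owner, &j);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, rogue, &j);
    pthread_join(t, NULL);
    return j.state;
}
```

Break in run_job_demo once j is initialized, issue watch -l j.state, and continue. GDB should stop once inside owner and once inside rogue, and info threads at each stop shows which thread performed the write.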

ThreadSanitizer First

GDB is for inspection, not race detection. For finding races, compile with TSan:

clang -fsanitize=thread -g -O1 ./app.c -o app
./app

TSan instruments memory accesses and prints the conflicting accesses, with stack traces of both threads. It can catch races that have not yet caused a visible failure on your hardware, but only on code paths the run actually executes. Use TSan to find them; use GDB to investigate them.
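A program with a known race is a useful way to get familiar with the report format. The following sketch is hypothetical (counter, bump and count_with_two_threads are illustration names): two threads increment a shared counter, and the use_lock flag decides whether the increment is protected by a mutex.

```c
#include <pthread.h>

#define ITERATIONS 100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment the shared counter ITERATIONS times, optionally under the lock. */
static void *bump(void *arg) {
    int use_lock = *(int *)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        if (use_lock)
            pthread_mutex_lock(&counter_lock);
        counter++;   /* unsynchronized read-modify-write when use_lock == 0 */
        if (use_lock)
            pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two bump threads and returns the final count. */
long count_with_two_threads(int use_lock) {
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, bump, &use_lock);
    pthread_create(&t2, NULL, bump, &use_lock);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Called with use_lock set to 0 from a main function and built with the clang command above, TSan should flag the counter++ line in bump. With use_lock set to 1 the total is always 2 * ITERATIONS and the report should disappear.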

Most real-world race investigations are TSan output → GDB attach.

Reverse Debugging (when it's available)

With rr (record-replay), you record one execution and replay it deterministically as many times as you want. GDB inside rr lets you step backwards:

rr record ./app
rr replay
(gdb) continue
(gdb) reverse-next
(gdb) watch -l mystate
(gdb) reverse-continue

This is the killer feature for "how did this variable get this value?" Replay starts at the beginning of the recording, so continue forward to the failure first, set a watchpoint on the suspect location, then reverse-continue to land on the last write. No more re-running and praying.

Attaching to a Live Process

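Don't restart. Attach:

gdb -p $(pgrep -o myapp)

pgrep -o picks the oldest matching process, which avoids passing several PIDs when more than one matches. Attaching stops every thread until you continue or detach, so on a production system keep the session short. With gcore, you can instead snapshot a core dump from the live process and analyze it offline:

gcore $(pgrep -o myapp)
gdb ./myapp core.12345

gcore writes core.<pid> (12345 stands in for the real PID here). The process is paused only while the dump is written and then resumes, so the service does not need a restart.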

Useful Pretty-Printing

GDB's default print of a pthread_mutex_t is a dump of internal fields that is hard to read. glibc 2.25 and later ship Python pretty-printers for the pthread types, if your distribution installs them. For your own data structures, write pretty-printers in Python. A 30-line printer for your queue type pays for itself within an hour.

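(gdb) python
import gdb.printing
class QueuePrinter:
    def __init__(self, val): self.val = val
    def to_string(self):
        return "queue head=%d tail=%d size=%d" % (
            int(self.val['head']), int(self.val['tail']), int(self.val['size']))
# Assumes the C type is "struct queue"; adjust the regexp to your type name.
pp = gdb.printing.RegexpCollectionPrettyPrinter("myapp")
pp.add_printer("queue", "^queue$", QueuePrinter)
gdb.printing.register_pretty_printer(gdb.current_objfile(), pp)
end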
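The printer reads three fields named head, tail and size. For reference, here is a hypothetical C type with that shape: a fixed-capacity ring buffer. The names struct queue, QUEUE_CAP, queue_push and queue_pop are assumptions made for illustration, not taken from a real code base.

```c
#define QUEUE_CAP 8

struct queue {
    int head;               /* index of the next element to pop */
    int tail;               /* index of the next free slot */
    int size;               /* number of elements currently stored */
    int items[QUEUE_CAP];
};

/* Append v; returns 0 on success, -1 if the queue is full. */
int queue_push(struct queue *q, int v) {
    if (q->size == QUEUE_CAP)
        return -1;
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->size++;
    return 0;
}

/* Remove the oldest element into *out; returns 0 on success, -1 if empty. */
int queue_pop(struct queue *q, int *out) {
    if (q->size == 0)
        return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->size--;
    return 0;
}
```

After two pushes and one pop on a zero-initialized queue, the fields are head=1, tail=2, size=1, which is the state a printer for this type would summarize on one line instead of dumping the whole items array.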

Common Multi-thread Bug Patterns

Frozen process, no CPU usage: deadlock. Run thread apply all bt and look for the cycle.

Frozen process, 100% CPU: livelock or busy-spin. Same backtrace command — you'll see threads in a tight loop or pthread_cond_wait returning immediately.

Crash with corrupted state: race writing the state. TSan first, watchpoint second.

Bug disappears under GDB: timing-sensitive race. Switch to non-stop mode, or use rr for deterministic replay.

Different stack traces every run: often lock-free code with insufficient memory ordering, though any race can produce this. Reach for the memory-barriers article or a model checker such as Relacy.

Takeaways

  • thread apply all bt is the first command in any multi-threaded freeze.
  • Name your threads. Untagged threads are useless under load.
  • Use non-stop mode when freezing all threads distorts the bug.
  • TSan finds races; GDB inspects them. Don't confuse the roles.
  • rr is worth installing for the day you have a non-deterministic bug. That day always comes.