Existing

We have debugging infrastructure. For example:

To Do

glibc's sotruss
ltrace
latrace
profiling
"Checkpoint/restart allows the state of a set of processes to be saved to persistent storage, then restarted at some future time" -- quoting from Jonathan Corbet's 2010 Linux Kernel Summit report.

This is surely a useful facility to have, for example for reproducing failures. On the other hand, it is questionable how much it can help with debugging failures in the interactions between GNU Hurd servers, as the relevant state is typically spread across several processes.

Continues: http://lwn.net/Articles/414264/, which introduces http://dmtcp.sourceforge.net/.
crash server, GDB gcore, http://code.google.com/p/google-coredumper/
libdiskfs locking
http://lwn.net/Articles/415728/, or http://lwn.net/Articles/415471/ -- just two examples; there's a lot of such stuff for Linux.
debugging gnumach startup QEMU GDB
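
To make the concern raised under the checkpoint/restart item more concrete, here is a toy sketch of what goes wrong when only one of two cooperating parties is checkpointed. It is not how any real checkpoint/restart implementation works: the "client" and "server" are plain dictionaries standing in for processes, and the names `checkpoint`, `restart`, `request_ticket`, and `demo` are all hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    """Save one party's state to persistent storage."""
    with open(path, "w") as f:
        json.dump(state, f)


def restart(path):
    """Restore a party's state from a previously saved checkpoint."""
    with open(path) as f:
        return json.load(f)


def request_ticket(client, server):
    """The client asks the server for a ticket; each side updates its own state."""
    ticket = server["next_ticket"]
    server["next_ticket"] += 1
    client["last_ticket"] = ticket
    return ticket


def demo():
    # Two dictionaries stand in for two separate processes (an assumption
    # made for brevity; real servers hold far more state than this).
    server = {"next_ticket": 0}
    client = {"last_ticket": None}

    request_ticket(client, server)  # the client receives ticket 0

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "client.ckpt")
        checkpoint(client, path)        # only the client is checkpointed
        request_ticket(client, server)  # the server's state moves on
        restored = restart(path)        # the client is rolled back

    # The restored client and the live server now disagree about history.
    return restored["last_ticket"], server["next_ticket"]
```

After `demo()` runs, the restored client believes its last ticket was 0, while the server has already handed out ticket 1 and will issue 2 next. A consistent snapshot would have to capture both parties at the same point, which is the difficulty when state is spread across several processes.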