SIGTRAP vs. sys_exit_group race

From: Jan Kiszka @ 2008-10-06 14:08 UTC
  To: Roland McGrath, Oleg Nesterov; +Cc: Linux Kernel Mailing List

Hi,

Is there any news on these ideas?

http://marc.info/?l=linux-kernel&m=121540671602971

I've been caught by a race between a ptraced thread that hits a
breakpoint, thus generating a SIGTRAP, and another thread of the same
process issuing sys_exit_group. The discussion above, specifically Oleg's
concerns, made me think that this is a generic issue in current
mainline.

I observed this on a heavily patched 2.6.26.5 kernel which, among other
things, has a higher probability of latencies/reschedules between the
queuing of SIGTRAP and its actual delivery. The sys_exit_group lands
right in this window. It informs gdb about the termination, sends
SIGKILL to the other threads and turns the caller into a zombie. Now the
second thread has both SIGKILL and SIGTRAP pending, and it picks SIGTRAP
for delivery. At this point gdb gets confused (maybe a bug of its own?),
sends SIGSTOP to the dead thread and waits for it to enter the traced
state, which it will never do. The result is a deadlock of gdb that can
only be resolved by killing gdb.
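
To make the choice of SIGTRAP over SIGKILL concrete, here is a minimal
user-space model, not kernel code. It assumes the pending set is scanned
from the lowest signal number upward, which is how I read next_signal()
in my tree. The model_* names are made up for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/*
 * User-space model only, not kernel code.
 * Assumption: one 64-bit word holds the pending set, with bit (sig - 1)
 * set for each pending signal.
 */
static inline uint64_t model_sigbit(int sig)
{
	assert(sig >= 1 && sig <= 64);
	return UINT64_C(1) << (sig - 1);
}

/* Return the lowest-numbered pending signal, or 0 if none is pending. */
static inline int model_next_signal(uint64_t pending)
{
	int sig;

	for (sig = 1; sig <= 64; sig++)
		if (pending & model_sigbit(sig))
			return sig;
	return 0;
}
```

Under that assumption, model_next_signal(model_sigbit(SIGKILL) |
model_sigbit(SIGTRAP)) yields SIGTRAP, because SIGTRAP (5) has a lower
number than SIGKILL (9) on Linux.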

The patch below (rebased against latest git) resolves the issue for me,
but I'm not sure about all its implications, or whether I'm papering
over a different issue. Could you comment on my scenario? Is it possible
with mainline as well? Will Roland's approach resolve it?
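
As a rough sketch of what the patch changes in control flow (a model
only, not the kernel function): arch_stop_needed is a stand-in for
whatever condition guards the arch_ptrace_stop() block, which the hunk
does not show, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model only. Return value: true if the task would go on to stop
 * for the tracer, false if it bails out because SIGKILL is pending.
 */
static inline bool model_stops_before_patch(bool arch_stop_needed,
					    bool sigkill_pending)
{
	if (arch_stop_needed) {
		/* SIGKILL is rechecked only after the lock was retaken */
		if (sigkill_pending)
			return false;
	}
	return true;
}

static inline bool model_stops_after_patch(bool arch_stop_needed,
					   bool sigkill_pending)
{
	(void)arch_stop_needed;	/* the check no longer depends on it */
	if (sigkill_pending)
		return false;
	return true;
}

/* Self-check of the one case that differs between the two variants. */
static inline void model_check(void)
{
	assert(model_stops_before_patch(false, true));
	assert(!model_stops_after_patch(false, true));
}
```

The only case where the two variants differ is SIGKILL pending without
the arch-specific block being entered, which is the case I believe I am
hitting.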

Thanks,
Jan

---
Index: b/kernel/signal.c
===================================================================

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1528,10 +1528,11 @@ static void ptrace_stop(int exit_code, i
 		spin_unlock_irq(&current->sighand->siglock);
 		arch_ptrace_stop(exit_code, info);
 		spin_lock_irq(&current->sighand->siglock);
-		if (sigkill_pending(current))
-			return;
 	}
 
+	if (sigkill_pending(current))
+		return;
+
 	/*
 	 * If there is a group stop in progress,
 	 * we must participate in the bookkeeping.
-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
