[PATCHSET] ptrace,signal: Improve ptrace and job control interaction

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
@ 2011-03-23 10:05 Tejun Heo
  2011-03-23 10:05 ` [PATCH 01/20] signal: Fix SIGCONT notification code Tejun Heo
                   ` (20 more replies)
  0 siblings, 21 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland

Hello,

This is repost of the following two pending patchsets.

 [1] group stop / ptrace updates, take#2
 [2] ptrace,signal: Fix notifications to the real parent while ptraced

Combined these patches implement "P1. Always TASK_TRACED while
ptraced", "P2. Fix notifications to the real parent" and "P3. Keep
ptrace resume separate from and beneath jctl stop" of the ptrace job
control improvements proposal[3].

All patches have been reviewed and acked by Oleg and this will be the
last repost of these patches.  They're being committed to a stable git
tree (will eventually be managed by Oleg) and will be pulled into
linux-next once -rc1 is released.

 0001-signal-Fix-SIGCONT-notification-code.patch
 0002-ptrace-Remove-the-extra-wake_up_state-from-ptrace_de.patch
 0003-signal-Remove-superflous-try_to_freeze-loop-in-do_si.patch
 0004-ptrace-Kill-tracehook_notify_jctl.patch
 0005-ptrace-Add-why-to-ptrace_stop.patch
 0006-signal-Fix-premature-completion-of-group-stop-when-i.patch
 0007-signal-Use-GROUP_STOP_PENDING-to-stop-once-for-a-sin.patch
 0008-ptrace-Participate-in-group-stop-from-ptrace_stop-if.patch
 0009-ptrace-Make-do_signal_stop-use-ptrace_stop-if-the-ta.patch
 0010-ptrace-Clean-transitions-between-TASK_STOPPED-and-TR.patch
 0011-ptrace-Collapse-ptrace_untrace-into-__ptrace_unlink.patch
 0012-ptrace-Always-put-ptracee-into-appropriate-execution.patch
 0013-job-control-Don-t-set-group_stop-exit_code-if-re-ent.patch
 0014-job-control-Small-reorganization-of-wait_consider_ta.patch
 0015-job-control-Fix-ptracer-wait-2-hang-and-explain-nota.patch
 0016-job-control-Allow-access-to-job-control-events-throu.patch
 0017-job-control-Add-for_ptrace-to-do_notify_parent_cldst.patch
 0018-job-control-Job-control-stop-notifications-should-al.patch
 0019-job-control-Notify-the-real-parent-of-job-control-ev.patch
 0020-job-control-Don-t-send-duplicate-job-control-stop-no.patch

0001-0010 are from the first patchset and unchanged.  0010-0020 are
from the second patchset.  0010 and 0011 are the additional ones
posted as 0.1 and 0.2.  0015, 0016 and 0019 are the updated version
per Oleg's reviews.  The rest remain unchanged.

The patches are on top of the current mainline (6447f55da) and is
available in the following git branch

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace

and contains the following changes.

 fs/exec.c                 |    1 
 include/linux/sched.h     |   11 +
 include/linux/tracehook.h |   27 ---
 kernel/exit.c             |   84 +++++++++--
 kernel/ptrace.c           |  110 ++++++++++----
 kernel/signal.c           |  341 +++++++++++++++++++++++++++++++++++++---------
 6 files changed, 434 insertions(+), 140 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1109224
[2] http://thread.gmane.org/gmane.linux.kernel/1110403
[3] http://thread.gmane.org/gmane.linux.kernel/1107045

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/20] signal: Fix SIGCONT notification code
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 02/20] ptrace: Remove the extra wake_up_state() from ptrace_detach() Tejun Heo
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

After a task receives SIGCONT, its parent is notified via SIGCHLD with
its siginfo describing what the notified event is.  If SIGCONT is
received while the child process is stopped, the code should be
CLD_CONTINUED.  If SIGCONT is recieved while the child process is in
the process of being stopped, it should be CLD_STOPPED.  Which code to
use is determined in prepare_signal() and recorded in signal->flags
using SIGNAL_CLD_CONTINUED|STOP flags.

get_signal_deliver() should test these flags and then notify
accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
and CLD_STOPPED if the state was fetched already.

Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
?: test into if/else clause for better readability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3175186..e26274a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1853,8 +1853,13 @@ relock:
 	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
 	 */
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		int why = (signal->flags & SIGNAL_STOP_CONTINUED)
-				? CLD_CONTINUED : CLD_STOPPED;
+		int why;
+
+		if (signal->flags & SIGNAL_CLD_CONTINUED)
+			why = CLD_CONTINUED;
+		else
+			why = CLD_STOPPED;
+
 		signal->flags &= ~SIGNAL_CLD_MASK;

 		why = tracehook_notify_jctl(why, CLD_CONTINUED);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/20] ptrace: Remove the extra wake_up_state() from ptrace_detach()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
  2011-03-23 10:05 ` [PATCH 01/20] signal: Fix SIGCONT notification code Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 03/20] signal: Remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

This wake_up_state() has a turbulent history.  This is a remnant from
ancient ptrace implementation and patently wrong.  Commit 95a3540d
(ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
it but the change was reverted later by commit edaba2c5 (ptrace:
revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
citing compatibility breakage and general brokeness of the whole group
stop / ptrace interaction.  Then, recently, it got converted from
wake_up_process() to wake_up_state() to make it less dangerous.

Digging through the mailing archives, the compatibility breakage
doesn't seem to be critical in the sense that the behavior isn't well
defined or reliable to begin with and it seems to have been agreed to
remove the wakeup with proper cleanup of the whole thing.

Now that the group stop and its interaction with ptrace are being
cleaned up, it's high time to finally kill this silliness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/ptrace.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index e2302e4..6acf895 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -312,8 +312,6 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	if (child->ptrace) {
 		child->exit_code = data;
 		dead = __ptrace_detach(current, child);
-		if (!child->exit_state)
-			wake_up_state(child, TASK_TRACED | TASK_STOPPED);
 	}
 	write_unlock_irq(&tasklist_lock);

-- 
1.7.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/20] signal: Remove superflous try_to_freeze() loop in do_signal_stop()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
  2011-03-23 10:05 ` [PATCH 01/20] signal: Fix SIGCONT notification code Tejun Heo
  2011-03-23 10:05 ` [PATCH 02/20] ptrace: Remove the extra wake_up_state() from ptrace_detach() Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 04/20] ptrace: Kill tracehook_notify_jctl() Tejun Heo
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

do_signal_stop() is used only by get_signal_to_deliver() and after a
successful signal stop, it always calls try_to_freeze(), so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superflous and confusing.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index e26274a..f4db769 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
 	}
 
 	/* Now we don't run again until woken by SIGCONT or SIGKILL */
-	do {
-		schedule();
-	} while (try_to_freeze());
+	schedule();
 
 	tracehook_finish_jctl();
 	current->exit_code = 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/20] ptrace: Kill tracehook_notify_jctl()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (2 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 03/20] signal: Remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 05/20] ptrace: Add @why to ptrace_stop() Tejun Heo
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

tracehook_notify_jctl() aids in determining whether and what to report
to the parent when a task is stopped or continued.  The function also
adds an extra requirement that siglock may be released across it,
which is currently unused and quite difficult to satisfy in
well-defined manner.

As job control and the notifications are about to receive major
overhaul, remove the tracehook and open code it.  If ever necessary,
let's factor it out after the overhaul.

* Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/tracehook.h |   27 ---------------------------
 kernel/signal.c           |   34 ++++++++++++++--------------------
 2 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3a2e66d..b073f3c 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -469,33 +469,6 @@ static inline int tracehook_get_signal(struct task_struct *task,
 }
 
 /**
- * tracehook_notify_jctl - report about job control stop/continue
- * @notify:		zero, %CLD_STOPPED or %CLD_CONTINUED
- * @why:		%CLD_STOPPED or %CLD_CONTINUED
- *
- * This is called when we might call do_notify_parent_cldstop().
- *
- * @notify is zero if we would not ordinarily send a %SIGCHLD,
- * or is the %CLD_STOPPED or %CLD_CONTINUED .si_code for %SIGCHLD.
- *
- * @why is %CLD_STOPPED when about to stop for job control;
- * we are already in %TASK_STOPPED state, about to call schedule().
- * It might also be that we have just exited (check %PF_EXITING),
- * but need to report that a group-wide stop is complete.
- *
- * @why is %CLD_CONTINUED when waking up after job control stop and
- * ready to make a delayed @notify report.
- *
- * Return the %CLD_* value for %SIGCHLD, or zero to generate no signal.
- *
- * Called with the siglock held.
- */
-static inline int tracehook_notify_jctl(int notify, int why)
-{
-	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
-}
-
-/**
  * tracehook_finish_jctl - report about return from job control stop
  *
  * This is called by do_signal_stop() after wakeup.
diff --git a/kernel/signal.c b/kernel/signal.c
index f4db769..03d874e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1727,7 +1727,7 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int notify;
+	int notify = 0;
 
 	if (!sig->group_stop_count) {
 		struct task_struct *t;
@@ -1759,19 +1759,16 @@ static int do_signal_stop(int signr)
 	 * a group stop in progress and we are the last to stop, report
 	 * to the parent.  When ptraced, every thread reports itself.
 	 */
-	notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
-	notify = tracehook_notify_jctl(notify, CLD_STOPPED);
-	/*
-	 * tracehook_notify_jctl() can drop and reacquire siglock, so
-	 * we keep ->group_stop_count != 0 before the call. If SIGCONT
-	 * or SIGKILL comes in between ->group_stop_count == 0.
-	 */
-	if (sig->group_stop_count) {
-		if (!--sig->group_stop_count)
-			sig->flags = SIGNAL_STOP_STOPPED;
-		current->exit_code = sig->group_exit_code;
-		__set_current_state(TASK_STOPPED);
+	if (!--sig->group_stop_count) {
+		sig->flags = SIGNAL_STOP_STOPPED;
+		notify = CLD_STOPPED;
 	}
+	if (task_ptrace(current))
+		notify = CLD_STOPPED;
+
+	current->exit_code = sig->group_exit_code;
+	__set_current_state(TASK_STOPPED);
+
 	spin_unlock_irq(&current->sighand->siglock);
 
 	if (notify) {
@@ -1860,14 +1857,11 @@ relock:
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		why = tracehook_notify_jctl(why, CLD_CONTINUED);
 		spin_unlock_irq(&sighand->siglock);
 
-		if (why) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current->group_leader, why);
-			read_unlock(&tasklist_lock);
-		}
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current->group_leader, why);
+		read_unlock(&tasklist_lock);
 		goto relock;
 	}
 
@@ -2034,7 +2028,7 @@ void exit_signals(struct task_struct *tsk)
 	if (unlikely(tsk->signal->group_stop_count) &&
 			!--tsk->signal->group_stop_count) {
 		tsk->signal->flags = SIGNAL_STOP_STOPPED;
-		group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+		group_stop = CLD_STOPPED;
 	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/20] ptrace: Add @why to ptrace_stop()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (3 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 04/20] ptrace: Kill tracehook_notify_jctl() Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 06/20] signal: Fix premature completion of group stop when interfered by ptrace Tejun Heo
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

To prepare for cleanup of the interaction between group stop and
ptrace, add @why to ptrace_stop().  Existing users are updated such
that there is no behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 03d874e..95ac42d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1617,7 +1617,7 @@ static int sigkill_pending(struct task_struct *tsk)
  * If we actually decide not to stop at all because the tracer
  * is gone, we keep current->exit_code unless clear_code.
  */
-static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
@@ -1655,7 +1655,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
-		do_notify_parent_cldstop(current, CLD_TRAPPED);
+		do_notify_parent_cldstop(current, why);
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
@@ -1714,7 +1714,7 @@ void ptrace_notify(int exit_code)
 
 	/* Let the debugger run.  */
 	spin_lock_irq(&current->sighand->siglock);
-	ptrace_stop(exit_code, 1, &info);
+	ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
@@ -1795,7 +1795,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
 	ptrace_signal_deliver(regs, cookie);
 
 	/* Let the debugger run.  */
-	ptrace_stop(signr, 0, info);
+	ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	signr = current->exit_code;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/20] signal: Fix premature completion of group stop when interfered by ptrace
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (4 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 05/20] ptrace: Add @why to ptrace_stop() Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 07/20] signal: Use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

task->signal->group_stop_count is used to track the progress of group
stop.  It's initialized to the number of tasks which need to stop for
group stop to finish and each stopping or trapping task decrements.
However, each task doesn't keep track of whether it decremented the
counter or not and if woken up before the group stop is complete and
stops again, it can decrement the counter multiple times.

Please consider the following example code.

 static void *worker(void *arg)
 {
	 while (1) ;
	 return NULL;
 }

 int main(void)
 {
	 pthread_t thread;
	 pid_t pid;
	 int i;

	 pid = fork();
	 if (!pid) {
		 for (i = 0; i < 5; i++)
			 pthread_create(&thread, NULL, worker, NULL);
		 while (1) ;
		 return 0;
	 }

	 ptrace(PTRACE_ATTACH, pid, NULL, NULL);
	 while (1) {
		 waitid(P_PID, pid, NULL, WSTOPPED);
		 ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
	 }
	 return 0;
 }

The child creates five threads and the parent continuously traps the
first thread and whenever the child gets a signal, SIGSTOP is
delivered.  If an external process sends SIGSTOP to the child, all
other threads in the process should reliably stop.  However, due to
the above bug, the first thread will often end up consuming
group_stop_count multiple times and SIGSTOP often ends up stopping
none or part of the other four threads.

This patch adds a new field task->group_stop which is protected by
siglock and uses GROUP_STOP_CONSUME flag to track which task is still
to consume group_stop_count to fix this bug.

task_clear_group_stop_pending() and task_participate_group_stop() are
added to help manipulating group stop states.  As ptrace_stop() now
also uses task_participate_group_stop(), it will set
SIGNAL_STOP_STOPPED if it completes a group stop.

There still are many issues regarding the interaction between group
stop and ptrace.  Patches to address them will follow.

- Oleg spotted duplicate GROUP_STOP_CONSUME.  Dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/sched.h |    6 ++++
 kernel/signal.c       |   62 ++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4b601be..85f5104 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1260,6 +1260,7 @@ struct task_struct {
 	int exit_state;
 	int exit_code, exit_signal;
 	int pdeath_signal;  /*  The signal sent when the parent dies  */
+	unsigned int group_stop;	/* GROUP_STOP_*, siglock protected */
 	/* ??? */
 	unsigned int personality;
 	unsigned did_exec:1;
@@ -1771,6 +1772,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+/*
+ * task->group_stop flags
+ */
+#define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
+
 #ifdef CONFIG_PREEMPT_RCU
 
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index 95ac42d..ecb2008 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -223,6 +223,52 @@ static inline void print_dropped_signal(int sig)
 				current->comm, current->pid, sig);
 }
 
+/**
+ * task_clear_group_stop_pending - clear pending group stop
+ * @task: target task
+ *
+ * Clear group stop states for @task.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop_pending(struct task_struct *task)
+{
+	task->group_stop &= ~GROUP_STOP_CONSUME;
+}
+
+/**
+ * task_participate_group_stop - participate in a group stop
+ * @task: task participating in a group stop
+ *
+ * @task is participating in a group stop.  Group stop states are cleared
+ * and the group stop count is consumed if %GROUP_STOP_CONSUME was set.  If
+ * the consumption completes the group stop, the appropriate %SIGNAL_*
+ * flags are set.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static bool task_participate_group_stop(struct task_struct *task)
+{
+	struct signal_struct *sig = task->signal;
+	bool consume = task->group_stop & GROUP_STOP_CONSUME;
+
+	task_clear_group_stop_pending(task);
+
+	if (!consume)
+		return false;
+
+	if (!WARN_ON_ONCE(sig->group_stop_count == 0))
+		sig->group_stop_count--;
+
+	if (!sig->group_stop_count) {
+		sig->flags = SIGNAL_STOP_STOPPED;
+		return true;
+	}
+	return false;
+}
+
 /*
  * allocate a new signal queue record
  * - this may be called without locks if and only if t == current, otherwise an
@@ -1645,7 +1691,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * we must participate in the bookkeeping.
 	 */
 	if (current->signal->group_stop_count > 0)
-		--current->signal->group_stop_count;
+		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
@@ -1730,6 +1776,7 @@ static int do_signal_stop(int signr)
 	int notify = 0;
 
 	if (!sig->group_stop_count) {
+		unsigned int gstop = GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1741,6 +1788,7 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
+		current->group_stop = gstop;
 		sig->group_stop_count = 1;
 		for (t = next_thread(current); t != current; t = next_thread(t))
 			/*
@@ -1750,19 +1798,19 @@ static int do_signal_stop(int signr)
 			 */
 			if (!(t->flags & PF_EXITING) &&
 			    !task_is_stopped_or_traced(t)) {
+				t->group_stop = gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
-			}
+			} else
+				task_clear_group_stop_pending(t);
 	}
 	/*
 	 * If there are no other threads in the group, or if there is
 	 * a group stop in progress and we are the last to stop, report
 	 * to the parent.  When ptraced, every thread reports itself.
 	 */
-	if (!--sig->group_stop_count) {
-		sig->flags = SIGNAL_STOP_STOPPED;
+	if (task_participate_group_stop(current))
 		notify = CLD_STOPPED;
-	}
 	if (task_ptrace(current))
 		notify = CLD_STOPPED;
 
@@ -2026,10 +2074,8 @@ void exit_signals(struct task_struct *tsk)
 			recalc_sigpending_and_wake(t);
 
 	if (unlikely(tsk->signal->group_stop_count) &&
-			!--tsk->signal->group_stop_count) {
-		tsk->signal->flags = SIGNAL_STOP_STOPPED;
+	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
-	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/20] signal: Use GROUP_STOP_PENDING to stop once for a single group stop
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (5 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 06/20] signal: Fix premature completion of group stop when interfered by ptrace Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 08/20] ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

Currently task->signal->group_stop_count is used to decide whether to
stop for group stop.  However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.

Conversely, if a ptraced task is in TASK_TRACED state, the debugger
won't get notified of group stops which is inconsistent compared to
the ptraced task in any other state.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress.  The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, and consulted whenever whether the task should
participate in a group stop needs to be determined.  Note that now
tasks in TASK_TRACED also participate in group stop.

This results in the following behavior changes.

* For a single group stop, a ptracer would see at most one stop
  reported.

* A ptracee in TASK_TRACED now also participates in group stop and the
  tracer would get the notification.  However, as a ptraced task could
  be in TASK_STOPPED state or any ptrace trap could consume group
  stop, the notification may still be missing.  These will be
  addressed with further patches.

* A ptracee may start a group stop while one is still in progress if
  the tracer let it continue with stop signal delivery.  Group stop
  code handles this correctly.

Oleg:

* Spotted that a task might skip signal check even when its
  GROUP_STOP_PENDING is set.  Fixed by updating
  recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
  group_stop_count.

* Pointed out that task->group_stop should be cleared whenever
  task->signal->group_stop_count is cleared.  Fixed accordingly.

* Pointed out the behavior inconsistency between TASK_TRACED and
  RUNNING and the last behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 fs/exec.c             |    1 +
 include/linux/sched.h |    3 +++
 kernel/signal.c       |   36 +++++++++++++++++++++---------------
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 5e62d26..8328beb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1659,6 +1659,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 
 	t = start;
 	do {
+		task_clear_group_stop_pending(t);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 85f5104..b2a17df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1775,8 +1775,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * task->group_stop flags
  */
+#define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
 #define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
 
+extern void task_clear_group_stop_pending(struct task_struct *task);
+
 #ifdef CONFIG_PREEMPT_RCU
 
 #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index ecb2008..a2e7a65 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
-	if (t->signal->group_stop_count > 0 ||
+	if ((t->group_stop & GROUP_STOP_PENDING) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -232,19 +232,19 @@ static inline void print_dropped_signal(int sig)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-static void task_clear_group_stop_pending(struct task_struct *task)
+void task_clear_group_stop_pending(struct task_struct *task)
 {
-	task->group_stop &= ~GROUP_STOP_CONSUME;
+	task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
 }
 
 /**
  * task_participate_group_stop - participate in a group stop
  * @task: task participating in a group stop
  *
- * @task is participating in a group stop.  Group stop states are cleared
- * and the group stop count is consumed if %GROUP_STOP_CONSUME was set.  If
- * the consumption completes the group stop, the appropriate %SIGNAL_*
- * flags are set.
+ * @task has GROUP_STOP_PENDING set and is participating in a group stop.
+ * Group stop states are cleared and the group stop count is consumed if
+ * %GROUP_STOP_CONSUME was set.  If the consumption completes the group
+ * stop, the appropriate %SIGNAL_* flags are set.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
@@ -254,6 +254,8 @@ static bool task_participate_group_stop(struct task_struct *task)
 	struct signal_struct *sig = task->signal;
 	bool consume = task->group_stop & GROUP_STOP_CONSUME;
 
+	WARN_ON_ONCE(!(task->group_stop & GROUP_STOP_PENDING));
+
 	task_clear_group_stop_pending(task);
 
 	if (!consume)
@@ -765,6 +767,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		t = p;
 		do {
 			unsigned int state;
+
+			task_clear_group_stop_pending(t);
+
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
 			/*
 			 * If there is a handler for SIGCONT, we must make
@@ -906,6 +911,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			signal->group_stop_count = 0;
 			t = p;
 			do {
+				task_clear_group_stop_pending(t);
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
@@ -1139,6 +1145,7 @@ int zap_other_threads(struct task_struct *p)
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
+		task_clear_group_stop_pending(t);
 		count++;
 
 		/* Don't bother with already dead threads */
@@ -1690,7 +1697,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * If there is a group stop in progress,
 	 * we must participate in the bookkeeping.
 	 */
-	if (current->signal->group_stop_count > 0)
+	if (current->group_stop & GROUP_STOP_PENDING)
 		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
@@ -1775,8 +1782,8 @@ static int do_signal_stop(int signr)
 	struct signal_struct *sig = current->signal;
 	int notify = 0;
 
-	if (!sig->group_stop_count) {
-		unsigned int gstop = GROUP_STOP_CONSUME;
+	if (!(current->group_stop & GROUP_STOP_PENDING)) {
+		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1796,8 +1803,7 @@ static int do_signal_stop(int signr)
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
-			if (!(t->flags & PF_EXITING) &&
-			    !task_is_stopped_or_traced(t)) {
+			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
 				t->group_stop = gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
@@ -1926,8 +1932,8 @@ relock:
 		if (unlikely(signr != 0))
 			ka = return_ka;
 		else {
-			if (unlikely(signal->group_stop_count > 0) &&
-			    do_signal_stop(0))
+			if (unlikely(current->group_stop &
+				     GROUP_STOP_PENDING) && do_signal_stop(0))
 				goto relock;
 
 			signr = dequeue_signal(current, &current->blocked,
@@ -2073,7 +2079,7 @@ void exit_signals(struct task_struct *tsk)
 		if (!signal_pending(t) && !(t->flags & PF_EXITING))
 			recalc_sigpending_and_wake(t);
 
-	if (unlikely(tsk->signal->group_stop_count) &&
+	if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
 	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
 out:
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/20] ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (6 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 07/20] signal: Use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 09/20] ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

Currently, ptrace_stop() unconditionally participates in group stop
bookkeeping.  This is unnecessary and inaccurate.  Make it only
participate if the task is trapping for group stop - ie. if @why is
CLD_STOPPED.  As ptrace_stop() currently is not used when trapping for
group stop, this equals to disabling group stop participation from
ptrace_stop().

A visible behavior change is increased likelihood of delayed group
stop completion if the thread group contains one or more ptraced
tasks.

This is to preapre for further cleanup of the interaction between
group stop and ptrace.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a2e7a65..9f36dd2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1694,10 +1694,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
-	 * If there is a group stop in progress,
-	 * we must participate in the bookkeeping.
+	 * If @why is CLD_STOPPED, we're trapping to participate in a group
+	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
+	 * while siglock was released for the arch hook, PENDING could be
+	 * clear now.  We act as if SIGCONT is received after TASK_TRACED
+	 * is entered - ignore it.
 	 */
-	if (current->group_stop & GROUP_STOP_PENDING)
+	if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
 		task_participate_group_stop(current);
 
 	current->last_siginfo = info;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/20] ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (7 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 08/20] ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 10/20] ptrace: Clean transitions between TASK_STOPPED and TRACED Tejun Heo
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

A ptraced task would still stop at do_signal_stop() when it's stopping
for stop signals and do_signal_stop() behaves the same whether the
task is ptraced or not.  However, in addition to stopping,
ptrace_stop() also does ptrace specific stuff like calling
architecture specific callbacks, so this behavior makes the code more
fragile and difficult to understand.

This patch makes do_signal_stop() test whether the task is ptraced and
use ptrace_stop() if so.  This renders tracehook_notify_jctl() rather
pointless as the ptrace notification is now handled by ptrace_stop()
regardless of the return value from the tracehook.  It probably is a
good idea to update it.

This doesn't solve the whole problem as tasks already in stopped state
would stay in the regular stop when ptrace attached.  That part will
be handled by the next patch.

Oleg pointed out that this makes a userland-visible change.  Before,
SIGCONT would be able to wake up a task in group stop even if the task
is ptraced if the tracer hasn't issued another ptrace command
afterwards (as the next ptrace commands transitions the state into
TASK_TRACED which ignores SIGCONT wakeups).  With this and the next
patch, SIGCONT may race with the transition into TASK_TRACED and is
ignored if the tracee already entered TASK_TRACED.

Another userland visible change of this and the next patch is that the
ptracee's state would now be TASK_TRACED where it used to be
TASK_STOPPED, which is visible via fs/proc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 kernel/signal.c |   43 +++++++++++++++++++++++++------------------
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9f36dd2..418776c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1783,7 +1783,6 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int notify = 0;

 	if (!(current->group_stop & GROUP_STOP_PENDING)) {
 		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
@@ -1813,29 +1812,37 @@ static int do_signal_stop(int signr)
 			} else
 				task_clear_group_stop_pending(t);
 	}
-	/*
-	 * If there are no other threads in the group, or if there is
-	 * a group stop in progress and we are the last to stop, report
-	 * to the parent.  When ptraced, every thread reports itself.
-	 */
-	if (task_participate_group_stop(current))
-		notify = CLD_STOPPED;
-	if (task_ptrace(current))
-		notify = CLD_STOPPED;

 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);

-	spin_unlock_irq(&current->sighand->siglock);
+	if (likely(!task_ptrace(current))) {
+		int notify = 0;

-	if (notify) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, notify);
-		read_unlock(&tasklist_lock);
-	}
+		/*
+		 * If there are no other threads in the group, or if there
+		 * is a group stop in progress and we are the last to stop,
+		 * report to the parent.
+		 */
+		if (task_participate_group_stop(current))
+			notify = CLD_STOPPED;

-	/* Now we don't run again until woken by SIGCONT or SIGKILL */
-	schedule();
+		spin_unlock_irq(&current->sighand->siglock);
+
+		if (notify) {
+			read_lock(&tasklist_lock);
+			do_notify_parent_cldstop(current, notify);
+			read_unlock(&tasklist_lock);
+		}
+
+		/* Now we don't run again until woken by SIGCONT or SIGKILL */
+		schedule();
+
+		spin_lock_irq(&current->sighand->siglock);
+	} else
+		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+
+	spin_unlock_irq(&current->sighand->siglock);

 	tracehook_finish_jctl();
 	current->exit_code = 0;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/20] ptrace: Clean transitions between TASK_STOPPED and TRACED
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (8 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 09/20] ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 11/20] ptrace: Collapse ptrace_untrace() into __ptrace_unlink() Tejun Heo
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo, Roland McGrath

Currently, if the task is STOPPED on ptrace attach, it's left alone
and the state is silently changed to TRACED on the next ptrace call.
The behavior breaks the assumption that arch_ptrace_stop() is called
before any task is poked by ptrace and is ugly in that a task
manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TRACED can be made clean.  The tracer can use the flag to tell the
tracee to retry stop on attach and detach.  On retry, the tracee will
enter the desired state in the correct way.  The lower 16bits of
task->group_stop is used to remember the signal number which caused
the last group stop.  This is used while retrying for ptrace attach as
the original group_exit_code could have been consumed with wait(2) by
then.

As the real parent may wait(2) and consume the group_exit_code
anytime, the group_exit_code needs to be saved separately so that it
can be used when switching from regular sleep to ptrace_stop().  This
is recorded in the lower 16bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a
ptrace request immediately following a successful PTRACE_ATTACH should
always succeed even if the tracer doesn't wait(2) for attach
completion; however, with this change, the tracee might still be
TASK_RUNNING trying to enter TASK_TRACED which would cause the
following request to fail with -ESRCH.

This intermediate state is hidden from the ptracer by setting
GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
for it to clear on its signal->wait_chldexit.  Completing the
transition or getting killed clears TRAPPING and wakes up the tracer.

Note that the STOPPED -> RUNNING -> TRACED transition is still visible
to other threads which are in the same group as the ptracer and the
reverse transition is visible to all.  Please read the comments for
details.

Oleg:

* Spotted a race condition where a task may retry group stop without
  proper bookkeeping.  Fixed by redoing bookkeeping on retry.

* Spotted that the transition is visible to userland in several
  different ways.  Most are fixed with GROUP_STOP_TRAPPING.  Unhandled
  corner case is documented.

* Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
  task would result in more consistent behavior.

* Pointed out that calling ptrace_stop() from do_signal_stop() in
  TASK_STOPPED can race with group stop start logic and then confuse
  the TRAPPING wait in ptrace_check_attach().  ptrace_stop() is now
  called with TASK_RUNNING.

* Suggested using signal->wait_chldexit instead of bit wait.

* Spotted a race condition between TRACED transition and clearing of
  TRAPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 include/linux/sched.h |    2 +
 kernel/ptrace.c       |   49 +++++++++++++++++++++++++++---
 kernel/signal.c       |   79 +++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b2a17df..456d80e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1775,8 +1775,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * task->group_stop flags
  */
+#define GROUP_STOP_SIGMASK	0xffff    /* signr of the last group stop */
 #define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
 #define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
+#define GROUP_STOP_TRAPPING	(1 << 18) /* switching from STOPPED to TRACED */
 
 extern void task_clear_group_stop_pending(struct task_struct *task);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 6acf895..745fc2d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -49,14 +49,22 @@ static void ptrace_untrace(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 	if (task_is_traced(child)) {
 		/*
-		 * If the group stop is completed or in progress,
-		 * this thread was already counted as stopped.
+		 * If group stop is completed or in progress, it should
+		 * participate in the group stop.  Set GROUP_STOP_PENDING
+		 * before kicking it.
+		 *
+		 * This involves TRACED -> RUNNING -> STOPPED transition
+		 * which is similar to but in the opposite direction of
+		 * what happens while attaching to a stopped task.
+		 * However, in this direction, the intermediate RUNNING
+		 * state is not hidden even from the current ptracer and if
+		 * it immediately re-attaches and performs a WNOHANG
+		 * wait(2), it may fail.
 		 */
 		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
 		    child->signal->group_stop_count)
-			__set_task_state(child, TASK_STOPPED);
-		else
-			signal_wake_up(child, 1);
+			child->group_stop |= GROUP_STOP_PENDING;
+		signal_wake_up(child, 1);
 	}
 	spin_unlock(&child->sighand->siglock);
 }
@@ -165,6 +173,7 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 
 static int ptrace_attach(struct task_struct *task)
 {
+	bool wait_trap = false;
 	int retval;
 
 	audit_ptrace(task);
@@ -204,12 +213,42 @@ static int ptrace_attach(struct task_struct *task)
 	__ptrace_link(task, current);
 	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
+	spin_lock(&task->sighand->siglock);
+
+	/*
+	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
+	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
+	 * will be cleared if the child completes the transition or any
+	 * event which clears the group stop states happens.  We'll wait
+	 * for the transition to complete before returning from this
+	 * function.
+	 *
+	 * This hides STOPPED -> RUNNING -> TRACED transition from the
+	 * attaching thread but a different thread in the same group can
+	 * still observe the transient RUNNING state.  IOW, if another
+	 * thread's WNOHANG wait(2) on the stopped tracee races against
+	 * ATTACH, the wait(2) may fail due to the transient RUNNING.
+	 *
+	 * The following task_is_stopped() test is safe as both transitions
+	 * in and out of STOPPED are protected by siglock.
+	 */
+	if (task_is_stopped(task)) {
+		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+		signal_wake_up(task, 1);
+		wait_trap = true;
+	}
+
+	spin_unlock(&task->sighand->siglock);
+
 	retval = 0;
 unlock_tasklist:
 	write_unlock_irq(&tasklist_lock);
 unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
+	if (wait_trap)
+		wait_event(current->signal->wait_chldexit,
+			   !(task->group_stop & GROUP_STOP_TRAPPING));
 	return retval;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 418776c..1e19991 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -224,6 +224,26 @@ static inline void print_dropped_signal(int sig)
 }
 
 /**
+ * task_clear_group_stop_trapping - clear group stop trapping bit
+ * @task: target task
+ *
+ * If GROUP_STOP_TRAPPING is set, a ptracer is waiting for us.  Clear it
+ * and wake up the ptracer.  Note that we don't need any further locking.
+ * @task->siglock guarantees that @task->parent points to the ptracer.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop_trapping(struct task_struct *task)
+{
+	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
+		task->group_stop &= ~GROUP_STOP_TRAPPING;
+		__wake_up_sync(&task->parent->signal->wait_chldexit,
+			       TASK_UNINTERRUPTIBLE, 1);
+	}
+}
+
+/**
  * task_clear_group_stop_pending - clear pending group stop
  * @task: target task
  *
@@ -1706,8 +1726,20 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
 
-	/* Let the debugger run.  */
-	__set_current_state(TASK_TRACED);
+	/*
+	 * TRACED should be visible before TRAPPING is cleared; otherwise,
+	 * the tracer might fail do_wait().
+	 */
+	set_current_state(TASK_TRACED);
+
+	/*
+	 * We're committing to trapping.  Clearing GROUP_STOP_TRAPPING and
+	 * transition to TASK_TRACED should be atomic with respect to
+	 * siglock.  This hsould be done after the arch hook as siglock is
+	 * released and regrabbed across it.
+	 */
+	task_clear_group_stop_trapping(current);
+
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
@@ -1788,6 +1820,9 @@ static int do_signal_stop(int signr)
 		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
+		/* signr will be recorded in task->group_stop for retries */
+		WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
 			return 0;
@@ -1797,25 +1832,27 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
-		current->group_stop = gstop;
+		current->group_stop &= ~GROUP_STOP_SIGMASK;
+		current->group_stop |= signr | gstop;
 		sig->group_stop_count = 1;
-		for (t = next_thread(current); t != current; t = next_thread(t))
+		for (t = next_thread(current); t != current;
+		     t = next_thread(t)) {
+			t->group_stop &= ~GROUP_STOP_SIGMASK;
 			/*
 			 * Setting state to TASK_STOPPED for a group
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
 			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
-				t->group_stop = gstop;
+				t->group_stop |= signr | gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
-			} else
+			} else {
 				task_clear_group_stop_pending(t);
+			}
+		}
 	}
-
-	current->exit_code = sig->group_exit_code;
-	__set_current_state(TASK_STOPPED);
-
+retry:
 	if (likely(!task_ptrace(current))) {
 		int notify = 0;
 
@@ -1827,6 +1864,7 @@ static int do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		__set_current_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
 		if (notify) {
@@ -1839,13 +1877,28 @@ static int do_signal_stop(int signr)
 		schedule();
 
 		spin_lock_irq(&current->sighand->siglock);
-	} else
-		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+	} else {
+		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+			    CLD_STOPPED, 0, NULL);
+		current->exit_code = 0;
+	}
+
+	/*
+	 * GROUP_STOP_PENDING could be set if another group stop has
+	 * started since being woken up or ptrace wants us to transit
+	 * between TASK_STOPPED and TRACED.  Retry group stop.
+	 */
+	if (current->group_stop & GROUP_STOP_PENDING) {
+		WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+		goto retry;
+	}
+
+	/* PTRACE_ATTACH might have raced with task killing, clear trapping */
+	task_clear_group_stop_trapping(current);
 
 	spin_unlock_irq(&current->sighand->siglock);
 
 	tracehook_finish_jctl();
-	current->exit_code = 0;
 
 	return 1;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/20] ptrace: Collapse ptrace_untrace() into __ptrace_unlink()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (9 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 10/20] ptrace: Clean transitions between TASK_STOPPED and TRACED Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 12/20] ptrace: Always put ptracee into appropriate execution state Tejun Heo
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Remove the extra task_is_traced() check in __ptrace_unlink() and
collapse ptrace_untrace() into __ptrace_unlink().  This is to prepare
for further changes.

While at it, drop the comment on top of ptrace_untrace() and convert
__ptrace_unlink() comment to docbook format.  Detailed comment will be
added by the next patch.

This patch doesn't cause any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/ptrace.c |   40 +++++++++++++++-------------------------
 1 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 745fc2d..e609843 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -37,15 +37,23 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 	child->parent = new_parent;
 }
 
-/*
- * Turn a tracing stop into a normal stop now, since with no tracer there
- * would be no way to wake it up with SIGCONT or SIGKILL.  If there was a
- * signal sent that would resume the child, but didn't because it was in
- * TASK_TRACED, resume it now.
- * Requires that irqs be disabled.
+/**
+ * __ptrace_unlink - unlink ptracee and restore its execution state
+ * @child: ptracee to be unlinked
+ *
+ * Remove @child from the ptrace list, move it back to the original parent.
+ *
+ * CONTEXT:
+ * write_lock_irq(tasklist_lock)
  */
-static void ptrace_untrace(struct task_struct *child)
+void __ptrace_unlink(struct task_struct *child)
 {
+	BUG_ON(!child->ptrace);
+
+	child->ptrace = 0;
+	child->parent = child->real_parent;
+	list_del_init(&child->ptrace_entry);
+
 	spin_lock(&child->sighand->siglock);
 	if (task_is_traced(child)) {
 		/*
@@ -70,24 +78,6 @@ static void ptrace_untrace(struct task_struct *child)
 }
 
 /*
- * unptrace a task: move it back to its original parent and
- * remove it from the ptrace list.
- *
- * Must be called with the tasklist lock write-held.
- */
-void __ptrace_unlink(struct task_struct *child)
-{
-	BUG_ON(!child->ptrace);
-
-	child->ptrace = 0;
-	child->parent = child->real_parent;
-	list_del_init(&child->ptrace_entry);
-
-	if (task_is_traced(child))
-		ptrace_untrace(child);
-}
-
-/*
  * Check that we have indeed attached to the thing..
  */
 int ptrace_check_attach(struct task_struct *child, int kill)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/20] ptrace: Always put ptracee into appropriate execution state
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (10 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 11/20] ptrace: Collapse ptrace_untrace() into __ptrace_unlink() Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:05 ` [PATCH 13/20] job control: Don't set group_stop exit_code if re-entering job control stop Tejun Heo
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Currently, __ptrace_unlink() wakes up the tracee iff it's in
TASK_TRACED.  For unlinking from PTRACE_DETACH, this is correct as the
tracee is guaranteed to be in TASK_TRACED or dead; however, unlinking
also happens when the ptracer exits and in this case the ptracee can
be in any state and ptrace might be left running even if the group it
belongs to is stopped.

This patch updates __ptrace_unlink() such that GROUP_STOP_PENDING is
reinstated regardless of the ptracee's current state as long as it's
alive and makes sure that signal_wake_up() is called if execution
state transition is necessary.

Test case follows.

  #include <unistd.h>
  #include <time.h>
  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  static const struct timespec ts1s = { .tv_sec = 1 };

  int main(void)
  {
	  pid_t tracee;
	  siginfo_t si;

	  tracee = fork();
	  if (tracee == 0) {
		  while (1) {
			  nanosleep(&ts1s, NULL);
			  write(1, ".", 1);
		  }
	  }

	  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
	  waitid(P_PID, tracee, &si, WSTOPPED);
	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
	  waitid(P_PID, tracee, &si, WSTOPPED);
	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
	  write(1, "exiting", 7);
	  return 0;
  }

Before the patch, after the parent process exits, the child is left
running and prints out "." every second.

  exiting..... (continues)

After the patch, the group stop initiated by the implied SIGSTOP from
PTRACE_ATTACH is re-established when the parent exits.

  exiting

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/ptrace.c |   59 ++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index e609843..4348586 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -41,7 +41,26 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
  * __ptrace_unlink - unlink ptracee and restore its execution state
  * @child: ptracee to be unlinked
  *
- * Remove @child from the ptrace list, move it back to the original parent.
+ * Remove @child from the ptrace list, move it back to the original parent,
+ * and restore the execution state so that it conforms to the group stop
+ * state.
+ *
+ * Unlinking can happen via two paths - explicit PTRACE_DETACH or ptracer
+ * exiting.  For PTRACE_DETACH, unless the ptracee has been killed between
+ * ptrace_check_attach() and here, it's guaranteed to be in TASK_TRACED.
+ * If the ptracer is exiting, the ptracee can be in any state.
+ *
+ * After detach, the ptracee should be in a state which conforms to the
+ * group stop.  If the group is stopped or in the process of stopping, the
+ * ptracee should be put into TASK_STOPPED; otherwise, it should be woken
+ * up from TASK_TRACED.
+ *
+ * If the ptracee is in TASK_TRACED and needs to be moved to TASK_STOPPED,
+ * it goes through TRACED -> RUNNING -> STOPPED transition which is similar
+ * to but in the opposite direction of what happens while attaching to a
+ * stopped task.  However, in this direction, the intermediate RUNNING
+ * state is not hidden even from the current ptracer and if it immediately
+ * re-attaches and performs a WNOHANG wait(2), it may fail.
  *
  * CONTEXT:
  * write_lock_irq(tasklist_lock)
@@ -55,25 +74,25 @@ void __ptrace_unlink(struct task_struct *child)
 	list_del_init(&child->ptrace_entry);
 
 	spin_lock(&child->sighand->siglock);
-	if (task_is_traced(child)) {
-		/*
-		 * If group stop is completed or in progress, it should
-		 * participate in the group stop.  Set GROUP_STOP_PENDING
-		 * before kicking it.
-		 *
-		 * This involves TRACED -> RUNNING -> STOPPED transition
-		 * which is similar to but in the opposite direction of
-		 * what happens while attaching to a stopped task.
-		 * However, in this direction, the intermediate RUNNING
-		 * state is not hidden even from the current ptracer and if
-		 * it immediately re-attaches and performs a WNOHANG
-		 * wait(2), it may fail.
-		 */
-		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
-		    child->signal->group_stop_count)
-			child->group_stop |= GROUP_STOP_PENDING;
-		signal_wake_up(child, 1);
-	}
+
+	/*
+	 * Reinstate GROUP_STOP_PENDING if group stop is in effect and
+	 * @child isn't dead.
+	 */
+	if (!(child->flags & PF_EXITING) &&
+	    (child->signal->flags & SIGNAL_STOP_STOPPED ||
+	     child->signal->group_stop_count))
+		child->group_stop |= GROUP_STOP_PENDING;
+
+	/*
+	 * If transition to TASK_STOPPED is pending or in TASK_TRACED, kick
+	 * @child in the butt.  Note that @resume should be used iff @child
+	 * is in TASK_TRACED; otherwise, we might unduly disrupt
+	 * TASK_KILLABLE sleeps.
+	 */
+	if (child->group_stop & GROUP_STOP_PENDING || task_is_traced(child))
+		signal_wake_up(child, task_is_traced(child));
+
 	spin_unlock(&child->sighand->siglock);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/20] job control: Don't set group_stop exit_code if re-entering job control stop
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (11 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 12/20] ptrace: Always put ptracee into appropriate execution state Tejun Heo
@ 2011-03-23 10:05 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 14/20] job control: Small reorganization of wait_consider_task() Tejun Heo
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:05 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

While ptraced, a task may be resumed while the containing process is
still job control stopped.  If the task receives another stop signal
in this state, it will still initiate group stop, which generates
group_exit_code, which the real parent would be able to see once the
ptracer detaches.

In this scenario, the real parent may see two consecutive CLD_STOPPED
events from two stop signals without intervening SIGCONT, which
normally is impossible.

Test case follows.

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  int main(void)
  {
	  pid_t tracee;
	  siginfo_t si;

	  tracee = fork();
	  if (!tracee)
		  while (1)
			  pause();

	  kill(tracee, SIGSTOP);
	  waitid(P_PID, tracee, &si, WSTOPPED);

	  if (!fork()) {
		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
		  return 0;
	  }

	  while (1) {
		  si.si_pid = 0;
		  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG);
		  if (si.si_pid)
			  printf("st=%02d c=%02d\n", si.si_status, si.si_code);
	  }
	  return 0;
  }

Before the patch, the latter waitid() in polling mode reports the
second stopped event generated by the implied SIGSTOP of
PTRACE_ATTACH.

  st=19 c=05
  ^C

After the patch, the second event is not reported.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 1e19991..2f2c8f6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1827,10 +1827,27 @@ static int do_signal_stop(int signr)
 		    unlikely(signal_group_exit(sig)))
 			return 0;
 		/*
-		 * There is no group stop already in progress.
-		 * We must initiate one now.
+		 * There is no group stop already in progress.  We must
+		 * initiate one now.
+		 *
+		 * While ptraced, a task may be resumed while group stop is
+		 * still in effect and then receive a stop signal and
+		 * initiate another group stop.  This deviates from the
+		 * usual behavior as two consecutive stop signals can't
+		 * cause two group stops when !ptraced.
+		 *
+		 * The condition can be distinguished by testing whether
+		 * SIGNAL_STOP_STOPPED is already set.  Don't generate
+		 * group_exit_code in such case.
+		 *
+		 * This is not necessary for SIGNAL_STOP_CONTINUED because
+		 * an intervening stop signal is required to cause two
+		 * continued events regardless of ptrace.
 		 */
-		sig->group_exit_code = signr;
+		if (!(sig->flags & SIGNAL_STOP_STOPPED))
+			sig->group_exit_code = signr;
+		else
+			WARN_ON_ONCE(!task_ptrace(current));
 
 		current->group_stop &= ~GROUP_STOP_SIGMASK;
 		current->group_stop |= signr | gstop;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 14/20] job control: Small reorganization of wait_consider_task()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (12 preceding siblings ...)
  2011-03-23 10:05 ` [PATCH 13/20] job control: Don't set group_stop exit_code if re-entering job control stop Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 15/20] job control: Fix ptracer wait(2) hang and explain notask_error clearing Tejun Heo
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Move EXIT_DEAD test in wait_consider_task() above ptrace check.  As
ptraced tasks can't be EXIT_DEAD, this change doesn't cause any
behavior change.  This is to prepare for further changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f9a45eb..b4a935c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1537,6 +1537,10 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 		return 0;
 	}
 
+	/* dead body doesn't have much to contribute */
+	if (p->exit_state == EXIT_DEAD)
+		return 0;
+
 	if (likely(!ptrace) && unlikely(task_ptrace(p))) {
 		/*
 		 * This child is hidden by ptrace.
@@ -1546,9 +1550,6 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 		return 0;
 	}
 
-	if (p->exit_state == EXIT_DEAD)
-		return 0;
-
 	/*
 	 * We don't reap group leaders with subthreads.
 	 */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 15/20] job control: Fix ptracer wait(2) hang and explain notask_error clearing
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (13 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 14/20] job control: Small reorganization of wait_consider_task() Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 16/20] job control: Allow access to job control events through ptracees Tejun Heo
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

wait(2) and friends allow access to stopped/continued states through
zombies, which is required as the states are process-wide and should
be accessible whether the leader task is alive or undead.
wait_consider_task() implements this by always clearing notask_error
and going through wait_task_stopped/continued() for unreaped zombies.

However, while ptraced, the stopped state is per-task and as such if
the ptracee became a zombie, there's no further stopped event to
listen to and wait(2) and friends should return -ECHILD on the tracee.

Fix it by clearing notask_error only if WCONTINUED | WEXITED is set
for ptraced zombies.  While at it, document why clearing notask_error
is safe for each case.

Test case follows.

  #include <stdio.h>
  #include <unistd.h>
  #include <pthread.h>
  #include <time.h>
  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  static void *nooper(void *arg)
  {
	  pause();
	  return NULL;
  }

  int main(void)
  {
	  const struct timespec ts1s = { .tv_sec = 1 };
	  pid_t tracee, tracer;
	  siginfo_t si;

	  tracee = fork();
	  if (tracee == 0) {
		  pthread_t thr;

		  pthread_create(&thr, NULL, nooper, NULL);
		  nanosleep(&ts1s, NULL);
		  printf("tracee exiting\n");
		  pthread_exit(NULL);	/* let subthread run */
	  }

	  tracer = fork();
	  if (tracer == 0) {
		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
		  while (1) {
			  if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
				  perror("waitid");
				  break;
			  }
			  ptrace(PTRACE_CONT, tracee, NULL,
				 (void *)(long)si.si_status);
		  }
		  return 0;
	  }

	  waitid(P_PID, tracer, &si, WEXITED);
	  kill(tracee, SIGKILL);
	  return 0;
  }

Before the patch, after the tracee becomes a zombie, the tracer's
waitid(WSTOPPED) never returns and the program doesn't terminate.

  tracee exiting
  ^C

After the patch, tracee exiting triggers waitid() to fail.

  tracee exiting
  waitid: No child processes

-v2: Oleg pointed out that exited in addition to continued can happen
     for ptraced dead group leader.  Clear notask_error for ptraced
     child on WEXITED too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   44 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index b4a935c..84d13d6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1550,17 +1550,41 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 		return 0;
 	}
 
-	/*
-	 * We don't reap group leaders with subthreads.
-	 */
-	if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
-		return wait_task_zombie(wo, p);
+	/* slay zombie? */
+	if (p->exit_state == EXIT_ZOMBIE) {
+		/* we don't reap group leaders with subthreads */
+		if (!delay_group_leader(p))
+			return wait_task_zombie(wo, p);
 
-	/*
-	 * It's stopped or running now, so it might
-	 * later continue, exit, or stop again.
-	 */
-	wo->notask_error = 0;
+		/*
+		 * Allow access to stopped/continued state via zombie by
+		 * falling through.  Clearing of notask_error is complex.
+		 *
+		 * When !@ptrace:
+		 *
+		 * If WEXITED is set, notask_error should naturally be
+		 * cleared.  If not, subset of WSTOPPED|WCONTINUED is set,
+		 * so, if there are live subthreads, there are events to
+		 * wait for.  If all subthreads are dead, it's still safe
+		 * to clear - this function will be called again in finite
+		 * amount time once all the subthreads are released and
+		 * will then return without clearing.
+		 *
+		 * When @ptrace:
+		 *
+		 * Stopped state is per-task and thus can't change once the
+		 * target task dies.  Only continued and exited can happen.
+		 * Clear notask_error if WCONTINUED | WEXITED.
+		 */
+		if (likely(!ptrace) || (wo->wo_flags & (WCONTINUED | WEXITED)))
+			wo->notask_error = 0;
+	} else {
+		/*
+		 * @p is alive and it's gonna stop, continue or exit, so
+		 * there always is something to wait for.
+		 */
+		wo->notask_error = 0;
+	}
 
 	if (task_stopped_code(p, ptrace))
 		return wait_task_stopped(wo, ptrace, p);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 16/20] job control: Allow access to job control events through ptracees
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (14 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 15/20] job control: Fix ptracer wait(2) hang and explain notask_error clearing Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 17/20] job control: Add @for_ptrace to do_notify_parent_cldstop() Tejun Heo
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Currently a real parent can't access job control stopped/continued
events through a ptraced child.  This utterly breaks job control when
the children are ptraced.

For example, if a program is run from an interactive shell and then
strace(1) attaches to it, pressing ^Z would send SIGTSTP and strace(1)
would notice it but the shell has no way to tell whether the child
entered job control stop and thus can't tell when to take over the
terminal - leading to awkward lone ^Z on the terminal.

Because the job control and ptrace stopped states are independent,
there is no reason to prevent real parents from accessing the stopped
state regardless of ptrace.  The continued state isn't separate but
ptracers don't have any use for them as ptracees can never resume
without explicit command from their ptracers, so as long as ptracers
don't consume it, it should be fine.

Although this is a behavior change, because the previous behavior is
utterly broken when viewed from real parents and the change is only
visible to real parents, I don't think it's necessary to make this
behavior optional.

One situation to be careful about is when a task from the real
parent's group is ptracing.  The parent group is the recipient of both
ptrace and job control stop events and one stop can be reported as
both job control and ptrace stops.  As this can break the current
ptrace users, suppress job control stopped events for these cases.

If a real parent ptracer wants to know about both job control and
ptrace stops, it can create a separate process to serve the role of
real parent.

Note that this only updates wait(2) side of things.  The real parent
can access the states via wait(2) but still is not properly notified
(woken up and delivered signal).  Test case polls wait(2) with WNOHANG
to work around.  Notification will be updated by future patches.

Test case follows.

  #include <stdio.h>
  #include <unistd.h>
  #include <time.h>
  #include <errno.h>
  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  int main(void)
  {
	  const struct timespec ts100ms = { .tv_nsec = 100000000 };
	  pid_t tracee, tracer;
	  siginfo_t si;
	  int i;

	  tracee = fork();
	  if (tracee == 0) {
		  while (1) {
			  printf("tracee: SIGSTOP\n");
			  raise(SIGSTOP);
			  nanosleep(&ts100ms, NULL);
			  printf("tracee: SIGCONT\n");
			  raise(SIGCONT);
			  nanosleep(&ts100ms, NULL);
		  }
	  }

	  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);

	  tracer = fork();
	  if (tracer == 0) {
		  nanosleep(&ts100ms, NULL);
		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);

		  for (i = 0; i < 11; i++) {
			  si.si_pid = 0;
			  waitid(P_PID, tracee, &si, WSTOPPED);
			  if (si.si_pid && si.si_code == CLD_TRAPPED)
				  ptrace(PTRACE_CONT, tracee, NULL,
					 (void *)(long)si.si_status);
		  }
		  printf("tracer: EXITING\n");
		  return 0;
	  }

	  while (1) {
		  si.si_pid = 0;
		  waitid(P_PID, tracee, &si,
			 WSTOPPED | WCONTINUED | WEXITED | WNOHANG);
		  if (si.si_pid)
			  printf("mommy : WAIT status=%02d code=%02d\n",
				 si.si_status, si.si_code);
		  nanosleep(&ts100ms, NULL);
	  }
	  return 0;
  }

Before the patch, while ptraced, the parent can't see any job control
events.

  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  tracee: SIGSTOP
  tracee: SIGCONT
  tracee: SIGSTOP
  tracee: SIGCONT
  tracee: SIGSTOP
  tracer: EXITING
  mommy : WAIT status=19 code=05
  ^C

After the patch,

  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  tracer: EXITING
  mommy : WAIT status=19 code=05
  ^C

-v2: Oleg pointed out that wait(2) should be suppressed for the real
     parent's group instead of only the real parent task itself.
     Updated accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   41 +++++++++++++++++++++++++++++++++--------
 1 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 84d13d6..1a0f10f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1541,17 +1541,19 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 	if (p->exit_state == EXIT_DEAD)
 		return 0;
 
-	if (likely(!ptrace) && unlikely(task_ptrace(p))) {
+	/* slay zombie? */
+	if (p->exit_state == EXIT_ZOMBIE) {
 		/*
-		 * This child is hidden by ptrace.
-		 * We aren't allowed to see it now, but eventually we will.
+		 * A zombie ptracee is only visible to its ptracer.
+		 * Notification and reaping will be cascaded to the real
+		 * parent when the ptracer detaches.
 		 */
-		wo->notask_error = 0;
-		return 0;
-	}
+		if (likely(!ptrace) && unlikely(task_ptrace(p))) {
+			/* it will become visible, clear notask_error */
+			wo->notask_error = 0;
+			return 0;
+		}
 
-	/* slay zombie? */
-	if (p->exit_state == EXIT_ZOMBIE) {
 		/* we don't reap group leaders with subthreads */
 		if (!delay_group_leader(p))
 			return wait_task_zombie(wo, p);
@@ -1580,15 +1582,38 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 			wo->notask_error = 0;
 	} else {
 		/*
+		 * If @p is ptraced by a task in its real parent's group,
+		 * hide group stop/continued state when looking at @p as
+		 * the real parent; otherwise, a single stop can be
+		 * reported twice as group and ptrace stops.
+		 *
+		 * If a ptracer wants to distinguish the two events for its
+		 * own children, it should create a separate process which
+		 * takes the role of real parent.
+		 */
+		if (likely(!ptrace) && task_ptrace(p) &&
+		    same_thread_group(p->parent, p->real_parent))
+			return 0;
+
+		/*
 		 * @p is alive and it's gonna stop, continue or exit, so
 		 * there always is something to wait for.
 		 */
 		wo->notask_error = 0;
 	}
 
+	/*
+	 * Wait for stopped.  Depending on @ptrace, different stopped state
+	 * is used and the two don't interact with each other.
+	 */
 	if (task_stopped_code(p, ptrace))
 		return wait_task_stopped(wo, ptrace, p);
 
+	/*
+	 * Wait for continued.  There's only one continued state and the
+	 * ptracer can consume it which can confuse the real parent.  Don't
+	 * use WCONTINUED from ptracer.  You don't need or want it.
+	 */
 	return wait_task_continued(wo, p);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 17/20] job control: Add @for_ptrace to do_notify_parent_cldstop()
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (15 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 16/20] job control: Allow access to job control events through ptracees Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 18/20] job control: Job control stop notifications should always go to the real parent Tejun Heo
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Currently, do_notify_parent_cldstop() determines whether the
notification is for the real parent or ptracer.  Move the decision to
the caller by adding @for_ptrace parameter to
do_notify_parent_cldstop().  All the callers are updated to pass
task_ptrace(target_task), so this patch doesn't cause any behavior
difference.

While at it, add function comment to do_notify_parent_cldstop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |   31 ++++++++++++++++++++++++-------
 1 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2f2c8f6..69d6054 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1595,16 +1595,30 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	return ret;
 }
 
-static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
+/**
+ * do_notify_parent_cldstop - notify parent of stopped/continued state change
+ * @tsk: task reporting the state change
+ * @for_ptracer: the notification is for ptracer
+ * @why: CLD_{CONTINUED|STOPPED|TRAPPED} to report
+ *
+ * Notify @tsk's parent that the stopped/continued state has changed.  If
+ * @for_ptracer is %false, @tsk's group leader notifies to its real parent.
+ * If %true, @tsk reports to @tsk->parent which should be the ptracer.
+ *
+ * CONTEXT:
+ * Must be called with tasklist_lock at least read locked.
+ */
+static void do_notify_parent_cldstop(struct task_struct *tsk,
+				     bool for_ptracer, int why)
 {
 	struct siginfo info;
 	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
 
-	if (task_ptrace(tsk))
+	if (for_ptracer) {
 		parent = tsk->parent;
-	else {
+	} else {
 		tsk = tsk->group_leader;
 		parent = tsk->real_parent;
 	}
@@ -1743,7 +1757,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
-		do_notify_parent_cldstop(current, why);
+		do_notify_parent_cldstop(current, task_ptrace(current), why);
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
@@ -1886,7 +1900,8 @@ retry:
 
 		if (notify) {
 			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, notify);
+			do_notify_parent_cldstop(current, task_ptrace(current),
+						 notify);
 			read_unlock(&tasklist_lock);
 		}
 
@@ -1982,6 +1997,7 @@ relock:
 	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
 	 */
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
+		struct task_struct *leader;
 		int why;
 
 		if (signal->flags & SIGNAL_CLD_CONTINUED)
@@ -1994,7 +2010,8 @@ relock:
 		spin_unlock_irq(&sighand->siglock);
 
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current->group_leader, why);
+		leader = current->group_leader;
+		do_notify_parent_cldstop(leader, task_ptrace(leader), why);
 		read_unlock(&tasklist_lock);
 		goto relock;
 	}
@@ -2167,7 +2184,7 @@ out:
 
 	if (unlikely(group_stop)) {
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, group_stop);
+		do_notify_parent_cldstop(tsk, task_ptrace(tsk), group_stop);
 		read_unlock(&tasklist_lock);
 	}
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 18/20] job control: Job control stop notifications should always go to the real parent
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (16 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 17/20] job control: Add @for_ptrace to do_notify_parent_cldstop() Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 19/20] job control: Notify the real parent of job control events regardless of ptrace Tejun Heo
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

The stopped notifications in do_signal_stop() and exit_signals() are
always for the completion of job control.  The one in do_signal_stop()
may be delivered to the ptracer if PTRACE_ATTACH races with
notification and the one in exit_signals() if task exits while
ptraced.

In both cases, the notifications are meaningless and confusing to the
ptracer as it never accesses the group stop state while the real
parent would miss notifications for the events it is watching.

Make sure these notifications always go to the real parent by calling
do_notify_parent_cld_stop() with %false @for_ptrace.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 69d6054..9f10b24 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1898,10 +1898,18 @@ retry:
 		__set_current_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
+		/*
+		 * Notify the parent of the group stop completion.  Because
+		 * we're not holding either the siglock or tasklist_lock
+		 * here, ptracer may attach inbetween; however, this is for
+		 * group stop and should always be delivered to the real
+		 * parent of the group leader.  The new ptracer will get
+		 * its notification when this task transitions into
+		 * TASK_TRACED.
+		 */
 		if (notify) {
 			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, task_ptrace(current),
-						 notify);
+			do_notify_parent_cldstop(current, false, notify);
 			read_unlock(&tasklist_lock);
 		}
 
@@ -2182,9 +2190,13 @@ void exit_signals(struct task_struct *tsk)
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
+	/*
+	 * If group stop has completed, deliver the notification.  This
+	 * should always go to the real parent of the group leader.
+	 */
 	if (unlikely(group_stop)) {
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, task_ptrace(tsk), group_stop);
+		do_notify_parent_cldstop(tsk, false, group_stop);
 		read_unlock(&tasklist_lock);
 	}
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 19/20] job control: Notify the real parent of job control events regardless of ptrace
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (17 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 18/20] job control: Job control stop notifications should always go to the real parent Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 10:06 ` [PATCH 20/20] job control: Don't send duplicate job control stop notification while ptraced Tejun Heo
  2011-03-23 18:38 ` [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Oleg Nesterov
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

With recent changes, job control and ptrace stopped states are
properly separated and accessible to the real parent and the ptracer
respectively; however, notifications of job control stopped/continued
events to the real parent while ptraced are still missing.

A ptracee participates in group stop in ptrace_stop() but the
completion isn't notified.  If participation results in completion of
group stop, notify the real parent of the event.  The ptrace and group
stops are separate and can be handled as such.

However, when the real parent and the ptracer are in the same thread
group, only the ptrace stop event is visible through wait(2) and the
duplicate notifications are different from the current behavior and
are confusing.  Suppress group stop notification in such cases.

The continued state is shared between the real parent and the ptracer
but is only meaningful to the real parent.  Always notify the real
parent and notify the ptracer too for backward compatibility.  Similar
to stop notification, if the real parent is the ptracer, suppress a
duplicate notification.

Test case follows.

  #include <stdio.h>
  #include <unistd.h>
  #include <time.h>
  #include <errno.h>
  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  int main(void)
  {
	  const struct timespec ts100ms = { .tv_nsec = 100000000 };
	  pid_t tracee, tracer;
	  siginfo_t si;
	  int i;

	  tracee = fork();
	  if (tracee == 0) {
		  while (1) {
			  printf("tracee: SIGSTOP\n");
			  raise(SIGSTOP);
			  nanosleep(&ts100ms, NULL);
			  printf("tracee: SIGCONT\n");
			  raise(SIGCONT);
			  nanosleep(&ts100ms, NULL);
		  }
	  }

	  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);

	  tracer = fork();
	  if (tracer == 0) {
		  nanosleep(&ts100ms, NULL);
		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);

		  for (i = 0; i < 11; i++) {
			  si.si_pid = 0;
			  waitid(P_PID, tracee, &si, WSTOPPED);
			  if (si.si_pid && si.si_code == CLD_TRAPPED)
				  ptrace(PTRACE_CONT, tracee, NULL,
					 (void *)(long)si.si_status);
		  }
		  printf("tracer: EXITING\n");
		  return 0;
	  }

	  while (1) {
		  si.si_pid = 0;
		  waitid(P_PID, tracee, &si, WSTOPPED | WCONTINUED | WEXITED);
		  if (si.si_pid)
			  printf("mommy : WAIT status=%02d code=%02d\n",
				 si.si_status, si.si_code);
	  }
	  return 0;
  }

Before this patch, while ptraced, the real parent doesn't get
notifications for job control events, so although it can access those
events, the later waitid(2) call never wakes up.

  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  tracee: SIGSTOP
  tracee: SIGCONT
  tracee: SIGSTOP
  tracee: SIGCONT
  tracee: SIGSTOP
  tracer: EXITING
  mommy : WAIT status=19 code=05
  ^C

After this patch, it works as expected.

  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  mommy : WAIT status=19 code=05
  tracee: SIGCONT
  mommy : WAIT status=18 code=06
  tracee: SIGSTOP
  tracer: EXITING
  mommy : WAIT status=19 code=05
  ^C

-v2: Oleg pointed out that

     * Group stop notification to the real parent should also happen
       when ptracer detach races with ptrace_stop().

     * real_parent_is_ptracer() should be testing thread group
       equality not the task itself as wait(2) and stop/cont
       notifications are normally thread-group wide.

     Both issues are fixed accordingly.

-v3: real_parent_is_ptracer() updated to test child->real_parent
     instead of child->group_leader->real_parent per Oleg's
     suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9f10b24..f65403d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1694,6 +1694,15 @@ static int sigkill_pending(struct task_struct *tsk)
 }
 
 /*
+ * Test whether the target task of the usual cldstop notification - the
+ * real_parent of @child - is in the same group as the ptracer.
+ */
+static bool real_parent_is_ptracer(struct task_struct *child)
+{
+	return same_thread_group(child->parent, child->real_parent);
+}
+
+/*
  * This must be called with current->sighand->siglock held.
  *
  * This should be the path for all ptrace stops.
@@ -1708,6 +1717,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
+	bool gstop_done = false;
+
 	if (arch_ptrace_stop_needed(exit_code, info)) {
 		/*
 		 * The arch code has something special to do before a
@@ -1735,7 +1746,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * is entered - ignore it.
 	 */
 	if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
-		task_participate_group_stop(current);
+		gstop_done = task_participate_group_stop(current);
 
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
@@ -1757,7 +1768,20 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
 	if (may_ptrace_stop()) {
-		do_notify_parent_cldstop(current, task_ptrace(current), why);
+		/*
+		 * Notify parents of the stop.
+		 *
+		 * While ptraced, there are two parents - the ptracer and
+		 * the real_parent of the group_leader.  The ptracer should
+		 * know about every stop while the real parent is only
+		 * interested in the completion of group stop.  The states
+		 * for the two don't interact with each other.  Notify
+		 * separately unless they're gonna be duplicates.
+		 */
+		do_notify_parent_cldstop(current, true, why);
+		if (gstop_done && !real_parent_is_ptracer(current))
+			do_notify_parent_cldstop(current, false, why);
+
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
@@ -1772,7 +1796,16 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		/*
 		 * By the time we got the lock, our tracer went away.
 		 * Don't drop the lock yet, another tracer may come.
+		 *
+		 * If @gstop_done, the ptracer went away between group stop
+		 * completion and here.  During detach, it would have set
+		 * GROUP_STOP_PENDING on us and we'll re-enter TASK_STOPPED
+		 * in do_signal_stop() on return, so notifying the real
+		 * parent of the group stop completion is enough.
 		 */
+		if (gstop_done)
+			do_notify_parent_cldstop(current, false, why);
+
 		__set_current_state(TASK_RUNNING);
 		if (clear_code)
 			current->exit_code = 0;
@@ -2017,10 +2050,24 @@ relock:
 
 		spin_unlock_irq(&sighand->siglock);
 
+		/*
+		 * Notify the parent that we're continuing.  This event is
+		 * always per-process and doesn't make whole lot of sense
+		 * for ptracers, who shouldn't consume the state via
+		 * wait(2) either, but, for backward compatibility, notify
+		 * the ptracer of the group leader too unless it's gonna be
+		 * a duplicate.
+		 */
 		read_lock(&tasklist_lock);
+
+		do_notify_parent_cldstop(current, false, why);
+
 		leader = current->group_leader;
-		do_notify_parent_cldstop(leader, task_ptrace(leader), why);
+		if (task_ptrace(leader) && !real_parent_is_ptracer(leader))
+			do_notify_parent_cldstop(leader, true, why);
+
 		read_unlock(&tasklist_lock);
+
 		goto relock;
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 20/20] job control: Don't send duplicate job control stop notification while ptraced
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (18 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 19/20] job control: Notify the real parent of job control events regardless of ptrace Tejun Heo
@ 2011-03-23 10:06 ` Tejun Heo
  2011-03-23 18:38 ` [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Oleg Nesterov
  20 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-23 10:06 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, roland, Tejun Heo

Just as group_exit_code shouldn't be generated when a PTRACE_CONT'd
task re-enters job control stop, notifiction for the event should be
suppressed too.  The logic is the same as the group_exit_code
generation suppression in do_signal_stop(), if SIGNAL_STOP_STOPPED is
already set, the task is re-entering job control stop without
intervening SIGCONT and the notifications should be suppressed.

Test case follows.

  #include <stdio.h>
  #include <unistd.h>
  #include <signal.h>
  #include <time.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>

  static const struct timespec ts100ms = { .tv_nsec = 100000000 };
  static pid_t tracee, tracer;

  static const char *pid_who(pid_t pid)
  {
	  return pid == tracee ? "tracee" : (pid == tracer ? "tracer" : "mommy ");
  }

  static void sigchld_sigaction(int signo, siginfo_t *si, void *ucxt)
  {
	  printf("%s: SIG status=%02d code=%02d (%s)\n",
		 pid_who(getpid()), si->si_status, si->si_code,
		 pid_who(si->si_pid));
  }

  int main(void)
  {
	  const struct sigaction chld_sa = { .sa_sigaction = sigchld_sigaction,
					     .sa_flags = SA_SIGINFO|SA_RESTART };
	  siginfo_t si;

	  sigaction(SIGCHLD, &chld_sa, NULL);

	  tracee = fork();
	  if (!tracee) {
		  tracee = getpid();
		  while (1)
			  pause();
	  }

	  kill(tracee, SIGSTOP);
	  waitid(P_PID, tracee, &si, WSTOPPED);

	  tracer = fork();
	  if (!tracer) {
		  tracer = getpid();
		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		  waitid(P_PID, tracee, &si, WSTOPPED);
		  printf("tracer: detaching\n");
		  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
		  return 0;
	  }

	  while (1)
		  pause();
	  return 0;
  }

Before the patch, the parent gets the second notification for the
tracee after the tracer detaches.  si_status is zero because
group_exit_code is not set by the group stop completion which
triggered this notification.

  mommy : SIG status=19 code=05 (tracee)
  tracer: SIG status=00 code=05 (tracee)
  tracer: SIG status=19 code=04 (tracee)
  tracer: SIG status=00 code=05 (tracee)
  tracer: detaching
  mommy : SIG status=00 code=05 (tracee)
  mommy : SIG status=00 code=01 (tracer)
  ^C

After the patch, the duplicate notification is gone.

  mommy : SIG status=19 code=05 (tracee)
  tracer: SIG status=00 code=05 (tracee)
  tracer: SIG status=19 code=04 (tracee)
  tracer: SIG status=00 code=05 (tracee)
  tracer: detaching
  mommy : SIG status=00 code=01 (tracer)
  ^C

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index f65403d..f799a05 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -268,6 +268,10 @@ void task_clear_group_stop_pending(struct task_struct *task)
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
+ *
+ * RETURNS:
+ * %true if group stop completion should be notified to the parent, %false
+ * otherwise.
  */
 static bool task_participate_group_stop(struct task_struct *task)
 {
@@ -284,7 +288,11 @@ static bool task_participate_group_stop(struct task_struct *task)
 	if (!WARN_ON_ONCE(sig->group_stop_count == 0))
 		sig->group_stop_count--;
 
-	if (!sig->group_stop_count) {
+	/*
+	 * Tell the caller to notify completion iff we are entering into a
+	 * fresh group stop.  Read comment in do_signal_stop() for details.
+	 */
+	if (!sig->group_stop_count && !(sig->flags & SIGNAL_STOP_STOPPED)) {
 		sig->flags = SIGNAL_STOP_STOPPED;
 		return true;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
                   ` (19 preceding siblings ...)
  2011-03-23 10:06 ` [PATCH 20/20] job control: Don't send duplicate job control stop notification while ptraced Tejun Heo
@ 2011-03-23 18:38 ` Oleg Nesterov
  2011-03-25 14:26   ` Tejun Heo
  20 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2011-03-23 18:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

On 03/23, Tejun Heo wrote:
>
> All patches have been reviewed and acked by Oleg and this will be the
> last repost of these patches.  They're being committed to a stable git
> tree (will eventually be managed by Oleg) and will be pulled into
> linux-next once -rc1 is released.

Thanks Tejun. The whole series is fine imho.

But of course we need more changes. In particular, there is still the
small problem with the CLD_CONTINUED notification.

__ptrace_unlink() does signal_wake_up() if it adds SIGNAL_STOP_STOPPED.
This is correct, but it should also add TIF_SIGPENDING if
(signal->flags & SIGNAL_CLD_MASK) != 0.

Otherwise, if the stopped tracee was PTRACE_CONT'ed and then SIGCONT
ends the group-stop, the real_parent won't be notified after detach.

Unfortunately, this means that recalc_sigpending_tsk() has to check
SIGNAL_CLD_MASK as well. Do you see another solution?

There is another case. SIGCONT can hit the stopped-but-running-task,
but I don't think we should try to set TIF_SIGPENDING in this case,
you are going to add the trap later.
There is another case. SIGCONT can hit the stopped-but-running-task,
but I don't think we should try to set TIF_SIGPENDING in this case,
you are going to add the trap later.

Oleg.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-23 18:38 ` [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Oleg Nesterov
@ 2011-03-25 14:26   ` Tejun Heo
  2011-03-26 18:25     ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2011-03-25 14:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

Hello, Oleg.

On Wed, Mar 23, 2011 at 07:38:37PM +0100, Oleg Nesterov wrote:
> But of course we need more changes. In particular, there is still the
> small problem with the CLD_CONTINUED notification.
> 
> __ptrace_unlink() does signal_wake_up() if it adds SIGNAL_STOP_STOPPED.
> This is correct, but it should also add TIF_SIGPENDING if
> (signal->flags & SIGNAL_CLD_MASK) != 0.
> 
> Otherwise, if the stopped tracee was PTRACE_CONT'ed and then SIGCONT
> ends the group-stop, the real_parent won't be notified after detach.

Heh, that's an interesting one.  I don't think it has much to do with
__ptrace_unlink() tho.  Isn't the proper solution using something akin
to signal_wake_up() in SIGCONT generation path in prepare_signal()?

Explicit wake_up_state() without kick_process() is okay there because
if the code assumes that the tasks are guaranteed to pass through
signal delivery path whenever event worthy of notification happens
(either SIGNAL_STOP_STOPPED or group_stop_count is set).  PTRACE_CONT
breaks that as the tracee could be running in userland and thus the
solution is to add kick_process() as in signal_wake_up().

Am I making any sense?

> Unfortunately, this means that recalc_sigpending_tsk() has to check
> SIGNAL_CLD_MASK as well. Do you see another solution?

Hmmm... I think the above subtle breakage exists for !ptrace case too.
Please consider the following scenario.

* SIGSTOP is sent to a task and group stop is initiated.

* Before the task participates in group stop, SIGCONT is sent.

* Before CLD_STOPPED notification for the incomplete-stop/cont
  sequence can be made, recalc_sigpending() happens.

* CLD_STOPPED notification is pending but TIF_SIGPENDING isn't set and
  the task isn't in signal delivery path and can continue execution.

It's a pretty convoluted extremely unlikely corner case tho.  Anyways,
adding SIGNAL_CLD_MASK test to recalc_sigpending() should solve it.

> There is another case. SIGCONT can hit the stopped-but-running-task,
> but I don't think we should try to set TIF_SIGPENDING in this case,
> you are going to add the trap later.

Hmmm... As I wrote above, I think we should do it regardless of the
new trap.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-25 14:26   ` Tejun Heo
@ 2011-03-26 18:25     ` Oleg Nesterov
  2011-03-28  8:58       ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2011-03-26 18:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

Hello Tejun,

On 03/25, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Wed, Mar 23, 2011 at 07:38:37PM +0100, Oleg Nesterov wrote:
> > But of course we need more changes. In particular, there is still the
> > small problem with the CLD_CONTINUED notification.
> >
> > __ptrace_unlink() does signal_wake_up() if it adds SIGNAL_STOP_STOPPED.
> > This is correct, but it should also add TIF_SIGPENDING if
> > (signal->flags & SIGNAL_CLD_MASK) != 0.
> >
> > Otherwise, if the stopped tracee was PTRACE_CONT'ed and then SIGCONT
> > ends the group-stop, the real_parent won't be notified after detach.
>
> Heh, that's an interesting one.  I don't think it has much to do with
> __ptrace_unlink() tho.  Isn't the proper solution using something akin
> to signal_wake_up() in SIGCONT generation path in prepare_signal()?

I am not sure... but please see below.

> Explicit wake_up_state() without kick_process() is okay there because
> if the code assumes that the tasks are guaranteed to pass through
> signal delivery path whenever event worthy of notification happens
> (either SIGNAL_STOP_STOPPED or group_stop_count is set).  PTRACE_CONT
> breaks that as the tracee could be running in userland and thus the
> solution is to add kick_process() as in signal_wake_up().
>
> Am I making any sense?

Perhaps. This depends on how we define/implement the new behaviour.

It is not clear to me what the new trap should actually do. And how.
Either way, prepare_signal(SIGCONT) should do something with the
ptraced threads, and this is what we should care about. Probably
we can set TIF_SIGPENDING if task_ptrace() is true.

Anyway we should ensure SIGCONT can't race with detach.

> > Unfortunately, this means that recalc_sigpending_tsk() has to check
> > SIGNAL_CLD_MASK as well. Do you see another solution?
>
> Hmmm... I think the above subtle breakage exists for !ptrace case too.
> Please consider the following scenario.
>
> * SIGSTOP is sent to a task and group stop is initiated.
>
> * Before the task participates in group stop, SIGCONT is sent.

In this case we are doing nothing intentionally, as if SIGSTOP wasn't
sent.

> * Before CLD_STOPPED notification for the incomplete-stop/cont
>   sequence can be made, recalc_sigpending() happens.
>
> * CLD_STOPPED notification is pending but TIF_SIGPENDING isn't set and
>   the task isn't in signal delivery path and can continue execution.

This doesn't matter, or I misunderstood.

We only add "SIGNAL_CLD_* | SIGNAL_STOP_CONTINUED" if we know there
is at least one thread in get_signal_to_deliver()->do_signal_stop()
paths. In this case we do not rely on TIF_SIGPENDING at all.


(just in case, I am ignoring the current problems with group-stop
 and PTRACE_CONT).

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-26 18:25     ` Oleg Nesterov
@ 2011-03-28  8:58       ` Tejun Heo
  2011-03-28 12:14         ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2011-03-28  8:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

Hey,

On Sat, Mar 26, 2011 at 07:25:54PM +0100, Oleg Nesterov wrote:
> > Explicit wake_up_state() without kick_process() is okay there because
> > if the code assumes that the tasks are guaranteed to pass through
> > signal delivery path whenever event worthy of notification happens
> > (either SIGNAL_STOP_STOPPED or group_stop_count is set).  PTRACE_CONT
> > breaks that as the tracee could be running in userland and thus the
> > solution is to add kick_process() as in signal_wake_up().
> >
> > Am I making any sense?
> 
> Perhaps. This depends on how we define/implement the new behaviour.
> 
> It is not clear to me what the new trap should actually do. And how.
> Either way, prepare_signal(SIGCONT) should do something with the
> ptraced threads, and this is what we should care about. Probably
> we can set TIF_SIGPENDING if task_ptrace() is true.
> 
> Anyway we should ensure SIGCONT can't race with detach.

Hmmm... setting TIF_SIGPENDING and kicking the task to enter signal
delivery path doesn't have any side effect when it's running in
userland, which is the problematic case here anyway.  Is there any
reason we should be careful about this?

> > * Before CLD_STOPPED notification for the incomplete-stop/cont
> >   sequence can be made, recalc_sigpending() happens.
> >
> > * CLD_STOPPED notification is pending but TIF_SIGPENDING isn't set and
> >   the task isn't in signal delivery path and can continue execution.
> 
> This doesn't matter, or I misunderstood.
> 
> We only add "SIGNAL_CLD_* | SIGNAL_STOP_CONTINUED" if we know there
> is at least one thread in get_signal_to_deliver()->do_signal_stop()
> paths. In this case we do not rely on TIF_SIGPENDING at all.

We set SIGNAL_CLD_STOPPED if group_stop_count wasn't zero, ie. if
group stop has initiated, which will be delivered as soon as any task
enters signal delivery path.  So, there's a path that we schedule a
notification and doesn't enforce the delivery until something happens
and a task in the group gets called into signal delivery path somehow,
which is wrong.  We need to call them into the delivery path on the
generation of the notification regardless of ptrace.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-28  8:58       ` Tejun Heo
@ 2011-03-28 12:14         ` Oleg Nesterov
  2011-03-28 15:21           ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2011-03-28 12:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

On 03/28, Tejun Heo wrote:
>
> Hey,
>
> On Sat, Mar 26, 2011 at 07:25:54PM +0100, Oleg Nesterov wrote:
> > > Explicit wake_up_state() without kick_process() is okay there because
> > > if the code assumes that the tasks are guaranteed to pass through
> > > signal delivery path whenever event worthy of notification happens
> > > (either SIGNAL_STOP_STOPPED or group_stop_count is set).  PTRACE_CONT
> > > breaks that as the tracee could be running in userland and thus the
> > > solution is to add kick_process() as in signal_wake_up().
> > >
> > > Am I making any sense?
> >
> > Perhaps. This depends on how we define/implement the new behaviour.
> >
> > It is not clear to me what the new trap should actually do. And how.
> > Either way, prepare_signal(SIGCONT) should do something with the
> > ptraced threads, and this is what we should care about. Probably
> > we can set TIF_SIGPENDING if task_ptrace() is true.
> >
> > Anyway we should ensure SIGCONT can't race with detach.
>
> Hmmm... setting TIF_SIGPENDING and kicking the task to enter signal
> delivery path doesn't have any side effect when it's running in
> userland,

Yes. We should avoid the spurious TIF_SIGPENDING, if possible. But in
this case we don't care.

But, unless the thread is ptraced, it can't be running in userland,
why should we set TIF_SIGPENDING?

> > > * Before CLD_STOPPED notification for the incomplete-stop/cont
> > >   sequence can be made, recalc_sigpending() happens.
> > >
> > > * CLD_STOPPED notification is pending but TIF_SIGPENDING isn't set and
> > >   the task isn't in signal delivery path and can continue execution.
> >
> > This doesn't matter, or I misunderstood.
> >
> > We only add "SIGNAL_CLD_* | SIGNAL_STOP_CONTINUED" if we know there
> > is at least one thread in get_signal_to_deliver()->do_signal_stop()
> > paths. In this case we do not rely on TIF_SIGPENDING at all.
>
> We set SIGNAL_CLD_STOPPED if group_stop_count wasn't zero, ie. if
> group stop has initiated, which will be delivered as soon as any task
> enters signal delivery path.

Yes. And that task T has already called do_signal_stop() and it is
TASK_STOPPED.

> So, there's a path that we schedule a
> notification and doesn't enforce the delivery until something happens

prepare_signal(SIGCONT) wakes up all threads, including T. Once it
returns from do_signal_stop() to get_signal_to_deliver(), it will
check signal->flags.

> and a task in the group gets called into signal delivery path somehow,
> which is wrong.

Afaics, no. No need to force any thread to enter into the signal
delivery path. If group_stop_count != 0 (or SIGNAL_STOP_STOPPED is
set) there must be at least one thread which should _return_ to
get_signal_to_deliver() after wakeup.

No?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHSET] ptrace,signal: Improve ptrace and job control interaction
  2011-03-28 12:14         ` Oleg Nesterov
@ 2011-03-28 15:21           ` Tejun Heo
  0 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2011-03-28 15:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, roland

Hi,

On Mon, Mar 28, 2011 at 02:14:01PM +0200, Oleg Nesterov wrote:
> > Hmmm... setting TIF_SIGPENDING and kicking the task to enter signal
> > delivery path doesn't have any side effect when it's running in
> > userland,
> 
> Yes. We should avoid the spurious TIF_SIGPENDING, if possible. But in
> this case we don't care.
> 
> But, unless the thread is ptraced, it can't be running in userland,
> why should we set TIF_SIGPENDING?

It goes both ways.  If we need it for ptrace path, the overhead for
!ptrace case is negligible && it makes the code less finicky, why not?

It's much cleaner to say "it sets notification condition and
guarantees that tasks travel signal delivery path" than "if !ptraced,
at least one task should already be in signal delivery path for such
and such reasons; however, while ptraced, because of the interaction
with PTRACE_CONT, there's a case we need to call in the task
explicitly."  The code becomes less prone to subtle bugs when the
surrounding code changes too.

> > We set SIGNAL_CLD_STOPPED if group_stop_count wasn't zero, ie. if
> > group stop has initiated, which will be delivered as soon as any task
> > enters signal delivery path.
> 
> Yes. And that task T has already called do_signal_stop() and it is
> TASK_STOPPED.

Ah, right, this is where I got confused.  signal_stop_count wouldn't
be set unless at least one task is already in signal delivery path.

> No?

You're right.  I was confused.  But if we're gonna call in a task,
let's do it regardless of ptrace.  It's not like splitting that
condition will buy us anything.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-03-28 15:22 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-23 10:05 [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Tejun Heo
2011-03-23 10:05 ` [PATCH 01/20] signal: Fix SIGCONT notification code Tejun Heo
2011-03-23 10:05 ` [PATCH 02/20] ptrace: Remove the extra wake_up_state() from ptrace_detach() Tejun Heo
2011-03-23 10:05 ` [PATCH 03/20] signal: Remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
2011-03-23 10:05 ` [PATCH 04/20] ptrace: Kill tracehook_notify_jctl() Tejun Heo
2011-03-23 10:05 ` [PATCH 05/20] ptrace: Add @why to ptrace_stop() Tejun Heo
2011-03-23 10:05 ` [PATCH 06/20] signal: Fix premature completion of group stop when interfered by ptrace Tejun Heo
2011-03-23 10:05 ` [PATCH 07/20] signal: Use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
2011-03-23 10:05 ` [PATCH 08/20] ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
2011-03-23 10:05 ` [PATCH 09/20] ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
2011-03-23 10:05 ` [PATCH 10/20] ptrace: Clean transitions between TASK_STOPPED and TRACED Tejun Heo
2011-03-23 10:05 ` [PATCH 11/20] ptrace: Collapse ptrace_untrace() into __ptrace_unlink() Tejun Heo
2011-03-23 10:05 ` [PATCH 12/20] ptrace: Always put ptracee into appropriate execution state Tejun Heo
2011-03-23 10:05 ` [PATCH 13/20] job control: Don't set group_stop exit_code if re-entering job control stop Tejun Heo
2011-03-23 10:06 ` [PATCH 14/20] job control: Small reorganization of wait_consider_task() Tejun Heo
2011-03-23 10:06 ` [PATCH 15/20] job control: Fix ptracer wait(2) hang and explain notask_error clearing Tejun Heo
2011-03-23 10:06 ` [PATCH 16/20] job control: Allow access to job control events through ptracees Tejun Heo
2011-03-23 10:06 ` [PATCH 17/20] job control: Add @for_ptrace to do_notify_parent_cldstop() Tejun Heo
2011-03-23 10:06 ` [PATCH 18/20] job control: Job control stop notifications should always go to the real parent Tejun Heo
2011-03-23 10:06 ` [PATCH 19/20] job control: Notify the real parent of job control events regardless of ptrace Tejun Heo
2011-03-23 10:06 ` [PATCH 20/20] job control: Don't send duplicate job control stop notification while ptraced Tejun Heo
2011-03-23 18:38 ` [PATCHSET] ptrace,signal: Improve ptrace and job control interaction Oleg Nesterov
2011-03-25 14:26   ` Tejun Heo
2011-03-26 18:25     ` Oleg Nesterov
2011-03-28  8:58       ` Tejun Heo
2011-03-28 12:14         ` Oleg Nesterov
2011-03-28 15:21           ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).