* [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3
@ 2011-05-24 18:37 Tejun Heo
  2011-05-24 18:37 ` [PATCH 01/19] job control: rename signal->group_stop and flags to jobctl and rearrange flags Tejun Heo
                   ` (18 more replies)
  0 siblings, 19 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro

Hello,

This is the third try at implementing PTRACE_SEIZE/INTERRUPT and group
stop notification.  This patchset contains both the prep and the
actual implementation patches.  Changes from the second take[1][2]
are,

- 0008-ptrace-move-JOBCTL_TRAPPING-wait-to-wait-2-and-ptrac.patch:

  wait_task_stopped() syscall restart fixed such that -ERESTARTSYS is
  used for !WNOHANG waits.

- 0006-job-control-introduce-task_set_jobctl_pending.patch:

  Added to address Oleg's concern that setting trap conditions on a
  dying task may make it unkillable.  task_set_jobctl_pending() is
  always used when raising stop/trap conditions and becomes a noop if
  the target task is dying.  0011, 0013 and 0019 are updated to use
  task_set_jobctl_pending().

- 0012-ptrace-implement-PTRACE_SEIZE.patch:

  PTRACE_SEIZE no longer traps the tracee automatically, as suggested
  by Jan Kratochvil.  If the tracee is running, it's left running.  (A
  rough usage sketch for SEIZE/INTERRUPT follows this list.)

- 0016-ptrace-make-group-stop-state-visible-via-PTRACE_GETS.patch:
- 0018-ptrace-add-JOBCTL_BLOCK_NOTIFY.patch:

  Cosmetic updates as per review.
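
For orientation, here is a rough userland sketch of the intended
SEIZE/INTERRUPT usage (illustrative only, not part of the patchset;
the request names follow this proposal and the exact arguments and
flags may still change):

  #include <sys/types.h>
  #include <sys/ptrace.h>
  #include <sys/wait.h>
  #include <stdio.h>

  /* hypothetical tracer: attach without stopping, trap on demand */
  static int seize_and_poke(pid_t pid)
  {
          int status;

          /* attach; the tracee keeps running, no trap is scheduled */
          if (ptrace(PTRACE_SEIZE, pid, NULL, NULL) < 0)
                  return -1;

          /* later, ask the (possibly running) tracee to trap */
          if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL) < 0)
                  return -1;

          /* the trap is reported through wait(2) like any ptrace stop */
          if (waitpid(pid, &status, 0) < 0)
                  return -1;
          printf("tracee trapped, wait status 0x%x\n", status);

          return ptrace(PTRACE_CONT, pid, NULL, 0);
  }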

This patchset contains the following 19 patches.

  0001-job-control-rename-signal-group_stop-and-flags-to-jo.patch
  0002-ptrace-ptrace_check_attach-rename-kill-to-ignore_sta.patch
  0003-ptrace-relocate-set_current_state-TASK_TRACED-in-ptr.patch
  0004-job-control-introduce-JOBCTL_PENDING_MASK-and-task_c.patch
  0005-job-control-make-task_clear_jobctl_pending-clear-TRA.patch
  0006-job-control-introduce-task_set_jobctl_pending.patch
  0007-ptrace-use-bit_waitqueue-for-TRAPPING-instead-of-wai.patch
  0008-ptrace-move-JOBCTL_TRAPPING-wait-to-wait-2-and-ptrac.patch
  0009-ptrace-make-TRAPPING-wait-interruptible.patch
  0010-signal-remove-three-noop-tracehooks.patch
  0011-job-control-introduce-JOBCTL_TRAP_STOP-and-use-it-fo.patch
  0012-ptrace-implement-PTRACE_SEIZE.patch
  0013-ptrace-implement-PTRACE_INTERRUPT.patch
  0014-ptrace-restructure-ptrace_getsiginfo.patch
  0015-ptrace-add-siginfo.si_pt_flags.patch
  0016-ptrace-make-group-stop-state-visible-via-PTRACE_GETS.patch
  0017-ptrace-don-t-let-PTRACE_SETSIGINFO-override-__SI_TRA.patch
  0018-ptrace-add-JOBCTL_BLOCK_NOTIFY.patch
  0019-ptrace-implement-group-stop-notification-for-ptracer.patch

and available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-ptrace-seize

The HEAD is 6e3eb3ab5f (ptrace: implement group stop notification for
ptracer).  If you see an older branch, please retry after a while (korg
is still syncing).

The patchset is on top of today's (20110524) mainline -
d762f438310 (Merge branch 'sh-latest' of
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6).

diffstat follows.

 arch/ia64/include/asm/siginfo.h       |    7 
 arch/ia64/kernel/signal.c             |    5 
 arch/mips/include/asm/compat-signal.h |    7 
 arch/mips/include/asm/siginfo.h       |    7 
 arch/mips/kernel/signal32.c           |    5 
 arch/parisc/kernel/signal32.c         |    5 
 arch/parisc/kernel/signal32.h         |    7 
 arch/powerpc/kernel/ppc32.h           |    7 
 arch/powerpc/kernel/signal_32.c       |    5 
 arch/s390/kernel/compat_linux.h       |    7 
 arch/s390/kernel/compat_signal.c      |    5 
 arch/sparc/kernel/signal32.c          |   12 +
 arch/tile/kernel/compat_signal.c      |   11 +
 arch/x86/ia32/ia32_signal.c           |    4 
 arch/x86/include/asm/ia32.h           |    7 
 fs/exec.c                             |    2 
 include/asm-generic/siginfo.h         |   10 
 include/linux/ptrace.h                |   16 +
 include/linux/sched.h                 |   28 +-
 include/linux/tracehook.h             |   52 ----
 kernel/exit.c                         |   27 ++
 kernel/ptrace.c                       |  302 +++++++++++++++++++++++----
 kernel/signal.c                       |  370 ++++++++++++++++++++++------------
 23 files changed, 672 insertions(+), 236 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1139751
[2] http://thread.gmane.org/gmane.linux.kernel/1140778


* [PATCH 01/19] job control: rename signal->group_stop and flags to jobctl and rearrange flags
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 02/19] ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments Tejun Heo
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

signal->group_stop currently hosts mostly group stop related flags;
however, it's gonna be used for wider purposes and the GROUP_STOP_
flag prefix becomes confusing.  Rename signal->group_stop to
signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

Also, reassign JOBCTL_TRAPPING to bit 21 to better accommodate future
additions.

This doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/exec.c             |    2 +-
 include/linux/sched.h |   16 ++++----
 kernel/ptrace.c       |   12 +++---
 kernel/signal.c       |   91 +++++++++++++++++++++++++------------------------
 4 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c1cf372..223e3b2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1735,7 +1735,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 
 	t = start;
 	do {
-		task_clear_group_stop_pending(t);
+		task_clear_jobctl_stop_pending(t);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aaf71e0..01178e6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1271,7 +1271,7 @@ struct task_struct {
 	int exit_state;
 	int exit_code, exit_signal;
 	int pdeath_signal;  /*  The signal sent when the parent dies  */
-	unsigned int group_stop;	/* GROUP_STOP_*, siglock protected */
+	unsigned int jobctl;	/* JOBCTL_*, siglock protected */
 	/* ??? */
 	unsigned int personality;
 	unsigned did_exec:1;
@@ -1793,15 +1793,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define used_math() tsk_used_math(current)
 
 /*
- * task->group_stop flags
+ * task->jobctl flags
  */
-#define GROUP_STOP_SIGMASK	0xffff    /* signr of the last group stop */
-#define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
-#define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
-#define GROUP_STOP_TRAPPING	(1 << 18) /* switching from STOPPED to TRACED */
-#define GROUP_STOP_DEQUEUED	(1 << 19) /* stop signal dequeued */
+#define JOBCTL_STOP_SIGMASK	0xffff    /* signr of the last group stop */
+#define JOBCTL_STOP_DEQUEUED	(1 << 16) /* stop signal dequeued */
+#define JOBCTL_STOP_PENDING	(1 << 17) /* task should stop for group stop */
+#define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
+#define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 
-extern void task_clear_group_stop_pending(struct task_struct *task);
+extern void task_clear_jobctl_stop_pending(struct task_struct *task);
 
 #ifdef CONFIG_PREEMPT_RCU
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7a81fc0..c7044d3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -77,13 +77,13 @@ void __ptrace_unlink(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 
 	/*
-	 * Reinstate GROUP_STOP_PENDING if group stop is in effect and
+	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
 	 * @child isn't dead.
 	 */
 	if (!(child->flags & PF_EXITING) &&
 	    (child->signal->flags & SIGNAL_STOP_STOPPED ||
 	     child->signal->group_stop_count))
-		child->group_stop |= GROUP_STOP_PENDING;
+		child->jobctl |= JOBCTL_STOP_PENDING;
 
 	/*
 	 * If transition to TASK_STOPPED is pending or in TASK_TRACED, kick
@@ -91,7 +91,7 @@ void __ptrace_unlink(struct task_struct *child)
 	 * is in TASK_TRACED; otherwise, we might unduly disrupt
 	 * TASK_KILLABLE sleeps.
 	 */
-	if (child->group_stop & GROUP_STOP_PENDING || task_is_traced(child))
+	if (child->jobctl & JOBCTL_STOP_PENDING || task_is_traced(child))
 		signal_wake_up(child, task_is_traced(child));
 
 	spin_unlock(&child->sighand->siglock);
@@ -227,7 +227,7 @@ static int ptrace_attach(struct task_struct *task)
 	spin_lock(&task->sighand->siglock);
 
 	/*
-	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
+	 * If the task is already STOPPED, set JOBCTL_STOP_PENDING and
 	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
 	 * will be cleared if the child completes the transition or any
 	 * event which clears the group stop states happens.  We'll wait
@@ -244,7 +244,7 @@ static int ptrace_attach(struct task_struct *task)
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task)) {
-		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+		task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
 		signal_wake_up(task, 1);
 		wait_trap = true;
 	}
@@ -259,7 +259,7 @@ unlock_creds:
 out:
 	if (wait_trap)
 		wait_event(current->signal->wait_chldexit,
-			   !(task->group_stop & GROUP_STOP_TRAPPING));
+			   !(task->jobctl & JOBCTL_TRAPPING));
 	return retval;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index ad5e818..42d9bbd 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
-	if ((t->group_stop & GROUP_STOP_PENDING) ||
+	if ((t->jobctl & JOBCTL_STOP_PENDING) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -224,27 +224,28 @@ static inline void print_dropped_signal(int sig)
 }
 
 /**
- * task_clear_group_stop_trapping - clear group stop trapping bit
+ * task_clear_jobctl_trapping - clear jobctl trapping bit
  * @task: target task
  *
- * If GROUP_STOP_TRAPPING is set, a ptracer is waiting for us.  Clear it
- * and wake up the ptracer.  Note that we don't need any further locking.
- * @task->siglock guarantees that @task->parent points to the ptracer.
+ * If JOBCTL_TRAPPING is set, a ptracer is waiting for us to enter TRACED.
+ * Clear it and wake up the ptracer.  Note that we don't need any further
+ * locking.  @task->siglock guarantees that @task->parent points to the
+ * ptracer.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-static void task_clear_group_stop_trapping(struct task_struct *task)
+static void task_clear_jobctl_trapping(struct task_struct *task)
 {
-	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
-		task->group_stop &= ~GROUP_STOP_TRAPPING;
+	if (unlikely(task->jobctl & JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_TRAPPING;
 		__wake_up_sync_key(&task->parent->signal->wait_chldexit,
 				   TASK_UNINTERRUPTIBLE, 1, task);
 	}
 }
 
 /**
- * task_clear_group_stop_pending - clear pending group stop
+ * task_clear_jobctl_stop_pending - clear pending group stop
  * @task: target task
  *
  * Clear group stop states for @task.
@@ -252,19 +253,19 @@ static void task_clear_group_stop_trapping(struct task_struct *task)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-void task_clear_group_stop_pending(struct task_struct *task)
+void task_clear_jobctl_stop_pending(struct task_struct *task)
 {
-	task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME |
-			      GROUP_STOP_DEQUEUED);
+	task->jobctl &= ~(JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME |
+			  JOBCTL_STOP_DEQUEUED);
 }
 
 /**
  * task_participate_group_stop - participate in a group stop
  * @task: task participating in a group stop
  *
- * @task has GROUP_STOP_PENDING set and is participating in a group stop.
+ * @task has %JOBCTL_STOP_PENDING set and is participating in a group stop.
  * Group stop states are cleared and the group stop count is consumed if
- * %GROUP_STOP_CONSUME was set.  If the consumption completes the group
+ * %JOBCTL_STOP_CONSUME was set.  If the consumption completes the group
  * stop, the appropriate %SIGNAL_* flags are set.
  *
  * CONTEXT:
@@ -277,11 +278,11 @@ void task_clear_group_stop_pending(struct task_struct *task)
 static bool task_participate_group_stop(struct task_struct *task)
 {
 	struct signal_struct *sig = task->signal;
-	bool consume = task->group_stop & GROUP_STOP_CONSUME;
+	bool consume = task->jobctl & JOBCTL_STOP_CONSUME;
 
-	WARN_ON_ONCE(!(task->group_stop & GROUP_STOP_PENDING));
+	WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING));
 
-	task_clear_group_stop_pending(task);
+	task_clear_jobctl_stop_pending(task);
 
 	if (!consume)
 		return false;
@@ -604,7 +605,7 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 		 * is to alert stop-signal processing code when another
 		 * processor has come along and cleared the flag.
 		 */
-		current->group_stop |= GROUP_STOP_DEQUEUED;
+		current->jobctl |= JOBCTL_STOP_DEQUEUED;
 	}
 	if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) {
 		/*
@@ -809,7 +810,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
 		t = p;
 		do {
-			task_clear_group_stop_pending(t);
+			task_clear_jobctl_stop_pending(t);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
 			wake_up_state(t, __TASK_STOPPED);
 		} while_each_thread(p, t);
@@ -925,7 +926,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			signal->group_stop_count = 0;
 			t = p;
 			do {
-				task_clear_group_stop_pending(t);
+				task_clear_jobctl_stop_pending(t);
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
@@ -1160,7 +1161,7 @@ int zap_other_threads(struct task_struct *p)
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
-		task_clear_group_stop_pending(t);
+		task_clear_jobctl_stop_pending(t);
 		count++;
 
 		/* Don't bother with already dead threads */
@@ -1738,7 +1739,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	 * clear now.  We act as if SIGCONT is received after TASK_TRACED
 	 * is entered - ignore it.
 	 */
-	if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
+	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
 	current->last_siginfo = info;
@@ -1751,12 +1752,12 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	set_current_state(TASK_TRACED);
 
 	/*
-	 * We're committing to trapping.  Clearing GROUP_STOP_TRAPPING and
+	 * We're committing to trapping.  Clearing JOBCTL_TRAPPING and
 	 * transition to TASK_TRACED should be atomic with respect to
-	 * siglock.  This hsould be done after the arch hook as siglock is
+	 * siglock.  This should be done after the arch hook as siglock is
 	 * released and regrabbed across it.
 	 */
-	task_clear_group_stop_trapping(current);
+	task_clear_jobctl_trapping(current);
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
@@ -1792,9 +1793,9 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 		 *
 		 * If @gstop_done, the ptracer went away between group stop
 		 * completion and here.  During detach, it would have set
-		 * GROUP_STOP_PENDING on us and we'll re-enter TASK_STOPPED
-		 * in do_signal_stop() on return, so notifying the real
-		 * parent of the group stop completion is enough.
+		 * JOBCTL_STOP_PENDING on us and we'll re-enter
+		 * TASK_STOPPED in do_signal_stop() on return, so notifying
+		 * the real parent of the group stop completion is enough.
 		 */
 		if (gstop_done)
 			do_notify_parent_cldstop(current, false, why);
@@ -1856,14 +1857,14 @@ static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
 
-	if (!(current->group_stop & GROUP_STOP_PENDING)) {
-		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
+	if (!(current->jobctl & JOBCTL_STOP_PENDING)) {
+		unsigned int gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
 		struct task_struct *t;
 
-		/* signr will be recorded in task->group_stop for retries */
-		WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+		/* signr will be recorded in task->jobctl for retries */
+		WARN_ON_ONCE(signr & ~JOBCTL_STOP_SIGMASK);
 
-		if (!likely(current->group_stop & GROUP_STOP_DEQUEUED) ||
+		if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
 			return 0;
 		/*
@@ -1890,19 +1891,19 @@ static int do_signal_stop(int signr)
 		else
 			WARN_ON_ONCE(!task_ptrace(current));
 
-		current->group_stop &= ~GROUP_STOP_SIGMASK;
-		current->group_stop |= signr | gstop;
+		current->jobctl &= ~JOBCTL_STOP_SIGMASK;
+		current->jobctl |= signr | gstop;
 		sig->group_stop_count = 1;
 		for (t = next_thread(current); t != current;
 		     t = next_thread(t)) {
-			t->group_stop &= ~GROUP_STOP_SIGMASK;
+			t->jobctl &= ~JOBCTL_STOP_SIGMASK;
 			/*
 			 * Setting state to TASK_STOPPED for a group
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
 			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
-				t->group_stop |= signr | gstop;
+				t->jobctl |= signr | gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
 			}
@@ -1943,23 +1944,23 @@ retry:
 
 		spin_lock_irq(&current->sighand->siglock);
 	} else {
-		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+		ptrace_stop(current->jobctl & JOBCTL_STOP_SIGMASK,
 			    CLD_STOPPED, 0, NULL);
 		current->exit_code = 0;
 	}
 
 	/*
-	 * GROUP_STOP_PENDING could be set if another group stop has
+	 * JOBCTL_STOP_PENDING could be set if another group stop has
 	 * started since being woken up or ptrace wants us to transit
 	 * between TASK_STOPPED and TRACED.  Retry group stop.
 	 */
-	if (current->group_stop & GROUP_STOP_PENDING) {
-		WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+	if (current->jobctl & JOBCTL_STOP_PENDING) {
+		WARN_ON_ONCE(!(current->jobctl & JOBCTL_STOP_SIGMASK));
 		goto retry;
 	}
 
 	/* PTRACE_ATTACH might have raced with task killing, clear trapping */
-	task_clear_group_stop_trapping(current);
+	task_clear_jobctl_trapping(current);
 
 	spin_unlock_irq(&current->sighand->siglock);
 
@@ -2078,8 +2079,8 @@ relock:
 		if (unlikely(signr != 0))
 			ka = return_ka;
 		else {
-			if (unlikely(current->group_stop &
-				     GROUP_STOP_PENDING) && do_signal_stop(0))
+			if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
+			    do_signal_stop(0))
 				goto relock;
 
 			signr = dequeue_signal(current, &current->blocked,
@@ -2253,7 +2254,7 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
 	    task_participate_group_stop(tsk))
 		group_stop = CLD_STOPPED;
 out:
-- 
1.7.1



* [PATCH 02/19] ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
  2011-05-24 18:37 ` [PATCH 01/19] job control: rename signal->group_stop and flags to jobctl and rearrange flags Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 03/19] ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop() Tejun Heo
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

PTRACE_INTERRUPT is going to be added, which should also skip the
task_is_traced() check in ptrace_check_attach().  Rename @kill to
@ignore_state and make it bool.  Add a function comment while at it.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    2 +-
 kernel/ptrace.c        |   24 +++++++++++++++++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 9178d5c..e93ef1a 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -105,7 +105,7 @@ extern long arch_ptrace(struct task_struct *child, long request,
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
 extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long dst, int len);
 extern void ptrace_disable(struct task_struct *);
-extern int ptrace_check_attach(struct task_struct *task, int kill);
+extern int ptrace_check_attach(struct task_struct *task, bool ignore_state);
 extern int ptrace_request(struct task_struct *child, long request,
 			  unsigned long addr, unsigned long data);
 extern void ptrace_notify(int exit_code);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c7044d3..abf6383 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -97,10 +97,24 @@ void __ptrace_unlink(struct task_struct *child)
 	spin_unlock(&child->sighand->siglock);
 }
 
-/*
- * Check that we have indeed attached to the thing..
+/**
+ * ptrace_check_attach - check whether ptracee is ready for ptrace operation
+ * @child: ptracee to check for
+ * @ignore_state: don't check whether @child is currently %TASK_TRACED
+ *
+ * Check whether @child is being ptraced by %current and ready for further
+ * ptrace operations.  If @ignore_state is %false, @child also should be in
+ * %TASK_TRACED state and on return the child is guaranteed to be traced
+ * and not executing.  If @ignore_state is %true, @child can be in any
+ * state.
+ *
+ * CONTEXT:
+ * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ *
+ * RETURNS:
+ * 0 on success, -ESRCH if %child is not ready.
  */
-int ptrace_check_attach(struct task_struct *child, int kill)
+int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
 	int ret = -ESRCH;
 
@@ -119,13 +133,13 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		 */
 		spin_lock_irq(&child->sighand->siglock);
 		WARN_ON_ONCE(task_is_stopped(child));
-		if (task_is_traced(child) || kill)
+		if (task_is_traced(child) || ignore_state)
 			ret = 0;
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !kill)
+	if (!ret && !ignore_state)
 		ret = wait_task_inactive(child, TASK_TRACED) ? 0 : -ESRCH;
 
 	/* All systems go.. */
-- 
1.7.1



* [PATCH 03/19] ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop()
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
  2011-05-24 18:37 ` [PATCH 01/19] job control: rename signal->group_stop and flags to jobctl and rearrange flags Tejun Heo
  2011-05-24 18:37 ` [PATCH 02/19] ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 04/19] job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() Tejun Heo
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

In ptrace_stop(), after the arch hook is done, the task state and
jobctl bits are updated while holding siglock.  The ordering
requirement there is that TASK_TRACED is set before JOBCTL_TRAPPING is
cleared, to prevent the ptracer waiting on TRAPPING from waking up
before TRACED is actually set and seeing TASK_RUNNING in wait(2).

Move set_current_state(TASK_TRACED) to the top of the block and
reorganize comments.  This makes the ordering more obvious
(TASK_TRACED before other updates) and helps future updates to group
stop participation.

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/signal.c |   28 +++++++++++++---------------
 1 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 42d9bbd..eef44fd 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1733,6 +1733,18 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
+	 * We're committing to trapping.  TRACED should be visible before
+	 * TRAPPING is cleared; otherwise, the tracer might fail do_wait().
+	 * Also, transition to TRACED and updates to ->jobctl should be
+	 * atomic with respect to siglock and should be done after the arch
+	 * hook as siglock is released and regrabbed across it.
+	 */
+	set_current_state(TASK_TRACED);
+
+	current->last_siginfo = info;
+	current->exit_code = exit_code;
+
+	/*
 	 * If @why is CLD_STOPPED, we're trapping to participate in a group
 	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
 	 * while siglock was released for the arch hook, PENDING could be
@@ -1742,21 +1754,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
-	current->last_siginfo = info;
-	current->exit_code = exit_code;
-
-	/*
-	 * TRACED should be visible before TRAPPING is cleared; otherwise,
-	 * the tracer might fail do_wait().
-	 */
-	set_current_state(TASK_TRACED);
-
-	/*
-	 * We're committing to trapping.  Clearing JOBCTL_TRAPPING and
-	 * transition to TASK_TRACED should be atomic with respect to
-	 * siglock.  This should be done after the arch hook as siglock is
-	 * released and regrabbed across it.
-	 */
+	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
 	spin_unlock_irq(&current->sighand->siglock);
-- 
1.7.1



* [PATCH 04/19] job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending()
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (2 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 03/19] ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop() Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 05/19] job control: make task_clear_jobctl_pending() clear TRAPPING automatically Tejun Heo
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

This patch introduces JOBCTL_PENDING_MASK and replaces
task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
which takes an extra @mask argument.

JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
future patches will add more bits.  recalc_sigpending_tsk() is updated
to use JOBCTL_PENDING_MASK instead.

task_clear_jobctl_pending() takes @mask, which is a subset of
JOBCTL_PENDING_MASK, and clears the relevant jobctl bits.  If
JOBCTL_STOP_PENDING is included, the other STOP bits are cleared
together.  All task_clear_jobctl_stop_pending() users are updated to
call task_clear_jobctl_pending() with JOBCTL_STOP_PENDING, which is
functionally identical to task_clear_jobctl_stop_pending().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/exec.c             |    2 +-
 include/linux/sched.h |    5 ++++-
 kernel/signal.c       |   27 +++++++++++++++++----------
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 223e3b2..5b83ba9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1735,7 +1735,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 
 	t = start;
 	do {
-		task_clear_jobctl_stop_pending(t);
+		task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 01178e6..e0ce64c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1801,7 +1801,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 
-extern void task_clear_jobctl_stop_pending(struct task_struct *task);
+#define JOBCTL_PENDING_MASK	JOBCTL_STOP_PENDING
+
+extern void task_clear_jobctl_pending(struct task_struct *task,
+				      unsigned int mask);
 
 #ifdef CONFIG_PREEMPT_RCU
 
diff --git a/kernel/signal.c b/kernel/signal.c
index eef44fd..8033d86 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 static int recalc_sigpending_tsk(struct task_struct *t)
 {
-	if ((t->jobctl & JOBCTL_STOP_PENDING) ||
+	if ((t->jobctl & JOBCTL_PENDING_MASK) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -245,18 +245,25 @@ static void task_clear_jobctl_trapping(struct task_struct *task)
 }
 
 /**
- * task_clear_jobctl_stop_pending - clear pending group stop
+ * task_clear_jobctl_pending - clear jobctl pending bits
  * @task: target task
+ * @mask: pending bits to clear
  *
- * Clear group stop states for @task.
+ * Clear @mask from @task->jobctl.  @mask must be subset of
+ * %JOBCTL_PENDING_MASK.  If %JOBCTL_STOP_PENDING is being cleared, other
+ * STOP bits are cleared together.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-void task_clear_jobctl_stop_pending(struct task_struct *task)
+void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask)
 {
-	task->jobctl &= ~(JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME |
-			  JOBCTL_STOP_DEQUEUED);
+	BUG_ON(mask & ~JOBCTL_PENDING_MASK);
+
+	if (mask & JOBCTL_STOP_PENDING)
+		mask |= JOBCTL_STOP_CONSUME | JOBCTL_STOP_DEQUEUED;
+
+	task->jobctl &= ~mask;
 }
 
 /**
@@ -282,7 +289,7 @@ static bool task_participate_group_stop(struct task_struct *task)
 
 	WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING));
 
-	task_clear_jobctl_stop_pending(task);
+	task_clear_jobctl_pending(task, JOBCTL_STOP_PENDING);
 
 	if (!consume)
 		return false;
@@ -810,7 +817,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
 		t = p;
 		do {
-			task_clear_jobctl_stop_pending(t);
+			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
 			wake_up_state(t, __TASK_STOPPED);
 		} while_each_thread(p, t);
@@ -926,7 +933,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			signal->group_stop_count = 0;
 			t = p;
 			do {
-				task_clear_jobctl_stop_pending(t);
+				task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
@@ -1161,7 +1168,7 @@ int zap_other_threads(struct task_struct *p)
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
-		task_clear_jobctl_stop_pending(t);
+		task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 		count++;
 
 		/* Don't bother with already dead threads */
-- 
1.7.1



* [PATCH 05/19] job control: make task_clear_jobctl_pending() clear TRAPPING automatically
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (3 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 04/19] job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 06/19] job control: introduce task_set_jobctl_pending() Tejun Heo
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to
(re)transit into TRACED.  task_clear_jobctl_trapping() must be called
when either tracee enters TRACED or the transition is cancelled for
some reason.  The former is achieved by explicitly calling
task_clear_jobctl_trapping() in ptrace_stop() and the latter by calling
it at the end of do_signal_stop().

Calling task_clear_jobctl_trapping() at the end of do_signal_stop()
limits the scope in which TRAPPING can be used and is fragile in that
seemingly unrelated changes to tracee's control flow can lead to a
stuck TRAPPING.

We already have task_clear_jobctl_pending() calls on those cancelling
events to clear JOBCTL_STOP_PENDING.  Cancellations can be handled by
making those call sites use JOBCTL_PENDING_MASK instead and updating
task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is
called automatically if no stop/trap is pending.

This patch makes the above changes and removes the fallback
task_clear_jobctl_trapping() call from do_signal_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/exec.c       |    2 +-
 kernel/signal.c |   13 ++++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 5b83ba9..eaff88c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1735,7 +1735,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 
 	t = start;
 	do {
-		task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
+		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
 		if (t != current && t->mm) {
 			sigaddset(&t->pending.signal, SIGKILL);
 			signal_wake_up(t, 1);
diff --git a/kernel/signal.c b/kernel/signal.c
index 8033d86..16826c4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -253,6 +253,9 @@ static void task_clear_jobctl_trapping(struct task_struct *task)
  * %JOBCTL_PENDING_MASK.  If %JOBCTL_STOP_PENDING is being cleared, other
  * STOP bits are cleared together.
  *
+ * If clearing of @mask leaves no stop or trap pending, this function calls
+ * task_clear_jobctl_trapping().
+ *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
@@ -264,6 +267,9 @@ void task_clear_jobctl_pending(struct task_struct *task, unsigned int mask)
 		mask |= JOBCTL_STOP_CONSUME | JOBCTL_STOP_DEQUEUED;
 
 	task->jobctl &= ~mask;
+
+	if (!(task->jobctl & JOBCTL_PENDING_MASK))
+		task_clear_jobctl_trapping(task);
 }
 
 /**
@@ -933,7 +939,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
 			signal->group_stop_count = 0;
 			t = p;
 			do {
-				task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
+				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
 				sigaddset(&t->pending.signal, SIGKILL);
 				signal_wake_up(t, 1);
 			} while_each_thread(p, t);
@@ -1168,7 +1174,7 @@ int zap_other_threads(struct task_struct *p)
 	p->signal->group_stop_count = 0;
 
 	while_each_thread(p, t) {
-		task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
+		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
 		count++;
 
 		/* Don't bother with already dead threads */
@@ -1964,9 +1970,6 @@ retry:
 		goto retry;
 	}
 
-	/* PTRACE_ATTACH might have raced with task killing, clear trapping */
-	task_clear_jobctl_trapping(current);
-
 	spin_unlock_irq(&current->sighand->siglock);
 
 	tracehook_finish_jctl();
-- 
1.7.1



* [PATCH 06/19] job control: introduce task_set_jobctl_pending()
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (4 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 05/19] job control: make task_clear_jobctl_pending() clear TRAPPING automatically Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit Tejun Heo
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
pending bits too.  Setting pending conditions on a dying task may make
the task unkillable.  Currently, each setting site is responsible for
checking for the condition but with to-be-added job control traps this
becomes too fragile.

This patch adds task_set_jobctl_pending() which should be used when
setting task->jobctl bits to schedule a stop or trap.  The function
performs the following to ease setting pending bits.

* Sanity checks.

* If a fatal signal is pending or PF_EXITING is set, no bit is set.

* STOP_SIGMASK is automatically cleared if a new value is being set.

do_signal_stop() and ptrace_attach() are updated to use
task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
The code around the setting sites is restructured to fit
task_set_jobctl_pending() better, but there should be no
userland-visible behavior difference.
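
As an illustration only (not part of this patch), a caller-side sketch
of how a stop would be scheduled with the new helper, mirroring the
do_signal_stop() update below:

	/* illustrative caller sketch -- not from the patch */
	static void sketch_schedule_group_stop(struct task_struct *t, int signr)
	{
		unsigned int gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;

		spin_lock_irq(&t->sighand->siglock);
		/* bits are set only if @t isn't dying; wake it only then */
		if (task_set_jobctl_pending(t, signr | gstop))
			signal_wake_up(t, 0);
		spin_unlock_irq(&t->sighand->siglock);
	}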

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |    2 ++
 kernel/ptrace.c       |    5 +++--
 kernel/signal.c       |   46 ++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e0ce64c..8519614 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1803,6 +1803,8 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 
 #define JOBCTL_PENDING_MASK	JOBCTL_STOP_PENDING
 
+extern bool task_set_jobctl_pending(struct task_struct *task,
+				    unsigned int mask);
 extern void task_clear_jobctl_pending(struct task_struct *task,
 				      unsigned int mask);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index abf6383..b44dc43 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -257,8 +257,9 @@ static int ptrace_attach(struct task_struct *task)
 	 * The following task_is_stopped() test is safe as both transitions
 	 * in and out of STOPPED are protected by siglock.
 	 */
-	if (task_is_stopped(task)) {
-		task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
+	if (task_is_stopped(task) &&
+	    task_set_jobctl_pending(task,
+				    JOBCTL_STOP_PENDING | JOBCTL_TRAPPING)) {
 		signal_wake_up(task, 1);
 		wait_trap = true;
 	}
diff --git a/kernel/signal.c b/kernel/signal.c
index 16826c4..17feedf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -224,6 +224,39 @@ static inline void print_dropped_signal(int sig)
 }
 
 /**
+ * task_set_jobctl_pending - set jobctl pending bits
+ * @task: target task
+ * @mask: pending bits to set
+ *
+ * Clear @mask from @task->jobctl.  @mask must be subset of
+ * %JOBCTL_PENDING_MASK | %JOBCTL_STOP_CONSUME | %JOBCTL_STOP_SIGMASK |
+ * %JOBCTL_TRAPPING.  If stop signo is being set, the existing signo is
+ * cleared.  If @task is already being killed or exiting, this function
+ * becomes noop.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ *
+ * RETURNS:
+ * %true if @mask is set, %false if made noop because @task was dying.
+ */
+bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask)
+{
+	BUG_ON(mask & ~(JOBCTL_PENDING_MASK | JOBCTL_STOP_CONSUME |
+			JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
+	BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
+
+	if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING)))
+		return false;
+
+	if (mask & JOBCTL_STOP_SIGMASK)
+		task->jobctl &= ~JOBCTL_STOP_SIGMASK;
+
+	task->jobctl |= mask;
+	return true;
+}
+
+/**
  * task_clear_jobctl_trapping - clear jobctl trapping bit
  * @task: target task
  *
@@ -1902,19 +1935,20 @@ static int do_signal_stop(int signr)
 		else
 			WARN_ON_ONCE(!task_ptrace(current));
 
-		current->jobctl &= ~JOBCTL_STOP_SIGMASK;
-		current->jobctl |= signr | gstop;
-		sig->group_stop_count = 1;
+		sig->group_stop_count = 0;
+
+		if (task_set_jobctl_pending(current, signr | gstop))
+			sig->group_stop_count++;
+
 		for (t = next_thread(current); t != current;
 		     t = next_thread(t)) {
-			t->jobctl &= ~JOBCTL_STOP_SIGMASK;
 			/*
 			 * Setting state to TASK_STOPPED for a group
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
-			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
-				t->jobctl |= signr | gstop;
+			if (!task_is_stopped(t) &&
+			    task_set_jobctl_pending(t, signr | gstop)) {
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
 			}
-- 
1.7.1



* [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (5 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 06/19] job control: introduce task_set_jobctl_pending() Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 19:03   ` Linus Torvalds
  2011-05-24 18:37 ` [PATCH 08/19] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach() Tejun Heo
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
->wait_chldexit was already complicated with waker-side filtering even
without adding TRAPPING wait on top of it.  Also, it unnecessarily
made TRAPPING clearing depend on the current ptrace relationship - if
the ptracee is detached, the wakeup is lost.

There is no reason to use signal->wait_chldexit here.  We're just
waiting for the JOBCTL_TRAPPING bit to clear and, given the relatively
infrequent use of ptrace, bit_waitqueue can serve it perfectly.

This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
signal->wait_chldexit.
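
For reference, a stripped-down sketch of the bit-waitqueue idiom being
switched to (illustrative only; it uses the action-callback form of
wait_on_bit() that the patch below relies on):

	/* sleeper side: block until the TRAPPING bit clears */
	static int sketch_trapping_sleep_fn(void *word)
	{
		schedule();
		return 0;	/* 0: keep waiting, nonzero: abort the wait */
	}

	wait_on_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING),
		    sketch_trapping_sleep_fn, TASK_UNINTERRUPTIBLE);

	/* waker side: clear the bit first, then wake the bit waiters */
	task->jobctl &= ~JOBCTL_TRAPPING;
	wake_up_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING));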

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   10 ++++++++--
 kernel/signal.c |    3 +--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b44dc43..3be5d1b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -25,6 +25,12 @@
 #include <linux/hw_breakpoint.h>
 
 
+static int ptrace_trapping_sleep_fn(void *flags)
+{
+	schedule();
+	return 0;
+}
+
 /*
  * ptrace a task: make the debugger its new parent and
  * move it to the ptrace list.
@@ -273,8 +279,8 @@ unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
 	if (wait_trap)
-		wait_event(current->signal->wait_chldexit,
-			   !(task->jobctl & JOBCTL_TRAPPING));
+		wait_on_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING),
+			    ptrace_trapping_sleep_fn, TASK_UNINTERRUPTIBLE);
 	return retval;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 17feedf..106b47e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -272,8 +272,7 @@ static void task_clear_jobctl_trapping(struct task_struct *task)
 {
 	if (unlikely(task->jobctl & JOBCTL_TRAPPING)) {
 		task->jobctl &= ~JOBCTL_TRAPPING;
-		__wake_up_sync_key(&task->parent->signal->wait_chldexit,
-				   TASK_UNINTERRUPTIBLE, 1, task);
+		wake_up_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING));
 	}
 }
 
-- 
1.7.1



* [PATCH 08/19] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (6 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 09/19] ptrace: make TRAPPING wait interruptible Tejun Heo
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

Currently, JOBCTL_TRAPPING is used by PTRACE_ATTACH and SEIZE to hide
TASK_STOPPED -> TRACED transition from ptracer.  If tracee is in group
stop, TRAPPING is set, tracee is kicked and tracer waits for the
transition to complete before completing attach.  This prevents tracer
from seeing tracee during transition.

The transition is visible only through wait(2) and following ptrace(2)
requests.  Without TRAPPING, a WNOHANG wait which should succeed right
after attach (when the tracer knows the tracee was stopped) might fail,
and likewise for the following ptrace requests.

TRAPPING will also be used to implement ptrace notification re-traps,
which can be initiated by tasks other than tracer.  To allow this,
this patch moves TRAPPING wait from attach completion path to
operations which are actually affected by the transition - wait(2) and
following ptrace(2) requests.

As reliably checking and modifying TASK_STOPPED/TRACED transition
together with JOBCTL_TRAPPING requires siglock, and both ptrace and
wait paths are holding tasklist_lock and siglock where the TRAPPING
check is needed, ptrace_wait_trapping() assumes both locks are held on
entry and releases them if it actually had to wait for TRAPPING.

Both wait and ptrace paths are updated to retry the operation after
TRAPPING wait.  Note that wait_task_stopped() now always grabs siglock
for ptrace waits.  This can be avoided with "task_stopped_code() ->
rmb() -> TRAPPING -> rmb() -> task_stopped_code()" sequence but given
that ptrace isn't particularly sensitive to performance or
scalability, choosing simpler implementation seems better.

Both ptrace(2) and wait(2) use -ERESTART* to retry after waiting for
TRAPPING.  This simplifies the implementation and will be useful when
TRAPPING sleep is converted to be interruptible.

Note that, after this change, PTRACE_ATTACH may return before the
transition completes and the ptracer might see the tracee in transient
TASK_RUNNING state via /proc/PID/stat; however, wait(2) and the
following ptrace requests would behave correctly regardless.  This is a
userland-visible behavior change.

-v2: wait_task_stopped() now returns -ERESTARTSYS instead of
     -ERESTARTNOINTR if !WNOHANG, so that retry follows SA_RESTART.
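
To spell out the retry rule in the -v2 note (editorial sketch, not
part of the patch; the -ERESTART* semantics are standard kernel
behavior):

	if (wo->wo_flags & WNOHANG)
		/* restarted regardless of handler flags; userland never
		 * sees an error from the hidden TRAPPING wait */
		return -ERESTARTNOINTR;
	else
		/* restarted per the usual SA_RESTART rules; otherwise
		 * wait(2) returns -EINTR as for any interrupted wait */
		return -ERESTARTSYS;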

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    1 +
 kernel/exit.c          |   27 ++++++++++++++++++-
 kernel/ptrace.c        |   65 ++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index e93ef1a..bde0be4 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -105,6 +105,7 @@ extern long arch_ptrace(struct task_struct *child, long request,
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
 extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long dst, int len);
 extern void ptrace_disable(struct task_struct *);
+extern bool ptrace_wait_trapping(struct task_struct *child);
 extern int ptrace_check_attach(struct task_struct *task, bool ignore_state);
 extern int ptrace_request(struct task_struct *child, long request,
 			  unsigned long addr, unsigned long data);
diff --git a/kernel/exit.c b/kernel/exit.c
index 20a4064..aecba55 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1409,15 +1409,38 @@ static int wait_task_stopped(struct wait_opts *wo,
 	if (!ptrace && !(wo->wo_flags & WUNTRACED))
 		return 0;
 
-	if (!task_stopped_code(p, ptrace))
+	/*
+	 * For ptrace waits, we can't reliably check whether wait condition
+	 * exists without grabbing siglock due to JOBCTL_TRAPPING
+	 * transitions.  A task might be temporarily in TASK_RUNNING while
+	 * trapping which should be transparent to the ptracer.
+	 *
+	 * Note that we can avoid unconditionally grabbing siglock by
+	 * wrapping TRAPPING test with two rmb's; however, let's stick with
+	 * simpler implementation for now.
+	 */
+	if (!ptrace && !(p->signal->flags & SIGNAL_STOP_STOPPED))
 		return 0;
 
 	exit_code = 0;
 	spin_lock_irq(&p->sighand->siglock);
 
 	p_code = task_stopped_code(p, ptrace);
-	if (unlikely(!p_code))
+	if (unlikely(!p_code)) {
+		/*
+		 * If trapping, wait for it and retry.  If WNOHANG, -EINTR
+		 * shouldn't happen and syscall must be retried; otherwise,
+		 * follow SA_RESTART.
+		 */
+		if (ptrace && ptrace_wait_trapping(p)) {
+			restart_syscall();
+			if (wo->wo_flags & WNOHANG)
+				return -ERESTARTNOINTR;
+			else
+				return -ERESTARTSYS;
+		}
 		goto unlock_sig;
+	}
 
 	exit_code = *p_code;
 	if (!exit_code)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 3be5d1b..14aedcf 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -31,6 +31,47 @@ static int ptrace_trapping_sleep_fn(void *flags)
 	return 0;
 }
 
+/**
+ * ptrace_wait_trapping - wait ptracee to finish %TASK_TRACED/STOPPED transition
+ * @child: child to wait for
+ *
+ * There are cases where ptracer needs to ask the ptracee to [re]enter
+ * %TASK_TRACED which involves the tracee going through %TASK_RUNNING
+ * briefly, which could affect operation of ptrace(2) and wait(2).
+ *
+ * %JOBCTL_TRAPPING is used to hide such transitions from the ptracer.
+ * It's set when such transition is initiated by the ptracer and cleared on
+ * completion.  Operations which may be affected should call this function
+ * to make sure no transition is in progress before proceeding.
+ *
+ * This function checks whether @child is TRAPPING and, if so, waits for
+ * the transition to complete.
+ *
+ * CONTEXT:
+ * read_lock(&tasklist_lock) and spin_lock_irq(&child->sighand->siglock).
+ * On %true return, both locks are released and the function might have
+ * slept.
+ *
+ * RETURNS:
+ * %false if @child wasn't trapping and nothing happened.  %true if waited
+ * for trapping transition and released both locks.
+ */
+bool ptrace_wait_trapping(struct task_struct *child)
+	__releases(&child->sighand->siglock)
+	__releases(&tasklist_lock)
+{
+	if (likely(!(child->jobctl & JOBCTL_TRAPPING)))
+		return false;
+
+	spin_unlock_irq(&child->sighand->siglock);
+	get_task_struct(child);
+	read_unlock(&tasklist_lock);
+	wait_on_bit(&child->jobctl, ilog2(JOBCTL_TRAPPING),
+		    ptrace_trapping_sleep_fn, TASK_UNINTERRUPTIBLE);
+	put_task_struct(child);
+	return true;
+}
+
 /*
  * ptrace a task: make the debugger its new parent and
  * move it to the ptrace list.
@@ -141,6 +182,8 @@ int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 		WARN_ON_ONCE(task_is_stopped(child));
 		if (task_is_traced(child) || ignore_state)
 			ret = 0;
+		else if (ptrace_wait_trapping(child))
+			return restart_syscall();
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
@@ -204,7 +247,6 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 
 static int ptrace_attach(struct task_struct *task)
 {
-	bool wait_trap = false;
 	int retval;
 
 	audit_ptrace(task);
@@ -250,25 +292,21 @@ static int ptrace_attach(struct task_struct *task)
 	 * If the task is already STOPPED, set JOBCTL_STOP_PENDING and
 	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
 	 * will be cleared if the child completes the transition or any
-	 * event which clears the group stop states happens.  We'll wait
-	 * for the transition to complete before returning from this
-	 * function.
+	 * event which clears the group stop states happens.
 	 *
-	 * This hides STOPPED -> RUNNING -> TRACED transition from the
-	 * attaching thread but a different thread in the same group can
-	 * still observe the transient RUNNING state.  IOW, if another
-	 * thread's WNOHANG wait(2) on the stopped tracee races against
-	 * ATTACH, the wait(2) may fail due to the transient RUNNING.
+	 * This is to hide STOPPED -> RUNNING -> TRACED transition from
+	 * wait(2) and ptrace(2).  If called before the transition is
+	 * complete, both will wait for TRAPPING to be cleared and retry,
+	 * thus hiding the transition from userland; however, the transient
+	 * RUNNING state is still visible through /proc.
 	 *
 	 * The following task_is_stopped() test is safe as both transitions
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
 	    task_set_jobctl_pending(task,
-				    JOBCTL_STOP_PENDING | JOBCTL_TRAPPING)) {
+				    JOBCTL_STOP_PENDING | JOBCTL_TRAPPING))
 		signal_wake_up(task, 1);
-		wait_trap = true;
-	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -278,9 +316,6 @@ unlock_tasklist:
 unlock_creds:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out:
-	if (wait_trap)
-		wait_on_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING),
-			    ptrace_trapping_sleep_fn, TASK_UNINTERRUPTIBLE);
 	return retval;
 }
 
-- 
1.7.1



* [PATCH 09/19] ptrace: make TRAPPING wait interruptible
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (7 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 08/19] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach() Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 10/19] signal: remove three noop tracehooks Tejun Heo
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

With JOBCTL_TRAPPING waits moved to wait_task_stopped() and
ptrace_check_attach(), and both using full syscall retries after wait,
TRAPPING wait can switch to interruptible sleeps.

As all transitions are interlocked and all cancellation events
(supposedly) clear TRAPPING, this doesn't change the actual behavior
but it makes the TRAPPING wait mechanism much more forgiving when
something goes wrong and allows using TRAPPING waits across freezing
points.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 14aedcf..71e1034 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -28,7 +28,7 @@
 static int ptrace_trapping_sleep_fn(void *flags)
 {
 	schedule();
-	return 0;
+	return signal_pending(current);
 }
 
 /**
@@ -45,7 +45,11 @@ static int ptrace_trapping_sleep_fn(void *flags)
  * to make sure no transition is in progress before proceeding.
  *
  * This function checks whether @child is TRAPPING and, if so, waits for
- * the transition to complete.
+ * the transition to complete.  Interruptible sleep is used for waiting and
+ * %true will be returned regardless of why it is woken up.  On %true
+ * return, callers should ensure that the whole operation is restarted
+ * using the syscall restart mechanism so that operations like freezing or
+ * killing don't get blocked by TRAPPING waits.
  *
  * CONTEXT:
  * read_lock(&tasklist_lock) and spin_lock_irq(&child->sighand->siglock).
@@ -67,7 +71,7 @@ bool ptrace_wait_trapping(struct task_struct *child)
 	get_task_struct(child);
 	read_unlock(&tasklist_lock);
 	wait_on_bit(&child->jobctl, ilog2(JOBCTL_TRAPPING),
-		    ptrace_trapping_sleep_fn, TASK_UNINTERRUPTIBLE);
+		    ptrace_trapping_sleep_fn, TASK_INTERRUPTIBLE);
 	put_task_struct(child);
 	return true;
 }
-- 
1.7.1



* [PATCH 10/19] signal: remove three noop tracehooks
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (8 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 09/19] ptrace: make TRAPPING wait interruptible Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 11/19] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

Remove the following three noop tracehooks used in signal.c.

* tracehook_force_sigpending()
* tracehook_get_signal()
* tracehook_finish_jctl()

The code area is about to be updated and these hooks don't do anything
other than obfuscate the logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/tracehook.h |   52 ---------------------------------------------
 kernel/signal.c           |   44 ++++++++++++--------------------------
 2 files changed, 14 insertions(+), 82 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index e95f523..15745cd 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -425,58 +425,6 @@ static inline int tracehook_consider_fatal_signal(struct task_struct *task,
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
 
-/**
- * tracehook_force_sigpending - let tracing force signal_pending(current) on
- *
- * Called when recomputing our signal_pending() flag.  Return nonzero
- * to force the signal_pending() flag on, so that tracehook_get_signal()
- * will be called before the next return to user mode.
- *
- * Called with @current->sighand->siglock held.
- */
-static inline int tracehook_force_sigpending(void)
-{
-	return 0;
-}
-
-/**
- * tracehook_get_signal - deliver synthetic signal to traced task
- * @task:		@current
- * @regs:		task_pt_regs(@current)
- * @info:		details of synthetic signal
- * @return_ka:		sigaction for synthetic signal
- *
- * Return zero to check for a real pending signal normally.
- * Return -1 after releasing the siglock to repeat the check.
- * Return a signal number to induce an artificial signal delivery,
- * setting *@info and *@return_ka to specify its details and behavior.
- *
- * The @return_ka->sa_handler value controls the disposition of the
- * signal, no matter the signal number.  For %SIG_DFL, the return value
- * is a representative signal to indicate the behavior (e.g. %SIGTERM
- * for death, %SIGQUIT for core dump, %SIGSTOP for job control stop,
- * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number
- * reported will be @info->si_signo instead.
- *
- * Called with @task->sighand->siglock held, before dequeuing pending signals.
- */
-static inline int tracehook_get_signal(struct task_struct *task,
-				       struct pt_regs *regs,
-				       siginfo_t *info,
-				       struct k_sigaction *return_ka)
-{
-	return 0;
-}
-
-/**
- * tracehook_finish_jctl - report about return from job control stop
- *
- * This is called by do_signal_stop() after wakeup.
- */
-static inline void tracehook_finish_jctl(void)
-{
-}
-
 #define DEATH_REAP			-1
 #define DEATH_DELAYED_GROUP_LEADER	-2
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 106b47e..85df25e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -150,9 +150,7 @@ void recalc_sigpending_and_wake(struct task_struct *t)
 
 void recalc_sigpending(void)
 {
-	if (unlikely(tracehook_force_sigpending()))
-		set_thread_flag(TIF_SIGPENDING);
-	else if (!recalc_sigpending_tsk(current) && !freezing(current))
+	if (!recalc_sigpending_tsk(current) && !freezing(current))
 		clear_thread_flag(TIF_SIGPENDING);
 
 }
@@ -2005,8 +2003,6 @@ retry:
 
 	spin_unlock_irq(&current->sighand->siglock);
 
-	tracehook_finish_jctl();
-
 	return 1;
 }
 
@@ -2109,37 +2105,25 @@ relock:
 
 	for (;;) {
 		struct k_sigaction *ka;
-		/*
-		 * Tracing can induce an artificial signal and choose sigaction.
-		 * The return value in @signr determines the default action,
-		 * but @info->si_signo is the signal number we will report.
-		 */
-		signr = tracehook_get_signal(current, regs, info, return_ka);
-		if (unlikely(signr < 0))
+
+		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
+		    do_signal_stop(0))
 			goto relock;
-		if (unlikely(signr != 0))
-			ka = return_ka;
-		else {
-			if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
-			    do_signal_stop(0))
-				goto relock;
 
-			signr = dequeue_signal(current, &current->blocked,
-					       info);
+		signr = dequeue_signal(current, &current->blocked, info);
 
-			if (!signr)
-				break; /* will return 0 */
+		if (!signr)
+			break; /* will return 0 */
 
-			if (signr != SIGKILL) {
-				signr = ptrace_signal(signr, info,
-						      regs, cookie);
-				if (!signr)
-					continue;
-			}
-
-			ka = &sighand->action[signr-1];
+		if (signr != SIGKILL) {
+			signr = ptrace_signal(signr, info,
+					      regs, cookie);
+			if (!signr)
+				continue;
 		}
 
+		ka = &sighand->action[signr-1];
+
 		/* Trace actually delivered signals. */
 		trace_signal_deliver(signr, info, ka);
 
-- 
1.7.1



* [PATCH 11/19] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (9 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 10/19] signal: remove three noop tracehooks Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 12/19] ptrace: implement PTRACE_SEIZE Tejun Heo
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

do_signal_stop() implemented both normal group stop and the trap for
group stop while ptraced.  This approach has been sufficient, but
scheduled changes require a trap mechanism which can be used in a more
generic manner, and reusing the group stop trap as the generic trap
site simplifies both the userland-visible interface and the
implementation.

This patch adds a new jobctl flag - JOBCTL_TRAP_STOP.  When set, it
triggers a trap site, which behaves like the group stop trap, in
get_signal_to_deliver() before checking for pending signals.  While
ptraced, do_signal_stop() doesn't stop itself.  It initiates group stop
if requested, schedules JOBCTL_TRAP_STOP and returns, which makes its
caller - get_signal_to_deliver() - relock, check and enter the trap.

Although this adds an extra unlock-relock cycle between checking
JOBCTL_STOP_PENDING and actually trapping for STOP, it doesn't affect
correctness.  ptrace_stop() already had a conditional unlock-relock
depending on the arch and, if SIGCONT is generated in between, it's
ignored as if it were received after the task entered TASK_TRACED.
The extra unlock-relock follows the same rule and the race window will
be handled properly by the notification mechanism which will be added
later.

ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
after detach.

While at it, add proper function comment to do_signal_stop() and make
it return bool.

-v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
     instead of JOBCTL_PENDING_MASK.  This avoids accidentally
     clearing JOBCTL_STOP_CONSUME.  Spotted by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |    5 ++-
 kernel/ptrace.c       |   12 +++++--
 kernel/signal.c       |   82 +++++++++++++++++++++++++++++++------------------
 3 files changed, 65 insertions(+), 34 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8519614..9a0e1bc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1799,12 +1799,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_DEQUEUED	(1 << 16) /* stop signal dequeued */
 #define JOBCTL_STOP_PENDING	(1 << 17) /* task should stop for group stop */
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
+#define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 
-#define JOBCTL_PENDING_MASK	JOBCTL_STOP_PENDING
+#define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
+#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern bool task_set_jobctl_pending(struct task_struct *task,
 				    unsigned int mask);
+extern void task_clear_jobctl_trapping(struct task_struct *task);
 extern void task_clear_jobctl_pending(struct task_struct *task,
 				      unsigned int mask);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 71e1034..e75d335 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -128,6 +128,13 @@ void __ptrace_unlink(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 
 	/*
+	 * Clear all pending traps and TRAPPING.  TRAPPING should be
+	 * cleared regardless of JOBCTL_STOP_PENDING.  Do it explicitly.
+	 */
+	task_clear_jobctl_pending(child, JOBCTL_TRAP_MASK);
+	task_clear_jobctl_trapping(child);
+
+	/*
 	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
 	 * @child isn't dead.
 	 */
@@ -293,7 +300,7 @@ static int ptrace_attach(struct task_struct *task)
 	spin_lock(&task->sighand->siglock);
 
 	/*
-	 * If the task is already STOPPED, set JOBCTL_STOP_PENDING and
+	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
 	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
 	 * will be cleared if the child completes the transition or any
 	 * event which clears the group stop states happens.
@@ -308,8 +315,7 @@ static int ptrace_attach(struct task_struct *task)
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task,
-				    JOBCTL_STOP_PENDING | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
 		signal_wake_up(task, 1);
 
 	spin_unlock(&task->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index 85df25e..38e7d2f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -266,7 +266,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned int mask)
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
  */
-static void task_clear_jobctl_trapping(struct task_struct *task)
+void task_clear_jobctl_trapping(struct task_struct *task)
 {
 	if (unlikely(task->jobctl & JOBCTL_TRAPPING)) {
 		task->jobctl &= ~JOBCTL_TRAPPING;
@@ -1790,13 +1790,16 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	/*
 	 * If @why is CLD_STOPPED, we're trapping to participate in a group
 	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delivered
-	 * while siglock was released for the arch hook, PENDING could be
-	 * clear now.  We act as if SIGCONT is received after TASK_TRACED
-	 * is entered - ignore it.
+	 * across siglock relocks since INTERRUPT was scheduled, PENDING
+	 * could be clear now.  We act as if SIGCONT is received after
+	 * TASK_TRACED is entered - ignore it.
 	 */
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
+	/* any trap clears pending STOP trap */
+	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
@@ -1888,13 +1891,30 @@ void ptrace_notify(int exit_code)
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
-/*
- * This performs the stopping for SIGSTOP and other stop signals.
- * We have to stop all threads in the thread group.
- * Returns non-zero if we've actually stopped and released the siglock.
- * Returns zero if we didn't stop and still hold the siglock.
+/**
+ * do_signal_stop - handle group stop for SIGSTOP and other stop signals
+ * @signr: signr causing group stop if initiating
+ *
+ * If %JOBCTL_STOP_PENDING is not set yet, initiate group stop with @signr
+ * and participate in it.  If already set, participate in the existing
+ * group stop.  If participated in a group stop (and thus slept), %true is
+ * returned with siglock released.
+ *
+ * If ptraced, this function doesn't handle stop itself.  Instead,
+ * %JOBCTL_TRAP_STOP is scheduled and %true is returned with siglock
+ * released.  The caller must ensure that INTERRUPT trap handling takes
+ * place afterwards.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held, which is released
+ * on %true return.
+ *
+ * RETURNS:
+ * %false if group stop is already cancelled and nothing happened.  %true
+ * if participated in group stop.
  */
-static int do_signal_stop(int signr)
+static bool do_signal_stop(int signr)
+	__releases(&current->sighand->siglock)
 {
 	struct signal_struct *sig = current->signal;
 
@@ -1907,7 +1927,7 @@ static int do_signal_stop(int signr)
 
 		if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
-			return 0;
+			return false;
 		/*
 		 * There is no group stop already in progress.  We must
 		 * initiate one now.
@@ -1951,7 +1971,7 @@ static int do_signal_stop(int signr)
 			}
 		}
 	}
-retry:
+
 	if (likely(!task_ptrace(current))) {
 		int notify = 0;
 
@@ -1983,27 +2003,16 @@ retry:
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		schedule();
-
-		spin_lock_irq(&current->sighand->siglock);
 	} else {
-		ptrace_stop(current->jobctl & JOBCTL_STOP_SIGMASK,
-			    CLD_STOPPED, 0, NULL);
-		current->exit_code = 0;
-	}
-
-	/*
-	 * JOBCTL_STOP_PENDING could be set if another group stop has
-	 * started since being woken up or ptrace wants us to transit
-	 * between TASK_STOPPED and TRACED.  Retry group stop.
-	 */
-	if (current->jobctl & JOBCTL_STOP_PENDING) {
-		WARN_ON_ONCE(!(current->jobctl & JOBCTL_STOP_SIGMASK));
-		goto retry;
+		/*
+		 * While ptraced, group stop is handled by STOP trap.
+		 * Schedule it and let the caller deal with it.
+		 */
+		task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
+		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-	spin_unlock_irq(&current->sighand->siglock);
-
-	return 1;
+	return true;
 }
 
 static int ptrace_signal(int signr, siginfo_t *info,
@@ -2103,6 +2112,19 @@ relock:
 		goto relock;
 	}
 
+	/*
+	 * Take care of ptrace jobctl traps.  It currently is only used to
+	 * trap for group stop while ptraced.
+	 */
+	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
+		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
+		WARN_ON_ONCE(!signr);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
+		current->exit_code = 0;
+		spin_unlock_irq(&sighand->siglock);
+		goto relock;
+	}
+
 	for (;;) {
 		struct k_sigaction *ka;
 
-- 
1.7.1



* [PATCH 12/19] ptrace: implement PTRACE_SEIZE
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (10 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 11/19] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 13/19] ptrace: implement PTRACE_INTERRUPT Tejun Heo
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

PTRACE_ATTACH implicitly issues SIGSTOP on attach, which has side
effects on the tracee's signal and job control states.  This patch
implements a new ptrace request, PTRACE_SEIZE, which attaches to the
tracee without trapping it and without affecting its signal and job
control states.

The usage is the same as PTRACE_ATTACH, but it takes PTRACE_SEIZE_*
flags in @data.  Currently, the only defined flag is
PTRACE_SEIZE_DEVEL, which is a temporary flag to enable PTRACE_SEIZE.
PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
The changes will be implemented gradually, and the DEVEL flag is there
to prevent programs which expect full SEIZE behavior from using it
before all the behavior modifications are complete, while still
allowing unit testing.  The flag will be removed once SEIZE behaviors
are completely implemented.
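
As a minimal illustration only (not part of the patch): a caller that
omits the DEVEL flag can detect this interim restriction by checking
for EIO.  This assumes PTRACE_SEIZE is defined as in the test program
below.

  #include <errno.h>
  #include <stdio.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>

  static int try_seize(pid_t tracee, unsigned long flags)
  {
	  /* fails with -EIO unless PTRACE_SEIZE_DEVEL is set in @flags */
	  if (ptrace(PTRACE_SEIZE, tracee, NULL, (void *)flags) == 0)
		  return 0;
	  if (errno == EIO)
		  fprintf(stderr, "SEIZE needs PTRACE_SEIZE_DEVEL for now\n");
	  return -1;
  }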

* PTRACE_SEIZE, unlike ATTACH, doesn't force the tracee to trap.  After
  attaching, the tracee continues to run unless a trap condition occurs.

* PTRACE_SEIZE doesn't affect signal or group stop state.

* If PTRACE_SEIZE'd, group stop uses the PTRACE_EVENT_STOP trap, which
  uses an exit_code of (SIGTRAP | PTRACE_EVENT_STOP << 8) instead of
  the stopping signal number, and returns the usual trap siginfo on
  PTRACE_GETSIGINFO instead of NULL.

Note that there currently is no way to find out the stopping signal
number while seized.  This will be improved by future patches.

Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
used to determine whether new SEIZE behaviors should be enabled.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts100ms = { .tv_nsec = 100000000 };
  static const struct timespec ts1s = { .tv_sec = 1 };
  static const struct timespec ts3s = { .tv_sec = 3 };

  int main(int argc, char **argv)
  {
	  pid_t tracee;

	  tracee = fork();
	  if (tracee == 0) {
		  nanosleep(&ts100ms, NULL);
		  while (1) {
			  printf("tracee: alive\n");
			  nanosleep(&ts1s, NULL);
		  }
	  }

	  if (argc > 1)
		  kill(tracee, SIGSTOP);

	  nanosleep(&ts100ms, NULL);

	  ptrace(PTRACE_SEIZE, tracee, NULL,
		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  if (argc > 1) {
		  waitid(P_PID, tracee, NULL, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
	  }
	  nanosleep(&ts3s, NULL);
	  printf("tracer: exiting\n");
	  return 0;
  }

When the above program is called w/o argument, tracee is seized while
running and remains running.  When tracer exits, tracee continues to
run and print out messages.

  # ./test-seize-simple
  tracee: alive
  tracee: alive
  tracee: alive
  tracer: exiting
  tracee: alive
  tracee: alive

When called with an argument, tracee is seized from stopped state and
continued, and returns to stopped state when tracer exits.

  # ./test-seize
  tracee: alive
  tracee: alive
  tracee: alive
  tracer: exiting
  # ps -el|grep test-seize
  1 T     0  4720     1  0  80   0 -   941 signal ttyS0    00:00:00 test-seize

-v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
     suggested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 include/linux/ptrace.h |    7 +++++++
 kernel/ptrace.c        |   35 +++++++++++++++++++++++++++++------
 kernel/signal.c        |   32 ++++++++++++++++++++++++--------
 3 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index bde0be4..cfb6c97 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -47,6 +47,11 @@
 #define PTRACE_GETREGSET	0x4204
 #define PTRACE_SETREGSET	0x4205
 
+#define PTRACE_SEIZE		0x4206
+
+/* flags in @data for PTRACE_SEIZE */
+#define PTRACE_SEIZE_DEVEL	0x80000000 /* temp flag for development */
+
 /* options set using PTRACE_SETOPTIONS */
 #define PTRACE_O_TRACESYSGOOD	0x00000001
 #define PTRACE_O_TRACEFORK	0x00000002
@@ -65,6 +70,7 @@
 #define PTRACE_EVENT_EXEC	4
 #define PTRACE_EVENT_VFORK_DONE	5
 #define PTRACE_EVENT_EXIT	6
+#define PTRACE_EVENT_STOP	7
 
 #include <asm/ptrace.h>
 
@@ -77,6 +83,7 @@
  * flags.  When the a task is stopped the ptracer owns task->ptrace.
  */
 
+#define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 #define PT_TRACESYSGOOD	0x00000004
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index e75d335..132857a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -256,10 +256,28 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 	return !err;
 }
 
-static int ptrace_attach(struct task_struct *task)
+static int ptrace_attach(struct task_struct *task, long request,
+			 unsigned long flags)
 {
+	bool seize = (request == PTRACE_SEIZE);
 	int retval;
 
+	/*
+	 * SEIZE will enable new ptrace behaviors which will be implemented
+	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * expecting full SEIZE behaviors trapping on kernel commits which
+	 * are still in the process of implementing them.
+	 *
+	 * Only test programs for new ptrace behaviors being implemented
+	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
+	 *
+	 * Once SEIZE behaviors are completely implemented, this flag and
+	 * the following test will be removed.
+	 */
+	retval = -EIO;
+	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
+		goto out;
+
 	audit_ptrace(task);
 
 	retval = -EPERM;
@@ -291,11 +309,16 @@ static int ptrace_attach(struct task_struct *task)
 		goto unlock_tasklist;
 
 	task->ptrace = PT_PTRACED;
+	if (seize)
+		task->ptrace |= PT_SEIZED;
 	if (task_ns_capable(task, CAP_SYS_PTRACE))
 		task->ptrace |= PT_PTRACE_CAP;
 
 	__ptrace_link(task, current);
-	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+
+	/* SEIZE doesn't trap tracee on attach */
+	if (!seize)
+		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
 	spin_lock(&task->sighand->siglock);
 
@@ -827,8 +850,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH) {
-		ret = ptrace_attach(child);
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(child, request, data);
 		/*
 		 * Some architectures need to do book-keeping after
 		 * a ptrace attach.
@@ -969,8 +992,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH) {
-		ret = ptrace_attach(child);
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(child, request, data);
 		/*
 		 * Some architectures need to do book-keeping after
 		 * a ptrace attach.
diff --git a/kernel/signal.c b/kernel/signal.c
index 38e7d2f..16cd311 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1873,7 +1873,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	recalc_sigpending_tsk(current);
 }
 
-void ptrace_notify(int exit_code)
+static void ptrace_do_notify(int exit_code, int why)
 {
 	siginfo_t info;
 
@@ -1886,8 +1886,13 @@ void ptrace_notify(int exit_code)
 	info.si_uid = current_uid();
 
 	/* Let the debugger run.  */
+	ptrace_stop(exit_code, why, 1, &info);
+}
+
+void ptrace_notify(int exit_code)
+{
 	spin_lock_irq(&current->sighand->siglock);
-	ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
+	ptrace_do_notify(exit_code, CLD_TRAPPED);
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
@@ -2113,14 +2118,25 @@ relock:
 	}
 
 	/*
-	 * Take care of ptrace jobctl traps.  It currently is only used to
-	 * trap for group stop while ptraced.
+	 * Take care of ptrace jobctl traps.
+	 *
+	 * When PT_SEIZED, it's used for both group stop and explicit
+	 * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap
+	 * with accompanying siginfo.
+	 *
+	 * When !PT_SEIZED, it's used only for group stop trap with stop
+	 * signal number as exit_code and no siginfo.
 	 */
 	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
-		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
-		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
-		current->exit_code = 0;
+		if (current->ptrace & PT_SEIZED) {
+			ptrace_do_notify(SIGTRAP | PTRACE_EVENT_STOP << 8,
+					 CLD_STOPPED);
+		} else {
+			signr = current->jobctl & JOBCTL_STOP_SIGMASK;
+			WARN_ON_ONCE(!signr);
+			ptrace_stop(signr, CLD_STOPPED, 0, NULL);
+			current->exit_code = 0;
+		}
 		spin_unlock_irq(&sighand->siglock);
 		goto relock;
 	}
-- 
1.7.1



* [PATCH 13/19] ptrace: implement PTRACE_INTERRUPT
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (11 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 12/19] ptrace: implement PTRACE_SEIZE Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 14/19] ptrace: restructure ptrace_getsiginfo() Tejun Heo
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

Currently, there's no way to trap a running ptracee short of sending a
signal, which has various side effects.  This patch implements
PTRACE_INTERRUPT, which traps the ptracee without any signal or job
control related side effects.

The implementation is almost trivial.  It uses the group stop trap -
SIGTRAP | PTRACE_EVENT_STOP << 8.  JOBCTL_TRAP_STOP is set on
PTRACE_INTERRUPT and cleared when any trap happens.  As INTERRUPT
should be usable regardless of the current state of the tracee, the
task_is_traced() test in ptrace_check_attach() is skipped for
INTERRUPT.

PTRACE_INTERRUPT is available iff tracee is attached with
PTRACE_SEIZE.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts100ms = { .tv_nsec = 100000000 };
  static const struct timespec ts1s = { .tv_sec = 1 };
  static const struct timespec ts3s = { .tv_sec = 3 };

  int main(int argc, char **argv)
  {
	  pid_t tracee;

	  tracee = fork();
	  if (tracee == 0) {
		  nanosleep(&ts100ms, NULL);
		  while (1) {
			  printf("tracee: alive pid=%d\n", getpid());
			  nanosleep(&ts1s, NULL);
		  }
	  }

	  if (argc > 1)
		  kill(tracee, SIGSTOP);

	  nanosleep(&ts100ms, NULL);

	  ptrace(PTRACE_SEIZE, tracee, NULL,
		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  if (argc > 1) {
		  waitid(P_PID, tracee, NULL, WSTOPPED);
		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
	  }
	  nanosleep(&ts3s, NULL);

	  printf("tracer: INTERRUPT and DETACH\n");
	  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
	  waitid(P_PID, tracee, NULL, WSTOPPED);
	  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
	  nanosleep(&ts3s, NULL);

	  printf("tracer: exiting\n");
	  kill(tracee, SIGKILL);
	  return 0;
  }

When called without argument, tracee is seized from running state,
interrupted and then detached back to running state.

  # ./test-interrupt
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracer: INTERRUPT and DETACH
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracer: exiting

When called with argument, tracee is seized from stopped state,
continued, interrupted and then detached back to stopped state.

  # ./test-interrupt  1
  tracee: alive pid=4548
  tracee: alive pid=4548
  tracee: alive pid=4548
  tracer: INTERRUPT and DETACH
  tracer: exiting

Before PTRACE_INTERRUPT, once the tracee was running, there was no way
to trap the tracee and do PTRACE_DETACH without causing side effects.

-v2: Updated to use task_set_jobctl_pending() so that it doesn't end
     up scheduling TRAP_STOP if child is dying which may make the
     child unkillable.  Spotted by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/ptrace.h |    1 +
 kernel/ptrace.c        |   27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index cfb6c97..8c45eb0 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -48,6 +48,7 @@
 #define PTRACE_SETREGSET	0x4205
 
 #define PTRACE_SEIZE		0x4206
+#define PTRACE_INTERRUPT	0x4207
 
 /* flags in @data for PTRACE_SEIZE */
 #define PTRACE_SEIZE_DEVEL	0x80000000 /* temp flag for development */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 132857a..3911567 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -704,6 +704,7 @@ int ptrace_request(struct task_struct *child, long request,
 	siginfo_t siginfo;
 	void __user *datavp = (void __user *) data;
 	unsigned long __user *datalp = datavp;
+	unsigned long flags;
 
 	switch (request) {
 	case PTRACE_PEEKTEXT:
@@ -736,6 +737,26 @@ int ptrace_request(struct task_struct *child, long request,
 			ret = ptrace_setsiginfo(child, &siginfo);
 		break;
 
+	case PTRACE_INTERRUPT:
+		/*
+		 * Stop tracee without any side-effect on signal or job
+		 * control.  At least one trap is guaranteed to happen
+		 * after this request.  If @child is already trapped, the
+		 * current trap is not disturbed and another trap will
+		 * happen after the current trap is ended with PTRACE_CONT.
+		 *
+		 * The actual trap might not be PTRACE_EVENT_STOP trap but
+		 * the pending condition is cleared regardless.
+		 */
+		if (likely(child->ptrace & PT_SEIZED) &&
+		    lock_task_sighand(child, &flags)) {
+			if (task_set_jobctl_pending(child, JOBCTL_TRAP_STOP))
+				signal_wake_up(child, 0);
+			unlock_task_sighand(child, &flags);
+			ret = 0;
+		}
+		break;
+
 	case PTRACE_DETACH:	 /* detach a process that was attached. */
 		ret = ptrace_detach(child, data);
 		break;
@@ -861,7 +882,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL);
+	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
 	if (ret < 0)
 		goto out_put_task_struct;
 
@@ -1003,7 +1025,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL);
+	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
 	if (!ret)
 		ret = compat_arch_ptrace(child, request, addr, data);
 
-- 
1.7.1



* [PATCH 14/19] ptrace: restructure ptrace_getsiginfo()
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (12 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 13/19] ptrace: implement PTRACE_INTERRUPT Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 15/19] ptrace: add siginfo.si_pt_flags Tejun Heo
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

Flatten ptrace_getsiginfo() to prepare for more logic in the success
path.  While at it, remove [un]likely() on the child->last_siginfo
check - signal delivery and group stop traps can only be distinguished
by NULL siginfo, and group stop isn't that unlikely.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 3911567..851870c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -572,16 +572,19 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*info = *child->last_siginfo;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
+	if (!lock_task_sighand(child, &flags))
+		return -ESRCH;
+
+	error = -EINVAL;
+	if (!child->last_siginfo)
+		goto out_unlock;
+
+	error = 0;
+	*info = *child->last_siginfo;
+out_unlock:
+	unlock_task_sighand(child, &flags);
 	return error;
 }
 
-- 
1.7.1



* [PATCH 15/19] ptrace: add siginfo.si_pt_flags
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (13 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 14/19] ptrace: restructure ptrace_getsiginfo() Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 16/19] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo, Tony Luck, Fenghua Yu, Ralf Baechle,
	Kyle McMartin, Helge Deller, James E.J. Bottomley,
	Benjamin Herrenschmidt, Paul Mackerras, Martin Schwidefsky,
	Heiko Carstens, David S. Miller, x86

This is essentially a simple addition of a flag field, but it looks
complicated thanks to the convoluted layout of siginfo_t.  A _sigtrap
struct, which contains all the fields used by ptrace_notify[_locked]()
plus the new _pt_flags, is added to the siginfo._sifields union along
with the field abbreviation macro si_pt_flags; __SI_TRAP is then
defined to implement copying of the new field to userland.

Two architectures - ia64 and mips - define their own versions of
siginfo_t and ia64 implements its own copy_siginfo_to_user().  Also,
x86, mips, parisc, powerpc, s390, sparc and tile have compat_siginfo_t
and copy_siginfo_to_user32() for 32bit compatibility.  All are updated
such that [compat_]siginfo_t have _sigtrap and all the fields are
copied out.

x86 is tested.  Affected code in mips, powerpc, s390 and sparc is
compile tested.  mips and tile are untested.

This patch doesn't actually make use of the new field.
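
As a sketch only (not part of this patch, which adds no users of the
field): once a later patch in the series populates _pt_flags, a tracer
built against existing userland headers can still read it, because
_pt_flags lands at the same offset as si_status in the _sigchld layout
- the same shortcut a later test program in this series takes.

  #include <signal.h>
  #include <stdio.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>

  /* assumes @tracee is already attached and stopped in a ptrace trap */
  static void dump_pt_flags(pid_t tracee)
  {
	  siginfo_t si;

	  if (ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si))
		  return;
	  /* si_status shares its offset with the new si_pt_flags */
	  printf("si_code=%#x pt_flags=%#x\n", si.si_code, si.si_status);
  }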

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: x86@kernel.org
---
 arch/ia64/include/asm/siginfo.h       |    7 +++++++
 arch/ia64/kernel/signal.c             |    5 +++++
 arch/mips/include/asm/compat-signal.h |    7 +++++++
 arch/mips/include/asm/siginfo.h       |    7 +++++++
 arch/mips/kernel/signal32.c           |    5 +++++
 arch/parisc/kernel/signal32.c         |    5 +++++
 arch/parisc/kernel/signal32.h         |    7 +++++++
 arch/powerpc/kernel/ppc32.h           |    7 +++++++
 arch/powerpc/kernel/signal_32.c       |    5 +++++
 arch/s390/kernel/compat_linux.h       |    7 +++++++
 arch/s390/kernel/compat_signal.c      |    5 +++++
 arch/sparc/kernel/signal32.c          |   12 ++++++++++++
 arch/tile/kernel/compat_signal.c      |   11 +++++++++++
 arch/x86/ia32/ia32_signal.c           |    4 ++++
 arch/x86/include/asm/ia32.h           |    7 +++++++
 include/asm-generic/siginfo.h         |   10 ++++++++++
 kernel/signal.c                       |    7 ++++++-
 17 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/siginfo.h b/arch/ia64/include/asm/siginfo.h
index c8fcaa2..2cff1ce 100644
--- a/arch/ia64/include/asm/siginfo.h
+++ b/arch/ia64/include/asm/siginfo.h
@@ -70,6 +70,13 @@ typedef struct siginfo {
 			long _band;	/* POLL_IN, POLL_OUT, POLL_MSG (XPG requires a "long") */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t _pid;		/* sender's pid */
+			uid_t _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c
index 7bdafc8..ee18366 100644
--- a/arch/ia64/kernel/signal.c
+++ b/arch/ia64/kernel/signal.c
@@ -142,6 +142,11 @@ copy_siginfo_to_user (siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_addr, &to->si_addr);
 			err |= __put_user(from->si_imm, &to->si_imm);
 			break;
+		      case __SI_TRAP >> 16:
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		      case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/mips/include/asm/compat-signal.h b/arch/mips/include/asm/compat-signal.h
index 368a99e..47b2e4f 100644
--- a/arch/mips/include/asm/compat-signal.h
+++ b/arch/mips/include/asm/compat-signal.h
@@ -54,6 +54,13 @@ typedef struct compat_siginfo {
 			int _fd;
 		} _sigpoll;
 
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;	/* sender's pid */
+			compat_uid_t _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
+
 		/* POSIX.1b timers */
 		struct {
 			timer_t _tid;		/* timer id */
diff --git a/arch/mips/include/asm/siginfo.h b/arch/mips/include/asm/siginfo.h
index 20ebeb8..6e8f0d6 100644
--- a/arch/mips/include/asm/siginfo.h
+++ b/arch/mips/include/asm/siginfo.h
@@ -96,6 +96,13 @@ typedef struct siginfo {
 			__ARCH_SI_BAND_T _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t _pid;		/* sender's pid */
+			__ARCH_SI_UID_T _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
diff --git a/arch/mips/kernel/signal32.c b/arch/mips/kernel/signal32.c
index aae9866..7e392e1 100644
--- a/arch/mips/kernel/signal32.c
+++ b/arch/mips/kernel/signal32.c
@@ -452,6 +452,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_RT >> 16: /* This is not generated by the kernel as of now.  */
 		case __SI_MESGQ >> 16:
 			err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/parisc/kernel/signal32.c b/arch/parisc/kernel/signal32.c
index e141324..ead8ca4 100644
--- a/arch/parisc/kernel/signal32.c
+++ b/arch/parisc/kernel/signal32.c
@@ -482,6 +482,11 @@ copy_siginfo_to_user32 (compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/parisc/kernel/signal32.h b/arch/parisc/kernel/signal32.h
index c780084..8016f51 100644
--- a/arch/parisc/kernel/signal32.h
+++ b/arch/parisc/kernel/signal32.h
@@ -104,6 +104,13 @@ typedef struct compat_siginfo {
                         int _band;      /* POLL_IN, POLL_OUT, POLL_MSG */
                         int _fd;
                 } _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;      /* sender's pid */
+			unsigned int _uid;      /* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
         } _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/powerpc/kernel/ppc32.h b/arch/powerpc/kernel/ppc32.h
index dc16aef..4293542 100644
--- a/arch/powerpc/kernel/ppc32.h
+++ b/arch/powerpc/kernel/ppc32.h
@@ -64,6 +64,13 @@ typedef struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;		/* sender's pid */
+			compat_uid_t _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index b96a3a0..d072458 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -716,6 +716,11 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *d, siginfo_t *s)
 		err |= __put_user(s->si_band, &d->si_band);
 		err |= __put_user(s->si_fd, &d->si_fd);
 		break;
+	case __SI_TRAP:
+		err |= __put_user(s->si_pid, &d->si_pid);
+		err |= __put_user(s->si_uid, &d->si_uid);
+		err |= __put_user(s->si_pt_flags, &d->si_pt_flags);
+		break;
 	case __SI_TIMER >> 16:
 		err |= __put_user(s->si_tid, &d->si_tid);
 		err |= __put_user(s->si_overrun, &d->si_overrun);
diff --git a/arch/s390/kernel/compat_linux.h b/arch/s390/kernel/compat_linux.h
index 9635d75..f8c973f 100644
--- a/arch/s390/kernel/compat_linux.h
+++ b/arch/s390/kernel/compat_linux.h
@@ -72,6 +72,13 @@ typedef struct compat_siginfo {
 			int	_band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int	_fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t		_pid;	/* sender's pid */
+			uid_t		_uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
index eee9998..b3c9f6b 100644
--- a/arch/s390/kernel/compat_signal.c
+++ b/arch/s390/kernel/compat_signal.c
@@ -96,6 +96,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 75fad42..c545212 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -102,6 +102,13 @@ typedef struct compat_siginfo{
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;		/* sender's pid */
+			unsigned int _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 }compat_siginfo_t;
 
@@ -165,6 +172,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_RT >> 16: /* This is not generated by the kernel as of now.  */
 		case __SI_MESGQ >> 16:
 			err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
index dbb0dfc..0a5d694 100644
--- a/arch/tile/kernel/compat_signal.c
+++ b/arch/tile/kernel/compat_signal.c
@@ -109,6 +109,13 @@ struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;	/* sender's pid */
+			unsigned int _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 };
 
@@ -219,6 +226,10 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *to, siginfo_t *from)
 		case __SI_POLL >> 16:
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_overrun, &to->si_overrun);
 			err |= __put_user(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..1df88cc 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -85,6 +85,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			case __SI_POLL >> 16:
 				put_user_ex(from->si_fd, &to->si_fd);
 				break;
+			case __SI_TRAP:
+				put_user_ex(from->si_uid, &to->si_uid);
+				put_user_ex(from->si_pt_flags, &to->si_pt_flags);
+				break;
 			case __SI_TIMER >> 16:
 				put_user_ex(from->si_overrun, &to->si_overrun);
 				put_user_ex(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/include/asm/ia32.h b/arch/x86/include/asm/ia32.h
index 1f7e625..7eab27a 100644
--- a/arch/x86/include/asm/ia32.h
+++ b/arch/x86/include/asm/ia32.h
@@ -126,6 +126,13 @@ typedef struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;	/* sender's pid */
+			unsigned int _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 0dd4e87..9ecabdf 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -90,6 +90,13 @@ typedef struct siginfo {
 			__ARCH_SI_BAND_T _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			__kernel_pid_t _pid;	/* sender's pid */
+			__ARCH_SI_UID_T _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
@@ -116,6 +123,7 @@ typedef struct siginfo {
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
+#define si_pt_flags	_sifields._sigtrap._pt_flags
 
 #ifdef __KERNEL__
 #define __SI_MASK	0xffff0000u
@@ -126,6 +134,7 @@ typedef struct siginfo {
 #define __SI_CHLD	(4 << 16)
 #define __SI_RT		(5 << 16)
 #define __SI_MESGQ	(6 << 16)
+#define __SI_TRAP	(7 << 16)
 #define __SI_CODE(T,N)	((T) | ((N) & 0xffff))
 #else
 #define __SI_KILL	0
@@ -135,6 +144,7 @@ typedef struct siginfo {
 #define __SI_CHLD	0
 #define __SI_RT		0
 #define __SI_MESGQ	0
+#define __SI_TRAP	0
 #define __SI_CODE(T,N)	(N)
 #endif
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 16cd311..4662723 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1881,7 +1881,7 @@ static void ptrace_do_notify(int exit_code, int why)
 
 	memset(&info, 0, sizeof info);
 	info.si_signo = SIGTRAP;
-	info.si_code = exit_code;
+	info.si_code = __SI_TRAP | exit_code;
 	info.si_pid = task_pid_vnr(current);
 	info.si_uid = current_uid();
 
@@ -2535,6 +2535,11 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
 		err |= __put_user(from->si_band, &to->si_band);
 		err |= __put_user(from->si_fd, &to->si_fd);
 		break;
+	case __SI_TRAP:
+		err |= __put_user(from->si_pid, &to->si_pid);
+		err |= __put_user(from->si_uid, &to->si_uid);
+		err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+		break;
 	case __SI_FAULT:
 		err |= __put_user(from->si_addr, &to->si_addr);
 #ifdef __ARCH_SI_TRAPNO
-- 
1.7.1



* [PATCH 16/19] ptrace: make group stop state visible via PTRACE_GETSIGINFO
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (14 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 15/19] ptrace: add siginfo.si_pt_flags Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 17/19] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

If tracee is SEIZED, PTRACE_EVENT_STOP trap is used for group stop;
however, there currently is no way to find out which signal initiated
the group stop or if the group stop is still in effect.

This patch changes PTRACE_GETSIGINFO on ptrace traps to report group
stop information via si.si_signo and si.si_pt_flags.  Note that this
is only available if the tracee was seized.

This doesn't address notification yet - the tracer has to put the
tracee into an appropriate trap and poll the flag.  Later patches will
deal with notification and trap transitions.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts1s = { .tv_sec = 1 };

  int main(int argc, char **argv)
  {
	  pid_t tracee, tracer;
	  int i;

	  tracee = fork();
	  if (!tracee)
		  while (1)
			  nanosleep(&ts1s, NULL);

	  tracer = fork();
	  if (!tracer) {
		  int last_stopped = 0, stopped;
		  siginfo_t si;

		  ptrace(PTRACE_SEIZE, tracee, NULL,
			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  repeat:
		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
		  waitid(P_PID, tracee, NULL, WSTOPPED);

		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
		  if (si.si_code) {
			  stopped = !!si.si_status;
			  if (stopped != last_stopped)
				  printf("tracer: stopped=%d signo=%d\n",
					 stopped, si.si_signo);
			  last_stopped = stopped;
			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
		  } else {
			  printf("tracer: SIG %d\n", si.si_signo);
			  ptrace(PTRACE_CONT, tracee, NULL,
				 (void *)(unsigned long)si.si_signo);
		  }
		  goto repeat;
	  }

	  for (i = 0; i < 3; i++) {
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGSTOP\n");
		  kill(tracee, SIGSTOP);
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGCONT\n");
		  kill(tracee, SIGCONT);
	  }
	  nanosleep(&ts1s, NULL);

	  kill(tracer, SIGKILL);
	  kill(tracee, SIGKILL);
	  return 0;
  }

The tracer delivers signals, resumes group stop, induces INTERRUPT
traps and reports group stop state changes in a busy loop.  The mother
sends SIGSTOP or SIGCONT to the tracee every second.  Note that
si_pt_flags and flag testing are replaced with si_status testing:
si_status occupies the same offset as si_pt_flags and PTRACE_SI_STOPPED
is the only flag defined, so I took a dirty shortcut.

  # ./test-stopped
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: SIG 18
  tracer: stopped=0 signo=5

-v2: Local variable sig defined inside the if() block it's used in,
     as suggested by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/ptrace.h |    3 +++
 kernel/ptrace.c        |   19 +++++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 8c45eb0..1e84960 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -73,6 +73,9 @@
 #define PTRACE_EVENT_EXIT	6
 #define PTRACE_EVENT_STOP	7
 
+/* flags in siginfo.si_pt_flags from PTRACE_GETSIGINFO */
+#define PTRACE_SI_STOPPED	0x00000001 /* tracee is job control stopped */
+
 #include <asm/ptrace.h>
 
 #ifdef __KERNEL__
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 851870c..a205c98 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -583,6 +583,25 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 
 	error = 0;
 	*info = *child->last_siginfo;
+
+	/*
+	 * If reporting ptrace trap for a seized tracee, enable reporting
+	 * of info->si_pt_flags.
+	 */
+	if ((child->ptrace & PT_SEIZED) &&
+	    (info->si_code & __SI_MASK) == __SI_TRAP) {
+		struct signal_struct *sig = child->signal;
+
+		/*
+		 * Report whether group stop is in effect w/ SI_STOPPED and
+		 * if so which signal caused it.
+		 */
+		if (sig->group_stop_count || sig->flags & SIGNAL_STOP_STOPPED) {
+			info->si_pt_flags |= PTRACE_SI_STOPPED;
+			info->si_signo = child->jobctl & JOBCTL_STOP_SIGMASK;
+			WARN_ON_ONCE(!info->si_signo);
+		}
+	}
 out_unlock:
 	unlock_task_sighand(child, &flags);
 	return error;
-- 
1.7.1



* [PATCH 17/19] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (15 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 16/19] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 18/19] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
  2011-05-24 18:37 ` [PATCH 19/19] ptrace: implement group stop notification for ptracer Tejun Heo
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

__SI_TRAP siginfo is special in the operation of ptrace.  It reports
group stop related information and will also interact with
notification retraps.  Don't let userland mess with it.
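
A minimal userspace sketch of the resulting behavior (illustration
only, not from the patch), assuming the tracee was attached with
PTRACE_SEIZE and is currently stopped in a __SI_TRAP trap: overwriting
the trap siginfo now fails with -EINVAL.

  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>

  static void try_override(pid_t tracee)
  {
	  siginfo_t si;

	  memset(&si, 0, sizeof(si));
	  si.si_signo = SIGUSR1;

	  /* rejected because the tracee's last_siginfo is __SI_TRAP */
	  if (ptrace(PTRACE_SETSIGINFO, tracee, NULL, &si) < 0 &&
	      errno == EINVAL)
		  printf("SETSIGINFO rejected on __SI_TRAP siginfo\n");
  }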

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   31 ++++++++++++++++++++++---------
 1 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a205c98..a9b3c67 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -610,16 +610,29 @@ out_unlock:
 static int ptrace_setsiginfo(struct task_struct *child, const siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*child->last_siginfo = *info;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
+	if (!lock_task_sighand(child, &flags))
+		return -ESRCH;
+
+	error = -EINVAL;
+	if (unlikely(!child->last_siginfo))
+		goto out_unlock;
+
+	/*
+	 * If seized, __SI_TRAP siginfo is used to communicate information
+	 * regarding traps and contains dynamic information generated on
+	 * GETSIGINFO.  Don't let userland override or fake it.
+	 */
+	if ((child->ptrace & PT_SEIZED) &&
+	    unlikely((child->last_siginfo->si_code & __SI_MASK) == __SI_TRAP ||
+		     (info->si_code & __SI_MASK) == __SI_TRAP))
+		goto out_unlock;
+
+	*child->last_siginfo = *info;
+	error = 0;
+out_unlock:
+	unlock_task_sighand(child, &flags);
 	return error;
 }
 
-- 
1.7.1



* [PATCH 18/19] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (16 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 17/19] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  2011-05-24 18:37 ` [PATCH 19/19] ptrace: implement group stop notification for ptracer Tejun Heo
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

For the to-be-added notification retraps, other tasks need to be able
to tell whether a ptrace request is currently in progress while the
tracee is in a STOP trap.  This patch adds JOBCTL_BLOCK_NOTIFY, which
is set in ptrace_check_attach() if the request requires the tracee to
be trapped and it is trapped for STOP, and cleared when the ptrace
syscall finishes.

This flag isn't used yet.

-v2: ptrace_put_task_struct() reorganized per Oleg's comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    2 +
 include/linux/sched.h  |    1 +
 kernel/ptrace.c        |   54 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 1e84960..2e39f81 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -73,6 +73,8 @@
 #define PTRACE_EVENT_EXIT	6
 #define PTRACE_EVENT_STOP	7
 
+#define PTRACE_STOP_SI_CODE	(__SI_TRAP | SIGTRAP | PTRACE_EVENT_STOP << 8)
+
 /* flags in siginfo.si_pt_flags from PTRACE_GETSIGINFO */
 #define PTRACE_SI_STOPPED	0x00000001 /* tracee is job control stopped */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9a0e1bc..9298f97 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1801,6 +1801,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
 #define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
+#define JOBCTL_BLOCK_NOTIFY	(1 << 22) /* block NOTIFY re-traps */
 
 #define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a9b3c67..1982d7a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -128,11 +128,13 @@ void __ptrace_unlink(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 
 	/*
-	 * Clear all pending traps and TRAPPING.  TRAPPING should be
-	 * cleared regardless of JOBCTL_STOP_PENDING.  Do it explicitly.
+	 * Clear all pending traps, TRAPPING and BLOCK_NOTIFY.  TRAPPING
+	 * should be cleared regardless of JOBCTL_STOP_PENDING.  Do it
+	 * explicitly.
 	 */
 	task_clear_jobctl_pending(child, JOBCTL_TRAP_MASK);
 	task_clear_jobctl_trapping(child);
+	child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
 
 	/*
 	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
@@ -191,10 +193,24 @@ int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 		 */
 		spin_lock_irq(&child->sighand->siglock);
 		WARN_ON_ONCE(task_is_stopped(child));
-		if (task_is_traced(child) || ignore_state)
+
+		if (ignore_state) {
+			ret = 0;
+		} else if (task_is_traced(child)) {
+			siginfo_t *si = child->last_siginfo;
+
+			/*
+			 * If STOP trapped, ptrace notification may cause
+			 * re-traps, which we don't want while ptrace
+			 * request is in progress.  Block notification.
+			 */
+			if (si && si->si_code == PTRACE_STOP_SI_CODE)
+				child->jobctl |= JOBCTL_BLOCK_NOTIFY;
 			ret = 0;
-		else if (ptrace_wait_trapping(child))
+		} else if (ptrace_wait_trapping(child)) {
 			return restart_syscall();
+		}
+
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
@@ -887,6 +903,32 @@ static struct task_struct *ptrace_get_task_struct(pid_t pid)
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+/**
+ * ptrace_put_task_struct - ptrace request processing done, put child
+ * @child: child task struct to put
+ *
+ * ptrace request processing for @child is finished.  Clean up and put
+ * @child.  This function clears %JOBCTL_BLOCK_NOTIFY which can be set by
+ * ptrace_check_attach().  @child might not be being ptraced by %current.
+ */
+static void ptrace_put_task_struct(struct task_struct *child)
+{
+	unsigned long flags;
+
+	if (!(child->jobctl & JOBCTL_BLOCK_NOTIFY))
+		goto out_put;
+
+	if (unlikely(!(child->ptrace & PT_PTRACED) || child->parent != current))
+		goto out_put;
+
+	if (likely(lock_task_sighand(child, &flags))) {
+		child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
+		unlock_task_sighand(child, &flags);
+	}
+out_put:
+	put_task_struct(child);
+}
+
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -925,7 +967,7 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 	ret = arch_ptrace(child, request, addr, data);
 
  out_put_task_struct:
-	put_task_struct(child);
+	ptrace_put_task_struct(child);
  out:
 	return ret;
 }
@@ -1066,7 +1108,7 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		ret = compat_arch_ptrace(child, request, addr, data);
 
  out_put_task_struct:
-	put_task_struct(child);
+	ptrace_put_task_struct(child);
  out:
 	return ret;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 19/19] ptrace: implement group stop notification for ptracer
  2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
                   ` (17 preceding siblings ...)
  2011-05-24 18:37 ` [PATCH 18/19] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
@ 2011-05-24 18:37 ` Tejun Heo
  18 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-24 18:37 UTC (permalink / raw)
  To: oleg
  Cc: vda.linux, jan.kratochvil, linux-kernel, torvalds, akpm, indan,
	bdonlan, pedro, Tejun Heo

Currently, the only way for a ptracer to find out whether the group
stop its tracee was in has finished is to poll with PTRACE_GETSIGINFO.
Also, the tracer can't detect a new group stop started by an untraced
thread if the tracee is already trapped.  This patch implements group
stop notification for the ptracer using STOP traps.

When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
is set, which triggers STOP trap but is sticky until the next
PTRACE_GETSIGINFO.  As GETSIGINFO exports the current group stop
state, this guarantees that tracer checks the current group stop state
at least once after group stop state change.  Stickiness is necessary
because notification trap may race with PTRACE_CONT for other traps
and get lost.

Note that simply scheduling such a trap isn't enough.  If the tracee is
running (PTRACE_CONT'd from group stop trap), the usual trapping -
setting NOTIFY followed by the usual signal_wake_up() - is enough;
however, if tracee is trapped, the scheduled trap won't happen until
the trap is continued.

This is solved by re-trapping if tracee is in STOP trap.  Along with
JOBCTL_TRAP_NOTIFY, JOBCTL_TRAPPING is set and tracee is woken up from
TASK_TRACED.  Tracee then (re-)enters INTERRUPT trap generating
notification for tracer.  TRAPPING hides the TRACED -> RUNNING ->
TRACED transition from tracer.

Many ptrace requests expect tracee to remain trapped until they
finish.  Such conditions are marked with JOBCTL_BLOCK_NOTIFY and if
notification happens while BLOCK_NOTIFY is set, JOBCTL_TRAPPING is set
but the actual wake up and re-trapping takes place when the ptrace
request finishes.  This is safe as the only task which can wait for
TRAPPING is the ptracer.

Re-trapping is used only for STOP trap.  If tracer wants to get
notified about group stop, it either leaves tracee in the initial STOP
trap or puts it into STOP trap using PTRACE_INTERRUPT.  If STOP trap
is scheduled while tracee is already in a trap, it's guaranteed that
tracee will enter a trap without returning to userland, so tracer
doesn't lose any control over tracee execution for group stop
notification.

An example program follows.

  #include <stdio.h>
  #include <signal.h>
  #include <time.h>
  #include <unistd.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts1s = { .tv_sec = 1 };

  int main(int argc, char **argv)
  {
	  pid_t tracee, tracer;
	  int i;

	  tracee = fork();
	  if (!tracee)
		  while (1)
			  pause();

	  tracer = fork();
	  if (!tracer) {
		  int stopped;
		  siginfo_t si;

		  ptrace(PTRACE_SEIZE, tracee, NULL,
			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
	  repeat:
		  waitid(P_PID, tracee, NULL, WSTOPPED);

		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
		  if (!si.si_code) {
			  printf("tracer: SIG %d\n", si.si_signo);
			  ptrace(PTRACE_CONT, tracee, NULL,
				 (void *)(unsigned long)si.si_signo);
			  goto repeat;
		  }
		  stopped = !!si.si_status;
		  printf("tracer: stopped=%d signo=%d\n", stopped, si.si_signo);
		  if (!stopped)
			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
		  goto repeat;
	  }

	  for (i = 0; i < 3; i++) {
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGSTOP\n");
		  kill(tracee, SIGSTOP);
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGCONT\n");
		  kill(tracee, SIGCONT);
	  }
	  nanosleep(&ts1s, NULL);

	  kill(tracer, SIGKILL);
	  kill(tracee, SIGKILL);
	  return 0;
  }

In the above program, tracer gets notification of group stop state
changes and can track stopped state without polling PTRACE_GETSIGINFO.

  # ./test-gstop-notify
  tracer: stopped=0 signo=5
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18

-v2: ptrace_trap_notify() updated to use task_set_jobctl_pending() and
     should no longer set NOTIFY if target task is dying.  This issue
     was spotted by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |    3 +-
 kernel/ptrace.c       |   11 ++++++++
 kernel/signal.c       |   69 ++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9298f97..3120b97 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1800,10 +1800,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_PENDING	(1 << 17) /* task should stop for group stop */
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
 #define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
+#define JOBCTL_TRAP_NOTIFY	(1 << 20) /* sticky trap for notifications */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 #define JOBCTL_BLOCK_NOTIFY	(1 << 22) /* block NOTIFY re-traps */
 
-#define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
+#define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern bool task_set_jobctl_pending(struct task_struct *task,
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1982d7a..6424323 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -617,6 +617,9 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 			info->si_signo = child->jobctl & JOBCTL_STOP_SIGMASK;
 			WARN_ON_ONCE(!info->si_signo);
 		}
+
+		/* tracer got siginfo, clear the sticky trap */
+		task_clear_jobctl_pending(child, JOBCTL_TRAP_NOTIFY);
 	}
 out_unlock:
 	unlock_task_sighand(child, &flags);
@@ -923,6 +926,14 @@ static void ptrace_put_task_struct(struct task_struct *child)
 
 	if (likely(lock_task_sighand(child, &flags))) {
 		child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
+
+		/*
+		 * If TRAPPING is set, it means NOTIFY occurred in-between
+		 * and re-trap was blocked.  Trigger re-trap.
+		 */
+		if (child->jobctl & JOBCTL_TRAPPING)
+			signal_wake_up(child, task_is_traced(child));
+
 		unlock_task_sighand(child, &flags);
 	}
 out_put:
diff --git a/kernel/signal.c b/kernel/signal.c
index 4662723..e1e44f4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -817,6 +817,61 @@ static int check_kill_permission(int sig, struct siginfo *info,
 	return security_task_kill(t, info, sig, 0);
 }
 
+/**
+ * ptrace_trap_notify - schedule trap to notify ptracer
+ * @t: tracee wanting to notify tracer
+ *
+ * This function schedules sticky ptrace trap which is cleared on
+ * PTRACE_GETSIGINFO to notify ptracer of an event.  @t must have been
+ * seized by ptracer.
+ *
+ * If @t is running, STOP trap will be taken.  If already trapped for STOP,
+ * it will re-trap.  If trapped for other traps, STOP trap will be
+ * eventually taken without returning to userland after the existing traps
+ * are finished by PTRACE_CONT.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void ptrace_trap_notify(struct task_struct *t)
+{
+	siginfo_t *si = t->last_siginfo;
+	unsigned int mask;
+	bool pstop;
+
+	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+	assert_spin_locked(&t->sighand->siglock);
+
+	/*
+	 * @t is being ptraced and new SEIZE behavior is in effect.
+	 * Schedule sticky trap which will clear on the next GETSIGINFO.
+	 *
+	 * If @t is currently trapped for STOP, it should re-trap with new
+	 * exit_code indicating continuation so that the ptracer can notice
+	 * the event; otherwise, use normal signal delivery wake up.
+	 *
+	 * The re-trapping sets JOBCTL_TRAPPING such that the transition is
+	 * hidden from the ptracer.
+	 *
+	 * This means that if @t is trapped for other reasons than STOP,
+	 * the notification trap won't be delivered until the current one
+	 * is complete.  This is the intended behavior.
+	 *
+	 * Note that if JOBCTL_BLOCK_NOTIFY, TRAPPING is set but actual
+	 * re-trap doesn't happen.  This is used to avoid waking up while
+	 * ptrace request is in progress.  The ptracer will notice TRAPPING
+	 * is set on request completion and trigger re-trap.
+	 */
+	mask = JOBCTL_TRAP_NOTIFY;
+	pstop = task_is_traced(t) && si && si->si_code == PTRACE_STOP_SI_CODE;
+	if (pstop)
+		mask |= JOBCTL_TRAPPING;
+
+	if (task_set_jobctl_pending(t, mask) &&
+	    !(t->jobctl & JOBCTL_BLOCK_NOTIFY))
+		signal_wake_up(t, pstop);
+}
+
 /*
  * Handle magic process-wide effects of stop/continue signals. Unlike
  * the signal actions, these happen immediately at signal-generation
@@ -855,7 +910,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			wake_up_state(t, __TASK_STOPPED);
+			if (likely(!(t->ptrace & PT_SEIZED)))
+				wake_up_state(t, __TASK_STOPPED);
+			else
+				ptrace_trap_notify(t);
 		} while_each_thread(p, t);
 
 		/*
@@ -1972,7 +2030,10 @@ static bool do_signal_stop(int signr)
 			if (!task_is_stopped(t) &&
 			    task_set_jobctl_pending(t, signr | gstop)) {
 				sig->group_stop_count++;
-				signal_wake_up(t, 0);
+				if (likely(!(t->ptrace & PT_SEIZED)))
+					signal_wake_up(t, 0);
+				else
+					ptrace_trap_notify(t);
 			}
 		}
 	}
@@ -2010,10 +2071,10 @@ static bool do_signal_stop(int signr)
 		schedule();
 	} else {
 		/*
-		 * While ptraced, group stop is handled by STOP trap.
+		 * While ptraced, group stop is handled by NOTIFY trap.
 		 * Schedule it and let the caller deal with it.
 		 */
-		task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
+		task_set_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-24 18:37 ` [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit Tejun Heo
@ 2011-05-24 19:03   ` Linus Torvalds
  2011-05-25  8:44     ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2011-05-24 19:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, vda.linux, jan.kratochvil, linux-kernel, akpm, indan,
	bdonlan, pedro

On Tue, May 24, 2011 at 11:37 AM, Tejun Heo <tj@kernel.org> wrote:
>
> +               wake_up_bit(&task->jobctl, ilog2(JOBCTL_TRAPPING));

Why do people do this *SHIT*?

Stop it.

If you need the bit number, then define the damn thing in terms of bit
numbers! Do

  #define JOBCTL_TRAPPING_BIT xxx
  #define JOBCTL_TRAPPING (1u << JOBCTL_TRAPPING_BIT)

but never ever do that stupid ilog2() thing. Every time I see it I go
"ok, that's just total crap".

Don't do it.

                      Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-24 19:03   ` Linus Torvalds
@ 2011-05-25  8:44     ` Tejun Heo
  2011-05-25 14:34       ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2011-05-25  8:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: oleg, vda.linux, jan.kratochvil, linux-kernel, akpm, indan,
	bdonlan, pedro

Hello, Linus.

On Tue, May 24, 2011 at 12:03:16PM -0700, Linus Torvalds wrote:
>   #define JOBCTL_TRAPPING_BIT xxx
>   #define JOBCTL_TRAPPING (1u << JOBCTL_TRAPPING_BIT)
> 
> but never ever do that stupid ilog2() thing. Every time I see it I go
> "ok, that's just total crap".
> 
> Don't do it.

Sure, but can you at least explain why you dislike it so much?  This
is the only place bit position is needed and having two variants with
_BIT can be confusing (I did that more than once with workqueue code).
ilog2() could be, I don't know, unfamiliar, but what's so crappy about
it?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-25  8:44     ` Tejun Heo
@ 2011-05-25 14:34       ` Linus Torvalds
  2011-05-25 14:42         ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2011-05-25 14:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, vda.linux, jan.kratochvil, linux-kernel, akpm, indan,
	bdonlan, pedro

On Wed, May 25, 2011 at 1:44 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Sure, but can you at least explain why you dislike it so much?  This
> is the only place bit position is needed and having two variants with
> _BIT can be confusing (I did that more than once with workqueue code).
> ilog2() could be, I don't know, unfamiliar, but what's so crappy about
> it?

Have you looked at ilog2()?

Do you realize how much extra crap it is for the compiler? For
absolutely *no* reason.

I hate that disgusting thing. It basically screams to everybody "I'm a
f*cking moron, I did things the wrong way around, so now I use this
thing to undo my braindamage".

You can get a good idea of the code quality if you grep for users. I
don't think it's an accident at all that the two big blocks are
ide-tape and some infiniband drivers.

                         Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-25 14:34       ` Linus Torvalds
@ 2011-05-25 14:42         ` Tejun Heo
  2011-05-25 21:08           ` Valdis.Kletnieks
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2011-05-25 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: oleg, vda.linux, jan.kratochvil, linux-kernel, akpm, indan,
	bdonlan, pedro

On Wed, May 25, 2011 at 07:34:27AM -0700, Linus Torvalds wrote:
> On Wed, May 25, 2011 at 1:44 AM, Tejun Heo <tj@kernel.org> wrote:
> > Sure, but can you at least explain why you dislike it so much?  This
> > is the only place bit position is needed and having two variants with
> > _BIT can be confusing (I did that more than once with workqueue code).
> > ilog2() could be, I don't know, unfamiliar, but what's so crappy about
> > it?
> 
> Have you looked at ilog2()?

Yeap, I did.

> Do you realize how much extra crap it is for the compiler? For
> absolutely *no* reason.
>
> I hate that disgusting thing. It basically screams to everybody "I'm a
> f*cking moron, I did things the wrong way around, so now I use this
> thing to undo my braindamage".

I don't know.  The overhead is compile-time and if the bit position
usage is very rare compared to bitmask usage, it is convenient.

> You can get a good idea of the code quality if you grep for users. I
> don't think it's an accident at all that the two big blocks are
> ide-tape and some infiniband drivers.

Maybe, but also ilog2() is relatively new and most already had ways to
deal with bit positions and masks, so the uptake is expected to be
slow.

Anyways, understood.  Linus hates ilog2().  Will convert to _BIT
macros on the next round and stay the f$@? away from it from now on.
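
For the record, a userland mock-up of that pattern (the bit number
below just mirrors the JOBCTL_TRAPPING mask already used in this
series; it is illustrative, not the final value).  wake_up_bit() would
then take JOBCTL_TRAPPING_BIT directly instead of
ilog2(JOBCTL_TRAPPING):

  #include <stdio.h>

  /* pattern suggested above; value mirrors JOBCTL_TRAPPING == 1 << 21 */
  #define JOBCTL_TRAPPING_BIT	21
  #define JOBCTL_TRAPPING	(1u << JOBCTL_TRAPPING_BIT)

  int main(void)
  {
	  unsigned int jobctl = JOBCTL_TRAPPING;

	  /* mask use and bit-number use stay in sync by construction */
	  printf("mask=%#x bit=%d set=%d\n",
		 JOBCTL_TRAPPING, JOBCTL_TRAPPING_BIT,
		 !!(jobctl & JOBCTL_TRAPPING));
	  return 0;
  }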

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit
  2011-05-25 14:42         ` Tejun Heo
@ 2011-05-25 21:08           ` Valdis.Kletnieks
  0 siblings, 0 replies; 25+ messages in thread
From: Valdis.Kletnieks @ 2011-05-25 21:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, oleg, vda.linux, jan.kratochvil, linux-kernel,
	akpm, indan, bdonlan, pedro

On Wed, 25 May 2011 16:42:14 +0200, Tejun Heo said:

> > You can get a good idea of the code quality if you grep for users. I
> > don't think it's an accident at all that the two big blocks are
> > ide-tape and some infiniband drivers.
> 
> Maybe, but also ilog2() is relatively new and most already had ways to
> deal with bit positions and masks, so the uptake is expected to be
> slow.

One has to wonder how ide-tape got to be an early-adopter user of *anything*.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-05-25 21:08 UTC | newest]

Thread overview: 25+ messages
2011-05-24 18:37 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#3 Tejun Heo
2011-05-24 18:37 ` [PATCH 01/19] job control: rename signal->group_stop and flags to jobctl and rearrange flags Tejun Heo
2011-05-24 18:37 ` [PATCH 02/19] ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments Tejun Heo
2011-05-24 18:37 ` [PATCH 03/19] ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop() Tejun Heo
2011-05-24 18:37 ` [PATCH 04/19] job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() Tejun Heo
2011-05-24 18:37 ` [PATCH 05/19] job control: make task_clear_jobctl_pending() clear TRAPPING automatically Tejun Heo
2011-05-24 18:37 ` [PATCH 06/19] job control: introduce task_set_jobctl_pending() Tejun Heo
2011-05-24 18:37 ` [PATCH 07/19] ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit Tejun Heo
2011-05-24 19:03   ` Linus Torvalds
2011-05-25  8:44     ` Tejun Heo
2011-05-25 14:34       ` Linus Torvalds
2011-05-25 14:42         ` Tejun Heo
2011-05-25 21:08           ` Valdis.Kletnieks
2011-05-24 18:37 ` [PATCH 08/19] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach() Tejun Heo
2011-05-24 18:37 ` [PATCH 09/19] ptrace: make TRAPPING wait interruptible Tejun Heo
2011-05-24 18:37 ` [PATCH 10/19] signal: remove three noop tracehooks Tejun Heo
2011-05-24 18:37 ` [PATCH 11/19] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
2011-05-24 18:37 ` [PATCH 12/19] ptrace: implement PTRACE_SEIZE Tejun Heo
2011-05-24 18:37 ` [PATCH 13/19] ptrace: implement PTRACE_INTERRUPT Tejun Heo
2011-05-24 18:37 ` [PATCH 14/19] ptrace: restructure ptrace_getsiginfo() Tejun Heo
2011-05-24 18:37 ` [PATCH 15/19] ptrace: add siginfo.si_pt_flags Tejun Heo
2011-05-24 18:37 ` [PATCH 16/19] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
2011-05-24 18:37 ` [PATCH 17/19] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
2011-05-24 18:37 ` [PATCH 18/19] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
2011-05-24 18:37 ` [PATCH 19/19] ptrace: implement group stop notification for ptracer Tejun Heo
