All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
@ 2011-05-16 18:17 Tejun Heo
  2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
                   ` (11 more replies)
  0 siblings, 12 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

This is the second try at implementing PTRACE_SEIZE/INTERRUPT and
group stop notification.  Notable changes from the first take[1] are,

* Prep patches moved to a separate patchset[2].

* JOBCTL_EVENT_INTERRUPT renamed to JOBCTL_EVENT_STOP and now also
  used for group stop.  This means that PTRACE_GETSIGINFO will always
  return a valid siginfo.  For signal delivery traps, it will contain
  the siginfo for the signal being delivered.  For all other traps, it
  will contain the trap information which includes group stop info.

* JOBCTL_TRAP_STOP trap doesn't carry signo in si_code and the signal
  which initiated the group stop currently in effect can be obtained
  from siginfo.si_signo.

* JOBCTL_INTERRUPT triggers JOBCTL_EVENT_STOP and gets cleared on any
  trap, so the initial trap by JOBCTL_SEIZE is identical to
  JOBCTL_INTERRUPT.

* PTRACE_SETSIGINFO disallows overriding __SI_TRAP siginfo.

* Re-trap delay mechanism added to avoid re-trapping while ptrace
  request is in progress.

So, PTRACE_INTERRUPT simply puts the tracee into the same state as
group stop and this simplifies things a lot.  Trap siginfo is also
accessible during group stop and group stop notification happening
during PTRACE_EVENT_STOP makes sense and there's no awkward careful
sequencing of traps to avoid unnecessary traps.  I think it makes
whole lot more sense this way.

This patchset contains the following ten patches.

  0001-signal-remove-three-noop-tracehooks.patch
  0002-job-control-introduce-JOBCTL_TRAP_STOP-and-use-it-fo.patch
  0003-ptrace-implement-PTRACE_SEIZE.patch
  0004-ptrace-implement-PTRACE_INTERRUPT.patch
  0005-ptrace-restructure-ptrace_getsiginfo.patch
  0006-ptrace-add-siginfo.si_pt_flags.patch
  0007-ptrace-make-group-stop-state-visible-via-PTRACE_GETS.patch
  0008-ptrace-don-t-let-PTRACE_SETSIGINFO-override-__SI_TRA.patch
  0009-ptrace-add-JOBCTL_BLOCK_NOTIFY.patch
  0010-ptrace-implement-group-stop-notification-for-ptracer.patch

0001 removes three signal tracehooks.  Will be moved to the prep
patchset on the next round.

0002 introduces TRAP_STOP and 0003-0004 implement SEIZE and INTERRUPT.

0005-0008 expose group stop info via GETSIGINFO.

0009-0010 implement group stop notification.

This patchset is on top of Oleg's ptrace branch[3] + prep patchset[2]
and available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-ptrace-seize

The HEAD is 8026ce454dd46dc3c8cf6bbc57bcd30f5efa4e55.  If you see
older branch, please retry after a while (korg is still syncing).

diffstat follows.

 arch/ia64/include/asm/siginfo.h       |    7 +
 arch/ia64/kernel/signal.c             |    5 
 arch/mips/include/asm/compat-signal.h |    7 +
 arch/mips/include/asm/siginfo.h       |    7 +
 arch/mips/kernel/signal32.c           |    5 
 arch/parisc/kernel/signal32.c         |    5 
 arch/parisc/kernel/signal32.h         |    7 +
 arch/powerpc/kernel/ppc32.h           |    7 +
 arch/powerpc/kernel/signal_32.c       |    5 
 arch/s390/kernel/compat_linux.h       |    7 +
 arch/s390/kernel/compat_signal.c      |    5 
 arch/sparc/kernel/signal32.c          |   12 +
 arch/tile/kernel/compat_signal.c      |   11 +
 arch/x86/ia32/ia32_signal.c           |    4 
 arch/x86/include/asm/ia32.h           |    7 +
 include/asm-generic/siginfo.h         |   10 +
 include/linux/ptrace.h                |   13 ++
 include/linux/sched.h                 |    6 
 include/linux/tracehook.h             |   52 --------
 kernel/ptrace.c                       |  202 +++++++++++++++++++++++++++----
 kernel/signal.c                       |  215 +++++++++++++++++++++++-----------
 21 files changed, 452 insertions(+), 147 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1136930
[2] http://thread.gmane.org/gmane.linux.kernel/1139751
[3] git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc.git

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-17 16:22   ` Christoph Hellwig
  2011-05-18 18:45   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

Remove the following three noop tracehooks in signals.c.

* tracehook_force_sigpending()
* tracehook_get_signal()
* tracehook_finish_jctl()

The code area is about to be updated and these hooks don't do anything
other than obfuscating the logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/tracehook.h |   52 ---------------------------------------------
 kernel/signal.c           |   44 ++++++++++++--------------------------
 2 files changed, 14 insertions(+), 82 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index b073f3c..85dae19 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -425,58 +425,6 @@ static inline int tracehook_consider_fatal_signal(struct task_struct *task,
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
 
-/**
- * tracehook_force_sigpending - let tracing force signal_pending(current) on
- *
- * Called when recomputing our signal_pending() flag.  Return nonzero
- * to force the signal_pending() flag on, so that tracehook_get_signal()
- * will be called before the next return to user mode.
- *
- * Called with @current->sighand->siglock held.
- */
-static inline int tracehook_force_sigpending(void)
-{
-	return 0;
-}
-
-/**
- * tracehook_get_signal - deliver synthetic signal to traced task
- * @task:		@current
- * @regs:		task_pt_regs(@current)
- * @info:		details of synthetic signal
- * @return_ka:		sigaction for synthetic signal
- *
- * Return zero to check for a real pending signal normally.
- * Return -1 after releasing the siglock to repeat the check.
- * Return a signal number to induce an artifical signal delivery,
- * setting *@info and *@return_ka to specify its details and behavior.
- *
- * The @return_ka->sa_handler value controls the disposition of the
- * signal, no matter the signal number.  For %SIG_DFL, the return value
- * is a representative signal to indicate the behavior (e.g. %SIGTERM
- * for death, %SIGQUIT for core dump, %SIGSTOP for job control stop,
- * %SIGTSTP for stop unless in an orphaned pgrp), but the signal number
- * reported will be @info->si_signo instead.
- *
- * Called with @task->sighand->siglock held, before dequeuing pending signals.
- */
-static inline int tracehook_get_signal(struct task_struct *task,
-				       struct pt_regs *regs,
-				       siginfo_t *info,
-				       struct k_sigaction *return_ka)
-{
-	return 0;
-}
-
-/**
- * tracehook_finish_jctl - report about return from job control stop
- *
- * This is called by do_signal_stop() after wakeup.
- */
-static inline void tracehook_finish_jctl(void)
-{
-}
-
 #define DEATH_REAP			-1
 #define DEATH_DELAYED_GROUP_LEADER	-2
 
diff --git a/kernel/signal.c b/kernel/signal.c
index fff3036..bb63c25 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -150,9 +150,7 @@ void recalc_sigpending_and_wake(struct task_struct *t)
 
 void recalc_sigpending(void)
 {
-	if (unlikely(tracehook_force_sigpending()))
-		set_thread_flag(TIF_SIGPENDING);
-	else if (!recalc_sigpending_tsk(current) && !freezing(current))
+	if (!recalc_sigpending_tsk(current) && !freezing(current))
 		clear_thread_flag(TIF_SIGPENDING);
 
 }
@@ -1971,8 +1969,6 @@ retry:
 
 	spin_unlock_irq(&current->sighand->siglock);
 
-	tracehook_finish_jctl();
-
 	return 1;
 }
 
@@ -2075,37 +2071,25 @@ relock:
 
 	for (;;) {
 		struct k_sigaction *ka;
-		/*
-		 * Tracing can induce an artifical signal and choose sigaction.
-		 * The return value in @signr determines the default action,
-		 * but @info->si_signo is the signal number we will report.
-		 */
-		signr = tracehook_get_signal(current, regs, info, return_ka);
-		if (unlikely(signr < 0))
+
+		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
+		    do_signal_stop(0))
 			goto relock;
-		if (unlikely(signr != 0))
-			ka = return_ka;
-		else {
-			if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
-			    do_signal_stop(0))
-				goto relock;
 
-			signr = dequeue_signal(current, &current->blocked,
-					       info);
+		signr = dequeue_signal(current, &current->blocked, info);
 
-			if (!signr)
-				break; /* will return 0 */
+		if (!signr)
+			break; /* will return 0 */
 
-			if (signr != SIGKILL) {
-				signr = ptrace_signal(signr, info,
-						      regs, cookie);
-				if (!signr)
-					continue;
-			}
-
-			ka = &sighand->action[signr-1];
+		if (signr != SIGKILL) {
+			signr = ptrace_signal(signr, info,
+					      regs, cookie);
+			if (!signr)
+				continue;
 		}
 
+		ka = &sighand->action[signr-1];
+
 		/* Trace actually delivered signals. */
 		trace_signal_deliver(signr, info, ka);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
  2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-18 16:48   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 03/10] ptrace: implement PTRACE_SEIZE Tejun Heo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

do_signal_stop() implemented both normal group stop and trap for group
stop while ptraced.  This approach has been enough but scheduled
changes require trap mechanism which can be used in more generic
manner and using group stop trap for generic trap site simplifies both
userland visible interface and implementation.

This patch adds a new jobctl flag - JOBCTL_TRAP_STOP.  When set, it
triggers a trap site, which behaves like group stop trap, in
get_signal_to_deliver() before checking for pending signals.  While
ptraced, do_signal_stop() doesn't stop itself.  It initiates group
stop if requested and schedules JOBCTL_TRAP_STOP and returns, which
makes its caller - get_signal_to_deliver() - to relock, check and
enter the trap.

Although this adds an unlock-relocking between checking of
JOBCTL_STOP_PENDING and actually trapping for STOP, this doesn't
affect correctness.  ptrace_stop() already had conditional
unlock-relocking depending on arch and, if SIGCONT is generated
inbetween, it's ignored as if it were received after the task entered
TASK_TRACED.  The extra unlock-relocking follows the same rule and the
race window will be properly handled by notification mechanism which
will be added later.

ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending bits so
that TRAP_STOP and future trap bits don't linger after detach.

While at it, add proper function comment to do_signal_stop() and make
it return bool.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/sched.h |    4 ++-
 kernel/ptrace.c       |   10 +++++-
 kernel/signal.c       |   80 +++++++++++++++++++++++++++++++------------------
 3 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 86f21e7..d8a11cb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1784,9 +1784,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_DEQUEUED	(1 << 16) /* stop signal dequeued */
 #define JOBCTL_STOP_PENDING	(1 << 17) /* task should stop for group stop */
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
+#define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 
-#define JOBCTL_PENDING_MASK	JOBCTL_STOP_PENDING
+#define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
+#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern void task_clear_jobctl_pending(struct task_struct *task,
 				      unsigned int mask);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9e0423c..7f02129 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -127,6 +127,12 @@ void __ptrace_unlink(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 
 	/*
+	 * Clear jobctl bits owned by this tracer along with STOP_PENDING,
+	 * which is reinstated below if necessary.
+	 */
+	task_clear_jobctl_pending(child, JOBCTL_PENDING_MASK);
+
+	/*
 	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
 	 * @child isn't dead.
 	 */
@@ -292,7 +298,7 @@ static int ptrace_attach(struct task_struct *task)
 	spin_lock(&task->sighand->siglock);
 
 	/*
-	 * If the task is already STOPPED, set JOBCTL_STOP_PENDING and
+	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
 	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
 	 * will be cleared if the child completes the transition or any
 	 * event which clears the group stop states happens.
@@ -307,7 +313,7 @@ static int ptrace_attach(struct task_struct *task)
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task)) {
-		task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
+		task->jobctl |= JOBCTL_TRAP_STOP | JOBCTL_TRAPPING;
 		signal_wake_up(task, 1);
 	}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index bb63c25..50a4e8a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1757,13 +1757,16 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	/*
 	 * If @why is CLD_STOPPED, we're trapping to participate in a group
 	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
-	 * while siglock was released for the arch hook, PENDING could be
-	 * clear now.  We act as if SIGCONT is received after TASK_TRACED
-	 * is entered - ignore it.
+	 * across siglock relocks since INTERRUPT was scheduled, PENDING
+	 * could be clear now.  We act as if SIGCONT is received after
+	 * TASK_TRACED is entered - ignore it.
 	 */
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
+	/* any trap clears pending STOP trap */
+	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
@@ -1855,13 +1858,30 @@ void ptrace_notify(int exit_code)
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
-/*
- * This performs the stopping for SIGSTOP and other stop signals.
- * We have to stop all threads in the thread group.
- * Returns non-zero if we've actually stopped and released the siglock.
- * Returns zero if we didn't stop and still hold the siglock.
+/**
+ * do_signal_stop - handle group stop for SIGSTOP and other stop signals
+ * @signr: signr causing group stop if initiating
+ *
+ * If %JOBCTL_STOP_PENDING is not set yet, initiate group stop with @signr
+ * and participate in it.  If already set, participate in the existing
+ * group stop.  If participated in a group stop (and thus slept), %true is
+ * returned with siglock released.
+ *
+ * If ptraced, this function doesn't handle stop itself.  Instead,
+ * %JOBCTL_TRAP_STOP is scheduled and %true is returned with siglock
+ * released.  The caller must ensure that INTERRUPT trap handling takes
+ * places afterwards.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held, which is released
+ * on %true return.
+ *
+ * RETURNS:
+ * %false if group stop is already cancelled and nothing happened.  %true
+ * if participated in group stop.
  */
-static int do_signal_stop(int signr)
+static bool do_signal_stop(int signr)
+	__releases(&current->sighand->siglock)
 {
 	struct signal_struct *sig = current->signal;
 
@@ -1874,7 +1894,7 @@ static int do_signal_stop(int signr)
 
 		if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
-			return 0;
+			return false;
 		/*
 		 * There is no group stop already in progress.  We must
 		 * initiate one now.
@@ -1917,7 +1937,7 @@ static int do_signal_stop(int signr)
 			}
 		}
 	}
-retry:
+
 	if (likely(!task_ptrace(current))) {
 		int notify = 0;
 
@@ -1949,27 +1969,16 @@ retry:
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		schedule();
-
-		spin_lock_irq(&current->sighand->siglock);
 	} else {
-		ptrace_stop(current->jobctl & JOBCTL_STOP_SIGMASK,
-			    CLD_STOPPED, 0, NULL);
-		current->exit_code = 0;
-	}
-
-	/*
-	 * JOBCTL_STOP_PENDING could be set if another group stop has
-	 * started since being woken up or ptrace wants us to transit
-	 * between TASK_STOPPED and TRACED.  Retry group stop.
-	 */
-	if (current->jobctl & JOBCTL_STOP_PENDING) {
-		WARN_ON_ONCE(!(current->jobctl & JOBCTL_STOP_SIGMASK));
-		goto retry;
+		/*
+		 * While ptraced, group stop is handled by STOP trap.
+		 * Schedule it and let the caller deal with it.
+		 */
+		current->jobctl |= JOBCTL_TRAP_STOP;
+		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-	spin_unlock_irq(&current->sighand->siglock);
-
-	return 1;
+	return true;
 }
 
 static int ptrace_signal(int signr, siginfo_t *info,
@@ -2069,6 +2078,19 @@ relock:
 		goto relock;
 	}
 
+	/*
+	 * Take care of ptrace jobctl traps.  It currently is only used to
+	 * trap for group stop while ptraced.
+	 */
+	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
+		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
+		WARN_ON_ONCE(!signr);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
+		current->exit_code = 0;
+		spin_unlock_irq(&sighand->siglock);
+		goto relock;
+	}
+
 	for (;;) {
 		struct k_sigaction *ka;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
  2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
  2011-05-16 18:17 ` [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-18  0:40   ` Denys Vlasenko
  2011-05-18 18:17   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT Tejun Heo
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
effects on tracee signal and job control states.  This patch
implements a new ptrace request PTRACE_SEIZE which attaches and traps
tracee without affecting its signal and job control states.

The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
flags in @data.  Currently, the only defined flag is
PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
The changes will be implemented gradually and the DEVEL flag is to
prevent programs which expect full SEIZE behavior from using it before
all the behavior modifications are complete while allowing unit
testing.  The flag will be removed once SEIZE behaviors are completely
implemented.

After PTRACE_SEIZE, tracee will trap.  Which trap will happen isn't
fixed.  If other trap conditions exist (e.g. signal delivery), they
might be taken; otherwise, a trap with exit_code == (SIGTRAP |
PTRACE_EVENT_STOP << 8) is taken.  If seized, this trap is also used
for group stop traps instead of exit_code == 0 with NULL GETSIGINFO.

* PTRACE_SEIZE doesn't affect signal or group stop state.

* After PTRACE_SEIZE, one trap will happen which might be a
  PTRACE_EVENT_STOP trap.

* If PTRACE_SEIZE'd, group stop also uses PTRACE_EVENT_STOP trap which
  uses exit_code of (SIGTRAP | PTRACE_EVENT_STOP << 8) instead of the
  stopping signal number and returns usual trap siginfo on
  PTRACE_GETSIGINFO instead of NULL.

Note that there currently is no way to find out the stopping signal
number while seized.  This will be improved by future patches.

Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
used to determine whether new SEIZE behaviors should be enabled.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts100ms = { .tv_nsec = 100000000 };
  static const struct timespec ts1s = { .tv_sec = 1 };
  static const struct timespec ts3s = { .tv_sec = 3 };

  int main(int argc, char **argv)
  {
	  pid_t tracee;

	  tracee = fork();
	  if (tracee == 0) {
		  nanosleep(&ts100ms, NULL);
		  while (1) {
			  printf("tracee: alive\n");
			  nanosleep(&ts1s, NULL);
		  }
	  }

	  if (argc > 1)
		  kill(tracee, SIGSTOP);

	  nanosleep(&ts100ms, NULL);

	  ptrace(PTRACE_SEIZE, tracee, NULL,
		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  waitid(P_PID, tracee, NULL, WSTOPPED);
	  ptrace(PTRACE_CONT, tracee, NULL, NULL);
	  nanosleep(&ts3s, NULL);
	  printf("tracer: exiting\n");
	  return 0;
  }

When the above program is called w/o argument, tracee is seized from
running state and continued.  When tracer exits, tracee is returned to
running state and keeps printing out.

  # ./test-seize
  tracee: alive
  tracee: alive
  tracee: alive
  tracer: exiting
  # tracee: alive
  tracee: alive
  tracee: alive

When called with an argument, tracee is seized from stopped state and
continued, and returns to stopped state when tracer exits.

  # ./test-seize
  tracee: alive
  tracee: alive
  tracee: alive
  tracer: exiting
  # ps -el|grep test-seize
  1 T     0  4720     1  0  80   0 -   941 signal ttyS0    00:00:00 test-seize

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    7 +++++++
 kernel/ptrace.c        |   38 ++++++++++++++++++++++++++++++++------
 kernel/signal.c        |   32 ++++++++++++++++++++++++--------
 3 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 6b359cd..3fd389d 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -47,6 +47,11 @@
 #define PTRACE_GETREGSET	0x4204
 #define PTRACE_SETREGSET	0x4205
 
+#define PTRACE_SEIZE		0x4206
+
+/* flags in @data for PTRACE_SEIZE */
+#define PTRACE_SEIZE_DEVEL	0x80000000 /* temp flag for development */
+
 /* options set using PTRACE_SETOPTIONS */
 #define PTRACE_O_TRACESYSGOOD	0x00000001
 #define PTRACE_O_TRACEFORK	0x00000002
@@ -65,6 +70,7 @@
 #define PTRACE_EVENT_EXEC	4
 #define PTRACE_EVENT_VFORK_DONE	5
 #define PTRACE_EVENT_EXIT	6
+#define PTRACE_EVENT_STOP	7
 
 #include <asm/ptrace.h>
 
@@ -77,6 +83,7 @@
  * flags.  When the a task is stopped the ptracer owns task->ptrace.
  */
 
+#define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 #define PT_TRACESYSGOOD	0x00000004
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7f02129..7aefd43 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -254,10 +254,28 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 	return !err;
 }
 
-static int ptrace_attach(struct task_struct *task)
+static int ptrace_attach(struct task_struct *task, long request,
+			 unsigned long flags)
 {
+	bool seize = (request == PTRACE_SEIZE);
 	int retval;
 
+	/*
+	 * SEIZE will enable new ptrace behaviors which will be implemented
+	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * expecting full SEIZE behaviors trapping on kernel commits which
+	 * are still in the process of implementing them.
+	 *
+	 * Only test programs for new ptrace behaviors being implemented
+	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
+	 *
+	 * Once SEIZE behaviors are completely implemented, this flag and
+	 * the following test will be removed.
+	 */
+	retval = -EIO;
+	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
+		goto out;
+
 	audit_ptrace(task);
 
 	retval = -EPERM;
@@ -289,11 +307,16 @@ static int ptrace_attach(struct task_struct *task)
 		goto unlock_tasklist;
 
 	task->ptrace = PT_PTRACED;
+	if (seize)
+		task->ptrace |= PT_SEIZED;
 	if (task_ns_capable(task, CAP_SYS_PTRACE))
 		task->ptrace |= PT_PTRACE_CAP;
 
 	__ptrace_link(task, current);
-	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+
+	/* SEIZE uses TRAP_STOP instead of SIGSTOP for initial trap */
+	if (!seize)
+		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
 	spin_lock(&task->sighand->siglock);
 
@@ -315,6 +338,9 @@ static int ptrace_attach(struct task_struct *task)
 	if (task_is_stopped(task)) {
 		task->jobctl |= JOBCTL_TRAP_STOP | JOBCTL_TRAPPING;
 		signal_wake_up(task, 1);
+	} else if (seize) {
+		task->jobctl |= JOBCTL_TRAP_STOP;
+		signal_wake_up(task, 0);
 	}
 
 	spin_unlock(&task->sighand->siglock);
@@ -826,8 +852,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH) {
-		ret = ptrace_attach(child);
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(child, request, data);
 		/*
 		 * Some architectures need to do book-keeping after
 		 * a ptrace attach.
@@ -968,8 +994,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		goto out;
 	}
 
-	if (request == PTRACE_ATTACH) {
-		ret = ptrace_attach(child);
+	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+		ret = ptrace_attach(child, request, data);
 		/*
 		 * Some architectures need to do book-keeping after
 		 * a ptrace attach.
diff --git a/kernel/signal.c b/kernel/signal.c
index 50a4e8a..84e75db 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1840,7 +1840,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	recalc_sigpending_tsk(current);
 }
 
-void ptrace_notify(int exit_code)
+static void ptrace_do_notify(int exit_code, int why)
 {
 	siginfo_t info;
 
@@ -1853,8 +1853,13 @@ void ptrace_notify(int exit_code)
 	info.si_uid = current_uid();
 
 	/* Let the debugger run.  */
+	ptrace_stop(exit_code, why, 1, &info);
+}
+
+void ptrace_notify(int exit_code)
+{
 	spin_lock_irq(&current->sighand->siglock);
-	ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
+	ptrace_do_notify(exit_code, CLD_TRAPPED);
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
@@ -2079,14 +2084,25 @@ relock:
 	}
 
 	/*
-	 * Take care of ptrace jobctl traps.  It currently is only used to
-	 * trap for group stop while ptraced.
+	 * Take care of ptrace jobctl traps.
+	 *
+	 * When PT_SEIZED, it's used for both group stop and explicit
+	 * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap
+	 * with accompanying siginfo.
+	 *
+	 * When !PT_SEIZED, it's used only for group stop trap with
+	 * CLD_STOPPED as exit_code and no siginfo.
 	 */
 	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
-		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
-		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
-		current->exit_code = 0;
+		if (current->ptrace & PT_SEIZED) {
+			ptrace_do_notify(SIGTRAP | PTRACE_EVENT_STOP << 8,
+					 CLD_STOPPED);
+		} else {
+			signr = current->jobctl & JOBCTL_STOP_SIGMASK;
+			WARN_ON_ONCE(!signr);
+			ptrace_stop(signr, CLD_STOPPED, 0, NULL);
+			current->exit_code = 0;
+		}
 		spin_unlock_irq(&sighand->siglock);
 		goto relock;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (2 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 03/10] ptrace: implement PTRACE_SEIZE Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-18 18:38   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 05/10] ptrace: restructure ptrace_getsiginfo() Tejun Heo
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

Currently, there's no way to trap a running ptracee short of sending a
signal which has various side effects.  This patch implements
PTRACE_INTERRUPT which traps ptracee without any signal or job control
related side effect.

The implementation is almost trivial.  It uses the group stop trap -
SIGTRAP | PTRACE_EVENT_STOP << 8.  A new trap flag
JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
cleared when any trap happens.  As INTERRUPT should be useable
regardless of the current state of tracee, task_is_traced() test in
ptrace_check_attach() is skipped for INTERRUPT.

PTRACE_INTERRUPT is available iff tracee is attached with
PTRACE_SEIZE.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts100ms = { .tv_nsec = 100000000 };
  static const struct timespec ts1s = { .tv_sec = 1 };
  static const struct timespec ts3s = { .tv_sec = 3 };

  int main(int argc, char **argv)
  {
	  pid_t tracee;

	  tracee = fork();
	  if (tracee == 0) {
		  nanosleep(&ts100ms, NULL);
		  while (1) {
			  printf("tracee: alive pid=%d\n", getpid());
			  nanosleep(&ts1s, NULL);
		  }
	  }

	  if (argc > 1)
		  kill(tracee, SIGSTOP);

	  nanosleep(&ts100ms, NULL);

	  ptrace(PTRACE_SEIZE, tracee, NULL,
		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  waitid(P_PID, tracee, NULL, WSTOPPED);
	  ptrace(PTRACE_CONT, tracee, NULL, NULL);
	  nanosleep(&ts3s, NULL);

	  printf("tracer: INTERRUPT and DETACH\n");
	  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
	  waitid(P_PID, tracee, NULL, WSTOPPED);
	  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
	  nanosleep(&ts3s, NULL);

	  printf("tracer: exiting\n");
	  kill(tracee, SIGKILL);
	  return 0;
  }

When called without argument, tracee is seized from running state,
continued, interrupted and then detached back to running state.

  # ./test-interrupt
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracer: INTERRUPT and DETACH
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracee: alive pid=4546
  tracer: exiting

When called with argument, it's the same but tracee is detached back
to stopped state.

  # ./test-interrupt  1
  tracee: alive pid=4548
  tracee: alive pid=4548
  tracee: alive pid=4548
  tracer: INTERRUPT and DETACH
  tracer: exiting

Before PTRACE_INTERRUPT, once the tracee was continued, there was no
easy way to do PTRACE_DETACH without causing side effect as tracee
couldn't be trapped without side effect.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    1 +
 kernel/ptrace.c        |   27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 3fd389d..b07b9e3 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -48,6 +48,7 @@
 #define PTRACE_SETREGSET	0x4205
 
 #define PTRACE_SEIZE		0x4206
+#define PTRACE_INTERRUPT	0x4207
 
 /* flags in @data for PTRACE_SEIZE */
 #define PTRACE_SEIZE_DEVEL	0x80000000 /* temp flag for development */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7aefd43..351db7c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -706,6 +706,7 @@ int ptrace_request(struct task_struct *child, long request,
 	siginfo_t siginfo;
 	void __user *datavp = (void __user *) data;
 	unsigned long __user *datalp = datavp;
+	unsigned long flags;
 
 	switch (request) {
 	case PTRACE_PEEKTEXT:
@@ -738,6 +739,26 @@ int ptrace_request(struct task_struct *child, long request,
 			ret = ptrace_setsiginfo(child, &siginfo);
 		break;
 
+	case PTRACE_INTERRUPT:
+		/*
+		 * Stop tracee without any side-effect on signal or job
+		 * control.  At least one trap is guaranteed to happen
+		 * after this request.  If @child is already trapped, the
+		 * current trap is not disturbed and another trap will
+		 * happen after the current trap is ended with PTRACE_CONT.
+		 *
+		 * The actual trap might not be PTRACE_EVENT_STOP trap but
+		 * the pending condition is cleared regardless.
+		 */
+		if (likely(child->ptrace & PT_SEIZED) &&
+		    lock_task_sighand(child, &flags)) {
+			child->jobctl |= JOBCTL_TRAP_STOP;
+			signal_wake_up(child, 0);
+			unlock_task_sighand(child, &flags);
+			ret = 0;
+		}
+		break;
+
 	case PTRACE_DETACH:	 /* detach a process that was attached. */
 		ret = ptrace_detach(child, data);
 		break;
@@ -863,7 +884,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL);
+	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
 	if (ret < 0)
 		goto out_put_task_struct;
 
@@ -1005,7 +1027,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL);
+	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+				  request == PTRACE_INTERRUPT);
 	if (!ret)
 		ret = compat_arch_ptrace(child, request, addr, data);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 05/10] ptrace: restructure ptrace_getsiginfo()
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (3 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-16 18:17 ` [PATCH 06/10] ptrace: add siginfo.si_pt_flags Tejun Heo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

Flatten ptrace_getsiginfo() to prepare for more logic in the success
path.  While at it, remove [un]likely() on child->last_siginfo check -
signal delivery and group stop traps can only be distinguished by NULL
siginfo and group stop isn't that unlikely.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 351db7c..42037a4 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -574,16 +574,19 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*info = *child->last_siginfo;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
+	if (!lock_task_sighand(child, &flags))
+		return -ESRCH;
+
+	error = -EINVAL;
+	if (!child->last_siginfo)
+		goto out_unlock;
+
+	error = 0;
+	*info = *child->last_siginfo;
+out_unlock:
+	unlock_task_sighand(child, &flags);
 	return error;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 06/10] ptrace: add siginfo.si_pt_flags
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (4 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 05/10] ptrace: restructure ptrace_getsiginfo() Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-16 18:17 ` [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo,
	Tony Luck, Fenghua Yu, Ralf Baechle, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, Benjamin Herrenschmidt, Paul Mackerras,
	Martin Schwidefsky, Heiko Carstens, David S. Miller,
	Chris Metcalf, x86

This essentially is a simple addition of a flag field but seems
complicated thanks to siginfo_t convolution.  _sigtrap struct, which
contains all the fields used by ptrace_notify[_locked]() and the new
_pt_flags, is added to siginfo._sifields union along with the field
abbreviation macro si_pt_flags; then, __SI_TRAP is defined to
implement copying of the new field to userland.

Two architectures - ia64 and mips - define their own versions of
siginfo_t and ia64 implements its own copy_siginfo_to_user().  Also,
x86, mips, parisc, powerpc, s390, sparc and tile have compat_siginfo_t
and copy_siginfo_to_user32() for 32bit compatibility.  All are updated
such that [compat_]siginfo_t have _sigtrap and all the fields are
copied out.

x86 is tested.  Affected code in mips, powerpc, s390 and sparc are
compile tested.  mips and tile are untested.

This patch doesn't actually make use of the new field.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: x86@kernel.org
---
 arch/ia64/include/asm/siginfo.h       |    7 +++++++
 arch/ia64/kernel/signal.c             |    5 +++++
 arch/mips/include/asm/compat-signal.h |    7 +++++++
 arch/mips/include/asm/siginfo.h       |    7 +++++++
 arch/mips/kernel/signal32.c           |    5 +++++
 arch/parisc/kernel/signal32.c         |    5 +++++
 arch/parisc/kernel/signal32.h         |    7 +++++++
 arch/powerpc/kernel/ppc32.h           |    7 +++++++
 arch/powerpc/kernel/signal_32.c       |    5 +++++
 arch/s390/kernel/compat_linux.h       |    7 +++++++
 arch/s390/kernel/compat_signal.c      |    5 +++++
 arch/sparc/kernel/signal32.c          |   12 ++++++++++++
 arch/tile/kernel/compat_signal.c      |   11 +++++++++++
 arch/x86/ia32/ia32_signal.c           |    4 ++++
 arch/x86/include/asm/ia32.h           |    7 +++++++
 include/asm-generic/siginfo.h         |   10 ++++++++++
 kernel/signal.c                       |    7 ++++++-
 17 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/siginfo.h b/arch/ia64/include/asm/siginfo.h
index c8fcaa2..2cff1ce 100644
--- a/arch/ia64/include/asm/siginfo.h
+++ b/arch/ia64/include/asm/siginfo.h
@@ -70,6 +70,13 @@ typedef struct siginfo {
 			long _band;	/* POLL_IN, POLL_OUT, POLL_MSG (XPG requires a "long") */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t _pid;		/* sender's pid */
+			uid_t _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c
index 7bdafc8..ee18366 100644
--- a/arch/ia64/kernel/signal.c
+++ b/arch/ia64/kernel/signal.c
@@ -142,6 +142,11 @@ copy_siginfo_to_user (siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_addr, &to->si_addr);
 			err |= __put_user(from->si_imm, &to->si_imm);
 			break;
+		      case __SI_TRAP >> 16:
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		      case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/mips/include/asm/compat-signal.h b/arch/mips/include/asm/compat-signal.h
index 368a99e..47b2e4f 100644
--- a/arch/mips/include/asm/compat-signal.h
+++ b/arch/mips/include/asm/compat-signal.h
@@ -54,6 +54,13 @@ typedef struct compat_siginfo {
 			int _fd;
 		} _sigpoll;
 
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;	/* sender's pid */
+			compat_uid_t _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
+
 		/* POSIX.1b timers */
 		struct {
 			timer_t _tid;		/* timer id */
diff --git a/arch/mips/include/asm/siginfo.h b/arch/mips/include/asm/siginfo.h
index 1ca64b4..232920f 100644
--- a/arch/mips/include/asm/siginfo.h
+++ b/arch/mips/include/asm/siginfo.h
@@ -96,6 +96,13 @@ typedef struct siginfo {
 			__ARCH_SI_BAND_T _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t _pid;		/* sender's pid */
+			__ARCH_SI_UID_T _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
diff --git a/arch/mips/kernel/signal32.c b/arch/mips/kernel/signal32.c
index aae9866..7e392e1 100644
--- a/arch/mips/kernel/signal32.c
+++ b/arch/mips/kernel/signal32.c
@@ -452,6 +452,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_RT >> 16: /* This is not generated by the kernel as of now.  */
 		case __SI_MESGQ >> 16:
 			err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/parisc/kernel/signal32.c b/arch/parisc/kernel/signal32.c
index e141324..ead8ca4 100644
--- a/arch/parisc/kernel/signal32.c
+++ b/arch/parisc/kernel/signal32.c
@@ -482,6 +482,11 @@ copy_siginfo_to_user32 (compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/parisc/kernel/signal32.h b/arch/parisc/kernel/signal32.h
index c780084..8016f51 100644
--- a/arch/parisc/kernel/signal32.h
+++ b/arch/parisc/kernel/signal32.h
@@ -104,6 +104,13 @@ typedef struct compat_siginfo {
                         int _band;      /* POLL_IN, POLL_OUT, POLL_MSG */
                         int _fd;
                 } _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;      /* sender's pid */
+			unsigned int _uid;      /* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
         } _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/powerpc/kernel/ppc32.h b/arch/powerpc/kernel/ppc32.h
index dc16aef..4293542 100644
--- a/arch/powerpc/kernel/ppc32.h
+++ b/arch/powerpc/kernel/ppc32.h
@@ -64,6 +64,13 @@ typedef struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;		/* sender's pid */
+			compat_uid_t _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index b96a3a0..d072458 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -716,6 +716,11 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *d, siginfo_t *s)
 		err |= __put_user(s->si_band, &d->si_band);
 		err |= __put_user(s->si_fd, &d->si_fd);
 		break;
+	case __SI_TRAP:
+		err |= __put_user(s->si_pid, &d->si_pid);
+		err |= __put_user(s->si_uid, &d->si_uid);
+		err |= __put_user(s->si_pt_flags, &d->si_pt_flags);
+		break;
 	case __SI_TIMER >> 16:
 		err |= __put_user(s->si_tid, &d->si_tid);
 		err |= __put_user(s->si_overrun, &d->si_overrun);
diff --git a/arch/s390/kernel/compat_linux.h b/arch/s390/kernel/compat_linux.h
index 9635d75..f8c973f 100644
--- a/arch/s390/kernel/compat_linux.h
+++ b/arch/s390/kernel/compat_linux.h
@@ -72,6 +72,13 @@ typedef struct compat_siginfo {
 			int	_band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int	_fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			pid_t		_pid;	/* sender's pid */
+			uid_t		_uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
index eee9998..b3c9f6b 100644
--- a/arch/s390/kernel/compat_signal.c
+++ b/arch/s390/kernel/compat_signal.c
@@ -96,6 +96,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_tid, &to->si_tid);
 			err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 75fad42..c545212 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -102,6 +102,13 @@ typedef struct compat_siginfo{
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			compat_pid_t _pid;		/* sender's pid */
+			unsigned int _uid;		/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 }compat_siginfo_t;
 
@@ -165,6 +172,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			err |= __put_user(from->si_band, &to->si_band);
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_pid, &to->si_pid);
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_RT >> 16: /* This is not generated by the kernel as of now.  */
 		case __SI_MESGQ >> 16:
 			err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
index dbb0dfc..0a5d694 100644
--- a/arch/tile/kernel/compat_signal.c
+++ b/arch/tile/kernel/compat_signal.c
@@ -109,6 +109,13 @@ struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;	/* sender's pid */
+			unsigned int _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 };
 
@@ -219,6 +226,10 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *to, siginfo_t *from)
 		case __SI_POLL >> 16:
 			err |= __put_user(from->si_fd, &to->si_fd);
 			break;
+		case __SI_TRAP >> 16:
+			err |= __put_user(from->si_uid, &to->si_uid);
+			err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+			break;
 		case __SI_TIMER >> 16:
 			err |= __put_user(from->si_overrun, &to->si_overrun);
 			err |= __put_user(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..1df88cc 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -85,6 +85,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 			case __SI_POLL >> 16:
 				put_user_ex(from->si_fd, &to->si_fd);
 				break;
+			case __SI_TRAP:
+				put_user_ex(from->si_uid, &to->si_uid);
+				put_user_ex(from->si_pt_flags, &to->si_pt_flags);
+				break;
 			case __SI_TIMER >> 16:
 				put_user_ex(from->si_overrun, &to->si_overrun);
 				put_user_ex(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/include/asm/ia32.h b/arch/x86/include/asm/ia32.h
index 1f7e625..7eab27a 100644
--- a/arch/x86/include/asm/ia32.h
+++ b/arch/x86/include/asm/ia32.h
@@ -126,6 +126,13 @@ typedef struct compat_siginfo {
 			int _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			unsigned int _pid;	/* sender's pid */
+			unsigned int _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } compat_siginfo_t;
 
diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 942d30b..e243927 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -90,6 +90,13 @@ typedef struct siginfo {
 			__ARCH_SI_BAND_T _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGTRAP */
+		struct {
+			__kernel_pid_t _pid;	/* sender's pid */
+			__ARCH_SI_UID_T _uid;	/* sender's uid */
+			unsigned int _pt_flags;
+		} _sigtrap;
 	} _sifields;
 } siginfo_t;
 
@@ -116,6 +123,7 @@ typedef struct siginfo {
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
+#define si_pt_flags	_sifields._sigtrap._pt_flags
 
 #ifdef __KERNEL__
 #define __SI_MASK	0xffff0000u
@@ -126,6 +134,7 @@ typedef struct siginfo {
 #define __SI_CHLD	(4 << 16)
 #define __SI_RT		(5 << 16)
 #define __SI_MESGQ	(6 << 16)
+#define __SI_TRAP	(7 << 16)
 #define __SI_CODE(T,N)	((T) | ((N) & 0xffff))
 #else
 #define __SI_KILL	0
@@ -135,6 +144,7 @@ typedef struct siginfo {
 #define __SI_CHLD	0
 #define __SI_RT		0
 #define __SI_MESGQ	0
+#define __SI_TRAP	0
 #define __SI_CODE(T,N)	(N)
 #endif
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 84e75db..c05f4d1 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1848,7 +1848,7 @@ static void ptrace_do_notify(int exit_code, int why)
 
 	memset(&info, 0, sizeof info);
 	info.si_signo = SIGTRAP;
-	info.si_code = exit_code;
+	info.si_code = __SI_TRAP | exit_code;
 	info.si_pid = task_pid_vnr(current);
 	info.si_uid = current_uid();
 
@@ -2501,6 +2501,11 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
 		err |= __put_user(from->si_band, &to->si_band);
 		err |= __put_user(from->si_fd, &to->si_fd);
 		break;
+	case __SI_TRAP:
+		err |= __put_user(from->si_pid, &to->si_pid);
+		err |= __put_user(from->si_uid, &to->si_uid);
+		err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+		break;
 	case __SI_FAULT:
 		err |= __put_user(from->si_addr, &to->si_addr);
 #ifdef __ARCH_SI_TRAPNO
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (5 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 06/10] ptrace: add siginfo.si_pt_flags Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-19 16:27   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 08/10] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

If tracee is SEIZED, PTRACE_EVENT_STOP trap is used for group stop;
however, there currently is no way to find out which signal initiated
the group stop or if the group stop is still in effect.

This patch changes PTRACE_GET_SIGINFO on ptrace traps to report group
stop information via si.si_signo and si.si_pt_flags.  Note that it's
only available if tracee was seized.

This doesn't address notification and tracer has to put tracee in an
appropriate trap and poll the flag.  Later patches will deal with
notification and trap transition.

Test program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts1s = { .tv_sec = 1 };

  int main(int argc, char **argv)
  {
	  pid_t tracee, tracer;
	  int i;

	  tracee = fork();
	  if (!tracee)
		  while (1)
			  nanosleep(&ts1s, NULL);

	  tracer = fork();
	  if (!tracer) {
		  int last_stopped = 0, stopped;
		  siginfo_t si;

		  ptrace(PTRACE_SEIZE, tracee, NULL,
			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  repeat:
		  waitid(P_PID, tracee, NULL, WSTOPPED);

		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
		  if (si.si_code) {
			  stopped = !!si.si_status;
			  if (stopped != last_stopped)
				  printf("tracer: stopped=%d signo=%d\n",
					 stopped, si.si_signo);
			  last_stopped = stopped;
			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
		  } else {
			  printf("tracer: SIG %d\n", si.si_signo);
			  ptrace(PTRACE_CONT, tracee, NULL,
				 (void *)(unsigned long)si.si_signo);
		  }

		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
		  goto repeat;
	  }

	  for (i = 0; i < 3; i++) {
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGSTOP\n");
		  kill(tracee, SIGSTOP);
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGCONT\n");
		  kill(tracee, SIGCONT);
	  }
	  nanosleep(&ts1s, NULL);

	  kill(tracer, SIGKILL);
	  kill(tracee, SIGKILL);
	  return 0;
  }

Tracer delivers signal, resumes group stop, induces INTERRUPT traps
and reports group stop state change in busy loop.  Mother sends
SIGSTOP or CONT to tracee on each second.  Note that si_pt_flags and
flag testing are replaced with si_status testing.  si_status occupies
the same offset as si_pt_flags and PTRACE_SI_STOPPED is the only flag
defined, so I took a dirty short cut.

  # ./test-stopped
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: SIG 18
  tracer: stopped=0 signo=5

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    3 +++
 kernel/ptrace.c        |   19 +++++++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index b07b9e3..72b9150 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -73,6 +73,9 @@
 #define PTRACE_EVENT_EXIT	6
 #define PTRACE_EVENT_STOP	7
 
+/* flags in siginfo.si_pt_flags from PTRACE_GETSIGINFO */
+#define PTRACE_SI_STOPPED	0x00000001 /* tracee is job control stopped */
+
 #include <asm/ptrace.h>
 
 #ifdef __KERNEL__
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 42037a4..30d2331 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -573,11 +573,13 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 {
+	struct signal_struct *sig;
 	unsigned long flags;
 	int error;
 
 	if (!lock_task_sighand(child, &flags))
 		return -ESRCH;
+	sig = child->signal;
 
 	error = -EINVAL;
 	if (!child->last_siginfo)
@@ -585,6 +587,23 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 
 	error = 0;
 	*info = *child->last_siginfo;
+
+	/*
+	 * If reporting ptrace trap for a seized tracee, enable reporting
+	 * of info->si_pt_flags.
+	 */
+	if ((child->ptrace & PT_SEIZED) &&
+	    (info->si_code & __SI_MASK) == __SI_TRAP) {
+		/*
+		 * Report whether group stop is in effect w/ SI_STOPPED and
+		 * if so which signal caused it.
+		 */
+		if (sig->group_stop_count || sig->flags & SIGNAL_STOP_STOPPED) {
+			info->si_pt_flags |= PTRACE_SI_STOPPED;
+			info->si_signo = child->jobctl & JOBCTL_STOP_SIGMASK;
+			WARN_ON_ONCE(!info->si_signo);
+		}
+	}
 out_unlock:
 	unlock_task_sighand(child, &flags);
 	return error;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 08/10] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (6 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-16 18:17 ` [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

__SI_TRAP siginfo is special in the operation of ptrace.  It reports
group stop related information and will also interact with
notification retraps.  Don't let userland mess with it.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/ptrace.c |   31 ++++++++++++++++++++++---------
 1 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 30d2331..c12daec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -612,16 +612,29 @@ out_unlock:
 static int ptrace_setsiginfo(struct task_struct *child, const siginfo_t *info)
 {
 	unsigned long flags;
-	int error = -ESRCH;
+	int error;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			*child->last_siginfo = *info;
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
+	if (!lock_task_sighand(child, &flags))
+		return -ESRCH;
+
+	error = -EINVAL;
+	if (unlikely(!child->last_siginfo))
+		goto out_unlock;
+
+	/*
+	 * If seized, __SI_TRAP siginfo is used to communicate information
+	 * regarding traps and contains dynamic information generated on
+	 * GETSIGINFO.  Don't let userland override or fake it.
+	 */
+	if ((child->ptrace & PT_SEIZED) &&
+	    unlikely((child->last_siginfo->si_code & __SI_MASK) == __SI_TRAP ||
+		     (info->si_code & __SI_MASK) == __SI_TRAP))
+		goto out_unlock;
+
+	*child->last_siginfo = *info;
+	error = 0;
+out_unlock:
+	unlock_task_sighand(child, &flags);
 	return error;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (7 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 08/10] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-19 16:32   ` Oleg Nesterov
  2011-05-16 18:17 ` [PATCH 10/10] ptrace: implement group stop notification for ptracer Tejun Heo
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

For to-be-added notification retraps, other tasks need to be able to
tell whether ptrace request is currently in progress while tracee is
in STOP trap.  This patch adds JOBCTL_BLOCK_NOTIFY which is set on
ptrace_check_attach() if the request requires tracee to be trapped and
it's trapped for STOP, and cleared when ptrace syscall finishes.

This flag isn't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/ptrace.h |    2 +
 include/linux/sched.h  |    1 +
 kernel/ptrace.c        |   51 ++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 72b9150..b0c1347 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -73,6 +73,8 @@
 #define PTRACE_EVENT_EXIT	6
 #define PTRACE_EVENT_STOP	7
 
+#define PTRACE_STOP_SI_CODE	(__SI_TRAP | SIGTRAP | PTRACE_EVENT_STOP << 8)
+
 /* flags in siginfo.si_pt_flags from PTRACE_GETSIGINFO */
 #define PTRACE_SI_STOPPED	0x00000001 /* tracee is job control stopped */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d8a11cb..1f082d9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1786,6 +1786,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
 #define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
+#define JOBCTL_BLOCK_NOTIFY	(1 << 22) /* block NOTIFY re-traps */
 
 #define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c12daec..d382f81 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -189,10 +189,24 @@ int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 		 */
 		spin_lock_irq(&child->sighand->siglock);
 		WARN_ON_ONCE(task_is_stopped(child));
-		if (task_is_traced(child) || ignore_state)
+
+		if (ignore_state) {
+			ret = 0;
+		} else if (task_is_traced(child)) {
+			siginfo_t *si = child->last_siginfo;
+
+			/*
+			 * If STOP trapped, ptrace notification may cause
+			 * re-traps, which we don't want while ptrace
+			 * request is in progress.  Block notification.
+			 */
+			if (si && si->si_code == PTRACE_STOP_SI_CODE)
+				child->jobctl |= JOBCTL_BLOCK_NOTIFY;
 			ret = 0;
-		else if (ptrace_wait_trapping(child))
+		} else if (ptrace_wait_trapping(child)) {
 			return restart_syscall();
+		}
+
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
@@ -889,6 +903,35 @@ static struct task_struct *ptrace_get_task_struct(pid_t pid)
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+/**
+ * ptrace_put_task_struct - ptrace request processing done, put child
+ * @child: child task struct to put
+ *
+ * ptrace request processing for @child is finished.  Clean up and put
+ * @child.  This function clears %JOBCTL_BLOCK_NOTIFY which can be set by
+ * ptrace_check_attach().
+ */
+static void ptrace_put_task_struct(struct task_struct *child)
+{
+	unsigned long flags;
+
+	if (!(child->jobctl & JOBCTL_BLOCK_NOTIFY))
+		goto out_put;
+
+	if (unlikely(!lock_task_sighand(child, &flags)))
+		goto out_put;
+
+	/*
+	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
+	 */
+	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))
+		child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
+
+	unlock_task_sighand(child, &flags);
+out_put:
+	put_task_struct(child);
+}
+
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -927,7 +970,7 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 	ret = arch_ptrace(child, request, addr, data);
 
  out_put_task_struct:
-	put_task_struct(child);
+	ptrace_put_task_struct(child);
  out:
 	return ret;
 }
@@ -1068,7 +1111,7 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
 		ret = compat_arch_ptrace(child, request, addr, data);
 
  out_put_task_struct:
-	put_task_struct(child);
+	ptrace_put_task_struct(child);
  out:
 	return ret;
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (8 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
@ 2011-05-16 18:17 ` Tejun Heo
  2011-05-19 16:32   ` Oleg Nesterov
  2011-05-18 18:50 ` [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Oleg Nesterov
  2011-05-19 15:04 ` Linus Torvalds
  11 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-16 18:17 UTC (permalink / raw)
  To: oleg, jan.kratochvil, vda.linux
  Cc: linux-kernel, torvalds, akpm, indan, bdonlan, Tejun Heo

Currently there's no way for ptracer to find out whether group stop
that tracee was in finished other than polling with PTRACE_GETSIGINFO.
Also, tracer can't detect new group stop started by an untraced thread
if tracee is already trapped.  This patch implements group stop
notification for ptracer using STOP traps.

When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
is set, which triggers STOP trap but is sticky until the next
PTRACE_GETSIGINFO.  As GETSIGINFO exports the current group stop
state, this guarantees that tracer checks the current group stop state
at least once after group stop state change.  Stickiness is necessary
because notification trap may race with PTRACE_CONT for other traps
and get lost.

Note that simply scheduling such trap isn't enough.  If tracee is
running (PTRACE_CONT'd from group stop trap), the usual trapping -
setting NOTIFY followed by the usual signal_wake_up() - is enough;
however, if tracee is trapped, the scheduled trap won't happen until
the trap is continued.

This is solved by re-trapping if tracee is in STOP trap.  Along with
JOBCTL_TRAP_NOTIFY, JOBCTL_TRAPPING is set and tracee is woken up from
TASK_TRACED.  Tracee then (re-)enters INTERRUPT trap generating
notification for tracer.  TRAPPING hides the TRACED -> RUNNING ->
TRACED transition from tracer.

Many ptrace requests expect tracee to remain trapped until they
finish.  Such conditions are marked with JOBCTL_BLOCK_NOTIFY and if
notification happens while BLOCK_NOTIFY is set, JOBCTL_TRAPPING is set
but the actual wake up and re-trapping takes place when the ptrace
request finishes.  This is safe as the only task which can wait for
TRAPPING is the ptracer.

Re-trapping is used only for STOP trap.  If tracer wants to get
notified about group stop, it either leaves tracee in the initial STOP
trap or puts it into STOP trap using PTRACE_INTERRUPT.  If STOP trap
is scheduled while tracee is already in a trap, it's guaranteed that
tracee will enter a trap without returning to userland, so tracer
doesn't lose any control over tracee execution for group stop
notification.

An example program follows.

  #define PTRACE_SEIZE		0x4206
  #define PTRACE_INTERRUPT	0x4207

  #define PTRACE_SEIZE_DEVEL	0x80000000

  static const struct timespec ts1s = { .tv_sec = 1 };

  int main(int argc, char **argv)
  {
	  pid_t tracee, tracer;
	  int i;

	  tracee = fork();
	  if (!tracee)
		  while (1)
			  pause();

	  tracer = fork();
	  if (!tracer) {
		  int last_stopped = 0, stopped;
		  siginfo_t si;

		  ptrace(PTRACE_SEIZE, tracee, NULL,
			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
	  repeat:
		  waitid(P_PID, tracee, NULL, WSTOPPED);

		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
		  if (!si.si_code) {
			  printf("tracer: SIG %d\n", si.si_signo);
			  ptrace(PTRACE_CONT, tracee, NULL,
				 (void *)(unsigned long)si.si_signo);
			  goto repeat;
		  }
		  stopped = !!si.si_status;

		  if (stopped != last_stopped)
			  printf("tracer: stopped=%d signo=%d\n",
				 stopped, si.si_signo);
		  last_stopped = stopped;

		  if (!stopped)
			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
		  goto repeat;
	  }

	  for (i = 0; i < 3; i++) {
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGSTOP\n");
		  kill(tracee, SIGSTOP);
		  nanosleep(&ts1s, NULL);
		  printf("mother: SIGCONT\n");
		  kill(tracee, SIGCONT);
	  }
	  nanosleep(&ts1s, NULL);

	  kill(tracer, SIGKILL);
	  kill(tracee, SIGKILL);
	  return 0;
  }

In the above program, tracer gets notification of group stop state
changes and can track stopped state without polling PTRACE_GETSIGINFO.

  # ./test-gstop-notify
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18
  mother: SIGSTOP
  tracer: SIG 19
  tracer: stopped=1 signo=19
  mother: SIGCONT
  tracer: stopped=0 signo=5
  tracer: SIG 18

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/sched.h |    3 +-
 kernel/ptrace.c       |   11 +++++++-
 kernel/signal.c       |   68 ++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1f082d9..5573930 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1785,10 +1785,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 #define JOBCTL_STOP_PENDING	(1 << 17) /* task should stop for group stop */
 #define JOBCTL_STOP_CONSUME	(1 << 18) /* consume group stop count */
 #define JOBCTL_TRAP_STOP	(1 << 19) /* trap for STOP */
+#define JOBCTL_TRAP_NOTIFY	(1 << 20) /* sticky trap for notifications */
 #define JOBCTL_TRAPPING		(1 << 21) /* switching to TRACED */
 #define JOBCTL_BLOCK_NOTIFY	(1 << 22) /* block NOTIFY re-traps */
 
-#define JOBCTL_TRAP_MASK	JOBCTL_TRAP_STOP
+#define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern void task_clear_jobctl_pending(struct task_struct *task,
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d382f81..2640761 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -130,6 +130,7 @@ void __ptrace_unlink(struct task_struct *child)
 	 * Clear jobctl bits owned by this tracer along with STOP_PENDING,
 	 * which is reinstated below if necessary.
 	 */
+	child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
 	task_clear_jobctl_pending(child, JOBCTL_PENDING_MASK);
 
 	/*
@@ -617,6 +618,9 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
 			info->si_signo = child->jobctl & JOBCTL_STOP_SIGMASK;
 			WARN_ON_ONCE(!info->si_signo);
 		}
+
+		/* tracer got siginfo, clear the sticky trap */
+		child->jobctl &= ~JOBCTL_TRAP_NOTIFY;
 	}
 out_unlock:
 	unlock_task_sighand(child, &flags);
@@ -923,9 +927,14 @@ static void ptrace_put_task_struct(struct task_struct *child)
 
 	/*
 	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
+	 * If TRAPPING is set, it means NOTIFY occurred in-between and
+	 * re-trap was blocked.  Trigger re-trap.
 	 */
-	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))
+	if (likely((child->ptrace & PT_PTRACED) && child->parent == current)) {
 		child->jobctl &= ~JOBCTL_BLOCK_NOTIFY;
+		if (child->jobctl & JOBCTL_TRAPPING)
+			signal_wake_up(child, task_is_traced(child));
+	}
 
 	unlock_task_sighand(child, &flags);
 out_put:
diff --git a/kernel/signal.c b/kernel/signal.c
index c05f4d1..b794363 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -784,6 +784,60 @@ static int check_kill_permission(int sig, struct siginfo *info,
 	return security_task_kill(t, info, sig, 0);
 }
 
+/**
+ * ptrace_trap_notify - schedule trap to notify ptracer
+ * @t: tracee wanting to notify tracer
+ *
+ * This function schedules sticky ptrace trap which is cleared on
+ * PTRACE_GETSIGINFO to notify ptracer of an event.  @t must have been
+ * seized by ptracer.
+ *
+ * If @t is running, STOP trap will be taken.  If already trapped for STOP,
+ * it will re-trap.  If trapped for other traps, STOP trap will be
+ * eventually taken without returning to userland after the existing traps
+ * are finished by PTRACE_CONT.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void ptrace_trap_notify(struct task_struct *t)
+{
+	siginfo_t *si = t->last_siginfo;
+
+	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+	assert_spin_locked(&t->sighand->siglock);
+
+	/*
+	 * @t is being ptraced and new SEIZE behavior is in effect.
+	 * Schedule sticky trap which will clear on the next GETSIGINFO.
+	 */
+	t->jobctl |= JOBCTL_TRAP_NOTIFY;
+
+	/*
+	 * If @t is currently trapped for STOP, it should re-trap with new
+	 * exit_code indicating continuation so that the ptracer can notice
+	 * the event; otherwise, use normal signal delivery wake up.
+	 *
+	 * The re-trapping sets JOBCTL_TRAPPING such that the transition is
+	 * hidden from the ptracer.
+	 *
+	 * This means that if @t is trapped for other reasons than STOP,
+	 * the notification trap won't be delievered until the current one
+	 * is complete.  This is the intended behavior.
+	 *
+	 * Note that if JOBCTL_BLOCK_NOTIFY, TRAPPING is set but actual
+	 * re-trap doesn't happen.  This is used to avoid waking up while
+	 * ptrace request is in progress.  The ptracer will notice TRAPPING
+	 * is set on request completion and trigger re-trap.
+	 */
+	if (task_is_traced(t) && si && si->si_code == PTRACE_STOP_SI_CODE) {
+		t->jobctl |= JOBCTL_TRAPPING;
+		if (!(t->jobctl & JOBCTL_BLOCK_NOTIFY))
+			signal_wake_up(t, true);
+	} else
+		signal_wake_up(t, false);
+}
+
 /*
  * Handle magic process-wide effects of stop/continue signals. Unlike
  * the signal actions, these happen immediately at signal-generation
@@ -822,7 +876,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			wake_up_state(t, __TASK_STOPPED);
+			if (likely(!(t->ptrace & PT_SEIZED)))
+				wake_up_state(t, __TASK_STOPPED);
+			else
+				ptrace_trap_notify(t);
 		} while_each_thread(p, t);
 
 		/*
@@ -1938,7 +1995,10 @@ static bool do_signal_stop(int signr)
 			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
 				t->jobctl |= signr | gstop;
 				sig->group_stop_count++;
-				signal_wake_up(t, 0);
+				if (likely(!(t->ptrace & PT_SEIZED)))
+					signal_wake_up(t, 0);
+				else
+					ptrace_trap_notify(t);
 			}
 		}
 	}
@@ -1976,10 +2036,10 @@ static bool do_signal_stop(int signr)
 		schedule();
 	} else {
 		/*
-		 * While ptraced, group stop is handled by STOP trap.
+		 * While ptraced, group stop is handled by NOTIFY trap.
 		 * Schedule it and let the caller deal with it.
 		 */
-		current->jobctl |= JOBCTL_TRAP_STOP;
+		current->jobctl |= JOBCTL_TRAP_NOTIFY;
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
@ 2011-05-17 16:22   ` Christoph Hellwig
  2011-05-17 16:27     ` Tejun Heo
  2011-05-18 18:45   ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Christoph Hellwig @ 2011-05-17 16:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm,
	indan, bdonlan

On Mon, May 16, 2011 at 08:17:20PM +0200, Tejun Heo wrote:
> Remove the following three noop tracehooks in signals.c.
> 
> * tracehook_force_sigpending()
> * tracehook_get_signal()
> * tracehook_finish_jctl()
> 
> The code area is about to be updated and these hooks don't do anything
> other than obfuscating the logic.

Btw, I think we should kill that stupid <linux/tracehook.h> layer
entirely.  It consists of a few relatively useful function which should
just be renamed to ptrace_* and moved to ptrace.c, and a lot of entirely
pointless wrappers which generally have only a single user, and often
tons of arguments that aren't actually used.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-17 16:22   ` Christoph Hellwig
@ 2011-05-17 16:27     ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-17 16:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: oleg, jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm,
	indan, bdonlan

Hello,

On Tue, May 17, 2011 at 12:22:24PM -0400, Christoph Hellwig wrote:
> On Mon, May 16, 2011 at 08:17:20PM +0200, Tejun Heo wrote:
> > Remove the following three noop tracehooks in signals.c.
> > 
> > * tracehook_force_sigpending()
> > * tracehook_get_signal()
> > * tracehook_finish_jctl()
> > 
> > The code area is about to be updated and these hooks don't do anything
> > other than obfuscating the logic.
> 
> Btw, I think we should kill that stupid <linux/tracehook.h> layer
> entirely.  It consists of a few relatively useful function which should
> just be renamed to ptrace_* and moved to ptrace.c, and a lot of entirely
> pointless wrappers which generally have only a single user, and often
> tons of arguments that aren't actually used.

Yeap, agreed.  It's on the todo list.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-16 18:17 ` [PATCH 03/10] ptrace: implement PTRACE_SEIZE Tejun Heo
@ 2011-05-18  0:40   ` Denys Vlasenko
  2011-05-18  9:55     ` Tejun Heo
  2011-05-18 18:17   ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-18  0:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

On Monday 16 May 2011 20:17, Tejun Heo wrote:
> PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
> effects on tracee signal and job control states.  This patch
> implements a new ptrace request PTRACE_SEIZE which attaches and traps
> tracee without affecting its signal and job control states.
> 
> The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
> flags in @data.  Currently, the only defined flag is
> PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
> PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
> The changes will be implemented gradually and the DEVEL flag is to
> prevent programs which expect full SEIZE behavior from using it before
> all the behavior modifications are complete while allowing unit
> testing.  The flag will be removed once SEIZE behaviors are completely
> implemented.
> 
> After PTRACE_SEIZE, tracee will trap.  Which trap will happen isn't
> fixed.  If other trap conditions exist (e.g. signal delivery), they
> might be taken; otherwise, a trap with exit_code == (SIGTRAP |
> PTRACE_EVENT_STOP << 8) is taken.  If seized, this trap is also used
> for group stop traps instead of exit_code == 0 with NULL GETSIGINFO.
> 
> * PTRACE_SEIZE doesn't affect signal or group stop state.
> 
> * After PTRACE_SEIZE, one trap will happen which might be a
>   PTRACE_EVENT_STOP trap.
> 
> * If PTRACE_SEIZE'd, group stop also uses PTRACE_EVENT_STOP trap which
>   uses exit_code of (SIGTRAP | PTRACE_EVENT_STOP << 8) instead of the
>   stopping signal number and returns usual trap siginfo on
>   PTRACE_GETSIGINFO instead of NULL.
> 
> Note that there currently is no way to find out the stopping signal
> number while seized.  This will be improved by future patches.
> 
> Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
> used to determine whether new SEIZE behaviors should be enabled.

This makes PTRACE_EVENT_STOP similar to other PTRACE_EVENTs.
The only difference is that it can't be set by PTRACE_SETOPTIONS
as other events do, but activated implicitly by PTRACE_SEIZE.

This made me thinking.

How about making API even more similar to existing one?

Create PTRACE_O_TRACESTOP, make it settable by PTRACE_SETOPTIONS too.

Make PTRACE_SEIZE take the mask of PTRACE_O_xyz flags
as data argument.
If PTRACE_O_TRACESTOP is set, it works as you described above.
If PTRACE_O_TRACESTOP is not set, then it works as good old PTRACE_ATTACH.
In both cases, immediately at attach it sets opts a-la PTRACE_SETOPTIONS.

We can even avoid introducing PTRACE_SEIZE at all, because
currently PTRACE_ATTACH ignores its data argument.

I know, I know, "this changes API", but did we ever promise
that PTRACE_ATTACH with nonzero data arg is a valid usage?
Also, I perused first 10 pages of google code search results
and I see that everybody passes 0 or NULL.

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18  0:40   ` Denys Vlasenko
@ 2011-05-18  9:55     ` Tejun Heo
  2011-05-18 10:44       ` Denys Vlasenko
                         ` (2 more replies)
  0 siblings, 3 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-18  9:55 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Denys.

On Wed, May 18, 2011 at 02:40:56AM +0200, Denys Vlasenko wrote:
> This makes PTRACE_EVENT_STOP similar to other PTRACE_EVENTs.
> The only difference is that it can't be set by PTRACE_SETOPTIONS
> as other events do, but activated implicitly by PTRACE_SEIZE.

Also by PTRACE_INTERRUPT and group stop.

> This made me thinking.
> 
> How about making API even more similar to existing one?
> 
> Create PTRACE_O_TRACESTOP, make it settable by PTRACE_SETOPTIONS too.
> 
> Make PTRACE_SEIZE take the mask of PTRACE_O_xyz flags
> as data argument.
> If PTRACE_O_TRACESTOP is set, it works as you described above.
> If PTRACE_O_TRACESTOP is not set, then it works as good old PTRACE_ATTACH.
> In both cases, immediately at attach it sets opts a-la PTRACE_SETOPTIONS.
>
> We can even avoid introducing PTRACE_SEIZE at all, because
> currently PTRACE_ATTACH ignores its data argument.
> 
> I know, I know, "this changes API", but did we ever promise
> that PTRACE_ATTACH with nonzero data arg is a valid usage?
> Also, I perused first 10 pages of google code search results
> and I see that everybody passes 0 or NULL.

But as SEIZE introduces behavior differences throughout ptrace
operation, I think it's actually beneficial to use a distinctively new
request.  It's not like it costs anything or we're short on request
number space.

Similar issue with PTRACE_O_TRACESTOP.  It won't only enable TRACESTOP
it will change other behaviors too and I think allowing clearing the
option while attached would open up a lot of new issues without much
benefit.  Making it a PTRACE_O_ flag and not allowing clearing is
weird too.

I've been thinking about Jan's suggestion to make ATTACH and DETACH
not require tracee to trap.  We already have this for DETACH for cases
where the tracer is killed and it seems it wouldn't be too difficult
to make that happen for ATTACH either and for that to be truly useful
I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.
Jan, would that be enough for the use cases you have on mind?

So, if we do that, there would be a clear separation between SEIZE and
INTERRUPT and there wouldn't be any reason to make SEIZE to set
PTRACE_O_ options or whatever.  It just attaches and the rest can be
done with appropriate requests.  SEIZE might not be the best name with
this functionality but I think ATTACH_NOINTERRUPT would be too
confusing in that it implies that the only difference is not
interrupting.  Can somebody think of a better verb?

Oleg, what do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18  9:55     ` Tejun Heo
@ 2011-05-18 10:44       ` Denys Vlasenko
  2011-05-18 11:14         ` Tejun Heo
  2011-05-19 14:17       ` Tejun Heo
  2011-05-23 12:43       ` Oleg Nesterov
  2 siblings, 1 reply; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-18 10:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

On Wed, May 18, 2011 at 11:55 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Denys.
>
> On Wed, May 18, 2011 at 02:40:56AM +0200, Denys Vlasenko wrote:
>> This makes PTRACE_EVENT_STOP similar to other PTRACE_EVENTs.
>> The only difference is that it can't be set by PTRACE_SETOPTIONS
>> as other events do, but activated implicitly by PTRACE_SEIZE.
>
> Also by PTRACE_INTERRUPT and group stop.

"Activating" meaning "enabling", not "causing". Only PTRACE_SEIZE
enables PTRACE_EVENT_STOP.

>> This made me thinking.
>>
>> How about making API even more similar to existing one?
>>
>> Create PTRACE_O_TRACESTOP, make it settable by PTRACE_SETOPTIONS too.
>>
>> Make PTRACE_SEIZE take the mask of PTRACE_O_xyz flags
>> as data argument.
>> If PTRACE_O_TRACESTOP is set, it works as you described above.
>> If PTRACE_O_TRACESTOP is not set, then it works as good old PTRACE_ATTACH.
>> In both cases, immediately at attach it sets opts a-la PTRACE_SETOPTIONS.
>>
>> We can even avoid introducing PTRACE_SEIZE at all, because
>> currently PTRACE_ATTACH ignores its data argument.
>>
>> I know, I know, "this changes API", but did we ever promise
>> that PTRACE_ATTACH with nonzero data arg is a valid usage?
>> Also, I perused first 10 pages of google code search results
>> and I see that everybody passes 0 or NULL.
>
> But as SEIZE introduces behavior differences throughout ptrace
> operation,
>
> Similar issue with PTRACE_O_TRACESTOP.  It won't only enable TRACESTOP
> it will change other behaviors too

All these differences revolve around making handling of stops
and SIGCONT better. It seems fitting to the option name.

PTRACE_SEIZE sets a flag somewhere "please convert
group-stops into PTRACE_EVENT_STOPs".

But, we *already have* ptrace op which performs this action
of modifying (or adding) stops - it's PTRACE_SETOPTIONS.
With PTRACE_O_TRACESYSGOOD it modifies syscall-stops.
With PTRACE_O_TRACEEXEC - post-execve stop.
With PTRACE_O_TRACEFORK it adds a stop after fork.

Since we have this API, why not use it for the very similar
concept of modifying group-stops too?

> I think it's actually beneficial to use a distinctively new
> request.  It's not like it costs anything or we're short on request
> number space.

"Entities must not be multiplied beyond necessity"?

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18 10:44       ` Denys Vlasenko
@ 2011-05-18 11:14         ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-18 11:14 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Denys.

On Wed, May 18, 2011 at 12:44:58PM +0200, Denys Vlasenko wrote:
> > But as SEIZE introduces behavior differences throughout ptrace
> > operation,
> >
> > Similar issue with PTRACE_O_TRACESTOP.  It won't only enable TRACESTOP
> > it will change other behaviors too
> 
> All these differences revolve around making handling of stops
> and SIGCONT better. It seems fitting to the option name.
>
> PTRACE_SEIZE sets a flag somewhere "please convert
> group-stops into PTRACE_EVENT_STOPs".

I don't know.  It seems a bit too subtle.  The new behaviors are a lot
more pervasive than any controlled by PTRACE_O_ flags and there's the
problem of allowing its clearing.

> Since we have this API, why not use it for the very similar
> concept of modifying group-stops too?

Different scope.  Conflicting semantics (clearing).  API compatibility
as you pointed out.

> > I think it's actually beneficial to use a distinctively new
> > request.  It's not like it costs anything or we're short on request
> > number space.
> 
> "Entities must not be multiplied beyond necessity"?

Sure, but whether to have a flag or a request is trivial.  When one of
them has issues like above, I see a new request number go above the
necessity threshold.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-16 18:17 ` [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
@ 2011-05-18 16:48   ` Oleg Nesterov
  2011-05-18 16:57     ` Oleg Nesterov
  2011-05-19 10:19     ` Tejun Heo
  0 siblings, 2 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 16:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> This patch adds a new jobctl flag - JOBCTL_TRAP_STOP.  When set, it
> triggers a trap site, which behaves like group stop trap, in
> get_signal_to_deliver() before checking for pending signals.  While
> ptraced, do_signal_stop() doesn't stop itself.  It initiates group
> stop if requested and schedules JOBCTL_TRAP_STOP and returns, which
> makes its caller - get_signal_to_deliver() - to relock, check and
> enter the trap.

Heh. Starting from this patch, I think I will never understand this
code in details ;)

> Although this adds an unlock-relocking between checking of
> JOBCTL_STOP_PENDING and actually trapping for STOP, this doesn't
> affect correctness.

Well, I think it does affect. Although the problem is minor.

> ptrace_stop() already had conditional
> unlock-relocking

Yes, but ptrace_stop() can't send the CLD_STOPPED notfication before
the tracee is ready for do_wait(WNOHANG). Contrary, get_signal_to_deliver()
can if we race with SIGCONT.

> While at it, add proper function comment to do_signal_stop() and make
> it return bool.
> ...
>
> + * RETURNS:
> + * %false if group stop is already cancelled and nothing happened.  %true
> + * if participated in group stop.

Well, the traced task didn't participate yet... Nevermind.



So far I can't really understand why do we have both JOBCTL_TRAP_STOP and
and JOBCTL_TRAPPING... Please ignore, I didn't read other patches yet.


Hmm. And afaics there is a bug in do_signal_stop(), after
"[PATCH 6/9] job control: make task_clear_jobctl_pending() clear TRAPPING automatically"
->jobctl &= ~JOBCTL_STOP_SIGMASK is no longer safe. We can clear _TRAPPING
without wakeup.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-18 16:48   ` Oleg Nesterov
@ 2011-05-18 16:57     ` Oleg Nesterov
  2011-05-19 10:19     ` Tejun Heo
  1 sibling, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 16:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/18, Oleg Nesterov wrote:
>
> So far I can't really understand why do we have both JOBCTL_TRAP_STOP and
> and JOBCTL_TRAPPING... Please ignore, I didn't read other patches yet.

OK, SEIZE sets JOBCTL_TRAP_STOP.

> Hmm. And afaics there is a bug in do_signal_stop(), after
> "[PATCH 6/9] job control: make task_clear_jobctl_pending() clear TRAPPING automatically"
> ->jobctl &= ~JOBCTL_STOP_SIGMASK is no longer safe. We can clear _TRAPPING
> without wakeup.

And this also affects ptrace_attach(PTRACE_SEIZE), TRAP_STOP is cleared
as well.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-16 18:17 ` [PATCH 03/10] ptrace: implement PTRACE_SEIZE Tejun Heo
  2011-05-18  0:40   ` Denys Vlasenko
@ 2011-05-18 18:17   ` Oleg Nesterov
  2011-05-19 10:34     ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 18:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> @@ -315,6 +338,9 @@ static int ptrace_attach(struct task_struct *task)
>  	if (task_is_stopped(task)) {
>  		task->jobctl |= JOBCTL_TRAP_STOP | JOBCTL_TRAPPING;
>  		signal_wake_up(task, 1);
> +	} else if (seize) {
> +		task->jobctl |= JOBCTL_TRAP_STOP;

So, this can race with do_signal_stop(), it can clear TRAP_STOP and
JOBCTL_STOP_PENDING can guarantee the tracee will trap later.

> +	 * When PT_SEIZED, it's used for both group stop and explicit
> +	 * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap
> +	 * with accompanying siginfo.
> +	 *
> +	 * When !PT_SEIZED, it's used only for group stop trap with
> +	 * CLD_STOPPED as exit_code and no siginfo.
>  	 */
>  	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
> -		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
> -		WARN_ON_ONCE(!signr);
> -		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
> -		current->exit_code = 0;
> +		if (current->ptrace & PT_SEIZED) {
> +			ptrace_do_notify(SIGTRAP | PTRACE_EVENT_STOP << 8,
> +					 CLD_STOPPED);

So. When PT_SEIZED, we always report PTRACE_EVENT_STOP and PTRACE_GET_SIGINFO
always works.

Personally I agree, this looks more clean and natural.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT
  2011-05-16 18:17 ` [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT Tejun Heo
@ 2011-05-18 18:38   ` Oleg Nesterov
  2011-05-19 12:07     ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 18:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> +	case PTRACE_INTERRUPT:
> +		/*
> +		 * Stop tracee without any side-effect on signal or job
> +		 * control.  At least one trap is guaranteed to happen
> +		 * after this request.  If @child is already trapped, the
> +		 * current trap is not disturbed and another trap will
> +		 * happen after the current trap is ended with PTRACE_CONT.
> +		 *
> +		 * The actual trap might not be PTRACE_EVENT_STOP trap but
> +		 * the pending condition is cleared regardless.
> +		 */
> +		if (likely(child->ptrace & PT_SEIZED) &&
> +		    lock_task_sighand(child, &flags)) {
> +			child->jobctl |= JOBCTL_TRAP_STOP;

The same race with do_signal_stop() afaics.

Otherwise looks fine to me. Compared to V1, personally I like the new
behaviour more. PTRACE_INTERRUPT and PTRACE_SEIZE do the same.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
  2011-05-17 16:22   ` Christoph Hellwig
@ 2011-05-18 18:45   ` Oleg Nesterov
  2011-05-19 12:11     ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 18:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> these hooks don't do anything
> other than obfuscating the logic.

Heh. What can I say? Yes, this is true. Of course this breaks the out-of-tree
project, completely, and you even know its name ;) Although tracehook_get_signal()
logic was already broken by other changes...

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (9 preceding siblings ...)
  2011-05-16 18:17 ` [PATCH 10/10] ptrace: implement group stop notification for ptracer Tejun Heo
@ 2011-05-18 18:50 ` Oleg Nesterov
  2011-05-19 12:08   ` Tejun Heo
  2011-05-19 15:04 ` Linus Torvalds
  11 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-18 18:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> This patchset is on top of Oleg's ptrace branch[3] + prep patchset[2]
> and available in the following git branch.

IIUC 8/9 from prep patchset[2] needs the fix, do_wait()->restart_syscall()
is not right. Do you agree or I misunderstood the result of our discussion?

I'll continue the reading tomorrow.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-18 16:48   ` Oleg Nesterov
  2011-05-18 16:57     ` Oleg Nesterov
@ 2011-05-19 10:19     ` Tejun Heo
  2011-05-19 16:19       ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 10:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Oleg.

On Wed, May 18, 2011 at 06:48:14PM +0200, Oleg Nesterov wrote:
> > Although this adds an unlock-relocking between checking of
> > JOBCTL_STOP_PENDING and actually trapping for STOP, this doesn't
> > affect correctness.
> 
> Well, I think it does affect. Although the problem is minor.
> 
> > ptrace_stop() already had conditional unlock-relocking
> 
> Yes, but ptrace_stop() can't send the CLD_STOPPED notfication before
> the tracee is ready for do_wait(WNOHANG). Contrary, get_signal_to_deliver()
> can if we race with SIGCONT.

That CLD_STOPPED notification is actually for continuation (although
it's indistinguishible from actual stopped notification) and as such
the ptracer has to query the tracee state after the notification
signal and can't expect it to be in TRACED.  Please consider the
following scenario.

1. thread t1 and t2.  t1 is ptraced.  Both running.

2. SIGSTOP delivered by t2.  Group stop starts.

3. SIGCONT is generated before t1 participates.

4. t1 will enter get_signal_to_deliver() and deliver CLD_STOPPED but
   won't trap.

So, I don't think this affects correctness.  The above can happen if
both t1 and t2 are attached by the same ptracer.  What changes is that
it now may happen with single thread too.  We can put STOP trap before
CONTINUED notification but I don't think that's necessary.

> > While at it, add proper function comment to do_signal_stop() and make
> > it return bool.
> > ...
> >
> > + * RETURNS:
> > + * %false if group stop is already cancelled and nothing happened.  %true
> > + * if participated in group stop.
> 
> Well, the traced task didn't participate yet... Nevermind.

Eh, should have read this earlier.  :-)

> So far I can't really understand why do we have both JOBCTL_TRAP_STOP and
> and JOBCTL_TRAPPING... Please ignore, I didn't read other patches yet.

JOBCTL_TRAP_STOP is the renamed JOBCTL_TRAP_INTERRUPT.  It's a trap
condition while TRAPPING is synchronization flag to protect -> TRACED
transitions.  JOBCTL_TRAP_STOP might not be the best name, but it
isn't INTERRUPT trap anymore JOBCTL_TRAP_GROUP_STOP or
JOBCTL_TRAP_GSTOP seemed a bit inconsistent with other flags.

> Hmm. And afaics there is a bug in do_signal_stop(), after
> "[PATCH 6/9] job control: make task_clear_jobctl_pending() clear TRAPPING automatically"
> ->jobctl &= ~JOBCTL_STOP_SIGMASK is no longer safe. We can clear _TRAPPING
> without wakeup.

Hmmm?  ->jobctl &= ~JOBCTL_STOP_SIGMASK clears lower sixteen bits.
All the flags including TRAPPING live in the upper sixteen bits.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18 18:17   ` Oleg Nesterov
@ 2011-05-19 10:34     ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 10:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

On Wed, May 18, 2011 at 08:17:56PM +0200, Oleg Nesterov wrote:
> On 05/16, Tejun Heo wrote:
> >
> > @@ -315,6 +338,9 @@ static int ptrace_attach(struct task_struct *task)
> >  	if (task_is_stopped(task)) {
> >  		task->jobctl |= JOBCTL_TRAP_STOP | JOBCTL_TRAPPING;
> >  		signal_wake_up(task, 1);
> > +	} else if (seize) {
> > +		task->jobctl |= JOBCTL_TRAP_STOP;
> 
> So, this can race with do_signal_stop(), it can clear TRAP_STOP and
> JOBCTL_STOP_PENDING can guarantee the tracee will trap later.

Confused.  How can do_signal_stop() clear TRAP_STOP?

> > +	 * When PT_SEIZED, it's used for both group stop and explicit
> > +	 * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap
> > +	 * with accompanying siginfo.
> > +	 *
> > +	 * When !PT_SEIZED, it's used only for group stop trap with
> > +	 * CLD_STOPPED as exit_code and no siginfo.
> >  	 */
> >  	if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
> > -		signr = current->jobctl & JOBCTL_STOP_SIGMASK;
> > -		WARN_ON_ONCE(!signr);
> > -		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
> > -		current->exit_code = 0;
> > +		if (current->ptrace & PT_SEIZED) {
> > +			ptrace_do_notify(SIGTRAP | PTRACE_EVENT_STOP << 8,
> > +					 CLD_STOPPED);
> 
> So. When PT_SEIZED, we always report PTRACE_EVENT_STOP and PTRACE_GET_SIGINFO
> always works.
> 
> Personally I agree, this looks more clean and natural.

Yeah, I like it much better.  INTERRUPT trap and group stop trap being
separate while sharing some attributes was disturbing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT
  2011-05-18 18:38   ` Oleg Nesterov
@ 2011-05-19 12:07     ` Tejun Heo
  2011-05-19 16:21       ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 12:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

On Wed, May 18, 2011 at 08:38:15PM +0200, Oleg Nesterov wrote:
> On 05/16, Tejun Heo wrote:
> >
> > +	case PTRACE_INTERRUPT:
> > +		/*
> > +		 * Stop tracee without any side-effect on signal or job
> > +		 * control.  At least one trap is guaranteed to happen
> > +		 * after this request.  If @child is already trapped, the
> > +		 * current trap is not disturbed and another trap will
> > +		 * happen after the current trap is ended with PTRACE_CONT.
> > +		 *
> > +		 * The actual trap might not be PTRACE_EVENT_STOP trap but
> > +		 * the pending condition is cleared regardless.
> > +		 */
> > +		if (likely(child->ptrace & PT_SEIZED) &&
> > +		    lock_task_sighand(child, &flags)) {
> > +			child->jobctl |= JOBCTL_TRAP_STOP;
> 
> The same race with do_signal_stop() afaics.

I didn't understand that one, so you'll need to elaborate.

> Otherwise looks fine to me. Compared to V1, personally I like the new
> behaviour more. PTRACE_INTERRUPT and PTRACE_SEIZE do the same.

Great. :)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
  2011-05-18 18:50 ` [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Oleg Nesterov
@ 2011-05-19 12:08   ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 12:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On Wed, May 18, 2011 at 08:50:50PM +0200, Oleg Nesterov wrote:
> On 05/16, Tejun Heo wrote:
> >
> > This patchset is on top of Oleg's ptrace branch[3] + prep patchset[2]
> > and available in the following git branch.
> 
> IIUC 8/9 from prep patchset[2] needs the fix, do_wait()->restart_syscall()
> is not right. Do you agree or I misunderstood the result of our discussion?

Yeah, I agree.  I was just waiting for review of this series before
proceeding with the next round.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-18 18:45   ` Oleg Nesterov
@ 2011-05-19 12:11     ` Tejun Heo
  2011-05-19 16:10       ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 12:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

On Wed, May 18, 2011 at 08:45:59PM +0200, Oleg Nesterov wrote:
> On 05/16, Tejun Heo wrote:
> Heh. What can I say? Yes, this is true. Of course this breaks the out-of-tree
> project, completely, and you even know its name ;) Although tracehook_get_signal()
> logic was already broken by other changes...

Yeap, but with ptrace updates, tracehook behaviors, which can't even
be fully determined without consulting out-of-tree code, are bound to
change.  Now that ptrace is receiving more capabilities, I don't see
much point in maintaining them.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18  9:55     ` Tejun Heo
  2011-05-18 10:44       ` Denys Vlasenko
@ 2011-05-19 14:17       ` Tejun Heo
  2011-05-19 15:02         ` Tejun Heo
                           ` (2 more replies)
  2011-05-23 12:43       ` Oleg Nesterov
  2 siblings, 3 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 14:17 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

On Wed, May 18, 2011 at 11:55:39AM +0200, Tejun Heo wrote:
> I've been thinking about Jan's suggestion to make ATTACH and DETACH
> not require tracee to trap.  We already have this for DETACH for cases
> where the tracer is killed and it seems it wouldn't be too difficult
> to make that happen for ATTACH either and for that to be truly useful
> I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.
> Jan, would that be enough for the use cases you have on mind?

I've been trying this and clean DETACH requires the tracee to be
trapped (or not running).  The arch detach hook, which BTW is not
executed when the tracer is killed, modifies tracee state expecting it
to be off-cpu.

But making SEIZE not trigger INTERRUPT and SETOPTIONS without
requiring TRACED don't seem too difficult.  Jan, would that be enough?
Oleg, what do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 14:17       ` Tejun Heo
@ 2011-05-19 15:02         ` Tejun Heo
  2011-05-19 19:31         ` Pedro Alves
  2011-05-23 13:09         ` Oleg Nesterov
  2 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 15:02 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: oleg, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

Hey, again.

On Thu, May 19, 2011 at 04:17:28PM +0200, Tejun Heo wrote:
> On Wed, May 18, 2011 at 11:55:39AM +0200, Tejun Heo wrote:
> > I've been thinking about Jan's suggestion to make ATTACH and DETACH
> > not require tracee to trap.  We already have this for DETACH for cases
> > where the tracer is killed and it seems it wouldn't be too difficult
> > to make that happen for ATTACH either and for that to be truly useful
> > I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.
> > Jan, would that be enough for the use cases you have on mind?
> 
> I've been trying this and clean DETACH requires the tracee to be
> trapped (or not running).  The arch detach hook, which BTW is not
> executed when the tracer is killed, modifies tracee state expecting it
> to be off-cpu.
> 
> But making SEIZE not trigger INTERRUPT and SETOPTIONS without
> requiring TRACED don't seem too difficult.  Jan, would that be enough?
> Oleg, what do you think?

Even the implementation is rather simple.  If SEIZE and INTERRUPT are
okay as implemented, the following should be fine too.

Thanks.

---
 kernel/ptrace.c |   22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

Index: work/kernel/ptrace.c
===================================================================
--- work.orig/kernel/ptrace.c
+++ work/kernel/ptrace.c
@@ -329,7 +329,7 @@ static int ptrace_attach(struct task_str
 
 	__ptrace_link(task, current);
 
-	/* SEIZE uses TRAP_STOP instead of SIGSTOP for initial trap */
+	/* SEIZE doesn't trap tracee on attach */
 	if (!seize)
 		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
@@ -353,9 +353,6 @@ static int ptrace_attach(struct task_str
 	if (task_is_stopped(task)) {
 		task->jobctl |= JOBCTL_TRAP_STOP | JOBCTL_TRAPPING;
 		signal_wake_up(task, 1);
-	} else if (seize) {
-		task->jobctl |= JOBCTL_TRAP_STOP;
-		signal_wake_up(task, 0);
 	}
 
 	spin_unlock(&task->sighand->siglock);
@@ -907,6 +904,17 @@ static struct task_struct *ptrace_get_ta
 #define arch_ptrace_attach(child)	do { } while (0)
 #endif
 
+static bool ptrace_is_async_req(struct task_struct *child, int req)
+{
+	if (req == PTRACE_KILL)
+		return true;
+
+	if (!(child->ptrace & PT_SEIZED))
+		return false;
+
+	return req == PTRACE_SETOPTIONS || req == PTRACE_INTERRUPT;
+}
+
 /**
  * ptrace_put_task_struct - ptrace request processing done, put child
  * @child: child task struct to put
@@ -971,8 +979,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	ret = ptrace_check_attach(child, ptrace_is_async_req(child, request));
 	if (ret < 0)
 		goto out_put_task_struct;
 
@@ -1114,8 +1121,7 @@ asmlinkage long compat_sys_ptrace(compat
 		goto out_put_task_struct;
 	}
 
-	ret = ptrace_check_attach(child, request == PTRACE_KILL ||
-				  request == PTRACE_INTERRUPT);
+	ret = ptrace_check_attach(child, ptrace_is_async_req(child, request));
 	if (!ret)
 		ret = compat_arch_ptrace(child, request, addr, data);
 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
  2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
                   ` (10 preceding siblings ...)
  2011-05-18 18:50 ` [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Oleg Nesterov
@ 2011-05-19 15:04 ` Linus Torvalds
  2011-05-19 15:19   ` Tejun Heo
  2011-05-19 22:45   ` Denys Vlasenko
  11 siblings, 2 replies; 88+ messages in thread
From: Linus Torvalds @ 2011-05-19 15:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: oleg, jan.kratochvil, vda.linux, linux-kernel, akpm, indan, bdonlan

On Mon, May 16, 2011 at 11:17 AM, Tejun Heo <tj@kernel.org> wrote:
>
> This is the second try at implementing PTRACE_SEIZE/INTERRUPT and
> group stop notification.  Notable changes from the first take[1] are,
>
> * Prep patches moved to a separate patchset[2].

So having followed the discussion so far, quite frankly I'm not
convinced this series is 2.6.40 material.

I think that conceptually the split-up of PTRACE_ATTACH into
SEIZE/INTERRUPT might be fine, but I don't think the interface is
necessarily cooked, and perhaps more importantly I'm not at all sure
that the (few) current users of ptrace() would even switch over.

So I think Oleg's branch with cleanups is probably ready, and maybe a
few of the preparatory patches from this branch can be merged, but I
would _strongly_ suggest that the plan for 2.6.40 should be to not
actually mess with interfaces to the kernel, but just cleaning up the
actual internal implementation. I would like to keep 2.6.40 small and
simple.

Comments?

                       Linus

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
  2011-05-19 15:04 ` Linus Torvalds
@ 2011-05-19 15:19   ` Tejun Heo
  2011-05-19 22:45   ` Denys Vlasenko
  1 sibling, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 15:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: oleg, jan.kratochvil, vda.linux, linux-kernel, akpm, indan, bdonlan

Hello, Linus.

On Thu, May 19, 2011 at 08:04:28AM -0700, Linus Torvalds wrote:
> On Mon, May 16, 2011 at 11:17 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > This is the second try at implementing PTRACE_SEIZE/INTERRUPT and
> > group stop notification.  Notable changes from the first take[1] are,
> >
> > * Prep patches moved to a separate patchset[2].
> 
> So having followed the discussion so far, quite frankly I'm not
> convinced this series is 2.6.40 material.

Definitely not.  The interface is not even fixed/complete yet.  I
think we should document the whole ptrace, job control and signal
interaction properly before committing to the interface.

> I think that conceptually the split-up of PTRACE_ATTACH into
> SEIZE/INTERRUPT might be fine, but I don't think the interface is
> necessarily cooked, and perhaps more importantly I'm not at all sure
> that the (few) current users of ptrace() would even switch over.

I think it shouldn't be too difficult for strace.  gdb would be a
challenge tho.  At any rate, given that the existing mechanism is
simply broken, I'm hoping they would switch over soonish.

> So I think Oleg's branch with cleanups is probably ready, and maybe a
> few of the preparatory patches from this branch can be merged, but I
> would _strongly_ suggest that the plan for 2.6.40 should be to not
> actually mess with interfaces to the kernel, but just cleaning up the
> actual internal implementation. I would like to keep 2.6.40 small and
> simple.

Upto the patchset which cleaned up job control notifications, Oleg's
recent sigprocmask plus some odd fix patches should be basically it
for 2.6.40.  The current ptrace branch in Oleg's tree seems to have
just that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 01/10] signal: remove three noop tracehooks
  2011-05-19 12:11     ` Tejun Heo
@ 2011-05-19 16:10       ` Oleg Nesterov
  0 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hi,

On 05/19, Tejun Heo wrote:
>
> On Wed, May 18, 2011 at 08:45:59PM +0200, Oleg Nesterov wrote:
> > On 05/16, Tejun Heo wrote:
> > Heh. What can I say? Yes, this is true. Of course this breaks the out-of-tree
> > project, completely, and you even know its name ;) Although tracehook_get_signal()
> > logic was already broken by other changes...
>
> Yeap, but with ptrace updates, tracehook behaviors, which can't even
> be fully determined without consulting out-of-tree code, are bound to
> change.  Now that ptrace is receiving more capabilities, I don't see
> much point in maintaining them.

I didn't try to argue against this change. Although it will certainly
complicate my life ;)

Btw, this patch conflicts with 25985edcedea6396277003854657b5f3cb31a628
"Fix common misspellings", trivial.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap
  2011-05-19 10:19     ` Tejun Heo
@ 2011-05-19 16:19       ` Oleg Nesterov
  0 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Tejun Heo wrote:
>
> On Wed, May 18, 2011 at 06:48:14PM +0200, Oleg Nesterov wrote:
> > > Although this adds an unlock-relocking between checking of
> > > JOBCTL_STOP_PENDING and actually trapping for STOP, this doesn't
> > > affect correctness.
> >
> > Well, I think it does affect. Although the problem is minor.
> >
> > > ptrace_stop() already had conditional unlock-relocking
> >
> > Yes, but ptrace_stop() can't send the CLD_STOPPED notfication before
> > the tracee is ready for do_wait(WNOHANG). Contrary, get_signal_to_deliver()
> > can if we race with SIGCONT.
>
> That CLD_STOPPED notification is actually for continuation (although
> it's indistinguishible from actual stopped notification) and as such
> the ptracer has to query the tracee state after the notification
> signal and can't expect it to be in TRACED.  Please consider the
> following scenario.

OK, agreed.

> > Hmm. And afaics there is a bug in do_signal_stop(), after
> > "[PATCH 6/9] job control: make task_clear_jobctl_pending() clear TRAPPING automatically"
> > ->jobctl &= ~JOBCTL_STOP_SIGMASK is no longer safe. We can clear _TRAPPING
> > without wakeup.
>
> Hmmm?  ->jobctl &= ~JOBCTL_STOP_SIGMASK clears lower sixteen bits.
> All the flags including TRAPPING live in the upper sixteen bits.

Indeed, I was confused. Thanks for correcting me.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT
  2011-05-19 12:07     ` Tejun Heo
@ 2011-05-19 16:21       ` Oleg Nesterov
  0 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Tejun Heo wrote:
>
> Hello,
>
> On Wed, May 18, 2011 at 08:38:15PM +0200, Oleg Nesterov wrote:
> > On 05/16, Tejun Heo wrote:
> > >
> > > +	case PTRACE_INTERRUPT:
> > > +		/*
> > > +		 * Stop tracee without any side-effect on signal or job
> > > +		 * control.  At least one trap is guaranteed to happen
> > > +		 * after this request.  If @child is already trapped, the
> > > +		 * current trap is not disturbed and another trap will
> > > +		 * happen after the current trap is ended with PTRACE_CONT.
> > > +		 *
> > > +		 * The actual trap might not be PTRACE_EVENT_STOP trap but
> > > +		 * the pending condition is cleared regardless.
> > > +		 */
> > > +		if (likely(child->ptrace & PT_SEIZED) &&
> > > +		    lock_task_sighand(child, &flags)) {
> > > +			child->jobctl |= JOBCTL_TRAP_STOP;
> >
> > The same race with do_signal_stop() afaics.
>
> I didn't understand that one,

Yeah, I was wrong, see the previous email.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO
  2011-05-16 18:17 ` [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
@ 2011-05-19 16:27   ` Oleg Nesterov
  2011-05-19 16:40     ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
>  static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
>  {
> +	struct signal_struct *sig;
>  	unsigned long flags;
>  	int error;
>
>  	if (!lock_task_sighand(child, &flags))
>  		return -ESRCH;
> +	sig = child->signal;
>
>  	error = -EINVAL;
>  	if (!child->last_siginfo)
> @@ -585,6 +587,23 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
>
>  	error = 0;
>  	*info = *child->last_siginfo;
> +
> +	/*
> +	 * If reporting ptrace trap for a seized tracee, enable reporting
> +	 * of info->si_pt_flags.
> +	 */
> +	if ((child->ptrace & PT_SEIZED) &&
> +	    (info->si_code & __SI_MASK) == __SI_TRAP) {
> +		/*
> +		 * Report whether group stop is in effect w/ SI_STOPPED and
> +		 * if so which signal caused it.
> +		 */
> +		if (sig->group_stop_count || sig->flags & SIGNAL_STOP_STOPPED) {

Cosmetic nit, you could declare "struct signal_struct *sig" here, under
if (PT_SEIZED && __SI_TRAP).


I must admit, personally I still dislike this new PTRACE_GETSIGINFO API...
But I can't suggest anything better, so I am not going to argue.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-16 18:17 ` [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
@ 2011-05-19 16:32   ` Oleg Nesterov
  2011-05-19 16:44     ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> +static void ptrace_put_task_struct(struct task_struct *child)
> +{
> +	unsigned long flags;
> +
> +	if (!(child->jobctl & JOBCTL_BLOCK_NOTIFY))
> +		goto out_put;
> +
> +	if (unlikely(!lock_task_sighand(child, &flags)))
> +		goto out_put;
> +
> +	/*
> +	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
> +	 */
> +	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))

This looks a bit confusing... It is either traced or not. If it is traced,
nobody else can trace it. In fact even PT_PTRACED is not strictly needed
but I agree this check makes the code cleaner.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-16 18:17 ` [PATCH 10/10] ptrace: implement group stop notification for ptracer Tejun Heo
@ 2011-05-19 16:32   ` Oleg Nesterov
  2011-05-19 16:57     ` Tejun Heo
  2011-05-19 16:58     ` Oleg Nesterov
  0 siblings, 2 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/16, Tejun Heo wrote:
>
> When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
> is set, which triggers STOP trap but is sticky until the next
> PTRACE_GETSIGINFO.

I simply can't understand this patch. And the supposed API as it seen
by the user-space. I'll try to read it again and think more.

A couple of questions,

> +static void ptrace_trap_notify(struct task_struct *t)
> +{
> +	siginfo_t *si = t->last_siginfo;
> +
> +	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
> +	assert_spin_locked(&t->sighand->siglock);
> +
> +	/*
> +	 * @t is being ptraced and new SEIZE behavior is in effect.
> +	 * Schedule sticky trap which will clear on the next GETSIGINFO.
> +	 */
> +	t->jobctl |= JOBCTL_TRAP_NOTIFY;

This is also set by do_signal_stop(). Cleared by PTRACE_GETSIGINFO.

How can this work? Doesn't this mean PTRACE_GETSIGINFO becomes mandatory
before PTRACE_CONT? IOW, unless the tracee does PTRACE_GETSIGINFO to clear
this bit, PTRACE_CONT just leads to another trap, no?

> +	if (task_is_traced(t) && si && si->si_code == PTRACE_STOP_SI_CODE) {

OK, this PTRACE_STOP_SI_CODE check is clear. But the same check in
ptrace_check_attach() looks confusing, why can't we set BLOCK_NOTIFY
unconditionally?

> +		t->jobctl |= JOBCTL_TRAPPING;
> +		if (!(t->jobctl & JOBCTL_BLOCK_NOTIFY))
> +			signal_wake_up(t, true);

Could you please remind me why we can't avoid the awful ptrace_wait_trapping()
in do_wait() paths? Assuming that ptrace_check_attach() does this. I got lost
a bit.

So. The tracee reports PTRACE_EVENT_STOP, debugger issues a lot of PTRACE_
requests. The tracee can report another trap "in between". Looks confusing...
Perhaps I need to get used to it.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO
  2011-05-19 16:27   ` Oleg Nesterov
@ 2011-05-19 16:40     ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 16:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On Thu, May 19, 2011 at 06:27:41PM +0200, Oleg Nesterov wrote:
> On 05/16, Tejun Heo wrote:
> > @@ -585,6 +587,23 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
> >
> >  	error = 0;
> >  	*info = *child->last_siginfo;
> > +
> > +	/*
> > +	 * If reporting ptrace trap for a seized tracee, enable reporting
> > +	 * of info->si_pt_flags.
> > +	 */
> > +	if ((child->ptrace & PT_SEIZED) &&
> > +	    (info->si_code & __SI_MASK) == __SI_TRAP) {
> > +		/*
> > +		 * Report whether group stop is in effect w/ SI_STOPPED and
> > +		 * if so which signal caused it.
> > +		 */
> > +		if (sig->group_stop_count || sig->flags & SIGNAL_STOP_STOPPED) {
> 
> Cosmetic nit, you could declare "struct signal_struct *sig" here, under
> if (PT_SEIZED && __SI_TRAP).

Yeap, sure.  Will update.

> I must admit, personally I still dislike this new PTRACE_GETSIGINFO API...
> But I can't suggest anything better, so I am not going to argue.

It's not the prettiest, I agree, but it is more or less in line with
the rest of the interface, so...

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-19 16:32   ` Oleg Nesterov
@ 2011-05-19 16:44     ` Tejun Heo
  2011-05-19 16:48       ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 16:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello,

On Thu, May 19, 2011 at 06:32:33PM +0200, Oleg Nesterov wrote:
> > +	/*
> > +	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
> > +	 */
> > +	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))
> 
> This looks a bit confusing... It is either traced or not. If it is traced,
> nobody else can trace it. In fact even PT_PTRACED is not strictly needed
> but I agree this check makes the code cleaner.

Hmmm... I suppose we can make the error paths finer grained so that
attach failure doesn't end up calling ptrace_put_task_struct() but I
think it would be better to keep the two tests there so that the
function can be called regardless of what happened.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-19 16:44     ` Tejun Heo
@ 2011-05-19 16:48       ` Oleg Nesterov
  2011-05-19 16:58         ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Tejun Heo wrote:
>
> Hello,
>
> On Thu, May 19, 2011 at 06:32:33PM +0200, Oleg Nesterov wrote:
> > > +	/*
> > > +	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
> > > +	 */
> > > +	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))
> >
> > This looks a bit confusing... It is either traced or not. If it is traced,
> > nobody else can trace it. In fact even PT_PTRACED is not strictly needed
> > but I agree this check makes the code cleaner.
>
> Hmmm... I suppose we can make the error paths finer grained so that
> attach failure

Ah, you are right again, I forgot about PTRACE_ATTACH path.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 16:32   ` Oleg Nesterov
@ 2011-05-19 16:57     ` Tejun Heo
  2011-05-19 17:13       ` Oleg Nesterov
  2011-05-19 16:58     ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 16:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hey,

On Thu, May 19, 2011 at 06:32:46PM +0200, Oleg Nesterov wrote:
> > +static void ptrace_trap_notify(struct task_struct *t)
> > +{
> > +	siginfo_t *si = t->last_siginfo;
> > +
> > +	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
> > +	assert_spin_locked(&t->sighand->siglock);
> > +
> > +	/*
> > +	 * @t is being ptraced and new SEIZE behavior is in effect.
> > +	 * Schedule sticky trap which will clear on the next GETSIGINFO.
> > +	 */
> > +	t->jobctl |= JOBCTL_TRAP_NOTIFY;
> 
> This is also set by do_signal_stop(). Cleared by PTRACE_GETSIGINFO.
> 
> How can this work? Doesn't this mean PTRACE_GETSIGINFO becomes mandatory
> before PTRACE_CONT? IOW, unless the tracee does PTRACE_GETSIGINFO to clear
> this bit, PTRACE_CONT just leads to another trap, no?

Yes, group stop state change raises a sticky trap condition which is
cleared by GETSIGINFO.

> > +	if (task_is_traced(t) && si && si->si_code == PTRACE_STOP_SI_CODE) {
> 
> OK, this PTRACE_STOP_SI_CODE check is clear. But the same check in
> ptrace_check_attach() looks confusing, why can't we set BLOCK_NOTIFY
> unconditionally?

It's an optimization.  If we set the flag, we'll have to acquire
siglock and clear it before returning to userland.  Setting
BLOCK_NOTIFY isn't necessary unless tracee is in TRAP_STOP, so...

> > +		t->jobctl |= JOBCTL_TRAPPING;
> > +		if (!(t->jobctl & JOBCTL_BLOCK_NOTIFY))
> > +			signal_wake_up(t, true);
> 
> Could you please remind me why we can't avoid the awful ptrace_wait_trapping()
> in do_wait() paths? Assuming that ptrace_check_attach() does this. I got lost
> a bit.

Please consider the following scenario.

1. Tracee is in group stop and stops at TRAP_STOP notifying the
   tracer.

2. Tracer does WNOWAIT wait(2) and determines that the tracee is
   trapped in TRAP_STOP.

3. Something generates SIGCONT which finishes the group stop and
   triggers the notification re-trapping.

4. While tracee is re-trapping, tracer issues WNOHANG wait(2)
   expecting it to succeed but the tracee could be RUNNING for re-trap
   thus failing WNOHANG wait(2).

> So. The tracee reports PTRACE_EVENT_STOP, debugger issues a lot of PTRACE_
> requests. The tracee can report another trap "in between". Looks confusing...
> Perhaps I need to get used to it.

Yes, and the reported trap doesn't interfere with wait(2) or ptrace
requests.  Trapping is the only event notification mechanism available
from tracee to tracer and we already have interface to wait and poll
them, so I think it makes sense to use it.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 16:32   ` Oleg Nesterov
  2011-05-19 16:57     ` Tejun Heo
@ 2011-05-19 16:58     ` Oleg Nesterov
  2011-05-23 11:45       ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 16:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Oleg Nesterov wrote:
>
> On 05/16, Tejun Heo wrote:
> >
> > When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
> > is set, which triggers STOP trap but is sticky until the next
> > PTRACE_GETSIGINFO.
>
> I simply can't understand this patch. And the supposed API as it seen
> by the user-space.

I'm afraid I wasn't clear. I didn't mean the patch is bad, I only meant
that I got lost in details and I feel I'll ask more questions later.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY
  2011-05-19 16:48       ` Oleg Nesterov
@ 2011-05-19 16:58         ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-19 16:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On Thu, May 19, 2011 at 06:48:33PM +0200, Oleg Nesterov wrote:
> On 05/19, Tejun Heo wrote:
> >
> > Hello,
> >
> > On Thu, May 19, 2011 at 06:32:33PM +0200, Oleg Nesterov wrote:
> > > > +	/*
> > > > +	 * Make sure @chlid is still ptraced by us and clear BLOCK_NOTIFY.
> > > > +	 */
> > > > +	if (likely((child->ptrace & PT_PTRACED) && child->parent == current))
> > >
> > > This looks a bit confusing... It is either traced or not. If it is traced,
> > > nobody else can trace it. In fact even PT_PTRACED is not strictly needed
> > > but I agree this check makes the code cleaner.
> >
> > Hmmm... I suppose we can make the error paths finer grained so that
> > attach failure
> 
> Ah, you are right again, I forgot about PTRACE_ATTACH path.

We can move the PT_PTRACED and parenthood test outside the siglock tho
and I agree that the 'still ptraced' comment is misleading.  I'll
update it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 16:57     ` Tejun Heo
@ 2011-05-19 17:13       ` Oleg Nesterov
  2011-05-19 22:48         ` Denys Vlasenko
  2011-05-20  8:46         ` Tejun Heo
  0 siblings, 2 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-19 17:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Tejun Heo wrote:
>
> Hey,
>
> On Thu, May 19, 2011 at 06:32:46PM +0200, Oleg Nesterov wrote:
> > > +static void ptrace_trap_notify(struct task_struct *t)
> > > +{
> > > +	siginfo_t *si = t->last_siginfo;
> > > +
> > > +	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
> > > +	assert_spin_locked(&t->sighand->siglock);
> > > +
> > > +	/*
> > > +	 * @t is being ptraced and new SEIZE behavior is in effect.
> > > +	 * Schedule sticky trap which will clear on the next GETSIGINFO.
> > > +	 */
> > > +	t->jobctl |= JOBCTL_TRAP_NOTIFY;
> >
> > This is also set by do_signal_stop(). Cleared by PTRACE_GETSIGINFO.
> >
> > How can this work? Doesn't this mean PTRACE_GETSIGINFO becomes mandatory
> > before PTRACE_CONT? IOW, unless the tracee does PTRACE_GETSIGINFO to clear
> > this bit, PTRACE_CONT just leads to another trap, no?
>
> Yes, group stop state change raises a sticky trap condition which is
> cleared by GETSIGINFO.

Hmm. At least now I understand the meaining what "sticky" means in
this discussion ;) I was confused.

> > > +	if (task_is_traced(t) && si && si->si_code == PTRACE_STOP_SI_CODE) {
> >
> > OK, this PTRACE_STOP_SI_CODE check is clear. But the same check in
> > ptrace_check_attach() looks confusing, why can't we set BLOCK_NOTIFY
> > unconditionally?
>
> It's an optimization.  If we set the flag, we'll have to acquire
> siglock

OK, I see.

> > > +		t->jobctl |= JOBCTL_TRAPPING;
> > > +		if (!(t->jobctl & JOBCTL_BLOCK_NOTIFY))
> > > +			signal_wake_up(t, true);
> >
> > Could you please remind me why we can't avoid the awful ptrace_wait_trapping()
> > in do_wait() paths? Assuming that ptrace_check_attach() does this. I got lost
> > a bit.
>
> Please consider the following scenario.
>
> 1. Tracee is in group stop and stops at TRAP_STOP notifying the
>    tracer.
>
> 2. Tracer does WNOWAIT wait(2) and determines that the tracee is
>    trapped in TRAP_STOP.
>
> 3. Something generates SIGCONT which finishes the group stop and
>    triggers the notification re-trapping.
>
> 4. While tracee is re-trapping, tracer issues WNOHANG

OK. I still hope we can avoid this somehow. May be play with exit_code
so that do_wait() can succeed even if the JOBCTL_TRAPPING tracee is
running. Perhaps.

If only we could notify the tracer from ptrace_trap_notify... IIUC,
this is the only problem? I mean, apart from this there is no need
to wake up the tracee.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 14:17       ` Tejun Heo
  2011-05-19 15:02         ` Tejun Heo
@ 2011-05-19 19:31         ` Pedro Alves
  2011-05-19 22:42           ` Denys Vlasenko
  2011-05-23 13:09         ` Oleg Nesterov
  2 siblings, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-19 19:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

On Thursday 19 May 2011 15:17:28, Tejun Heo wrote:
> But making SEIZE not trigger INTERRUPT and SETOPTIONS without
> requiring TRACED don't seem too difficult.  Jan, would that be enough?
> Oleg, what do you think?

UUIC, that opens a race where between SEIZEing and
SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
fork/vfork/clone/exec, without the tracer getting the
nice corresponding PTRACE_EVENT_ events.

In GDBs case, GDB will want to poke at memory
right after attaching, to at least read the
loaded DSO list, so if you can't read tracee
memory without interrupting it, you won't get much
benefit from SEIZE not interrupting.
Not to say other users wouldn't benefit.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 19:31         ` Pedro Alves
@ 2011-05-19 22:42           ` Denys Vlasenko
  2011-05-19 23:00             ` Pedro Alves
  0 siblings, 1 reply; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-19 22:42 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Tejun Heo, oleg, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On Thursday 19 May 2011 21:31, Pedro Alves wrote:
> On Thursday 19 May 2011 15:17:28, Tejun Heo wrote:
> > But making SEIZE not trigger INTERRUPT and SETOPTIONS without
> > requiring TRACED don't seem too difficult.  Jan, would that be enough?
> > Oleg, what do you think?
> 
> UUIC, that opens a race where between SEIZEing and
> SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
> fork/vfork/clone/exec, without the tracer getting the
> nice corresponding PTRACE_EVENT_ events.

SEIZE,fork-in-tracee,INTERRUPT sequence is indistinguishable
from SEIZE happening two microseconds later:

fork-in-tracee,SEIZE,INTERRUPT

> In GDBs case, GDB will want to poke at memory
> right after attaching

...where "right after attaching" is defined as "when the first ptrace-stop
is reported". Which will happen very soon.

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2
  2011-05-19 15:04 ` Linus Torvalds
  2011-05-19 15:19   ` Tejun Heo
@ 2011-05-19 22:45   ` Denys Vlasenko
  1 sibling, 0 replies; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-19 22:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, oleg, jan.kratochvil, linux-kernel, akpm, indan, bdonlan

On Thursday 19 May 2011 17:04, Linus Torvalds wrote:
> On Mon, May 16, 2011 at 11:17 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > This is the second try at implementing PTRACE_SEIZE/INTERRUPT and
> > group stop notification.  Notable changes from the first take[1] are,
> >
> > * Prep patches moved to a separate patchset[2].
> 
> So having followed the discussion so far, quite frankly I'm not
> convinced this series is 2.6.40 material.
> 
> I think that conceptually the split-up of PTRACE_ATTACH into
> SEIZE/INTERRUPT might be fine, but I don't think the interface is
> necessarily cooked, and perhaps more importantly I'm not at all sure
> that the (few) current users of ptrace() would even switch over.

I will push strace to at least use what we already have (TRACESYSGOOD etc).

> So I think Oleg's branch with cleanups is probably ready, and maybe a
> few of the preparatory patches from this branch can be merged, but I
> would _strongly_ suggest that the plan for 2.6.40 should be to not
> actually mess with interfaces to the kernel, but just cleaning up the
> actual internal implementation. I would like to keep 2.6.40 small and
> simple.

strace wasn't handling ^Z correctly for many years. It definitely can wait
for another kernel release cycle.

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 17:13       ` Oleg Nesterov
@ 2011-05-19 22:48         ` Denys Vlasenko
  2011-05-20  8:59           ` Tejun Heo
  2011-05-20  8:46         ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-19 22:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Tejun Heo, jan.kratochvil, linux-kernel, torvalds, akpm, indan, bdonlan

On Thursday 19 May 2011 19:13, Oleg Nesterov wrote:
> On 05/19, Tejun Heo wrote:
> > On Thu, May 19, 2011 at 06:32:46PM +0200, Oleg Nesterov wrote:
> > > > +static void ptrace_trap_notify(struct task_struct *t)
> > > > +{
> > > > +	siginfo_t *si = t->last_siginfo;
> > > > +
> > > > +	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
> > > > +	assert_spin_locked(&t->sighand->siglock);
> > > > +
> > > > +	/*
> > > > +	 * @t is being ptraced and new SEIZE behavior is in effect.
> > > > +	 * Schedule sticky trap which will clear on the next GETSIGINFO.
> > > > +	 */
> > > > +	t->jobctl |= JOBCTL_TRAP_NOTIFY;
> > >
> > > This is also set by do_signal_stop(). Cleared by PTRACE_GETSIGINFO.
> > >
> > > How can this work? Doesn't this mean PTRACE_GETSIGINFO becomes mandatory
> > > before PTRACE_CONT? IOW, unless the tracee does PTRACE_GETSIGINFO to clear
> > > this bit, PTRACE_CONT just leads to another trap, no?
> >
> > Yes, group stop state change raises a sticky trap condition which is
> > cleared by GETSIGINFO.
> 
> Hmm. At least now I understand the meaining what "sticky" means in
> this discussion ;) I was confused.

I still don't see why is it needed. Userspace API doesn't require it.

Are you trying to solve/simplify something on the kernel side with it?

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 22:42           ` Denys Vlasenko
@ 2011-05-19 23:00             ` Pedro Alves
  2011-05-20  1:44               ` Denys Vlasenko
  2011-05-20  9:07               ` Tejun Heo
  0 siblings, 2 replies; 88+ messages in thread
From: Pedro Alves @ 2011-05-19 23:00 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Tejun Heo, oleg, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On Thursday 19 May 2011 23:42:12, Denys Vlasenko wrote:
> On Thursday 19 May 2011 21:31, Pedro Alves wrote:
> > On Thursday 19 May 2011 15:17:28, Tejun Heo wrote:
> > > But making SEIZE not trigger INTERRUPT and SETOPTIONS without
> > > requiring TRACED don't seem too difficult.  Jan, would that be enough?
> > > Oleg, what do you think?
> > 
> > UUIC, that opens a race where between SEIZEing and
> > SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
> > fork/vfork/clone/exec, without the tracer getting the
> > nice corresponding PTRACE_EVENT_ events.
> 
> SEIZE,fork-in-tracee,INTERRUPT sequence is indistinguishable
> from SEIZE happening two microseconds later:
> 
> fork-in-tracee,SEIZE,INTERRUPT

 SEIZE,execvd,INTERRUPT (SETOPTS on interrupt)

will make the tracer see a SIGTRAP that 

 execvd,SEIZE,INTERRUPT

nor

 SEIZE,SETOPTS,execvd (SETOPTS on interrupt)

would cause, isn't it?

Now, if it were possible for the tracer to set the
default OPTS _before_ PTRACE_ATTACH/PTRACE_SEIZE...

> 
> > In GDBs case, GDB will want to poke at memory
> > right after attaching
> 
> ...where "right after attaching" is defined as "when the first ptrace-stop
> is reported". Which will happen very soon.

Hmm?  Why would it happen very soon?  Isn't the point of SEIZE not
interrupting that you'd not get any INTERRUPT or stop at all?
Where is the ptrace-stop coming from?

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 23:00             ` Pedro Alves
@ 2011-05-20  1:44               ` Denys Vlasenko
  2011-05-20  8:56                 ` Pedro Alves
  2011-05-20  9:07               ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Denys Vlasenko @ 2011-05-20  1:44 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Tejun Heo, oleg, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On Friday 20 May 2011 01:00, Pedro Alves wrote:
> On Thursday 19 May 2011 23:42:12, Denys Vlasenko wrote:
> > On Thursday 19 May 2011 21:31, Pedro Alves wrote:
> > > On Thursday 19 May 2011 15:17:28, Tejun Heo wrote:
> > > > But making SEIZE not trigger INTERRUPT and SETOPTIONS without
> > > > requiring TRACED don't seem too difficult.  Jan, would that be enough?
> > > > Oleg, what do you think?
> > > 
> > > UUIC, that opens a race where between SEIZEing and
> > > SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
> > > fork/vfork/clone/exec, without the tracer getting the
> > > nice corresponding PTRACE_EVENT_ events.
> > 
> > SEIZE,fork-in-tracee,INTERRUPT sequence is indistinguishable
> > from SEIZE happening two microseconds later:
> > 
> > fork-in-tracee,SEIZE,INTERRUPT
> 
>  SEIZE,execvd,INTERRUPT (SETOPTS on interrupt)
> 
> will make the tracer see a SIGTRAP that 
> 
>  execvd,SEIZE,INTERRUPT
> 
> nor
> 
>  SEIZE,SETOPTS,execvd (SETOPTS on interrupt)
> 
> would cause, isn't it?

Yes, you are right about this particular case.

Execve's extra SIGTRAP is a particularly painful misfeature.


> Now, if it were possible for the tracer to set the
> default OPTS _before_ PTRACE_ATTACH/PTRACE_SEIZE...

I propose to do it *during* SEIZE then. Say, by passing SETOPTION style
option flags in data argument. To fight above example, we'd want
to pass PTRACE_O_TRACEEXEC. 

Tejun, what do you think?


> > > In GDBs case, GDB will want to poke at memory
> > > right after attaching
> > 
> > ...where "right after attaching" is defined as "when the first ptrace-stop
> > is reported". Which will happen very soon.
> 
> Hmm?  Why would it happen very soon?
> Isn't the point of SEIZE not 
> interrupting that you'd not get any INTERRUPT or stop at all?
> Where is the ptrace-stop coming from?

>From PTRACE_INTERRUPT. Without it, tracee is running. Ptrace API never
allowed poking of running tracees. You need to stop it first.

-- 
vda

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 17:13       ` Oleg Nesterov
  2011-05-19 22:48         ` Denys Vlasenko
@ 2011-05-20  8:46         ` Tejun Heo
  1 sibling, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-20  8:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Oleg.

On Thu, May 19, 2011 at 07:13:27PM +0200, Oleg Nesterov wrote:
> OK. I still hope we can avoid this somehow. May be play with exit_code
> so that do_wait() can succeed even if the JOBCTL_TRAPPING tracee is
> running. Perhaps.

I considered that but re-trapping seemed cleaner to me.  wait(2)
already depends on retries so I don't think adding TRAPPING wait
introduces too much extra complexity.  That said, if you can come up
with something simpler, by all means.

> If only we could notify the tracer from ptrace_trap_notify... IIUC,
> this is the only problem? I mean, apart from this there is no need
> to wake up the tracee.

Yeap, correct.
ahead.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-20  1:44               ` Denys Vlasenko
@ 2011-05-20  8:56                 ` Pedro Alves
  2011-05-20  9:12                   ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-20  8:56 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Tejun Heo, oleg, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On Friday 20 May 2011 02:44:44, Denys Vlasenko wrote:
> > > > In GDBs case, GDB will want to poke at memory
> > > > right after attaching
> > > 
> > > ...where "right after attaching" is defined as "when the first ptrace-stop
> > > is reported". Which will happen very soon.
> > 
> > Hmm?  Why would it happen very soon?
> > Isn't the point of SEIZE not 
> > interrupting that you'd not get any INTERRUPT or stop at all?
> > Where is the ptrace-stop coming from?
> 
> From PTRACE_INTERRUPT. Without it, tracee is running. 

> Ptrace API never allowed poking of running tracees.

Which is a bit lame.

> You need to stop it first.

That was my point...  I was just pointing out that GDB will end
up PTRACE_INTERRUPTing the target anyway.  Maybe we could extend
GDB's "observer" mode to be even more observer-only, and delay
reading the DSO list and whatever else GDB does on attach until
the first stop, or to user request.  Some archs need reading
registers as soon as possible in order to actually know which
arch variant we've attached to.  Anyway, this is GDBs business.
SEIZE not interrupting won't hurt GDB, and is obviously useful for
some use cases and tracers, _provided the race with SETOPTS is fixed_.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 22:48         ` Denys Vlasenko
@ 2011-05-20  8:59           ` Tejun Heo
  2011-05-23 13:34             ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-20  8:59 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Oleg Nesterov, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

Hello, Denys.

On Fri, May 20, 2011 at 12:48:07AM +0200, Denys Vlasenko wrote:
> > Hmm. At least now I understand the meaining what "sticky" means in
> > this discussion ;) I was confused.
> 
> I still don't see why is it needed. Userspace API doesn't require it.
> 
> Are you trying to solve/simplify something on the kernel side with it?

No, no.  Please consider the following scenario.

1. A process has two threads t1 and t2.  t1 is ptraced by p1.  Both t1
   and t2 are running.

2. p1 INTERRUPTs t1.  t1 enters TRAP_STOP and p1 notices it.

3. Something else sends SIGSTOP to t2 which initiates group stop.

4. As t1 re-traps to notify p1 of group stop, p1 issues PTRACE_CONT.

5. PTRACE_CONT succeeds right after t1 re-traps for notification and
   t1 resumes execution.

6. p1 has no way to find out the group stop which is now in effect.

And vice-versa for SIGCONT.  When an event needs to be delivered, the
event should remain pending until the target the event is destined to
has fetched the event; otherwise, the event can be lost.  Here, the
event to be delivered is change of group stop state and fetching
mechanism is GETSIGINFO.  The event should remain pending until
GETSIGINFO is called.

The only requirement is for the ptracer to call PTRACE_GETSIGINFO on
PTRACE_EVENT_STOP's which can be determined from the wait(2) exit
code.  If ptracer does that, there won't be any unnecessary trap.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 23:00             ` Pedro Alves
  2011-05-20  1:44               ` Denys Vlasenko
@ 2011-05-20  9:07               ` Tejun Heo
  2011-05-20  9:27                 ` Pedro Alves
  1 sibling, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-20  9:07 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

Hello, Pedro.

On Fri, May 20, 2011 at 12:00:17AM +0100, Pedro Alves wrote:
> > > UUIC, that opens a race where between SEIZEing and
> > > SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
> > > fork/vfork/clone/exec, without the tracer getting the
> > > nice corresponding PTRACE_EVENT_ events.

Does it matter?  The order of execution isn't even well defined
without synchronization border.  If you want full synchronization, you
can INTERRUPT tracee.

>  SEIZE,execvd,INTERRUPT (SETOPTS on interrupt)
> 
> will make the tracer see a SIGTRAP that 
> 
>  execvd,SEIZE,INTERRUPT
> 
> nor
> 
>  SEIZE,SETOPTS,execvd (SETOPTS on interrupt)
> 
> would cause, isn't it?

Yes, SIGTRAP on exec(2) is nasty but also is scheduled to be removed
if SEIZED.

> Now, if it were possible for the tracer to set the
> default OPTS _before_ PTRACE_ATTACH/PTRACE_SEIZE...

I don't see why that would be necessary.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-20  8:56                 ` Pedro Alves
@ 2011-05-20  9:12                   ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-20  9:12 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

Hello,

On Fri, May 20, 2011 at 09:56:22AM +0100, Pedro Alves wrote:
> > Ptrace API never allowed poking of running tracees.
> 
> Which is a bit lame.

Lame as it may be, it would be quite challenging to change that.

> > You need to stop it first.
> 
> That was my point...  I was just pointing out that GDB will end
> up PTRACE_INTERRUPTing the target anyway.  Maybe we could extend
> GDB's "observer" mode to be even more observer-only, and delay
> reading the DSO list and whatever else GDB does on attach until
> the first stop, or to user request.  Some archs need reading
> registers as soon as possible in order to actually know which
> arch variant we've attached to.  Anyway, this is GDBs business.
> SEIZE not interrupting won't hurt GDB, and is obviously useful for
> some use cases and tracers, _provided the race with SETOPTS is fixed_.

Other than the to-be-removed SIGTRAP, I don't think the trace is
observable from userland.  Do you have anything specific on mind?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-20  9:07               ` Tejun Heo
@ 2011-05-20  9:27                 ` Pedro Alves
  2011-05-20  9:31                   ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-20  9:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

On Friday 20 May 2011 10:07:18, Tejun Heo wrote:
> Hello, Pedro.
> 
> On Fri, May 20, 2011 at 12:00:17AM +0100, Pedro Alves wrote:
> > > > UUIC, that opens a race where between SEIZEing and
> > > > SETOPTIONS(O_TRACE FORK|VFORK|EXEC...), the tracee can
> > > > fork/vfork/clone/exec, without the tracer getting the
> > > > nice corresponding PTRACE_EVENT_ events.
> 
> Does it matter?  The order of execution isn't even well defined
> without synchronization border.  If you want full synchronization, you
> can INTERRUPT tracee.

The point I was trying to raise was not about the order of
execution, but about letting the old pre-nice PTRACE_EVENT_
events quirks stick through.

> 
> >  SEIZE,execvd,INTERRUPT (SETOPTS on interrupt)
> > 
> > will make the tracer see a SIGTRAP that 
> > 
> >  execvd,SEIZE,INTERRUPT
> > 
> > nor
> > 
> >  SEIZE,SETOPTS,execvd (SETOPTS on interrupt)
> > 
> > would cause, isn't it?
> 
> Yes, SIGTRAP on exec(2) is nasty but also is scheduled to be removed
> if SEIZED.

Okay, good to hear that.  Looks like the tracer can do:

 SEIZE,execve,SETOPTS,'readlink /proc/pid/exe'

and pretend it SEIZED after the execve.

I'm happy for now.

Thanks.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-20  9:27                 ` Pedro Alves
@ 2011-05-20  9:31                   ` Tejun Heo
  2011-05-24  9:49                     ` Pedro Alves
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-20  9:31 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

Hello, Pedro.

On Fri, May 20, 2011 at 10:27:35AM +0100, Pedro Alves wrote:
> > Does it matter?  The order of execution isn't even well defined
> > without synchronization border.  If you want full synchronization, you
> > can INTERRUPT tracee.
> 
> The point I was trying to raise was not about the order of
> execution, but about letting the old pre-nice PTRACE_EVENT_
> events quirks stick through.

I see.

> > Yes, SIGTRAP on exec(2) is nasty but also is scheduled to be removed
> > if SEIZED.
> 
> Okay, good to hear that.  Looks like the tracer can do:
> 
>  SEIZE,execve,SETOPTS,'readlink /proc/pid/exe'
> 
> and pretend it SEIZED after the execve.

Yeap, and I was trying to say that if tracer and tracee are running on
different CPUs, the order between SEIZE and execve isn't even well
defined (sans the nasty automatic SIGTRAP).

> I'm happy for now.

Awesome, thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-19 16:58     ` Oleg Nesterov
@ 2011-05-23 11:45       ` Oleg Nesterov
  2011-05-24 13:44         ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-23 11:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

On 05/19, Oleg Nesterov wrote:
>
> I only meant
> that I got lost in details and I feel I'll ask more questions later.

... and I am greatly disappointed by the fact all technical problems I
noticed were bogus, I should try more ;)



1. __ptrace_unlink:

	task_clear_jobctl_pending(child, JOBCTL_PENDING_MASK);

	/*
	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
	 * @child isn't dead.
	 */
	if (!(child->flags & PF_EXITING) &&
	    (child->signal->flags & SIGNAL_STOP_STOPPED ||
	     child->signal->group_stop_count))
		child->jobctl |= JOBCTL_STOP_PENDING;

Yes, we set JOBCTL_STOP_PENDING correctly. But what about JOBCTL_STOP_CONSUME?
It was potentially flushed by task_clear_jobctl_pending() above.




2. ptrace_trap_notify() always sets JOBCTL_TRAP_NOTIFY. Even if the caller
is prepare_signal(SIGCONT) and there were no SIGSTOP in the past. This looks
a bit strange to me. I mean, perhaps it would be better to provoke the trap
only if this SIGCONT is going to change the jobctl state.

In any case, we shouldn't set JOBCTL_TRAP_NOTIFY if fatal_signal_pending().
do_signal_stop() is fine, and prepare_signal(SIGCONT) can't race with SIGKILL.
but it can race with the zap_other_threads-like code (exec, coredump).
If we set JOBCTL_TRAP_NOTIFY when fatal_signal_pending() is true, the tracee
can't stop, it will call get_signal_to_deliver()->ptrace_stop() in a loop
forever and the tracer can't clear this bit.

Minor, but perhaps it would be more consistent to check PF_EXITING in
the !task_is_traced() case.




3. The similar (but currently theoretical) problem with PTRACE_TRAP_STOP.
We shouldn't assume anything about arch_ptrace_stop_needed(), the code
should work correctly even if it always returns true. This means that,
say, PTRACE_INTERRUPT should not set JOBCTL_TRAP_STOP if the tracee was
killed, or ptrace_stop() should clear JOBCTL_TRAP_MASK if it aborts.




-------------------------------------------------------------------------------
Now about the API. Can't we avoid the "asynchronous" re-trapping and the
subsequent complications like PTRACE_GETSIGINFO-must-be-called-to-cont?

What if we simply add the new request, say, PTRACE_JOBCTL_CONT which acts
as "do not cont, but please report me the next change in jctl state". IOW,
in short, instead of JOBCTL_BLOCK_NOTIFY we add JOBCTL_PLEASE_NOTIFY which
is set by this request.

Roughly, PTRACE_JOBCTL_CONT works as follows:

	- it is only valid if the tracee reported PTRACE_EVENT_STOP

	- SIGCONT/do_signal_stop/prepare_signal(SIGCONT) set
	  JOBCTL_TRAP_NOTIFY but do not wakeup unless
	  JOBCTL_PLEASE_NOTIFY was set by PTRACE_JOBCTL_CONT

	- of course, PTRACE_JOBCTL_CONT checks if JOBCTL_TRAP_NOTIFY
	  is already set.

As for PTRACE_CONT, it should set JOBCTL_PLEASE_NOTIFY if we resume the
stopped tracee. Or the tracer can do PTRACE_JOBCTL_CONT + PTRACE_CONT.


So. If the tracer wants to wait for SIGCONT after the tracee reports the
jobctl stop, it can simply do PTRACE_JOBCTL_CONT and wait(). This looks
much simpler to me. The only (afaics) complication is: PTRACE_INTERRUPT
should guarantee the trap after PTRACE_JOBCTL_CONT (in this case we can
simply set ->exit_code, no need to re-trap).

si_pt_flags (or whatever we use to report the jobctl state) can be recorded
during the trap, not at the time of ptrace() syscall.

The implementation should be simpler too. Only ptrace_attach() needs the
wait_trapping() logic, no need to complicate do_wait(). The tracee re-traps
only if the tracer asks for this.

And _imho_ this is more understandable from the user-space pov. To me it
is a bit strange a TASK_TRACED tracee can run and then take another trap
without some sort of CONT from debugger, even if we hide the transition.

What do you think?

I must admit, I didnt think over this all thoroughly, just a sudden idea.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-18  9:55     ` Tejun Heo
  2011-05-18 10:44       ` Denys Vlasenko
  2011-05-19 14:17       ` Tejun Heo
@ 2011-05-23 12:43       ` Oleg Nesterov
  2011-05-24 10:28         ` Tejun Heo
  2 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-23 12:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/18, Tejun Heo wrote:
>
> I've been thinking about Jan's suggestion to make ATTACH and DETACH
> not require tracee to trap. We already have this for DETACH for cases
> where the tracer is killed

Yes, I still think that the new DETACH_XXX request which doesn't need
the stopped tracee makes sense. Yes, we have PTRACE_INTERRUPT. But please
recall the previous discussion, it is possible that the tracee can't
react to PTRACE_INTERRUPT and trap because it waits for other threads
we are tracing.

And. Currently there is no way to detach a zombie leader. Perhaps we
should change do_wait(), but it is not clear what should we do if the
tracer is the real parent (we already discussed this a bit).

> and it seems it wouldn't be too difficult
> to make that happen for ATTACH either

Yes, I think this is simple to do. Do we need this? I leave this up
to you and Jan.

To me personally attach-implies-trap looks more natural, but probably
gdb has another opinion.


Anyway. IIUC, gdb wants the auto-attach-on-clone without the trap,
this is more important but this opens a lot of problems.


> and for that to be truly useful
> I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.

Hmm. Why? we could pass this options along with PTRACE_SEIZE?

> Jan, would that be enough for the use cases you have on mind?

Jan?

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-19 14:17       ` Tejun Heo
  2011-05-19 15:02         ` Tejun Heo
  2011-05-19 19:31         ` Pedro Alves
@ 2011-05-23 13:09         ` Oleg Nesterov
  2 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-23 13:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/19, Tejun Heo wrote:
>
> Hello,
>
> On Wed, May 18, 2011 at 11:55:39AM +0200, Tejun Heo wrote:
> > I've been thinking about Jan's suggestion to make ATTACH and DETACH
> > not require tracee to trap.  We already have this for DETACH for cases
> > where the tracer is killed and it seems it wouldn't be too difficult
> > to make that happen for ATTACH either and for that to be truly useful
> > I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.
> > Jan, would that be enough for the use cases you have on mind?
>
> I've been trying this and clean DETACH requires the tracee to be
> trapped (or not running).  The arch detach hook, which BTW is not
> executed when the tracer is killed, modifies tracee state expecting it
> to be off-cpu.

Argh. Yes, ptrace_disable() is the problem, I didn't think about it :/

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-20  8:59           ` Tejun Heo
@ 2011-05-23 13:34             ` Oleg Nesterov
  0 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-23 13:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/20, Tejun Heo wrote:
>
> 1. A process has two threads t1 and t2.  t1 is ptraced by p1.  Both t1
>    and t2 are running.
>
> 2. p1 INTERRUPTs t1.  t1 enters TRAP_STOP and p1 notices it.
>
> 3. Something else sends SIGSTOP to t2 which initiates group stop.
>
> 4. As t1 re-traps to notify p1 of group stop, p1 issues PTRACE_CONT.
>
> 5. PTRACE_CONT succeeds right after t1 re-traps for notification and
>    t1 resumes execution.

IOW. The root of the problem is that TASK_TRACED no longer means the
tracee is stopped, it can change its state and the "volatile" info
in si_pt_flags reflects this fact.

This makes me nervous ;) Yes, we hide the TRACED->RUNNING->TRACED
transitions, and I have to agree that the proposed API looks consistent
to me. Still, can't we do this differently ? (see another email).

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-20  9:31                   ` Tejun Heo
@ 2011-05-24  9:49                     ` Pedro Alves
  2011-05-24 12:00                       ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-24  9:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

On Friday 20 May 2011 10:31:11, Tejun Heo wrote:
> On Fri, May 20, 2011 at 10:27:35AM +0100, Pedro Alves wrote:
> > Okay, good to hear that.  Looks like the tracer can do:
> > 
> >  SEIZE,execve,SETOPTS,'readlink /proc/pid/exe'
> > 
> > and pretend it SEIZED after the execve.
> 
> Yeap, and I was trying to say that if tracer and tracee are running on
> different CPUs, the order between SEIZE and execve isn't even well
> defined (sans the nasty automatic SIGTRAP).

I see, indeed, thanks.

A couple interface questions that just crossed my mind:

 - on a fork/vfork/clone, if PTRACE_EVENT_FORK|VFORK|CLONE have been
   enabled, will the tracer still see the new child stop with a
   SIGSTOP, or will it see a PTRACE_EVENT_INTERRUPT?

 - is PTRACE_INTERRUPT on PTRACE_TRACEME-traced-child planed to
   be allowed (for convenience)?
   A PTRACE_O_TRACEINTERRUPT, or some such PTRACE_SETOPTIONS
   option might be necessary to get PTRACE_EVENT_INTERRUPT instead
   of SIGSTOP in the point above.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-23 12:43       ` Oleg Nesterov
@ 2011-05-24 10:28         ` Tejun Heo
  2011-05-25 18:29           ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-24 10:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

Hello, Oleg.

On Mon, May 23, 2011 at 02:43:14PM +0200, Oleg Nesterov wrote:
> On 05/18, Tejun Heo wrote:
> > I've been thinking about Jan's suggestion to make ATTACH and DETACH
> > not require tracee to trap. We already have this for DETACH for cases
> > where the tracer is killed
> 
> Yes, I still think that the new DETACH_XXX request which doesn't need
> the stopped tracee makes sense. Yes, we have PTRACE_INTERRUPT. But please
> recall the previous discussion, it is possible that the tracee can't
> react to PTRACE_INTERRUPT and trap because it waits for other threads
> we are tracing.

Yeah, untrapped DETACH sounds nice but as you've already acknowledged
in another reply, we have those nasty disable traps.  As the impact on
debugging capability is much lower, having a synchronization barrier
on detach might not be such a bad idea.  As for debugger building
dependency loop with debuggees, I think SIGKILL is the solution there.
Debugger can do that with PTRACE_INTERRUPT regardless of DETACH after
all.

> And. Currently there is no way to detach a zombie leader. Perhaps we
> should change do_wait(), but it is not clear what should we do if the
> tracer is the real parent (we already discussed this a bit).

Hmmm... maybe just allow detaching zombie leader?  As it's guaranteed
to be not running, we don't have problem with ptrace_disable.

> > and for that to be truly useful
> > I suppose PTRACE_SETOPTIONS shouldn't require trapped state either.
> 
> Hmm. Why? we could pass this options along with PTRACE_SEIZE?

Hmmm... I just don't see any reason to combine them.  If the tracer
wants sync points, it can INTERRUPT tracee and adjust options;
otherwise, it can just do it.  Isn't that simpler?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24  9:49                     ` Pedro Alves
@ 2011-05-24 12:00                       ` Tejun Heo
  2011-05-24 12:36                         ` Pedro Alves
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-24 12:00 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

Hello,

On Tue, May 24, 2011 at 10:49:58AM +0100, Pedro Alves wrote:
> A couple interface questions that just crossed my mind:
> 
>  - on a fork/vfork/clone, if PTRACE_EVENT_FORK|VFORK|CLONE have been
>    enabled, will the tracer still see the new child stop with a
>    SIGSTOP, or will it see a PTRACE_EVENT_INTERRUPT?

This won't change, so SIGSTOP although we probably want to improve it
such that this can be distinguished from SIGTRAP from userland.

>  - is PTRACE_INTERRUPT on PTRACE_TRACEME-traced-child planed to
>    be allowed (for convenience)?
>    A PTRACE_O_TRACEINTERRUPT, or some such PTRACE_SETOPTIONS
>    option might be necessary to get PTRACE_EVENT_INTERRUPT instead
>    of SIGSTOP in the point above.

I'm currently leaning toward deprecating PTRACE_TRACEME.  If a task
can PTRACE_TRACEME, it may as well just do pause(2) and let the parent
SEIZE it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24 12:00                       ` Tejun Heo
@ 2011-05-24 12:36                         ` Pedro Alves
  2011-05-24 14:02                           ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-24 12:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

On Tuesday 24 May 2011 13:00:13, Tejun Heo wrote:
> Hello,
> 
> On Tue, May 24, 2011 at 10:49:58AM +0100, Pedro Alves wrote:
> > A couple interface questions that just crossed my mind:
> > 
> >  - on a fork/vfork/clone, if PTRACE_EVENT_FORK|VFORK|CLONE have been
> >    enabled, will the tracer still see the new child stop with a
> >    SIGSTOP, or will it see a PTRACE_EVENT_INTERRUPT?
> 
> This won't change, so SIGSTOP although we probably want to improve it
> such that this can be distinguished from SIGTRAP from userland.

(I assume you meant SIGSTOP from userland.)  So that if a SIGSTOPs
from userland is sent before the tracer waits for the child, the
tracer sees a siginfo corresponding to the userland SIGSTOP?  Sounds
like it might work.

> >  - is PTRACE_INTERRUPT on PTRACE_TRACEME-traced-child planed to
> >    be allowed (for convenience)?
> >    A PTRACE_O_TRACEINTERRUPT, or some such PTRACE_SETOPTIONS
> >    option might be necessary to get PTRACE_EVENT_INTERRUPT instead
> >    of SIGSTOP in the point above.
> 
> I'm currently leaning toward deprecating PTRACE_TRACEME.  If a task
> can PTRACE_TRACEME, it may as well just do pause(2) and let the parent
> SEIZE it.

Debuggers will want to nurse the child through a couple of
execs (shell, then real debuggee), so that scheme requires a bit
more synchronization, because SEIZE hides the magic exec SIGTRAP,
and so the tracer needs to set the O_TRACEXEC option before the first
exec, and make sure external signals don't break the synchronization.
Reading/writing to/from blocking pipes for that initial synchronization
is what GDB uses instead for e.g., hpux/ttrace support, which looks
similar to using PTRACE_SEIZE for PTRACE_TRACEME.  A bit more
cumbersome, though doable, I suppose.

Thanks.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-23 11:45       ` Oleg Nesterov
@ 2011-05-24 13:44         ` Tejun Heo
  2011-05-24 15:44           ` Tejun Heo
  2011-05-26 14:44           ` Oleg Nesterov
  0 siblings, 2 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-24 13:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Oleg.

On Mon, May 23, 2011 at 01:45:59PM +0200, Oleg Nesterov wrote:
> On 05/19, Oleg Nesterov wrote:
> >
> > I only meant
> > that I got lost in details and I feel I'll ask more questions later.
> 
> ... and I am greatly disappointed by the fact all technical problems I
> noticed were bogus, I should try more ;)

Yay, do I get to win for once? :)

> 1. __ptrace_unlink:
> 
> 	task_clear_jobctl_pending(child, JOBCTL_PENDING_MASK);
> 
> 	/*
> 	 * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
> 	 * @child isn't dead.
> 	 */
> 	if (!(child->flags & PF_EXITING) &&
> 	    (child->signal->flags & SIGNAL_STOP_STOPPED ||
> 	     child->signal->group_stop_count))
> 		child->jobctl |= JOBCTL_STOP_PENDING;
> 
> Yes, we set JOBCTL_STOP_PENDING correctly. But what about JOBCTL_STOP_CONSUME?
> It was potentially flushed by task_clear_jobctl_pending() above.

Right, will fix.  Probably best to just clear trap ones there.

> 2. ptrace_trap_notify() always sets JOBCTL_TRAP_NOTIFY. Even if the caller
> is prepare_signal(SIGCONT) and there were no SIGSTOP in the past. This looks
> a bit strange to me. I mean, perhaps it would be better to provoke the trap
> only if this SIGCONT is going to change the jobctl state.

Sure.  It doesn't really matter tho and might even be better for
weeding out invalid assumptions.

> In any case, we shouldn't set JOBCTL_TRAP_NOTIFY if fatal_signal_pending().
> do_signal_stop() is fine, and prepare_signal(SIGCONT) can't race with SIGKILL.
> but it can race with the zap_other_threads-like code (exec, coredump).
> If we set JOBCTL_TRAP_NOTIFY when fatal_signal_pending() is true, the tracee
> can't stop, it will call get_signal_to_deliver()->ptrace_stop() in a loop
> forever and the tracer can't clear this bit.
> 
> Minor, but perhaps it would be more consistent to check PF_EXITING in
> the !task_is_traced() case.

Yes, indeed.  Will update.  It would be nice if we have a helper which
checks PF_EXITING before manipulating the flags.  It seems a bit too
fragile as it currently stands.

> 3. The similar (but currently theoretical) problem with PTRACE_TRAP_STOP.
> We shouldn't assume anything about arch_ptrace_stop_needed(), the code
> should work correctly even if it always returns true. This means that,
> say, PTRACE_INTERRUPT should not set JOBCTL_TRAP_STOP if the tracee was
> killed, or ptrace_stop() should clear JOBCTL_TRAP_MASK if it aborts.

Hmmm... yes.  I think I'll add a wrapper to raise trap conditions
which always check PF_EXITING.

> -------------------------------------------------------------------------------
> Now about the API. Can't we avoid the "asynchronous" re-trapping and the
> subsequent complications like PTRACE_GETSIGINFO-must-be-called-to-cont?
> 
> What if we simply add the new request, say, PTRACE_JOBCTL_CONT which acts
> as "do not cont, but please report me the next change in jctl state". IOW,
> in short, instead of JOBCTL_BLOCK_NOTIFY we add JOBCTL_PLEASE_NOTIFY which
> is set by this request.
> 
> Roughly, PTRACE_JOBCTL_CONT works as follows:
> 
> 	- it is only valid if the tracee reported PTRACE_EVENT_STOP
> 
> 	- SIGCONT/do_signal_stop/prepare_signal(SIGCONT) set
> 	  JOBCTL_TRAP_NOTIFY but do not wakeup unless
> 	  JOBCTL_PLEASE_NOTIFY was set by PTRACE_JOBCTL_CONT
> 
> 	- of course, PTRACE_JOBCTL_CONT checks if JOBCTL_TRAP_NOTIFY
> 	  is already set.

The main idea was that per-task execution state while ptraced is
separate, and thus asynchronous, from group stop state.  Group stop
affects task execution but is ultimiately a separate attribute.  This
is reflected in the modifications made upto this point - which
exit_code to use, and PTRACE_CONT working beneath group stop state.

While ptraced, group stop is a separate and asynchronous event and I
think it would be best if the interface reflects that rather than
trying to make it look like something different.

IIUC, the interface you suggested doesn't remove asynchronousity but,
rather, confines it, but to me it seems to be a too thin a veil.
e.g. What is the state of the tracee after PTRACE_JOBCTL_CONT?  It is
trapped but could be waken asynchronously and the ptracer isn't
equipped to deal with that.  What happens if the ptracer issues
further PTRACE requests to the tracee?  Are they rejected?  What does
it mean to PTRACE_INTERRUPT while tracee is in this state?

I'm somewhat uncomfortable with the fact that SIGCONT is directly
resuming tracee.  Re-trapping is different.  The implementation might
not be the easiest to swallow but conceptually it's just generating a
notification telling the ptracer that something has changed (and we
can try to do that from SIGCONT generator although I think re-trapping
is cleaner).  The execution state is still fully under the control of
ptracer.

(oops, please read on before replying.  I thought you were suggesting
 resuming tracee without tracer intervention.)

> As for PTRACE_CONT, it should set JOBCTL_PLEASE_NOTIFY if we resume the
> stopped tracee. Or the tracer can do PTRACE_JOBCTL_CONT + PTRACE_CONT.

Hmmm... so, if tracee is PTRACE_CONT'd from group stop and then
SIGCONT happens, what happens?  Does the tracee trap?  IOW, what does
it mean to have JOBCTL_PLEASE_NOTIFY set while tracee is not in
TRAP_STOP?

> jobctl stop, it can simply do PTRACE_JOBCTL_CONT and wait(). This looks
> much simpler to me. The only (afaics) complication is: PTRACE_INTERRUPT
> should guarantee the trap after PTRACE_JOBCTL_CONT (in this case we can
> simply set ->exit_code, no need to re-trap).
> 
> si_pt_flags (or whatever we use to report the jobctl state) can be recorded
> during the trap, not at the time of ptrace() syscall.

The reason why si_pt_flags is generated on PTRACE_GETSIGINFO is
because it reflects information which is asynchronous in nature.  It
doesn't make sense to sample the state when the task is going into
trap.  When a task enters a trap is fundamentally unrelated to how
group stop state changes.  I think sampling it on GETSIGINFO is the
right thing to do as group stop state _is_ asynchronous to whatever
the task itself is doing; otherwise, we end up with situation where no
job control state is being changed but two tracees consistently report
different group stop state depending on when they took trap.  How
should that be interpreted?  What does that even mean?  It gets
weirder if there are untraced tasks in the group.

> The implementation should be simpler too. Only ptrace_attach() needs the
> wait_trapping() logic, no need to complicate do_wait(). The tracee re-traps
> only if the tracer asks for this.

Ah, okay, so, it does re-trap on SIGCONT but it happens only after
JOBCTLY_PLEASE_NOTIFY is set.  Then the only thing which changes is
removal of wait(2) retry logic at the cost of requiring the user to
use a new request and stating that WNOHANG wait might fail
unexpectedly after PTRACE_JOBCTL_CONT.  Am I misunderstanding
something?

> And _imho_ this is more understandable from the user-space pov. To me it
> is a bit strange a TASK_TRACED tracee can run and then take another trap
> without some sort of CONT from debugger, even if we hide the transition.

I think it's best to represent a consistent interface / information
which reflects the actual interworking of job control and tracees.
Trying to mask it ultimately makes it more confusing, so I think it
makes sense to actually report job control state changes as
asynchronous notifications.  Re-trapping is an implementation detail
which is chosen to generate such notifications - we can try a
different approach - which I think would be worse but anyways.

If we take a step back and forget about the implementation itself for
a while, the re-trapping and on-the-fly GETSIGINFO are adding a rather
generic async notification mechanism for ptrace, and I still think
that's the better way to take.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24 12:36                         ` Pedro Alves
@ 2011-05-24 14:02                           ` Tejun Heo
  2011-05-24 14:55                             ` Pedro Alves
  2011-05-25 18:18                             ` Oleg Nesterov
  0 siblings, 2 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-24 14:02 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

Hello,

On Tue, May 24, 2011 at 01:36:03PM +0100, Pedro Alves wrote:
> On Tuesday 24 May 2011 13:00:13, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, May 24, 2011 at 10:49:58AM +0100, Pedro Alves wrote:
> > > A couple interface questions that just crossed my mind:
> > > 
> > >  - on a fork/vfork/clone, if PTRACE_EVENT_FORK|VFORK|CLONE have been
> > >    enabled, will the tracer still see the new child stop with a
> > >    SIGSTOP, or will it see a PTRACE_EVENT_INTERRUPT?
> > 
> > This won't change, so SIGSTOP although we probably want to improve it
> > such that this can be distinguished from SIGTRAP from userland.
> 
> (I assume you meant SIGSTOP from userland.)  So that if a SIGSTOPs
> from userland is sent before the tracer waits for the child, the
> tracer sees a siginfo corresponding to the userland SIGSTOP?  Sounds
> like it might work.

Now that thinking more about it, TRAP_STOP (INTERRUPT trap) would
probably be better.  I'll think more about it.  For fork, it doesn't
really matter but deliverying SIGSTOP on CLONE isn't too good.  From
user's POV, TRAP_STOP should work too, right?

> > I'm currently leaning toward deprecating PTRACE_TRACEME.  If a task
> > can PTRACE_TRACEME, it may as well just do pause(2) and let the parent
> > SEIZE it.
> 
> Debuggers will want to nurse the child through a couple of
> execs (shell, then real debuggee), so that scheme requires a bit
> more synchronization, because SEIZE hides the magic exec SIGTRAP,
> and so the tracer needs to set the O_TRACEXEC option before the first
> exec, and make sure external signals don't break the synchronization.
> Reading/writing to/from blocking pipes for that initial synchronization
> is what GDB uses instead for e.g., hpux/ttrace support, which looks
> similar to using PTRACE_SEIZE for PTRACE_TRACEME.  A bit more
> cumbersome, though doable, I suppose.

Yes, it would require some sort of synchronization.  I was thinking
more along the line of ptracer modifying tracee so that it exits
pause(2) loop after ptracer issues PTRACE_CONT, but I agree using
pipes would be more straight-forward.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24 14:02                           ` Tejun Heo
@ 2011-05-24 14:55                             ` Pedro Alves
  2011-05-25 18:18                             ` Oleg Nesterov
  1 sibling, 0 replies; 88+ messages in thread
From: Pedro Alves @ 2011-05-24 14:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, oleg, jan.kratochvil, linux-kernel, torvalds,
	akpm, indan, bdonlan

On Tuesday 24 May 2011 15:02:15, Tejun Heo wrote:
> Now that thinking more about it, TRAP_STOP (INTERRUPT trap) would
> probably be better.  I'll think more about it.  For fork, it doesn't
> really matter but deliverying SIGSTOP on CLONE isn't too good.  From
> user's POV, TRAP_STOP should work too, right?

AFAICS, it should be no problem.

Thanks,
-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-24 13:44         ` Tejun Heo
@ 2011-05-24 15:44           ` Tejun Heo
  2011-05-26 14:44           ` Oleg Nesterov
  1 sibling, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-24 15:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello, Oleg.

On Tue, May 24, 2011 at 03:44:11PM +0200, Tejun Heo wrote:
> > 2. ptrace_trap_notify() always sets JOBCTL_TRAP_NOTIFY. Even if the caller
> > is prepare_signal(SIGCONT) and there were no SIGSTOP in the past. This looks
> > a bit strange to me. I mean, perhaps it would be better to provoke the trap
> > only if this SIGCONT is going to change the jobctl state.
> 
> Sure.  It doesn't really matter tho and might even be better for
> weeding out invalid assumptions.

Hmmm... tried to change this but I think it's better as-is.  As an
optimization, it isn't whole lot meaningful, and, if I do it, I would
probably end up moving SIGNAL_STOP_STOPPED and group_stop_count test
above actual wake up and then skipping wake up if none was true - it
would make SIGCONT handling more fragile without much benefit.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24 14:02                           ` Tejun Heo
  2011-05-24 14:55                             ` Pedro Alves
@ 2011-05-25 18:18                             ` Oleg Nesterov
  2011-05-26  9:10                               ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-25 18:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Pedro Alves, Denys Vlasenko, jan.kratochvil, linux-kernel,
	torvalds, akpm, indan, bdonlan

Sorry for delay again, I was a bit offline.

On 05/24, Tejun Heo wrote:
>
> Hello,
>
> On Tue, May 24, 2011 at 01:36:03PM +0100, Pedro Alves wrote:
> > On Tuesday 24 May 2011 13:00:13, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Tue, May 24, 2011 at 10:49:58AM +0100, Pedro Alves wrote:
> > > > A couple interface questions that just crossed my mind:
> > > >
> > > >  - on a fork/vfork/clone, if PTRACE_EVENT_FORK|VFORK|CLONE have been
> > > >    enabled, will the tracer still see the new child stop with a
> > > >    SIGSTOP, or will it see a PTRACE_EVENT_INTERRUPT?
> > >
> > > This won't change, so SIGSTOP although we probably want to improve it
> > > such that this can be distinguished from SIGTRAP from userland.
> >
> > (I assume you meant SIGSTOP from userland.)  So that if a SIGSTOPs
> > from userland is sent before the tracer waits for the child, the
> > tracer sees a siginfo corresponding to the userland SIGSTOP?  Sounds
> > like it might work.
>
> Now that thinking more about it, TRAP_STOP (INTERRUPT trap) would
> probably be better.  I'll think more about it.  For fork, it doesn't
> really matter but deliverying SIGSTOP on CLONE isn't too good.  From
> user's POV, TRAP_STOP should work too, right?

I am a bit confused, sorry if I missed/misunderstood the context...

I thought, we already discussed this. Obviously, PT_SEIZED should be
inherited by the child during the auto-attach (and it is indeed copied
by ptrace_init_task). Why should we send SIGSTOP to the child then?
Even in the fork case, this leads to the same problems as the current
PTRACE_ATTACH has with the bogus SIGSTOP it sends. I think TRAP_STOP
is the only sane option, no?


Btw. Speaking of SEIZE->execvd->INTERRUPT which makes the tracee see
a SIGTRAP. Stupid question. Perhaps PTRACE_SEIZE should set
PT_TRACESYSGOOD | PT_TRACE_EXEC along with PT_SEIZED automatically?
PT_SEIZED implies the new behaviour anyway.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-24 10:28         ` Tejun Heo
@ 2011-05-25 18:29           ` Oleg Nesterov
  2011-05-26  9:14             ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-25 18:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/24, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Mon, May 23, 2011 at 02:43:14PM +0200, Oleg Nesterov wrote:
> > On 05/18, Tejun Heo wrote:
> > > I've been thinking about Jan's suggestion to make ATTACH and DETACH
> > > not require tracee to trap. We already have this for DETACH for cases
> > > where the tracer is killed
> >
> > Yes, I still think that the new DETACH_XXX request which doesn't need
> > the stopped tracee makes sense. Yes, we have PTRACE_INTERRUPT. But please
> > recall the previous discussion, it is possible that the tracee can't
> > react to PTRACE_INTERRUPT and trap because it waits for other threads
> > we are tracing.
>
> Yeah, untrapped DETACH sounds nice but as you've already acknowledged
> in another reply, we have those nasty disable traps.

Yes. Which I never thought about, I alway assume PTRACE_DETACH_ASYNC
is trivial, but it is not. Lets forget it for now.

> > And. Currently there is no way to detach a zombie leader. Perhaps we
> > should change do_wait(), but it is not clear what should we do if the
> > tracer is the real parent (we already discussed this a bit).
>
> Hmmm... maybe just allow detaching zombie leader?

Yes, I think we should do this.

If we change PTRACE_DETACH (or add the new request) to allow this, then
I think it it should detach any zombie, leader or not.

Or we can change do_wait() to detach a zombie leader. In this case it
is not clear what should we do if the debugger is the real parent.
Perhaps do_wait() should do the same: detach a leader (but not reap).
When the last thread does, the real parent will be notified again.
IOW, wait(tgid) can succeed twice.

> As it's guaranteed
> to be not running, we don't have problem with ptrace_disable.

Agreed. In fact it can be running, but it can't return to the user-space,
and I think this is enough.

ptrace_detach()->ptrace_disable() can race with SIGKILL anyway, this means
it should safe to call it if the tracee is exiting/exited.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-25 18:18                             ` Oleg Nesterov
@ 2011-05-26  9:10                               ` Tejun Heo
  2011-05-26 10:01                                 ` Pedro Alves
  2011-05-26 14:55                                 ` Oleg Nesterov
  0 siblings, 2 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-26  9:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pedro Alves, Denys Vlasenko, jan.kratochvil, linux-kernel,
	torvalds, akpm, indan, bdonlan

Hey,

On Wed, May 25, 2011 at 08:18:56PM +0200, Oleg Nesterov wrote:
> > Now that thinking more about it, TRAP_STOP (INTERRUPT trap) would
> > probably be better.  I'll think more about it.  For fork, it doesn't
> > really matter but deliverying SIGSTOP on CLONE isn't too good.  From
> > user's POV, TRAP_STOP should work too, right?
> 
> I am a bit confused, sorry if I missed/misunderstood the context...
> 
> I thought, we already discussed this. Obviously, PT_SEIZED should be
> inherited by the child during the auto-attach (and it is indeed copied
> by ptrace_init_task). Why should we send SIGSTOP to the child then?
> Even in the fork case, this leads to the same problems as the current
> PTRACE_ATTACH has with the bogus SIGSTOP it sends. I think TRAP_STOP
> is the only sane option, no?

I forgot about that and was thinking about events reported by
SIGTRAPs.  :)

Yes, in this case, TRAP_STOP is the only sane solution.

> Btw. Speaking of SEIZE->execvd->INTERRUPT which makes the tracee see
> a SIGTRAP. Stupid question. Perhaps PTRACE_SEIZE should set
> PT_TRACESYSGOOD | PT_TRACE_EXEC along with PT_SEIZED automatically?
> PT_SEIZED implies the new behaviour anyway.

Yeap, it makes sense to set them by default.  I was thinking about
defining a new PTRACE_EVENT_* rather than using 0x80 tho.  That way we
should be able to distinguish between enter and exit too.

This reminded me of another issue.  Currently we have two different
methods of reporting debug events.  One is direct ptrace event trap,
which ends up in ptrace_stop() from the event site and thus is
synchronous.  The other is sending SIGTRAP, literally.  Besides the
usual problems related with implicitly sending signals, this also
makes it difficult to determine which event is happening exactly and
makes the whole signal delivery/injection thing much worse.

I suspect that all ptrace traps probably started as actual SIGTRAPs
and so all the events were reported via signal delivery path and so
the signal "injection" worked; however, now we have synchronous traps
and some SIGTRAPs, which are confusing and unreliable.  I don't see
any reason why the events which are currently reported via SIGTRAPs
can't be reported via ptrace traps with unique PTRACE_EVENT_*.  I
don't think we can take synchronous traps directly but we can schedule
them, record relevant information in preallocated area and just set
TIF_SIGPENDING and let the signal delivery path report the trap.  This
will make all the events reliable and easily distinguishible and we
can fix the SIGTRAP signal problem.  What do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-25 18:29           ` Oleg Nesterov
@ 2011-05-26  9:14             ` Tejun Heo
  2011-05-26 15:01               ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-26  9:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

Hello,

On Wed, May 25, 2011 at 08:29:19PM +0200, Oleg Nesterov wrote:
> > > And. Currently there is no way to detach a zombie leader. Perhaps we
> > > should change do_wait(), but it is not clear what should we do if the
> > > tracer is the real parent (we already discussed this a bit).
> >
> > Hmmm... maybe just allow detaching zombie leader?
> 
> Yes, I think we should do this.
> 
> If we change PTRACE_DETACH (or add the new request) to allow this, then
> I think it it should detach any zombie, leader or not.

I think we can just make PTRACE_DETACH to succeed for zombies.  No
reason to add a new request for this.

> Or we can change do_wait() to detach a zombie leader. In this case it
> is not clear what should we do if the debugger is the real parent.
> Perhaps do_wait() should do the same: detach a leader (but not reap).
> When the last thread does, the real parent will be notified again.
> IOW, wait(tgid) can succeed twice.

Just letting PTRACE_DETACH work for zombies sounds much simpler to me.

> > As it's guaranteed to be not running, we don't have problem with
> > ptrace_disable.
> 
> Agreed. In fact it can be running, but it can't return to the user-space,
> and I think this is enough.
> 
> ptrace_detach()->ptrace_disable() can race with SIGKILL anyway, this means
> it should safe to call it if the tracee is exiting/exited.

Yeap, unless userland gets to run again, I don't think there's any
problem.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-26  9:10                               ` Tejun Heo
@ 2011-05-26 10:01                                 ` Pedro Alves
  2011-05-26 10:11                                   ` Tejun Heo
  2011-05-26 14:55                                 ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Pedro Alves @ 2011-05-26 10:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Denys Vlasenko, jan.kratochvil, linux-kernel,
	torvalds, akpm, indan, bdonlan

On Thursday 26 May 2011 10:10:41, Tejun Heo wrote:
> > Btw. Speaking of SEIZE->execvd->INTERRUPT which makes the tracee see
> > a SIGTRAP. 

I was told before that when SEIZE was in effect, there's no magic
SIGTRAP on exec.

> > Stupid question. Perhaps PTRACE_SEIZE should set
> > PT_TRACESYSGOOD | PT_TRACE_EXEC along with PT_SEIZED automatically?
> > PT_SEIZED implies the new behaviour anyway.
> 
> Yeap, it makes sense to set them by default.  

SYSGOOD makes sense, it just enables a means to distinguish syscall
SIGTRAPs from regular SIGTRAPs -- it doesn't cause child stops itself.
TRACE_EXEC, I'm not so sure.  (and it appears to have been proposed
on the premise that SEIZE would still report the SIGTRAP).
Why would that make sense, and not TRACE_FORK, for example?  I can imagine
a tracer only caring for syscall entry/exit, and not needing a special
event on exec.  IMO, any kind of event that forces a child stop that
would't happen if the child wasn't traced should have to be enabled
explicitly.

Heck, GDB passes a subset of signals straight down to
the child without informing the user (e.g., see "handle SIGALRM"
command), and it would be an improvement in
the tracer-affects-tracee's-scheduling department to have a means to
let ptrace know a tracer isn't interested in such-and-such signals.
Conversely, going with the non-intrusive tracing theme, it would
even make sense for the tracer to have to request "let me know
about signals (all or a subset) sent to tracee too"

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-26 10:01                                 ` Pedro Alves
@ 2011-05-26 10:11                                   ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-26 10:11 UTC (permalink / raw)
  To: Pedro Alves
  Cc: Oleg Nesterov, Denys Vlasenko, jan.kratochvil, linux-kernel,
	torvalds, akpm, indan, bdonlan

Hello, Pedro.

On Thu, May 26, 2011 at 11:01:42AM +0100, Pedro Alves wrote:
> SYSGOOD makes sense, it just enables a means to distinguish syscall
> SIGTRAPs from regular SIGTRAPs -- it doesn't cause child stops itself.
> TRACE_EXEC, I'm not so sure.  (and it appears to have been proposed
> on the premise that SEIZE would still report the SIGTRAP).
> Why would that make sense, and not TRACE_FORK, for example?  I can imagine
> a tracer only caring for syscall entry/exit, and not needing a special
> event on exec.  IMO, any kind of event that forces a child stop that
> would't happen if the child wasn't traced should have to be enabled
> explicitly.

The problem with exec is that very weird things happen during exec.
Tasks change their ids, tracees get silently pruned and so on, so
there might not be a transparent way for a ptracer to deal with it.
It needs to be notified and handle the situation whether it wants or
not.

What I was saying was there won't be SIGTRAP.  In general, we're
trying to move away from kernel implicitly sending actual signals.  If
we enable it by default, it will be a proper ptrace trap.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-24 13:44         ` Tejun Heo
  2011-05-24 15:44           ` Tejun Heo
@ 2011-05-26 14:44           ` Oleg Nesterov
  2011-05-28  7:32             ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-26 14:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hello Tejun,

On 05/24, Tejun Heo wrote:
>
> On Mon, May 23, 2011 at 01:45:59PM +0200, Oleg Nesterov wrote:
> > Now about the API. Can't we avoid the "asynchronous" re-trapping and the
> > subsequent complications like PTRACE_GETSIGINFO-must-be-called-to-cont?
> >
> > What if we simply add the new request, say, PTRACE_JOBCTL_CONT which acts
> > as "do not cont, but please report me the next change in jctl state". IOW,
> > in short, instead of JOBCTL_BLOCK_NOTIFY we add JOBCTL_PLEASE_NOTIFY which
> > is set by this request.
> >
> > Roughly, PTRACE_JOBCTL_CONT works as follows:
> >
> > 	- it is only valid if the tracee reported PTRACE_EVENT_STOP
> >
> > 	- SIGCONT/do_signal_stop/prepare_signal(SIGCONT) set
> > 	  JOBCTL_TRAP_NOTIFY but do not wakeup unless
> > 	  JOBCTL_PLEASE_NOTIFY was set by PTRACE_JOBCTL_CONT
> >
> > 	- of course, PTRACE_JOBCTL_CONT checks if JOBCTL_TRAP_NOTIFY
> > 	  is already set.
>
> While ptraced, group stop is a separate and asynchronous event and I
> think it would be best if the interface reflects

I agree. And it continues to be asynchronous, just the tracer can
explicitly tell when it wants to know about the state change.

> What is the state of the tracee after PTRACE_JOBCTL_CONT?  It is
> trapped

Yes,

> but could be waken asynchronously and the ptracer isn't
> equipped to deal with that.

It can be waken by SIGCONT, yes, but the tracer is equipped, it asked
for this.

> What happens if the ptracer issues
> further PTRACE requests to the tracee?  Are they rejected?

Yes. Except PTRACE_INTERRUPT, of course. (OK, may be PTRACE_CONT makes
sense too but this is minor).

> it mean to PTRACE_INTERRUPT while tracee is in this state?

It should guarentee PTRACE_EVENT_STOP, as if the tracee was running.

IOW. The tracee is no longer quiescent after PTRACE_JOBCTL_CONT. It is
still TASK_TRACED (although we could use another state), but it is
waiting for another jobctl state change. The tracer should wait for
notification (or call wait) as usual. No need to hide the TRACED->
RUNNING->TRACED transition during the re-trapping.

> > As for PTRACE_CONT, it should set JOBCTL_PLEASE_NOTIFY if we resume the
> > stopped tracee. Or the tracer can do PTRACE_JOBCTL_CONT + PTRACE_CONT.
>
> Hmmm... so, if tracee is PTRACE_CONT'd from group stop and then
> SIGCONT happens, what happens?  Does the tracee trap?

Yes. This is (almost) the same case as above, ignoring the small
implementation details.

To me this looks really simpler and more understandable for the user
space. If the simple tracer doesn't care about SIGCONT it can simply
do PTRACE_CONT without PTRACE_GETSIGINFO.

> > jobctl stop, it can simply do PTRACE_JOBCTL_CONT and wait(). This looks
> > much simpler to me. The only (afaics) complication is: PTRACE_INTERRUPT
> > should guarantee the trap after PTRACE_JOBCTL_CONT (in this case we can
> > simply set ->exit_code, no need to re-trap).
> >
> > si_pt_flags (or whatever we use to report the jobctl state) can be recorded
> > during the trap, not at the time of ptrace() syscall.
>
> The reason why si_pt_flags is generated on PTRACE_GETSIGINFO is
> because it reflects information which is asynchronous in nature.

I see. And I understand that this is necessary if the tracee re-traps
asynchronously. Either the tracer gets the up-to-date info, or the
tracee can't be continued without another trap. Honestly, I do not
like this, but I agree this works.

> It
> doesn't make sense to sample the state when the task is going into
> trap.  When a task enters a trap is fundamentally unrelated to how
> group stop state changes.

Sure. But:

> I think sampling it on GETSIGINFO is the
> right thing to do as group stop state _is_ asynchronous to whatever
> the task itself is doing;

The group stop state _is_ asynchronous to GETSIGINFO as well. Once again,
I understand (I hope) how this works, but dislike.


If we add PTRACE_JOBCTL_CONT, I think we can sample the state when
the task is going into trap. If the tracer cares about the next state
change, the tracee will trap again.

> > The implementation should be simpler too. Only ptrace_attach() needs the
> > wait_trapping() logic, no need to complicate do_wait(). The tracee re-traps
> > only if the tracer asks for this.
>
> Ah, okay, so, it does re-trap on SIGCONT but it happens only after
> JOBCTLY_PLEASE_NOTIFY is set.  Then the only thing which changes is
> removal of wait(2) retry logic at the cost of requiring the user to
> use a new request and stating that WNOHANG wait might fail
> unexpectedly after PTRACE_JOBCTL_CONT.

Yes. Except "unexpectedly" is a bit confusing. Once again, the tracee
is no longer quiescent.

> > And _imho_ this is more understandable from the user-space pov. To me it
> > is a bit strange a TASK_TRACED tracee can run and then take another trap
> > without some sort of CONT from debugger, even if we hide the transition.
>
> I think it's best to represent a consistent interface / information
> which reflects the actual interworking of job control and tracees.
> Trying to mask it ultimately makes it more confusing, so I think it
> makes sense to actually report job control state changes as
> asynchronous notifications.

Hmm. Here we start to disagree ;)

> If we take a step back and forget about the implementation itself for
> a while,

I must admit, I am a bit biased because I think this makes the
implementation more simple. But my point is, _in my opinion_
PTRACE_JOBCTL_CONT is better from the user-space pov.

> the re-trapping and on-the-fly GETSIGINFO are adding a rather
> generic async notification mechanism for ptrace,

For what? Why this is good?

It was always true that once the tracee reports the trap it is
completely quiescent untill debugger resumes it (except SIGKILL).
This is simple and understandable. Why should we change this?

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-26  9:10                               ` Tejun Heo
  2011-05-26 10:01                                 ` Pedro Alves
@ 2011-05-26 14:55                                 ` Oleg Nesterov
  1 sibling, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-26 14:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Pedro Alves, Denys Vlasenko, jan.kratochvil, linux-kernel,
	torvalds, akpm, indan, bdonlan

On 05/26, Tejun Heo wrote:
>
> This reminded me of another issue.  Currently we have two different
> methods of reporting debug events.  One is direct ptrace event trap,
> which ends up in ptrace_stop() from the event site and thus is
> synchronous.  The other is sending SIGTRAP, literally.  Besides the
> usual problems related with implicitly sending signals, this also
> makes it difficult to determine which event is happening exactly and
> makes the whole signal delivery/injection thing much worse.
>
> I suspect that all ptrace traps probably started as actual SIGTRAPs
> and so all the events were reported via signal delivery path and so
> the signal "injection" worked; however, now we have synchronous traps
> and some SIGTRAPs, which are confusing and unreliable.  I don't see
> any reason why the events which are currently reported via SIGTRAPs
> can't be reported via ptrace traps with unique PTRACE_EVENT_*.  I
> don't think we can take synchronous traps directly but we can schedule
> them, record relevant information in preallocated area and just set
> TIF_SIGPENDING and let the signal delivery path report the trap.  This
> will make all the events reliable and easily distinguishible and we
> can fix the SIGTRAP signal problem.  What do you think?

Agreed, it would be nice to avoid SIGTRAP's. Although right now it is
not clear to me how we can do this. The problematic case is the
single-stepping, afaics. Say, int3. Even if the task is ptraced, we
can't know why do we hit this insn. It is possible that this task
auto-debugs itself and thus we should send SIGTRAP. Even the
X86_EFLAGS_TF case is not that simple although we can probably take
TIF_SINGLESTEP into account. But again, the task can set X86_EFLAGS_TF
itself.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-26  9:14             ` Tejun Heo
@ 2011-05-26 15:01               ` Oleg Nesterov
  2011-05-27 18:21                 ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-26 15:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/26, Tejun Heo wrote:
>
> Hello,
>
> On Wed, May 25, 2011 at 08:29:19PM +0200, Oleg Nesterov wrote:
> > > > And. Currently there is no way to detach a zombie leader. Perhaps we
> > > > should change do_wait(), but it is not clear what should we do if the
> > > > tracer is the real parent (we already discussed this a bit).
> > >
> > > Hmmm... maybe just allow detaching zombie leader?
> >
> > Yes, I think we should do this.
> >
> > If we change PTRACE_DETACH (or add the new request) to allow this, then
> > I think it it should detach any zombie, leader or not.
>
> I think we can just make PTRACE_DETACH to succeed for zombies.  No
> reason to add a new request for this.

OK.

> > Or we can change do_wait() to detach a zombie leader. In this case it
> > is not clear what should we do if the debugger is the real parent.
> > Perhaps do_wait() should do the same: detach a leader (but not reap).
> > When the last thread does, the real parent will be notified again.
> > IOW, wait(tgid) can succeed twice.
>
> Just letting PTRACE_DETACH work for zombies sounds much simpler to me.

Probably, but please note we have to modify do_wait() anyway. Otherwise
in general the tracer simply can not know the tracee has exited. IOW,
waitpid(zombie_leader_pid, WEXITED) should succeed without reaping if
delay_group_leader(), then the tracer can do PTRACE_DETACH. But this is
not symmetrical with sub-thread zombies.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-26 15:01               ` Oleg Nesterov
@ 2011-05-27 18:21                 ` Tejun Heo
  2011-05-30 19:22                   ` Oleg Nesterov
  0 siblings, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-05-27 18:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

Hello,

On Thu, May 26, 2011 at 05:01:50PM +0200, Oleg Nesterov wrote:
> On 05/26, Tejun Heo wrote:
> > > Or we can change do_wait() to detach a zombie leader. In this case it
> > > is not clear what should we do if the debugger is the real parent.
> > > Perhaps do_wait() should do the same: detach a leader (but not reap).
> > > When the last thread does, the real parent will be notified again.
> > > IOW, wait(tgid) can succeed twice.
> >
> > Just letting PTRACE_DETACH work for zombies sounds much simpler to me.
> 
> Probably, but please note we have to modify do_wait() anyway. Otherwise
> in general the tracer simply can not know the tracee has exited. IOW,
> waitpid(zombie_leader_pid, WEXITED) should succeed without reaping if
> delay_group_leader(), then the tracer can do PTRACE_DETACH. But this is
> not symmetrical with sub-thread zombies.

Yes, complicated.  The task/process duality conflicts.  wait(2)
already is different for group leader and succeeds twice for the
ptracer and the real parent (when they're different).

If we relocate ptrace group leader zombie wait, as suggested, such
that it waits for the task itself rather than the whole group, we
would be taking away a feature - ptracer waiting for the whole process
to exit and having access to group exit code, which might not mean
much, I have no idea.

I think group leader wait becoming asymmetrical with sub-thread
zombies is perfectly fine - it already is.  But would it be okay to
change ptrace wait(2) on group leader to wait for the task itself
rather than the whole group?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 10/10] ptrace: implement group stop notification for ptracer
  2011-05-26 14:44           ` Oleg Nesterov
@ 2011-05-28  7:32             ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-05-28  7:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: jan.kratochvil, vda.linux, linux-kernel, torvalds, akpm, indan, bdonlan

Hey, Oleg.

On Thu, May 26, 2011 at 04:44:02PM +0200, Oleg Nesterov wrote:
> IOW. The tracee is no longer quiescent after PTRACE_JOBCTL_CONT. It is
> still TASK_TRACED (although we could use another state), but it is
> waiting for another jobctl state change. The tracer should wait for
> notification (or call wait) as usual. No need to hide the TRACED->
> RUNNING->TRACED transition during the re-trapping.

Ah, okay, this makes it much clearer to me.  Thanks for the
explanation.

> To me this looks really simpler and more understandable for the user
> space. If the simple tracer doesn't care about SIGCONT it can simply
> do PTRACE_CONT without PTRACE_GETSIGINFO.

But, I'm not sure it achieves that.  Please read on.

> If we add PTRACE_JOBCTL_CONT, I think we can sample the state when
> the task is going into trap. If the tracer cares about the next state
> change, the tracee will trap again.

Oh, this is a separate issue.  Your suggestoin only narrows the window
in which re-trap is allowed to happen.  GETSIGINFO being generated on
request doesn't have much to do with that.  It's determined by who is
considered to be notification consumer.

I suppose you're suggesting STOP_TRAP to be the consumer such that
there will be at least one STOP_TRAP after each group stop state
change and the user is required to fetch what happened after each
STOP_TRAP - ie. the chain of custody becomes event -> TRAP_STOP ->
GETSIGINFO instead of event -> GETSIGINFO.

I did the direct thing primarily because I thought it was a primarily
separate thing from tracee state itself but yeah I can see going
through TRAP_STOP could be better, but I don't think it has anything
to do with when to allow re-trap.

> > Ah, okay, so, it does re-trap on SIGCONT but it happens only after
> > JOBCTLY_PLEASE_NOTIFY is set.  Then the only thing which changes is
> > removal of wait(2) retry logic at the cost of requiring the user to
> > use a new request and stating that WNOHANG wait might fail
> > unexpectedly after PTRACE_JOBCTL_CONT.
> 
> Yes. Except "unexpectedly" is a bit confusing. Once again, the tracee
> is no longer quiescent.

But we better make the behavior consistent.  If it's non-quiescent,
make wait(2) and non-INTERRUPT ptrace requests should fail
consistently; otherwise, users are likely to get it wrong and such
bugs can be very difficult to track down and reproduce.

> I must admit, I am a bit biased because I think this makes the
> implementation more simple. But my point is, _in my opinion_
> PTRACE_JOBCTL_CONT is better from the user-space pov.

I can't agree there.  More on this below.

> > the re-trapping and on-the-fly GETSIGINFO are adding a rather
> > generic async notification mechanism for ptrace,
> 
> For what? Why this is good?

The thing I liked about it was that it's a straight forward mostly
independent event mechanism.  It may be used for other events and it
forces the user to follow event delivery protocol (otherwise, trap
won't clear).  If you forget about GETSIGINFO and consider it to be a
different thing - say PTRACE_GETEVENTS or something - it really is
straight forward.  The entanglement with GETSIGINFO is a bit messy but
that's what we compromise when trying to live with legacy interface.

Anyways, I'm perfectly fine with sticky TRAP_STOP too.  It makes sense
in a different, more group stop specific way.

> It was always true that once the tracee reports the trap it is
> completely quiescent untill debugger resumes it (except SIGKILL).
> This is simple and understandable. Why should we change this?

It's a bit besides the point but just for the record.  It hasn't been
true upto 2.6.40-rc1.  Group stopped tracee could be woken up by
SIGCONT before, so we actually had userland-visible non-quiescent
state.

Anyways, the task is quiescent when viewed from userland.  That's the
whole point of TRAPPING.  Making the re-trapping transparent.  When
viewed from userland, the only difference between TRAPPING and
JOBCTL_CONT is when re-trapping is allowed.

With TRAPPING, re-trapping is allowed whenever tracee is in TRAP_STOP
and the tracee is considered to be in quiescent state through the
whole time - ie. wait(2) and ptrace(2) operatons aren't affected by
generation of new event - I suppose this is the part that you don't
like.

With JOBCTL_CONT, re-trapping is allowed while tracee is in TRAP_STOP
and after JOBCL_CONT is issued and after JOBCTL_CONT is issued the
tracee is considered non-quiescent until the next trap (which can be
caused by group stop event or explicit INTERRUPT request).  As tracee
can be trated as non-quiescent, wait(2) and ptrace(2) don't need to be
transparent against async events - we can just fail them.  Note that
we still need to change wait(2) so that it recognizes the state and
fails.

Hmmm... yeah, I agree, the making-it-transparent part is tricky and
implementation would be simpler with JOBCTL_CONT (BTW I hope we can
find a better name.  It's a bit confusing to me.); however, if we
forget about implementation details and view it from userland POV, I
think TRAPPING is the better behavior.

The notification itself isn't intrusive - for the most part it's just
generation of another exit_code which can be fetched by the user if it
wants to, so there really isn't much to be gained from userland with
JOBCT_CONT.  Its primary appeal is that it can simplify kernel
implementation, which is not a bad thing at all.  It's a bit messier
to use from userland but not by too much and doesn't compromise any
functionality AFAICS, so yeah, maybe.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-27 18:21                 ` Tejun Heo
@ 2011-05-30 19:22                   ` Oleg Nesterov
       [not found]                     ` <BANLkTimupSd774N-VBoswOj+Dza=5ofvWQ@mail.gmail.com>
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-30 19:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Denys Vlasenko, jan.kratochvil, linux-kernel, torvalds, akpm,
	indan, bdonlan

On 05/27, Tejun Heo wrote:
>
> On Thu, May 26, 2011 at 05:01:50PM +0200, Oleg Nesterov wrote:
> > On 05/26, Tejun Heo wrote:
> > > > Or we can change do_wait() to detach a zombie leader. In this case it
> > > > is not clear what should we do if the debugger is the real parent.
> > > > Perhaps do_wait() should do the same: detach a leader (but not reap).
> > > > When the last thread does, the real parent will be notified again.
> > > > IOW, wait(tgid) can succeed twice.
> > >
> > > Just letting PTRACE_DETACH work for zombies sounds much simpler to me.
> >
> > Probably, but please note we have to modify do_wait() anyway. Otherwise
> > in general the tracer simply can not know the tracee has exited. IOW,
> > waitpid(zombie_leader_pid, WEXITED) should succeed without reaping if
> > delay_group_leader(), then the tracer can do PTRACE_DETACH. But this is
> > not symmetrical with sub-thread zombies.
>
> Yes, complicated.  The task/process duality conflicts.  wait(2)
> already is different for group leader and succeeds twice for the
> ptracer and the real parent (when they're different).
>
> If we relocate ptrace group leader zombie wait, as suggested, such
> that it waits for the task itself rather than the whole group, we
> would be taking away a feature - ptracer waiting for the whole process

Partly yes. The parent will be notified again and it can do another
wait() later. The problem is, this is confusing.

> I think group leader wait becoming asymmetrical with sub-thread
> zombies is perfectly fine - it already is.

Well, probably yes... But why? We have to change do_wait() anyway to
report the death of the leader. If we do this, we can detach it as
well.

> But would it be okay to
> change ptrace wait(2) on group leader to wait for the task itself
> rather than the whole group?

Yes, this is iffy. Simply because this can confuse the userspace.
May be we can add W_PTRACED_THREAD_EXITED? (should be used instead
of WEXITED by ptracer). Or, perhaps WEXITED should succeed but put
something special into status?

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
       [not found]                     ` <BANLkTimupSd774N-VBoswOj+Dza=5ofvWQ@mail.gmail.com>
@ 2011-05-31 19:08                       ` Oleg Nesterov
  2011-05-31 21:32                         ` Linus Torvalds
  2011-06-01  5:34                         ` Tejun Heo
  0 siblings, 2 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-05-31 19:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, indan, Tejun Heo, bdonlan, linux-kernel,
	jan.kratochvil, akpm

On 05/30, Linus Torvalds wrote:
>
> Guys, changing interfaces is not going to help user space. The current ones
> may not be wonderful, but they work.
>
> Making up some new "better" interface would be just worse for users, they
> still need to support the old one anyway.

IOW, you are starting to agree that ptrace is not fixable ;)

On a seriouse note, of course you are right, but what we should do?

Speaking of this particular problem, it is really annoying. The leader
exits, and it even notifies the tracer. For what? waitpid() returns 0
or hangs depending on WNOHANG, any ptrace() request fails with ESRCH.

Ironically, recently I got the bug-report which blaims this behaviour,
although it was mentioned many times during the previous discussions.

I agree, we can't change the current behaviour of WEXITED in this case.

I also agree that if we, say, add WPTRACE_EXITED or change the semantics
of WEXITED when the traced leader is PT_SEIZED, then probably this new
feature won't be used by gdb/etc in the forseeable future.

So what should we do?

OK, this particular problem is minor, perhaps we can simply ignore it
and do not try to help the userspace.

But the same objection applies to any new feature in this area. Say,
PTRACE_INTERRUPT. To me, this request is obviously good/useful and
nobody argued so far. But, will it ever be used by gdb? I am not sure,
but at least we add the possibility.

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-31 19:08                       ` Oleg Nesterov
@ 2011-05-31 21:32                         ` Linus Torvalds
  2011-06-01 20:04                           ` Oleg Nesterov
  2011-06-01  5:34                         ` Tejun Heo
  1 sibling, 1 reply; 88+ messages in thread
From: Linus Torvalds @ 2011-05-31 21:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Denys Vlasenko, indan, Tejun Heo, bdonlan, linux-kernel,
	jan.kratochvil, akpm

On Wed, Jun 1, 2011 at 4:08 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On a seriouse note, of course you are right, but what we should do?

Have we *ever* gotten a bug-report about these kinds of things from
actual users?

IOW, it may be an annoying issue from a theoretical standpoint, but
afaik it's not an annoying issue from a practical user standpoint.

Compare it to the _practically_ annoying issue we had with ^C and
bash, where it was fundamentally impossible to know whether the child
actually exited due to the ^C or not due to the interfaces. That one
caused actual practical issues that people hit, and that were
impossible to work around in theory, and hard to work around in
practice..

In comparison, I don't think pid-vs-namespace issues for ptrace or
some subtle interaction with execve is really all that big of a
problem in practice.

Keep the eye on the ball: look around what people actually complain
about for practical reason that they hit, rather than issues that
people say "wouldn't it be nice if.." about.

             Linus

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-31 19:08                       ` Oleg Nesterov
  2011-05-31 21:32                         ` Linus Torvalds
@ 2011-06-01  5:34                         ` Tejun Heo
  2011-06-01 20:08                           ` Oleg Nesterov
  1 sibling, 1 reply; 88+ messages in thread
From: Tejun Heo @ 2011-06-01  5:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Denys Vlasenko, indan, bdonlan, linux-kernel,
	jan.kratochvil, akpm

Hello, Oleg.

On Tue, May 31, 2011 at 09:08:09PM +0200, Oleg Nesterov wrote:
> Speaking of this particular problem, it is really annoying. The leader
> exits, and it even notifies the tracer. For what? waitpid() returns 0
> or hangs depending on WNOHANG, any ptrace() request fails with ESRCH.

Hmm... can't we just allow PTRACE_DETACH after PTRACE_EVENT_EXIT is
reported?  It doesn't involve changes to any visible behavior (other
than allowing DETACH of course) and we don't lose anything.  Haven't
really thought about the implementation but I don't think this would
be too difficult.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-05-31 21:32                         ` Linus Torvalds
@ 2011-06-01 20:04                           ` Oleg Nesterov
  0 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2011-06-01 20:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, indan, Tejun Heo, bdonlan, linux-kernel,
	jan.kratochvil, akpm

On 06/01, Linus Torvalds wrote:
>
> On Wed, Jun 1, 2011 at 4:08 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On a seriouse note, of course you are right, but what we should do?
>
> Have we *ever* gotten a bug-report about these kinds of things from
> actual users?

Yes ;) https://bugzilla.redhat.com/show_bug.cgi?id=705583

> IOW, it may be an annoying issue from a theoretical standpoint, but
> afaik it's not an annoying issue from a practical user standpoint.

Probably...

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-06-01  5:34                         ` Tejun Heo
@ 2011-06-01 20:08                           ` Oleg Nesterov
  2011-06-02  5:01                             ` Tejun Heo
  0 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2011-06-01 20:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Denys Vlasenko, indan, bdonlan, linux-kernel,
	jan.kratochvil, akpm

On 06/01, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Tue, May 31, 2011 at 09:08:09PM +0200, Oleg Nesterov wrote:
> > Speaking of this particular problem, it is really annoying. The leader
> > exits, and it even notifies the tracer. For what? waitpid() returns 0
> > or hangs depending on WNOHANG, any ptrace() request fails with ESRCH.
>
> Hmm... can't we just allow PTRACE_DETACH after PTRACE_EVENT_EXIT is
> reported?

Confused. PTRACE_DETACH surely works if the tracee reports EVENT_EXIT...
Could you explain what you meant?

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 03/10] ptrace: implement PTRACE_SEIZE
  2011-06-01 20:08                           ` Oleg Nesterov
@ 2011-06-02  5:01                             ` Tejun Heo
  0 siblings, 0 replies; 88+ messages in thread
From: Tejun Heo @ 2011-06-02  5:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Denys Vlasenko, indan, bdonlan, linux-kernel,
	jan.kratochvil, akpm

Hey,

On Wed, Jun 01, 2011 at 10:08:07PM +0200, Oleg Nesterov wrote:
> > Hmm... can't we just allow PTRACE_DETACH after PTRACE_EVENT_EXIT is
> > reported?
> 
> Confused. PTRACE_DETACH surely works if the tracee reports EVENT_EXIT...
> Could you explain what you meant?

Right, I briefly forgot it's already a ptrace stop and was thinking
about using PTRACE_EVENT_EXIT as the event notification point.
Anyways, back to the topic.

Maybe I misunderstood the problem but wasn't the problem about not
being able to wait for the exit of a leader thread and detach it?  We
have reliable (sans exec but that's a different story) exit
notification with EVENT_EXIT which even reports the exit_code, so I
don't see what the problem is.  What am I missing?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2011-06-02  5:01 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-16 18:17 [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Tejun Heo
2011-05-16 18:17 ` [PATCH 01/10] signal: remove three noop tracehooks Tejun Heo
2011-05-17 16:22   ` Christoph Hellwig
2011-05-17 16:27     ` Tejun Heo
2011-05-18 18:45   ` Oleg Nesterov
2011-05-19 12:11     ` Tejun Heo
2011-05-19 16:10       ` Oleg Nesterov
2011-05-16 18:17 ` [PATCH 02/10] job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap Tejun Heo
2011-05-18 16:48   ` Oleg Nesterov
2011-05-18 16:57     ` Oleg Nesterov
2011-05-19 10:19     ` Tejun Heo
2011-05-19 16:19       ` Oleg Nesterov
2011-05-16 18:17 ` [PATCH 03/10] ptrace: implement PTRACE_SEIZE Tejun Heo
2011-05-18  0:40   ` Denys Vlasenko
2011-05-18  9:55     ` Tejun Heo
2011-05-18 10:44       ` Denys Vlasenko
2011-05-18 11:14         ` Tejun Heo
2011-05-19 14:17       ` Tejun Heo
2011-05-19 15:02         ` Tejun Heo
2011-05-19 19:31         ` Pedro Alves
2011-05-19 22:42           ` Denys Vlasenko
2011-05-19 23:00             ` Pedro Alves
2011-05-20  1:44               ` Denys Vlasenko
2011-05-20  8:56                 ` Pedro Alves
2011-05-20  9:12                   ` Tejun Heo
2011-05-20  9:07               ` Tejun Heo
2011-05-20  9:27                 ` Pedro Alves
2011-05-20  9:31                   ` Tejun Heo
2011-05-24  9:49                     ` Pedro Alves
2011-05-24 12:00                       ` Tejun Heo
2011-05-24 12:36                         ` Pedro Alves
2011-05-24 14:02                           ` Tejun Heo
2011-05-24 14:55                             ` Pedro Alves
2011-05-25 18:18                             ` Oleg Nesterov
2011-05-26  9:10                               ` Tejun Heo
2011-05-26 10:01                                 ` Pedro Alves
2011-05-26 10:11                                   ` Tejun Heo
2011-05-26 14:55                                 ` Oleg Nesterov
2011-05-23 13:09         ` Oleg Nesterov
2011-05-23 12:43       ` Oleg Nesterov
2011-05-24 10:28         ` Tejun Heo
2011-05-25 18:29           ` Oleg Nesterov
2011-05-26  9:14             ` Tejun Heo
2011-05-26 15:01               ` Oleg Nesterov
2011-05-27 18:21                 ` Tejun Heo
2011-05-30 19:22                   ` Oleg Nesterov
     [not found]                     ` <BANLkTimupSd774N-VBoswOj+Dza=5ofvWQ@mail.gmail.com>
2011-05-31 19:08                       ` Oleg Nesterov
2011-05-31 21:32                         ` Linus Torvalds
2011-06-01 20:04                           ` Oleg Nesterov
2011-06-01  5:34                         ` Tejun Heo
2011-06-01 20:08                           ` Oleg Nesterov
2011-06-02  5:01                             ` Tejun Heo
2011-05-18 18:17   ` Oleg Nesterov
2011-05-19 10:34     ` Tejun Heo
2011-05-16 18:17 ` [PATCH 04/10] ptrace: implement PTRACE_INTERRUPT Tejun Heo
2011-05-18 18:38   ` Oleg Nesterov
2011-05-19 12:07     ` Tejun Heo
2011-05-19 16:21       ` Oleg Nesterov
2011-05-16 18:17 ` [PATCH 05/10] ptrace: restructure ptrace_getsiginfo() Tejun Heo
2011-05-16 18:17 ` [PATCH 06/10] ptrace: add siginfo.si_pt_flags Tejun Heo
2011-05-16 18:17 ` [PATCH 07/10] ptrace: make group stop state visible via PTRACE_GETSIGINFO Tejun Heo
2011-05-19 16:27   ` Oleg Nesterov
2011-05-19 16:40     ` Tejun Heo
2011-05-16 18:17 ` [PATCH 08/10] ptrace: don't let PTRACE_SETSIGINFO override __SI_TRAP siginfo Tejun Heo
2011-05-16 18:17 ` [PATCH 09/10] ptrace: add JOBCTL_BLOCK_NOTIFY Tejun Heo
2011-05-19 16:32   ` Oleg Nesterov
2011-05-19 16:44     ` Tejun Heo
2011-05-19 16:48       ` Oleg Nesterov
2011-05-19 16:58         ` Tejun Heo
2011-05-16 18:17 ` [PATCH 10/10] ptrace: implement group stop notification for ptracer Tejun Heo
2011-05-19 16:32   ` Oleg Nesterov
2011-05-19 16:57     ` Tejun Heo
2011-05-19 17:13       ` Oleg Nesterov
2011-05-19 22:48         ` Denys Vlasenko
2011-05-20  8:59           ` Tejun Heo
2011-05-23 13:34             ` Oleg Nesterov
2011-05-20  8:46         ` Tejun Heo
2011-05-19 16:58     ` Oleg Nesterov
2011-05-23 11:45       ` Oleg Nesterov
2011-05-24 13:44         ` Tejun Heo
2011-05-24 15:44           ` Tejun Heo
2011-05-26 14:44           ` Oleg Nesterov
2011-05-28  7:32             ` Tejun Heo
2011-05-18 18:50 ` [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification, take#2 Oleg Nesterov
2011-05-19 12:08   ` Tejun Heo
2011-05-19 15:04 ` Linus Torvalds
2011-05-19 15:19   ` Tejun Heo
2011-05-19 22:45   ` Denys Vlasenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.