All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET] ptrace,signal: cleanups and fixes
@ 2010-12-22 11:13 Tejun Heo
  2010-12-22 11:13 ` [PATCH 1/4] signal: fix SIGCONT notification code Tejun Heo
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Tejun Heo @ 2010-12-22 11:13 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, rjw, jan.kratochvil

Hello,

This patchset contains the following four reviewed and updated patches
from the "ptrace,signal: sane interaction between ptrace and job
control signals, take#2"[1] patchset.

 0001-signal-fix-SIGCONT-notification-code.patch
 0002-ptrace-remove-the-extra-wake_up_process-from-ptrace_.patch
 0003-signal-remove-superflous-try_to_freeze-loop-in-do_si.patch
 0004-ptrace-kill-tracehook_notify_jctl.patch

The incorrect @why determination logic is dropped from 0004.

The git branch is located at,

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace-reviewed

Oleg, Roland, I'd like to base further patches on top of these, so
that we don't end up constantly reshuffling patches.  If there's a
better place to host the above patches, please feel free to pull the
branch or apply the patches.  Just let me know where to base further
patches.

Thank you.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1072474

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] signal: fix SIGCONT notification code
  2010-12-22 11:13 [PATCHSET] ptrace,signal: cleanups and fixes Tejun Heo
@ 2010-12-22 11:13 ` Tejun Heo
  2011-01-12 20:41   ` Jan Kratochvil
  2010-12-22 11:13 ` [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2010-12-22 11:13 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, rjw, jan.kratochvil; +Cc: Tejun Heo

After a task receives SIGCONT, its parent is notified via SIGCHLD with
its siginfo describing what the notified event is.  If SIGCONT is
received while the child process is stopped, the code should be
CLD_CONTINUED.  If SIGCONT is recieved while the child process is in
the process of being stopped, it should be CLD_STOPPED.  Which code to
use is determined in prepare_signal() and recorded in signal->flags
using SIGNAL_CLD_CONTINUED|STOP flags.

get_signal_deliver() should test these flags and then notify
accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
and CLD_STOPPED if the state was fetched already.

Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
?: test into if/else clause for better readability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..fe004b5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1853,8 +1853,13 @@ relock:
 	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
 	 */
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
-		int why = (signal->flags & SIGNAL_STOP_CONTINUED)
-				? CLD_CONTINUED : CLD_STOPPED;
+		int why;
+
+		if (signal->flags & SIGNAL_CLD_CONTINUED)
+			why = CLD_CONTINUED;
+		else
+			why = CLD_STOPPED;
+
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
 		why = tracehook_notify_jctl(why, CLD_CONTINUED);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-22 11:13 [PATCHSET] ptrace,signal: cleanups and fixes Tejun Heo
  2010-12-22 11:13 ` [PATCH 1/4] signal: fix SIGCONT notification code Tejun Heo
@ 2010-12-22 11:13 ` Tejun Heo
  2011-01-17 22:13   ` Roland McGrath
  2010-12-22 11:13 ` [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
  2010-12-22 11:13 ` [PATCH 4/4] ptrace: kill tracehook_notify_jctl() Tejun Heo
  3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2010-12-22 11:13 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, rjw, jan.kratochvil; +Cc: Tejun Heo

This wake_up_process() has a turbulent history.  This is a remnant
from ancient ptrace implementation and patently wrong.  Commit
95a3540d (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx
logic) removed it but the change was reverted later by commit edaba2c5
(ptrace: revert "ptrace_detach: the wrong wakeup breaks the
ERESTARTxxx logic ") citing compatibility breakage and general
brokeness of the whole group stop / ptrace interaction.

Digging through the mailing archives, the compatibility breakage
doesn't seem to be critical in the sense that the behavior isn't well
defined or reliable to begin with and it seems to have been agreed to
remove the wakeup with proper cleanup of the whole thing.

Now that the group stop and its interaction with ptrace are cleaned up
and well defined, it's high time to finally kill this silliness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/ptrace.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99bbaa3..a8c9f26 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -312,8 +312,6 @@ int ptrace_detach(struct task_struct *child, unsigned int data)
 	if (child->ptrace) {
 		child->exit_code = data;
 		dead = __ptrace_detach(current, child);
-		if (!child->exit_state)
-			wake_up_process(child);
 	}
 	write_unlock_irq(&tasklist_lock);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop()
  2010-12-22 11:13 [PATCHSET] ptrace,signal: cleanups and fixes Tejun Heo
  2010-12-22 11:13 ` [PATCH 1/4] signal: fix SIGCONT notification code Tejun Heo
  2010-12-22 11:13 ` [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
@ 2010-12-22 11:13 ` Tejun Heo
  2011-01-17 22:10   ` Roland McGrath
  2010-12-22 11:13 ` [PATCH 4/4] ptrace: kill tracehook_notify_jctl() Tejun Heo
  3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2010-12-22 11:13 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, rjw, jan.kratochvil; +Cc: Tejun Heo

do_signal_stop() is used only by get_signal_to_deliver() and after a
successful signal stop, it always calls try_to_freeze(), so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superflous and confusing.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 kernel/signal.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index fe004b5..0a6816a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
 	}
 
 	/* Now we don't run again until woken by SIGCONT or SIGKILL */
-	do {
-		schedule();
-	} while (try_to_freeze());
+	schedule();
 
 	tracehook_finish_jctl();
 	current->exit_code = 0;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] ptrace: kill tracehook_notify_jctl()
  2010-12-22 11:13 [PATCHSET] ptrace,signal: cleanups and fixes Tejun Heo
                   ` (2 preceding siblings ...)
  2010-12-22 11:13 ` [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
@ 2010-12-22 11:13 ` Tejun Heo
  2011-01-17 22:14   ` Roland McGrath
  3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2010-12-22 11:13 UTC (permalink / raw)
  To: oleg, roland, linux-kernel, rjw, jan.kratochvil; +Cc: Tejun Heo

tracehook_notify_jctl() aids in determining whether and what to report
to the parent when a task is stopped or continued.  The function also
adds an extra requirement that siglock may be released across it,
which is currently unused and quite difficult to satisfy in
well-defined manner.

As job control and the notifications are about to receive major
overhaul, remove the tracehook and open code it.  If ever necessary,
let's factor it out after the overhaul.

= Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
---
 include/linux/tracehook.h |   27 ---------------------------
 kernel/signal.c           |   34 ++++++++++++++--------------------
 2 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3a2e66d..b073f3c 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -469,33 +469,6 @@ static inline int tracehook_get_signal(struct task_struct *task,
 }
 
 /**
- * tracehook_notify_jctl - report about job control stop/continue
- * @notify:		zero, %CLD_STOPPED or %CLD_CONTINUED
- * @why:		%CLD_STOPPED or %CLD_CONTINUED
- *
- * This is called when we might call do_notify_parent_cldstop().
- *
- * @notify is zero if we would not ordinarily send a %SIGCHLD,
- * or is the %CLD_STOPPED or %CLD_CONTINUED .si_code for %SIGCHLD.
- *
- * @why is %CLD_STOPPED when about to stop for job control;
- * we are already in %TASK_STOPPED state, about to call schedule().
- * It might also be that we have just exited (check %PF_EXITING),
- * but need to report that a group-wide stop is complete.
- *
- * @why is %CLD_CONTINUED when waking up after job control stop and
- * ready to make a delayed @notify report.
- *
- * Return the %CLD_* value for %SIGCHLD, or zero to generate no signal.
- *
- * Called with the siglock held.
- */
-static inline int tracehook_notify_jctl(int notify, int why)
-{
-	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
-}
-
-/**
  * tracehook_finish_jctl - report about return from job control stop
  *
  * This is called by do_signal_stop() after wakeup.
diff --git a/kernel/signal.c b/kernel/signal.c
index 0a6816a..7dc0ca2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1727,7 +1727,7 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int notify;
+	int notify = 0;
 
 	if (!sig->group_stop_count) {
 		struct task_struct *t;
@@ -1759,19 +1759,16 @@ static int do_signal_stop(int signr)
 	 * a group stop in progress and we are the last to stop, report
 	 * to the parent.  When ptraced, every thread reports itself.
 	 */
-	notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
-	notify = tracehook_notify_jctl(notify, CLD_STOPPED);
-	/*
-	 * tracehook_notify_jctl() can drop and reacquire siglock, so
-	 * we keep ->group_stop_count != 0 before the call. If SIGCONT
-	 * or SIGKILL comes in between ->group_stop_count == 0.
-	 */
-	if (sig->group_stop_count) {
-		if (!--sig->group_stop_count)
-			sig->flags = SIGNAL_STOP_STOPPED;
-		current->exit_code = sig->group_exit_code;
-		__set_current_state(TASK_STOPPED);
+	if (!--sig->group_stop_count) {
+		sig->flags = SIGNAL_STOP_STOPPED;
+		notify = CLD_STOPPED;
 	}
+	if (task_ptrace(current))
+		notify = CLD_STOPPED;
+
+	current->exit_code = sig->group_exit_code;
+	__set_current_state(TASK_STOPPED);
+
 	spin_unlock_irq(&current->sighand->siglock);
 
 	if (notify) {
@@ -1860,14 +1857,11 @@ relock:
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		why = tracehook_notify_jctl(why, CLD_CONTINUED);
 		spin_unlock_irq(&sighand->siglock);
 
-		if (why) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current->group_leader, why);
-			read_unlock(&tasklist_lock);
-		}
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current->group_leader, why);
+		read_unlock(&tasklist_lock);
 		goto relock;
 	}
 
@@ -2034,7 +2028,7 @@ void exit_signals(struct task_struct *tsk)
 	if (unlikely(tsk->signal->group_stop_count) &&
 			!--tsk->signal->group_stop_count) {
 		tsk->signal->flags = SIGNAL_STOP_STOPPED;
-		group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+		group_stop = CLD_STOPPED;
 	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] signal: fix SIGCONT notification code
  2010-12-22 11:13 ` [PATCH 1/4] signal: fix SIGCONT notification code Tejun Heo
@ 2011-01-12 20:41   ` Jan Kratochvil
  2011-01-13 11:27     ` Tejun Heo
  2011-01-13 14:40     ` Oleg Nesterov
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Kratochvil @ 2011-01-12 20:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, roland, linux-kernel, rjw

On Wed, 22 Dec 2010 12:13:15 +0100, Tejun Heo wrote:
> If SIGCONT is received while the child process is stopped, the code should
> be CLD_CONTINUED.  If SIGCONT is recieved while the child process is in the
> process of being stopped, it should be CLD_STOPPED.

If a process does
	kill (PID, SIGSTOP);
	<varying delay, possibly even from a different process>
	kill (PID, SIGCONT);

does it mean the PID's parent may get different waitid() results?
Or even that PID will finally remain still `T (stopped)'?

I do not see it has any userland impact, the
PTRACE_ATTACH-to-T(stopped)-process is already racy for different reasons.


Thanks,
Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] signal: fix SIGCONT notification code
  2011-01-12 20:41   ` Jan Kratochvil
@ 2011-01-13 11:27     ` Tejun Heo
  2011-01-13 14:40     ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2011-01-13 11:27 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: oleg, roland, linux-kernel, rjw

Hello, Jan.

On Wed, Jan 12, 2011 at 09:41:36PM +0100, Jan Kratochvil wrote:
> On Wed, 22 Dec 2010 12:13:15 +0100, Tejun Heo wrote:
> > If SIGCONT is received while the child process is stopped, the code should
> > be CLD_CONTINUED.  If SIGCONT is recieved while the child process is in the
> > process of being stopped, it should be CLD_STOPPED.
> 
> If a process does
> 	kill (PID, SIGSTOP);
> 	<varying delay, possibly even from a different process>
> 	kill (PID, SIGCONT);
> 
> does it mean the PID's parent may get different waitid() results?
> Or even that PID will finally remain still `T (stopped)'?

Yeah, the @why part could be different with the fix.  Before the fix
the result was indeterministic.

> I do not see it has any userland impact, the
> PTRACE_ATTACH-to-T(stopped)-process is already racy for different reasons.

I agree that it wouldn't have any userland impact especially given
that the previous behavior was rather indeterministic.  It just fixes
an obvious bug which tested the wrong flag.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] signal: fix SIGCONT notification code
  2011-01-12 20:41   ` Jan Kratochvil
  2011-01-13 11:27     ` Tejun Heo
@ 2011-01-13 14:40     ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2011-01-13 14:40 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: Tejun Heo, roland, linux-kernel, rjw

On 01/12, Jan Kratochvil wrote:
>
> On Wed, 22 Dec 2010 12:13:15 +0100, Tejun Heo wrote:
> > If SIGCONT is received while the child process is stopped, the code should
> > be CLD_CONTINUED.  If SIGCONT is recieved while the child process is in the
> > process of being stopped, it should be CLD_STOPPED.
>
> If a process does
> 	kill (PID, SIGSTOP);
> 	<varying delay, possibly even from a different process>
> 	kill (PID, SIGCONT);
>
> does it mean the PID's parent may get different waitid() results?

No. The only problem is that SIGCHLD can come with the wrong
info.si_code/si_status

> Or even that PID will finally remain still `T (stopped)'?

No.

> I do not see it has any userland impact,

Yes, the problem is minor.

Still, this is the clear bug due to
SIGNAL_CLD_CONTINUED/SIGNAL_STOP_CONTINUED typo. And damn, it was
introduced by me ;)

> the
> PTRACE_ATTACH-to-T(stopped)-process is already racy for different reasons.

Please note that this bug affects !ptrace case too.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop()
  2010-12-22 11:13 ` [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
@ 2011-01-17 22:10   ` Roland McGrath
  0 siblings, 0 replies; 14+ messages in thread
From: Roland McGrath @ 2011-01-17 22:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

Acked-by: Roland McGrath <roland@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2010-12-22 11:13 ` [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
@ 2011-01-17 22:13   ` Roland McGrath
  2011-01-27 13:34     ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Roland McGrath @ 2011-01-17 22:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

Of course I agree that the current code is wrong here.  But I'm still not
at all clear on what practical compatibility problems it introduces.  This
change is OK if and only if we are really making the only area clean and
well-defined in the same release cycle.  It doesn't really matter that the
old behavior was ill-defined and unreliable in absolute terms, because the
practical userland experience of it in real-world cases is what userland
has grown to expect.  We can't break any such expectations until we have a
clear answer about how to solve the problems correctly.

Thanks,
Roland

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] ptrace: kill tracehook_notify_jctl()
  2010-12-22 11:13 ` [PATCH 4/4] ptrace: kill tracehook_notify_jctl() Tejun Heo
@ 2011-01-17 22:14   ` Roland McGrath
  0 siblings, 0 replies; 14+ messages in thread
From: Roland McGrath @ 2011-01-17 22:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

This causes problems for our unmerged patch-sets, but I'll leave it Oleg to
decide if it's OK to remove this hook now.

Thanks,
Roland


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2011-01-17 22:13   ` Roland McGrath
@ 2011-01-27 13:34     ` Tejun Heo
  2011-01-28  7:02       ` Roland McGrath
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-01-27 13:34 UTC (permalink / raw)
  To: Roland McGrath; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

Hello,

On Mon, Jan 17, 2011 at 02:13:40PM -0800, Roland McGrath wrote:
> Of course I agree that the current code is wrong here.  But I'm still not
> at all clear on what practical compatibility problems it introduces.  This
> change is OK if and only if we are really making the only area clean and
> well-defined in the same release cycle.

I don't follow what you meant by the above sentence.  Can you please
clarify a bit?

> It doesn't really matter that the old behavior was ill-defined and
> unreliable in absolute terms, because the practical userland
> experience of it in real-world cases is what userland has grown to
> expect.  We can't break any such expectations until we have a clear
> answer about how to solve the problems correctly.

It's slightly more serious than that.  Calling wake_up_process()
unconditionally can cause a quite serious failure.  It wakes up a
process in any state and there are code paths which aren't prepared to
be woken up unless certain conditions are met.  A safer choice can be
changing it to wake up only tasks in interruptible sleep.

I have very difficult time imagining what real user visible
implication it would have in negative and noticeable manner, so while
I do agree that we should proceed carefully, I think it would be
better to take the chance and remove the erratic behavior.  We have
release cycles and testing at multiple layers after all.  If we find
out that it breaks something in serious manner, we can always revert.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2011-01-27 13:34     ` Tejun Heo
@ 2011-01-28  7:02       ` Roland McGrath
  2011-01-28 10:59         ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Roland McGrath @ 2011-01-28  7:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

> On Mon, Jan 17, 2011 at 02:13:40PM -0800, Roland McGrath wrote:
> > Of course I agree that the current code is wrong here.  But I'm still not
> > at all clear on what practical compatibility problems it introduces.  This
> > change is OK if and only if we are really making the only area clean and
> > well-defined in the same release cycle.
> 
> I don't follow what you meant by the above sentence.  Can you please
> clarify a bit?

I think I meant to write "the whole area" rather than "the only area".
The point I meant to make is that removing this (wrong) wake-up entirely
should only be done when we are also thoroughly cleaning up and verifying
everything about the behavior of stopping/resuming in ptrace.  When we get
to a clear statement of what all the rules are and we really believe that
the new code matches those rules, then fine.  When we are still in the
quagmire, no change is better than an incremental change whose subtle
ramifications we can't clearly articulate.

> It's slightly more serious than that.  Calling wake_up_process()
> unconditionally can cause a quite serious failure.  It wakes up a
> process in any state and there are code paths which aren't prepared to
> be woken up unless certain conditions are met.  

I understand that.  But I don't recall having seen anyone cite an actual
scenario in the real world where it does something problematic.

> A safer choice can be changing it to wake up only tasks in interruptible
> sleep.

For ptrace-related purposes, wake_up_state(child, TASK_TRACED|TASK_STOPPED)
ought to cover anything that it is doing now that anything could reasonably
be relying on.

> I have very difficult time imagining what real user visible
> implication it would have in negative and noticeable manner, so while
> I do agree that we should proceed carefully, I think it would be
> better to take the chance and remove the erratic behavior.  We have
> release cycles and testing at multiple layers after all.  If we find
> out that it breaks something in serious manner, we can always revert.

"Difficult time imagining" and ptrace subtleties unfortunately often go
together.  I want to be sure we are doing something right, rather than just
being thoroughly unsure that we aren't doing something wrong.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach()
  2011-01-28  7:02       ` Roland McGrath
@ 2011-01-28 10:59         ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2011-01-28 10:59 UTC (permalink / raw)
  To: Roland McGrath; +Cc: oleg, linux-kernel, rjw, jan.kratochvil

Hello, Roland.

On Thu, Jan 27, 2011 at 11:02:52PM -0800, Roland McGrath wrote:
> > On Mon, Jan 17, 2011 at 02:13:40PM -0800, Roland McGrath wrote:
> > > Of course I agree that the current code is wrong here.  But I'm still not
> > > at all clear on what practical compatibility problems it introduces.  This
> > > change is OK if and only if we are really making the only area clean and
> > > well-defined in the same release cycle.
> > 
> > I don't follow what you meant by the above sentence.  Can you please
> > clarify a bit?
> 
> I think I meant to write "the whole area" rather than "the only area".
> The point I meant to make is that removing this (wrong) wake-up entirely
> should only be done when we are also thoroughly cleaning up and verifying
> everything about the behavior of stopping/resuming in ptrace.  When we get
> to a clear statement of what all the rules are and we really believe that
> the new code matches those rules, then fine.  When we are still in the
> quagmire, no change is better than an incremental change whose subtle
> ramifications we can't clearly articulate.

Hmm... I'm not sure I quite agree.  We sure should proceed with the
big picture in mind but procedding gradually is usually the better and
sometimes the only way to get there.  Unconditional wake up doesn't
belong anywhere no matter which direction we end up taking.  It's a
natural step to take.

> > It's slightly more serious than that.  Calling wake_up_process()
> > unconditionally can cause a quite serious failure.  It wakes up a
> > process in any state and there are code paths which aren't prepared to
> > be woken up unless certain conditions are met.  
> 
> I understand that.  But I don't recall having seen anyone cite an actual
> scenario in the real world where it does something problematic.

I don't think it will be easily seen during regular use.  Debugging is
often pretty slow paced and tasks are usually stopped in well known
places.  But it does make a pretty decent attack vector for malicious
users.  I don't think it will take too much effort to come up with a
proof-of-concept DOS attack exploiting this.  It's plain dangerous to
do this no matter how unlikely regular users might hit it.

> > A safer choice can be changing it to wake up only tasks in interruptible
> > sleep.
> 
> For ptrace-related purposes, wake_up_state(child, TASK_TRACED|TASK_STOPPED)
> ought to cover anything that it is doing now that anything could reasonably
> be relying on.

That probably is something we need to do for -stable, I agree.

> > I have very difficult time imagining what real user visible
> > implication it would have in negative and noticeable manner, so while
> > I do agree that we should proceed carefully, I think it would be
> > better to take the chance and remove the erratic behavior.  We have
> > release cycles and testing at multiple layers after all.  If we find
> > out that it breaks something in serious manner, we can always revert.
> 
> "Difficult time imagining" and ptrace subtleties unfortunately often go
> together.

But that's the only way forward.  Ptrace sure is complicated, some
part due to its inherent nature but more because things were
continuously piled on top of it without proper clean up, but other
parts of kernel are complicated too and we make changes all the time.
We have the whole release cycle structure and testing hierarchies
throughout the ecosystem to enable changes like this.

> I want to be sure we are doing something right, rather than just
> being thoroughly unsure that we aren't doing something wrong.

But this is something which is apparently wrong and highly unlikely to
cause any problem especially because with the patchset an attached
task is always in traced state on detach and thus always gets
signal_wake_up().  The only difference is removal of an extra spurious
unconditional wakeup which may or may not disturb the group stop
depending on timing.  If we can't change this, I don't know what we
can.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-01-28 11:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-22 11:13 [PATCHSET] ptrace,signal: cleanups and fixes Tejun Heo
2010-12-22 11:13 ` [PATCH 1/4] signal: fix SIGCONT notification code Tejun Heo
2011-01-12 20:41   ` Jan Kratochvil
2011-01-13 11:27     ` Tejun Heo
2011-01-13 14:40     ` Oleg Nesterov
2010-12-22 11:13 ` [PATCH 2/4] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
2011-01-17 22:13   ` Roland McGrath
2011-01-27 13:34     ` Tejun Heo
2011-01-28  7:02       ` Roland McGrath
2011-01-28 10:59         ` Tejun Heo
2010-12-22 11:13 ` [PATCH 3/4] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
2011-01-17 22:10   ` Roland McGrath
2010-12-22 11:13 ` [PATCH 4/4] ptrace: kill tracehook_notify_jctl() Tejun Heo
2011-01-17 22:14   ` Roland McGrath

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.