All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] ptrace: fix race between ptrace_resume() and wait_task_stopped()
@ 2015-03-24 18:54 Oleg Nesterov
  2015-03-24 18:54 ` [PATCH 1/2] " Oleg Nesterov
  2015-03-24 18:54 ` [PATCH 2/2] ptrace: ptrace_detach() can no longer race with SIGKILL Oleg Nesterov
  0 siblings, 2 replies; 3+ messages in thread
From: Oleg Nesterov @ 2015-03-24 18:54 UTC (permalink / raw)
  To: Andrew Morton, Pavel Labath
  Cc: Josh Stone, Pedro Alves, Vince Harron, linux-kernel

On 03/24, Pavel Labath wrote:
>
> I have tested your patch and I can confirm that the error is gone when
> running a patched kernel.

Great! Thanks a lot for the detailed/clear report and testing.

So I am sending this fix + another patch. 2/2 is "while at it" change,
just because ptrace_detach() can resume the tracee with the new code
too, so it makes sense to add a comment and remove the outdated logic.

> I am still seeing one very rare failure where the SIGUSR does not
> appear to be reported. However, I will need to dig around this a bit
> more to make sure there is no error on our end.

Hmm, perhaps we have (yet) another bug... please let me know if/when
you have more details.

> Now I am thinking about how to work around these bugs, as our code
> will need to run on unpatched kernels as well. As for this
> ptrace/waitpid race, I think I will just refactor the code to make
> wait and ptrace calls on the same thread. This should sidestep the
> race, right?

Yes sure, this will hide the problem.

> Regarding your bug, I am not exactly sure what are the implications.
> Could you briefly describe the situations in which this behavior can
> occur? Am I correct in understanding that this is always a race
> between a SIGKILL and another non-lethal signal? And that the SIGKILL
> will be (eventually) reported?

No, SIGKILL can be never reported. But note that ptrace_stop() does

	set_current_state(TASK_TRACED);

	current->last_siginfo = info;
	current->exit_code = exit_code;

and this is another case when wait_task_stopped() can consume/report this
exit_code even if the tracee won't actually stop because it is killed.

Usually this is not that bad, we can pretend that it was killed after stop.
Still this can confuse the debugger which sends SIGKILL to the stopped tracee.
We need more fatal_signal_pending() checks in ptrace_stop(). And in fact
in get_signal(), I think. The problem is that we need other cleanups here.
But fortunately this problem is minor.

Oleg.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] ptrace: fix race between ptrace_resume() and wait_task_stopped()
  2015-03-24 18:54 [PATCH 0/2] ptrace: fix race between ptrace_resume() and wait_task_stopped() Oleg Nesterov
@ 2015-03-24 18:54 ` Oleg Nesterov
  2015-03-24 18:54 ` [PATCH 2/2] ptrace: ptrace_detach() can no longer race with SIGKILL Oleg Nesterov
  1 sibling, 0 replies; 3+ messages in thread
From: Oleg Nesterov @ 2015-03-24 18:54 UTC (permalink / raw)
  To: Andrew Morton, Pavel Labath
  Cc: Josh Stone, Pedro Alves, Vince Harron, linux-kernel

ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <sys/ptrace.h>
	#include <pthread.h>
	#include <assert.h>

	int pid;

	void *waiter(void *arg)
	{
		int stat;

		for (;;) {
			assert(pid == wait(&stat));
			assert(WIFSTOPPED(stat));
			if (WSTOPSIG(stat) == SIGHUP)
				continue;

			assert(WSTOPSIG(stat) == SIGCONT);
			printf("ERR! extra/wrong report:%x\n", stat);
		}
	}

	int main(void)
	{
		pthread_t thread;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			for (;;)
				kill(getpid(), SIGHUP);
		}

		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

		for (;;)
			ptrace(PTRACE_CONT, pid, 0, SIGCONT);

		return 0;
	}

Note for stable: the bug is very old, but without 9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Cc: <stable@vger.kernel.org>
---
 kernel/ptrace.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1eb9d90..5009263 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -697,6 +697,8 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
+	bool need_siglock;
+
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -724,8 +726,26 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
+	/*
+	 * Change ->exit_code and ->state under siglock to avoid the race
+	 * with wait_task_stopped() in between; a non-zero ->exit_code will
+	 * wrongly look like another report from tracee.
+	 *
+	 * Note that we need siglock even if ->exit_code == data and/or this
+	 * status was not reported yet, the new status must not be cleared by
+	 * wait_task_stopped() after resume.
+	 *
+	 * If data == 0 we do not care if wait_task_stopped() reports the old
+	 * status and clears the code too; this can't race with the tracee, it
+	 * takes siglock after resume.
+	 */
+	need_siglock = data && !thread_group_empty(current);
+	if (need_siglock)
+		spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
+	if (need_siglock)
+		spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] ptrace: ptrace_detach() can no longer race with SIGKILL
  2015-03-24 18:54 [PATCH 0/2] ptrace: fix race between ptrace_resume() and wait_task_stopped() Oleg Nesterov
  2015-03-24 18:54 ` [PATCH 1/2] " Oleg Nesterov
@ 2015-03-24 18:54 ` Oleg Nesterov
  1 sibling, 0 replies; 3+ messages in thread
From: Oleg Nesterov @ 2015-03-24 18:54 UTC (permalink / raw)
  To: Andrew Morton, Pavel Labath
  Cc: Josh Stone, Pedro Alves, Vince Harron, linux-kernel

ptrace_detach() re-checks ->ptrace under tasklist lock and calls
release_task() if __ptrace_detach() returns true. This was needed
because the __TASK_TRACED tracee could be killed/untraced, and it
could even pass exit_notify() before we take tasklist_lock.

But this is no longer possible after 9899d11f6544 "ptrace: ensure
arch_ptrace/ptrace_request can never race with SIGKILL". We can turn
these checks into WARN_ON() and remove release_task().

While at it, document the setting of child->exit_code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/ptrace.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 5009263..23be9dd 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -456,8 +456,6 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
 
 static int ptrace_detach(struct task_struct *child, unsigned int data)
 {
-	bool dead = false;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -467,18 +465,19 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 
 	write_lock_irq(&tasklist_lock);
 	/*
-	 * This child can be already killed. Make sure de_thread() or
-	 * our sub-thread doing do_wait() didn't do release_task() yet.
+	 * We rely on ptrace_freeze_traced(). It can't be killed and
+	 * untraced by another thread, it can't be a zombie.
 	 */
-	if (child->ptrace) {
-		child->exit_code = data;
-		dead = __ptrace_detach(current, child);
-	}
+	WARN_ON(!child->ptrace || child->exit_state);
+	/*
+	 * tasklist_lock avoids the race with wait_task_stopped(), see
+	 * the comment in ptrace_resume().
+	 */
+	child->exit_code = data;
+	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
 	proc_ptrace_connector(child, PTRACE_DETACH);
-	if (unlikely(dead))
-		release_task(child);
 
 	return 0;
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-03-24 18:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-24 18:54 [PATCH 0/2] ptrace: fix race between ptrace_resume() and wait_task_stopped() Oleg Nesterov
2015-03-24 18:54 ` [PATCH 1/2] " Oleg Nesterov
2015-03-24 18:54 ` [PATCH 2/2] ptrace: ptrace_detach() can no longer race with SIGKILL Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.