All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition
@ 2014-02-20 17:38 Oleg Nesterov
  2014-02-20 17:38 ` [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race Oleg Nesterov
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

Hello.

1/7 cc's stable, and imo it is v3.14 material. This hack is reverted
by the next patch.

Many thanks to Michal and Jan for investigating.


And it seems that we need more (unrelated) changes in do_wait(), will
try to do this tomorrow.

Tejun, unless I missed something WSTOPPED logic is broken if a process
has a zombie/ptraced leader, "A zombie ptracee is only visible to its
ptracer" is wrong in this case. Plus perhaps some cleanups make sense.

Oleg.

 fs/proc/array.c       |    4 +-
 include/linux/sched.h |    5 ++-
 kernel/exit.c         |   51 +++++++++++++++++++++++-------------------------
 3 files changed, 29 insertions(+), 31 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
@ 2014-02-20 17:38 ` Oleg Nesterov
  2014-02-20 17:39 ` [PATCH 2/5] wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal
if EXIT_DEAD, assuming that this task must be reaped by someone else.
So the tracee can be re-parented with ->exit_signal != SIGCHLD, and
if /sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try
to solve all problems, it will be reverted by the next changes.

Cc: stable@vger.kernel.org
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-and-tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 1e77fc6..5281522 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -560,9 +560,6 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 				struct list_head *dead)
 {
 	list_move_tail(&p->sibling, &p->real_parent->children);
-
-	if (p->exit_state == EXIT_DEAD)
-		return;
 	/*
 	 * If this is a threaded reparent there is no need to
 	 * notify anyone anything has happened.
@@ -570,9 +567,19 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 	if (same_thread_group(p->real_parent, father))
 		return;
 
-	/* We don't want people slaying init.  */
+	/*
+	 * We don't want people slaying init.
+	 *
+	 * Note: we do this even if it is EXIT_DEAD, wait_task_zombie()
+	 * can change ->exit_state to EXIT_ZOMBIE. If this is the final
+	 * state, do_notify_parent() was already called and ->exit_signal
+	 * doesn't matter.
+	 */
 	p->exit_signal = SIGCHLD;
 
+	if (p->exit_state == EXIT_DEAD)
+		return;
+
 	/* If it has exited notify the new parent about this child's death. */
 	if (!p->ptrace &&
 	    p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
  2014-02-20 17:38 ` [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race Oleg Nesterov
@ 2014-02-20 17:39 ` Oleg Nesterov
  2014-02-20 17:39 ` [PATCH 3/5] wait: use EXIT_TRACE only if thread_group_leader(zombie) Oleg Nesterov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal
if EXIT_DEAD, assuming that this task must be reaped by someone else.
So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable. This was fixed by
the previous commit, but it was the temporary hack.

1. Add the new exit_state, EXIT_TRACE. It means that the task is the
   traced zombie, debugger is going to detach and notify its natural
   parent.

   This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
   can avoid the changes in proc/kgdb code, get_task_state() still
   reports "X (dead)" in this case.

   Note: with or without this change userspace can see Z -> X -> Z
   transition. Not really bad, but probably makes sense to fix.

2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
   if we need to notify the ->real_parent.

3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
   is always the final state we can safely ignore such a task.

4. Change wait_consider_task() to check EXIT_TRACE separately and kill
   the racy and no longer needed ptrace_reparented() case.

   If ptrace == T an EXIT_TRACE thread should be simply ignored, the
   owner of this state is going to ptrace_unlink() this task. We can
   pretend that it was already removed from ->ptraced list.

   Otherwise we should skip this thread too but clear ->notask_error,
   we must be the natural parent and debugger is going to untrace and
   notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
   even if the task was already untraced.

Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-and-tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |    1 +
 kernel/exit.c         |   50 ++++++++++++++++++++----------------------------
 2 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a781dec..b3593ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -206,6 +206,7 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
 /* in tsk->exit_state */
 #define EXIT_ZOMBIE		16
 #define EXIT_DEAD		32
+#define EXIT_TRACE		(EXIT_ZOMBIE | EXIT_DEAD)
 /* in tsk->state again */
 #define TASK_DEAD		64
 #define TASK_WAKEKILL		128
diff --git a/kernel/exit.c b/kernel/exit.c
index 5281522..c702824 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -560,6 +560,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 				struct list_head *dead)
 {
 	list_move_tail(&p->sibling, &p->real_parent->children);
+
+	if (p->exit_state == EXIT_DEAD)
+		return;
 	/*
 	 * If this is a threaded reparent there is no need to
 	 * notify anyone anything has happened.
@@ -567,19 +570,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p,
 	if (same_thread_group(p->real_parent, father))
 		return;
 
-	/*
-	 * We don't want people slaying init.
-	 *
-	 * Note: we do this even if it is EXIT_DEAD, wait_task_zombie()
-	 * can change ->exit_state to EXIT_ZOMBIE. If this is the final
-	 * state, do_notify_parent() was already called and ->exit_signal
-	 * doesn't matter.
-	 */
+	/* We don't want people slaying init. */
 	p->exit_signal = SIGCHLD;
 
-	if (p->exit_state == EXIT_DEAD)
-		return;
-
 	/* If it has exited notify the new parent about this child's death. */
 	if (!p->ptrace &&
 	    p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) {
@@ -1045,17 +1038,13 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 		return wait_noreap_copyout(wo, p, pid, uid, why, status);
 	}
 
+	traced = ptrace_reparented(p);
 	/*
-	 * Try to move the task's state to DEAD
-	 * only one thread is allowed to do this:
+	 * Move the task's state to DEAD/TRACE, only one thread can do this.
 	 */
-	state = xchg(&p->exit_state, EXIT_DEAD);
-	if (state != EXIT_ZOMBIE) {
-		BUG_ON(state != EXIT_DEAD);
+	state = traced ? EXIT_TRACE : EXIT_DEAD;
+	if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE)
 		return 0;
-	}
-
-	traced = ptrace_reparented(p);
 	/*
 	 * It can be ptraced but not reparented, check
 	 * thread_group_leader() to filter out sub-threads.
@@ -1116,7 +1105,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 
 	/*
 	 * Now we are sure this task is interesting, and no other
-	 * thread can reap it because we set its state to EXIT_DEAD.
+	 * thread can reap it because we its state == DEAD/TRACE.
 	 */
 	read_unlock(&tasklist_lock);
 
@@ -1161,14 +1150,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 		 * If this is not a sub-thread, notify the parent.
 		 * If parent wants a zombie, don't release it now.
 		 */
+		state = EXIT_DEAD;
 		if (thread_group_leader(p) &&
-		    !do_notify_parent(p, p->exit_signal)) {
-			p->exit_state = EXIT_ZOMBIE;
-			p = NULL;
-		}
+		    !do_notify_parent(p, p->exit_signal))
+			state = EXIT_ZOMBIE;
+		p->exit_state = state;
 		write_unlock_irq(&tasklist_lock);
 	}
-	if (p != NULL)
+	if (state == EXIT_DEAD)
 		release_task(p);
 
 	return retval;
@@ -1364,12 +1353,15 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 	}
 
 	/* dead body doesn't have much to contribute */
-	if (unlikely(p->exit_state == EXIT_DEAD)) {
+	if (unlikely(p->exit_state == EXIT_DEAD))
+		return 0;
+
+	if (unlikely(p->exit_state == EXIT_TRACE)) {
 		/*
-		 * But do not ignore this task until the tracer does
-		 * wait_task_zombie()->do_notify_parent().
+		 * ptrace == 0 means we are the natural parent. In this case
+		 * we should clear notask_error, debugger will notify us.
 		 */
-		if (likely(!ptrace) && unlikely(ptrace_reparented(p)))
+		if (likely(!ptrace))
 			wo->notask_error = 0;
 		return 0;
 	}
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] wait: use EXIT_TRACE only if thread_group_leader(zombie)
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
  2014-02-20 17:38 ` [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race Oleg Nesterov
  2014-02-20 17:39 ` [PATCH 2/5] wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
@ 2014-02-20 17:39 ` Oleg Nesterov
  2014-02-20 17:39 ` [PATCH 4/5] wait: completely ignore the EXIT_DEAD tasks Oleg Nesterov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
ptrace_reparented(). This is suboptimal and a bit confusing:
we do not need do_notify_parent(p) if !thread_group_leader(p)
and in this case we also do not need ptrace_unlink(), we can
rely on ptrace_release_task().

Change wait_task_zombie() to check thread_group_leader() along
with ptrace_reparented() and simplify the final p->exit_state
transition.

Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c702824..aaad08d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1042,7 +1042,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 	/*
 	 * Move the task's state to DEAD/TRACE, only one thread can do this.
 	 */
-	state = traced ? EXIT_TRACE : EXIT_DEAD;
+	state = traced && thread_group_leader(p) ? EXIT_TRACE : EXIT_DEAD;
 	if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE)
 		return 0;
 	/*
@@ -1142,18 +1142,15 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 	if (!retval)
 		retval = pid;
 
-	if (traced) {
+	if (state == EXIT_TRACE) {
 		write_lock_irq(&tasklist_lock);
 		/* We dropped tasklist, ptracer could die and untrace */
 		ptrace_unlink(p);
-		/*
-		 * If this is not a sub-thread, notify the parent.
-		 * If parent wants a zombie, don't release it now.
-		 */
-		state = EXIT_DEAD;
-		if (thread_group_leader(p) &&
-		    !do_notify_parent(p, p->exit_signal))
-			state = EXIT_ZOMBIE;
+
+		/* If parent wants a zombie, don't release it now */
+		state = EXIT_ZOMBIE;
+		if (do_notify_parent(p, p->exit_signal))
+			state = EXIT_DEAD;
 		p->exit_state = state;
 		write_unlock_irq(&tasklist_lock);
 	}
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] wait: completely ignore the EXIT_DEAD tasks
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
                   ` (2 preceding siblings ...)
  2014-02-20 17:39 ` [PATCH 3/5] wait: use EXIT_TRACE only if thread_group_leader(zombie) Oleg Nesterov
@ 2014-02-20 17:39 ` Oleg Nesterov
  2014-02-20 17:39 ` [PATCH 5/5] wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space Oleg Nesterov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

Now that EXIT_DEAD is the terminal state it doesn't make sense
to call eligible_child() or security_task_wait() if the task is
really dead.

Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index aaad08d..3d6f247 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1331,7 +1331,12 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
 static int wait_consider_task(struct wait_opts *wo, int ptrace,
 				struct task_struct *p)
 {
-	int ret = eligible_child(wo, p);
+	int ret;
+
+	if (unlikely(p->exit_state == EXIT_DEAD))
+		return 0;
+
+	ret = eligible_child(wo, p);
 	if (!ret)
 		return ret;
 
@@ -1349,10 +1354,6 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 		return 0;
 	}
 
-	/* dead body doesn't have much to contribute */
-	if (unlikely(p->exit_state == EXIT_DEAD))
-		return 0;
-
 	if (unlikely(p->exit_state == EXIT_TRACE)) {
 		/*
 		 * ptrace == 0 means we are the natural parent. In this case
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
                   ` (3 preceding siblings ...)
  2014-02-20 17:39 ` [PATCH 4/5] wait: completely ignore the EXIT_DEAD tasks Oleg Nesterov
@ 2014-02-20 17:39 ` Oleg Nesterov
  2014-02-20 19:48 ` [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Tejun Heo
  2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
  6 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-20 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

get_task_state() uses the most significant bit to report the state
to user-space, this means that EXIT_ZOMBIE->EXIT_TRACE->EXIT_DEAD
transition can be noticed via /proc as Z -> X -> Z change. Note
that this was possible even before EXIT_TRACE was introduced.

This is not really bad but imho it make sense to hide EXIT_TRACE
from user-space completely. So the patch simply swaps EXIT_ZOMBIE
and EXIT_DEAD, this way EXIT_TRACE will be seen as EXIT_ZOMBIE by
user-space.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/array.c       |    4 ++--
 include/linux/sched.h |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 656e401..64db2bc 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -138,8 +138,8 @@ static const char * const task_state_array[] = {
 	"D (disk sleep)",	/*   2 */
 	"T (stopped)",		/*   4 */
 	"t (tracing stop)",	/*   8 */
-	"Z (zombie)",		/*  16 */
-	"X (dead)",		/*  32 */
+	"X (dead)",		/*  16 */
+	"Z (zombie)",		/*  32 */
 };
 
 static inline const char *get_task_state(struct task_struct *tsk)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3593ac..53f9136 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -204,8 +204,8 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
 #define __TASK_STOPPED		4
 #define __TASK_TRACED		8
 /* in tsk->exit_state */
-#define EXIT_ZOMBIE		16
-#define EXIT_DEAD		32
+#define EXIT_DEAD		16
+#define EXIT_ZOMBIE		32
 #define EXIT_TRACE		(EXIT_ZOMBIE | EXIT_DEAD)
 /* in tsk->state again */
 #define TASK_DEAD		64
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
                   ` (4 preceding siblings ...)
  2014-02-20 17:39 ` [PATCH 5/5] wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space Oleg Nesterov
@ 2014-02-20 19:48 ` Tejun Heo
  2014-02-24 15:51   ` Oleg Nesterov
  2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
  6 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2014-02-20 19:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Al Viro, Jan Kratochvil, Lennart Poettering,
	Linus Torvalds, Michal Schmidt, Roland McGrath, linux-kernel

On Thu, Feb 20, 2014 at 06:38:38PM +0100, Oleg Nesterov wrote:
> Tejun, unless I missed something WSTOPPED logic is broken if a process
> has a zombie/ptraced leader, "A zombie ptracee is only visible to its
> ptracer" is wrong in this case. Plus perhaps some cleanups make sense.

Heh, I already forgot most details about ptrace. :) Is it something
urgent?  If so, I'll try to get back to it right away; otherwise, I'll
probably be able to get around to it in several weeks.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition
  2014-02-20 19:48 ` [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Tejun Heo
@ 2014-02-24 15:51   ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-24 15:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Al Viro, Jan Kratochvil, Lennart Poettering,
	Linus Torvalds, Michal Schmidt, Roland McGrath, linux-kernel

On 02/20, Tejun Heo wrote:
>
> On Thu, Feb 20, 2014 at 06:38:38PM +0100, Oleg Nesterov wrote:
> > Tejun, unless I missed something WSTOPPED logic is broken if a process
> > has a zombie/ptraced leader, "A zombie ptracee is only visible to its
> > ptracer" is wrong in this case. Plus perhaps some cleanups make sense.
>
> Heh, I already forgot most details about ptrace. :)

This is mostly about do_wait(), although yes, !ptrace case looks fine.
And it turns out, I forgot these details too.

> Is it something
> urgent?

No, please forget. But it seems that wait_consider_task(zombie_leader)
is very wrong wrt zombie leaders. WSTOPPED is simple, but I am still
trying to find a simple fix for other problems. And the main problem,
of course, is that this API is broken and we can't fix it.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 0/2] wait: WSTOPPED & ptrace fixes
  2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
                   ` (5 preceding siblings ...)
  2014-02-20 19:48 ` [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Tejun Heo
@ 2014-02-26 16:55 ` Oleg Nesterov
  2014-02-26 16:55   ` [PATCH 1/2] wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent Oleg Nesterov
  2014-02-26 16:56   ` [PATCH 2/2] wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process Oleg Nesterov
  6 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-26 16:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

On 02/20, Oleg Nesterov wrote:
>
> Tejun, unless I missed something WSTOPPED logic is broken if a process
> has a zombie/ptraced leader, "A zombie ptracee is only visible to its
> ptracer" is wrong in this case. Plus perhaps some cleanups make sense.

In fact it is broken even if it the child/tracee is single-threaded.

And I failed to find a simple fix which is not visible to user-space.
It would be nice if you can ack/nack at least the changelog in 1/2,
I mean the note about SIGNAL_STOP_CONTINUED.

And unless I missed something WEXITED is broken too, so I'll probably
send more changes.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent
  2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
@ 2014-02-26 16:55   ` Oleg Nesterov
  2014-02-26 16:56   ` [PATCH 2/2] wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-26 16:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

"A zombie is only visible to its ptracer" logic in wait_consider_task()
is very wrong. Trivial test-case:

	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <assert.h>

	int main(void)
	{
		int child = fork();

		if (!child) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			return 0x23;
		}

		assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0);
		assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1);
		return 0;
	}

it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
child. This is because wait_consider_task(ptrace => 0) sees p->ptrace
and cleares ->notask_error assuming that the debugger should detach and
notify us.

Change wait_consider_task(ptrace => 0) to pretend that ptrace == T if
the child is traced by us. This really simplifies the logic and allows
us to do more fixes, see the next changes. This also hides the unwanted
group stop state automatically, we can remove another ptrace_reparented()
check.

Unfortunately, this adds the following behavioural changes:

	1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
	   a natural child if it is traced by the caller's sub-thread.

	   Hopefully nobody will ever notice this change, and I think
	   that nobody should rely on this behaviour anyway.

	2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
	   it is real parent.

	   While this change comes as a side effect, I think it is good
	   by itself. The group continued state can not be consumed by
	   another process in this case, it doesn't depend on ptrace,
	   it doesn't make sense to hide it from real parent.

	   Perhaps we should add the thread_group_leader() check before
	   wait_task_continued()? May be, but this shouldn't depend on
	   ptrace_reparented().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   29 ++++++++++++++++-------------
 1 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3d6f247..56c5ac3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1364,6 +1364,22 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 		return 0;
 	}
 
+	if (likely(!ptrace) && unlikely(p->ptrace)) {
+		/*
+		 * If it is traced by its real parent's group, just pretend
+		 * the caller is ptrace_do_wait() and reap this child if it
+		 * is zombie.
+		 *
+		 * This also hides group stop state from real parent; otherwise
+		 * a single stop can be reported twice as group and ptrace stop.
+		 * If a ptracer wants to distinguish these two events for its
+		 * own children it should create a separate process which takes
+		 * the role of real parent.
+		 */
+		if (!ptrace_reparented(p))
+			ptrace = 1;
+	}
+
 	/* slay zombie? */
 	if (p->exit_state == EXIT_ZOMBIE) {
 		/*
@@ -1405,19 +1421,6 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 			wo->notask_error = 0;
 	} else {
 		/*
-		 * If @p is ptraced by a task in its real parent's group,
-		 * hide group stop/continued state when looking at @p as
-		 * the real parent; otherwise, a single stop can be
-		 * reported twice as group and ptrace stops.
-		 *
-		 * If a ptracer wants to distinguish the two events for its
-		 * own children, it should create a separate process which
-		 * takes the role of real parent.
-		 */
-		if (likely(!ptrace) && p->ptrace && !ptrace_reparented(p))
-			return 0;
-
-		/*
 		 * @p is alive and it's gonna stop, continue or exit, so
 		 * there always is something to wait for.
 		 */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process
  2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
  2014-02-26 16:55   ` [PATCH 1/2] wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent Oleg Nesterov
@ 2014-02-26 16:56   ` Oleg Nesterov
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2014-02-26 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kratochvil, Lennart Poettering, Linus Torvalds,
	Michal Schmidt, Roland McGrath, Tejun Heo, linux-kernel

Even if the main thread is dead the process still can stop/continue.
However, if the leader is ptraced wait_consider_task(ptrace => false)
always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
WCONTINUED can never work for the natural parent in this case.

Move the "A zombie ptracee is only visible to its ptracer" check into
the "if (!delay_group_leader(p))" block. ->notask_error is cleared by
the "fall through" code below.

This depends on the previous change, wait_task_stopped/continued must
be avoided if !delay_group_leader() and the tracer is ->real_parent.
Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
is already dead (single-threaded or not). If it is traced by another
task then the "stopped" state is fine until the debugger detaches and
reveals a zombie state.

Stupid test-case:

	void *tfunc(void *arg)
	{
		sleep(1);	// wait for zombie leader
		raise(SIGSTOP);
		exit(0x13);
		return NULL;
	}

	int run_child(void)
	{
		pthread_t thread;

		if (!fork()) {
			int tracee = getppid();

			assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
			do
				ptrace(PTRACE_CONT, tracee, 0,0);
			while (wait(NULL) > 0);

			return 0;
		}

		sleep(1);	// wait for PTRACE_ATTACH
		assert(pthread_create(&thread, NULL, tfunc, NULL) == 0);
		pthread_exit(NULL);
	}

	int main(void)
	{
		int child, stat;

		child = fork();
		if (!child)
			return run_child();

		assert(child == waitpid(-1, &stat, WSTOPPED));
		assert(stat == 0x137f);

		kill(child, SIGCONT);

		assert(child == waitpid(-1, &stat, WCONTINUED));
		assert(stat == 0xffff);

		assert(child == waitpid(-1, &stat, 0));
		assert(stat == 0x1300);

		return 0;
	}

Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped()
is never called.

Note: this doesn't fix all problems with a zombie delay_group_leader(),
WCONTINUED | WEXITED check is not exactly right. debugger can't assume
it will be notified if another thread reaps the whole thread group.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |   22 +++++++++-------------
 1 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 56c5ac3..790b73c 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1382,20 +1382,16 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
 
 	/* slay zombie? */
 	if (p->exit_state == EXIT_ZOMBIE) {
-		/*
-		 * A zombie ptracee is only visible to its ptracer.
-		 * Notification and reaping will be cascaded to the real
-		 * parent when the ptracer detaches.
-		 */
-		if (likely(!ptrace) && unlikely(p->ptrace)) {
-			/* it will become visible, clear notask_error */
-			wo->notask_error = 0;
-			return 0;
-		}
-
 		/* we don't reap group leaders with subthreads */
-		if (!delay_group_leader(p))
-			return wait_task_zombie(wo, p);
+		if (!delay_group_leader(p)) {
+			/*
+			 * A zombie ptracee is only visible to its ptracer.
+			 * Notification and reaping will be cascaded to the
+			 * real parent when the ptracer detaches.
+			 */
+			if (unlikely(ptrace) || likely(!p->ptrace))
+				return wait_task_zombie(wo, p);
+		}
 
 		/*
 		 * Allow access to stopped/continued state via zombie by
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-02-26 16:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-20 17:38 [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
2014-02-20 17:38 ` [PATCH 1/5] wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race Oleg Nesterov
2014-02-20 17:39 ` [PATCH 2/5] wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition Oleg Nesterov
2014-02-20 17:39 ` [PATCH 3/5] wait: use EXIT_TRACE only if thread_group_leader(zombie) Oleg Nesterov
2014-02-20 17:39 ` [PATCH 4/5] wait: completely ignore the EXIT_DEAD tasks Oleg Nesterov
2014-02-20 17:39 ` [PATCH 5/5] wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space Oleg Nesterov
2014-02-20 19:48 ` [PATCH 0/5] kill the racy EXIT_ZOMBIE->EXIT_DEAD->EXIT_ZOMBIE transition Tejun Heo
2014-02-24 15:51   ` Oleg Nesterov
2014-02-26 16:55 ` [PATCH 0/2] wait: WSTOPPED & ptrace fixes Oleg Nesterov
2014-02-26 16:55   ` [PATCH 1/2] wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent Oleg Nesterov
2014-02-26 16:56   ` [PATCH 2/2] wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.