All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] pidfd fixes
@ 2019-07-22 14:22 Christian Brauner
  2019-07-22 16:27 ` Linus Torvalds
  2019-07-22 16:40 ` pr-tracker-bot
  0 siblings, 2 replies; 8+ messages in thread
From: Christian Brauner @ 2019-07-22 14:22 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, oleg, surenb, joel

Hi Linus,

This contains a fix for pidfd polling. It ensures that the task's exit
state is visible to all waiters:

The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b:

  Linus 5.3-rc1 (2019-07-21 14:05:38 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190722

for you to fetch changes up to b191d6491be67cef2b3fa83015561caca1394ab9:

  pidfd: fix a poll race when setting exit_state (2019-07-22 16:02:03 +0200)

/* Summary */
The pidfd polling code suffered from a race condition. A waiter could be
notified via do_notify_pidfd() without the task's exit state being set and
thus not visible to the waiter. This would cause the waiter to be blocked
indefinitely. The following schematic illustrates how this could happen:

    CPU 0                            CPU 1
    ------------------------------------------------
    exit_notify
      do_notify_parent
        do_notify_pidfd
                                       pidfd_poll
                                          if (tsk->exit_state)
      tsk->exit_state = EXIT_DEAD

This is fixed by ensuring that the task's exit state is set before calling
into do_notify_pidfd(). 

Please consider pulling from the signed for-linus-20190722 tag.

Thanks!
Christian

----------------------------------------------------------------
for-linus-20190722

----------------------------------------------------------------
Suren Baghdasaryan (1):
      pidfd: fix a poll race when setting exit_state

 kernel/exit.c | 1 +
 1 file changed, 1 insertion(+)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-22 14:22 [GIT PULL] pidfd fixes Christian Brauner
@ 2019-07-22 16:27 ` Linus Torvalds
  2019-07-22 16:39   ` Christian Brauner
  2019-07-23 10:12   ` Oleg Nesterov
  2019-07-22 16:40 ` pr-tracker-bot
  1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2019-07-22 16:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linux List Kernel Mailing, Oleg Nesterov, Suren Baghdasaryan,
	Joel Fernandes

On Mon, Jul 22, 2019 at 7:26 AM Christian Brauner <christian@brauner.io> wrote:
>
> This contains a fix for pidfd polling. It ensures that the task's exit
> state is visible to all waiters:

Hmm.

I've pulled this, but the exit_state thing has been very fragile
before, and I'm not entirely happy with how this just changes where it
is set. I guess the movement here is all inside the tasklist_lock, so
it's not that big of a deal, but still..

I would *really* like Oleg to take a look.

Also, and the primary reason I write this email is that this basically
makes the "EXIT_ZOMBIE / EXIT_DEAD" state handling look all kinds of
crazy. You set it to EXIT_ZOMBIE potentially _twice_. Whaa?

So if we set EXIT_ZOMBIE early, then I think we should change the
EXIT_DEAD case too. IOW, do something like this on top:

  --- a/kernel/exit.c
  +++ b/kernel/exit.c
  @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct
*tsk, int group_dead)
                autoreap = true;
        }

  -     tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
  -     if (tsk->exit_state == EXIT_DEAD)
  +     if (autoreap) {
  +             tsk->exit_state = EXIT_DEAD;
                list_add(&tsk->ptrace_entry, &dead);
  +     }

        /* mt-exec, de_thread() is waiting for group leader */
        if (unlikely(tsk->signal->notify_count < 0))

where now the logic becomes "ok, we turned into a zombie above, and if
we autoreap this thread then we turn the zombie into a fully dead
thread".

Because currently we end up having "first turn it into a zombie", then
"set it to zombie or dead depending on autoreap" and then "if we
turned it into dead, move it to the dead list".

That just feels confused to me. Comments?

              Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-22 16:27 ` Linus Torvalds
@ 2019-07-22 16:39   ` Christian Brauner
  2019-07-23 10:12   ` Oleg Nesterov
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2019-07-22 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux List Kernel Mailing, Oleg Nesterov, Suren Baghdasaryan,
	Joel Fernandes

On Mon, Jul 22, 2019 at 09:27:08AM -0700, Linus Torvalds wrote:
> On Mon, Jul 22, 2019 at 7:26 AM Christian Brauner <christian@brauner.io> wrote:
> >
> > This contains a fix for pidfd polling. It ensures that the task's exit
> > state is visible to all waiters:
> 
> Hmm.
> 
> I've pulled this, but the exit_state thing has been very fragile
> before, and I'm not entirely happy with how this just changes where it
> is set. I guess the movement here is all inside the tasklist_lock, so
> it's not that big of a deal, but still..
> 
> I would *really* like Oleg to take a look.

Oh, sorry. Oleg did take a look. See:

https://lore.kernel.org/lkml/20190718150057.GB13355@redhat.com/
https://lore.kernel.org/lkml/20190719161404.GA24170@redhat.com/

> 
> Also, and the primary reason I write this email is that this basically
> makes the "EXIT_ZOMBIE / EXIT_DEAD" state handling look all kinds of
> crazy. You set it to EXIT_ZOMBIE potentially _twice_. Whaa?
> 
> So if we set EXIT_ZOMBIE early, then I think we should change the
> EXIT_DEAD case too. IOW, do something like this on top:
> 
>   --- a/kernel/exit.c
>   +++ b/kernel/exit.c
>   @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct
> *tsk, int group_dead)
>                 autoreap = true;
>         }
> 
>   -     tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>   -     if (tsk->exit_state == EXIT_DEAD)
>   +     if (autoreap) {
>   +             tsk->exit_state = EXIT_DEAD;
>                 list_add(&tsk->ptrace_entry, &dead);
>   +     }
> 
>         /* mt-exec, de_thread() is waiting for group leader */
>         if (unlikely(tsk->signal->notify_count < 0))
> 
> where now the logic becomes "ok, we turned into a zombie above, and if
> we autoreap this thread then we turn the zombie into a fully dead
> thread".
> 
> Because currently we end up having "first turn it into a zombie", then
> "set it to zombie or dead depending on autoreap" and then "if we
> turned it into dead, move it to the dead list".
> 
> That just feels confused to me. Comments?

Agreed. But that codepath is so core-kernel that I really felt more
comfortable just doing the absolut minimal thing so that when things
bite us we see it right away.

There's no harm in sending a cleanup for this later I think, when we
haven't hit any weirdness with the current change. Does that sound ok?

Christian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-22 14:22 [GIT PULL] pidfd fixes Christian Brauner
  2019-07-22 16:27 ` Linus Torvalds
@ 2019-07-22 16:40 ` pr-tracker-bot
  1 sibling, 0 replies; 8+ messages in thread
From: pr-tracker-bot @ 2019-07-22 16:40 UTC (permalink / raw)
  To: Christian Brauner; +Cc: torvalds, linux-kernel, oleg, surenb, joel

The pull request you sent on Mon, 22 Jul 2019 16:22:41 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190722

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/44b912cd0b55777796c5ae8ae857bd1d5ff83ed5

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-22 16:27 ` Linus Torvalds
  2019-07-22 16:39   ` Christian Brauner
@ 2019-07-23 10:12   ` Oleg Nesterov
  2019-07-23 10:25     ` Christian Brauner
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2019-07-23 10:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Brauner, Linux List Kernel Mailing, Suren Baghdasaryan,
	Joel Fernandes

On 07/22, Linus Torvalds wrote:
>
> So if we set EXIT_ZOMBIE early, then I think we should change the
> EXIT_DEAD case too. IOW, do something like this on top:
> 
>   --- a/kernel/exit.c
>   +++ b/kernel/exit.c
>   @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct
> *tsk, int group_dead)
>                 autoreap = true;
>         }
> 
>   -     tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>   -     if (tsk->exit_state == EXIT_DEAD)
>   +     if (autoreap) {
>   +             tsk->exit_state = EXIT_DEAD;
>                 list_add(&tsk->ptrace_entry, &dead);
>   +     }

Yes, this needs cleanups. Actually I was going to suggest another change
below, this way do_notify_pidfd() is only called when it is really needed.
But then I decided a trivial one-liner makes more sense for the start.

I'll try to think. Perhaps we should also change do_notify_parent() to set
p->exit_state, at least if autoreap. Then the early exit_state = EXIT_ZOMBIE
won't look so confusing and we can do more (minor) cleanups.

Oleg.

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -182,6 +182,13 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
 	put_task_struct(tsk);
 }
 
+static void do_notify_pidfd(struct task_struct *task)
+{
+	struct pid *pid;
+
+	pid = task_pid(task);
+	wake_up_all(&pid->wait_pidfd);
+}
 
 void release_task(struct task_struct *p)
 {
@@ -218,6 +225,8 @@ void release_task(struct task_struct *p)
 		zap_leader = do_notify_parent(leader, leader->exit_signal);
 		if (zap_leader)
 			leader->exit_state = EXIT_DEAD;
+
+		do_notify_pidfd(leader);
 	}
 
 	write_unlock_irq(&tasklist_lock);
@@ -710,7 +719,7 @@ static void forget_original_parent(struct task_struct *father,
  */
 static void exit_notify(struct task_struct *tsk, int group_dead)
 {
-	bool autoreap;
+	bool autoreap, xxx;
 	struct task_struct *p, *n;
 	LIST_HEAD(dead);
 
@@ -720,23 +729,22 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (group_dead)
 		kill_orphaned_pgrp(tsk->group_leader, NULL);
 
-	if (unlikely(tsk->ptrace)) {
-		int sig = thread_group_leader(tsk) &&
-				thread_group_empty(tsk) &&
-				!ptrace_reparented(tsk) ?
-			tsk->exit_signal : SIGCHLD;
+	autoreap = true;
+	xxx = thread_group_leader(tsk) && thread_group_empty(tsk);
+
+	if (xxx || unlikely(tsk->ptrace)) {
+		int sig = xxx && !ptrace_reparented(tsk)
+			? tsk->exit_signal : SIGCHLD;
 		autoreap = do_notify_parent(tsk, sig);
-	} else if (thread_group_leader(tsk)) {
-		autoreap = thread_group_empty(tsk) &&
-			do_notify_parent(tsk, tsk->exit_signal);
-	} else {
-		autoreap = true;
 	}
 
 	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
+	if (xxx)
+		do_notify_pidfd(tsk);
+
 	/* mt-exec, de_thread() is waiting for group leader */
 	if (unlikely(tsk->signal->notify_count < 0))
 		wake_up_process(tsk->signal->group_exit_task);
--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -1881,14 +1881,6 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
-static void do_notify_pidfd(struct task_struct *task)
-{
-	struct pid *pid;
-
-	pid = task_pid(task);
-	wake_up_all(&pid->wait_pidfd);
-}
-
 /*
  * Let a parent know about the death of a child.
  * For a stopped/continued status change, use do_notify_parent_cldstop instead.
@@ -1912,9 +1904,6 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	BUG_ON(!tsk->ptrace &&
 	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
 
-	/* Wake up all pidfd waiters */
-	do_notify_pidfd(tsk);
-
 	if (sig != SIGCHLD) {
 		/*
 		 * This is only possible if parent == real_parent.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-23 10:12   ` Oleg Nesterov
@ 2019-07-23 10:25     ` Christian Brauner
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2019-07-23 10:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Linux List Kernel Mailing, Suren Baghdasaryan,
	Joel Fernandes

On Tue, Jul 23, 2019 at 12:12:49PM +0200, Oleg Nesterov wrote:
> On 07/22, Linus Torvalds wrote:
> >
> > So if we set EXIT_ZOMBIE early, then I think we should change the
> > EXIT_DEAD case too. IOW, do something like this on top:
> > 
> >   --- a/kernel/exit.c
> >   +++ b/kernel/exit.c
> >   @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct
> > *tsk, int group_dead)
> >                 autoreap = true;
> >         }
> > 
> >   -     tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
> >   -     if (tsk->exit_state == EXIT_DEAD)
> >   +     if (autoreap) {
> >   +             tsk->exit_state = EXIT_DEAD;
> >                 list_add(&tsk->ptrace_entry, &dead);
> >   +     }
> 
> Yes, this needs cleanups. Actually I was going to suggest another change
> below, this way do_notify_pidfd() is only called when it is really needed.
> But then I decided a trivial one-liner makes more sense for the start.

Yeah, that was my thinking exactly.

> 
> I'll try to think. Perhaps we should also change do_notify_parent() to set
> p->exit_state, at least if autoreap. Then the early exit_state = EXIT_ZOMBIE
> won't look so confusing and we can do more (minor) cleanups.

You mind sending that as a proper patch?
I also have a few more cleanups for other parts I intend to send later
this week. I'd pick this one up too.

> 
> Oleg.
> 
> --- x/kernel/exit.c
> +++ x/kernel/exit.c
> @@ -182,6 +182,13 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
>  	put_task_struct(tsk);
>  }
>  
> +static void do_notify_pidfd(struct task_struct *task)
> +{
> +	struct pid *pid;
> +
> +	pid = task_pid(task);
> +	wake_up_all(&pid->wait_pidfd);
> +}
>  
>  void release_task(struct task_struct *p)
>  {
> @@ -218,6 +225,8 @@ void release_task(struct task_struct *p)
>  		zap_leader = do_notify_parent(leader, leader->exit_signal);
>  		if (zap_leader)
>  			leader->exit_state = EXIT_DEAD;
> +
> +		do_notify_pidfd(leader);
>  	}
>  
>  	write_unlock_irq(&tasklist_lock);
> @@ -710,7 +719,7 @@ static void forget_original_parent(struct task_struct *father,
>   */
>  static void exit_notify(struct task_struct *tsk, int group_dead)
>  {
> -	bool autoreap;
> +	bool autoreap, xxx;

In case you don't mind the length of it how about:

bool autoreap, leading_empty_thread_group;

It's not the prettiest names but having rather descriptive var names in
these codepaths seems like a good idea to me.
It also reads very naturally further below:

if (leading_empty_thread_group || unlikely(tsk->ptrace)) {
 	int sig = leading_empty_thread_group && !ptrace_reparented(tsk)
 		? tsk->exit_signal : SIGCHLD;

>  	struct task_struct *p, *n;
>  	LIST_HEAD(dead);
>  
> @@ -720,23 +729,22 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>  	if (group_dead)
>  		kill_orphaned_pgrp(tsk->group_leader, NULL);
>  
> -	if (unlikely(tsk->ptrace)) {
> -		int sig = thread_group_leader(tsk) &&
> -				thread_group_empty(tsk) &&
> -				!ptrace_reparented(tsk) ?
> -			tsk->exit_signal : SIGCHLD;
> +	autoreap = true;
> +	xxx = thread_group_leader(tsk) && thread_group_empty(tsk);
> +
> +	if (xxx || unlikely(tsk->ptrace)) {
> +		int sig = xxx && !ptrace_reparented(tsk)
> +			? tsk->exit_signal : SIGCHLD;
>  		autoreap = do_notify_parent(tsk, sig);
> -	} else if (thread_group_leader(tsk)) {
> -		autoreap = thread_group_empty(tsk) &&
> -			do_notify_parent(tsk, tsk->exit_signal);
> -	} else {
> -		autoreap = true;
>  	}
>  
>  	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>  	if (tsk->exit_state == EXIT_DEAD)
>  		list_add(&tsk->ptrace_entry, &dead);
>  
> +	if (xxx)
> +		do_notify_pidfd(tsk);
> +
>  	/* mt-exec, de_thread() is waiting for group leader */
>  	if (unlikely(tsk->signal->notify_count < 0))
>  		wake_up_process(tsk->signal->group_exit_task);
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -1881,14 +1881,6 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>  	return ret;
>  }
>  
> -static void do_notify_pidfd(struct task_struct *task)
> -{
> -	struct pid *pid;
> -
> -	pid = task_pid(task);
> -	wake_up_all(&pid->wait_pidfd);
> -}
> -
>  /*
>   * Let a parent know about the death of a child.
>   * For a stopped/continued status change, use do_notify_parent_cldstop instead.
> @@ -1912,9 +1904,6 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	BUG_ON(!tsk->ptrace &&
>  	       (tsk->group_leader != tsk || !thread_group_empty(tsk)));
>  
> -	/* Wake up all pidfd waiters */
> -	do_notify_pidfd(tsk);
> -
>  	if (sig != SIGCHLD) {
>  		/*
>  		 * This is only possible if parent == real_parent.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] pidfd fixes
  2019-07-30 19:04 Christian Brauner
@ 2019-07-30 20:40 ` pr-tracker-bot
  0 siblings, 0 replies; 8+ messages in thread
From: pr-tracker-bot @ 2019-07-30 20:40 UTC (permalink / raw)
  To: Christian Brauner; +Cc: torvalds, linux-kernel, oleg, joel

The pull request you sent on Tue, 30 Jul 2019 21:04:37 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190730

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/629f8205a6cc63d2e8e30956bad958a3507d018f

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [GIT PULL] pidfd fixes
@ 2019-07-30 19:04 Christian Brauner
  2019-07-30 20:40 ` pr-tracker-bot
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2019-07-30 19:04 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, oleg, joel

Hi Linus,

This makes setting the exit_state in exit_notify() consistent after fixing
the pidfd polling race pre-rc1. Related to the race fix, this adds a
WARN_ON() to do_notify_pidfd() to catch any future exit_state races.
Last, this removes an obsolete comment from the pidfd tests.

The following changes since commit 609488bc979f99f805f34e9a32c1e3b71179d10b:

  Linux 5.3-rc2 (2019-07-28 12:47:02 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190730

for you to fetch changes up to 30b692d3b390c6fe78a5064be0c4bbd44a41be59:

  exit: make setting exit_state consistent (2019-07-30 19:57:14 +0200)

Please consider pulling from the signed for-linus-20190730 tag.

Thanks!
Christian

----------------------------------------------------------------
for-linus-20190730

----------------------------------------------------------------
Christian Brauner (2):
      pidfd: remove obsolete comments from test
      exit: make setting exit_state consistent

Joel Fernandes (Google) (1):
      pidfd: Add warning if exit_state is 0 during notification

 kernel/exit.c                              | 5 +++--
 kernel/signal.c                            | 1 +
 tools/testing/selftests/pidfd/pidfd_test.c | 6 +-----
 3 files changed, 5 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-07-30 20:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22 14:22 [GIT PULL] pidfd fixes Christian Brauner
2019-07-22 16:27 ` Linus Torvalds
2019-07-22 16:39   ` Christian Brauner
2019-07-23 10:12   ` Oleg Nesterov
2019-07-23 10:25     ` Christian Brauner
2019-07-22 16:40 ` pr-tracker-bot
2019-07-30 19:04 Christian Brauner
2019-07-30 20:40 ` pr-tracker-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.