* [GIT PULL] pidfd fixes @ 2019-07-30 19:04 Christian Brauner 2019-07-30 20:40 ` pr-tracker-bot 0 siblings, 1 reply; 8+ messages in thread From: Christian Brauner @ 2019-07-30 19:04 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel, oleg, joel Hi Linus, This makes setting the exit_state in exit_notify() consistent after fixing the pidfd polling race pre-rc1. Related to the race fix, this adds a WARN_ON() to do_notify_pidfd() to catch any future exit_state races. Last, this removes an obsolete comment from the pidfd tests. The following changes since commit 609488bc979f99f805f34e9a32c1e3b71179d10b: Linux 5.3-rc2 (2019-07-28 12:47:02 -0700) are available in the Git repository at: git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190730 for you to fetch changes up to 30b692d3b390c6fe78a5064be0c4bbd44a41be59: exit: make setting exit_state consistent (2019-07-30 19:57:14 +0200) Please consider pulling from the signed for-linus-20190730 tag. Thanks! Christian ---------------------------------------------------------------- for-linus-20190730 ---------------------------------------------------------------- Christian Brauner (2): pidfd: remove obsolete comments from test exit: make setting exit_state consistent Joel Fernandes (Google) (1): pidfd: Add warning if exit_state is 0 during notification kernel/exit.c | 5 +++-- kernel/signal.c | 1 + tools/testing/selftests/pidfd/pidfd_test.c | 6 +----- 3 files changed, 5 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-30 19:04 [GIT PULL] pidfd fixes Christian Brauner @ 2019-07-30 20:40 ` pr-tracker-bot 0 siblings, 0 replies; 8+ messages in thread From: pr-tracker-bot @ 2019-07-30 20:40 UTC (permalink / raw) To: Christian Brauner; +Cc: torvalds, linux-kernel, oleg, joel The pull request you sent on Tue, 30 Jul 2019 21:04:37 +0200: > git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190730 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/629f8205a6cc63d2e8e30956bad958a3507d018f Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ^ permalink raw reply [flat|nested] 8+ messages in thread
* [GIT PULL] pidfd fixes @ 2019-07-22 14:22 Christian Brauner 2019-07-22 16:27 ` Linus Torvalds 2019-07-22 16:40 ` pr-tracker-bot 0 siblings, 2 replies; 8+ messages in thread From: Christian Brauner @ 2019-07-22 14:22 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel, oleg, surenb, joel Hi Linus, This contains a fix for pidfd polling. It ensures that the task's exit state is visible to all waiters: The following changes since commit 5f9e832c137075045d15cd6899ab0505cfb2ca4b: Linus 5.3-rc1 (2019-07-21 14:05:38 -0700) are available in the Git repository at: git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190722 for you to fetch changes up to b191d6491be67cef2b3fa83015561caca1394ab9: pidfd: fix a poll race when setting exit_state (2019-07-22 16:02:03 +0200) /* Summary */ The pidfd polling code suffered from a race condition. A waiter could be notified via do_notify_pidfd() without the task's exit state being set and thus not visible to the waiter. This would cause the waiter to be blocked indefinitely. The following schematic illustrates how this could happen: CPU 0 CPU 1 ------------------------------------------------ exit_notify do_notify_parent do_notify_pidfd pidfd_poll if (tsk->exit_state) tsk->exit_state = EXIT_DEAD This is fixed by ensuring that the task's exit state is set before calling into do_notify_pidfd(). Please consider pulling from the signed for-linus-20190722 tag. Thanks! Christian ---------------------------------------------------------------- for-linus-20190722 ---------------------------------------------------------------- Suren Baghdasaryan (1): pidfd: fix a poll race when setting exit_state kernel/exit.c | 1 + 1 file changed, 1 insertion(+) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-22 14:22 Christian Brauner @ 2019-07-22 16:27 ` Linus Torvalds 2019-07-22 16:39 ` Christian Brauner 2019-07-23 10:12 ` Oleg Nesterov 2019-07-22 16:40 ` pr-tracker-bot 1 sibling, 2 replies; 8+ messages in thread From: Linus Torvalds @ 2019-07-22 16:27 UTC (permalink / raw) To: Christian Brauner Cc: Linux List Kernel Mailing, Oleg Nesterov, Suren Baghdasaryan, Joel Fernandes On Mon, Jul 22, 2019 at 7:26 AM Christian Brauner <christian@brauner.io> wrote: > > This contains a fix for pidfd polling. It ensures that the task's exit > state is visible to all waiters: Hmm. I've pulled this, but the exit_state thing has been very fragile before, and I'm not entirely happy with how this just changes where it is set. I guess the movement here is all inside the tasklist_lock, so it's not that big of a deal, but still.. I would *really* like Oleg to take a look. Also, and the primary reason I write this email is that this basically makes the "EXIT_ZOMBIE / EXIT_DEAD" state handling look all kinds of crazy. You set it to EXIT_ZOMBIE potentially _twice_. Whaa? So if we set EXIT_ZOMBIE early, then I think we should change the EXIT_DEAD case too. IOW, do something like this on top: --- a/kernel/exit.c +++ b/kernel/exit.c @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead) autoreap = true; } - tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; - if (tsk->exit_state == EXIT_DEAD) + if (autoreap) { + tsk->exit_state = EXIT_DEAD; list_add(&tsk->ptrace_entry, &dead); + } /* mt-exec, de_thread() is waiting for group leader */ if (unlikely(tsk->signal->notify_count < 0)) where now the logic becomes "ok, we turned into a zombie above, and if we autoreap this thread then we turn the zombie into a fully dead thread". Because currently we end up having "first turn it into a zombie", then "set it to zombie or dead depending on autoreap" and then "if we turned it into dead, move it to the dead list". That just feels confused to me. Comments? Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-22 16:27 ` Linus Torvalds @ 2019-07-22 16:39 ` Christian Brauner 2019-07-23 10:12 ` Oleg Nesterov 1 sibling, 0 replies; 8+ messages in thread From: Christian Brauner @ 2019-07-22 16:39 UTC (permalink / raw) To: Linus Torvalds Cc: Linux List Kernel Mailing, Oleg Nesterov, Suren Baghdasaryan, Joel Fernandes On Mon, Jul 22, 2019 at 09:27:08AM -0700, Linus Torvalds wrote: > On Mon, Jul 22, 2019 at 7:26 AM Christian Brauner <christian@brauner.io> wrote: > > > > This contains a fix for pidfd polling. It ensures that the task's exit > > state is visible to all waiters: > > Hmm. > > I've pulled this, but the exit_state thing has been very fragile > before, and I'm not entirely happy with how this just changes where it > is set. I guess the movement here is all inside the tasklist_lock, so > it's not that big of a deal, but still.. > > I would *really* like Oleg to take a look. Oh, sorry. Oleg did take a look. See: https://lore.kernel.org/lkml/20190718150057.GB13355@redhat.com/ https://lore.kernel.org/lkml/20190719161404.GA24170@redhat.com/ > > Also, and the primary reason I write this email is that this basically > makes the "EXIT_ZOMBIE / EXIT_DEAD" state handling look all kinds of > crazy. You set it to EXIT_ZOMBIE potentially _twice_. Whaa? > > So if we set EXIT_ZOMBIE early, then I think we should change the > EXIT_DEAD case too. IOW, do something like this on top: > > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct > *tsk, int group_dead) > autoreap = true; > } > > - tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; > - if (tsk->exit_state == EXIT_DEAD) > + if (autoreap) { > + tsk->exit_state = EXIT_DEAD; > list_add(&tsk->ptrace_entry, &dead); > + } > > /* mt-exec, de_thread() is waiting for group leader */ > if (unlikely(tsk->signal->notify_count < 0)) > > where now the logic becomes "ok, we turned into a zombie above, and if > we autoreap this thread then we turn the zombie into a fully dead > thread". > > Because currently we end up having "first turn it into a zombie", then > "set it to zombie or dead depending on autoreap" and then "if we > turned it into dead, move it to the dead list". > > That just feels confused to me. Comments? Agreed. But that codepath is so core-kernel that I really felt more comfortable just doing the absolut minimal thing so that when things bite us we see it right away. There's no harm in sending a cleanup for this later I think, when we haven't hit any weirdness with the current change. Does that sound ok? Christian ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-22 16:27 ` Linus Torvalds 2019-07-22 16:39 ` Christian Brauner @ 2019-07-23 10:12 ` Oleg Nesterov 2019-07-23 10:25 ` Christian Brauner 1 sibling, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2019-07-23 10:12 UTC (permalink / raw) To: Linus Torvalds Cc: Christian Brauner, Linux List Kernel Mailing, Suren Baghdasaryan, Joel Fernandes On 07/22, Linus Torvalds wrote: > > So if we set EXIT_ZOMBIE early, then I think we should change the > EXIT_DEAD case too. IOW, do something like this on top: > > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct > *tsk, int group_dead) > autoreap = true; > } > > - tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; > - if (tsk->exit_state == EXIT_DEAD) > + if (autoreap) { > + tsk->exit_state = EXIT_DEAD; > list_add(&tsk->ptrace_entry, &dead); > + } Yes, this needs cleanups. Actually I was going to suggest another change below, this way do_notify_pidfd() is only called when it is really needed. But then I decided a trivial one-liner makes more sense for the start. I'll try to think. Perhaps we should also change do_notify_parent() to set p->exit_state, at least if autoreap. Then the early exit_state = EXIT_ZOMBIE won't look so confusing and we can do more (minor) cleanups. Oleg. --- x/kernel/exit.c +++ x/kernel/exit.c @@ -182,6 +182,13 @@ static void delayed_put_task_struct(struct rcu_head *rhp) put_task_struct(tsk); } +static void do_notify_pidfd(struct task_struct *task) +{ + struct pid *pid; + + pid = task_pid(task); + wake_up_all(&pid->wait_pidfd); +} void release_task(struct task_struct *p) { @@ -218,6 +225,8 @@ void release_task(struct task_struct *p) zap_leader = do_notify_parent(leader, leader->exit_signal); if (zap_leader) leader->exit_state = EXIT_DEAD; + + do_notify_pidfd(leader); } write_unlock_irq(&tasklist_lock); @@ -710,7 +719,7 @@ static void forget_original_parent(struct task_struct *father, */ static void exit_notify(struct task_struct *tsk, int group_dead) { - bool autoreap; + bool autoreap, xxx; struct task_struct *p, *n; LIST_HEAD(dead); @@ -720,23 +729,22 @@ static void exit_notify(struct task_struct *tsk, int group_dead) if (group_dead) kill_orphaned_pgrp(tsk->group_leader, NULL); - if (unlikely(tsk->ptrace)) { - int sig = thread_group_leader(tsk) && - thread_group_empty(tsk) && - !ptrace_reparented(tsk) ? - tsk->exit_signal : SIGCHLD; + autoreap = true; + xxx = thread_group_leader(tsk) && thread_group_empty(tsk); + + if (xxx || unlikely(tsk->ptrace)) { + int sig = xxx && !ptrace_reparented(tsk) + ? tsk->exit_signal : SIGCHLD; autoreap = do_notify_parent(tsk, sig); - } else if (thread_group_leader(tsk)) { - autoreap = thread_group_empty(tsk) && - do_notify_parent(tsk, tsk->exit_signal); - } else { - autoreap = true; } tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; if (tsk->exit_state == EXIT_DEAD) list_add(&tsk->ptrace_entry, &dead); + if (xxx) + do_notify_pidfd(tsk); + /* mt-exec, de_thread() is waiting for group leader */ if (unlikely(tsk->signal->notify_count < 0)) wake_up_process(tsk->signal->group_exit_task); --- x/kernel/signal.c +++ x/kernel/signal.c @@ -1881,14 +1881,6 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) return ret; } -static void do_notify_pidfd(struct task_struct *task) -{ - struct pid *pid; - - pid = task_pid(task); - wake_up_all(&pid->wait_pidfd); -} - /* * Let a parent know about the death of a child. * For a stopped/continued status change, use do_notify_parent_cldstop instead. @@ -1912,9 +1904,6 @@ bool do_notify_parent(struct task_struct *tsk, int sig) BUG_ON(!tsk->ptrace && (tsk->group_leader != tsk || !thread_group_empty(tsk))); - /* Wake up all pidfd waiters */ - do_notify_pidfd(tsk); - if (sig != SIGCHLD) { /* * This is only possible if parent == real_parent. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-23 10:12 ` Oleg Nesterov @ 2019-07-23 10:25 ` Christian Brauner 0 siblings, 0 replies; 8+ messages in thread From: Christian Brauner @ 2019-07-23 10:25 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Linux List Kernel Mailing, Suren Baghdasaryan, Joel Fernandes On Tue, Jul 23, 2019 at 12:12:49PM +0200, Oleg Nesterov wrote: > On 07/22, Linus Torvalds wrote: > > > > So if we set EXIT_ZOMBIE early, then I think we should change the > > EXIT_DEAD case too. IOW, do something like this on top: > > > > --- a/kernel/exit.c > > +++ b/kernel/exit.c > > @@ -734,9 +734,10 @@ static void exit_notify(struct task_struct > > *tsk, int group_dead) > > autoreap = true; > > } > > > > - tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; > > - if (tsk->exit_state == EXIT_DEAD) > > + if (autoreap) { > > + tsk->exit_state = EXIT_DEAD; > > list_add(&tsk->ptrace_entry, &dead); > > + } > > Yes, this needs cleanups. Actually I was going to suggest another change > below, this way do_notify_pidfd() is only called when it is really needed. > But then I decided a trivial one-liner makes more sense for the start. Yeah, that was my thinking exactly. > > I'll try to think. Perhaps we should also change do_notify_parent() to set > p->exit_state, at least if autoreap. Then the early exit_state = EXIT_ZOMBIE > won't look so confusing and we can do more (minor) cleanups. You mind sending that as a proper patch? I also have a few more cleanups for other parts I intend to send later this week. I'd pick this one up too. > > Oleg. > > --- x/kernel/exit.c > +++ x/kernel/exit.c > @@ -182,6 +182,13 @@ static void delayed_put_task_struct(struct rcu_head *rhp) > put_task_struct(tsk); > } > > +static void do_notify_pidfd(struct task_struct *task) > +{ > + struct pid *pid; > + > + pid = task_pid(task); > + wake_up_all(&pid->wait_pidfd); > +} > > void release_task(struct task_struct *p) > { > @@ -218,6 +225,8 @@ void release_task(struct task_struct *p) > zap_leader = do_notify_parent(leader, leader->exit_signal); > if (zap_leader) > leader->exit_state = EXIT_DEAD; > + > + do_notify_pidfd(leader); > } > > write_unlock_irq(&tasklist_lock); > @@ -710,7 +719,7 @@ static void forget_original_parent(struct task_struct *father, > */ > static void exit_notify(struct task_struct *tsk, int group_dead) > { > - bool autoreap; > + bool autoreap, xxx; In case you don't mind the length of it how about: bool autoreap, leading_empty_thread_group; It's not the prettiest names but having rather descriptive var names in these codepaths seems like a good idea to me. It also reads very naturally further below: if (leading_empty_thread_group || unlikely(tsk->ptrace)) { int sig = leading_empty_thread_group && !ptrace_reparented(tsk) ? tsk->exit_signal : SIGCHLD; > struct task_struct *p, *n; > LIST_HEAD(dead); > > @@ -720,23 +729,22 @@ static void exit_notify(struct task_struct *tsk, int group_dead) > if (group_dead) > kill_orphaned_pgrp(tsk->group_leader, NULL); > > - if (unlikely(tsk->ptrace)) { > - int sig = thread_group_leader(tsk) && > - thread_group_empty(tsk) && > - !ptrace_reparented(tsk) ? > - tsk->exit_signal : SIGCHLD; > + autoreap = true; > + xxx = thread_group_leader(tsk) && thread_group_empty(tsk); > + > + if (xxx || unlikely(tsk->ptrace)) { > + int sig = xxx && !ptrace_reparented(tsk) > + ? tsk->exit_signal : SIGCHLD; > autoreap = do_notify_parent(tsk, sig); > - } else if (thread_group_leader(tsk)) { > - autoreap = thread_group_empty(tsk) && > - do_notify_parent(tsk, tsk->exit_signal); > - } else { > - autoreap = true; > } > > tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE; > if (tsk->exit_state == EXIT_DEAD) > list_add(&tsk->ptrace_entry, &dead); > > + if (xxx) > + do_notify_pidfd(tsk); > + > /* mt-exec, de_thread() is waiting for group leader */ > if (unlikely(tsk->signal->notify_count < 0)) > wake_up_process(tsk->signal->group_exit_task); > --- x/kernel/signal.c > +++ x/kernel/signal.c > @@ -1881,14 +1881,6 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > return ret; > } > > -static void do_notify_pidfd(struct task_struct *task) > -{ > - struct pid *pid; > - > - pid = task_pid(task); > - wake_up_all(&pid->wait_pidfd); > -} > - > /* > * Let a parent know about the death of a child. > * For a stopped/continued status change, use do_notify_parent_cldstop instead. > @@ -1912,9 +1904,6 @@ bool do_notify_parent(struct task_struct *tsk, int sig) > BUG_ON(!tsk->ptrace && > (tsk->group_leader != tsk || !thread_group_empty(tsk))); > > - /* Wake up all pidfd waiters */ > - do_notify_pidfd(tsk); > - > if (sig != SIGCHLD) { > /* > * This is only possible if parent == real_parent. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] pidfd fixes 2019-07-22 14:22 Christian Brauner 2019-07-22 16:27 ` Linus Torvalds @ 2019-07-22 16:40 ` pr-tracker-bot 1 sibling, 0 replies; 8+ messages in thread From: pr-tracker-bot @ 2019-07-22 16:40 UTC (permalink / raw) To: Christian Brauner; +Cc: torvalds, linux-kernel, oleg, surenb, joel The pull request you sent on Mon, 22 Jul 2019 16:22:41 +0200: > git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/for-linus-20190722 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/44b912cd0b55777796c5ae8ae857bd1d5ff83ed5 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-07-30 20:40 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-30 19:04 [GIT PULL] pidfd fixes Christian Brauner 2019-07-30 20:40 ` pr-tracker-bot -- strict thread matches above, loose matches on Subject: below -- 2019-07-22 14:22 Christian Brauner 2019-07-22 16:27 ` Linus Torvalds 2019-07-22 16:39 ` Christian Brauner 2019-07-23 10:12 ` Oleg Nesterov 2019-07-23 10:25 ` Christian Brauner 2019-07-22 16:40 ` pr-tracker-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).