All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Michal Hocko <mhocko@suse.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: re. Spurious wakeup on a newly created kthread
Date: Sun, 26 Jun 2022 10:58:04 +0900	[thread overview]
Message-ID: <Yre9LO2nj+Hbr67V@mtj.duckdns.org> (raw)
In-Reply-To: <CAHk-=wiC7rj1o7vTnYUPfD7YxAu09MZiZbahHqvLm9+Cgg1dFw@mail.gmail.com>

Hello,

On Sat, Jun 25, 2022 at 10:01:35AM -0700, Linus Torvalds wrote:
> On Fri, Jun 24, 2022 at 10:00 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > So, Petr debugged a NULL deref in workqueue code to a spurious wakeup
> > on a newly created kthread.
> 
> What? No. That patch can't be right for several reasons.
> 
> What we call "spurious wakeups" exist, but they are about wakeups that
> happen from being on a _previous_ wait-queue, and having already been
> removed from it.
> 
> They aren't "really" spurious, they are just asynchronous enough (and
> thus unexpected) that you basically should never have a "sleep on
> wait-queue" without then looping and re-testing the condition.

Can you elaborate on this a bit? At least for the standard
wait_event-ish wait blocks, the waiter always does finish_wait()
before leavig a wait. finish_wait() does lockless check on
wq_entry->entry and may or may not grab wq_head->lock. When it does,
it's fully synchronized against the waker. Even when it doesn't, while
the lack of memory ordering on the finish_wait() side may let things
slide a bit, I can't see how it can slide after the set_current_state
in the next wait block.

I'm probably missing sometihng. Is it about bespoke wait mechanisms?
Can you give a concrete example of an async wakeup scenario?

> There is no _truly_ spurious wakeup. You were always woken up for a
> reason, it's just that there are more reasons than the entirely
> obvious ones.

So, the deferred wakeups from earlier waits are one. Can you give some
other examples? This is something which has always bothered me and I
couldn't find explanations which aren't hand-wavy on my own. It'd be
really great to have clarity.

> For example, the reason that quoted patch cannot be right is that this
> code pattern:
> 
>   while (wait_for_completion_interruptible(&worker->ready_to_start))
>     ;
> 
> is not valid kernel code. EVER. There is absolutely no way that can be correct.
>
> Either that code can take a signal, or it cannot. If it can take a
> signal, it had better react to said signal. If it cannot, it must not
> use an interruptble sleep - since now that loop turned into a
> kernel-side busy-loop.
> 
> So NAK on this kind of crazy "I don't know what happened, so I'll just
> add *more* bugs to the code" voodoo programming.
>
> And no, we don't "fix" that by then adding a timeout.

Yeah, I should've been more explicit on this. Michal already pointed
out that it doesn't make sense to loop over interruptible timed sleeps
and it should use one plain uninterruptible sleep, so this part isn't
in question.

...
> I think the problem here is much more fundamental: you expect a new
> thread to not wake up until you've told it to.
> 
> We do have that infrastructure in the kernel: when  you create a new
> thread, you can do various setup, and the thread won't actually run
> until you do "wake_up_new_task()" on it.

That's because that's the only thing which ignores TASK_NEW, right?

> However, that's not how kernel_thread() (or kernel_clone()) works.
> Those will call wake_up_new_task(p) for you, and as such a new kernel
> thread will immediately start running.
> 
> So I think the expectations here are entirely wrong.  I think
> create_worker() is fundamentally buggy, in how it does that
> 
>         /* start the newly created worker */
>         ..
>         wake_up_process(worker->task);
> 
> because that wake_up_process() is already much too late. The process
> got woken up already, because it was created by create_kthread() ->
> kernel_thread() -> kernel_clone, which does that wake_up_new_task()
> and it starts running.

A couple things still aren't clear for me.

* If there are no true spurious wakeups, where did the racing wakeup
  come from? The task just got created w/ TASK_NEW and woken up once
  with wake_up_new_task(). It hasn't been on any wait queue or
  advertised itself to anything.

* If there are spurious wakeups, why is kthread() scheduling after
  signaling creation completion in the first place? As I wrote before,
  all it would do is masking these bugs. If we can't gurantee that the
  kthread will stay blocked, shouldn't we just remove the
  schedule_preempt_disabled() call in kthread()?

Thanks.

-- 
tejun

  parent reply	other threads:[~2022-06-26  1:58 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 14:08 [PATCH] workqueue: Make create_worker() safe against spurious wakeups Petr Mladek
2022-06-23  7:00 ` Petr Mladek
2022-06-23  7:14   ` Michal Hocko
2022-06-26  4:19   ` Hillf Danton
2022-06-25  5:00 ` re. Spurious wakeup on a newly created kthread Tejun Heo
2022-06-25 17:01   ` Linus Torvalds
2022-06-25 17:36     ` Eric W. Biederman
2022-06-25 18:25       ` Linus Torvalds
2022-06-25 18:43         ` Linus Torvalds
2022-06-25 23:28           ` Eric W. Biederman
2022-06-25 23:41             ` Eric W. Biederman
2022-06-25 23:43             ` Linus Torvalds
2022-06-25 23:48               ` Linus Torvalds
2022-06-26  0:19                 ` Eric W. Biederman
2022-06-27  0:01                   ` Wedson Almeida Filho
2022-06-27  7:11                     ` Peter Zijlstra
2022-06-27 18:23                       ` Wedson Almeida Filho
2022-06-27 18:45                         ` Linus Torvalds
2022-06-26 19:14                 ` [PATCH 0/3] kthread: Stop using TASK_UNINTERRUPTIBLE Eric W. Biederman
2022-06-26 19:15                   ` [PATCH 1/3] kthread: Remove the flags argument from kernel_thread Eric W. Biederman
2022-06-26 21:20                     ` Linus Torvalds
2022-06-26 19:16                   ` [PATCH 2/3] kthread: Replace kernel_thread with new_kthread Eric W. Biederman
2022-06-26 19:16                   ` [PATCH 3/3] kthread: Stop abusing TASK_UNINTERRUPTIBLE (INCOMPLETE) Eric W. Biederman
2022-06-26 19:59                     ` Linus Torvalds
2022-06-26 20:23                       ` Tejun Heo
2022-06-26 20:55                         ` Linus Torvalds
2022-06-27  7:22                         ` Peter Zijlstra
2022-06-27  8:11                           ` Tejun Heo
2022-06-27 18:04                             ` Wedson Almeida Filho
2022-06-27 22:06                               ` Peter Zijlstra
2022-06-27 22:34                                 ` Linus Torvalds
2022-06-27 22:45                                 ` Wedson Almeida Filho
2022-06-28  0:32                                 ` Wedson Almeida Filho
2022-06-28  7:58                                   ` Peter Zijlstra
2022-06-30  0:57                                     ` Wedson Almeida Filho
2022-06-26 22:14                     ` kernel test robot
2022-06-26 22:34                     ` kernel test robot
2022-06-26  0:21               ` re. Spurious wakeup on a newly created kthread Eric W. Biederman
2022-06-28 14:16           ` Christian Brauner
2022-06-26  0:26         ` Eric W. Biederman
2022-06-26  1:58     ` Tejun Heo [this message]
2022-06-26  2:53       ` Linus Torvalds
2022-06-26  6:09         ` Tejun Heo
2022-06-27 12:04         ` Michal Hocko
2022-06-28  9:51     ` Petr Mladek
2022-06-28 10:07       ` Tejun Heo
2022-06-27  8:07   ` Michal Hocko
2022-06-27  8:21     ` Tejun Heo
2022-06-27 10:18       ` Michal Hocko
2022-06-28 15:08     ` Petr Mladek
2022-08-04  8:57 ` [PATCH] workqueue: Make create_worker() safe against spurious wakeups Lai Jiangshan
2022-08-04 10:19   ` Lai Jiangshan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yre9LO2nj+Hbr67V@mtj.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.