From: Linus Torvalds <torvalds@linux-foundation.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Thu, 23 Jul 2020 11:22:44 -0700
Message-ID: <CAHk-=wj9KWfs799xU5eW0J_hkee52C5kvFFmBV-A+vN7qNWnjA@mail.gmail.com>
In-Reply-To: <20200723180100.GA21755@redhat.com>

On Thu, Jul 23, 2020 at 11:01 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> > +      *
> > +      * We _really_ should have a "list_del_init_careful()" to
> > +      * properly pair with the unlocked "list_empty_careful()"
> > +      * in finish_wait().
> > +      */
> > +     smp_mb();
> > +     list_del_init(&wait->entry);
>
> I think smp_wmb() would be enough, but this is minor.

Well, what we _really_ want (and what that comment is about) would be
for that list_del_init_careful() to use a "smp_store_release()" for
the last store, and then "list_empty_careful()" would use a
"smp_load_acquire()" for the corresponding first load.

On x86, that's free. On most other architectures, it's the minimal
ordering requirement.
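
As a rough illustration of that pairing, here is a userspace sketch using
C11 atomics in place of the kernel primitives. It is not kernel code:
"struct list_node", list_init() and list_add() are stand-ins made up for
this example, and memory_order_release / memory_order_acquire are assumed
to play the roles of smp_store_release() / smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	_Atomic(struct list_node *) next;
	_Atomic(struct list_node *) prev;
};

static void list_init(struct list_node *n)
{
	atomic_store_explicit(&n->next, n, memory_order_relaxed);
	atomic_store_explicit(&n->prev, n, memory_order_relaxed);
}

/* Insert entry right after head. The caller is assumed to hold the queue lock. */
static void list_add(struct list_node *entry, struct list_node *head)
{
	struct list_node *first;

	assert(entry != head);
	first = atomic_load_explicit(&head->next, memory_order_relaxed);
	atomic_store_explicit(&entry->next, first, memory_order_relaxed);
	atomic_store_explicit(&entry->prev, head, memory_order_relaxed);
	atomic_store_explicit(&first->prev, entry, memory_order_relaxed);
	atomic_store_explicit(&head->next, entry, memory_order_relaxed);
}

/*
 * Unlink entry and re-initialise it to point at itself. The final store
 * is a release, so every earlier load and store by this thread is ordered
 * before another thread can observe the entry as empty.
 */
static void list_del_init_careful(struct list_node *entry)
{
	struct list_node *next = atomic_load_explicit(&entry->next, memory_order_relaxed);
	struct list_node *prev = atomic_load_explicit(&entry->prev, memory_order_relaxed);

	atomic_store_explicit(&next->prev, prev, memory_order_relaxed);
	atomic_store_explicit(&prev->next, next, memory_order_relaxed);

	atomic_store_explicit(&entry->prev, entry, memory_order_relaxed);
	atomic_store_explicit(&entry->next, entry, memory_order_release);
}

/*
 * Lockless emptiness check. The first load is an acquire, pairing with
 * the release store above.
 */
static bool list_empty_careful(struct list_node *head)
{
	struct list_node *next = atomic_load_explicit(&head->next, memory_order_acquire);

	return next == head &&
	       atomic_load_explicit(&head->prev, memory_order_relaxed) == head;
}
```

The only ordered accesses are the last store on the removal side and the
first load on the checking side; everything else stays relaxed, which is
why this is the minimal ordering.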

And no, I don't think "smp_wmb()" is technically enough.

With crazy memory ordering, one of the earlier *reads* (eg loading
"wait->private" when waking things up) could have been delayed until
after the stores that initialize the list - and thus read stack
contents from another process after it's been released and re-used.

Does that happen in reality? No. There are various conditionals in
there which means that the stores end up being gated on the loads and
cannot actually be re-ordered, but it's the kind of subtlety...

So we actually do want to constrain all earlier reads and writes wrt
the final write. Which is exactly what "smp_store_release()" does.
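
A minimal userspace sketch of the waker side, again with C11 atomics
rather than kernel code. The "struct wait_entry" layout, its "queued"
field (standing in for the list links) and wake_entry() are assumptions
made up for this example, and the WQ_FLAG_WOKEN value is illustrative
only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define WQ_FLAG_WOKEN 0x02	/* illustrative value only */

/* Hypothetical wait queue entry living on the waiter's stack. */
struct wait_entry {
	unsigned int flags;
	void *private;		/* the task to wake */
	atomic_int queued;	/* 1 while on the list; stands in for the list links */
};

/*
 * Waker side. Returns the task pointer it read from the entry.
 *
 * Both the load of w->private and the store to w->flags must be complete
 * before the waiter can observe queued == 0, because from that point on
 * the waiter may return and its stack slot may be reused.
 *
 * A store-only barrier (the kernel's smp_wmb()) would order the flags
 * store against the final store, but would say nothing about the earlier
 * load of w->private. A release store orders both.
 */
static void *wake_entry(struct wait_entry *w)
{
	void *task = w->private;		/* earlier read */

	assert(task != NULL);
	w->flags |= WQ_FLAG_WOKEN;		/* earlier write */
	atomic_store_explicit(&w->queued, 0, memory_order_release);
	/* w must not be touched after this point. */
	return task;
}
```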

But with our current list_empty_careful(), the smp_mb() is I think
technically sufficient.

> We need a barrier between "wait->flags |= WQ_FLAG_WOKEN" and list_del_init(),

See above: we need more than just that write barrier, although in
_practice_ you're right, and the other barriers actually all already
exist and are part of wake_up_state().

So the smp_mb() is unnecessary, and in fact your smp_wmb() would be
too. But I left it there basically as "documentation".

> But afaics we need another barrier, rmb(), in wait_on_page_bit_common() for
> the case when wait->private was not blocked; we need to ensure that if
> finish_wait() sees list_empty_careful() == T then we can't miss WQ_FLAG_WOKEN.

Again, this is what a proper list_empty_careful() with a
smp_load_acquire() would have automatically gotten for us.

But yes, I think that without that, and with the explicit barriers, we
need an smp_rmb() after the list_empty_careful().
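
The two alternatives on the waiter side can be sketched in userspace C11
as follows. This is an illustration, not kernel code: "struct wait_entry",
its "queued" field (standing in for the list links) and both helper
functions are made up for the example, and a C11 acquire fence is only a
rough stand-in for smp_rmb() (it is somewhat stronger).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WQ_FLAG_WOKEN 0x02	/* illustrative value only */

/* Hypothetical wait queue entry; "queued" stands in for the list links. */
struct wait_entry {
	unsigned int flags;
	atomic_int queued;	/* 1 while on the list, 0 once removed */
};

/*
 * Variant A: unordered emptiness check followed by an explicit barrier,
 * roughly "list_empty_careful() then smp_rmb()".
 */
static bool woken_with_fence(struct wait_entry *w)
{
	assert(w);
	if (atomic_load_explicit(&w->queued, memory_order_relaxed))
		return false;	/* still queued, nothing to conclude yet */
	atomic_thread_fence(memory_order_acquire);
	return w->flags & WQ_FLAG_WOKEN;
}

/*
 * Variant B: the ordering is part of the check itself, roughly what a
 * list_empty_careful() built on smp_load_acquire() would provide.
 */
static bool woken_with_acquire(struct wait_entry *w)
{
	assert(w);
	if (atomic_load_explicit(&w->queued, memory_order_acquire))
		return false;
	return w->flags & WQ_FLAG_WOKEN;
}
```

In both variants, once the entry is seen as removed, the read of
w->flags cannot be satisfied from before the waker's release store, so
the WOKEN flag cannot be missed. Variant B just makes that a property of
the helper instead of something every caller has to remember.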

I really think it should be _in_ list_empty_careful(), though. Or
maybe finish_wait(). Hmm.

Because looking at all the other finish_wait() uses, the fact that the
waitqueue _list_ is properly ordered isn't really a guarantee that the
rest of the stack space is.

In practice, it will be, but I think this lack of serialization is a
potential real bug elsewhere too.

(Obviously none of this would show on x86, where we already *get* that
smp_store_release/smp_load_acquire behavior for the existing
list_del_init()/list_empty_careful(), since all stores are releases,
and all loads are acquires)

So I think that is a separate issue, generic to our finish_wait() uses.

             Linus
