linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	 Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>,
	 Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Fri, 7 Aug 2020 12:07:47 -0700	[thread overview]
Message-ID: <CAHk-=wjHdwaz9aU1NMcucr+kK3BdZkM7FYVxpBLOo1H0NHPynw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2008071047510.1239@eggly.anvils>

On Fri, Aug 7, 2020 at 11:41 AM Hugh Dickins <hughd@google.com> wrote:
>
> +
> +       /*
> +        * If we hoped to pass PG_locked on to the next locker, but found
> +        * no exclusive taker, then we must clear it before dropping q->lock.
> +        * Otherwise unlock_page() can race trylock_page_bit_common(), and if
> +        * PageWaiters was already set from before, then cmpxchg sees no
> +        * difference to send it back to wake_up_page_bit().
> +        *
> +        * We may have already dropped and retaken q->lock on the way here, but
> +        * I think this works out because new entries are always added at tail.
> +        */
> +       if (exclude && !exclusive)
> +               clear_bit(bit_nr, &page->flags);
> +
>         spin_unlock_irqrestore(&q->lock, flags);

Yeah, I started thinking about this, and that's when I decided to just
throw away the patch.

Because now it clears the bit *after* it has woken people up, and that
just made me go "Eww".

You did add a comment about the whole "always added to the tail"
thing, and the spinlock should serialize everything, so I guess it's
ok (because the spinlock should serialize things that care), but it
just feels wrong.

I also started worrying about other people waiting on the page lock
(ie we now have that whole io_uring case), and interaction with the
PG_writeback case etc, and it just ended up feeling very messy.

I think it was all fine - other cases won't hit that exclusive case at
all - but I had a hard time wrapping my little mind around the exact
handoff rules, and the cmpxchg behavior when other bits changed (eg
PG_waiters), so I gave up.

> My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus
> your two patches): I did not have in the io_uring changes from the
> current tree. In glancing there, I noticed one new and one previous
> instance of SetPageWaiters() *after* __add_wait_queue_entry_tail():
> are those correct?

I don't think SetPageWaiters() has any ordering constraints with the
wait queue. Nobody looks at the waitqueue unless they already got to
the slowpath.

The only ordering constraint is with the testing of the bit we're
going to wait on. Because doing

     if (test_and_set_bit())
          SetPageWaiters(page);

is wrong - there's a race in there when somebody can clear the bit,
and not see that there are waiters.

And obviously that needs to be done inside the spinlock, but once you
do that, the ordering of the actual wait-queue is entirely irrelevant.
The spinlock will order _that_ part. The only thing the spinlock won't
order and serialize is the PG_lock bit and making sure we get to the
slowpath in the first place.

So if you're talking about __wait_on_page_locked_async(), then I think
that part is ok.

Or am I missing something?

                Linus


  reply	other threads:[~2020-08-07 19:08 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21  6:32 [RFC PATCH] mm: silence soft lockups from unlock_page Michal Hocko
     [not found] ` <FCC3EB2D-9F11-4E9E-88F4-40B2926B35CC@lca.pw>
2020-07-21 11:25   ` Michal Hocko
     [not found]     ` <664A07B6-DBCD-4520-84F1-241A4E7A339F@lca.pw>
2020-07-21 12:17       ` Michal Hocko
     [not found]         ` <20200721132343.GA4261@lca.pw>
2020-07-21 13:38           ` Michal Hocko
2020-07-21 14:17 ` Chris Down
2020-07-21 15:00   ` Michal Hocko
2020-07-21 15:33 ` Linus Torvalds
2020-07-21 15:49   ` Michal Hocko
2020-07-22 18:29   ` Linus Torvalds
2020-07-22 21:29     ` Hugh Dickins
2020-07-22 22:10       ` Linus Torvalds
2020-07-22 23:42         ` Linus Torvalds
2020-07-23  0:23           ` Linus Torvalds
2020-07-23 12:47           ` Oleg Nesterov
2020-07-23 17:32             ` Linus Torvalds
2020-07-23 18:01               ` Oleg Nesterov
2020-07-23 18:22                 ` Linus Torvalds
2020-07-23 19:03                   ` Linus Torvalds
2020-07-24 14:45                     ` Oleg Nesterov
2020-07-23 20:03               ` Linus Torvalds
2020-07-23 23:11                 ` Hugh Dickins
2020-07-23 23:43                   ` Linus Torvalds
2020-07-24  0:07                     ` Hugh Dickins
2020-07-24  0:46                       ` Linus Torvalds
2020-07-24  3:45                         ` Hugh Dickins
2020-07-24 15:24                     ` Oleg Nesterov
2020-07-24 17:32                       ` Linus Torvalds
2020-07-24 23:25                         ` Linus Torvalds
2020-07-25  2:08                           ` Hugh Dickins
2020-07-25  2:46                             ` Linus Torvalds
2020-07-25 10:14                           ` Oleg Nesterov
2020-07-25 18:48                             ` Linus Torvalds
2020-07-25 19:27                               ` Oleg Nesterov
2020-07-25 19:51                                 ` Linus Torvalds
2020-07-26 13:57                                   ` Oleg Nesterov
2020-07-25 21:19                               ` Hugh Dickins
2020-07-26  4:22                                 ` Hugh Dickins
2020-07-26 20:30                                   ` Hugh Dickins
2020-07-26 20:41                                     ` Linus Torvalds
2020-07-26 22:09                                       ` Hugh Dickins
2020-07-27 19:35                                     ` Greg KH
2020-08-06  5:46                                       ` Hugh Dickins
2020-08-18 13:50                                         ` Greg KH
2020-08-06  5:21                                     ` Hugh Dickins
2020-08-06 17:07                                       ` Linus Torvalds
2020-08-06 18:00                                         ` Matthew Wilcox
2020-08-06 18:32                                           ` Linus Torvalds
2020-08-07 18:41                                             ` Hugh Dickins
2020-08-07 19:07                                               ` Linus Torvalds [this message]
2020-08-07 19:35                                               ` Matthew Wilcox
2020-08-03 13:14                           ` Michal Hocko
2020-08-03 17:56                             ` Linus Torvalds
2020-07-25  9:39                         ` Oleg Nesterov
2020-07-23  8:03     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjHdwaz9aU1NMcucr+kK3BdZkM7FYVxpBLOo1H0NHPynw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).