linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
	 Michal Hocko <mhocko@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Fri, 24 Jul 2020 19:08:24 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.2007241848300.8192@eggly.anvils> (raw)
In-Reply-To: <CAHk-=wjYHvbOs9i39EnUsC6VEJiuJ2e_5gZB5-J5CRKxq80B_Q@mail.gmail.com>

On Fri, 24 Jul 2020, Linus Torvalds wrote:

> On Fri, Jul 24, 2020 at 10:32 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > Ok, that makes sense. Except you did it on top of the original patch
> > without the fix to set WQ_FLAG_WOKEN for the non-wakeup case.
> 
> Hmm.
> 
> I just realized that one thing we could do is to not even test the
> page bit for the shared case in the wakeup path.
> 
> Because anybody who uses the non-exclusive "wait_on_page_locked()" or
> "wait_on_page_writeback()" isn't actually interested in the bit state
> any more at that point. All they care about is that somebody cleared
> it - not whether it was then re-taken again.
> 
> So instead of keeping them on the list - or stopping the waitqueue
> walk because somebody else got the bit - we could just mark them
> successfully done, wake them up, and remove those entries from the
> list.
> 
> That would be better for everybody - less pointless waiting for a new
> lock or writeback event, but also fewer entries on the wait queues as
> we get rid of them more quickly instead of walking them over and over
> just because somebody else re-took the page lock.
> 
> Generally "wait_on_page_locked()" is used for two things
> 
>  - either wait for the IO to then check if it's now uptodate
> 
>  - throttle things that can't afford to lock the page (eg page faults
> that dropped the mm lock, and as such need to go through the whole
> restart logic, but that don't want to lock the page because it's now
> too late, but also the page migration things)
> 
> In the first case, waiting to actually seeing the locked bit clear is
> pointless - the code only cared about the "wait for IO in progress"
> not about the lock bit itself.
> 
> And that second case generally might want to retry, but doesn't want
> to busy-loop.
> 
> And "wait_on_page_writeback()" is basically used for similar reasons
> (ie check if there were IO errors, but also possibly to throttle
> writeback traffic).
> 
> Saying "stop walking, keep it on the list" seems wrong. It makes IO
> error handling and retries much worse, for example.
> 
> So it turns out that the wakeup logic and the initial wait logic don't
> have so much in common after all, and there is a fundamental
> conceptual difference between that "check bit one last time" case, and
> the "we got woken up, now what" case..
> 
> End result: one final (yes, hopefully - I think I'm done) version of
> this patch-series.
> 
> This not only makes the code cleaner (the generated code for
> wake_up_page() is really quite nice now), but getting rid of extra
> waiting might help the load that Michal reported.
> 
> Because a lot of page waiting might be that non-exclusive
> "wait_on_page_locked()" kind, particularly in the thundering herd kind
> of situation where one process starts IO, and then other processes
> wait for it to finish.
> 
> Those users don't even care if somebody else then did a "lock_page()"
> for some other reason (maybe for writeback). They are generally
> perfectly happy with a locked page, as long as it's now up-to-date.
> 
> So this not only simplifies the code, it really might avoid some problems too.

That set of tests I started yesterday has now completed: no crashes
due to your changes (though, one machine crashed with an entirely
unrelated list_del corruption: over in driverland, just a coincidence).

I do see more of these stresstests reporting failure than I remember
from the last time I ran them, and I wasn't quickly able to work out
why (usually I just care about not crashing or hanging, rarely delve
deeper into what they actually call success).  The most likely cause
would be some internal infrastructural oddity, and nothing for you
to worry about; but there is a possibility that it's meaningful.

But whatever, what happens on the next run, with these latest patches,
will be more important; and I'll follow this next run with a run on
the baseline without them, to compare results.

Hugh


  reply	other threads:[~2020-07-25  2:08 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-21  6:32 [RFC PATCH] mm: silence soft lockups from unlock_page Michal Hocko
     [not found] ` <FCC3EB2D-9F11-4E9E-88F4-40B2926B35CC@lca.pw>
2020-07-21 11:25   ` Michal Hocko
     [not found]     ` <664A07B6-DBCD-4520-84F1-241A4E7A339F@lca.pw>
2020-07-21 12:17       ` Michal Hocko
     [not found]         ` <20200721132343.GA4261@lca.pw>
2020-07-21 13:38           ` Michal Hocko
2020-07-21 14:17 ` Chris Down
2020-07-21 15:00   ` Michal Hocko
2020-07-21 15:33 ` Linus Torvalds
2020-07-21 15:49   ` Michal Hocko
2020-07-22 18:29   ` Linus Torvalds
2020-07-22 21:29     ` Hugh Dickins
2020-07-22 22:10       ` Linus Torvalds
2020-07-22 23:42         ` Linus Torvalds
2020-07-23  0:23           ` Linus Torvalds
2020-07-23 12:47           ` Oleg Nesterov
2020-07-23 17:32             ` Linus Torvalds
2020-07-23 18:01               ` Oleg Nesterov
2020-07-23 18:22                 ` Linus Torvalds
2020-07-23 19:03                   ` Linus Torvalds
2020-07-24 14:45                     ` Oleg Nesterov
2020-07-23 20:03               ` Linus Torvalds
2020-07-23 23:11                 ` Hugh Dickins
2020-07-23 23:43                   ` Linus Torvalds
2020-07-24  0:07                     ` Hugh Dickins
2020-07-24  0:46                       ` Linus Torvalds
2020-07-24  3:45                         ` Hugh Dickins
2020-07-24 15:24                     ` Oleg Nesterov
2020-07-24 17:32                       ` Linus Torvalds
2020-07-24 23:25                         ` Linus Torvalds
2020-07-25  2:08                           ` Hugh Dickins [this message]
2020-07-25  2:46                             ` Linus Torvalds
2020-07-25 10:14                           ` Oleg Nesterov
2020-07-25 18:48                             ` Linus Torvalds
2020-07-25 19:27                               ` Oleg Nesterov
2020-07-25 19:51                                 ` Linus Torvalds
2020-07-26 13:57                                   ` Oleg Nesterov
2020-07-25 21:19                               ` Hugh Dickins
2020-07-26  4:22                                 ` Hugh Dickins
2020-07-26 20:30                                   ` Hugh Dickins
2020-07-26 20:41                                     ` Linus Torvalds
2020-07-26 22:09                                       ` Hugh Dickins
2020-07-27 19:35                                     ` Greg KH
2020-08-06  5:46                                       ` Hugh Dickins
2020-08-18 13:50                                         ` Greg KH
2020-08-06  5:21                                     ` Hugh Dickins
2020-08-06 17:07                                       ` Linus Torvalds
2020-08-06 18:00                                         ` Matthew Wilcox
2020-08-06 18:32                                           ` Linus Torvalds
2020-08-07 18:41                                             ` Hugh Dickins
2020-08-07 19:07                                               ` Linus Torvalds
2020-08-07 19:35                                               ` Matthew Wilcox
2020-08-03 13:14                           ` Michal Hocko
2020-08-03 17:56                             ` Linus Torvalds
2020-07-25  9:39                         ` Oleg Nesterov
2020-07-23  8:03     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2007241848300.8192@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).