linux-mm.kvack.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 Oleg Nesterov <oleg@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,  Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Sat, 25 Jul 2020 21:22:29 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2007252100230.5376@eggly.anvils>
In-Reply-To: <alpine.LSU.2.11.2007251343370.3804@eggly.anvils>

On Sat, 25 Jul 2020, Hugh Dickins wrote:
> On Sat, 25 Jul 2020, Linus Torvalds wrote:
> > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > Heh. I too thought about this. And just in case, your patch looks correct
> > > to me. But I can't really comment this behavioural change. Perhaps it
> > > should come in a separate patch?
> > 
> > We could do that. At the same time, I think both parts change how the
> > waitqueue works that it might as well just be one "fix page_bit_wait
> > waitqueue usage".
> > 
> > But let's wait to see what Hugh's numbers say.
> 
> Oh no, no no: sorry for getting your hopes up there, I won't come up
> with any numbers more significant than "0 out of 10" machines crashed.
> I know it would be *really* useful if I could come up with performance
> comparisons, or steer someone else to do so: but I'm sorry, cannot.
> 
> Currently it's actually 1 out of 10 machines crashed, for the same
> driverland issue seen last time, maybe it's a bad machine; and another
> 1 out of the 10 machines went AWOL for unknown reasons, but probably
> something outside the kernel got confused by the stress.  No reason
> to suspect your changes at all (but some unanalyzed "failure"s, of
> dubious significance, accumulating like last time).
> 
> I'm optimistic: nothing has happened to warn us off your changes.

Less optimistic now, I'm afraid.

The machine I said had (twice) crashed coincidentally in driverland
(some USB completion thing): that machine I set running a comparison
kernel without your changes this morning, while the others were still
running with your changes; and it has now passed the point where it
twice crashed before (the most troublesome test), without crashing.

Surprising: maybe still just coincidence, but I must look closer at
the crashes.

The others have now completed, and one other crashed in that
troublesome test, but sadly without yielding any crash info.

I've just set comparison runs going on them all, to judge whether
to take the "failure"s seriously; and I'll look more closely at them.

But hungry and tired now: unlikely to have more to say tonight.

> 
> And on Fri, 24 Jul 2020, Linus Torvalds had written:
> > So the loads you are running are known to have sensitivity to this
> > particular area, and are why you've done your patches to the page wait
> > bit code?
> 
> Yes. It's a series of nineteen ~hour-long tests, of which about five
> exhibited wake_up_page_bit problems in the past, and one has remained
> intermittently troublesome that way.  Intermittently: usually it does
> get through, so getting through yesterday and today won't even tell
> us that your changes fixed it - that we shall learn over time later.
> 
> Hugh

