linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Date: Tue, 21 Jul 2020 08:33:33 -0700
Message-ID: <CAHk-=whewL14RgwLZTXcNAnrDPt0H+sRJS6iDq0oGb6zwaBMxg@mail.gmail.com>
In-Reply-To: <20200721063258.17140-1-mhocko@kernel.org>

On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> The lockup is in unlock_page in do_read_fault and I suspect that this is
> yet another effect of a very long waitqueue chain which has been
> addressed by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
> wake_up_page_bit") previously.

Hmm.

I do not believe that you can actually get to the point where you have
a million waiters and it takes 20+ seconds to wake everybody up.

More likely, it's actually *caused* by that commit 11a19c7b099f, and
what might be happening is that other CPUs are just adding new
waiters to the list *while* we're waking things up, because somebody
else already got the page lock again.

Humor me. Does something like this work instead? It's
whitespace-damaged because it's just a cut-and-paste, and it's entirely
untested. I haven't really thought about any memory ordering
issues, but I think it's ok.

The logic is that anybody who called wake_up_page_bit() _must_ have
cleared that bit before that. So if we ever see it set again (and
memory ordering doesn't matter), then clearly somebody else got access
to the page bit (whichever it was), and we should not

 (a) waste time waking up people who can't get the bit anyway

 (b) be in a livelock where other CPUs continually add themselves to
the wait queue because somebody else got the bit.

and it's that (b) case that I think happens for you.
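A rough way to see why case (b) never converges is a toy model (not kernel code; `drain_waiters()`, `WAKE_BATCH` and the arrival model are all hypothetical). One waker drains a queue in batches, and each time the queue lock is dropped we pretend another CPU grabbed the bit and queued `arrivals` new waiters. Without the proposed check the waker only stops at the artificial `max_batches` cap; with it, the waker stops after the first batch and leaves the rest to the new bit owner.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size: a stand-in for how many waiters one
 * bookmark-limited wakeup pass handles before dropping the lock. */
#define WAKE_BATCH 64

/*
 * Toy model of a waker draining a page wait queue in batches.
 * Between batches the queue lock is dropped; if 'arrivals' is nonzero
 * we pretend another CPU re-took the page bit and 'arrivals' new
 * waiters queued up behind it.
 *
 * max_batches caps the loop so the livelock case terminates.
 * Returns the number of waiters this waker woke.
 */
size_t drain_waiters(size_t queued, size_t arrivals, size_t max_batches,
                     bool stop_if_bit_set)
{
    size_t woken = 0;

    for (size_t b = 0; b < max_batches && queued > 0; b++) {
        size_t n = queued < WAKE_BATCH ? queued : WAKE_BATCH;

        queued -= n;
        woken += n;

        /* Lock dropped: did somebody else grab the bit meanwhile? */
        bool bit_set_again = arrivals > 0;
        queued += arrivals;

        /* Proposed check: leave the remaining waiters to the new owner. */
        if (stop_if_bit_set && bit_set_again)
            break;
    }
    return woken;
}
```

For example, `drain_waiters(1000, 64, 100, false)` wakes 6400 waiters and still leaves 1000 queued, while passing `true` stops after waking 64.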

NOTE! Totally UNTESTED patch follows. I think it's good, but maybe
somebody sees some problem with this approach?

I realize that people can wait for bits other than the lock bit, and
if you're waiting for writeback to complete, maybe you don't care that
somebody else then started writeback *AGAIN* on the page; you'd
actually want to be woken up regardless. But honestly, I don't think
it really matters.

                Linus

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1054,6 +1054,15 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
                 * from wait queue
                 */
                spin_unlock_irqrestore(&q->lock, flags);
+
+               /*
+                * If somebody else set the bit again, stop waking
+                * people up. It's now the responsibility of that
+                * other page bit owner to do so.
+                */
+               if (test_bit(bit_nr, &page->flags))
+                       return;
+
                cpu_relax();
                spin_lock_irqsave(&q->lock, flags);
                __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
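For readers without the surrounding mm/filemap.c at hand, here is a userspace sketch of where the new check sits in the loop: after the queue lock is dropped and before `cpu_relax()` and the relock. It is an analogue, not the kernel code: C11 `<stdatomic.h>` stands in for `page->flags` and `test_bit()`, locking is only marked in comments, and `struct toy_queue`, `toy_wake_batch()` and `toy_wake_up_bit()` are hypothetical. The relaxed load follows the email's claim that memory ordering doesn't matter here, which is an assumption rather than something verified.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for test_bit(): a relaxed load of the flags word.
 * Relaxed ordering is assumed sufficient: a stale "clear" reading only
 * costs one more batch of wakeups. */
static bool flag_is_set(atomic_ulong *flags, int bit_nr)
{
    unsigned long v = atomic_load_explicit(flags, memory_order_relaxed);
    return (v >> bit_nr) & 1UL;
}

/* Hypothetical toy wait queue: 'remaining' waiters, woken 'batch' at a
 * time. After 'retake_after' batches we pretend another CPU set the
 * bit again (retake_after <= 0 means nobody does). */
struct toy_queue {
    int remaining;
    int batch;
    int batches_done;
    int retake_after;
    int bit_nr;
    atomic_ulong *flags;
};

/* Stand-in for one __wake_up_locked_key_bookmark() pass; returns true
 * if waiters are left, as a still-queued bookmark would indicate. */
static bool toy_wake_batch(struct toy_queue *q)
{
    q->remaining -= q->remaining < q->batch ? q->remaining : q->batch;
    q->batches_done++;
    if (q->retake_after > 0 && q->batches_done == q->retake_after)
        atomic_fetch_or_explicit(q->flags, 1UL << q->bit_nr,
                                 memory_order_relaxed);
    return q->remaining > 0;
}

/* Mirrors the shape of the patched loop: wake a batch, drop the queue
 * lock, bail out if the bit is set again, otherwise relax and repeat. */
static void toy_wake_up_bit(struct toy_queue *q)
{
    /* spin_lock_irqsave(&q->lock, flags); */
    bool more = toy_wake_batch(q);

    while (more) {
        /* spin_unlock_irqrestore(&q->lock, flags); */
        if (flag_is_set(q->flags, q->bit_nr))
            return; /* the new bit owner wakes the rest */
        /* cpu_relax(); spin_lock_irqsave(&q->lock, flags); */
        more = toy_wake_batch(q);
    }
    /* spin_unlock_irqrestore(&q->lock, flags); */
}
```

With 200 waiters in batches of 64 and nobody re-taking the bit, the loop runs four batches and empties the queue; with `retake_after = 1` it returns after the first batch and leaves 136 waiters for the new owner.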


Thread overview: 58+ messages
2020-07-21  6:32 [RFC PATCH] mm: silence soft lockups from unlock_page Michal Hocko
2020-07-21 11:10 ` Qian Cai
2020-07-21 11:25   ` Michal Hocko
2020-07-21 11:44     ` Qian Cai
2020-07-21 12:17       ` Michal Hocko
2020-07-21 13:23         ` Qian Cai
2020-07-21 13:38           ` Michal Hocko
2020-07-21 14:15             ` Qian Cai
2020-07-21 14:17 ` Chris Down
2020-07-21 15:00   ` Michal Hocko
2020-07-21 15:33 ` Linus Torvalds [this message]
2020-07-21 15:49   ` Michal Hocko
2020-07-22 18:29   ` Linus Torvalds
2020-07-22 21:29     ` Hugh Dickins
2020-07-22 22:10       ` Linus Torvalds
2020-07-22 23:42         ` Linus Torvalds
2020-07-23  0:23           ` Linus Torvalds
2020-07-23 12:47           ` Oleg Nesterov
2020-07-23 17:32             ` Linus Torvalds
2020-07-23 18:01               ` Oleg Nesterov
2020-07-23 18:22                 ` Linus Torvalds
2020-07-23 19:03                   ` Linus Torvalds
2020-07-24 14:45                     ` Oleg Nesterov
2020-07-23 20:03               ` Linus Torvalds
2020-07-23 23:11                 ` Hugh Dickins
2020-07-23 23:43                   ` Linus Torvalds
2020-07-24  0:07                     ` Hugh Dickins
2020-07-24  0:46                       ` Linus Torvalds
2020-07-24  3:45                         ` Hugh Dickins
2020-07-24 15:24                     ` Oleg Nesterov
2020-07-24 17:32                       ` Linus Torvalds
2020-07-24 23:25                         ` Linus Torvalds
2020-07-25  2:08                           ` Hugh Dickins
2020-07-25  2:46                             ` Linus Torvalds
2020-07-25 10:14                           ` Oleg Nesterov
2020-07-25 18:48                             ` Linus Torvalds
2020-07-25 19:27                               ` Oleg Nesterov
2020-07-25 19:51                                 ` Linus Torvalds
2020-07-26 13:57                                   ` Oleg Nesterov
2020-07-25 21:19                               ` Hugh Dickins
2020-07-26  4:22                                 ` Hugh Dickins
2020-07-26 20:30                                   ` Hugh Dickins
2020-07-26 20:41                                     ` Linus Torvalds
2020-07-26 22:09                                       ` Hugh Dickins
2020-07-27 19:35                                     ` Greg KH
2020-08-06  5:46                                       ` Hugh Dickins
2020-08-18 13:50                                         ` Greg KH
2020-08-06  5:21                                     ` Hugh Dickins
2020-08-06 17:07                                       ` Linus Torvalds
2020-08-06 18:00                                         ` Matthew Wilcox
2020-08-06 18:32                                           ` Linus Torvalds
2020-08-07 18:41                                             ` Hugh Dickins
2020-08-07 19:07                                               ` Linus Torvalds
2020-08-07 19:35                                               ` Matthew Wilcox
2020-08-03 13:14                           ` Michal Hocko
2020-08-03 17:56                             ` Linus Torvalds
2020-07-25  9:39                         ` Oleg Nesterov
2020-07-23  8:03     ` Michal Hocko
