From: Hugh Dickins <hughd@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Hugh Dickins <hughd@google.com>, Baoquan He <bhe@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, pifang@redhat.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	aarcange@redhat.com, Mel Gorman <mgorman@suse.de>
Subject: Re: Memory hotplug softlock issue
Date: Wed, 21 Nov 2018 17:53:33 -0800 (PST)
Message-ID: <alpine.LSU.2.11.1811211726080.5557@eggly.anvils>
In-Reply-To: <20181121173123.GS12932@dhcp22.suse.cz>

On Wed, 21 Nov 2018, Michal Hocko wrote:
> On Mon 19-11-18 21:44:41, Hugh Dickins wrote:
> [...]
> > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> > 
> > We have all assumed that it is essential to hold a page reference while
> > waiting on a page lock: partly to guarantee that there is still a struct
> > page when MEMORY_HOTREMOVE is configured, but also to protect against
> > reuse of the struct page going to someone who then holds the page locked
> > indefinitely, when the waiter can reasonably expect timely unlocking.
> 
> I would add the following for the "problem statement". Feel free to
> reuse per your preference:
> "
> An elevated reference count, however, stands in the way of migration and
> forces it to fail with a bad timing. This is especially a problem for
> memory offlining which retries for ever (or until the operation is
> terminated from userspace) because a heavy refault workload can trigger
> essentially an endless loop of migration failures. Therefore
> __migration_entry_wait is essentially harmful for the event it is waiting
> for.
> "

Okay, I do have a lot written from way back when I prepared the
now-abandoned migration_waitqueue patch internally, but I'll factor in
what you say above when I get there - in particular, you highlight the
memory offlining aspect, as in this mailthread: which is very helpful,
because it's outside my experience so I won't have mentioned it - thanks.

I just know that there's some important linkage to do, to the August 2017
WQ_FLAG_BOOKMARK discussion: so it's a research and editing job I have to
work myself up to at the right moment.

> 
> > But in fact, so long as wait_on_page_bit_common() does the put_page(),
> > and is careful not to rely on struct page contents thereafter, there is
> > no need to hold a reference to the page while waiting on it.  That does
> > mean that this case cannot go back through the loop: but that's fine for
> > the page migration case, and even if used more widely, is limited by the
> > "Stop walking if it's locked" optimization in wake_page_function().
> 
> I would appreciate it if this were more explicit that the
> elevated-ref-count problem still exists, but is reduced to a tiny time
> window compared to the whole time the waiter is blocked. So a great
> improvement.

Fair enough, I'll do so. (But that's a bit like when we say we've attached
something and then forget to do so: please check that I've been honest
when I do post.)
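
For readers following the thread, the idea in the quoted paragraph can be
modelled very roughly in plain C. This is only a toy sketch, not the patch
and not kernel code: struct toy_page and every toy_*() function are
hypothetical names invented for illustration, and the rule "migration
succeeds only when the migrator holds the sole reference" is a simplifying
assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: just a reference count and a lock bit. */
struct toy_page {
	int refcount;
	bool locked;
};

/*
 * Toy migration check (assumption: migration can proceed only when
 * nobody but the migrator holds a reference to the page).
 */
static bool toy_can_migrate(const struct toy_page *page)
{
	return page->refcount == 1;
}

/* A faulting task takes a reference when it looks up the page. */
static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

/*
 * Begin waiting for the page lock. If drop_ref is true, give the
 * reference back before sleeping (the put-and-wait idea); from then on
 * the waiter must not touch the page again. If drop_ref is false, the
 * reference stays pinned for the whole wait.
 */
static void toy_begin_wait(struct toy_page *page, bool drop_ref)
{
	assert(page->locked);	/* waiting only makes sense on a locked page */
	if (drop_ref)
		page->refcount--;
	/* ... a real waiter would sleep on a wait queue here ... */
}

/* Returns whether migration could proceed while one waiter is blocked. */
bool toy_migrate_with_waiter(bool drop_ref)
{
	/* The migrator holds the lock and the only expected reference. */
	struct toy_page page = { .refcount = 1, .locked = true };

	toy_get_page(&page);		/* waiter's lookup reference */
	toy_begin_wait(&page, drop_ref);
	return toy_can_migrate(&page);
}
```

In this model toy_migrate_with_waiter(false) reports that migration is
blocked for as long as the waiter sleeps, while toy_migrate_with_waiter(true)
lets it proceed; the remaining window is only between taking the reference
and dropping it.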

> 
> > Add interface put_and_wait_on_page_locked() to do this, using negative
> > value of the lock arg to wait_on_page_bit_common() to implement it.
> > No interruptible or killable variant needed yet, but they might follow:
> > I have a vague notion that reporting -EINTR should take precedence over
> > return from wait_on_page_bit_common() without knowing the page state,
> > so arrange it accordingly - but that may be nothing but pedantic.
> > 
> > shrink_page_list()'s __ClearPageLocked(): that was a surprise!
> 
> and I can imagine a bad one. Do we really have to be so clever here?
> The unlock_page went away in the name of performance (a978d6f521063)
> and I would argue that this is a slow path where this is just not worth
> it.

Do we really have to be so clever here? That's a good question: now we
have PG_waiters, we probably do not need to bother with this cleverness,
and it would save me from having to expand on that comment as I was asked.
I'll try going back to a simple unlock_page() there: and can always restore
the __ClearPageLocked if a reviewer demands, or 0-day notices a regression.

> 
> > this
> > survived a lot of testing before that showed up.  It does raise the
> > question: should is_page_cache_freeable() and __remove_mapping() now
> > treat a PG_waiters page as if an extra reference were held?  Perhaps,
> > but I don't think it matters much, since shrink_page_list() already
> > had to win its trylock_page(), so waiters are not very common there: I
> > noticed no difference when trying the bigger change, and it's surely not
> > needed while put_and_wait_on_page_locked() is only for page migration.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> 
> The patch looks good to me - quite ugly but it doesn't make the existing
> code much worse.
> 
> With the problem Vlastimil described fixed, feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> 
> And thanks for a prompt patch. This is something I've been chasing for
> quite some time. __migration_entry_wait came to my radar only recently
> because this is an extremely volatile area.

You are very gracious to describe a patch promised six months ago as
"prompt".  But it does help me a lot to have it fixing a real problem
for someone (thank you Baoquan) - well, it fixed a real problem for us
internally too, but very nice to gather more backing for it like this.

Hugh
