linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [RFC 5/5] mm, page_alloc: disable pcplists during page isolation
Date: Wed, 9 Sep 2020 13:55:49 +0200
Message-ID: <b057e618-94f1-3f5a-a4f6-55fc93ac34eb@suse.cz>
In-Reply-To: <20200909113647.GG7348@dhcp22.suse.cz>

On 9/9/20 1:36 PM, Michal Hocko wrote:
> On Wed 09-09-20 12:48:54, Vlastimil Babka wrote:
>> Here's a version that will apply on top of next-20200908. The first 4 patches need no change.
>> 
>> ----8<----
>> From 8febc17272b8e8b378e2e5ea5e76b2616f029c5b Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Mon, 7 Sep 2020 17:20:39 +0200
>> Subject: [PATCH] mm, page_alloc: disable pcplists during page isolation
>> 
>> Page isolation can race with a process freeing pages to pcplists in a way that
>> a page from an isolated pageblock can end up on a pcplist. This can be fixed by
>> repeated draining of pcplists, as done by patch "mm/memory_hotplug: drain
>> per-cpu pages again during memory offline" in [1].
>> 
>> David and Michal would prefer that this race was closed in a way that callers
>> of page isolation don't need to care about drain. David suggested disabling
>> pcplists usage completely during page isolation, instead of repeatedly draining
>> them.
>> 
>> To achieve this without adding special cases in alloc/free fastpath, we can use
>> the same 'trick' as boot pagesets - when pcp->high is 0, any pcplist addition
>> will be immediately flushed.
>> 
>> The race can thus be closed by setting pcp->high to 0 and draining pcplists
>> once in start_isolate_page_range(). The draining will serialize after processes
>> that already disabled interrupts and read the old value of pcp->high in
>> free_unref_page_commit(), and processes that have not yet disabled interrupts
>> will observe pcp->high == 0 when they are rescheduled, and skip pcplists.
>> This guarantees no stray pages on pcplists in zones where isolation happens.
>> 
>> We can use the variable zone->nr_isolate_pageblock (protected by zone->lock)
>> to detect transitions from 0 to 1 (to change pcp->high to 0 and issue drain)
>> and from 1 to 0 (to restore original pcp->high and batch values cached in
>> struct zone). We have to avoid external updates to high and batch by taking
>> pcp_batch_high_lock. To allow multiple isolations in parallel, change this
>> lock from mutex to rwsem.
>> 
>> For callers that pair start_isolate_page_range() with
>> undo_isolated_page_range() properly, this is transparent. Currently that's
>> alloc_contig_range(). __offline_pages() doesn't call undo_isolated_page_range()
>> in the success case, so it has to be careful to handle restoring pcp->high and batch
>> and unlocking pcp_batch_high_lock.
> 
> I was hoping that it would be possible to have this completely hidden
> inside start_isolate_page_range code path.

I hoped so too, but we can't know the moment when all processes that were in the
critical part of freeing pages to pcplists have moved on (they might have been
rescheduled).
We could change free_unref_page() to disable IRQs sooner, before
free_unref_page_prepare(), or at least the get_pfnblock_migratetype() part. Then
after the single drain, we should be safe, AFAICS?
RT guys might not be happy, but it's much simpler than this patch. I
still like some of the cleanups in 1-4 though, tbh :)
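
To make the pcp->high == 0 trick from the changelog concrete, here is a minimal
userspace sketch. It is only a model, not kernel code: struct pcp_model,
model_free_page() and model_drain() are made-up names, and using batch == 1
while pcplists are disabled is an assumption borrowed from how boot pagesets
are set up. Locking and IRQ disabling are left out entirely.

```c
#include <assert.h>

/* Simplified userspace stand-in for struct per_cpu_pages (hypothetical). */
struct pcp_model {
	int count;	/* pages currently sitting on the list */
	int high;	/* flush threshold */
	int batch;	/* pages handed back per flush */
};

/* Pages that reached the (modelled) buddy allocator. */
static int buddy_pages;

/*
 * Models the tail of a pcplist free: add the page, then flush up to
 * 'batch' pages once count >= high. With high == 0 and batch == 1 the
 * page that was just added is flushed immediately.
 */
static void model_free_page(struct pcp_model *pcp)
{
	pcp->count++;
	if (pcp->count >= pcp->high) {
		int n = pcp->batch < pcp->count ? pcp->batch : pcp->count;

		pcp->count -= n;
		buddy_pages += n;
	}
	assert(pcp->count >= 0);
}

/* Models a drain: everything on the list goes back to the buddy allocator. */
static void model_drain(struct pcp_model *pcp)
{
	buddy_pages += pcp->count;
	pcp->count = 0;
}
```

In this model, two frees with high == 3 stay on the list. After setting
high = 0, batch = 1 and draining once, every later model_free_page() passes
straight through and count stays at 0, which is the property the single drain
relies on.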

> If we need some sort of
> disable_pcp_free/enable_pcp_free then it seems like a better fit to have
> an explicit API for that (the naming would be obviously different
> because we do not want to call out pcp free lists). I strongly suspect
> that only the memory hotplug really cares for this hard guarantee.
> alloc_contig_range simply goes with EBUSY.
>  
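
For illustration only, a userspace sketch of what such an explicit pair could
look like, reusing the 0 -> 1 and 1 -> 0 transitions described in the
changelog. All names here (struct zone_model, sketch_pcp_disable(),
sketch_pcp_enable(), nr_isolate) are hypothetical placeholders, batch == 1
while disabled is an assumption, and the locking (zone->lock,
pcp_batch_high_lock) is only mentioned in comments.

```c
#include <assert.h>

/* Userspace stand-in for the relevant bits of struct zone (hypothetical). */
struct zone_model {
	int nr_isolate;		/* stands in for zone->nr_isolate_pageblock */
	int pcp_high;		/* live pcp->high */
	int pcp_batch;		/* live pcp->batch */
	int cached_high;	/* original high, cached in the zone */
	int cached_batch;	/* original batch, cached in the zone */
	int nr_drains;		/* how many drains were issued */
};

/* 0 -> 1 disables pcplists and drains once; nested calls only count. */
static void sketch_pcp_disable(struct zone_model *z)
{
	/* Real code would hold zone->lock and pcp_batch_high_lock here. */
	if (z->nr_isolate++ == 0) {
		z->pcp_high = 0;
		z->pcp_batch = 1;
		z->nr_drains++;
	}
}

/* 1 -> 0 restores the cached high and batch values. */
static void sketch_pcp_enable(struct zone_model *z)
{
	assert(z->nr_isolate > 0);
	if (--z->nr_isolate == 0) {
		z->pcp_high = z->cached_high;
		z->pcp_batch = z->cached_batch;
	}
}
```

Two overlapping disable calls issue a single drain in this model, and the
original values only come back when the last enable call drops the counter
to 0.
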
>> This commit also changes drain_all_pages() to not trust reading pcp->count during
>> drain for page isolation - I believe that could be racy and lead to missing some
>> CPUs to drain. If others agree, this can be separated and potentially backported.
>> 
>> [1] https://lore.kernel.org/linux-mm/20200903140032.380431-1-pasha.tatashin@soleen.com/
>> 
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  include/linux/gfp.h |  1 +
>>  mm/internal.h       |  4 +++
>>  mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++++-------------
>>  mm/page_alloc.c     | 60 +++++++++++++++++++++++++++++----------------
>>  mm/page_isolation.c | 45 ++++++++++++++++++++++++++++------
>>  5 files changed, 119 insertions(+), 46 deletions(-)
> 
> This has turned out much larger than I would expect.
> 


