linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
Date: Thu, 3 Sep 2020 20:20:42 +0200
Message-ID: <c6b11905-2456-52a0-3b15-d4ceae6e7f54@redhat.com>
In-Reply-To: <20200903063806.GM4617@dhcp22.suse.cz>

On 03.09.20 08:38, Michal Hocko wrote:
> On Wed 02-09-20 19:51:45, Vlastimil Babka wrote:
>> On 9/2/20 5:13 PM, Michal Hocko wrote:
>>> On Wed 02-09-20 16:55:05, Vlastimil Babka wrote:
>>>> On 9/2/20 4:26 PM, Pavel Tatashin wrote:
>>>>> On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Thread#1 - continue
>>>>>>>          free_unref_page_commit
>>>>>>>            migratetype = get_pcppage_migratetype(page);
>>>>>>>               // get old migration type
>>>>>>>            list_add(&page->lru, &pcp->lists[migratetype]);
>>>>>>>               // add new page to already drained pcp list
>>>>>>>
>>>>>>> Thread#2
>>>>>>> Never drains pcp again, and therefore gets stuck in the loop.
>>>>>>>
>>>>>>> The fix is to try to drain per-cpu lists again after
>>>>>>> check_pages_isolated_cb() fails.
>>>>>>
>>>>>> But this means that the page is not isolated and so it could be reused
>>>>>> for something else. No?
>>>>>
>>>>> The page is in a movable zone, has zero references, and the section is
>>>>> isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is
>>>>> set. The page should be offlinable, but it is lost in a pcp list as
>>>>> that list is never drained again after the first failure to migrate
>>>>> all pages in the range.
>>>>
>>>> Yeah. To answer Michal's "it could be reused for something else" - yes, somebody
>>>> could allocate it from the pcplist before we do the extra drain. But then it
>>>> becomes "visible again" and the loop in __offline_pages() should catch it by
>>>> scan_movable_pages() - do_migrate_range(). And this time the pageblock is
>>>> already marked as isolated, so the page (freed by migration) won't end up on the
>>>> pcplist again.
>>>
>>> So the page block is marked MIGRATE_ISOLATE but the allocation itself
>>> could be used for non migrateable objects. Or does anything prevent that
>>> from happening?
>>
>> In a movable zone, the allocation should not be used for non migrateable
>> objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail
>> regardless of this race (analogically for migrating away from CMA pageblocks).
>>
>>> We really do depend on isolation to not allow reuse when offlining.
>>
>> This is not really different than if the page on pcplist was allocated just a
>> moment before the offlining, thus isolation started. We ultimately rely on being
>> able to migrate any allocated pages away during the isolation. This "freeing to
>> pcplists" race doesn't fundamentally change anything in this regard. We just
>> have to guarantee that pages on pcplists will be eventually flushed, to make
>> forward progress, and there was a bug in this aspect.
> 
> You are right. I managed to confuse myself yesterday. The race is
> impossible for !ZONE_MOVABLE because we do PageBuddy check there. And on
> the movable zone we are not losing the migrateability property.
> 
> Pavel I think this will be a useful information to add to the changelog.
> We should also document this in the code to prevent from further
> confusion. I would suggest something like the following:
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 242c03121d73..56d4892bceb8 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * pageblocks we may have modified and return -EBUSY to caller. This
>   * prevents two threads from simultaneously working on overlapping ranges.
>   *
> + * Please note that there is no strong synchronization with the page allocator
> + * either. Pages might be freed while their page blocks are marked ISOLATED.
> + * In some cases pages might still end up on pcp lists and that would allow
> + * for their allocation even when they are in fact isolated already. Depending on
> + * how strong of a guarantee the caller needs drain_all_pages might be needed
> + * (e.g. __offline_pages will need to call it after check for isolated range for
> + * a next retry).
> + *

As I said in my reply to v2, I dislike this hack. There is strong
synchronization; PCP is just special. Allocating from MIGRATE_ISOLATE is
just plain ugly.

Can't we temporarily disable PCP while any pageblock in the zone is
isolated (which we know, e.g., from the counter), so that no new pages are
put onto the PCP lists after draining, and re-enable it once no pageblocks
are isolated anymore? We keep draining the PCP anyway, so it doesn't seem
to be of much use during that period, no? It's a performance hit already.

Then, we would only need exactly one drain. And we would only have to
check on the free path whether PCP is temporarily disabled.
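
To make the idea concrete, here is a rough userspace sketch of what I
have in mind. It is a toy model, not kernel code: struct toy_zone and
all toy_* helpers are hypothetical names, the counter stands in for the
per-zone count of isolated pageblocks, and the model is single-threaded,
so it ignores how a free that is already in flight would be synchronized
against the counter update. Real code would have to solve that part.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one zone. All names are hypothetical, not kernel API. */
struct toy_zone {
	unsigned long nr_isolate_pageblock; /* isolated pageblocks in the zone */
	unsigned long pcp_count;            /* pages parked on per-cpu lists */
	unsigned long buddy_count;          /* pages back in the buddy allocator */
};

/* PCP counts as disabled while any pageblock in the zone is isolated. */
static bool toy_pcp_disabled(const struct toy_zone *z)
{
	return z->nr_isolate_pageblock > 0;
}

/* Free path: the only place that has to check whether PCP is disabled. */
static void toy_free_page(struct toy_zone *z)
{
	if (toy_pcp_disabled(z))
		z->buddy_count++;	/* bypass the pcp list entirely */
	else
		z->pcp_count++;		/* normal fast path */
}

/* Flush everything parked on the pcp lists back to the buddy allocator. */
static void toy_drain_pcp(struct toy_zone *z)
{
	z->buddy_count += z->pcp_count;
	z->pcp_count = 0;
}

/* Start isolation: disable PCP first, then drain exactly once. */
static void toy_start_isolation(struct toy_zone *z)
{
	z->nr_isolate_pageblock++;
	toy_drain_pcp(z);
}

/* End isolation: PCP is usable again once the counter drops to zero. */
static void toy_end_isolation(struct toy_zone *z)
{
	assert(z->nr_isolate_pageblock > 0);
	z->nr_isolate_pageblock--;
}
```

In this model, a page freed after toy_start_isolation() can no longer
land on a pcp list, so the single drain inside toy_start_isolation() is
enough and no retry drain is needed.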

-- 
Thanks,

David / dhildenb


