Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages

From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
Date: Tue, 12 Apr 2022 17:06:01 +0200	[thread overview]
Message-ID: <afa115e6-d2ae-eeff-ba31-f8b2a5715b95@redhat.com> (raw)
In-Reply-To: <B4336C2E-E728-49E2-B84B-5728B9A1EA69@nvidia.com>

On 12.04.22 17:01, Zi Yan wrote:
> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
> 
>> On 12.04.22 16:07, Zi Yan wrote:
>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>
>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>> unmovable pages during isolation. Page isolation is done
>>>>> at MAX_ORDER_NR_PAEGS granularity, but not all pages within that
>>>>> granularity are intended to be isolated. For example,
>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>> alignment. This commit makes unmovable page check only look for
>>>>> interesting pages, so that page isolation can succeed for any
>>>>> non-overlapping ranges.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>  /*
>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>> + * consequently belong to a single zone.
>>>>>   *
>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>> @@ -28,12 +30,14 @@
>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>   *
>>>>>   */
>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>> -				 int migratetype, int flags)
>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>> +				int migratetype, int flags)
>>>>>  {
>>>>> -	unsigned long iter = 0;
>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>> +	unsigned long pfn = start_pfn;
>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>
>>>>
>>>> Just do
>>>>
>>>> struct page *page = pfn_to_page(start_pfn);
>>>> struct zone *zone = page_zone(page);
>>>>
>>>> here. No need to lookup the zone again in the loop because, as you
>>>> document "must ... belong to a single zone.".
>>>>
>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>> is sufficient.
>>>>
>>>
>>> Sure.
>>>
>>>>> +
>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>
>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>  		/*
>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		return page;
>>>>>  	}
>>>>>
>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>> -		page = pfn_to_page(pfn + iter);
>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>> +		struct zone *zone;
>>>>> +
>>>>> +		page = pfn_to_page(pfn);
>>>>> +		zone = page_zone(page);
>>>>>
>>>>>  		/*
>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  			}
>>>>>
>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>> -			iter += skip_pages - 1;
>>>>> +			pfn += skip_pages - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		 */
>>>>>  		if (!page_ref_count(page)) {
>>>>>  			if (PageBuddy(page))
>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	return NULL;
>>>>>  }
>>>>>
>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>> +/*
>>>>> + * This function set pageblock migratetype to isolate if no unmovable page is
>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>> + * [start_pfn, end_pfn).
>>>>> + */
>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>
>>>> I think we might be able do better, eventually not passing start_pfn at
>>>> all. Hmm.
>>>
>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>> interface easier to understand. Otherwise if we remove start_pfn,
>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>> end_pfn)
>>>
>>>>
>>>> I think we want to pull out the
>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>> into a separate patch.
>>>
>>> You mean a patch just adding
>>>
>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>
>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>
>>> Yes I can do that.
>>
>> I think we have to be careful with memory onlining/offlining. There are
>> corner cases where we get called with only pageblock alignment and
>> must not adjust the range.
> 
> In the patch below, you added a new set of start_isolate_pageblocks()
> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
> calls set_migratetype_isolate() and noted their range should not be
> adjusted. But in my patch, set_migratetype_isolate() adjusts
> the range for has_unmovable_pages() check. Do you mean

Right, that's broken in your patch. Memory onlining/offlining behavior
changed recently when "vmemmap on memory" was added. The start range
might only be aligned to pageblocks but not MAX_ORDER -1 -- and we must
not align u..

The core thing is that there are two types of users: memory offlining
that knows what it's doing when it aligns to less then MAX_ORDER -1 ,
and range allocators, that just pass in the range of interest.

> start_isolate_pageblocks() should call a different version of
> set_migratetype_isolate() without range adjustment? That can be done
> with an additional parameter in start_isolate_page_range(), like
> bool strict, right?

Random boolean flags are in general frowned upon ;)

-- 
Thanks,

David / dhildenb