All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
Date: Tue, 12 Apr 2022 17:06:01 +0200	[thread overview]
Message-ID: <afa115e6-d2ae-eeff-ba31-f8b2a5715b95@redhat.com> (raw)
In-Reply-To: <B4336C2E-E728-49E2-B84B-5728B9A1EA69@nvidia.com>

On 12.04.22 17:01, Zi Yan wrote:
> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
> 
>> On 12.04.22 16:07, Zi Yan wrote:
>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>
>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>> unmovable pages during isolation. Page isolation is done
>>>>> at MAX_ORDER_NR_PAEGS granularity, but not all pages within that
>>>>> granularity are intended to be isolated. For example,
>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>> alignment. This commit makes unmovable page check only look for
>>>>> interesting pages, so that page isolation can succeed for any
>>>>> non-overlapping ranges.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>  /*
>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>> + * consequently belong to a single zone.
>>>>>   *
>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>> @@ -28,12 +30,14 @@
>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>   *
>>>>>   */
>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>> -				 int migratetype, int flags)
>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>> +				int migratetype, int flags)
>>>>>  {
>>>>> -	unsigned long iter = 0;
>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>> +	unsigned long pfn = start_pfn;
>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>
>>>>
>>>> Just do
>>>>
>>>> struct page *page = pfn_to_page(start_pfn);
>>>> struct zone *zone = page_zone(page);
>>>>
>>>> here. No need to lookup the zone again in the loop because, as you
>>>> document "must ... belong to a single zone.".
>>>>
>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>> is sufficient.
>>>>
>>>
>>> Sure.
>>>
>>>>> +
>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>
>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>  		/*
>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		return page;
>>>>>  	}
>>>>>
>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>> -		page = pfn_to_page(pfn + iter);
>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>> +		struct zone *zone;
>>>>> +
>>>>> +		page = pfn_to_page(pfn);
>>>>> +		zone = page_zone(page);
>>>>>
>>>>>  		/*
>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  			}
>>>>>
>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>> -			iter += skip_pages - 1;
>>>>> +			pfn += skip_pages - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		 */
>>>>>  		if (!page_ref_count(page)) {
>>>>>  			if (PageBuddy(page))
>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	return NULL;
>>>>>  }
>>>>>
>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>> +/*
>>>>> + * This function set pageblock migratetype to isolate if no unmovable page is
>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>> + * [start_pfn, end_pfn).
>>>>> + */
>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>
>>>> I think we might be able do better, eventually not passing start_pfn at
>>>> all. Hmm.
>>>
>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>> interface easier to understand. Otherwise if we remove start_pfn,
>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>> end_pfn)
>>>
>>>>
>>>> I think we want to pull out the
>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>> into a separate patch.
>>>
>>> You mean a patch just adding
>>>
>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>
>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>
>>> Yes I can do that.
>>
>> I think we have to be careful with memory onlining/offlining. There are
>> corner cases where we get called with only pageblock alignment and
>> must not adjust the range.
> 
> In the patch below, you added a new set of start_isolate_pageblocks()
> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
> calls set_migratetype_isolate() and noted their range should not be
> adjusted. But in my patch, set_migratetype_isolate() adjusts
> the range for has_unmovable_pages() check. Do you mean

Right, that's broken in your patch. Memory onlining/offlining behavior
changed recently when "vmemmap on memory" was added. The start range
might only be aligned to pageblocks but not MAX_ORDER -1 -- and we must
not align u..

The core thing is that there are two types of users: memory offlining
that knows what it's doing when it aligns to less then MAX_ORDER -1 ,
and range allocators, that just pass in the range of interest.

> start_isolate_pageblocks() should call a different version of
> set_migratetype_isolate() without range adjustment? That can be done
> with an additional parameter in start_isolate_page_range(), like
> bool strict, right?

Random boolean flags are in general frowned upon ;)

-- 
Thanks,

David / dhildenb


WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-kernel@vger.kernel.org,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
	Mike Rapoport <rppt@kernel.org>, Eric Ren <renzhengeek@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages
Date: Tue, 12 Apr 2022 17:06:01 +0200	[thread overview]
Message-ID: <afa115e6-d2ae-eeff-ba31-f8b2a5715b95@redhat.com> (raw)
In-Reply-To: <B4336C2E-E728-49E2-B84B-5728B9A1EA69@nvidia.com>

On 12.04.22 17:01, Zi Yan wrote:
> On 12 Apr 2022, at 10:49, David Hildenbrand wrote:
> 
>> On 12.04.22 16:07, Zi Yan wrote:
>>> On 12 Apr 2022, at 9:10, David Hildenbrand wrote:
>>>
>>>> On 06.04.22 17:18, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> Enable set_migratetype_isolate() to check specified sub-range for
>>>>> unmovable pages during isolation. Page isolation is done
>>>>> at MAX_ORDER_NR_PAEGS granularity, but not all pages within that
>>>>> granularity are intended to be isolated. For example,
>>>>> alloc_contig_range(), which uses page isolation, allows ranges without
>>>>> alignment. This commit makes unmovable page check only look for
>>>>> interesting pages, so that page isolation can succeed for any
>>>>> non-overlapping ranges.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>  /*
>>>>> - * This function checks whether pageblock includes unmovable pages or not.
>>>>> + * This function checks whether the range [start_pfn, end_pfn) includes
>>>>> + * unmovable pages or not. The range must fall into a single pageblock and
>>>>> + * consequently belong to a single zone.
>>>>>   *
>>>>>   * PageLRU check without isolation or lru_lock could race so that
>>>>>   * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
>>>>> @@ -28,12 +30,14 @@
>>>>>   * cannot get removed (e.g., via memory unplug) concurrently.
>>>>>   *
>>>>>   */
>>>>> -static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>> -				 int migratetype, int flags)
>>>>> +static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
>>>>> +				int migratetype, int flags)
>>>>>  {
>>>>> -	unsigned long iter = 0;
>>>>> -	unsigned long pfn = page_to_pfn(page);
>>>>> -	unsigned long offset = pfn % pageblock_nr_pages;
>>>>> +	unsigned long pfn = start_pfn;
>>>>> +	struct page *page = pfn_to_page(pfn);
>>>>
>>>>
>>>> Just do
>>>>
>>>> struct page *page = pfn_to_page(start_pfn);
>>>> struct zone *zone = page_zone(page);
>>>>
>>>> here. No need to lookup the zone again in the loop because, as you
>>>> document "must ... belong to a single zone.".
>>>>
>>>> Then, there is also no need to initialize "pfn" here. In the loop header
>>>> is sufficient.
>>>>
>>>
>>> Sure.
>>>
>>>>> +
>>>>> +	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
>>>>> +		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
>>>>>
>>>>>  	if (is_migrate_cma_page(page)) {
>>>>>  		/*
>>>>> @@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		return page;
>>>>>  	}
>>>>>
>>>>> -	for (; iter < pageblock_nr_pages - offset; iter++) {
>>>>> -		page = pfn_to_page(pfn + iter);
>>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>>> +		struct zone *zone;
>>>>> +
>>>>> +		page = pfn_to_page(pfn);
>>>>> +		zone = page_zone(page);
>>>>>
>>>>>  		/*
>>>>>  		 * Both, bootmem allocations and memory holes are marked
>>>>> @@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  			}
>>>>>
>>>>>  			skip_pages = compound_nr(head) - (page - head);
>>>>> -			iter += skip_pages - 1;
>>>>> +			pfn += skip_pages - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  		 */
>>>>>  		if (!page_ref_count(page)) {
>>>>>  			if (PageBuddy(page))
>>>>> -				iter += (1 << buddy_order(page)) - 1;
>>>>> +				pfn += (1 << buddy_order(page)) - 1;
>>>>>  			continue;
>>>>>  		}
>>>>>
>>>>> @@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>>  	return NULL;
>>>>>  }
>>>>>
>>>>> -static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>>>> +/*
>>>>> + * This function set pageblock migratetype to isolate if no unmovable page is
>>>>> + * present in [start_pfn, end_pfn). The pageblock must intersect with
>>>>> + * [start_pfn, end_pfn).
>>>>> + */
>>>>> +static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
>>>>> +			unsigned long start_pfn, unsigned long end_pfn)
>>>>
>>>> I think we might be able do better, eventually not passing start_pfn at
>>>> all. Hmm.
>>>
>>> IMHO, having start_pfn and end_pfn in the parameter list would make the
>>> interface easier to understand. Otherwise if we remove start_pfn,
>>> the caller needs to adjust @page to be within the range of [start_pfn,
>>> end_pfn)
>>>
>>>>
>>>> I think we want to pull out the
>>>> start_isolate_page_range()/undo_isolate_page_range() interface change
>>>> into a separate patch.
>>>
>>> You mean a patch just adding
>>>
>>> unsigned long isolate_start = pfn_max_align_down(start_pfn);
>>> unsigned long isolate_end = pfn_max_align_up(end_pfn);
>>>
>>> in start_isolate_page_range()/undo_isolate_page_range()?
>>>
>>> Yes I can do that.
>>
>> I think we have to be careful with memory onlining/offlining. There are
>> corner cases where we get called with only pageblock alignment and
>> must not adjust the range.
> 
> In the patch below, you added a new set of start_isolate_pageblocks()
> and undo_isolate_pageblocks(), where start_isolate_pageblocks() still
> calls set_migratetype_isolate() and noted their range should not be
> adjusted. But in my patch, set_migratetype_isolate() adjusts
> the range for has_unmovable_pages() check. Do you mean

Right, that's broken in your patch. Memory onlining/offlining behavior
changed recently when "vmemmap on memory" was added. The start range
might only be aligned to pageblocks but not MAX_ORDER -1 -- and we must
not align u..

The core thing is that there are two types of users: memory offlining
that knows what it's doing when it aligns to less then MAX_ORDER -1 ,
and range allocators, that just pass in the range of interest.

> start_isolate_pageblocks() should call a different version of
> set_migratetype_isolate() without range adjustment? That can be done
> with an additional parameter in start_isolate_page_range(), like
> bool strict, right?

Random boolean flags are in general frowned upon ;)

-- 
Thanks,

David / dhildenb

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2022-04-12 15:06 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06 15:18 [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-04-06 15:18 ` [PATCH v10 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-04-06 15:18 ` [PATCH v10 2/5] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-04-12 13:10   ` David Hildenbrand
2022-04-12 13:10     ` David Hildenbrand
2022-04-12 14:07     ` Zi Yan
2022-04-12 14:49       ` David Hildenbrand
2022-04-12 14:49         ` David Hildenbrand
2022-04-12 15:01         ` Zi Yan
2022-04-12 15:06           ` David Hildenbrand [this message]
2022-04-12 15:06             ` David Hildenbrand
2022-04-12 17:41             ` Zi Yan
2022-04-12 19:34               ` Zi Yan
2022-04-06 15:18 ` [PATCH v10 3/5] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-04-06 15:18 ` [PATCH v10 4/5] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-04-06 15:18 ` [PATCH v10 5/5] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-04-12 12:35 ` [PATCH v10 0/5] Use pageblock_order for cma and alloc_contig_range alignment David Hildenbrand
2022-04-12 12:35   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afa115e6-d2ae-eeff-ba31-f8b2a5715b95@redhat.com \
    --to=david@redhat.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=osalvador@suse.de \
    --cc=renzhengeek@gmail.com \
    --cc=rppt@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.