All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: Doug Berger <opendmb@gmail.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Andrew Morton <akpm@linux-foundation.org>,
	kernel test robot <lkp@intel.com>,
	Qian Cai <quic_qiancai@quicinc.com>
Subject: Re: [PATCH v11 3/6] mm: make alloc_contig_range work at pageblock granularity
Date: Wed, 25 May 2022 13:53:04 -0400	[thread overview]
Message-ID: <F80CEC0E-0EA8-4210-8730-57D4D0CF0B23@nvidia.com> (raw)
In-Reply-To: <180aaa57-28d8-30f0-e843-ea52e3a180a8@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 26078 bytes --]

On 25 May 2022, at 13:41, Doug Berger wrote:

> I am seeing some free memory accounting problems with linux-next that I have bisected to this commit (i.e. b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity").
>
> On an arm64 SMP platform with 4GB total memory and the default 16MB default CMA pool, I am seeing the following after boot with a sysrq Show Memory (e.g. 'echo m > /proc/sysrq-trigger'):
>
> [   16.015906] sysrq: Show Memory
> [   16.019039] Mem-Info:
> [   16.021348] active_anon:14604 inactive_anon:919 isolated_anon:0
> [   16.021348]  active_file:0 inactive_file:0 isolated_file:0
> [   16.021348]  unevictable:0 dirty:0 writeback:0
> [   16.021348]  slab_reclaimable:3662 slab_unreclaimable:3333
> [   16.021348]  mapped:928 shmem:15146 pagetables:63 bounce:0
> [   16.021348]  kernel_misc_reclaimable:0
> [   16.021348]  free:976766 free_pcp:991 free_cma:7017
> [   16.056937] Node 0 active_anon:58416kB inactive_anon:3676kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3712kB dirty:0kB writeback:0kB shmem:60584kB writeback_tmp:0kB kernel_stack:1200kB pagetables:252kB all_unreclaimable? no
> [   16.081526] DMA free:3041036kB boost:0kB min:6036kB low:9044kB high:12052kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145728kB managed:3029992kB mlocked:0kB bounce:0kB free_pcp:636kB local_pcp:0kB free_cma:28068kB
> [   16.108650] lowmem_reserve[]: 0 0 944 944
> [   16.112746] Normal free:866028kB boost:0kB min:1936kB low:2900kB high:3864kB reserved_highatomic:0KB active_anon:58416kB inactive_anon:3676kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:967352kB mlocked:0kB bounce:0kB free_pcp:3328kB local_pcp:864kB free_cma:0kB
> [   16.140393] lowmem_reserve[]: 0 0 0 0
> [   16.144133] DMA: 7*4kB (UMC) 4*8kB (M) 3*16kB (M) 3*32kB (MC) 5*64kB (M) 4*128kB (MC) 5*256kB (UMC) 7*512kB (UM) 5*1024kB (UM) 9*2048kB (UMC) 732*4096kB (MC) = 3027724kB
> [   16.159609] Normal: 149*4kB (UM) 95*8kB (UME) 26*16kB (UME) 8*32kB (ME) 2*64kB (UE) 1*128kB (M) 2*256kB (ME) 2*512kB (ME) 2*1024kB (UM) 0*2048kB 210*4096kB (M) = 866028kB
> [   16.175165] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> [   16.183937] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
> [   16.192533] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [   16.201040] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
> [   16.209374] 15146 total pagecache pages
> [   16.213246] 0 pages in swap cache
> [   16.216595] Swap cache stats: add 0, delete 0, find 0/0
> [   16.221867] Free swap  = 0kB
> [   16.224780] Total swap = 0kB
> [   16.227693] 1048576 pages RAM
> [   16.230694] 0 pages HighMem/MovableOnly
> [   16.234564] 49240 pages reserved
> [   16.237825] 4096 pages cma reserved
>
> Some anomolies in the above are:
> free_cma:7017 with only 4096 pages cma reserved
> DMA free:3041036kB with only managed:3029992kB
>
> I'm not sure what is going on here, but I am suspicious of split_free_page() since del_page_from_free_list doesn't affect migrate_type accounting, but __free_one_page() can.
> Also PageBuddy(page) is being checked without zone->lock in isolate_single_pageblock().
>
> Please investigate this as well.


Can you try this patch https://lore.kernel.org/linux-mm/20220524194756.1698351-1-zi.yan@sent.com/
and see if it fixes the issue?

Thanks.

>
> Thanks!
>     Doug
>
> On 4/29/2022 6:54 AM, Zi Yan wrote:
>> On 25 Apr 2022, at 10:31, Zi Yan wrote:
>>
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid
>>> merging pageblocks with different migratetypes. It might unnecessarily
>>> convert extra pageblocks at the beginning and at the end of the range.
>>> Change alloc_contig_range() to work at pageblock granularity.
>>>
>>> Special handling is needed for free pages and in-use pages across the
>>> boundaries of the range specified by alloc_contig_range(). Because these
>>> partially isolated pages causes free page accounting issues. The free
>>> pages will be split and freed into separate migratetype lists; the
>>> in-use pages will be migrated then the freed pages will be handled in
>>> the aforementioned way.
>>>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>   include/linux/page-isolation.h |   4 +-
>>>   mm/internal.h                  |   6 ++
>>>   mm/memory_hotplug.c            |   3 +-
>>>   mm/page_alloc.c                |  54 ++++++++--
>>>   mm/page_isolation.c            | 184 ++++++++++++++++++++++++++++++++-
>>>   5 files changed, 233 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>> index e14eddf6741a..5456b7be38ae 100644
>>> --- a/include/linux/page-isolation.h
>>> +++ b/include/linux/page-isolation.h
>>> @@ -42,7 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
>>>    */
>>>   int
>>>   start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>> -			 unsigned migratetype, int flags);
>>> +			 int migratetype, int flags, gfp_t gfp_flags);
>>>
>>>   /*
>>>    * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
>>> @@ -50,7 +50,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>    */
>>>   void
>>>   undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>> -			unsigned migratetype);
>>> +			int migratetype);
>>>
>>>   /*
>>>    * Test all pages in [start_pfn, end_pfn) are isolated or not.
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 919fa07e1031..0667abd57634 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -359,6 +359,9 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
>>>   			  phys_addr_t min_addr,
>>>   			  int nid, bool exact_nid);
>>>
>>> +void split_free_page(struct page *free_page,
>>> +				int order, unsigned long split_pfn_offset);
>>> +
>>>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>>
>>>   /*
>>> @@ -422,6 +425,9 @@ isolate_freepages_range(struct compact_control *cc,
>>>   int
>>>   isolate_migratepages_range(struct compact_control *cc,
>>>   			   unsigned long low_pfn, unsigned long end_pfn);
>>> +
>>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>>> +					unsigned long start, unsigned long end);
>>>   #endif
>>>   int find_suitable_fallback(struct free_area *area, unsigned int order,
>>>   			int migratetype, bool only_stealable, bool *can_steal);
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 4c6065e5d274..9f8ae4cb77ee 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1845,7 +1845,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>   	/* set above range as isolated */
>>>   	ret = start_isolate_page_range(start_pfn, end_pfn,
>>>   				       MIGRATE_MOVABLE,
>>> -				       MEMORY_OFFLINE | REPORT_FAILURE);
>>> +				       MEMORY_OFFLINE | REPORT_FAILURE,
>>> +				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>>>   	if (ret) {
>>>   		reason = "failure to isolate range";
>>>   		goto failed_removal_pcplists_disabled;
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ce23ac8ad085..70ddd9a0bcf3 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1094,6 +1094,43 @@ static inline void __free_one_page(struct page *page,
>>>   		page_reporting_notify_free(order);
>>>   }
>>>
>>> +/**
>>> + * split_free_page() -- split a free page at split_pfn_offset
>>> + * @free_page:		the original free page
>>> + * @order:		the order of the page
>>> + * @split_pfn_offset:	split offset within the page
>>> + *
>>> + * It is used when the free page crosses two pageblocks with different migratetypes
>>> + * at split_pfn_offset within the page. The split free page will be put into
>>> + * separate migratetype lists afterwards. Otherwise, the function achieves
>>> + * nothing.
>>> + */
>>> +void split_free_page(struct page *free_page,
>>> +				int order, unsigned long split_pfn_offset)
>>> +{
>>> +	struct zone *zone = page_zone(free_page);
>>> +	unsigned long free_page_pfn = page_to_pfn(free_page);
>>> +	unsigned long pfn;
>>> +	unsigned long flags;
>>> +	int free_page_order;
>>> +
>>> +	spin_lock_irqsave(&zone->lock, flags);
>>> +	del_page_from_free_list(free_page, zone, order);
>>> +	for (pfn = free_page_pfn;
>>> +	     pfn < free_page_pfn + (1UL << order);) {
>>> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>>> +
>>> +		free_page_order = ffs(split_pfn_offset) - 1;
>>> +		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
>>> +				mt, FPI_NONE);
>>> +		pfn += 1UL << free_page_order;
>>> +		split_pfn_offset -= (1UL << free_page_order);
>>> +		/* we have done the first part, now switch to second part */
>>> +		if (split_pfn_offset == 0)
>>> +			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
>>> +	}
>>> +	spin_unlock_irqrestore(&zone->lock, flags);
>>> +}
>>>   /*
>>>    * A bad page could be due to a number of fields. Instead of multiple branches,
>>>    * try and check multiple fields with one check. The caller must do a detailed
>>> @@ -8919,7 +8956,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
>>>   #endif
>>>
>>>   /* [start, end) must belong to a single zone. */
>>> -static int __alloc_contig_migrate_range(struct compact_control *cc,
>>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>>>   					unsigned long start, unsigned long end)
>>>   {
>>>   	/* This function is based on compact_zone() from compaction.c. */
>>> @@ -9002,7 +9039,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>>   		       unsigned migratetype, gfp_t gfp_mask)
>>>   {
>>>   	unsigned long outer_start, outer_end;
>>> -	unsigned int order;
>>> +	int order;
>>>   	int ret = 0;
>>>
>>>   	struct compact_control cc = {
>>> @@ -9021,14 +9058,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>>   	 * What we do here is we mark all pageblocks in range as
>>>   	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>>>   	 * have different sizes, and due to the way page allocator
>>> -	 * work, we align the range to biggest of the two pages so
>>> -	 * that page allocator won't try to merge buddies from
>>> -	 * different pageblocks and change MIGRATE_ISOLATE to some
>>> -	 * other migration type.
>>> +	 * work, start_isolate_page_range() has special handlings for this.
>>>   	 *
>>>   	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
>>>   	 * migrate the pages from an unaligned range (ie. pages that
>>> -	 * we are interested in).  This will put all the pages in
>>> +	 * we are interested in). This will put all the pages in
>>>   	 * range back to page allocator as MIGRATE_ISOLATE.
>>>   	 *
>>>   	 * When this is done, we take the pages in range from page
>>> @@ -9042,9 +9076,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>>   	 */
>>>
>>>   	ret = start_isolate_page_range(pfn_max_align_down(start),
>>> -				       pfn_max_align_up(end), migratetype, 0);
>>> +				pfn_max_align_up(end), migratetype, 0, gfp_mask);
>>>   	if (ret)
>>> -		return ret;
>>> +		goto done;
>>>
>>>   	drain_all_pages(cc.zone);
>>>
>>> @@ -9064,7 +9098,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>>   	ret = 0;
>>>
>>>   	/*
>>> -	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
>>> +	 * Pages from [start, end) are within a pageblock_nr_pages
>>>   	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
>>>   	 * more, all pages in [start, end) are free in page allocator.
>>>   	 * What we are going to do is to allocate all pages from
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index c2f7a8bb634d..94b3467e5ba2 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -203,7 +203,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>>   	return -EBUSY;
>>>   }
>>>
>>> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>> +static void unset_migratetype_isolate(struct page *page, int migratetype)
>>>   {
>>>   	struct zone *zone;
>>>   	unsigned long flags, nr_pages;
>>> @@ -279,6 +279,157 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>>   	return NULL;
>>>   }
>>>
>>> +/**
>>> + * isolate_single_pageblock() -- tries to isolate a pageblock that might be
>>> + * within a free or in-use page.
>>> + * @boundary_pfn:		pageblock-aligned pfn that a page might cross
>>> + * @gfp_flags:			GFP flags used for migrating pages
>>> + * @isolate_before:	isolate the pageblock before the boundary_pfn
>>> + *
>>> + * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
>>> + * pageblock. When not all pageblocks within a page are isolated at the same
>>> + * time, free page accounting can go wrong. For example, in the case of
>>> + * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pagelbocks.
>>> + * [         MAX_ORDER-1         ]
>>> + * [  pageblock0  |  pageblock1  ]
>>> + * When either pageblock is isolated, if it is a free page, the page is not
>>> + * split into separate migratetype lists, which is supposed to; if it is an
>>> + * in-use page and freed later, __free_one_page() does not split the free page
>>> + * either. The function handles this by splitting the free page or migrating
>>> + * the in-use page then splitting the free page.
>>> + */
>>> +static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>>> +			bool isolate_before)
>>> +{
>>> +	unsigned char saved_mt;
>>> +	unsigned long start_pfn;
>>> +	unsigned long isolate_pageblock;
>>> +	unsigned long pfn;
>>> +	struct zone *zone;
>>> +
>>> +	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
>>> +
>>> +	if (isolate_before)
>>> +		isolate_pageblock = boundary_pfn - pageblock_nr_pages;
>>> +	else
>>> +		isolate_pageblock = boundary_pfn;
>>> +
>>> +	/*
>>> +	 * scan at the beginning of MAX_ORDER_NR_PAGES aligned range to avoid
>>> +	 * only isolating a subset of pageblocks from a bigger than pageblock
>>> +	 * free or in-use page. Also make sure all to-be-isolated pageblocks
>>> +	 * are within the same zone.
>>> +	 */
>>> +	zone  = page_zone(pfn_to_page(isolate_pageblock));
>>> +	start_pfn  = max(ALIGN_DOWN(isolate_pageblock, MAX_ORDER_NR_PAGES),
>>> +				      zone->zone_start_pfn);
>>> +
>>> +	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
>>> +	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
>>> +
>>> +	/*
>>> +	 * Bail out early when the to-be-isolated pageblock does not form
>>> +	 * a free or in-use page across boundary_pfn:
>>> +	 *
>>> +	 * 1. isolate before boundary_pfn: the page after is not online
>>> +	 * 2. isolate after boundary_pfn: the page before is not online
>>> +	 *
>>> +	 * This also ensures correctness. Without it, when isolate after
>>> +	 * boundary_pfn and [start_pfn, boundary_pfn) are not online,
>>> +	 * __first_valid_page() will return unexpected NULL in the for loop
>>> +	 * below.
>>> +	 */
>>> +	if (isolate_before) {
>>> +		if (!pfn_to_online_page(boundary_pfn))
>>> +			return 0;
>>> +	} else {
>>> +		if (!pfn_to_online_page(boundary_pfn - 1))
>>> +			return 0;
>>> +	}
>>> +
>>> +	for (pfn = start_pfn; pfn < boundary_pfn;) {
>>> +		struct page *page = __first_valid_page(pfn, boundary_pfn - pfn);
>>> +
>>> +		VM_BUG_ON(!page);
>>> +		pfn = page_to_pfn(page);
>>> +		/*
>>> +		 * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any
>>> +		 * free pages in [start_pfn, boundary_pfn), its head page will
>>> +		 * always be in the range.
>>> +		 */
>>> +		if (PageBuddy(page)) {
>>> +			int order = buddy_order(page);
>>> +
>>> +			if (pfn + (1UL << order) > boundary_pfn)
>>> +				split_free_page(page, order, boundary_pfn - pfn);
>>> +			pfn += (1UL << order);
>>> +			continue;
>>> +		}
>>> +		/*
>>> +		 * migrate compound pages then let the free page handling code
>>> +		 * above do the rest. If migration is not enabled, just fail.
>>> +		 */
>>> +		if (PageHuge(page) || PageTransCompound(page)) {
>>> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>> +			unsigned long nr_pages = compound_nr(page);
>>> +			int order = compound_order(page);
>>> +			struct page *head = compound_head(page);
>>> +			unsigned long head_pfn = page_to_pfn(head);
>>> +			int ret;
>>> +			struct compact_control cc = {
>>> +				.nr_migratepages = 0,
>>> +				.order = -1,
>>> +				.zone = page_zone(pfn_to_page(head_pfn)),
>>> +				.mode = MIGRATE_SYNC,
>>> +				.ignore_skip_hint = true,
>>> +				.no_set_skip_hint = true,
>>> +				.gfp_mask = gfp_flags,
>>> +				.alloc_contig = true,
>>> +			};
>>> +			INIT_LIST_HEAD(&cc.migratepages);
>>> +
>>> +			if (head_pfn + nr_pages < boundary_pfn) {
>>> +				pfn += nr_pages;
>>> +				continue;
>>> +			}
>>> +
>>> +			ret = __alloc_contig_migrate_range(&cc, head_pfn,
>>> +						head_pfn + nr_pages);
>>> +
>>> +			if (ret)
>>> +				goto failed;
>>> +			/*
>>> +			 * reset pfn, let the free page handling code above
>>> +			 * split the free page to the right migratetype list.
>>> +			 *
>>> +			 * head_pfn is not used here as a hugetlb page order
>>> +			 * can be bigger than MAX_ORDER-1, but after it is
>>> +			 * freed, the free page order is not. Use pfn within
>>> +			 * the range to find the head of the free page and
>>> +			 * reset order to 0 if a hugetlb page with
>>> +			 * >MAX_ORDER-1 order is encountered.
>>> +			 */
>>> +			if (order > MAX_ORDER-1)
>>> +				order = 0;
>>> +			while (!PageBuddy(pfn_to_page(pfn))) {
>>> +				order++;
>>> +				pfn &= ~0UL << order;
>>> +			}
>>> +			continue;
>>> +#else
>>> +			goto failed;
>>> +#endif
>>> +		}
>>> +
>>> +		pfn++;
>>> +	}
>>> +	return 0;
>>> +failed:
>>> +	/* restore the original migratetype */
>>> +	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
>>> +	return -EBUSY;
>>> +}
>>> +
>>>   /**
>>>    * start_isolate_page_range() - make page-allocation-type of range of pages to
>>>    * be MIGRATE_ISOLATE.
>>> @@ -293,6 +444,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>>    *					 and PageOffline() pages.
>>>    *			REPORT_FAILURE - report details about the failure to
>>>    *			isolate the range
>>> + * @gfp_flags:		GFP flags used for migrating pages that sit across the
>>> + *			range boundaries.
>>>    *
>>>    * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
>>>    * the range will never be allocated. Any free pages and pages freed in the
>>> @@ -301,6 +454,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>>    * pages in the range finally, the caller have to free all pages in the range.
>>>    * test_page_isolated() can be used for test it.
>>>    *
>>> + * The function first tries to isolate the pageblocks at the beginning and end
>>> + * of the range, since there might be pages across the range boundaries.
>>> + * Afterwards, it isolates the rest of the range.
>>> + *
>>>    * There is no high level synchronization mechanism that prevents two threads
>>>    * from trying to isolate overlapping ranges. If this happens, one thread
>>>    * will notice pageblocks in the overlapping range already set to isolate.
>>> @@ -321,21 +478,38 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>>    * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
>>>    */
>>>   int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>> -			     unsigned migratetype, int flags)
>>> +			     int migratetype, int flags, gfp_t gfp_flags)
>>>   {
>>>   	unsigned long pfn;
>>>   	struct page *page;
>>> +	int ret;
>>>
>>>   	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>>   	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>>>
>>> -	for (pfn = start_pfn;
>>> -	     pfn < end_pfn;
>>> +	/* isolate [start_pfn, start_pfn + pageblock_nr_pages) pageblock */
>>> +	ret = isolate_single_pageblock(start_pfn, gfp_flags, false);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* isolate [end_pfn - pageblock_nr_pages, end_pfn) pageblock */
>>> +	ret = isolate_single_pageblock(end_pfn, gfp_flags, true);
>>> +	if (ret) {
>>> +		unset_migratetype_isolate(pfn_to_page(start_pfn), migratetype);
>>> +		return ret;
>>> +	}
>>> +
>>> +	/* skip isolated pageblocks at the beginning and end */
>>> +	for (pfn = start_pfn + pageblock_nr_pages;
>>> +	     pfn < end_pfn - pageblock_nr_pages;
>>>   	     pfn += pageblock_nr_pages) {
>>>   		page = __first_valid_page(pfn, pageblock_nr_pages);
>>>   		if (page && set_migratetype_isolate(page, migratetype, flags,
>>>   					start_pfn, end_pfn)) {
>>>   			undo_isolate_page_range(start_pfn, pfn, migratetype);
>>> +			unset_migratetype_isolate(
>>> +				pfn_to_page(end_pfn - pageblock_nr_pages),
>>> +				migratetype);
>>>   			return -EBUSY;
>>>   		}
>>>   	}
>>> @@ -346,7 +520,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>    * Make isolated pages available again.
>>>    */
>>>   void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>> -			    unsigned migratetype)
>>> +			    int migratetype)
>>>   {
>>>   	unsigned long pfn;
>>>   	struct page *page;
>>> -- 
>>> 2.35.1
>>
>> Qian hit a bug caused by this series https://lore.kernel.org/linux-mm/20220426201855.GA1014@qian/
>> and the fix is:
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 75e454f5cf45..b3f074d1682e 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -367,58 +367,67 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>>   		}
>>   		/*
>>   		 * migrate compound pages then let the free page handling code
>> -		 * above do the rest. If migration is not enabled, just fail.
>> +		 * above do the rest. If migration is not possible, just fail.
>>   		 */
>> -		if (PageHuge(page) || PageTransCompound(page)) {
>> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> +		if (PageCompound(page)) {
>>   			unsigned long nr_pages = compound_nr(page);
>> -			int order = compound_order(page);
>>   			struct page *head = compound_head(page);
>>   			unsigned long head_pfn = page_to_pfn(head);
>> -			int ret;
>> -			struct compact_control cc = {
>> -				.nr_migratepages = 0,
>> -				.order = -1,
>> -				.zone = page_zone(pfn_to_page(head_pfn)),
>> -				.mode = MIGRATE_SYNC,
>> -				.ignore_skip_hint = true,
>> -				.no_set_skip_hint = true,
>> -				.gfp_mask = gfp_flags,
>> -				.alloc_contig = true,
>> -			};
>> -			INIT_LIST_HEAD(&cc.migratepages);
>>
>>   			if (head_pfn + nr_pages < boundary_pfn) {
>> -				pfn += nr_pages;
>> +				pfn = head_pfn + nr_pages;
>>   				continue;
>>   			}
>> -
>> -			ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> -						head_pfn + nr_pages);
>> -
>> -			if (ret)
>> -				goto failed;
>> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>   			/*
>> -			 * reset pfn, let the free page handling code above
>> -			 * split the free page to the right migratetype list.
>> -			 *
>> -			 * head_pfn is not used here as a hugetlb page order
>> -			 * can be bigger than MAX_ORDER-1, but after it is
>> -			 * freed, the free page order is not. Use pfn within
>> -			 * the range to find the head of the free page and
>> -			 * reset order to 0 if a hugetlb page with
>> -			 * >MAX_ORDER-1 order is encountered.
>> +			 * hugetlb, lru compound (THP), and movable compound pages
>> +			 * can be migrated. Otherwise, fail the isolation.
>>   			 */
>> -			if (order > MAX_ORDER-1)
>> +			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
>> +				int order;
>> +				unsigned long outer_pfn;
>> +				int ret;
>> +				struct compact_control cc = {
>> +					.nr_migratepages = 0,
>> +					.order = -1,
>> +					.zone = page_zone(pfn_to_page(head_pfn)),
>> +					.mode = MIGRATE_SYNC,
>> +					.ignore_skip_hint = true,
>> +					.no_set_skip_hint = true,
>> +					.gfp_mask = gfp_flags,
>> +					.alloc_contig = true,
>> +				};
>> +				INIT_LIST_HEAD(&cc.migratepages);
>> +
>> +				ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> +							head_pfn + nr_pages);
>> +
>> +				if (ret)
>> +					goto failed;
>> +				/*
>> +				 * reset pfn to the head of the free page, so
>> +				 * that the free page handling code above can split
>> +				 * the free page to the right migratetype list.
>> +				 *
>> +				 * head_pfn is not used here as a hugetlb page order
>> +				 * can be bigger than MAX_ORDER-1, but after it is
>> +				 * freed, the free page order is not. Use pfn within
>> +				 * the range to find the head of the free page.
>> +				 */
>>   				order = 0;
>> -			while (!PageBuddy(pfn_to_page(pfn))) {
>> -				order++;
>> -				pfn &= ~0UL << order;
>> -			}
>> -			continue;
>> -#else
>> -			goto failed;
>> +				outer_pfn = pfn;
>> +				while (!PageBuddy(pfn_to_page(outer_pfn))) {
>> +					if (++order >= MAX_ORDER) {
>> +						outer_pfn = pfn;
>> +						break;
>> +					}
>> +					outer_pfn &= ~0UL << order;
>> +				}
>> +				pfn = outer_pfn;
>> +				continue;
>> +			} else
>>   #endif
>> +				goto failed;
>>   		}
>>
>>   		pfn++;

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

  reply	other threads:[~2022-05-25 17:53 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-25 14:31 [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment Zi Yan
2022-04-25 14:31 ` [PATCH v11 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Zi Yan
2022-04-25 14:31 ` [PATCH v11 2/6] mm: page_isolation: check specified range for unmovable pages Zi Yan
2022-04-25 14:31 ` [PATCH v11 3/6] mm: make alloc_contig_range work at pageblock granularity Zi Yan
2022-04-29 13:54   ` Zi Yan
2022-05-24 19:00     ` Zi Yan
2022-05-25 17:41     ` Doug Berger
2022-05-25 17:53       ` Zi Yan [this message]
2022-05-25 21:03         ` Doug Berger
2022-05-25 21:11           ` Zi Yan
2022-05-26 17:34             ` Zi Yan
2022-05-26 19:46               ` Doug Berger
2022-04-25 14:31 ` [PATCH v11 4/6] mm: page_isolation: enable arbitrary range page isolation Zi Yan
2022-05-24 19:02   ` Zi Yan
2022-04-25 14:31 ` [PATCH v11 5/6] mm: cma: use pageblock_order as the single alignment Zi Yan
2022-04-25 14:31 ` [PATCH v11 6/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size Zi Yan
2022-04-26 20:18 ` [PATCH v11 0/6] Use pageblock_order for cma and alloc_contig_range alignment Qian Cai
2022-04-26 20:26   ` Zi Yan
2022-04-26 21:08     ` Qian Cai
2022-04-26 21:38       ` Zi Yan
2022-04-27 12:41         ` Qian Cai
2022-04-27 13:10         ` Qian Cai
2022-04-27 13:27         ` Qian Cai
2022-04-27 13:30           ` Zi Yan
2022-04-27 21:04             ` Zi Yan
2022-04-28 12:33               ` Qian Cai
2022-04-28 12:39                 ` Zi Yan
2022-04-28 16:19                   ` Qian Cai
2022-04-29 13:38                     ` Zi Yan
2022-05-19 20:57                   ` Qian Cai
2022-05-19 21:35                     ` Zi Yan
2022-05-19 23:24                       ` Zi Yan
2022-05-20 11:30                       ` Qian Cai
2022-05-20 13:43                         ` Zi Yan
2022-05-20 14:13                           ` Zi Yan
2022-05-20 19:41                             ` Qian Cai
2022-05-20 21:56                               ` Zi Yan
2022-05-20 23:41                                 ` Qian Cai
2022-05-22 16:54                                   ` Zi Yan
2022-05-22 19:33                                     ` Zi Yan
2022-05-24 16:59                                     ` Qian Cai
2022-05-10  1:03 ` Andrew Morton
2022-05-10  1:03   ` Andrew Morton
2022-05-10  1:07   ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F80CEC0E-0EA8-4210-8730-57D4D0CF0B23@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=opendmb@gmail.com \
    --cc=osalvador@suse.de \
    --cc=quic_qiancai@quicinc.com \
    --cc=renzhengeek@gmail.com \
    --cc=rppt@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.