From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D30FAC433F5
	for <linux-mm@archiver.kernel.org>; Wed, 25 May 2022 17:41:05 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 59F798D0005; Wed, 25 May 2022 13:41:05 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 54EF88D0002; Wed, 25 May 2022 13:41:05 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 3E9AD8D0005; Wed, 25 May 2022 13:41:05 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12])
	by kanga.kvack.org (Postfix) with ESMTP id 2DC1C8D0002
	for <linux-mm@kvack.org>; Wed, 25 May 2022 13:41:05 -0400 (EDT)
Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay11.hostedemail.com (Postfix) with ESMTP id F03C4812F2
	for <linux-mm@kvack.org>; Wed, 25 May 2022 17:41:04 +0000 (UTC)
X-FDA: 79504981248.25.45DE994
Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181])
	by imf15.hostedemail.com (Postfix) with ESMTP id AFCC7A0002
	for <linux-mm@kvack.org>; Wed, 25 May 2022 17:40:44 +0000 (UTC)
Received: by mail-pl1-f181.google.com with SMTP id s14so19168450plk.8
        for <linux-mm@kvack.org>; Wed, 25 May 2022 10:41:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=message-id:date:mime-version:user-agent:subject:content-language:to
         :cc:references:from:in-reply-to:content-transfer-encoding;
        bh=94skciStQ5ymW9DUAo7HpPJI985hblEZnnynHghreMA=;
        b=HwKasncUXQo45lRkquuWZgDLaXjxuojBMk3sjNbUe/b+TEeKz1HBhOsqesRecSEIP+
         4XArGdiBUrQt+sy4KWK/JvF5A9phSs+7wOh+ZKjygNkpsAV/Bz+hApvxLncr2kdF/TDg
         RuY2/aWScvvzNlvXDYbr9S7FoP7E7IP8lnixCXSAfcibMKwq+wF8jYDwNQPGkEvoh5AI
         VQU0XjkOBFLsPeysNzwyaT+zpn4sktL0nS+sDfFJx+t93SIKc1l/i1tHrBlv71ENb4Ez
         fyl450S4EFgQuPHa42rlB0vaRxkD3bduDHBG7q+K0qXdu4+HmFg1ZR1dvafxCKw6NRx6
         88Mw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
         :content-language:to:cc:references:from:in-reply-to
         :content-transfer-encoding;
        bh=94skciStQ5ymW9DUAo7HpPJI985hblEZnnynHghreMA=;
        b=5Nr7SFOlEXpuX/Dmy3LU8B9mZkQxlle7WmoU5T/iBHD6560zi9cLxqboUlSiSgY9pv
         Q1fqAWz6B9EtF2bZXhUSWptB+JXZhaXHZEp3QOcOnc+wM4BndcfNuPnmMr+lEx45VRU+
         a9FHprqtp/Dn0D8Q8prgOz5sf4IORHy1XLyArWeQq2rVYv1Bg6k2zeUu58LpI1qVS3iF
         j4ei4+XAV2dqDf93I8HxjfLYh0z9WNEdgioCynOpYdKduDcKlOpixCSTXkCpUa4aYkA3
         LzU6Yp0tfFv8gUx+xZS66qOPCTCzA+d6LVMlVqdKrXsFBvwKt7dkNt3r3G2fNBJPn1MJ
         hR4A==
X-Gm-Message-State: AOAM531U57AyVZaKQBvq7Ec6fQLfflxRBTcU2d//oHhkwEjQfXn8q0Pa
	YbaME+uNwHTTuB/iaSoKHa4=
X-Google-Smtp-Source: ABdhPJwHT+IApPaiqv9j85IpfV9bU4nn1vJ5a3jjnn/B38zsGdMxpO61uXzL4Fy1nPw9wuzw7KWQaw==
X-Received: by 2002:a17:90b:4a4a:b0:1df:a250:e583 with SMTP id lb10-20020a17090b4a4a00b001dfa250e583mr11521322pjb.172.1653500463109;
        Wed, 25 May 2022 10:41:03 -0700 (PDT)
Received: from [192.168.1.50] (ip70-191-40-110.oc.oc.cox.net. [70.191.40.110])
        by smtp.gmail.com with ESMTPSA id t12-20020a170902e84c00b0015e8d4eb244sm9292455plg.142.2022.05.25.10.41.01
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 25 May 2022 10:41:02 -0700 (PDT)
Message-ID: <180aaa57-28d8-30f0-e843-ea52e3a180a8@gmail.com>
Date: Wed, 25 May 2022 10:41:00 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.9.0
Subject: Re: [PATCH v11 3/6] mm: make alloc_contig_range work at pageblock
 granularity
Content-Language: en-US
To: Zi Yan <ziy@nvidia.com>, David Hildenbrand <david@redhat.com>,
 linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
 Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@techsingularity.net>,
 Eric Ren <renzhengeek@gmail.com>, Mike Rapoport <rppt@kernel.org>,
 Oscar Salvador <osalvador@suse.de>,
 Christophe Leroy <christophe.leroy@csgroup.eu>,
 Andrew Morton <akpm@linux-foundation.org>, kernel test robot
 <lkp@intel.com>, Qian Cai <quic_qiancai@quicinc.com>
References: <20220425143118.2850746-1-zi.yan@sent.com>
 <20220425143118.2850746-4-zi.yan@sent.com>
 <23A7297E-6C84-4138-A9FE-3598234004E6@nvidia.com>
From: Doug Berger <opendmb@gmail.com>
In-Reply-To: <23A7297E-6C84-4138-A9FE-3598234004E6@nvidia.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Rspamd-Server: rspam07
X-Rspamd-Queue-Id: AFCC7A0002
X-Stat-Signature: rabrief7pgoioqijmatyb7jzhww8o8tj
X-Rspam-User: 
Authentication-Results: imf15.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=HwKasncU;
	spf=pass (imf15.hostedemail.com: domain of opendmb@gmail.com designates 209.85.214.181 as permitted sender) smtp.mailfrom=opendmb@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
X-HE-Tag: 1653500444-109487
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

I am seeing some free memory accounting problems with linux-next that I 
have bisected to this commit (i.e. b2c9e2fbba32 ("mm: make 
alloc_contig_range work at pageblock granularity").

On an arm64 SMP platform with 4GB total memory and the default 16MB 
default CMA pool, I am seeing the following after boot with a sysrq Show 
Memory (e.g. 'echo m > /proc/sysrq-trigger'):

[   16.015906] sysrq: Show Memory
[   16.019039] Mem-Info:
[   16.021348] active_anon:14604 inactive_anon:919 isolated_anon:0
[   16.021348]  active_file:0 inactive_file:0 isolated_file:0
[   16.021348]  unevictable:0 dirty:0 writeback:0
[   16.021348]  slab_reclaimable:3662 slab_unreclaimable:3333
[   16.021348]  mapped:928 shmem:15146 pagetables:63 bounce:0
[   16.021348]  kernel_misc_reclaimable:0
[   16.021348]  free:976766 free_pcp:991 free_cma:7017
[   16.056937] Node 0 active_anon:58416kB inactive_anon:3676kB 
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:3712kB dirty:0kB writeback:0kB shmem:60584kB 
writeback_tmp:0kB kernel_stack:1200kB pagetables:252kB all_unreclaimable? no
[   16.081526] DMA free:3041036kB boost:0kB min:6036kB low:9044kB 
high:12052kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB 
present:3145728kB managed:3029992kB mlocked:0kB bounce:0kB 
free_pcp:636kB local_pcp:0kB free_cma:28068kB
[   16.108650] lowmem_reserve[]: 0 0 944 944
[   16.112746] Normal free:866028kB boost:0kB min:1936kB low:2900kB 
high:3864kB reserved_highatomic:0KB active_anon:58416kB 
inactive_anon:3676kB active_file:0kB inactive_file:0kB unevictable:0kB 
writepending:0kB present:1048576kB managed:967352kB mlocked:0kB 
bounce:0kB free_pcp:3328kB local_pcp:864kB free_cma:0kB
[   16.140393] lowmem_reserve[]: 0 0 0 0
[   16.144133] DMA: 7*4kB (UMC) 4*8kB (M) 3*16kB (M) 3*32kB (MC) 5*64kB 
(M) 4*128kB (MC) 5*256kB (UMC) 7*512kB (UM) 5*1024kB (UM) 9*2048kB (UMC) 
732*4096kB (MC) = 3027724kB
[   16.159609] Normal: 149*4kB (UM) 95*8kB (UME) 26*16kB (UME) 8*32kB 
(ME) 2*64kB (UE) 1*128kB (M) 2*256kB (ME) 2*512kB (ME) 2*1024kB (UM) 
0*2048kB 210*4096kB (M) = 866028kB
[   16.175165] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[   16.183937] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=32768kB
[   16.192533] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB
[   16.201040] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=64kB
[   16.209374] 15146 total pagecache pages
[   16.213246] 0 pages in swap cache
[   16.216595] Swap cache stats: add 0, delete 0, find 0/0
[   16.221867] Free swap  = 0kB
[   16.224780] Total swap = 0kB
[   16.227693] 1048576 pages RAM
[   16.230694] 0 pages HighMem/MovableOnly
[   16.234564] 49240 pages reserved
[   16.237825] 4096 pages cma reserved

Some anomolies in the above are:
free_cma:7017 with only 4096 pages cma reserved
DMA free:3041036kB with only managed:3029992kB

I'm not sure what is going on here, but I am suspicious of 
split_free_page() since del_page_from_free_list doesn't affect 
migrate_type accounting, but __free_one_page() can.
Also PageBuddy(page) is being checked without zone->lock in 
isolate_single_pageblock().

Please investigate this as well.

Thanks!
     Doug

On 4/29/2022 6:54 AM, Zi Yan wrote:
> On 25 Apr 2022, at 10:31, Zi Yan wrote:
> 
>> From: Zi Yan <ziy@nvidia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid
>> merging pageblocks with different migratetypes. It might unnecessarily
>> convert extra pageblocks at the beginning and at the end of the range.
>> Change alloc_contig_range() to work at pageblock granularity.
>>
>> Special handling is needed for free pages and in-use pages across the
>> boundaries of the range specified by alloc_contig_range(). Because these
>> partially isolated pages causes free page accounting issues. The free
>> pages will be split and freed into separate migratetype lists; the
>> in-use pages will be migrated then the freed pages will be handled in
>> the aforementioned way.
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   include/linux/page-isolation.h |   4 +-
>>   mm/internal.h                  |   6 ++
>>   mm/memory_hotplug.c            |   3 +-
>>   mm/page_alloc.c                |  54 ++++++++--
>>   mm/page_isolation.c            | 184 ++++++++++++++++++++++++++++++++-
>>   5 files changed, 233 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index e14eddf6741a..5456b7be38ae 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -42,7 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
>>    */
>>   int
>>   start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> -			 unsigned migratetype, int flags);
>> +			 int migratetype, int flags, gfp_t gfp_flags);
>>
>>   /*
>>    * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
>> @@ -50,7 +50,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>    */
>>   void
>>   undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> -			unsigned migratetype);
>> +			int migratetype);
>>
>>   /*
>>    * Test all pages in [start_pfn, end_pfn) are isolated or not.
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 919fa07e1031..0667abd57634 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -359,6 +359,9 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
>>   			  phys_addr_t min_addr,
>>   			  int nid, bool exact_nid);
>>
>> +void split_free_page(struct page *free_page,
>> +				int order, unsigned long split_pfn_offset);
>> +
>>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>
>>   /*
>> @@ -422,6 +425,9 @@ isolate_freepages_range(struct compact_control *cc,
>>   int
>>   isolate_migratepages_range(struct compact_control *cc,
>>   			   unsigned long low_pfn, unsigned long end_pfn);
>> +
>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>> +					unsigned long start, unsigned long end);
>>   #endif
>>   int find_suitable_fallback(struct free_area *area, unsigned int order,
>>   			int migratetype, bool only_stealable, bool *can_steal);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 4c6065e5d274..9f8ae4cb77ee 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1845,7 +1845,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>   	/* set above range as isolated */
>>   	ret = start_isolate_page_range(start_pfn, end_pfn,
>>   				       MIGRATE_MOVABLE,
>> -				       MEMORY_OFFLINE | REPORT_FAILURE);
>> +				       MEMORY_OFFLINE | REPORT_FAILURE,
>> +				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
>>   	if (ret) {
>>   		reason = "failure to isolate range";
>>   		goto failed_removal_pcplists_disabled;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ce23ac8ad085..70ddd9a0bcf3 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1094,6 +1094,43 @@ static inline void __free_one_page(struct page *page,
>>   		page_reporting_notify_free(order);
>>   }
>>
>> +/**
>> + * split_free_page() -- split a free page at split_pfn_offset
>> + * @free_page:		the original free page
>> + * @order:		the order of the page
>> + * @split_pfn_offset:	split offset within the page
>> + *
>> + * It is used when the free page crosses two pageblocks with different migratetypes
>> + * at split_pfn_offset within the page. The split free page will be put into
>> + * separate migratetype lists afterwards. Otherwise, the function achieves
>> + * nothing.
>> + */
>> +void split_free_page(struct page *free_page,
>> +				int order, unsigned long split_pfn_offset)
>> +{
>> +	struct zone *zone = page_zone(free_page);
>> +	unsigned long free_page_pfn = page_to_pfn(free_page);
>> +	unsigned long pfn;
>> +	unsigned long flags;
>> +	int free_page_order;
>> +
>> +	spin_lock_irqsave(&zone->lock, flags);
>> +	del_page_from_free_list(free_page, zone, order);
>> +	for (pfn = free_page_pfn;
>> +	     pfn < free_page_pfn + (1UL << order);) {
>> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> +		free_page_order = ffs(split_pfn_offset) - 1;
>> +		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
>> +				mt, FPI_NONE);
>> +		pfn += 1UL << free_page_order;
>> +		split_pfn_offset -= (1UL << free_page_order);
>> +		/* we have done the first part, now switch to second part */
>> +		if (split_pfn_offset == 0)
>> +			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
>> +	}
>> +	spin_unlock_irqrestore(&zone->lock, flags);
>> +}
>>   /*
>>    * A bad page could be due to a number of fields. Instead of multiple branches,
>>    * try and check multiple fields with one check. The caller must do a detailed
>> @@ -8919,7 +8956,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
>>   #endif
>>
>>   /* [start, end) must belong to a single zone. */
>> -static int __alloc_contig_migrate_range(struct compact_control *cc,
>> +int __alloc_contig_migrate_range(struct compact_control *cc,
>>   					unsigned long start, unsigned long end)
>>   {
>>   	/* This function is based on compact_zone() from compaction.c. */
>> @@ -9002,7 +9039,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>   		       unsigned migratetype, gfp_t gfp_mask)
>>   {
>>   	unsigned long outer_start, outer_end;
>> -	unsigned int order;
>> +	int order;
>>   	int ret = 0;
>>
>>   	struct compact_control cc = {
>> @@ -9021,14 +9058,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>   	 * What we do here is we mark all pageblocks in range as
>>   	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
>>   	 * have different sizes, and due to the way page allocator
>> -	 * work, we align the range to biggest of the two pages so
>> -	 * that page allocator won't try to merge buddies from
>> -	 * different pageblocks and change MIGRATE_ISOLATE to some
>> -	 * other migration type.
>> +	 * work, start_isolate_page_range() has special handlings for this.
>>   	 *
>>   	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
>>   	 * migrate the pages from an unaligned range (ie. pages that
>> -	 * we are interested in).  This will put all the pages in
>> +	 * we are interested in). This will put all the pages in
>>   	 * range back to page allocator as MIGRATE_ISOLATE.
>>   	 *
>>   	 * When this is done, we take the pages in range from page
>> @@ -9042,9 +9076,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>   	 */
>>
>>   	ret = start_isolate_page_range(pfn_max_align_down(start),
>> -				       pfn_max_align_up(end), migratetype, 0);
>> +				pfn_max_align_up(end), migratetype, 0, gfp_mask);
>>   	if (ret)
>> -		return ret;
>> +		goto done;
>>
>>   	drain_all_pages(cc.zone);
>>
>> @@ -9064,7 +9098,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>   	ret = 0;
>>
>>   	/*
>> -	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
>> +	 * Pages from [start, end) are within a pageblock_nr_pages
>>   	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
>>   	 * more, all pages in [start, end) are free in page allocator.
>>   	 * What we are going to do is to allocate all pages from
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index c2f7a8bb634d..94b3467e5ba2 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -203,7 +203,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>   	return -EBUSY;
>>   }
>>
>> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>> +static void unset_migratetype_isolate(struct page *page, int migratetype)
>>   {
>>   	struct zone *zone;
>>   	unsigned long flags, nr_pages;
>> @@ -279,6 +279,157 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>   	return NULL;
>>   }
>>
>> +/**
>> + * isolate_single_pageblock() -- tries to isolate a pageblock that might be
>> + * within a free or in-use page.
>> + * @boundary_pfn:		pageblock-aligned pfn that a page might cross
>> + * @gfp_flags:			GFP flags used for migrating pages
>> + * @isolate_before:	isolate the pageblock before the boundary_pfn
>> + *
>> + * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
>> + * pageblock. When not all pageblocks within a page are isolated at the same
>> + * time, free page accounting can go wrong. For example, in the case of
>> + * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pagelbocks.
>> + * [         MAX_ORDER-1         ]
>> + * [  pageblock0  |  pageblock1  ]
>> + * When either pageblock is isolated, if it is a free page, the page is not
>> + * split into separate migratetype lists, which is supposed to; if it is an
>> + * in-use page and freed later, __free_one_page() does not split the free page
>> + * either. The function handles this by splitting the free page or migrating
>> + * the in-use page then splitting the free page.
>> + */
>> +static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>> +			bool isolate_before)
>> +{
>> +	unsigned char saved_mt;
>> +	unsigned long start_pfn;
>> +	unsigned long isolate_pageblock;
>> +	unsigned long pfn;
>> +	struct zone *zone;
>> +
>> +	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
>> +
>> +	if (isolate_before)
>> +		isolate_pageblock = boundary_pfn - pageblock_nr_pages;
>> +	else
>> +		isolate_pageblock = boundary_pfn;
>> +
>> +	/*
>> +	 * scan at the beginning of MAX_ORDER_NR_PAGES aligned range to avoid
>> +	 * only isolating a subset of pageblocks from a bigger than pageblock
>> +	 * free or in-use page. Also make sure all to-be-isolated pageblocks
>> +	 * are within the same zone.
>> +	 */
>> +	zone  = page_zone(pfn_to_page(isolate_pageblock));
>> +	start_pfn  = max(ALIGN_DOWN(isolate_pageblock, MAX_ORDER_NR_PAGES),
>> +				      zone->zone_start_pfn);
>> +
>> +	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
>> +	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
>> +
>> +	/*
>> +	 * Bail out early when the to-be-isolated pageblock does not form
>> +	 * a free or in-use page across boundary_pfn:
>> +	 *
>> +	 * 1. isolate before boundary_pfn: the page after is not online
>> +	 * 2. isolate after boundary_pfn: the page before is not online
>> +	 *
>> +	 * This also ensures correctness. Without it, when isolate after
>> +	 * boundary_pfn and [start_pfn, boundary_pfn) are not online,
>> +	 * __first_valid_page() will return unexpected NULL in the for loop
>> +	 * below.
>> +	 */
>> +	if (isolate_before) {
>> +		if (!pfn_to_online_page(boundary_pfn))
>> +			return 0;
>> +	} else {
>> +		if (!pfn_to_online_page(boundary_pfn - 1))
>> +			return 0;
>> +	}
>> +
>> +	for (pfn = start_pfn; pfn < boundary_pfn;) {
>> +		struct page *page = __first_valid_page(pfn, boundary_pfn - pfn);
>> +
>> +		VM_BUG_ON(!page);
>> +		pfn = page_to_pfn(page);
>> +		/*
>> +		 * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any
>> +		 * free pages in [start_pfn, boundary_pfn), its head page will
>> +		 * always be in the range.
>> +		 */
>> +		if (PageBuddy(page)) {
>> +			int order = buddy_order(page);
>> +
>> +			if (pfn + (1UL << order) > boundary_pfn)
>> +				split_free_page(page, order, boundary_pfn - pfn);
>> +			pfn += (1UL << order);
>> +			continue;
>> +		}
>> +		/*
>> +		 * migrate compound pages then let the free page handling code
>> +		 * above do the rest. If migration is not enabled, just fail.
>> +		 */
>> +		if (PageHuge(page) || PageTransCompound(page)) {
>> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> +			unsigned long nr_pages = compound_nr(page);
>> +			int order = compound_order(page);
>> +			struct page *head = compound_head(page);
>> +			unsigned long head_pfn = page_to_pfn(head);
>> +			int ret;
>> +			struct compact_control cc = {
>> +				.nr_migratepages = 0,
>> +				.order = -1,
>> +				.zone = page_zone(pfn_to_page(head_pfn)),
>> +				.mode = MIGRATE_SYNC,
>> +				.ignore_skip_hint = true,
>> +				.no_set_skip_hint = true,
>> +				.gfp_mask = gfp_flags,
>> +				.alloc_contig = true,
>> +			};
>> +			INIT_LIST_HEAD(&cc.migratepages);
>> +
>> +			if (head_pfn + nr_pages < boundary_pfn) {
>> +				pfn += nr_pages;
>> +				continue;
>> +			}
>> +
>> +			ret = __alloc_contig_migrate_range(&cc, head_pfn,
>> +						head_pfn + nr_pages);
>> +
>> +			if (ret)
>> +				goto failed;
>> +			/*
>> +			 * reset pfn, let the free page handling code above
>> +			 * split the free page to the right migratetype list.
>> +			 *
>> +			 * head_pfn is not used here as a hugetlb page order
>> +			 * can be bigger than MAX_ORDER-1, but after it is
>> +			 * freed, the free page order is not. Use pfn within
>> +			 * the range to find the head of the free page and
>> +			 * reset order to 0 if a hugetlb page with
>> +			 * >MAX_ORDER-1 order is encountered.
>> +			 */
>> +			if (order > MAX_ORDER-1)
>> +				order = 0;
>> +			while (!PageBuddy(pfn_to_page(pfn))) {
>> +				order++;
>> +				pfn &= ~0UL << order;
>> +			}
>> +			continue;
>> +#else
>> +			goto failed;
>> +#endif
>> +		}
>> +
>> +		pfn++;
>> +	}
>> +	return 0;
>> +failed:
>> +	/* restore the original migratetype */
>> +	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
>> +	return -EBUSY;
>> +}
>> +
>>   /**
>>    * start_isolate_page_range() - make page-allocation-type of range of pages to
>>    * be MIGRATE_ISOLATE.
>> @@ -293,6 +444,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>    *					 and PageOffline() pages.
>>    *			REPORT_FAILURE - report details about the failure to
>>    *			isolate the range
>> + * @gfp_flags:		GFP flags used for migrating pages that sit across the
>> + *			range boundaries.
>>    *
>>    * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
>>    * the range will never be allocated. Any free pages and pages freed in the
>> @@ -301,6 +454,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>    * pages in the range finally, the caller have to free all pages in the range.
>>    * test_page_isolated() can be used for test it.
>>    *
>> + * The function first tries to isolate the pageblocks at the beginning and end
>> + * of the range, since there might be pages across the range boundaries.
>> + * Afterwards, it isolates the rest of the range.
>> + *
>>    * There is no high level synchronization mechanism that prevents two threads
>>    * from trying to isolate overlapping ranges. If this happens, one thread
>>    * will notice pageblocks in the overlapping range already set to isolate.
>> @@ -321,21 +478,38 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>    * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
>>    */
>>   int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> -			     unsigned migratetype, int flags)
>> +			     int migratetype, int flags, gfp_t gfp_flags)
>>   {
>>   	unsigned long pfn;
>>   	struct page *page;
>> +	int ret;
>>
>>   	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>   	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>>
>> -	for (pfn = start_pfn;
>> -	     pfn < end_pfn;
>> +	/* isolate [start_pfn, start_pfn + pageblock_nr_pages) pageblock */
>> +	ret = isolate_single_pageblock(start_pfn, gfp_flags, false);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* isolate [end_pfn - pageblock_nr_pages, end_pfn) pageblock */
>> +	ret = isolate_single_pageblock(end_pfn, gfp_flags, true);
>> +	if (ret) {
>> +		unset_migratetype_isolate(pfn_to_page(start_pfn), migratetype);
>> +		return ret;
>> +	}
>> +
>> +	/* skip isolated pageblocks at the beginning and end */
>> +	for (pfn = start_pfn + pageblock_nr_pages;
>> +	     pfn < end_pfn - pageblock_nr_pages;
>>   	     pfn += pageblock_nr_pages) {
>>   		page = __first_valid_page(pfn, pageblock_nr_pages);
>>   		if (page && set_migratetype_isolate(page, migratetype, flags,
>>   					start_pfn, end_pfn)) {
>>   			undo_isolate_page_range(start_pfn, pfn, migratetype);
>> +			unset_migratetype_isolate(
>> +				pfn_to_page(end_pfn - pageblock_nr_pages),
>> +				migratetype);
>>   			return -EBUSY;
>>   		}
>>   	}
>> @@ -346,7 +520,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>    * Make isolated pages available again.
>>    */
>>   void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>> -			    unsigned migratetype)
>> +			    int migratetype)
>>   {
>>   	unsigned long pfn;
>>   	struct page *page;
>> -- 
>> 2.35.1
> 
> Qian hit a bug caused by this series https://lore.kernel.org/linux-mm/20220426201855.GA1014@qian/
> and the fix is:
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 75e454f5cf45..b3f074d1682e 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -367,58 +367,67 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>   		}
>   		/*
>   		 * migrate compound pages then let the free page handling code
> -		 * above do the rest. If migration is not enabled, just fail.
> +		 * above do the rest. If migration is not possible, just fail.
>   		 */
> -		if (PageHuge(page) || PageTransCompound(page)) {
> -#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +		if (PageCompound(page)) {
>   			unsigned long nr_pages = compound_nr(page);
> -			int order = compound_order(page);
>   			struct page *head = compound_head(page);
>   			unsigned long head_pfn = page_to_pfn(head);
> -			int ret;
> -			struct compact_control cc = {
> -				.nr_migratepages = 0,
> -				.order = -1,
> -				.zone = page_zone(pfn_to_page(head_pfn)),
> -				.mode = MIGRATE_SYNC,
> -				.ignore_skip_hint = true,
> -				.no_set_skip_hint = true,
> -				.gfp_mask = gfp_flags,
> -				.alloc_contig = true,
> -			};
> -			INIT_LIST_HEAD(&cc.migratepages);
> 
>   			if (head_pfn + nr_pages < boundary_pfn) {
> -				pfn += nr_pages;
> +				pfn = head_pfn + nr_pages;
>   				continue;
>   			}
> -
> -			ret = __alloc_contig_migrate_range(&cc, head_pfn,
> -						head_pfn + nr_pages);
> -
> -			if (ret)
> -				goto failed;
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>   			/*
> -			 * reset pfn, let the free page handling code above
> -			 * split the free page to the right migratetype list.
> -			 *
> -			 * head_pfn is not used here as a hugetlb page order
> -			 * can be bigger than MAX_ORDER-1, but after it is
> -			 * freed, the free page order is not. Use pfn within
> -			 * the range to find the head of the free page and
> -			 * reset order to 0 if a hugetlb page with
> -			 * >MAX_ORDER-1 order is encountered.
> +			 * hugetlb, lru compound (THP), and movable compound pages
> +			 * can be migrated. Otherwise, fail the isolation.
>   			 */
> -			if (order > MAX_ORDER-1)
> +			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
> +				int order;
> +				unsigned long outer_pfn;
> +				int ret;
> +				struct compact_control cc = {
> +					.nr_migratepages = 0,
> +					.order = -1,
> +					.zone = page_zone(pfn_to_page(head_pfn)),
> +					.mode = MIGRATE_SYNC,
> +					.ignore_skip_hint = true,
> +					.no_set_skip_hint = true,
> +					.gfp_mask = gfp_flags,
> +					.alloc_contig = true,
> +				};
> +				INIT_LIST_HEAD(&cc.migratepages);
> +
> +				ret = __alloc_contig_migrate_range(&cc, head_pfn,
> +							head_pfn + nr_pages);
> +
> +				if (ret)
> +					goto failed;
> +				/*
> +				 * reset pfn to the head of the free page, so
> +				 * that the free page handling code above can split
> +				 * the free page to the right migratetype list.
> +				 *
> +				 * head_pfn is not used here as a hugetlb page order
> +				 * can be bigger than MAX_ORDER-1, but after it is
> +				 * freed, the free page order is not. Use pfn within
> +				 * the range to find the head of the free page.
> +				 */
>   				order = 0;
> -			while (!PageBuddy(pfn_to_page(pfn))) {
> -				order++;
> -				pfn &= ~0UL << order;
> -			}
> -			continue;
> -#else
> -			goto failed;
> +				outer_pfn = pfn;
> +				while (!PageBuddy(pfn_to_page(outer_pfn))) {
> +					if (++order >= MAX_ORDER) {
> +						outer_pfn = pfn;
> +						break;
> +					}
> +					outer_pfn &= ~0UL << order;
> +				}
> +				pfn = outer_pfn;
> +				continue;
> +			} else
>   #endif
> +				goto failed;
>   		}
> 
>   		pfn++;