* [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()
@ 2022-05-30 11:50 Anshuman Khandual
2022-05-30 13:53 ` Zi Yan
2022-05-31 6:21 ` kernel test robot
0 siblings, 2 replies; 4+ messages in thread
From: Anshuman Khandual @ 2022-05-30 11:50 UTC (permalink / raw)
To: linux-mm; +Cc: Anshuman Khandual, Andrew Morton, Zi Yan, linux-kernel
HugeTLB allocation (32MB pages on 4K base page) via sysfs on arm64 platform
is getting stuck in isolate_single_pageblock(), because of an infinite loop
Because head_pfn always evaluate the same, so does pfn, and the outer loop
never exits. Dropping the relevant code block, which seems redundant, makes
the problem go away.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
I am not sure about this fix, and also did not find much time today to
debug any further. There are much code changes around this function in
recent days. This problem is present on latest mainline kernel.
- Anshuman
mm/page_isolation.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..b0922fee75c1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
struct page *head = compound_head(page);
unsigned long head_pfn = page_to_pfn(head);
- if (head_pfn + nr_pages <= boundary_pfn) {
- pfn = head_pfn + nr_pages;
- continue;
- }
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
/*
* hugetlb, lru compound (THP), and movable compound pages
--
2.20.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()
2022-05-30 11:50 [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock() Anshuman Khandual
@ 2022-05-30 13:53 ` Zi Yan
2022-05-31 2:22 ` Anshuman Khandual
2022-05-31 6:21 ` kernel test robot
1 sibling, 1 reply; 4+ messages in thread
From: Zi Yan @ 2022-05-30 13:53 UTC (permalink / raw)
To: Anshuman Khandual; +Cc: linux-mm, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]
On 30 May 2022, at 7:50, Anshuman Khandual wrote:
> HugeTLB allocation (32MB pages on 4K base page) via sysfs on arm64 platform
> is getting stuck in isolate_single_pageblock(), because of an infinite loop
> Because head_pfn always evaluate the same, so does pfn, and the outer loop
> never exits. Dropping the relevant code block, which seems redundant, makes
> the problem go away.
Thanks for the report.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> I am not sure about this fix, and also did not find much time today to
> debug any further. There are much code changes around this function in
> recent days. This problem is present on latest mainline kernel.
>
> - Anshuman
>
> mm/page_isolation.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..b0922fee75c1 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> struct page *head = compound_head(page);
> unsigned long head_pfn = page_to_pfn(head);
>
> - if (head_pfn + nr_pages <= boundary_pfn) {
> - pfn = head_pfn + nr_pages;
> - continue;
> - }
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> /*
> * hugetlb, lru compound (THP), and movable compound pages
> --
> 2.20.1
Can you try the patch below to see if it fixes the issue? Thanks.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..d200d41ad0d3 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
* above do the rest. If migration is not possible, just fail.
*/
if (PageCompound(page)) {
- unsigned long nr_pages = compound_nr(page);
struct page *head = compound_head(page);
unsigned long head_pfn = page_to_pfn(head);
+ unsigned long nr_pages = compound_nr(head);
if (head_pfn + nr_pages <= boundary_pfn) {
pfn = head_pfn + nr_pages;
--
Best Regards,
Yan, Zi
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()
2022-05-30 13:53 ` Zi Yan
@ 2022-05-31 2:22 ` Anshuman Khandual
0 siblings, 0 replies; 4+ messages in thread
From: Anshuman Khandual @ 2022-05-31 2:22 UTC (permalink / raw)
To: Zi Yan; +Cc: linux-mm, Andrew Morton, linux-kernel
On 5/30/22 19:23, Zi Yan wrote:
> On 30 May 2022, at 7:50, Anshuman Khandual wrote:
>
>> HugeTLB allocation (32MB pages on 4K base page) via sysfs on arm64 platform
>> is getting stuck in isolate_single_pageblock(), because of an infinite loop
>> Because head_pfn always evaluate the same, so does pfn, and the outer loop
>> never exits. Dropping the relevant code block, which seems redundant, makes
>> the problem go away.
>
> Thanks for the report.
>
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> I am not sure about this fix, and also did not find much time today to
>> debug any further. There are much code changes around this function in
>> recent days. This problem is present on latest mainline kernel.
>>
>> - Anshuman
>>
>> mm/page_isolation.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 6021f8444b5a..b0922fee75c1 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>> struct page *head = compound_head(page);
>> unsigned long head_pfn = page_to_pfn(head);
>>
>> - if (head_pfn + nr_pages <= boundary_pfn) {
>> - pfn = head_pfn + nr_pages;
>> - continue;
>> - }
>> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> /*
>> * hugetlb, lru compound (THP), and movable compound pages
>> --
>> 2.20.1
>
> Can you try the patch below to see if it fixes the issue? Thanks.
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..d200d41ad0d3 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> * above do the rest. If migration is not possible, just fail.
> */
> if (PageCompound(page)) {
> - unsigned long nr_pages = compound_nr(page);
> struct page *head = compound_head(page);
> unsigned long head_pfn = page_to_pfn(head);
> + unsigned long nr_pages = compound_nr(head);
>
> if (head_pfn + nr_pages <= boundary_pfn) {
> pfn = head_pfn + nr_pages;
>
>
Yes, this does solve the problem. I guess nr_pages should have been derived
from the compound head itself for it be meaningful (i.e > 1). I assume you
will send a fix patch with appropriate write up that describes this problem.
- Anshuman
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()
2022-05-30 11:50 [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock() Anshuman Khandual
2022-05-30 13:53 ` Zi Yan
@ 2022-05-31 6:21 ` kernel test robot
1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2022-05-31 6:21 UTC (permalink / raw)
To: Anshuman Khandual; +Cc: llvm, kbuild-all
Hi Anshuman,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/mm-page_isolation-Fix-an-infinite-loop-in-isolate_single_pageblock/20220530-195220
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: x86_64-randconfig-a016 (https://download.01.org/0day-ci/archive/20220531/202205311448.NLkd29GM-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project c825abd6b0198fb088d9752f556a70705bc99dfd)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/13632aee393de39b7f25b163df4d9fd6a29e4061
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Anshuman-Khandual/mm-page_isolation-Fix-an-infinite-loop-in-isolate_single_pageblock/20220530-195220
git checkout 13632aee393de39b7f25b163df4d9fd6a29e4061
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> mm/page_isolation.c:388:18: warning: unused variable 'nr_pages' [-Wunused-variable]
unsigned long nr_pages = compound_nr(page);
^
>> mm/page_isolation.c:390:18: warning: unused variable 'head_pfn' [-Wunused-variable]
unsigned long head_pfn = page_to_pfn(head);
^
2 warnings generated.
vim +/nr_pages +388 mm/page_isolation.c
a5d76b54a3f3a4 KAMEZAWA Hiroyuki 2007-10-16 281
b2c9e2fbba3253 Zi Yan 2022-05-12 282 /**
b2c9e2fbba3253 Zi Yan 2022-05-12 283 * isolate_single_pageblock() -- tries to isolate a pageblock that might be
b2c9e2fbba3253 Zi Yan 2022-05-12 284 * within a free or in-use page.
b2c9e2fbba3253 Zi Yan 2022-05-12 285 * @boundary_pfn: pageblock-aligned pfn that a page might cross
88ee134320b831 Zi Yan 2022-05-24 286 * @flags: isolation flags
b2c9e2fbba3253 Zi Yan 2022-05-12 287 * @gfp_flags: GFP flags used for migrating pages
b2c9e2fbba3253 Zi Yan 2022-05-12 288 * @isolate_before: isolate the pageblock before the boundary_pfn
b2c9e2fbba3253 Zi Yan 2022-05-12 289 *
b2c9e2fbba3253 Zi Yan 2022-05-12 290 * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
b2c9e2fbba3253 Zi Yan 2022-05-12 291 * pageblock. When not all pageblocks within a page are isolated at the same
b2c9e2fbba3253 Zi Yan 2022-05-12 292 * time, free page accounting can go wrong. For example, in the case of
b2c9e2fbba3253 Zi Yan 2022-05-12 293 * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pagelbocks.
b2c9e2fbba3253 Zi Yan 2022-05-12 294 * [ MAX_ORDER-1 ]
b2c9e2fbba3253 Zi Yan 2022-05-12 295 * [ pageblock0 | pageblock1 ]
b2c9e2fbba3253 Zi Yan 2022-05-12 296 * When either pageblock is isolated, if it is a free page, the page is not
b2c9e2fbba3253 Zi Yan 2022-05-12 297 * split into separate migratetype lists, which is supposed to; if it is an
b2c9e2fbba3253 Zi Yan 2022-05-12 298 * in-use page and freed later, __free_one_page() does not split the free page
b2c9e2fbba3253 Zi Yan 2022-05-12 299 * either. The function handles this by splitting the free page or migrating
b2c9e2fbba3253 Zi Yan 2022-05-12 300 * the in-use page then splitting the free page.
b2c9e2fbba3253 Zi Yan 2022-05-12 301 */
88ee134320b831 Zi Yan 2022-05-24 302 static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
9b209e557d698f Zi Yan 2022-05-26 303 gfp_t gfp_flags, bool isolate_before, bool skip_isolation)
b2c9e2fbba3253 Zi Yan 2022-05-12 304 {
b2c9e2fbba3253 Zi Yan 2022-05-12 305 unsigned char saved_mt;
b2c9e2fbba3253 Zi Yan 2022-05-12 306 unsigned long start_pfn;
b2c9e2fbba3253 Zi Yan 2022-05-12 307 unsigned long isolate_pageblock;
b2c9e2fbba3253 Zi Yan 2022-05-12 308 unsigned long pfn;
b2c9e2fbba3253 Zi Yan 2022-05-12 309 struct zone *zone;
88ee134320b831 Zi Yan 2022-05-24 310 int ret;
b2c9e2fbba3253 Zi Yan 2022-05-12 311
b2c9e2fbba3253 Zi Yan 2022-05-12 312 VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
b2c9e2fbba3253 Zi Yan 2022-05-12 313
b2c9e2fbba3253 Zi Yan 2022-05-12 314 if (isolate_before)
b2c9e2fbba3253 Zi Yan 2022-05-12 315 isolate_pageblock = boundary_pfn - pageblock_nr_pages;
b2c9e2fbba3253 Zi Yan 2022-05-12 316 else
b2c9e2fbba3253 Zi Yan 2022-05-12 317 isolate_pageblock = boundary_pfn;
b2c9e2fbba3253 Zi Yan 2022-05-12 318
b2c9e2fbba3253 Zi Yan 2022-05-12 319 /*
b2c9e2fbba3253 Zi Yan 2022-05-12 320 * scan at the beginning of MAX_ORDER_NR_PAGES aligned range to avoid
b2c9e2fbba3253 Zi Yan 2022-05-12 321 * only isolating a subset of pageblocks from a bigger than pageblock
b2c9e2fbba3253 Zi Yan 2022-05-12 322 * free or in-use page. Also make sure all to-be-isolated pageblocks
b2c9e2fbba3253 Zi Yan 2022-05-12 323 * are within the same zone.
b2c9e2fbba3253 Zi Yan 2022-05-12 324 */
b2c9e2fbba3253 Zi Yan 2022-05-12 325 zone = page_zone(pfn_to_page(isolate_pageblock));
b2c9e2fbba3253 Zi Yan 2022-05-12 326 start_pfn = max(ALIGN_DOWN(isolate_pageblock, MAX_ORDER_NR_PAGES),
b2c9e2fbba3253 Zi Yan 2022-05-12 327 zone->zone_start_pfn);
b2c9e2fbba3253 Zi Yan 2022-05-12 328
b2c9e2fbba3253 Zi Yan 2022-05-12 329 saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
9b209e557d698f Zi Yan 2022-05-26 330
9b209e557d698f Zi Yan 2022-05-26 331 if (skip_isolation)
9b209e557d698f Zi Yan 2022-05-26 332 VM_BUG_ON(!is_migrate_isolate(saved_mt));
9b209e557d698f Zi Yan 2022-05-26 333 else {
88ee134320b831 Zi Yan 2022-05-24 334 ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
88ee134320b831 Zi Yan 2022-05-24 335 isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
88ee134320b831 Zi Yan 2022-05-24 336
88ee134320b831 Zi Yan 2022-05-24 337 if (ret)
88ee134320b831 Zi Yan 2022-05-24 338 return ret;
9b209e557d698f Zi Yan 2022-05-26 339 }
b2c9e2fbba3253 Zi Yan 2022-05-12 340
b2c9e2fbba3253 Zi Yan 2022-05-12 341 /*
b2c9e2fbba3253 Zi Yan 2022-05-12 342 * Bail out early when the to-be-isolated pageblock does not form
b2c9e2fbba3253 Zi Yan 2022-05-12 343 * a free or in-use page across boundary_pfn:
b2c9e2fbba3253 Zi Yan 2022-05-12 344 *
b2c9e2fbba3253 Zi Yan 2022-05-12 345 * 1. isolate before boundary_pfn: the page after is not online
b2c9e2fbba3253 Zi Yan 2022-05-12 346 * 2. isolate after boundary_pfn: the page before is not online
b2c9e2fbba3253 Zi Yan 2022-05-12 347 *
b2c9e2fbba3253 Zi Yan 2022-05-12 348 * This also ensures correctness. Without it, when isolate after
b2c9e2fbba3253 Zi Yan 2022-05-12 349 * boundary_pfn and [start_pfn, boundary_pfn) are not online,
b2c9e2fbba3253 Zi Yan 2022-05-12 350 * __first_valid_page() will return unexpected NULL in the for loop
b2c9e2fbba3253 Zi Yan 2022-05-12 351 * below.
b2c9e2fbba3253 Zi Yan 2022-05-12 352 */
b2c9e2fbba3253 Zi Yan 2022-05-12 353 if (isolate_before) {
b2c9e2fbba3253 Zi Yan 2022-05-12 354 if (!pfn_to_online_page(boundary_pfn))
b2c9e2fbba3253 Zi Yan 2022-05-12 355 return 0;
b2c9e2fbba3253 Zi Yan 2022-05-12 356 } else {
b2c9e2fbba3253 Zi Yan 2022-05-12 357 if (!pfn_to_online_page(boundary_pfn - 1))
b2c9e2fbba3253 Zi Yan 2022-05-12 358 return 0;
b2c9e2fbba3253 Zi Yan 2022-05-12 359 }
b2c9e2fbba3253 Zi Yan 2022-05-12 360
b2c9e2fbba3253 Zi Yan 2022-05-12 361 for (pfn = start_pfn; pfn < boundary_pfn;) {
b2c9e2fbba3253 Zi Yan 2022-05-12 362 struct page *page = __first_valid_page(pfn, boundary_pfn - pfn);
b2c9e2fbba3253 Zi Yan 2022-05-12 363
b2c9e2fbba3253 Zi Yan 2022-05-12 364 VM_BUG_ON(!page);
b2c9e2fbba3253 Zi Yan 2022-05-12 365 pfn = page_to_pfn(page);
b2c9e2fbba3253 Zi Yan 2022-05-12 366 /*
b2c9e2fbba3253 Zi Yan 2022-05-12 367 * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any
b2c9e2fbba3253 Zi Yan 2022-05-12 368 * free pages in [start_pfn, boundary_pfn), its head page will
b2c9e2fbba3253 Zi Yan 2022-05-12 369 * always be in the range.
b2c9e2fbba3253 Zi Yan 2022-05-12 370 */
b2c9e2fbba3253 Zi Yan 2022-05-12 371 if (PageBuddy(page)) {
b2c9e2fbba3253 Zi Yan 2022-05-12 372 int order = buddy_order(page);
b2c9e2fbba3253 Zi Yan 2022-05-12 373
86d28b0709279c Zi Yan 2022-05-26 374 if (pfn + (1UL << order) > boundary_pfn) {
86d28b0709279c Zi Yan 2022-05-26 375 /* free page changed before split, check it again */
86d28b0709279c Zi Yan 2022-05-26 376 if (split_free_page(page, order, boundary_pfn - pfn))
86d28b0709279c Zi Yan 2022-05-26 377 continue;
86d28b0709279c Zi Yan 2022-05-26 378 }
86d28b0709279c Zi Yan 2022-05-26 379
86d28b0709279c Zi Yan 2022-05-26 380 pfn += 1UL << order;
b2c9e2fbba3253 Zi Yan 2022-05-12 381 continue;
b2c9e2fbba3253 Zi Yan 2022-05-12 382 }
b2c9e2fbba3253 Zi Yan 2022-05-12 383 /*
b2c9e2fbba3253 Zi Yan 2022-05-12 384 * migrate compound pages then let the free page handling code
b2c9e2fbba3253 Zi Yan 2022-05-12 385 * above do the rest. If migration is not possible, just fail.
b2c9e2fbba3253 Zi Yan 2022-05-12 386 */
b2c9e2fbba3253 Zi Yan 2022-05-12 387 if (PageCompound(page)) {
b2c9e2fbba3253 Zi Yan 2022-05-12 @388 unsigned long nr_pages = compound_nr(page);
b2c9e2fbba3253 Zi Yan 2022-05-12 389 struct page *head = compound_head(page);
b2c9e2fbba3253 Zi Yan 2022-05-12 @390 unsigned long head_pfn = page_to_pfn(head);
b2c9e2fbba3253 Zi Yan 2022-05-12 391
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-05-31 6:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-30 11:50 [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock() Anshuman Khandual
2022-05-30 13:53 ` Zi Yan
2022-05-31 2:22 ` Anshuman Khandual
2022-05-31 6:21 ` kernel test robot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.