On 19 May 2022, at 16:57, Qian Cai wrote:

> On Thu, Apr 28, 2022 at 08:39:06AM -0400, Zi Yan wrote:
>> How about the one attached? I can apply it to next-20220428. Let me know
>> if you are using a different branch. Thanks.
>
> Zi, it turns out that the endless loop in isolate_single_pageblock() can
> still be reproduced on today's linux-next tree by running the reproducer a
> few times. With this debug patch applied, it keeps printing the same
> values.
>
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -399,6 +399,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>
> +	printk_ratelimited("KK stucked pfn=%lu head_pfn=%lu nr_pages=%lu boundary_pfn=%lu\n", pfn, head_pfn, nr_pages, boundary_pfn);
>  	ret = __alloc_contig_migrate_range(&cc, head_pfn,
>  			head_pfn + nr_pages);
>
> isolate_single_pageblock: 179 callbacks suppressed
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896
> KK stucked pfn=2151120384 head_pfn=2151120384 nr_pages=512 boundary_pfn=2151120896

Hi Qian,

Thanks for your testing. Do you have a complete reproducer? From your printout,
it is clear that a 512-page compound page caused the infinite loop, because the
page was not migrated and the code kept retrying.
But __alloc_contig_migrate_range() is supposed to return non-zero to tell the
code that the page cannot be migrated, in which case the code will goto failed
without retrying. It would be great if you could share exactly what was run
after boot, so that I can reproduce it locally and identify what makes
__alloc_contig_migrate_range() return 0 without migrating the page.

Can you also try the patch below to see if it fixes the infinite loop?

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b3f074d1682e..abde1877bbcb 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -417,10 +417,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 	order = 0;
 	outer_pfn = pfn;
 	while (!PageBuddy(pfn_to_page(outer_pfn))) {
-		if (++order >= MAX_ORDER) {
-			outer_pfn = pfn;
-			break;
-		}
+		/* abort if the free page cannot be found */
+		if (++order >= MAX_ORDER)
+			goto failed;
 		outer_pfn &= ~0UL << order;
 	}
 	pfn = outer_pfn;

--
Best Regards,
Yan, Zi
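
For anyone following the thread, the search loop touched by the patch above can be modelled in user space. The sketch below is only an illustration under stated assumptions: find_free_page_head(), only_2048_is_free() and nothing_is_free() are hypothetical names, the callback stands in for PageBuddy(pfn_to_page(pfn)), and MAX_ORDER is assumed to be 11. It mirrors the patched behaviour, where running out of orders is reported as a failure instead of falling back to the original pfn.

```c
#include <stdbool.h>

/* Assumption: a common default; the real value depends on the kernel config. */
#define MAX_ORDER 11

/* Hypothetical stand-in for PageBuddy(pfn_to_page(pfn)). */
typedef bool (*is_buddy_fn)(unsigned long pfn);

/*
 * Walk upwards through the orders, aligning outer_pfn down to each order's
 * boundary, until a pfn that heads a free (buddy) page is found.
 * Returns 0 and stores that pfn in *head, or -1 when no free page is found
 * below MAX_ORDER (the "goto failed" case in the patch).
 */
int find_free_page_head(unsigned long pfn, is_buddy_fn is_buddy,
			unsigned long *head)
{
	unsigned int order = 0;
	unsigned long outer_pfn = pfn;

	while (!is_buddy(outer_pfn)) {
		/* abort if the free page cannot be found */
		if (++order >= MAX_ORDER)
			return -1;
		/* clear the low 'order' bits: align down to 2^order pages */
		outer_pfn &= ~0UL << order;
	}
	*head = outer_pfn;
	return 0;
}

/* Example predicate: a single free page whose head is at pfn 2048. */
bool only_2048_is_free(unsigned long pfn)
{
	return pfn == 2048;
}

/* Example predicate: no free page anywhere, so the search must fail. */
bool nothing_is_free(unsigned long pfn)
{
	(void)pfn;
	return false;
}
```

Starting from pfn 2085 with only_2048_is_free(), the loop visits 2084, 2080 and then 2048, where it stops. With nothing_is_free() it gives up once order reaches MAX_ORDER and returns -1. Before the patch, this second case reset outer_pfn to pfn and carried on.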