On 30 Mar 2022, at 17:25, Zi Yan wrote:

> On 30 Mar 2022, at 16:53, Steven Rostedt wrote:
>
>> On Wed, 30 Mar 2022 16:29:28 -0400
>> Zi Yan wrote:
>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index bdc8f60ae462..83a90e2973b7 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1108,6 +1108,8 @@ static inline void __free_one_page(struct page *page,
>>>
>>>  		buddy_pfn = __find_buddy_pfn(pfn, order);
>>>  		buddy = page + (buddy_pfn - pfn);
>>> +		if (!page_is_buddy(page, buddy, order))
>>> +			goto done_merging;
>>>  		buddy_mt = get_pageblock_migratetype(buddy);
>>>
>>>  		if (migratetype != buddy_mt
>>>
>>
>> The above did not apply to Linus's tree, nor even the problem commit
>> (before or after), but I found where the code is, and added it manually.
>>
>> It does appear to allow the machine to boot.
>>
> I just pulled Linus’s tree and grabbed the diff. Anyway, thanks.
>
> I would like to get more understanding of the issue before blindly sending
> this as a fix.
>
> Merge the other thread:
>>
>> Not sure if this matters or not, but my kernel command line has:
>>
>>   crashkernel=256M
>>
>> Could that have caused this to break?
>
> Unlikely, 256MB is MAX_ORDER_NR_PAGES aligned (MAX_ORDER is 11 here).
> __find_buddy_pfn() will not get any buddy_pfn from crashkernel memory
> region, since that would cross MAX_ORDER_NR_PAGES boundary.
>
> page_is_buddy() checks page_is_guard(buddy), PageBuddy(buddy),
> buddy_order(buddy), and page_zone_id(buddy), where page_is_guard(buddy)
> is always false since CONFIG_DEBUG_PAGEALLOC is not set in your config.
> So either PageBuddy(buddy) is false, buddy_order(buddy) != order,
> or page_zone_id(buddy) is not the same as page_zone_id(page).
>
> Do you mind adding the following code right before my fix code above
> and provide a complete boot log? I would like to understand what
> went wrong. Thanks.
>
> pr_info("buddy_pfn: %lx, PageBuddy: %d, buddy_order: %d (vs %d), page_zone_id: %d (vs %d)\n",
> 	buddy_pfn, PageBuddy(buddy), buddy_order(buddy), order, page_zone_id(buddy),
> 	page_zone_id(page));
>

This seems to be a bug in the original code too. But
"if (unlikely(has_isolate_pageblock(zone)))" is too rare to trigger it.
I do not see how having isolated pageblocks in a zone could get us away
from checking page_is_buddy().

--
Best Regards,
Yan, Zi
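For reference, the alignment argument about crashkernel=256M can be checked
with a small userspace sketch. This is not kernel code: it only mirrors the
XOR arithmetic that __find_buddy_pfn() uses, as far as I understand it.
MAX_ORDER = 11 is taken from the mail above; the 4 KiB page size and the
helper names (find_buddy_pfn, max_order_block, buddy_stays_in_block,
is_max_order_aligned) are assumptions made up for illustration.

```c
#include <assert.h>

/* MAX_ORDER is 11 per the mail; PAGE_SHIFT = 12 (4 KiB pages) is an assumption. */
#define MAX_ORDER		11U
#define MAX_ORDER_NR_PAGES	(1UL << (MAX_ORDER - 1))
#define PAGE_SHIFT		12

/* Userspace stand-in for the buddy computation: flip bit 'order' of the pfn. */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	assert(order < MAX_ORDER);
	return pfn ^ (1UL << order);
}

/* Index of the MAX_ORDER_NR_PAGES-aligned block that contains pfn. */
static unsigned long max_order_block(unsigned long pfn)
{
	return pfn / MAX_ORDER_NR_PAGES;
}

/* Return 1 if a region of nr_pages starts/ends on a MAX_ORDER_NR_PAGES boundary. */
static int is_max_order_aligned(unsigned long nr_pages)
{
	return (nr_pages % MAX_ORDER_NR_PAGES) == 0;
}

/*
 * Return 1 if, for every order that can still be merged (0 .. MAX_ORDER - 2),
 * the buddy of the order-aligned block containing pfn lies in the same
 * MAX_ORDER_NR_PAGES-aligned block as pfn itself.
 */
static int buddy_stays_in_block(unsigned long pfn)
{
	for (unsigned int order = 0; order < MAX_ORDER - 1; order++) {
		unsigned long head = pfn & ~((1UL << order) - 1);

		if (max_order_block(find_buddy_pfn(head, order)) !=
		    max_order_block(head))
			return 0;
	}
	return 1;
}
```

Under these assumptions, 256MB is 65536 pages, a multiple of
MAX_ORDER_NR_PAGES (1024), so a pfn on either side of such a boundary only
ever gets buddies on its own side. That matches the "Unlikely" reasoning
above, but it says nothing about why page_is_buddy() fails on the affected
machine.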