On 30 Mar 2022, at 17:43, Vlastimil Babka wrote: > On 3/30/22 22:05, Linus Torvalds wrote: >> On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt wrote: >>> >>> I started testing new patches and it crashed when doing the x86-32 test on >>> boot up. >>> >>> Initializing HighMem for node 0 (000375fe:0021ee00) >>> BUG: kernel NULL pointer dereference, address: 00000878 >>> #PF: supervisor read access in kernel mode >>> #PF: error_code(0x0000) - not-present page >>> *pdpt = 0000000000000000 *pde = f0000000f000eef3 >>> Oops: 0000 [#1] PREEMPT SMP PTI >>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469 >>> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 >>> EIP: get_pfnblock_flags_mask+0x2c/0x36 >>> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1 >> >> The whole function is in that Code: thing, and it decodes to: >> >> 0: 55 push %ebp >> 1: 89 e5 mov %esp,%ebp >> 3: 56 push %esi >> 4: 89 ce mov %ecx,%esi >> 6: 53 push %ebx >> 7: 8b 18 mov (%eax),%ebx >> 9: 89 d8 mov %ebx,%eax >> b: c1 eb 1e shr $0x1e,%ebx >> e: e8 f7 fb ff ff call 0xfffffc0a >> 13: 69 db c0 02 00 00 imul $0x2c0,%ebx,%ebx >> 19: 89 c1 mov %eax,%ecx >> 1b: 89 c2 mov %eax,%edx >> 1d: c1 ea 05 shr $0x5,%edx >> 20: 8b 83 7c d7 79 c1 mov -0x3e862884(%ebx),%eax >> 26: 5b pop %ebx >> 27:* 8b 04 90 mov (%eax,%edx,4),%eax <-- trapping instruction >> 2a: d3 e8 shr %cl,%eax >> 2c: 21 f0 and %esi,%eax >> 2e: 5e pop %esi >> 2f: 5d pop %ebp >> 30: c3 ret >> >> with '%eax' being NULL, and %edx being 0x21e. >> >> (The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does >> similar code generation, so it's out-of-lined part of pfn_to_bitidx() >> despite being marked inline) >> >> So that oops is that >> >> word = bitmap[word_bitidx]; >> >> line, with 'bitmap' being NULL (and %edx contains 'word_bitidx'). >> >> Looking around, your 'config-bad' doesn't even have >> CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this >> part of the change: >> >> - if (unlikely(has_isolate_pageblock(zone))) { >> >> which used to always be false for that config, and now the code is >> suddenly enabled. > > If CONFIG_MEMORY_ISOLATION was enabled then the zone layout would be the > same, so I think it's not simply that. I think it's the timing - > has_isolate_pageblock(zone) would only be possible to become true later > in runtime when some isolation is ongoing, but here it seems we are > still in the early boot. Probably at a boundary of highmem with another > zone that doesn't have the pageblock bitmap yet initialized? While later > it would have, and all would be fine. It seems so based on the boot log from Steven. > > As Zi Yan said, the usual merging code will, through page_is_buddy() > find safely enough the buddy is not applicable, so I agree with his > patch direction. Seems this also shows the code tried to become too > smart and for the next merge window we should try just move the > migratetype checks into the main while loop (under something like "if > (order >= max_order)") and simplify the function a lot, hopefully with > negligible perf impact. Something like this? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6c6af8658775..568ecaf5700d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1053,7 +1053,6 @@ static inline void __free_one_page(struct page *page, int migratetype, fpi_t fpi_flags) { struct capture_control *capc = task_capc(zone); - unsigned int max_order = pageblock_order; unsigned long buddy_pfn; unsigned long combined_pfn; struct page *buddy; @@ -1069,8 +1068,7 @@ static inline void __free_one_page(struct page *page, VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); VM_BUG_ON_PAGE(bad_range(zone, page), page); -continue_merging: - while (order < max_order) { + while (order < MAX_ORDER - 1) { if (compaction_capture(capc, page, order, migratetype)) { __mod_zone_freepage_state(zone, -(1 << order), migratetype); @@ -1081,6 +1079,27 @@ static inline void __free_one_page(struct page *page, if (!page_is_buddy(page, buddy, order)) goto done_merging; + + if (order >= pageblock_order) { + /* If we are here, it means order is >= pageblock_order. + * We want to prevent merge between freepages on pageblock + * without fallbacks and normal pageblock. Without this, + * pageblock isolation could cause incorrect freepage or CMA + * accounting or HIGHATOMIC accounting. + * + * We don't want to hit this code for the more frequent + * low-order merging. + */ + int buddy_mt; + + buddy_mt = get_pageblock_migratetype(buddy); + + if (migratetype != buddy_mt + && (!migratetype_is_mergeable(migratetype) || + !migratetype_is_mergeable(buddy_mt))) + goto done_merging; + } + /* * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page, * merge with it and move up one order. @@ -1094,32 +1113,6 @@ static inline void __free_one_page(struct page *page, pfn = combined_pfn; order++; } - if (order < MAX_ORDER - 1) { - /* If we are here, it means order is >= pageblock_order. - * We want to prevent merge between freepages on pageblock - * without fallbacks and normal pageblock. Without this, - * pageblock isolation could cause incorrect freepage or CMA - * accounting or HIGHATOMIC accounting. - * - * We don't want to hit this code for the more frequent - * low-order merging. - */ - int buddy_mt; - - buddy_pfn = __find_buddy_pfn(pfn, order); - buddy = page + (buddy_pfn - pfn); - - if (!page_is_buddy(page, buddy, order)) - goto done_merging; - buddy_mt = get_pageblock_migratetype(buddy); - - if (migratetype != buddy_mt - && (!migratetype_is_mergeable(migratetype) || - !migratetype_is_mergeable(buddy_mt))) - goto done_merging; - max_order = order + 1; - goto continue_merging; - } done_merging: set_buddy_order(page, order); -- Best Regards, Yan, Zi