* [PATCH 1/1] mm: Defer freeing reserved pages in memblock_free_late()
2023-02-06 7:12 [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux Aaron Thompson
@ 2023-02-06 7:12 ` Aaron Thompson
2023-02-07 8:02 ` [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux Mike Rapoport
1 sibling, 0 replies; 4+ messages in thread
From: Aaron Thompson @ 2023-02-06 7:12 UTC (permalink / raw)
To: Mike Rapoport; +Cc: Andrew Morton, linux-mm, linux-kernel, Aaron Thompson
Commit 115d9d77bb0f ("mm: Always release pages to the buddy allocator in
memblock_free_late().") introduced a bug. The pages being freed by
memblock_free_late() have already been initialized, but if they are in
the deferred init range, __free_one_page() might access nearby
uninitialized pages when trying to coalesce buddies. This can, for
example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
Instead of freeing such pages immediately, remove the range from
memblock.reserved. This causes the deferred init process to treat it as
a range of free pages, which means they will be initialized and freed by
deferred_init_maxorder().
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
---
mm/internal.h | 2 ++
mm/memblock.c | 36 ++++++++++++++++++++-----------
mm/page_alloc.c | 17 +++++++++++++++
tools/testing/memblock/internal.h | 7 +++---
4 files changed, 47 insertions(+), 15 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..48d87f334f8c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -358,6 +358,8 @@ extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
extern void memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order);
+extern void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+ unsigned int order);
extern void __free_pages_core(struct page *page, unsigned int order);
extern void prep_compound_page(struct page *page, unsigned int order);
extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index 685e30e6d27c..8f65ea3533c6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -160,6 +160,7 @@ static bool system_has_some_mirror __initdata_memblock = false;
static int memblock_can_resize __initdata_memblock;
static int memblock_memory_in_slab __initdata_memblock = 0;
static int memblock_reserved_in_slab __initdata_memblock = 0;
+static bool memblock_discard_called __initdata = false;
static enum memblock_flags __init_memblock choose_memblock_flags(void)
{
@@ -366,6 +367,8 @@ void __init memblock_discard(void)
{
phys_addr_t addr, size;
+ memblock_discard_called = true;
+
if (memblock.reserved.regions != memblock_reserved_init_regions) {
addr = __pa(memblock.reserved.regions);
size = PAGE_ALIGN(sizeof(struct memblock_region) *
@@ -1620,13 +1623,16 @@ void * __init memblock_alloc_try_nid(
}
/**
- * memblock_free_late - free pages directly to buddy allocator
- * @base: phys starting address of the boot memory block
+ * memblock_free_late - free boot memory block after memblock_free_all() has run
+ * @base: phys starting address of the boot memory block
* @size: size of the boot memory block in bytes
*
- * This is only useful when the memblock allocator has already been torn
- * down, but we are still initializing the system. Pages are released directly
- * to the buddy allocator.
+ * Free boot memory block previously allocated or reserved via memblock APIs.
+ * This function is to be used after memblock_free_all() has run (prior to that,
+ * use memblock_free()/memblock_phys_free()). Pages will be released to the
+ * buddy allocator, either immediately or as part of deferred page
+ * initialization. The block will also be removed from the reserved regions if
+ * memblock_discard() has not yet run.
*/
void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
{
@@ -1640,15 +1646,21 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
end = PFN_DOWN(base + size);
for (; cursor < end; cursor++) {
- /*
- * Reserved pages are always initialized by the end of
- * memblock_free_all() (by memmap_init() and, if deferred
- * initialization is enabled, memmap_init_reserved_pages()), so
- * these pages can be released directly to the buddy allocator.
- */
- __free_pages_core(pfn_to_page(cursor), 0);
+ memblock_free_reserved_pages(pfn_to_page(cursor), cursor, 0);
totalram_pages_inc();
}
+
+ if (!memblock_discard_called)
+ /*
+ * Also remove the range from memblock.reserved. If deferred
+ * page init is enabled, memblock_free_reserved_pages() does not
+ * free pages that are in the deferred range, but because the
+ * range is no longer reserved, deferred init will initialize
+ * and free the pages. Note that such pages will be initialized
+ * twice, first by memmap_init_reserved_pages() and again by
+ * deferred_init_maxorder().
+ */
+ memblock_remove_range(&memblock.reserved, base, size);
}
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..4583215bfe3a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1813,6 +1813,23 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
__free_pages_core(page, order);
}
+void __init memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+ unsigned int order)
+{
+ /*
+ * All reserved pages have been initialized at this point by either
+ * memmap_init() or memmap_init_reserved_pages(), but if the pages to
+ * be freed are in the deferred init range (which is what
+ * early_page_uninitialised() checks), freeing them now could result
+ * in __free_one_page() accessing nearby uninitialized pages when it
+ * tries to coalesce buddies. They will be freed as part of deferred
+ * init instead.
+ */
+ if (early_page_uninitialised(pfn))
+ return;
+ __free_pages_core(page, order);
+}
+
/*
* Check that the whole (or subset of) a pageblock given by the interval of
* [start_pfn, end_pfn) is valid and within the same zone, before scanning it
diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
index 85973e55489e..524d93e71bee 100644
--- a/tools/testing/memblock/internal.h
+++ b/tools/testing/memblock/internal.h
@@ -15,12 +15,13 @@ bool mirrored_kernelcore = false;
struct page {};
-void __free_pages_core(struct page *page, unsigned int order)
+void memblock_free_pages(struct page *page, unsigned long pfn,
+ unsigned int order)
{
}
-void memblock_free_pages(struct page *page, unsigned long pfn,
- unsigned int order)
+void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+ unsigned int order)
{
}
--
2.30.2

* Re: [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux
2023-02-06 7:12 [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux Aaron Thompson
2023-02-06 7:12 ` [PATCH 1/1] mm: Defer freeing reserved pages in memblock_free_late() Aaron Thompson
@ 2023-02-07 8:02 ` Mike Rapoport
2023-02-07 8:28 ` Aaron Thompson
1 sibling, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2023-02-07 8:02 UTC (permalink / raw)
To: Aaron Thompson; +Cc: Andrew Morton, linux-mm, linux-kernel
Hi Aaron,
On Mon, Feb 06, 2023 at 07:12:09AM +0000, Aaron Thompson wrote:
> Hi Mike,
>
> Unfortunately my attempted bugfix 115d9d77bb0f ("mm: Always release pages to the
> buddy allocator in memblock_free_late()") is itself buggy. It's true that all
> reserved pages are initialized by the time memblock_free_late() is called, but
> if the pages being freed are in the deferred init range, __free_one_page() might
> access nearby uninitialized pages when trying to coalesce buddies, in which case
> badness ensues :(
>
> deferred_init_maxorder() handles this by initializing a max-order-sized block of
> pages before freeing any of them. We could change memblock_free_late() to do
> that, but it seems to me that having memblock_free_late() get involved in
> initializing and freeing free pages would be a bit messy. I think it will be
> simpler to free the reserved pages later, as part of deferred init or after.
>
> I can see a few ways to accomplish that:
>
> 1. Have memblock_free_late() remove the range from memblock.reserved. Deferred
> init will then handle that range as just another free range, so the pages
> will be initialized and freed by deferred_init_maxorder().
>
> This is the simplest fix, but the problem is that the pages will be
> initialized twice, first by memmap_init_reserved_pages() and again by
> deferred_init_maxorder(). It looks risky to me to blindly zero out an
> already-initialized page struct, but if we could say for certain that the
> page structs for memblock-reserved ranges aren't actually used, at least
> until after deferred init is done, then this could work. I don't know the
> usage of page structs well enough to say.
>
> 2. Take 1 and fix the double-init problem. In addition to removing the range
> from memblock.reserved, also set a flag on the range in memblock.memory that
> indicates the pages for that range have already been initialized.
> deferred_init_maxorder() would skip initializing pages for ranges with the
> flag set, but it would still free them.
>
> This seems like a bit of a conceptual stretch of the memblock region flags
> since this is not a property of the memory itself but rather of the page
> structs corresponding to that memory. But it gets the job done.
>
> 3. Defer the freeing of reserved pages until after deferred init is completely
> done. Have memblock_free_late() set a flag on the range in memblock.reserved,
> and have memblock_discard() call __free_pages_core() on those ranges.
>
> I think this would be a reasonable use of flags on reserved regions. They are
> not currently used.
I think 3 is the most straightforward as a concept. It'll need some care
for ARCH_KEEP_MEMBLOCK architectures, e.g. arm64, though.
I can also think about
4. Extend initialization of the memory map around the reserved regions in
memmap_init_reserved_pages()/reserve_bootmem_region(). If these functions
initialize the memory map of the entire pageblock surrounding the reserved
range, __free_one_page() will certainly access initialized struct pages.
> The included patch implements option 1 because it is the simplest, but it should
> not be used if the double-init of the page structs is unsafe. In my testing I
> verified that the count, mapcount, and lru list head of all pages are at their
> defaults when memblock_free_late() is called by efi_free_boot_services(), but
> that's obviously not conclusive. I have draft versions of 2 and 3 that I can
> finish up quickly if either of those are preferable.
At this point of the release cycle I prefer to revert 115d9d77bb0f ("mm:
Always release pages to the buddy allocator in memblock_free_late()") and
to work on the proper fix for the next release.
> Please let me know what you think, and sorry for introducing this bug.
>
> Thanks,
> Aaron
>
> Aaron Thompson (1):
> mm: Defer freeing reserved pages in memblock_free_late()
>
> mm/internal.h | 2 ++
> mm/memblock.c | 36 ++++++++++++++++++++-----------
> mm/page_alloc.c | 17 +++++++++++++++
> tools/testing/memblock/internal.h | 7 +++---
> 4 files changed, 47 insertions(+), 15 deletions(-)
>
> --
> 2.30.2
>
>
--
Sincerely yours,
Mike.