* [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux
From: Aaron Thompson @ 2023-02-06  7:12 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Andrew Morton, linux-mm, linux-kernel, Aaron Thompson

Hi Mike,

Unfortunately my attempted bugfix 115d9d77bb0f ("mm: Always release pages to the
buddy allocator in memblock_free_late()") is itself buggy. It's true that all
reserved pages are initialized by the time memblock_free_late() is called, but
if the pages being freed are in the deferred init range, __free_one_page() might
access nearby uninitialized pages when trying to coalesce buddies, in which case
badness ensues :(
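
To make the failure mode concrete, the problematic step is roughly the
following (a simplified fragment of the merge logic in __free_one_page(),
not the exact kernel code; pfn, order, page, and zone come from the
surrounding function):

  unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
  struct page *buddy = page + (buddy_pfn - pfn);

  /*
   * Deciding whether the freed page can merge with its buddy means
   * reading the buddy's struct page. If buddy_pfn is in the deferred
   * init range, that struct page may still be uninitialized, and
   * unlinking it from the free list chases garbage lru pointers.
   */
  if (page_is_buddy(page, buddy, order))
          del_page_from_free_list(buddy, zone, order);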

deferred_init_maxorder() handles this by initializing a max-order-sized block of
pages before freeing any of them. We could change memblock_free_late() to do
that, but it seems to me that having memblock_free_late() get involved in
initializing and freeing free pages would be a bit messy. I think it will be
simpler to free the reserved pages later, as part of deferred init or after.
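
For reference, the ordering that makes deferred_init_maxorder() safe looks
roughly like this (heavily simplified; the real code walks memblock ranges
with a pair of cursors, and pfn_in_free_memblock_range() is a made-up
stand-in for that walk):

  /* Pass 1: initialize every struct page in a MAX_ORDER-aligned block. */
  for (pfn = block_start; pfn < block_end; pfn++)
          __init_single_page(pfn_to_page(pfn), pfn, zid, nid);

  /* Pass 2: only then free, so coalescing in __free_one_page() never
   * touches an uninitialized struct page. */
  for (pfn = block_start; pfn < block_end; pfn++)
          if (pfn_in_free_memblock_range(pfn))    /* hypothetical */
                  __free_pages_core(pfn_to_page(pfn), 0);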

I can see a few ways to accomplish that:

1. Have memblock_free_late() remove the range from memblock.reserved. Deferred
   init will then handle that range as just another free range, so the pages
   will be initialized and freed by deferred_init_maxorder().

   This is the simplest fix, but the problem is that the pages will be
   initialized twice, first by memmap_init_reserved_pages() and again by
   deferred_init_maxorder(). It looks risky to me to blindly zero out an
   already-initialized page struct, but if we could say for certain that the
   page structs for memblock-reserved ranges aren't actually used, at least
   until after deferred init is done, then this could work. I don't know the
   usage of page structs well enough to say.

2. Take 1 and fix the double-init problem. In addition to removing the range
   from memblock.reserved, also set a flag on the range in memblock.memory that
   indicates the pages for that range have already been initialized.
   deferred_init_maxorder() would skip initializing pages for ranges with the
   flag set, but it would still free them.

   This seems like a bit of a conceptual stretch of the memblock region flags
   since this is not a property of the memory itself but rather of the page
   structs corresponding to that memory. But it gets the job done.

3. Defer the freeing of reserved pages until after deferred init is completely
   done. Have memblock_free_late() set a flag on the range in memblock.reserved,
   and have memblock_discard() call __free_pages_core() on those ranges.

   I think this would be a reasonable use of flags on reserved regions. Flags
   are not currently used on reserved regions.
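
   For concreteness, a rough sketch of option 3 (MEMBLOCK_FREE_LATE is a
   made-up flag name, and the existing memblock_setclr_flag() helper
   operates on memblock.memory, so it would need a variant that operates
   on memblock.reserved instead):

     /* In memblock_free_late(): tag the range instead of freeing it. */
     memblock_setclr_flag(base, size, 1, MEMBLOCK_FREE_LATE);

     /* In memblock_discard(), which runs after deferred init is done: */
     struct memblock_region *r;

     for_each_reserved_mem_region(r) {
             unsigned long pfn;

             if (!(r->flags & MEMBLOCK_FREE_LATE))
                     continue;
             for (pfn = memblock_region_reserved_base_pfn(r);
                  pfn < memblock_region_reserved_end_pfn(r); pfn++) {
                     __free_pages_core(pfn_to_page(pfn), 0);
                     totalram_pages_inc();
             }
     }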

The included patch implements option 1 because it is the simplest, but it should
not be used if the double-init of the page structs is unsafe. In my testing I
verified that the count, mapcount, and lru list head of all pages are at their
defaults when memblock_free_late() is called by efi_free_boot_services(), but
that's obviously not conclusive. I have draft versions of 2 and 3 that I can
finish up quickly if either of those is preferable.

Please let me know what you think, and sorry for introducing this bug.

Thanks,
Aaron

Aaron Thompson (1):
  mm: Defer freeing reserved pages in memblock_free_late()

 mm/internal.h                     |  2 ++
 mm/memblock.c                     | 36 ++++++++++++++++++++-----------
 mm/page_alloc.c                   | 17 +++++++++++++++
 tools/testing/memblock/internal.h |  7 +++---
 4 files changed, 47 insertions(+), 15 deletions(-)

-- 
2.30.2



* [PATCH 1/1] mm: Defer freeing reserved pages in memblock_free_late()
From: Aaron Thompson @ 2023-02-06  7:12 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Andrew Morton, linux-mm, linux-kernel, Aaron Thompson

Commit 115d9d77bb0f ("mm: Always release pages to the buddy allocator in
memblock_free_late().") introduced a bug. The pages being freed by
memblock_free_late() have already been initialized, but if they are in
the deferred init range, __free_one_page() might access nearby
uninitialized pages when trying to coalesce buddies. This can, for
example, trigger this BUG:

  BUG: unable to handle page fault for address: ffffe964c02580c8
  RIP: 0010:__list_del_entry_valid+0x3f/0x70
   <TASK>
   __free_one_page+0x139/0x410
   __free_pages_ok+0x21d/0x450
   memblock_free_late+0x8c/0xb9
   efi_free_boot_services+0x16b/0x25c
   efi_enter_virtual_mode+0x403/0x446
   start_kernel+0x678/0x714
   secondary_startup_64_no_verify+0xd2/0xdb
   </TASK>

Instead of freeing such pages immediately, remove the range from
memblock.reserved. This causes the deferred init process to treat it as
a range of free pages, which means they will be initialized and freed by
deferred_init_maxorder().

Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
---
 mm/internal.h                     |  2 ++
 mm/memblock.c                     | 36 ++++++++++++++++++++-----------
 mm/page_alloc.c                   | 17 +++++++++++++++
 tools/testing/memblock/internal.h |  7 +++---
 4 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..48d87f334f8c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -358,6 +358,8 @@ extern void __putback_isolated_page(struct page *page, unsigned int order,
 				    int mt);
 extern void memblock_free_pages(struct page *page, unsigned long pfn,
 					unsigned int order);
+extern void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+					 unsigned int order);
 extern void __free_pages_core(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index 685e30e6d27c..8f65ea3533c6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -160,6 +160,7 @@ static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;
 static int memblock_reserved_in_slab __initdata_memblock = 0;
+static bool memblock_discard_called __initdata = false;
 
 static enum memblock_flags __init_memblock choose_memblock_flags(void)
 {
@@ -366,6 +367,8 @@ void __init memblock_discard(void)
 {
 	phys_addr_t addr, size;
 
+	memblock_discard_called = true;
+
 	if (memblock.reserved.regions != memblock_reserved_init_regions) {
 		addr = __pa(memblock.reserved.regions);
 		size = PAGE_ALIGN(sizeof(struct memblock_region) *
@@ -1620,13 +1623,16 @@ void * __init memblock_alloc_try_nid(
 }
 
 /**
- * memblock_free_late - free pages directly to buddy allocator
- * @base: phys starting address of the  boot memory block
+ * memblock_free_late - free boot memory block after memblock_free_all() has run
+ * @base: phys starting address of the boot memory block
  * @size: size of the boot memory block in bytes
  *
- * This is only useful when the memblock allocator has already been torn
- * down, but we are still initializing the system.  Pages are released directly
- * to the buddy allocator.
+ * Free boot memory block previously allocated or reserved via memblock APIs.
+ * This function is to be used after memblock_free_all() has run (prior to that,
+ * use memblock_free()/memblock_phys_free()). Pages will be released to the
+ * buddy allocator, either immediately or as part of deferred page
+ * initialization. The block will also be removed from the reserved regions if
+ * memblock_discard() has not yet run.
  */
 void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 {
@@ -1640,15 +1646,21 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	end = PFN_DOWN(base + size);
 
 	for (; cursor < end; cursor++) {
-		/*
-		 * Reserved pages are always initialized by the end of
-		 * memblock_free_all() (by memmap_init() and, if deferred
-		 * initialization is enabled, memmap_init_reserved_pages()), so
-		 * these pages can be released directly to the buddy allocator.
-		 */
-		__free_pages_core(pfn_to_page(cursor), 0);
+		memblock_free_reserved_pages(pfn_to_page(cursor), cursor, 0);
 		totalram_pages_inc();
 	}
+
+	if (!memblock_discard_called)
+		/*
+		 * Also remove the range from memblock.reserved. If deferred
+		 * page init is enabled, memblock_free_reserved_pages() does not
+		 * free pages that are in the deferred range, but because the
+		 * range is no longer reserved, deferred init will initialize
+		 * and free the pages. Note that such pages will be initialized
+		 * twice, first by memmap_init_reserved_pages() and again by
+		 * deferred_init_maxorder().
+		 */
+		memblock_remove_range(&memblock.reserved, base, size);
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..4583215bfe3a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1813,6 +1813,23 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
 	__free_pages_core(page, order);
 }
 
+void __init memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+					 unsigned int order)
+{
+	/*
+	 * All reserved pages have been initialized at this point by either
+	 * memmap_init() or memmap_init_reserved_pages(), but if the pages to
+	 * be freed are in the deferred init range (which is what
+	 * early_page_uninitialised() checks), freeing them now could result
+	 * in __free_one_page() accessing nearby uninitialized pages when it
+	 * tries to coalesce buddies. They will be freed as part of deferred
+	 * init instead.
+	 */
+	if (early_page_uninitialised(pfn))
+		return;
+	__free_pages_core(page, order);
+}
+
 /*
  * Check that the whole (or subset of) a pageblock given by the interval of
  * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
diff --git a/tools/testing/memblock/internal.h b/tools/testing/memblock/internal.h
index 85973e55489e..524d93e71bee 100644
--- a/tools/testing/memblock/internal.h
+++ b/tools/testing/memblock/internal.h
@@ -15,12 +15,13 @@ bool mirrored_kernelcore = false;
 
 struct page {};
 
-void __free_pages_core(struct page *page, unsigned int order)
+void memblock_free_pages(struct page *page, unsigned long pfn,
+			 unsigned int order)
 {
 }
 
-void memblock_free_pages(struct page *page, unsigned long pfn,
-			 unsigned int order)
+void memblock_free_reserved_pages(struct page *page, unsigned long pfn,
+				  unsigned int order)
 {
 }
 
-- 
2.30.2



* Re: [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux
From: Mike Rapoport @ 2023-02-07  8:02 UTC (permalink / raw)
  To: Aaron Thompson; +Cc: Andrew Morton, linux-mm, linux-kernel

Hi Aaron,

On Mon, Feb 06, 2023 at 07:12:09AM +0000, Aaron Thompson wrote:
> Hi Mike,
> 
> Unfortunately my attempted bugfix 115d9d77bb0f ("mm: Always release pages to the
> buddy allocator in memblock_free_late()") is itself buggy. It's true that all
> reserved pages are initialized by the time memblock_free_late() is called, but
> if the pages being freed are in the deferred init range, __free_one_page() might
> access nearby uninitialized pages when trying to coalesce buddies, in which case
> badness ensues :(
> 
> deferred_init_maxorder() handles this by initializing a max-order-sized block of
> pages before freeing any of them. We could change memblock_free_late() to do
> that, but it seems to me that having memblock_free_late() get involved in
> initializing and freeing free pages would be a bit messy. I think it will be
> simpler to free the reserved pages later, as part of deferred init or after.
> 
> I can see a few ways to accomplish that:
> 
> 1. Have memblock_free_late() remove the range from memblock.reserved. Deferred
>    init will then handle that range as just another free range, so the pages
>    will be initialized and freed by deferred_init_maxorder().
> 
>    This is the simplest fix, but the problem is that the pages will be
>    initialized twice, first by memmap_init_reserved_pages() and again by
>    deferred_init_maxorder(). It looks risky to me to blindly zero out an
>    already-initialized page struct, but if we could say for certain that the
>    page structs for memblock-reserved ranges aren't actually used, at least
>    until after deferred init is done, then this could work. I don't know the
>    usage of page structs well enough to say.
> 
> 2. Take 1 and fix the double-init problem. In addition to removing the range
>    from memblock.reserved, also set a flag on the range in memblock.memory that
>    indicates the pages for that range have already been initialized.
>    deferred_init_maxorder() would skip initializing pages for ranges with the
>    flag set, but it would still free them.
> 
>    This seems like a bit of a conceptual stretch of the memblock region flags
>    since this is not a property of the memory itself but rather of the page
>    structs corresponding to that memory. But it gets the job done.
> 
> 3. Defer the freeing of reserved pages until after deferred init is completely
>    done. Have memblock_free_late() set a flag on the range in memblock.reserved,
>    and have memblock_discard() call __free_pages_core() on those ranges.
> 
>    I think this would be a reasonable use of flags on reserved regions. Flags
>    are not currently used on reserved regions.

I think 3 is the most straightforward as a concept. It'll need some care
for ARCH_KEEP_MEMBLOCK architectures, e.g. arm64, though.

I can also think of:

4. Extend initialization of the memory map around the reserved regions in
memmap_init_reserved_pages()/reserve_bootmem_region(). If these functions
initialize the memory map of the entire pageblock surrounding the reserved
range, __free_one_page() will certainly access initialized struct pages.
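
Roughly, as a sketch only (init_reserved_page() is the existing helper that
initializes a single struct page under deferred init; note that merging can
in principle cross a pageblock, so MAX_ORDER_NR_PAGES alignment may be
needed instead):

  void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
  {
          unsigned long start_pfn = PFN_DOWN(start);
          unsigned long end_pfn = PFN_UP(end);
          unsigned long pfn;

          /* Widen the initialized span to pageblock boundaries ... */
          for (pfn = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
               pfn < ALIGN(end_pfn, pageblock_nr_pages); pfn++) {
                  if (!pfn_valid(pfn))
                          continue;
                  init_reserved_page(pfn);
                  /* ... but only mark the actual range reserved. */
                  if (pfn >= start_pfn && pfn < end_pfn) {
                          INIT_LIST_HEAD(&pfn_to_page(pfn)->lru);
                          __SetPageReserved(pfn_to_page(pfn));
                  }
          }
  }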
 
> The included patch implements option 1 because it is the simplest, but it should
> not be used if the double-init of the page structs is unsafe. In my testing I
> verified that the count, mapcount, and lru list head of all pages are at their
> defaults when memblock_free_late() is called by efi_free_boot_services(), but
> that's obviously not conclusive. I have draft versions of 2 and 3 that I can
> finish up quickly if either of those is preferable.

At this point in the release cycle I'd prefer to revert 115d9d77bb0f ("mm:
Always release pages to the buddy allocator in memblock_free_late()") and
work on a proper fix for the next release.
 

-- 
Sincerely yours,
Mike.


* Re: [PATCH 0/1] Fix memblock_free_late() deferred init bug, redux
From: Aaron Thompson @ 2023-02-07  8:28 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Andrew Morton, linux-mm, linux-kernel

Hi Mike,

On 2023-02-07 00:02, Mike Rapoport wrote:
> Hi Aaron,
> 
> [snip]
> 
> At this point in the release cycle I'd prefer to revert 115d9d77bb0f
> ("mm: Always release pages to the buddy allocator in
> memblock_free_late()") and work on a proper fix for the next release.

Makes sense. I just sent a revert patch, and I'll work on a better fix.

Thanks,
-- Aaron
