All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Runtime allocation of memory regions
@ 2021-08-15 11:09 Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 1/6] mm: Drop unused unloading of modules on OOM Patrick Steinhardt
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 11528 bytes --]

Hi,

this is the third version of my patch set to implement runtime
allocation of additional memory regions.

Changes compared to v2:

    - A new preparatory patch was added to remove unused code which
      unloaded modules on OOM.

    - Patch 2/4 has been split up into two patches: one to drop the
      logic where we request a quarter of available memory and then
      put bounds to it, and one to split apart request of additional
      regions and initialization of the EFI MM system.

    - Flags are now `unsigned int` instead of `unsigned`.

    - `add_memory_regions ()` now gets all flags instead of only a
      single flag `consecutive`.

    - Flags are now defines and not an enum anymore.

    - The callback function is now called `grub_mm_add_region_func_t`
      instead of `grub_mm_region_func_t`. Flags and its variable have
      been renamed accordingly.

Patrick

Patrick Steinhardt (6):
  mm: Drop unused unloading of modules on OOM
  mm: Allow dynamically requesting additional memory regions
  efi: mm: Always request a fixed number of pages on init
  efi: mm: Extract function to add memory regions
  efi: mm: Pass up errors from `add_memory_regions ()`
  efi: mm: Implement runtime addition of pages

 grub-core/kern/dl.c     | 20 ----------
 grub-core/kern/efi/mm.c | 83 +++++++++++++++++++----------------------
 grub-core/kern/mm.c     | 12 +++---
 include/grub/dl.h       |  1 -
 include/grub/mm.h       | 16 ++++++++
 5 files changed, 61 insertions(+), 71 deletions(-)

Range-diff against v2:
1:  cf709a8a1 < -:  --------- mm: Allow dynamically requesting additional memory regions
-:  --------- > 1:  7f2738a64 mm: Drop unused unloading of modules on OOM
-:  --------- > 2:  3f0ec2a76 mm: Allow dynamically requesting additional memory regions
2:  2a9ccee9e ! 3:  5da49ddf1 efi: mm: Extract function to add memory regions
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    efi: mm: Extract function to add memory regions
    +    efi: mm: Always request a fixed number of pages on init
     
    -    In preparation of support for runtime-allocating additional memory
    -    region, this patch extracts the function to retrieve the EFI memory map
    -    and add a subset of it to GRUB's own memory regions.
    +    When initializing the EFI memory subsytem, we will by default request a
    +    quarter of the available memory, bounded by a minimum/maximum value.
    +    Given that we're about to extend the EFI memory system to dynamically
    +    request additional pages from the firmware as required, this scaling of
    +    requested memory based on available memory will not make a lot of sense
    +    anymore.
     
    -    Note that this commit also changes how many bytes we request by default.
    -    Previously, we would've tried to allocate a quarter of available system
    -    memory, bounded by a minimum/maximum value. As we're about to implement
    -    runtime allocation of memory, we now instead always request the minimum
    -    amount of bytes and let the memory allocator call out to our callback.
    +    Remove this logic as a preparatory patch such that we'll instead defer
    +    to the runtime memory allocator. Note that ideally, we'd want to change
    +    this after dynamic requesting of pages has been implemented for the EFI
    +    platform. But because we'll need to split up initialization of the
    +    memory subsystem and the request of pages from the firmware, we'd have
    +    to duplicate quite some logic at first only to remove it afterwards
    +    again. This seems quite pointless, so we instead have patches slightly
    +    out of order.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ grub-core/kern/efi/mm.c: filter_memory_map (grub_efi_memory_descriptor_t *memory
      /* Add memory regions.  */
      static void
      add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
    -@@ grub-core/kern/efi/mm.c: add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
    - 
    -       addr = grub_efi_allocate_pages_real (start, pages,
    - 					   GRUB_EFI_ALLOCATE_ADDRESS,
    --					   GRUB_EFI_LOADER_CODE);      
    -+					   GRUB_EFI_LOADER_CODE);
    -       if (! addr)
    - 	grub_fatal ("cannot allocate conventional memory %p with %u pages",
    - 		    (void *) ((grub_addr_t) start),
    -@@ grub-core/kern/efi/mm.c: print_memory_map (grub_efi_memory_descriptor_t *memory_map,
    - }
    - #endif
    - 
    --void
    --grub_efi_mm_init (void)
    -+static grub_err_t
    -+grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
    - {
    -   grub_efi_memory_descriptor_t *memory_map;
    -   grub_efi_memory_descriptor_t *memory_map_end;
     @@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
        grub_efi_memory_descriptor_t *filtered_memory_map_end;
        grub_efi_uintn_t map_size;
    @@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
        int mm_status;
      
        /* Prepare a memory region to store two memory maps.  */
    -   memory_map = grub_efi_allocate_any_pages (2 * BYTES_TO_PAGES (MEMORY_MAP_SIZE));
    -   if (! memory_map)
    --    grub_fatal ("cannot allocate memory");
    -+    return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot allocate memory");
    - 
    -   /* Obtain descriptors for available memory.  */
    -   map_size = MEMORY_MAP_SIZE;
    -@@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
    - 
    -       memory_map = grub_efi_allocate_any_pages (2 * BYTES_TO_PAGES (map_size));
    -       if (! memory_map)
    --	grub_fatal ("cannot allocate memory");
    -+	return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot allocate memory");
    - 
    -       mm_status = grub_efi_get_memory_map (&map_size, memory_map, 0,
    - 					   &desc_size, 0);
    -     }
    - 
    -   if (mm_status < 0)
    --    grub_fatal ("cannot get memory map");
    -+    return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot get memory map");
    - 
    -   memory_map_end = NEXT_MEMORY_DESCRIPTOR (memory_map, map_size);
    - 
     @@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
        filtered_memory_map_end = filter_memory_map (memory_map, filtered_memory_map,
      					       desc_size, memory_map_end);
    @@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
        /* Allocate memory regions for GRUB's memory management.  */
        add_memory_regions (filtered_memory_map, desc_size,
     -		      filtered_memory_map_end, required_pages);
    -+		      filtered_memory_map_end,
    -+		      BYTES_TO_PAGES (required_bytes));
    ++		      filtered_memory_map_end, BYTES_TO_PAGES (DEFAULT_HEAP_SIZE));
      
      #if 0
        /* For debug.  */
    -@@ grub-core/kern/efi/mm.c: grub_efi_mm_init (void)
    -   /* Release the memory maps.  */
    -   grub_efi_free_pages ((grub_addr_t) memory_map,
    - 		       2 * BYTES_TO_PAGES (MEMORY_MAP_SIZE));
    -+
    -+  return GRUB_ERR_NONE;
    -+}
    -+
    -+void
    -+grub_efi_mm_init (void)
    -+{
    -+  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE) != GRUB_ERR_NONE)
    -+    grub_fatal (grub_errmsg);
    - }
    - 
    - #if defined (__aarch64__) || defined (__arm__) || defined (__riscv)
-:  --------- > 4:  7b2f5fcb5 efi: mm: Extract function to add memory regions
3:  ff7e5be9f ! 5:  e90bbcf04 efi: mm: Pass up errors from `add_memory_regions ()`
    @@ Commit message
         from the firmware at runtime, where it doesn't make sense anymore to
         fail hard.
     
    -    Refactor the function to return an error to prepare for this.
    +    Refactor the function to return an error to prepare for this. Note that
    +    this does not change the behaviour when initializing the memory system
    +    because `grub_efi_mm_init ()` knows to call `grub_fatal ()` in case
    +    `grub_efi_mm_add_regions ()` returns an error.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ grub-core/kern/efi/mm.c: grub_efi_mm_add_regions (grub_efi_uint64_t required_byt
      
        /* Allocate memory regions for GRUB's memory management.  */
     -  add_memory_regions (filtered_memory_map, desc_size,
    --		      filtered_memory_map_end,
    --		      BYTES_TO_PAGES (required_bytes));
    +-		      filtered_memory_map_end, BYTES_TO_PAGES (required_bytes));
     +  err = add_memory_regions (filtered_memory_map, desc_size,
     +			    filtered_memory_map_end,
     +			    BYTES_TO_PAGES (required_bytes));
4:  a27f5b047 ! 6:  0a3ef3d2a efi: mm: Implement runtime addition of pages
    @@ Commit message
         efi: mm: Implement runtime addition of pages
     
         Adjust the interface of `grub_efi_mm_add_regions ()` to take a set of
    -    `GRUB_MM_REGION_*` flags, which most notably is currently only the
    +    `GRUB_MM_ADD_REGION_*` flags, which most notably is currently only the
         `CONSECUTVE` flag. This allows us to set the function up as callback for
         the memory subsystem and have it call out to us in case there's not
         enough pages available in the current heap.
    @@ grub-core/kern/efi/mm.c: static grub_err_t
      		    grub_efi_memory_descriptor_t *memory_map_end,
     -		    grub_efi_uint64_t required_pages)
     +		    grub_efi_uint64_t required_pages,
    -+		    char consecutive)
    ++		    unsigned int flags)
      {
        grub_efi_memory_descriptor_t *desc;
      
    @@ grub-core/kern/efi/mm.c: add_memory_regions (grub_efi_memory_descriptor_t *memor
            start = desc->physical_start;
            pages = desc->num_pages;
     +
    -+      if (pages < required_pages && consecutive)
    ++      if (pages < required_pages && (flags & GRUB_MM_ADD_REGION_CONSECUTIVE))
     +	continue;
     +
            if (pages > required_pages)
    @@ grub-core/kern/efi/mm.c: print_memory_map (grub_efi_memory_descriptor_t *memory_
      
      static grub_err_t
     -grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
    -+grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes, unsigned flags)
    ++grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes, unsigned int flags)
      {
        grub_efi_memory_descriptor_t *memory_map;
        grub_efi_memory_descriptor_t *memory_map_end;
    @@ grub-core/kern/efi/mm.c: grub_efi_mm_add_regions (grub_efi_uint64_t required_byt
      			    filtered_memory_map_end,
     -			    BYTES_TO_PAGES (required_bytes));
     +			    BYTES_TO_PAGES (required_bytes),
    -+			    flags & GRUB_MM_REGION_CONSECUTIVE);
    ++			    flags);
        if (err != GRUB_ERR_NONE)
          return err;
      
    @@ grub-core/kern/efi/mm.c: grub_efi_mm_add_regions (grub_efi_uint64_t required_byt
      grub_efi_mm_init (void)
      {
     -  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE) != GRUB_ERR_NONE)
    -+  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE, 0) != GRUB_ERR_NONE)
    -     grub_fatal (grub_errmsg);
    -+  grub_mm_region_fn = grub_efi_mm_add_regions;
    ++  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE, GRUB_MM_ADD_REGION_NONE) != GRUB_ERR_NONE)
    +     grub_fatal ("%s", grub_errmsg);
    ++  grub_mm_add_region_fn = grub_efi_mm_add_regions;
      }
      
      #if defined (__aarch64__) || defined (__arm__) || defined (__riscv)
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/6] mm: Drop unused unloading of modules on OOM
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions Patrick Steinhardt
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

In `grub_memalign ()`, there's a commented section which would allow for
unloading of unneeded modules in case where there is not enough free
memory available to satisfy a request. Given that this code is never
compiled in, let's remove it together with `grub_dl_unload_unneeded()`

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/dl.c | 20 --------------------
 grub-core/kern/mm.c |  8 --------
 include/grub/dl.h   |  1 -
 3 files changed, 29 deletions(-)

diff --git a/grub-core/kern/dl.c b/grub-core/kern/dl.c
index 48f8a7907..a62dbeebb 100644
--- a/grub-core/kern/dl.c
+++ b/grub-core/kern/dl.c
@@ -803,23 +803,3 @@ grub_dl_unload (grub_dl_t mod)
   grub_free (mod);
   return 1;
 }
-
-/* Unload unneeded modules.  */
-void
-grub_dl_unload_unneeded (void)
-{
-  /* Because grub_dl_remove modifies the list of modules, this
-     implementation is tricky.  */
-  grub_dl_t p = grub_dl_head;
-
-  while (p)
-    {
-      if (grub_dl_unload (p))
-	{
-	  p = grub_dl_head;
-	  continue;
-	}
-
-      p = p->next;
-    }
-}
diff --git a/grub-core/kern/mm.c b/grub-core/kern/mm.c
index c070afc62..e0e580270 100644
--- a/grub-core/kern/mm.c
+++ b/grub-core/kern/mm.c
@@ -360,14 +360,6 @@ grub_memalign (grub_size_t align, grub_size_t size)
       count++;
       goto again;
 
-#if 0
-    case 1:
-      /* Unload unneeded modules.  */
-      grub_dl_unload_unneeded ();
-      count++;
-      goto again;
-#endif
-
     default:
       break;
     }
diff --git a/include/grub/dl.h b/include/grub/dl.h
index b3753c9ca..536717776 100644
--- a/include/grub/dl.h
+++ b/include/grub/dl.h
@@ -203,7 +203,6 @@ grub_dl_t EXPORT_FUNC(grub_dl_load) (const char *name);
 grub_dl_t grub_dl_load_core (void *addr, grub_size_t size);
 grub_dl_t EXPORT_FUNC(grub_dl_load_core_noinit) (void *addr, grub_size_t size);
 int EXPORT_FUNC(grub_dl_unload) (grub_dl_t mod);
-extern void grub_dl_unload_unneeded (void);
 extern int EXPORT_FUNC(grub_dl_ref) (grub_dl_t mod);
 extern int EXPORT_FUNC(grub_dl_unref) (grub_dl_t mod);
 extern int EXPORT_FUNC(grub_dl_ref_count) (grub_dl_t mod);
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 1/6] mm: Drop unused unloading of modules on OOM Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-09-01 14:48   ` Daniel Axtens
  2021-08-15 11:09 ` [PATCH v3 3/6] efi: mm: Always request a fixed number of pages on init Patrick Steinhardt
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 3191 bytes --]

Currently, all platforms will set up their heap on initialization of the
platform code. While this works mostly fine, it poses some limitations
on memory management on us. Most notably, allocating big chunks of
memory in the gigabyte range would require us to pre-request this many
bytes from the firmware and add it to the heap from the beginning on
some platforms like EFI. As this isn't needed for most configurations,
it is inefficient and may even negatively impact some usecases when,
e.g., chainloading. Nonetheless, allocating big chunks of memory is
required sometimes, where one example is the upcoming support for the
Argon2 key derival function in LUKS2.

In order to avoid pre-allocating big chunks of memory, this commit
implements a runtime mechanism to add more pages to the system. When a
given allocation cannot be currently satisfied, we'll call a given
callback set up by the platform's own memory management subsystem,
asking it to add a memory area with at least `n` bytes. If this
succeeds, we retry searching for a valid memory region, which should now
succeed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/mm.c | 10 ++++++++++
 include/grub/mm.h   | 16 ++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/grub-core/kern/mm.c b/grub-core/kern/mm.c
index e0e580270..2df835392 100644
--- a/grub-core/kern/mm.c
+++ b/grub-core/kern/mm.c
@@ -81,6 +81,7 @@
 \f
 
 grub_mm_region_t grub_mm_base;
+grub_mm_add_region_func_t grub_mm_add_region_fn;
 
 /* Get a header from the pointer PTR, and set *P and *R to a pointer
    to the header and a pointer to its region, respectively. PTR must
@@ -360,6 +361,15 @@ grub_memalign (grub_size_t align, grub_size_t size)
       count++;
       goto again;
 
+    case 1:
+      /* Request additional pages.  */
+      count++;
+
+      if (grub_mm_add_region_fn && grub_mm_add_region_fn (size, GRUB_MM_ADD_REGION_CONSECUTIVE) == GRUB_ERR_NONE)
+	goto again;
+
+      /* fallthrough */
+
     default:
       break;
     }
diff --git a/include/grub/mm.h b/include/grub/mm.h
index 9c38dd3ca..afde57d2e 100644
--- a/include/grub/mm.h
+++ b/include/grub/mm.h
@@ -20,6 +20,7 @@
 #ifndef GRUB_MM_H
 #define GRUB_MM_H	1
 
+#include <grub/err.h>
 #include <grub/types.h>
 #include <grub/symbol.h>
 #include <config.h>
@@ -28,6 +29,21 @@
 # define NULL	((void *) 0)
 #endif
 
+#define GRUB_MM_ADD_REGION_NONE        0
+#define GRUB_MM_ADD_REGION_CONSECUTIVE (1 << 0)
+
+/*
+ * Function used to request memory regions of `grub_size_t` bytes. The second
+ * parameter is a bitfield of `GRUB_MM_ADD_REGION` flags.
+ */
+typedef grub_err_t (*grub_mm_add_region_func_t) (grub_size_t, unsigned int);
+
+/*
+ * Set this function pointer to enable adding memory-regions at runtime in case
+ * a memory allocation cannot be satisfied with existing regions.
+ */
+extern grub_mm_add_region_func_t EXPORT_VAR(grub_mm_add_region_fn);
+
 void grub_mm_init_region (void *addr, grub_size_t size);
 void *EXPORT_FUNC(grub_calloc) (grub_size_t nmemb, grub_size_t size);
 void *EXPORT_FUNC(grub_malloc) (grub_size_t size);
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/6] efi: mm: Always request a fixed number of pages on init
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 1/6] mm: Drop unused unloading of modules on OOM Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 4/6] efi: mm: Extract function to add memory regions Patrick Steinhardt
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 3732 bytes --]

When initializing the EFI memory subsytem, we will by default request a
quarter of the available memory, bounded by a minimum/maximum value.
Given that we're about to extend the EFI memory system to dynamically
request additional pages from the firmware as required, this scaling of
requested memory based on available memory will not make a lot of sense
anymore.

Remove this logic as a preparatory patch such that we'll instead defer
to the runtime memory allocator. Note that ideally, we'd want to change
this after dynamic requesting of pages has been implemented for the EFI
platform. But because we'll need to split up initialization of the
memory subsystem and the request of pages from the firmware, we'd have
to duplicate quite some logic at first only to remove it afterwards
again. This seems quite pointless, so we instead have patches slightly
out of order.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/efi/mm.c | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/grub-core/kern/efi/mm.c b/grub-core/kern/efi/mm.c
index 9838fb2f5..4d276bc87 100644
--- a/grub-core/kern/efi/mm.c
+++ b/grub-core/kern/efi/mm.c
@@ -38,9 +38,8 @@
    a multiplier of 4KB.  */
 #define MEMORY_MAP_SIZE	0x3000
 
-/* The minimum and maximum heap size for GRUB itself.  */
-#define MIN_HEAP_SIZE	0x100000
-#define MAX_HEAP_SIZE	(1600 * 0x100000)
+/* The default heap size for GRUB itself in bytes.  */
+#define DEFAULT_HEAP_SIZE	0x100000
 
 static void *finish_mmap_buf = 0;
 static grub_efi_uintn_t finish_mmap_size = 0;
@@ -478,23 +477,6 @@ filter_memory_map (grub_efi_memory_descriptor_t *memory_map,
   return filtered_desc;
 }
 
-/* Return the total number of pages.  */
-static grub_efi_uint64_t
-get_total_pages (grub_efi_memory_descriptor_t *memory_map,
-		 grub_efi_uintn_t desc_size,
-		 grub_efi_memory_descriptor_t *memory_map_end)
-{
-  grub_efi_memory_descriptor_t *desc;
-  grub_efi_uint64_t total = 0;
-
-  for (desc = memory_map;
-       desc < memory_map_end;
-       desc = NEXT_MEMORY_DESCRIPTOR (desc, desc_size))
-    total += desc->num_pages;
-
-  return total;
-}
-
 /* Add memory regions.  */
 static void
 add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
@@ -583,8 +565,6 @@ grub_efi_mm_init (void)
   grub_efi_memory_descriptor_t *filtered_memory_map_end;
   grub_efi_uintn_t map_size;
   grub_efi_uintn_t desc_size;
-  grub_efi_uint64_t total_pages;
-  grub_efi_uint64_t required_pages;
   int mm_status;
 
   /* Prepare a memory region to store two memory maps.  */
@@ -624,22 +604,13 @@ grub_efi_mm_init (void)
   filtered_memory_map_end = filter_memory_map (memory_map, filtered_memory_map,
 					       desc_size, memory_map_end);
 
-  /* By default, request a quarter of the available memory.  */
-  total_pages = get_total_pages (filtered_memory_map, desc_size,
-				 filtered_memory_map_end);
-  required_pages = (total_pages >> 2);
-  if (required_pages < BYTES_TO_PAGES (MIN_HEAP_SIZE))
-    required_pages = BYTES_TO_PAGES (MIN_HEAP_SIZE);
-  else if (required_pages > BYTES_TO_PAGES (MAX_HEAP_SIZE))
-    required_pages = BYTES_TO_PAGES (MAX_HEAP_SIZE);
-
   /* Sort the filtered descriptors, so that GRUB can allocate pages
      from smaller regions.  */
   sort_memory_map (filtered_memory_map, desc_size, filtered_memory_map_end);
 
   /* Allocate memory regions for GRUB's memory management.  */
   add_memory_regions (filtered_memory_map, desc_size,
-		      filtered_memory_map_end, required_pages);
+		      filtered_memory_map_end, BYTES_TO_PAGES (DEFAULT_HEAP_SIZE));
 
 #if 0
   /* For debug.  */
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/6] efi: mm: Extract function to add memory regions
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2021-08-15 11:09 ` [PATCH v3 3/6] efi: mm: Always request a fixed number of pages on init Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 5/6] efi: mm: Pass up errors from `add_memory_regions ()` Patrick Steinhardt
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 3034 bytes --]

In preparation of support for runtime-allocating additional memory
region, this patch extracts the function to retrieve the EFI memory map
and add a subset of it to GRUB's own memory regions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/efi/mm.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/grub-core/kern/efi/mm.c b/grub-core/kern/efi/mm.c
index 4d276bc87..cfc6a67fc 100644
--- a/grub-core/kern/efi/mm.c
+++ b/grub-core/kern/efi/mm.c
@@ -504,7 +504,7 @@ add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
 
       addr = grub_efi_allocate_pages_real (start, pages,
 					   GRUB_EFI_ALLOCATE_ADDRESS,
-					   GRUB_EFI_LOADER_CODE);      
+					   GRUB_EFI_LOADER_CODE);
       if (! addr)
 	grub_fatal ("cannot allocate conventional memory %p with %u pages",
 		    (void *) ((grub_addr_t) start),
@@ -556,8 +556,8 @@ print_memory_map (grub_efi_memory_descriptor_t *memory_map,
 }
 #endif
 
-void
-grub_efi_mm_init (void)
+static grub_err_t
+grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
 {
   grub_efi_memory_descriptor_t *memory_map;
   grub_efi_memory_descriptor_t *memory_map_end;
@@ -570,7 +570,7 @@ grub_efi_mm_init (void)
   /* Prepare a memory region to store two memory maps.  */
   memory_map = grub_efi_allocate_any_pages (2 * BYTES_TO_PAGES (MEMORY_MAP_SIZE));
   if (! memory_map)
-    grub_fatal ("cannot allocate memory");
+    return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot allocate memory");
 
   /* Obtain descriptors for available memory.  */
   map_size = MEMORY_MAP_SIZE;
@@ -588,14 +588,14 @@ grub_efi_mm_init (void)
 
       memory_map = grub_efi_allocate_any_pages (2 * BYTES_TO_PAGES (map_size));
       if (! memory_map)
-	grub_fatal ("cannot allocate memory");
+	return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot allocate memory");
 
       mm_status = grub_efi_get_memory_map (&map_size, memory_map, 0,
 					   &desc_size, 0);
     }
 
   if (mm_status < 0)
-    grub_fatal ("cannot get memory map");
+    return grub_error (GRUB_ERR_OUT_OF_MEMORY, "cannot get memory map");
 
   memory_map_end = NEXT_MEMORY_DESCRIPTOR (memory_map, map_size);
 
@@ -610,7 +610,7 @@ grub_efi_mm_init (void)
 
   /* Allocate memory regions for GRUB's memory management.  */
   add_memory_regions (filtered_memory_map, desc_size,
-		      filtered_memory_map_end, BYTES_TO_PAGES (DEFAULT_HEAP_SIZE));
+		      filtered_memory_map_end, BYTES_TO_PAGES (required_bytes));
 
 #if 0
   /* For debug.  */
@@ -628,6 +628,15 @@ grub_efi_mm_init (void)
   /* Release the memory maps.  */
   grub_efi_free_pages ((grub_addr_t) memory_map,
 		       2 * BYTES_TO_PAGES (MEMORY_MAP_SIZE));
+
+  return GRUB_ERR_NONE;
+}
+
+void
+grub_efi_mm_init (void)
+{
+  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE) != GRUB_ERR_NONE)
+    grub_fatal ("%s", grub_errmsg);
 }
 
 #if defined (__aarch64__) || defined (__arm__) || defined (__riscv)
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/6] efi: mm: Pass up errors from `add_memory_regions ()`
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2021-08-15 11:09 ` [PATCH v3 4/6] efi: mm: Extract function to add memory regions Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-08-15 11:09 ` [PATCH v3 6/6] efi: mm: Implement runtime addition of pages Patrick Steinhardt
  2021-08-26 15:12 ` [PATCH v3 0/6] Runtime allocation of memory regions Daniel Kiper
  6 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

The function `add_memory_regions ()` is currently only called on system
initialization to allocate a fixed amount of pages. As such, it didn't
need to return any errors: in case it failed, we cannot proceed anyway.
This will change with the upcoming support for requesting more memory
from the firmware at runtime, where it doesn't make sense anymore to
fail hard.

Refactor the function to return an error to prepare for this. Note that
this does not change the behaviour when initializing the memory system
because `grub_efi_mm_init ()` knows to call `grub_fatal ()` in case
`grub_efi_mm_add_regions ()` returns an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/efi/mm.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/grub-core/kern/efi/mm.c b/grub-core/kern/efi/mm.c
index cfc6a67fc..ced3ee5e7 100644
--- a/grub-core/kern/efi/mm.c
+++ b/grub-core/kern/efi/mm.c
@@ -478,7 +478,7 @@ filter_memory_map (grub_efi_memory_descriptor_t *memory_map,
 }
 
 /* Add memory regions.  */
-static void
+static grub_err_t
 add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
 		    grub_efi_uintn_t desc_size,
 		    grub_efi_memory_descriptor_t *memory_map_end,
@@ -506,9 +506,9 @@ add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
 					   GRUB_EFI_ALLOCATE_ADDRESS,
 					   GRUB_EFI_LOADER_CODE);
       if (! addr)
-	grub_fatal ("cannot allocate conventional memory %p with %u pages",
-		    (void *) ((grub_addr_t) start),
-		    (unsigned) pages);
+	return grub_error (GRUB_ERR_OUT_OF_MEMORY,
+			    "cannot allocate conventional memory %p with %u pages",
+			    (void *) ((grub_addr_t) start), (unsigned) pages);
 
       grub_mm_init_region (addr, PAGES_TO_BYTES (pages));
 
@@ -518,7 +518,9 @@ add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
     }
 
   if (required_pages > 0)
-    grub_fatal ("too little memory");
+    return grub_error (GRUB_ERR_OUT_OF_MEMORY, "too little memory");
+
+  return GRUB_ERR_NONE;
 }
 
 void
@@ -565,6 +567,7 @@ grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
   grub_efi_memory_descriptor_t *filtered_memory_map_end;
   grub_efi_uintn_t map_size;
   grub_efi_uintn_t desc_size;
+  grub_err_t err;
   int mm_status;
 
   /* Prepare a memory region to store two memory maps.  */
@@ -609,8 +612,11 @@ grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
   sort_memory_map (filtered_memory_map, desc_size, filtered_memory_map_end);
 
   /* Allocate memory regions for GRUB's memory management.  */
-  add_memory_regions (filtered_memory_map, desc_size,
-		      filtered_memory_map_end, BYTES_TO_PAGES (required_bytes));
+  err = add_memory_regions (filtered_memory_map, desc_size,
+			    filtered_memory_map_end,
+			    BYTES_TO_PAGES (required_bytes));
+  if (err != GRUB_ERR_NONE)
+    return err;
 
 #if 0
   /* For debug.  */
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 6/6] efi: mm: Implement runtime addition of pages
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2021-08-15 11:09 ` [PATCH v3 5/6] efi: mm: Pass up errors from `add_memory_regions ()` Patrick Steinhardt
@ 2021-08-15 11:09 ` Patrick Steinhardt
  2021-08-26 15:12 ` [PATCH v3 0/6] Runtime allocation of memory regions Daniel Kiper
  6 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-15 11:09 UTC (permalink / raw)
  To: grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 2565 bytes --]

Adjust the interface of `grub_efi_mm_add_regions ()` to take a set of
`GRUB_MM_ADD_REGION_*` flags, which most notably is currently only the
`CONSECUTVE` flag. This allows us to set the function up as callback for
the memory subsystem and have it call out to us in case there's not
enough pages available in the current heap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 grub-core/kern/efi/mm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/grub-core/kern/efi/mm.c b/grub-core/kern/efi/mm.c
index ced3ee5e7..f3d2e2b99 100644
--- a/grub-core/kern/efi/mm.c
+++ b/grub-core/kern/efi/mm.c
@@ -482,7 +482,8 @@ static grub_err_t
 add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
 		    grub_efi_uintn_t desc_size,
 		    grub_efi_memory_descriptor_t *memory_map_end,
-		    grub_efi_uint64_t required_pages)
+		    grub_efi_uint64_t required_pages,
+		    unsigned int flags)
 {
   grub_efi_memory_descriptor_t *desc;
 
@@ -496,6 +497,10 @@ add_memory_regions (grub_efi_memory_descriptor_t *memory_map,
 
       start = desc->physical_start;
       pages = desc->num_pages;
+
+      if (pages < required_pages && (flags & GRUB_MM_ADD_REGION_CONSECUTIVE))
+	continue;
+
       if (pages > required_pages)
 	{
 	  start += PAGES_TO_BYTES (pages - required_pages);
@@ -559,7 +564,7 @@ print_memory_map (grub_efi_memory_descriptor_t *memory_map,
 #endif
 
 static grub_err_t
-grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
+grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes, unsigned int flags)
 {
   grub_efi_memory_descriptor_t *memory_map;
   grub_efi_memory_descriptor_t *memory_map_end;
@@ -614,7 +619,8 @@ grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
   /* Allocate memory regions for GRUB's memory management.  */
   err = add_memory_regions (filtered_memory_map, desc_size,
 			    filtered_memory_map_end,
-			    BYTES_TO_PAGES (required_bytes));
+			    BYTES_TO_PAGES (required_bytes),
+			    flags);
   if (err != GRUB_ERR_NONE)
     return err;
 
@@ -641,8 +647,9 @@ grub_efi_mm_add_regions (grub_efi_uint64_t required_bytes)
 void
 grub_efi_mm_init (void)
 {
-  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE) != GRUB_ERR_NONE)
+  if (grub_efi_mm_add_regions (DEFAULT_HEAP_SIZE, GRUB_MM_ADD_REGION_NONE) != GRUB_ERR_NONE)
     grub_fatal ("%s", grub_errmsg);
+  grub_mm_add_region_fn = grub_efi_mm_add_regions;
 }
 
 #if defined (__aarch64__) || defined (__arm__) || defined (__riscv)
-- 
2.32.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/6] Runtime allocation of memory regions
  2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2021-08-15 11:09 ` [PATCH v3 6/6] efi: mm: Implement runtime addition of pages Patrick Steinhardt
@ 2021-08-26 15:12 ` Daniel Kiper
  2021-08-27  3:39   ` Daniel Axtens
  6 siblings, 1 reply; 19+ messages in thread
From: Daniel Kiper @ 2021-08-26 15:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: grub-devel, Leif Lindholm, Stefan Berger, dja

Hey,

Adding Daniel Axtens...

On Sun, Aug 15, 2021 at 01:09:06PM +0200, Patrick Steinhardt wrote:
> Hi,
>
> this is the third version of my patch set to implement runtime
> allocation of additional memory regions.
>
> Changes compared to v2:
>
>     - A new preparatory patch was added to remove unused code which
>       unloaded modules on OOM.
>
>     - Patch 2/4 has been split up into two patches: one to drop the
>       logic where we request a quarter of available memory and then
>       put bounds to it, and one to split apart request of additional
>       regions and initialization of the EFI MM system.
>
>     - Flags are now `unsigned int` instead of `unsigned`.
>
>     - `add_memory_regions ()` now gets all flags instead of only a
>       single flag `consecutive`.
>
>     - Flags are now defines and not an enum anymore.
>
>     - The callback function is now called `grub_mm_add_region_func_t`
>       instead of `grub_mm_region_func_t`. Flags and its variable have
>       been renamed accordingly.
>
> Patrick
>
> Patrick Steinhardt (6):
>   mm: Drop unused unloading of modules on OOM
>   mm: Allow dynamically requesting additional memory regions
>   efi: mm: Always request a fixed number of pages on init
>   efi: mm: Extract function to add memory regions
>   efi: mm: Pass up errors from `add_memory_regions ()`
>   efi: mm: Implement runtime addition of pages
>
>  grub-core/kern/dl.c     | 20 ----------
>  grub-core/kern/efi/mm.c | 83 +++++++++++++++++++----------------------
>  grub-core/kern/mm.c     | 12 +++---
>  include/grub/dl.h       |  1 -
>  include/grub/mm.h       | 16 ++++++++
>  5 files changed, 61 insertions(+), 71 deletions(-)

Patrick, I went quickly through this patch series and in general it
LGTM. There are some minor issues but we can fix them later. Thank
you for doing this work.

Stefan and/or Daniel Axtens, may I ask you to test these patches with
your use case? If it works for you please repost this patch series with
your changes added. Then I will merge it after final review.

Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/6] Runtime allocation of memory regions
  2021-08-26 15:12 ` [PATCH v3 0/6] Runtime allocation of memory regions Daniel Kiper
@ 2021-08-27  3:39   ` Daniel Axtens
  2021-08-30 17:49     ` Daniel Kiper
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Axtens @ 2021-08-27  3:39 UTC (permalink / raw)
  To: Daniel Kiper, Patrick Steinhardt; +Cc: grub-devel, Leif Lindholm, Stefan Berger

Daniel Kiper <dkiper@net-space.pl> writes:

> Hey,
>
> Adding Daniel Axtens...
>
> On Sun, Aug 15, 2021 at 01:09:06PM +0200, Patrick Steinhardt wrote:
>> Hi,
>>
>> this is the third version of my patch set to implement runtime
>> allocation of additional memory regions.
>>
>> Changes compared to v2:
>>
>>     - A new preparatory patch was added to remove unused code which
>>       unloaded modules on OOM.
>>
>>     - Patch 2/4 has been split up into two patches: one to drop the
>>       logic where we request a quarter of available memory and then
>>       put bounds to it, and one to split apart request of additional
>>       regions and initialization of the EFI MM system.
>>
>>     - Flags are now `unsigned int` instead of `unsigned`.
>>
>>     - `add_memory_regions ()` now gets all flags instead of only a
>>       single flag `consecutive`.
>>
>>     - Flags are now defines and not an enum anymore.
>>
>>     - The callback function is now called `grub_mm_add_region_func_t`
>>       instead of `grub_mm_region_func_t`. Flags and its variable have
>>       been renamed accordingly.
>>
>> Patrick
>>
>> Patrick Steinhardt (6):
>>   mm: Drop unused unloading of modules on OOM
>>   mm: Allow dynamically requesting additional memory regions
>>   efi: mm: Always request a fixed number of pages on init
>>   efi: mm: Extract function to add memory regions
>>   efi: mm: Pass up errors from `add_memory_regions ()`
>>   efi: mm: Implement runtime addition of pages
>>
>>  grub-core/kern/dl.c     | 20 ----------
>>  grub-core/kern/efi/mm.c | 83 +++++++++++++++++++----------------------
>>  grub-core/kern/mm.c     | 12 +++---
>>  include/grub/dl.h       |  1 -
>>  include/grub/mm.h       | 16 ++++++++
>>  5 files changed, 61 insertions(+), 71 deletions(-)
>
> Patrick, I went quickly through this patch series and in general it
> LGTM. There are some minor issues but we can fix them later. Thank
> you for doing this work.
>
> Stefan and/or Daniel Axtens, may I ask you to test these patches with
> your use case? If it works for you please repost this patch series with
> your changes added. Then I will merge it after final review.

Sure, I'll have a look.

My initial thoughts are:

 - with the CAS support patch. We would still need that, and would want
   to do that call early as possible because it will cause the partition
   to be rebooted.

 - We have to be careful where we ask for memory because the kernel
   assumes that there will be some free memory below a particular address.

  - I'd also want to verify what the performance impact would be - not
    just on powerpc-ieee1275 but also on efi - of going out to
    OpenFirmware/UEFI for each new zone...

I'll test this early next week and report back.

Kind regards,
Daniel

>
> Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/6] Runtime allocation of memory regions
  2021-08-27  3:39   ` Daniel Axtens
@ 2021-08-30 17:49     ` Daniel Kiper
  2021-08-30 18:05       ` Patrick Steinhardt
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Kiper @ 2021-08-30 17:49 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

On Fri, Aug 27, 2021 at 01:39:05PM +1000, Daniel Axtens wrote:
> Daniel Kiper <dkiper@net-space.pl> writes:
>
> > Hey,
> >
> > Adding Daniel Axtens...
> >
> > On Sun, Aug 15, 2021 at 01:09:06PM +0200, Patrick Steinhardt wrote:
> >> Hi,
> >>
> >> this is the third version of my patch set to implement runtime
> >> allocation of additional memory regions.
> >>
> >> Changes compared to v2:
> >>
> >>     - A new preparatory patch was added to remove unused code which
> >>       unloaded modules on OOM.
> >>
> >>     - Patch 2/4 has been split up into two patches: one to drop the
> >>       logic where we request a quarter of available memory and then
> >>       put bounds to it, and one to split apart request of additional
> >>       regions and initialization of the EFI MM system.
> >>
> >>     - Flags are now `unsigned int` instead of `unsigned`.
> >>
> >>     - `add_memory_regions ()` now gets all flags instead of only a
> >>       single flag `consecutive`.
> >>
> >>     - Flags are now defines and not an enum anymore.
> >>
> >>     - The callback function is now called `grub_mm_add_region_func_t`
> >>       instead of `grub_mm_region_func_t`. Flags and its variable have
> >>       been renamed accordingly.
> >>
> >> Patrick
> >>
> >> Patrick Steinhardt (6):
> >>   mm: Drop unused unloading of modules on OOM
> >>   mm: Allow dynamically requesting additional memory regions
> >>   efi: mm: Always request a fixed number of pages on init
> >>   efi: mm: Extract function to add memory regions
> >>   efi: mm: Pass up errors from `add_memory_regions ()`
> >>   efi: mm: Implement runtime addition of pages
> >>
> >>  grub-core/kern/dl.c     | 20 ----------
> >>  grub-core/kern/efi/mm.c | 83 +++++++++++++++++++----------------------
> >>  grub-core/kern/mm.c     | 12 +++---
> >>  include/grub/dl.h       |  1 -
> >>  include/grub/mm.h       | 16 ++++++++
> >>  5 files changed, 61 insertions(+), 71 deletions(-)
> >
> > Patrick, I went quickly through this patch series and in general it
> > LGTM. There are some minor issues but we can fix them later. Thank
> > you for doing this work.
> >
> > Stefan and/or Daniel Axtens, may I ask you to test these patches with
> > your use case? If it works for you please repost this patch series with
> > your changes added. Then I will merge it after final review.
>
> Sure, I'll have a look.
>
> My initial thoughts are:
>
>  - with the CAS support patch. We would still need that, and would want
>    to do that call early as possible because it will cause the partition
>    to be rebooted.
>
>  - We have to be careful where we ask for memory because the kernel
>    assumes that there will be some free memory below a particular address.
>
>   - I'd also want to verify what the performance impact would be - not
>     just on powerpc-ieee1275 but also on efi - of going out to
>     OpenFirmware/UEFI for each new zone...

Good point! I was thinking about performance at some point too. Sadly
I forgot to say something about that... Patrick, did you compare
performance before and after you patch series? If the impact is
significant maybe we should not change initial allocation strategy
and add a function to allocate more memory during runtime only...

> I'll test this early next week and report back.

Cool! Thanks a lot!

Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/6] Runtime allocation of memory regions
  2021-08-30 17:49     ` Daniel Kiper
@ 2021-08-30 18:05       ` Patrick Steinhardt
  2021-09-01  5:50         ` Daniel Axtens
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick Steinhardt @ 2021-08-30 18:05 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: Daniel Axtens, Leif Lindholm, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 3752 bytes --]

On Mon, Aug 30, 2021 at 07:49:07PM +0200, Daniel Kiper wrote:
> On Fri, Aug 27, 2021 at 01:39:05PM +1000, Daniel Axtens wrote:
> > Daniel Kiper <dkiper@net-space.pl> writes:
> >
> > > Hey,
> > >
> > > Adding Daniel Axtens...
> > >
> > > On Sun, Aug 15, 2021 at 01:09:06PM +0200, Patrick Steinhardt wrote:
> > >> Hi,
> > >>
> > >> this is the third version of my patch set to implement runtime
> > >> allocation of additional memory regions.
> > >>
> > >> Changes compared to v2:
> > >>
> > >>     - A new preparatory patch was added to remove unused code which
> > >>       unloaded modules on OOM.
> > >>
> > >>     - Patch 2/4 has been split up into two patches: one to drop the
> > >>       logic where we request a quarter of available memory and then
> > >>       put bounds to it, and one to split apart request of additional
> > >>       regions and initialization of the EFI MM system.
> > >>
> > >>     - Flags are now `unsigned int` instead of `unsigned`.
> > >>
> > >>     - `add_memory_regions ()` now gets all flags instead of only a
> > >>       single flag `consecutive`.
> > >>
> > >>     - Flags are now defines and not an enum anymore.
> > >>
> > >>     - The callback function is now called `grub_mm_add_region_func_t`
> > >>       instead of `grub_mm_region_func_t`. Flags and its variable have
> > >>       been renamed accordingly.
> > >>
> > >> Patrick
> > >>
> > >> Patrick Steinhardt (6):
> > >>   mm: Drop unused unloading of modules on OOM
> > >>   mm: Allow dynamically requesting additional memory regions
> > >>   efi: mm: Always request a fixed number of pages on init
> > >>   efi: mm: Extract function to add memory regions
> > >>   efi: mm: Pass up errors from `add_memory_regions ()`
> > >>   efi: mm: Implement runtime addition of pages
> > >>
> > >>  grub-core/kern/dl.c     | 20 ----------
> > >>  grub-core/kern/efi/mm.c | 83 +++++++++++++++++++----------------------
> > >>  grub-core/kern/mm.c     | 12 +++---
> > >>  include/grub/dl.h       |  1 -
> > >>  include/grub/mm.h       | 16 ++++++++
> > >>  5 files changed, 61 insertions(+), 71 deletions(-)
> > >
> > > Patrick, I went quickly through this patch series and in general it
> > > LGTM. There are some minor issues but we can fix them later. Thank
> > > you for doing this work.
> > >
> > > Stefan and/or Daniel Axtens, may I ask you to test these patches with
> > > your use case? If it works for you please repost this patch series with
> > > your changes added. Then I will merge it after final review.
> >
> > Sure, I'll have a look.
> >
> > My initial thoughts are:
> >
> >  - with the CAS support patch. We would still need that, and would want
> >    to do that call early as possible because it will cause the partition
> >    to be rebooted.
> >
> >  - We have to be careful where we ask for memory because the kernel
> >    assumes that there will be some free memory below a particular address.
> >
> >   - I'd also want to verify what the performance impact would be - not
> >     just on powerpc-ieee1275 but also on efi - of going out to
> >     OpenFirmware/UEFI for each new zone...
> 
> Good point! I was thinking about performance at some point too. Sadly
> I forgot to say something about that... Patrick, did you compare
> performance before and after you patch series? If the impact is
> significant maybe we should not change initial allocation strategy
> and add a function to allocate more memory during runtime only...
> 
> > I'll test this early next week and report back.
> 
> Cool! Thanks a lot!
> 
> Daniel

The question is how to measure performance. Are there any benchmarks
that result in somewhat reproducible results?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 0/6] Runtime allocation of memory regions
  2021-08-30 18:05       ` Patrick Steinhardt
@ 2021-09-01  5:50         ` Daniel Axtens
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Axtens @ 2021-09-01  5:50 UTC (permalink / raw)
  To: Patrick Steinhardt, The development of GNU GRUB
  Cc: Leif Lindholm, Stefan Berger

> The question is how to measure performance. Are there any benchmarks
> that result in somewhat reproducible results?

I don't know.

I was going to see what the speed of `cat foo.gz` is for various sizes
of file because that seems like it would do a reasonable amount of
stressing of the allocator...

> Patrick


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-08-15 11:09 ` [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions Patrick Steinhardt
@ 2021-09-01 14:48   ` Daniel Axtens
  2021-09-02 12:40     ` Daniel Kiper
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Axtens @ 2021-09-01 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt, grub-devel; +Cc: Leif Lindholm, Daniel Kiper, Stefan Berger

Patrick Steinhardt <ps@pks.im> writes:

> Currently, all platforms will set up their heap on initialization of the
> platform code. While this works mostly fine, it poses some limitations
> on memory management on us. Most notably, allocating big chunks of
> memory in the gigabyte range would require us to pre-request this many
> bytes from the firmware and add it to the heap from the beginning on
> some platforms like EFI. As this isn't needed for most configurations,
> it is inefficient and may even negatively impact some usecases when,
> e.g., chainloading. Nonetheless, allocating big chunks of memory is
> required sometimes, where one example is the upcoming support for the
> Argon2 key derival function in LUKS2.
>
> In order to avoid pre-allocating big chunks of memory, this commit
> implements a runtime mechanism to add more pages to the system. When a
> given allocation cannot be currently satisfied, we'll call a given
> callback set up by the platform's own memory management subsystem,
> asking it to add a memory area with at least `n` bytes. If this
> succeeds, we retry searching for a valid memory region, which should now
> succeed.
>

I implemented this for ieee1275-powerpc. I set the initial memory claim
to 1MB to match EFI and to exercise the code.

Thoughts as I progressed:

 - You probably need to think about how to satisfy requests with
   particular alignments: currently there is no way to specify that with
   the current interface, and I saw powerpc-ieee1275 return bunch of
   allocations at e.g 0x2a561e which is not particularly well aligned!

 - You haven't included in the calculations the extra space required for
   mm housekeeping. For example, I'm seeing an allocation for 32kB be
   requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with
   no alignment requirements, and pass that pointer to
   grub_mm_init_region. grub_mm_init_region throws away bytes at the
   start to get to GRUB_MM_ALIGN, then uses some bytes for the
   grub_mm_header_t, then any actual allocation uses bytes for the
   malloc metadata. So the actual underlying allocation cannot be
   satisfied.
   
   I think you get away with this on EFI because you use BYTES_TO_PAGES
   and get page-aligned memory, but I think you should probably round up
   to the next power of 2 for smaller allocations or to the next page or
   so for larger allocations.

 - After fixing that in the ieee1275 code, all_functional_test
   hangs trying to run the cmdline_cat test. I think this is from a slow
   algorithm somewhere - the grub allocator isn't exactly optimised for
   a proliferation of regions.

 - I noticed that nearly all the allocations were under 1MB. This seems
   inefficient for a trip out to firmware. So I made the ieee1275 code
   allocate at least max(4MB, (size of allocation rounded up nearest
   1MB) + 4kB). This makes the tests run with only the usual failures,
   at least on pseries with debug on... still chasing some bugs beyond
   that.

 - The speed impact depends on the allocation size. I'll post something
   on that tomorrow, hopefully, but larger minimum allocations work
   noticably better.

 - We only have 4GB max to play with because (at least) powerpc-ieee1275
   is technically defined to be 32 bit. So I'm a bit nervous about
   further large allocations unless we have a way to release them back
   to _firmware_, not just to grub.

I would think a better overall approach would be to allocate the 1/4 of
ram when grub starts, and create a whole new interface for large slabs
of memory that are directly allocated from, and directly returned to,
the firmware.

Kind regards,
Daniel

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  grub-core/kern/mm.c | 10 ++++++++++
>  include/grub/mm.h   | 16 ++++++++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/grub-core/kern/mm.c b/grub-core/kern/mm.c
> index e0e580270..2df835392 100644
> --- a/grub-core/kern/mm.c
> +++ b/grub-core/kern/mm.c
> @@ -81,6 +81,7 @@
>  \f
>  
>  grub_mm_region_t grub_mm_base;
> +grub_mm_add_region_func_t grub_mm_add_region_fn;
>  
>  /* Get a header from the pointer PTR, and set *P and *R to a pointer
>     to the header and a pointer to its region, respectively. PTR must
> @@ -360,6 +361,15 @@ grub_memalign (grub_size_t align, grub_size_t size)
>        count++;
>        goto again;
>  
> +    case 1:
> +      /* Request additional pages.  */
> +      count++;
> +
> +      if (grub_mm_add_region_fn && grub_mm_add_region_fn (size, GRUB_MM_ADD_REGION_CONSECUTIVE) == GRUB_ERR_NONE)
> +	goto again;
> +
> +      /* fallthrough */
> +
>      default:
>        break;
>      }
> diff --git a/include/grub/mm.h b/include/grub/mm.h
> index 9c38dd3ca..afde57d2e 100644
> --- a/include/grub/mm.h
> +++ b/include/grub/mm.h
> @@ -20,6 +20,7 @@
>  #ifndef GRUB_MM_H
>  #define GRUB_MM_H	1
>  
> +#include <grub/err.h>
>  #include <grub/types.h>
>  #include <grub/symbol.h>
>  #include <config.h>
> @@ -28,6 +29,21 @@
>  # define NULL	((void *) 0)
>  #endif
>  
> +#define GRUB_MM_ADD_REGION_NONE        0
> +#define GRUB_MM_ADD_REGION_CONSECUTIVE (1 << 0)

CONSECUTIVE needs a definition here. Initially I thought it meant that
the request had to expand an existing region, but what I think it means
having read your EFI implementation is that the request needs to be
satisfied by a single region rather than by a number of smaller regions.
(NONE probably also needs a better name, because it seemed a bit odd to
type ADD_REGION_NONE. Even ADD_REGION_FLAGS_NONE would be better.)


> +
> +/*
> + * Function used to request memory regions of `grub_size_t` bytes. The second
> + * parameter is a bitfield of `GRUB_MM_ADD_REGION` flags.
> + */
> +typedef grub_err_t (*grub_mm_add_region_func_t) (grub_size_t, unsigned int);
> +
> +/*
> + * Set this function pointer to enable adding memory-regions at runtime in case
> + * a memory allocation cannot be satisfied with existing regions.
> + */
> +extern grub_mm_add_region_func_t EXPORT_VAR(grub_mm_add_region_fn);
> +
>  void grub_mm_init_region (void *addr, grub_size_t size);
>  void *EXPORT_FUNC(grub_calloc) (grub_size_t nmemb, grub_size_t size);
>  void *EXPORT_FUNC(grub_malloc) (grub_size_t size);
> -- 
> 2.32.0
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-01 14:48   ` Daniel Axtens
@ 2021-09-02 12:40     ` Daniel Kiper
  2021-09-03 12:23       ` Daniel Axtens
  2021-09-06  8:23       ` Daniel Axtens
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Kiper @ 2021-09-02 12:40 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

On Thu, Sep 02, 2021 at 12:48:24AM +1000, Daniel Axtens wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Currently, all platforms will set up their heap on initialization of the
> > platform code. While this works mostly fine, it poses some limitations
> > on memory management on us. Most notably, allocating big chunks of
> > memory in the gigabyte range would require us to pre-request this many
> > bytes from the firmware and add it to the heap from the beginning on
> > some platforms like EFI. As this isn't needed for most configurations,
> > it is inefficient and may even negatively impact some usecases when,
> > e.g., chainloading. Nonetheless, allocating big chunks of memory is
> > required sometimes, where one example is the upcoming support for the
> > Argon2 key derival function in LUKS2.
> >
> > In order to avoid pre-allocating big chunks of memory, this commit
> > implements a runtime mechanism to add more pages to the system. When a
> > given allocation cannot be currently satisfied, we'll call a given
> > callback set up by the platform's own memory management subsystem,
> > asking it to add a memory area with at least `n` bytes. If this
> > succeeds, we retry searching for a valid memory region, which should now
> > succeed.
> >
>
> I implemented this for ieee1275-powerpc. I set the initial memory claim
> to 1MB to match EFI and to exercise the code.
>
> Thoughts as I progressed:
>
>  - You probably need to think about how to satisfy requests with
>    particular alignments: currently there is no way to specify that with
>    the current interface, and I saw powerpc-ieee1275 return bunch of
>    allocations at e.g 0x2a561e which is not particularly well aligned!

I think at "firmware memory allocation" level we could always allocate
page aligned regions. Of course this may not satisfy allocations
aligned at larger values than a page. Though I think we can solve this
by allocating larger regions from firmware. Please look below for more
details...

>  - You haven't included in the calculations the extra space required for
>    mm housekeeping. For example, I'm seeing an allocation for 32kB be
>    requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with
>    no alignment requirements, and pass that pointer to
>    grub_mm_init_region. grub_mm_init_region throws away bytes at the
>    start to get to GRUB_MM_ALIGN, then uses some bytes for the
>    grub_mm_header_t, then any actual allocation uses bytes for the
>    malloc metadata. So the actual underlying allocation cannot be
>    satisfied.
>
>    I think you get away with this on EFI because you use BYTES_TO_PAGES
>    and get page-aligned memory, but I think you should probably round up
>    to the next power of 2 for smaller allocations or to the next page or
>    so for larger allocations.

I think we could allocate at least e.g. 128 MiB from firmware if there is
not enough memory available in the GRUB mm. This way we would avoid frequent
calls to firmware and could satisfy requests for larger alignments.

>  - After fixing that in the ieee1275 code, all_functional_test
>    hangs trying to run the cmdline_cat test. I think this is from a slow
>    algorithm somewhere - the grub allocator isn't exactly optimised for
>    a proliferation of regions.

Could you try the solution proposed above? Maybe it will solve problem of
frequent additions of memory to the GRUB mm.

>  - I noticed that nearly all the allocations were under 1MB. This seems
>    inefficient for a trip out to firmware. So I made the ieee1275 code
>    allocate at least max(4MB, (size of allocation rounded up nearest
>    1MB) + 4kB). This makes the tests run with only the usual failures,
>    at least on pseries with debug on... still chasing some bugs beyond
>    that.

Yeah, this is similar to what I proposed above. Though I would want to see
larger numbers tested as I said earlier.

>  - The speed impact depends on the allocation size. I'll post something
>    on that tomorrow, hopefully, but larger minimum allocations work
>    noticably better.
>
>  - We only have 4GB max to play with because (at least) powerpc-ieee1275
>    is technically defined to be 32 bit. So I'm a bit nervous about
>    further large allocations unless we have a way to release them back
>    to _firmware_, not just to grub.

Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
to release memory regions if they are not used anymore by it.

> I would think a better overall approach would be to allocate the 1/4 of
> ram when grub starts, and create a whole new interface for large slabs

I am not very happy with allocating 1/4 of memory at start of the day.
I think allocating larger chunks of memory from firmware should be
enough to make things working as expected.

> of memory that are directly allocated from, and directly returned to,
> the firmware.

Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-02 12:40     ` Daniel Kiper
@ 2021-09-03 12:23       ` Daniel Axtens
  2021-09-12 11:13         ` Patrick Steinhardt
  2021-09-06  8:23       ` Daniel Axtens
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Axtens @ 2021-09-03 12:23 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

Daniel Kiper <dkiper@net-space.pl> writes:

> On Thu, Sep 02, 2021 at 12:48:24AM +1000, Daniel Axtens wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> > Currently, all platforms will set up their heap on initialization of the
>> > platform code. While this works mostly fine, it poses some limitations
>> > on memory management on us. Most notably, allocating big chunks of
>> > memory in the gigabyte range would require us to pre-request this many
>> > bytes from the firmware and add it to the heap from the beginning on
>> > some platforms like EFI. As this isn't needed for most configurations,
>> > it is inefficient and may even negatively impact some usecases when,
>> > e.g., chainloading. Nonetheless, allocating big chunks of memory is
>> > required sometimes, where one example is the upcoming support for the
>> > Argon2 key derival function in LUKS2.
>> >
>> > In order to avoid pre-allocating big chunks of memory, this commit
>> > implements a runtime mechanism to add more pages to the system. When a
>> > given allocation cannot be currently satisfied, we'll call a given
>> > callback set up by the platform's own memory management subsystem,
>> > asking it to add a memory area with at least `n` bytes. If this
>> > succeeds, we retry searching for a valid memory region, which should now
>> > succeed.
>> >
>>
>> I implemented this for ieee1275-powerpc. I set the initial memory claim
>> to 1MB to match EFI and to exercise the code.
>>
>> Thoughts as I progressed:
>>
>>  - You probably need to think about how to satisfy requests with
>>    particular alignments: currently there is no way to specify that with
>>    the current interface, and I saw powerpc-ieee1275 return bunch of
>>    allocations at e.g 0x2a561e which is not particularly well aligned!
>
> I think at "firmware memory allocation" level we could always allocate
> page aligned regions. Of course this may not satisfy allocations
> aligned at larger values than a page. Though I think we can solve this
> by allocating larger regions from firmware. Please look below for more
> details...
>
>>  - You haven't included in the calculations the extra space required for
>>    mm housekeeping. For example, I'm seeing an allocation for 32kB be
>>    requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with
>>    no alignment requirements, and pass that pointer to
>>    grub_mm_init_region. grub_mm_init_region throws away bytes at the
>>    start to get to GRUB_MM_ALIGN, then uses some bytes for the
>>    grub_mm_header_t, then any actual allocation uses bytes for the
>>    malloc metadata. So the actual underlying allocation cannot be
>>    satisfied.
>>
>>    I think you get away with this on EFI because you use BYTES_TO_PAGES
>>    and get page-aligned memory, but I think you should probably round up
>>    to the next power of 2 for smaller allocations or to the next page or
>>    so for larger allocations.
>
> I think we could allocate at least e.g. 128 MiB from firmware if there is
> not enough memory available in the GRUB mm. This way we would avoid frequent
> calls to firmware and could satisfy requests for larger alignments.
>
>>  - After fixing that in the ieee1275 code, all_functional_test
>>    hangs trying to run the cmdline_cat test. I think this is from a slow
>>    algorithm somewhere - the grub allocator isn't exactly optimised for
>>    a proliferation of regions.
>
> Could you try the solution proposed above? Maybe it will solve problem of
> frequent additions of memory to the GRUB mm.
>
>>  - I noticed that nearly all the allocations were under 1MB. This seems
>>    inefficient for a trip out to firmware. So I made the ieee1275 code
>>    allocate at least max(4MB, (size of allocation rounded up nearest
>>    1MB) + 4kB). This makes the tests run with only the usual failures,
>>    at least on pseries with debug on... still chasing some bugs beyond
>>    that.
>
> Yeah, this is similar to what I proposed above. Though I would want to see
> larger numbers tested as I said earlier.
>
>>  - The speed impact depends on the allocation size. I'll post something
>>    on that tomorrow, hopefully, but larger minimum allocations work
>>    noticably better.
>>
>>  - We only have 4GB max to play with because (at least) powerpc-ieee1275
>>    is technically defined to be 32 bit. So I'm a bit nervous about
>>    further large allocations unless we have a way to release them back
>>    to _firmware_, not just to grub.
>
> Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
> to release memory regions if they are not used anymore by it.

I didn't explain this well. What I'm trying to say is this:

Say you require 1GB of memory temporarily.

You do this:

  {
    void *mem = grub_large_malloc(1024 * 1024 * 1024);
    operate_upon(mem);
    grub_large_free(mem);
  }

large_malloc and large_free go directly to firmware. We bypass the
existing grub mm infrastructure completely. This way, people who need
this sort of very large allocation do it in a way that can return the
memory to firmware.

The reason I like this approach more than integrating the memory into
the grub mm infrastructure is that it would be very possible to get
yourself into an unbootable situation by requesting so much memory from
firmware that there's no memory left to load the kernel and initrd into.
In the case of powerpc-ieee1275 there's another failure mode where can
load the kernel but then the kernel can't claim memory from FW for it's
purposes (this is before it quiesces openfirmware and takes control of
memory management itself) and so fails to boot.

I'll try your suggestions around 128MB allocations next week - I think
they will mostly work but I'm just a bit worried about the implications of
letting grub take arbitrarily large chunks of memory.

Kind regards,
Daniel

>
>> I would think a better overall approach would be to allocate the 1/4 of
>> ram when grub starts, and create a whole new interface for large slabs
>
> I am not very happy with allocating 1/4 of memory at start of the day.
> I think allocating larger chunks of memory from firmware should be
> enough to make things working as expected.
>
>> of memory that are directly allocated from, and directly returned to,
>> the firmware.
>
> Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-02 12:40     ` Daniel Kiper
  2021-09-03 12:23       ` Daniel Axtens
@ 2021-09-06  8:23       ` Daniel Axtens
  2021-09-10  0:03         ` Daniel Kiper
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Axtens @ 2021-09-06  8:23 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

>>    I think you get away with this on EFI because you use BYTES_TO_PAGES
>>    and get page-aligned memory, but I think you should probably round up
>>    to the next power of 2 for smaller allocations or to the next page or
>>    so for larger allocations.
>
> I think we could allocate at least e.g. 128 MiB from firmware if there is
> not enough memory available in the GRUB mm. This way we would avoid frequent
> calls to firmware and could satisfy requests for larger alignments.

128 MiB and 64 MiB cause some tests to fail (cannot allocate memory in echo1
or compression tests because there isn't enough free memory to get a
64MiB chunk). 32 MiB chunk size seems to work and seems fast enough.

[It's a bit hard to tell because at some point in time time the powerpc
machine stopped shutting down when we got to the end of the tests. oh
well.]

>>  - After fixing that in the ieee1275 code, all_functional_test
>>    hangs trying to run the cmdline_cat test. I think this is from a slow
>>    algorithm somewhere - the grub allocator isn't exactly optimised for
>>    a proliferation of regions.
>
> Could you try the solution proposed above? Maybe it will solve problem of
> frequent additions of memory to the GRUB mm.
>
>>  - I noticed that nearly all the allocations were under 1MB. This seems
>>    inefficient for a trip out to firmware. So I made the ieee1275 code
>>    allocate at least max(4MB, (size of allocation rounded up nearest
>>    1MB) + 4kB). This makes the tests run with only the usual failures,
>>    at least on pseries with debug on... still chasing some bugs beyond
>>    that.
>
> Yeah, this is similar to what I proposed above. Though I would want to see
> larger numbers tested as I said earlier.
>
>>  - The speed impact depends on the allocation size. I'll post something
>>    on that tomorrow, hopefully, but larger minimum allocations work
>>    noticably better.
>>
>>  - We only have 4GB max to play with because (at least) powerpc-ieee1275
>>    is technically defined to be 32 bit. So I'm a bit nervous about
>>    further large allocations unless we have a way to release them back
>>    to _firmware_, not just to grub.
>
> Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
> to release memory regions if they are not used anymore by it.
>
>> I would think a better overall approach would be to allocate the 1/4 of
>> ram when grub starts, and create a whole new interface for large slabs
>
> I am not very happy with allocating 1/4 of memory at start of the day.
> I think allocating larger chunks of memory from firmware should be
> enough to make things working as expected.

Maybe the per-platform memory chunk allocator just needs to be smart
enough to make sure that there is enough memory left over to load a
"normal sized" kernel and initrd... although the sizes of distro images
keep going up so that's going to be a bit fraught.

>> of memory that are directly allocated from, and directly returned to,
>> the firmware.

I still would really prefer to bypass grub mm completely as described in
my other mail. If we are able to give memory back to fw, we can claim
1GB chunks (on SLOF, PFW is going to be another issue) without having to
worry about where we put them and if we have enough memory to load a
kernel or initrd. It makes it much harder accidentally render your
system unbootable.

Kind regards,
Daniel



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-06  8:23       ` Daniel Axtens
@ 2021-09-10  0:03         ` Daniel Kiper
  2021-09-27 14:18           ` Daniel Axtens
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Kiper @ 2021-09-10  0:03 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

On Mon, Sep 06, 2021 at 06:23:19PM +1000, Daniel Axtens wrote:
> >>    I think you get away with this on EFI because you use BYTES_TO_PAGES
> >>    and get page-aligned memory, but I think you should probably round up
> >>    to the next power of 2 for smaller allocations or to the next page or
> >>    so for larger allocations.
> >
> > I think we could allocate at least e.g. 128 MiB from firmware if there is
> > not enough memory available in the GRUB mm. This way we would avoid frequent
> > calls to firmware and could satisfy requests for larger alignments.
>
> 128 MiB and 64 MiB cause some tests to fail (cannot allocate memory in echo1
> or compression tests because there isn't enough free memory to get a
> 64MiB chunk). 32 MiB chunk size seems to work and seems fast enough.

Nice...

> [It's a bit hard to tell because at some point in time time the powerpc
> machine stopped shutting down when we got to the end of the tests. oh
> well.]

Ohhh... :-(

> >>  - After fixing that in the ieee1275 code, all_functional_test
> >>    hangs trying to run the cmdline_cat test. I think this is from a slow
> >>    algorithm somewhere - the grub allocator isn't exactly optimised for
> >>    a proliferation of regions.
> >
> > Could you try the solution proposed above? Maybe it will solve problem of
> > frequent additions of memory to the GRUB mm.
> >
> >>  - I noticed that nearly all the allocations were under 1MB. This seems
> >>    inefficient for a trip out to firmware. So I made the ieee1275 code
> >>    allocate at least max(4MB, (size of allocation rounded up nearest
> >>    1MB) + 4kB). This makes the tests run with only the usual failures,
> >>    at least on pseries with debug on... still chasing some bugs beyond
> >>    that.
> >
> > Yeah, this is similar to what I proposed above. Though I would want to see
> > larger numbers tested as I said earlier.
> >
> >>  - The speed impact depends on the allocation size. I'll post something
> >>    on that tomorrow, hopefully, but larger minimum allocations work
> >>    noticably better.
> >>
> >>  - We only have 4GB max to play with because (at least) powerpc-ieee1275
> >>    is technically defined to be 32 bit. So I'm a bit nervous about
> >>    further large allocations unless we have a way to release them back
> >>    to _firmware_, not just to grub.
> >
> > Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
> > to release memory regions if they are not used anymore by it.
> >
> >> I would think a better overall approach would be to allocate the 1/4 of
> >> ram when grub starts, and create a whole new interface for large slabs
> >
> > I am not very happy with allocating 1/4 of memory at start of the day.
> > I think allocating larger chunks of memory from firmware should be
> > enough to make things working as expected.
>
> Maybe the per-platform memory chunk allocator just needs to be smart
> enough to make sure that there is enough memory left over to load a
> "normal sized" kernel and initrd... although the sizes of distro images
> keep going up so that's going to be a bit fraught.
>
> >> of memory that are directly allocated from, and directly returned to,
> >> the firmware.
>
> I still would really prefer to bypass grub mm completely as described in
> my other mail. If we are able to give memory back to fw, we can claim
> 1GB chunks (on SLOF, PFW is going to be another issue) without having to
> worry about where we put them and if we have enough memory to load a
> kernel or initrd. It makes it much harder accidentally render your
> system unbootable.

I like your approach because you can return memory to the firmware (the
situation in the UEFI is simpler because all memory allocated by the
GRUB and in general by UEFI applications should be marked, usually, as
"loader data" in UEFI memory map; then after EBS "loader data" should be
treated as free memory; so, we do not need to return allocated memory
regions to the UEFI explicitly; I am not sure it is possible in IEEE 1275
firmware). However, I think your solution have at least two problems:
  - if you skip GRUB mm you cannot use relocator which means you cannot
    load and use big initrds,
  - you add third family of memory management functions to the GRUB.

So, I would still try to use mm. However, maybe we should improve
algorithm which allocates memory at the GRUB init. In general we should
have after init enough memory in the GRUB to store <handwaving> mm
structures which describe whole or most of system memory plus an amount
of RAM needed to run loaded core.img </handwaving>. Then I think we
should stop seeing "firmware allocation" failures in later phases. The
tricky part here is an algorithm which properly estimates initial memory
requirements.

Daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-03 12:23       ` Daniel Axtens
@ 2021-09-12 11:13         ` Patrick Steinhardt
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick Steinhardt @ 2021-09-12 11:13 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: Daniel Kiper, grub-devel, Leif Lindholm, Stefan Berger

[-- Attachment #1: Type: text/plain, Size: 7040 bytes --]

On Fri, Sep 03, 2021 at 10:23:15PM +1000, Daniel Axtens wrote:
> Daniel Kiper <dkiper@net-space.pl> writes:
> 
> > On Thu, Sep 02, 2021 at 12:48:24AM +1000, Daniel Axtens wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >>
> >> > Currently, all platforms will set up their heap on initialization of the
> >> > platform code. While this works mostly fine, it poses some limitations
> >> > on memory management on us. Most notably, allocating big chunks of
> >> > memory in the gigabyte range would require us to pre-request this many
> >> > bytes from the firmware and add it to the heap from the beginning on
> >> > some platforms like EFI. As this isn't needed for most configurations,
> >> > it is inefficient and may even negatively impact some usecases when,
> >> > e.g., chainloading. Nonetheless, allocating big chunks of memory is
> >> > required sometimes, where one example is the upcoming support for the
> >> > Argon2 key derival function in LUKS2.
> >> >
> >> > In order to avoid pre-allocating big chunks of memory, this commit
> >> > implements a runtime mechanism to add more pages to the system. When a
> >> > given allocation cannot be currently satisfied, we'll call a given
> >> > callback set up by the platform's own memory management subsystem,
> >> > asking it to add a memory area with at least `n` bytes. If this
> >> > succeeds, we retry searching for a valid memory region, which should now
> >> > succeed.
> >> >
> >>
> >> I implemented this for ieee1275-powerpc. I set the initial memory claim
> >> to 1MB to match EFI and to exercise the code.
> >>
> >> Thoughts as I progressed:
> >>
> >>  - You probably need to think about how to satisfy requests with
> >>    particular alignments: currently there is no way to specify that with
> >>    the current interface, and I saw powerpc-ieee1275 return bunch of
> >>    allocations at e.g 0x2a561e which is not particularly well aligned!
> >
> > I think at "firmware memory allocation" level we could always allocate
> > page aligned regions. Of course this may not satisfy allocations
> > aligned at larger values than a page. Though I think we can solve this
> > by allocating larger regions from firmware. Please look below for more
> > details...
> >
> >>  - You haven't included in the calculations the extra space required for
> >>    mm housekeeping. For example, I'm seeing an allocation for 32kB be
> >>    requested via grub_mm_add_region_fn. I dutifully allocate 32kB, with
> >>    no alignment requirements, and pass that pointer to
> >>    grub_mm_init_region. grub_mm_init_region throws away bytes at the
> >>    start to get to GRUB_MM_ALIGN, then uses some bytes for the
> >>    grub_mm_header_t, then any actual allocation uses bytes for the
> >>    malloc metadata. So the actual underlying allocation cannot be
> >>    satisfied.
> >>
> >>    I think you get away with this on EFI because you use BYTES_TO_PAGES
> >>    and get page-aligned memory, but I think you should probably round up
> >>    to the next power of 2 for smaller allocations or to the next page or
> >>    so for larger allocations.
> >
> > I think we could allocate at least e.g. 128 MiB from firmware if there is
> > not enough memory available in the GRUB mm. This way we would avoid frequent
> > calls to firmware and could satisfy requests for larger alignments.
> >
> >>  - After fixing that in the ieee1275 code, all_functional_test
> >>    hangs trying to run the cmdline_cat test. I think this is from a slow
> >>    algorithm somewhere - the grub allocator isn't exactly optimised for
> >>    a proliferation of regions.
> >
> > Could you try the solution proposed above? Maybe it will solve problem of
> > frequent additions of memory to the GRUB mm.
> >
> >>  - I noticed that nearly all the allocations were under 1MB. This seems
> >>    inefficient for a trip out to firmware. So I made the ieee1275 code
> >>    allocate at least max(4MB, (size of allocation rounded up nearest
> >>    1MB) + 4kB). This makes the tests run with only the usual failures,
> >>    at least on pseries with debug on... still chasing some bugs beyond
> >>    that.
> >
> > Yeah, this is similar to what I proposed above. Though I would want to see
> > larger numbers tested as I said earlier.
> >
> >>  - The speed impact depends on the allocation size. I'll post something
> >>    on that tomorrow, hopefully, but larger minimum allocations work
> >>    noticably better.
> >>
> >>  - We only have 4GB max to play with because (at least) powerpc-ieee1275
> >>    is technically defined to be 32 bit. So I'm a bit nervous about
> >>    further large allocations unless we have a way to release them back
> >>    to _firmware_, not just to grub.
> >
> > Ugh... This can be difficult. I am not sure the GRUB mm is smart enough
> > to release memory regions if they are not used anymore by it.
> 
> I didn't explain this well. What I'm trying to say is this:
> 
> Say you require 1GB of memory temporarily.
> 
> You do this:
> 
>   {
>     void *mem = grub_large_malloc(1024 * 1024 * 1024);
>     operate_upon(mem);
>     grub_large_free(mem);
>   }
> 
> large_malloc and large_free go directly to firmware. We bypass the
> existing grub mm infrastructure completely. This way, people who need
> this sort of very large allocation do it in a way that can return the
> memory to firmware.
> 
> The reason I like this approach more than integrating the memory into
> the grub mm infrastructure is that it would be very possible to get
> yourself into an unbootable situation by requesting so much memory from
> firmware that there's no memory left to load the kernel and initrd into.
> In the case of powerpc-ieee1275 there's another failure mode where can
> load the kernel but then the kernel can't claim memory from FW for it's
> purposes (this is before it quiesces openfirmware and takes control of
> memory management itself) and so fails to boot.
> 
> I'll try your suggestions around 128MB allocations next week - I think
> they will mostly work but I'm just a bit worried about the implications of
> letting grub take arbitrarily large chunks of memory.
> 
> Kind regards,
> Daniel

To be frank, I don't really mind whether we integrate it into mm
directly or make those large allocations explicit: both work equally
well for my usecase of allocating memory for memory-hard KDFs. So I'd
rather defer the decision to you folks to judge which approach will suit
GRUB better in the long term.

If we decide to continue with my approach, then I'll try to implement
the feedback provided by Daniel, which is to:

    - Use chunk sizes such that we do not have many tiny allocations,
      but instead amortize costs (e.g. 32MB chunks).

    - Have a look at supporting memory alignments.

    - Account for the mm overhead.

But I'll postpone sending another version until we have reached an
agreement of where we want to go.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions
  2021-09-10  0:03         ` Daniel Kiper
@ 2021-09-27 14:18           ` Daniel Axtens
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Axtens @ 2021-09-27 14:18 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: Patrick Steinhardt, grub-devel, Leif Lindholm, Stefan Berger

Hi Daniel,

>> I still would really prefer to bypass grub mm completely as described in
>> my other mail. If we are able to give memory back to fw, we can claim
>> 1GB chunks (on SLOF, PFW is going to be another issue) without having to
>> worry about where we put them and if we have enough memory to load a
>> kernel or initrd. It makes it much harder accidentally render your
>> system unbootable.
>
> I like your approach because you can return memory to the firmware (the
> situation in the UEFI is simpler because all memory allocated by the
> GRUB and in general by UEFI applications should be marked, usually, as
> "loader data" in UEFI memory map; then after EBS "loader data" should be
> treated as free memory; so, we do not need to return allocated memory
> regions to the UEFI explicitly; I am not sure it is possible in IEEE 1275
> firmware).

Once we leave openfirmware then we can do whatever we want with the
memory and we don't need to return it. However, this doesn't happen
until well after we get into linux*, and early parts of linux* use
openfirmware allocation methods, so unfortunately we do have to care
about the firmware's view of memory after we leave grub :(

As an aside, grub's ieee1275 linux loader iterates over the firmware's
view of available memory. It's an interesting thought experiment to
consider if we could:

 - also/alternatively look in Grub's view of the available memory for
   places to load the kernel
 
 - return a bunch of grub memory to firmware right before we jump to
   linux.

Maybe a project for another time.

> However, I think your solution have at least two problems:
>   - if you skip GRUB mm you cannot use relocator which means you cannot
>     load and use big initrds,

OK, I did not understand the role of the relocator - we don't use it to
boot linux under powerpc-ieee1275. I've done some reading up on it, I
found https://www.kernel.org/doc/html/latest/x86/boot.html very helpful.
If I understand correctly the relocator can be important if there are
things that have to be loaded within particular address ranges.

My hope was that people who used the mm-skipping were extremely diligent
to return the memory - e.g. you used it only within argon2 and then
immediately returned it. However, I understand memory management is a
hard problem and so it's reasonably not to want to add more ways to
shoot yourself in the foot.

>   - you add third family of memory management functions to the GRUB.

This is a fair criticism and I find it much more convincing now that I
know what the relocator is.

> So, I would still try to use mm. However, maybe we should improve
> algorithm which allocates memory at the GRUB init. In general we should
> have after init enough memory in the GRUB to store <handwaving> mm
> structures which describe whole or most of system memory plus an amount
> of RAM needed to run loaded core.img </handwaving>. Then I think we
> should stop seeing "firmware allocation" failures in later phases. The
> tricky part here is an algorithm which properly estimates initial memory
> requirements.

Ok so I still don't love sticking with mm, but I think I can live with
it. If we do go down this road:

 - The thing I really want for powerpc-ieee1275 is to have the platform
   code cap the amount of memory that you can acquire. I did a fair
   amount of research and maths on the edge cases of powerpc + PowerVM +
   current distro kernels and I think if grub allocates more than 1/4 of
   the RMA/OpenFirmware-visible memory we are going to really start to
   struggle to boot. I am happy to write the code to handle this on
   powerpc.

 - I'd like the grub mm to become a bit smarter: let's try to keep our
   regions from becoming fragmented and actively return entire regions
   to firmware if they become empty. Because powerpc-ieee1275 can be a
   weirdly memory-constrained platform in the early boot stages, I don't
   want to have to debug a situation where grub has chewed up a bunch of
   memory in mostly empty allocations and then can't allocate enough
   memory to boot.

   (To illustrate this with an example, say you are trying to verify a
   30MB kernel, it fails, you verify a 40MB kernel, it fails, you verify
   a 50MB kernel, it fails. Each of those verifications will require a
   completely new allocation from firmware, and if you can't give them
   back you've now locked up 120MB of memory!)

   I'm not saying we need a full-blown buddy allocator, although I
   certainly wouldn't object to a buddy allocator!

   I would like to say that I have time to do this but it would end up
   pretty far down my list. :(

 - For powerpc I would be happy to stick with the current initial memory
   claim - 32MB - or maybe consolidate with x86 and move to 64MB if that
   works for older/smaller systems - and then claim extra memory as
   required.

I don't know how much memory you want for EFI. If you want to have
memory to describe the memory map + extra for core.img, maybe you do
want to stick with a ratio, just a smaller ratio - maybe 1/8th?

Kind regards,
Daniel

* it is complex. prom_init runs as an openfirmare client, does a bunch
  of stuff, quiesces openfirmware (stops hardware DMAs etc) and then
  jumps to __start/Linux proper. Then we start the MMU, etc etc. At some
  point we take over exceptions too. (Take all of this with a grain of
  salt too, I'm not an expert in linux arch/powerpc boot process.)


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-09-27 14:19 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-15 11:09 [PATCH v3 0/6] Runtime allocation of memory regions Patrick Steinhardt
2021-08-15 11:09 ` [PATCH v3 1/6] mm: Drop unused unloading of modules on OOM Patrick Steinhardt
2021-08-15 11:09 ` [PATCH v3 2/6] mm: Allow dynamically requesting additional memory regions Patrick Steinhardt
2021-09-01 14:48   ` Daniel Axtens
2021-09-02 12:40     ` Daniel Kiper
2021-09-03 12:23       ` Daniel Axtens
2021-09-12 11:13         ` Patrick Steinhardt
2021-09-06  8:23       ` Daniel Axtens
2021-09-10  0:03         ` Daniel Kiper
2021-09-27 14:18           ` Daniel Axtens
2021-08-15 11:09 ` [PATCH v3 3/6] efi: mm: Always request a fixed number of pages on init Patrick Steinhardt
2021-08-15 11:09 ` [PATCH v3 4/6] efi: mm: Extract function to add memory regions Patrick Steinhardt
2021-08-15 11:09 ` [PATCH v3 5/6] efi: mm: Pass up errors from `add_memory_regions ()` Patrick Steinhardt
2021-08-15 11:09 ` [PATCH v3 6/6] efi: mm: Implement runtime addition of pages Patrick Steinhardt
2021-08-26 15:12 ` [PATCH v3 0/6] Runtime allocation of memory regions Daniel Kiper
2021-08-27  3:39   ` Daniel Axtens
2021-08-30 17:49     ` Daniel Kiper
2021-08-30 18:05       ` Patrick Steinhardt
2021-09-01  5:50         ` Daniel Axtens

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.