All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] efi: Fix handling of multiple efi_fake_mem= entries
@ 2019-12-31 22:04 ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo
  Cc: Taku Izumi, Thomas Gleixner, Dave Young, Ingo Molnar,
	Michael Weiser, Ard Biesheuvel, linux-kernel, linux-efi, kexec,
	x86

Changes since v1 [1]:
- Add support for garbage collecting idle efi_memmap_alloc() allocations

- As Dave noticed the same problem of calling efi_memmap_split_count()
  multiple times with the old map also applies to efi_memmap_insert().
  Update efi_fake_memmap() to invoke efi_memmap_install() after every
  efi_memmap_insert() call.

[1]: https://lore.kernel.org/r/157773590338.4153451.5898675419563883883.stgit@dwillia2-desk3.amr.corp.intel.com/

---

Copied from patch4:

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. The empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map.

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers the
above crash signature.

---

Dan Williams (4):
      efi: Add a flags parameter to efi_memory_map
      efi: Add tracking for dynamically allocated memmaps
      efi: Fix efi_memmap_alloc() leaks
      efi: Fix handling of multiple efi_fake_mem= entries


 arch/x86/platform/efi/efi.c     |    2 +
 arch/x86/platform/efi/quirks.c  |   11 ++++---
 drivers/firmware/efi/fake_mem.c |   37 +++++++++++++------------
 drivers/firmware/efi/memmap.c   |   58 ++++++++++++++++++++++++++++++---------
 include/linux/efi.h             |   13 +++++++--
 5 files changed, 81 insertions(+), 40 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 0/4] efi: Fix handling of multiple efi_fake_mem= entries
@ 2019-12-31 22:04 ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo
  Cc: Michael Weiser, linux-efi, Ard Biesheuvel, x86, kexec,
	linux-kernel, Taku Izumi, Thomas Gleixner, Dave Young,
	Ingo Molnar

Changes since v1 [1]:
- Add support for garbage collecting idle efi_memmap_alloc() allocations

- As Dave noticed the same problem of calling efi_memmap_split_count()
  multiple times with the old map also applies to efi_memmap_insert().
  Update efi_fake_memmap() to invoke efi_memmap_install() after every
  efi_memmap_insert() call.

[1]: https://lore.kernel.org/r/157773590338.4153451.5898675419563883883.stgit@dwillia2-desk3.amr.corp.intel.com/

---

Copied from patch4:

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. The empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map.

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers the
above crash signature.

---

Dan Williams (4):
      efi: Add a flags parameter to efi_memory_map
      efi: Add tracking for dynamically allocated memmaps
      efi: Fix efi_memmap_alloc() leaks
      efi: Fix handling of multiple efi_fake_mem= entries


 arch/x86/platform/efi/efi.c     |    2 +
 arch/x86/platform/efi/quirks.c  |   11 ++++---
 drivers/firmware/efi/fake_mem.c |   37 +++++++++++++------------
 drivers/firmware/efi/memmap.c   |   58 ++++++++++++++++++++++++++++++---------
 include/linux/efi.h             |   13 +++++++--
 5 files changed, 81 insertions(+), 40 deletions(-)

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] efi: Add a flags parameter to efi_memory_map
  2019-12-31 22:04 ` Dan Williams
@ 2019-12-31 22:04   ` Dan Williams
  -1 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-kernel, linux-efi, kexec, x86

In preparation for garbage collecting dynamically allocated efi memory
maps, where the allocation method of memblock vs slab needs to be
recalled, convert the existing 'late' flag into a 'flags' bitmask.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   24 +++++++++++++-----------
 include/linux/efi.h           |    3 ++-
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..813674ef9000 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -52,7 +52,7 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @late: Use early or late mapping function?
+ * @flags: Use early or late mapping function?
  *
  * This function takes care of figuring out which function to use to
  * map the EFI memory map in efi.memmap based on how far into the boot
@@ -66,9 +66,9 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
  * Returns zero on success, a negative error code on failure.
  */
 static int __init
-__efi_memmap_init(struct efi_memory_map_data *data, bool late)
+__efi_memmap_init(struct efi_memory_map_data *data, unsigned long flags)
 {
-	struct efi_memory_map map;
+	struct efi_memory_map map = { 0 };
 	phys_addr_t phys_map;
 
 	if (efi_enabled(EFI_PARAVIRT))
@@ -76,7 +76,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	phys_map = data->phys_map;
 
-	if (late)
+	if (flags & EFI_MEMMAP_LATE)
 		map.map = memremap(phys_map, data->size, MEMREMAP_WB);
 	else
 		map.map = early_memremap(phys_map, data->size);
@@ -92,7 +92,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	map.desc_version = data->desc_version;
 	map.desc_size = data->desc_size;
-	map.late = late;
+	map.flags |= flags;
 
 	set_bit(EFI_MEMMAP, &efi.flags);
 
@@ -111,9 +111,9 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 int __init efi_memmap_init_early(struct efi_memory_map_data *data)
 {
 	/* Cannot go backwards */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
-	return __efi_memmap_init(data, false);
+	return __efi_memmap_init(data, 0);
 }
 
 void __init efi_memmap_unmap(void)
@@ -121,7 +121,7 @@ void __init efi_memmap_unmap(void)
 	if (!efi_enabled(EFI_MEMMAP))
 		return;
 
-	if (!efi.memmap.late) {
+	if (!(efi.memmap.flags & EFI_MEMMAP_LATE)) {
 		unsigned long size;
 
 		size = efi.memmap.desc_size * efi.memmap.nr_map;
@@ -168,7 +168,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	WARN_ON(efi.memmap.map);
 
 	/* Were we already called? */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
 	/*
 	 * It makes no sense to allow callers to register different
@@ -178,7 +178,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
 
-	return __efi_memmap_init(&data, true);
+	return __efi_memmap_init(&data, EFI_MEMMAP_LATE);
 }
 
 /**
@@ -195,6 +195,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 {
 	struct efi_memory_map_data data;
+	unsigned long flags;
 
 	efi_memmap_unmap();
 
@@ -202,8 +203,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
+	flags = efi.memmap.flags & EFI_MEMMAP_LATE;
 
-	return __efi_memmap_init(&data, efi.memmap.late);
+	return __efi_memmap_init(&data, flags);
 }
 
 /**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index aa54586db7a5..b8e930f5ff77 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -795,7 +795,8 @@ struct efi_memory_map {
 	int nr_map;
 	unsigned long desc_version;
 	unsigned long desc_size;
-	bool late;
+#define EFI_MEMMAP_LATE (1UL << 0)
+	unsigned long flags;
 };
 
 struct efi_mem_range {


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 1/4] efi: Add a flags parameter to efi_memory_map
@ 2019-12-31 22:04   ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: linux-efi, Ard Biesheuvel, x86, kexec, linux-kernel, Taku Izumi

In preparation for garbage collecting dynamically allocated efi memory
maps, where the allocation method of memblock vs slab needs to be
recalled, convert the existing 'late' flag into a 'flags' bitmask.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   24 +++++++++++++-----------
 include/linux/efi.h           |    3 ++-
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..813674ef9000 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -52,7 +52,7 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @late: Use early or late mapping function?
+ * @flags: Use early or late mapping function?
  *
  * This function takes care of figuring out which function to use to
  * map the EFI memory map in efi.memmap based on how far into the boot
@@ -66,9 +66,9 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
  * Returns zero on success, a negative error code on failure.
  */
 static int __init
-__efi_memmap_init(struct efi_memory_map_data *data, bool late)
+__efi_memmap_init(struct efi_memory_map_data *data, unsigned long flags)
 {
-	struct efi_memory_map map;
+	struct efi_memory_map map = { 0 };
 	phys_addr_t phys_map;
 
 	if (efi_enabled(EFI_PARAVIRT))
@@ -76,7 +76,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	phys_map = data->phys_map;
 
-	if (late)
+	if (flags & EFI_MEMMAP_LATE)
 		map.map = memremap(phys_map, data->size, MEMREMAP_WB);
 	else
 		map.map = early_memremap(phys_map, data->size);
@@ -92,7 +92,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	map.desc_version = data->desc_version;
 	map.desc_size = data->desc_size;
-	map.late = late;
+	map.flags |= flags;
 
 	set_bit(EFI_MEMMAP, &efi.flags);
 
@@ -111,9 +111,9 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 int __init efi_memmap_init_early(struct efi_memory_map_data *data)
 {
 	/* Cannot go backwards */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
-	return __efi_memmap_init(data, false);
+	return __efi_memmap_init(data, 0);
 }
 
 void __init efi_memmap_unmap(void)
@@ -121,7 +121,7 @@ void __init efi_memmap_unmap(void)
 	if (!efi_enabled(EFI_MEMMAP))
 		return;
 
-	if (!efi.memmap.late) {
+	if (!(efi.memmap.flags & EFI_MEMMAP_LATE)) {
 		unsigned long size;
 
 		size = efi.memmap.desc_size * efi.memmap.nr_map;
@@ -168,7 +168,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	WARN_ON(efi.memmap.map);
 
 	/* Were we already called? */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
 	/*
 	 * It makes no sense to allow callers to register different
@@ -178,7 +178,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
 
-	return __efi_memmap_init(&data, true);
+	return __efi_memmap_init(&data, EFI_MEMMAP_LATE);
 }
 
 /**
@@ -195,6 +195,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 {
 	struct efi_memory_map_data data;
+	unsigned long flags;
 
 	efi_memmap_unmap();
 
@@ -202,8 +203,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
+	flags = efi.memmap.flags & EFI_MEMMAP_LATE;
 
-	return __efi_memmap_init(&data, efi.memmap.late);
+	return __efi_memmap_init(&data, flags);
 }
 
 /**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index aa54586db7a5..b8e930f5ff77 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -795,7 +795,8 @@ struct efi_memory_map {
 	int nr_map;
 	unsigned long desc_version;
 	unsigned long desc_size;
-	bool late;
+#define EFI_MEMMAP_LATE (1UL << 0)
+	unsigned long flags;
 };
 
 struct efi_mem_range {


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/4] efi: Add tracking for dynamically allocated memmaps
  2019-12-31 22:04 ` Dan Williams
@ 2019-12-31 22:04   ` Dan Williams
  -1 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-kernel, linux-efi, kexec, x86

In preparation for fixing efi_memmap_alloc() leaks, add support for
recording whether the memmap was dynamically allocated from slab,
memblock, or is the original physical memmap provided by the platform.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/platform/efi/efi.c     |    2 +-
 arch/x86/platform/efi/quirks.c  |   11 ++++++-----
 drivers/firmware/efi/fake_mem.c |    5 +++--
 drivers/firmware/efi/memmap.c   |   16 ++++++++++------
 include/linux/efi.h             |    8 ++++++--
 5 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 38d44f36d5ed..7086afbb84fd 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -333,7 +333,7 @@ static void __init efi_clean_memmap(void)
 		u64 size = efi.memmap.nr_map - n_removal;
 
 		pr_warn("Removing %d invalid memory map entries.\n", n_removal);
-		efi_memmap_install(efi.memmap.phys_map, size);
+		efi_memmap_install(efi.memmap.phys_map, size, 0);
 	}
 }
 
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f8f0220b6a66..4a71c790f9c3 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -244,6 +244,7 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
 	phys_addr_t new_phys, new_size;
+	unsigned long flags = 0;
 	struct efi_mem_range mr;
 	efi_memory_desc_t md;
 	int num_entries;
@@ -272,8 +273,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	num_entries += efi.memmap.nr_map;
 
 	new_size = efi.memmap.desc_size * num_entries;
-
-	new_phys = efi_memmap_alloc(num_entries);
+	new_phys = efi_memmap_alloc(num_entries, &flags);
 	if (!new_phys) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
@@ -288,7 +288,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	efi_memmap_insert(&efi.memmap, new, &mr);
 	early_memunmap(new, new_size);
 
-	efi_memmap_install(new_phys, num_entries);
+	efi_memmap_install(new_phys, num_entries, flags);
 	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
 	e820__update_table(e820_table);
 }
@@ -408,6 +408,7 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 void __init efi_free_boot_services(void)
 {
 	phys_addr_t new_phys, new_size;
+	unsigned long flags = 0;
 	efi_memory_desc_t *md;
 	int num_entries = 0;
 	void *new, *new_md;
@@ -463,7 +464,7 @@ void __init efi_free_boot_services(void)
 		return;
 
 	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = efi_memmap_alloc(num_entries);
+	new_phys = efi_memmap_alloc(num_entries, &flags);
 	if (!new_phys) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
@@ -493,7 +494,7 @@ void __init efi_free_boot_services(void)
 
 	memunmap(new);
 
-	if (efi_memmap_install(new_phys, num_entries)) {
+	if (efi_memmap_install(new_phys, num_entries, flags)) {
 		pr_err("Could not install new EFI memmap\n");
 		return;
 	}
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index bb9fc70d0cfa..7e53e5520548 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -39,6 +39,7 @@ void __init efi_fake_memmap(void)
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
 	phys_addr_t new_memmap_phy;
+	unsigned long flags = 0;
 	void *new_memmap;
 	int i;
 
@@ -55,7 +56,7 @@ void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = efi_memmap_alloc(new_nr_map);
+	new_memmap_phy = efi_memmap_alloc(new_nr_map, &flags);
 	if (!new_memmap_phy)
 		return;
 
@@ -73,7 +74,7 @@ void __init efi_fake_memmap(void)
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
-	efi_memmap_install(new_memmap_phy, new_nr_map);
+	efi_memmap_install(new_memmap_phy, new_nr_map, flags);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 813674ef9000..2b81ee6858a9 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -32,6 +32,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
+ * @flags: Late map, memblock alloc, slab alloc flags
  *
  * Depending on whether mm_init() has already been invoked or not,
  * either memblock or "normal" page allocation is used.
@@ -39,20 +40,23 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
  * Returns the physical address of the allocated memory map on
  * success, zero on failure.
  */
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries, unsigned long *flags)
 {
 	unsigned long size = num_entries * efi.memmap.desc_size;
 
-	if (slab_is_available())
+	if (slab_is_available()) {
+		*flags |= EFI_MEMMAP_SLAB;
 		return __efi_memmap_alloc_late(size);
+	}
 
+	*flags |= EFI_MEMMAP_MEMBLOCK;
 	return __efi_memmap_alloc_early(size);
 }
 
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @flags: Use early or late mapping function?
+ * @flags: Use early or late mapping function, and allocator
  *
  * This function takes care of figuring out which function to use to
  * map the EFI memory map in efi.memmap based on how far into the boot
@@ -192,10 +196,10 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
  *
  * Returns zero on success, a negative error code on failure.
  */
-int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
+int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
+		unsigned long flags)
 {
 	struct efi_memory_map_data data;
-	unsigned long flags;
 
 	efi_memmap_unmap();
 
@@ -203,7 +207,7 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
-	flags = efi.memmap.flags & EFI_MEMMAP_LATE;
+	flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
 
 	return __efi_memmap_init(&data, flags);
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index b8e930f5ff77..fa2668a992ae 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -796,6 +796,8 @@ struct efi_memory_map {
 	unsigned long desc_version;
 	unsigned long desc_size;
 #define EFI_MEMMAP_LATE (1UL << 0)
+#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
+#define EFI_MEMMAP_SLAB (1UL << 2)
 	unsigned long flags;
 };
 
@@ -1057,11 +1059,13 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries,
+		unsigned long *flags);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
-extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
+extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
+		unsigned long flags);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 					 struct range *range);
 extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/4] efi: Add tracking for dynamically allocated memmaps
@ 2019-12-31 22:04   ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: linux-efi, Ard Biesheuvel, x86, kexec, linux-kernel, Taku Izumi

In preparation for fixing efi_memmap_alloc() leaks, add support for
recording whether the memmap was dynamically allocated from slab,
memblock, or is the original physical memmap provided by the platform.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/platform/efi/efi.c     |    2 +-
 arch/x86/platform/efi/quirks.c  |   11 ++++++-----
 drivers/firmware/efi/fake_mem.c |    5 +++--
 drivers/firmware/efi/memmap.c   |   16 ++++++++++------
 include/linux/efi.h             |    8 ++++++--
 5 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 38d44f36d5ed..7086afbb84fd 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -333,7 +333,7 @@ static void __init efi_clean_memmap(void)
 		u64 size = efi.memmap.nr_map - n_removal;
 
 		pr_warn("Removing %d invalid memory map entries.\n", n_removal);
-		efi_memmap_install(efi.memmap.phys_map, size);
+		efi_memmap_install(efi.memmap.phys_map, size, 0);
 	}
 }
 
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f8f0220b6a66..4a71c790f9c3 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -244,6 +244,7 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
 	phys_addr_t new_phys, new_size;
+	unsigned long flags = 0;
 	struct efi_mem_range mr;
 	efi_memory_desc_t md;
 	int num_entries;
@@ -272,8 +273,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	num_entries += efi.memmap.nr_map;
 
 	new_size = efi.memmap.desc_size * num_entries;
-
-	new_phys = efi_memmap_alloc(num_entries);
+	new_phys = efi_memmap_alloc(num_entries, &flags);
 	if (!new_phys) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
@@ -288,7 +288,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	efi_memmap_insert(&efi.memmap, new, &mr);
 	early_memunmap(new, new_size);
 
-	efi_memmap_install(new_phys, num_entries);
+	efi_memmap_install(new_phys, num_entries, flags);
 	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
 	e820__update_table(e820_table);
 }
@@ -408,6 +408,7 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 void __init efi_free_boot_services(void)
 {
 	phys_addr_t new_phys, new_size;
+	unsigned long flags = 0;
 	efi_memory_desc_t *md;
 	int num_entries = 0;
 	void *new, *new_md;
@@ -463,7 +464,7 @@ void __init efi_free_boot_services(void)
 		return;
 
 	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = efi_memmap_alloc(num_entries);
+	new_phys = efi_memmap_alloc(num_entries, &flags);
 	if (!new_phys) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
@@ -493,7 +494,7 @@ void __init efi_free_boot_services(void)
 
 	memunmap(new);
 
-	if (efi_memmap_install(new_phys, num_entries)) {
+	if (efi_memmap_install(new_phys, num_entries, flags)) {
 		pr_err("Could not install new EFI memmap\n");
 		return;
 	}
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index bb9fc70d0cfa..7e53e5520548 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -39,6 +39,7 @@ void __init efi_fake_memmap(void)
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
 	phys_addr_t new_memmap_phy;
+	unsigned long flags = 0;
 	void *new_memmap;
 	int i;
 
@@ -55,7 +56,7 @@ void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = efi_memmap_alloc(new_nr_map);
+	new_memmap_phy = efi_memmap_alloc(new_nr_map, &flags);
 	if (!new_memmap_phy)
 		return;
 
@@ -73,7 +74,7 @@ void __init efi_fake_memmap(void)
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
-	efi_memmap_install(new_memmap_phy, new_nr_map);
+	efi_memmap_install(new_memmap_phy, new_nr_map, flags);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 813674ef9000..2b81ee6858a9 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -32,6 +32,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
+ * @flags: Late map, memblock alloc, slab alloc flags
  *
  * Depending on whether mm_init() has already been invoked or not,
  * either memblock or "normal" page allocation is used.
@@ -39,20 +40,23 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
  * Returns the physical address of the allocated memory map on
  * success, zero on failure.
  */
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries, unsigned long *flags)
 {
 	unsigned long size = num_entries * efi.memmap.desc_size;
 
-	if (slab_is_available())
+	if (slab_is_available()) {
+		*flags |= EFI_MEMMAP_SLAB;
 		return __efi_memmap_alloc_late(size);
+	}
 
+	*flags |= EFI_MEMMAP_MEMBLOCK;
 	return __efi_memmap_alloc_early(size);
 }
 
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @flags: Use early or late mapping function?
+ * @flags: Use early or late mapping function, and allocator
  *
  * This function takes care of figuring out which function to use to
  * map the EFI memory map in efi.memmap based on how far into the boot
@@ -192,10 +196,10 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
  *
  * Returns zero on success, a negative error code on failure.
  */
-int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
+int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
+		unsigned long flags)
 {
 	struct efi_memory_map_data data;
-	unsigned long flags;
 
 	efi_memmap_unmap();
 
@@ -203,7 +207,7 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
-	flags = efi.memmap.flags & EFI_MEMMAP_LATE;
+	flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
 
 	return __efi_memmap_init(&data, flags);
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index b8e930f5ff77..fa2668a992ae 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -796,6 +796,8 @@ struct efi_memory_map {
 	unsigned long desc_version;
 	unsigned long desc_size;
 #define EFI_MEMMAP_LATE (1UL << 0)
+#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
+#define EFI_MEMMAP_SLAB (1UL << 2)
 	unsigned long flags;
 };
 
@@ -1057,11 +1059,13 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries,
+		unsigned long *flags);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
-extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
+extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
+		unsigned long flags);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 					 struct range *range);
 extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/4] efi: Fix efi_memmap_alloc() leaks
  2019-12-31 22:04 ` Dan Williams
@ 2019-12-31 22:04   ` Dan Williams
  -1 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-kernel, linux-efi, kexec, x86

With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
updated and replaced multiple times. When that happens a previous
dynamically allocated efi memory map can be garbage collected. Use the
new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
allocated memory map is being replaced.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 2b81ee6858a9..188ab3cd5c52 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
+static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+{
+	if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
+		return;
+
+	if (flags & EFI_MEMMAP_MEMBLOCK) {
+		memblock_free(phys, size);
+	} else if (flags & EFI_MEMMAP_SLAB) {
+		struct page *p = pfn_to_page(PHYS_PFN(phys));
+		unsigned int order = get_order(size);
+
+		free_pages((unsigned long) page_address(p), order);
+	}
+}
+
+static void __init efi_memmap_free(void)
+{
+	__efi_memmap_free(efi.memmap.phys_map,
+			efi.memmap.desc_size * efi.memmap.nr_map,
+			efi.memmap.flags);
+}
+
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
@@ -209,6 +231,8 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
 	data.desc_size = efi.memmap.desc_size;
 	flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
 
+	efi_memmap_free();
+
 	return __efi_memmap_init(&data, flags);
 }
 


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/4] efi: Fix efi_memmap_alloc() leaks
@ 2019-12-31 22:04   ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo; +Cc: linux-efi, Ard Biesheuvel, x86, kexec, linux-kernel, Taku Izumi

With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
updated and replaced multiple times. When that happens a previous
dynamically allocated efi memory map can be garbage collected. Use the
new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
allocated memory map is being replaced.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 2b81ee6858a9..188ab3cd5c52 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
+static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+{
+	if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
+		return;
+
+	if (flags & EFI_MEMMAP_MEMBLOCK) {
+		memblock_free(phys, size);
+	} else if (flags & EFI_MEMMAP_SLAB) {
+		struct page *p = pfn_to_page(PHYS_PFN(phys));
+		unsigned int order = get_order(size);
+
+		free_pages((unsigned long) page_address(p), order);
+	}
+}
+
+static void __init efi_memmap_free(void)
+{
+	__efi_memmap_free(efi.memmap.phys_map,
+			efi.memmap.desc_size * efi.memmap.nr_map,
+			efi.memmap.flags);
+}
+
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
@@ -209,6 +231,8 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
 	data.desc_size = efi.memmap.desc_size;
 	flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
 
+	efi_memmap_free();
+
 	return __efi_memmap_init(&data, flags);
 }
 


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2019-12-31 22:04 ` Dan Williams
@ 2019-12-31 22:04   ` Dan Williams
  -1 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo
  Cc: Dave Young, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-kernel, linux-efi, kexec,
	x86

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. The empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map.

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers the
above crash signature.

Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
Reported-by: Dave Young <dyoung@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Michael Weiser <michael@weiser.dinsnail.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/fake_mem.c |   32 +++++++++++++++++---------------
 drivers/firmware/efi/memmap.c   |    2 +-
 include/linux/efi.h             |    2 ++
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 7e53e5520548..68d752d8af21 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
 	return 0;
 }
 
-void __init efi_fake_memmap(void)
+static void __init efi_fake_range(struct efi_mem_range *efi_range)
 {
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
 	phys_addr_t new_memmap_phy;
 	unsigned long flags = 0;
 	void *new_memmap;
-	int i;
-
-	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
-		return;
 
 	/* count up the number of EFI memory descriptor */
-	for (i = 0; i < nr_fake_mem; i++) {
-		for_each_efi_memory_desc(md) {
-			struct range *r = &efi_fake_mems[i].range;
-
-			new_nr_map += efi_memmap_split_count(md, r);
-		}
-	}
+	for_each_efi_memory_desc(md)
+		new_nr_map += efi_memmap_split_count(md, &efi_range->range);
 
 	/* allocate memory for new EFI memmap */
 	new_memmap_phy = efi_memmap_alloc(new_nr_map, &flags);
@@ -64,17 +55,28 @@ void __init efi_fake_memmap(void)
 	new_memmap = early_memremap(new_memmap_phy,
 				    efi.memmap.desc_size * new_nr_map);
 	if (!new_memmap) {
-		memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+		__efi_memmap_free(new_memmap_phy,
+				efi.memmap.desc_size * new_nr_map, flags);
 		return;
 	}
 
-	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
+	efi_memmap_insert(&efi.memmap, new_memmap, efi_range);
 
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
 	efi_memmap_install(new_memmap_phy, new_nr_map, flags);
+}
+
+void __init efi_fake_memmap(void)
+{
+	int i;
+
+	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+		return;
+
+	for (i = 0; i < nr_fake_mem; i++)
+		efi_fake_range(&efi_fake_mems[i]);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 188ab3cd5c52..de66c2a0e8f8 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,7 +29,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
-static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
 {
 	if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
 		return;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index fa2668a992ae..6ae31e064321 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1061,6 +1061,8 @@ extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
 extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries,
 		unsigned long *flags);
+extern void __efi_memmap_free(u64 phys, unsigned long size,
+		unsigned long flags);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
@ 2019-12-31 22:04   ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2019-12-31 22:04 UTC (permalink / raw)
  To: mingo
  Cc: Michael Weiser, linux-efi, Ard Biesheuvel, x86, kexec,
	linux-kernel, Taku Izumi, Thomas Gleixner, Dave Young,
	Ingo Molnar

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. The empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map.

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers the
above crash signature.

Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
Reported-by: Dave Young <dyoung@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Michael Weiser <michael@weiser.dinsnail.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/fake_mem.c |   32 +++++++++++++++++---------------
 drivers/firmware/efi/memmap.c   |    2 +-
 include/linux/efi.h             |    2 ++
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 7e53e5520548..68d752d8af21 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
 	return 0;
 }
 
-void __init efi_fake_memmap(void)
+static void __init efi_fake_range(struct efi_mem_range *efi_range)
 {
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
 	phys_addr_t new_memmap_phy;
 	unsigned long flags = 0;
 	void *new_memmap;
-	int i;
-
-	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
-		return;
 
 	/* count up the number of EFI memory descriptor */
-	for (i = 0; i < nr_fake_mem; i++) {
-		for_each_efi_memory_desc(md) {
-			struct range *r = &efi_fake_mems[i].range;
-
-			new_nr_map += efi_memmap_split_count(md, r);
-		}
-	}
+	for_each_efi_memory_desc(md)
+		new_nr_map += efi_memmap_split_count(md, &efi_range->range);
 
 	/* allocate memory for new EFI memmap */
 	new_memmap_phy = efi_memmap_alloc(new_nr_map, &flags);
@@ -64,17 +55,28 @@ void __init efi_fake_memmap(void)
 	new_memmap = early_memremap(new_memmap_phy,
 				    efi.memmap.desc_size * new_nr_map);
 	if (!new_memmap) {
-		memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+		__efi_memmap_free(new_memmap_phy,
+				efi.memmap.desc_size * new_nr_map, flags);
 		return;
 	}
 
-	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
+	efi_memmap_insert(&efi.memmap, new_memmap, efi_range);
 
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
 	efi_memmap_install(new_memmap_phy, new_nr_map, flags);
+}
+
+void __init efi_fake_memmap(void)
+{
+	int i;
+
+	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+		return;
+
+	for (i = 0; i < nr_fake_mem; i++)
+		efi_fake_range(&efi_fake_mems[i]);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 188ab3cd5c52..de66c2a0e8f8 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,7 +29,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
-static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
 {
 	if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
 		return;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index fa2668a992ae..6ae31e064321 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1061,6 +1061,8 @@ extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
 extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries,
 		unsigned long *flags);
+extern void __efi_memmap_free(u64 phys, unsigned long size,
+		unsigned long flags);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] efi: Fix efi_memmap_alloc() leaks
  2019-12-31 22:04   ` Dan Williams
  (?)
@ 2020-01-01  3:35   ` Dave Young
  2020-01-01  4:52     ` Dan Williams
  -1 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2020-01-01  3:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, Taku Izumi, Ard Biesheuvel, linux-kernel, linux-efi, kexec, x86

Hi Dan,
On 12/31/19 at 02:04pm, Dan Williams wrote:
> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> updated and replaced multiple times. When that happens a previous
> dynamically allocated efi memory map can be garbage collected. Use the
> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> allocated memory map is being replaced.
> 
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index 2b81ee6858a9..188ab3cd5c52 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
>  	return PFN_PHYS(page_to_pfn(p));
>  }
>  
> +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> +{
> +	if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
> +		return;
> +
> +	if (flags & EFI_MEMMAP_MEMBLOCK) {
> +		memblock_free(phys, size);
> +	} else if (flags & EFI_MEMMAP_SLAB) {
> +		struct page *p = pfn_to_page(PHYS_PFN(phys));
> +		unsigned int order = get_order(size);
> +
> +		free_pages((unsigned long) page_address(p), order);
> +	}
> +}
> +
> +static void __init efi_memmap_free(void)
> +{
> +	__efi_memmap_free(efi.memmap.phys_map,
> +			efi.memmap.desc_size * efi.memmap.nr_map,
> +			efi.memmap.flags);
> +}
> +
>  /**
>   * efi_memmap_alloc - Allocate memory for the EFI memory map
>   * @num_entries: Number of entries in the allocated map.
> @@ -209,6 +231,8 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
>  	data.desc_size = efi.memmap.desc_size;
>  	flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
>  
> +	efi_memmap_free();
> +
>  	return __efi_memmap_init(&data, flags);

Hmm, only free the memmap in case __efi_memmap_init succeeded..

Thanks
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2019-12-31 22:04   ` Dan Williams
  (?)
@ 2020-01-01  4:51   ` Dave Young
  2020-01-01  5:04     ` Dan Williams
  -1 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2020-01-01  4:51 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-kernel, linux-efi, kexec,
	x86

Hi Dan,
On 12/31/19 at 02:04pm, Dan Williams wrote:
> Dave noticed that when specifying multiple efi_fake_mem= entries only
> the last entry was successfully being reflected in the efi memory map.
> This is due to the fact that the efi_memmap_insert() is being called
> multiple times, but on successive invocations the insertion should be
> applied to the last new memmap rather than the original map at
> efi_fake_memmap() entry.
> 
> Rework efi_fake_memmap() to install the new memory map after each
> efi_fake_mem= entry is parsed.
> 
> This also fixes an issue in efi_fake_memmap() that caused it to litter
> emtpy entries into the end of the efi memory map. The empty entry causes
> efi_memmap_insert() to attempt more memmap splits / copies than
> efi_memmap_split_count() accounted for when sizing the new map.
> 
>     BUG: unable to handle page fault for address: ffffffffff281000
>     [..]
>     RIP: 0010:efi_memmap_insert+0x11d/0x191
>     [..]
>     Call Trace:
>      ? bgrt_init+0xbe/0xbe
>      ? efi_arch_mem_reserve+0x1cb/0x228
>      ? acpi_parse_bgrt+0xa/0xd
>      ? acpi_table_parse+0x86/0xb8
>      ? acpi_boot_init+0x494/0x4e3
>      ? acpi_parse_x2apic+0x87/0x87
>      ? setup_acpi_sci+0xa2/0xa2
>      ? setup_arch+0x8db/0x9e1
>      ? start_kernel+0x6a/0x547
>      ? secondary_startup_64+0xb6/0xc0
> 
> Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> services data to fix kexec breakage" is listed in Fixes: since it
> introduces more occurrences where efi_memmap_insert() is invoked after
> an efi_fake_mem= configuration has been parsed. Previously the side
> effects of vestigial empty entries were benign, but with commit
> af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> above crash signature.
> 
> Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
> Reported-by: Dave Young <dyoung@redhat.com>
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Michael Weiser <michael@weiser.dinsnail.net>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/firmware/efi/fake_mem.c |   32 +++++++++++++++++---------------
>  drivers/firmware/efi/memmap.c   |    2 +-
>  include/linux/efi.h             |    2 ++
>  3 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index 7e53e5520548..68d752d8af21 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
>  	return 0;
>  }
>  
> -void __init efi_fake_memmap(void)
> +static void __init efi_fake_range(struct efi_mem_range *efi_range)
>  {
>  	int new_nr_map = efi.memmap.nr_map;
>  	efi_memory_desc_t *md;
>  	phys_addr_t new_memmap_phy;
>  	unsigned long flags = 0;
>  	void *new_memmap;
> -	int i;
> -
> -	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> -		return;
>  
>  	/* count up the number of EFI memory descriptor */
> -	for (i = 0; i < nr_fake_mem; i++) {
> -		for_each_efi_memory_desc(md) {
> -			struct range *r = &efi_fake_mems[i].range;
> -
> -			new_nr_map += efi_memmap_split_count(md, r);
> -		}
> -	}
> +	for_each_efi_memory_desc(md)
> +		new_nr_map += efi_memmap_split_count(md, &efi_range->range);

I have another concern here :(

THe efi_memmap_split_count mean to only split for a specific md, and you
can see arch/x86/platform/efi/quirks.c about the use:
        if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
                pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
                return;
        }

Any memory region to be inserted but spans different md will be
rejected.  So the memmap insert logic seems does not support the
spanned ranges.  I did not find a case two contiguous same type ranges
eg. two "Conventional memory", if have they should have been merged. 

So maybe just use same way as the quirks.c here to find the valid md first
then get the split count?

Otherwise I tested the series bootup test passed.

BTW, another issue about fakemem,  currently it only works with normal
physical boot,  in case of kexec reboot the kernel only aware of EFI
runtime memory ranges, we do not pass other types in memmap.  But maybe
we can live with it considering fake mem is only for debugging purpose.

Thanks
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-01  3:35   ` Dave Young
@ 2020-01-01  4:52     ` Dan Williams
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Williams @ 2020-01-01  4:52 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Taku Izumi, Ard Biesheuvel,
	Linux Kernel Mailing List, linux-efi, kexec, X86 ML

On Tue, Dec 31, 2019 at 7:35 PM Dave Young <dyoung@redhat.com> wrote:
>
> Hi Dan,
> On 12/31/19 at 02:04pm, Dan Williams wrote:
> > With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> > updated and replaced multiple times. When that happens a previous
> > dynamically allocated efi memory map can be garbage collected. Use the
> > new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> > allocated memory map is being replaced.
> >
> > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> > index 2b81ee6858a9..188ab3cd5c52 100644
> > --- a/drivers/firmware/efi/memmap.c
> > +++ b/drivers/firmware/efi/memmap.c
> > @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
> >       return PFN_PHYS(page_to_pfn(p));
> >  }
> >
> > +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> > +{
> > +     if (WARN_ON(slab_is_available() && (flags & EFI_MEMMAP_MEMBLOCK)))
> > +             return;
> > +
> > +     if (flags & EFI_MEMMAP_MEMBLOCK) {
> > +             memblock_free(phys, size);
> > +     } else if (flags & EFI_MEMMAP_SLAB) {
> > +             struct page *p = pfn_to_page(PHYS_PFN(phys));
> > +             unsigned int order = get_order(size);
> > +
> > +             free_pages((unsigned long) page_address(p), order);
> > +     }
> > +}
> > +
> > +static void __init efi_memmap_free(void)
> > +{
> > +     __efi_memmap_free(efi.memmap.phys_map,
> > +                     efi.memmap.desc_size * efi.memmap.nr_map,
> > +                     efi.memmap.flags);
> > +}
> > +
> >  /**
> >   * efi_memmap_alloc - Allocate memory for the EFI memory map
> >   * @num_entries: Number of entries in the allocated map.
> > @@ -209,6 +231,8 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
> >       data.desc_size = efi.memmap.desc_size;
> >       flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
> >
> > +     efi_memmap_free();
> > +
> >       return __efi_memmap_init(&data, flags);
>
> Hmm, only free the memmap in case __efi_memmap_init succeeded..

Ah true, that is a hastily chosen placement. Probably better in
__efi_memmap_init() after we're committed to the new map.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-01  4:51   ` Dave Young
@ 2020-01-01  5:04     ` Dan Williams
  2020-01-01  6:15       ` Dave Young
  0 siblings, 1 reply; 18+ messages in thread
From: Dan Williams @ 2020-01-01  5:04 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, Linux Kernel Mailing List,
	linux-efi, kexec, X86 ML

On Tue, Dec 31, 2019 at 8:52 PM Dave Young <dyoung@redhat.com> wrote:
>
> Hi Dan,
> On 12/31/19 at 02:04pm, Dan Williams wrote:
> > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > the last entry was successfully being reflected in the efi memory map.
> > This is due to the fact that the efi_memmap_insert() is being called
> > multiple times, but on successive invocations the insertion should be
> > applied to the last new memmap rather than the original map at
> > efi_fake_memmap() entry.
> >
> > Rework efi_fake_memmap() to install the new memory map after each
> > efi_fake_mem= entry is parsed.
> >
> > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > emtpy entries into the end of the efi memory map. The empty entry causes
> > efi_memmap_insert() to attempt more memmap splits / copies than
> > efi_memmap_split_count() accounted for when sizing the new map.
> >
> >     BUG: unable to handle page fault for address: ffffffffff281000
> >     [..]
> >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> >     [..]
> >     Call Trace:
> >      ? bgrt_init+0xbe/0xbe
> >      ? efi_arch_mem_reserve+0x1cb/0x228
> >      ? acpi_parse_bgrt+0xa/0xd
> >      ? acpi_table_parse+0x86/0xb8
> >      ? acpi_boot_init+0x494/0x4e3
> >      ? acpi_parse_x2apic+0x87/0x87
> >      ? setup_acpi_sci+0xa2/0xa2
> >      ? setup_arch+0x8db/0x9e1
> >      ? start_kernel+0x6a/0x547
> >      ? secondary_startup_64+0xb6/0xc0
> >
> > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > services data to fix kexec breakage" is listed in Fixes: since it
> > introduces more occurrences where efi_memmap_insert() is invoked after
> > an efi_fake_mem= configuration has been parsed. Previously the side
> > effects of vestigial empty entries were benign, but with commit
> > af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> > above crash signature.
> >
> > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
> > Reported-by: Dave Young <dyoung@redhat.com>
> > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > Cc: Michael Weiser <michael@weiser.dinsnail.net>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/firmware/efi/fake_mem.c |   32 +++++++++++++++++---------------
> >  drivers/firmware/efi/memmap.c   |    2 +-
> >  include/linux/efi.h             |    2 ++
> >  3 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> > index 7e53e5520548..68d752d8af21 100644
> > --- a/drivers/firmware/efi/fake_mem.c
> > +++ b/drivers/firmware/efi/fake_mem.c
> > @@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
> >       return 0;
> >  }
> >
> > -void __init efi_fake_memmap(void)
> > +static void __init efi_fake_range(struct efi_mem_range *efi_range)
> >  {
> >       int new_nr_map = efi.memmap.nr_map;
> >       efi_memory_desc_t *md;
> >       phys_addr_t new_memmap_phy;
> >       unsigned long flags = 0;
> >       void *new_memmap;
> > -     int i;
> > -
> > -     if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> > -             return;
> >
> >       /* count up the number of EFI memory descriptor */
> > -     for (i = 0; i < nr_fake_mem; i++) {
> > -             for_each_efi_memory_desc(md) {
> > -                     struct range *r = &efi_fake_mems[i].range;
> > -
> > -                     new_nr_map += efi_memmap_split_count(md, r);
> > -             }
> > -     }
> > +     for_each_efi_memory_desc(md)
> > +             new_nr_map += efi_memmap_split_count(md, &efi_range->range);
>
> I have another concern here :(
>
> THe efi_memmap_split_count mean to only split for a specific md, and you
> can see arch/x86/platform/efi/quirks.c about the use:
>         if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
>                 pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
>                 return;
>         }
>
> Any memory region to be inserted but spans different md will be
> rejected.  So the memmap insert logic seems does not support the
> spanned ranges.  I did not find a case two contiguous same type ranges
> eg. two "Conventional memory", if have they should have been merged.
>
> So maybe just use same way as the quirks.c here to find the valid md first
> then get the split count?

I don't immediately see why it would be a problem to just let the md
loop that efi_fake_memmap() performs try to split multiple entries. It
may end up with more splits than necessary in which case we'll need
that piece from my original patch to clean those up. Thanks for the
heads up, I'll give it a try and see what shakes out. Are you seeing
any misbehavior on your end?

>
> Otherwise I tested the series bootup test passed.
>
> BTW, another issue about fakemem,  currently it only works with normal
> physical boot,  in case of kexec reboot the kernel only aware of EFI
> runtime memory ranges, we do not pass other types in memmap.  But maybe
> we can live with it considering fake mem is only for debugging purpose.

Does kexec preserve iomem? I.e. as long as the initial translation of
efi entries to e820, and resulting resource tree, is preserved by
successive kexec cycles then I think we're ok.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-01  5:04     ` Dan Williams
@ 2020-01-01  6:15       ` Dave Young
  2020-01-01  6:20         ` Dave Young
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2020-01-01  6:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, Linux Kernel Mailing List,
	linux-efi, kexec, X86 ML

On 12/31/19 at 09:04pm, Dan Williams wrote:
> On Tue, Dec 31, 2019 at 8:52 PM Dave Young <dyoung@redhat.com> wrote:
> >
> > Hi Dan,
> > On 12/31/19 at 02:04pm, Dan Williams wrote:
> > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > the last entry was successfully being reflected in the efi memory map.
> > > This is due to the fact that the efi_memmap_insert() is being called
> > > multiple times, but on successive invocations the insertion should be
> > > applied to the last new memmap rather than the original map at
> > > efi_fake_memmap() entry.
> > >
> > > Rework efi_fake_memmap() to install the new memory map after each
> > > efi_fake_mem= entry is parsed.
> > >
> > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > emtpy entries into the end of the efi memory map. The empty entry causes
> > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > efi_memmap_split_count() accounted for when sizing the new map.
> > >
> > >     BUG: unable to handle page fault for address: ffffffffff281000
> > >     [..]
> > >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> > >     [..]
> > >     Call Trace:
> > >      ? bgrt_init+0xbe/0xbe
> > >      ? efi_arch_mem_reserve+0x1cb/0x228
> > >      ? acpi_parse_bgrt+0xa/0xd
> > >      ? acpi_table_parse+0x86/0xb8
> > >      ? acpi_boot_init+0x494/0x4e3
> > >      ? acpi_parse_x2apic+0x87/0x87
> > >      ? setup_acpi_sci+0xa2/0xa2
> > >      ? setup_arch+0x8db/0x9e1
> > >      ? start_kernel+0x6a/0x547
> > >      ? secondary_startup_64+0xb6/0xc0
> > >
> > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > services data to fix kexec breakage" is listed in Fixes: since it
> > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > effects of vestigial empty entries were benign, but with commit
> > > af1648984828 that follow-on efi_memmap_insert() invocation triggers the
> > > above crash signature.
> > >
> > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > > Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
> > > Reported-by: Dave Young <dyoung@redhat.com>
> > > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > > Cc: Michael Weiser <michael@weiser.dinsnail.net>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  drivers/firmware/efi/fake_mem.c |   32 +++++++++++++++++---------------
> > >  drivers/firmware/efi/memmap.c   |    2 +-
> > >  include/linux/efi.h             |    2 ++
> > >  3 files changed, 20 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> > > index 7e53e5520548..68d752d8af21 100644
> > > --- a/drivers/firmware/efi/fake_mem.c
> > > +++ b/drivers/firmware/efi/fake_mem.c
> > > @@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
> > >       return 0;
> > >  }
> > >
> > > -void __init efi_fake_memmap(void)
> > > +static void __init efi_fake_range(struct efi_mem_range *efi_range)
> > >  {
> > >       int new_nr_map = efi.memmap.nr_map;
> > >       efi_memory_desc_t *md;
> > >       phys_addr_t new_memmap_phy;
> > >       unsigned long flags = 0;
> > >       void *new_memmap;
> > > -     int i;
> > > -
> > > -     if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> > > -             return;
> > >
> > >       /* count up the number of EFI memory descriptor */
> > > -     for (i = 0; i < nr_fake_mem; i++) {
> > > -             for_each_efi_memory_desc(md) {
> > > -                     struct range *r = &efi_fake_mems[i].range;
> > > -
> > > -                     new_nr_map += efi_memmap_split_count(md, r);
> > > -             }
> > > -     }
> > > +     for_each_efi_memory_desc(md)
> > > +             new_nr_map += efi_memmap_split_count(md, &efi_range->range);
> >
> > I have another concern here :(
> >
> > THe efi_memmap_split_count mean to only split for a specific md, and you
> > can see arch/x86/platform/efi/quirks.c about the use:
> >         if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
> >                 pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
> >                 return;
> >         }
> >
> > Any memory region to be inserted but spans different md will be
> > rejected.  So the memmap insert logic seems does not support the
> > spanned ranges.  I did not find a case two contiguous same type ranges
> > eg. two "Conventional memory", if have they should have been merged.
> >
> > So maybe just use same way as the quirks.c here to find the valid md first
> > then get the split count?
> 
> I don't immediately see why it would be a problem to just let the md
> loop that efi_fake_memmap() performs try to split multiple entries. It
> may end up with more splits than necessary in which case we'll need
> that piece from my original patch to clean those up. Thanks for the
> heads up, I'll give it a try and see what shakes out. Are you seeing
> any misbehavior on your end?

Just some worries, but I did not see any misbehaviors :)

> 
> >
> > Otherwise I tested the series bootup test passed.
> >
> > BTW, another issue about fakemem,  currently it only works with normal
> > physical boot,  in case of kexec reboot the kernel only aware of EFI
> > runtime memory ranges, we do not pass other types in memmap.  But maybe
> > we can live with it considering fake mem is only for debugging purpose.
> 
> Does kexec preserve iomem? I.e. as long as the initial translation of
> efi entries to e820, and resulting resource tree, is preserved by
> successive kexec cycles then I think we're ok.

It will not preserve them automatically, but that can be fixed if
needed.

There are two places:
1. the in kernel loader, we can do similar with below commit (for Soft
Reseved instead):
commit 980621daf368f2b9aa69c7ea01baa654edb7577b
Author: Lianbo Jiang <lijiang@redhat.com>
Date:   Tue Apr 23 09:30:07 2019 +0800

    x86/crash: Add e820 reserved ranges to kdump kernel's e820 table

2. the userspace loader, in kexec-tools:
It only parse and pass "Reserved" for the time being, also need handle
the Soft Reserved" part as well.

Thanks
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-01  6:15       ` Dave Young
@ 2020-01-01  6:20         ` Dave Young
  2020-01-01 18:36           ` Dan Williams
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Young @ 2020-01-01  6:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, Linux Kernel Mailing List,
	linux-efi, kexec, X86 ML

> > Does kexec preserve iomem? I.e. as long as the initial translation of
> > efi entries to e820, and resulting resource tree, is preserved by
> > successive kexec cycles then I think we're ok.
> 
> It will not preserve them automatically, but that can be fixed if
> needed.
> 
> There are two places:
> 1. the in kernel loader, we can do similar with below commit (for Soft
> Reseved instead):
> commit 980621daf368f2b9aa69c7ea01baa654edb7577b
> Author: Lianbo Jiang <lijiang@redhat.com>
> Date:   Tue Apr 23 09:30:07 2019 +0800
> 
>     x86/crash: Add e820 reserved ranges to kdump kernel's e820 table

Oops, that is for kdump only, for kexec, should update the kexec e820
table.  But before doing that we need first to see if this is necessary. 

Thanks
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-01  6:20         ` Dave Young
@ 2020-01-01 18:36           ` Dan Williams
  2020-01-02  2:21             ` Dave Young
  0 siblings, 1 reply; 18+ messages in thread
From: Dan Williams @ 2020-01-01 18:36 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, Linux Kernel Mailing List,
	linux-efi, kexec, X86 ML

On Tue, Dec 31, 2019 at 10:21 PM Dave Young <dyoung@redhat.com> wrote:
>
> > > Does kexec preserve iomem? I.e. as long as the initial translation of
> > > efi entries to e820, and resulting resource tree, is preserved by
> > > successive kexec cycles then I think we're ok.
> >
> > It will not preserve them automatically, but that can be fixed if
> > needed.
> >
> > There are two places:
> > 1. the in kernel loader, we can do similar with below commit (for Soft
> > Reseved instead):
> > commit 980621daf368f2b9aa69c7ea01baa654edb7577b
> > Author: Lianbo Jiang <lijiang@redhat.com>
> > Date:   Tue Apr 23 09:30:07 2019 +0800
> >
> >     x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
>
> Oops, that is for kdump only, for kexec, should update the kexec e820
> table.  But before doing that we need first to see if this is necessary.

We can cross that bridge later, but I expect it will eventually be
necessary. The soft-reservation facility will become more prevalent as
more platforms ship with DRAM differentiated memory ranges, like
high-bandwidth-memory, and the system needs to reserve it from general
kernel allocations. See commit 262b45ae3ab4 "x86/efi: EFI soft
reservation to E820 enumeration" and commit fe3e5e65c06e "efi:
Enumerate EFI_MEMORY_SP" for more details.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-01 18:36           ` Dan Williams
@ 2020-01-02  2:21             ` Dave Young
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Young @ 2020-01-02  2:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, Linux Kernel Mailing List,
	linux-efi, kexec, X86 ML

On 01/01/20 at 10:36am, Dan Williams wrote:
> On Tue, Dec 31, 2019 at 10:21 PM Dave Young <dyoung@redhat.com> wrote:
> >
> > > > Does kexec preserve iomem? I.e. as long as the initial translation of
> > > > efi entries to e820, and resulting resource tree, is preserved by
> > > > successive kexec cycles then I think we're ok.
> > >
> > > It will not preserve them automatically, but that can be fixed if
> > > needed.
> > >
> > > There are two places:
> > > 1. the in kernel loader, we can do similar with below commit (for Soft
> > > Reseved instead):
> > > commit 980621daf368f2b9aa69c7ea01baa654edb7577b
> > > Author: Lianbo Jiang <lijiang@redhat.com>
> > > Date:   Tue Apr 23 09:30:07 2019 +0800
> > >
> > >     x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
> >
> > Oops, that is for kdump only, for kexec, should update the kexec e820
> > table.  But before doing that we need first to see if this is necessary.
> 
> We can cross that bridge later, but I expect it will eventually be
> necessary. The soft-reservation facility will become more prevalent as
> more platforms ship with DRAM differentiated memory ranges, like
> high-bandwidth-memory, and the system needs to reserve it from general
> kernel allocations. See commit 262b45ae3ab4 "x86/efi: EFI soft
> reservation to E820 enumeration" and commit fe3e5e65c06e "efi:
> Enumerate EFI_MEMORY_SP" for more details.

Ok, agreed the EFI_MEMORY_SP should be preserved across kexec reboot,
I think those firmware provided EFI_MEMORY_SP should be persistent
because the e820 table is just a copy.   But I have no such hardware
to test,  could you do a test to confirm if possible?

The test steps should be:
# -s means to use kexec_file_load syscall
kexec -s -l bzImage --initrd initramfs-file --reuse-cmdline 
reboot

Maybe this should be fine for the time being.  And
leave the faked mem only works once during the physical boot?

Thanks
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-02  2:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-31 22:04 [PATCH v2 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
2019-12-31 22:04 ` Dan Williams
2019-12-31 22:04 ` [PATCH v2 1/4] efi: Add a flags parameter to efi_memory_map Dan Williams
2019-12-31 22:04   ` Dan Williams
2019-12-31 22:04 ` [PATCH v2 2/4] efi: Add tracking for dynamically allocated memmaps Dan Williams
2019-12-31 22:04   ` Dan Williams
2019-12-31 22:04 ` [PATCH v2 3/4] efi: Fix efi_memmap_alloc() leaks Dan Williams
2019-12-31 22:04   ` Dan Williams
2020-01-01  3:35   ` Dave Young
2020-01-01  4:52     ` Dan Williams
2019-12-31 22:04 ` [PATCH v2 4/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
2019-12-31 22:04   ` Dan Williams
2020-01-01  4:51   ` Dave Young
2020-01-01  5:04     ` Dan Williams
2020-01-01  6:15       ` Dave Young
2020-01-01  6:20         ` Dave Young
2020-01-01 18:36           ` Dan Williams
2020-01-02  2:21             ` Dave Young

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.