linux-efi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries
@ 2020-01-07  0:40 Dan Williams
  2020-01-07  0:40 ` [PATCH v4 1/4] efi: Add a flags parameter to efi_memory_map Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Dan Williams @ 2020-01-07  0:40 UTC (permalink / raw)
  To: mingo
  Cc: Taku Izumi, Thomas Gleixner, Dave Young, Ingo Molnar,
	Michael Weiser, Ard Biesheuvel, linux-efi, x86, linux-kernel,
	kexec

Changes since v3 [1]:
- Rather than pass a reference to a new flags argument, pass a common
  data structure ('struct efi_memory_map_data'), between
  efi_memmap_alloc() and efi_memmap_install(). (Ard)

- Arrange for EFI_MEMMAP_SLAB to be clear if EFI_MEMMAP_MEMBLOCK was set
  and vice versa (Ard).

[1]: http://lore.kernel.org/r/157793839827.977550.7845382457971215205.stgit@dwillia2-desk3.amr.corp.intel.com
---

While testing an upcoming patchset to enhance the "soft reservation"
implementation it started crashing when rebased on v5.5-rc3. This
uncovered a few bugs in the efi_fake_mem= handling and
efi_memmap_alloc() leaks.

---

Copied from patch4:

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. An empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map. When
that happens efi_memmap_insert() may overrun its allocation, and if you
are lucky will spill over to an unmapped page leading to crash
signature like the following rather than silent corruption:

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers
efi_memmap_insert() overruns.

---

Dan Williams (4):
      efi: Add a flags parameter to efi_memory_map
      efi: Add tracking for dynamically allocated memmaps
      efi: Fix efi_memmap_alloc() leaks
      efi: Fix handling of multiple efi_fake_mem= entries


 arch/x86/platform/efi/efi.c     |   10 +++-
 arch/x86/platform/efi/quirks.c  |   23 ++++------
 drivers/firmware/efi/fake_mem.c |   43 +++++++++---------
 drivers/firmware/efi/memmap.c   |   94 ++++++++++++++++++++++++++-------------
 include/linux/efi.h             |   17 +++++--
 5 files changed, 113 insertions(+), 74 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/4] efi: Add a flags parameter to efi_memory_map
  2020-01-07  0:40 [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
@ 2020-01-07  0:40 ` Dan Williams
  2020-01-07  0:40 ` [PATCH v4 2/4] efi: Add tracking for dynamically allocated memmaps Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2020-01-07  0:40 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-efi, x86, linux-kernel, kexec

In preparation for garbage collecting dynamically allocated efi memory
maps, where the allocation method of memblock vs slab needs to be
recalled, convert the existing 'late' flag into a 'flags' bitmask.

Arrange for the flag to be passed via 'struct efi_memory_map_data'. This
structure grows additional flags in follow-on changes.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   31 +++++++++++++++++--------------
 include/linux/efi.h           |    4 +++-
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..4f98eb12c172 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -52,21 +52,20 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @late: Use early or late mapping function?
  *
  * This function takes care of figuring out which function to use to
  * map the EFI memory map in efi.memmap based on how far into the boot
  * we are.
  *
- * During bootup @late should be %false since we only have access to
- * the early_memremap*() functions as the vmalloc space isn't setup.
- * Once the kernel is fully booted we can fallback to the more robust
- * memremap*() API.
+ * During bootup EFI_MEMMAP_LATE in data->flags should be clear since we
+ * only have access to the early_memremap*() functions as the vmalloc
+ * space isn't setup.  Once the kernel is fully booted we can fallback
+ * to the more robust memremap*() API.
  *
  * Returns zero on success, a negative error code on failure.
  */
 static int __init
-__efi_memmap_init(struct efi_memory_map_data *data, bool late)
+__efi_memmap_init(struct efi_memory_map_data *data)
 {
 	struct efi_memory_map map;
 	phys_addr_t phys_map;
@@ -76,7 +75,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	phys_map = data->phys_map;
 
-	if (late)
+	if (data->flags & EFI_MEMMAP_LATE)
 		map.map = memremap(phys_map, data->size, MEMREMAP_WB);
 	else
 		map.map = early_memremap(phys_map, data->size);
@@ -92,7 +91,7 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 
 	map.desc_version = data->desc_version;
 	map.desc_size = data->desc_size;
-	map.late = late;
+	map.flags = data->flags;
 
 	set_bit(EFI_MEMMAP, &efi.flags);
 
@@ -111,9 +110,10 @@ __efi_memmap_init(struct efi_memory_map_data *data, bool late)
 int __init efi_memmap_init_early(struct efi_memory_map_data *data)
 {
 	/* Cannot go backwards */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
-	return __efi_memmap_init(data, false);
+	data->flags = 0;
+	return __efi_memmap_init(data);
 }
 
 void __init efi_memmap_unmap(void)
@@ -121,7 +121,7 @@ void __init efi_memmap_unmap(void)
 	if (!efi_enabled(EFI_MEMMAP))
 		return;
 
-	if (!efi.memmap.late) {
+	if (!(efi.memmap.flags & EFI_MEMMAP_LATE)) {
 		unsigned long size;
 
 		size = efi.memmap.desc_size * efi.memmap.nr_map;
@@ -162,13 +162,14 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	struct efi_memory_map_data data = {
 		.phys_map = addr,
 		.size = size,
+		.flags = EFI_MEMMAP_LATE,
 	};
 
 	/* Did we forget to unmap the early EFI memmap? */
 	WARN_ON(efi.memmap.map);
 
 	/* Were we already called? */
-	WARN_ON(efi.memmap.late);
+	WARN_ON(efi.memmap.flags & EFI_MEMMAP_LATE);
 
 	/*
 	 * It makes no sense to allow callers to register different
@@ -178,7 +179,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
 
-	return __efi_memmap_init(&data, true);
+	return __efi_memmap_init(&data);
 }
 
 /**
@@ -195,6 +196,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 {
 	struct efi_memory_map_data data;
+	unsigned long flags;
 
 	efi_memmap_unmap();
 
@@ -202,8 +204,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
 	data.desc_size = efi.memmap.desc_size;
+	data.flags = efi.memmap.flags & EFI_MEMMAP_LATE;
 
-	return __efi_memmap_init(&data, efi.memmap.late);
+	return __efi_memmap_init(&data);
 }
 
 /**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index aa54586db7a5..e7e6c64469a7 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -786,6 +786,7 @@ struct efi_memory_map_data {
 	unsigned long size;
 	unsigned long desc_version;
 	unsigned long desc_size;
+	unsigned long flags;
 };
 
 struct efi_memory_map {
@@ -795,7 +796,8 @@ struct efi_memory_map {
 	int nr_map;
 	unsigned long desc_version;
 	unsigned long desc_size;
-	bool late;
+#define EFI_MEMMAP_LATE (1UL << 0)
+	unsigned long flags;
 };
 
 struct efi_mem_range {


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/4] efi: Add tracking for dynamically allocated memmaps
  2020-01-07  0:40 [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
  2020-01-07  0:40 ` [PATCH v4 1/4] efi: Add a flags parameter to efi_memory_map Dan Williams
@ 2020-01-07  0:40 ` Dan Williams
  2020-01-07  0:40 ` [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks Dan Williams
  2020-01-07  0:40 ` [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
  3 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2020-01-07  0:40 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-efi, x86, linux-kernel, kexec

In preparation for fixing efi_memmap_alloc() leaks, add support for
recording whether the memmap was dynamically allocated from slab,
memblock, or is the original physical memmap provided by the platform.

Given this tracking is established in efi_memmap_alloc() and needs to be
carried to efi_memmap_install(), use 'struct efi_memory_map_data' to
convey the flags.

Some small cleanups result from this reorganization, specifically the
removal of local variables for 'phys' and 'size' that are already
tracked in @data.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/platform/efi/efi.c     |   10 +++++++-
 arch/x86/platform/efi/quirks.c  |   23 +++++++------------
 drivers/firmware/efi/fake_mem.c |   14 +++++-------
 drivers/firmware/efi/memmap.c   |   47 ++++++++++++++++++++++-----------------
 include/linux/efi.h             |   11 ++++++---
 5 files changed, 56 insertions(+), 49 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 38d44f36d5ed..0bd45a988501 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -330,10 +330,16 @@ static void __init efi_clean_memmap(void)
 	}
 
 	if (n_removal > 0) {
-		u64 size = efi.memmap.nr_map - n_removal;
+		struct efi_memory_map_data data = {
+			.phys_map = efi.memmap.phys_map,
+			.desc_version = efi.memmap.desc_version,
+			.desc_size = efi.memmap.desc_size,
+			.size = data.desc_size * (efi.memmap.nr_map - n_removal),
+			.flags = 0,
+		};
 
 		pr_warn("Removing %d invalid memory map entries.\n", n_removal);
-		efi_memmap_install(efi.memmap.phys_map, size);
+		efi_memmap_install(&data);
 	}
 }
 
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f8f0220b6a66..a4f6c7176f24 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -243,7 +243,7 @@ EXPORT_SYMBOL_GPL(efi_query_variable_store);
  */
 void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 {
-	phys_addr_t new_phys, new_size;
+	struct efi_memory_map_data data = { 0 };
 	struct efi_mem_range mr;
 	efi_memory_desc_t md;
 	int num_entries;
@@ -271,24 +271,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	num_entries = efi_memmap_split_count(&md, &mr.range);
 	num_entries += efi.memmap.nr_map;
 
-	new_size = efi.memmap.desc_size * num_entries;
-
-	new_phys = efi_memmap_alloc(num_entries);
-	if (!new_phys) {
+	if (efi_memmap_alloc(num_entries, &data) != 0) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
 	}
 
-	new = early_memremap(new_phys, new_size);
+	new = early_memremap(data.phys_map, data.size);
 	if (!new) {
 		pr_err("Failed to map new boot services memmap\n");
 		return;
 	}
 
 	efi_memmap_insert(&efi.memmap, new, &mr);
-	early_memunmap(new, new_size);
+	early_memunmap(new, data.size);
 
-	efi_memmap_install(new_phys, num_entries);
+	efi_memmap_install(&data);
 	e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
 	e820__update_table(e820_table);
 }
@@ -407,7 +404,7 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 
 void __init efi_free_boot_services(void)
 {
-	phys_addr_t new_phys, new_size;
+	struct efi_memory_map_data data = { 0 };
 	efi_memory_desc_t *md;
 	int num_entries = 0;
 	void *new, *new_md;
@@ -462,14 +459,12 @@ void __init efi_free_boot_services(void)
 	if (!num_entries)
 		return;
 
-	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = efi_memmap_alloc(num_entries);
-	if (!new_phys) {
+	if (efi_memmap_alloc(num_entries, &data) != 0) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
 	}
 
-	new = memremap(new_phys, new_size, MEMREMAP_WB);
+	new = memremap(data.phys_map, data.size, MEMREMAP_WB);
 	if (!new) {
 		pr_err("Failed to map new EFI memmap\n");
 		return;
@@ -493,7 +488,7 @@ void __init efi_free_boot_services(void)
 
 	memunmap(new);
 
-	if (efi_memmap_install(new_phys, num_entries)) {
+	if (efi_memmap_install(&data) != 0) {
 		pr_err("Could not install new EFI memmap\n");
 		return;
 	}
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index bb9fc70d0cfa..a8d20568d532 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -36,9 +36,9 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
 
 void __init efi_fake_memmap(void)
 {
+	struct efi_memory_map_data data = { 0 };
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
-	phys_addr_t new_memmap_phy;
 	void *new_memmap;
 	int i;
 
@@ -55,15 +55,13 @@ void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = efi_memmap_alloc(new_nr_map);
-	if (!new_memmap_phy)
+	if (efi_memmap_alloc(new_nr_map, &data) != 0)
 		return;
 
 	/* create new EFI memmap */
-	new_memmap = early_memremap(new_memmap_phy,
-				    efi.memmap.desc_size * new_nr_map);
+	new_memmap = early_memremap(data.phys_map, data.size);
 	if (!new_memmap) {
-		memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+		memblock_free(data.phys_map, data.size);
 		return;
 	}
 
@@ -71,9 +69,9 @@ void __init efi_fake_memmap(void)
 		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
 
 	/* swap into new EFI memmap */
-	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
+	early_memunmap(new_memmap, data.size);
 
-	efi_memmap_install(new_memmap_phy, new_nr_map);
+	efi_memmap_install(&data);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 4f98eb12c172..04dfa56b994b 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -32,6 +32,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
+ * @data: efi memmap installation parameters
  *
  * Depending on whether mm_init() has already been invoked or not,
  * either memblock or "normal" page allocation is used.
@@ -39,14 +40,29 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
  * Returns the physical address of the allocated memory map on
  * success, zero on failure.
  */
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+int __init efi_memmap_alloc(unsigned int num_entries,
+		struct efi_memory_map_data *data)
 {
-	unsigned long size = num_entries * efi.memmap.desc_size;
-
-	if (slab_is_available())
-		return __efi_memmap_alloc_late(size);
+	/* Expect allocation parameters are zero initialized */
+	WARN_ON(data->phys_map || data->size);
+
+	data->size = num_entries * efi.memmap.desc_size;
+	data->desc_version = efi.memmap.desc_version;
+	data->desc_size = efi.memmap.desc_size;
+	data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK);
+	data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE;
+
+	if (slab_is_available()) {
+		data->flags |= EFI_MEMMAP_SLAB;
+		data->phys_map = __efi_memmap_alloc_late(data->size);
+	} else {
+		data->flags |= EFI_MEMMAP_MEMBLOCK;
+		data->phys_map = __efi_memmap_alloc_early(data->size);
+	}
 
-	return __efi_memmap_alloc_early(size);
+	if (!data->phys_map)
+		return -ENOMEM;
+	return 0;
 }
 
 /**
@@ -64,8 +80,7 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
  *
  * Returns zero on success, a negative error code on failure.
  */
-static int __init
-__efi_memmap_init(struct efi_memory_map_data *data)
+static int __init __efi_memmap_init(struct efi_memory_map_data *data)
 {
 	struct efi_memory_map map;
 	phys_addr_t phys_map;
@@ -184,8 +199,7 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
 
 /**
  * efi_memmap_install - Install a new EFI memory map in efi.memmap
- * @addr: Physical address of the memory map
- * @nr_map: Number of entries in the memory map
+ * @ctx: map allocation parameters (address, size, flags)
  *
  * Unlike efi_memmap_init_*(), this function does not allow the caller
  * to switch from early to late mappings. It simply uses the existing
@@ -193,20 +207,11 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size)
  *
  * Returns zero on success, a negative error code on failure.
  */
-int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
+int __init efi_memmap_install(struct efi_memory_map_data *data)
 {
-	struct efi_memory_map_data data;
-	unsigned long flags;
-
 	efi_memmap_unmap();
 
-	data.phys_map = addr;
-	data.size = efi.memmap.desc_size * nr_map;
-	data.desc_version = efi.memmap.desc_version;
-	data.desc_size = efi.memmap.desc_size;
-	data.flags = efi.memmap.flags & EFI_MEMMAP_LATE;
-
-	return __efi_memmap_init(&data);
+	return __efi_memmap_init(data);
 }
 
 /**
diff --git a/include/linux/efi.h b/include/linux/efi.h
index e7e6c64469a7..416eac01b1a1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -778,8 +778,8 @@ typedef struct {
 
 /*
  * Architecture independent structure for describing a memory map for the
- * benefit of efi_memmap_init_early(), saving us the need to pass four
- * parameters.
+ * benefit of efi_memmap_init_early(), and for passing context between
+ * efi_memmap_alloc() and efi_memmap_install().
  */
 struct efi_memory_map_data {
 	phys_addr_t phys_map;
@@ -797,6 +797,8 @@ struct efi_memory_map {
 	unsigned long desc_version;
 	unsigned long desc_size;
 #define EFI_MEMMAP_LATE (1UL << 0)
+#define EFI_MEMMAP_MEMBLOCK (1UL << 1)
+#define EFI_MEMMAP_SLAB (1UL << 2)
 	unsigned long flags;
 };
 
@@ -1058,11 +1060,12 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
+extern int __init efi_memmap_alloc(unsigned int num_entries,
+				   struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
-extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map);
+extern int __init efi_memmap_install(struct efi_memory_map_data *data);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 					 struct range *range);
 extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-07  0:40 [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
  2020-01-07  0:40 ` [PATCH v4 1/4] efi: Add a flags parameter to efi_memory_map Dan Williams
  2020-01-07  0:40 ` [PATCH v4 2/4] efi: Add tracking for dynamically allocated memmaps Dan Williams
@ 2020-01-07  0:40 ` Dan Williams
  2020-01-07  3:58   ` Dave Young
  2020-01-07  0:40 ` [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
  3 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2020-01-07  0:40 UTC (permalink / raw)
  To: mingo; +Cc: Taku Izumi, Ard Biesheuvel, linux-efi, x86, linux-kernel, kexec

With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
updated and replaced multiple times. When that happens a previous
dynamically allocated efi memory map can be garbage collected. Use the
new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
allocated memory map is being replaced.

Debug statements in efi_memmap_free() reveal:

 efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
 efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
 efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2

...a savings of 7968 bytes on a qemu boot with 2 entries specified to
efi_fake_mem=.

Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 04dfa56b994b..bffa320d2f9a 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
+static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+{
+	if (flags & EFI_MEMMAP_MEMBLOCK) {
+		if (slab_is_available())
+			memblock_free_late(phys, size);
+		else
+			memblock_free(phys, size);
+	} else if (flags & EFI_MEMMAP_SLAB) {
+		struct page *p = pfn_to_page(PHYS_PFN(phys));
+		unsigned int order = get_order(size);
+
+		free_pages((unsigned long) page_address(p), order);
+	}
+}
+
+static void __init efi_memmap_free(void)
+{
+	__efi_memmap_free(efi.memmap.phys_map,
+			efi.memmap.desc_size * efi.memmap.nr_map,
+			efi.memmap.flags);
+}
+
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
@@ -100,6 +122,8 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
 		return -ENOMEM;
 	}
 
+	efi_memmap_free();
+
 	map.phys_map = data->phys_map;
 	map.nr_map = data->size / data->desc_size;
 	map.map_end = map.map + data->size;


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07  0:40 [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
                   ` (2 preceding siblings ...)
  2020-01-07  0:40 ` [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks Dan Williams
@ 2020-01-07  0:40 ` Dan Williams
  2020-01-07  4:04   ` Dave Young
  3 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2020-01-07  0:40 UTC (permalink / raw)
  To: mingo
  Cc: Dave Young, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-efi, x86, linux-kernel,
	kexec

Dave noticed that when specifying multiple efi_fake_mem= entries only
the last entry was successfully being reflected in the efi memory map.
This is due to the fact that the efi_memmap_insert() is being called
multiple times, but on successive invocations the insertion should be
applied to the last new memmap rather than the original map at
efi_fake_memmap() entry.

Rework efi_fake_memmap() to install the new memory map after each
efi_fake_mem= entry is parsed.

This also fixes an issue in efi_fake_memmap() that caused it to litter
emtpy entries into the end of the efi memory map. An empty entry causes
efi_memmap_insert() to attempt more memmap splits / copies than
efi_memmap_split_count() accounted for when sizing the new map. When
that happens efi_memmap_insert() may overrun its allocation, and if you
are lucky will spill over to an unmapped page leading to crash
signature like the following rather than silent corruption:

    BUG: unable to handle page fault for address: ffffffffff281000
    [..]
    RIP: 0010:efi_memmap_insert+0x11d/0x191
    [..]
    Call Trace:
     ? bgrt_init+0xbe/0xbe
     ? efi_arch_mem_reserve+0x1cb/0x228
     ? acpi_parse_bgrt+0xa/0xd
     ? acpi_table_parse+0x86/0xb8
     ? acpi_boot_init+0x494/0x4e3
     ? acpi_parse_x2apic+0x87/0x87
     ? setup_acpi_sci+0xa2/0xa2
     ? setup_arch+0x8db/0x9e1
     ? start_kernel+0x6a/0x547
     ? secondary_startup_64+0xb6/0xc0

Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
services data to fix kexec breakage" is listed in Fixes: since it
introduces more occurrences where efi_memmap_insert() is invoked after
an efi_fake_mem= configuration has been parsed. Previously the side
effects of vestigial empty entries were benign, but with commit
af1648984828 that follow-on efi_memmap_insert() invocation triggers
efi_memmap_insert() overruns.

Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
Reported-by: Dave Young <dyoung@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Michael Weiser <michael@weiser.dinsnail.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/fake_mem.c |   31 ++++++++++++++++---------------
 drivers/firmware/efi/memmap.c   |    2 +-
 include/linux/efi.h             |    2 ++
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index a8d20568d532..6e0f34a38171 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -34,25 +34,16 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
 	return 0;
 }
 
-void __init efi_fake_memmap(void)
+static void __init efi_fake_range(struct efi_mem_range *efi_range)
 {
 	struct efi_memory_map_data data = { 0 };
 	int new_nr_map = efi.memmap.nr_map;
 	efi_memory_desc_t *md;
 	void *new_memmap;
-	int i;
-
-	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
-		return;
 
 	/* count up the number of EFI memory descriptor */
-	for (i = 0; i < nr_fake_mem; i++) {
-		for_each_efi_memory_desc(md) {
-			struct range *r = &efi_fake_mems[i].range;
-
-			new_nr_map += efi_memmap_split_count(md, r);
-		}
-	}
+	for_each_efi_memory_desc(md)
+		new_nr_map += efi_memmap_split_count(md, &efi_range->range);
 
 	/* allocate memory for new EFI memmap */
 	if (efi_memmap_alloc(new_nr_map, &data) != 0)
@@ -61,17 +52,27 @@ void __init efi_fake_memmap(void)
 	/* create new EFI memmap */
 	new_memmap = early_memremap(data.phys_map, data.size);
 	if (!new_memmap) {
-		memblock_free(data.phys_map, data.size);
+		__efi_memmap_free(data.phys_map, data.size, data.flags);
 		return;
 	}
 
-	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
+	efi_memmap_insert(&efi.memmap, new_memmap, efi_range);
 
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, data.size);
 
 	efi_memmap_install(&data);
+}
+
+void __init efi_fake_memmap(void)
+{
+	int i;
+
+	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+		return;
+
+	for (i = 0; i < nr_fake_mem; i++)
+		efi_fake_range(&efi_fake_mems[i]);
 
 	/* print new EFI memmap */
 	efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index bffa320d2f9a..1b6a4aa78a09 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -29,7 +29,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
 	return PFN_PHYS(page_to_pfn(p));
 }
 
-static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
+void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
 {
 	if (flags & EFI_MEMMAP_MEMBLOCK) {
 		if (slab_is_available())
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 416eac01b1a1..539e81c942dc 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1062,6 +1062,8 @@ extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
 extern int __init efi_memmap_alloc(unsigned int num_entries,
 				   struct efi_memory_map_data *data);
+extern void __efi_memmap_free(u64 phys, unsigned long size,
+			      unsigned long flags);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-07  0:40 ` [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks Dan Williams
@ 2020-01-07  3:58   ` Dave Young
  2020-01-07  4:24     ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Young @ 2020-01-07  3:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, Taku Izumi, Ard Biesheuvel, linux-efi, x86, linux-kernel, kexec

On 01/06/20 at 04:40pm, Dan Williams wrote:
> With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> updated and replaced multiple times. When that happens a previous
> dynamically allocated efi memory map can be garbage collected. Use the
> new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> allocated memory map is being replaced.
> 
> Debug statements in efi_memmap_free() reveal:
> 
>  efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
>  efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
>  efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2
> 
> ...a savings of 7968 bytes on a qemu boot with 2 entries specified to
> efi_fake_mem=.
> 
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index 04dfa56b994b..bffa320d2f9a 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
>  	return PFN_PHYS(page_to_pfn(p));
>  }
>  
> +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> +{
> +	if (flags & EFI_MEMMAP_MEMBLOCK) {
> +		if (slab_is_available())
> +			memblock_free_late(phys, size);
> +		else
> +			memblock_free(phys, size);
> +	} else if (flags & EFI_MEMMAP_SLAB) {
> +		struct page *p = pfn_to_page(PHYS_PFN(phys));
> +		unsigned int order = get_order(size);
> +
> +		free_pages((unsigned long) page_address(p), order);
> +	}
> +}
> +
> +static void __init efi_memmap_free(void)
> +{
> +	__efi_memmap_free(efi.memmap.phys_map,
> +			efi.memmap.desc_size * efi.memmap.nr_map,
> +			efi.memmap.flags);
> +}
> +
>  /**
>   * efi_memmap_alloc - Allocate memory for the EFI memory map
>   * @num_entries: Number of entries in the allocated map.
> @@ -100,6 +122,8 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
>  		return -ENOMEM;
>  	}
>  
> +	efi_memmap_free();
> +

This seems still not safe,  see below function:
arch/x86/platform/efi/efi.c:
static void __init efi_clean_memmap(void)
It use same memmap for both old and new, and filter out those invalid
ranges in place, if the memory is freed then ..

>  	map.phys_map = data->phys_map;
>  	map.nr_map = data->size / data->desc_size;
>  	map.map_end = map.map + data->size;
> 

Thanks
Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07  0:40 ` [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
@ 2020-01-07  4:04   ` Dave Young
  2020-01-07  4:16     ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Young @ 2020-01-07  4:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: mingo, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-efi, x86, linux-kernel,
	kexec

On 01/06/20 at 04:40pm, Dan Williams wrote:
> Dave noticed that when specifying multiple efi_fake_mem= entries only
> the last entry was successfully being reflected in the efi memory map.
> This is due to the fact that the efi_memmap_insert() is being called
> multiple times, but on successive invocations the insertion should be
> applied to the last new memmap rather than the original map at
> efi_fake_memmap() entry.
> 
> Rework efi_fake_memmap() to install the new memory map after each
> efi_fake_mem= entry is parsed.
> 
> This also fixes an issue in efi_fake_memmap() that caused it to litter
> emtpy entries into the end of the efi memory map. An empty entry causes
> efi_memmap_insert() to attempt more memmap splits / copies than
> efi_memmap_split_count() accounted for when sizing the new map. When
> that happens efi_memmap_insert() may overrun its allocation, and if you
> are lucky will spill over to an unmapped page leading to crash
> signature like the following rather than silent corruption:
> 
>     BUG: unable to handle page fault for address: ffffffffff281000
>     [..]
>     RIP: 0010:efi_memmap_insert+0x11d/0x191
>     [..]
>     Call Trace:
>      ? bgrt_init+0xbe/0xbe
>      ? efi_arch_mem_reserve+0x1cb/0x228
>      ? acpi_parse_bgrt+0xa/0xd
>      ? acpi_table_parse+0x86/0xb8
>      ? acpi_boot_init+0x494/0x4e3
>      ? acpi_parse_x2apic+0x87/0x87
>      ? setup_acpi_sci+0xa2/0xa2
>      ? setup_arch+0x8db/0x9e1
>      ? start_kernel+0x6a/0x547
>      ? secondary_startup_64+0xb6/0xc0
> 
> Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> services data to fix kexec breakage" is listed in Fixes: since it
> introduces more occurrences where efi_memmap_insert() is invoked after
> an efi_fake_mem= configuration has been parsed. Previously the side
> effects of vestigial empty entries were benign, but with commit
> af1648984828 that follow-on efi_memmap_insert() invocation triggers
> efi_memmap_insert() overruns.
> 
> Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")

A nitpick for the Fixes flags, as I replied in the thread below:
https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7

I reproduced two other panics without the patches applied, so this issue
is not caused by either of the commits, maybe just drop the Fixes.

> Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
> Reported-by: Dave Young <dyoung@redhat.com>
> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Cc: Michael Weiser <michael@weiser.dinsnail.net>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/firmware/efi/fake_mem.c |   31 ++++++++++++++++---------------
>  drivers/firmware/efi/memmap.c   |    2 +-
>  include/linux/efi.h             |    2 ++
>  3 files changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index a8d20568d532..6e0f34a38171 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -34,25 +34,16 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
>  	return 0;
>  }
>  
> -void __init efi_fake_memmap(void)
> +static void __init efi_fake_range(struct efi_mem_range *efi_range)
>  {
>  	struct efi_memory_map_data data = { 0 };
>  	int new_nr_map = efi.memmap.nr_map;
>  	efi_memory_desc_t *md;
>  	void *new_memmap;
> -	int i;
> -
> -	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> -		return;
>  
>  	/* count up the number of EFI memory descriptor */
> -	for (i = 0; i < nr_fake_mem; i++) {
> -		for_each_efi_memory_desc(md) {
> -			struct range *r = &efi_fake_mems[i].range;
> -
> -			new_nr_map += efi_memmap_split_count(md, r);
> -		}
> -	}
> +	for_each_efi_memory_desc(md)
> +		new_nr_map += efi_memmap_split_count(md, &efi_range->range);

For this part, although I still have some concerns, but since I'm not
100% clear about it, maybe just leave it as you do, and see if it is
good to Ard.

>  
>  	/* allocate memory for new EFI memmap */
>  	if (efi_memmap_alloc(new_nr_map, &data) != 0)
> @@ -61,17 +52,27 @@ void __init efi_fake_memmap(void)
>  	/* create new EFI memmap */
>  	new_memmap = early_memremap(data.phys_map, data.size);
>  	if (!new_memmap) {
> -		memblock_free(data.phys_map, data.size);
> +		__efi_memmap_free(data.phys_map, data.size, data.flags);
>  		return;
>  	}
>  
> -	for (i = 0; i < nr_fake_mem; i++)
> -		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
> +	efi_memmap_insert(&efi.memmap, new_memmap, efi_range);
>  
>  	/* swap into new EFI memmap */
>  	early_memunmap(new_memmap, data.size);
>  
>  	efi_memmap_install(&data);
> +}
> +
> +void __init efi_fake_memmap(void)
> +{
> +	int i;
> +
> +	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> +		return;
> +
> +	for (i = 0; i < nr_fake_mem; i++)
> +		efi_fake_range(&efi_fake_mems[i]);
>  
>  	/* print new EFI memmap */
>  	efi_print_memmap();
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index bffa320d2f9a..1b6a4aa78a09 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -29,7 +29,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
>  	return PFN_PHYS(page_to_pfn(p));
>  }
>  
> -static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> +void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
>  {
>  	if (flags & EFI_MEMMAP_MEMBLOCK) {
>  		if (slab_is_available())
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 416eac01b1a1..539e81c942dc 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1062,6 +1062,8 @@ extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
>  
>  extern int __init efi_memmap_alloc(unsigned int num_entries,
>  				   struct efi_memory_map_data *data);
> +extern void __efi_memmap_free(u64 phys, unsigned long size,
> +			      unsigned long flags);
>  extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
>  extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
>  extern void __init efi_memmap_unmap(void);
> 

Thanks
Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07  4:04   ` Dave Young
@ 2020-01-07  4:16     ` Dan Williams
  2020-01-07  5:19       ` Dave Young
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2020-01-07  4:16 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 01/06/20 at 04:40pm, Dan Williams wrote:
> > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > the last entry was successfully being reflected in the efi memory map.
> > This is due to the fact that the efi_memmap_insert() is being called
> > multiple times, but on successive invocations the insertion should be
> > applied to the last new memmap rather than the original map at
> > efi_fake_memmap() entry.
> >
> > Rework efi_fake_memmap() to install the new memory map after each
> > efi_fake_mem= entry is parsed.
> >
> > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > emtpy entries into the end of the efi memory map. An empty entry causes
> > efi_memmap_insert() to attempt more memmap splits / copies than
> > efi_memmap_split_count() accounted for when sizing the new map. When
> > that happens efi_memmap_insert() may overrun its allocation, and if you
> > are lucky will spill over to an unmapped page leading to crash
> > signature like the following rather than silent corruption:
> >
> >     BUG: unable to handle page fault for address: ffffffffff281000
> >     [..]
> >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> >     [..]
> >     Call Trace:
> >      ? bgrt_init+0xbe/0xbe
> >      ? efi_arch_mem_reserve+0x1cb/0x228
> >      ? acpi_parse_bgrt+0xa/0xd
> >      ? acpi_table_parse+0x86/0xb8
> >      ? acpi_boot_init+0x494/0x4e3
> >      ? acpi_parse_x2apic+0x87/0x87
> >      ? setup_acpi_sci+0xa2/0xa2
> >      ? setup_arch+0x8db/0x9e1
> >      ? start_kernel+0x6a/0x547
> >      ? secondary_startup_64+0xb6/0xc0
> >
> > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > services data to fix kexec breakage" is listed in Fixes: since it
> > introduces more occurrences where efi_memmap_insert() is invoked after
> > an efi_fake_mem= configuration has been parsed. Previously the side
> > effects of vestigial empty entries were benign, but with commit
> > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > efi_memmap_insert() overruns.
> >
> > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
>
> A nitpick for the Fixes flags, as I replied in the thread below:
> https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
>
> I reproduced two other panics without the patches applied, so this issue
> is not caused by either of the commits, maybe just drop the Fixes.

Just the "Fixes: af1648984828", right? No objection from me. I'll let
Ingo say if he needs a resend for that.

The "Fixes: 0f96a99dab36" is valid because the original implementation
failed to handle the multiple argument case from the beginning.

>
> > Link: https://lore.kernel.org/r/20191231014630.GA24942@dhcp-128-65.nay.redhat.com
> > Reported-by: Dave Young <dyoung@redhat.com>
> > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > Cc: Michael Weiser <michael@weiser.dinsnail.net>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/firmware/efi/fake_mem.c |   31 ++++++++++++++++---------------
> >  drivers/firmware/efi/memmap.c   |    2 +-
> >  include/linux/efi.h             |    2 ++
> >  3 files changed, 19 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> > index a8d20568d532..6e0f34a38171 100644
> > --- a/drivers/firmware/efi/fake_mem.c
> > +++ b/drivers/firmware/efi/fake_mem.c
> > @@ -34,25 +34,16 @@ static int __init cmp_fake_mem(const void *x1, const void *x2)
> >       return 0;
> >  }
> >
> > -void __init efi_fake_memmap(void)
> > +static void __init efi_fake_range(struct efi_mem_range *efi_range)
> >  {
> >       struct efi_memory_map_data data = { 0 };
> >       int new_nr_map = efi.memmap.nr_map;
> >       efi_memory_desc_t *md;
> >       void *new_memmap;
> > -     int i;
> > -
> > -     if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> > -             return;
> >
> >       /* count up the number of EFI memory descriptor */
> > -     for (i = 0; i < nr_fake_mem; i++) {
> > -             for_each_efi_memory_desc(md) {
> > -                     struct range *r = &efi_fake_mems[i].range;
> > -
> > -                     new_nr_map += efi_memmap_split_count(md, r);
> > -             }
> > -     }
> > +     for_each_efi_memory_desc(md)
> > +             new_nr_map += efi_memmap_split_count(md, &efi_range->range);
>
> For this part, although I still have some concerns, but since I'm not
> 100% clear about it, maybe just leave it as you do, and see if it is
> good to Ard.

Absent a specific failure case I didn't see anything to change here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-07  3:58   ` Dave Young
@ 2020-01-07  4:24     ` Dan Williams
  2020-01-07  5:18       ` Dave Young
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2020-01-07  4:24 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Taku Izumi, Ard Biesheuvel, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Mon, Jan 6, 2020 at 7:58 PM Dave Young <dyoung@redhat.com> wrote:
>
> On 01/06/20 at 04:40pm, Dan Williams wrote:
> > With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> > updated and replaced multiple times. When that happens a previous
> > dynamically allocated efi memory map can be garbage collected. Use the
> > new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> > allocated memory map is being replaced.
> >
> > Debug statements in efi_memmap_free() reveal:
> >
> >  efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
> >  efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
> >  efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2
> >
> > ...a savings of 7968 bytes on a qemu boot with 2 entries specified to
> > efi_fake_mem=.
> >
> > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> > index 04dfa56b994b..bffa320d2f9a 100644
> > --- a/drivers/firmware/efi/memmap.c
> > +++ b/drivers/firmware/efi/memmap.c
> > @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
> >       return PFN_PHYS(page_to_pfn(p));
> >  }
> >
> > +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> > +{
> > +     if (flags & EFI_MEMMAP_MEMBLOCK) {
> > +             if (slab_is_available())
> > +                     memblock_free_late(phys, size);
> > +             else
> > +                     memblock_free(phys, size);
> > +     } else if (flags & EFI_MEMMAP_SLAB) {
> > +             struct page *p = pfn_to_page(PHYS_PFN(phys));
> > +             unsigned int order = get_order(size);
> > +
> > +             free_pages((unsigned long) page_address(p), order);
> > +     }
> > +}
> > +
> > +static void __init efi_memmap_free(void)
> > +{
> > +     __efi_memmap_free(efi.memmap.phys_map,
> > +                     efi.memmap.desc_size * efi.memmap.nr_map,
> > +                     efi.memmap.flags);
> > +}
> > +
> >  /**
> >   * efi_memmap_alloc - Allocate memory for the EFI memory map
> >   * @num_entries: Number of entries in the allocated map.
> > @@ -100,6 +122,8 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
> >               return -ENOMEM;
> >       }
> >
> > +     efi_memmap_free();
> > +
>
> This seems still not safe,  see below function:
> arch/x86/platform/efi/efi.c:
> static void __init efi_clean_memmap(void)
> It use same memmap for both old and new, and filter out those invalid
> ranges in place, if the memory is freed then ..

In the efi_clean_memmap() case flags are 0, so efi_memmap_free() is a nop.

Would you feel better with an explicit?

WARN_ON(efi.memmap.phys_map == data->phys_map && (data->flags &
(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK))

...not sure it's worth it.

>
> >       map.phys_map = data->phys_map;
> >       map.nr_map = data->size / data->desc_size;
> >       map.map_end = map.map + data->size;
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-07  4:24     ` Dan Williams
@ 2020-01-07  5:18       ` Dave Young
  2020-01-07 17:49         ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Young @ 2020-01-07  5:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Taku Izumi, Ard Biesheuvel, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On 01/06/20 at 08:24pm, Dan Williams wrote:
> On Mon, Jan 6, 2020 at 7:58 PM Dave Young <dyoung@redhat.com> wrote:
> >
> > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> > > updated and replaced multiple times. When that happens a previous
> > > dynamically allocated efi memory map can be garbage collected. Use the
> > > new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> > > allocated memory map is being replaced.
> > >
> > > Debug statements in efi_memmap_free() reveal:
> > >
> > >  efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
> > >  efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
> > >  efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2
> > >
> > > ...a savings of 7968 bytes on a qemu boot with 2 entries specified to
> > > efi_fake_mem=.
> > >
> > > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> > > index 04dfa56b994b..bffa320d2f9a 100644
> > > --- a/drivers/firmware/efi/memmap.c
> > > +++ b/drivers/firmware/efi/memmap.c
> > > @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
> > >       return PFN_PHYS(page_to_pfn(p));
> > >  }
> > >
> > > +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> > > +{
> > > +     if (flags & EFI_MEMMAP_MEMBLOCK) {
> > > +             if (slab_is_available())
> > > +                     memblock_free_late(phys, size);
> > > +             else
> > > +                     memblock_free(phys, size);
> > > +     } else if (flags & EFI_MEMMAP_SLAB) {
> > > +             struct page *p = pfn_to_page(PHYS_PFN(phys));
> > > +             unsigned int order = get_order(size);
> > > +
> > > +             free_pages((unsigned long) page_address(p), order);
> > > +     }
> > > +}
> > > +
> > > +static void __init efi_memmap_free(void)
> > > +{
> > > +     __efi_memmap_free(efi.memmap.phys_map,
> > > +                     efi.memmap.desc_size * efi.memmap.nr_map,
> > > +                     efi.memmap.flags);
> > > +}
> > > +
> > >  /**
> > >   * efi_memmap_alloc - Allocate memory for the EFI memory map
> > >   * @num_entries: Number of entries in the allocated map.
> > > @@ -100,6 +122,8 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
> > >               return -ENOMEM;
> > >       }
> > >
> > > +     efi_memmap_free();
> > > +
> >
> > This seems still not safe,  see below function:
> > arch/x86/platform/efi/efi.c:
> > static void __init efi_clean_memmap(void)
> > It use same memmap for both old and new, and filter out those invalid
> > ranges in place, if the memory is freed then ..
> 
> In the efi_clean_memmap() case flags are 0, so efi_memmap_free() is a nop.
> 
> Would you feel better with an explicit?
> 
> WARN_ON(efi.memmap.phys_map == data->phys_map && (data->flags &
> (EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK))
> 
> ...not sure it's worth it.

Ah, yes, sorry I did not see the flags, although it is not very obvious.
Maybe add some code comment for efi_mem_alloc and efi_mem_init.

Let's defer the suggestion to Ard.

Thanks
Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07  4:16     ` Dan Williams
@ 2020-01-07  5:19       ` Dave Young
  2020-01-07 17:51         ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Young @ 2020-01-07  5:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ingo Molnar, Taku Izumi, Michael Weiser, Ard Biesheuvel,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On 01/06/20 at 08:16pm, Dan Williams wrote:
> On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@redhat.com> wrote:
> >
> > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > the last entry was successfully being reflected in the efi memory map.
> > > This is due to the fact that the efi_memmap_insert() is being called
> > > multiple times, but on successive invocations the insertion should be
> > > applied to the last new memmap rather than the original map at
> > > efi_fake_memmap() entry.
> > >
> > > Rework efi_fake_memmap() to install the new memory map after each
> > > efi_fake_mem= entry is parsed.
> > >
> > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > emtpy entries into the end of the efi memory map. An empty entry causes
> > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > efi_memmap_split_count() accounted for when sizing the new map. When
> > > that happens efi_memmap_insert() may overrun its allocation, and if you
> > > are lucky will spill over to an unmapped page leading to crash
> > > signature like the following rather than silent corruption:
> > >
> > >     BUG: unable to handle page fault for address: ffffffffff281000
> > >     [..]
> > >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> > >     [..]
> > >     Call Trace:
> > >      ? bgrt_init+0xbe/0xbe
> > >      ? efi_arch_mem_reserve+0x1cb/0x228
> > >      ? acpi_parse_bgrt+0xa/0xd
> > >      ? acpi_table_parse+0x86/0xb8
> > >      ? acpi_boot_init+0x494/0x4e3
> > >      ? acpi_parse_x2apic+0x87/0x87
> > >      ? setup_acpi_sci+0xa2/0xa2
> > >      ? setup_arch+0x8db/0x9e1
> > >      ? start_kernel+0x6a/0x547
> > >      ? secondary_startup_64+0xb6/0xc0
> > >
> > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > services data to fix kexec breakage" is listed in Fixes: since it
> > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > effects of vestigial empty entries were benign, but with commit
> > > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > > efi_memmap_insert() overruns.
> > >
> > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> >
> > A nitpick for the Fixes flags, as I replied in the thread below:
> > https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
> >
> > I reproduced two other panics without the patches applied, so this issue
> > is not caused by either of the commits, maybe just drop the Fixes.
> 
> Just the "Fixes: af1648984828", right? No objection from me. I'll let
> Ingo say if he needs a resend for that.
> 
> The "Fixes: 0f96a99dab36" is valid because the original implementation
> failed to handle the multiple argument case from the beginning.

Agreed, thanks!

Thanks
Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks
  2020-01-07  5:18       ` Dave Young
@ 2020-01-07 17:49         ` Ard Biesheuvel
  0 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-01-07 17:49 UTC (permalink / raw)
  To: Dave Young
  Cc: Dan Williams, Ingo Molnar, Taku Izumi, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Tue, 7 Jan 2020 at 06:18, Dave Young <dyoung@redhat.com> wrote:
>
> On 01/06/20 at 08:24pm, Dan Williams wrote:
> > On Mon, Jan 6, 2020 at 7:58 PM Dave Young <dyoung@redhat.com> wrote:
> > >
> > > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > > With efi_fake_memmap() and efi_arch_mem_reserve() the efi table may be
> > > > updated and replaced multiple times. When that happens a previous
> > > > dynamically allocated efi memory map can be garbage collected. Use the
> > > > new EFI_MEMMAP_{SLAB,MEMBLOCK} flags to detect when a dynamically
> > > > allocated memory map is being replaced.
> > > >
> > > > Debug statements in efi_memmap_free() reveal:
> > > >
> > > >  efi: __efi_memmap_free:37: phys: 0x23ffdd580 size: 2688 flags: 0x2
> > > >  efi: __efi_memmap_free:37: phys: 0x9db00 size: 2640 flags: 0x2
> > > >  efi: __efi_memmap_free:37: phys: 0x9e580 size: 2640 flags: 0x2
> > > >
> > > > ...a savings of 7968 bytes on a qemu boot with 2 entries specified to
> > > > efi_fake_mem=.
> > > >
> > > > Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
> > > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > ---
> > > >  drivers/firmware/efi/memmap.c |   24 ++++++++++++++++++++++++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> > > > index 04dfa56b994b..bffa320d2f9a 100644
> > > > --- a/drivers/firmware/efi/memmap.c
> > > > +++ b/drivers/firmware/efi/memmap.c
> > > > @@ -29,6 +29,28 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
> > > >       return PFN_PHYS(page_to_pfn(p));
> > > >  }
> > > >
> > > > +static void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags)
> > > > +{
> > > > +     if (flags & EFI_MEMMAP_MEMBLOCK) {
> > > > +             if (slab_is_available())
> > > > +                     memblock_free_late(phys, size);
> > > > +             else
> > > > +                     memblock_free(phys, size);
> > > > +     } else if (flags & EFI_MEMMAP_SLAB) {
> > > > +             struct page *p = pfn_to_page(PHYS_PFN(phys));
> > > > +             unsigned int order = get_order(size);
> > > > +
> > > > +             free_pages((unsigned long) page_address(p), order);
> > > > +     }
> > > > +}
> > > > +
> > > > +static void __init efi_memmap_free(void)
> > > > +{
> > > > +     __efi_memmap_free(efi.memmap.phys_map,
> > > > +                     efi.memmap.desc_size * efi.memmap.nr_map,
> > > > +                     efi.memmap.flags);
> > > > +}
> > > > +
> > > >  /**
> > > >   * efi_memmap_alloc - Allocate memory for the EFI memory map
> > > >   * @num_entries: Number of entries in the allocated map.
> > > > @@ -100,6 +122,8 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data)
> > > >               return -ENOMEM;
> > > >       }
> > > >
> > > > +     efi_memmap_free();
> > > > +
> > >
> > > This seems still not safe,  see below function:
> > > arch/x86/platform/efi/efi.c:
> > > static void __init efi_clean_memmap(void)
> > > It use same memmap for both old and new, and filter out those invalid
> > > ranges in place, if the memory is freed then ..
> >
> > In the efi_clean_memmap() case flags are 0, so efi_memmap_free() is a nop.
> >
> > Would you feel better with an explicit?
> >
> > WARN_ON(efi.memmap.phys_map == data->phys_map && (data->flags &
> > (EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK))
> >
> > ...not sure it's worth it.
>
> Ah, yes, sorry I did not see the flags, although it is not very obvious.
> Maybe add some code comment for efi_mem_alloc and efi_mem_init.
>
> Let's defer the suggestion to Ard.
>

A one line comment to remind our future selves of this discussion
would probably be helpful, but beyond that, I don't think we need to
do much here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07  5:19       ` Dave Young
@ 2020-01-07 17:51         ` Ard Biesheuvel
  2020-01-08 21:53           ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2020-01-07 17:51 UTC (permalink / raw)
  To: Dave Young
  Cc: Dan Williams, Ingo Molnar, Taku Izumi, Michael Weiser,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Tue, 7 Jan 2020 at 06:19, Dave Young <dyoung@redhat.com> wrote:
>
> On 01/06/20 at 08:16pm, Dan Williams wrote:
> > On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@redhat.com> wrote:
> > >
> > > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > > the last entry was successfully being reflected in the efi memory map.
> > > > This is due to the fact that the efi_memmap_insert() is being called
> > > > multiple times, but on successive invocations the insertion should be
> > > > applied to the last new memmap rather than the original map at
> > > > efi_fake_memmap() entry.
> > > >
> > > > Rework efi_fake_memmap() to install the new memory map after each
> > > > efi_fake_mem= entry is parsed.
> > > >
> > > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > > emtpy entries into the end of the efi memory map. An empty entry causes
> > > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > > efi_memmap_split_count() accounted for when sizing the new map. When
> > > > that happens efi_memmap_insert() may overrun its allocation, and if you
> > > > are lucky will spill over to an unmapped page leading to crash
> > > > signature like the following rather than silent corruption:
> > > >
> > > >     BUG: unable to handle page fault for address: ffffffffff281000
> > > >     [..]
> > > >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> > > >     [..]
> > > >     Call Trace:
> > > >      ? bgrt_init+0xbe/0xbe
> > > >      ? efi_arch_mem_reserve+0x1cb/0x228
> > > >      ? acpi_parse_bgrt+0xa/0xd
> > > >      ? acpi_table_parse+0x86/0xb8
> > > >      ? acpi_boot_init+0x494/0x4e3
> > > >      ? acpi_parse_x2apic+0x87/0x87
> > > >      ? setup_acpi_sci+0xa2/0xa2
> > > >      ? setup_arch+0x8db/0x9e1
> > > >      ? start_kernel+0x6a/0x547
> > > >      ? secondary_startup_64+0xb6/0xc0
> > > >
> > > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > > services data to fix kexec breakage" is listed in Fixes: since it
> > > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > > effects of vestigial empty entries were benign, but with commit
> > > > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > > > efi_memmap_insert() overruns.
> > > >
> > > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > >
> > > A nitpick for the Fixes flags, as I replied in the thread below:
> > > https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
> > >
> > > I reproduced two other panics without the patches applied, so this issue
> > > is not caused by either of the commits, maybe just drop the Fixes.
> >
> > Just the "Fixes: af1648984828", right? No objection from me. I'll let
> > Ingo say if he needs a resend for that.
> >
> > The "Fixes: 0f96a99dab36" is valid because the original implementation
> > failed to handle the multiple argument case from the beginning.
>
> Agreed, thanks!
>

I'll queue this but without the fixes tags. The -stable maintainers
are far too trigger happy IMHO, and this really needs careful review
before being backported. efi_fake_mem is a debug feature anyway, so I
don't see an urgent need to get this fixed retroactively in older
kernels.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-07 17:51         ` Ard Biesheuvel
@ 2020-01-08 21:53           ` Dan Williams
  2020-01-09  9:35             ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2020-01-08 21:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Dave Young, Ingo Molnar, Taku Izumi, Michael Weiser,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Tue, Jan 7, 2020 at 9:52 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Tue, 7 Jan 2020 at 06:19, Dave Young <dyoung@redhat.com> wrote:
> >
> > On 01/06/20 at 08:16pm, Dan Williams wrote:
> > > On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@redhat.com> wrote:
> > > >
> > > > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > > > the last entry was successfully being reflected in the efi memory map.
> > > > > This is due to the fact that the efi_memmap_insert() is being called
> > > > > multiple times, but on successive invocations the insertion should be
> > > > > applied to the last new memmap rather than the original map at
> > > > > efi_fake_memmap() entry.
> > > > >
> > > > > Rework efi_fake_memmap() to install the new memory map after each
> > > > > efi_fake_mem= entry is parsed.
> > > > >
> > > > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > > > emtpy entries into the end of the efi memory map. An empty entry causes
> > > > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > > > efi_memmap_split_count() accounted for when sizing the new map. When
> > > > > that happens efi_memmap_insert() may overrun its allocation, and if you
> > > > > are lucky will spill over to an unmapped page leading to crash
> > > > > signature like the following rather than silent corruption:
> > > > >
> > > > >     BUG: unable to handle page fault for address: ffffffffff281000
> > > > >     [..]
> > > > >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> > > > >     [..]
> > > > >     Call Trace:
> > > > >      ? bgrt_init+0xbe/0xbe
> > > > >      ? efi_arch_mem_reserve+0x1cb/0x228
> > > > >      ? acpi_parse_bgrt+0xa/0xd
> > > > >      ? acpi_table_parse+0x86/0xb8
> > > > >      ? acpi_boot_init+0x494/0x4e3
> > > > >      ? acpi_parse_x2apic+0x87/0x87
> > > > >      ? setup_acpi_sci+0xa2/0xa2
> > > > >      ? setup_arch+0x8db/0x9e1
> > > > >      ? start_kernel+0x6a/0x547
> > > > >      ? secondary_startup_64+0xb6/0xc0
> > > > >
> > > > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > > > services data to fix kexec breakage" is listed in Fixes: since it
> > > > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > > > effects of vestigial empty entries were benign, but with commit
> > > > > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > > > > efi_memmap_insert() overruns.
> > > > >
> > > > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > > >
> > > > A nitpick for the Fixes flags, as I replied in the thread below:
> > > > https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
> > > >
> > > > I reproduced two other panics without the patches applied, so this issue
> > > > is not caused by either of the commits, maybe just drop the Fixes.
> > >
> > > Just the "Fixes: af1648984828", right? No objection from me. I'll let
> > > Ingo say if he needs a resend for that.
> > >
> > > The "Fixes: 0f96a99dab36" is valid because the original implementation
> > > failed to handle the multiple argument case from the beginning.
> >
> > Agreed, thanks!
> >
>
> I'll queue this but without the fixes tags. The -stable maintainers
> are far too trigger happy IMHO, and this really needs careful review
> before being backported. efi_fake_mem is a debug feature anyway, so I
> don't see an urgent need to get this fixed retroactively in older
> kernels.

I'm fine to drop the fixes tags.

However, I do want to point out my driving motive for digging in on
efi_fake_mem=nn@ss:0x40000, is that it is a better interface for
diverting memory ranges to device-dax than the current standard bearer
memmap=nn!ss. The main benefit is that the kernel only considers the
attribute when it is applied to EFI_CONVENTIONAL_MEMORY. This fixes a
long standing unsolvable issue of people picking busted memmap=nn!ss
settings that clobber platform memory ranges, or vector off into
nothing.

So yes, efi_fake_mem is a debug feature, but if the popularity of
memmap=nn!ss is any clue I expect efi_fake_mem=nn@ss:0x40000 will be a
useful tool going forward for late enabling, or repairing platform
"soft reservation" declarations.

I'll respin the series with those tags dropped and add the comment you
recommended about the cases when efi_memmap_free() is expected to be a
nop. Holler if there's anything else, but that's all I had on my list
to fix up.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-08 21:53           ` Dan Williams
@ 2020-01-09  9:35             ` Ard Biesheuvel
  2020-01-09 19:32               ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2020-01-09  9:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Young, Ingo Molnar, Taku Izumi, Michael Weiser,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Wed, 8 Jan 2020 at 22:53, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Tue, Jan 7, 2020 at 9:52 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >
> > On Tue, 7 Jan 2020 at 06:19, Dave Young <dyoung@redhat.com> wrote:
> > >
> > > On 01/06/20 at 08:16pm, Dan Williams wrote:
> > > > On Mon, Jan 6, 2020 at 8:04 PM Dave Young <dyoung@redhat.com> wrote:
> > > > >
> > > > > On 01/06/20 at 04:40pm, Dan Williams wrote:
> > > > > > Dave noticed that when specifying multiple efi_fake_mem= entries only
> > > > > > the last entry was successfully being reflected in the efi memory map.
> > > > > > This is due to the fact that the efi_memmap_insert() is being called
> > > > > > multiple times, but on successive invocations the insertion should be
> > > > > > applied to the last new memmap rather than the original map at
> > > > > > efi_fake_memmap() entry.
> > > > > >
> > > > > > Rework efi_fake_memmap() to install the new memory map after each
> > > > > > efi_fake_mem= entry is parsed.
> > > > > >
> > > > > > This also fixes an issue in efi_fake_memmap() that caused it to litter
> > > > > > emtpy entries into the end of the efi memory map. An empty entry causes
> > > > > > efi_memmap_insert() to attempt more memmap splits / copies than
> > > > > > efi_memmap_split_count() accounted for when sizing the new map. When
> > > > > > that happens efi_memmap_insert() may overrun its allocation, and if you
> > > > > > are lucky will spill over to an unmapped page leading to crash
> > > > > > signature like the following rather than silent corruption:
> > > > > >
> > > > > >     BUG: unable to handle page fault for address: ffffffffff281000
> > > > > >     [..]
> > > > > >     RIP: 0010:efi_memmap_insert+0x11d/0x191
> > > > > >     [..]
> > > > > >     Call Trace:
> > > > > >      ? bgrt_init+0xbe/0xbe
> > > > > >      ? efi_arch_mem_reserve+0x1cb/0x228
> > > > > >      ? acpi_parse_bgrt+0xa/0xd
> > > > > >      ? acpi_table_parse+0x86/0xb8
> > > > > >      ? acpi_boot_init+0x494/0x4e3
> > > > > >      ? acpi_parse_x2apic+0x87/0x87
> > > > > >      ? setup_acpi_sci+0xa2/0xa2
> > > > > >      ? setup_arch+0x8db/0x9e1
> > > > > >      ? start_kernel+0x6a/0x547
> > > > > >      ? secondary_startup_64+0xb6/0xc0
> > > > > >
> > > > > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot
> > > > > > services data to fix kexec breakage" is listed in Fixes: since it
> > > > > > introduces more occurrences where efi_memmap_insert() is invoked after
> > > > > > an efi_fake_mem= configuration has been parsed. Previously the side
> > > > > > effects of vestigial empty entries were benign, but with commit
> > > > > > af1648984828 that follow-on efi_memmap_insert() invocation triggers
> > > > > > efi_memmap_insert() overruns.
> > > > > >
> > > > > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option")
> > > > > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...")
> > > > >
> > > > > A nitpick for the Fixes flags, as I replied in the thread below:
> > > > > https://lore.kernel.org/linux-efi/CAPcyv4jLxqPaB22Ao9oV31Gm=b0+Phty+Uz33Snex4QchOUb0Q@mail.gmail.com/T/#m2bb2dd00f7715c9c19ccc48efef0fcd5fdb626e7
> > > > >
> > > > > I reproduced two other panics without the patches applied, so this issue
> > > > > is not caused by either of the commits, maybe just drop the Fixes.
> > > >
> > > > Just the "Fixes: af1648984828", right? No objection from me. I'll let
> > > > Ingo say if he needs a resend for that.
> > > >
> > > > The "Fixes: 0f96a99dab36" is valid because the original implementation
> > > > failed to handle the multiple argument case from the beginning.
> > >
> > > Agreed, thanks!
> > >
> >
> > I'll queue this but without the fixes tags. The -stable maintainers
> > are far too trigger happy IMHO, and this really needs careful review
> > before being backported. efi_fake_mem is a debug feature anyway, so I
> > don't see an urgent need to get this fixed retroactively in older
> > kernels.
>
> I'm fine to drop the fixes tags.
>
> However, I do want to point out my driving motive for digging in on
> efi_fake_mem=nn@ss:0x40000, is that it is a better interface for
> diverting memory ranges to device-dax than the current standard bearer
> memmap=nn!ss. The main benefit is that the kernel only considers the
> attribute when it is applied to EFI_CONVENTIONAL_MEMORY. This fixes a
> long standing unsolvable issue of people picking busted memmap=nn!ss
> settings that clobber platform memory ranges, or vector off into
> nothing.
>
> So yes, efi_fake_mem is a debug feature, but if the popularity of
> memmap=nn!ss is any clue I expect efi_fake_mem=nn@ss:0x40000 will be a
> useful tool going forward for late enabling, or repairing platform
> "soft reservation" declarations.
>

OK, good to know.

> I'll respin the series with those tags dropped and add the comment you
> recommended about the cases when efi_memmap_free() is expected to be a
> nop. Holler if there's anything else, but that's all I had on my list
> to fix up.

If it's just for the comment, I can just slap that on, as I already
queued the patches with the fixes tags dropped. Or respin, whichever
you prefer (efi/next branch is not stable anyway)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries
  2020-01-09  9:35             ` Ard Biesheuvel
@ 2020-01-09 19:32               ` Dan Williams
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2020-01-09 19:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Dave Young, Ingo Molnar, Taku Izumi, Michael Weiser,
	Thomas Gleixner, Ingo Molnar, linux-efi, X86 ML,
	Linux Kernel Mailing List, Kexec Mailing List

On Thu, Jan 9, 2020 at 1:36 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
[..]
> If it's just for the comment, I can just slap that on, as I already
> queued the patches with the fixes tags dropped.

That would be great. Thanks Ard!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-01-09 19:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07  0:40 [PATCH v4 0/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
2020-01-07  0:40 ` [PATCH v4 1/4] efi: Add a flags parameter to efi_memory_map Dan Williams
2020-01-07  0:40 ` [PATCH v4 2/4] efi: Add tracking for dynamically allocated memmaps Dan Williams
2020-01-07  0:40 ` [PATCH v4 3/4] efi: Fix efi_memmap_alloc() leaks Dan Williams
2020-01-07  3:58   ` Dave Young
2020-01-07  4:24     ` Dan Williams
2020-01-07  5:18       ` Dave Young
2020-01-07 17:49         ` Ard Biesheuvel
2020-01-07  0:40 ` [PATCH v4 4/4] efi: Fix handling of multiple efi_fake_mem= entries Dan Williams
2020-01-07  4:04   ` Dave Young
2020-01-07  4:16     ` Dan Williams
2020-01-07  5:19       ` Dave Young
2020-01-07 17:51         ` Ard Biesheuvel
2020-01-08 21:53           ` Dan Williams
2020-01-09  9:35             ` Ard Biesheuvel
2020-01-09 19:32               ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).