linux-efi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL 0/2] EFI urgent fixes
@ 2015-09-25 22:02 Matt Fleming
  2015-09-25 22:02 ` [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
  2015-09-25 22:02 ` [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions Matt Fleming
  0 siblings, 2 replies; 48+ messages in thread
From: Matt Fleming @ 2015-09-25 22:02 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: Matt Fleming, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel,
	Borislav Petkov, Catalin Marinas, Dave Young, James Bottomley,
	Lee, Chun-Yi, Leif Lindholm, Mark Rutland, Mark Salter,
	Matthew Garrett, Peter Jones, stable-u79uwXL29TY76Z2rM5mHXA,
	Will Deacon

From: Matt Fleming <matt.fleming-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Folks,

The patches in this pull request fix kernel crashes when booting Linux
on UEFI v2.5 machines with the Properties Table feature enabled.

Essentially, when this feature is enabled the firmware allocates
separate entries in the EFI memory map for the code and data sections
of PE/COFF images, whereas previously only one memory map entry would
have existed.

Because we've now got two entries that reference each other we *must*
map them into the kernel virtual address space with the same offsets
and in the same order as they appear in the EFI memory map. Failure to
do so causes the firmware to access unmapped/invalid addresses. 

These patches were intentionally kept as small as possible so that
they can be backported by distributions, aggressively.

The following changes since commit 1f93e4a96c9109378204c147b3eec0d0e8100fde:

  Linux 4.3-rc2 (2015-09-20 14:32:34 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git tags/efi-urgent

for you to fetch changes up to 1fa25e09ca2ce07f03bca93ad71800c312fd4951:

  arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions (2015-09-25 22:35:15 +0100)

----------------------------------------------------------------
 * arm64 bug fix for UEFI 2.5 firmware that has the Properties Table
   feature enabled. The fix avoids a kernel crash by removing the padding
   between runtime regions that we currently do in the kernel so we don't
   break the EFI's cross-region references - Ard Biesheuvel

 * Map EFI memory regions in-order on x86 so that we maintain the
   relative offset between regions and fix a crash when booting on
   UEFI 2.5 machines with the Properties Table feature enabled.

----------------------------------------------------------------
Ard Biesheuvel (1):
      arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions

Matt Fleming (1):
      x86/efi: Map EFI memmap entries in-order at runtime

 arch/arm64/kernel/efi.c                 |  3 +-
 arch/x86/platform/efi/efi.c             | 67 ++++++++++++++++++++++++-
 drivers/firmware/efi/libstub/arm-stub.c | 88 +++++++++++++++++++++++++++------
 3 files changed, 141 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-25 22:02 [GIT PULL 0/2] EFI urgent fixes Matt Fleming
@ 2015-09-25 22:02 ` Matt Fleming
       [not found]   ` <1443218539-7610-2-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  2015-09-25 22:02 ` [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions Matt Fleming
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Fleming @ 2015-09-25 22:02 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: Matt Fleming, linux-kernel, linux-efi, Lee, Chun-Yi,
	Borislav Petkov, Leif Lindholm, Peter Jones, James Bottomley,
	Matthew Garrett, Dave Young, stable, Ard Biesheuvel

From: Matt Fleming <matt.fleming@intel.com>

Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
signals that the firmware PE/COFF loader supports splitting code and
data sections of PE/COFF images into separate EFI memory map entries.
This allows the kernel to map those regions with strict memory
protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.

Unfortunately, an unwritten requirement of this new feature is that
the regions need to be mapped with the same offsets relative to each
other as observed in the EFI memory map. If this is not done crashes
like this may occur,

 [    0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
 [    0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
 [    0.007000] Call Trace:
 [    0.007000]  [<ffffffff8104c90e>] efi_call+0x7e/0x100
 [    0.007000]  [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
 [    0.007000]  [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
 [    0.007000]  [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
 [    0.007000]  [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
 [    0.007000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
 [    0.007000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef

Here 0xfffffffefe6086dd refers to an address the firmware expects to
be mapped but which the OS never claimed was mapped. The issue is that
included in these regions are relative addresses to other regions
which were emitted by the firmware toolchain before the "splitting" of
sections occurred at runtime.

Needless to say, we don't satisfy this unwritten requirement on x86_64
and instead map the EFI memory map entries in reverse order. The above
crash is almost certainly triggerable with any kernel newer than v3.13
because that's when we rewrote the EFI runtime region mapping code, in
commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
kernel versions before v3.13 things may work by pure luck depending on
the fragmentation of the kernel virtual address space at the time we
map the EFI regions.

Instead of mapping the EFI memory map entries in reverse order, where
entry N has a higher virtual address than entry N+1, map them in the
same order as they appear in the EFI memory map to preserve this
relative offset between regions.

This patch has been kept as small as possible with the intention that
it should be applied aggressively to stable and distribution kernels.
It is very much a bugfix rather than support for a new feature, since
when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
above to even boot - we have no way of asking the firmware not to
split the code/data regions.

In fact, this patch doesn't even make use of the more strict memory
protections available in UEFI v2.5. That will come later.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/platform/efi/efi.c | 67 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 1db84c0758b7..6a28ded74211 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -705,6 +705,70 @@ out:
 }
 
 /*
+ * Iterate the EFI memory map in reverse order because the regions
+ * will be mapped top-down. The end result is the same as if we had
+ * mapped things forward, but doesn't require us to change the
+ * existing implementation of efi_map_region().
+ */
+static inline void *efi_map_next_entry_reverse(void *entry)
+{
+	/* Initial call */
+	if (!entry)
+		return memmap.map_end - memmap.desc_size;
+
+	entry -= memmap.desc_size;
+	if (entry < memmap.map)
+		return NULL;
+
+	return entry;
+}
+
+/*
+ * efi_map_next_entry - Return the next EFI memory map descriptor
+ * @entry: Previous EFI memory map descriptor
+ *
+ * This is a helper function to iterate over the EFI memory map, which
+ * we do in different orders depending on the current configuration.
+ *
+ * To begin traversing the memory map @entry must be %NULL.
+ *
+ * Returns %NULL when we reach the end of the memory map.
+ */
+static void *efi_map_next_entry(void *entry)
+{
+	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
+		/*
+		 * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
+		 * config table feature requires us to map all entries
+		 * in the same order as they appear in the EFI memory
+		 * map. That is to say, entry N must have a lower
+		 * virtual address than entry N+1. This is because the
+		 * firmware toolchain leaves relative references in
+		 * the code/data sections, which are split and become
+		 * separate EFI memory regions. Mapping things
+		 * out-of-order leads to the firmware accessing
+		 * unmapped addresses.
+		 *
+		 * Since we need to map things this way whether or not
+		 * the kernel actually makes use of
+		 * EFI_PROPERTIES_TABLE, let's just switch to this
+		 * scheme by default for 64-bit.
+		 */
+		return efi_map_next_entry_reverse(entry);
+	}
+
+	/* Initial call */
+	if (!entry)
+		return memmap.map;
+
+	entry += memmap.desc_size;
+	if (entry >= memmap.map_end)
+		return NULL;
+
+	return entry;
+}
+
+/*
  * Map the efi memory ranges of the runtime services and update new_mmap with
  * virtual addresses.
  */
@@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
 	unsigned long left = 0;
 	efi_memory_desc_t *md;
 
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+	p = NULL;
+	while ((p = efi_map_next_entry(p))) {
 		md = p;
 		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
 #ifdef CONFIG_X86_64
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-25 22:02 [GIT PULL 0/2] EFI urgent fixes Matt Fleming
  2015-09-25 22:02 ` [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
@ 2015-09-25 22:02 ` Matt Fleming
       [not found]   ` <1443218539-7610-3-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Fleming @ 2015-09-25 22:02 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: Ard Biesheuvel, linux-kernel, linux-efi, Leif Lindholm,
	Catalin Marinas, Will Deacon, stable, Matt Fleming, Mark Rutland,
	Mark Salter

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

The new Properties Table feature introduced in UEFIv2.5 may split
memory regions that cover PE/COFF memory images into separate code
and data regions. Since these regions only differ in the type (runtime
code vs runtime data) and the permission bits, but not in the memory
type attributes (UC/WC/WT/WB), the spec does not require them to be
aligned to 64 KB.

Since the relative offset of PE/COFF .text and .data segments cannot
be changed on the fly, this means that we can no longer pad out those
regions to be mappable using 64 KB pages.
Unfortunately, there is no annotation in the UEFI memory map that
identifies data regions that were split off from a code region, so we
must apply this logic to all adjacent runtime regions whose attributes
only differ in the permission bits.

So instead of rounding each memory region to 64 KB alignment at both
ends, only round down regions that are not directly preceded by another
runtime region with the same type attributes. Since the UEFI spec does
not mandate that the memory map be sorted, this means we also need to
sort it first.

Note that this change will result in all EFI_MEMORY_RUNTIME regions whose
start addresses are not aligned to the OS page size to be mapped with
executable permissions (i.e., on kernels compiled with 64 KB pages).
However, since these mappings are only active during the time that UEFI
Runtime Services are being invoked, the window for abuse is rather small.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Mark Salter <msalter@redhat.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [UEFI 2.4 only]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v4.0+
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 arch/arm64/kernel/efi.c                 |  3 +-
 drivers/firmware/efi/libstub/arm-stub.c | 88 +++++++++++++++++++++++++++------
 2 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index e8ca6eaedd02..13671a9cf016 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
 		 */
 		if (!is_normal_ram(md))
 			prot = __pgprot(PROT_DEVICE_nGnRE);
-		else if (md->type == EFI_RUNTIME_SERVICES_CODE)
+		else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
+			 !PAGE_ALIGNED(md->phys_addr))
 			prot = PAGE_KERNEL_EXEC;
 		else
 			prot = PAGE_KERNEL;
diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
index e29560e6b40b..950c87f5d279 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/efi.h>
+#include <linux/sort.h>
 #include <asm/efi.h>
 
 #include "efistub.h"
@@ -305,6 +306,44 @@ fail:
  */
 #define EFI_RT_VIRTUAL_BASE	0x40000000
 
+static int cmp_mem_desc(const void *l, const void *r)
+{
+	const efi_memory_desc_t *left = l, *right = r;
+
+	return (left->phys_addr > right->phys_addr) ? 1 : -1;
+}
+
+/*
+ * Returns whether region @left ends exactly where region @right starts,
+ * or false if either argument is NULL.
+ */
+static bool regions_are_adjacent(efi_memory_desc_t *left,
+				 efi_memory_desc_t *right)
+{
+	u64 left_end;
+
+	if (left == NULL || right == NULL)
+		return false;
+
+	left_end = left->phys_addr + left->num_pages * EFI_PAGE_SIZE;
+
+	return left_end == right->phys_addr;
+}
+
+/*
+ * Returns whether region @left and region @right have compatible memory type
+ * mapping attributes, and are both EFI_MEMORY_RUNTIME regions.
+ */
+static bool regions_have_compatible_memory_type_attrs(efi_memory_desc_t *left,
+						      efi_memory_desc_t *right)
+{
+	static const u64 mem_type_mask = EFI_MEMORY_WB | EFI_MEMORY_WT |
+					 EFI_MEMORY_WC | EFI_MEMORY_UC |
+					 EFI_MEMORY_RUNTIME;
+
+	return ((left->attribute ^ right->attribute) & mem_type_mask) == 0;
+}
+
 /*
  * efi_get_virtmap() - create a virtual mapping for the EFI memory map
  *
@@ -317,33 +356,52 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
 		     int *count)
 {
 	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
-	efi_memory_desc_t *out = runtime_map;
+	efi_memory_desc_t *in, *prev = NULL, *out = runtime_map;
 	int l;
 
-	for (l = 0; l < map_size; l += desc_size) {
-		efi_memory_desc_t *in = (void *)memory_map + l;
+	/*
+	 * To work around potential issues with the Properties Table feature
+	 * introduced in UEFI 2.5, which may split PE/COFF executable images
+	 * in memory into several RuntimeServicesCode and RuntimeServicesData
+	 * regions, we need to preserve the relative offsets between adjacent
+	 * EFI_MEMORY_RUNTIME regions with the same memory type attributes.
+	 * The easiest way to find adjacent regions is to sort the memory map
+	 * before traversing it.
+	 */
+	sort(memory_map, map_size / desc_size, desc_size, cmp_mem_desc, NULL);
+
+	for (l = 0; l < map_size; l += desc_size, prev = in) {
 		u64 paddr, size;
 
+		in = (void *)memory_map + l;
 		if (!(in->attribute & EFI_MEMORY_RUNTIME))
 			continue;
 
+		paddr = in->phys_addr;
+		size = in->num_pages * EFI_PAGE_SIZE;
+
 		/*
 		 * Make the mapping compatible with 64k pages: this allows
 		 * a 4k page size kernel to kexec a 64k page size kernel and
 		 * vice versa.
 		 */
-		paddr = round_down(in->phys_addr, SZ_64K);
-		size = round_up(in->num_pages * EFI_PAGE_SIZE +
-				in->phys_addr - paddr, SZ_64K);
-
-		/*
-		 * Avoid wasting memory on PTEs by choosing a virtual base that
-		 * is compatible with section mappings if this region has the
-		 * appropriate size and physical alignment. (Sections are 2 MB
-		 * on 4k granule kernels)
-		 */
-		if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
-			efi_virt_base = round_up(efi_virt_base, SZ_2M);
+		if (!regions_are_adjacent(prev, in) ||
+		    !regions_have_compatible_memory_type_attrs(prev, in)) {
+
+			paddr = round_down(in->phys_addr, SZ_64K);
+			size += in->phys_addr - paddr;
+
+			/*
+			 * Avoid wasting memory on PTEs by choosing a virtual
+			 * base that is compatible with section mappings if this
+			 * region has the appropriate size and physical
+			 * alignment. (Sections are 2 MB on 4k granule kernels)
+			 */
+			if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
+				efi_virt_base = round_up(efi_virt_base, SZ_2M);
+			else
+				efi_virt_base = round_up(efi_virt_base, SZ_64K);
+		}
 
 		in->virt_addr = efi_virt_base + in->phys_addr - paddr;
 		efi_virt_base += size;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]   ` <1443218539-7610-2-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-09-26  5:56     ` Ingo Molnar
       [not found]       ` <20150926055643.GA25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-09-26 13:43       ` Matt Fleming
  0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-26  5:56 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel,
	Linus Torvalds, Borislav Petkov, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Andrew Morton


So this commit worries me.

This bug is a good find, and the fix is obviously needed and urgent, but I'm not 
sure about the implementation at all. (I've Cc:-ed a few more x86 low level 
gents.)

* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:

>  /*
> + * Iterate the EFI memory map in reverse order because the regions
> + * will be mapped top-down. The end result is the same as if we had
> + * mapped things forward, but doesn't require us to change the
> + * existing implementation of efi_map_region().
> + */
> +static inline void *efi_map_next_entry_reverse(void *entry)
> +{
> +	/* Initial call */
> +	if (!entry)
> +		return memmap.map_end - memmap.desc_size;
> +
> +	entry -= memmap.desc_size;
> +	if (entry < memmap.map)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +/*
> + * efi_map_next_entry - Return the next EFI memory map descriptor
> + * @entry: Previous EFI memory map descriptor
> + *
> + * This is a helper function to iterate over the EFI memory map, which
> + * we do in different orders depending on the current configuration.
> + *
> + * To begin traversing the memory map @entry must be %NULL.
> + *
> + * Returns %NULL when we reach the end of the memory map.
> + */
> +static void *efi_map_next_entry(void *entry)
> +{
> +	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> +		/*
> +		 * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> +		 * config table feature requires us to map all entries
> +		 * in the same order as they appear in the EFI memory
> +		 * map. That is to say, entry N must have a lower
> +		 * virtual address than entry N+1. This is because the
> +		 * firmware toolchain leaves relative references in
> +		 * the code/data sections, which are split and become
> +		 * separate EFI memory regions. Mapping things
> +		 * out-of-order leads to the firmware accessing
> +		 * unmapped addresses.
> +		 *
> +		 * Since we need to map things this way whether or not
> +		 * the kernel actually makes use of
> +		 * EFI_PROPERTIES_TABLE, let's just switch to this
> +		 * scheme by default for 64-bit.

The thing is, if relative accesses between these 'sections' do happen then the 
requirement is much stronger than just 'ordered by addresses' - then we must map 
them continuously and as a single block!

So at minimum the comment should say that. But I think we want more:


> +		 */
> +		return efi_map_next_entry_reverse(entry);
> +	}
> +
> +	/* Initial call */
> +	if (!entry)
> +		return memmap.map;
> +
> +	entry += memmap.desc_size;
> +	if (entry >= memmap.map_end)
> +		return NULL;
> +
> +	return entry;
> +}
> +
> +/*
>   * Map the efi memory ranges of the runtime services and update new_mmap with
>   * virtual addresses.
>   */
> @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
>  	unsigned long left = 0;
>  	efi_memory_desc_t *md;
>  
> -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> +	p = NULL;
> +	while ((p = efi_map_next_entry(p))) {
>  		md = p;
>  		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {

So why is this 64-bit only? Is 32-bit not affected because there we allocate 
virtual addresses bottom-up?

This would be a lot clearer if we just mapped the entries in order, no questions 
asked. Conditions like this:

> +	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

... just invite confusion and possible corner cases where we end up mapping them 
wrong.

So could we make the whole code obviously bottom-up? Such as first calculating the 
size of virtual memory needed, then allocating a _single_, obviously continuous 
mapping, and then doing a very clear in-order mapping within that window? That 
would remove any bitness and legacy dependencies.

Good virtual memory layout is so critical for any third party code that this 
should be the central property of the whole approach, not just some side condition 
somewhere in an iteration loop.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]   ` <1443218539-7610-3-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-09-26  6:01     ` Ingo Molnar
       [not found]       ` <20150926060159.GB25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2015-09-26  6:01 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Thomas Gleixner, H. Peter Anvin, Ard Biesheuvel,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst


* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:

> From: Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> 
> The new Properties Table feature introduced in UEFIv2.5 may split
> memory regions that cover PE/COFF memory images into separate code
> and data regions. Since these regions only differ in the type (runtime
> code vs runtime data) and the permission bits, but not in the memory
> type attributes (UC/WC/WT/WB), the spec does not require them to be
> aligned to 64 KB.
> 
> Since the relative offset of PE/COFF .text and .data segments cannot
> be changed on the fly, this means that we can no longer pad out those
> regions to be mappable using 64 KB pages.
> Unfortunately, there is no annotation in the UEFI memory map that
> identifies data regions that were split off from a code region, so we
> must apply this logic to all adjacent runtime regions whose attributes
> only differ in the permission bits.
> 
> So instead of rounding each memory region to 64 KB alignment at both
> ends, only round down regions that are not directly preceded by another
> runtime region with the same type attributes. Since the UEFI spec does
> not mandate that the memory map be sorted, this means we also need to
> sort it first.

So I think this is fundamentally wrong as well, similarly to the related x86 fix.

I think for compatibility reasons the whole 'EFI runtime image' should be mapped 
in a single go, as closely matching the EFI layouts and offsets as possible. We 
are not talking about gigabytes here, right?

Even if technically they are 'separate sections', the x86 bug shows that they 
aren't. So we should not pretend so on the Linux side either and we should not 
tear them apart (and then work hard to preserve the interdependencies, some 
declared, some hidden!).

If we allocate the EFI runtime as a single virtual memory block then issues like 
rounding between sections does not even come up as a problem: we map the original 
offsets and sizes byte by byte.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]       ` <20150926055643.GA25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-26  6:44         ` Ard Biesheuvel
  2015-09-26 17:01         ` Andy Lutomirski
  1 sibling, 0 replies; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-26  6:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On 25 September 2015 at 22:56, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> So this commit worries me.
>
> This bug is a good find, and the fix is obviously needed and urgent, but I'm not
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> gents.)
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>
>>  /*
>> + * Iterate the EFI memory map in reverse order because the regions
>> + * will be mapped top-down. The end result is the same as if we had
>> + * mapped things forward, but doesn't require us to change the
>> + * existing implementation of efi_map_region().
>> + */
>> +static inline void *efi_map_next_entry_reverse(void *entry)
>> +{
>> +     /* Initial call */
>> +     if (!entry)
>> +             return memmap.map_end - memmap.desc_size;
>> +
>> +     entry -= memmap.desc_size;
>> +     if (entry < memmap.map)
>> +             return NULL;
>> +
>> +     return entry;
>> +}
>> +
>> +/*
>> + * efi_map_next_entry - Return the next EFI memory map descriptor
>> + * @entry: Previous EFI memory map descriptor
>> + *
>> + * This is a helper function to iterate over the EFI memory map, which
>> + * we do in different orders depending on the current configuration.
>> + *
>> + * To begin traversing the memory map @entry must be %NULL.
>> + *
>> + * Returns %NULL when we reach the end of the memory map.
>> + */
>> +static void *efi_map_next_entry(void *entry)
>> +{
>> +     if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> +             /*
>> +              * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>> +              * config table feature requires us to map all entries
>> +              * in the same order as they appear in the EFI memory
>> +              * map. That is to say, entry N must have a lower
>> +              * virtual address than entry N+1. This is because the
>> +              * firmware toolchain leaves relative references in
>> +              * the code/data sections, which are split and become
>> +              * separate EFI memory regions. Mapping things
>> +              * out-of-order leads to the firmware accessing
>> +              * unmapped addresses.
>> +              *
>> +              * Since we need to map things this way whether or not
>> +              * the kernel actually makes use of
>> +              * EFI_PROPERTIES_TABLE, let's just switch to this
>> +              * scheme by default for 64-bit.
>
> The thing is, if relative accesses between these 'sections' do happen then the
> requirement is much stronger than just 'ordered by addresses' - then we must map
> them continuously and as a single block!
>

The primary difference between pre-2.5 and 2.5 with this feature
enabled is that formerly, each PE/COFF image in memory would be
covered by at most a single EfiRuntimeServicesCode region, and now, a
single PE/COFF image may be split into different regions. It is only
relative references *inside* such a PE/COFF image that we are
concerned about, since no symbol references exist between separate
ones.

Also, it is not only relative references inside the PE/COFF image that
cause the problem. Another aspect is that the offset that is applied
to all absolute references at relocation time is derived from the
offset of the base of the PE/COFF image. If part of the PE/COFF image
(the .data section) is moved relatively to the code section, these
absolute references are fixed up incorrectly. This is actually a
problem that we could solve at the firmware side, but since PE/COFF
does not really tolerate being split up like that, the correct fix is
to keep all regions belonging to a single PE/COFF image adjacent.
Since we can't tell which those regions are, the next best approach is
to keep all adjacent regions with the EFI_MEMORY_RUNTIME attribute
adjacent in the VA mapping.

-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]       ` <20150926060159.GB25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-26  7:08         ` Ard Biesheuvel
       [not found]           ` <CAKv+Gu_JRCmx5qKq7SDDq73BegVs3LKZwe+OdvTKb4YD_CURyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-26  7:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On 25 September 2015 at 23:01, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>
>> From: Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>
>> The new Properties Table feature introduced in UEFIv2.5 may split
>> memory regions that cover PE/COFF memory images into separate code
>> and data regions. Since these regions only differ in the type (runtime
>> code vs runtime data) and the permission bits, but not in the memory
>> type attributes (UC/WC/WT/WB), the spec does not require them to be
>> aligned to 64 KB.
>>
>> Since the relative offset of PE/COFF .text and .data segments cannot
>> be changed on the fly, this means that we can no longer pad out those
>> regions to be mappable using 64 KB pages.
>> Unfortunately, there is no annotation in the UEFI memory map that
>> identifies data regions that were split off from a code region, so we
>> must apply this logic to all adjacent runtime regions whose attributes
>> only differ in the permission bits.
>>
>> So instead of rounding each memory region to 64 KB alignment at both
>> ends, only round down regions that are not directly preceded by another
>> runtime region with the same type attributes. Since the UEFI spec does
>> not mandate that the memory map be sorted, this means we also need to
>> sort it first.
>
> So I think this is fundamentally wrong as well, similarly to the related x86 fix.
>
> I think for compatibility reasons the whole 'EFI runtime image' should be mapped
> in a single go, as closely matching the EFI layouts and offsets as possible. We
> are not talking about gigabytes here, right?
>

As I explained in the other thread, this is really not necessary, and
never has been until the firmware started splitting up PE/COFF images
into several sections each. As long as we keep those PE/COFF images
together, everything will work as before, and the only complication is
that the memory map does not contain any clues about which regions
belong to a single PE/COFF image, so we need to keep all adjacent
EFI_MEMORY_RUNTIME regions adjacent in the VA mapping.

> Even if technically they are 'separate sections', the x86 bug shows that they
> aren't. So we should not pretend so on the Linux side either and we should not
> tear them apart (and then work hard to preserve the interdependencies, some
> declared, some hidden!).
>

This is about relocations and interdependencies at the symbol level,
and such interdependencies only exist internally inside PE/COFF
images.

> If we allocate the EFI runtime as a single virtual memory block then issues like
> rounding between sections does not even come up as a problem: we map the original
> offsets and sizes byte by byte.
>

Well, by that reasoning, we should not call SetVirtualAddressMap() in
the first place, and just use the 1:1 mapping UEFI uses natively. This
is more than feasible on arm64, and I actually fought hard against
using SetVirtualAddressMap() at all, but I was overruled by others. I
think this is also trivially possible on X64, since the 1:1 mapping is
already active alongside the VA mapping.

-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26  5:56     ` Ingo Molnar
       [not found]       ` <20150926055643.GA25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-26 13:43       ` Matt Fleming
       [not found]         ` <20150926134329.GA3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Fleming @ 2015-09-26 13:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H. Peter Anvin, Matt Fleming, linux-kernel,
	linux-efi, Lee, Chun-Yi, Borislav Petkov, Leif Lindholm,
	Peter Jones, James Bottomley, Matthew Garrett, Dave Young,
	stable, Ard Biesheuvel, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

On Sat, 26 Sep, at 07:56:43AM, Ingo Molnar wrote:
> 
> So this commit worries me.
> 
> This bug is a good find, and the fix is obviously needed and urgent, but I'm not 
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level 
> gents.)
 
Thanks, the more the merrier.

> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
> 
> >  /*
> > + * Iterate the EFI memory map in reverse order because the regions
> > + * will be mapped top-down. The end result is the same as if we had
> > + * mapped things forward, but doesn't require us to change the
> > + * existing implementation of efi_map_region().
> > + */
> > +static inline void *efi_map_next_entry_reverse(void *entry)
> > +{
> > +	/* Initial call */
> > +	if (!entry)
> > +		return memmap.map_end - memmap.desc_size;
> > +
> > +	entry -= memmap.desc_size;
> > +	if (entry < memmap.map)
> > +		return NULL;
> > +
> > +	return entry;
> > +}
> > +
> > +/*
> > + * efi_map_next_entry - Return the next EFI memory map descriptor
> > + * @entry: Previous EFI memory map descriptor
> > + *
> > + * This is a helper function to iterate over the EFI memory map, which
> > + * we do in different orders depending on the current configuration.
> > + *
> > + * To begin traversing the memory map @entry must be %NULL.
> > + *
> > + * Returns %NULL when we reach the end of the memory map.
> > + */
> > +static void *efi_map_next_entry(void *entry)
> > +{
> > +	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> > +		/*
> > +		 * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> > +		 * config table feature requires us to map all entries
> > +		 * in the same order as they appear in the EFI memory
> > +		 * map. That is to say, entry N must have a lower
> > +		 * virtual address than entry N+1. This is because the
> > +		 * firmware toolchain leaves relative references in
> > +		 * the code/data sections, which are split and become
> > +		 * separate EFI memory regions. Mapping things
> > +		 * out-of-order leads to the firmware accessing
> > +		 * unmapped addresses.
> > +		 *
> > +		 * Since we need to map things this way whether or not
> > +		 * the kernel actually makes use of
> > +		 * EFI_PROPERTIES_TABLE, let's just switch to this
> > +		 * scheme by default for 64-bit.
> 
> The thing is, if relative accesses between these 'sections' do happen then the 
> requirement is much stronger than just 'ordered by addresses' - then we must map 
> them continuously and as a single block!
> 
> So at minimum the comment should say that. But I think we want more:
 
Well, the firmware doesn't place arbitrary gaps between these runtime
image sections in memory, they still get loaded into memory by the
firmware the old-fashioned way (as one contiguous blob). It's just
that we've now got two memory map entries for the single PE/COFF
image. Also, it's only sections from the same PE/COFF image that
reference each other (but like Ard said, we can't tell from the memmap
which entries are for a single PE/COFF image).

Because EFI memory map entries that are part of the same PE/COFF image
will be at consecutive addresses, we'll also map them contiguously in
the kernel virtual address space, because that's how the current code
works - it's just that the current code maps them backwards.

Yeah, I'm completely in favour of improving any of the comments. 
 
> > +		 */
> > +		return efi_map_next_entry_reverse(entry);
> > +	}
> > +
> > +	/* Initial call */
> > +	if (!entry)
> > +		return memmap.map;
> > +
> > +	entry += memmap.desc_size;
> > +	if (entry >= memmap.map_end)
> > +		return NULL;
> > +
> > +	return entry;
> > +}
> > +
> > +/*
> >   * Map the efi memory ranges of the runtime services and update new_mmap with
> >   * virtual addresses.
> >   */
> > @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> >  	unsigned long left = 0;
> >  	efi_memory_desc_t *md;
> >  
> > -	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> > +	p = NULL;
> > +	while ((p = efi_map_next_entry(p))) {
> >  		md = p;
> >  		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> 
> So why is this 64-bit only? Is 32-bit not affected because there we allocate 
> virtual addresses bottom-up?
 
32-bit would potentially be effected if 32-bit firmware ever enabled
the EFI_PROPERTIES_TABLE feature. Whether or not it worked would
depend on the fragemntation of the virtual memory space when we map
the EFI regions.

But I'm not aware of any 32-bit firmware shipping with Properties
Table support, and fixing it in the kernel like we do for x86-64 would
require changing the EFI region mapping code. That's something I
wanted to avoid for this patch since it needs to be applied to stable,
and especially when we haven't seen the feature being used in the
wild.

> This would be a lot clearer if we just mapped the entries in order, no questions 
> asked. Conditions like this:
> 
> > +	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> 
> ... just invite confusion and possible corner cases where we end up mapping them 
> wrong.
 
When we introduced the top-down scheme in commit d2f7cbe7b26a
("x86/efi: Runtime services virtual mapping") we also introduced the
"efi=old_map" kernel parameter to force the old virtual address
allocation scheme exactly because we were concerned we might run into
firmware "issues" with the way we mapped things.

That concern hasn't gone away now - it's intensified.

The above conditional just codifies a config combination that we've
always treated differently, and the conditional logic is only
necessary because this is generic x86 code.

I agree that it would be nicer without the conditional code, but
forcing the same allocation scheme across 32-bit and 64-bit is a
*much* bigger change than what is proposed here. Not just in terms of
patch size, but also in terms of risk.

> So could we make the whole code obviously bottom-up? Such as first calculating the 
> size of virtual memory needed, then allocating a _single_, obviously continuous 
> mapping, and then doing a very clear in-order mapping within that window? That 
> would remove any bitness and legacy dependencies.
 
So, we could, and in fact the first version of this patch did just
that. You can find it here,

  https://lkml.kernel.org/r/1441372447-23439-1-git-send-email-matt@codeblueprint.co.uk

But Ard suggested re-using the existing code and simply changing the
order we map the memmap entries in. And given the constraint for a
small patch for backporting, I think it's a better solution. The
actual virtual addresses we pick are exactly the same with the two
patches.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]       ` <20150926055643.GA25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-09-26  6:44         ` Ard Biesheuvel
@ 2015-09-26 17:01         ` Andy Lutomirski
  2015-09-26 17:20           ` H. Peter Anvin
  2015-09-27  6:50           ` Ingo Molnar
  1 sibling, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2015-09-26 17:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Ard Biesheuvel, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> So this commit worries me.
>
> This bug is a good find, and the fix is obviously needed and urgent, but I'm not
> sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> gents.)
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>> +             /*
>> +              * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>> +              * config table feature requires us to map all entries
>> +              * in the same order as they appear in the EFI memory
>> +              * map. That is to say, entry N must have a lower
>> +              * virtual address than entry N+1. This is because the
>> +              * firmware toolchain leaves relative references in
>> +              * the code/data sections, which are split and become
>> +              * separate EFI memory regions. Mapping things
>> +              * out-of-order leads to the firmware accessing
>> +              * unmapped addresses.
>> +              *

I'm clearly missing something.  What is EFI doing that it doesn't care
how big the gap between sections is but it still requires them to be
in order?  It's not as though x86_64 has an addressing mode that
allows only non-negative offsets.

--Andy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 17:01         ` Andy Lutomirski
@ 2015-09-26 17:20           ` H. Peter Anvin
       [not found]             ` <E40EBF14-FD9E-49C0-A7FB-951958F72F79-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2015-09-27  6:50           ` Ingo Molnar
  1 sibling, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-26 17:20 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, Matt Fleming, linux-kernel,
	linux-efi, Lee, Chun-Yi, Borislav Petkov, Leif Lindholm,
	Peter Jones, James Bottomley, Matthew Garrett, Dave Young,
	stable, Ard Biesheuvel, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

I think it "works" because the affected BIOSes don't put spaces between the chunks.  I have discussed this with Matt.

On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski <luto@amacapital.net> wrote:
>On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> So this commit worries me.
>>
>> This bug is a good find, and the fix is obviously needed and urgent,
>but I'm not
>> sure about the implementation at all. (I've Cc:-ed a few more x86 low
>level
>> gents.)
>>
>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>> +             /*
>>> +              * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>>> +              * config table feature requires us to map all entries
>>> +              * in the same order as they appear in the EFI memory
>>> +              * map. That is to say, entry N must have a lower
>>> +              * virtual address than entry N+1. This is because the
>>> +              * firmware toolchain leaves relative references in
>>> +              * the code/data sections, which are split and become
>>> +              * separate EFI memory regions. Mapping things
>>> +              * out-of-order leads to the firmware accessing
>>> +              * unmapped addresses.
>>> +              *
>
>I'm clearly missing something.  What is EFI doing that it doesn't care
>how big the gap between sections is but it still requires them to be
>in order?  It's not as though x86_64 has an addressing mode that
>allows only non-negative offsets.
>
>--Andy

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]             ` <E40EBF14-FD9E-49C0-A7FB-951958F72F79-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2015-09-26 18:15               ` Ard Biesheuvel
  2015-09-26 19:49                 ` H. Peter Anvin
  2015-09-26 19:55               ` Matt Fleming
  1 sibling, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-26 18:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	Matt Fleming, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

On 26 September 2015 at 10:20, H. Peter Anvin <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote:
> I think it "works" because the affected BIOSes don't put spaces between the chunks.  I have discussed this with Matt.
>

Forgive the ASCII art but perhaps an illustration might help:

before the 2.5 feature, PE/COFF runtime images were remapped as
illustrated here:

                                PA                        VA
+---------------+         +---------------+
|               |         |               |
| PE/COFF .text |         |    EFI        |
|               |         |    Runtime    |
+- - - - - - - -+    =>   |    Services   |----+
|               |         |    Code       |    |    :               :
| PE/COFF .data |         |               |    |    :               :
|               |         |               |    |    +---------------+
+---------------+         +---------------+    |    |               |
|               |         |               |    |    |    EFI        |
:               :         :               :    |    |    Runtime    |
:               :         :               :    +--->|    Services   |
|               |         |               |         |    Code       |
+---------------+         +---------------+         |               |
|               |         |               |         |               |
| PE/COFF .text |         |    EFI        |         +---------------+
|               |         |    Runtime    |         :      gap      :
+- - - - - - - -+    =>   |    Services   |---+     +---------------+
|               |         |    Code       |   |     |               |
| PE/COFF .data |         |               |   |     |    EFI        |
|               |         |               |   |     |    Runtime    |
+---------------+         +---------------+   +---->|    Services   |
|               |         |               |         |    Code       |
:               :         :               :         |               |
:               :         :               :         |               |
:               :         :               :         +---------------+
:               :         :               :         :               :

Since the affected symbol references only exist between PE/COFF .text
and PE/COFF .data, there is never a problem since each is PE/COFF
image is mapped as a single region.
However, with the new feature enabled, this no longer holds:
                                PA                        VA
+---------------+         +---------------+
|               |         |               |
| PE/COFF .text |         |    RtServices |----+
|               |         |    Code       |    |
+- - - - - - - -+    =>   +---------------+    |    +---------------+
|               |         |    RtServices |    +--->|    RtServices |
| PE/COFF .data |         |    Data       |         |    Code       |
|               |         |               |----+    +---------------+
+---------------+         +---------------+    |    :     gap       :
|               |         |               |    |    +---------------+
:               :         :               :    +--->|    RtServices |
:               :         :               :         |    Data       |
|               |         |               |         +---------------+
+---------------+         +---------------+         :     gap       :
|               |         |               |         +---------------+
| PE/COFF .text |         |    RtServices |-------->|    RtServices |
|               |         |    Code       |         |    Code       |
+- - - - - - - -+    =>   +---------------+         +---------------+
|               |         |    RtServices |         :     gap       :
| PE/COFF .data |         |    Data       |---+     +---------------+
|               |         |               |   |     |    RtServices |
+---------------+         +---------------+   +---->|    Data       |
|               |         |               |         |               |
:               :         :               :         +---------------+
:               :         :               :         :               :
:               :         :               :         :               :

The illustration uses gaps, but obviously, this applies equally to
inverting the mapping order, since the PE/COFF .text and .data
sections will end up out of order.

-- 
Ard.


> On September 26, 2015 10:01:14 AM PDT, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>> So this commit worries me.
>>>
>>> This bug is a good find, and the fix is obviously needed and urgent,
>>but I'm not
>>> sure about the implementation at all. (I've Cc:-ed a few more x86 low
>>level
>>> gents.)
>>>
>>> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>>>> +             /*
>>>> +              * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
>>>> +              * config table feature requires us to map all entries
>>>> +              * in the same order as they appear in the EFI memory
>>>> +              * map. That is to say, entry N must have a lower
>>>> +              * virtual address than entry N+1. This is because the
>>>> +              * firmware toolchain leaves relative references in
>>>> +              * the code/data sections, which are split and become
>>>> +              * separate EFI memory regions. Mapping things
>>>> +              * out-of-order leads to the firmware accessing
>>>> +              * unmapped addresses.
>>>> +              *
>>
>>I'm clearly missing something.  What is EFI doing that it doesn't care
>>how big the gap between sections is but it still requires them to be
>>in order?  It's not as though x86_64 has an addressing mode that
>>allows only non-negative offsets.
>>
>>--Andy
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 18:15               ` Ard Biesheuvel
@ 2015-09-26 19:49                 ` H. Peter Anvin
  2015-09-26 19:57                   ` Matt Fleming
  0 siblings, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-26 19:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andy Lutomirski, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	Matt Fleming, linux-kernel, linux-efi, Lee, Chun-Yi,
	Borislav Petkov, Leif Lindholm, Peter Jones, James Bottomley,
	Matthew Garrett, Dave Young, stable, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

It is still a hack unless all relative offsets are preserved.  That is actually simpler, even: no sorting necessary.

On September 26, 2015 11:15:57 AM PDT, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>On 26 September 2015 at 10:20, H. Peter Anvin <hpa@zytor.com> wrote:
>> I think it "works" because the affected BIOSes don't put spaces
>between the chunks.  I have discussed this with Matt.
>>
>
>Forgive the ASCII art but perhaps an illustration might help:
>
>before the 2.5 feature, PE/COFF runtime images were remapped as
>illustrated here:
>
>                                PA                        VA
>+---------------+         +---------------+
>|               |         |               |
>| PE/COFF .text |         |    EFI        |
>|               |         |    Runtime    |
>+- - - - - - - -+    =>   |    Services   |----+
>|               |         |    Code       |    |    :               :
>| PE/COFF .data |         |               |    |    :               :
>|               |         |               |    |    +---------------+
>+---------------+         +---------------+    |    |               |
>|               |         |               |    |    |    EFI        |
>:               :         :               :    |    |    Runtime    |
>:               :         :               :    +--->|    Services   |
>|               |         |               |         |    Code       |
>+---------------+         +---------------+         |               |
>|               |         |               |         |               |
>| PE/COFF .text |         |    EFI        |         +---------------+
>|               |         |    Runtime    |         :      gap      :
>+- - - - - - - -+    =>   |    Services   |---+     +---------------+
>|               |         |    Code       |   |     |               |
>| PE/COFF .data |         |               |   |     |    EFI        |
>|               |         |               |   |     |    Runtime    |
>+---------------+         +---------------+   +---->|    Services   |
>|               |         |               |         |    Code       |
>:               :         :               :         |               |
>:               :         :               :         |               |
>:               :         :               :         +---------------+
>:               :         :               :         :               :
>
>Since the affected symbol references only exist between PE/COFF .text
>and PE/COFF .data, there is never a problem since each is PE/COFF
>image is mapped as a single region.
>However, with the new feature enabled, this no longer holds:
>                                PA                        VA
>+---------------+         +---------------+
>|               |         |               |
>| PE/COFF .text |         |    RtServices |----+
>|               |         |    Code       |    |
>+- - - - - - - -+    =>   +---------------+    |    +---------------+
>|               |         |    RtServices |    +--->|    RtServices |
>| PE/COFF .data |         |    Data       |         |    Code       |
>|               |         |               |----+    +---------------+
>+---------------+         +---------------+    |    :     gap       :
>|               |         |               |    |    +---------------+
>:               :         :               :    +--->|    RtServices |
>:               :         :               :         |    Data       |
>|               |         |               |         +---------------+
>+---------------+         +---------------+         :     gap       :
>|               |         |               |         +---------------+
>| PE/COFF .text |         |    RtServices |-------->|    RtServices |
>|               |         |    Code       |         |    Code       |
>+- - - - - - - -+    =>   +---------------+         +---------------+
>|               |         |    RtServices |         :     gap       :
>| PE/COFF .data |         |    Data       |---+     +---------------+
>|               |         |               |   |     |    RtServices |
>+---------------+         +---------------+   +---->|    Data       |
>|               |         |               |         |               |
>:               :         :               :         +---------------+
>:               :         :               :         :               :
>:               :         :               :         :               :
>
>The illustration uses gaps, but obviously, this applies equally to
>inverting the mapping order, since the PE/COFF .text and .data
>sections will end up out of order.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]             ` <E40EBF14-FD9E-49C0-A7FB-951958F72F79-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  2015-09-26 18:15               ` Ard Biesheuvel
@ 2015-09-26 19:55               ` Matt Fleming
  1 sibling, 0 replies; 48+ messages in thread
From: Matt Fleming @ 2015-09-26 19:55 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Ard Biesheuvel, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On Sat, 26 Sep, at 10:20:22AM, H. Peter Anvin wrote:
>
> I think it "works" because the affected BIOSes don't put spaces
> between the chunks.  I have discussed this with Matt.

Right, that's very true. Though note that the current mapping
algorithm will handle a gap <= PMD_SIZE, it's just anything larger
than PMD_SIZE we "squash" to the next multiple of PMD_SIZE.

It's unclear whether the firmware toolchains would even support
references between sections with a PMD_SIZE gap between them, and I
think the firmware engineers would have to go out of their way to
actually insert such a gap.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 19:49                 ` H. Peter Anvin
@ 2015-09-26 19:57                   ` Matt Fleming
       [not found]                     ` <20150926195755.GC3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Fleming @ 2015-09-26 19:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ard Biesheuvel, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	Matt Fleming, linux-kernel, linux-efi, Lee, Chun-Yi,
	Borislav Petkov, Leif Lindholm, Peter Jones, James Bottomley,
	Matthew Garrett, Dave Young, stable, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>
> It is still a hack unless all relative offsets are preserved.  That
> is actually simpler, even: no sorting necessary.

Unless I'm missing something, preserving relative offsets is exactly
what we do today, modulo PMD_SIZE gaps.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                     ` <20150926195755.GC3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-09-26 20:09                       ` Ard Biesheuvel
  2015-09-26 20:19                         ` H. Peter Anvin
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-26 20:09 UTC (permalink / raw)
  To: Matt Fleming
  Cc: H. Peter Anvin, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	Matt Fleming, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton


> On 26 sep. 2015, at 12:57, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> 
>> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>> 
>> It is still a hack unless all relative offsets are preserved.  That
>> is actually simpler, even: no sorting necessary.
> 
> Unless I'm missing something, preserving relative offsets is exactly
> what we do today, modulo PMD_SIZE gaps.
> 

I think what Peter means is preserving the relative offsets inside the entire 1:1 space.

This is not at all what we do currently, and i don't think it is generally feasible on 32-bit (since the physical range may conflict with the virtual kernel mappings)

However, on 64 bit (both arm and x86), this boils down to not calling setVA() in the first place, which i'm all in favor of.
-- 
Ard.

> -- 
> Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 20:09                       ` Ard Biesheuvel
@ 2015-09-26 20:19                         ` H. Peter Anvin
  2015-09-27 16:30                           ` Andy Lutomirski
  0 siblings, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-26 20:19 UTC (permalink / raw)
  To: Ard Biesheuvel, Matt Fleming
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Matt Fleming,
	linux-kernel, linux-efi, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

Sadly a lot of firmware is known to fail in that configuration :(  That was very much our guest choice.

I don't actually think it is all that infeasible to keep relative offsets consistent for the regions we have to map. PMD_SIZE is not a very large chunk so it could be a problem.

On September 26, 2015 1:09:17 PM PDT, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
>> On 26 sep. 2015, at 12:57, Matt Fleming <matt@codeblueprint.co.uk>
>wrote:
>> 
>>> On Sat, 26 Sep, at 12:49:26PM, H. Peter Anvin wrote:
>>> 
>>> It is still a hack unless all relative offsets are preserved.  That
>>> is actually simpler, even: no sorting necessary.
>> 
>> Unless I'm missing something, preserving relative offsets is exactly
>> what we do today, modulo PMD_SIZE gaps.
>> 
>
>I think what Peter means is preserving the relative offsets inside the
>entire 1:1 space.
>
>This is not at all what we do currently, and i don't think it is
>generally feasible on 32-bit (since the physical range may conflict
>with the virtual kernel mappings)
>
>However, on 64 bit (both arm and x86), this boils down to not calling
>setVA() in the first place, which i'm all in favor of.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 17:01         ` Andy Lutomirski
  2015-09-26 17:20           ` H. Peter Anvin
@ 2015-09-27  6:50           ` Ingo Molnar
  1 sibling, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-27  6:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel, linux-efi, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Ard Biesheuvel, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Fri, Sep 25, 2015 at 10:56 PM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So this commit worries me.
> >
> > This bug is a good find, and the fix is obviously needed and urgent, but I'm not
> > sure about the implementation at all. (I've Cc:-ed a few more x86 low level
> > gents.)
> >
> > * Matt Fleming <matt@codeblueprint.co.uk> wrote:
> >> +             /*
> >> +              * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> >> +              * config table feature requires us to map all entries
> >> +              * in the same order as they appear in the EFI memory
> >> +              * map. That is to say, entry N must have a lower
> >> +              * virtual address than entry N+1. This is because the
> >> +              * firmware toolchain leaves relative references in
> >> +              * the code/data sections, which are split and become
> >> +              * separate EFI memory regions. Mapping things
> >> +              * out-of-order leads to the firmware accessing
> >> +              * unmapped addresses.
> >> +              *
> 
> I'm clearly missing something.  What is EFI doing that it doesn't care how big 
> the gap between sections is but it still requires them to be in order?  It's not 
> as though x86_64 has an addressing mode that allows only non-negative offsets.

It appears the problem is that what we think to be 'different sections' are in 
reality smaller parts of the same section.

Any relative address calculation will be broken if we don't preserve the relative 
positions of these sections/sub-sections. Any CPU that supports addition is 
affected, it doesn't need any special addressing modes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]         ` <20150926134329.GA3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-09-27  7:03           ` Ingo Molnar
  2015-09-28  6:49             ` Ard Biesheuvel
  0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2015-09-27  7:03 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel,
	Linus Torvalds, Borislav Petkov, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Andrew Morton


* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:

> > So could we make the whole code obviously bottom-up? Such as first calculating 
> > the size of virtual memory needed, then allocating a _single_, obviously 
> > continuous mapping, and then doing a very clear in-order mapping within that 
> > window? That would remove any bitness and legacy dependencies.
>  
> So, we could, and in fact the first version of this patch did just that. You can 
> find it here,
> 
>   https://lkml.kernel.org/r/1441372447-23439-1-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org
> 
> But Ard suggested re-using the existing code and simply changing the order we 
> map the memmap entries in.

Such implementational arguments (which are an internal cost) never trump 
compatibility and robustness concerns (which are an external constraint).

> [...] And given the constraint for a small patch for backporting, I think it's a 
> better solution. [...]

Ugh, backporting size is _even less_ of a valid argument when it comes to firmware 
support correctness!

We can (perhaps) use these already existing patches as a simpler backport, but 
there's absolutely no reason to keep that code as a solution:

> [...] The actual virtual addresses we pick are exactly the same with the two 
> patches.

So I'm NAK-ing this for now:

 - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
   allocator, because all the sections have already been determined by the
   firmware, and, as we just learned the hard way, we do not want to deviate from 
   that! There's nothing to 'allocate'!

   What these patches seem to implement is an elaborate 'allocator' that ends up
   doing nothing on 'new 64-bit' ...

 - The 32-bit and 64-bit and 'old_mmap' asymmetries:

	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

   seem fragile and nonsensical. The question is: is it possible for the whole EFI
   image to be larger than a couple of megabytes? If not then 32-bit should just
   mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
   differently from this _obvious_ 1:1 mapping of the EFI memory offsets then it's
   not worth keeping as a legacy, because there's just nothing better than
   mirroring the firmware layout.

My suggestion would be to just 1:1 map what the EFI tables describe, modulo the 
single absolute offset by which we shift the whole thing to a single base.

Is there any technical reason why we'd want to deviate from that? Gigabytes of 
tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an OS 
layout that differs from the firmware layout?

Also, nobody seems to be asking the obvious hardware compatibility question when 
trying to implement a standard influenced in great part by an entity that is 
partly ignorant of and partly hostile to Linux: how does Windows map the EFI 
sections, under what OSs are these firmware versions tested? I suspect no firmware 
is released that crashes on bootup on all OSs that can run on that hardware, 
right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]           ` <CAKv+Gu_JRCmx5qKq7SDDq73BegVs3LKZwe+OdvTKb4YD_CURyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-27  7:06             ` Ingo Molnar
  2015-09-27 10:40               ` Borislav Petkov
  2015-09-30  1:03               ` H. Peter Anvin
  0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-27  7:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst


* Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:

> > If we allocate the EFI runtime as a single virtual memory block then issues 
> > like rounding between sections does not even come up as a problem: we map the 
> > original offsets and sizes byte by byte.
> 
> Well, by that reasoning, we should not call SetVirtualAddressMap() in the first 
> place, and just use the 1:1 mapping UEFI uses natively. This is more than 
> feasible on arm64, and I actually fought hard against using 
> SetVirtualAddressMap() at all, but I was overruled by others. I think this is 
> also trivially possible on X64, since the 1:1 mapping is already active 
> alongside the VA mapping.

Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
in a post that also explains the background so that more people can chime in, not 
just people versed in EFI internals? It's very much possible that a bad decision 
was made.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-27  7:06             ` Ingo Molnar
@ 2015-09-27 10:40               ` Borislav Petkov
  2015-09-28  6:20                 ` Ingo Molnar
                                   ` (2 more replies)
  2015-09-30  1:03               ` H. Peter Anvin
  1 sibling, 3 replies; 48+ messages in thread
From: Borislav Petkov @ 2015-09-27 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ard Biesheuvel, Matt Fleming, Thomas Gleixner, H. Peter Anvin,
	linux-kernel, linux-efi, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable, Matt Fleming, Mark Rutland, Mark Salter,
	Linus Torvalds, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst

On Sun, Sep 27, 2015 at 09:06:44AM +0200, Ingo Molnar wrote:
> Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
> in a post that also explains the background so that more people can chime in, not 
> just people versed in EFI internals? It's very much possible that a bad decision 
> was made.

The main reason why we did the additional, top-down mapping was kexec
kernel wanting to use UEFI runtime facilities too and the braindead
design of SetVirtualAddressMap() being callable only once per system
boot. So we had to have stable mappings which are valid in the kexec-ed
kernel too.

But this was long time ago and I most certainly have forgotten all the
details.

And now I'm wondering why didn't we do the 1:1 thing and rebuild the
exact same EFI pagetable in the kexec-ed kernel? Because when we do
an EFI call, we switch to the special pagetable so why didn't we make
the kexec-ed kernel rebuild the 1:1 pagetable which it can use for EFI
calls...

Hmm, again, I've forgotten a lot of details so I'm sure Matt will come
in and say "No, you can't do that because..."

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-26 20:19                         ` H. Peter Anvin
@ 2015-09-27 16:30                           ` Andy Lutomirski
       [not found]                             ` <CALCETrUA6k-oHzuWA54VzEA1v63b5--JPCCuBoOxLisOtjUn+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2015-09-27 16:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Denys Vlasenko, Leif Lindholm, Thomas Gleixner, Borislav Petkov,
	stable, Andrew Morton, Matthew Garrett, Ingo Molnar, Brian Gerst,
	Dave Young, linux-efi, Ard Biesheuvel, Linus Torvalds,
	Peter Jones, Matt Fleming, Matt Fleming, Borislav Petkov, Lee,
	Chun-Yi, linux-kernel, James Bottomley

On Sep 26, 2015 1:19 PM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> Sadly a lot of firmware is known to fail in that configuration :(  That was very much our guest choice.
>

Why can't we map everything completely 1:1 (VA = PA) and call the
setVA thing but pass it literally the identity.

Firmwares that need it to be called will work, and firmwares that fail
to update all offsets will be fine because there's nothing to update.

--Andy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                             ` <CALCETrUA6k-oHzuWA54VzEA1v63b5--JPCCuBoOxLisOtjUn+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-27 18:06                               ` Matthew Garrett
       [not found]                                 ` <20150927180633.GA29466-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Matthew Garrett @ 2015-09-27 18:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Denys Vlasenko, Leif Lindholm, Thomas Gleixner,
	Borislav Petkov, stable, Andrew Morton, Ingo Molnar, Brian Gerst,
	Dave Young, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel,
	Linus Torvalds, Peter Jones, Matt Fleming, Matt Fleming,
	Borislav Petkov, Lee, Chun-Yi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley

On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote:
> On Sep 26, 2015 1:19 PM, "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote:
> >
> > Sadly a lot of firmware is known to fail in that configuration :(  That was very much our guest choice.
> >
> 
> Why can't we map everything completely 1:1 (VA = PA) and call the
> setVA thing but pass it literally the identity.

Last time I tried this I found that some firmware makes assumptions 
about having high addresses.

-- 
Matthew Garrett | mjg59-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                                 ` <20150927180633.GA29466-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
@ 2015-09-28  6:16                                   ` Ingo Molnar
  2015-09-28  6:41                                     ` Matthew Garrett
       [not found]                                     ` <20150928061646.GA21690-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-28  6:16 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Andy Lutomirski, H. Peter Anvin, Denys Vlasenko, Leif Lindholm,
	Thomas Gleixner, Borislav Petkov, stable, Andrew Morton,
	Brian Gerst, Dave Young, linux-efi-u79uwXL29TY76Z2rM5mHXA,
	Ard Biesheuvel, Linus Torvalds, Peter Jones, Matt Fleming,
	Matt Fleming, Borislav Petkov, Lee, Chun-Yi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley


* Matthew Garrett <mjg59-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org> wrote:

> On Sun, Sep 27, 2015 at 09:30:48AM -0700, Andy Lutomirski wrote:
> > On Sep 26, 2015 1:19 PM, "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote:
> > >
> > > Sadly a lot of firmware is known to fail in that configuration :(  That was very much our guest choice.
> > >
> > 
> > Why can't we map everything completely 1:1 (VA = PA) and call the
> > setVA thing but pass it literally the identity.
> 
> Last time I tried this I found that some firmware makes assumptions 
> about having high addresses.

So the question is, what does Windows do?

PC firmware is a hostile environment for Linux, to be compatible the best we can 
do is to mimic the environment that the firmware is tested under - i.e. try to use 
the firmware in the way Windows uses it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-27 10:40               ` Borislav Petkov
@ 2015-09-28  6:20                 ` Ingo Molnar
  2015-09-29  9:31                 ` Dave Young
       [not found]                 ` <20150927104014.GA7631-fF5Pk5pvG8Y@public.gmane.org>
  2 siblings, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-28  6:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ard Biesheuvel, Matt Fleming, Thomas Gleixner, H. Peter Anvin,
	linux-kernel, linux-efi, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable, Matt Fleming, Mark Rutland, Mark Salter,
	Linus Torvalds, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst


* Borislav Petkov <bp@alien8.de> wrote:

> On Sun, Sep 27, 2015 at 09:06:44AM +0200, Ingo Molnar wrote:
>
> > Could we please re-list all the arguments pro and contra of 1:1 physical 
> > mappings, in a post that also explains the background so that more people can 
> > chime in, not just people versed in EFI internals? It's very much possible 
> > that a bad decision was made.
> 
> The main reason why we did the additional, top-down mapping was kexec kernel 
> wanting to use UEFI runtime facilities too and the braindead design of 
> SetVirtualAddressMap() being callable only once per system boot. So we had to 
> have stable mappings which are valid in the kexec-ed kernel too.
> 
> But this was long time ago and I most certainly have forgotten all the details.
> 
> And now I'm wondering why didn't we do the 1:1 thing and rebuild the exact same 
> EFI pagetable in the kexec-ed kernel? Because when we do an EFI call, we switch 
> to the special pagetable so why didn't we make the kexec-ed kernel rebuild the 
> 1:1 pagetable which it can use for EFI calls...

Yeah.

> Hmm, again, I've forgotten a lot of details so I'm sure Matt will come in and 
> say "No, you can't do that because..."

Would be nice to re-examine all this.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-28  6:16                                   ` Ingo Molnar
@ 2015-09-28  6:41                                     ` Matthew Garrett
       [not found]                                       ` <20150928064143.GA7380-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
       [not found]                                     ` <20150928061646.GA21690-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 48+ messages in thread
From: Matthew Garrett @ 2015-09-28  6:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, H. Peter Anvin, Denys Vlasenko, Leif Lindholm,
	Thomas Gleixner, Borislav Petkov, stable, Andrew Morton,
	Brian Gerst, Dave Young, linux-efi, Ard Biesheuvel,
	Linus Torvalds, Peter Jones, Matt Fleming, Matt Fleming,
	Borislav Petkov, Lee, Chun-Yi, linux-kernel, James Bottomley

On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:

> So the question is, what does Windows do?

It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap() 
arguments to the qemu debug port. Unfortunately I'm about to drop mostly 
offline for a week, otherwise I'd give it a go...

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-27  7:03           ` Ingo Molnar
@ 2015-09-28  6:49             ` Ard Biesheuvel
       [not found]               ` <CAKv+Gu8M2pmzfeA=NT4c44E6-PQvRKHZjEJt78mryGcp6bBD8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-28  6:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel, linux-efi, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

On 27 September 2015 at 08:03, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>
[...]
>> [...] The actual virtual addresses we pick are exactly the same with the two
>> patches.
>
> So I'm NAK-ing this for now:
>
>  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
>    allocator, because all the sections have already been determined by the
>    firmware, and, as we just learned the hard way, we do not want to deviate from
>    that! There's nothing to 'allocate'!
>
>    What these patches seem to implement is an elaborate 'allocator' that ends up
>    doing nothing on 'new 64-bit' ...
>
>  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
>
>         if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>
>    seem fragile and nonsensical. The question is: is it possible for the whole EFI
>    image to be larger than a couple of megabytes? If not then 32-bit should just
>    mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
>    differently from this _obvious_ 1:1 mapping of the EFI memory offsets then it's
>    not worth keeping as a legacy, because there's just nothing better than
>    mirroring the firmware layout.
>
> My suggestion would be to just 1:1 map what the EFI tables describe, modulo the
> single absolute offset by which we shift the whole thing to a single base.
>
> Is there any technical reason why we'd want to deviate from that? Gigabytes of
> tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an OS
> layout that differs from the firmware layout?
>

The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1
addressable PA space. They usually don't but it is a possibility,
which means 32-bit will not generally be able to support this
approach. For 64-bit ARM, there are some minor complications when the
base of RAM is up very high in physical memory, but we already fixed
that for the boot time ID map and for KVM.

> Also, nobody seems to be asking the obvious hardware compatibility question when
> trying to implement a standard influenced in great part by an entity that is
> partly ignorant of and partly hostile to Linux: how does Windows map the EFI
> sections, under what OSs are these firmware versions tested? I suspect no firmware
> is released that crashes on bootup on all OSs that can run on that hardware,
> right?
>

Interestingly, it was the other way around this time. The engineers
that implemented this feature for EDK2 could not boot Windows 8
anymore, because it supposedly maps the regions in reverse order as
well (and MS too will need to backport a fix that inverts the mapping
order). The engineers also tested Linux/x86, by means of a SUSE
installer image, which booted fine, most likely due to the fact that
it is an older version which still uses the old memmap layout.

My concern with all of this is that this security feature will become
an obscure opt-in feature rather than something UEFIv2.5 firmware
implementations can enable by default.
-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]               ` <CAKv+Gu8M2pmzfeA=NT4c44E6-PQvRKHZjEJt78mryGcp6bBD8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-28  8:22                 ` Ingo Molnar
       [not found]                   ` <20150928082245.GA28796-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2015-09-28  8:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton


* Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:

> On 27 September 2015 at 08:03, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >
> > * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> >
> [...]
> >> [...] The actual virtual addresses we pick are exactly the same with the two
> >> patches.
> >
> > So I'm NAK-ing this for now:
> >
> >  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
> >    allocator, because all the sections have already been determined by the
> >    firmware, and, as we just learned the hard way, we do not want to deviate from
> >    that! There's nothing to 'allocate'!
> >
> >    What these patches seem to implement is an elaborate 'allocator' that ends up
> >    doing nothing on 'new 64-bit' ...
> >
> >  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
> >
> >         if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> >
> >    seem fragile and nonsensical. The question is: is it possible for the whole EFI
> >    image to be larger than a couple of megabytes? If not then 32-bit should just
> >    mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
> >    differently from this _obvious_ 1:1 mapping of the EFI memory offsets then it's
> >    not worth keeping as a legacy, because there's just nothing better than
> >    mirroring the firmware layout.
> >
> > My suggestion would be to just 1:1 map what the EFI tables describe, modulo the
> > single absolute offset by which we shift the whole thing to a single base.
> >
> > Is there any technical reason why we'd want to deviate from that? Gigabytes of
> > tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an OS
> > layout that differs from the firmware layout?
> >
> 
> The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 addressable PA 
> space. They usually don't but it is a possibility, which means 32-bit will not 
> generally be able to support this approach. [...]

Ok, that's a good argument which invalidates my NAK.

> [...] For 64-bit ARM, there are some minor complications when the base of RAM is 
> up very high in physical memory, but we already fixed that for the boot time ID 
> map and for KVM.
> 
> > Also, nobody seems to be asking the obvious hardware compatibility question 
> > when trying to implement a standard influenced in great part by an entity that 
> > is partly ignorant of and partly hostile to Linux: how does Windows map the 
> > EFI sections, under what OSs are these firmware versions tested? I suspect no 
> > firmware is released that crashes on bootup on all OSs that can run on that 
> > hardware, right?
> 
> Interestingly, it was the other way around this time. The engineers that 
> implemented this feature for EDK2 could not boot Windows 8 anymore, because it 
> supposedly maps the regions in reverse order as well (and MS too will need to 
> backport a fix that inverts the mapping order). The engineers also tested 
> Linux/x86, by means of a SUSE installer image, which booted fine, most likely 
> due to the fact that it is an older version which still uses the old memmap 
> layout.

That's nice to hear!

> My concern with all of this is that this security feature will become an obscure 
> opt-in feature rather than something UEFIv2.5 firmware implementations can 
> enable by default.

Ok, so I think the patches are mostly fine after all, except that I don't think 
the condition on 64-bit makes any sense:

+       if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

I can see us being nervous wrt. backported patches, but is there any strong reason 
to not follow this up with a third (non-backported) patch that changes this to:

+       if (!efi_enabled(EFI_OLD_MEMMAP)) {

for v4.4?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                   ` <20150928082245.GA28796-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-28  9:51                     ` Ard Biesheuvel
  2015-09-29  9:12                       ` Ingo Molnar
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-28  9:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On 28 September 2015 at 09:22, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> * Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
>> On 27 September 2015 at 08:03, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> >
>> > * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>> >
>> [...]
>> >> [...] The actual virtual addresses we pick are exactly the same with the two
>> >> patches.
>> >
>> > So I'm NAK-ing this for now:
>> >
>> >  - The code is it reads today pretends to be an 'allocator'. It is _NOT_ an
>> >    allocator, because all the sections have already been determined by the
>> >    firmware, and, as we just learned the hard way, we do not want to deviate from
>> >    that! There's nothing to 'allocate'!
>> >
>> >    What these patches seem to implement is an elaborate 'allocator' that ends up
>> >    doing nothing on 'new 64-bit' ...
>> >
>> >  - The 32-bit and 64-bit and 'old_mmap' asymmetries:
>> >
>> >         if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> >
>> >    seem fragile and nonsensical. The question is: is it possible for the whole EFI
>> >    image to be larger than a couple of megabytes? If not then 32-bit should just
>> >    mirror the firmware layout as well, and if EFI_OLD_MEMMAP does anything
>> >    differently from this _obvious_ 1:1 mapping of the EFI memory offsets then it's
>> >    not worth keeping as a legacy, because there's just nothing better than
>> >    mirroring the firmware layout.
>> >
>> > My suggestion would be to just 1:1 map what the EFI tables describe, modulo the
>> > single absolute offset by which we shift the whole thing to a single base.
>> >
>> > Is there any technical reason why we'd want to deviate from that? Gigabytes of
>> > tables or gigabytes of holes that 32-bit cannot handle? Firmware that wants an OS
>> > layout that differs from the firmware layout?
>> >
>>
>> The combined EFI_MEMORY_RUNTIME regions could span the entire 1:1 addressable PA
>> space. They usually don't but it is a possibility, which means 32-bit will not
>> generally be able to support this approach. [...]
>
> Ok, that's a good argument which invalidates my NAK.
>
>> [...] For 64-bit ARM, there are some minor complications when the base of RAM is
>> up very high in physical memory, but we already fixed that for the boot time ID
>> map and for KVM.
>>
>> > Also, nobody seems to be asking the obvious hardware compatibility question
>> > when trying to implement a standard influenced in great part by an entity that
>> > is partly ignorant of and partly hostile to Linux: how does Windows map the
>> > EFI sections, under what OSs are these firmware versions tested? I suspect no
>> > firmware is released that crashes on bootup on all OSs that can run on that
>> > hardware, right?
>>
>> Interestingly, it was the other way around this time. The engineers that
>> implemented this feature for EDK2 could not boot Windows 8 anymore, because it
>> supposedly maps the regions in reverse order as well (and MS too will need to
>> backport a fix that inverts the mapping order). The engineers also tested
>> Linux/x86, by means of a SUSE installer image, which booted fine, most likely
>> due to the fact that it is an older version which still uses the old memmap
>> layout.
>
> That's nice to hear!
>
>> My concern with all of this is that this security feature will become an obscure
>> opt-in feature rather than something UEFIv2.5 firmware implementations can
>> enable by default.
>
> Ok, so I think the patches are mostly fine after all,

That is good to hear.

> except that I don't think
> the condition on 64-bit makes any sense:
>
> +       if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>
> I can see us being nervous wrt. backported patches, but is there any strong reason
> to not follow this up with a third (non-backported) patch that changes this to:
>
> +       if (!efi_enabled(EFI_OLD_MEMMAP)) {
>
> for v4.4?
>

The 32-bit side essentially implements the old memmap only, which is
the the bottom-up version. So old memmap will be implied by 32-bit but
not set in the EFI flags, resulting in the reverse enumeration being
used with the bottom-up mapping logic. The net result of that is that
we create the same problem for 32-bit that we are trying to solve for
64-bit, i.e., the regions will end up in reverse order in the VA
mapping.

To deobfuscate this particular conditional, we could set
EFI_OLD_MEMMAP unconditionally on 32-bit x86. Or we could reshuffle
variables and conditionals in various other way. I am not convinced
that the overall end result will be any better though.

-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-28  9:51                     ` Ard Biesheuvel
@ 2015-09-29  9:12                       ` Ingo Molnar
  2015-09-29 10:41                         ` Ard Biesheuvel
       [not found]                         ` <20150929091230.GA2023-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-09-29  9:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel, linux-efi, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton


* Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> > except that I don't think
> > the condition on 64-bit makes any sense:
> >
> > +       if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> >
> > I can see us being nervous wrt. backported patches, but is there any strong reason
> > to not follow this up with a third (non-backported) patch that changes this to:
> >
> > +       if (!efi_enabled(EFI_OLD_MEMMAP)) {
> >
> > for v4.4?
> >
> 
> The 32-bit side essentially implements the old memmap only, which is the the 
> bottom-up version. So old memmap will be implied by 32-bit but not set in the 
> EFI flags, resulting in the reverse enumeration being used with the bottom-up 
> mapping logic. The net result of that is that we create the same problem for 
> 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up in 
> reverse order in the VA mapping.
> 
> To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP 
> unconditionally on 32-bit x86. Or we could reshuffle variables and conditionals 
> in various other way.

Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.

> [...] I am not convinced that the overall end result will be any better though.

That's not true, we change an obscure, implicit dependency on 32-bit detail to an 
explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a clear 
improvement.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-27 10:40               ` Borislav Petkov
  2015-09-28  6:20                 ` Ingo Molnar
@ 2015-09-29  9:31                 ` Dave Young
  2015-09-29 10:24                   ` Borislav Petkov
       [not found]                 ` <20150927104014.GA7631-fF5Pk5pvG8Y@public.gmane.org>
  2 siblings, 1 reply; 48+ messages in thread
From: Dave Young @ 2015-09-29  9:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Ard Biesheuvel, Matt Fleming, Thomas Gleixner,
	H. Peter Anvin, linux-kernel, linux-efi, Leif Lindholm,
	Catalin Marinas, Will Deacon, stable, Matt Fleming, Mark Rutland,
	Mark Salter, Linus Torvalds, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst

On 09/27/15 at 12:40pm, Borislav Petkov wrote:
> On Sun, Sep 27, 2015 at 09:06:44AM +0200, Ingo Molnar wrote:
> > Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
> > in a post that also explains the background so that more people can chime in, not 
> > just people versed in EFI internals? It's very much possible that a bad decision 
> > was made.
> 
> The main reason why we did the additional, top-down mapping was kexec
> kernel wanting to use UEFI runtime facilities too and the braindead
> design of SetVirtualAddressMap() being callable only once per system
> boot. So we had to have stable mappings which are valid in the kexec-ed
> kernel too.
> 
> But this was long time ago and I most certainly have forgotten all the
> details.
> 
> And now I'm wondering why didn't we do the 1:1 thing and rebuild the
> exact same EFI pagetable in the kexec-ed kernel? Because when we do
> an EFI call, we switch to the special pagetable so why didn't we make
> the kexec-ed kernel rebuild the 1:1 pagetable which it can use for EFI
> calls...
> 
> Hmm, again, I've forgotten a lot of details so I'm sure Matt will come
> in and say "No, you can't do that because..."
> 

http://permalink.gmane.org/gmane.linux.kernel.efi/822

And more replies here:
http://comments.gmane.org/gmane.linux.kernel.efi/820

Thanks
Dave

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-29  9:31                 ` Dave Young
@ 2015-09-29 10:24                   ` Borislav Petkov
  0 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2015-09-29 10:24 UTC (permalink / raw)
  To: Dave Young
  Cc: Ingo Molnar, Ard Biesheuvel, Matt Fleming, Thomas Gleixner,
	H. Peter Anvin, linux-kernel, linux-efi, Leif Lindholm,
	Catalin Marinas, Will Deacon, stable, Matt Fleming, Mark Rutland,
	Mark Salter, Linus Torvalds, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst

On Tue, Sep 29, 2015 at 05:31:00PM +0800, Dave Young wrote:
> http://permalink.gmane.org/gmane.linux.kernel.efi/822
> 
> And more replies here:
> http://comments.gmane.org/gmane.linux.kernel.efi/820

Whatever we did think back then, it all is moot as long as some UEFI
vendors do funky crap or the spec has documentation holes.

So we need a reality check and just do exactly what the OS does with
which UEFI writers are testing with. Be it windoze, be it some open
firmware testing tool or whatever. Provided there's even a tool with
which UEFI validation is done and it is used by everybody...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-29  9:12                       ` Ingo Molnar
@ 2015-09-29 10:41                         ` Ard Biesheuvel
       [not found]                           ` <CAKv+Gu8GpaHRL8EZxJCs8dw8ngFVoYNP1Q=hqhZ=LOF+3vXPLw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]                         ` <20150929091230.GA2023-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-29 10:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel, linux-efi, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Andrew Morton

On 29 September 2015 at 11:12, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
>> > except that I don't think
>> > the condition on 64-bit makes any sense:
>> >
>> > +       if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
>> >
>> > I can see us being nervous wrt. backported patches, but is there any strong reason
>> > to not follow this up with a third (non-backported) patch that changes this to:
>> >
>> > +       if (!efi_enabled(EFI_OLD_MEMMAP)) {
>> >
>> > for v4.4?
>> >
>>
>> The 32-bit side essentially implements the old memmap only, which is the the
>> bottom-up version. So old memmap will be implied by 32-bit but not set in the
>> EFI flags, resulting in the reverse enumeration being used with the bottom-up
>> mapping logic. The net result of that is that we create the same problem for
>> 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up in
>> reverse order in the VA mapping.
>>
>> To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP
>> unconditionally on 32-bit x86. Or we could reshuffle variables and conditionals
>> in various other way.
>
> Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.
>
>> [...] I am not convinced that the overall end result will be any better though.
>
> That's not true, we change an obscure, implicit dependency on 32-bit detail to an
> explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a clear
> improvement.
>

OK, fair enough. I agree that setting the flag for 32-bit would be
semantically correct. I will leave it to Matt to comment whether it is
reasonable in terms of changes to other parts of the code.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                         ` <20150929091230.GA2023-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-29 13:52                           ` Matt Fleming
  0 siblings, 0 replies; 48+ messages in thread
From: Matt Fleming @ 2015-09-29 13:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ard Biesheuvel, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On Tue, 29 Sep, at 11:12:30AM, Ingo Molnar wrote:
> 
> * Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> 
> > > except that I don't think
> > > the condition on 64-bit makes any sense:
> > >
> > > +       if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> > >
> > > I can see us being nervous wrt. backported patches, but is there any strong reason
> > > to not follow this up with a third (non-backported) patch that changes this to:
> > >
> > > +       if (!efi_enabled(EFI_OLD_MEMMAP)) {
> > >
> > > for v4.4?
> > >
> > 
> > The 32-bit side essentially implements the old memmap only, which is the the 
> > bottom-up version. So old memmap will be implied by 32-bit but not set in the 
> > EFI flags, resulting in the reverse enumeration being used with the bottom-up 
> > mapping logic. The net result of that is that we create the same problem for 
> > 32-bit that we are trying to solve for 64-bit, i.e., the regions will end up in 
> > reverse order in the VA mapping.
> > 
> > To deobfuscate this particular conditional, we could set EFI_OLD_MEMMAP 
> > unconditionally on 32-bit x86. Or we could reshuffle variables and conditionals 
> > in various other way.
> 
> Setting EFI_OLD_MEMMAP would be fine, if doing that has no bad side effects.
 
Right, I think that's a very good suggestion, because like Ard
mentioned, since EFI_OLD_MEMMAP is implied for 32-bit (there's no
other way to map stuff currently), so it makes sense to force set the
bit.

> > [...] I am not convinced that the overall end result will be any better though.
> 
> That's not true, we change an obscure, implicit dependency on 32-bit detail to an 
> explicit EFI_OLD_MEMMAP flag that shows exactly what's happening. That's a clear 
> improvement.

Agreed.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                           ` <CAKv+Gu8GpaHRL8EZxJCs8dw8ngFVoYNP1Q=hqhZ=LOF+3vXPLw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-29 14:18                             ` Matt Fleming
  0 siblings, 0 replies; 48+ messages in thread
From: Matt Fleming @ 2015-09-29 14:18 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Matt Fleming,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Lee, Chun-Yi, Borislav Petkov,
	Leif Lindholm, Peter Jones, James Bottomley, Matthew Garrett,
	Dave Young, stable-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds,
	Borislav Petkov, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Andrew Morton

On Tue, 29 Sep, at 12:41:23PM, Ard Biesheuvel wrote:
> 
> OK, fair enough. I agree that setting the flag for 32-bit would be
> semantically correct. I will leave it to Matt to comment whether it is
> reasonable in terms of changes to other parts of the code.

It should be pretty minimal. Let me take a swing at the patch.
 
-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]                 ` <20150927104014.GA7631-fF5Pk5pvG8Y@public.gmane.org>
@ 2015-09-29 14:36                   ` Matt Fleming
       [not found]                     ` <20150929143612.GC4401-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Fleming @ 2015-09-29 14:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Ard Biesheuvel, Thomas Gleixner, H. Peter Anvin,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst

On Sun, 27 Sep, at 12:40:14PM, Borislav Petkov wrote:
> On Sun, Sep 27, 2015 at 09:06:44AM +0200, Ingo Molnar wrote:
> > Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
> > in a post that also explains the background so that more people can chime in, not 
> > just people versed in EFI internals? It's very much possible that a bad decision 
> > was made.
> 
> The main reason why we did the additional, top-down mapping was kexec
> kernel wanting to use UEFI runtime facilities too and the braindead
> design of SetVirtualAddressMap() being callable only once per system
> boot. So we had to have stable mappings which are valid in the kexec-ed
> kernel too.
> 
> But this was long time ago and I most certainly have forgotten all the
> details.
 
That's a pretty good summary for x86. I think specifically the reason
we map the EFI memmap entries "backwards" (entry N has higher VA than
entry N+1) is because the code was easier to write that way, but
you'll know better than me ;-)

> And now I'm wondering why didn't we do the 1:1 thing and rebuild the
> exact same EFI pagetable in the kexec-ed kernel? Because when we do
> an EFI call, we switch to the special pagetable so why didn't we make
> the kexec-ed kernel rebuild the 1:1 pagetable which it can use for EFI
> calls...
> 
> Hmm, again, I've forgotten a lot of details so I'm sure Matt will come
> in and say "No, you can't do that because..."

I *think* the only reason was the Apple firmware problem where it
explodes if you pass the 1:1 mappings to SetVirtualAddressMap(). And
obviously people do want to use kexec with Apple machines.

It's probably worth revisiting this whole thing from the x86 side.

-- 
Matt Fleming, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                                       ` <20150928064143.GA7380-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
@ 2015-09-29 21:58                                         ` Laszlo Ersek
  2015-09-30  9:30                                           ` Ard Biesheuvel
  0 siblings, 1 reply; 48+ messages in thread
From: Laszlo Ersek @ 2015-09-29 21:58 UTC (permalink / raw)
  To: Matthew Garrett, Ingo Molnar
  Cc: Andy Lutomirski, H. Peter Anvin, Denys Vlasenko, Leif Lindholm,
	Thomas Gleixner, Borislav Petkov, stable, Andrew Morton,
	Brian Gerst, Dave Young, linux-efi-u79uwXL29TY76Z2rM5mHXA,
	Ard Biesheuvel, Linus Torvalds, Peter Jones, Matt Fleming,
	Matt Fleming, Borislav Petkov, Lee, Chun-Yi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Jordan Justen (Intel address)

On 09/28/15 08:41, Matthew Garrett wrote:
> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>
>> So the question is, what does Windows do?
>
> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
> arguments to the qemu debug port. Unfortunately I'm about to drop
> mostly  offline for a week, otherwise I'd give it a go...

In order to enable the properties table feature in OVMF, two conditions
have to be satisfied:

(a) set gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable to TRUE
(b) build DXE_RUNTIME_DRIVER modules with 4KB section alignment

(a) used to be possible only statically (ie. at build time), but since
edk2 commit ab081a50e5 [1] it can be done dynamically too, on the qemu
command line. See that edk2 commit for details.

This OVMF feature depends on QEMU commit 81b2b81062 [2]. Another patch
from Gabriel is under review that enables such simple settings without
host-side files [3].

(b) is satisfied by the edk2 patch that Ard posted today [4].

Because I know how much people like building OVMF, I uploaded a fresh
OVMF binary (which includes a number of other patches from my personal
tree, but those are irrelevent here), with a matching varstore template,
to [5]. The relevant edk2 patches (ie. the one from Ard [4], and a debug
patch like you mention above) can also be found under [5].

(

I recommend to start QEMU like this:

  # Create a new (empty) varstore for the virtual machine, from the
  # varstore template.

  cp OVMF_VARS.fd my-vars.fd

  # Start the VM, with direct kernel boot.

  qemu-system-x86_64 \
  -m 2048 \
  -M pc,accel=kvm \
  \
  -device qxl-vga \
  \
  -drive if=pflash,format=raw,unit=0,file=OVMF_CODE.fd,readonly=on \
  -drive if=pflash,format=raw,unit=1,file=my-vars.fd \
  \
  -debugcon file:debug.log \
  -global isa-debugcon.iobase=0x402 \
  \
  -chardev stdio,signal=off,mux=on,id=char0 \
  -mon chardev=char0,mode=readline,default \
  -serial chardev:char0 \
  \
  -kernel vmlinuz... \
  -initrd initramfs... \
  -append "root=..." \
  \
  ...

The guest will have 2GB of RAM, it will use KVM, the virtual video card
will be QXL; the OVMF debug log will be written to "debug.log". The key
combination [Control-A C] will switch between the QEMU monitor prompt
and the serial port of the guest. The above should be suitable for rapid
testing of kernels built on the host.

)

Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
guests, with the properties table feature enabled vs. disabled in the
firmware. (All three Windows guests were updated first though.)

All three Windows OSes adapt their SetVirtualAddressMap() calls, when
the feature is enabled in the firmware. However, Windows 8.1 crashes
nonetheless (BSOD, I forget the fault details, sorry). Windows Server
2012 R2 and Windows 10 boot fine.

I uploaded the verbose OVMF log files from all six guest boots to [5].
The tables you might be interested in are dumped at the ends of the log
files.

All three guests had 2GB of RAM. They had different VM configurations,
but between disabling and enabling the properties table feature, no
other knob was tweaked. Therefore the two log files of the same guest
should be comparable against each other, for each guest. For example:

$ colordiff -u ovmf.win10.prop.{disabled,enabled}.log

Because stuff hosted on the web privately tends to go away, I'll quote
that diff here, for posterity:

> --- ovmf.win10.prop.disabled.log	2015-09-29 22:01:45.252126086 +0200
> +++ ovmf.win10.prop.enabled.log	2015-09-29 21:50:54.579475078 +0200
> @@ -24,7 +24,7 @@
>  QemuFwCfg interface is supported.
>  Platform PEIM Loaded
>  CMOS:
> -00: 38 00 01 00 22 00 03 29 09 15 26 02 00 80 00 00
> +00: 48 00 50 00 21 00 03 29 09 15 26 02 00 80 00 00
>  10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00
>  20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  30: FF FF 20 00 00 7F 00 20 30 00 00 00 00 12 00 00
> @@ -1110,6 +1110,17 @@
>  InstallProtocolInterface: 4CF5B200-68B8-4CA5-9EEC-B23E3F50029A 7F271568
>  [Variable]END_OF_DXE is signaled
>  Initialize variable error flag (FF)
> +MemoryProtectionAttribute - 0x0000000000000001
> +Total Image Count - 0x8
> +Dump ImageRecord:
> +  Image[0]: 0x000000007EC26000 - 0x000000000000A000
> +  Image[1]: 0x000000007EC35000 - 0x0000000000008000
> +  Image[2]: 0x000000007EC65000 - 0x000000000000C000
> +  Image[3]: 0x000000007ED76000 - 0x00000000000A3000
> +  Image[4]: 0x000000007FE9E000 - 0x0000000000009000
> +  Image[5]: 0x000000007FEA7000 - 0x000000000000A000
> +  Image[6]: 0x000000007FEB1000 - 0x000000000000C000
> +  Image[7]: 0x000000007FEBD000 - 0x000000000000C000
>  S3Ready!
>  SaveLockBox: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D Buffer=7FEEE000 Length=0xA
>  SetLockBoxAttributes: Guid=DEA652B0-D587-4C54-B5B4-C682E7A0AA3D Attributes=0x1
> @@ -1822,18 +1833,29 @@
>  InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7F0EBA00
>  Loading driver at 0x00010000000 EntryPoint=0x00010014790 bootmgfw.efi
>  InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7F0B8898
> -RuntimeDriverSetVirtualAddressMap: 12 descriptors
> +RuntimeDriverSetVirtualAddressMap: 23 descriptors
>   # Memory Type  PhysicalStart 0x  VirtualStart 0x          Size 0x Attributes
>  -- ------------ ---------------- ---------------- ---------------- --------------------------------------
> - 0 RuntimeData  000000007EC21000 FFFFFFFFFF7E4000 0000000000005000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 1 RuntimeCode  000000007EC26000 FFFFFFFFFF7E9000 000000000000A000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 2 RuntimeData  000000007EC30000 FFFFFFFFFF7F3000 0000000000005000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 3 RuntimeCode  000000007EC35000 FFFFFFFFFF7F8000 0000000000008000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 4 RuntimeData  000000007EC60000 FFFFFFFFFF800000 0000000000005000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 5 RuntimeCode  000000007EC65000 FFFFFFFFFF805000 000000000000C000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 6 RuntimeData  000000007EC9E000 FFFFFFFFFF811000 00000000000D8000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 7 RuntimeCode  000000007ED76000 FFFFFFFFFF8E9000 00000000000A3000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 8 RuntimeCode  000000007FE99000 FFFFFFFFFF98C000 0000000000030000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> - 9 RuntimeData  000000007FEC9000 FFFFFFFFFF9BC000 0000000000024000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> -10 RuntimeData  000000007FFD0000 FFFFFFFFFF9E0000 0000000000020000 [UC|WC|WT|WB|   |  |  |  |  |  |  |RT]
> -11 RuntimeData  00000000FFE00000 FFFFFFFFFFA00000 0000000000200000 [UC|  |  |  |   |  |  |  |  |  |  |RT]
> + 0 RuntimeData  000000007EC21000 FFFFFFFFFF7E4000 0000000000006000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 1 RuntimeCode  000000007EC27000 FFFFFFFFFF7EA000 0000000000007000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> + 2 RuntimeData  000000007EC2E000 FFFFFFFFFF7F1000 0000000000007000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 3 RuntimeData  000000007EC35000 FFFFFFFFFF7F8000 0000000000001000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 4 RuntimeCode  000000007EC36000 FFFFFFFFFF7F9000 0000000000005000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> + 5 RuntimeData  000000007EC3B000 FFFFFFFFFF7FE000 0000000000002000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 6 RuntimeData  000000007EC60000 FFFFFFFFFF800000 0000000000006000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 7 RuntimeCode  000000007EC66000 FFFFFFFFFF806000 0000000000009000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> + 8 RuntimeData  000000007EC6F000 FFFFFFFFFF80F000 0000000000002000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> + 9 RuntimeData  000000007EC9E000 FFFFFFFFFF811000 00000000000D9000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +10 RuntimeCode  000000007ED77000 FFFFFFFFFF8EA000 0000000000097000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> +11 RuntimeData  000000007EE0E000 FFFFFFFFFF981000 000000000000B000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +12 RuntimeData  000000007FE99000 FFFFFFFFFF98C000 0000000000006000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +13 RuntimeCode  000000007FE9F000 FFFFFFFFFF992000 0000000000006000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> +14 RuntimeData  000000007FEA5000 FFFFFFFFFF998000 0000000000003000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +15 RuntimeCode  000000007FEA8000 FFFFFFFFFF99B000 0000000000007000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> +16 RuntimeData  000000007FEAF000 FFFFFFFFFF9A2000 0000000000003000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +17 RuntimeCode  000000007FEB2000 FFFFFFFFFF9A5000 0000000000009000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> +18 RuntimeData  000000007FEBB000 FFFFFFFFFF9AE000 0000000000003000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +19 RuntimeCode  000000007FEBE000 FFFFFFFFFF9B1000 0000000000009000 [UC|WC|WT|WB|   |  |  |  |RO|  |  |RT]
> +20 RuntimeData  000000007FEC7000 FFFFFFFFFF9BA000 0000000000026000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +21 RuntimeData  000000007FFD0000 FFFFFFFFFF9E0000 0000000000020000 [UC|WC|WT|WB|   |  |  |XP|  |  |  |RT]
> +22 RuntimeData  00000000FFE00000 FFFFFFFFFFA00000 0000000000200000 [UC|  |  |  |   |  |  |XP|  |  |  |RT]

Thanks
Laszlo

[1] https://github.com/tianocore/edk2/commit/ab081a50e5
[2] https://github.com/qemu/qemu/commit/81b2b81062
[3] http://news.gmane.org/find-root.php?message_id=%3C1443544141-26568-1-git-send-email-somlo-D+Gtc/HYRWM@public.gmane.org%3E
[4] http://thread.gmane.org/gmane.comp.bios.edk2.devel/2640
[5] http://people.redhat.com/~lersek/ovmf_prop_table/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                                     ` <20150928061646.GA21690-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-09-30  0:54                                       ` H. Peter Anvin
  0 siblings, 0 replies; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-30  0:54 UTC (permalink / raw)
  To: Ingo Molnar, Matthew Garrett
  Cc: Andy Lutomirski, Denys Vlasenko, Leif Lindholm, Thomas Gleixner,
	Borislav Petkov, stable, Andrew Morton, Brian Gerst, Dave Young,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel, Linus Torvalds,
	Peter Jones, Matt Fleming, Matt Fleming, Borislav Petkov, Lee,
	Chun-Yi, linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley

On 09/27/2015 11:16 PM, Ingo Molnar wrote:
> 
> So the question is, what does Windows do?
> 
> PC firmware is a hostile environment for Linux, to be compatible the best we can 
> do is to mimic the environment that the firmware is tested under - i.e. try to use 
> the firmware in the way Windows uses it.
> 

Windows apparently went through the same exercise of breakage followed
by a fix.  It is unknown if Windows will preserve gaps since those are
not manifest on existing firmware.

	-hpa

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]                     ` <20150929143612.GC4401-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-09-30  0:56                       ` H. Peter Anvin
       [not found]                         ` <560B3327.9020801-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-30  0:56 UTC (permalink / raw)
  To: Matt Fleming, Borislav Petkov
  Cc: Ingo Molnar, Ard Biesheuvel, Thomas Gleixner,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst

On 09/29/2015 07:36 AM, Matt Fleming wrote:
>  
> That's a pretty good summary for x86. I think specifically the reason
> we map the EFI memmap entries "backwards" (entry N has higher VA than
> entry N+1) is because the code was easier to write that way, but
> you'll know better than me ;-)
> 

There were two reasons:

1. The code was easier to write.
2. Windows did it that way.

Windows apparently broke and was changed due to this feature, too.

	-hpa

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-27  7:06             ` Ingo Molnar
  2015-09-27 10:40               ` Borislav Petkov
@ 2015-09-30  1:03               ` H. Peter Anvin
  2015-09-30  1:16                 ` Andy Lutomirski
  2015-10-01 10:44                 ` Ingo Molnar
  1 sibling, 2 replies; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-30  1:03 UTC (permalink / raw)
  To: Ingo Molnar, Ard Biesheuvel
  Cc: Matt Fleming, Thomas Gleixner, linux-kernel, linux-efi,
	Leif Lindholm, Catalin Marinas, Will Deacon, stable,
	Matt Fleming, Mark Rutland, Mark Salter, Linus Torvalds,
	Andrew Morton, Andy Lutomirski, Borislav Petkov, Denys Vlasenko,
	Brian Gerst

On 09/27/2015 12:06 AM, Ingo Molnar wrote:
> 
> * Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> 
>>> If we allocate the EFI runtime as a single virtual memory block then issues 
>>> like rounding between sections does not even come up as a problem: we map the 
>>> original offsets and sizes byte by byte.
>>
>> Well, by that reasoning, we should not call SetVirtualAddressMap() in the first 
>> place, and just use the 1:1 mapping UEFI uses natively. This is more than 
>> feasible on arm64, and I actually fought hard against using 
>> SetVirtualAddressMap() at all, but I was overruled by others. I think this is 
>> also trivially possible on X64, since the 1:1 mapping is already active 
>> alongside the VA mapping.
> 
> Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
> in a post that also explains the background so that more people can chime in, not 
> just people versed in EFI internals? It's very much possible that a bad decision 
> was made.
> 

Pro: by far the sanest way to map the UEFI tables.
Con: doesn't actually work (breaks on several known platforms.)

	-hpa

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-30  1:03               ` H. Peter Anvin
@ 2015-09-30  1:16                 ` Andy Lutomirski
  2015-09-30  1:19                   ` H. Peter Anvin
       [not found]                   ` <CALCETrVvwxZN95swaUDG2fkz9brWV3BQkfMDtLJXZkTRpe8nRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-10-01 10:44                 ` Ingo Molnar
  1 sibling, 2 replies; 48+ messages in thread
From: Andy Lutomirski @ 2015-09-30  1:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Ard Biesheuvel, Matt Fleming, Thomas Gleixner,
	linux-kernel, linux-efi, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable, Matt Fleming, Mark Rutland, Mark Salter,
	Linus Torvalds, Andrew Morton, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On Tue, Sep 29, 2015 at 6:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/27/2015 12:06 AM, Ingo Molnar wrote:
>>
>> * Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>
>>>> If we allocate the EFI runtime as a single virtual memory block then issues
>>>> like rounding between sections does not even come up as a problem: we map the
>>>> original offsets and sizes byte by byte.
>>>
>>> Well, by that reasoning, we should not call SetVirtualAddressMap() in the first
>>> place, and just use the 1:1 mapping UEFI uses natively. This is more than
>>> feasible on arm64, and I actually fought hard against using
>>> SetVirtualAddressMap() at all, but I was overruled by others. I think this is
>>> also trivially possible on X64, since the 1:1 mapping is already active
>>> alongside the VA mapping.
>>
>> Could we please re-list all the arguments pro and contra of 1:1 physical mappings,
>> in a post that also explains the background so that more people can chime in, not
>> just people versed in EFI internals? It's very much possible that a bad decision
>> was made.
>>
>
> Pro: by far the sanest way to map the UEFI tables.
> Con: doesn't actually work (breaks on several known platforms.)

Can we at least do 1:1 without an offset on arm64?  Given that Linux
seems to be more of a reference on arm64 than Windows is, maybe that
would give everyone something vaguely sane to work with.

--Andy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-30  1:16                 ` Andy Lutomirski
@ 2015-09-30  1:19                   ` H. Peter Anvin
       [not found]                   ` <CALCETrVvwxZN95swaUDG2fkz9brWV3BQkfMDtLJXZkTRpe8nRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 0 replies; 48+ messages in thread
From: H. Peter Anvin @ 2015-09-30  1:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Ard Biesheuvel, Matt Fleming, Thomas Gleixner,
	linux-kernel, linux-efi, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable, Matt Fleming, Mark Rutland, Mark Salter,
	Linus Torvalds, Andrew Morton, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, Brian Gerst

On 09/29/2015 06:16 PM, Andy Lutomirski wrote:
> 
> Can we at least do 1:1 without an offset on arm64?  Given that Linux
> seems to be more of a reference on arm64 than Windows is, maybe that
> would give everyone something vaguely sane to work with.
> 

I have no idea.  That's a question for the ARM folks.

	-hpa

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]                   ` <CALCETrVvwxZN95swaUDG2fkz9brWV3BQkfMDtLJXZkTRpe8nRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-30  4:24                     ` Ard Biesheuvel
  0 siblings, 0 replies; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-30  4:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Brian Gerst

On 30 September 2015 at 03:16, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Tue, Sep 29, 2015 at 6:03 PM, H. Peter Anvin <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org> wrote:
>> On 09/27/2015 12:06 AM, Ingo Molnar wrote:
>>>
>>> * Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>
>>>>> If we allocate the EFI runtime as a single virtual memory block then issues
>>>>> like rounding between sections does not even come up as a problem: we map the
>>>>> original offsets and sizes byte by byte.
>>>>
>>>> Well, by that reasoning, we should not call SetVirtualAddressMap() in the first
>>>> place, and just use the 1:1 mapping UEFI uses natively. This is more than
>>>> feasible on arm64, and I actually fought hard against using
>>>> SetVirtualAddressMap() at all, but I was overruled by others. I think this is
>>>> also trivially possible on X64, since the 1:1 mapping is already active
>>>> alongside the VA mapping.
>>>
>>> Could we please re-list all the arguments pro and contra of 1:1 physical mappings,
>>> in a post that also explains the background so that more people can chime in, not
>>> just people versed in EFI internals? It's very much possible that a bad decision
>>> was made.
>>>
>>
>> Pro: by far the sanest way to map the UEFI tables.
>> Con: doesn't actually work (breaks on several known platforms.)
>
> Can we at least do 1:1 without an offset on arm64?  Given that Linux
> seems to be more of a reference on arm64 than Windows is, maybe that
> would give everyone something vaguely sane to work with.
>

Yes, as I mentioned before in this thread, on arm64 this is very much
feasible, and it was my strong preference all along. However, the
arguments made by others that outweighed my preference were
(a) x86 uses it
(b) if we don't use it now, we will never be able to start using it
later since it will undoubtedly be broken in /some/ implementation in
the field.

As I also mentioned, the only minor complication is that arm64's VA
space may be configured to be smaller than the physical base of DRAM,
but I already had to address that for the boot ID map and KVM as well.

I will cook up a patch later today.

-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
       [not found]                         ` <560B3327.9020801-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2015-09-30  8:33                           ` Borislav Petkov
  0 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2015-09-30  8:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Matt Fleming, Ingo Molnar, Ard Biesheuvel, Thomas Gleixner,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Leif Lindholm, Catalin Marinas,
	Will Deacon, stable-u79uwXL29TY76Z2rM5mHXA, Matt Fleming,
	Mark Rutland, Mark Salter, Linus Torvalds, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst

On Tue, Sep 29, 2015 at 05:56:07PM -0700, H. Peter Anvin wrote:
> On 09/29/2015 07:36 AM, Matt Fleming wrote:
> >  
> > That's a pretty good summary for x86. I think specifically the reason
> > we map the EFI memmap entries "backwards" (entry N has higher VA than
> > entry N+1) is because the code was easier to write that way, but
> > you'll know better than me ;-)
> > 
> 
> There were two reasons:
> 
> 1. The code was easier to write.
> 2. Windows did it that way.
> 
> Windows apparently broke and was changed due to this feature, too.

So can we do the 1:1 thing again?

I mean, we do create a special pagetable for EFI anyway, we can put in
there whatever we want.

I know, some apple boxes reportedly fail when 1:1 mapping is in use but
we can do the VA mapping as a workaround for them. I.e., have the 1:1
mapping be the default.

Apparently, there's not a single OS or tool which is used by fw writers
to test their brain dumplings. Windoze breakage case-in-point.

Because if there were, we'd simply do what that OS/tool does and be done
with it.

What really makes me climb the walls is when half-cooked, untested fw
hits the wild and we have to support it indefinitely.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-29 21:58                                         ` Laszlo Ersek
@ 2015-09-30  9:30                                           ` Ard Biesheuvel
       [not found]                                             ` <CAKv+Gu-enULj-OMRufB3OPSUVueM1843bD1CdFybaAS7xMUKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2015-09-30  9:30 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Matthew Garrett, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
	Denys Vlasenko, Leif Lindholm, Thomas Gleixner, Borislav Petkov,
	stable, Andrew Morton, Brian Gerst, Dave Young, linux-efi,
	Linus Torvalds, Peter Jones, Matt Fleming, Matt Fleming,
	Borislav Petkov, Lee, Chun-Yi, linux-kernel, James Bottomley,
	Jordan Justen (Intel address)

On 29 September 2015 at 23:58, Laszlo Ersek <lersek@redhat.com> wrote:
> On 09/28/15 08:41, Matthew Garrett wrote:
>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>>
>>> So the question is, what does Windows do?
>>
>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
>> arguments to the qemu debug port. Unfortunately I'm about to drop
>> mostly  offline for a week, otherwise I'd give it a go...
[...]
> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
> guests, with the properties table feature enabled vs. disabled in the
> firmware. (All three Windows guests were updated first though.)
>
> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
> the feature is enabled in the firmware. However, Windows 8.1 crashes
> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
> 2012 R2 and Windows 10 boot fine.
>

Looking at the log, it seems the VA mapping strategy is actually the
same (i.e., bottom-up for Win10), and the difference can be explained
by the differences in the memory map provided by the firmware to the
OS. And indeed, the Win8.1 log shows the following:

 # MemType Phys 0x  Virt 0x  Size 0x Attributes
-- ------- -------- -------- ------- -------------------------------
 0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 6 RtData  7EC60000 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
 8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
 9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
21 RtData  7FFD0000 FF9E4000 0020000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
22 RtData  FFE00000 FF7E4000 0200000 [UC|  |  |  |  |XP|  |  |  |RT]

I.e., the physical addresses increase while the virtual addresses
decrease, and since each consecutive RuntimeCode/RuntimeData pair
constitutes a PE/COFF image (.text and .data, respectively), the
PE/COFF images appear corrupted in the virtual space.

> I uploaded the verbose OVMF log files from all six guest boots to [5].
> The tables you might be interested in are dumped at the ends of the log
> files.
>
> All three guests had 2GB of RAM. They had different VM configurations,
> but between disabling and enabling the properties table feature, no
> other knob was tweaked. Therefore the two log files of the same guest
> should be comparable against each other, for each guest. For example:
>
> $ colordiff -u ovmf.win10.prop.{disabled,enabled}.log
>
> Because stuff hosted on the web privately tends to go away, I'll quote
> that diff here, for posterity:
>
>> --- ovmf.win10.prop.disabled.log      2015-09-29 22:01:45.252126086 +0200
>> +++ ovmf.win10.prop.enabled.log       2015-09-29 21:50:54.579475078 +0200
[...]

Thanks a lot for taking the time, this is very useful info.

-- 
Ard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
       [not found]                                             ` <CAKv+Gu-enULj-OMRufB3OPSUVueM1843bD1CdFybaAS7xMUKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-30 16:43                                               ` Andy Lutomirski
  2015-09-30 17:24                                                 ` James Bottomley
  0 siblings, 1 reply; 48+ messages in thread
From: Andy Lutomirski @ 2015-09-30 16:43 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Laszlo Ersek, Matthew Garrett, Ingo Molnar, H. Peter Anvin,
	Denys Vlasenko, Leif Lindholm, Thomas Gleixner, Borislav Petkov,
	stable, Andrew Morton, Brian Gerst, Dave Young,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Linus Torvalds, Peter Jones,
	Matt Fleming, Matt Fleming, Borislav Petkov, Lee, Chun-Yi,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, James Bottomley,
	Jordan Justen (Intel address)

On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 29 September 2015 at 23:58, Laszlo Ersek <lersek-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On 09/28/15 08:41, Matthew Garrett wrote:
>>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
>>>
>>>> So the question is, what does Windows do?
>>>
>>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
>>> arguments to the qemu debug port. Unfortunately I'm about to drop
>>> mostly  offline for a week, otherwise I'd give it a go...
> [...]
>> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
>> guests, with the properties table feature enabled vs. disabled in the
>> firmware. (All three Windows guests were updated first though.)
>>
>> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
>> the feature is enabled in the firmware. However, Windows 8.1 crashes
>> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
>> 2012 R2 and Windows 10 boot fine.
>>
>
> Looking at the log, it seems the VA mapping strategy is actually the
> same (i.e., bottom-up for Win10), and the difference can be explained
> by the differences in the memory map provided by the firmware to the
> OS. And indeed, the Win8.1 log shows the following:
>
>  # MemType Phys 0x  Virt 0x  Size 0x Attributes
> -- ------- -------- -------- ------- -------------------------------
>  0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  6 RtData  7EC60000 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
>  8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
>  9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> 20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 21 RtData  7FFD0000 FF9E4000 0020000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> 22 RtData  FFE00000 FF7E4000 0200000 [UC|  |  |  |  |XP|  |  |  |RT]
>
> I.e., the physical addresses increase while the virtual addresses
> decrease, and since each consecutive RuntimeCode/RuntimeData pair
> constitutes a PE/COFF image (.text and .data, respectively), the
> PE/COFF images appear corrupted in the virtual space.

All of this garbage makes me want to ask a rhetorical question:

Why on Earth did anyone think it's a good idea to invoke EFI functions
at CPL0 once the OS is booted?

And a more practical question:

Do we actually have to invoke EFI functions at CPL0?

I really mean it.  Sure, for things like reboot where we give up
control and don't get it back, we need to do that.  But for things
like variable access, the EFI code should really only need access to
EFI memor (with a known PA -> VA map) and the ability to trigger an
SMI.  Doing it at CPL3 could require more fixups than would really
make sense, but could we virtualize it instead?

Actually, CPL3 + IOPL3 just might work.

Heck, on mixed-mode, we're already invoke EFI functions in compat
mode, and that seems okay, so those functions can't be poking at any
CPU state that varies between long and 32-bit modes.

--Andy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime
  2015-09-30 16:43                                               ` Andy Lutomirski
@ 2015-09-30 17:24                                                 ` James Bottomley
  0 siblings, 0 replies; 48+ messages in thread
From: James Bottomley @ 2015-09-30 17:24 UTC (permalink / raw)
  To: luto
  Cc: matt, mingo, pjones, ard.biesheuvel, jlee, torvalds, tglx,
	lersek, dyoung, linux-kernel, stable, jordan.l.justen, akpm, hpa,
	brgerst, linux-efi, bp, bp, dvlasenk

On Wed, 2015-09-30 at 09:43 -0700, Andy Lutomirski wrote:
> On Wed, Sep 30, 2015 at 2:30 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 29 September 2015 at 23:58, Laszlo Ersek <lersek@redhat.com> wrote:
> >> On 09/28/15 08:41, Matthew Garrett wrote:
> >>> On Mon, Sep 28, 2015 at 08:16:46AM +0200, Ingo Molnar wrote:
> >>>
> >>>> So the question is, what does Windows do?
> >>>
> >>> It's pretty trivial to hack OVMF to dump the SetVirtualAddressMap()
> >>> arguments to the qemu debug port. Unfortunately I'm about to drop
> >>> mostly  offline for a week, otherwise I'd give it a go...
> > [...]
> >> Then I booted my Windows Server 2012 R2, Windows 8.1, and Windows 10
> >> guests, with the properties table feature enabled vs. disabled in the
> >> firmware. (All three Windows guests were updated first though.)
> >>
> >> All three Windows OSes adapt their SetVirtualAddressMap() calls, when
> >> the feature is enabled in the firmware. However, Windows 8.1 crashes
> >> nonetheless (BSOD, I forget the fault details, sorry). Windows Server
> >> 2012 R2 and Windows 10 boot fine.
> >>
> >
> > Looking at the log, it seems the VA mapping strategy is actually the
> > same (i.e., bottom-up for Win10), and the difference can be explained
> > by the differences in the memory map provided by the firmware to the
> > OS. And indeed, the Win8.1 log shows the following:
> >
> >  # MemType Phys 0x  Virt 0x  Size 0x Attributes
> > -- ------- -------- -------- ------- -------------------------------
> >  0 RtData  7EC21000 FFBFA000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  1 RtCode  7EC27000 FFBF3000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  2 RtData  7EC2E000 FFBEC000 0007000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  3 RtData  7EC35000 FFBEB000 0001000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  4 RtCode  7EC36000 FFBE6000 0005000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  5 RtData  7EC3B000 FFBE4000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  6 RtData  7EC60000 FFBDE000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  7 RtCode  7EC66000 FFBD5000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> >  8 RtData  7EC6F000 FFBD3000 0002000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> >  9 RtData  7EC9E000 FFAFA000 00D9000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 10 RtCode  7ED77000 FFA63000 0097000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 11 RtData  7EE0E000 FFA58000 000B000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 12 RtData  7FE99000 FFA52000 0006000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 13 RtCode  7FE9F000 FFA4C000 0006000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 14 RtData  7FEA5000 FFA49000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 15 RtCode  7FEA8000 FFA42000 0007000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 16 RtData  7FEAF000 FFA3F000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 17 RtCode  7FEB2000 FFA36000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 18 RtData  7FEBB000 FFA33000 0003000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 19 RtCode  7FEBE000 FFA2A000 0009000 [UC|WC|WT|WB|  |  |RO|  |  |RT]
> > 20 RtData  7FEC7000 FFA04000 0026000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 21 RtData  7FFD0000 FF9E4000 0020000 [UC|WC|WT|WB|  |XP|  |  |  |RT]
> > 22 RtData  FFE00000 FF7E4000 0200000 [UC|  |  |  |  |XP|  |  |  |RT]
> >
> > I.e., the physical addresses increase while the virtual addresses
> > decrease, and since each consecutive RuntimeCode/RuntimeData pair
> > constitutes a PE/COFF image (.text and .data, respectively), the
> > PE/COFF images appear corrupted in the virtual space.
> 
> All of this garbage makes me want to ask a rhetorical question:
> 
> Why on Earth did anyone think it's a good idea to invoke EFI functions
> at CPL0 once the OS is booted?

I'm afraid the originators of EFI (Intel) look on it as a DOS
replacement ... with the same OS support.

> And a more practical question:
> 
> Do we actually have to invoke EFI functions at CPL0?
> 
> I really mean it.  Sure, for things like reboot where we give up
> control and don't get it back, we need to do that.  But for things
> like variable access, the EFI code should really only need access to
> EFI memor (with a known PA -> VA map) and the ability to trigger an
> SMI.  Doing it at CPL3 could require more fixups than would really
> make sense, but could we virtualize it instead?
> 
> Actually, CPL3 + IOPL3 just might work.
> 
> Heck, on mixed-mode, we're already invoke EFI functions in compat
> mode, and that seems okay, so those functions can't be poking at any
> CPU state that varies between long and 32-bit modes.

It's hard.  The EFI functions expect to interact directly with kernel
memory, which they can't at CPL3.  We could vector all that through a
CPL3 readable buffer but anything within EFI that uses privileged
instructions will fault and we'll have to handle it ... this really
sounds like a can of worms.  Especially as windows will be no help
testing all of this because it will call in at CPL0.

James


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions
  2015-09-30  1:03               ` H. Peter Anvin
  2015-09-30  1:16                 ` Andy Lutomirski
@ 2015-10-01 10:44                 ` Ingo Molnar
  1 sibling, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2015-10-01 10:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ard Biesheuvel, Matt Fleming, Thomas Gleixner, linux-kernel,
	linux-efi, Leif Lindholm, Catalin Marinas, Will Deacon, stable,
	Matt Fleming, Mark Rutland, Mark Salter, Linus Torvalds,
	Andrew Morton, Andy Lutomirski, Borislav Petkov, Denys Vlasenko,
	Brian Gerst


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 09/27/2015 12:06 AM, Ingo Molnar wrote:
> > 
> > * Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > 
> >>> If we allocate the EFI runtime as a single virtual memory block then issues 
> >>> like rounding between sections does not even come up as a problem: we map the 
> >>> original offsets and sizes byte by byte.
> >>
> >> Well, by that reasoning, we should not call SetVirtualAddressMap() in the first 
> >> place, and just use the 1:1 mapping UEFI uses natively. This is more than 
> >> feasible on arm64, and I actually fought hard against using 
> >> SetVirtualAddressMap() at all, but I was overruled by others. I think this is 
> >> also trivially possible on X64, since the 1:1 mapping is already active 
> >> alongside the VA mapping.
> > 
> > Could we please re-list all the arguments pro and contra of 1:1 physical mappings, 
> > in a post that also explains the background so that more people can chime in, not 
> > just people versed in EFI internals? It's very much possible that a bad decision 
> > was made.
> > 
> 
> Pro: by far the sanest way to map the UEFI tables.
> Con: doesn't actually work (breaks on several known platforms.)

You knew this next question was coming: in what way does it break on known 
platforms?

I.e. do those platforms require a SetVirtualAddressMap() call and break if one 
does not come?

Note that there's 3 models possible:

 - pure 1:1
 - 1:1 plus offset, with SetVirtualAddressMap(offset)
 - bottom up allocator

I don't think we want 'pure' 1:1 physical/virtual (for security reasons, etc.).

So the question is, in what way does our current proposed bottom-up allocator 
differ from 1:1 plus offset? My impression is that they are mostly identical.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 2/2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions
       [not found] ` <1435659443-17625-1-git-send-email-ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
@ 2015-06-30 10:17   ` Ard Biesheuvel
  0 siblings, 0 replies; 48+ messages in thread
From: Ard Biesheuvel @ 2015-06-30 10:17 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA,
	matt.fleming-ral2JQCrhuEAvxtiuMwx3w, mark.rutland-5wv7dgnIgG8
  Cc: leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	roy.franz-QSEj5FYQhm4dnm+yROfE0A, msalter-H+wXaHxf7aLQT0dZR+AlfA,
	lersek-H+wXaHxf7aLQT0dZR+AlfA, Ard Biesheuvel

The new Properties Table feature introduced in UEFIv2.5 may split
memory regions that cover PE/COFF memory images into separate code
and data regions.

Since the relative offset of PE/COFF .text and .data segments cannot
be changed on the fly, this means that we can no longer pad out those
regions to be mappable using 64 KB pages.
Unfortunately, there is no annotation in the UEFI memory map that
identifies data regions that were split off from a code region, so we
must apply this logic to all runtime code and data regions.

So instead of rounding each memory region to 64 KB alignment at both
ends, only round down regions that are not directly preceded by another
runtime region. Since the UEFI spec does not mandate that the memory map
be sorted, this means we also need to sort it first.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/firmware/efi/libstub/arm-stub.c | 58 +++++++++++++++-----
 1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
index e29560e6b40b..dd9e2addc9fb 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/efi.h>
+#include <linux/sort.h>
 #include <asm/efi.h>
 
 #include "efistub.h"
@@ -305,6 +306,13 @@ fail:
  */
 #define EFI_RT_VIRTUAL_BASE	0x40000000
 
+static int cmp_mem_desc(const void *a, const void *b)
+{
+	efi_memory_desc_t const *left = a, *right = b;
+
+	return (left->phys_addr > right->phys_addr) ? 1 : -1;
+}
+
 /*
  * efi_get_virtmap() - create a virtual mapping for the EFI memory map
  *
@@ -317,33 +325,53 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
 		     int *count)
 {
 	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
-	efi_memory_desc_t *out = runtime_map;
+	efi_memory_desc_t *in, *prev = NULL, *out = runtime_map;
 	int l;
 
-	for (l = 0; l < map_size; l += desc_size) {
-		efi_memory_desc_t *in = (void *)memory_map + l;
+	/*
+	 * To work around potential issues with the Properties Table feature
+	 * introduced in UEFI 2.5 (which may split PE/COFF memory images
+	 * into several RuntimeServicesCode and RuntimeServicesData regions
+	 * whose relative offset in memory needs to be retained), we need to
+	 * sort the memory map before traversing it, and avoid padding out those
+	 * regions to 64 KB granularity.
+	 */
+	sort(memory_map, map_size / desc_size, desc_size, cmp_mem_desc, NULL);
+
+	for (l = 0; l < map_size; l += desc_size, prev = in) {
 		u64 paddr, size;
 
+		in = (void *)memory_map + l;
 		if (!(in->attribute & EFI_MEMORY_RUNTIME))
 			continue;
 
+		paddr = in->phys_addr;
+		size = in->num_pages * EFI_PAGE_SIZE;
+
 		/*
 		 * Make the mapping compatible with 64k pages: this allows
 		 * a 4k page size kernel to kexec a 64k page size kernel and
 		 * vice versa.
 		 */
-		paddr = round_down(in->phys_addr, SZ_64K);
-		size = round_up(in->num_pages * EFI_PAGE_SIZE +
-				in->phys_addr - paddr, SZ_64K);
-
-		/*
-		 * Avoid wasting memory on PTEs by choosing a virtual base that
-		 * is compatible with section mappings if this region has the
-		 * appropriate size and physical alignment. (Sections are 2 MB
-		 * on 4k granule kernels)
-		 */
-		if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
-			efi_virt_base = round_up(efi_virt_base, SZ_2M);
+		if (!prev ||
+		    !(paddr == (prev->phys_addr +
+				prev->num_pages * EFI_PAGE_SIZE) &&
+		      (prev->attribute & EFI_MEMORY_RUNTIME))) {
+
+			paddr = round_down(in->phys_addr, SZ_64K);
+			size += in->phys_addr - paddr;
+
+			/*
+			 * Avoid wasting memory on PTEs by choosing a virtual
+			 * base that is compatible with section mappings if this
+			 * region has the appropriate size and physical
+			 * alignment. (Sections are 2 MB on 4k granule kernels)
+			 */
+			if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
+				efi_virt_base = round_up(efi_virt_base, SZ_2M);
+			else
+				efi_virt_base = round_up(efi_virt_base, SZ_64K);
+		}
 
 		in->virt_addr = efi_virt_base + in->phys_addr - paddr;
 		efi_virt_base += size;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-10-01 10:44 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-25 22:02 [GIT PULL 0/2] EFI urgent fixes Matt Fleming
2015-09-25 22:02 ` [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime Matt Fleming
     [not found]   ` <1443218539-7610-2-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-09-26  5:56     ` Ingo Molnar
     [not found]       ` <20150926055643.GA25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-26  6:44         ` Ard Biesheuvel
2015-09-26 17:01         ` Andy Lutomirski
2015-09-26 17:20           ` H. Peter Anvin
     [not found]             ` <E40EBF14-FD9E-49C0-A7FB-951958F72F79-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2015-09-26 18:15               ` Ard Biesheuvel
2015-09-26 19:49                 ` H. Peter Anvin
2015-09-26 19:57                   ` Matt Fleming
     [not found]                     ` <20150926195755.GC3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-09-26 20:09                       ` Ard Biesheuvel
2015-09-26 20:19                         ` H. Peter Anvin
2015-09-27 16:30                           ` Andy Lutomirski
     [not found]                             ` <CALCETrUA6k-oHzuWA54VzEA1v63b5--JPCCuBoOxLisOtjUn+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-27 18:06                               ` Matthew Garrett
     [not found]                                 ` <20150927180633.GA29466-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
2015-09-28  6:16                                   ` Ingo Molnar
2015-09-28  6:41                                     ` Matthew Garrett
     [not found]                                       ` <20150928064143.GA7380-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>
2015-09-29 21:58                                         ` Laszlo Ersek
2015-09-30  9:30                                           ` Ard Biesheuvel
     [not found]                                             ` <CAKv+Gu-enULj-OMRufB3OPSUVueM1843bD1CdFybaAS7xMUKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-30 16:43                                               ` Andy Lutomirski
2015-09-30 17:24                                                 ` James Bottomley
     [not found]                                     ` <20150928061646.GA21690-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-30  0:54                                       ` H. Peter Anvin
2015-09-26 19:55               ` Matt Fleming
2015-09-27  6:50           ` Ingo Molnar
2015-09-26 13:43       ` Matt Fleming
     [not found]         ` <20150926134329.GA3144-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-09-27  7:03           ` Ingo Molnar
2015-09-28  6:49             ` Ard Biesheuvel
     [not found]               ` <CAKv+Gu8M2pmzfeA=NT4c44E6-PQvRKHZjEJt78mryGcp6bBD8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-28  8:22                 ` Ingo Molnar
     [not found]                   ` <20150928082245.GA28796-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-28  9:51                     ` Ard Biesheuvel
2015-09-29  9:12                       ` Ingo Molnar
2015-09-29 10:41                         ` Ard Biesheuvel
     [not found]                           ` <CAKv+Gu8GpaHRL8EZxJCs8dw8ngFVoYNP1Q=hqhZ=LOF+3vXPLw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-29 14:18                             ` Matt Fleming
     [not found]                         ` <20150929091230.GA2023-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-29 13:52                           ` Matt Fleming
2015-09-25 22:02 ` [PATCH 2/2] arm64/efi: Don't pad between EFI_MEMORY_RUNTIME regions Matt Fleming
     [not found]   ` <1443218539-7610-3-git-send-email-matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-09-26  6:01     ` Ingo Molnar
     [not found]       ` <20150926060159.GB25877-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-09-26  7:08         ` Ard Biesheuvel
     [not found]           ` <CAKv+Gu_JRCmx5qKq7SDDq73BegVs3LKZwe+OdvTKb4YD_CURyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-27  7:06             ` Ingo Molnar
2015-09-27 10:40               ` Borislav Petkov
2015-09-28  6:20                 ` Ingo Molnar
2015-09-29  9:31                 ` Dave Young
2015-09-29 10:24                   ` Borislav Petkov
     [not found]                 ` <20150927104014.GA7631-fF5Pk5pvG8Y@public.gmane.org>
2015-09-29 14:36                   ` Matt Fleming
     [not found]                     ` <20150929143612.GC4401-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-09-30  0:56                       ` H. Peter Anvin
     [not found]                         ` <560B3327.9020801-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
2015-09-30  8:33                           ` Borislav Petkov
2015-09-30  1:03               ` H. Peter Anvin
2015-09-30  1:16                 ` Andy Lutomirski
2015-09-30  1:19                   ` H. Peter Anvin
     [not found]                   ` <CALCETrVvwxZN95swaUDG2fkz9brWV3BQkfMDtLJXZkTRpe8nRg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-30  4:24                     ` Ard Biesheuvel
2015-10-01 10:44                 ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2015-06-30 10:17 [PATCH 0/2] arm64/efi: adapt to UEFI 2.5 properties table changes Ard Biesheuvel
     [not found] ` <1435659443-17625-1-git-send-email-ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2015-06-30 10:17   ` [PATCH 2/2] arm64/efi: don't pad between EFI_MEMORY_RUNTIME regions Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).