linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init()
@ 2017-01-05 12:51 Nicolai Stange
  2017-01-05 12:51 ` [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve " Nicolai Stange
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Nicolai Stange @ 2017-01-05 12:51 UTC (permalink / raw)
  To: Ingo Molnar, Matt Fleming
  Cc: Thomas Gleixner, H. Peter Anvin, x86, Mika Penttilä,
	Dan Williams, Ard Biesheuvel, Dave Young, linux-efi,
	linux-kernel, Nicolai Stange

With commit 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid
copying image data"), efi_bgrt_init() calls into the memblock allocator
through efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init()
has been called.

Indeed, KASAN reports a bad read access later on in
efi_free_boot_services():

  BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
            at addr ffff88022de12740
  Read of size 4 by task swapper/0/0
  page:ffffea0008b78480 count:0 mapcount:-127
  mapping:          (null) index:0x1 flags: 0x5fff8000000000()
  [...]
  Call Trace:
   dump_stack+0x68/0x9f
   kasan_report_error+0x4c8/0x500
   kasan_report+0x58/0x60
   __asan_load4+0x61/0x80
   efi_free_boot_services+0xae/0x24c
   start_kernel+0x527/0x562
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0x157/0x17a
   start_cpu+0x5/0x14

The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().

Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.

So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.

Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.

Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
Applicable to next-20170105.
Boot-tested with an efi_fake_mem= parameter on x86_64.

v2 of this patch got a

  Tested-by: Dan Williams <dan.j.williams@intel.com>


Changes to v2:
 - Use the new efi_memmap_alloc() from efi_fake_memmap(), too.
 - Update commit message accordingly.

 arch/x86/platform/efi/quirks.c  |  4 ++--
 drivers/firmware/efi/fake_mem.c |  3 +--
 drivers/firmware/efi/memmap.c   | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/efi.h             |  1 +
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 10aca63a50d7..30031d5293c4 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -214,7 +214,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 
 	new_size = efi.memmap.desc_size * num_entries;
 
-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
@@ -355,7 +355,7 @@ void __init efi_free_boot_services(void)
 	}
 
 	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 520a40e5e0e4..6c7d60c239b5 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -71,8 +71,7 @@ void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = memblock_alloc(efi.memmap.desc_size * new_nr_map,
-					PAGE_SIZE);
+	new_memmap_phy = efi_memmap_alloc(new_nr_map);
 	if (!new_memmap_phy)
 		return;
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index f03ddecd232b..78686443cb37 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -9,6 +9,44 @@
 #include <linux/efi.h>
 #include <linux/io.h>
 #include <asm/early_ioremap.h>
+#include <linux/memblock.h>
+#include <linux/slab.h>
+
+static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
+{
+	return memblock_alloc(size, 0);
+}
+
+static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
+{
+	unsigned int order = get_order(size);
+	struct page *p = alloc_pages(GFP_KERNEL, order);
+
+	if (!p)
+		return 0;
+
+	return PFN_PHYS(page_to_pfn(p));
+}
+
+/**
+ * efi_memmap_alloc - Allocate memory for the EFI memory map
+ * @num_entries: Number of entries in the allocated map.
+ *
+ * Depending on whether mm_init() has already been invoked or not,
+ * either memblock or "normal" page allocation is used.
+ *
+ * Returns the physical address of the allocated memory map on
+ * success, zero on failure.
+ */
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+{
+	unsigned long size = num_entries * efi.memmap.desc_size;
+
+	if (slab_is_available())
+		return __efi_memmap_alloc_late(size);
+
+	return __efi_memmap_alloc_early(size);
+}
 
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
diff --git a/include/linux/efi.h b/include/linux/efi.h
index a07a476178cd..0c5420208c40 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -950,6 +950,7 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-05 12:51 [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init() Nicolai Stange
@ 2017-01-05 12:51 ` Nicolai Stange
  2017-01-06  8:35   ` Ard Biesheuvel
  2017-01-05 21:30 ` [PATCH v3 1/2] x86/efi: don't allocate memmap " Dan Williams
  2017-01-09  6:43 ` [tip:efi/urgent] x86/efi: Don't " tip-bot for Nicolai Stange
  2 siblings, 1 reply; 12+ messages in thread
From: Nicolai Stange @ 2017-01-05 12:51 UTC (permalink / raw)
  To: Ingo Molnar, Matt Fleming
  Cc: Thomas Gleixner, H. Peter Anvin, x86, Mika Penttilä,
	Dan Williams, Ard Biesheuvel, Dave Young, linux-efi,
	linux-kernel, Nicolai Stange

Before invoking the arch specific handler, efi_mem_reserve() reserves
the given memory region through memblock.

efi_mem_reserve() can get called after mm_init() though -- through
efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
not be used anymore.

Let efi_mem_reserve() check whether memblock is dead and not do the
reservation if so. Emit a warning from the generic efi_arch mem_reserve()
in this case: if the architecture doesn't provide any other means of
registering the region as reserved, the operation would be a nop.

Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
Applicable to next-20170105.
No changes to v2.
Boot-tested on x86_64.

 drivers/firmware/efi/efi.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 92914801e388..158a8df2f4af 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
 	return end;
 }
 
-void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
+void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
+{
+	WARN(slab_is_available(), "efi_mem_reserve() has no effect");
+}
 
 /**
  * efi_mem_reserve - Reserve an EFI memory region
@@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
  */
 void __init efi_mem_reserve(phys_addr_t addr, u64 size)
 {
-	if (!memblock_is_region_reserved(addr, size))
+	if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
 		memblock_reserve(addr, size);
 
 	/*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init()
  2017-01-05 12:51 [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init() Nicolai Stange
  2017-01-05 12:51 ` [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve " Nicolai Stange
@ 2017-01-05 21:30 ` Dan Williams
  2017-01-09  6:43 ` [tip:efi/urgent] x86/efi: Don't " tip-bot for Nicolai Stange
  2 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2017-01-05 21:30 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Ingo Molnar, Matt Fleming, Thomas Gleixner, H. Peter Anvin, x86,
	Mika Penttilä,
	Ard Biesheuvel, Dave Young, linux-efi, linux-kernel

On Thu, Jan 5, 2017 at 4:51 AM, Nicolai Stange <nicstange@gmail.com> wrote:
> With commit 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid
> copying image data"), efi_bgrt_init() calls into the memblock allocator
> through efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init()
> has been called.
>
> Indeed, KASAN reports a bad read access later on in
> efi_free_boot_services():
>
>   BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
>             at addr ffff88022de12740
>   Read of size 4 by task swapper/0/0
>   page:ffffea0008b78480 count:0 mapcount:-127
>   mapping:          (null) index:0x1 flags: 0x5fff8000000000()
>   [...]
>   Call Trace:
>    dump_stack+0x68/0x9f
>    kasan_report_error+0x4c8/0x500
>    kasan_report+0x58/0x60
>    __asan_load4+0x61/0x80
>    efi_free_boot_services+0xae/0x24c
>    start_kernel+0x527/0x562
>    x86_64_start_reservations+0x24/0x26
>    x86_64_start_kernel+0x157/0x17a
>    start_cpu+0x5/0x14
>
> The instruction at the given address is the first read from the memmap's
> memory, i.e. the read of md->type in efi_free_boot_services().
>
> Note that the writes earlier in efi_arch_mem_reserve() don't splat because
> they're done through early_memremap()ed addresses.
>
> So, after memblock is gone, allocations should be done through the "normal"
> page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
> it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
> of consistency, from efi_fake_memmap() as well.
>
> Note that for the latter, the memmap allocations cease to be page aligned.
> This isn't needed though.
>
> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
> ---
> Applicable to next-20170105.
> Boot-tested with an efi_fake_mem= parameter on x86_64.
>
> v2 of this patch got a
>
>   Tested-by: Dan Williams <dan.j.williams@intel.com>
>
>
> Changes to v2:
>  - Use the new efi_memmap_alloc() from efi_fake_memmap(), too.
>  - Update commit message accordingly.
>
>  arch/x86/platform/efi/quirks.c  |  4 ++--
>  drivers/firmware/efi/fake_mem.c |  3 +--
>  drivers/firmware/efi/memmap.c   | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/efi.h             |  1 +
>  4 files changed, 42 insertions(+), 4 deletions(-)

v3 works as well:

Tested-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-05 12:51 ` [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve " Nicolai Stange
@ 2017-01-06  8:35   ` Ard Biesheuvel
  2017-01-06 13:02     ` Nicolai Stange
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2017-01-06  8:35 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Ingo Molnar, Matt Fleming, Thomas Gleixner, H. Peter Anvin, x86,
	Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
> Before invoking the arch specific handler, efi_mem_reserve() reserves
> the given memory region through memblock.
>
> efi_mem_reserve() can get called after mm_init() though -- through
> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
> not be used anymore.
>
> Let efi_mem_reserve() check whether memblock is dead and not do the
> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
> in this case: if the architecture doesn't provide any other means of
> registering the region as reserved, the operation would be a nop.
>
> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
> ---
> Applicable to next-20170105.
> No changes to v2.
> Boot-tested on x86_64.
>
>  drivers/firmware/efi/efi.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 92914801e388..158a8df2f4af 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>         return end;
>  }
>
> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> +{
> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
> +}
>
>  /**
>   * efi_mem_reserve - Reserve an EFI memory region
> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>   */
>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>  {
> -       if (!memblock_is_region_reserved(addr, size))
> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>                 memblock_reserve(addr, size);
>

I share Dave's concern: on x86, this will silently ignore the
reservation if slab_is_available() returns true, so we should at least
warn here. I don't think this patch solves any known issues, so I'd
rather defer this for now, and pick up the discussion when Matt is
back,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06  8:35   ` Ard Biesheuvel
@ 2017-01-06 13:02     ` Nicolai Stange
  2017-01-06 16:41       ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolai Stange @ 2017-01-06 13:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolai Stange, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	H. Peter Anvin, x86, Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:

> On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
>> Before invoking the arch specific handler, efi_mem_reserve() reserves
>> the given memory region through memblock.
>>
>> efi_mem_reserve() can get called after mm_init() though -- through
>> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
>> not be used anymore.
>>
>> Let efi_mem_reserve() check whether memblock is dead and not do the
>> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
>> in this case: if the architecture doesn't provide any other means of
>> registering the region as reserved, the operation would be a nop.
>>
>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>> ---
>> Applicable to next-20170105.
>> No changes to v2.
>> Boot-tested on x86_64.
>>
>>  drivers/firmware/efi/efi.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>> index 92914801e388..158a8df2f4af 100644
>> --- a/drivers/firmware/efi/efi.c
>> +++ b/drivers/firmware/efi/efi.c
>> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>>         return end;
>>  }
>>
>> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>> +{
>> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
>> +}
>>
>>  /**
>>   * efi_mem_reserve - Reserve an EFI memory region
>> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>   */
>>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>>  {
>> -       if (!memblock_is_region_reserved(addr, size))
>> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>>                 memblock_reserve(addr, size);
>>

More context:

            /*
             * Some architectures (x86) reserve all boot services ranges
             * until efi_free_boot_services() because of buggy firmware
             * implementations. This means the above memblock_reserve() is
             * superfluous on x86 and instead what it needs to do is
             * ensure the @start, @size is not freed.
             */
            efi_arch_mem_reserve(addr, size);
    }


> I share Dave's concern: on x86, this will silently ignore the
> reservation if slab_is_available() returns true,

AFAICS, x86 has got an efi_arch_mem_reserve() which doesn't ignore the
reservation at any stage.

The default implementation of efi_arch_mem_reserve() used on ARM is
empty though ... 

> so we should at least warn here.

... and this patch adds a WARN() to the empty stub.


> I don't think this patch solves any known issues, so I'd
> rather defer this for now, and pick up the discussion when Matt is
> back,

I'm fine with either way and yes, no splat has been observed in the
wild.

Just to make it explicit: the issue addressed here is a potential
use-after-free (both, read and write) on memblock.reserved.regions in
case of CONFIG_ARCH_DISCARD_MEMBLOCK=y. It would certainly make sense to
clarify the commit description in the next iteration...

Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06 13:02     ` Nicolai Stange
@ 2017-01-06 16:41       ` Ard Biesheuvel
  2017-01-06 17:46         ` Nicolai Stange
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2017-01-06 16:41 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Ingo Molnar, Matt Fleming, Thomas Gleixner, H. Peter Anvin, x86,
	Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

On 6 January 2017 at 13:02, Nicolai Stange <nicstange@gmail.com> wrote:
> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>
>> On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
>>> Before invoking the arch specific handler, efi_mem_reserve() reserves
>>> the given memory region through memblock.
>>>
>>> efi_mem_reserve() can get called after mm_init() though -- through
>>> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
>>> not be used anymore.
>>>
>>> Let efi_mem_reserve() check whether memblock is dead and not do the
>>> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
>>> in this case: if the architecture doesn't provide any other means of
>>> registering the region as reserved, the operation would be a nop.
>>>
>>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>>> ---
>>> Applicable to next-20170105.
>>> No changes to v2.
>>> Boot-tested on x86_64.
>>>
>>>  drivers/firmware/efi/efi.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>>> index 92914801e388..158a8df2f4af 100644
>>> --- a/drivers/firmware/efi/efi.c
>>> +++ b/drivers/firmware/efi/efi.c
>>> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>>>         return end;
>>>  }
>>>
>>> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>>> +{
>>> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
>>> +}
>>>
>>>  /**
>>>   * efi_mem_reserve - Reserve an EFI memory region
>>> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>   */
>>>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>>>  {
>>> -       if (!memblock_is_region_reserved(addr, size))
>>> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>>>                 memblock_reserve(addr, size);
>>>
>
> More context:
>
>             /*
>              * Some architectures (x86) reserve all boot services ranges
>              * until efi_free_boot_services() because of buggy firmware
>              * implementations. This means the above memblock_reserve() is
>              * superfluous on x86 and instead what it needs to do is
>              * ensure the @start, @size is not freed.
>              */
>             efi_arch_mem_reserve(addr, size);
>     }
>
>
>> I share Dave's concern: on x86, this will silently ignore the
>> reservation if slab_is_available() returns true,
>
> AFAICS, x86 has got an efi_arch_mem_reserve() which doesn't ignore the
> reservation at any stage.
>

Thanks for the clarification. But my concern is whether changing the
EFI memory map is going to have any effect at this stage, i.e., after
slab_is_available() returns true: haven't we already communicated to
the kernel which RAM regions it may allocate from? How does it know
the memory map has changed, and how do we ensure that it has not
already allocated from the region we are reserving here?

> The default implementation of efi_arch_mem_reserve() used on ARM is
> empty though ...
>
>> so we should at least warn here.
>
> ... and this patch adds a WARN() to the empty stub.
>
>
>> I don't think this patch solves any known issues, so I'd
>> rather defer this for now, and pick up the discussion when Matt is
>> back,
>
> I'm fine with either way and yes, no splat has been observed in the
> wild.
>
> Just to make it explicit: the issue addressed here is a potential
> use-after-free (both, read and write) on memblock.reserved.regions in
> case of CONFIG_ARCH_DISCARD_MEMBLOCK=y. It would certainly make sense to
> clarify the commit description in the next iteration...
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06 16:41       ` Ard Biesheuvel
@ 2017-01-06 17:46         ` Nicolai Stange
  2017-01-06 19:28           ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolai Stange @ 2017-01-06 17:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolai Stange, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	H. Peter Anvin, x86, Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:

> On 6 January 2017 at 13:02, Nicolai Stange <nicstange@gmail.com> wrote:
>> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>>
>>> On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
>>>> Before invoking the arch specific handler, efi_mem_reserve() reserves
>>>> the given memory region through memblock.
>>>>
>>>> efi_mem_reserve() can get called after mm_init() though -- through
>>>> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
>>>> not be used anymore.
>>>>
>>>> Let efi_mem_reserve() check whether memblock is dead and not do the
>>>> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
>>>> in this case: if the architecture doesn't provide any other means of
>>>> registering the region as reserved, the operation would be a nop.
>>>>
>>>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>>>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>>>> ---
>>>> Applicable to next-20170105.
>>>> No changes to v2.
>>>> Boot-tested on x86_64.
>>>>
>>>>  drivers/firmware/efi/efi.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>>>> index 92914801e388..158a8df2f4af 100644
>>>> --- a/drivers/firmware/efi/efi.c
>>>> +++ b/drivers/firmware/efi/efi.c
>>>> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>>>>         return end;
>>>>  }
>>>>
>>>> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>>>> +{
>>>> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
>>>> +}
>>>>
>>>>  /**
>>>>   * efi_mem_reserve - Reserve an EFI memory region
>>>> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>>   */
>>>>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>>>>  {
>>>> -       if (!memblock_is_region_reserved(addr, size))
>>>> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>>>>                 memblock_reserve(addr, size);
>>>>
>>
>> More context:
>>
>>             /*
>>              * Some architectures (x86) reserve all boot services ranges
>>              * until efi_free_boot_services() because of buggy firmware
>>              * implementations. This means the above memblock_reserve() is
>>              * superfluous on x86 and instead what it needs to do is
>>              * ensure the @start, @size is not freed.
>>              */
>>             efi_arch_mem_reserve(addr, size);
>>     }
>>
>>
>>> I share Dave's concern: on x86, this will silently ignore the
>>> reservation if slab_is_available() returns true,
>>
>> AFAICS, x86 has got an efi_arch_mem_reserve() which doesn't ignore the
>> reservation at any stage.
>>
>
> Thanks for the clarification. But my concern is whether changing the
> EFI memory map is going to have any effect at this stage, i.e., after
> slab_is_available() returns true: haven't we already communicated to
> the kernel which RAM regions it may allocate from? How does it know
> the memory map has changed, and how do we ensure that it has not
> already allocated from the region we are reserving here?

Ah, I see what you mean. I think it works like this on x86:

All EFI_BOOT_SERVICES_* regions as reported by the firmware are marked
as reserved at memblock unconditionally through the early setup_arch()
=> efi_reserve_boot_services(). This prevents these from getting handed
over to the "normal" kernel MM until efi_free_boot_services()
gets called later on. The latter frees these EFI_BOOT_SERVICES_* regions,
but only if their EFI_MEMORY_RUNTIME flag is not set.

Now, efi_arch_mem_reserve() basically just sets the EFI_MEMORY_RUNTIME
flag, allowing the given region to survive beyond efi_free_boot_services().

Corrolary 1: any efi_mem_reserve() after efi_free_boot_services wouldn't
have any effect.

Corollary 2: anything handed to efi_arch_mem_reserve() must live within
some memory region which had been reported by firmware already.

Indeed, at its very top, there is

  if (efi_mem_desc_lookup(addr, &md)) {
    pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
    return;
  }

  if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
  	pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
  	return;
  }

For further information, the comment at the x86's efi_arch_mem_reserve()
might be helpful.


I hope this is correct and helps.

Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06 17:46         ` Nicolai Stange
@ 2017-01-06 19:28           ` Ard Biesheuvel
  2017-01-08  0:24             ` Nicolai Stange
  2017-01-09 13:00             ` Matt Fleming
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2017-01-06 19:28 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Ingo Molnar, Matt Fleming, Thomas Gleixner, H. Peter Anvin, x86,
	Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

On 6 January 2017 at 17:46, Nicolai Stange <nicstange@gmail.com> wrote:
> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>
>> On 6 January 2017 at 13:02, Nicolai Stange <nicstange@gmail.com> wrote:
>>> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>>>
>>>> On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
>>>>> Before invoking the arch specific handler, efi_mem_reserve() reserves
>>>>> the given memory region through memblock.
>>>>>
>>>>> efi_mem_reserve() can get called after mm_init() though -- through
>>>>> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
>>>>> not be used anymore.
>>>>>
>>>>> Let efi_mem_reserve() check whether memblock is dead and not do the
>>>>> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
>>>>> in this case: if the architecture doesn't provide any other means of
>>>>> registering the region as reserved, the operation would be a nop.
>>>>>
>>>>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>>>>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>>>>> ---
>>>>> Applicable to next-20170105.
>>>>> No changes to v2.
>>>>> Boot-tested on x86_64.
>>>>>
>>>>>  drivers/firmware/efi/efi.c | 7 +++++--
>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>>>>> index 92914801e388..158a8df2f4af 100644
>>>>> --- a/drivers/firmware/efi/efi.c
>>>>> +++ b/drivers/firmware/efi/efi.c
>>>>> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>>>>>         return end;
>>>>>  }
>>>>>
>>>>> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>>> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>>>>> +{
>>>>> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
>>>>> +}
>>>>>
>>>>>  /**
>>>>>   * efi_mem_reserve - Reserve an EFI memory region
>>>>> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>>>   */
>>>>>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>>>>>  {
>>>>> -       if (!memblock_is_region_reserved(addr, size))
>>>>> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>>>>>                 memblock_reserve(addr, size);
>>>>>
>>>
>>> More context:
>>>
>>>             /*
>>>              * Some architectures (x86) reserve all boot services ranges
>>>              * until efi_free_boot_services() because of buggy firmware
>>>              * implementations. This means the above memblock_reserve() is
>>>              * superfluous on x86 and instead what it needs to do is
>>>              * ensure the @start, @size is not freed.
>>>              */
>>>             efi_arch_mem_reserve(addr, size);
>>>     }
>>>
>>>
>>>> I share Dave's concern: on x86, this will silently ignore the
>>>> reservation if slab_is_available() returns true,
>>>
>>> AFAICS, x86 has got an efi_arch_mem_reserve() which doesn't ignore the
>>> reservation at any stage.
>>>
>>
>> Thanks for the clarification. But my concern is whether changing the
>> EFI memory map is going to have any effect at this stage, i.e., after
>> slab_is_available() returns true: haven't we already communicated to
>> the kernel which RAM regions it may allocate from? How does it know
>> the memory map has changed, and how do we ensure that it has not
>> already allocated from the region we are reserving here?
>
> Ah, I see what you mean. I think it works like this on x86:
>
> All EFI_BOOT_SERVICES_* regions as reported by the firmware are marked
> as reserved at memblock unconditionally through the early setup_arch()
> => efi_reserve_boot_services(). This prevents these from getting handed
> over to the "normal" kernel MM until efi_free_boot_services()
> gets called later on. The latter frees these EFI_BOOT_SERVICES_* regions,
> but only if their EFI_MEMORY_RUNTIME flag is not set.
>
> Now, efi_arch_mem_reserve() basically just sets the EFI_MEMORY_RUNTIME
> flag, allowing the given region to survive beyond efi_free_boot_services().
>
> Corrolary 1: any efi_mem_reserve() after efi_free_boot_services wouldn't
> have any effect.
>

This is my point exactly. But it appears efi_free_boot_services()
occurs much later than I thought, and so there is a sizabe time window
where SLAB is up but reservations can still be made. But we don't
check whether efi_free_boot_services() has been called. Another
problem is that we never check that the reservation is covered by a
BootServicesData region, which are the only ones that are guaranteed
to be retained up to this point.

> Corollary 2: anything handed to efi_arch_mem_reserve() must live within
> some memory region which had been reported by firmware already.
>

Yes, but the EFI memory map describes all of RAM, so this is not an
issue by itself. The issue is that the region must have been covered
by a BootServicesCode or BootServicesData region

> Indeed, at its very top, there is
>
>   if (efi_mem_desc_lookup(addr, &md)) {
>     pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
>     return;
>   }
>
>   if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
>         pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
>         return;
>   }
>
> For further information, the comment at the x86's efi_arch_mem_reserve()
> might be helpful.
>
>
> I hope this is correct and helps.
>
> Thanks,
>
> Nicolai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06 19:28           ` Ard Biesheuvel
@ 2017-01-08  0:24             ` Nicolai Stange
  2017-01-09 13:07               ` Matt Fleming
  2017-01-09 13:00             ` Matt Fleming
  1 sibling, 1 reply; 12+ messages in thread
From: Nicolai Stange @ 2017-01-08  0:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolai Stange, Ingo Molnar, Matt Fleming, Thomas Gleixner,
	H. Peter Anvin, x86, Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:

> On 6 January 2017 at 17:46, Nicolai Stange <nicstange@gmail.com> wrote:
>> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>>
>>> On 6 January 2017 at 13:02, Nicolai Stange <nicstange@gmail.com> wrote:
>>>> Ard Biesheuvel <ard.biesheuvel@linaro.org> writes:
>>>>
>>>>> On 5 January 2017 at 12:51, Nicolai Stange <nicstange@gmail.com> wrote:
>>>>>> Before invoking the arch specific handler, efi_mem_reserve() reserves
>>>>>> the given memory region through memblock.
>>>>>>
>>>>>> efi_mem_reserve() can get called after mm_init() though -- through
>>>>>> efi_bgrt_init(), for example. After mm_init(), memblock is dead and should
>>>>>> not be used anymore.
>>>>>>
>>>>>> Let efi_mem_reserve() check whether memblock is dead and not do the
>>>>>> reservation if so. Emit a warning from the generic efi_arch mem_reserve()
>>>>>> in this case: if the architecture doesn't provide any other means of
>>>>>> registering the region as reserved, the operation would be a nop.
>>>>>>
>>>>>> Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
>>>>>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>>>>>> ---
>>>>>> Applicable to next-20170105.
>>>>>> No changes to v2.
>>>>>> Boot-tested on x86_64.
>>>>>>
>>>>>>  drivers/firmware/efi/efi.c | 7 +++++--
>>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>>>>>> index 92914801e388..158a8df2f4af 100644
>>>>>> --- a/drivers/firmware/efi/efi.c
>>>>>> +++ b/drivers/firmware/efi/efi.c
>>>>>> @@ -403,7 +403,10 @@ u64 __init efi_mem_desc_end(efi_memory_desc_t *md)
>>>>>>         return end;
>>>>>>  }
>>>>>>
>>>>>> -void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>>>> +void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>>>>>> +{
>>>>>> +       WARN(slab_is_available(), "efi_mem_reserve() has no effect");
>>>>>> +}
>>>>>>
>>>>>>  /**
>>>>>>   * efi_mem_reserve - Reserve an EFI memory region
>>>>>> @@ -419,7 +422,7 @@ void __init __weak efi_arch_mem_reserve(phys_addr_t addr, u64 size) {}
>>>>>>   */
>>>>>>  void __init efi_mem_reserve(phys_addr_t addr, u64 size)
>>>>>>  {
>>>>>> -       if (!memblock_is_region_reserved(addr, size))
>>>>>> +       if (!slab_is_available() && !memblock_is_region_reserved(addr, size))
>>>>>>                 memblock_reserve(addr, size);
>>>>>>
>>>>
>>>> More context:
>>>>
>>>>             /*
>>>>              * Some architectures (x86) reserve all boot services ranges
>>>>              * until efi_free_boot_services() because of buggy firmware
>>>>              * implementations. This means the above memblock_reserve() is
>>>>              * superfluous on x86 and instead what it needs to do is
>>>>              * ensure the @start, @size is not freed.
>>>>              */
>>>>             efi_arch_mem_reserve(addr, size);
>>>>     }
>>>>
>>>>
>>>>> I share Dave's concern: on x86, this will silently ignore the
>>>>> reservation if slab_is_available() returns true,
>>>>
>>>> AFAICS, x86 has got an efi_arch_mem_reserve() which doesn't ignore the
>>>> reservation at any stage.
>>>>
>>>
>>> Thanks for the clarification. But my concern is whether changing the
>>> EFI memory map is going to have any effect at this stage, i.e., after
>>> slab_is_available() returns true: haven't we already communicated to
>>> the kernel which RAM regions it may allocate from? How does it know
>>> the memory map has changed, and how do we ensure that it has not
>>> already allocated from the region we are reserving here?
>>
>> Ah, I see what you mean. I think it works like this on x86:
>>
>> All EFI_BOOT_SERVICES_* regions as reported by the firmware are marked
>> as reserved at memblock unconditionally through the early setup_arch()
>> => efi_reserve_boot_services(). This prevents these from getting handed
>> over to the "normal" kernel MM until efi_free_boot_services()
>> gets called later on. The latter frees these EFI_BOOT_SERVICES_* regions,
>> but only if their EFI_MEMORY_RUNTIME flag is not set.
>>
>> Now, efi_arch_mem_reserve() basically just sets the EFI_MEMORY_RUNTIME
>> flag, allowing the given region to survive beyond efi_free_boot_services().
>>
>> Corrolary 1: any efi_mem_reserve() after efi_free_boot_services wouldn't
>> have any effect.
>>
>
> This is my point exactly. But it appears efi_free_boot_services()
> occurs much later than I thought, and so there is a sizabe time window
> where SLAB is up but reservations can still be made. But we don't
> check whether efi_free_boot_services() has been called.

Ok, but this patch is about slab_is_available()/resp. the
non-availability of memblock and I'd rather consider the implementation
of these kinds of safety checks as being a different story?



Out of curiosity, I had a deeper look at the BootServices*-md
requirement though:

> Another problem is that we never check that the reservation is covered
> by a BootServicesData region, which are the only ones that are
> guaranteed to be retained up to this point.

I think the "only ones that are guaranteed to be retained" part might
not be completely correct: at least my firmware seems to report only the
EFI_CONVENTIONAL_MEMORY, EFI_LOADER_DATA, EFI_LOADER_CODE,
EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA as E820_RAM
(I think that these mappings are dictated by table 15-330 of ACPI 6.1:
"UEFI Memory Types and mapping to ACPI address range types").

This would mean, that memblock_x86_fill() adds only these regions to
memblock.memory.

free_all_bootmem() only operates on the (non-highmem) regions given by
memblock.memory and thus, any region of a type different from the ones
listed above would never get freed to the buddy allocator anyway, AFAICS.

Thus, the only md type where ranges efi_mem_reserve()'d therein aren't
retained are EFI_CONVENTIONAL_MEMORY, EFI_LOADER_DATA and
EFI_LOADER_CODE (and possibly highmem). Hopefully, nobody would ever
call efi_mem_reserve() on such a range but that assumption might be
wrong.


>> Corollary 2: anything handed to efi_arch_mem_reserve() must live within
>> some memory region which had been reported by firmware already.
>>
>
> Yes, but the EFI memory map describes all of RAM, so this is not an
> issue by itself.

Ok. But I've just realized that after __efi_enter_virtual_mode(),
efi_map_regions() would have stripped that list down to only those MDs
for which should_map_region() returns true. With efi_is_native(), that's
everything having EFI_MEMORY_RUNTIME set and the BOOT_SERVICES_*
regions.


Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tip:efi/urgent] x86/efi: Don't allocate memmap through memblock after mm_init()
  2017-01-05 12:51 [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init() Nicolai Stange
  2017-01-05 12:51 ` [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve " Nicolai Stange
  2017-01-05 21:30 ` [PATCH v3 1/2] x86/efi: don't allocate memmap " Dan Williams
@ 2017-01-09  6:43 ` tip-bot for Nicolai Stange
  2 siblings, 0 replies; 12+ messages in thread
From: tip-bot for Nicolai Stange @ 2017-01-09  6:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: nicstange, tglx, linux-kernel, ard.biesheuvel, mingo, hpa,
	peterz, dyoung, dan.j.williams, torvalds, matt, mika.penttila

Commit-ID:  20b1e22d01a4b0b11d3a1066e9feb04be38607ec
Gitweb:     http://git.kernel.org/tip/20b1e22d01a4b0b11d3a1066e9feb04be38607ec
Author:     Nicolai Stange <nicstange@gmail.com>
AuthorDate: Thu, 5 Jan 2017 13:51:29 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 7 Jan 2017 08:58:07 +0100

x86/efi: Don't allocate memmap through memblock after mm_init()

With the following commit:

  4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")

...  efi_bgrt_init() calls into the memblock allocator through
efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.

Indeed, KASAN reports a bad read access later on in efi_free_boot_services():

  BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
            at addr ffff88022de12740
  Read of size 4 by task swapper/0/0
  page:ffffea0008b78480 count:0 mapcount:-127
  mapping:          (null) index:0x1 flags: 0x5fff8000000000()
  [...]
  Call Trace:
   dump_stack+0x68/0x9f
   kasan_report_error+0x4c8/0x500
   kasan_report+0x58/0x60
   __asan_load4+0x61/0x80
   efi_free_boot_services+0xae/0x24c
   start_kernel+0x527/0x562
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0x157/0x17a
   start_cpu+0x5/0x14

The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().

Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.

So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.

Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.

Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # v4.9
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/platform/efi/quirks.c  |  4 ++--
 drivers/firmware/efi/fake_mem.c |  3 +--
 drivers/firmware/efi/memmap.c   | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/efi.h             |  1 +
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 10aca63..30031d5 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -214,7 +214,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 
 	new_size = efi.memmap.desc_size * num_entries;
 
-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Could not allocate boot services memmap\n");
 		return;
@@ -355,7 +355,7 @@ void __init efi_free_boot_services(void)
 	}
 
 	new_size = efi.memmap.desc_size * num_entries;
-	new_phys = memblock_alloc(new_size, 0);
+	new_phys = efi_memmap_alloc(num_entries);
 	if (!new_phys) {
 		pr_err("Failed to allocate new EFI memmap\n");
 		return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 520a40e..6c7d60c 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -71,8 +71,7 @@ void __init efi_fake_memmap(void)
 	}
 
 	/* allocate memory for new EFI memmap */
-	new_memmap_phy = memblock_alloc(efi.memmap.desc_size * new_nr_map,
-					PAGE_SIZE);
+	new_memmap_phy = efi_memmap_alloc(new_nr_map);
 	if (!new_memmap_phy)
 		return;
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index f03ddec..7868644 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -9,6 +9,44 @@
 #include <linux/efi.h>
 #include <linux/io.h>
 #include <asm/early_ioremap.h>
+#include <linux/memblock.h>
+#include <linux/slab.h>
+
+static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
+{
+	return memblock_alloc(size, 0);
+}
+
+static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
+{
+	unsigned int order = get_order(size);
+	struct page *p = alloc_pages(GFP_KERNEL, order);
+
+	if (!p)
+		return 0;
+
+	return PFN_PHYS(page_to_pfn(p));
+}
+
+/**
+ * efi_memmap_alloc - Allocate memory for the EFI memory map
+ * @num_entries: Number of entries in the allocated map.
+ *
+ * Depending on whether mm_init() has already been invoked or not,
+ * either memblock or "normal" page allocation is used.
+ *
+ * Returns the physical address of the allocated memory map on
+ * success, zero on failure.
+ */
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+{
+	unsigned long size = num_entries * efi.memmap.desc_size;
+
+	if (slab_is_available())
+		return __efi_memmap_alloc_late(size);
+
+	return __efi_memmap_alloc_early(size);
+}
 
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
diff --git a/include/linux/efi.h b/include/linux/efi.h
index a07a476..0c54202 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -950,6 +950,7 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
+extern phys_addr_t __init efi_memmap_alloc(unsigned int num_entries);
 extern int __init efi_memmap_init_early(struct efi_memory_map_data *data);
 extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size);
 extern void __init efi_memmap_unmap(void);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-06 19:28           ` Ard Biesheuvel
  2017-01-08  0:24             ` Nicolai Stange
@ 2017-01-09 13:00             ` Matt Fleming
  1 sibling, 0 replies; 12+ messages in thread
From: Matt Fleming @ 2017-01-09 13:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolai Stange, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	x86, Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

On Fri, 06 Jan, at 07:28:40PM, Ard Biesheuvel wrote:
> 
> This is my point exactly. But it appears efi_free_boot_services()
> occurs much later than I thought, and so there is a sizabe time window
> where SLAB is up but reservations can still be made. But we don't
> check whether efi_free_boot_services() has been called.

True. This has only been correct thus far because all code has been
audited, but adding a check to catch future offenders is a good idea.

> Another problem is that we never check that the reservation is
> covered by a BootServicesData region, which are the only ones that
> are guaranteed to be retained up to this point.

The runtime regions are guaranteed to be retained too.

Again, this shouldn't actually be a problem today, but the potential
for breakage here warrants some kind of check and loud warning. 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve through memblock after mm_init()
  2017-01-08  0:24             ` Nicolai Stange
@ 2017-01-09 13:07               ` Matt Fleming
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Fleming @ 2017-01-09 13:07 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Ard Biesheuvel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	x86, Mika Penttilä,
	Dan Williams, Dave Young, linux-efi, linux-kernel

On Sun, 08 Jan, at 01:24:49AM, Nicolai Stange wrote:
> 
> Out of curiosity, I had a deeper look at the BootServices*-md
> requirement though:
> 
> > Another problem is that we never check that the reservation is covered
> > by a BootServicesData region, which are the only ones that are
> > guaranteed to be retained up to this point.
> 
> I think the "only ones that are guaranteed to be retained" part might
> not be completely correct: at least my firmware seems to report only the
> EFI_CONVENTIONAL_MEMORY, EFI_LOADER_DATA, EFI_LOADER_CODE,
> EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA as E820_RAM
> (I think that these mappings are dictated by table 15-330 of ACPI 6.1:
> "UEFI Memory Types and mapping to ACPI address range types").
> 
> This would mean, that memblock_x86_fill() adds only these regions to
> memblock.memory.
 
Data required at runtime should only be in EFI_LOADER* regions if it's
part of some setup_data object (see things like SETUP_EFI), and
subsequently has been memblock_reserve()'d at some point.

Nothing valuable should be in EFI_CONVENTIONAL_MEMORY because, by
definition, it's free memory. 

> free_all_bootmem() only operates on the (non-highmem) regions given by
> memblock.memory and thus, any region of a type different from the ones
> listed above would never get freed to the buddy allocator anyway, AFAICS.
 
This is true.

> Thus, the only md type where ranges efi_mem_reserve()'d therein aren't
> retained are EFI_CONVENTIONAL_MEMORY, EFI_LOADER_DATA and
> EFI_LOADER_CODE (and possibly highmem). Hopefully, nobody would ever
> call efi_mem_reserve() on such a range but that assumption might be
> wrong.

I would happily welcome some diagnostic checks to ensure we never get
silently stung by this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-01-09 13:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-05 12:51 [PATCH v3 1/2] x86/efi: don't allocate memmap through memblock after mm_init() Nicolai Stange
2017-01-05 12:51 ` [PATCH v3 2/2] efi: efi_mem_reserve(): don't reserve " Nicolai Stange
2017-01-06  8:35   ` Ard Biesheuvel
2017-01-06 13:02     ` Nicolai Stange
2017-01-06 16:41       ` Ard Biesheuvel
2017-01-06 17:46         ` Nicolai Stange
2017-01-06 19:28           ` Ard Biesheuvel
2017-01-08  0:24             ` Nicolai Stange
2017-01-09 13:07               ` Matt Fleming
2017-01-09 13:00             ` Matt Fleming
2017-01-05 21:30 ` [PATCH v3 1/2] x86/efi: don't allocate memmap " Dan Williams
2017-01-09  6:43 ` [tip:efi/urgent] x86/efi: Don't " tip-bot for Nicolai Stange

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).