linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] arm64: kexec_file_load vs memory reservations
@ 2021-04-29 13:35 Marc Zyngier
  2021-04-29 13:35 ` [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations Marc Zyngier
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Marc Zyngier @ 2021-04-29 13:35 UTC (permalink / raw)
  To: kexec, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Mark Rutland,
	James Morse, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Eric Biederman, Bhupesh Sharma, AKASHI Takahiro, kernel-team

It recently became apparent that using kexec with kexec_file_load() on
arm64 is pretty similar to playing Russian roulette.

Depending on the amount of memory, the HW supported and the firmware
interface used, your secondary kernel may overwrite critical memory
regions without which the secondary kernel cannot boot (the GICv3 LPI
tables being a prime example of such reserved regions).

It turns out that there is at least two ways for reserved memory
regions to be described to kexec: /proc/iomem for the userspace
implementation, and memblock.reserved for kexec_file. And of course,
our LPI tables are only reserved using the resource tree, leading to
the aforementioned stamping. Similar things could happen with ACPI
tables as well.

On my 24xA53 system artificially limited to 256MB of RAM (yes, it
boots with that little memory), trying to kexec a secondary kernel
failed every times. I can only presume that this was mostly tested
using kdump, which preserves the entire kernel memory range.

This small series aims at triggering a discussion on what are the
expectations for kexec_file, and whether we should unify the two
reservation mechanisms.

And in the meantime, it gets things going...

Marc Zyngier (2):
  firmware/efi: Tell memblock about EFI reservations
  ACPI: arm64: Reserve the ACPI tables in memblock

 arch/arm64/kernel/acpi.c   |  1 +
 drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

-- 
2.29.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
@ 2021-04-29 13:35 ` Marc Zyngier
  2021-05-03 18:56   ` Moritz Fischer
  2021-04-29 13:35 ` [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock Marc Zyngier
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2021-04-29 13:35 UTC (permalink / raw)
  To: kexec, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Mark Rutland,
	James Morse, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Eric Biederman, Bhupesh Sharma, AKASHI Takahiro, kernel-team

kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.

However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.

On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!

Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.

At some point, someone will have to go and figure out a way to unify
these multiple reservation trees, because sprinkling random reservation
calls is only a temporary workaround.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 4b7ee3fa9224..026b02f5f7d8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -896,11 +896,25 @@ static int __init efi_memreserve_map_root(void)
 static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
 {
 	struct resource *res, *parent;
+	int ret;
 
 	res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
 	if (!res)
 		return -ENOMEM;
 
+	/*
+	 * Given that efi_mem_reserve_iomem() can be called at any
+	 * time, only call memblock_reserve() if the architecture
+	 * keeps the infrastructure around.
+	 */
+	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
+		ret = memblock_reserve(addr, size);
+		if (ret) {
+			kfree(res);
+			return ret;
+		}
+	}
+
 	res->name	= "reserved";
 	res->flags	= IORESOURCE_MEM;
 	res->start	= addr;
@@ -908,7 +922,14 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
 
 	/* we expect a conflict with a 'System RAM' region */
 	parent = request_resource_conflict(&iomem_resource, res);
-	return parent ? request_resource(parent, res) : 0;
+	ret = parent ? request_resource(parent, res) : 0;
+	if (ret) {
+		kfree(res);
+		if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
+			memblock_free(addr, size);
+	}
+
+	return ret;
 }
 
 int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
-- 
2.29.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
  2021-04-29 13:35 ` [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations Marc Zyngier
@ 2021-04-29 13:35 ` Marc Zyngier
  2021-05-03 18:57   ` Moritz Fischer
  2021-05-12 18:04 ` [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2021-04-29 13:35 UTC (permalink / raw)
  To: kexec, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Mark Rutland,
	James Morse, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Eric Biederman, Bhupesh Sharma, AKASHI Takahiro, kernel-team

Just like the EFI reservations, ACPI tables can be stamped over
by kexec_file_load(), and the secondary kernel will be unable to
recover from such corruption.

It looks like our x86 friends have been there before, and have
recently added some infratructure that does what we need since
1a1c130ab757 ("ACPI: tables: x86: Reserve memory occupied by ACPI
tables"), though it appears to exist for different reasons.

Let's call into acpi_reserve_initial_tables() early so that
our tables are protected from the big bad kexec.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/acpi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index cada0b816c8a..5b5406c92ee4 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -233,6 +233,7 @@ void __init acpi_boot_table_init(void)
 		if (earlycon_acpi_spcr_enable)
 			early_init_dt_scan_chosen_stdout();
 	} else {
+		acpi_reserve_initial_tables();
 		acpi_parse_spcr(earlycon_acpi_spcr_enable, true);
 		if (IS_ENABLED(CONFIG_ACPI_BGRT))
 			acpi_table_parse(ACPI_SIG_BGRT, acpi_parse_bgrt);
-- 
2.29.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations
  2021-04-29 13:35 ` [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations Marc Zyngier
@ 2021-05-03 18:56   ` Moritz Fischer
  2021-05-13  3:20     ` Dave Young
  0 siblings, 1 reply; 17+ messages in thread
From: Moritz Fischer @ 2021-05-03 18:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Mark Rutland, James Morse, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Eric Biederman, Bhupesh Sharma,
	AKASHI Takahiro, kernel-team

Marc,

On Thu, Apr 29, 2021 at 02:35:32PM +0100, Marc Zyngier wrote:
> kexec_load_file() relies on the memblock infrastructure to avoid
> stamping over regions of memory that are essential to the survival
> of the system.
> 
> However, nobody seems to agree how to flag these regions as reserved,
> and (for example) EFI only publishes its reservations in /proc/iomem
> for the benefit of the traditional, userspace based kexec tool.
> 
> On arm64 platforms with GICv3, this can result in the payload being
> placed at the location of the LPI tables. Shock, horror!
> 
> Let's augment the EFI reservation code with a memblock_reserve() call,
> protecting our dear tables from the secondary kernel invasion.
> 
> At some point, someone will have to go and figure out a way to unify
> these multiple reservation trees, because sprinkling random reservation
> calls is only a temporary workaround.
> 

Feel free to add (and/or):

Reported-by: Moritz Fischer <mdf@kernel.org>
Tested-by: Moritz Fischer <mdf@kernel.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 4b7ee3fa9224..026b02f5f7d8 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -896,11 +896,25 @@ static int __init efi_memreserve_map_root(void)
>  static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
>  {
>  	struct resource *res, *parent;
> +	int ret;
>  
>  	res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
>  	if (!res)
>  		return -ENOMEM;
>  
> +	/*
> +	 * Given that efi_mem_reserve_iomem() can be called at any
> +	 * time, only call memblock_reserve() if the architecture
> +	 * keeps the infrastructure around.
> +	 */
> +	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> +		ret = memblock_reserve(addr, size);
> +		if (ret) {
> +			kfree(res);
> +			return ret;
> +		}
> +	}
> +
>  	res->name	= "reserved";
>  	res->flags	= IORESOURCE_MEM;
>  	res->start	= addr;
> @@ -908,7 +922,14 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
>  
>  	/* we expect a conflict with a 'System RAM' region */
>  	parent = request_resource_conflict(&iomem_resource, res);
> -	return parent ? request_resource(parent, res) : 0;
> +	ret = parent ? request_resource(parent, res) : 0;
> +	if (ret) {
> +		kfree(res);
> +		if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> +			memblock_free(addr, size);
> +	}
> +
> +	return ret;
>  }
>  
>  int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
> -- 
> 2.29.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thanks,
Moritz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock
  2021-04-29 13:35 ` [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock Marc Zyngier
@ 2021-05-03 18:57   ` Moritz Fischer
  0 siblings, 0 replies; 17+ messages in thread
From: Moritz Fischer @ 2021-05-03 18:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Mark Rutland, James Morse, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Eric Biederman, Bhupesh Sharma,
	AKASHI Takahiro, kernel-team

Marc,

On Thu, Apr 29, 2021 at 02:35:33PM +0100, Marc Zyngier wrote:
> Just like the EFI reservations, ACPI tables can be stamped over
> by kexec_file_load(), and the secondary kernel will be unable to
> recover from such corruption.
> 
> It looks like our x86 friends have been there before, and have
> recently added some infratructure that does what we need since
> 1a1c130ab757 ("ACPI: tables: x86: Reserve memory occupied by ACPI
> tables"), though it appears to exist for different reasons.
> 
> Let's call into acpi_reserve_initial_tables() early so that
> our tables are protected from the big bad kexec.
> 

Feel free to add (and/or):

Tested-by: Moritz Fischer <mdf@kernel.org>
Reported-by: Moritz Fischer <mdf@kernel.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kernel/acpi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index cada0b816c8a..5b5406c92ee4 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -233,6 +233,7 @@ void __init acpi_boot_table_init(void)
>  		if (earlycon_acpi_spcr_enable)
>  			early_init_dt_scan_chosen_stdout();
>  	} else {
> +		acpi_reserve_initial_tables();
>  		acpi_parse_spcr(earlycon_acpi_spcr_enable, true);
>  		if (IS_ENABLED(CONFIG_ACPI_BGRT))
>  			acpi_table_parse(ACPI_SIG_BGRT, acpi_parse_bgrt);
> -- 
> 2.29.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thanks,
Moritz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
  2021-04-29 13:35 ` [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations Marc Zyngier
  2021-04-29 13:35 ` [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock Marc Zyngier
@ 2021-05-12 18:04 ` Marc Zyngier
  2021-05-13  3:17   ` Dave Young
  2021-05-18 11:48 ` Will Deacon
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2021-05-12 18:04 UTC (permalink / raw)
  To: kexec, linux-arm-kernel
  Cc: Catalin Marinas, Dave Young, Moritz Fischer, Will Deacon,
	Ard Biesheuvel, Mark Rutland, James Morse, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Eric Biederman, Bhupesh Sharma,
	AKASHI Takahiro, kernel-team

+ Dave Young, which I accidentally missed in my initial post

On Thu, 29 Apr 2021 14:35:31 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> It recently became apparent that using kexec with kexec_file_load() on
> arm64 is pretty similar to playing Russian roulette.
> 
> Depending on the amount of memory, the HW supported and the firmware
> interface used, your secondary kernel may overwrite critical memory
> regions without which the secondary kernel cannot boot (the GICv3 LPI
> tables being a prime example of such reserved regions).
> 
> It turns out that there is at least two ways for reserved memory
> regions to be described to kexec: /proc/iomem for the userspace
> implementation, and memblock.reserved for kexec_file. And of course,
> our LPI tables are only reserved using the resource tree, leading to
> the aforementioned stamping. Similar things could happen with ACPI
> tables as well.
> 
> On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> boots with that little memory), trying to kexec a secondary kernel
> failed every times. I can only presume that this was mostly tested
> using kdump, which preserves the entire kernel memory range.
> 
> This small series aims at triggering a discussion on what are the
> expectations for kexec_file, and whether we should unify the two
> reservation mechanisms.
> 
> And in the meantime, it gets things going...
> 
> Marc Zyngier (2):
>   firmware/efi: Tell memblock about EFI reservations
>   ACPI: arm64: Reserve the ACPI tables in memblock
> 
>  arch/arm64/kernel/acpi.c   |  1 +
>  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
>  2 files changed, 23 insertions(+), 1 deletion(-)

Any comment on this?

I've separately started working on using the resource tree to slice
and dice the memblocks that are candidate for kexec_file_load(), but
I'd like some consensus on whether this is the right way to address
the issue.

Without something like this, kexec_file_load() is not usable on arm64,
so I'm pretty eager to see the back of this bug.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-05-12 18:04 ` [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
@ 2021-05-13  3:17   ` Dave Young
  2021-05-13 11:07     ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Young @ 2021-05-13  3:17 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Moritz Fischer,
	Will Deacon, Ard Biesheuvel, Mark Rutland, James Morse,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Eric Biederman,
	Bhupesh Sharma, Pingfan Liu, AKASHI Takahiro, kernel-team

Hi Marc,
On 05/12/21 at 07:04pm, Marc Zyngier wrote:
> + Dave Young, which I accidentally missed in my initial post
> 
> On Thu, 29 Apr 2021 14:35:31 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
> > 
> > It recently became apparent that using kexec with kexec_file_load() on
> > arm64 is pretty similar to playing Russian roulette.
> > 
> > Depending on the amount of memory, the HW supported and the firmware
> > interface used, your secondary kernel may overwrite critical memory
> > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > tables being a prime example of such reserved regions).
> > 
> > It turns out that there is at least two ways for reserved memory
> > regions to be described to kexec: /proc/iomem for the userspace
> > implementation, and memblock.reserved for kexec_file. And of course,
> > our LPI tables are only reserved using the resource tree, leading to
> > the aforementioned stamping. Similar things could happen with ACPI
> > tables as well.
> > 
> > On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> > boots with that little memory), trying to kexec a secondary kernel
> > failed every times. I can only presume that this was mostly tested
> > using kdump, which preserves the entire kernel memory range.
> > 
> > This small series aims at triggering a discussion on what are the
> > expectations for kexec_file, and whether we should unify the two
> > reservation mechanisms.
> > 
> > And in the meantime, it gets things going...
> > 
> > Marc Zyngier (2):
> >   firmware/efi: Tell memblock about EFI reservations
> >   ACPI: arm64: Reserve the ACPI tables in memblock
> > 
> >  arch/arm64/kernel/acpi.c   |  1 +
> >  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> Any comment on this?
> 
> I've separately started working on using the resource tree to slice
> and dice the memblocks that are candidate for kexec_file_load(), but
> I'd like some consensus on whether this is the right way to address
> the issue.
> 
> Without something like this, kexec_file_load() is not usable on arm64,
> so I'm pretty eager to see the back of this bug.

The arm64 memory reservation is tricky, I do not think I understand it
correctly.  Previously there were a lot discussion, Ard and AKASHI
should know more about it, see if they can provide comments.

About the problem you see, another way is to just add an arch weak
function like powerpc: arch_kexec_locate_mem_hole, and walking resource
tree for kexec_file_load as well.  But I might be wrong since I did not
follow up the arm64 specific history.

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 

Thanks
Dave


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations
  2021-05-03 18:56   ` Moritz Fischer
@ 2021-05-13  3:20     ` Dave Young
  2021-05-13 11:11       ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Young @ 2021-05-13  3:20 UTC (permalink / raw)
  To: Moritz Fischer
  Cc: Marc Zyngier, kexec, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, James Morse,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Eric Biederman,
	AKASHI Takahiro, kernel-team

On 05/03/21 at 11:56am, Moritz Fischer wrote:
> Marc,
> 
> On Thu, Apr 29, 2021 at 02:35:32PM +0100, Marc Zyngier wrote:
> > kexec_load_file() relies on the memblock infrastructure to avoid
> > stamping over regions of memory that are essential to the survival
> > of the system.
> > 
> > However, nobody seems to agree how to flag these regions as reserved,
> > and (for example) EFI only publishes its reservations in /proc/iomem
> > for the benefit of the traditional, userspace based kexec tool.
> > 
> > On arm64 platforms with GICv3, this can result in the payload being
> > placed at the location of the LPI tables. Shock, horror!
> > 
> > Let's augment the EFI reservation code with a memblock_reserve() call,
> > protecting our dear tables from the secondary kernel invasion.
> > 
> > At some point, someone will have to go and figure out a way to unify
> > these multiple reservation trees, because sprinkling random reservation
> > calls is only a temporary workaround.
> > 
> 
> Feel free to add (and/or):
> 
> Reported-by: Moritz Fischer <mdf@kernel.org>
> Tested-by: Moritz Fischer <mdf@kernel.org>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index 4b7ee3fa9224..026b02f5f7d8 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -896,11 +896,25 @@ static int __init efi_memreserve_map_root(void)
> >  static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
> >  {
> >  	struct resource *res, *parent;
> > +	int ret;
> >  
> >  	res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> >  	if (!res)
> >  		return -ENOMEM;
> >  
> > +	/*
> > +	 * Given that efi_mem_reserve_iomem() can be called at any
> > +	 * time, only call memblock_reserve() if the architecture
> > +	 * keeps the infrastructure around.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> > +		ret = memblock_reserve(addr, size);
> > +		if (ret) {
> > +			kfree(res);
> > +			return ret;
> > +		}
> > +	}
> > +

If you go with memblock, it would be better to handle it separately from
the iomem?

> >  	res->name	= "reserved";
> >  	res->flags	= IORESOURCE_MEM;
> >  	res->start	= addr;
> > @@ -908,7 +922,14 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
> >  
> >  	/* we expect a conflict with a 'System RAM' region */
> >  	parent = request_resource_conflict(&iomem_resource, res);
> > -	return parent ? request_resource(parent, res) : 0;
> > +	ret = parent ? request_resource(parent, res) : 0;
> > +	if (ret) {
> > +		kfree(res);
> > +		if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> > +			memblock_free(addr, size);
> > +	}
> > +
> > +	return ret;

It looks odd to free memblock when reqeust resource fails, they are not
relavant?

> >  }
> >  
> >  int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
> > -- 
> > 2.29.2
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> Thanks,
> Moritz
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 
> 

Thanks
Dave


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-05-13  3:17   ` Dave Young
@ 2021-05-13 11:07     ` Marc Zyngier
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Zyngier @ 2021-05-13 11:07 UTC (permalink / raw)
  To: Dave Young
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Moritz Fischer,
	Will Deacon, Ard Biesheuvel, Mark Rutland, James Morse,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Eric Biederman,
	Pingfan Liu, AKASHI Takahiro, kernel-team

[- Bhupesh, as his RH address now bounces]

On Thu, 13 May 2021 04:17:38 +0100,
Dave Young <dyoung@redhat.com> wrote:
> 
> Hi Marc,
> On 05/12/21 at 07:04pm, Marc Zyngier wrote:
> > + Dave Young, which I accidentally missed in my initial post
> > 
> > On Thu, 29 Apr 2021 14:35:31 +0100,
> > Marc Zyngier <maz@kernel.org> wrote:
> > > 
> > > It recently became apparent that using kexec with kexec_file_load() on
> > > arm64 is pretty similar to playing Russian roulette.
> > > 
> > > Depending on the amount of memory, the HW supported and the firmware
> > > interface used, your secondary kernel may overwrite critical memory
> > > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > > tables being a prime example of such reserved regions).
> > > 
> > > It turns out that there is at least two ways for reserved memory
> > > regions to be described to kexec: /proc/iomem for the userspace
> > > implementation, and memblock.reserved for kexec_file. And of course,
> > > our LPI tables are only reserved using the resource tree, leading to
> > > the aforementioned stamping. Similar things could happen with ACPI
> > > tables as well.
> > > 
> > > On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> > > boots with that little memory), trying to kexec a secondary kernel
> > > failed every times. I can only presume that this was mostly tested
> > > using kdump, which preserves the entire kernel memory range.
> > > 
> > > This small series aims at triggering a discussion on what are the
> > > expectations for kexec_file, and whether we should unify the two
> > > reservation mechanisms.
> > > 
> > > And in the meantime, it gets things going...
> > > 
> > > Marc Zyngier (2):
> > >   firmware/efi: Tell memblock about EFI reservations
> > >   ACPI: arm64: Reserve the ACPI tables in memblock
> > > 
> > >  arch/arm64/kernel/acpi.c   |  1 +
> > >  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
> > >  2 files changed, 23 insertions(+), 1 deletion(-)
> > 
> > Any comment on this?
> > 
> > I've separately started working on using the resource tree to slice
> > and dice the memblocks that are candidate for kexec_file_load(), but
> > I'd like some consensus on whether this is the right way to address
> > the issue.
> > 
> > Without something like this, kexec_file_load() is not usable on arm64,
> > so I'm pretty eager to see the back of this bug.
> 
> The arm64 memory reservation is tricky, I do not think I understand it
> correctly.  Previously there were a lot discussion, Ard and AKASHI
> should know more about it, see if they can provide comments.

I'll let them chime in. It looks like most of the discussions were
around kdump, which doesn't suffer from this issue (the memory is
reserved upfront).

> About the problem you see, another way is to just add an arch weak
> function like powerpc: arch_kexec_locate_mem_hole, and walking resource
> tree for kexec_file_load as well.  But I might be wrong since I did not
> follow up the arm64 specific history.

Right, this would avoid messing with the core code. However, the
problem remains in the sense that there is no clear definition of what
"reserved memory" is in general, and where it is described. For
example, x86 places the ACPI tables in reserved memblocks, while arm64
doesn't (unless we use my second patch).

To reliably use the resource tree, we need to ensure that it contains
all the reservations that appeared in memblock too. And if we can't
have a single reference, then we have to consider the union of the two
trees.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations
  2021-05-13  3:20     ` Dave Young
@ 2021-05-13 11:11       ` Marc Zyngier
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Zyngier @ 2021-05-13 11:11 UTC (permalink / raw)
  To: Dave Young
  Cc: Moritz Fischer, kexec, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Ard Biesheuvel, Mark Rutland, James Morse,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Eric Biederman,
	AKASHI Takahiro, kernel-team

On Thu, 13 May 2021 04:20:21 +0100,
Dave Young <dyoung@redhat.com> wrote:
> 
> On 05/03/21 at 11:56am, Moritz Fischer wrote:
> > Marc,
> > 
> > On Thu, Apr 29, 2021 at 02:35:32PM +0100, Marc Zyngier wrote:
> > > kexec_load_file() relies on the memblock infrastructure to avoid
> > > stamping over regions of memory that are essential to the survival
> > > of the system.
> > > 
> > > However, nobody seems to agree how to flag these regions as reserved,
> > > and (for example) EFI only publishes its reservations in /proc/iomem
> > > for the benefit of the traditional, userspace based kexec tool.
> > > 
> > > On arm64 platforms with GICv3, this can result in the payload being
> > > placed at the location of the LPI tables. Shock, horror!
> > > 
> > > Let's augment the EFI reservation code with a memblock_reserve() call,
> > > protecting our dear tables from the secondary kernel invasion.
> > > 
> > > At some point, someone will have to go and figure out a way to unify
> > > these multiple reservation trees, because sprinkling random reservation
> > > calls is only a temporary workaround.
> > > 
> > 
> > Feel free to add (and/or):
> > 
> > Reported-by: Moritz Fischer <mdf@kernel.org>
> > Tested-by: Moritz Fischer <mdf@kernel.org>
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
> > >  1 file changed, 22 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > index 4b7ee3fa9224..026b02f5f7d8 100644
> > > --- a/drivers/firmware/efi/efi.c
> > > +++ b/drivers/firmware/efi/efi.c
> > > @@ -896,11 +896,25 @@ static int __init efi_memreserve_map_root(void)
> > >  static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
> > >  {
> > >  	struct resource *res, *parent;
> > > +	int ret;
> > >  
> > >  	res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
> > >  	if (!res)
> > >  		return -ENOMEM;
> > >  
> > > +	/*
> > > +	 * Given that efi_mem_reserve_iomem() can be called at any
> > > +	 * time, only call memblock_reserve() if the architecture
> > > +	 * keeps the infrastructure around.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> > > +		ret = memblock_reserve(addr, size);
> > > +		if (ret) {
> > > +			kfree(res);
> > > +			return ret;
> > > +		}
> > > +	}
> > > +
> 
> If you go with memblock, it would be better to handle it separately from
> the iomem?

Do you mean having a separate helper from efi_mem_reserve_iomem()?
Sure, can do.

> 
> > >  	res->name	= "reserved";
> > >  	res->flags	= IORESOURCE_MEM;
> > >  	res->start	= addr;
> > > @@ -908,7 +922,14 @@ static int efi_mem_reserve_iomem(phys_addr_t addr, u64 size)
> > >  
> > >  	/* we expect a conflict with a 'System RAM' region */
> > >  	parent = request_resource_conflict(&iomem_resource, res);
> > > -	return parent ? request_resource(parent, res) : 0;
> > > +	ret = parent ? request_resource(parent, res) : 0;
> > > +	if (ret) {
> > > +		kfree(res);
> > > +		if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> > > +			memblock_free(addr, size);
> > > +	}
> > > +
> > > +	return ret;
> 
> It looks odd to free memblock when reqeust resource fails, they are not
> relavant?

I'm trying to keep the two trees in sync so that when the caller finds
out that the reservation has failed, we're not in a half-baked state.

But maybe it doesn't really matter, and if a reservation fails, we're
already screwed.

Ard, what do you think?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
                   ` (2 preceding siblings ...)
  2021-05-12 18:04 ` [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
@ 2021-05-18 11:48 ` Will Deacon
  2021-05-18 14:23   ` Bhupesh Sharma
  2021-05-19 15:19 ` Catalin Marinas
  2021-06-02 14:22 ` James Morse
  5 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2021-05-18 11:48 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Ard Biesheuvel,
	Mark Rutland, James Morse, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Eric Biederman, bhupesh.sharma, AKASHI Takahiro,
	kernel-team

[Fixing Bhupesh's email address]

On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> It recently became apparent that using kexec with kexec_file_load() on
> arm64 is pretty similar to playing Russian roulette.
> 
> Depending on the amount of memory, the HW supported and the firmware
> interface used, your secondary kernel may overwrite critical memory
> regions without which the secondary kernel cannot boot (the GICv3 LPI
> tables being a prime example of such reserved regions).
> 
> It turns out that there is at least two ways for reserved memory
> regions to be described to kexec: /proc/iomem for the userspace
> implementation, and memblock.reserved for kexec_file. And of course,
> our LPI tables are only reserved using the resource tree, leading to
> the aforementioned stamping. Similar things could happen with ACPI
> tables as well.
> 
> On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> boots with that little memory), trying to kexec a secondary kernel
> failed every times. I can only presume that this was mostly tested
> using kdump, which preserves the entire kernel memory range.
> 
> This small series aims at triggering a discussion on what are the
> expectations for kexec_file, and whether we should unify the two
> reservation mechanisms.

Bhupesh, since you've been involved with kexec file on arm64 before, please
could you take a look at these patches?

Thanks,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-05-18 11:48 ` Will Deacon
@ 2021-05-18 14:23   ` Bhupesh Sharma
  0 siblings, 0 replies; 17+ messages in thread
From: Bhupesh Sharma @ 2021-05-18 14:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kexec, linux-arm-kernel, Catalin Marinas,
	Ard Biesheuvel, Mark Rutland, James Morse, Lorenzo Pieralisi,
	Hanjun Guo, Sudeep Holla, Eric Biederman, AKASHI Takahiro,
	kernel-team

Hi Will,

On Tue, 18 May 2021 at 17:19, Will Deacon <will@kernel.org> wrote:
>
> [Fixing Bhupesh's email address]
>
> On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> > It recently became apparent that using kexec with kexec_file_load() on
> > arm64 is pretty similar to playing Russian roulette.
> >
> > Depending on the amount of memory, the HW supported and the firmware
> > interface used, your secondary kernel may overwrite critical memory
> > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > tables being a prime example of such reserved regions).
> >
> > It turns out that there is at least two ways for reserved memory
> > regions to be described to kexec: /proc/iomem for the userspace
> > implementation, and memblock.reserved for kexec_file. And of course,
> > our LPI tables are only reserved using the resource tree, leading to
> > the aforementioned stamping. Similar things could happen with ACPI
> > tables as well.
> >
> > On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> > boots with that little memory), trying to kexec a secondary kernel
> > failed every times. I can only presume that this was mostly tested
> > using kdump, which preserves the entire kernel memory range.
> >
> > This small series aims at triggering a discussion on what are the
> > expectations for kexec_file, and whether we should unify the two
> > reservation mechanisms.
>
> Bhupesh, since you've been involved with kexec file on arm64 before, please
> could you take a look at these patches?

Thanks for adding me in Cc.
Yes, I will look and test these patches asap.

Regards,
Bhupesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
                   ` (3 preceding siblings ...)
  2021-05-18 11:48 ` Will Deacon
@ 2021-05-19 15:19 ` Catalin Marinas
  2021-05-25 16:22   ` Marc Zyngier
  2021-06-02 14:22 ` James Morse
  5 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2021-05-19 15:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Will Deacon, Ard Biesheuvel,
	Mark Rutland, James Morse, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Eric Biederman, Bhupesh Sharma, AKASHI Takahiro,
	kernel-team

On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> It recently became apparent that using kexec with kexec_file_load() on
> arm64 is pretty similar to playing Russian roulette.
> 
> Depending on the amount of memory, the HW supported and the firmware
> interface used, your secondary kernel may overwrite critical memory
> regions without which the secondary kernel cannot boot (the GICv3 LPI
> tables being a prime example of such reserved regions).
> 
> It turns out that there is at least two ways for reserved memory
> regions to be described to kexec: /proc/iomem for the userspace
> implementation, and memblock.reserved for kexec_file. And of course,
> our LPI tables are only reserved using the resource tree, leading to
> the aforementioned stamping. Similar things could happen with ACPI
> tables as well.

So which one of these (/proc/iomem and memblock.reserved) would be the
correct option? If none of them, is their intersection any better?
Looking at the default kexec_locate_mem_hole(), it uses the resources
tree if !CONFIG_ARCH_KEEP_MEMBLOCK, otherwise memblock.

PowerPC implements its own arch_kexec_locate_mem_hole() to skip specific
arch regions. We could do something similar for arm64 if the arch code
knows where the LPI reservation is or the ACPI tables.

If we conclude that we need some intersection of resource reservations
and memblock, maybe we should change the default kexec_locate_mem_hole()
implementation to check for both (e.g. start with the resource tree and
only consider a range valid if not in memblock.reserved).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-05-19 15:19 ` Catalin Marinas
@ 2021-05-25 16:22   ` Marc Zyngier
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Zyngier @ 2021-05-25 16:22 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: kexec, linux-arm-kernel, Will Deacon, Ard Biesheuvel,
	Mark Rutland, James Morse, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Eric Biederman, Bhupesh SHARMA, AKASHI Takahiro,
	kernel-team

On Wed, 19 May 2021 16:19:44 +0100,
Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> > It recently became apparent that using kexec with kexec_file_load() on
> > arm64 is pretty similar to playing Russian roulette.
> > 
> > Depending on the amount of memory, the HW supported and the firmware
> > interface used, your secondary kernel may overwrite critical memory
> > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > tables being a prime example of such reserved regions).
> > 
> > It turns out that there is at least two ways for reserved memory
> > regions to be described to kexec: /proc/iomem for the userspace
> > implementation, and memblock.reserved for kexec_file. And of course,
> > our LPI tables are only reserved using the resource tree, leading to
> > the aforementioned stamping. Similar things could happen with ACPI
> > tables as well.
> 
> So which one of these (/proc/iomem and memblock.reserved) would be the
> correct option? If none of them, is their intersection any better?

/proc/iomem is what we use for userspace, so you'd expect this to be
the right thing to use.

> Looking at the default kexec_locate_mem_hole(), it uses the resources
> tree if !CONFIG_ARCH_KEEP_MEMBLOCK, otherwise memblock.

Yup, and funnily enough, forcing a fallback to the resources tree
doesn't help either, because the logic used here isn't much better (it
takes the RAM areas at face value, without excluding any of the
reserved regions that are children of the "System RAM" regions).

It's not funny anymore.

> PowerPC implements its own arch_kexec_locate_mem_hole() to skip specific
> arch regions. We could do something similar for arm64 if the arch code
> knows where the LPI reservation is or the ACPI tables.

It feels like a bit of a failure to duplicate all that code. I'd
consider that the last possible outcome.

> If we conclude that we need some intersection of resource reservations
> and memblock, maybe we should change the default kexec_locate_mem_hole()
> implementation to check for both (e.g. start with the resource tree and
> only consider a range valid if not in memblock.reserved).

I am more angling towards this. But my worry is that different
architectures have already different ways to reserve memory (PPC seems
to do their own stuff on top of memblock, x86 I assumes uses the
resource tree in a different way than arm64).

Anyway, I'll keep digging.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
                   ` (4 preceding siblings ...)
  2021-05-19 15:19 ` Catalin Marinas
@ 2021-06-02 14:22 ` James Morse
  2021-06-02 15:59   ` Marc Zyngier
  5 siblings, 1 reply; 17+ messages in thread
From: James Morse @ 2021-06-02 14:22 UTC (permalink / raw)
  To: Marc Zyngier, kexec, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Mark Rutland,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Eric Biederman,
	Bhupesh Sharma, AKASHI Takahiro, kernel-team

Hi Marc,

On 29/04/2021 14:35, Marc Zyngier wrote:
> It recently became apparent that using kexec with kexec_file_load() on
> arm64 is pretty similar to playing Russian roulette.
> 
> Depending on the amount of memory, the HW supported and the firmware
> interface used, your secondary kernel may overwrite critical memory
> regions without which the secondary kernel cannot boot (the GICv3 LPI
> tables being a prime example of such reserved regions).
> 
> It turns out that there is at least two ways for reserved memory
> regions to be described to kexec: /proc/iomem for the userspace
> implementation, and memblock.reserved for kexec_file. 

One is spilled into the other by request_standard_resources()...


> And of course,
> our LPI tables are only reserved using the resource tree, leading to
> the aforementioned stamping.

Presumably well after efi_init() has run...

> Similar things could happen with ACPI tables as well.

efi_init() calls reserve_regions(), which has:
|	/* keep ACPI reclaim memory intact for kexec etc. */
|	if (md->type == EFI_ACPI_RECLAIM_MEMORY)
|		memblock_reserve(paddr, size);

This is also what stops mm from allocating them, as memblock-reserved gets copied into the
PG_Reserved flag by free_low_memory_core_early()'s calls to reserve_bootmem_region().

Is your machines firmware putting them in a region with a different type?

(The UEFI spec has something to say: see 2.3.6 "AArch64 Platforms":
| ACPI Tables loaded at boot time can be contained in memory of type EfiACPIReclaimMemory
| (recommended) or EfiACPIMemoryNVS

NVS would fail the is_usable_memory() check earlier, so gets treated as nomap)


Thanks,

James

> On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> boots with that little memory), trying to kexec a secondary kernel
> failed every times. I can only presume that this was mostly tested
> using kdump, which preserves the entire kernel memory range.
> 
> This small series aims at triggering a discussion on what are the
> expectations for kexec_file, and whether we should unify the two
> reservation mechanisms.
> 
> And in the meantime, it gets things going...
> 
> Marc Zyngier (2):
>   firmware/efi: Tell memblock about EFI reservations
>   ACPI: arm64: Reserve the ACPI tables in memblock
> 
>  arch/arm64/kernel/acpi.c   |  1 +
>  drivers/firmware/efi/efi.c | 23 ++++++++++++++++++++++-
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-06-02 14:22 ` James Morse
@ 2021-06-02 15:59   ` Marc Zyngier
  2021-06-02 16:58     ` James Morse
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2021-06-02 15:59 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Eric Biederman, Bhupesh Sharma, AKASHI Takahiro,
	kernel-team, Moritz Fischer

Hi James,

On Wed, 02 Jun 2021 15:22:00 +0100,
James Morse <james.morse@arm.com> wrote:
> 
> Hi Marc,
> 
> On 29/04/2021 14:35, Marc Zyngier wrote:
> > It recently became apparent that using kexec with kexec_file_load() on
> > arm64 is pretty similar to playing Russian roulette.
> > 
> > Depending on the amount of memory, the HW supported and the firmware
> > interface used, your secondary kernel may overwrite critical memory
> > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > tables being a prime example of such reserved regions).
> > 
> > It turns out that there is at least two ways for reserved memory
> > regions to be described to kexec: /proc/iomem for the userspace
> > implementation, and memblock.reserved for kexec_file. 
> 
> One is spilled into the other by request_standard_resources()...
> 
> 
> > And of course,
> > our LPI tables are only reserved using the resource tree, leading to
> > the aforementioned stamping.
> 
> Presumably well after efi_init() has run...

Yup, much later. And we can keep on reserving memory as long as we
boot new CPUs. Having it as a one-off sync doesn't really help here.

> 
> > Similar things could happen with ACPI tables as well.
> 
> efi_init() calls reserve_regions(), which has:
> |	/* keep ACPI reclaim memory intact for kexec etc. */
> |	if (md->type == EFI_ACPI_RECLAIM_MEMORY)
> |		memblock_reserve(paddr, size);
> 
> This is also what stops mm from allocating them, as
> memblock-reserved gets copied into the PG_Reserved flag by
> free_low_memory_core_early()'s calls to reserve_bootmem_region().
> 
> Is your machines firmware putting them in a region with a different type?

Good question. Moritz (cc'd) saw the tables being overwritten on his
system (which I don't have access to), so I guess this is not entirely
clear cut how this happens.

My SQ box reports the ACPI region as "ACPI Reclaim", so I guess it
works as expected here.

> (The UEFI spec has something to say: see 2.3.6 "AArch64 Platforms":
> | ACPI Tables loaded at boot time can be contained in memory of type EfiACPIReclaimMemory
> | (recommended) or EfiACPIMemoryNVS
> 
> NVS would fail the is_usable_memory() check earlier, so gets treated
> as nomap)

Note that I've since changed tactics and proposed that we fully rely
on the resource tree instead[1].

Thanks,

	M.

[1] https://lore.kernel.org/r/20210531095720.77469-1-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
  2021-06-02 15:59   ` Marc Zyngier
@ 2021-06-02 16:58     ` James Morse
  0 siblings, 0 replies; 17+ messages in thread
From: James Morse @ 2021-06-02 16:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kexec, linux-arm-kernel, Catalin Marinas, Will Deacon,
	Ard Biesheuvel, Mark Rutland, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Eric Biederman, Bhupesh Sharma, AKASHI Takahiro,
	kernel-team, Moritz Fischer

Hi Marc,

On 02/06/2021 16:59, Marc Zyngier wrote:
> On Wed, 02 Jun 2021 15:22:00 +0100,
> James Morse <james.morse@arm.com> wrote:
>> On 29/04/2021 14:35, Marc Zyngier wrote:
>>> It recently became apparent that using kexec with kexec_file_load() on
>>> arm64 is pretty similar to playing Russian roulette.
>>>
>>> Depending on the amount of memory, the HW supported and the firmware
>>> interface used, your secondary kernel may overwrite critical memory
>>> regions without which the secondary kernel cannot boot (the GICv3 LPI
>>> tables being a prime example of such reserved regions).
>>>
>>> It turns out that there is at least two ways for reserved memory
>>> regions to be described to kexec: /proc/iomem for the userspace
>>> implementation, and memblock.reserved for kexec_file. 
>>
>> One is spilled into the other by request_standard_resources()...
>>
>>
>>> And of course,
>>> our LPI tables are only reserved using the resource tree, leading to
>>> the aforementioned stamping.
>>
>> Presumably well after efi_init() has run...
> 
> Yup, much later. And we can keep on reserving memory as long as we
> boot new CPUs. Having it as a one-off sync doesn't really help here.

It might need doing for all possible CPUs up-front... otherwise someone loads a kexec
kernel and correctly picks a safe location ... then a CPU comes online and reserves a hole
in the middle of it: kexec isn't using the selected location until you reboot().
(memory hotplug has some 'fun' in this area, which can only be fixed by using memblock,
which ought to know about removable memory ranges ... but doesn't)

There does need to be a point where the list of reserved memory stops changing.


>>> Similar things could happen with ACPI tables as well.
>>
>> efi_init() calls reserve_regions(), which has:
>> |	/* keep ACPI reclaim memory intact for kexec etc. */
>> |	if (md->type == EFI_ACPI_RECLAIM_MEMORY)
>> |		memblock_reserve(paddr, size);
>>
>> This is also what stops mm from allocating them, as
>> memblock-reserved gets copied into the PG_Reserved flag by
>> free_low_memory_core_early()'s calls to reserve_bootmem_region().
>>
>> Is your machines firmware putting them in a region with a different type?

> Good question. Moritz (cc'd) saw the tables being overwritten on his
> system (which I don't have access to), so I guess this is not entirely
> clear cut how this happens.

If we have systems that store the tables in 'conventional memory' we have bigger problems!


> My SQ box reports the ACPI region as "ACPI Reclaim", so I guess it
> works as expected here.
> 
>> (The UEFI spec has something to say: see 2.3.6 "AArch64 Platforms":
>> | ACPI Tables loaded at boot time can be contained in memory of type EfiACPIReclaimMemory
>> | (recommended) or EfiACPIMemoryNVS
>>
>> NVS would fail the is_usable_memory() check earlier, so gets treated
>> as nomap)

> Note that I've since changed tactics and proposed that we fully rely
> on the resource tree instead[1].

Yup - I came back here to work out why you gave up on memblock:reserving the reserved
regions...



Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-06-02 17:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29 13:35 [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
2021-04-29 13:35 ` [PATCH 1/2] firmware/efi: Tell memblock about EFI reservations Marc Zyngier
2021-05-03 18:56   ` Moritz Fischer
2021-05-13  3:20     ` Dave Young
2021-05-13 11:11       ` Marc Zyngier
2021-04-29 13:35 ` [PATCH 2/2] ACPI: arm64: Reserve the ACPI tables in memblock Marc Zyngier
2021-05-03 18:57   ` Moritz Fischer
2021-05-12 18:04 ` [PATCH 0/2] arm64: kexec_file_load vs memory reservations Marc Zyngier
2021-05-13  3:17   ` Dave Young
2021-05-13 11:07     ` Marc Zyngier
2021-05-18 11:48 ` Will Deacon
2021-05-18 14:23   ` Bhupesh Sharma
2021-05-19 15:19 ` Catalin Marinas
2021-05-25 16:22   ` Marc Zyngier
2021-06-02 14:22 ` James Morse
2021-06-02 15:59   ` Marc Zyngier
2021-06-02 16:58     ` James Morse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).