linux-kernel.vger.kernel.org archive mirror
* [PATCH] irqchip/gic-v3: Don't reserve persistent memory for Xen domain
@ 2022-09-06  2:40 Leo Yan
  2022-09-06  6:22 ` Marc Zyngier
From: Leo Yan @ 2022-09-06  2:40 UTC
  To: Thomas Gleixner, Marc Zyngier, linux-kernel
  Cc: Leo Yan, Ard Biesheuvel, Bertrand Marquis, Rahul Singh,
	Julien Grall, Mathieu Poirier

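For a GICv3 with LPI support, the driver must reserve the memory holding
the redistributors' LPI configuration and pending tables as persistent.
Once LPIs are enabled on a redistributor, the architecture does not
guarantee that they can be disabled again, so the hardware may keep
using these pages to maintain LPI state after a kexec; the reservation
ensures that a secondary kernel launched by kexec does not overwrite them.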
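
To make the kexec concern concrete, here is a small user-space model in
C. It is only a sketch under stated assumptions: reserve_persistent(),
overlaps_reserved() and alloc_avoiding_reserved() are hypothetical names
that do not exist in the kernel, and the fixed array of ranges merely
stands in for the persistent list the next kernel inherits (the EFI
memreserve table on the EFI boot path). It shows why the pages must be
recorded: the secondary kernel's allocator only avoids ranges it has
been told about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of persistent reservations handed from one kernel
 * to the next across kexec.  Not kernel code. */
#define MAX_RESERVED 8

struct reserved_range {
	uint64_t base;
	uint64_t size;
};

static struct reserved_range reserved[MAX_RESERVED];
static size_t nr_reserved;

/* Record [base, base + size) as persistent; returns 0, or -1 if full. */
static int reserve_persistent(uint64_t base, uint64_t size)
{
	if (nr_reserved == MAX_RESERVED)
		return -1;
	reserved[nr_reserved].base = base;
	reserved[nr_reserved].size = size;
	nr_reserved++;
	return 0;
}

/* True if [base, base + size) overlaps any persistent reservation. */
static bool overlaps_reserved(uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < nr_reserved; i++) {
		uint64_t rb = reserved[i].base;
		uint64_t re = rb + reserved[i].size;

		if (base < re && rb < base + size)
			return true;
	}
	return false;
}

/*
 * Secondary kernel's allocator: return the first page-aligned chunk of
 * @size bytes in [start, end) that avoids every reservation, or 0 if
 * none is found (address 0 means "no memory" in this model).
 */
static uint64_t alloc_avoiding_reserved(uint64_t start, uint64_t end,
					uint64_t size, uint64_t page)
{
	for (uint64_t p = start; p + size <= end; p += page)
		if (!overlaps_reserved(p, size))
			return p;
	return 0;
}
```

With a 64 KiB pending table reserved at 0x10000, a request for another
64 KiB in [0x10000, 0x100000) lands at 0x20000, just past the reserved
table; without the reservation it would have reused 0x10000.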

When the kernel runs in a Xen domain, Xen passes it an FDT that
encapsulates the ACPI tables.  The EFI stub is therefore not invoked and
the memreserve table is not installed, so the memory cannot be reserved
as a persistent region and the kernel reports the following warning:

[    0.403737] ------------[ cut here ]------------
[    0.403738] WARNING: CPU: 30 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:3074 its_cpu_init+0x814/0xae0
[    0.403745] Modules linked in:
[    0.403748] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G        W         5.15.23-ampere-lts-standard #1
[    0.403752] pstate: 600001c5 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.403755] pc : its_cpu_init+0x814/0xae0
[    0.403758] lr : its_cpu_init+0x810/0xae0
[    0.403761] sp : ffff800009c03ce0
[    0.403762] x29: ffff800009c03ce0 x28: 000000000000001e x27: ffff880711f43000
[    0.403767] x26: ffff80000a3c0070 x25: fffffc1ffe0a4400 x24: ffff80000a3c0000
[    0.403770] x23: ffff8000095bc998 x22: ffff8000090a6000 x21: ffff800009850cb0
[    0.403774] x20: ffff800009701a10 x19: ffff800009701000 x18: ffffffffffffffff
[    0.403777] x17: 3030303035303031 x16: 3030313030303078 x15: 303a30206e6f6967
[    0.403780] x14: 6572206530312072 x13: 3030303030353030 x12: 3130303130303030
[    0.403784] x11: 78303a30206e6f69 x10: 6765722065303120 x9 : ffff80000870e710
[    0.403788] x8 : 6964657220646e75 x7 : 0000000000000003 x6 : 0000000000000000
[    0.403791] x5 : 0000000000000000 x4 : fffffc0000000000 x3 : 0000000000000010
[    0.403794] x2 : 000000000000ffff x1 : 0000000000010000 x0 : 00000000ffffffed
[    0.403798] Call trace:
[    0.403799]  its_cpu_init+0x814/0xae0
[    0.403802]  gic_starting_cpu+0x48/0x90
[    0.403805]  cpuhp_invoke_callback+0x16c/0x5b0
[    0.403808]  cpuhp_invoke_callback_range+0x78/0xf0
[    0.403811]  notify_cpu_starting+0xbc/0xdc
[    0.403814]  secondary_start_kernel+0xe0/0x170
[    0.403817]  __secondary_switched+0x94/0x98
[    0.403821] ---[ end trace f68728a0d3053b70 ]---
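
The failure can be modelled in a few lines of C. This is a hypothetical
sketch, not the driver code: struct boot_env, model_mem_reserve_persistent(),
model_gic_reserve_range() and model_warn_on() are made-up names, and the
model assumes that the efi_enabled(EFI_CONFIG_TABLES) branch is taken in
the Xen domain and that the persistent reservation fails with -ENODEV
when no memreserve table was installed. As a side note, x0 in the
register dump, 0x00000000ffffffed, reads as -19 when truncated to a
32-bit int, which is -ENODEV on Linux; that is consistent with this
path, although the dump alone does not prove it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical boot environment for this model only. */
struct boot_env {
	bool efi_config_tables;    /* stands in for efi_enabled(EFI_CONFIG_TABLES) */
	bool memreserve_installed; /* did the EFI stub install the memreserve table? */
};

/* Assumed behaviour: without a memreserve table there is nowhere to
 * record the range, so the reservation fails with -ENODEV. */
static int model_mem_reserve_persistent(const struct boot_env *env,
					uint64_t addr, uint64_t size)
{
	(void)addr;
	(void)size;
	return env->memreserve_installed ? 0 : -ENODEV;
}

/* Unpatched control flow, mirroring the diff context and assuming the
 * function otherwise returns 0. */
static int model_gic_reserve_range(const struct boot_env *env,
				   uint64_t addr, uint64_t size)
{
	if (env->efi_config_tables)
		return model_mem_reserve_persistent(env, addr, size);
	return 0;
}

/* Stand-in for WARN_ON(): report a non-zero return, say if it fired. */
static bool model_warn_on(int ret)
{
	if (ret)
		fprintf(stderr, "WARNING: persistent reservation failed (%d)\n", ret);
	return ret != 0;
}
```

In this model an environment with EFI config tables but no memreserve
table makes the reservation return -ENODEV and trips the warning, while
an environment booted through the EFI stub succeeds silently.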

The GICv3 interrupt controller is emulated by the Xen hypervisor, which
means the LPI configuration and pending tables allocated by the Linux
kernel are emulated in software rather than accessed by hardware.  There
is therefore no risk of a race between the secondary kernel launched by
kexec and the physical interrupt controller.  Moreover, when the
secondary kernel boots, it uses a memory region entirely separate from
the primary kernel's, so it can allocate its own LPI configuration and
pending tables and register them with the Xen hypervisor afterwards.

Looking at the GIC architecture, LPIs serve message-based interrupts
(MSIs): an LPI is generated either through the ITS or directly, and is
finally forwarded to a redistributor.  Physical LPIs are therefore
received by the Xen hypervisor at EL2, which then programs a List
Register of the virtual CPU interface so that the guest can consume the
interrupt at EL1.  Furthermore, supporting emulated LPIs would first
require connecting the virtual GICv3 to MSIs, and would then require Xen
to emulate the ITS and the redistributor.  So far, the Xen hypervisor
does not really emulate these hardware mechanisms, so the LPI tables
allocated by Linux are not used by it.
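
The delivery path described above (a physical LPI taken at EL2, then
presented to the guest through a List Register) can be illustrated with
a deliberately simplified C model. The names hyp_inject() and
guest_ack(), the four-entry LR array and the two-value state enum are
hypothetical; real ICH_LR<n>_EL2 registers carry more fields (priority,
group, HW bit), and the model ignores priority ordering, EOI and the
exact architectural state transitions.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified model of GICv3 List Registers. */
#define NR_LRS     4
#define LPI_BASE   8192u /* LPI INTIDs start at 8192 */
#define INTID_SPUR 1023u /* spurious INTID: nothing pending */

enum lr_state { LR_INVALID, LR_PENDING };

struct list_reg {
	uint32_t vintid;
	enum lr_state state;
};

/* EL2 (hypervisor): after taking a physical LPI, place the matching
 * virtual LPI in a free List Register.  Returns 0, or -1 if the INTID
 * is not an LPI or all LRs are in use (a real hypervisor would then
 * queue the interrupt in software). */
static int hyp_inject(struct list_reg lrs[NR_LRS], uint32_t vintid)
{
	if (vintid < LPI_BASE)
		return -1;
	for (int i = 0; i < NR_LRS; i++) {
		if (lrs[i].state == LR_INVALID) {
			lrs[i].vintid = vintid;
			lrs[i].state = LR_PENDING;
			return 0;
		}
	}
	return -1;
}

/* EL1 (guest): acknowledge a pending virtual interrupt, roughly what a
 * guest read of ICC_IAR1_EL1 (serviced by the virtual CPU interface)
 * returns.  This model takes the lowest-numbered pending LR and simply
 * retires it. */
static uint32_t guest_ack(struct list_reg lrs[NR_LRS])
{
	for (int i = 0; i < NR_LRS; i++) {
		if (lrs[i].state == LR_PENDING) {
			lrs[i].state = LR_INVALID;
			return lrs[i].vintid;
		}
	}
	return INTID_SPUR;
}
```

Injecting LPIs 8192 and 8200 and then acknowledging three times yields
8192, 8200 and finally the spurious INTID 1023.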

For the above reasons, this patch simply skips reserving persistent
memory in a Xen domain, which silences the spurious warning.
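
The effect of the proposed change can be summarised with a small
decision model. The flags in struct env are hypothetical stand-ins for
xen_domain(), efi_enabled(EFI_CONFIG_TABLES) and the presence of the EFI
memreserve table, the helper names are made up, and the sketch assumes
a failed persistent reservation returns -ENODEV; it is not the driver
code. The xen_check parameter toggles the early return so both the
unpatched and the patched behaviour can be compared.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel predicates involved. */
struct env {
	bool xen_domain;           /* xen_domain() */
	bool efi_config_tables;    /* efi_enabled(EFI_CONFIG_TABLES) */
	bool memreserve_installed; /* EFI memreserve table present */
};

/* Assumed failure mode when there is no memreserve table. */
static int reserve_persistent(const struct env *e, uint64_t addr,
			      uint64_t size)
{
	(void)addr;
	(void)size;
	return e->memreserve_installed ? 0 : -ENODEV;
}

/* Model of gic_reserve_range(); @xen_check selects patched behaviour. */
static int gic_reserve_range_model(const struct env *e, bool xen_check,
				   uint64_t addr, uint64_t size)
{
	/* Proposed early return: skip the reservation under Xen. */
	if (xen_check && e->xen_domain)
		return 0;

	if (e->efi_config_tables)
		return reserve_persistent(e, addr, size);

	return 0;
}
```

Note that returning 0 under Xen only hides the failure: the pages are
still not preserved across kexec, so the change is safe only if, as
argued above, nothing keeps using those tables after the switch.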

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
Cc: Rahul Singh <Rahul.Singh@arm.com>
Cc: Julien Grall <jgrall@amazon.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..9ba9984401de 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -34,6 +34,8 @@
 #include <linux/irqchip/arm-gic-v3.h>
 #include <linux/irqchip/arm-gic-v4.h>
 
+#include <xen/xen.h>
+
 #include <asm/cputype.h>
 #include <asm/exception.h>
 
@@ -2220,6 +2222,21 @@ static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size)
 
 static int gic_reserve_range(phys_addr_t addr, unsigned long size)
 {
+	/*
+	 * When the kernel runs in a Xen domain, it is not booted via the EFI
+	 * stub, so the memreserve table is not installed; in this case, the
+	 * memory cannot be reserved as a persistent region.
+	 *
+	 * On the other hand, the GICv3 controller is emulated by the Xen
+	 * hypervisor: each redistributor's LPI pending and configuration
+	 * tables are emulated in software rather than accessed by
+	 * hardware.  It is therefore not necessary to reserve them; for
+	 * kexec/kdump, the secondary kernel can allocate new pages for
+	 * these two tables.
+	 */
+	if (xen_domain())
+		return 0;
+
 	if (efi_enabled(EFI_CONFIG_TABLES))
 		return efi_mem_reserve_persistent(addr, size);
 
-- 
2.34.1



* Re: [PATCH] irqchip/gic-v3: Don't reserve persistent memory for Xen domain
  2022-09-06  2:40 [PATCH] irqchip/gic-v3: Don't reserve persistent memory for Xen domain Leo Yan
@ 2022-09-06  6:22 ` Marc Zyngier
From: Marc Zyngier @ 2022-09-06  6:22 UTC
  To: Leo Yan
  Cc: Thomas Gleixner, linux-kernel, Ard Biesheuvel, Bertrand Marquis,
	Rahul Singh, Julien Grall, Mathieu Poirier

On Tue, 06 Sep 2022 03:40:40 +0100,
Leo Yan <leo.yan@linaro.org> wrote:
> 
> [...]
> 
> @@ -2220,6 +2222,21 @@ static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size)
>  
>  static int gic_reserve_range(phys_addr_t addr, unsigned long size)
>  {
> +	/*
> +	 * When the kernel runs in a Xen domain, it is not booted via the EFI
> +	 * stub, so the memreserve table is not installed; in this case, the
> +	 * memory cannot be reserved as a persistent region.
> +	 *
> +	 * On the other hand, the GICv3 controller is emulated by the Xen
> +	 * hypervisor: each redistributor's LPI pending and configuration
> +	 * tables are emulated in software rather than accessed by
> +	 * hardware.  It is therefore not necessary to reserve them; for
> +	 * kexec/kdump, the secondary kernel can allocate new pages for
> +	 * these two tables.
> +	 */
> +	if (xen_domain())
> +		return 0;
> +
>  	if (efi_enabled(EFI_CONFIG_TABLES))
>  		return efi_mem_reserve_persistent(addr, size);
>  

No, never. The driver follows the architecture to the letter, and I'm
not going to pollute it because Xen is broken.

	M.

-- 
Without deviation from the norm, progress is not possible.
