linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
@ 2018-03-13 17:21 Marc Zyngier
  2018-03-13 17:21 ` [PATCH 1/3] irqchip/gic-v2: Reset APRn registers at boot time Marc Zyngier
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Marc Zyngier @ 2018-03-13 17:21 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Jason Cooper, Thomas Gleixner, Grzegorz Jaszczyk, Mark Rutland

As kexec and kdump are getting used a bit more intensively, I've been
made aware of a number of shortcomings.

The main gripe is from folks trying to launch a kdump kernel from
within an interrupt handler. If using EOImode==1, things work as
expected. If using EOImode==0 (such as in a guest), the secondary
kernel hangs as the previous interrupt hasn't been EOI'd, and the
active priority is still set. The first two patches are addressing
this situation for both GICv2 and GICv3 by reseting the APRs to their
default value.

The last patch is introduced to workaround an annoying shortcoming on
some GICv3 implementations, where LPIs cannot be disabled once they've
been enabled. This completely kills the whole kexec story, as the
secondary kernel will not be able to configure LPIs, and may
experience random single bit corruption due to interrupts being made
pending in the now reallocated pending tables. Fun!

This is quite annoying in those (limited) situations where you'd like
Linux to act as your bootloader. For this particular use case, we
introduce a kernel command line option "irqchip.gicv3_nolpi=1", which
will force the kernel to ignore LPIs altogether, leaving them intact
to the secondary kernel to enjoy. This has proved to be quite useful
on my Chromebook.

I'd welcome any testing and reviewing. If nobody fundamentaly
disagrees with this, I plan to get it merged in 4.17.

Marc Zyngier (3):
  irqchip/gic-v2: Reset APRn registers at boot time
  irqchip/gic-v3: Reset APgRn registers at boot time
  irqchip/gic-v3: Allow LPIs to be disabled from the command line

 Documentation/admin-guide/kernel-parameters.txt |  8 +++++
 arch/arm/include/asm/arch_gicv3.h               | 41 +++++++++++++++++++------
 drivers/irqchip/irq-gic-v3.c                    | 33 +++++++++++++++++++-
 drivers/irqchip/irq-gic.c                       | 17 ++++++----
 4 files changed, 82 insertions(+), 17 deletions(-)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] irqchip/gic-v2: Reset APRn registers at boot time
  2018-03-13 17:21 [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Marc Zyngier
@ 2018-03-13 17:21 ` Marc Zyngier
  2018-03-13 17:21 ` [PATCH 2/3] irqchip/gic-v3: Reset APgRn " Marc Zyngier
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2018-03-13 17:21 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Jason Cooper, Thomas Gleixner, Grzegorz Jaszczyk, Mark Rutland

Booting a crash kernel while in an interrupt handler is likely
to leave the Active Priority Registers with some state that
is not relevant to the new kernel, and is likely to lead
to erratic behaviours such as interrupts not firing as their
priority is already active.

As a sanity measure, wipe the APRs clean on startup.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/irqchip/irq-gic.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 121af5cf688f..79801c24800b 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -453,15 +453,26 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic)
 	return mask;
 }
 
+static bool gic_check_gicv2(void __iomem *base)
+{
+	u32 val = readl_relaxed(base + GIC_CPU_IDENT);
+	return (val & 0xff0fff) == 0x02043B;
+}
+
 static void gic_cpu_if_up(struct gic_chip_data *gic)
 {
 	void __iomem *cpu_base = gic_data_cpu_base(gic);
 	u32 bypass = 0;
 	u32 mode = 0;
+	int i;
 
 	if (gic == &gic_data[0] && static_key_true(&supports_deactivate))
 		mode = GIC_CPU_CTRL_EOImodeNS;
 
+	if (gic_check_gicv2(cpu_base))
+		for (i = 0; i < 4; i++)
+			writel_relaxed(0, cpu_base + GIC_CPU_ACTIVEPRIO + i * 4);
+
 	/*
 	* Preserve bypass disable bits to be written back later
 	*/
@@ -1264,12 +1275,6 @@ static int __init gicv2_force_probe_cfg(char *buf)
 }
 early_param("irqchip.gicv2_force_probe", gicv2_force_probe_cfg);
 
-static bool gic_check_gicv2(void __iomem *base)
-{
-	u32 val = readl_relaxed(base + GIC_CPU_IDENT);
-	return (val & 0xff0fff) == 0x02043B;
-}
-
 static bool gic_check_eoimode(struct device_node *node, void __iomem **base)
 {
 	struct resource cpuif_res;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] irqchip/gic-v3: Reset APgRn registers at boot time
  2018-03-13 17:21 [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Marc Zyngier
  2018-03-13 17:21 ` [PATCH 1/3] irqchip/gic-v2: Reset APRn registers at boot time Marc Zyngier
@ 2018-03-13 17:21 ` Marc Zyngier
  2018-03-13 17:21 ` [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line Marc Zyngier
  2018-03-13 17:51 ` [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Mark Rutland
  3 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2018-03-13 17:21 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Jason Cooper, Thomas Gleixner, Grzegorz Jaszczyk, Mark Rutland

Booting a crash kernel while in an interrupt handler is likely
to leave the Active Priority Registers with some state that
is not relevant to the new kernel, and is likely to lead
to erratic behaviours such as interrupts not firing as their
priority is already active.

As a sanity measure, wipe the APRs clean on startup. We make
sure to wipe both group 0 and 1 registers in order to avoid
any surprise.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/arch_gicv3.h | 41 +++++++++++++++++++++++++++++----------
 drivers/irqchip/irq-gic-v3.c      | 23 ++++++++++++++++++++++
 2 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index 1070044f5c3f..27288bdbd840 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -35,6 +35,18 @@
 #define ICC_IGRPEN1			__ACCESS_CP15(c12, 0, c12, 7)
 #define ICC_BPR1			__ACCESS_CP15(c12, 0, c12, 3)
 
+#define __ICC_AP0Rx(x)			__ACCESS_CP15(c12, 0, c8, 4 | x)
+#define ICC_AP0R0			__ICC_AP0Rx(0)
+#define ICC_AP0R1			__ICC_AP0Rx(1)
+#define ICC_AP0R2			__ICC_AP0Rx(2)
+#define ICC_AP0R3			__ICC_AP0Rx(3)
+
+#define __ICC_AP1Rx(x)			__ACCESS_CP15(c12, 0, c9, x)
+#define ICC_AP1R0			__ICC_AP1Rx(0)
+#define ICC_AP1R1			__ICC_AP1Rx(1)
+#define ICC_AP1R2			__ICC_AP1Rx(2)
+#define ICC_AP1R3			__ICC_AP1Rx(3)
+
 #define ICC_HSRE			__ACCESS_CP15(c12, 4, c9, 5)
 
 #define ICH_VSEIR			__ACCESS_CP15(c12, 4, c9, 4)
@@ -86,17 +98,17 @@
 #define ICH_LRC14			__LRC8(6)
 #define ICH_LRC15			__LRC8(7)
 
-#define __AP0Rx(x)			__ACCESS_CP15(c12, 4, c8, x)
-#define ICH_AP0R0			__AP0Rx(0)
-#define ICH_AP0R1			__AP0Rx(1)
-#define ICH_AP0R2			__AP0Rx(2)
-#define ICH_AP0R3			__AP0Rx(3)
+#define __ICH_AP0Rx(x)			__ACCESS_CP15(c12, 4, c8, x)
+#define ICH_AP0R0			__ICH_AP0Rx(0)
+#define ICH_AP0R1			__ICH_AP0Rx(1)
+#define ICH_AP0R2			__ICH_AP0Rx(2)
+#define ICH_AP0R3			__ICH_AP0Rx(3)
 
-#define __AP1Rx(x)			__ACCESS_CP15(c12, 4, c9, x)
-#define ICH_AP1R0			__AP1Rx(0)
-#define ICH_AP1R1			__AP1Rx(1)
-#define ICH_AP1R2			__AP1Rx(2)
-#define ICH_AP1R3			__AP1Rx(3)
+#define __ICH_AP1Rx(x)			__ACCESS_CP15(c12, 4, c9, x)
+#define ICH_AP1R0			__ICH_AP1Rx(0)
+#define ICH_AP1R1			__ICH_AP1Rx(1)
+#define ICH_AP1R2			__ICH_AP1Rx(2)
+#define ICH_AP1R3			__ICH_AP1Rx(3)
 
 /* A32-to-A64 mappings used by VGIC save/restore */
 
@@ -125,6 +137,15 @@ static inline u64 read_ ## a64(void)		\
 	return val; 				\
 }
 
+CPUIF_MAP(ICC_AP0R0, ICC_AP0R0_EL1)
+CPUIF_MAP(ICC_AP0R1, ICC_AP0R1_EL1)
+CPUIF_MAP(ICC_AP0R2, ICC_AP0R2_EL1)
+CPUIF_MAP(ICC_AP0R3, ICC_AP0R3_EL1)
+CPUIF_MAP(ICC_AP1R0, ICC_AP1R0_EL1)
+CPUIF_MAP(ICC_AP1R1, ICC_AP1R1_EL1)
+CPUIF_MAP(ICC_AP1R2, ICC_AP1R2_EL1)
+CPUIF_MAP(ICC_AP1R3, ICC_AP1R3_EL1)
+
 CPUIF_MAP(ICH_HCR, ICH_HCR_EL2)
 CPUIF_MAP(ICH_VTR, ICH_VTR_EL2)
 CPUIF_MAP(ICH_MISR, ICH_MISR_EL2)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index d99cc07903ec..0ea02504115d 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -532,6 +532,7 @@ static void gic_cpu_sys_reg_init(void)
 	int i, cpu = smp_processor_id();
 	u64 mpidr = cpu_logical_map(cpu);
 	u64 need_rss = MPIDR_RS(mpidr);
+	u32 val;
 
 	/*
 	 * Need to check that the SRE bit has actually been set. If
@@ -562,6 +563,28 @@ static void gic_cpu_sys_reg_init(void)
 		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
 	}
 
+	val = gic_read_ctlr();
+	val &= ICC_CTLR_EL1_PRI_BITS_MASK;
+	val >>= ICC_CTLR_EL1_PRI_BITS_SHIFT;
+
+	switch(val + 1) {
+	case 8:
+	case 7:
+		write_gicreg(0, ICC_AP0R3_EL1);
+		write_gicreg(0, ICC_AP1R3_EL1);
+		write_gicreg(0, ICC_AP0R2_EL1);
+		write_gicreg(0, ICC_AP1R2_EL1);
+	case 6:
+		write_gicreg(0, ICC_AP0R1_EL1);
+		write_gicreg(0, ICC_AP1R1_EL1);
+	case 5:
+	case 4:
+		write_gicreg(0, ICC_AP0R0_EL1);
+		write_gicreg(0, ICC_AP1R0_EL1);
+	}
+
+	isb();
+
 	/* ... and let's hit the road... */
 	gic_write_grpen1(1);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line
  2018-03-13 17:21 [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Marc Zyngier
  2018-03-13 17:21 ` [PATCH 1/3] irqchip/gic-v2: Reset APRn registers at boot time Marc Zyngier
  2018-03-13 17:21 ` [PATCH 2/3] irqchip/gic-v3: Reset APgRn " Marc Zyngier
@ 2018-03-13 17:21 ` Marc Zyngier
  2018-03-15 14:58   ` Shanker Donthineni
  2018-03-13 17:51 ` [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Mark Rutland
  3 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2018-03-13 17:21 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Jason Cooper, Thomas Gleixner, Grzegorz Jaszczyk, Mark Rutland

For most GICv3 implementations, enabling LPIs is a one way switch.
Once they're on, there is no turning back, which completely kills
kexec (pending tables will always be live, and we can't tell the
secondary kernel where they are).

This is really annoying if you plan to use Linux as a bootloader,
as it pretty much guarantees that the secondary kernel won't be
able to use MSIs, and may even see some memory corruption. Bad.

A workaround for this unfortunate situation is to allow the kernel
not to enable LPIs, even if the feature is present in the HW. This
would allow Linux-as-a-bootloader to leave LPIs alone, and let the
secondary kernel to do whatever it wants with them.

Let's introduce a boolean "irqchip.gicv3_nolpi" command line option
that serves that purpose.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 drivers/irqchip/irq-gic-v3.c                    | 10 +++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..60130231db3b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1743,6 +1743,14 @@
 			of a GICv2 controller even if the memory range
 			exposed by the device tree is too small.
 
+	irqchip.gicv3_nolpi=
+			[ARM, ARM64]
+			Force the kernel to ignore the availability of
+			LPIs (and by consequence ITSs). Intended for system
+			that use the kernel as a bootloader, and thus want
+			to let secondary kernels in charge of setting up
+			LPIs.
+
 	irqfixup	[HW]
 			When an interrupt is not handled search all handlers
 			for it. Intended to get systems with badly broken
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 0ea02504115d..3e9eeb6cb294 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -613,9 +613,17 @@ static void gic_cpu_sys_reg_init(void)
 		pr_crit_once("RSS is required but GICD doesn't support it\n");
 }
 
+static bool gicv3_nolpi;
+
+static int __init gicv3_nolpi_cfg(char *buf)
+{
+	return strtobool(buf, &gicv3_nolpi);
+}
+early_param("irqchip.gicv3_nolpi", gicv3_nolpi_cfg);
+
 static int gic_dist_supports_lpis(void)
 {
-	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS);
+	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS) && !gicv3_nolpi;
 }
 
 static void gic_cpu_init(void)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-13 17:21 [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Marc Zyngier
                   ` (2 preceding siblings ...)
  2018-03-13 17:21 ` [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line Marc Zyngier
@ 2018-03-13 17:51 ` Mark Rutland
  2018-03-13 18:35   ` Marc Zyngier
  3 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2018-03-13 17:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Jason Cooper, Thomas Gleixner,
	Grzegorz Jaszczyk

On Tue, Mar 13, 2018 at 05:21:00PM +0000, Marc Zyngier wrote:
> As kexec and kdump are getting used a bit more intensively, I've been
> made aware of a number of shortcomings.
> 
> The main gripe is from folks trying to launch a kdump kernel from
> within an interrupt handler. If using EOImode==1, things work as
> expected. If using EOImode==0 (such as in a guest), the secondary
> kernel hangs as the previous interrupt hasn't been EOI'd, and the
> active priority is still set. The first two patches are addressing
> this situation for both GICv2 and GICv3 by reseting the APRs to their
> default value.

As a more general thing, if irqchip drivers have state that needs to be
reset in their init code, can we live all this irqchip reset to the
crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?

That would avoid some work (including pointer chasing on potentially
corrupt memory) in the kernel that crashed, making it more likely that
we get to the crashkernel intact...

Thanks,
Mark

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-13 17:51 ` [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Mark Rutland
@ 2018-03-13 18:35   ` Marc Zyngier
  2018-03-14 16:57     ` Mark Rutland
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2018-03-13 18:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, linux-arm-kernel, Jason Cooper, Thomas Gleixner,
	Grzegorz Jaszczyk

On 13/03/18 17:51, Mark Rutland wrote:
> On Tue, Mar 13, 2018 at 05:21:00PM +0000, Marc Zyngier wrote:
>> As kexec and kdump are getting used a bit more intensively, I've been
>> made aware of a number of shortcomings.
>>
>> The main gripe is from folks trying to launch a kdump kernel from
>> within an interrupt handler. If using EOImode==1, things work as
>> expected. If using EOImode==0 (such as in a guest), the secondary
>> kernel hangs as the previous interrupt hasn't been EOI'd, and the
>> active priority is still set. The first two patches are addressing
>> this situation for both GICv2 and GICv3 by reseting the APRs to their
>> default value.
> 
> As a more general thing, if irqchip drivers have state that needs to be
> reset in their init code, can we live all this irqchip reset to the
> crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?

We could, once we know for sure that all the potential irqchips have
been fixed. Or we could just remove it immediately, and see what breaks.

> That would avoid some work (including pointer chasing on potentially
> corrupt memory) in the kernel that crashed, making it more likely that
> we get to the crashkernel intact...

Seems perfectly sensible to me.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-13 18:35   ` Marc Zyngier
@ 2018-03-14 16:57     ` Mark Rutland
  2018-03-14 17:11       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2018-03-14 16:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Jason Cooper, Thomas Gleixner,
	Grzegorz Jaszczyk

On Tue, Mar 13, 2018 at 06:35:07PM +0000, Marc Zyngier wrote:
> On 13/03/18 17:51, Mark Rutland wrote:
> > On Tue, Mar 13, 2018 at 05:21:00PM +0000, Marc Zyngier wrote:
> >> As kexec and kdump are getting used a bit more intensively, I've been
> >> made aware of a number of shortcomings.
> >>
> >> The main gripe is from folks trying to launch a kdump kernel from
> >> within an interrupt handler. If using EOImode==1, things work as
> >> expected. If using EOImode==0 (such as in a guest), the secondary
> >> kernel hangs as the previous interrupt hasn't been EOI'd, and the
> >> active priority is still set. The first two patches are addressing
> >> this situation for both GICv2 and GICv3 by reseting the APRs to their
> >> default value.
> > 
> > As a more general thing, if irqchip drivers have state that needs to be
> > reset in their init code, can we live all this irqchip reset to the
> > crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?
> 
> We could, once we know for sure that all the potential irqchips have
> been fixed. Or we could just remove it immediately, and see what breaks.

I would be very tempted to do the latter.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-14 16:57     ` Mark Rutland
@ 2018-03-14 17:11       ` Thomas Gleixner
  2018-03-14 17:42         ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2018-03-14 17:11 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, linux-kernel, linux-arm-kernel, Jason Cooper,
	Grzegorz Jaszczyk

On Wed, 14 Mar 2018, Mark Rutland wrote:
> On Tue, Mar 13, 2018 at 06:35:07PM +0000, Marc Zyngier wrote:
> > On 13/03/18 17:51, Mark Rutland wrote:
> > > On Tue, Mar 13, 2018 at 05:21:00PM +0000, Marc Zyngier wrote:
> > >> As kexec and kdump are getting used a bit more intensively, I've been
> > >> made aware of a number of shortcomings.
> > >>
> > >> The main gripe is from folks trying to launch a kdump kernel from
> > >> within an interrupt handler. If using EOImode==1, things work as
> > >> expected. If using EOImode==0 (such as in a guest), the secondary
> > >> kernel hangs as the previous interrupt hasn't been EOI'd, and the
> > >> active priority is still set. The first two patches are addressing
> > >> this situation for both GICv2 and GICv3 by reseting the APRs to their
> > >> default value.
> > > 
> > > As a more general thing, if irqchip drivers have state that needs to be
> > > reset in their init code, can we live all this irqchip reset to the
> > > crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?
> > 
> > We could, once we know for sure that all the potential irqchips have
> > been fixed. Or we could just remove it immediately, and see what breaks.
> 
> I would be very tempted to do the latter.

Makes sense. Do we have any indicator that tells us that a particular irq
chip is missing something in the init code or do we have to rely on crash
reports?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-14 17:11       ` Thomas Gleixner
@ 2018-03-14 17:42         ` Marc Zyngier
  2018-03-14 19:35           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2018-03-14 17:42 UTC (permalink / raw)
  To: Thomas Gleixner, Mark Rutland
  Cc: linux-kernel, linux-arm-kernel, Jason Cooper, Grzegorz Jaszczyk

On 14/03/18 17:11, Thomas Gleixner wrote:
> On Wed, 14 Mar 2018, Mark Rutland wrote:
>> On Tue, Mar 13, 2018 at 06:35:07PM +0000, Marc Zyngier wrote:
>>> On 13/03/18 17:51, Mark Rutland wrote:
>>>> On Tue, Mar 13, 2018 at 05:21:00PM +0000, Marc Zyngier wrote:
>>>>> As kexec and kdump are getting used a bit more intensively, I've been
>>>>> made aware of a number of shortcomings.
>>>>>
>>>>> The main gripe is from folks trying to launch a kdump kernel from
>>>>> within an interrupt handler. If using EOImode==1, things work as
>>>>> expected. If using EOImode==0 (such as in a guest), the secondary
>>>>> kernel hangs as the previous interrupt hasn't been EOI'd, and the
>>>>> active priority is still set. The first two patches are addressing
>>>>> this situation for both GICv2 and GICv3 by reseting the APRs to their
>>>>> default value.
>>>>
>>>> As a more general thing, if irqchip drivers have state that needs to be
>>>> reset in their init code, can we live all this irqchip reset to the
>>>> crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?
>>>
>>> We could, once we know for sure that all the potential irqchips have
>>> been fixed. Or we could just remove it immediately, and see what breaks.
>>
>> I would be very tempted to do the latter.
> 
> Makes sense. Do we have any indicator that tells us that a particular irq
> chip is missing something in the init code or do we have to rely on crash
> reports?
A way to work out what is potentially missing would be to make sure that
whatever we're removing from machine_kexec_mask_interrupts, we can find
it in the irqchip init code. Not an easy task, and certainly not perfect
(patches 1 and 2 in this series have no equivalent in the kexec code).

There is still another category of "reset" stuff that belongs to the
teardown path, and that's for things that may have an impact on the
secondary kernel.

The case I have in mind is that of the GIC LPI pending tables. These are
allocated to the GIC, which can write pending bits at any time. Think of
it as a DMA engine. At the moment we enter the secondary kernel, we must
make sure the GIC has already been shut down, as the table memory will
be reallocated.

For that particular case, I've started looking at some "reset" API that
an irqchip to register with, and get called back on kexec/kdump. Not
completely dissimilar to the shutdown method that some IOMMU drivers use
to gracefully stop in the same circumstances.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
  2018-03-14 17:42         ` Marc Zyngier
@ 2018-03-14 19:35           ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2018-03-14 19:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, linux-kernel, linux-arm-kernel, Jason Cooper,
	Grzegorz Jaszczyk

On Wed, 14 Mar 2018, Marc Zyngier wrote:
> On 14/03/18 17:11, Thomas Gleixner wrote:
> > Makes sense. Do we have any indicator that tells us that a particular irq
> > chip is missing something in the init code or do we have to rely on crash
> > reports?
> A way to work out what is potentially missing would be to make sure that
> whatever we're removing from machine_kexec_mask_interrupts, we can find
> it in the irqchip init code. Not an easy task, and certainly not perfect
> (patches 1 and 2 in this series have no equivalent in the kexec code).
> 
> There is still another category of "reset" stuff that belongs to the
> teardown path, and that's for things that may have an impact on the
> secondary kernel.
> 
> The case I have in mind is that of the GIC LPI pending tables. These are
> allocated to the GIC, which can write pending bits at any time. Think of
> it as a DMA engine. At the moment we enter the secondary kernel, we must
> make sure the GIC has already been shut down, as the table memory will
> be reallocated.

Yes, you surely need to prevent that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line
  2018-03-13 17:21 ` [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line Marc Zyngier
@ 2018-03-15 14:58   ` Shanker Donthineni
  2018-03-15 15:59     ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Shanker Donthineni @ 2018-03-15 14:58 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel
  Cc: Mark Rutland, Grzegorz Jaszczyk, Thomas Gleixner, Jason Cooper


Hi Marc,

On 03/13/2018 12:21 PM, Marc Zyngier wrote:
> For most GICv3 implementations, enabling LPIs is a one way switch.
> Once they're on, there is no turning back, which completely kills
> kexec (pending tables will always be live, and we can't tell the
> secondary kernel where they are).
> 
> This is really annoying if you plan to use Linux as a bootloader,
> as it pretty much guarantees that the secondary kernel won't be
> able to use MSIs, and may even see some memory corruption. Bad.
> 
> A workaround for this unfortunate situation is to allow the kernel
> not to enable LPIs, even if the feature is present in the HW. This
> would allow Linux-as-a-bootloader to leave LPIs alone, and let the
> secondary kernel to do whatever it wants with them.
> 
> Let's introduce a boolean "irqchip.gicv3_nolpi" command line option
> that serves that purpose.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
>  drivers/irqchip/irq-gic-v3.c                    | 10 +++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1d1d53f85ddd..60130231db3b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1743,6 +1743,14 @@
>  			of a GICv2 controller even if the memory range
>  			exposed by the device tree is too small.
>  
> +	irqchip.gicv3_nolpi=
> +			[ARM, ARM64]
> +			Force the kernel to ignore the availability of
> +			LPIs (and by consequence ITSs). Intended for system
> +			that use the kernel as a bootloader, and thus want
> +			to let secondary kernels in charge of setting up
> +			LPIs.
> +
>  	irqfixup	[HW]
>  			When an interrupt is not handled search all handlers
>  			for it. Intended to get systems with badly broken
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 0ea02504115d..3e9eeb6cb294 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -613,9 +613,17 @@ static void gic_cpu_sys_reg_init(void)
>  		pr_crit_once("RSS is required but GICD doesn't support it\n");
>  }
>  
> +static bool gicv3_nolpi;
> +
> +static int __init gicv3_nolpi_cfg(char *buf)
> +{
> +	return strtobool(buf, &gicv3_nolpi);
> +}
> +early_param("irqchip.gicv3_nolpi", gicv3_nolpi_cfg);
> +
>  static int gic_dist_supports_lpis(void)
>  {
> -	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS);
> +	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS) && !gicv3_nolpi;

Thanks for this patch series especially for KDUMP case. It would be nice if we disable GIC-ITS and
GICR-LPI functionality completely to avoid in flight LPIs which were triggered by first kernel.

  
>  }
>  
>  static void gic_cpu_init(void)
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line
  2018-03-15 14:58   ` Shanker Donthineni
@ 2018-03-15 15:59     ` Marc Zyngier
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2018-03-15 15:59 UTC (permalink / raw)
  To: shankerd, linux-kernel, linux-arm-kernel
  Cc: Mark Rutland, Grzegorz Jaszczyk, Thomas Gleixner, Jason Cooper

On 15/03/18 14:58, Shanker Donthineni wrote:
> 
> Hi Marc,
> 
> On 03/13/2018 12:21 PM, Marc Zyngier wrote:
>> For most GICv3 implementations, enabling LPIs is a one way switch.
>> Once they're on, there is no turning back, which completely kills
>> kexec (pending tables will always be live, and we can't tell the
>> secondary kernel where they are).
>>
>> This is really annoying if you plan to use Linux as a bootloader,
>> as it pretty much guarantees that the secondary kernel won't be
>> able to use MSIs, and may even see some memory corruption. Bad.
>>
>> A workaround for this unfortunate situation is to allow the kernel
>> not to enable LPIs, even if the feature is present in the HW. This
>> would allow Linux-as-a-bootloader to leave LPIs alone, and let the
>> secondary kernel to do whatever it wants with them.
>>
>> Let's introduce a boolean "irqchip.gicv3_nolpi" command line option
>> that serves that purpose.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
>>  drivers/irqchip/irq-gic-v3.c                    | 10 +++++++++-
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 1d1d53f85ddd..60130231db3b 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1743,6 +1743,14 @@
>>  			of a GICv2 controller even if the memory range
>>  			exposed by the device tree is too small.
>>  
>> +	irqchip.gicv3_nolpi=
>> +			[ARM, ARM64]
>> +			Force the kernel to ignore the availability of
>> +			LPIs (and by consequence ITSs). Intended for system
>> +			that use the kernel as a bootloader, and thus want
>> +			to let secondary kernels in charge of setting up
>> +			LPIs.
>> +
>>  	irqfixup	[HW]
>>  			When an interrupt is not handled search all handlers
>>  			for it. Intended to get systems with badly broken
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index 0ea02504115d..3e9eeb6cb294 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -613,9 +613,17 @@ static void gic_cpu_sys_reg_init(void)
>>  		pr_crit_once("RSS is required but GICD doesn't support it\n");
>>  }
>>  
>> +static bool gicv3_nolpi;
>> +
>> +static int __init gicv3_nolpi_cfg(char *buf)
>> +{
>> +	return strtobool(buf, &gicv3_nolpi);
>> +}
>> +early_param("irqchip.gicv3_nolpi", gicv3_nolpi_cfg);
>> +
>>  static int gic_dist_supports_lpis(void)
>>  {
>> -	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS);
>> +	return !!(readl_relaxed(gic_data.dist_base + GICD_TYPER) & GICD_TYPER_LPIS) && !gicv3_nolpi;
> 
> Thanks for this patch series especially for KDUMP case. It would be nice if we disable GIC-ITS and
> GICR-LPI functionality completely to avoid in flight LPIs which were triggered by first kernel.

For kdump, it doesn't really matter much. The kdump kernel lives in its
own memory space, and is unaffected by LPIs being triggered. You just
need to make sure that if you can't reset EnableLPIs, you still carry on
using wired interrupts. The ITS doesn't really matter, as long as the
redistributors have their EnableLPIs zeroed.

It really is for kexec that it matters a lot, because the secondary
kernel expects to find a sane environment, which it cannot have if LPIs
are still on.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-03-15 15:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-13 17:21 [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Marc Zyngier
2018-03-13 17:21 ` [PATCH 1/3] irqchip/gic-v2: Reset APRn registers at boot time Marc Zyngier
2018-03-13 17:21 ` [PATCH 2/3] irqchip/gic-v3: Reset APgRn " Marc Zyngier
2018-03-13 17:21 ` [PATCH 3/3] irqchip/gic-v3: Allow LPIs to be disabled from the command line Marc Zyngier
2018-03-15 14:58   ` Shanker Donthineni
2018-03-15 15:59     ` Marc Zyngier
2018-03-13 17:51 ` [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds Mark Rutland
2018-03-13 18:35   ` Marc Zyngier
2018-03-14 16:57     ` Mark Rutland
2018-03-14 17:11       ` Thomas Gleixner
2018-03-14 17:42         ` Marc Zyngier
2018-03-14 19:35           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).