linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke
@ 2023-06-08 12:00 James Gowans
  2023-06-08 12:00 ` [PATCH 1/3] genirq: Expand doc for PENDING and REPLAY flags James Gowans
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: James Gowans @ 2023-06-08 12:00 UTC (permalink / raw)
  To: Thomas Gleixner, liaochang1, Marc Zyngier; +Cc: linux-kernel, James Gowans

If interrupts do not have a global active state, it is possible for
the next interrupt to arrive on a new CPU if an affinity change happens
while the original CPU is still running the handler. This specifically
impacts GIC-v3.

In this series, generic functionality is added to handle_fasteoi_irq() to
support resending the interrupt when this race happens, and that generic
functionality is enabled specifically for GIC-v3, which is impacted by
this issue. GIC-v3 uses the handle_fasteoi_irq() generic flow handler,
hence that is the handler gaining the functionality.

Also add a bit more detail to the IRQS flag documentation to help future
readers know when/why the flags should be used and what they mean.

== Testing: ==

TL;DR: Run a VM using QEMU on an EC2 R6g.metal host with an ENA device
passed through using VFIO, and bounce IRQ affinity between two CPUs.
Before this change an interrupt can get lost and the device stalls; after
this change the interrupt is not lost.

=== Details: ===

Intentionally slow down the IRQ injection a bit, to turn this from a
rare race condition into something which can easily be flushed out in
testing:

@@ -763,6 +764,7 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
        raw_spin_lock_irqsave(&irq->irq_lock, flags);
        irq->pending_latch = true;
        vgic_queue_irq_unlock(kvm, irq, flags);
+       udelay(10);

        return 0;
 }

Also sprinkle a print to make it clear when the race described here is
hit:

@@ -698,6 +698,7 @@ void handle_fasteoi_irq(struct irq_desc *desc)
         * handling the previous one - it may need to be resent.
         */
        if (!irq_may_run(desc)) {
+               printk("!irq_may_run %i\n", desc->irq_data.irq);
                if (irqd_needs_resend_when_in_progress(&desc->irq_data))
                        desc->istate |= IRQS_PENDING;
                goto out;

Launch QEMU in your favourite way, with an ENA device passed through via
VFIO (the device needs to be re-bound to the VFIO driver before this):

qemu-system-aarch64 -enable-kvm  -machine virt,gic_version=3 -device vfio-pci,host=04:00.0 ...

In the VM, generate network traffic to get interrupts flowing:

ping -f -i 0.001 10.0.3.1 > /dev/null

On the host, bounce the affinity of the interrupt between two CPUs to
flush out the race:

while true; do
	echo 1 > /proc/irq/71/smp_affinity ; sleep 0.01;
	echo 2 > /proc/irq/71/smp_affinity ; sleep 0.01;
done

In the host dmesg, the printk indicates that the race is hit:

[  102.215801] !irq_may_run 71
[  105.426413] !irq_may_run 71
[  105.586462] !irq_may_run 71

Before this change, an interrupt is lost and this manifests as a driver
watchdog timeout in the guest device driver:

[   35.124441] ena 0000:00:02.0 enp0s2: Found a Tx that wasn't completed on time,...
...
[   37.124459] ------------[ cut here ]------------
[   37.124791] NETDEV WATCHDOG: enp0s2 (ena): transmit queue 0 timed out

After this change, even though the !irq_may_run print is still shown
(indicating that the race is still hit), the driver no longer times out
because the interrupt is now resent when the race occurs.

James Gowans (3):
  genirq: Expand doc for PENDING and REPLAY flags
  genirq: fasteoi supports resend on concurrent invoke
  irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs

 drivers/irqchip/irq-gic-v3-its.c |  2 ++
 include/linux/irq.h              | 13 +++++++++++++
 kernel/irq/chip.c                | 16 +++++++++++++++-
 kernel/irq/debugfs.c             |  2 ++
 kernel/irq/internals.h           |  7 +++++--
 5 files changed, 37 insertions(+), 3 deletions(-)


base-commit: 5f63595ebd82f56a2dd36ca013dd7f5ff2e2416a
-- 
2.25.1


* [PATCH 1/3] genirq: Expand doc for PENDING and REPLAY flags
  2023-06-08 12:00 [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke James Gowans
@ 2023-06-08 12:00 ` James Gowans
  2023-06-16 11:25   ` [irqchip: irq/irqchip-next] " irqchip-bot for James Gowans
  2023-06-08 12:00 ` [PATCH 2/3] genirq: fasteoi supports resend on concurrent invoke James Gowans
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: James Gowans @ 2023-06-08 12:00 UTC (permalink / raw)
  To: Thomas Gleixner, liaochang1, Marc Zyngier; +Cc: linux-kernel, James Gowans

Adding a bit more info about what the flags are used for may help future
code readers.
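
A simplified sketch of how these two flags are intended to be used by
handle_fasteoi_irq() once the next patch is applied (illustrative only,
not verbatim kernel code; the behaviour attributed to check_irq_resend()
in the comments is an assumption based on the documented flag semantics):

static void example_fasteoi_flow(struct irq_desc *desc)
{
	raw_spin_lock(&desc->lock);

	if (!irq_may_run(desc)) {
		/* Concurrent invoke: ask for a resend at the next opportunity. */
		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
			desc->istate |= IRQS_PENDING;
		goto out;
	}

	/* Normal path: any previously replayed interrupt has now been taken. */
	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);

	/* ... run the handler and EOI the interrupt ... */

	/*
	 * A concurrent invoke marked the interrupt pending: resend it.
	 * check_irq_resend() consumes IRQS_PENDING and sets IRQS_REPLAY.
	 */
	if (unlikely(desc->istate & IRQS_PENDING))
		check_irq_resend(desc, false);
out:
	raw_spin_unlock(&desc->lock);
}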

Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
---
 kernel/irq/internals.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b557579..c443a0ddc07e 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -47,9 +47,12 @@ enum {
  *				  detection
  * IRQS_POLL_INPROGRESS		- polling in progress
  * IRQS_ONESHOT			- irq is not unmasked in primary handler
- * IRQS_REPLAY			- irq is replayed
+ * IRQS_REPLAY			- irq has been resent and will not be resent
+ * 				  again until the handler has run and cleared
+ * 				  this flag.
  * IRQS_WAITING			- irq is waiting
- * IRQS_PENDING			- irq is pending and replayed later
+ * IRQS_PENDING			- irq needs to be resent and should be resent
+ * 				  at the next available opportunity.
  * IRQS_SUSPENDED		- irq is suspended
  * IRQS_NMI			- irq line is used to deliver NMIs
  * IRQS_SYSFS			- descriptor has been added to sysfs
-- 
2.25.1


* [PATCH 2/3] genirq: fasteoi supports resend on concurrent invoke
  2023-06-08 12:00 [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke James Gowans
  2023-06-08 12:00 ` [PATCH 1/3] genirq: Expand doc for PENDING and REPLAY flags James Gowans
@ 2023-06-08 12:00 ` James Gowans
  2023-06-16 11:25   ` [irqchip: irq/irqchip-next] genirq: Allow fasteoi handler to resend interrupts on concurrent handling irqchip-bot for James Gowans
  2023-06-08 12:00 ` [PATCH 3/3] irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs James Gowans
  2023-06-16  8:32 ` [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke Gowans, James
  3 siblings, 1 reply; 10+ messages in thread
From: James Gowans @ 2023-06-08 12:00 UTC (permalink / raw)
  To: Thomas Gleixner, liaochang1, Marc Zyngier
  Cc: linux-kernel, James Gowans, KarimAllah Raslan, Yipeng Zou, Zhang Jianhua

Update the generic handle_fasteoi_irq() to cater for the case when the
next interrupt arrives while the previous handler is still running.
Currently, when that happens, the irq_may_run() early out causes the
next IRQ to be lost. Support marking the interrupt as pending when that
happens so that the interrupt can be resent before handle_fasteoi_irq()
returns. This is inspired by handle_edge_irq().

Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or at the same priority.
However, there is a race: if the interrupt affinity is changed while the
previous handler is running, then the next interrupt can arrive on a
different CPU while the previous handler is still running. In that case
there will be a concurrent invocation and the early out will be taken.

For example:

           CPU 0             |          CPU 1
-----------------------------|-----------------------------
interrupt start              |
  handle_fasteoi_irq         | set_affinity(CPU 1)
    handler                  |
    ...                      | interrupt start
    ...                      |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return  | interrupt end
interrupt end                |

This can happen on interrupt controllers which do not have a global
active state; the next commit will enable the flag for known impacted
controllers.

Implementation notes:

It is believed that it's NOT necessary to mask the interrupt in
handle_fasteoi_irq() the way that handle_edge_irq() does. This is
because handle_edge_irq() caters for controllers which are too simple to
gate interrupts from the same source, so the kernel explicitly masks the
interrupt if it re-occurs [0].
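
For comparison, handle_edge_irq() deals with the equivalent situation
roughly as follows (a paraphrased sketch, not a verbatim quote of the
kernel code):

/*
 * Cannot run the handler right now (disabled, already in progress, or
 * no action): mark it pending and mask+ack it so that a simple
 * controller does not keep re-raising it.
 */
if (irqd_irq_disabled(&desc->irq_data) ||
    irqd_irq_inprogress(&desc->irq_data) || !desc->action) {
	desc->istate |= IRQS_PENDING;
	mask_ack_irq(desc);
	goto out_unlock;
}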

The resend-on-concurrent-invoke logic is gated by a flag which the
interrupt controller can set if it is susceptible to this problem. It is
not desirable to resend unconditionally: a wake-up source, for example,
has no need to be resent.
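
As an illustration only, a minimal sketch of how a susceptible irqchip
driver would opt in when allocating its interrupts; the function and
parameter names below are hypothetical, and the real GIC-v3 enablement
is in the next patch:

/*
 * Hypothetical sketch: an irqchip whose interrupts lack a global active
 * state opts in per interrupt at allocation time, so that
 * handle_fasteoi_irq() resends an interrupt which races with an
 * affinity change.
 */
static int example_irq_domain_alloc(struct irq_domain *domain,
				    unsigned int virq,
				    unsigned int nr_irqs, void *arg)
{
	unsigned int i;

	for (i = 0; i < nr_irqs; i++) {
		/* ... map the hwirq and set the chip as usual ... */
		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
	}

	return 0;
}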

[0] https://lore.kernel.org/all/bf94a380-fadd-8c38-cc51-4b54711d84b3@huawei.com/

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
---
 include/linux/irq.h  | 13 +++++++++++++
 kernel/irq/chip.c    | 16 +++++++++++++++-
 kernel/irq/debugfs.c |  2 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..cb77da1ac4c6 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -223,6 +223,8 @@ struct irq_data {
  *				  irq_chip::irq_set_affinity() when deactivated.
  * IRQD_IRQ_ENABLED_ON_SUSPEND	- Interrupt is enabled on suspend by irq pm if
  *				  irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
+ * IRQD_RESEND_WHEN_IN_PROGRESS	- Interrupt may fire when already in progress in which
+ *				  case it must be resent at the next available opportunity.
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -249,6 +251,7 @@ enum {
 	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
 	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
 	IRQD_IRQ_ENABLED_ON_SUSPEND	= (1 << 30),
+	IRQD_RESEND_WHEN_IN_PROGRESS    = (1U << 31),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
 }
 
+static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
+static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..57cd8f475302 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,16 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	/*
+	 * When an affinity change races with IRQ handling, the next interrupt
+	 * can arrive on the new CPU before the original CPU has completed
+	 * handling the previous one - it may need to be resent.
+	 */
+	if (!irq_may_run(desc)) {
+		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+			desc->istate |= IRQS_PENDING;
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -715,6 +723,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	cond_unmask_eoi_irq(desc, chip);
 
+	/*
+	 * When the race described above happens this will resend the interrupt.
+	 */
+	if (unlikely(desc->istate & IRQS_PENDING))
+		check_irq_resend(desc, false);
+
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index bbcaac64038e..5971a66be034 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
 	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
 
 	BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
+
+	BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
 };
 
 static const struct irq_bit_descr irqdesc_states[] = {
-- 
2.25.1


* [PATCH 3/3] irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
  2023-06-08 12:00 [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke James Gowans
  2023-06-08 12:00 ` [PATCH 1/3] genirq: Expand doc for PENDING and REPLAY flags James Gowans
  2023-06-08 12:00 ` [PATCH 2/3] genirq: fasteoi supports resend on concurrent invoke James Gowans
@ 2023-06-08 12:00 ` James Gowans
  2023-06-16 11:25   ` [irqchip: irq/irqchip-next] " irqchip-bot for James Gowans
  2023-06-16  8:32 ` [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke Gowans, James
  3 siblings, 1 reply; 10+ messages in thread
From: James Gowans @ 2023-06-08 12:00 UTC (permalink / raw)
  To: Thomas Gleixner, liaochang1, Marc Zyngier
  Cc: linux-kernel, James Gowans, KarimAllah Raslan, Yipeng Zou, Zhang Jianhua

GIC-v3 LPIs are impacted by an architectural design issue: they do not
have a global active state, and as such the next LPI can be delivered to
a new CPU after an affinity change while the previous LPI handler has
not yet returned on the original CPU. If LPIs had an active state, then
an LPI could not be retriggered until the first CPU had issued a
deactivation. As it is, it is necessary to cater for this case in
software by enabling the IRQD_RESEND_WHEN_IN_PROGRESS flag for GIC-v3
interrupts. Setting this flag mitigates the issue by getting the
original CPU to resend the interrupt after its handler has completed.

This RESEND_WHEN_IN_PROGRESS enablement is done for vanilla GIC-v3
interrupts allocated from its_irq_domain_alloc() as well as for the
directly injected vLPIs for vPEs provided by the GIC-v4 features, which
still use the GIC-v3 code.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0ec2b1e1df75..1994541eaef8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3585,6 +3585,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 		irqd = irq_get_irq_data(virq + i);
 		irqd_set_single_target(irqd);
 		irqd_set_affinity_on_activate(irqd);
+		irqd_set_resend_when_in_progress(irqd);
 		pr_debug("ID:%d pID:%d vID:%d\n",
 			 (int)(hwirq + i - its_dev->event_map.lpi_base),
 			 (int)(hwirq + i), virq + i);
@@ -4523,6 +4524,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
 		irq_domain_set_hwirq_and_chip(domain, virq + i, i,
 					      irqchip, vm->vpes[i]);
 		set_bit(i, bitmap);
+		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
 	}
 
 	if (err) {
-- 
2.25.1


* Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke
  2023-06-08 12:00 [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke James Gowans
                   ` (2 preceding siblings ...)
  2023-06-08 12:00 ` [PATCH 3/3] irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs James Gowans
@ 2023-06-16  8:32 ` Gowans, James
  2023-06-16 11:30   ` Marc Zyngier
  3 siblings, 1 reply; 10+ messages in thread
From: Gowans, James @ 2023-06-16  8:32 UTC (permalink / raw)
  To: tglx, maz, liaochang1; +Cc: linux-kernel

Hi Marc and Thomas,
Just a ping on this series; would be great to get any more feedback, or
get this merged.

Thanks!
James

On Thu, 2023-06-08 at 14:00 +0200, James Gowans wrote:
> [...]

* [irqchip: irq/irqchip-next] irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
  2023-06-08 12:00 ` [PATCH 3/3] irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs James Gowans
@ 2023-06-16 11:25   ` irqchip-bot for James Gowans
  0 siblings, 0 replies; 10+ messages in thread
From: irqchip-bot for James Gowans @ 2023-06-16 11:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marc Zyngier, James Gowans, Thomas Gleixner, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

The following commit has been merged into the irq/irqchip-next branch of irqchip:

Commit-ID:     8f4b589595d01f882d63d21efe15af4a5ad7c59b
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/8f4b589595d01f882d63d21efe15af4a5ad7c59b
Author:        James Gowans <jgowans@amazon.com>
AuthorDate:    Thu, 08 Jun 2023 14:00:21 +02:00
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Fri, 16 Jun 2023 12:23:40 +01:00

irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs

GICv3 LPIs are impacted by an architectural design issue: they do not
have a global active state and as such a given LPI can be delivered to
a new CPU after an affinity change while the previous instance of the
same LPI handler has not yet completed on the original CPU.

If LPIs had an active state, this second LPI would not be delivered
until the first CPU deactivated the initial LPI, just like SPIs.

To solve this issue, use the newly introduced IRQD_RESEND_WHEN_IN_PROGRESS
flag, ensuring that we do not lose an LPI being delivered during that window
by getting the GIC to resend it.

This workaround gets enabled for all LPIs, including the VPE doorbells.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: massaged commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-4-jgowans@amazon.com
---
 drivers/irqchip/irq-gic-v3-its.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0ec2b1e..1994541 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3585,6 +3585,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 		irqd = irq_get_irq_data(virq + i);
 		irqd_set_single_target(irqd);
 		irqd_set_affinity_on_activate(irqd);
+		irqd_set_resend_when_in_progress(irqd);
 		pr_debug("ID:%d pID:%d vID:%d\n",
 			 (int)(hwirq + i - its_dev->event_map.lpi_base),
 			 (int)(hwirq + i), virq + i);
@@ -4523,6 +4524,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
 		irq_domain_set_hwirq_and_chip(domain, virq + i, i,
 					      irqchip, vm->vpes[i]);
 		set_bit(i, bitmap);
+		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
 	}
 
 	if (err) {

* [irqchip: irq/irqchip-next] genirq: Allow fasteoi handler to resend interrupts on concurrent handling
  2023-06-08 12:00 ` [PATCH 2/3] genirq: fasteoi supports resend on concurrent invoke James Gowans
@ 2023-06-16 11:25   ` irqchip-bot for James Gowans
  0 siblings, 0 replies; 10+ messages in thread
From: irqchip-bot for James Gowans @ 2023-06-16 11:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marc Zyngier, James Gowans, Thomas Gleixner, KarimAllah Raslan,
	Yipeng Zou, Zhang Jianhua

The following commit has been merged into the irq/irqchip-next branch of irqchip:

Commit-ID:     9c15eeb5362c48dd27d51bd72e8873341fa9383c
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/9c15eeb5362c48dd27d51bd72e8873341fa9383c
Author:        James Gowans <jgowans@amazon.com>
AuthorDate:    Thu, 08 Jun 2023 14:00:20 +02:00
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Fri, 16 Jun 2023 12:22:35 +01:00

genirq: Allow fasteoi handler to resend interrupts on concurrent handling

There is a class of interrupt controllers out there that, once they
have signalled a given interrupt number, will still signal incoming
instances of the *same* interrupt despite the original interrupt
not having been EOIed yet.

As long as the new interrupt reaches the *same* CPU, nothing bad
happens, as that CPU still has its interrupts globally disabled,
and we will only take the new interrupt once the interrupt has
been EOIed.

However, things become more "interesting" if an affinity change comes
in while the interrupt is being handled. More specifically, while
the per-irq lock is being dropped. This results in the affinity change
taking place immediately. At this point, there is nothing that prevents
the interrupt from firing on the new target CPU. We end up with the
interrupt running concurrently on two CPUs, which isn't a good thing.

And that's where things become worse: the new CPU notices that the
interrupt handling is in progress (irq_may_run() returns false), and
*drops the interrupt on the floor*.

The whole race looks like this:

           CPU 0             |          CPU 1
-----------------------------|-----------------------------
interrupt start              |
  handle_fasteoi_irq         | set_affinity(CPU 1)
    handler                  |
    ...                      | interrupt start
    ...                      |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return  | interrupt end
interrupt end                |

If the interrupt was an edge, too bad. The interrupt is lost, and
the system will eventually die one way or another. Not great.

A way to avoid this situation is to detect this problem at the point
we handle the interrupt on the new target. Instead of dropping the
interrupt, use the resend mechanism to force it to be replayed.

Also, in order to limit the impact of this workaround to the pathetic
architectures that require it, gate it behind a new irq flag aptly
named IRQD_RESEND_WHEN_IN_PROGRESS.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: reworded commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com
---
 include/linux/irq.h  | 13 +++++++++++++
 kernel/irq/chip.c    | 16 +++++++++++++++-
 kernel/irq/debugfs.c |  2 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index d9c86db..d8a6fdc 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -223,6 +223,8 @@ struct irq_data {
  *				  irq_chip::irq_set_affinity() when deactivated.
  * IRQD_IRQ_ENABLED_ON_SUSPEND	- Interrupt is enabled on suspend by irq pm if
  *				  irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
+ * IRQD_RESEND_WHEN_IN_PROGRESS	- Interrupt may fire when already in progress in which
+ *				  case it must be resent at the next available opportunity.
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -249,6 +251,7 @@ enum {
 	IRQD_HANDLE_ENFORCE_IRQCTX	= BIT(28),
 	IRQD_AFFINITY_ON_ACTIVATE	= BIT(29),
 	IRQD_IRQ_ENABLED_ON_SUSPEND	= BIT(30),
+	IRQD_RESEND_WHEN_IN_PROGRESS    = BIT(31),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
 }
 
+static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
+static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc8..57cd8f4 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,16 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	/*
+	 * When an affinity change races with IRQ handling, the next interrupt
+	 * can arrive on the new CPU before the original CPU has completed
+	 * handling the previous one - it may need to be resent.
+	 */
+	if (!irq_may_run(desc)) {
+		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+			desc->istate |= IRQS_PENDING;
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -715,6 +723,12 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	cond_unmask_eoi_irq(desc, chip);
 
+	/*
+	 * When the race described above happens this will resend the interrupt.
+	 */
+	if (unlikely(desc->istate & IRQS_PENDING))
+		check_irq_resend(desc, false);
+
 	raw_spin_unlock(&desc->lock);
 	return;
 out:
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index bbcaac6..5971a66 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
 	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
 
 	BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
+
+	BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
 };
 
 static const struct irq_bit_descr irqdesc_states[] = {

* [irqchip: irq/irqchip-next] genirq: Expand doc for PENDING and REPLAY flags
  2023-06-08 12:00 ` [PATCH 1/3] genirq: Expand doc for PENDING and REPLAY flags James Gowans
@ 2023-06-16 11:25   ` irqchip-bot for James Gowans
  0 siblings, 0 replies; 10+ messages in thread
From: irqchip-bot for James Gowans @ 2023-06-16 11:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: James Gowans, Thomas Gleixner, Marc Zyngier, Liao Chang

The following commit has been merged into the irq/irqchip-next branch of irqchip:

Commit-ID:     7cc148a32f1e7496e22c0005dd113a31d4a3b3d4
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/7cc148a32f1e7496e22c0005dd113a31d4a3b3d4
Author:        James Gowans <jgowans@amazon.com>
AuthorDate:    Thu, 08 Jun 2023 14:00:19 +02:00
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Fri, 16 Jun 2023 12:22:05 +01:00

genirq: Expand doc for PENDING and REPLAY flags

Adding a bit more info about what the flags are used for may help future
code readers.

Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-2-jgowans@amazon.com
---
 kernel/irq/internals.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 5fdc0b5..c443a0d 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -47,9 +47,12 @@ enum {
  *				  detection
  * IRQS_POLL_INPROGRESS		- polling in progress
  * IRQS_ONESHOT			- irq is not unmasked in primary handler
- * IRQS_REPLAY			- irq is replayed
+ * IRQS_REPLAY			- irq has been resent and will not be resent
+ * 				  again until the handler has run and cleared
+ * 				  this flag.
  * IRQS_WAITING			- irq is waiting
- * IRQS_PENDING			- irq is pending and replayed later
+ * IRQS_PENDING			- irq needs to be resent and should be resent
+ * 				  at the next available opportunity.
  * IRQS_SUSPENDED		- irq is suspended
  * IRQS_NMI			- irq line is used to deliver NMIs
  * IRQS_SYSFS			- descriptor has been added to sysfs

* Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke
  2023-06-16  8:32 ` [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke Gowans, James
@ 2023-06-16 11:30   ` Marc Zyngier
  2023-06-16 12:25     ` Gowans, James
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2023-06-16 11:30 UTC (permalink / raw)
  To: Gowans, James; +Cc: tglx, liaochang1, linux-kernel

On Fri, 16 Jun 2023 09:32:30 +0100,
"Gowans, James" <jgowans@amazon.com> wrote:
> 
> Hi Marc and Tomas,
> Just a ping on this series; would be great to get any more feedback, or
> get this merged.

Just did, after converting everything to BIT() and massaging the
commit messages to my own liking.

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke
  2023-06-16 11:30   ` Marc Zyngier
@ 2023-06-16 12:25     ` Gowans, James
  0 siblings, 0 replies; 10+ messages in thread
From: Gowans, James @ 2023-06-16 12:25 UTC (permalink / raw)
  To: maz; +Cc: tglx, liaochang1, linux-kernel

On Fri, 2023-06-16 at 12:30 +0100, Marc Zyngier wrote:
> On Fri, 16 Jun 2023 09:32:30 +0100,
> "Gowans, James" <jgowans@amazon.com> wrote:
> > Hi Marc and Tomas,
> > Just a ping on this series; would be great to get any more feedback, or
> > get this merged.
> 
> Just did, after converting everything to BIT() and massaging the
> commit messages to my own liking.

The commit message improvements are fantastic! Thanks for merging it
Marc.

Cheers,
JG
