* [PATCH] irq: fasteoi handler re-runs on concurrent invoke
@ 2023-03-17  9:53 James Gowans
  2023-03-17 10:12 ` Yipeng Zou
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: James Gowans @ 2023-03-17  9:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, James Gowans, David Woodhouse, Marc Zyngier,
	KarimAllah Raslan

Update the generic handle_fasteoi_irq to cater for the case when the
next interrupt comes in while the previous handler is still running.
Currently when that happens the irq_may_run() early out causes the next
IRQ to be lost. Change the behaviour to mark the interrupt as pending
and re-run the handler when handle_fasteoi_irq sees that the pending
flag has been set again. This is largely inspired by handle_edge_irq.

Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the next interrupt should
only arrive after the EOI message has been sent and the previous handler
has returned. However, there is a race where if the interrupt affinity
is changed while the previous handler is running, then the next
interrupt can arrive at a different CPU while the previous handler is
still running. In that case there will be a concurrent invoke and the
early out will be taken.

For example:

           CPU 0             |          CPU 1
-----------------------------|-----------------------------
interrupt start              |
  handle_fasteoi_irq         | set_affinity(CPU 1)
    handler                  |
    ...                      | interrupt start
    ...                      |   handle_fasteoi_irq -> early out
  handle_fasteoi_irq return  | interrupt end
interrupt end                |

This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the global ITS is responsible for affinity but does not know
whether interrupts are pending/running; only the CPU-local redistributor
handles the EOI. Hence when the affinity is changed in the ITS, the new
CPU's redistributor does not know that the original CPU is still running
the handler.

There are a few uncertainties about this implementation compared to the
prior art in handle_edge_irq:

1. Do we need to mask the IRQ and then unmask it later? I don't think so
but it's not entirely clear why handle_edge_irq does this anyway; it's
an edge IRQ so not sure why it needs to be masked.

2. Should the EOI delivery be inside the do loop after every handler
run? I think outside the loops is best; only inform the chip to deliver
more IRQs once all pending runs have been finished.

3. Do we need to check that desc->action is still set? I don't know if
it can be un-set while the IRQ is still enabled.

4. There is now more overlap with the handle_edge_eoi_irq handler;
should we try to unify these?

Signed-off-by: James Gowans <jgowans@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
---
 Documentation/core-api/genericirq.rst | 9 ++++++++-
 kernel/irq/chip.c                     | 9 +++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/core-api/genericirq.rst b/Documentation/core-api/genericirq.rst
index f959c9b53f61..b54485eca8b5 100644
--- a/Documentation/core-api/genericirq.rst
+++ b/Documentation/core-api/genericirq.rst
@@ -240,7 +240,14 @@ which only need an EOI at the end of the handler.
 
 The following control flow is implemented (simplified excerpt)::
 
-    handle_irq_event(desc->action);
+    if (desc->status & running) {
+        desc->status |= pending;
+        return;
+    }
+    do {
+        desc->status &= ~pending;
+        handle_irq_event(desc->action);
+    } while (status & pending);
     desc->irq_data.chip->irq_eoi();
 
 
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..4e5fc2b7e8a9 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	if (!irq_may_run(desc)) {
+		desc->istate |= IRQS_PENDING;
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
@@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 	if (desc->istate & IRQS_ONESHOT)
 		mask_irq(desc);
 
-	handle_irq_event(desc);
+	do {
+		handle_irq_event(desc);
+	} while (unlikely((desc->istate & IRQS_PENDING) &&
+			!irqd_irq_disabled(&desc->irq_data)));
 
 	cond_unmask_eoi_irq(desc, chip);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-17  9:53 [PATCH] irq: fasteoi handler re-runs on concurrent invoke James Gowans
@ 2023-03-17 10:12 ` Yipeng Zou
  2023-03-17 11:49   ` Gowans, James
  2023-04-09 11:41 ` Marc Zyngier
  2023-05-23  3:15 ` liaochang (A)
  2 siblings, 1 reply; 28+ messages in thread
From: Yipeng Zou @ 2023-03-17 10:12 UTC (permalink / raw)
  To: James Gowans, Thomas Gleixner
  Cc: linux-kernel, David Woodhouse, Marc Zyngier, KarimAllah Raslan


On 2023/3/17 17:53, James Gowans wrote:
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-run the handler when handle_fasteoi_irq sees that the pending
> flag has been set again. This is largely inspired by handle_edge_irq.
>
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned. However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
>
> For example:
>
>             CPU 0             |          CPU 1
> -----------------------------|-----------------------------
> interrupt start              |
>    handle_fasteoi_irq         | set_affinity(CPU 1)
>      handler                  |
>      ...                      | interrupt start
>      ...                      |   handle_fasteoi_irq -> early out
>    handle_fasteoi_irq return  | interrupt end
> interrupt end                |
>
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.
>
> There are a few uncertainties about this implementation compared to the
> prior art in handle_edge_irq:
>
> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> but it's not entirely clear why handle_edge_irq does this anyway; it's
> an edge IRQ so not sure why it needs to be masked.
>
> 2. Should the EOI delivery be inside the do loop after every handler
> run? I think outside the loops is best; only inform the chip to deliver
> more IRQs once all pending runs have been finished.
>
> 3. Do we need to check that desc->action is still set? I don't know if
> it can be un-set while the IRQ is still enabled.
>
> 4. There is now more overlap with the handle_edge_eoi_irq handler;
> should we try to unify these?
>
> Signed-off-by: James Gowans <jgowans@amazon.com>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: KarimAllah Raslan <karahmed@amazon.com>
> ---
>   Documentation/core-api/genericirq.rst | 9 ++++++++-
>   kernel/irq/chip.c                     | 9 +++++++--
>   2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/core-api/genericirq.rst b/Documentation/core-api/genericirq.rst
> index f959c9b53f61..b54485eca8b5 100644
> --- a/Documentation/core-api/genericirq.rst
> +++ b/Documentation/core-api/genericirq.rst
> @@ -240,7 +240,14 @@ which only need an EOI at the end of the handler.
>   
>   The following control flow is implemented (simplified excerpt)::
>   
> -    handle_irq_event(desc->action);
> +    if (desc->status & running) {
> +        desc->status |= pending;
> +        return;
> +    }
> +    do {
> +        desc->status &= ~pending;
> +        handle_irq_event(desc->action);
> +    } while (status & pending);
>       desc->irq_data.chip->irq_eoi();
>   
>   
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..4e5fc2b7e8a9 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>   
>   	raw_spin_lock(&desc->lock);
>   
> -	if (!irq_may_run(desc))
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>   		goto out;
> +	}
>   
>   	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>   
> @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>   	if (desc->istate & IRQS_ONESHOT)
>   		mask_irq(desc);
>   
> -	handle_irq_event(desc);
> +	do {
> +		handle_irq_event(desc);
> +	} while (unlikely((desc->istate & IRQS_PENDING) &&
> +			!irqd_irq_disabled(&desc->irq_data)));
>   
>   	cond_unmask_eoi_irq(desc, chip);
>   

Hi:

Finally, someone also hit this problem.

I just sent a patch a few weeks ago.

It seems that we have the same solution (I introduced a new flow handler).

Hopefully this issue will be fixed as soon as possible.

[1] 
https://lore.kernel.org/all/20230310101417.1081434-1-zouyipeng@huawei.com/

-- 
Regards,
Yipeng Zou


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-17 10:12 ` Yipeng Zou
@ 2023-03-17 11:49   ` Gowans, James
  2023-03-22  6:26     ` Yipeng Zou
  0 siblings, 1 reply; 28+ messages in thread
From: Gowans, James @ 2023-03-17 11:49 UTC (permalink / raw)
  To: zouyipeng, tglx; +Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel

On Fri, 2023-03-17 at 18:12 +0800, Yipeng Zou wrote:
> It seems that we have the same solution.

That's a good sign! :D

> (I introduced a new flow handler).

I considered this, but IMO a new handler isn't the way to go: we already have a
bit of handler proliferation going on here. As mentioned in my commit message,
this is starting to get closer to handle_edge_eoi_irq, and adding a new generic
handler which is a mix of the two existing ones seems to just add more
confusion: which one should a driver owner use? I think it would be great if we
can enhance the existing generic handlers to cater for the various edge cases
and perhaps even merge these generic handlers in future.

What are your thoughts on this approach compared to your proposal?

There is also the "delay the affinity change of the LPI until the next interrupt
acknowledge" option described in the previous thread [0]. I considered that too,
but seeing as handle_edge_irq already takes the approach implemented here of
setting the PENDING flag and then re-running the handler, it seemed like good
prior art to draw on. Is enabling CONFIG_GENERIC_PENDING_IRQ a viable option?
IMO the generic handlers should be resilient to this, so I would prefer this fix
rather than depending on the user to know to set this config option.

JG

[0] https://lore.kernel.org/all/b0f2623b-ec70-d57e-b744-26c62b1ce523@huawei.com/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-17 11:49   ` Gowans, James
@ 2023-03-22  6:26     ` Yipeng Zou
  2023-03-22  7:48       ` Gowans, James
  0 siblings, 1 reply; 28+ messages in thread
From: Yipeng Zou @ 2023-03-22  6:26 UTC (permalink / raw)
  To: Gowans, James, tglx
  Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel


On 2023/3/17 19:49, Gowans, James wrote:
> On Fri, 2023-03-17 at 18:12 +0800, Yipeng Zou wrote:
>> It seems that we have the same solution.
> That's a good sign! :D
>
>> (I introduced a new flow handler).
> I considered this, but IMO a new handler isn't the way to go: we already have a
> bit of handler proliferation going on here. As mentioned in my commit message
> there this is starting to get closer to handle_edge_eoi_irq, and adding a new
> generic handler which is a mix of the two existing seems to just add more
> confusion: which one should a driver owner use? I think it'll be great if we can
> enhance the existing generic handlers to cater for the various edge cases and
> perhaps even merge these generic handlers in future.
>
> What are your thoughts on this approach compared to your proposal?

Hi,

I also agree with you: enhancing the existing generic handlers is a good way
to go.

Too many generic handlers really confuse developers.

> There is also the "delay the affinity change of LPI until the next interrupt
> acknowledge" option described in the previous thread [0]. I also considered that
> but seeing as the handle_edge_irq does the approach implemented here of setting
> the PENDING flag and then re-running it, it seemed like good prior art to draw
> on. Is that option of enabling CONFIG_GENERIC_PENDING_IRQ a viable? IMO the
> generic handlers should be resilient to this so I would prefer this fix than
> depending on the user to know to set this config option.


CONFIG_GENERIC_PENDING_IRQ is actually something we tried before, at
Thomas's suggestion.

That approach does fix our problem. However, the config is currently only
supported on x86, and some code modifications are required on arm.

In our patch, the config cannot be supported on arm as cleanly as on x86;
refer to the patch below.

It also changes the original behaviour of modifying the interrupt affinity:
the change now takes effect only at the interrupt after the next one, rather
than at the next one.

So, in general, I also prefer the fix which makes the generic handlers
resilient over introducing CONFIG_GENERIC_PENDING_IRQ for the GIC.


diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 1cb392fb16d0..64cfa5e38d89 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1678,6 +1678,12 @@ static int its_select_cpu_other(const struct cpumask *mask_val)
  }
  #endif

+static void its_irq_chip_eoi(struct irq_data *d)
+{
+       irq_move_irq(d);
+       irq_chip_eoi_parent(d);
+}
+
  static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
                             bool force)
  {
@@ -2026,7 +2032,7 @@ static struct irq_chip its_irq_chip = {
         .name                   = "ITS",
         .irq_mask               = its_mask_irq,
         .irq_unmask             = its_unmask_irq,
-       .irq_eoi                = irq_chip_eoi_parent,
+       .irq_eoi                = its_irq_chip_eoi,
         .irq_set_affinity       = its_set_affinity,
         .irq_compose_msi_msg    = its_irq_compose_msi_msg,
         .irq_set_irqchip_state  = its_irq_set_irqchip_state,
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index ab5505d8caf1..3c829bb4f649 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -33,7 +33,9 @@ config GENERIC_IRQ_LEGACY_ALLOC_HWIRQ

  # Support for delayed migration from interrupt context
  config GENERIC_PENDING_IRQ
-       bool
+       bool "Support for delayed migration from interrupt context"
+       depends on SMP
+       default n

  # Support for generic irq migrating off cpu before the cpu is offline.
  config GENERIC_IRQ_MIGRATION
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index def48589ea48..bcb61ee69c20 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -117,3 +117,5 @@ void __irq_move_irq(struct irq_data *idata)
         if (!masked)
                 idata->chip->irq_unmask(idata);
  }
+
+void __weak irq_force_complete_move(struct irq_desc *desc) { }

> JG
>
> [0]https://lore.kernel.org/all/b0f2623b-ec70-d57e-b744-26c62b1ce523@huawei.com/

-- 
Regards,
Yipeng Zou


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-22  6:26     ` Yipeng Zou
@ 2023-03-22  7:48       ` Gowans, James
  2023-03-22 10:37         ` Thomas Gleixner
  2023-03-22 10:38         ` Yipeng Zou
  0 siblings, 2 replies; 28+ messages in thread
From: Gowans, James @ 2023-03-22  7:48 UTC (permalink / raw)
  To: zouyipeng, tglx; +Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel

On Wed, 2023-03-22 at 14:26 +0800, Yipeng Zou wrote:
> > On 2023/3/17 19:49, Gowans, James wrote:
> > What are your thoughts on this approach compared to your proposal?
> 
> Hi,
> 
> I also agree with you, enhance the existing generic handlers is a good
> way to go.
> 
> Too many generic handlers really confuse developers.

Thomas, would you be open to taking the patch to tweak the handle_fasteoi_irq
handler? Or is there a different solution to this problem which you prefer?

> About CONFIG_GENERIC_PENDING_IRQ is actually some attempts we made
> before under the suggestion of Thomas.
> 
> This patch is valid for our problem. However, the current config is only
> supported on x86, and some code modifications are required on arm.

Thanks for the patch! I have been trying out CONFIG_GENERIC_PENDING_IRQ too, but
couldn't get it to work; it seems the IRQ never actually moved. I see from your
patch that we would need to tweak the callbacks and explicitly do the affinity
move in the EOI handler of the chip; the generic code won't do it for us.

> This has led to some changes in the original behavior of modifying
> interrupting affinity, from the next interrupt taking effect to the next
> to the next interrupt taking effect.

So this means that even if it's safe to change the affinity right now, the
change will actually be delayed until the *next* interrupt? Specifically because
the interrupt doesn't have the IRQD_MOVE_PCNTXT state flag set, and hence
irq_set_affinity_locked won't call irq_try_set_affinity?
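
(For reference, my rough reading of the relevant branch in
irq_set_affinity_locked() - a paraphrased sketch of kernel/irq/manage.c rather
than the exact code, so details may differ by kernel version:)

	if (irq_can_move_pcntxt(data)) {	/* tests IRQD_MOVE_PCNTXT */
		/* Safe to apply the new affinity right here. */
		ret = irq_try_set_affinity(data, mask, force);
	} else {
		/*
		 * Defer: remember the requested mask; it gets applied
		 * later via irq_move_irq(), typically from the chip's
		 * ack/eoi path in hard interrupt context.
		 */
		irqd_set_move_pending(data);
		irq_copy_pending(desc, mask);
	}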


JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-22  7:48       ` Gowans, James
@ 2023-03-22 10:37         ` Thomas Gleixner
  2023-04-03 13:17           ` zhangjianhua (E)
  2023-03-22 10:38         ` Yipeng Zou
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2023-03-22 10:37 UTC (permalink / raw)
  To: Gowans, James, zouyipeng
  Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel

On Wed, Mar 22 2023 at 07:48, James Gowans wrote:
> On Wed, 2023-03-22 at 14:26 +0800, Yipeng Zou wrote:
> Thomas, would you be open to taking the patch to tweak the handle_fasteoi_irq
> handler? Or is there a different solution to this problem which you
> prefer?

It's in my backlog.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-22  7:48       ` Gowans, James
  2023-03-22 10:37         ` Thomas Gleixner
@ 2023-03-22 10:38         ` Yipeng Zou
  1 sibling, 0 replies; 28+ messages in thread
From: Yipeng Zou @ 2023-03-22 10:38 UTC (permalink / raw)
  To: Gowans, James, tglx
  Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel


On 2023/3/22 15:48, Gowans, James wrote:
> On Wed, 2023-03-22 at 14:26 +0800, Yipeng Zou wrote:
>>> On 2023/3/17 19:49, Gowans, James wrote:
>>> What are your thoughts on this approach compared to your proposal?
>> Hi,
>>
>> I also agree with you, enhance the existing generic handlers is a good
>> way to go.
>>
>> Too many generic handlers really confuse developers.
> Thomas, would you be open to taking the patch to tweak the handle_fasteoi_irq
> handler? Or is there a different solution to this problem which you prefer?

Our workarounds are generally similar, but the implementation details are
somewhat different.

Maybe we should wait for some comments from the maintainer and other people.

>> About CONFIG_GENERIC_PENDING_IRQ is actually some attempts we made
>> before under the suggestion of Thomas.
>>
>> This patch is valid for our problem. However, the current config is only
>> supported on x86, and some code modifications are required on arm.
> Thanks for the patch! I have been trying out CONFIG_GENERIC_PENDING_IRQ too, but
> couldn't get it to work; it seems the IRQ never actually moved. I see from your
> patch that we would need to tweak the callbacks and explicitly do the affinity
> move in the EOI handler of the chip; the generic code won't do it for us.
>
>> This has led to some changes in the original behavior of modifying
>> interrupting affinity, from the next interrupt taking effect to the next
>> to the next interrupt taking effect.
> So this means that even if it's safe to change the affinity right now, the
> change will actually be delayed until the *next* interrupt? Specifically because
> interrupt doesn't have the IRQD_MOVE_PCNTXT state flag isn't set hence
> irq_set_affinity_locked won't call irq_try_set_affinity?

Yes, the modification of the interrupt affinity will be delayed until the
*next* interrupt's EOI handler (in hardirq context).

This differs from x86, which does irq_move_irq in the ack handler and then
moves the interrupt to the new CPU without any other side effects.
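
For comparison, x86 does the move in its ack path; roughly, apic_ack_irq()
in arch/x86/kernel/apic/vector.c looks like:

void apic_ack_irq(struct irq_data *irqd)
{
	/* Apply any pending affinity change before acking the local APIC. */
	irq_move_irq(irqd);
	ack_APIC_irq();
}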

> JG

-- 
Regards,
Yipeng Zou


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-22 10:37         ` Thomas Gleixner
@ 2023-04-03 13:17           ` zhangjianhua (E)
  2023-04-03 13:19             ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: zhangjianhua (E) @ 2023-04-03 13:17 UTC (permalink / raw)
  To: Thomas Gleixner, Gowans, James, zouyipeng
  Cc: maz, Raslan, KarimAllah, Woodhouse, David, linux-kernel

Hi Thomas

I have the same problem as James and Yipeng: modifying the IRQ affinity
concurrently during LPI interrupt processing does cause interrupt loss.

Fortunately, James and Yipeng have each posted patches with similar
solutions; I hope you'll take this issue seriously, thanks.

On 2023/3/22 18:37, Thomas Gleixner wrote:
> On Wed, Mar 22 2023 at 07:48, James Gowans wrote:
>> On Wed, 2023-03-22 at 14:26 +0800, Yipeng Zou wrote:
>> Thomas, would you be open to taking the patch to tweak the handle_fasteoi_irq
>> handler? Or is there a different solution to this problem which you
>> prefer?
> It's in my backlog.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-03 13:17           ` zhangjianhua (E)
@ 2023-04-03 13:19             ` Marc Zyngier
  2023-04-03 13:39               ` Gowans, James
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2023-04-03 13:19 UTC (permalink / raw)
  To: zhangjianhua (E)
  Cc: Thomas Gleixner, Gowans, James, zouyipeng, Raslan, KarimAllah,
	Woodhouse, David, linux-kernel

On 2023-04-03 14:17, zhangjianhua (E) wrote:
> Hi Thomas
> 
> I have the same problem as James and Yipeng, while modify the irq 
> affinity
> 
> concurrently during LPI interrupt processing, it does cause interrupt 
> loss.
> 
> Fortunately, James and Yipeng give their patches separately with the 
> similar
> 
> solutions, and hope you'll take this issue seriously, thanks.

Neither solution is remotely acceptable.

I'll look at something a bit more palatable.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-03 13:19             ` Marc Zyngier
@ 2023-04-03 13:39               ` Gowans, James
  0 siblings, 0 replies; 28+ messages in thread
From: Gowans, James @ 2023-04-03 13:39 UTC (permalink / raw)
  To: maz, chris.zjh
  Cc: tglx, zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel

On Mon, 2023-04-03 at 14:19 +0100, Marc Zyngier wrote:
> Neither solution is remotely acceptable.
> I'll look at something a bit more palatable.

Thanks Marc, that will be much appreciated.

Out of interest though, why is the suggestion to mark the irq_desc as pending
and re-run the handler not acceptable?

JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-17  9:53 [PATCH] irq: fasteoi handler re-runs on concurrent invoke James Gowans
  2023-03-17 10:12 ` Yipeng Zou
@ 2023-04-09 11:41 ` Marc Zyngier
  2023-04-11 10:27   ` Gowans, James
  2023-05-23  3:15 ` liaochang (A)
  2 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2023-04-09 11:41 UTC (permalink / raw)
  To: James Gowans
  Cc: Thomas Gleixner, linux-kernel, David Woodhouse, KarimAllah Raslan

James,

On Fri, 17 Mar 2023 09:53:00 +0000,
James Gowans <jgowans@amazon.com> wrote:
> 
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-run the handler when handle_fasteoi_irq sees that the pending
> flag has been set again. This is largely inspired by handle_edge_irq.
> 
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned.

There isn't necessarily an EOI message. On bare metal, EOI of a LPI
amounts to sweet nothing (see the pseudocode to convince yourself),
because the physical LPI doesn't have an active state. When
virtualised, the active state is limited to the state stored in the
LRs, and isn't propagated anywhere else.

> However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
> 
> For example:
> 
>            CPU 0             |          CPU 1
> -----------------------------|-----------------------------
> interrupt start              |
>   handle_fasteoi_irq         | set_affinity(CPU 1)
>     handler                  |
>     ...                      | interrupt start
>     ...                      |   handle_fasteoi_irq -> early out
>   handle_fasteoi_irq return  | interrupt end
> interrupt end                |
> 
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.

You seem to be misunderstanding the architecture.

The local redistributor doesn't know anything either. A redistributor
doesn't contain any state about what is currently serviced. At best,
it knows of the pending state of interrupts, and that about it. This
is even more true in a VM. Only the CPU knows about this, and there is
no EOI that ever gets propagated to the redistributor.

> 
> There are a few uncertainties about this implementation compared to the
> prior art in handle_edge_irq:
> 
> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> but it's not entirely clear why handle_edge_irq does this anyway; it's
> an edge IRQ so not sure why it needs to be masked.

Please measure that cost and weep, specially in the context of
multiple concurrent interrupts serviced by a single ITS (cost of
locking the command queue, of waiting for a full round trip to the ITS
for a couple of commands...).

> 
> 2. Should the EOI delivery be inside the do loop after every handler
> run? I think outside the loops is best; only inform the chip to deliver
> more IRQs once all pending runs have been finished.
> 
> 3. Do we need to check that desc->action is still set? I don't know if
> it can be un-set while the IRQ is still enabled.
> 
> 4. There is now more overlap with the handle_edge_eoi_irq handler;
> should we try to unify these?
> 
> Signed-off-by: James Gowans <jgowans@amazon.com>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: KarimAllah Raslan <karahmed@amazon.com>
> ---
>  Documentation/core-api/genericirq.rst | 9 ++++++++-
>  kernel/irq/chip.c                     | 9 +++++++--
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/core-api/genericirq.rst b/Documentation/core-api/genericirq.rst
> index f959c9b53f61..b54485eca8b5 100644
> --- a/Documentation/core-api/genericirq.rst
> +++ b/Documentation/core-api/genericirq.rst
> @@ -240,7 +240,14 @@ which only need an EOI at the end of the handler.
>  
>  The following control flow is implemented (simplified excerpt)::
>  
> -    handle_irq_event(desc->action);
> +    if (desc->status & running) {
> +        desc->status |= pending;
> +        return;
> +    }
> +    do {
> +        desc->status &= ~pending;
> +        handle_irq_event(desc->action);
> +    } while (status & pending);
>      desc->irq_data.chip->irq_eoi();
>  
>  
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..4e5fc2b7e8a9 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>  
>  	raw_spin_lock(&desc->lock);
>  
> -	if (!irq_may_run(desc))
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}

What is currently unclear to me is how we get there on another CPU if
we're still handling the interrupt, which happens in the critical
section delineated by desc->lock. So there is some additional state
change here which you don't seem to be describing.

>  
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>  
> @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>  	if (desc->istate & IRQS_ONESHOT)
>  		mask_irq(desc);
>  
> -	handle_irq_event(desc);
> +	do {
> +		handle_irq_event(desc);
> +	} while (unlikely((desc->istate & IRQS_PENDING) &&
> +			!irqd_irq_disabled(&desc->irq_data)));
>  
>  	cond_unmask_eoi_irq(desc, chip);
>  

Why do we need to inflict any of this on all other users of this flow?
The lack of active state on LPIs is the issue, as it allows a new
interrupt to be presented to another CPU as soon as the previous one
has been ACK'ed.

This also has the effect of keeping the handling of the interrupt on
the original CPU, negating the effects of the change of affinity.

It feels that we should instead resend the interrupt, either by making
it pending again by generating an INT command to the ITS, or by using
the SW resend mechanism (which the lack of deactivation requirement
makes possible).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-09 11:41 ` Marc Zyngier
@ 2023-04-11 10:27   ` Gowans, James
  2023-04-12 13:32     ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Gowans, James @ 2023-04-11 10:27 UTC (permalink / raw)
  To: maz
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, zouyipeng,
	linux-kernel, Sironi, Filippo, chris.zjh

Hi Marc, thanks for taking the time to put thought into this!

On Sun, 2023-04-09 at 12:41 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
> 
> There isn't necessarily an EOI message. On bare metal, EOI of a LPI
> amounts to sweet nothing (see the pseudocode to convince yourself),
> because the physical LPI doesn't have an active state.

Okay, I saw the gic_eoi_irq() function and thought there might be something
going on there. I guess it's not relevant to this issue.

> > 
> > The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
> 
> You seem to be misunderstanding the architecture.
> 
> The local redistributor doesn't know anything either. A redistributor
> doesn't contain any state about what is currently serviced. At best,
> it knows of the pending state of interrupts, and that about it. This
> is even more true in a VM. Only the CPU knows about this, and there is
> no EOI that ever gets propagated to the redistributor.

Right. This misunderstanding basically stems from my confusion above where I
thought that the EOI would move it back into "pending." That obviously makes no
sense because if another IRQ arrives while the handler is already running, then
another handler run must be queued, hence it must be something really early in
the flow which ACKs the IRQ so that it gets moved back to "inactive."
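
So my rough mental model of the flow is now (pseudocode, not the actual GICv3
driver code):

    irqnr = read(ICC_IAR1_EL1);   /* ack: the LPI goes from pending back to inactive here */
    handle_fasteoi_irq(desc);     /* flow handler -> driver handler(s) run                */
    write(ICC_EOIR1_EL1, irqnr);  /* with EOImode=0 this is only a priority drop for LPIs */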

> > 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> > but it's not entirely clear why handle_edge_irq does this anyway; it's
> > an edge IRQ so not sure why it needs to be masked.
> 
> Please measure that cost and weep, specially in the context of
> multiple concurrent interrupts serviced by a single ITS (cost of
> locking the command queue, of waiting for a full round trip to the ITS
> for a couple of commands...).

Fortunately this mask/unmasking would only happen in the rare(ish) case of the
race condition described here being hit. Exactly the same as
with handle_edge_irq(), the masking and later unmasking would only be done
when irq_may_run() == false due to the race being hit. Considering that this is
a rare occurrence, I think we could stomach the occasional overhead? I was more
asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
it's necessary anyway, hence it wasn't implemented in my patch.

> > index 49e7bc871fec..4e5fc2b7e8a9 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> > 
> >       raw_spin_lock(&desc->lock);
> > 
> > -     if (!irq_may_run(desc))
> > +     if (!irq_may_run(desc)) {
> > +             desc->istate |= IRQS_PENDING;
> >               goto out;
> > +     }
> 
> What is currently unclear to me is how we get there on another CPU if
> we're still handling the interrupt, which happens in the critical
> section delineated by desc->lock. So there is some additional state
> change here which you don't seem to be describing.

The lock is actually released and later re-acquired in the handle_irq_event()
function which is called below. Only this flag wrangling code is done with the
irq_desc lock held; all of the logic in the later IRQ-specific handler code is
actually run unlocked. This is why there is a window for the irq_desc to get
into the irq_may_run() function concurrently with the handler being run on a
different CPU.
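
For reference, this is roughly what handle_irq_event() does (paraphrasing
kernel/irq/handle.c):

irqreturn_t handle_irq_event(struct irq_desc *desc)
{
	irqreturn_t ret;

	desc->istate &= ~IRQS_PENDING;
	irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
	raw_spin_unlock(&desc->lock);		/* lock dropped here...             */

	ret = handle_irq_event_percpu(desc);	/* ...so the handlers run unlocked  */

	raw_spin_lock(&desc->lock);		/* re-acquired after the handlers   */
	irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
	return ret;
}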
> 
> >       desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> > 
> > @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> >       if (desc->istate & IRQS_ONESHOT)
> >               mask_irq(desc);
> > 
> > -     handle_irq_event(desc);
> > +     do {
> > +             handle_irq_event(desc);
> > +     } while (unlikely((desc->istate & IRQS_PENDING) &&
> > +                     !irqd_irq_disabled(&desc->irq_data)));
> > 
> >       cond_unmask_eoi_irq(desc, chip);
> > 
> 
> Why do we need to inflict any of this on all other users of this flow?
> The lack of active state on LPIs is the issue, as it allows a new
> interrupt to be presented to another CPU as soon as the previous one
> has been ACK'ed.

Do you mean an active state in hardware? Even if we had an active state, that
state would be CPU-local, right? The issue here is that when the CPU affinity
changes while the handler is running, the new CPU will early out because the
flags in irq_desc indicate it's already running elsewhere. A CPU-local hardware
active state would not help here; it would only help if the active/pending
states were global.

As for inflicting this on other users, this is pretty light in general. It's
basically just looking at that flag again; in general it will be unset and we'll
exit the loop after one invoke.

> 
> This also has the effect of keeping the handling of the interrupt on
> the original CPU, negating the effects of the change of affinity.

Yes. This bothered me too initially, but on reflection I'm not sure it's
actually a problem. One possible issue that came to mind was around CPU
offlining, but in the event that a CPU being offlined was running interrupt
handlers it wouldn't be able to complete the offline anyway until the handlers
were finished, so I don't think this is an issue. Do you see any practical issue
with running the handler once more on the original CPU immediately after the
affinity has been changed?

> It feels that we should instead resend the interrupt, either by making
> it pending again by generating an INT command to the ITS, or by using
> the SW resend mechanism (which the lack of deactivation requirement
> makes possible).

I'm open to other suggestions here, just not sure how this re-sending would work
in practice. Do you mean that when the original CPU observes that the pending
flag is set, instead of running the handler again locally the original CPU would
instruct the ITS to mark the interrupt as pending again, hence re-triggering it
on the new CPU? That could work, but may be tricky to shoe-horn into the generic
code, unless we use the EOI callback for this?

Fundamentally I don't think that the solution here should be GIC specific. We
hit this issue on the GIC, but potentially any irq chip could cause this lost
interrupt issue to manifest? That's one of the reasons I preferred a tweak to
the generic code to fix this for everyone.

JG

[0] https://developer.arm.com/documentation/102923/0100/Redistributors/Initial-configuration-of-a-Redistributor?lang=en

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-11 10:27   ` Gowans, James
@ 2023-04-12 13:32     ` Marc Zyngier
  2023-04-18  2:39       ` Yipeng Zou
                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Marc Zyngier @ 2023-04-12 13:32 UTC (permalink / raw)
  To: Gowans, James
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, zouyipeng,
	linux-kernel, Sironi, Filippo, chris.zjh

On Tue, 11 Apr 2023 11:27:26 +0100,
"Gowans, James" <jgowans@amazon.com> wrote:
> 
> Hi Marc, thanks for taking the time to put thought into this!
> 
> On Sun, 2023-04-09 at 12:41 +0100, Marc Zyngier wrote:
> > > Generally it should not be possible for the next interrupt to arrive
> > > while the previous handler is still running: the next interrupt should
> > > only arrive after the EOI message has been sent and the previous handler
> > > has returned.
> > 
> > There isn't necessarily an EOI message. On bare metal, EOI of a LPI
> > amounts to sweet nothing (see the pseudocode to convince yourself),
> > because the physical LPI doesn't have an active state.
> 
> Okay, I saw the gic_eoi_irq() function and thought there might be something
> going on there. I guess it's not relevant to this issue.

It is relevant in a way. If LPIs had an active state, then it wouldn't
be able to creep back up behind your back until the first CPU had
issued a deactivation (triggered by EOI when EOImode=0). But the
architecture was broken since day one, because some architect thought
they knew everything there was to know about the subject...

> 
> > > 
> > > The issue is
> > > that the global ITS is responsible for affinity but does not know
> > > whether interrupts are pending/running, only the CPU-local redistributor
> > > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > > CPU's redistributor does not know that the original CPU is still running
> > > the handler.
> > 
> > You seem to be misunderstanding the architecture.
> > 
> > The local redistributor doesn't know anything either. A redistributor
> > doesn't contain any state about what is currently serviced. At best,
> > it knows of the pending state of interrupts, and that about it. This
> > is even more true in a VM. Only the CPU knows about this, and there is
> > no EOI that ever gets propagated to the redistributor.
> 
> Right. This misunderstanding basically stems from my confusion above where I
> thought that the EOI would move it back into "pending." That obviously makes no
> sense because if another IRQ arrives while the handler is already running, then
> another handler run must be queued, hence it must be something really early in
> the flow which ACKs the IRQ so that it gets moved back to "inactive."

The ACK is the read of ICC_IAR_EL1 in the GICv3 driver, which indeed
sends the interrupt back to the inactive state, despite the
corresponding priority being active on the CPU. With this setup, EOI
only performs the priority drop.

> > > 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> > > but it's not entirely clear why handle_edge_irq does this anyway; it's
> > > an edge IRQ so not sure why it needs to be masked.
> > 
> > Please measure that cost and weep, specially in the context of
> > multiple concurrent interrupts serviced by a single ITS (cost of
> > locking the command queue, of waiting for a full round trip to the ITS
> > for a couple of commands...).
> 
> Fortunately this mask/unmasking would only happen in the rare(ish) cased of the
> race condition described here being hit. Exactly the same as
> with handle_edge_irq(), the masking and later unmasking would only be done
> when irq_may_run() == false due to the race being hit. Considering that this is
> a rare occurrence, I think we could stomach the occasional overhead? I was more
> asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
> it's necessary anyway, hence it wasn't implemented in my patch.

But does it solve anything? At the point where you mask the interrupt,
you have already consumed it. You'd still need to make it pending
somehow, which is what your patch does anyway.

>
> > > index 49e7bc871fec..4e5fc2b7e8a9 100644
> > > --- a/kernel/irq/chip.c
> > > +++ b/kernel/irq/chip.c
> > > @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> > > 
> > >       raw_spin_lock(&desc->lock);
> > > 
> > > -     if (!irq_may_run(desc))
> > > +     if (!irq_may_run(desc)) {
> > > +             desc->istate |= IRQS_PENDING;
> > >               goto out;
> > > +     }
> > 
> > What is currently unclear to me is how we get there on another CPU if
> > we're still handling the interrupt, which happens in the critical
> > section delineated by desc->lock. So there is some additional state
> > change here which you don't seem to be describing.
> 
> The lock is actually released and later re-acquired in the handle_irq_event()
> function which is called below. Only this flag wrangling code is done with the
> irq_desc lock held; all of the logic in the later IRQ-specific handler code is
> actually run unlocked. This is why there is a window for the irq_desc to get
> into the irq_may_run() function concurrently with the handler being run on a
> difference CPU.

Huh, indeed! Interrupts are still disabled (TFFT!), but this makes
firing on another CPU entirely possible. Thanks for the eye-opening
comment.

> > 
> > >       desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> > > 
> > > @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> > >       if (desc->istate & IRQS_ONESHOT)
> > >               mask_irq(desc);
> > > 
> > > -     handle_irq_event(desc);
> > > +     do {
> > > +             handle_irq_event(desc);
> > > +     } while (unlikely((desc->istate & IRQS_PENDING) &&
> > > +                     !irqd_irq_disabled(&desc->irq_data)));
> > > 
> > >       cond_unmask_eoi_irq(desc, chip);
> > > 
> > 
> > Why do we need to inflict any of this on all other users of this flow?
> > The lack of active state on LPIs is the issue, as it allows a new
> > interrupt to be presented to another CPU as soon as the previous one
> > has been ACK'ed.
> 
> Do you mean an active state in hardware? Even if we had an active state, that
> state would be CPU-local, right?

No, and that's the whole point. An active state is global. See for
example how SPIs, which are as global as it gets, have their active
state honoured *system wide*.

> The issue here is that when the CPU affinity
> changes while the handler is running, the new CPU will early out because the
> flags in irq_desc indicate it's already running elsewhere. A CPU-local hardware
> active state would not help here; only if the active/pending states were global,
> then it may help.

See above.

> 
> As for inflicting this on other users, this is pretty light in general. It's
> basically just looking at that flag again; in general it will be unset and we'll
> exit the loop after one invoke.
> 
> > 
> > This also has the effect of keeping the handling of the interrupt on
> > the original CPU, negating the effects of the change of affinity.
> 
> Yes. This bothered me too initially, but on reflection I'm not sure it's
> actually a problem. One possible issue that came to mind was around CPU
> offlining, but in the event that a CPU being offlined was running interrupt
> handlers it wouldn't be able to complete the offline anyway until the handlers
> were finished, so I don't think this is an issue. Do you see any practical issue
> with running the handler once more on the original CPU immediately after the
> affinity has been changed?

My take on this is that we put the pressure on the CPU we want to move
away from. I'd rather we put it on the GIC itself, and use its
Turing-complete powers to force it to redeliver the interrupt at a
more convenient time.

> 
> > It feels that we should instead resend the interrupt, either by making
> > it pending again by generating an INT command to the ITS, or by using
> > the SW resend mechanism (which the lack of deactivation requirement
> > makes possible).
> 
> I'm open to other suggestions here, just not sure how this re-sending would work
> in practice. Do you mean that when the original CPU observes that the pending
> flag is set, instead of running the handler again locally the original CPU would
> instruct the ITS to mark the interrupt as pending again, hence re-triggering it
> on the new CPU? That could work, but may be tricky to shoe-horn into the generic
> code, unless we use the EOI callback for this?

I think we have most of what we need already, see the hack below. It
is totally untested, but you'll hopefully get the idea.

Thanks,

	M.

From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Mon, 10 Apr 2023 10:56:32 +0100
Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while already
 in-progress

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its.c |  3 +++
 include/linux/irq.h              | 13 +++++++++++++
 kernel/irq/chip.c                |  5 ++++-
 kernel/irq/debugfs.c             |  2 ++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 586271b8aa39..d04c73a2bc6b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3574,6 +3574,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 		irqd = irq_get_irq_data(virq + i);
 		irqd_set_single_target(irqd);
 		irqd_set_affinity_on_activate(irqd);
+		irqd_set_resend_when_in_progress(irqd);
 		pr_debug("ID:%d pID:%d vID:%d\n",
 			 (int)(hwirq + i - its_dev->event_map.lpi_base),
 			 (int)(hwirq + i), virq + i);
@@ -4512,6 +4513,8 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
 		irq_domain_set_hwirq_and_chip(domain, virq + i, i,
 					      irqchip, vm->vpes[i]);
 		set_bit(i, bitmap);
+
+		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
 	}
 
 	if (err) {
diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..4b2a7cc96eb2 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -223,6 +223,8 @@ struct irq_data {
  *				  irq_chip::irq_set_affinity() when deactivated.
  * IRQD_IRQ_ENABLED_ON_SUSPEND	- Interrupt is enabled on suspend by irq pm if
  *				  irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
+ * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in progress,
+ *				  needs resending.
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -249,6 +251,7 @@ enum {
 	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
 	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
 	IRQD_IRQ_ENABLED_ON_SUSPEND	= (1 << 30),
+	IRQD_RESEND_WHEN_IN_PROGRESS	= (1 << 31),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
 }
 
+static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
+static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..73546ba8bc43 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 
 	raw_spin_lock(&desc->lock);
 
-	if (!irq_may_run(desc))
+	if (!irq_may_run(desc)) {
+		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+			check_irq_resend(desc, true);
 		goto out;
+	}
 
 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
 
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index bbcaac64038e..5971a66be034 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
 	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
 
 	BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
+
+	BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
 };
 
 static const struct irq_bit_descr irqdesc_states[] = {
-- 
2.34.1


-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-12 13:32     ` Marc Zyngier
@ 2023-04-18  2:39       ` Yipeng Zou
  2023-04-18 10:56       ` Gowans, James
  2023-05-23  3:16       ` liaochang (A)
  2 siblings, 0 replies; 28+ messages in thread
From: Yipeng Zou @ 2023-04-18  2:39 UTC (permalink / raw)
  To: Marc Zyngier, Gowans, James
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, linux-kernel, Sironi,
	Filippo, chris.zjh


On 2023/4/12 21:32, Marc Zyngier wrote:
> I think we have most of what we need already, see the hack below. It
> is totally untested, but you'll hopefully get the idea.
>
> Thanks,
>
> 	M.
>
> From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Mon, 10 Apr 2023 10:56:32 +0100
> Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while already
>   in-progress
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   drivers/irqchip/irq-gic-v3-its.c |  3 +++
>   include/linux/irq.h              | 13 +++++++++++++
>   kernel/irq/chip.c                |  5 ++++-
>   kernel/irq/debugfs.c             |  2 ++
>   4 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 586271b8aa39..d04c73a2bc6b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3574,6 +3574,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>   		irqd = irq_get_irq_data(virq + i);
>   		irqd_set_single_target(irqd);
>   		irqd_set_affinity_on_activate(irqd);
> +		irqd_set_resend_when_in_progress(irqd);
>   		pr_debug("ID:%d pID:%d vID:%d\n",
>   			 (int)(hwirq + i - its_dev->event_map.lpi_base),
>   			 (int)(hwirq + i), virq + i);
> @@ -4512,6 +4513,8 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
>   		irq_domain_set_hwirq_and_chip(domain, virq + i, i,
>   					      irqchip, vm->vpes[i]);
>   		set_bit(i, bitmap);
> +
> +		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
>   	}
>   
>   	if (err) {
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index b1b28affb32a..4b2a7cc96eb2 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -223,6 +223,8 @@ struct irq_data {
>    *				  irq_chip::irq_set_affinity() when deactivated.
>    * IRQD_IRQ_ENABLED_ON_SUSPEND	- Interrupt is enabled on suspend by irq pm if
>    *				  irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
> + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in progress,
> + *				  needs resending.
>    */
>   enum {
>   	IRQD_TRIGGER_MASK		= 0xf,
> @@ -249,6 +251,7 @@ enum {
>   	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
>   	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
>   	IRQD_IRQ_ENABLED_ON_SUSPEND	= (1 << 30),
> +	IRQD_RESEND_WHEN_IN_PROGRESS	= (1 << 31),
>   };
>   
>   #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
> @@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
>   	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
>   }
>   
> +static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
> +{
> +	return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
> +}
> +
> +static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
> +{
> +	__irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
> +}
> +
>   #undef __irqd_to_state
>   
>   static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..73546ba8bc43 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>   
>   	raw_spin_lock(&desc->lock);
>   
> -	if (!irq_may_run(desc))
> +	if (!irq_may_run(desc)) {
> +		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
> +			check_irq_resend(desc, true);
>   		goto out;
> +	}
>   
>   	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>   
> diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
> index bbcaac64038e..5971a66be034 100644
> --- a/kernel/irq/debugfs.c
> +++ b/kernel/irq/debugfs.c
> @@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
>   	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
>   
>   	BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
> +
> +	BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
>   };
>   
>   static const struct irq_bit_descr irqdesc_states[] = {

Hi Marc:

     Thanks for your patch; I tested it on my board and it works for me.

     We force a resend of an IRQ when irq_may_run() returns false. Besides
IRQD_IRQ_INPROGRESS, are there other situations (such as IRQD_WAKEUP_ARMED)
that will also resend an IRQ?
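
(For context, irq_may_run() in kernel/irq/chip.c is roughly:)

static bool irq_may_run(struct irq_desc *desc)
{
	unsigned int mask = IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED;

	/* Not in progress and not an armed wakeup source: run normally. */
	if (!irqd_has_set(&desc->irq_data, mask))
		return true;

	/* Armed wakeup source: mark pending/suspended and notify the PM core. */
	if (irq_pm_check_wakeup(desc))
		return false;

	/* Otherwise deal with a potential concurrent poll on another CPU. */
	return irq_check_poll(desc);
}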

-- 
Regards,
Yipeng Zou


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-12 13:32     ` Marc Zyngier
  2023-04-18  2:39       ` Yipeng Zou
@ 2023-04-18 10:56       ` Gowans, James
  2023-04-19  3:08         ` Yipeng Zou
                           ` (2 more replies)
  2023-05-23  3:16       ` liaochang (A)
  2 siblings, 3 replies; 28+ messages in thread
From: Gowans, James @ 2023-04-18 10:56 UTC (permalink / raw)
  To: maz
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, zouyipeng,
	linux-kernel, Sironi, Filippo, chris.zjh

On Wed, 2023-04-12 at 14:32 +0100, Marc Zyngier wrote:
> 
> > > > 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> > > > but it's not entirely clear why handle_edge_irq does this anyway; it's
> > > > an edge IRQ so not sure why it needs to be masked.
> > > 
> > > Please measure that cost and weep, specially in the context of
> > > multiple concurrent interrupts serviced by a single ITS (cost of
> > > locking the command queue, of waiting for a full round trip to the ITS
> > > for a couple of commands...).
> > 
>> Fortunately this mask/unmasking would only happen in the rare(ish) case of the
> > race condition described here being hit. Exactly the same as
> > with handle_edge_irq(), the masking and later unmasking would only be done
> > when irq_may_run() == false due to the race being hit. Considering that this is
> > a rare occurrence, I think we could stomach the occasional overhead? I was more
> > asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
> > it's necessary anyway, hence it wasn't implemented in my patch.
> 
> But does it solve anything? At the point where you mask the interrupt,
> you already have consumed it. You'd still need to make it pending
> somehow, which is what your patch somehow does.

I don't really know - the reason I asked the question is that the related
handle_edge_irq() does this mask/unmasking, and I wasn't quite sure why it
did that and hence if we needed to do something similar.
Anyway, let's focus on your patch rather - I think it's more compelling.


> > Yes. This bothered me too initially, but on reflection I'm not sure it's
> > actually a problem. One possible issue that came to mind was around CPU
> > offlining, but in the event that a CPU being offlined was running interrupt
> > handlers it wouldn't be able to complete the offline anyway until the handlers
> > were finished, so I don't think this is an issue. Do you see any practical issue
> > with running the handler once more on the original CPU immediately after the
> > affinity has been changed?
> 
> My take on this is that we put the pressure on the CPU we want to move
> away from. I'd rather we put it on the GIC itself, and use its
> Turing-complete powers to force it to redeliver the interrupt at a
> more convenient time.

This idea and implementation looks and works great! It may need a few
tweaks; discussing below.

> 
> From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Mon, 10 Apr 2023 10:56:32 +0100
> Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while
> already
>  in-progress

Perhaps you can pillage some of my commit message to explain the race here
when you send this patch?
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index b1b28affb32a..4b2a7cc96eb2 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -223,6 +223,8 @@ struct irq_data {
>   *                               irq_chip::irq_set_affinity() when
> deactivated.
>   * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq
> pm if
>   *                               irqchip have flag
> IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
> + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in
> progress,
> + *                               needs resending.
>   */
>  enum {
>         IRQD_TRIGGER_MASK               = 0xf,
> @@ -249,6 +251,7 @@ enum {
>         IRQD_HANDLE_ENFORCE_IRQCTX      = (1 << 28),
>         IRQD_AFFINITY_ON_ACTIVATE       = (1 << 29),
>         IRQD_IRQ_ENABLED_ON_SUSPEND     = (1 << 30),
> +       IRQD_RESEND_WHEN_IN_PROGRESS    = (1 << 31),
>  };

Do we really want a new flag here? I'd be keen to fix this race for all
drivers, not just those who know to set this flag. I think the patch
you're suggesting is pretty close to being safe to enable generally? If so
my preference is for one less config option - just run it always.

>  static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..73546ba8bc43 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>         raw_spin_lock(&desc->lock);
> 
> -       if (!irq_may_run(desc))
> +       if (!irq_may_run(desc)) {
> +               if (irqd_needs_resend_when_in_progress(&desc->irq_data))
> +                       check_irq_resend(desc, true);
>                 goto out;
> +       }

This will run check_irq_resend() on the *newly affined* CPU, while the old
one is still running the original handler. AFAICT what will happen is:
check_irq_resend
  try_retrigger
    irq_chip_retrigger_hierarchy
      its_irq_retrigger
... which will cause the ITS to *immediately* re-trigger the IRQ. The
original CPU can still be running the handler in that case.

If that happens, consider what will happen in check_irq_resend:
- first IRQ comes in, successfully runs try_retrigger and sets IRQS_REPLAY.
- it is *immediately* retriggered by ITS, and because the original handler
on the other CPU is still running, comes into check_irq_resend again.
- check_irq_resend now observes that IRQS_REPLAY is set and early outs.
- No more resends, the IRQ is still lost. :-(
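
For reference, the relevant part of check_irq_resend() is roughly the
following (a trimmed, paraphrased sketch rather than the exact source; see
kernel/irq/resend.c):

	int check_irq_resend(struct irq_desc *desc, bool inject)
	{
		int err = 0;
		...
		/* a resend is already in flight: the second retrigger early-outs here */
		if (desc->istate & IRQS_REPLAY)
			return -EBUSY;

		if (!(desc->istate & IRQS_PENDING) && !inject)
			return 0;

		desc->istate &= ~IRQS_PENDING;

		/* hardware retrigger if the chip supports it, else the SW resend tasklet */
		if (!try_retrigger(desc))
			err = irq_sw_resend(desc);

		/* mark REPLAY; it is cleared when the flow handler actually runs */
		if (!err)
			desc->istate |= IRQS_REPLAY;
		return err;
	}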

Now I admit the failure mode is getting a bit pathological: two re-
triggers while the original handler is still running, but I was able to
hit this on my test machine by intentionally slowing
the handler down by a few dozen micros. Should we cater for this?

I can see two possibilities:
- tweak check_irq_resend() to not early-out in this case but to keep re-
triggering until it eventually runs.
- move the check_irq_resend to only happen later, *after* the original
handler has finished running. This would be very similar to what I
suggested in my original patch, except instead of running a do/while loop,
the code would observe that the pending flag was set again and run
check_irq_resend (rough sketch below).
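
A rough sketch of that second option, reusing IRQS_PENDING the way
handle_edge_irq() does (the conditional check_irq_resend() call at the end is
the new part, everything else is the existing flow):

	/* handle_fasteoi_irq(), sketch of the second option */
	if (!irq_may_run(desc)) {
		/* note for the CPU that is still busy in the handler */
		desc->istate |= IRQS_PENDING;
		goto out;
	}

	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
	...
	handle_irq_event(desc);

	cond_unmask_eoi_irq(desc, chip);

	/* another CPU took the IRQ while we were running: ask the chip to redeliver */
	if (unlikely(desc->istate & IRQS_PENDING))
		check_irq_resend(desc, false);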

I'm also wondering what will happen for users who don't have the
chip->irq_retrigger callback set and fall back to the tasklet
via irq_sw_resend()... Looks like it will work fine. However if we do my
suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
the tasklet will be scheduled on the old CPU again, which may be sub-
optimal.

JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-18 10:56       ` Gowans, James
@ 2023-04-19  3:08         ` Yipeng Zou
  2023-05-02  8:43         ` Gowans, James
  2023-05-02 10:28         ` Marc Zyngier
  2 siblings, 0 replies; 28+ messages in thread
From: Yipeng Zou @ 2023-04-19  3:08 UTC (permalink / raw)
  To: Gowans, James, maz
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, linux-kernel, Sironi,
	Filippo, chris.zjh


On 2023/4/18 18:56, Gowans, James wrote:
> On Wed, 2023-04-12 at 14:32 +0100, Marc Zyngier wrote:
>>>>> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
>>>>> but it's not entirely clear why handle_edge_irq does this anyway; it's
>>>>> an edge IRQ so not sure why it needs to be masked.
>>>> Please measure that cost and weep, specially in the context of
>>>> multiple concurrent interrupts serviced by a single ITS (cost of
>>>> locking the command queue, of waiting for a full round trip to the ITS
>>>> for a couple of commands...).
>>> Fortunately this mask/unmasking would only happen in the rare(ish) case of the
>>> race condition described here being hit. Exactly the same as
>>> with handle_edge_irq(), the masking and later unmasking would only be done
>>> when irq_may_run() == false due to the race being hit. Considering that this is
>>> a rare occurrence, I think we could stomach the occasional overhead? I was more
>>> asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
>>> it's necessary anyway, hence it wasn't implemented in my patch.
>> But does it solve anything? At the point where you mask the interrupt,
>> you already have consumed it. You'd still need to make it pending
>> somehow, which is what your patch somehow does.
> I don't really know - the reason I asked the question is that the related
> handle_edge_irq() does this mask/unmasking, and I wasn't quite sure why it
> did that and hence if we needed to do something similar.
> Anyway, let's focus on your patch rather - I think it's more compelling.
>
>
>>> Yes. This bothered me too initially, but on reflection I'm not sure it's
>>> actually a problem. One possible issue that came to mind was around CPU
>>> offlining, but in the event that a CPU being offlined was running interrupt
>>> handlers it wouldn't be able to complete the offline anyway until the handlers
>>> were finished, so I don't think this is an issue. Do you see any practical issue
>>> with running the handler once more on the original CPU immediately after the
>>> affinity has been changed?
>> My take on this is that we put the pressure on the CPU we want to move
>> away from. I'd rather we put it on the GIC itself, and use its
>> Turing-complete powers to force it to redeliver the interrupt at a
>> more convenient time.
> This idea and implementation looks and works great! It may need a few
> tweaks; discussing below.
>
>> >From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
>> From: Marc Zyngier<maz@kernel.org>
>> Date: Mon, 10 Apr 2023 10:56:32 +0100
>> Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while
>> already
>>   in-progress
> Perhaps you can pillage some of my commit message to explain the race here
> when you send this patch?
>> Signed-off-by: Marc Zyngier<maz@kernel.org>
>>
>> diff --git a/include/linux/irq.h b/include/linux/irq.h
>> index b1b28affb32a..4b2a7cc96eb2 100644
>> --- a/include/linux/irq.h
>> +++ b/include/linux/irq.h
>> @@ -223,6 +223,8 @@ struct irq_data {
>>    *                               irq_chip::irq_set_affinity() when
>> deactivated.
>>    * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq
>> pm if
>>    *                               irqchip have flag
>> IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
>> + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in
>> progress,
>> + *                               needs resending.
>>    */
>>   enum {
>>          IRQD_TRIGGER_MASK               = 0xf,
>> @@ -249,6 +251,7 @@ enum {
>>          IRQD_HANDLE_ENFORCE_IRQCTX      = (1 << 28),
>>          IRQD_AFFINITY_ON_ACTIVATE       = (1 << 29),
>>          IRQD_IRQ_ENABLED_ON_SUSPEND     = (1 << 30),
>> +       IRQD_RESEND_WHEN_IN_PROGRESS    = (1 << 31),
>>   };
> Do we really want a new flag here? I'd be keen to fix this race for all
> drivers, not just those who know to set this flag. I think the patch
> you're suggesting is pretty close to being safe to enable generally? If so
> my preference is for one less config option - just run it always.
>
>>   static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 49e7bc871fec..73546ba8bc43 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>
>>          raw_spin_lock(&desc->lock);
>>
>> -       if (!irq_may_run(desc))
>> +       if (!irq_may_run(desc)) {
>> +               if (irqd_needs_resend_when_in_progress(&desc->irq_data))
>> +                       check_irq_resend(desc, true);
>>                  goto out;
>> +       }
> This will run check_irq_resend() on the *newly affined* CPU, while the old
> one is still running the original handler. AFAICT what will happen is:
> check_irq_resend
>    try_retrigger
>      irq_chip_retrigger_hierarchy
>        its_irq_retrigger
> ... which will cause the ITS to *immediately* re-trigger the IRQ. The
> original CPU can still be running the handler in that case.
>
> If that happens, consider what will happen in check_irq_resend:
> - first IRQ comes in, successfully runs try_retrigger and sets IRQS_REPLAY.
> - it is *immediately* retriggered by ITS, and because the original handler
> on the other CPU is still running, comes into check_irq_resend again.
> - check_irq_resend now observes that IRQS_REPLAY is set and early outs.
> - No more resends, the IRQ is still lost. :-(
>
> Now I admit the failure mode is getting a bit pathological: two re-
> triggers while the original handler is still running, but I was able to
> hit this on my test machine by intentionally slowing
> the handler down by a few dozen micros. Should we cater for this?
>
> I can see two possibilities:
> - tweak check_irq_resend() to not early-out in this case but to keep re-
> triggering until it eventually runs.
> - move the check_irq_resend to only happen later, *after* the original
> handler has finished running. This would be very similar to what I
> suggested in my original patch, except instead of running a do/while loop,
> the code would observe that the pending flag was set again and run
> check_irq_resend.
>
> I'm also wondering what will happen for users who don't have the
> chip->irq_retrigger callback set and fall back to the tasklet
> via irq_sw_resend()... Looks like it will work fine. However if we do my
> suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
> the tasklet will be scheduled on the old CPU again, which may be sub-
> optimal.
>
> JG

Hi James:

     This does have a problem; I didn't hit it on my test machine because my
interrupt handler finishes quickly, but this scenario really should be
considered.

     However, if check_irq_resend() is executed at the end of
handle_fasteoi_irq(), a flag is needed to identify that case, serving the
same purpose as the original IRQS_PENDING flag. In effect, the do/while loop
from our original patch is just replaced with check_irq_resend().

     As for the irq_sw_resend() path, which is taken when chip->irq_retrigger
is not implemented, I think there may be another problem here: interrupt
response latency. If irq_sw_resend() is used to trigger a new interrupt, the
delay of that interrupt's delivery is uncertain (tasklet scheduling, etc.),
which may matter for devices with low-latency requirements.

-- 
Regards,
Yipeng Zou


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-18 10:56       ` Gowans, James
  2023-04-19  3:08         ` Yipeng Zou
@ 2023-05-02  8:43         ` Gowans, James
  2023-05-23  3:16           ` liaochang (A)
       [not found]           ` <86sfcfghqh.wl-maz@kernel.org>
  2023-05-02 10:28         ` Marc Zyngier
  2 siblings, 2 replies; 28+ messages in thread
From: Gowans, James @ 2023-05-02  8:43 UTC (permalink / raw)
  To: maz, tglx
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel,
	Sironi, Filippo, chris.zjh

Hi Marc and Thomas,

On Tue, 2023-04-18 at 12:56 +0200, James Gowans wrote:
> >   static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index 49e7bc871fec..73546ba8bc43 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> >          raw_spin_lock(&desc->lock);
> > -       if (!irq_may_run(desc))
> > +       if (!irq_may_run(desc)) {
> > +               if (irqd_needs_resend_when_in_progress(&desc->irq_data))
> > +                       check_irq_resend(desc, true);
> >                  goto out;
> > +       }
> 
> 
> This will run check_irq_resend() on the *newly affined* CPU, while the old
> one is still running the original handler. AFAICT what will happen is:
> check_irq_resend
>   try_retrigger
>     irq_chip_retrigger_hierarchy
>       its_irq_retrigger
> ... which will cause the ITS to *immediately* re-trigger the IRQ. The
> original CPU can still be running the handler in that case.
> 
> If that happens, consider what will happen in check_irq_resend:
> - first IRQ comes in, successfully runs try_retrigger and sets IRQS_REPLAY.
> - it is *immediately* retriggered by ITS, and because the original handler
> on the other CPU is still running, comes into check_irq_resend again.
> - check_irq_resend now observes that IRQS_REPLAY is set and early outs.
> - No more resends, the IRQ is still lost. :-(
> 
> Now I admit the failure mode is getting a bit pathological: two re-
> triggers while the original handler is still running, but I was able to
> hit this on my test machine by intentionally slowing
> the handler down by a few dozen micros. Should we cater for this?
> 
> I can see two possibilities:
> - tweak check_irq_resend() to not early-out in this case but to keep re-
> triggering until it eventually runs.
> - move the check_irq_resend to only happen later, *after* the original
> handler has finished running. This would be very similar to what I
> suggested in my original patch, except instead of running a do/while loop,
> the code would observe that the pending flag was set again and run
> check_irq_resend.
> 
> I'm also wondering what will happen for users who don't have the
> chip->irq_retrigger callback set and fall back to the tasklet
> via irq_sw_resend()... Looks like it will work fine. However if we do my
> suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
> the tasklet will be scheduled on the old CPU again, which may be sub-
> optimal.

Just checking to see if you've had a chance to consider these
issues/thoughts, and if/how they should be handled?
I'm still tending towards saying that the check_irq_resend() should run
after handle_irq_event() and the IRQS_PENDING flag should be wrangled to
decide whether or not to resend.

I just don't know if having the tasklet scheduled and run on the original
CPU via irq_sw_resend() would be problematic or not. In general it
probably won't be, but in the CPU offlining case... maybe? I realise that for
GIC-v3 the tasklet won't be used because the GIC has the chip->irq_retrigger
callback defined - I'm just thinking in general here, especially if we
drop the new IRQD_RESEND_WHEN_IN_PROGRESS flag.

Thoughts?

I can put together a PoC and test it along with Yipeng from Huawei if you
think it sounds reasonable.

JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-18 10:56       ` Gowans, James
  2023-04-19  3:08         ` Yipeng Zou
  2023-05-02  8:43         ` Gowans, James
@ 2023-05-02 10:28         ` Marc Zyngier
  2 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2023-05-02 10:28 UTC (permalink / raw)
  To: Gowans, James
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, zouyipeng,
	linux-kernel, Sironi, Filippo, chris.zjh

Catching up...

On Tue, 18 Apr 2023 11:56:07 +0100,
"Gowans, James" <jgowans@amazon.com> wrote:
> 
> On Wed, 2023-04-12 at 14:32 +0100, Marc Zyngier wrote:
> > 
> > From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
> > From: Marc Zyngier <maz@kernel.org>
> > Date: Mon, 10 Apr 2023 10:56:32 +0100
> > Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while
> > already
> >  in-progress
> 
> Perhaps you can pillage some of my commit message to explain the race here
> when you send this patch?

Sure. At the moment, we're still far from a final patch though.

> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > 
> > diff --git a/include/linux/irq.h b/include/linux/irq.h
> > index b1b28affb32a..4b2a7cc96eb2 100644
> > --- a/include/linux/irq.h
> > +++ b/include/linux/irq.h
> > @@ -223,6 +223,8 @@ struct irq_data {
> >   *                               irq_chip::irq_set_affinity() when
> > deactivated.
> >   * IRQD_IRQ_ENABLED_ON_SUSPEND - Interrupt is enabled on suspend by irq
> > pm if
> >   *                               irqchip have flag
> > IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
> > + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in
> > progress,
> > + *                               needs resending.
> >   */
> >  enum {
> >         IRQD_TRIGGER_MASK               = 0xf,
> > @@ -249,6 +251,7 @@ enum {
> >         IRQD_HANDLE_ENFORCE_IRQCTX      = (1 << 28),
> >         IRQD_AFFINITY_ON_ACTIVATE       = (1 << 29),
> >         IRQD_IRQ_ENABLED_ON_SUSPEND     = (1 << 30),
> > +       IRQD_RESEND_WHEN_IN_PROGRESS    = (1 << 31),
> >  };
> 
> Do we really want a new flag here? I'd be keen to fix this race for all
> drivers, not just those who know to set this flag. I think the patch
> you're suggesting is pretty close to being safe to enable generally? If so
> my preference is for one less config option - just run it always.

I contend that this really is a GICv3 architectural bug. The lack of
an active state on LPIs leads to it, and as far as I can tell, no
other interrupt architecture has the same issue. So the onus should be
on the GIC, the GIC only, and only the parts of the GIC that require
it (SPIs, PPIs and SGIs are fine, either because they have an active
state, or because the lock isn't dropped when calling the handler).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-03-17  9:53 [PATCH] irq: fasteoi handler re-runs on concurrent invoke James Gowans
  2023-03-17 10:12 ` Yipeng Zou
  2023-04-09 11:41 ` Marc Zyngier
@ 2023-05-23  3:15 ` liaochang (A)
  2023-05-23 11:59   ` Gowans, James
  2 siblings, 1 reply; 28+ messages in thread
From: liaochang (A) @ 2023-05-23  3:15 UTC (permalink / raw)
  To: James Gowans, Thomas Gleixner
  Cc: linux-kernel, David Woodhouse, Marc Zyngier, KarimAllah Raslan,
	Yipeng Zou, zhangjianhua (E)

Hi, James

On 2023/3/17 17:53, James Gowans wrote:
> Update the generic handle_fasteoi_irq to cater for the case when the
> next interrupt comes in while the previous handler is still running.
> Currently when that happens the irq_may_run() early out causes the next
> IRQ to be lost. Change the behaviour to mark the interrupt as pending
> and re-run the handler when handle_fasteoi_irq sees that the pending
> flag has been set again. This is largely inspired by handle_edge_irq.
> 
> Generally it should not be possible for the next interrupt to arrive
> while the previous handler is still running: the next interrupt should
> only arrive after the EOI message has been sent and the previous handler
> has returned. However, there is a race where if the interrupt affinity
> is changed while the previous handler is running, then the next
> interrupt can arrive at a different CPU while the previous handler is
> still running. In that case there will be a concurrent invoke and the
> early out will be taken.
> 
> For example:
> 
>            CPU 0             |          CPU 1
> -----------------------------|-----------------------------
> interrupt start              |
>   handle_fasteoi_irq         | set_affinity(CPU 1)
>     handler                  |
>     ...                      | interrupt start
>     ...                      |   handle_fasteoi_irq -> early out
>   handle_fasteoi_irq return  | interrupt end
> interrupt end                |
> 
> This issue was observed specifically on an arm64 system with a GIC-v3
> handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> that the global ITS is responsible for affinity but does not know
> whether interrupts are pending/running, only the CPU-local redistributor
> handles the EOI. Hence when the affinity is changed in the ITS, the new
> CPU's redistributor does not know that the original CPU is still running
> the handler.
> 
> There are a few uncertainties about this implementation compared to the
> prior art in handle_edge_irq:
> 
> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> but it's not entirely clear why handle_edge_irq does this anyway; it's
> an edge IRQ so not sure why it needs to be masked.

In the GIC architecture, a CPU that is handling an IRQ cannot be signaled
with a new interrupt until it satisfies the interrupt preemption requirements
defined in the GIC architecture. One of these requirements is that the
priority of the new pending interrupt is higher than the priority of the
currently running interrupt. Obviously, the same interrupt source cannot
preempt itself. Additionally, interrupt priority is rarely enabled in
the Linux kernel, except for pseudo-NMI.

If the interrupt is an LPI, and interrupt handling is still in progress on
the original CPU, the lack of the ACTIVE state means that the LPI can be
distributed to another CPU once its affinity has been changed. On the other
hand, since the desc->istate is marked IRQS_PENDING and this interrupt has
been consumed on the new CPU, there is no need to mask it and then unmask it later.

> 
> 2. Should the EOI delivery be inside the do loop after every handler
> run? I think outside the loops is best; only inform the chip to deliver
> more IRQs once all pending runs have been finished.

The GIC architecture requires that writes to the EOI register be in the exact
reverse order of reads from the IAR register. Therefore, IAR and EOI must be paired.
Writing EOI inside the do loop after every handler may cause subtle problems.
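
For illustration, a minimal sketch of that pairing (gic_read_iar() and
gic_write_eoir() are the arm64 accessors; in the real fasteoi flow the EOI is
issued via chip->irq_eoi from cond_unmask_eoi_irq(), this only shows the
required nesting):

	u32 irqnr;

	irqnr = gic_read_iar();	/* ack: one IAR read per delivered interrupt */
	/*
	 * ... dispatch to handle_fasteoi_irq(); whether or not the action is
	 * invoked, exactly one EOI must eventually be written for this read ...
	 */
	gic_write_eoir(irqnr);	/* EOI paired with, and in reverse order of, the IAR reads */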

> 
> 3. Do we need to check that desc->action is still set? I don't know if
> it can be un-set while the IRQ is still enabled.

This check is necessary here. When the code enters this critical section, the kernel
running on another CPU might have already unregistered the IRQ action via the free_irq API.
Although free_irq is typically used in code paths called on module unload or exception
handling, we have also observed that virtualization using VFIO as a PCI backend
frequently tends to use free_irq in some regular code paths.

Thanks.

> 
> 4. There is now more overlap with the handle_edge_eoi_irq handler;
> should we try to unify these?
> 
> Signed-off-by: James Gowans <jgowans@amazon.com>
> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: KarimAllah Raslan <karahmed@amazon.com>
> ---
>  Documentation/core-api/genericirq.rst | 9 ++++++++-
>  kernel/irq/chip.c                     | 9 +++++++--
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/core-api/genericirq.rst b/Documentation/core-api/genericirq.rst
> index f959c9b53f61..b54485eca8b5 100644
> --- a/Documentation/core-api/genericirq.rst
> +++ b/Documentation/core-api/genericirq.rst
> @@ -240,7 +240,14 @@ which only need an EOI at the end of the handler.
>  
>  The following control flow is implemented (simplified excerpt)::
>  
> -    handle_irq_event(desc->action);
> +    if (desc->status & running) {
> +        desc->status |= pending;
> +        return;
> +    }
> +    do {
> +        desc->status &= ~pending;
> +        handle_irq_event(desc->action);
> +    } while (status & pending);
>      desc->irq_data.chip->irq_eoi();
>  
>  
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..4e5fc2b7e8a9 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>  
>  	raw_spin_lock(&desc->lock);
>  
> -	if (!irq_may_run(desc))
> +	if (!irq_may_run(desc)) {
> +		desc->istate |= IRQS_PENDING;
>  		goto out;
> +	}
>  
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>  
> @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>  	if (desc->istate & IRQS_ONESHOT)
>  		mask_irq(desc);
>  
> -	handle_irq_event(desc);
> +	do {
> +		handle_irq_event(desc);
> +	} while (unlikely((desc->istate & IRQS_PENDING) &&
> +			!irqd_irq_disabled(&desc->irq_data)));
>  
>  	cond_unmask_eoi_irq(desc, chip);
>  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-04-12 13:32     ` Marc Zyngier
  2023-04-18  2:39       ` Yipeng Zou
  2023-04-18 10:56       ` Gowans, James
@ 2023-05-23  3:16       ` liaochang (A)
  2 siblings, 0 replies; 28+ messages in thread
From: liaochang (A) @ 2023-05-23  3:16 UTC (permalink / raw)
  To: Marc Zyngier, Gowans, James
  Cc: tglx, Raslan, KarimAllah, Woodhouse, David, zouyipeng,
	linux-kernel, Sironi, Filippo, chris.zjh



On 2023/4/12 21:32, Marc Zyngier wrote:
> On Tue, 11 Apr 2023 11:27:26 +0100,
> "Gowans, James" <jgowans@amazon.com> wrote:
>>
>> Hi Marc, thanks for taking the time to put thought into this!
>>
>> On Sun, 2023-04-09 at 12:41 +0100, Marc Zyngier wrote:
>>>> Generally it should not be possible for the next interrupt to arrive
>>>> while the previous handler is still running: the next interrupt should
>>>> only arrive after the EOI message has been sent and the previous handler
>>>> has returned.
>>>
>>> There isn't necessarily an EOI message. On bare metal, EOI of a LPI
>>> amounts to sweet nothing (see the pseudocode to convince yourself),
>>> because the physical LPI doesn't have an active state.
>>
>> Okay, I saw the gic_eoi_irq() function and thought there might be something
>> going on there. I guess it's not relevant to this issue.
> 
> It is relevant in a way. If LPIs had an active state, then it wouldn't
> be able to creep back up behind your back until the first CPU had
> issued a deactivation (triggered by EOI when EOImode=0). But the
> architecture was broken since day one, because some architect thought
> they knew everything there was to know about the subject...
> 
>>
>>>>
>>>> The issue is
>>>> that the global ITS is responsible for affinity but does not know
>>>> whether interrupts are pending/running, only the CPU-local redistributor
>>>> handles the EOI. Hence when the affinity is changed in the ITS, the new
>>>> CPU's redistributor does not know that the original CPU is still running
>>>> the handler.
>>>
>>> You seem to be misunderstanding the architecture.
>>>
>>> The local redistributor doesn't know anything either. A redistributor
>>> doesn't contain any state about what is currently serviced. At best,
>>> it knows of the pending state of interrupts, and that about it. This
>>> is even more true in a VM. Only the CPU knows about this, and there is
>>> no EOI that ever gets propagated to the redistributor.
>>
>> Right. This misunderstanding basically stems from my confusion above where I
>> thought that the EOI would move it back into "pending." That obviously makes no
>> sense because if another IRQ arrives while the handler is already running, then
>> another handler run must be queued, hence it must be something really early in
>> the flow which ACKs the IRQ so that it gets moved back to "inactive."
> 
> The ACK is the read of ICC_IAR_EL1 in the GICv3 driver, which indeed
> sends the interrupt back to the inactive state, despite the
> corresponding priority being active on the CPU. With this setup, EOI
> only performs the priority drop.
> 
>>>> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
>>>> but it's not entirely clear why handle_edge_irq does this anyway; it's
>>>> an edge IRQ so not sure why it needs to be masked.
>>>
>>> Please measure that cost and weep, specially in the context of
>>> multiple concurrent interrupts serviced by a single ITS (cost of
>>> locking the command queue, of waiting for a full round trip to the ITS
>>> for a couple of commands...).
>>
>> Fortunately this mask/unmasking would only happen in the rare(ish) case of the
>> race condition described here being hit. Exactly the same as
>> with handle_edge_irq(), the masking and later unmasking would only be done
>> when irq_may_run() == false due to the race being hit. Considering that this is
>> a rare occurrence, I think we could stomach the occasional overhead? I was more
>> asking if it's actually *necessary* to do this masking/unmasking. I'm not sure
>> it's necessary anyway, hence it wasn't implemented in my patch.
> 
> But does it solve anything? At the point where you mask the interrupt,
> you already have consumed it. You'd still need to make it pending
> somehow, which is what your patch somehow does.
> 
>>
>>>> index 49e7bc871fec..4e5fc2b7e8a9 100644
>>>> --- a/kernel/irq/chip.c
>>>> +++ b/kernel/irq/chip.c
>>>> @@ -692,8 +692,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>>>
>>>>       raw_spin_lock(&desc->lock);
>>>>
>>>> -     if (!irq_may_run(desc))
>>>> +     if (!irq_may_run(desc)) {
>>>> +             desc->istate |= IRQS_PENDING;
>>>>               goto out;
>>>> +     }
>>>
>>> What is currently unclear to me is how we get there on another CPU if
>>> we're still handling the interrupt, which happens in the critical
>>> section delineated by desc->lock. So there is some additional state
>>> change here which you don't seem to be describing.
>>
>> The lock is actually released and later re-acquired in the handle_irq_event()
>> function which is called below. Only this flag wrangling code is done with the
>> irq_desc lock held; all of the logic in the later IRQ-specific handler code is
>> actually run unlocked. This is why there is a window for the irq_desc to get
>> into the irq_may_run() function concurrently with the handler being run on a
>> different CPU.
> 
> Huh, indeed! Interrupts are still disabled (TFFT!), but this makes
> firing on another CPU entirely possible. Thanks for the eye-opening
> comment.
> 
>>>
>>>>       desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>>>>
>>>> @@ -711,7 +713,10 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>>>       if (desc->istate & IRQS_ONESHOT)
>>>>               mask_irq(desc);
>>>>
>>>> -     handle_irq_event(desc);
>>>> +     do {
>>>> +             handle_irq_event(desc);
>>>> +     } while (unlikely((desc->istate & IRQS_PENDING) &&
>>>> +                     !irqd_irq_disabled(&desc->irq_data)));
>>>>
>>>>       cond_unmask_eoi_irq(desc, chip);
>>>>
>>>
>>> Why do we need to inflict any of this on all other users of this flow?
>>> The lack of active state on LPIs is the issue, as it allows a new
>>> interrupt to be presented to another CPU as soon as the previous one
>>> has been ACK'ed.
>>
>> Do you mean an active state in hardware? Even if we had an active state, that
>> state would be CPU-local, right?
> 
> No, and that's the whole point. An active state is global. See for
> example how SPIs, which are as global as it gets, have their active
> state honoured *system wide*.
> 
>> The issue here is that when the CPU affinity
>> changes while the handler is running, the new CPU will early out because the
>> flags in irq_desc indicate it's already running elsewhere. A CPU-local hardware
>> active state would not help here; only if the active/pending states were global,
>> then it may help.
> 
> See above.
> 
>>
>> As for inflicting this on other users, this is pretty light in general. It's
>> basically just looking at that flag again; in general it will be unset and we'll
>> exit the loop after one invoke.
>>
>>>
>>> This also has the effect of keeping the handling of the interrupt on
>>> the original CPU, negating the effects of the change of affinity.
>>
>> Yes. This bothered me too initially, but on reflection I'm not sure it's
>> actually a problem. One possible issue that came to mind was around CPU
>> offlining, but in the event that a CPU being offlined was running interrupt
>> handlers it wouldn't be able to complete the offline anyway until the handlers
>> were finished, so I don't think this is an issue. Do you see any practical issue
>> with running the handler once more on the original CPU immediately after the
>> affinity has been changed?
> 
> My take on this is that we put the pressure on the CPU we want to move
> away from. I'd rather we put it on the GIC itself, and use its
> Turing-complete powers to force it to redeliver the interrupt at a
> more convenient time.
> 
>>
>>> It feels that we should instead resend the interrupt, either by making
>>> it pending again by generating an INT command to the ITS, or by using
>>> the SW resend mechanism (which the lack of deactivation requirement
>>> makes possible).
>>
>> I'm open to other suggestions here, just not sure how this re-sending would work
>> in practice. Do you mean that when the original CPU observes that the pending
>> flag is set, instead of running the handler again locally the original CPU would
>> instruct the ITS to mark the interrupt as pending again, hence re-triggering it
>> on the new CPU? That could work, but may be tricky to shoe-horn into the generic
>> code, unless we use the EOI callback for this?
> 
> I think we have most of what we need already, see the hack below. It
> is totally untested, but you'll hopefully get the idea.
> 
> Thanks,
> 
> 	M.
> 
>>From c96d2ab37fe273724f1264fba5f4913259875d56 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Mon, 10 Apr 2023 10:56:32 +0100
> Subject: [PATCH] irqchip/gicv3-its: Force resend of LPIs taken while already
>  in-progress
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/irqchip/irq-gic-v3-its.c |  3 +++
>  include/linux/irq.h              | 13 +++++++++++++
>  kernel/irq/chip.c                |  5 ++++-
>  kernel/irq/debugfs.c             |  2 ++
>  4 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 586271b8aa39..d04c73a2bc6b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3574,6 +3574,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>  		irqd = irq_get_irq_data(virq + i);
>  		irqd_set_single_target(irqd);
>  		irqd_set_affinity_on_activate(irqd);
> +		irqd_set_resend_when_in_progress(irqd);
>  		pr_debug("ID:%d pID:%d vID:%d\n",
>  			 (int)(hwirq + i - its_dev->event_map.lpi_base),
>  			 (int)(hwirq + i), virq + i);
> @@ -4512,6 +4513,8 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
>  		irq_domain_set_hwirq_and_chip(domain, virq + i, i,
>  					      irqchip, vm->vpes[i]);
>  		set_bit(i, bitmap);
> +
> +		irqd_set_resend_when_in_progress(irq_get_irq_data(virq + i));
>  	}
>  
>  	if (err) {
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index b1b28affb32a..4b2a7cc96eb2 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -223,6 +223,8 @@ struct irq_data {
>   *				  irq_chip::irq_set_affinity() when deactivated.
>   * IRQD_IRQ_ENABLED_ON_SUSPEND	- Interrupt is enabled on suspend by irq pm if
>   *				  irqchip have flag IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND set.
> + * IRQD_RESEND_WHEN_IN_PROGRESS - Interrupt may fire when already in progress,
> + *				  needs resending.
>   */
>  enum {
>  	IRQD_TRIGGER_MASK		= 0xf,
> @@ -249,6 +251,7 @@ enum {
>  	IRQD_HANDLE_ENFORCE_IRQCTX	= (1 << 28),
>  	IRQD_AFFINITY_ON_ACTIVATE	= (1 << 29),
>  	IRQD_IRQ_ENABLED_ON_SUSPEND	= (1 << 30),
> +	IRQD_RESEND_WHEN_IN_PROGRESS	= (1 << 31),
>  };
>  
>  #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
> @@ -448,6 +451,16 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
>  	return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
>  }
>  
> +static inline bool irqd_needs_resend_when_in_progress(struct irq_data *d)
> +{
> +	return __irqd_to_state(d) & IRQD_RESEND_WHEN_IN_PROGRESS;
> +}
> +
> +static inline void irqd_set_resend_when_in_progress(struct irq_data *d)
> +{
> +	__irqd_to_state(d) |= IRQD_RESEND_WHEN_IN_PROGRESS;
> +}
> +
>  #undef __irqd_to_state
>  
>  static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..73546ba8bc43 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>  
>  	raw_spin_lock(&desc->lock);
>  
> -	if (!irq_may_run(desc))
> +	if (!irq_may_run(desc)) {
> +		if (irqd_needs_resend_when_in_progress(&desc->irq_data))
> +			check_irq_resend(desc, true);

Marc,

I like your idea of using interrupt redelivery to enhance the fasteoi handler.
However, I have a few concerns about the way you are proposing to implement it.

First, the cost issue you mentioned is still a concern in the context of multiple
concurrent interrupts serviced by a single ITS. No matter what kind of
operation is performed (mask, unmask or resend), it ends up injecting an ITS
command with the ITS lock held. This could lead to performance issues.

Second, I am concerned that repeatedly resending the interrupt could be harmful to
the current task running on the new CPU. When an interrupt is delivered, the task
will be interrupted and forced to save and restore its context, and the CPU must
switch exception levels. This can be a costly operation, especially on
virtualization systems.

Thanks.

>  		goto out;
> +	}
>  
>  	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>  
> diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
> index bbcaac64038e..5971a66be034 100644
> --- a/kernel/irq/debugfs.c
> +++ b/kernel/irq/debugfs.c
> @@ -133,6 +133,8 @@ static const struct irq_bit_descr irqdata_states[] = {
>  	BIT_MASK_DESCR(IRQD_HANDLE_ENFORCE_IRQCTX),
>  
>  	BIT_MASK_DESCR(IRQD_IRQ_ENABLED_ON_SUSPEND),
> +
> +	BIT_MASK_DESCR(IRQD_RESEND_WHEN_IN_PROGRESS),
>  };
>  
>  static const struct irq_bit_descr irqdesc_states[] = {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-02  8:43         ` Gowans, James
@ 2023-05-23  3:16           ` liaochang (A)
  2023-05-25 10:04             ` Gowans, James
       [not found]           ` <86sfcfghqh.wl-maz@kernel.org>
  1 sibling, 1 reply; 28+ messages in thread
From: liaochang (A) @ 2023-05-23  3:16 UTC (permalink / raw)
  To: Gowans, James, maz, tglx
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel,
	Sironi, Filippo, chris.zjh



On 2023/5/2 16:43, Gowans, James wrote:
> Hi Marc and Thomas,
> 
> On Tue, 2023-04-18 at 12:56 +0200, James Gowans wrote:
>>>   static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
>>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>> index 49e7bc871fec..73546ba8bc43 100644
>>> --- a/kernel/irq/chip.c
>>> +++ b/kernel/irq/chip.c
>>> @@ -692,8 +692,11 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>>          raw_spin_lock(&desc->lock);
>>> -       if (!irq_may_run(desc))
>>> +       if (!irq_may_run(desc)) {
>>> +               if (irqd_needs_resend_when_in_progress(&desc->irq_data))
>>> +                       check_irq_resend(desc, true);
>>>                  goto out;
>>> +       }
>>
>>
>> This will run check_irq_resend() on the *newly affined* CPU, while the old
>> one is still running the original handler. AFAICT what will happen is:
>> check_irq_resend
>>   try_retrigger
>>     irq_chip_retrigger_hierarchy
>>       its_irq_retrigger
>> ... which will cause the ITS to *immediately* re-trigger the IRQ. The
>> original CPU can still be running the handler in that case.
>>
>> If that happens, consider what will happen in check_irq_resend:
>> - first IRQ comes in, successfully runs try_retrigger and sets IRQS_REPLAY.
>> - it is *immediately* retriggered by ITS, and because the original handler
>> on the other CPU is still running, comes into check_irq_resend again.
>> - check_irq_resend now observes that IRQS_REPLAY is set and early outs.
>> - No more resends, the IRQ is still lost. :-(
>>
>> Now I admit the failure mode is getting a bit pathological: two re-
>> triggers while the original handler is still running, but I was able to
>> hit this on my test machine by intentionally slowing
>> the handler down by a few dozen micros. Should we cater for this?
>>
>> I can see two possibilities:
>> - tweak check_irq_resend() to not early-out in this case but to keep re-
>> triggering until it eventually runs.
>> - move the check_irq_resend to only happen later, *after* the original
>> handler has finished running. This would be very similar to what I
>> suggested in my original patch, except instead of running a do/while loop,
>> the code would observe that the pending flag was set again and run
>> check_irq_resend.

Hi, James and Marc,

After studying your discussions, I list some requirements that need to be satisfied for
the final practical solution:

1. Use the GIC to maintain the unhandled LPI.
2. Do not change the semantics of set_irq_affinity, which means that the interrupt
   action must be performed on the new CPU when the next interrupt occurs after a
   successful set_irq_affinity operation.
3. Minimize the cost, especially to other tasks running on the CPUs, which means avoiding
   a do/while loop on the original CPU and repeated resends of the interrupt on the new CPU.

Based on these requirements and Linux v6.3 rev, I propose the following hack:

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 49e7bc871fec..1b49518b19bd 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -692,8 +692,14 @@ void handle_fasteoi_irq(struct irq_desc *desc)

 	raw_spin_lock(&desc->lock);

-	if (!irq_may_run(desc))
+	/*
+	 * Ack another interrupt from the same source can occurs on new
+	 * CPU even before the first one is handled on original CPU.
+	 */
+	if (!irq_may_run(desc)) {
+		desc->istate |= IRQS_PENDING;
 		goto out;
+	}

 	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);

@@ -715,6 +721,8 @@ void handle_fasteoi_irq(struct irq_desc *desc)

 	cond_unmask_eoi_irq(desc, chip);

+	check_irq_resend(desc, true);
+
 	raw_spin_unlock(&desc->lock);
 	return;
 out:

Looking forward to your feedback, thanks.

>>
>> I'm also wondering what will happen for users who don't have the
>> chip->irq_retrigger callback set and fall back to the tasklet
>> via irq_sw_resend()... Looks like it will work fine. However if we do my
>> suggestion and move check_irq_resend to the end of handle_fasteoi_irq then
>> the tasklet will be scheduled on the old CPU again, which may be sub-
>> optimal.
> 
> Just checking to see if you've had a chance to consider these
> issues/thoughts, and if/how they should be handled?
> I'm still tending towards saying that the check_irq_resend() should run
> after handle_irq_event() and the IRQS_PENDING flag should be wrangled to
> decide whether or not to resend.
> 
> I just don't know if having the tasklet scheduled and run on the original
> CPU via irq_sw_resend() would be problematic or not. In general it
> probably won't but in the CPU offlining case.... maybe? I realise that for
> GIC-v3 the tasklet won't be used because GIC has chip->irq_retrigger
> callback defined - I'm just thinking in general here, especially so
> assuming we drop the new IRQD_RESEND_WHEN_IN_PROGRESS flag).
> 
> Thoughts?
> 
> I can put together a PoC and test it along with Yipeng from Huawei if you
> think it sounds reasonable.
> 
> JG

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-23  3:15 ` liaochang (A)
@ 2023-05-23 11:59   ` Gowans, James
  2023-05-25 12:31     ` Liao, Chang
  0 siblings, 1 reply; 28+ messages in thread
From: Gowans, James @ 2023-05-23 11:59 UTC (permalink / raw)
  To: tglx, maz, liaochang1
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel, chris.zjh

On Tue, 2023-05-23 at 11:15 +0800, liaochang (A) wrote:
> > 1. Do we need to mask the IRQ and then unmask it later? I don't think so
> > but it's not entirely clear why handle_edge_irq does this anyway; it's
> > an edge IRQ so not sure why it needs to be masked.
> 
> 
> In the GIC architecture, a CPU that is handling an IRQ cannot be signaled
> with a new interrupt until it satisfies the interrupt preemption requirements
> defined in the GIC architecture. One of these requirements is that the
> priority of the new pending interrupt is higher than the priority of the
> current running interrupt. Obviously, the same interrupt source cannot
> preempt itself. Additionally, interrupt priority is rarely enabled in
> the Linux kernel,except for the PESUDO_NMI.
> 
> If the interrupt is an LPI, and interrupt handling is still in progress on
> the original CPU, the lack of the ACITVE state means that the LPI can be
> distributed to another CPU once its affinity has been changed. On the other
> hand, since the desc->istate is marked IRQS_PENDING and this interrupt has
> been consumed on the new CPU, there is no need to mask it and then unmask it later.

Thanks, this makes sense to me and matches my understanding. What I was
actually unsure about was why handle_edge_irq() *does* do the masking. I
guess it's because handle_edge_irq() runs desc->irq_data.chip->irq_ack(),
and perhaps if that isn't run (in the concurrent invoke case) then the IRQ
will keep getting re-triggered? Wild speculation. :-)
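
For reference, the early-out in handle_edge_irq() goes roughly like this
(paraphrased sketch, not the exact source):

	/* handle_edge_irq(), concurrent invoke / may-not-run case */
	if (!irq_may_run(desc)) {
		desc->istate |= IRQS_PENDING;
		mask_ack_irq(desc);	/* ack (and mask) so the line doesn't keep firing */
		goto out_unlock;
	}

which at least fits the theory that the ack is the reason the masking is there.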

> > 2. Should the EOI delivery be inside the do loop after every handler
> > run? I think outside the loops is best; only inform the chip to deliver
> > more IRQs once all pending runs have been finished.
> 
> The GIC architecture requires that writes to the EOI register be in the exact
> reverse order of reads from the IAR register. Therefore, IAR and EOI must be paired.
> Writing EOI inside the do loop after every handler may cause subtle problems.
> 
Ah ha! (where is this documented, btw?) And because the IAR read happens
really early in (for example) __gic_handle_irq_from_irqson() it only makes
sense to call EOI once when returning from the interrupt context. That
also means that in the concurrent invoke case we would need to run EOI
even if the handler was not invoked.
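
For what it's worth, the existing out: path in handle_fasteoi_irq() does seem
to cater for that already - roughly:

	out:
		if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
			chip->irq_eoi(&desc->irq_data);
		raw_spin_unlock(&desc->lock);

so the EOI still goes out even when the handler is skipped, unless the chip
explicitly asks for EOI only on handled interrupts.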

> > 3. Do we need to check that desc->action is still set? I don't know if
> > it can be un-set while the IRQ is still enabled.
> 
> This check is necessary here. When the code enters this critical section, the kernel
> running on another CPU might have already unregistered the IRQ action via the free_irq API.
> Although free_irq is typically used in code paths called on module unload or exception
> handling, we have also observed that virtualization using VFIO as a PCI backend
> frequently intends to use free_irq in some regular code paths.
> 

Okay, I'm a bit hazy on whether it is or should be possible to unregister
the IRQ action while the handler is still running - it sounds like you're
saying this is possible and safe. More code spelunking would be necessary
to verify this but I won't bother seeing as it looks like the solution
we're tending towards uses check_irq_resend(), instead of invoking the
handler directly.

JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
       [not found]           ` <86sfcfghqh.wl-maz@kernel.org>
@ 2023-05-23 12:47             ` Gowans, James
  2023-05-25 12:31               ` Liao, Chang
  0 siblings, 1 reply; 28+ messages in thread
From: Gowans, James @ 2023-05-23 12:47 UTC (permalink / raw)
  To: maz, liaochang1
  Cc: tglx, zouyipeng, Raslan, KarimAllah, Woodhouse, David,
	linux-kernel, Sironi, Filippo, chris.zjh

On Tue, 2023-05-02 at 11:17 +0100, Marc Zyngier wrote:
> 
> Sorry for the delay, I've been travelling the past couple of weeks.

Me too - just getting back to this now. :-)
> 
> On Tue, 02 May 2023 09:43:02 +0100,
> "Gowans, James" <jgowans@amazon.com> wrote:
> > 
> > > This will run check_irq_resend() on the *newly affined* CPU, while the old
> > > one is still running the original handler. AFAICT what will happen is:
> > > check_irq_resend
> > >   try_retrigger
> > >     irq_chip_retrigger_hierarchy
> > >       its_irq_retrigger
> > > ... which will cause the ITS to *immediately* re-trigger the IRQ. The
> > > original CPU can still be running the handler in that case.
> 
> Immediately is a relative thing. The GIC processes interrupts far
> slower than the CPU can handle them.

Ack - I've tested with your patch below and empirically on my system it
seems that a resend storm (running resend in a tight loop on the new CPU
while the original one is still slowly running a handler) causes
interrupts to get delivered at a period of about 40 µs. That is indeed less
"immediately" than handlers *typically* take.

> Yes, this is clearly missing from my patch, thanks for pointing this
> out. I think something like the hack below should handle it.
> 
> diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
> index 0c46e9fe3a89..5fa96842a882 100644
> --- a/kernel/irq/resend.c
> +++ b/kernel/irq/resend.c
> @@ -117,7 +117,8 @@ int check_irq_resend(struct irq_desc *desc, bool inject)
>                 return -EINVAL;
>         }
> 
> -       if (desc->istate & IRQS_REPLAY)
> +       if ((desc->istate & IRQS_REPLAY) &&
> +           !irqd_needs_resend_when_in_progress(&desc->irq_data))
>                 return -EBUSY;
> 
>         if (!(desc->istate & IRQS_PENDING) && !inject)

Have tested this and confirm it mitigates the issue.

> I really think that we want to move away from the old CPU asap.
> Imagine the following case: we want to turn a CPU off, and for this we
> need to migrate the interrupts away from it. But the source of the
> interrupt is a guest which keeps the interrupt coming, and this could
> at least in theory delay the offlining indefinitely.

Agreed - just note that whether we do the hardware-assisted
irq_retrigger() before the handler on the newly affined CPU (as your patch
does) or after the handler on the old CPU (as my original patch suggested)
doesn't make a difference - either way the next actual handler run will
happen on the newly affined CPU and we get off the old CPU for the next
handler runs.

The only time it does make a difference is in the SW resend case, but as
you point out:

> With SW resend, the tasklet should get moved to CPUs that are still
> online, and there may be some additional work to do here.

> > Just checking to see if you've had a chance to consider these
> > issues/thoughts, and if/how they should be handled?
> > I'm still tending towards saying that the check_irq_resend() should run
> > after handle_irq_event() and the IRQS_PENDING flag should be wrangled to
> > decide whether or not to resend.
> 
> I still think this is the wrong approach. This mixes the pending and
> replay states, which are significantly different in the IRQ code.

Okay, TBH I'm not too sure what the difference between these two flags is;
the comments on the enum values aren't too detailed. I was inspired by
handle_edge_irq() to use IRQS_PENDING. Any idea why it's correct to use
IRQS_PENDING there but not here? We could tweak this to use the IRQS_REPLAY
flag and use that to conditionally run check_irq_resend() after the handler.

> I definitely want to see more testing on this, but I'm not sure the SW
> resend part is that critical. The issue at hand really is a GIC
> specific problem.

Ack, I'm also happy to not bother with the SW resend case too much. Just
cross-posting from your other email (unifying the threads):

> I contend that this really is a GICv3 architectural bug. The lack of
> an active state on LPIs leads to it, and as far as I can tell, no
> other interrupt architecture has the same issue. So the onus should be
> on the GIC, the GIC only, and only the parts of the GIC that require
> it (SPIs, PPIs and SGIs are fine, either because they have an active
> state, or because the lock isn't dropped when calling the handler).

Again I'm happy to defer to you here. I'm still not sure why we wouldn't
prefer to solve this in a way that *doesn't* need flags, and will just
never run for interrupt controllers which are not impacted? That seems
like we could have simpler code with no downside.

JG


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-23  3:16           ` liaochang (A)
@ 2023-05-25 10:04             ` Gowans, James
  2023-05-29  2:47               ` Liao, Chang
  0 siblings, 1 reply; 28+ messages in thread
From: Gowans, James @ 2023-05-25 10:04 UTC (permalink / raw)
  To: maz, tglx, liaochang1
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel,
	Sironi, Filippo, chris.zjh

On Tue, 2023-05-23 at 11:16 +0800, liaochang (A) wrote:
> Hi, James and Marc,
> 
> After studying your discussions, I list some requirements that need to be satisfied for
> the final practical solution:
> 
> 1. Use the GIC to maintain the unhandled LPI.
> 2. Do not change the semantics of set_irq_affinity, which means that the interrupt
>    action must be performed on the new CPU when the next interrupt occurs after a
>    successful set_irq_affinity operation.
> 3. Minimize the cost, especially to other tasks running on the CPUs, which means avoiding
>    a do/while loop on the original CPU and repeated resends of the interrupt on the new CPU.

Seems reasonable to me.
> 
> Based on these requirements and Linux v6.3 rev, I propose the following hack:
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 49e7bc871fec..1b49518b19bd 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -692,8 +692,14 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>         raw_spin_lock(&desc->lock);
> 
> -       if (!irq_may_run(desc))
> +       /*
> +        * Acking another interrupt from the same source can occur on the new
> +        * CPU even before the first one is handled on the original CPU.
> +        */
> +       if (!irq_may_run(desc)) {
> +               desc->istate |= IRQS_PENDING;
>                 goto out;
> +       }

Wording may need a slight tweak, especially pointing out why PENDING is
set.
> 
>         desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
> @@ -715,6 +721,8 @@ void handle_fasteoi_irq(struct irq_desc *desc)
> 
>         cond_unmask_eoi_irq(desc, chip);
> 
> +       check_irq_resend(desc, true);
> +

I'm pretty sure the second param, "inject", must not be true. As is this
will *always* resend the interrupt after every interrupt. Infinite
interrupt loop! (did you test this?)
I've changed it to false and tested, seems good to me.

I'd suggest adding a comment here which pairs with the comment above
explaining why this is being called here. Also perhaps adding:
if (unlikely(desc->istate & IRQS_PENDING))
...so that in general this function call will be skipped over.
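
Putting those tweaks together, the hack would end up looking roughly like
this (untested sketch against the same handle_fasteoi_irq() your diff
touches):

	raw_spin_lock(&desc->lock);

	/*
	 * When the affinity is changed while the handler is still running
	 * on the original CPU, the next interrupt can be acked on the new
	 * CPU before the first handler has returned. Flag it as pending so
	 * the running handler re-triggers it when it completes.
	 */
	if (!irq_may_run(desc)) {
		desc->istate |= IRQS_PENDING;
		goto out;
	}

	...

	cond_unmask_eoi_irq(desc, chip);

	/*
	 * Pairs with the IRQS_PENDING flagging above: only ask the chip to
	 * re-trigger if a concurrent invoke was detected, and pass
	 * inject=false so check_irq_resend() keys off IRQS_PENDING itself.
	 */
	if (unlikely(desc->istate & IRQS_PENDING))
		check_irq_resend(desc, false);

	raw_spin_unlock(&desc->lock);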

If you want I'm happy to make these tweaks and post as V2?
There are also some other comments I'm keen to add to the flag enums to
make it a bit clearer what some of the flags mean.

Marc, what are your thoughts on this approach?
The main (only?) downside I see compared to your suggestion is that in the
irq_sw_resend() case the tasklet will be scheduled on the original CPU. As
you mentioned in a previous reply, perhaps we shouldn't bother thinking
about the sw_resend case now as only GIC is impacted. In future if
necessary the sw_resend implementation could be tweaked to schedule the
tasklet on the CPU which the interrupt is affined to.

JG

>         raw_spin_unlock(&desc->lock);
>         return;
>  out:
> 
> Looking forward to your feedbacks, thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-23 11:59   ` Gowans, James
@ 2023-05-25 12:31     ` Liao, Chang
  0 siblings, 0 replies; 28+ messages in thread
From: Liao, Chang @ 2023-05-25 12:31 UTC (permalink / raw)
  To: Gowans, James, tglx, maz
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel, chris.zjh



On 2023/5/23 19:59, Gowans, James wrote:
> On Tue, 2023-05-23 at 11:15 +0800, liaochang (A) wrote:
>>> 1. Do we need to mask the IRQ and then unmask it later? I don't think so
>>> but it's not entirely clear why handle_edge_irq does this anyway; it's
>>> an edge IRQ so not sure why it needs to be masked.
>>
>>
>> In the GIC architecture, a CPU that is handling an IRQ cannot be signaled
>> with a new interrupt until it satisfies the interrupt preemption requirements
>> defined in the GIC architecture. One of these requirements is that the
>> priority of the new pending interrupt is higher than the priority of the
>> current running interrupt. Obviously, the same interrupt source cannot
>> preempt itself. Additionally, interrupt priority is rarely enabled in
>> the Linux kernel, except for PSEUDO_NMI.
>>
>> If the interrupt is an LPI, and interrupt handling is still in progress on
>> the original CPU, the lack of the ACTIVE state means that the LPI can be
>> distributed to another CPU once its affinity has been changed. On the other
>> hand, since the desc->istate is marked IRQS_PENDING and this interrupt has
>> been consumed on the new CPU, there is no need to mask it and then unmask it later.
> 
> Thanks, this makes sense to me and matches my understanding. What I was
> actually unsure about was why handle_edge_irq() *does* do the masking. I
> guess it's because handle_edge_irq() runs desc->irq_data.chip->irq_ack(),
> and perhaps if that isn't run (in the concurrent invoke case) then the IRQ
> will keep getting re-triggered? Wild speculation. :-)

James,

I am not very familiar with the history of these flow handlers. However, according
to the comments, I believe that handle_edge_irq() masks interrupts to ensure a window
in which the first interrupt can be handled. This may be because, when handle_edge_irq()
was introduced into the kernel, the controller hardware may have been too simple to gate
interrupts from the same source. As a result, the kernel had to explicitly mask interrupts.

Modern Arm and GIC(v3) handle this differently. Once the CPU acknowledges an edge-triggered
LPI, it inherits the priority of the interrupt into the RPR (running priority) register. This
means that pending interrupts with insufficient priority cannot preempt the interrupt being
handled on the CPU until the CPU drops the priority by writing EOI. In other words, an
interrupt from the same source will be gated by the GIC hardware, so there is no need to do
masking in handle_edge_irq().

> 
>>> 2. Should the EOI delivery be inside the do loop after every handler
>>> run? I think outside the loops is best; only inform the chip to deliver
>>> more IRQs once all pending runs have been finished.
>>
>> The GIC architecture requires that writes to the EOI register be in the exact
>> reverse order of reads from the IAR register. Therefore, IAR and EOI must be paired.
>> Writing EOI inside the do loop after every handler may cause subtle problems.
>>
> Ah ha! (where is this documented, btw?) And because the IAR read happens
> really early in (for example) __gic_handle_irq_from_irqson() it only makes
> sense to call EOI once when returning from the interrupt context. That
> also means that in the concurrent invoke case we would need to run EOI
> even if the handler was not invoked.

Download the official GIC specification from [1] and check out Chapter 4.
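
To illustrate the ordering rule: for nested interrupts, the IAR reads and EOI
writes have to nest strictly, something like this (simplified sketch; see the
spec for the authoritative rules):

	read  IAR  -> INTID A        /* handling of A starts         */
	  read  IAR  -> INTID B      /* higher-priority B preempts A */
	  write EOIR <- INTID B      /* innermost EOI written first  */
	write EOIR <- INTID A        /* then the outer one           */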

> 
>>> 3. Do we need to check that desc->action is still set? I don't know if
>>> it can be un-set while the IRQ is still enabled.
>>
>> This check is necessary here. When the code enters this critical section, the kernel
>> running on another CPU might have already unregistered the IRQ action via the free_irq API.
>> Although free_irq is typically used in code paths called on module unload or exception
>> handling, we have also observed that virtualization using VFIO as a PCI backend
>> frequently intends to use free_irq in some regular code paths.
>>
> 
> Okay, I'm a bit hazy on whether it is or should be possible to unregister
> the IRQ action while the handler is still running - it sounds like you're
> saying this is possible and safe. More code spelunking would be necessary
> to verify this but I won't bother seeing as it looks like the solution
> we're tending towards uses check_irq_resend(), instead of invoking the
> handler directly.

> 
> JG

[1] https://developer.arm.com/documentation/ihi0069/latest/

-- 
BR
Liao, Chang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-23 12:47             ` Gowans, James
@ 2023-05-25 12:31               ` Liao, Chang
  0 siblings, 0 replies; 28+ messages in thread
From: Liao, Chang @ 2023-05-25 12:31 UTC (permalink / raw)
  To: Gowans, James, maz
  Cc: tglx, zouyipeng, Raslan, KarimAllah, Woodhouse, David,
	linux-kernel, Sironi, Filippo, chris.zjh



On 2023/5/23 20:47, Gowans, James wrote:
> On Tue, 2023-05-02 at 11:17 +0100, Marc Zyngier wrote:
>>
>> Sorry for the delay, I've been travelling the past couple of weeks.
> 
> Me too - just getting back to this now. :-)
>>
>> On Tue, 02 May 2023 09:43:02 +0100,
>> "Gowans, James" <jgowans@amazon.com> wrote:
>>>
>>>> This will run check_irq_resend() on the *newly affined* CPU, while the old
>>>> one is still running the original handler. AFAICT what will happen is:
>>>> check_irq_resend
>>>>   try_retrigger
>>>>     irq_chip_retrigger_hierarchy
>>>>       its_irq_retrigger
>>>> ... which will cause the ITS to *immediately* re-trigger the IRQ. The
>>>> original CPU can still be running the handler in that case.
>>
>> Immediately is a relative thing. The GIC processes interrupts far
>> slower than the CPU can handle them.
> 
> Ack - I've tested with your patch below and empirically on my system it
> seems that a resend storm (running resend in a tight loop on the new CPU
> while the original one is still slowly running a handler) causes
> interrupts to get delivered with a period of about 40 µs. That is indeed less
> "immediately" than handlers *typically* take.
> 
>> Yes, this is clearly missing from my patch, thanks for pointing this
>> out. I think something like the hack below should handle it.
>>
>> diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
>> index 0c46e9fe3a89..5fa96842a882 100644
>> --- a/kernel/irq/resend.c
>> +++ b/kernel/irq/resend.c
>> @@ -117,7 +117,8 @@ int check_irq_resend(struct irq_desc *desc, bool inject)
>>                 return -EINVAL;
>>         }
>>
>> -       if (desc->istate & IRQS_REPLAY)
>> +       if ((desc->istate & IRQS_REPLAY) &&
>> +           !irqd_needs_resend_when_in_progress(&desc->irq_data))
>>                 return -EBUSY;

Reposting from my earlier email in this thread:

"Second, I am concerned that repeatedly resending the interrupt could be harmful to
the current task running on the new CPU. When an interrupt is delivered, the task
will be interrupted and forced to save and restore its context, and the CPU must switch
exception level. This can be a costly operation, especially for virtualization systems."

Fortunately, this problem is supposed to occur rarely. This means that the performance
degradation may not be so noticeable at the business level. However, I just wonder if
there is another way to handle this situation that does not involve tweaking IRQS_PENDING
or repeatedly resending the interrupt to the new CPU.

Thanks.

>>
>>         if (!(desc->istate & IRQS_PENDING) && !inject)
> 
> Have tested this and confirm it mitigates the issue.
> 
>> I really think that we want to move away from the old CPU asap.
>> Imagine the following case: we want to turn a CPU off, and for this we
>> need to migrate the interrupts away from it. But the source of the
>> interrupt is a guest which keeps the interrupt coming, and this could
>> at least in theory delay the offlining indefinitely.
> 
> Agreed - just note that whether we do the hardware-assisted
> irq_retrigger() before the handler on the newly affined CPU (as your patch
> does) or after the handler on the old CPU (as my original patch suggested)
> doesn't make a difference - either way the next actual handler run will
> happen on the newly affined CPU and we get off the old CPU for the next
> handler runs.
> 
> The only time it does make a difference is in the SW resend case but as
> you point out:
> 
>> With SW resend, the tasklet should get moved to CPUs that are still
>> online, and there may be some additional work to do here.
> 
>>> Just checking to see if you've had a chance to consider these
>>> issues/thoughts, and if/how they should be handled?
>>> I'm still tending towards saying that the check_irq_resend() should run
>>> after handle_irq_event() and the IRQS_PENDING flag should be wrangled to
>>> decide whether or not to resend.
>>
>> I still think this is the wrong approach. This mixes the pending and
>> replay states, which are significantly different in the IRQ code.
> 
> Okay, TBH I'm not too sure what the difference between these two flags is;
> the comments on the enum values aren't too detailed. I was inspired by
> handle_edge_irq() to use IRQS_PENDING. Any idea why it's correct to use
> IRQS_PENDING there but not here? We could tweak this to use the REPLAY flag
> and use that to conditionally run check_irq_resend() after the handler.
> 
>> I definitely want to see more testing on this, but I'm not sure the SW
>> resend part is that critical. The issue at hand really is a GIC
>> specific problem.
> 
> Ack, I'm also happy to not bother with the SW resend case too much. Just
> cross-posting from your other email (unifying the threads):
> 
>> I contend that this really is a GICv3 architectural bug. The lack of
>> an active state on LPIs leads to it, and as far as I can tell, no
>> other interrupt architecture has the same issue. So the onus should be
>> on the GIC, the GIC only, and only the parts of the GIC that require
>> it (SPIs, PPIs and SGIs are fine, either because they have an active
>> state, or because the lock isn't dropped when calling the handler).
> 
> Again I'm happy to defer to you here. I'm still not sure why we wouldn't
> prefer to solve this in a way that *doesn't* need flags, and will just
> never run for interrupt controllers which are not impacted? That seems
> like we could have simpler code with no downside.
> 
> JG
> 

-- 
BR
Liao, Chang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-25 10:04             ` Gowans, James
@ 2023-05-29  2:47               ` Liao, Chang
  2023-05-30 21:47                 ` Gowans, James
  0 siblings, 1 reply; 28+ messages in thread
From: Liao, Chang @ 2023-05-29  2:47 UTC (permalink / raw)
  To: Gowans, James, maz, tglx
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel,
	Sironi, Filippo, chris.zjh

Hi, James.

On 2023/5/25 18:04, Gowans, James wrote:
> On Tue, 2023-05-23 at 11:16 +0800, liaochang (A) wrote:
>> Hi, James and Marc,
>>
>> After studying your discussions, I list some requirements that need to be satisfied
>> by the final practical solution:
>>
>> 1. Use the GIC to maintain the unhandled LPI.
>> 2. Do not change the semantics of set_irq_affinity, which means that the interrupt
>>    action must be performed on the new CPU when the next interrupt occurs after a
>>    successful set_irq_affinity operation.
>> 3. Minimize the cost, especially to other tasks running on CPUs, which means avoiding
>>    a do/while loop on the original CPU and repeatedly resending the interrupt on the new CPU.
> 
> Seems reasonable to me.
>>
>> Based on these requirements and Linux v6.3 rev, I propose the following hack:
>>
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 49e7bc871fec..1b49518b19bd 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -692,8 +692,14 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>
>>         raw_spin_lock(&desc->lock);
>>
>> -       if (!irq_may_run(desc))
>> +       /*
>> +        * Acking another interrupt from the same source can occur on the new
>> +        * CPU even before the first one is handled on the original CPU.
>> +        */
>> +       if (!irq_may_run(desc)) {
>> +               desc->istate |= IRQS_PENDING;
>>                 goto out;
>> +       }
> 
> Wording may need a slight tweak, especially pointing out why PENDING is
> set.

Sure, see the comment below:

"Ack another interrupt from the same source can occurs on new CPU even before
the first one is handled on original CPU, IRQS_PENDING bit can be reused to
indicate this situation, which will defer the execution of the interrupt handler
function associated with the irq_desc until the first interrupt handler returns."

In summary, IRQS_PENDING ensures that only one interrupt handler is ever
running for a particular source at a time. The major usages of IRQS_PENDING
in the kernel are as follows:

1. Used in the irq flow handlers to indicate that an acknowledged interrupt cannot be
handled immediately, for three different reasons:
- Case 1: the interrupt handler function has been unregistered via free_irq().
- Case 2: the interrupt has been disabled via irq_disable().
- Case 3: the interrupt is an edge-triggered interrupt and its handler is already
         running on a CPU.

In any of these cases, the kernel defers the execution of the interrupt handler
until the interrupt is enabled and a handler is established again, either via
check_irq_resend() or via the inner loop in handle_edge_irq() once the previous
handler returns.

2. Used in the spurious interrupt detector: a few systems with misdescribed IRQ
routing can cause an interrupt to be handled on the wrong CPU. In this situation,
the spurious interrupt detector searches for a recovery handler for the interrupt.
If the found handler is running on another CPU, the spurious interrupt detector
also defers the execution of the recovery handler, similar to case 3 in #1.

I hope this is helpful.
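
For case 3 specifically, the relevant part of handle_edge_irq() goes roughly
like this (simplified paraphrase from memory - see kernel/irq/chip.c for the
real code):

	if (!irq_may_run(desc)) {
		desc->istate |= IRQS_PENDING;
		mask_ack_irq(desc);
		goto out_unlock;
	}
	...
	do {
		...
		handle_irq_event(desc);
	} while ((desc->istate & IRQS_PENDING) &&
		 !irqd_irq_disabled(&desc->irq_data));

So the interrupt that arrived during the handler is not lost; its handler
simply runs again on the same CPU once the first invocation returns.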

>>
>>         desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
>>
>> @@ -715,6 +721,8 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>>
>>         cond_unmask_eoi_irq(desc, chip);
>>
>> +       check_irq_resend(desc, true);
>> +
> 
> I'm pretty sure the second param, "inject", must not be true. As is this
> will *always* resend the interrupt after every interrupt. Infinite
> interrupt loop! (did you test this?)
> I've changed it to false and tested, seems good to me.

Sorry, this code is a very trivial prototype and has not yet been tested.
You are correct that the "inject" parameter should be set to false, otherwise
the IRQS_PENDING check in check_irq_resend() will be skipped. Thank you for
pointing out my mistake.

> 
> I'd suggest adding a comment here which pairs with the comment above
> explaining why this is being called here. Also perhaps adding:
> if (unlikely(desc->istate & IRQS_PENDING))
> ...so that in general this function call will be skipped over.

Agree.

> 
> If you want I'm happy to make these tweaks and post as V2?
> There are also some other comments I'm keen to add to the flag enums to
> make it a bit clearer what some of the flags mean.

Don't see why not :)

Thanks.

> 
> Marc, what are your thoughts on this approach?
> The main (only?) downside I see compared to your suggestion is that in the
> irq_sw_resend() case the tasklet will be scheduled on the original CPU. As
> you mentioned in a previous reply, perhaps we shouldn't bother thinking
> about the sw_resend case now as only GIC is impacted. In future if
> necessary the sw_resend implementation could be tweaked to schedule the
> tasklet on the CPU which the interrupt is affined to.
> 
> JG
> 
>>         raw_spin_unlock(&desc->lock);
>>         return;
>>  out:
>>
>> Looking forward to your feedbacks, thanks.

-- 
BR
Liao, Chang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
  2023-05-29  2:47               ` Liao, Chang
@ 2023-05-30 21:47                 ` Gowans, James
  0 siblings, 0 replies; 28+ messages in thread
From: Gowans, James @ 2023-05-30 21:47 UTC (permalink / raw)
  To: maz, tglx, liaochang1
  Cc: zouyipeng, Raslan, KarimAllah, Woodhouse, David, linux-kernel,
	Sironi, Filippo, chris.zjh

On Mon, 2023-05-29 at 10:47 +0800, Liao, Chang wrote:
> 
> > 
> > Wording may need a slight tweak, especially pointing out why PENDING is
> > set.
> 
> Sure, see the comment below:
> 
> "Ack another interrupt from the same source can occurs on new CPU even before
> the first one is handled on original CPU, IRQS_PENDING bit can be reused to
> indicate this situation, which will defer the execution of the interrupt handler
> function associated with the irq_desc until the first interrupt handler returns."
> 
> In summary, IRQS_PENDING ensures that only one interrupt handler is ever
> running for a particular source at a time. The major usages of IRQS_PENDING
> in the kernel are as follows:
> 
> 1. Used in the irq flow handlers to indicate that an acknowledged interrupt cannot be
> handled immediately, for three different reasons:
> - Case 1: the interrupt handler function has been unregistered via free_irq().
> - Case 2: the interrupt has been disabled via irq_disable().
> - Case 3: the interrupt is an edge-triggered interrupt and its handler is already
>          running on a CPU.
> 
> In any of these cases, the kernel defers the execution of the interrupt handler
> until the interrupt is enabled and a handler is established again, either via
> check_irq_resend() or via the inner loop in handle_edge_irq() once the previous
> handler returns.
> 
> 2. Used in the spurious interrupt detector: a few systems with misdescribed IRQ
> routing can cause an interrupt to be handled on the wrong CPU. In this situation,
> the spurious interrupt detector searches for a recovery handler for the interrupt.
> If the found handler is running on another CPU, the spurious interrupt detector
> also defers the execution of the recovery handler, similar to case 3 in #1.
> 
> I hope this is helpful.

Stunning! I tried to pillage some of this for the commit message, but I
think it was too much detail for it - it probably belongs in a doc or code
comment somewhere though. Useful!

> > If you want I'm happy to make these tweaks and post as V2?
> > There are also some other comments I'm keen to add to the flag enums to
> > make it a bit clearer what some of the flags mean.
> 
> Don't see why not :)

Done, V2: https://lore.kernel.org/lkml/20230530213848.3273006-2-jgowans@amazon.com/

JG

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-05-30 21:48 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17  9:53 [PATCH] irq: fasteoi handler re-runs on concurrent invoke James Gowans
2023-03-17 10:12 ` Yipeng Zou
2023-03-17 11:49   ` Gowans, James
2023-03-22  6:26     ` Yipeng Zou
2023-03-22  7:48       ` Gowans, James
2023-03-22 10:37         ` Thomas Gleixner
2023-04-03 13:17           ` zhangjianhua (E)
2023-04-03 13:19             ` Marc Zyngier
2023-04-03 13:39               ` Gowans, James
2023-03-22 10:38         ` Yipeng Zou
2023-04-09 11:41 ` Marc Zyngier
2023-04-11 10:27   ` Gowans, James
2023-04-12 13:32     ` Marc Zyngier
2023-04-18  2:39       ` Yipeng Zou
2023-04-18 10:56       ` Gowans, James
2023-04-19  3:08         ` Yipeng Zou
2023-05-02  8:43         ` Gowans, James
2023-05-23  3:16           ` liaochang (A)
2023-05-25 10:04             ` Gowans, James
2023-05-29  2:47               ` Liao, Chang
2023-05-30 21:47                 ` Gowans, James
     [not found]           ` <86sfcfghqh.wl-maz@kernel.org>
2023-05-23 12:47             ` Gowans, James
2023-05-25 12:31               ` Liao, Chang
2023-05-02 10:28         ` Marc Zyngier
2023-05-23  3:16       ` liaochang (A)
2023-05-23  3:15 ` liaochang (A)
2023-05-23 11:59   ` Gowans, James
2023-05-25 12:31     ` Liao, Chang
