Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke

From: "Gowans, James" <jgowans@amazon.com>
To: "maz@kernel.org" <maz@kernel.org>,
	"liaochang1@huawei.com" <liaochang1@huawei.com>
Cc: "tglx@linutronix.de" <tglx@linutronix.de>,
	"zouyipeng@huawei.com" <zouyipeng@huawei.com>,
	"Raslan, KarimAllah" <karahmed@amazon.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Sironi, Filippo" <sironi@amazon.de>,
	"chris.zjh@huawei.com" <chris.zjh@huawei.com>
Subject: Re: [PATCH] irq: fasteoi handler re-runs on concurrent invoke
Date: Tue, 23 May 2023 12:47:36 +0000	[thread overview]
Message-ID: <605113d89cdf53ba1a5034bb15704da58a5336d7.camel@amazon.com> (raw)
In-Reply-To: <86sfcfghqh.wl-maz@kernel.org>

On Tue, 2023-05-02 at 11:17 +0100, Marc Zyngier wrote:
> 
> Sorry for the delay, I've been travelling the past couple of weeks.

Me too - just getting back to this now. :-)
> 
> On Tue, 02 May 2023 09:43:02 +0100,
> "Gowans, James" <jgowans@amazon.com> wrote:
> > 
> > > This will run check_irq_resend() on the *newly affined* CPU, while the old
> > > one is still running the original handler. AFAICT what will happen is:
> > > check_irq_resend
> > >   try_retrigger
> > >     irq_chip_retrigger_hierarchy
> > >       its_irq_retrigger
> > > ... which will cause the ITS to *immediately* re-trigger the IRQ. The
> > > original CPU can still be running the handler in that case.
> 
> Immediately is a relative thing. The GIC processes interrupts far
> slower than the CPU can handle them.

Ack - I've tested with your patch below and empirically on my system it
seems that a resend storm (running resend in a tight loop on the new CPU
while the original one is still slowly running a handler) causes
interrupts get delivered at period of about 40 µs. That is indeed less
"immediately" than handlers *typically* take.

> Yes, this is clearly missing from my patch, thanks for pointing this
> out. I think something like the hack below should handle it.
> 
> diff --git a/kernel/irq/resend.c b/kernel/irq/resend.c
> index 0c46e9fe3a89..5fa96842a882 100644
> --- a/kernel/irq/resend.c
> +++ b/kernel/irq/resend.c
> @@ -117,7 +117,8 @@ int check_irq_resend(struct irq_desc *desc, bool inject)
>                 return -EINVAL;
>         }
> 
> -       if (desc->istate & IRQS_REPLAY)
> +       if ((desc->istate & IRQS_REPLAY) &&
> +           !irqd_needs_resend_when_in_progress(&desc->irq_data))
>                 return -EBUSY;
> 
>         if (!(desc->istate & IRQS_PENDING) && !inject)

Have tested this and confirm it mitigates the issue.

> I really think that we want to move away from the old CPU asap.
> Imagine the following case: we want to turn a CPU off, and for this we
> need to migrate the interrupts away from it. But the source of the
> interrupt is a guest which keeps the interrupt coming, and this could
> at least in theory delay the offlining indefinitely.

Agreed - just note that whether we do the hardware-assisted
irq_retrigger() before the handler on the newly affined CPU (as your patch
does) or after the handler on the old CPU (as my original patch suggested)
doesn't make a difference - either way the next actual handler run will
happen on the newly affined CPU and we get off the old CPU for the next
handler runs.

The only time is does make a difference is in the SW resend case but as
you point out:

> With SW resend, the tasklet should get moved to CPUs that are still
> online, and there may be some additional work to do here.

> > Just checking to see if you've had a chance to consider these
> > issues/thoughts, and if/how they should be handled?
> > I'm still tending towards saying that the check_irq_resend() should run
> > after handle_irq_event() and the IRQS_PENDING flag should be wrangled to
> > decide whether or not to resend.
> 
> I still think this is the wrong approach. This mixes the pending and
> replay states, which are significantly different in the IRQ code.

Okay, TBH I'm not too sure what the difference between these two flags is,
the comments on the enum values isn't too detailed. I was inspired by
handle_edge_irq() to use the IRQ_PENDING. Any idea why it's correct to use
IRQ_PENDING there but not here? We could tweak this to use the REPLAY flag
and use that to conditionally run check_irq_resend() after the handler.

> I definitely want to see more testing on this, but I'm not sure the SW
> resend part is that critical. The issue at hand really is a GIC
> specific problem.

Ack, I'm also happy to not bother with the SW resend case too much. Just
cross-posting from your other email (unifying the threads):

> I contend that this really is a GICv3 architectural bug. The lack of
> an active state on LPIs leads to it, and as far as I can tell, no
> other interrupt architecture has the same issue. So the onus should be
> on the GIC, the GIC only, and only the parts of the GIC that require
> it (SPIs, PPIs and SGIs are fine, either because they have an active
> state, or because the lock isn't dropped when calling the handler).

Again I'm happy to defer to you here. I'm still not sure why we wouldn't
prefer to solve this in a way that *doesn't* need flags, and will just
never run for interrupt controllers which are not impacted? That seems
like we could have simpler code with no downside.

JG