Re: Why GICD_ITARGETSR is not used by Linux

From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Li Chen <me@linux.beauty>
Cc: Arnd Bergmann <arnd@arndb.de>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: Why GICD_ITARGETSR is not used by Linux
Date: Tue, 20 Sep 2022 11:14:24 +0100
Message-ID: <YymSgJoHEKhD3ypC@shell.armlinux.org.uk>
In-Reply-To: <YymRYhjQpRDsD5jM@shell.armlinux.org.uk>

On Tue, Sep 20, 2022 at 11:09:38AM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 20, 2022 at 11:45:10AM +0200, Li Chen wrote:
> > Hi Arnd,
> > 
> >  ---- On Tue, 20 Sep 2022 09:04:16 +0200  Arnd Bergmann  wrote --- 
> >  > On Tue, Sep 20, 2022, at 3:42 AM, Li Chen wrote:
> >  > > Hi Arnd,
> >  > >
> >  > > I noticed GIC has GICD_ITARGETSR to distribute IRQ to different CPUs, 
> >  > > but currently, it is not used by Linux.
> >  > >
> >  > > There was a patchset from MTK people: 
> >  > > http://archive.lwn.net:8080/linux-kernel/1606486531-25719-1-git-send-email-hanks.chen@mediatek.com/T/#t 
> >  > > which implements a GIC-level IRQ distributor using GICD_ITARGETSR, but it
> >  > > was not accepted because the maintainer thinks it would break existing
> >  > > code and provide no benefit over the existing affinity mechanism.
> >  > >
> >  > > IIUC, Linux only relies on affinity/irqbalance to distribute IRQ 
> >  > > instead of architecture-specific solutions like GIC's distributor.
> >  > >
> >  > > Latency might improve somewhat, but there is no benchmark yet.
> >  > >
> >  > > I have two questions here:
> >  > > 1. Given that Linux doesn't use GICD_ITARGETSR, where does it set CPU 0
> >  > > to be the only IRQ distributor core?
> >  > > 2. Do you know of any other reasons why GICD_ITARGETSR is not used by
> >  > > Linux?
> >  > 
> >  > Hi Li,
> >  > 
> >  > It looks like the original submitter never followed up
> >  > with a new version of the patch that addresses the
> >  > issues found in review. I would assume they gave up either
> >  > because it did not show any real-world advantage, or they
> >  > could not address all of the concerns.
> > 
> > Thanks for your reply.
> > 
> > FYI, here is another thread about this topic: https://lore.kernel.org/linux-arm-kernel/20191120105017.GN25745@shell.armlinux.org.uk/
> 
> Oh god, not this again.
> 
> The behaviour of the GIC is as follows. If you set two CPUs in
> GICD_ITARGETSRn, then the interrupt will be delivered to _both_ of
> those CPUs. Not just one selected at random or determined by some
> algorithm, but both CPUs.
> 
> Both CPUs get woken up if they're in sleep, and both CPUs attempt to
> process the interrupt. One CPU will win the lock, while the other CPU
> spins waiting for the lock to process the interrupt.
> 
> The winning CPU will process the interrupt, clear it on the device,
> release the lock and acknowledge it at the GIC CPU interface.
> 
> The CPU that lost the previous race can now proceed to process the
> very same interrupt, discovers that it's no longer pending on the
> device, and signals IRQ_NONE as it appears to be a spurious interrupt.
> 
> The result is that the losing CPU ends up wasting CPU cycles, and
> if the losing CPU was in a low power idle state, needlessly wakes up
> to process this interrupt.
> 
> If you have more CPUs involved, you have more CPUs wasting CPU cycles
> and being woken up, wasting power - not just occasionally, but on almost
> every single interrupt that is raised from a device in the system.
> 
> On architectures such as x86, the PICs distribute the interrupts in
> hardware amongst the CPUs. So if a single interrupt is set to be sent
> to multiple CPUs, only _one_ of the CPUs is actually interrupted.
> Hence, x86 can have multiple CPUs selected as a destination, and
> the hardware spreads successive interrupts across those CPUs.
> 
> On ARM, we don't have that. We have a thundering herd of CPUs if we
> set more than one CPU to process the interrupt, which is grossly
> inefficient.
> 
> As I said in the reply you linked to above, I did attempt to implement
> several ideas in software, where the kernel would attempt to dynamically
> distribute the interrupt amongst the CPUs in the affinity mask, but I
> could never get what appeared to be good behaviour on the platforms
> I was trying, and performance wasn't as good. So I abandoned it.

Oh, I just remembered an additional detail. There was the irqbalance
daemon that used to work at that time which would do the distribution
from userspace - and it knew various policies about interrupts that
were found to be beneficial on x86, such as not moving busy ethernet
interrupts amongst the CPUs due to networking paths being "hot" in the
cache, vs other interrupts where that didn't matter so much.

I never understood why folk in the ARM community were so reluctant to
use what was an existing userspace approach... which I believe has now
fallen into disrepair for use on ARM because no one was interested.
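
As a rough illustration of the userspace approach: on Linux, per-IRQ
affinity is set by writing a hexadecimal CPU bitmask to
/proc/irq/<N>/smp_affinity. The Python sketch below only builds the path
and the mask string for pinning an interrupt to one CPU; it does not write
anything. The function names affinity_mask and pin_target are hypothetical,
and this is not how irqbalance itself is implemented.

```python
def affinity_mask(cpus):
    """Format CPU numbers as a hex bitmask in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    # Split the mask into 32-bit groups, least significant first.
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    groups.reverse()
    # Most significant group is unpadded; the rest are padded to 8 digits.
    head = format(groups[0], "x")
    tail = [format(g, "08x") for g in groups[1:]]
    return ",".join([head] + tail)


def pin_target(irq, cpu):
    """Return (procfs path, mask string) that would pin irq to a single CPU.

    Hypothetical helper: a real policy would decide *which* interrupts to
    leave alone (e.g. busy ethernet ones) before moving anything.
    """
    return "/proc/irq/%d/smp_affinity" % irq, affinity_mask([cpu])
```

Keeping exactly one bit set per interrupt is what avoids the multi-CPU
delivery described in the quoted text.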

> This doesn't preclude someone else having a go at solving that problem,
> but the problem is not solved by setting multiple CPU bits in the
> GICD_ITARGETSRn registers. As I said above, that just gets you a
> thundering herd problem, less performance, and worse power consumption.
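
The behaviour described above can be sketched as a toy model in Python.
It assumes the GICv2 register layout (GICD_ITARGETSRn starting at
distributor offset 0x800, one 8-bit CPU-target field per interrupt, four
fields per 32-bit register). The function names are hypothetical, and
picking the lowest-numbered CPU as the winner is an assumption made only
to keep the model deterministic; on real hardware it is a race.

```python
# Toy model, not kernel code: every CPU whose bit is set in the interrupt's
# GICD_ITARGETSRn field takes the interrupt, but only one finds work to do.

GICD_ITARGETSR_BASE = 0x800  # distributor offset of GICD_ITARGETSR0 (GICv2)


def itargetsr_location(irq):
    """Return (register byte offset, bit shift) of the target field for irq."""
    # Four 8-bit target fields are packed into each 32-bit register.
    return GICD_ITARGETSR_BASE + 4 * (irq // 4), 8 * (irq % 4)


def deliver(target_mask):
    """Return (winner, losers) for one interrupt with this 8-bit target mask.

    winner is the CPU that handles the device; losers are the CPUs that were
    also interrupted (and possibly woken) only to report IRQ_NONE.
    """
    cpus = [cpu for cpu in range(8) if target_mask & (1 << cpu)]
    if not cpus:
        return None, []
    # Assumption: lowest-numbered CPU wins the lock.
    return cpus[0], cpus[1:]
```

With a single bit set the losers list is empty; each extra bit adds one
more CPU that is interrupted for nothing.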

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
