Re: Unfair qspinlocks on ARM64 without LSE atomics => 3ms delay in interrupt handling

From: Kurt Kanzenbach <kurt@linutronix.de>
To: Jan Kiszka <jan.kiszka@siemens.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Bouska, Zdenek" <zdenek.bouska@siemens.com>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
	Nishanth Menon <nm@ti.com>, Puranjay Mohan <p-mohan@ti.com>
Subject: Re: Unfair qspinlocks on ARM64 without LSE atomics => 3ms delay in interrupt handling
Date: Thu, 27 Apr 2023 15:45:09 +0200
Message-ID: <871qk5782i.fsf@kurt>
In-Reply-To: <19641ab0-ab6a-9af7-8c64-34030e187848@siemens.com>

On Thu Apr 27 2023, Jan Kiszka wrote:
> On 26.04.23 23:29, Thomas Gleixner wrote:
>> On Wed, Apr 26 2023 at 12:03, Zdenek Bouska wrote:
>>> following patch is my current approach for fixing this issue. I introduced
>>> big_cpu_relax(), which uses Will's implementation [1] on ARM64 without
>>> LSE atomics and original cpu_relax() on any other CPU.
>> 
>> Why is this interrupt handling specific? Just because it's the place
>> where you observed it?
>> 
>> That's a general issue for any code which uses atomics for forward
>> progress. LL/SC simply does not guarantee that.
>> 
>> So if that helps, then this needs to be addressed globally and not with
>> some crude hack in the interrupt handling code.
>
> My impression is that the retry loop of irq_finalize_oneshot is
> particularly susceptible to that issue due to the high acquire/relax
> pressure and inter-dependency between holder and waiter it generates -
> which does not mean it cannot occur in other places.
>
> Are we aware of other concrete cases where it bites? Even with just
> "normal" contended spin_lock usage?

Well, some years ago I observed a similar problem with ARM64
spinlocks, cpu_relax() and retry loops (in the futex code). It also
generated latency spikes of up to 2-3 ms. Back then, it was easily
reproducible using stress-ng --ptrace 4.

Thanks,
Kurt
