Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock

From: Will Deacon <will@kernel.org>
To: Waiman Long <longman@redhat.com>
Cc: Lihao Liang <lihaoliang@google.com>,
	Alex Kogan <alex.kogan@oracle.com>,
	linux@armlinux.org.uk, peterz@infradead.org, mingo@redhat.com,
	will.deacon@arm.com, arnd@arndb.de, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com,
	jglauber@marvell.com, dave.dice@oracle.com,
	steven.sistare@oracle.com, daniel.m.jordan@oracle.com
Subject: Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock
Date: Thu, 23 Jan 2020 11:35:47 +0000	[thread overview]
Message-ID: <20200123113547.GD18991@willie-the-truck> (raw)
In-Reply-To: <4e15fa1d-9540-3274-502a-4195a0d46f63@redhat.com>

Hi folks,

(I think Lihao is travelling at the moment, so he may be delayed in his
replies)

On Wed, Jan 22, 2020 at 12:24:58PM -0500, Waiman Long wrote:
> On 1/22/20 6:45 AM, Lihao Liang wrote:
> > On Wed, Jan 22, 2020 at 10:28 AM Alex Kogan <alex.kogan@oracle.com> wrote:
> >> Summary
> >> -------
> >>
> >> Lock throughput can be increased by handing a lock to a waiter on the
> >> same NUMA node as the lock holder, provided care is taken to avoid
> >> starvation of waiters on other NUMA nodes. This patch introduces CNA
> >> (compact NUMA-aware lock) as the slow path for qspinlock. It is
> >> enabled through a configuration option (NUMA_AWARE_SPINLOCKS).
> >>
> > Thanks for your patches. The experimental results look promising!
> >
> > I understand that the new CNA qspinlock uses randomization to achieve
> > long-term fairness, and provides the numa_spinlock_threshold parameter
> > for users to tune. As Linux runs extremely diverse workloads, it is not
> > clear how randomization affects its fairness, and how users with
> > different requirements are supposed to tune this parameter.
> >
> > To this end, Will and I consider it beneficial to be able to answer the
> > following question:
> >
> > With different values of numa_spinlock_threshold and
> > SHUFFLE_REDUCTION_PROB_ARG, how long do threads running on different
> > sockets have to wait to acquire the lock? This is particularly relevant
> > in high contention situations when new threads keep arriving on the same
> > socket as the lock holder.
> >
> > In this email, I try to provide some formal analysis to address this
> > question. Let's assume the probability for the lock to stay on the
> > same socket is *at least* p, which corresponds to the probability for
> > the function probably(unsigned int num_bits) in the patch to return *false*,
> > where SHUFFLE_REDUCTION_PROB_ARG is passed as the value of num_bits to the
> > function.
> 
> That is not strictly true from my understanding of the code. The
> probably() function does not come into play if a secondary queue is
> present. Also calling cna_scan_main_queue() doesn't guarantee that a
> waiter in the same node can be found. So the simple mathematical
> analysis isn't that applicable in this case. One will have to do an
> actual simulation to find out what the actual behavior will be.

It's certainly true that the analysis is based on the worst-case scenario,
but I think it's still worth considering. For example, the secondary queue
does not exist initially so it seems a bit odd that we only instantiate it
with < 1% probability.

That said, my real concern with any of this is that it makes formal
modelling and analysis of the qspinlock considerably more challenging. I
would /really/ like to see an update to the TLA+ model we have of the
current implementation [1] and preferably also the userspace version I
hacked together [2] so that we can continue to test and validate changes
to the code outside of the usual kernel stress-testing.

Will

[1] https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/spinbench/