From: Alex Kogan <alex.kogan@oracle.com>
To: Waiman Long <longman@redhat.com>
Cc: linux@armlinux.org.uk, peterz@infradead.org, mingo@redhat.com,
	will.deacon@arm.com, arnd@arndb.de, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com,
	jglauber@marvell.com, steven.sistare@oracle.com,
	daniel.m.jordan@oracle.com, dave.dice@oracle.com,
	rahul.x.yadav@oracle.com
Subject: Re: [PATCH v4 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
Date: Thu, 19 Sep 2019 11:55:21 -0400
Message-ID: <87B87982-670F-4F12-9EE0-DC89A059FAEC@oracle.com>
In-Reply-To: <3ae2b6a2-ffe6-2ca1-e5bf-2292db50e26f@redhat.com>

>> +/*
>> + * cna_try_find_next - scan the main waiting queue looking for the first
>> + * thread running on the same NUMA node as the lock holder. If found (call it
>> + * thread T), move all threads in the main queue between the lock holder and
>> + * T to the end of the secondary queue and return T; otherwise, return NULL.
>> + *
>> + * Schematically, this may look like the following (nn stands for numa_node and
>> + * et stands for encoded_tail).
>> + *
>> + *     when cna_try_find_next() is called (the secondary queue is empty):
>> + *
>> + *  A+------------+   B+--------+   C+--------+   T+--------+
>> + *   |mcs:next    | -> |mcs:next| -> |mcs:next| -> |mcs:next| -> NULL
>> + *   |mcs:locked=1|    |cna:nn=0|    |cna:nn=2|    |cna:nn=1|
>> + *   |cna:nn=1    |    +--------+    +--------+    +--------+
>> + *   +------------+
>> + *
>> + *     when cna_try_find_next() returns (the secondary queue contains B and C):
>> + *
>> + *  A+----------------+    T+--------+
>> + *   |mcs:next        | ->  |mcs:next| -> NULL
>> + *   |mcs:locked=B.et | -+  |cna:nn=1|
>> + *   |cna:nn=1        |  |  +--------+
>> + *   +----------------+  |
>> + *                       |
>> + *                       +->  B+--------+   C+--------+
>> + *                             |mcs:next| -> |mcs:next|
>> + *                             |cna:nn=0|    |cna:nn=2|
>> + *                             |cna:tail| -> +--------+
>> + *                             +--------+
>> + *
>> + * The worst case complexity of the scan is O(n), where n is the number
>> + * of current waiters. However, the fast path, which is expected to be the
>> + * common case, is O(1).
>> + */
>> +static struct mcs_spinlock *cna_try_find_next(struct mcs_spinlock *node,
>> +					      struct mcs_spinlock *next)
>> +{
>> +	struct cna_node *cn = (struct cna_node *)node;
>> +	struct cna_node *cni = (struct cna_node *)next;
>> +	struct cna_node *first, *last = NULL;
>> +	int my_numa_node = cn->numa_node;
>> +
>> +	/* fast path: immediate successor is on the same NUMA node */
>> +	if (cni->numa_node == my_numa_node)
>> +		return next;
>> +
>> +	/* find any next waiter on 'our' NUMA node */
>> +	for (first = cni;
>> +	     cni && cni->numa_node != my_numa_node;
>> +	     last = cni, cni = (struct cna_node *)READ_ONCE(cni->mcs.next))
>> +		;
>> +
>> +	/* if found, splice any skipped waiters onto the secondary queue */
>> +	if (cni && last)
>> +		cna_splice_tail(cn, first, last);
>> +
>> +	return (struct mcs_spinlock *)cni;
>> +}
> 
> At the Linux Plumbers Conference last week, Will raised the concern
> about the latency of the O(n) scan in cna_try_find_next(), which will
> add to the lock hold time.
While the worst-case complexity of the scan is O(n), I think it can be
proven that the amortized complexity is O(1). For intuition, consider a
two-node system with N threads in total. In the worst case, the scan goes
over N/2 threads running on the other node. If the scan ultimately “fails”
(no thread from the lock holder’s node is found), the lock is passed to
the first thread from the other node and then circulates among those N/2
threads, with each of the next N/2 - 1 handovers scanning just one queue
node. Otherwise, those N/2 threads are moved to the secondary queue. On
the next lock handover, we pass the lock either to the next thread in the
main queue (which has to be from our node) or to the first waiter in the
secondary queue. In either case we scan just one queue node, and in the
latter case we again get N/2 - 1 handovers that scan one queue node each.
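
For what it's worth, here is a toy user-space sketch of that counting
argument (illustrative only, not kernel code; the waiter count N and the
worst-case handover pattern are assumptions made up for the example):

#include <stdio.h>

#define N 16	/* total waiters, N/2 on each of two NUMA nodes */

int main(void)
{
	long scanned, handovers;

	/*
	 * Worst-case first handover: the lock holder skips past all
	 * N/2 waiters from the other node before finding one of its
	 * own (or failing the scan).
	 */
	scanned = N / 2;
	handovers = 1;

	/*
	 * The skipped waiters now sit in the secondary queue, so each
	 * of the remaining handovers (draining both queues) examines
	 * only the immediate successor: one queue node apiece.
	 */
	scanned += N - 1;
	handovers += N - 1;

	/* (N/2 + N - 1) / N, i.e. about 1.5 nodes per handover */
	printf("avg nodes scanned per handover: %.2f\n",
	       (double)scanned / handovers);
	return 0;
}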

> One way to hide some of the latency is to do
> a pre-scan before acquiring the lock. The CNA code could override the
> pv_wait_head_or_lock() function to call cna_try_find_next() as a
> pre-scan and return 0. What do you think?
This is certainly possible, but I do not think it would completely
eliminate the worst-case scenario. It would probably make it even less
likely, but at the same time it would reduce the chance of actually
finding a thread from the same node, since such a thread may enter the
main queue while we wait for the owner and pending bits to go away.
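
To make the suggestion concrete, a rough sketch of such an override might
look like the following (hypothetical code, not part of this patch; the
cna_pre_scan name and the pre_scan_result field are assumptions for
illustration):

/*
 * Run the O(n) scan while we are still spinning for the owner and
 * pending bits to clear, so its cost is paid outside the lock hold
 * time. Returning 0 tells the slow path to proceed and acquire the
 * lock itself, as usual.
 */
static u32 cna_pre_scan(struct qspinlock *lock, struct mcs_spinlock *node)
{
	struct cna_node *cn = (struct cna_node *)node;
	struct mcs_spinlock *next = READ_ONCE(node->next);

	cn->pre_scan_result = next ? cna_try_find_next(node, next) : NULL;

	return 0;
}

with the hook wired in via something like
#define pv_wait_head_or_lock(l, n)	cna_pre_scan(l, n)
in the CNA variant of the slow path. As noted above, though, any waiter
that arrives after the pre-scan would be missed, so the handover would
still need to re-check (and possibly re-scan) the queue.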

Regards,
— Alex
