linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will@kernel.org>
Cc: Alex Kogan <alex.kogan@oracle.com>,
	linux-arch@vger.kernel.org, guohanjun@huawei.com, arnd@arndb.de,
	dave.dice@oracle.com, jglauber@marvell.com, x86@kernel.org,
	will.deacon@arm.com, linux@armlinux.org.uk,
	steven.sistare@oracle.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	longman@redhat.com, tglx@linutronix.de,
	daniel.m.jordan@oracle.com, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v9 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock
Date: Thu, 23 Jan 2020 14:17:37 +0100	[thread overview]
Message-ID: <20200123131737.GV14914@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20200123112251.GC18991@willie-the-truck>

On Thu, Jan 23, 2020 at 11:22:51AM +0000, Will Deacon wrote:

> > Argh, make that:
> > 
> > 	tail_2nd->next = NULL;
> > 
> > 	smp_wmb();
> > 
> > > 	if (!atomic_try_cmpxchg_relaxed(&lock, &val, new)) {
> 
> ... or could you drop the smp_wmb() and make this
> atomic_try_cmpxchg_release()?

My current code has the smp_wmb(), because most _releases end up being
an smp_mb() (except for powerpc where it is of equal cost to wmb and
arm64, where I have no idea of the costs).

> To be honest, I've failed to understand the code prior to your changes
> in this area: it appears to reply on a control-dependency from the two
> cmpxchg_relaxed() calls (which isn't sufficient to order the store parts
> afaict) and I also don't get how we deal with a transiently circular primary
> queue.

Ha!, yes, so this little piece took me a while too. Let me attempt an
explanation.

+ *    cna_node
+ *   +----------+     +--------+         +--------+
+ *   |mcs:next  | --> |mcs:next| --> ... |mcs:next| --> NULL  [Primary queue]
+ *   |mcs:locked| -.  +--------+         +--------+
+ *   +----------+  |
+ *                 `----------------------.
+ *                                        v
+ *                 +--------+         +--------+
+ *                 |mcs:next| --> ... |mcs:next|            [Secondary queue]
+ *                 +--------+         +--------+
+ *                     ^                    |
+ *                     `--------------------'

So @node is the current lock holder, node->next == NULL (primary queue
is empty) and we're going to try and splice the secondary queue to the
head of the primary.

+       tail_2nd = decode_tail(node->locked);
+       head_2nd = tail_2nd->next;

this gets the secondary head and tail, so far so simple

+       new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;

this encodes the new primary tail (as kept in lock->val), still simple

+       if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {

if this here succeeds, we've got the primary tail pointing at the
secondary tail. This is safe because only the lock holder (us) ever
modifies the secondary queue.

+               /*
+                * Try to reset @next in tail_2nd to NULL, but no need to check
+                * the result - if failed, a new successor has updated it.
+                */
+               cmpxchg_relaxed(&tail_2nd->next, head_2nd, NULL);

This is (broken, as per the prior argument) breaking the circular link
the secondary queue has. The trick here is that since we're the lock
holder, nothing will actually iterate the primary ->next chain, so a
bogus value in there is of no concern.

_However_ a new waiter might at this point do:

	old = xchg_tail(lock, node);
	if (old) {
		prev = decode_tail(old);
		WRITE_ONCE(prev->next, node);
		...
	}

which then results in conflicting stores to the one ->next variable.

The cmpxchg() is attempting to terminate the list, while the new waiter
is extending the list, it is therefore paramount the new waiter always
wins this. To that end they're employing the cmpxchg, but it very much
relies on the @head_2nd load to have happened before we exposed the
secondary tail as primary tail, otherwise it can have loaded the new
->next pointer and overwriten it.

+               arch_mcs_pass_lock(&head_2nd->locked, 1);
+               return true;
+       }
+
+       return false;


Did that help, or just make it worse?

  reply	other threads:[~2020-01-23 13:18 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-15  3:59 [PATCH v9 0/5] Add NUMA-awareness to qspinlock Alex Kogan
2020-01-15  3:59 ` [PATCH v9 1/5] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic Alex Kogan
2020-01-15  3:59 ` [PATCH v9 2/5] locking/qspinlock: Refactor the qspinlock slow path Alex Kogan
2020-01-15  3:59 ` [PATCH v9 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock Alex Kogan
2020-01-23  9:26   ` Peter Zijlstra
2020-01-23 10:06     ` Peter Zijlstra
2020-01-23 10:16       ` Peter Zijlstra
2020-01-23 11:22         ` Will Deacon
2020-01-23 13:17           ` Peter Zijlstra [this message]
2020-01-23 14:15   ` Waiman Long
2020-01-23 15:29     ` Peter Zijlstra
2020-01-15  3:59 ` [PATCH v9 4/5] locking/qspinlock: Introduce starvation avoidance into CNA Alex Kogan
2020-01-23 19:55   ` Waiman Long
2020-01-23 20:39     ` Waiman Long
2020-01-23 23:39       ` Alex Kogan
2020-01-15  3:59 ` [PATCH v9 5/5] locking/qspinlock: Introduce the shuffle reduction optimization " Alex Kogan
2020-03-02  1:14   ` [locking/qspinlock] 7b6da71157: unixbench.score 8.4% improvement kernel test robot
2020-01-22 11:45 ` [PATCH v9 0/5] Add NUMA-awareness to qspinlock Lihao Liang
2020-01-22 17:24   ` Waiman Long
2020-01-23 11:35     ` Will Deacon
2020-01-23 15:25       ` Waiman Long
2020-01-23 19:08         ` Waiman Long
2020-01-22 19:29   ` Alex Kogan
2020-01-26  0:32     ` Lihao Liang
2020-01-26  1:58       ` Lihao Liang
2020-01-27 16:01         ` Alex Kogan
2020-01-29  1:39           ` Lihao Liang
2020-01-27  6:16       ` Alex Kogan
2020-01-24 22:24 ` Paul E. McKenney
     [not found]   ` <6AAE7FC6-F5DE-4067-8BC4-77F27948CD09@oracle.com>
2020-01-25  0:57     ` Paul E. McKenney
2020-01-25  1:59       ` Waiman Long
     [not found]         ` <adb4fb09-f374-4d64-096b-ba9ad8b35fd5@redhat.com>
2020-01-25  4:58           ` Paul E. McKenney
2020-01-25 19:41             ` Waiman Long
2020-01-26 15:35               ` Paul E. McKenney
2020-01-26 22:42                 ` Paul E. McKenney
2020-01-26 23:32                   ` Paul E. McKenney
2020-01-27  6:04                   ` Alex Kogan
2020-01-27 14:11                   ` Waiman Long
2020-01-27 15:09                     ` Paul E. McKenney
     [not found]                       ` <9b3a3f16-5405-b6d1-d023-b85f4aab46dd@redhat.com>
2020-01-27 17:17                         ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200123131737.GV14914@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=alex.kogan@oracle.com \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.dice@oracle.com \
    --cc=guohanjun@huawei.com \
    --cc=hpa@zytor.com \
    --cc=jglauber@marvell.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=steven.sistare@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).