From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Jordan Niethe" <jniethe5@gmail.com>, <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH 04/17] powerpc/qspinlock: convert atomic operations to assembly
Date: Thu, 10 Nov 2022 19:40:11 +1000	[thread overview]
Message-ID: <CO8ILQMO00P0.3ATB8LNW57SOQ@bobo> (raw)
In-Reply-To: <9ccfa76e921ea0b79a7ff166604004370e7aa30b.camel@gmail.com>

On Thu Nov 10, 2022 at 10:39 AM AEST, Jordan Niethe wrote:
> On Thu, 2022-07-28 at 16:31 +1000, Nicholas Piggin wrote:
> [resend as utf-8, not utf-7]
> > This uses more optimal ll/sc style access patterns (rather than
> > cmpxchg), and also sets the EH=1 lock hint on those operations
> > which acquire ownership of the lock.
> > ---
> >  arch/powerpc/include/asm/qspinlock.h       | 25 +++++--
> >  arch/powerpc/include/asm/qspinlock_types.h |  6 +-
> >  arch/powerpc/lib/qspinlock.c               | 81 +++++++++++++++-------
> >  3 files changed, 79 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
> > index 79a1936fb68d..3ab354159e5e 100644
> > --- a/arch/powerpc/include/asm/qspinlock.h
> > +++ b/arch/powerpc/include/asm/qspinlock.h
> > @@ -2,28 +2,43 @@
> >  #ifndef _ASM_POWERPC_QSPINLOCK_H
> >  #define _ASM_POWERPC_QSPINLOCK_H
> >  
> > -#include <linux/atomic.h>
> >  #include <linux/compiler.h>
> >  #include <asm/qspinlock_types.h>
> >  
> >  static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
> >  {
> > -	return atomic_read(&lock->val);
> > +	return READ_ONCE(lock->val);
> >  }
> >  
> >  static __always_inline int queued_spin_value_unlocked(struct qspinlock lock)
> >  {
> > -	return !atomic_read(&lock.val);
> > +	return !lock.val;
> >  }
> >  
> >  static __always_inline int queued_spin_is_contended(struct qspinlock *lock)
> >  {
> > -	return !!(atomic_read(&lock->val) & _Q_TAIL_CPU_MASK);
> > +	return !!(READ_ONCE(lock->val) & _Q_TAIL_CPU_MASK);
> >  }
> >  
> >  static __always_inline int queued_spin_trylock(struct qspinlock *lock)
> >  {
> > -	if (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0)
> > +	u32 new = _Q_LOCKED_VAL;
> > +	u32 prev;
> > +
> > +	asm volatile(
> > +"1:	lwarx	%0,0,%1,%3	# queued_spin_trylock			\n"
> > +"	cmpwi	0,%0,0							\n"
> > +"	bne-	2f							\n"
> > +"	stwcx.	%2,0,%1							\n"
> > +"	bne-	1b							\n"
> > +"\t"	PPC_ACQUIRE_BARRIER "						\n"
> > +"2:									\n"
> > +	: "=&r" (prev)
> > +	: "r" (&lock->val), "r" (new),
> > +	  "i" (IS_ENABLED(CONFIG_PPC64) ? 1 : 0)
>
> btw IS_ENABLED() already returns 1 or 0

I guess we already do that in atomic.h too. Okay.
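
i.e. the operand list can just become (sketch of the simplified
constraints, otherwise unchanged):

	: "=&r" (prev)
	: "r" (&lock->val), "r" (new),
	  "i" (IS_ENABLED(CONFIG_PPC64))
	: "cr0", "memory");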

> > +	: "cr0", "memory");
>
> This is the ISA's "test and set" atomic primitive. Do you think it would be worth separating it as a helper?

It ends up getting more complex as we go. I might leave some of these
primitives open-coded for now; we could look at providing them or
reusing more generic primitives after the series, though.
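
If it did get pulled out later, it might look something like this
(rough sketch only, helper name made up for illustration):

/*
 * Word-sized "test and set": atomically set *p to new if *p was zero,
 * returning the previous value. The lwarx EH operand (%3, 64-bit only)
 * hints that we are acquiring a lock.
 */
static __always_inline u32 xchg_if_zero_acquire(u32 *p, u32 new)
{
	u32 prev;

	asm volatile(
"1:	lwarx	%0,0,%1,%3	# xchg_if_zero_acquire			\n"
"	cmpwi	0,%0,0							\n"
"	bne-	2f							\n"
"	stwcx.	%2,0,%1							\n"
"	bne-	1b							\n"
"\t"	PPC_ACQUIRE_BARRIER "						\n"
"2:									\n"
	: "=&r" (prev)
	: "r" (p), "r" (new),
	  "i" (IS_ENABLED(CONFIG_PPC64))
	: "cr0", "memory");

	return prev;
}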

> > +
> > +	if (likely(prev == 0))
> >  		return 1;
> >  	return 0;
>
> same optional style nit: return likely(prev == 0);

Will do.

>
> >  }
> > diff --git a/arch/powerpc/include/asm/qspinlock_types.h b/arch/powerpc/include/asm/qspinlock_types.h
> > index 3425dab42576..210adf05b235 100644
> > --- a/arch/powerpc/include/asm/qspinlock_types.h
> > +++ b/arch/powerpc/include/asm/qspinlock_types.h
> > @@ -7,7 +7,7 @@
> >  
> >  typedef struct qspinlock {
> >  	union {
> > -		atomic_t val;
> > +		u32 val;
> >  
> >  #ifdef __LITTLE_ENDIAN
> >  		struct {
> > @@ -23,10 +23,10 @@ typedef struct qspinlock {
> >  	};
> >  } arch_spinlock_t;
> >  
> > -#define	__ARCH_SPIN_LOCK_UNLOCKED	{ { .val = ATOMIC_INIT(0) } }
> > +#define	__ARCH_SPIN_LOCK_UNLOCKED	{ { .val = 0 } }
> >  
> >  /*
> > - * Bitfields in the atomic value:
> > + * Bitfields in the lock word:
> >   *
> >   *     0: locked bit
> >   * 16-31: tail cpu (+1)
> > diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> > index 5ebb88d95636..7c71e5e287df 100644
> > --- a/arch/powerpc/lib/qspinlock.c
> > +++ b/arch/powerpc/lib/qspinlock.c
> > @@ -1,5 +1,4 @@
> >  // SPDX-License-Identifier: GPL-2.0-or-later
> > -#include <linux/atomic.h>
> >  #include <linux/bug.h>
> >  #include <linux/compiler.h>
> >  #include <linux/export.h>
> > @@ -22,32 +21,59 @@ struct qnodes {
> >  
> >  static DEFINE_PER_CPU_ALIGNED(struct qnodes, qnodes);
> >  
> > -static inline int encode_tail_cpu(void)
> > +static inline u32 encode_tail_cpu(void)
> >  {
> >  	return (smp_processor_id() + 1) << _Q_TAIL_CPU_OFFSET;
> >  }
> >  
> > -static inline int get_tail_cpu(int val)
> > +static inline int get_tail_cpu(u32 val)
> >  {
> >  	return (val >> _Q_TAIL_CPU_OFFSET) - 1;
> >  }
> >  
> >  /* Take the lock by setting the bit, no other CPUs may concurrently lock it. */
>
> I think you missed deleting the above line.
>
> > +/* Take the lock by setting the lock bit, no other CPUs will touch it. */
> >  static __always_inline void lock_set_locked(struct qspinlock *lock)
> >  {
> > -	atomic_or(_Q_LOCKED_VAL, &lock->val);
> > -	__atomic_acquire_fence();
> > +	u32 new = _Q_LOCKED_VAL;
> > +	u32 prev;
> > +
> > +	asm volatile(
> > +"1:	lwarx	%0,0,%1,%3	# lock_set_locked			\n"
> > +"	or	%0,%0,%2						\n"
> > +"	stwcx.	%0,0,%1							\n"
> > +"	bne-	1b							\n"
> > +"\t"	PPC_ACQUIRE_BARRIER "						\n"
> > +	: "=&r" (prev)
> > +	: "r" (&lock->val), "r" (new),
> > +	  "i" (IS_ENABLED(CONFIG_PPC64) ? 1 : 0)
> > +	: "cr0", "memory");
> >  }
>
> This is pretty similar with the DEFINE_TESTOP() pattern from
> arch/powerpc/include/asm/bitops.h (such as test_and_set_bits_lock()) except for
> word instead of double word. Do you think it's possible / beneficial to make
> use of those macros?

Pulling almost all our atomic primitives into one place and making them
usable by atomics, bitops, locks, etc. might be a good idea.

That function specifically works on a dword, so we can't use it here,
and I don't want to modify any files except the new ones in this series
if possible. Consolidating our primitives a bit more would be nice,
though.
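
For comparison, a word-sized analogue of what DEFINE_TESTOP() generates
might look like the below (sketch only, name made up, not the actual
macro text; the real one operates on unsigned long, i.e. ldarx/stdcx.
on 64-bit, which is why it can't be reused directly):

static inline u32 test_and_set_bits_word_lock(u32 mask, u32 *p)
{
	u32 old, t;

	asm volatile(
"1:	lwarx	%0,0,%3,%4						\n"
"	or	%1,%0,%2						\n"
"	stwcx.	%1,0,%3							\n"
"	bne-	1b							\n"
"\t"	PPC_ACQUIRE_BARRIER "						\n"
	: "=&r" (old), "=&r" (t)
	: "r" (mask), "r" (p),
	  "i" (IS_ENABLED(CONFIG_PPC64))
	: "cr0", "memory");

	return old & mask;	/* nonzero if any mask bit was already set */
}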

> > -/* Take lock, clearing tail, cmpxchg with val (which must not be locked) */
> > -static __always_inline int trylock_clear_tail_cpu(struct qspinlock *lock, int val)
> > +/* Take lock, clearing tail, cmpxchg with old (which must not be locked) */
> > +static __always_inline int trylock_clear_tail_cpu(struct qspinlock *lock, u32 old)
> >  {
> > -	int newval = _Q_LOCKED_VAL;
> > -
> > -	if (atomic_cmpxchg_acquire(&lock->val, val, newval) == val)
> > +	u32 new = _Q_LOCKED_VAL;
> > +	u32 prev;
> > +
> > +	BUG_ON(old & _Q_LOCKED_VAL);
>
> The BUG_ON() could have been introduced in an earlier patch I think.

Yes.

> > +
> > +	asm volatile(
> > +"1:	lwarx	%0,0,%1,%4	# trylock_clear_tail_cpu		\n"
> > +"	cmpw	0,%0,%2							\n"
> > +"	bne-	2f							\n"
> > +"	stwcx.	%3,0,%1							\n"
> > +"	bne-	1b							\n"
> > +"\t"	PPC_ACQUIRE_BARRIER "						\n"
> > +"2:									\n"
> > +	: "=&r" (prev)
> > +	: "r" (&lock->val), "r"(old), "r" (new),
>
> Could this be like the "r" (_Q_TAIL_CPU_MASK) below?
> i.e. "r" (_Q_LOCKED_VAL)? That makes it clear that new doesn't change.

Sure.

>
> > +	  "i" (IS_ENABLED(CONFIG_PPC64) ? 1 : 0)
> > +	: "cr0", "memory");
> > +
> > +	if (likely(prev == old))
> >  		return 1;
> > -	else
> > -		return 0;
> > +	return 0;
> >  }
> >  
> >  /*
> > @@ -56,20 +82,25 @@ static __always_inline int trylock_clear_tail_cpu(struct qspinlock *lock, int va
> >   * This provides a release barrier for publishing node, and an acquire barrier
>
> Does the comment mean there needs to be an acquire barrier in this assembly?

Yes, another good catch. What I'm going to do instead is add the acquire
to get_tail_qnode() because that path is only hit when you have multiple
waiters, and I think pairing it that way makes the barriers more
obvious.
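
Roughly like this (untested sketch, assuming the get_tail_qnode() shape
from earlier in the series; smp_acquire__after_ctrl_dep() provides the
acquire, since the return is control-dependent on the load):

static inline struct qnode *get_tail_qnode(struct qspinlock *lock, u32 val)
{
	int cpu = get_tail_cpu(val);
	struct qnodes *qnodesp = per_cpu_ptr(&qnodes, cpu);
	int idx;

	for (idx = 0; idx < MAX_NODES; idx++) {
		struct qnode *qnode = &qnodesp->nodes[idx];
		if (READ_ONCE(qnode->lock) == lock) {
			/*
			 * Pairs with the release barrier publishing the
			 * tail in the lock word: the previous waiter's
			 * qnode stores must be visible before we link
			 * our node behind it.
			 */
			smp_acquire__after_ctrl_dep();
			return qnode;
		}
	}
	BUG();
}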

Thanks,
Nick
