* [PATCH 0/3] Hook up qspinlock for arm64
@ 2018-06-26 11:00 Will Deacon
  2018-06-26 11:00 ` [PATCH 1/3] arm64: barrier: Implement smp_cond_load_relaxed Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Will Deacon @ 2018-06-26 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi everybody,

With my recent changes to the core qspinlock code, it now performs well
enough on arm64 to replace our ticket-based approach.

Testing welcome,

Will

--->8

Will Deacon (3):
  arm64: barrier: Implement smp_cond_load_relaxed
  arm64: locking: Replace ticket lock implementation with qspinlock
  arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT

 arch/arm64/Kconfig                      |  11 +++
 arch/arm64/include/asm/Kbuild           |   1 +
 arch/arm64/include/asm/barrier.h        |  13 ++++
 arch/arm64/include/asm/spinlock.h       | 117 +-------------------------------
 arch/arm64/include/asm/spinlock_types.h |  17 +----
 5 files changed, 27 insertions(+), 132 deletions(-)

-- 
2.1.4


* [PATCH 1/3] arm64: barrier: Implement smp_cond_load_relaxed
  2018-06-26 11:00 [PATCH 0/3] Hook up qspinlock for arm64 Will Deacon
@ 2018-06-26 11:00 ` Will Deacon
  2018-06-26 11:00 ` [PATCH 2/3] arm64: locking: Replace ticket lock implementation with qspinlock Will Deacon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2018-06-26 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

We can provide an implementation of smp_cond_load_relaxed using READ_ONCE
and __cmpwait_relaxed.
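
For illustration, a caller might use the new primitive like this (a
hypothetical sketch, not part of this patch; wait_for_seq() is an
invented helper):

static inline u32 wait_for_seq(u32 *seq)
{
	/*
	 * VAL is bound by the macro to the most recently loaded value,
	 * and the value that satisfied the condition is returned.
	 */
	return smp_cond_load_relaxed(seq, VAL != 0);
}

The advantage over a plain cpu_relax() loop is that __cmpwait_relaxed
lets the CPU park in WFE until the monitored cacheline is written,
rather than busy-polling.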

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/barrier.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index f11518af96a9..822a9192c551 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -128,6 +128,19 @@ do {									\
 	__u.__val;							\
 })
 
+#define smp_cond_load_relaxed(ptr, cond_expr)				\
+({									\
+	typeof(ptr) __PTR = (ptr);					\
+	typeof(*ptr) VAL;						\
+	for (;;) {							\
+		VAL = READ_ONCE(*__PTR);				\
+		if (cond_expr)						\
+			break;						\
+		__cmpwait_relaxed(__PTR, VAL);				\
+	}								\
+	VAL;								\
+})
+
 #define smp_cond_load_acquire(ptr, cond_expr)				\
 ({									\
 	typeof(ptr) __PTR = (ptr);					\
-- 
2.1.4


* [PATCH 2/3] arm64: locking: Replace ticket lock implementation with qspinlock
  2018-06-26 11:00 [PATCH 0/3] Hook up qspinlock for arm64 Will Deacon
  2018-06-26 11:00 ` [PATCH 1/3] arm64: barrier: Implement smp_cond_load_relaxed Will Deacon
@ 2018-06-26 11:00 ` Will Deacon
  2018-06-26 11:00 ` [PATCH 3/3] arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT Will Deacon
  2018-07-20  9:07 ` [PATCH 0/3] Hook up qspinlock for arm64 John Garry
  3 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2018-06-26 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

It's fair to say that our ticket lock has served us well over time, but
it's time to bite the bullet and start using the generic qspinlock code
so we can make use of explicit MCS queuing and potentially better PV
performance in future.
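
For reference, the generic fastpath that arm64 now inherits via
asm-generic/qspinlock.h looks roughly like the sketch below (paraphrased
from the generic code, not something added by this patch):

static __always_inline void queued_spin_lock(struct qspinlock *lock)
{
	u32 val;

	/* Uncontended: move the lock word from 0 to locked. */
	val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
	if (likely(val == 0))
		return;

	/* Contended: queue up in the MCS-style slowpath. */
	queued_spin_lock_slowpath(lock, val);
}

Only contended acquisitions take the slowpath, where waiters spin on
their own per-CPU MCS node instead of all hammering the lock word.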

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig                      |   1 +
 arch/arm64/include/asm/Kbuild           |   1 +
 arch/arm64/include/asm/spinlock.h       | 117 +-------------------------------
 arch/arm64/include/asm/spinlock_types.h |  17 +----
 4 files changed, 4 insertions(+), 132 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090cf0292..facd19625563 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -44,6 +44,7 @@ config ARM64
 	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
+	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 3a9b84d39d71..6cd5d77b6b44 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += preempt.h
 generic-y += qrwlock.h
+generic-y += qspinlock.h
 generic-y += rwsem.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index 26c5bd7d88d8..38116008d18b 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -16,123 +16,8 @@
 #ifndef __ASM_SPINLOCK_H
 #define __ASM_SPINLOCK_H
 
-#include <asm/lse.h>
-#include <asm/spinlock_types.h>
-#include <asm/processor.h>
-
-/*
- * Spinlock implementation.
- *
- * The memory barriers are implicit with the load-acquire and store-release
- * instructions.
- */
-
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval, newval;
-
-	asm volatile(
-	/* Atomically increment the next ticket. */
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-"	prfm	pstl1strm, %3\n"
-"1:	ldaxr	%w0, %3\n"
-"	add	%w1, %w0, %w5\n"
-"	stxr	%w2, %w1, %3\n"
-"	cbnz	%w2, 1b\n",
-	/* LSE atomics */
-"	mov	%w2, %w5\n"
-"	ldadda	%w2, %w0, %3\n"
-	__nops(3)
-	)
-
-	/* Did we get the lock? */
-"	eor	%w1, %w0, %w0, ror #16\n"
-"	cbz	%w1, 3f\n"
-	/*
-	 * No: spin on the owner. Send a local event to avoid missing an
-	 * unlock before the exclusive load.
-	 */
-"	sevl\n"
-"2:	wfe\n"
-"	ldaxrh	%w2, %4\n"
-"	eor	%w1, %w2, %w0, lsr #16\n"
-"	cbnz	%w1, 2b\n"
-	/* We got the lock. Critical section starts here. */
-"3:"
-	: "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
-	: "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
-	: "memory");
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned int tmp;
-	arch_spinlock_t lockval;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"	prfm	pstl1strm, %2\n"
-	"1:	ldaxr	%w0, %2\n"
-	"	eor	%w1, %w0, %w0, ror #16\n"
-	"	cbnz	%w1, 2f\n"
-	"	add	%w0, %w0, %3\n"
-	"	stxr	%w1, %w0, %2\n"
-	"	cbnz	%w1, 1b\n"
-	"2:",
-	/* LSE atomics */
-	"	ldr	%w0, %2\n"
-	"	eor	%w1, %w0, %w0, ror #16\n"
-	"	cbnz	%w1, 1f\n"
-	"	add	%w1, %w0, %3\n"
-	"	casa	%w0, %w1, %2\n"
-	"	sub	%w1, %w1, %3\n"
-	"	eor	%w1, %w1, %w0\n"
-	"1:")
-	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	: "I" (1 << TICKET_SHIFT)
-	: "memory");
-
-	return !tmp;
-}
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	unsigned long tmp;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"	ldrh	%w1, %0\n"
-	"	add	%w1, %w1, #1\n"
-	"	stlrh	%w1, %0",
-	/* LSE atomics */
-	"	mov	%w1, #1\n"
-	"	staddlh	%w1, %0\n"
-	__nops(1))
-	: "=Q" (lock->owner), "=&r" (tmp)
-	:
-	: "memory");
-}
-
-static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	return lock.owner == lock.next;
-}
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return !arch_spin_value_unlocked(READ_ONCE(*lock));
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lockval = READ_ONCE(*lock);
-	return (lockval.next - lockval.owner) > 1;
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
 #include <asm/qrwlock.h>
+#include <asm/qspinlock.h>
 
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 6b856012c51b..a157ff465e27 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -20,22 +20,7 @@
 # error "please don't include this file directly"
 #endif
 
-#include <linux/types.h>
-
-#define TICKET_SHIFT	16
-
-typedef struct {
-#ifdef __AARCH64EB__
-	u16 next;
-	u16 owner;
-#else
-	u16 owner;
-	u16 next;
-#endif
-} __aligned(4) arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
-
+#include <asm-generic/qspinlock_types.h>
 #include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.1.4


* [PATCH 3/3] arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT
  2018-06-26 11:00 [PATCH 0/3] Hook up qspinlock for arm64 Will Deacon
  2018-06-26 11:00 ` [PATCH 1/3] arm64: barrier: Implement smp_cond_load_relaxed Will Deacon
  2018-06-26 11:00 ` [PATCH 2/3] arm64: locking: Replace ticket lock implementation with qspinlock Will Deacon
@ 2018-06-26 11:00 ` Will Deacon
  2018-07-20  9:07 ` [PATCH 0/3] Hook up qspinlock for arm64 John Garry
  3 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2018-06-26 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

When running with CONFIG_PREEMPT=n, the spinlock fastpaths fit inside
64 bytes, which typically coincides with the L1 I-cache line size.

Inline the spinlock fastpaths, like we do already for rwlocks.
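
As a rough illustration of the existing generic mechanism this enables
(paraphrased, not code added by this patch): once CONFIG_INLINE_SPIN_LOCK
ends up set, include/linux/spinlock_api_smp.h maps the out-of-line entry
point onto the inline fastpath, so with lockdep and spinlock debugging
disabled spin_lock() boils down to roughly:

#define _raw_spin_lock(lock) __raw_spin_lock(lock)

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();
	do_raw_spin_lock(lock);	/* -> arch_spin_lock() fastpath */
}

i.e. the qspinlock fastpath is emitted at the call site instead of
hiding behind a call into kernel/locking/spinlock.c.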

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index facd19625563..476de9b1d239 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -42,6 +42,16 @@ config ARM64
 	select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT
 	select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT
 	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
+	select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPT
+	select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPT
+	select ARCH_INLINE_SPIN_LOCK if !PREEMPT
+	select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPT
+	select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPT
+	select ARCH_INLINE_SPIN_UNLOCK if !PREEMPT
+	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPT
+	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPT
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
-- 
2.1.4


* [PATCH 0/3] Hook up qspinlock for arm64
  2018-06-26 11:00 [PATCH 0/3] Hook up qspinlock for arm64 Will Deacon
                   ` (2 preceding siblings ...)
  2018-06-26 11:00 ` [PATCH 3/3] arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT Will Deacon
@ 2018-07-20  9:07 ` John Garry
  3 siblings, 0 replies; 5+ messages in thread
From: John Garry @ 2018-07-20  9:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/06/2018 12:00, Will Deacon wrote:
> Hi everybody,
>
> With my recent changes to the core qspinlock code, it now performs well
> enough on arm64 to replace our ticket-based approach.
>
> Testing welcome,
>

Hi Will,

JFYI, in the scenario we tested - which had a spinlock under high
contention from many CPUs - we saw a big performance improvement.

I see this patchset is in linux-next, so I assume it will be in 4.19.

Cheers,
John

> Will
>
> --->8
>
> Will Deacon (3):
>   arm64: barrier: Implement smp_cond_load_relaxed
>   arm64: locking: Replace ticket lock implementation with qspinlock
>   arm64: kconfig: Ensure spinlock fastpaths are inlined if !PREEMPT
>
>  arch/arm64/Kconfig                      |  11 +++
>  arch/arm64/include/asm/Kbuild           |   1 +
>  arch/arm64/include/asm/barrier.h        |  13 ++++
>  arch/arm64/include/asm/spinlock.h       | 117 +-------------------------------
>  arch/arm64/include/asm/spinlock_types.h |  17 +----
>  5 files changed, 27 insertions(+), 132 deletions(-)
>


