* [PATCH] locking: Generic ticket lock
@ 2021-10-21 13:05 ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-10-21 13:05 UTC (permalink / raw)
  To: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne


There's currently a number of architectures that want/have graduated
from test-and-set locks and are looking at qspinlock.

*HOWEVER* qspinlock is very complicated and requires a lot of an
architecture to actually work correctly. Specifically it requires
forward progress between a fair number of atomic primitives, including
an xchg16 operation, which I've seen a fair number of fundamentally
broken implementations of in the tree (specifically for qspinlock no
less).

The benefit of qspinlock over ticket lock is also non-obvious, esp.
at low contention (the vast majority of cases in the kernel), and it
takes a fairly large number of CPUs (typically also NUMA) to make
qspinlock beat ticket locks.

Esp. things like ARM64's WFE can move the balance a lot in favour of
simpler locks by reducing the cacheline pressure due to waiters (see
their smp_cond_load_acquire() implementation for details).

Unless you've audited qspinlock for your architecture and found it
sound *and* can show actual benefit, simpler is better.

Therefore provide ticket locks, which depend on a single atomic
operation (fetch_add) while still providing fairness.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/qspinlock.h         |   30 +++++++++
 include/asm-generic/ticket_lock_types.h |   11 +++
 include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
 3 files changed, 138 insertions(+)

--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -2,6 +2,36 @@
 /*
  * Queued spinlock
  *
+ * A 'generic' spinlock implementation that is based on MCS locks. An
+ * architecture that's looking for a 'generic' spinlock, please first consider
+ * ticket_lock.h and only come looking here when you've considered all the
+ * constraints below and can show your hardware does actually perform better
+ * with qspinlock.
+ *
+ *
+ * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
+ * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
+ *
+ * It relies on a far greater (compared to ticket_lock.h) set of atomic
+ * operations to behave well together, please audit them carefully to ensure
+ * they all have forward progress. Many atomic operations may default to
+ * cmpxchg() loops which will not have good forward progress properties on
+ * LL/SC architectures.
+ *
+ * One notable example is atomic_fetch_or_acquire(), which x86 cannot (cheaply)
+ * do. Carefully read the patches that introduced
+ * queued_fetch_set_pending_acquire().
+ *
+ * It also heavily relies on mixed size atomic operations, in specific it
+ * requires architectures to have xchg16; something which many LL/SC
+ * architectures need to implement as a 32bit and+or in order to satisfy the
+ * forward progress guarantees mentioned above.
+ *
+ * Further reading on mixed size atomics that might be relevant:
+ *
+ *   http://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf
+ *
+ *
  * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
  * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
  *
--- /dev/null
+++ b/include/asm-generic/ticket_lock_types.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
+#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
+
+#include <linux/types.h>
+typedef atomic_t arch_spinlock_t;
+
+#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
+
+#endif /* __ASM_GENERIC_TICKET_LOCK_TYPES_H */
--- /dev/null
+++ b/include/asm-generic/ticket_lock.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * 'Generic' ticket lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
+ * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
+ */
+
+#ifndef __ASM_GENERIC_TICKET_LOCK_H
+#define __ASM_GENERIC_TICKET_LOCK_H
+
+#include <linux/atomic.h>
+#include <asm/ticket_lock_types.h>
+
+#define ONE_TICKET	(1 << 16)
+#define __ticket(val)	(u16)((val) >> 16)
+#define __owner(val)	(u16)((val) & 0xffff)
+
+static __always_inline bool __ticket_is_locked(u32 val)
+{
+	return __ticket(val) != __owner(val);
+}
+
+static __always_inline void ticket_lock(arch_spinlock_t *lock)
+{
+	u32 val = atomic_fetch_add_acquire(ONE_TICKET, lock);
+	u16 ticket = __ticket(val);
+
+	if (ticket == __owner(val))
+		return;
+
+	atomic_cond_read_acquire(lock, ticket == __owner(VAL));
+}
+
+static __always_inline bool ticket_trylock(arch_spinlock_t *lock)
+{
+	u32 old = atomic_read(lock);
+
+	if (__ticket_is_locked(old))
+		return false;
+
+	return atomic_try_cmpxchg_acquire(lock, &old, old + ONE_TICKET);
+}
+
+static __always_inline void ticket_unlock(arch_spinlock_t *lock)
+{
+	u16 *ptr = (u16 *)lock + __is_defined(__BIG_ENDIAN);
+	u32 val = atomic_read(lock);
+
+	smp_store_release(ptr, __owner(val) + 1);
+}
+
+static __always_inline int ticket_is_contended(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(lock);
+
+	return (__ticket(val) - __owner(val)) > 1;
+}
+
+static __always_inline int ticket_is_locked(arch_spinlock_t *lock)
+{
+	return __ticket_is_locked(atomic_read(lock));
+}
+
+static __always_inline int ticket_value_unlocked(arch_spinlock_t lock)
+{
+	return !__ticket_is_locked(lock.counter);
+}
+
+#undef __owner
+#undef __ticket
+#undef ONE_TICKET
+
+#define arch_spin_lock(l)		ticket_lock(l)
+#define arch_spin_trylock(l)		ticket_trylock(l)
+#define arch_spin_unlock(l)		ticket_unlock(l)
+#define arch_spin_is_locked(l)		ticket_is_locked(l)
+#define arch_spin_is_contended(l)	ticket_is_contended(l)
+#define arch_spin_value_unlocked(l)	ticket_value_unlocked(l)
+
+#endif /* __ASM_GENERIC_TICKET_LOCK_H */
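
For readers new to the encoding above, a worked trace of the 32-bit lock
word (next ticket in the high half, current owner in the low half), derived
from ONE_TICKET/__ticket/__owner as defined in ticket_lock.h. This is
illustrative only and not part of the posted patch:

	/*
	 * Worked example of the ticket encoding:
	 *
	 *   lock = 0x00000000          next ticket 0, owner 0 -> unlocked
	 *   CPU0 lock:   fetch_add(1 << 16) returns 0x00000000;
	 *                ticket 0 == owner 0, CPU0 owns the lock,
	 *                word is now 0x00010000
	 *   CPU1 lock:   fetch_add(1 << 16) returns 0x00010000;
	 *                ticket 1 != owner 0, CPU1 spins until owner == 1,
	 *                word is now 0x00020000
	 *   CPU0 unlock: smp_store_release() of owner + 1 to the owner
	 *                halfword, word becomes 0x00020001, CPU1 proceeds
	 */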


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-10-21 13:12   ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-10-21 13:12 UTC (permalink / raw)
  To: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 03:05:15PM +0200, Peter Zijlstra wrote:
> 
> There's currently a number of architectures that want/have graduated
> from test-and-set locks and are looking at qspinlock.
> 
> *HOWEVER* qspinlock is very complicated and requires a lot of an
> architecture to actually work correctly. Specifically it requires
> forward progress between a fair number of atomic primitives, including
> an xchg16 operation, which I've seen a fair number of fundamentally
> broken implementations of in the tree (specifically for qspinlock no
> less).
> 
> The benefit of qspinlock over ticket lock is also non-obvious, esp.
> at low contention (the vast majority of cases in the kernel), and it
> takes a fairly large number of CPUs (typically also NUMA) to make
> qspinlock beat ticket locks.
> 
> Esp. things like ARM64's WFE can move the balance a lot in favour of
> simpler locks by reducing the cacheline pressure due to waiters (see
> their smp_cond_load_acquire() implementation for details).
> 
> Unless you've audited qspinlock for your architecture and found it
> sound *and* can show actual benefit, simpler is better.
> 
> Therefore provide ticket locks, which depend on a single atomic
> operation (fetch_add) while still providing fairness.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/asm-generic/qspinlock.h         |   30 +++++++++
>  include/asm-generic/ticket_lock_types.h |   11 +++
>  include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)

A few notes...

> + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),

This should hold true for RISC-V in its current form. AFAICT
atomic_fetch_add ends up using AMOADD, and therefore the argument made
in the unlock+lock thread [1] gives that this results in RW,RW
ordering.

[1] https://lore.kernel.org/lkml/5412ab37-2979-5717-4951-6a61366df0f2@nvidia.com/
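
For illustration, such an AMO is a single instruction that both performs the
add and carries the acquire ordering, so there is no LL/SC retry loop whose
forward progress needs auditing. A minimal sketch of what that can look like
(assumption: RISC-V with the A extension; this is not the kernel's
arch/riscv code):

	/*
	 * Illustrative fetch_add-acquire built on a single AMO; the .aq
	 * bit provides the acquire ordering, and forward progress comes
	 * from the instruction itself rather than from a retry loop.
	 */
	static inline u32 amo_fetch_add_acquire(u32 *p, u32 inc)
	{
		u32 old;

		asm volatile("amoadd.w.aq %0, %2, %1"
			     : "=r" (old), "+A" (*p)
			     : "r" (inc)
			     : "memory");
		return old;
	}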


I've compile tested on openrisc/simple_smp_defconfig using the below.

--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -30,7 +30,6 @@ config OPENRISC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select OR1K_PIC
 	select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
-	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_USE_QUEUED_RWLOCKS
 	select OMPIC if SMP
 	select ARCH_WANT_FRAME_POINTERS
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -1,9 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += extable.h
 generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
-generic-y += qspinlock_types.h
-generic-y += qspinlock.h
+generic-y += ticket_lock_types.h
+generic-y += ticket_lock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
 generic-y += user.h
--- a/arch/openrisc/include/asm/spinlock.h
+++ b/arch/openrisc/include/asm/spinlock.h
@@ -15,7 +15,7 @@
 #ifndef __ASM_OPENRISC_SPINLOCK_H
 #define __ASM_OPENRISC_SPINLOCK_H
 
-#include <asm/qspinlock.h>
+#include <asm/ticket_lock.h>
 
 #include <asm/qrwlock.h>
 
--- a/arch/openrisc/include/asm/spinlock_types.h
+++ b/arch/openrisc/include/asm/spinlock_types.h
@@ -1,7 +1,7 @@
 #ifndef _ASM_OPENRISC_SPINLOCK_TYPES_H
 #define _ASM_OPENRISC_SPINLOCK_TYPES_H
 
-#include <asm/qspinlock_types.h>
+#include <asm/ticket_lock_types.h>
 #include <asm/qrwlock_types.h>
 
 #endif /* _ASM_OPENRISC_SPINLOCK_TYPES_H */


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-10-21 13:49   ` Arnd Bergmann
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnd Bergmann @ 2021-10-21 13:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann,
	linux-arch, Linux Kernel Mailing List, Guo Ren, Palmer Dabbelt,
	Anup Patel, linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 3:05 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Therefore provide ticket locks, which depend on a single atomic
> operation (fetch_add) while still providing fairness.

Nice!

Aside from the qspinlock vs ticket-lock question, can you describe the
tradeoffs between this generic ticket lock and a custom implementation
in architecture code? Should we convert most architectures over
to the generic code in the long run, or is there something they
can usually do better with an inline asm based ticket lock or
a trivial test-and-set?

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/asm-generic/qspinlock.h         |   30 +++++++++
>  include/asm-generic/ticket_lock_types.h |   11 +++
>  include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)

If anyone wants to use this for their architecture, feel free to add

Acked-by: Arnd Bergmann <arnd@arndb.de>

to merge it through the respective architecture git tree. If there is more
than one architecture that wants it right now, I could also take them
all through
the asm-generic tree.

          Arnd


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:12   ` Peter Zijlstra
@ 2021-10-21 13:50     ` Stafford Horne
  -1 siblings, 0 replies; 26+ messages in thread
From: Stafford Horne @ 2021-10-21 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann,
	linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner

On Thu, Oct 21, 2021 at 03:12:25PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 21, 2021 at 03:05:15PM +0200, Peter Zijlstra wrote:
> > 
> > There's currently a number of architectures that want/have graduated
> > from test-and-set locks and are looking at qspinlock.
> > 
> > *HOWEVER* qspinlock is very complicated and requires a lot of an
> > architecture to actually work correctly. Specifically it requires
> > forward progress between a fair number of atomic primitives, including
> > an xchg16 operation, which I've seen a fair number of fundamentally
> > broken implementations of in the tree (specifically for qspinlock no
> > less).
> > 
> > The benefit of qspinlock over ticket lock is also non-obvious, esp.
> > at low contention (the vast majority of cases in the kernel), and it
> > takes a fairly large number of CPUs (typically also NUMA) to make
> > qspinlock beat ticket locks.
> > 
> > Esp. things like ARM64's WFE can move the balance a lot in favour of
> > simpler locks by reducing the cacheline pressure due to waiters (see
> > their smp_cond_load_acquire() implementation for details).
> > 
> > Unless you've audited qspinlock for your architecture and found it
> > sound *and* can show actual benefit, simpler is better.

For OpenRISC we originally had a custom ticket locking mechanism, but it was
suggested to use qspinlocks as the generic implementation meant less code.

Changed here:

	https://yhbt.net/lore/all/86vaix5fmr.fsf@arm.com/T/

I think moving to qspinlocks was suggested by you.  But now that we have this
generic infrastructure, I am good to switch.

> > Therefore provide ticket locks, which depend on a single atomic
> > operation (fetch_add) while still providing fairness.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/asm-generic/qspinlock.h         |   30 +++++++++
> >  include/asm-generic/ticket_lock_types.h |   11 +++
> >  include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
> >  3 files changed, 138 insertions(+)
> 
> A few notes...
> 
> > + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> > + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
> 
> This should hold true to RISC-V in its current form, AFAICT
> atomic_fetch_add ends up using AMOADD, and therefore the argument made
> in the unlock+lock thread [1], gives that this results in RW,RW
> ordering.
> 
> [1] https://lore.kernel.org/lkml/5412ab37-2979-5717-4951-6a61366df0f2@nvidia.com/
> 
> 
> I've compile tested on openrisc/simple_smp_defconfig using the below.
> 
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -30,7 +30,6 @@ config OPENRISC
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select OR1K_PIC
>  	select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
> -	select ARCH_USE_QUEUED_SPINLOCKS
>  	select ARCH_USE_QUEUED_RWLOCKS
>  	select OMPIC if SMP
>  	select ARCH_WANT_FRAME_POINTERS
> --- a/arch/openrisc/include/asm/Kbuild
> +++ b/arch/openrisc/include/asm/Kbuild
> @@ -1,9 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0
>  generic-y += extable.h
>  generic-y += kvm_para.h
> -generic-y += mcs_spinlock.h
> -generic-y += qspinlock_types.h
> -generic-y += qspinlock.h
> +generic-y += ticket_lock_types.h
> +generic-y += ticket_lock.h
>  generic-y += qrwlock_types.h
>  generic-y += qrwlock.h
>  generic-y += user.h
> --- a/arch/openrisc/include/asm/spinlock.h
> +++ b/arch/openrisc/include/asm/spinlock.h
> @@ -15,7 +15,7 @@
>  #ifndef __ASM_OPENRISC_SPINLOCK_H
>  #define __ASM_OPENRISC_SPINLOCK_H
>  
> -#include <asm/qspinlock.h>
> +#include <asm/ticket_lock.h>
>  
>  #include <asm/qrwlock.h>
>  
> --- a/arch/openrisc/include/asm/spinlock_types.h
> +++ b/arch/openrisc/include/asm/spinlock_types.h
> @@ -1,7 +1,7 @@
>  #ifndef _ASM_OPENRISC_SPINLOCK_TYPES_H
>  #define _ASM_OPENRISC_SPINLOCK_TYPES_H
>  
> -#include <asm/qspinlock_types.h>
> +#include <asm/ticket_lock_types.h>
>  #include <asm/qrwlock_types.h>
>  
>  #endif /* _ASM_OPENRISC_SPINLOCK_TYPES_H */

This looks good to me.  Do you want to commit this along with the
generic ticket lock patch?  Otherwise I can queue it after it is
upstreamed.  Another option is that I can help merge the generic ticket
lock code via the OpenRISC branch.

Let me know what works.

-Stafford


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:49   ` Arnd Bergmann
@ 2021-10-21 15:14     ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-10-21 15:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, linux-arch,
	Linux Kernel Mailing List, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 03:49:51PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 21, 2021 at 3:05 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Therefore provide ticket locks, which depend on a single atomic
> > operation (fetch_add) while still providing fairness.
> 
> Nice!
> 
> Aside from the qspinlock vs ticket-lock question, can you describe the
> tradeoffs between this generic ticket lock and a custom implementation
> in architecture code? Should we convert most architectures over
> to the generic code in the long run, or is there something they
> can usually do better with an inline asm based ticket lock

I think for a load-store arch this thing should generate pretty close to
optimal code. x86 can do ticket_unlock() slightly better using a single
INCW (or ADDW 1) on the owner subword, where this implementation will do
separate load-add-store instructions.

Whether that is actually measurable is something else entirely.
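
As a sketch of the kind of override meant here (assuming an x86-style TSO
machine where an ordinary store/RMW is already release-ordered; this is not
actual arch/x86 code, which uses qspinlock):

	/*
	 * Hypothetical arch override of ticket_unlock(): only the lock
	 * owner ever writes the owner halfword, so a plain (unlocked)
	 * increment of that subword is sufficient, and on TSO the
	 * "memory" clobber is all that is needed for release ordering.
	 */
	static __always_inline void ticket_unlock(arch_spinlock_t *lock)
	{
		asm volatile("incw %0"
			     : "+m" (*(u16 *)lock)	/* owner is the low halfword on LE */
			     :
			     : "memory");
	}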

> or a trivial test-and-set?

If your SMP arch is halfway sane (no fwd progress issues etc.) then
ticket should behave well and avoid the starvation/variability of a TaS
lock.

The big exception there is virtualized architectures, ticket is
absolutely horrendous for 'guests' (any fair lock is for that matter).



* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 15:14     ` Peter Zijlstra
@ 2021-10-21 15:31       ` Arnd Bergmann
  -1 siblings, 0 replies; 26+ messages in thread
From: Arnd Bergmann @ 2021-10-21 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnd Bergmann, Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long,
	linux-arch, Linux Kernel Mailing List, Guo Ren, Palmer Dabbelt,
	Anup Patel, linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 5:14 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Oct 21, 2021 at 03:49:51PM +0200, Arnd Bergmann wrote:
> I think for a load-store arch this thing should generate pretty close to
> optimal code. x86 can do ticket_unlock() slightly better using a single
> INCW (or ADDW 1) on the owner subword, where this implementation will do
> separate load-add-store instructions.
>
> If that is actually measurable is something else entirely.

Ok, so I guess such an architecture could take the generic implementation
and override just arch_spin_unlock() or just arch_spin_lock(), if that
makes a difference for them.

Should we perhaps turn your modified openrisc asm/spinlock.h
and asm/spinlock_types.h into the fallback in asm-generic, and
remove the ones for the architectures that have no overrides
at all?

Or possibly a version that can do both based on
CONFIG_ARCH_USE_QUEUED_SPINLOCKS? That would
let us remove even more architecture specific headers, but
it increases the risk of some architecture using qspinlock
when they really should not.

> > or a trivial test-and-set?
>
> If your SMP arch is halfway sane (no fwd progress issues etc..) then
> ticket should behave well and avoid the starvation/variability of TaS
> lock.

Ok, and I guess we still need to keep the parisc and sparc32 versions
anyway.

> The big exception there is virtualized architectures, ticket is
> absolutely horrendous for 'guests' (any fair lock is for that matter).

This might be useful information to put into the header, at least
I had no idea about this distinction.

       Arnd


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 15:31       ` Arnd Bergmann
@ 2021-10-21 16:28         ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-10-21 16:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, linux-arch,
	Linux Kernel Mailing List, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 05:31:51PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 21, 2021 at 5:14 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Oct 21, 2021 at 03:49:51PM +0200, Arnd Bergmann wrote:
> > I think for a load-store arch this thing should generate pretty close to
> > optimal code. x86 can do ticket_unlock() slightly better using a single
> > INCW (or ADDW 1) on the owner subword, where this implementation will do
> > separate load-add-store instructions.
> >
> > If that is actually measurable is something else entirely.
> 
> Ok, so I guess such an architecture could take the generic implementation
> and override just arch_spin_unlock() or just arch_spin_lock(), if that
> makes a difference for them.

Also, pre-EV5 DEC Alpha might have issues since it can only do 32-bit
wide accesses, and it would need an LL/SC to unlock.

But yes, if/when needed we could allow overrides.

> Should we perhaps turn your modified openrisc asm/spinlock.h
> and asm/spin_lock_types.h the fallback in asm-generic, and
> remove the ones for the architectures that have no overrides
> at all?

Possibly, yes.

> > If your SMP arch is halfway sane (no fwd progress issues etc..) then
> > ticket should behave well and avoid the starvation/variability of TaS
> > lock.
> 
> Ok, and I guess we still need to keep the parisc and sparc32 versions
> anyway.

Yes, both those only have an xchg() (like) instruction and can
realistically only implement TaS locks and have to build everything else
on top of that... if only we could get rid of all that :-)
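
For context, the lock those architectures are then stuck with is roughly the
following, built on nothing but an exchange (an illustrative sketch, not the
actual parisc/sparc32 code):

	/*
	 * Test-and-set lock using only xchg: unfair, but the only thing
	 * possible without a wider atomic. The inner read-only spin keeps
	 * the cacheline shared instead of hammering it with writes.
	 */
	static inline void tas_lock(atomic_t *lock)
	{
		while (atomic_xchg_acquire(lock, 1)) {
			while (atomic_read(lock))
				cpu_relax();
		}
	}

	static inline void tas_unlock(atomic_t *lock)
	{
		atomic_set_release(lock, 0);
	}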

> > The big exception there is virtualized architectures, ticket is
> > absolutely horrendous for 'guests' (any fair lock is for that matter).
> 
> This might be useful information to put into the header, at least
> I had no idea about this distinction.

Yes indeed, I'd not thought of it until you asked.


* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-10-21 18:04   ` Waiman Long
  -1 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2021-10-21 18:04 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Boqun Feng, Ingo Molnar, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On 10/21/21 9:05 AM, Peter Zijlstra wrote:
> There's currently a number of architectures that want/have graduated
> from test-and-set locks and are looking at qspinlock.
>
> *HOWEVER* qspinlock is very complicated and requires a lot of an
> architecture to actually work correctly. Specifically it requires
> forward progress between a fair number of atomic primitives, including
> an xchg16 operation, which I've seen a fair number of fundamentally
> broken implementations of in the tree (specifically for qspinlock no
> less).
>
> The benefit of qspinlock over ticket lock is also non-obvious, esp.
> at low contention (the vast majority of cases in the kernel), and it
> takes a fairly large number of CPUs (typically also NUMA) to make
> qspinlock beat ticket locks.
>
> Esp. things like ARM64's WFE can move the balance a lot in favour of
> simpler locks by reducing the cacheline pressure due to waiters (see
> their smp_cond_load_acquire() implementation for details).
>
> Unless you've audited qspinlock for your architecture and found it
> sound *and* can show actual benefit, simpler is better.
>
> Therefore provide ticket locks, which depend on a single atomic
> operation (fetch_add) while still providing fairness.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   include/asm-generic/qspinlock.h         |   30 +++++++++
>   include/asm-generic/ticket_lock_types.h |   11 +++
>   include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
>   3 files changed, 138 insertions(+)
>
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -2,6 +2,36 @@
>   /*
>    * Queued spinlock
>    *
> + * A 'generic' spinlock implementation that is based on MCS locks. An
> + * architecture that's looking for a 'generic' spinlock, please first consider
> + * ticket_lock.h and only come looking here when you've considered all the
> + * constraints below and can show your hardware does actually perform better
> + * with qspinlock.
> + *
> + *
> + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
> + *
> + * It relies on a far greater (compared to ticket_lock.h) set of atomic
> + * operations to behave well together, please audit them carefully to ensure
> + * they all have forward progress. Many atomic operations may default to
> + * cmpxchg() loops which will not have good forward progress properties on
> + * LL/SC architectures.
> + *
> + * One notable example is atomic_fetch_or_acquire(), which x86 cannot (cheaply)
> + * do. Carefully read the patches that introduced
> + * queued_fetch_set_pending_acquire().
> + *
> + * It also heavily relies on mixed size atomic operations, in specific it
> + * requires architectures to have xchg16; something which many LL/SC
> + * architectures need to implement as a 32bit and+or in order to satisfy the
> + * forward progress guarantees mentioned above.
> + *
> + * Further reading on mixed size atomics that might be relevant:
> + *
> + *   http://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf
> + *
> + *
>    * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
>    * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
>    *
> --- /dev/null
> +++ b/include/asm-generic/ticket_lock_types.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +
> +#include <linux/types.h>
> +typedef atomic_t arch_spinlock_t;
> +
> +#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
> +
> +#endif /* __ASM_GENERIC_TICKET_LOCK_TYPES_H */
> --- /dev/null
> +++ b/include/asm-generic/ticket_lock.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * 'Generic' ticket lock implementation.
> + *
> + * It relies on atomic_fetch_add() having well defined forward progress
> + * guarantees under contention. If your architecture cannot provide this, stick
> + * to a test-and-set lock.
> + *
> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
> + * sub-word of the value. This is generally true for anything LL/SC although
> + * you'd be hard pressed to find anything useful in architecture specifications
> + * about this. If your architecture cannot do this you might be better off with
> + * a test-and-set.
> + *
> + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
> + *
> + * The implementation uses smp_cond_load_acquire() to spin, so if the
> + * architecture has WFE like instructions to sleep instead of poll for word
> + * modifications be sure to implement that (see ARM64 for example).
> + *
> + */
> +
> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
> +#define __ASM_GENERIC_TICKET_LOCK_H
> +
> +#include <linux/atomic.h>
> +#include <asm/ticket_lock_types.h>
> +
> +#define ONE_TICKET	(1 << 16)
> +#define __ticket(val)	(u16)((val) >> 16)
> +#define __owner(val)	(u16)((val) & 0xffff)
> +
> +static __always_inline bool __ticket_is_locked(u32 val)
> +{
> +	return __ticket(val) != __owner(val);
> +}
> +
> +static __always_inline void ticket_lock(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_fetch_add_acquire(ONE_TICKET, lock);
> +	u16 ticket = __ticket(val);
> +
> +	if (ticket == __owner(val))
> +		return;
> +
> +	atomic_cond_read_acquire(lock, ticket == __owner(VAL));
> +}
> +
> +static __always_inline bool ticket_trylock(arch_spinlock_t *lock)
> +{
> +	u32 old = atomic_read(lock);
> +
> +	if (__ticket_is_locked(old))
> +		return false;
> +
> +	return atomic_try_cmpxchg_acquire(lock, &old, old + ONE_TICKET);
> +}
> +
> +static __always_inline void ticket_unlock(arch_spinlock_t *lock)
> +{
> +	u16 *ptr = (u16 *)lock + __is_defined(__BIG_ENDIAN);
> +	u32 val = atomic_read(lock);
> +
> +	smp_store_release(ptr, __owner(val) + 1);
> +}
> +
> +static __always_inline int ticket_is_contended(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_read(lock);
> +
> +	return (__ticket(val) - __owner(val)) > 1;
Nit: The left side is unsigned, but the right is signed. I think you are 
relying on the implicit signed to unsigned conversion. It may be a bit 
clearer if you use 1U instead.
> +}
> +
> +static __always_inline int ticket_is_locked(arch_spinlock_t *lock)
> +{
> +	return __ticket_is_locked(atomic_read(lock));
> +}
> +
> +static __always_inline int ticket_value_unlocked(arch_spinlock_t lock)
> +{
> +	return !__ticket_is_locked(lock.counter);
> +}
> +
> +#undef __owner
> +#undef __ticket
> +#undef ONE_TICKET
> +
> +#define arch_spin_lock(l)		ticket_lock(l)
> +#define arch_spin_trylock(l)		ticket_trylock(l)
> +#define arch_spin_unlock(l)		ticket_unlock(l)
> +#define arch_spin_is_locked(l)		ticket_is_locked(l)
> +#define arch_spin_is_contended(l)	ticket_is_contended(l)
> +#define arch_spin_value_unlocked(l)	ticket_value_unlocked(l)
> +
> +#endif /* __ASM_GENERIC_TICKET_LOCK_H */

Other than the nit above, the patch looks good to me.

Reviewed-by: Waiman Long <longman@redhat.com>
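
For reference, the literal change that nit suggests would be (illustrative
only, not a posted patch):

	return (__ticket(val) - __owner(val)) > 1U;	/* unsigned literal */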


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-10-22  2:04   ` Guo Ren
  -1 siblings, 0 replies; 26+ messages in thread
From: Guo Ren @ 2021-10-22  2:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann,
	linux-arch, Linux Kernel Mailing List, Palmer Dabbelt,
	Anup Patel, linux-riscv, Christoph Müllner, Stafford Horne

C-SKY would follow this, thx.

Reviewed-by: Guo Ren <guoren@kernel.org>

On Thu, Oct 21, 2021 at 9:05 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> There's currently a number of architectures that want/have graduated
> from test-and-set locks and are looking at qspinlock.
>
> *HOWEVER* qspinlock is very complicated and requires a lot of an
> architecture to actually work correctly. Specifically it requires
> forward progress between a fair number of atomic primitives, including
> an xchg16 operation, which I've seen a fair number of fundamentally
> broken implementations of in the tree (specifically for qspinlock no
> less).
>
> The benefit of qspinlock over ticket lock is also non-obvious, esp.
> at low contention (the vast majority of cases in the kernel), and it
> takes a fairly large number of CPUs (typically also NUMA) to make
> qspinlock beat ticket locks.
>
> Esp. things like ARM64's WFE can move the balance a lot in favour of
> simpler locks by reducing the cacheline pressure due to waiters (see
> their smp_cond_load_acquire() implementation for details).
>
> Unless you've audited qspinlock for your architecture and found it
> sound *and* can show actual benefit, simpler is better.
>
> Therefore provide ticket locks, which depend on a single atomic
> operation (fetch_add) while still providing fairness.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/asm-generic/qspinlock.h         |   30 +++++++++
>  include/asm-generic/ticket_lock_types.h |   11 +++
>  include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)
>
> --- a/include/asm-generic/qspinlock.h
> +++ b/include/asm-generic/qspinlock.h
> @@ -2,6 +2,36 @@
>  /*
>   * Queued spinlock
>   *
> + * A 'generic' spinlock implementation that is based on MCS locks. An
> + * architecture that's looking for a 'generic' spinlock, please first consider
> + * ticket_lock.h and only come looking here when you've considered all the
> + * constraints below and can show your hardware does actually perform better
> + * with qspinlock.
> + *
> + *
> + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
> + *
> + * It relies on a far greater (compared to ticket_lock.h) set of atomic
> + * operations to behave well together, please audit them carefully to ensure
> + * they all have forward progress. Many atomic operations may default to
> + * cmpxchg() loops which will not have good forward progress properties on
> + * LL/SC architectures.
> + *
> + * One notable example is atomic_fetch_or_acquire(), which x86 cannot (cheaply)
> + * do. Carefully read the patches that introduced
> + * queued_fetch_set_pending_acquire().
> + *
> + * It also heavily relies on mixed size atomic operations, in specific it
> + * requires architectures to have xchg16; something which many LL/SC
> + * architectures need to implement as a 32bit and+or in order to satisfy the
> + * forward progress guarantees mentioned above.
> + *
> + * Further reading on mixed size atomics that might be relevant:
> + *
> + *   http://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf
> + *
> + *
>   * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
>   * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
>   *
> --- /dev/null
> +++ b/include/asm-generic/ticket_lock_types.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +
> +#include <linux/types.h>
> +typedef atomic_t arch_spinlock_t;
> +
> +#define __ARCH_SPIN_LOCK_UNLOCKED      ATOMIC_INIT(0)
> +
> +#endif /* __ASM_GENERIC_TICKET_LOCK_TYPES_H */
> --- /dev/null
> +++ b/include/asm-generic/ticket_lock.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * 'Generic' ticket lock implementation.
> + *
> + * It relies on atomic_fetch_add() having well defined forward progress
> + * guarantees under contention. If your architecture cannot provide this, stick
> + * to a test-and-set lock.
> + *
> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
> + * sub-word of the value. This is generally true for anything LL/SC although
> + * you'd be hard pressed to find anything useful in architecture specifications
> + * about this. If your architecture cannot do this you might be better off with
> + * a test-and-set.
> + *
> + * It relies on smp_store_release() + atomic_*_acquire() to be RCsc (or no
> + * weaker than RCtso if you're Power, also see smp_mb__after_unlock_lock()),
> + *
> + * The implementation uses smp_cond_load_acquire() to spin, so if the
> + * architecture has WFE like instructions to sleep instead of poll for word
> + * modifications be sure to implement that (see ARM64 for example).
> + *
> + */
> +
> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
> +#define __ASM_GENERIC_TICKET_LOCK_H
> +
> +#include <linux/atomic.h>
> +#include <asm/ticket_lock_types.h>
> +
> +#define ONE_TICKET     (1 << 16)
> +#define __ticket(val)  (u16)((val) >> 16)
> +#define __owner(val)   (u16)((val) & 0xffff)
> +
> +static __always_inline bool __ticket_is_locked(u32 val)
> +{
> +       return __ticket(val) != __owner(val);
> +}
> +
> +static __always_inline void ticket_lock(arch_spinlock_t *lock)
> +{
> +       u32 val = atomic_fetch_add_acquire(ONE_TICKET, lock);
> +       u16 ticket = __ticket(val);
> +
> +       if (ticket == __owner(val))
> +               return;
> +
> +       atomic_cond_read_acquire(lock, ticket == __owner(VAL));
> +}
> +
> +static __always_inline bool ticket_trylock(arch_spinlock_t *lock)
> +{
> +       u32 old = atomic_read(lock);
> +
> +       if (__ticket_is_locked(old))
> +               return false;
> +
> +       return atomic_try_cmpxchg_acquire(lock, &old, old + ONE_TICKET);
> +}
> +
> +static __always_inline void ticket_unlock(arch_spinlock_t *lock)
> +{
> +       u16 *ptr = (u16 *)lock + __is_defined(__BIG_ENDIAN);
> +       u32 val = atomic_read(lock);
> +
> +       smp_store_release(ptr, __owner(val) + 1);
> +}
> +
> +static __always_inline int ticket_is_contended(arch_spinlock_t *lock)
> +{
> +       u32 val = atomic_read(lock);
> +
> +       return (__ticket(val) - __owner(val)) > 1;
> +}
> +
> +static __always_inline int ticket_is_locked(arch_spinlock_t *lock)
> +{
> +       return __ticket_is_locked(atomic_read(lock));
> +}
> +
> +static __always_inline int ticket_value_unlocked(arch_spinlock_t lock)
> +{
> +       return !__ticket_is_locked(lock.counter);
> +}
> +
> +#undef __owner
> +#undef __ticket
> +#undef ONE_TICKET
> +
> +#define arch_spin_lock(l)              ticket_lock(l)
> +#define arch_spin_trylock(l)           ticket_trylock(l)
> +#define arch_spin_unlock(l)            ticket_unlock(l)
> +#define arch_spin_is_locked(l)         ticket_is_locked(l)
> +#define arch_spin_is_contended(l)      ticket_is_contended(l)
> +#define arch_spin_value_unlocked(l)    ticket_value_unlocked(l)
> +
> +#endif /* __ASM_GENERIC_TICKET_LOCK_H */



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-10-22  9:23   ` Mark Rutland
  -1 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2021-10-22  9:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann,
	linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

Hi Peter,

On Thu, Oct 21, 2021 at 03:05:15PM +0200, Peter Zijlstra wrote:
> +static __always_inline void ticket_lock(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_fetch_add_acquire(ONE_TICKET, lock);

I wonder, should these atomics be arch_atomic_*(), in case an arch_ or raw_
lock is used in noinstr code? The plain atomic_*() forms can have explicit
inline instrumentation.
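
E.g. something along these lines (sketch only, assuming the generic
arch_atomic_*() fallbacks give us arch_atomic_fetch_add_acquire()):

	u32 val = arch_atomic_fetch_add_acquire(ONE_TICKET, lock);

(the other atomic_*() uses in the file would presumably want the same
treatment)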

I haven't seen any issues with qspinlock so far, and that also uses the
(instrumented) atomics, so maybe that's not actually a problem, but I'm not
sure what we intend here w.r.t. instrumentability.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] locking: Generic ticket lock
  2021-10-22  9:23   ` Mark Rutland
@ 2021-10-22 12:31     ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2021-10-22 12:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann,
	linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On Fri, Oct 22, 2021 at 10:23:02AM +0100, Mark Rutland wrote:
> Hi Peter,
> 
> On Thu, Oct 21, 2021 at 03:05:15PM +0200, Peter Zijlstra wrote:
> > +static __always_inline void ticket_lock(arch_spinlock_t *lock)
> > +{
> > +	u32 val = atomic_fetch_add_acquire(ONE_TICKET, lock);
> 
> I wonder, should these atomics be arch_atomic_*(), in case an arch_ or raw_
> lock is used in noinstr code? The plain atomic_*() forms can have explicit
> inline instrumentation.
> 
> I haven't seen any issues with qspinlock so far, and that also uses the
> (instrumented) atomics, so maybe that's not actually a problem, but I'm not
> sure what we intend here w.r.t.  instrumentability.

So far it's not been a problem, and as you say, if we want to change
this, we need a larger audit/cleanup.

IIRC there's a potential problem in the arm idle code (noinstr'ing the
idle code is still on the TODO list somewhere, hampered by the need to
create more tooling).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 18:04   ` Waiman Long
@ 2021-10-22 15:19     ` Boqun Feng
  -1 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2021-10-22 15:19 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Will Deacon, Ingo Molnar, Arnd Bergmann,
	linux-arch, linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel,
	linux-riscv, Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 02:04:55PM -0400, Waiman Long wrote:
[...]
> > +static __always_inline int ticket_is_contended(arch_spinlock_t *lock)
> > +{
> > +	u32 val = atomic_read(lock);
> > +
> > +	return (__ticket(val) - __owner(val)) > 1;
> Nit: The left side is unsigned, but the right is signed. I think you are
> relying on the implicit signed to unsigned conversion. It may be a bit
> clearer if you use 1U instead.
> > +}
> > +
> > +static __always_inline int ticket_is_locked(arch_spinlock_t *lock)
> > +{
> > +	return __ticket_is_locked(atomic_read(lock));
> > +}
> > +
> > +static __always_inline int ticket_value_unlocked(arch_spinlock_t lock)
> > +{
> > +	return !__ticket_is_locked(lock.counter);
> > +}
> > +
> > +#undef __owner
> > +#undef __ticket
> > +#undef ONE_TICKET
> > +
> > +#define arch_spin_lock(l)		ticket_lock(l)
> > +#define arch_spin_trylock(l)		ticket_trylock(l)
> > +#define arch_spin_unlock(l)		ticket_unlock(l)
> > +#define arch_spin_is_locked(l)		ticket_is_locked(l)
> > +#define arch_spin_is_contended(l)	ticket_is_contended(l)
> > +#define arch_spin_value_unlocked(l)	ticket_value_unlocked(l)
> > +
> > +#endif /* __ASM_GENERIC_TICKET_LOCK_H */
> 
> Other than the nit above, the patch looks good to me.
> 
> Reviewed-by: Waiman Long <longman@redhat.com>
> 

Same here ;-)

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] locking: Generic ticket lock
  2021-10-21 13:05 ` Peter Zijlstra
@ 2021-12-14 15:40   ` Will Deacon
  -1 siblings, 0 replies; 26+ messages in thread
From: Will Deacon @ 2021-12-14 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Ingo Molnar, Waiman Long, Arnd Bergmann, linux-arch,
	linux-kernel, Guo Ren, Palmer Dabbelt, Anup Patel, linux-riscv,
	Christoph Müllner, Stafford Horne

On Thu, Oct 21, 2021 at 03:05:15PM +0200, Peter Zijlstra wrote:
> 
> There's currently a number of architectures that want/have graduated
> from test-and-set locks and are looking at qspinlock.
> 
> *HOWEVER* qspinlock is very complicated and requires a lot of an
> architecture to actually work correctly. Specifically it requires
> forward progress between a fair number of atomic primitives, including
> an xchg16 operation, which I've seen a fair number of fundamentally
> broken implementations of in the tree (specifically for qspinlock no
> less).
> 
> The benefit of qspinlock over ticket lock is also non-obvious, esp.
> at low contention (the vast majority of cases in the kernel), and it
> takes a fairly large number of CPUs (typically also NUMA) to make
> qspinlock beat ticket locks.
> 
> Esp. things like ARM64's WFE can move the balance a lot in favour of
> simpler locks by reducing the cacheline pressure due to waiters (see
> their smp_cond_load_acquire() implementation for details).
> 
> Unless you've audited qspinlock for your architecture and found it
> sound *and* can show actual benefit, simpler is better.
> 
> Therefore provide ticket locks, which depend on a single atomic
> operation (fetch_add) while still providing fairness.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/asm-generic/qspinlock.h         |   30 +++++++++
>  include/asm-generic/ticket_lock_types.h |   11 +++
>  include/asm-generic/ticket_lock.h       |   97 ++++++++++++++++++++++++++++++++
>  3 files changed, 138 insertions(+)

Huh. I looked quite closely at this a while back but it seems I forgot to
actually reply here. So, given that it doesn't seem to be in linux-next yet:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-12-14 15:41 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 13:05 [PATCH] locking: Generic ticket lock Peter Zijlstra
2021-10-21 13:05 ` Peter Zijlstra
2021-10-21 13:12 ` Peter Zijlstra
2021-10-21 13:12   ` Peter Zijlstra
2021-10-21 13:50   ` Stafford Horne
2021-10-21 13:50     ` Stafford Horne
2021-10-21 13:49 ` Arnd Bergmann
2021-10-21 13:49   ` Arnd Bergmann
2021-10-21 15:14   ` Peter Zijlstra
2021-10-21 15:14     ` Peter Zijlstra
2021-10-21 15:31     ` Arnd Bergmann
2021-10-21 15:31       ` Arnd Bergmann
2021-10-21 16:28       ` Peter Zijlstra
2021-10-21 16:28         ` Peter Zijlstra
2021-10-21 18:04 ` Waiman Long
2021-10-21 18:04   ` Waiman Long
2021-10-22 15:19   ` Boqun Feng
2021-10-22 15:19     ` Boqun Feng
2021-10-22  2:04 ` Guo Ren
2021-10-22  2:04   ` Guo Ren
2021-10-22  9:23 ` Mark Rutland
2021-10-22  9:23   ` Mark Rutland
2021-10-22 12:31   ` Peter Zijlstra
2021-10-22 12:31     ` Peter Zijlstra
2021-12-14 15:40 ` Will Deacon
2021-12-14 15:40   ` Will Deacon
