* [PATCH v3 0/7] Generic Ticket Spinlocks
@ 2022-04-14 22:02 Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch

Looks like feedback has been largely positive on this one.  I think I
got everything from the v1 and v2, but it was a bit mixed up so sorry if
I missed something.  I'm generally being conservative on the tags here,
as things have drifted around a bit.  Specifically I dropped the
Tested-bys, as this is all based on 5.18-rc1 now and there's been a
touch of diff.

I've put this at palmer/tspinlock-v3, in case that helps anyone.  This
generally looks good to me, but I'll wait for feedback before putting it
anywhere else.  I'd default to doing a shared tag for the asm-generic
stuff and then let other arch folks pull in that (with their arch
support), but if you want me to take it via my tree then feel free to
just say so explicitly.  What's on that branch right now definitely
shouldn't be treated as stable, though, as I'll wait for at least an
official Ack/Review from the asm-generic folks (and of course there may
be more feedback).

This passes my standard tests, both as the whole thing and as just the
RISC-V spinlock change.  That's just QEMU, though, so it's not all that
exciting.

Changes since v2 <20220319035457.2214979-1-guoren@kernel.org>:
* Picked up Peter's SOBs, which were posted on the v1.
* Re-ordered the first two patches.
* Re-worded the RISC-V qrwlock patch, as it was a bit mushy.  I also
  added a blurb in the qrwlock's top comment about this dependency.
* Picked up Stafford's fix for big-endian systems, which I have not
  tested as I don't have one (at least easily available, I think the BE
  MIPS systems are still in that pile in my garage).
* Call the generic version <asm-generic/spinlock{_types}.h>, as there's
  really no utility to the version that only errors out.

Changes since v1 <20220316232600.20419-1-palmer@rivosinc.com>:
* Follow Arnd's suggestion to make the patch series more generic.
* Add csky in the series.
* Combine RISC-V's two patches into one.
* Modify openrisc's patch to suit the new generic version.


* [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-15  1:09   ` Boqun Feng
  2022-04-15  1:27   ` Waiman Long
  2022-04-14 22:02 ` [PATCH v3 2/7] asm-generic: qspinlock: Indicate the use of mixed-size atomics Palmer Dabbelt
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Peter Zijlstra <peterz@infradead.org>

This is a simple, fair spinlock.  Specifically it doesn't have all the
subtle memory model dependencies that qspinlock has, which makes it more
suitable for simple systems as it is more likely to be correct.  It is
implemented entirely in terms of standard atomics and thus works fine
without any arch-specific code.

This replaces the existing asm-generic/spinlock.h, which just errored
out on SMP systems.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
 include/asm-generic/spinlock_types.h | 17 ++++++
 2 files changed, 94 insertions(+), 8 deletions(-)
 create mode 100644 include/asm-generic/spinlock_types.h

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index adaf6acab172..ca829fcb9672 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -1,12 +1,81 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_GENERIC_SPINLOCK_H
-#define __ASM_GENERIC_SPINLOCK_H
+
 /*
- * You need to implement asm/spinlock.h for SMP support. The generic
- * version does not handle SMP.
+ * 'Generic' ticket-lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
+ * uses atomic_fetch_add() which is SC to create an RCsc lock.
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
  */
-#ifdef CONFIG_SMP
-#error need an architecture specific asm/spinlock.h
-#endif
 
-#endif /* __ASM_GENERIC_SPINLOCK_H */
+#ifndef __ASM_GENERIC_TICKET_LOCK_H
+#define __ASM_GENERIC_TICKET_LOCK_H
+
+#include <linux/atomic.h>
+#include <asm-generic/spinlock_types.h>
+
+static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
+	u16 ticket = val >> 16;
+
+	if (ticket == (u16)val)
+		return;
+
+	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
+}
+
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
+{
+	u32 old = atomic_read(lock);
+
+	if ((old >> 16) != (old & 0xffff))
+		return false;
+
+	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
+}
+
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+	u32 val = atomic_read(lock);
+
+	smp_store_release(ptr, (u16)val + 1);
+}
+
+static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(lock);
+
+	return ((val >> 16) != (val & 0xffff));
+}
+
+static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(lock);
+
+	return (s16)((val >> 16) - (val & 0xffff)) > 1;
+}
+
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+	return !arch_spin_is_locked(&lock);
+}
+
+#include <asm/qrwlock.h>
+
+#endif /* __ASM_GENERIC_TICKET_LOCK_H */
diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
new file mode 100644
index 000000000000..e56ddb84d030
--- /dev/null
+++ b/include/asm-generic/spinlock_types.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
+#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
+
+#include <linux/types.h>
+typedef atomic_t arch_spinlock_t;
+
+/*
+ * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
+ * include.
+ */
+#include <asm/qrwlock_types.h>
+
+#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
+
+#endif /* __ASM_GENERIC_TICKET_LOCK_TYPES_H */
-- 
2.34.1


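To make the encoding above concrete: the lock word holds the owner ticket in
its low 16 bits and the next free ticket in its high 16 bits.  Below is a
minimal standalone userspace sketch of the same algorithm, with GCC's
__atomic built-ins standing in for the kernel atomic API; the tspin_* names
are made up for illustration, and little-endian layout is assumed (the
kernel handles big-endian by offsetting the pointer in arch_spin_unlock()).

    #include <stdint.h>

    /* owner in the low 16 bits, next free ticket in the high 16 bits */
    typedef uint32_t tspinlock_t;

    static void tspin_lock(tspinlock_t *lock)
    {
        /* Take a ticket; fully ordered, like the kernel's atomic_fetch_add(). */
        uint32_t val = __atomic_fetch_add(lock, 1u << 16, __ATOMIC_SEQ_CST);
        uint16_t ticket = val >> 16;

        if (ticket == (uint16_t)val)
            return; /* uncontended: owner already equals our ticket */

        /* Spin with acquire loads until the owner half reaches our ticket. */
        while ((uint16_t)__atomic_load_n(lock, __ATOMIC_ACQUIRE) != ticket)
            ;
    }

    static void tspin_unlock(tspinlock_t *lock)
    {
        /*
         * Sub-word store-release of owner + 1, mirroring arch_spin_unlock().
         * Only the lock holder writes the low half, so this cannot clobber
         * concurrent ticket takers bumping the high half.
         */
        uint16_t *owner = (uint16_t *)lock;
        uint16_t next = __atomic_load_n(owner, __ATOMIC_RELAXED) + 1;

        __atomic_store_n(owner, next, __ATOMIC_RELEASE);
    }

The sub-word release store in tspin_unlock() is exactly the property the
header comment calls out: atomic_fetch_add() must be safe against an
smp_store_release() to half of the same word.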

* [PATCH v3 2/7] asm-generic: qspinlock: Indicate the use of mixed-size atomics
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 3/7] asm-generic: qrwlock: Document the spinlock fairness requirements Palmer Dabbelt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Peter Zijlstra <peterz@infradead.org>

The qspinlock implementation depends on having well behaved mixed-size
atomics.  This is true on the more widely-used platforms, but these
requirements are somewhat subtle and may not be satisfied by all the
platforms that qspinlock is used on.

Document these requirements, so ports that use qspinlock can more easily
determine if they meet these requirements.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 include/asm-generic/qspinlock.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index d74b13825501..95be3f3c28b5 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -2,6 +2,37 @@
 /*
  * Queued spinlock
  *
+ * A 'generic' spinlock implementation that is based on MCS locks. An
+ * architecture that's looking for a 'generic' spinlock, please first consider
+ * ticket-lock.h and only come looking here when you've considered all the
+ * constraints below and can show your hardware does actually perform better
+ * with qspinlock.
+ *
+ *
+ * It relies on atomic_*_release()/atomic_*_acquire() to be RCsc (or no weaker
+ * than RCtso if you're power), where regular code only expects atomic_t to be
+ * RCpc.
+ *
+ * It relies on a far greater (compared to asm-generic/spinlock.h) set of
+ * atomic operations to behave well together, please audit them carefully to
+ * ensure they all have forward progress. Many atomic operations may default to
+ * cmpxchg() loops which will not have good forward progress properties on
+ * LL/SC architectures.
+ *
+ * One notable example is atomic_fetch_or_acquire(), which x86 cannot (cheaply)
+ * do. Carefully read the patches that introduced
+ * queued_fetch_set_pending_acquire().
+ *
+ * It also heavily relies on mixed size atomic operations, in specific it
+ * requires architectures to have xchg16; something which many LL/SC
+ * architectures need to implement as a 32bit and+or in order to satisfy the
+ * forward progress guarantees mentioned above.
+ *
+ * Further reading on mixed size atomics that might be relevant:
+ *
+ *   http://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf
+ *
+ *
  * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
  * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
  *
-- 
2.34.1


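To make the xchg16 point concrete: an architecture without native sub-word
atomics has to emulate a 16-bit exchange on top of a 32-bit compare-and-swap,
roughly as in the sketch below (hand-written for illustration, not taken
from any port; little-endian layout and GCC built-ins assumed).

    #include <stdint.h>

    static uint16_t xchg16_emulated(uint16_t *ptr, uint16_t newval)
    {
        uint32_t *word = (uint32_t *)((uintptr_t)ptr & ~(uintptr_t)0x3);
        unsigned int shift = ((uintptr_t)ptr & 0x2) * 8; /* 0 or 16 */
        uint32_t mask = 0xffffu << shift;
        uint32_t old32 = __atomic_load_n(word, __ATOMIC_RELAXED);
        uint32_t new32;

        do {
            new32 = (old32 & ~mask) | ((uint32_t)newval << shift);
            /*
             * The retry loop is the hazard described above: if the other
             * halfword of the containing word keeps changing, this
             * compare-and-swap (an LL/SC loop underneath) can fail
             * indefinitely, losing the forward-progress guarantee that
             * qspinlock depends on.
             */
        } while (!__atomic_compare_exchange_n(word, &old32, new32, 0,
                                              __ATOMIC_SEQ_CST,
                                              __ATOMIC_RELAXED));

        return (uint16_t)(old32 >> shift);
    }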

* [PATCH v3 3/7] asm-generic: qrwlock: Document the spinlock fairness requirements
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 2/7] asm-generic: qspinlock: Indicate the use of mixed-size atomics Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 4/7] openrisc: Move to ticket-spinlock Palmer Dabbelt
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Palmer Dabbelt <palmer@rivosinc.com>

I could only find the fairness requirements documented as the C code;
this calls them out in a comment just to be a bit more explicit.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 include/asm-generic/qrwlock.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 7ae0ece07b4e..24ae09c1db9f 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -2,6 +2,10 @@
 /*
  * Queue read/write lock
  *
+ * These use generic atomic and locking routines, but depend on a fair spinlock
+ * implementation in order to be fair themselves.  The implementation in
+ * asm-generic/spinlock.h meets these requirements.
+ *
  * (C) Copyright 2013-2014 Hewlett-Packard Development Company, L.P.
  *
  * Authors: Waiman Long <waiman.long@hp.com>
-- 
2.34.1


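For context on what the new comment refers to: both qrwlock slowpaths funnel
contended lockers through an arch_spinlock_t embedded in the lock, so the
rwlock can only be as fair as that spinlock.  A heavily simplified sketch of
the shape of that dependency follows; the field names follow the kernel's
struct qrwlock, but the body is paraphrased, not the real slowpath.

    /* Simplified sketch, not the actual qrwlock code. */
    typedef struct {
        atomic_t        cnts;      /* reader count and writer-pending bits */
        arch_spinlock_t wait_lock; /* serializes contended lockers */
    } qrwlock_sketch;

    static void read_lock_slowpath_sketch(qrwlock_sketch *lock)
    {
        /*
         * Every contended reader (and writer) queues here first, so
         * lockers are granted the rwlock in whatever order they get
         * through wait_lock: a fair (e.g. ticket) spinlock makes the
         * rwlock fair, and an unfair one does not.
         */
        arch_spin_lock(&lock->wait_lock);
        /* ... update cnts, wait for the lock to become available ... */
        arch_spin_unlock(&lock->wait_lock);
    }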

* [PATCH v3 4/7] openrisc: Move to ticket-spinlock
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
                   ` (2 preceding siblings ...)
  2022-04-14 22:02 ` [PATCH v3 3/7] asm-generic: qrwlock: Document the spinlock fairness requirements Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-30  7:52   ` Stafford Horne
  2022-04-14 22:02 ` [PATCH v3 5/7] RISC-V: Move to generic spinlocks Palmer Dabbelt
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Peter Zijlstra <peterz@infradead.org>

We have no indications that openrisc meets the qspinlock requirements,
so move to ticket-spinlock as that is more likely to be correct.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/openrisc/Kconfig                      |  1 -
 arch/openrisc/include/asm/Kbuild           |  5 ++--
 arch/openrisc/include/asm/spinlock.h       | 27 ----------------------
 arch/openrisc/include/asm/spinlock_types.h |  7 ------
 4 files changed, 2 insertions(+), 38 deletions(-)
 delete mode 100644 arch/openrisc/include/asm/spinlock.h
 delete mode 100644 arch/openrisc/include/asm/spinlock_types.h

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 0d68adf6e02b..99f0e4a4cbbd 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -30,7 +30,6 @@ config OPENRISC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select OR1K_PIC
 	select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
-	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_USE_QUEUED_RWLOCKS
 	select OMPIC if SMP
 	select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index ca5987e11053..3386b9c1c073 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -1,9 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += extable.h
 generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
-generic-y += qspinlock_types.h
-generic-y += qspinlock.h
+generic-y += spinlock_types.h
+generic-y += spinlock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
 generic-y += user.h
diff --git a/arch/openrisc/include/asm/spinlock.h b/arch/openrisc/include/asm/spinlock.h
deleted file mode 100644
index 264944a71535..000000000000
--- a/arch/openrisc/include/asm/spinlock.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others.  All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2003 Matjaz Breskvar <phoenix@bsemi.com>
- * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
- * et al.
- */
-
-#ifndef __ASM_OPENRISC_SPINLOCK_H
-#define __ASM_OPENRISC_SPINLOCK_H
-
-#include <asm/qspinlock.h>
-
-#include <asm/qrwlock.h>
-
-#define arch_spin_relax(lock)	cpu_relax()
-#define arch_read_relax(lock)	cpu_relax()
-#define arch_write_relax(lock)	cpu_relax()
-
-
-#endif
diff --git a/arch/openrisc/include/asm/spinlock_types.h b/arch/openrisc/include/asm/spinlock_types.h
deleted file mode 100644
index 7c6fb1208c88..000000000000
--- a/arch/openrisc/include/asm/spinlock_types.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef _ASM_OPENRISC_SPINLOCK_TYPES_H
-#define _ASM_OPENRISC_SPINLOCK_TYPES_H
-
-#include <asm/qspinlock_types.h>
-#include <asm/qrwlock_types.h>
-
-#endif /* _ASM_OPENRISC_SPINLOCK_TYPES_H */
-- 
2.34.1



* [PATCH v3 5/7] RISC-V: Move to generic spinlocks
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
                   ` (3 preceding siblings ...)
  2022-04-14 22:02 ` [PATCH v3 4/7] openrisc: Move to ticket-spinlock Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 6/7] RISC-V: Move to queued RW locks Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 7/7] csky: Move to generic ticket-spinlock Palmer Dabbelt
  6 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Palmer Dabbelt <palmer@rivosinc.com>

Our existing spinlocks aren't fair and replacing them has been on the
TODO list for a long time.  This moves to the recently-introduced ticket
spinlocks, which are simple enough that they are likely to be correct
and fast on the vast majority of extant implementations.

This introduces a horrible hack that allows us to split out the spinlock
conversion from the rwlock conversion.  We have to do the spinlocks
first because qrwlock needs fair spinlocks, but we don't want to pollute
the asm-generic code to support the generic spinlocks without qrwlocks.
Thus we pollute the RISC-V code, but just until the next commit as it's
all going away.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/include/asm/Kbuild           |  2 ++
 arch/riscv/include/asm/spinlock.h       | 44 +++----------------------
 arch/riscv/include/asm/spinlock_types.h |  9 +++--
 3 files changed, 10 insertions(+), 45 deletions(-)

diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 5edf5b8587e7..c3f229ae8033 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -3,5 +3,7 @@ generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
 generic-y += parport.h
+generic-y += qrwlock.h
+generic-y += qrwlock_types.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
index f4f7fa1b7ca8..88a4d5d0d98a 100644
--- a/arch/riscv/include/asm/spinlock.h
+++ b/arch/riscv/include/asm/spinlock.h
@@ -7,49 +7,13 @@
 #ifndef _ASM_RISCV_SPINLOCK_H
 #define _ASM_RISCV_SPINLOCK_H
 
+/* This is horrible, but the whole file is going away in the next commit. */
+#define __ASM_GENERIC_QRWLOCK_H
+
 #include <linux/kernel.h>
 #include <asm/current.h>
 #include <asm/fence.h>
-
-/*
- * Simple spin lock operations.  These provide no fairness guarantees.
- */
-
-/* FIXME: Replace this with a ticket lock, like MIPS. */
-
-#define arch_spin_is_locked(x)	(READ_ONCE((x)->lock) != 0)
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	smp_store_release(&lock->lock, 0);
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	int tmp = 1, busy;
-
-	__asm__ __volatile__ (
-		"	amoswap.w %0, %2, %1\n"
-		RISCV_ACQUIRE_BARRIER
-		: "=r" (busy), "+A" (lock->lock)
-		: "r" (tmp)
-		: "memory");
-
-	return !busy;
-}
-
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	while (1) {
-		if (arch_spin_is_locked(lock))
-			continue;
-
-		if (arch_spin_trylock(lock))
-			break;
-	}
-}
-
-/***********************************************************/
+#include <asm-generic/spinlock.h>
 
 static inline void arch_read_lock(arch_rwlock_t *lock)
 {
diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
index 5a35a49505da..f2f9b5d7120d 100644
--- a/arch/riscv/include/asm/spinlock_types.h
+++ b/arch/riscv/include/asm/spinlock_types.h
@@ -6,15 +6,14 @@
 #ifndef _ASM_RISCV_SPINLOCK_TYPES_H
 #define _ASM_RISCV_SPINLOCK_TYPES_H
 
+/* This is horrible, but the whole file is going away in the next commit. */
+#define __ASM_GENERIC_QRWLOCK_TYPES_H
+
 #ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
-typedef struct {
-	volatile unsigned int lock;
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#include <asm-generic/spinlock_types.h>
 
 typedef struct {
 	volatile unsigned int lock;
-- 
2.34.1


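The "horrible" part, in isolation: pre-defining a header's include guard
turns a later include of that header into a no-op.  That is all the two
#defines in the diff above rely on; a distilled sketch:

    /*
     * asm-generic/spinlock.h ends with "#include <asm/qrwlock.h>", which
     * in turn pulls in asm-generic/qrwlock.h.  Defining that header's
     * guard up front makes its body preprocess away, so the generic
     * spinlock can be used without qrwlock support for one commit.
     */
    #define __ASM_GENERIC_QRWLOCK_H       /* pretend it was already included */
    #include <asm-generic/spinlock.h>     /* its qrwlock include is now a no-op */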

* [PATCH v3 6/7] RISC-V: Move to queued RW locks
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
                   ` (4 preceding siblings ...)
  2022-04-14 22:02 ` [PATCH v3 5/7] RISC-V: Move to generic spinlocks Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  2022-04-14 22:02 ` [PATCH v3 7/7] csky: Move to generic ticket-spinlock Palmer Dabbelt
  6 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Palmer Dabbelt

From: Palmer Dabbelt <palmer@rivosinc.com>

Now that we have fair spinlocks we can use the generic queued rwlocks,
so we might as well do so.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/riscv/Kconfig                      |  1 +
 arch/riscv/include/asm/Kbuild           |  2 +
 arch/riscv/include/asm/spinlock.h       | 99 -------------------------
 arch/riscv/include/asm/spinlock_types.h | 24 ------
 4 files changed, 3 insertions(+), 123 deletions(-)
 delete mode 100644 arch/riscv/include/asm/spinlock.h
 delete mode 100644 arch/riscv/include/asm/spinlock_types.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 00fd9c548f26..f8a55d94016d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -39,6 +39,7 @@ config RISCV
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU
 	select ARCH_SUPPORTS_HUGETLBFS if MMU
 	select ARCH_USE_MEMTEST
+	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_GENERAL_HUGETLB
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index c3f229ae8033..504f8b7e72d4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -3,6 +3,8 @@ generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
 generic-y += parport.h
+generic-y += spinlock.h
+generic-y += spinlock_types.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += user.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
deleted file mode 100644
index 88a4d5d0d98a..000000000000
--- a/arch/riscv/include/asm/spinlock.h
+++ /dev/null
@@ -1,99 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2015 Regents of the University of California
- * Copyright (C) 2017 SiFive
- */
-
-#ifndef _ASM_RISCV_SPINLOCK_H
-#define _ASM_RISCV_SPINLOCK_H
-
-/* This is horrible, but the whole file is going away in the next commit. */
-#define __ASM_GENERIC_QRWLOCK_H
-
-#include <linux/kernel.h>
-#include <asm/current.h>
-#include <asm/fence.h>
-#include <asm-generic/spinlock.h>
-
-static inline void arch_read_lock(arch_rwlock_t *lock)
-{
-	int tmp;
-
-	__asm__ __volatile__(
-		"1:	lr.w	%1, %0\n"
-		"	bltz	%1, 1b\n"
-		"	addi	%1, %1, 1\n"
-		"	sc.w	%1, %1, %0\n"
-		"	bnez	%1, 1b\n"
-		RISCV_ACQUIRE_BARRIER
-		: "+A" (lock->lock), "=&r" (tmp)
-		:: "memory");
-}
-
-static inline void arch_write_lock(arch_rwlock_t *lock)
-{
-	int tmp;
-
-	__asm__ __volatile__(
-		"1:	lr.w	%1, %0\n"
-		"	bnez	%1, 1b\n"
-		"	li	%1, -1\n"
-		"	sc.w	%1, %1, %0\n"
-		"	bnez	%1, 1b\n"
-		RISCV_ACQUIRE_BARRIER
-		: "+A" (lock->lock), "=&r" (tmp)
-		:: "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *lock)
-{
-	int busy;
-
-	__asm__ __volatile__(
-		"1:	lr.w	%1, %0\n"
-		"	bltz	%1, 1f\n"
-		"	addi	%1, %1, 1\n"
-		"	sc.w	%1, %1, %0\n"
-		"	bnez	%1, 1b\n"
-		RISCV_ACQUIRE_BARRIER
-		"1:\n"
-		: "+A" (lock->lock), "=&r" (busy)
-		:: "memory");
-
-	return !busy;
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *lock)
-{
-	int busy;
-
-	__asm__ __volatile__(
-		"1:	lr.w	%1, %0\n"
-		"	bnez	%1, 1f\n"
-		"	li	%1, -1\n"
-		"	sc.w	%1, %1, %0\n"
-		"	bnez	%1, 1b\n"
-		RISCV_ACQUIRE_BARRIER
-		"1:\n"
-		: "+A" (lock->lock), "=&r" (busy)
-		:: "memory");
-
-	return !busy;
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *lock)
-{
-	__asm__ __volatile__(
-		RISCV_RELEASE_BARRIER
-		"	amoadd.w x0, %1, %0\n"
-		: "+A" (lock->lock)
-		: "r" (-1)
-		: "memory");
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *lock)
-{
-	smp_store_release(&lock->lock, 0);
-}
-
-#endif /* _ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/include/asm/spinlock_types.h b/arch/riscv/include/asm/spinlock_types.h
deleted file mode 100644
index f2f9b5d7120d..000000000000
--- a/arch/riscv/include/asm/spinlock_types.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2015 Regents of the University of California
- */
-
-#ifndef _ASM_RISCV_SPINLOCK_TYPES_H
-#define _ASM_RISCV_SPINLOCK_TYPES_H
-
-/* This is horible, but the whole file is going away in the next commit. */
-#define __ASM_GENERIC_QRWLOCK_TYPES_H
-
-#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
-# error "please don't include this file directly"
-#endif
-
-#include <asm-generic/spinlock_types.h>
-
-typedef struct {
-	volatile unsigned int lock;
-} arch_rwlock_t;
-
-#define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
-
-#endif /* _ASM_RISCV_SPINLOCK_TYPES_H */
-- 
2.34.1



* [PATCH v3 7/7] csky: Move to generic ticket-spinlock
  2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
                   ` (5 preceding siblings ...)
  2022-04-14 22:02 ` [PATCH v3 6/7] RISC-V: Move to queued RW locks Palmer Dabbelt
@ 2022-04-14 22:02 ` Palmer Dabbelt
  6 siblings, 0 replies; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-14 22:02 UTC (permalink / raw)
  To: Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, longman, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou,
	Arnd Bergmann, macro, Greg KH, sudipm.mukherjee, wangkefeng.wang,
	jszhang, linux-csky, linux-kernel, openrisc, linux-riscv,
	linux-arch, Guo Ren, Palmer Dabbelt

From: Guo Ren <guoren@linux.alibaba.com>

There is no benefit from a custom ticket-spinlock implementation, so
move to the generic ticket-spinlock for easier maintenance.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
 arch/csky/include/asm/Kbuild           |  3 +
 arch/csky/include/asm/spinlock.h       | 89 --------------------------
 arch/csky/include/asm/spinlock_types.h | 27 --------
 3 files changed, 3 insertions(+), 116 deletions(-)
 delete mode 100644 arch/csky/include/asm/spinlock.h
 delete mode 100644 arch/csky/include/asm/spinlock_types.h

diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 888248235c23..103207a58f97 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -3,7 +3,10 @@ generic-y += asm-offsets.h
 generic-y += extable.h
 generic-y += gpio.h
 generic-y += kvm_para.h
+generic-y += spinlock.h
+generic-y += spinlock_types.h
 generic-y += qrwlock.h
+generic-y += qrwlock_types.h
 generic-y += parport.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
deleted file mode 100644
index 69f5aa249c5f..000000000000
--- a/arch/csky/include/asm/spinlock.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#ifndef __ASM_CSKY_SPINLOCK_H
-#define __ASM_CSKY_SPINLOCK_H
-
-#include <linux/spinlock_types.h>
-#include <asm/barrier.h>
-
-/*
- * Ticket-based spin-locking.
- */
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	arch_spinlock_t lockval;
-	u32 ticket_next = 1 << TICKET_NEXT;
-	u32 *p = &lock->lock;
-	u32 tmp;
-
-	asm volatile (
-		"1:	ldex.w		%0, (%2) \n"
-		"	mov		%1, %0	 \n"
-		"	add		%0, %3	 \n"
-		"	stex.w		%0, (%2) \n"
-		"	bez		%0, 1b   \n"
-		: "=&r" (tmp), "=&r" (lockval)
-		: "r"(p), "r"(ticket_next)
-		: "cc");
-
-	while (lockval.tickets.next != lockval.tickets.owner)
-		lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
-
-	smp_mb();
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	u32 tmp, contended, res;
-	u32 ticket_next = 1 << TICKET_NEXT;
-	u32 *p = &lock->lock;
-
-	do {
-		asm volatile (
-		"	ldex.w		%0, (%3)   \n"
-		"	movi		%2, 1	   \n"
-		"	rotli		%1, %0, 16 \n"
-		"	cmpne		%1, %0     \n"
-		"	bt		1f         \n"
-		"	movi		%2, 0	   \n"
-		"	add		%0, %0, %4 \n"
-		"	stex.w		%0, (%3)   \n"
-		"1:				   \n"
-		: "=&r" (res), "=&r" (tmp), "=&r" (contended)
-		: "r"(p), "r"(ticket_next)
-		: "cc");
-	} while (!res);
-
-	if (!contended)
-		smp_mb();
-
-	return !contended;
-}
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	smp_mb();
-	WRITE_ONCE(lock->tickets.owner, lock->tickets.owner + 1);
-}
-
-static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	return lock.tickets.owner == lock.tickets.next;
-}
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return !arch_spin_value_unlocked(READ_ONCE(*lock));
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	struct __raw_tickets tickets = READ_ONCE(lock->tickets);
-
-	return (tickets.next - tickets.owner) > 1;
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
-#include <asm/qrwlock.h>
-
-#endif /* __ASM_CSKY_SPINLOCK_H */
diff --git a/arch/csky/include/asm/spinlock_types.h b/arch/csky/include/asm/spinlock_types.h
deleted file mode 100644
index db87a12c3827..000000000000
--- a/arch/csky/include/asm/spinlock_types.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#ifndef __ASM_CSKY_SPINLOCK_TYPES_H
-#define __ASM_CSKY_SPINLOCK_TYPES_H
-
-#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
-# error "please don't include this file directly"
-#endif
-
-#define TICKET_NEXT	16
-
-typedef struct {
-	union {
-		u32 lock;
-		struct __raw_tickets {
-			/* little endian */
-			u16 owner;
-			u16 next;
-		} tickets;
-	};
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
-
-#include <asm-generic/qrwlock_types.h>
-
-#endif /* __ASM_CSKY_SPINLOCK_TYPES_H */
-- 
2.34.1



* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
@ 2022-04-15  1:09   ` Boqun Feng
  2022-04-15  5:20     ` Palmer Dabbelt
  2022-04-15  1:27   ` Waiman Long
  1 sibling, 1 reply; 15+ messages in thread
From: Boqun Feng @ 2022-04-15  1:09 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, heiko, guoren, shorne, peterz, mingo, Will Deacon,
	longman, jonas, stefan.kristiansson, Paul Walmsley,
	Palmer Dabbelt, aou, macro, Greg KH, sudipm.mukherjee,
	wangkefeng.wang, jszhang, linux-csky, linux-kernel, openrisc,
	linux-riscv, linux-arch

Hi,

On Thu, Apr 14, 2022 at 03:02:08PM -0700, Palmer Dabbelt wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> This is a simple, fair spinlock.  Specifically it doesn't have all the
> subtle memory model dependencies that qspinlock has, which makes it more
> suitable for simple systems as it is more likely to be correct.  It is
> implemented entirely in terms of standard atomics and thus works fine
> without any arch-specific code.
> 
> This replaces the existing asm-generic/spinlock.h, which just errored
> out on SMP systems.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
>  include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
>  include/asm-generic/spinlock_types.h | 17 ++++++
>  2 files changed, 94 insertions(+), 8 deletions(-)
>  create mode 100644 include/asm-generic/spinlock_types.h
> 
> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
> index adaf6acab172..ca829fcb9672 100644
> --- a/include/asm-generic/spinlock.h
> +++ b/include/asm-generic/spinlock.h
> @@ -1,12 +1,81 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __ASM_GENERIC_SPINLOCK_H
> -#define __ASM_GENERIC_SPINLOCK_H
> +
>  /*
> - * You need to implement asm/spinlock.h for SMP support. The generic
> - * version does not handle SMP.
> + * 'Generic' ticket-lock implementation.
> + *
> + * It relies on atomic_fetch_add() having well defined forward progress
> + * guarantees under contention. If your architecture cannot provide this, stick
> + * to a test-and-set lock.
> + *
> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
> + * sub-word of the value. This is generally true for anything LL/SC although
> + * you'd be hard pressed to find anything useful in architecture specifications
> + * about this. If your architecture cannot do this you might be better off with
> + * a test-and-set.
> + *
> + * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
> + * uses atomic_fetch_add() which is SC to create an RCsc lock.
> + *
> + * The implementation uses smp_cond_load_acquire() to spin, so if the
> + * architecture has WFE like instructions to sleep instead of poll for word
> + * modifications be sure to implement that (see ARM64 for example).
> + *
>   */
> -#ifdef CONFIG_SMP
> -#error need an architecture specific asm/spinlock.h
> -#endif
>  
> -#endif /* __ASM_GENERIC_SPINLOCK_H */
> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
> +#define __ASM_GENERIC_TICKET_LOCK_H
> +
> +#include <linux/atomic.h>
> +#include <asm-generic/spinlock_types.h>
> +
> +static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
> +	u16 ticket = val >> 16;
> +
> +	if (ticket == (u16)val)
> +		return;
> +
> +	atomic_cond_read_acquire(lock, ticket == (u16)VAL);

Looks like my follow-up comment is missing:

	https://lore.kernel.org/lkml/YjM+P32I4fENIqGV@boqun-archlinux/

Basically, I suggested that 1) instead of "SC", use "fully-ordered", as
that's a complete definition in our atomic API ("RCsc" is fine), and 2)
introduce an RCsc atomic_cond_read_acquire() or add a full barrier here
to make arch_spin_lock() RCsc; otherwise arch_spin_lock() is RCsc on the
fastpath but RCpc on the slowpath.

Regards,
Boqun

> +}
> +
> +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> +{
> +	u32 old = atomic_read(lock);
> +
> +	if ((old >> 16) != (old & 0xffff))
> +		return false;
> +
> +	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
> +}
> +
[...]


* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
  2022-04-15  1:09   ` Boqun Feng
@ 2022-04-15  1:27   ` Waiman Long
  2022-04-15 16:46     ` Palmer Dabbelt
  1 sibling, 1 reply; 15+ messages in thread
From: Waiman Long @ 2022-04-15  1:27 UTC (permalink / raw)
  To: Palmer Dabbelt, Arnd Bergmann, heiko, guoren, shorne
  Cc: peterz, mingo, Will Deacon, boqun.feng, jonas,
	stefan.kristiansson, Paul Walmsley, Palmer Dabbelt, aou, macro,
	Greg KH, sudipm.mukherjee, wangkefeng.wang, jszhang, linux-csky,
	linux-kernel, openrisc, linux-riscv, linux-arch

On 4/14/22 18:02, Palmer Dabbelt wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> This is a simple, fair spinlock.  Specifically it doesn't have all the
> subtle memory model dependencies that qspinlock has, which makes it more
> suitable for simple systems as it is more likely to be correct.  It is
> implemented entirely in terms of standard atomics and thus works fine
> without any arch-specific code.
>
> This replaces the existing asm-generic/spinlock.h, which just errored
> out on SMP systems.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
>   include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
>   include/asm-generic/spinlock_types.h | 17 ++++++
>   2 files changed, 94 insertions(+), 8 deletions(-)
>   create mode 100644 include/asm-generic/spinlock_types.h
>
> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
> index adaf6acab172..ca829fcb9672 100644
> --- a/include/asm-generic/spinlock.h
> +++ b/include/asm-generic/spinlock.h
> @@ -1,12 +1,81 @@
>   /* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __ASM_GENERIC_SPINLOCK_H
> -#define __ASM_GENERIC_SPINLOCK_H
> +
>   /*
> - * You need to implement asm/spinlock.h for SMP support. The generic
> - * version does not handle SMP.
> + * 'Generic' ticket-lock implementation.
> + *
> + * It relies on atomic_fetch_add() having well defined forward progress
> + * guarantees under contention. If your architecture cannot provide this, stick
> + * to a test-and-set lock.
> + *
> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
> + * sub-word of the value. This is generally true for anything LL/SC although
> + * you'd be hard pressed to find anything useful in architecture specifications
> + * about this. If your architecture cannot do this you might be better off with
> + * a test-and-set.
> + *
> + * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
> + * uses atomic_fetch_add() which is SC to create an RCsc lock.
> + *
> + * The implementation uses smp_cond_load_acquire() to spin, so if the
> + * architecture has WFE like instructions to sleep instead of poll for word
> + * modifications be sure to implement that (see ARM64 for example).
> + *
>    */
> -#ifdef CONFIG_SMP
> -#error need an architecture specific asm/spinlock.h
> -#endif
>   
> -#endif /* __ASM_GENERIC_SPINLOCK_H */
> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
> +#define __ASM_GENERIC_TICKET_LOCK_H
It is not conventional to use a macro name that is different from the 
header file name.
> +
> +#include <linux/atomic.h>
> +#include <asm-generic/spinlock_types.h>
> +
> +static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
> +	u16 ticket = val >> 16;
> +
> +	if (ticket == (u16)val)
> +		return;
> +
> +	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
> +}
> +
> +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> +{
> +	u32 old = atomic_read(lock);
> +
> +	if ((old >> 16) != (old & 0xffff))
> +		return false;
> +
> +	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
> +}
> +
> +static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> +	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
> +	u32 val = atomic_read(lock);
> +
> +	smp_store_release(ptr, (u16)val + 1);
> +}
> +
> +static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_read(lock);
> +
> +	return ((val >> 16) != (val & 0xffff));
> +}
> +
> +static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
> +{
> +	u32 val = atomic_read(lock);
> +
> +	return (s16)((val >> 16) - (val & 0xffff)) > 1;
> +}
> +
> +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> +	return !arch_spin_is_locked(&lock);
> +}
> +
> +#include <asm/qrwlock.h>
> +
> +#endif /* __ASM_GENERIC_TICKET_LOCK_H */
> diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
> new file mode 100644
> index 000000000000..e56ddb84d030
> --- /dev/null
> +++ b/include/asm-generic/spinlock_types.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
> +
> +#include <linux/types.h>
> +typedef atomic_t arch_spinlock_t;
> +
> +/*
> + * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
> + * include.
> + */
> +#include <asm/qrwlock_types.h>

I believe that if you guard the include line by

#ifdef CONFIG_QUEUED_RWLOCK
#include <asm/qrwlock_types.h>
#endif

You may not need to do the hack in patch 5.

You can also include <asm-generic/qrwlock_types.h> directly, without
importing it into include/asm.

Cheers,
Longman



* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-15  1:09   ` Boqun Feng
@ 2022-04-15  5:20     ` Palmer Dabbelt
  2022-04-17  2:44       ` Boqun Feng
  0 siblings, 1 reply; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-15  5:20 UTC (permalink / raw)
  To: boqun.feng
  Cc: Arnd Bergmann, heiko, guoren, shorne, peterz, mingo, Will Deacon,
	longman, jonas, stefan.kristiansson, Paul Walmsley, aou, macro,
	Greg KH, sudipm.mukherjee, wangkefeng.wang, jszhang, linux-csky,
	linux-kernel, openrisc, linux-riscv, linux-arch

On Thu, 14 Apr 2022 18:09:29 PDT (-0700), boqun.feng@gmail.com wrote:
> Hi,
>
> On Thu, Apr 14, 2022 at 03:02:08PM -0700, Palmer Dabbelt wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>> 
>> This is a simple, fair spinlock.  Specifically it doesn't have all the
>> subtle memory model dependencies that qspinlock has, which makes it more
>> suitable for simple systems as it is more likely to be correct.  It is
>> implemented entirely in terms of standard atomics and thus works fine
>> without any arch-specific code.
>> 
>> This replaces the existing asm-generic/spinlock.h, which just errored
>> out on SMP systems.
>> 
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>> ---
>>  include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
>>  include/asm-generic/spinlock_types.h | 17 ++++++
>>  2 files changed, 94 insertions(+), 8 deletions(-)
>>  create mode 100644 include/asm-generic/spinlock_types.h
>> 
>> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
>> index adaf6acab172..ca829fcb9672 100644
>> --- a/include/asm-generic/spinlock.h
>> +++ b/include/asm-generic/spinlock.h
>> @@ -1,12 +1,81 @@
>>  /* SPDX-License-Identifier: GPL-2.0 */
>> -#ifndef __ASM_GENERIC_SPINLOCK_H
>> -#define __ASM_GENERIC_SPINLOCK_H
>> +
>>  /*
>> - * You need to implement asm/spinlock.h for SMP support. The generic
>> - * version does not handle SMP.
>> + * 'Generic' ticket-lock implementation.
>> + *
>> + * It relies on atomic_fetch_add() having well defined forward progress
>> + * guarantees under contention. If your architecture cannot provide this, stick
>> + * to a test-and-set lock.
>> + *
>> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
>> + * sub-word of the value. This is generally true for anything LL/SC although
>> + * you'd be hard pressed to find anything useful in architecture specifications
>> + * about this. If your architecture cannot do this you might be better off with
>> + * a test-and-set.
>> + *
>> + * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
>> + * uses atomic_fetch_add() which is SC to create an RCsc lock.
>> + *
>> + * The implementation uses smp_cond_load_acquire() to spin, so if the
>> + * architecture has WFE like instructions to sleep instead of poll for word
>> + * modifications be sure to implement that (see ARM64 for example).
>> + *
>>   */
>> -#ifdef CONFIG_SMP
>> -#error need an architecture specific asm/spinlock.h
>> -#endif
>>  
>> -#endif /* __ASM_GENERIC_SPINLOCK_H */
>> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
>> +#define __ASM_GENERIC_TICKET_LOCK_H
>> +
>> +#include <linux/atomic.h>
>> +#include <asm-generic/spinlock_types.h>
>> +
>> +static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
>> +	u16 ticket = val >> 16;
>> +
>> +	if (ticket == (u16)val)
>> +		return;
>> +
>> +	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
>
> Looks like my follow-up comment is missing:
>
> 	https://lore.kernel.org/lkml/YjM+P32I4fENIqGV@boqun-archlinux/
>
> Basically, I suggested that 1) instead of "SC", use "fully-ordered", as
> that's a complete definition in our atomic API ("RCsc" is fine), and 2)
> introduce an RCsc atomic_cond_read_acquire() or add a full barrier here
> to make arch_spin_lock() RCsc; otherwise arch_spin_lock() is RCsc on the
> fastpath but RCpc on the slowpath.

Sorry about that, now that you mention it I remember seeing that comment 
but I guess I dropped it somehow -- I've been down a bunch of other 
RISC-V memory model rabbit holes lately, so I guess this just got lost 
in the shuffle.

I'm not really a memory model person, so I'm a bit confused here, but 
IIUC the issue is that there's only an RCpc ordering between the 
store_release that publishes the baker's ticket and the customer's spin 
to obtain a contested lock.  Thus we could see RCpc-legal accesses 
before the atomic_cond_read_acquire().

That's where I get a bit lost: the atomic_fetch_add() is RCsc, so the 
offending accesses are bounded to remain within arch_spin_lock().  I'm 
not sure how that lines up with the LKMM requirements, which I always 
see expressed in terms of the entire lock being RCsc (specifically with 
unlock->lock reordering weirdness, which the fully ordered AMO seems to 
prevent here).

That's kind of just a curiosity, though, so assuming we need some 
stronger ordering here I sort of considered this

    diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
    index ca829fcb9672..bf4e6050b9b2 100644
    --- a/include/asm-generic/spinlock.h
    +++ b/include/asm-generic/spinlock.h
    @@ -14,7 +14,7 @@
      * a test-and-set.
      *
      * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
    - * uses atomic_fetch_add() which is SC to create an RCsc lock.
    + * uses atomic_fetch_add_rcsc() which is RCsc to create an RCsc lock.
      *
      * The implementation uses smp_cond_load_acquire() to spin, so if the
      * architecture has WFE like instructions to sleep instead of poll for word
    @@ -30,13 +30,13 @@
    
     static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
     {
    	u32 val = atomic_fetch_add(1<<16, lock);
     	u16 ticket = val >> 16;
    
     	if (ticket == (u16)val)
     		return;
    
    -	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
    +	atomic_cond_read_rcsc(lock, ticket == (u16)VAL);
     }
    
     static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)

but that smells a bit awkward: it's not really that the access is RCsc, 
it's that the whole lock is, and the RCsc->branch->RCpc is just kind of 
screaming for arch-specific optimizations.  I think we either end up 
with some sort of "atomic_*_for_tspinlock" or a "mb_*_for_tspinlock", 
both of which seem very specific.

That, or we just run with the fence until someone has a concrete way to 
do it faster.  I don't know OpenRISC or C-SKY, but IIUC the full fence 
is free on RISC-V: our smp_cond_read_acquire() only emits read accesses, 
ends in a "fence r,r", and is proceeded by a full smp_mb() from 
atomic_fetch_add().  So I'd lean towards the much simpler

    diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
    index ca829fcb9672..08e3400a104f 100644
    --- a/include/asm-generic/spinlock.h
    +++ b/include/asm-generic/spinlock.h
    @@ -14,7 +14,9 @@
      * a test-and-set.
      *
      * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
    - * uses atomic_fetch_add() which is SC to create an RCsc lock.
    + * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
    + * a full fence after the spin to upgrade the otherwise-RCpc
    + * atomic_cond_read_acquire().
      *
      * The implementation uses smp_cond_load_acquire() to spin, so if the
      * architecture has WFE like instructions to sleep instead of poll for word
    @@ -30,13 +32,22 @@
    
     static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
     {
    -	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
    +	u32 val = atomic_fetch_add(1<<16, lock);
     	u16 ticket = val >> 16;
    
     	if (ticket == (u16)val)
     		return;
    
    +	/*
    +	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
    +	 * custom cond_read_rcsc() here we just emit a full fence.  We only
    +	 * need the prior reads before subsequent writes ordering from
    +	 * smp_mb(), but as atomic_cond_read_acquire() just emits reads and we
    +	 * have no outstanding writes due to the atomic_fetch_add() the extra
    +	 * orderings are free.
    +	 */
     	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
    +	smp_mb();
     }
    
     static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)

I'm also now worried about trylock, but am too far down this rabbit hole 
to try and figure out how try maps between locks and cmpxchg.  This is 
all way too complicated to squash in, though, so I'll definitely plan on 
a v4.

> Regards,
> Boqun
>
>> +}
>> +
>> +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> +	u32 old = atomic_read(lock);
>> +
>> +	if ((old >> 16) != (old & 0xffff))
>> +		return false;
>> +
>> +	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
>> +}
>> +
> [...]


* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-15  1:27   ` Waiman Long
@ 2022-04-15 16:46     ` Palmer Dabbelt
  2022-04-15 17:02       ` Waiman Long
  0 siblings, 1 reply; 15+ messages in thread
From: Palmer Dabbelt @ 2022-04-15 16:46 UTC (permalink / raw)
  To: longman
  Cc: Arnd Bergmann, heiko, guoren, shorne, peterz, mingo, Will Deacon,
	boqun.feng, jonas, stefan.kristiansson, Paul Walmsley, aou,
	macro, Greg KH, sudipm.mukherjee, wangkefeng.wang, jszhang,
	linux-csky, linux-kernel, openrisc, linux-riscv, linux-arch

On Thu, 14 Apr 2022 18:27:12 PDT (-0700), longman@redhat.com wrote:
> On 4/14/22 18:02, Palmer Dabbelt wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> This is a simple, fair spinlock.  Specifically it doesn't have all the
>> subtle memory model dependencies that qspinlock has, which makes it more
>> suitable for simple systems as it is more likely to be correct.  It is
>> implemented entirely in terms of standard atomics and thus works fine
>> without any arch-specific code.
>>
>> This replaces the existing asm-generic/spinlock.h, which just errored
>> out on SMP systems.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>> ---
>>   include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
>>   include/asm-generic/spinlock_types.h | 17 ++++++
>>   2 files changed, 94 insertions(+), 8 deletions(-)
>>   create mode 100644 include/asm-generic/spinlock_types.h
>>
>> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
>> index adaf6acab172..ca829fcb9672 100644
>> --- a/include/asm-generic/spinlock.h
>> +++ b/include/asm-generic/spinlock.h
>> @@ -1,12 +1,81 @@
>>   /* SPDX-License-Identifier: GPL-2.0 */
>> -#ifndef __ASM_GENERIC_SPINLOCK_H
>> -#define __ASM_GENERIC_SPINLOCK_H
>> +
>>   /*
>> - * You need to implement asm/spinlock.h for SMP support. The generic
>> - * version does not handle SMP.
>> + * 'Generic' ticket-lock implementation.
>> + *
>> + * It relies on atomic_fetch_add() having well defined forward progress
>> + * guarantees under contention. If your architecture cannot provide this, stick
>> + * to a test-and-set lock.
>> + *
>> + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
>> + * sub-word of the value. This is generally true for anything LL/SC although
>> + * you'd be hard pressed to find anything useful in architecture specifications
>> + * about this. If your architecture cannot do this you might be better off with
>> + * a test-and-set.
>> + *
>> + * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
>> + * uses atomic_fetch_add() which is SC to create an RCsc lock.
>> + *
>> + * The implementation uses smp_cond_load_acquire() to spin, so if the
>> + * architecture has WFE like instructions to sleep instead of poll for word
>> + * modifications be sure to implement that (see ARM64 for example).
>> + *
>>    */
>> -#ifdef CONFIG_SMP
>> -#error need an architecture specific asm/spinlock.h
>> -#endif
>>
>> -#endif /* __ASM_GENERIC_SPINLOCK_H */
>> +#ifndef __ASM_GENERIC_TICKET_LOCK_H
>> +#define __ASM_GENERIC_TICKET_LOCK_H
> It is not conventional to use a macro name that is different from the
> header file name.

Sorry, that was just a mistake: I renamed the header, but forgot to 
rename the guard.  I'll likely send a v4 due to Boqun's questions; I'll 
fix this as well.

>> +
>> +#include <linux/atomic.h>
>> +#include <asm-generic/spinlock_types.h>
>> +
>> +static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
>> +	u16 ticket = val >> 16;
>> +
>> +	if (ticket == (u16)val)
>> +		return;
>> +
>> +	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
>> +}
>> +
>> +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> +	u32 old = atomic_read(lock);
>> +
>> +	if ((old >> 16) != (old & 0xffff))
>> +		return false;
>> +
>> +	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
>> +}
>> +
>> +static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> +	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
>> +	u32 val = atomic_read(lock);
>> +
>> +	smp_store_release(ptr, (u16)val + 1);
>> +}
>> +
>> +static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
>> +{
>> +	u32 val = atomic_read(lock);
>> +
>> +	return ((val >> 16) != (val & 0xffff));
>> +}
>> +
>> +static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
>> +{
>> +	u32 val = atomic_read(lock);
>> +
>> +	return (s16)((val >> 16) - (val & 0xffff)) > 1;
>> +}
>> +
>> +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>> +{
>> +	return !arch_spin_is_locked(&lock);
>> +}
>> +
>> +#include <asm/qrwlock.h>
>> +
>> +#endif /* __ASM_GENERIC_TICKET_LOCK_H */
>> diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
>> new file mode 100644
>> index 000000000000..e56ddb84d030
>> --- /dev/null
>> +++ b/include/asm-generic/spinlock_types.h
>> @@ -0,0 +1,17 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
>> +#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
>> +
>> +#include <linux/types.h>
>> +typedef atomic_t arch_spinlock_t;
>> +
>> +/*
>> + * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
>> + * include.
>> + */
>> +#include <asm/qrwlock_types.h>
>
> I believe that if you guard the include line by
>
> #ifdef CONFIG_QUEUED_RWLOCKS
> #include <asm/qrwlock_types.h>
> #endif
>
> You may not need to do the hack in patch 5.

Yes, and we actually had it that way the first time around (specifically 
ARCH_USE_QUEUED_RWLOCKS, but IIUC that's the same here).  The goal 
was to avoid adding the ifdef to the asm-generic code and instead keep 
the oddness in arch/riscv; it's only there for that one commit (and just 
so we can split out the spinlock conversion from the rwlock conversion, 
in case there's a bug and these need to be bisected later).

I'd also considered renaming qrwlock* to rwlock*, which would avoid the 
ifdef and make it a touch easier to override the rwlock implementation, 
but that didn't seem useful enough to warrant the diff.  These all seem 
a bit more coupled than I expected them to be (both 
{spin,qrw}lock{,_types}.h and the bits in linux/); I looked into 
cleaning that up a bit, but it seemed like too much for just the one 
patch set.

> You can also directly include <asm-generic/qrwlock_types.h> without
> importing it into include/asm.

Yes, along with qrwlock.h (which has some unnecessary #include shims in 
a handful of arch dirs).  That's going to make the patch set bigger, so 
I'll include it in the v4.
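
(Concretely, the shims in question are hand-written arch headers that 
just forward to the generic one -- a hypothetical example, not any 
particular arch:

    /* arch/foo/include/asm/qrwlock.h */
    #include <asm-generic/qrwlock.h>

which a "generic-y += qrwlock.h" line in the arch Kbuild makes 
redundant.)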

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-15 16:46     ` Palmer Dabbelt
@ 2022-04-15 17:02       ` Waiman Long
  0 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2022-04-15 17:02 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, heiko, guoren, shorne, peterz, mingo, Will Deacon,
	boqun.feng, jonas, stefan.kristiansson, Paul Walmsley, aou,
	macro, Greg KH, sudipm.mukherjee, wangkefeng.wang, jszhang,
	linux-csky, linux-kernel, openrisc, linux-riscv, linux-arch

On 4/15/22 12:46, Palmer Dabbelt wrote:
>
>>> diff --git a/include/asm-generic/spinlock_types.h 
>>> b/include/asm-generic/spinlock_types.h
>>> new file mode 100644
>>> index 000000000000..e56ddb84d030
>>> --- /dev/null
>>> +++ b/include/asm-generic/spinlock_types.h
>>> @@ -0,0 +1,17 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +
>>> +#ifndef __ASM_GENERIC_TICKET_LOCK_TYPES_H
>>> +#define __ASM_GENERIC_TICKET_LOCK_TYPES_H
>>> +
>>> +#include <linux/types.h>
>>> +typedef atomic_t arch_spinlock_t;
>>> +
>>> +/*
>>> + * qrwlock_types depends on arch_spinlock_t, so we must typedef 
>>> that before the
>>> + * include.
>>> + */
>>> +#include <asm/qrwlock_types.h>
>>
>> I believe that if you guard the include line by
>>
>> #ifdef CONFIG_QUEUED_RWLOCKS
>> #include <asm/qrwlock_types.h>
>> #endif
>>
>> You may not need to do the hack in patch 5.
>
> Yes, and we actually had it that way the first time around 
> (specifically ARCH_USE_QUEUED_RWLOCKS, but IIUC that's the same 
> here).  The goal was to avoid adding the ifdef to the asm-generic code 
> and instead keep the oddness in arch/riscv; it's only there for that 
> one commit (and just so we can split out the spinlock conversion from 
> the rwlock conversion, in case there's a bug and these need to be 
> bisected later).
>
> I'd also considered renaming qrwlock* to rwlock*, which would avoid 
> the ifdef and make it a touch easier to override the rwlock 
> implementation, but that didn't seem useful enough to warrant the 
> diff.  These all seem a bit more coupled than I expected them to be 
> (both {spin,qrw}lock{,_types}.h and the bits in linux/); I looked into 
> cleaning that up a bit, but it seemed like too much for just the one 
> patch set.

Then you are forcing arches that use asm-generic/spinlock.h to use 
qrwlock as well.  Most of them probably will, but forcing it this way 
removes the flexibility an arch may want to have.

The difference between CONFIG_QUEUED_RWLOCKS and ARCH_USE_QUEUED_RWLOCKS 
is that qrwlock will not be compiled in when PREEMPT_RT || !SMP.  So 
CONFIG_QUEUED_RWLOCKS is the more accurate guard for whether qrwlock 
should really be used.
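
For reference, the relevant wiring in kernel/Kconfig.locks looks roughly 
like this (quoted from memory, so double-check the exact conditions):

    config ARCH_USE_QUEUED_RWLOCKS
    	bool

    config QUEUED_RWLOCKS
    	def_bool y if ARCH_USE_QUEUED_RWLOCKS
    	depends on SMP && !PREEMPT_RT

That is, an arch opts in by selecting ARCH_USE_QUEUED_RWLOCKS, but 
QUEUED_RWLOCKS only actually becomes y once SMP && !PREEMPT_RT holds.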

Cheers,
Longman


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock
  2022-04-15  5:20     ` Palmer Dabbelt
@ 2022-04-17  2:44       ` Boqun Feng
  0 siblings, 0 replies; 15+ messages in thread
From: Boqun Feng @ 2022-04-17  2:44 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, heiko, guoren, shorne, peterz, mingo, Will Deacon,
	longman, jonas, stefan.kristiansson, Paul Walmsley, aou, macro,
	Greg KH, sudipm.mukherjee, wangkefeng.wang, jszhang, linux-csky,
	linux-kernel, openrisc, linux-riscv, linux-arch

On Thu, Apr 14, 2022 at 10:20:04PM -0700, Palmer Dabbelt wrote:
> On Thu, 14 Apr 2022 18:09:29 PDT (-0700), boqun.feng@gmail.com wrote:
> > Hi,
> > 
> > On Thu, Apr 14, 2022 at 03:02:08PM -0700, Palmer Dabbelt wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > > 
> > > This is a simple, fair spinlock.  Specifically it doesn't have all the
> > > subtle memory model dependencies that qspinlock has, which makes it more
> > > suitable for simple systems as it is more likely to be correct.  It is
> > > implemented entirely in terms of standard atomics and thus works fine
> > > without any arch-specific code.
> > > 
> > > This replaces the existing asm-generic/spinlock.h, which just errored
> > > out on SMP systems.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > > ---
> > >  include/asm-generic/spinlock.h       | 85 +++++++++++++++++++++++++---
> > >  include/asm-generic/spinlock_types.h | 17 ++++++
> > >  2 files changed, 94 insertions(+), 8 deletions(-)
> > >  create mode 100644 include/asm-generic/spinlock_types.h
> > > 
> > > diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
> > > index adaf6acab172..ca829fcb9672 100644
> > > --- a/include/asm-generic/spinlock.h
> > > +++ b/include/asm-generic/spinlock.h
> > > @@ -1,12 +1,81 @@
> > >  /* SPDX-License-Identifier: GPL-2.0 */
> > > -#ifndef __ASM_GENERIC_SPINLOCK_H
> > > -#define __ASM_GENERIC_SPINLOCK_H
> > > +
> > >  /*
> > > - * You need to implement asm/spinlock.h for SMP support. The generic
> > > - * version does not handle SMP.
> > > + * 'Generic' ticket-lock implementation.
> > > + *
> > > + * It relies on atomic_fetch_add() having well defined forward progress
> > > + * guarantees under contention. If your architecture cannot provide this, stick
> > > + * to a test-and-set lock.
> > > + *
> > > + * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
> > > + * sub-word of the value. This is generally true for anything LL/SC although
> > > + * you'd be hard pressed to find anything useful in architecture specifications
> > > + * about this. If your architecture cannot do this you might be better off with
> > > + * a test-and-set.
> > > + *
> > > + * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
> > > + * uses atomic_fetch_add() which is SC to create an RCsc lock.
> > > + *
> > > + * The implementation uses smp_cond_load_acquire() to spin, so if the
> > > + * architecture has WFE like instructions to sleep instead of poll for word
> > > + * modifications be sure to implement that (see ARM64 for example).
> > > + *
> > >   */
> > > -#ifdef CONFIG_SMP
> > > -#error need an architecture specific asm/spinlock.h
> > > -#endif
> > > -#endif /* __ASM_GENERIC_SPINLOCK_H */
> > > +#ifndef __ASM_GENERIC_TICKET_LOCK_H
> > > +#define __ASM_GENERIC_TICKET_LOCK_H
> > > +
> > > +#include <linux/atomic.h>
> > > +#include <asm-generic/spinlock_types.h>
> > > +
> > > +static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> > > +{
> > > +	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
> > > +	u16 ticket = val >> 16;
> > > +
> > > +	if (ticket == (u16)val)
> > > +		return;
> > > +
> > > +	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
> > 
> > Looks like my follow-up comment is missing:
> > 
> > 	https://lore.kernel.org/lkml/YjM+P32I4fENIqGV@boqun-archlinux/
> > 
> > Basically, I suggested that 1) instead of "SC", use "fully-ordered",
> > as that's the complete definition in our atomic API ("RCsc" is fine),
> > and 2) introduce an RCsc atomic_cond_read_acquire() or add a full
> > barrier here to make arch_spin_lock() RCsc; otherwise arch_spin_lock()
> > is RCsc on the fastpath but RCpc on the slowpath.
> 
> Sorry about that; now that you mention it I remember seeing that comment,
> but I guess I dropped it somehow -- I've been down a bunch of other RISC-V
> memory model rabbit holes lately, so this just got lost in the shuffle.
> 
> I'm not really a memory model person, so I'm a bit confused here, but IIUC
> the issue is that there's only an RCpc ordering between the store_release
> that publishes the baker's ticket and the customer's spin to obtain a
> contested lock.  Thus we could see RCpc-legal accesses before the
> atomic_cond_read_acquire().
> 
> That's where I get a bit lost: the atomic_fetch_add() is RCsc, so the
> offending accesses are bounded to remain within arch_spin_lock().  I'm not
> sure how that lines up with the LKMM requirements, which I always see
> expressed in terms of the entire lock being RCsc (specifically with
> unlock->lock reordering weirdness, which the fully ordered AMO seems to
> prevent here).
> 

The case that I had in mind is as follow:

	CPU 0 			CPU 1
	=====			=====
	arch_spin_lock();
	// CPU 0 owns the lock
				arch_spin_lock():
				  atomic_fetch_add(); // fully-ordered
				  if (ticket == (u16)val) // false: not our turn yet, so spin
				  ...
				  atomic_cond_read_acquire():
				    while (true) {
	arch_spin_unlock(); // release
				    	atomic_read_acquire(); // RCpc
					// get the ticket
				    }

In that case the lock is RCpc, not RCsc, because our atomics are RCpc.  So
you will need to enforce the ordering if you want to make the generic
ticket lock RCsc.

> That's kind of just a curiosity, though, so assuming we need some stronger
> ordering here I sort of considered this
> 
>    diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
>    index ca829fcb9672..bf4e6050b9b2 100644
>    --- a/include/asm-generic/spinlock.h
>    +++ b/include/asm-generic/spinlock.h
>    @@ -14,7 +14,7 @@
>      * a test-and-set.
>      *
>      * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
>    - * uses atomic_fetch_add() which is SC to create an RCsc lock.
>    + * uses atomic_fetch_add_rcsc() which is RCsc to create an RCsc lock.
>      *
>      * The implementation uses smp_cond_load_acquire() to spin, so if the
>      * architecture has WFE like instructions to sleep instead of poll for word
>    @@ -30,13 +30,13 @@
>     static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>     {
>    	u32 val = atomic_fetch_add(1<<16, lock);
>     	u16 ticket = val >> 16;
>     	if (ticket == (u16)val)
>     		return;
>    -	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
>    +	atomic_cond_read_rcsc(lock, ticket == (u16)VAL);
>     }
>     static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> 
> but that smells a bit awkward: it's not really that the access is RCsc; it's

Yeah, agreed.

> that the whole lock is, and the RCsc->branch->RCpc is just kind of screaming
> for arch-specific optimizations.  I think we either end up with some sort of
> "atomic_*_for_tspinlock" or a "mb_*_for_tspinlock", both of which seem very
> specific.
> 
> That, or we just run with the fence until someone has a concrete way to do
> it faster.  I don't know OpenRISC or C-SKY, but IIUC the full fence is free
> on RISC-V: our smp_cond_read_acquire() only emits read accesses, ends in a
> "fence r,r", and is preceded by a full smp_mb() from atomic_fetch_add()
> (see the instruction sketch after the diff below).  So I'd lean towards the
> much simpler
> 
>    diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
>    index ca829fcb9672..08e3400a104f 100644
>    --- a/include/asm-generic/spinlock.h
>    +++ b/include/asm-generic/spinlock.h
>    @@ -14,7 +14,9 @@
>      * a test-and-set.
>      *
>      * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
>    - * uses atomic_fetch_add() which is SC to create an RCsc lock.
>    + * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
>    + * a full fence after the spin to upgrade the otherwise-RCpc
>    + * atomic_cond_read_acquire().
>      *
>      * The implementation uses smp_cond_load_acquire() to spin, so if the
>      * architecture has WFE like instructions to sleep instead of poll for word
>    @@ -30,13 +32,22 @@
>     static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
>     {
>    -	u32 val = atomic_fetch_add(1<<16, lock); /* SC, gives us RCsc */
>    +	u32 val = atomic_fetch_add(1<<16, lock);
>     	u16 ticket = val >> 16;
>     	if (ticket == (u16)val)
>     		return;
>    +	/*
>    +	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
>    +	 * custom cond_read_rcsc() here we just emit a full fence.  We only
>    +	 * need the prior reads before subsequent writes ordering from
>    +	 * smp_mb(), but as atomic_cond_read_acquire() just emits reads and we
>    +	 * have no outstanding writes due to the atomic_fetch_add() the extra
>    +	 * orderings are free.
>    +	 */
>     	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
>    +	smp_mb();
>     }
>     static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> 
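> For reference, roughly the instruction sequence I'd expect that to emit
> on RISC-V (a hypothetical sketch from memory of our barrier and atomic
> definitions, not a verified disassembly; register choices are made up):
> 
>         amoadd.w.aqrl  a0, a1, (a2)   # atomic_fetch_add(), fully ordered
>     1:  lw             a3, (a2)       # smp_cond_load_relaxed() loop body
>         ...                           # compare owner half-word to ticket
>         bne            a3, a4, 1b
>         fence          r, r          # acquire upgrade after the spin
>         fence          rw, rw        # the added smp_mb(); only reads sit
>                                      # between it and the AMO, so it's free
> 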

I like this version ;-)

> I'm also now worried about trylock, but am too far down this rabbit hole to
> try and figure out how "try" maps between locks and cmpxchg.  This is all way
> too complicated to squash in, though, so I'll definitely plan on a v4.
> 

trylock should be fine, since no one will use a failed trylock to
order something (famous last words though ;-)).

Regards,
Boqun

> > Regards,
> > Boqun
> > 
> > > +}
> > > +
> > > +static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
> > > +{
> > > +	u32 old = atomic_read(lock);
> > > +
> > > +	if ((old >> 16) != (old & 0xffff))
> > > +		return false;
> > > +
> > > +	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
> > > +}
> > > +
> > [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/7] openrisc: Move to ticket-spinlock
  2022-04-14 22:02 ` [PATCH v3 4/7] openrisc: Move to ticket-spinlock Palmer Dabbelt
@ 2022-04-30  7:52   ` Stafford Horne
  0 siblings, 0 replies; 15+ messages in thread
From: Stafford Horne @ 2022-04-30  7:52 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Arnd Bergmann, heiko, guoren, peterz, mingo, Will Deacon,
	longman, boqun.feng, jonas, stefan.kristiansson, Paul Walmsley,
	Palmer Dabbelt, aou, macro, Greg KH, sudipm.mukherjee,
	wangkefeng.wang, jszhang, linux-csky, linux-kernel, openrisc,
	linux-riscv, linux-arch

On Thu, Apr 14, 2022 at 03:02:11PM -0700, Palmer Dabbelt wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> We have no indications that openrisc meets the qspinlock requirements,
> so move to ticket-spinlock as that is more likely to be correct.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
>  arch/openrisc/Kconfig                      |  1 -
>  arch/openrisc/include/asm/Kbuild           |  5 ++--
>  arch/openrisc/include/asm/spinlock.h       | 27 ----------------------
>  arch/openrisc/include/asm/spinlock_types.h |  7 ------
>  4 files changed, 2 insertions(+), 38 deletions(-)
>  delete mode 100644 arch/openrisc/include/asm/spinlock.h
>  delete mode 100644 arch/openrisc/include/asm/spinlock_types.h
> 
> diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
> index 0d68adf6e02b..99f0e4a4cbbd 100644
> --- a/arch/openrisc/Kconfig
> +++ b/arch/openrisc/Kconfig
> @@ -30,7 +30,6 @@ config OPENRISC
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select OR1K_PIC
>  	select CPU_NO_EFFICIENT_FFS if !OPENRISC_HAVE_INST_FF1
> -	select ARCH_USE_QUEUED_SPINLOCKS
>  	select ARCH_USE_QUEUED_RWLOCKS
>  	select OMPIC if SMP
>  	select ARCH_WANT_FRAME_POINTERS
> diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
> index ca5987e11053..3386b9c1c073 100644
> --- a/arch/openrisc/include/asm/Kbuild
> +++ b/arch/openrisc/include/asm/Kbuild
> @@ -1,9 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0
>  generic-y += extable.h
>  generic-y += kvm_para.h
> -generic-y += mcs_spinlock.h
> -generic-y += qspinlock_types.h
> -generic-y += qspinlock.h
> +generic-y += spinlock_types.h
> +generic-y += spinlock.h
>  generic-y += qrwlock_types.h
>  generic-y += qrwlock.h
>  generic-y += user.h
> diff --git a/arch/openrisc/include/asm/spinlock.h b/arch/openrisc/include/asm/spinlock.h
> deleted file mode 100644
> index 264944a71535..000000000000
> --- a/arch/openrisc/include/asm/spinlock.h
> +++ /dev/null
> @@ -1,27 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-or-later */
> -/*
> - * OpenRISC Linux
> - *
> - * Linux architectural port borrowing liberally from similar works of
> - * others.  All original copyrights apply as per the original source
> - * declaration.
> - *
> - * OpenRISC implementation:
> - * Copyright (C) 2003 Matjaz Breskvar <phoenix@bsemi.com>
> - * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
> - * et al.
> - */
> -
> -#ifndef __ASM_OPENRISC_SPINLOCK_H
> -#define __ASM_OPENRISC_SPINLOCK_H
> -
> -#include <asm/qspinlock.h>
> -
> -#include <asm/qrwlock.h>
> -
> -#define arch_spin_relax(lock)	cpu_relax()
> -#define arch_read_relax(lock)	cpu_relax()
> -#define arch_write_relax(lock)	cpu_relax()
> -
> -
> -#endif
> diff --git a/arch/openrisc/include/asm/spinlock_types.h b/arch/openrisc/include/asm/spinlock_types.h
> deleted file mode 100644
> index 7c6fb1208c88..000000000000
> --- a/arch/openrisc/include/asm/spinlock_types.h
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -#ifndef _ASM_OPENRISC_SPINLOCK_TYPES_H
> -#define _ASM_OPENRISC_SPINLOCK_TYPES_H
> -
> -#include <asm/qspinlock_types.h>
> -#include <asm/qrwlock_types.h>
> -
> -#endif /* _ASM_OPENRISC_SPINLOCK_TYPES_H */

Thanks for this.  A bit late, but:

Acked-by: Stafford Horne <shorne@gmail.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-04-30  7:52 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-14 22:02 [PATCH v3 0/7] Generic Ticket Spinlocks Palmer Dabbelt
2022-04-14 22:02 ` [PATCH v3 1/7] asm-generic: ticket-lock: New generic ticket-based spinlock Palmer Dabbelt
2022-04-15  1:09   ` Boqun Feng
2022-04-15  5:20     ` Palmer Dabbelt
2022-04-17  2:44       ` Boqun Feng
2022-04-15  1:27   ` Waiman Long
2022-04-15 16:46     ` Palmer Dabbelt
2022-04-15 17:02       ` Waiman Long
2022-04-14 22:02 ` [PATCH v3 2/7] asm-generic: qspinlock: Indicate the use of mixed-size atomics Palmer Dabbelt
2022-04-14 22:02 ` [PATCH v3 3/7] asm-generic: qrwlock: Document the spinlock fairness requirements Palmer Dabbelt
2022-04-14 22:02 ` [PATCH v3 4/7] openrisc: Move to ticket-spinlock Palmer Dabbelt
2022-04-30  7:52   ` Stafford Horne
2022-04-14 22:02 ` [PATCH v3 5/7] RISC-V: Move to generic spinlocks Palmer Dabbelt
2022-04-14 22:02 ` [PATCH v3 6/7] RISC-V: Move to queued RW locks Palmer Dabbelt
2022-04-14 22:02 ` [PATCH v3 7/7] csky: Move to generic ticket-spinlock Palmer Dabbelt
