* [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup
@ 2022-08-08  7:13 guoren
  2022-08-08  7:13 ` [PATCH V9 01/15] asm-generic: ticket-lock: Remove unnecessary atomic_read guoren
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

In this series:
 - Clean up the generic ticket-lock code (using smp_mb__after_spinlock as RCsc)
 - Add qspinlock and combo-lock for riscv
 - Add qspinlock to openrisc
 - Use generic header in csky
 - Optimize cmpxchg & atomic code

Enable qspinlock and meet the requirements mentioned in a8ad07e5240c9
("asm-generic: qspinlock: Indicate the use of mixed-size atomics").

RISC-V LR/SC pairs provide either a strong or a weak forward-progress
guarantee, depending on the micro-architecture. The RISC-V ISA spec sets
out several constraints under which hardware can guarantee eventual
success of LR/SC pairs (RISC-V User ISA - 8.3 Eventual Success of
Store-Conditional Instructions).

For example, some RISC-V hardware such as BOOMv3 & XiangShan can provide
a strict & strong forward-progress guarantee (the cache line is held in
an exclusive state for a backoff period, and only an interrupt on the
owning core can break the LR/SC pair). QEMU's RISC-V target currently
gives only a weak forward-progress guarantee because of an incorrect
implementation [1].
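
For reference, the question is whether a constrained LR/SC retry loop of
the shape used throughout arch/riscv/include/asm/cmpxchg.h eventually
succeeds on a given core. A minimal standalone sketch (illustrative
only, not part of this series; the helper name is made up):

	static inline u32 lr_sc_cmpxchg32(u32 *ptr, u32 old, u32 new)
	{
		u32 ret, rc;

		__asm__ __volatile__ (
			"0:	lr.w %0, %2\n"		/* load-reserved */
			"	bne  %0, %z3, 1f\n"	/* bail out if *ptr != old */
			"	sc.w %1, %z4, %2\n"	/* store-conditional */
			"	bnez %1, 0b\n"		/* retry if the reservation was lost */
			"1:\n"
			: "=&r" (ret), "=&r" (rc), "+A" (*ptr)
			: "rJ" ((long)old), "rJ" (new)
			: "memory");

		return ret;
	}

Section 8.3 only promises eventual success for such constrained loops;
how easily other harts can steal the reservation in the meantime is what
differs between the micro-architectures above.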

So we add combo spinlock (ticket & queued) support for riscv, so that
processors with different micro-architectural properties (memory model,
forward-progress guarantee) can use the same kernel Image.

The first attempt to bring qspinlock to riscv was made in early 2019 [2].

[1] https://github.com/qemu/qemu/blob/master/target/riscv/insn_trans/trans_rva.c.inc
[2] https://lore.kernel.org/linux-riscv/20190211043829.30096-1-michaeljclark@mac.com/#r

Guo Ren (15):
  asm-generic: ticket-lock: Remove unnecessary atomic_read
  asm-generic: ticket-lock: Use the same struct definitions with qspinlock
  asm-generic: ticket-lock: Move into ticket_spinlock.h
  asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock
  asm-generic: spinlock: Add queued spinlock support in common header
  riscv: atomic: Clean up unnecessary acquire and release definitions
  riscv: cmpxchg: Remove xchg32 and xchg64
  riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit
  riscv: cmpxchg: Optimize cmpxchg64
  riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN*
  riscv: Add qspinlock support
  riscv: Add combo spinlock support
  openrisc: cmpxchg: Cleanup unnecessary codes
  openrisc: Move from ticket-lock to qspinlock
  csky: spinlock: Use the generic header files

 arch/csky/include/asm/Kbuild           |   2 +
 arch/csky/include/asm/spinlock.h       |  12 --
 arch/csky/include/asm/spinlock_types.h |   9 --
 arch/openrisc/Kconfig                  |   1 +
 arch/openrisc/include/asm/Kbuild       |   2 +
 arch/openrisc/include/asm/cmpxchg.h    | 192 ++++++++++---------------
 arch/riscv/Kconfig                     |  49 +++++++
 arch/riscv/include/asm/Kbuild          |   3 +-
 arch/riscv/include/asm/atomic.h        |  19 ---
 arch/riscv/include/asm/cmpxchg.h       | 177 +++++++----------------
 arch/riscv/include/asm/spinlock.h      |  77 ++++++++++
 arch/riscv/kernel/setup.c              |  22 +++
 include/asm-generic/spinlock.h         |  94 ++----------
 include/asm-generic/spinlock_types.h   |  12 +-
 include/asm-generic/ticket_spinlock.h  |  93 ++++++++++++
 15 files changed, 384 insertions(+), 380 deletions(-)
 delete mode 100644 arch/csky/include/asm/spinlock.h
 delete mode 100644 arch/csky/include/asm/spinlock_types.h
 create mode 100644 arch/riscv/include/asm/spinlock.h
 create mode 100644 include/asm-generic/ticket_spinlock.h

-- 
2.36.1



* [PATCH V9 01/15] asm-generic: ticket-lock: Remove unnecessary atomic_read
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 02/15] asm-generic: ticket-lock: Use the same struct definitions with qspinlock guoren
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Remove the unnecessary atomic_read() in arch_spin_value_unlocked(lock):
the value is already contained in the lock argument, which is passed by
value. This prevents arch_spin_value_unlocked() from touching, and
contending on, the spinlock's memory again.
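
The main in-tree beneficiary is (roughly) lib/lockref.c, which spins on
a copied snapshot of the lock word and must not touch the contended
cacheline again just to test it. A simplified excerpt of its
CMPXCHG_LOOP (the actual cmpxchg and retry limit are elided):

	old.lock_count = READ_ONCE(lockref->lock_count);
	while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {
		/*
		 * cmpxchg the whole { lock, count } pair; the lock word is
		 * only ever tested on the local copy in 'old'.
		 */
	}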

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 include/asm-generic/spinlock.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index fdfebcb050f4..90803a826ba0 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -68,11 +68,18 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 	smp_store_release(ptr, (u16)val + 1);
 }
 
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+	u32 val = lock.counter;
+
+	return ((val >> 16) == (val & 0xffff));
+}
+
 static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-	u32 val = atomic_read(lock);
+	arch_spinlock_t val = READ_ONCE(*lock);
 
-	return ((val >> 16) != (val & 0xffff));
+	return !arch_spin_value_unlocked(val);
 }
 
 static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
@@ -82,11 +89,6 @@ static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
 
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	return !arch_spin_is_locked(&lock);
-}
-
 #include <asm/qrwlock.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_H */
-- 
2.36.1



* [PATCH V9 02/15] asm-generic: ticket-lock: Use the same struct definitions with qspinlock
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
  2022-08-08  7:13 ` [PATCH V9 01/15] asm-generic: ticket-lock: Remove unnecessary atomic_read guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 03/15] asm-generic: ticket-lock: Move into ticket_spinlock.h guoren
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Let the ticket lock use the same struct definitions as qspinlock, so
that we can later move to a combo spinlock (ticket & queued combined).

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 include/asm-generic/spinlock.h       | 14 +++++++-------
 include/asm-generic/spinlock_types.h | 12 ++----------
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 90803a826ba0..4773334ee638 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -32,7 +32,7 @@
 
 static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-	u32 val = atomic_fetch_add(1<<16, lock);
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
 	u16 ticket = val >> 16;
 
 	if (ticket == (u16)val)
@@ -46,31 +46,31 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 	 * have no outstanding writes due to the atomic_fetch_add() the extra
 	 * orderings are free.
 	 */
-	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
 	smp_mb();
 }
 
 static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
 {
-	u32 old = atomic_read(lock);
+	u32 old = atomic_read(&lock->val);
 
 	if ((old >> 16) != (old & 0xffff))
 		return false;
 
-	return atomic_try_cmpxchg(lock, &old, old + (1<<16)); /* SC, for RCsc */
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
 }
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	smp_store_release(ptr, (u16)val + 1);
 }
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
-	u32 val = lock.counter;
+	u32 val = lock.val.counter;
 
 	return ((val >> 16) == (val & 0xffff));
 }
@@ -84,7 +84,7 @@ static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
 
 static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
-	u32 val = atomic_read(lock);
+	u32 val = atomic_read(&lock->val);
 
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }
diff --git a/include/asm-generic/spinlock_types.h b/include/asm-generic/spinlock_types.h
index 8962bb730945..f534aa5de394 100644
--- a/include/asm-generic/spinlock_types.h
+++ b/include/asm-generic/spinlock_types.h
@@ -3,15 +3,7 @@
 #ifndef __ASM_GENERIC_SPINLOCK_TYPES_H
 #define __ASM_GENERIC_SPINLOCK_TYPES_H
 
-#include <linux/types.h>
-typedef atomic_t arch_spinlock_t;
-
-/*
- * qrwlock_types depends on arch_spinlock_t, so we must typedef that before the
- * include.
- */
-#include <asm/qrwlock_types.h>
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	ATOMIC_INIT(0)
+#include <asm-generic/qspinlock_types.h>
+#include <asm-generic/qrwlock_types.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_TYPES_H */
-- 
2.36.1



* [PATCH V9 03/15] asm-generic: ticket-lock: Move into ticket_spinlock.h
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
  2022-08-08  7:13 ` [PATCH V9 01/15] asm-generic: ticket-lock: Remove unnecessary atomic_read guoren
  2022-08-08  7:13 ` [PATCH V9 02/15] asm-generic: ticket-lock: Use the same struct definitions with qspinlock guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 04/15] asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock guoren
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Move the ticket-lock definitions into an independent file. This is a
preparation patch for merging qspinlock into the asm-generic spinlock
header.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 include/asm-generic/spinlock.h        |  87 +---------------------
 include/asm-generic/ticket_spinlock.h | 103 ++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 86 deletions(-)
 create mode 100644 include/asm-generic/ticket_spinlock.h

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 4773334ee638..970590baf61b 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -1,94 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
-/*
- * 'Generic' ticket-lock implementation.
- *
- * It relies on atomic_fetch_add() having well defined forward progress
- * guarantees under contention. If your architecture cannot provide this, stick
- * to a test-and-set lock.
- *
- * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
- * sub-word of the value. This is generally true for anything LL/SC although
- * you'd be hard pressed to find anything useful in architecture specifications
- * about this. If your architecture cannot do this you might be better off with
- * a test-and-set.
- *
- * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
- * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
- * a full fence after the spin to upgrade the otherwise-RCpc
- * atomic_cond_read_acquire().
- *
- * The implementation uses smp_cond_load_acquire() to spin, so if the
- * architecture has WFE like instructions to sleep instead of poll for word
- * modifications be sure to implement that (see ARM64 for example).
- *
- */
-
 #ifndef __ASM_GENERIC_SPINLOCK_H
 #define __ASM_GENERIC_SPINLOCK_H
 
-#include <linux/atomic.h>
-#include <asm-generic/spinlock_types.h>
-
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	u32 val = atomic_fetch_add(1<<16, &lock->val);
-	u16 ticket = val >> 16;
-
-	if (ticket == (u16)val)
-		return;
-
-	/*
-	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
-	 * custom cond_read_rcsc() here we just emit a full fence.  We only
-	 * need the prior reads before subsequent writes ordering from
-	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
-	 * have no outstanding writes due to the atomic_fetch_add() the extra
-	 * orderings are free.
-	 */
-	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
-	smp_mb();
-}
-
-static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
-{
-	u32 old = atomic_read(&lock->val);
-
-	if ((old >> 16) != (old & 0xffff))
-		return false;
-
-	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
-	u32 val = atomic_read(&lock->val);
-
-	smp_store_release(ptr, (u16)val + 1);
-}
-
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	u32 val = lock.val.counter;
-
-	return ((val >> 16) == (val & 0xffff));
-}
-
-static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	arch_spinlock_t val = READ_ONCE(*lock);
-
-	return !arch_spin_value_unlocked(val);
-}
-
-static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	u32 val = atomic_read(&lock->val);
-
-	return (s16)((val >> 16) - (val & 0xffff)) > 1;
-}
-
+#include <asm-generic/ticket_spinlock.h>
 #include <asm/qrwlock.h>
 
 #endif /* __ASM_GENERIC_SPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
new file mode 100644
index 000000000000..cfcff22b37b3
--- /dev/null
+++ b/include/asm-generic/ticket_spinlock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * 'Generic' ticket-lock implementation.
+ *
+ * It relies on atomic_fetch_add() having well defined forward progress
+ * guarantees under contention. If your architecture cannot provide this, stick
+ * to a test-and-set lock.
+ *
+ * It also relies on atomic_fetch_add() being safe vs smp_store_release() on a
+ * sub-word of the value. This is generally true for anything LL/SC although
+ * you'd be hard pressed to find anything useful in architecture specifications
+ * about this. If your architecture cannot do this you might be better off with
+ * a test-and-set.
+ *
+ * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
+ * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
+ * a full fence after the spin to upgrade the otherwise-RCpc
+ * atomic_cond_read_acquire().
+ *
+ * The implementation uses smp_cond_load_acquire() to spin, so if the
+ * architecture has WFE like instructions to sleep instead of poll for word
+ * modifications be sure to implement that (see ARM64 for example).
+ *
+ */
+
+#ifndef __ASM_GENERIC_TICKET_SPINLOCK_H
+#define __ASM_GENERIC_TICKET_SPINLOCK_H
+
+#include <linux/atomic.h>
+#include <asm-generic/spinlock_types.h>
+
+static __always_inline void ticket_spin_lock(arch_spinlock_t *lock)
+{
+	u32 val = atomic_fetch_add(1<<16, &lock->val);
+	u16 ticket = val >> 16;
+
+	if (ticket == (u16)val)
+		return;
+
+	/*
+	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
+	 * custom cond_read_rcsc() here we just emit a full fence.  We only
+	 * need the prior reads before subsequent writes ordering from
+	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
+	 * have no outstanding writes due to the atomic_fetch_add() the extra
+	 * orderings are free.
+	 */
+	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
+	smp_mb();
+}
+
+static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
+{
+	u32 old = atomic_read(&lock->val);
+
+	if ((old >> 16) != (old & 0xffff))
+		return false;
+
+	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
+}
+
+static __always_inline void ticket_spin_unlock(arch_spinlock_t *lock)
+{
+	u16 *ptr = (u16 *)lock + IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+	u32 val = atomic_read(&lock->val);
+
+	smp_store_release(ptr, (u16)val + 1);
+}
+
+static __always_inline int ticket_spin_value_unlocked(arch_spinlock_t lock)
+{
+	u32 val = lock.val.counter;
+
+	return ((val >> 16) == (val & 0xffff));
+}
+
+static __always_inline int ticket_spin_is_locked(arch_spinlock_t *lock)
+{
+	arch_spinlock_t val = READ_ONCE(*lock);
+
+	return !ticket_spin_value_unlocked(val);
+}
+
+static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
+{
+	u32 val = atomic_read(&lock->val);
+
+	return (s16)((val >> 16) - (val & 0xffff)) > 1;
+}
+
+/*
+ * Remapping spinlock architecture specific functions to the corresponding
+ * ticket spinlock functions.
+ */
+#define arch_spin_is_locked(l)		ticket_spin_is_locked(l)
+#define arch_spin_is_contended(l)	ticket_spin_is_contended(l)
+#define arch_spin_value_unlocked(l)	ticket_spin_value_unlocked(l)
+#define arch_spin_lock(l)		ticket_spin_lock(l)
+#define arch_spin_trylock(l)		ticket_spin_trylock(l)
+#define arch_spin_unlock(l)		ticket_spin_unlock(l)
+
+#endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
-- 
2.36.1



* [PATCH V9 04/15] asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (2 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 03/15] asm-generic: ticket-lock: Move into ticket_spinlock.h guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 05/15] asm-generic: spinlock: Add queued spinlock support in common header guoren
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Define smp_mb__after_spinlock() as smp_mb() by default, so that every
architecture gets an RCsc synchronization point after taking a lock.
This keeps the ticket lock at the same semantics as qspinlock: the lock
acquisition itself is only an acquire (RCpc) synchronization point. For
more detail, see include/linux/spinlock.h.

Some architectures can implement smp_mb__after_spinlock() with something
better suited than a plain smp_mb(), e.g. riscv. Others don't need
smp_mb__after_spinlock() at all because their spinlock acquisition
already contains an RCsc barrier.
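
As a rough illustration of what the full barrier buys (adapted and
simplified from the comment in include/linux/spinlock.h):

	/* { X = 0; Y = 0; } */

	CPU0				CPU1

	WRITE_ONCE(X, 1);		WRITE_ONCE(Y, 1);
	spin_lock(&s);			smp_mb();
	smp_mb__after_spinlock();	r1 = READ_ONCE(X);
	r0 = READ_ONCE(Y);
	spin_unlock(&s);

The outcome r0 == 0 && r1 == 0 is forbidden; an acquire-only (RCpc) lock
acquisition without the full barrier would not be enough to forbid it.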

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/asm-generic/spinlock.h        |  5 +++++
 include/asm-generic/ticket_spinlock.h | 18 ++++--------------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 970590baf61b..6f5a1b838ca2 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -6,4 +6,9 @@
 #include <asm-generic/ticket_spinlock.h>
 #include <asm/qrwlock.h>
 
+/* See include/linux/spinlock.h */
+#ifndef smp_mb__after_spinlock
+#define smp_mb__after_spinlock()	smp_mb()
+#endif
+
 #endif /* __ASM_GENERIC_SPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
index cfcff22b37b3..d8e6ec82f096 100644
--- a/include/asm-generic/ticket_spinlock.h
+++ b/include/asm-generic/ticket_spinlock.h
@@ -14,9 +14,8 @@
  * a test-and-set.
  *
  * It further assumes atomic_*_release() + atomic_*_acquire() is RCpc and hence
- * uses atomic_fetch_add() which is RCsc to create an RCsc hot path, along with
- * a full fence after the spin to upgrade the otherwise-RCpc
- * atomic_cond_read_acquire().
+ * uses smp_mb__after_spinlock which is RCsc to create an RCsc hot path, See
+ * include/linux/spinlock.h
  *
  * The implementation uses smp_cond_load_acquire() to spin, so if the
  * architecture has WFE like instructions to sleep instead of poll for word
@@ -32,22 +31,13 @@
 
 static __always_inline void ticket_spin_lock(arch_spinlock_t *lock)
 {
-	u32 val = atomic_fetch_add(1<<16, &lock->val);
+	u32 val = atomic_fetch_add_acquire(1<<16, &lock->val);
 	u16 ticket = val >> 16;
 
 	if (ticket == (u16)val)
 		return;
 
-	/*
-	 * atomic_cond_read_acquire() is RCpc, but rather than defining a
-	 * custom cond_read_rcsc() here we just emit a full fence.  We only
-	 * need the prior reads before subsequent writes ordering from
-	 * smb_mb(), but as atomic_cond_read_acquire() just emits reads and we
-	 * have no outstanding writes due to the atomic_fetch_add() the extra
-	 * orderings are free.
-	 */
 	atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
-	smp_mb();
 }
 
 static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
@@ -57,7 +47,7 @@ static __always_inline bool ticket_spin_trylock(arch_spinlock_t *lock)
 	if ((old >> 16) != (old & 0xffff))
 		return false;
 
-	return atomic_try_cmpxchg(&lock->val, &old, old + (1<<16)); /* SC, for RCsc */
+	return atomic_try_cmpxchg_acquire(&lock->val, &old, old + (1<<16));
 }
 
 static __always_inline void ticket_spin_unlock(arch_spinlock_t *lock)
-- 
2.36.1



* [PATCH V9 05/15] asm-generic: spinlock: Add queued spinlock support in common header
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (3 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 04/15] asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 06/15] riscv: atomic: Clean up unnecessary acquire and release definitions guoren
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Select the queued spinlock or the ticket lock via CONFIG_QUEUED_SPINLOCKS
in the common header file. smp_mb__after_spinlock() stays defined as
smp_mb() by default.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 include/asm-generic/spinlock.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index 6f5a1b838ca2..349cdb46a99c 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -3,7 +3,11 @@
 #ifndef __ASM_GENERIC_SPINLOCK_H
 #define __ASM_GENERIC_SPINLOCK_H
 
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock.h>
+#else
 #include <asm-generic/ticket_spinlock.h>
+#endif
 #include <asm/qrwlock.h>
 
 /* See include/linux/spinlock.h */
-- 
2.36.1



* [PATCH V9 06/15] riscv: atomic: Clean up unnecessary acquire and release definitions
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (4 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 05/15] asm-generic: spinlock: Add queued spinlock support in common header guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 07/15] riscv: cmpxchg: Remove xchg32 and xchg64 guoren
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Clean up the unnecessary xchg_acquire, xchg_release, and cmpxchg_release
custom definitions; the generic implementations generate the same code
as the riscv custom ones.

Before the patch:
000000000000024e <.LBB238>:
                ops = xchg_acquire(pending_ipis, 0);
 24e:   089937af                amoswap.d       a5,s1,(s2)
 252:   0230000f                fence   r,rw

0000000000000256 <.LBB243>:
                ops = xchg_release(pending_ipis, 0);
 256:   0310000f                fence   rw,w
 25a:   089934af                amoswap.d       s1,s1,(s2)

After the patch:
000000000000026e <.LBB245>:
                ops = xchg_acquire(pending_ipis, 0);
 26e:   089937af                amoswap.d       a5,s1,(s2)

0000000000000272 <.LBE247>:
 272:   0230000f                fence   r,rw

0000000000000276 <.LBB249>:
                ops = xchg_release(pending_ipis, 0);
 276:   0310000f                fence   rw,w

000000000000027a <.LBB251>:
 27a:   089934af                amoswap.d       s1,s1,(s2)

Only the custom cmpxchg_acquire is still necessary: it avoids the
acquire ordering when the value returned by lr differs from old (the
failure case).
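
For reference, the kept __cmpxchg_acquire places the acquire barrier
before the failure label, so it only executes when the sc path is taken
(simplified 32-bit excerpt, unchanged by this patch):

	"0:	lr.w %0, %2\n"
	"	bne  %0, %z3, 1f\n"	/* mismatch: skip the barrier */
	"	sc.w %1, %z4, %2\n"
	"	bnez %1, 0b\n"
	RISCV_ACQUIRE_BARRIER		/* only reached on success */
	"1:\n"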

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/atomic.h  |  19 -----
 arch/riscv/include/asm/cmpxchg.h | 116 -------------------------------
 2 files changed, 135 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 0dfe9d857a76..83636320ba95 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -249,16 +249,6 @@ c_t arch_atomic##prefix##_xchg_relaxed(atomic##prefix##_t *v, c_t n)	\
 	return __xchg_relaxed(&(v->counter), n, size);			\
 }									\
 static __always_inline							\
-c_t arch_atomic##prefix##_xchg_acquire(atomic##prefix##_t *v, c_t n)	\
-{									\
-	return __xchg_acquire(&(v->counter), n, size);			\
-}									\
-static __always_inline							\
-c_t arch_atomic##prefix##_xchg_release(atomic##prefix##_t *v, c_t n)	\
-{									\
-	return __xchg_release(&(v->counter), n, size);			\
-}									\
-static __always_inline							\
 c_t arch_atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)		\
 {									\
 	return __xchg(&(v->counter), n, size);				\
@@ -276,12 +266,6 @@ c_t arch_atomic##prefix##_cmpxchg_acquire(atomic##prefix##_t *v,	\
 	return __cmpxchg_acquire(&(v->counter), o, n, size);		\
 }									\
 static __always_inline							\
-c_t arch_atomic##prefix##_cmpxchg_release(atomic##prefix##_t *v,	\
-				     c_t o, c_t n)			\
-{									\
-	return __cmpxchg_release(&(v->counter), o, n, size);		\
-}									\
-static __always_inline							\
 c_t arch_atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n)	\
 {									\
 	return __cmpxchg(&(v->counter), o, n, size);			\
@@ -299,12 +283,9 @@ c_t arch_atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n)	\
 ATOMIC_OPS()
 
 #define arch_atomic_xchg_relaxed	arch_atomic_xchg_relaxed
-#define arch_atomic_xchg_acquire	arch_atomic_xchg_acquire
-#define arch_atomic_xchg_release	arch_atomic_xchg_release
 #define arch_atomic_xchg		arch_atomic_xchg
 #define arch_atomic_cmpxchg_relaxed	arch_atomic_cmpxchg_relaxed
 #define arch_atomic_cmpxchg_acquire	arch_atomic_cmpxchg_acquire
-#define arch_atomic_cmpxchg_release	arch_atomic_cmpxchg_release
 #define arch_atomic_cmpxchg		arch_atomic_cmpxchg
 
 #undef ATOMIC_OPS
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 12debce235e5..67ab6375b650 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -44,76 +44,6 @@
 					    _x_, sizeof(*(ptr)));	\
 })
 
-#define __xchg_acquire(ptr, new, size)					\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.w %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			"	amoswap.d %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
-
-#define arch_xchg_acquire(ptr, x)					\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __xchg_acquire((ptr),			\
-					    _x_, sizeof(*(ptr)));	\
-})
-
-#define __xchg_release(ptr, new, size)					\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(new) __new = (new);					\
-	__typeof__(*(ptr)) __ret;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.w %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.d %0, %2, %1\n"			\
-			: "=r" (__ret), "+A" (*__ptr)			\
-			: "r" (__new)					\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
-
-#define arch_xchg_release(ptr, x)					\
-({									\
-	__typeof__(*(ptr)) _x_ = (x);					\
-	(__typeof__(*(ptr))) __xchg_release((ptr),			\
-					    _x_, sizeof(*(ptr)));	\
-})
-
 #define __xchg(ptr, new, size)						\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
@@ -253,52 +183,6 @@
 					_o_, _n_, sizeof(*(ptr)));	\
 })
 
-#define __cmpxchg_release(ptr, old, new, size)				\
-({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __ret;					\
-	register unsigned int __rc;					\
-	switch (size) {							\
-	case 4:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"0:	lr.w %0, %2\n"				\
-			"	bne  %0, %z3, 1f\n"			\
-			"	sc.w %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" ((long)__old), "rJ" (__new)		\
-			: "memory");					\
-		break;							\
-	case 8:								\
-		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"0:	lr.d %0, %2\n"				\
-			"	bne %0, %z3, 1f\n"			\
-			"	sc.d %1, %z4, %2\n"			\
-			"	bnez %1, 0b\n"				\
-			"1:\n"						\
-			: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr)	\
-			: "rJ" (__old), "rJ" (__new)			\
-			: "memory");					\
-		break;							\
-	default:							\
-		BUILD_BUG();						\
-	}								\
-	__ret;								\
-})
-
-#define arch_cmpxchg_release(ptr, o, n)					\
-({									\
-	__typeof__(*(ptr)) _o_ = (o);					\
-	__typeof__(*(ptr)) _n_ = (n);					\
-	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			\
-					_o_, _n_, sizeof(*(ptr)));	\
-})
-
 #define __cmpxchg(ptr, old, new, size)					\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
-- 
2.36.1



* [PATCH V9 07/15] riscv: cmpxchg: Remove xchg32 and xchg64
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (5 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 06/15] riscv: atomic: Clean up unnecessary acquire and release definitions guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 08/15] riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit guoren
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

The xchg32 and xchg64 are unused, so remove them.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/cmpxchg.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 67ab6375b650..567ed2e274c4 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -76,18 +76,6 @@
 	(__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr)));	\
 })
 
-#define xchg32(ptr, x)							\
-({									\
-	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
-	arch_xchg((ptr), (x));						\
-})
-
-#define xchg64(ptr, x)							\
-({									\
-	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
-	arch_xchg((ptr), (x));						\
-})
-
 /*
  * Atomic compare and exchange.  Compare OLD with MEM, if identical,
  * store NEW in MEM.  Return the initial value in MEM.  Success is
-- 
2.36.1



* [PATCH V9 08/15] riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (6 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 07/15] riscv: cmpxchg: Remove xchg32 and xchg64 guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 09/15] riscv: cmpxchg: Optimize cmpxchg64 guoren
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

32-bit RISC-V has no lr.d/sc.d instructions, so using arch_cmpxchg64
there would be an error. Add BUILD_BUG_ON() checks to forbid it at
compile time.
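
With this change, a (hypothetical) 64-bit caller on rv32 fails at build
time instead of emitting unsupported lr.d/sc.d sequences:

	u64 v = 0;

	arch_cmpxchg(&v, 0, 1);		/* size == 8: hits the new BUILD_BUG_ON() */
	arch_cmpxchg64(&v, 0, 1);	/* no longer defined when !CONFIG_64BIT */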

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/cmpxchg.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 567ed2e274c4..14c9280c7f7f 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -25,6 +25,7 @@
 			: "memory");					\
 		break;							\
 	case 8:								\
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT));			\
 		__asm__ __volatile__ (					\
 			"	amoswap.d %0, %2, %1\n"			\
 			: "=r" (__ret), "+A" (*__ptr)			\
@@ -58,6 +59,7 @@
 			: "memory");					\
 		break;							\
 	case 8:								\
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT));			\
 		__asm__ __volatile__ (					\
 			"	amoswap.d.aqrl %0, %2, %1\n"		\
 			: "=r" (__ret), "+A" (*__ptr)			\
@@ -101,6 +103,7 @@
 			: "memory");					\
 		break;							\
 	case 8:								\
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT));			\
 		__asm__ __volatile__ (					\
 			"0:	lr.d %0, %2\n"				\
 			"	bne %0, %z3, 1f\n"			\
@@ -146,6 +149,7 @@
 			: "memory");					\
 		break;							\
 	case 8:								\
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT));			\
 		__asm__ __volatile__ (					\
 			"0:	lr.d %0, %2\n"				\
 			"	bne %0, %z3, 1f\n"			\
@@ -192,6 +196,7 @@
 			: "memory");					\
 		break;							\
 	case 8:								\
+		BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT));			\
 		__asm__ __volatile__ (					\
 			"0:	lr.d %0, %2\n"				\
 			"	bne %0, %z3, 1f\n"			\
@@ -220,6 +225,7 @@
 #define arch_cmpxchg_local(ptr, o, n)					\
 	(__cmpxchg_relaxed((ptr), (o), (n), sizeof(*(ptr))))
 
+#ifdef CONFIG_64BIT
 #define arch_cmpxchg64(ptr, o, n)					\
 ({									\
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
@@ -231,5 +237,6 @@
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	arch_cmpxchg_relaxed((ptr), (o), (n));				\
 })
+#endif /* CONFIG_64BIT */
 
 #endif /* _ASM_RISCV_CMPXCHG_H */
-- 
2.36.1



* [PATCH V9 09/15] riscv: cmpxchg: Optimize cmpxchg64
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (7 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 08/15] riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 10/15] riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN* guoren
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Optimize cmpxchg64 by providing relaxed, acquire, and release variants.
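
A (hypothetical) 64-bit user can now request just the ordering it needs
instead of the fully ordered arch_cmpxchg64(), e.g.:

	u64 seq = 0, old = 0;

	/* no ordering beyond the atomicity of the operation itself */
	old = arch_cmpxchg64_relaxed(&seq, old, old + 1);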

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/cmpxchg.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 14c9280c7f7f..4b5fa25f4336 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -226,6 +226,24 @@
 	(__cmpxchg_relaxed((ptr), (o), (n), sizeof(*(ptr))))
 
 #ifdef CONFIG_64BIT
+#define arch_cmpxchg64_relaxed(ptr, o, n)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	arch_cmpxchg_relaxed((ptr), (o), (n));				\
+})
+
+#define arch_cmpxchg64_acquire(ptr, o, n)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	arch_cmpxchg_acquire((ptr), (o), (n));				\
+})
+
+#define arch_cmpxchg64_release(ptr, o, n)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	arch_cmpxchg_release((ptr), (o), (n));				\
+})
+
 #define arch_cmpxchg64(ptr, o, n)					\
 ({									\
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
-- 
2.36.1



* [PATCH V9 10/15] riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN*
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (8 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 09/15] riscv: cmpxchg: Optimize cmpxchg64 guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 11/15] riscv: Add qspinlock support guoren
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Enable ARCH_INLINE_READ*/WRITE*/SPIN* when !PREEMPTION, copied from
arch/arm64. Inlining the lock operations removes function-call overhead
and improves performance.
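
The effect, roughly, is that the spin_lock() family now expands at the
call site instead of calling into kernel/locking/spinlock.c. For example,
with CONFIG_INLINE_SPIN_LOCK enabled (derived from the ARCH_INLINE_*
selects above), include/linux/spinlock_api_smp.h maps

	#define _raw_spin_lock(lock) __raw_spin_lock(lock)

so __raw_spin_lock(), and with it arch_spin_lock(), is inlined into the
caller.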

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 51713e03c934..c3ca23bc6352 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -32,6 +32,32 @@ config RISCV
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_INLINE_READ_LOCK if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
 	select ARCH_STACKWALK
-- 
2.36.1



* [PATCH V9 11/15] riscv: Add qspinlock support
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (9 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 10/15] riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN* guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 12/15] riscv: Add combo spinlock support guoren
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Enable qspinlock according to the requirements mentioned in a8ad07e5240c9
("asm-generic: qspinlock: Indicate the use of mixed-size atomics").

 - RISC-V atomic_*_release()/atomic_*_acquire() are implemented as the
   relaxed versions plus acquire/release fences for RCsc
   synchronization.

 - RISC-V LR/SC pairs provide either a strong or a weak forward-progress
   guarantee, depending on the micro-architecture. The RISC-V ISA spec
   sets out several constraints under which hardware can guarantee
   eventual success of LR/SC pairs (RISC-V User ISA - 8.3 Eventual
   Success of Store-Conditional Instructions). Some riscv cores such as
   BOOMv3 & XiangShan can provide a strict & strong forward-progress
   guarantee (the cache line is held in an exclusive state for a backoff
   period, and only an interrupt on the owning core can break the LR/SC
   pair).

 - RISC-V provides cheap atomic_fetch_or_acquire() with RCsc.

 - RISC-V provides only a relaxed xchg16 to support qspinlock's tail
   update (see the sketch below).
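
The 16-bit case matters because qspinlock's xchg_tail() (with
_Q_PENDING_BITS == 8) performs a relaxed exchange on the tail halfword
of the lock word; roughly, simplified from kernel/locking/qspinlock.c:

	static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
	{
		/* this 16-bit xchg_relaxed() is what lands in __xchg16_relaxed() */
		return (u32)xchg_relaxed(&lock->tail,
					 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
	}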

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig               | 16 ++++++++++++++++
 arch/riscv/include/asm/Kbuild    |  2 ++
 arch/riscv/include/asm/cmpxchg.h | 24 ++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c3ca23bc6352..8b36a4307d03 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -359,6 +359,22 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accommodate various tables.
 
+choice
+	prompt "RISC-V spinlock type"
+	default RISCV_TICKET_SPINLOCKS
+
+config RISCV_TICKET_SPINLOCKS
+	bool "Using ticket spinlock"
+
+config RISCV_QUEUED_SPINLOCKS
+	bool "Using queued spinlock"
+	depends on SMP && MMU
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  Make sure your micro arch LL/SC has a strong forward progress guarantee.
+	  Otherwise, stay at ticket-lock.
+endchoice
+
 config RISCV_ALTERNATIVE
 	bool
 	depends on !XIP_KERNEL
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 504f8b7e72d4..2cce98c7b653 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -2,7 +2,9 @@
 generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
+generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += qspinlock.h
 generic-y += spinlock.h
 generic-y += spinlock_types.h
 generic-y += qrwlock.h
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4b5fa25f4336..2ba88057db52 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,12 +11,36 @@
 #include <asm/barrier.h>
 #include <asm/fence.h>
 
+static inline ulong __xchg16_relaxed(ulong new, void *ptr)
+{
+	ulong ret, tmp;
+	ulong shif = ((ulong)ptr & 2) ? 16 : 0;
+	ulong mask = 0xffff << shif;
+	ulong *__ptr = (ulong *)((ulong)ptr & ~2);
+
+	__asm__ __volatile__ (
+		"0:	lr.w %0, %2\n"
+		"	and  %1, %0, %z3\n"
+		"	or   %1, %1, %z4\n"
+		"	sc.w %1, %1, %2\n"
+		"	bnez %1, 0b\n"
+		: "=&r" (ret), "=&r" (tmp), "+A" (*__ptr)
+		: "rJ" (~mask), "rJ" (new << shif)
+		: "memory");
+
+	return (ulong)((ret & mask) >> shif);
+}
+
 #define __xchg_relaxed(ptr, new, size)					\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 2:	{							\
+		__ret = (__typeof__(*(ptr)))				\
+			__xchg16_relaxed((ulong)__new, __ptr);		\
+		break;}							\
 	case 4:								\
 		__asm__ __volatile__ (					\
 			"	amoswap.w %0, %2, %1\n"			\
-- 
2.36.1



* [PATCH V9 12/15] riscv: Add combo spinlock support
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (10 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 11/15] riscv: Add qspinlock support guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 13/15] openrisc: cmpxchg: Cleanup unnecessary codes guoren
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

The combo spinlock supports both queued and ticket locks in one Linux
Image, selected at boot time via a command-line option. Here is the
function-size (bytes) comparison table:

TYPE			: COMBO | TICKET | QUEUED
arch_spin_lock		: 106	| 60     | 50
arch_spin_unlock	: 54    | 36     | 26
arch_spin_trylock	: 110   | 72     | 54
arch_spin_is_locked	: 48    | 34     | 20
arch_spin_is_contended	: 56    | 40     | 24
arch_spin_value_unlocked: 48    | 34     | 24

One example of disassemble combo arch_spin_unlock:
   0xffffffff8000409c <+14>:    nop # jump label slot
   0xffffffff800040a0 <+18>:    fence   rw,w # queued spinlock start
   0xffffffff800040a4 <+22>:    sb      zero,0(a4) # queued spinlock end
   0xffffffff800040a8 <+26>:    ld      s0,8(sp)
   0xffffffff800040aa <+28>:    addi    sp,sp,16
   0xffffffff800040ac <+30>:    ret
   0xffffffff800040ae <+32>:    lw      a5,0(a4) # ticket spinlock start
   0xffffffff800040b0 <+34>:    sext.w  a5,a5
   0xffffffff800040b2 <+36>:    fence   rw,w
   0xffffffff800040b6 <+40>:    addiw   a5,a5,1
   0xffffffff800040b8 <+42>:    slli    a5,a5,0x30
   0xffffffff800040ba <+44>:    srli    a5,a5,0x30
   0xffffffff800040bc <+46>:    sh      a5,0(a4) # ticket spinlock end
   0xffffffff800040c0 <+50>:    ld      s0,8(sp)
   0xffffffff800040c2 <+52>:    addi    sp,sp,16
   0xffffffff800040c4 <+54>:    ret

The qspinlock is smaller and faster than the ticket lock when everything
stays on the fast path, and the combo spinlock lets one compatible Linux
Image run on processors with different micro-architectural designs (weak
or strict forward-progress guarantee).
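
With CONFIG_RISCV_COMBO_SPINLOCKS=y the kernel defaults to the ticket
paths; passing the new "qspinlock" early parameter keeps the static key
enabled and switches to the queued paths, e.g. (illustrative command
line, not from this series):

	console=ttyS0 root=/dev/vda rw qspinlock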

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig                |  9 +++-
 arch/riscv/include/asm/Kbuild     |  1 -
 arch/riscv/include/asm/spinlock.h | 77 +++++++++++++++++++++++++++++++
 arch/riscv/kernel/setup.c         | 22 +++++++++
 4 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/include/asm/spinlock.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8b36a4307d03..6645f04c7da4 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -361,7 +361,7 @@ config NODES_SHIFT
 
 choice
 	prompt "RISC-V spinlock type"
-	default RISCV_TICKET_SPINLOCKS
+	default RISCV_COMBO_SPINLOCKS
 
 config RISCV_TICKET_SPINLOCKS
 	bool "Using ticket spinlock"
@@ -373,6 +373,13 @@ config RISCV_QUEUED_SPINLOCKS
 	help
 	  Make sure your micro arch LL/SC has a strong forward progress guarantee.
 	  Otherwise, stay at ticket-lock.
+
+config RISCV_COMBO_SPINLOCKS
+	bool "Using combo spinlock"
+	depends on SMP && MMU
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  Select queued spinlock or ticket-lock with jump_label.
 endchoice
 
 config RISCV_ALTERNATIVE
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 2cce98c7b653..59d5ea7390ea 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -5,7 +5,6 @@ generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += qspinlock.h
-generic-y += spinlock.h
 generic-y += spinlock_types.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..b079462d818b
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#include <asm-generic/ticket_spinlock.h>
+
+#undef arch_spin_is_locked
+#undef arch_spin_is_contended
+#undef arch_spin_value_unlocked
+#undef arch_spin_lock
+#undef arch_spin_trylock
+#undef arch_spin_unlock
+
+#include <asm-generic/qspinlock.h>
+#include <linux/jump_label.h>
+
+#undef arch_spin_is_locked
+#undef arch_spin_is_contended
+#undef arch_spin_value_unlocked
+#undef arch_spin_lock
+#undef arch_spin_trylock
+#undef arch_spin_unlock
+
+DECLARE_STATIC_KEY_TRUE(qspinlock_key);
+
+static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		queued_spin_lock(lock);
+	else
+		ticket_spin_lock(lock);
+}
+
+static __always_inline bool arch_spin_trylock(arch_spinlock_t *lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		return queued_spin_trylock(lock);
+	return ticket_spin_trylock(lock);
+}
+
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		queued_spin_unlock(lock);
+	else
+		ticket_spin_unlock(lock);
+}
+
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		return queued_spin_value_unlocked(lock);
+	else
+		return ticket_spin_value_unlocked(lock);
+}
+
+static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		return queued_spin_is_locked(lock);
+	return ticket_spin_is_locked(lock);
+}
+
+static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
+{
+	if (static_branch_likely(&qspinlock_key))
+		return queued_spin_is_contended(lock);
+	return ticket_spin_is_contended(lock);
+}
+#include <asm/qrwlock.h>
+#else
+#include <asm-generic/spinlock.h>
+#endif /* CONFIG_RISCV_COMBO_SPINLOCKS */
+
+#endif /* __ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f0f36a4a0e9b..b763039bf49b 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -261,6 +261,13 @@ static void __init parse_dtb(void)
 #endif
 }
 
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+DEFINE_STATIC_KEY_TRUE_RO(qspinlock_key);
+EXPORT_SYMBOL(qspinlock_key);
+
+static bool qspinlock_flag __initdata = false;
+#endif
+
 void __init setup_arch(char **cmdline_p)
 {
 	parse_dtb();
@@ -295,10 +302,25 @@ void __init setup_arch(char **cmdline_p)
 	setup_smp();
 #endif
 
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+	if (!qspinlock_flag)
+		static_branch_disable(&qspinlock_key);
+#endif
+
 	riscv_fill_hwcap();
 	apply_boot_alternatives();
 }
 
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+static int __init enable_qspinlock(char *p)
+{
+	qspinlock_flag = true;
+
+	return 0;
+}
+early_param("qspinlock", enable_qspinlock);
+#endif
+
 static int __init topology_init(void)
 {
 	int i, ret;
-- 
2.36.1



* [PATCH V9 13/15] openrisc: cmpxchg: Cleanup unnecessary codes
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (11 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 12/15] riscv: Add combo spinlock support guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 14/15] openrisc: Move from ticket-lock to qspinlock guoren
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren,
	Guo Ren, Jonas Bonn, Stefan Kristiansson

From: Guo Ren <guoren@linux.alibaba.com>

Remove cmpxchg_small and xchg_small: they are unnecessary now, and they
break the forward-progress guarantee for atomic operations.

Also remove the unnecessary __HAVE_ARCH_CMPXCHG.
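
After this patch, a (hypothetical) sub-word caller fails at build time
instead of being emulated with a 32-bit cmpxchg loop:

	u16 x = 0;

	arch_xchg(&x, 1);		/* size == 2: hits BUILD_BUG() */
	arch_cmpxchg(&x, 0, 1);		/* likewise */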

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
---
 arch/openrisc/include/asm/cmpxchg.h | 167 +++++++++-------------------
 1 file changed, 50 insertions(+), 117 deletions(-)

diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index 79fd16162ccb..df83b33b5882 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -20,10 +20,8 @@
 #include  <linux/compiler.h>
 #include  <linux/types.h>
 
-#define __HAVE_ARCH_CMPXCHG 1
-
-static inline unsigned long cmpxchg_u32(volatile void *ptr,
-		unsigned long old, unsigned long new)
+/* cmpxchg */
+static inline u32 cmpxchg32(volatile void *ptr, u32 old, u32 new)
 {
 	__asm__ __volatile__(
 		"1:	l.lwa %0, 0(%1)		\n"
@@ -41,8 +39,33 @@ static inline unsigned long cmpxchg_u32(volatile void *ptr,
 	return old;
 }
 
-static inline unsigned long xchg_u32(volatile void *ptr,
-		unsigned long val)
+#define __cmpxchg(ptr, old, new, size)					\
+({									\
+	__typeof__(ptr) __ptr = (ptr);					\
+	__typeof__(*(ptr)) __old = (old);				\
+	__typeof__(*(ptr)) __new = (new);				\
+	__typeof__(*(ptr)) __ret;					\
+	switch (size) {							\
+	case 4:								\
+		__ret = (__typeof__(*(ptr)))				\
+			cmpxchg32(__ptr, (u32)__old, (u32)__new);	\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	__ret;								\
+})
+
+#define arch_cmpxchg(ptr, o, n)						\
+({									\
+	__typeof__(*(ptr)) _o_ = (o);					\
+	__typeof__(*(ptr)) _n_ = (n);					\
+	(__typeof__(*(ptr))) __cmpxchg((ptr),				\
+				       _o_, _n_, sizeof(*(ptr)));	\
+})
+
+/* xchg */
+static inline u32 xchg32(volatile void *ptr, u32 val)
 {
 	__asm__ __volatile__(
 		"1:	l.lwa %0, 0(%1)		\n"
@@ -56,116 +79,26 @@ static inline unsigned long xchg_u32(volatile void *ptr,
 	return val;
 }
 
-static inline u32 cmpxchg_small(volatile void *ptr, u32 old, u32 new,
-				int size)
-{
-	int off = (unsigned long)ptr % sizeof(u32);
-	volatile u32 *p = ptr - off;
-#ifdef __BIG_ENDIAN
-	int bitoff = (sizeof(u32) - size - off) * BITS_PER_BYTE;
-#else
-	int bitoff = off * BITS_PER_BYTE;
-#endif
-	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
-	u32 load32, old32, new32;
-	u32 ret;
-
-	load32 = READ_ONCE(*p);
-
-	while (true) {
-		ret = (load32 & bitmask) >> bitoff;
-		if (old != ret)
-			return ret;
-
-		old32 = (load32 & ~bitmask) | (old << bitoff);
-		new32 = (load32 & ~bitmask) | (new << bitoff);
-
-		/* Do 32 bit cmpxchg */
-		load32 = cmpxchg_u32(p, old32, new32);
-		if (load32 == old32)
-			return old;
-	}
-}
-
-/* xchg */
-
-static inline u32 xchg_small(volatile void *ptr, u32 x, int size)
-{
-	int off = (unsigned long)ptr % sizeof(u32);
-	volatile u32 *p = ptr - off;
-#ifdef __BIG_ENDIAN
-	int bitoff = (sizeof(u32) - size - off) * BITS_PER_BYTE;
-#else
-	int bitoff = off * BITS_PER_BYTE;
-#endif
-	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
-	u32 oldv, newv;
-	u32 ret;
-
-	do {
-		oldv = READ_ONCE(*p);
-		ret = (oldv & bitmask) >> bitoff;
-		newv = (oldv & ~bitmask) | (x << bitoff);
-	} while (cmpxchg_u32(p, oldv, newv) != oldv);
-
-	return ret;
-}
-
-/*
- * This function doesn't exist, so you'll get a linker error
- * if something tries to do an invalid cmpxchg().
- */
-extern unsigned long __cmpxchg_called_with_bad_pointer(void)
-	__compiletime_error("Bad argument size for cmpxchg");
-
-static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
-		unsigned long new, int size)
-{
-	switch (size) {
-	case 1:
-	case 2:
-		return cmpxchg_small(ptr, old, new, size);
-	case 4:
-		return cmpxchg_u32(ptr, old, new);
-	default:
-		return __cmpxchg_called_with_bad_pointer();
-	}
-}
-
-#define arch_cmpxchg(ptr, o, n)						\
-	({								\
-		(__typeof__(*(ptr))) __cmpxchg((ptr),			\
-					       (unsigned long)(o),	\
-					       (unsigned long)(n),	\
-					       sizeof(*(ptr)));		\
-	})
-
-/*
- * This function doesn't exist, so you'll get a linker error if
- * something tries to do an invalidly-sized xchg().
- */
-extern unsigned long __xchg_called_with_bad_pointer(void)
-	__compiletime_error("Bad argument size for xchg");
-
-static inline unsigned long __xchg(volatile void *ptr, unsigned long with,
-		int size)
-{
-	switch (size) {
-	case 1:
-	case 2:
-		return xchg_small(ptr, with, size);
-	case 4:
-		return xchg_u32(ptr, with);
-	default:
-		return __xchg_called_with_bad_pointer();
-	}
-}
-
-#define arch_xchg(ptr, with) 						\
-	({								\
-		(__typeof__(*(ptr))) __xchg((ptr),			\
-					    (unsigned long)(with),	\
-					    sizeof(*(ptr)));		\
-	})
+#define __xchg(ptr, new, size)						\
+({									\
+	__typeof__(ptr) __ptr = (ptr);					\
+	__typeof__(new) __new = (new);					\
+	__typeof__(*(ptr)) __ret;					\
+	switch (size) {							\
+	case 4:								\
+		__ret = (__typeof__(*(ptr)))				\
+			xchg32(__ptr, (u32)__new);			\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	__ret;								\
+})
+
+#define arch_xchg(ptr, x)						\
+({									\
+	__typeof__(*(ptr)) _x_ = (x);					\
+	(__typeof__(*(ptr))) __xchg((ptr), _x_, sizeof(*(ptr)));	\
+})
 
 #endif /* __ASM_OPENRISC_CMPXCHG_H */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V9 14/15] openrisc: Move from ticket-lock to qspinlock
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (12 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 13/15] openrisc: cmpxchg: Cleanup unnecessary codes guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:13 ` [PATCH V9 15/15] csky: spinlock: Use the generic header files guoren
  2022-08-08  7:25 ` [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup Guo Ren
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren,
	Guo Ren, Jonas Bonn, Stefan Kristiansson

From: Guo Ren <guoren@linux.alibaba.com>

Enable qspinlock and meet the requirements mentioned in a8ad07e5240c9
("asm-generic: qspinlock: Indicate the use of mixed-size atomics").

OpenRISC only has "l.lwa/l.swa" for all atomic operations, so its LL/SC
pair must provide a strong atomic forward progress guarantee, or any
atomic operation may live-lock. The ticket-lock needs a well-defined
forward progress guarantee for atomic_fetch_add under contention, and
qspinlock needs the same for xchg16. The atomic_fetch_add (l.lwa + add +
l.swa) and xchg16 (l.lwa + and + or + l.swa) sequences have similar
implementations, so they have the same forward progress guarantee; see
the C-level sketch below.
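
For reference, here is a C-level sketch of the 16-bit xchg added below.
The helper name xchg16_sketch and the cmpxchg32()-based retry loop are
only for illustration; the real implementation keeps the whole
read-modify-write inside a single l.lwa/l.swa loop, which is what
carries the forward progress guarantee.

	static inline u32 xchg16_sketch(volatile void *ptr, u32 val)
	{
		u32 shift = ((unsigned long)ptr & 2) ? 16 : 0;
		u32 mask = 0xffffU << shift;
		volatile u32 *p = (volatile u32 *)((unsigned long)ptr & ~2UL);
		u32 oldv, newv;

		do {
			/* Read the aligned 32-bit word containing the halfword */
			oldv = READ_ONCE(*p);
			/* Splice the new halfword into its lane */
			newv = (oldv & ~mask) | ((val << shift) & mask);
		} while (cmpxchg32(p, oldv, newv) != oldv);

		/* Return the previous halfword value */
		return (oldv & mask) >> shift;
	}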

The qspinlock is smaller and faster than the ticket-lock when everything
stays on the fast path, so there is no reason to keep openrisc on the
ticket-lock instead of qspinlock. Here is a comparison of the qspinlock
and ticket-lock fast-path code sizes (bytes):

TYPE			: TICKET | QUEUED
arch_spin_lock		: 128    | 96
arch_spin_unlock	: 56     | 44
arch_spin_trylock	: 108    | 80
arch_spin_is_locked	: 36     | 36
arch_spin_is_contended	: 36     | 36
arch_spin_value_unlocked: 28     | 28

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
---
 arch/openrisc/Kconfig               |  1 +
 arch/openrisc/include/asm/Kbuild    |  2 ++
 arch/openrisc/include/asm/cmpxchg.h | 25 +++++++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index c7f282f60f64..1652a6aac882 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -10,6 +10,7 @@ config OPENRISC
 	select ARCH_HAS_DMA_SET_UNCACHED
 	select ARCH_HAS_DMA_CLEAR_UNCACHED
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_USE_QUEUED_SPINLOCKS
 	select COMMON_CLK
 	select OF
 	select OF_EARLY_FLATTREE
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index c8c99b554ca4..ad147fec50b4 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -2,6 +2,8 @@
 generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += parport.h
+generic-y += mcs_spinlock.h
+generic-y += qspinlock.h
 generic-y += spinlock_types.h
 generic-y += spinlock.h
 generic-y += qrwlock_types.h
diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index df83b33b5882..2d650b07a0f4 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -65,6 +65,27 @@ static inline u32 cmpxchg32(volatile void *ptr, u32 old, u32 new)
 })
 
 /* xchg */
+static inline u32 xchg16(volatile void *ptr, u32 val)
+{
+	u32 ret, tmp;
+	u32 shif = ((ulong)ptr & 2) ? 16 : 0;
+	u32 mask = 0xffff << shif;
+	u32 *__ptr = (u32 *)((ulong)ptr & ~2);
+
+	__asm__ __volatile__(
+		"1:	l.lwa %0, 0(%2)		\n"
+		"	l.and %1, %0, %3	\n"
+		"	l.or  %1, %1, %4	\n"
+		"	l.swa 0(%2), %1		\n"
+		"	l.bnf 1b		\n"
+		"	 l.nop			\n"
+		: "=&r" (ret), "=&r" (tmp)
+		: "r"(__ptr), "r" (~mask), "r" (val << shif)
+		: "cc", "memory");
+
+	return (ret & mask) >> shif;
+}
+
 static inline u32 xchg32(volatile void *ptr, u32 val)
 {
 	__asm__ __volatile__(
@@ -85,6 +106,10 @@ static inline u32 xchg32(volatile void *ptr, u32 val)
 	__typeof__(new) __new = (new);					\
 	__typeof__(*(ptr)) __ret;					\
 	switch (size) {							\
+	case 2:								\
+		__ret = (__typeof__(*(ptr)))				\
+			xchg16(__ptr, (u32)__new);			\
+		break;							\
 	case 4:								\
 		__ret = (__typeof__(*(ptr)))				\
 			xchg32(__ptr, (u32)__new);			\
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V9 15/15] csky: spinlock: Use the generic header files
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (13 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 14/15] openrisc: Move from ticket-lock to qspinlock guoren
@ 2022-08-08  7:13 ` guoren
  2022-08-08  7:25 ` [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup Guo Ren
  15 siblings, 0 replies; 17+ messages in thread
From: guoren @ 2022-08-08  7:13 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

There is no difference between the csky and the generic implementations,
so use the generic header files.
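
For context, the generic-y entries added below tell kbuild to generate
thin wrapper headers under arch/csky/include/generated/asm/, so that
<asm/spinlock.h> resolves to the asm-generic version. Roughly (a sketch
of the generated wrapper, not something added by this patch):

	/* arch/csky/include/generated/asm/spinlock.h, generated by kbuild */
	#include <asm-generic/spinlock.h>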

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/csky/include/asm/Kbuild           |  2 ++
 arch/csky/include/asm/spinlock.h       | 12 ------------
 arch/csky/include/asm/spinlock_types.h |  9 ---------
 3 files changed, 2 insertions(+), 21 deletions(-)
 delete mode 100644 arch/csky/include/asm/spinlock.h
 delete mode 100644 arch/csky/include/asm/spinlock_types.h

diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 1117c28cb7e8..c08050fc0cce 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -7,6 +7,8 @@ generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += qspinlock.h
+generic-y += spinlock_types.h
+generic-y += spinlock.h
 generic-y += parport.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/csky/include/asm/spinlock.h b/arch/csky/include/asm/spinlock.h
deleted file mode 100644
index 83a2005341f5..000000000000
--- a/arch/csky/include/asm/spinlock.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#ifndef __ASM_CSKY_SPINLOCK_H
-#define __ASM_CSKY_SPINLOCK_H
-
-#include <asm/qspinlock.h>
-#include <asm/qrwlock.h>
-
-/* See include/linux/spinlock.h */
-#define smp_mb__after_spinlock()	smp_mb()
-
-#endif /* __ASM_CSKY_SPINLOCK_H */
diff --git a/arch/csky/include/asm/spinlock_types.h b/arch/csky/include/asm/spinlock_types.h
deleted file mode 100644
index 75bdf3af80ba..000000000000
--- a/arch/csky/include/asm/spinlock_types.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#ifndef __ASM_CSKY_SPINLOCK_TYPES_H
-#define __ASM_CSKY_SPINLOCK_TYPES_H
-
-#include <asm-generic/qspinlock_types.h>
-#include <asm-generic/qrwlock_types.h>
-
-#endif /* __ASM_CSKY_SPINLOCK_TYPES_H */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup
  2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
                   ` (14 preceding siblings ...)
  2022-08-08  7:13 ` [PATCH V9 15/15] csky: spinlock: Use the generic header files guoren
@ 2022-08-08  7:25 ` Guo Ren
  15 siblings, 0 replies; 17+ messages in thread
From: Guo Ren @ 2022-08-08  7:25 UTC (permalink / raw)
  To: palmer, heiko, hch, arnd, peterz, will, boqun.feng, longman,
	shorne, conor.dooley
  Cc: linux-csky, linux-arch, linux-kernel, linux-riscv, Guo Ren

Sorry, here is the Changelog:

Changes in V9:
 - Fixup xchg16 compile warning
 - Keep ticket-lock the same semantic with qspinlock
 - Remove unused xchg32 and xchg64
 - Forbid arch_cmpxchg64 for 32-bit
 - Add openrisc qspinlock support

Changes in V8:
 - Coding convention ticket fixup
 - Move combo spinlock into riscv and simplify asm-generic/spinlock.h
 - Fixup xchg16 with wrong return value
 - Add csky qspinlock
 - Add combo & qspinlock & ticket-lock comparison
 - Clean up unnecessary riscv acquire and release definitions
 - Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky

Changes in V7:
 - Add combo spinlock (ticket & queued) support
 - Rename ticket_spinlock.h
 - Remove unnecessary atomic_read in ticket_spin_value_unlocked

Changes in V6:
 - Fixup Clang compile problem Reported-by: kernel test robot
   <lkp@intel.com>
 - Cleanup asm-generic/spinlock.h
 - Remove changelog in patch main comment part, suggested by
   Conor.Dooley@microchip.com
 - Remove "default y if NUMA" in Kconfig

Changes in V5:
 - Update comment with RISC-V forward guarantee feature.
 - Back to V3 direction and optimize asm code.

Changes in V4:
 - Remove custom sub-word xchg implementation
 - Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock

Changes in V3:
 - Coding convention fixes following Peter Zijlstra's advice

Changes in V2:
 - Coding convention in cmpxchg.h
 - Re-implement short xchg
 - Remove char & cmpxchg implementations



On Mon, Aug 8, 2022 at 3:14 PM <guoren@kernel.org> wrote:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> In this series:
>  - Cleanup generic ticket-lock code, (Using smp_mb__after_spinlock as RCsc)
>  - Add qspinlock and combo-lock for riscv
>  - Add qspinlock to openrisc
>  - Use generic header in csky
>  - Optimize cmpxchg & atomic code
>
> Enable qspinlock and meet the requirements mentioned in a8ad07e5240c9
> ("asm-generic: qspinlock: Indicate the use of mixed-size atomics").
>
> RISC-V LR/SC pairs could provide a strong/weak forward guarantee that
> depends on micro-architecture. And RISC-V ISA spec has given out
> several limitations to let hardware support strict forward guarantee
> (RISC-V User ISA - 8.3 Eventual Success of Store-Conditional
> Instructions).
>
> eg:
> Some riscv hardware such as BOOMv3 & XiangShan could provide strict &
> strong forward guarantee (The cache line would be kept in an exclusive
> state for Backoff cycles, and only this core's interrupt could break
> the LR/SC pair).
> Qemu riscv give a weak forward guarantee by wrong implementation
> currently [1].
>
> So we Add combo spinlock (ticket & queued) support for riscv. Thus different
> kinds of memory model micro-arch processors could use the same Image
>
> The first try of qspinlock for riscv was made in 2019.1 [2].
>
> [1] https://github.com/qemu/qemu/blob/master/target/riscv/insn_trans/trans_rva.c.inc
> [2] https://lore.kernel.org/linux-riscv/20190211043829.30096-1-michaeljclark@mac.com/#r
>
> Guo Ren (15):
>   asm-generic: ticket-lock: Remove unnecessary atomic_read
>   asm-generic: ticket-lock: Use the same struct definitions with qspinlock
>   asm-generic: ticket-lock: Move into ticket_spinlock.h
>   asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock
>   asm-generic: spinlock: Add queued spinlock support in common header
>   riscv: atomic: Clean up unnecessary acquire and release definitions
>   riscv: cmpxchg: Remove xchg32 and xchg64
>   riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit
>   riscv: cmpxchg: Optimize cmpxchg64
>   riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN*
>   riscv: Add qspinlock support
>   riscv: Add combo spinlock support
>   openrisc: cmpxchg: Cleanup unnecessary codes
>   openrisc: Move from ticket-lock to qspinlock
>   csky: spinlock: Use the generic header files
>
>  arch/csky/include/asm/Kbuild           |   2 +
>  arch/csky/include/asm/spinlock.h       |  12 --
>  arch/csky/include/asm/spinlock_types.h |   9 --
>  arch/openrisc/Kconfig                  |   1 +
>  arch/openrisc/include/asm/Kbuild       |   2 +
>  arch/openrisc/include/asm/cmpxchg.h    | 192 ++++++++++---------------
>  arch/riscv/Kconfig                     |  49 +++++++
>  arch/riscv/include/asm/Kbuild          |   3 +-
>  arch/riscv/include/asm/atomic.h        |  19 ---
>  arch/riscv/include/asm/cmpxchg.h       | 177 +++++++----------------
>  arch/riscv/include/asm/spinlock.h      |  77 ++++++++++
>  arch/riscv/kernel/setup.c              |  22 +++
>  include/asm-generic/spinlock.h         |  94 ++----------
>  include/asm-generic/spinlock_types.h   |  12 +-
>  include/asm-generic/ticket_spinlock.h  |  93 ++++++++++++
>  15 files changed, 384 insertions(+), 380 deletions(-)
>  delete mode 100644 arch/csky/include/asm/spinlock.h
>  delete mode 100644 arch/csky/include/asm/spinlock_types.h
>  create mode 100644 arch/riscv/include/asm/spinlock.h
>  create mode 100644 include/asm-generic/ticket_spinlock.h
>
> --
> 2.36.1
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-08  7:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-08  7:13 [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup guoren
2022-08-08  7:13 ` [PATCH V9 01/15] asm-generic: ticket-lock: Remove unnecessary atomic_read guoren
2022-08-08  7:13 ` [PATCH V9 02/15] asm-generic: ticket-lock: Use the same struct definitions with qspinlock guoren
2022-08-08  7:13 ` [PATCH V9 03/15] asm-generic: ticket-lock: Move into ticket_spinlock.h guoren
2022-08-08  7:13 ` [PATCH V9 04/15] asm-generic: ticket-lock: Keep ticket-lock the same semantic with qspinlock guoren
2022-08-08  7:13 ` [PATCH V9 05/15] asm-generic: spinlock: Add queued spinlock support in common header guoren
2022-08-08  7:13 ` [PATCH V9 06/15] riscv: atomic: Clean up unnecessary acquire and release definitions guoren
2022-08-08  7:13 ` [PATCH V9 07/15] riscv: cmpxchg: Remove xchg32 and xchg64 guoren
2022-08-08  7:13 ` [PATCH V9 08/15] riscv: cmpxchg: Forbid arch_cmpxchg64 for 32-bit guoren
2022-08-08  7:13 ` [PATCH V9 09/15] riscv: cmpxchg: Optimize cmpxchg64 guoren
2022-08-08  7:13 ` [PATCH V9 10/15] riscv: Enable ARCH_INLINE_READ*/WRITE*/SPIN* guoren
2022-08-08  7:13 ` [PATCH V9 11/15] riscv: Add qspinlock support guoren
2022-08-08  7:13 ` [PATCH V9 12/15] riscv: Add combo spinlock support guoren
2022-08-08  7:13 ` [PATCH V9 13/15] openrisc: cmpxchg: Cleanup unnecessary codes guoren
2022-08-08  7:13 ` [PATCH V9 14/15] openrisc: Move from ticket-lock to qspinlock guoren
2022-08-08  7:13 ` [PATCH V9 15/15] csky: spinlock: Use the generic header files guoren
2022-08-08  7:25 ` [PATCH V9 00/15] arch: Add qspinlock support and atomic cleanup Guo Ren
