* [PATCH 00/11] ARC atomics update
@ 2021-08-04 19:15 Vineet Gupta
  2021-08-04 19:15 ` [PATCH 01/11] ARC: atomics: disintegrate header Vineet Gupta
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

Hi,

This series contains the long-due update to ARC atomics, discussed back
in 2018 [1] and [2]. I had these changes queued for the arc64 port and
decided to post them here for review and inclusion, after Mark's rework.

The main changes are the use of relaxed atomics and generic bitops. The
latter does cause some codegen bloat on ARC due to signed args, but that
can be reviewed separately considering the cross-arch impact.
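
For reference, a minimal caller-side sketch of the ordering variants that
the relaxed implementations plug into (this is just the generic kernel
atomics API, nothing ARC specific):

#include <linux/atomic.h>

static void ordering_variants_example(atomic_t *cnt)
{
	atomic_inc_return(cnt);          /* fully ordered: barrier before and after */
	atomic_inc_return_relaxed(cnt);  /* no ordering beyond the atomicity itself */
	atomic_inc_return_acquire(cnt);  /* orders against later memory accesses */
	atomic_inc_return_release(cnt);  /* orders against earlier memory accesses */
}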

The changes survive the glibc testsuite with no regressions whatsoever.

Please review and provide any feedback.

Thx,
-Vineet

[1] https://lore.kernel.org/r/20180830144344.GW24142@hirez.programming.kicks-ass.net
[2] https://lore.kernel.org/r/20180830135749.GA13005@arm.com


Vineet Gupta (10):
  ARC: atomics: disintegrate header
  ARC: atomic: !LLSC: remove hack in atomic_set() for UP
  ARC: atomic: !LLSC: use int data type consistently
  ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
  ARC: atomics: implement relaxed variants
  ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
  ARC: xchg: !LLSC: remove UP micro-optimization/hack
  ARC: cmpxchg/xchg: rewrite as macros to make type safe
  ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
  ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants

Will Deacon (1):
  ARC: switch to generic bitops

 arch/arc/include/asm/atomic-llsc.h     |  97 ++++++
 arch/arc/include/asm/atomic-spinlock.h | 102 ++++++
 arch/arc/include/asm/atomic.h          | 444 ++-----------------------
 arch/arc/include/asm/atomic64-arcv2.h  | 250 ++++++++++++++
 arch/arc/include/asm/bitops.h          | 188 +----------
 arch/arc/include/asm/cmpxchg.h         | 233 ++++++-------
 arch/arc/include/asm/smp.h             |  14 -
 arch/arc/kernel/smp.c                  |   2 -
 8 files changed, 588 insertions(+), 742 deletions(-)
 create mode 100644 arch/arc/include/asm/atomic-llsc.h
 create mode 100644 arch/arc/include/asm/atomic-spinlock.h
 create mode 100644 arch/arc/include/asm/atomic64-arcv2.h

-- 
2.25.1


* [PATCH 01/11] ARC: atomics: disintegrate header
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 02/11] ARC: atomic: !LLSC: remove hack in atomic_set() for UP Vineet Gupta
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

Non-functional change, to ease future addition/removal.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic-llsc.h     | 103 ++++++
 arch/arc/include/asm/atomic-spinlock.h | 111 +++++++
 arch/arc/include/asm/atomic.h          | 429 +------------------------
 arch/arc/include/asm/atomic64-arcv2.h  | 242 ++++++++++++++
 4 files changed, 461 insertions(+), 424 deletions(-)
 create mode 100644 arch/arc/include/asm/atomic-llsc.h
 create mode 100644 arch/arc/include/asm/atomic-spinlock.h
 create mode 100644 arch/arc/include/asm/atomic64-arcv2.h

diff --git a/arch/arc/include/asm/atomic-llsc.h b/arch/arc/include/asm/atomic-llsc.h
new file mode 100644
index 000000000000..aab4f2855457
--- /dev/null
+++ b/arch/arc/include/asm/atomic-llsc.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARC_ATOMIC_LLSC_H
+#define _ASM_ARC_ATOMIC_LLSC_H
+
+#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
+
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static inline void arch_atomic_##op(int i, atomic_t *v)			\
+{									\
+	unsigned int val;						\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"	bnz     1b				\n"		\
+	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
+	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
+	  [i]	"ir"	(i)						\
+	: "cc");							\
+}									\
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
+{									\
+	unsigned int val;						\
+									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND themselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock   %[val], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[val], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"	bnz     1b				\n"		\
+	: [val]	"=&r"	(val)						\
+	: [ctr]	"r"	(&v->counter),					\
+	  [i]	"ir"	(i)						\
+	: "cc");							\
+									\
+	smp_mb();							\
+									\
+	return val;							\
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
+{									\
+	unsigned int val, orig;						\
+									\
+	/*								\
+	 * Explicit full memory barrier needed before/after as		\
+	 * LLOCK/SCOND themselves don't provide any such semantics	\
+	 */								\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock   %[orig], [%[ctr]]		\n"		\
+	"	" #asm_op " %[val], %[orig], %[i]	\n"		\
+	"	scond   %[val], [%[ctr]]		\n"		\
+	"	bnz     1b				\n"		\
+	: [val]	"=&r"	(val),						\
+	  [orig] "=&r" (orig)						\
+	: [ctr]	"r"	(&v->counter),					\
+	  [i]	"ir"	(i)						\
+	: "cc");							\
+									\
+	smp_mb();							\
+									\
+	return orig;							\
+}
+
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(add, +=, add)
+ATOMIC_OPS(sub, -=, sub)
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(and, &=, and)
+ATOMIC_OPS(andnot, &= ~, bic)
+ATOMIC_OPS(or, |=, or)
+ATOMIC_OPS(xor, ^=, xor)
+
+#define arch_atomic_andnot		arch_atomic_andnot
+#define arch_atomic_fetch_andnot	arch_atomic_fetch_andnot
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#endif
diff --git a/arch/arc/include/asm/atomic-spinlock.h b/arch/arc/include/asm/atomic-spinlock.h
new file mode 100644
index 000000000000..bdf87610b2d7
--- /dev/null
+++ b/arch/arc/include/asm/atomic-spinlock.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARC_ATOMIC_SPLOCK_H
+#define _ASM_ARC_ATOMIC_SPLOCK_H
+
+#ifndef CONFIG_SMP
+
+ /* violating atomic_xxx API locking protocol in UP for optimization sake */
+#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
+
+#else
+
+static inline void arch_atomic_set(atomic_t *v, int i)
+{
+	/*
+	 * Independent of hardware support, all of the atomic_xxx() APIs need
+	 * to follow the same locking rules to make sure that a "hardware"
+	 * atomic insn (e.g. LD) doesn't clobber an "emulated" atomic insn
+	 * sequence
+	 *
+	 * Thus atomic_set() despite being 1 insn (and seemingly atomic)
+	 * requires the locking.
+	 */
+	unsigned long flags;
+
+	atomic_ops_lock(flags);
+	WRITE_ONCE(v->counter, i);
+	atomic_ops_unlock(flags);
+}
+
+#define arch_atomic_set_release(v, i)	arch_atomic_set((v), (i))
+
+#endif
+
+/*
+ * Non hardware assisted Atomic-R-M-W
+ * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
+ */
+
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static inline void arch_atomic_##op(int i, atomic_t *v)			\
+{									\
+	unsigned long flags;						\
+									\
+	atomic_ops_lock(flags);						\
+	v->counter c_op i;						\
+	atomic_ops_unlock(flags);					\
+}
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
+{									\
+	unsigned long flags;						\
+	unsigned long temp;						\
+									\
+	/*								\
+	 * spin lock/unlock provides the needed smp_mb() before/after	\
+	 */								\
+	atomic_ops_lock(flags);						\
+	temp = v->counter;						\
+	temp c_op i;							\
+	v->counter = temp;						\
+	atomic_ops_unlock(flags);					\
+									\
+	return temp;							\
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
+{									\
+	unsigned long flags;						\
+	unsigned long orig;						\
+									\
+	/*								\
+	 * spin lock/unlock provides the needed smp_mb() before/after	\
+	 */								\
+	atomic_ops_lock(flags);						\
+	orig = v->counter;						\
+	v->counter c_op i;						\
+	atomic_ops_unlock(flags);					\
+									\
+	return orig;							\
+}
+
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(add, +=, add)
+ATOMIC_OPS(sub, -=, sub)
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(and, &=, and)
+ATOMIC_OPS(andnot, &= ~, bic)
+ATOMIC_OPS(or, |=, or)
+ATOMIC_OPS(xor, ^=, xor)
+
+#define arch_atomic_andnot		arch_atomic_andnot
+#define arch_atomic_fetch_andnot	arch_atomic_fetch_andnot
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#endif
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 7a36d79b5b2f..ee88e1dbaab5 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -17,435 +17,16 @@
 #define arch_atomic_read(v)  READ_ONCE((v)->counter)
 
 #ifdef CONFIG_ARC_HAS_LLSC
-
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static inline void arch_atomic_##op(int i, atomic_t *v)			\
-{									\
-	unsigned int val;						\
-									\
-	__asm__ __volatile__(						\
-	"1:	llock   %[val], [%[ctr]]		\n"		\
-	"	" #asm_op " %[val], %[val], %[i]	\n"		\
-	"	scond   %[val], [%[ctr]]		\n"		\
-	"	bnz     1b				\n"		\
-	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
-	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
-	  [i]	"ir"	(i)						\
-	: "cc");							\
-}									\
-
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
-{									\
-	unsigned int val;						\
-									\
-	/*								\
-	 * Explicit full memory barrier needed before/after as		\
-	 * LLOCK/SCOND themselves don't provide any such semantics	\
-	 */								\
-	smp_mb();							\
-									\
-	__asm__ __volatile__(						\
-	"1:	llock   %[val], [%[ctr]]		\n"		\
-	"	" #asm_op " %[val], %[val], %[i]	\n"		\
-	"	scond   %[val], [%[ctr]]		\n"		\
-	"	bnz     1b				\n"		\
-	: [val]	"=&r"	(val)						\
-	: [ctr]	"r"	(&v->counter),					\
-	  [i]	"ir"	(i)						\
-	: "cc");							\
-									\
-	smp_mb();							\
-									\
-	return val;							\
-}
-
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
-{									\
-	unsigned int val, orig;						\
-									\
-	/*								\
-	 * Explicit full memory barrier needed before/after as		\
-	 * LLOCK/SCOND themselves don't provide any such semantics	\
-	 */								\
-	smp_mb();							\
-									\
-	__asm__ __volatile__(						\
-	"1:	llock   %[orig], [%[ctr]]		\n"		\
-	"	" #asm_op " %[val], %[orig], %[i]	\n"		\
-	"	scond   %[val], [%[ctr]]		\n"		\
-	"	bnz     1b				\n"		\
-	: [val]	"=&r"	(val),						\
-	  [orig] "=&r" (orig)						\
-	: [ctr]	"r"	(&v->counter),					\
-	  [i]	"ir"	(i)						\
-	: "cc");							\
-									\
-	smp_mb();							\
-									\
-	return orig;							\
-}
-
-#else	/* !CONFIG_ARC_HAS_LLSC */
-
-#ifndef CONFIG_SMP
-
- /* violating atomic_xxx API locking protocol in UP for optimization sake */
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
+#include <asm/atomic-llsc.h>
 #else
-
-static inline void arch_atomic_set(atomic_t *v, int i)
-{
-	/*
-	 * Independent of hardware support, all of the atomic_xxx() APIs need
-	 * to follow the same locking rules to make sure that a "hardware"
-	 * atomic insn (e.g. LD) doesn't clobber an "emulated" atomic insn
-	 * sequence
-	 *
-	 * Thus atomic_set() despite being 1 insn (and seemingly atomic)
-	 * requires the locking.
-	 */
-	unsigned long flags;
-
-	atomic_ops_lock(flags);
-	WRITE_ONCE(v->counter, i);
-	atomic_ops_unlock(flags);
-}
-
-#define arch_atomic_set_release(v, i)	arch_atomic_set((v), (i))
-
+#include <asm/atomic-spinlock.h>
 #endif
 
-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- */
-
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static inline void arch_atomic_##op(int i, atomic_t *v)			\
-{									\
-	unsigned long flags;						\
-									\
-	atomic_ops_lock(flags);						\
-	v->counter c_op i;						\
-	atomic_ops_unlock(flags);					\
-}
-
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
-{									\
-	unsigned long flags;						\
-	unsigned long temp;						\
-									\
-	/*								\
-	 * spin lock/unlock provides the needed smp_mb() before/after	\
-	 */								\
-	atomic_ops_lock(flags);						\
-	temp = v->counter;						\
-	temp c_op i;							\
-	v->counter = temp;						\
-	atomic_ops_unlock(flags);					\
-									\
-	return temp;							\
-}
-
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
-{									\
-	unsigned long flags;						\
-	unsigned long orig;						\
-									\
-	/*								\
-	 * spin lock/unlock provides the needed smp_mb() before/after	\
-	 */								\
-	atomic_ops_lock(flags);						\
-	orig = v->counter;						\
-	v->counter c_op i;						\
-	atomic_ops_unlock(flags);					\
-									\
-	return orig;							\
-}
-
-#endif /* !CONFIG_ARC_HAS_LLSC */
-
-#define ATOMIC_OPS(op, c_op, asm_op)					\
-	ATOMIC_OP(op, c_op, asm_op)					\
-	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-	ATOMIC_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC_OPS(add, +=, add)
-ATOMIC_OPS(sub, -=, sub)
-
-#undef ATOMIC_OPS
-#define ATOMIC_OPS(op, c_op, asm_op)					\
-	ATOMIC_OP(op, c_op, asm_op)					\
-	ATOMIC_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC_OPS(and, &=, and)
-ATOMIC_OPS(andnot, &= ~, bic)
-ATOMIC_OPS(or, |=, or)
-ATOMIC_OPS(xor, ^=, xor)
-
-#define arch_atomic_andnot		arch_atomic_andnot
-#define arch_atomic_fetch_andnot	arch_atomic_fetch_andnot
-
-#undef ATOMIC_OPS
-#undef ATOMIC_FETCH_OP
-#undef ATOMIC_OP_RETURN
-#undef ATOMIC_OP
-
 #ifdef CONFIG_GENERIC_ATOMIC64
-
 #include <asm-generic/atomic64.h>
-
-#else	/* Kconfig ensures this is only enabled with needed h/w assist */
-
-/*
- * ARCv2 supports 64-bit exclusive load (LLOCKD) / store (SCONDD)
- *  - The address HAS to be 64-bit aligned
- *  - There are 2 semantics involved here:
- *    = exclusive implies no interim update between load/store to same addr
- *    = both words are observed/updated together: this is guaranteed even
- *      for regular 64-bit load (LDD) / store (STD). Thus atomic64_set()
- *      is NOT required to use LLOCKD+SCONDD, STD suffices
- */
-
-typedef struct {
-	s64 __aligned(8) counter;
-} atomic64_t;
-
-#define ATOMIC64_INIT(a) { (a) }
-
-static inline s64 arch_atomic64_read(const atomic64_t *v)
-{
-	s64 val;
-
-	__asm__ __volatile__(
-	"	ldd   %0, [%1]	\n"
-	: "=r"(val)
-	: "r"(&v->counter));
-
-	return val;
-}
-
-static inline void arch_atomic64_set(atomic64_t *v, s64 a)
-{
-	/*
-	 * This could have been a simple assignment in "C" but would need
-	 * explicit volatile. Otherwise gcc optimizers could elide the store
-	 * which borked atomic64 self-test
-	 * In the inline asm version, memory clobber needed for exact same
-	 * reason, to tell gcc about the store.
-	 *
-	 * This however is not needed for sibling atomic64_add() etc since both
-	 * load/store are explicitly done in inline asm. As long as API is used
-	 * for each access, gcc has no way to optimize away any load/store
-	 */
-	__asm__ __volatile__(
-	"	std   %0, [%1]	\n"
-	:
-	: "r"(a), "r"(&v->counter)
-	: "memory");
-}
-
-#define ATOMIC64_OP(op, op1, op2)					\
-static inline void arch_atomic64_##op(s64 a, atomic64_t *v)		\
-{									\
-	s64 val;							\
-									\
-	__asm__ __volatile__(						\
-	"1:				\n"				\
-	"	llockd  %0, [%1]	\n"				\
-	"	" #op1 " %L0, %L0, %L2	\n"				\
-	"	" #op2 " %H0, %H0, %H2	\n"				\
-	"	scondd   %0, [%1]	\n"				\
-	"	bnz     1b		\n"				\
-	: "=&r"(val)							\
-	: "r"(&v->counter), "ir"(a)					\
-	: "cc");							\
-}									\
-
-#define ATOMIC64_OP_RETURN(op, op1, op2)		        	\
-static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v)	\
-{									\
-	s64 val;							\
-									\
-	smp_mb();							\
-									\
-	__asm__ __volatile__(						\
-	"1:				\n"				\
-	"	llockd   %0, [%1]	\n"				\
-	"	" #op1 " %L0, %L0, %L2	\n"				\
-	"	" #op2 " %H0, %H0, %H2	\n"				\
-	"	scondd   %0, [%1]	\n"				\
-	"	bnz     1b		\n"				\
-	: [val] "=&r"(val)						\
-	: "r"(&v->counter), "ir"(a)					\
-	: "cc");	/* memory clobber comes from smp_mb() */	\
-									\
-	smp_mb();							\
-									\
-	return val;							\
-}
-
-#define ATOMIC64_FETCH_OP(op, op1, op2)		        		\
-static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v)	\
-{									\
-	s64 val, orig;							\
-									\
-	smp_mb();							\
-									\
-	__asm__ __volatile__(						\
-	"1:				\n"				\
-	"	llockd   %0, [%2]	\n"				\
-	"	" #op1 " %L1, %L0, %L3	\n"				\
-	"	" #op2 " %H1, %H0, %H3	\n"				\
-	"	scondd   %1, [%2]	\n"				\
-	"	bnz     1b		\n"				\
-	: "=&r"(orig), "=&r"(val)					\
-	: "r"(&v->counter), "ir"(a)					\
-	: "cc");	/* memory clobber comes from smp_mb() */	\
-									\
-	smp_mb();							\
-									\
-	return orig;							\
-}
-
-#define ATOMIC64_OPS(op, op1, op2)					\
-	ATOMIC64_OP(op, op1, op2)					\
-	ATOMIC64_OP_RETURN(op, op1, op2)				\
-	ATOMIC64_FETCH_OP(op, op1, op2)
-
-ATOMIC64_OPS(add, add.f, adc)
-ATOMIC64_OPS(sub, sub.f, sbc)
-ATOMIC64_OPS(and, and, and)
-ATOMIC64_OPS(andnot, bic, bic)
-ATOMIC64_OPS(or, or, or)
-ATOMIC64_OPS(xor, xor, xor)
-
-#define arch_atomic64_andnot		arch_atomic64_andnot
-#define arch_atomic64_fetch_andnot	arch_atomic64_fetch_andnot
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
-static inline s64
-arch_atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
-{
-	s64 prev;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llockd  %0, [%1]	\n"
-	"	brne    %L0, %L2, 2f	\n"
-	"	brne    %H0, %H2, 2f	\n"
-	"	scondd  %3, [%1]	\n"
-	"	bnz     1b		\n"
-	"2:				\n"
-	: "=&r"(prev)
-	: "r"(ptr), "ir"(expected), "r"(new)
-	: "cc");	/* memory clobber comes from smp_mb() */
-
-	smp_mb();
-
-	return prev;
-}
-
-static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
-{
-	s64 prev;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llockd  %0, [%1]	\n"
-	"	scondd  %2, [%1]	\n"
-	"	bnz     1b		\n"
-	"2:				\n"
-	: "=&r"(prev)
-	: "r"(ptr), "r"(new)
-	: "cc");	/* memory clobber comes from smp_mb() */
-
-	smp_mb();
-
-	return prev;
-}
-
-/**
- * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic64_t
- *
- * The function returns the old value of *v minus 1, even if
- * the atomic variable, v, was not decremented.
- */
-
-static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
-{
-	s64 val;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llockd  %0, [%1]	\n"
-	"	sub.f   %L0, %L0, 1	# w0 - 1, set C on borrow\n"
-	"	sub.c   %H0, %H0, 1	# if C set, w1 - 1\n"
-	"	brlt    %H0, 0, 2f	\n"
-	"	scondd  %0, [%1]	\n"
-	"	bnz     1b		\n"
-	"2:				\n"
-	: "=&r"(val)
-	: "r"(&v->counter)
-	: "cc");	/* memory clobber comes from smp_mb() */
-
-	smp_mb();
-
-	return val;
-}
-#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
-
-/**
- * arch_atomic64_fetch_add_unless - add unless the number is a given value
- * @v: pointer of type atomic64_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, if it was not @u.
- * Returns the old value of @v
- */
-static inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
-{
-	s64 old, temp;
-
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llockd  %0, [%2]	\n"
-	"	brne	%L0, %L4, 2f	# continue to add since v != u \n"
-	"	breq.d	%H0, %H4, 3f	# return since v == u \n"
-	"2:				\n"
-	"	add.f   %L1, %L0, %L3	\n"
-	"	adc     %H1, %H0, %H3	\n"
-	"	scondd  %1, [%2]	\n"
-	"	bnz     1b		\n"
-	"3:				\n"
-	: "=&r"(old), "=&r" (temp)
-	: "r"(&v->counter), "r"(a), "r"(u)
-	: "cc");	/* memory clobber comes from smp_mb() */
-
-	smp_mb();
-
-	return old;
-}
-#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
-
-#endif	/* !CONFIG_GENERIC_ATOMIC64 */
+#else
+#include <asm/atomic64-arcv2.h>
+#endif
 
 #endif	/* !__ASSEMBLY__ */
 
diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
new file mode 100644
index 000000000000..53996b11b551
--- /dev/null
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * ARCv2 supports 64-bit exclusive load (LLOCKD) / store (SCONDD)
+ *  - The address HAS to be 64-bit aligned
+ */
+
+#ifndef _ASM_ARC_ATOMIC64_ARCV2_H
+#define _ASM_ARC_ATOMIC64_ARCV2_H
+
+typedef struct {
+	s64 __aligned(8) counter;
+} atomic64_t;
+
+#define ATOMIC64_INIT(a) { (a) }
+
+static inline s64 arch_atomic64_read(const atomic64_t *v)
+{
+	s64 val;
+
+	__asm__ __volatile__(
+	"	ldd   %0, [%1]	\n"
+	: "=r"(val)
+	: "r"(&v->counter));
+
+	return val;
+}
+
+static inline void arch_atomic64_set(atomic64_t *v, s64 a)
+{
+	/*
+	 * This could have been a simple assignment in "C" but would need
+	 * explicit volatile. Otherwise gcc optimizers could elide the store
+	 * which borked atomic64 self-test
+	 * In the inline asm version, memory clobber needed for exact same
+	 * reason, to tell gcc about the store.
+	 *
+	 * This however is not needed for sibling atomic64_add() etc since both
+	 * load/store are explicitly done in inline asm. As long as API is used
+	 * for each access, gcc has no way to optimize away any load/store
+	 */
+	__asm__ __volatile__(
+	"	std   %0, [%1]	\n"
+	:
+	: "r"(a), "r"(&v->counter)
+	: "memory");
+}
+
+#define ATOMIC64_OP(op, op1, op2)					\
+static inline void arch_atomic64_##op(s64 a, atomic64_t *v)		\
+{									\
+	s64 val;							\
+									\
+	__asm__ __volatile__(						\
+	"1:				\n"				\
+	"	llockd  %0, [%1]	\n"				\
+	"	" #op1 " %L0, %L0, %L2	\n"				\
+	"	" #op2 " %H0, %H0, %H2	\n"				\
+	"	scondd   %0, [%1]	\n"				\
+	"	bnz     1b		\n"				\
+	: "=&r"(val)							\
+	: "r"(&v->counter), "ir"(a)					\
+	: "cc");							\
+}									\
+
+#define ATOMIC64_OP_RETURN(op, op1, op2)		        	\
+static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v)	\
+{									\
+	s64 val;							\
+									\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:				\n"				\
+	"	llockd   %0, [%1]	\n"				\
+	"	" #op1 " %L0, %L0, %L2	\n"				\
+	"	" #op2 " %H0, %H0, %H2	\n"				\
+	"	scondd   %0, [%1]	\n"				\
+	"	bnz     1b		\n"				\
+	: [val] "=&r"(val)						\
+	: "r"(&v->counter), "ir"(a)					\
+	: "cc");	/* memory clobber comes from smp_mb() */	\
+									\
+	smp_mb();							\
+									\
+	return val;							\
+}
+
+#define ATOMIC64_FETCH_OP(op, op1, op2)		        		\
+static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v)	\
+{									\
+	s64 val, orig;							\
+									\
+	smp_mb();							\
+									\
+	__asm__ __volatile__(						\
+	"1:				\n"				\
+	"	llockd   %0, [%2]	\n"				\
+	"	" #op1 " %L1, %L0, %L3	\n"				\
+	"	" #op2 " %H1, %H0, %H3	\n"				\
+	"	scondd   %1, [%2]	\n"				\
+	"	bnz     1b		\n"				\
+	: "=&r"(orig), "=&r"(val)					\
+	: "r"(&v->counter), "ir"(a)					\
+	: "cc");	/* memory clobber comes from smp_mb() */	\
+									\
+	smp_mb();							\
+									\
+	return orig;							\
+}
+
+#define ATOMIC64_OPS(op, op1, op2)					\
+	ATOMIC64_OP(op, op1, op2)					\
+	ATOMIC64_OP_RETURN(op, op1, op2)				\
+	ATOMIC64_FETCH_OP(op, op1, op2)
+
+ATOMIC64_OPS(add, add.f, adc)
+ATOMIC64_OPS(sub, sub.f, sbc)
+ATOMIC64_OPS(and, and, and)
+ATOMIC64_OPS(andnot, bic, bic)
+ATOMIC64_OPS(or, or, or)
+ATOMIC64_OPS(xor, xor, xor)
+
+#define arch_atomic64_andnot		arch_atomic64_andnot
+#define arch_atomic64_fetch_andnot	arch_atomic64_fetch_andnot
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+static inline s64
+arch_atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
+{
+	s64 prev;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llockd  %0, [%1]	\n"
+	"	brne    %L0, %L2, 2f	\n"
+	"	brne    %H0, %H2, 2f	\n"
+	"	scondd  %3, [%1]	\n"
+	"	bnz     1b		\n"
+	"2:				\n"
+	: "=&r"(prev)
+	: "r"(ptr), "ir"(expected), "r"(new)
+	: "cc");	/* memory clobber comes from smp_mb() */
+
+	smp_mb();
+
+	return prev;
+}
+
+static inline s64 arch_atomic64_xchg(atomic64_t *ptr, s64 new)
+{
+	s64 prev;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llockd  %0, [%1]	\n"
+	"	scondd  %2, [%1]	\n"
+	"	bnz     1b		\n"
+	"2:				\n"
+	: "=&r"(prev)
+	: "r"(ptr), "r"(new)
+	: "cc");	/* memory clobber comes from smp_mb() */
+
+	smp_mb();
+
+	return prev;
+}
+
+/**
+ * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
+ * @v: pointer of type atomic64_t
+ *
+ * The function returns the old value of *v minus 1, even if
+ * the atomic variable, v, was not decremented.
+ */
+
+static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	s64 val;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llockd  %0, [%1]	\n"
+	"	sub.f   %L0, %L0, 1	# w0 - 1, set C on borrow\n"
+	"	sub.c   %H0, %H0, 1	# if C set, w1 - 1\n"
+	"	brlt    %H0, 0, 2f	\n"
+	"	scondd  %0, [%1]	\n"
+	"	bnz     1b		\n"
+	"2:				\n"
+	: "=&r"(val)
+	: "r"(&v->counter)
+	: "cc");	/* memory clobber comes from smp_mb() */
+
+	smp_mb();
+
+	return val;
+}
+#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
+
+/**
+ * arch_atomic64_fetch_add_unless - add unless the number is a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if it was not @u.
+ * Returns the old value of @v
+ */
+static inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+{
+	s64 old, temp;
+
+	smp_mb();
+
+	__asm__ __volatile__(
+	"1:	llockd  %0, [%2]	\n"
+	"	brne	%L0, %L4, 2f	# continue to add since v != u \n"
+	"	breq.d	%H0, %H4, 3f	# return since v == u \n"
+	"2:				\n"
+	"	add.f   %L1, %L0, %L3	\n"
+	"	adc     %H1, %H0, %H3	\n"
+	"	scondd  %1, [%2]	\n"
+	"	bnz     1b		\n"
+	"3:				\n"
+	: "=&r"(old), "=&r" (temp)
+	: "r"(&v->counter), "r"(a), "r"(u)
+	: "cc");	/* memory clobber comes from smp_mb() */
+
+	smp_mb();
+
+	return old;
+}
+#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
+
+#endif
-- 
2.25.1


* [PATCH 02/11] ARC: atomic: !LLSC: remove hack in atomic_set() for UP
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
  2021-08-04 19:15 ` [PATCH 01/11] ARC: atomics: disintegrate header Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 03/11] ARC: atomic: !LLSC: use int data type consistently Vineet Gupta
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

!LLSC atomics use a spinlock (SMP) or irq-disable (UP) to implement
critical regions. UP atomic_set() however was "cheating" by not doing
any of that and still being functional.

Remove this anomaly (primarily as cleanup for future code improvements)
given that this config is not worth the hassle of special-case code.
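
For illustration, a hypothetical interleaving (not from this patch) of the
kind the common locking protocol prevents in the !LLSC SMP case, and which
the UP path now also follows for uniformity:

/*
 * CPU0: arch_atomic_add(1, v)            CPU1: arch_atomic_set(v, 0)
 * ---------------------------            ---------------------------
 * atomic_ops_lock(flags);
 * temp = v->counter;       // reads 5
 *                                        WRITE_ONCE(v->counter, 0); // no lock
 * v->counter = temp + 1;   // writes 6; CPU1's store is effectively lost
 * atomic_ops_unlock(flags);
 */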

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic-spinlock.h | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/arc/include/asm/atomic-spinlock.h b/arch/arc/include/asm/atomic-spinlock.h
index bdf87610b2d7..8c6fd0e651e5 100644
--- a/arch/arc/include/asm/atomic-spinlock.h
+++ b/arch/arc/include/asm/atomic-spinlock.h
@@ -3,12 +3,10 @@
 #ifndef _ASM_ARC_ATOMIC_SPLOCK_H
 #define _ASM_ARC_ATOMIC_SPLOCK_H
 
-#ifndef CONFIG_SMP
-
- /* violating atomic_xxx API locking protocol in UP for optimization sake */
-#define arch_atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
-
-#else
+/*
+ * Non hardware assisted Atomic-R-M-W
+ * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
+ */
 
 static inline void arch_atomic_set(atomic_t *v, int i)
 {
@@ -30,13 +28,6 @@ static inline void arch_atomic_set(atomic_t *v, int i)
 
 #define arch_atomic_set_release(v, i)	arch_atomic_set((v), (i))
 
-#endif
-
-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- */
-
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static inline void arch_atomic_##op(int i, atomic_t *v)			\
 {									\
-- 
2.25.1


* [PATCH 03/11] ARC: atomic: !LLSC: use int data type consistently
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
  2021-08-04 19:15 ` [PATCH 01/11] ARC: atomics: disintegrate header Vineet Gupta
  2021-08-04 19:15 ` [PATCH 02/11] ARC: atomic: !LLSC: remove hack in atomic_set() for UP Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 04/11] ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return Vineet Gupta
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic-spinlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/atomic-spinlock.h b/arch/arc/include/asm/atomic-spinlock.h
index 8c6fd0e651e5..2c830347bfb4 100644
--- a/arch/arc/include/asm/atomic-spinlock.h
+++ b/arch/arc/include/asm/atomic-spinlock.h
@@ -42,7 +42,7 @@ static inline void arch_atomic_##op(int i, atomic_t *v)			\
 static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
 {									\
 	unsigned long flags;						\
-	unsigned long temp;						\
+	unsigned int temp;						\
 									\
 	/*								\
 	 * spin lock/unlock provides the needed smp_mb() before/after	\
@@ -60,7 +60,7 @@ static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
 static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
 {									\
 	unsigned long flags;						\
-	unsigned long orig;						\
+	unsigned int orig;						\
 									\
 	/*								\
 	 * spin lock/unlock provides the needed smp_mb() before/after	\
-- 
2.25.1


* [PATCH 04/11] ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (2 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 03/11] ARC: atomic: !LLSC: use int data type consistently Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 05/11] ARC: atomics: implement relaxed variants Vineet Gupta
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

This is a non-functional change since those wrappers are not
used in kernel sources at all.

Link: http://lists.infradead.org/pipermail/linux-snps-arc/2018-August/004246.html
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic64-arcv2.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
index 53996b11b551..22ef1cbb94e2 100644
--- a/arch/arc/include/asm/atomic64-arcv2.h
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -116,6 +116,12 @@ static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v)	\
 
 ATOMIC64_OPS(add, add.f, adc)
 ATOMIC64_OPS(sub, sub.f, sbc)
+
+#undef ATOMIC64_OPS
+#define ATOMIC64_OPS(op, op1, op2)					\
+	ATOMIC64_OP(op, op1, op2)					\
+	ATOMIC64_FETCH_OP(op, op1, op2)
+
 ATOMIC64_OPS(and, and, and)
 ATOMIC64_OPS(andnot, bic, bic)
 ATOMIC64_OPS(or, or, or)
-- 
2.25.1


* [PATCH 05/11] ARC: atomics: implement relaxed variants
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (3 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 04/11] ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 06/11] ARC: switch to generic bitops Vineet Gupta
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

The current ARC fetch/return atomics provide only fully ordered semantics,
using 2 full barriers around the operation.

Instead, implement them as relaxed variants without any barriers and
rely on generic code to generate the fully-ordered, acquire and release
variants by adding the appropriate full barriers.

This helps elide extra barriers in the case of acquire/release/relaxed
calls.
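
For reference, a simplified sketch of how the generic layer rebuilds the
ordered forms on top of the relaxed primitive (the real fallbacks use the
__atomic_*_fence() helpers; the sketch_ names below are illustrative only):

static __always_inline int sketch_add_return(int i, atomic_t *v)
{
	int ret;

	smp_mb();                                   /* full barrier before ... */
	ret = arch_atomic_add_return_relaxed(i, v);
	smp_mb();                                   /* ... and after */

	return ret;
}

static __always_inline int sketch_add_return_acquire(int i, atomic_t *v)
{
	int ret = arch_atomic_add_return_relaxed(i, v);

	smp_mb();	/* acquire needs only the trailing barrier */

	return ret;
}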

bloat-o-meter for the hsdk defconfig shows codegen improvements, although
the numbers below are inflated due to unrelated inlining heuristic changes:

| bloat-o-meter vmlinux-643babe34fd7-non-relaxed vmlinux-45aa05cb44d7-relaxed
| add/remove: 2/5 grow/shrink: 42/1222 up/down: 4158/-14312 (-10154)
| Function                                     old     new   delta
| ..
| sys_renameat                                 462     476     +14
| ip_mc_inc_group                              424     436     +12
| do_read_cache_page                          1882    1894     +12
| ..
| refcount_dec_and_mutex_lock                  254     250      -4
| refcount_dec_and_lock_irqsave                258     254      -4
| refcount_dec_and_lock                        254     250      -4
| ..
| tcp_v6_route_req                             246     238      -8
| tcp_v4_destroy_sock                          286     278      -8
| tcp_twsk_unique                              352     344      -8

Link: https://lore.kernel.org/r/20180830144344.GW24142@hirez.programming.kicks-ass.net
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic-llsc.h    | 32 +++++++++++----------------
 arch/arc/include/asm/atomic64-arcv2.h | 24 +++++++++++---------
 2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/arch/arc/include/asm/atomic-llsc.h b/arch/arc/include/asm/atomic-llsc.h
index aab4f2855457..088d348781c1 100644
--- a/arch/arc/include/asm/atomic-llsc.h
+++ b/arch/arc/include/asm/atomic-llsc.h
@@ -22,16 +22,10 @@ static inline void arch_atomic_##op(int i, atomic_t *v)			\
 }									\
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
+static inline int arch_atomic_##op##_return_relaxed(int i, atomic_t *v)	\
 {									\
 	unsigned int val;						\
 									\
-	/*								\
-	 * Explicit full memory barrier needed before/after as		\
-	 * LLOCK/SCOND themselves don't provide any such semantics	\
-	 */								\
-	smp_mb();							\
-									\
 	__asm__ __volatile__(						\
 	"1:	llock   %[val], [%[ctr]]		\n"		\
 	"	" #asm_op " %[val], %[val], %[i]	\n"		\
@@ -42,22 +36,17 @@ static inline int arch_atomic_##op##_return(int i, atomic_t *v)		\
 	  [i]	"ir"	(i)						\
 	: "cc");							\
 									\
-	smp_mb();							\
-									\
 	return val;							\
 }
 
+#define arch_atomic_add_return_relaxed		arch_atomic_add_return_relaxed
+#define arch_atomic_sub_return_relaxed		arch_atomic_sub_return_relaxed
+
 #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
-static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
+static inline int arch_atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
 {									\
 	unsigned int val, orig;						\
 									\
-	/*								\
-	 * Explicit full memory barrier needed before/after as		\
-	 * LLOCK/SCOND themselves don't provide any such semantics	\
-	 */								\
-	smp_mb();							\
-									\
 	__asm__ __volatile__(						\
 	"1:	llock   %[orig], [%[ctr]]		\n"		\
 	"	" #asm_op " %[val], %[orig], %[i]	\n"		\
@@ -69,11 +58,17 @@ static inline int arch_atomic_fetch_##op(int i, atomic_t *v)		\
 	  [i]	"ir"	(i)						\
 	: "cc");							\
 									\
-	smp_mb();							\
-									\
 	return orig;							\
 }
 
+#define arch_atomic_fetch_add_relaxed		arch_atomic_fetch_add_relaxed
+#define arch_atomic_fetch_sub_relaxed		arch_atomic_fetch_sub_relaxed
+
+#define arch_atomic_fetch_and_relaxed		arch_atomic_fetch_and_relaxed
+#define arch_atomic_fetch_andnot_relaxed	arch_atomic_fetch_andnot_relaxed
+#define arch_atomic_fetch_or_relaxed		arch_atomic_fetch_or_relaxed
+#define arch_atomic_fetch_xor_relaxed		arch_atomic_fetch_xor_relaxed
+
 #define ATOMIC_OPS(op, c_op, asm_op)					\
 	ATOMIC_OP(op, c_op, asm_op)					\
 	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
@@ -93,7 +88,6 @@ ATOMIC_OPS(or, |=, or)
 ATOMIC_OPS(xor, ^=, xor)
 
 #define arch_atomic_andnot		arch_atomic_andnot
-#define arch_atomic_fetch_andnot	arch_atomic_fetch_andnot
 
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
diff --git a/arch/arc/include/asm/atomic64-arcv2.h b/arch/arc/include/asm/atomic64-arcv2.h
index 22ef1cbb94e2..c5a8010fdc97 100644
--- a/arch/arc/include/asm/atomic64-arcv2.h
+++ b/arch/arc/include/asm/atomic64-arcv2.h
@@ -64,12 +64,10 @@ static inline void arch_atomic64_##op(s64 a, atomic64_t *v)		\
 }									\
 
 #define ATOMIC64_OP_RETURN(op, op1, op2)		        	\
-static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v)	\
+static inline s64 arch_atomic64_##op##_return_relaxed(s64 a, atomic64_t *v)	\
 {									\
 	s64 val;							\
 									\
-	smp_mb();							\
-									\
 	__asm__ __volatile__(						\
 	"1:				\n"				\
 	"	llockd   %0, [%1]	\n"				\
@@ -81,18 +79,17 @@ static inline s64 arch_atomic64_##op##_return(s64 a, atomic64_t *v)	\
 	: "r"(&v->counter), "ir"(a)					\
 	: "cc");	/* memory clobber comes from smp_mb() */	\
 									\
-	smp_mb();							\
-									\
 	return val;							\
 }
 
+#define arch_atomic64_add_return_relaxed	arch_atomic64_add_return_relaxed
+#define arch_atomic64_sub_return_relaxed	arch_atomic64_sub_return_relaxed
+
 #define ATOMIC64_FETCH_OP(op, op1, op2)		        		\
-static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v)	\
+static inline s64 arch_atomic64_fetch_##op##_relaxed(s64 a, atomic64_t *v)	\
 {									\
 	s64 val, orig;							\
 									\
-	smp_mb();							\
-									\
 	__asm__ __volatile__(						\
 	"1:				\n"				\
 	"	llockd   %0, [%2]	\n"				\
@@ -104,11 +101,17 @@ static inline s64 arch_atomic64_fetch_##op(s64 a, atomic64_t *v)	\
 	: "r"(&v->counter), "ir"(a)					\
 	: "cc");	/* memory clobber comes from smp_mb() */	\
 									\
-	smp_mb();							\
-									\
 	return orig;							\
 }
 
+#define arch_atomic64_fetch_add_relaxed		arch_atomic64_fetch_add_relaxed
+#define arch_atomic64_fetch_sub_relaxed		arch_atomic64_fetch_sub_relaxed
+
+#define arch_atomic64_fetch_and_relaxed		arch_atomic64_fetch_and_relaxed
+#define arch_atomic64_fetch_andnot_relaxed	arch_atomic64_fetch_andnot_relaxed
+#define arch_atomic64_fetch_or_relaxed		arch_atomic64_fetch_or_relaxed
+#define arch_atomic64_fetch_xor_relaxed		arch_atomic64_fetch_xor_relaxed
+
 #define ATOMIC64_OPS(op, op1, op2)					\
 	ATOMIC64_OP(op, op1, op2)					\
 	ATOMIC64_OP_RETURN(op, op1, op2)				\
@@ -128,7 +131,6 @@ ATOMIC64_OPS(or, or, or)
 ATOMIC64_OPS(xor, xor, xor)
 
 #define arch_atomic64_andnot		arch_atomic64_andnot
-#define arch_atomic64_fetch_andnot	arch_atomic64_fetch_andnot
 
 #undef ATOMIC64_OPS
 #undef ATOMIC64_FETCH_OP
-- 
2.25.1


* [PATCH 06/11] ARC: switch to generic bitops
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (4 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 05/11] ARC: atomics: implement relaxed variants Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 07/11] ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines Vineet Gupta
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Will Deacon,
	Vineet Gupta

From: Will Deacon <will.deacon@arm.com>

 - !LLSC now only needs a single spinlock for atomics and bitops

 - Some codegen changes (slight bloat) with generic bitops

   1. code increase due to the LD-check-atomic paradigm vs. an unconditional
      atomic (which dirties the cache line even if the bit is already set).
      So despite the increase, generic is the right thing to do (see the
      sketch after the signature comparison below).

   2. code decrease (but use of costlier instructions such as DIV vs.
      shift-based math) due to signed arithmetic.
      This needs to be revisited separately.

     arc:
     static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
                                ^^^^^^^^^^^^
     generic:
     static inline int test_bit(int nr, const volatile unsigned long *addr)
                                ^^^
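
A rough sketch of the "load, check, then atomic RMW" pattern referred to
in (1), simplified along the lines of the asm-generic bitops (the sketch_
name is illustrative, not kernel API):

static inline int sketch_test_and_set_bit(unsigned int nr,
					  volatile unsigned long *addr)
{
	unsigned long mask = BIT_MASK(nr);

	/*
	 * Plain load first: if the bit is already set, return without an
	 * atomic RMW, so the cache line is not dirtied needlessly.
	 */
	if (READ_ONCE(addr[BIT_WORD(nr)]) & mask)
		return 1;

	/* otherwise fall back to the real atomic test-and-set */
	return test_and_set_bit(nr, addr);
}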

Link: https://lore.kernel.org/r/20180830135749.GA13005@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
[vgupta: wrote patch based on Will's poc, analysed codegen diffs]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/bitops.h | 184 +---------------------------------
 arch/arc/include/asm/smp.h    |  14 ---
 arch/arc/kernel/smp.c         |   2 -
 3 files changed, 2 insertions(+), 198 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index fb98440c0bd4..4f35130f5ba3 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -14,188 +14,6 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <asm/barrier.h>
-#ifndef CONFIG_ARC_HAS_LLSC
-#include <asm/smp.h>
-#endif
-
-#ifdef CONFIG_ARC_HAS_LLSC
-
-/*
- * Hardware assisted Atomic-R-M-W
- */
-
-#define BIT_OP(op, c_op, asm_op)					\
-static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
-{									\
-	unsigned int temp;						\
-									\
-	m += nr >> 5;							\
-									\
-	nr &= 0x1f;							\
-									\
-	__asm__ __volatile__(						\
-	"1:	llock       %0, [%1]		\n"			\
-	"	" #asm_op " %0, %0, %2	\n"				\
-	"	scond       %0, [%1]		\n"			\
-	"	bnz         1b			\n"			\
-	: "=&r"(temp)	/* Early clobber, to prevent reg reuse */	\
-	: "r"(m),	/* Not "m": llock only supports reg direct addr mode */	\
-	  "ir"(nr)							\
-	: "cc");							\
-}
-
-/*
- * Semantically:
- *    Test the bit
- *    if clear
- *        set it and return 0 (old value)
- *    else
- *        return 1 (old value).
- *
- * Since ARC lacks a equivalent h/w primitive, the bit is set unconditionally
- * and the old value of bit is returned
- */
-#define TEST_N_BIT_OP(op, c_op, asm_op)					\
-static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{									\
-	unsigned long old, temp;					\
-									\
-	m += nr >> 5;							\
-									\
-	nr &= 0x1f;							\
-									\
-	/*								\
-	 * Explicit full memory barrier needed before/after as		\
-	 * LLOCK/SCOND themselves don't provide any such smenatic	\
-	 */								\
-	smp_mb();							\
-									\
-	__asm__ __volatile__(						\
-	"1:	llock       %0, [%2]	\n"				\
-	"	" #asm_op " %1, %0, %3	\n"				\
-	"	scond       %1, [%2]	\n"				\
-	"	bnz         1b		\n"				\
-	: "=&r"(old), "=&r"(temp)					\
-	: "r"(m), "ir"(nr)						\
-	: "cc");							\
-									\
-	smp_mb();							\
-									\
-	return (old & (1 << nr)) != 0;					\
-}
-
-#else /* !CONFIG_ARC_HAS_LLSC */
-
-/*
- * Non hardware assisted Atomic-R-M-W
- * Locking would change to irq-disabling only (UP) and spinlocks (SMP)
- *
- * There's "significant" micro-optimization in writing our own variants of
- * bitops (over generic variants)
- *
- * (1) The generic APIs have "signed" @nr while we have it "unsigned"
- *     This avoids extra code to be generated for pointer arithmatic, since
- *     is "not sure" that index is NOT -ve
- * (2) Utilize the fact that ARCompact bit fidding insn (BSET/BCLR/ASL) etc
- *     only consider bottom 5 bits of @nr, so NO need to mask them off.
- *     (GCC Quirk: however for constant @nr we still need to do the masking
- *             at compile time)
- */
-
-#define BIT_OP(op, c_op, asm_op)					\
-static inline void op##_bit(unsigned long nr, volatile unsigned long *m)\
-{									\
-	unsigned long temp, flags;					\
-	m += nr >> 5;							\
-									\
-	/*								\
-	 * spin lock/unlock provide the needed smp_mb() before/after	\
-	 */								\
-	bitops_lock(flags);						\
-									\
-	temp = *m;							\
-	*m = temp c_op (1UL << (nr & 0x1f));					\
-									\
-	bitops_unlock(flags);						\
-}
-
-#define TEST_N_BIT_OP(op, c_op, asm_op)					\
-static inline int test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{									\
-	unsigned long old, flags;					\
-	m += nr >> 5;							\
-									\
-	bitops_lock(flags);						\
-									\
-	old = *m;							\
-	*m = old c_op (1UL << (nr & 0x1f));				\
-									\
-	bitops_unlock(flags);						\
-									\
-	return (old & (1UL << (nr & 0x1f))) != 0;			\
-}
-
-#endif
-
-/***************************************
- * Non atomic variants
- **************************************/
-
-#define __BIT_OP(op, c_op, asm_op)					\
-static inline void __##op##_bit(unsigned long nr, volatile unsigned long *m)	\
-{									\
-	unsigned long temp;						\
-	m += nr >> 5;							\
-									\
-	temp = *m;							\
-	*m = temp c_op (1UL << (nr & 0x1f));				\
-}
-
-#define __TEST_N_BIT_OP(op, c_op, asm_op)				\
-static inline int __test_and_##op##_bit(unsigned long nr, volatile unsigned long *m)\
-{									\
-	unsigned long old;						\
-	m += nr >> 5;							\
-									\
-	old = *m;							\
-	*m = old c_op (1UL << (nr & 0x1f));				\
-									\
-	return (old & (1UL << (nr & 0x1f))) != 0;			\
-}
-
-#define BIT_OPS(op, c_op, asm_op)					\
-									\
-	/* set_bit(), clear_bit(), change_bit() */			\
-	BIT_OP(op, c_op, asm_op)					\
-									\
-	/* test_and_set_bit(), test_and_clear_bit(), test_and_change_bit() */\
-	TEST_N_BIT_OP(op, c_op, asm_op)					\
-									\
-	/* __set_bit(), __clear_bit(), __change_bit() */		\
-	__BIT_OP(op, c_op, asm_op)					\
-									\
-	/* __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit() */\
-	__TEST_N_BIT_OP(op, c_op, asm_op)
-
-BIT_OPS(set, |, bset)
-BIT_OPS(clear, & ~, bclr)
-BIT_OPS(change, ^, bxor)
-
-/*
- * This routine doesn't need to be atomic.
- */
-static inline int
-test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
-	unsigned long mask;
-
-	addr += nr >> 5;
-
-	mask = 1UL << (nr & 0x1f);
-
-	return ((mask & *addr) != 0);
-}
 
 #ifdef CONFIG_ISA_ARCOMPACT
 
@@ -368,6 +186,8 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long x)
 #include <asm-generic/bitops/fls64.h>
 #include <asm-generic/bitops/sched.h>
 #include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/non-atomic.h>
 
 #include <asm-generic/bitops/find.h>
 #include <asm-generic/bitops/le.h>
diff --git a/arch/arc/include/asm/smp.h b/arch/arc/include/asm/smp.h
index c5de4008d19f..d856491606ac 100644
--- a/arch/arc/include/asm/smp.h
+++ b/arch/arc/include/asm/smp.h
@@ -105,7 +105,6 @@ static inline const char *arc_platform_smp_cpuinfo(void)
 #include <asm/spinlock.h>
 
 extern arch_spinlock_t smp_atomic_ops_lock;
-extern arch_spinlock_t smp_bitops_lock;
 
 #define atomic_ops_lock(flags)	do {		\
 	local_irq_save(flags);			\
@@ -117,24 +116,11 @@ extern arch_spinlock_t smp_bitops_lock;
 	local_irq_restore(flags);		\
 } while (0)
 
-#define bitops_lock(flags)	do {		\
-	local_irq_save(flags);			\
-	arch_spin_lock(&smp_bitops_lock);	\
-} while (0)
-
-#define bitops_unlock(flags) do {		\
-	arch_spin_unlock(&smp_bitops_lock);	\
-	local_irq_restore(flags);		\
-} while (0)
-
 #else /* !CONFIG_SMP */
 
 #define atomic_ops_lock(flags)		local_irq_save(flags)
 #define atomic_ops_unlock(flags)	local_irq_restore(flags)
 
-#define bitops_lock(flags)		local_irq_save(flags)
-#define bitops_unlock(flags)		local_irq_restore(flags)
-
 #endif /* !CONFIG_SMP */
 
 #endif	/* !CONFIG_ARC_HAS_LLSC */
diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c
index db0e104d6835..62f641c9a493 100644
--- a/arch/arc/kernel/smp.c
+++ b/arch/arc/kernel/smp.c
@@ -29,10 +29,8 @@
 
 #ifndef CONFIG_ARC_HAS_LLSC
 arch_spinlock_t smp_atomic_ops_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-arch_spinlock_t smp_bitops_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 
 EXPORT_SYMBOL_GPL(smp_atomic_ops_lock);
-EXPORT_SYMBOL_GPL(smp_bitops_lock);
 #endif
 
 struct plat_smp_ops  __weak plat_smp_ops;
-- 
2.25.1


* [PATCH 07/11] ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (5 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 06/11] ARC: switch to generic bitops Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 08/11] ARC: xchg: !LLSC: remove UP micro-optimization/hack Vineet Gupta
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/bitops.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h
index 4f35130f5ba3..a7daaf64ae34 100644
--- a/arch/arc/include/asm/bitops.h
+++ b/arch/arc/include/asm/bitops.h
@@ -114,7 +114,7 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long word)
  * @result: [1-32]
  * fls(1) = 1, fls(0x80000000) = 32, fls(0) = 0
  */
-static inline __attribute__ ((const)) int fls(unsigned long x)
+static inline __attribute__ ((const)) int fls(unsigned int x)
 {
 	int n;
 
@@ -141,7 +141,7 @@ static inline __attribute__ ((const)) int __fls(unsigned long x)
  * ffs = Find First Set in word (LSB to MSB)
  * @result: [1-32], 0 if all 0's
  */
-static inline __attribute__ ((const)) int ffs(unsigned long x)
+static inline __attribute__ ((const)) int ffs(unsigned int x)
 {
 	int n;
 
-- 
2.25.1


* [PATCH 08/11] ARC: xchg: !LLSC: remove UP micro-optimization/hack
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (6 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 07/11] ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 09/11] ARC: cmpxchg/xchg: rewrite as macros to make type safe Vineet Gupta
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

It gets in the way of cleaning things up and is a maintenance
pain in the neck!

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cmpxchg.h | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index d42917e803e1..bac9b564a140 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -113,15 +113,9 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
  *  - For !LLSC, cmpxchg() needs to use that lock (see above) and there is lot
  *    of  kernel code which calls xchg()/cmpxchg() on same data (see llist.h)
  *    Hence xchg() needs to follow same locking rules.
- *
- * Technically the lock is also needed for UP (boils down to irq save/restore)
- * but we can cheat a bit since cmpxchg() atomic_ops_lock() would cause irqs to
- * be disabled thus can't possibly be interrupted/preempted/clobbered by xchg()
- * Other way around, xchg is one instruction anyways, so can't be interrupted
- * as such
  */
 
-#if !defined(CONFIG_ARC_HAS_LLSC) && defined(CONFIG_SMP)
+#ifndef CONFIG_ARC_HAS_LLSC
 
 #define arch_xchg(ptr, with)		\
 ({					\
@@ -134,10 +128,6 @@ static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
 	old_val;			\
 })
 
-#else
-
-#define arch_xchg(ptr, with)  _xchg(ptr, with)
-
 #endif
 
 /*
-- 
2.25.1


* [PATCH 09/11] ARC: cmpxchg/xchg: rewrite as macros to make type safe
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (7 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 08/11] ARC: xchg: !LLSC: remove UP micro-optimization/hack Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 10/11] ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only) Vineet Gupta
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

The existing code forces/assumes args to be of type "long", which won't work
in an LP64 regime, so prepare the code for that.

Interestingly, this should be a non-functional change, but I do see
some codegen changes:

| bloat-o-meter vmlinux-cmpxchg-A vmlinux-cmpxchg-B
| add/remove: 0/0 grow/shrink: 17/12 up/down: 218/-150 (68)
|
| Function                                     old     new   delta
| rwsem_optimistic_spin                        518     550     +32
| rwsem_down_write_slowpath                   1244    1274     +30
| __do_sys_perf_event_open                    2576    2600     +24
| down_read                                    192     200      +8
| __down_read                                  192     200      +8
...
| task_work_run                                168     148     -20
| dma_fence_chain_walk.part                    760     736     -24
| __genradix_ptr_alloc                         674     646     -28

Total: Before=6187409, After=6187477, chg +0.00%
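
For reference, a minimal usage sketch of the macro form (hypothetical
function and names, not from the patch); the compare and new values keep
the type of *ptr instead of being cast through "unsigned long":

static bool sketch_claim_slot(u32 *slot)
{
	u32 expected = 0;

	/* operands keep their 32-bit type end to end */
	return arch_cmpxchg(slot, expected, 1u) == expected;
}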

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cmpxchg.h | 209 ++++++++++++++++++---------------
 1 file changed, 117 insertions(+), 92 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index bac9b564a140..00deb076d6f6 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -6,6 +6,7 @@
 #ifndef __ASM_ARC_CMPXCHG_H
 #define __ASM_ARC_CMPXCHG_H
 
+#include <linux/build_bug.h>
 #include <linux/types.h>
 
 #include <asm/barrier.h>
@@ -13,62 +14,77 @@
 
 #ifdef CONFIG_ARC_HAS_LLSC
 
-static inline unsigned long
-__cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
-{
-	unsigned long prev;
-
-	/*
-	 * Explicit full memory barrier needed before/after as
-	 * LLOCK/SCOND themselves don't provide any such semantics
-	 */
-	smp_mb();
-
-	__asm__ __volatile__(
-	"1:	llock   %0, [%1]	\n"
-	"	brne    %0, %2, 2f	\n"
-	"	scond   %3, [%1]	\n"
-	"	bnz     1b		\n"
-	"2:				\n"
-	: "=&r"(prev)	/* Early clobber, to prevent reg reuse */
-	: "r"(ptr),	/* Not "m": llock only supports reg direct addr mode */
-	  "ir"(expected),
-	  "r"(new)	/* can't be "ir". scond can't take LIMM for "b" */
-	: "cc", "memory"); /* so that gcc knows memory is being written here */
-
-	smp_mb();
-
-	return prev;
-}
-
-#else /* !CONFIG_ARC_HAS_LLSC */
-
-static inline unsigned long
-__cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
-{
-	unsigned long flags;
-	int prev;
-	volatile unsigned long *p = ptr;
-
-	/*
-	 * spin lock/unlock provide the needed smp_mb() before/after
-	 */
-	atomic_ops_lock(flags);
-	prev = *p;
-	if (prev == expected)
-		*p = new;
-	atomic_ops_unlock(flags);
-	return prev;
-}
+/*
+ * if (*ptr == @old)
+ *      *ptr = @new
+ */
+#define __cmpxchg(ptr, old, new)					\
+({									\
+	__typeof__(*(ptr)) _prev;					\
+									\
+	__asm__ __volatile__(						\
+	"1:	llock  %0, [%1]	\n"					\
+	"	brne   %0, %2, 2f	\n"				\
+	"	scond  %3, [%1]	\n"					\
+	"	bnz     1b		\n"				\
+	"2:				\n"				\
+	: "=&r"(_prev)	/* Early clobber prevent reg reuse */		\
+	: "r"(ptr),	/* Not "m": llock only supports reg */		\
+	  "ir"(old),							\
+	  "r"(new)	/* Not "ir": scond can't take LIMM */		\
+	: "cc",								\
+	  "memory");	/* gcc knows memory is clobbered */		\
+									\
+	_prev;								\
+})
 
-#endif
+#define arch_cmpxchg(ptr, old, new)				        \
+({									\
+	__typeof__(ptr) _p_ = (ptr);					\
+	__typeof__(*(ptr)) _o_ = (old);					\
+	__typeof__(*(ptr)) _n_ = (new);					\
+	__typeof__(*(ptr)) _prev_;					\
+									\
+	switch(sizeof((_p_))) {						\
+	case 4:								\
+	        /*							\
+		 * Explicit full memory barrier needed before/after	\
+	         */							\
+		smp_mb();						\
+		_prev_ = __cmpxchg(_p_, _o_, _n_);			\
+		smp_mb();						\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	_prev_;								\
+})
 
-#define arch_cmpxchg(ptr, o, n) ({			\
-	(typeof(*(ptr)))__cmpxchg((ptr),		\
-				  (unsigned long)(o),	\
-				  (unsigned long)(n));	\
+#else
+
+#define arch_cmpxchg(ptr, old, new)				        \
+({									\
+	volatile __typeof__(ptr) _p_ = (ptr);				\
+	__typeof__(*(ptr)) _o_ = (old);					\
+	__typeof__(*(ptr)) _n_ = (new);					\
+	__typeof__(*(ptr)) _prev_;					\
+	unsigned long __flags;						\
+									\
+	BUILD_BUG_ON(sizeof(_p_) != 4);					\
+									\
+	/*								\
+	 * spin lock/unlock provide the needed smp_mb() before/after	\
+	 */								\
+	atomic_ops_lock(__flags);					\
+	_prev_ = *_p_;							\
+	if (_prev_ == _o_)						\
+		*_p_ = _n_;						\
+	atomic_ops_unlock(__flags);					\
+	_prev_;								\
 })
 
+#endif
+
 /*
  * atomic_cmpxchg is same as cmpxchg
  *   LLSC: only different in data-type, semantics are exactly same
@@ -77,55 +93,64 @@ __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
  */
 #define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))
 
-
 /*
- * xchg (reg with memory) based on "Native atomic" EX insn
+ * xchg
  */
-static inline unsigned long __xchg(unsigned long val, volatile void *ptr,
-				   int size)
-{
-	extern unsigned long __xchg_bad_pointer(void);
-
-	switch (size) {
-	case 4:
-		smp_mb();
-
-		__asm__ __volatile__(
-		"	ex  %0, [%1]	\n"
-		: "+r"(val)
-		: "r"(ptr)
-		: "memory");
+#ifdef CONFIG_ARC_HAS_LLSC
 
-		smp_mb();
+#define __xchg(ptr, val)						\
+({									\
+	__asm__ __volatile__(						\
+	"	ex  %0, [%1]	\n"	/* set new value */	        \
+	: "+r"(val)							\
+	: "r"(ptr)							\
+	: "memory");							\
+	_val_;		/* get old value */				\
+})
 
-		return val;
-	}
-	return __xchg_bad_pointer();
-}
+#define arch_xchg(ptr, val)						\
+({									\
+	__typeof__(ptr) _p_ = (ptr);					\
+	__typeof__(*(ptr)) _val_ = (val);				\
+									\
+	switch(sizeof(*(_p_))) {					\
+	case 4:								\
+		smp_mb();						\
+		_val_ = __xchg(_p_, _val_);				\
+	        smp_mb();						\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	_val_;								\
+})
 
-#define _xchg(ptr, with) ((typeof(*(ptr)))__xchg((unsigned long)(with), (ptr), \
-						 sizeof(*(ptr))))
+#else  /* !CONFIG_ARC_HAS_LLSC */
 
 /*
- * xchg() maps directly to ARC EX instruction which guarantees atomicity.
- * However in !LLSC config, it also needs to be use @atomic_ops_lock spinlock
- * due to a subtle reason:
- *  - For !LLSC, cmpxchg() needs to use that lock (see above) and there is lot
- *    of  kernel code which calls xchg()/cmpxchg() on same data (see llist.h)
- *    Hence xchg() needs to follow same locking rules.
+ * The EX instruction is baseline and present in !LLSC too. But in this
+ * regime it still needs to use the @atomic_ops_lock spinlock to allow
+ * interop with cmpxchg(), which uses the spinlock in !LLSC
+ * (llist.h uses xchg and cmpxchg on the same data)
  */
 
-#ifndef CONFIG_ARC_HAS_LLSC
-
-#define arch_xchg(ptr, with)		\
-({					\
-	unsigned long flags;		\
-	typeof(*(ptr)) old_val;		\
-					\
-	atomic_ops_lock(flags);		\
-	old_val = _xchg(ptr, with);	\
-	atomic_ops_unlock(flags);	\
-	old_val;			\
+#define arch_xchg(ptr, val)					        \
+({									\
+	__typeof__(ptr) _p_ = (ptr);					\
+	__typeof__(*(ptr)) _val_ = (val);				\
+									\
+	unsigned long __flags;						\
+									\
+	atomic_ops_lock(__flags);					\
+									\
+	__asm__ __volatile__(						\
+	"	ex  %0, [%1]	\n"					\
+	: "+r"(_val_)							\
+	: "r"(_p_)							\
+	: "memory");							\
+									\
+	atomic_ops_unlock(__flags);					\
+	_val_;								\
 })
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/11] ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (8 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 09/11] ARC: cmpxchg/xchg: rewrite as macros to make type safe Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-04 19:15 ` [PATCH 11/11] ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants Vineet Gupta
  2021-08-05  9:02 ` [PATCH 00/11] ARC atomics update Peter Zijlstra
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

It only makes sense to do this for the LLSC config: the !LLSC version is
built around the atomic_ops_lock spinlock, whose lock/unlock already
provide the needed ordering, so there is nothing to relax there.
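
For context, the generic atomic layer rebuilds the fully ordered forms
from the _relaxed ones, so dropping the explicit smp_mb() pairs does not
weaken arch_cmpxchg()/arch_xchg(). Roughly, as a paraphrased sketch of
the generic fallback headers of this era (not a verbatim copy):

/* Paraphrased sketch of the generic fallback, not verbatim kernel source */
#define __atomic_op_fence(op, args...)					\
({									\
	typeof(op##_relaxed(args)) __ret;				\
	__atomic_pre_full_fence();	/* full barrier before */	\
	__ret = op##_relaxed(args);					\
	__atomic_post_full_fence();	/* full barrier after */	\
	__ret;								\
})

#ifndef arch_cmpxchg
#define arch_cmpxchg(...)	__atomic_op_fence(arch_cmpxchg, __VA_ARGS__)
#endif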

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/cmpxchg.h | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index 00deb076d6f6..e2ae0eb1ca07 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -38,7 +38,7 @@
 	_prev;								\
 })
 
-#define arch_cmpxchg(ptr, old, new)				        \
+#define arch_cmpxchg_relaxed(ptr, old, new)				\
 ({									\
 	__typeof__(ptr) _p_ = (ptr);					\
 	__typeof__(*(ptr)) _o_ = (old);					\
@@ -47,12 +47,7 @@
 									\
 	switch(sizeof((_p_))) {						\
 	case 4:								\
-	        /*							\
-		 * Explicit full memory barrier needed before/after	\
-	         */							\
-		smp_mb();						\
 		_prev_ = __cmpxchg(_p_, _o_, _n_);			\
-		smp_mb();						\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -108,16 +103,14 @@
 	_val_;		/* get old value */				\
 })
 
-#define arch_xchg(ptr, val)						\
+#define arch_xchg_relaxed(ptr, val)					\
 ({									\
 	__typeof__(ptr) _p_ = (ptr);					\
 	__typeof__(*(ptr)) _val_ = (val);				\
 									\
 	switch(sizeof(*(_p_))) {					\
 	case 4:								\
-		smp_mb();						\
 		_val_ = __xchg(_p_, _val_);				\
-	        smp_mb();						\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/11] ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (9 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 10/11] ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only) Vineet Gupta
@ 2021-08-04 19:15 ` Vineet Gupta
  2021-08-05  9:02 ` [PATCH 00/11] ARC atomics update Peter Zijlstra
  11 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-04 19:15 UTC (permalink / raw)
  To: linux-snps-arc
  Cc: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev, Vineet Gupta

And move them out of cmpxchg.h to canonical atomic.h
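
For reference, which names an arch defines decides what the generic
fallback layer does with them. As a paraphrased sketch (not verbatim
kernel source): if only the fully ordered op exists (the !LLSC case
here), the weaker orderings collapse onto it; if only the _relaxed op
exists (LLSC), the stronger orderings are synthesised around it with
fences.

/* Paraphrased from the generic atomic fallback machinery, not verbatim */
#ifndef arch_atomic_cmpxchg_relaxed
#define arch_atomic_cmpxchg_acquire	arch_atomic_cmpxchg
#define arch_atomic_cmpxchg_release	arch_atomic_cmpxchg
#define arch_atomic_cmpxchg_relaxed	arch_atomic_cmpxchg
#else /* arch_atomic_cmpxchg_relaxed */
#ifndef arch_atomic_cmpxchg
#define arch_atomic_cmpxchg(...) \
	__atomic_op_fence(arch_atomic_cmpxchg, __VA_ARGS__)
#define arch_atomic_cmpxchg_acquire(...) \
	__atomic_op_acquire(arch_atomic_cmpxchg, __VA_ARGS__)
#define arch_atomic_cmpxchg_release(...) \
	__atomic_op_release(arch_atomic_cmpxchg, __VA_ARGS__)
#endif
#endif /* arch_atomic_cmpxchg_relaxed */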

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/atomic.h  | 27 +++++++++++++++++++++++++++
 arch/arc/include/asm/cmpxchg.h | 23 -----------------------
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index ee88e1dbaab5..52ee51e1ff7c 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -22,6 +22,33 @@
 #include <asm/atomic-spinlock.h>
 #endif
 
+#define arch_atomic_cmpxchg(v, o, n)					\
+({									\
+	arch_cmpxchg(&((v)->counter), (o), (n));			\
+})
+
+#ifdef arch_cmpxchg_relaxed
+#define arch_atomic_cmpxchg_relaxed(v, o, n)				\
+({									\
+	arch_cmpxchg_relaxed(&((v)->counter), (o), (n));		\
+})
+#endif
+
+#define arch_atomic_xchg(v, n)						\
+({									\
+	arch_xchg(&((v)->counter), (n));				\
+})
+
+#ifdef arch_xchg_relaxed
+#define arch_atomic_xchg_relaxed(v, n)					\
+({									\
+	arch_xchg_relaxed(&((v)->counter), (n));			\
+})
+#endif
+
+/*
+ * 64-bit atomics
+ */
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #else
diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h
index e2ae0eb1ca07..c5b544a5fe81 100644
--- a/arch/arc/include/asm/cmpxchg.h
+++ b/arch/arc/include/asm/cmpxchg.h
@@ -80,14 +80,6 @@
 
 #endif
 
-/*
- * atomic_cmpxchg is same as cmpxchg
- *   LLSC: only different in data-type, semantics are exactly same
- *  !LLSC: cmpxchg() has to use an external lock atomic_ops_lock to guarantee
- *         semantics, and this lock also happens to be used by atomic_*()
- */
-#define arch_atomic_cmpxchg(v, o, n) ((int)arch_cmpxchg(&((v)->counter), (o), (n)))
-
 /*
  * xchg
  */
@@ -148,19 +140,4 @@
 
 #endif
 
-/*
- * "atomic" variant of xchg()
- * REQ: It needs to follow the same serialization rules as other atomic_xxx()
- * Since xchg() doesn't always do that, it would seem that following definition
- * is incorrect. But here's the rationale:
- *   SMP : Even xchg() takes the atomic_ops_lock, so OK.
- *   LLSC: atomic_ops_lock are not relevant at all (even if SMP, since LLSC
- *         is natively "SMP safe", no serialization required).
- *   UP  : other atomics disable IRQ, so no way a difft ctxt atomic_xchg()
- *         could clobber them. atomic_xchg() itself would be 1 insn, so it
- *         can't be clobbered by others. Thus no serialization required when
- *         atomic_xchg is involved.
- */
-#define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new))
-
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] ARC atomics update
  2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
                   ` (10 preceding siblings ...)
  2021-08-04 19:15 ` [PATCH 11/11] ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants Vineet Gupta
@ 2021-08-05  9:02 ` Peter Zijlstra
  2021-08-05 16:18   ` Vineet Gupta
  11 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2021-08-05  9:02 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-snps-arc, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev

On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:

> Vineet Gupta (10):
>   ARC: atomics: disintegrate header
>   ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
>   ARC: atomic: !LLSC: use int data type consistently
>   ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
>   ARC: atomics: implement relaxed variants
>   ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
>   ARC: xchg: !LLSC: remove UP micro-optimization/hack
>   ARC: cmpxchg/xchg: rewrite as macros to make type safe
>   ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
>   ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
> 
> Will Deacon (1):
>   ARC: switch to generic bitops

Didn't see any weird things:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] ARC atomics update
  2021-08-05  9:02 ` [PATCH 00/11] ARC atomics update Peter Zijlstra
@ 2021-08-05 16:18   ` Vineet Gupta
  2021-08-05 17:04     ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Vineet Gupta @ 2021-08-05 16:18 UTC (permalink / raw)
  To: Peter Zijlstra, Vineet Gupta
  Cc: linux-snps-arc, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev

On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
> 
>> Vineet Gupta (10):
>>    ARC: atomics: disintegrate header
>>    ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
>>    ARC: atomic: !LLSC: use int data type consistently
>>    ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
>>    ARC: atomics: implement relaxed variants
>>    ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
>>    ARC: xchg: !LLSC: remove UP micro-optimization/hack
>>    ARC: cmpxchg/xchg: rewrite as macros to make type safe
>>    ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
>>    ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
>>
>> Will Deacon (1):
>>    ARC: switch to generic bitops
> 
> Didn't see any weird things:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thx Peter. A lot of this is your code anyways ;-)

Any initial thoughts/comments on patch 06/11 - is there an obvious
reason that the generic bitops take a signed @nr, or is the hurdle that
this needs to be done consistently cross-arch?

-Vineet

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] ARC atomics update
  2021-08-05 16:18   ` Vineet Gupta
@ 2021-08-05 17:04     ` Peter Zijlstra
  2021-08-05 19:14       ` [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV Vineet Gupta
  2021-08-06  8:41       ` [PATCH 00/11] ARC atomics update Will Deacon
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2021-08-05 17:04 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-snps-arc, Will Deacon, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev

On Thu, Aug 05, 2021 at 04:18:29PM +0000, Vineet Gupta wrote:
> On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> > On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
> > 
> >> Vineet Gupta (10):
> >>    ARC: atomics: disintegrate header
> >>    ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
> >>    ARC: atomic: !LLSC: use int data type consistently
> >>    ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
> >>    ARC: atomics: implement relaxed variants
> >>    ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
> >>    ARC: xchg: !LLSC: remove UP micro-optimization/hack
> >>    ARC: cmpxchg/xchg: rewrite as macros to make type safe
> >>    ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
> >>    ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
> >>
> >> Will Deacon (1):
> >>    ARC: switch to generic bitops
> > 
> > Didn't see any weird things:
> > 
> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Thx Peter. A lot of this is your code anyways ;-)
> 
> Any initial thoughts/comments on patch 06/11 - is there an obvious
> reason that the generic bitops take a signed @nr, or is the hurdle that
> this needs to be done consistently cross-arch?

That does indeed seem daft and ready for a cleanup. Will, any
recollection from when you touched this?

AFAICT bitops/atomic.h is consistently 'unsigned int nr', but
bitops/non-atomic.h is 'int nr' while bitops/instrumented-non-atomic.h
is consistently 'long nr'.
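
That is, roughly (paraphrasing the prototypes from memory, so possibly
not verbatim):

/* bitops/atomic.h */
static __always_inline void arch_set_bit(unsigned int nr, volatile unsigned long *p);

/* bitops/non-atomic.h */
static inline void __set_bit(int nr, volatile unsigned long *addr);

/* bitops/instrumented-non-atomic.h */
static __always_inline void __set_bit(long nr, volatile unsigned long *addr);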

I'm thinking 'unsigned int nr' is the most sensible all round, but I've
not gone through all the cases.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV
  2021-08-05 17:04     ` Peter Zijlstra
@ 2021-08-05 19:14       ` Vineet Gupta
  2021-08-06 13:42         ` Will Deacon
  2021-08-06  8:41       ` [PATCH 00/11] ARC atomics update Will Deacon
  1 sibling, 1 reply; 19+ messages in thread
From: Vineet Gupta @ 2021-08-05 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Arnd Bergmann, Mark Rutland
  Cc: linux-kernel, linux-arch, linux-snps-arc, Vineet Gupta

Signed math causes generation of costlier instructions such as DIV when
the same work could be done by the barrel shifter.

The worse part is that this is not caught by things like bloat-o-meter,
since instruction lengths / symbol sizes typically stay the same.

e.g.

stock (signed math)
__________________

919b4614 <test_taint>:
919b4614:	div	r2,r0,0x20
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s

(patched) unsigned math
__________________

919b4614 <test_taint>:
919b4614:	lsr	r2,r0,0x5  @nr/32
                ^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0     #test_bit()
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s
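
The same effect in a standalone C snippet (not kernel code; compile with
-O2 and look at the assembly). On 32-bit ARC BIT_WORD(nr) is nr / 32,
and with a signed nr the compiler has to honour C's round-toward-zero
rule for negative values, so it cannot lower the division to a bare
logical shift:

/* Standalone illustration, not kernel code */
unsigned long word_signed(int nr, const unsigned long *addr)
{
	return addr[nr / 32];		/* DIV (or shift plus fixup) */
}

unsigned long word_unsigned(unsigned int nr, const unsigned long *addr)
{
	return addr[nr / 32];		/* single lsr by 5 */
}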

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
This is an RFC for feedback; I understand this impacts every arch,
but as of now it is only build/run tested on ARC.
---
---
 include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..c5a7d8eb9c2b 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -13,7 +13,7 @@
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static inline void __set_bit(unsigned int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +21,7 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
 	*p  |= mask;
 }
 
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static inline void __clear_bit(unsigned int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +38,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
  * If it's called on the same region of memory simultaneously, the effect
  * may be that only one operation succeeds.
  */
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static inline void __change_bit(unsigned int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +55,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +74,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int __test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,7 +85,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
 }
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
+static inline int __test_and_change_bit(unsigned int nr,
 					    volatile unsigned long *addr)
 {
 	unsigned long mask = BIT_MASK(nr);
@@ -101,7 +101,7 @@ static inline int __test_and_change_bit(int nr,
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static inline int test_bit(unsigned int nr, const volatile unsigned long *addr)
 {
 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] ARC atomics update
  2021-08-05 17:04     ` Peter Zijlstra
  2021-08-05 19:14       ` [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV Vineet Gupta
@ 2021-08-06  8:41       ` Will Deacon
  1 sibling, 0 replies; 19+ messages in thread
From: Will Deacon @ 2021-08-06  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vineet Gupta, linux-snps-arc, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-arch, Vladimir Isaev

On Thu, Aug 05, 2021 at 07:04:32PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 05, 2021 at 04:18:29PM +0000, Vineet Gupta wrote:
> > On 8/5/21 2:02 AM, Peter Zijlstra wrote:
> > > On Wed, Aug 04, 2021 at 12:15:43PM -0700, Vineet Gupta wrote:
> > > 
> > >> Vineet Gupta (10):
> > >>    ARC: atomics: disintegrate header
> > >>    ARC: atomic: !LLSC: remove hack in atomic_set() for for UP
> > >>    ARC: atomic: !LLSC: use int data type consistently
> > >>    ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return
> > >>    ARC: atomics: implement relaxed variants
> > >>    ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines
> > >>    ARC: xchg: !LLSC: remove UP micro-optimization/hack
> > >>    ARC: cmpxchg/xchg: rewrite as macros to make type safe
> > >>    ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only)
> > >>    ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants
> > >>
> > >> Will Deacon (1):
> > >>    ARC: switch to generic bitops
> > > 
> > > Didn't see any weird things:
> > > 
> > > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Thx Peter. A lot of this is your code anyways ;-)
> > 
> > Any initial thoughts/comments on patch 06/11 - is there an obvious
> > reason that the generic bitops take a signed @nr, or is the hurdle that
> > this needs to be done consistently cross-arch?
> 
> That does indeed seem daft and ready for a cleanup. Will any
> recollection from when you touched this?

I had a patch to fix this but it blew up in the robot and I didn't get round
to reworking it:

https://lore.kernel.org/patchwork/patch/1245555/

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV
  2021-08-05 19:14       ` [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV Vineet Gupta
@ 2021-08-06 13:42         ` Will Deacon
  2021-08-06 19:02           ` Vineet Gupta
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2021-08-06 13:42 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Peter Zijlstra, Arnd Bergmann, Mark Rutland, linux-kernel,
	linux-arch, linux-snps-arc

On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
> Signed math causes generation of costlier instructions such as DIV when
> the same work could be done by the barrel shifter.
> 
> The worse part is that this is not caught by things like bloat-o-meter,
> since instruction lengths / symbol sizes typically stay the same.
> 
> e.g.
> 
> stock (signed math)
> __________________
> 
> 919b4614 <test_taint>:
> 919b4614:	div	r2,r0,0x20
>                 ^^^
> 919b4618:	add2	r2,0x920f6050,r2
> 919b4620:	ld_s	r2,[r2,0]
> 919b4622:	lsr	r0,r2,r0
> 919b4626:	j_s.d	[blink]
> 919b4628:	bmsk_s	r0,r0,0
> 919b462a:	nop_s
> 
> (patched) unsigned math
> __________________
> 
> 919b4614 <test_taint>:
> 919b4614:	lsr	r2,r0,0x5  @nr/32
>                 ^^^
> 919b4618:	add2	r2,0x920f6050,r2
> 919b4620:	ld_s	r2,[r2,0]
> 919b4622:	lsr	r0,r2,r0     #test_bit()
> 919b4626:	j_s.d	[blink]
> 919b4628:	bmsk_s	r0,r0,0
> 919b462a:	nop_s

Just FYI, but on arm64 the existing codegen is alright as we have both
arithmetic and logical shifts.

> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
> This is an RFC for feedback; I understand this impacts every arch,
> but as of now it is only build/run tested on ARC.
> ---
> ---
>  include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

We should really move test_bit() into the atomic header, but I failed to fix
the resulting include mess last time I tried that.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV
  2021-08-06 13:42         ` Will Deacon
@ 2021-08-06 19:02           ` Vineet Gupta
  0 siblings, 0 replies; 19+ messages in thread
From: Vineet Gupta @ 2021-08-06 19:02 UTC (permalink / raw)
  To: Will Deacon, Vineet Gupta
  Cc: Peter Zijlstra, Arnd Bergmann, Mark Rutland, linux-kernel,
	linux-arch, linux-snps-arc

On 8/6/21 6:42 AM, Will Deacon wrote:
> On Thu, Aug 05, 2021 at 12:14:08PM -0700, Vineet Gupta wrote:
>> signed math causes generation of costlier instructions such as DIV when
>> they could be done by barrerl shifter.
>>
>> Worse part is this is not caught by things like bloat-o-meter since
>> instruction length / symbols are typically same size.
>>
>> e.g.
>>
>> stock (signed math)
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614:	div	r2,r0,0x20
>>                  ^^^
>> 919b4618:	add2	r2,0x920f6050,r2
>> 919b4620:	ld_s	r2,[r2,0]
>> 919b4622:	lsr	r0,r2,r0
>> 919b4626:	j_s.d	[blink]
>> 919b4628:	bmsk_s	r0,r0,0
>> 919b462a:	nop_s
>>
>> (patched) unsigned math
>> __________________
>>
>> 919b4614 <test_taint>:
>> 919b4614:	lsr	r2,r0,0x5  @nr/32
>>                  ^^^
>> 919b4618:	add2	r2,0x920f6050,r2
>> 919b4620:	ld_s	r2,[r2,0]
>> 919b4622:	lsr	r0,r2,r0     #test_bit()
>> 919b4626:	j_s.d	[blink]
>> 919b4628:	bmsk_s	r0,r0,0
>> 919b462a:	nop_s
> Just FYI, but on arm64 the existing codegen is alright as we have both
> arithmetic and logical shifts.

ARC does too: there's LSR (Logical Shift Right) and ASR (Arithmetic
Shift Right), so perhaps this is something to be addressed in the
compiler.
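
Though to be fair, a bare ASR can't replace a signed divide-by-32
anyway: C requires signed division to round toward zero, while an
arithmetic shift rounds toward minus infinity, so the compiler can only
use a single shift once it knows @nr is non-negative. A tiny plain-C
illustration (nothing ARC specific; >> on a negative int is
implementation-defined, but it is an arithmetic shift on the compilers
we care about):

#include <stdio.h>

int main(void)
{
	int nr = -1;

	/* -1 / 32 rounds toward zero; -1 >> 5 is an arithmetic shift */
	printf("%d %d\n", nr / 32, nr >> 5);	/* prints "0 -1" */
	return 0;
}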

>> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
>> ---
>> This is an RFC for feedback; I understand this impacts every arch,
>> but as of now it is only build/run tested on ARC.
>> ---
>> ---
>>   include/asm-generic/bitops/non-atomic.h | 14 +++++++-------
>>   1 file changed, 7 insertions(+), 7 deletions(-)
> Acked-by: Will Deacon <will@kernel.org>
>
> We should really move test_bit() into the atomic header, but I failed to fix
> the resulting include mess last time I tried that.

OK I'll give it a try too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread (newest: 2021-08-06 19:02 UTC)

Thread overview: 19+ messages
2021-08-04 19:15 [PATCH 00/11] ARC atomics update Vineet Gupta
2021-08-04 19:15 ` [PATCH 01/11] ARC: atomics: disintegrate header Vineet Gupta
2021-08-04 19:15 ` [PATCH 02/11] ARC: atomic: !LLSC: remove hack in atomic_set() for for UP Vineet Gupta
2021-08-04 19:15 ` [PATCH 03/11] ARC: atomic: !LLSC: use int data type consistently Vineet Gupta
2021-08-04 19:15 ` [PATCH 04/11] ARC: atomic64: LLSC: elide unused atomic_{and,or,xor,andnot}_return Vineet Gupta
2021-08-04 19:15 ` [PATCH 05/11] ARC: atomics: implement relaxed variants Vineet Gupta
2021-08-04 19:15 ` [PATCH 06/11] ARC: switch to generic bitops Vineet Gupta
2021-08-04 19:15 ` [PATCH 07/11] ARC: bitops: fls/ffs to take int (vs long) per asm-generic defines Vineet Gupta
2021-08-04 19:15 ` [PATCH 08/11] ARC: xchg: !LLSC: remove UP micro-optimization/hack Vineet Gupta
2021-08-04 19:15 ` [PATCH 09/11] ARC: cmpxchg/xchg: rewrite as macros to make type safe Vineet Gupta
2021-08-04 19:15 ` [PATCH 10/11] ARC: cmpxchg/xchg: implement relaxed variants (LLSC config only) Vineet Gupta
2021-08-04 19:15 ` [PATCH 11/11] ARC: atomic_cmpxchg/atomic_xchg: implement relaxed variants Vineet Gupta
2021-08-05  9:02 ` [PATCH 00/11] ARC atomics update Peter Zijlstra
2021-08-05 16:18   ` Vineet Gupta
2021-08-05 17:04     ` Peter Zijlstra
2021-08-05 19:14       ` [RFC] bitops/non-atomic: make @nr unsigned to avoid any DIV Vineet Gupta
2021-08-06 13:42         ` Will Deacon
2021-08-06 19:02           ` Vineet Gupta
2021-08-06  8:41       ` [PATCH 00/11] ARC atomics update Will Deacon
