* [PATCH 00/37] MIPS: barriers & atomics cleanups
@ 2019-09-30 23:08 Paul Burton
  2019-09-30 23:08 ` [PATCH 01/37] MIPS: Unify sc beqz definition Paul Burton
                   ` (36 more replies)
  0 siblings, 37 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

This series consists of a bunch of cleanups to the way we handle memory
barriers (though no changes to the sync instructions we use to implement
them) & atomic memory accesses. One major goal was to ensure the
Loongson3 LL/SC errata workarounds are applied in a safe manner from
within inline-asm & that we can automatically verify the resulting
kernel binary looks reasonable. Many patches are cleanups found along
the way.

Applies atop v5.4-rc1.

Paul Burton (37):
  MIPS: Unify sc beqz definition
  MIPS: Use compact branch for LL/SC loops on MIPSr6+
  MIPS: barrier: Add __SYNC() infrastructure
  MIPS: barrier: Clean up rmb() & wmb() definitions
  MIPS: barrier: Clean up __smp_mb() definition
  MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
  MIPS: barrier: Clean up __sync() definition
  MIPS: barrier: Clean up sync_ginv()
  MIPS: atomic: Fix whitespace in ATOMIC_OP macros
  MIPS: atomic: Handle !kernel_uses_llsc first
  MIPS: atomic: Use one macro to generate 32b & 64b functions
  MIPS: atomic: Emit Loongson3 sync workarounds within asm
  MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
  MIPS: atomic: Unify 32b & 64b sub_if_positive
  MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
  MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz
  MIPS: bitops: Handle !kernel_uses_llsc first
  MIPS: bitops: Only use ins for bit 16 or higher
  MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
  MIPS: bitops: ins start position is always an immediate
  MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
  MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
  MIPS: bitops: Use the BIT() macro
  MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
  MIPS: bitops: Abstract LL/SC loops
  MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
  MIPS: bitops: Emit Loongson3 sync workarounds within asm
  MIPS: bitops: Use smp_mb__before_atomic in test_* ops
  MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
  MIPS: cmpxchg: Omit redundant barriers for Loongson3
  MIPS: futex: Emit Loongson3 sync workarounds within asm
  MIPS: syscall: Emit Loongson3 sync workarounds within asm
  MIPS: barrier: Remove loongson_llsc_mb()
  MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
  MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
  MIPS: genex: Don't reload address unnecessarily
  MIPS: Check Loongson3 LL/SC errata workaround correctness

 arch/mips/Makefile                            |   2 +-
 arch/mips/Makefile.postlink                   |  10 +-
 arch/mips/include/asm/atomic.h                | 571 ++++++-----------
 arch/mips/include/asm/barrier.h               | 215 +------
 arch/mips/include/asm/bitops.h                | 593 ++++--------------
 arch/mips/include/asm/cmpxchg.h               |  59 +-
 arch/mips/include/asm/cpu-features.h          |  10 -
 arch/mips/include/asm/futex.h                 |   9 +-
 arch/mips/include/asm/llsc.h                  |  19 +-
 .../asm/mach-malta/cpu-feature-overrides.h    |   2 -
 arch/mips/include/asm/sync.h                  | 207 ++++++
 arch/mips/kernel/genex.S                      |   6 +-
 arch/mips/kernel/pm-cps.c                     |  20 +-
 arch/mips/kernel/syscall.c                    |   3 +-
 arch/mips/lib/bitops.c                        |  57 +-
 arch/mips/loongson64/Platform                 |   2 +-
 arch/mips/tools/.gitignore                    |   1 +
 arch/mips/tools/Makefile                      |   5 +
 arch/mips/tools/loongson3-llsc-check.c        | 307 +++++++++
 19 files changed, 975 insertions(+), 1123 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

-- 
2.23.0



* [PATCH 01/37] MIPS: Unify sc beqz definition
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 02/37] MIPS: Use compact branch for LL/SC loops on MIPSr6+ Paul Burton
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

We currently duplicate the definition of __scbeqz in asm/atomic.h &
asm/cmpxchg.h. Move it to asm/llsc.h & rename it to __SC_BEQZ to fit
better with the existing __SC macro provided there.

We include a trailing tab in the string so that users don't need to add
whitespace of their own after the instruction mnemonic.
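
For illustration (a sketch based on the atomic_add case touched below,
not new code added by this patch), an LL/SC loop using the renamed
macro then reads:

  "1:	ll	%0, %1		# atomic_add	\n"
  "	addu	%0, %2				\n"
  "	sc	%0, %1				\n"
  "\t" __SC_BEQZ "%0, 1b				\n"

The trailing tab embedded in __SC_BEQZ keeps the branch operands
aligned with the surrounding instructions without the caller adding any
whitespace between the macro & the following string literal.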

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h  | 28 +++++++++-------------------
 arch/mips/include/asm/cmpxchg.h | 20 ++++----------------
 arch/mips/include/asm/llsc.h    | 11 +++++++++++
 3 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index bb8658cc7f12..7578c807ef98 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -20,19 +20,9 @@
 #include <asm/compiler.h>
 #include <asm/cpu-features.h>
 #include <asm/cmpxchg.h>
+#include <asm/llsc.h>
 #include <asm/war.h>
 
-/*
- * Using a branch-likely instruction to check the result of an sc instruction
- * works around a bug present in R10000 CPUs prior to revision 3.0 that could
- * cause ll-sc sequences to execute non-atomically.
- */
-#if R10000_LLSC_WAR
-# define __scbeqz "beqzl"
-#else
-# define __scbeqz "beqz"
-#endif
-
 #define ATOMIC_INIT(i)	  { (i) }
 
 /*
@@ -65,7 +55,7 @@ static __inline__ void atomic_##op(int i, atomic_t * v)			      \
 		"1:	ll	%0, %1		# atomic_" #op "	\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	sc	%0, %1					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i) : __LLSC_CLOBBER);				      \
@@ -93,7 +83,7 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	      \
 		"1:	ll	%1, %2		# atomic_" #op "_return	\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
@@ -127,7 +117,7 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	      \
 		"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		"	move	%0, %1					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
@@ -205,7 +195,7 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"	sc	%1, %2					\n"
-		"\t" __scbeqz "	%1, 1b					\n"
+		"\t" __SC_BEQZ "%1, 1b					\n"
 		"2:							\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
@@ -267,7 +257,7 @@ static __inline__ void atomic64_##op(s64 i, atomic64_t * v)		      \
 		"1:	lld	%0, %1		# atomic64_" #op "	\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	scd	%0, %1					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i) : __LLSC_CLOBBER);				      \
@@ -295,7 +285,7 @@ static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v)   \
 		"1:	lld	%1, %2		# atomic64_" #op "_return\n"  \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
@@ -329,7 +319,7 @@ static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v)    \
 		"1:	lld	%1, %2		# atomic64_fetch_" #op "\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
-		"\t" __scbeqz "	%0, 1b					\n"   \
+		"\t" __SC_BEQZ "%0, 1b					\n"   \
 		"	move	%0, %1					\n"   \
 		"	.set	pop					\n"   \
 		: "=&r" (result), "=&r" (temp),				      \
@@ -404,7 +394,7 @@ static __inline__ s64 atomic64_sub_if_positive(s64 i, atomic64_t * v)
 		"	move	%1, %0					\n"
 		"	bltz	%0, 1f					\n"
 		"	scd	%1, %2					\n"
-		"\t" __scbeqz "	%1, 1b					\n"
+		"\t" __SC_BEQZ "%1, 1b					\n"
 		"1:							\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 79bf34efbc04..5d3f0e3513b4 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -11,19 +11,9 @@
 #include <linux/bug.h>
 #include <linux/irqflags.h>
 #include <asm/compiler.h>
+#include <asm/llsc.h>
 #include <asm/war.h>
 
-/*
- * Using a branch-likely instruction to check the result of an sc instruction
- * works around a bug present in R10000 CPUs prior to revision 3.0 that could
- * cause ll-sc sequences to execute non-atomically.
- */
-#if R10000_LLSC_WAR
-# define __scbeqz "beqzl"
-#else
-# define __scbeqz "beqz"
-#endif
-
 /*
  * These functions doesn't exist, so if they are called you'll either:
  *
@@ -57,7 +47,7 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 		"	move	$1, %z3				\n"	\
 		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
 		"	" st "	$1, %1				\n"	\
-		"\t" __scbeqz "	$1, 1b				\n"	\
+		"\t" __SC_BEQZ	"$1, 1b				\n"	\
 		"	.set	pop				\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)			\
@@ -130,7 +120,7 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 		"	move	$1, %z4				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
 		"	" st "	$1, %1				\n"	\
-		"\t" __scbeqz "	$1, 1b				\n"	\
+		"\t" __SC_BEQZ	"$1, 1b				\n"	\
 		"	.set	pop				\n"	\
 		"2:						\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
@@ -268,7 +258,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
 	/* Attempt to store new at ptr */
 	"	scd	%L1, %2				\n"
 	/* If we failed, loop! */
-	"\t" __scbeqz "	%L1, 1b				\n"
+	"\t" __SC_BEQZ "%L1, 1b				\n"
 	"	.set	pop				\n"
 	"2:						\n"
 	: "=&r"(ret),
@@ -311,6 +301,4 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
 # endif /* !CONFIG_SMP */
 #endif /* !CONFIG_64BIT */
 
-#undef __scbeqz
-
 #endif /* __ASM_CMPXCHG_H */
diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index c6d17d171147..9b19f38562ac 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -25,4 +25,15 @@
 #define __EXT		"dext	"
 #endif
 
+/*
+ * Using a branch-likely instruction to check the result of an sc instruction
+ * works around a bug present in R10000 CPUs prior to revision 3.0 that could
+ * cause ll-sc sequences to execute non-atomically.
+ */
+#if R10000_LLSC_WAR
+# define __SC_BEQZ "beqzl	"
+#else
+# define __SC_BEQZ "beqz	"
+#endif
+
 #endif /* __ASM_LLSC_H  */
-- 
2.23.0



* [PATCH 03/37] MIPS: barrier: Add __SYNC() infrastructure
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
  2019-09-30 23:08 ` [PATCH 01/37] MIPS: Unify sc beqz definition Paul Burton
  2019-09-30 23:08 ` [PATCH 02/37] MIPS: Use compact branch for LL/SC loops on MIPSr6+ Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 04/37] MIPS: barrier: Clean up rmb() & wmb() definitions Paul Burton
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Introduce an asm/sync.h header which provides infrastructure that can be
used to generate sync instructions of various types, and for various
reasons. For example if we need a sync instruction that provides a full
completion barrier but only on systems which have weak memory ordering,
we can generate the appropriate assembly code using:

  __SYNC(full, weak_ordering)

When the kernel is configured to run on systems with weak memory
ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync
instruction. When the kernel is configured to run on systems with strong
memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit
nothing. The caller doesn't need to know which happened - it simply says
what it needs & when, with no concern for checking the kernel
configuration.

There are some scenarios in which we may want to emit code only when we
*didn't* emit a sync instruction. For example, some Loongson3 CPUs
suffer from a bug that requires us to emit a sync instruction prior to
each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In
cases where this bug workaround is enabled, it's wasteful to then have
more generic code emit another sync instruction to provide barriers we
need in general. A __SYNC_ELSE() macro allows for this, providing an
extra argument that contains code to be assembled only in cases where
the sync instruction was not emitted. For example if we have a scenario
in which we generally want to emit a release barrier but for affected
Loongson3 configurations upgrade that to a full completion barrier, we
can do that like so:

  __SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))

The assembly generated by these macros can be used either as inline
assembly or in assembly source files.
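
For instance (an illustrative use mirroring the __sync() cleanup later
in this series, rather than code added by this patch), a C caller can
drop the macro straight into an asm statement:

  asm volatile(__SYNC(full, always) ::: "memory");

whilst assembly source can use the same __SYNC() invocation directly as
an instruction, since the macro only stringifies its expansion in the
!__ASSEMBLY__ case.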

Differing types of sync as provided by MIPSr6 are defined, but currently
they all generate a full completion barrier except in kernels configured
for Cavium Octeon systems. There the wmb sync-type is used, and rmb
syncs are omitted, as has been the case since commit 6b07d38aaa52
("MIPS: Octeon: Use optimized memory barrier primitives."). Using
__SYNC() with the wmb or rmb types will abstract away the Octeon
specific behavior and allow us to later clean up asm/barrier.h code that
currently includes a plethora of #ifdef's.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 113 +----------------
 arch/mips/include/asm/sync.h    | 207 ++++++++++++++++++++++++++++++++
 arch/mips/kernel/pm-cps.c       |  20 +--
 3 files changed, 219 insertions(+), 121 deletions(-)
 create mode 100644 arch/mips/include/asm/sync.h

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 9228f7386220..5ad39bfd3b6d 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -9,116 +9,7 @@
 #define __ASM_BARRIER_H
 
 #include <asm/addrspace.h>
-
-/*
- * Sync types defined by the MIPS architecture (document MD00087 table 6.5)
- * These values are used with the sync instruction to perform memory barriers.
- * Types of ordering guarantees available through the SYNC instruction:
- * - Completion Barriers
- * - Ordering Barriers
- * As compared to the completion barrier, the ordering barrier is a
- * lighter-weight operation as it does not require the specified instructions
- * before the SYNC to be already completed. Instead it only requires that those
- * specified instructions which are subsequent to the SYNC in the instruction
- * stream are never re-ordered for processing ahead of the specified
- * instructions which are before the SYNC in the instruction stream.
- * This potentially reduces how many cycles the barrier instruction must stall
- * before it completes.
- * Implementations that do not use any of the non-zero values of stype to define
- * different barriers, such as ordering barriers, must make those stype values
- * act the same as stype zero.
- */
-
-/*
- * Completion barriers:
- * - Every synchronizable specified memory instruction (loads or stores or both)
- *   that occurs in the instruction stream before the SYNC instruction must be
- *   already globally performed before any synchronizable specified memory
- *   instructions that occur after the SYNC are allowed to be performed, with
- *   respect to any other processor or coherent I/O module.
- *
- * - The barrier does not guarantee the order in which instruction fetches are
- *   performed.
- *
- * - A stype value of zero will always be defined such that it performs the most
- *   complete set of synchronization operations that are defined.This means
- *   stype zero always does a completion barrier that affects both loads and
- *   stores preceding the SYNC instruction and both loads and stores that are
- *   subsequent to the SYNC instruction. Non-zero values of stype may be defined
- *   by the architecture or specific implementations to perform synchronization
- *   behaviors that are less complete than that of stype zero. If an
- *   implementation does not use one of these non-zero values to define a
- *   different synchronization behavior, then that non-zero value of stype must
- *   act the same as stype zero completion barrier. This allows software written
- *   for an implementation with a lighter-weight barrier to work on another
- *   implementation which only implements the stype zero completion barrier.
- *
- * - A completion barrier is required, potentially in conjunction with SSNOP (in
- *   Release 1 of the Architecture) or EHB (in Release 2 of the Architecture),
- *   to guarantee that memory reference results are visible across operating
- *   mode changes. For example, a completion barrier is required on some
- *   implementations on entry to and exit from Debug Mode to guarantee that
- *   memory effects are handled correctly.
- */
-
-/*
- * stype 0 - A completion barrier that affects preceding loads and stores and
- * subsequent loads and stores.
- * Older instructions which must reach the load/store ordering point before the
- * SYNC instruction completes: Loads, Stores
- * Younger instructions which must reach the load/store ordering point only
- * after the SYNC instruction completes: Loads, Stores
- * Older instructions which must be globally performed when the SYNC instruction
- * completes: Loads, Stores
- */
-#define STYPE_SYNC 0x0
-
-/*
- * Ordering barriers:
- * - Every synchronizable specified memory instruction (loads or stores or both)
- *   that occurs in the instruction stream before the SYNC instruction must
- *   reach a stage in the load/store datapath after which no instruction
- *   re-ordering is possible before any synchronizable specified memory
- *   instruction which occurs after the SYNC instruction in the instruction
- *   stream reaches the same stage in the load/store datapath.
- *
- * - If any memory instruction before the SYNC instruction in program order,
- *   generates a memory request to the external memory and any memory
- *   instruction after the SYNC instruction in program order also generates a
- *   memory request to external memory, the memory request belonging to the
- *   older instruction must be globally performed before the time the memory
- *   request belonging to the younger instruction is globally performed.
- *
- * - The barrier does not guarantee the order in which instruction fetches are
- *   performed.
- */
-
-/*
- * stype 0x10 - An ordering barrier that affects preceding loads and stores and
- * subsequent loads and stores.
- * Older instructions which must reach the load/store ordering point before the
- * SYNC instruction completes: Loads, Stores
- * Younger instructions which must reach the load/store ordering point only
- * after the SYNC instruction completes: Loads, Stores
- * Older instructions which must be globally performed when the SYNC instruction
- * completes: N/A
- */
-#define STYPE_SYNC_MB 0x10
-
-/*
- * stype 0x14 - A completion barrier specific to global invalidations
- *
- * When a sync instruction of this type completes any preceding GINVI or GINVT
- * operation has been globalized & completed on all coherent CPUs. Anything
- * that the GINV* instruction should invalidate will have been invalidated on
- * all coherent CPUs when this instruction completes. It is implementation
- * specific whether the GINV* instructions themselves will ensure completion,
- * or this sync type will.
- *
- * In systems implementing global invalidates (ie. with Config5.GI == 2 or 3)
- * this sync type also requires that previous SYNCI operations have completed.
- */
-#define STYPE_GINV	0x14
+#include <asm/sync.h>
 
 #ifdef CONFIG_CPU_HAS_SYNC
 #define __sync()				\
@@ -286,7 +177,7 @@
 
 static inline void sync_ginv(void)
 {
-	asm volatile("sync\t%0" :: "i"(STYPE_GINV));
+	asm volatile("sync\t%0" :: "i"(__SYNC_ginv));
 }
 
 #include <asm-generic/barrier.h>
diff --git a/arch/mips/include/asm/sync.h b/arch/mips/include/asm/sync.h
new file mode 100644
index 000000000000..7c6a1095f556
--- /dev/null
+++ b/arch/mips/include/asm/sync.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __MIPS_ASM_SYNC_H__
+#define __MIPS_ASM_SYNC_H__
+
+/*
+ * sync types are defined by the MIPS64 Instruction Set documentation in Volume
+ * II-A of the MIPS Architecture Reference Manual, which can be found here:
+ *
+ *   https://www.mips.com/?do-download=the-mips64-instruction-set-v6-06
+ *
+ * Two types of barrier are provided:
+ *
+ *   1) Completion barriers, which ensure that a memory operation has actually
+ *      completed & often involve stalling the CPU pipeline to do so.
+ *
+ *   2) Ordering barriers, which only ensure that affected memory operations
+ *      won't be reordered in the CPU pipeline in a manner that violates the
+ *      restrictions imposed by the barrier.
+ *
+ * Ordering barriers can be more efficient than completion barriers, since:
+ *
+ *   a) Ordering barriers only require memory access instructions which precede
+ *      them in program order (older instructions) to reach a point in the
+ *      load/store datapath beyond which reordering is not possible before
+ *      allowing memory access instructions which follow them (younger
+ *      instructions) to be performed.  That is, older instructions don't
+ *      actually need to complete - they just need to get far enough that all
+ *      other coherent CPUs will observe their completion before they observe
+ *      the effects of younger instructions.
+ *
+ *   b) Multiple variants of ordering barrier are provided which allow the
+ *      effects to be restricted to different combinations of older or younger
+ *      loads or stores. By way of example, if we only care that stores older
+ *      than a barrier are observed prior to stores that are younger than a
+ *      barrier & don't care about the ordering of loads then the 'wmb'
+ *      ordering barrier can be used. Limiting the barrier's effects to stores
+ *      allows loads to continue unaffected & potentially allows the CPU to
+ *      make progress faster than if younger loads had to wait for older stores
+ *      to complete.
+ */
+
+/*
+ * No sync instruction at all; used to allow code to nullify the effect of the
+ * __SYNC() macro without needing lots of #ifdefery.
+ */
+#define __SYNC_none	-1
+
+/*
+ * A full completion barrier; all memory accesses appearing prior to this sync
+ * instruction in program order must complete before any memory accesses
+ * appearing after this sync instruction in program order.
+ */
+#define __SYNC_full	0x00
+
+/*
+ * For now we use a full completion barrier to implement all sync types, until
+ * we're satisfied that lightweight ordering barriers defined by MIPSr6 are
+ * sufficient to uphold our desired memory model.
+ */
+#define __SYNC_aq	__SYNC_full
+#define __SYNC_rl	__SYNC_full
+#define __SYNC_mb	__SYNC_full
+
+/*
+ * ...except on Cavium Octeon CPUs, which have been using the 'wmb' ordering
+ * barrier since 2010 & omit 'rmb' barriers because the CPUs don't perform
+ * speculative reads.
+ */
+#ifdef CONFIG_CPU_CAVIUM_OCTEON
+# define __SYNC_rmb	__SYNC_none
+# define __SYNC_wmb	0x04
+#else
+# define __SYNC_rmb	__SYNC_full
+# define __SYNC_wmb	__SYNC_full
+#endif
+
+/*
+ * A GINV sync is a little different; it doesn't relate directly to loads or
+ * stores, but instead causes synchronization of an icache or TLB global
+ * invalidation operation triggered by the ginvi or ginvt instructions
+ * respectively. In cases where we need to know that a ginvi or ginvt operation
+ * has been performed by all coherent CPUs, we must issue a sync instruction of
+ * this type. Once this instruction graduates all coherent CPUs will have
+ * observed the invalidation.
+ */
+#define __SYNC_ginv	0x14
+
+/* Trivial; indicate that we always need this sync instruction. */
+#define __SYNC_always	(1 << 0)
+
+/*
+ * Indicate that we need this sync instruction only on systems with weakly
+ * ordered memory access. In general this is most MIPS systems, but there are
+ * exceptions which provide strongly ordered memory.
+ */
+#ifdef CONFIG_WEAK_ORDERING
+# define __SYNC_weak_ordering	(1 << 1)
+#else
+# define __SYNC_weak_ordering	0
+#endif
+
+/*
+ * Indicate that we need this sync instruction only on systems where LL/SC
+ * don't implicitly provide a memory barrier. In general this is most MIPS
+ * systems.
+ */
+#ifdef CONFIG_WEAK_REORDERING_BEYOND_LLSC
+# define __SYNC_weak_llsc	(1 << 2)
+#else
+# define __SYNC_weak_llsc	0
+#endif
+
+/*
+ * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
+ * store or prefetch) in between an LL & SC can cause the SC instruction to
+ * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
+ * containing such sequences, this bug bites harder than we might otherwise
+ * expect due to reordering & speculation:
+ *
+ * 1) A memory access appearing prior to the LL in program order may actually
+ *    be executed after the LL - this is the reordering case.
+ *
+ *    In order to avoid this we need to place a memory barrier (ie. a SYNC
+ *    instruction) prior to every LL instruction, in between it and any earlier
+ *    memory access instructions.
+ *
+ *    This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
+ *
+ * 2) If a conditional branch exists between an LL & SC with a target outside
+ *    of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
+ *    or similar, then misprediction of the branch may allow speculative
+ *    execution of memory accesses from outside of the LL-SC loop.
+ *
+ *    In order to avoid this we need a memory barrier (ie. a SYNC instruction)
+ *    at each affected branch target.
+ *
+ *    This case affects all current Loongson 3 CPUs.
+ *
+ * The above described cases cause an error in the cache coherence protocol;
+ * such that the Invalidate of a competing LL-SC goes 'missing' and SC
+ * erroneously observes its core still has Exclusive state and lets the SC
+ * proceed.
+ *
+ * Therefore the error only occurs on SMP systems.
+ */
+#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
+# define __SYNC_loongson3_war	(1 << 31)
+#else
+# define __SYNC_loongson3_war	0
+#endif
+
+/*
+ * Some Cavium Octeon CPUs suffer from a bug that causes a single wmb ordering
+ * barrier to be ineffective, requiring the use of 2 in sequence to provide an
+ * effective barrier as noted by commit 6b07d38aaa52 ("MIPS: Octeon: Use
+ * optimized memory barrier primitives."). Here we specify that the affected
+ * sync instructions should be emitted twice.
+ */
+#ifdef CONFIG_CPU_CAVIUM_OCTEON
+# define __SYNC_rpt(type)	(1 + (type == __SYNC_wmb))
+#else
+# define __SYNC_rpt(type)	1
+#endif
+
+/*
+ * The main event. Here we actually emit a sync instruction of a given type, if
+ * reason is non-zero.
+ *
+ * In future we have the option of emitting entries in a fixups-style table
+ * here that would allow us to opportunistically remove some sync instructions
+ * when we detect at runtime that we're running on a CPU that doesn't need
+ * them.
+ */
+#ifdef CONFIG_CPU_HAS_SYNC
+# define ____SYNC(_type, _reason, _else)			\
+	.if	(( _type ) != -1) && ( _reason );		\
+	.set	push;						\
+	.set	MIPS_ISA_LEVEL_RAW;				\
+	.rept	__SYNC_rpt(_type);				\
+	sync	_type;						\
+	.endr;							\
+	.set	pop;						\
+	.else;							\
+	_else;							\
+	.endif
+#else
+# define ____SYNC(_type, _reason, _else)
+#endif
+
+/*
+ * Preprocessor magic to expand macros used as arguments before we insert them
+ * into assembly code.
+ */
+#ifdef __ASSEMBLY__
+# define ___SYNC(type, reason, else)				\
+	____SYNC(type, reason, else)
+#else
+# define ___SYNC(type, reason, else)				\
+	__stringify(____SYNC(type, reason, else))
+#endif
+
+#define __SYNC(type, reason)					\
+	___SYNC(__SYNC_##type, __SYNC_##reason, )
+#define __SYNC_ELSE(type, reason, else)				\
+	___SYNC(__SYNC_##type, __SYNC_##reason, else)
+
+#endif /* __MIPS_ASM_SYNC_H__ */
diff --git a/arch/mips/kernel/pm-cps.c b/arch/mips/kernel/pm-cps.c
index a26f40db15d0..9bf60d7d44d3 100644
--- a/arch/mips/kernel/pm-cps.c
+++ b/arch/mips/kernel/pm-cps.c
@@ -307,7 +307,7 @@ static int cps_gen_flush_fsb(u32 **pp, struct uasm_label **pl,
 	}
 
 	/* Barrier ensuring previous cache invalidates are complete */
-	uasm_i_sync(pp, STYPE_SYNC);
+	uasm_i_sync(pp, __SYNC_full);
 	uasm_i_ehb(pp);
 
 	/* Check whether the pipeline stalled due to the FSB being full */
@@ -397,7 +397,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 
 	if (coupled_coherence) {
 		/* Increment ready_count */
-		uasm_i_sync(&p, STYPE_SYNC_MB);
+		uasm_i_sync(&p, __SYNC_mb);
 		uasm_build_label(&l, p, lbl_incready);
 		uasm_i_ll(&p, t1, 0, r_nc_count);
 		uasm_i_addiu(&p, t2, t1, 1);
@@ -406,7 +406,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 		uasm_i_addiu(&p, t1, t1, 1);
 
 		/* Barrier ensuring all CPUs see the updated r_nc_count value */
-		uasm_i_sync(&p, STYPE_SYNC_MB);
+		uasm_i_sync(&p, __SYNC_mb);
 
 		/*
 		 * If this is the last VPE to become ready for non-coherence
@@ -473,7 +473,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 			      Index_Writeback_Inv_D, lbl_flushdcache);
 
 	/* Barrier ensuring previous cache invalidates are complete */
-	uasm_i_sync(&p, STYPE_SYNC);
+	uasm_i_sync(&p, __SYNC_full);
 	uasm_i_ehb(&p);
 
 	if (mips_cm_revision() < CM_REV_CM3) {
@@ -487,7 +487,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 		uasm_i_lw(&p, t0, 0, r_pcohctl);
 
 		/* Barrier to ensure write to coherence control is complete */
-		uasm_i_sync(&p, STYPE_SYNC);
+		uasm_i_sync(&p, __SYNC_full);
 		uasm_i_ehb(&p);
 	}
 
@@ -534,7 +534,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 		}
 
 		/* Barrier to ensure write to CPC command is complete */
-		uasm_i_sync(&p, STYPE_SYNC);
+		uasm_i_sync(&p, __SYNC_full);
 		uasm_i_ehb(&p);
 	}
 
@@ -572,13 +572,13 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 	uasm_i_lw(&p, t0, 0, r_pcohctl);
 
 	/* Barrier to ensure write to coherence control is complete */
-	uasm_i_sync(&p, STYPE_SYNC);
+	uasm_i_sync(&p, __SYNC_full);
 	uasm_i_ehb(&p);
 
 	if (coupled_coherence && (state == CPS_PM_NC_WAIT)) {
 		/* Decrement ready_count */
 		uasm_build_label(&l, p, lbl_decready);
-		uasm_i_sync(&p, STYPE_SYNC_MB);
+		uasm_i_sync(&p, __SYNC_mb);
 		uasm_i_ll(&p, t1, 0, r_nc_count);
 		uasm_i_addiu(&p, t2, t1, -1);
 		uasm_i_sc(&p, t2, 0, r_nc_count);
@@ -586,7 +586,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 		uasm_i_andi(&p, v0, t1, (1 << fls(smp_num_siblings)) - 1);
 
 		/* Barrier ensuring all CPUs see the updated r_nc_count value */
-		uasm_i_sync(&p, STYPE_SYNC_MB);
+		uasm_i_sync(&p, __SYNC_mb);
 	}
 
 	if (coupled_coherence && (state == CPS_PM_CLOCK_GATED)) {
@@ -608,7 +608,7 @@ static void *cps_gen_entry_code(unsigned cpu, enum cps_pm_state state)
 		uasm_build_label(&l, p, lbl_secondary_cont);
 
 		/* Barrier ensuring all CPUs see the updated r_nc_count value */
-		uasm_i_sync(&p, STYPE_SYNC_MB);
+		uasm_i_sync(&p, __SYNC_mb);
 	}
 
 	/* The core is coherent, time to return to C code */
-- 
2.23.0



* [PATCH 02/37] MIPS: Use compact branch for LL/SC loops on MIPSr6+
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
  2019-09-30 23:08 ` [PATCH 01/37] MIPS: Unify sc beqz definition Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 03/37] MIPS: barrier: Add __SYNC() infrastructure Paul Burton
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

When targeting MIPSr6 or higher, make use of a compact branch in LL/SC
loops, preventing the insertion of a delay-slot nop that only serves to
waste space.
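
To illustrate the saving (a hand-written sketch rather than output from
this patch), a pre-MIPSr6 LL/SC loop assembles along these lines:

  1:	ll	$t0, 0($a0)
  	addu	$t0, $t0, $a1
  	sc	$t0, 0($a0)
  	beqz	$t0, 1b
  	 nop			# branch delay slot

With beqzc on MIPSr6+ the branch is compact & has no delay slot, so the
trailing nop disappears.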

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/llsc.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index 9b19f38562ac..d240a4a2d1c4 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_LLSC_H
 #define __ASM_LLSC_H
 
+#include <asm/isa-rev.h>
+
 #if _MIPS_SZLONG == 32
 #define SZLONG_LOG 5
 #define SZLONG_MASK 31UL
@@ -32,6 +34,8 @@
  */
 #if R10000_LLSC_WAR
 # define __SC_BEQZ "beqzl	"
+#elif MIPS_ISA_REV >= 6
+# define __SC_BEQZ "beqzc	"
 #else
 # define __SC_BEQZ "beqz	"
 #endif
-- 
2.23.0



* [PATCH 04/37] MIPS: barrier: Clean up rmb() & wmb() definitions
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (2 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 03/37] MIPS: barrier: Add __SYNC() infrastructure Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 05/37] MIPS: barrier: Clean up __smp_mb() definition Paul Burton
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Simplify our definitions of rmb() & wmb() using the new __SYNC()
infrastructure.

The fast_rmb() & fast_wmb() macros are removed, since they only provided
a level of indirection that made the code less readable & weren't
directly used anywhere in the kernel tree.

The Octeon #ifdef'ery is removed, since the "syncw" instruction
previously used is merely an alias for "sync 4" which __SYNC() will emit
for the wmb sync type when the kernel is configured for an Octeon CPU.
Similarly __SYNC() will emit nothing for the rmb sync type in Octeon
configurations.
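
As a rough sketch of the resulting expansions (approximate, not literal
preprocessor output):

  /* CONFIG_CPU_CAVIUM_OCTEON=y */
  wmb();	/* ~ asm volatile("sync 0x4\n\tsync 0x4" ::: "memory") */
  rmb();	/* ~ asm volatile("" ::: "memory"), ie. just a compiler barrier */

  /* typical non-Octeon configurations */
  wmb();	/* ~ asm volatile("sync" ::: "memory") */
  rmb();	/* ~ asm volatile("sync" ::: "memory") */

which matches the behaviour of the removed fast_wmb() & fast_rmb()
definitions.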

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 5ad39bfd3b6d..f36cab87cfde 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -26,6 +26,18 @@
 #define __sync()	do { } while(0)
 #endif
 
+static inline void rmb(void)
+{
+	asm volatile(__SYNC(rmb, always) ::: "memory");
+}
+#define rmb rmb
+
+static inline void wmb(void)
+{
+	asm volatile(__SYNC(wmb, always) ::: "memory");
+}
+#define wmb wmb
+
 #define __fast_iob()				\
 	__asm__ __volatile__(			\
 		".set	push\n\t"		\
@@ -37,16 +49,9 @@
 		: "m" (*(int *)CKSEG1)		\
 		: "memory")
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define OCTEON_SYNCW_STR	".set push\n.set arch=octeon\nsyncw\nsyncw\n.set pop\n"
-# define __syncw()	__asm__ __volatile__(OCTEON_SYNCW_STR : : : "memory")
-
-# define fast_wmb()	__syncw()
-# define fast_rmb()	barrier()
 # define fast_mb()	__sync()
 # define fast_iob()	do { } while (0)
 #else /* ! CONFIG_CPU_CAVIUM_OCTEON */
-# define fast_wmb()	__sync()
-# define fast_rmb()	__sync()
 # define fast_mb()	__sync()
 # ifdef CONFIG_SGI_IP28
 #  define fast_iob()				\
@@ -83,19 +88,14 @@
 
 #endif /* !CONFIG_CPU_HAS_WB */
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
-
 #if defined(CONFIG_WEAK_ORDERING)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
 #  define __smp_mb()	__sync()
-#  define __smp_rmb()	barrier()
-#  define __smp_wmb()	__syncw()
 # else
 #  define __smp_mb()	__asm__ __volatile__("sync" : : :"memory")
-#  define __smp_rmb()	__asm__ __volatile__("sync" : : :"memory")
-#  define __smp_wmb()	__asm__ __volatile__("sync" : : :"memory")
 # endif
+# define __smp_rmb()	rmb()
+# define __smp_wmb()	wmb()
 #else
 #define __smp_mb()	barrier()
 #define __smp_rmb()	barrier()
-- 
2.23.0



* [PATCH 05/37] MIPS: barrier: Clean up __smp_mb() definition
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (3 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 04/37] MIPS: barrier: Clean up rmb() & wmb() definitions Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 06/37] MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery Paul Burton
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

We #ifdef on Cavium Octeon CPUs, but emit the same sync instruction in
both cases. Remove the #ifdef & simply expand to the __sync() macro.

Whilst here indent the strong ordering case definitions to match the
indentation of the weak ordering ones, helping readability.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index f36cab87cfde..8a5abc1c85a6 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -89,17 +89,13 @@ static inline void wmb(void)
 #endif /* !CONFIG_CPU_HAS_WB */
 
 #if defined(CONFIG_WEAK_ORDERING)
-# ifdef CONFIG_CPU_CAVIUM_OCTEON
-#  define __smp_mb()	__sync()
-# else
-#  define __smp_mb()	__asm__ __volatile__("sync" : : :"memory")
-# endif
+# define __smp_mb()	__sync()
 # define __smp_rmb()	rmb()
 # define __smp_wmb()	wmb()
 #else
-#define __smp_mb()	barrier()
-#define __smp_rmb()	barrier()
-#define __smp_wmb()	barrier()
+# define __smp_mb()	barrier()
+# define __smp_rmb()	barrier()
+# define __smp_wmb()	barrier()
 #endif
 
 /*
-- 
2.23.0



* [PATCH 06/37] MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (4 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 05/37] MIPS: barrier: Clean up __smp_mb() definition Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 07/37] MIPS: barrier: Clean up __sync() definition Paul Burton
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The definition of fast_mb() is the same in both the Octeon & non-Octeon
cases, so remove the duplication & define it only once.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 8a5abc1c85a6..657ec01120a4 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -38,6 +38,8 @@ static inline void wmb(void)
 }
 #define wmb wmb
 
+#define fast_mb()	__sync()
+
 #define __fast_iob()				\
 	__asm__ __volatile__(			\
 		".set	push\n\t"		\
@@ -49,10 +51,8 @@ static inline void wmb(void)
 		: "m" (*(int *)CKSEG1)		\
 		: "memory")
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
-# define fast_mb()	__sync()
 # define fast_iob()	do { } while (0)
 #else /* ! CONFIG_CPU_CAVIUM_OCTEON */
-# define fast_mb()	__sync()
 # ifdef CONFIG_SGI_IP28
 #  define fast_iob()				\
 	__asm__ __volatile__(			\
-- 
2.23.0



* [PATCH 07/37] MIPS: barrier: Clean up __sync() definition
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (5 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 06/37] MIPS: barrier: Remove fast_mb() Octeon #ifdef'ery Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 08/37] MIPS: barrier: Clean up sync_ginv() Paul Burton
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Implement __sync() using the new __SYNC() infrastructure, which will
take care of not emitting an instruction for old R3k CPUs that don't
support it. The only behavioral difference is that __sync() will now
provide a compiler barrier on these old CPUs, but that seems like
reasonable behavior anyway.
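
Concretely (an illustrative expansion rather than literal preprocessor
output), on a kernel built without CONFIG_CPU_HAS_SYNC the ____SYNC()
body expands to nothing & the new definition degenerates to roughly:

  asm volatile("" ::: "memory");	/* ie. just a compiler barrier */

which is where the compiler barrier described above comes from.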

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 657ec01120a4..a117c6d95038 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -11,20 +11,10 @@
 #include <asm/addrspace.h>
 #include <asm/sync.h>
 
-#ifdef CONFIG_CPU_HAS_SYNC
-#define __sync()				\
-	__asm__ __volatile__(			\
-		".set	push\n\t"		\
-		".set	noreorder\n\t"		\
-		".set	mips2\n\t"		\
-		"sync\n\t"			\
-		".set	pop"			\
-		: /* no output */		\
-		: /* no input */		\
-		: "memory")
-#else
-#define __sync()	do { } while(0)
-#endif
+static inline void __sync(void)
+{
+	asm volatile(__SYNC(full, always) ::: "memory");
+}
 
 static inline void rmb(void)
 {
-- 
2.23.0



* [PATCH 08/37] MIPS: barrier: Clean up sync_ginv()
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (6 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 07/37] MIPS: barrier: Clean up __sync() definition Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 09/37] MIPS: atomic: Fix whitespace in ATOMIC_OP macros Paul Burton
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Use the new __SYNC() infrastructure to implement sync_ginv(), for
consistency with much of the rest of asm/barrier.h.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a117c6d95038..c7e05e832da9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -163,7 +163,7 @@ static inline void wmb(void)
 
 static inline void sync_ginv(void)
 {
-	asm volatile("sync\t%0" :: "i"(__SYNC_ginv));
+	asm volatile(__SYNC(ginv, always));
 }
 
 #include <asm-generic/barrier.h>
-- 
2.23.0



* [PATCH 09/37] MIPS: atomic: Fix whitespace in ATOMIC_OP macros
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (7 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 08/37] MIPS: barrier: Clean up sync_ginv() Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 11/37] MIPS: atomic: Use one macro to generate 32b & 64b functions Paul Burton
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

We define macros in asm/atomic.h which end each line with space
characters before a backslash to continue on the next line. Remove the
space characters, leaving tabs as the only whitespace used, for
conformity with coding convention.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 184 ++++++++++++++++-----------------
 1 file changed, 92 insertions(+), 92 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 7578c807ef98..2d2a8a74c51b 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,102 +42,102 @@
  */
 #define atomic_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC_OP(op, c_op, asm_op)					      \
-static __inline__ void atomic_##op(int i, atomic_t * v)			      \
-{									      \
-	if (kernel_uses_llsc) {						      \
-		int temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%0, %1		# atomic_" #op "	\n"   \
-		"	" #asm_op " %0, %2				\n"   \
-		"	sc	%0, %1					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	.set	pop					\n"   \
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		v->counter c_op i;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static __inline__ void atomic_##op(int i, atomic_t * v)			\
+{									\
+	if (kernel_uses_llsc) {						\
+		int temp;						\
+									\
+		loongson_llsc_mb();					\
+		__asm__ __volatile__(					\
+		"	.set	push				\n"	\
+		"	.set	"MIPS_ISA_LEVEL"		\n"	\
+		"1:	ll	%0, %1	# atomic_" #op "	\n"	\
+		"	" #asm_op " %0, %2			\n"	\
+		"	sc	%0, %1				\n"	\
+		"\t" __SC_BEQZ "%0, 1b				\n"	\
+		"	.set	pop				\n"	\
+		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	\
+		: "Ir" (i) : __LLSC_CLOBBER);				\
+	} else {							\
+		unsigned long flags;					\
+									\
+		raw_local_irq_save(flags);				\
+		v->counter c_op i;					\
+		raw_local_irq_restore(flags);				\
+	}								\
 }
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				      \
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	      \
-{									      \
-	int result;							      \
-									      \
-	if (kernel_uses_llsc) {						      \
-		int temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%1, %2		# atomic_" #op "_return	\n"   \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	sc	%0, %2					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	.set	pop					\n"   \
-		: "=&r" (result), "=&r" (temp),				      \
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		result = v->counter;					      \
-		result c_op i;						      \
-		v->counter = result;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
-									      \
-	return result;							      \
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
+{									\
+	int result;							\
+									\
+	if (kernel_uses_llsc) {						\
+		int temp;						\
+									\
+		loongson_llsc_mb();					\
+		__asm__ __volatile__(					\
+		"	.set	push				\n"	\
+		"	.set	"MIPS_ISA_LEVEL"		\n"	\
+		"1:	ll	%1, %2	# atomic_" #op "_return	\n"	\
+		"	" #asm_op " %0, %1, %3			\n"	\
+		"	sc	%0, %2				\n"	\
+		"\t" __SC_BEQZ "%0, 1b				\n"	\
+		"	" #asm_op " %0, %1, %3			\n"	\
+		"	.set	pop				\n"	\
+		: "=&r" (result), "=&r" (temp),				\
+		  "+" GCC_OFF_SMALL_ASM() (v->counter)			\
+		: "Ir" (i) : __LLSC_CLOBBER);				\
+	} else {							\
+		unsigned long flags;					\
+									\
+		raw_local_irq_save(flags);				\
+		result = v->counter;					\
+		result c_op i;						\
+		v->counter = result;					\
+		raw_local_irq_restore(flags);				\
+	}								\
+									\
+	return result;							\
 }
 
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)				      \
-static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	      \
-{									      \
-	int result;							      \
-									      \
-	if (kernel_uses_llsc) {						      \
-		int temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"   \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	sc	%0, %2					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	.set	pop					\n"   \
-		"	move	%0, %1					\n"   \
-		: "=&r" (result), "=&r" (temp),				      \
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		result = v->counter;					      \
-		v->counter c_op i;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
-									      \
-	return result;							      \
+#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
+static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	\
+{									\
+	int result;							\
+									\
+	if (kernel_uses_llsc) {						\
+		int temp;						\
+									\
+		loongson_llsc_mb();					\
+		__asm__ __volatile__(					\
+		"	.set	push				\n"	\
+		"	.set	"MIPS_ISA_LEVEL"		\n"	\
+		"1:	ll	%1, %2	# atomic_fetch_" #op "	\n"	\
+		"	" #asm_op " %0, %1, %3			\n"	\
+		"	sc	%0, %2				\n"	\
+		"\t" __SC_BEQZ "%0, 1b				\n"	\
+		"	.set	pop				\n"	\
+		"	move	%0, %1				\n"	\
+		: "=&r" (result), "=&r" (temp),				\
+		  "+" GCC_OFF_SMALL_ASM() (v->counter)			\
+		: "Ir" (i) : __LLSC_CLOBBER);				\
+	} else {							\
+		unsigned long flags;					\
+									\
+		raw_local_irq_save(flags);				\
+		result = v->counter;					\
+		v->counter c_op i;					\
+		raw_local_irq_restore(flags);				\
+	}								\
+									\
+	return result;							\
 }
 
-#define ATOMIC_OPS(op, c_op, asm_op)					      \
-	ATOMIC_OP(op, c_op, asm_op)					      \
-	ATOMIC_OP_RETURN(op, c_op, asm_op)				      \
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
 	ATOMIC_FETCH_OP(op, c_op, asm_op)
 
 ATOMIC_OPS(add, +=, addu)
@@ -149,8 +149,8 @@ ATOMIC_OPS(sub, -=, subu)
 #define atomic_fetch_sub_relaxed	atomic_fetch_sub_relaxed
 
 #undef ATOMIC_OPS
-#define ATOMIC_OPS(op, c_op, asm_op)					      \
-	ATOMIC_OP(op, c_op, asm_op)					      \
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
 	ATOMIC_FETCH_OP(op, c_op, asm_op)
 
 ATOMIC_OPS(and, &=, and)
-- 
2.23.0



* [PATCH 10/37] MIPS: atomic: Handle !kernel_uses_llsc first
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (9 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 11/37] MIPS: atomic: Use one macro to generate 32b & 64b functions Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 12/37] MIPS: atomic: Emit Loongson3 sync workarounds within asm Paul Burton
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Handle the !kernel_uses_llsc path first in our ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros & return from within the
block. This allows us to de-indent the kernel_uses_llsc path by one
level, which will be useful when making further changes.
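
In outline each macro now takes the following shape (a simplified
sketch of the structure, with the asm body elided):

  if (!kernel_uses_llsc) {
  	unsigned long flags;

  	raw_local_irq_save(flags);
  	/* plain C fallback */
  	raw_local_irq_restore(flags);
  	return;		/* or 'return result;' in the value-returning variants */
  }

  /* LL/SC inline asm follows, now one indentation level shallower */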

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 99 +++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 50 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 2d2a8a74c51b..ace2ea005588 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -45,51 +45,36 @@
 #define ATOMIC_OP(op, c_op, asm_op)					\
 static __inline__ void atomic_##op(int i, atomic_t * v)			\
 {									\
-	if (kernel_uses_llsc) {						\
-		int temp;						\
+	int temp;							\
 									\
-		loongson_llsc_mb();					\
-		__asm__ __volatile__(					\
-		"	.set	push				\n"	\
-		"	.set	"MIPS_ISA_LEVEL"		\n"	\
-		"1:	ll	%0, %1	# atomic_" #op "	\n"	\
-		"	" #asm_op " %0, %2			\n"	\
-		"	sc	%0, %1				\n"	\
-		"\t" __SC_BEQZ "%0, 1b				\n"	\
-		"	.set	pop				\n"	\
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	\
-		: "Ir" (i) : __LLSC_CLOBBER);				\
-	} else {							\
+	if (!kernel_uses_llsc) {					\
 		unsigned long flags;					\
 									\
 		raw_local_irq_save(flags);				\
 		v->counter c_op i;					\
 		raw_local_irq_restore(flags);				\
+		return;							\
 	}								\
+									\
+	loongson_llsc_mb();						\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"1:	ll	%0, %1		# atomic_" #op "	\n"	\
+	"	" #asm_op " %0, %2				\n"	\
+	"	sc	%0, %1					\n"	\
+	"\t" __SC_BEQZ "%0, 1b					\n"	\
+	"	.set	pop					\n"	\
+	: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)		\
+	: "Ir" (i) : __LLSC_CLOBBER);					\
 }
 
 #define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
 static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
 {									\
-	int result;							\
-									\
-	if (kernel_uses_llsc) {						\
-		int temp;						\
+	int temp, result;						\
 									\
-		loongson_llsc_mb();					\
-		__asm__ __volatile__(					\
-		"	.set	push				\n"	\
-		"	.set	"MIPS_ISA_LEVEL"		\n"	\
-		"1:	ll	%1, %2	# atomic_" #op "_return	\n"	\
-		"	" #asm_op " %0, %1, %3			\n"	\
-		"	sc	%0, %2				\n"	\
-		"\t" __SC_BEQZ "%0, 1b				\n"	\
-		"	" #asm_op " %0, %1, %3			\n"	\
-		"	.set	pop				\n"	\
-		: "=&r" (result), "=&r" (temp),				\
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			\
-		: "Ir" (i) : __LLSC_CLOBBER);				\
-	} else {							\
+	if (!kernel_uses_llsc) {					\
 		unsigned long flags;					\
 									\
 		raw_local_irq_save(flags);				\
@@ -97,41 +82,55 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
 		result c_op i;						\
 		v->counter = result;					\
 		raw_local_irq_restore(flags);				\
+		return result;						\
 	}								\
 									\
+	loongson_llsc_mb();						\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"1:	ll	%1, %2		# atomic_" #op "_return	\n"	\
+	"	" #asm_op " %0, %1, %3				\n"	\
+	"	sc	%0, %2					\n"	\
+	"\t" __SC_BEQZ "%0, 1b					\n"	\
+	"	" #asm_op " %0, %1, %3				\n"	\
+	"	.set	pop					\n"	\
+	: "=&r" (result), "=&r" (temp),					\
+	  "+" GCC_OFF_SMALL_ASM() (v->counter)				\
+	: "Ir" (i) : __LLSC_CLOBBER);					\
+									\
 	return result;							\
 }
 
 #define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
 static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	\
 {									\
-	int result;							\
+	int temp, result;						\
 									\
-	if (kernel_uses_llsc) {						\
-		int temp;						\
-									\
-		loongson_llsc_mb();					\
-		__asm__ __volatile__(					\
-		"	.set	push				\n"	\
-		"	.set	"MIPS_ISA_LEVEL"		\n"	\
-		"1:	ll	%1, %2	# atomic_fetch_" #op "	\n"	\
-		"	" #asm_op " %0, %1, %3			\n"	\
-		"	sc	%0, %2				\n"	\
-		"\t" __SC_BEQZ "%0, 1b				\n"	\
-		"	.set	pop				\n"	\
-		"	move	%0, %1				\n"	\
-		: "=&r" (result), "=&r" (temp),				\
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			\
-		: "Ir" (i) : __LLSC_CLOBBER);				\
-	} else {							\
+	if (!kernel_uses_llsc) {					\
 		unsigned long flags;					\
 									\
 		raw_local_irq_save(flags);				\
 		result = v->counter;					\
 		v->counter c_op i;					\
 		raw_local_irq_restore(flags);				\
+		return result;						\
 	}								\
 									\
+	loongson_llsc_mb();						\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	"MIPS_ISA_LEVEL"			\n"	\
+	"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"	\
+	"	" #asm_op " %0, %1, %3				\n"	\
+	"	sc	%0, %2					\n"	\
+	"\t" __SC_BEQZ "%0, 1b					\n"	\
+	"	.set	pop					\n"	\
+	"	move	%0, %1					\n"	\
+	: "=&r" (result), "=&r" (temp),					\
+	  "+" GCC_OFF_SMALL_ASM() (v->counter)				\
+	: "Ir" (i) : __LLSC_CLOBBER);					\
+									\
 	return result;							\
 }
 
-- 
2.23.0


* [PATCH 11/37] MIPS: atomic: Use one macro to generate 32b & 64b functions
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (8 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 09/37] MIPS: atomic: Fix whitespace in ATOMIC_OP macros Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 10/37] MIPS: atomic: Handle !kernel_uses_llsc first Paul Burton
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Cut down on duplication by generalizing the ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros to work for both 32b &
64b atomics, and removing the ATOMIC64_ variants. This ensures
consistency between our atomic_* & atomic64_* functions.
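
As a rough sketch of the approach (simplified; the full definitions are
in the diff below), the prefix, C type & LL/SC mnemonics become macro
parameters so that a single template covers both widths:

  #define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)		\
  static __inline__ void pfx##_##op(type i, pfx##_t *v)		\
  { /* LL/SC loop built from #ll, #asm_op & #sc ... */ }

  ATOMIC_OPS(atomic,   add, int, +=, addu,  ll,  sc)
  ATOMIC_OPS(atomic64, add, s64, +=, daddu, lld, scd)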

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 196 ++++++++-------------------------
 1 file changed, 45 insertions(+), 151 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index ace2ea005588..b834af5a7382 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -42,10 +42,10 @@
  */
 #define atomic_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static __inline__ void atomic_##op(int i, atomic_t * v)			\
+#define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)			\
+static __inline__ void pfx##_##op(type i, pfx##_t * v)			\
 {									\
-	int temp;							\
+	type temp;							\
 									\
 	if (!kernel_uses_llsc) {					\
 		unsigned long flags;					\
@@ -60,19 +60,19 @@ static __inline__ void atomic_##op(int i, atomic_t * v)			\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	" MIPS_ISA_LEVEL "			\n"	\
-	"1:	ll	%0, %1		# atomic_" #op "	\n"	\
+	"1:	" #ll "	%0, %1		# " #pfx "_" #op "	\n"	\
 	"	" #asm_op " %0, %2				\n"	\
-	"	sc	%0, %1					\n"	\
+	"	" #sc "	%0, %1					\n"	\
 	"\t" __SC_BEQZ "%0, 1b					\n"	\
 	"	.set	pop					\n"	\
 	: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)		\
 	: "Ir" (i) : __LLSC_CLOBBER);					\
 }
 
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
+#define ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)		\
+static __inline__ type pfx##_##op##_return_relaxed(type i, pfx##_t * v)	\
 {									\
-	int temp, result;						\
+	type temp, result;						\
 									\
 	if (!kernel_uses_llsc) {					\
 		unsigned long flags;					\
@@ -89,9 +89,9 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	" MIPS_ISA_LEVEL "			\n"	\
-	"1:	ll	%1, %2		# atomic_" #op "_return	\n"	\
+	"1:	" #ll "	%1, %2		# " #pfx "_" #op "_return\n"	\
 	"	" #asm_op " %0, %1, %3				\n"	\
-	"	sc	%0, %2					\n"	\
+	"	" #sc "	%0, %2					\n"	\
 	"\t" __SC_BEQZ "%0, 1b					\n"	\
 	"	" #asm_op " %0, %1, %3				\n"	\
 	"	.set	pop					\n"	\
@@ -102,8 +102,8 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	\
 	return result;							\
 }
 
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
-static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	\
+#define ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)		\
+static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v)	\
 {									\
 	int temp, result;						\
 									\
@@ -120,10 +120,10 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	\
 	loongson_llsc_mb();						\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
-	"	.set	"MIPS_ISA_LEVEL"			\n"	\
-	"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"	\
+	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"1:	" #ll "	%1, %2		# " #pfx "_fetch_" #op "\n"	\
 	"	" #asm_op " %0, %1, %3				\n"	\
-	"	sc	%0, %2					\n"	\
+	"	" #sc "	%0, %2					\n"	\
 	"\t" __SC_BEQZ "%0, 1b					\n"	\
 	"	.set	pop					\n"	\
 	"	move	%0, %1					\n"	\
@@ -134,32 +134,50 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	\
 	return result;							\
 }
 
-#define ATOMIC_OPS(op, c_op, asm_op)					\
-	ATOMIC_OP(op, c_op, asm_op)					\
-	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-	ATOMIC_FETCH_OP(op, c_op, asm_op)
+#define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc)			\
+	ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)			\
+	ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)		\
+	ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)
 
-ATOMIC_OPS(add, +=, addu)
-ATOMIC_OPS(sub, -=, subu)
+ATOMIC_OPS(atomic, add, int, +=, addu, ll, sc)
+ATOMIC_OPS(atomic, sub, int, -=, subu, ll, sc)
 
 #define atomic_add_return_relaxed	atomic_add_return_relaxed
 #define atomic_sub_return_relaxed	atomic_sub_return_relaxed
 #define atomic_fetch_add_relaxed	atomic_fetch_add_relaxed
 #define atomic_fetch_sub_relaxed	atomic_fetch_sub_relaxed
 
+#ifdef CONFIG_64BIT
+ATOMIC_OPS(atomic64, add, s64, +=, daddu, lld, scd)
+ATOMIC_OPS(atomic64, sub, s64, -=, dsubu, lld, scd)
+# define atomic64_add_return_relaxed	atomic64_add_return_relaxed
+# define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
+# define atomic64_fetch_add_relaxed	atomic64_fetch_add_relaxed
+# define atomic64_fetch_sub_relaxed	atomic64_fetch_sub_relaxed
+#endif /* CONFIG_64BIT */
+
 #undef ATOMIC_OPS
-#define ATOMIC_OPS(op, c_op, asm_op)					\
-	ATOMIC_OP(op, c_op, asm_op)					\
-	ATOMIC_FETCH_OP(op, c_op, asm_op)
+#define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc)			\
+	ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)			\
+	ATOMIC_FETCH_OP(pfx, op, type, c_op, asm_op, ll, sc)
 
-ATOMIC_OPS(and, &=, and)
-ATOMIC_OPS(or, |=, or)
-ATOMIC_OPS(xor, ^=, xor)
+ATOMIC_OPS(atomic, and, int, &=, and, ll, sc)
+ATOMIC_OPS(atomic, or, int, |=, or, ll, sc)
+ATOMIC_OPS(atomic, xor, int, ^=, xor, ll, sc)
 
 #define atomic_fetch_and_relaxed	atomic_fetch_and_relaxed
 #define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
 #define atomic_fetch_xor_relaxed	atomic_fetch_xor_relaxed
 
+#ifdef CONFIG_64BIT
+ATOMIC_OPS(atomic64, and, s64, &=, and, lld, scd)
+ATOMIC_OPS(atomic64, or, s64, |=, or, lld, scd)
+ATOMIC_OPS(atomic64, xor, s64, ^=, xor, lld, scd)
+# define atomic64_fetch_and_relaxed	atomic64_fetch_and_relaxed
+# define atomic64_fetch_or_relaxed	atomic64_fetch_or_relaxed
+# define atomic64_fetch_xor_relaxed	atomic64_fetch_xor_relaxed
+#endif
+
 #undef ATOMIC_OPS
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_OP_RETURN
@@ -243,130 +261,6 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
  */
 #define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
-#define ATOMIC64_OP(op, c_op, asm_op)					      \
-static __inline__ void atomic64_##op(s64 i, atomic64_t * v)		      \
-{									      \
-	if (kernel_uses_llsc) {						      \
-		s64 temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%0, %1		# atomic64_" #op "	\n"   \
-		"	" #asm_op " %0, %2				\n"   \
-		"	scd	%0, %1					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	.set	pop					\n"   \
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		v->counter c_op i;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
-}
-
-#define ATOMIC64_OP_RETURN(op, c_op, asm_op)				      \
-static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v)   \
-{									      \
-	s64 result;							      \
-									      \
-	if (kernel_uses_llsc) {						      \
-		s64 temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%1, %2		# atomic64_" #op "_return\n"  \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	scd	%0, %2					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	.set	pop					\n"   \
-		: "=&r" (result), "=&r" (temp),				      \
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		result = v->counter;					      \
-		result c_op i;						      \
-		v->counter = result;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
-									      \
-	return result;							      \
-}
-
-#define ATOMIC64_FETCH_OP(op, c_op, asm_op)				      \
-static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v)    \
-{									      \
-	s64 result;							      \
-									      \
-	if (kernel_uses_llsc) {						      \
-		s64 temp;						      \
-									      \
-		loongson_llsc_mb();					      \
-		__asm__ __volatile__(					      \
-		"	.set	push					\n"   \
-		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%1, %2		# atomic64_fetch_" #op "\n"   \
-		"	" #asm_op " %0, %1, %3				\n"   \
-		"	scd	%0, %2					\n"   \
-		"\t" __SC_BEQZ "%0, 1b					\n"   \
-		"	move	%0, %1					\n"   \
-		"	.set	pop					\n"   \
-		: "=&r" (result), "=&r" (temp),				      \
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
-		: "Ir" (i) : __LLSC_CLOBBER);				      \
-	} else {							      \
-		unsigned long flags;					      \
-									      \
-		raw_local_irq_save(flags);				      \
-		result = v->counter;					      \
-		v->counter c_op i;					      \
-		raw_local_irq_restore(flags);				      \
-	}								      \
-									      \
-	return result;							      \
-}
-
-#define ATOMIC64_OPS(op, c_op, asm_op)					      \
-	ATOMIC64_OP(op, c_op, asm_op)					      \
-	ATOMIC64_OP_RETURN(op, c_op, asm_op)				      \
-	ATOMIC64_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC64_OPS(add, +=, daddu)
-ATOMIC64_OPS(sub, -=, dsubu)
-
-#define atomic64_add_return_relaxed	atomic64_add_return_relaxed
-#define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
-#define atomic64_fetch_add_relaxed	atomic64_fetch_add_relaxed
-#define atomic64_fetch_sub_relaxed	atomic64_fetch_sub_relaxed
-
-#undef ATOMIC64_OPS
-#define ATOMIC64_OPS(op, c_op, asm_op)					      \
-	ATOMIC64_OP(op, c_op, asm_op)					      \
-	ATOMIC64_FETCH_OP(op, c_op, asm_op)
-
-ATOMIC64_OPS(and, &=, and)
-ATOMIC64_OPS(or, |=, or)
-ATOMIC64_OPS(xor, ^=, xor)
-
-#define atomic64_fetch_and_relaxed	atomic64_fetch_and_relaxed
-#define atomic64_fetch_or_relaxed	atomic64_fetch_or_relaxed
-#define atomic64_fetch_xor_relaxed	atomic64_fetch_xor_relaxed
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
 /*
  * atomic64_sub_if_positive - conditionally subtract integer from atomic
  *                            variable
-- 
2.23.0


* [PATCH 12/37] MIPS: atomic: Emit Loongson3 sync workarounds within asm
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (10 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 10/37] MIPS: atomic: Handle !kernel_uses_llsc first Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 13/37] MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive() Paul Burton
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Generate the sync instructions required to work around the Loongson3
LL/SC errata within the inline asm blocks themselves. This feels a
little safer than doing it from C, where strictly speaking the compiler
would be well within its rights to insert a memory access between the
separate asm statements we previously had, containing sync & ll
instructions respectively.
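
To illustrate the hazard (a sketch, not the exact kernel code):

  /* Before: two asm statements; nothing stops the compiler from
   * scheduling a memory access between the sync & the ll. */
  loongson_llsc_mb();
  __asm__ __volatile__("1: ll %0, %1" : "=&r" (temp), "+m" (v->counter));

  /* After: the workaround sync is emitted within the same asm block. */
  __asm__ __volatile__(
          "       " __SYNC(full, loongson3_war) "         \n"
          "1:     ll      %0, %1                          \n"
          : "=&r" (temp), "+m" (v->counter));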

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index b834af5a7382..841ff274ada6 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -21,6 +21,7 @@
 #include <asm/cpu-features.h>
 #include <asm/cmpxchg.h>
 #include <asm/llsc.h>
+#include <asm/sync.h>
 #include <asm/war.h>
 
 #define ATOMIC_INIT(i)	  { (i) }
@@ -56,10 +57,10 @@ static __inline__ void pfx##_##op(type i, pfx##_t * v)			\
 		return;							\
 	}								\
 									\
-	loongson_llsc_mb();						\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"	" __SYNC(full, loongson3_war) "			\n"	\
 	"1:	" #ll "	%0, %1		# " #pfx "_" #op "	\n"	\
 	"	" #asm_op " %0, %2				\n"	\
 	"	" #sc "	%0, %1					\n"	\
@@ -85,10 +86,10 @@ static __inline__ type pfx##_##op##_return_relaxed(type i, pfx##_t * v)	\
 		return result;						\
 	}								\
 									\
-	loongson_llsc_mb();						\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"	" __SYNC(full, loongson3_war) "			\n"	\
 	"1:	" #ll "	%1, %2		# " #pfx "_" #op "_return\n"	\
 	"	" #asm_op " %0, %1, %3				\n"	\
 	"	" #sc "	%0, %2					\n"	\
@@ -117,10 +118,10 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v)	\
 		return result;						\
 	}								\
 									\
-	loongson_llsc_mb();						\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"	" __SYNC(full, loongson3_war) "			\n"	\
 	"1:	" #ll "	%1, %2		# " #pfx "_fetch_" #op "\n"	\
 	"	" #asm_op " %0, %1, %3				\n"	\
 	"	" #sc "	%0, %2					\n"	\
@@ -200,10 +201,10 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 	if (kernel_uses_llsc) {
 		int temp;
 
-		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_LEVEL"			\n"
+		"	" __SYNC(full, loongson3_war) "			\n"
 		"1:	ll	%1, %2		# atomic_sub_if_positive\n"
 		"	.set	pop					\n"
 		"	subu	%0, %1, %3				\n"
@@ -213,7 +214,7 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 		"	.set	"MIPS_ISA_LEVEL"			\n"
 		"	sc	%1, %2					\n"
 		"\t" __SC_BEQZ "%1, 1b					\n"
-		"2:							\n"
+		"2:	" __SYNC(full, loongson3_war) "			\n"
 		"	.set	pop					\n"
 		: "=&r" (result), "=&r" (temp),
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)
@@ -229,7 +230,14 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 		raw_local_irq_restore(flags);
 	}
 
-	smp_llsc_mb();
+	/*
+	 * In the Loongson3 workaround case we already have a completion
+	 * barrier at 2: above, which is needed due to the bltz that can branch
+	 * to code outside of the LL/SC loop. As such, we don't need to emit
+	 * another barrier here.
+	 */
+	if (!__SYNC_loongson3_war)
+		smp_llsc_mb();
 
 	return result;
 }
-- 
2.23.0


* [PATCH 13/37] MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive()
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (11 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 12/37] MIPS: atomic: Emit Loongson3 sync workarounds within asm Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 14/37] MIPS: atomic: Unify 32b & 64b sub_if_positive Paul Burton
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Use smp_mb__before_atomic() & smp_mb__after_atomic() in
atomic_sub_if_positive() rather than the equivalent
smp_mb__before_llsc() & smp_llsc_mb(). The former are more standard &
this prepares us to avoid emitting redundant barriers on Loongson3 in
a later patch.
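
For reference, the generic barriers are expected to map onto the MIPS
LL/SC flavours roughly as follows (paraphrasing asm/barrier.h, not a
verbatim quote):

  #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
  #define __smp_mb__after_atomic()	smp_llsc_mb()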

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 841ff274ada6..24443ef29337 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -196,7 +196,7 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 {
 	int result;
 
-	smp_mb__before_llsc();
+	smp_mb__before_atomic();
 
 	if (kernel_uses_llsc) {
 		int temp;
@@ -237,7 +237,7 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 	 * another barrier here.
 	 */
 	if (!__SYNC_loongson3_war)
-		smp_llsc_mb();
+		smp_mb__after_atomic();
 
 	return result;
 }
-- 
2.23.0


* [PATCH 14/37] MIPS: atomic: Unify 32b & 64b sub_if_positive
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (12 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 13/37] MIPS: atomic: Use _atomic barriers in atomic_sub_if_positive() Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 15/37] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg Paul Burton
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Unify the definitions of atomic_sub_if_positive() &
atomic64_sub_if_positive() using a macro like we do for most other
atomic functions. This allows us to share the implementation, ensuring
consistency between the two. Notably this provides the appropriate
loongson3_war barriers in the atomic64_sub_if_positive() case, which
were previously missing.

The code is rearranged a little to handle the !kernel_uses_llsc case
first in order to de-indent the LL/SC case & allow us not to go over 80
characters per line.
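
With the shared template in place, both variants are generated by a
single instantiation each (as in the diff below):

  ATOMIC_SIP_OP(atomic,   int, subu,  ll,  sc)	/* atomic_sub_if_positive()   */
  #ifdef CONFIG_64BIT
  ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)	/* atomic64_sub_if_positive() */
  #endif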

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 164 ++++++++++++---------------------
 1 file changed, 58 insertions(+), 106 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 24443ef29337..96ef50fa2817 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -192,65 +192,71 @@ ATOMIC_OPS(atomic64, xor, s64, ^=, xor, lld, scd)
  * Atomically test @v and subtract @i if @v is greater or equal than @i.
  * The function returns the old value of @v minus @i.
  */
-static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
-{
-	int result;
-
-	smp_mb__before_atomic();
-
-	if (kernel_uses_llsc) {
-		int temp;
-
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"	" __SYNC(full, loongson3_war) "			\n"
-		"1:	ll	%1, %2		# atomic_sub_if_positive\n"
-		"	.set	pop					\n"
-		"	subu	%0, %1, %3				\n"
-		"	move	%1, %0					\n"
-		"	bltz	%0, 2f					\n"
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"	sc	%1, %2					\n"
-		"\t" __SC_BEQZ "%1, 1b					\n"
-		"2:	" __SYNC(full, loongson3_war) "			\n"
-		"	.set	pop					\n"
-		: "=&r" (result), "=&r" (temp),
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i) : __LLSC_CLOBBER);
-	} else {
-		unsigned long flags;
+#define ATOMIC_SIP_OP(pfx, type, op, ll, sc)				\
+static __inline__ int pfx##_sub_if_positive(type i, pfx##_t * v)	\
+{									\
+	type temp, result;						\
+									\
+	smp_mb__before_atomic();					\
+									\
+	if (!kernel_uses_llsc) {					\
+		unsigned long flags;					\
+									\
+		raw_local_irq_save(flags);				\
+		result = v->counter;					\
+		result -= i;						\
+		if (result >= 0)					\
+			v->counter = result;				\
+		raw_local_irq_restore(flags);				\
+		smp_mb__after_atomic();					\
+		return result;						\
+	}								\
+									\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"	" __SYNC(full, loongson3_war) "			\n"	\
+	"1:	" #ll "	%1, %2		# atomic_sub_if_positive\n"	\
+	"	.set	pop					\n"	\
+	"	" #op "	%0, %1, %3				\n"	\
+	"	move	%1, %0					\n"	\
+	"	bltz	%0, 2f					\n"	\
+	"	.set	push					\n"	\
+	"	.set	" MIPS_ISA_LEVEL "			\n"	\
+	"	" #sc "	%1, %2					\n"	\
+	"	" __SC_BEQZ "%1, 1b				\n"	\
+	"2:	" __SYNC(full, loongson3_war) "			\n"	\
+	"	.set	pop					\n"	\
+	: "=&r" (result), "=&r" (temp),					\
+	  "+" GCC_OFF_SMALL_ASM() (v->counter)				\
+	: "Ir" (i)							\
+	: __LLSC_CLOBBER);						\
+									\
+	/*								\
+	 * In the Loongson3 workaround case we already have a		\
+	 * completion barrier at 2: above, which is needed due to the	\
+	 * bltz that can branch	to code outside of the LL/SC loop. As	\
+	 * such, we don't need to emit another barrier here.		\
+	 */								\
+	if (!__SYNC_loongson3_war)					\
+		smp_mb__after_atomic();					\
+									\
+	return result;							\
+}
 
-		raw_local_irq_save(flags);
-		result = v->counter;
-		result -= i;
-		if (result >= 0)
-			v->counter = result;
-		raw_local_irq_restore(flags);
-	}
+ATOMIC_SIP_OP(atomic, int, subu, ll, sc)
+#define atomic_dec_if_positive(v)	atomic_sub_if_positive(1, v)
 
-	/*
-	 * In the Loongson3 workaround case we already have a completion
-	 * barrier at 2: above, which is needed due to the bltz that can branch
-	 * to code outside of the LL/SC loop. As such, we don't need to emit
-	 * another barrier here.
-	 */
-	if (!__SYNC_loongson3_war)
-		smp_mb__after_atomic();
+#ifdef CONFIG_64BIT
+ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
+#define atomic64_dec_if_positive(v)	atomic64_sub_if_positive(1, v)
+#endif
 
-	return result;
-}
+#undef ATOMIC_SIP_OP
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
 
-/*
- * atomic_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic_t
- */
-#define atomic_dec_if_positive(v)	atomic_sub_if_positive(1, v)
-
 #ifdef CONFIG_64BIT
 
 #define ATOMIC64_INIT(i)    { (i) }
@@ -269,64 +275,10 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
  */
 #define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
 
-/*
- * atomic64_sub_if_positive - conditionally subtract integer from atomic
- *                            variable
- * @i: integer value to subtract
- * @v: pointer of type atomic64_t
- *
- * Atomically test @v and subtract @i if @v is greater or equal than @i.
- * The function returns the old value of @v minus @i.
- */
-static __inline__ s64 atomic64_sub_if_positive(s64 i, atomic64_t * v)
-{
-	s64 result;
-
-	smp_mb__before_llsc();
-
-	if (kernel_uses_llsc) {
-		s64 temp;
-
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"1:	lld	%1, %2		# atomic64_sub_if_positive\n"
-		"	dsubu	%0, %1, %3				\n"
-		"	move	%1, %0					\n"
-		"	bltz	%0, 1f					\n"
-		"	scd	%1, %2					\n"
-		"\t" __SC_BEQZ "%1, 1b					\n"
-		"1:							\n"
-		"	.set	pop					\n"
-		: "=&r" (result), "=&r" (temp),
-		  "+" GCC_OFF_SMALL_ASM() (v->counter)
-		: "Ir" (i));
-	} else {
-		unsigned long flags;
-
-		raw_local_irq_save(flags);
-		result = v->counter;
-		result -= i;
-		if (result >= 0)
-			v->counter = result;
-		raw_local_irq_restore(flags);
-	}
-
-	smp_llsc_mb();
-
-	return result;
-}
-
 #define atomic64_cmpxchg(v, o, n) \
 	((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
 
-/*
- * atomic64_dec_if_positive - decrement by 1 if old value positive
- * @v: pointer of type atomic64_t
- */
-#define atomic64_dec_if_positive(v)	atomic64_sub_if_positive(1, v)
-
 #endif /* CONFIG_64BIT */
 
 #endif /* _ASM_ATOMIC_H */
-- 
2.23.0


* [PATCH 15/37] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (13 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 14/37] MIPS: atomic: Unify 32b & 64b sub_if_positive Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 17/37] MIPS: bitops: Handle !kernel_uses_llsc first Paul Burton
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Remove the remaining duplication between 32b & 64b in asm/atomic.h by
making use of an ATOMIC_OPS() macro to generate:

  - atomic_read()/atomic64_read()
  - atomic_set()/atomic64_set()
  - atomic_cmpxchg()/atomic64_cmpxchg()
  - atomic_xchg()/atomic64_xchg()

This is consistent with the way all other functions in asm/atomic.h are
generated, and ensures consistency between the 32b & 64b functions.

Of note is that this results in the above now being static inline
functions rather than macros.
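
As a simplified sketch, each ATOMIC_OPS(pfx, type) invocation now
expands to static inline functions of this shape (64b case shown):

  static __always_inline s64 atomic64_read(const atomic64_t *v)
  {
          return READ_ONCE(v->counter);
  }

  static __always_inline void atomic64_set(atomic64_t *v, s64 i)
  {
          WRITE_ONCE(v->counter, i);
  }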

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/atomic.h | 70 +++++++++++++---------------------
 1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 96ef50fa2817..e5ac88392d1f 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -24,24 +24,34 @@
 #include <asm/sync.h>
 #include <asm/war.h>
 
-#define ATOMIC_INIT(i)	  { (i) }
+#define ATOMIC_OPS(pfx, type)						\
+static __always_inline type pfx##_read(const pfx##_t *v)		\
+{									\
+	return READ_ONCE(v->counter);					\
+}									\
+									\
+static __always_inline void pfx##_set(pfx##_t *v, type i)		\
+{									\
+	WRITE_ONCE(v->counter, i);					\
+}									\
+									\
+static __always_inline type pfx##_cmpxchg(pfx##_t *v, type o, type n)	\
+{									\
+	return cmpxchg(&v->counter, o, n);				\
+}									\
+									\
+static __always_inline type pfx##_xchg(pfx##_t *v, type n)		\
+{									\
+	return xchg(&v->counter, n);					\
+}
 
-/*
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v.
- */
-#define atomic_read(v)		READ_ONCE((v)->counter)
+#define ATOMIC_INIT(i)		{ (i) }
+ATOMIC_OPS(atomic, int)
 
-/*
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i.
- */
-#define atomic_set(v, i)	WRITE_ONCE((v)->counter, (i))
+#ifdef CONFIG_64BIT
+# define ATOMIC64_INIT(i)	{ (i) }
+ATOMIC_OPS(atomic64, s64)
+#endif
 
 #define ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)			\
 static __inline__ void pfx##_##op(type i, pfx##_t * v)			\
@@ -135,6 +145,7 @@ static __inline__ type pfx##_fetch_##op##_relaxed(type i, pfx##_t * v)	\
 	return result;							\
 }
 
+#undef ATOMIC_OPS
 #define ATOMIC_OPS(pfx, op, type, c_op, asm_op, ll, sc)			\
 	ATOMIC_OP(pfx, op, type, c_op, asm_op, ll, sc)			\
 	ATOMIC_OP_RETURN(pfx, op, type, c_op, asm_op, ll, sc)		\
@@ -254,31 +265,4 @@ ATOMIC_SIP_OP(atomic64, s64, dsubu, lld, scd)
 
 #undef ATOMIC_SIP_OP
 
-#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
-#define atomic_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#ifdef CONFIG_64BIT
-
-#define ATOMIC64_INIT(i)    { (i) }
-
-/*
- * atomic64_read - read atomic variable
- * @v: pointer of type atomic64_t
- *
- */
-#define atomic64_read(v)	READ_ONCE((v)->counter)
-
-/*
- * atomic64_set - set atomic variable
- * @v: pointer of type atomic64_t
- * @i: required value
- */
-#define atomic64_set(v, i)	WRITE_ONCE((v)->counter, (i))
-
-#define atomic64_cmpxchg(v, o, n) \
-	((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), (new)))
-
-#endif /* CONFIG_64BIT */
-
 #endif /* _ASM_ATOMIC_H */
-- 
2.23.0


* [PATCH 17/37] MIPS: bitops: Handle !kernel_uses_llsc first
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (14 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 15/37] MIPS: atomic: Deduplicate 32b & 64b read, set, xchg, cmpxchg Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 16/37] MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz Paul Burton
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Reorder conditions in our various bitops functions that check
kernel_uses_llsc such that they handle the !kernel_uses_llsc case first.
This allows us to avoid the need to duplicate the kernel_uses_llsc check
in all the other cases. For functions that don't involve barriers common
to the various implementations, we switch to returning from within each
if block, making each case easier to read in isolation.
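
The resulting shape, sketched here for set_bit(), looks like:

  static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
  {
          if (!kernel_uses_llsc) {
                  __mips_set_bit(nr, addr);
                  return;
          }

          /* LL/SC implementations follow, each de-indented by one level */
  }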

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 213 ++++++++++++++++-----------------
 1 file changed, 105 insertions(+), 108 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 4b618afbfa5b..d3f3f37ca0b1 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -52,11 +52,16 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
 	unsigned long temp;
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (!kernel_uses_llsc) {
+		__mips_set_bit(nr, addr);
+		return;
+	}
+
+	if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -68,8 +73,11 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
 		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
 		: __LLSC_CLOBBER);
+		return;
+	}
+
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+	if (__builtin_constant_p(bit)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -80,23 +88,23 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 			: "ir" (bit), "r" (~0)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
+		return;
+	}
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
-	} else if (kernel_uses_llsc) {
-		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL "%0, %1		# set_bit	\n"
-			"	or	%0, %2				\n"
-			"	" __SC	"%0, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
-	} else
-		__mips_set_bit(nr, addr);
+
+	loongson_llsc_mb();
+	do {
+		__asm__ __volatile__(
+		"	.set	push				\n"
+		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+		"	" __LL "%0, %1		# set_bit	\n"
+		"	or	%0, %2				\n"
+		"	" __SC	"%0, %1				\n"
+		"	.set	pop				\n"
+		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
+		: "ir" (1UL << bit)
+		: __LLSC_CLOBBER);
+	} while (unlikely(!temp));
 }
 
 /*
@@ -111,11 +119,16 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
 	unsigned long temp;
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (!kernel_uses_llsc) {
+		__mips_clear_bit(nr, addr);
+		return;
+	}
+
+	if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -127,8 +140,11 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 		: "ir" (~(1UL << bit))
 		: __LLSC_CLOBBER);
+		return;
+	}
+
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+	if (__builtin_constant_p(bit)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -139,23 +155,23 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 			: "ir" (bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
+		return;
+	}
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
-	} else if (kernel_uses_llsc) {
-		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL "%0, %1		# clear_bit	\n"
-			"	and	%0, %2				\n"
-			"	" __SC "%0, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (~(1UL << bit))
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
-	} else
-		__mips_clear_bit(nr, addr);
+
+	loongson_llsc_mb();
+	do {
+		__asm__ __volatile__(
+		"	.set	push				\n"
+		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+		"	" __LL "%0, %1		# clear_bit	\n"
+		"	and	%0, %2				\n"
+		"	" __SC "%0, %1				\n"
+		"	.set	pop				\n"
+		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
+		: "ir" (~(1UL << bit))
+		: __LLSC_CLOBBER);
+	} while (unlikely(!temp));
 }
 
 /*
@@ -183,12 +199,16 @@ static inline void clear_bit_unlock(unsigned long nr, volatile unsigned long *ad
  */
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 {
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
+	unsigned long temp;
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
+	if (!kernel_uses_llsc) {
+		__mips_change_bit(nr, addr);
+		return;
+	}
 
+	if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push				\n"
 		"	.set	arch=r4000			\n"
@@ -200,25 +220,22 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 		: "ir" (1UL << bit)
 		: __LLSC_CLOBBER);
-	} else if (kernel_uses_llsc) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
+		return;
+	}
 
-		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL "%0, %1		# change_bit	\n"
-			"	xor	%0, %2				\n"
-			"	" __SC	"%0, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (1UL << bit)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
-	} else
-		__mips_change_bit(nr, addr);
+	loongson_llsc_mb();
+	do {
+		__asm__ __volatile__(
+		"	.set	push				\n"
+		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+		"	" __LL "%0, %1		# change_bit	\n"
+		"	xor	%0, %2				\n"
+		"	" __SC	"%0, %1				\n"
+		"	.set	pop				\n"
+		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
+		: "ir" (1UL << bit)
+		: __LLSC_CLOBBER);
+	} while (unlikely(!temp));
 }
 
 /*
@@ -232,15 +249,15 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 static inline int test_and_set_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res;
+	unsigned long res, temp;
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	if (!kernel_uses_llsc) {
+		res = __mips_test_and_set_bit(nr, addr);
+	} else if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -253,10 +270,7 @@ static inline int test_and_set_bit(unsigned long nr,
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
-	} else if (kernel_uses_llsc) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	} else {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -272,8 +286,7 @@ static inline int test_and_set_bit(unsigned long nr,
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
-	} else
-		res = __mips_test_and_set_bit(nr, addr);
+	}
 
 	smp_llsc_mb();
 
@@ -291,13 +304,13 @@ static inline int test_and_set_bit(unsigned long nr,
 static inline int test_and_set_bit_lock(unsigned long nr,
 	volatile unsigned long *addr)
 {
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res;
-
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
+	unsigned long res, temp;
 
+	if (!kernel_uses_llsc) {
+		res = __mips_test_and_set_bit_lock(nr, addr);
+	} else if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -310,11 +323,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
-	} else if (kernel_uses_llsc) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
-		loongson_llsc_mb();
+	} else {
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -329,8 +338,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
-	} else
-		res = __mips_test_and_set_bit_lock(nr, addr);
+	}
 
 	smp_llsc_mb();
 
@@ -347,15 +355,15 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 static inline int test_and_clear_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res;
+	unsigned long res, temp;
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	if (!kernel_uses_llsc) {
+		res = __mips_test_and_clear_bit(nr, addr);
+	} else if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -370,10 +378,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	} else if (kernel_uses_llsc && __builtin_constant_p(nr)) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	} else if (__builtin_constant_p(nr)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -386,10 +391,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 #endif
-	} else if (kernel_uses_llsc) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	} else {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -406,8 +408,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
-	} else
-		res = __mips_test_and_clear_bit(nr, addr);
+	}
 
 	smp_llsc_mb();
 
@@ -425,15 +426,15 @@ static inline int test_and_clear_bit(unsigned long nr,
 static inline int test_and_change_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
+	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res;
+	unsigned long res, temp;
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	if (!kernel_uses_llsc) {
+		res = __mips_test_and_change_bit(nr, addr);
+	} else if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
 		"	.set	arch=r4000				\n"
@@ -446,10 +447,7 @@ static inline int test_and_change_bit(unsigned long nr,
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
-	} else if (kernel_uses_llsc) {
-		unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
-		unsigned long temp;
-
+	} else {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -465,8 +463,7 @@ static inline int test_and_change_bit(unsigned long nr,
 		} while (unlikely(!res));
 
 		res = temp & (1UL << bit);
-	} else
-		res = __mips_test_and_change_bit(nr, addr);
+	}
 
 	smp_llsc_mb();
 
-- 
2.23.0


* [PATCH 16/37] MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (15 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 17/37] MIPS: bitops: Handle !kernel_uses_llsc first Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher Paul Burton
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The MIPS-specific implementations of __ffs(), ffs(), __fls() & fls()
make use of the MIPS clz instruction where possible. They do this via
inline asm, but in any configuration in which the kernel is built for a
MIPS32 or MIPS64 release 1 or higher instruction set, we know that these
instructions are available & can be emitted using the __builtin_clz()
function & other associated builtins, which are provided by all currently
supported versions of gcc.

When targeting an older instruction set GCC will generate a longer code
sequence similar to the fallback cases we have in our implementations.

As such, remove our custom implementations of these functions & use the
generic versions built atop compiler builtins. This allows us to drop a
significant chunk of code, along with the cpu_has_clo_clz feature macro
which was only used by these functions.

The only thing we lose here is the ability for kernels built to target a
pre-r1 ISA to opportunistically make use of clz when running on a CPU
that implements it. This seems like a small cost, and well worth paying
to simplify the code.
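
The asm-generic builtin wrappers pulled in below boil down to roughly
the following (paraphrased, not quoted verbatim):

  static __always_inline unsigned long __ffs(unsigned long word)
  {
          return __builtin_ctzl(word);
  }

  static __always_inline int fls(unsigned int x)
  {
          return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
  }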

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h                | 146 +-----------------
 arch/mips/include/asm/cpu-features.h          |  10 --
 .../asm/mach-malta/cpu-feature-overrides.h    |   2 -
 3 files changed, 4 insertions(+), 154 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 985d6a02f9ea..4b618afbfa5b 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -491,149 +491,11 @@ static inline void __clear_bit_unlock(unsigned long nr, volatile unsigned long *
 	nudge_writes();
 }
 
-/*
- * Return the bit position (0..63) of the most significant 1 bit in a word
- * Returns -1 if no 1 bit exists
- */
-static __always_inline unsigned long __fls(unsigned long word)
-{
-	int num;
-
-	if (BITS_PER_LONG == 32 && !__builtin_constant_p(word) &&
-	    __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) {
-		__asm__(
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"	clz	%0, %1					\n"
-		"	.set	pop					\n"
-		: "=r" (num)
-		: "r" (word));
-
-		return 31 - num;
-	}
-
-	if (BITS_PER_LONG == 64 && !__builtin_constant_p(word) &&
-	    __builtin_constant_p(cpu_has_mips64) && cpu_has_mips64) {
-		__asm__(
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"	dclz	%0, %1					\n"
-		"	.set	pop					\n"
-		: "=r" (num)
-		: "r" (word));
-
-		return 63 - num;
-	}
-
-	num = BITS_PER_LONG - 1;
-
-#if BITS_PER_LONG == 64
-	if (!(word & (~0ul << 32))) {
-		num -= 32;
-		word <<= 32;
-	}
-#endif
-	if (!(word & (~0ul << (BITS_PER_LONG-16)))) {
-		num -= 16;
-		word <<= 16;
-	}
-	if (!(word & (~0ul << (BITS_PER_LONG-8)))) {
-		num -= 8;
-		word <<= 8;
-	}
-	if (!(word & (~0ul << (BITS_PER_LONG-4)))) {
-		num -= 4;
-		word <<= 4;
-	}
-	if (!(word & (~0ul << (BITS_PER_LONG-2)))) {
-		num -= 2;
-		word <<= 2;
-	}
-	if (!(word & (~0ul << (BITS_PER_LONG-1))))
-		num -= 1;
-	return num;
-}
-
-/*
- * __ffs - find first bit in word.
- * @word: The word to search
- *
- * Returns 0..SZLONG-1
- * Undefined if no bit exists, so code should check against 0 first.
- */
-static __always_inline unsigned long __ffs(unsigned long word)
-{
-	return __fls(word & -word);
-}
-
-/*
- * fls - find last bit set.
- * @word: The word to search
- *
- * This is defined the same way as ffs.
- * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
- */
-static inline int fls(unsigned int x)
-{
-	int r;
-
-	if (!__builtin_constant_p(x) &&
-	    __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) {
-		__asm__(
-		"	.set	push					\n"
-		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"	clz	%0, %1					\n"
-		"	.set	pop					\n"
-		: "=r" (x)
-		: "r" (x));
-
-		return 32 - x;
-	}
-
-	r = 32;
-	if (!x)
-		return 0;
-	if (!(x & 0xffff0000u)) {
-		x <<= 16;
-		r -= 16;
-	}
-	if (!(x & 0xff000000u)) {
-		x <<= 8;
-		r -= 8;
-	}
-	if (!(x & 0xf0000000u)) {
-		x <<= 4;
-		r -= 4;
-	}
-	if (!(x & 0xc0000000u)) {
-		x <<= 2;
-		r -= 2;
-	}
-	if (!(x & 0x80000000u)) {
-		x <<= 1;
-		r -= 1;
-	}
-	return r;
-}
-
+#include <asm-generic/bitops/builtin-__ffs.h>
+#include <asm-generic/bitops/builtin-ffs.h>
+#include <asm-generic/bitops/builtin-__fls.h>
+#include <asm-generic/bitops/builtin-fls.h>
 #include <asm-generic/bitops/fls64.h>
-
-/*
- * ffs - find first bit set.
- * @word: The word to search
- *
- * This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-static inline int ffs(int word)
-{
-	if (!word)
-		return 0;
-
-	return fls(word & -word);
-}
-
 #include <asm-generic/bitops/ffz.h>
 #include <asm-generic/bitops/find.h>
 
diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index 983a6a7f43a1..274a35ae15af 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -362,16 +362,6 @@
 })
 #endif
 
-/*
- * MIPS32, MIPS64, VR5500, IDT32332, IDT32334 and maybe a few other
- * pre-MIPS32/MIPS64 processors have CLO, CLZ.	The IDT RC64574 is 64-bit and
- * has CLO and CLZ but not DCLO nor DCLZ.  For 64-bit kernels
- * cpu_has_clo_clz also indicates the availability of DCLO and DCLZ.
- */
-#ifndef cpu_has_clo_clz
-#define cpu_has_clo_clz	cpu_has_mips_r
-#endif
-
 /*
  * MIPS32 R2, MIPS64 R2, Loongson 3A and Octeon have WSBH.
  * MIPS64 R2, Loongson 3A and Octeon have WSBH, DSBH and DSHD.
diff --git a/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h b/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h
index de3b66a3723e..193c0912d38e 100644
--- a/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h
@@ -32,7 +32,6 @@
 /* #define cpu_has_vtag_icache	? */
 /* #define cpu_has_dc_aliases	? */
 /* #define cpu_has_ic_fills_f_dc ? */
-#define cpu_has_clo_clz		1
 #define cpu_has_nofpuex		0
 /* #define cpu_has_64bits	? */
 /* #define cpu_has_64bit_zero_reg ? */
@@ -59,7 +58,6 @@
 /* #define cpu_has_vtag_icache	? */
 /* #define cpu_has_dc_aliases	? */
 /* #define cpu_has_ic_fills_f_dc ? */
-#define cpu_has_clo_clz		1
 #define cpu_has_nofpuex		0
 /* #define cpu_has_64bits	? */
 /* #define cpu_has_64bit_zero_reg ? */
-- 
2.23.0


* [PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (16 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 16/37] MIPS: bitops: Use generic builtin ffs/fls; drop cpu_has_clo_clz Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 19/37] MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs Paul Burton
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.

That is, rather than the following:

  li   t0, -1
  ll   t1, 0(t2)
  ins  t1, t0, 4, 1
  sc   t1, 0(t2)

We can have the simpler:

  ll   t1, 0(t2)
  ori  t1, t1, 0x10
  sc   t1, 0(t2)

The or path already allows immediates to be used, so simply restricting
the ins path to bits whose masks don't fit in a 16-bit immediate (ie.
bit 16 or higher) is sufficient to take advantage of this.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index d3f3f37ca0b1..3ea4f172ac08 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -77,7 +77,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	}
 
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	if (__builtin_constant_p(bit)) {
+	if (__builtin_constant_p(bit) && (bit >= 16)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
-- 
2.23.0


* [PATCH 19/37] MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (17 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 18/37] MIPS: bitops: Only use ins for bit 16 or higher Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 20/37] MIPS: bitops: ins start position is always an immediate Paul Burton
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Rather than #ifdef on CONFIG_CPU_* to determine whether the ins
instruction is supported, we can simply check MIPS_ISA_REV to discover
whether we're targeting MIPSr2 or higher. Do so in order to clean up the
code.
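
Since MIPS_ISA_REV is a compile-time constant the dead branch simply
folds away, e.g. (from set_bit() below):

  if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
          /* ins-based path; only emitted when targeting MIPSr2 or higher */
  }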

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 3ea4f172ac08..b8785bdf3507 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -19,6 +19,7 @@
 #include <asm/byteorder.h>		/* sigh ... */
 #include <asm/compiler.h>
 #include <asm/cpu-features.h>
+#include <asm/isa-rev.h>
 #include <asm/llsc.h>
 #include <asm/sgidefs.h>
 #include <asm/war.h>
@@ -76,8 +77,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 		return;
 	}
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	if (__builtin_constant_p(bit) && (bit >= 16)) {
+	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -90,7 +90,6 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 		} while (unlikely(!temp));
 		return;
 	}
-#endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 
 	loongson_llsc_mb();
 	do {
@@ -143,8 +142,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 		return;
 	}
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	if (__builtin_constant_p(bit)) {
+	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -157,7 +155,6 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 		} while (unlikely(!temp));
 		return;
 	}
-#endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 
 	loongson_llsc_mb();
 	do {
@@ -377,8 +374,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
-	} else if (__builtin_constant_p(nr)) {
+	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
 		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
@@ -390,7 +386,6 @@ static inline int test_and_clear_bit(unsigned long nr,
 			: "ir" (bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
-#endif
 	} else {
 		loongson_llsc_mb();
 		do {
-- 
2.23.0


* [PATCH 20/37] MIPS: bitops: ins start position is always an immediate
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (18 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 19/37] MIPS: bitops: Use MIPS_ISA_REV, not #ifdefs Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 21/37] MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant Paul Burton
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The start position for an ins instruction is always encoded as an
immediate, so allowing registers to be used by the inline asm makes no
sense. It should never happen anyway since a bit index should always be
small enough to be treated as an immediate, but remove the nonsensical
"r" for sanity.

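A minimal illustration of the constraint (a hypothetical snippet, not
the kernel code): with "i" the operand must be a compile-time constant
encoded directly into the instruction, matching how ins encodes its
start position:

  /* clear bit 4 of word; the start position 4 is an "i" operand */
  __asm__("ins %0, $0, %1, 1" : "+r" (word) : "i" (4));
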
Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index b8785bdf3507..83fd1f1c3ab4 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -85,7 +85,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 			"	" __INS "%0, %3, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit), "r" (~0)
+			: "i" (bit), "r" (~0)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 		return;
@@ -150,7 +150,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 			"	" __INS "%0, $0, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "ir" (bit)
+			: "i" (bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 		return;
@@ -383,7 +383,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 			"	" __INS "%0, $0, %3, 1			\n"
 			"	" __SC	"%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (bit)
+			: "i" (bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!temp));
 	} else {
-- 
2.23.0


* [PATCH 21/37] MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (19 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 20/37] MIPS: bitops: ins start position is always an immediate Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 22/37] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit Paul Burton
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The only difference between test_and_set_bit() & test_and_set_bit_lock()
is their memory ordering semantics - the former provides a full barrier
whilst the latter only provides acquire semantics.

We can therefore implement test_and_set_bit() in terms of
test_and_set_bit_lock() with the addition of the extra memory barrier.
Do this in order to avoid duplicating logic.
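
The full-barrier variant then becomes a thin wrapper (as in the diff
below):

  static inline int test_and_set_bit(unsigned long nr,
          volatile unsigned long *addr)
  {
          smp_mb__before_llsc();  /* supply the leading barrier _lock omits */
          return test_and_set_bit_lock(nr, addr);
  }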

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 66 +++++++---------------------------
 arch/mips/lib/bitops.c         | 26 --------------
 2 files changed, 13 insertions(+), 79 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 83fd1f1c3ab4..34d6fe3f18d0 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,8 +31,6 @@
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_clear_bit(unsigned long nr, volatile unsigned long *addr);
 void __mips_change_bit(unsigned long nr, volatile unsigned long *addr);
-int __mips_test_and_set_bit(unsigned long nr,
-			    volatile unsigned long *addr);
 int __mips_test_and_set_bit_lock(unsigned long nr,
 				 volatile unsigned long *addr);
 int __mips_test_and_clear_bit(unsigned long nr,
@@ -236,24 +234,22 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 }
 
 /*
- * test_and_set_bit - Set a bit and return its old value
+ * test_and_set_bit_lock - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
+ * This operation is atomic and implies acquire ordering semantics
+ * after the memory operation.
  */
-static inline int test_and_set_bit(unsigned long nr,
+static inline int test_and_set_bit_lock(unsigned long nr,
 	volatile unsigned long *addr)
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
 	unsigned long res, temp;
 
-	smp_mb__before_llsc();
-
 	if (!kernel_uses_llsc) {
-		res = __mips_test_and_set_bit(nr, addr);
+		res = __mips_test_and_set_bit_lock(nr, addr);
 	} else if (R10000_LLSC_WAR) {
 		__asm__ __volatile__(
 		"	.set	push					\n"
@@ -264,7 +260,7 @@ static inline int test_and_set_bit(unsigned long nr,
 		"	beqzl	%2, 1b					\n"
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
+		: "=&r" (temp), "+m" (*m), "=&r" (res)
 		: "r" (1UL << bit)
 		: __LLSC_CLOBBER);
 	} else {
@@ -291,56 +287,20 @@ static inline int test_and_set_bit(unsigned long nr,
 }
 
 /*
- * test_and_set_bit_lock - Set a bit and return its old value
+ * test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
- * This operation is atomic and implies acquire ordering semantics
- * after the memory operation.
+ * This operation is atomic and cannot be reordered.
+ * It also implies a memory barrier.
  */
-static inline int test_and_set_bit_lock(unsigned long nr,
+static inline int test_and_set_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
-	unsigned long res, temp;
-
-	if (!kernel_uses_llsc) {
-		res = __mips_test_and_set_bit_lock(nr, addr);
-	} else if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL "%0, %1		# test_and_set_bit	\n"
-		"	or	%2, %0, %3				\n"
-		"	" __SC	"%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
-		"	and	%2, %0, %3				\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "+m" (*m), "=&r" (res)
-		: "r" (1UL << bit)
-		: __LLSC_CLOBBER);
-	} else {
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL "%0, %1	# test_and_set_bit	\n"
-			"	or	%2, %0, %3			\n"
-			"	" __SC	"%2, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "r" (1UL << bit)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!res));
-
-		res = temp & (1UL << bit);
-	}
-
-	smp_llsc_mb();
-
-	return res != 0;
+	smp_mb__before_llsc();
+	return test_and_set_bit_lock(nr, addr);
 }
+
 /*
  * test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to clear
diff --git a/arch/mips/lib/bitops.c b/arch/mips/lib/bitops.c
index 3b2a1e78a543..fba402c0879d 100644
--- a/arch/mips/lib/bitops.c
+++ b/arch/mips/lib/bitops.c
@@ -77,32 +77,6 @@ void __mips_change_bit(unsigned long nr, volatile unsigned long *addr)
 EXPORT_SYMBOL(__mips_change_bit);
 
 
-/**
- * __mips_test_and_set_bit - Set a bit and return its old value.  This is
- * called by test_and_set_bit() if it cannot find a faster solution.
- * @nr: Bit to set
- * @addr: Address to count from
- */
-int __mips_test_and_set_bit(unsigned long nr,
-			    volatile unsigned long *addr)
-{
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
-	unsigned long mask;
-	unsigned long flags;
-	int res;
-
-	a += nr >> SZLONG_LOG;
-	mask = 1UL << bit;
-	raw_local_irq_save(flags);
-	res = (mask & *a) != 0;
-	*a |= mask;
-	raw_local_irq_restore(flags);
-	return res;
-}
-EXPORT_SYMBOL(__mips_test_and_set_bit);
-
-
 /**
  * __mips_test_and_set_bit_lock - Set a bit and return its old value.  This is
  * called by test_and_set_bit_lock() if it cannot find a faster solution.
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 22/37] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (20 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 21/37] MIPS: bitops: Implement test_and_set_bit() in terms of _lock variant Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 23/37] MIPS: bitops: Use the BIT() macro Paul Burton
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The logical operations or & xor used in the test_and_set_bit_lock(),
test_and_clear_bit() & test_and_change_bit() functions currently force
the value 1<<bit to be placed in a register. If the bit is a compile-time
constant & fits within the immediate field of an or/xor instruction (ie.
16 bits) then we can make use of the ori/xori instruction variants &
avoid the use of an extra register. Add the extra "i" constraints in
order to allow use of these immediate encodings.
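
As a hedged illustration - not part of this patch; the helper name is
invented & a pre-R6 MIPS toolchain is assumed - with "ir" the compiler
may hand a constant mask straight to the assembler, which then picks the
immediate encoding itself:

  /* Sketch only: "ir" permits ori/xori for small constant masks. */
  static inline void atomic_or_mask(volatile unsigned long *word,
                                    unsigned long mask)
  {
          unsigned long temp;

          __asm__ __volatile__(
          "1:     ll      %0, %1          \n"
          "       or      %0, %0, %2      \n" /* assembles to ori for a 16-bit constant */
          "       sc      %0, %1          \n"
          "       beqz    %0, 1b          \n"
          : "=&r" (temp), "+m" (*word)
          : "ir" (mask)
          : "memory");
  }

atomic_or_mask(w, 1UL << 3) can use the ori form; a non-constant mask
still matches the "r" alternative, so nothing is lost.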

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 34d6fe3f18d0..0b0ce0adce8f 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -261,7 +261,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
-		: "r" (1UL << bit)
+		: "ir" (1UL << bit)
 		: __LLSC_CLOBBER);
 	} else {
 		loongson_llsc_mb();
@@ -274,7 +274,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 			"	" __SC	"%2, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "r" (1UL << bit)
+			: "ir" (1UL << bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
@@ -332,7 +332,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "r" (1UL << bit)
+		: "ir" (1UL << bit)
 		: __LLSC_CLOBBER);
 	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
 		loongson_llsc_mb();
@@ -358,7 +358,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 			"	" __SC	"%2, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "r" (1UL << bit)
+			: "ir" (1UL << bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
@@ -400,7 +400,7 @@ static inline int test_and_change_bit(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "r" (1UL << bit)
+		: "ir" (1UL << bit)
 		: __LLSC_CLOBBER);
 	} else {
 		loongson_llsc_mb();
@@ -413,7 +413,7 @@ static inline int test_and_change_bit(unsigned long nr,
 			"	" __SC	"\t%2, %1			\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "r" (1UL << bit)
+			: "ir" (1UL << bit)
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 23/37] MIPS: bitops: Use the BIT() macro
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (21 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 22/37] MIPS: bitops: Allow immediates in test_and_{set,clear,change}_bit Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 24/37] MIPS: bitops: Avoid redundant zero-comparison for non-LLSC Paul Burton
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Use the BIT() macro in asm/bitops.h rather than open-coding its
equivalent.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 0b0ce0adce8f..35582afc057b 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -13,6 +13,7 @@
 #error only <linux/bitops.h> can be included directly
 #endif
 
+#include <linux/bits.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
 #include <asm/barrier.h>
@@ -70,7 +71,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m)
+		: "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
 		: __LLSC_CLOBBER);
 		return;
 	}
@@ -99,7 +100,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 		"	" __SC	"%0, %1				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 	} while (unlikely(!temp));
 }
@@ -135,7 +136,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 		"	beqzl	%0, 1b					\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(1UL << bit))
+		: "ir" (~(BIT(bit)))
 		: __LLSC_CLOBBER);
 		return;
 	}
@@ -164,7 +165,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 		"	" __SC "%0, %1				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(1UL << bit))
+		: "ir" (~(BIT(bit)))
 		: __LLSC_CLOBBER);
 	} while (unlikely(!temp));
 }
@@ -213,7 +214,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 		"	beqzl	%0, 1b				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 		return;
 	}
@@ -228,7 +229,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 		"	" __SC	"%0, %1				\n"
 		"	.set	pop				\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 	} while (unlikely(!temp));
 }
@@ -261,7 +262,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 	} else {
 		loongson_llsc_mb();
@@ -274,11 +275,11 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 			"	" __SC	"%2, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (1UL << bit)
+			: "ir" (BIT(bit))
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & (1UL << bit);
+		res = temp & BIT(bit);
 	}
 
 	smp_llsc_mb();
@@ -332,7 +333,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
 		loongson_llsc_mb();
@@ -358,11 +359,11 @@ static inline int test_and_clear_bit(unsigned long nr,
 			"	" __SC	"%2, %1				\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (1UL << bit)
+			: "ir" (BIT(bit))
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & (1UL << bit);
+		res = temp & BIT(bit);
 	}
 
 	smp_llsc_mb();
@@ -400,7 +401,7 @@ static inline int test_and_change_bit(unsigned long nr,
 		"	and	%2, %0, %3				\n"
 		"	.set	pop					\n"
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "ir" (1UL << bit)
+		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
 	} else {
 		loongson_llsc_mb();
@@ -413,11 +414,11 @@ static inline int test_and_change_bit(unsigned long nr,
 			"	" __SC	"\t%2, %1			\n"
 			"	.set	pop				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (1UL << bit)
+			: "ir" (BIT(bit))
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & (1UL << bit);
+		res = temp & BIT(bit);
 	}
 
 	smp_llsc_mb();
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 24/37] MIPS: bitops: Avoid redundant zero-comparison for non-LLSC
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (22 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 23/37] MIPS: bitops: Use the BIT() macro Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 25/37] MIPS: bitops: Abstract LL/SC loops Paul Burton
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The IRQ-disabling non-LLSC fallbacks for bitops on UP systems already
return a zero or one, so there's no need to perform another comparison
against zero. Move these comparisons into the LLSC paths to avoid the
redundant work.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 35582afc057b..3e5589320e83 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -264,6 +264,8 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		: "=&r" (temp), "+m" (*m), "=&r" (res)
 		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
+
+		res = res != 0;
 	} else {
 		loongson_llsc_mb();
 		do {
@@ -279,12 +281,12 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & BIT(bit);
+		res = (temp & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
 
-	return res != 0;
+	return res;
 }
 
 /*
@@ -335,6 +337,8 @@ static inline int test_and_clear_bit(unsigned long nr,
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
+
+		res = res != 0;
 	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
 		loongson_llsc_mb();
 		do {
@@ -363,12 +367,12 @@ static inline int test_and_clear_bit(unsigned long nr,
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & BIT(bit);
+		res = (temp & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
 
-	return res != 0;
+	return res;
 }
 
 /*
@@ -403,6 +407,8 @@ static inline int test_and_change_bit(unsigned long nr,
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
 		: "ir" (BIT(bit))
 		: __LLSC_CLOBBER);
+
+		res = res != 0;
 	} else {
 		loongson_llsc_mb();
 		do {
@@ -418,12 +424,12 @@ static inline int test_and_change_bit(unsigned long nr,
 			: __LLSC_CLOBBER);
 		} while (unlikely(!res));
 
-		res = temp & BIT(bit);
+		res = (temp & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
 
-	return res != 0;
+	return res;
 }
 
 #include <asm-generic/bitops/non-atomic.h>
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 25/37] MIPS: bitops: Abstract LL/SC loops
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (23 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 24/37] MIPS: bitops: Avoid redundant zero-comparison for non-LLSC Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 26/37] MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG Paul Burton
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Introduce __bit_op() & __test_bit_op() macros which abstract away the
implementation of LL/SC loops. This cuts down on a lot of duplicate
boilerplate code, and also allows R10000_LLSC_WAR to be handled outside
of the individual bitop functions.
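
To make the abstraction concrete, here is a hedged sketch - not the
macro's literal expansion, & the ISA directive & function name are
illustrative - of roughly what __bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)))
boils down to on a pre-R6 64-bit build:

  /* Illustrative expansion only - not generated output. */
  static inline void set_bit_expanded(volatile unsigned long *m, unsigned int bit)
  {
          unsigned long temp;

          asm volatile(
          "       .set    push                    \n"
          "       .set    mips64r2                \n" /* stands in for MIPS_ISA_LEVEL */
          "1:     lld     %0, %1                  \n" /* __LL on a 64-bit kernel      */
          "       or      %0, %2                  \n" /* the caller-supplied insn     */
          "       scd     %0, %1                  \n" /* __SC                         */
          "       beqz    %0, 1b                  \n" /* __SC_BEQZ prior to MIPSr6    */
          "       .set    pop                     \n"
          : "=&r" (temp), "+m" (*m)
          : "ir" (1UL << bit)
          : "memory");
  }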

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 267 ++++++++-------------------------
 1 file changed, 63 insertions(+), 204 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 3e5589320e83..5701f8b41e87 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -25,6 +25,41 @@
 #include <asm/sgidefs.h>
 #include <asm/war.h>
 
+#define __bit_op(mem, insn, inputs...) do {			\
+	unsigned long temp;					\
+								\
+	asm volatile(						\
+	"	.set		push			\n"	\
+	"	.set		" MIPS_ISA_LEVEL "	\n"	\
+	"1:	" __LL		"%0, %1			\n"	\
+	"	" insn		"			\n"	\
+	"	" __SC		"%0, %1			\n"	\
+	"	" __SC_BEQZ	"%0, 1b			\n"	\
+	"	.set		pop			\n"	\
+	: "=&r"(temp), "+" GCC_OFF_SMALL_ASM()(mem)		\
+	: inputs						\
+	: __LLSC_CLOBBER);					\
+} while (0)
+
+#define __test_bit_op(mem, ll_dst, insn, inputs...) ({		\
+	unsigned long orig, temp;				\
+								\
+	asm volatile(						\
+	"	.set		push			\n"	\
+	"	.set		" MIPS_ISA_LEVEL "	\n"	\
+	"1:	" __LL		ll_dst ", %2		\n"	\
+	"	" insn		"			\n"	\
+	"	" __SC		"%1, %2			\n"	\
+	"	" __SC_BEQZ	"%1, 1b			\n"	\
+	"	.set		pop			\n"	\
+	: "=&r"(orig), "=&r"(temp),				\
+	  "+" GCC_OFF_SMALL_ASM()(mem)				\
+	: inputs						\
+	: __LLSC_CLOBBER);					\
+								\
+	orig;							\
+})
+
 /*
  * These are the "slower" versions of the functions and are in bitops.c.
  * These functions call raw_local_irq_{save,restore}().
@@ -54,55 +89,20 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long temp;
 
 	if (!kernel_uses_llsc) {
 		__mips_set_bit(nr, addr);
 		return;
 	}
 
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL "%0, %1			# set_bit	\n"
-		"	or	%0, %2					\n"
-		"	" __SC	"%0, %1					\n"
-		"	beqzl	%0, 1b					\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (BIT(bit)), GCC_OFF_SMALL_ASM() (*m)
-		: __LLSC_CLOBBER);
-		return;
-	}
-
 	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	" __LL "%0, %1		# set_bit	\n"
-			"	" __INS "%0, %3, %2, 1			\n"
-			"	" __SC "%0, %1				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "i" (bit), "r" (~0)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
+		__bit_op(*m, __INS "%0, %3, %2, 1", "i"(bit), "r"(~0));
 		return;
 	}
 
 	loongson_llsc_mb();
-	do {
-		__asm__ __volatile__(
-		"	.set	push				\n"
-		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-		"	" __LL "%0, %1		# set_bit	\n"
-		"	or	%0, %2				\n"
-		"	" __SC	"%0, %1				\n"
-		"	.set	pop				\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-	} while (unlikely(!temp));
+	__bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)));
 }
 
 /*
@@ -119,55 +119,20 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long temp;
 
 	if (!kernel_uses_llsc) {
 		__mips_clear_bit(nr, addr);
 		return;
 	}
 
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL "%0, %1			# clear_bit	\n"
-		"	and	%0, %2					\n"
-		"	" __SC "%0, %1					\n"
-		"	beqzl	%0, 1b					\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(BIT(bit)))
-		: __LLSC_CLOBBER);
-		return;
-	}
-
 	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	" __LL "%0, %1		# clear_bit	\n"
-			"	" __INS "%0, $0, %2, 1			\n"
-			"	" __SC "%0, %1				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-			: "i" (bit)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
+		__bit_op(*m, __INS "%0, $0, %2, 1", "i"(bit));
 		return;
 	}
 
 	loongson_llsc_mb();
-	do {
-		__asm__ __volatile__(
-		"	.set	push				\n"
-		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-		"	" __LL "%0, %1		# clear_bit	\n"
-		"	and	%0, %2				\n"
-		"	" __SC "%0, %1				\n"
-		"	.set	pop				\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (~(BIT(bit)))
-		: __LLSC_CLOBBER);
-	} while (unlikely(!temp));
+	__bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));
 }
 
 /*
@@ -197,41 +162,14 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long temp;
 
 	if (!kernel_uses_llsc) {
 		__mips_change_bit(nr, addr);
 		return;
 	}
 
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push				\n"
-		"	.set	arch=r4000			\n"
-		"1:	" __LL "%0, %1		# change_bit	\n"
-		"	xor	%0, %2				\n"
-		"	" __SC	"%0, %1				\n"
-		"	beqzl	%0, 1b				\n"
-		"	.set	pop				\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-		return;
-	}
-
 	loongson_llsc_mb();
-	do {
-		__asm__ __volatile__(
-		"	.set	push				\n"
-		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-		"	" __LL "%0, %1		# change_bit	\n"
-		"	xor	%0, %2				\n"
-		"	" __SC	"%0, %1				\n"
-		"	.set	pop				\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-	} while (unlikely(!temp));
+	__bit_op(*m, "xor\t%0, %2", "ir"(BIT(bit)));
 }
 
 /*
@@ -247,41 +185,16 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res, temp;
+	unsigned long res, orig;
 
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_set_bit_lock(nr, addr);
-	} else if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL "%0, %1		# test_and_set_bit	\n"
-		"	or	%2, %0, %3				\n"
-		"	" __SC	"%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
-		"	and	%2, %0, %3				\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "+m" (*m), "=&r" (res)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-
-		res = res != 0;
 	} else {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL "%0, %1	# test_and_set_bit	\n"
-			"	or	%2, %0, %3			\n"
-			"	" __SC	"%2, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (BIT(bit))
-			: __LLSC_CLOBBER);
-		} while (unlikely(!res));
-
-		res = (temp & BIT(bit)) != 0;
+		orig = __test_bit_op(*m, "%0",
+				     "or\t%1, %0, %3",
+				     "ir"(BIT(bit)));
+		res = (orig & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
@@ -317,57 +230,25 @@ static inline int test_and_clear_bit(unsigned long nr,
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res, temp;
+	unsigned long res, orig;
 
 	smp_mb__before_llsc();
 
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_clear_bit(nr, addr);
-	} else if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL	"%0, %1		# test_and_clear_bit	\n"
-		"	or	%2, %0, %3				\n"
-		"	xor	%2, %3					\n"
-		"	" __SC	"%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
-		"	and	%2, %0, %3				\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-
-		res = res != 0;
 	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
-			"	" __EXT "%2, %0, %3, 1			\n"
-			"	" __INS "%0, $0, %3, 1			\n"
-			"	" __SC	"%0, %1				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "i" (bit)
-			: __LLSC_CLOBBER);
-		} while (unlikely(!temp));
+		res = __test_bit_op(*m, "%1",
+				    __EXT "%0, %1, %3, 1;"
+				    __INS "%1, $0, %3, 1",
+				    "i"(bit));
 	} else {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
-			"	or	%2, %0, %3			\n"
-			"	xor	%2, %3				\n"
-			"	" __SC	"%2, %1				\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (BIT(bit))
-			: __LLSC_CLOBBER);
-		} while (unlikely(!res));
-
-		res = (temp & BIT(bit)) != 0;
+		orig = __test_bit_op(*m, "%0",
+				     "or\t%1, %0, %3;"
+				     "xor\t%1, %1, %3",
+				     "ir"(BIT(bit)));
+		res = (orig & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
@@ -388,43 +269,18 @@ static inline int test_and_change_bit(unsigned long nr,
 {
 	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
 	int bit = nr & SZLONG_MASK;
-	unsigned long res, temp;
+	unsigned long res, orig;
 
 	smp_mb__before_llsc();
 
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_change_bit(nr, addr);
-	} else if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	push					\n"
-		"	.set	arch=r4000				\n"
-		"1:	" __LL	"%0, %1		# test_and_change_bit	\n"
-		"	xor	%2, %0, %3				\n"
-		"	" __SC	"%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
-		"	and	%2, %0, %3				\n"
-		"	.set	pop					\n"
-		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-		: "ir" (BIT(bit))
-		: __LLSC_CLOBBER);
-
-		res = res != 0;
 	} else {
 		loongson_llsc_mb();
-		do {
-			__asm__ __volatile__(
-			"	.set	push				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	" __LL	"%0, %1 # test_and_change_bit	\n"
-			"	xor	%2, %0, %3			\n"
-			"	" __SC	"\t%2, %1			\n"
-			"	.set	pop				\n"
-			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m), "=&r" (res)
-			: "ir" (BIT(bit))
-			: __LLSC_CLOBBER);
-		} while (unlikely(!res));
-
-		res = (temp & BIT(bit)) != 0;
+		orig = __test_bit_op(*m, "%0",
+				     "xor\t%1, %0, %3",
+				     "ir"(BIT(bit)));
+		res = (orig & BIT(bit)) != 0;
 	}
 
 	smp_llsc_mb();
@@ -432,6 +288,9 @@ static inline int test_and_change_bit(unsigned long nr,
 	return res;
 }
 
+#undef __bit_op
+#undef __test_bit_op
+
 #include <asm-generic/bitops/non-atomic.h>
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 26/37] MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (24 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 25/37] MIPS: bitops: Abstract LL/SC loops Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 27/37] MIPS: bitops: Emit Loongson3 sync workarounds within asm Paul Burton
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Rather than using custom SZLONG_LOG & SZLONG_MASK macros to shift & mask
a bit index to form word & bit offsets respectively, make use of the
standard BIT_WORD() & BITS_PER_LONG macros for the same purpose.

volatile is added to the definition of pointers to the long-sized word
we'll operate on, in order to prevent the compiler complaining that we
cast away the volatile qualifier of the addr argument. This should have
no effect on generated code: in the LL/SC case the access is inline asm
anyway, & in the non-LLSC case it is constrained by the compiler barriers
provided by raw_local_irq_{save,restore}().
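
As a quick sanity check of the equivalence, a throwaway user-space
program (not kernel code; the local macro definitions simply mirror the
kernel's) shows the old & new arithmetic agree:

  #include <assert.h>
  #include <limits.h>
  #include <stdio.h>

  #define BITS_PER_LONG   (CHAR_BIT * sizeof(long))
  #define BIT_WORD(nr)    ((nr) / BITS_PER_LONG)

  int main(void)
  {
          const unsigned long szlong_log = (BITS_PER_LONG == 64) ? 6 : 5;
          const unsigned long szlong_mask = BITS_PER_LONG - 1;

          for (unsigned long nr = 0; nr < 4 * BITS_PER_LONG; nr++) {
                  /* word index & bit offset match the old shift/mask */
                  assert(BIT_WORD(nr) == nr >> szlong_log);
                  assert(nr % BITS_PER_LONG == (nr & szlong_mask));
          }

          printf("BIT_WORD()/BITS_PER_LONG match SZLONG_LOG/SZLONG_MASK\n");
          return 0;
  }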

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 24 ++++++++++++------------
 arch/mips/include/asm/llsc.h   |  4 ----
 arch/mips/lib/bitops.c         | 31 +++++++++++++------------------
 3 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 5701f8b41e87..59fe1d5d4fc9 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -87,8 +87,8 @@ int __mips_test_and_change_bit(unsigned long nr,
  */
 static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 
 	if (!kernel_uses_llsc) {
 		__mips_set_bit(nr, addr);
@@ -117,8 +117,8 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
  */
 static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 
 	if (!kernel_uses_llsc) {
 		__mips_clear_bit(nr, addr);
@@ -160,8 +160,8 @@ static inline void clear_bit_unlock(unsigned long nr, volatile unsigned long *ad
  */
 static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 
 	if (!kernel_uses_llsc) {
 		__mips_change_bit(nr, addr);
@@ -183,8 +183,8 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 static inline int test_and_set_bit_lock(unsigned long nr,
 	volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 	unsigned long res, orig;
 
 	if (!kernel_uses_llsc) {
@@ -228,8 +228,8 @@ static inline int test_and_set_bit(unsigned long nr,
 static inline int test_and_clear_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 	unsigned long res, orig;
 
 	smp_mb__before_llsc();
@@ -267,8 +267,8 @@ static inline int test_and_clear_bit(unsigned long nr,
 static inline int test_and_change_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
-	unsigned long *m = ((unsigned long *)addr) + (nr >> SZLONG_LOG);
-	int bit = nr & SZLONG_MASK;
+	volatile unsigned long *m = &addr[BIT_WORD(nr)];
+	int bit = nr % BITS_PER_LONG;
 	unsigned long res, orig;
 
 	smp_mb__before_llsc();
diff --git a/arch/mips/include/asm/llsc.h b/arch/mips/include/asm/llsc.h
index d240a4a2d1c4..c49738bc3bda 100644
--- a/arch/mips/include/asm/llsc.h
+++ b/arch/mips/include/asm/llsc.h
@@ -12,15 +12,11 @@
 #include <asm/isa-rev.h>
 
 #if _MIPS_SZLONG == 32
-#define SZLONG_LOG 5
-#define SZLONG_MASK 31UL
 #define __LL		"ll	"
 #define __SC		"sc	"
 #define __INS		"ins	"
 #define __EXT		"ext	"
 #elif _MIPS_SZLONG == 64
-#define SZLONG_LOG 6
-#define SZLONG_MASK 63UL
 #define __LL		"lld	"
 #define __SC		"scd	"
 #define __INS		"dins	"
diff --git a/arch/mips/lib/bitops.c b/arch/mips/lib/bitops.c
index fba402c0879d..116d0bd8b2ae 100644
--- a/arch/mips/lib/bitops.c
+++ b/arch/mips/lib/bitops.c
@@ -7,6 +7,7 @@
  * Copyright (c) 1999, 2000  Silicon Graphics, Inc.
  */
 #include <linux/bitops.h>
+#include <linux/bits.h>
 #include <linux/irqflags.h>
 #include <linux/export.h>
 
@@ -19,12 +20,11 @@
  */
 void __mips_set_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	*a |= mask;
@@ -41,12 +41,11 @@ EXPORT_SYMBOL(__mips_set_bit);
  */
 void __mips_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	*a &= ~mask;
@@ -63,12 +62,11 @@ EXPORT_SYMBOL(__mips_clear_bit);
  */
 void __mips_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	*a ^= mask;
@@ -86,13 +84,12 @@ EXPORT_SYMBOL(__mips_change_bit);
 int __mips_test_and_set_bit_lock(unsigned long nr,
 				 volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 	int res;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	res = (mask & *a) != 0;
@@ -111,13 +108,12 @@ EXPORT_SYMBOL(__mips_test_and_set_bit_lock);
  */
 int __mips_test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 	int res;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	res = (mask & *a) != 0;
@@ -136,13 +132,12 @@ EXPORT_SYMBOL(__mips_test_and_clear_bit);
  */
 int __mips_test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 {
-	unsigned long *a = (unsigned long *)addr;
-	unsigned bit = nr & SZLONG_MASK;
+	volatile unsigned long *a = &addr[BIT_WORD(nr)];
+	unsigned int bit = nr % BITS_PER_LONG;
 	unsigned long mask;
 	unsigned long flags;
 	int res;
 
-	a += nr >> SZLONG_LOG;
 	mask = 1UL << bit;
 	raw_local_irq_save(flags);
 	res = (mask & *a) != 0;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 27/37] MIPS: bitops: Emit Loongson3 sync workarounds within asm
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (25 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 26/37] MIPS: bitops: Use BIT_WORD() & BITS_PER_LONG Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 28/37] MIPS: bitops: Use smp_mb__before_atomic in test_* ops Paul Burton
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.
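
A hedged before/after sketch of the hazard being closed - function names
invented, pre-R6 32-bit MIPS assumed, & a plain sync standing in for
__SYNC(full, loongson3_war). In the first form nothing stops the compiler
from scheduling an unrelated load or store between the two asm
statements, i.e. between the sync & the ll; in the second the whole
sequence is a single asm block, so no such access can intrude:

  static inline void set_mask_before(volatile unsigned long *m,
                                     unsigned long mask)
  {
          unsigned long temp;

          asm volatile("sync" ::: "memory");      /* what loongson_llsc_mb() emitted */
          /* <-- the compiler may legally emit memory accesses here */
          asm volatile(
          "1:     ll      %0, %1          \n"
          "       or      %0, %2          \n"
          "       sc      %0, %1          \n"
          "       beqz    %0, 1b          \n"
          : "=&r" (temp), "+m" (*m)
          : "ir" (mask)
          : "memory");
  }

  static inline void set_mask_after(volatile unsigned long *m,
                                    unsigned long mask)
  {
          unsigned long temp;

          asm volatile(
          "       sync                    \n"     /* now inside the same block */
          "1:     ll      %0, %1          \n"
          "       or      %0, %2          \n"
          "       sc      %0, %1          \n"
          "       beqz    %0, 1b          \n"
          : "=&r" (temp), "+m" (*m)
          : "ir" (mask)
          : "memory");
  }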

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 59fe1d5d4fc9..9e967d6622c8 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -31,6 +31,7 @@
 	asm volatile(						\
 	"	.set		push			\n"	\
 	"	.set		" MIPS_ISA_LEVEL "	\n"	\
+	"	" __SYNC(full, loongson3_war) "		\n"	\
 	"1:	" __LL		"%0, %1			\n"	\
 	"	" insn		"			\n"	\
 	"	" __SC		"%0, %1			\n"	\
@@ -47,6 +48,7 @@
 	asm volatile(						\
 	"	.set		push			\n"	\
 	"	.set		" MIPS_ISA_LEVEL "	\n"	\
+	"	" __SYNC(full, loongson3_war) "		\n"	\
 	"1:	" __LL		ll_dst ", %2		\n"	\
 	"	" insn		"			\n"	\
 	"	" __SC		"%1, %2			\n"	\
@@ -96,12 +98,10 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	}
 
 	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit) && (bit >= 16)) {
-		loongson_llsc_mb();
 		__bit_op(*m, __INS "%0, %3, %2, 1", "i"(bit), "r"(~0));
 		return;
 	}
 
-	loongson_llsc_mb();
 	__bit_op(*m, "or\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -126,12 +126,10 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	}
 
 	if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(bit)) {
-		loongson_llsc_mb();
 		__bit_op(*m, __INS "%0, $0, %2, 1", "i"(bit));
 		return;
 	}
 
-	loongson_llsc_mb();
 	__bit_op(*m, "and\t%0, %2", "ir"(~BIT(bit)));
 }
 
@@ -168,7 +166,6 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 		return;
 	}
 
-	loongson_llsc_mb();
 	__bit_op(*m, "xor\t%0, %2", "ir"(BIT(bit)));
 }
 
@@ -190,7 +187,6 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_set_bit_lock(nr, addr);
 	} else {
-		loongson_llsc_mb();
 		orig = __test_bit_op(*m, "%0",
 				     "or\t%1, %0, %3",
 				     "ir"(BIT(bit)));
@@ -237,13 +233,11 @@ static inline int test_and_clear_bit(unsigned long nr,
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_clear_bit(nr, addr);
 	} else if ((MIPS_ISA_REV >= 2) && __builtin_constant_p(nr)) {
-		loongson_llsc_mb();
 		res = __test_bit_op(*m, "%1",
 				    __EXT "%0, %1, %3, 1;"
 				    __INS "%1, $0, %3, 1",
 				    "i"(bit));
 	} else {
-		loongson_llsc_mb();
 		orig = __test_bit_op(*m, "%0",
 				     "or\t%1, %0, %3;"
 				     "xor\t%1, %1, %3",
@@ -276,7 +270,6 @@ static inline int test_and_change_bit(unsigned long nr,
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_change_bit(nr, addr);
 	} else {
-		loongson_llsc_mb();
 		orig = __test_bit_op(*m, "%0",
 				     "xor\t%1, %0, %3",
 				     "ir"(BIT(bit)));
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 28/37] MIPS: bitops: Use smp_mb__before_atomic in test_* ops
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (26 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 27/37] MIPS: bitops: Emit Loongson3 sync workarounds within asm Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 29/37] MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm Paul Burton
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Use smp_mb__before_atomic() rather than smp_mb__before_llsc() in
test_and_set_bit(), test_and_clear_bit() & test_and_change_bit(). The
_atomic() versions make semantic sense in these cases, and will allow a
later patch to omit redundant barriers for Loongson3 systems that
already include a barrier within __test_bit_op().

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/bitops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 9e967d6622c8..e6d97238a321 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -209,7 +209,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 static inline int test_and_set_bit(unsigned long nr,
 	volatile unsigned long *addr)
 {
-	smp_mb__before_llsc();
+	smp_mb__before_atomic();
 	return test_and_set_bit_lock(nr, addr);
 }
 
@@ -228,7 +228,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 	int bit = nr % BITS_PER_LONG;
 	unsigned long res, orig;
 
-	smp_mb__before_llsc();
+	smp_mb__before_atomic();
 
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_clear_bit(nr, addr);
@@ -265,7 +265,7 @@ static inline int test_and_change_bit(unsigned long nr,
 	int bit = nr % BITS_PER_LONG;
 	unsigned long res, orig;
 
-	smp_mb__before_llsc();
+	smp_mb__before_atomic();
 
 	if (!kernel_uses_llsc) {
 		res = __mips_test_and_change_bit(nr, addr);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 29/37] MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (27 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 28/37] MIPS: bitops: Use smp_mb__before_atomic in test_* ops Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 30/37] MIPS: cmpxchg: Omit redundant barriers for Loongson3 Paul Burton
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.
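
For a hedged picture of where the barriers land - an illustrative,
simplified 32-bit sketch rather than the kernel macro, with a plain sync
standing in for __SYNC(full, loongson3_war) - one sync precedes the ll,
and because cmpxchg branches out of the LL/SC loop on mismatch, another
sits at that branch target:

  static inline unsigned long cmpxchg_sketch(volatile unsigned long *m,
                                             unsigned long old,
                                             unsigned long new)
  {
          unsigned long ret;

          __asm__ __volatile__(
          "       .set    push                    \n"
          "       .set    noat                    \n"
          "       sync                            \n" /* before the ll                  */
          "1:     ll      %0, %1                  \n"
          "       bne     %0, %z2, 2f             \n" /* exit the loop on mismatch      */
          "       move    $1, %z3                 \n"
          "       sc      $1, %1                  \n"
          "       beqz    $1, 1b                  \n"
          "2:     sync                            \n" /* and again at the branch target */
          "       .set    pop                     \n"
          : "=&r" (ret), "+m" (*m)
          : "Jr" (old), "Jr" (new)
          : "memory");

          return ret;
  }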

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/cmpxchg.h | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 5d3f0e3513b4..fc121d20a980 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -12,6 +12,7 @@
 #include <linux/irqflags.h>
 #include <asm/compiler.h>
 #include <asm/llsc.h>
+#include <asm/sync.h>
 #include <asm/war.h>
 
 /*
@@ -36,12 +37,12 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
-		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	push				\n"	\
 		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
+		"	" __SYNC(full, loongson3_war) "		\n"	\
 		"1:	" ld "	%0, %2		# __xchg_asm	\n"	\
 		"	.set	pop				\n"	\
 		"	move	$1, %z3				\n"	\
@@ -108,12 +109,12 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 	__typeof(*(m)) __ret;						\
 									\
 	if (kernel_uses_llsc) {						\
-		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
+		"	" __SYNC(full, loongson3_war) "		\n"	\
 		"1:	" ld "	%0, %2		# __cmpxchg_asm \n"	\
 		"	bne	%0, %z3, 2f			\n"	\
 		"	.set	pop				\n"	\
@@ -122,11 +123,10 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 		"	" st "	$1, %1				\n"	\
 		"\t" __SC_BEQZ	"$1, 1b				\n"	\
 		"	.set	pop				\n"	\
-		"2:						\n"	\
+		"2:	" __SYNC(full, loongson3_war) "		\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)	\
 		: __LLSC_CLOBBER);					\
-		loongson_llsc_mb();					\
 	} else {							\
 		unsigned long __flags;					\
 									\
@@ -222,11 +222,11 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
 	 */
 	local_irq_save(flags);
 
-	loongson_llsc_mb();
 	asm volatile(
 	"	.set	push				\n"
 	"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"
 	/* Load 64 bits from ptr */
+	"	" __SYNC(full, loongson3_war) "		\n"
 	"1:	lld	%L0, %3		# __cmpxchg64	\n"
 	/*
 	 * Split the 64 bit value we loaded into the 2 registers that hold the
@@ -260,7 +260,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
 	/* If we failed, loop! */
 	"\t" __SC_BEQZ "%L1, 1b				\n"
 	"	.set	pop				\n"
-	"2:						\n"
+	"2:	" __SYNC(full, loongson3_war) "		\n"
 	: "=&r"(ret),
 	  "=&r"(tmp),
 	  "=" GCC_OFF_SMALL_ASM() (*(unsigned long long *)ptr)
@@ -268,7 +268,6 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
 	  "r" (old),
 	  "r" (new)
 	: "memory");
-	loongson_llsc_mb();
 
 	local_irq_restore(flags);
 	return ret;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 30/37] MIPS: cmpxchg: Omit redundant barriers for Loongson3
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (28 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 29/37] MIPS: cmpxchg: Emit Loongson3 sync workarounds within asm Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 31/37] MIPS: futex: Emit Loongson3 sync workarounds within asm Paul Burton
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

When building a kernel configured to support Loongson3 LL/SC workarounds
(ie. CONFIG_CPU_LOONGSON3_WORKAROUNDS=y) the inline assembly in
__xchg_asm() & __cmpxchg_asm() already emits completion barriers, and as
such we don't need to emit extra barriers from the xchg() or cmpxchg()
macros. Add compile-time constant checks causing us to omit the
redundant memory barriers.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/cmpxchg.h | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index fc121d20a980..820df68e32e1 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -94,7 +94,13 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 ({									\
 	__typeof__(*(ptr)) __res;					\
 									\
-	smp_mb__before_llsc();						\
+	/*								\
+	 * In the Loongson3 workaround case __xchg_asm() already	\
+	 * contains a completion barrier prior to the LL, so we don't	\
+	 * need to emit an extra one here.				\
+	 */								\
+	if (!__SYNC_loongson3_war)					\
+		smp_mb__before_llsc();					\
 									\
 	__res = (__typeof__(*(ptr)))					\
 		__xchg((ptr), (unsigned long)(x), sizeof(*(ptr)));	\
@@ -179,9 +185,23 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 ({									\
 	__typeof__(*(ptr)) __res;					\
 									\
-	smp_mb__before_llsc();						\
+	/*								\
+	 * In the Loongson3 workaround case __cmpxchg_asm() already	\
+	 * contains a completion barrier prior to the LL, so we don't	\
+	 * need to emit an extra one here.				\
+	 */								\
+	if (!__SYNC_loongson3_war)					\
+		smp_mb__before_llsc();					\
+									\
 	__res = cmpxchg_local((ptr), (old), (new));			\
-	smp_llsc_mb();							\
+									\
+	/*								\
+	 * In the Loongson3 workaround case __cmpxchg_asm() already	\
+	 * contains a completion barrier after the SC, so we don't	\
+	 * need to emit an extra one here.				\
+	 */								\
+	if (!__SYNC_loongson3_war)					\
+		smp_llsc_mb();						\
 									\
 	__res;								\
 })
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 31/37] MIPS: futex: Emit Loongson3 sync workarounds within asm
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (29 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 30/37] MIPS: cmpxchg: Omit redundant barriers for Loongson3 Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 32/37] MIPS: syscall: " Paul Burton
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/futex.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h
index b83b0397462d..45c3e3652f48 100644
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -16,6 +16,7 @@
 #include <asm/barrier.h>
 #include <asm/compiler.h>
 #include <asm/errno.h>
+#include <asm/sync.h>
 #include <asm/war.h>
 
 #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg)		\
@@ -50,12 +51,12 @@
 		  "i" (-EFAULT)						\
 		: "memory");						\
 	} else if (cpu_has_llsc) {					\
-		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
+		"	" __SYNC(full, loongson3_war) "		\n"	\
 		"1:	"user_ll("%1", "%4")" # __futex_atomic_op\n"	\
 		"	.set	pop				\n"	\
 		"	" insn	"				\n"	\
@@ -164,13 +165,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 		  "i" (-EFAULT)
 		: "memory");
 	} else if (cpu_has_llsc) {
-		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"# futex_atomic_cmpxchg_inatomic			\n"
 		"	.set	push					\n"
 		"	.set	noat					\n"
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
+		"	" __SYNC(full, loongson3_war) "			\n"
 		"1:	"user_ll("%1", "%3")"				\n"
 		"	bne	%1, %z4, 3f				\n"
 		"	.set	pop					\n"
@@ -178,8 +179,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
 		"2:	"user_sc("$1", "%2")"				\n"
 		"	beqz	$1, 1b					\n"
-		__WEAK_LLSC_MB
-		"3:							\n"
+		"3:	" __SYNC_ELSE(full, loongson3_war, __WEAK_LLSC_MB) "\n"
 		"	.insn						\n"
 		"	.set	pop					\n"
 		"	.section .fixup,\"ax\"				\n"
@@ -194,7 +194,6 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 		: GCC_OFF_SMALL_ASM() (*uaddr), "Jr" (oldval), "Jr" (newval),
 		  "i" (-EFAULT)
 		: "memory");
-		loongson_llsc_mb();
 	} else
 		return -ENOSYS;
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 32/37] MIPS: syscall: Emit Loongson3 sync workarounds within asm
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (30 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 31/37] MIPS: futex: Emit Loongson3 sync workarounds within asm Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 33/37] MIPS: barrier: Remove loongson_llsc_mb() Paul Burton
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Generate the sync instructions required to work around Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/kernel/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index b0e25e913bdb..3ea288ca35f1 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -37,6 +37,7 @@
 #include <asm/signal.h>
 #include <asm/sim.h>
 #include <asm/shmparam.h>
+#include <asm/sync.h>
 #include <asm/sysmips.h>
 #include <asm/switch_to.h>
 
@@ -132,12 +133,12 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new)
 		  [efault] "i" (-EFAULT)
 		: "memory");
 	} else if (cpu_has_llsc) {
-		loongson_llsc_mb();
 		__asm__ __volatile__ (
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
 		"	li	%[err], 0				\n"
 		"1:							\n"
+		"	" __SYNC(full, loongson3_war) "			\n"
 		user_ll("%[old]", "(%[addr])")
 		"	move	%[tmp], %[new]				\n"
 		"2:							\n"
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 33/37] MIPS: barrier: Remove loongson_llsc_mb()
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (31 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 32/37] MIPS: syscall: " Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 34/37] MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3 Paul Burton
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

The loongson_llsc_mb() macro is no longer used - instead barriers are
emitted as part of inline asm using the __SYNC() macro. Remove the
now-defunct loongson_llsc_mb() macro.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 40 ---------------------------------
 arch/mips/loongson64/Platform   |  2 +-
 2 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index c7e05e832da9..1a99a6c5b5dd 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -121,46 +121,6 @@ static inline void wmb(void)
 #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
 #define __smp_mb__after_atomic()	smp_llsc_mb()
 
-/*
- * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
- * store or prefetch) in between an LL & SC can cause the SC instruction to
- * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
- * containing such sequences, this bug bites harder than we might otherwise
- * expect due to reordering & speculation:
- *
- * 1) A memory access appearing prior to the LL in program order may actually
- *    be executed after the LL - this is the reordering case.
- *
- *    In order to avoid this we need to place a memory barrier (ie. a SYNC
- *    instruction) prior to every LL instruction, in between it and any earlier
- *    memory access instructions.
- *
- *    This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
- *
- * 2) If a conditional branch exists between an LL & SC with a target outside
- *    of the LL-SC loop, for example an exit upon value mismatch in cmpxchg()
- *    or similar, then misprediction of the branch may allow speculative
- *    execution of memory accesses from outside of the LL-SC loop.
- *
- *    In order to avoid this we need a memory barrier (ie. a SYNC instruction)
- *    at each affected branch target, for which we also use loongson_llsc_mb()
- *    defined below.
- *
- *    This case affects all current Loongson 3 CPUs.
- *
- * The above described cases cause an error in the cache coherence protocol;
- * such that the Invalidate of a competing LL-SC goes 'missing' and SC
- * erroneously observes its core still has Exclusive state and lets the SC
- * proceed.
- *
- * Therefore the error only occurs on SMP systems.
- */
-#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
-#define loongson_llsc_mb()	__asm__ __volatile__("sync" : : :"memory")
-#else
-#define loongson_llsc_mb()	do { } while (0)
-#endif
-
 static inline void sync_ginv(void)
 {
 	asm volatile(__SYNC(ginv, always));
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index c1a4d4dc4665..28172500f95a 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -27,7 +27,7 @@ cflags-$(CONFIG_CPU_LOONGSON3)	+= -Wa,--trap
 #
 # Some versions of binutils, not currently mainline as of 2019/02/04, support
 # an -mfix-loongson3-llsc flag which emits a sync prior to each ll instruction
-# to work around a CPU bug (see loongson_llsc_mb() in asm/barrier.h for a
+# to work around a CPU bug (see __SYNC_loongson3_war in asm/sync.h for a
 # description).
 #
 # We disable this in order to prevent the assembler meddling with the
-- 
2.23.0


* [PATCH 34/37] MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (32 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 33/37] MIPS: barrier: Remove loongson_llsc_mb() Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 35/37] MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler Paul Burton
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

Loongson3 systems with CONFIG_CPU_LOONGSON3_WORKAROUNDS enabled already
emit a full completion barrier as part of the inline assembly containing
LL/SC loops for atomic operations. As such, the barrier emitted by
__smp_mb__before_atomic() is redundant for these systems & we can omit it.
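
For illustration, a sketch of the caller-side pattern this affects (the
function & variable names here are made up):

  static atomic_t example_refs = ATOMIC_INIT(0);

  static void example_caller(void)
  {
  	/*
  	 * With CONFIG_CPU_LOONGSON3_WORKAROUNDS=y the atomic_inc() below
  	 * already begins with a full sync emitted via
  	 * __SYNC(full, loongson3_war), so having smp_mb__before_atomic()
  	 * emit another sync would be redundant.
  	 */
  	smp_mb__before_atomic();
  	atomic_inc(&example_refs);
  }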

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/include/asm/barrier.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 1a99a6c5b5dd..f3b5aa0938c1 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -118,7 +118,17 @@ static inline void wmb(void)
 #define nudge_writes() mb()
 #endif
 
-#define __smp_mb__before_atomic()	__smp_mb__before_llsc()
+/*
+ * In the Loongson3 LL/SC workaround case, all of our LL/SC loops already have
+ * a completion barrier immediately preceding the LL instruction. Therefore we
+ * can skip emitting a barrier from __smp_mb__before_atomic().
+ */
+#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
+# define __smp_mb__before_atomic()
+#else
+# define __smp_mb__before_atomic()	__smp_mb__before_llsc()
+#endif
+
 #define __smp_mb__after_atomic()	smp_llsc_mb()
 
 static inline void sync_ginv(void)
-- 
2.23.0


* [PATCH 35/37] MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (33 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 34/37] MIPS: barrier: Make __smp_mb__before_atomic() a no-op for Loongson3 Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 36/37] MIPS: genex: Don't reload address unnecessarily Paul Burton
  2019-09-30 23:08 ` [PATCH 37/37] MIPS: Check Loongson3 LL/SC errata workaround correctness Paul Burton
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

In ejtag_debug_handler() we use LL & SC instructions to acquire & release
an open-coded spinlock. For Loongson3 systems affected by LL/SC errata,
this requires that we insert a sync instruction prior to the LL in order
to ensure correct behavior of the LL/SC loop.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/kernel/genex.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index efde27c99414..ac4f2b835165 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -18,6 +18,7 @@
 #include <asm/fpregdef.h>
 #include <asm/mipsregs.h>
 #include <asm/stackframe.h>
+#include <asm/sync.h>
 #include <asm/war.h>
 #include <asm/thread_info.h>
 
@@ -353,6 +354,7 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 
 #ifdef CONFIG_SMP
 1:	PTR_LA	k0, ejtag_debug_buffer_spinlock
+	__SYNC(full, loongson3_war)
 	ll	k0, 0(k0)
 	bnez	k0, 1b
 	PTR_LA	k0, ejtag_debug_buffer_spinlock
-- 
2.23.0


* [PATCH 36/37] MIPS: genex: Don't reload address unnecessarily
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (34 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 35/37] MIPS: genex: Add Loongson3 LL/SC workaround to ejtag_debug_handler Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  2019-09-30 23:08 ` [PATCH 37/37] MIPS: Check Loongson3 LL/SC errata workaround correctness Paul Burton
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

In ejtag_debug_handler() we must reload the address of
ejtag_debug_buffer_spinlock if an sc fails, since the address in k0 will
have been clobbered by the result of the sc instruction. In the case
where we merely load a non-zero value (ie. there's contention for the
lock), the address will not be clobbered & we can branch straight back
to repeat the load from memory without reloading the address into k0.

The primary motivation for this change is that it moves the target of
the bnez instruction to an instruction within the LL/SC loop (the LL
itself), which we know contains no other memory accesses & therefore
isn't affected by Loongson3 LL/SC errata.

Signed-off-by: Paul Burton <paul.burton@mips.com>
---

 arch/mips/kernel/genex.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index ac4f2b835165..60ede6b75a3b 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -355,8 +355,8 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 #ifdef CONFIG_SMP
 1:	PTR_LA	k0, ejtag_debug_buffer_spinlock
 	__SYNC(full, loongson3_war)
-	ll	k0, 0(k0)
-	bnez	k0, 1b
+2:	ll	k0, 0(k0)
+	bnez	k0, 2b
 	PTR_LA	k0, ejtag_debug_buffer_spinlock
 	sc	k0, 0(k0)
 	beqz	k0, 1b
-- 
2.23.0


* [PATCH 37/37] MIPS: Check Loongson3 LL/SC errata workaround correctness
  2019-09-30 23:08 [PATCH 00/37] MIPS: barriers & atomics cleanups Paul Burton
                   ` (35 preceding siblings ...)
  2019-09-30 23:08 ` [PATCH 36/37] MIPS: genex: Don't reload address unnecessarily Paul Burton
@ 2019-09-30 23:08 ` Paul Burton
  36 siblings, 0 replies; 38+ messages in thread
From: Paul Burton @ 2019-09-30 23:08 UTC (permalink / raw)
  To: linux-mips; +Cc: Huacai Chen, Jiaxun Yang, linux-kernel, Paul Burton

When Loongson3 LL/SC errata workarounds are enabled (ie.
CONFIG_CPU_LOONGSON3_WORKAROUNDS=y), run a tool to scan through the
compiled kernel & ensure that the workaround is applied correctly. That
is, ensure that:

  - Every LL or LLD instruction is preceded by a sync instruction.

  - Any branches from within an LL/SC loop to outside of that loop
    target a sync instruction.

Reasoning for these conditions can be found by reading the comment above
the definition of __SYNC_loongson3_war in arch/mips/include/asm/sync.h.

This tool will help ensure that we don't inadvertently introduce code
paths that miss the required workarounds.
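
As a hand-written illustration (placeholder operands & deliberately
simplified constraints, not code lifted from the series), a cmpxchg-style
loop that satisfies both conditions looks roughly like this:

  static unsigned int cmpxchg_sketch(unsigned int *ptr, unsigned int old,
  				     unsigned int new)
  {
  	unsigned int prev, tmp;

  	__asm__ __volatile__(
  	/* Condition 1: the ll is immediately preceded by a sync */
  	"	" __SYNC(full, loongson3_war) "		\n"
  	"1:	ll	%0, %2				\n"
  	/* This branch leaves the LL/SC loop on value mismatch... */
  	"	bne	%0, %3, 2f			\n"
  	"	move	%1, %4				\n"
  	"	sc	%1, %2				\n"
  	"	beqz	%1, 1b				\n"
  	/* ...so condition 2 requires that its target be a sync */
  	"2:	" __SYNC(full, loongson3_war) "		\n"
  	: "=&r" (prev), "=&r" (tmp), "+m" (*ptr)
  	: "r" (old), "r" (new)
  	: "memory");

  	return prev;
  }

With CONFIG_CPU_LOONGSON3_WORKAROUNDS=y the postlink step added below runs
the tool as arch/mips/tools/loongson3-llsc-check vmlinux, so a violation of
either condition should fail the build.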

Signed-off-by: Paul Burton <paul.burton@mips.com>

---

 arch/mips/Makefile                     |   2 +-
 arch/mips/Makefile.postlink            |  10 +-
 arch/mips/tools/.gitignore             |   1 +
 arch/mips/tools/Makefile               |   5 +
 arch/mips/tools/loongson3-llsc-check.c | 307 +++++++++++++++++++++++++
 5 files changed, 323 insertions(+), 2 deletions(-)
 create mode 100644 arch/mips/tools/loongson3-llsc-check.c

diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index cdc09b71febe..4ac0974cf902 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -13,7 +13,7 @@
 #
 
 archscripts: scripts_basic
-	$(Q)$(MAKE) $(build)=arch/mips/tools elf-entry
+	$(Q)$(MAKE) $(build)=arch/mips/tools elf-entry loongson3-llsc-check
 	$(Q)$(MAKE) $(build)=arch/mips/boot/tools relocs
 
 KBUILD_DEFCONFIG := 32r2el_defconfig
diff --git a/arch/mips/Makefile.postlink b/arch/mips/Makefile.postlink
index 4eea4188cb20..f03fdc95143e 100644
--- a/arch/mips/Makefile.postlink
+++ b/arch/mips/Makefile.postlink
@@ -3,7 +3,8 @@
 # Post-link MIPS pass
 # ===========================================================================
 #
-# 1. Insert relocations into vmlinux
+# 1. Check that Loongson3 LL/SC workarounds are applied correctly
+# 2. Insert relocations into vmlinux
 
 PHONY := __archpost
 __archpost:
@@ -11,6 +12,10 @@ __archpost:
 -include include/config/auto.conf
 include scripts/Kbuild.include
 
+CMD_LS3_LLSC = arch/mips/tools/loongson3-llsc-check
+quiet_cmd_ls3_llsc = LLSCCHK $@
+      cmd_ls3_llsc = $(CMD_LS3_LLSC) $@
+
 CMD_RELOCS = arch/mips/boot/tools/relocs
 quiet_cmd_relocs = RELOCS $@
       cmd_relocs = $(CMD_RELOCS) $@
@@ -19,6 +24,9 @@ quiet_cmd_relocs = RELOCS $@
 
 vmlinux: FORCE
 	@true
+ifeq ($(CONFIG_CPU_LOONGSON3_WORKAROUNDS),y)
+	$(call if_changed,ls3_llsc)
+endif
 ifeq ($(CONFIG_RELOCATABLE),y)
 	$(call if_changed,relocs)
 endif
diff --git a/arch/mips/tools/.gitignore b/arch/mips/tools/.gitignore
index 56d34ccccce4..b0209450d9ff 100644
--- a/arch/mips/tools/.gitignore
+++ b/arch/mips/tools/.gitignore
@@ -1 +1,2 @@
 elf-entry
+loongson3-llsc-check
diff --git a/arch/mips/tools/Makefile b/arch/mips/tools/Makefile
index 3baee4bc6775..aaef688749f5 100644
--- a/arch/mips/tools/Makefile
+++ b/arch/mips/tools/Makefile
@@ -3,3 +3,8 @@ hostprogs-y := elf-entry
 PHONY += elf-entry
 elf-entry: $(obj)/elf-entry
 	@:
+
+hostprogs-$(CONFIG_CPU_LOONGSON3_WORKAROUNDS) += loongson3-llsc-check
+PHONY += loongson3-llsc-check
+loongson3-llsc-check: $(obj)/loongson3-llsc-check
+	@:
diff --git a/arch/mips/tools/loongson3-llsc-check.c b/arch/mips/tools/loongson3-llsc-check.c
new file mode 100644
index 000000000000..0ebddd0ae46f
--- /dev/null
+++ b/arch/mips/tools/loongson3-llsc-check.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <byteswap.h>
+#include <elf.h>
+#include <endian.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#ifdef be32toh
+/* If libc provides le{16,32,64}toh() then we'll use them */
+#elif BYTE_ORDER == LITTLE_ENDIAN
+# define le16toh(x)	(x)
+# define le32toh(x)	(x)
+# define le64toh(x)	(x)
+#elif BYTE_ORDER == BIG_ENDIAN
+# define le16toh(x)	bswap_16(x)
+# define le32toh(x)	bswap_32(x)
+# define le64toh(x)	bswap_64(x)
+#endif
+
+/* MIPS opcodes, in bits 31:26 of an instruction */
+#define OP_SPECIAL	0x00
+#define OP_REGIMM	0x01
+#define OP_BEQ		0x04
+#define OP_BNE		0x05
+#define OP_BLEZ		0x06
+#define OP_BGTZ		0x07
+#define OP_BEQL		0x14
+#define OP_BNEL		0x15
+#define OP_BLEZL	0x16
+#define OP_BGTZL	0x17
+#define OP_LL		0x30
+#define OP_LLD		0x34
+#define OP_SC		0x38
+#define OP_SCD		0x3c
+
+/* Bits 20:16 of OP_REGIMM instructions */
+#define REGIMM_BLTZ	0x00
+#define REGIMM_BGEZ	0x01
+#define REGIMM_BLTZL	0x02
+#define REGIMM_BGEZL	0x03
+#define REGIMM_BLTZAL	0x10
+#define REGIMM_BGEZAL	0x11
+#define REGIMM_BLTZALL	0x12
+#define REGIMM_BGEZALL	0x13
+
+/* Bits 5:0 of OP_SPECIAL instructions */
+#define SPECIAL_SYNC	0x0f
+
+static void usage(FILE *f)
+{
+	fprintf(f, "Usage: loongson3-llsc-check /path/to/vmlinux\n");
+}
+
+static int se16(uint16_t x)
+{
+	return (int16_t)x;
+}
+
+static bool is_ll(uint32_t insn)
+{
+	switch (insn >> 26) {
+	case OP_LL:
+	case OP_LLD:
+		return true;
+
+	default:
+		return false;
+	}
+}
+
+static bool is_sc(uint32_t insn)
+{
+	switch (insn >> 26) {
+	case OP_SC:
+	case OP_SCD:
+		return true;
+
+	default:
+		return false;
+	}
+}
+
+static bool is_sync(uint32_t insn)
+{
+	/* Bits 31:11 should all be zeroes */
+	if (insn >> 11)
+		return false;
+
+	/* Bits 5:0 specify the SYNC special encoding */
+	if ((insn & 0x3f) != SPECIAL_SYNC)
+		return false;
+
+	return true;
+}
+
+static bool is_branch(uint32_t insn, int *off)
+{
+	switch (insn >> 26) {
+	case OP_BEQ:
+	case OP_BEQL:
+	case OP_BNE:
+	case OP_BNEL:
+	case OP_BGTZ:
+	case OP_BGTZL:
+	case OP_BLEZ:
+	case OP_BLEZL:
+		*off = se16(insn) + 1;
+		return true;
+
+	case OP_REGIMM:
+		switch ((insn >> 16) & 0x1f) {
+		case REGIMM_BGEZ:
+		case REGIMM_BGEZL:
+		case REGIMM_BGEZAL:
+		case REGIMM_BGEZALL:
+		case REGIMM_BLTZ:
+		case REGIMM_BLTZL:
+		case REGIMM_BLTZAL:
+		case REGIMM_BLTZALL:
+			*off = se16(insn) + 1;
+			return true;
+
+		default:
+			return false;
+		}
+
+	default:
+		return false;
+	}
+}
+
+static int check_ll(uint64_t pc, uint32_t *code, size_t sz)
+{
+	ssize_t i, max, sc_pos;
+	int off;
+
+	/*
+	 * Every LL must be preceded by a sync instruction in order to ensure
+	 * that instruction reordering doesn't allow a prior memory access to
+	 * execute after the LL & cause erroneous results.
+	 */
+	if (!is_sync(le32toh(code[-1]))) {
+		fprintf(stderr, "%" PRIx64 ": LL not preceded by sync\n", pc);
+		return -EINVAL;
+	}
+
+	/* Find the matching SC instruction */
+	max = sz / 4;
+	for (sc_pos = 0; sc_pos < max; sc_pos++) {
+		if (is_sc(le32toh(code[sc_pos])))
+			break;
+	}
+	if (sc_pos >= max) {
+		fprintf(stderr, "%" PRIx64 ": LL has no matching SC\n", pc);
+		return -EINVAL;
+	}
+
+	/*
+	 * Check branches within the LL/SC loop target sync instructions,
+	 * ensuring that speculative execution can't generate memory accesses
+	 * due to instructions outside of the loop.
+	 */
+	for (i = 0; i < sc_pos; i++) {
+		if (!is_branch(le32toh(code[i]), &off))
+			continue;
+
+		/*
+		 * If the branch target is within the LL/SC loop then we don't
+		 * need to worry about it.
+		 */
+		if ((off >= -i) && (off <= sc_pos))
+			continue;
+
+		/* If the branch targets a sync instruction we're all good... */
+		if (is_sync(le32toh(code[i + off])))
+			continue;
+
+		/* ...but if not, we have a problem */
+		fprintf(stderr, "%" PRIx64 ": Branch target not a sync\n",
+			pc + (i * 4));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int check_code(uint64_t pc, uint32_t *code, size_t sz)
+{
+	int err = 0;
+
+	if (sz % 4) {
+		fprintf(stderr, "%" PRIx64 ": Section size not a multiple of 4\n",
+			pc);
+		err = -EINVAL;
+		sz -= (sz % 4);
+	}
+
+	if (is_ll(le32toh(code[0]))) {
+		fprintf(stderr, "%" PRIx64 ": First instruction in section is an LL\n",
+			pc);
+		err = -EINVAL;
+	}
+
+#define advance() (	\
+	code++,		\
+	pc += 4,	\
+	sz -= 4		\
+)
+
+	/*
+	 * Skip the first instruction, allowing check_ll to look backwards
+	 * unconditionally.
+	 */
+	advance();
+
+	/* Now scan through the code looking for LL instructions */
+	for (; sz; advance()) {
+		if (is_ll(le32toh(code[0])))
+			err |= check_ll(pc, code, sz);
+	}
+
+	return err;
+}
+
+int main(int argc, char *argv[])
+{
+	int vmlinux_fd, status, err, i;
+	const char *vmlinux_path;
+	struct stat st;
+	Elf64_Ehdr *eh;
+	Elf64_Shdr *sh;
+	void *vmlinux;
+
+	status = EXIT_FAILURE;
+
+	if (argc < 2) {
+		usage(stderr);
+		goto out_ret;
+	}
+
+	vmlinux_path = argv[1];
+	vmlinux_fd = open(vmlinux_path, O_RDONLY);
+	if (vmlinux_fd == -1) {
+		perror("Unable to open vmlinux");
+		goto out_ret;
+	}
+
+	err = fstat(vmlinux_fd, &st);
+	if (err) {
+		perror("Unable to stat vmlinux");
+		goto out_close;
+	}
+
+	vmlinux = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, vmlinux_fd, 0);
+	if (vmlinux == MAP_FAILED) {
+		perror("Unable to mmap vmlinux");
+		goto out_close;
+	}
+
+	eh = vmlinux;
+	if (memcmp(eh->e_ident, ELFMAG, SELFMAG)) {
+		fprintf(stderr, "vmlinux is not an ELF?\n");
+		goto out_munmap;
+	}
+
+	if (eh->e_ident[EI_CLASS] != ELFCLASS64) {
+		fprintf(stderr, "vmlinux is not 64b?\n");
+		goto out_munmap;
+	}
+
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB) {
+		fprintf(stderr, "vmlinux is not little endian?\n");
+		goto out_munmap;
+	}
+
+	for (i = 0; i < le16toh(eh->e_shnum); i++) {
+		sh = vmlinux + le64toh(eh->e_shoff) + (i * le16toh(eh->e_shentsize));
+
+		if (sh->sh_type != SHT_PROGBITS)
+			continue;
+		if (!(sh->sh_flags & SHF_EXECINSTR))
+			continue;
+
+		err = check_code(le64toh(sh->sh_addr),
+				 vmlinux + le64toh(sh->sh_offset),
+				 le64toh(sh->sh_size));
+		if (err)
+			goto out_munmap;
+	}
+
+	status = EXIT_SUCCESS;
+out_munmap:
+	munmap(vmlinux, st.st_size);
+out_close:
+	close(vmlinux_fd);
+out_ret:
+	return status;
+}
-- 
2.23.0

