* [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks
@ 2017-06-10  0:26 ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

This series makes a bunch of cleanups & improvements to the cmpxchg() &
xchg() macros & functions, allowing them to be used on values smaller
than 4 bytes, then switches MIPS over to use generic queued spinlocks &
queued read/write locks.

Applies atop v4.12-rc4.
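
To illustrate what the series enables, here is a hedged sketch (not taken
from any patch below; the struct & function names are invented) of the
kind of sub-word callers that become expressible on MIPS:

#include <linux/types.h>
#include <asm/cmpxchg.h>

/* Invented example; only the field sizes matter. */
struct example {
	u8  state;	/* 1 byte  */
	u16 pending;	/* 2 bytes */
};

static u8 example_set_state(struct example *e, u8 new_state)
{
	/* 1 byte xchg(): previously not expressible on MIPS. */
	return xchg(&e->state, new_state);
}

static u16 example_take_pending(struct example *e)
{
	/* 2 byte xchg(): atomically read & clear the field. */
	return xchg(&e->pending, 0);
}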

Paul Burton (11):
  MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases
  MIPS: cmpxchg: Pull xchg() asm into a macro
  MIPS: cmpxchg: Use __compiletime_error() for bad cmpxchg() pointers
  MIPS: cmpxchg: Error out on unsupported xchg() calls
  MIPS: cmpxchg: Drop __xchg_u{32,64} functions
  MIPS: cmpxchg: Implement __cmpxchg() as a function
  MIPS: cmpxchg: Implement 1 byte & 2 byte xchg()
  MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()
  MIPS: cmpxchg: Rearrange __xchg() arguments to match xchg()
  MIPS: Use queued read/write locks (qrwlock)
  MIPS: Use queued spinlocks (qspinlock)

 arch/mips/Kconfig                      |   2 +
 arch/mips/include/asm/Kbuild           |   2 +
 arch/mips/include/asm/cmpxchg.h        | 280 ++++++++++------------
 arch/mips/include/asm/spinlock.h       | 426 +--------------------------------
 arch/mips/include/asm/spinlock_types.h |  34 +--
 arch/mips/kernel/Makefile              |   2 +-
 arch/mips/kernel/cmpxchg.c             | 109 +++++++++
 7 files changed, 239 insertions(+), 616 deletions(-)
 create mode 100644 arch/mips/kernel/cmpxchg.c

-- 
2.13.1

* [PATCH 01/11] MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Prior to this patch the xchg & cmpxchg functions contain duplicated code
which is for all intents & purposes identical, apart from the use of a
branch-likely instruction in the R10000_LLSC_WAR case & a regular branch
instruction in the non-R10000_LLSC_WAR case.

This patch removes the duplication by declaring a __scbeqz macro which
selects the branch instruction suitable for checking the result of an sc
instruction, and using it to unify the 2 cases.

In __xchg_u{32,64}() this means writing the branch in asm, where it was
previously being done in C as a do...while loop for the
non-R10000_LLSC_WAR case. As this is a single instruction, and adds
consistency with the R10000_LLSC_WAR cases & the cmpxchg() code, this
seems worthwhile.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 80 ++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 58 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index b71ab4a5fd50..0582c01d229d 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -13,44 +13,38 @@
 #include <asm/compiler.h>
 #include <asm/war.h>
 
+/*
+ * Using a branch-likely instruction to check the result of an sc instruction
+ * works around a bug present in R10000 CPUs prior to revision 3.0 that could
+ * cause ll-sc sequences to execute non-atomically.
+ */
+#if R10000_LLSC_WAR
+# define __scbeqz "beqzl"
+#else
+# define __scbeqz "beqz"
+#endif
+
 static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
 {
 	__u32 retval;
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (kernel_uses_llsc) {
 		unsigned long dummy;
 
 		__asm__ __volatile__(
-		"	.set	arch=r4000				\n"
+		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
 		"1:	ll	%0, %3			# xchg_u32	\n"
 		"	.set	mips0					\n"
 		"	move	%2, %z4					\n"
-		"	.set	arch=r4000				\n"
+		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
 		"	sc	%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
+		"\t" __scbeqz "	%2, 1b					\n"
 		"	.set	mips0					\n"
 		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
 		: "memory");
-	} else if (kernel_uses_llsc) {
-		unsigned long dummy;
-
-		do {
-			__asm__ __volatile__(
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	ll	%0, %3		# xchg_u32	\n"
-			"	.set	mips0				\n"
-			"	move	%2, %z4				\n"
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	sc	%2, %1				\n"
-			"	.set	mips0				\n"
-			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
-			  "=&r" (dummy)
-			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
-			: "memory");
-		} while (unlikely(!dummy));
 	} else {
 		unsigned long flags;
 
@@ -72,34 +66,19 @@ static inline __u64 __xchg_u64(volatile __u64 * m, __u64 val)
 
 	smp_mb__before_llsc();
 
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {
+	if (kernel_uses_llsc) {
 		unsigned long dummy;
 
 		__asm__ __volatile__(
-		"	.set	arch=r4000				\n"
+		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
 		"1:	lld	%0, %3			# xchg_u64	\n"
 		"	move	%2, %z4					\n"
 		"	scd	%2, %1					\n"
-		"	beqzl	%2, 1b					\n"
+		"\t" __scbeqz "	%2, 1b					\n"
 		"	.set	mips0					\n"
 		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
 		: "memory");
-	} else if (kernel_uses_llsc) {
-		unsigned long dummy;
-
-		do {
-			__asm__ __volatile__(
-			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
-			"	lld	%0, %3		# xchg_u64	\n"
-			"	move	%2, %z4				\n"
-			"	scd	%2, %1				\n"
-			"	.set	mips0				\n"
-			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
-			  "=&r" (dummy)
-			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
-			: "memory");
-		} while (unlikely(!dummy));
 	} else {
 		unsigned long flags;
 
@@ -142,24 +121,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 ({									\
 	__typeof(*(m)) __ret;						\
 									\
-	if (kernel_uses_llsc && R10000_LLSC_WAR) {			\
-		__asm__ __volatile__(					\
-		"	.set	push				\n"	\
-		"	.set	noat				\n"	\
-		"	.set	arch=r4000			\n"	\
-		"1:	" ld "	%0, %2		# __cmpxchg_asm \n"	\
-		"	bne	%0, %z3, 2f			\n"	\
-		"	.set	mips0				\n"	\
-		"	move	$1, %z4				\n"	\
-		"	.set	arch=r4000			\n"	\
-		"	" st "	$1, %1				\n"	\
-		"	beqzl	$1, 1b				\n"	\
-		"2:						\n"	\
-		"	.set	pop				\n"	\
-		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
-		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
-		: "memory");						\
-	} else if (kernel_uses_llsc) {					\
+	if (kernel_uses_llsc) {						\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -170,7 +132,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 		"	move	$1, %z4				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
 		"	" st "	$1, %1				\n"	\
-		"	beqz	$1, 1b				\n"	\
+		"\t" __scbeqz "	$1, 1b				\n"	\
 		"	.set	pop				\n"	\
 		"2:						\n"	\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
@@ -245,4 +207,6 @@ extern void __cmpxchg_called_with_bad_pointer(void);
 #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
 #endif
 
+#undef __scbeqz
+
 #endif /* __ASM_CMPXCHG_H */
-- 
2.13.1

* [PATCH 02/11] MIPS: cmpxchg: Pull xchg() asm into a macro
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Use a macro to generate the 32 & 64 bit variants of the backing code for
xchg(), much as is already done for cmpxchg(). This removes the
duplication that could previously be found in __xchg_u32() &
__xchg_u64().

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 81 +++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 48 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 0582c01d229d..a19359649b60 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -24,36 +24,43 @@
 # define __scbeqz "beqz"
 #endif
 
+#define __xchg_asm(ld, st, m, val)					\
+({									\
+	__typeof(*(m)) __ret;						\
+									\
+	if (kernel_uses_llsc) {						\
+		__asm__ __volatile__(					\
+		"	.set	push				\n"	\
+		"	.set	noat				\n"	\
+		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
+		"1:	" ld "	%0, %2		# __xchg_asm	\n"	\
+		"	.set	mips0				\n"	\
+		"	move	$1, %z3				\n"	\
+		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
+		"	" st "	$1, %1				\n"	\
+		"\t" __scbeqz "	$1, 1b				\n"	\
+		"	.set	pop				\n"	\
+		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
+		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)			\
+		: "memory");						\
+	} else {							\
+		unsigned long __flags;					\
+									\
+		raw_local_irq_save(__flags);				\
+		__ret = *m;						\
+		*m = val;						\
+		raw_local_irq_restore(__flags);				\
+	}								\
+									\
+	__ret;								\
+})
+
 static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
 {
 	__u32 retval;
 
 	smp_mb__before_llsc();
-
-	if (kernel_uses_llsc) {
-		unsigned long dummy;
-
-		__asm__ __volatile__(
-		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
-		"1:	ll	%0, %3			# xchg_u32	\n"
-		"	.set	mips0					\n"
-		"	move	%2, %z4					\n"
-		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
-		"	sc	%2, %1					\n"
-		"\t" __scbeqz "	%2, 1b					\n"
-		"	.set	mips0					\n"
-		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
-		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
-		: "memory");
-	} else {
-		unsigned long flags;
-
-		raw_local_irq_save(flags);
-		retval = *m;
-		*m = val;
-		raw_local_irq_restore(flags);	/* implies memory barrier  */
-	}
-
+	retval = __xchg_asm("ll", "sc", m, val);
 	smp_llsc_mb();
 
 	return retval;
@@ -65,29 +72,7 @@ static inline __u64 __xchg_u64(volatile __u64 * m, __u64 val)
 	__u64 retval;
 
 	smp_mb__before_llsc();
-
-	if (kernel_uses_llsc) {
-		unsigned long dummy;
-
-		__asm__ __volatile__(
-		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
-		"1:	lld	%0, %3			# xchg_u64	\n"
-		"	move	%2, %z4					\n"
-		"	scd	%2, %1					\n"
-		"\t" __scbeqz "	%2, 1b					\n"
-		"	.set	mips0					\n"
-		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
-		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
-		: "memory");
-	} else {
-		unsigned long flags;
-
-		raw_local_irq_save(flags);
-		retval = *m;
-		*m = val;
-		raw_local_irq_restore(flags);	/* implies memory barrier  */
-	}
-
+	retval = __xchg_asm("lld", "scd", m, val);
 	smp_llsc_mb();
 
 	return retval;
-- 
2.13.1

* [PATCH 03/11] MIPS: cmpxchg: Use __compiletime_error() for bad cmpxchg() pointers
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Our cmpxchg() implementation relies upon generating a call to a function
which doesn't really exist (__cmpxchg_called_with_bad_pointer) to create
a link failure in cases where cmpxchg() is called with a pointer to a
value of an unsupported size.

The __compiletime_error macro can be used to decorate a function such
that a call to it generates a compile-time, rather than a link-time,
error. This patch uses __compiletime_error to make bad cmpxchg() calls
error out at compile time rather than link time, allowing the error to be
reported earlier in the build & making it easier to spot where the
problem comes from.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
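
As an illustration (not part of this patch; the caller below is invented),
a cmpxchg() on an unsupported size is now diagnosed while compiling the
offending file rather than at the final link:

#include <linux/types.h>
#include <asm/cmpxchg.h>

/* Invented example: 1 byte cmpxchg() is still unsupported at this point
 * in the series, so sizeof(*ptr) hits the default switch case. */
static u8 bad_caller(u8 *ptr)
{
	/*
	 * With __compiletime_error() support the compiler reports
	 * "Bad argument size for cmpxchg" at this call site; without
	 * that support the build still fails, but only once the link
	 * cannot resolve __cmpxchg_called_with_bad_pointer().
	 */
	return cmpxchg(ptr, 0, 1);
}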

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index a19359649b60..ee0214e00ab1 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -137,10 +137,17 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 })
 
 /*
- * This function doesn't exist, so you'll get a linker error
- * if something tries to do an invalid cmpxchg().
+ * This function doesn't exist, so if it is called you'll either:
+ *
+ * - Get an error at compile-time due to __compiletime_error, if supported by
+ *   your compiler.
+ *
+ * or:
+ *
+ * - Get an error at link-time due to the call to the missing function.
  */
-extern void __cmpxchg_called_with_bad_pointer(void);
+extern void __cmpxchg_called_with_bad_pointer(void)
+	__compiletime_error("Bad argument size for cmpxchg");
 
 #define __cmpxchg(ptr, old, new, pre_barrier, post_barrier)		\
 ({									\
-- 
2.13.1

* [PATCH 04/11] MIPS: cmpxchg: Error out on unsupported xchg() calls
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

xchg() has up until now simply returned the x parameter in cases where
it is called with a pointer to a value of an unsupported size. This will
often cause the calling code to hit a failure path, presuming that the
value of x differs from the content of the memory pointed at by ptr, but
we can do better by producing a compile-time or link-time error such
that unsupported calls to xchg() are detectable earlier than runtime.

This patch does this in the same way as is already done for cmpxchg(),
using a call to a missing function annotated with __compiletime_error().

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index ee0214e00ab1..fe652c3e5d8c 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -24,6 +24,21 @@
 # define __scbeqz "beqz"
 #endif
 
+/*
+ * These functions don't exist, so if they are called you'll either:
+ *
+ * - Get an error at compile-time due to __compiletime_error, if supported by
+ *   your compiler.
+ *
+ * or:
+ *
+ * - Get an error at link-time due to the call to the missing function.
+ */
+extern void __cmpxchg_called_with_bad_pointer(void)
+	__compiletime_error("Bad argument size for cmpxchg");
+extern unsigned long __xchg_called_with_bad_pointer(void)
+	__compiletime_error("Bad argument size for xchg");
+
 #define __xchg_asm(ld, st, m, val)					\
 ({									\
 	__typeof(*(m)) __ret;						\
@@ -89,9 +104,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 		return __xchg_u32(ptr, x);
 	case 8:
 		return __xchg_u64(ptr, x);
+	default:
+		return __xchg_called_with_bad_pointer();
 	}
-
-	return x;
 }
 
 #define xchg(ptr, x)							\
@@ -136,19 +151,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 	__ret;								\
 })
 
-/*
- * This function doesn't exist, so if it is called you'll either:
- *
- * - Get an error at compile-time due to __compiletime_error, if supported by
- *   your compiler.
- *
- * or:
- *
- * - Get an error at link-time due to the call to the missing function.
- */
-extern void __cmpxchg_called_with_bad_pointer(void)
-	__compiletime_error("Bad argument size for cmpxchg");
-
 #define __cmpxchg(ptr, old, new, pre_barrier, post_barrier)		\
 ({									\
 	__typeof__(ptr) __ptr = (ptr);					\
-- 
2.13.1

* [PATCH 05/11] MIPS: cmpxchg: Drop __xchg_u{32,64} functions
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

The __xchg_u32() & __xchg_u64() functions now add very little value.
This patch therefore removes them, by:

  - Moving memory barriers out of them & into xchg(), which also removes
    the duplication & readies us to support xchg_relaxed() if we wish to.

  - Calling __xchg_asm() directly from __xchg().

  - Performing the check for CONFIG_64BIT being enabled in the size=8
    case of __xchg().

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 48 +++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)
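
Relating to the xchg_relaxed() possibility mentioned above, a hypothetical
sketch (not something this series adds): with the barriers now placed in
xchg() itself, a relaxed variant would reduce to a bare call into
__xchg(), roughly:

/* Hypothetical only: MIPS does not gain xchg_relaxed() here; size
 * checking is omitted for brevity. */
#define xchg_relaxed(ptr, x)						\
	((__typeof__(*(ptr)))						\
		__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))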

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index fe652c3e5d8c..e9c1e97bc29d 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -70,40 +70,18 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 	__ret;								\
 })
 
-static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
-{
-	__u32 retval;
-
-	smp_mb__before_llsc();
-	retval = __xchg_asm("ll", "sc", m, val);
-	smp_llsc_mb();
-
-	return retval;
-}
-
-#ifdef CONFIG_64BIT
-static inline __u64 __xchg_u64(volatile __u64 * m, __u64 val)
-{
-	__u64 retval;
-
-	smp_mb__before_llsc();
-	retval = __xchg_asm("lld", "scd", m, val);
-	smp_llsc_mb();
-
-	return retval;
-}
-#else
-extern __u64 __xchg_u64_unsupported_on_32bit_kernels(volatile __u64 * m, __u64 val);
-#define __xchg_u64 __xchg_u64_unsupported_on_32bit_kernels
-#endif
-
 static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int size)
 {
 	switch (size) {
 	case 4:
-		return __xchg_u32(ptr, x);
+		return __xchg_asm("ll", "sc", (volatile u32 *)ptr, x);
+
 	case 8:
-		return __xchg_u64(ptr, x);
+		if (!IS_ENABLED(CONFIG_64BIT))
+			return __xchg_called_with_bad_pointer();
+
+		return __xchg_asm("lld", "scd", (volatile u64 *)ptr, x);
+
 	default:
 		return __xchg_called_with_bad_pointer();
 	}
@@ -111,10 +89,18 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 
 #define xchg(ptr, x)							\
 ({									\
+	__typeof__(*(ptr)) __res;					\
+									\
 	BUILD_BUG_ON(sizeof(*(ptr)) & ~0xc);				\
 									\
-	((__typeof__(*(ptr)))						\
-		__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))));	\
+	smp_mb__before_llsc();						\
+									\
+	__res = (__typeof__(*(ptr)))					\
+		__xchg((unsigned long)(x), (ptr), sizeof(*(ptr)));	\
+									\
+	smp_llsc_mb();							\
+									\
+	__res;								\
 })
 
 #define __cmpxchg_asm(ld, st, m, old, new)				\
-- 
2.13.1

* [PATCH 06/11] MIPS: cmpxchg: Implement __cmpxchg() as a function
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Replace the macro definition of __cmpxchg() with an inline function,
which is easier to read & modify. The cmpxchg() & cmpxchg_local() macros
are adjusted to call the new __cmpxchg() function.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 59 ++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 27 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index e9c1e97bc29d..516cb66f066b 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -34,7 +34,7 @@
  *
  * - Get an error at link-time due to the call to the missing function.
  */
-extern void __cmpxchg_called_with_bad_pointer(void)
+extern unsigned long __cmpxchg_called_with_bad_pointer(void)
 	__compiletime_error("Bad argument size for cmpxchg");
 extern unsigned long __xchg_called_with_bad_pointer(void)
 	__compiletime_error("Bad argument size for xchg");
@@ -137,38 +137,43 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 	__ret;								\
 })
 
-#define __cmpxchg(ptr, old, new, pre_barrier, post_barrier)		\
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+				      unsigned long new, unsigned int size)
+{
+	switch (size) {
+	case 4:
+		return __cmpxchg_asm("ll", "sc", (volatile u32 *)ptr, old, new);
+
+	case 8:
+		/* lld/scd are only available for MIPS64 */
+		if (!IS_ENABLED(CONFIG_64BIT))
+			return __cmpxchg_called_with_bad_pointer();
+
+		return __cmpxchg_asm("lld", "scd", (volatile u64 *)ptr, old, new);
+
+	default:
+		return __cmpxchg_called_with_bad_pointer();
+	}
+}
+
+#define cmpxchg_local(ptr, old, new)					\
+	((__typeof__(*(ptr)))						\
+		__cmpxchg((ptr),					\
+			  (unsigned long)(__typeof__(*(ptr)))(old),	\
+			  (unsigned long)(__typeof__(*(ptr)))(new),	\
+			  sizeof(*(ptr))))
+
+#define cmpxchg(ptr, old, new)						\
 ({									\
-	__typeof__(ptr) __ptr = (ptr);					\
-	__typeof__(*(ptr)) __old = (old);				\
-	__typeof__(*(ptr)) __new = (new);				\
-	__typeof__(*(ptr)) __res = 0;					\
-									\
-	pre_barrier;							\
-									\
-	switch (sizeof(*(__ptr))) {					\
-	case 4:								\
-		__res = __cmpxchg_asm("ll", "sc", __ptr, __old, __new); \
-		break;							\
-	case 8:								\
-		if (sizeof(long) == 8) {				\
-			__res = __cmpxchg_asm("lld", "scd", __ptr,	\
-					   __old, __new);		\
-			break;						\
-		}							\
-	default:							\
-		__cmpxchg_called_with_bad_pointer();			\
-		break;							\
-	}								\
+	__typeof__(*(ptr)) __res;					\
 									\
-	post_barrier;							\
+	smp_mb__before_llsc();						\
+	__res = cmpxchg_local((ptr), (old), (new));			\
+	smp_llsc_mb();							\
 									\
 	__res;								\
 })
 
-#define cmpxchg(ptr, old, new)		__cmpxchg(ptr, old, new, smp_mb__before_llsc(), smp_llsc_mb())
-#define cmpxchg_local(ptr, old, new)	__cmpxchg(ptr, old, new, , )
-
 #ifdef CONFIG_64BIT
 #define cmpxchg64_local(ptr, o, n)					\
   ({									\
-- 
2.13.1

* [PATCH 07/11] MIPS: cmpxchg: Implement 1 byte & 2 byte xchg()
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Implement 1 & 2 byte xchg() using read-modify-write atop a 4 byte
cmpxchg(). This allows us to support these atomic operations despite the
MIPS ISA only providing for 4 & 8 byte atomic operations.

This is required in order to support queued spinlocks (qspinlock) in a
later patch, since these make use of a 2 byte xchg() in their slow path.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h |  9 +++++--
 arch/mips/kernel/Makefile       |  2 +-
 arch/mips/kernel/cmpxchg.c      | 52 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 3 deletions(-)
 create mode 100644 arch/mips/kernel/cmpxchg.c
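
To make the shift & mask arithmetic in __xchg_small() below concrete, here
is a hedged stand-alone sketch (ordinary user-space C, not kernel code)
reproducing the placement calculation for a 2 byte value at an invented
address:

#include <stdio.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

int main(void)
{
	/* Invented example: a u16 at byte offset 2 of an aligned word. */
	uintptr_t ptr = 0x1002;
	unsigned int size = 2;
	int big_endian = 0;	/* set to 1 to mimic CONFIG_CPU_BIG_ENDIAN */
	uint32_t mask = (1u << (size * BITS_PER_BYTE)) - 1;	/* 0x0000ffff */
	unsigned int shift = ptr & 0x3;				/* 2 */

	if (big_endian)
		shift ^= sizeof(uint32_t) - size;
	shift *= BITS_PER_BYTE;
	mask <<= shift;

	/* Little-endian result: shift = 16, mask = 0xffff0000, word at 0x1000. */
	printf("shift = %u, mask = 0x%08x, word = 0x%lx\n",
	       shift, (unsigned int)mask,
	       (unsigned long)(ptr & ~(uintptr_t)0x3));
	return 0;
}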

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 516cb66f066b..a633bf845689 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -70,9 +70,16 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 	__ret;								\
 })
 
+extern unsigned long __xchg_small(volatile void *ptr, unsigned long val,
+				  unsigned int size);
+
 static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int size)
 {
 	switch (size) {
+	case 1:
+	case 2:
+		return __xchg_small(ptr, x, size);
+
 	case 4:
 		return __xchg_asm("ll", "sc", (volatile u32 *)ptr, x);
 
@@ -91,8 +98,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 ({									\
 	__typeof__(*(ptr)) __res;					\
 									\
-	BUILD_BUG_ON(sizeof(*(ptr)) & ~0xc);				\
-									\
 	smp_mb__before_llsc();						\
 									\
 	__res = (__typeof__(*(ptr)))					\
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index 9a0e37b92ce0..e06c6f7d774c 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -4,7 +4,7 @@
 
 extra-y		:= head.o vmlinux.lds
 
-obj-y		+= cpu-probe.o branch.o elf.o entry.o genex.o idle.o irq.o \
+obj-y		+= cmpxchg.o cpu-probe.o branch.o elf.o entry.o genex.o idle.o irq.o \
 		   process.o prom.o ptrace.o reset.o setup.o signal.o \
 		   syscall.o time.o topology.o traps.o unaligned.o watch.o \
 		   vdso.o cacheinfo.o
diff --git a/arch/mips/kernel/cmpxchg.c b/arch/mips/kernel/cmpxchg.c
new file mode 100644
index 000000000000..5acfbf9fb2c5
--- /dev/null
+++ b/arch/mips/kernel/cmpxchg.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2017 Imagination Technologies
+ * Author: Paul Burton <paul.burton@imgtec.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/bitops.h>
+#include <asm/cmpxchg.h>
+
+unsigned long __xchg_small(volatile void *ptr, unsigned long val, unsigned int size)
+{
+	u32 old32, new32, load32, mask;
+	volatile u32 *ptr32;
+	unsigned int shift;
+
+	/* Check that ptr is naturally aligned */
+	WARN_ON((unsigned long)ptr & (size - 1));
+
+	/* Mask value to the correct size. */
+	mask = GENMASK((size * BITS_PER_BYTE) - 1, 0);
+	val &= mask;
+
+	/*
+	 * Calculate a shift & mask that correspond to the value we wish to
+	 * exchange within the naturally aligned 4 byte integer that includes
+	 * it.
+	 */
+	shift = (unsigned long)ptr & 0x3;
+	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+		shift ^= sizeof(u32) - size;
+	shift *= BITS_PER_BYTE;
+	mask <<= shift;
+
+	/*
+	 * Calculate a pointer to the naturally aligned 4 byte integer that
+	 * includes our byte of interest, and load its value.
+	 */
+	ptr32 = (volatile u32 *)((unsigned long)ptr & ~0x3);
+	load32 = *ptr32;
+
+	do {
+		old32 = load32;
+		new32 = (load32 & ~mask) | (val << shift);
+		load32 = cmpxchg(ptr32, old32, new32);
+	} while (load32 != old32);
+
+	return (load32 & mask) >> shift;
+}
-- 
2.13.1

* [PATCH 08/11] MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

Implement support for 1 & 2 byte cmpxchg() using read-modify-write atop
a 4 byte cmpxchg(). This allows us to support these atomic operations
despite the MIPS ISA only providing 4 & 8 byte atomic operations.

This is required in order to support queued rwlocks (qrwlock) in a later
patch, since these make use of a 1 byte cmpxchg() in their slow path.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h |  7 +++++
 arch/mips/kernel/cmpxchg.c      | 57 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
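
For illustration (not part of the patch; names below are invented), the
sub-word cmpxchg() keeps the usual contract: it returns the byte's
previous value, and a concurrent change to a neighbouring byte within the
same 32-bit word only causes __cmpxchg_small() to retry internally rather
than fail the caller:

#include <linux/types.h>
#include <asm/cmpxchg.h>

#define EX_UNLOCKED	0
#define EX_LOCKED	1

/* Invented example of a 1 byte cmpxchg() caller enabled by this patch. */
static bool example_trylock(u8 *lock_byte)
{
	/*
	 * Success means we observed EX_UNLOCKED & installed EX_LOCKED
	 * atomically; any other return value is the byte's current
	 * contents at the time of the failed comparison.
	 */
	return cmpxchg(lock_byte, EX_UNLOCKED, EX_LOCKED) == EX_UNLOCKED;
}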

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index a633bf845689..a633f61c5545 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -142,10 +142,17 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 	__ret;								\
 })
 
+extern unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
+				     unsigned long new, unsigned int size);
+
 static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 				      unsigned long new, unsigned int size)
 {
 	switch (size) {
+	case 1:
+	case 2:
+		return __cmpxchg_small(ptr, old, new, size);
+
 	case 4:
 		return __cmpxchg_asm("ll", "sc", (volatile u32 *)ptr, old, new);
 
diff --git a/arch/mips/kernel/cmpxchg.c b/arch/mips/kernel/cmpxchg.c
index 5acfbf9fb2c5..7730f1d3434f 100644
--- a/arch/mips/kernel/cmpxchg.c
+++ b/arch/mips/kernel/cmpxchg.c
@@ -50,3 +50,60 @@ unsigned long __xchg_small(volatile void *ptr, unsigned long val, unsigned int s
 
 	return (load32 & mask) >> shift;
 }
+
+unsigned long __cmpxchg_small(volatile void *ptr, unsigned long old,
+			      unsigned long new, unsigned int size)
+{
+	u32 mask, old32, new32, load32;
+	volatile u32 *ptr32;
+	unsigned int shift;
+	u8 load;
+
+	/* Check that ptr is naturally aligned */
+	WARN_ON((unsigned long)ptr & (size - 1));
+
+	/* Mask inputs to the correct size. */
+	mask = GENMASK((size * BITS_PER_BYTE) - 1, 0);
+	old &= mask;
+	new &= mask;
+
+	/*
+	 * Calculate a shift & mask that correspond to the value we wish to
+	 * compare & exchange within the naturally aligned 4 byte integer
+	 * that includes it.
+	 */
+	shift = (unsigned long)ptr & 0x3;
+	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+		shift ^= sizeof(u32) - size;
+	shift *= BITS_PER_BYTE;
+	mask <<= shift;
+
+	/*
+	 * Calculate a pointer to the naturally aligned 4 byte integer that
+	 * includes our byte of interest, and load its value.
+	 */
+	ptr32 = (volatile u32 *)((unsigned long)ptr & ~0x3);
+	load32 = *ptr32;
+
+	while (true) {
+		/*
+		 * Ensure the byte we want to exchange matches the expected
+		 * old value, and if not then bail.
+		 */
+		load = (load32 & mask) >> shift;
+		if (load != old)
+			return load;
+
+		/*
+		 * Calculate the old & new values of the naturally aligned
+		 * 4 byte integer that include the byte we want to exchange.
+		 * Attempt to exchange the old value for the new value, and
+		 * return if we succeed.
+		 */
+		old32 = (load32 & ~mask) | (old << shift);
+		new32 = (load32 & ~mask) | (new << shift);
+		load32 = cmpxchg(ptr32, old32, new32);
+		if (load32 == old32)
+			return old;
+	}
+}
-- 
2.13.1

* [PATCH 09/11] MIPS: cmpxchg: Rearrange __xchg() arguments to match xchg()
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

The __xchg() function declares its first 2 arguments in reverse order
compared to the xchg() macro, which is confusing & serves no purpose.
Reorder the arguments such that __xchg() & xchg() match.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/include/asm/cmpxchg.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index a633f61c5545..903f3bf48419 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -73,7 +73,8 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 extern unsigned long __xchg_small(volatile void *ptr, unsigned long val,
 				  unsigned int size);
 
-static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int size)
+static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
+				   int size)
 {
 	switch (size) {
 	case 1:
@@ -101,7 +102,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
 	smp_mb__before_llsc();						\
 									\
 	__res = (__typeof__(*(ptr)))					\
-		__xchg((unsigned long)(x), (ptr), sizeof(*(ptr)));	\
+		__xchg((ptr), (unsigned long)(x), sizeof(*(ptr)));	\
 									\
 	smp_llsc_mb();							\
 									\
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/11] MIPS: Use queued read/write locks (qrwlock)
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

This patch switches MIPS to make use of generically implemented queued
read/write locks, rather than the custom implementation used previously.
This allows us to drop a whole load of inline assembly, share more
generic code, and is also a performance win.

Results from running the AIM7 short workload on a MIPS Creator Ci40 (i.e. a
2 core 2 thread interAptiv CPU clocked at 546MHz) with v4.12-rc4
pistachio_defconfig, with ftrace disabled due to a current bug, and both
with & without use of queued rwlocks & spinlocks:

  Forks | v4.12-rc4 | +qlocks  | Change
 -------|-----------|----------|--------
     10 | 52630.32  | 53316.31 | +1.01%
     20 | 51777.80  | 52623.15 | +1.02%
     30 | 51645.92  | 52517.26 | +1.02%
     40 | 51634.88  | 52419.89 | +1.02%
     50 | 51506.75  | 52307.81 | +1.02%
     60 | 51500.74  | 52322.72 | +1.02%
     70 | 51434.81  | 52288.60 | +1.02%
     80 | 51423.22  | 52434.85 | +1.02%
     90 | 51428.65  | 52410.10 | +1.02%

The kernels used for these tests also had my "MIPS: Hardcode cpu_has_*
where known at compile time due to ISA" patch applied, which allows the
kernel_uses_llsc checks in cmpxchg() & xchg() to be optimised away at
compile time.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/Kconfig                      |   1 +
 arch/mips/include/asm/Kbuild           |   1 +
 arch/mips/include/asm/spinlock.h       | 216 +--------------------------------
 arch/mips/include/asm/spinlock_types.h |  10 +-
 4 files changed, 4 insertions(+), 224 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2828ecde133d..50c71273f569 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -70,6 +70,7 @@ config MIPS
 	select HAVE_EXIT_THREAD
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_COPY_THREAD_TLS
+	select ARCH_USE_QUEUED_RWLOCKS
 
 menu "Machine selection"
 
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 2535c7b4c482..ae6cb47e9d22 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,6 +12,7 @@ generic-y += mm-arch-hooks.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += qrwlock.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index a8df44d60607..3e7afff196cd 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -13,6 +13,7 @@
 
 #include <asm/barrier.h>
 #include <asm/processor.h>
+#include <asm/qrwlock.h>
 #include <asm/compiler.h>
 #include <asm/war.h>
 
@@ -220,221 +221,6 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
 
-/*
- * Read-write spinlocks, allowing multiple readers but only one writer.
- *
- * NOTE! it is quite common to have readers in interrupts but no interrupt
- * writers. For those circumstances we can "mix" irq-safe locks - any writer
- * needs to get a irq-safe write-lock, but readers can get non-irqsafe
- * read-locks.
- */
-
-/*
- * read_can_lock - would read_trylock() succeed?
- * @lock: the rwlock in question.
- */
-#define arch_read_can_lock(rw)	((rw)->lock >= 0)
-
-/*
- * write_can_lock - would write_trylock() succeed?
- * @lock: the rwlock in question.
- */
-#define arch_write_can_lock(rw) (!(rw)->lock)
-
-static inline void arch_read_lock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	noreorder	# arch_read_lock	\n"
-		"1:	ll	%1, %2					\n"
-		"	bltz	%1, 1b					\n"
-		"	 addu	%1, 1					\n"
-		"	sc	%1, %0					\n"
-		"	beqzl	%1, 1b					\n"
-		"	 nop						\n"
-		"	.set	reorder					\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	} else {
-		do {
-			__asm__ __volatile__(
-			"1:	ll	%1, %2	# arch_read_lock	\n"
-			"	bltz	%1, 1b				\n"
-			"	 addu	%1, 1				\n"
-			"2:	sc	%1, %0				\n"
-			: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-			: GCC_OFF_SMALL_ASM() (rw->lock)
-			: "memory");
-		} while (unlikely(!tmp));
-	}
-
-	smp_llsc_mb();
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-
-	smp_mb__before_llsc();
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"1:	ll	%1, %2		# arch_read_unlock	\n"
-		"	addiu	%1, -1					\n"
-		"	sc	%1, %0					\n"
-		"	beqzl	%1, 1b					\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	} else {
-		do {
-			__asm__ __volatile__(
-			"1:	ll	%1, %2	# arch_read_unlock	\n"
-			"	addiu	%1, -1				\n"
-			"	sc	%1, %0				\n"
-			: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-			: GCC_OFF_SMALL_ASM() (rw->lock)
-			: "memory");
-		} while (unlikely(!tmp));
-	}
-}
-
-static inline void arch_write_lock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	noreorder	# arch_write_lock	\n"
-		"1:	ll	%1, %2					\n"
-		"	bnez	%1, 1b					\n"
-		"	 lui	%1, 0x8000				\n"
-		"	sc	%1, %0					\n"
-		"	beqzl	%1, 1b					\n"
-		"	 nop						\n"
-		"	.set	reorder					\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	} else {
-		do {
-			__asm__ __volatile__(
-			"1:	ll	%1, %2	# arch_write_lock	\n"
-			"	bnez	%1, 1b				\n"
-			"	 lui	%1, 0x8000			\n"
-			"2:	sc	%1, %0				\n"
-			: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp)
-			: GCC_OFF_SMALL_ASM() (rw->lock)
-			: "memory");
-		} while (unlikely(!tmp));
-	}
-
-	smp_llsc_mb();
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *rw)
-{
-	smp_mb__before_llsc();
-
-	__asm__ __volatile__(
-	"				# arch_write_unlock	\n"
-	"	sw	$0, %0					\n"
-	: "=m" (rw->lock)
-	: "m" (rw->lock)
-	: "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-	int ret;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	noreorder	# arch_read_trylock	\n"
-		"	li	%2, 0					\n"
-		"1:	ll	%1, %3					\n"
-		"	bltz	%1, 2f					\n"
-		"	 addu	%1, 1					\n"
-		"	sc	%1, %0					\n"
-		"	.set	reorder					\n"
-		"	beqzl	%1, 1b					\n"
-		"	 nop						\n"
-		__WEAK_LLSC_MB
-		"	li	%2, 1					\n"
-		"2:							\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp), "=&r" (ret)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	} else {
-		__asm__ __volatile__(
-		"	.set	noreorder	# arch_read_trylock	\n"
-		"	li	%2, 0					\n"
-		"1:	ll	%1, %3					\n"
-		"	bltz	%1, 2f					\n"
-		"	 addu	%1, 1					\n"
-		"	sc	%1, %0					\n"
-		"	beqz	%1, 1b					\n"
-		"	 nop						\n"
-		"	.set	reorder					\n"
-		__WEAK_LLSC_MB
-		"	li	%2, 1					\n"
-		"2:	.insn						\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp), "=&r" (ret)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	}
-
-	return ret;
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-	int ret;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__(
-		"	.set	noreorder	# arch_write_trylock	\n"
-		"	li	%2, 0					\n"
-		"1:	ll	%1, %3					\n"
-		"	bnez	%1, 2f					\n"
-		"	 lui	%1, 0x8000				\n"
-		"	sc	%1, %0					\n"
-		"	beqzl	%1, 1b					\n"
-		"	 nop						\n"
-		__WEAK_LLSC_MB
-		"	li	%2, 1					\n"
-		"	.set	reorder					\n"
-		"2:							\n"
-		: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp), "=&r" (ret)
-		: GCC_OFF_SMALL_ASM() (rw->lock)
-		: "memory");
-	} else {
-		do {
-			__asm__ __volatile__(
-			"	ll	%1, %3	# arch_write_trylock	\n"
-			"	li	%2, 0				\n"
-			"	bnez	%1, 2f				\n"
-			"	lui	%1, 0x8000			\n"
-			"	sc	%1, %0				\n"
-			"	li	%2, 1				\n"
-			"2:	.insn					\n"
-			: "=" GCC_OFF_SMALL_ASM() (rw->lock), "=&r" (tmp),
-			  "=&r" (ret)
-			: GCC_OFF_SMALL_ASM() (rw->lock)
-			: "memory");
-		} while (unlikely(!tmp));
-
-		smp_llsc_mb();
-	}
-
-	return ret;
-}
-
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
 
diff --git a/arch/mips/include/asm/spinlock_types.h b/arch/mips/include/asm/spinlock_types.h
index 9b2528e612c0..3d38bfad9b49 100644
--- a/arch/mips/include/asm/spinlock_types.h
+++ b/arch/mips/include/asm/spinlock_types.h
@@ -1,10 +1,6 @@
 #ifndef _ASM_SPINLOCK_TYPES_H
 #define _ASM_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
 #include <linux/types.h>
 
 #include <asm/byteorder.h>
@@ -28,10 +24,6 @@ typedef union {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ .lock = 0 }
 
-typedef struct {
-	volatile unsigned int lock;
-} arch_rwlock_t;
-
-#define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
+#include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread
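
The conversion above only swaps out the arch-level implementation; users of
the generic rwlock API are unaffected. As a reminder of what that API looks
like, here is a minimal kernel-style usage sketch (the protected table is
hypothetical, not code from this series):

#include <linux/spinlock.h>

/* Hypothetical table protected by a reader/writer lock */
static DEFINE_RWLOCK(table_lock);
static int table[16];

int table_read(int idx)
{
	int val;

	read_lock(&table_lock);		/* many readers may hold the lock */
	val = table[idx];
	read_unlock(&table_lock);

	return val;
}

void table_update(int idx, int val)
{
	write_lock(&table_lock);	/* excludes both readers & writers */
	table[idx] = val;
	write_unlock(&table_lock);
}

With ARCH_USE_QUEUED_RWLOCKS selected, these calls resolve to the generic
queued implementation (asm-generic/qrwlock.h & kernel/locking/qrwlock.c)
rather than the MIPS inline assembly removed above.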

* [PATCH 11/11] MIPS: Use queued spinlocks (qspinlock)
@ 2017-06-10  0:26   ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-10  0:26 UTC (permalink / raw)
  To: linux-mips; +Cc: Ralf Baechle, Paul Burton

This patch switches MIPS to make use of generically implemented queued
spinlocks, rather than the ticket spinlocks used previously. This allows
us to drop a whole load of inline assembly, share more generic code, and
is also a performance win.

Results from running the AIM7 short workload on a MIPS Creator Ci40 (i.e. a
2 core 2 thread interAptiv CPU clocked at 546MHz) with v4.12-rc4
pistachio_defconfig, with ftrace disabled due to a current bug, and both
with & without use of queued rwlocks & spinlocks:

  Forks | v4.12-rc4 | +qlocks  | Change
 -------|-----------|----------|--------
     10 | 52630.32  | 53316.31 | +1.01%
     20 | 51777.80  | 52623.15 | +1.02%
     30 | 51645.92  | 52517.26 | +1.02%
     40 | 51634.88  | 52419.89 | +1.02%
     50 | 51506.75  | 52307.81 | +1.02%
     60 | 51500.74  | 52322.72 | +1.02%
     70 | 51434.81  | 52288.60 | +1.02%
     80 | 51423.22  | 52434.85 | +1.02%
     90 | 51428.65  | 52410.10 | +1.02%

The kernels used for these tests also had my "MIPS: Hardcode cpu_has_*
where known at compile time due to ISA" patch applied, which allows the
kernel_uses_llsc checks in cmpxchg() & xchg() to be optimised away at
compile time.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org

---

 arch/mips/Kconfig                      |   1 +
 arch/mips/include/asm/Kbuild           |   1 +
 arch/mips/include/asm/spinlock.h       | 210 +--------------------------------
 arch/mips/include/asm/spinlock_types.h |  24 +---
 4 files changed, 4 insertions(+), 232 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 50c71273f569..398f0a55d4fa 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -71,6 +71,7 @@ config MIPS
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_COPY_THREAD_TLS
 	select ARCH_USE_QUEUED_RWLOCKS
+	select ARCH_USE_QUEUED_SPINLOCKS
 
 menu "Machine selection"
 
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index ae6cb47e9d22..7c8aab23bce8 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += qrwlock.h
+generic-y += qspinlock.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index 3e7afff196cd..a7d21da16b6a 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -9,217 +9,9 @@
 #ifndef _ASM_SPINLOCK_H
 #define _ASM_SPINLOCK_H
 
-#include <linux/compiler.h>
-
-#include <asm/barrier.h>
 #include <asm/processor.h>
 #include <asm/qrwlock.h>
-#include <asm/compiler.h>
-#include <asm/war.h>
-
-/*
- * Your basic SMP spinlocks, allowing only a single CPU anywhere
- *
- * Simple spin lock operations.	 There are two variants, one clears IRQ's
- * on the local processor, one does not.
- *
- * These are fair FIFO ticket locks
- *
- * (the type definitions are in asm/spinlock_types.h)
- */
-
-
-/*
- * Ticket locks are conceptually two parts, one indicating the current head of
- * the queue, and the other indicating the current tail. The lock is acquired
- * by atomically noting the tail and incrementing it by one (thus adding
- * ourself to the queue and noting our position), then waiting until the head
- * becomes equal to the the initial value of the tail.
- */
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	u32 counters = ACCESS_ONCE(lock->lock);
-
-	return ((counters >> 16) ^ counters) & 0xffff;
-}
-
-static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	return lock.h.serving_now == lock.h.ticket;
-}
-
-#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
-
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
-	u16 owner = READ_ONCE(lock->h.serving_now);
-	smp_rmb();
-	for (;;) {
-		arch_spinlock_t tmp = READ_ONCE(*lock);
-
-		if (tmp.h.serving_now == tmp.h.ticket ||
-		    tmp.h.serving_now != owner)
-			break;
-
-		cpu_relax();
-	}
-	smp_acquire__after_ctrl_dep();
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	u32 counters = ACCESS_ONCE(lock->lock);
-
-	return (((counters >> 16) - counters) & 0xffff) > 1;
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	int my_ticket;
-	int tmp;
-	int inc = 0x10000;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__ (
-		"	.set push		# arch_spin_lock	\n"
-		"	.set noreorder					\n"
-		"							\n"
-		"1:	ll	%[ticket], %[ticket_ptr]		\n"
-		"	addu	%[my_ticket], %[ticket], %[inc]		\n"
-		"	sc	%[my_ticket], %[ticket_ptr]		\n"
-		"	beqzl	%[my_ticket], 1b			\n"
-		"	 nop						\n"
-		"	srl	%[my_ticket], %[ticket], 16		\n"
-		"	andi	%[ticket], %[ticket], 0xffff		\n"
-		"	bne	%[ticket], %[my_ticket], 4f		\n"
-		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
-		"2:							\n"
-		"	.subsection 2					\n"
-		"4:	andi	%[ticket], %[ticket], 0xffff		\n"
-		"	sll	%[ticket], 5				\n"
-		"							\n"
-		"6:	bnez	%[ticket], 6b				\n"
-		"	 subu	%[ticket], 1				\n"
-		"							\n"
-		"	lhu	%[ticket], %[serving_now_ptr]		\n"
-		"	beq	%[ticket], %[my_ticket], 2b		\n"
-		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
-		"	b	4b					\n"
-		"	 subu	%[ticket], %[ticket], 1			\n"
-		"	.previous					\n"
-		"	.set pop					\n"
-		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
-		  [serving_now_ptr] "+m" (lock->h.serving_now),
-		  [ticket] "=&r" (tmp),
-		  [my_ticket] "=&r" (my_ticket)
-		: [inc] "r" (inc));
-	} else {
-		__asm__ __volatile__ (
-		"	.set push		# arch_spin_lock	\n"
-		"	.set noreorder					\n"
-		"							\n"
-		"1:	ll	%[ticket], %[ticket_ptr]		\n"
-		"	addu	%[my_ticket], %[ticket], %[inc]		\n"
-		"	sc	%[my_ticket], %[ticket_ptr]		\n"
-		"	beqz	%[my_ticket], 1b			\n"
-		"	 srl	%[my_ticket], %[ticket], 16		\n"
-		"	andi	%[ticket], %[ticket], 0xffff		\n"
-		"	bne	%[ticket], %[my_ticket], 4f		\n"
-		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
-		"2:	.insn						\n"
-		"	.subsection 2					\n"
-		"4:	andi	%[ticket], %[ticket], 0xffff		\n"
-		"	sll	%[ticket], 5				\n"
-		"							\n"
-		"6:	bnez	%[ticket], 6b				\n"
-		"	 subu	%[ticket], 1				\n"
-		"							\n"
-		"	lhu	%[ticket], %[serving_now_ptr]		\n"
-		"	beq	%[ticket], %[my_ticket], 2b		\n"
-		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
-		"	b	4b					\n"
-		"	 subu	%[ticket], %[ticket], 1			\n"
-		"	.previous					\n"
-		"	.set pop					\n"
-		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
-		  [serving_now_ptr] "+m" (lock->h.serving_now),
-		  [ticket] "=&r" (tmp),
-		  [my_ticket] "=&r" (my_ticket)
-		: [inc] "r" (inc));
-	}
-
-	smp_llsc_mb();
-}
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	unsigned int serving_now = lock->h.serving_now + 1;
-	wmb();
-	lock->h.serving_now = (u16)serving_now;
-	nudge_writes();
-}
-
-static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	int tmp, tmp2, tmp3;
-	int inc = 0x10000;
-
-	if (R10000_LLSC_WAR) {
-		__asm__ __volatile__ (
-		"	.set push		# arch_spin_trylock	\n"
-		"	.set noreorder					\n"
-		"							\n"
-		"1:	ll	%[ticket], %[ticket_ptr]		\n"
-		"	srl	%[my_ticket], %[ticket], 16		\n"
-		"	andi	%[now_serving], %[ticket], 0xffff	\n"
-		"	bne	%[my_ticket], %[now_serving], 3f	\n"
-		"	 addu	%[ticket], %[ticket], %[inc]		\n"
-		"	sc	%[ticket], %[ticket_ptr]		\n"
-		"	beqzl	%[ticket], 1b				\n"
-		"	 li	%[ticket], 1				\n"
-		"2:							\n"
-		"	.subsection 2					\n"
-		"3:	b	2b					\n"
-		"	 li	%[ticket], 0				\n"
-		"	.previous					\n"
-		"	.set pop					\n"
-		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
-		  [ticket] "=&r" (tmp),
-		  [my_ticket] "=&r" (tmp2),
-		  [now_serving] "=&r" (tmp3)
-		: [inc] "r" (inc));
-	} else {
-		__asm__ __volatile__ (
-		"	.set push		# arch_spin_trylock	\n"
-		"	.set noreorder					\n"
-		"							\n"
-		"1:	ll	%[ticket], %[ticket_ptr]		\n"
-		"	srl	%[my_ticket], %[ticket], 16		\n"
-		"	andi	%[now_serving], %[ticket], 0xffff	\n"
-		"	bne	%[my_ticket], %[now_serving], 3f	\n"
-		"	 addu	%[ticket], %[ticket], %[inc]		\n"
-		"	sc	%[ticket], %[ticket_ptr]		\n"
-		"	beqz	%[ticket], 1b				\n"
-		"	 li	%[ticket], 1				\n"
-		"2:	.insn						\n"
-		"	.subsection 2					\n"
-		"3:	b	2b					\n"
-		"	 li	%[ticket], 0				\n"
-		"	.previous					\n"
-		"	.set pop					\n"
-		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
-		  [ticket] "=&r" (tmp),
-		  [my_ticket] "=&r" (tmp2),
-		  [now_serving] "=&r" (tmp3)
-		: [inc] "r" (inc));
-	}
-
-	smp_llsc_mb();
-
-	return tmp;
-}
+#include <asm/qspinlock.h>
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
diff --git a/arch/mips/include/asm/spinlock_types.h b/arch/mips/include/asm/spinlock_types.h
index 3d38bfad9b49..177e722eb96c 100644
--- a/arch/mips/include/asm/spinlock_types.h
+++ b/arch/mips/include/asm/spinlock_types.h
@@ -1,29 +1,7 @@
 #ifndef _ASM_SPINLOCK_TYPES_H
 #define _ASM_SPINLOCK_TYPES_H
 
-#include <linux/types.h>
-
-#include <asm/byteorder.h>
-
-typedef union {
-	/*
-	 * bits	 0..15 : serving_now
-	 * bits 16..31 : ticket
-	 */
-	u32 lock;
-	struct {
-#ifdef __BIG_ENDIAN
-		u16 ticket;
-		u16 serving_now;
-#else
-		u16 serving_now;
-		u16 ticket;
-#endif
-	} h;
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ .lock = 0 }
-
+#include <asm-generic/qspinlock_types.h>
 #include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread
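
The block comments deleted above describe the ticket scheme being replaced:
take a ticket by atomically incrementing the tail, then spin until the head
("serving_now") reaches that ticket. As a purely illustrative aid, a toy
user-space model of that idea using C11 atomics (not the kernel code, and
not the qspinlock that replaces it) looks like this:

#include <stdatomic.h>

struct ticket_lock {
	atomic_ushort next_ticket;	/* tail: next ticket to hand out */
	atomic_ushort serving_now;	/* head: ticket currently served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Atomically note & increment the tail, i.e. take a ticket */
	unsigned short me = atomic_fetch_add(&l->next_ticket, 1);

	/* Wait until the head catches up with our ticket */
	while (atomic_load(&l->serving_now) != me)
		;	/* spin; the removed asm also inserts a delay here */
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add(&l->serving_now, 1);
}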

* Re: [PATCH 01/11] MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases
  2017-06-10  0:26   ` Paul Burton
  (?)
@ 2017-06-10 23:25   ` Joshua Kinard
  2017-06-12 19:02       ` Paul Burton
  -1 siblings, 1 reply; 32+ messages in thread
From: Joshua Kinard @ 2017-06-10 23:25 UTC (permalink / raw)
  To: Paul Burton, linux-mips; +Cc: Ralf Baechle

On 06/09/2017 20:26, Paul Burton wrote:
> Prior to this patch the xchg & cmpxchg functions have duplicated code
> which is for all intents & purposes identical apart from use of a
> branch-likely instruction in the R10000_LLSC_WAR case & a regular branch
> instruction in the non-R10000_LLSC_WAR case.
> 
> This patch removes the duplication, declaring a __scbeqz macro to select
> the branch instruction suitable for use when checking the result of an
> sc instruction & making use of it to unify the 2 cases.
> 
> In __xchg_u{32,64}() this means writing the branch in asm, where it was
> previously being done in C as a do...while loop for the
> non-R10000_LLSC_WAR case. As this is a single instruction, and adds
> consistency with the R10000_LLSC_WAR cases & the cmpxchg() code, this
> seems worthwhile.

IMHO, a good cleanup.  I'll test shortly on my Octane once I get some things
moved around and a new kernel tree set up.  That said, there are a number of
locations where R10000_LLSC_WAR is used in this manner, and I think this change
should be broken out into its own patch series, with the macro that selects
a normal branch over branch-likely placed into a main MIPS header file
somewhere, and the same change made to all locations at once.  That'll give
some consistency to everything.
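
A sketch of the kind of shared header being suggested (the file name and
macro name here are hypothetical, not something spelled out in the thread)
might look like:

/* arch/mips/include/asm/llsc.h -- hypothetical sketch only */
#ifndef __ASM_LLSC_H
#define __ASM_LLSC_H

#include <asm/war.h>

/*
 * Branch used to test the result of an sc/scd instruction: a branch-likely
 * on CPUs needing the R10000 errata workaround, a plain branch otherwise.
 */
#if R10000_LLSC_WAR
# define __SC_BEQZ	"beqzl	"
#else
# define __SC_BEQZ	"beqz	"
#endif

#endif /* __ASM_LLSC_H */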

I don't think any of my SGI systems have the specific R10K revision that's
affected.  I've been building kernels for a while now with a patch that splits
the R10K CPU selection up into vanilla R10K and then R12K/R14K/R16K, and uses
the non-R10000_LLSC_WAR path for R12K and up, as many of my SGI systems have at
least R12K processors.  My IP28 might have the older, affected R10K, but I
haven't tested that out since 4.4 (it didn't survive for very long, either).

I also wonder if "__scbeqz" is a descriptive-enough name.  It works in this
case because the macro definition is at the top, but if moved to a different
header file, perhaps something more like "__r10k_b_insn" or such may clue
casual readers in some more?

--J


> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: linux-mips@linux-mips.org
> ---
> 
>  arch/mips/include/asm/cmpxchg.h | 80 ++++++++++++-----------------------------
>  1 file changed, 22 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
> index b71ab4a5fd50..0582c01d229d 100644
> --- a/arch/mips/include/asm/cmpxchg.h
> +++ b/arch/mips/include/asm/cmpxchg.h
> @@ -13,44 +13,38 @@
>  #include <asm/compiler.h>
>  #include <asm/war.h>
>  
> +/*
> + * Using a branch-likely instruction to check the result of an sc instruction
> + * works around a bug present in R10000 CPUs prior to revision 3.0 that could
> + * cause ll-sc sequences to execute non-atomically.
> + */
> +#if R10000_LLSC_WAR
> +# define __scbeqz "beqzl"
> +#else
> +# define __scbeqz "beqz"
> +#endif
> +
>  static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
>  {
>  	__u32 retval;
>  
>  	smp_mb__before_llsc();
>  
> -	if (kernel_uses_llsc && R10000_LLSC_WAR) {
> +	if (kernel_uses_llsc) {
>  		unsigned long dummy;
>  
>  		__asm__ __volatile__(
> -		"	.set	arch=r4000				\n"
> +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
>  		"1:	ll	%0, %3			# xchg_u32	\n"
>  		"	.set	mips0					\n"
>  		"	move	%2, %z4					\n"
> -		"	.set	arch=r4000				\n"
> +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
>  		"	sc	%2, %1					\n"
> -		"	beqzl	%2, 1b					\n"
> +		"\t" __scbeqz "	%2, 1b					\n"
>  		"	.set	mips0					\n"
>  		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
>  		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
>  		: "memory");
> -	} else if (kernel_uses_llsc) {
> -		unsigned long dummy;
> -
> -		do {
> -			__asm__ __volatile__(
> -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> -			"	ll	%0, %3		# xchg_u32	\n"
> -			"	.set	mips0				\n"
> -			"	move	%2, %z4				\n"
> -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> -			"	sc	%2, %1				\n"
> -			"	.set	mips0				\n"
> -			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
> -			  "=&r" (dummy)
> -			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> -			: "memory");
> -		} while (unlikely(!dummy));
>  	} else {
>  		unsigned long flags;
>  
> @@ -72,34 +66,19 @@ static inline __u64 __xchg_u64(volatile __u64 * m, __u64 val)
>  
>  	smp_mb__before_llsc();
>  
> -	if (kernel_uses_llsc && R10000_LLSC_WAR) {
> +	if (kernel_uses_llsc) {
>  		unsigned long dummy;
>  
>  		__asm__ __volatile__(
> -		"	.set	arch=r4000				\n"
> +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
>  		"1:	lld	%0, %3			# xchg_u64	\n"
>  		"	move	%2, %z4					\n"
>  		"	scd	%2, %1					\n"
> -		"	beqzl	%2, 1b					\n"
> +		"\t" __scbeqz "	%2, 1b					\n"
>  		"	.set	mips0					\n"
>  		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
>  		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
>  		: "memory");
> -	} else if (kernel_uses_llsc) {
> -		unsigned long dummy;
> -
> -		do {
> -			__asm__ __volatile__(
> -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> -			"	lld	%0, %3		# xchg_u64	\n"
> -			"	move	%2, %z4				\n"
> -			"	scd	%2, %1				\n"
> -			"	.set	mips0				\n"
> -			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
> -			  "=&r" (dummy)
> -			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> -			: "memory");
> -		} while (unlikely(!dummy));
>  	} else {
>  		unsigned long flags;
>  
> @@ -142,24 +121,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
>  ({									\
>  	__typeof(*(m)) __ret;						\
>  									\
> -	if (kernel_uses_llsc && R10000_LLSC_WAR) {			\
> -		__asm__ __volatile__(					\
> -		"	.set	push				\n"	\
> -		"	.set	noat				\n"	\
> -		"	.set	arch=r4000			\n"	\
> -		"1:	" ld "	%0, %2		# __cmpxchg_asm \n"	\
> -		"	bne	%0, %z3, 2f			\n"	\
> -		"	.set	mips0				\n"	\
> -		"	move	$1, %z4				\n"	\
> -		"	.set	arch=r4000			\n"	\
> -		"	" st "	$1, %1				\n"	\
> -		"	beqzl	$1, 1b				\n"	\
> -		"2:						\n"	\
> -		"	.set	pop				\n"	\
> -		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
> -		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
> -		: "memory");						\
> -	} else if (kernel_uses_llsc) {					\
> +	if (kernel_uses_llsc) {						\
>  		__asm__ __volatile__(					\
>  		"	.set	push				\n"	\
>  		"	.set	noat				\n"	\
> @@ -170,7 +132,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
>  		"	move	$1, %z4				\n"	\
>  		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
>  		"	" st "	$1, %1				\n"	\
> -		"	beqz	$1, 1b				\n"	\
> +		"\t" __scbeqz "	$1, 1b				\n"	\
>  		"	.set	pop				\n"	\
>  		"2:						\n"	\
>  		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
> @@ -245,4 +207,6 @@ extern void __cmpxchg_called_with_bad_pointer(void);
>  #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
>  #endif
>  
> +#undef __scbeqz
> +
>  #endif /* __ASM_CMPXCHG_H */
> 


-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks
  2017-06-10  0:26 ` Paul Burton
                   ` (11 preceding siblings ...)
  (?)
@ 2017-06-12  8:27 ` Ralf Baechle
  2017-06-12 22:47     ` Paul Burton
  -1 siblings, 1 reply; 32+ messages in thread
From: Ralf Baechle @ 2017-06-12  8:27 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-mips

On Fri, Jun 09, 2017 at 05:26:32PM -0700, Paul Burton wrote:

> This series makes a bunch of cleanups & improvements to the cmpxchg() &
> xchg() macros & functions, allowing them to be used on values smaller
> than 4 bytes, then switches MIPS over to use generic queued spinlocks &
> queued read/write locks.

A number of nice cleanups there!

I'm wondering, have you tested the kernel size with and without this
series applied?  GCC has claimed for 25 years or so that inlines are as
efficient as macros, but in reality macros have always been superior,
which matters for things that are expanded very often.

More recent GCCs have claimed improvements, so it'd be interesting to see
actual numbers - and possibly get rid of many more unmaintainable macros.

  Ralf

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/11] MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases
@ 2017-06-12 19:02       ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-12 19:02 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: linux-mips, Ralf Baechle

[-- Attachment #1: Type: text/plain, Size: 7915 bytes --]

Hi Joshua,

On Saturday, 10 June 2017 16:25:30 PDT Joshua Kinard wrote:
> On 06/09/2017 20:26, Paul Burton wrote:
> > Prior to this patch the xchg & cmpxchg functions have duplicated code
> > which is for all intents & purposes identical apart from use of a
> > branch-likely instruction in the R10000_LLSC_WAR case & a regular branch
> > instruction in the non-R10000_LLSC_WAR case.
> > 
> > This patch removes the duplication, declaring a __scbeqz macro to select
> > the branch instruction suitable for use when checking the result of an
> > sc instruction & making use of it to unify the 2 cases.
> > 
> > In __xchg_u{32,64}() this means writing the branch in asm, where it was
> > previously being done in C as a do...while loop for the
> > non-R10000_LLSC_WAR case. As this is a single instruction, and adds
> > consistency with the R10000_LLSC_WAR cases & the cmpxchg() code, this
> > seems worthwhile.
> 
> IMHO, a good cleanup.  I'll test shortly on my Octane once I get some things
> moved around and a new kernel tree set up.

Thanks :)

> That said, there are a number
> of locations where R10000_LLSC_WAR is used in this manner, and I think this
> change should be broken out into its own patch series, with the macro that
> selects normal branch over branch-likely, placed into a main MIPS header
> file somewhere, and the same change made to all locations at once.  That'll
> give some consistency to everything.

I agree that it might be good to generalise this approach & use it elsewhere 
too, but I'd prefer that happen in further patches so that it doesn't hold up 
the rest of these cmpxchg() & locking changes.

> I don't think any of my SGI systems have the specific R10K revision that's
> affected.  I've been building kernels for a while now with a patch that
> splits the R10K CPU selection up into vanilla R10K, and then
> R12K/R14K/R16K, and using the non-R10000_LLSC_WAR path for R12K and up, as
> many of my SGI systems have at least R12K processors.  My IP28 might have
> the older, affected R10K, but I haven't tested that out since 4.4 (it
> didn't survive for very long, either).
> 
> I also wonder if "__scbeqz" is a descriptive-enough name.  It works in this
> case because the macro definition is at the top, but if moved to a different
> header file, perhaps something more like "__r10k_b_insn" or such may clue
> casual readers in some more?

Hmm, the issue then is that the name is shared for the r10k & non-r10k cases 
so putting r10k in the name seems a bit misleading. __scbeqz at least hints 
that it's the variant of beqz to be used to compare the result of an sc.

Thanks,
    Paul

> 
> --J
> 
> > Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> > Cc: Ralf Baechle <ralf@linux-mips.org>
> > Cc: linux-mips@linux-mips.org
> > ---
> > 
> >  arch/mips/include/asm/cmpxchg.h | 80
> >  ++++++++++++----------------------------- 1 file changed, 22
> >  insertions(+), 58 deletions(-)
> > 
> > diff --git a/arch/mips/include/asm/cmpxchg.h
> > b/arch/mips/include/asm/cmpxchg.h index b71ab4a5fd50..0582c01d229d 100644
> > --- a/arch/mips/include/asm/cmpxchg.h
> > +++ b/arch/mips/include/asm/cmpxchg.h
> > @@ -13,44 +13,38 @@
> > 
> >  #include <asm/compiler.h>
> >  #include <asm/war.h>
> > 
> > +/*
> > + * Using a branch-likely instruction to check the result of an sc
> > instruction + * works around a bug present in R10000 CPUs prior to
> > revision 3.0 that could + * cause ll-sc sequences to execute
> > non-atomically.
> > + */
> > +#if R10000_LLSC_WAR
> > +# define __scbeqz "beqzl"
> > +#else
> > +# define __scbeqz "beqz"
> > +#endif
> > +
> > 
> >  static inline unsigned long __xchg_u32(volatile int * m, unsigned int
> >  val)
> >  {
> >  
> >  	__u32 retval;
> >  	
> >  	smp_mb__before_llsc();
> > 
> > -	if (kernel_uses_llsc && R10000_LLSC_WAR) {
> > +	if (kernel_uses_llsc) {
> > 
> >  		unsigned long dummy;
> >  		
> >  		__asm__ __volatile__(
> > 
> > -		"	.set	arch=r4000				\n"
> > +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
> > 
> >  		"1:	ll	%0, %3			# xchg_u32	\n"
> >  		"	.set	mips0					\n"
> >  		"	move	%2, %z4					\n"
> > 
> > -		"	.set	arch=r4000				\n"
> > +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
> > 
> >  		"	sc	%2, %1					\n"
> > 
> > -		"	beqzl	%2, 1b					\n"
> > +		"\t" __scbeqz "	%2, 1b					\n"
> > 
> >  		"	.set	mips0					\n"
> >  		
> >  		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
> >  		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> >  		: "memory");
> > 
> > -	} else if (kernel_uses_llsc) {
> > -		unsigned long dummy;
> > -
> > -		do {
> > -			__asm__ __volatile__(
> > -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> > -			"	ll	%0, %3		# xchg_u32	\n"
> > -			"	.set	mips0				\n"
> > -			"	move	%2, %z4				\n"
> > -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> > -			"	sc	%2, %1				\n"
> > -			"	.set	mips0				\n"
> > -			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
> > -			  "=&r" (dummy)
> > -			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> > -			: "memory");
> > -		} while (unlikely(!dummy));
> > 
> >  	} else {
> >  	
> >  		unsigned long flags;
> > 
> > @@ -72,34 +66,19 @@ static inline __u64 __xchg_u64(volatile __u64 * m, __u64 val)
> > 
> >  	smp_mb__before_llsc();
> > 
> > -	if (kernel_uses_llsc && R10000_LLSC_WAR) {
> > +	if (kernel_uses_llsc) {
> > 
> >  		unsigned long dummy;
> >  		
> >  		__asm__ __volatile__(
> > 
> > -		"	.set	arch=r4000				\n"
> > +		"	.set	" MIPS_ISA_ARCH_LEVEL "			\n"
> > 
> >  		"1:	lld	%0, %3			# xchg_u64	\n"
> >  		"	move	%2, %z4					\n"
> >  		"	scd	%2, %1					\n"
> > 
> > -		"	beqzl	%2, 1b					\n"
> > +		"\t" __scbeqz "	%2, 1b					\n"
> > 
> >  		"	.set	mips0					\n"
> >  		
> >  		: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m), "=&r" (dummy)
> >  		: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> >  		: "memory");
> > 
> > -	} else if (kernel_uses_llsc) {
> > -		unsigned long dummy;
> > -
> > -		do {
> > -			__asm__ __volatile__(
> > -			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
> > -			"	lld	%0, %3		# xchg_u64	\n"
> > -			"	move	%2, %z4				\n"
> > -			"	scd	%2, %1				\n"
> > -			"	.set	mips0				\n"
> > -			: "=&r" (retval), "=" GCC_OFF_SMALL_ASM() (*m),
> > -			  "=&r" (dummy)
> > -			: GCC_OFF_SMALL_ASM() (*m), "Jr" (val)
> > -			: "memory");
> > -		} while (unlikely(!dummy));
> > 
> >  	} else {
> >  	
> >  		unsigned long flags;
> > 
> > @@ -142,24 +121,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
> > 
> >  ({									\
> >  
> >  	__typeof(*(m)) __ret;						\
> >  	
> >  									\
> > 
> > -	if (kernel_uses_llsc && R10000_LLSC_WAR) {			\
> > -		__asm__ __volatile__(					\
> > -		"	.set	push				\n"	\
> > -		"	.set	noat				\n"	\
> > -		"	.set	arch=r4000			\n"	\
> > -		"1:	" ld "	%0, %2		# __cmpxchg_asm \n"	\
> > -		"	bne	%0, %z3, 2f			\n"	\
> > -		"	.set	mips0				\n"	\
> > -		"	move	$1, %z4				\n"	\
> > -		"	.set	arch=r4000			\n"	\
> > -		"	" st "	$1, %1				\n"	\
> > -		"	beqzl	$1, 1b				\n"	\
> > -		"2:						\n"	\
> > -		"	.set	pop				\n"	\
> > -		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
> > -		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
> > -		: "memory");						\
> > -	} else if (kernel_uses_llsc) {					\
> > +	if (kernel_uses_llsc) {						\
> > 
> >  		__asm__ __volatile__(					\
> >  		"	.set	push				\n"	\
> >  		"	.set	noat				\n"	\
> > 
> > @@ -170,7 +132,7 @@ static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int siz
> > 
> >  		"	move	$1, %z4				\n"	\
> >  		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
> >  		"	" st "	$1, %1				\n"	\
> > 
> > -		"	beqz	$1, 1b				\n"	\
> > +		"\t" __scbeqz "	$1, 1b				\n"	\
> > 
> >  		"	.set	pop				\n"	\
> >  		"2:						\n"	\
> >  		
> >  		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
> > 
> > @@ -245,4 +207,6 @@ extern void __cmpxchg_called_with_bad_pointer(void);
> > 
> >  #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n))
> >  #endif
> > 
> > +#undef __scbeqz
> > +
> > 
> >  #endif /* __ASM_CMPXCHG_H */


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks
@ 2017-06-12 22:47     ` Paul Burton
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Burton @ 2017-06-12 22:47 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

[-- Attachment #1: Type: text/plain, Size: 2414 bytes --]

Hi Ralf,

On Monday, 12 June 2017 01:27:42 PDT Ralf Baechle wrote:
> On Fri, Jun 09, 2017 at 05:26:32PM -0700, Paul Burton wrote:
> > This series makes a bunch of cleanups & improvements to the cmpxchg() &
> > xchg() macros & functions, allowing them to be used on values smaller
> > than 4 bytes, then switches MIPS over to use generic queued spinlocks &
> > queued read/write locks.
> 
> A number of nice cleanups there!

Thanks!

> I'm wondering, have you tested the kernel size with and without this
> series applied?  GCC has claimed for 25 years or so that inlines are as
> efficient as macros, but in reality macros have always been superior,
> which mattered for things that are expanded very often.
> 
> More recent GCCs have claimed improvements, so it'd be interesting to see
> actual numbers - and possibly get rid of many more unmaintainable macros.

If I build a pistachio_defconfig v4.12-rc4 kernel, with ftrace disabled (the
same config mentioned in the 2 locking patches) and with my "MIPS: Hardcode
cpu_has_* where known at compile time due to ISA" patch applied in all cases,
then I see:

   Configuration                  |  Size
  --------------------------------|---------
   v4.12-rc4                      | 7161133
   v4.12-rc4 + cmpxchg cleanups   | 7165597
   v4.12-rc4 + whole series       | 7166600

The cmpxchg cleanups row applies patches 1-9 of this series but leaves off the 
2 queued locking patches, so is a direct look at just the cmpxchg/xchg 
changes.

Sizes are as reported by scripts/bloat-o-meter. The toolchain used was 
Codescape 2016.05-06 (gcc 4.9.2, binutils 2.24.90) as found here:

http://codescape-mips-sdk.imgtec.com/components/toolchain/2016.05-06/
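
For reference, the per-symbol deltas quoted below are the kind of output
bloat-o-meter produces when pointed at the two kernel images, along the lines
of:

	./scripts/bloat-o-meter vmlinux.old vmlinux.new

(the vmlinux file names here are just placeholders for the before & after
builds).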

So the cmpxchg patches cost us 4464 bytes, of which __cmpxchg_small() & 
__xchg_small() make up 444 bytes:

function                                     old     new   delta
__cmpxchg_small                                -     236    +236
__xchg_small                                   -     208    +208

The rest is all small changes one way or the other to various functions
throughout the tree, adding up to a little under 4KiB of cost for the cmpxchg()
& xchg() cleanups. Not zero (which actually surprises me..!) but hopefully not
too much.

The generic queued locks then cost us a further ~1KiB but I'd argue offer 
enough benefits to outweigh that (if nothing else look at asm/spinlock.h 
afterwards :p).

Thanks,
    Paul

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks
  2017-06-10  0:26 ` Paul Burton
                   ` (12 preceding siblings ...)
  (?)
@ 2017-06-29 10:28 ` Joshua Kinard
  -1 siblings, 0 replies; 32+ messages in thread
From: Joshua Kinard @ 2017-06-29 10:28 UTC (permalink / raw)
  To: Paul Burton, linux-mips; +Cc: Ralf Baechle

On 06/09/2017 20:26, Paul Burton wrote:
> This series makes a bunch of cleanups & improvements to the cmpxchg() &
> xchg() macros & functions, allowing them to be used on values smaller
> than 4 bytes, then switches MIPS over to use generic queued spinlocks &
> queued read/write locks.
> 
> Applies atop v4.12-rc4.
> 
> Paul Burton (11):
>   MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases
>   MIPS: cmpxchg: Pull xchg() asm into a macro
>   MIPS: cmpxchg: Use __compiletime_error() for bad cmpxchg() pointers
>   MIPS: cmpxchg: Error out on unsupported xchg() calls
>   MIPS: cmpxchg: Drop __xchg_u{32,64} functions
>   MIPS: cmpxchg: Implement __cmpxchg() as a function
>   MIPS: cmpxchg: Implement 1 byte & 2 byte xchg()
>   MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()
>   MIPS: cmpxchg: Rearrange __xchg() arguments to match xchg()
>   MIPS: Use queued read/write locks (qrwlock)
>   MIPS: Use queued spinlocks (qspinlock)
> 
>  arch/mips/Kconfig                      |   2 +
>  arch/mips/include/asm/Kbuild           |   2 +
>  arch/mips/include/asm/cmpxchg.h        | 280 ++++++++++------------
>  arch/mips/include/asm/spinlock.h       | 426 +--------------------------------
>  arch/mips/include/asm/spinlock_types.h |  34 +--
>  arch/mips/kernel/Makefile              |   2 +-
>  arch/mips/kernel/cmpxchg.c             | 109 +++++++++
>  7 files changed, 239 insertions(+), 616 deletions(-)
>  create mode 100644 arch/mips/kernel/cmpxchg.c

FWIW, I've been running this patchset, along with the cpu_has_* patches, on my
SGI Octane for two weeks now, constantly compiling code, and haven't seen any
issues thus far.

Tested-by: Joshua Kinard <kumba@gentoo.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/11] MIPS: Use queued spinlocks (qspinlock)
  2017-06-10  0:26   ` Paul Burton
  (?)
@ 2020-11-06 16:28   ` Alexander Sverdlin
  -1 siblings, 0 replies; 32+ messages in thread
From: Alexander Sverdlin @ 2020-11-06 16:28 UTC (permalink / raw)
  To: Paul Burton, linux-mips; +Cc: Ralf Baechle

Hello Paul and all,

On 10/06/2017 02:26, Paul Burton wrote:
> This patch switches MIPS to make use of generically implemented queued
> spinlocks, rather than the ticket spinlocks used previously. This allows
> us to drop a whole load of inline assembly, share more generic code, and
> is also a performance win.

Unfortunately, this patch made spin_lock() on Octeon two times slower.
Further patches didn't improve the situation, and my measurements show that
currently, in 5.9, the situation is even worse (which I still have to analyze).

I've performed two kinds of test for this particular patch: one based on the
upstream locktorture test, and one using my own kernel module with a tighter
loop around spin_lock() (which I could share on request).

The CONFIG_LOCK_TORTURE_TEST results were obtained with the following parameters:
modprobe locktorture nwriters_stress=6; sleep 62; rmmod locktorture;

Before the patch:
Writes:  Total: 40973197  Max/Min: 0/0   Fail: 0 

After the patch:
Writes:  Total: 34317903  Max/Min: 0/0   Fail: 0

This shows roughly a 16% performance loss.

However, if I test a tighter loop:
        for (i = 0; i < NUM_ITERATIONS; i++) {
                spin_lock(&locktest_spinlock);
                locktest_counter++;
                spin_unlock(&locktest_spinlock);
        }

again with 6 threads, I get 1.499s (for 1M iterations) instead of 0.731s.
This is 105% overhead per lock/unlock pair in comparison to the MIPS-specific
implementation!
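
Since the module itself wasn't posted, here is a minimal sketch of what such a
benchmark could look like, purely for context. Everything beyond the loop shown
above (thread count, iteration count, ktime-based timing and the identifiers)
is an assumption, not the actual test code:

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/completion.h>
#include <linux/atomic.h>
#include <linux/ktime.h>

#define NUM_THREADS	6
#define NUM_ITERATIONS	1000000

static DEFINE_SPINLOCK(locktest_spinlock);
static unsigned long locktest_counter;
static atomic_t locktest_threads_left;
static DECLARE_COMPLETION(locktest_done);

/* Each thread hammers the same spinlock NUM_ITERATIONS times. */
static int locktest_thread(void *unused)
{
	int i;

	for (i = 0; i < NUM_ITERATIONS; i++) {
		spin_lock(&locktest_spinlock);
		locktest_counter++;
		spin_unlock(&locktest_spinlock);
	}

	if (atomic_dec_and_test(&locktest_threads_left))
		complete(&locktest_done);
	return 0;
}

static int __init locktest_init(void)
{
	ktime_t start = ktime_get();
	int i;

	atomic_set(&locktest_threads_left, NUM_THREADS);

	for (i = 0; i < NUM_THREADS; i++)
		kthread_run(locktest_thread, NULL, "locktest/%d", i);

	/* Block insmod until every thread has finished, then report. */
	wait_for_completion(&locktest_done);
	pr_info("locktest: %d threads x %d iterations each: %lld us, counter=%lu\n",
		NUM_THREADS, NUM_ITERATIONS,
		ktime_us_delta(ktime_get(), start), locktest_counter);
	return 0;
}

static void __exit locktest_exit(void)
{
}

module_init(locktest_init);
module_exit(locktest_exit);
MODULE_LICENSE("GPL");

The wall-clock time it reports can then be compared directly between the
ticket-lock and qspinlock kernels on the same hardware.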

What do you think: should I prepare a revert of this conversion to queued
locks, or do you have ideas I might try on Octeon?
I suspect locks other than spin_lock() might be affected as well.

> Results from running the AIM7 short workload on a MIPS Creator Ci40 (ie.
> 2 core 2 thread interAptiv CPU clocked at 546MHz) with v4.12-rc4
> pistachio_defconfig, with ftrace disabled due to a current bug, and both
> with & without use of queued rwlocks & spinlocks:
> 
>   Forks | v4.12-rc4 | +qlocks  | Change
>  -------|-----------|----------|--------
>      10 | 52630.32  | 53316.31 | +1.01%
>      20 | 51777.80  | 52623.15 | +1.02%
>      30 | 51645.92  | 52517.26 | +1.02%
>      40 | 51634.88  | 52419.89 | +1.02%
>      50 | 51506.75  | 52307.81 | +1.02%
>      60 | 51500.74  | 52322.72 | +1.02%
>      70 | 51434.81  | 52288.60 | +1.02%
>      80 | 51423.22  | 52434.85 | +1.02%
>      90 | 51428.65  | 52410.10 | +1.02%
> 
> The kernels used for these tests also had my "MIPS: Hardcode cpu_has_*
> where known at compile time due to ISA" patch applied, which allows the
> kernel_uses_llsc checks in cmpxchg() & xchg() to be optimised away at
> compile time.
> 
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: linux-mips@linux-mips.org
> ---
> 
>  arch/mips/Kconfig                      |   1 +
>  arch/mips/include/asm/Kbuild           |   1 +
>  arch/mips/include/asm/spinlock.h       | 210 +--------------------------------
>  arch/mips/include/asm/spinlock_types.h |  24 +---
>  4 files changed, 4 insertions(+), 232 deletions(-)
> 
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 50c71273f569..398f0a55d4fa 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -71,6 +71,7 @@ config MIPS
>  	select HAVE_REGS_AND_STACK_ACCESS_API
>  	select HAVE_COPY_THREAD_TLS
>  	select ARCH_USE_QUEUED_RWLOCKS
> +	select ARCH_USE_QUEUED_SPINLOCKS
>  
>  menu "Machine selection"
>  
> diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
> index ae6cb47e9d22..7c8aab23bce8 100644
> --- a/arch/mips/include/asm/Kbuild
> +++ b/arch/mips/include/asm/Kbuild
> @@ -13,6 +13,7 @@ generic-y += parport.h
>  generic-y += percpu.h
>  generic-y += preempt.h
>  generic-y += qrwlock.h
> +generic-y += qspinlock.h
>  generic-y += sections.h
>  generic-y += segment.h
>  generic-y += serial.h
> diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
> index 3e7afff196cd..a7d21da16b6a 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -9,217 +9,9 @@
>  #ifndef _ASM_SPINLOCK_H
>  #define _ASM_SPINLOCK_H
>  
> -#include <linux/compiler.h>
> -
> -#include <asm/barrier.h>
>  #include <asm/processor.h>
>  #include <asm/qrwlock.h>
> -#include <asm/compiler.h>
> -#include <asm/war.h>
> -
> -/*
> - * Your basic SMP spinlocks, allowing only a single CPU anywhere
> - *
> - * Simple spin lock operations.	 There are two variants, one clears IRQ's
> - * on the local processor, one does not.
> - *
> - * These are fair FIFO ticket locks
> - *
> - * (the type definitions are in asm/spinlock_types.h)
> - */
> -
> -
> -/*
> - * Ticket locks are conceptually two parts, one indicating the current head of
> - * the queue, and the other indicating the current tail. The lock is acquired
> - * by atomically noting the tail and incrementing it by one (thus adding
> - * ourself to the queue and noting our position), then waiting until the head
> - * becomes equal to the the initial value of the tail.
> - */
> -
> -static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> -{
> -	u32 counters = ACCESS_ONCE(lock->lock);
> -
> -	return ((counters >> 16) ^ counters) & 0xffff;
> -}
> -
> -static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> -{
> -	return lock.h.serving_now == lock.h.ticket;
> -}
> -
> -#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
> -
> -static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> -{
> -	u16 owner = READ_ONCE(lock->h.serving_now);
> -	smp_rmb();
> -	for (;;) {
> -		arch_spinlock_t tmp = READ_ONCE(*lock);
> -
> -		if (tmp.h.serving_now == tmp.h.ticket ||
> -		    tmp.h.serving_now != owner)
> -			break;
> -
> -		cpu_relax();
> -	}
> -	smp_acquire__after_ctrl_dep();
> -}
> -
> -static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> -{
> -	u32 counters = ACCESS_ONCE(lock->lock);
> -
> -	return (((counters >> 16) - counters) & 0xffff) > 1;
> -}
> -#define arch_spin_is_contended	arch_spin_is_contended
> -
> -static inline void arch_spin_lock(arch_spinlock_t *lock)
> -{
> -	int my_ticket;
> -	int tmp;
> -	int inc = 0x10000;
> -
> -	if (R10000_LLSC_WAR) {
> -		__asm__ __volatile__ (
> -		"	.set push		# arch_spin_lock	\n"
> -		"	.set noreorder					\n"
> -		"							\n"
> -		"1:	ll	%[ticket], %[ticket_ptr]		\n"
> -		"	addu	%[my_ticket], %[ticket], %[inc]		\n"
> -		"	sc	%[my_ticket], %[ticket_ptr]		\n"
> -		"	beqzl	%[my_ticket], 1b			\n"
> -		"	 nop						\n"
> -		"	srl	%[my_ticket], %[ticket], 16		\n"
> -		"	andi	%[ticket], %[ticket], 0xffff		\n"
> -		"	bne	%[ticket], %[my_ticket], 4f		\n"
> -		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
> -		"2:							\n"
> -		"	.subsection 2					\n"
> -		"4:	andi	%[ticket], %[ticket], 0xffff		\n"
> -		"	sll	%[ticket], 5				\n"
> -		"							\n"
> -		"6:	bnez	%[ticket], 6b				\n"
> -		"	 subu	%[ticket], 1				\n"
> -		"							\n"
> -		"	lhu	%[ticket], %[serving_now_ptr]		\n"
> -		"	beq	%[ticket], %[my_ticket], 2b		\n"
> -		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
> -		"	b	4b					\n"
> -		"	 subu	%[ticket], %[ticket], 1			\n"
> -		"	.previous					\n"
> -		"	.set pop					\n"
> -		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
> -		  [serving_now_ptr] "+m" (lock->h.serving_now),
> -		  [ticket] "=&r" (tmp),
> -		  [my_ticket] "=&r" (my_ticket)
> -		: [inc] "r" (inc));
> -	} else {
> -		__asm__ __volatile__ (
> -		"	.set push		# arch_spin_lock	\n"
> -		"	.set noreorder					\n"
> -		"							\n"
> -		"1:	ll	%[ticket], %[ticket_ptr]		\n"
> -		"	addu	%[my_ticket], %[ticket], %[inc]		\n"
> -		"	sc	%[my_ticket], %[ticket_ptr]		\n"
> -		"	beqz	%[my_ticket], 1b			\n"
> -		"	 srl	%[my_ticket], %[ticket], 16		\n"
> -		"	andi	%[ticket], %[ticket], 0xffff		\n"
> -		"	bne	%[ticket], %[my_ticket], 4f		\n"
> -		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
> -		"2:	.insn						\n"
> -		"	.subsection 2					\n"
> -		"4:	andi	%[ticket], %[ticket], 0xffff		\n"
> -		"	sll	%[ticket], 5				\n"
> -		"							\n"
> -		"6:	bnez	%[ticket], 6b				\n"
> -		"	 subu	%[ticket], 1				\n"
> -		"							\n"
> -		"	lhu	%[ticket], %[serving_now_ptr]		\n"
> -		"	beq	%[ticket], %[my_ticket], 2b		\n"
> -		"	 subu	%[ticket], %[my_ticket], %[ticket]	\n"
> -		"	b	4b					\n"
> -		"	 subu	%[ticket], %[ticket], 1			\n"
> -		"	.previous					\n"
> -		"	.set pop					\n"
> -		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
> -		  [serving_now_ptr] "+m" (lock->h.serving_now),
> -		  [ticket] "=&r" (tmp),
> -		  [my_ticket] "=&r" (my_ticket)
> -		: [inc] "r" (inc));
> -	}
> -
> -	smp_llsc_mb();
> -}
> -
> -static inline void arch_spin_unlock(arch_spinlock_t *lock)
> -{
> -	unsigned int serving_now = lock->h.serving_now + 1;
> -	wmb();
> -	lock->h.serving_now = (u16)serving_now;
> -	nudge_writes();
> -}
> -
> -static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)
> -{
> -	int tmp, tmp2, tmp3;
> -	int inc = 0x10000;
> -
> -	if (R10000_LLSC_WAR) {
> -		__asm__ __volatile__ (
> -		"	.set push		# arch_spin_trylock	\n"
> -		"	.set noreorder					\n"
> -		"							\n"
> -		"1:	ll	%[ticket], %[ticket_ptr]		\n"
> -		"	srl	%[my_ticket], %[ticket], 16		\n"
> -		"	andi	%[now_serving], %[ticket], 0xffff	\n"
> -		"	bne	%[my_ticket], %[now_serving], 3f	\n"
> -		"	 addu	%[ticket], %[ticket], %[inc]		\n"
> -		"	sc	%[ticket], %[ticket_ptr]		\n"
> -		"	beqzl	%[ticket], 1b				\n"
> -		"	 li	%[ticket], 1				\n"
> -		"2:							\n"
> -		"	.subsection 2					\n"
> -		"3:	b	2b					\n"
> -		"	 li	%[ticket], 0				\n"
> -		"	.previous					\n"
> -		"	.set pop					\n"
> -		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
> -		  [ticket] "=&r" (tmp),
> -		  [my_ticket] "=&r" (tmp2),
> -		  [now_serving] "=&r" (tmp3)
> -		: [inc] "r" (inc));
> -	} else {
> -		__asm__ __volatile__ (
> -		"	.set push		# arch_spin_trylock	\n"
> -		"	.set noreorder					\n"
> -		"							\n"
> -		"1:	ll	%[ticket], %[ticket_ptr]		\n"
> -		"	srl	%[my_ticket], %[ticket], 16		\n"
> -		"	andi	%[now_serving], %[ticket], 0xffff	\n"
> -		"	bne	%[my_ticket], %[now_serving], 3f	\n"
> -		"	 addu	%[ticket], %[ticket], %[inc]		\n"
> -		"	sc	%[ticket], %[ticket_ptr]		\n"
> -		"	beqz	%[ticket], 1b				\n"
> -		"	 li	%[ticket], 1				\n"
> -		"2:	.insn						\n"
> -		"	.subsection 2					\n"
> -		"3:	b	2b					\n"
> -		"	 li	%[ticket], 0				\n"
> -		"	.previous					\n"
> -		"	.set pop					\n"
> -		: [ticket_ptr] "+" GCC_OFF_SMALL_ASM() (lock->lock),
> -		  [ticket] "=&r" (tmp),
> -		  [my_ticket] "=&r" (tmp2),
> -		  [now_serving] "=&r" (tmp3)
> -		: [inc] "r" (inc));
> -	}
> -
> -	smp_llsc_mb();
> -
> -	return tmp;
> -}
> +#include <asm/qspinlock.h>
>  
>  #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
>  #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
> diff --git a/arch/mips/include/asm/spinlock_types.h b/arch/mips/include/asm/spinlock_types.h
> index 3d38bfad9b49..177e722eb96c 100644
> --- a/arch/mips/include/asm/spinlock_types.h
> +++ b/arch/mips/include/asm/spinlock_types.h
> @@ -1,29 +1,7 @@
>  #ifndef _ASM_SPINLOCK_TYPES_H
>  #define _ASM_SPINLOCK_TYPES_H
>  
> -#include <linux/types.h>
> -
> -#include <asm/byteorder.h>
> -
> -typedef union {
> -	/*
> -	 * bits	 0..15 : serving_now
> -	 * bits 16..31 : ticket
> -	 */
> -	u32 lock;
> -	struct {
> -#ifdef __BIG_ENDIAN
> -		u16 ticket;
> -		u16 serving_now;
> -#else
> -		u16 serving_now;
> -		u16 ticket;
> -#endif
> -	} h;
> -} arch_spinlock_t;
> -
> -#define __ARCH_SPIN_LOCK_UNLOCKED	{ .lock = 0 }
> -
> +#include <asm-generic/qspinlock_types.h>
>  #include <asm-generic/qrwlock_types.h>
>  
>  #endif
> 

-- 
Best regards,
Alexander Sverdlin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-11-06 16:28 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-10  0:26 [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks Paul Burton
2017-06-10  0:26 ` Paul Burton
2017-06-10  0:26 ` [PATCH 01/11] MIPS: cmpxchg: Unify R10000_LLSC_WAR & non-R10000_LLSC_WAR cases Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10 23:25   ` Joshua Kinard
2017-06-12 19:02     ` Paul Burton
2017-06-12 19:02       ` Paul Burton
2017-06-10  0:26 ` [PATCH 02/11] MIPS: cmpxchg: Pull xchg() asm into a macro Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 03/11] MIPS: cmpxchg: Use __compiletime_error() for bad cmpxchg() pointers Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 04/11] MIPS: cmpxchg: Error out on unsupported xchg() calls Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 05/11] MIPS: cmpxchg: Drop __xchg_u{32,64} functions Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 06/11] MIPS: cmpxchg: Implement __cmpxchg() as a function Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 07/11] MIPS: cmpxchg: Implement 1 byte & 2 byte xchg() Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 08/11] MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg() Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 09/11] MIPS: cmpxchg: Rearrange __xchg() arguments to match xchg() Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 10/11] MIPS: Use queued read/write locks (qrwlock) Paul Burton
2017-06-10  0:26   ` Paul Burton
2017-06-10  0:26 ` [PATCH 11/11] MIPS: Use queued spinlocks (qspinlock) Paul Burton
2017-06-10  0:26   ` Paul Burton
2020-11-06 16:28   ` Alexander Sverdlin
2017-06-12  8:27 ` [PATCH 00/11] MIPS: cmpxchg(), xchg() fixes & queued locks Ralf Baechle
2017-06-12 22:47   ` Paul Burton
2017-06-12 22:47     ` Paul Burton
2017-06-29 10:28 ` Joshua Kinard
