linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
@ 2019-10-13 22:13 Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic Sebastian Andrzej Siewior
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Ingo Molnar,
	Waiman Long, Will Deacon

I added support for queued-RW and -spinlocks for ARM. I followed Arnd's
suggestion and added support for xchg() on 8bit and 16bit variables (V6
CPUs) via the SH implementation. This makes it possible to remove the
current ticket based locking implementation. 

The numbers should be the same as in v1 posted here:
   http://lkml.kernel.org/r/20191007214439.27891-1-sebastian@breakpoint.cc

The only thing changed is that patch #1-#3 wire up the missing xchg and
the patches #4 and #5 additionally remove the old implementation while
adding the missing bits for the queued implementation.

Sebastian




* [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-14  7:25   ` Arnd Bergmann
  2019-10-13 22:13 ` [PATCH 2/6] ARM: cmpxchg: Define first cmpxchg() followed by xchg() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-arch, Rich Felker, Yoshinori Sato, Arnd Bergmann, linux-sh,
	Peter Zijlstra, Russell King, Sebastian Andrzej Siewior,
	Ingo Molnar, Waiman Long, Will Deacon

The header file is very generic and could be reused by other
architectures as long as they provide __cmpxchg_u32().

Move sh's cmpxchg-xchg.h to asm-generic.
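
For reference, an architecture that wants to reuse it only needs to
provide __cmpxchg_u32() before pulling the header in. A rough sketch,
assuming the architecture already has a size-aware __cmpxchg() helper
(this mirrors what patch 3/6 does for ARM):

/* in the architecture's asm/cmpxchg.h, after __cmpxchg() is defined */
static inline unsigned int __cmpxchg_u32(volatile void *ptr, unsigned int old,
					 unsigned int new)
{
	return __cmpxchg(ptr, old, new, sizeof(unsigned int));
}

#include <asm-generic/cmpxchg-xchg.h>	/* provides xchg_u8()/xchg_u16() */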

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-sh@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/sh/include/asm/Kbuild                                  | 1 +
 {arch/sh/include/asm => include/asm-generic}/cmpxchg-xchg.h | 0
 2 files changed, 1 insertion(+)
 rename {arch/sh/include/asm => include/asm-generic}/cmpxchg-xchg.h (100%)

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 51a54df22c110..08c1d96286d9d 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 generated-y += syscall_table.h
+generic-y += cmpxchg-xchg.h
 generic-y += compat.h
 generic-y += current.h
 generic-y += delay.h
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/include/asm-generic/cmpxchg-xchg.h
similarity index 100%
rename from arch/sh/include/asm/cmpxchg-xchg.h
rename to include/asm-generic/cmpxchg-xchg.h
-- 
2.23.0



* [PATCH 2/6] ARM: cmpxchg: Define first cmpxchg() followed by xchg()
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 3/6] ARM: Add xchg_{8|16}() on generic cmpxchg() on CPU_V6 Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King,
	Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

In order to use the generic xchg() implementation based on cmpxchg(),
cmpxchg() needs to be defined first.
Move the xchg bits to the lower part of the file.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/include/asm/cmpxchg.h | 227 ++++++++++++++++-----------------
 1 file changed, 113 insertions(+), 114 deletions(-)

diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index 8b701f8e175c0..c61de6acf41ed 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -5,120 +5,6 @@
 #include <linux/irqflags.h>
 #include <linux/prefetch.h>
 #include <asm/barrier.h>
-
-#if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
-/*
- * On the StrongARM, "swp" is terminally broken since it bypasses the
- * cache totally.  This means that the cache becomes inconsistent, and,
- * since we use normal loads/stores as well, this is really bad.
- * Typically, this causes oopsen in filp_close, but could have other,
- * more disastrous effects.  There are two work-arounds:
- *  1. Disable interrupts and emulate the atomic swap
- *  2. Clean the cache, perform atomic swap, flush the cache
- *
- * We choose (1) since its the "easiest" to achieve here and is not
- * dependent on the processor type.
- *
- * NOTE that this solution won't work on an SMP system, so explcitly
- * forbid it here.
- */
-#define swp_is_buggy
-#endif
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
-	extern void __bad_xchg(volatile void *, int);
-	unsigned long ret;
-#ifdef swp_is_buggy
-	unsigned long flags;
-#endif
-#if __LINUX_ARM_ARCH__ >= 6
-	unsigned int tmp;
-#endif
-
-	prefetchw((const void *)ptr);
-
-	switch (size) {
-#if __LINUX_ARM_ARCH__ >= 6
-#ifndef CONFIG_CPU_V6 /* MIN ARCH >= V6K */
-	case 1:
-		asm volatile("@	__xchg1\n"
-		"1:	ldrexb	%0, [%3]\n"
-		"	strexb	%1, %2, [%3]\n"
-		"	teq	%1, #0\n"
-		"	bne	1b"
-			: "=&r" (ret), "=&r" (tmp)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-	case 2:
-		asm volatile("@	__xchg2\n"
-		"1:	ldrexh	%0, [%3]\n"
-		"	strexh	%1, %2, [%3]\n"
-		"	teq	%1, #0\n"
-		"	bne	1b"
-			: "=&r" (ret), "=&r" (tmp)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-#endif
-	case 4:
-		asm volatile("@	__xchg4\n"
-		"1:	ldrex	%0, [%3]\n"
-		"	strex	%1, %2, [%3]\n"
-		"	teq	%1, #0\n"
-		"	bne	1b"
-			: "=&r" (ret), "=&r" (tmp)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-#elif defined(swp_is_buggy)
-#ifdef CONFIG_SMP
-#error SMP is not supported on this platform
-#endif
-	case 1:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned char *)ptr;
-		*(volatile unsigned char *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
-
-	case 4:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned long *)ptr;
-		*(volatile unsigned long *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
-#else
-	case 1:
-		asm volatile("@	__xchg1\n"
-		"	swpb	%0, %1, [%2]"
-			: "=&r" (ret)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-	case 4:
-		asm volatile("@	__xchg4\n"
-		"	swp	%0, %1, [%2]"
-			: "=&r" (ret)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-#endif
-	default:
-		/* Cause a link-time error, the xchg() size is not supported */
-		__bad_xchg(ptr, size), ret = 0;
-		break;
-	}
-
-	return ret;
-}
-
-#define xchg_relaxed(ptr, x) ({						\
-	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
-				   sizeof(*(ptr)));			\
-})
-
 #include <asm-generic/cmpxchg-local.h>
 
 #if __LINUX_ARM_ARCH__ < 6
@@ -276,4 +162,117 @@ static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
 
 #endif	/* __LINUX_ARM_ARCH__ >= 6 */
 
+#if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
+/*
+ * On the StrongARM, "swp" is terminally broken since it bypasses the
+ * cache totally.  This means that the cache becomes inconsistent, and,
+ * since we use normal loads/stores as well, this is really bad.
+ * Typically, this causes oopsen in filp_close, but could have other,
+ * more disastrous effects.  There are two work-arounds:
+ *  1. Disable interrupts and emulate the atomic swap
+ *  2. Clean the cache, perform atomic swap, flush the cache
+ *
+ * We choose (1) since its the "easiest" to achieve here and is not
+ * dependent on the processor type.
+ *
+ * NOTE that this solution won't work on an SMP system, so explcitly
+ * forbid it here.
+ */
+#define swp_is_buggy
+#endif
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+	extern void __bad_xchg(volatile void *, int);
+	unsigned long ret;
+#ifdef swp_is_buggy
+	unsigned long flags;
+#endif
+#if __LINUX_ARM_ARCH__ >= 6
+	unsigned int tmp;
+#endif
+
+	prefetchw((const void *)ptr);
+
+	switch (size) {
+#if __LINUX_ARM_ARCH__ >= 6
+#ifndef CONFIG_CPU_V6 /* MIN ARCH >= V6K */
+	case 1:
+		asm volatile("@	__xchg1\n"
+		"1:	ldrexb	%0, [%3]\n"
+		"	strexb	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#endif
+	case 4:
+		asm volatile("@	__xchg4\n"
+		"1:	ldrex	%0, [%3]\n"
+		"	strex	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#elif defined(swp_is_buggy)
+#ifdef CONFIG_SMP
+#error SMP is not supported on this platform
+#endif
+	case 1:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned char *)ptr;
+		*(volatile unsigned char *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 4:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned long *)ptr;
+		*(volatile unsigned long *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+#else
+	case 1:
+		asm volatile("@	__xchg1\n"
+		"	swpb	%0, %1, [%2]"
+			: "=&r" (ret)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	case 4:
+		asm volatile("@	__xchg4\n"
+		"	swp	%0, %1, [%2]"
+			: "=&r" (ret)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#endif
+	default:
+		/* Cause a link-time error, the xchg() size is not supported */
+		__bad_xchg(ptr, size), ret = 0;
+		break;
+	}
+
+	return ret;
+}
+
+#define xchg_relaxed(ptr, x) ({						\
+	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
+				   sizeof(*(ptr)));			\
+})
+
 #endif /* __ASM_ARM_CMPXCHG_H */
-- 
2.23.0



* [PATCH 3/6] ARM: Add xchg_{8|16}() on generic cmpxchg() on CPU_V6
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 2/6] ARM: cmpxchg: Define first cmpxchg() followed by xchg() Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 4/6] ARM: Use qrwlock implementation Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King,
	Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

Use the generic xchg_u{8|16}() helpers to implement 8-bit and 16-bit
xchg() based on cmpxchg(). The generic header file expects
__cmpxchg_u32() to perform a 32-bit cmpxchg operation.
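
The generic header builds the narrow exchange out of a cmpxchg() loop on
the containing 32bit word, roughly along these lines (simplified sketch
with a made-up helper name, assuming little-endian; the real
include/asm-generic/cmpxchg-xchg.h also handles big-endian):

static inline u32 __xchg_small_sketch(volatile void *ptr, u32 x, int size)
{
	/* size is 1 or 2; wider accesses use the native xchg */
	int off = (unsigned long)ptr % sizeof(u32);
	volatile u32 *p = (volatile u32 *)((unsigned long)ptr - off);
	int shift = off * BITS_PER_BYTE;
	u32 mask = ((1U << (size * BITS_PER_BYTE)) - 1) << shift;
	u32 oldv, newv;

	/* retry until the containing word did not change underneath us */
	do {
		oldv = READ_ONCE(*p);
		newv = (oldv & ~mask) | ((x << shift) & mask);
	} while (__cmpxchg_u32(p, oldv, newv) != oldv);

	return (oldv & mask) >> shift;
}

This is also why the previous patch moves the cmpxchg() definitions in
front of the xchg() bits.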

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/include/asm/cmpxchg.h | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index c61de6acf41ed..06e8b1c7a08fe 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -181,6 +181,16 @@ static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
 #define swp_is_buggy
 #endif
 
+#ifdef CONFIG_CPU_V6
+static inline unsigned int __cmpxchg_u32(volatile void *ptr, unsigned int old,
+					 unsigned int new)
+{
+	return __cmpxchg(ptr, old, new, sizeof(unsigned int));
+}
+
+#include <asm-generic/cmpxchg-xchg.h>
+#endif
+
 static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
 {
 	extern void __bad_xchg(volatile void *, int);
@@ -196,7 +206,15 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 
 	switch (size) {
 #if __LINUX_ARM_ARCH__ >= 6
-#ifndef CONFIG_CPU_V6 /* MIN ARCH >= V6K */
+#ifdef CONFIG_CPU_V6
+	case 1:
+		ret = xchg_u8(ptr, x);
+		break;
+	case 2:
+		ret = xchg_u16(ptr, x);
+		break;
+
+#else /* MIN ARCH >= V6K */
 	case 1:
 		asm volatile("@	__xchg1\n"
 		"1:	ldrexb	%0, [%3]\n"
-- 
2.23.0



* [PATCH 4/6] ARM: Use qrwlock implementation
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2019-10-13 22:13 ` [PATCH 3/6] ARM: Add xchg_{8|16}() on generic cmpxchg() on CPU_V6 Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 5/6] ARM: Use qspinlock implementation Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King,
	Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

Use the generic qrwlock implementation for rwlocks. The WFE mechanism is
used as part of the spinlock implementation, which the qrwlock queues
its waiters on.
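
The waiters are queued on the arch spinlock that is embedded in the
generic lock type, which is where the WFE-based waiting comes in
(abridged from include/asm-generic/qrwlock_types.h; the real struct
additionally exposes the writer byte through a union):

typedef struct qrwlock {
	atomic_t	cnts;		/* reader count + writer state */
	arch_spinlock_t	wait_lock;	/* serializes the waiting lockers */
} arch_rwlock_t;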

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig                      |   1 +
 arch/arm/include/asm/Kbuild           |   1 +
 arch/arm/include/asm/spinlock.h       | 143 +-------------------------
 arch/arm/include/asm/spinlock_types.h |   2 +-
 4 files changed, 5 insertions(+), 142 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8a50efb559f35..6029d825671c6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -34,6 +34,7 @@ config ARM
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 68ca86f85eb73..5327be7572cd2 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += serial.h
 generic-y += trace_clock.h
+generic-y += qrwlock.h
 
 generated-y += mach-types.h
 generated-y += unistd-nr.h
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 8f009e788ad40..f250a5022d4f6 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,6 +52,8 @@ static inline void dsb_sev(void)
  * release it, because V6 CPUs are assumed to have weakly ordered
  * memory.
  */
+#include <asm/qrwlock.h>
+#define smp_mb__after_spinlock()	smp_mb()
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
@@ -128,146 +130,5 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
-/*
- * RWLOCKS
- *
- *
- * Write locks are easy - we just set bit 31.  When unlocking, we can
- * just write zero since the lock is exclusively held.
- */
-
-static inline void arch_write_lock(arch_rwlock_t *rw)
-{
-	unsigned long tmp;
-
-	prefetchw(&rw->lock);
-	__asm__ __volatile__(
-"1:	ldrex	%0, [%1]\n"
-"	teq	%0, #0\n"
-	WFE("ne")
-"	strexeq	%0, %2, [%1]\n"
-"	teq	%0, #0\n"
-"	bne	1b"
-	: "=&r" (tmp)
-	: "r" (&rw->lock), "r" (0x80000000)
-	: "cc");
-
-	smp_mb();
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *rw)
-{
-	unsigned long contended, res;
-
-	prefetchw(&rw->lock);
-	do {
-		__asm__ __volatile__(
-		"	ldrex	%0, [%2]\n"
-		"	mov	%1, #0\n"
-		"	teq	%0, #0\n"
-		"	strexeq	%1, %3, [%2]"
-		: "=&r" (contended), "=&r" (res)
-		: "r" (&rw->lock), "r" (0x80000000)
-		: "cc");
-	} while (res);
-
-	if (!contended) {
-		smp_mb();
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *rw)
-{
-	smp_mb();
-
-	__asm__ __volatile__(
-	"str	%1, [%0]\n"
-	:
-	: "r" (&rw->lock), "r" (0)
-	: "cc");
-
-	dsb_sev();
-}
-
-/*
- * Read locks are a bit more hairy:
- *  - Exclusively load the lock value.
- *  - Increment it.
- *  - Store new lock value if positive, and we still own this location.
- *    If the value is negative, we've already failed.
- *  - If we failed to store the value, we want a negative result.
- *  - If we failed, try again.
- * Unlocking is similarly hairy.  We may have multiple read locks
- * currently active.  However, we know we won't have any write
- * locks.
- */
-static inline void arch_read_lock(arch_rwlock_t *rw)
-{
-	unsigned long tmp, tmp2;
-
-	prefetchw(&rw->lock);
-	__asm__ __volatile__(
-"	.syntax unified\n"
-"1:	ldrex	%0, [%2]\n"
-"	adds	%0, %0, #1\n"
-"	strexpl	%1, %0, [%2]\n"
-	WFE("mi")
-"	rsbspl	%0, %1, #0\n"
-"	bmi	1b"
-	: "=&r" (tmp), "=&r" (tmp2)
-	: "r" (&rw->lock)
-	: "cc");
-
-	smp_mb();
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *rw)
-{
-	unsigned long tmp, tmp2;
-
-	smp_mb();
-
-	prefetchw(&rw->lock);
-	__asm__ __volatile__(
-"1:	ldrex	%0, [%2]\n"
-"	sub	%0, %0, #1\n"
-"	strex	%1, %0, [%2]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (tmp), "=&r" (tmp2)
-	: "r" (&rw->lock)
-	: "cc");
-
-	if (tmp == 0)
-		dsb_sev();
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *rw)
-{
-	unsigned long contended, res;
-
-	prefetchw(&rw->lock);
-	do {
-		__asm__ __volatile__(
-		"	ldrex	%0, [%2]\n"
-		"	mov	%1, #0\n"
-		"	adds	%0, %0, #1\n"
-		"	strexpl	%1, %0, [%2]"
-		: "=&r" (contended), "=&r" (res)
-		: "r" (&rw->lock)
-		: "cc");
-	} while (res);
-
-	/* If the lock is negative, then it is already held for write. */
-	if (contended < 0x80000000) {
-		smp_mb();
-		return 1;
-	} else {
-		return 0;
-	}
-}
 
 #endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index 5976958647fe1..24a8a487e03b3 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -29,6 +29,6 @@ typedef struct {
 	u32 lock;
 } arch_rwlock_t;
 
-#define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
+#include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.23.0



* [PATCH 5/6] ARM: Use qspinlock implementation
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2019-10-13 22:13 ` [PATCH 4/6] ARM: Use qrwlock implementation Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-13 22:13 ` [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King,
	Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

Use the generic queued spinlock implementation for spinlocks. The WFE
mechanism is used as part of arch_mcs_spin_lock_contended().
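
arch_mcs_spin_lock_contended() falls back to the generic definition,
which spins on the MCS node with smp_cond_load_acquire(); that is the
hook a WFE-based wait goes through (simplified from
kernel/locking/mcs_spinlock.h):

#ifndef arch_mcs_spin_lock_contended
/*
 * Wait until the lock holder hands the lock over. smp_cond_load_acquire()
 * is what lets an architecture wait with WFE instead of busy-spinning.
 */
#define arch_mcs_spin_lock_contended(l)					\
do {									\
	smp_cond_load_acquire(l, VAL);					\
} while (0)
#endif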

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig                      |  1 +
 arch/arm/include/asm/Kbuild           |  1 +
 arch/arm/include/asm/spinlock.h       | 77 +--------------------------
 arch/arm/include/asm/spinlock_types.h | 24 +--------
 4 files changed, 5 insertions(+), 98 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6029d825671c6..4c780e7387208 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -35,6 +35,7 @@ config ARM
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
+	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 5327be7572cd2..f98bcfd9612b6 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += serial.h
 generic-y += trace_clock.h
+generic-y += qspinlock.h
 generic-y += qrwlock.h
 
 generated-y += mach-types.h
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index f250a5022d4f6..5d491abfefd02 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -52,83 +52,10 @@ static inline void dsb_sev(void)
  * release it, because V6 CPUs are assumed to have weakly ordered
  * memory.
  */
+
 #include <asm/qrwlock.h>
+#include <asm/qspinlock.h>
 #define smp_mb__after_spinlock()	smp_mb()
 
-static inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	unsigned long tmp;
-	u32 newval;
-	arch_spinlock_t lockval;
-
-	prefetchw(&lock->slock);
-	__asm__ __volatile__(
-"1:	ldrex	%0, [%3]\n"
-"	add	%1, %0, %4\n"
-"	strex	%2, %1, [%3]\n"
-"	teq	%2, #0\n"
-"	bne	1b"
-	: "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
-	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
-	: "cc");
-
-	while (lockval.tickets.next != lockval.tickets.owner) {
-		wfe();
-		lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
-	}
-
-	smp_mb();
-}
-
-static inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	unsigned long contended, res;
-	u32 slock;
-
-	prefetchw(&lock->slock);
-	do {
-		__asm__ __volatile__(
-		"	ldrex	%0, [%3]\n"
-		"	mov	%2, #0\n"
-		"	subs	%1, %0, %0, ror #16\n"
-		"	addeq	%0, %0, %4\n"
-		"	strexeq	%2, %0, [%3]"
-		: "=&r" (slock), "=&r" (contended), "=&r" (res)
-		: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
-		: "cc");
-	} while (res);
-
-	if (!contended) {
-		smp_mb();
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
-static inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	smp_mb();
-	lock->tickets.owner++;
-	dsb_sev();
-}
-
-static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
-	return lock.tickets.owner == lock.tickets.next;
-}
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return !arch_spin_value_unlocked(READ_ONCE(*lock));
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	struct __raw_tickets tickets = READ_ONCE(lock->tickets);
-	return (tickets.next - tickets.owner) > 1;
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
 
 #endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index 24a8a487e03b3..ecd3c45b19740 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -6,29 +6,7 @@
 # error "please don't include this file directly"
 #endif
 
-#define TICKET_SHIFT	16
-
-typedef struct {
-	union {
-		u32 slock;
-		struct __raw_tickets {
-#ifdef __ARMEB__
-			u16 next;
-			u16 owner;
-#else
-			u16 owner;
-			u16 next;
-#endif
-		} tickets;
-	};
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
-
-typedef struct {
-	u32 lock;
-} arch_rwlock_t;
-
+#include <asm-generic/qspinlock_types.h>
 #include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.23.0



* [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2019-10-13 22:13 ` [PATCH 5/6] ARM: Use qspinlock implementation Sebastian Andrzej Siewior
@ 2019-10-13 22:13 ` Sebastian Andrzej Siewior
  2019-10-14  7:43   ` Arnd Bergmann
  2019-10-14  9:00 ` [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Arnd Bergmann
  2019-10-16 15:57 ` Will Deacon
  7 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-13 22:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King,
	Sebastian Andrzej Siewior, Ingo Molnar, Waiman Long, Will Deacon

On non-preemptive kernels, the locking code is less than 64 bytes
and it makes sense to inline it. With PREEMPTION the kernel becomes very
big if the locks are inlined.
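
For reference, kernel/Kconfig.locks turns each selected ARCH_INLINE_*
symbol into the corresponding INLINE_* option (unless spinlock debugging
is enabled), and those make the out-of-line wrappers collapse into the
__raw_* inlines, roughly (simplified from include/linux/spinlock_api_smp.h):

#ifdef CONFIG_INLINE_SPIN_LOCK
#define _raw_spin_lock(lock) __raw_spin_lock(lock)
#endif

#ifdef CONFIG_INLINE_SPIN_LOCK_IRQ
#define _raw_spin_lock_irq(lock) __raw_spin_lock_irq(lock)
#endif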

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/arm/Kconfig | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4c780e7387208..21edc8e649261 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -26,6 +26,32 @@ config ARM
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_INLINE_READ_LOCK if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
+	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
 	select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN
-- 
2.23.0



* Re: [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic
  2019-10-13 22:13 ` [PATCH 1/6] sh: Move cmpxchg-xchg.h to asm-generic Sebastian Andrzej Siewior
@ 2019-10-14  7:25   ` Arnd Bergmann
  2019-10-15 22:31     ` [PATCH 1/6 v2] " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-14  7:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-arch, Rich Felker, Yoshinori Sato, Linux-sh list,
	Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On Mon, Oct 14, 2019 at 12:13 AM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> The header file is very generic and could be reused by other
> architectures as long as they provide __cmpxchg_u32().
>
> Move sh's cmpxchg-xchg.h to asm-generic.
>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Arnd Bergmann <arnd@arndb.de>

Acked-by: Arnd Bergmann <arnd@arndb.de>

Maybe change the "#ifndef __ASM_SH_CMPXCHG_XCHG_H"
to __ASM_GENERIC_*.

        Arnd


* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-13 22:13 ` [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION Sebastian Andrzej Siewior
@ 2019-10-14  7:43   ` Arnd Bergmann
  2019-10-14 10:01     ` Arnd Bergmann
  2019-10-15 22:04     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-14  7:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> On non-preemptive kernels, the locking code is less than 64 bytes
> and it makes sense to inline it. With PREEMPTION the kernel becomes very
> big if the locks are inlined.
>
> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
> ---

At the moment, we have two architectures selecting all 28 symbols
and you are adding a third; all other architectures select none of them.

This tells me that the configurability has gone a little overboard. How about
adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28
symbols and using that for arm/arm64/s390?
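
Something like this, as a rough sketch (the symbol name and its
placement in kernel/Kconfig.locks are made up, the select list simply
mirrors what your patch selects):

config ARCH_INLINE_ALL_SPINLOCKS
	bool
	select ARCH_INLINE_SPIN_TRYLOCK
	select ARCH_INLINE_SPIN_TRYLOCK_BH
	select ARCH_INLINE_SPIN_LOCK
	select ARCH_INLINE_SPIN_LOCK_BH
	select ARCH_INLINE_SPIN_LOCK_IRQ
	select ARCH_INLINE_SPIN_LOCK_IRQSAVE
	select ARCH_INLINE_SPIN_UNLOCK
	select ARCH_INLINE_SPIN_UNLOCK_BH
	select ARCH_INLINE_SPIN_UNLOCK_IRQ
	select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE
	select ARCH_INLINE_READ_LOCK
	select ARCH_INLINE_READ_LOCK_BH
	select ARCH_INLINE_READ_LOCK_IRQ
	select ARCH_INLINE_READ_LOCK_IRQSAVE
	select ARCH_INLINE_READ_UNLOCK
	select ARCH_INLINE_READ_UNLOCK_BH
	select ARCH_INLINE_READ_UNLOCK_IRQ
	select ARCH_INLINE_READ_UNLOCK_IRQRESTORE
	select ARCH_INLINE_WRITE_LOCK
	select ARCH_INLINE_WRITE_LOCK_BH
	select ARCH_INLINE_WRITE_LOCK_IRQ
	select ARCH_INLINE_WRITE_LOCK_IRQSAVE
	select ARCH_INLINE_WRITE_UNLOCK
	select ARCH_INLINE_WRITE_UNLOCK_BH
	select ARCH_INLINE_WRITE_UNLOCK_IRQ
	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE

so that each architecture only needs

	select ARCH_INLINE_ALL_SPINLOCKS if !PREEMPTION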

Also, the output of 'size vmlinux' before and after the patch for
multi_v7_defconfig would be useful to have in the changelog, as there
are a couple of platforms that are particularly sensitive to object code
size changes.

      Arnd

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 4c780e7387208..21edc8e649261 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -26,6 +26,32 @@ config ARM
>         select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>         select ARCH_HAVE_CUSTOM_GPIO_H
>         select ARCH_HAS_GCOV_PROFILE_ALL
> +       select ARCH_INLINE_READ_LOCK if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
> +       select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
> +       select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
>         select ARCH_KEEP_MEMBLOCK if HAVE_ARCH_PFN_VALID || KEXEC
>         select ARCH_MIGHT_HAVE_PC_PARPORT
>         select ARCH_NO_SG_CHAIN if !ARM_HAS_SG_CHAIN
> --
> 2.23.0
>


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2019-10-13 22:13 ` [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION Sebastian Andrzej Siewior
@ 2019-10-14  9:00 ` Arnd Bergmann
  2019-10-16 15:57 ` Will Deacon
  7 siblings, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-14  9:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On Mon, Oct 14, 2019 at 12:13 AM Sebastian Andrzej Siewior
<sebastian@breakpoint.cc> wrote:
>
> I added support for queued-RW and -spinlocks for ARM. I followed Arnd's
> suggestion and added support for xchg() on 8bit and 16bit variables (V6
> CPUs) via the SH implementation. This makes it possible to remove the
> current ticket based locking implementation.
>
> The numbers should be the same as in v1 posted here:
>    http://lkml.kernel.org/r/20191007214439.27891-1-sebastian@breakpoint.cc
>
> The only thing changed is that patch #1-#3 wire up the missing xchg and
> the patches #4 and #5 additionally remove the old implementation while
> adding the missing bits for the queued implementation.

I like this, just had two very minor comments.

I'm adding it to my randconfig build bot for more testing and will let you know
in case I find any problems.

      Arnd


* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-14  7:43   ` Arnd Bergmann
@ 2019-10-14 10:01     ` Arnd Bergmann
  2019-10-15 22:30       ` Sebastian Andrzej Siewior
  2019-10-15 22:04     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-14 10:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
>
> Also, the output of 'size vmlinux' before and after the patch for
> multi_v7_defconfig would be useful to have in the changelog, as there
> are a couple of platforms that are particularly sensitive to object code
> size changes.

To follow up, here are the numbers I get, building the linux-5.4-rc2
multi_v7_defconfig with clang-9, comparing the original spinlock
and the qspinlock, combined with inlining all locks or leaving them
out of line:

   text    data     bss     dec     hex filename
15294816 6958636 404480 22657932 159bb8c vmlinux-orig
16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline

This gives us a 1.5% size increase over the original code with
inlining, or a 0.5% decrease without inlining, or about 1.9% size
difference for the Kconfig change itself, which does sound
significant.
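
(The comparisons, on the 'dec' totals: 22989573 / 22657932 ~ 1.015 for
qlock-inline vs orig, 22561991 / 22657932 ~ 0.996 for qlock-noinline vs
orig, and 22989573 / 22561991 ~ 1.019 for the inlining change alone.)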

Maybe it should be configurable?

      Arnd


* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-14  7:43   ` Arnd Bergmann
  2019-10-14 10:01     ` Arnd Bergmann
@ 2019-10-15 22:04     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-15 22:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On 2019-10-14 09:43:53 [+0200], Arnd Bergmann wrote:
> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior
> <sebastian@breakpoint.cc> wrote:
> >
> > On non-preemptive kernels, the locking code is less than 64 bytes
> > and it makes sense to inline it. With PREEMPTION the kernel becomes very
> > big if the locks are inlined.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
> > ---
> 
> At the moment, we have two architectures selecting all 28 symbols
> and you are adding a third; all other architectures select none of them.
> 
> This tells me that the configurability has gone a little overboard. How about
> adding a shortcut ARCH_INLINE_ALL_SPINLOCKS that selects the 28
> symbols and using that for arm/arm64/s390?

Sounds reasonable. 

> Also, the output of 'size vmlinux' before and after the patch for
> multi_v7_defconfig would be useful to have in the changelog, as there
> are a couple of platforms that are particularly sensitive to object code
> size changes.

okay.

>       Arnd

Sebastian


* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-14 10:01     ` Arnd Bergmann
@ 2019-10-15 22:30       ` Sebastian Andrzej Siewior
  2019-10-15 22:37         ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-15 22:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
> >
> > Also, the output of 'size vmlinux' before and after the patch for
> > multi_v7_defconfig would be useful to have in the changelog, as there
> > are a couple of platforms that are particularly sensitive to object code
> > size changes.
> 
> To follow up, here are the numbers I get, building the linux-5.4-rc2
> multi_v7_defconfig with clang-9, comparing the original spinlock
> and the qspinlock, combined with inlining all locks or leaving them
> out of line:
> 
>    text    data     bss     dec     hex filename
> 15294816 6958636 404480 22657932 159bb8c vmlinux-orig
> 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
> 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
> 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline
> 
> This gives us a 1.5% size increase over the original code with
> inlining, or a 0.5% decrease without inlining, or about 1.9% size
> difference for the Kconfig change itself, which does sound
> significant.

I had a 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux
was only half the size. Performance-wise, the inlining improved the
hackbench numbers in my test. How do we account for that?

> Maybe it should be configurable?

Any comment from the locking department? I would prefer to avoid an
extra knob for it. 
The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
posted last time: with inlining I get more or less to the performance of
the ticket implementation on imx6 and it makes no difference on AM572x.
Let me run the hackbench test with the multi_v7_defconfig on my two
boards with ORIG/qlock/qlock-inline and come with some numbers here.

>       Arnd

Sebastian


* [PATCH 1/6 v2] sh: Move cmpxchg-xchg.h to asm-generic
  2019-10-14  7:25   ` Arnd Bergmann
@ 2019-10-15 22:31     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-15 22:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Rich Felker, Yoshinori Sato, Linux-sh list,
	Peter Zijlstra, Russell King, Ingo Molnar, Waiman Long,
	Will Deacon, Linux ARM

The header file is very generic and could be reused by other
architectures as long as they provide __cmpxchg_u32().

Move sh's cmpxchg-xchg.h to asm-generic.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-sh@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
v1…v2: s@__ASM_SH_CMPXCHG_XCHG_H@__ASM_GENERIC_CMPXCHG_XCHG_H@

 arch/sh/include/asm/Kbuild                                  | 1 +
 {arch/sh/include/asm => include/asm-generic}/cmpxchg-xchg.h | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)
 rename {arch/sh/include/asm => include/asm-generic}/cmpxchg-xchg.h (91%)

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 51a54df22c110..08c1d96286d9d 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 generated-y += syscall_table.h
+generic-y += cmpxchg-xchg.h
 generic-y += compat.h
 generic-y += current.h
 generic-y += delay.h
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/include/asm-generic/cmpxchg-xchg.h
similarity index 91%
rename from arch/sh/include/asm/cmpxchg-xchg.h
rename to include/asm-generic/cmpxchg-xchg.h
index c373f21efe4d9..c9acd6ff8741d 100644
--- a/arch/sh/include/asm/cmpxchg-xchg.h
+++ b/include/asm-generic/cmpxchg-xchg.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_CMPXCHG_XCHG_H
-#define __ASM_SH_CMPXCHG_XCHG_H
+#ifndef __ASM_GENERIC_CMPXCHG_XCHG_H
+#define __ASM_GENERIC_CMPXCHG_XCHG_H
 
 /*
  * Copyright (C) 2016 Red Hat, Inc.
@@ -47,4 +47,4 @@ static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
 	return __xchg_cmpxchg(m, val, sizeof *m);
 }
 
-#endif /* __ASM_SH_CMPXCHG_XCHG_H */
+#endif /* __ASM_GENERIC_CMPXCHG_XCHG_H */
-- 
2.23.0



* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-15 22:30       ` Sebastian Andrzej Siewior
@ 2019-10-15 22:37         ` Waiman Long
  2019-10-15 22:47           ` Arnd Bergmann
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2019-10-15 22:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Arnd Bergmann
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Russell King, Linux ARM

On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote:
> On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
>> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
>>>
>>> Also, the output of 'size vmlinux' before and after the patch for
>>> multi_v7_defconfig would be useful to have in the changelog, as there
>>> are a couple of platforms that are particularly sensitive to object code
>>> size changes.
>> To follow up, here are the numbers I get, building the linux-5.4-rc2
>> multi_v7_defconfig with clang-9, comparing the original spinlock
>> and the qspinlock, combined with inlining all locks or leaving them
>> out of line:
>>
>>    text    data     bss     dec     hex filename
>> 15294816 6958636 404480 22657932 159bb8c vmlinux-orig
>> 16004898 6962060 404480 23371438 1649eae vmlinux-orig-inline
>> 15198619 6958812 404560 22561991 15844c7 vmlinux-qlock-noinline
>> 15622897 6962116 404560 22989573 15ecb05 vmlinux-qlock-inline
>>
>> This gives us a 1.5% size increase over the original code with
>> inlining, or a 0.5% decrease without inlining, or about 1.9% size
>> difference for the Kconfig change itself, which does sound
>> significant.
> I had a 2% increase (vmlinux-orig -> vmlinux-qlock-inline) but my vmlinux
> was only half the size. Performance-wise, the inlining improved the
> hackbench numbers in my test. How do we account for that?
>
>> Maybe it should be configurable?
> Any comment from the locking department? I would prefer to avoid an
> extra knob for it. 
> The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
> posted last time: with inlining I get more or less to the performance of
> the ticket implementation on imx6 and it makes no difference on AM572x.
> Let me run the hackbench test with the multi_v7_defconfig on my two
> boards with ORIG/qlock/qlock-inline and come with some numbers here.

Perhaps we should not just look at the all-inlined or all-uninlined
cases. Different variants of the lock and unlock functions can differ
widely in size depending on how the irq handling code is handled in each
architecture. Maybe we can inline the small ones but leave the bigger
ones uninlined. That can increase performance without too much overhead
in kernel size.

Cheers,
Longman



* Re: [PATCH 6/6] ARM: Inline locking functions for !PREEMPTION
  2019-10-15 22:37         ` Waiman Long
@ 2019-10-15 22:47           ` Arnd Bergmann
  0 siblings, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-15 22:47 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Russell King, Sebastian Andrzej Siewior,
	Ingo Molnar, Will Deacon, Linux ARM

On Wed, Oct 16, 2019 at 12:37 AM Waiman Long <longman@redhat.com> wrote:
> On 10/15/19 6:30 PM, Sebastian Andrzej Siewior wrote:
> > On 2019-10-14 12:01:02 [+0200], Arnd Bergmann wrote:
> >> On Mon, Oct 14, 2019 at 9:43 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >>> On Mon, Oct 14, 2019 at 12:14 AM Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote
> >> Maybe it should be configurable?
> > Any comment from the locking department? I would prefer to avoid an
> > extra knob for it.
> > The v7 config is PREEMPT_NONE and HZ_100. Based on the perf numbers I
> > posted last time: with inlining I get more or less to the performance of
> > the ticket implementation on imx6 and it makes no difference on AM572x.
> > Let me run the hackbench test with the multi_v7_defconfig on my two
> > boards with ORIG/qlock/qlock-inline and come with some numbers here.
>
> Perhaps we should not just look at the all-inlined or all-uninlined
> cases. Different variants of the lock and unlock functions can differ
> widely in size depending on how the irq handling code is handled in each
> architecture. Maybe we can inline the small ones but leave the bigger
> ones uninlined. That can increase performance without too much overhead
> in kernel size.

We also have CONFIG_CC_OPTIMIZE_FOR_SIZE, which at the moment
only controls the -O2 / -Os options, but basing it on that would be
an easy way to avoid another user-visible option.

      Arnd


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-13 22:13 [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2019-10-14  9:00 ` [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM Arnd Bergmann
@ 2019-10-16 15:57 ` Will Deacon
  2019-10-16 17:45   ` Waiman Long
  2019-10-16 22:09   ` Sebastian Andrzej Siewior
  7 siblings, 2 replies; 22+ messages in thread
From: Will Deacon @ 2019-10-16 15:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Ingo Molnar,
	Waiman Long, linux-arm-kernel

Hi Sebastian,

On Mon, Oct 14, 2019 at 12:13:04AM +0200, Sebastian Andrzej Siewior wrote:
> I added support for queued-RW and -spinlocks for ARM. I followed Arnd's
> suggestion and added support for xchg() on 8bit and 16bit variables (V6
> CPUs) via the SH implementation. This makes it possible to remove the
> current ticket based locking implementation. 
> 
> The numbers should be the same as in v1 posted here:
>    http://lkml.kernel.org/r/20191007214439.27891-1-sebastian@breakpoint.cc
> 
> The only thing changed is that patch #1-#3 wire up the missing xchg and
> the patches #4 and #5 additionally remove the old implementation while
> adding the missing bits for the queued implementation.

To be honest with you, I'm surprised that qspinlock is worth it for 32-bit
Arm. qrwlock makes sense because of fairness and starvation issues, but in
my benchmarks on arm64 tickets tended to perform at least as well for small
core counts, and I think that's largely going to be true for systems
running a 32-bit kernel.

I've uploaded my (crude, was never meant to be shared!) benchmark harness
here:

  https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/spinbench

which I used to generate graphs like:

  https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/arm64-spinlocks.pdf

Maybe you could port it to 32-bit and see what the numbers look like? The
qspinlock code probably needs re-syncing with mainline too, but see how you
go. If it ends up being useful maybe I'll host it in a git tree.

Will


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-16 15:57 ` Will Deacon
@ 2019-10-16 17:45   ` Waiman Long
  2019-10-16 18:33     ` Arnd Bergmann
  2019-10-16 21:35     ` Sebastian Andrzej Siewior
  2019-10-16 22:09   ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 22+ messages in thread
From: Waiman Long @ 2019-10-16 17:45 UTC (permalink / raw)
  To: Will Deacon, Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arm-kernel,
	Russell King

On 10/16/19 11:57 AM, Will Deacon wrote:
> Hi Sebastian,
>
> On Mon, Oct 14, 2019 at 12:13:04AM +0200, Sebastian Andrzej Siewior wrote:
>> I added support for queued-RW and -spinlocks for ARM. I followed Arnd's
>> suggestion and added support for xchg() on 8bit and 16bit variables (V6
>> CPUs) via the SH implementation. This makes it possible to remove the
>> current ticket based locking implementation. 
>>
>> The numbers should be the same as in v1 posted here:
>>    http://lkml.kernel.org/r/20191007214439.27891-1-sebastian@breakpoint.cc
>>
>> The only thing changed is that patch #1-#3 wire up the missing xchg and
>> the patches #4 and #5 additionally remove the old implementation while
>> adding the missing bits for the queued implementation.
> To be honest with you, I'm surprised that qspinlock is worth it for 32-bit
> Arm. qrwlock makes sense because of fairness and starvation issues, but in
> my benchmarks on arm64 tickets tended to perform at least as well for small
> core counts, and I think that's largely going to be true for systems
> running a 32-bit kernel.

My own testing of qspinlocks in the past, at least on x86-64, showed
that they are comparable to ticket locks with one or two contending
tasks, if not better. With three contending tasks, qspinlock is a bit
slower than the ticket lock because the overhead of using the MCS node
for queuing shows up. Starting with four or more contending tasks,
qspinlock starts to show its strength and performs better and better
with more contending tasks.

So if the target is 4 cores or less, there isn't too much to gain from
using qspinlock. Lock function inlining probably has a bigger
performance benefit here than switching to qspinlock. Just food for thought.

Cheers,
Longman



* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-16 17:45   ` Waiman Long
@ 2019-10-16 18:33     ` Arnd Bergmann
  2019-10-16 21:35     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2019-10-16 18:33 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Russell King, Sebastian Andrzej Siewior,
	Ingo Molnar, Will Deacon, Linux ARM

On Wed, Oct 16, 2019 at 7:45 PM Waiman Long <longman@redhat.com> wrote:
> So if the target is 4 cores or less, there isn't too much to gain from
> using qspinlock. Lock function inlining probably has a bigger
> performance benefit here than switching to qspinlock. Just food for thought.

As a very crude estimate, here are the numbers of machine descriptions
based on the number of CPUs in each machine:

git grep -c '\<cpu@' arch/arm/boot/dts/ | cut -f 2 -d: | sort -n | uniq -c
    115 1
     63 2
     48 4
      1 5
      1 6
      8 8
      2 16

Filtering out the machines with at most four cores:

$ git grep -c '\<cpu@' arch/arm/boot/dts/  | grep -v ':[124]$'
arch/arm/boot/dts/axm5516-cpus.dtsi:16
arch/arm/boot/dts/exynos5260.dtsi:6
arch/arm/boot/dts/exynos5420-cpus.dtsi:8
arch/arm/boot/dts/exynos5422-cpus.dtsi:8
arch/arm/boot/dts/hip04.dtsi:16
arch/arm/boot/dts/mt6592.dtsi:8
arch/arm/boot/dts/r8a7790.dtsi:8
arch/arm/boot/dts/sun8i-a83t.dtsi:8
arch/arm/boot/dts/sun9i-a80.dtsi:8
arch/arm/boot/dts/tegra124.dtsi:8
arch/arm/boot/dts/tegra30.dtsi:8
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts:5

None of the above are recent or popular. There are still more
dual-core systems than quad-core, but most of the new ones
are either quad-core or single-core.

      Arnd


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-16 17:45   ` Waiman Long
  2019-10-16 18:33     ` Arnd Bergmann
@ 2019-10-16 21:35     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-16 21:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Ingo Molnar,
	Will Deacon, linux-arm-kernel

On 2019-10-16 13:45:13 [-0400], Waiman Long wrote:
> My own testing of qspinlocks in the past, at least on x86-64, showed
> that they are comparable to ticket locks with one or two contending
> tasks, if not better. With three contending tasks, qspinlock is a bit
> slower than the ticket lock because the overhead of using the MCS node
> for queuing shows up. Starting with four or more contending tasks,
> qspinlock starts to show its strength and performs better and better
> with more contending tasks.
> 
> So if the target is 4 cores or less, there isn't too much to gain from
> using qspinlock. Lock function inlining probably has a bigger
> performance benefit here than switching to qspinlock. Just food for thought.

from my initial numbers:
|IMX6q
|~~~~~
|HZ_100  | PREEMPT_NONE  | PREEMPT_VOLUNTARY 	| PREEMPT
|Ticket  | 52.103        | 52.284		| 60.5681
|Q-locks | 54.1804	 | 53.267		| 56.1914
|Q-locksI| 52.2985       | 49.398		| 56.7441
|
|AM572x
|~~~~~~
|HZ_100  | PREEMPT_NONE  | PREEMPT_VOLUNTARY 	| PREEMPT
|Ticket  | 42.3819       | 42.4821     		| 43.2671
|Q-locks | 40.9141	 | 40.0269	        | 42.65  
|Q-locksI| 40.0763       | 39.9101     		| 40.7811

AM572x (Cortex-A15) has two CPUs, IMX6q (Cortex-A9) has four. Here on
AM572x q-locks gain ~3.5% (PREEMPT_NONE) without inlining. IMX6q
performs worse in the same category. However it performs better on
HZ_250 which I wouldn't expect.

> Cheers,
> Longman

Sebastian


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-16 15:57 ` Will Deacon
  2019-10-16 17:45   ` Waiman Long
@ 2019-10-16 22:09   ` Sebastian Andrzej Siewior
  2019-10-17 15:36     ` Waiman Long
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-16 22:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnd Bergmann, Peter Zijlstra, Russell King, Ingo Molnar,
	Waiman Long, linux-arm-kernel

On 2019-10-16 16:57:56 [+0100], Will Deacon wrote:
> Hi Sebastian,
Hi Will,

> To be honest with you, I'm surprised that qspinlock is worth it for 32-bit
> Arm. qrwlock makes sense because of fairness and starvation issues, but in
> my benchmarks on arm64 tickets tended to perform at least as well for small
> core counts, and I think that's largely going to be true for systems
> running a 32-bit kernel.

Waiman had numbers on x86 showing how good this q-lock thingy is, and
x86 moved to it. ARM64 didn't state a particular reason, only saying
"it served us well but we do q-locks now" (this is how I remember it
after I went over the git logs some time ago).
Not knowing anything about qspin vs ticket locking, I looked at how hard
it would be to switch. Then I looked at it with hackbench to see if it
was worth it and it didn't look that bad. Plus the diffstat was negative
so…

> I've uploaded my (crude, was never meant to be shared!) benchmark harness
> here:
> 
>   https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/spinbench
> 
> which I used to generate graphs like:
> 
>   https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/slides/arm64-spinlocks.pdf
> 
> Maybe you could port it to 32-bit and see what the numbers look like? The
> qspinlock code probably needs re-syncing with mainline too, but see how you
> go. If it ends up being useful maybe I'll host it in a git tree.

Okay. Let me see what I can do.

> Will

Sebastian


* Re: [RFC PATCH 0/6 v2] Queued spinlocks/RW-locks for ARM
  2019-10-16 22:09   ` Sebastian Andrzej Siewior
@ 2019-10-17 15:36     ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2019-10-17 15:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, Arnd Bergmann, linux-arm-kernel,
	Russell King

On 10/16/19 6:09 PM, Sebastian Andrzej Siewior wrote:
> On 2019-10-16 16:57:56 [+0100], Will Deacon wrote:
>> Hi Sebastian,
> Hi Will,
>
>> To be honest with you, I'm surprised that qspinlock is worth it for 32-bit
>> Arm. qrwlock makes sense because of fairness and starvation issues, but in
>> my benchmarks on arm64 tickets tended to perform at least as well for small
>> core counts, and I think that's largely going to be true for systems
>> running a 32-bit kernel.
> Waiman had numbers on x86 showing how good this q-lock thingy is, and
> x86 moved to it. ARM64 didn't state a particular reason, only saying
> "it served us well but we do q-locks now" (this is how I remember it
> after I went over the git logs some time ago).
> Not knowing anything about qspin vs ticket locking, I looked at how hard
> it would be to switch. Then I looked at it with hackbench to see if it
> was worth it and it didn't look that bad. Plus the diffstat was negative
> so…

I have seen microbenchmarks like fio where, on 2-socket arm64 systems
with more than 100 logical CPUs, you can get 4-5X the performance out of
qspinlock compared with a ticket lock. It was more than I usually saw on
x86-64 systems. My conclusion is that arm64 systems are probably more
susceptible to the ill effects of cacheline contention than x86-64
systems. So it is certainly a win for arm64.

Cheers,
Longman



