linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] arm64: queued spinlocks and rw-locks
@ 2017-05-03 14:51 Yury Norov
  2017-05-03 14:51 ` [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c Yury Norov
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Yury Norov @ 2017-05-03 14:51 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch, linux-arm-kernel
  Cc: Yury Norov, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

Patch 3 adds an implementation of queue-based locking for ARM64, and a
kernel config option to enable it. Patches 1 and 2 clean up the header
files so that patch 3 applies smoothly.

Tested by Adam Wallis on QDF2400, with huge improvements from these
patches on the torture tests.

Tested by Andrew Pinski on ThunderX:
120 threads (30 cores, 4 threads/core) CN99xx (single socket):

benchmark               Units	qspinlocks vs ticket locks
sched/messaging		s	73.91%
sched/pipe		ops/s	104.18%
futex/hash		ops/s	103.87%
futex/wake		ms	71.04%
futex/wake-parallel	ms	93.88%
futex/requeue		ms	96.47%
futex/lock-pi		ops/s	118.33%

Note that there is a queued locks implementation for PowerPC, introduced
by Pan Xinhui. He tested it extensively and also found a significant
performance gain. Its arch-specific part is very similar to this patch:
https://lwn.net/Articles/701137/

RFC: https://www.spinics.net/lists/arm-kernel/msg575575.html
v1:
 - queued_spin_unlock_wait() and queued_spin_is_locked() are
   re-implemented in the arch part to add additional memory barriers;
 - queued locks are made optional, ticket locks are enabled by default.

Jan Glauber (1):
  arm64/locking: qspinlocks and qrwlocks support

Yury Norov (2):
  kernel/locking: #include <asm/spinlock.h> in qrwlock.c
  asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h

 arch/arm64/Kconfig                      | 24 +++++++++++++++++++
 arch/arm64/include/asm/qrwlock.h        |  7 ++++++
 arch/arm64/include/asm/qspinlock.h      | 42 +++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/spinlock.h       | 12 ++++++++++
 arch/arm64/include/asm/spinlock_types.h | 14 ++++++++---
 arch/arm64/kernel/Makefile              |  1 +
 arch/arm64/kernel/qspinlock.c           | 34 ++++++++++++++++++++++++++
 include/asm-generic/qspinlock.h         |  1 +
 include/asm-generic/qspinlock_types.h   |  8 -------
 kernel/locking/qrwlock.c                |  1 +
 10 files changed, 133 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/include/asm/qrwlock.h
 create mode 100644 arch/arm64/include/asm/qspinlock.h
 create mode 100644 arch/arm64/kernel/qspinlock.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c
  2017-05-03 14:51 [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
@ 2017-05-03 14:51 ` Yury Norov
  2017-05-03 15:05   ` Geert Uytterhoeven
  2017-05-03 14:51 ` [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h Yury Norov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-03 14:51 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch, linux-arm-kernel
  Cc: Yury Norov, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

qrwlock.c calls arch_spin_lock() and arch_spin_unlock() but doesn't
include asm/spinlock.h, where those functions are defined. This may
produce "implicit declaration of function" errors. This patch fixes
that.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 kernel/locking/qrwlock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index cc3ed0ccdfa2..6fb42925b201 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -20,6 +20,7 @@
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
+#include <asm/spinlock.h>
 #include <asm/qrwlock.h>
 
 /*
-- 
2.11.0


* [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
  2017-05-03 14:51 [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
  2017-05-03 14:51 ` [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c Yury Norov
@ 2017-05-03 14:51 ` Yury Norov
  2017-05-04  8:01   ` Arnd Bergmann
  2017-05-03 14:51 ` [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support Yury Norov
       [not found] ` <SIXPR0199MB0604CF9C101455F7D7417FF7C5160@SIXPR0199MB0604.apcprd01.prod.exchangelabs.com>
  3 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-03 14:51 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch, linux-arm-kernel
  Cc: Yury Norov, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

qspinlock_types.h doesn't need linux/atomic.h directly. Because of
this, and because including it requires protection against recursive
inclusion, it seems reasonable to move the inclusion to exactly where
it is needed. This change affects only x86_64, currently the sole user
of qspinlocks. I have build-tested the change on x86_64 with
CONFIG_PARAVIRT enabled and disabled.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 include/asm-generic/qspinlock.h       | 1 +
 include/asm-generic/qspinlock_types.h | 8 --------
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 9f0681bf1e87..5f4d42a09175 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -20,6 +20,7 @@
 #define __ASM_GENERIC_QSPINLOCK_H
 
 #include <asm-generic/qspinlock_types.h>
+#include <linux/atomic.h>
 
 /**
  * queued_spin_unlock_wait - wait until the _current_ lock holder releases the lock
diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h
index 034acd0c4956..a13cc90c87fc 100644
--- a/include/asm-generic/qspinlock_types.h
+++ b/include/asm-generic/qspinlock_types.h
@@ -18,15 +18,7 @@
 #ifndef __ASM_GENERIC_QSPINLOCK_TYPES_H
 #define __ASM_GENERIC_QSPINLOCK_TYPES_H
 
-/*
- * Including atomic.h with PARAVIRT on will cause compilation errors because
- * of recursive header file incluson via paravirt_types.h. So don't include
- * it if PARAVIRT is on.
- */
-#ifndef CONFIG_PARAVIRT
 #include <linux/types.h>
-#include <linux/atomic.h>
-#endif
 
 typedef struct qspinlock {
 	atomic_t	val;
-- 
2.11.0


* [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-05-03 14:51 [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
  2017-05-03 14:51 ` [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c Yury Norov
  2017-05-03 14:51 ` [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h Yury Norov
@ 2017-05-03 14:51 ` Yury Norov
  2017-05-09  4:47   ` Boqun Feng
       [not found] ` <SIXPR0199MB0604CF9C101455F7D7417FF7C5160@SIXPR0199MB0604.apcprd01.prod.exchangelabs.com>
  3 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-03 14:51 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch, linux-arm-kernel
  Cc: Yury Norov, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

From: Jan Glauber <jglauber@cavium.com>

Ported from x86_64 with paravirtualization support removed.

Signed-off-by: Jan Glauber <jglauber@cavium.com>

Note: this patch removes the protection against direct inclusion of
arch/arm64/include/asm/spinlock_types.h. This is done because
kernel/locking/qrwlock.c includes it through the header
include/asm-generic/qrwlock_types.h. Until now the only user of
qrwlock.c was x86, and there is no such protection there either.

I'm not happy to remove the protection, but if it's OK for x86, it
should also be OK for arm64. If not, I think we should fix it for x86
and add the protection there too.

Yury

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/Kconfig                      | 24 +++++++++++++++++++
 arch/arm64/include/asm/qrwlock.h        |  7 ++++++
 arch/arm64/include/asm/qspinlock.h      | 42 +++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/spinlock.h       | 12 ++++++++++
 arch/arm64/include/asm/spinlock_types.h | 14 ++++++++---
 arch/arm64/kernel/Makefile              |  1 +
 arch/arm64/kernel/qspinlock.c           | 34 ++++++++++++++++++++++++++
 7 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/include/asm/qrwlock.h
 create mode 100644 arch/arm64/include/asm/qspinlock.h
 create mode 100644 arch/arm64/kernel/qspinlock.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 22dbde97eefa..db24b3b3f3c6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -25,6 +25,8 @@ config ARM64
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_USE_QUEUED_RWLOCKS if QUEUED_LOCKS
+	select ARCH_USE_QUEUED_SPINLOCKS if QUEUED_LOCKS
 	select ARM_AMBA
 	select ARM_ARCH_TIMER
 	select ARM_GIC
@@ -692,6 +694,28 @@ config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+choice
+	prompt "Locking type"
+	default TICKET_LOCKS
+	help
+	  Choose between the traditional ticket-based locking mechanism
+	  and the queue-based mechanism.
+
+config TICKET_LOCKS
+	bool "Ticket locks"
+	help
+	  Ticket-based locking implementation for ARM64
+
+config QUEUED_LOCKS
+	bool "Queued locks"
+	help
+	  Queue-based locking mechanism. This option improves
+	  locking performance on many-CPU machines when locks are
+	  highly contended. On machines with few CPUs there is no
+	  difference compared to ticket-based locking.
+
+endchoice
+
 source "mm/Kconfig"
 
 config SECCOMP
diff --git a/arch/arm64/include/asm/qrwlock.h b/arch/arm64/include/asm/qrwlock.h
new file mode 100644
index 000000000000..626f6ebfb52d
--- /dev/null
+++ b/arch/arm64/include/asm/qrwlock.h
@@ -0,0 +1,7 @@
+#ifndef _ASM_ARM64_QRWLOCK_H
+#define _ASM_ARM64_QRWLOCK_H
+
+#include <asm-generic/qrwlock_types.h>
+#include <asm-generic/qrwlock.h>
+
+#endif /* _ASM_ARM64_QRWLOCK_H */
diff --git a/arch/arm64/include/asm/qspinlock.h b/arch/arm64/include/asm/qspinlock.h
new file mode 100644
index 000000000000..09ef4f13f549
--- /dev/null
+++ b/arch/arm64/include/asm/qspinlock.h
@@ -0,0 +1,42 @@
+#ifndef _ASM_ARM64_QSPINLOCK_H
+#define _ASM_ARM64_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+#include <asm/atomic.h>
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+#define	queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ *
+ * A smp_store_release() on the least-significant byte.
+ */
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	smp_store_release((u8 *)lock, 0);
+}
+
+#define queued_spin_is_locked queued_spin_is_locked
+/**
+ * queued_spin_is_locked - is the spinlock locked?
+ * @lock: Pointer to queued spinlock structure
+ * Return: 1 if it is locked, 0 otherwise
+ */
+static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+	/*
+	 * See queued_spin_unlock_wait().
+	 *
+	 * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
+	 * isn't immediately observable.
+	 */
+	smp_mb();
+	return atomic_read(&lock->val);
+}
+
+#include <asm-generic/qspinlock.h>
+
+#endif /* _ASM_ARM64_QSPINLOCK_H */
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..37713397e0c5 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -20,6 +20,10 @@
 #include <asm/spinlock_types.h>
 #include <asm/processor.h>
 
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+
 /*
  * Spinlock implementation.
  *
@@ -187,6 +191,12 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm/qrwlock.h>
+#else
+
 /*
  * Write lock implementation.
  *
@@ -351,6 +361,8 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 /* read_can_lock - would read_trylock() succeed? */
 #define arch_read_can_lock(x)		((x)->lock < 0x80000000)
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
 
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 55be59a35e3f..0f0f1561ab6a 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -16,9 +16,9 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
-# error "please don't include this file directly"
-#endif
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#else
 
 #include <linux/types.h>
 
@@ -36,10 +36,18 @@ typedef struct {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm-generic/qrwlock_types.h>
+#else
+
 typedef struct {
 	volatile unsigned int lock;
 } arch_rwlock_t;
 
 #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 9d56467dc223..f48f6256e893 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -56,6 +56,7 @@ arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
+arm64-obj-$(CONFIG_QUEUED_SPINLOCKS)	+= qspinlock.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-$(CONFIG_ARM64_ILP32)		+= vdso-ilp32/
diff --git a/arch/arm64/kernel/qspinlock.c b/arch/arm64/kernel/qspinlock.c
new file mode 100644
index 000000000000..924f19953adb
--- /dev/null
+++ b/arch/arm64/kernel/qspinlock.c
@@ -0,0 +1,34 @@
+#include <asm/qspinlock.h>
+#include <asm/processor.h>
+
+void queued_spin_unlock_wait(struct qspinlock *lock)
+{
+	u32 val;
+
+	for (;;) {
+		smp_mb();
+		val = atomic_read(&lock->val);
+
+		if (!val) /* not locked, we're done */
+			goto done;
+
+		if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
+			break;
+
+		/* not locked, but pending, wait until we observe the lock */
+		cpu_relax();
+	}
+
+	for (;;) {
+		smp_mb();
+		val = atomic_read(&lock->val);
+		if (!(val & _Q_LOCKED_MASK)) /* any unlock is good */
+			break;
+
+		cpu_relax();
+	}
+
+done:
+	smp_acquire__after_ctrl_dep();
+}
+EXPORT_SYMBOL(queued_spin_unlock_wait);
-- 
2.11.0


* Re: [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c
  2017-05-03 14:51 ` [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c Yury Norov
@ 2017-05-03 15:05   ` Geert Uytterhoeven
  2017-05-03 20:32     ` Yury Norov
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2017-05-03 15:05 UTC (permalink / raw)
  To: Yury Norov
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, Linux-Arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Wed, May 3, 2017 at 4:51 PM, Yury Norov <ynorov@caviumnetworks.com> wrote:
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -20,6 +20,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/percpu.h>
>  #include <linux/hardirq.h>
> +#include <asm/spinlock.h>

linux/spinlock.h?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c
  2017-05-03 15:05   ` Geert Uytterhoeven
@ 2017-05-03 20:32     ` Yury Norov
  0 siblings, 0 replies; 22+ messages in thread
From: Yury Norov @ 2017-05-03 20:32 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, Linux-Arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Wed, May 03, 2017 at 05:05:29PM +0200, Geert Uytterhoeven wrote:
> On Wed, May 3, 2017 at 4:51 PM, Yury Norov <ynorov@caviumnetworks.com> wrote:
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/cpumask.h>
> >  #include <linux/percpu.h>
> >  #include <linux/hardirq.h>
> > +#include <asm/spinlock.h>
> 
> linux/spinlock.h?

Comment in include/linux/spinlock.h says:

 * here's the role of the various spinlock/rwlock related include
 * files:
 *
 * on SMP builds:
 *
 [...]
 *  asm/spinlock.h:       contains the arch_spin_*()/etc.  lowlevel
 *                        implementations, mostly inline assembly code
 *

This, and the fact that include/asm-generic/spinlock.h and
arch/arm64/include/asm/spinlock.h don't prevent direct inclusion,
means to me that I should include asm/spinlock.h, because all I need
are arch_spin_lock() and arch_spin_unlock().

Am I wrong?

Yury


* Re: [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
  2017-05-03 14:51 ` [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h Yury Norov
@ 2017-05-04  8:01   ` Arnd Bergmann
  0 siblings, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2017-05-04  8:01 UTC (permalink / raw)
  To: Yury Norov
  Cc: Will Deacon, Peter Zijlstra, Linux Kernel Mailing List,
	linux-arch, Linux ARM, Adam Wallis, Andrew Pinski,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Wed, May 3, 2017 at 4:51 PM, Yury Norov <ynorov@caviumnetworks.com> wrote:
> The "qspinlock_types.h" doesn't need linux/atomic.h directly. So
> because of this, and because including of it requires the protection
> against recursive inclusion, it looks reasonable to move the
> inclusion exactly where it is needed. This change affects the x86_64
> arch, as the only user of qspinlocks at now. I have build-tested the
> change on x86_64 with CONFIG_PARAVIRT enabled and disabled.
>
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>

Please keep this together with the other patches as they get merged through
the arm64 tree.

       Arnd


* Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks
       [not found] ` <SIXPR0199MB0604CF9C101455F7D7417FF7C5160@SIXPR0199MB0604.apcprd01.prod.exchangelabs.com>
@ 2017-05-04 20:28   ` Yury Norov
  2017-05-05 11:53     ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-04 20:28 UTC (permalink / raw)
  To: pan xinhui
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Wed, May 03, 2017 at 06:59:19PM +0000, pan xinhui wrote:
> On 2017/5/3 22:51, Yury Norov wrote:
> > The patch 3 adds implementation for queued-based locking on
> > ARM64, and the option in kernel config to enable it. Patches
> > 1 and 2 fix some mess in header files to apply patch 3 smoothly.
> >
> > Tested on QDF2400 with huge improvements with these patches on
> > the torture tests, by Adam Wallis.
> >
> > Tested on ThunderX, by Andrew Pinski:
> > 120 thread (30 core - 4 thread/core) CN99xx (single socket):
> >
> > benchmark               Units	qspinlocks vs ticket locks
> > sched/messaging		s	73.91%
> > sched/pipe		ops/s	104.18%
> > futex/hash		ops/s	103.87%
> > futex/wake		ms	71.04%
> > futex/wake-parallel	ms	93.88%
> > futex/requeue		ms	96.47%
> > futex/lock-pi		ops/s	118.33%
> >
> > Notice, there's the queued locks implementation for the Power PC introduced
> > by Pan Xinhui. He largely tested it and also found significant performance
> > gain. In arch part it is very similar to this patch though.
> > https://lwn.net/Articles/701137/
> 
> Hi, Yury
>     Glad to know you will join locking development :)
> I have left IBM. However I still care about the queued-spinlock anyway.
> 
> > RFC: https://www.spinics.net/lists/arm-kernel/msg575575.html
> 
> I notice you raised one question about the performance degradation in the
> acquisition of rw-lock for read on qemu.
> This is strange indeed. I once enabled qrwlock on ppc too.
> 
> I paste your test reseults below.  Is this a result of
> qspinlock + qrwlock VS qspinlock + normal rwlock or
> qspinlock + qrwlock VS normal spinlock + normal rwlock?

Initially it was vs. normal spinlock + normal rwlock. But now I've checked
it vs. qspinlock + normal rwlock, and the results are the same. I don't
think it's a real use case to have ticket spinlocks and queued rwlocks,
or vice versa.
 
> I am not sure how that should happen.

Neither am I. If I understand correctly, qemu is not suitable for measuring
performance, so I don't understand why a slowdown in qemu matters at all
when real hardware works better. If it matters, my host CPU is a Core i7-2630QM.

> I make one RFC patch below (not based on the latest kernel); you could apply
> it to check if there is any performance improvement.
> The idea is that.
> In queued_write_lock_slowpath(), we do not unlock ->wait_lock.
> Because the writer holds the rwlock, all readers are still waiting anyway.
> And in queued_read_lock_slowpath(), calling rspin_until_writer_unlock() looks
> like it introduces a little overhead, say, spinning on the rwlock.
> 
> But in the end, queued_read_lock_slowpath() is heavy compared with the
> normal rwlock, so perhaps such a result is somewhat reasonable?

I tried this patch, but the kernel hangs on boot with it, in
queued_write_lock_slowpath().
 
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index 54a8e65..28ee01d 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -28,8 +28,9 @@
>   * Writer states & reader shift and bias
>   */
>  #define	_QW_WAITING	1		/* A writer is waiting	   */
> -#define	_QW_LOCKED	0xff		/* A writer holds the lock */
> -#define	_QW_WMASK	0xff		/* Writer mask		   */
> +#define _QW_KICK	0x80		/* need unlock the spinlock*/
> +#define	_QW_LOCKED	0x7f		/* A writer holds the lock */
> +#define	_QW_WMASK	0x7f		/* Writer mask		   */
>  #define	_QR_SHIFT	8		/* Reader count shift	   */
>  #define _QR_BIAS	(1U << _QR_SHIFT)
>  
> @@ -139,7 +140,10 @@ static inline void queued_read_unlock(struct qrwlock *lock)
>   */
>  static inline void queued_write_unlock(struct qrwlock *lock)
>  {
> -	smp_store_release((u8 *)&lock->cnts, 0);
> +	u32 v = atomic_read(&lock->cnts) & (_QW_WMASK | _QW_KICK);
> +	if (v & _QW_KICK)
> +		arch_spin_unlock(&lock->wait_lock);
> +	(void)atomic_sub_return_release(v, &lock->cnts);
>  }
>  
>  /*
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index fec0823..1f0ea02 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -116,7 +116,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
>  
>  	/* Try to acquire the lock directly if no reader is present */
>  	if (!atomic_read(&lock->cnts) &&
> -	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
> +	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED|_QW_KICK) == 0))
>  		goto unlock;
>  
>  	/*
> @@ -138,12 +138,13 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
>  		cnts = atomic_read(&lock->cnts);
>  		if ((cnts == _QW_WAITING) &&
>  		    (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
> -					    _QW_LOCKED) == _QW_WAITING))
> +					    _QW_LOCKED|_QW_KICK) == _QW_WAITING))
>  			break;
>  
>  		cpu_relax_lowlatency();

It hangs in this loop. It's because lock->cnts may now contain
_QW_WAITING or _QW_WAITING | _QW_KICK, so the if() condition may never
be met in the second case. To handle it, I changed it like this:
    for (;;) {
            cnts = atomic_read(&lock->cnts);
            if (((cnts & _QW_WMASK) == _QW_WAITING) &&
                (atomic_cmpxchg_acquire(&lock->cnts, cnts,
                                        _QW_LOCKED|_QW_KICK) == cnts))
                    break;

            cpu_relax();
    }


But after that it hung in queued_spin_lock_slowpath() at the line
478             smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK));

Backtrace is below.

Yury

>  	}
>  unlock:
> -	arch_spin_unlock(&lock->wait_lock);
> +	return;
>  }
>  EXPORT_SYMBOL(queued_write_lock_slowpath);
> -- 
> 2.4.11

#0  queued_spin_lock_slowpath (lock=0xffff000008cb051c <proc_subdir_lock+4>, val=<optimized out>)
    at kernel/locking/qspinlock.c:478
#1  0xffff0000080ff158 in queued_spin_lock (lock=<optimized out>)
    at ./include/asm-generic/qspinlock.h:104
#2  queued_write_lock_slowpath (lock=0xffff000008cb0518 <proc_subdir_lock>)
    at kernel/locking/qrwlock.c:116
#3  0xffff000008815fc4 in queued_write_lock (lock=<optimized out>)
    at ./include/asm-generic/qrwlock.h:135
#4  __raw_write_lock (lock=<optimized out>) at ./include/linux/rwlock_api_smp.h:211
#5  _raw_write_lock (lock=<optimized out>) at kernel/locking/spinlock.c:295
#6  0xffff00000824c4c0 in proc_register (dir=0xffff000008bff2d0 <proc_root>, 
    dp=0xffff80003d807300) at fs/proc/generic.c:342
#7  0xffff00000824c628 in proc_symlink (name=<optimized out>, 
    parent=0xffff000008b28e40 <proc_root_init+72>, dest=0xffff000008a331a8 "self/net")
    at fs/proc/generic.c:413
#8  0xffff000008b2927c in proc_net_init () at fs/proc/proc_net.c:244
#9  0xffff000008b28e40 in proc_root_init () at fs/proc/root.c:137
#10 0xffff000008b10b10 in start_kernel () at init/main.c:661
#11 0xffff000008b101e0 in __primary_switched () at arch/arm64/kernel/head.S:347


* Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks
  2017-05-04 20:28   ` [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
@ 2017-05-05 11:53     ` Peter Zijlstra
  2017-05-05 12:26       ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2017-05-05 11:53 UTC (permalink / raw)
  To: Yury Norov
  Cc: pan xinhui, Will Deacon, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Thu, May 04, 2017 at 11:28:09PM +0300, Yury Norov wrote:
> I don't think
> it's a real use case to have ticket spinlocks and queued rwlocks

There's nothing wrong with that combination. In fact, we merged qrwlock
much earlier than qspinlock.


* Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks
  2017-05-05 11:53     ` Peter Zijlstra
@ 2017-05-05 12:26       ` Will Deacon
  2017-05-05 15:28         ` Yury Norov
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2017-05-05 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yury Norov, pan xinhui, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Fri, May 05, 2017 at 01:53:03PM +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2017 at 11:28:09PM +0300, Yury Norov wrote:
> > I don't think
> > it's a real use case to have ticket spinlocks and queued rwlocks
> 
> There's nothing wrong with that combination. In fact, we merged qrwlock
> much earlier than qspinlock.

... and that's almost certainly the direction we'll go on arm64 too, not
least because the former are a lot easier to grok.

Will


* Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks
  2017-05-05 12:26       ` Will Deacon
@ 2017-05-05 15:28         ` Yury Norov
  2017-05-05 15:32           ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-05 15:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, pan xinhui, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Fri, May 05, 2017 at 01:26:40PM +0100, Will Deacon wrote:
> On Fri, May 05, 2017 at 01:53:03PM +0200, Peter Zijlstra wrote:
> > On Thu, May 04, 2017 at 11:28:09PM +0300, Yury Norov wrote:
> > > I don't think
> > > it's a real use case to have ticket spinlocks and queued rwlocks
> > 
> > There's nothing wrong with that combination. In fact, we merged qrwlock
> > much earlier than qspinlock.
> 
> ... and that's almost certainly the direction we'll go on arm64 too, not
> least because the former are a lot easier to grok.
> 
> Will

Hmm. Then I think I have to split patch 3 into rwlock and spinlock
parts, and allow the user to enable them independently in the config.

Yury


* Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks
  2017-05-05 15:28         ` Yury Norov
@ 2017-05-05 15:32           ` Will Deacon
  0 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2017-05-05 15:32 UTC (permalink / raw)
  To: Yury Norov
  Cc: Peter Zijlstra, pan xinhui, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Fri, May 05, 2017 at 06:28:45PM +0300, Yury Norov wrote:
> On Fri, May 05, 2017 at 01:26:40PM +0100, Will Deacon wrote:
> > On Fri, May 05, 2017 at 01:53:03PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 04, 2017 at 11:28:09PM +0300, Yury Norov wrote:
> > > > I don't think
> > > > it's a real use case to have ticket spinlocks and queued rwlocks
> > > 
> > > There's nothing wrong with that combination. In fact, we merged qrwlock
> > > much earlier than qspinlock.
> > 
> > ... and that's almost certainly the direction we'll go on arm64 too, not
> > least because the former are a lot easier to grok.
> > 
> > Will
> 
> Hmm. Then I think I have to split patch 3 to rwlock and spinlock
> parts, and allow user to enable them independently in config. 

To be honest, I'm going to spend some time looking at the qrwlock code again
before I enable it for arm64, so I don't think you need to rush to resend
patches since I suspect I'll have a few in the meantime.

Will


* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-05-03 14:51 ` [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support Yury Norov
@ 2017-05-09  4:47   ` Boqun Feng
  2017-05-09 18:48     ` Yury Norov
  0 siblings, 1 reply; 22+ messages in thread
From: Boqun Feng @ 2017-05-09  4:47 UTC (permalink / raw)
  To: Yury Norov
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Wed, May 03, 2017 at 05:51:41PM +0300, Yury Norov wrote:
> From: Jan Glauber <jglauber@cavium.com>
> 
> Ported from x86_64 with paravirtualization support removed.
> 
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> 
> Note. This patch removes the protection against direct inclusion of
> arch/arm64/include/asm/spinlock_types.h. This is done because
> kernel/locking/qrwlock.c does it through the header
> include/asm-generic/qrwlock_types.h. Until now the only user
> of qrwlock.c was x86, and there is no such protection there either.
> 
> I'm not happy to remove the protection, but if it's OK for x86,
> it should also be OK for arm64. If not, I think we should fix it
> for x86 and add the protection there too.
> 
> Yury
> 
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> ---
>  arch/arm64/Kconfig                      | 24 +++++++++++++++++++
>  arch/arm64/include/asm/qrwlock.h        |  7 ++++++
>  arch/arm64/include/asm/qspinlock.h      | 42 +++++++++++++++++++++++++++++++++
>  arch/arm64/include/asm/spinlock.h       | 12 ++++++++++
>  arch/arm64/include/asm/spinlock_types.h | 14 ++++++++---
>  arch/arm64/kernel/Makefile              |  1 +
>  arch/arm64/kernel/qspinlock.c           | 34 ++++++++++++++++++++++++++
>  7 files changed, 131 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm64/include/asm/qrwlock.h
>  create mode 100644 arch/arm64/include/asm/qspinlock.h
>  create mode 100644 arch/arm64/kernel/qspinlock.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 22dbde97eefa..db24b3b3f3c6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -25,6 +25,8 @@ config ARM64
>  	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>  	select ARCH_WANT_FRAME_POINTERS
>  	select ARCH_HAS_UBSAN_SANITIZE_ALL
> +	select ARCH_USE_QUEUED_RWLOCKS if QUEUED_LOCKS
> +	select ARCH_USE_QUEUED_SPINLOCKS if QUEUED_LOCKS
>  	select ARM_AMBA
>  	select ARM_ARCH_TIMER
>  	select ARM_GIC
> @@ -692,6 +694,28 @@ config ARCH_WANT_HUGE_PMD_SHARE
>  config ARCH_HAS_CACHE_LINE_SIZE
>  	def_bool y
>  
> +choice
> +	prompt "Locking type"
> +	default TICKET_LOCKS
> +	help
> +	  Choose between the traditional ticket-based locking
> +	  mechanism and the queue-based mechanism.
> +
> +config TICKET_LOCKS
> +	bool "Ticket locks"
> +	help
> +	  Ticket-based locking implementation for ARM64
> +
> +config QUEUED_LOCKS
> +	bool "Queued locks"
> +	help
> +	  Queue-based locking mechanism. This option improves
> +	  locking performance under heavy contention on many-CPU
> +	  machines. On machines with few CPUs there is no
> +	  difference compared to ticket-based locking.
> +
> +endchoice
> +
>  source "mm/Kconfig"
>  
>  config SECCOMP
> diff --git a/arch/arm64/include/asm/qrwlock.h b/arch/arm64/include/asm/qrwlock.h
> new file mode 100644
> index 000000000000..626f6ebfb52d
> --- /dev/null
> +++ b/arch/arm64/include/asm/qrwlock.h
> @@ -0,0 +1,7 @@
> +#ifndef _ASM_ARM64_QRWLOCK_H
> +#define _ASM_ARM64_QRWLOCK_H
> +
> +#include <asm-generic/qrwlock_types.h>
> +#include <asm-generic/qrwlock.h>
> +
> +#endif /* _ASM_ARM64_QRWLOCK_H */
> diff --git a/arch/arm64/include/asm/qspinlock.h b/arch/arm64/include/asm/qspinlock.h
> new file mode 100644
> index 000000000000..09ef4f13f549
> --- /dev/null
> +++ b/arch/arm64/include/asm/qspinlock.h
> @@ -0,0 +1,42 @@
> +#ifndef _ASM_ARM64_QSPINLOCK_H
> +#define _ASM_ARM64_QSPINLOCK_H
> +
> +#include <asm-generic/qspinlock_types.h>
> +#include <asm/atomic.h>
> +
> +extern void queued_spin_unlock_wait(struct qspinlock *lock);
> +#define queued_spin_unlock_wait queued_spin_unlock_wait
> +
> +#define	queued_spin_unlock queued_spin_unlock
> +/**
> + * queued_spin_unlock - release a queued spinlock
> + * @lock : Pointer to queued spinlock structure
> + *
> + * A smp_store_release() on the least-significant byte.
> + */
> +static inline void queued_spin_unlock(struct qspinlock *lock)
> +{
> +	smp_store_release((u8 *)lock, 0);

I think this part will cause endian issues, maybe you want something
like what we do in queued_write_lock().

Have you tested this in a BE environment?

Regards,
Boqun

> +}
> +
> +#define queued_spin_is_locked queued_spin_is_locked
> +/**
> + * queued_spin_is_locked - is the spinlock locked?
> + * @lock: Pointer to queued spinlock structure
> + * Return: 1 if it is locked, 0 otherwise
> + */
> +static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
> +{
> +	/*
> +	 * See queued_spin_unlock_wait().
> +	 *
> +	 * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
> +	 * isn't immediately observable.
> +	 */
> +	smp_mb();
> +	return atomic_read(&lock->val);
> +}
> +
> +#include <asm-generic/qspinlock.h>
> +
> +#endif /* _ASM_ARM64_QSPINLOCK_H */
> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> index cae331d553f8..37713397e0c5 100644
> --- a/arch/arm64/include/asm/spinlock.h
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -20,6 +20,10 @@
>  #include <asm/spinlock_types.h>
>  #include <asm/processor.h>
>  
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +#include <asm/qspinlock.h>
> +#else
> +
>  /*
>   * Spinlock implementation.
>   *
> @@ -187,6 +191,12 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
>  }
>  #define arch_spin_is_contended	arch_spin_is_contended
>  
> +#endif /* CONFIG_QUEUED_SPINLOCKS */
> +
> +#ifdef CONFIG_QUEUED_RWLOCKS
> +#include <asm/qrwlock.h>
> +#else
> +
>  /*
>   * Write lock implementation.
>   *
> @@ -351,6 +361,8 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
>  /* read_can_lock - would read_trylock() succeed? */
>  #define arch_read_can_lock(x)		((x)->lock < 0x80000000)
>  
> +#endif /* CONFIG_QUEUED_RWLOCKS */
> +
>  #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
>  #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
>  
> diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
> index 55be59a35e3f..0f0f1561ab6a 100644
> --- a/arch/arm64/include/asm/spinlock_types.h
> +++ b/arch/arm64/include/asm/spinlock_types.h
> @@ -16,9 +16,9 @@
>  #ifndef __ASM_SPINLOCK_TYPES_H
>  #define __ASM_SPINLOCK_TYPES_H
>  
> -#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
> -# error "please don't include this file directly"
> -#endif
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +#include <asm-generic/qspinlock_types.h>
> +#else
>  
>  #include <linux/types.h>
>  
> @@ -36,10 +36,18 @@ typedef struct {
>  
>  #define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
>  
> +#endif /* CONFIG_QUEUED_SPINLOCKS */
> +
> +#ifdef CONFIG_QUEUED_RWLOCKS
> +#include <asm-generic/qrwlock_types.h>
> +#else
> +
>  typedef struct {
>  	volatile unsigned int lock;
>  } arch_rwlock_t;
>  
>  #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
>  
> +#endif /* CONFIG_QUEUED_RWLOCKS */
> +
>  #endif
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 9d56467dc223..f48f6256e893 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -56,6 +56,7 @@ arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
>  arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
>  arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
>  arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
> +arm64-obj-$(CONFIG_QUEUED_SPINLOCKS)	+= qspinlock.o
>  
>  obj-y					+= $(arm64-obj-y) vdso/ probes/
>  obj-$(CONFIG_ARM64_ILP32)		+= vdso-ilp32/
> diff --git a/arch/arm64/kernel/qspinlock.c b/arch/arm64/kernel/qspinlock.c
> new file mode 100644
> index 000000000000..924f19953adb
> --- /dev/null
> +++ b/arch/arm64/kernel/qspinlock.c
> @@ -0,0 +1,34 @@
> +#include <asm/qspinlock.h>
> +#include <asm/processor.h>
> +
> +void queued_spin_unlock_wait(struct qspinlock *lock)
> +{
> +	u32 val;
> +
> +	for (;;) {
> +		smp_mb();
> +		val = atomic_read(&lock->val);
> +
> +		if (!val) /* not locked, we're done */
> +			goto done;
> +
> +		if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
> +			break;
> +
> +		/* not locked, but pending, wait until we observe the lock */
> +		cpu_relax();
> +	}
> +
> +	for (;;) {
> +		smp_mb();
> +		val = atomic_read(&lock->val);
> +		if (!(val & _Q_LOCKED_MASK)) /* any unlock is good */
> +			break;
> +
> +		cpu_relax();
> +	}
> +
> +done:
> +	smp_acquire__after_ctrl_dep();
> +}
> +EXPORT_SYMBOL(queued_spin_unlock_wait);
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-05-09  4:47   ` Boqun Feng
@ 2017-05-09 18:48     ` Yury Norov
  2017-05-09 19:37       ` Yury Norov
  0 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-05-09 18:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Tue, May 09, 2017 at 12:47:08PM +0800, Boqun Feng wrote:
> On Wed, May 03, 2017 at 05:51:41PM +0300, Yury Norov wrote:
> > From: Jan Glauber <jglauber@cavium.com>
> > 
> > Ported from x86_64 with paravirtualization support removed.
> > 
> > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > 
> > Note. This patch removes the protection against direct inclusion of
> > arch/arm64/include/asm/spinlock_types.h. This is done because
> > kernel/locking/qrwlock.c does it through the header
> > include/asm-generic/qrwlock_types.h. Until now the only user
> > of qrwlock.c was x86, and there is no such protection there either.
> > 
> > I'm not happy to remove the protection, but if it's OK for x86,
> > it should also be OK for arm64. If not, I think we should fix it
> > for x86 and add the protection there too.
> > 
> > Yury
> > 
> > Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>

[...]

> > +#define	queued_spin_unlock queued_spin_unlock
> > +/**
> > + * queued_spin_unlock - release a queued spinlock
> > + * @lock : Pointer to queued spinlock structure
> > + *
> > + * A smp_store_release() on the least-significant byte.
> > + */
> > +static inline void queued_spin_unlock(struct qspinlock *lock)
> > +{
> > +	smp_store_release((u8 *)lock, 0);
> 
> I think this part will cause endian issues, maybe you want something
> like what we do in queued_write_lock().
> 
> Have you tested this in a BE environment?

No. I think I have to. Thanks for pointing it out.

> 
> Regards,
> Boqun

I think it's just an artifact of copying from x86, and there's no
specific need to cast to the u8 * type on arm64. So the correct version
of it would be like this, I believe: smp_store_release(&lock->val, 0).

Yury

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-05-09 18:48     ` Yury Norov
@ 2017-05-09 19:37       ` Yury Norov
  0 siblings, 0 replies; 22+ messages in thread
From: Yury Norov @ 2017-05-09 19:37 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, linux-arch,
	linux-arm-kernel, Adam Wallis, Andrew Pinski, Arnd Bergmann,
	Catalin Marinas, Ingo Molnar, Jan Glauber, Mark Rutland,
	Pan Xinhui

On Tue, May 09, 2017 at 09:48:29PM +0300, Yury Norov wrote:
> On Tue, May 09, 2017 at 12:47:08PM +0800, Boqun Feng wrote:
> > On Wed, May 03, 2017 at 05:51:41PM +0300, Yury Norov wrote:
> > > From: Jan Glauber <jglauber@cavium.com>
> > > 
> > > Ported from x86_64 with paravirtualization support removed.
> > > 
> > > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > > 
> > > Note. This patch removes the protection against direct inclusion of
> > > arch/arm64/include/asm/spinlock_types.h. This is done because
> > > kernel/locking/qrwlock.c does it through the header
> > > include/asm-generic/qrwlock_types.h. Until now the only user
> > > of qrwlock.c was x86, and there is no such protection there either.
> > > 
> > > I'm not happy to remove the protection, but if it's OK for x86,
> > > it should also be OK for arm64. If not, I think we should fix it
> > > for x86 and add the protection there too.
> > > 
> > > Yury
> > > 
> > > Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> 
> [...]
> 
> > > +#define	queued_spin_unlock queued_spin_unlock
> > > +/**
> > > + * queued_spin_unlock - release a queued spinlock
> > > + * @lock : Pointer to queued spinlock structure
> > > + *
> > > + * A smp_store_release() on the least-significant byte.
> > > + */
> > > +static inline void queued_spin_unlock(struct qspinlock *lock)
> > > +{
> > > +	smp_store_release((u8 *)lock, 0);
> > 
> > I think this part will cause endian issues, maybe you want something
> > like what we do in queued_write_lock().
> > 
> > Have you tested this in a BE environment?
> 
> No. I think I have to. Thanks for pointing it out.
> 
> > 
> > Regards,
> > Boqun
> 
> I think it's just an artifact of copying from x86, and there's no
> specific need to cast to the u8 * type on arm64. So the correct version
> of it would be like this, I believe: smp_store_release(&lock->val, 0).
> 
> Yury

Oops, it would rather be like this:

static inline void queued_spin_unlock(struct qspinlock *lock)
{
#if IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN)
       smp_store_release((u8 *) &lock->val + 3, 0);
#else
       smp_store_release((u8 *) &lock->val, 0);
#endif
}

Or with the helper, like here in ppc port:
https://www.spinics.net/lists/linux-virtualization/msg29390.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-26 12:39         ` Yury Norov
@ 2017-04-28 15:44           ` Will Deacon
  0 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2017-04-28 15:44 UTC (permalink / raw)
  To: Yury Norov
  Cc: Peter Zijlstra, linux-kernel, linux-arch, linux-arm-kernel,
	Ingo Molnar, Arnd Bergmann, Catalin Marinas, Jan Glauber

On Wed, Apr 26, 2017 at 03:39:47PM +0300, Yury Norov wrote:
> On Thu, Apr 20, 2017 at 09:05:30PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 20, 2017 at 09:23:18PM +0300, Yury Norov wrote:
> > > > Is there some test to reproduce the locking failure for this case?
> > 
> > Possibly sysvsem stress before commit:
> > 
> >   27d7be1801a4 ("ipc/sem.c: avoid using spin_unlock_wait()")
> > 
> > Although a similar scheme is also used in nf_conntrack, see commit:
> > 
> >   b316ff783d17 ("locking/spinlock, netfilter: Fix nf_conntrack_lock() barriers")
> > 
> > > I
> > > ask because I run locktorture for many hours on my qemu (emulating
> > > cortex-a57), and I see no failures in the test reports. And Jan did it
> > > on ThunderX, and Adam on QDF2400 without any problems. So even if I
> > > rework those functions, how could I check them for correctness?
> > 
> > Running them doesn't prove them correct. Memory ordering bugs have been
> > in the kernel for many years without 'ever' triggering. This is stuff
> > you have to think about.
> > 
> > > Anyway, regarding the queued_spin_unlock_wait(), is my understanding
> > > correct that you assume adding smp_mb() before entering the for(;;)
> > > cycle, and using ldaxr/strxr instead of atomic_read()?
> > 
> > You'll have to ask Will, I always forget the arm64 details.
> 
> So, below is what I have. For queued_spin_unlock_wait() the generated
> code looks like this:
> ffff0000080983a0 <queued_spin_unlock_wait>:
> ffff0000080983a0:       d5033bbf        dmb     ish
> ffff0000080983a4:       b9400007        ldr     w7, [x0]
> ffff0000080983a8:       350000c7        cbnz    w7, ffff0000080983c0 <queued_spin_unlock_wait+0x20>
> ffff0000080983ac:       1400000e        b       ffff0000080983e4 <queued_spin_unlock_wait+0x44>
> ffff0000080983b0:       d503203f        yield
> ffff0000080983b4:       d5033bbf        dmb     ish
> ffff0000080983b8:       b9400007        ldr     w7, [x0]
> ffff0000080983bc:       34000147        cbz     w7, ffff0000080983e4 <queued_spin_unlock_wait+0x44>
> ffff0000080983c0:       f2401cff        tst     x7, #0xff
> ffff0000080983c4:       54ffff60        b.eq    ffff0000080983b0 <queued_spin_unlock_wait+0x10>
> ffff0000080983c8:       14000003        b       ffff0000080983d4 <queued_spin_unlock_wait+0x34>
> ffff0000080983cc:       d503201f        nop
> ffff0000080983d0:       d503203f        yield
> ffff0000080983d4:       d5033bbf        dmb     ish
> ffff0000080983d8:       b9400007        ldr     w7, [x0]
> ffff0000080983dc:       f2401cff        tst     x7, #0xff
> ffff0000080983e0:       54ffff81        b.ne    ffff0000080983d0 <queued_spin_unlock_wait+0x30>
> ffff0000080983e4:       d50339bf        dmb     ishld
> ffff0000080983e8:       d65f03c0        ret
> ffff0000080983ec:       d503201f        nop
> 
> If I understand the documentation correctly, this is enough to check
> the lock properly. If not, please give me a clue. Will?

Sorry, but I haven't had time to page this back in recently, so I can't give
you an answer straight off the bat. I'll need to go back and revisit the
qspinlock parts and, in particular, use of WFE before I'm comfortable with
this. I also don't want this on by default for the arm64 kernel, and I'd
like to see numbers comparing with our ticket locks on silicon with and
without the large system extensions, for low (<=8), medium (8-32) and high
(>32) core counts.

I'm very nervous about switching our locking implementation over to
something that's largely been developed and tested for x86, which has a
stronger memory model.

Will

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-20 19:05       ` Peter Zijlstra
@ 2017-04-26 12:39         ` Yury Norov
  2017-04-28 15:44           ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-04-26 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-arm-kernel, Ingo Molnar,
	Arnd Bergmann, Catalin Marinas, Will Deacon, Jan Glauber

On Thu, Apr 20, 2017 at 09:05:30PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 09:23:18PM +0300, Yury Norov wrote:
> > Is there some test to reproduce the locking failure for this case?
> 
> Possibly sysvsem stress before commit:
> 
>   27d7be1801a4 ("ipc/sem.c: avoid using spin_unlock_wait()")
> 
> Although a similar scheme is also used in nf_conntrack, see commit:
> 
>   b316ff783d17 ("locking/spinlock, netfilter: Fix nf_conntrack_lock() barriers")
> 
> > I
> > ask because I run locktorture for many hours on my qemu (emulating
> > cortex-a57), and I see no failures in the test reports. And Jan did it
> > on ThunderX, and Adam on QDF2400 without any problems. So even if I
> > rework those functions, how could I check them for correctness?
> 
> Running them doesn't prove them correct. Memory ordering bugs have been
> in the kernel for many years without 'ever' triggering. This is stuff
> you have to think about.
> 
> > Anyway, regarding the queued_spin_unlock_wait(), is my understanding
> > correct that you assume adding smp_mb() before entering the for(;;)
> > cycle, and using ldaxr/strxr instead of atomic_read()?
> 
> You'll have to ask Will, I always forget the arm64 details.

So, below is what I have. For queued_spin_unlock_wait() the generated
code looks like this:
ffff0000080983a0 <queued_spin_unlock_wait>:
ffff0000080983a0:       d5033bbf        dmb     ish
ffff0000080983a4:       b9400007        ldr     w7, [x0]
ffff0000080983a8:       350000c7        cbnz    w7, ffff0000080983c0 <queued_spin_unlock_wait+0x20>
ffff0000080983ac:       1400000e        b       ffff0000080983e4 <queued_spin_unlock_wait+0x44>
ffff0000080983b0:       d503203f        yield
ffff0000080983b4:       d5033bbf        dmb     ish
ffff0000080983b8:       b9400007        ldr     w7, [x0]
ffff0000080983bc:       34000147        cbz     w7, ffff0000080983e4 <queued_spin_unlock_wait+0x44>
ffff0000080983c0:       f2401cff        tst     x7, #0xff
ffff0000080983c4:       54ffff60        b.eq    ffff0000080983b0 <queued_spin_unlock_wait+0x10>
ffff0000080983c8:       14000003        b       ffff0000080983d4 <queued_spin_unlock_wait+0x34>
ffff0000080983cc:       d503201f        nop
ffff0000080983d0:       d503203f        yield
ffff0000080983d4:       d5033bbf        dmb     ish
ffff0000080983d8:       b9400007        ldr     w7, [x0]
ffff0000080983dc:       f2401cff        tst     x7, #0xff
ffff0000080983e0:       54ffff81        b.ne    ffff0000080983d0 <queued_spin_unlock_wait+0x30>
ffff0000080983e4:       d50339bf        dmb     ishld
ffff0000080983e8:       d65f03c0        ret
ffff0000080983ec:       d503201f        nop

If I understand the documentation correctly, this is enough to check
the lock properly. If not, please give me a clue. Will?

Yury

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 22dbde97eefa..2d80161ee367 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -25,6 +25,8 @@ config ARM64
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_USE_QUEUED_SPINLOCKS
+	select ARCH_USE_QUEUED_RWLOCKS
 	select ARM_AMBA
 	select ARM_ARCH_TIMER
 	select ARM_GIC
diff --git a/arch/arm64/include/asm/qrwlock.h b/arch/arm64/include/asm/qrwlock.h
new file mode 100644
index 000000000000..626f6ebfb52d
--- /dev/null
+++ b/arch/arm64/include/asm/qrwlock.h
@@ -0,0 +1,7 @@
+#ifndef _ASM_ARM64_QRWLOCK_H
+#define _ASM_ARM64_QRWLOCK_H
+
+#include <asm-generic/qrwlock_types.h>
+#include <asm-generic/qrwlock.h>
+
+#endif /* _ASM_ARM64_QRWLOCK_H */
diff --git a/arch/arm64/include/asm/qspinlock.h b/arch/arm64/include/asm/qspinlock.h
new file mode 100644
index 000000000000..09ef4f13f549
--- /dev/null
+++ b/arch/arm64/include/asm/qspinlock.h
@@ -0,0 +1,42 @@
+#ifndef _ASM_ARM64_QSPINLOCK_H
+#define _ASM_ARM64_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+#include <asm/atomic.h>
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+#define	queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ *
+ * A smp_store_release() on the least-significant byte.
+ */
+static __always_inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	smp_store_release((u8 *)lock, 0);
+}
+
+#define queued_spin_is_locked queued_spin_is_locked
+/**
+ * queued_spin_is_locked - is the spinlock locked?
+ * @lock: Pointer to queued spinlock structure
+ * Return: 1 if it is locked, 0 otherwise
+ */
+static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+	/*
+	 * See queued_spin_unlock_wait().
+	 *
+	 * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL
+	 * isn't immediately observable.
+	 */
+	smp_mb();
+	return atomic_read(&lock->val);
+}
+
+#include <asm-generic/qspinlock.h>
+
+#endif /* _ASM_ARM64_QSPINLOCK_H */
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d553f8..37713397e0c5 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -20,6 +20,10 @@
 #include <asm/spinlock_types.h>
 #include <asm/processor.h>
 
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+
 /*
  * Spinlock implementation.
  *
@@ -187,6 +191,12 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm/qrwlock.h>
+#else
+
 /*
  * Write lock implementation.
  *
@@ -351,6 +361,8 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 /* read_can_lock - would read_trylock() succeed? */
 #define arch_read_can_lock(x)		((x)->lock < 0x80000000)
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
 
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 55be59a35e3f..0f0f1561ab6a 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -16,9 +16,9 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
-# error "please don't include this file directly"
-#endif
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#else
 
 #include <linux/types.h>
 
@@ -36,10 +36,18 @@ typedef struct {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm-generic/qrwlock_types.h>
+#else
+
 typedef struct {
 	volatile unsigned int lock;
 } arch_rwlock_t;
 
 #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 9d56467dc223..f48f6256e893 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -56,6 +56,7 @@ arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
 arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
+arm64-obj-$(CONFIG_QUEUED_SPINLOCKS)	+= qspinlock.o
 
 obj-y					+= $(arm64-obj-y) vdso/ probes/
 obj-$(CONFIG_ARM64_ILP32)		+= vdso-ilp32/
diff --git a/arch/arm64/kernel/qspinlock.c b/arch/arm64/kernel/qspinlock.c
new file mode 100644
index 000000000000..924f19953adb
--- /dev/null
+++ b/arch/arm64/kernel/qspinlock.c
@@ -0,0 +1,34 @@
+#include <asm/qspinlock.h>
+#include <asm/processor.h>
+
+void queued_spin_unlock_wait(struct qspinlock *lock)
+{
+	u32 val;
+
+	for (;;) {
+		smp_mb();
+		val = atomic_read(&lock->val);
+
+		if (!val) /* not locked, we're done */
+			goto done;
+
+		if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
+			break;
+
+		/* not locked, but pending, wait until we observe the lock */
+		cpu_relax();
+	}
+
+	for (;;) {
+		smp_mb();
+		val = atomic_read(&lock->val);
+		if (!(val & _Q_LOCKED_MASK)) /* any unlock is good */
+			break;
+
+		cpu_relax();
+	}
+
+done:
+	smp_acquire__after_ctrl_dep();
+}
+EXPORT_SYMBOL(queued_spin_unlock_wait);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-20 18:23     ` Yury Norov
  2017-04-20 19:00       ` Mark Rutland
@ 2017-04-20 19:05       ` Peter Zijlstra
  2017-04-26 12:39         ` Yury Norov
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2017-04-20 19:05 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, linux-arch, linux-arm-kernel, Ingo Molnar,
	Arnd Bergmann, Catalin Marinas, Will Deacon, Jan Glauber

On Thu, Apr 20, 2017 at 09:23:18PM +0300, Yury Norov wrote:
> Is there some test to reproduce the locking failure for this case?

Possibly sysvsem stress before commit:

  27d7be1801a4 ("ipc/sem.c: avoid using spin_unlock_wait()")

Although a similar scheme is also used in nf_conntrack, see commit:

  b316ff783d17 ("locking/spinlock, netfilter: Fix nf_conntrack_lock() barriers")

> I
> ask because I run locktorture for many hours on my qemu (emulating
> cortex-a57), and I see no failures in the test reports. And Jan did it
> on ThunderX, and Adam on QDF2400 without any problems. So even if I
> rework those functions, how could I check them for correctness?

Running them doesn't prove them correct. Memory ordering bugs have been
in the kernel for many years without 'ever' triggering. This is stuff
you have to think about.

> Anyway, regarding the queued_spin_unlock_wait(), is my understanding
> correct that you assume adding smp_mb() before entering the for(;;)
> cycle, and using ldaxr/strxr instead of atomic_read()?

You'll have to ask Will, I always forget the arm64 details.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-20 18:23     ` Yury Norov
@ 2017-04-20 19:00       ` Mark Rutland
  2017-04-20 19:05       ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2017-04-20 19:00 UTC (permalink / raw)
  To: Yury Norov
  Cc: Peter Zijlstra, linux-arch, Arnd Bergmann, Catalin Marinas,
	Will Deacon, linux-kernel, Ingo Molnar, Jan Glauber,
	linux-arm-kernel

On Thu, Apr 20, 2017 at 09:23:18PM +0300, Yury Norov wrote:
> On Thu, Apr 13, 2017 at 08:12:12PM +0200, Peter Zijlstra wrote:
> > On Tue, Apr 11, 2017 at 01:35:04AM +0400, Yury Norov wrote:
> > 
> > > +++ b/arch/arm64/include/asm/qspinlock.h
> > > @@ -0,0 +1,20 @@
> > > +#ifndef _ASM_ARM64_QSPINLOCK_H
> > > +#define _ASM_ARM64_QSPINLOCK_H
> > > +
> > > +#include <asm-generic/qspinlock_types.h>
> > > +
> > > +#define	queued_spin_unlock queued_spin_unlock
> > > +/**
> > > + * queued_spin_unlock - release a queued spinlock
> > > + * @lock : Pointer to queued spinlock structure
> > > + *
> > > + * A smp_store_release() on the least-significant byte.
> > > + */
> > > +static inline void queued_spin_unlock(struct qspinlock *lock)
> > > +{
> > > +	smp_store_release((u8 *)lock, 0);
> > > +}
> > 
> > I'm afraid this isn't enough for arm64. I suspect you want your own
> > variant of queued_spin_unlock_wait() and queued_spin_is_locked() as
> > well.
> > 
> > Much memory ordering fun to be had there.
> 
> Hi Peter,
> 
> Is there some test to reproduce the locking failure for this case? I
> ask because I run locktorture for many hours on my qemu (emulating
> cortex-a57), and I see no failures in the test reports.

Even with multi-threaded TCG, a system emulated with QEMU will have far
stronger memory ordering than a real platform. So stress tests on such a
system are useless for testing memory ordering properties.

I would strongly advise that you use a real platform for anything beyond
basic tests when touching code in this area.

> And Jan did it on ThunderX, and Adam on QDF2400 without any problems.
> So even if I rework those functions, how could I check them for
> correctness?

Given the variation the architecture permits, and how difficult it is to
diagnose issues in this area, testing isn't enough here.

You need at least some informal proof as to the primitives doing what
they should, i.e. you should be able to explain why the code is correct.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-13 18:12   ` Peter Zijlstra
@ 2017-04-20 18:23     ` Yury Norov
  2017-04-20 19:00       ` Mark Rutland
  2017-04-20 19:05       ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Yury Norov @ 2017-04-20 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, linux-arm-kernel, Ingo Molnar,
	Arnd Bergmann, Catalin Marinas, Will Deacon, Jan Glauber

On Thu, Apr 13, 2017 at 08:12:12PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 11, 2017 at 01:35:04AM +0400, Yury Norov wrote:
> 
> > +++ b/arch/arm64/include/asm/qspinlock.h
> > @@ -0,0 +1,20 @@
> > +#ifndef _ASM_ARM64_QSPINLOCK_H
> > +#define _ASM_ARM64_QSPINLOCK_H
> > +
> > +#include <asm-generic/qspinlock_types.h>
> > +
> > +#define	queued_spin_unlock queued_spin_unlock
> > +/**
> > + * queued_spin_unlock - release a queued spinlock
> > + * @lock : Pointer to queued spinlock structure
> > + *
> > + * A smp_store_release() on the least-significant byte.
> > + */
> > +static inline void queued_spin_unlock(struct qspinlock *lock)
> > +{
> > +	smp_store_release((u8 *)lock, 0);
> > +}
> 
> I'm afraid this isn't enough for arm64. I suspect you want your own
> variant of queued_spin_unlock_wait() and queued_spin_is_locked() as
> well.
> 
> Much memory ordering fun to be had there.

Hi Peter,

Is there some test to reproduce the locking failure for this case? I
ask because I run locktorture for many hours on my qemu (emulating
cortex-a57), and I see no failures in the test reports. And Jan did it
on ThunderX, and Adam on QDF2400, without any problems. So even if I
rework those functions, how could I check them for correctness?

Anyway, regarding the queued_spin_unlock_wait(), is my understanding
correct that you assume adding smp_mb() before entering the for(;;)
cycle, and using ldaxr/strxr instead of atomic_read()?

Yury

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-10 21:35 ` [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support Yury Norov
@ 2017-04-13 18:12   ` Peter Zijlstra
  2017-04-20 18:23     ` Yury Norov
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2017-04-13 18:12 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, linux-arch, linux-arm-kernel, Ingo Molnar,
	Arnd Bergmann, Catalin Marinas, Will Deacon, Jan Glauber

On Tue, Apr 11, 2017 at 01:35:04AM +0400, Yury Norov wrote:

> +++ b/arch/arm64/include/asm/qspinlock.h
> @@ -0,0 +1,20 @@
> +#ifndef _ASM_ARM64_QSPINLOCK_H
> +#define _ASM_ARM64_QSPINLOCK_H
> +
> +#include <asm-generic/qspinlock_types.h>
> +
> +#define	queued_spin_unlock queued_spin_unlock
> +/**
> + * queued_spin_unlock - release a queued spinlock
> + * @lock : Pointer to queued spinlock structure
> + *
> + * A smp_store_release() on the least-significant byte.
> + */
> +static inline void queued_spin_unlock(struct qspinlock *lock)
> +{
> +	smp_store_release((u8 *)lock, 0);
> +}

I'm afraid this isn't enough for arm64. I suspect you want your own
variant of queued_spin_unlock_wait() and queued_spin_is_locked() as
well.

Much memory ordering fun to be had there.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support
  2017-04-10 21:35 [RFC PATCH " Yury Norov
@ 2017-04-10 21:35 ` Yury Norov
  2017-04-13 18:12   ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Yury Norov @ 2017-04-10 21:35 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-arm-kernel
  Cc: Yury Norov, Peter Zijlstra, Ingo Molnar, Arnd Bergmann,
	Catalin Marinas, Will Deacon, Jan Glauber

From: Jan Glauber <jglauber@cavium.com>

Ported from x86_64 with paravirtualization support removed.

Signed-off-by: Jan Glauber <jglauber@cavium.com>

Note: this patch removes the protection against direct inclusion of
arch/arm64/include/asm/spinlock_types.h. This is done because
kernel/locking/qrwlock.c includes that file through the header
include/asm-generic/qrwlock_types.h. Until now the only user of
qrwlock.c was x86, which has no such protection either.

I'm not happy to remove the protection, but if it's OK for x86, it
should also be OK for arm64. If not, I think we should fix it for
x86 and add the protection there too.

Yury

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/Kconfig                      |  2 ++
 arch/arm64/include/asm/qrwlock.h        |  7 +++++++
 arch/arm64/include/asm/qspinlock.h      | 20 ++++++++++++++++++++
 arch/arm64/include/asm/spinlock.h       | 12 ++++++++++++
 arch/arm64/include/asm/spinlock_types.h | 14 +++++++++++---
 5 files changed, 52 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/include/asm/qrwlock.h
 create mode 100644 arch/arm64/include/asm/qspinlock.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f2b0b52..ac1c170 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -24,6 +24,8 @@ config ARM64
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_USE_QUEUED_SPINLOCKS
+	select ARCH_USE_QUEUED_RWLOCKS
 	select ARM_AMBA
 	select ARM_ARCH_TIMER
 	select ARM_GIC
diff --git a/arch/arm64/include/asm/qrwlock.h b/arch/arm64/include/asm/qrwlock.h
new file mode 100644
index 0000000..626f6eb
--- /dev/null
+++ b/arch/arm64/include/asm/qrwlock.h
@@ -0,0 +1,7 @@
+#ifndef _ASM_ARM64_QRWLOCK_H
+#define _ASM_ARM64_QRWLOCK_H
+
+#include <asm-generic/qrwlock_types.h>
+#include <asm-generic/qrwlock.h>
+
+#endif /* _ASM_ARM64_QRWLOCK_H */
diff --git a/arch/arm64/include/asm/qspinlock.h b/arch/arm64/include/asm/qspinlock.h
new file mode 100644
index 0000000..98f50fc
--- /dev/null
+++ b/arch/arm64/include/asm/qspinlock.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_ARM64_QSPINLOCK_H
+#define _ASM_ARM64_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+
+#define	queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ *
+ * A smp_store_release() on the least-significant byte.
+ */
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	smp_store_release((u8 *)lock, 0);
+}
+
+#include <asm-generic/qspinlock.h>
+
+#endif /* _ASM_ARM64_QSPINLOCK_H */
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cae331d..3771339 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -20,6 +20,10 @@
 #include <asm/spinlock_types.h>
 #include <asm/processor.h>
 
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+
 /*
  * Spinlock implementation.
  *
@@ -187,6 +191,12 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm/qrwlock.h>
+#else
+
 /*
  * Write lock implementation.
  *
@@ -351,6 +361,8 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 /* read_can_lock - would read_trylock() succeed? */
 #define arch_read_can_lock(x)		((x)->lock < 0x80000000)
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
 
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 55be59a..0f0f156 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -16,9 +16,9 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
-# error "please don't include this file directly"
-#endif
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#else
 
 #include <linux/types.h>
 
@@ -36,10 +36,18 @@ typedef struct {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
 
+#endif /* CONFIG_QUEUED_SPINLOCKS */
+
+#ifdef CONFIG_QUEUED_RWLOCKS
+#include <asm-generic/qrwlock_types.h>
+#else
+
 typedef struct {
 	volatile unsigned int lock;
 } arch_rwlock_t;
 
 #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
 
+#endif /* CONFIG_QUEUED_RWLOCKS */
+
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-05-09 19:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 14:51 [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
2017-05-03 14:51 ` [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock.c Yury Norov
2017-05-03 15:05   ` Geert Uytterhoeven
2017-05-03 20:32     ` Yury Norov
2017-05-03 14:51 ` [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h Yury Norov
2017-05-04  8:01   ` Arnd Bergmann
2017-05-03 14:51 ` [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support Yury Norov
2017-05-09  4:47   ` Boqun Feng
2017-05-09 18:48     ` Yury Norov
2017-05-09 19:37       ` Yury Norov
     [not found] ` <SIXPR0199MB0604CF9C101455F7D7417FF7C5160@SIXPR0199MB0604.apcprd01.prod.exchangelabs.com>
2017-05-04 20:28   ` Re: [PATCH 0/3] arm64: queued spinlocks and rw-locks Yury Norov
2017-05-05 11:53     ` Peter Zijlstra
2017-05-05 12:26       ` Will Deacon
2017-05-05 15:28         ` Yury Norov
2017-05-05 15:32           ` Will Deacon
  -- strict thread matches above, loose matches on Subject: below --
2017-04-10 21:35 [RFC PATCH " Yury Norov
2017-04-10 21:35 ` [PATCH 3/3] arm64/locking: qspinlocks and qrwlocks support Yury Norov
2017-04-13 18:12   ` Peter Zijlstra
2017-04-20 18:23     ` Yury Norov
2017-04-20 19:00       ` Mark Rutland
2017-04-20 19:05       ` Peter Zijlstra
2017-04-26 12:39         ` Yury Norov
2017-04-28 15:44           ` Will Deacon
