* [PATCH -v2 0/6] spin_unlock_wait borkage
@ 2016-05-26 14:19 Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 1/6] locking: Replace smp_cond_acquire with smp_cond_load_acquire Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

This version rewrites all spin_unlock_wait() implementations to provide the
acquire order against the spin_unlock() release we wait on, ensuring we can
fully observe the critical section we waited on.

It pulls in the smp_cond_acquire() rewrite because this series introduces a
lot more users of it. All simple spin_unlock_wait() implementations end up
being an smp_cond_load_acquire().
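
For the simple test-and-set style locks this ends up as a one-liner. An
illustrative sketch of the shape these conversions take (mirroring the simple
hunks in patch 4/6; the lock word name varies per architecture):

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	/* spin until the lock word reads unlocked (0), then order as ACQUIRE */
	smp_cond_load_acquire(&lock->lock, !VAL);
}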

And while this still introduces smp_acquire__after_ctrl_dep(), it keeps its
usage contained to a few sites.

Please consider.. carefully.


* [PATCH -v2 1/6] locking: Replace smp_cond_acquire with smp_cond_load_acquire
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 2/6] locking: Introduce cmpwait() Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-smp_cond_load_acquire.patch --]
[-- Type: text/plain, Size: 5846 bytes --]

This new form allows using hardware assisted waiting.
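
A usage sketch, lifted from the qspinlock conversion below: the variable's
address is now passed explicitly, the loaded value is available as VAL inside
the condition expression, and that same (acquire-ordered) value is the
macro's return value:

	val = smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_PENDING_MASK));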

Requested-by: Will Deacon <will.deacon@arm.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/compiler.h   |   25 +++++++++++++++++++------
 kernel/locking/qspinlock.c |   12 ++++++------
 kernel/sched/core.c        |    8 ++++----
 kernel/sched/sched.h       |    2 +-
 kernel/smp.c               |    2 +-
 5 files changed, 31 insertions(+), 18 deletions(-)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -305,21 +305,34 @@ static __always_inline void __write_once
 })
 
 /**
- * smp_cond_acquire() - Spin wait for cond with ACQUIRE ordering
+ * smp_cond_load_acquire() - (Spin) wait for cond with ACQUIRE ordering
+ * @ptr: pointer to the variable to wait on
  * @cond: boolean expression to wait for
  *
  * Equivalent to using smp_load_acquire() on the condition variable but employs
  * the control dependency of the wait to reduce the barrier on many platforms.
  *
+ * Due to C lacking lambda expressions we load the value of *ptr into a
+ * pre-named variable @VAL to be used in @cond.
+ *
  * The control dependency provides a LOAD->STORE order, the additional RMB
  * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
  * aka. ACQUIRE.
  */
-#define smp_cond_acquire(cond)	do {		\
-	while (!(cond))				\
-		cpu_relax();			\
-	smp_rmb(); /* ctrl + rmb := acquire */	\
-} while (0)
+#ifndef smp_cond_load_acquire
+#define smp_cond_load_acquire(ptr, cond_expr) ({		\
+	typeof(ptr) __PTR = (ptr);				\
+	typeof(*ptr) VAL;					\
+	for (;;) {						\
+		VAL = READ_ONCE(*__PTR);			\
+		if (cond_expr)					\
+			break;					\
+		cpu_relax();					\
+	}							\
+	smp_rmb(); /* ctrl + rmb := acquire */			\
+	VAL;							\
+})
+#endif
 
 #endif /* __KERNEL__ */
 
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -358,7 +358,7 @@ void queued_spin_lock_slowpath(struct qs
 	 * sequentiality; this is because not all clear_pending_set_locked()
 	 * implementations imply full barriers.
 	 */
-	smp_cond_acquire(!(atomic_read(&lock->val) & _Q_LOCKED_MASK));
+	smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK));
 
 	/*
 	 * take ownership and clear the pending bit.
@@ -434,7 +434,7 @@ void queued_spin_lock_slowpath(struct qs
 	 *
 	 * The PV pv_wait_head_or_lock function, if active, will acquire
 	 * the lock and return a non-zero value. So we have to skip the
-	 * smp_cond_acquire() call. As the next PV queue head hasn't been
+	 * smp_cond_load_acquire() call. As the next PV queue head hasn't been
 	 * designated yet, there is no way for the locked value to become
 	 * _Q_SLOW_VAL. So both the set_locked() and the
 	 * atomic_cmpxchg_relaxed() calls will be safe.
@@ -445,7 +445,7 @@ void queued_spin_lock_slowpath(struct qs
 	if ((val = pv_wait_head_or_lock(lock, node)))
 		goto locked;
 
-	smp_cond_acquire(!((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK));
+	val = smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_PENDING_MASK));
 
 locked:
 	/*
@@ -465,9 +465,9 @@ void queued_spin_lock_slowpath(struct qs
 			break;
 		}
 		/*
-		 * The smp_cond_acquire() call above has provided the necessary
-		 * acquire semantics required for locking. At most two
-		 * iterations of this loop may be ran.
+		 * The smp_cond_load_acquire() call above has provided the
+		 * necessary acquire semantics required for locking. At most
+		 * two iterations of this loop may be run.
 		 */
 		old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
 		if (old == val)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1843,7 +1843,7 @@ static void ttwu_queue(struct task_struc
  * chain to provide order. Instead we do:
  *
  *   1) smp_store_release(X->on_cpu, 0)
- *   2) smp_cond_acquire(!X->on_cpu)
+ *   2) smp_cond_load_acquire(!X->on_cpu)
  *
  * Example:
  *
@@ -1854,7 +1854,7 @@ static void ttwu_queue(struct task_struc
  *   sched-out X
  *   smp_store_release(X->on_cpu, 0);
  *
- *                    smp_cond_acquire(!X->on_cpu);
+ *                    smp_cond_load_acquire(&X->on_cpu, !VAL);
  *                    X->state = WAKING
  *                    set_task_cpu(X,2)
  *
@@ -1880,7 +1880,7 @@ static void ttwu_queue(struct task_struc
  * This means that any means of doing remote wakeups must order the CPU doing
  * the wakeup against the CPU the task is going to end up running on. This,
  * however, is already required for the regular Program-Order guarantee above,
- * since the waking CPU is the one issueing the ACQUIRE (smp_cond_acquire).
+ * since the waking CPU is the one issuing the ACQUIRE (smp_cond_load_acquire).
  *
  */
 
@@ -1953,7 +1953,7 @@ try_to_wake_up(struct task_struct *p, un
 	 * This ensures that tasks getting woken will be fully ordered against
 	 * their previous state and preserve Program Order.
 	 */
-	smp_cond_acquire(!p->on_cpu);
+	smp_cond_load_acquire(&p->on_cpu, !VAL);
 
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1104,7 +1104,7 @@ static inline void finish_lock_switch(st
 	 * In particular, the load of prev->state in finish_task_switch() must
 	 * happen before this.
 	 *
-	 * Pairs with the smp_cond_acquire() in try_to_wake_up().
+	 * Pairs with the smp_cond_load_acquire() in try_to_wake_up().
 	 */
 	smp_store_release(&prev->on_cpu, 0);
 #endif
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -107,7 +107,7 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(struct call_single_data *csd)
 {
-	smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
+	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
 }
 
 static __always_inline void csd_lock(struct call_single_data *csd)


* [PATCH -v2 2/6] locking: Introduce cmpwait()
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 1/6] locking: Replace smp_cond_acquire with smp_cond_load_acquire Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 3/6] locking: Introduce smp_acquire__after_ctrl_dep Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-cmp_and_wait.patch --]
[-- Type: text/plain, Size: 1481 bytes --]

Provide the cmpwait() primitive, which will 'spin' wait for a variable
to change and use it to implement smp_cond_load_acquire().

This primitive can be implemented with hardware assist on some
platforms (ARM64, x86).
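
A minimal usage sketch, assuming a hypothetical u32 *flag and a done()
predicate; the condition is re-checked after every wakeup, which keeps the
sketch correct even if an implementation returns early:

	u32 v = READ_ONCE(*flag);

	while (!done(v)) {
		cmpwait(flag, v);	/* e.g. a WFE-based wait on ARM64 */
		v = READ_ONCE(*flag);
	}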

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/compiler.h |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -305,6 +305,23 @@ static __always_inline void __write_once
 })
 
 /**
+ * cmpwait - compare and wait for a variable to change
+ * @ptr: pointer to the variable to wait on
+ * @val: the value it should change from
+ *
+ * A simple construct that waits for a variable to change from a known
+ * value; some architectures can do this in hardware.
+ */
+#ifndef cmpwait
+#define cmpwait(ptr, val) do {					\
+	typeof (ptr) __ptr = (ptr);				\
+	typeof (val) __val = (val);				\
+	while (READ_ONCE(*__ptr) == __val)			\
+		cpu_relax();					\
+} while (0)
+#endif
+
+/**
  * smp_cond_load_acquire() - (Spin) wait for cond with ACQUIRE ordering
  * @ptr: pointer to the variable to wait on
  * @cond: boolean expression to wait for
@@ -327,7 +344,7 @@ static __always_inline void __write_once
 		VAL = READ_ONCE(*__PTR);			\
 		if (cond_expr)					\
 			break;					\
-		cpu_relax();					\
+		cmpwait(__PTR, VAL);				\
 	}							\
 	smp_rmb(); /* ctrl + rmb := acquire */			\
 	VAL;							\


* [PATCH -v2 3/6] locking: Introduce smp_acquire__after_ctrl_dep
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 1/6] locking: Replace smp_cond_acquire with smp_cond_load_acquire Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 2/6] locking: Introduce cmpwait() Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait() Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-smp_acquire__after_ctrl_dep.patch --]
[-- Type: text/plain, Size: 2747 bytes --]

Introduce smp_acquire__after_ctrl_dep(); this construct is not
uncommon, but the lack of this barrier is.
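
The intended pattern, sketched with placeholder names (ptr and BUSY are
hypothetical): the control dependency of the loop already orders the loads
against later stores, and the barrier after it upgrades that to a full
load-ACQUIRE:

	while (READ_ONCE(*ptr) & BUSY)		/* ctrl dependency: LOAD->STORE */
		cpu_relax();
	smp_acquire__after_ctrl_dep();		/* adds LOAD->LOAD, i.e. ACQUIRE */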

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/compiler.h |   15 ++++++++++-----
 ipc/sem.c                |   14 ++------------
 2 files changed, 12 insertions(+), 17 deletions(-)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -305,6 +305,15 @@ static __always_inline void __write_once
 })
 
 /**
+ * smp_acquire__after_ctrl_dep() - Provide ACQUIRE ordering after a control dependency
+ *
+ * A control dependency provides a LOAD->STORE order, the additional RMB
+ * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
+ * aka. (load)-ACQUIRE.
+ */
+#define smp_acquire__after_ctrl_dep()		smp_rmb()
+
+/**
  * cmpwait - compare and wait for a variable to change
  * @ptr: pointer to the variable to wait on
  * @val: the value it should change from
@@ -331,10 +340,6 @@ static __always_inline void __write_once
  *
  * Due to C lacking lambda expressions we load the value of *ptr into a
  * pre-named variable @VAL to be used in @cond.
- *
- * The control dependency provides a LOAD->STORE order, the additional RMB
- * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
- * aka. ACQUIRE.
  */
 #ifndef smp_cond_load_acquire
 #define smp_cond_load_acquire(ptr, cond_expr) ({		\
@@ -346,7 +351,7 @@ static __always_inline void __write_once
 			break;					\
 		cmpwait(__PTR, VAL);				\
 	}							\
-	smp_rmb(); /* ctrl + rmb := acquire */			\
+	smp_acquire__after_ctrl_dep();				\
 	VAL;							\
 })
 #endif
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -260,16 +260,6 @@ static void sem_rcu_free(struct rcu_head
 }
 
 /*
- * spin_unlock_wait() and !spin_is_locked() are not memory barriers, they
- * are only control barriers.
- * The code must pair with spin_unlock(&sem->lock) or
- * spin_unlock(&sem_perm.lock), thus just the control barrier is insufficient.
- *
- * smp_rmb() is sufficient, as writes cannot pass the control barrier.
- */
-#define ipc_smp_acquire__after_spin_is_unlocked()	smp_rmb()
-
-/*
  * Wait until all currently ongoing simple ops have completed.
  * Caller must own sem_perm.lock.
  * New simple ops cannot start, because simple ops first check
@@ -292,7 +282,7 @@ static void sem_wait_array(struct sem_ar
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
 	}
-	ipc_smp_acquire__after_spin_is_unlocked();
+	smp_acquire__after_ctrl_dep();
 }
 
 /*
@@ -350,7 +340,7 @@ static inline int sem_lock(struct sem_ar
 			 *	complex_count++;
 			 *	spin_unlock(sem_perm.lock);
 			 */
-			ipc_smp_acquire__after_spin_is_unlocked();
+			smp_acquire__after_ctrl_dep();
 
 			/*
 			 * Now repeat the test of complex_count:


* [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
                   ` (2 preceding siblings ...)
  2016-05-26 14:19 ` [PATCH -v2 3/6] locking: Introduce smp_acquire__after_ctrl_dep Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  2016-05-26 21:10   ` Chris Metcalf
  2016-05-27  6:46   ` Martin Schwidefsky
  2016-05-26 14:19 ` [PATCH -v2 5/6] locking: Update spin_unlock_wait users Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 6/6] locking,netfilter: Fix nf_conntrack_lock() Peter Zijlstra
  5 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	schwidefsky, ysato, cmetcalf, chris, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-spin_unlock_wait.patch --]
[-- Type: text/plain, Size: 13354 bytes --]

This patch updates/fixes all spin_unlock_wait() implementations.

The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.

This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.

I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.

Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.

I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
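
A condensed, generic sketch of the resulting ticket-lock wait (field names
follow the arm hunk below; cpu_relax() stands in for the arch-specific wait):

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	u16 owner = READ_ONCE(lock->tickets.owner);

	smp_rmb();	/* order the owner snapshot against the loop loads */
	for (;;) {
		arch_spinlock_t tmp = READ_ONCE(*lock);

		/* done when unlocked, or when the observed owner moved on */
		if (tmp.tickets.owner == tmp.tickets.next ||
		    tmp.tickets.owner != owner)
			break;

		cpu_relax();
	}
	smp_acquire__after_ctrl_dep();	/* upgrade ctrl dependency to ACQUIRE */
}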

Cc: rth@twiddle.net
Cc: vgupta@synopsys.com
Cc: linux@armlinux.org.uk
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: tony.luck@intel.com
Cc: james.hogan@imgtec.com
Cc: ralf@linux-mips.org
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: mpe@ellerman.id.au
Cc: schwidefsky@de.ibm.com
Cc: ysato@users.sourceforge.jp
Cc: davem@davemloft.net
Cc: cmetcalf@mellanox.com
Cc: chris@zankel.net
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/include/asm/spinlock.h    |    7 +++++--
 arch/arc/include/asm/spinlock.h      |    7 +++++--
 arch/arm/include/asm/spinlock.h      |   18 ++++++++++++++++--
 arch/blackfin/include/asm/spinlock.h |    3 +--
 arch/hexagon/include/asm/spinlock.h  |    8 ++++++--
 arch/ia64/include/asm/spinlock.h     |    2 ++
 arch/m32r/include/asm/spinlock.h     |    7 +++++--
 arch/metag/include/asm/spinlock.h    |   11 +++++++++--
 arch/mips/include/asm/spinlock.h     |   18 ++++++++++++++++--
 arch/mn10300/include/asm/spinlock.h  |    6 +++++-
 arch/parisc/include/asm/spinlock.h   |    9 +++++++--
 arch/powerpc/include/asm/spinlock.h  |    6 ++++--
 arch/s390/include/asm/spinlock.h     |    3 +--
 arch/sh/include/asm/spinlock.h       |    7 +++++--
 arch/sparc/include/asm/spinlock_32.h |    6 ++++--
 arch/sparc/include/asm/spinlock_64.h |    9 ++++++---
 arch/tile/lib/spinlock_32.c          |    4 ++++
 arch/tile/lib/spinlock_64.c          |    4 ++++
 arch/xtensa/include/asm/spinlock.h   |    7 +++++--
 include/asm-generic/qspinlock.h      |    3 +--
 include/linux/spinlock_up.h          |    9 ++++++---
 21 files changed, 117 insertions(+), 37 deletions(-)

--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -13,8 +13,11 @@
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 #define arch_spin_is_locked(x)	((x)->lock != 0)
-#define arch_spin_unlock_wait(x) \
-		do { cpu_relax(); } while ((x)->lock)
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, !VAL);
+}
 
 static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -15,8 +15,11 @@
 
 #define arch_spin_is_locked(x)	((x)->slock != __ARCH_SPIN_LOCK_UNLOCKED__)
 #define arch_spin_lock_flags(lock, flags)	arch_spin_lock(lock)
-#define arch_spin_unlock_wait(x) \
-	do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, !VAL);
+}
 
 #ifdef CONFIG_ARC_HAS_LLSC
 
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -50,8 +50,22 @@ static inline void dsb_sev(void)
  * memory.
  */
 
-#define arch_spin_unlock_wait(lock) \
-	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	u16 owner = READ_ONCE(lock->tickets.owner);
+
+	smp_rmb();
+	for (;;) {
+		arch_spinlock_t tmp = READ_ONCE(*lock);
+
+		if (tmp.tickets.owner == tmp.tickets.next ||
+		    tmp.tickets.owner != owner)
+			break;
+
+		wfe();
+	}
+	smp_acquire__after_ctrl_dep();
+}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -48,8 +48,7 @@ static inline void arch_spin_unlock(arch
 
 static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 {
-	while (arch_spin_is_locked(lock))
-		cpu_relax();
+	smp_cond_load_acquire(&lock->lock, !VAL);
 }
 
 static inline int arch_read_can_lock(arch_rwlock_t *rw)
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -176,8 +176,12 @@ static inline unsigned int arch_spin_try
  * SMP spinlocks are intended to allow only a single CPU at the lock
  */
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
-#define arch_spin_unlock_wait(lock) \
-	do {while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, !VAL);
+}
+
 #define arch_spin_is_locked(x) ((x)->lock != 0)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -86,6 +86,8 @@ static __always_inline void __ticket_spi
 			return;
 		cpu_relax();
 	}
+
+	smp_acquire__after_ctrl_dep();
 }
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -27,8 +27,11 @@
 
 #define arch_spin_is_locked(x)		(*(volatile int *)(&(x)->slock) <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
-#define arch_spin_unlock_wait(x) \
-		do { cpu_relax(); } while (arch_spin_is_locked(x))
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, VAL > 0);
+}
 
 /**
  * arch_spin_trylock - Try spin lock and return a result
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -7,8 +7,15 @@
 #include <asm/spinlock_lnkget.h>
 #endif
 
-#define arch_spin_unlock_wait(lock) \
-	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+/*
+ * both lock1 and lnkget are test-and-set spinlocks with 0 unlocked and 1
+ * locked.
+ */
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, !VAL);
+}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -48,8 +48,22 @@ static inline int arch_spin_value_unlock
 }
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
-#define arch_spin_unlock_wait(x) \
-	while (arch_spin_is_locked(x)) { cpu_relax(); }
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	u16 owner = READ_ONCE(lock->h.serving_now);
+	smp_rmb();
+	for (;;) {
+		arch_spinlock_t tmp = READ_ONCE(*lock);
+
+		if (tmp.h.serving_now == tmp.h.ticket ||
+		    tmp.h.serving_now != owner)
+			break;
+
+		cpu_relax();
+	}
+	smp_acquire__after_ctrl_dep();
+}
 
 static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -23,7 +23,11 @@
  */
 
 #define arch_spin_is_locked(x)	(*(volatile signed char *)(&(x)->slock) != 0)
-#define arch_spin_unlock_wait(x) do { barrier(); } while (arch_spin_is_locked(x))
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, !VAL);
+}
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -13,8 +13,13 @@ static inline int arch_spin_is_locked(ar
 }
 
 #define arch_spin_lock(lock) arch_spin_lock_flags(lock, 0)
-#define arch_spin_unlock_wait(x) \
-		do { cpu_relax(); } while (arch_spin_is_locked(x))
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
+{
+	volatile unsigned int *a = __ldcw_align(x);
+
+	smp_cond_load_acquire(a, VAL);
+}
 
 static inline void arch_spin_lock_flags(arch_spinlock_t *x,
 					 unsigned long flags)
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -165,8 +165,10 @@ static inline void arch_spin_unlock(arch
 #ifdef CONFIG_PPC64
 extern void arch_spin_unlock_wait(arch_spinlock_t *lock);
 #else
-#define arch_spin_unlock_wait(lock) \
-	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, !VAL);
+}
 #endif
 
 /*
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -95,8 +95,7 @@ static inline void arch_spin_unlock(arch
 
 static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 {
-	while (arch_spin_is_locked(lock))
-		arch_spin_relax(lock);
+	smp_cond_load_acquire(&lock->lock, !VAL);
 }
 
 /*
--- a/arch/sh/include/asm/spinlock.h
+++ b/arch/sh/include/asm/spinlock.h
@@ -25,8 +25,11 @@
 
 #define arch_spin_is_locked(x)		((x)->lock <= 0)
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
-#define arch_spin_unlock_wait(x) \
-	do { while (arch_spin_is_locked(x)) cpu_relax(); } while (0)
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, VAL > 0);
+}
 
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -13,8 +13,10 @@
 
 #define arch_spin_is_locked(lock) (*((volatile unsigned char *)(lock)) != 0)
 
-#define arch_spin_unlock_wait(lock) \
-	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, !VAL);
+}
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
--- a/arch/sparc/include/asm/spinlock_64.h
+++ b/arch/sparc/include/asm/spinlock_64.h
@@ -8,6 +8,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
+
 /* To get debugging spinlocks which detect and catch
  * deadlock situations, set CONFIG_DEBUG_SPINLOCK
  * and rebuild your kernel.
@@ -23,9 +25,10 @@
 
 #define arch_spin_is_locked(lp)	((lp)->lock != 0)
 
-#define arch_spin_unlock_wait(lp)	\
-	do {	rmb();			\
-	} while((lp)->lock)
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->lock, !VAL);
+}
 
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
 	if (next == curr)
 		return;
 
+	smp_rmb();
+
 	/* Wait until the current locker has released the lock. */
 	do {
 		delay_backoff(iterations++);
 	} while (READ_ONCE(lock->current_ticket) == curr);
+
+	smp_acquire__after_ctrl_dep();
 }
 EXPORT_SYMBOL(arch_spin_unlock_wait);
 
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
 	if (arch_spin_next(val) == curr)
 		return;
 
+	smp_rmb();
+
 	/* Wait until the current locker has released the lock. */
 	do {
 		delay_backoff(iterations++);
 	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
+
+	smp_acquire__after_ctrl_dep();
 }
 EXPORT_SYMBOL(arch_spin_unlock_wait);
 
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -29,8 +29,11 @@
  */
 
 #define arch_spin_is_locked(x) ((x)->slock != 0)
-#define arch_spin_unlock_wait(lock) \
-	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, !VAL);
+}
 
 #define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
 
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -133,8 +133,7 @@ static inline void queued_spin_unlock_wa
 {
 	/* See queued_spin_is_locked() */
 	smp_mb();
-	while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
-		cpu_relax();
+	smp_cond_load_acquire(&lock->val.counter, !(VAL & _Q_LOCKED_MASK));
 }
 
 #ifndef virt_spin_lock
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -25,6 +25,11 @@
 #ifdef CONFIG_DEBUG_SPINLOCK
 #define arch_spin_is_locked(x)		((x)->slock == 0)
 
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	smp_cond_load_acquire(&lock->slock, VAL);
+}
+
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	lock->slock = 0;
@@ -67,6 +72,7 @@ static inline void arch_spin_unlock(arch
 
 #else /* DEBUG_SPINLOCK */
 #define arch_spin_is_locked(lock)	((void)(lock), 0)
+#define arch_spin_unlock_wait(lock)	do { barrier(); (void)(lock); } while (0)
 /* for sched/core.c and kernel_lock.c: */
 # define arch_spin_lock(lock)		do { barrier(); (void)(lock); } while (0)
 # define arch_spin_lock_flags(lock, flags)	do { barrier(); (void)(lock); } while (0)
@@ -79,7 +85,4 @@ static inline void arch_spin_unlock(arch
 #define arch_read_can_lock(lock)	(((void)(lock), 1))
 #define arch_write_can_lock(lock)	(((void)(lock), 1))
 
-#define arch_spin_unlock_wait(lock) \
-		do { cpu_relax(); } while (arch_spin_is_locked(lock))
-
 #endif /* __LINUX_SPINLOCK_UP_H */


* [PATCH -v2 5/6] locking: Update spin_unlock_wait users
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
                   ` (3 preceding siblings ...)
  2016-05-26 14:19 ` [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait() Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  2016-05-26 14:19 ` [PATCH -v2 6/6] locking,netfilter: Fix nf_conntrack_lock() Peter Zijlstra
  5 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-fix-spin_unlock_wait.patch --]
[-- Type: text/plain, Size: 1434 bytes --]

With the modified semantics of spin_unlock_wait(), a number of
explicit barriers can be removed. Also update the comment for the
do_exit() use case, as it was somewhat stale/obscure.
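
The shape of these removals, using the task_work hunk below as an example:
the ACQUIRE that the explicit barrier used to provide now comes from the
unlock-wait itself, so

	raw_spin_unlock_wait(&task->pi_lock);
	smp_mb();

simply becomes

	raw_spin_unlock_wait(&task->pi_lock);	/* now implies ACQUIRE */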

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 ipc/sem.c          |    1 -
 kernel/exit.c      |    8 ++++++--
 kernel/task_work.c |    1 -
 3 files changed, 6 insertions(+), 4 deletions(-)

--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -282,7 +282,6 @@ static void sem_wait_array(struct sem_ar
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
 	}
-	smp_acquire__after_ctrl_dep();
 }
 
 /*
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -776,10 +776,14 @@ void do_exit(long code)
 
 	exit_signals(tsk);  /* sets PF_EXITING */
 	/*
-	 * tsk->flags are checked in the futex code to protect against
-	 * an exiting task cleaning up the robust pi futexes.
+	 * Ensure that all new tsk->pi_lock acquisitions must observe
+	 * PF_EXITING. Serializes against futex.c:attach_to_pi_owner().
 	 */
 	smp_mb();
+	/*
+	 * Ensure that we must observe the pi_state in exit_mm() ->
+	 * mm_release() -> exit_pi_state_list().
+	 */
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
 	if (unlikely(in_atomic())) {
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -108,7 +108,6 @@ void task_work_run(void)
 		 * fail, but it can play with *work and other entries.
 		 */
 		raw_spin_unlock_wait(&task->pi_lock);
-		smp_mb();
 
 		do {
 			next = work->next;


* [PATCH -v2 6/6] locking,netfilter: Fix nf_conntrack_lock()
  2016-05-26 14:19 [PATCH -v2 0/6] spin_unlock_wait borkage Peter Zijlstra
                   ` (4 preceding siblings ...)
  2016-05-26 14:19 ` [PATCH -v2 5/6] locking: Update spin_unlock_wait users Peter Zijlstra
@ 2016-05-26 14:19 ` Peter Zijlstra
  5 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-26 14:19 UTC (permalink / raw)
  To: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, Peter Zijlstra (Intel)

[-- Attachment #1: peterz-locking-netfilter.patch --]
[-- Type: text/plain, Size: 1687 bytes --]

nf_conntrack_lock{,_all}() is borken as it misses a bunch of memory
barriers to order the whole global vs local locks scheme.

Even x86 (and other TSO archs) are affected.
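
The scheme and the ordering it needs, sketched with abbreviated names
(locks_all, locks_all_lock and locks[] stand for the nf_conntrack_* globals):

  nf_conntrack_all_lock():              nf_conntrack_lock(lock):
    spin_lock(&locks_all_lock);           spin_lock(lock);
    locks_all = true;                     while (locks_all) {
    smp_mb();                               spin_unlock(lock);
    for each i                              smp_rmb();
      spin_unlock_wait(&locks[i]);          spin_unlock_wait(&locks_all_lock);
                                            spin_lock(lock);
                                          }

The smp_mb()/smp_rmb() pair, plus the smp_store_release() on the unlock side,
is what the hunks below add.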

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 net/netfilter/nf_conntrack_core.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -83,6 +83,12 @@ void nf_conntrack_lock(spinlock_t *lock)
 	spin_lock(lock);
 	while (unlikely(nf_conntrack_locks_all)) {
 		spin_unlock(lock);
+
+		/* Order the nf_conntrack_locks_all load vs the
+		 * spin_unlock_wait() loads below, to ensure that
+		 * nf_conntrack_locks_all_lock is indeed held.
+		 */
+		smp_rmb(); /* spin_lock(locks_all) */
 		spin_unlock_wait(&nf_conntrack_locks_all_lock);
 		spin_lock(lock);
 	}
@@ -128,6 +134,12 @@ static void nf_conntrack_all_lock(void)
 	spin_lock(&nf_conntrack_locks_all_lock);
 	nf_conntrack_locks_all = true;
 
+	/* Order the above store against the spin_unlock_wait() loads
+	 * below, such that if nf_conntrack_lock() observes locks_all
+	 * we must observe lock[] held.
+	 */
+	smp_mb(); /* spin_lock(locks_all) */
+
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
 		spin_unlock_wait(&nf_conntrack_locks[i]);
 	}
@@ -135,7 +147,11 @@ static void nf_conntrack_all_lock(void)
 
 static void nf_conntrack_all_unlock(void)
 {
-	nf_conntrack_locks_all = false;
+	/* All prior stores must be complete before we clear locks_all.
+	 * Otherwise nf_conntrack_lock() might observe the false but not the
+	 * entire critical section.
+	 */
+	smp_store_release(&nf_conntrack_locks_all, false);
 	spin_unlock(&nf_conntrack_locks_all_lock);
 }
 


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-26 14:19 ` [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait() Peter Zijlstra
@ 2016-05-26 21:10   ` Chris Metcalf
  2016-05-27  9:05     ` Peter Zijlstra
  2016-05-27  6:46   ` Martin Schwidefsky
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Metcalf @ 2016-05-26 21:10 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel, torvalds, manfred, dave, paulmck,
	will.deacon
  Cc: boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	schwidefsky, ysato, chris

On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
> --- a/arch/tile/lib/spinlock_32.c
> +++ b/arch/tile/lib/spinlock_32.c
> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>   	if (next == curr)
>   		return;
>   
> +	smp_rmb();
> +
>   	/* Wait until the current locker has released the lock. */
>   	do {
>   		delay_backoff(iterations++);
>   	} while (READ_ONCE(lock->current_ticket) == curr);
> +
> +	smp_acquire__after_ctrl_dep();
>   }
>   EXPORT_SYMBOL(arch_spin_unlock_wait);
>   
> --- a/arch/tile/lib/spinlock_64.c
> +++ b/arch/tile/lib/spinlock_64.c
> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>   	if (arch_spin_next(val) == curr)
>   		return;
>   
> +	smp_rmb();
> +
>   	/* Wait until the current locker has released the lock. */
>   	do {
>   		delay_backoff(iterations++);
>   	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
> +
> +	smp_acquire__after_ctrl_dep();
>   }
>   EXPORT_SYMBOL(arch_spin_unlock_wait);

The smp_rmb() are unnecessary for tile.  We READ_ONCE next/curr from the
lock and compare them, so we know the load(s) are complete.  There's no
microarchitectural speculation going on so that's that.  Then we READ_ONCE
the next load on the lock from within the wait loop, so our load/load
ordering is guaranteed.

With that change,

Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-26 14:19 ` [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait() Peter Zijlstra
  2016-05-26 21:10   ` Chris Metcalf
@ 2016-05-27  6:46   ` Martin Schwidefsky
  2016-05-27  9:02     ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Martin Schwidefsky @ 2016-05-27  6:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon,
	boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	ysato, cmetcalf, chris, Heiko Carstens

On Thu, 26 May 2016 16:19:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> This patch updates/fixes all spin_unlock_wait() implementations.
> 
> The update is in semantics; where it previously was only a control
> dependency, we now upgrade to a full load-acquire to match the
> store-release from the spin_unlock() we waited on. This ensures that
> when spin_unlock_wait() returns, we're guaranteed to observe the full
> critical section we waited on.
> 
> This fixes a number of spin_unlock_wait() users that (not
> unreasonably) rely on this.

All that is missing is an smp_rmb(), no? 

> --- a/arch/s390/include/asm/spinlock.h
> +++ b/arch/s390/include/asm/spinlock.h
> @@ -95,8 +95,7 @@ static inline void arch_spin_unlock(arch
> 
>  static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>  {
> -	while (arch_spin_is_locked(lock))
> -		arch_spin_relax(lock);
> +	smp_cond_load_acquire(&lock->lock, !VAL);
>  }
> 
>  /*

This change adds the smp_rmb() at the end of the waiting loop, but
it also replaces arch_spin_relax() alias arch_lock_relax() with a
cpu_relax(). This is not good, these two functions do *very* different
things. cpu_relax() does an undirected yield with diagnose 0x44 but
only if the system is non-SMT. arch_lock_relax() does an additional
cpu_is_preempted() to test if the target cpu is running and does a
directed yield with diagnose 0x9c. 

Why can't we just add the smp_rmb() to the arch_spin_unlock_wait()?
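
Roughly this, presumably: keep the directed yield and only add the barrier
from patch 3/6 at the end (an illustrative sketch):

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	while (arch_spin_is_locked(lock))
		arch_spin_relax(lock);
	smp_acquire__after_ctrl_dep();
}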

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-27  6:46   ` Martin Schwidefsky
@ 2016-05-27  9:02     ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-27  9:02 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon,
	boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	ysato, cmetcalf, chris, Heiko Carstens

On Fri, May 27, 2016 at 08:46:49AM +0200, Martin Schwidefsky wrote:
> > This fixes a number of spin_unlock_wait() users that (not
> > unreasonably) rely on this.
> 
> All that is missing is an smp_rmb(), no? 

Indeed.

> > --- a/arch/s390/include/asm/spinlock.h
> > +++ b/arch/s390/include/asm/spinlock.h
> > @@ -95,8 +95,7 @@ static inline void arch_spin_unlock(arch
> > 
> >  static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> >  {
> > -	while (arch_spin_is_locked(lock))
> > -		arch_spin_relax(lock);
> > +	smp_cond_load_acquire(&lock->lock, !VAL);
> >  }
> > 
> >  /*
> 
> This change adds the smp_rmb() at the end of the waiting loop, but
> it also replaces arch_spin_relax() alias arch_lock_relax() with a
> cpu_relax(). This is not good, these two functions do *very* different
> things. cpu_relax() does an undirected yield with diagnose 0x44 but
> only if the system is non-SMT. arch_lock_relax() does an additional
> cpu_is_preempted() to test if the target cpu is running and does a
> directed yield with diagnose 0x9c. 
> 
> Why can't we just add the smp_rmb() to the arch_spin_unlock_wait()?

We can; I forgot about the special cpu_relax on s390, will fix.


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-26 21:10   ` Chris Metcalf
@ 2016-05-27  9:05     ` Peter Zijlstra
  2016-05-27 19:34       ` Chris Metcalf
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-27  9:05 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon,
	boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	schwidefsky, ysato, chris

On Thu, May 26, 2016 at 05:10:36PM -0400, Chris Metcalf wrote:
> On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
> >--- a/arch/tile/lib/spinlock_32.c
> >+++ b/arch/tile/lib/spinlock_32.c
> >@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> >  	if (next == curr)
> >  		return;
> >+	smp_rmb();
> >+
> >  	/* Wait until the current locker has released the lock. */
> >  	do {
> >  		delay_backoff(iterations++);
> >  	} while (READ_ONCE(lock->current_ticket) == curr);
> >+
> >+	smp_acquire__after_ctrl_dep();
> >  }
> >  EXPORT_SYMBOL(arch_spin_unlock_wait);
> >--- a/arch/tile/lib/spinlock_64.c
> >+++ b/arch/tile/lib/spinlock_64.c
> >@@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
> >  	if (arch_spin_next(val) == curr)
> >  		return;
> >+	smp_rmb();
> >+
> >  	/* Wait until the current locker has released the lock. */
> >  	do {
> >  		delay_backoff(iterations++);
> >  	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
> >+
> >+	smp_acquire__after_ctrl_dep();
> >  }
> >  EXPORT_SYMBOL(arch_spin_unlock_wait);
> 
> The smp_rmb() are unnecessary for tile.  We READ_ONCE next/curr from the
> lock and compare them, so we know the load(s) are complete.  There's no
> microarchitectural speculation going on so that's that.  Then we READ_ONCE
> the next load on the lock from within the wait loop, so our load/load
> ordering is guaranteed.

Does TILE never speculate reads? Because in that case the control
dependency already provides a full load->load,store barrier and you'd
want smp_acquire__after_ctrl_dep() to be a barrier() instead of
smp_rmb().


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-27  9:05     ` Peter Zijlstra
@ 2016-05-27 19:34       ` Chris Metcalf
  2016-05-30  9:25         ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Metcalf @ 2016-05-27 19:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon,
	boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	schwidefsky, ysato, chris

On 5/27/2016 5:05 AM, Peter Zijlstra wrote:
> On Thu, May 26, 2016 at 05:10:36PM -0400, Chris Metcalf wrote:
>> On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
>>> --- a/arch/tile/lib/spinlock_32.c
>>> +++ b/arch/tile/lib/spinlock_32.c
>>> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>>>   	if (next == curr)
>>>   		return;
>>> +	smp_rmb();
>>> +
>>>   	/* Wait until the current locker has released the lock. */
>>>   	do {
>>>   		delay_backoff(iterations++);
>>>   	} while (READ_ONCE(lock->current_ticket) == curr);
>>> +
>>> +	smp_acquire__after_ctrl_dep();
>>>   }
>>>   EXPORT_SYMBOL(arch_spin_unlock_wait);
>>> --- a/arch/tile/lib/spinlock_64.c
>>> +++ b/arch/tile/lib/spinlock_64.c
>>> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>>>   	if (arch_spin_next(val) == curr)
>>>   		return;
>>> +	smp_rmb();
>>> +
>>>   	/* Wait until the current locker has released the lock. */
>>>   	do {
>>>   		delay_backoff(iterations++);
>>>   	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
>>> +
>>> +	smp_acquire__after_ctrl_dep();
>>>   }
>>>   EXPORT_SYMBOL(arch_spin_unlock_wait);
>> The smp_rmb() are unnecessary for tile.  We READ_ONCE next/curr from the
>> lock and compare them, so we know the load(s) are complete.  There's no
>> microarchitectural speculation going on so that's that.  Then we READ_ONCE
>> the next load on the lock from within the wait loop, so our load/load
>> ordering is guaranteed.
> Does TILE never speculate reads? Because in that case the control
> dependency already provides a full load->load,store barrier and you'd
> want smp_acquire__after_ctrl_dep() to be a barrier() instead of
> smp_rmb().

Yes, that's a good point.  I didn't look at the definition of smp_acquire__after_ctrl_dep(),
but it certainly sounds like that's exactly a compiler barrier for tile.  There is no load
speculation performed.  The only out-of-order stuff that happens is in the memory
subsystem: stores will become visible in arbitrary order, and loads will arrive in
arbitrary order, but as soon as the result of a load is used in any other kind of
instruction, the instruction issue will halt until the pending load(s) for the instruction
operands are available.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com


* Re: [PATCH -v2 4/6] locking, arch: Update spin_unlock_wait()
  2016-05-27 19:34       ` Chris Metcalf
@ 2016-05-30  9:25         ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2016-05-30  9:25 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: linux-kernel, torvalds, manfred, dave, paulmck, will.deacon,
	boqun.feng, Waiman.Long, tj, pablo, kaber, davem, oleg,
	netfilter-devel, sasha.levin, hofrat, rth, vgupta, linux,
	realmz6, rkuo, tony.luck, james.hogan, ralf, dhowells, jejb, mpe,
	schwidefsky, ysato, chris

On Fri, May 27, 2016 at 03:34:13PM -0400, Chris Metcalf wrote:

> >Does TILE never speculate reads? Because in that case the control
> >dependency already provides a full load->load,store barrier and you'd
> >want smp_acquire__after_ctrl_dep() to be a barrier() instead of
> >smp_rmb().
> 
> Yes, that's a good point.  I didn't look at the definition of smp_acquire__after_ctrl_dep(),
> but it certainly sounds like that's exactly a compiler barrier for tile.  There is no load
> speculation performed.  The only out-of-order stuff that happens is in the memory
> subsystem: stores will become visible in arbitrary order, and loads will arrive in
> arbitrary order, but as soon as the result of a load is used in any other kind of
> instruction, the instruction issue will halt until the pending load(s) for the instruction
> operands are available.

OK; for now I'll just put in barrier() and a big comment, and I'll look at
making smp_acquire__after_ctrl_dep() a proper (per-arch) barrier later.
There's a little header headache involved.
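
Presumably the eventual per-arch form would be an override in the arch
barrier header, with the generic definition becoming #ifndef guarded; an
illustrative, hypothetical sketch (the file placement and the tile property
are assumptions taken from this thread, not a real patch):

/* arch/tile/include/asm/barrier.h (hypothetical) */
#define smp_acquire__after_ctrl_dep()	barrier()	/* tile does not speculate loads */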
