* [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock
@ 2016-08-28 11:56 Manfred Spraul
  2016-08-28 11:56 ` [PATCH 1/4] spinlock: Document memory barrier rules Manfred Spraul
  2016-08-29 10:53 ` [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 11:56 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Hi,

as discussed before:
If a high-scalability locking scheme is built from multiple
spinlocks, additional memory barriers are often required.

The documentation was not as clear as it could be, and some memory
barriers in the implementation were missing or superfluous.
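
For reference, the pattern the series is concerned with looks roughly
like this (a minimal sketch modelled on ipc/sem.c; obj, global_mode,
global_lock and for_each_obj() are illustrative names only):

	/* fast path: per-object lock plus a mode check */
	spin_lock(&obj->lock);
	smp_mb__after_spin_lock();	/* order the lock store before the read below */
	if (!smp_load_acquire(&global_mode)) {
		/* ... fine-grained critical section ... */
		spin_unlock(&obj->lock);
		return;
	}
	spin_unlock(&obj->lock);
	/* ... fall back to global locking ... */

	/* slow path: switch everyone to the global lock */
	spin_lock(&global_lock);
	smp_store_mb(global_mode, true);
	for_each_obj(obj)			/* hypothetical iterator */
		spin_unlock_wait(&obj->lock);	/* wait for fast-path holders */

Without the marked barriers, the fast path's read of global_mode and the
slow path's spin_unlock_wait() can miss each other, so that both sides
proceed at the same time.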

Patch 1: Documentation, define one standard barrier, update ipc/sem.c
Patch 2: Update rcutree
Patch 3: Update nf_conntrack
Patch 4: Update for qspinlock: smp_mb__after_spin_lock is free.

Patch 3 is larger than strictly required: it rewrites the conntrack
logic using the approach from ipc/sem.c. I think the new code is simpler
and more realtime-friendly.

Please review!

@Andrew: The patches are relative to mmots.
Could you include them in your tree, with the aim of getting them into
linux-next?

--
	Manfred


* [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-28 11:56 [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
@ 2016-08-28 11:56 ` Manfred Spraul
  2016-08-28 11:56   ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Manfred Spraul
  2016-08-29 10:48   ` [PATCH 1/4] spinlock: Document memory barrier rules Peter Zijlstra
  2016-08-29 10:53 ` [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock Peter Zijlstra
  1 sibling, 2 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 11:56 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Right now, the spinlock machinery tries to guarantee barriers even for
unorthodox locking cases, which ends up as a constant stream of updates
as the architectures try to support new unorthodox ideas.

The patch proposes to reverse that:
spin_lock is ACQUIRE, spin_unlock is RELEASE.
spin_unlock_wait is also ACQUIRE.
Code that needs further guarantees must use appropriate explicit barriers.

Architectures that can implement some barriers for free can define the
barriers as NOPs.

As the initial step, the patch converts ipc/sem.c to the new defines:
- no more smp_rmb() after spin_unlock_wait(), that is part of
  spin_unlock_wait()
- smp_mb__after_spin_lock() instead of a direct smp_mb().

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 Documentation/locking/spinlocks.txt |  5 +++++
 include/linux/spinlock.h            | 12 ++++++++++++
 ipc/sem.c                           | 16 +---------------
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/Documentation/locking/spinlocks.txt b/Documentation/locking/spinlocks.txt
index ff35e40..fc37beb 100644
--- a/Documentation/locking/spinlocks.txt
+++ b/Documentation/locking/spinlocks.txt
@@ -40,6 +40,11 @@ example, internal driver data structures that nobody else ever touches).
    touches a shared variable has to agree about the spinlock they want
    to use.
 
+   NOTE! Code that needs stricter memory barriers than ACQUIRE during
+   LOCK and RELEASE during UNLOCK must use appropriate memory barriers
+   such as smp_mb__after_spin_lock().
+   spin_unlock_wait() has ACQUIRE semantics.
+
 ----
 
 Lesson 2: reader-writer spinlocks.
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..d79000e 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,6 +130,18 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
+#ifndef smp_mb__after_spin_lock
+/**
+ * smp_mb__after_spin_lock() - Provide smp_mb() after spin_lock
+ *
+ * spin_lock() provides ACQUIRE semantics regarding reading the lock.
+ * There are no guarantees that the lock write is visible before any read
+ * or write operation within the protected area is performed.
+ * If the lock write must happen first, this function is required.
+ */
+#define smp_mb__after_spin_lock()	smp_mb()
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/ipc/sem.c b/ipc/sem.c
index 5e318c5..ac15ab2 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -290,14 +290,6 @@ static void complexmode_enter(struct sem_array *sma)
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
 	}
-	/*
-	 * spin_unlock_wait() is not a memory barriers, it is only a
-	 * control barrier. The code must pair with spin_unlock(&sem->lock),
-	 * thus just the control barrier is insufficient.
-	 *
-	 * smp_rmb() is sufficient, as writes cannot pass the control barrier.
-	 */
-	smp_rmb();
 }
 
 /*
@@ -363,13 +355,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		 */
 		spin_lock(&sem->lock);
 
-		/*
-		 * See 51d7d5205d33
-		 * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
-		 * A full barrier is required: the write of sem->lock
-		 * must be visible before the read is executed
-		 */
-		smp_mb();
+		smp_mb__after_spin_lock();
 
 		if (!smp_load_acquire(&sma->complex_mode)) {
 			/* fast path successful! */
-- 
2.5.5


* [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h
  2016-08-28 11:56 ` [PATCH 1/4] spinlock: Document memory barrier rules Manfred Spraul
@ 2016-08-28 11:56   ` Manfred Spraul
  2016-08-28 11:56     ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
                       ` (2 more replies)
  2016-08-29 10:48   ` [PATCH 1/4] spinlock: Document memory barrier rules Peter Zijlstra
  1 sibling, 3 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 11:56 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

spin_unlock() + spin_lock() together do not form a full memory barrier:

  a=1;
  spin_unlock(&b);
  spin_lock(&c);
+ smp_mb__after_unlock_lock();
  d=1;

Without the smp_mb__after_unlock_lock(), other CPUs can observe the
write to d without seeing the write to a.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/asm-generic/barrier.h | 16 ++++++++++++++++
 kernel/rcu/tree.h             | 12 ------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index fe297b5..9b4d28f 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -244,6 +244,22 @@ do {									\
 	smp_acquire__after_ctrl_dep();				\
 	VAL;							\
 })
+
+#ifndef smp_mb__after_unlock_lock
+/*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+
+#endif
+
 #endif
 
 #endif /* !__ASSEMBLY__ */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a523..a0cd9ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
 /*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
-
-/*
  * Wrappers for the rcu_node::lock acquire and release.
  *
  * Because the rcu_nodes form a tree, the tree traversal locking will observe
-- 
2.5.5


* [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers.
  2016-08-28 11:56   ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Manfred Spraul
@ 2016-08-28 11:56     ` Manfred Spraul
  2016-08-28 11:56       ` [PATCH 4/4] qspinlock for x86: smp_mb__after_spin_lock() is free Manfred Spraul
  2016-08-29 10:51       ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Peter Zijlstra
  2016-08-28 13:43     ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Paul E. McKenney
  2016-08-28 14:41     ` kbuild test robot
  2 siblings, 2 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 11:56 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Change the locking for nf_conntrack_lock() and nf_conntrack_all_lock()
to the approach used by ipc/sem.c:

- use smp_store_mb() instead of a raw smp_mb()
- use smp_mb__after_spin_lock().
- for nf_conntrack_lock(), use spin_lock(&global_lock) instead of
  spin_unlock_wait(&global_lock) followed by looping back.

The last change prevents nf_conntrack_lock() from looping multiple times.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 net/netfilter/nf_conntrack_core.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index dd2c43a..ded1adc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -80,20 +80,29 @@ static __read_mostly bool nf_conntrack_locks_all;
 
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
+	/* Step 1: Acquire the lock */
 	spin_lock(lock);
-	while (unlikely(nf_conntrack_locks_all)) {
-		spin_unlock(lock);
 
-		/*
-		 * Order the 'nf_conntrack_locks_all' load vs. the
-		 * spin_unlock_wait() loads below, to ensure
-		 * that 'nf_conntrack_locks_all_lock' is indeed held:
-		 */
-		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
-		spin_unlock_wait(&nf_conntrack_locks_all_lock);
-		spin_lock(lock);
-	}
+	/* Step 2: make it visible to all CPUs that we hold the lock */
+	smp_mb__after_spin_lock();
+
+	/* Step 3: read locks_all, with ACQUIRE semantics */
+	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
+		return;
+
+	/* slow path: unlock */
+	spin_unlock(lock);
+
+	/* Slow path, step 1: get global lock */
+	spin_lock(&nf_conntrack_locks_all_lock);
+
+	/* Slow path, step 2: get the lock we want */
+	spin_lock(lock);
+
+	/* Slow path, step 3: release the global lock */
+	spin_unlock(&nf_conntrack_locks_all_lock);
 }
+
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
 static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2)
@@ -133,15 +142,14 @@ static void nf_conntrack_all_lock(void)
 	int i;
 
 	spin_lock(&nf_conntrack_locks_all_lock);
-	nf_conntrack_locks_all = true;
 
 	/*
-	 * Order the above store of 'nf_conntrack_locks_all' against
+	 * Order the store of 'nf_conntrack_locks_all' against
 	 * the spin_unlock_wait() loads below, such that if
 	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
 	 * we must observe nf_conntrack_locks[] held:
 	 */
-	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
+	smp_store_mb(nf_conntrack_locks_all, true);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
 		spin_unlock_wait(&nf_conntrack_locks[i]);
-- 
2.5.5


* [PATCH 4/4] qspinlock for x86: smp_mb__after_spin_lock() is free
  2016-08-28 11:56     ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
@ 2016-08-28 11:56       ` Manfred Spraul
  2016-08-29 10:52         ` Peter Zijlstra
  2016-08-29 10:51       ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 11:56 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

For x86 qspinlocks, no additional memory barrier is required in
smp_mb__after_spin_lock:

Theoretically, for qspinlock we could define two barriers:
- smp_mb__after_spin_lock: Free for x86, not free for powerpc
- smp_mb__between_spin_lock_and_spin_unlock_wait():
	Free for all archs, see queued_spin_unlock_wait for details.

As smp_mb__between_spin_lock_and_spin_unlock_wait() is not used
in any hotpaths, the patch does not create that define yet.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 arch/x86/include/asm/qspinlock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index eaba080..da06433 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -61,6 +61,17 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
 }
 #endif /* CONFIG_PARAVIRT */
 
+#ifndef smp_mb__after_spin_lock
+/**
+ * smp_mb__after_spin_lock() - Provide smp_mb() after spin_lock
+ *
+ * queued_spin_lock() provides full memory barriers semantics,
+ * thus no further memory barrier is required. See
+ * queued_spin_unlock_wait() for further details.
+ */
+#define smp_mb__after_spin_lock()	barrier()
+#endif
+
 #include <asm-generic/qspinlock.h>
 
 #endif /* _ASM_X86_QSPINLOCK_H */
-- 
2.5.5


* Re: [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h
  2016-08-28 11:56   ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Manfred Spraul
  2016-08-28 11:56     ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
@ 2016-08-28 13:43     ` Paul E. McKenney
  2016-08-28 16:31       ` Manfred Spraul
  2016-08-28 18:00       ` Manfred Spraul
  2016-08-28 14:41     ` kbuild test robot
  2 siblings, 2 replies; 25+ messages in thread
From: Paul E. McKenney @ 2016-08-28 13:43 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton,
	LKML, 1vier1, Davidlohr Bueso

On Sun, Aug 28, 2016 at 01:56:14PM +0200, Manfred Spraul wrote:
> spin_unlock() + spin_lock() together do not form a full memory barrier:
> 
>   a=1;
>   spin_unlock(&b);
>   spin_lock(&c);
> + smp_mb__after_unlock_lock();
>   d=1;

Better would be s/d=1/r1=d/ above.

Then another process doing this:

	d=1
	smp_mb()
	r2=a

might have the after-the-dust-settles outcome of r1==0&&r2==0.

The advantage of this scenario is that it can happen on real hardware.

> 
> Without the smp_mb__after_unlock_lock(), other CPUs can observe the
> write to d without seeing the write to a.
> 
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>

With the upgraded commit log, I am OK with the patch below.
However, others will probably want to see at least one use of
smp_mb__after_unlock_lock() outside of RCU.

							Thanx, Paul

> ---
>  include/asm-generic/barrier.h | 16 ++++++++++++++++
>  kernel/rcu/tree.h             | 12 ------------
>  2 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index fe297b5..9b4d28f 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -244,6 +244,22 @@ do {									\
>  	smp_acquire__after_ctrl_dep();				\
>  	VAL;							\
>  })
> +
> +#ifndef smp_mb__after_unlock_lock
> +/*
> + * Place this after a lock-acquisition primitive to guarantee that
> + * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
> + * if the UNLOCK and LOCK are executed by the same CPU or if the
> + * UNLOCK and LOCK operate on the same lock variable.
> + */
> +#ifdef CONFIG_PPC
> +#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
> +#else /* #ifdef CONFIG_PPC */
> +#define smp_mb__after_unlock_lock()	do { } while (0)
> +#endif /* #else #ifdef CONFIG_PPC */
> +
> +#endif
> +
>  #endif
> 
>  #endif /* !__ASSEMBLY__ */
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e99a523..a0cd9ab 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
>  #endif /* #ifdef CONFIG_RCU_TRACE */
> 
>  /*
> - * Place this after a lock-acquisition primitive to guarantee that
> - * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
> - * if the UNLOCK and LOCK are executed by the same CPU or if the
> - * UNLOCK and LOCK operate on the same lock variable.
> - */
> -#ifdef CONFIG_PPC
> -#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
> -#else /* #ifdef CONFIG_PPC */
> -#define smp_mb__after_unlock_lock()	do { } while (0)
> -#endif /* #else #ifdef CONFIG_PPC */
> -
> -/*
>   * Wrappers for the rcu_node::lock acquire and release.
>   *
>   * Because the rcu_nodes form a tree, the tree traversal locking will observe
> -- 
> 2.5.5
> 


* Re: [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h
  2016-08-28 11:56   ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Manfred Spraul
  2016-08-28 11:56     ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
  2016-08-28 13:43     ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Paul E. McKenney
@ 2016-08-28 14:41     ` kbuild test robot
  2016-08-28 17:43       ` [PATCH 2/4 v3] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h Manfred Spraul
  2 siblings, 1 reply; 25+ messages in thread
From: kbuild test robot @ 2016-08-28 14:41 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: kbuild-all, benh, paulmck, Ingo Molnar, Boqun Feng,
	Peter Zijlstra, Andrew Morton, LKML, 1vier1, Davidlohr Bueso,
	Manfred Spraul

Hi Manfred,

[auto build test ERROR on next-20160825]
[cannot apply to tip/locking/core linus/master linux/master v4.8-rc3 v4.8-rc2 v4.8-rc1 v4.8-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Manfred-Spraul/spinlock-Document-memory-barrier-rules/20160828-200220
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   In file included from kernel/rcu/tree.c:59:0:
   kernel/rcu/tree.h: In function 'raw_spin_lock_rcu_node':
>> kernel/rcu/tree.h:706:2: error: implicit declaration of function 'smp_mb__after_unlock_lock' [-Werror=implicit-function-declaration]
     smp_mb__after_unlock_lock();
     ^
   cc1: some warnings being treated as errors

vim +/smp_mb__after_unlock_lock +706 kernel/rcu/tree.h

67c583a7 Boqun Feng     2015-12-29  700   * As ->lock of struct rcu_node is a __private field, therefore one should use
67c583a7 Boqun Feng     2015-12-29  701   * these wrappers rather than directly call raw_spin_{lock,unlock}* on ->lock.
2a67e741 Peter Zijlstra 2015-10-08  702   */
2a67e741 Peter Zijlstra 2015-10-08  703  static inline void raw_spin_lock_rcu_node(struct rcu_node *rnp)
2a67e741 Peter Zijlstra 2015-10-08  704  {
67c583a7 Boqun Feng     2015-12-29  705  	raw_spin_lock(&ACCESS_PRIVATE(rnp, lock));
2a67e741 Peter Zijlstra 2015-10-08 @706  	smp_mb__after_unlock_lock();
2a67e741 Peter Zijlstra 2015-10-08  707  }
2a67e741 Peter Zijlstra 2015-10-08  708  
67c583a7 Boqun Feng     2015-12-29  709  static inline void raw_spin_unlock_rcu_node(struct rcu_node *rnp)

:::::: The code at line 706 was first introduced by commit
:::::: 2a67e741bbbc022e0fadf8c6dbc3a76019ecd0cf rcu: Create transitive rnp->lock acquisition functions

:::::: TO: Peter Zijlstra <peterz@infradead.org>
:::::: CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h
  2016-08-28 13:43     ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Paul E. McKenney
@ 2016-08-28 16:31       ` Manfred Spraul
  2016-08-28 18:00       ` Manfred Spraul
  1 sibling, 0 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 16:31 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

spin_unlock() + spin_lock() together do not form a full memory barrier:
(everything initialized to 0)

CPU1:
  a=1;
  spin_unlock(&b);
  spin_lock(&c);
+ smp_mb__after_unlock_lock();
  r1=d;

CPU2:
  d=1;
  smp_mb();
  r2=a;

Without the smp_mb__after_unlock_lock(), r1==0 && r2==0 would
be possible.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/asm-generic/barrier.h | 16 ++++++++++++++++
 kernel/rcu/tree.h             | 12 ------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index fe297b5..9b4d28f 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -244,6 +244,22 @@ do {									\
 	smp_acquire__after_ctrl_dep();				\
 	VAL;							\
 })
+
+#ifndef smp_mb__after_unlock_lock
+/*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+
+#endif
+
 #endif
 
 #endif /* !__ASSEMBLY__ */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a523..a0cd9ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
 /*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
-
-/*
  * Wrappers for the rcu_node::lock acquire and release.
  *
  * Because the rcu_nodes form a tree, the tree traversal locking will observe
-- 
2.5.5


* [PATCH 2/4 v3] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h
  2016-08-28 14:41     ` kbuild test robot
@ 2016-08-28 17:43       ` Manfred Spraul
  0 siblings, 0 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 17:43 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

v3: If smp_mb__after_unlock_lock() is in barrier.h, then on arm64
kernel/rcu/tree.c doesn't compile, because barrier.h is not included
in kernel/rcu/tree.c.

(v2 was: add example from Paul, something that can happen on real HW)

spin_unlock() + spin_lock() together do not form a full memory barrier:
(everything initialized to 0)

CPU1:
  a=1;
  spin_unlock(&b);
  spin_lock(&c);
+ smp_mb__after_unlock_lock();
  r1=d;

CPU2:
  d=1;
  smp_mb();
  r2=a;

Without the smp_mb__after_unlock_lock(), r1==0 && r2==0 would
be possible.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/linux/spinlock.h | 16 ++++++++++++++++
 kernel/rcu/tree.h        | 12 ------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d79000e..5075c88 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -142,6 +142,22 @@ do {								\
 #define smp_mb__after_spin_lock()	smp_mb()
 #endif
 
+#ifndef smp_mb__after_unlock_lock
+/**
+ * smp_mb__after_unlock_lock() - Provide smp_mb() after unlock+lock
+ *
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a523..a0cd9ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
 /*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
-
-/*
  * Wrappers for the rcu_node::lock acquire and release.
  *
  * Because the rcu_nodes form a tree, the tree traversal locking will observe
-- 
2.5.5


* Re: [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h
  2016-08-28 13:43     ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Paul E. McKenney
  2016-08-28 16:31       ` Manfred Spraul
@ 2016-08-28 18:00       ` Manfred Spraul
  1 sibling, 0 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-08-28 18:00 UTC (permalink / raw)
  To: paulmck
  Cc: benh, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton,
	LKML, 1vier1, Davidlohr Bueso

On 08/28/2016 03:43 PM, Paul E. McKenney wrote:
>
>> Without the smp_mb__after_unlock_lock(), other CPUs can observe the
>> write to d without seeing the write to a.
>>
>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> With the upgraded commit log, I am OK with the patch below.
Done.
> However, others will probably want to see at least one use of
> smp_mb__after_unlock_lock() outside of RCU.
I would look at it from the other side:
There are at least half a dozen hardware/spinlock implementations that 
must support rcu.
And for half a dozen implementations, a global header file makes sense, 
regardless of the number of users.

With this in the global header file
 > #ifndef complex_memory_barrier
 > #define complex_memory_barrier()    always_safe_fallback()
 > #endif

it is easier for the architectures to support rcu, ipc/sem and nf_conntrack.
Especially if everything is (as it is now) in <linux/spinlock.h>

--
     Manfred


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-28 11:56 ` [PATCH 1/4] spinlock: Document memory barrier rules Manfred Spraul
  2016-08-28 11:56   ` [PATCH 2/4] barrier.h: Move smp_mb__after_unlock_lock to barrier.h Manfred Spraul
@ 2016-08-29 10:48   ` Peter Zijlstra
  2016-08-29 12:54     ` Manfred Spraul
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-29 10:48 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On Sun, Aug 28, 2016 at 01:56:13PM +0200, Manfred Spraul wrote:
> Right now, the spinlock machinery tries to guarantee barriers even for
> unorthodox locking cases, which ends up as a constant stream of updates
> as the architectures try to support new unorthodox ideas.
> 
> The patch proposes to reverse that:
> spin_lock is ACQUIRE, spin_unlock is RELEASE.
> spin_unlock_wait is also ACQUIRE.
> Code that needs further guarantees must use appropriate explicit barriers.
> 
> Architectures that can implement some barriers for free can define the
> barriers as NOPs.
> 
> As the initial step, the patch converts ipc/sem.c to the new defines:
> - no more smp_rmb() after spin_unlock_wait(), that is part of
>   spin_unlock_wait()
> - smp_mb__after_spin_lock() instead of a direct smp_mb().
> 

Why? This does not explain why..


* Re: [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers.
  2016-08-28 11:56     ` [PATCH 3/4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
  2016-08-28 11:56       ` [PATCH 4/4] qspinlock for x86: smp_mb__after_spin_lock() is free Manfred Spraul
@ 2016-08-29 10:51       ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-29 10:51 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On Sun, Aug 28, 2016 at 01:56:15PM +0200, Manfred Spraul wrote:
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -80,20 +80,29 @@ static __read_mostly bool nf_conntrack_locks_all;
>  
>  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>  {
> +	/* Step 1: Acquire the lock */
>  	spin_lock(lock);
> -	while (unlikely(nf_conntrack_locks_all)) {
> -		spin_unlock(lock);
>  
> -		/*
> -		 * Order the 'nf_conntrack_locks_all' load vs. the
> -		 * spin_unlock_wait() loads below, to ensure
> -		 * that 'nf_conntrack_locks_all_lock' is indeed held:
> -		 */
> -		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> -		spin_unlock_wait(&nf_conntrack_locks_all_lock);
> -		spin_lock(lock);
> -	}
> +	/* Step 2: make it visible to all CPUs that we hold the lock */
> +	smp_mb__after_spin_lock();

I hate this comment. A barrier does _not_ make anything visible.

A barrier _orders_ things.


* Re: [PATCH 4/4] qspinlock for x86: smp_mb__after_spin_lock() is free
  2016-08-28 11:56       ` [PATCH 4/4] qspinlock for x86: smp_mb__after_spin_lock() is free Manfred Spraul
@ 2016-08-29 10:52         ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-29 10:52 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On Sun, Aug 28, 2016 at 01:56:16PM +0200, Manfred Spraul wrote:
> For x86 qspinlocks, no additional memory barrier is required in
> smp_mb__after_spin_lock:
> 
> Theoretically, for qspinlock we could define two barriers:
> - smp_mb__after_spin_lock: Free for x86, not free for powerpc
> - smp_mb__between_spin_lock_and_spin_unlock_wait():
> 	Free for all archs, see queued_spin_unlock_wait for details.
> 
> As smp_mb__between_spin_lock_and_spin_unlock_wait() is not used
> in any hotpaths, the patch does not create that define yet.
> 
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
>  arch/x86/include/asm/qspinlock.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index eaba080..da06433 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -61,6 +61,17 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>  }
>  #endif /* CONFIG_PARAVIRT */
>  
> +#ifndef smp_mb__after_spin_lock
> +/**
> + * smp_mb__after_spin_lock() - Provide smp_mb() after spin_lock
> + *
> + * queued_spin_lock() provides full memory barriers semantics,
> + * thus no further memory barrier is required. See
> + * queued_spin_unlock_wait() for further details.
> + */
> +#define smp_mb__after_spin_lock()	barrier()
> +#endif


I don't get this barrier, and from my understanding this isn't
correct.

Please explain more.


* Re: [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock
  2016-08-28 11:56 [PATCH 0/4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
  2016-08-28 11:56 ` [PATCH 1/4] spinlock: Document memory barrier rules Manfred Spraul
@ 2016-08-29 10:53 ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-29 10:53 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso



Please use --no-chain-reply-to.


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-29 10:48   ` [PATCH 1/4] spinlock: Document memory barrier rules Peter Zijlstra
@ 2016-08-29 12:54     ` Manfred Spraul
  2016-08-29 13:44       ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2016-08-29 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

Hi Peter,

On 08/29/2016 12:48 PM, Peter Zijlstra wrote:
> On Sun, Aug 28, 2016 at 01:56:13PM +0200, Manfred Spraul wrote:
>> Right now, the spinlock machinery tries to guarantee barriers even for
>> unorthodox locking cases, which ends up as a constant stream of updates
>> as the architectures try to support new unorthodox ideas.
>>
>> The patch proposes to reverse that:
>> spin_lock is ACQUIRE, spin_unlock is RELEASE.
>> spin_unlock_wait is also ACQUIRE.
>> Code that needs further guarantees must use appropriate explicit barriers.
>>
>> Architectures that can implement some barriers for free can define the
>> barriers as NOPs.
>>
>> As the initial step, the patch converts ipc/sem.c to the new defines:
>> - no more smp_rmb() after spin_unlock_wait(), that is part of
>>    spin_unlock_wait()
>> - smp_mb__after_spin_lock() instead of a direct smp_mb().
>>
> Why? This does not explain why..

Which explanation is missing?

- removal of the smp_rmb() after spin_unlock_wait()?

What about:

> - With commit 2c6100227116
>   ("locking/qspinlock: Fix spin_unlock_wait() some more"),
>   (and the commits for the other archs), spin_unlock_wait() is an
>   ACQUIRE.
>   Therefore the smp_rmb() after spin_unlock_wait() can be removed.
> - smp_mb__after_spin_lock() instead of a direct smp_mb().
>   This allows that architectures override it with a less expensive
>   barrier if this is sufficient for their hardware.
- Why is smp_mb() required after spin_lock()? See patch 2, where I added
the race that exists on real hardware.

Exactly the same issue exists for sem.c

- Why introduce a smp_mb__after_spin_lock()?

The other options would be:
- same as RCU, i.e. add CONFIG_PPC into sem.c and nf_conntrack_core.c
- overhead for all archs by adding an unconditional smp_mb()

--
     Manfred


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-29 12:54     ` Manfred Spraul
@ 2016-08-29 13:44       ` Peter Zijlstra
  2016-08-31  4:59         ` Manfred Spraul
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-29 13:44 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On Mon, Aug 29, 2016 at 02:54:54PM +0200, Manfred Spraul wrote:
> Hi Peter,
> 
> On 08/29/2016 12:48 PM, Peter Zijlstra wrote:
> >On Sun, Aug 28, 2016 at 01:56:13PM +0200, Manfred Spraul wrote:
> >>Right now, the spinlock machinery tries to guarantee barriers even for
> >>unorthodox locking cases, which ends up as a constant stream of updates
> >>as the architectures try to support new unorthodox ideas.
> >>
> >>The patch proposes to reverse that:
> >>spin_lock is ACQUIRE, spin_unlock is RELEASE.
> >>spin_unlock_wait is also ACQUIRE.
> >>Code that needs further guarantees must use appropriate explicit barriers.
> >>
> >>Architectures that can implement some barriers for free can define the
> >>barriers as NOPs.
> >>
> >>As the initial step, the patch converts ipc/sem.c to the new defines:
> >>- no more smp_rmb() after spin_unlock_wait(), that is part of
> >>   spin_unlock_wait()
> >>- smp_mb__after_spin_lock() instead of a direct smp_mb().
> >>
> >Why? This does not explain why..
> 
> Which explanation is missing?
> 
> - removal of the smb_rmb() after spin_unlock_wait?

So that should have been a separate patch. This thing doing two things
is wrong too. But no, this I get. I did make spin_unlock_wait() an
ACQUIRE after all.

> - Why smp_mb is required after spin_lock? See Patch 02, I added the race
> that exists on real hardware.
> 
> Exactly the same issue exists for sem.c
> 
> - Why introduce a smp_mb__after_spin_lock()?
> 
> The other options would be:
> - same as RCU, i.e. add CONFIG_PPC into sem.c and nf_contrack_core.c
> - overhead for all archs by added an unconditional smp_mb()

See, this too doesn't adequately explain the situation, since it all
refers to other sources.

If you add a barrier, the Changelog had better be clear. And I'm still
not entirely sure I get what exactly this barrier should do, nor why it
defaults to a full smp_mb. If what I suspect it should do, only PPC and
ARM64 need the barrier.

And x86 doesn't need it -- _however_ it would need it if you require
full smp_mb semantics, which I suspect you don't.

Which brings us back to a very poor definition of what this barrier
should be doing.


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-29 13:44       ` Peter Zijlstra
@ 2016-08-31  4:59         ` Manfred Spraul
  2016-08-31 15:40           ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2016-08-31  4:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On 08/29/2016 03:44 PM, Peter Zijlstra wrote:
>
> If you add a barrier, the Changelog had better be clear. And I'm still
> not entirely sure I get what exactly this barrier should do, nor why it
> defaults to a full smp_mb. If what I suspect it should do, only PPC and
> ARM64 need the barrier.
The barrier must ensure that taking the spinlock (as observed by another 
cpu with spin_unlock_wait()) and a following read are ordered.

start condition: sma->complex_mode = false;

CPU 1:
     spin_lock(&sem->lock); /* sem_nsems instances */
     smp_mb__after_spin_lock();
     if (!smp_load_acquire(&sma->complex_mode)) {
         /* fast path successful! */
         return sops->sem_num;
     }
      /* slow path, not relevant */

CPU 2: (holding sma->sem_perm.lock)

         smp_store_mb(sma->complex_mode, true);

         for (i = 0; i < sma->sem_nsems; i++) {
                 spin_unlock_wait(&sma->sem_base[i].lock);
         }

It must not happen that both CPUs proceed:
Either CPU1 proceeds, then CPU2 must spin in spin_unlock_wait()
or CPU2 proceeds, then CPU1 must enter the slow path.

What about this?
/*
  * spin_lock() provides ACQUIRE semantics regarding reading the lock.
  * There are no guarantees that the store of the lock is visible before
  * any read or write operation within the protected area is performed.
  * If the store of the lock must happen first, this function is required.
  */
#define spin_lock_store_acquire()

I would update the patch series.

--
     Manfred


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-31  4:59         ` Manfred Spraul
@ 2016-08-31 15:40           ` Peter Zijlstra
  2016-08-31 16:40             ` Will Deacon
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-08-31 15:40 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso, Will Deacon

On Wed, Aug 31, 2016 at 06:59:07AM +0200, Manfred Spraul wrote:

> The barrier must ensure that taking the spinlock (as observed by another cpu
> with spin_unlock_wait()) and a following read are ordered.
> 
> start condition: sma->complex_mode = false;
> 
> CPU 1:
>     spin_lock(&sem->lock); /* sem_nsems instances */
>     smp_mb__after_spin_lock();
>     if (!smp_load_acquire(&sma->complex_mode)) {
>         /* fast path successful! */
>         return sops->sem_num;
>     }
>      /* slow path, not relevant */
> 
> CPU 2: (holding sma->sem_perm.lock)
> 
>         smp_store_mb(sma->complex_mode, true);
> 
>         for (i = 0; i < sma->sem_nsems; i++) {
>                 spin_unlock_wait(&sma->sem_base[i].lock);
>         }
> 
> It must not happen that both CPUs proceed:
> Either CPU1 proceeds, then CPU2 must spin in spin_unlock_wait()
> or CPU2 proceeds, then CPU1 must enter the slow path.
> 
> What about this?
> /*
>  * spin_lock() provides ACQUIRE semantics regarding reading the lock.
>  * There are no guarantees that the store of the lock is visible before
>  * any read or write operation within the protected area is performed.
>  * If the store of the lock must happen first, this function is required.
>  */
> #define spin_lock_store_acquire()

So I think the fundamental problem is with our atomic_*_acquire()
primitives, where we've specified that the ACQUIRE only pertains to the
LOAD of the RmW.

The spinlock implementations suffer this problem mostly because of
that (not 100% accurate but close enough).

One solution would be to simply use smp_mb__after_atomic(). The
'problem' with that is __atomic_op_acquire() defaults to using that, so
the archs that use __atomic_op_acquire() will get a double smp_mb()
(arm64 and powerpc do not use __atomic_op_acquire()).
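
As a sketch only (not a proposed patch), that option would boil down to
something like:

	#ifndef smp_mb__after_spin_lock
	/* order the lock store against the loads/stores that follow it */
	#define smp_mb__after_spin_lock()	smp_mb__after_atomic()
	#endif

with the double-barrier caveat above for the archs whose acquire atomics
are built via __atomic_op_acquire().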

I'm not sure we want to introduce a new primitive for this specific to
spinlocks.

Will, any opinions?


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-31 15:40           ` Peter Zijlstra
@ 2016-08-31 16:40             ` Will Deacon
  2016-08-31 18:32               ` Manfred Spraul
  0 siblings, 1 reply; 25+ messages in thread
From: Will Deacon @ 2016-08-31 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Manfred Spraul, benh, paulmck, Ingo Molnar, Boqun Feng,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

On Wed, Aug 31, 2016 at 05:40:49PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 06:59:07AM +0200, Manfred Spraul wrote:
> 
> > The barrier must ensure that taking the spinlock (as observed by another cpu
> > with spin_unlock_wait()) and a following read are ordered.
> > 
> > start condition: sma->complex_mode = false;
> > 
> > CPU 1:
> >     spin_lock(&sem->lock); /* sem_nsems instances */
> >     smp_mb__after_spin_lock();
> >     if (!smp_load_acquire(&sma->complex_mode)) {
> >         /* fast path successful! */
> >         return sops->sem_num;
> >     }
> >      /* slow path, not relevant */
> > 
> > CPU 2: (holding sma->sem_perm.lock)
> > 
> >         smp_store_mb(sma->complex_mode, true);
> > 
> >         for (i = 0; i < sma->sem_nsems; i++) {
> >                 spin_unlock_wait(&sma->sem_base[i].lock);
> >         }

I'm struggling with this example. We have these locks:

  &sem->lock
  &sma->sem_base[0...sma->sem_nsems].lock
  &sma->sem_perm.lock

a condition variable:

  sma->complex_mode

and a new barrier:

  smp_mb__after_spin_lock()

For simplicity, we can make sma->sem_nsems == 1, and have &sma->sem_base[0]
be &sem->lock in the example above. &sma->sem_perm.lock seems to be
irrelevant.

The litmus test then looks a bit like:

CPUm:

LOCK(x)
smp_mb();
RyAcq=0


CPUn:

Wy=1
smp_mb();
UNLOCK_WAIT(x)


which I think can be simplified to:


LOCK(x)
Ry=0

Wy=1
smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
LOCK(x)   // spin_unlock_wait behaves like lock; unlock
UNLOCK(x)


[I've removed a bunch of barriers here, that I don't think are necessary
 for the guarantees you're after]

and the question is "Can both CPUs proceed?".

Looking at the above, then I don't think that they can. Whilst CPUm can
indeed speculate the Ry=0 before successfully taking the lock, if CPUn
observes CPUm's read, then it must also observe the lock being held wrt
the spin_lock API. That is because a successful LOCK operation by CPUn
would force CPUm to replay its LL/SC loop and therefore discard its
speculation of y.

What am I missing? The code snippet seems to have too many barriers to me!

Will


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-31 16:40             ` Will Deacon
@ 2016-08-31 18:32               ` Manfred Spraul
  2016-09-01  8:44                 ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2016-08-31 18:32 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Andrew Morton, LKML,
	1vier1, Davidlohr Bueso

On 08/31/2016 06:40 PM, Will Deacon wrote:
>
> I'm struggling with this example. We have these locks:
>
>    &sem->lock
>    &sma->sem_base[0...sma->sem_nsems].lock
>    &sma->sem_perm.lock
>
> a condition variable:
>
>    sma->complex_mode
>
> and a new barrier:
>
>    smp_mb__after_spin_lock()
>
> For simplicity, we can make sma->sem_nsems == 1, and have &sma->sem_base[0]
> be &sem->lock in the example above.
Correct.
>   &sma->sem_perm.lock seems to be
> irrelevant.
Correct.
> The litmus test then looks a bit like:
>
> CPUm:
>
> LOCK(x)
> smp_mb();
> RyAcq=0
>
>
> CPUn:
>
> Wy=1
> smp_mb();
> UNLOCK_WAIT(x)
Correct.
>
> which I think can be simplified to:
>
>
> LOCK(x)
I thought that here a barrier is required, because Ry=0 could be before 
store of the lock.
> Ry=0
RyAcq instead of Ry would be required due to the unlock at the end of the
critical section on CPUn:
           <...>
           WyRelease=0
but that is irrelevant for the litmus test.
> Wy=1
> smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
> LOCK(x)   // spin_unlock_wait behaves like lock; unlock
> UNLOCK(x)

> [I've removed a bunch of barriers here, that I don't think are necessary
>   for the guarantees you're after]
>
> and the question is "Can both CPUs proceed?".
>
> Looking at the above, then I don't think that they can. Whilst CPUm can
> indeed speculate the Ry=0 before successfully taking the lock, if CPUn
> observes CPUm's read, then it must also observe the lock being held wrt
> the spin_lock API. That is because a successful LOCK operation by CPUn
> would force CPUm to replay its LL/SC loop and therefore discard its
> speculation of y.
>
> What am I missing? The code snippet seems to have too many barriers to me!
spin_unlock_wait() is not necessarily lock()+unlock().
It can be a simple Rx, or now RxAcq.

So I had assumed:

CPUm:

LOCK(x)
smp_mb(); /* at least for PPC, therefore with arch override */
RyAcq=0


CPUn:

Wy=1
smp_mb(); /* at least for archs where UNLOCK_WAIT is RxAcq */
UNLOCK_WAIT(x)
smp_rmb(); /* not required anymore, was required when UNLOCK_WAIT was Rx */


--
     Manfred


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-08-31 18:32               ` Manfred Spraul
@ 2016-09-01  8:44                 ` Peter Zijlstra
  2016-09-01 11:04                   ` Manfred Spraul
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-01  8:44 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Will Deacon, benh, paulmck, Ingo Molnar, Boqun Feng,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote:
> On 08/31/2016 06:40 PM, Will Deacon wrote:

> >The litmus test then looks a bit like:
> >
> >CPUm:
> >
> >LOCK(x)
> >smp_mb();
> >RyAcq=0
> >
> >
> >CPUn:
> >
> >Wy=1
> >smp_mb();
> >UNLOCK_WAIT(x)
> Correct.
> >
> >which I think can be simplified to:
> >
> >
> >LOCK(x)
> I thought that here a barrier is required, because Ry=0 could be before
> store of the lock.
> >Ry=0
> RyAcq instead of Ry would required due to the unlock at the end of the
> critical section
> CpuN: <...>
>           WyRelease=0
> for the litmus test irrelevant.
> >Wy=1
> >smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
> >LOCK(x)   // spin_unlock_wait behaves like lock; unlock
> >UNLOCK(x)
> 
> >[I've removed a bunch of barriers here, that I don't think are necessary
> >  for the guarantees you're after]
> >
> >and the question is "Can both CPUs proceed?".
> >
> >Looking at the above, then I don't think that they can. Whilst CPUm can
> >indeed speculate the Ry=0 before successfully taking the lock, if CPUn
> >observes CPUm's read, then it must also observe the lock being held wrt
> >the spin_lock API. That is because a successful LOCK operation by CPUn
> >would force CPUm to replay its LL/SC loop and therefore discard its
> >speculation of y.
> >
> >What am I missing? The code snippet seems to have too many barriers to me!

> spin_unlock_wait() is not necessarily lock()+unlock().
> It can be a simple Rx, or now RxAcq.

Can be, normally, yes. But on power and arm64, the only architectures on
which the ACQUIRE is 'funny', they do the 'pointless' ll/sc cycle in
spin_unlock_wait() to 'fix' things.

So for both power and arm64, you can in fact model spin_unlock_wait()
as LOCK+UNLOCK.
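
IOW, purely for reasoning about ordering on those two archs, you can
treat it as:

	/* conceptual model only, not the actual implementation */
	static inline void spin_unlock_wait(spinlock_t *lock)
	{
		spin_lock(lock);	/* orders against any prior lock holder */
		spin_unlock(lock);
	}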

All the other archs have (so far) 'sensible' ACQUIRE semantics and all
this is moot.

[ MIPS _could_ possibly do the 'interesting' ACQUIRE too, but so far
  hasn't introduced their fancy barriers. ]

The other interesting case is qspinlock, which does the unordered store
in software, but that is after a necessary atomic (ACQUIRE) operation to
enqueue, which we exploit in the queued_spin_unlock_wait() to order
against. So that too is good, assuming the ACQUIRE is good.

Once Power/ARM64 use qspinlock, they'll need to be careful again.


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-09-01  8:44                 ` Peter Zijlstra
@ 2016-09-01 11:04                   ` Manfred Spraul
  2016-09-01 11:19                     ` Will Deacon
  2016-09-01 11:51                     ` Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Manfred Spraul @ 2016-09-01 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, benh, paulmck, Ingo Molnar, Boqun Feng,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

Hi,

On 09/01/2016 10:44 AM, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote:
>> On 08/31/2016 06:40 PM, Will Deacon wrote:
>>> The litmus test then looks a bit like:
>>>
>>> CPUm:
>>>
>>> LOCK(x)
>>> smp_mb();
>>> RyAcq=0
>>>
>>>
>>> CPUn:
>>>
>>> Wy=1
>>> smp_mb();
>>> UNLOCK_WAIT(x)
>> Correct.
>>> which I think can be simplified to:
>>>
>>>
>>> LOCK(x)
>> I thought that here a barrier is required, because Ry=0 could be before
>> store of the lock.
>>> Ry=0
>> RyAcq instead of Ry would required due to the unlock at the end of the
>> critical section
>> CpuN: <...>
>>            WyRelease=0
>> for the litmus test irrelevant.
>>> Wy=1
>>> smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
>>> LOCK(x)   // spin_unlock_wait behaves like lock; unlock
>>> UNLOCK(x)
>>> [I've removed a bunch of barriers here, that I don't think are necessary
>>>   for the guarantees you're after]
>>>
>>> and the question is "Can both CPUs proceed?".
>>>
>>> Looking at the above, then I don't think that they can. Whilst CPUm can
>>> indeed speculate the Ry=0 before successfully taking the lock, if CPUn
>>> observes CPUm's read, then it must also observe the lock being held wrt
>>> the spin_lock API. That is because a successful LOCK operation by CPUn
>>> would force CPUm to replay its LL/SC loop and therefore discard its
>>> speculation of y.
>>>
>>> What am I missing? The code snippet seems to have too many barriers to me!
>> spin_unlock_wait() is not necessarily lock()+unlock().
>> It can be a simple Rx, or now RxAcq.
> Can be, normally, yes. But on power and arm64, the only architectures on
> which the ACQUIRE is 'funny' they do the 'pointless' ll/sc cycle in
> spin_unlock_wait() to 'fix' things.
>
> So for both power and arm64, you can in fact model spin_unlock_wait()
> as LOCK+UNLOCK.
Is this consensus?

If I understand it right, the rules are:
1. spin_unlock_wait() must behave like spin_lock();spin_unlock();
2. spin_is_locked() must behave like spin_trylock() ? spin_unlock(),TRUE 
: FALSE
3. the ACQUIRE during spin_lock applies to the lock load, not to the store.

sem.c and nf_conntrack.c need only rule 1 now, but I would document the 
rest as well, ok?

I'll update the patches.

--
     Manfred


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-09-01 11:04                   ` Manfred Spraul
@ 2016-09-01 11:19                     ` Will Deacon
  2016-09-01 11:51                     ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Will Deacon @ 2016-09-01 11:19 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Peter Zijlstra, benh, paulmck, Ingo Molnar, Boqun Feng,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

On Thu, Sep 01, 2016 at 01:04:26PM +0200, Manfred Spraul wrote:
> If I understand it right, the rules are:
> 1. spin_unlock_wait() must behave like spin_lock();spin_unlock();
> 2. spin_is_locked() must behave like spin_trylock() ? spin_unlock(),TRUE :
> FALSE

I don't think spin_is_locked is as strong as all that. On arm64 and ppc,
it's just smp_mb(); followed by a check on the lock value. It can't be
used for the same sorts of inter-CPU synchronisation that spin_unlock_wait
provides.

> 3. the ACQUIRE during spin_lock applies to the lock load, not to the store.

Correct. This is already documented for things like cmpxchg.

> sem.c and nf_conntrack.c need only rule 1 now, but I would document the rest
> as well, ok?
> 
> I'll update the patches.

Please CC me!

Will


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-09-01 11:04                   ` Manfred Spraul
  2016-09-01 11:19                     ` Will Deacon
@ 2016-09-01 11:51                     ` Peter Zijlstra
  2016-09-01 14:05                       ` Boqun Feng
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2016-09-01 11:51 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Will Deacon, benh, paulmck, Ingo Molnar, Boqun Feng,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

On Thu, Sep 01, 2016 at 01:04:26PM +0200, Manfred Spraul wrote:

> >So for both power and arm64, you can in fact model spin_unlock_wait()
> >as LOCK+UNLOCK.

> Is this consensus?

Dunno, but it was done to fix your earlier locking scheme and both
architectures where it matters have done so.

So I suppose that could be taken as consensus ;-)

> If I understand it right, the rules are:
> 1. spin_unlock_wait() must behave like spin_lock();spin_unlock();

From a barrier perspective, yes I think so. Ideally the implementation
would avoid stores (which was the entire point of introducing that
primitive IIRC) if at all possible (not possible on ARM64/Power).

> 2. spin_is_locked() must behave like spin_trylock() ? spin_unlock(),TRUE :
> FALSE

Not sure on this one, That might be consistent, but I don't see the
ll/sc-nop in there. Will?

> 3. the ACQUIRE during spin_lock applies to the lock load, not to the store.

I think we can state that ACQUIRE on _any_ atomic only applies to the
LOAD not the STORE.

And we're waiting for that to bite us again before trying to deal with
it in a more generic manner; for now only the spinlock implementations
(specifically spin_unlock_wait) deal with it.


Will, Boqun, did I get that right?


* Re: [PATCH 1/4] spinlock: Document memory barrier rules
  2016-09-01 11:51                     ` Peter Zijlstra
@ 2016-09-01 14:05                       ` Boqun Feng
  0 siblings, 0 replies; 25+ messages in thread
From: Boqun Feng @ 2016-09-01 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Manfred Spraul, Will Deacon, benh, paulmck, Ingo Molnar,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso

On Thu, Sep 01, 2016 at 01:51:34PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 01, 2016 at 01:04:26PM +0200, Manfred Spraul wrote:
> 
> > >So for both power and arm64, you can in fact model spin_unlock_wait()
> > >as LOCK+UNLOCK.
> 
> > Is this consensus?
> 
> Dunno, but it was done to fix your earlier locking scheme and both
> architectures where it matters have done so.
> 
> So I suppose that could be taken as consensus ;-)
> 
> > If I understand it right, the rules are:
> > 1. spin_unlock_wait() must behave like spin_lock();spin_unlock();
> 
> From a barrier perspective, yes I think so. Ideally the implementation
> would avoid stores (which was the entire point of introducing that
> primitive IIRC) if at all possible (not possible on ARM64/Power).
> 
> > 2. spin_is_locked() must behave like spin_trylock() ? spin_unlock(),TRUE :
> > FALSE
> 
> Not sure on this one, That might be consistent, but I don't see the
> ll/sc-nop in there. Will?
> 

My understanding is, as Will stated, that we don't provide this strong
guarantee for spin_is_locked(). The reason is mostly that all(?)
uses of spin_is_locked() are not for correctness but for other purposes
like debug output.

> > 3. the ACQUIRE during spin_lock applies to the lock load, not to the store.
> 
> I think we can state that ACQUIRE on _any_ atomic only applies to the
> LOAD not the STORE.
> 
> And we're waiting for that to bite us again before trying to deal with

;-)

> it in a more generic manner; for now only the spinlock implementations
> (specifically spin_unlock_wait) deal with it.
> 

I think the hope is that, with herd or other tools, and a formal order
model, we can make more people understand this "counter-intuitive"
behavior and help them write correct and efficient code ;-)

> 
> Will, Boqun, did I get that right?
> 

Yep ;-)

Regards,
Boqun


