* [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock
@ 2016-08-31 13:42 Manfred Spraul
  2016-08-31 13:42 ` [PATCH 1/5] ipc/sem.c: Remove smp_rmb() from complexmode_enter() Manfred Spraul
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Hi,

V5: Major restructuring based on input from Peter and Davidlohr.

As discussed before:
If a high-scalability locking scheme is built with multiple
spinlocks, then often additional memory barriers are required.
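
As an illustration (a minimal sketch, not taken from any of the patches
below; all names are made up), the typical pattern looks like this:

	spinlock_t object_lock;		/* fine-grained, per-object lock */
	spinlock_t global_lock;		/* taken rarely, for complex operations */
	bool       use_global_mode;	/* set while global_lock is required */

	void lock_object_fastpath(void)
	{
		spin_lock(&object_lock);

		/*
		 * The ACQUIRE of spin_lock() only covers the load of the
		 * lock.  The store that marks object_lock as taken may not
		 * yet be visible to other CPUs when use_global_mode is
		 * read, so an additional barrier is needed here (smp_mb()
		 * today, spinlock_store_acquire() after patch 3).
		 */
		smp_mb();

		if (!READ_ONCE(use_global_mode))
			return;		/* fast path: per-object lock only */

		/* slow path: fall back to global_lock (not shown) */
	}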

The documentation was not as clear as it could be, and memory
barriers were either missing or superfluous in the implementation.

Patch 1: sem.c: Remove the smp_rmb() after spin_unlock_wait().
Patch 2: Documentation
Patch 3: Add spinlock_store_acquire(), update ipc/sem.c
Patch 4: Move smp_mb__after_unlock_lock to <linux/spinlock.h>
Patch 5: Fix memory ordering for nf_conntrack

The patches are safe for all architectures; the default is smp_mb().

Patch 5 is larger than strictly required: it rewrites the conntrack
logic using the approach from ipc/sem.c. I think the new code is
simpler and more realtime-friendly.

@Peter: A hint on whether qspinlocks can omit the smp_mb()
(everywhere, or only on x86) would be appreciated.
The comments on top of queued_spin_unlock_wait() had convinced
me that no additional barrier is required.
But the same comments also convinced me that on x86 no further
barrier would be required for a full smp_mb() - and there you
wrote that this is wrong.

@Andrew: The patches are relative to mmots.
Could you include them in your tree, with the aim of getting them
into linux-next?

--
	Manfred


* [PATCH 1/5] ipc/sem.c: Remove smp_rmb() from complexmode_enter()
  2016-08-31 13:42 [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
@ 2016-08-31 13:42 ` Manfred Spraul
  2016-08-31 13:42 ` [PATCH 2/5] spinlock: Document memory barrier rules for spin_lock and spin_unlock() Manfred Spraul
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

complexmode_enter() contains an smp_rmb() after spin_unlock_wait().
This was done to allow safe backporting.

With commit 2c6100227116
("locking/qspinlock: Fix spin_unlock_wait() some more") and the
corresponding commits for the other architectures, spin_unlock_wait()
is an ACQUIRE.
Therefore the smp_rmb() after spin_unlock_wait() can be removed.
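
For reference, a condensed sketch of the pairing this relies on
(simplified; the real code is in ipc/sem.c):

	/* simple op, holder of the per-semaphore lock: */
	spin_lock(&sem->lock);		/* ACQUIRE */
	/* ... access the semaphore ... */
	spin_unlock(&sem->lock);	/* RELEASE */

	/* complexmode_enter(), waiting for all simple ops to drain: */
	for (i = 0; i < sma->sem_nsems; i++) {
		sem = sma->sem_base + i;
		spin_unlock_wait(&sem->lock);
	}
	/*
	 * With spin_unlock_wait() acting as an ACQUIRE, everything the
	 * holder did before its spin_unlock() is visible here, so no
	 * further smp_rmb() is needed.
	 */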

Not for stable!

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 ipc/sem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 5e318c5..6586e0a 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -290,14 +290,6 @@ static void complexmode_enter(struct sem_array *sma)
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
 	}
-	/*
-	 * spin_unlock_wait() is not a memory barriers, it is only a
-	 * control barrier. The code must pair with spin_unlock(&sem->lock),
-	 * thus just the control barrier is insufficient.
-	 *
-	 * smp_rmb() is sufficient, as writes cannot pass the control barrier.
-	 */
-	smp_rmb();
 }
 
 /*
-- 
2.7.4


* [PATCH 2/5] spinlock: Document memory barrier rules for spin_lock and spin_unlock().
  2016-08-31 13:42 [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
  2016-08-31 13:42 ` [PATCH 1/5] ipc/sem.c: Remove smp_rmb() from complexmode_enter() Manfred Spraul
@ 2016-08-31 13:42 ` Manfred Spraul
  2016-08-31 13:42 ` [PATCH 3/5] spinlock: define spinlock_store_acquire Manfred Spraul
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

In theory, the memory ordering rules for spin_lock() and spin_unlock()
are simple: spin_lock() is an ACQUIRE, spin_unlock() is a RELEASE.

What is missing is clear documentation of the corner cases:
- What is covered by the ACQUIRE during spin_lock(): only the lock
  load, or also the lock store? (See the sketch below.)
- spin_unlock_wait() is an ACQUIRE.
- No memory ordering is enforced by spin_is_locked().

The patch adds this to Documentation/locking/spinlocks.txt.
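
As a litmus-style illustration of the first corner case (sketch only,
everything initialized to 0):

	CPU1:
	  spin_lock(&a);
	  r1 = y;
	  spin_unlock(&a);

	CPU2:
	  y = 1;
	  smp_mb();
	  r2 = spin_is_locked(&a);

Because the ACQUIRE of spin_lock() covers only the load of the lock,
the store that marks 'a' as locked may become visible after the read
of y; r1 == 0 && r2 == 0 is then possible on some architectures (this
is the scenario behind commit 51d7d5205d33 on powerpc).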

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 Documentation/locking/spinlocks.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/locking/spinlocks.txt b/Documentation/locking/spinlocks.txt
index ff35e40..0696d68 100644
--- a/Documentation/locking/spinlocks.txt
+++ b/Documentation/locking/spinlocks.txt
@@ -40,6 +40,15 @@ example, internal driver data structures that nobody else ever touches).
    touches a shared variable has to agree about the spinlock they want
    to use.
 
+   NOTE! Code that needs stricter memory barriers than ACQUIRE during
+   LOCK and RELEASE during UNLOCK must use appropriate memory barriers
+   such as smp_mb__before_spinlock(). In particular, the ACQUIRE during
+   LOCK only applies to reading the lock state: operations within the
+   guarded area can happen before the lock store is visible.
+   spin_unlock_wait() has ACQUIRE semantics.
+   spin_is_locked() is not a memory barrier; callers that use it for
+   locking must add appropriate barriers.
+
 ----
 
 Lesson 2: reader-writer spinlocks.
-- 
2.7.4


* [PATCH 3/5] spinlock: define spinlock_store_acquire
  2016-08-31 13:42 [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
  2016-08-31 13:42 ` [PATCH 1/5] ipc/sem.c: Remove smp_rmb() from complexmode_enter() Manfred Spraul
  2016-08-31 13:42 ` [PATCH 2/5] spinlock: Document memory barrier rules for spin_lock and spin_unlock() Manfred Spraul
@ 2016-08-31 13:42 ` Manfred Spraul
  2016-08-31 13:42 ` [PATCH 4/5] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h Manfred Spraul
  2016-08-31 13:42 ` [PATCH 5/5] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
  4 siblings, 0 replies; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

A spin_lock() is an ACQUIRE only with regard to reading the lock state.
The store of the lock may be postponed past the first operations
within the protected area, e.g. see commit 51d7d5205d33
("powerpc: Add smp_mb() to arch_spin_is_locked()").

The patch defines a new spinlock_store_acquire() primitive:
it guarantees that the lock store is ordered relative to the following
load/store operations. Adding the barrier into spin_is_locked()
does not help, as not everyone uses spin_is_locked() or
spin_unlock_wait().

The patch:
- adds a definition into <linux/spinlock.h> (as smp_mb(), which is
  safe for all architectures)
- converts ipc/sem.c to the new define.

For overriding, the same approach as for smp_mb__before_spinlock()
is used: if spinlock_store_acquire() is already defined, then it is
not changed.

The default is smp_mb(), to ensure that no architecture gets broken.
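
Condensed, the pairing in ipc/sem.c looks like this (sketch only; the
hunk below contains the fast-path side):

	/* sem_lock() fast path, simple operation: */
	spin_lock(&sem->lock);
	spinlock_store_acquire();	/* make the lock store visible */
	if (!smp_load_acquire(&sma->complex_mode)) {
		/* fast path: the per-semaphore lock is sufficient */
	}

	/* complexmode_enter(), complex operation: */
	smp_store_mb(sma->complex_mode, true);
	for (i = 0; i < sma->sem_nsems; i++)
		spin_unlock_wait(&sma->sem_base[i].lock);
	/* every simple op either sees complex_mode or has completed */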

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/linux/spinlock.h | 12 ++++++++++++
 ipc/sem.c                |  8 +-------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..496f288 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,6 +130,18 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
+#ifndef spinlock_store_acquire
+/**
+ * spinlock_store_acquire() - Provide acquire ordering after the lock store
+ *
+ * spin_lock() provides ACQUIRE semantics regarding reading the lock.
+ * There is no guarantee that the lock write is visible before any
+ * read or write operation within the protected area is performed.
+ * If the lock write must happen first, this function is required.
+ */
+#define spinlock_store_acquire()	smp_mb()
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/ipc/sem.c b/ipc/sem.c
index 6586e0a..49d0ae3 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -355,13 +355,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		 */
 		spin_lock(&sem->lock);
 
-		/*
-		 * See 51d7d5205d33
-		 * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
-		 * A full barrier is required: the write of sem->lock
-		 * must be visible before the read is executed
-		 */
-		smp_mb();
+		spinlock_store_acquire();
 
 		if (!smp_load_acquire(&sma->complex_mode)) {
 			/* fast path successful! */
-- 
2.7.4


* [PATCH 4/5] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h
  2016-08-31 13:42 [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
                   ` (2 preceding siblings ...)
  2016-08-31 13:42 ` [PATCH 3/5] spinlock: define spinlock_store_acquire Manfred Spraul
@ 2016-08-31 13:42 ` Manfred Spraul
  2016-08-31 13:42 ` [PATCH 5/5] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
  4 siblings, 0 replies; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

v3: If smp_mb__after_unlock_lock() is placed in barrier.h, then
kernel/rcu/tree.c does not compile on arm64, because barrier.h is
not included by kernel/rcu/tree.c.

(v2 was: add an example from Paul, something that can happen on real HW)

spin_unlock() + spin_lock() together do not form a full memory barrier:
(everything initialized to 0)

CPU1:
  a=1;
  spin_unlock(&b);
  spin_lock(&c);
+ smp_mb__after_unlock_lock();
  r1=d;

CPU2:
  d=1;
  smp_mb();
  r2=a;

Without the smp_mb__after_unlock_lock(), r1==0 && r2==0 would
be possible.
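
A sketch of the intended usage (the helper name and its caller are
made up; the rcu_node lock wrappers mentioned in the hunk below use a
similar pattern):

	static void example_lock_fully_ordered(spinlock_t *lock)
	{
		spin_lock(lock);
		/*
		 * Together with the spin_unlock() of the previous holder,
		 * this turns the UNLOCK+LOCK pair into a full barrier:
		 * smp_mb() on powerpc, a no-op elsewhere.
		 */
		smp_mb__after_unlock_lock();
	}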

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---
 include/linux/spinlock.h | 16 ++++++++++++++++
 kernel/rcu/tree.h        | 12 ------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 496f288..accdebb 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -142,6 +142,22 @@ do {								\
 #define spinlock_store_acquire()	smp_mb()
 #endif
 
+#ifndef smp_mb__after_unlock_lock
+/**
+ * smp_mb__after_unlock_lock() - Provide smp_mb() after unlock+lock
+ *
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a523..a0cd9ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
 /*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
-
-/*
  * Wrappers for the rcu_node::lock acquire and release.
  *
  * Because the rcu_nodes form a tree, the tree traversal locking will observe
-- 
2.7.4


* [PATCH 5/5] net/netfilter/nf_conntrack_core: update memory barriers.
  2016-08-31 13:42 [PATCH 0/5 V5] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
                   ` (3 preceding siblings ...)
  2016-08-31 13:42 ` [PATCH 4/5] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h Manfred Spraul
@ 2016-08-31 13:42 ` Manfred Spraul
  2016-08-31 14:15   ` Eric Dumazet
  4 siblings, 1 reply; 7+ messages in thread
From: Manfred Spraul @ 2016-08-31 13:42 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul, Pablo Neira Ayuso,
	netfilter-devel

As explained in commit 51d7d5205d33
("powerpc: Add smp_mb() to arch_spin_is_locked()"), for some architectures
the ACQUIRE during spin_lock() only applies to loading the lock, not to
storing the lock state.

nf_conntrack_lock() does not handle this correctly:
    /* 1) Acquire the lock */
    spin_lock(lock);
    while (unlikely(nf_conntrack_locks_all)) {
        spin_unlock(lock);

spinlock_store_acquire() is missing between spin_lock() and the read of
nf_conntrack_locks_all. In addition, reading nf_conntrack_locks_all
needs ACQUIRE memory ordering.

Second, minor issue: if there were many nf_conntrack_all_lock() callers,
then nf_conntrack_lock() could loop forever.

Therefore: change nf_conntrack_lock() and nf_conntrack_all_lock() to the
approach used by ipc/sem.c (sketched below):

- add spinlock_store_acquire()
- add smp_load_acquire()
- for nf_conntrack_lock(), use spin_lock(&global_lock) instead of
  spin_unlock_wait(&global_lock) and looping back.
- use smp_store_mb() instead of a raw smp_mb()
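
Condensed, the resulting pairing is (sketch only; the full diff is
below):

	/* nf_conntrack_lock(), per-bucket side: */
	spin_lock(lock);
	spinlock_store_acquire();	/* lock store visible before the load */
	if (likely(!smp_load_acquire(&nf_conntrack_locks_all)))
		return;			/* fast path */
	/* slow path: take and release nf_conntrack_locks_all_lock */

	/* nf_conntrack_all_lock(), global side: */
	spin_lock(&nf_conntrack_locks_all_lock);
	smp_store_mb(nf_conntrack_locks_all, true);
	for (i = 0; i < CONNTRACK_LOCKS; i++)
		spin_unlock_wait(&nf_conntrack_locks[i]);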

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org

---

Question: Should I split this patch?
First a patch that uses smp_mb(), with Cc: stable.
Then replace the smp_mb() with spinlock_store_acquire(), not for stable.

 net/netfilter/nf_conntrack_core.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7d90a5d..f840b0b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -79,20 +79,29 @@ static __read_mostly bool nf_conntrack_locks_all;
 
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
+	/* 1) Acquire the lock */
 	spin_lock(lock);
-	while (unlikely(nf_conntrack_locks_all)) {
-		spin_unlock(lock);
 
-		/*
-		 * Order the 'nf_conntrack_locks_all' load vs. the
-		 * spin_unlock_wait() loads below, to ensure
-		 * that 'nf_conntrack_locks_all_lock' is indeed held:
-		 */
-		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
-		spin_unlock_wait(&nf_conntrack_locks_all_lock);
-		spin_lock(lock);
-	}
+	/* 2) Order storing the lock and reading nf_conntrack_locks_all */
+	spinlock_store_acquire();
+
+	/* 3) read nf_conntrack_locks_all, with ACQUIRE semantics */
+	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
+		return;
+
+	/* fast path failed, unlock */
+	spin_unlock(lock);
+
+	/* Slow path 1) get global lock */
+	spin_lock(&nf_conntrack_locks_all_lock);
+
+	/* Slow path 2) get the lock we want */
+	spin_lock(lock);
+
+	/* Slow path 3) release the global lock */
+	spin_unlock(&nf_conntrack_locks_all_lock);
 }
+
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
 static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2)
@@ -132,15 +141,14 @@ static void nf_conntrack_all_lock(void)
 	int i;
 
 	spin_lock(&nf_conntrack_locks_all_lock);
-	nf_conntrack_locks_all = true;
 
 	/*
-	 * Order the above store of 'nf_conntrack_locks_all' against
+	 * Order the store of 'nf_conntrack_locks_all' against
 	 * the spin_unlock_wait() loads below, such that if
 	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
 	 * we must observe nf_conntrack_locks[] held:
 	 */
-	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
+	smp_store_mb(nf_conntrack_locks_all, true);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
 		spin_unlock_wait(&nf_conntrack_locks[i]);
-- 
2.7.4


* Re: [PATCH 5/5] net/netfilter/nf_conntrack_core: update memory barriers.
  2016-08-31 13:42 ` [PATCH 5/5] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
@ 2016-08-31 14:15   ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2016-08-31 14:15 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra,
	Andrew Morton, LKML, 1vier1, Davidlohr Bueso, Pablo Neira Ayuso,
	netfilter-devel

On Wed, 2016-08-31 at 15:42 +0200, Manfred Spraul wrote:
> As explained in commit 51d7d5205d33
> ("powerpc: Add smp_mb() to arch_spin_is_locked()"), for some architectures
> the ACQUIRE during spin_lock() only applies to loading the lock, not to
> storing the lock state.
> 
> nf_conntrack_lock() does not handle this correctly:
>     /* 1) Acquire the lock */
>     spin_lock(lock);
>     while (unlikely(nf_conntrack_locks_all)) {
>         spin_unlock(lock);
> 
> spinlock_store_acquire() is missing between spin_lock() and the read of
> nf_conntrack_locks_all. In addition, reading nf_conntrack_locks_all
> needs ACQUIRE memory ordering.
> 
> Second, minor issue: if there were many nf_conntrack_all_lock() callers,
> then nf_conntrack_lock() could loop forever.
> 
> Therefore: change nf_conntrack_lock() and nf_conntrack_all_lock() to the
> approach used by ipc/sem.c:
> 
> - add spinlock_store_acquire()
> - add smp_load_acquire()
> - for nf_conntrack_lock(), use spin_lock(&global_lock) instead of
>   spin_unlock_wait(&global_lock) and looping back.
> - use smp_store_mb() instead of a raw smp_mb()
> 
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: netfilter-devel@vger.kernel.org
> 
> ---
> 
> Question: Should I split this patch?
> First a patch that uses smp_mb(), with Cc: stable.
> Then replace the smp_mb() with spinlock_store_acquire(), not for stable.

I guess it all depends on which stable backports you believe are needed.

You probably should add the tags :
Fixes: <12-digit-sha1> "patch title"
that introduced the bug(s) you fix.

By doing this archaeological research you will likely have a better
answer ?

Thanks !


