* [PATCH 0/4 V4] Clarify/standardize memory barriers for lock/unlock
@ 2016-08-29 13:34 Manfred Spraul
  2016-08-29 13:34 ` [PATCH 1/4 v4] spinlock: Document memory barrier rules Manfred Spraul
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Manfred Spraul @ 2016-08-29 13:34 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Hi,

V4: Docu/comment improvements, remove unnecessary barrier for x86.
V3: Bugfix for arm64
V2: Include updated documentation for rcutree patch

As discussed before:
If a high-scalability locking scheme is built from multiple
spinlocks, additional memory barriers are often required.

The documentation was not as clear as it could be, and some memory
barriers in the implementation were missing or superfluous.
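
As a reminder, a condensed sketch of the pattern (an illustration in
the style of the patches, not code copied from them): the fast path
takes a per-object lock and then reads a global flag, while the slow
path sets the flag and then checks the per-object locks.

  CPU1 (fast path):
    spin_lock(&obj->lock);
    /* barrier needed here */
    r1 = global_flag;

  CPU2 (slow path):
    global_flag = true;
    /* barrier needed here */
    spin_unlock_wait(&obj->lock);

Without the barriers, CPU1 can read global_flag == false while CPU2's
spin_unlock_wait() fails to observe CPU1's lock as held, so both sides
would run their critical sections concurrently. The patches use
smp_mb__after_spin_lock() on the fast path and smp_mb()/smp_store_mb()
on the slow path to forbid that outcome.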

Patch 1: Documentation, define one standard barrier, update ipc/sem.c
Patch 2: Update rcutree
Patch 3: Update nf_conntrack
Patch 4: Update for qspinlock: smp_mb__after_spin_lock is free.

Patch 3 is larger than strictly required: it rewrites the conntrack
logic using the approach from ipc/sem.c. I think the new code is
simpler and more realtime-friendly.

Please review!

@Andrew: The patches are relative to mmots.
Could you include them in your tree, with the goal of getting them
into linux-next?

--
	Manfred

* [PATCH 1/4 v4] spinlock: Document memory barrier rules
  2016-08-29 13:34 [PATCH 0/4 V4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
@ 2016-08-29 13:34 ` Manfred Spraul
  2016-08-29 17:38   ` Davidlohr Bueso
  2016-08-29 13:34 ` [PATCH 2/4 V4] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h Manfred Spraul
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Manfred Spraul @ 2016-08-29 13:34 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Right now, the spinlock machinery tries to guarantee barriers even for
unorthodox locking cases, which ends up as a constant stream of updates
as the architectures try to support new unorthodox ideas.

The patch proposes to clarify the rules:
spin_lock is ACQUIRE, spin_unlock is RELEASE.
spin_unlock_wait is also ACQUIRE.
Code that needs further guarantees must use appropriate explicit barriers.

Architectures that can implement some barriers for free can define the
barriers as NOPs.

As the initial step, the patch converts ipc/sem.c to the new defines:
- With commit 2c6100227116
  ("locking/qspinlock: Fix spin_unlock_wait() some more"),
  (and the commits for the other archs), spin_unlock_wait() is an
  ACQUIRE.
  Therefore the smp_rmb() after spin_unlock_wait() can be removed.
- smp_mb__after_spin_lock() instead of a direct smp_mb().
  This allows architectures to override it with a less expensive
  barrier if that is sufficient for their hardware/spinlock
  implementation.

For overriding, the same approach as for smp_mb__before_spin_lock()
is used: If smp_mb__after_spin_lock is already defined, then it is
not changed.
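
For illustration, a condensed sketch of the sem_lock() fast path after
this patch (the complete hunk is in the diff below); the barrier orders
the store that acquires sem->lock against the later read of
sma->complex_mode:

	spin_lock(&sem->lock);
	smp_mb__after_spin_lock();	/* lock store must be visible first */
	if (!smp_load_acquire(&sma->complex_mode)) {
		/* fast path successful! */
		...
	}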

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 Documentation/locking/spinlocks.txt |  5 +++++
 include/linux/spinlock.h            | 12 ++++++++++++
 ipc/sem.c                           | 16 +---------------
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/Documentation/locking/spinlocks.txt b/Documentation/locking/spinlocks.txt
index ff35e40..fc37beb 100644
--- a/Documentation/locking/spinlocks.txt
+++ b/Documentation/locking/spinlocks.txt
@@ -40,6 +40,11 @@ example, internal driver data structures that nobody else ever touches).
    touches a shared variable has to agree about the spinlock they want
    to use.
 
+   NOTE! Code that needs stricter memory barriers than ACQUIRE during
+   LOCK and RELEASE during UNLOCK must use appropriate memory barriers
+   such as smp_mb__after_spin_lock().
+   spin_unlock_wait() has ACQUIRE semantics.
+
 ----
 
 Lesson 2: reader-writer spinlocks.
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..d79000e 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,6 +130,18 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
+#ifndef smp_mb__after_spin_lock
+/**
+ * smp_mb__after_spin_lock() - Provide smp_mb() after spin_lock
+ *
+ * spin_lock() provides ACQUIRE semantics regarding reading the lock.
+ * There are no guarantees that the lock write is visible before any read
+ * or write operation within the protected area is performed.
+ * If the lock write must happen first, this function is required.
+ */
+#define smp_mb__after_spin_lock()	smp_mb()
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/ipc/sem.c b/ipc/sem.c
index 5e318c5..ac15ab2 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -290,14 +290,6 @@ static void complexmode_enter(struct sem_array *sma)
 		sem = sma->sem_base + i;
 		spin_unlock_wait(&sem->lock);
 	}
-	/*
-	 * spin_unlock_wait() is not a memory barriers, it is only a
-	 * control barrier. The code must pair with spin_unlock(&sem->lock),
-	 * thus just the control barrier is insufficient.
-	 *
-	 * smp_rmb() is sufficient, as writes cannot pass the control barrier.
-	 */
-	smp_rmb();
 }
 
 /*
@@ -363,13 +355,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		 */
 		spin_lock(&sem->lock);
 
-		/*
-		 * See 51d7d5205d33
-		 * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
-		 * A full barrier is required: the write of sem->lock
-		 * must be visible before the read is executed
-		 */
-		smp_mb();
+		smp_mb__after_spin_lock();
 
 		if (!smp_load_acquire(&sma->complex_mode)) {
 			/* fast path successful! */
-- 
2.5.5

* [PATCH 2/4 V4] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h
  2016-08-29 13:34 [PATCH 0/4 V4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
  2016-08-29 13:34 ` [PATCH 1/4 v4] spinlock: Document memory barrier rules Manfred Spraul
@ 2016-08-29 13:34 ` Manfred Spraul
  2016-08-29 13:34 ` [PATCH 3/4 V4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
  2016-08-29 13:34 ` [PATCH 4/4 V4] qspinlock for x86: smp_mb__after_spin_lock() is free Manfred Spraul
  3 siblings, 0 replies; 6+ messages in thread
From: Manfred Spraul @ 2016-08-29 13:34 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

v3: If smp_mb__after_unlock_lock() is placed in barrier.h, then
kernel/rcu/tree.c does not compile on arm64, because barrier.h is not
included in kernel/rcu/tree.c.

(v2: added an example from Paul of something that can happen on real HW)

spin_unlock() + spin_lock() together do not form a full memory barrier:
(everything initialized to 0)

CPU1:
  a=1;
  spin_unlock(&b);
  spin_lock(&c);
+ smp_mb__after_unlock_lock();
  r1=d;

CPU2:
  d=1;
  smp_mb();
  r2=a;

Without the smp_mb__after_unlock_lock(), r1==0 && r2==0 would
be possible.
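
In practice the barrier is placed inside a lock-acquisition wrapper,
right after the lock is taken. A sketch of the shape such a wrapper
might take (the macro name is invented for this example):

	#define my_lock_fully_ordered(p)			\
	do {							\
		raw_spin_lock(&(p)->lock);			\
		/* earlier UNLOCK + this LOCK == full barrier */\
		smp_mb__after_unlock_lock();			\
	} while (0)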

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/linux/spinlock.h | 16 ++++++++++++++++
 kernel/rcu/tree.h        | 12 ------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d79000e..5075c88 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -142,6 +142,22 @@ do {								\
 #define smp_mb__after_spin_lock()	smp_mb()
 #endif
 
+#ifndef smp_mb__after_unlock_lock
+/**
+ * smp_mb__after_unlock_lock() - Provide smp_mb() after unlock+lock
+ *
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a523..a0cd9ab 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -687,18 +687,6 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
 /*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
-
-/*
  * Wrappers for the rcu_node::lock acquire and release.
  *
  * Because the rcu_nodes form a tree, the tree traversal locking will observe
-- 
2.5.5

* [PATCH 3/4 V4] net/netfilter/nf_conntrack_core: update memory barriers.
  2016-08-29 13:34 [PATCH 0/4 V4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
  2016-08-29 13:34 ` [PATCH 1/4 v4] spinlock: Document memory barrier rules Manfred Spraul
  2016-08-29 13:34 ` [PATCH 2/4 V4] spinlock.h: Move smp_mb__after_unlock_lock to spinlock.h Manfred Spraul
@ 2016-08-29 13:34 ` Manfred Spraul
  2016-08-29 13:34 ` [PATCH 4/4 V4] qspinlock for x86: smp_mb__after_spin_lock() is free Manfred Spraul
  3 siblings, 0 replies; 6+ messages in thread
From: Manfred Spraul @ 2016-08-29 13:34 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

Change the locking for nf_conntrack_lock() and nf_conntrack_all_lock()
to the approach used by ipc/sem.c:

- use smp_store_mb() instead of a raw smp_mb()
- use smp_mb__after_spin_lock().
- for nf_conntrack_lock, use spin_lock(&global_lock) instead of
  spin_unlock_wait(&global_lock) and looping back.

The last change prevents nf_conntrack_lock() from looping multiple times.
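
The pairing between the two paths then is (a condensed sketch of the
code in the diff below, in the style of the example in patch 2):

  CPU1 (nf_conntrack_lock):
    spin_lock(lock);
    smp_mb__after_spin_lock();
    r1 = smp_load_acquire(&nf_conntrack_locks_all);

  CPU2 (nf_conntrack_all_lock):
    spin_lock(&nf_conntrack_locks_all_lock);
    smp_store_mb(nf_conntrack_locks_all, true);
    spin_unlock_wait(lock);

Forbidden outcome: r1 == false on CPU1 while CPU2's spin_unlock_wait()
does not observe CPU1's critical section.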

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 net/netfilter/nf_conntrack_core.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7d90a5d..4ed229b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -79,20 +79,29 @@ static __read_mostly bool nf_conntrack_locks_all;
 
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
+	/* 1) Acquire the lock */
 	spin_lock(lock);
-	while (unlikely(nf_conntrack_locks_all)) {
-		spin_unlock(lock);
 
-		/*
-		 * Order the 'nf_conntrack_locks_all' load vs. the
-		 * spin_unlock_wait() loads below, to ensure
-		 * that 'nf_conntrack_locks_all_lock' is indeed held:
-		 */
-		smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
-		spin_unlock_wait(&nf_conntrack_locks_all_lock);
-		spin_lock(lock);
-	}
+	/* 2) Order writing the lock and reading nf_conntrack_locks_all */
+	smp_mb__after_spin_lock();
+
+	/* 3) read nf_conntrack_locks_all, with ACQUIRE semantics */
+	if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
+		return;
+
+	/* fast path failed, unlock */
+	spin_unlock(lock);
+
+	/* Slow path 1) get global lock */
+	spin_lock(&nf_conntrack_locks_all_lock);
+
+	/* Slow path 2) get the lock we want */
+	spin_lock(lock);
+
+	/* Slow path 3) release the global lock */
+	spin_unlock(&nf_conntrack_locks_all_lock);
 }
+
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
 static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2)
@@ -132,15 +141,14 @@ static void nf_conntrack_all_lock(void)
 	int i;
 
 	spin_lock(&nf_conntrack_locks_all_lock);
-	nf_conntrack_locks_all = true;
 
 	/*
-	 * Order the above store of 'nf_conntrack_locks_all' against
+	 * Order the store of 'nf_conntrack_locks_all' against
 	 * the spin_unlock_wait() loads below, such that if
 	 * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
 	 * we must observe nf_conntrack_locks[] held:
 	 */
-	smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
+	smp_store_mb(nf_conntrack_locks_all, true);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++) {
 		spin_unlock_wait(&nf_conntrack_locks[i]);
-- 
2.5.5

* [PATCH 4/4 V4] qspinlock for x86: smp_mb__after_spin_lock() is free
  2016-08-29 13:34 [PATCH 0/4 V4] Clarify/standardize memory barriers for lock/unlock Manfred Spraul
                   ` (2 preceding siblings ...)
  2016-08-29 13:34 ` [PATCH 3/4 V4] net/netfilter/nf_conntrack_core: update memory barriers Manfred Spraul
@ 2016-08-29 13:34 ` Manfred Spraul
  3 siblings, 0 replies; 6+ messages in thread
From: Manfred Spraul @ 2016-08-29 13:34 UTC (permalink / raw)
  To: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra, Andrew Morton
  Cc: LKML, 1vier1, Davidlohr Bueso, Manfred Spraul

For x86 qspinlocks, no additional memory barrier is required in
smp_mb__after_spin_lock:

Theoretically, for qspinlock we could define two barriers:
- smp_mb__after_spin_lock: Free for x86, not free for powerpc
- smp_mb__between_spin_lock_and_spin_unlock_wait():
	Free for all archs, see queued_spin_unlock_wait for details.

As smp_mb__between_spin_lock_and_spin_unlock_wait() is not used in
any hot paths, the patch does not introduce that define yet.
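
The practical effect, using the sem_lock() fast path from patch 1 as an
example (a sketch; it assumes the x86 fast path takes the lock with a
LOCK-prefixed atomic, which already acts as a full barrier):

	spin_lock(&sem->lock);		/* LOCK cmpxchg on x86 */
	smp_mb__after_spin_lock();	/* expands to nothing on x86 */
	if (!smp_load_acquire(&sma->complex_mode))
		... /* fast path, no extra barrier instruction */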

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 arch/x86/include/asm/qspinlock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index eaba080..04d26ed 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -61,6 +61,17 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
 }
 #endif /* CONFIG_PARAVIRT */
 
+#ifndef smp_mb__after_spin_lock
+/**
+ * smp_mb__after_spin_lock() - Provide smp_mb() after spin_lock
+ *
+ * queued_spin_lock() provides full memory barriers semantics,
+ * thus no further memory barrier is required. See
+ * queued_spin_unlock_wait() for further details.
+ */
+#define smp_mb__after_spin_lock()	do { } while (0)
+#endif
+
 #include <asm-generic/qspinlock.h>
 
 #endif /* _ASM_X86_QSPINLOCK_H */
-- 
2.5.5

* Re: [PATCH 1/4 v4] spinlock: Document memory barrier rules
  2016-08-29 13:34 ` [PATCH 1/4 v4] spinlock: Document memory barrier rules Manfred Spraul
@ 2016-08-29 17:38   ` Davidlohr Bueso
  0 siblings, 0 replies; 6+ messages in thread
From: Davidlohr Bueso @ 2016-08-29 17:38 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: benh, paulmck, Ingo Molnar, Boqun Feng, Peter Zijlstra,
	Andrew Morton, LKML, 1vier1

On Mon, 29 Aug 2016, Manfred Spraul wrote:

>Right now, the spinlock machinery tries to guarantee barriers even for
>unorthodox locking cases, which ends up as a constant stream of updates
>as the architectures try to support new unorthodox ideas.
>
>The patch proposes to clarify the rules:
>spin_lock is ACQUIRE, spin_unlock is RELEASE.
>spin_unlock_wait is also ACQUIRE.
>Code that needs further guarantees must use appropriate explicit barriers.
>
>Architectures that can implement some barriers for free can define the
>barriers as NOPs.
>
>As the initial step, the patch converts ipc/sem.c to the new defines:
>- With commit 2c6100227116
>  ("locking/qspinlock: Fix spin_unlock_wait() some more"),
>  (and the commits for the other archs), spin_unlock_wait() is an
>  ACQUIRE.
>  Therefore the smp_rmb() after spin_unlock_wait() can be removed.
>- smp_mb__after_spin_lock() instead of a direct smp_mb().
>  This allows architectures to override it with a less expensive
>  barrier if that is sufficient for their hardware/spinlock
>  implementation.
>
>For overriding, the same approach as for smp_mb__before_spin_lock()
>is used: If smp_mb__after_spin_lock is already defined, then it is
>not changed.
>
>Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>---
> Documentation/locking/spinlocks.txt |  5 +++++
> include/linux/spinlock.h            | 12 ++++++++++++
> ipc/sem.c                           | 16 +---------------

Preferably this would have been two patches, especially since you
remove the redundant barrier in complexmode_enter(), which is
kind of mixing core spinlocking and core sysv sems. But anyway,
this will be the patch that we _don't_ backport to stable, right?

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
