From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	mingo@kernel.org, linux-kernel@vger.kernel.org, corbet@lwn.net,
	mhocko@kernel.org, dhowells@redhat.com,
	torvalds@linux-foundation.org, will.deacon@arm.com,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Date: Thu, 12 Nov 2015 10:59:06 -0800
Message-ID: <20151112185906.GK3972@linux.vnet.ibm.com>
In-Reply-To: <20151112193302.GA9988@redhat.com>

On Thu, Nov 12, 2015 at 08:33:02PM +0100, Oleg Nesterov wrote:
> On 11/12, Peter Zijlstra wrote:
> >
> > On Thu, Nov 12, 2015 at 07:38:07PM +0100, Oleg Nesterov wrote:
> > > It seems that PPC needs to define smp_mb__before_spinlock() as full mb(),
> > > no?
> >
> > It does:
> >
> > arch/powerpc/include/asm/barrier.h:#define smp_mb__before_spinlock() smp_mb()
> 
> Ah, indeed, thanks.
> 
> And given that it also defines smp_mb__after_unlock_lock() as smp_mb(),
> I am starting to understand how it can help to avoid the races with
> spin_unlock_wait() in (for example) do_exit().
> 
> But as Boqun has already mentioned, this means that mb__after_unlock_lock()
> has a new meaning, which should be documented.
> 
> Hmm. And 12d560f4 "Privatize smp_mb__after_unlock_lock()" should be reverted
> then ;)

Surprisingly, this reverts cleanly against today's mainline; please see
the patch below.  Against my -rcu stack, not so much, but so it goes.  ;-)
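
In case it helps to see it outside of the documentation hunk, the usage
pattern that this revert makes publicly available again is to place the
primitive immediately after a lock acquisition.  A hand-waved sketch
with a made-up lock, not taken from any real call site:

	spin_lock(&mylock);		/* ACQUIRE */
	smp_mb__after_unlock_lock();	/* UNLOCK+LOCK is now a full barrier. */
	/*
	 * The prior UNLOCK and this LOCK now act as a full barrier
	 * (including transitivity).  This applies when the UNLOCK and
	 * LOCK are executed by the same CPU, or when both operate on
	 * this same lock variable.
	 */
	...
	spin_unlock(&mylock);

The generic definition below is a no-op, so this is free on many
architectures; powerpc pays an smp_mb().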

							Thanx, Paul

------------------------------------------------------------------------

commit eff0632b4181f91f2596d56f7c73194e1a869aff
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Nov 12 10:54:23 2015 -0800

    Revert "rcu,locking: Privatize smp_mb__after_unlock_lock()"
    
    This reverts commit 12d560f4ea87030667438a169912380be00cea4b.
    
    The reason for this revert is that smp_mb__after_unlock_lock() might
    prove useful outside of RCU after all for interactions between
    the locking primitives and spin_unlock_wait().

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index aef9487303d0..d4501664d49f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1855,10 +1855,16 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock.  In short, an ACQUIRE followed by a
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
-not imply a full memory barrier.  Therefore, the CPU's execution of the
-critical sections corresponding to the RELEASE and the ACQUIRE can cross,
-so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
+imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
+pair to produce a full barrier, the ACQUIRE can be followed by an
+smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
+(including transitivity) if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
+the same variable.  The smp_mb__after_unlock_lock() primitive is free
+on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
+execution of the critical sections corresponding to the RELEASE and the
+ACQUIRE can cross, so that:
 
 	*A = a;
 	RELEASE M
@@ -1896,6 +1902,29 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
 	a sleep-unlock race, but the locking primitive needs to resolve
 	such races properly in any case.
 
+With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
+For example, with the following code, the store to *A will always be
+seen by other CPUs before the store to *B:
+
+	*A = a;
+	RELEASE M
+	ACQUIRE N
+	smp_mb__after_unlock_lock();
+	*B = b;
+
+The operations will always occur in one of the following orders:
+
+	STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
+	STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+	ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+
+If the RELEASE and ACQUIRE were instead both operating on the same lock
+variable, only the first of these alternatives can occur.  In addition,
+the more strongly ordered systems may rule out some of the above orders.
+But in any case, as noted earlier, the smp_mb__after_unlock_lock()
+ensures that the store to *A will always be seen as happening before
+the store to *B.
+
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
 anything at all - especially with respect to I/O accesses - unless combined
@@ -2126,6 +2155,40 @@ But it won't see any of:
 	*E, *F or *G following RELEASE Q
 
 
+However, if the following occurs:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	WRITE_ONCE(*A, a);
+	ACQUIRE M		     [1]
+	WRITE_ONCE(*B, b);
+	WRITE_ONCE(*C, c);
+	RELEASE M	     [1]
+	WRITE_ONCE(*D, d);		WRITE_ONCE(*E, e);
+					ACQUIRE M		     [2]
+					smp_mb__after_unlock_lock();
+					WRITE_ONCE(*F, f);
+					WRITE_ONCE(*G, g);
+					RELEASE M	     [2]
+					WRITE_ONCE(*H, h);
+
+CPU 3 might see:
+
+	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
+		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
+
+But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
+
+	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
+	*A, *B or *C following RELEASE M [1]
+	*F, *G or *H preceding ACQUIRE M [2]
+	*A, *B, *C, *E, *F or *G following RELEASE M [2]
+
+Note that the smp_mb__after_unlock_lock() is critically important
+here: Without it CPU 3 might see some of the above orderings.
+Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
+to be seen in order unless CPU 3 holds lock M.
+
 
 ACQUIRES VS I/O ACCESSES
 ------------------------
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d7583c..4dbe072eecbe 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,6 +28,8 @@
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
 
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+
 #ifdef CONFIG_PPC64
 /* use 0x800000yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0cebd204..ffcd053ca89a 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,6 +130,16 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
+/*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifndef smp_mb__after_unlock_lock
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif
+
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 9fb4e238d4dc..8c6753d903ec 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -652,15 +652,3 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 }
 #endif /* #ifdef CONFIG_RCU_TRACE */
-
-/*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifdef CONFIG_PPC
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-#else /* #ifdef CONFIG_PPC */
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif /* #else #ifdef CONFIG_PPC */
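
As a postscript, the reader side of the first new example above might be
worth spelling out.  The following is my own untested construction from
the documentation hunk, reusing its A, B, M, and N; the smp_rmb() on
CPU 1 is an addition for the sake of the example:

	CPU 0				CPU 1
	===============			===============
	WRITE_ONCE(*A, a);		r1 = READ_ONCE(*B);
	RELEASE M			smp_rmb();
	ACQUIRE N			r2 = READ_ONCE(*A);
	smp_mb__after_unlock_lock();
	WRITE_ONCE(*B, b);

Because the store to *A is guaranteed to be seen by other CPUs before
the store to *B, the outcome (r1 == b && r2 != a) is forbidden.  Drop
the smp_mb__after_unlock_lock(), and per the text above the
RELEASE+ACQUIRE pair may not be assumed to be a full barrier, so
portable code must not rely on that outcome being forbidden.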

