From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757953AbcBXFHy (ORCPT ); Wed, 24 Feb 2016 00:07:54 -0500 Received: from e19.ny.us.ibm.com ([129.33.205.209]:45413 "EHLO e19.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757090AbcBXFHn (ORCPT ); Wed, 24 Feb 2016 00:07:43 -0500 X-IBM-Helo: d01dlp02.pok.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com X-IBM-RcptTo: linux-kernel@vger.kernel.org From: "Paul E. McKenney" To: linux-kernel@vger.kernel.org Cc: mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com, "Paul E. McKenney" Subject: [PATCH tip/core/rcu 07/14] documentation: Distinguish between local and global transitivity Date: Tue, 23 Feb 2016 21:00:40 -0800 Message-Id: <1456290047-16654-7-git-send-email-paulmck@linux.vnet.ibm.com> X-Mailer: git-send-email 2.5.2 In-Reply-To: <20160224050021.GA14616@linux.vnet.ibm.com> References: <20160224050021.GA14616@linux.vnet.ibm.com> X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16022405-0057-0000-0000-000003830805 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The introduction of smp_load_acquire() and smp_store_release() had the side effect of introducing a weaker notion of transitivity: The transitivity of full smp_mb() barriers is global, but that of smp_store_release()/smp_load_acquire() chains is local. This commit therefore introduces the notion of local transitivity and gives an example. Reported-by: Peter Zijlstra Reported-by: Will Deacon Signed-off-by: Paul E. McKenney --- Documentation/memory-barriers.txt | 78 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 2 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index e9ebeb3b1077..ae9d306725ba 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes. General barriers are therefore required to ensure that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses. -To reiterate, if your code requires transitivity, use general barriers -throughout. +General barriers provide "global transitivity", so that all CPUs will +agree on the order of operations. In contrast, a chain of release-acquire +pairs provides only "local transitivity", so that only those CPUs on +the chain are guaranteed to agree on the combined order of the accesses. +For example, switching to C code in deference to Herman Hollerith: + + int u, v, x, y, z; + + void cpu0(void) + { + r0 = smp_load_acquire(&x); + WRITE_ONCE(u, 1); + smp_store_release(&y, 1); + } + + void cpu1(void) + { + r1 = smp_load_acquire(&y); + r4 = READ_ONCE(v); + r5 = READ_ONCE(u); + smp_store_release(&z, 1); + } + + void cpu2(void) + { + r2 = smp_load_acquire(&z); + smp_store_release(&x, 1); + } + + void cpu3(void) + { + WRITE_ONCE(v, 1); + smp_mb(); + r3 = READ_ONCE(u); + } + +Because cpu0(), cpu1(), and cpu2() participate in a local transitive +chain of smp_store_release()/smp_load_acquire() pairs, the following +outcome is prohibited: + + r0 == 1 && r1 == 1 && r2 == 1 + +Furthermore, because of the release-acquire relationship between cpu0() +and cpu1(), cpu1() must see cpu0()'s writes, so that the following +outcome is prohibited: + + r1 == 1 && r5 == 0 + +However, the transitivity of release-acquire is local to the participating +CPUs and does not apply to cpu3(). Therefore, the following outcome +is possible: + + r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 + +Although cpu0(), cpu1(), and cpu2() will see their respective reads and +writes in order, CPUs not involved in the release-acquire chain might +well disagree on the order. This disagreement stems from the fact that +the weak memory-barrier instructions used to implement smp_load_acquire() +and smp_store_release() are not required to order prior stores against +subsequent loads in all cases. This means that cpu3() can see cpu0()'s +store to u as happening -after- cpu1()'s load from v, even though +both cpu0() and cpu1() agree that these two operations occurred in the +intended order. + +However, please keep in mind that smp_load_acquire() is not magic. +In particular, it simply reads from its argument with ordering. It does +-not- ensure that any particular value will be read. Therefore, the +following outcome is possible: + + r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0 + +Note that this outcome can happen even on a mythical sequentially +consistent system where nothing is ever reordered. + +To reiterate, if your code requires global transitivity, use general +barriers throughout. ======================== -- 2.5.2