All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH tip/core/rcu 0/6] Documentation changes for 3.15
@ 2014-02-17 21:26 Paul E. McKenney
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
  2014-02-17 21:47 ` [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Josh Triplett
  0 siblings, 2 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw

Hello!

This series provides a variety of documentation updates:

1.	Update the documentation on actions to take to avoid frenetic
	call_rcu() invocations from exhausting memory, including a mention
	of internal-to-RCU avoidance measures.

2.	Add a note to Documentation/memory-barriers.txt stating that
	ACCESS_ONCE() provides cache coherence for accesses to any
	single given variable.

3.	Add an explicit statement that in control dependencies, the
	condition must include the load in question.

4.	Add WQ_SYSFS workqueues as a work-offloading option in
	Documentation/kernel-per-CPU-kthreads.txt.

5.	It turns out that some types of control dependencies require
	memory barriers, most notably when the exact same store is
	at the beginning of both the then-clause and the else-clause.
	Document this.

6.	Fixups to RTFP.txt

							Thanx, Paul

------------------------------------------------------------------------

 Documentation/memory-barriers.txt           |   29 +++--
 b/Documentation/RCU/RTFP.txt                |  149 +++++++++++++++++++++++-----
 b/Documentation/RCU/checklist.txt           |   19 ++-
 b/Documentation/kernel-per-CPU-kthreads.txt |   13 ++
 b/Documentation/memory-barriers.txt         |   17 +++
 5 files changed, 189 insertions(+), 38 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations
  2014-02-17 21:26 [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Paul E. McKenney
@ 2014-02-17 21:26 ` Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence Paul E. McKenney
                     ` (5 more replies)
  2014-02-17 21:47 ` [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Josh Triplett
  1 sibling, 6 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The call_rcu() family of primitives will take action to accelerate
grace periods when the number of callbacks pending on a given CPU
becomes excessive.  Although this safety mechanism can be useful,
it is no substitute for users of call_rcu() having rate-limit controls
in place.  This commit adds this nuance to the documentation.

Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 91266193b8f4..5733e31836b5 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
 		variations on this theme.
 
 	b.	Limiting update rate.  For example, if updates occur only
-		once per hour, then no explicit rate limiting is required,
-		unless your system is already badly broken.  The dcache
-		subsystem takes this approach -- updates are guarded
-		by a global lock, limiting their rate.
+		once per hour, then no explicit rate limiting is
+		required, unless your system is already badly broken.
+		Older versions of the dcache subsystem takes this
+		approach -- updates were guarded by a global lock,
+		limiting their rate.
 
 	c.	Trusted update -- if updates can only be done manually by
 		superuser or some other trusted user, then it might not
@@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
 		the machine.
 
 	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
-		advantage of call_rcu_bh()'s faster grace periods.
+		advantage of call_rcu_bh()'s faster grace periods.  (This
+		is only a partial solution, though.)
 
 	e.	Periodically invoke synchronize_rcu(), permitting a limited
 		number of updates per grace period.
@@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
 	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
 	call_srcu(), and kfree_rcu().
 
+	Note that although these primitives do take action to avoid memory
+	exhaustion when any given CPU has too many callbacks, a determined
+	user could still exhaust memory.  This is especially the case
+	if a system with a large number of CPUs has been configured to
+	offload all of its RCU callbacks onto a single CPU, or if the
+	system has relatively little free memory.
+
 9.	All RCU list-traversal primitives, which include
 	rcu_dereference(), list_for_each_entry_rcu(), and
 	list_for_each_safe_rcu(), must be either within an RCU read-side
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
@ 2014-02-17 21:26   ` Paul E. McKenney
  2014-02-17 21:40     ` Josh Triplett
  2014-02-17 21:26   ` [PATCH tip/core/rcu 3/6] Documentation/memory-barriers.txt: Conditional must use prior load Paul E. McKenney
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The ACCESS_ONCE() primitive provides cache coherence, but the
documentation does not clearly state this.  This commit therefore upgrades
the documentation.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 102dc19c4119..ad6db1d48f1f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
 while perfectly safe in single-threaded code, can be fatal in concurrent
 code.  Here are some examples of these sorts of optimizations:
 
+ (*) The compiler is within its rights to reorder loads and stores
+     to the same variable, and in some cases, the CPU is within its
+     rights to reorder loads to the same variable.  This means that
+     the following code:
+
+	a[0] = x;
+	a[1] = x;
+
+     Might result in an older value of x stored in a[1] than in a[0].
+     Prevent both the compiler and the CPU from doing this as follows:
+
+	a[0] = ACCESS_ONCE(x);
+	a[1] = ACCESS_ONCE(x);
+
+     In short, ACCESS_ONCE() provides "cache coherence" for accesses from
+     multiple CPUs to a single variable.
+
  (*) The compiler is within its rights to merge successive loads from
      the same variable.  Such merging can cause the compiler to "optimize"
      the following code:
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 3/6] Documentation/memory-barriers.txt: Conditional must use prior load
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence Paul E. McKenney
@ 2014-02-17 21:26   ` Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 4/6] Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity Paul E. McKenney
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

A control dependency consists of a load, a conditional that depends on
that load, and a store.  This commit emphasizes this point in the
summary.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index ad6db1d48f1f..f2668c19807e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -724,7 +724,8 @@ In summary:
       later loads, smp_mb().
 
   (*) Control dependencies require at least one run-time conditional
-      between the prior load and the subsequent store.  If the compiler
+      between the prior load and the subsequent store, and this
+      conditional must involve the prior load.  If the compiler
       is able to optimize the conditional away, it will have also
       optimized away the ordering.  Careful use of ACCESS_ONCE() can
       help to preserve the needed conditional.
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 4/6] Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 3/6] Documentation/memory-barriers.txt: Conditional must use prior load Paul E. McKenney
@ 2014-02-17 21:26   ` Paul E. McKenney
  2014-02-17 21:26   ` [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies Paul E. McKenney
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This commit documents the ability to apply CPU affinity to WQ_SYSFS
workqueues, thus offloading them from the desired worker CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 Documentation/kernel-per-CPU-kthreads.txt | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
index 827104fb9364..f3cd299fcc41 100644
--- a/Documentation/kernel-per-CPU-kthreads.txt
+++ b/Documentation/kernel-per-CPU-kthreads.txt
@@ -162,7 +162,18 @@ Purpose: Execute workqueue requests
 To reduce its OS jitter, do any of the following:
 1.	Run your workload at a real-time priority, which will allow
 	preempting the kworker daemons.
-2.	Do any of the following needed to avoid jitter that your
+2.	A given workqueue can be made visible in the sysfs filesystem
+	by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
+	Such a workqueue can be confined to a given subset of the
+	CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs
+	files.	The set of WQ_SYSFS workqueues can be displayed using
+	"ls sys/devices/virtual/workqueue".  That said, the workqueues
+	maintainer would like to caution people against indiscriminately
+	sprinkling WQ_SYSFS across all the workqueues.	The reason for
+	caution is that it is easy to add WQ_SYSFS, but because sysfs is
+	part of the formal user/kernel API, it can be nearly impossible
+	to remove it, even if its addition was a mistake.
+3.	Do any of the following needed to avoid jitter that your
 	application cannot tolerate:
 	a.	Build your kernel with CONFIG_SLUB=y rather than
 		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
                     ` (2 preceding siblings ...)
  2014-02-17 21:26   ` [PATCH tip/core/rcu 4/6] Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity Paul E. McKenney
@ 2014-02-17 21:26   ` Paul E. McKenney
  2014-02-17 21:46     ` Josh Triplett
  2014-02-17 21:26   ` [PATCH tip/core/rcu 6/6] documentation: Fix some inconsistencies in RTFP.txt Paul E. McKenney
  2014-02-17 21:39   ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Josh Triplett
  5 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Current compilers can "speculate" stores in the case where both legs
of the "if" statement start with identical stores.  Because the stores
are identical, the compiler knows that the store will unconditionally
execute regardless of the "if" condition, and so the compiler is within
its rights to hoist the store to precede the condition.  Such hoisting
destroys the control-dependency ordering.  This ordering can be restored
by placing a barrier() at the beginning of each leg of the "if" statement.
This commit adds this requirement to the control-dependencies section.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f2668c19807e..adfaca831a90 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -608,26 +608,30 @@ as follows:
 	b = p;  /* BUG: Compiler can reorder!!! */
 	do_something();
 
-The solution is again ACCESS_ONCE(), which preserves the ordering between
-the load from variable 'a' and the store to variable 'b':
+The solution is again ACCESS_ONCE() and barrier(), which preserves the
+ordering between the load from variable 'a' and the store to variable 'b':
 
 	q = ACCESS_ONCE(a);
 	if (q) {
+		barrier();
 		ACCESS_ONCE(b) = p;
 		do_something();
 	} else {
+		barrier();
 		ACCESS_ONCE(b) = p;
 		do_something_else();
 	}
 
-You could also use barrier() to prevent the compiler from moving
-the stores to variable 'b', but barrier() would not prevent the
-compiler from proving to itself that a==1 always, so ACCESS_ONCE()
-is also needed.
+The initial ACCESS_ONCE() is required to prevent the compiler from
+proving the value of 'a', and the pair of barrier() invocations are
+required to prevent the compiler from pulling the two identical stores
+to 'b' out from the legs of the "if" statement.
 
 It is important to note that control dependencies absolutely require a
 a conditional.  For example, the following "optimized" version of
-the above example breaks ordering:
+the above example breaks ordering, which is why the barrier() invocations
+are absolutely required if you have identical stores in both legs of
+the "if" statement:
 
 	q = ACCESS_ONCE(a);
 	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
@@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
 for example, as follows:
 
 	if (ACCESS_ONCE(a) > 0) {
+		barrier();
 		ACCESS_ONCE(b) = q / 2;
 		do_something();
 	} else {
+		barrier();
 		ACCESS_ONCE(b) = q / 3;
 		do_something_else();
 	}
@@ -659,9 +665,11 @@ the needed conditional.  For example:
 
 	q = ACCESS_ONCE(a);
 	if (q % MAX) {
+		barrier();
 		ACCESS_ONCE(b) = p;
 		do_something();
 	} else {
+		barrier();
 		ACCESS_ONCE(b) = p;
 		do_something_else();
 	}
@@ -723,6 +731,10 @@ In summary:
       use smb_rmb(), smp_wmb(), or, in the case of prior stores and
       later loads, smp_mb().
 
+  (*) If both legs of the "if" statement begin with identical stores
+      to the same variable, a barrier() statement is required at the
+      beginning of each leg of the "if" statement.
+
   (*) Control dependencies require at least one run-time conditional
       between the prior load and the subsequent store, and this
       conditional must involve the prior load.  If the compiler
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH tip/core/rcu 6/6] documentation: Fix some inconsistencies in RTFP.txt
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
                     ` (3 preceding siblings ...)
  2014-02-17 21:26   ` [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies Paul E. McKenney
@ 2014-02-17 21:26   ` Paul E. McKenney
  2014-02-17 21:39   ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Josh Triplett
  5 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 21:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, niv, tglx,
	peterz, rostedt, dhowells, edumazet, darren, fweisbec, oleg, sbw,
	Paul E. McKenney

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

Some of the early history leaves out some citations and vice versa.  This
commit fixes these up.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/RTFP.txt | 149 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 125 insertions(+), 24 deletions(-)

diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 273e654d7d08..2f0fcb2112d2 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -31,6 +31,14 @@ has lapsed, so this approach may be used in non-GPL software, if desired.
 (In contrast, implementation of RCU is permitted only in software licensed
 under either GPL or LGPL.  Sorry!!!)
 
+In 1987, Rashid et al. described lazy TLB-flush [RichardRashid87a].
+At first glance, this has nothing to do with RCU, but nevertheless
+this paper helped inspire the update-side batching used in the later
+RCU implementation in DYNIX/ptx.  In 1988, Barbara Liskov published
+a description of Argus that noted that use of out-of-date values can
+be tolerated in some situations.  Thus, this paper provides some early
+theoretical justification for use of stale data.
+
 In 1990, Pugh [Pugh90] noted that explicitly tracking which threads
 were reading a given data structure permitted deferred free to operate
 in the presence of non-terminating threads.  However, this explicit
@@ -41,11 +49,11 @@ providing a fine-grained locking design, however, it would be interesting
 to see how much of the performance advantage reported in 1990 remains
 today.
 
-At about this same time, Adams [Adams91] described ``chaotic relaxation'',
-where the normal barriers between successive iterations of convergent
-numerical algorithms are relaxed, so that iteration $n$ might use
-data from iteration $n-1$ or even $n-2$.  This introduces error,
-which typically slows convergence and thus increases the number of
+At about this same time, Andrews [Andrews91textbook] described ``chaotic
+relaxation'', where the normal barriers between successive iterations
+of convergent numerical algorithms are relaxed, so that iteration $n$
+might use data from iteration $n-1$ or even $n-2$.  This introduces
+error, which typically slows convergence and thus increases the number of
 iterations required.  However, this increase is sometimes more than made
 up for by a reduction in the number of expensive barrier operations,
 which are otherwise required to synchronize the threads at the end
@@ -55,7 +63,8 @@ is thus inapplicable to most data structures in operating-system kernels.
 
 In 1992, Henry (now Alexia) Massalin completed a dissertation advising
 parallel programmers to defer processing when feasible to simplify
-synchronization.  RCU makes extremely heavy use of this advice.
+synchronization [HMassalinPhD].  RCU makes extremely heavy use of
+this advice.
 
 In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
 simplest deferred-free technique: simply waiting a fixed amount of time
@@ -90,27 +99,29 @@ mechanism, which is quite similar to RCU [Gamsa99].  These operating
 systems made pervasive use of RCU in place of "existence locks", which
 greatly simplifies locking hierarchies and helps avoid deadlocks.
 
-2001 saw the first RCU presentation involving Linux [McKenney01a]
-at OLS.  The resulting abundance of RCU patches was presented the
-following year [McKenney02a], and use of RCU in dcache was first
-described that same year [Linder02a].
+The year 2000 saw an email exchange that would likely have
+led to yet another independent invention of something like RCU
+[RustyRussell2000a,RustyRussell2000b].  Instead, 2001 saw the first
+RCU presentation involving Linux [McKenney01a] at OLS.  The resulting
+abundance of RCU patches was presented the following year [McKenney02a],
+and use of RCU in dcache was first described that same year [Linder02a].
 
 Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
 techniques that defer the destruction of data structures to simplify
 non-blocking synchronization (wait-free synchronization, lock-free
 synchronization, and obstruction-free synchronization are all examples of
-non-blocking synchronization).  In particular, this technique eliminates
-locking, reduces contention, reduces memory latency for readers, and
-parallelizes pipeline stalls and memory latency for writers.  However,
-these techniques still impose significant read-side overhead in the
-form of memory barriers.  Researchers at Sun worked along similar lines
-in the same timeframe [HerlihyLM02].  These techniques can be thought
-of as inside-out reference counts, where the count is represented by the
-number of hazard pointers referencing a given data structure rather than
-the more conventional counter field within the data structure itself.
-The key advantage of inside-out reference counts is that they can be
-stored in immortal variables, thus allowing races between access and
-deletion to be avoided.
+non-blocking synchronization).  The corresponding journal article appeared
+in 2004 [MagedMichael04a].  This technique eliminates locking, reduces
+contention, reduces memory latency for readers, and parallelizes pipeline
+stalls and memory latency for writers.  However, these techniques still
+impose significant read-side overhead in the form of memory barriers.
+Researchers at Sun worked along similar lines in the same timeframe
+[HerlihyLM02].  These techniques can be thought of as inside-out reference
+counts, where the count is represented by the number of hazard pointers
+referencing a given data structure rather than the more conventional
+counter field within the data structure itself.  The key advantage
+of inside-out reference counts is that they can be stored in immortal
+variables, thus allowing races between access and deletion to be avoided.
 
 By the same token, RCU can be thought of as a "bulk reference count",
 where some form of reference counter covers all reference by a given CPU
@@ -123,8 +134,10 @@ can be thought of in other terms as well.
 
 In 2003, the K42 group described how RCU could be used to create
 hot-pluggable implementations of operating-system functions [Appavoo03a].
-Later that year saw a paper describing an RCU implementation of System
-V IPC [Arcangeli03], and an introduction to RCU in Linux Journal
+Later that year saw a paper describing an RCU implementation
+of System V IPC [Arcangeli03] (following up on a suggestion by
+Hugh Dickins [Dickins02a] and an implementation by Mingming Cao
+[MingmingCao2002IPCRCU]), and an introduction to RCU in Linux Journal
 [McKenney03a].
 
 2004 has seen a Linux-Journal article on use of RCU in dcache
@@ -383,6 +396,21 @@ for Programming Languages and Operating Systems}"
 }
 }
 
+@phdthesis{HMassalinPhD
+,author="H. Massalin"
+,title="Synthesis: An Efficient Implementation of Fundamental Operating
+System Services"
+,school="Columbia University"
+,address="New York, NY"
+,year="1992"
+,annotation={
+	Mondo optimizing compiler.
+	Wait-free stuff.
+	Good advice: defer work to avoid synchronization.  See page 90
+		(PDF page 106), Section 5.4, fourth bullet point.
+}
+}
+
 @unpublished{Jacobson93
 ,author="Van Jacobson"
 ,title="Avoid Read-Side Locking Via Delayed Free"
@@ -671,6 +699,20 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni"
 [Viewed October 18, 2004]"
 }
 
+@conference{Michael02b
+,author="Maged M. Michael"
+,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets"
+,Year="2002"
+,Month="August"
+,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM
+Symposium on Parallel
+Algorithms and Architecture}"
+,pages="73-82"
+,annotation={
+Like the title says...
+}
+}
+
 @Conference{Linder02a
 ,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni"
 ,Title="Scalability of the Directory Entry Cache"
@@ -727,6 +769,24 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
 }
 }
 
+@conference{Michael02a
+,author="Maged M. Michael"
+,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic
+Reads and Writes"
+,Year="2002"
+,Month="August"
+,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM
+Symposium on Principles of Distributed Computing}"
+,pages="21-30"
+,annotation={
+	Each thread keeps an array of pointers to items that it is
+	currently referencing.	Sort of an inside-out garbage collection
+	mechanism, but one that requires the accessing code to explicitly
+	state its needs.  Also requires read-side memory barriers on
+	most architectures.
+}
+}
+
 @unpublished{Dickins02a
 ,author="Hugh Dickins"
 ,title="Use RCU for System-V IPC"
@@ -735,6 +795,17 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
 ,note="private communication"
 }
 
+@InProceedings{HerlihyLM02
+,author={Maurice Herlihy and Victor Luchangco and Mark Moir}
+,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized,
+Lock-Free Data Structures"
+,booktitle={Proceedings of 16\textsuperscript{th} International
+Symposium on Distributed Computing}
+,year=2002
+,month="October"
+,pages="339-353"
+}
+
 @unpublished{Sarma02b
 ,Author="Dipankar Sarma"
 ,Title="Some dcache\_rcu benchmark numbers"
@@ -749,6 +820,19 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
 }
 }
 
+@unpublished{MingmingCao2002IPCRCU
+,Author="Mingming Cao"
+,Title="[PATCH]updated ipc lock patch"
+,month="October"
+,year="2002"
+,note="Available:
+\url{https://lkml.org/lkml/2002/10/24/262}
+[Viewed February 15, 2014]"
+,annotation={
+	Mingming Cao's patch to introduce RCU to SysV IPC.
+}
+}
+
 @unpublished{LinusTorvalds2003a
 ,Author="Linus Torvalds"
 ,Title="Re: {[PATCH]} small fixes in brlock.h"
@@ -982,6 +1066,23 @@ Realtime Applications"
 }
 }
 
+@article{MagedMichael04a
+,author="Maged M. Michael"
+,title="Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects"
+,Year="2004"
+,Month="June"
+,journal="IEEE Transactions on Parallel and Distributed Systems"
+,volume="15"
+,number="6"
+,pages="491-504"
+,url="Available:
+\url{http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf}
+[Viewed March 1, 2005]"
+,annotation={
+	New canonical hazard-pointer citation.
+}
+}
+
 @phdthesis{PaulEdwardMcKenneyPhD
 ,author="Paul E. McKenney"
 ,title="Exploiting Deferred Destruction:
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
                     ` (4 preceding siblings ...)
  2014-02-17 21:26   ` [PATCH tip/core/rcu 6/6] documentation: Fix some inconsistencies in RTFP.txt Paul E. McKenney
@ 2014-02-17 21:39   ` Josh Triplett
  2014-02-17 22:52     ` Paul E. McKenney
  5 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-17 21:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:26:48PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The call_rcu() family of primitives will take action to accelerate
> grace periods when the number of callbacks pending on a given CPU
> becomes excessive.  Although this safety mechanism can be useful,
> it is no substitute for users of call_rcu() having rate-limit controls
> in place.  This commit adds this nuance to the documentation.
> 
> Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
> Reported-by: Gleb Natapov <gleb@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Grammatical nit below; otherwise:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

>  Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> index 91266193b8f4..5733e31836b5 100644
> --- a/Documentation/RCU/checklist.txt
> +++ b/Documentation/RCU/checklist.txt
> @@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
>  		variations on this theme.
>  
>  	b.	Limiting update rate.  For example, if updates occur only
> -		once per hour, then no explicit rate limiting is required,
> -		unless your system is already badly broken.  The dcache
> -		subsystem takes this approach -- updates are guarded
> -		by a global lock, limiting their rate.
> +		once per hour, then no explicit rate limiting is
> +		required, unless your system is already badly broken.
> +		Older versions of the dcache subsystem takes this
> +		approach -- updates were guarded by a global lock,
> +		limiting their rate.

s/takes/take/ to match the change from the singular "The dcache
subsystem" to the plural "Older versions of the dcache subsystem"

(You might also change " -- updates are guarded by" to ", guarding
updates with".)

>  
>  	c.	Trusted update -- if updates can only be done manually by
>  		superuser or some other trusted user, then it might not
> @@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
>  		the machine.
>  
>  	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
> -		advantage of call_rcu_bh()'s faster grace periods.
> +		advantage of call_rcu_bh()'s faster grace periods.  (This
> +		is only a partial solution, though.)
>  
>  	e.	Periodically invoke synchronize_rcu(), permitting a limited
>  		number of updates per grace period.
> @@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
>  	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
>  	call_srcu(), and kfree_rcu().
>  
> +	Note that although these primitives do take action to avoid memory
> +	exhaustion when any given CPU has too many callbacks, a determined
> +	user could still exhaust memory.  This is especially the case
> +	if a system with a large number of CPUs has been configured to
> +	offload all of its RCU callbacks onto a single CPU, or if the
> +	system has relatively little free memory.
> +
>  9.	All RCU list-traversal primitives, which include
>  	rcu_dereference(), list_for_each_entry_rcu(), and
>  	list_for_each_safe_rcu(), must be either within an RCU read-side
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence
  2014-02-17 21:26   ` [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence Paul E. McKenney
@ 2014-02-17 21:40     ` Josh Triplett
  2014-02-17 22:52       ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-17 21:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:26:49PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The ACCESS_ONCE() primitive provides cache coherence, but the
> documentation does not clearly state this.  This commit therefore upgrades
> the documentation.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Punctuation nit below; otherwise:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

>  Documentation/memory-barriers.txt | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 102dc19c4119..ad6db1d48f1f 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
>  while perfectly safe in single-threaded code, can be fatal in concurrent
>  code.  Here are some examples of these sorts of optimizations:
>  
> + (*) The compiler is within its rights to reorder loads and stores
> +     to the same variable, and in some cases, the CPU is within its
> +     rights to reorder loads to the same variable.  This means that
> +     the following code:
> +
> +	a[0] = x;
> +	a[1] = x;
> +
> +     Might result in an older value of x stored in a[1] than in a[0].
> +     Prevent both the compiler and the CPU from doing this as follows:
> +
> +	a[0] = ACCESS_ONCE(x);
> +	a[1] = ACCESS_ONCE(x);
> +
> +     In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> +     multiple CPUs to a single variable.

You don't need to "quote" the well-established term "cache coherence".

>   (*) The compiler is within its rights to merge successive loads from
>       the same variable.  Such merging can cause the compiler to "optimize"
>       the following code:
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-17 21:26   ` [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies Paul E. McKenney
@ 2014-02-17 21:46     ` Josh Triplett
  2014-02-17 22:58       ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-17 21:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> Current compilers can "speculate" stores in the case where both legs
> of the "if" statement start with identical stores.  Because the stores
> are identical, the compiler knows that the store will unconditionally
> execute regardless of the "if" condition, and so the compiler is within
> its rights to hoist the store to precede the condition.  Such hoisting
> destroys the control-dependency ordering.  This ordering can be restored
> by placing a barrier() at the beginning of each leg of the "if" statement.
> This commit adds this requirement to the control-dependencies section.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

This is starting to become a rather unreasonable level of fighting the
compiler.  ("Threads cannot be implemented as a library" indeed.)  This
doesn't seem like a reasonable thing to require users to do.  Is there
really no other way to cope with this particular bit of "help" from the
compiler?

>  Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index f2668c19807e..adfaca831a90 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -608,26 +608,30 @@ as follows:
>  	b = p;  /* BUG: Compiler can reorder!!! */
>  	do_something();
>  
> -The solution is again ACCESS_ONCE(), which preserves the ordering between
> -the load from variable 'a' and the store to variable 'b':
> +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> +ordering between the load from variable 'a' and the store to variable 'b':
>  
>  	q = ACCESS_ONCE(a);
>  	if (q) {
> +		barrier();
>  		ACCESS_ONCE(b) = p;
>  		do_something();
>  	} else {
> +		barrier();
>  		ACCESS_ONCE(b) = p;
>  		do_something_else();
>  	}
>  
> -You could also use barrier() to prevent the compiler from moving
> -the stores to variable 'b', but barrier() would not prevent the
> -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> -is also needed.
> +The initial ACCESS_ONCE() is required to prevent the compiler from
> +proving the value of 'a', and the pair of barrier() invocations are
> +required to prevent the compiler from pulling the two identical stores
> +to 'b' out from the legs of the "if" statement.
>  
>  It is important to note that control dependencies absolutely require a
>  a conditional.  For example, the following "optimized" version of
> -the above example breaks ordering:
> +the above example breaks ordering, which is why the barrier() invocations
> +are absolutely required if you have identical stores in both legs of
> +the "if" statement:
>  
>  	q = ACCESS_ONCE(a);
>  	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
> @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
>  for example, as follows:
>  
>  	if (ACCESS_ONCE(a) > 0) {
> +		barrier();
>  		ACCESS_ONCE(b) = q / 2;
>  		do_something();
>  	} else {
> +		barrier();
>  		ACCESS_ONCE(b) = q / 3;
>  		do_something_else();
>  	}
> @@ -659,9 +665,11 @@ the needed conditional.  For example:
>  
>  	q = ACCESS_ONCE(a);
>  	if (q % MAX) {
> +		barrier();
>  		ACCESS_ONCE(b) = p;
>  		do_something();
>  	} else {
> +		barrier();
>  		ACCESS_ONCE(b) = p;
>  		do_something_else();
>  	}
> @@ -723,6 +731,10 @@ In summary:
>        use smb_rmb(), smp_wmb(), or, in the case of prior stores and
>        later loads, smp_mb().
>  
> +  (*) If both legs of the "if" statement begin with identical stores
> +      to the same variable, a barrier() statement is required at the
> +      beginning of each leg of the "if" statement.
> +
>    (*) Control dependencies require at least one run-time conditional
>        between the prior load and the subsequent store, and this
>        conditional must involve the prior load.  If the compiler
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 0/6] Documentation changes for 3.15
  2014-02-17 21:26 [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Paul E. McKenney
  2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
@ 2014-02-17 21:47 ` Josh Triplett
  1 sibling, 0 replies; 20+ messages in thread
From: Josh Triplett @ 2014-02-17 21:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:26:25PM -0800, Paul E. McKenney wrote:
> Hello!
> 
> This series provides a variety of documentation updates:
> 
> 1.	Update the documentation on actions to take to avoid frenetic
> 	call_rcu() invocations from exhausting memory, including a mention
> 	of internal-to-RCU avoidance measures.
> 
> 2.	Add a note to Documentation/memory-barriers.txt stating that
> 	ACCESS_ONCE() provides cache coherence for accesses to any
> 	single given variable.
> 
> 3.	Add an explicit statement that in control dependencies, the
> 	condition must include the load in question.
> 
> 4.	Add WQ_SYSFS workqueues as a work-offloading option in
> 	Documentation/kernel-per-CPU-kthreads.txt.
> 
> 5.	It turns out that some types of control dependencies require
> 	memory barriers, most notably when the exact same store is
> 	at the beginning of both the then-clause and the else-clause.
> 	Document this.
> 
> 6.	Fixups to RTFP.txt

I've replied to 1, 2, and 5 with comments.  For 3, 4, and 6:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations
  2014-02-17 21:39   ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Josh Triplett
@ 2014-02-17 22:52     ` Paul E. McKenney
  0 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 22:52 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:39:30PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:48PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The call_rcu() family of primitives will take action to accelerate
> > grace periods when the number of callbacks pending on a given CPU
> > becomes excessive.  Although this safety mechanism can be useful,
> > it is no substitute for users of call_rcu() having rate-limit controls
> > in place.  This commit adds this nuance to the documentation.
> > 
> > Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
> > Reported-by: Gleb Natapov <gleb@redhat.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Grammatical nit below; otherwise:
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> 
> >  Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> > index 91266193b8f4..5733e31836b5 100644
> > --- a/Documentation/RCU/checklist.txt
> > +++ b/Documentation/RCU/checklist.txt
> > @@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
> >  		variations on this theme.
> >  
> >  	b.	Limiting update rate.  For example, if updates occur only
> > -		once per hour, then no explicit rate limiting is required,
> > -		unless your system is already badly broken.  The dcache
> > -		subsystem takes this approach -- updates are guarded
> > -		by a global lock, limiting their rate.
> > +		once per hour, then no explicit rate limiting is
> > +		required, unless your system is already badly broken.
> > +		Older versions of the dcache subsystem takes this
> > +		approach -- updates were guarded by a global lock,
> > +		limiting their rate.
> 
> s/takes/take/ to match the change from the singular "The dcache
> subsystem" to the plural "Older versions of the dcache subsystem"
> 
> (You might also change " -- updates are guarded by" to ", guarding
> updates with".)

Took both suggested changes and applied your Reviewed-by.  Thank you!

							Thanx, Paul

> >  	c.	Trusted update -- if updates can only be done manually by
> >  		superuser or some other trusted user, then it might not
> > @@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
> >  		the machine.
> >  
> >  	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
> > -		advantage of call_rcu_bh()'s faster grace periods.
> > +		advantage of call_rcu_bh()'s faster grace periods.  (This
> > +		is only a partial solution, though.)
> >  
> >  	e.	Periodically invoke synchronize_rcu(), permitting a limited
> >  		number of updates per grace period.
> > @@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
> >  	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
> >  	call_srcu(), and kfree_rcu().
> >  
> > +	Note that although these primitives do take action to avoid memory
> > +	exhaustion when any given CPU has too many callbacks, a determined
> > +	user could still exhaust memory.  This is especially the case
> > +	if a system with a large number of CPUs has been configured to
> > +	offload all of its RCU callbacks onto a single CPU, or if the
> > +	system has relatively little free memory.
> > +
> >  9.	All RCU list-traversal primitives, which include
> >  	rcu_dereference(), list_for_each_entry_rcu(), and
> >  	list_for_each_safe_rcu(), must be either within an RCU read-side
> > -- 
> > 1.8.1.5
> > 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence
  2014-02-17 21:40     ` Josh Triplett
@ 2014-02-17 22:52       ` Paul E. McKenney
  0 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 22:52 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:40:56PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:49PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The ACCESS_ONCE() primitive provides cache coherence, but the
> > documentation does not clearly state this.  This commit therefore upgrades
> > the documentation.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Punctuation nit below; otherwise:
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> 
> >  Documentation/memory-barriers.txt | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index 102dc19c4119..ad6db1d48f1f 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
> >  while perfectly safe in single-threaded code, can be fatal in concurrent
> >  code.  Here are some examples of these sorts of optimizations:
> >  
> > + (*) The compiler is within its rights to reorder loads and stores
> > +     to the same variable, and in some cases, the CPU is within its
> > +     rights to reorder loads to the same variable.  This means that
> > +     the following code:
> > +
> > +	a[0] = x;
> > +	a[1] = x;
> > +
> > +     Might result in an older value of x stored in a[1] than in a[0].
> > +     Prevent both the compiler and the CPU from doing this as follows:
> > +
> > +	a[0] = ACCESS_ONCE(x);
> > +	a[1] = ACCESS_ONCE(x);
> > +
> > +     In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> > +     multiple CPUs to a single variable.
> 
> You don't need to "quote" the well-established term "cache coherence".

Good point, fixed and applied your Reviewed-by, thank you!

							Thanx, Paul

> >   (*) The compiler is within its rights to merge successive loads from
> >       the same variable.  Such merging can cause the compiler to "optimize"
> >       the following code:
> > -- 
> > 1.8.1.5
> > 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-17 21:46     ` Josh Triplett
@ 2014-02-17 22:58       ` Paul E. McKenney
  2014-02-18  0:02         ` Josh Triplett
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-17 22:58 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > Current compilers can "speculate" stores in the case where both legs
> > of the "if" statement start with identical stores.  Because the stores
> > are identical, the compiler knows that the store will unconditionally
> > execute regardless of the "if" condition, and so the compiler is within
> > its rights to hoist the store to precede the condition.  Such hoisting
> > destroys the control-dependency ordering.  This ordering can be restored
> > by placing a barrier() at the beginning of each leg of the "if" statement.
> > This commit adds this requirement to the control-dependencies section.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> This is starting to become a rather unreasonable level of fighting the
> compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> doesn't seem like a reasonable thing to require users to do.  Is there
> really no other way to cope with this particular bit of "help" from the
> compiler?

Well, we could use smp_mb() instead of barrier(), but that was the
sort of thing that Peter Zijlstra was trying to avoid.  That said,
I do sympathize completely with your position here -- it is just that
it is better to have our compiler-fights documented that not, right?

							Thanx, Paul

> >  Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> > 
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index f2668c19807e..adfaca831a90 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -608,26 +608,30 @@ as follows:
> >  	b = p;  /* BUG: Compiler can reorder!!! */
> >  	do_something();
> >  
> > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > -the load from variable 'a' and the store to variable 'b':
> > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > +ordering between the load from variable 'a' and the store to variable 'b':
> >  
> >  	q = ACCESS_ONCE(a);
> >  	if (q) {
> > +		barrier();
> >  		ACCESS_ONCE(b) = p;
> >  		do_something();
> >  	} else {
> > +		barrier();
> >  		ACCESS_ONCE(b) = p;
> >  		do_something_else();
> >  	}
> >  
> > -You could also use barrier() to prevent the compiler from moving
> > -the stores to variable 'b', but barrier() would not prevent the
> > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > -is also needed.
> > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > +proving the value of 'a', and the pair of barrier() invocations are
> > +required to prevent the compiler from pulling the two identical stores
> > +to 'b' out from the legs of the "if" statement.
> >  
> >  It is important to note that control dependencies absolutely require a
> >  a conditional.  For example, the following "optimized" version of
> > -the above example breaks ordering:
> > +the above example breaks ordering, which is why the barrier() invocations
> > +are absolutely required if you have identical stores in both legs of
> > +the "if" statement:
> >  
> >  	q = ACCESS_ONCE(a);
> >  	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
> > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> >  for example, as follows:
> >  
> >  	if (ACCESS_ONCE(a) > 0) {
> > +		barrier();
> >  		ACCESS_ONCE(b) = q / 2;
> >  		do_something();
> >  	} else {
> > +		barrier();
> >  		ACCESS_ONCE(b) = q / 3;
> >  		do_something_else();
> >  	}
> > @@ -659,9 +665,11 @@ the needed conditional.  For example:
> >  
> >  	q = ACCESS_ONCE(a);
> >  	if (q % MAX) {
> > +		barrier();
> >  		ACCESS_ONCE(b) = p;
> >  		do_something();
> >  	} else {
> > +		barrier();
> >  		ACCESS_ONCE(b) = p;
> >  		do_something_else();
> >  	}
> > @@ -723,6 +731,10 @@ In summary:
> >        use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> >        later loads, smp_mb().
> >  
> > +  (*) If both legs of the "if" statement begin with identical stores
> > +      to the same variable, a barrier() statement is required at the
> > +      beginning of each leg of the "if" statement.
> > +
> >    (*) Control dependencies require at least one run-time conditional
> >        between the prior load and the subsequent store, and this
> >        conditional must involve the prior load.  If the compiler
> > -- 
> > 1.8.1.5
> > 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-17 22:58       ` Paul E. McKenney
@ 2014-02-18  0:02         ` Josh Triplett
  2014-02-18  0:17           ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-18  0:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > 
> > > Current compilers can "speculate" stores in the case where both legs
> > > of the "if" statement start with identical stores.  Because the stores
> > > are identical, the compiler knows that the store will unconditionally
> > > execute regardless of the "if" condition, and so the compiler is within
> > > its rights to hoist the store to precede the condition.  Such hoisting
> > > destroys the control-dependency ordering.  This ordering can be restored
> > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > This commit adds this requirement to the control-dependencies section.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > This is starting to become a rather unreasonable level of fighting the
> > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > doesn't seem like a reasonable thing to require users to do.  Is there
> > really no other way to cope with this particular bit of "help" from the
> > compiler?
> 
> Well, we could use smp_mb() instead of barrier(), but that was the
> sort of thing that Peter Zijlstra was trying to avoid.

Yeah, that's not an improvement.  The goal would be to make the code no
more complex than it already needs to be with ACCESS_ONCE; changing
"barrier()" to something else doesn't help (quite apart from smp_mb()
being suboptimal).

> That said, I do sympathize completely with your position here -- it is
> just that it is better to have our compiler-fights documented that
> not, right?

Sure, better to document them, but better still to not have them.  Is
there some other way we could avoid this one entirely?

- Josh Triplett

> > >  Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > > index f2668c19807e..adfaca831a90 100644
> > > --- a/Documentation/memory-barriers.txt
> > > +++ b/Documentation/memory-barriers.txt
> > > @@ -608,26 +608,30 @@ as follows:
> > >  	b = p;  /* BUG: Compiler can reorder!!! */
> > >  	do_something();
> > >  
> > > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > > -the load from variable 'a' and the store to variable 'b':
> > > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > > +ordering between the load from variable 'a' and the store to variable 'b':
> > >  
> > >  	q = ACCESS_ONCE(a);
> > >  	if (q) {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = p;
> > >  		do_something();
> > >  	} else {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = p;
> > >  		do_something_else();
> > >  	}
> > >  
> > > -You could also use barrier() to prevent the compiler from moving
> > > -the stores to variable 'b', but barrier() would not prevent the
> > > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > > -is also needed.
> > > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > > +proving the value of 'a', and the pair of barrier() invocations are
> > > +required to prevent the compiler from pulling the two identical stores
> > > +to 'b' out from the legs of the "if" statement.
> > >  
> > >  It is important to note that control dependencies absolutely require a
> > >  a conditional.  For example, the following "optimized" version of
> > > -the above example breaks ordering:
> > > +the above example breaks ordering, which is why the barrier() invocations
> > > +are absolutely required if you have identical stores in both legs of
> > > +the "if" statement:
> > >  
> > >  	q = ACCESS_ONCE(a);
> > >  	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
> > > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> > >  for example, as follows:
> > >  
> > >  	if (ACCESS_ONCE(a) > 0) {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = q / 2;
> > >  		do_something();
> > >  	} else {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = q / 3;
> > >  		do_something_else();
> > >  	}
> > > @@ -659,9 +665,11 @@ the needed conditional.  For example:
> > >  
> > >  	q = ACCESS_ONCE(a);
> > >  	if (q % MAX) {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = p;
> > >  		do_something();
> > >  	} else {
> > > +		barrier();
> > >  		ACCESS_ONCE(b) = p;
> > >  		do_something_else();
> > >  	}
> > > @@ -723,6 +731,10 @@ In summary:
> > >        use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> > >        later loads, smp_mb().
> > >  
> > > +  (*) If both legs of the "if" statement begin with identical stores
> > > +      to the same variable, a barrier() statement is required at the
> > > +      beginning of each leg of the "if" statement.
> > > +
> > >    (*) Control dependencies require at least one run-time conditional
> > >        between the prior load and the subsequent store, and this
> > >        conditional must involve the prior load.  If the compiler
> > > -- 
> > > 1.8.1.5
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-18  0:02         ` Josh Triplett
@ 2014-02-18  0:17           ` Paul E. McKenney
  2014-02-18  0:45             ` Josh Triplett
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-18  0:17 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > Current compilers can "speculate" stores in the case where both legs
> > > > of the "if" statement start with identical stores.  Because the stores
> > > > are identical, the compiler knows that the store will unconditionally
> > > > execute regardless of the "if" condition, and so the compiler is within
> > > > its rights to hoist the store to precede the condition.  Such hoisting
> > > > destroys the control-dependency ordering.  This ordering can be restored
> > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > This commit adds this requirement to the control-dependencies section.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > This is starting to become a rather unreasonable level of fighting the
> > > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > > doesn't seem like a reasonable thing to require users to do.  Is there
> > > really no other way to cope with this particular bit of "help" from the
> > > compiler?
> > 
> > Well, we could use smp_mb() instead of barrier(), but that was the
> > sort of thing that Peter Zijlstra was trying to avoid.
> 
> Yeah, that's not an improvement.  The goal would be to make the code no
> more complex than it already needs to be with ACCESS_ONCE; changing
> "barrier()" to something else doesn't help (quite apart from smp_mb()
> being suboptimal).
> 
> > That said, I do sympathize completely with your position here -- it is
> > just that it is better to have our compiler-fights documented that
> > not, right?
> 
> Sure, better to document them, but better still to not have them.  Is
> there some other way we could avoid this one entirely?

We could try change the standard so as to outlaw pulling common code from
both legs of an "if" statement, but that will be a serious uphill battle.
Or perhaps do something to warn the developer about the possibility of
this happening.

Other thoughts?

							Thanx, Paul

> - Josh Triplett
> 
> > > >  Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> > > >  1 file changed, 19 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > > > index f2668c19807e..adfaca831a90 100644
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -608,26 +608,30 @@ as follows:
> > > >  	b = p;  /* BUG: Compiler can reorder!!! */
> > > >  	do_something();
> > > >  
> > > > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > > > -the load from variable 'a' and the store to variable 'b':
> > > > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > > > +ordering between the load from variable 'a' and the store to variable 'b':
> > > >  
> > > >  	q = ACCESS_ONCE(a);
> > > >  	if (q) {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = p;
> > > >  		do_something();
> > > >  	} else {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = p;
> > > >  		do_something_else();
> > > >  	}
> > > >  
> > > > -You could also use barrier() to prevent the compiler from moving
> > > > -the stores to variable 'b', but barrier() would not prevent the
> > > > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > > > -is also needed.
> > > > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > > > +proving the value of 'a', and the pair of barrier() invocations are
> > > > +required to prevent the compiler from pulling the two identical stores
> > > > +to 'b' out from the legs of the "if" statement.
> > > >  
> > > >  It is important to note that control dependencies absolutely require a
> > > >  a conditional.  For example, the following "optimized" version of
> > > > -the above example breaks ordering:
> > > > +the above example breaks ordering, which is why the barrier() invocations
> > > > +are absolutely required if you have identical stores in both legs of
> > > > +the "if" statement:
> > > >  
> > > >  	q = ACCESS_ONCE(a);
> > > >  	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
> > > > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> > > >  for example, as follows:
> > > >  
> > > >  	if (ACCESS_ONCE(a) > 0) {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = q / 2;
> > > >  		do_something();
> > > >  	} else {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = q / 3;
> > > >  		do_something_else();
> > > >  	}
> > > > @@ -659,9 +665,11 @@ the needed conditional.  For example:
> > > >  
> > > >  	q = ACCESS_ONCE(a);
> > > >  	if (q % MAX) {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = p;
> > > >  		do_something();
> > > >  	} else {
> > > > +		barrier();
> > > >  		ACCESS_ONCE(b) = p;
> > > >  		do_something_else();
> > > >  	}
> > > > @@ -723,6 +731,10 @@ In summary:
> > > >        use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> > > >        later loads, smp_mb().
> > > >  
> > > > +  (*) If both legs of the "if" statement begin with identical stores
> > > > +      to the same variable, a barrier() statement is required at the
> > > > +      beginning of each leg of the "if" statement.
> > > > +
> > > >    (*) Control dependencies require at least one run-time conditional
> > > >        between the prior load and the subsequent store, and this
> > > >        conditional must involve the prior load.  If the compiler
> > > > -- 
> > > > 1.8.1.5
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-18  0:17           ` Paul E. McKenney
@ 2014-02-18  0:45             ` Josh Triplett
  2014-02-18  1:21               ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-18  0:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > of the "if" statement start with identical stores.  Because the stores
> > > > > are identical, the compiler knows that the store will unconditionally
> > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > its rights to hoist the store to precede the condition.  Such hoisting
> > > > > destroys the control-dependency ordering.  This ordering can be restored
> > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > This commit adds this requirement to the control-dependencies section.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > This is starting to become a rather unreasonable level of fighting the
> > > > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > > > doesn't seem like a reasonable thing to require users to do.  Is there
> > > > really no other way to cope with this particular bit of "help" from the
> > > > compiler?
> > > 
> > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > sort of thing that Peter Zijlstra was trying to avoid.
> > 
> > Yeah, that's not an improvement.  The goal would be to make the code no
> > more complex than it already needs to be with ACCESS_ONCE; changing
> > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > being suboptimal).
> > 
> > > That said, I do sympathize completely with your position here -- it is
> > > just that it is better to have our compiler-fights documented that
> > > not, right?
> > 
> > Sure, better to document them, but better still to not have them.  Is
> > there some other way we could avoid this one entirely?
> 
> We could try change the standard so as to outlaw pulling common code from
> both legs of an "if" statement, but that will be a serious uphill battle.

And insufficient given widespread use of existing compilers.

> Or perhaps do something to warn the developer about the possibility of
> this happening.
> 
> Other thoughts?

Might be worth bringing this up with the GCC folks to find out if
there's something obvious we're missing.  (For non-obvious values of
"obvious".)

- Josh Triplett

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-18  0:45             ` Josh Triplett
@ 2014-02-18  1:21               ` Paul E. McKenney
  2014-02-18  3:29                 ` Josh Triplett
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-18  1:21 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > > 
> > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > of the "if" statement start with identical stores.  Because the stores
> > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > its rights to hoist the store to precede the condition.  Such hoisting
> > > > > > destroys the control-dependency ordering.  This ordering can be restored
> > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > > > > doesn't seem like a reasonable thing to require users to do.  Is there
> > > > > really no other way to cope with this particular bit of "help" from the
> > > > > compiler?
> > > > 
> > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > sort of thing that Peter Zijlstra was trying to avoid.
> > > 
> > > Yeah, that's not an improvement.  The goal would be to make the code no
> > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > being suboptimal).
> > > 
> > > > That said, I do sympathize completely with your position here -- it is
> > > > just that it is better to have our compiler-fights documented that
> > > > not, right?
> > > 
> > > Sure, better to document them, but better still to not have them.  Is
> > > there some other way we could avoid this one entirely?
> > 
> > We could try change the standard so as to outlaw pulling common code from
> > both legs of an "if" statement, but that will be a serious uphill battle.
> 
> And insufficient given widespread use of existing compilers.

Fair point...

> > Or perhaps do something to warn the developer about the possibility of
> > this happening.
> > 
> > Other thoughts?
> 
> Might be worth bringing this up with the GCC folks to find out if
> there's something obvious we're missing.  (For non-obvious values of
> "obvious".)

Non-obvious values of "obvious" -- I have no idea what that means, but
it does have a nice counter-intuitive sound to it, doesn't it?  ;-)

This conversation has started, albeit with much more noise and smoke
than signal or light.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-18  1:21               ` Paul E. McKenney
@ 2014-02-18  3:29                 ` Josh Triplett
  2014-02-18  4:57                   ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Josh Triplett @ 2014-02-18  3:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 05:21:37PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > > > 
> > > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > > of the "if" statement start with identical stores.  Because the stores
> > > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > > its rights to hoist the store to precede the condition.  Such hoisting
> > > > > > > destroys the control-dependency ordering.  This ordering can be restored
> > > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > > > 
> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > 
> > > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > > > > > doesn't seem like a reasonable thing to require users to do.  Is there
> > > > > > really no other way to cope with this particular bit of "help" from the
> > > > > > compiler?
> > > > > 
> > > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > > sort of thing that Peter Zijlstra was trying to avoid.
> > > > 
> > > > Yeah, that's not an improvement.  The goal would be to make the code no
> > > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > > being suboptimal).
> > > > 
> > > > > That said, I do sympathize completely with your position here -- it is
> > > > > just that it is better to have our compiler-fights documented that
> > > > > not, right?
> > > > 
> > > > Sure, better to document them, but better still to not have them.  Is
> > > > there some other way we could avoid this one entirely?
> > > 
> > > We could try change the standard so as to outlaw pulling common code from
> > > both legs of an "if" statement, but that will be a serious uphill battle.
> > 
> > And insufficient given widespread use of existing compilers.
> 
> Fair point...
> 
> > > Or perhaps do something to warn the developer about the possibility of
> > > this happening.
> > > 
> > > Other thoughts?
> > 
> > Might be worth bringing this up with the GCC folks to find out if
> > there's something obvious we're missing.  (For non-obvious values of
> > "obvious".)
> 
> Non-obvious values of "obvious" -- I have no idea what that means, but
> it does have a nice counter-intuitive sound to it, doesn't it?  ;-)

I'm hoping for something that we'll consider obvious in hindsight.

> This conversation has started, albeit with much more noise and smoke
> than signal or light.

Whereabouts is that conversation taking place?

- Josh Triplett

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies
  2014-02-18  3:29                 ` Josh Triplett
@ 2014-02-18  4:57                   ` Paul E. McKenney
  0 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2014-02-18  4:57 UTC (permalink / raw)
  To: Josh Triplett
  Cc: linux-kernel, mingo, laijs, dipankar, akpm, mathieu.desnoyers,
	niv, tglx, peterz, rostedt, dhowells, edumazet, darren, fweisbec,
	oleg, sbw

On Mon, Feb 17, 2014 at 07:29:55PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 05:21:37PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > > > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > > > > > > > 
> > > > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > > > of the "if" statement start with identical stores.  Because the stores
> > > > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > > > its rights to hoist the store to precede the condition.  Such hoisting
> > > > > > > > destroys the control-dependency ordering.  This ordering can be restored
> > > > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > > > > 
> > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > > > 
> > > > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > > > compiler.  ("Threads cannot be implemented as a library" indeed.)  This
> > > > > > > doesn't seem like a reasonable thing to require users to do.  Is there
> > > > > > > really no other way to cope with this particular bit of "help" from the
> > > > > > > compiler?
> > > > > > 
> > > > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > > > sort of thing that Peter Zijlstra was trying to avoid.
> > > > > 
> > > > > Yeah, that's not an improvement.  The goal would be to make the code no
> > > > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > > > being suboptimal).
> > > > > 
> > > > > > That said, I do sympathize completely with your position here -- it is
> > > > > > just that it is better to have our compiler-fights documented that
> > > > > > not, right?
> > > > > 
> > > > > Sure, better to document them, but better still to not have them.  Is
> > > > > there some other way we could avoid this one entirely?
> > > > 
> > > > We could try change the standard so as to outlaw pulling common code from
> > > > both legs of an "if" statement, but that will be a serious uphill battle.
> > > 
> > > And insufficient given widespread use of existing compilers.
> > 
> > Fair point...
> > 
> > > > Or perhaps do something to warn the developer about the possibility of
> > > > this happening.
> > > > 
> > > > Other thoughts?
> > > 
> > > Might be worth bringing this up with the GCC folks to find out if
> > > there's something obvious we're missing.  (For non-obvious values of
> > > "obvious".)
> > 
> > Non-obvious values of "obvious" -- I have no idea what that means, but
> > it does have a nice counter-intuitive sound to it, doesn't it?  ;-)
> 
> I'm hoping for something that we'll consider obvious in hindsight.
> 
> > This conversation has started, albeit with much more noise and smoke
> > than signal or light.
> 
> Whereabouts is that conversation taking place?

Here you go: https://lkml.org/lkml/2014/2/6/193

Not too bad, only about 80 messages thus far.  Spirited at times, though.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-02-18  4:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-17 21:26 [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Paul E. McKenney
2014-02-17 21:26 ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Paul E. McKenney
2014-02-17 21:26   ` [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence Paul E. McKenney
2014-02-17 21:40     ` Josh Triplett
2014-02-17 22:52       ` Paul E. McKenney
2014-02-17 21:26   ` [PATCH tip/core/rcu 3/6] Documentation/memory-barriers.txt: Conditional must use prior load Paul E. McKenney
2014-02-17 21:26   ` [PATCH tip/core/rcu 4/6] Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity Paul E. McKenney
2014-02-17 21:26   ` [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies Paul E. McKenney
2014-02-17 21:46     ` Josh Triplett
2014-02-17 22:58       ` Paul E. McKenney
2014-02-18  0:02         ` Josh Triplett
2014-02-18  0:17           ` Paul E. McKenney
2014-02-18  0:45             ` Josh Triplett
2014-02-18  1:21               ` Paul E. McKenney
2014-02-18  3:29                 ` Josh Triplett
2014-02-18  4:57                   ` Paul E. McKenney
2014-02-17 21:26   ` [PATCH tip/core/rcu 6/6] documentation: Fix some inconsistencies in RTFP.txt Paul E. McKenney
2014-02-17 21:39   ` [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations Josh Triplett
2014-02-17 22:52     ` Paul E. McKenney
2014-02-17 21:47 ` [PATCH tip/core/rcu 0/6] Documentation changes for 3.15 Josh Triplett

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.