* [PATCH 2/6] Documentation/RCU: s/not/note/ in checklist.rst
2023-03-15 19:38 [PATCH 1/6] srcu: Add comments for srcu_size_state Joel Fernandes (Google)
@ 2023-03-15 19:38 ` Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 3/6] srcu: Clarify comments on memory barrier "E" Joel Fernandes (Google)
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes (Google) @ 2023-03-15 19:38 UTC (permalink / raw)
To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
Jonathan Corbet
Cc: Qiuxu Zhuo, rcu, linux-doc
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
"Please not that you *cannot* rely..." has a typo.
Fix it.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
Documentation/RCU/checklist.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst
index cc361fb01ed4..bd3c58c44bef 100644
--- a/Documentation/RCU/checklist.rst
+++ b/Documentation/RCU/checklist.rst
@@ -70,7 +70,7 @@ over a rather long period of time, but improvements are always welcome!
can serve as rcu_read_lock_sched(), but is less readable and
prevents lockdep from detecting locking issues.
- Please not that you *cannot* rely on code known to be built
+ Please note that you *cannot* rely on code known to be built
only in non-preemptible kernels. Such code can and will break,
especially in kernels built with CONFIG_PREEMPT_COUNT=y.
--
2.40.0.rc1.284.g88254d51c5-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/6] srcu: Clarify comments on memory barrier "E"
2023-03-15 19:38 [PATCH 1/6] srcu: Add comments for srcu_size_state Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 2/6] Documentation/RCU: s/not/note/ in checklist.rst Joel Fernandes (Google)
@ 2023-03-15 19:38 ` Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 4/6] Documentation: RCU: Correct spelling Joel Fernandes (Google)
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes (Google) @ 2023-03-15 19:38 UTC (permalink / raw)
To: linux-kernel, Lai Jiangshan, Paul E. McKenney, Josh Triplett,
Steven Rostedt, Mathieu Desnoyers
Cc: Joel Fernandes (Google), Frederic Weisbecker, Boqun Feng, rcu
There is an smp_mb() named "E" in srcu_flip() immediately before the
increment (flip) of the srcu_struct structure's ->srcu_idx.
The purpose of E is to order the preceding scan's read of lock counters
against the flipping of the ->srcu_idx, in order to prevent new readers
from continuing to use the old ->srcu_idx value, which might needlessly
extend the grace period.
However, this ordering is already enforced because of the control
dependency between the preceding scan and the ->srcu_idx flip.
This control dependency exists because atomic_long_read() is used
to scan the counts, because WRITE_ONCE() is used to flip ->srcu_idx,
and because ->srcu_idx is not flipped until the ->srcu_lock_count[] and
->srcu_unlock_count[] counts match. And such a match cannot happen when
there is an in-flight reader that started before the flip (observation
courtesy Mathieu Desnoyers).
The litmus test below (courtesy of Frederic Weisbecker, with changes
for ctrldep by Boqun and Joel) shows this:
C srcu
(*
* bad condition: P0's first scan (SCAN1) saw P1's idx=0 LOCK count inc, though P1 saw flip.
*
* So basically, the ->po ordering on both P0 and P1 is enforced via ->ppo
* (control deps) on both sides, and both P0 and P1 are interconnected by ->rf
* relations. Combining the ->ppo with ->rf, a cycle is impossible.
*)
{}
// updater
P0(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
{
int lock1;
int unlock1;
int lock0;
int unlock0;
// SCAN1
unlock1 = READ_ONCE(*UNLOCK1);
smp_mb(); // A
lock1 = READ_ONCE(*LOCK1);
// FLIP
if (lock1 == unlock1) { // Control dep
smp_mb(); // E // Remove E and still passes.
WRITE_ONCE(*IDX, 1);
smp_mb(); // D
// SCAN2
unlock0 = READ_ONCE(*UNLOCK0);
smp_mb(); // A
lock0 = READ_ONCE(*LOCK0);
}
}
// reader
P1(int *IDX, int *LOCK0, int *UNLOCK0, int *LOCK1, int *UNLOCK1)
{
int tmp;
int idx1;
int idx2;
// 1st reader
idx1 = READ_ONCE(*IDX);
if (idx1 == 0) { // Control dep
tmp = READ_ONCE(*LOCK0);
WRITE_ONCE(*LOCK0, tmp + 1);
smp_mb(); /* B and C */
tmp = READ_ONCE(*UNLOCK0);
WRITE_ONCE(*UNLOCK0, tmp + 1);
} else {
tmp = READ_ONCE(*LOCK1);
WRITE_ONCE(*LOCK1, tmp + 1);
smp_mb(); /* B and C */
tmp = READ_ONCE(*UNLOCK1);
WRITE_ONCE(*UNLOCK1, tmp + 1);
}
}
exists (0:lock1=1 /\ 1:idx1=1)
More complicated litmus tests with multiple SRCU readers also show that
memory barrier E is not needed.
This commit therefore clarifies the comment on memory barrier E.
Why not also remove that redundant smp_mb()?
Because control dependencies are quite fragile due to their not being
recognized by most compilers and tools. Control dependencies therefore
exact an ongoing maintenance burden, and such a burden cannot be justified
in this slowpath. Therefore, that smp_mb() stays until such time as
its overhead becomes a measurable problem in a real workload running on
a real production system, or until such time as compilers start paying
attention to this sort of control dependency.
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/rcu/srcutree.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ab4ee58af84b..68f89c533057 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1085,16 +1085,36 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
static void srcu_flip(struct srcu_struct *ssp)
{
/*
- * Ensure that if this updater saw a given reader's increment
- * from __srcu_read_lock(), that reader was using an old value
- * of ->srcu_idx. Also ensure that if a given reader sees the
- * new value of ->srcu_idx, this updater's earlier scans cannot
- * have seen that reader's increments (which is OK, because this
- * grace period need not wait on that reader).
+ * Because the flip of ->srcu_idx is executed only if the
+ * preceding call to srcu_readers_active_idx_check() found that
+ * the ->srcu_unlock_count[] and ->srcu_lock_count[] sums matched
+ * and because that summing uses atomic_long_read(), there is
+ * ordering due to a control dependency between that summing and
+ * the WRITE_ONCE() in this call to srcu_flip(). This ordering
+ * ensures that if this updater saw a given reader's increment from
+ * __srcu_read_lock(), that reader was using a value of ->srcu_idx
+ * from before the previous call to srcu_flip(), which should be
+ * quite rare. This ordering thus helps forward progress because
+ * the grace period could otherwise be delayed by additional
+ * calls to __srcu_read_lock() using that old (soon to be new)
+ * value of ->srcu_idx.
+ *
+ * This sum-equality check and ordering also ensures that if
+ * a given call to __srcu_read_lock() uses the new value of
+ * ->srcu_idx, this updater's earlier scans cannot have seen
+ * that reader's increments, which is all to the good, because
+ * this grace period need not wait on that reader. After all,
+ * if those earlier scans had seen that reader, there would have
+ * been a sum mismatch and this code would not be reached.
+ *
+ * This means that the following smp_mb() is redundant, but
+ * it stays until either (1) Compilers learn about this sort of
+ * control dependency or (2) Some production workload running on
+ * a production system is unduly delayed by this slowpath smp_mb().
*/
smp_mb(); /* E */ /* Pairs with B and C. */
- WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
+ WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); // Flip the counter.
/*
* Ensure that if the updater misses an __srcu_read_unlock()
--
2.40.0.rc1.284.g88254d51c5-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 4/6] Documentation: RCU: Correct spelling
2023-03-15 19:38 [PATCH 1/6] srcu: Add comments for srcu_size_state Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 2/6] Documentation/RCU: s/not/note/ in checklist.rst Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 3/6] srcu: Clarify comments on memory barrier "E" Joel Fernandes (Google)
@ 2023-03-15 19:38 ` Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 5/6] doc: Update whatisRCU.rst Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 6/6] rcu: Add comment to rcu_do_batch() identifying rcuoc code path Joel Fernandes (Google)
4 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes (Google) @ 2023-03-15 19:38 UTC (permalink / raw)
To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
Jonathan Corbet, Davidlohr Bueso
Cc: Randy Dunlap, linux-doc, rcu
From: Randy Dunlap <rdunlap@infradead.org>
Correct spelling problems for Documentation/RCU/ as reported
by codespell.
Note: in RTFP.txt, there are other misspellings that are left as is
since they were used that way in email Subject: lines or in LWN.net
articles. [preemptable, Preemptable, synchonisation]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: rcu@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
.../Expedited-Grace-Periods.rst | 6 +++---
.../Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 2 +-
Documentation/RCU/RTFP.txt | 10 +++++-----
Documentation/RCU/UP.rst | 4 ++--
Documentation/RCU/lockdep.rst | 2 +-
Documentation/RCU/torture.rst | 4 ++--
6 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
index c9c957c85bac..93d899d53258 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
@@ -277,7 +277,7 @@ the following access functions:
Again, only one request in a given batch need actually carry out a
grace-period operation, which means there must be an efficient way to
-identify which of many concurrent reqeusts will initiate the grace
+identify which of many concurrent requests will initiate the grace
period, and that there be an efficient way for the remaining requests to
wait for that grace period to complete. However, that is the topic of
the next section.
@@ -405,7 +405,7 @@ Use of Workqueues
In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
-tasks, so more recent implemementations use the Linux kernel's
+tasks, so more recent implementations use the Linux kernel's
workqueues (see Documentation/core-api/workqueue.rst).
The requesting task still does counter snapshotting and funnel-lock
@@ -465,7 +465,7 @@ corresponding disadvantage that workqueues cannot be used until they are
initialized, which does not happen until some time after the scheduler
spawns the first task. Given that there are parts of the kernel that
really do want to execute grace periods during this mid-boot “dead
-zone”, expedited grace periods must do something else during thie time.
+zone”, expedited grace periods must do something else during this time.
What they do is to fall back to the old practice of requiring that the
requesting task drive the expedited grace period, as was the case before
diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 7fdf151a8680..5750f125361b 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -168,7 +168,7 @@ an ``atomic_add_return()`` of zero) to detect idle CPUs.
+-----------------------------------------------------------------------+
The approach must be extended to handle one final case, that of waking a
-task blocked in ``synchronize_rcu()``. This task might be affinitied to
+task blocked in ``synchronize_rcu()``. This task might be affined to
a CPU that is not yet aware that the grace period has ended, and thus
might not yet be subject to the grace period's memory ordering.
Therefore, there is an ``smp_mb()`` after the return from
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 588d97366a46..db8f16b392aa 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -201,7 +201,7 @@ work looked at debugging uses of RCU [Seyster:2011:RFA:2075416.2075425].
In 2012, Josh Triplett received his Ph.D. with his dissertation
covering RCU-protected resizable hash tables and the relationship
between memory barriers and read-side traversal order: If the updater
-is making changes in the opposite direction from the read-side traveral
+is making changes in the opposite direction from the read-side traversal
order, the updater need only execute a memory-barrier instruction,
but if in the same direction, the updater needs to wait for a grace
period between the individual updates [JoshTriplettPhD]. Also in 2012,
@@ -1245,7 +1245,7 @@ Oregon Health and Sciences University"
[Viewed September 5, 2005]"
,annotation={
First posting showing how RCU can be safely adapted for
- preemptable RCU read side critical sections.
+ preemptible RCU read side critical sections.
}
}
@@ -1888,7 +1888,7 @@ Revised:
\url{https://lore.kernel.org/r/20070910183004.GA3299@linux.vnet.ibm.com}
[Viewed October 25, 2007]"
,annotation={
- Final patch for preemptable RCU to -rt. (Later patches were
+ Final patch for preemptible RCU to -rt. (Later patches were
to mainline, eventually incorporated.)
}
}
@@ -2275,7 +2275,7 @@ lot of {Linux} into your technology!!!"
\url{https://lore.kernel.org/r/20090724001429.GA17374@linux.vnet.ibm.com}
[Viewed August 15, 2009]"
,annotation={
- First posting of simple and fast preemptable RCU.
+ First posting of simple and fast preemptible RCU.
}
}
@@ -2639,7 +2639,7 @@ lot of {Linux} into your technology!!!"
RCU-protected hash tables, barriers vs. read-side traversal order.
.
If the updater is making changes in the opposite direction from
- the read-side traveral order, the updater need only execute a
+ the read-side traversal order, the updater need only execute a
memory-barrier instruction, but if in the same direction, the
updater needs to wait for a grace period between the individual
updates.
diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
index 8b20fd45f255..4060d7a2f62a 100644
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@@ -107,7 +107,7 @@ UP systems, including PREEMPT SMP builds running on UP systems.
Quick Quiz #3:
Why can't synchronize_rcu() return immediately on UP systems running
- preemptable RCU?
+ preemptible RCU?
.. _answer_quick_quiz_up:
@@ -143,7 +143,7 @@ Answer to Quick Quiz #2:
Answer to Quick Quiz #3:
Why can't synchronize_rcu() return immediately on UP systems
- running preemptable RCU?
+ running preemptible RCU?
Because some other task might have been preempted in the middle
of an RCU read-side critical section. If synchronize_rcu()
diff --git a/Documentation/RCU/lockdep.rst b/Documentation/RCU/lockdep.rst
index 2749f43ec1b0..69e73a39bd11 100644
--- a/Documentation/RCU/lockdep.rst
+++ b/Documentation/RCU/lockdep.rst
@@ -65,7 +65,7 @@ checking of rcu_dereference() primitives:
rcu_access_pointer(p):
Return the value of the pointer and omit all barriers,
but retain the compiler constraints that prevent duplicating
- or coalescsing. This is useful when testing the
+ or coalescing. This is useful when testing the
value of the pointer itself, for example, against NULL.
The rcu_dereference_check() check expression can be any boolean
diff --git a/Documentation/RCU/torture.rst b/Documentation/RCU/torture.rst
index 0316ba0c6922..b3b6dfa85248 100644
--- a/Documentation/RCU/torture.rst
+++ b/Documentation/RCU/torture.rst
@@ -216,7 +216,7 @@ Kernel boot arguments can also be supplied, for example, to control
rcutorture's module parameters. For example, to test a change to RCU's
CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
This will of course result in the scripting reporting a failure, namely
-the resuling RCU CPU stall warning. As noted above, reducing memory may
+the resulting RCU CPU stall warning. As noted above, reducing memory may
require disabling rcutorture's callback-flooding tests::
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
@@ -370,5 +370,5 @@ You can also re-run a previous remote run in a manner similar to kvm.sh:
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
--duration 24h
-In this case, most of the kvm-again.sh parmeters may be supplied following
+In this case, most of the kvm-again.sh parameters may be supplied following
the pathname of the old run-results directory.
--
2.40.0.rc1.284.g88254d51c5-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 5/6] doc: Update whatisRCU.rst
2023-03-15 19:38 [PATCH 1/6] srcu: Add comments for srcu_size_state Joel Fernandes (Google)
` (2 preceding siblings ...)
2023-03-15 19:38 ` [PATCH 4/6] Documentation: RCU: Correct spelling Joel Fernandes (Google)
@ 2023-03-15 19:38 ` Joel Fernandes (Google)
2023-03-15 19:38 ` [PATCH 6/6] rcu: Add comment to rcu_do_batch() identifying rcuoc code path Joel Fernandes (Google)
4 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes (Google) @ 2023-03-15 19:38 UTC (permalink / raw)
To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
Jonathan Corbet
Cc: Uladzislau Rezki (Sony), rcu, linux-doc
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
The kfree_rcu() macro is deprecated. Rename it to its new
kfree_rcu_mightsleep() name in this documentation.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
Documentation/RCU/whatisRCU.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 2c5563a91998..8eddef28d3a1 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -597,10 +597,10 @@ to avoid having to write your own callback::
If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo.
- kfree_rcu(old_fp);
+ kfree_rcu_mightsleep(old_fp);
-This variant of kfree_rcu() almost never blocks, but might do so by
-invoking synchronize_rcu() in response to memory-allocation failure.
+This variant almost never blocks, but might do so by invoking
+synchronize_rcu() in response to memory-allocation failure.
Again, see checklist.rst for additional rules governing the use of RCU.
--
2.40.0.rc1.284.g88254d51c5-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 6/6] rcu: Add comment to rcu_do_batch() identifying rcuoc code path
2023-03-15 19:38 [PATCH 1/6] srcu: Add comments for srcu_size_state Joel Fernandes (Google)
` (3 preceding siblings ...)
2023-03-15 19:38 ` [PATCH 5/6] doc: Update whatisRCU.rst Joel Fernandes (Google)
@ 2023-03-15 19:38 ` Joel Fernandes (Google)
4 siblings, 0 replies; 6+ messages in thread
From: Joel Fernandes (Google) @ 2023-03-15 19:38 UTC (permalink / raw)
To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Josh Triplett, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes
Cc: rcu
From: "Paul E. McKenney" <paulmck@kernel.org>
This commit adds a comment to help explain why the "else" clause of the
in_serving_softirq() "if" statement does not need to enforce a time limit.
The reason is that this "else" clause handles rcuoc kthreads that do not
block handlers for other softirq vectors.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/rcu/tree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8e880c09ab59..06cc6a6ad819 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2131,6 +2131,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
break;
}
} else {
+ // In rcuoc context, so no worries about depriving
+ // other softirq vectors of CPU cycles.
local_bh_enable();
lockdep_assert_irqs_enabled();
cond_resched_tasks_rcu_qs();
--
2.40.0.rc1.284.g88254d51c5-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread