* [PATCH v9 0/7] Add support for length of each segment in the segcblist
@ 2020-11-03 14:25 Joel Fernandes (Google)
  2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
                   ` (6 more replies)
  0 siblings, 7 replies; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

This series is required for several identified use cases, one of them being
tracing how the segmented callback list changes. Tracing this has identified
issues in RCU code in the past.

From Paul:
Another use case is of course more accurately determining whether a given CPU's
large pile of callbacks can be best served by making grace periods go faster,
invoking callbacks more vigorously, or both.  It should also be possible to
simplify some of the callback handling a bit, given that some of the unnatural
acts are due to there having been no per-batch counts.

Revision history:
v9: Fix SRCU issues, other minor style changes (Frederic). Added Frederic's
Reviewed-by to all but the last patch.

v8: Small style changes; made seglen non-atomic since it is always updated
    under lock (Frederic).

v7: Cleaned up memory barriers (thanks fweisbec@ for reviewing), made minor
corrections per Neeraj (thanks).

v6: Fixed TREE04, and restored older logic to ensure rcu_barrier works.

v5: Various changes, bug fixes. Discovery of rcu_barrier issue.

v4: Restructured rcu_do_batch() and segcblist merging to avoid issues.
    Fixed minor nit from Davidlohr.
v1->v3: minor nits.
(https://lore.kernel.org/lkml/20200719034210.2382053-1-joel@xxxxxxxxxxxxxxxxx/)

Joel Fernandes (Google) (7):
rcu/tree: Make rcu_do_batch count how many callbacks were executed
rcu/segcblist: Add counters to segcblist datastructure
srcu: Fix invoke_rcu_callbacks() segcb length adjustment
rcu/trace: Add tracing for how segcb list changes
rcu/segcblist: Remove useless rcupdate.h include
rcu/tree: segcblist: Remove redundant smp_mb()s
rcu/segcblist: Add additional comments to explain smp_mb()

include/linux/rcu_segcblist.h |   1 +
include/trace/events/rcu.h    |  25 +++++
kernel/rcu/rcu_segcblist.c    | 198 +++++++++++++++++++++++++---------
kernel/rcu/rcu_segcblist.h    |   8 +-
kernel/rcu/srcutree.c         |   5 +-
kernel/rcu/tree.c             |  21 ++--
6 files changed, 199 insertions(+), 59 deletions(-)

--
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
@ 2020-11-03 14:25 ` Joel Fernandes (Google)
  2020-11-04  0:05   ` Paul E. McKenney
  2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, Paul E. McKenney,
	rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

Currently, rcu_do_batch() depends on the unsegmented callback list's len field
to know how many CBs were executed. This field counts down from 0 as CBs are
dequeued. It is possible that not all CBs are run because of reaching limits,
in which case the remaining unexecuted callbacks are requeued in the CPU's
segcblist.

The number of callbacks that were run is then the negation of the count stored
in rcl->len, which has been counting down on every dequeue. This negative
count is then added to the per-CPU segmented callback list's ->len field to
correct its count.

Such a design works against future efforts to track the length of each segment
of the segmented callback list, because rcu_segcblist_extract_done_cbs() will
be populating the unsegmented callback list's length field (rcl->len) during
extraction.

Also, the design of counting down from 0 is confusing and error-prone IMHO.

This commit therefore explicitly counts how many callbacks were executed in
rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
field, without relying on the negativity of rcl->len.
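
For illustration, a minimal sketch of the invocation loop before and after
this change (heavily simplified from rcu_do_batch(), with the batch and
time-limit checks elided):

        /* Before: rcl.len starts at 0 and rcu_cblist_dequeue() decrements
         * it, so it ends up holding the negation of the invoked count. */
        while ((rhp = rcu_cblist_dequeue(&rcl)))
                rhp->func(rhp);
        rcu_segcblist_add_len(&rdp->cblist, rcl.len); /* rcl.len <= 0 */

        /* After: count invoked callbacks explicitly, counting up. */
        count = 0;
        while ((rhp = rcu_cblist_dequeue(&rcl))) {
                count++;
                rhp->func(rhp);
        }
        rcu_segcblist_add_len(&rdp->cblist, -count);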

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c |  2 +-
 kernel/rcu/rcu_segcblist.h |  1 +
 kernel/rcu/tree.c          | 11 +++++------
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2d2a6b6b9dfb..bb246d8c6ef1 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
  * This increase is fully ordered with respect to the callers accesses
  * both before and after.
  */
-static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
 {
 #ifdef CONFIG_RCU_NOCB_CPU
 	smp_mb__before_atomic(); /* Up to the caller! */
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 492262bcb591..1d2d61406463 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
 }
 
 void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 286dc0a1b184..24c00020ab83 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2429,7 +2429,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-	long bl, count;
+	long bl, count = 0;
 	long pending, tlimit = 0;
 
 	/* If no callbacks are ready, just return. */
@@ -2474,6 +2474,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
+		count++;
 		debug_rcu_head_unqueue(rhp);
 
 		rcu_lock_acquire(&rcu_callback_map);
@@ -2487,15 +2488,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 		/*
 		 * Stop only if limit reached and CPU has something to do.
-		 * Note: The rcl structure counts down from zero.
 		 */
-		if (-rcl.len >= bl && !offloaded &&
+		if (count >= bl && !offloaded &&
 		    (need_resched() ||
 		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
 			break;
 		if (unlikely(tlimit)) {
 			/* only call local_clock() every 32 callbacks */
-			if (likely((-rcl.len & 31) || local_clock() < tlimit))
+			if (likely((count & 31) || local_clock() < tlimit))
 				continue;
 			/* Exceeded the time limit, so leave. */
 			break;
@@ -2512,7 +2512,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 	local_irq_save(flags);
 	rcu_nocb_lock(rdp);
-	count = -rcl.len;
 	rdp->n_cbs_invoked += count;
 	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
 			    is_idle_task(current), rcu_is_callbacks_kthread());
@@ -2520,7 +2519,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Update counts and requeue any remaining callbacks. */
 	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
 	smp_mb(); /* List handling before counting for rcu_barrier(). */
-	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
+	rcu_segcblist_add_len(&rdp->cblist, -count);
 
 	/* Reinstate batch limit if we have worked down the excess. */
 	count = rcu_segcblist_n_cbs(&rdp->cblist);
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
  2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
@ 2020-11-03 14:25 ` Joel Fernandes (Google)
  2020-11-04 13:37   ` Neeraj Upadhyay
  2020-11-04 17:01   ` Paul E. McKenney
  2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

Add counting of segment lengths to the segmented callback list.

This will be useful for a number of things, such as knowing how big the
ready-to-execute segment has gotten. The immediate benefit is the ability
to trace how the callbacks in the segmented callback list change.

This patch also removes hacks that used the donecbs list's ->len field as a
temporary variable to save the segmented callback list's length. That can no
longer be done, and is no longer needed.
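
With the per-segment counts in place, the segment lengths and the overall
->len field agree at quiescent points (for example, just after acquiring the
relevant lock, with no enqueue in flight). A hypothetical debug-only check
illustrating that invariant (not part of this patch) could look like:

        /* Hypothetical debug aid, for illustration only. */
        static void rcu_segcblist_check_counts(struct rcu_segcblist *rsclp)
        {
                long sum = 0;
                int i;

                for (i = 0; i < RCU_CBLIST_NSEGS; i++)
                        sum += READ_ONCE(rsclp->seglen[i]);
                WARN_ON_ONCE(sum != rcu_segcblist_n_cbs(rsclp));
        }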

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_segcblist.h |   1 +
 kernel/rcu/rcu_segcblist.c    | 120 ++++++++++++++++++++++------------
 kernel/rcu/rcu_segcblist.h    |   2 -
 3 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index b36afe7b22c9..6c01f09a6456 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -72,6 +72,7 @@ struct rcu_segcblist {
 #else
 	long len;
 #endif
+	long seglen[RCU_CBLIST_NSEGS];
 	u8 enabled;
 	u8 offloaded;
 };
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index bb246d8c6ef1..357c19bbcb00 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -7,10 +7,11 @@
  * Authors: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
-#include <linux/types.h>
-#include <linux/kernel.h>
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
+#include <linux/kernel.h>
 #include <linux/rcupdate.h>
+#include <linux/types.h>
 
 #include "rcu_segcblist.h"
 
@@ -88,6 +89,46 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
 #endif
 }
 
+/* Get the length of a segment of the rcu_segcblist structure. */
+static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+	return READ_ONCE(rsclp->seglen[seg]);
+}
+
+/* Set the length of a segment of the rcu_segcblist structure. */
+static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long v)
+{
+	WRITE_ONCE(rsclp->seglen[seg], v);
+}
+
+/* Increase the numeric length of a segment by a specified amount. */
+static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
+{
+	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
+}
+
+/* Move from's segment length to to's segment. */
+static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
+{
+	long len;
+
+	if (from == to)
+		return;
+
+	len = rcu_segcblist_get_seglen(rsclp, from);
+	if (!len)
+		return;
+
+	rcu_segcblist_add_seglen(rsclp, to, len);
+	rcu_segcblist_set_seglen(rsclp, from, 0);
+}
+
+/* Increment segment's length. */
+static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+	rcu_segcblist_add_seglen(rsclp, seg, 1);
+}
+
 /*
  * Increase the numeric length of an rcu_segcblist structure by the
  * specified amount, which can be negative.  This can cause the ->len
@@ -119,26 +160,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
 	rcu_segcblist_add_len(rsclp, 1);
 }
 
-/*
- * Exchange the numeric length of the specified rcu_segcblist structure
- * with the specified value.  This can cause the ->len field to disagree
- * with the actual number of callbacks on the structure.  This exchange is
- * fully ordered with respect to the callers accesses both before and after.
- */
-static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
-{
-#ifdef CONFIG_RCU_NOCB_CPU
-	return atomic_long_xchg(&rsclp->len, v);
-#else
-	long ret = rsclp->len;
-
-	smp_mb(); /* Up to the caller! */
-	WRITE_ONCE(rsclp->len, v);
-	smp_mb(); /* Up to the caller! */
-	return ret;
-#endif
-}
-
 /*
  * Initialize an rcu_segcblist structure.
  */
@@ -149,8 +170,10 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
 	BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
 	BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
 	rsclp->head = NULL;
-	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
 		rsclp->tails[i] = &rsclp->head;
+		rcu_segcblist_set_seglen(rsclp, i, 0);
+	}
 	rcu_segcblist_set_len(rsclp, 0);
 	rsclp->enabled = 1;
 }
@@ -246,6 +269,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 {
 	rcu_segcblist_inc_len(rsclp);
 	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
+	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
 	rhp->next = NULL;
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
 	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], &rhp->next);
@@ -274,27 +298,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
 		if (rsclp->tails[i] != rsclp->tails[i - 1])
 			break;
+	rcu_segcblist_inc_seglen(rsclp, i);
 	WRITE_ONCE(*rsclp->tails[i], rhp);
 	for (; i <= RCU_NEXT_TAIL; i++)
 		WRITE_ONCE(rsclp->tails[i], &rhp->next);
 	return true;
 }
 
-/*
- * Extract only the counts from the specified rcu_segcblist structure,
- * and place them in the specified rcu_cblist structure.  This function
- * supports both callback orphaning and invocation, hence the separation
- * of counts and callbacks.  (Callbacks ready for invocation must be
- * orphaned and adopted separately from pending callbacks, but counts
- * apply to all callbacks.  Locking must be used to make sure that
- * both orphaned-callbacks lists are consistent.)
- */
-void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
-					       struct rcu_cblist *rclp)
-{
-	rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
-}
-
 /*
  * Extract only those callbacks ready to be invoked from the specified
  * rcu_segcblist structure and place them in the specified rcu_cblist
@@ -307,6 +317,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rcu_segcblist_ready_cbs(rsclp))
 		return; /* Nothing to do. */
+	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
 	*rclp->tail = rsclp->head;
 	WRITE_ONCE(rsclp->head, *rsclp->tails[RCU_DONE_TAIL]);
 	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
@@ -314,6 +325,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
 		if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL])
 			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
+	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
 }
 
 /*
@@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rcu_segcblist_pend_cbs(rsclp))
 		return; /* Nothing to do. */
+	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
+		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
+		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
 	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
 	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
 	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
-	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++)
+	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
 		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
+		rcu_segcblist_set_seglen(rsclp, i, 0);
+	}
 }
 
 /*
@@ -345,7 +362,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
 				struct rcu_cblist *rclp)
 {
 	rcu_segcblist_add_len(rsclp, rclp->len);
-	rclp->len = 0;
 }
 
 /*
@@ -359,6 +375,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rclp->head)
 		return; /* No callbacks to move. */
+	rcu_segcblist_add_seglen(rsclp, RCU_DONE_TAIL, rclp->len);
 	*rclp->tail = rsclp->head;
 	WRITE_ONCE(rsclp->head, rclp->head);
 	for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++)
@@ -379,6 +396,8 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
 {
 	if (!rclp->head)
 		return; /* Nothing to do. */
+
+	rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len);
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
 	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
 }
@@ -403,6 +422,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
 			break;
 		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
+		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
 	}
 
 	/* If no callbacks moved, nothing more need be done. */
@@ -423,6 +443,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
 			break;  /* No more callbacks. */
 		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
+		rcu_segcblist_move_seglen(rsclp, i, j);
 		rsclp->gp_seq[j] = rsclp->gp_seq[i];
 	}
 }
@@ -444,7 +465,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
  */
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 {
-	int i;
+	int i, j;
 
 	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
 	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
@@ -487,6 +508,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
 		return false;
 
+	/* Accounting: everything below i is about to get merged into i. */
+	for (j = i + 1; j <= RCU_NEXT_TAIL; j++)
+		rcu_segcblist_move_seglen(rsclp, j, i);
+
 	/*
 	 * Merge all later callbacks, including newly arrived callbacks,
 	 * into the segment located by the for-loop above.  Assign "seq"
@@ -514,13 +539,24 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 	struct rcu_cblist donecbs;
 	struct rcu_cblist pendcbs;
 
+	lockdep_assert_cpus_held();
+
 	rcu_cblist_init(&donecbs);
 	rcu_cblist_init(&pendcbs);
-	rcu_segcblist_extract_count(src_rsclp, &donecbs);
+
 	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
 	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
+
+	/*
+	 * No need for smp_mb() before setting the length to 0, because the
+	 * CPU hotplug lock excludes rcu_barrier().
+	 */
+	rcu_segcblist_set_len(src_rsclp, 0);
+
 	rcu_segcblist_insert_count(dst_rsclp, &donecbs);
+	rcu_segcblist_insert_count(dst_rsclp, &pendcbs);
 	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
 	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
+
 	rcu_segcblist_init(src_rsclp);
 }
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 1d2d61406463..cd35c9faaf51 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -89,8 +89,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp);
 bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp);
-void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
-				 struct rcu_cblist *rclp);
 void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 				    struct rcu_cblist *rclp);
 void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
  2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
  2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
@ 2020-11-03 14:25 ` Joel Fernandes (Google)
  2020-11-03 14:47   ` Frederic Weisbecker
  2020-11-04 13:37   ` Neeraj Upadhyay
  2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

With earlier patches, the negative counting of the unsegmented list
cannot be used to adjust the segmented one. To fix this, sample the
unsegmented length in advance, and use it after CB execution to adjust
the segmented list's length.
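
In sketch form, the resulting pattern in srcu_invoke_callbacks() is as
follows (simplified, with locking elided):

        rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
        len = ready_cbs.len;            /* sample before invocation */
        while ((rhp = rcu_cblist_dequeue(&ready_cbs)))
                rhp->func(rhp);         /* counts ready_cbs.len down to 0 */
        rcu_segcblist_add_len(&sdp->srcu_cblist, -len); /* adjust after */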

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/srcutree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0f23d20d485a..79b7081143a7 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1160,6 +1160,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
  */
 static void srcu_invoke_callbacks(struct work_struct *work)
 {
+	long len;
 	bool more;
 	struct rcu_cblist ready_cbs;
 	struct rcu_head *rhp;
@@ -1182,6 +1183,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 	/* We are on the job!  Extract and invoke ready callbacks. */
 	sdp->srcu_cblist_invoking = true;
 	rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
+	len = ready_cbs.len;
 	spin_unlock_irq_rcu_node(sdp);
 	rhp = rcu_cblist_dequeue(&ready_cbs);
 	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
@@ -1190,13 +1192,14 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 		rhp->func(rhp);
 		local_bh_enable();
 	}
+	WARN_ON_ONCE(ready_cbs.len);
 
 	/*
 	 * Update counts, accelerate new callbacks, and if needed,
 	 * schedule another round of callback invocation.
 	 */
 	spin_lock_irq_rcu_node(sdp);
-	rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
+	rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
 				       rcu_seq_snap(&ssp->srcu_gp_seq));
 	sdp->srcu_cblist_invoking = false;
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
@ 2020-11-03 14:26 ` Joel Fernandes (Google)
  2020-11-03 15:17   ` Frederic Weisbecker
                     ` (2 more replies)
  2020-11-03 14:26 ` [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, Paul E. McKenney,
	rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This has proved useful for discovering an optimization that avoids unwanted GP
requests when no callbacks are accelerated. The overhead is minimal, as each
segment's length is now stored in the respective segment.
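
With the rcu_segcb_stats event enabled in tracefs, each sample produces one
line shaped by the TP_printk() format in this patch; the context string and
numeric values below are purely illustrative:

        rcu_segcb_stats: SegCbDequeued cb_count: (DONE=0, WAIT=12, NEXT_READY=3, NEXT=7) gp_seq: (DONE=2900, WAIT=2904, NEXT_READY=2908, NEXT=2908)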

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.h |  5 +++++
 kernel/rcu/tree.c          |  9 +++++++++
 4 files changed, 73 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..5f8f2ee1a936 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
 		  __entry->qlen)
 );
 
+TRACE_EVENT_RCU(rcu_segcb_stats,
+
+		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
+
+		TP_ARGS(ctx, cb_count, gp_seq),
+
+		TP_STRUCT__entry(
+			__field(const char *, ctx)
+			__array(int, cb_count, RCU_CBLIST_NSEGS)
+			__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
+		),
+
+		TP_fast_assign(
+			__entry->ctx = ctx;
+			memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(int));
+			memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
+		),
+
+		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
+			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
+			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
 /*
  * Tracepoint for the registration of a single RCU callback of the special
  * kvfree() form.  The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 357c19bbcb00..2a03949d0b82 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -14,6 +14,7 @@
 #include <linux/types.h>
 
 #include "rcu_segcblist.h"
+#include "rcu.h"
 
 /* Initialize simple callback list. */
 void rcu_cblist_init(struct rcu_cblist *rclp)
@@ -328,6 +329,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
 }
 
+/*
+ * Return how many CBs each segment along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of segments. Only used from
+ * tracing code which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+			 int cbcount[RCU_CBLIST_NSEGS],
+			 unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+	int i;
+
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
+		gpseq[i] = rsclp->gp_seq[i];
+	}
+}
+
+void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
+{
+	int cbs[RCU_CBLIST_NSEGS];
+	unsigned long gps[RCU_CBLIST_NSEGS];
+
+	if (!trace_rcu_segcb_stats_enabled())
+		return;
+
+	rcu_segcblist_countseq(rsclp, cbs, gps);
+
+	trace_rcu_segcb_stats(context, cbs, gps);
+}
+#endif
+
 /*
  * Extract only those callbacks still pending (not yet ready to be
  * invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index cd35c9faaf51..7750734fa116 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
 void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 			 struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context);
+#else
+#define __trace_rcu_segcb_stats(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 24c00020ab83..f6c6653b3ec2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1497,6 +1497,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 		return false;
 
+	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPreAcc"));
+
 	/*
 	 * Callbacks are often registered with incomplete grace-period
 	 * information.  Something about the fact that getting exact
@@ -1517,6 +1519,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	else
 		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
 
+	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
+
 	return ret;
 }
 
@@ -2466,11 +2470,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
 	if (offloaded)
 		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
+
+	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 
 	/* Invoke callbacks. */
 	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
 	rhp = rcu_cblist_dequeue(&rcl);
+
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
@@ -2983,6 +2990,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		trace_rcu_callback(rcu_state.name, head,
 				   rcu_segcblist_n_cbs(&rdp->cblist));
 
+	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
+
 	/* Go handle any RCU core processing required. */
 	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
 		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-11-03 14:26 ` Joel Fernandes (Google)
  2020-11-05  3:48   ` Paul E. McKenney
  2020-11-03 14:26 ` [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s Joel Fernandes (Google)
  2020-11-03 14:26 ` [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb() Joel Fernandes (Google)
  6 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2a03949d0b82..e9e72d72f7a6 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -10,7 +10,6 @@
 #include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
-#include <linux/rcupdate.h>
 #include <linux/types.h>
 
 #include "rcu_segcblist.h"
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2020-11-03 14:26 ` [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
@ 2020-11-03 14:26 ` Joel Fernandes (Google)
  2020-11-05  3:57   ` Paul E. McKenney
  2020-11-03 14:26 ` [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb() Joel Fernandes (Google)
  6 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

These memory barriers are not needed, as rcu_segcblist_add_len() already
includes a memory barrier *before* and *after* the length of the list is
updated.

The same reasoning applies to rcu_segcblist_enqueue().
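
In sketch form, the enqueue path keeps the required ordering because the
barriers live inside the length update itself (simplified to the
!CONFIG_RCU_NOCB_CPU case of rcu_segcblist_add_len()):

        smp_mb();                               /* inside rcu_segcblist_inc_len() */
        WRITE_ONCE(rsclp->len, rsclp->len + 1);
        smp_mb();                               /* inside rcu_segcblist_inc_len();
                                                   makes the removed smp_mb() redundant */
        /* ...list modifications follow, e.g. linking in the new callback... */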

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 1 -
 kernel/rcu/tree.c          | 1 -
 2 files changed, 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index e9e72d72f7a6..d96272e8d604 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -268,7 +268,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp)
 {
 	rcu_segcblist_inc_len(rsclp);
-	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
 	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
 	rhp->next = NULL;
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6c6653b3ec2..fb2a5ac4a59c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2525,7 +2525,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 	/* Update counts and requeue any remaining callbacks. */
 	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
-	smp_mb(); /* List handling before counting for rcu_barrier(). */
 	rcu_segcblist_add_len(&rdp->cblist, -count);
 
 	/* Reinstate batch limit if we have worked down the excess. */
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb()
  2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (5 preceding siblings ...)
  2020-11-03 14:26 ` [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s Joel Fernandes (Google)
@ 2020-11-03 14:26 ` Joel Fernandes (Google)
  2020-11-05 18:55   ` Paul E. McKenney
  6 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes (Google) @ 2020-11-03 14:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

Memory barriers are needed when updating the full length of the segcblist,
however it is not fully clear why one is needed both before and after the
update. This patch therefore adds additional comments to the function header
to explain it.
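
In sketch form, the 0 -> 1 race that the barrier *after* the length update
prevents (CASE 2 in the comment added below) is roughly:

        /* P0: enqueue path (simplified) */
        WRITE_ONCE(rsclp->len, rsclp->len + 1);
        smp_mb();       /* order the length update before the list update */
        WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);

        /* P1: rcu_barrier() (simplified) */
        if (!rcu_segcblist_n_cbs(rsclp))
                return; /* must not see rhp queued while len still reads 0 */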

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 40 ++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index d96272e8d604..9b43d686b1f3 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -135,17 +135,49 @@ static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
  * field to disagree with the actual number of callbacks on the structure.
  * This increase is fully ordered with respect to the callers accesses
  * both before and after.
+ *
+ * About memory barriers:
+ * There is a situation where rcu_barrier() locklessly samples the full
+ * length of the segmented cblist before deciding what to do. That can
+ * race with another path that calls this function such as enqueue or dequeue.
+ * rcu_barrier() should not wrongly assume there are no callbacks, so any
+ * transitions from 1->0 and 0->1 have to be carefully ordered with respect to
+ * list modifications and with whatever follows the rcu_barrier().
+ *
+ * There are at least 2 cases:
+ * CASE 1: Memory barrier is needed before adding to length, for the case where
+ * v is negative (happens during dequeue). When length transitions from 1 -> 0,
+ * the write to 0 has to be ordered to appear to be *after* the memory accesses
+ * of the CBs that were dequeued and the segcblist modifications:
+ * To illustrate the problematic scenario to avoid:
+ * P0 (what P1 sees)	P1
+ * set len = 0
+ *                      rcu_barrier sees len as 0
+ * dequeue from list
+ *                      rcu_barrier does nothing.
+ *
+ * CASE 2: Memory barrier is needed after adding to length for the case
+ * where length transitions from 0 -> 1. This is because rcu_barrier()
+ * should never miss an update to the length. So the update to length
+ * has to be seen *before* any modifications to the segmented list. Otherwise a
+ * race can happen.
+ * To illustrate the problematic scenario to avoid:
+ * P0 (what P1 sees)	P1
+ * queue to list
+ *                      rcu_barrier sees len as 0
+ * set len = 1.
+ *                      rcu_barrier does nothing.
  */
 void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
 {
 #ifdef CONFIG_RCU_NOCB_CPU
-	smp_mb__before_atomic(); /* Up to the caller! */
+	smp_mb__before_atomic(); /* Read function's comments */
 	atomic_long_add(v, &rsclp->len);
-	smp_mb__after_atomic(); /* Up to the caller! */
+	smp_mb__after_atomic();  /* Read function's comments */
 #else
-	smp_mb(); /* Up to the caller! */
+	smp_mb(); /* Read function's comments */
 	WRITE_ONCE(rsclp->len, rsclp->len + v);
-	smp_mb(); /* Up to the caller! */
+	smp_mb(); /* Read function's comments */
 #endif
 }
 
-- 
2.29.1.341.ge80a0c044ae-goog



* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
@ 2020-11-03 14:47   ` Frederic Weisbecker
  2020-11-03 14:56     ` Joel Fernandes
  2020-11-03 15:07     ` Joel Fernandes
  2020-11-04 13:37   ` Neeraj Upadhyay
  1 sibling, 2 replies; 40+ messages in thread
From: Frederic Weisbecker @ 2020-11-03 14:47 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:25:59AM -0500, Joel Fernandes (Google) wrote:
> With earlier patches, the negative counting of the unsegmented list
> cannot be used to adjust the segmented one. To fix this, sample the
> unsegmented length in advance, and use it after CB execution to adjust
> the segmented list's length.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This breaks bisection, you need to either fix up the previous patch
by adding this diff inside or better yet: expand what you did
in "rcu/tree: Make rcu_do_batch count how many callbacks were executed"
to also handle srcu before introducing the segcb count.

Thanks.


* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 14:47   ` Frederic Weisbecker
@ 2020-11-03 14:56     ` Joel Fernandes
  2020-11-03 15:07     ` Joel Fernandes
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-03 14:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 03:47:14PM +0100, Frederic Weisbecker wrote:
> On Tue, Nov 03, 2020 at 09:25:59AM -0500, Joel Fernandes (Google) wrote:
> > With earlier patches, the negative counting of the unsegmented list
> > cannot be used to adjust the segmented one. To fix this, sample the
> > unsegmented length in advance, and use it after CB execution to adjust
> > the segmented list's length.
> > 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> This breaks bisection, you need to either fix up the previous patch
> by adding this diff inside or better yet: expand what you did
> in "rcu/tree: Make rcu_do_batch count how many callbacks were executed"
> to also handle srcu before introducing the segcb count.

Right, the latter is better. Let us do that.

thanks,

 - Joel



* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 14:47   ` Frederic Weisbecker
  2020-11-03 14:56     ` Joel Fernandes
@ 2020-11-03 15:07     ` Joel Fernandes
  2020-11-03 15:18       ` Paul E. McKenney
  2020-11-03 15:19       ` Frederic Weisbecker
  1 sibling, 2 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-03 15:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 03:47:14PM +0100, Frederic Weisbecker wrote:
> On Tue, Nov 03, 2020 at 09:25:59AM -0500, Joel Fernandes (Google) wrote:
> > With earlier patches, the negative counting of the unsegmented list
> > cannot be used to adjust the segmented one. To fix this, sample the
> > unsegmented length in advance, and use it after CB execution to adjust
> > the segmented list's length.
> > 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> This breaks bisection, you need to either fix up the previous patch
> by adding this diff inside or better yet: expand what you did
> in "rcu/tree: Make rcu_do_batch count how many callbacks were executed"
> to also handle srcu before introducing the segcb count.

Since doing the latter is a lot more tedious and I want to get to reviewing
others' RCU patches today :) , I just squashed the suggestion into the
counters patch to fix bisection:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/segcb-counts&id=595e3a65eeef109cb8fcbfcc114fd3ea2064b873

Hope that's Ok. Also, so that I don't have to resend everything, here is the
final branch if Paul wants to take it:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch rcu/segcb-counts)

Thank you for your time, Frederic!

 - Joel







* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-11-03 15:17   ` Frederic Weisbecker
  2020-11-04 14:08     ` Paul E. McKenney
  2020-11-04 13:40   ` Neeraj Upadhyay
  2020-11-11  0:35   ` Paul E. McKenney
  2 siblings, 1 reply; 40+ messages in thread
From: Frederic Weisbecker @ 2020-11-03 15:17 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Neeraj Upadhyay, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:26:00AM -0500, Joel Fernandes (Google) wrote:
> +/*
> + * Return how many CBs are in each segment, along with their gp_seq values.
> + *
> + * This function is O(N) where N is the number of segments. Only used from
> + * tracing code which is usually disabled in production.
> + */
> +#ifdef CONFIG_RCU_TRACE
> +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> +{
> +	int i;
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> +		gpseq[i] = rsclp->gp_seq[i];
> +	}
> +}
> +
> +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> +{
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
> +
> +	if (!trace_rcu_segcb_stats_enabled())
> +		return;

Yes, very good!

Paul just told me that RCU_TRACE can be used in production so that confirms that we
wanted to avoid this loop of 8 iterations when tracing is disabled.

Thanks.


* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 15:07     ` Joel Fernandes
@ 2020-11-03 15:18       ` Paul E. McKenney
  2020-11-03 15:19       ` Frederic Weisbecker
  1 sibling, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-03 15:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, linux-kernel, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 10:07:38AM -0500, Joel Fernandes wrote:
> On Tue, Nov 03, 2020 at 03:47:14PM +0100, Frederic Weisbecker wrote:
> > On Tue, Nov 03, 2020 at 09:25:59AM -0500, Joel Fernandes (Google) wrote:
> > > With earlier patches, the negative counting of the unsegmented list
> > > cannot be used to adjust the segmented one. To fix this, sample the
> > > unsegmented length in advance, and use it after CB execution to adjust
> > > the segmented list's length.
> > > 
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > This breaks bisection, you need to either fix up the previous patch
> > by adding this diff inside or better yet: expand what you did
> > in "rcu/tree: Make rcu_do_batch count how many callbacks were executed"
> > to also handle srcu before introducing the segcb count.
> 
> Since doing the latter is a lot more tedious and I want to get to reviewing
> others' RCU patches today :) , I just squashed the suggestion into the
> counters patch to fix bisection:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/segcb-counts&id=595e3a65eeef109cb8fcbfcc114fd3ea2064b873
> 
> Hope that's Ok. Also, so that I don't have to resend everything, here is the
> final branch if Paul wants to take it:
> git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch rcu/segcb-counts)

Either way, it sounds like it is time for me to review this series.

Thank you both very much!!!

							Thanx, Paul


* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 15:07     ` Joel Fernandes
  2020-11-03 15:18       ` Paul E. McKenney
@ 2020-11-03 15:19       ` Frederic Weisbecker
  1 sibling, 0 replies; 40+ messages in thread
From: Frederic Weisbecker @ 2020-11-03 15:19 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 10:07:38AM -0500, Joel Fernandes wrote:
> On Tue, Nov 03, 2020 at 03:47:14PM +0100, Frederic Weisbecker wrote:
> > On Tue, Nov 03, 2020 at 09:25:59AM -0500, Joel Fernandes (Google) wrote:
> > > With earlier patches, the negative counting of the unsegmented list
> > > cannot be used to adjust the segmented one. To fix this, sample the
> > > unsegmented length in advance, and use it after CB execution to adjust
> > > the segmented list's length.
> > > 
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > This breaks bisection, you need to either fix up the previous patch
> > by adding this diff inside or better yet: expand what you did
> > in "rcu/tree: Make rcu_do_batch count how many callbacks were executed"
> > to also handle srcu before introducing the segcb count.
> 
> Since doing the latter is a lot more tedious and I want to get to reviewing
> others' RCU patches today :) , I just squashed the suggestion into the
> counters patch to fix bisection:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/segcb-counts&id=595e3a65eeef109cb8fcbfcc114fd3ea2064b873
> 
> Hope that's Ok.

Works for me.

Thanks!


* Re: [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
@ 2020-11-04  0:05   ` Paul E. McKenney
  2020-11-04 15:09     ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-04  0:05 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:25:57AM -0500, Joel Fernandes (Google) wrote:
> Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> to know how many CBs were executed. This field counts down from 0 as CBs are
> dequeued. It is possible that not all CBs are run because of reaching limits,
> in which case the remaining unexecuted callbacks are requeued in the CPU's
> segcblist.
> 
> The number of callbacks that were run is then the negation of the count stored
> in rcl->len, which has been counting down on every dequeue. This negative
> count is then added to the per-CPU segmented callback list's ->len field to
> correct its count.
> 
> Such a design works against future efforts to track the length of each segment
> of the segmented callback list, because rcu_segcblist_extract_done_cbs() will
> be populating the unsegmented callback list's length field (rcl->len) during
> extraction.
> 
> Also, the design of counting down from 0 is confusing and error-prone IMHO.
> 
> This commit therefore explicitly counts how many callbacks were executed in
> rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> field, without relying on the negativity of rcl->len.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Queued for testing and further review, thank you all!

I will be pulling these in as I look them over, and running tests
in between times, so this will take some time.

As usual, I could not resist editing the commit log, so please
see below to check whether I messed something up.

							Thanx, Paul

------------------------------------------------------------------------

commit d77ef2684cbff728c14b3c84a356139c52ca3a5e
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Tue Nov 3 09:25:57 2020 -0500

    rcu/tree: Make rcu_do_batch count how many callbacks were executed
    
    The rcu_do_batch() function extracts the ready-to-invoke callbacks
    from the rcu_segcblist located in the ->cblist field of the current
    CPU's rcu_data structure.  These callbacks are first moved to a local
    (unsegmented) rcu_cblist.  The rcu_do_batch() function then uses this
    rcu_cblist's ->len field to count how many CBs it has invoked, but it
    does so by counting that field down from zero.  Finally, this function
    negates the value in this ->len field (resulting in a positive number)
    and subtracts the result from the ->len field of the current CPU's
    ->cblist field.
    
    Except that it is sometimes necessary for rcu_do_batch() to stop invoking
    callbacks mid-stream, despite there being more ready to invoke, for
    example, if a high-priority task wakes up.  In this case the remaining
    not-yet-invoked callbacks are requeued back onto the CPU's ->cblist,
    but remain in the ready-to-invoke segment of that list.  As above, the
    negative of the local rcu_cblist's ->len field is still subtracted from
    the ->len field of the current CPU's ->cblist field.
    
    The design of counting down from 0 is confusing and error-prone, plus
    use of a positive count will make it easier to provide a uniform and
    consistent API to deal with the per-segment counts that are added
    later in this series.  For example, rcu_segcblist_extract_done_cbs()
    can unconditionally populate the resulting unsegmented list's ->len
    field during extraction.
    
    This commit therefore explicitly counts how many callbacks were executed
    in rcu_do_batch() itself, counting up from zero, and then uses that
    to update the per-CPU segcb list's ->len field, without relying on the
    downcounting of rcl->len from zero.
    
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2d2a6b6b9..bb246d8 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
  * This increase is fully ordered with respect to the callers accesses
  * both before and after.
  */
-static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
 {
 #ifdef CONFIG_RCU_NOCB_CPU
 	smp_mb__before_atomic(); /* Up to the caller! */
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 492262b..1d2d614 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
 }
 
 void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b6ec565..c4035e7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2436,7 +2436,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-	long bl, count;
+	long bl, count = 0;
 	long pending, tlimit = 0;
 
 	/* If no callbacks are ready, just return. */
@@ -2481,6 +2481,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
+		count++;
 		debug_rcu_head_unqueue(rhp);
 
 		rcu_lock_acquire(&rcu_callback_map);
@@ -2494,15 +2495,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 		/*
 		 * Stop only if limit reached and CPU has something to do.
-		 * Note: The rcl structure counts down from zero.
 		 */
-		if (-rcl.len >= bl && !offloaded &&
+		if (count >= bl && !offloaded &&
 		    (need_resched() ||
 		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
 			break;
 		if (unlikely(tlimit)) {
 			/* only call local_clock() every 32 callbacks */
-			if (likely((-rcl.len & 31) || local_clock() < tlimit))
+			if (likely((count & 31) || local_clock() < tlimit))
 				continue;
 			/* Exceeded the time limit, so leave. */
 			break;
@@ -2519,7 +2519,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 	local_irq_save(flags);
 	rcu_nocb_lock(rdp);
-	count = -rcl.len;
 	rdp->n_cbs_invoked += count;
 	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
 			    is_idle_task(current), rcu_is_callbacks_kthread());
@@ -2527,7 +2526,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Update counts and requeue any remaining callbacks. */
 	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
 	smp_mb(); /* List handling before counting for rcu_barrier(). */
-	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
+	rcu_segcblist_add_len(&rdp->cblist, -count);
 
 	/* Reinstate batch limit if we have worked down the excess. */
 	count = rcu_segcblist_n_cbs(&rdp->cblist);


* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
@ 2020-11-04 13:37   ` Neeraj Upadhyay
  2020-11-04 17:01   ` Paul E. McKenney
  1 sibling, 0 replies; 40+ messages in thread
From: Neeraj Upadhyay @ 2020-11-04 13:37 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10



On 11/3/2020 7:55 PM, Joel Fernandes (Google) wrote:
> Add counting of segment lengths of segmented callback list.
> 
> This will be useful for a number of things, such as knowing how big the
> ready-to-execute segment has gotten. The immediate benefit is the ability
> to trace how the callbacks in the segmented callback list change.
> 
> Also, this patch removes hacks related to using donecbs's ->len field as a
> temporary variable to save the segmented callback list's length. With
> per-segment counts, this trick is no longer possible, nor is it needed.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---

Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>

>   include/linux/rcu_segcblist.h |   1 +
>   kernel/rcu/rcu_segcblist.c    | 120 ++++++++++++++++++++++------------
>   kernel/rcu/rcu_segcblist.h    |   2 -
>   3 files changed, 79 insertions(+), 44 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index b36afe7b22c9..6c01f09a6456 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -72,6 +72,7 @@ struct rcu_segcblist {
>   #else
>   	long len;
>   #endif
> +	long seglen[RCU_CBLIST_NSEGS];
>   	u8 enabled;
>   	u8 offloaded;
>   };
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index bb246d8c6ef1..357c19bbcb00 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -7,10 +7,11 @@
>    * Authors: Paul E. McKenney <paulmck@linux.ibm.com>
>    */
>   
> -#include <linux/types.h>
> -#include <linux/kernel.h>
> +#include <linux/cpu.h>
>   #include <linux/interrupt.h>
> +#include <linux/kernel.h>
>   #include <linux/rcupdate.h>
> +#include <linux/types.h>
>   
>   #include "rcu_segcblist.h"
>   
> @@ -88,6 +89,46 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
>   #endif
>   }
>   
> +/* Get the length of a segment of the rcu_segcblist structure. */
> +static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
> +{
> +	return READ_ONCE(rsclp->seglen[seg]);
> +}
> +
> +/* Set the length of a segment of the rcu_segcblist structure. */
> +static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> +{
> +	WRITE_ONCE(rsclp->seglen[seg], v);
> +}
> +
> +/* Return number of callbacks in a segment of the segmented callback list. */
> +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> +{
> +	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> +}
> +
> +/* Move from's segment length to to's segment. */
> +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
> +{
> +	long len;
> +
> +	if (from == to)
> +		return;
> +
> +	len = rcu_segcblist_get_seglen(rsclp, from);
> +	if (!len)
> +		return;
> +
> +	rcu_segcblist_add_seglen(rsclp, to, len);
> +	rcu_segcblist_set_seglen(rsclp, from, 0);
> +}
> +
> +/* Increment segment's length. */
> +static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
> +{
> +	rcu_segcblist_add_seglen(rsclp, seg, 1);
> +}
> +
>   /*
>    * Increase the numeric length of an rcu_segcblist structure by the
>    * specified amount, which can be negative.  This can cause the ->len
> @@ -119,26 +160,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
>   	rcu_segcblist_add_len(rsclp, 1);
>   }
>   
> -/*
> - * Exchange the numeric length of the specified rcu_segcblist structure
> - * with the specified value.  This can cause the ->len field to disagree
> - * with the actual number of callbacks on the structure.  This exchange is
> - * fully ordered with respect to the callers accesses both before and after.
> - */
> -static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
> -{
> -#ifdef CONFIG_RCU_NOCB_CPU
> -	return atomic_long_xchg(&rsclp->len, v);
> -#else
> -	long ret = rsclp->len;
> -
> -	smp_mb(); /* Up to the caller! */
> -	WRITE_ONCE(rsclp->len, v);
> -	smp_mb(); /* Up to the caller! */
> -	return ret;
> -#endif
> -}
> -
>   /*
>    * Initialize an rcu_segcblist structure.
>    */
> @@ -149,8 +170,10 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
>   	BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
>   	BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
>   	rsclp->head = NULL;
> -	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
>   		rsclp->tails[i] = &rsclp->head;
> +		rcu_segcblist_set_seglen(rsclp, i, 0);
> +	}
>   	rcu_segcblist_set_len(rsclp, 0);
>   	rsclp->enabled = 1;
>   }
> @@ -246,6 +269,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>   {
>   	rcu_segcblist_inc_len(rsclp);
>   	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> +	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
>   	rhp->next = NULL;
>   	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
>   	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], &rhp->next);
> @@ -274,27 +298,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>   	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
>   		if (rsclp->tails[i] != rsclp->tails[i - 1])
>   			break;
> +	rcu_segcblist_inc_seglen(rsclp, i);
>   	WRITE_ONCE(*rsclp->tails[i], rhp);
>   	for (; i <= RCU_NEXT_TAIL; i++)
>   		WRITE_ONCE(rsclp->tails[i], &rhp->next);
>   	return true;
>   }
>   
> -/*
> - * Extract only the counts from the specified rcu_segcblist structure,
> - * and place them in the specified rcu_cblist structure.  This function
> - * supports both callback orphaning and invocation, hence the separation
> - * of counts and callbacks.  (Callbacks ready for invocation must be
> - * orphaned and adopted separately from pending callbacks, but counts
> - * apply to all callbacks.  Locking must be used to make sure that
> - * both orphaned-callbacks lists are consistent.)
> - */
> -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
> -					       struct rcu_cblist *rclp)
> -{
> -	rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
> -}
> -
>   /*
>    * Extract only those callbacks ready to be invoked from the specified
>    * rcu_segcblist structure and place them in the specified rcu_cblist
> @@ -307,6 +317,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>   
>   	if (!rcu_segcblist_ready_cbs(rsclp))
>   		return; /* Nothing to do. */
> +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
>   	*rclp->tail = rsclp->head;
>   	WRITE_ONCE(rsclp->head, *rsclp->tails[RCU_DONE_TAIL]);
>   	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> @@ -314,6 +325,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>   	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
>   		if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL])
>   			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
> +	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
>   }
>   
>   /*
> @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
>   
>   	if (!rcu_segcblist_pend_cbs(rsclp))
>   		return; /* Nothing to do. */
> +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
>   	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
>   	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
>   	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> -	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++)
> +	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
>   		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
> +		rcu_segcblist_set_seglen(rsclp, i, 0);
> +	}
>   }
>   
>   /*
> @@ -345,7 +362,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
>   				struct rcu_cblist *rclp)
>   {
>   	rcu_segcblist_add_len(rsclp, rclp->len);
> -	rclp->len = 0;
>   }
>   
>   /*
> @@ -359,6 +375,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
>   
>   	if (!rclp->head)
>   		return; /* No callbacks to move. */
> +	rcu_segcblist_add_seglen(rsclp, RCU_DONE_TAIL, rclp->len);
>   	*rclp->tail = rsclp->head;
>   	WRITE_ONCE(rsclp->head, rclp->head);
>   	for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++)
> @@ -379,6 +396,8 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
>   {
>   	if (!rclp->head)
>   		return; /* Nothing to do. */
> +
> +	rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len);
>   	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
>   	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
>   }
> @@ -403,6 +422,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>   		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
>   			break;
>   		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
>   	}
>   
>   	/* If no callbacks moved, nothing more need be done. */
> @@ -423,6 +443,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>   		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
>   			break;  /* No more callbacks. */
>   		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, j);
>   		rsclp->gp_seq[j] = rsclp->gp_seq[i];
>   	}
>   }
> @@ -444,7 +465,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>    */
>   bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>   {
> -	int i;
> +	int i, j;
>   
>   	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
>   	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
> @@ -487,6 +508,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>   	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>   		return false;
>   
> +	/* Accounting: everything below i is about to get merged into i. */
> +	for (j = i + 1; j <= RCU_NEXT_TAIL; j++)
> +		rcu_segcblist_move_seglen(rsclp, j, i);
> +
>   	/*
>   	 * Merge all later callbacks, including newly arrived callbacks,
>   	 * into the segment located by the for-loop above.  Assign "seq"
> @@ -514,13 +539,24 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>   	struct rcu_cblist donecbs;
>   	struct rcu_cblist pendcbs;
>   
> +	lockdep_assert_cpus_held();
> +
>   	rcu_cblist_init(&donecbs);
>   	rcu_cblist_init(&pendcbs);
> -	rcu_segcblist_extract_count(src_rsclp, &donecbs);
> +
>   	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
>   	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
> +
> +	/*
> +	 * No need for smp_mb() before setting length to 0, because the
> +	 * CPU hotplug lock excludes rcu_barrier().
> +	 */
> +	rcu_segcblist_set_len(src_rsclp, 0);
> +
>   	rcu_segcblist_insert_count(dst_rsclp, &donecbs);
> +	rcu_segcblist_insert_count(dst_rsclp, &pendcbs);
>   	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
>   	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
> +
>   	rcu_segcblist_init(src_rsclp);
>   }
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 1d2d61406463..cd35c9faaf51 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -89,8 +89,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>   			   struct rcu_head *rhp);
>   bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>   			   struct rcu_head *rhp);
> -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
> -				 struct rcu_cblist *rclp);
>   void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>   				    struct rcu_cblist *rclp);
>   void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment
  2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
  2020-11-03 14:47   ` Frederic Weisbecker
@ 2020-11-04 13:37   ` Neeraj Upadhyay
  1 sibling, 0 replies; 40+ messages in thread
From: Neeraj Upadhyay @ 2020-11-04 13:37 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10



On 11/3/2020 7:55 PM, Joel Fernandes (Google) wrote:
> With earlier patches, the negative counting of the unsegmented list
> cannot be used to adjust the segmented one. To fix this, sample the
> unsegmented length in advance, and use it after CB execution to adjust
> the segmented list's length.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---

Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>

>   kernel/rcu/srcutree.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 0f23d20d485a..79b7081143a7 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1160,6 +1160,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
>    */
>   static void srcu_invoke_callbacks(struct work_struct *work)
>   {
> +	long len;
>   	bool more;
>   	struct rcu_cblist ready_cbs;
>   	struct rcu_head *rhp;
> @@ -1182,6 +1183,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
>   	/* We are on the job!  Extract and invoke ready callbacks. */
>   	sdp->srcu_cblist_invoking = true;
>   	rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
> +	len = ready_cbs.len;
>   	spin_unlock_irq_rcu_node(sdp);
>   	rhp = rcu_cblist_dequeue(&ready_cbs);
>   	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
> @@ -1190,13 +1192,14 @@ static void srcu_invoke_callbacks(struct work_struct *work)
>   		rhp->func(rhp);
>   		local_bh_enable();
>   	}
> +	WARN_ON_ONCE(ready_cbs.len);
>   
>   	/*
>   	 * Update counts, accelerate new callbacks, and if needed,
>   	 * schedule another round of callback invocation.
>   	 */
>   	spin_lock_irq_rcu_node(sdp);
> -	rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
> +	rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
>   	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
>   				       rcu_seq_snap(&ssp->srcu_gp_seq));
>   	sdp->srcu_cblist_invoking = false;
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
  2020-11-03 15:17   ` Frederic Weisbecker
@ 2020-11-04 13:40   ` Neeraj Upadhyay
  2020-11-04 15:05     ` Joel Fernandes
  2020-11-11  0:35   ` Paul E. McKenney
  2 siblings, 1 reply; 40+ messages in thread
From: Neeraj Upadhyay @ 2020-11-04 13:40 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Frederic Weisbecker, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, Paul E. McKenney, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10



On 11/3/2020 7:56 PM, Joel Fernandes (Google) wrote:
> Track how the segcb list changes before/after acceleration, during
> queuing and during dequeuing.
> 
> This has proved useful to discover an optimization to avoid unwanted GP
> requests when there are no callbacks accelerated. The overhead is minimal as
> each segment's length is now stored in the respective segment.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>   include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
>   kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
>   kernel/rcu/rcu_segcblist.h |  5 +++++
>   kernel/rcu/tree.c          |  9 +++++++++
>   4 files changed, 73 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 155b5cb43cfd..5f8f2ee1a936 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
>   		  __entry->qlen)
>   );
>   
> +TRACE_EVENT_RCU(rcu_segcb_stats,
> +
> +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),

I think we need to use long[] instead of int[] for cb_count everywhere 
in this patch?


Thanks
Neeraj

> +
> +		TP_ARGS(ctx, cb_count, gp_seq),
> +
> +		TP_STRUCT__entry(
> +			__field(const char *, ctx)
> +			__array(int, cb_count, RCU_CBLIST_NSEGS)
> +			__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
> +		),
> +
> +		TP_fast_assign(
> +			__entry->ctx = ctx;
> +			memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(int));
> +			memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
> +		),
> +
> +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> +
> +);
> +
>   /*
>    * Tracepoint for the registration of a single RCU callback of the special
>    * kvfree() form.  The first argument is the RCU type, the second argument
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 357c19bbcb00..2a03949d0b82 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -14,6 +14,7 @@
>   #include <linux/types.h>
>   
>   #include "rcu_segcblist.h"
> +#include "rcu.h"
>   
>   /* Initialize simple callback list. */
>   void rcu_cblist_init(struct rcu_cblist *rclp)
> @@ -328,6 +329,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>   	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
>   }
>   
> +/*
> + * Return how many CBs each segment has, along with their gp_seq values.
> + *
> + * This function is O(N) where N is the number of segments. Only used from
> + * tracing code which is usually disabled in production.
> + */
> +#ifdef CONFIG_RCU_TRACE
> +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> +{
> +	int i;
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> +		gpseq[i] = rsclp->gp_seq[i];
> +	}
> +}
> +
> +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> +{
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
> +
> +	if (!trace_rcu_segcb_stats_enabled())
> +		return;
> +
> +	rcu_segcblist_countseq(rsclp, cbs, gps);
> +
> +	trace_rcu_segcb_stats(context, cbs, gps);
> +}
> +#endif
> +
>   /*
>    * Extract only those callbacks still pending (not yet ready to be
>    * invoked) from the specified rcu_segcblist structure and place them in
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index cd35c9faaf51..7750734fa116 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
>   bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
>   void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>   			 struct rcu_segcblist *src_rsclp);
> +#ifdef CONFIG_RCU_TRACE
> +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context);
> +#else
> +#define __trace_rcu_segcb_stats(...)
> +#endif
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 24c00020ab83..f6c6653b3ec2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1497,6 +1497,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>   	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
>   		return false;
>   
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPreAcc"));
> +
>   	/*
>   	 * Callbacks are often registered with incomplete grace-period
>   	 * information.  Something about the fact that getting exact
> @@ -1517,6 +1519,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>   	else
>   		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
>   
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
> +
>   	return ret;
>   }
>   
> @@ -2466,11 +2470,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
>   	if (offloaded)
>   		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
> +
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
>   	rcu_nocb_unlock_irqrestore(rdp, flags);
>   
>   	/* Invoke callbacks. */
>   	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
>   	rhp = rcu_cblist_dequeue(&rcl);
> +
>   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>   		rcu_callback_t f;
>   
> @@ -2983,6 +2990,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>   		trace_rcu_callback(rcu_state.name, head,
>   				   rcu_segcblist_n_cbs(&rdp->cblist));
>   
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
> +
>   	/* Go handle any RCU core processing required. */
>   	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
>   		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-03 15:17   ` Frederic Weisbecker
@ 2020-11-04 14:08     ` Paul E. McKenney
  2020-11-04 14:33       ` Frederic Weisbecker
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-04 14:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes (Google),
	linux-kernel, Neeraj Upadhyay, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 04:17:31PM +0100, Frederic Weisbecker wrote:
> On Tue, Nov 03, 2020 at 09:26:00AM -0500, Joel Fernandes (Google) wrote:
> > +/*
> > + * Return how many CBs each segment has, along with their gp_seq values.
> > + *
> > + * This function is O(N) where N is the number of segments. Only used from
> > + * tracing code which is usually disabled in production.
> > + */
> > +#ifdef CONFIG_RCU_TRACE
> > +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > +			 int cbcount[RCU_CBLIST_NSEGS],
> > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> > +		gpseq[i] = rsclp->gp_seq[i];
> > +	}
> > +}
> > +
> > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > +{
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > +
> > +	if (!trace_rcu_segcb_stats_enabled())
> > +		return;
> 
> Yes, very good!
> 
> Paul just told me that RCU_TRACE can be used in production, so that confirms
> that we wanted to avoid this loop of 8 iterations when tracing is disabled.

RCU's "don't try this in production" Kconfig option is PROVE_RCU.

I would be looking for checks that the sum of the segment lengths
match the overall ->len or checks that all of the segment lengths
are zero when ->cblist is empty to be guarded by something like
IS_ENABLED(CONFIG_PROVE_RCU).  Of course, checks of this sort need to
be confined to those portions of rcu_do_batch() that are excluding other
accesses to ->cblist.
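
For concreteness, a minimal sketch of the kind of check being described
(the helper name and call site are hypothetical, and it assumes the
->seglen[] array added by this series):

	/*
	 * Debug-only consistency check: under PROVE_RCU, verify that the
	 * per-segment counts sum to the overall ->len.  The caller must
	 * exclude all other accesses to ->cblist.
	 */
	static void rcu_segcblist_check_counts(struct rcu_segcblist *rsclp)
	{
		long sum = 0;
		int i;

		if (!IS_ENABLED(CONFIG_PROVE_RCU))
			return;
		for (i = 0; i < RCU_CBLIST_NSEGS; i++)
			sum += READ_ONCE(rsclp->seglen[i]);
		WARN_ON_ONCE(sum != rcu_segcblist_n_cbs(rsclp));
	}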

But if rcu_segcblist_countseq() is invoked only when a specific trace
event is enabled, it should be OK to have it guarded only by RCU_TRACE.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-04 14:08     ` Paul E. McKenney
@ 2020-11-04 14:33       ` Frederic Weisbecker
  2020-11-07  0:05         ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Frederic Weisbecker @ 2020-11-04 14:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes (Google),
	linux-kernel, Neeraj Upadhyay, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 06:08:07AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 04:17:31PM +0100, Frederic Weisbecker wrote:
> > On Tue, Nov 03, 2020 at 09:26:00AM -0500, Joel Fernandes (Google) wrote:
> > > +/*
> > > + * Return how many CBs each segment has, along with their gp_seq values.
> > > + *
> > > + * This function is O(N) where N is the number of segments. Only used from
> > > + * tracing code which is usually disabled in production.
> > > + */
> > > +#ifdef CONFIG_RCU_TRACE
> > > +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > > +			 int cbcount[RCU_CBLIST_NSEGS],
> > > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > > +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> > > +		gpseq[i] = rsclp->gp_seq[i];
> > > +	}
> > > +}
> > > +
> > > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > > +{
> > > +	int cbs[RCU_CBLIST_NSEGS];
> > > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > > +
> > > +	if (!trace_rcu_segcb_stats_enabled())
> > > +		return;
> > 
> > Yes, very good!
> > 
> > Paul just told me that RCU_TRACE can be used in production, so that confirms
> > that we wanted to avoid this loop of 8 iterations when tracing is disabled.
> 
> RCU's "don't try this in production" Kconfig option is PROVE_RCU.
> 
> I would be looking for checks that the sum of the segment lengths
> match the overall ->len or checks that all of the segment lengths
> are zero when ->cblist is empty to be guarded by something like
> IS_ENABLED(CONFIG_PROVE_RCU).  Of course, checks of this sort need to
> be confined to those portions of rcu_do_batch() that are excluding other
> accesses to ->cblist.

Right.

> 
> But if rcu_segcblist_countseq() is invoked only when a specific trace
> event is enabled, it should be OK to have it guarded only by RCU_TRACE.

Indeed, so I think we are good.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-04 13:40   ` Neeraj Upadhyay
@ 2020-11-04 15:05     ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-04 15:05 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 07:10:17PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 11/3/2020 7:56 PM, Joel Fernandes (Google) wrote:
> > Track how the segcb list changes before/after acceleration, during
> > queuing and during dequeuing.
> > 
> > This has proved useful to discover an optimization to avoid unwanted GP
> > requests when there are no callbacks accelerated. The overhead is minimal as
> > each segment's length is now stored in the respective segment.
> > 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >   include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
> >   kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
> >   kernel/rcu/rcu_segcblist.h |  5 +++++
> >   kernel/rcu/tree.c          |  9 +++++++++
> >   4 files changed, 73 insertions(+)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 155b5cb43cfd..5f8f2ee1a936 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
> >   		  __entry->qlen)
> >   );
> > +TRACE_EVENT_RCU(rcu_segcb_stats,
> > +
> > +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> 
> I think we need to use long[] instead of int[] for cb_count everywhere in
> this patch?

More than 4 billion callbacks on a single cblist sounds like a bug, though, so
int should work. Plus, I prefer to keep the tracepoint size small (on 64-bit
systems, long is 64 bits while int is 32 bits).
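
As a quick userspace sketch of that size difference (assuming
RCU_CBLIST_NSEGS == 4 and an LP64 target; the event carries one such
array per trace-buffer entry):

	#include <stdio.h>

	#define RCU_CBLIST_NSEGS 4	/* same segment count as the kernel */

	int main(void)
	{
		/* Footprint of the cb_count array within one trace entry. */
		printf("int[]:  %zu bytes\n", sizeof(int[RCU_CBLIST_NSEGS]));
		printf("long[]: %zu bytes\n", sizeof(long[RCU_CBLIST_NSEGS]));
		return 0;	/* prints 16 and 32 on LP64 */
	}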

Thanks for all the reviews on this and other patches!

 - Joel


> 
> 
> Thanks
> Neeraj
> 
> > +
> > +		TP_ARGS(ctx, cb_count, gp_seq),
> > +
> > +		TP_STRUCT__entry(
> > +			__field(const char *, ctx)
> > +			__array(int, cb_count, RCU_CBLIST_NSEGS)
> > +			__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
> > +		),
> > +
> > +		TP_fast_assign(
> > +			__entry->ctx = ctx;
> > +			memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(int));
> > +			memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
> > +		),
> > +
> > +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> > +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> > +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> > +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> > +
> > +);
> > +
> >   /*
> >    * Tracepoint for the registration of a single RCU callback of the special
> >    * kvfree() form.  The first argument is the RCU type, the second argument
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 357c19bbcb00..2a03949d0b82 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -14,6 +14,7 @@
> >   #include <linux/types.h>
> >   #include "rcu_segcblist.h"
> > +#include "rcu.h"
> >   /* Initialize simple callback list. */
> >   void rcu_cblist_init(struct rcu_cblist *rclp)
> > @@ -328,6 +329,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
> >   	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
> >   }
> > +/*
> > + * Return how many CBs each segment has, along with their gp_seq values.
> > + *
> > + * This function is O(N) where N is the number of segments. Only used from
> > + * tracing code which is usually disabled in production.
> > + */
> > +#ifdef CONFIG_RCU_TRACE
> > +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > +			 int cbcount[RCU_CBLIST_NSEGS],
> > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> > +		gpseq[i] = rsclp->gp_seq[i];
> > +	}
> > +}
> > +
> > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > +{
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > +
> > +	if (!trace_rcu_segcb_stats_enabled())
> > +		return;
> > +
> > +	rcu_segcblist_countseq(rsclp, cbs, gps);
> > +
> > +	trace_rcu_segcb_stats(context, cbs, gps);
> > +}
> > +#endif
> > +
> >   /*
> >    * Extract only those callbacks still pending (not yet ready to be
> >    * invoked) from the specified rcu_segcblist structure and place them in
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index cd35c9faaf51..7750734fa116 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
> >   bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
> >   void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
> >   			 struct rcu_segcblist *src_rsclp);
> > +#ifdef CONFIG_RCU_TRACE
> > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context);
> > +#else
> > +#define __trace_rcu_segcb_stats(...)
> > +#endif
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 24c00020ab83..f6c6653b3ec2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1497,6 +1497,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >   	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
> >   		return false;
> > +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPreAcc"));
> > +
> >   	/*
> >   	 * Callbacks are often registered with incomplete grace-period
> >   	 * information.  Something about the fact that getting exact
> > @@ -1517,6 +1519,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >   	else
> >   		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
> > +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
> > +
> >   	return ret;
> >   }
> > @@ -2466,11 +2470,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >   	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
> >   	if (offloaded)
> >   		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
> > +
> > +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
> >   	rcu_nocb_unlock_irqrestore(rdp, flags);
> >   	/* Invoke callbacks. */
> >   	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
> >   	rhp = rcu_cblist_dequeue(&rcl);
> > +
> >   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >   		rcu_callback_t f;
> > @@ -2983,6 +2990,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> >   		trace_rcu_callback(rcu_state.name, head,
> >   				   rcu_segcblist_n_cbs(&rdp->cblist));
> > +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
> > +
> >   	/* Go handle any RCU core processing required. */
> >   	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> >   		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
> > 
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
> the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-11-04  0:05   ` Paul E. McKenney
@ 2020-11-04 15:09     ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-04 15:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 04:05:54PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 09:25:57AM -0500, Joel Fernandes (Google) wrote:
> > Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> > to know how many CBs are executed. This field counts down from 0 as CBs are
> > dequeued.  It is possible that not all CBs are run because limits are
> > reached, in which case the remaining unexecuted callbacks are requeued in
> > the CPU's segcblist.
> > 
> > The number of callbacks that were invoked (rather than requeued) is then
> > the negation of the count stored in rcl->len, which has been counting down
> > on every dequeue. This negative count is then added to the per-CPU
> > segmented callback list's ->len to correct its count.
> > 
> > Such a design works against future efforts to track the length of each segment
> > of the segmented callback list. The reason is that
> > rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> > list's length field (rcl->len) during extraction.
> > 
> > Also, the design of counting down from 0 is confusing and error-prone IMHO.
> > 
> > This commit therefore explicitly counts how many callbacks were executed in
> > rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> > field, without relying on the negativity of rcl->len.
> > 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Queued for testing and further review, thank you all!
> 
> I will be pulling these in as I look them over, and running tests
> in between times, so this will take some time.
> 
> As usual, I could not resist editing the commit log, so please
> see below to check whether I messed something up.

Looks good to me, thanks.

 - Joel



> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit d77ef2684cbff728c14b3c84a356139c52ca3a5e
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Tue Nov 3 09:25:57 2020 -0500
> 
>     rcu/tree: Make rcu_do_batch count how many callbacks were executed
>     
>     The rcu_do_batch() function extracts the ready-to-invoke callbacks
>     from the rcu_segcblist located in the ->cblist field of the current
>     CPU's rcu_data structure.  These callbacks are first moved to a local
>     (unsegmented) rcu_cblist.  The rcu_do_batch() function then uses this
>     rcu_cblist's ->len field to count how many CBs it has invoked, but it
>     does so by counting that field down from zero.  Finally, this function
>     negates the value in this ->len field (resulting in a positive number)
>     and subtracts the result from the ->len field of the current CPU's
>     ->cblist field.
>     
>     Except that it is sometimes necessary for rcu_do_batch() to stop invoking
>     callbacks mid-stream, despite there being more ready to invoke, for
>     example, if a high-priority task wakes up.  In this case the remaining
>     not-yet-invoked callbacks are requeued back onto the CPU's ->cblist,
>     but remain in the ready-to-invoke segment of that list.  As above, the
>     negative of the local rcu_cblist's ->len field is still subtracted from
>     the ->len field of the current CPU's ->cblist field.
>     
>     The design of counting down from 0 is confusing and error-prone, plus
>     use of a positive count will make it easier to provide a uniform and
>     consistent API to deal with the per-segment counts that are added
>     later in this series.  For example, rcu_segcblist_extract_done_cbs()
>     can unconditionally populate the resulting unsegmented list's ->len
>     field during extraction.
>     
>     This commit therefore explicitly counts how many callbacks were executed
>     in rcu_do_batch() itself, counting up from zero, and then uses that
>     to update the per-CPU segcb list's ->len field, without relying on the
>     downcounting of rcl->len from zero.
>     
>     Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>     Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
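For those following along, a toy userspace model of the old and new
accounting schemes described above (names and numbers are purely
illustrative, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		long cblist_len = 10;	/* callbacks extracted as ready */
		long rcl_len = 0;	/* old scheme: counts down from zero */
		long count = 0;		/* new scheme: counts up from zero */
		int i;

		for (i = 0; i < 7; i++) {  /* invoke 7 callbacks, then stop early */
			rcl_len--;	   /* old: ends at -7 */
			count++;	   /* new: ends at +7 */
		}

		printf("old: %ld remain\n", cblist_len + rcl_len); /* 10 + (-7) == 3 */
		printf("new: %ld remain\n", cblist_len - count);   /* 10 - 7    == 3 */
		return 0;
	}
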
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 2d2a6b6b9..bb246d8 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
>   * This increase is fully ordered with respect to the callers accesses
>   * both before and after.
>   */
> -static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
>  {
>  #ifdef CONFIG_RCU_NOCB_CPU
>  	smp_mb__before_atomic(); /* Up to the caller! */
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 492262b..1d2d614 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
>  }
>  
>  void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
>  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
>  void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b6ec565..c4035e7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2436,7 +2436,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
>  	struct rcu_head *rhp;
>  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
> -	long bl, count;
> +	long bl, count = 0;
>  	long pending, tlimit = 0;
>  
>  	/* If no callbacks are ready, just return. */
> @@ -2481,6 +2481,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>  		rcu_callback_t f;
>  
> +		count++;
>  		debug_rcu_head_unqueue(rhp);
>  
>  		rcu_lock_acquire(&rcu_callback_map);
> @@ -2494,15 +2495,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  
>  		/*
>  		 * Stop only if limit reached and CPU has something to do.
> -		 * Note: The rcl structure counts down from zero.
>  		 */
> -		if (-rcl.len >= bl && !offloaded &&
> +		if (count >= bl && !offloaded &&
>  		    (need_resched() ||
>  		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
>  			break;
>  		if (unlikely(tlimit)) {
>  			/* only call local_clock() every 32 callbacks */
> -			if (likely((-rcl.len & 31) || local_clock() < tlimit))
> +			if (likely((count & 31) || local_clock() < tlimit))
>  				continue;
>  			/* Exceeded the time limit, so leave. */
>  			break;
> @@ -2519,7 +2519,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  
>  	local_irq_save(flags);
>  	rcu_nocb_lock(rdp);
> -	count = -rcl.len;
>  	rdp->n_cbs_invoked += count;
>  	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
>  			    is_idle_task(current), rcu_is_callbacks_kthread());
> @@ -2527,7 +2526,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	/* Update counts and requeue any remaining callbacks. */
>  	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
>  	smp_mb(); /* List handling before counting for rcu_barrier(). */
> -	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
> +	rcu_segcblist_add_len(&rdp->cblist, -count);
>  
>  	/* Reinstate batch limit if we have worked down the excess. */
>  	count = rcu_segcblist_n_cbs(&rdp->cblist);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
  2020-11-04 13:37   ` Neeraj Upadhyay
@ 2020-11-04 17:01   ` Paul E. McKenney
  2020-11-07  0:01     ` Joel Fernandes
  1 sibling, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-04 17:01 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:25:58AM -0500, Joel Fernandes (Google) wrote:
> Add counting of segment lengths of segmented callback list.
> 
> This will be useful for a number of things, such as knowing how big the
> ready-to-execute segment has gotten. The immediate benefit is the ability
> to trace how the callbacks in the segmented callback list change.
> 
> Also, this patch removes hacks related to using donecbs's ->len field as a
> temporary variable to save the segmented callback list's length. With
> per-segment counts, this trick is no longer possible, nor is it needed.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you both!

A few questions and comments below.

							Thanx, Paul

> ---
>  include/linux/rcu_segcblist.h |   1 +
>  kernel/rcu/rcu_segcblist.c    | 120 ++++++++++++++++++++++------------
>  kernel/rcu/rcu_segcblist.h    |   2 -
>  3 files changed, 79 insertions(+), 44 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index b36afe7b22c9..6c01f09a6456 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -72,6 +72,7 @@ struct rcu_segcblist {
>  #else
>  	long len;
>  #endif
> +	long seglen[RCU_CBLIST_NSEGS];
>  	u8 enabled;
>  	u8 offloaded;
>  };
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index bb246d8c6ef1..357c19bbcb00 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -7,10 +7,11 @@
>   * Authors: Paul E. McKenney <paulmck@linux.ibm.com>
>   */
>  
> -#include <linux/types.h>
> -#include <linux/kernel.h>
> +#include <linux/cpu.h>
>  #include <linux/interrupt.h>
> +#include <linux/kernel.h>
>  #include <linux/rcupdate.h>
> +#include <linux/types.h>
>  
>  #include "rcu_segcblist.h"
>  
> @@ -88,6 +89,46 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
>  #endif
>  }
>  
> +/* Get the length of a segment of the rcu_segcblist structure. */
> +static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
> +{
> +	return READ_ONCE(rsclp->seglen[seg]);
> +}
> +
> +/* Set the length of a segment of the rcu_segcblist structure. */
> +static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> +{
> +	WRITE_ONCE(rsclp->seglen[seg], v);
> +}
> +
> +/* Return number of callbacks in a segment of the segmented callback list. */

A casual reader might be forgiven for being confused by the combination
of "Return" in the above comment and the "void" function type below.
So shouldn't this comment be something like "Add the specified number
of callbacks to the specified segment..."?
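
Something like this, say:

	/* Add the specified number of callbacks to the specified segment. */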

> +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> +{
> +	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> +}
> +
> +/* Move from's segment length to to's segment. */
> +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
> +{
> +	long len;
> +
> +	if (from == to)
> +		return;
> +
> +	len = rcu_segcblist_get_seglen(rsclp, from);
> +	if (!len)
> +		return;
> +
> +	rcu_segcblist_add_seglen(rsclp, to, len);
> +	rcu_segcblist_set_seglen(rsclp, from, 0);
> +}
> +
> +/* Increment segment's length. */
> +static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
> +{
> +	rcu_segcblist_add_seglen(rsclp, seg, 1);
> +}
> +
>  /*
>   * Increase the numeric length of an rcu_segcblist structure by the
>   * specified amount, which can be negative.  This can cause the ->len
> @@ -119,26 +160,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
>  	rcu_segcblist_add_len(rsclp, 1);
>  }
>  
> -/*
> - * Exchange the numeric length of the specified rcu_segcblist structure
> - * with the specified value.  This can cause the ->len field to disagree
> - * with the actual number of callbacks on the structure.  This exchange is
> - * fully ordered with respect to the callers accesses both before and after.
> - */
> -static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
> -{
> -#ifdef CONFIG_RCU_NOCB_CPU
> -	return atomic_long_xchg(&rsclp->len, v);
> -#else
> -	long ret = rsclp->len;
> -
> -	smp_mb(); /* Up to the caller! */
> -	WRITE_ONCE(rsclp->len, v);
> -	smp_mb(); /* Up to the caller! */
> -	return ret;
> -#endif
> -}
> -
>  /*
>   * Initialize an rcu_segcblist structure.
>   */
> @@ -149,8 +170,10 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
>  	BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
>  	BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
>  	rsclp->head = NULL;
> -	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
>  		rsclp->tails[i] = &rsclp->head;
> +		rcu_segcblist_set_seglen(rsclp, i, 0);
> +	}
>  	rcu_segcblist_set_len(rsclp, 0);
>  	rsclp->enabled = 1;
>  }
> @@ -246,6 +269,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>  {
>  	rcu_segcblist_inc_len(rsclp);
>  	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> +	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
>  	rhp->next = NULL;
>  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
>  	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], &rhp->next);
> @@ -274,27 +298,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>  	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
>  		if (rsclp->tails[i] != rsclp->tails[i - 1])
>  			break;
> +	rcu_segcblist_inc_seglen(rsclp, i);
>  	WRITE_ONCE(*rsclp->tails[i], rhp);
>  	for (; i <= RCU_NEXT_TAIL; i++)
>  		WRITE_ONCE(rsclp->tails[i], &rhp->next);
>  	return true;
>  }
>  
> -/*
> - * Extract only the counts from the specified rcu_segcblist structure,
> - * and place them in the specified rcu_cblist structure.  This function
> - * supports both callback orphaning and invocation, hence the separation
> - * of counts and callbacks.  (Callbacks ready for invocation must be
> - * orphaned and adopted separately from pending callbacks, but counts
> - * apply to all callbacks.  Locking must be used to make sure that
> - * both orphaned-callbacks lists are consistent.)
> - */
> -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
> -					       struct rcu_cblist *rclp)
> -{
> -	rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
> -}
> -
>  /*
>   * Extract only those callbacks ready to be invoked from the specified
>   * rcu_segcblist structure and place them in the specified rcu_cblist
> @@ -307,6 +317,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  
>  	if (!rcu_segcblist_ready_cbs(rsclp))
>  		return; /* Nothing to do. */
> +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
>  	*rclp->tail = rsclp->head;
>  	WRITE_ONCE(rsclp->head, *rsclp->tails[RCU_DONE_TAIL]);
>  	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> @@ -314,6 +325,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
>  		if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL])
>  			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
> +	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
>  }
>  
>  /*
> @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
>  
>  	if (!rcu_segcblist_pend_cbs(rsclp))
>  		return; /* Nothing to do. */
> +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);

This should be a "for" loop.  Yes, the number and names of the segments
haven't changed for a good long time, but nothing like code as above to
inspire Murphy to more mischief.  :-/

Actually, why not put the summation in the existing "for" loop below?
That would save a line of code in addition to providing less inspiration
for Mr. Murphy.
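
Something like this, say, with the summation folded into the loop that
already resets each segment (an untested sketch, using only helpers from
this series):

	if (!rcu_segcblist_pend_cbs(rsclp))
		return; /* Nothing to do. */
	rclp->len = 0;
	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
		rclp->len += rcu_segcblist_get_seglen(rsclp, i);
		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
		rcu_segcblist_set_seglen(rsclp, i, 0);
	}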

>  	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
>  	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
>  	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> -	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++)
> +	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
>  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
> +		rcu_segcblist_set_seglen(rsclp, i, 0);
> +	}
>  }
>  
>  /*
> @@ -345,7 +362,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
>  				struct rcu_cblist *rclp)
>  {
>  	rcu_segcblist_add_len(rsclp, rclp->len);
> -	rclp->len = 0;

You audited the callers, correct?

>  }
>  
>  /*
> @@ -359,6 +375,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
>  
>  	if (!rclp->head)
>  		return; /* No callbacks to move. */
> +	rcu_segcblist_add_seglen(rsclp, RCU_DONE_TAIL, rclp->len);
>  	*rclp->tail = rsclp->head;
>  	WRITE_ONCE(rsclp->head, rclp->head);
>  	for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++)
> @@ -379,6 +396,8 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
>  {
>  	if (!rclp->head)
>  		return; /* Nothing to do. */
> +
> +	rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len);
>  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
>  	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
>  }
> @@ -403,6 +422,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>  		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
>  			break;
>  		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
>  	}
>  
>  	/* If no callbacks moved, nothing more need be done. */
> @@ -423,6 +443,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>  		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
>  			break;  /* No more callbacks. */
>  		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, j);
>  		rsclp->gp_seq[j] = rsclp->gp_seq[i];
>  	}
>  }
> @@ -444,7 +465,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>   */
>  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  {
> -	int i;
> +	int i, j;
>  
>  	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
>  	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
> @@ -487,6 +508,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>  		return false;
>  
> +	/* Accounting: everything below i is about to get merged into i. */
> +	for (j = i + 1; j <= RCU_NEXT_TAIL; j++)
> +		rcu_segcblist_move_seglen(rsclp, j, i);
> +
>  	/*
>  	 * Merge all later callbacks, including newly arrived callbacks,
>  	 * into the segment located by the for-loop above.  Assign "seq"
> @@ -514,13 +539,24 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>  	struct rcu_cblist donecbs;
>  	struct rcu_cblist pendcbs;
>  
> +	lockdep_assert_cpus_held();
> +
>  	rcu_cblist_init(&donecbs);
>  	rcu_cblist_init(&pendcbs);
> -	rcu_segcblist_extract_count(src_rsclp, &donecbs);
> +
>  	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
>  	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
> +
> +	/*
> +	 * No need for smp_mb() before setting length to 0, because the
> +	 * CPU hotplug lock excludes rcu_barrier().
> +	 */
> +	rcu_segcblist_set_len(src_rsclp, 0);
> +
>  	rcu_segcblist_insert_count(dst_rsclp, &donecbs);
> +	rcu_segcblist_insert_count(dst_rsclp, &pendcbs);
>  	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
>  	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
> +
>  	rcu_segcblist_init(src_rsclp);
>  }
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 1d2d61406463..cd35c9faaf51 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -89,8 +89,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>  			   struct rcu_head *rhp);
>  bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>  			   struct rcu_head *rhp);
> -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
> -				 struct rcu_cblist *rclp);
>  void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  				    struct rcu_cblist *rclp);
>  void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include
  2020-11-03 14:26 ` [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
@ 2020-11-05  3:48   ` Paul E. McKenney
  2020-11-05 14:28     ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-05  3:48 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:26:01AM -0500, Joel Fernandes (Google) wrote:
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This one looks fine, but depends on the earlier "rcu/segcblist: Add
counters to segcblist datastructure" patch, which also changes the list
of #include directives for this file.  It manages to avoid conflicting
with "rcu/trace: Add tracing for how segcb list changes", despite this
one also changing the #include directives.

I am testing it just out of curiosity, but it might make more sense
to fold this one into "rcu/segcblist: Add counters to segcblist
datastructure".

							Thanx, Paul

> ---
>  kernel/rcu/rcu_segcblist.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 2a03949d0b82..e9e72d72f7a6 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -10,7 +10,6 @@
>  #include <linux/cpu.h>
>  #include <linux/interrupt.h>
>  #include <linux/kernel.h>
> -#include <linux/rcupdate.h>
>  #include <linux/types.h>
>  
>  #include "rcu_segcblist.h"
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s
  2020-11-03 14:26 ` [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s Joel Fernandes (Google)
@ 2020-11-05  3:57   ` Paul E. McKenney
  2020-11-07  0:26     ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-05  3:57 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:26:02AM -0500, Joel Fernandes (Google) wrote:
> This memory barrier is not needed as rcu_segcblist_add_len() already
> includes a memory barrier *before* and *after* the length of the list is
> updated.
> 
> Same reasoning for rcu_segcblist_enqueue().

I suggest a commit log like the following:

------------------------------------------------------------------------

The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch()
are not needed because rcu_segcblist_add_len(), and thus also
rcu_segcblist_inc_len(), already includes a memory barrier *before*
and *after* the length of the list is updated.

This commit therefore removes these redundant smp_mb() invocations.

------------------------------------------------------------------------

Other than that, looks good!  I could hand-apply it, but that
would just cause more churn with the addition of the call to
rcu_segcblist_inc_seglen().  So could you please update the commit log
when you repost, whether to the mailing list or from your git tree?

							Thanx, Paul

> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu_segcblist.c | 1 -
>  kernel/rcu/tree.c          | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index e9e72d72f7a6..d96272e8d604 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -268,7 +268,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>  			   struct rcu_head *rhp)
>  {
>  	rcu_segcblist_inc_len(rsclp);
> -	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
>  	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
>  	rhp->next = NULL;
>  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f6c6653b3ec2..fb2a5ac4a59c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2525,7 +2525,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  
>  	/* Update counts and requeue any remaining callbacks. */
>  	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
> -	smp_mb(); /* List handling before counting for rcu_barrier(). */
>  	rcu_segcblist_add_len(&rdp->cblist, -count);
>  
>  	/* Reinstate batch limit if we have worked down the excess. */
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include
  2020-11-05  3:48   ` Paul E. McKenney
@ 2020-11-05 14:28     ` Paul E. McKenney
  2020-11-07  0:27       ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-05 14:28 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 07:48:23PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 09:26:01AM -0500, Joel Fernandes (Google) wrote:
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> This one looks fine, but depends on the earlier "rcu/segcblist: Add
> counters to segcblist datastructure" patch, which also changes the list
> of #include directives for this file.  It manages to avoid conflicting
> with "rcu/trace: Add tracing for how segcb list changes", despite this
> one also changing the #include directives.
> 
> I am testing it just out of curiosity, but it might make more sense
> to fold this one into "rcu/segcblist: Add counters to segcblist
> datastructure".

And it does pass light rcutorture.  ;-)

							Thanx, Paul

> > ---
> >  kernel/rcu/rcu_segcblist.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 2a03949d0b82..e9e72d72f7a6 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -10,7 +10,6 @@
> >  #include <linux/cpu.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/kernel.h>
> > -#include <linux/rcupdate.h>
> >  #include <linux/types.h>
> >  
> >  #include "rcu_segcblist.h"
> > -- 
> > 2.29.1.341.ge80a0c044ae-goog
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb()
  2020-11-03 14:26 ` [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb() Joel Fernandes (Google)
@ 2020-11-05 18:55   ` Paul E. McKenney
  2020-11-06 22:41     ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-05 18:55 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:26:03AM -0500, Joel Fernandes (Google) wrote:
> Memory barriers are needed when updating the full length of the
> segcblist; however, it is not fully clear why one is needed before and
> after. This patch therefore adds additional comments to the function
> header to explain it.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Looks good, thank you!  As always, I could not resist the urge to
do a bit of wordsmithing, so that the queued commit is as shown
below.  Please let me know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit 7dac7adefcae7558b3a85a16f51186d621623733
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Tue Nov 3 09:26:03 2020 -0500

    rcu/segcblist: Add additional comments to explain smp_mb()
    
    One counter-intuitive property of RCU is the fact that full memory
    barriers are needed both before and after updates to the full
    (non-segmented) length.  This patch therefore assists the
    reader's intuition by adding appropriate comments.
    
    [ paulmck:  Wordsmithing. ]
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index bb246d8..b6dda7c 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -94,17 +94,77 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
  * field to disagree with the actual number of callbacks on the structure.
  * This increase is fully ordered with respect to the callers accesses
  * both before and after.
+ *
+ * So why on earth is a memory barrier required both before and after
+ * the update to the ->len field???
+ *
+ * The reason is that rcu_barrier() locklessly samples each CPU's ->len
+ * field, and if a given CPU's field is zero, avoids IPIing that CPU.
+ * This can of course race with both queuing and invoking of callbacks.
+ * Failing to correctly handle either of these races could result in
+ * rcu_barrier() failing to IPI a CPU that actually had callbacks queued
+ * which rcu_barrier() was obligated to wait on.  And if rcu_barrier()
+ * failed to wait on such a callback, unloading certain kernel modules
+ * would result in calls to functions whose code was no longer present in
+ * the kernel, for but one example.
+ *
+ * Therefore, ->len transitions from 1->0 and 0->1 have to be carefully
+ * ordered with respect to both list modifications and the rcu_barrier().
+ *
+ * The queuing case is CASE 1 and the invoking case is CASE 2.
+ *
+ * CASE 1: Suppose that CPU 0 has no callbacks queued, but invokes
+ * call_rcu() just as CPU 1 invokes rcu_barrier().  CPU 0's ->len field
+ * will transition from 0->1, which is one of the transitions that must be
+ * handled carefully.  Without the full memory barriers before the ->len
+ * update and at the beginning of rcu_barrier(), the following could happen:
+ *
+ * CPU 0				CPU 1
+ *
+ * call_rcu().
+ *                      		rcu_barrier() sees ->len as 0.
+ * set ->len = 1.
+ *                      		rcu_barrier() does nothing.
+ *					module is unloaded.
+ * callback invokes unloaded function!
+ *
+ * With the full barriers, any case where rcu_barrier() sees ->len as 0 will
+ * have unambiguously preceded the return from the racing call_rcu(), which
+ * means that this call_rcu() invocation is OK to not wait on.  After all,
+ * you are supposed to make sure that any problematic call_rcu() invocations
+ * happen before the rcu_barrier().
+ *
+ *
+ * CASE 2: Suppose that CPU 0 is invoking its last callback just as CPU 1 invokes
+ * rcu_barrier().  CPU 0's ->len field will transition from 1->0, which is one
+ * of the transitions that must be handled carefully.  Without the full memory
+ * barriers after the ->len update and at the end of rcu_barrier(), the following
+ * could happen:
+ * 
+ * CPU 0				CPU 1
+ *
+ * start invoking last callback
+ * set ->len = 0 (reordered)
+ *                      		rcu_barrier() sees ->len as 0
+ *                      		rcu_barrier() does nothing.
+ *					module is unloaded
+ * callback executing after unloaded!
+ *
+ * With the full barriers, any case where rcu_barrier() sees ->len as 0
+ * will be fully ordered after the completion of the callback function,
+ * so that the module unloading operation is completely safe.
+ * 
  */
 void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
 {
 #ifdef CONFIG_RCU_NOCB_CPU
-	smp_mb__before_atomic(); /* Up to the caller! */
+	smp_mb__before_atomic(); // Read header comment above.
 	atomic_long_add(v, &rsclp->len);
-	smp_mb__after_atomic(); /* Up to the caller! */
+	smp_mb__after_atomic();  // Read header comment above.
 #else
-	smp_mb(); /* Up to the caller! */
+	smp_mb(); // Read header comment above.
 	WRITE_ONCE(rsclp->len, rsclp->len + v);
-	smp_mb(); /* Up to the caller! */
+	smp_mb(); // Read header comment above.
 #endif
 }
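
For completeness, here is a minimal sketch of the rcu_barrier() side of
this pairing.  This is illustrative only: the real logic is spread across
rcu_barrier() and its per-CPU helpers, and the function below is a
hypothetical condensation of it, not actual kernel code.

/* Hypothetical condensation of rcu_barrier()'s lockless sampling. */
void rcu_barrier_sketch(void)
{
	int cpu;
	struct rcu_data *rdp;

	smp_mb(); /* Order prior accesses before the ->len samples below. */
	for_each_possible_cpu(cpu) {
		rdp = per_cpu_ptr(&rcu_data, cpu);
		if (!rcu_segcblist_n_cbs(&rdp->cblist))
			continue; /* ->len == 0: avoid IPIing this CPU. */
		/* Otherwise, IPI the CPU to entrain a barrier callback. */
	}
	/* ... wait for all entrained barrier callbacks to be invoked ... */
	smp_mb(); /* Order the ->len samples before later accesses. */
}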
 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb()
  2020-11-05 18:55   ` Paul E. McKenney
@ 2020-11-06 22:41     ` Joel Fernandes
  2020-11-10  1:28       ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2020-11-06 22:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Thu, Nov 05, 2020 at 10:55:51AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 09:26:03AM -0500, Joel Fernandes (Google) wrote:
> > Memory barriers are needed when updating the full length of the
> > segcblist; however, it is not fully clear why one is needed before and
> > after. This patch therefore adds additional comments to the function
> > header to explain it.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Looks good, thank you!  As always, I could not resist the urge to
> do a bit of wordsmithing, so that the queued commit is as shown
> below.  Please let me know if I messed anything up.

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 7dac7adefcae7558b3a85a16f51186d621623733
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Tue Nov 3 09:26:03 2020 -0500
> 
>     rcu/segcblist: Add additional comments to explain smp_mb()
>     
>     One counter-intuitive property of RCU is the fact that full memory
>     barriers are needed both before and after updates to the full
>     (non-segmented) length.  This patch therefore assists the
>     reader's intuition by adding appropriate comments.
>     
>     [ paulmck:  Wordsmithing. ]
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index bb246d8..b6dda7c 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -94,17 +94,77 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
>   * field to disagree with the actual number of callbacks on the structure.
>   * This increase is fully ordered with respect to the callers accesses
>   * both before and after.
> + *
> + * So why on earth is a memory barrier required both before and after
> + * the update to the ->len field???
> + *
> + * The reason is that rcu_barrier() locklessly samples each CPU's ->len
> + * field, and if a given CPU's field is zero, avoids IPIing that CPU.
> + * This can of course race with both queuing and invoking of callbacks.
> > + * Failing to correctly handle either of these races could result in
> + * rcu_barrier() failing to IPI a CPU that actually had callbacks queued
> + * which rcu_barrier() was obligated to wait on.  And if rcu_barrier()
> + * failed to wait on such a callback, unloading certain kernel modules
> + * would result in calls to functions whose code was no longer present in
> + * the kernel, for but one example.
> + *
> + * Therefore, ->len transitions from 1->0 and 0->1 have to be carefully
> > + * ordered with respect to both list modifications and the rcu_barrier().
> + *
> + * The queuing case is CASE 1 and the invoking case is CASE 2.
> + *
> + * CASE 1: Suppose that CPU 0 has no callbacks queued, but invokes
> + * call_rcu() just as CPU 1 invokes rcu_barrier().  CPU 0's ->len field
> + * will transition from 0->1, which is one of the transitions that must be
> + * handled carefully.  Without the full memory barriers before the ->len
> + * update and at the beginning of rcu_barrier(), the following could happen:
> + *
> + * CPU 0				CPU 1
> + *
> + * call_rcu().
> + *                      		rcu_barrier() sees ->len as 0.
> + * set ->len = 1.
> + *                      		rcu_barrier() does nothing.
> + *					module is unloaded.
> + * callback invokes unloaded function!
> + *
> + * With the full barriers, any case where rcu_barrier() sees ->len as 0 will
> + * have unambiguously preceded the return from the racing call_rcu(), which
> + * means that this call_rcu() invocation is OK to not wait on.  After all,
> + * you are supposed to make sure that any problematic call_rcu() invocations
> + * happen before the rcu_barrier().

Unfortunately, I did not understand your explanation. To me, the barrier
*before* the setting of the length is needed on CPU 0 only for the 1->0
transition (dequeue), whereas in your example above it is for the enqueue.

This was case 1 in my patch:

+ * To illustrate the problematic scenario to avoid:
+ * P0 (what P1 sees)	P1
+ * set len = 0
+ *                      rcu_barrier sees len as 0
+ * dequeue from list
+ *                      rcu_barrier does nothing.
+ *


Here, P1 should see the transition of 1->0 *after* the CB is dequeued, which
means you need a memory barrier *before* the setting of len from 1->0 and
*after* the dequeue. IOW, rcu_barrier should 'see' the memory ordering as:

1. dequeue
2. set len from 1 -> 0.

For the enqueue case, it is the reverse, rcu_barrier should see:
1. set len from 0 -> 1
2. enqueue

Either way, the point I was trying to make is that the length should
always be seen as non-zero if the list is non-empty. Basically,
rcu_barrier() should never take the fast path if the list is non-empty.
Worst case, it might take the slow path when it is not necessary, but it
should never take the fast path when it was not supposed to.
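
Written out as a kernel-style sketch (illustrative only, showing the
!CONFIG_RCU_NOCB_CPU form of ->len, with the barriers that
rcu_segcblist_add_len() supplies written inline), the two orderings are:

/* Enqueue: rcu_barrier() must see ->len != 0 before it can see the CB. */
WRITE_ONCE(rsclp->len, rsclp->len + 1);		/* 0 -> 1 */
smp_mb();	/* Here the barrier *after* the ->len update does the work. */
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);

/* Dequeue: rcu_barrier() must not see ->len == 0 until the CB is gone. */
rhp = rcu_cblist_dequeue(&rcl);			/* detach and invoke CB */
smp_mb();	/* Here the barrier *before* the ->len update does the work. */
WRITE_ONCE(rsclp->len, rsclp->len - 1);		/* 1 -> 0 */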

Thoughts?

thanks,

 - Joel



> + *
> + *
> + * CASE 2: Suppose that CPU 0 is invoking its last callback just as CPU 1 invokes
> + * rcu_barrier().  CPU 0's ->len field will transition from 1->0, which is one
> + * of the transitions that must be handled carefully.  Without the full memory
> + * barriers after the ->len update and at the end of rcu_barrier(), the following
> + * could happen:
> + * 
> + * CPU 0				CPU 1
> + *
> + * start invoking last callback
> + * set ->len = 0 (reordered)
> + *                      		rcu_barrier() sees ->len as 0
> + *                      		rcu_barrier() does nothing.
> + *					module is unloaded
> + * callback executing after unloaded!
> + *
> + * With the full barriers, any case where rcu_barrier() sees ->len as 0
> + * will be fully ordered after the completion of the callback function,
> + * so that the module unloading operation is completely safe.
> + * 
>   */
>  void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
>  {
>  #ifdef CONFIG_RCU_NOCB_CPU
> -	smp_mb__before_atomic(); /* Up to the caller! */
> +	smp_mb__before_atomic(); // Read header comment above.
>  	atomic_long_add(v, &rsclp->len);
> -	smp_mb__after_atomic(); /* Up to the caller! */
> +	smp_mb__after_atomic();  // Read header comment above.
>  #else
> -	smp_mb(); /* Up to the caller! */
> +	smp_mb(); // Read header comment above.
>  	WRITE_ONCE(rsclp->len, rsclp->len + v);
> -	smp_mb(); /* Up to the caller! */
> +	smp_mb(); // Read header comment above.
>  #endif
>  }
>  

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-04 17:01   ` Paul E. McKenney
@ 2020-11-07  0:01     ` Joel Fernandes
  2020-11-07  0:18       ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2020-11-07  0:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 09:01:33AM -0800, Paul E. McKenney wrote:

> A casual reader might be forgiven for being confused by the combination
> of "Return" in the above comment and the "void" function type below.
> So shouldn't this comment be something like "Add the specified number
> of callbacks to the specified segment..."?

You are right, sorry and will fix it.

> > @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> >  
> >  	if (!rcu_segcblist_pend_cbs(rsclp))
> >  		return; /* Nothing to do. */
> > +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> 
> This should be a "for" loop.  Yes, the number and names of the segments
> hasn't changed for a good long time, but nothing like code as above to
> inspire Murphy to more mischief.  :-/
> 
> Actually, why not put the summation in the existing "for" loop below?
> That would save a line of code in addition to providing less inspiration
> for Mr. Murphy.

I can do that. Actually Frederic suggested the same thing but I was reluctant
as I felt it did not give much LOC benefit. Will revisit it.

> 
> >  	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
> >  	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
> >  	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> > -	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++)
> > +	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
> >  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
> > +		rcu_segcblist_set_seglen(rsclp, i, 0);
> > +	}
> >  }
> >  
> >  /*
> > @@ -345,7 +362,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
> >  				struct rcu_cblist *rclp)
> >  {
> >  	rcu_segcblist_add_len(rsclp, rclp->len);
> > -	rclp->len = 0;
> 
> You audited the callers, correct?

Yep.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-04 14:33       ` Frederic Weisbecker
@ 2020-11-07  0:05         ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-07  0:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, linux-kernel, Neeraj Upadhyay, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 03:33:14PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 04, 2020 at 06:08:07AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 03, 2020 at 04:17:31PM +0100, Frederic Weisbecker wrote:
> > > On Tue, Nov 03, 2020 at 09:26:00AM -0500, Joel Fernandes (Google) wrote:
> > > > +/*
> > > > > + * Return how many CBs each segment has, along with their gp_seq values.
> > > > + *
> > > > + * This function is O(N) where N is the number of segments. Only used from
> > > > + * tracing code which is usually disabled in production.
> > > > + */
> > > > +#ifdef CONFIG_RCU_TRACE
> > > > +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > > > +			 int cbcount[RCU_CBLIST_NSEGS],
> > > > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > > > +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> > > > +		gpseq[i] = rsclp->gp_seq[i];
> > > > +	}
> > > > +}
> > > > +
> > > > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > > > +{
> > > > +	int cbs[RCU_CBLIST_NSEGS];
> > > > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > > > +
> > > > +	if (!trace_rcu_segcb_stats_enabled())
> > > > +		return;
> > > 
> > > Yes, very good!
> > > 
> > > Paul just told me that RCU_TRACE can be used in production, which confirms
> > > that we want to avoid this loop of 8 iterations when tracing is disabled.
> > 
> > RCU's "don't try this in production" Kconfig option is PROVE_RCU.
> > 
> > I would be looking for checks that the sum of the segment lengths
> > match the overall ->len or checks that all of the segment lengths
> > are zero when ->cblist is empty to be guarded by something like
> > IS_ENABLED(CONFIG_PROVE_RCU).  Of course, checks of this sort need to
> > be confined to those portions of rcu_do_batch() that are excluding other
> > accesses to ->cblist.
> 
> Right.
> 
> > 
> > But if rcu_segcblist_countseq() is invoked only when a specific trace
> > event is enabled, it should be OK to have it guarded only by RCU_TRACE.
> 
> Indeed, so I think we are good.

Thanks, so the only changes are to patches 2/7 and 4/7, which I will work on.
1/7 was already taken by Paul. For 7/7, it sounds like I did not understand
Paul's reworks on the comments and we're still discussing it; but some
comments are better than none, so I am OK with Paul's version of it.

thanks,

 - Joel
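
As an aside, the consistency check that Paul describes earlier in this
thread (the segment lengths summing to the full ->len) might look
something like the sketch below.  rcu_segcblist_check_counts() is a
hypothetical helper, and it assumes the caller has excluded all other
accesses to the ->cblist being checked:

/* Hypothetical debug check: segment counts must sum to the full ->len. */
static void rcu_segcblist_check_counts(struct rcu_segcblist *rsclp)
{
	long sum = 0;
	int i;

	if (!IS_ENABLED(CONFIG_PROVE_RCU))
		return;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
		sum += rcu_segcblist_get_seglen(rsclp, i);
	WARN_ON_ONCE(sum != rcu_segcblist_n_cbs(rsclp));
}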


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-07  0:01     ` Joel Fernandes
@ 2020-11-07  0:18       ` Joel Fernandes
  2020-11-07  0:48         ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2020-11-07  0:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Fri, Nov 06, 2020 at 07:01:57PM -0500, Joel Fernandes wrote:
> On Wed, Nov 04, 2020 at 09:01:33AM -0800, Paul E. McKenney wrote:
> 
> > A casual reader might be forgiven for being confused by the combination
> > of "Return" in the above comment and the "void" function type below.
> > So shouldn't this comment be something like "Add the specified number
> > of callbacks to the specified segment..."?
> 
> You are right, sorry and will fix it.
> 
> > > @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> > >  
> > >  	if (!rcu_segcblist_pend_cbs(rsclp))
> > >  		return; /* Nothing to do. */
> > > +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> > 
> > This should be a "for" loop.  Yes, the number and names of the segments
> > hasn't changed for a good long time, but nothing like code as above to
> > inspire Murphy to more mischief.  :-/
> > 
> > Actually, why not put the summation in the existing "for" loop below?
> > That would save a line of code in addition to providing less inspiration
> > for Mr. Murphy.
> 
> I can do that. Actually Frederic suggested the same thing but I was reluctant
> as I felt it did not give much LOC benefit. Will revisit it.

It reduces 1 line of code :) I changed it to the below, will update the patch:

---8<-----------------------

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9b43d686b1f3..bff9b2253e50 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -101,7 +101,7 @@ static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long
 	WRITE_ONCE(rsclp->seglen[seg], v);
 }
 
-/* Return number of callbacks in a segment of the segmented callback list. */
+/* Increase the numeric length of a segment by a specified amount. */
 static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
 {
 	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
@@ -406,13 +406,12 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rcu_segcblist_pend_cbs(rsclp))
 		return; /* Nothing to do. */
-	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
-		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
-		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
+	rclp->len = 0;
 	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
 	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
 	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
 	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
+		rclp->len += rcu_segcblist_get_seglen(rsclp, i);
 		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
 		rcu_segcblist_set_seglen(rsclp, i, 0);
 	}

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s
  2020-11-05  3:57   ` Paul E. McKenney
@ 2020-11-07  0:26     ` Joel Fernandes
  2020-11-10  1:41       ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2020-11-07  0:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 04, 2020 at 07:57:13PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 03, 2020 at 09:26:02AM -0500, Joel Fernandes (Google) wrote:
> > This memory barrier is not needed as rcu_segcblist_add_len() already
> > includes a memory barrier *before* and *after* the length of the list is
> > updated.
> > 
> > Same reasoning for rcu_segcblist_enqueue().
> 
> I suggest a commit log like the following:
> 
> ------------------------------------------------------------------------
> 
> The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch()
> are not needed because rcu_segcblist_add_len(), and thus also
> rcu_segcblist_inc_len(), already includes a memory barrier *before*
> and *after* the length of the list is updated.
> 
> This commit therefore removes these redundant smp_mb() invocations.
> 
> ------------------------------------------------------------------------
> 
> Other than that, looks good!  I could hand-apply it, but that
> would just cause more churn with the addition of the call to
> rcu_segcblist_inc_seglen().  So could you please update the commit log
> when you repost, whether to the mailing list or from your git tree?

Done, I updated it in my tree. I will share the link to tree on IRC.

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/rcu_segcblist.c | 1 -
> >  kernel/rcu/tree.c          | 1 -
> >  2 files changed, 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index e9e72d72f7a6..d96272e8d604 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -268,7 +268,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
> >  			   struct rcu_head *rhp)
> >  {
> >  	rcu_segcblist_inc_len(rsclp);
> > -	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> >  	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
> >  	rhp->next = NULL;
> >  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index f6c6653b3ec2..fb2a5ac4a59c 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2525,7 +2525,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  
> >  	/* Update counts and requeue any remaining callbacks. */
> >  	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
> > -	smp_mb(); /* List handling before counting for rcu_barrier(). */
> >  	rcu_segcblist_add_len(&rdp->cblist, -count);
> >  
> >  	/* Reinstate batch limit if we have worked down the excess. */
> > -- 
> > 2.29.1.341.ge80a0c044ae-goog
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include
  2020-11-05 14:28     ` Paul E. McKenney
@ 2020-11-07  0:27       ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-07  0:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Thu, Nov 05, 2020 at 06:28:10AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 04, 2020 at 07:48:23PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 03, 2020 at 09:26:01AM -0500, Joel Fernandes (Google) wrote:
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > This one looks fine, but depends on the earlier "rcu/segcblist: Add
> > counters to segcblist datastructure" patch, which also changes the list
> > of #include directives for this file.  It manages to avoid conflicting
> > with "rcu/trace: Add tracing for how segcb list changes", despite this
> > one also changing the #include directives.
> > 
> > I am testing it just out of curiosity, but it might make more sense
> > to fold this one into "rcu/segcblist: Add counters to segcblist
> > datastructure".
> 
> And it does pass light rcutorture.  ;-)

Cool, I squashed it into 2/7 and updated my tree.

thanks,

 - Joel

> 							Thanx, Paul
> 
> > > ---
> > >  kernel/rcu/rcu_segcblist.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > > index 2a03949d0b82..e9e72d72f7a6 100644
> > > --- a/kernel/rcu/rcu_segcblist.c
> > > +++ b/kernel/rcu/rcu_segcblist.c
> > > @@ -10,7 +10,6 @@
> > >  #include <linux/cpu.h>
> > >  #include <linux/interrupt.h>
> > >  #include <linux/kernel.h>
> > > -#include <linux/rcupdate.h>
> > >  #include <linux/types.h>
> > >  
> > >  #include "rcu_segcblist.h"
> > > -- 
> > > 2.29.1.341.ge80a0c044ae-goog
> > > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-07  0:18       ` Joel Fernandes
@ 2020-11-07  0:48         ` Paul E. McKenney
  2020-11-10  1:39           ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-07  0:48 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Fri, Nov 06, 2020 at 07:18:47PM -0500, Joel Fernandes wrote:
> On Fri, Nov 06, 2020 at 07:01:57PM -0500, Joel Fernandes wrote:
> > On Wed, Nov 04, 2020 at 09:01:33AM -0800, Paul E. McKenney wrote:
> > 
> > > A casual reader might be forgiven for being confused by the combination
> > > of "Return" in the above comment and the "void" function type below.
> > > So shouldn't this comment be something like "Add the specified number
> > > of callbacks to the specified segment..."?
> > 
> > You are right, sorry and will fix it.
> > 
> > > > @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> > > >  
> > > >  	if (!rcu_segcblist_pend_cbs(rsclp))
> > > >  		return; /* Nothing to do. */
> > > > +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> > > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> > > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> > > 
> > > This should be a "for" loop.  Yes, the number and names of the segments
> > > hasn't changed for a good long time, but nothing like code as above to
> > > inspire Murphy to more mischief.  :-/
> > > 
> > > Actually, why not put the summation in the existing "for" loop below?
> > > That would save a line of code in addition to providing less inspiration
> > > for Mr. Murphy.
> > 
> > I can do that. Actually Frederic suggested the same thing but I was reluctant
> > as I felt it did not give much LOC benefit. Will revisit it.
> 
> It reduces 1 line of code :) I changed it to the below, will update the patch:

Thank you!  And yes, I am much more concerned about the constraints on
Mr. Murphy than on the lines of code.  ;-)

							Thanx, Paul

> ---8<-----------------------
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9b43d686b1f3..bff9b2253e50 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -101,7 +101,7 @@ static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long
>  	WRITE_ONCE(rsclp->seglen[seg], v);
>  }
>  
> -/* Return number of callbacks in a segment of the segmented callback list. */
> +/* Increase the numeric length of a segment by a specified amount. */
>  static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
>  {
>  	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> @@ -406,13 +406,12 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
>  
>  	if (!rcu_segcblist_pend_cbs(rsclp))
>  		return; /* Nothing to do. */
> -	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> -		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> -		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> +	rclp->len = 0;
>  	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
>  	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
>  	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
>  	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
> +		rclp->len += rcu_segcblist_get_seglen(rsclp, i);
>  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
>  		rcu_segcblist_set_seglen(rsclp, i, 0);
>  	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb()
  2020-11-06 22:41     ` Joel Fernandes
@ 2020-11-10  1:28       ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-10  1:28 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Fri, Nov 06, 2020 at 05:41:41PM -0500, Joel Fernandes wrote:
> On Thu, Nov 05, 2020 at 10:55:51AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 03, 2020 at 09:26:03AM -0500, Joel Fernandes (Google) wrote:
> > > Memory barriers are needed when updating the full length of the
> > > segcblist; however, it is not fully clear why one is needed before and
> > > after. This patch therefore adds additional comments to the function
> > > header to explain it.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Looks good, thank you!  As always, I could not resist the urge to
> > do a bit of wordsmithing, so that the queued commit is as shown
> > below.  Please let me know if I messed anything up.
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 7dac7adefcae7558b3a85a16f51186d621623733
> > Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Date:   Tue Nov 3 09:26:03 2020 -0500
> > 
> >     rcu/segcblist: Add additional comments to explain smp_mb()
> >     
> >     One counter-intuitive property of RCU is the fact that full memory
> >     barriers are needed both before and after updates to the full
> >     (non-segmented) length.  This patch therefore assists the
> >     reader's intuition by adding appropriate comments.
> >     
> >     [ paulmck:  Wordsmithing. ]
> >     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index bb246d8..b6dda7c 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -94,17 +94,77 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
> >   * field to disagree with the actual number of callbacks on the structure.
> >   * This increase is fully ordered with respect to the callers accesses
> >   * both before and after.
> > + *
> > + * So why on earth is a memory barrier required both before and after
> > + * the update to the ->len field???
> > + *
> > + * The reason is that rcu_barrier() locklessly samples each CPU's ->len
> > + * field, and if a given CPU's field is zero, avoids IPIing that CPU.
> > + * This can of course race with both queuing and invoking of callbacks.
> > + * Failing to correctly handle either of these races could result in
> > + * rcu_barrier() failing to IPI a CPU that actually had callbacks queued
> > + * which rcu_barrier() was obligated to wait on.  And if rcu_barrier()
> > + * failed to wait on such a callback, unloading certain kernel modules
> > + * would result in calls to functions whose code was no longer present in
> > + * the kernel, for but one example.
> > + *
> > + * Therefore, ->len transitions from 1->0 and 0->1 have to be carefully
> > + * ordered with respect to both list modifications and the rcu_barrier().
> > + *
> > + * The queuing case is CASE 1 and the invoking case is CASE 2.
> > + *
> > + * CASE 1: Suppose that CPU 0 has no callbacks queued, but invokes
> > + * call_rcu() just as CPU 1 invokes rcu_barrier().  CPU 0's ->len field
> > + * will transition from 0->1, which is one of the transitions that must be
> > + * handled carefully.  Without the full memory barriers before the ->len
> > + * update and at the beginning of rcu_barrier(), the following could happen:
> > + *
> > + * CPU 0				CPU 1
> > + *
> > + * call_rcu().
> > + *                      		rcu_barrier() sees ->len as 0.
> > + * set ->len = 1.
> > + *                      		rcu_barrier() does nothing.
> > + *					module is unloaded.
> > + * callback invokes unloaded function!
> > + *
> > + * With the full barriers, any case where rcu_barrier() sees ->len as 0 will
> > + * have unambiguously preceded the return from the racing call_rcu(), which
> > + * means that this call_rcu() invocation is OK to not wait on.  After all,
> > + * you are supposed to make sure that any problematic call_rcu() invocations
> > + * happen before the rcu_barrier().
> 
> Unfortunately, I did not understand your explanation. To me, the barrier
> *before* the setting of the length is needed on CPU 0 only for the 1->0
> transition (dequeue), whereas in your example above it is for the enqueue.
> 
> This was case 1 in my patch:
> 
> + * To illustrate the problematic scenario to avoid:
> + * P0 (what P1 sees)	P1
> + * set len = 0
> + *                      rcu_barrier sees len as 0
> + * dequeue from list
> + *                      rcu_barrier does nothing.
> + *
> 
> 
> Here, P1 should see the transition of 1->0 *after* the CB is dequeued, which
> means you need a memory barrier *before* the setting of len from 1->0 and
> *after* the dequeue. IOW, rcu_barrier should 'see' the memory ordering as:
> 
> 1. dequeue
> 2. set len from 1 -> 0.
> 
> For the enqueue case, it is the reverse, rcu_barrier should see:
> 1. set len from 0 -> 1
> 2. enqueue
> 
> Either way, the point I was trying to make is that the length should
> always be seen as non-zero if the list is non-empty. Basically,
> rcu_barrier() should never take the fast path if the list is non-empty.
> Worst case, it might take the slow path when it is not necessary, but it
> should never take the fast path when it was not supposed to.
> 
> Thoughts?

Right you are!  I reversed the before/after associated with ->len.
I will fix this.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> 
> > + *
> > + *
> > + * CASE 2: Suppose that CPU 0 is invoking its last callback just as CPU 1 invokes
> > + * rcu_barrier().  CPU 0's ->len field will transition from 1->0, which is one
> > + * of the transitions that must be handled carefully.  Without the full memory
> > + * barriers after the ->len update and at the end of rcu_barrier(), the following
> > + * could happen:
> > + * 
> > + * CPU 0				CPU 1
> > + *
> > + * start invoking last callback
> > + * set ->len = 0 (reordered)
> > + *                      		rcu_barrier() sees ->len as 0
> > + *                      		rcu_barrier() does nothing.
> > + *					module is unloaded
> > + * callback executing after unloaded!
> > + *
> > + * With the full barriers, any case where rcu_barrier() sees ->len as 0
> > + * will be fully ordered after the completion of the callback function,
> > + * so that the module unloading operation is completely safe.
> > + * 
> >   */
> >  void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> >  {
> >  #ifdef CONFIG_RCU_NOCB_CPU
> > -	smp_mb__before_atomic(); /* Up to the caller! */
> > +	smp_mb__before_atomic(); // Read header comment above.
> >  	atomic_long_add(v, &rsclp->len);
> > -	smp_mb__after_atomic(); /* Up to the caller! */
> > +	smp_mb__after_atomic();  // Read header comment above.
> >  #else
> > -	smp_mb(); /* Up to the caller! */
> > +	smp_mb(); // Read header comment above.
> >  	WRITE_ONCE(rsclp->len, rsclp->len + v);
> > -	smp_mb(); /* Up to the caller! */
> > +	smp_mb(); // Read header comment above.
> >  #endif
> >  }
> >  

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure
  2020-11-07  0:48         ` Paul E. McKenney
@ 2020-11-10  1:39           ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-10  1:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Fri, Nov 06, 2020 at 04:48:15PM -0800, Paul E. McKenney wrote:
> On Fri, Nov 06, 2020 at 07:18:47PM -0500, Joel Fernandes wrote:
> > On Fri, Nov 06, 2020 at 07:01:57PM -0500, Joel Fernandes wrote:
> > > On Wed, Nov 04, 2020 at 09:01:33AM -0800, Paul E. McKenney wrote:
> > > 
> > > > A casual reader might be forgiven for being confused by the combination
> > > > of "Return" in the above comment and the "void" function type below.
> > > > So shouldn't this comment be something like "Add the specified number
> > > > of callbacks to the specified segment..."?
> > > 
> > > You are right, sorry and will fix it.
> > > 
> > > > > @@ -330,11 +342,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> > > > >  
> > > > >  	if (!rcu_segcblist_pend_cbs(rsclp))
> > > > >  		return; /* Nothing to do. */
> > > > > +	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> > > > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> > > > > +		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> > > > 
> > > > This should be a "for" loop.  Yes, the number and names of the segments
> > > > hasn't changed for a good long time, but nothing like code as above to
> > > > inspire Murphy to more mischief.  :-/
> > > > 
> > > > Actually, why not put the summation in the existing "for" loop below?
> > > > That would save a line of code in addition to providing less inspiration
> > > > for Mr. Murphy.
> > > 
> > > I can do that. Actually Frederic suggested the same thing but I was reluctant
> > > as I felt it did not give much LOC benefit. Will revisit it.
> > 
> > It reduces 1 line of code :) I changed it to the below, will update the patch:
> 
> Thank you!  And yes, I am much more concerned about the constraints on
> Mr. Murphy than on the lines of code.  ;-)

And I have pulled in the updated commit, thank you all!

							Thanx, Paul

> > ---8<-----------------------
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 9b43d686b1f3..bff9b2253e50 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -101,7 +101,7 @@ static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long
> >  	WRITE_ONCE(rsclp->seglen[seg], v);
> >  }
> >  
> > -/* Return number of callbacks in a segment of the segmented callback list. */
> > +/* Increase the numeric length of a segment by a specified amount. */
> >  static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> >  {
> >  	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> > @@ -406,13 +406,12 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
> >  
> >  	if (!rcu_segcblist_pend_cbs(rsclp))
> >  		return; /* Nothing to do. */
> > -	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
> > -		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
> > -		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
> > +	rclp->len = 0;
> >  	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
> >  	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
> >  	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
> >  	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
> > +		rclp->len += rcu_segcblist_get_seglen(rsclp, i);
> >  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
> >  		rcu_segcblist_set_seglen(rsclp, i, 0);
> >  	}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s
  2020-11-07  0:26     ` Joel Fernandes
@ 2020-11-10  1:41       ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-10  1:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Frederic Weisbecker, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, rcu, Steven Rostedt,
	Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Fri, Nov 06, 2020 at 07:26:04PM -0500, Joel Fernandes wrote:
> On Wed, Nov 04, 2020 at 07:57:13PM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 03, 2020 at 09:26:02AM -0500, Joel Fernandes (Google) wrote:
> > > This memory barrier is not needed as rcu_segcblist_add_len() already
> > > includes a memory barrier *before* and *after* the length of the list is
> > > updated.
> > > 
> > > Same reasoning for rcu_segcblist_enqueue().
> > 
> > I suggest a commit log like the following:
> > 
> > ------------------------------------------------------------------------
> > 
> > The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch()
> > are not needed because rcu_segcblist_add_len(), and thus also
> > rcu_segcblist_inc_len(), already includes a memory barrier *before*
> > and *after* the length of the list is updated.
> > 
> > This commit therefore removes these redundant smp_mb() invocations.
> > 
> > ------------------------------------------------------------------------
> > 
> > Other than that, looks good!  I could hand-apply it, but that
> > would just cause more churn with the addition of the call to
> > rcu_segcblist_inc_seglen().  So could you please update the commit log
> > when you repost, whether to the mailing list or from your git tree?
> 
> Done, I updated it in my tree. I will share the link to tree on IRC.

And I have pulled this in, thank you!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 
> > 							Thanx, Paul
> > 
> > > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/rcu_segcblist.c | 1 -
> > >  kernel/rcu/tree.c          | 1 -
> > >  2 files changed, 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > > index e9e72d72f7a6..d96272e8d604 100644
> > > --- a/kernel/rcu/rcu_segcblist.c
> > > +++ b/kernel/rcu/rcu_segcblist.c
> > > @@ -268,7 +268,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
> > >  			   struct rcu_head *rhp)
> > >  {
> > >  	rcu_segcblist_inc_len(rsclp);
> > > -	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> > >  	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
> > >  	rhp->next = NULL;
> > >  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index f6c6653b3ec2..fb2a5ac4a59c 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2525,7 +2525,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
> > >  
> > >  	/* Update counts and requeue any remaining callbacks. */
> > >  	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
> > > -	smp_mb(); /* List handling before counting for rcu_barrier(). */
> > >  	rcu_segcblist_add_len(&rdp->cblist, -count);
> > >  
> > >  	/* Reinstate batch limit if we have worked down the excess. */
> > > -- 
> > > 2.29.1.341.ge80a0c044ae-goog
> > > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
  2020-11-03 15:17   ` Frederic Weisbecker
  2020-11-04 13:40   ` Neeraj Upadhyay
@ 2020-11-11  0:35   ` Paul E. McKenney
  2020-11-11 13:49     ` Steven Rostedt
  2 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2020-11-11  0:35 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, Nov 03, 2020 at 09:26:00AM -0500, Joel Fernandes (Google) wrote:
> Track how the segcb list changes before/after acceleration, during
> queuing and during dequeuing.
> 
> This has proved useful in discovering an optimization that avoids unwanted
> GP requests when no callbacks have been accelerated. The overhead is
> minimal, as each segment's length is now stored in the respective segment.
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I thought I had responded to this one, but I find no record of having
done so.  But please see below for some questions and comments.

> ---
>  include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
>  kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
>  kernel/rcu/rcu_segcblist.h |  5 +++++
>  kernel/rcu/tree.c          |  9 +++++++++
>  4 files changed, 73 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 155b5cb43cfd..5f8f2ee1a936 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
>  		  __entry->qlen)
>  );
>  
> +TRACE_EVENT_RCU(rcu_segcb_stats,
> +
> +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> +
> +		TP_ARGS(ctx, cb_count, gp_seq),
> +
> +		TP_STRUCT__entry(
> +			__field(const char *, ctx)
> +			__array(int, cb_count, RCU_CBLIST_NSEGS)
> +			__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
> +		),
> +
> +		TP_fast_assign(
> +			__entry->ctx = ctx;
> +			memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(int));
> +			memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));

OK, I will bite...

Why not include linux/rcu_segcblist.h and then fold all of
rcu_segcblist_countseq() and much of __trace_rcu_segcb_stats() into
this TP_fast_assign()?  If you must have "unsigned int" for cb_count,
then you can use "do {} while (0)" to declare the index variable, but
if you can use unsigned long as suggested by Neeraj, you can just use
a pair of memcpy()s.

> +		),
> +
> +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> +
> +);
> +
>  /*
>   * Tracepoint for the registration of a single RCU callback of the special
>   * kvfree() form.  The first argument is the RCU type, the second argument
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 357c19bbcb00..2a03949d0b82 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -14,6 +14,7 @@
>  #include <linux/types.h>
>  
>  #include "rcu_segcblist.h"
> +#include "rcu.h"
>  
>  /* Initialize simple callback list. */
>  void rcu_cblist_init(struct rcu_cblist *rclp)
> @@ -328,6 +329,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
>  }
>  
> +/*
> + * Return how many CBs each segment has, along with their gp_seq values.
> + *
> + * This function is O(N) where N is the number of segments. Only used from
> + * tracing code which is usually disabled in production.
> + */
> +#ifdef CONFIG_RCU_TRACE
> +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> +{
> +	int i;
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> +		gpseq[i] = rsclp->gp_seq[i];
> +	}
> +}
> +
> +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> +{
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
> +
> +	if (!trace_rcu_segcb_stats_enabled())
> +		return;

Can't you rely on the trace system to enable and disable this trace
event?  If the thought is to save instructions, then moving all this
into TP_fast_assign() enables the trace system to deal with that as well.

> +	rcu_segcblist_countseq(rsclp, cbs, gps);
> +
> +	trace_rcu_segcb_stats(context, cbs, gps);
> +}
> +#endif
> +
>  /*
>   * Extract only those callbacks still pending (not yet ready to be
>   * invoked) from the specified rcu_segcblist structure and place them in
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index cd35c9faaf51..7750734fa116 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
>  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
>  void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>  			 struct rcu_segcblist *src_rsclp);
> +#ifdef CONFIG_RCU_TRACE
> +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context);
> +#else
> +#define __trace_rcu_segcb_stats(...)
> +#endif

And letting TP_fast_assign() do the work allows this to go away...

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 24c00020ab83..f6c6653b3ec2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1497,6 +1497,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
>  		return false;
>  
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPreAcc"));

...and you don't need the "__" everywhere.
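
(Each call site would then reduce to a direct event call, along the
lines of:

	trace_rcu_segcb_stats(TPS("SegCbPreAcc"), &rdp->cblist);

given the TP_PROTO() sketched earlier.)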

So what am I missing here?

						Thanx, Paul

> +
>  	/*
>  	 * Callbacks are often registered with incomplete grace-period
>  	 * information.  Something about the fact that getting exact
> @@ -1517,6 +1519,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  	else
>  		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
>  
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
> +
>  	return ret;
>  }
>  
> @@ -2466,11 +2470,14 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
>  	if (offloaded)
>  		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
> +
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
>  	rcu_nocb_unlock_irqrestore(rdp, flags);
>  
>  	/* Invoke callbacks. */
>  	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
>  	rhp = rcu_cblist_dequeue(&rcl);
> +
>  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>  		rcu_callback_t f;
>  
> @@ -2983,6 +2990,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>  		trace_rcu_callback(rcu_state.name, head,
>  				   rcu_segcblist_n_cbs(&rdp->cblist));
>  
> +	__trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbQueued"));
> +
>  	/* Go handle any RCU core processing required. */
>  	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
>  		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 


* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-11  0:35   ` Paul E. McKenney
@ 2020-11-11 13:49     ` Steven Rostedt
  2020-11-11 14:08       ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2020-11-11 13:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes (Google),
	linux-kernel, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Tue, 10 Nov 2020 16:35:30 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > +{
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > +
> > +	if (!trace_rcu_segcb_stats_enabled())
> > +		return;  
> 
> Can't you rely on the trace system to enable and disable this trace
> event?  If the thought is to save instructions, then moving all this
> into TP_fast_assign() enables the trace system to deal with that as well.
> 
> > +	rcu_segcblist_countseq(rsclp, cbs, gps);
> > +
> > +	trace_rcu_segcb_stats(context, cbs, gps);
> > +}
> > +#endif

Yeah, I agree with Paul. I think it is possible to move this all into the
TP_fast_assign. If you have trouble doing so, let me know.

-- Steve


* Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes
  2020-11-11 13:49     ` Steven Rostedt
@ 2020-11-11 14:08       ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2020-11-11 14:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Lai Jiangshan, Marco Elver,
	Mathieu Desnoyers, rcu, Uladzislau Rezki (Sony),
	fweisbec, neeraj.iitr10

On Wed, Nov 11, 2020 at 08:49:41AM -0500, Steven Rostedt wrote:
> On Tue, 10 Nov 2020 16:35:30 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > +void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
> > > +{
> > > +	int cbs[RCU_CBLIST_NSEGS];
> > > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > > +
> > > +	if (!trace_rcu_segcb_stats_enabled())
> > > +		return;  
> > 
> > Can't you rely on the trace system to enable and disable this trace
> > event?  If the thought is to save instructions, then moving all this
> > into TP_fast_assign() enables the trace system to deal with that as well.

Makes sense.

> > > +	rcu_segcblist_countseq(rsclp, cbs, gps);
> > > +
> > > +	trace_rcu_segcb_stats(context, cbs, gps);
> > > +}
> > > +#endif
> 
> Yeah, I agree with Paul. I think it is possible to move this all into the
> TP_fast_assign. If you have trouble doing so, let me know.

Sure. Last time I tried this for this patch, I ran into an issue. I will
try again and let you know if I need help.
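
(If the trouble was header visibility, my guess -- speculating here --
is that include/trace/events/rcu.h just needs something like:

	#include <linux/rcu_segcblist.h>

near its other includes, so that struct rcu_segcblist and
RCU_CBLIST_NSEGS are visible to the event definition.)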

thanks,

 - Joel

> 
> -- Steve


end of thread

Thread overview: 40+ messages
2020-11-03 14:25 [PATCH v9 0/7] Add support for length of each segment in the segcblist Joel Fernandes (Google)
2020-11-03 14:25 ` [PATCH v9 1/7] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
2020-11-04  0:05   ` Paul E. McKenney
2020-11-04 15:09     ` Joel Fernandes
2020-11-03 14:25 ` [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
2020-11-04 13:37   ` Neeraj Upadhyay
2020-11-04 17:01   ` Paul E. McKenney
2020-11-07  0:01     ` Joel Fernandes
2020-11-07  0:18       ` Joel Fernandes
2020-11-07  0:48         ` Paul E. McKenney
2020-11-10  1:39           ` Paul E. McKenney
2020-11-03 14:25 ` [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment Joel Fernandes (Google)
2020-11-03 14:47   ` Frederic Weisbecker
2020-11-03 14:56     ` Joel Fernandes
2020-11-03 15:07     ` Joel Fernandes
2020-11-03 15:18       ` Paul E. McKenney
2020-11-03 15:19       ` Frederic Weisbecker
2020-11-04 13:37   ` Neeraj Upadhyay
2020-11-03 14:26 ` [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
2020-11-03 15:17   ` Frederic Weisbecker
2020-11-04 14:08     ` Paul E. McKenney
2020-11-04 14:33       ` Frederic Weisbecker
2020-11-07  0:05         ` Joel Fernandes
2020-11-04 13:40   ` Neeraj Upadhyay
2020-11-04 15:05     ` Joel Fernandes
2020-11-11  0:35   ` Paul E. McKenney
2020-11-11 13:49     ` Steven Rostedt
2020-11-11 14:08       ` Joel Fernandes
2020-11-03 14:26 ` [PATCH v9 5/7] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
2020-11-05  3:48   ` Paul E. McKenney
2020-11-05 14:28     ` Paul E. McKenney
2020-11-07  0:27       ` Joel Fernandes
2020-11-03 14:26 ` [PATCH v9 6/7] rcu/tree: segcblist: Remove redundant smp_mb()s Joel Fernandes (Google)
2020-11-05  3:57   ` Paul E. McKenney
2020-11-07  0:26     ` Joel Fernandes
2020-11-10  1:41       ` Paul E. McKenney
2020-11-03 14:26 ` [PATCH v9 7/7] rcu/segcblist: Add additional comments to explain smp_mb() Joel Fernandes (Google)
2020-11-05 18:55   ` Paul E. McKenney
2020-11-06 22:41     ` Joel Fernandes
2020-11-10  1:28       ` Paul E. McKenney
