* [PATCH v6 0/4] Add support for length of each segment in the segcblist
@ 2020-09-23 15:22 Joel Fernandes (Google)
  2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Joel Fernandes (Google) @ 2020-09-23 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)


This is required for several identified use cases, one of them being tracing how
the segmented callback list changes. Tracing this has identified issues in RCU
code in the past.

From Paul:
Another use case is of course more accurately determining whether a given CPU's
large pile of callbacks can be best served by making grace periods go faster,
invoking callbacks more vigorously, or both.  It should also be possible to
simplify some of the callback handling a bit, given that some of the unnatural
acts are due to there having been no per-batch counts.

Revision history:
v6: Fixed TREE04, and restored older logic to ensure rcu_barrier works.

v5: Various changes, bug fixes. Discovery of rcu_barrier issue.

v4: Restructured rcu_do_batch() and segcblist merging to avoid issues.
    Fixed minor nit from Davidlohr.
v1->v3: minor nits.
(https://lore.kernel.org/lkml/20200719034210.2382053-1-joel@joelfernandes.org/)

Joel Fernandes (Google) (4):
rcu/tree: Make rcu_do_batch count how many callbacks were executed
rcu/segcblist: Add counters to segcblist datastructure
rcu/trace: Add tracing for how segcb list changes
rcu/segcblist: Remove useless rcupdate.h include

include/linux/rcu_segcblist.h |   2 +
include/trace/events/rcu.h    |  25 ++++++
kernel/rcu/rcu_segcblist.c    | 161 +++++++++++++++++++++++++---------
kernel/rcu/rcu_segcblist.h    |   8 +-
kernel/rcu/tree.c             |  18 ++--
5 files changed, 165 insertions(+), 49 deletions(-)

--
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
@ 2020-09-23 15:22 ` Joel Fernandes (Google)
  2020-10-09 23:14   ` Frederic Weisbecker
  2020-10-14 15:06   ` Neeraj Upadhyay
  2020-09-23 15:22 ` [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Joel Fernandes (Google) @ 2020-09-23 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Currently, rcu_do_batch() depends on the unsegmented callback list's len field
to know how many CBs are executed. This fields counts down from 0 as CBs are
dequeued.  It is possible that all CBs could not be run because of reaching
limits in which case the remaining unexecuted callbacks are requeued in the
CPU's segcblist.

The number of callbacks that were not requeued are then the negative count (how
many CBs were run) stored in the rcl->len which has been counting down on every
dequeue. This negative count is then added to the per-cpu segmented callback
list's to correct its count.

Such a design works against future efforts to track the length of each segment
of the segmented callback list. The reason is because
rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
list's length field (rcl->len) during extraction.

Also, the design of counting down from 0 is confusing and error-prone IMHO.

This commit therefore explicitly counts have many callbacks were executed in
rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
field, without relying on the negativity of rcl->len.
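
For illustration, a simplified sketch of the new flow (not the literal kernel
code: batch limits, locking, debug-objects handling and tracing are omitted,
and the helper name is made up):

/* Invoke the ready callbacks, counting them explicitly on the way. */
static void rcu_do_batch_sketch(struct rcu_data *rdp)
{
	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
	struct rcu_head *rhp;
	long count = 0;

	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
	for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
		count++;		/* explicit count, instead of -rcl.len */
		rhp->func(rhp);		/* invoke the callback */
	}
	/* Requeue whatever was not invoked, then fix up the list length. */
	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
	rcu_segcblist_add_len(&rdp->cblist, -count);
}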

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 2 +-
 kernel/rcu/rcu_segcblist.h | 1 +
 kernel/rcu/tree.c          | 9 ++++-----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2d2a6b6b9dfb..bb246d8c6ef1 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
  * This increase is fully ordered with respect to the callers accesses
  * both before and after.
  */
-static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
 {
 #ifdef CONFIG_RCU_NOCB_CPU
 	smp_mb__before_atomic(); /* Up to the caller! */
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8..b90725f81d77 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
 }
 
 void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
 void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7623128d0020..50af465729f4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2427,7 +2427,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 			       rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-	long bl, count;
+	long bl, count = 0;
 	long pending, tlimit = 0;
 
 	/* If no callbacks are ready, just return. */
@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
+		count++;
 		debug_rcu_head_unqueue(rhp);
 
 		rcu_lock_acquire(&rcu_callback_map);
@@ -2485,9 +2486,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 		/*
 		 * Stop only if limit reached and CPU has something to do.
-		 * Note: The rcl structure counts down from zero.
 		 */
-		if (-rcl.len >= bl && !offloaded &&
+		if (count >= bl && !offloaded &&
 		    (need_resched() ||
 		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
 			break;
@@ -2510,7 +2510,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 
 	local_irq_save(flags);
 	rcu_nocb_lock(rdp);
-	count = -rcl.len;
 	rdp->n_cbs_invoked += count;
 	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
 			    is_idle_task(current), rcu_is_callbacks_kthread());
@@ -2518,7 +2517,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Update counts and requeue any remaining callbacks. */
 	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
 	smp_mb(); /* List handling before counting for rcu_barrier(). */
-	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
+	rcu_segcblist_add_len(&rdp->cblist, -count);
 
 	/* Reinstate batch limit if we have worked down the excess. */
 	count = rcu_segcblist_n_cbs(&rdp->cblist);
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure
  2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
  2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
@ 2020-09-23 15:22 ` Joel Fernandes (Google)
  2020-10-12 23:20   ` Frederic Weisbecker
  2020-09-23 15:22 ` [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Joel Fernandes (Google) @ 2020-09-23 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Add counting of segment lengths of the segmented callback list.

This will be useful for a number of things, such as knowing how big the
ready-to-execute segment has gotten. The immediate benefit is the ability
to trace how the callbacks in the segmented callback list change.

Also, this patch removes hacks related to using the donecbs's ->len field as a
temporary variable to save the segmented callback list's length. This cannot be
done anymore and is not needed.
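
As a rough mental model (a sketch of the intent only, not the exact layout or
the atomic/non-atomic variants; see the diff below for the real code):

	struct rcu_segcblist {
		...
		long len;			/* total, across all segments */
		long seglen[RCU_CBLIST_NSEGS];	/* new: one count per segment */
		...
	};

Ignoring transient windows while the list is being updated, the intent is that
the per-segment counts sum to the existing ->len:

	len == seglen[RCU_DONE_TAIL] + seglen[RCU_WAIT_TAIL] +
	       seglen[RCU_NEXT_READY_TAIL] + seglen[RCU_NEXT_TAIL]

and that a segment's count moves along with its callbacks whenever
rcu_segcblist_advance() or rcu_segcblist_accelerate() shuffles segments.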

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_segcblist.h |   2 +
 kernel/rcu/rcu_segcblist.c    | 124 +++++++++++++++++++++++-----------
 kernel/rcu/rcu_segcblist.h    |   2 -
 3 files changed, 86 insertions(+), 42 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index b36afe7b22c9..d462ae5e340a 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -69,8 +69,10 @@ struct rcu_segcblist {
 	unsigned long gp_seq[RCU_CBLIST_NSEGS];
 #ifdef CONFIG_RCU_NOCB_CPU
 	atomic_long_t len;
+	atomic_long_t seglen[RCU_CBLIST_NSEGS];
 #else
 	long len;
+	long seglen[RCU_CBLIST_NSEGS];
 #endif
 	u8 enabled;
 	u8 offloaded;
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index bb246d8c6ef1..0e6d19bd3de9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -88,6 +88,62 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
 #endif
 }
 
+/* Get the length of a segment of the rcu_segcblist structure. */
+static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+	return atomic_long_read(&rsclp->seglen[seg]);
+#else
+	return READ_ONCE(rsclp->seglen[seg]);
+#endif
+}
+
+/* Set the length of a segment of the rcu_segcblist structure. */
+static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long v)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+	atomic_long_set(&rsclp->seglen[seg], v);
+#else
+	WRITE_ONCE(rsclp->seglen[seg], v);
+#endif
+}
+
+/* Return number of callbacks in a segment of the segmented callback list. */
+static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
+{
+#ifdef CONFIG_RCU_NOCB_CPU
+	smp_mb__before_atomic(); /* Up to the caller! */
+	atomic_long_add(v, &rsclp->seglen[seg]);
+	smp_mb__after_atomic(); /* Up to the caller! */
+#else
+	smp_mb(); /* Up to the caller! */
+	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
+	smp_mb(); /* Up to the caller! */
+#endif
+}
+
+/* Move from's segment length to to's segment. */
+static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
+{
+	long len;
+
+	if (from == to)
+		return;
+
+	len = rcu_segcblist_get_seglen(rsclp, from);
+	if (!len)
+		return;
+
+	rcu_segcblist_add_seglen(rsclp, to, len);
+	rcu_segcblist_set_seglen(rsclp, from, 0);
+}
+
+/* Increment segment's length. */
+static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+	rcu_segcblist_add_seglen(rsclp, seg, 1);
+}
+
 /*
  * Increase the numeric length of an rcu_segcblist structure by the
  * specified amount, which can be negative.  This can cause the ->len
@@ -119,26 +175,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
 	rcu_segcblist_add_len(rsclp, 1);
 }
 
-/*
- * Exchange the numeric length of the specified rcu_segcblist structure
- * with the specified value.  This can cause the ->len field to disagree
- * with the actual number of callbacks on the structure.  This exchange is
- * fully ordered with respect to the callers accesses both before and after.
- */
-static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
-{
-#ifdef CONFIG_RCU_NOCB_CPU
-	return atomic_long_xchg(&rsclp->len, v);
-#else
-	long ret = rsclp->len;
-
-	smp_mb(); /* Up to the caller! */
-	WRITE_ONCE(rsclp->len, v);
-	smp_mb(); /* Up to the caller! */
-	return ret;
-#endif
-}
-
 /*
  * Initialize an rcu_segcblist structure.
  */
@@ -149,8 +185,10 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
 	BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
 	BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
 	rsclp->head = NULL;
-	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
 		rsclp->tails[i] = &rsclp->head;
+		rcu_segcblist_set_seglen(rsclp, i, 0);
+	}
 	rcu_segcblist_set_len(rsclp, 0);
 	rsclp->enabled = 1;
 }
@@ -245,6 +283,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp)
 {
 	rcu_segcblist_inc_len(rsclp);
+	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
 	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
 	rhp->next = NULL;
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
@@ -274,27 +313,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
 		if (rsclp->tails[i] != rsclp->tails[i - 1])
 			break;
+	rcu_segcblist_inc_seglen(rsclp, i);
 	WRITE_ONCE(*rsclp->tails[i], rhp);
 	for (; i <= RCU_NEXT_TAIL; i++)
 		WRITE_ONCE(rsclp->tails[i], &rhp->next);
 	return true;
 }
 
-/*
- * Extract only the counts from the specified rcu_segcblist structure,
- * and place them in the specified rcu_cblist structure.  This function
- * supports both callback orphaning and invocation, hence the separation
- * of counts and callbacks.  (Callbacks ready for invocation must be
- * orphaned and adopted separately from pending callbacks, but counts
- * apply to all callbacks.  Locking must be used to make sure that
- * both orphaned-callbacks lists are consistent.)
- */
-void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
-					       struct rcu_cblist *rclp)
-{
-	rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
-}
-
 /*
  * Extract only those callbacks ready to be invoked from the specified
  * rcu_segcblist structure and place them in the specified rcu_cblist
@@ -307,6 +332,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rcu_segcblist_ready_cbs(rsclp))
 		return; /* Nothing to do. */
+	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL);
 	*rclp->tail = rsclp->head;
 	WRITE_ONCE(rsclp->head, *rsclp->tails[RCU_DONE_TAIL]);
 	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
@@ -314,6 +340,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
 		if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL])
 			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
+	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
 }
 
 /*
@@ -330,11 +357,16 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rcu_segcblist_pend_cbs(rsclp))
 		return; /* Nothing to do. */
+	rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_WAIT_TAIL) +
+		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_READY_TAIL) +
+		    rcu_segcblist_get_seglen(rsclp, RCU_NEXT_TAIL);
 	*rclp->tail = *rsclp->tails[RCU_DONE_TAIL];
 	rclp->tail = rsclp->tails[RCU_NEXT_TAIL];
 	WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL);
-	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++)
+	for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) {
 		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
+		rcu_segcblist_set_seglen(rsclp, i, 0);
+	}
 }
 
 /*
@@ -345,7 +377,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
 				struct rcu_cblist *rclp)
 {
 	rcu_segcblist_add_len(rsclp, rclp->len);
-	rclp->len = 0;
 }
 
 /*
@@ -359,6 +390,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
 
 	if (!rclp->head)
 		return; /* No callbacks to move. */
+	rcu_segcblist_add_seglen(rsclp, RCU_DONE_TAIL, rclp->len);
 	*rclp->tail = rsclp->head;
 	WRITE_ONCE(rsclp->head, rclp->head);
 	for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++)
@@ -379,6 +411,8 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
 {
 	if (!rclp->head)
 		return; /* Nothing to do. */
+
+	rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len);
 	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
 	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
 }
@@ -403,6 +437,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
 			break;
 		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
+		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
 	}
 
 	/* If no callbacks moved, nothing more need be done. */
@@ -423,6 +458,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
 		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
 			break;  /* No more callbacks. */
 		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
+		rcu_segcblist_move_seglen(rsclp, i, j);
 		rsclp->gp_seq[j] = rsclp->gp_seq[i];
 	}
 }
@@ -444,7 +480,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
  */
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 {
-	int i;
+	int i, j;
 
 	WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp));
 	if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL))
@@ -487,6 +523,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
 		return false;
 
+	/* Accounting: everything below i is about to get merged into i. */
+	for (j = i + 1; j <= RCU_NEXT_TAIL; j++)
+		rcu_segcblist_move_seglen(rsclp, j, i);
+
 	/*
 	 * Merge all later callbacks, including newly arrived callbacks,
 	 * into the segment located by the for-loop above.  Assign "seq"
@@ -516,11 +556,15 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 
 	rcu_cblist_init(&donecbs);
 	rcu_cblist_init(&pendcbs);
-	rcu_segcblist_extract_count(src_rsclp, &donecbs);
+
+	rcu_segcblist_set_len(src_rsclp, 0);
 	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
 	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
+
 	rcu_segcblist_insert_count(dst_rsclp, &donecbs);
+	rcu_segcblist_insert_count(dst_rsclp, &pendcbs);
 	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
 	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
+
 	rcu_segcblist_init(src_rsclp);
 }
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index b90725f81d77..3e0eb1056ae9 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -89,8 +89,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp);
 bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 			   struct rcu_head *rhp);
-void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
-				 struct rcu_cblist *rclp);
 void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 				    struct rcu_cblist *rclp);
 void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes
  2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
  2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
  2020-09-23 15:22 ` [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
@ 2020-09-23 15:22 ` Joel Fernandes (Google)
  2020-10-14 15:22   ` Neeraj Upadhyay
  2020-09-23 15:22 ` [PATCH v6 4/4] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
  2020-09-24 23:42 ` [PATCH v6 0/4] Add support for length of each segment in the segcblist Paul E. McKenney
  4 siblings, 1 reply; 16+ messages in thread
From: Joel Fernandes (Google) @ 2020-09-23 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This has proved useful to discover an optimization to avoid unwanted GP
requests when there are no callbacks accelerated. The overhead is minimal as
each segment's length is now stored in the respective segment.
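
With this patch, a single trace line reports the per-segment counts and gp_seq
values at a given point, for example (the context string comes from the caller;
the numbers here are made-up):

	rcu_segcb: SegCbDequeued cb_count: (DONE=0, WAIT=2, NEXT_READY=0, NEXT=5) gp_seq: (DONE=4568, WAIT=4572, NEXT_READY=4572, NEXT=4576)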

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.h |  5 +++++
 kernel/rcu/tree.c          |  9 +++++++++
 4 files changed, 73 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..7b84df3c95df 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
 		  __entry->qlen)
 );
 
+TRACE_EVENT_RCU(rcu_segcb,
+
+		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
+
+		TP_ARGS(ctx, cb_count, gp_seq),
+
+		TP_STRUCT__entry(
+			__field(const char *, ctx)
+			__array(int, cb_count, 4)
+			__array(unsigned long, gp_seq, 4)
+		),
+
+		TP_fast_assign(
+			__entry->ctx = ctx;
+			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
+			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
+		),
+
+		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
+			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
+			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
 /*
  * Tracepoint for the registration of a single RCU callback of the special
  * kvfree() form.  The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 0e6d19bd3de9..df0f31e30947 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -13,6 +13,7 @@
 #include <linux/rcupdate.h>
 
 #include "rcu_segcblist.h"
+#include "rcu.h"
 
 /* Initialize simple callback list. */
 void rcu_cblist_init(struct rcu_cblist *rclp)
@@ -343,6 +344,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
 }
 
+/*
+ * Return how many CBs each segment along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of callbacks. Only used from
+ * tracing code which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+			 int cbcount[RCU_CBLIST_NSEGS],
+			 unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+	int i;
+
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+		cbcount[i] = 0;
+
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
+		gpseq[i] = rsclp->gp_seq[i];
+	}
+}
+
+void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context)
+{
+	int cbs[RCU_CBLIST_NSEGS];
+	unsigned long gps[RCU_CBLIST_NSEGS];
+
+	rcu_segcblist_countseq(rsclp, cbs, gps);
+
+	trace_rcu_segcb(context, cbs, gps);
+}
+#endif
+
 /*
  * Extract only those callbacks still pending (not yet ready to be
  * invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 3e0eb1056ae9..15c10d30f88c 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
 void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 			 struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context);
+#else
+#define trace_rcu_segcb_list(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50af465729f4..e3381ff67fc6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1492,6 +1492,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 		return false;
 
+	trace_rcu_segcb_list(&rdp->cblist, "SegCbPreAcc");
+
 	/*
 	 * Callbacks are often registered with incomplete grace-period
 	 * information.  Something about the fact that getting exact
@@ -1512,6 +1514,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	else
 		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
 
+	trace_rcu_segcb_list(&rdp->cblist, "SegCbPostAcc");
+
 	return ret;
 }
 
@@ -2469,6 +2473,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Invoke callbacks. */
 	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
 	rhp = rcu_cblist_dequeue(&rcl);
+
+	trace_rcu_segcb_list(&rdp->cblist, "SegCbDequeued");
+
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
@@ -2982,6 +2989,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		trace_rcu_callback(rcu_state.name, head,
 				   rcu_segcblist_n_cbs(&rdp->cblist));
 
+	trace_rcu_segcb_list(&rdp->cblist, "SegCBQueued");
+
 	/* Go handle any RCU core processing required. */
 	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
 	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 4/4] rcu/segcblist: Remove useless rcupdate.h include
  2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2020-09-23 15:22 ` [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-09-23 15:22 ` Joel Fernandes (Google)
  2020-09-24 23:42 ` [PATCH v6 0/4] Add support for length of each segment in the segcblist Paul E. McKenney
  4 siblings, 0 replies; 16+ messages in thread
From: Joel Fernandes (Google) @ 2020-09-23 15:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index df0f31e30947..b65ac8c85b56 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -10,7 +10,6 @@
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
-#include <linux/rcupdate.h>
 
 #include "rcu_segcblist.h"
 #include "rcu.h"
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Add support for length of each segment in the segcblist
  2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2020-09-23 15:22 ` [PATCH v6 4/4] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
@ 2020-09-24 23:42 ` Paul E. McKenney
  4 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-09-24 23:42 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Wed, Sep 23, 2020 at 11:22:07AM -0400, Joel Fernandes (Google) wrote:
> 
> This is required for several identified use cases, one of them being tracing how
> the segmented callback list changes. Tracing this has identified issues in RCU
> code in the past.
> 
> From Paul:
> Another use case is of course more accurately determining whether a given CPU's
> large pile of callbacks can be best served by making grace periods go faster,
> invoking callbacks more vigorously, or both.  It should also be possible to
> simplify some of the callback handling a bit, given that some of the unnatural
> acts are due to there having been no per-batch counts.
> 
> Revision history:
> v6: Fixed TREE04, and restored older logic to ensure rcu_barrier works.
> 
> v5: Various changes, bug fixes. Discovery of rcu_barrier issue.
> 
> v4: Restructured rcu_do_batch() and segcblist merging to avoid issues.
>     Fixed minor nit from Davidlohr.
> v1->v3: minor nits.
> (https://lore.kernel.org/lkml/20200719034210.2382053-1-joel@joelfernandes.org/)

Looking much improved, thank you!

I have placed these on branch rcu/test in -rcu for testing and inspection.
I had to apply them at b94e6291a208 ("torture: Force weak-hashed pointers
on console log") and cherry-pick them onto the "dev" branch, but it looks
like things worked nicely.

							Thanx, Paul

> Joel Fernandes (Google) (4):
> rcu/tree: Make rcu_do_batch count how many callbacks were executed
> rcu/segcblist: Add counters to segcblist datastructure
> rcu/trace: Add tracing for how segcb list changes
> rcu/segcblist: Remove useless rcupdate.h include
> 
> include/linux/rcu_segcblist.h |   2 +
> include/trace/events/rcu.h    |  25 ++++++
> kernel/rcu/rcu_segcblist.c    | 161 +++++++++++++++++++++++++---------
> kernel/rcu/rcu_segcblist.h    |   8 +-
> kernel/rcu/tree.c             |  18 ++--
> 5 files changed, 165 insertions(+), 49 deletions(-)
> 
> --
> 2.28.0.681.g6f77f65b4e-goog
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
@ 2020-10-09 23:14   ` Frederic Weisbecker
  2020-10-11 16:35     ` Joel Fernandes
  2020-10-14 15:06   ` Neeraj Upadhyay
  1 sibling, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2020-10-09 23:14 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Wed, Sep 23, 2020 at 11:22:08AM -0400, Joel Fernandes (Google) wrote:
> Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> to know how many CBs are executed. This fields counts down from 0 as CBs are
> dequeued.  It is possible that all CBs could not be run because of reaching
> limits in which case the remaining unexecuted callbacks are requeued in the
> CPU's segcblist.
> 
> The number of callbacks that were not requeued are then the negative count (how
> many CBs were run) stored in the rcl->len which has been counting down on every
> dequeue. This negative count is then added to the per-cpu segmented callback
> list's to correct its count.
> 
> Such a design works against future efforts to track the length of each segment
> of the segmented callback list. The reason is because
> rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> list's length field (rcl->len) during extraction.
> Also, the design of counting down from 0 is confusing and error-prone IMHO.

Right :)

> 
> This commit therefore explicitly counts have many callbacks were executed in

s/have/how

> rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> field, without relying on the negativity of rcl->len.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-10-09 23:14   ` Frederic Weisbecker
@ 2020-10-11 16:35     ` Joel Fernandes
  2020-10-12 13:50       ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Joel Fernandes @ 2020-10-11 16:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, Neeraj upadhyay,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Fri, Oct 9, 2020 at 4:14 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Wed, Sep 23, 2020 at 11:22:08AM -0400, Joel Fernandes (Google) wrote:
> > Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> > to know how many CBs are executed. This fields counts down from 0 as CBs are
> > dequeued.  It is possible that all CBs could not be run because of reaching
> > limits in which case the remaining unexecuted callbacks are requeued in the
> > CPU's segcblist.
> >
> > The number of callbacks that were not requeued are then the negative count (how
> > many CBs were run) stored in the rcl->len which has been counting down on every
> > dequeue. This negative count is then added to the per-cpu segmented callback
> > list's to correct its count.
> >
> > Such a design works against future efforts to track the length of each segment
> > of the segmented callback list. The reason is because
> > rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> > list's length field (rcl->len) during extraction.
> > Also, the design of counting down from 0 is confusing and error-prone IMHO.
>
> Right :)

:)

> > This commit therefore explicitly counts have many callbacks were executed in
>
> s/have/how
>
> > rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> > field, without relying on the negativity of rcl->len.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks! Paul would be Ok to make the minor fixup s/have/how/ that
Frederic pointed?

- Joel
(Due to COVID issues at home, I'm intermittently working so advance
apologies for slow replies.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-10-11 16:35     ` Joel Fernandes
@ 2020-10-12 13:50       ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2020-10-12 13:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Frederic Weisbecker, LKML, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Madhuparna Bhowmik, Mathieu Desnoyers,
	Neeraj upadhyay, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Sun, Oct 11, 2020 at 09:35:37AM -0700, Joel Fernandes wrote:
> On Fri, Oct 9, 2020 at 4:14 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > On Wed, Sep 23, 2020 at 11:22:08AM -0400, Joel Fernandes (Google) wrote:
> > > Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> > > to know how many CBs are executed. This fields counts down from 0 as CBs are
> > > dequeued.  It is possible that all CBs could not be run because of reaching
> > > limits in which case the remaining unexecuted callbacks are requeued in the
> > > CPU's segcblist.
> > >
> > > The number of callbacks that were not requeued are then the negative count (how
> > > many CBs were run) stored in the rcl->len which has been counting down on every
> > > dequeue. This negative count is then added to the per-cpu segmented callback
> > > list's to correct its count.
> > >
> > > Such a design works against future efforts to track the length of each segment
> > > of the segmented callback list. The reason is because
> > > rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> > > list's length field (rcl->len) during extraction.
> > > Also, the design of counting down from 0 is confusing and error-prone IMHO.
> >
> > Right :)
> 
> :)
> 
> > > This commit therefore explicitly counts have many callbacks were executed in
> >
> > s/have/how
> >
> > > rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> > > field, without relying on the negativity of rcl->len.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Thanks! Paul would be Ok to make the minor fixup s/have/how/ that
> Frederic pointed?

But of course!  I was waiting until Frederic gets them all reviewed,
with an eye to applying and wordsmithing them as a set.

> - Joel
> (Due to COVID issues at home, I'm intermittently working so advance
> apologies for slow replies.)

And I hope that this is going as well as it possibly can!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure
  2020-09-23 15:22 ` [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
@ 2020-10-12 23:20   ` Frederic Weisbecker
  2020-10-14 15:25     ` joel
  2020-10-14 23:09     ` joel
  0 siblings, 2 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2020-10-12 23:20 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Wed, Sep 23, 2020 at 11:22:09AM -0400, Joel Fernandes (Google) wrote:
> +/* Return number of callbacks in a segment of the segmented callback list. */
> +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> +{
> +#ifdef CONFIG_RCU_NOCB_CPU
> +	smp_mb__before_atomic(); /* Up to the caller! */
> +	atomic_long_add(v, &rsclp->seglen[seg]);
> +	smp_mb__after_atomic(); /* Up to the caller! */
> +#else
> +	smp_mb(); /* Up to the caller! */
> +	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> +	smp_mb(); /* Up to the caller! */
> +#endif
> +}

I know that these "Up to the caller" comments come from the existing len
functions but perhaps we should explain a bit more against what it is ordering
and what it pairs to.

Also why do we need one before _and_ after?

And finally do we have the same ordering requirements than the unsegmented len
field?

> +
> +/* Move from's segment length to to's segment. */
> +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
> +{
> +	long len;
> +
> +	if (from == to)
> +		return;
> +
> +	len = rcu_segcblist_get_seglen(rsclp, from);
> +	if (!len)
> +		return;
> +
> +	rcu_segcblist_add_seglen(rsclp, to, len);
> +	rcu_segcblist_set_seglen(rsclp, from, 0);
> +}
> +
[...]
> @@ -245,6 +283,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
>  			   struct rcu_head *rhp)
>  {
>  	rcu_segcblist_inc_len(rsclp);
> +	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
>  	smp_mb(); /* Ensure counts are updated before callback is enqueued. */

Since inc_len and even now inc_seglen have two full barriers embracing the add up,
we can probably spare the above smp_mb()?

>  	rhp->next = NULL;
>  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> @@ -274,27 +313,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
>  	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
>  		if (rsclp->tails[i] != rsclp->tails[i - 1])
>  			break;
> +	rcu_segcblist_inc_seglen(rsclp, i);
>  	WRITE_ONCE(*rsclp->tails[i], rhp);
>  	for (; i <= RCU_NEXT_TAIL; i++)
>  		WRITE_ONCE(rsclp->tails[i], &rhp->next);
>  	return true;
>  }
>  
> @@ -403,6 +437,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>  		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
>  			break;
>  		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);

Do we still need the same amount of full barriers contained in add() called by move() here?
It's called in the reverse order (write queue then len) than usual. If I trust the comment
in rcu_segcblist_enqueue(), the point of the barrier is to make the length visible before
the new callback for rcu_barrier() (although that concerns len and not seglen). But here
above, the unsegmented length doesn't change. I could understand a write barrier between
add_seglen(x, i) and set_seglen(0, RCU_DONE_TAIL) but I couldn't find a paired couple either.

>  	}
>  
>  	/* If no callbacks moved, nothing more need be done. */
> @@ -423,6 +458,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
>  		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
>  			break;  /* No more callbacks. */
>  		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
> +		rcu_segcblist_move_seglen(rsclp, i, j);

Same question here (feel free to reply "same answer" :o)

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
  2020-10-09 23:14   ` Frederic Weisbecker
@ 2020-10-14 15:06   ` Neeraj Upadhyay
  2020-10-14 15:23     ` joel
  1 sibling, 1 reply; 16+ messages in thread
From: Neeraj Upadhyay @ 2020-10-14 15:06 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)



On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:
> Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> to know how many CBs are executed. This fields counts down from 0 as CBs are
> dequeued.  It is possible that all CBs could not be run because of reaching
> limits in which case the remaining unexecuted callbacks are requeued in the
> CPU's segcblist.
> 
> The number of callbacks that were not requeued are then the negative count (how
> many CBs were run) stored in the rcl->len which has been counting down on every
> dequeue. This negative count is then added to the per-cpu segmented callback
> list's to correct its count.
> 
> Such a design works against future efforts to track the length of each segment
> of the segmented callback list. The reason is because
> rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> list's length field (rcl->len) during extraction.
> 
> Also, the design of counting down from 0 is confusing and error-prone IMHO.
> 
> This commit therefore explicitly counts have many callbacks were executed in
> rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> field, without relying on the negativity of rcl->len.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>   kernel/rcu/rcu_segcblist.c | 2 +-
>   kernel/rcu/rcu_segcblist.h | 1 +
>   kernel/rcu/tree.c          | 9 ++++-----
>   3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 2d2a6b6b9dfb..bb246d8c6ef1 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
>    * This increase is fully ordered with respect to the callers accesses
>    * both before and after.
>    */
> -static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
>   {
>   #ifdef CONFIG_RCU_NOCB_CPU
>   	smp_mb__before_atomic(); /* Up to the caller! */
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 5c293afc07b8..b90725f81d77 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
>   }
>   
>   void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
>   void rcu_segcblist_init(struct rcu_segcblist *rsclp);
>   void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
>   void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 7623128d0020..50af465729f4 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2427,7 +2427,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   			       rcu_segcblist_is_offloaded(&rdp->cblist);
>   	struct rcu_head *rhp;
>   	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
> -	long bl, count;
> +	long bl, count = 0;
>   	long pending, tlimit = 0;
>   
>   	/* If no callbacks are ready, just return. */
> @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>   		rcu_callback_t f;
>   
> +		count++;
>   		debug_rcu_head_unqueue(rhp);
>   
>   		rcu_lock_acquire(&rcu_callback_map);
> @@ -2485,9 +2486,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   
>   		/*
>   		 * Stop only if limit reached and CPU has something to do.
> -		 * Note: The rcl structure counts down from zero.
>   		 */
> -		if (-rcl.len >= bl && !offloaded &&
> +		if (count >= bl && !offloaded &&
>   		    (need_resched() ||
>   		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
>   			break;

Update below usage of -rcl.len also?

if (likely((-rcl.len & 31) || local_clock() < tlimit))


Thanks
Neeraj

> @@ -2510,7 +2510,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   
>   	local_irq_save(flags);
>   	rcu_nocb_lock(rdp);
> -	count = -rcl.len;
>   	rdp->n_cbs_invoked += count;
>   	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
>   			    is_idle_task(current), rcu_is_callbacks_kthread());
> @@ -2518,7 +2517,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   	/* Update counts and requeue any remaining callbacks. */
>   	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
>   	smp_mb(); /* List handling before counting for rcu_barrier(). */
> -	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
> +	rcu_segcblist_add_len(&rdp->cblist, -count);
>   
>   	/* Reinstate batch limit if we have worked down the excess. */
>   	count = rcu_segcblist_n_cbs(&rdp->cblist);
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes
  2020-09-23 15:22 ` [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-10-14 15:22   ` Neeraj Upadhyay
  2020-10-15  0:05     ` joel
  0 siblings, 1 reply; 16+ messages in thread
From: Neeraj Upadhyay @ 2020-10-14 15:22 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Ingo Molnar, Josh Triplett, Lai Jiangshan, Madhuparna Bhowmik,
	Mathieu Desnoyers, neeraj.iitr10, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)



On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:
> Track how the segcb list changes before/after acceleration, during
> queuing and during dequeuing.
> 
> This has proved useful to discover an optimization to avoid unwanted GP
> requests when there are no callbacks accelerated. The overhead is minimal as
> each segment's length is now stored in the respective segment.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>   include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
>   kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
>   kernel/rcu/rcu_segcblist.h |  5 +++++
>   kernel/rcu/tree.c          |  9 +++++++++
>   4 files changed, 73 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 155b5cb43cfd..7b84df3c95df 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
>   		  __entry->qlen)
>   );
>   
> +TRACE_EVENT_RCU(rcu_segcb,
> +
> +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> +
> +		TP_ARGS(ctx, cb_count, gp_seq),
> +
> +		TP_STRUCT__entry(
> +			__field(const char *, ctx)
> +			__array(int, cb_count, 4)
> +			__array(unsigned long, gp_seq, 4)

Use RCU_CBLIST_NSEGS in place of 4 ?
> +		),
> +
> +		TP_fast_assign(
> +			__entry->ctx = ctx;
> +			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
> +			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
> +		),
> +
> +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> +
> +);
> +
>   /*
>    * Tracepoint for the registration of a single RCU callback of the special
>    * kvfree() form.  The first argument is the RCU type, the second argument
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 0e6d19bd3de9..df0f31e30947 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -13,6 +13,7 @@
>   #include <linux/rcupdate.h>
>   
>   #include "rcu_segcblist.h"
> +#include "rcu.h"
>   
>   /* Initialize simple callback list. */
>   void rcu_cblist_init(struct rcu_cblist *rclp)
> @@ -343,6 +344,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>   	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
>   }
>   
> +/*
> + * Return how many CBs each segment along with their gp_seq values.
> + *
> + * This function is O(N) where N is the number of callbacks. Only used from

N is number of segments?

> + * tracing code which is usually disabled in production.
> + */
> +#ifdef CONFIG_RCU_TRACE
> +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> +{
> +	int i;
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> +		cbcount[i] = 0;
> +

What is the reason for initializing to 0?

> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> +		gpseq[i] = rsclp->gp_seq[i];
> +	}
> +}
> +
> +void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context)
> +{
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
> +
> +	rcu_segcblist_countseq(rsclp, cbs, gps);
> +
> +	trace_rcu_segcb(context, cbs, gps);
> +}
> +#endif
> +
>   /*
>    * Extract only those callbacks still pending (not yet ready to be
>    * invoked) from the specified rcu_segcblist structure and place them in
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 3e0eb1056ae9..15c10d30f88c 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
>   bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
>   void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>   			 struct rcu_segcblist *src_rsclp);
> +#ifdef CONFIG_RCU_TRACE
> +void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context);
> +#else
> +#define trace_rcu_segcb_list(...)
> +#endif
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 50af465729f4..e3381ff67fc6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1492,6 +1492,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>   	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
>   		return false;
>   
> +	trace_rcu_segcb_list(&rdp->cblist, "SegCbPreAcc");

Use TPS("SegCbPreAcc") ?


Thanks
Neeraj

> +
>   	/*
>   	 * Callbacks are often registered with incomplete grace-period
>   	 * information.  Something about the fact that getting exact
> @@ -1512,6 +1514,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>   	else
>   		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
>   
> +	trace_rcu_segcb_list(&rdp->cblist, "SegCbPostAcc");
> +
>   	return ret;
>   }
>   
> @@ -2469,6 +2473,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
>   	/* Invoke callbacks. */
>   	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
>   	rhp = rcu_cblist_dequeue(&rcl);
> +
> +	trace_rcu_segcb_list(&rdp->cblist, "SegCbDequeued");
> +
>   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>   		rcu_callback_t f;
>   
> @@ -2982,6 +2989,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>   		trace_rcu_callback(rcu_state.name, head,
>   				   rcu_segcblist_n_cbs(&rdp->cblist));
>   
> +	trace_rcu_segcb_list(&rdp->cblist, "SegCBQueued");
> +
>   	/* Go handle any RCU core processing required. */
>   	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
>   	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed
  2020-10-14 15:06   ` Neeraj Upadhyay
@ 2020-10-14 15:23     ` joel
  0 siblings, 0 replies; 16+ messages in thread
From: joel @ 2020-10-14 15:23 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Wed, Oct 14, 2020 at 08:36:16PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:
> > Currently, rcu_do_batch() depends on the unsegmented callback list's len field
> > to know how many CBs are executed. This fields counts down from 0 as CBs are
> > dequeued.  It is possible that all CBs could not be run because of reaching
> > limits in which case the remaining unexecuted callbacks are requeued in the
> > CPU's segcblist.
> > 
> > The number of callbacks that were not requeued are then the negative count (how
> > many CBs were run) stored in the rcl->len which has been counting down on every
> > dequeue. This negative count is then added to the per-cpu segmented callback
> > list's to correct its count.
> > 
> > Such a design works against future efforts to track the length of each segment
> > of the segmented callback list. The reason is because
> > rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
> > list's length field (rcl->len) during extraction.
> > 
> > Also, the design of counting down from 0 is confusing and error-prone IMHO.
> > 
> > This commit therefore explicitly counts have many callbacks were executed in
> > rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
> > field, without relying on the negativity of rcl->len.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >   kernel/rcu/rcu_segcblist.c | 2 +-
> >   kernel/rcu/rcu_segcblist.h | 1 +
> >   kernel/rcu/tree.c          | 9 ++++-----
> >   3 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 2d2a6b6b9dfb..bb246d8c6ef1 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)
> >    * This increase is fully ordered with respect to the callers accesses
> >    * both before and after.
> >    */
> > -static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> > +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
> >   {
> >   #ifdef CONFIG_RCU_NOCB_CPU
> >   	smp_mb__before_atomic(); /* Up to the caller! */
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 5c293afc07b8..b90725f81d77 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
> >   }
> >   void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
> > +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
> >   void rcu_segcblist_init(struct rcu_segcblist *rsclp);
> >   void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
> >   void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 7623128d0020..50af465729f4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2427,7 +2427,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >   			       rcu_segcblist_is_offloaded(&rdp->cblist);
> >   	struct rcu_head *rhp;
> >   	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
> > -	long bl, count;
> > +	long bl, count = 0;
> >   	long pending, tlimit = 0;
> >   	/* If no callbacks are ready, just return. */
> > @@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >   		rcu_callback_t f;
> > +		count++;
> >   		debug_rcu_head_unqueue(rhp);
> >   		rcu_lock_acquire(&rcu_callback_map);
> > @@ -2485,9 +2486,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >   		/*
> >   		 * Stop only if limit reached and CPU has something to do.
> > -		 * Note: The rcl structure counts down from zero.
> >   		 */
> > -		if (-rcl.len >= bl && !offloaded &&
> > +		if (count >= bl && !offloaded &&
> >   		    (need_resched() ||
> >   		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
> >   			break;
> 
> Update below usage of -rcl.len also?
> 
> if (likely((-rcl.len & 31) || local_clock() < tlimit))

Yes, you are right. I need to change that as well, will do. Thanks!
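
For reference, a minimal sketch of how that remaining check could look once it
also uses the new counter (this only covers the line quoted above; the exact
placement within rcu_do_batch() is up to the patch):

		/*
		 * Sketch: rate-limit the local_clock() calls using the new
		 * positive count instead of the negated rcl.len.
		 */
		if (likely((count & 31) || local_clock() < tlimit))
			continue;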

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure
  2020-10-12 23:20   ` Frederic Weisbecker
@ 2020-10-14 15:25     ` joel
  2020-10-14 23:09     ` joel
  1 sibling, 0 replies; 16+ messages in thread
From: joel @ 2020-10-14 15:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Tue, Oct 13, 2020 at 01:20:08AM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 23, 2020 at 11:22:09AM -0400, Joel Fernandes (Google) wrote:
> > +/* Return number of callbacks in a segment of the segmented callback list. */
> > +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> > +{
> > +#ifdef CONFIG_RCU_NOCB_CPU
> > +	smp_mb__before_atomic(); /* Up to the caller! */
> > +	atomic_long_add(v, &rsclp->seglen[seg]);
> > +	smp_mb__after_atomic(); /* Up to the caller! */
> > +#else
> > +	smp_mb(); /* Up to the caller! */
> > +	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> > +	smp_mb(); /* Up to the caller! */
> > +#endif
> > +}
> 
> I know that these "Up to the caller" comments come from the existing len
> functions but perhaps we should explain a bit more against what it is ordering
> and what it pairs to.
> 
> Also why do we need one before _and_ after?
> 
> And finally do we have the same ordering requirements than the unsegmented len
> field?

Hi Paul and Neeraj,
It would be nice to discuss this on the call. I actually borrowed the memory
barriers from add_len() just to be safe, but I think Frederic's points are
valid. It would be nice if we could go over all the use cases and discuss which
memory barriers are actually needed. Thanks for your help!

Another thought: inc_len() calls add_len(), which already has smp_mb(), so I
think callers of inc_len() do not need their own memory barriers either; a
minimal sketch of that relationship follows below.
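
Assuming inc_len() really is just a thin wrapper around add_len() as described
above (the prototype appears in the quoted rcu_segcblist.h hunk; the
authoritative definitions live in kernel/rcu/rcu_segcblist.c):

/* Sketch: add_len() already provides full ordering around the update. */
void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
{
	rcu_segcblist_add_len(rsclp, 1);
}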

thanks,

 - Joel


> > +
> > +/* Move from's segment length to to's segment. */
> > +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
> > +{
> > +	long len;
> > +
> > +	if (from == to)
> > +		return;
> > +
> > +	len = rcu_segcblist_get_seglen(rsclp, from);
> > +	if (!len)
> > +		return;
> > +
> > +	rcu_segcblist_add_seglen(rsclp, to, len);
> > +	rcu_segcblist_set_seglen(rsclp, from, 0);
> > +}
> > +
> [...]
> > @@ -245,6 +283,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
> >  			   struct rcu_head *rhp)
> >  {
> >  	rcu_segcblist_inc_len(rsclp);
> > +	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
> >  	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> 
> Since inc_len and even now inc_seglen have two full barriers embracing the add up,
> we can probably spare the above smp_mb()?
> 
> >  	rhp->next = NULL;
> >  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> > @@ -274,27 +313,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
> >  	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
> >  		if (rsclp->tails[i] != rsclp->tails[i - 1])
> >  			break;
> > +	rcu_segcblist_inc_seglen(rsclp, i);
> >  	WRITE_ONCE(*rsclp->tails[i], rhp);
> >  	for (; i <= RCU_NEXT_TAIL; i++)
> >  		WRITE_ONCE(rsclp->tails[i], &rhp->next);
> >  	return true;
> >  }
> >  
> > @@ -403,6 +437,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
> >  		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
> >  			break;
> >  		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
> > +		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
> 
> Do we still need the same amount of full barriers contained in add() called by move() here?
> It's called in the reverse order (write queue then len) than usual. If I trust the comment
> in rcu_segcblist_enqueue(), the point of the barrier is to make the length visible before
> the new callback for rcu_barrier() (although that concerns len and not seglen). But here
> above, the unsegmented length doesn't change. I could understand a write barrier between
> add_seglen(x, i) and set_seglen(0, RCU_DONE_TAIL) but I couldn't find a paired couple either.
> 
> >  	}
> >  
> >  	/* If no callbacks moved, nothing more need be done. */
> > @@ -423,6 +458,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
> >  		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
> >  			break;  /* No more callbacks. */
> >  		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
> > +		rcu_segcblist_move_seglen(rsclp, i, j);
> 
> Same question here (feel free to reply "same answer" :o)
> 
> Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure
  2020-10-12 23:20   ` Frederic Weisbecker
  2020-10-14 15:25     ` joel
@ 2020-10-14 23:09     ` joel
  1 sibling, 0 replies; 16+ messages in thread
From: joel @ 2020-10-14 23:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Tue, Oct 13, 2020 at 01:20:08AM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 23, 2020 at 11:22:09AM -0400, Joel Fernandes (Google) wrote:
> > +/* Return number of callbacks in a segment of the segmented callback list. */
> > +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
> > +{
> > +#ifdef CONFIG_RCU_NOCB_CPU
> > +	smp_mb__before_atomic(); /* Up to the caller! */
> > +	atomic_long_add(v, &rsclp->seglen[seg]);
> > +	smp_mb__after_atomic(); /* Up to the caller! */
> > +#else
> > +	smp_mb(); /* Up to the caller! */
> > +	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
> > +	smp_mb(); /* Up to the caller! */
> > +#endif
> > +}
> 
> I know that these "Up to the caller" comments come from the existing len
> functions but perhaps we should explain a bit more against what it is ordering
> and what it pairs to.

Sure.

> Also why do we need one before _and_ after?

I removed these memory barriers since they should not be needed; I will
update it this way for v7.

> And finally do we have the same ordering requirements than the unsegmented len
> field?

Do you mean ordering for rsclp->seglen? Yes, we should not need ordering for
that since there are no races AFAICS (all accesses run either with IRQs
disabled, or with the nocb lock held in the offloaded case). If you meant
something else like rcl->len, let me know; AFAICS we don't have ordering needs
for those either. Further, the current readers of ->seglen are only for
tracing, and ->seglen does not influence rcu_barrier() yet.
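
A minimal sketch of what a barrier-free version could look like in v7,
assuming ->seglen stays a plain long and the serialization described above
holds (the actual v7 patch may of course differ):

static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v)
{
	/*
	 * Plain update: updaters are serialized (IRQs disabled or the nocb
	 * lock held), and the only readers are tracing, so no ordering is
	 * needed here.
	 */
	WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
}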

> > +/* Move from's segment length to to's segment. */
> > +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to)
> > +{
> > +	long len;
> > +
> > +	if (from == to)
> > +		return;
> > +
> > +	len = rcu_segcblist_get_seglen(rsclp, from);
> > +	if (!len)
> > +		return;
> > +
> > +	rcu_segcblist_add_seglen(rsclp, to, len);
> > +	rcu_segcblist_set_seglen(rsclp, from, 0);
> > +}
> > +
> [...]
> > @@ -245,6 +283,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
> >  			   struct rcu_head *rhp)
> >  {
> >  	rcu_segcblist_inc_len(rsclp);
> > +	rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
> >  	smp_mb(); /* Ensure counts are updated before callback is enqueued. */
> 
> Since inc_len and even now inc_seglen have two full barriers embracing the add up,
> we can probably spare the above smp_mb()?

Good point, I'll remove it.

> >  	rhp->next = NULL;
> >  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
> > @@ -274,27 +313,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
> >  	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
> >  		if (rsclp->tails[i] != rsclp->tails[i - 1])
> >  			break;
> > +	rcu_segcblist_inc_seglen(rsclp, i);
> >  	WRITE_ONCE(*rsclp->tails[i], rhp);
> >  	for (; i <= RCU_NEXT_TAIL; i++)
> >  		WRITE_ONCE(rsclp->tails[i], &rhp->next);
> >  	return true;
> >  }
> >  
> > @@ -403,6 +437,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
> >  		if (ULONG_CMP_LT(seq, rsclp->gp_seq[i]))
> >  			break;
> >  		WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]);
> > +		rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL);
> 
> Do we still need the same amount of full barriers contained in add() called by move() here?
> It's called in the reverse order (write queue then len) than usual. If I trust the comment
> in rcu_segcblist_enqueue(), the point of the barrier is to make the length visible before
> the new callback for rcu_barrier() (although that concerns len and not seglen). But here
> above, the unsegmented length doesn't change. I could understand a write barrier between
> add_seglen(x, i) and set_seglen(0, RCU_DONE_TAIL) but I couldn't find a paired couple either.

Since I removed the memory barriers from the seglen updates, I expect this is
now resolved.

> >  	}
> >  
> >  	/* If no callbacks moved, nothing more need be done. */
> > @@ -423,6 +458,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq)
> >  		if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL])
> >  			break;  /* No more callbacks. */
> >  		WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]);
> > +		rcu_segcblist_move_seglen(rsclp, i, j);
> 
> Same question here (feel free to reply "same answer" :o)

Same answer :P

So based on these and other comments, I will update the patches and send them
out shortly.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes
  2020-10-14 15:22   ` Neeraj Upadhyay
@ 2020-10-15  0:05     ` joel
  0 siblings, 0 replies; 16+ messages in thread
From: joel @ 2020-10-15  0:05 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: linux-kernel, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Madhuparna Bhowmik, Mathieu Desnoyers, neeraj.iitr10,
	Paul E. McKenney, rcu, Steven Rostedt, Uladzislau Rezki (Sony)

On Wed, Oct 14, 2020 at 08:52:17PM +0530, Neeraj Upadhyay wrote:
> 
> 
> On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:
> > Track how the segcb list changes before/after acceleration, during
> > queuing and during dequeuing.
> > 
> > This has proved useful to discover an optimization to avoid unwanted GP
> > requests when there are no callbacks accelerated. The overhead is minimal as
> > each segment's length is now stored in the respective segment.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >   include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
> >   kernel/rcu/rcu_segcblist.c | 34 ++++++++++++++++++++++++++++++++++
> >   kernel/rcu/rcu_segcblist.h |  5 +++++
> >   kernel/rcu/tree.c          |  9 +++++++++
> >   4 files changed, 73 insertions(+)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 155b5cb43cfd..7b84df3c95df 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
> >   		  __entry->qlen)
> >   );
> > +TRACE_EVENT_RCU(rcu_segcb,
> > +
> > +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> > +
> > +		TP_ARGS(ctx, cb_count, gp_seq),
> > +
> > +		TP_STRUCT__entry(
> > +			__field(const char *, ctx)
> > +			__array(int, cb_count, 4)
> > +			__array(unsigned long, gp_seq, 4)
> 
> Use RCU_CBLIST_NSEGS in place of 4 ?

Done.
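
For reference, a sketch of the updated tracepoint fields with the literal 4
replaced (this assumes RCU_CBLIST_NSEGS is visible from the trace header):

		TP_STRUCT__entry(
			__field(const char *, ctx)
			__array(int, cb_count, RCU_CBLIST_NSEGS)
			__array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
		),

		TP_fast_assign(
			__entry->ctx = ctx;
			memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(int));
			memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long));
		),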

> > +		),
> > +
> > +		TP_fast_assign(
> > +			__entry->ctx = ctx;
> > +			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
> > +			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
> > +		),
> > +
> > +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> > +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> > +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> > +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> > +
> > +);
> > +
> >   /*
> >    * Tracepoint for the registration of a single RCU callback of the special
> >    * kvfree() form.  The first argument is the RCU type, the second argument
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 0e6d19bd3de9..df0f31e30947 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -13,6 +13,7 @@
> >   #include <linux/rcupdate.h>
> >   #include "rcu_segcblist.h"
> > +#include "rcu.h"
> >   /* Initialize simple callback list. */
> >   void rcu_cblist_init(struct rcu_cblist *rclp)
> > @@ -343,6 +344,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
> >   	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
> >   }
> > +/*
> > + * Return how many CBs each segment along with their gp_seq values.
> > + *
> > + * This function is O(N) where N is the number of callbacks. Only used from
> 
> N is number of segments?

Yes, will fix.
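
Something like the following wording, for example:

/*
 * Return how many CBs each segment has, along with their gp_seq values.
 *
 * This function is O(N) where N is the number of segments (RCU_CBLIST_NSEGS).
 * Only used from tracing code, which is usually disabled in production.
 */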

> > + * tracing code which is usually disabled in production.
> > + */
> > +#ifdef CONFIG_RCU_TRACE
> > +static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > +			 int cbcount[RCU_CBLIST_NSEGS],
> > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> > +		cbcount[i] = 0;
> > +
> 
> What is the reason for initializing to 0?

You are right, not needed. I'll remove.
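
A minimal sketch of the simplified helper with the redundant zero-initialization
dropped (every slot is written unconditionally in the single loop):

static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
				   int cbcount[RCU_CBLIST_NSEGS],
				   unsigned long gpseq[RCU_CBLIST_NSEGS])
{
	int i;

	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
		gpseq[i] = rsclp->gp_seq[i];
	}
}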

> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > +		cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
> > +		gpseq[i] = rsclp->gp_seq[i];
> > +	}
> > +}
> > +
> > +void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context)
> > +{
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> > +
> > +	rcu_segcblist_countseq(rsclp, cbs, gps);
> > +
> > +	trace_rcu_segcb(context, cbs, gps);
> > +}
> > +#endif
> > +
> >   /*
> >    * Extract only those callbacks still pending (not yet ready to be
> >    * invoked) from the specified rcu_segcblist structure and place them in
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 3e0eb1056ae9..15c10d30f88c 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
> >   bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
> >   void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
> >   			 struct rcu_segcblist *src_rsclp);
> > +#ifdef CONFIG_RCU_TRACE
> > +void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context);
> > +#else
> > +#define trace_rcu_segcb_list(...)
> > +#endif
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 50af465729f4..e3381ff67fc6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1492,6 +1492,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >   	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
> >   		return false;
> > +	trace_rcu_segcb_list(&rdp->cblist, "SegCbPreAcc");
> 
> Use TPS("SegCbPreAcc") ?

Fixed, thanks!
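
For reference, the call would then look something like the sketch below,
assuming the trace_rcu_segcb_list() prototype is adjusted to take a
const char * so the tracepoint-string macro can be passed:

	trace_rcu_segcb_list(&rdp->cblist, TPS("SegCbPreAcc"));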

thanks,

 - Joel

> 
> 
> Thanks
> Neeraj
> 
> > +
> >   	/*
> >   	 * Callbacks are often registered with incomplete grace-period
> >   	 * information.  Something about the fact that getting exact
> > @@ -1512,6 +1514,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >   	else
> >   		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
> > +	trace_rcu_segcb_list(&rdp->cblist, "SegCbPostAcc");
> > +
> >   	return ret;
> >   }
> > @@ -2469,6 +2473,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >   	/* Invoke callbacks. */
> >   	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
> >   	rhp = rcu_cblist_dequeue(&rcl);
> > +
> > +	trace_rcu_segcb_list(&rdp->cblist, "SegCbDequeued");
> > +
> >   	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >   		rcu_callback_t f;
> > @@ -2982,6 +2989,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> >   		trace_rcu_callback(rcu_state.name, head,
> >   				   rcu_segcblist_n_cbs(&rdp->cblist));
> > +	trace_rcu_segcb_list(&rdp->cblist, "SegCBQueued");
> > +
> >   	/* Go handle any RCU core processing required. */
> >   	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> >   	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> > 
> 
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
> the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-10-15  1:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 15:22 [PATCH v6 0/4] Add support for length of each segment in the segcblist Joel Fernandes (Google)
2020-09-23 15:22 ` [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed Joel Fernandes (Google)
2020-10-09 23:14   ` Frederic Weisbecker
2020-10-11 16:35     ` Joel Fernandes
2020-10-12 13:50       ` Paul E. McKenney
2020-10-14 15:06   ` Neeraj Upadhyay
2020-10-14 15:23     ` joel
2020-09-23 15:22 ` [PATCH v6 2/4] rcu/segcblist: Add counters to segcblist datastructure Joel Fernandes (Google)
2020-10-12 23:20   ` Frederic Weisbecker
2020-10-14 15:25     ` joel
2020-10-14 23:09     ` joel
2020-09-23 15:22 ` [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
2020-10-14 15:22   ` Neeraj Upadhyay
2020-10-15  0:05     ` joel
2020-09-23 15:22 ` [PATCH v6 4/4] rcu/segcblist: Remove useless rcupdate.h include Joel Fernandes (Google)
2020-09-24 23:42 ` [PATCH v6 0/4] Add support for length of each segment in the segcblist Paul E. McKenney
