linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate
@ 2020-06-18 20:29 Joel Fernandes (Google)
  2020-06-18 20:29 ` [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	urezki, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, Paul E. McKenney,
	rcu, Steven Rostedt

rcu_segcblist_accelerate() returns true if a GP is to be
started/requested and false if not. During tracing, I found that it
sometimes asks for GPs to be requested even when there are no callbacks
to accelerate.

The exact flow seems to be something like:
1. Callbacks are queued on CPU A - into the NEXT list.
2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
4. CPU A's softirq runs again presumably because of #3.
5. CPU A's softirq now does acceleration again; this time no CBs have
   been accelerated since the last attempt, but it still requests GP
   X+1, which can be useless.

Fix this by skipping the GP request when we detect that there are no
CBs to accelerate.

With this, we have the following improvement in short runs of
rcutorture (5 seconds each):
+----+-------------------+-------------------+
| #  | Number of GPs     | Number of Wakeups |
+====+=========+=========+=========+=========+
|    | With    | Without | With    | Without |
+----+---------+---------+---------+---------+
| 1  |      75 |      89 |     113 |     119 |
+----+---------+---------+---------+---------+
| 2  |      62 |      91 |     105 |     123 |
+----+---------+---------+---------+---------+
| 3  |      60 |      79 |      98 |     110 |
+----+---------+---------+---------+---------+
| 4  |      63 |      79 |      99 |     112 |
+----+---------+---------+---------+---------+
| 5  |      57 |      89 |      96 |     123 |
+----+---------+---------+---------+---------+
| 6  |      64 |      85 |      97 |     118 |
+----+---------+---------+---------+---------+
| 7  |      58 |      83 |      98 |     113 |
+----+---------+---------+---------+---------+
| 8  |      57 |      77 |      89 |     104 |
+----+---------+---------+---------+---------+
| 9  |      66 |      82 |      98 |     119 |
+----+---------+---------+---------+---------+
| 10 |      52 |      82 |      83 |     117 |
+----+---------+---------+---------+---------+

Cc: urezki@gmail.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f66133b4b3..4782cf17bf4f9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * Also advance to the oldest segment of callbacks whose
 	 * ->gp_seq[] completion is at or after that passed in via "seq",
 	 * skipping any empty segments.
+	 *
+	 * Note that "i" is the youngest segment that will be left
+	 * untouched: neither it nor any segment older than it will be
+	 * mutated or assigned GPs. For example, if i == WAIT_TAIL,
+	 * then neither WAIT_TAIL nor DONE_TAIL will be touched. Only
+	 * CBs in NEXT_TAIL will be merged into NEXT_READY_TAIL, and
+	 * the GP numbers of both segments will be updated.
 	 */
-	if (++i >= RCU_NEXT_TAIL)
+	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
 		return false;
 
 	/*
-- 
2.27.0.111.gc72c7da667-goog



* [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 22:16   ` Paul E. McKenney
  2020-06-18 20:29 ` [PATCH 3/7] rcu/trace: Add name of the source for gp_seq Joel Fernandes (Google)
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This has already proved useful in discovering an optimization that
avoids unwanted GP requests when no callbacks are accelerated.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.c | 37 +++++++++++++++++++++++++++++++++++++
 kernel/rcu/rcu_segcblist.h |  7 +++++++
 kernel/rcu/tree.c          | 24 ++++++++++++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 02dcd119f3263..a6d49864dcc27 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -507,6 +507,31 @@ TRACE_EVENT_RCU(rcu_callback,
 		  __entry->qlen)
 );
 
+TRACE_EVENT_RCU(rcu_segcb,
+
+		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
+
+		TP_ARGS(ctx, cb_count, gp_seq),
+
+		TP_STRUCT__entry(
+			__field(const char *, ctx)
+			__array(int, cb_count, 4)
+			__array(unsigned long, gp_seq, 4)
+		),
+
+		TP_fast_assign(
+			__entry->ctx = ctx;
+			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
+			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
+		),
+
+		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
+			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
+			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
 /*
  * Tracepoint for the registration of a single RCU callback of the special
  * kfree() form.  The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 4782cf17bf4f9..036d4abac7c5a 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -316,6 +316,43 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
 			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
 }
 
+/*
+ * Return the number of CBs in each segment, along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of callbacks. Only used from
+ * tracing code which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+			 int cbcount[RCU_CBLIST_NSEGS],
+			 unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+	struct rcu_head **cur_tail, *h;
+	int i, c;
+
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+		cbcount[i] = 0;
+
+	cur_tail = &(rsclp->head);
+
+	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+		c = 0;
+		// List empty?
+		if (rsclp->tails[i] != cur_tail) {
+			// The loop skips the last node
+			c = 1;
+			for (h = *cur_tail; h->next != *(rsclp->tails[i]); h = h->next) {
+				c++;
+			}
+		}
+
+		cbcount[i] = c;
+		gpseq[i] = rsclp->gp_seq[i];
+		cur_tail = rsclp->tails[i];
+	}
+}
+#endif
+
 /*
  * Extract only those callbacks still pending (not yet ready to be
  * invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8e..0a8dbac438529 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -104,3 +104,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
 bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
 void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 			 struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+			 int cbcount[RCU_CBLIST_NSEGS],
+			 unsigned long gpseq[RCU_CBLIST_NSEGS]);
+#else
+#define rcu_segcblist_countseq(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ebce14e470d0e..c61af6a33fbfd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1407,6 +1407,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 {
 	unsigned long gp_seq_req;
 	bool ret = false;
+	int cbs[RCU_CBLIST_NSEGS];
+	unsigned long gps[RCU_CBLIST_NSEGS];
 
 	rcu_lockdep_assert_cblist_protected(rdp);
 	raw_lockdep_assert_held_rcu_node(rnp);
@@ -1415,6 +1417,10 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 		return false;
 
+	/* Count CBs for tracing. */
+	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+	trace_rcu_segcb("SegCbPreAcc", cbs, gps);
+
 	/*
 	 * Callbacks are often registered with incomplete grace-period
 	 * information.  Something about the fact that getting exact
@@ -1434,6 +1440,11 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
 	else
 		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
+
+	/* Count CBs for tracing. */
+	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+	trace_rcu_segcb("SegCbPostAcc", cbs, gps);
+
 	return ret;
 }
 
@@ -2316,6 +2327,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count;
 	long pending, tlimit = 0;
+	int cbs[RCU_CBLIST_NSEGS];
+	unsigned long gps[RCU_CBLIST_NSEGS];
 
 	/* If no callbacks are ready, just return. */
 	if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
@@ -2350,6 +2363,11 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	/* Invoke callbacks. */
 	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
 	rhp = rcu_cblist_dequeue(&rcl);
+
+	/* Count CBs for tracing. */
+	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+	trace_rcu_segcb("SegCbDequeued", cbs, gps);
+
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;
 
@@ -2808,6 +2826,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 	unsigned long flags;
 	struct rcu_data *rdp;
 	bool was_alldone;
+	int cbs[RCU_CBLIST_NSEGS];
+	unsigned long gps[RCU_CBLIST_NSEGS];
 
 	/* Misaligned rcu_head! */
 	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
@@ -2852,6 +2872,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		trace_rcu_callback(rcu_state.name, head,
 				   rcu_segcblist_n_cbs(&rdp->cblist));
 
+	/* Count CBs for tracing. */
+	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+	trace_rcu_segcb("SegCBQueued", cbs, gps);
+
 	/* Go handle any RCU core processing required. */
 	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
 	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
-- 
2.27.0.111.gc72c7da667-goog



* [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
  2020-06-18 20:29 ` [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 22:19   ` Paul E. McKenney
  2020-06-18 20:29 ` [PATCH 4/7] rcu/trace: Print negative GP numbers correctly Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

The gp_seq value can come from any of the rdp, rsp or rnp structures.

Only the rsp is the global source of truth (it has the most accurate GP
info). The rnp can be behind by ~1 and the rdp can be off by far more.
Add the source of the value to the traces to clarify where it comes
from.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 12 ++++++++----
 kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
 kernel/rcu/tree_plugin.h   |  2 +-
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index a6d49864dcc27..cb5363564f7ed 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
  */
 TRACE_EVENT_RCU(rcu_grace_period,
 
-	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
+	TP_PROTO(const char *rcuname, const char *gp_seq_src,
+		unsigned long gp_seq, const char *gpevent),
 
-	TP_ARGS(rcuname, gp_seq, gpevent),
+	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
+		__field(const char *, gp_seq_src)
 		__field(unsigned long, gp_seq)
 		__field(const char *, gpevent)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
+		__entry->gp_seq_src = gp_seq_src;
 		__entry->gp_seq = gp_seq;
 		__entry->gpevent = gpevent;
 	),
 
-	TP_printk("%s %lu %s",
-		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
+	TP_printk("%s %s_gp_seq=%lu %s",
+		  __entry->rcuname, __entry->gp_seq_src,
+		  __entry->gp_seq, __entry->gpevent)
 );
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c61af6a33fbfd..81df1b837dd9d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
-	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
+	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
 	ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
@@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 
 	/* Trace depending on how much we were able to accelerate. */
 	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
-		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
+		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
 	else
-		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
+		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
 
 	/* Count CBs for tracing. */
 	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
@@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 		if (!offloaded)
 			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
 		rdp->core_needs_qs = false;
-		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
+		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
 	} else {
 		if (!offloaded)
 			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
@@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 		 * set up to detect a quiescent state, otherwise don't
 		 * go looking for one.
 		 */
-		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
+		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
 		need_qs = !!(rnp->qsmask & rdp->grpmask);
 		rdp->cpu_no_qs.b.norm = need_qs;
 		rdp->core_needs_qs = need_qs;
@@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
 	/* Record GP times before starting GP, hence rcu_seq_start(). */
 	rcu_seq_start(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
-	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
+	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
 	raw_spin_unlock_irq_rcu_node(rnp);
 
 	/*
@@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
-		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 				       TPS("fqswait"));
 		rcu_state.gp_state = RCU_GP_WAIT_FQS;
 		ret = swait_event_idle_timeout_exclusive(
@@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
 		/* If time for quiescent-state forcing, do it. */
 		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
 		    (gf & RCU_GP_FLAG_FQS)) {
-			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 					       TPS("fqsstart"));
 			rcu_gp_fqs(first_gp_fqs);
 			gf = 0;
@@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
 				first_gp_fqs = false;
 				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
 			}
-			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 					       TPS("fqsend"));
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
@@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
 			WARN_ON(signal_pending(current));
-			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 					       TPS("fqswaitsig"));
 			ret = 1; /* Keep old FQS timing. */
 			j = jiffies;
@@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
 	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
 
 	/* Declare grace period done, trace first to use old GP number. */
-	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
+	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
 	rcu_seq_end(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
 	rcu_state.gp_state = RCU_GP_IDLE;
@@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
 	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
 		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
 		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
-		trace_rcu_grace_period(rcu_state.name,
+		trace_rcu_grace_period(rcu_state.name, "rsp",
 				       rcu_state.gp_seq,
 				       TPS("newreq"));
 	} else {
@@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
 
 		/* Handle grace-period start. */
 		for (;;) {
-			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 					       TPS("reqwait"));
 			rcu_state.gp_state = RCU_GP_WAIT_GPS;
 			swait_event_idle_exclusive(rcu_state.gp_wq,
@@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
 			cond_resched_tasks_rcu_qs();
 			WRITE_ONCE(rcu_state.gp_activity, jiffies);
 			WARN_ON(signal_pending(current));
-			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
 					       TPS("reqwaitsig"));
 		}
 
@@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
 		return 0;
 
 	blkd = !!(rnp->qsmask & rdp->grpmask);
-	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
 			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
 	return 0;
 }
@@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->core_needs_qs = false;
 	rdp->rcu_iw_pending = false;
 	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
-	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
+	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	rcu_prepare_kthreads(cpu);
 	rcu_spawn_cpu_nocb_kthread(cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 982fc5be52698..32f761cf16c33 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -262,7 +262,7 @@ static void rcu_qs(void)
 {
 	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
 	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
-		trace_rcu_grace_period(TPS("rcu_preempt"),
+		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
 				       __this_cpu_read(rcu_data.gp_seq),
 				       TPS("cpuqs"));
 		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
-- 
2.27.0.111.gc72c7da667-goog



* [PATCH 4/7] rcu/trace: Print negative GP numbers correctly
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
  2020-06-18 20:29 ` [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
  2020-06-18 20:29 ` [PATCH 3/7] rcu/trace: Add name of the source for gp_seq Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 22:30   ` Paul E. McKenney
  2020-06-18 20:29 ` [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

GP numbers start at -300 and gp_seq numbers start at -1200 (for a
shift of 2). These negative numbers are printed as unsigned long, which
not only takes up more text space, but is also confusing to the reader,
who has to mentally convert the huge unsigned value back to a small
negative one. Just print the negative numbers directly.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 62 ++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index cb5363564f7ed..bc24862790623 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -76,18 +76,18 @@ TRACE_EVENT_RCU(rcu_grace_period,
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
 		__field(const char *, gp_seq_src)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(const char *, gpevent)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
 		__entry->gp_seq_src = gp_seq_src;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->gpevent = gpevent;
 	),
 
-	TP_printk("%s %s_gp_seq=%lu %s",
+	TP_printk("%s %s_gp_seq=%ld %s",
 		  __entry->rcuname, __entry->gp_seq_src,
 		  __entry->gp_seq, __entry->gpevent)
 );
@@ -118,8 +118,8 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
-		__field(unsigned long, gp_seq_req)
+		__field(long, gp_seq)
+		__field(long, gp_seq_req)
 		__field(u8, level)
 		__field(int, grplo)
 		__field(int, grphi)
@@ -128,16 +128,16 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
-		__entry->gp_seq_req = gp_seq_req;
+		__entry->gp_seq = (long)gp_seq;
+		__entry->gp_seq_req = (long)gp_seq_req;
 		__entry->level = level;
 		__entry->grplo = grplo;
 		__entry->grphi = grphi;
 		__entry->gpevent = gpevent;
 	),
 
-	TP_printk("%s %lu %lu %u %d %d %s",
-		  __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level,
+	TP_printk("%s %ld %ld %u %d %d %s",
+		  __entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level,
 		  __entry->grplo, __entry->grphi, __entry->gpevent)
 );
 
@@ -157,7 +157,7 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(u8, level)
 		__field(int, grplo)
 		__field(int, grphi)
@@ -166,14 +166,14 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->level = level;
 		__entry->grplo = grplo;
 		__entry->grphi = grphi;
 		__entry->qsmask = qsmask;
 	),
 
-	TP_printk("%s %lu %u %d %d %lx",
+	TP_printk("%s %ld %u %d %d %lx",
 		  __entry->rcuname, __entry->gp_seq, __entry->level,
 		  __entry->grplo, __entry->grphi, __entry->qsmask)
 );
@@ -201,17 +201,17 @@ TRACE_EVENT_RCU(rcu_exp_grace_period,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gpseq)
+		__field(long, gpseq)
 		__field(const char *, gpevent)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gpseq = gpseq;
+		__entry->gpseq = (long)gpseq;
 		__entry->gpevent = gpevent;
 	),
 
-	TP_printk("%s %lu %s",
+	TP_printk("%s %ld %s",
 		  __entry->rcuname, __entry->gpseq, __entry->gpevent)
 );
 
@@ -320,17 +320,17 @@ TRACE_EVENT_RCU(rcu_preempt_task,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(int, pid)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->pid = pid;
 	),
 
-	TP_printk("%s %lu %d",
+	TP_printk("%s %ld %d",
 		  __entry->rcuname, __entry->gp_seq, __entry->pid)
 );
 
@@ -347,17 +347,17 @@ TRACE_EVENT_RCU(rcu_unlock_preempted_task,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(int, pid)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->pid = pid;
 	),
 
-	TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
+	TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
 );
 
 /*
@@ -378,7 +378,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(unsigned long, mask)
 		__field(unsigned long, qsmask)
 		__field(u8, level)
@@ -389,7 +389,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->mask = mask;
 		__entry->qsmask = qsmask;
 		__entry->level = level;
@@ -398,7 +398,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
 		__entry->gp_tasks = gp_tasks;
 	),
 
-	TP_printk("%s %lu %lx>%lx %u %d %d %u",
+	TP_printk("%s %ld %lx>%lx %u %d %d %u",
 		  __entry->rcuname, __entry->gp_seq,
 		  __entry->mask, __entry->qsmask, __entry->level,
 		  __entry->grplo, __entry->grphi, __entry->gp_tasks)
@@ -419,19 +419,19 @@ TRACE_EVENT_RCU(rcu_fqs,
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
-		__field(unsigned long, gp_seq)
+		__field(long, gp_seq)
 		__field(int, cpu)
 		__field(const char *, qsevent)
 	),
 
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
-		__entry->gp_seq = gp_seq;
+		__entry->gp_seq = (long)gp_seq;
 		__entry->cpu = cpu;
 		__entry->qsevent = qsevent;
 	),
 
-	TP_printk("%s %lu %d %s",
+	TP_printk("%s %ld %d %s",
 		  __entry->rcuname, __entry->gp_seq,
 		  __entry->cpu, __entry->qsevent)
 );
@@ -520,17 +520,19 @@ TRACE_EVENT_RCU(rcu_segcb,
 		TP_STRUCT__entry(
 			__field(const char *, ctx)
 			__array(int, cb_count, 4)
-			__array(unsigned long, gp_seq, 4)
+			__array(long, gp_seq, 4)
 		),
 
 		TP_fast_assign(
+			int i;
 			__entry->ctx = ctx;
 			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
-			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
+			for (i = 0; i < 4; i++)
+				__entry->gp_seq[i] = (long)(gp_seq[i]);
 		),
 
 		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
-			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+			  "gp_seq: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld)", __entry->ctx,
 			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
 			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
 
-- 
2.27.0.111.gc72c7da667-goog



* [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2020-06-18 20:29 ` [PATCH 4/7] rcu/trace: Print negative GP numbers correctly Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 22:27   ` Paul E. McKenney
  2020-06-18 20:29 ` [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

During acceleration of CBs, the rsp's gp_seq is rcu_seq_snap'd. This is
the value used for acceleration: it is the value of gp_seq at which it
is safe to execute all callbacks in the callback list.

The rdp's gp_seq is not very useful for this scenario. Make
rcu_grace_period report the rsp's gp_seq instead, as that allows one to
reason about how the acceleration works.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 81df1b837dd9d..c3bae7a83d792 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 
 	/* Trace depending on how much we were able to accelerate. */
 	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
-		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
+		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
 	else
-		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
+		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));
 
 	/* Count CBs for tracing. */
 	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
-- 
2.27.0.111.gc72c7da667-goog



* [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2020-06-18 20:29 ` [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 22:40   ` Paul E. McKenney
  2020-06-18 20:29 ` [PATCH 7/7] rcutorture: Add number of GP information to reports Joel Fernandes (Google)
  2020-06-18 22:11 ` [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Paul E. McKenney
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

This is useful for checking for any improvement or degradation in the
number of GP kthread wakeups during testing.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/Kconfig.debug |  1 +
 kernel/rcu/rcu.h         |  2 ++
 kernel/rcu/rcutorture.c  | 23 ++++++++++++++++++++++-
 kernel/rcu/tree.c        |  7 +++++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 3cf6132a4bb9f..3323e3378af5a 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -50,6 +50,7 @@ config RCU_TORTURE_TEST
 	select TASKS_RCU
 	select TASKS_RUDE_RCU
 	select TASKS_TRACE_RCU
+	select SCHEDSTATS
 	default n
 	help
 	  This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index cf66a3ccd7573..7e867e81d9738 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -511,6 +511,7 @@ srcu_batches_completed(struct srcu_struct *sp) { return 0; }
 static inline void rcu_force_quiescent_state(void) { }
 static inline void show_rcu_gp_kthreads(void) { }
 static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
+static inline struct task_struct *rcu_get_main_gp_kthread(void) { return NULL; }
 static inline void rcu_fwd_progress_check(unsigned long j) { }
 #else /* #ifdef CONFIG_TINY_RCU */
 bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
@@ -519,6 +520,7 @@ unsigned long rcu_exp_batches_completed(void);
 unsigned long srcu_batches_completed(struct srcu_struct *sp);
 void show_rcu_gp_kthreads(void);
 int rcu_get_gp_kthreads_prio(void);
+struct task_struct *rcu_get_main_gp_kthread(void);
 void rcu_fwd_progress_check(unsigned long j);
 void rcu_force_quiescent_state(void);
 extern struct workqueue_struct *rcu_gp_wq;
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index d0d265304d147..959a1f84d6904 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -23,6 +23,7 @@
 #include <linux/rcupdate_wait.h>
 #include <linux/interrupt.h>
 #include <linux/sched/signal.h>
+#include <linux/sched/stat.h>
 #include <uapi/linux/sched/types.h>
 #include <linux/atomic.h>
 #include <linux/bitops.h>
@@ -460,9 +461,29 @@ static void rcu_sync_torture_init(void)
 	INIT_LIST_HEAD(&rcu_torture_removed);
 }
 
+unsigned long rcu_gp_nr_wakeups;
+
+static void rcu_flavor_init(void)
+{
+	rcu_sync_torture_init();
+
+	/* Make sure schedstat is enabled for GP thread wakeup count. */
+	force_schedstat_enabled();
+	rcu_gp_nr_wakeups = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
+}
+
+static void rcu_flavor_cleanup(void)
+{
+	unsigned long now_nr = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
+
+	pr_alert("End-test: Cleanup: Total GP-kthread wakeups: %lu\n",
+		now_nr - rcu_gp_nr_wakeups);
+}
+
 static struct rcu_torture_ops rcu_ops = {
 	.ttype		= RCU_FLAVOR,
-	.init		= rcu_sync_torture_init,
+	.init		= rcu_flavor_init,
+	.cleanup	= rcu_flavor_cleanup,
 	.readlock	= rcu_torture_read_lock,
 	.read_delay	= rcu_read_delay,
 	.readunlock	= rcu_torture_read_unlock,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c3bae7a83d792..a3a175feb310a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -175,6 +175,13 @@ int rcu_get_gp_kthreads_prio(void)
 }
 EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
 
+/* Retrieve RCU's main GP kthread's task_struct. */
+struct task_struct *rcu_get_main_gp_kthread(void)
+{
+	return rcu_state.gp_kthread;
+}
+EXPORT_SYMBOL_GPL(rcu_get_main_gp_kthread);
+
 /*
  * Number of grace periods between delays, normalized by the duration of
  * the delay.  The longer the delay, the more the grace periods between
-- 
2.27.0.111.gc72c7da667-goog
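
The snapshot/delta accounting used above in rcu_flavor_init() and
rcu_flavor_cleanup() can be sketched in plain C. This is a minimal model,
not the kernel code: the counter below is an illustrative stand-in for the
GP kthread's se.statistics.nr_wakeups, and unsigned subtraction keeps the
delta correct even if the counter were to wrap:

```c
#include <assert.h>

/* Illustrative stand-in for the GP kthread's nr_wakeups schedstat. */
static unsigned long nr_wakeups;

static unsigned long wakeups_snap;

/* Model of rcu_flavor_init(): record the wakeup count at test start. */
static void flavor_init(void)
{
	wakeups_snap = nr_wakeups;
}

/*
 * Model of rcu_flavor_cleanup(): number of wakeups that occurred during
 * the test. Unsigned subtraction is wrap-safe.
 */
static unsigned long flavor_cleanup(void)
{
	return nr_wakeups - wakeups_snap;
}
```

The same start-snapshot/end-delta pattern is what lets the per-run wakeup
numbers in the table of patch 1 be compared across runs.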


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/7] rcutorture: Add number of GP information to reports
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2020-06-18 20:29 ` [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread Joel Fernandes (Google)
@ 2020-06-18 20:29 ` Joel Fernandes (Google)
  2020-06-18 23:27   ` Paul E. McKenney
  2020-06-18 22:11 ` [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Paul E. McKenney
  6 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes (Google) @ 2020-06-18 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Lai Jiangshan,
	Marco Elver, Mathieu Desnoyers, Paul E. McKenney, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

Add 2 things to get visibility around the number of grace periods.
1. Add number of GPs to End-state print.
2. Just like End-state, add GP state to Start-state.

This helps determine how many GPs elapsed during a run of rcutorture and
what the initial state was.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcutorture.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 959a1f84d6904..c80725ddd9030 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -186,6 +186,7 @@ static long n_barrier_successes; /* did rcu_barrier test succeed? */
 static unsigned long n_read_exits;
 static struct list_head rcu_torture_removed;
 static unsigned long shutdown_jiffies;
+static unsigned long start_gp_seq;
 
 static int rcu_torture_writer_state;
 #define RTWS_FIXED_DELAY	0
@@ -2508,8 +2509,9 @@ rcu_torture_cleanup(void)
 
 	rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
 	srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
-	pr_alert("%s:  End-test grace-period state: g%lu f%#x\n",
-		 cur_ops->name, gp_seq, flags);
+	pr_alert("%s:  End-test grace-period state: g%ld f%#x total-gps=%ld\n",
+		 cur_ops->name, (long)gp_seq, flags,
+		 rcutorture_seq_diff(gp_seq, start_gp_seq));
 	torture_stop_kthread(rcu_torture_stats, stats_task);
 	torture_stop_kthread(rcu_torture_fqs, fqs_task);
 	if (rcu_torture_can_boost())
@@ -2633,6 +2635,8 @@ rcu_torture_init(void)
 	long i;
 	int cpu;
 	int firsterr = 0;
+	int flags = 0;
+	unsigned long gp_seq = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
 		&busted_srcud_ops, &tasks_ops, &tasks_rude_ops,
@@ -2675,6 +2679,11 @@ rcu_torture_init(void)
 			nrealreaders = 1;
 	}
 	rcu_torture_print_module_parms(cur_ops, "Start of test");
+	rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
+	srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
+	start_gp_seq = gp_seq;
+	pr_alert("%s:  Start-test grace-period state: g%ld f%#x\n",
+		 cur_ops->name, (long)gp_seq, flags);
 
 	/* Set up the freelist. */
 
-- 
2.27.0.111.gc72c7da667-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
                   ` (5 preceding siblings ...)
  2020-06-18 20:29 ` [PATCH 7/7] rcutorture: Add number of GP information to reports Joel Fernandes (Google)
@ 2020-06-18 22:11 ` Paul E. McKenney
  2020-06-18 23:09   ` Paul E. McKenney
  6 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:11 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, urezki, Davidlohr Bueso, Ingo Molnar,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt

On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:

First, this looks like a very nice optimization, thank you!

> rcu_segcblist_accelerate() returns true if a GP is to be
> started/requested and false if not. During tracing, I found that it is
> asking that GPs be requested

s/GPs/unnecessary GPs/?  Plus "." at end of the sentence.

> The exact flow seems to be something like:
> 1. Callbacks are queued on CPU A - into the NEXT list.
> 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
> 3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
> 4. CPU A's softirq runs again presumably because of #3.

Yes, that is one reason RCU softirq might run again.

> 5. CPU A's softirq now does acceleration again, this time no CBs are
>    accelerated since last attempt, but it still requests GP X+1 which
>    could be useless.

I can't think of a case where this request helps.  How about: "but
it still unnecessarily requests GP X+1"?

> The fix is, prevent the useless GP start if we detect no CBs are there
> to accelerate.
> 
> With this, we have the following improvement in short runs of
> rcutorture (5 seconds each):
> +----+-------------------+-------------------+
> | #  | Number of GPs     | Number of Wakeups |
> +====+=========+=========+=========+=========+
> | 1  | With    | Without | With    | Without |
> +----+---------+---------+---------+---------+
> | 2  |      75 |      89 |     113 |     119 |
> +----+---------+---------+---------+---------+
> | 3  |      62 |      91 |     105 |     123 |
> +----+---------+---------+---------+---------+
> | 4  |      60 |      79 |      98 |     110 |
> +----+---------+---------+---------+---------+
> | 5  |      63 |      79 |      99 |     112 |
> +----+---------+---------+---------+---------+
> | 6  |      57 |      89 |      96 |     123 |
> +----+---------+---------+---------+---------+
> | 7  |      64 |      85 |      97 |     118 |
> +----+---------+---------+---------+---------+
> | 8  |      58 |      83 |      98 |     113 |
> +----+---------+---------+---------+---------+
> | 9  |      57 |      77 |      89 |     104 |
> +----+---------+---------+---------+---------+
> | 10 |      66 |      82 |      98 |     119 |
> +----+---------+---------+---------+---------+
> | 11 |      52 |      82 |      83 |     117 |
> +----+---------+---------+---------+---------+

So the reductions in wakeups range from 5% to 40%, with almost a 20%
overall reduction in wakeups across all the runs.  That should be of
some use to someone.  ;-)

I do run rcutorture quite a bit, but is there a more real-world
benchmark that could be tried?

> Cc: urezki@gmail.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu_segcblist.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f66133b4b3..4782cf17bf4f9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	 * Also advance to the oldest segment of callbacks whose
>  	 * ->gp_seq[] completion is at or after that passed in via "seq",
>  	 * skipping any empty segments.
> +	 *
> +	 * Note that "i" is the youngest segment of the list after which
> +	 * any older segments than "i" would not be mutated or assigned
> +	 * GPs. For example, if i == WAIT_TAIL, then neither WAIT_TAIL,
> +	 * nor DONE_TAIL will be touched. Only CBs in NEXT_TAIL will be
> +	 * merged with NEXT_READY_TAIL and the GP numbers of both of
> +	 * them would be updated.

In this case, only the GP number of NEXT_READY_TAIL would be updated,
correct?  Or am I missing something subtle in the loop just past the
end of this patch?

							Thanx, Paul

>  	 */
> -	if (++i >= RCU_NEXT_TAIL)
> +	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>  		return false;
>  
>  	/*
> -- 
> 2.27.0.111.gc72c7da667-goog
> 
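
The effect of the added rcu_segcblist_restempty() check can be illustrated
with a small userspace model of a segmented callback list. Everything here
is a simplified sketch, not the kernel's rcu_segcblist: segment names and
helpers are stand-ins, and only the "request a GP iff callbacks were
actually accelerated" behavior is modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSEGS 4		/* DONE, WAIT, NEXT_READY, NEXT */
#define WAIT_TAIL 1

struct cb { struct cb *next; };

/*
 * tails[i] points at the ->next slot of the last callback in segment i,
 * or at an earlier tail pointer when the segment is empty.
 */
struct seglist {
	struct cb *head;
	struct cb **tails[NSEGS];
};

static void seglist_init(struct seglist *sl)
{
	int i;

	sl->head = NULL;
	for (i = 0; i < NSEGS; i++)
		sl->tails[i] = &sl->head;
}

/* True iff no callbacks sit in any segment after segment i. */
static bool restempty(struct seglist *sl, int i)
{
	return sl->tails[i] == sl->tails[NSEGS - 1];
}

/* Enqueue into the NEXT (youngest) segment. */
static void enqueue(struct seglist *sl, struct cb *c)
{
	c->next = NULL;
	*sl->tails[NSEGS - 1] = c;
	sl->tails[NSEGS - 1] = &c->next;
}

/*
 * Model of the patched acceleration: merge everything newer than WAIT
 * into WAIT, and return true only when there was something to merge,
 * i.e. only when a new grace period is actually needed.
 */
static bool accelerate(struct seglist *sl)
{
	int i;

	/* The fix: nothing newer than WAIT means no GP request. */
	if (restempty(sl, WAIT_TAIL))
		return false;

	for (i = WAIT_TAIL; i < NSEGS - 1; i++)
		sl->tails[i] = sl->tails[NSEGS - 1];
	return true;	/* Caller would request a new GP. */
}
```

Calling accelerate() twice without enqueuing anything in between returns
true then false, which is exactly the step-5 scenario the patch avoids.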

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes
  2020-06-18 20:29 ` [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
@ 2020-06-18 22:16   ` Paul E. McKenney
  2020-06-18 23:52     ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:16 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:50PM -0400, Joel Fernandes (Google) wrote:
> Track how the segcb list changes before/after acceleration, during
> queuing and during dequeuing.
> 
> This has proved useful to discover an optimization to avoid unwanted GP
> requests when there are no callbacks accelerated.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

It does look like it provides some useful information, but the fact
that callback lists can contain millions of callbacks makes
rcu_segcblist_countseq() a non-starter for mainline use.

One thing that has been on my list for some time is to associate counts
with the rcu_segcblist segments, which would make this sort of tracing
more palatable.  This would be a fiddly change requiring careful testing,
which might be why I have not yet done it myself.  One benefit of such
counts is to allow RCU to determine whether callbacks are being held up by
the grace period on the one hand or by callback invocation on the other.

						Thanx, Paul

> ---
>  include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
>  kernel/rcu/rcu_segcblist.c | 37 +++++++++++++++++++++++++++++++++++++
>  kernel/rcu/rcu_segcblist.h |  7 +++++++
>  kernel/rcu/tree.c          | 24 ++++++++++++++++++++++++
>  4 files changed, 93 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 02dcd119f3263..a6d49864dcc27 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -507,6 +507,31 @@ TRACE_EVENT_RCU(rcu_callback,
>  		  __entry->qlen)
>  );
>  
> +TRACE_EVENT_RCU(rcu_segcb,
> +
> +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> +
> +		TP_ARGS(ctx, cb_count, gp_seq),
> +
> +		TP_STRUCT__entry(
> +			__field(const char *, ctx)
> +			__array(int, cb_count, 4)
> +			__array(unsigned long, gp_seq, 4)
> +		),
> +
> +		TP_fast_assign(
> +			__entry->ctx = ctx;
> +			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
> +			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
> +		),
> +
> +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> +
> +);
> +
>  /*
>   * Tracepoint for the registration of a single RCU callback of the special
>   * kfree() form.  The first argument is the RCU type, the second argument
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 4782cf17bf4f9..036d4abac7c5a 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -316,6 +316,43 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
>  }
>  
> +/*
> + * Return how many CBs each segment along with their gp_seq values.
> + *
> + * This function is O(N) where N is the number of callbacks. Only used from
> + * tracing code which is usually disabled in production.
> + */
> +#ifdef CONFIG_RCU_TRACE
> +void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> +{
> +	struct rcu_head **cur_tail, *h;
> +	int i, c;
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> +		cbcount[i] = 0;
> +
> +	cur_tail = &(rsclp->head);
> +
> +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> +		c = 0;
> +		// List empty?
> +		if (rsclp->tails[i] != cur_tail) {
> +			// The loop skips the last node
> +			c = 1;
> +			for (h = *cur_tail; h->next != *(rsclp->tails[i]); h = h->next) {
> +				c++;
> +			}
> +		}
> +
> +		cbcount[i] = c;
> +		gpseq[i] = rsclp->gp_seq[i];
> +		cur_tail = rsclp->tails[i];
> +	}
> +}
> +#endif
> +
>  /*
>   * Extract only those callbacks still pending (not yet ready to be
>   * invoked) from the specified rcu_segcblist structure and place them in
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 5c293afc07b8e..0a8dbac438529 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -104,3 +104,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
>  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
>  void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>  			 struct rcu_segcblist *src_rsclp);
> +#ifdef CONFIG_RCU_TRACE
> +void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> +			 int cbcount[RCU_CBLIST_NSEGS],
> +			 unsigned long gpseq[RCU_CBLIST_NSEGS]);
> +#else
> +#define rcu_segcblist_countseq(...)
> +#endif
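
The O(N) counting walk in rcu_segcblist_countseq() can be exercised against
a minimal userspace model of the segmented list. This is a sketch under
simplifying assumptions: segment names are abbreviated, the gp_seq[]
bookkeeping is omitted, and the walk counts ->next slots rather than
reproducing the kernel loop verbatim:

```c
#include <assert.h>
#include <stddef.h>

#define NSEGS 4		/* DONE, WAIT, NEXT_READY, NEXT */

struct cb { struct cb *next; };

struct seglist {
	struct cb *head;
	struct cb **tails[NSEGS];	/* ->next slot of last cb in segment i */
};

static void seglist_init(struct seglist *sl)
{
	int i;

	sl->head = NULL;
	for (i = 0; i < NSEGS; i++)
		sl->tails[i] = &sl->head;
}

/* Enqueue into the NEXT (youngest) segment. */
static void enqueue(struct seglist *sl, struct cb *c)
{
	c->next = NULL;
	*sl->tails[NSEGS - 1] = c;
	sl->tails[NSEGS - 1] = &c->next;
}

/*
 * Per-segment count, equivalent in spirit to the kernel loop: segment i
 * spans the nodes between the previous segment's tail and tails[i], so
 * walk the ->next slots until the two tail pointers meet.
 */
static void countseq(struct seglist *sl, int cnt[NSEGS])
{
	struct cb **cur_tail = &sl->head;
	int i;

	for (i = 0; i < NSEGS; i++) {
		int c = 0;
		struct cb **p = cur_tail;

		while (p != sl->tails[i]) {
			c++;
			p = &(*p)->next;
		}
		cnt[i] = c;
		cur_tail = sl->tails[i];
	}
}
```

Because empty segments share a tail pointer with their predecessor, the
inner walk degenerates to zero iterations for them, which is why the cost
is O(N) in the number of callbacks rather than in the number of segments.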
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index ebce14e470d0e..c61af6a33fbfd 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1407,6 +1407,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  {
>  	unsigned long gp_seq_req;
>  	bool ret = false;
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
>  
>  	rcu_lockdep_assert_cblist_protected(rdp);
>  	raw_lockdep_assert_held_rcu_node(rnp);
> @@ -1415,6 +1417,10 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
>  		return false;
>  
> +	/* Count CBs for tracing. */
> +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> +	trace_rcu_segcb("SegCbPreAcc", cbs, gps);
> +
>  	/*
>  	 * Callbacks are often registered with incomplete grace-period
>  	 * information.  Something about the fact that getting exact
> @@ -1434,6 +1440,11 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
>  	else
>  		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> +
> +	/* Count CBs for tracing. */
> +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> +	trace_rcu_segcb("SegCbPostAcc", cbs, gps);
> +
>  	return ret;
>  }
>  
> @@ -2316,6 +2327,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
>  	long bl, count;
>  	long pending, tlimit = 0;
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
>  
>  	/* If no callbacks are ready, just return. */
>  	if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
> @@ -2350,6 +2363,11 @@ static void rcu_do_batch(struct rcu_data *rdp)
>  	/* Invoke callbacks. */
>  	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
>  	rhp = rcu_cblist_dequeue(&rcl);
> +
> +	/* Count CBs for tracing. */
> +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> +	trace_rcu_segcb("SegCbDequeued", cbs, gps);
> +
>  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
>  		rcu_callback_t f;
>  
> @@ -2808,6 +2826,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	unsigned long flags;
>  	struct rcu_data *rdp;
>  	bool was_alldone;
> +	int cbs[RCU_CBLIST_NSEGS];
> +	unsigned long gps[RCU_CBLIST_NSEGS];
>  
>  	/* Misaligned rcu_head! */
>  	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
> @@ -2852,6 +2872,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
>  		trace_rcu_callback(rcu_state.name, head,
>  				   rcu_segcblist_n_cbs(&rdp->cblist));
>  
> +	/* Count CBs for tracing. */
> +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> +	trace_rcu_segcb("SegCBQueued", cbs, gps);
> +
>  	/* Go handle any RCU core processing required. */
>  	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
>  	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-18 20:29 ` [PATCH 3/7] rcu/trace: Add name of the source for gp_seq Joel Fernandes (Google)
@ 2020-06-18 22:19   ` Paul E. McKenney
  2020-06-18 23:51     ` Joel Fernandes
  2020-06-19  0:01     ` Steven Rostedt
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:19 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> The gp_seq value can come from any of rdp, rsp or rnp.
> 
> Only the rsp is the global source of truth (most accurate GP info). The
> rnp can be off by ~1 and the rdp can be off by way more. Add some more
> context to traces to clarify where it comes from.

This would be better done in scripting that processes the trace messages.
I must pass on this one.

For future reference, the TPS() around strings is not optional.  Without
it, trace messages from crash dumps are garbled, if I remember correctly.

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/trace/events/rcu.h | 12 ++++++++----
>  kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
>  kernel/rcu/tree_plugin.h   |  2 +-
>  3 files changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index a6d49864dcc27..cb5363564f7ed 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
>   */
>  TRACE_EVENT_RCU(rcu_grace_period,
>  
> -	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> +	TP_PROTO(const char *rcuname, const char *gp_seq_src,
> +		unsigned long gp_seq, const char *gpevent),
>  
> -	TP_ARGS(rcuname, gp_seq, gpevent),
> +	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> +		__field(const char *, gp_seq_src)
>  		__field(unsigned long, gp_seq)
>  		__field(const char *, gpevent)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> +		__entry->gp_seq_src = gp_seq_src;
>  		__entry->gp_seq = gp_seq;
>  		__entry->gpevent = gpevent;
>  	),
>  
> -	TP_printk("%s %lu %s",
> -		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> +	TP_printk("%s %s_gp_seq=%lu %s",
> +		  __entry->rcuname, __entry->gp_seq_src,
> +		  __entry->gp_seq, __entry->gpevent)
>  );
>  
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c61af6a33fbfd..81df1b837dd9d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
>  		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
>  		goto unlock_out;
>  	}
> -	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> +	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
>  	ret = true;  /* Caller must wake GP kthread. */
>  unlock_out:
>  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  
>  	/* Trace depending on how much we were able to accelerate. */
>  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
>  	else
> -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
>  
>  	/* Count CBs for tracing. */
>  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
>  		if (!offloaded)
>  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
>  		rdp->core_needs_qs = false;
> -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
>  	} else {
>  		if (!offloaded)
>  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
>  		 * set up to detect a quiescent state, otherwise don't
>  		 * go looking for one.
>  		 */
> -		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> +		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
>  		need_qs = !!(rnp->qsmask & rdp->grpmask);
>  		rdp->cpu_no_qs.b.norm = need_qs;
>  		rdp->core_needs_qs = need_qs;
> @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
>  	/* Record GP times before starting GP, hence rcu_seq_start(). */
>  	rcu_seq_start(&rcu_state.gp_seq);
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
>  	raw_spin_unlock_irq_rcu_node(rnp);
>  
>  	/*
> @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
>  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
>  				   jiffies + (j ? 3 * j : 2));
>  		}
> -		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  				       TPS("fqswait"));
>  		rcu_state.gp_state = RCU_GP_WAIT_FQS;
>  		ret = swait_event_idle_timeout_exclusive(
> @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
>  		/* If time for quiescent-state forcing, do it. */
>  		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
>  		    (gf & RCU_GP_FLAG_FQS)) {
> -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  					       TPS("fqsstart"));
>  			rcu_gp_fqs(first_gp_fqs);
>  			gf = 0;
> @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
>  				first_gp_fqs = false;
>  				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
>  			}
> -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  					       TPS("fqsend"));
>  			cond_resched_tasks_rcu_qs();
>  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
>  			cond_resched_tasks_rcu_qs();
>  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
>  			WARN_ON(signal_pending(current));
> -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  					       TPS("fqswaitsig"));
>  			ret = 1; /* Keep old FQS timing. */
>  			j = jiffies;
> @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
>  	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
>  
>  	/* Declare grace period done, trace first to use old GP number. */
> -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
>  	rcu_seq_end(&rcu_state.gp_seq);
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
>  	rcu_state.gp_state = RCU_GP_IDLE;
> @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
>  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
>  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
>  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> -		trace_rcu_grace_period(rcu_state.name,
> +		trace_rcu_grace_period(rcu_state.name, "rsp",
>  				       rcu_state.gp_seq,
>  				       TPS("newreq"));
>  	} else {
> @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
>  
>  		/* Handle grace-period start. */
>  		for (;;) {
> -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  					       TPS("reqwait"));
>  			rcu_state.gp_state = RCU_GP_WAIT_GPS;
>  			swait_event_idle_exclusive(rcu_state.gp_wq,
> @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
>  			cond_resched_tasks_rcu_qs();
>  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
>  			WARN_ON(signal_pending(current));
> -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
>  					       TPS("reqwaitsig"));
>  		}
>  
> @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
>  		return 0;
>  
>  	blkd = !!(rnp->qsmask & rdp->grpmask);
> -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> +	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
>  			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
>  	return 0;
>  }
> @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	rdp->core_needs_qs = false;
>  	rdp->rcu_iw_pending = false;
>  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> -	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> +	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	rcu_prepare_kthreads(cpu);
>  	rcu_spawn_cpu_nocb_kthread(cpu);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 982fc5be52698..32f761cf16c33 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -262,7 +262,7 @@ static void rcu_qs(void)
>  {
>  	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
>  	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> -		trace_rcu_grace_period(TPS("rcu_preempt"),
> +		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
>  				       __this_cpu_read(rcu_data.gp_seq),
>  				       TPS("cpuqs"));
>  		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint
  2020-06-18 20:29 ` [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint Joel Fernandes (Google)
@ 2020-06-18 22:27   ` Paul E. McKenney
  2020-06-18 23:54     ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:27 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:53PM -0400, Joel Fernandes (Google) wrote:
> During acceleration of CB, the rsp's gp_seq is rcu_seq_snap'd. This is
> the value used for acceleration - it is the value of gp_seq at which it
> is safe the execute all callbacks in the callback list.
> 
> The rdp's gp_seq is not very useful for this scenario. Make
> rcu_grace_period report the rsp's gp_seq instead as it allows one to
> reason about how the acceleration works.

Good catch, but please instead trace the gp_seq_req local variable.

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 81df1b837dd9d..c3bae7a83d792 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>  
>  	/* Trace depending on how much we were able to accelerate. */
>  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> -		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
>  	else
> -		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));
>  
>  	/* Count CBs for tracing. */
>  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] rcu/trace: Print negative GP numbers correctly
  2020-06-18 20:29 ` [PATCH 4/7] rcu/trace: Print negative GP numbers correctly Joel Fernandes (Google)
@ 2020-06-18 22:30   ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:30 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:52PM -0400, Joel Fernandes (Google) wrote:
> GP numbers start from -300 and gp_seq numbers start from -1200 (for a
> shift of 2). These negative numbers are printed as unsigned long which
> not only takes up more text space, but is rather confusing to the reader
> as they have to constantly expend energy to truncate the number. Just
> print the negative numbering directly.

Good!  This also makes the ftrace versions of the grace-period sequence
numbers consistent with those of rcutorture.

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/trace/events/rcu.h | 62 ++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 30 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index cb5363564f7ed..bc24862790623 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -76,18 +76,18 @@ TRACE_EVENT_RCU(rcu_grace_period,
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
>  		__field(const char *, gp_seq_src)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(const char *, gpevent)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
>  		__entry->gp_seq_src = gp_seq_src;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->gpevent = gpevent;
>  	),
>  
> -	TP_printk("%s %s_gp_seq=%lu %s",
> +	TP_printk("%s %s_gp_seq=%ld %s",
>  		  __entry->rcuname, __entry->gp_seq_src,
>  		  __entry->gp_seq, __entry->gpevent)
>  );
> @@ -118,8 +118,8 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> -		__field(unsigned long, gp_seq_req)
> +		__field(long, gp_seq)
> +		__field(long, gp_seq_req)
>  		__field(u8, level)
>  		__field(int, grplo)
>  		__field(int, grphi)
> @@ -128,16 +128,16 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> -		__entry->gp_seq_req = gp_seq_req;
> +		__entry->gp_seq = (long)gp_seq;
> +		__entry->gp_seq_req = (long)gp_seq_req;
>  		__entry->level = level;
>  		__entry->grplo = grplo;
>  		__entry->grphi = grphi;
>  		__entry->gpevent = gpevent;
>  	),
>  
> -	TP_printk("%s %lu %lu %u %d %d %s",
> -		  __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level,
> +	TP_printk("%s %ld %ld %u %d %d %s",
> +		  __entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level,
>  		  __entry->grplo, __entry->grphi, __entry->gpevent)
>  );
>  
> @@ -157,7 +157,7 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(u8, level)
>  		__field(int, grplo)
>  		__field(int, grphi)
> @@ -166,14 +166,14 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->level = level;
>  		__entry->grplo = grplo;
>  		__entry->grphi = grphi;
>  		__entry->qsmask = qsmask;
>  	),
>  
> -	TP_printk("%s %lu %u %d %d %lx",
> +	TP_printk("%s %ld %u %d %d %lx",
>  		  __entry->rcuname, __entry->gp_seq, __entry->level,
>  		  __entry->grplo, __entry->grphi, __entry->qsmask)
>  );
> @@ -201,17 +201,17 @@ TRACE_EVENT_RCU(rcu_exp_grace_period,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gpseq)
> +		__field(long, gpseq)
>  		__field(const char *, gpevent)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gpseq = gpseq;
> +		__entry->gpseq = (long)gpseq;
>  		__entry->gpevent = gpevent;
>  	),
>  
> -	TP_printk("%s %lu %s",
> +	TP_printk("%s %ld %s",
>  		  __entry->rcuname, __entry->gpseq, __entry->gpevent)
>  );
>  
> @@ -320,17 +320,17 @@ TRACE_EVENT_RCU(rcu_preempt_task,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(int, pid)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->pid = pid;
>  	),
>  
> -	TP_printk("%s %lu %d",
> +	TP_printk("%s %ld %d",
>  		  __entry->rcuname, __entry->gp_seq, __entry->pid)
>  );
>  
> @@ -347,17 +347,17 @@ TRACE_EVENT_RCU(rcu_unlock_preempted_task,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(int, pid)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->pid = pid;
>  	),
>  
> -	TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
> +	TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
>  );
>  
>  /*
> @@ -378,7 +378,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(unsigned long, mask)
>  		__field(unsigned long, qsmask)
>  		__field(u8, level)
> @@ -389,7 +389,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->mask = mask;
>  		__entry->qsmask = qsmask;
>  		__entry->level = level;
> @@ -398,7 +398,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
>  		__entry->gp_tasks = gp_tasks;
>  	),
>  
> -	TP_printk("%s %lu %lx>%lx %u %d %d %u",
> +	TP_printk("%s %ld %lx>%lx %u %d %d %u",
>  		  __entry->rcuname, __entry->gp_seq,
>  		  __entry->mask, __entry->qsmask, __entry->level,
>  		  __entry->grplo, __entry->grphi, __entry->gp_tasks)
> @@ -419,19 +419,19 @@ TRACE_EVENT_RCU(rcu_fqs,
>  
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
> -		__field(unsigned long, gp_seq)
> +		__field(long, gp_seq)
>  		__field(int, cpu)
>  		__field(const char *, qsevent)
>  	),
>  
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
> -		__entry->gp_seq = gp_seq;
> +		__entry->gp_seq = (long)gp_seq;
>  		__entry->cpu = cpu;
>  		__entry->qsevent = qsevent;
>  	),
>  
> -	TP_printk("%s %lu %d %s",
> +	TP_printk("%s %ld %d %s",
>  		  __entry->rcuname, __entry->gp_seq,
>  		  __entry->cpu, __entry->qsevent)
>  );
> @@ -520,17 +520,19 @@ TRACE_EVENT_RCU(rcu_segcb,
>  		TP_STRUCT__entry(
>  			__field(const char *, ctx)
>  			__array(int, cb_count, 4)
> -			__array(unsigned long, gp_seq, 4)
> +			__array(long, gp_seq, 4)
>  		),
>  
>  		TP_fast_assign(
> +			int i;
>  			__entry->ctx = ctx;
>  			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
> -			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
> +			for (i = 0; i < 4; i++)
> +				__entry->gp_seq[i] = (long)(gp_seq[i]);
>  		),
>  
>  		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> -			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> +			  "gp_seq: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld)", __entry->ctx,
>  			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
>  			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
>  
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-18 20:29 ` [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread Joel Fernandes (Google)
@ 2020-06-18 22:40   ` Paul E. McKenney
  2020-06-19  0:01     ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 22:40 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:54PM -0400, Joel Fernandes (Google) wrote:
> This is useful to check for any improvement or degradation in the
> number of GP kthread wakeups during testing.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This was a good way to collect the data for your testing, but
we can expect rcutorture to only do so much.  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/Kconfig.debug |  1 +
>  kernel/rcu/rcu.h         |  2 ++
>  kernel/rcu/rcutorture.c  | 23 ++++++++++++++++++++++-
>  kernel/rcu/tree.c        |  7 +++++++
>  4 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 3cf6132a4bb9f..3323e3378af5a 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -50,6 +50,7 @@ config RCU_TORTURE_TEST
>  	select TASKS_RCU
>  	select TASKS_RUDE_RCU
>  	select TASKS_TRACE_RCU
> +	select SCHEDSTATS
>  	default n
>  	help
>  	  This option provides a kernel module that runs torture tests
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index cf66a3ccd7573..7e867e81d9738 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -511,6 +511,7 @@ srcu_batches_completed(struct srcu_struct *sp) { return 0; }
>  static inline void rcu_force_quiescent_state(void) { }
>  static inline void show_rcu_gp_kthreads(void) { }
>  static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
> +static inline struct task_struct *rcu_get_main_gp_kthread(void) { return NULL; }
>  static inline void rcu_fwd_progress_check(unsigned long j) { }
>  #else /* #ifdef CONFIG_TINY_RCU */
>  bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
> @@ -519,6 +520,7 @@ unsigned long rcu_exp_batches_completed(void);
>  unsigned long srcu_batches_completed(struct srcu_struct *sp);
>  void show_rcu_gp_kthreads(void);
>  int rcu_get_gp_kthreads_prio(void);
> +struct task_struct *rcu_get_main_gp_kthread(void);
>  void rcu_fwd_progress_check(unsigned long j);
>  void rcu_force_quiescent_state(void);
>  extern struct workqueue_struct *rcu_gp_wq;
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index d0d265304d147..959a1f84d6904 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -23,6 +23,7 @@
>  #include <linux/rcupdate_wait.h>
>  #include <linux/interrupt.h>
>  #include <linux/sched/signal.h>
> +#include <linux/sched/stat.h>
>  #include <uapi/linux/sched/types.h>
>  #include <linux/atomic.h>
>  #include <linux/bitops.h>
> @@ -460,9 +461,29 @@ static void rcu_sync_torture_init(void)
>  	INIT_LIST_HEAD(&rcu_torture_removed);
>  }
>  
> +unsigned long rcu_gp_nr_wakeups;
> +
> +static void rcu_flavor_init(void)
> +{
> +	rcu_sync_torture_init();
> +
> +	/* Make sure schedstat is enabled for GP thread wakeup count. */
> +	force_schedstat_enabled();
> +	rcu_gp_nr_wakeups = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> +}
> +
> +static void rcu_flavor_cleanup(void)
> +{
> +	unsigned long now_nr = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> +
> +	pr_alert("End-test: Cleanup: Total GP-kthread wakeups: %lu\n",
> +		now_nr - rcu_gp_nr_wakeups);
> +}
> +
>  static struct rcu_torture_ops rcu_ops = {
>  	.ttype		= RCU_FLAVOR,
> -	.init		= rcu_sync_torture_init,
> +	.init		= rcu_flavor_init,
> +	.cleanup	= rcu_flavor_cleanup,
>  	.readlock	= rcu_torture_read_lock,
>  	.read_delay	= rcu_read_delay,
>  	.readunlock	= rcu_torture_read_unlock,
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c3bae7a83d792..a3a175feb310a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -175,6 +175,13 @@ int rcu_get_gp_kthreads_prio(void)
>  }
>  EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
>  
> +/* Retrieve RCU's main GP kthread task_struct */
> +struct task_struct *rcu_get_main_gp_kthread(void)
> +{
> +	return rcu_state.gp_kthread;
> +}
> +EXPORT_SYMBOL_GPL(rcu_get_main_gp_kthread);
> +
>  /*
>   * Number of grace periods between delays, normalized by the duration of
>   * the delay.  The longer the delay, the more the grace periods between
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  2020-06-18 22:11 ` [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Paul E. McKenney
@ 2020-06-18 23:09   ` Paul E. McKenney
  2020-06-19  0:04     ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 23:09 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, urezki, Davidlohr Bueso, Ingo Molnar,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt

On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
> 
> First, this looks like a very nice optimization, thank you!
> 
> > rcu_segcblist_accelerate() returns true if a GP is to be
> > started/requested and false if not. During tracing, I found that it is
> > asking that GPs be requested
> 
> s/GPs/unnecessary GPs/?  Plus "." at end of the sentence.
> 
> > The exact flow seems to be something like:
> > 1. Callbacks are queued on CPU A - into the NEXT list.
> > 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
> > 3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
> > 4. CPU A's softirq runs again presumably because of #3.
> 
> Yes, that is one reason RCU softirq might run again.
> 
> > 5. CPU A's softirq now does acceleration again, this time no CBs are
> >    accelerated since last attempt, but it still requests GP X+1 which
> >    could be useless.
> 
> I can't think of a case where this request helps.  How about: "but
> it still unnecessarily requests GP X+1"?
> 
> > The fix is, prevent the useless GP start if we detect no CBs are there
> > to accelerate.
> > 
> > With this, we have the following improvement in short runs of
> > rcutorture (5 seconds each):
> > +----+-------------------+-------------------+
> > | #  | Number of GPs     | Number of Wakeups |
> > +====+=========+=========+=========+=========+
> > | 1  | With    | Without | With    | Without |
> > +----+---------+---------+---------+---------+
> > | 2  |      75 |      89 |     113 |     119 |
> > +----+---------+---------+---------+---------+
> > | 3  |      62 |      91 |     105 |     123 |
> > +----+---------+---------+---------+---------+
> > | 4  |      60 |      79 |      98 |     110 |
> > +----+---------+---------+---------+---------+
> > | 5  |      63 |      79 |      99 |     112 |
> > +----+---------+---------+---------+---------+
> > | 6  |      57 |      89 |      96 |     123 |
> > +----+---------+---------+---------+---------+
> > | 7  |      64 |      85 |      97 |     118 |
> > +----+---------+---------+---------+---------+
> > | 8  |      58 |      83 |      98 |     113 |
> > +----+---------+---------+---------+---------+
> > | 9  |      57 |      77 |      89 |     104 |
> > +----+---------+---------+---------+---------+
> > | 10 |      66 |      82 |      98 |     119 |
> > +----+---------+---------+---------+---------+
> > | 11 |      52 |      82 |      83 |     117 |
> > +----+---------+---------+---------+---------+
> 
> So the reductions in wakeups ranges from 5% to 40%, with almost a 20%
> overall reduction in wakeups across all the runs.  That should be of
> some use to someone.  ;-)
> 
> I do run rcutorture quite a bit, but is there a more real-world
> benchmark that could be tried?
> 
> > Cc: urezki@gmail.com
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

As discussed over IRC, I updated the patch as shown below.  Does that
work for you?

							Thanx, Paul

------------------------------------------------------------------------

commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Thu Jun 18 16:29:49 2020 -0400

    rcu/segcblist: Prevent useless GP start if no CBs to accelerate
    
    The rcu_segcblist_accelerate() function returns true iff it is necessary
    to request another grace period.  A tracing session showed that this
    function unnecessarily requests grace periods.
    
    For example, consider the following sequence of events:
    1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
    2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
    3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
    4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
    5. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
    6. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
       there are no new callbacks.  However, rcu_segcblist_accelerate()
       nevertheless (uselessly) requests a new grace period N+1.
    
    This extra grace period results in additional lock contention and also
    additional wakeups, all for no good reason.
    
    This commit therefore adds a check to rcu_segcblist_accelerate() that
    prevents the return of true when there are no new callbacks.
    
    This change reduces the number of grace periods (GPs) and wakeups in each
    of eleven five-second rcutorture runs as follows:
    
    +----+-------------------+-------------------+
    | #  | Number of GPs     | Number of Wakeups |
    +====+=========+=========+=========+=========+
    | 1  | With    | Without | With    | Without |
    +----+---------+---------+---------+---------+
    | 2  |      75 |      89 |     113 |     119 |
    +----+---------+---------+---------+---------+
    | 3  |      62 |      91 |     105 |     123 |
    +----+---------+---------+---------+---------+
    | 4  |      60 |      79 |      98 |     110 |
    +----+---------+---------+---------+---------+
    | 5  |      63 |      79 |      99 |     112 |
    +----+---------+---------+---------+---------+
    | 6  |      57 |      89 |      96 |     123 |
    +----+---------+---------+---------+---------+
    | 7  |      64 |      85 |      97 |     118 |
    +----+---------+---------+---------+---------+
    | 8  |      58 |      83 |      98 |     113 |
    +----+---------+---------+---------+---------+
    | 9  |      57 |      77 |      89 |     104 |
    +----+---------+---------+---------+---------+
    | 10 |      66 |      82 |      98 |     119 |
    +----+---------+---------+---------+---------+
    | 11 |      52 |      82 |      83 |     117 |
    +----+---------+---------+---------+---------+
    
    The reduction in the number of wakeups ranges from 5% to 40%.
    
    Cc: urezki@gmail.com
    [ paulmck: Rework commit log and comment. ]
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f661..2d2a6b6b9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * Also advance to the oldest segment of callbacks whose
 	 * ->gp_seq[] completion is at or after that passed in via "seq",
 	 * skipping any empty segments.
+	 *
+	 * Note that segment "i" (and any lower-numbered segments
+	 * containing older callbacks) will be unaffected, and their
+	 * grace-period numbers remain unchanged.  For example, if i ==
+	 * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
+	 * Instead, the CBs in NEXT_TAIL will be merged with those in
+	 * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
+	 * would be updated.  NEXT_TAIL would then be empty.
 	 */
-	if (++i >= RCU_NEXT_TAIL)
+	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
 		return false;
 
 	/*

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/7] rcutorture: Add number of GP information to reports
  2020-06-18 20:29 ` [PATCH 7/7] rcutorture: Add number of GP information to reports Joel Fernandes (Google)
@ 2020-06-18 23:27   ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-18 23:27 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 04:29:55PM -0400, Joel Fernandes (Google) wrote:
> Add 2 things to get visibility into the number of grace periods.
> 1. Add number of GPs to End-state print.
> 2. Just like End-state, add GP state to Start-state.
> 
> This helps determine how many GPs elapsed during a run of rcutorture and
> what the initial state was.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

And I queued this one with a reworked commit log, thank you!

							Thanx, Paul

> ---
>  kernel/rcu/rcutorture.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 959a1f84d6904..c80725ddd9030 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -186,6 +186,7 @@ static long n_barrier_successes; /* did rcu_barrier test succeed? */
>  static unsigned long n_read_exits;
>  static struct list_head rcu_torture_removed;
>  static unsigned long shutdown_jiffies;
> +static unsigned long start_gp_seq;
>  
>  static int rcu_torture_writer_state;
>  #define RTWS_FIXED_DELAY	0
> @@ -2508,8 +2509,9 @@ rcu_torture_cleanup(void)
>  
>  	rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
>  	srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
> -	pr_alert("%s:  End-test grace-period state: g%lu f%#x\n",
> -		 cur_ops->name, gp_seq, flags);
> +	pr_alert("%s:  End-test grace-period state: g%ld f%#x total-gps=%ld\n",
> +		 cur_ops->name, (long)gp_seq, flags,
> +		 rcutorture_seq_diff(gp_seq, start_gp_seq));
>  	torture_stop_kthread(rcu_torture_stats, stats_task);
>  	torture_stop_kthread(rcu_torture_fqs, fqs_task);
>  	if (rcu_torture_can_boost())
> @@ -2633,6 +2635,8 @@ rcu_torture_init(void)
>  	long i;
>  	int cpu;
>  	int firsterr = 0;
> +	int flags = 0;
> +	unsigned long gp_seq = 0;
>  	static struct rcu_torture_ops *torture_ops[] = {
>  		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
>  		&busted_srcud_ops, &tasks_ops, &tasks_rude_ops,
> @@ -2675,6 +2679,11 @@ rcu_torture_init(void)
>  			nrealreaders = 1;
>  	}
>  	rcu_torture_print_module_parms(cur_ops, "Start of test");
> +	rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
> +	srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
> +	start_gp_seq = gp_seq;
> +	pr_alert("%s:  Start-test grace-period state: g%ld f%#x\n",
> +		 cur_ops->name, (long)gp_seq, flags);
>  
>  	/* Set up the freelist. */
>  
> -- 
> 2.27.0.111.gc72c7da667-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-18 22:19   ` Paul E. McKenney
@ 2020-06-18 23:51     ` Joel Fernandes
  2020-06-19  0:12       ` Paul E. McKenney
  2020-06-19  0:01     ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2020-06-18 23:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > The gp_seq value can come from any of rdp, rsp or rnp.
> > 
> > Only the rsp is the global source of truth (most accurate GP info). The
> > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > context to traces to clarify where it comes from.
> 
> This would be better done in scripting that processes the trace messages.
> I must pass on this one.

I don't think so. I think people reading traces would get confused.  Recently
I had to talk with someone about this as well when going over traces. I
myself struggle with it.

> For future reference, the TPS() around strings is not optional.  Without
> it, trace messages from crash dumps are garbled, if I remember correctly.

Ok. I will fix that.

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  include/trace/events/rcu.h | 12 ++++++++----
> >  kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
> >  kernel/rcu/tree_plugin.h   |  2 +-
> >  3 files changed, 25 insertions(+), 21 deletions(-)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index a6d49864dcc27..cb5363564f7ed 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> >   */
> >  TRACE_EVENT_RCU(rcu_grace_period,
> >  
> > -	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > +	TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > +		unsigned long gp_seq, const char *gpevent),
> >  
> > -	TP_ARGS(rcuname, gp_seq, gpevent),
> > +	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> >  
> >  	TP_STRUCT__entry(
> >  		__field(const char *, rcuname)
> > +		__field(const char *, gp_seq_src)
> >  		__field(unsigned long, gp_seq)
> >  		__field(const char *, gpevent)
> >  	),
> >  
> >  	TP_fast_assign(
> >  		__entry->rcuname = rcuname;
> > +		__entry->gp_seq_src = gp_seq_src;
> >  		__entry->gp_seq = gp_seq;
> >  		__entry->gpevent = gpevent;
> >  	),
> >  
> > -	TP_printk("%s %lu %s",
> > -		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > +	TP_printk("%s %s_gp_seq=%lu %s",
> > +		  __entry->rcuname, __entry->gp_seq_src,
> > +		  __entry->gp_seq, __entry->gpevent)
> >  );
> >  
> >  /*
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c61af6a33fbfd..81df1b837dd9d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> >  		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> >  		goto unlock_out;
> >  	}
> > -	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > +	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> >  	ret = true;  /* Caller must wake GP kthread. */
> >  unlock_out:
> >  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >  
> >  	/* Trace depending on how much we were able to accelerate. */
> >  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> >  	else
> > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> >  
> >  	/* Count CBs for tracing. */
> >  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  		if (!offloaded)
> >  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> >  		rdp->core_needs_qs = false;
> > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> >  	} else {
> >  		if (!offloaded)
> >  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  		 * set up to detect a quiescent state, otherwise don't
> >  		 * go looking for one.
> >  		 */
> > -		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > +		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> >  		need_qs = !!(rnp->qsmask & rdp->grpmask);
> >  		rdp->cpu_no_qs.b.norm = need_qs;
> >  		rdp->core_needs_qs = need_qs;
> > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> >  	/* Record GP times before starting GP, hence rcu_seq_start(). */
> >  	rcu_seq_start(&rcu_state.gp_seq);
> >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> >  	raw_spin_unlock_irq_rcu_node(rnp);
> >  
> >  	/*
> > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> >  				   jiffies + (j ? 3 * j : 2));
> >  		}
> > -		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  				       TPS("fqswait"));
> >  		rcu_state.gp_state = RCU_GP_WAIT_FQS;
> >  		ret = swait_event_idle_timeout_exclusive(
> > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> >  		/* If time for quiescent-state forcing, do it. */
> >  		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> >  		    (gf & RCU_GP_FLAG_FQS)) {
> > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  					       TPS("fqsstart"));
> >  			rcu_gp_fqs(first_gp_fqs);
> >  			gf = 0;
> > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> >  				first_gp_fqs = false;
> >  				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> >  			}
> > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  					       TPS("fqsend"));
> >  			cond_resched_tasks_rcu_qs();
> >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> >  			cond_resched_tasks_rcu_qs();
> >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> >  			WARN_ON(signal_pending(current));
> > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  					       TPS("fqswaitsig"));
> >  			ret = 1; /* Keep old FQS timing. */
> >  			j = jiffies;
> > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> >  	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> >  
> >  	/* Declare grace period done, trace first to use old GP number. */
> > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> >  	rcu_seq_end(&rcu_state.gp_seq);
> >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> >  	rcu_state.gp_state = RCU_GP_IDLE;
> > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> >  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> >  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> >  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > -		trace_rcu_grace_period(rcu_state.name,
> > +		trace_rcu_grace_period(rcu_state.name, "rsp",
> >  				       rcu_state.gp_seq,
> >  				       TPS("newreq"));
> >  	} else {
> > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> >  
> >  		/* Handle grace-period start. */
> >  		for (;;) {
> > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  					       TPS("reqwait"));
> >  			rcu_state.gp_state = RCU_GP_WAIT_GPS;
> >  			swait_event_idle_exclusive(rcu_state.gp_wq,
> > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> >  			cond_resched_tasks_rcu_qs();
> >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> >  			WARN_ON(signal_pending(current));
> > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> >  					       TPS("reqwaitsig"));
> >  		}
> >  
> > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> >  		return 0;
> >  
> >  	blkd = !!(rnp->qsmask & rdp->grpmask);
> > -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > +	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> >  			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> >  	return 0;
> >  }
> > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	rdp->core_needs_qs = false;
> >  	rdp->rcu_iw_pending = false;
> >  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > -	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > +	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  	rcu_prepare_kthreads(cpu);
> >  	rcu_spawn_cpu_nocb_kthread(cpu);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 982fc5be52698..32f761cf16c33 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> >  {
> >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> >  	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > -		trace_rcu_grace_period(TPS("rcu_preempt"),
> > +		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> >  				       __this_cpu_read(rcu_data.gp_seq),
> >  				       TPS("cpuqs"));
> >  		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > -- 
> > 2.27.0.111.gc72c7da667-goog
> > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes
  2020-06-18 22:16   ` Paul E. McKenney
@ 2020-06-18 23:52     ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-18 23:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 03:16:39PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:50PM -0400, Joel Fernandes (Google) wrote:
> > Track how the segcb list changes before/after acceleration, during
> > queuing and during dequeuing.
> > 
> > This has proved useful to discover an optimization to avoid unwanted GP
> > requests when there are no callbacks accelerated.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> It does look like it provides some useful information, but the fact
> that callback lists can contain millions of callbacks makes the
> rcu_segcblist_countseq() a non-starter for mainline use.
> 
> One thing that has been on my list for some time is to associate counts
> with the rcu_segcblist segments, which would make this sort of tracing
> more palatable.  This would be a fiddly change requiring careful testing,
> which might be why I have not yet done it myself.  One benefit of such
> counts is to allow RCU to determine whether callbacks are being held up by
> the grace period on the one hand or by callback invocation on the other.

Sounds good, as discussed on IRC I will work on adding direct counters to the
list.

thanks,

 - Joel


> 						Thanx, Paul
> 
> > ---
> >  include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
> >  kernel/rcu/rcu_segcblist.c | 37 +++++++++++++++++++++++++++++++++++++
> >  kernel/rcu/rcu_segcblist.h |  7 +++++++
> >  kernel/rcu/tree.c          | 24 ++++++++++++++++++++++++
> >  4 files changed, 93 insertions(+)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 02dcd119f3263..a6d49864dcc27 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -507,6 +507,31 @@ TRACE_EVENT_RCU(rcu_callback,
> >  		  __entry->qlen)
> >  );
> >  
> > +TRACE_EVENT_RCU(rcu_segcb,
> > +
> > +		TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
> > +
> > +		TP_ARGS(ctx, cb_count, gp_seq),
> > +
> > +		TP_STRUCT__entry(
> > +			__field(const char *, ctx)
> > +			__array(int, cb_count, 4)
> > +			__array(unsigned long, gp_seq, 4)
> > +		),
> > +
> > +		TP_fast_assign(
> > +			__entry->ctx = ctx;
> > +			memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
> > +			memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
> > +		),
> > +
> > +		TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
> > +			  "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
> > +			  __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
> > +			  __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
> > +
> > +);
> > +
> >  /*
> >   * Tracepoint for the registration of a single RCU callback of the special
> >   * kfree() form.  The first argument is the RCU type, the second argument
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 4782cf17bf4f9..036d4abac7c5a 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -316,6 +316,43 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
> >  			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
> >  }
> >  
> > +/*
> > + * Return the number of CBs in each segment along with their gp_seq values.
> > + *
> > + * This function is O(N) where N is the number of callbacks. Only used from
> > + * tracing code which is usually disabled in production.
> > + */
> > +#ifdef CONFIG_RCU_TRACE
> > +void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > +			 int cbcount[RCU_CBLIST_NSEGS],
> > +			 unsigned long gpseq[RCU_CBLIST_NSEGS])
> > +{
> > +	struct rcu_head **cur_tail, *h;
> > +	int i, c;
> > +
> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++)
> > +		cbcount[i] = 0;
> > +
> > +	cur_tail = &(rsclp->head);
> > +
> > +	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
> > +		c = 0;
> > +		// Segment non-empty?
> > +		if (rsclp->tails[i] != cur_tail) {
> > +			// The loop skips the last node
> > +			c = 1;
> > +			for (h = *cur_tail; h->next != *(rsclp->tails[i]); h = h->next) {
> > +				c++;
> > +			}
> > +		}
> > +
> > +		cbcount[i] = c;
> > +		gpseq[i] = rsclp->gp_seq[i];
> > +		cur_tail = rsclp->tails[i];
> > +	}
> > +}
> > +#endif
> > +
> >  /*
> >   * Extract only those callbacks still pending (not yet ready to be
> >   * invoked) from the specified rcu_segcblist structure and place them in
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 5c293afc07b8e..0a8dbac438529 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -104,3 +104,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
> >  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
> >  void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
> >  			 struct rcu_segcblist *src_rsclp);
> > +#ifdef CONFIG_RCU_TRACE
> > +void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
> > +			 int cbcount[RCU_CBLIST_NSEGS],
> > +			 unsigned long gpseq[RCU_CBLIST_NSEGS]);
> > +#else
> > +#define rcu_segcblist_countseq(...)
> > +#endif
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index ebce14e470d0e..c61af6a33fbfd 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1407,6 +1407,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >  {
> >  	unsigned long gp_seq_req;
> >  	bool ret = false;
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> >  
> >  	rcu_lockdep_assert_cblist_protected(rdp);
> >  	raw_lockdep_assert_held_rcu_node(rnp);
> > @@ -1415,6 +1417,10 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >  	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
> >  		return false;
> >  
> > +	/* Count CBs for tracing. */
> > +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > +	trace_rcu_segcb("SegCbPreAcc", cbs, gps);
> > +
> >  	/*
> >  	 * Callbacks are often registered with incomplete grace-period
> >  	 * information.  Something about the fact that getting exact
> > @@ -1434,6 +1440,11 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >  		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> >  	else
> >  		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > +
> > +	/* Count CBs for tracing. */
> > +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > +	trace_rcu_segcb("SegCbPostAcc", cbs, gps);
> > +
> >  	return ret;
> >  }
> >  
> > @@ -2316,6 +2327,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
> >  	long bl, count;
> >  	long pending, tlimit = 0;
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> >  
> >  	/* If no callbacks are ready, just return. */
> >  	if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > @@ -2350,6 +2363,11 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	/* Invoke callbacks. */
> >  	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
> >  	rhp = rcu_cblist_dequeue(&rcl);
> > +
> > +	/* Count CBs for tracing. */
> > +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > +	trace_rcu_segcb("SegCbDequeued", cbs, gps);
> > +
> >  	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
> >  		rcu_callback_t f;
> >  
> > @@ -2808,6 +2826,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	unsigned long flags;
> >  	struct rcu_data *rdp;
> >  	bool was_alldone;
> > +	int cbs[RCU_CBLIST_NSEGS];
> > +	unsigned long gps[RCU_CBLIST_NSEGS];
> >  
> >  	/* Misaligned rcu_head! */
> >  	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
> > @@ -2852,6 +2872,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  		trace_rcu_callback(rcu_state.name, head,
> >  				   rcu_segcblist_n_cbs(&rdp->cblist));
> >  
> > +	/* Count CBs for tracing. */
> > +	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > +	trace_rcu_segcb("SegCBQueued", cbs, gps);
> > +
> >  	/* Go handle any RCU core processing required. */
> >  	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> >  	    unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
> > -- 
> > 2.27.0.111.gc72c7da667-goog
> > 

* Re: [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint
  2020-06-18 22:27   ` Paul E. McKenney
@ 2020-06-18 23:54     ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-18 23:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 03:27:21PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:53PM -0400, Joel Fernandes (Google) wrote:
> > During acceleration of CBs, the rsp's gp_seq is rcu_seq_snap'd. This is
> > the value used for acceleration - it is the value of gp_seq at which it
> > is safe to execute all callbacks in the callback list.
> > 
> > The rdp's gp_seq is not very useful for this scenario. Make
> > rcu_grace_period report the rsp's gp_seq instead as it allows one to
> > reason about how the acceleration works.
> 
> Good catch, but please instead trace the gp_seq_req local variable.

Oh sure, I will do that. Actually I was planning to, but was wondering if it
would be inconsistent with other trace_rcu_grace_period() invocations which
pass a real gp_seq (one that is as old as rcu_state.gp_seq). But agreed, this
is better since the acceleration uses it.

thanks,

 - Joel

> 							Thanx, Paul
> 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 81df1b837dd9d..c3bae7a83d792 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >  
> >  	/* Trace depending on how much we were able to accelerate. */
> >  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > -		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
> >  	else
> > -		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));
> >  
> >  	/* Count CBs for tracing. */
> >  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > -- 
> > 2.27.0.111.gc72c7da667-goog
> > 

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-18 22:19   ` Paul E. McKenney
  2020-06-18 23:51     ` Joel Fernandes
@ 2020-06-19  0:01     ` Steven Rostedt
  2020-06-19  1:01       ` Joel Fernandes
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2020-06-19  0:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes (Google),
	linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Uladzislau Rezki (Sony)

On Thu, 18 Jun 2020 15:19:01 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> For future reference, the TPS() around strings is not optional.  Without
> it, trace messages from crash dumps are garbled, if I remember correctly.

When you pass in a string like this, only the pointer to the string is
saved in the ring buffer. User space tools have no idea what those
pointers are. The TPS() around strings maps those pointers to the
string and shows them in the /sys/kernel/tracing/printk_formats file,
such that perf and trace-cmd know how to make sense of those strings.

-- Steve

* Re: [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-18 22:40   ` Paul E. McKenney
@ 2020-06-19  0:01     ` Joel Fernandes
  2020-06-19  0:12       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  0:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 03:40:58PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:54PM -0400, Joel Fernandes (Google) wrote:
> > This is useful to check for any improvements or degradation related to
> > the number of GP kthread wakeups during testing.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> This was a good way to collect the data for your testing, but
> we can expect rcutorture to only do so much.  ;-)

np, I will push this one into a git tag for the next time I need it ;)

thanks,

 - Joel


> 							Thanx, Paul
> 
> > ---
> >  kernel/rcu/Kconfig.debug |  1 +
> >  kernel/rcu/rcu.h         |  2 ++
> >  kernel/rcu/rcutorture.c  | 23 ++++++++++++++++++++++-
> >  kernel/rcu/tree.c        |  7 +++++++
> >  4 files changed, 32 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 3cf6132a4bb9f..3323e3378af5a 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -50,6 +50,7 @@ config RCU_TORTURE_TEST
> >  	select TASKS_RCU
> >  	select TASKS_RUDE_RCU
> >  	select TASKS_TRACE_RCU
> > +	select SCHEDSTATS
> >  	default n
> >  	help
> >  	  This option provides a kernel module that runs torture tests
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index cf66a3ccd7573..7e867e81d9738 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -511,6 +511,7 @@ srcu_batches_completed(struct srcu_struct *sp) { return 0; }
> >  static inline void rcu_force_quiescent_state(void) { }
> >  static inline void show_rcu_gp_kthreads(void) { }
> >  static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
> > +static inline struct task_struct *rcu_get_main_gp_kthread(void) { return NULL; }
> >  static inline void rcu_fwd_progress_check(unsigned long j) { }
> >  #else /* #ifdef CONFIG_TINY_RCU */
> >  bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
> > @@ -519,6 +520,7 @@ unsigned long rcu_exp_batches_completed(void);
> >  unsigned long srcu_batches_completed(struct srcu_struct *sp);
> >  void show_rcu_gp_kthreads(void);
> >  int rcu_get_gp_kthreads_prio(void);
> > +struct task_struct *rcu_get_main_gp_kthread(void);
> >  void rcu_fwd_progress_check(unsigned long j);
> >  void rcu_force_quiescent_state(void);
> >  extern struct workqueue_struct *rcu_gp_wq;
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index d0d265304d147..959a1f84d6904 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/rcupdate_wait.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/sched/signal.h>
> > +#include <linux/sched/stat.h>
> >  #include <uapi/linux/sched/types.h>
> >  #include <linux/atomic.h>
> >  #include <linux/bitops.h>
> > @@ -460,9 +461,29 @@ static void rcu_sync_torture_init(void)
> >  	INIT_LIST_HEAD(&rcu_torture_removed);
> >  }
> >  
> > +unsigned long rcu_gp_nr_wakeups;
> > +
> > +static void rcu_flavor_init(void)
> > +{
> > +	rcu_sync_torture_init();
> > +
> > +	/* Make sure schedstat is enabled for GP thread wakeup count. */
> > +	force_schedstat_enabled();
> > +	rcu_gp_nr_wakeups = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > +}
> > +
> > +static void rcu_flavor_cleanup(void)
> > +{
> > +	unsigned long now_nr = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > +
> > +	pr_alert("End-test: Cleanup: Total GP-kthread wakeups: %lu\n",
> > +		now_nr - rcu_gp_nr_wakeups);
> > +}
> > +
> >  static struct rcu_torture_ops rcu_ops = {
> >  	.ttype		= RCU_FLAVOR,
> > -	.init		= rcu_sync_torture_init,
> > +	.init		= rcu_flavor_init,
> > +	.cleanup	= rcu_flavor_cleanup,
> >  	.readlock	= rcu_torture_read_lock,
> >  	.read_delay	= rcu_read_delay,
> >  	.readunlock	= rcu_torture_read_unlock,
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c3bae7a83d792..a3a175feb310a 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -175,6 +175,13 @@ int rcu_get_gp_kthreads_prio(void)
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
> >  
> > +/* Retrieve RCU's main GP kthread task_struct */
> > +struct task_struct *rcu_get_main_gp_kthread(void)
> > +{
> > +	return rcu_state.gp_kthread;
> > +}
> > +EXPORT_SYMBOL_GPL(rcu_get_main_gp_kthread);
> > +
> >  /*
> >   * Number of grace periods between delays, normalized by the duration of
> >   * the delay.  The longer the delay, the more the grace periods between
> > -- 
> > 2.27.0.111.gc72c7da667-goog
> > 

* Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  2020-06-18 23:09   ` Paul E. McKenney
@ 2020-06-19  0:04     ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  0:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, urezki, Davidlohr Bueso, Ingo Molnar,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Steven Rostedt

On Thu, Jun 18, 2020 at 04:09:34PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
> > 
> > First, this looks like a very nice optimization, thank you!

Thanks!

> > > Cc: urezki@gmail.com
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> As discussed over IRC, I updated the patch as shown below.  Does that
> work for you?

Yes, that works for me. Thanks!

 - Joel


> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Thu Jun 18 16:29:49 2020 -0400
> 
>     rcu/segcblist: Prevent useless GP start if no CBs to accelerate
>     
>     The rcu_segcblist_accelerate() function returns true iff it is necessary
>     to request another grace period.  A tracing session showed that this
>     function unnecessarily requests grace periods.
>     
>     For example, consider the following sequence of events:
>     1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
>     2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
>     3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
>     4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
>     5. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
>     6. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
>        there are no new callbacks.  However, rcu_segcblist_accelerate()
>        nevertheless (uselessly) requests a new grace period N+1.
>     
>     This extra grace period results in additional lock contention and also
>     additional wakeups, all for no good reason.
>     
>     This commit therefore adds a check to rcu_segcblist_accelerate() that
>     prevents the return of true when there are no new callbacks.
>     
>     This change reduces the number of grace periods (GPs) and wakeups in each
>     of eleven five-second rcutorture runs as follows:
>     
>     +----+-------------------+-------------------+
>     | #  | Number of GPs     | Number of Wakeups |
>     +====+=========+=========+=========+=========+
>     | 1  | With    | Without | With    | Without |
>     +----+---------+---------+---------+---------+
>     | 2  |      75 |      89 |     113 |     119 |
>     +----+---------+---------+---------+---------+
>     | 3  |      62 |      91 |     105 |     123 |
>     +----+---------+---------+---------+---------+
>     | 4  |      60 |      79 |      98 |     110 |
>     +----+---------+---------+---------+---------+
>     | 5  |      63 |      79 |      99 |     112 |
>     +----+---------+---------+---------+---------+
>     | 6  |      57 |      89 |      96 |     123 |
>     +----+---------+---------+---------+---------+
>     | 7  |      64 |      85 |      97 |     118 |
>     +----+---------+---------+---------+---------+
>     | 8  |      58 |      83 |      98 |     113 |
>     +----+---------+---------+---------+---------+
>     | 9  |      57 |      77 |      89 |     104 |
>     +----+---------+---------+---------+---------+
>     | 10 |      66 |      82 |      98 |     119 |
>     +----+---------+---------+---------+---------+
>     | 11 |      52 |      82 |      83 |     117 |
>     +----+---------+---------+---------+---------+
>     
>     The reduction in the number of wakeups ranges from 5% to 40%.
>     
>     Cc: urezki@gmail.com
>     [ paulmck: Rework commit log and comment. ]
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f661..2d2a6b6b9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
>  	 * Also advance to the oldest segment of callbacks whose
>  	 * ->gp_seq[] completion is at or after that passed in via "seq",
>  	 * skipping any empty segments.
> +	 *
> +	 * Note that segment "i" (and any lower-numbered segments
> +	 * containing older callbacks) will be unaffected, and their
> +	 * grace-period numbers remain unchanged.  For example, if i ==
> +	 * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
> +	 * Instead, the CBs in NEXT_TAIL will be merged with those in
> +	 * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
> +	 * would be updated.  NEXT_TAIL would then be empty.
>  	 */
> -	if (++i >= RCU_NEXT_TAIL)
> +	if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
>  		return false;
>  
>  	/*

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-18 23:51     ` Joel Fernandes
@ 2020-06-19  0:12       ` Paul E. McKenney
  2020-06-19  0:56         ` Joel Fernandes
  2020-06-19  1:08         ` Joel Fernandes
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-19  0:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > The gp_seq value can come from any of the rdp, rsp or rnp structures.
> > > 
> > > Only the rsp is the global source of truth (most accurate GP info). The
> > > rnp can be off by ~1 and the rdp can be off by much more. Add some
> > > context to the traces to clarify which structure the value comes from.
> > 
> > This would be better done in scripting that processes the trace messages.
> > I must pass on this one.
> 
> I don't think so. I think people reading traces would get confused.  Recently
> I had to talk with someone about this as well when going over traces. I
> myself struggle with it.
> 
> > For future reference, the TPS() around strings is not optional.  Without
> > it, trace messages from crash dumps are garbled, if I remember correctly.
> 
> Ok. I will fix that.

Then please sort this patch after the others.  I am quite unconvinced
about this one, but some of the others looked quite valuable.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 
> > 							Thanx, Paul
> > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  include/trace/events/rcu.h | 12 ++++++++----
> > >  kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
> > >  kernel/rcu/tree_plugin.h   |  2 +-
> > >  3 files changed, 25 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > index a6d49864dcc27..cb5363564f7ed 100644
> > > --- a/include/trace/events/rcu.h
> > > +++ b/include/trace/events/rcu.h
> > > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> > >   */
> > >  TRACE_EVENT_RCU(rcu_grace_period,
> > >  
> > > -	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > > +	TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > > +		unsigned long gp_seq, const char *gpevent),
> > >  
> > > -	TP_ARGS(rcuname, gp_seq, gpevent),
> > > +	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> > >  
> > >  	TP_STRUCT__entry(
> > >  		__field(const char *, rcuname)
> > > +		__field(const char *, gp_seq_src)
> > >  		__field(unsigned long, gp_seq)
> > >  		__field(const char *, gpevent)
> > >  	),
> > >  
> > >  	TP_fast_assign(
> > >  		__entry->rcuname = rcuname;
> > > +		__entry->gp_seq_src = gp_seq_src;
> > >  		__entry->gp_seq = gp_seq;
> > >  		__entry->gpevent = gpevent;
> > >  	),
> > >  
> > > -	TP_printk("%s %lu %s",
> > > -		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > > +	TP_printk("%s %s_gp_seq=%lu %s",
> > > +		  __entry->rcuname, __entry->gp_seq_src,
> > > +		  __entry->gp_seq, __entry->gpevent)
> > >  );
> > >  
> > >  /*
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index c61af6a33fbfd..81df1b837dd9d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> > >  		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> > >  		goto unlock_out;
> > >  	}
> > > -	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > > +	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> > >  	ret = true;  /* Caller must wake GP kthread. */
> > >  unlock_out:
> > >  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> > > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> > >  
> > >  	/* Trace depending on how much we were able to accelerate. */
> > >  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > >  	else
> > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > >  
> > >  	/* Count CBs for tracing. */
> > >  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > >  		if (!offloaded)
> > >  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > >  		rdp->core_needs_qs = false;
> > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> > >  	} else {
> > >  		if (!offloaded)
> > >  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > >  		 * set up to detect a quiescent state, otherwise don't
> > >  		 * go looking for one.
> > >  		 */
> > > -		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > > +		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> > >  		need_qs = !!(rnp->qsmask & rdp->grpmask);
> > >  		rdp->cpu_no_qs.b.norm = need_qs;
> > >  		rdp->core_needs_qs = need_qs;
> > > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> > >  	/* Record GP times before starting GP, hence rcu_seq_start(). */
> > >  	rcu_seq_start(&rcu_state.gp_seq);
> > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> > >  	raw_spin_unlock_irq_rcu_node(rnp);
> > >  
> > >  	/*
> > > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> > >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> > >  				   jiffies + (j ? 3 * j : 2));
> > >  		}
> > > -		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  				       TPS("fqswait"));
> > >  		rcu_state.gp_state = RCU_GP_WAIT_FQS;
> > >  		ret = swait_event_idle_timeout_exclusive(
> > > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> > >  		/* If time for quiescent-state forcing, do it. */
> > >  		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> > >  		    (gf & RCU_GP_FLAG_FQS)) {
> > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  					       TPS("fqsstart"));
> > >  			rcu_gp_fqs(first_gp_fqs);
> > >  			gf = 0;
> > > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> > >  				first_gp_fqs = false;
> > >  				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> > >  			}
> > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  					       TPS("fqsend"));
> > >  			cond_resched_tasks_rcu_qs();
> > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> > >  			cond_resched_tasks_rcu_qs();
> > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > >  			WARN_ON(signal_pending(current));
> > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  					       TPS("fqswaitsig"));
> > >  			ret = 1; /* Keep old FQS timing. */
> > >  			j = jiffies;
> > > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> > >  	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> > >  
> > >  	/* Declare grace period done, trace first to use old GP number. */
> > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> > >  	rcu_seq_end(&rcu_state.gp_seq);
> > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > >  	rcu_state.gp_state = RCU_GP_IDLE;
> > > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> > >  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> > >  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> > >  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > > -		trace_rcu_grace_period(rcu_state.name,
> > > +		trace_rcu_grace_period(rcu_state.name, "rsp",
> > >  				       rcu_state.gp_seq,
> > >  				       TPS("newreq"));
> > >  	} else {
> > > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > >  
> > >  		/* Handle grace-period start. */
> > >  		for (;;) {
> > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  					       TPS("reqwait"));
> > >  			rcu_state.gp_state = RCU_GP_WAIT_GPS;
> > >  			swait_event_idle_exclusive(rcu_state.gp_wq,
> > > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > >  			cond_resched_tasks_rcu_qs();
> > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > >  			WARN_ON(signal_pending(current));
> > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > >  					       TPS("reqwaitsig"));
> > >  		}
> > >  
> > > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> > >  		return 0;
> > >  
> > >  	blkd = !!(rnp->qsmask & rdp->grpmask);
> > > -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > > +	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> > >  			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> > >  	return 0;
> > >  }
> > > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	rdp->core_needs_qs = false;
> > >  	rdp->rcu_iw_pending = false;
> > >  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > > -	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > > +	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >  	rcu_prepare_kthreads(cpu);
> > >  	rcu_spawn_cpu_nocb_kthread(cpu);
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 982fc5be52698..32f761cf16c33 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> > >  {
> > >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> > >  	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > > -		trace_rcu_grace_period(TPS("rcu_preempt"),
> > > +		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> > >  				       __this_cpu_read(rcu_data.gp_seq),
> > >  				       TPS("cpuqs"));
> > >  		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > > -- 
> > > 2.27.0.111.gc72c7da667-goog
> > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-19  0:01     ` Joel Fernandes
@ 2020-06-19  0:12       ` Paul E. McKenney
  2020-06-19  1:00         ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-19  0:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 08:01:56PM -0400, Joel Fernandes wrote:
> On Thu, Jun 18, 2020 at 03:40:58PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:54PM -0400, Joel Fernandes (Google) wrote:
> > > This is useful to check for any improvements or degradation related to
> > > number of GP kthread wakeups during testing.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > This was a good way to collect the data for your testing, but
> > we can expect rcutorture to only do so much.  ;-)
> 
> np, I will push this one into a git tag for the next time I need it ;)

Sounds like a plan!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 							Thanx, Paul
> > 
> > > ---
> > >  kernel/rcu/Kconfig.debug |  1 +
> > >  kernel/rcu/rcu.h         |  2 ++
> > >  kernel/rcu/rcutorture.c  | 23 ++++++++++++++++++++++-
> > >  kernel/rcu/tree.c        |  7 +++++++
> > >  4 files changed, 32 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > > index 3cf6132a4bb9f..3323e3378af5a 100644
> > > --- a/kernel/rcu/Kconfig.debug
> > > +++ b/kernel/rcu/Kconfig.debug
> > > @@ -50,6 +50,7 @@ config RCU_TORTURE_TEST
> > >  	select TASKS_RCU
> > >  	select TASKS_RUDE_RCU
> > >  	select TASKS_TRACE_RCU
> > > +	select SCHEDSTATS
> > >  	default n
> > >  	help
> > >  	  This option provides a kernel module that runs torture tests
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index cf66a3ccd7573..7e867e81d9738 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -511,6 +511,7 @@ srcu_batches_completed(struct srcu_struct *sp) { return 0; }
> > >  static inline void rcu_force_quiescent_state(void) { }
> > >  static inline void show_rcu_gp_kthreads(void) { }
> > >  static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
> > > +static inline struct task_struct *rcu_get_main_gp_kthread(void) { return 0; }
> > >  static inline void rcu_fwd_progress_check(unsigned long j) { }
> > >  #else /* #ifdef CONFIG_TINY_RCU */
> > >  bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
> > > @@ -519,6 +520,7 @@ unsigned long rcu_exp_batches_completed(void);
> > >  unsigned long srcu_batches_completed(struct srcu_struct *sp);
> > >  void show_rcu_gp_kthreads(void);
> > >  int rcu_get_gp_kthreads_prio(void);
> > > +struct task_struct *rcu_get_main_gp_kthread(void);
> > >  void rcu_fwd_progress_check(unsigned long j);
> > >  void rcu_force_quiescent_state(void);
> > >  extern struct workqueue_struct *rcu_gp_wq;
> > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > index d0d265304d147..959a1f84d6904 100644
> > > --- a/kernel/rcu/rcutorture.c
> > > +++ b/kernel/rcu/rcutorture.c
> > > @@ -23,6 +23,7 @@
> > >  #include <linux/rcupdate_wait.h>
> > >  #include <linux/interrupt.h>
> > >  #include <linux/sched/signal.h>
> > > +#include <linux/sched/stat.h>
> > >  #include <uapi/linux/sched/types.h>
> > >  #include <linux/atomic.h>
> > >  #include <linux/bitops.h>
> > > @@ -460,9 +461,29 @@ static void rcu_sync_torture_init(void)
> > >  	INIT_LIST_HEAD(&rcu_torture_removed);
> > >  }
> > >  
> > > +unsigned long rcu_gp_nr_wakeups;
> > > +
> > > +static void rcu_flavor_init(void)
> > > +{
> > > +	rcu_sync_torture_init();
> > > +
> > > +	/* Make sure schedstat is enabled for GP thread wakeup count. */
> > > +	force_schedstat_enabled();
> > > +	rcu_gp_nr_wakeups = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > > +}
> > > +
> > > +static void rcu_flavor_cleanup(void)
> > > +{
> > > +	unsigned long now_nr = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > > +
> > > +	pr_alert("End-test: Cleanup: Total GP-kthread wakeups: %lu\n",
> > > +		now_nr - rcu_gp_nr_wakeups);
> > > +}
> > > +
> > >  static struct rcu_torture_ops rcu_ops = {
> > >  	.ttype		= RCU_FLAVOR,
> > > -	.init		= rcu_sync_torture_init,
> > > +	.init		= rcu_flavor_init,
> > > +	.cleanup	= rcu_flavor_cleanup,
> > >  	.readlock	= rcu_torture_read_lock,
> > >  	.read_delay	= rcu_read_delay,
> > >  	.readunlock	= rcu_torture_read_unlock,
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index c3bae7a83d792..a3a175feb310a 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -175,6 +175,13 @@ int rcu_get_gp_kthreads_prio(void)
> > >  }
> > >  EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
> > >  
> > > +/* Retrieve RCU's main GP kthread task_struct */
> > > +struct task_struct *rcu_get_main_gp_kthread(void)
> > > +{
> > > +	return rcu_state.gp_kthread;
> > > +}
> > > +EXPORT_SYMBOL_GPL(rcu_get_main_gp_kthread);
> > > +
> > >  /*
> > >   * Number of grace periods between delays, normalized by the duration of
> > >   * the delay.  The longer the delay, the more the grace periods between
> > > -- 
> > > 2.27.0.111.gc72c7da667-goog
> > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread
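[Editorial sketch] For readers following the patch above outside the kernel tree: the bookkeeping in rcu_flavor_init()/rcu_flavor_cleanup() just samples the GP kthread's se.statistics.nr_wakeups counter at test start and reports the delta at test end. A minimal userspace analogue is sketched below in Python; it is illustrative only, not kernel code. The field layout mimics the nr_wakeups line that /proc/<pid>/sched exposes when schedstats are enabled, and parse_nr_wakeups()/wakeup_delta() are hypothetical helper names.

```python
import re

def parse_nr_wakeups(sched_text):
    """Extract the nr_wakeups counter from /proc/<pid>/sched-style text.

    Returns the counter as an int, or None if the field is absent
    (e.g. when CONFIG_SCHEDSTATS is off and stats are not exported).
    """
    m = re.search(r"nr_wakeups\s*:\s*(\d+)", sched_text)
    return int(m.group(1)) if m else None

def wakeup_delta(before_text, after_text):
    """Mirror the patch's init/cleanup logic: record the counter at test
    start and report the difference at test end."""
    before = parse_nr_wakeups(before_text)
    after = parse_nr_wakeups(after_text)
    if before is None or after is None:
        return None
    return after - before

# Hypothetical snapshots taken before and after a 5-second rcutorture run:
start = "se.statistics.nr_wakeups : 113\n"
end   = "se.statistics.nr_wakeups : 226\n"
print(wakeup_delta(start, end))  # 113
```

This is the same measurement idea as the patch, just done from userspace instead of via rcu_get_main_gp_kthread().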

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-19  0:12       ` Paul E. McKenney
@ 2020-06-19  0:56         ` Joel Fernandes
  2020-06-19  1:08         ` Joel Fernandes
  1 sibling, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  0:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 05:12:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> > On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > > The gp_seq value can come from either of rdp, rsp or rnp.
> > > > 
> > > > Only the rsp is the global source of truth (most accurate GP info). The
> > > > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > > > context to traces to clarify where it comes from.
> > > 
> > > This would be better done in scripting that processes the trace messages.
> > > I must pass on this one.
> > 
> > I don't think so. I think people reading traces would get confused.  Recently
> > I had to talk with someone about this as well when going over traces. I
> > myself struggle with it.
> > 
> > > For future reference, the TPS() around strings is not optional.  Without
> > > it, trace messages from crash dumps are garbled, if I remember correctly.
> > 
> > Ok. I will fix that.
> 
> Then please sort this patch after the others.  I am quite unconvinced
> about this one, but some of the others looked quite valuable.

Sure sounds good :) will do :) thanks!

 - Joel

> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > > 
> > > 							Thanx, Paul
> > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  include/trace/events/rcu.h | 12 ++++++++----
> > > >  kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
> > > >  kernel/rcu/tree_plugin.h   |  2 +-
> > > >  3 files changed, 25 insertions(+), 21 deletions(-)
> > > > 
> > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > index a6d49864dcc27..cb5363564f7ed 100644
> > > > --- a/include/trace/events/rcu.h
> > > > +++ b/include/trace/events/rcu.h
> > > > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> > > >   */
> > > >  TRACE_EVENT_RCU(rcu_grace_period,
> > > >  
> > > > -	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > > > +	TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > > > +		unsigned long gp_seq, const char *gpevent),
> > > >  
> > > > -	TP_ARGS(rcuname, gp_seq, gpevent),
> > > > +	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> > > >  
> > > >  	TP_STRUCT__entry(
> > > >  		__field(const char *, rcuname)
> > > > +		__field(const char *, gp_seq_src)
> > > >  		__field(unsigned long, gp_seq)
> > > >  		__field(const char *, gpevent)
> > > >  	),
> > > >  
> > > >  	TP_fast_assign(
> > > >  		__entry->rcuname = rcuname;
> > > > +		__entry->gp_seq_src = gp_seq_src;
> > > >  		__entry->gp_seq = gp_seq;
> > > >  		__entry->gpevent = gpevent;
> > > >  	),
> > > >  
> > > > -	TP_printk("%s %lu %s",
> > > > -		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > > > +	TP_printk("%s %s_gp_seq=%lu %s",
> > > > +		  __entry->rcuname, __entry->gp_seq_src,
> > > > +		  __entry->gp_seq, __entry->gpevent)
> > > >  );
> > > >  
> > > >  /*
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index c61af6a33fbfd..81df1b837dd9d 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> > > >  		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> > > >  		goto unlock_out;
> > > >  	}
> > > > -	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> > > >  	ret = true;  /* Caller must wake GP kthread. */
> > > >  unlock_out:
> > > >  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> > > > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  
> > > >  	/* Trace depending on how much we were able to accelerate. */
> > > >  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > > >  	else
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > > >  
> > > >  	/* Count CBs for tracing. */
> > > >  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > > > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  		if (!offloaded)
> > > >  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > > >  		rdp->core_needs_qs = false;
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> > > >  	} else {
> > > >  		if (!offloaded)
> > > >  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > > > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  		 * set up to detect a quiescent state, otherwise don't
> > > >  		 * go looking for one.
> > > >  		 */
> > > > -		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> > > >  		need_qs = !!(rnp->qsmask & rdp->grpmask);
> > > >  		rdp->cpu_no_qs.b.norm = need_qs;
> > > >  		rdp->core_needs_qs = need_qs;
> > > > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> > > >  	/* Record GP times before starting GP, hence rcu_seq_start(). */
> > > >  	rcu_seq_start(&rcu_state.gp_seq);
> > > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> > > >  	raw_spin_unlock_irq_rcu_node(rnp);
> > > >  
> > > >  	/*
> > > > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> > > >  				   jiffies + (j ? 3 * j : 2));
> > > >  		}
> > > > -		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  				       TPS("fqswait"));
> > > >  		rcu_state.gp_state = RCU_GP_WAIT_FQS;
> > > >  		ret = swait_event_idle_timeout_exclusive(
> > > > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  		/* If time for quiescent-state forcing, do it. */
> > > >  		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> > > >  		    (gf & RCU_GP_FLAG_FQS)) {
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqsstart"));
> > > >  			rcu_gp_fqs(first_gp_fqs);
> > > >  			gf = 0;
> > > > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  				first_gp_fqs = false;
> > > >  				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> > > >  			}
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqsend"));
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > >  			WARN_ON(signal_pending(current));
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqswaitsig"));
> > > >  			ret = 1; /* Keep old FQS timing. */
> > > >  			j = jiffies;
> > > > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> > > >  	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> > > >  
> > > >  	/* Declare grace period done, trace first to use old GP number. */
> > > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> > > >  	rcu_seq_end(&rcu_state.gp_seq);
> > > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > >  	rcu_state.gp_state = RCU_GP_IDLE;
> > > > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> > > >  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> > > >  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> > > >  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > > > -		trace_rcu_grace_period(rcu_state.name,
> > > > +		trace_rcu_grace_period(rcu_state.name, "rsp",
> > > >  				       rcu_state.gp_seq,
> > > >  				       TPS("newreq"));
> > > >  	} else {
> > > > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > >  
> > > >  		/* Handle grace-period start. */
> > > >  		for (;;) {
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("reqwait"));
> > > >  			rcu_state.gp_state = RCU_GP_WAIT_GPS;
> > > >  			swait_event_idle_exclusive(rcu_state.gp_wq,
> > > > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > >  			WARN_ON(signal_pending(current));
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("reqwaitsig"));
> > > >  		}
> > > >  
> > > > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> > > >  		return 0;
> > > >  
> > > >  	blkd = !!(rnp->qsmask & rdp->grpmask);
> > > > -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> > > >  			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> > > >  	return 0;
> > > >  }
> > > > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > >  	rdp->core_needs_qs = false;
> > > >  	rdp->rcu_iw_pending = false;
> > > >  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > > > -	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> > > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  	rcu_prepare_kthreads(cpu);
> > > >  	rcu_spawn_cpu_nocb_kthread(cpu);
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 982fc5be52698..32f761cf16c33 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> > > >  {
> > > >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> > > >  	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > > > -		trace_rcu_grace_period(TPS("rcu_preempt"),
> > > > +		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> > > >  				       __this_cpu_read(rcu_data.gp_seq),
> > > >  				       TPS("cpuqs"));
> > > >  		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > > > -- 
> > > > 2.27.0.111.gc72c7da667-goog
> > > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread
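[Editorial sketch] Paul's counter-suggestion above is that the gp_seq source annotation "would be better done in scripting that processes the trace messages." Assuming the output format the patch proposes (lines like "rcu_preempt rsp_gp_seq=4044 newreq"), such a post-processing script might look like the hedged Python sketch below; the line format and helper name are assumptions for illustration, not an existing tool.

```python
import re
from collections import defaultdict

# Matches lines in the patch's proposed TP_printk format, e.g.:
#   rcu_preempt rsp_gp_seq=4044 newreq
LINE_RE = re.compile(r"(\S+)\s+(rsp|rnp|rdp)_gp_seq=(\d+)\s+(\S+)")

def group_by_source(trace_lines):
    """Bucket rcu_grace_period events by which structure supplied gp_seq.

    Per the patch description: rsp is the global source of truth, rnp
    may lag by ~1, and rdp can be off by much more.
    """
    buckets = defaultdict(list)
    for line in trace_lines:
        m = LINE_RE.search(line)
        if m:
            _rcuname, src, gp_seq, event = m.groups()
            buckets[src].append((int(gp_seq), event))
    return dict(buckets)

trace = [
    "rcu_preempt rsp_gp_seq=4044 newreq",
    "rcu_preempt rnp_gp_seq=4043 cpustart",
    "rcu_preempt rdp_gp_seq=4040 cpuend",
]
print(group_by_source(trace))
```

Grouping this way makes the rsp/rnp/rdp skew visible without changing the tracepoint itself, which is the trade-off being debated in the thread.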

* Re: [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-19  0:12       ` Paul E. McKenney
@ 2020-06-19  1:00         ` Joel Fernandes
  2020-06-19  3:23           ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  1:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 05:12:44PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 08:01:56PM -0400, Joel Fernandes wrote:
> > On Thu, Jun 18, 2020 at 03:40:58PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 18, 2020 at 04:29:54PM -0400, Joel Fernandes (Google) wrote:
> > > > This is useful to check for any improvements or degradation related to
> > > > number of GP kthread wakeups during testing.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > This was a good way to collect the data for your testing, but
> > > we can expect rcutorture to only do so much.  ;-)
> > 
> > np, I will push this one into a git tag for the next time I need it ;)
> 
> Sounds like a plan!

In case it is at all an option, I could put this as a statistic under
rcu_torture_stats_print(), so that it all appears on the same line. But I take
it you are not too thrilled to have it for now.

thanks,

 - Joel



> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > > 							Thanx, Paul
> > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-19  0:01     ` Steven Rostedt
@ 2020-06-19  1:01       ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  1:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, linux-kernel, Davidlohr Bueso, Ingo Molnar,
	Josh Triplett, Lai Jiangshan, Marco Elver, Mathieu Desnoyers,
	rcu, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 08:01:38PM -0400, Steven Rostedt wrote:
> On Thu, 18 Jun 2020 15:19:01 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > For future reference, the TPS() around strings is not optional.  Without
> > it, trace messages from crash dumps are garbled, if I remember correctly.
> 
> When you pass in a string like this, only the pointer to the string is
> saved in the ring buffer. User space tools have no idea what those
> pointers are. The TPS() around strings maps those pointers to the
> string and shows them in the /sys/kernel/tracing/printk_formats file,
> such that perf and trace-cmd know how to make sense of those strings.

Makes sense. Quite a valuable feature!

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 28+ messages in thread
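[Editorial sketch] A toy model of the mechanism Steven describes above, assuming nothing beyond the quoted explanation: the ring buffer records only the string's address, and a side table (exported to userspace as /sys/kernel/tracing/printk_formats) lets perf and trace-cmd resolve the address back to text. Here printk_formats is a plain dict standing in for that file, and TPS()/trace_event()/decode() are simplified stand-ins, not the kernel API.

```python
printk_formats = {}  # address -> string, built when trace strings register

def TPS(s):
    """Register a trace string and return its 'address' (here, id())."""
    addr = id(s)
    printk_formats[addr] = s
    return addr

ring_buffer = []

def trace_event(addr):
    # Cheap: the event stores one pointer-sized value, not a string copy.
    ring_buffer.append(addr)

def decode(buffer):
    """What userspace tools do: resolve pointers via printk_formats."""
    return [printk_formats.get(a, "<unknown>") for a in buffer]

start = TPS("start")
trace_event(start)
print(decode(ring_buffer))  # ['start']
```

An address recorded without a TPS()-style registration would decode to "<unknown>", which is the garbling Paul recalls seeing in crash-dump traces.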

* Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq
  2020-06-19  0:12       ` Paul E. McKenney
  2020-06-19  0:56         ` Joel Fernandes
@ 2020-06-19  1:08         ` Joel Fernandes
  1 sibling, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2020-06-19  1:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 05:12:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> > On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > > The gp_seq value can come from either of rdp, rsp or rnp.
> > > > 
> > > > Only the rsp is the global source of truth (most accurate GP info). The
> > > > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > > > context to traces to clarify where it comes from.
> > > 
> > > This would be better done in scripting that processes the trace messages.
> > > I must pass on this one.
> > 
> > I don't think so. I think people reading traces would get confused.  Recently
> > I had to talk with someone about this as well when going over traces. I
> > myself struggle with it.
> > 
> > > For future reference, the TPS() around strings is not optional.  Without
> > > it, trace messages from crash dumps are garbled, if I remember correctly.
> > 
> > Ok. I will fix that.
> 
> Then please sort this patch after the others.  I am quite unconvinced
> about this one, but some of the others looked quite valuable.

To summarize what's left for me to do right now:

1 and 7 were pulled.
2 - requires the per-cblist counter I'll work on in the near future.
3 - have to add TPS()
4 - The negative number fixes which I'll resend.
5 - changing to gp_seq_req in this tracepoint.
6 - dropping for now and pushing it to a git tag.

So on top of rcu/dev I'll rework 3 and 5, and resend 4.

Please do consider again if we can find a place for 6 :) thanks!

thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > > 
> > > 							Thanx, Paul
> > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  include/trace/events/rcu.h | 12 ++++++++----
> > > >  kernel/rcu/tree.c          | 32 ++++++++++++++++----------------
> > > >  kernel/rcu/tree_plugin.h   |  2 +-
> > > >  3 files changed, 25 insertions(+), 21 deletions(-)
> > > > 
> > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > index a6d49864dcc27..cb5363564f7ed 100644
> > > > --- a/include/trace/events/rcu.h
> > > > +++ b/include/trace/events/rcu.h
> > > > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> > > >   */
> > > >  TRACE_EVENT_RCU(rcu_grace_period,
> > > >  
> > > > -	TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > > > +	TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > > > +		unsigned long gp_seq, const char *gpevent),
> > > >  
> > > > -	TP_ARGS(rcuname, gp_seq, gpevent),
> > > > +	TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> > > >  
> > > >  	TP_STRUCT__entry(
> > > >  		__field(const char *, rcuname)
> > > > +		__field(const char *, gp_seq_src)
> > > >  		__field(unsigned long, gp_seq)
> > > >  		__field(const char *, gpevent)
> > > >  	),
> > > >  
> > > >  	TP_fast_assign(
> > > >  		__entry->rcuname = rcuname;
> > > > +		__entry->gp_seq_src = gp_seq_src;
> > > >  		__entry->gp_seq = gp_seq;
> > > >  		__entry->gpevent = gpevent;
> > > >  	),
> > > >  
> > > > -	TP_printk("%s %lu %s",
> > > > -		  __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > > > +	TP_printk("%s %s_gp_seq=%lu %s",
> > > > +		  __entry->rcuname, __entry->gp_seq_src,
> > > > +		  __entry->gp_seq, __entry->gpevent)
> > > >  );
> > > >  
> > > >  /*
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index c61af6a33fbfd..81df1b837dd9d 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> > > >  		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> > > >  		goto unlock_out;
> > > >  	}
> > > > -	trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> > > >  	ret = true;  /* Caller must wake GP kthread. */
> > > >  unlock_out:
> > > >  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> > > > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  
> > > >  	/* Trace depending on how much we were able to accelerate. */
> > > >  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > > >  	else
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > > >  
> > > >  	/* Count CBs for tracing. */
> > > >  	rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > > > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  		if (!offloaded)
> > > >  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > > >  		rdp->core_needs_qs = false;
> > > > -		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> > > >  	} else {
> > > >  		if (!offloaded)
> > > >  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > > > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >  		 * set up to detect a quiescent state, otherwise don't
> > > >  		 * go looking for one.
> > > >  		 */
> > > > -		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > > > +		trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> > > >  		need_qs = !!(rnp->qsmask & rdp->grpmask);
> > > >  		rdp->cpu_no_qs.b.norm = need_qs;
> > > >  		rdp->core_needs_qs = need_qs;
> > > > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> > > >  	/* Record GP times before starting GP, hence rcu_seq_start(). */
> > > >  	rcu_seq_start(&rcu_state.gp_seq);
> > > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> > > >  	raw_spin_unlock_irq_rcu_node(rnp);
> > > >  
> > > >  	/*
> > > > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> > > >  				   jiffies + (j ? 3 * j : 2));
> > > >  		}
> > > > -		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +		trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  				       TPS("fqswait"));
> > > >  		rcu_state.gp_state = RCU_GP_WAIT_FQS;
> > > >  		ret = swait_event_idle_timeout_exclusive(
> > > > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  		/* If time for quiescent-state forcing, do it. */
> > > >  		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> > > >  		    (gf & RCU_GP_FLAG_FQS)) {
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqsstart"));
> > > >  			rcu_gp_fqs(first_gp_fqs);
> > > >  			gf = 0;
> > > > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  				first_gp_fqs = false;
> > > >  				gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> > > >  			}
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqsend"));
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > >  			WARN_ON(signal_pending(current));
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("fqswaitsig"));
> > > >  			ret = 1; /* Keep old FQS timing. */
> > > >  			j = jiffies;
> > > > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> > > >  	raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> > > >  
> > > >  	/* Declare grace period done, trace first to use old GP number. */
> > > > -	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> > > >  	rcu_seq_end(&rcu_state.gp_seq);
> > > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > >  	rcu_state.gp_state = RCU_GP_IDLE;
> > > > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> > > >  	if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> > > >  		WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> > > >  		WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > > > -		trace_rcu_grace_period(rcu_state.name,
> > > > +		trace_rcu_grace_period(rcu_state.name, "rsp",
> > > >  				       rcu_state.gp_seq,
> > > >  				       TPS("newreq"));
> > > >  	} else {
> > > > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > >  
> > > >  		/* Handle grace-period start. */
> > > >  		for (;;) {
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("reqwait"));
> > > >  			rcu_state.gp_state = RCU_GP_WAIT_GPS;
> > > >  			swait_event_idle_exclusive(rcu_state.gp_wq,
> > > > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > >  			cond_resched_tasks_rcu_qs();
> > > >  			WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > >  			WARN_ON(signal_pending(current));
> > > > -			trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > +			trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > >  					       TPS("reqwaitsig"));
> > > >  		}
> > > >  
> > > > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> > > >  		return 0;
> > > >  
> > > >  	blkd = !!(rnp->qsmask & rdp->grpmask);
> > > > -	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > > > +	trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> > > >  			       blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> > > >  	return 0;
> > > >  }
> > > > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > >  	rdp->core_needs_qs = false;
> > > >  	rdp->rcu_iw_pending = false;
> > > >  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > > > -	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > > > +	trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> > > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  	rcu_prepare_kthreads(cpu);
> > > >  	rcu_spawn_cpu_nocb_kthread(cpu);
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 982fc5be52698..32f761cf16c33 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> > > >  {
> > > >  	RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> > > >  	if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > > > -		trace_rcu_grace_period(TPS("rcu_preempt"),
> > > > +		trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> > > >  				       __this_cpu_read(rcu_data.gp_seq),
> > > >  				       TPS("cpuqs"));
> > > >  		__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > > > -- 
> > > > 2.27.0.111.gc72c7da667-goog
> > > > 


* Re: [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread
  2020-06-19  1:00         ` Joel Fernandes
@ 2020-06-19  3:23           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2020-06-19  3:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Davidlohr Bueso, Ingo Molnar, Josh Triplett,
	Lai Jiangshan, Marco Elver, Mathieu Desnoyers, rcu,
	Steven Rostedt, Uladzislau Rezki (Sony)

On Thu, Jun 18, 2020 at 09:00:12PM -0400, Joel Fernandes wrote:
> On Thu, Jun 18, 2020 at 05:12:44PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 08:01:56PM -0400, Joel Fernandes wrote:
> > > On Thu, Jun 18, 2020 at 03:40:58PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 18, 2020 at 04:29:54PM -0400, Joel Fernandes (Google) wrote:
> > > > > This is useful to check for any improvements or degradation related to
> > > > > number of GP kthread wakeups during testing.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > This was a good way to collect the data for your testing, but
> > > > we can expect rcutorture to only do so much.  ;-)
> > > 
> > > np, I will push this one into a git tag for the next time I need it ;)
> > 
> > Sounds like a plan!
> 
> In case it is at all an option, I could put this as a statistic under
> rcu_torture_stats_print(), that way it all is on the same line. But I take it
> you are not too thrilled to have it for now.

The thing is that rcutorture is a stress test, not a performance or
energy-efficiency test.  So rcutorture will continue to wake things
up just to brutalize RCU, which makes such a statistic less helpful.

It might be something for rcuperf/rcuscale, although it is not clear that
a pure performance/scalability test would give a useful read on wakeups.
I would expect better results from a more random real-world workload.

But I bet that people already have this sort of thing set up.  For
example, doesn't the kbuild test robot track such statistics?

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> 
> > 							Thanx, Paul
> > 
> > > thanks,
> > > 
> > >  - Joel
> > > 
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > ---
> > > > >  kernel/rcu/Kconfig.debug |  1 +
> > > > >  kernel/rcu/rcu.h         |  2 ++
> > > > >  kernel/rcu/rcutorture.c  | 23 ++++++++++++++++++++++-
> > > > >  kernel/rcu/tree.c        |  7 +++++++
> > > > >  4 files changed, 32 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > > > > index 3cf6132a4bb9f..3323e3378af5a 100644
> > > > > --- a/kernel/rcu/Kconfig.debug
> > > > > +++ b/kernel/rcu/Kconfig.debug
> > > > > @@ -50,6 +50,7 @@ config RCU_TORTURE_TEST
> > > > >  	select TASKS_RCU
> > > > >  	select TASKS_RUDE_RCU
> > > > >  	select TASKS_TRACE_RCU
> > > > > +	select SCHEDSTATS
> > > > >  	default n
> > > > >  	help
> > > > >  	  This option provides a kernel module that runs torture tests
> > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > index cf66a3ccd7573..7e867e81d9738 100644
> > > > > --- a/kernel/rcu/rcu.h
> > > > > +++ b/kernel/rcu/rcu.h
> > > > > @@ -511,6 +511,7 @@ srcu_batches_completed(struct srcu_struct *sp) { return 0; }
> > > > >  static inline void rcu_force_quiescent_state(void) { }
> > > > >  static inline void show_rcu_gp_kthreads(void) { }
> > > > >  static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
> > > > > +static inline struct task_struct *rcu_get_main_gp_kthread(void) { return 0; }
> > > > >  static inline void rcu_fwd_progress_check(unsigned long j) { }
> > > > >  #else /* #ifdef CONFIG_TINY_RCU */
> > > > >  bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
> > > > > @@ -519,6 +520,7 @@ unsigned long rcu_exp_batches_completed(void);
> > > > >  unsigned long srcu_batches_completed(struct srcu_struct *sp);
> > > > >  void show_rcu_gp_kthreads(void);
> > > > >  int rcu_get_gp_kthreads_prio(void);
> > > > > +struct task_struct *rcu_get_main_gp_kthread(void);
> > > > >  void rcu_fwd_progress_check(unsigned long j);
> > > > >  void rcu_force_quiescent_state(void);
> > > > >  extern struct workqueue_struct *rcu_gp_wq;
> > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > > > index d0d265304d147..959a1f84d6904 100644
> > > > > --- a/kernel/rcu/rcutorture.c
> > > > > +++ b/kernel/rcu/rcutorture.c
> > > > > @@ -23,6 +23,7 @@
> > > > >  #include <linux/rcupdate_wait.h>
> > > > >  #include <linux/interrupt.h>
> > > > >  #include <linux/sched/signal.h>
> > > > > +#include <linux/sched/stat.h>
> > > > >  #include <uapi/linux/sched/types.h>
> > > > >  #include <linux/atomic.h>
> > > > >  #include <linux/bitops.h>
> > > > > @@ -460,9 +461,29 @@ static void rcu_sync_torture_init(void)
> > > > >  	INIT_LIST_HEAD(&rcu_torture_removed);
> > > > >  }
> > > > >  
> > > > > +unsigned long rcu_gp_nr_wakeups;
> > > > > +
> > > > > +static void rcu_flavor_init(void)
> > > > > +{
> > > > > +	rcu_sync_torture_init();
> > > > > +
> > > > > +	/* Make sure schedstat is enabled for GP thread wakeup count. */
> > > > > +	force_schedstat_enabled();
> > > > > +	rcu_gp_nr_wakeups = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > > > > +}
> > > > > +
> > > > > +static void rcu_flavor_cleanup(void)
> > > > > +{
> > > > > +	unsigned long now_nr = rcu_get_main_gp_kthread()->se.statistics.nr_wakeups;
> > > > > +
> > > > > +	pr_alert("End-test: Cleanup: Total GP-kthread wakeups: %lu\n",
> > > > > +		now_nr - rcu_gp_nr_wakeups);
> > > > > +}
> > > > > +
> > > > >  static struct rcu_torture_ops rcu_ops = {
> > > > >  	.ttype		= RCU_FLAVOR,
> > > > > -	.init		= rcu_sync_torture_init,
> > > > > +	.init		= rcu_flavor_init,
> > > > > +	.cleanup	= rcu_flavor_cleanup,
> > > > >  	.readlock	= rcu_torture_read_lock,
> > > > >  	.read_delay	= rcu_read_delay,
> > > > >  	.readunlock	= rcu_torture_read_unlock,
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index c3bae7a83d792..a3a175feb310a 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -175,6 +175,13 @@ int rcu_get_gp_kthreads_prio(void)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
> > > > >  
> > > > > +/* Retrieve RCU's main GP kthread task_struct */
> > > > > +struct task_struct *rcu_get_main_gp_kthread(void)
> > > > > +{
> > > > > +	return rcu_state.gp_kthread;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(rcu_get_main_gp_kthread);
> > > > > +
> > > > >  /*
> > > > >   * Number of grace periods between delays, normalized by the duration of
> > > > >   * the delay.  The longer the delay, the more the grace periods between
> > > > > -- 
> > > > > 2.27.0.111.gc72c7da667-goog
> > > > > 



Thread overview: 28+ messages
2020-06-18 20:29 [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Joel Fernandes (Google)
2020-06-18 20:29 ` [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes Joel Fernandes (Google)
2020-06-18 22:16   ` Paul E. McKenney
2020-06-18 23:52     ` Joel Fernandes
2020-06-18 20:29 ` [PATCH 3/7] rcu/trace: Add name of the source for gp_seq Joel Fernandes (Google)
2020-06-18 22:19   ` Paul E. McKenney
2020-06-18 23:51     ` Joel Fernandes
2020-06-19  0:12       ` Paul E. McKenney
2020-06-19  0:56         ` Joel Fernandes
2020-06-19  1:08         ` Joel Fernandes
2020-06-19  0:01     ` Steven Rostedt
2020-06-19  1:01       ` Joel Fernandes
2020-06-18 20:29 ` [PATCH 4/7] rcu/trace: Print negative GP numbers correctly Joel Fernandes (Google)
2020-06-18 22:30   ` Paul E. McKenney
2020-06-18 20:29 ` [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint Joel Fernandes (Google)
2020-06-18 22:27   ` Paul E. McKenney
2020-06-18 23:54     ` Joel Fernandes
2020-06-18 20:29 ` [PATCH 6/7] rcutorture: Add support to get the number of wakeups of main GP kthread Joel Fernandes (Google)
2020-06-18 22:40   ` Paul E. McKenney
2020-06-19  0:01     ` Joel Fernandes
2020-06-19  0:12       ` Paul E. McKenney
2020-06-19  1:00         ` Joel Fernandes
2020-06-19  3:23           ` Paul E. McKenney
2020-06-18 20:29 ` [PATCH 7/7] rcutorture: Add number of GP information to reports Joel Fernandes (Google)
2020-06-18 23:27   ` Paul E. McKenney
2020-06-18 22:11 ` [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate Paul E. McKenney
2020-06-18 23:09   ` Paul E. McKenney
2020-06-19  0:04     ` Joel Fernandes
