* [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev
@ 2018-05-14  3:15 Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
                   ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

Hi,
Here are some fixes, clean-ups, and code-comment changes, mostly for
the new funnel locking, the gp_seq changes, and some tracing. It's
based on the latest rcu/dev branch.

thanks,

- Joel

Joel Fernandes (Google) (8):
  rcu: Add comment documenting how rcu_seq_snap works
  rcu: Clarify usage of cond_resched for tasks-RCU
  rcu: Add back the cpuend tracepoint
  rcu: Get rid of old c variable from places in tree RCU
  rcu: Use rcu_node as temporary variable in funnel locking loop
  rcu: Add back the Startedleaf tracepoint
  rcu: trace CleanupMore condition only if needed
  rcu: Fix cpustart tracepoint gp_seq number

 include/linux/rcupdate.h   | 11 +++--
 include/trace/events/rcu.h | 19 ++++----
 kernel/rcu/rcu.h           | 24 +++++++++-
 kernel/rcu/tree.c          | 92 +++++++++++++++++++++++---------------
 4 files changed, 97 insertions(+), 49 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14  3:47   ` Randy Dunlap
  2018-05-14 17:38   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google)
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

rcu_seq_snap may be tricky for someone looking at it for the first time.
Let's document how it works with an example to make it easier.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 003671825d62..fc3170914ac7 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
 	WRITE_ONCE(*sp, rcu_seq_endval(sp));
 }
 
-/* Take a snapshot of the update side's sequence number. */
+/*
+ * Take a snapshot of the update side's sequence number.
+ *
+ * This function predicts what the grace period number will be the next
+ * time an RCU callback will be executed, given the current grace period's
+ * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
+ * already in progress.
+ *
+ * We do this with a single addition and masking.
+ * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
+ * the seq is used to track if a GP is in progress or not, its sufficient if we
+ * add (2+1) and mask with ~1. Let's see why with an example:
+ *
+ * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
+ * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
+ * to account for the state bit. However, if the current seq is 7 (gp is 3 and
+ * state bit is 1), then it means the current grace period is already in
+ * progress so the next time the callback will run is at the end of grace
+ * period number gp+2. To account for the extra +1, we just overflow the LSB by
+ * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
+ * is idle), then the addition of the extra 0x1 and masking will have no
+ * effect. This is calculated as below.
+ */
 static inline unsigned long rcu_seq_snap(unsigned long *sp)
 {
 	unsigned long s;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread
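
For readers new to the gp_seq encoding, the arithmetic described in the
comment above can be checked with a small standalone program. This is
only a sketch of the calculation using the comment's hypothetical
one-state-bit layout (RCU_SEQ_STATE_MASK = 1); it is not the kernel
implementation, and SEQ_STATE_MASK and seq_snap() below are made-up
names used purely for illustration.

#include <assert.h>
#include <stdio.h>

/* Hypothetical one-state-bit layout from the comment's example. */
#define SEQ_STATE_MASK 0x1UL

/* Predict the seq value at which a callback queued now is safe to run. */
static unsigned long seq_snap(unsigned long seq)
{
	/* Add one full GP (0b10) plus the state mask, then clear the state bit. */
	return (seq + (SEQ_STATE_MASK + 1) + SEQ_STATE_MASK) & ~SEQ_STATE_MASK;
}

int main(void)
{
	assert(seq_snap(6) == 8);  /* seq 6: gp 3, idle -> wait for gp 4 (gp+1). */
	assert(seq_snap(7) == 10); /* seq 7: gp 3, GP running -> wait for gp 5 (gp+2). */
	printf("snap(6)=%lu snap(7)=%lu\n", seq_snap(6), seq_snap(7));
	return 0;
}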

* [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 14:54   ` Steven Rostedt
  2018-05-14  3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google)
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

Recently we had a discussion about cond_resched unconditionally
recording a voluntary context switch [1].

Let's add a comment clarifying how this API is to be used.

[1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcupdate.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 743226176350..a9881007ece6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
 	} while (0)
 
 /*
- * Note a voluntary context switch for RCU-tasks benefit.  This is a
- * macro rather than an inline function to avoid #include hell.
+ * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
+ *
+ * This is called even in situations where a context switch didn't really
+ * happen even though it was requested. The caller uses it to indicate
+ * traversal of an RCU-tasks quiescent state. This is a macro rather than an
+ * inline function to avoid #include hell.
  */
 #ifdef CONFIG_TASKS_RCU
 #define rcu_note_voluntary_context_switch_lite(t) \
@@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 /**
- * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
+ * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU.
+ * The quiescent state report is made even if cond_resched() did nothing.
  *
  * This macro resembles cond_resched(), except that it is defined to
  * report potential quiescent states to RCU-tasks even if the cond_resched()
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread
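
As background for the comment being discussed here, a minimal sketch of
the intended usage pattern follows: a long-running kernel loop that
reports an RCU-tasks quiescent state at each voluntary preemption
point, whether or not a context switch actually happens.
cond_resched_tasks_rcu_qs() is the real macro touched by this patch;
example_work_loop(), more_work() and process_one_item() are
hypothetical names used only to make the sketch self-contained.

#include <linux/types.h>
#include <linux/rcupdate.h>

/* Hypothetical helpers, declared only so the sketch is complete. */
extern bool more_work(void);
extern void process_one_item(void);

static void example_work_loop(void)
{
	while (more_work()) {
		process_one_item();
		/*
		 * Voluntary preemption point: report an RCU-tasks quiescent
		 * state here even if cond_resched() ends up doing nothing.
		 */
		cond_resched_tasks_rcu_qs();
	}
}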

* [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 18:12   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq")
removed the cpuend grace period trace point. This patch adds it back.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9ad931bff409..29ccc60bdbfc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1774,10 +1774,12 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 
 	/* Handle the ends of any preceding grace periods first. */
 	if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
-	    unlikely(READ_ONCE(rdp->gpwrap)))
+	    unlikely(READ_ONCE(rdp->gpwrap))) {
 		ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */
-	else
+		trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend"));
+	} else {
 		ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */
+	}
 
 	/* Now handle the beginnings of any new-to-this-CPU grace periods. */
 	if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) ||
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2018-05-14  3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 17:57   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

The 'c' variable was previously used to store the grace period that is
being requested. However, it is not very meaningful to a code reader,
so this patch replaces it with gp_seq_start, indicating that this is
the grace period that was requested. The tracing is also updated with
the new name.

Just a clean up patch, no logical change.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 15 ++++++------
 kernel/rcu/tree.c          | 47 ++++++++++++++++++++++----------------
 2 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index ce9d1a1cac78..539900a9f8c7 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -103,15 +103,16 @@ TRACE_EVENT(rcu_grace_period,
  */
 TRACE_EVENT(rcu_future_grace_period,
 
-	TP_PROTO(const char *rcuname, unsigned long gp_seq, unsigned long c,
-		 u8 level, int grplo, int grphi, const char *gpevent),
+	TP_PROTO(const char *rcuname, unsigned long gp_seq,
+		 unsigned long gp_seq_start, u8 level, int grplo, int grphi,
+		 const char *gpevent),
 
-	TP_ARGS(rcuname, gp_seq, c, level, grplo, grphi, gpevent),
+	TP_ARGS(rcuname, gp_seq, gp_seq_start, level, grplo, grphi, gpevent),
 
 	TP_STRUCT__entry(
 		__field(const char *, rcuname)
 		__field(unsigned long, gp_seq)
-		__field(unsigned long, c)
+		__field(unsigned long, gp_seq_start)
 		__field(u8, level)
 		__field(int, grplo)
 		__field(int, grphi)
@@ -121,7 +122,7 @@ TRACE_EVENT(rcu_future_grace_period,
 	TP_fast_assign(
 		__entry->rcuname = rcuname;
 		__entry->gp_seq = gp_seq;
-		__entry->c = c;
+		__entry->gp_seq_start = gp_seq_start;
 		__entry->level = level;
 		__entry->grplo = grplo;
 		__entry->grphi = grphi;
@@ -129,7 +130,7 @@ TRACE_EVENT(rcu_future_grace_period,
 	),
 
 	TP_printk("%s %lu %lu %u %d %d %s",
-		  __entry->rcuname, __entry->gp_seq, __entry->c, __entry->level,
+		  __entry->rcuname, __entry->gp_seq, __entry->gp_seq_start, __entry->level,
 		  __entry->grplo, __entry->grphi, __entry->gpevent)
 );
 
@@ -751,7 +752,7 @@ TRACE_EVENT(rcu_barrier,
 #else /* #ifdef CONFIG_RCU_TRACE */
 
 #define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0)
-#define trace_rcu_future_grace_period(rcuname, gp_seq, c, \
+#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_start, \
 				      level, grplo, grphi, event) \
 				      do { } while (0)
 #define trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 29ccc60bdbfc..9f5679ba413b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1541,13 +1541,18 @@ void rcu_cpu_stall_reset(void)
 
 /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
 static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
-			      unsigned long c, const char *s)
+			      unsigned long gp_seq_start, const char *s)
 {
-	trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, c,
+	trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_start,
 				      rnp->level, rnp->grplo, rnp->grphi, s);
 }
 
 /*
+ * rcu_start_this_gp - Request the start of a particular grace period
+ * @rnp: The leaf node of the CPU from which to start.
+ * @rdp: The rcu_data corresponding to the CPU from which to start.
+ * @gp_seq_start: The gp_seq of the grace period to start.
+ *
  * Start the specified grace period, as needed to handle newly arrived
  * callbacks.  The required future grace periods are recorded in each
  * rcu_node structure's ->gp_seq_needed field.  Returns true if there
@@ -1555,9 +1560,11 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  *
  * The caller must hold the specified rcu_node structure's ->lock, which
  * is why the caller is responsible for waking the grace-period kthread.
+ *
+ * Returns true if the GP thread needs to be awakened else false.
  */
 static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
-			      unsigned long c)
+			      unsigned long gp_seq_start)
 {
 	bool ret = false;
 	struct rcu_state *rsp = rdp->rsp;
@@ -1573,18 +1580,19 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * not be released.
 	 */
 	raw_lockdep_assert_held_rcu_node(rnp);
-	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
+	trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
 	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
-		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
-		    rcu_seq_done(&rnp_root->gp_seq, c) ||
+		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) ||
+		    rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) ||
 		    (rnp != rnp_root &&
 		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
-			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
+			trace_rcu_this_gp(rnp_root, rdp, gp_seq_start,
+					  TPS("Prestarted"));
 			goto unlock_out;
 		}
-		rnp_root->gp_seq_needed = c;
+		rnp_root->gp_seq_needed = gp_seq_start;
 		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
 			goto unlock_out;
 		if (rnp_root != rnp && rnp_root->parent != NULL)
@@ -1595,14 +1603,14 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 
 	/* If GP already in progress, just leave, otherwise start one. */
 	if (rcu_gp_in_progress(rsp)) {
-		trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot"));
+		trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedleafroot"));
 		goto unlock_out;
 	}
-	trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot"));
+	trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedroot"));
 	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
 	rsp->gp_req_activity = jiffies;
 	if (!rsp->gp_kthread) {
-		trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread"));
+		trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
 	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
@@ -1611,9 +1619,9 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	if (rnp != rnp_root)
 		raw_spin_unlock_rcu_node(rnp_root);
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
-	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c)) {
-		rnp->gp_seq_needed = c;
-		rdp->gp_seq_needed = c;
+	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) {
+		rnp->gp_seq_needed = gp_seq_start;
+		rdp->gp_seq_needed = gp_seq_start;
 	}
 	return ret;
 }
@@ -1624,14 +1632,13 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  */
 static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
-	unsigned long c = rnp->gp_seq;
 	bool needmore;
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
 
 	needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed);
 	if (!needmore)
 		rnp->gp_seq_needed = rnp->gp_seq; /* Avoid counter wrap. */
-	trace_rcu_this_gp(rnp, rdp, c,
+	trace_rcu_this_gp(rnp, rdp, rnp->gp_seq,
 			  needmore ? TPS("CleanupMore") : TPS("Cleanup"));
 	return needmore;
 }
@@ -1667,7 +1674,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
 			       struct rcu_data *rdp)
 {
-	unsigned long c;
+	unsigned long gp_seq_start;
 	bool ret = false;
 
 	raw_lockdep_assert_held_rcu_node(rnp);
@@ -1686,9 +1693,9 @@ static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
 	 * accelerating callback invocation to an earlier grace-period
 	 * number.
 	 */
-	c = rcu_seq_snap(&rsp->gp_seq);
-	if (rcu_segcblist_accelerate(&rdp->cblist, c))
-		ret = rcu_start_this_gp(rnp, rdp, c);
+	gp_seq_start = rcu_seq_snap(&rsp->gp_seq);
+	if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_start))
+		ret = rcu_start_this_gp(rnp, rdp, gp_seq_start);
 
 	/* Trace depending on how much we were able to accelerate. */
 	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2018-05-14  3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 18:00   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

The funnel locking loop in rcu_start_this_gp uses rcu_root as a
temporary variable while walking the combining tree. This forces a
code reader into the tiresome exercise of reminding themselves that
rcu_root may not be the root. Let's just call it rcu_node, and then,
finally, when rcu_node is the rcu_root, assign it at that time.

Just a clean up patch, no logical change.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9f5679ba413b..40670047d22c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1568,7 +1568,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 {
 	bool ret = false;
 	struct rcu_state *rsp = rdp->rsp;
-	struct rcu_node *rnp_root;
+	struct rcu_node *rnp_node, *rnp_root = NULL;
 
 	/*
 	 * Use funnel locking to either acquire the root rcu_node
@@ -1581,24 +1581,26 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 */
 	raw_lockdep_assert_held_rcu_node(rnp);
 	trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
-	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
-		if (rnp_root != rnp)
-			raw_spin_lock_rcu_node(rnp_root);
-		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) ||
-		    rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) ||
-		    (rnp != rnp_root &&
-		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
-			trace_rcu_this_gp(rnp_root, rdp, gp_seq_start,
+	for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
+		if (rnp_node != rnp)
+			raw_spin_lock_rcu_node(rnp_node);
+		if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) ||
+		    rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) ||
+		    (rnp != rnp_node &&
+		     rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) {
+			trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
 					  TPS("Prestarted"));
 			goto unlock_out;
 		}
-		rnp_root->gp_seq_needed = gp_seq_start;
+		rnp_node->gp_seq_needed = gp_seq_start;
 		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
 			goto unlock_out;
-		if (rnp_root != rnp && rnp_root->parent != NULL)
-			raw_spin_unlock_rcu_node(rnp_root);
-		if (!rnp_root->parent)
+		if (rnp_node != rnp && rnp_node->parent != NULL)
+			raw_spin_unlock_rcu_node(rnp_node);
+		if (!rnp_node->parent) {
+			rnp_root = rnp_node;
 			break;  /* At root, and perhaps also leaf. */
+		}
 	}
 
 	/* If GP already in progress, just leave, otherwise start one. */
@@ -1616,10 +1618,10 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
 	ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
-	if (rnp != rnp_root)
-		raw_spin_unlock_rcu_node(rnp_root);
+	if (rnp != rnp_node)
+		raw_spin_unlock_rcu_node(rnp_node);
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
-	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) {
+	if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) {
 		rnp->gp_seq_needed = gp_seq_start;
 		rdp->gp_seq_needed = gp_seq_start;
 	}
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread
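
The funnel-locking idea behind this loop can also be seen in a toy
userspace model, shown below as a hedged sketch: walk from a starting
node toward the root, taking one node lock at a time, and bail out as
soon as some node already records an equal-or-later request. This is
not the kernel code; struct toy_node, toy_start_this_gp() and the
pthread locking are stand-ins chosen only to illustrate the shape of
the walk, and the leaf-side "GP already running" check is elided.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_node {
	pthread_mutex_t lock;
	unsigned long gp_seq_needed;	/* furthest grace period requested here */
	struct toy_node *parent;
};

/* Returns true if the walk reached the root, i.e. the request was pushed all the way up. */
static bool toy_start_this_gp(struct toy_node *start, unsigned long gp_seq_req)
{
	struct toy_node *node;
	bool reached_root = false;

	pthread_mutex_lock(&start->lock);
	for (node = start; ; node = node->parent) {
		if (node != start)
			pthread_mutex_lock(&node->lock);
		if (node->gp_seq_needed >= gp_seq_req) {
			/* "Prestarted": someone already requested this GP or a later one. */
			if (node != start)
				pthread_mutex_unlock(&node->lock);
			break;
		}
		node->gp_seq_needed = gp_seq_req;
		if (!node->parent) {
			reached_root = true;	/* only here is node known to be the root */
			if (node != start)
				pthread_mutex_unlock(&node->lock);
			break;
		}
		if (node != start)
			pthread_mutex_unlock(&node->lock);
	}
	pthread_mutex_unlock(&start->lock);
	return reached_root;
}

int main(void)
{
	static struct toy_node root = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };
	static struct toy_node leaf = { PTHREAD_MUTEX_INITIALIZER, 0, &root };

	printf("first request reached root: %d\n", toy_start_this_gp(&leaf, 8));
	printf("repeat request reached root: %d\n", toy_start_this_gp(&leaf, 8));
	return 0;
}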

* [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2018-05-14  3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 18:38   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google)
  2018-05-14  3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

In a recent discussion [1], the check for whether a leaf believes RCU
is not idle is being added back to the funnel locking code to avoid
more locking. In this case we mark the leaf node for a future grace
period and bail out since a GP is currently in progress. However, the
tracepoint is missing. Let's add it back.

Also add a small comment about why we do this check (basically, the
point is to avoid locking intermediate nodes unnecessarily) and
clarify the comments in the trace event header now that we may
traverse one or more intermediate nodes.

[1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h |  4 ++--
 kernel/rcu/tree.c          | 11 ++++++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 539900a9f8c7..dc0bd11739c7 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
  *
  * "Startleaf": Request a grace period based on leaf-node data.
  * "Prestarted": Someone beat us to the request
- * "Startedleaf": Leaf-node start proved sufficient.
- * "Startedleafroot": Leaf-node start proved sufficient after checking root.
+ * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
+ * "Startedleafroot": all non-root nodes from leaf to root marked for future start.
  * "Startedroot": Requested a nocb grace period based on root-node data.
  * "NoGPkthread": The RCU grace-period kthread has not yet started.
  * "StartWait": Start waiting for the requested grace period.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 40670047d22c..8401a253e7de 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1593,8 +1593,17 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 			goto unlock_out;
 		}
 		rnp_node->gp_seq_needed = gp_seq_start;
-		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
+
+		/*
+		 * Check if leaf believes a GP is in progress, if yes we can
+		 * bail and avoid more locking. We have already marked the leaf.
+		 */
+		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
+			trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
+					  TPS("Startedleaf"));
 			goto unlock_out;
+		}
+
 		if (rnp_node != rnp && rnp_node->parent != NULL)
 			raw_spin_unlock_rcu_node(rnp_node);
 		if (!rnp_node->parent) {
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
                   ` (5 preceding siblings ...)
  2018-05-14  3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 19:20   ` Paul E. McKenney
  2018-05-14  3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google)
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

Currently the tree RCU cleanup code records a CleanupMore trace event
even if a GP was already in progress. This makes CleanupMore show up
twice for no reason. Avoid it.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8401a253e7de..25c44328d071 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2083,7 +2083,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	rsp->gp_state = RCU_GP_IDLE;
 	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(rsp->rda);
-	if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
+	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
 		trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
 				  TPS("CleanupMore"));
 		needgp = true;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number
  2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
                   ` (6 preceding siblings ...)
  2018-05-14  3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google)
@ 2018-05-14  3:15 ` Joel Fernandes (Google)
  2018-05-14 20:33   ` Paul E. McKenney
  7 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes (Google) @ 2018-05-14  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

The cpustart tracepoint shows a stale gp_seq. This is because
rdp->gp_seq is updated only at the end of the __note_gp_changes
function. I believe we can't update rdp->gp_seq any earlier, so let's
just use the gp_seq from rnp instead.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 25c44328d071..58d2b68f8b98 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1807,7 +1807,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		 * set up to detect a quiescent state, otherwise don't
 		 * go looking for one.
 		 */
-		trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart"));
+		trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart"));
 		need_gp = !!(rnp->qsmask & rdp->grpmask);
 		rdp->cpu_no_qs.b.norm = need_gp;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr);
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
@ 2018-05-14  3:47   ` Randy Dunlap
  2018-05-14  5:05     ` Joel Fernandes
  2018-05-14 17:38   ` Paul E. McKenney
  1 sibling, 1 reply; 42+ messages in thread
From: Randy Dunlap @ 2018-05-14  3:47 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

On 05/13/2018 08:15 PM, Joel Fernandes (Google) wrote:
> rcu_seq_snap may be tricky for someone looking at it for the first time.
> Lets document how it works with an example to make it easier.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 003671825d62..fc3170914ac7 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
>  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
>  }
>  
> -/* Take a snapshot of the update side's sequence number. */
> +/*
> + * Take a snapshot of the update side's sequence number.
> + *
> + * This function predicts what the grace period number will be the next
> + * time an RCU callback will be executed, given the current grace period's
> + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> + * already in progress.
> + *
> + * We do this with a single addition and masking.
> + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> + * the seq is used to track if a GP is in progress or not, its sufficient if we

                                                              it's

> + * add (2+1) and mask with ~1. Let's see why with an example:
> + *
> + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> + * state bit is 1), then it means the current grace period is already in
> + * progress so the next time the callback will run is at the end of grace
> + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> + * is idle), then the addition of the extra 0x1 and masking will have no
> + * effect. This is calculated as below.
> + */
>  static inline unsigned long rcu_seq_snap(unsigned long *sp)
>  {
>  	unsigned long s;
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:47   ` Randy Dunlap
@ 2018-05-14  5:05     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-14  5:05 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:47:24PM -0700, Randy Dunlap wrote:
> On 05/13/2018 08:15 PM, Joel Fernandes (Google) wrote:
> > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > Lets document how it works with an example to make it easier.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 003671825d62..fc3170914ac7 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> >  }
> >  
> > -/* Take a snapshot of the update side's sequence number. */
> > +/*
> > + * Take a snapshot of the update side's sequence number.
> > + *
> > + * This function predicts what the grace period number will be the next
> > + * time an RCU callback will be executed, given the current grace period's
> > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > + * already in progress.
> > + *
> > + * We do this with a single addition and masking.
> > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> 
>                                                               it's

Pardon my english. Fixed, thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU
  2018-05-14  3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google)
@ 2018-05-14 14:54   ` Steven Rostedt
  2018-05-14 17:22     ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Steven Rostedt @ 2018-05-14 14:54 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, 13 May 2018 20:15:35 -0700
"Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> Recently we had a discussion about cond_resched unconditionally
> recording a voluntary context switch [1].
> 
> Lets add a comment clarifying that how this API is to be used.
> 
> [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/linux/rcupdate.h | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 743226176350..a9881007ece6 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
>  	} while (0)
>  
>  /*
> - * Note a voluntary context switch for RCU-tasks benefit.  This is a
> - * macro rather than an inline function to avoid #include hell.
> + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
> + *
> + * This is called even in situations where a context switch didn't really
> + * happen even though it was requested. The caller uses it to indicate
> + * traversal of an RCU-tasks quiescent state. This is a macro rather than an
> + * inline function to avoid #include hell.

I don't know. I just don't like the wording. It sounds too much like
it was written by someone that was confused for it being called when a
context switch didn't occur ;-)

What about something more like:

/*
 * This is called to denote a RCU-task quiescent state. It is placed at
 * voluntary preemption points, as RCU-task critical sections may not
 * perform voluntary preemption or scheduling calls. It does not matter
 * if the task is scheduled out or not, just that a voluntary preemption
 * may be done.
 */

>   */
>  #ifdef CONFIG_TASKS_RCU
>  #define rcu_note_voluntary_context_switch_lite(t) \
> @@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { }
>  #endif /* #else #ifdef CONFIG_TASKS_RCU */
>  
>  /**
> - * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
> + * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU.
> + * The quiescent state report is made even if cond_resched() did nothing.

Same thing here.

 * The quiescent state report does not depend on cond_resched() scheduling.

-- Steve


>   *
>   * This macro resembles cond_resched(), except that it is defined to
>   * report potential quiescent states to RCU-tasks even if the cond_resched()

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU
  2018-05-14 14:54   ` Steven Rostedt
@ 2018-05-14 17:22     ` Paul E. McKenney
  2018-05-15  0:35       ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 17:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes (Google),
	linux-kernel, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	byungchul.park, kernel-team

On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote:
> On Sun, 13 May 2018 20:15:35 -0700
> "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> 
> > Recently we had a discussion about cond_resched unconditionally
> > recording a voluntary context switch [1].
> > 
> > Lets add a comment clarifying that how this API is to be used.
> > 
> > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  include/linux/rcupdate.h | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 743226176350..a9881007ece6 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
> >  	} while (0)
> >  
> >  /*
> > - * Note a voluntary context switch for RCU-tasks benefit.  This is a
> > - * macro rather than an inline function to avoid #include hell.
> > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
> > + *
> > + * This is called even in situations where a context switch didn't really
> > + * happen even though it was requested. The caller uses it to indicate
> > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an
> > + * inline function to avoid #include hell.
> 
> I don't know. I just don't like the wording. It sounds too much like
> it was written by someone that was confused for it being called when a
> context switch didn't occur ;-)
> 
> What about something more like:
> 
> /*
>  * This is called to denote a RCU-task quiescent state. It is placed at
>  * voluntary preemption points, as RCU-task critical sections may not
>  * perform voluntary preemption or scheduling calls. It does not matter
>  * if the task is scheduled out or not, just that a voluntary preemption
>  * may be done.
>  */

s/RCU-task/RCU-tasks/ and I am good with this.

							Thanx, Paul

> >   */
> >  #ifdef CONFIG_TASKS_RCU
> >  #define rcu_note_voluntary_context_switch_lite(t) \
> > @@ -187,7 +191,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> >  #endif /* #else #ifdef CONFIG_TASKS_RCU */
> >  
> >  /**
> > - * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
> > + * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU.
> > + * The quiescent state report is made even if cond_resched() did nothing.
> 
> Same thing here.
> 
>  * The quiescent state report does not depend on cond_resched() scheduling.
> 
> -- Steve
> 
> 
> >   *
> >   * This macro resembles cond_resched(), except that it is defined to
> >   * report potential quiescent states to RCU-tasks even if the cond_resched()
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
  2018-05-14  3:47   ` Randy Dunlap
@ 2018-05-14 17:38   ` Paul E. McKenney
  2018-05-15  1:51     ` Joel Fernandes
  1 sibling, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 17:38 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> rcu_seq_snap may be tricky for someone looking at it for the first time.
> Lets document how it works with an example to make it easier.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 003671825d62..fc3170914ac7 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
>  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
>  }
> 
> -/* Take a snapshot of the update side's sequence number. */
> +/*
> + * Take a snapshot of the update side's sequence number.
> + *
> + * This function predicts what the grace period number will be the next
> + * time an RCU callback will be executed, given the current grace period's
> + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> + * already in progress.

How about something like this?

	This function returns the earliest value of the grace-period
	sequence number that will indicate that a full grace period has
	elapsed since the current time.  Once the grace-period sequence
	number has reached this value, it will be safe to invoke all
	callbacks that have been registered prior to the current time.
	This value is the current grace-period number plus two to the
	power of the number of low-order bits reserved for state, then
	rounded up to the next value in which the state bits are all zero.

> + *
> + * We do this with a single addition and masking.

Please either fold this sentence into rest of the paragraph or add a
blank line after it.

> + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> + * the seq is used to track if a GP is in progress or not, its sufficient if we
> + * add (2+1) and mask with ~1. Let's see why with an example:
> + *
> + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> + * state bit is 1), then it means the current grace period is already in
> + * progress so the next time the callback will run is at the end of grace
> + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> + * is idle), then the addition of the extra 0x1 and masking will have no
> + * effect. This is calculated as below.
> + */

Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
since that is the current value.  One alternative (or perhaps addition)
is to have a short table of numbers showing the mapping from *sp to the
return value.  (I started from such a table when writing this function,
for whatever that is worth.)

						Thanx, Paul

>  static inline unsigned long rcu_seq_snap(unsigned long *sp)
>  {
>  	unsigned long s;
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread
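
For reference, one possible form of the short table Paul suggests is
sketched below. It assumes the current two state bits
(RCU_SEQ_STATE_MASK = 3, so one grace period advances the counter by 4)
and follows the rounding he describes; the numbers are illustrative and
not taken from kernel sources.

  *sp  (gp, state)         rcu_seq_snap() return  (meaning)
   0   (gp 0, idle)         4                     (end of gp 1, i.e. gp+1)
   1   (gp 0, GP running)   8                     (end of gp 2, i.e. gp+2)
   4   (gp 1, idle)         8                     (end of gp 2)
   5   (gp 1, GP running)  12                     (end of gp 3)
   8   (gp 2, idle)        12                     (end of gp 3)
   9   (gp 2, GP running)  16                     (end of gp 4)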

* Re: [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU
  2018-05-14  3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google)
@ 2018-05-14 17:57   ` Paul E. McKenney
  2018-05-15  0:41     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 17:57 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:37PM -0700, Joel Fernandes (Google) wrote:
> The 'c' variable was used previously to store the grace period
> that is being requested. However it is not very meaningful for
> a code reader, this patch replaces it with gp_seq_start indicating that

Good catch, but how about "gp_seq_req" instead of gp_seq_start?

							Thanx, Paul

> this is the grace period that was requested. Also updating tracing with
> the new name.
> 
> Just a clean up patch, no logical change.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/trace/events/rcu.h | 15 ++++++------
>  kernel/rcu/tree.c          | 47 ++++++++++++++++++++++----------------
>  2 files changed, 35 insertions(+), 27 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index ce9d1a1cac78..539900a9f8c7 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -103,15 +103,16 @@ TRACE_EVENT(rcu_grace_period,
>   */
>  TRACE_EVENT(rcu_future_grace_period,
> 
> -	TP_PROTO(const char *rcuname, unsigned long gp_seq, unsigned long c,
> -		 u8 level, int grplo, int grphi, const char *gpevent),
> +	TP_PROTO(const char *rcuname, unsigned long gp_seq,
> +		 unsigned long gp_seq_start, u8 level, int grplo, int grphi,
> +		 const char *gpevent),
> 
> -	TP_ARGS(rcuname, gp_seq, c, level, grplo, grphi, gpevent),
> +	TP_ARGS(rcuname, gp_seq, gp_seq_start, level, grplo, grphi, gpevent),
> 
>  	TP_STRUCT__entry(
>  		__field(const char *, rcuname)
>  		__field(unsigned long, gp_seq)
> -		__field(unsigned long, c)
> +		__field(unsigned long, gp_seq_start)
>  		__field(u8, level)
>  		__field(int, grplo)
>  		__field(int, grphi)
> @@ -121,7 +122,7 @@ TRACE_EVENT(rcu_future_grace_period,
>  	TP_fast_assign(
>  		__entry->rcuname = rcuname;
>  		__entry->gp_seq = gp_seq;
> -		__entry->c = c;
> +		__entry->gp_seq_start = gp_seq_start;
>  		__entry->level = level;
>  		__entry->grplo = grplo;
>  		__entry->grphi = grphi;
> @@ -129,7 +130,7 @@ TRACE_EVENT(rcu_future_grace_period,
>  	),
> 
>  	TP_printk("%s %lu %lu %u %d %d %s",
> -		  __entry->rcuname, __entry->gp_seq, __entry->c, __entry->level,
> +		  __entry->rcuname, __entry->gp_seq, __entry->gp_seq_start, __entry->level,
>  		  __entry->grplo, __entry->grphi, __entry->gpevent)
>  );
> 
> @@ -751,7 +752,7 @@ TRACE_EVENT(rcu_barrier,
>  #else /* #ifdef CONFIG_RCU_TRACE */
> 
>  #define trace_rcu_grace_period(rcuname, gp_seq, gpevent) do { } while (0)
> -#define trace_rcu_future_grace_period(rcuname, gp_seq, c, \
> +#define trace_rcu_future_grace_period(rcuname, gp_seq, gp_seq_start, \
>  				      level, grplo, grphi, event) \
>  				      do { } while (0)
>  #define trace_rcu_grace_period_init(rcuname, gp_seq, level, grplo, grphi, \
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 29ccc60bdbfc..9f5679ba413b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1541,13 +1541,18 @@ void rcu_cpu_stall_reset(void)
> 
>  /* Trace-event wrapper function for trace_rcu_future_grace_period.  */
>  static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> -			      unsigned long c, const char *s)
> +			      unsigned long gp_seq_start, const char *s)
>  {
> -	trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, c,
> +	trace_rcu_future_grace_period(rdp->rsp->name, rnp->gp_seq, gp_seq_start,
>  				      rnp->level, rnp->grplo, rnp->grphi, s);
>  }
> 
>  /*
> + * rcu_start_this_gp - Request the start of a particular grace period
> + * @rnp: The leaf node of the CPU from which to start.
> + * @rdp: The rcu_data corresponding to the CPU from which to start.
> + * @gp_seq_start: The gp_seq of the grace period to start.
> + *
>   * Start the specified grace period, as needed to handle newly arrived
>   * callbacks.  The required future grace periods are recorded in each
>   * rcu_node structure's ->gp_seq_needed field.  Returns true if there
> @@ -1555,9 +1560,11 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>   *
>   * The caller must hold the specified rcu_node structure's ->lock, which
>   * is why the caller is responsible for waking the grace-period kthread.
> + *
> + * Returns true if the GP thread needs to be awakened else false.
>   */
>  static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> -			      unsigned long c)
> +			      unsigned long gp_seq_start)
>  {
>  	bool ret = false;
>  	struct rcu_state *rsp = rdp->rsp;
> @@ -1573,18 +1580,19 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * not be released.
>  	 */
>  	raw_lockdep_assert_held_rcu_node(rnp);
> -	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> +	trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
>  	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
>  		if (rnp_root != rnp)
>  			raw_spin_lock_rcu_node(rnp_root);
> -		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
> -		    rcu_seq_done(&rnp_root->gp_seq, c) ||
> +		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) ||
> +		    rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) ||
>  		    (rnp != rnp_root &&
>  		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
> -			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> +			trace_rcu_this_gp(rnp_root, rdp, gp_seq_start,
> +					  TPS("Prestarted"));
>  			goto unlock_out;
>  		}
> -		rnp_root->gp_seq_needed = c;
> +		rnp_root->gp_seq_needed = gp_seq_start;
>  		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
>  			goto unlock_out;
>  		if (rnp_root != rnp && rnp_root->parent != NULL)
> @@ -1595,14 +1603,14 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> 
>  	/* If GP already in progress, just leave, otherwise start one. */
>  	if (rcu_gp_in_progress(rsp)) {
> -		trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedleafroot"));
> +		trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedleafroot"));
>  		goto unlock_out;
>  	}
> -	trace_rcu_this_gp(rnp_root, rdp, c, TPS("Startedroot"));
> +	trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("Startedroot"));
>  	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
>  	rsp->gp_req_activity = jiffies;
>  	if (!rsp->gp_kthread) {
> -		trace_rcu_this_gp(rnp_root, rdp, c, TPS("NoGPkthread"));
> +		trace_rcu_this_gp(rnp_root, rdp, gp_seq_start, TPS("NoGPkthread"));
>  		goto unlock_out;
>  	}
>  	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
> @@ -1611,9 +1619,9 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	if (rnp != rnp_root)
>  		raw_spin_unlock_rcu_node(rnp_root);
>  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> -	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c)) {
> -		rnp->gp_seq_needed = c;
> -		rdp->gp_seq_needed = c;
> +	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) {
> +		rnp->gp_seq_needed = gp_seq_start;
> +		rdp->gp_seq_needed = gp_seq_start;
>  	}
>  	return ret;
>  }
> @@ -1624,14 +1632,13 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>   */
>  static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
>  {
> -	unsigned long c = rnp->gp_seq;
>  	bool needmore;
>  	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
> 
>  	needmore = ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed);
>  	if (!needmore)
>  		rnp->gp_seq_needed = rnp->gp_seq; /* Avoid counter wrap. */
> -	trace_rcu_this_gp(rnp, rdp, c,
> +	trace_rcu_this_gp(rnp, rdp, rnp->gp_seq,
>  			  needmore ? TPS("CleanupMore") : TPS("Cleanup"));
>  	return needmore;
>  }
> @@ -1667,7 +1674,7 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
>  static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
>  			       struct rcu_data *rdp)
>  {
> -	unsigned long c;
> +	unsigned long gp_seq_start;
>  	bool ret = false;
> 
>  	raw_lockdep_assert_held_rcu_node(rnp);
> @@ -1686,9 +1693,9 @@ static bool rcu_accelerate_cbs(struct rcu_state *rsp, struct rcu_node *rnp,
>  	 * accelerating callback invocation to an earlier grace-period
>  	 * number.
>  	 */
> -	c = rcu_seq_snap(&rsp->gp_seq);
> -	if (rcu_segcblist_accelerate(&rdp->cblist, c))
> -		ret = rcu_start_this_gp(rnp, rdp, c);
> +	gp_seq_start = rcu_seq_snap(&rsp->gp_seq);
> +	if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_start))
> +		ret = rcu_start_this_gp(rnp, rdp, gp_seq_start);
> 
>  	/* Trace depending on how much we were able to accelerate. */
>  	if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop
  2018-05-14  3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google)
@ 2018-05-14 18:00   ` Paul E. McKenney
  2018-05-15  0:43     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 18:00 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:38PM -0700, Joel Fernandes (Google) wrote:
> The funnel locking loop in rcu_start_this_gp uses rcu_root as a
> temporary variable while walking the combining tree. This causes a
> tiresome exercise of a code reader reminding themselves that rcu_root
> may not be root. Lets just call it rcu_node, and then finally when
> rcu_node is the rcu_root, lets assign it at that time.
> 
> Just a clean up patch, no logical change.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I agree that my names could use some improvement, but given that you
called it rnp_node in the patch and rcu_node in the commit log, I would
argue that rnp_node has a Hamming-distance problem.  ;-)

How about rnp_start for the formal parameter, rnp for the cursor running
up the tree, and retaining rnp_root for the root?

I considered and rejected the thought of rnp_leaf for the formal
parameter because this function is not necessarily called with a leaf
rcu_node structure.

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 34 ++++++++++++++++++----------------
>  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9f5679ba413b..40670047d22c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1568,7 +1568,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  {
>  	bool ret = false;
>  	struct rcu_state *rsp = rdp->rsp;
> -	struct rcu_node *rnp_root;
> +	struct rcu_node *rnp_node, *rnp_root = NULL;
> 
>  	/*
>  	 * Use funnel locking to either acquire the root rcu_node
> @@ -1581,24 +1581,26 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 */
>  	raw_lockdep_assert_held_rcu_node(rnp);
>  	trace_rcu_this_gp(rnp, rdp, gp_seq_start, TPS("Startleaf"));
> -	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> -		if (rnp_root != rnp)
> -			raw_spin_lock_rcu_node(rnp_root);
> -		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start) ||
> -		    rcu_seq_done(&rnp_root->gp_seq, gp_seq_start) ||
> -		    (rnp != rnp_root &&
> -		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
> -			trace_rcu_this_gp(rnp_root, rdp, gp_seq_start,
> +	for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
> +		if (rnp_node != rnp)
> +			raw_spin_lock_rcu_node(rnp_node);
> +		if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) ||
> +		    rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) ||
> +		    (rnp != rnp_node &&
> +		     rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) {
> +			trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
>  					  TPS("Prestarted"));
>  			goto unlock_out;
>  		}
> -		rnp_root->gp_seq_needed = gp_seq_start;
> +		rnp_node->gp_seq_needed = gp_seq_start;
>  		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
>  			goto unlock_out;
> -		if (rnp_root != rnp && rnp_root->parent != NULL)
> -			raw_spin_unlock_rcu_node(rnp_root);
> -		if (!rnp_root->parent)
> +		if (rnp_node != rnp && rnp_node->parent != NULL)
> +			raw_spin_unlock_rcu_node(rnp_node);
> +		if (!rnp_node->parent) {
> +			rnp_root = rnp_node;
>  			break;  /* At root, and perhaps also leaf. */
> +		}
>  	}
> 
>  	/* If GP already in progress, just leave, otherwise start one. */
> @@ -1616,10 +1618,10 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
>  	ret = true;  /* Caller must wake GP kthread. */
>  unlock_out:
> -	if (rnp != rnp_root)
> -		raw_spin_unlock_rcu_node(rnp_root);
> +	if (rnp != rnp_node)
> +		raw_spin_unlock_rcu_node(rnp_node);
>  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> -	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_start)) {
> +	if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start)) {
>  		rnp->gp_seq_needed = gp_seq_start;
>  		rdp->gp_seq_needed = gp_seq_start;
>  	}
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint
  2018-05-14  3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google)
@ 2018-05-14 18:12   ` Paul E. McKenney
  2018-05-15  0:43     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 18:12 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:36PM -0700, Joel Fernandes (Google) wrote:
> Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq")
> removed the cpuend grace period trace point. This patch adds it back.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Good catch!!!  I queued this to be squashed into the offending commit,
with attribution.

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9ad931bff409..29ccc60bdbfc 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1774,10 +1774,12 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
> 
>  	/* Handle the ends of any preceding grace periods first. */
>  	if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
> -	    unlikely(READ_ONCE(rdp->gpwrap)))
> +	    unlikely(READ_ONCE(rdp->gpwrap))) {
>  		ret = rcu_advance_cbs(rsp, rnp, rdp); /* Advance callbacks. */
> -	else
> +		trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpuend"));
> +	} else {
>  		ret = rcu_accelerate_cbs(rsp, rnp, rdp); /* Recent callbacks. */
> +	}
> 
>  	/* Now handle the beginnings of any new-to-this-CPU grace periods. */
>  	if (rcu_seq_new_gp(rdp->gp_seq, rnp->gp_seq) ||
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-14  3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google)
@ 2018-05-14 18:38   ` Paul E. McKenney
  2018-05-15  0:57     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 18:38 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> In recent discussion [1], the check for whether a leaf believes RCU is
> not idle, is being added back to funnel locking code, to avoid more
> locking. In this we are marking the leaf node for a future grace-period
> and bailing out since a GP is currently in progress. However the
> tracepoint is missing. Lets add it back.
> 
> Also add a small comment about why we do this check (basically the point
> is to avoid locking intermediate nodes unnecessarily) and clarify the
> comments in the trace event header now that we are doing traversal of
> one or more intermediate nodes.
> 
> [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Looks like a good idea, but it does not apply -- which is not a surprise,
given the change rate in this code.  I hand-applied as a modification
to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
with attribution, but with the changes below.  Please let me know if I
am missing something.

Ah, I see -- this commit depends on your earlier name-change commit.
I therefore made this patch use the old names.

> ---
>  include/trace/events/rcu.h |  4 ++--
>  kernel/rcu/tree.c          | 11 ++++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 539900a9f8c7..dc0bd11739c7 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
>   *
>   * "Startleaf": Request a grace period based on leaf-node data.
>   * "Prestarted": Someone beat us to the request
> - * "Startedleaf": Leaf-node start proved sufficient.
> - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.

Actually, we only get to that trace if all we did was mark the leaf
node, right?

> + * "Startedleafroot": all non-root nodes from leaf to root marked for future start.

I got rid of the "non-root" part, given that we had to have marked
the root to break out of the loop.

							Thanx, Paul

>   * "Startedroot": Requested a nocb grace period based on root-node data.
>   * "NoGPkthread": The RCU grace-period kthread has not yet started.
>   * "StartWait": Start waiting for the requested grace period.
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 40670047d22c..8401a253e7de 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1593,8 +1593,17 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  			goto unlock_out;
>  		}
>  		rnp_node->gp_seq_needed = gp_seq_start;
> -		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))
> +
> +		/*
> +		 * Check if leaf believes a GP is in progress, if yes we can
> +		 * bail and avoid more locking. We have already marked the leaf.
> +		 */
> +		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
> +			trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
> +					  TPS("Startedleaf"));
>  			goto unlock_out;
> +		}
> +
>  		if (rnp_node != rnp && rnp_node->parent != NULL)
>  			raw_spin_unlock_rcu_node(rnp_node);
>  		if (!rnp_node->parent) {
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed
  2018-05-14  3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google)
@ 2018-05-14 19:20   ` Paul E. McKenney
  2018-05-15  1:01     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 19:20 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote:
> Currently the tree RCU clean up code records a CleanupMore trace event
> even if the GP was already in progress. This makes CleanupMore show up
> twice for no reason. Avoid it.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Good catch, and I applied this patch.  I did rework the commit log
a bit, so please look it over to make sure I didn't mess it up.

							Thanx, Paul

------------------------------------------------------------------------

commit 52c4e689efd975f5383895b1bc1b91bc90fdd372
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sun May 13 20:15:40 2018 -0700

    rcu: Produce last "CleanupMore" trace only if late-breaking request
    
    Currently the tree RCU clean-up code records a "CleanupMore" trace
    event in response to late-arriving grace-period requests even if the
    grace period was already requested. This makes "CleanupMore" show up an
    extra time (in addition to once for each rcu_node structure that was
    previously marked with the request) for no good reason.  This commit
    therefore avoids emitting this trace message unless the only if the only
    request for this next grace period arrived during or after the cleanup
    scan of the rcu_node structures.
    
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8063a0478870..de6447dd73de 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	rsp->gp_state = RCU_GP_IDLE;
 	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(rsp->rda);
-	if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
+	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
 		trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
 				  TPS("CleanupMore"));
 		needgp = true;

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number
  2018-05-14  3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google)
@ 2018-05-14 20:33   ` Paul E. McKenney
  2018-05-15  1:02     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-14 20:33 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Sun, May 13, 2018 at 08:15:41PM -0700, Joel Fernandes (Google) wrote:
> cpustart shows a stale gp_seq. This is because rdp->gp_seq is updated
> only at the end of the __note_gp_changes function. For this reason, use
> rnp->gp_seq instead. I believe we can't update rdp->gp_seq too early so
> lets just use the gp_seq from rnp instead.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I took this one with the usual commit-log update.  Please take a look!

							Thanx, Paul

------------------------------------------------------------------------

commit 3e14c6a5225d201776edd9a96d9c9b4435855446
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sun May 13 20:15:41 2018 -0700

    rcu: Fix cpustart tracepoint gp_seq number
    
    The "cpustart" trace event shows a stale gp_seq. This is because it uses
    rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
    function. This commit therefore instead uses rnp->gp_seq.
    
    An alternative fix would be to update rdp->gp_seq earlier, but this would
    break RCU's detection of the beginning of a new-to-this-CPU grace period.
    
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index de6447dd73de..23b855f5c5cb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1796,7 +1796,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
 		 * set up to detect a quiescent state, otherwise don't
 		 * go looking for one.
 		 */
-		trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart"));
+		trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart"));
 		need_gp = !!(rnp->qsmask & rdp->grpmask);
 		rdp->cpu_no_qs.b.norm = need_gp;
 		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr);

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU
  2018-05-14 17:22     ` Paul E. McKenney
@ 2018-05-15  0:35       ` Joel Fernandes
  2018-05-15  3:42         ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  0:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 10:22:05AM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote:
> > On Sun, 13 May 2018 20:15:35 -0700
> > "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > 
> > > Recently we had a discussion about cond_resched unconditionally
> > > recording a voluntary context switch [1].
> > > 
> > > Lets add a comment clarifying that how this API is to be used.
> > > 
> > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  include/linux/rcupdate.h | 11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 743226176350..a9881007ece6 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
> > >  	} while (0)
> > >  
> > >  /*
> > > - * Note a voluntary context switch for RCU-tasks benefit.  This is a
> > > - * macro rather than an inline function to avoid #include hell.
> > > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
> > > + *
> > > + * This is called even in situations where a context switch didn't really
> > > + * happen even though it was requested. The caller uses it to indicate
> > > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an
> > > + * inline function to avoid #include hell.
> > 
> > I don't know. I just don't like the wording. It sounds too much like
> > it was written by someone that was confused for it being called when a
> > context switch didn't occur ;-)

True :)

> > 
> > What about something more like:
> > 
> > /*
> >  * This is called to denote a RCU-task quiescent state. It is placed at
> >  * voluntary preemption points, as RCU-task critical sections may not
> >  * perform voluntary preemption or scheduling calls. It does not matter
> >  * if the task is scheduled out or not, just that a voluntary preemption
> >  * may be done.
> >  */
> 
> s/RCU-task/RCU-tasks/ and I am good with this.

Ok. I like Steve's comment better too.

Btw, I see you just posted a change of the macro name from
rcu_note_voluntary_context_switch_lite to rcu_tasks_qs which actually in
itself is much more descriptive. Considering this, I feel the new name is
quite self-documenting in itself. So I am more inclined to drop this patch in
any series reposting, but let me know if you feel otherwise.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU
  2018-05-14 17:57   ` Paul E. McKenney
@ 2018-05-15  0:41     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  0:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:37PM -0700, Joel Fernandes (Google) wrote:
> > The 'c' variable was used previously to store the grace period
> > that is being requested. However it is not very meaningful for
> > a code reader, this patch replaces it with gp_seq_start indicating that
> 
> Good catch, but how about "gp_seq_req" instead of gp_seq_start?

Yes, that works as well and is also two characters shorter. Will change it.

thanks,

- Joel
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop
  2018-05-14 18:00   ` Paul E. McKenney
@ 2018-05-15  0:43     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  0:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 11:00:27AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:38PM -0700, Joel Fernandes (Google) wrote:
> > The funnel locking loop in rcu_start_this_gp uses rcu_root as a
> > temporary variable while walking the combining tree. This causes a
> > tiresome exercise of a code reader reminding themselves that rcu_root
> > may not be root. Lets just call it rcu_node, and then finally when
> > rcu_node is the rcu_root, lets assign it at that time.
> > 
> > Just a clean up patch, no logical change.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> I agree that my names could use some improvement, but given that you
> called it rnp_node in the patch and rcu_node in the commit log, I would
> argue that rnp_node has a Hamming-distance problem.  ;-)
> 
> How about rnp_start for the formal parameter, rnp for the cursor running
> up the tree, and retaining rnp_root for the root?

Ok, that's fine with me. Probably easier to read too. I will rewrite it
accordingly.

thanks,

- Joel
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint
  2018-05-14 18:12   ` Paul E. McKenney
@ 2018-05-15  0:43     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  0:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 11:12:27AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:36PM -0700, Joel Fernandes (Google) wrote:
> > Commit be4b8beed87d ("rcu: Move RCU's grace-period-change code to ->gp_seq")
> > removed the cpuend grace period trace point. This patch adds it back.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Good catch!!!  I queued this to be squashed into the offending commit,
> with attribution.

Cool, thanks!

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-14 18:38   ` Paul E. McKenney
@ 2018-05-15  0:57     ` Joel Fernandes
  2018-05-15  3:46       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  0:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> > In recent discussion [1], the check for whether a leaf believes RCU is
> > not idle, is being added back to funnel locking code, to avoid more
> > locking. In this we are marking the leaf node for a future grace-period
> > and bailing out since a GP is currently in progress. However the
> > tracepoint is missing. Lets add it back.
> > 
> > Also add a small comment about why we do this check (basically the point
> > is to avoid locking intermediate nodes unnecessarily) and clarify the
> > comments in the trace event header now that we are doing traversal of
> > one or more intermediate nodes.
> > 
> > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Looks like a good idea, but it does not apply -- which is not a surprise,
> given the change rate in this code.  I hand-applied as a modification
> to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
> with attribution, but with the changes below.  Please let me know if I
> am missing something.
> 
> Ah, I see -- this commit depends on your earlier name-change commit.
> I therefore made this patch use the old names.

Ok, I'll check your new tree and rebase.

> > ---
> >  include/trace/events/rcu.h |  4 ++--
> >  kernel/rcu/tree.c          | 11 ++++++++++-
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 539900a9f8c7..dc0bd11739c7 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
> >   *
> >   * "Startleaf": Request a grace period based on leaf-node data.
> >   * "Prestarted": Someone beat us to the request
> > - * "Startedleaf": Leaf-node start proved sufficient.
> > - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
> 
> Actually, we only get to that trace if all we did was mark the leaf
> node, right?

I didn't think so. In the code we are doing the check for rnp every time we
walk up the tree. So even when we are on an intermediate node, we do the
check of the node we started with. I thought that's what you wanted to do. It
makes sense to me to do so too.

> > + * "Startedleafroot": all non-root nodes from leaf to root marked for future start.
> 
> I got rid of the "non-root" part, given that we had to have marked
> the root to break out of the loop.

Ah yes, sorry. That's absolutely true.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed
  2018-05-14 19:20   ` Paul E. McKenney
@ 2018-05-15  1:01     ` Joel Fernandes
  2018-05-15  3:47       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  1:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 12:20:28PM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote:
> > Currently the tree RCU clean up code records a CleanupMore trace event
> > even if the GP was already in progress. This makes CleanupMore show up
> > twice for no reason. Avoid it.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Good catch, and I applied this patch.  I did rework the commit log
> a bit, so please look it over to make sure I didn't mess it up.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 52c4e689efd975f5383895b1bc1b91bc90fdd372
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Sun May 13 20:15:40 2018 -0700
> 
>     rcu: Produce last "CleanupMore" trace only if late-breaking request
>     
>     Currently the tree RCU clean-up code records a "CleanupMore" trace
>     event in response to late-arriving grace-period requests even if the
>     grace period was already requested. This makes "CleanupMore" show up an
>     extra time (in addition to once for each rcu_node structure that was
>     previously marked with the request) for no good reason.  This commit
>     therefore avoids emitting this trace message unless the only if the only
>     request for this next grace period arrived during or after the cleanup
>     scan of the rcu_node structures.

Yes, this is fine except "unless the only if the only" should be "unless the".

thanks,

- Joel

>     
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8063a0478870..de6447dd73de 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
>  	rsp->gp_state = RCU_GP_IDLE;
>  	/* Check for GP requests since above loop. */
>  	rdp = this_cpu_ptr(rsp->rda);
> -	if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
> +	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
>  		trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
>  				  TPS("CleanupMore"));
>  		needgp = true;
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number
  2018-05-14 20:33   ` Paul E. McKenney
@ 2018-05-15  1:02     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  1:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 01:33:20PM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:41PM -0700, Joel Fernandes (Google) wrote:
> > cpustart shows a stale gp_seq. This is because rdp->gp_seq is updated
> > only at the end of the __note_gp_changes function. For this reason, use
> > rnp->gp_seq instead. I believe we can't update rdp->gp_seq too early so
> > lets just use the gp_seq from rnp instead.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> I took this one with the usual commit-log update.  Please take a look!

Looks good, thanks!

- Joel


> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 3e14c6a5225d201776edd9a96d9c9b4435855446
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Sun May 13 20:15:41 2018 -0700
> 
>     rcu: Fix cpustart tracepoint gp_seq number
>     
>     The "cpustart" trace event shows a stale gp_seq. This is because it uses
>     rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
>     function. This commit therefore instead uses rnp->gp_seq.
>     
>     An alternative fix would be to update rdp->gp_seq earlier, but this would
>     break RCU's detection of the beginning of a new-to-this-CPU grace period.
>     
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index de6447dd73de..23b855f5c5cb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1796,7 +1796,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
>  		 * set up to detect a quiescent state, otherwise don't
>  		 * go looking for one.
>  		 */
> -		trace_rcu_grace_period(rsp->name, rdp->gp_seq, TPS("cpustart"));
> +		trace_rcu_grace_period(rsp->name, rnp->gp_seq, TPS("cpustart"));
>  		need_gp = !!(rnp->qsmask & rdp->grpmask);
>  		rdp->cpu_no_qs.b.norm = need_gp;
>  		rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_dynticks.rcu_qs_ctr);
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-14 17:38   ` Paul E. McKenney
@ 2018-05-15  1:51     ` Joel Fernandes
  2018-05-15  3:59       ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  1:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > Lets document how it works with an example to make it easier.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 003671825d62..fc3170914ac7 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> >  }
> > 
> > -/* Take a snapshot of the update side's sequence number. */
> > +/*
> > + * Take a snapshot of the update side's sequence number.
> > + *
> > + * This function predicts what the grace period number will be the next
> > + * time an RCU callback will be executed, given the current grace period's
> > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > + * already in progress.
> 
> How about something like this?
> 
> 	This function returns the earliest value of the grace-period
> 	sequence number that will indicate that a full grace period has
> 	elapsed since the current time.  Once the grace-period sequence
> 	number has reached this value, it will be safe to invoke all
> 	callbacks that have been registered prior to the current time.
> 	This value is the current grace-period number plus two to the
> 	power of the number of low-order bits reserved for state, then
> 	rounded up to the next value in which the state bits are all zero.

This makes sense too, but do you disagree with what I said?

I was kind of thinking of snap along the lines of how the previous code
worked. Where you were calling rcu_cbs_completed() or a function with a
similar name. Now we call _snap.

So basically I connected these 2 facts together to mean that rcu_seq_snap
also does that same thing as rcu_cbs_completed - which is basically it gives
the "next GP" where existing callbacks have already run and new callbacks
will run at the end of this "next GP".

> > + *
> > + * We do this with a single addition and masking.
> 
> Please either fold this sentence into rest of the paragraph or add a
> blank line after it.
> 
> > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> > + * add (2+1) and mask with ~1. Let's see why with an example:
> > + *
> > + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> > + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> > + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> > + * state bit is 1), then it means the current grace period is already in
> > + * progress so the next time the callback will run is at the end of grace
> > + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> > + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> > + * is idle), then the addition of the extra 0x1 and masking will have no
> > + * effect. This is calculated as below.
> > + */
> 
> Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
> since that is the current value.  One alternative (or perhaps addition)
> is to have a short table of numbers showing the mapping from *sp to the
> return value.  (I started from such a table when writing this function,
> for whatever that is worth.)

Ok I'll try to give a better example above. thanks,

Also just to let you know, thanks so much for elaborately providing an
example on the other thread where we are discussing the rcu_seq_done check. I
will take some time to trace this down and see if I can zero in on the same
understanding as yours.

I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it is
used, 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
we were trying to check in that loop... that's why I felt that check wasn't
correct - that's my (most likely wrong) take on the matter, and I'll get back
once I trace this a bit more hopefully today :-P

thanks!

- Joel
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU
  2018-05-15  0:35       ` Joel Fernandes
@ 2018-05-15  3:42         ` Paul E. McKenney
  0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15  3:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, linux-kernel, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 05:35:55PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 10:22:05AM -0700, Paul E. McKenney wrote:
> > On Mon, May 14, 2018 at 10:54:54AM -0400, Steven Rostedt wrote:
> > > On Sun, 13 May 2018 20:15:35 -0700
> > > "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > > 
> > > > Recently we had a discussion about cond_resched unconditionally
> > > > recording a voluntary context switch [1].
> > > > 
> > > > Lets add a comment clarifying that how this API is to be used.
> > > > 
> > > > [1] https://lkml.kernel.org/r/1526027434-21237-1-git-send-email-byungchul.park@lge.com
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  include/linux/rcupdate.h | 11 ++++++++---
> > > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 743226176350..a9881007ece6 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -159,8 +159,12 @@ static inline void rcu_init_nohz(void) { }
> > > >  	} while (0)
> > > >  
> > > >  /*
> > > > - * Note a voluntary context switch for RCU-tasks benefit.  This is a
> > > > - * macro rather than an inline function to avoid #include hell.
> > > > + * Note an attempt to perform a voluntary context switch for RCU-tasks benefit.
> > > > + *
> > > > + * This is called even in situations where a context switch didn't really
> > > > + * happen even though it was requested. The caller uses it to indicate
> > > > + * traversal of an RCU-tasks quiescent state. This is a macro rather than an
> > > > + * inline function to avoid #include hell.
> > > 
> > > I don't know. I just don't like the wording. It sounds too much like
> > > it was written by someone that was confused for it being called when a
> > > context switch didn't occur ;-)
> 
> True :)
> 
> > > 
> > > What about something more like:
> > > 
> > > /*
> > >  * This is called to denote a RCU-task quiescent state. It is placed at
> > >  * voluntary preemption points, as RCU-task critical sections may not
> > >  * perform voluntary preemption or scheduling calls. It does not matter
> > >  * if the task is scheduled out or not, just that a voluntary preemption
> > >  * may be done.
> > >  */
> > 
> > s/RCU-task/RCU-tasks/ and I am good with this.
> 
> Ok. I like Steve's comment better too.
> 
> Btw, I see you just posted a change of the macro name from
> rcu_note_voluntary_context_switch_lite to rcu_tasks_qs which actually in
> itself is much more descriptive. Considering this, I feel the new name is
> quite self-documenting in itself. So I am more inclined to drop this patch in
> any series reposting, but let me know if you feel otherwise.

I agree, but let's see what other people think.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-15  0:57     ` Joel Fernandes
@ 2018-05-15  3:46       ` Paul E. McKenney
  2018-05-15 23:04         ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15  3:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote:
> > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> > > In recent discussion [1], the check for whether a leaf believes RCU is
> > > not idle, is being added back to funnel locking code, to avoid more
> > > locking. In this we are marking the leaf node for a future grace-period
> > > and bailing out since a GP is currently in progress. However the
> > > tracepoint is missing. Lets add it back.
> > > 
> > > Also add a small comment about why we do this check (basically the point
> > > is to avoid locking intermediate nodes unnecessarily) and clarify the
> > > comments in the trace event header now that we are doing traversal of
> > > one or more intermediate nodes.
> > > 
> > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Looks like a good idea, but it does not apply -- which is not a surprise,
> > given the change rate in this code.  I hand-applied as a modification
> > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
> > with attribution, but with the changes below.  Please let me know if I
> > am missing something.
> > 
> > Ah, I see -- this commit depends on your earlier name-change commit.
> > I therefore made this patch use the old names.
> 
> Ok, I'll check your new tree and rebase.

Sounds good!

> > > ---
> > >  include/trace/events/rcu.h |  4 ++--
> > >  kernel/rcu/tree.c          | 11 ++++++++++-
> > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > index 539900a9f8c7..dc0bd11739c7 100644
> > > --- a/include/trace/events/rcu.h
> > > +++ b/include/trace/events/rcu.h
> > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
> > >   *
> > >   * "Startleaf": Request a grace period based on leaf-node data.
> > >   * "Prestarted": Someone beat us to the request
> > > - * "Startedleaf": Leaf-node start proved sufficient.
> > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
> > 
> > Actually, we only get to that trace if all we did was mark the leaf
> > node, right?
> 
> I didn't think so. In the code we are doing the check for rnp every time we
> walk up the tree. So even when we are on an intermediate node, we do the
> check of the node we started with. I thought that's what you wanted to do. It
> makes sense to me to do so too.

If we are not on the initial (usually leaf) node, then the similar check
in the previous "if" statement would have sent us to unlock_out, right?

(And yes, I should have said "mark the initial node" above.)
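
For reference, the earlier check I mean is this one from your patch 5 as
posted in this thread:

	if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start) ||
	    rcu_seq_done(&rnp_node->gp_seq, gp_seq_start) ||
	    (rnp != rnp_node &&
	     rcu_seq_state(rcu_seq_current(&rnp_node->gp_seq)))) {
		trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
				  TPS("Prestarted"));
		goto unlock_out;
	}

The last pair of conditions is what takes the intermediate (non-initial)
nodes to unlock_out when a grace period is already in progress.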

							Thanx, Paul

> > > + * "Startedleafroot": all non-root nodes from leaf to root marked for future start.
> > 
> > I got rid of the "non-root" part, given that we had to have marked
> > the root to break out of the loop.
> 
> Ah yes, sorry. That's absolutely true.
> 
> thanks,
> 
> - Joel
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed
  2018-05-15  1:01     ` Joel Fernandes
@ 2018-05-15  3:47       ` Paul E. McKenney
  0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15  3:47 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 06:01:31PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 12:20:28PM -0700, Paul E. McKenney wrote:
> > On Sun, May 13, 2018 at 08:15:40PM -0700, Joel Fernandes (Google) wrote:
> > > Currently the tree RCU clean up code records a CleanupMore trace event
> > > even if the GP was already in progress. This makes CleanupMore show up
> > > twice for no reason. Avoid it.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Good catch, and I applied this patch.  I did rework the commit log
> > a bit, so please look it over to make sure I didn't mess it up.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 52c4e689efd975f5383895b1bc1b91bc90fdd372
> > Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Date:   Sun May 13 20:15:40 2018 -0700
> > 
> >     rcu: Produce last "CleanupMore" trace only if late-breaking request
> >     
> >     Currently the tree RCU clean-up code records a "CleanupMore" trace
> >     event in response to late-arriving grace-period requests even if the
> >     grace period was already requested. This makes "CleanupMore" show up an
> >     extra time (in addition to once for each rcu_node structure that was
> >     previously marked with the request) for no good reason.  This commit
> >     therefore avoids emitting this trace message unless the only if the only
> >     request for this next grace period arrived during or after the cleanup
> >     scan of the rcu_node structures.
> 
> Yes, this is fine except "unless the only if the only" should be "unless the".

I did update this after sending, and I still have "the the".  Will fix on
next rebase.  As my daughter recently reminded me, the Law of Conservation
of Bugs.  ;-)

							Thanx, Paul

> thanks,
> 
> - Joel
> 
> >     
> >     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8063a0478870..de6447dd73de 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2072,7 +2072,7 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
> >  	rsp->gp_state = RCU_GP_IDLE;
> >  	/* Check for GP requests since above loop. */
> >  	rdp = this_cpu_ptr(rsp->rda);
> > -	if (ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
> > +	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
> >  		trace_rcu_this_gp(rnp, rdp, rnp->gp_seq_needed,
> >  				  TPS("CleanupMore"));
> >  		needgp = true;
> > 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  1:51     ` Joel Fernandes
@ 2018-05-15  3:59       ` Paul E. McKenney
  2018-05-15  7:02         ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15  3:59 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > Lets document how it works with an example to make it easier.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index 003671825d62..fc3170914ac7 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > >  }
> > > 
> > > -/* Take a snapshot of the update side's sequence number. */
> > > +/*
> > > + * Take a snapshot of the update side's sequence number.
> > > + *
> > > + * This function predicts what the grace period number will be the next
> > > + * time an RCU callback will be executed, given the current grace period's
> > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > + * already in progress.
> > 
> > How about something like this?
> > 
> > 	This function returns the earliest value of the grace-period
> > 	sequence number that will indicate that a full grace period has
> > 	elapsed since the current time.  Once the grace-period sequence
> > 	number has reached this value, it will be safe to invoke all
> > 	callbacks that have been registered prior to the current time.
> > 	This value is the current grace-period number plus two to the
> > 	power of the number of low-order bits reserved for state, then
> > 	rounded up to the next value in which the state bits are all zero.
> 
> This makes sense too, but do you disagree with what I said?

In a pedantic sense, definitely.  RCU callbacks are being executed pretty
much all the time on a busy system, so it is only the recently queued
ones that are guaranteed to be deferred that long.  And my experience
indicates that someone really will get confused by that distinction,
so I feel justified in being pedantic in this case.

> I was kind of thinking of snap along the lines of how the previous code
> worked. Where you were calling rcu_cbs_completed() or a function with a
> similar name. Now we call _snap.

You are quite correct that rcu_seq_snap() is the new analog of
rcu_cbs_completed(), though there are differences due to the old ->gpnum
and ->completed now being represented by a single ->gp_seq value.

> So basically I connected these 2 facts together to mean that rcu_seq_snap
> also does that same thing as rcu_cbs_completed - which is basically it gives
> the "next GP" where existing callbacks have already run and new callbacks
> will run at the end of this "next GP".

Ah, but you didn't have the qualifier "new" in your original.  And even
then, the definition of "new" might be different for different readers.

> > > + *
> > > + * We do this with a single addition and masking.
> > 
> > Please either fold this sentence into rest of the paragraph or add a
> > blank line after it.
> > 
> > > + * For example, if RCU_SEQ_STATE_MASK=1 and the least significant bit (LSB) of
> > > + * the seq is used to track if a GP is in progress or not, its sufficient if we
> > > + * add (2+1) and mask with ~1. Let's see why with an example:
> > > + *
> > > + * Say the current seq is 6 which is 0b110 (gp is 3 and state bit is 0).
> > > + * To get the next GP number, we have to at least add 0b10 to this (0x1 << 1)
> > > + * to account for the state bit. However, if the current seq is 7 (gp is 3 and
> > > + * state bit is 1), then it means the current grace period is already in
> > > + * progress so the next time the callback will run is at the end of grace
> > > + * period number gp+2. To account for the extra +1, we just overflow the LSB by
> > > + * adding another 0x1 and masking with ~0x1. In case no GP was in progress (RCU
> > > + * is idle), then the addition of the extra 0x1 and masking will have no
> > > + * effect. This is calculated as below.
> > > + */
> > 
> > Having the explicit numbers is good, but please use RCU_SEQ_STATE_MASK=3,
> > since that is the current value.  One alternative (or perhaps addition)
> > is to have a short table of numbers showing the mapping from *sp to the
> > return value.  (I started from such a table when writing this function,
> > for whatever that is worth.)
> 
> Ok I'll try to give a better example above. thanks,

Sounds good!
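
For whatever it is worth, assuming rcu_seq_snap() still computes
(*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK with
RCU_SEQ_STATE_MASK=3, the sort of table I started from looks roughly
like this:

	*sp  (ctr:state)        rcu_seq_snap(sp)  (ctr:state)
	 0   (0:0, idle)         4  (1:0)
	 1   (0:1, GP busy)      8  (2:0)
	 4   (1:0, idle)         8  (2:0)
	 5   (1:1, GP busy)     12  (3:0)
	 8   (2:0, idle)        12  (3:0)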

> Also just to let you know, thanks so much for elaborately providing an
> example on the other thread where we are discussing the rcu_seq_done check. I
> will take some time to trace this down and see if I can zero in on the same
> understanding as yours.
> 
> > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it is
> > used, 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> we were trying to check in that loop... that's why I felt that check wasn't
> correct - that's my (most likely wrong) take on the matter, and I'll get back
> once I trace this a bit more hopefully today :-P

If your point is that interrupts are disabled throughout, so there isn't
much chance of the grace period completing during that time, you are
mostly right.  The places you might not be right are the idle loop and
offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
offline CPUs, but IIRC it is just fine in the case where callbacks have
been offloaded from that CPU.

And if you instead say that "c" is the requested final ->gp_seq value
obtained from _snap(), the thought process might go more easily.
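
For example, assuming two state bits (so the grace-period counter lives in
bits 2 and up):

	rnp->gp_seq == 8  (ctr 2, idle)  -->  _snap() returns c == 12  (ctr 3, idle)
	GP start:  gp_seq  8 ->  9  (ctr 2, state 1)
	GP end:    gp_seq  9 -> 12  (ctr 3, state 0)

so rcu_seq_done(&rnp->gp_seq, c) becomes true exactly when one full grace
period has elapsed after the _snap() call.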

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  3:59       ` Paul E. McKenney
@ 2018-05-15  7:02         ` Joel Fernandes
  2018-05-15 12:55           ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15  7:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

Hi Paul,
Good morning, hope you're having a great Tuesday. I managed to find some
evening hours today to dig into this a bit more.

On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > Lets document how it works with an example to make it easier.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > index 003671825d62..fc3170914ac7 100644
> > > > --- a/kernel/rcu/rcu.h
> > > > +++ b/kernel/rcu/rcu.h
> > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > >  }
> > > > 
> > > > -/* Take a snapshot of the update side's sequence number. */
> > > > +/*
> > > > + * Take a snapshot of the update side's sequence number.
> > > > + *
> > > > + * This function predicts what the grace period number will be the next
> > > > + * time an RCU callback will be executed, given the current grace period's
> > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > + * already in progress.
> > > 
> > > How about something like this?
> > > 
> > > 	This function returns the earliest value of the grace-period
> > > 	sequence number that will indicate that a full grace period has
> > > 	elapsed since the current time.  Once the grace-period sequence
> > > 	number has reached this value, it will be safe to invoke all
> > > 	callbacks that have been registered prior to the current time.
> > > 	This value is the current grace-period number plus two to the
> > > 	power of the number of low-order bits reserved for state, then
> > > 	rounded up to the next value in which the state bits are all zero.
> > 
> > This makes sense too, but do you disagree with what I said?
> 
> In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> much all the time on a busy system, so it is only the recently queued
> ones that are guaranteed to be deferred that long.  And my experience
> indicates that someone really will get confused by that distinction,
> so I feel justified in being pedantic in this case.

Ok I agree, I'll include your comment above.

> > Also just to let you know, thanks so much for elaborately providing an
> > example on the other thread where we are discussing the rcu_seq_done check. I
> > will take some time to trace this down and see if I can zero in on the same
> > understanding as yours.
> > 
> > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it is
> > > used, 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > we were trying to check in that loop... that's why I felt that check wasn't
> > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > once I trace this a bit more hopefully today :-P
> 
> If your point is that interrupts are disabled throughout, so there isn't
> much chance of the grace period completing during that time, you are
> mostly right.  The places you might not be right are the idle loop and
> offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> offline CPUs, but IIRC it is just fine in the case where callbacks have
> been offloaded from that CPU.
> 
> And if you instead say that "c" is the requested final ->gp_seq value
> obtained from _snap(), the thought process might go more easily.

Yes, I agree with c being the requested final value, which is the GP for which
the callbacks will be queued. At the end of GP c, the callbacks will have
executed.

About the rcu_seq_done check and why I believe it's not right to use it in
that funnel locking loop, allow me to try to argue my point from a
different angle...

We agreed that the way gp_seq numbers work and are compared with each other
to identify whether a GP has elapsed is different from the way the previous
numbers (gp_num) were compared.

Most notably, before the gp_seq conversions, in order to start a GP we were
doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
end.

Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
state bits. To mark the end, we clear the state bits and increment the gp_num
part of gp_seq.
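
As a rough sketch of what I mean (my paraphrase of rcu_seq_start() and
rcu_seq_end(), ignoring the memory barriers and WARNs):

	/* GP start: gp_num part of gp_seq unchanged, state bits become nonzero. */
	gp_seq = gp_seq + 1;

	/* GP end: state bits cleared, gp_num part of gp_seq incremented. */
	gp_seq = (gp_seq | RCU_SEQ_STATE_MASK) + 1;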

However, in the commit excerpted below, 12d6c129fd0a ("rcu: Convert grace-period
requests to ->gp_seq"), you did a one-to-one replacement of the ULONG_CMP_GE
with rcu_seq_done, even though the gp_seq numbers work differently from the
previously used numbers (gp_num and completed).

I would then argue that, because of the differences above, a one-to-one
replacement of ULONG_CMP_GE with rcu_seq_done doesn't make sense.
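
(For reference, rcu_seq_done() itself is, if I'm reading it right, just an
ULONG_CMP_GE of the current gp_seq against the requested value:

	static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
	{
		return ULONG_CMP_GE(READ_ONCE(*sp), s);
	}

so it returns true as soon as *sp has reached or passed s.)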

I argue this because, in the previous code, ULONG_CMP_GE made sense for the gp_num
way of things: if c == gp_num, that means that:
 - c started already
 - c has finished.
 This worked correctly, because we have nothing to do and we can bail
 without setting any flag.

 Whereas now, with the gp_seq regime, c == gp_seq means:
 - c-1 finished (that is, 1 subtracted from the gp_num part of c)
 This would cause us to bail without setting any flag for starting c.

 I did some tracing and could never hit the rcu_seq_done check: in my traces,
 _snap never returned a value for which rcu_seq_done returned true, so I'm
 not sure this check is needed, but you're the expert ;)

@@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
         * not be released.
         */
        raw_lockdep_assert_held_rcu_node(rnp);
+       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
+       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
        trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
        for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
                if (rnp_root != rnp)
                        raw_spin_lock_rcu_node(rnp_root);
-               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
-                                         need_future_gp_mask(), c));
                if (need_future_gp_element(rnp_root, c) ||
-                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
+                   rcu_seq_done(&rnp_root->gp_seq, c) ||

                     ^^^^
		     A direct replacement of ULONG_CMP_GE is a bit weird?  It
		     means we bail out if c-1 completed, and we don't set any
		     flag for starting c. That could result in the cleanup code
		     never starting c?

                    (rnp != rnp_root &&
-                    rnp_root->gpnum != rnp_root->completed)) {
+                    rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
                        goto unlock_out;
                }


 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15  7:02         ` Joel Fernandes
@ 2018-05-15 12:55           ` Paul E. McKenney
  2018-05-15 18:41             ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15 12:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> Hi Paul,
> Good morning, hope you're having a great Tuesday. I managed to find some
> evening hours today to dig into this a bit more.
> 
> On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > Lets document how it works with an example to make it easier.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > index 003671825d62..fc3170914ac7 100644
> > > > > --- a/kernel/rcu/rcu.h
> > > > > +++ b/kernel/rcu/rcu.h
> > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > >  }
> > > > > 
> > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > +/*
> > > > > + * Take a snapshot of the update side's sequence number.
> > > > > + *
> > > > > + * This function predicts what the grace period number will be the next
> > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > + * already in progress.
> > > > 
> > > > How about something like this?
> > > > 
> > > > 	This function returns the earliest value of the grace-period
> > > > 	sequence number that will indicate that a full grace period has
> > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > 	number has reached this value, it will be safe to invoke all
> > > > 	callbacks that have been registered prior to the current time.
> > > > 	This value is the current grace-period number plus two to the
> > > > 	power of the number of low-order bits reserved for state, then
> > > > 	rounded up to the next value in which the state bits are all zero.
> > > 
> > > This makes sense too, but do you disagree with what I said?
> > 
> > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > much all the time on a busy system, so it is only the recently queued
> > ones that are guaranteed to be deferred that long.  And my experience
> > indicates that someone really will get confused by that distinction,
> > so I feel justified in being pedantic in this case.
> 
> Ok I agree, I'll include your comment above.
> 
> > > Also just to let you know, thanks so much for elaborately providing an
> > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > will take some time to trace this down and see if I can zero in on the same
> > > understanding as yours.
> > > 
> > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > > we were trying to check in that loop... that's why I felt that check wasn't
> > > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > > once I trace this a bit more hopefully today :-P
> > 
> > If your point is that interrupts are disabled throughout, so there isn't
> > much chance of the grace period completing during that time, you are
> > mostly right.  The places you might not be right are the idle loop and
> > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > been offloaded from that CPU.
> > 
> > And if you instead say that "c" is the requested final ->gp_seq value
> > obtained from _snap(), the thought process might go more easily.
> 
> Yes I agree with c being the requested final value which is the GP for which
> the callbacks will be queued. At the end of the GP c, the callbacks will have
> executed.
> 
> About the rcu_seq_done check and why I believe its not right to use it in
> that funnel locking loop, if you could allow me to try argument my point from
> a different angle...
> 
> We agreed that the way gp_seq numbers work and are compared with each other
> to identify if a GP is elapsed or not, is different from the way the previous
> numbers (gp_num) were compared.
> 
> Most notably, before the gp_seq conversions - inorder to start a GP, we were
> doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> end.
> 
> Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> state bits. To mark the end, we clear the state bits and increment the gp_num
> part of gp_seq.
> 
> However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
> with rcu_seq_done. You did so even though the gp_seq numbers work differently
> from previously used numbers (gp_num and completed).
> 
> I would then argue that because of the differences above, a one-to-one
> replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.
> 
> I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num
> way of things because, if c == gp_num, that means that :
>  - c started already
>  - c has finished.
>  Which worked correctly, because we have nothing to do and we can bail
>  without setting any flag.
> 
>  Where as now, with the gp_seq regime, c == gp_seq means:
>  - c-1 finished   (I meant -1 subtracted from the gp_num part of c)
>  This would cause us to bail without setting any flag for starting c.
> 
>  I did some tracing and I could never hit the rcu_seq_done check because it
>  never happens in my tracing that _snap returned something for which
>  rcu_seq_done returned true, so I'm not sure if this check is needed, but
>  you're the expert ;)
> 
> @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>          * not be released.
>          */
>         raw_lockdep_assert_held_rcu_node(rnp);
> +       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
> +       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
>         trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
>         for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
>                 if (rnp_root != rnp)
>                         raw_spin_lock_rcu_node(rnp_root);
> -               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
> -                                         need_future_gp_mask(), c));
>                 if (need_future_gp_element(rnp_root, c) ||
> -                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
> +                   rcu_seq_done(&rnp_root->gp_seq, c) ||
> 
>                      ^^^^
> 		     A direct replacement of ULONG_CMP_GE is bit weird?  It
> 		     means we bail out if c-1 completed, and we don't set any
> 		     flag for starting c. That could result in the clean up
> 		     never starting c?

Ah, I see what you are getting at now.

What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check
if GP already requested") is to push the request down to the leaves of
the tree and to the rcu_data structure.  Once that commit is in place,
the check for the grace period already being in progress isn't all
that helpful, though I suppose that it could be added.  One way to
do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with
"ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems
a bit baroque to me.

The point of the rcu_seq_done() is to catch long delays, but given the
current implementation, the fact that interrupts are disabled across
all calls should prevent the rcu_seq_done() from ever returning true.
(Famous last words!)  So, yes, it could be removed, in theory, at least.
At least until the real-time guys force me to come up with a way to
run this code with interrupts enabled (hopefully never!).

If I were to do that, I would first wrap it with a WARN_ON_ONCE() and
leave it that way for an extended period of testing.  Yes, I am paranoid.
Why do you ask?  ;-)
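
Something like the following, just as a sketch of what I mean rather than an
actual patch; keep the bail-out as is, but make any rcu_seq_done() hit visible:

		/* Sketch only: same exit condition, but warn if _done fires. */
		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
		    WARN_ON_ONCE(rcu_seq_done(&rnp_root->gp_seq, c)) ||
		    (rnp != rnp_root &&
		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
			goto unlock_out;
		}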

							Thanx, Paul

>                     (rnp != rnp_root &&
> -                    rnp_root->gpnum != rnp_root->completed)) {
> +                    rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
>                         trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
>                         goto unlock_out;
>                 }
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15 12:55           ` Paul E. McKenney
@ 2018-05-15 18:41             ` Joel Fernandes
  2018-05-15 19:08               ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15 18:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> > Hi Paul,
> > Good morning, hope you're having a great Tuesday. I managed to find some
> > evening hours today to dig into this a bit more.
> > 
> > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > > Lets document how it works with an example to make it easier.
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > index 003671825d62..fc3170914ac7 100644
> > > > > > --- a/kernel/rcu/rcu.h
> > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > > >  }
> > > > > > 
> > > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > > +/*
> > > > > > + * Take a snapshot of the update side's sequence number.
> > > > > > + *
> > > > > > + * This function predicts what the grace period number will be the next
> > > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > > + * already in progress.
> > > > > 
> > > > > How about something like this?
> > > > > 
> > > > > 	This function returns the earliest value of the grace-period
> > > > > 	sequence number that will indicate that a full grace period has
> > > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > > 	number has reached this value, it will be safe to invoke all
> > > > > 	callbacks that have been registered prior to the current time.
> > > > > 	This value is the current grace-period number plus two to the
> > > > > 	power of the number of low-order bits reserved for state, then
> > > > > 	rounded up to the next value in which the state bits are all zero.
> > > > 
> > > > This makes sense too, but do you disagree with what I said?
> > > 
> > > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > > much all the time on a busy system, so it is only the recently queued
> > > ones that are guaranteed to be deferred that long.  And my experience
> > > indicates that someone really will get confused by that distinction,
> > > so I feel justified in being pedantic in this case.
> > 
> > Ok I agree, I'll include your comment above.
> > 
> > > > Also just to let you know, thanks so much for elaborately providing an
> > > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > > will take some time to trace this down and see if I can zero in on the same
> > > > understanding as yours.
> > > > 
> > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > > > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > > > we were trying to check in that loop... that's why I felt that check wasn't
> > > > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > > > once I trace this a bit more hopefully today :-P
> > > 
> > > If your point is that interrupts are disabled throughout, so there isn't
> > > much chance of the grace period completing during that time, you are
> > > mostly right.  The places you might not be right are the idle loop and
> > > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > > been offloaded from that CPU.
> > > 
> > > And if you instead say that "c" is the requested final ->gp_seq value
> > > obtained from _snap(), the thought process might go more easily.
> > 
> > Yes I agree with c being the requested final value which is the GP for which
> > the callbacks will be queued. At the end of the GP c, the callbacks will have
> > executed.
> > 
> > About the rcu_seq_done check and why I believe its not right to use it in
> > that funnel locking loop, if you could allow me to try argument my point from
> > a different angle...
> > 
> > We agreed that the way gp_seq numbers work and are compared with each other
> > to identify if a GP is elapsed or not, is different from the way the previous
> > numbers (gp_num) were compared.
> > 
> > Most notably, before the gp_seq conversions - inorder to start a GP, we were
> > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> > end.
> > 
> > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> > state bits. To mark the end, we clear the state bits and increment the gp_num
> > part of gp_seq.
> > 
> > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
> > with rcu_seq_done. You did so even though the gp_seq numbers work differently
> > from previously used numbers (gp_num and completed).
> > 
> > I would then argue that because of the differences above, a one-to-one
> > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.
> > 
> > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num
> > way of things because, if c == gp_num, that means that :
> >  - c started already
> >  - c has finished.
> >  Which worked correctly, because we have nothing to do and we can bail
> >  without setting any flag.
> > 
> >  Where as now, with the gp_seq regime, c == gp_seq means:
> >  - c-1 finished   (I meant -1 subtracted from the gp_num part of c)
> >  This would cause us to bail without setting any flag for starting c.
> > 
> >  I did some tracing and I could never hit the rcu_seq_done check because it
> >  never happens in my tracing that _snap returned something for which
> >  rcu_seq_done returned true, so I'm not sure if this check is needed, but
> >  you're the expert ;)
> > 
> > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> >          * not be released.
> >          */
> >         raw_lockdep_assert_held_rcu_node(rnp);
> > +       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
> > +       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
> >         trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> >         for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> >                 if (rnp_root != rnp)
> >                         raw_spin_lock_rcu_node(rnp_root);
> > -               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
> > -                                         need_future_gp_mask(), c));
> >                 if (need_future_gp_element(rnp_root, c) ||
> > -                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > +                   rcu_seq_done(&rnp_root->gp_seq, c) ||
> > 
> >                      ^^^^
> > 		     A direct replacement of ULONG_CMP_GE is bit weird?  It
> > 		     means we bail out if c-1 completed, and we don't set any
> > 		     flag for starting c. That could result in the clean up
> > 		     never starting c?
> 
> Ah, I see what you are getting at now.
> 
> What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check
> if GP already requested") is to push the request down to the leaves of
> the tree and to the rcu_data structure.  Once that commit is in place,
> the check for the grace period already being in progress isn't all
> that helpful, though I suppose that it could be added.  One way to
> do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with
> ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems
> a bit baroque to me.
> 
> The point of the rcu_seq_done() is to catch long delays, but given the
> current implementation, the fact that interrupts are disabled across
> all calls should prevent the rcu_seq_done() from ever returning true.
> (Famous last words!)  So, yes, it could be removed, in theory, at least.
> At least until the real-time guys force me to come up with a way to
> run this code with interrupts enabled (hopefully never!).
> 
> If I were to do that, I would first wrap it with a WARN_ON_ONCE() and
> leave it that way for an extended period of testing.  Yes, I am paranoid.
> Why do you ask?  ;-)
:-D

Ah, I see what you're doing in that commit where you're moving the furthest
request down to the leaves; that would protect against the scenario I was
describing and would set the gp_seq_needed of the leaf.

The code would be correct then, but one issue is that it would emit the
'Prestarted' tracepoint for 'c' when that's not really true.

               rcu_seq_done(&rnp_root->gp_seq, c)

translates to ULONG_CMP_GE(rnp_root->gp_seq, c),

which translates to the fact that c-1 has completed.

So in this case, if rcu_seq_done returns true, then saying that c has been
'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something,
since what we are really doing is just marking the leaf, as you mentioned, in
the unlock_out part for a future start.

thanks!

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15 18:41             ` Joel Fernandes
@ 2018-05-15 19:08               ` Paul E. McKenney
  2018-05-15 22:55                 ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-15 19:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote:
> On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote:
> > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> > > Hi Paul,
> > > Good morning, hope you're having a great Tuesday. I managed to find some
> > > evening hours today to dig into this a bit more.
> > > 
> > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > > > Lets document how it works with an example to make it easier.
> > > > > > > 
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > ---
> > > > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > > index 003671825d62..fc3170914ac7 100644
> > > > > > > --- a/kernel/rcu/rcu.h
> > > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > > > >  }
> > > > > > > 
> > > > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > > > +/*
> > > > > > > + * Take a snapshot of the update side's sequence number.
> > > > > > > + *
> > > > > > > + * This function predicts what the grace period number will be the next
> > > > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > > > + * already in progress.
> > > > > > 
> > > > > > How about something like this?
> > > > > > 
> > > > > > 	This function returns the earliest value of the grace-period
> > > > > > 	sequence number that will indicate that a full grace period has
> > > > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > > > 	number has reached this value, it will be safe to invoke all
> > > > > > 	callbacks that have been registered prior to the current time.
> > > > > > 	This value is the current grace-period number plus two to the
> > > > > > 	power of the number of low-order bits reserved for state, then
> > > > > > 	rounded up to the next value in which the state bits are all zero.
> > > > > 
> > > > > This makes sense too, but do you disagree with what I said?
> > > > 
> > > > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > > > much all the time on a busy system, so it is only the recently queued
> > > > ones that are guaranteed to be deferred that long.  And my experience
> > > > indicates that someone really will get confused by that distinction,
> > > > so I feel justified in being pedantic in this case.
> > > 
> > > Ok I agree, I'll include your comment above.
> > > 
> > > > > Also just to let you know, thanks so much for elaborately providing an
> > > > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > > > will take some time to trace this down and see if I can zero in on the same
> > > > > understanding as yours.
> > > > > 
> > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > > > > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > > > > we were trying to check in that loop... that's why I felt that check wasn't
> > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > > > > once I trace this a bit more hopefully today :-P
> > > > 
> > > > If your point is that interrupts are disabled throughout, so there isn't
> > > > much chance of the grace period completing during that time, you are
> > > > mostly right.  The places you might not be right are the idle loop and
> > > > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > > > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > > > been offloaded from that CPU.
> > > > 
> > > > And if you instead say that "c" is the requested final ->gp_seq value
> > > > obtained from _snap(), the thought process might go more easily.
> > > 
> > > Yes I agree with c being the requested final value which is the GP for which
> > > the callbacks will be queued. At the end of the GP c, the callbacks will have
> > > executed.
> > > 
> > > About the rcu_seq_done check and why I believe its not right to use it in
> > > that funnel locking loop, if you could allow me to try argument my point from
> > > a different angle...
> > > 
> > > We agreed that the way gp_seq numbers work and are compared with each other
> > > to identify if a GP is elapsed or not, is different from the way the previous
> > > numbers (gp_num) were compared.
> > > 
> > > Most notably, before the gp_seq conversions - inorder to start a GP, we were
> > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> > > end.
> > > 
> > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> > > state bits. To mark the end, we clear the state bits and increment the gp_num
> > > part of gp_seq.
> > > 
> > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> > > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
> > > with rcu_seq_done. You did so even though the gp_seq numbers work differently
> > > from previously used numbers (gp_num and completed).
> > > 
> > > I would then argue that because of the differences above, a one-to-one
> > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.
> > > 
> > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num
> > > way of things because, if c == gp_num, that means that :
> > >  - c started already
> > >  - c has finished.
> > >  Which worked correctly, because we have nothing to do and we can bail
> > >  without setting any flag.
> > > 
> > >  Where as now, with the gp_seq regime, c == gp_seq means:
> > >  - c-1 finished   (I meant -1 subtracted from the gp_num part of c)
> > >  This would cause us to bail without setting any flag for starting c.
> > > 
> > >  I did some tracing and I could never hit the rcu_seq_done check because it
> > >  never happens in my tracing that _snap returned something for which
> > >  rcu_seq_done returned true, so I'm not sure if this check is needed, but
> > >  you're the expert ;)
> > > 
> > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > >          * not be released.
> > >          */
> > >         raw_lockdep_assert_held_rcu_node(rnp);
> > > +       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
> > > +       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
> > >         trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > >         for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > >                 if (rnp_root != rnp)
> > >                         raw_spin_lock_rcu_node(rnp_root);
> > > -               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
> > > -                                         need_future_gp_mask(), c));
> > >                 if (need_future_gp_element(rnp_root, c) ||
> > > -                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > +                   rcu_seq_done(&rnp_root->gp_seq, c) ||
> > > 
> > >                      ^^^^
> > > 		     A direct replacement of ULONG_CMP_GE is bit weird?  It
> > > 		     means we bail out if c-1 completed, and we don't set any
> > > 		     flag for starting c. That could result in the clean up
> > > 		     never starting c?
> > 
> > Ah, I see what you are getting at now.
> > 
> > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check
> > if GP already requested") is to push the request down to the leaves of
> > the tree and to the rcu_data structure.  Once that commit is in place,
> > the check for the grace period already being in progress isn't all
> > that helpful, though I suppose that it could be added.  One way to
> > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with
> > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems
> > a bit baroque to me.
> > 
> > The point of the rcu_seq_done() is to catch long delays, but given the
> > current implementation, the fact that interrupts are disabled across
> > all calls should prevent the rcu_seq_done() from ever returning true.
> > (Famous last words!)  So, yes, it could be removed, in theory, at least.
> > At least until the real-time guys force me to come up with a way to
> > run this code with interrupts enabled (hopefully never!).
> > 
> > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and
> > leave it that way for an extended period of testing.  Yes, I am paranoid.
> > Why do you ask?  ;-)
> :-D
> 
> Ah I see what you're doing in that commit where you're moving the furthest
> request down to the leaves, so that would protect against the scenario I was
> describing and set the gp_seq_needed of the leaf.

But I came up with a less baroque check for a grace period having started,
at which point the question becomes "Why not just do both?", especially
since a check for a grace period having started is satisfied by that
grace period's having completed, which means minimal added overhead.
Perhaps no added overhead for some compilers and architectures.
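
A quick way to see that, assuming two state bits and a valid snapshot s
(state bits clear), is the following sketch (not kernel code):

/*
 * rcu_seq_done(sp, s)    is  ULONG_CMP_GE(*sp, s)
 * rcu_seq_started(sp, s) is  ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, *sp),
 *                        and (s - 1) & ~RCU_SEQ_STATE_MASK == s - RCU_SEQ_STATE_MASK - 1
 *                        when the state bits of s are zero.
 *
 * Any *sp that makes the "done" test true therefore also makes the
 * "started" test true, so checking "started" loses nothing relative to
 * checking "done".
 */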

Please see the end of this email for a prototype patch.

> The code would be correct then, but one issue is it would shout out the
> 'Prestarted' tracepoint for 'c' when that's not really true..
> 
>                rcu_seq_done(&rnp_root->gp_seq, c)
> 
> translates to ULONG_CMP_GE(&rnp_root->gp_seq, c)
> 
> which translates to the fact that c-1 completed.
> 
> So in this case if rcu_seq_done returns true, then saying that c has been
> 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something
> since what we really are doing is just marking the leaf as you mentioned in
> the unlock_out part for a future start.

Indeed, some of the tracing is not all that accurate.  But the trace
message itself contains the information needed to work out why the
loop was exited, so perhaps something like 'EarlyExit'?

							Thanx, Paul

------------------------------------------------------------------------

commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue May 15 11:53:41 2018 -0700

    rcu: Make rcu_start_this_gp() check for grace period already started
    
    In the old days of ->gpnum and ->completed, the code requesting a new
    grace period checked to see if that grace period had already started,
    bailing early if so.  The new-age ->gp_seq approach instead checks
    whether the grace period has already finished.  A compensating change
    pushed the requested grace period down to the bottom of the tree, thus
    reducing lock contention and even eliminating it in some cases.  But why
    not further reduce contention, especially on large systems, by doing both,
    especially given that the cost of doing both is extremely small?
    
    This commit therefore adds a new rcu_seq_started() function that checks
    whether a specified grace period has already started.  It then uses
    this new function in place of rcu_seq_done() in the rcu_start_this_gp()
    function's funnel locking code.
    
    Reported-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 003671825d62..1c5cbd9d7c97 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp)
 }
 
 /*
+ * Given a snapshot from rcu_seq_snap(), determine whether or not the
+ * corresponding update-side operation has started.
+ */
+static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
+{
+	return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
+}
+
+/*
  * Given a snapshot from rcu_seq_snap(), determine whether or not a
  * full update-side operation has occurred.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9e900c5926cc..ed69f49b7054 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
 		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
-		    rcu_seq_done(&rnp_root->gp_seq, c) ||
+		    rcu_seq_started(&rnp_root->gp_seq, c) ||
 		    (rnp != rnp_root &&
 		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
 			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
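
To see the difference on concrete (made-up) numbers, with RCU_SEQ_CTR_SHIFT
equal to 2:

/*
 * Worked example, illustration only:
 *
 *   ->gp_seq was 2 << 2 (idle) when the snapshot was taken, so
 *   c = rcu_seq_snap() = 3 << 2 = 12.
 *
 *   Later the needed grace period starts: ->gp_seq = (2 << 2) | 1 = 9.
 *
 *   rcu_seq_started(&gp_seq, 12):
 *       ULONG_CMP_LT((12 - 1) & ~3, 9) == ULONG_CMP_LT(8, 9)  -> true
 *   rcu_seq_done(&gp_seq, 12):
 *       ULONG_CMP_GE(9, 12)                                   -> false
 *
 * So the new check bails as soon as the needed grace period is underway,
 * which is what the old ULONG_CMP_GE(->gpnum, c) check used to catch.
 */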

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15 19:08               ` Paul E. McKenney
@ 2018-05-15 22:55                 ` Joel Fernandes
  2018-05-16 15:45                   ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15 22:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 12:08:01PM -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote:
> > On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> > > > Hi Paul,
> > > > Good morning, hope you're having a great Tuesday. I managed to find some
> > > > evening hours today to dig into this a bit more.
> > > > 
> > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > > > > Lets document how it works with an example to make it easier.
> > > > > > > > 
> > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > ---
> > > > > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > > > index 003671825d62..fc3170914ac7 100644
> > > > > > > > --- a/kernel/rcu/rcu.h
> > > > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > > > > >  }
> > > > > > > > 
> > > > > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > > > > +/*
> > > > > > > > + * Take a snapshot of the update side's sequence number.
> > > > > > > > + *
> > > > > > > > + * This function predicts what the grace period number will be the next
> > > > > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > > > > + * already in progress.
> > > > > > > 
> > > > > > > How about something like this?
> > > > > > > 
> > > > > > > 	This function returns the earliest value of the grace-period
> > > > > > > 	sequence number that will indicate that a full grace period has
> > > > > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > > > > 	number has reached this value, it will be safe to invoke all
> > > > > > > 	callbacks that have been registered prior to the current time.
> > > > > > > 	This value is the current grace-period number plus two to the
> > > > > > > 	power of the number of low-order bits reserved for state, then
> > > > > > > 	rounded up to the next value in which the state bits are all zero.
> > > > > > 
> > > > > > This makes sense too, but do you disagree with what I said?
> > > > > 
> > > > > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > > > > much all the time on a busy system, so it is only the recently queued
> > > > > ones that are guaranteed to be deferred that long.  And my experience
> > > > > indicates that someone really will get confused by that distinction,
> > > > > so I feel justified in being pedantic in this case.
> > > > 
> > > > Ok I agree, I'll include your comment above.
> > > > 
> > > > > > Also just to let you know, thanks so much for elaborately providing an
> > > > > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > > > > will take some time to trace this down and see if I can zero in on the same
> > > > > > understanding as yours.
> > > > > > 
> > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it its
> > > > > > used is 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > > > > > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > > > > > we were trying to check in that loop... that's why I felt that check wasn't
> > > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > > > > > once I trace this a bit more hopefully today :-P
> > > > > 
> > > > > If your point is that interrupts are disabled throughout, so there isn't
> > > > > much chance of the grace period completing during that time, you are
> > > > > mostly right.  The places you might not be right are the idle loop and
> > > > > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > > > > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > > > > been offloaded from that CPU.
> > > > > 
> > > > > And if you instead say that "c" is the requested final ->gp_seq value
> > > > > obtained from _snap(), the thought process might go more easily.
> > > > 
> > > > Yes I agree with c being the requested final value which is the GP for which
> > > > the callbacks will be queued. At the end of the GP c, the callbacks will have
> > > > executed.
> > > > 
> > > > About the rcu_seq_done check and why I believe its not right to use it in
> > > > that funnel locking loop, if you could allow me to try argument my point from
> > > > a different angle...
> > > > 
> > > > We agreed that the way gp_seq numbers work and are compared with each other
> > > > to identify if a GP is elapsed or not, is different from the way the previous
> > > > numbers (gp_num) were compared.
> > > > 
> > > > Most notably, before the gp_seq conversions - inorder to start a GP, we were
> > > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> > > > end.
> > > > 
> > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> > > > state bits. To mark the end, we clear the state bits and increment the gp_num
> > > > part of gp_seq.
> > > > 
> > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> > > > requests to ->gp_seq"). You did a one-to-one replacement of the ULONG_CMP_GE
> > > > with rcu_seq_done. You did so even though the gp_seq numbers work differently
> > > > from previously used numbers (gp_num and completed).
> > > > 
> > > > I would then argue that because of the differences above, a one-to-one
> > > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.
> > > > 
> > > > I argue this because, in previous code - the ULONG_CMP_GE made sense for the gp_num
> > > > way of things because, if c == gp_num, that means that :
> > > >  - c started already
> > > >  - c has finished.
> > > >  Which worked correctly, because we have nothing to do and we can bail
> > > >  without setting any flag.
> > > > 
> > > >  Where as now, with the gp_seq regime, c == gp_seq means:
> > > >  - c-1 finished   (I meant -1 subtracted from the gp_num part of c)
> > > >  This would cause us to bail without setting any flag for starting c.
> > > > 
> > > >  I did some tracing and I could never hit the rcu_seq_done check because it
> > > >  never happens in my tracing that _snap returned something for which
> > > >  rcu_seq_done returned true, so I'm not sure if this check is needed, but
> > > >  you're the expert ;)
> > > > 
> > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > >          * not be released.
> > > >          */
> > > >         raw_lockdep_assert_held_rcu_node(rnp);
> > > > +       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
> > > > +       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
> > > >         trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > >         for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > >                 if (rnp_root != rnp)
> > > >                         raw_spin_lock_rcu_node(rnp_root);
> > > > -               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
> > > > -                                         need_future_gp_mask(), c));
> > > >                 if (need_future_gp_element(rnp_root, c) ||
> > > > -                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > +                   rcu_seq_done(&rnp_root->gp_seq, c) ||
> > > > 
> > > >                      ^^^^
> > > > 		     A direct replacement of ULONG_CMP_GE is bit weird?  It
> > > > 		     means we bail out if c-1 completed, and we don't set any
> > > > 		     flag for starting c. That could result in the clean up
> > > > 		     never starting c?
> > > 
> > > Ah, I see what you are getting at now.
> > > 
> > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check
> > > if GP already requested") is to push the request down to the leaves of
> > > the tree and to the rcu_data structure.  Once that commit is in place,
> > > the check for the grace period already being in progress isn't all
> > > that helpful, though I suppose that it could be added.  One way to
> > > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with
> > > ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems
> > > a bit baroque to me.
> > > 
> > > The point of the rcu_seq_done() is to catch long delays, but given the
> > > current implementation, the fact that interrupts are disabled across
> > > all calls should prevent the rcu_seq_done() from ever returning true.
> > > (Famous last words!)  So, yes, it could be removed, in theory, at least.
> > > At least until the real-time guys force me to come up with a way to
> > > run this code with interrupts enabled (hopefully never!).
> > > 
> > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and
> > > leave it that way for an extended period of testing.  Yes, I am paranoid.
> > > Why do you ask?  ;-)
> > :-D
> > 
> > Ah I see what you're doing in that commit where you're moving the furthest
> > request down to the leaves, so that would protect against the scenario I was
> > describing and set the gp_seq_needed of the leaf.
> 
> But I came up with a less baroque check for a grace period having started,
> at which point the question becomes "Why not just do both?", especially
> since a check for a grace period having started is satisfied by that
> grace period's having completed, which means minimal added overhead.
> Perhaps no added overhead for some compilers and architectures.
> 
> Please see the end of this email for a prototype patch.
> 
> > The code would be correct then, but one issue is it would shout out the
> > 'Prestarted' tracepoint for 'c' when that's not really true..
> > 
> >                rcu_seq_done(&rnp_root->gp_seq, c)
> > 
> > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c)
> > 
> > which translates to the fact that c-1 completed.
> > 
> > So in this case if rcu_seq_done returns true, then saying that c has been
> > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something
> > since what we really are doing is just marking the leaf as you mentioned in
> > the unlock_out part for a future start.
> 
> Indeed, some of the tracing is not all that accurate.  But the trace
> message itself contains the information needed to work out why the
> loop was exited, so perhaps something like 'EarlyExit'?

I think since you're now using rcu_seq_started to determine whether c has
started or has since completed, the current 'Prestarted' trace will cover it.

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue May 15 11:53:41 2018 -0700
> 
>     rcu: Make rcu_start_this_gp() check for grace period already started
>     
>     In the old days of ->gpnum and ->completed, the code requesting a new
>     grace period checked to see if that grace period had already started,
>     bailing early if so.  The new-age ->gp_seq approach instead checks
>     whether the grace period has already finished.  A compensating change
>     pushed the requested grace period down to the bottom of the tree, thus
>     reducing lock contention and even eliminating it in some cases.  But why
>     not further reduce contention, especially on large systems, by doing both,
>     especially given that the cost of doing both is extremely small?
>     
>     This commit therefore adds a new rcu_seq_started() function that checks
>     whether a specified grace period has already started.  It then uses
>     this new function in place of rcu_seq_done() in the rcu_start_this_gp()
>     function's funnel locking code.
>     
>     Reported-by: Joel Fernandes <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 003671825d62..1c5cbd9d7c97 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp)
>  }
>  
>  /*
> + * Given a snapshot from rcu_seq_snap(), determine whether or not the
> + * corresponding update-side operation has started.
> + */
> +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
> +{
> +	return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
> +}
> +
> +/*
>   * Given a snapshot from rcu_seq_snap(), determine whether or not a
>   * full update-side operation has occurred.
>   */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9e900c5926cc..ed69f49b7054 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  		if (rnp_root != rnp)
>  			raw_spin_lock_rcu_node(rnp_root);
>  		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
> -		    rcu_seq_done(&rnp_root->gp_seq, c) ||
> +		    rcu_seq_started(&rnp_root->gp_seq, c) ||

Yes, this does exactly what I wanted, thanks! I think this puts our
discussion about this to rest :-)

By the way, I started beautifying this loop last week as shown below, with
code comments.  I felt it would make the loop easier to parse in the future
for whoever is reading it.  Are you interested in such a patch? If not, let
me know and I'll drop this and focus on the other changes you requested.

Something like...  (just an example, actual code would be different)

       for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
               int prestarted = 0;

               /* Acquire the lock unless this is the starting (leaf) node. */
               if (rnp_node != rnp)
                       raw_spin_lock_rcu_node(rnp_node);

               /* Has the requested GP already been recorded as a future need? */
               if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start))
                       prestarted = 1;

               /* Has the requested GP already completed? */
               if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start))
                       prestarted = 1;

               ... etc...

               if (prestarted) {
                       trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
                                         TPS("Prestarted"));
                       goto unlock_out;
               }
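
(In the sketch above, rcu_seq_completed() is just a stand-in name I'm using
for the existing rcu_seq_done() check, i.e. roughly:)

/* Stand-in used above; today this is rcu_seq_done(). */
static inline bool rcu_seq_completed(unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(READ_ONCE(*sp), s);
}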

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-15  3:46       ` Paul E. McKenney
@ 2018-05-15 23:04         ` Joel Fernandes
  2018-05-16 15:48           ` Paul E. McKenney
  0 siblings, 1 reply; 42+ messages in thread
From: Joel Fernandes @ 2018-05-15 23:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote:
> > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote:
> > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> > > > In recent discussion [1], the check for whether a leaf believes RCU is
> > > > not idle, is being added back to funnel locking code, to avoid more
> > > > locking. In this we are marking the leaf node for a future grace-period
> > > > and bailing out since a GP is currently in progress. However the
> > > > tracepoint is missing. Lets add it back.
> > > > 
> > > > Also add a small comment about why we do this check (basically the point
> > > > is to avoid locking intermediate nodes unnecessarily) and clarify the
> > > > comments in the trace event header now that we are doing traversal of
> > > > one or more intermediate nodes.
> > > > 
> > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > Looks like a good idea, but it does not apply -- which is not a surprise,
> > > given the change rate in this code.  I hand-applied as a modification
> > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
> > > with attribution, but with the changes below.  Please let me know if I
> > > am missing something.
> > > 
> > > Ah, I see -- this commit depends on your earlier name-change commit.
> > > I therefore made this patch use the old names.
> > 
> > Ok, I'll check your new tree and rebase.
> 
> Sounds good!
> 
> > > > ---
> > > >  include/trace/events/rcu.h |  4 ++--
> > > >  kernel/rcu/tree.c          | 11 ++++++++++-
> > > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > index 539900a9f8c7..dc0bd11739c7 100644
> > > > --- a/include/trace/events/rcu.h
> > > > +++ b/include/trace/events/rcu.h
> > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
> > > >   *
> > > >   * "Startleaf": Request a grace period based on leaf-node data.
> > > >   * "Prestarted": Someone beat us to the request
> > > > - * "Startedleaf": Leaf-node start proved sufficient.
> > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
> > > 
> > > Actually, we only get to that trace if all we did was mark the leaf
> > > node, right?
> > 
> > I didn't think so. In the code we are doing the check for rnp every time we
> > walk up the tree. So even when we are on an intermediate node, we do the
> > check of the node we started with. I thought that's what you wanted to do. It
> > makes sense to me to do so too.
> 
> If we are not on the initial (usually leaf) node, then the similar check
> in the previous "if" statement would have sent us to unlock_out, right?
> 
> (And yes, I should have said "mark the initial node" above.)

I may have missed this, sorry.

Yes, that would be true unless the check becomes true not on the first
iteration but on a later one (i.e. another path started the initially
idle GP). That's why I changed it to "one or more non-root nodes
marked".

What do you think?

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-15 22:55                 ` Joel Fernandes
@ 2018-05-16 15:45                   ` Paul E. McKenney
  2018-05-16 23:21                     ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-16 15:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 03:55:09PM -0700, Joel Fernandes wrote:
> On Tue, May 15, 2018 at 12:08:01PM -0700, Paul E. McKenney wrote:
> > On Tue, May 15, 2018 at 11:41:15AM -0700, Joel Fernandes wrote:
> > > On Tue, May 15, 2018 at 05:55:07AM -0700, Paul E. McKenney wrote:
> > > > On Tue, May 15, 2018 at 12:02:43AM -0700, Joel Fernandes wrote:
> > > > > Hi Paul,
> > > > > Good morning, hope you're having a great Tuesday. I managed to find some
> > > > > evening hours today to dig into this a bit more.
> > > > > 
> > > > > On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > > > > > > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > > > > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > > > > > > Lets document how it works with an example to make it easier.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > ---
> > > > > > > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > > > > > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > > > > index 003671825d62..fc3170914ac7 100644
> > > > > > > > > --- a/kernel/rcu/rcu.h
> > > > > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > > > > > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > > -/* Take a snapshot of the update side's sequence number. */
> > > > > > > > > +/*
> > > > > > > > > + * Take a snapshot of the update side's sequence number.
> > > > > > > > > + *
> > > > > > > > > + * This function predicts what the grace period number will be the next
> > > > > > > > > + * time an RCU callback will be executed, given the current grace period's
> > > > > > > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > > > > > > + * already in progress.
> > > > > > > > 
> > > > > > > > How about something like this?
> > > > > > > > 
> > > > > > > > 	This function returns the earliest value of the grace-period
> > > > > > > > 	sequence number that will indicate that a full grace period has
> > > > > > > > 	elapsed since the current time.  Once the grace-period sequence
> > > > > > > > 	number has reached this value, it will be safe to invoke all
> > > > > > > > 	callbacks that have been registered prior to the current time.
> > > > > > > > 	This value is the current grace-period number plus two to the
> > > > > > > > 	power of the number of low-order bits reserved for state, then
> > > > > > > > 	rounded up to the next value in which the state bits are all zero.
> > > > > > > 
> > > > > > > This makes sense too, but do you disagree with what I said?
> > > > > > 
> > > > > > In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> > > > > > much all the time on a busy system, so it is only the recently queued
> > > > > > ones that are guaranteed to be deferred that long.  And my experience
> > > > > > indicates that someone really will get confused by that distinction,
> > > > > > so I feel justified in being pedantic in this case.
> > > > > 
> > > > > Ok I agree, I'll include your comment above.
> > > > > 
> > > > > > > Also just to let you know, thanks so much for elaborately providing an
> > > > > > > example on the other thread where we are discussing the rcu_seq_done check. I
> > > > > > > will take some time to trace this down and see if I can zero in on the same
> > > > > > > understanding as yours.
> > > > > > > 
> > > > > > > I get why we use rcu_seq_snap there in rcu_start_this_gp but the way it is
> > > > > > > used is: 'c' is the requested GP obtained from _snap, and we are comparing that with the existing
> > > > > > > rnp->gp_seq in rcu_seq_done.  When that rnp->gp_seq reaches 'c', it only
> > > > > > > means rnp->gp_seq is done, it doesn't tell us if 'c' is done which is what
> > > > > > > we were trying to check in that loop... that's why I felt that check wasn't
> > > > > > > correct - that's my (most likely wrong) take on the matter, and I'll get back
> > > > > > > once I trace this a bit more hopefully today :-P
> > > > > > 
> > > > > > If your point is that interrupts are disabled throughout, so there isn't
> > > > > > much chance of the grace period completing during that time, you are
> > > > > > mostly right.  The places you might not be right are the idle loop and
> > > > > > offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> > > > > > offline CPUs, but IIRC it is just fine in the case where callbacks have
> > > > > > been offloaded from that CPU.
> > > > > > 
> > > > > > And if you instead say that "c" is the requested final ->gp_seq value
> > > > > > obtained from _snap(), the thought process might go more easily.
> > > > > 
> > > > > Yes I agree with c being the requested final value which is the GP for which
> > > > > the callbacks will be queued. At the end of the GP c, the callbacks will have
> > > > > executed.
> > > > > 
> > > > > About the rcu_seq_done check and why I believe it's not right to use it in
> > > > > that funnel locking loop, if you could allow me to try to argue my point from
> > > > > a different angle...
> > > > > 
> > > > > We agreed that the way gp_seq numbers work and are compared with each other
> > > > > to identify whether a GP has elapsed is different from the way the previous
> > > > > numbers (gp_num) were compared.
> > > > > 
> > > > > Most notably, before the gp_seq conversions - in order to start a GP, we were
> > > > > doing gp_num += 1, and completed had to catch up to gp_num + 1 to mark the
> > > > > end.
> > > > > 
> > > > > Now with gp_seq, for a gp to start, we don't do the "+1", we just set the
> > > > > state bits. To mark the end, we clear the state bits and increment the gp_num
> > > > > part of gp_seq.
> > > > > 
> > > > > However, in the below commit 12d6c129fd0a ("rcu: Convert grace-period
> > > > > requests to ->gp_seq"), you did a one-to-one replacement of ULONG_CMP_GE
> > > > > with rcu_seq_done. You did so even though the gp_seq numbers work differently
> > > > > from previously used numbers (gp_num and completed).
> > > > > 
> > > > > I would then argue that because of the differences above, a one-to-one
> > > > > replacement of the ULONG_CMP_GE with the rcu_seq_done wouldn't make sense.
> > > > > 
> > > > > I argue this because, in the previous code, ULONG_CMP_GE made sense for the gp_num
> > > > > way of things: if c == gp_num, that means that:
> > > > >  - c started already
> > > > >  - c has finished.
> > > > >  Which worked correctly, because we have nothing to do and we can bail
> > > > >  without setting any flag.
> > > > > 
> > > > >  Whereas now, with the gp_seq regime, c == gp_seq means:
> > > > >  - c-1 finished   (I meant -1 subtracted from the gp_num part of c)
> > > > >  This would cause us to bail without setting any flag for starting c.
> > > > > 
> > > > >  I did some tracing and could never hit the rcu_seq_done check: in my traces,
> > > > >  _snap never returned something for which rcu_seq_done returned true. So I'm
> > > > >  not sure this check is needed, but you're the expert ;)
> > > > > 
> > > > > @@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > > > >          * not be released.
> > > > >          */
> > > > >         raw_lockdep_assert_held_rcu_node(rnp);
> > > > > +       WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
> > > > > +       WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
> > > > >         trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
> > > > >         for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> > > > >                 if (rnp_root != rnp)
> > > > >                         raw_spin_lock_rcu_node(rnp_root);
> > > > > -               WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
> > > > > -                                         need_future_gp_mask(), c));
> > > > >                 if (need_future_gp_element(rnp_root, c) ||
> > > > > -                   ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > +                   rcu_seq_done(&rnp_root->gp_seq, c) ||
> > > > > 
> > > > >                      ^^^^
> > > > > 		     A direct replacement of ULONG_CMP_GE is a bit weird?  It
> > > > > 		     means we bail out if c-1 completed, and we don't set any
> > > > > 		     flag for starting c. That could result in the clean up
> > > > > 		     never starting c?
> > > > 
> > > > Ah, I see what you are getting at now.
> > > > 
> > > > What I do instead in 334dac2da529 ("rcu: Make rcu_nocb_wait_gp() check
> > > > if GP already requested") is to push the request down to the leaves of
> > > > the tree and to the rcu_data structure.  Once that commit is in place,
> > > > the check for the grace period already being in progress isn't all
> > > > that helpful, though I suppose that it could be added.  One way to
> > > > do that would be to replace "rcu_seq_done(&rnp_root->gp_seq, c)" with
> > > > "ULONG_CMP_GE(rnp_root->gpnum, (c - RCU_SEQ_STATE_MASK))", but that seems
> > > > a bit baroque to me.
> > > > 
> > > > The point of the rcu_seq_done() is to catch long delays, but given the
> > > > current implementation, the fact that interrupts are disabled across
> > > > all calls should prevent the rcu_seq_done() from ever returning true.
> > > > (Famous last words!)  So, yes, it could be removed, in theory, at least.
> > > > At least until the real-time guys force me to come up with a way to
> > > > run this code with interrupts enabled (hopefully never!).
> > > > 
> > > > If I were to do that, I would first wrap it with a WARN_ON_ONCE() and
> > > > leave it that way for an extended period of testing.  Yes, I am paranoid.
> > > > Why do you ask?  ;-)
> > > :-D
> > > 
> > > Ah I see what you're doing in that commit where you're moving the furthest
> > > request down to the leaves, so that would protect against the scenario I was
> > > describing and set the gp_seq_needed of the leaf.
> > 
> > But I came up with a less baroque check for a grace period having started,
> > at which point the question becomes "Why not just do both?", especially
> > since a check for a grace period having started is satisfied by that
> > grace period's having completed, which means minimal added overhead.
> > Perhaps no added overhead for some compilers and architectures.
> > 
> > Please see the end of this email for a prototype patch.
> > 
> > > The code would be correct then, but one issue is it would shout out the
> > > 'Prestarted' tracepoint for 'c' when that's not really true..
> > > 
> > >                rcu_seq_done(&rnp_root->gp_seq, c)
> > > 
> > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c)
> > > 
> > > which translates to the fact that c-1 completed.
> > > 
> > > So in this case if rcu_seq_done returns true, then saying that c has been
> > > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something
> > > since what we really are doing is just marking the leaf as you mentioned in
> > > the unlock_out part for a future start.
> > 
> > Indeed, some of the tracing is not all that accurate.  But the trace
> > message itself contains the information needed to work out why the
> > loop was exited, so perhaps something like 'EarlyExit'?
> 
> I think since you're now using rcu_seq_started to determine whether c has started
> or has since completed, the current 'Prestarted' trace will cover it.

"My work is done!"  ;-)

> > ------------------------------------------------------------------------
> > 
> > commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue May 15 11:53:41 2018 -0700
> > 
> >     rcu: Make rcu_start_this_gp() check for grace period already started
> >     
> >     In the old days of ->gpnum and ->completed, the code requesting a new
> >     grace period checked to see if that grace period had already started,
> >     bailing early if so.  The new-age ->gp_seq approach instead checks
> >     whether the grace period has already finished.  A compensating change
> >     pushed the requested grace period down to the bottom of the tree, thus
> >     reducing lock contention and even eliminating it in some cases.  But why
> >     not further reduce contention, especially on large systems, by doing both,
> >     especially given that the cost of doing both is extremely small?
> >     
> >     This commit therefore adds a new rcu_seq_started() function that checks
> >     whether a specified grace period has already started.  It then uses
> >     this new function in place of rcu_seq_done() in the rcu_start_this_gp()
> >     function's funnel locking code.
> >     
> >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 003671825d62..1c5cbd9d7c97 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp)
> >  }
> >  
> >  /*
> > + * Given a snapshot from rcu_seq_snap(), determine whether or not the
> > + * corresponding update-side operation has started.
> > + */
> > +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
> > +{
> > +	return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
> > +}
> > +
> > +/*
> >   * Given a snapshot from rcu_seq_snap(), determine whether or not a
> >   * full update-side operation has occurred.
> >   */
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 9e900c5926cc..ed69f49b7054 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> >  		if (rnp_root != rnp)
> >  			raw_spin_lock_rcu_node(rnp_root);
> >  		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
> > -		    rcu_seq_done(&rnp_root->gp_seq, c) ||
> > +		    rcu_seq_started(&rnp_root->gp_seq, c) ||
> 
> Yes, this does exactly what I was wanting, thanks! I think this puts our
> discussion about this to rest :-)
> 
> By the way I was starting to beautify this loop like below last week, with
> code comments.  I felt it would be easier to parse this loop in the future
> for whoever was reading it.  Are you interested in such a patch? If not, let
> me know and I'll drop this and focus on the other changes you requested.
> 
> Something like...  (just an example, actual code would be different)
> 
>        for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
>                int prestarted = 0;
> 
>                /* Acquire lock if non-leaf node */
>                if (rnp_node != rnp)
>                        raw_spin_lock_rcu_node(rnp_node);
> 
>                /* Has the GP asked been recorded as a future need */
>                if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start))
>                        prestarted = 1;
> 
>                /* Has the GP requested for already been completed */
>                if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start))
>                        prestarted = 1;
> 
> 		... etc...
> 	       if (prestarted) {
>                        trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
>                                          TPS("Prestarted"));
>                        goto unlock_out;
> 		}

At the moment, I don't believe that the extra lines of code pay for
themselves, but I do agree that this loop is a bit more obtuse than I
would like.
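
On the rcu_seq_done()-versus-rcu_seq_started() point above, in case a concrete
example helps anyone following along: below is a minimal stand-alone user-space
model of the two checks.  It uses a simplified single-state-bit layout rather
than the kernel's real RCU_SEQ_STATE_MASK and ULONG_CMP_*() macros, and the
helper names are local to the sketch, so treat the constants as illustrative
only.

#include <stdio.h>

/* Toy layout: grace-period counter in the upper bits, one low-order state bit. */
#define SEQ_STATE_MASK	0x1UL

/* Wrap-safe "a >= b", standing in for the kernel's ULONG_CMP_GE(). */
static int cmp_ge(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Snapshot: the seq value at which a full grace period will have elapsed. */
static unsigned long seq_snap(unsigned long sp)
{
	return (sp + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* True once the grace period needed by snapshot s has completed. */
static int seq_done(unsigned long sp, unsigned long s)
{
	return cmp_ge(sp, s);
}

/* True once the grace period needed by snapshot s has started. */
static int seq_started(unsigned long sp, unsigned long s)
{
	return !cmp_ge((s - 1) & ~SEQ_STATE_MASK, sp);
}

int main(void)
{
	unsigned long c = seq_snap(0x7UL);	/* snapshot taken with a GP in flight */

	printf("snapshot c = %#lx\n", c);	/* 0xa with this toy layout */
	printf("sp=0x7 (that GP still running): started=%d done=%d\n",
	       seq_started(0x7UL, c), seq_done(0x7UL, c));	/* 0 0 */
	printf("sp=0x8 (that GP ended):         started=%d done=%d\n",
	       seq_started(0x8UL, c), seq_done(0x8UL, c));	/* 0 0 */
	printf("sp=0x9 (needed GP running):     started=%d done=%d\n",
	       seq_started(0x9UL, c), seq_done(0x9UL, c));	/* 1 0 */
	printf("sp=0xa (needed GP ended):       started=%d done=%d\n",
	       seq_started(0xaUL, c), seq_done(0xaUL, c));	/* 1 1 */
	return 0;
}

The sp=0x9 row is the interesting one: the needed grace period is already in
flight, so a started-style check lets the funnel loop bail without taking any
further locks, which a done-only check would not catch.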

							Thanx, Paul

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-15 23:04         ` Joel Fernandes
@ 2018-05-16 15:48           ` Paul E. McKenney
  2018-05-16 23:13             ` Joel Fernandes
  0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2018-05-16 15:48 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Tue, May 15, 2018 at 04:04:30PM -0700, Joel Fernandes wrote:
> On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote:
> > On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote:
> > > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote:
> > > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> > > > > > In recent discussion [1], the check for whether a leaf believes RCU is
> > > > > > not idle is being added back to the funnel locking code, to avoid more
> > > > > > locking. In this we are marking the leaf node for a future grace period
> > > > > > and bailing out since a GP is currently in progress. However, the
> > > > > > tracepoint is missing. Let's add it back.
> > > > > 
> > > > > Also add a small comment about why we do this check (basically the point
> > > > > is to avoid locking intermediate nodes unnecessarily) and clarify the
> > > > > comments in the trace event header now that we are doing traversal of
> > > > > one or more intermediate nodes.
> > > > > 
> > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > 
> > > > Looks like a good idea, but it does not apply -- which is not a surprise,
> > > > given the change rate in this code.  I hand-applied as a modification
> > > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
> > > > with attribution, but with the changes below.  Please let me know if I
> > > > am missing something.
> > > > 
> > > > Ah, I see -- this commit depends on your earlier name-change commit.
> > > > I therefore made this patch use the old names.
> > > 
> > > Ok, I'll check your new tree and rebase.
> > 
> > Sounds good!
> > 
> > > > > ---
> > > > >  include/trace/events/rcu.h |  4 ++--
> > > > >  kernel/rcu/tree.c          | 11 ++++++++++-
> > > > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > > index 539900a9f8c7..dc0bd11739c7 100644
> > > > > --- a/include/trace/events/rcu.h
> > > > > +++ b/include/trace/events/rcu.h
> > > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
> > > > >   *
> > > > >   * "Startleaf": Request a grace period based on leaf-node data.
> > > > >   * "Prestarted": Someone beat us to the request
> > > > > - * "Startedleaf": Leaf-node start proved sufficient.
> > > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> > > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
> > > > 
> > > > Actually, we only get to that trace if all we did was mark the leaf
> > > > node, right?
> > > 
> > > I didn't think so. In the code we are doing the check for rnp every time we
> > > walk up the tree. So even when we are on an intermediate node, we do the
> > > check of the node we started with. I thought that's what you wanted to do. It
> > > makes sense to me to do so too.
> > 
> > If we are not on the initial (usually leaf) node, then the similar check
> > in the previous "if" statement would have sent us to unlock_out, right?
> > 
> > (And yes, I should have said "mark the initial node" above.)
> 
> I may have missed this, sorry. 
> 
> Yes, that would be true unless the check could be true not at the first
> iteration, but after the first iteration? (i.e. another path started the
> initially idle GP). That's why I changed it to "one or more non-root nodes
> marked".

After the first iteration, the check after setting ->gp_seq_needed is
dead code.  If that check would have succeeded, the same check in the
big "if" statement would have taken the early exit.
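
In case a sketch helps, here is a stand-alone toy model of that walk.  The node
layout, helper names, and constants are invented stand-ins, and locking is
omitted entirely, so this is only an approximation of the rcu_start_this_gp()
structure being discussed, not the tree.c code itself:

#include <stdio.h>

struct node {
	struct node *parent;
	unsigned long gp_seq;		/* low bit set: a GP is in progress */
	unsigned long gp_seq_needed;
	const char *name;
};

/* Wrap-safe "a >= b", standing in for ULONG_CMP_GE(). */
static int cmp_ge(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static int seq_state(unsigned long s)
{
	return s & 0x1;			/* is the state bit set? */
}

static void start_this_gp(struct node *rnp, unsigned long c)
{
	struct node *rnp_node;

	for (rnp_node = rnp; rnp_node; rnp_node = rnp_node->parent) {
		/* Big "if": request already recorded at this level, or the
		 * initial node already shows a GP in progress?  On any pass
		 * after the first, this clause subsumes the post-assignment
		 * check below, which is why that check is dead code there. */
		if (cmp_ge(rnp_node->gp_seq_needed, c) ||
		    (rnp_node != rnp && seq_state(rnp->gp_seq))) {
			printf("%s: Prestarted, early exit\n", rnp_node->name);
			return;
		}
		rnp_node->gp_seq_needed = c;	/* record the request */
		/* Only reachable with the state bit set on the first pass. */
		if (seq_state(rnp->gp_seq)) {
			printf("%s: Startedleaf, early exit\n", rnp_node->name);
			return;
		}
	}
	printf("reached the root, would start GP %#lx\n", c);
}

int main(void)
{
	struct node root = { NULL,  0x5, 0x4, "root" };
	struct node mid  = { &root, 0x5, 0x4, "mid"  };
	struct node leaf = { &mid,  0x5, 0x4, "leaf" };

	/* A GP is in progress, so only the initial (leaf) node gets marked. */
	start_this_gp(&leaf, 0x8);
	printf("gp_seq_needed: leaf=%#lx mid=%#lx root=%#lx\n",
	       leaf.gp_seq_needed, mid.gp_seq_needed, root.gp_seq_needed);
	return 0;
}

With the state bit clear on the initial node, the same walk would mark every
level up to the root before starting the grace period.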

							Thanx, Paul

> What do you think?
> 
> thanks,
> 
> - Joel
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint
  2018-05-16 15:48           ` Paul E. McKenney
@ 2018-05-16 23:13             ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-16 23:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Wed, May 16, 2018 at 08:48:29AM -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 04:04:30PM -0700, Joel Fernandes wrote:
> > On Mon, May 14, 2018 at 08:46:03PM -0700, Paul E. McKenney wrote:
> > > On Mon, May 14, 2018 at 05:57:09PM -0700, Joel Fernandes wrote:
> > > > On Mon, May 14, 2018 at 11:38:23AM -0700, Paul E. McKenney wrote:
> > > > > On Sun, May 13, 2018 at 08:15:39PM -0700, Joel Fernandes (Google) wrote:
> > > > > > > In recent discussion [1], the check for whether a leaf believes RCU is
> > > > > > > not idle is being added back to the funnel locking code, to avoid more
> > > > > > > locking. In this we are marking the leaf node for a future grace period
> > > > > > > and bailing out since a GP is currently in progress. However, the
> > > > > > > tracepoint is missing. Let's add it back.
> > > > > > 
> > > > > > Also add a small comment about why we do this check (basically the point
> > > > > > is to avoid locking intermediate nodes unnecessarily) and clarify the
> > > > > > comments in the trace event header now that we are doing traversal of
> > > > > > one or more intermediate nodes.
> > > > > > 
> > > > > > [1] http://lkml.kernel.org/r/20180513190906.GL26088@linux.vnet.ibm.com
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > 
> > > > > Looks like a good idea, but it does not apply -- which is not a surprise,
> > > > > given the change rate in this code.  I hand-applied as a modification
> > > > > to c1b3f9fce26f ("rcu: Don't funnel-lock above leaf node if GP in progress")
> > > > > with attribution, but with the changes below.  Please let me know if I
> > > > > am missing something.
> > > > > 
> > > > > Ah, I see -- this commit depends on your earlier name-change commit.
> > > > > I therefore made this patch use the old names.
> > > > 
> > > > Ok, I'll check your new tree and rebase.
> > > 
> > > Sounds good!
> > > 
> > > > > > ---
> > > > > >  include/trace/events/rcu.h |  4 ++--
> > > > > >  kernel/rcu/tree.c          | 11 ++++++++++-
> > > > > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > > > index 539900a9f8c7..dc0bd11739c7 100644
> > > > > > --- a/include/trace/events/rcu.h
> > > > > > +++ b/include/trace/events/rcu.h
> > > > > > @@ -91,8 +91,8 @@ TRACE_EVENT(rcu_grace_period,
> > > > > >   *
> > > > > >   * "Startleaf": Request a grace period based on leaf-node data.
> > > > > >   * "Prestarted": Someone beat us to the request
> > > > > > - * "Startedleaf": Leaf-node start proved sufficient.
> > > > > > - * "Startedleafroot": Leaf-node start proved sufficient after checking root.
> > > > > > + * "Startedleaf": Leaf and one or more non-root nodes marked for future start.
> > > > > 
> > > > > Actually, we only get to that trace if all we did was mark the leaf
> > > > > node, right?
> > > > 
> > > > I didn't think so. In the code we are doing the check for rnp every time we
> > > > walk up the tree. So even when we are on an intermediate node, we do the
> > > > check of the node we started with. I thought that's what you wanted to do. It
> > > > makes sense to me to do so too.
> > > 
> > > If we are not on the initial (usually leaf) node, then the similar check
> > > in the previous "if" statement would have sent us to unlock_out, right?
> > > 
> > > (And yes, I should have said "mark the initial node" above.)
> > 
> > I may have missed this, sorry. 
> > 
> > Yes, that would be true unless the check could be true not at the first
> > iteration, but after the first iteration? (i.e. another path started the
> > initially idle GP). That's why I changed it to "one or more non-root nodes
> > marked".
> 
> After the first iteration, the check after setting ->gp_seq_needed is
> dead code.  If that check would have succeeded, the same check in the
> big "if" statement would have taken the early exit.

Oh yes, of course!! I understand it now. thanks,

 - Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-16 15:45                   ` Paul E. McKenney
@ 2018-05-16 23:21                     ` Joel Fernandes
  0 siblings, 0 replies; 42+ messages in thread
From: Joel Fernandes @ 2018-05-16 23:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, byungchul.park, kernel-team

On Wed, May 16, 2018 at 08:45:43AM -0700, Paul E. McKenney wrote:
[...]
> > > > The code would be correct then, but one issue is it would shout out the
> > > > 'Prestarted' tracepoint for 'c' when that's not really true..
> > > > 
> > > >                rcu_seq_done(&rnp_root->gp_seq, c)
> > > > 
> > > > translates to ULONG_CMP_GE(&rnp_root->gp_seq, c)
> > > > 
> > > > which translates to the fact that c-1 completed.
> > > > 
> > > > So in this case if rcu_seq_done returns true, then saying that c has been
> > > > 'Prestarted' seems a bit off to me. It should be 'Startedleaf' or something
> > > > since what we really are doing is just marking the leaf as you mentioned in
> > > > the unlock_out part for a future start.
> > > 
> > > Indeed, some of the tracing is not all that accurate.  But the trace
> > > message itself contains the information needed to work out why the
> > > loop was exited, so perhaps something like 'EarlyExit'?
> > 
> > I think since you're now using rcu_seq_started to determine whether c has started
> > or has since completed, the current 'Prestarted' trace will cover it.
> 
> "My work is done!"  ;-)

:-D It's cool how a conversation can turn into a code improvement.

> > > ------------------------------------------------------------------------
> > > 
> > > commit 59a4f38edcffbef1521852fe3b26ed4ed85af16e
> > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Date:   Tue May 15 11:53:41 2018 -0700
> > > 
> > >     rcu: Make rcu_start_this_gp() check for grace period already started
> > >     
> > >     In the old days of ->gpnum and ->completed, the code requesting a new
> > >     grace period checked to see if that grace period had already started,
> > >     bailing early if so.  The new-age ->gp_seq approach instead checks
> > >     whether the grace period has already finished.  A compensating change
> > >     pushed the requested grace period down to the bottom of the tree, thus
> > >     reducing lock contention and even eliminating it in some cases.  But why
> > >     not further reduce contention, especially on large systems, by doing both,
> > >     especially given that the cost of doing both is extremely small?
> > >     
> > >     This commit therefore adds a new rcu_seq_started() function that checks
> > >     whether a specified grace period has already started.  It then uses
> > >     this new function in place of rcu_seq_done() in the rcu_start_this_gp()
> > >     function's funnel locking code.
> > >     
> > >     Reported-by: Joel Fernandes <joel@joelfernandes.org>
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index 003671825d62..1c5cbd9d7c97 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -108,6 +108,15 @@ static inline unsigned long rcu_seq_current(unsigned long *sp)
> > >  }
> > >  
> > >  /*
> > > + * Given a snapshot from rcu_seq_snap(), determine whether or not the
> > > + * corresponding update-side operation has started.
> > > + */
> > > +static inline bool rcu_seq_started(unsigned long *sp, unsigned long s)
> > > +{
> > > +	return ULONG_CMP_LT((s - 1) & ~RCU_SEQ_STATE_MASK, READ_ONCE(*sp));
> > > +}
> > > +
> > > +/*
> > >   * Given a snapshot from rcu_seq_snap(), determine whether or not a
> > >   * full update-side operation has occurred.
> > >   */
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 9e900c5926cc..ed69f49b7054 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1580,7 +1580,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> > >  		if (rnp_root != rnp)
> > >  			raw_spin_lock_rcu_node(rnp_root);
> > >  		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, c) ||
> > > -		    rcu_seq_done(&rnp_root->gp_seq, c) ||
> > > +		    rcu_seq_started(&rnp_root->gp_seq, c) ||
> > 
> > Yes, this does exactly what I was wanting, thanks! I think this puts our
> > discussion about this to rest :-)
> > 
> > By the way I was starting to beautify this loop like below last week, with
> > code comments.  I felt it would be easier to parse this loop in the future
> > for whoever was reading it.  Are you interested in such a patch? If not, let
> > me know and I'll drop this and focus on the other changes you requested.
> > 
> > Something like...  (just an example, actual code would be different)
> > 
> >        for (rnp_node = rnp; 1; rnp_node = rnp_node->parent) {
> >                int prestarted = 0;
> > 
> >                /* Acquire lock if non-leaf node */
> >                if (rnp_node != rnp)
> >                        raw_spin_lock_rcu_node(rnp_node);
> > 
> >                /* Has the GP asked been recorded as a future need */
> >                if (ULONG_CMP_GE(rnp_node->gp_seq_needed, gp_seq_start))
> >                        prestarted = 1;
> > 
> >                /* Has the GP requested for already been completed */
> >                if (!prestarted && rcu_seq_completed(&rnp_node->gp_seq, gp_seq_start))
> >                        prestarted = 1;
> > 
> > 		... etc...
> > 	       if (prestarted) {
> >                        trace_rcu_this_gp(rnp_node, rdp, gp_seq_start,
> >                                          TPS("Prestarted"));
> >                        goto unlock_out;
> > 		}
> 
> At the moment, I don't believe that the extra lines of code pay for
> themselves, but I do agree that this loop is a bit more obtuse than I
> would like.

Yeah I was also thinking the same. I'm glad I checked, thanks for the
feedback. I'll focus on the other comments then.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-05-16 23:21 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-14  3:15 [PATCH RFC 0/8] rcu fixes, clean ups for rcu/dev Joel Fernandes (Google)
2018-05-14  3:15 ` [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes (Google)
2018-05-14  3:47   ` Randy Dunlap
2018-05-14  5:05     ` Joel Fernandes
2018-05-14 17:38   ` Paul E. McKenney
2018-05-15  1:51     ` Joel Fernandes
2018-05-15  3:59       ` Paul E. McKenney
2018-05-15  7:02         ` Joel Fernandes
2018-05-15 12:55           ` Paul E. McKenney
2018-05-15 18:41             ` Joel Fernandes
2018-05-15 19:08               ` Paul E. McKenney
2018-05-15 22:55                 ` Joel Fernandes
2018-05-16 15:45                   ` Paul E. McKenney
2018-05-16 23:21                     ` Joel Fernandes
2018-05-14  3:15 ` [PATCH RFC 2/8] rcu: Clarify usage of cond_resched for tasks-RCU Joel Fernandes (Google)
2018-05-14 14:54   ` Steven Rostedt
2018-05-14 17:22     ` Paul E. McKenney
2018-05-15  0:35       ` Joel Fernandes
2018-05-15  3:42         ` Paul E. McKenney
2018-05-14  3:15 ` [PATCH RFC 3/8] rcu: Add back the cpuend tracepoint Joel Fernandes (Google)
2018-05-14 18:12   ` Paul E. McKenney
2018-05-15  0:43     ` Joel Fernandes
2018-05-14  3:15 ` [PATCH RFC 4/8] rcu: Get rid of old c variable from places in tree RCU Joel Fernandes (Google)
2018-05-14 17:57   ` Paul E. McKenney
2018-05-15  0:41     ` Joel Fernandes
2018-05-14  3:15 ` [PATCH RFC 5/8] rcu: Use rcu_node as temporary variable in funnel locking loop Joel Fernandes (Google)
2018-05-14 18:00   ` Paul E. McKenney
2018-05-15  0:43     ` Joel Fernandes
2018-05-14  3:15 ` [PATCH RFC 6/8] rcu: Add back the Startedleaf tracepoint Joel Fernandes (Google)
2018-05-14 18:38   ` Paul E. McKenney
2018-05-15  0:57     ` Joel Fernandes
2018-05-15  3:46       ` Paul E. McKenney
2018-05-15 23:04         ` Joel Fernandes
2018-05-16 15:48           ` Paul E. McKenney
2018-05-16 23:13             ` Joel Fernandes
2018-05-14  3:15 ` [PATCH RFC 7/8] rcu: trace CleanupMore condition only if needed Joel Fernandes (Google)
2018-05-14 19:20   ` Paul E. McKenney
2018-05-15  1:01     ` Joel Fernandes
2018-05-15  3:47       ` Paul E. McKenney
2018-05-14  3:15 ` [PATCH RFC 8/8] rcu: Fix cpustart tracepoint gp_seq number Joel Fernandes (Google)
2018-05-14 20:33   ` Paul E. McKenney
2018-05-15  1:02     ` Joel Fernandes
