* [PATCH 0/4] cleanups, fixes for rcu/dev
@ 2018-05-23  6:38 Joel Fernandes
  2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23  6:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Boqun Feng, byungchul.park, Ingo Molnar,
	Josh Triplett, kernel-team, Lai Jiangshan, Mathieu Desnoyers,
	Paul McKenney, Peter Zijlstra, Steven Rostedt

Hi Paul,

Here's an updated series for rcu/dev addressing some of your comments along
with the performance optimization. I also threw in the RCU-tasks speed-up
patch again; hopefully that doesn't annoy you :)

cheers,
 - Joel

Joel Fernandes (Google) (4):
  rcu: Speed up calling of RCU tasks callbacks
  rcu: Add comment documenting how rcu_seq_snap works
  rcu: Use better variable names in funnel locking loop
  rcu: Identify grace period is in progress as we advance up the tree

 kernel/rcu/rcu.h    | 34 ++++++++++++++++++++++++++-
 kernel/rcu/tree.c   | 57 +++++++++++++++++++++++----------------------
 kernel/rcu/update.c | 12 +++++++++-
 3 files changed, 73 insertions(+), 30 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23  6:38 [PATCH 0/4] cleanups, fixes for rcu/dev Joel Fernandes
@ 2018-05-23  6:38 ` Joel Fernandes
  2018-05-23 15:57   ` Paul E. McKenney
  2018-07-17  9:11   ` [tip:core/rcu] rcu: Speed up calling of RCU tasks callbacks tip-bot for Steven Rostedt (VMware)
  2018-05-23  6:38 ` [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23  6:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Steven Rostedt, Peter Zijlstra, Ingo Molnar, Boqun Feng,
	Paul McKenney, byungchul.park, kernel-team, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

RCU-tasks callbacks can take at least one second before they are
executed, even if the hold-out tasks enter their quiescent states
quickly. I noticed this while testing trampoline callback execution.

To test the trampoline freeing, I wrote a simple script:
cd /sys/kernel/debug/tracing/
echo '__schedule_bug:traceon' > set_ftrace_filter;
echo '!__schedule_bug:traceon' > set_ftrace_filter;

In the background, I had a simple bash while loop:
while [ 1 ]; do x=1; done &

Total time of completion of above commands in seconds:

With this patch:
real    0m0.179s
user    0m0.000s
sys     0m0.054s

Without this patch:
real    0m1.098s
user    0m0.000s
sys     0m0.053s

That's a greater than 6x speedup. To accomplish this, we now wait for
HZ/10 jiffies before entering the hold-out checking loop. The loop
still checks the held-out tasks every 1 second as before, in case this
first test doesn't succeed.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: byungchul.park@lge.com
Cc: kernel-team@android.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/update.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 5783bdf86e5a..a28698e44b08 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 */
 		synchronize_srcu(&tasks_rcu_exit_srcu);
 
+		/*
+		 * Wait a little bit incase held tasks are released
+		 * during their next timer ticks.
+		 */
+		schedule_timeout_interruptible(HZ/10);
+
 		/*
 		 * Each pass through the following loop scans the list
 		 * of holdout tasks, removing any that are no longer
@@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			int rtst;
 			struct task_struct *t1;
 
-			schedule_timeout_interruptible(HZ);
 			rtst = READ_ONCE(rcu_task_stall_timeout);
 			needreport = rtst > 0 &&
 				     time_after(jiffies, lastreport + rtst);
@@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 				check_holdout_task(t, needreport, &firstreport);
 				cond_resched();
 			}
+
+			if (list_empty(&rcu_tasks_holdouts))
+				break;
+
+			schedule_timeout_interruptible(HZ);
 		}
 
 		/*
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 24+ messages in thread
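
For reference, the resulting shape of the grace-period wait in
rcu_tasks_kthread() after this patch is roughly the following, pieced
together from the diff above with the surrounding code elided:

	synchronize_srcu(&tasks_rcu_exit_srcu);

	/* One short wait up front, in case the holdout tasks are
	 * released during their next timer ticks. */
	schedule_timeout_interruptible(HZ/10);

	lastreport = jiffies;
	while (!list_empty(&rcu_tasks_holdouts)) {
		/* ... scan the holdout list, checking each task ... */

		if (list_empty(&rcu_tasks_holdouts))
			break;

		/* Holdouts remain, so keep the old 1-second cadence. */
		schedule_timeout_interruptible(HZ);
	}

This keeps the common case (all tasks quiesce within HZ/10) fast while
leaving the slow path unchanged.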

* [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-23  6:38 [PATCH 0/4] cleanups, fixes for rcu/dev Joel Fernandes
  2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
@ 2018-05-23  6:38 ` Joel Fernandes
  2018-05-23 16:04   ` Paul E. McKenney
  2018-05-23  6:38 ` [PATCH 3/4] rcu: Use better variable names in funnel locking loop Joel Fernandes
  2018-05-23  6:38 ` [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree Joel Fernandes
  3 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23  6:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Paul McKenney,
	Peter Zijlstra, Steven Rostedt

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

rcu_seq_snap() may be tricky to decipher. Let's document how it works
with an example to make it easier.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 0453a7d12b3f..5819a37f19fb 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -91,7 +91,39 @@ static inline void rcu_seq_end(unsigned long *sp)
 	WRITE_ONCE(*sp, rcu_seq_endval(sp));
 }
 
-/* Take a snapshot of the update side's sequence number. */
+/*
+ * rcu_seq_snap - Take a snapshot of the update side's sequence number.
+ *
+ * This function returns the earliest value of the grace-period sequence number
+ * that will indicate that a full grace period has elapsed since the current
+ * time.  Once the grace-period sequence number has reached this value, it will
+ * be safe to invoke all callbacks that have been registered prior to the
+ * current time. This value is the current grace-period number plus two to the
+ * power of the number of low-order bits reserved for state, then rounded up to
+ * the next value in which the state bits are all zero.
+ *
+ * In the current design, RCU_SEQ_STATE_MASK=3 and the least significant bit of
+ * the seq is used to track if a GP is in progress or not. Given this, it is
+ * sufficient if we add (6+1) and mask with ~3 to get the next GP. Let's see
+ * why with an example:
+ *
+ * Say the current seq is 12 which is 0b1100 (GP is 3 and state bits are 0b00).
+ * To get to the next GP number of 4, we have to add 0b100 to this (0x1 << 2)
+ * to account for the shift due to 2 state bits. Now, if the current seq is
+ * 13 (GP is 3 and state bits are 0b01), then it means the current grace period
+ * is already in progress so the next GP that a future callback will be queued
+ * to run at is GP+2 = 5, not 4. To account for the extra +1, we just overflow
+ * the 2 lower bits by adding 0b11. In case the lower bit was set, the overflow
+ * will cause the extra +1 to the GP, along with the usual +1 explained before.
+ * This gives us GP+2. Finally we mask the lower two bits with ~0x3 in case the
+ * overflow didn't occur. This masking is needed because in case RCU was idle
+ * (no GP in progress so lower 2 bits are 0b00), then the overflow of the lower
+ * 2 state bits wouldn't occur, so we mask to zero out those lower 2 bits.
+ *
+ * In other words, the next seq can be obtained by (seq + 0b11 + 0b100) & (~0b11),
+ * which can be generalized to:
+ * (seq + RCU_SEQ_STATE_MASK + (RCU_SEQ_STATE_MASK + 1)) & ~RCU_SEQ_STATE_MASK
+ */
 static inline unsigned long rcu_seq_snap(unsigned long *sp)
 {
 	unsigned long s;
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 24+ messages in thread
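
To make the arithmetic in the new comment concrete, here is a small
userspace demo of the same rounding. The mask value matches the
RCU_SEQ_STATE_MASK=3 described above, but the helper name
seq_snap_demo() is made up for illustration and is not the kernel's
rcu_seq_snap():

	#include <stdio.h>

	#define SEQ_STATE_MASK 0x3	/* two low-order state bits */

	/* Add (0b11 + 0b100) = 7, then clear the two state bits. */
	static unsigned long seq_snap_demo(unsigned long seq)
	{
		return (seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
	}

	int main(void)
	{
		/* seq = 12: GP 3, idle (0b1100) -> GP 4. */
		printf("%lu\n", seq_snap_demo(12));	/* prints 16 */
		/* seq = 13: GP 3, in progress (0b1101) -> GP 5. */
		printf("%lu\n", seq_snap_demo(13));	/* prints 20 */
		return 0;
	}

Running it prints 16 and 20, matching the worked example in the
comment.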

* [PATCH 3/4] rcu: Use better variable names in funnel locking loop
  2018-05-23  6:38 [PATCH 0/4] cleanups, fixes for rcu/dev Joel Fernandes
  2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
  2018-05-23  6:38 ` [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes
@ 2018-05-23  6:38 ` Joel Fernandes
  2018-05-23 16:06   ` Paul E. McKenney
  2018-05-23  6:38 ` [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree Joel Fernandes
  3 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23  6:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Paul McKenney,
	Peter Zijlstra, Steven Rostedt

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The funnel locking loop in rcu_start_this_gp() uses rcu_root as a
temporary variable while walking the combining tree. This causes a
tiresome exercise for code readers, who must keep reminding themselves
that rcu_root may not be the root. Let's just call it rnp, and rename
the other variables to be more appropriate as well.

Original patch: https://patchwork.kernel.org/patch/10396577/

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 52 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0ad61c97da69..31f4b4b7d824 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1526,7 +1526,7 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 
 /*
  * rcu_start_this_gp - Request the start of a particular grace period
- * @rnp: The leaf node of the CPU from which to start.
+ * @rnp_start: The leaf node of the CPU from which to start.
  * @rdp: The rcu_data corresponding to the CPU from which to start.
  * @gp_seq_req: The gp_seq of the grace period to start.
  *
@@ -1540,12 +1540,12 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
  *
  * Returns true if the GP thread needs to be awakened else false.
  */
-static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
+static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 			      unsigned long gp_seq_req)
 {
 	bool ret = false;
 	struct rcu_state *rsp = rdp->rsp;
-	struct rcu_node *rnp_root;
+	struct rcu_node *rnp;
 
 	/*
 	 * Use funnel locking to either acquire the root rcu_node
@@ -1556,58 +1556,58 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * scan the leaf rcu_node structures.  Note that rnp->lock must
 	 * not be released.
 	 */
-	raw_lockdep_assert_held_rcu_node(rnp);
-	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startleaf"));
-	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
-		if (rnp_root != rnp)
-			raw_spin_lock_rcu_node(rnp_root);
-		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_req) ||
-		    rcu_seq_started(&rnp_root->gp_seq, gp_seq_req) ||
-		    (rnp != rnp_root &&
-		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
-			trace_rcu_this_gp(rnp_root, rdp, gp_seq_req,
+	raw_lockdep_assert_held_rcu_node(rnp_start);
+	trace_rcu_this_gp(rnp_start, rdp, gp_seq_req, TPS("Startleaf"));
+	for (rnp = rnp_start; 1; rnp = rnp->parent) {
+		if (rnp != rnp_start)
+			raw_spin_lock_rcu_node(rnp);
+		if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
+		    rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
+		    (rnp != rnp_start &&
+		     rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
+			trace_rcu_this_gp(rnp, rdp, gp_seq_req,
 					  TPS("Prestarted"));
 			goto unlock_out;
 		}
-		rnp_root->gp_seq_needed = gp_seq_req;
-		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
+		rnp->gp_seq_needed = gp_seq_req;
+		if (rcu_seq_state(rcu_seq_current(&rnp_start->gp_seq))) {
 			/*
 			 * We just marked the leaf, and a grace period
 			 * is in progress, which means that rcu_gp_cleanup()
 			 * will see the marking.  Bail to reduce contention.
 			 */
-			trace_rcu_this_gp(rnp, rdp, gp_seq_req,
+			trace_rcu_this_gp(rnp_start, rdp, gp_seq_req,
 					  TPS("Startedleaf"));
 			goto unlock_out;
 		}
-		if (rnp_root != rnp && rnp_root->parent != NULL)
-			raw_spin_unlock_rcu_node(rnp_root);
-		if (!rnp_root->parent)
+		if (rnp != rnp_start && rnp->parent != NULL)
+			raw_spin_unlock_rcu_node(rnp);
+		if (!rnp->parent)
 			break;  /* At root, and perhaps also leaf. */
 	}
 
 	/* If GP already in progress, just leave, otherwise start one. */
 	if (rcu_gp_in_progress(rsp)) {
-		trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("Startedleafroot"));
+		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedleafroot"));
 		goto unlock_out;
 	}
-	trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("Startedroot"));
+	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
 	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
 	rsp->gp_req_activity = jiffies;
 	if (!rsp->gp_kthread) {
-		trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("NoGPkthread"));
+		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
 		goto unlock_out;
 	}
 	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
 	ret = true;  /* Caller must wake GP kthread. */
 unlock_out:
 	/* Push furthest requested GP to leaf node and rcu_data structure. */
-	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_req)) {
-		rnp->gp_seq_needed = gp_seq_req;
+	if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req)) {
+		rnp_start->gp_seq_needed = gp_seq_req;
 		rdp->gp_seq_needed = gp_seq_req;
 	}
-	if (rnp != rnp_root)
-		raw_spin_unlock_rcu_node(rnp_root);
+	if (rnp != rnp_start)
+		raw_spin_unlock_rcu_node(rnp);
 	return ret;
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 24+ messages in thread
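
For readers new to the pattern, here is a heavily simplified sketch of
the funnel-locking walk that this patch renames. It is illustrative
only, not the actual tree.c code: it omits the locking, tracing, and
grace-period-start logic, and uses a plain >= where the kernel uses
ULONG_CMP_GE() to tolerate counter wrap:

	struct rcu_node {
		struct rcu_node *parent;	/* NULL at the root */
		unsigned long gp_seq_needed;
	};

	/* Walk from the requesting leaf toward the root, recording the
	 * needed grace period at each level, and bail out early if an
	 * equal or later request has already been recorded. */
	static bool record_gp_request(struct rcu_node *rnp_start,
				      unsigned long gp_seq_req)
	{
		struct rcu_node *rnp;

		for (rnp = rnp_start; ; rnp = rnp->parent) {
			if (rnp->gp_seq_needed >= gp_seq_req)
				return false;	/* Already requested. */
			rnp->gp_seq_needed = gp_seq_req;
			if (!rnp->parent)
				break;		/* Reached the root. */
		}
		return true;	/* Caller must start the grace period. */
	}

With the rename, "rnp" is always the node currently being visited and
"rnp_start" is always the leaf where the walk began, which is exactly
the distinction the old names obscured.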

* [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree
  2018-05-23  6:38 [PATCH 0/4] cleanups, fixes for rcu/dev Joel Fernandes
                   ` (2 preceding siblings ...)
  2018-05-23  6:38 ` [PATCH 3/4] rcu: Use better variable names in funnel locking loop Joel Fernandes
@ 2018-05-23  6:38 ` Joel Fernandes
  2018-05-23 16:06   ` Paul E. McKenney
  3 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23  6:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Paul McKenney,
	Peter Zijlstra, Steven Rostedt

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

There's no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel-locking loop.
It's sufficient to check it once at the start and then check the
internal nodes as we advance up the combining tree. This also makes
sense because grace-period updates propagate from the root to the
leaves, so there's a chance we may find that a grace period has started
as we advance up; let's check for that.

Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 31f4b4b7d824..65e49282429c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1570,11 +1570,12 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 			goto unlock_out;
 		}
 		rnp->gp_seq_needed = gp_seq_req;
-		if (rcu_seq_state(rcu_seq_current(&rnp_start->gp_seq))) {
+		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
 			/*
-			 * We just marked the leaf, and a grace period
-			 * is in progress, which means that rcu_gp_cleanup()
-			 * will see the marking.  Bail to reduce contention.
+			 * We just marked the leaf or internal node, and a
+			 * grace period is in progress, which means that
+			 * rcu_gp_cleanup() will see the marking.  Bail to
+			 * reduce contention.
 			 */
 			trace_rcu_this_gp(rnp_start, rdp, gp_seq_req,
 					  TPS("Startedleaf"));
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related	[flat|nested] 24+ messages in thread
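
Side by side, the one-line change inside the funnel-locking loop (both
lines taken from the diffs above) is:

	/* Before: recheck the starting leaf at every level. */
	if (rcu_seq_state(rcu_seq_current(&rnp_start->gp_seq))) { ... }

	/* After: check the node currently being visited, so a grace
	 * period that is visible at an inner node is also noticed. */
	if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) { ... }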

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
@ 2018-05-23 15:57   ` Paul E. McKenney
  2018-05-23 16:45     ` Steven Rostedt
  2018-07-17  9:11   ` [tip:core/rcu] rcu: Speed up calling of RCU tasks callbacks tip-bot for Steven Rostedt (VMware)
  1 sibling, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 15:57 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Joel Fernandes (Google),
	Steven Rostedt, Peter Zijlstra, Ingo Molnar, Boqun Feng,
	byungchul.park, kernel-team, Josh Triplett, Lai Jiangshan,
	Mathieu Desnoyers

On Tue, May 22, 2018 at 11:38:12PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> RCU-tasks callbacks can take at least one second before they are
> executed, even if the hold-out tasks enter their quiescent states
> quickly. I noticed this while testing trampoline callback execution.
> 
> To test the trampoline freeing, I wrote a simple script:
> cd /sys/kernel/debug/tracing/
> echo '__schedule_bug:traceon' > set_ftrace_filter;
> echo '!__schedule_bug:traceon' > set_ftrace_filter;
> 
> In the background, I had a simple bash while loop:
> while [ 1 ]; do x=1; done &
> 
> Total time of completion of above commands in seconds:
> 
> With this patch:
> real    0m0.179s
> user    0m0.000s
> sys     0m0.054s
> 
> Without this patch:
> real    0m1.098s
> user    0m0.000s
> sys     0m0.053s
> 
> That's a greater than 6x speedup. To accomplish this, we now wait for
> HZ/10 jiffies before entering the hold-out checking loop. The loop
> still checks the held-out tasks every 1 second as before, in case this
> first test doesn't succeed.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>

Given an ack from Steven, I would be happy to take this, give or take
some nits below.

							Thanx, Paul

> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: byungchul.park@lge.com
> Cc: kernel-team@android.com
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/update.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 5783bdf86e5a..a28698e44b08 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  		 */
>  		synchronize_srcu(&tasks_rcu_exit_srcu);
> 
> +		/*
> +		 * Wait a little bit incase held tasks are released

				in case

> +		 * during their next timer ticks.
> +		 */
> +		schedule_timeout_interruptible(HZ/10);
> +
>  		/*
>  		 * Each pass through the following loop scans the list
>  		 * of holdout tasks, removing any that are no longer
> @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  			int rtst;
>  			struct task_struct *t1;
> 
> -			schedule_timeout_interruptible(HZ);
>  			rtst = READ_ONCE(rcu_task_stall_timeout);
>  			needreport = rtst > 0 &&
>  				     time_after(jiffies, lastreport + rtst);
> @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  				check_holdout_task(t, needreport, &firstreport);
>  				cond_resched();
>  			}
> +
> +			if (list_empty(&rcu_tasks_holdouts))
> +				break;
> +
> +			schedule_timeout_interruptible(HZ);

Is there a better way to do this?  Can this be converted into a for-loop?
Alternatively, would it make sense to have a firsttime local variable
initialized to true, to keep the schedule_timeout_interruptible() at
the beginning of the loop, but skip it on the first pass through the loop?

Don't get me wrong, what you have looks functionally correct, but
duplicating the condition might cause problems later on, for example,
should a bug fix be needed in the condition.

>  		}
> 
>  		/*
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread
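
For concreteness, the "firsttime" alternative Paul describes might look
like the sketch below. This is illustrative only, not a posted patch:

	bool firsttime = true;

	lastreport = jiffies;
	while (!list_empty(&rcu_tasks_holdouts)) {
		/* Keep the wait at the top of the loop, but skip it on
		 * the first pass so a quickly emptied list is noticed
		 * right away. */
		if (!firsttime)
			schedule_timeout_interruptible(HZ);
		firsttime = false;

		/* ... scan the holdout list as before ... */
	}

This avoids duplicating the list_empty() condition that Paul is
cautioning about.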

* Re: [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works
  2018-05-23  6:38 ` [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes
@ 2018-05-23 16:04   ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 16:04 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Peter Zijlstra,
	Steven Rostedt

On Tue, May 22, 2018 at 11:38:13PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> rcu_seq_snap() may be tricky to decipher. Let's document how it works
> with an example to make it easier.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Very good, queued for further review, thank you!

							Thanx, Paul

> ---
>  kernel/rcu/rcu.h | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 0453a7d12b3f..5819a37f19fb 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -91,7 +91,39 @@ static inline void rcu_seq_end(unsigned long *sp)
>  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
>  }
> 
> -/* Take a snapshot of the update side's sequence number. */
> +/*
> + * rcu_seq_snap - Take a snapshot of the update side's sequence number.
> + *
> + * This function returns the earliest value of the grace-period sequence number
> + * that will indicate that a full grace period has elapsed since the current
> + * time.  Once the grace-period sequence number has reached this value, it will
> + * be safe to invoke all callbacks that have been registered prior to the
> + * current time. This value is the current grace-period number plus two to the
> + * power of the number of low-order bits reserved for state, then rounded up to
> + * the next value in which the state bits are all zero.
> + *
> + * In the current design, RCU_SEQ_STATE_MASK=3 and the least significant bit of
> + * the seq is used to track if a GP is in progress or not. Given this, it is
> + * sufficient if we add (6+1) and mask with ~3 to get the next GP. Let's see
> + * why with an example:
> + *
> + * Say the current seq is 12 which is 0b1100 (GP is 3 and state bits are 0b00).
> + * To get to the next GP number of 4, we have to add 0b100 to this (0x1 << 2)
> + * to account for the shift due to 2 state bits. Now, if the current seq is
> + * 13 (GP is 3 and state bits are 0b01), then it means the current grace period
> + * is already in progress so the next GP that a future callback will be queued
> + * to run at is GP+2 = 5, not 4. To account for the extra +1, we just overflow
> + * the 2 lower bits by adding 0b11. In case the lower bit was set, the overflow
> + * will cause the extra +1 to the GP, along with the usual +1 explained before.
> + * This gives us GP+2. Finally we mask the lower two bits with ~0x3 in case the
> + * overflow didn't occur. This masking is needed because in case RCU was idle
> + * (no GP in progress so lower 2 bits are 0b00), then the overflow of the lower
> + * 2 state bits wouldn't occur, so we mask to zero out those lower 2 bits.
> + *
> + * In other words, the next seq can be obtained by (seq + 0b11 + 0b100) & (~0b11),
> + * which can be generalized to:
> + * (seq + RCU_SEQ_STATE_MASK + (RCU_SEQ_STATE_MASK + 1)) & ~RCU_SEQ_STATE_MASK
> + */
>  static inline unsigned long rcu_seq_snap(unsigned long *sp)
>  {
>  	unsigned long s;
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] rcu: Use better variable names in funnel locking loop
  2018-05-23  6:38 ` [PATCH 3/4] rcu: Use better variable names in funnel locking loop Joel Fernandes
@ 2018-05-23 16:06   ` Paul E. McKenney
  2018-05-23 19:23     ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 16:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Peter Zijlstra,
	Steven Rostedt

On Tue, May 22, 2018 at 11:38:14PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> > The funnel locking loop in rcu_start_this_gp() uses rcu_root as a
> > temporary variable while walking the combining tree. This causes a
> > tiresome exercise for code readers, who must keep reminding themselves
> > that rcu_root may not be the root. Let's just call it rnp, and rename
> > the other variables to be more appropriate as well.
> 
> Original patch: https://patchwork.kernel.org/patch/10396577/
> 
> Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I used to have double Signed-off-by back when I was seconded to Linaro.
But I am guessing that you want the second and don't need the first
one.  Unless you tell me otherwise, I will remove the first one on
my next rebase.

Anyway, the new variable names are much more clear, good stuff,
queued for further review and testing, thank you!

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 52 +++++++++++++++++++++++------------------------
>  1 file changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0ad61c97da69..31f4b4b7d824 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1526,7 +1526,7 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> 
>  /*
>   * rcu_start_this_gp - Request the start of a particular grace period
> - * @rnp: The leaf node of the CPU from which to start.
> + * @rnp_start: The leaf node of the CPU from which to start.
>   * @rdp: The rcu_data corresponding to the CPU from which to start.
>   * @gp_seq_req: The gp_seq of the grace period to start.
>   *
> @@ -1540,12 +1540,12 @@ static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>   *
>   * Returns true if the GP thread needs to be awakened else false.
>   */
> -static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
> +static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
>  			      unsigned long gp_seq_req)
>  {
>  	bool ret = false;
>  	struct rcu_state *rsp = rdp->rsp;
> -	struct rcu_node *rnp_root;
> +	struct rcu_node *rnp;
> 
>  	/*
>  	 * Use funnel locking to either acquire the root rcu_node
> @@ -1556,58 +1556,58 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * scan the leaf rcu_node structures.  Note that rnp->lock must
>  	 * not be released.
>  	 */
> -	raw_lockdep_assert_held_rcu_node(rnp);
> -	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startleaf"));
> -	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
> -		if (rnp_root != rnp)
> -			raw_spin_lock_rcu_node(rnp_root);
> -		if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_req) ||
> -		    rcu_seq_started(&rnp_root->gp_seq, gp_seq_req) ||
> -		    (rnp != rnp_root &&
> -		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
> -			trace_rcu_this_gp(rnp_root, rdp, gp_seq_req,
> +	raw_lockdep_assert_held_rcu_node(rnp_start);
> +	trace_rcu_this_gp(rnp_start, rdp, gp_seq_req, TPS("Startleaf"));
> +	for (rnp = rnp_start; 1; rnp = rnp->parent) {
> +		if (rnp != rnp_start)
> +			raw_spin_lock_rcu_node(rnp);
> +		if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req) ||
> +		    rcu_seq_started(&rnp->gp_seq, gp_seq_req) ||
> +		    (rnp != rnp_start &&
> +		     rcu_seq_state(rcu_seq_current(&rnp->gp_seq)))) {
> +			trace_rcu_this_gp(rnp, rdp, gp_seq_req,
>  					  TPS("Prestarted"));
>  			goto unlock_out;
>  		}
> -		rnp_root->gp_seq_needed = gp_seq_req;
> -		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
> +		rnp->gp_seq_needed = gp_seq_req;
> +		if (rcu_seq_state(rcu_seq_current(&rnp_start->gp_seq))) {
>  			/*
>  			 * We just marked the leaf, and a grace period
>  			 * is in progress, which means that rcu_gp_cleanup()
>  			 * will see the marking.  Bail to reduce contention.
>  			 */
> -			trace_rcu_this_gp(rnp, rdp, gp_seq_req,
> +			trace_rcu_this_gp(rnp_start, rdp, gp_seq_req,
>  					  TPS("Startedleaf"));
>  			goto unlock_out;
>  		}
> -		if (rnp_root != rnp && rnp_root->parent != NULL)
> -			raw_spin_unlock_rcu_node(rnp_root);
> -		if (!rnp_root->parent)
> +		if (rnp != rnp_start && rnp->parent != NULL)
> +			raw_spin_unlock_rcu_node(rnp);
> +		if (!rnp->parent)
>  			break;  /* At root, and perhaps also leaf. */
>  	}
> 
>  	/* If GP already in progress, just leave, otherwise start one. */
>  	if (rcu_gp_in_progress(rsp)) {
> -		trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("Startedleafroot"));
> +		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedleafroot"));
>  		goto unlock_out;
>  	}
> -	trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("Startedroot"));
> +	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
>  	WRITE_ONCE(rsp->gp_flags, rsp->gp_flags | RCU_GP_FLAG_INIT);
>  	rsp->gp_req_activity = jiffies;
>  	if (!rsp->gp_kthread) {
> -		trace_rcu_this_gp(rnp_root, rdp, gp_seq_req, TPS("NoGPkthread"));
> +		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
>  		goto unlock_out;
>  	}
>  	trace_rcu_grace_period(rsp->name, READ_ONCE(rsp->gp_seq), TPS("newreq"));
>  	ret = true;  /* Caller must wake GP kthread. */
>  unlock_out:
>  	/* Push furthest requested GP to leaf node and rcu_data structure. */
> -	if (ULONG_CMP_GE(rnp_root->gp_seq_needed, gp_seq_req)) {
> -		rnp->gp_seq_needed = gp_seq_req;
> +	if (ULONG_CMP_GE(rnp->gp_seq_needed, gp_seq_req)) {
> +		rnp_start->gp_seq_needed = gp_seq_req;
>  		rdp->gp_seq_needed = gp_seq_req;
>  	}
> -	if (rnp != rnp_root)
> -		raw_spin_unlock_rcu_node(rnp_root);
> +	if (rnp != rnp_start)
> +		raw_spin_unlock_rcu_node(rnp);
>  	return ret;
>  }
> 
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree
  2018-05-23  6:38 ` [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree Joel Fernandes
@ 2018-05-23 16:06   ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 16:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Peter Zijlstra,
	Steven Rostedt

On Tue, May 22, 2018 at 11:38:15PM -0700, Joel Fernandes wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> > There's no need to keep checking the same starting node for whether a
> > grace period is in progress as we advance up the funnel-locking loop.
> > It's sufficient to check it once at the start and then check the
> > internal nodes as we advance up the combining tree. This also makes
> > sense because grace-period updates propagate from the root to the
> > leaves, so there's a chance we may find that a grace period has started
> > as we advance up; let's check for that.
> 
> Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Looks good, queued for further review and testing, thank you!

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 31f4b4b7d824..65e49282429c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1570,11 +1570,12 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
>  			goto unlock_out;
>  		}
>  		rnp->gp_seq_needed = gp_seq_req;
> -		if (rcu_seq_state(rcu_seq_current(&rnp_start->gp_seq))) {
> +		if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
>  			/*
> -			 * We just marked the leaf, and a grace period
> -			 * is in progress, which means that rcu_gp_cleanup()
> -			 * will see the marking.  Bail to reduce contention.
> +			 * We just marked the leaf or internal node, and a
> +			 * grace period is in progress, which means that
> +			 * rcu_gp_cleanup() will see the marking.  Bail to
> +			 * reduce contention.
>  			 */
>  			trace_rcu_this_gp(rnp_start, rdp, gp_seq_req,
>  					  TPS("Startedleaf"));
> -- 
> 2.17.0.441.gb46fe60e1d-goog
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 15:57   ` Paul E. McKenney
@ 2018-05-23 16:45     ` Steven Rostedt
  2018-05-23 17:03       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-05-23 16:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, 23 May 2018 08:57:34 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, May 22, 2018 at 11:38:12PM -0700, Joel Fernandes wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > RCU-tasks callbacks can take at least one second before they are
> > executed, even if the hold-out tasks enter their quiescent states
> > quickly. I noticed this while testing trampoline callback execution.
> > 
> > To test the trampoline freeing, I wrote a simple script:
> > cd /sys/kernel/debug/tracing/
> > echo '__schedule_bug:traceon' > set_ftrace_filter;
> > echo '!__schedule_bug:traceon' > set_ftrace_filter;
> > 
> > In the background, I had a simple bash while loop:
> > while [ 1 ]; do x=1; done &
> > 
> > Total time of completion of above commands in seconds:
> > 
> > With this patch:
> > real    0m0.179s
> > user    0m0.000s
> > sys     0m0.054s
> > 
> > Without this patch:
> > real    0m1.098s
> > user    0m0.000s
> > sys     0m0.053s
> > 
> > That's a greater than 6x speedup. To accomplish this, we now wait for
> > HZ/10 jiffies before entering the hold-out checking loop. The loop
> > still checks the held-out tasks every 1 second as before, in case this
> > first test doesn't succeed.
> > 
> > Cc: Steven Rostedt <rostedt@goodmis.org>  
> 
> Given an ack from Steven, I would be happy to take this, give or take
> some nits below.

I'm currently testing it, and trying to understand it better.

> 
> 							Thanx, Paul
> 
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: byungchul.park@lge.com
> > Cc: kernel-team@android.com
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/update.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 5783bdf86e5a..a28698e44b08 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  		 */
> >  		synchronize_srcu(&tasks_rcu_exit_srcu);
> > 
> > +		/*
> > +		 * Wait a little bit incase held tasks are released  
> 
> 				in case
> 
> > +		 * during their next timer ticks.
> > +		 */
> > +		schedule_timeout_interruptible(HZ/10);
> > +
> >  		/*
> >  		 * Each pass through the following loop scans the list
> >  		 * of holdout tasks, removing any that are no longer
> > @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  			int rtst;
> >  			struct task_struct *t1;
> > 
> > -			schedule_timeout_interruptible(HZ);
> >  			rtst = READ_ONCE(rcu_task_stall_timeout);
> >  			needreport = rtst > 0 &&
> >  				     time_after(jiffies, lastreport + rtst);
> > @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  				check_holdout_task(t, needreport, &firstreport);
> >  				cond_resched();
> >  			}
> > +
> > +			if (list_empty(&rcu_tasks_holdouts))
> > +				break;
> > +
> > +			schedule_timeout_interruptible(HZ);  

Why is this a full second wait and not the HZ/10 like the others?

-- Steve

> 
> Is there a better way to do this?  Can this be converted into a for-loop?
> Alternatively, would it make sense to have a firsttime local variable
> initialized to true, to keep the schedule_timeout_interruptible() at
> the beginning of the loop, but skip it on the first pass through the loop?
> 
> Don't get me wrong, what you have looks functionally correct, but
> duplicating the condition might cause problems later on, for example,
> should a bug fix be needed in the condition.
> 
> >  		}
> > 
> >  		/*
> > -- 
> > 2.17.0.441.gb46fe60e1d-goog
> >   

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 16:45     ` Steven Rostedt
@ 2018-05-23 17:03       ` Paul E. McKenney
  2018-05-23 19:13         ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 17:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, May 23, 2018 at 12:45:31PM -0400, Steven Rostedt wrote:
> On Wed, 23 May 2018 08:57:34 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, May 22, 2018 at 11:38:12PM -0700, Joel Fernandes wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > RCU-tasks callbacks can take at least one second before they are
> > > executed, even if the hold-out tasks enter their quiescent states
> > > quickly. I noticed this while testing trampoline callback execution.
> > > 
> > > To test the trampoline freeing, I wrote a simple script:
> > > cd /sys/kernel/debug/tracing/
> > > echo '__schedule_bug:traceon' > set_ftrace_filter;
> > > echo '!__schedule_bug:traceon' > set_ftrace_filter;
> > > 
> > > In the background, I had a simple bash while loop:
> > > while [ 1 ]; do x=1; done &
> > > 
> > > Total time of completion of above commands in seconds:
> > > 
> > > With this patch:
> > > real    0m0.179s
> > > user    0m0.000s
> > > sys     0m0.054s
> > > 
> > > Without this patch:
> > > real    0m1.098s
> > > user    0m0.000s
> > > sys     0m0.053s
> > > 
> > > That's a greater than 6x speedup. To accomplish this, we now wait for
> > > HZ/10 jiffies before entering the hold-out checking loop. The loop
> > > still checks the held-out tasks every 1 second as before, in case this
> > > first test doesn't succeed.
> > > 
> > > Cc: Steven Rostedt <rostedt@goodmis.org>  
> > 
> > Given an ack from Steven, I would be happy to take this, give or take
> > some nits below.
> 
> I'm currently testing it, and trying to understand it better.

Very good, thank you!

> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: byungchul.park@lge.com
> > > Cc: kernel-team@android.com
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/update.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > index 5783bdf86e5a..a28698e44b08 100644
> > > --- a/kernel/rcu/update.c
> > > +++ b/kernel/rcu/update.c
> > > @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > >  		 */
> > >  		synchronize_srcu(&tasks_rcu_exit_srcu);
> > > 
> > > +		/*
> > > +		 * Wait a little bit incase held tasks are released  
> > 
> > 				in case
> > 
> > > +		 * during their next timer ticks.
> > > +		 */
> > > +		schedule_timeout_interruptible(HZ/10);
> > > +
> > >  		/*
> > >  		 * Each pass through the following loop scans the list
> > >  		 * of holdout tasks, removing any that are no longer
> > > @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > >  			int rtst;
> > >  			struct task_struct *t1;
> > > 
> > > -			schedule_timeout_interruptible(HZ);
> > >  			rtst = READ_ONCE(rcu_task_stall_timeout);
> > >  			needreport = rtst > 0 &&
> > >  				     time_after(jiffies, lastreport + rtst);
> > > @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > >  				check_holdout_task(t, needreport, &firstreport);
> > >  				cond_resched();
> > >  			}
> > > +
> > > +			if (list_empty(&rcu_tasks_holdouts))
> > > +				break;
> > > +
> > > +			schedule_timeout_interruptible(HZ);  
> 
> Why is this a full second wait and not the HZ/10 like the others?

The idea is to respond quickly on small idle systems and to reduce the
number of possibly quite lengthy traversals of the task list otherwise.
I actually considered exponential backoff, but decided to keep it simple,
at least to start with.

							Thanx, Paul

> -- Steve
> 
> > 
> > Is there a better way to do this?  Can this be converted into a for-loop?
> > Alternatively, would it make sense to have a firsttime local variable
> > initialized to true, to keep the schedule_timeout_interruptible() at
> > the beginning of the loop, but skip it on the first pass through the loop?
> > 
> > Don't get me wrong, what you have looks functionally correct, but
> > duplicating the condition might cause problems later on, for example,
> > should a bug fix be needed in the condition.
> > 
> > >  		}
> > > 
> > >  		/*
> > > -- 
> > > 2.17.0.441.gb46fe60e1d-goog
> > >   
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 17:03       ` Paul E. McKenney
@ 2018-05-23 19:13         ` Steven Rostedt
  2018-05-23 20:04           ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-05-23 19:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, 23 May 2018 10:03:03 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > index 5783bdf86e5a..a28698e44b08 100644
> > > > --- a/kernel/rcu/update.c
> > > > +++ b/kernel/rcu/update.c
> > > > @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > >  		 */
> > > >  		synchronize_srcu(&tasks_rcu_exit_srcu);
> > > > 
> > > > +		/*
> > > > +		 * Wait a little bit incase held tasks are released    
> > > 
> > > 				in case
> > >   
> > > > +		 * during their next timer ticks.
> > > > +		 */
> > > > +		schedule_timeout_interruptible(HZ/10);
> > > > +
> > > >  		/*
> > > >  		 * Each pass through the following loop scans the list
> > > >  		 * of holdout tasks, removing any that are no longer
> > > > @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > >  			int rtst;
> > > >  			struct task_struct *t1;
> > > > 
> > > > -			schedule_timeout_interruptible(HZ);
> > > >  			rtst = READ_ONCE(rcu_task_stall_timeout);
> > > >  			needreport = rtst > 0 &&
> > > >  				     time_after(jiffies, lastreport + rtst);
> > > > @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > >  				check_holdout_task(t, needreport, &firstreport);
> > > >  				cond_resched();
> > > >  			}
> > > > +
> > > > +			if (list_empty(&rcu_tasks_holdouts))
> > > > +				break;
> > > > +
> > > > +			schedule_timeout_interruptible(HZ);    
> > 
> > Why is this a full second wait and not the HZ/10 like the others?  
> 
> The idea is to respond quickly on small idle systems and to reduce the
> number of possibly quite lengthy traversals of the task list otherwise.
> I actually considered exponential backoff, but decided to keep it simple,
> at least to start with.

Ah, now it makes sense. Reading what you wrote, we can still do a
backoff and keep it simple. What about the patch below? It appears to
have the same performance improvement as Joel's.

-- Steve

> > > 
> > > Is there a better way to do this?  Can this be converted into a for-loop?
> > > Alternatively, would it make sense to have a firsttime local variable
> > > initialized to true, to keep the schedule_timeout_interruptible() at
> > > the beginning of the loop, but skip it on the first pass through the loop?
> > > 
> > > Don't get me wrong, what you have looks functionally correct, but
> > > duplicating the condition might cause problems later on, for example,
> > > should a bug fix be needed in the condition.
> > >   


diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 68fa19a5e7bd..c6df9fa916cf 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -796,13 +796,22 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 * holdouts.  When the list is empty, we are done.
 		 */
 		lastreport = jiffies;
-		while (!list_empty(&rcu_tasks_holdouts)) {
+		for (;;) {
 			bool firstreport;
 			bool needreport;
 			int rtst;
 			struct task_struct *t1;
+			int fract = 15;
+
+			/* Slowly back off waiting for holdouts */
+			schedule_timeout_interruptible(HZ/fract);
+
+			if (list_empty(&rcu_tasks_holdouts))
+				break;
+
+			if (fract > 1)
+				fract--;
 
-			schedule_timeout_interruptible(HZ);
 			rtst = READ_ONCE(rcu_task_stall_timeout);
 			needreport = rtst > 0 &&
 				     time_after(jiffies, lastreport + rtst);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] rcu: Use better variable names in funnel locking loop
  2018-05-23 16:06   ` Paul E. McKenney
@ 2018-05-23 19:23     ` Paul E. McKenney
  2018-05-24  0:54       ` Joel Fernandes
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 19:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Joel Fernandes (Google),
	Boqun Feng, byungchul.park, Ingo Molnar, Josh Triplett,
	kernel-team, Lai Jiangshan, Mathieu Desnoyers, Peter Zijlstra,
	Steven Rostedt

On Wed, May 23, 2018 at 09:06:17AM -0700, Paul E. McKenney wrote:
> On Tue, May 22, 2018 at 11:38:14PM -0700, Joel Fernandes wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > The funnel locking loop in rcu_start_this_gp() uses rcu_root as a
> > temporary variable while walking the combining tree. This causes a
> > tiresome exercise for code readers, who must keep reminding themselves
> > that rcu_root may not be the root. Let's just call it rnp, and rename
> > the other variables to be more appropriate as well.
> > 
> > Original patch: https://patchwork.kernel.org/patch/10396577/
> > 
> > Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> I used to have double Signed-off-by back when I was seconded to Linaro.
> But I am guessing that you want the second and don't need the first
> one.  Unless you tell me otherwise, I will remove the first one on
> my next rebase.
> 
> Anyway, the new variable names are much more clear, good stuff,
> queued for further review and testing, thank you!

And it looks to me like I should fold in the patchlet below, which
changes a comment to refer to rnp_start.  Please let me know if this
would mess things up.

							Thanx, Paul

------------------------------------------------------------------------

commit 94ce05d9d110b8c34eca6641ca5221c1b150e99f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed May 23 12:22:01 2018 -0700

    fixup! rcu: Use better variable names in funnel locking loop
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65e49282429c..fdba8ab95e2c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1550,11 +1550,11 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
 	/*
 	 * Use funnel locking to either acquire the root rcu_node
 	 * structure's lock or bail out if the need for this grace period
-	 * has already been recorded -- or has already started.  If there
-	 * is already a grace period in progress in a non-leaf node, no
-	 * recording is needed because the end of the grace period will
-	 * scan the leaf rcu_node structures.  Note that rnp->lock must
-	 * not be released.
+	 * has already been recorded -- or if that grace period has in
+	 * fact already started.  If there is already a grace period in
+	 * progress in a non-leaf node, no recording is needed because the
+	 * end of the grace period will scan the leaf rcu_node structures.
+	 * Note that rnp_start->lock must not be released.
 	 */
 	raw_lockdep_assert_held_rcu_node(rnp_start);
 	trace_rcu_this_gp(rnp_start, rdp, gp_seq_req, TPS("Startleaf"));

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 19:13         ` Steven Rostedt
@ 2018-05-23 20:04           ` Paul E. McKenney
  2018-05-23 21:51             ` Joel Fernandes
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-23 20:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zijlstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, May 23, 2018 at 03:13:37PM -0400, Steven Rostedt wrote:
> On Wed, 23 May 2018 10:03:03 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > > index 5783bdf86e5a..a28698e44b08 100644
> > > > > --- a/kernel/rcu/update.c
> > > > > +++ b/kernel/rcu/update.c
> > > > > @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > >  		 */
> > > > >  		synchronize_srcu(&tasks_rcu_exit_srcu);
> > > > > 
> > > > > +		/*
> > > > > +		 * Wait a little bit incase held tasks are released    
> > > > 
> > > > 				in case
> > > >   
> > > > > +		 * during their next timer ticks.
> > > > > +		 */
> > > > > +		schedule_timeout_interruptible(HZ/10);
> > > > > +
> > > > >  		/*
> > > > >  		 * Each pass through the following loop scans the list
> > > > >  		 * of holdout tasks, removing any that are no longer
> > > > > @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > >  			int rtst;
> > > > >  			struct task_struct *t1;
> > > > > 
> > > > > -			schedule_timeout_interruptible(HZ);
> > > > >  			rtst = READ_ONCE(rcu_task_stall_timeout);
> > > > >  			needreport = rtst > 0 &&
> > > > >  				     time_after(jiffies, lastreport + rtst);
> > > > > @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > >  				check_holdout_task(t, needreport, &firstreport);
> > > > >  				cond_resched();
> > > > >  			}
> > > > > +
> > > > > +			if (list_empty(&rcu_tasks_holdouts))
> > > > > +				break;
> > > > > +
> > > > > +			schedule_timeout_interruptible(HZ);    
> > > 
> > > Why is this a full second wait and not the HZ/10 like the others?  
> > 
> > The idea is to respond quickly on small idle systems and to reduce the
> > number of possibly quite lengthy traversals of the task list otherwise.
> > I actually considered exponential backoff, but decided to keep it simple,
> > at least to start with.
> 
> Ah, now it makes sense. Reading what you wrote, we can still do a
> backoff and keep it simple. What about the patch below? It appears to
> have the same performance improvement as Joel's.

Looks plausible to me!

Joel, do you see any gotchas in Steve's patch?

							Thanx, Paul

> -- Steve
> 
> > > > 
> > > > Is there a better way to do this?  Can this be converted into a for-loop?
> > > > Alternatively, would it make sense to have a firsttime local variable
> > > > initialized to true, to keep the schedule_timeout_interruptible() at
> > > > the beginning of the loop, but skip it on the first pass through the loop?
> > > > 
> > > > Don't get me wrong, what you have looks functionally correct, but
> > > > duplicating the condition might cause problems later on, for example,
> > > > should a bug fix be needed in the condition.
> > > >   
> 
> 
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 68fa19a5e7bd..c6df9fa916cf 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -796,13 +796,22 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  		 * holdouts.  When the list is empty, we are done.
>  		 */
>  		lastreport = jiffies;
> -		while (!list_empty(&rcu_tasks_holdouts)) {
> +		for (;;) {
>  			bool firstreport;
>  			bool needreport;
>  			int rtst;
>  			struct task_struct *t1;
> +			int fract = 15;
> +
> +			/* Slowly back off waiting for holdouts */
> +			schedule_timeout_interruptible(HZ/fract);
> +
> +			if (list_empty(&rcu_tasks_holdouts))
> +				break;
> +
> +			if (fract > 1)
> +				fract--;
> 
> -			schedule_timeout_interruptible(HZ);
>  			rtst = READ_ONCE(rcu_task_stall_timeout);
>  			needreport = rtst > 0 &&
>  				     time_after(jiffies, lastreport + rtst);
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 20:04           ` Paul E. McKenney
@ 2018-05-23 21:51             ` Joel Fernandes
  2018-05-24  0:51             ` Joel Fernandes
  2018-05-24 21:47             ` Steven Rostedt
  2 siblings, 0 replies; 24+ messages in thread
From: Joel Fernandes @ 2018-05-23 21:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Joel Fernandes, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Boqun Feng, byungchul.park, kernel-team,
	Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, May 23, 2018 at 01:04:58PM -0700, Paul E. McKenney wrote:
> On Wed, May 23, 2018 at 03:13:37PM -0400, Steven Rostedt wrote:
> > On Wed, 23 May 2018 10:03:03 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > > > index 5783bdf86e5a..a28698e44b08 100644
> > > > > > --- a/kernel/rcu/update.c
> > > > > > +++ b/kernel/rcu/update.c
> > > > > > @@ -743,6 +743,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > > >  		 */
> > > > > >  		synchronize_srcu(&tasks_rcu_exit_srcu);
> > > > > > 
> > > > > > +		/*
> > > > > > +		 * Wait a little bit incase held tasks are released    
> > > > > 
> > > > > 				in case
> > > > >   
> > > > > > +		 * during their next timer ticks.
> > > > > > +		 */
> > > > > > +		schedule_timeout_interruptible(HZ/10);
> > > > > > +
> > > > > >  		/*
> > > > > >  		 * Each pass through the following loop scans the list
> > > > > >  		 * of holdout tasks, removing any that are no longer
> > > > > > @@ -755,7 +761,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > > >  			int rtst;
> > > > > >  			struct task_struct *t1;
> > > > > > 
> > > > > > -			schedule_timeout_interruptible(HZ);
> > > > > >  			rtst = READ_ONCE(rcu_task_stall_timeout);
> > > > > >  			needreport = rtst > 0 &&
> > > > > >  				     time_after(jiffies, lastreport + rtst);
> > > > > > @@ -768,6 +773,11 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> > > > > >  				check_holdout_task(t, needreport, &firstreport);
> > > > > >  				cond_resched();
> > > > > >  			}
> > > > > > +
> > > > > > +			if (list_empty(&rcu_tasks_holdouts))
> > > > > > +				break;
> > > > > > +
> > > > > > +			schedule_timeout_interruptible(HZ);    
> > > > 
> > > > Why is this a full second wait and not the HZ/10 like the others?  
> > > 
> > > The idea is to respond quickly on small idle systems and to reduce the
> > > number of possibly quite lengthy traversals of the task list otherwise.
> > > I actually considered exponential backoff, but decided to keep it simple,
> > > at least to start with.
> > 
> > Ah, now it makes sense. Reading what you wrote, we can still do a
> > backoff and keep it simple. What about the patch below. It appears to
> > have the same performance improvement as Joel's
> 
> Looks plausible to me!
> 
> Joel, do you see any gotchas in Steve's patch?

I see one, but I hope I'm not daydreaming... :D

> > > > > Is there a better way to do this?  Can this be converted into a for-loop?
> > > > > Alternatively, would it make sense to have a firsttime local variable
> > > > > initialized to true, to keep the schedule_timeout_interruptible() at
> > > > > the beginning of the loop, but skip it on the first pass through the loop?
> > > > > 
> > > > > Don't get me wrong, what you have looks functionally correct, but
> > > > > duplicating the condition might cause problems later on, for example,
> > > > > should a bug fix be needed in the condition.

I agree with your suggestions, and Steven's patch is better.

> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index 68fa19a5e7bd..c6df9fa916cf 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -796,13 +796,22 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  		 * holdouts.  When the list is empty, we are done.
> >  		 */
> >  		lastreport = jiffies;
> > -		while (!list_empty(&rcu_tasks_holdouts)) {
> > +		for (;;) {
> >  			bool firstreport;
> >  			bool needreport;
> >  			int rtst;
> >  			struct task_struct *t1;
> > +			int fract = 15;

Shouldn't this assignment be done outside the loop? The variable is
re-initialized to 15 on every iteration, so fract never decays and the
backoff never happens.

A program like this doesn't terminate:

#include <stdio.h>

int main(void) {
	for (;;) {
		int i = 10;	/* re-initialized to 10 on every pass */
		if (!(i--))
			break;	/* never taken: i is always 10 here */
	}

	return 0;
}

Otherwise looks good to me. I would initialize fract to 10 so it's consistent
with the "HZ/10" used in other parts of the code, but I'm OK with either
number.

thanks!

 - Joel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] rcu: Use better variable names in funnel locking loop
  2018-05-23 19:23     ` Paul E. McKenney
@ 2018-05-24  0:54       ` Joel Fernandes
  2018-05-24  1:27         ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Joel Fernandes @ 2018-05-24  0:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Boqun Feng, byungchul.park,
	Ingo Molnar, Josh Triplett, kernel-team, Lai Jiangshan,
	Mathieu Desnoyers, Peter Zilstra, Steven Rostedt

On Wed, May 23, 2018 at 12:23:49PM -0700, Paul E. McKenney wrote:
> On Wed, May 23, 2018 at 09:06:17AM -0700, Paul E. McKenney wrote:
> > On Tue, May 22, 2018 at 11:38:14PM -0700, Joel Fernandes wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > The funnel locking loop in rcu_start_this_gp uses rcu_root as a
> > > temporary variable while walking the combining tree. This causes a
> > > tiresome exercise of a code reader reminding themselves that rcu_root
> > > may not be root. Lets just call it rnp, and rename other variables as
> > > well to be more appropriate.
> > > 
> > > Original patch: https://patchwork.kernel.org/patch/10396577/
> > > 
> > > Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > I used to have double Signed-off-by back when I was seconded to Linaro.
> > But I am guessing that you want the second and don't need the first
> > one.  Unless you tell me otherwise, I will remove the first one on
> > my next rebase.
> > 
> > Anyway, the new variable names are much more clear, good stuff,
> > queued for further review and testing, thank you!
> 
> And it looks to me like I should fold in the patchlet below to change to
> rnp_start in a comment.  Please let me know if this would mess things up.

Yes, missed that. Sorry. It looks great, thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] rcu: Use better variable names in funnel locking loop
  2018-05-24  0:54       ` Joel Fernandes
@ 2018-05-24  1:27         ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-24  1:27 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Joel Fernandes, linux-kernel, Boqun Feng, byungchul.park,
	Ingo Molnar, Josh Triplett, kernel-team, Lai Jiangshan,
	Mathieu Desnoyers, Peter Zilstra, Steven Rostedt

On Wed, May 23, 2018 at 05:54:50PM -0700, Joel Fernandes wrote:
> On Wed, May 23, 2018 at 12:23:49PM -0700, Paul E. McKenney wrote:
> > On Wed, May 23, 2018 at 09:06:17AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 22, 2018 at 11:38:14PM -0700, Joel Fernandes wrote:
> > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > 
> > > > The funnel locking loop in rcu_start_this_gp uses rcu_root as a
> > > > temporary variable while walking the combining tree. This causes a
> > > > tiresome exercise of a code reader reminding themselves that rcu_root
> > > > may not be root. Lets just call it rnp, and rename other variables as
> > > > well to be more appropriate.
> > > > 
> > > > Original patch: https://patchwork.kernel.org/patch/10396577/
> > > > 
> > > > Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > 
> > > I used to have double Signed-off-by back when I was seconded to Linaro.
> > > But I am guessing that you want the second and don't need the first
> > > one.  Unless you tell me otherwise, I will remove the first one on
> > > my next rebase.
> > > 
> > > Anyway, the new variable names are much more clear, good stuff,
> > > queued for further review and testing, thank you!
> > 
> > And it looks to me like I should fold in the patchlet below to change to
> > rnp_start in a comment.  Please let me know if this would mess things up.
> 
> Yes, missed that. Sorry. It looks great, thanks!

Done!  And not a problem -- this is after all why we do reviews.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-24  0:51             ` Joel Fernandes
@ 2018-05-24  1:35               ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-05-24  1:35 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Joel Fernandes, linux-kernel, Peter Zilstra,
	Ingo Molnar, Boqun Feng, byungchul.park, kernel-team,
	Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, 23 May 2018 17:51:19 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> Shouldn't this assignment be done outside the loop? The variable is
> re-initialized to 15 on every iteration, so fract never decays and the
> backoff never happens.
> 
> A program like this doesn't terminate:
> 
> #include <stdio.h>
> 
> int main(void) {
> 	for (;;) {
> 		int i = 10;	/* re-initialized to 10 on every pass */
> 		if (!(i--))
> 			break;	/* never taken: i is always 10 here */
> 	}
> 
> 	return 0;
> }

Hey, it compiled, booted and tested. SHIP IT! ;-)

But yeah, a little egg on my face for that one. I blame it on being the
first code I wrote right after coming back from vacation.

> 
> Otherwise looks good to me, I would initialize fract to 10 so its consistent
> with "HZ/10" in other parts of the code but I'm ok with either number.

I was thinking about that, but for some reason I picked 15. Not sure
why. I'm fine with dropping it down to 10. I'll send out a proper patch
tomorrow.
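
For what it's worth, with fract starting at 10 the per-pass waits work out
to the following (assuming HZ=1000 purely for illustration):

	pass:  1      2      3      4      5      6      7      8      9      10+
	wait:  HZ/10  HZ/9   HZ/8   HZ/7   HZ/6   HZ/5   HZ/4   HZ/3   HZ/2   HZ
	ms:    100    111    125    143    167    200    250    333    500    1000

That's roughly 1.9 seconds of cumulative sleeping across the first nine
passes before it settles back into the old once-a-second cadence.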

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23 20:04           ` Paul E. McKenney
  2018-05-23 21:51             ` Joel Fernandes
  2018-05-24  0:51             ` Joel Fernandes
@ 2018-05-24 21:47             ` Steven Rostedt
  2018-05-24 22:38               ` Paul E. McKenney
  2 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-05-24 21:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zilstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Wed, 23 May 2018 13:04:58 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Looks plausible to me!
> 

BTW looking at the code:

> 		/* Invoke the callbacks. */
> 		while (list) {
> 			next = list->next;
> 			local_bh_disable();
> 			list->func(list);
> 			local_bh_enable();
> 			list = next;
> 			cond_resched();
> 		}
> 		schedule_timeout_uninterruptible(HZ/10);

What's the purpose of this final sleep?

-- Steve

> 	}
> }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-24 21:47             ` Steven Rostedt
@ 2018-05-24 22:38               ` Paul E. McKenney
  2018-05-24 22:42                 ` Steven Rostedt
  2018-07-17  9:11                 ` [tip:core/rcu] rcu: Add comment to the last sleep in the rcu tasks loop tip-bot for Steven Rostedt (VMware)
  0 siblings, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2018-05-24 22:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zilstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Thu, May 24, 2018 at 05:47:52PM -0400, Steven Rostedt wrote:
> On Wed, 23 May 2018 13:04:58 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > Looks plausible to me!
> > 
> 
> BTW looking at the code:
> 
> > 		/* Invoke the callbacks. */
> > 		while (list) {
> > 			next = list->next;
> > 			local_bh_disable();
> > 			list->func(list);
> > 			local_bh_enable();
> > 			list = next;
> > 			cond_resched();
> > 		}
> > 		schedule_timeout_uninterruptible(HZ/10);
> 
> What's the purpose of this final sleep?

To avoid any possibility of rcu_tasks_kthread() getting into a tight loop.
Even if something goes wrong with the callback lists, that unconditional
sleep guarantees at least one HZ/10 nap per trip through the outer loop.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks
  2018-05-24 22:38               ` Paul E. McKenney
@ 2018-05-24 22:42                 ` Steven Rostedt
  2018-07-17  9:11                 ` [tip:core/rcu] rcu: Add comment to the last sleep in the rcu tasks loop tip-bot for Steven Rostedt (VMware)
  1 sibling, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-05-24 22:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Joel Fernandes (Google),
	Peter Zilstra, Ingo Molnar, Boqun Feng, byungchul.park,
	kernel-team, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers

On Thu, 24 May 2018 15:38:39 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Thu, May 24, 2018 at 05:47:52PM -0400, Steven Rostedt wrote:
> > On Wed, 23 May 2018 13:04:58 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > Looks plausible to me!
> > >   
> > 
> > BTW looking at the code:
> >   
> > > 		/* Invoke the callbacks. */
> > > 		while (list) {
> > > 			next = list->next;
> > > 			local_bh_disable();
> > > 			list->func(list);
> > > 			local_bh_enable();
> > > 			list = next;
> > > 			cond_resched();
> > > 		}
> > > 		schedule_timeout_uninterruptible(HZ/10);  
> > 
> > What's the purpose of this final sleep?  
> 
> To avoid any possibility of rcu_tasks_kthread() getting into a tight loop.
> 

OK, so I'll send out a v4 of what I just sent out ;-)

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Speed up calling of RCU tasks callbacks
  2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
  2018-05-23 15:57   ` Paul E. McKenney
@ 2018-07-17  9:11   ` tip-bot for Steven Rostedt (VMware)
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Steven Rostedt (VMware) @ 2018-07-17  9:11 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, rostedt, mingo, paulmck, tglx, joel, linux-kernel

Commit-ID:  c03be752d39dc64dcfda0ac8ce87fb10b1ee5621
Gitweb:     https://git.kernel.org/tip/c03be752d39dc64dcfda0ac8ce87fb10b1ee5621
Author:     Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate: Thu, 24 May 2018 18:49:46 -0400
Committer:  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Thu, 12 Jul 2018 15:39:21 -0700

rcu: Speed up calling of RCU tasks callbacks

Joel Fernandes found that the synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:

 # cd /sys/kernel/tracing
 # while [ 1 ]; do x=1; done &
 # echo '__schedule_bug:traceon' > set_ftrace_filter
 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m1.064s
user	0m0.000s
sys	0m0.004s

Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.

After discussion we came up with a simple way to wait for holdouts but
increase the time for each iteration of the loop but no more than a
full second.

With the new patch we have:

 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m0.131s
user	0m0.000s
sys	0m0.004s

Which drops it down to 13% of what the original wait time was.

Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 5783bdf86e5a..4c7c49c106ee 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -668,6 +668,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	struct rcu_head *list;
 	struct rcu_head *next;
 	LIST_HEAD(rcu_tasks_holdouts);
+	int fract;
 
 	/* Run on housekeeping CPUs by default.  Sysadm can move if desired. */
 	housekeeping_affine(current, HK_FLAG_RCU);
@@ -749,13 +750,25 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 * holdouts.  When the list is empty, we are done.
 		 */
 		lastreport = jiffies;
-		while (!list_empty(&rcu_tasks_holdouts)) {
+
+		/* Start off with HZ/10 wait and slowly back off to 1 HZ wait */
+		fract = 10;
+
+		for (;;) {
 			bool firstreport;
 			bool needreport;
 			int rtst;
 			struct task_struct *t1;
 
-			schedule_timeout_interruptible(HZ);
+			if (list_empty(&rcu_tasks_holdouts))
+				break;
+
+			/* Slowly back off waiting for holdouts */
+			schedule_timeout_interruptible(HZ/fract);
+
+			if (fract > 1)
+				fract--;
+
 			rtst = READ_ONCE(rcu_task_stall_timeout);
 			needreport = rtst > 0 &&
 				     time_after(jiffies, lastreport + rtst);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip:core/rcu] rcu: Add comment to the last sleep in the rcu tasks loop
  2018-05-24 22:38               ` Paul E. McKenney
  2018-05-24 22:42                 ` Steven Rostedt
@ 2018-07-17  9:11                 ` tip-bot for Steven Rostedt (VMware)
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Steven Rostedt (VMware) @ 2018-07-17  9:11 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: paulmck, mingo, tglx, hpa, linux-kernel, rostedt

Commit-ID:  cd23ac8ddb7be993f88bee893b89a8b4971c3651
Gitweb:     https://git.kernel.org/tip/cd23ac8ddb7be993f88bee893b89a8b4971c3651
Author:     Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate: Thu, 24 May 2018 18:58:16 -0400
Committer:  Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CommitDate: Thu, 12 Jul 2018 15:39:21 -0700

rcu: Add comment to the last sleep in the rcu tasks loop

At the end of rcu_tasks_kthread() there's a lonely
schedule_timeout_uninterruptible() call with no apparent rationale for
its existence. But there is. It is to keep the thread from going into
a tight loop if there's some anomaly. That really needs a comment.

Link: http://lkml.kernel.org/r/20180524223839.GU3803@linux.vnet.ibm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcu/update.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c7c49c106ee..39cb23d22109 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -814,6 +814,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			list = next;
 			cond_resched();
 		}
+		/* Paranoid sleep to keep this from entering a tight loop */
 		schedule_timeout_uninterruptible(HZ/10);
 	}
 }

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-07-17  9:11 UTC | newest]

Thread overview: 24+ messages
2018-05-23  6:38 [PATCH 0/4] cleanups, fixes for rcu/dev Joel Fernandes
2018-05-23  6:38 ` [PATCH 1/4] rcu: Speed up calling of RCU tasks callbacks Joel Fernandes
2018-05-23 15:57   ` Paul E. McKenney
2018-05-23 16:45     ` Steven Rostedt
2018-05-23 17:03       ` Paul E. McKenney
2018-05-23 19:13         ` Steven Rostedt
2018-05-23 20:04           ` Paul E. McKenney
2018-05-23 21:51             ` Joel Fernandes
2018-05-24  0:51             ` Joel Fernandes
2018-05-24  1:35               ` Steven Rostedt
2018-05-24 21:47             ` Steven Rostedt
2018-05-24 22:38               ` Paul E. McKenney
2018-05-24 22:42                 ` Steven Rostedt
2018-07-17  9:11                 ` [tip:core/rcu] rcu: Add comment to the last sleep in the rcu tasks loop tip-bot for Steven Rostedt (VMware)
2018-07-17  9:11   ` [tip:core/rcu] rcu: Speed up calling of RCU tasks callbacks tip-bot for Steven Rostedt (VMware)
2018-05-23  6:38 ` [PATCH 2/4] rcu: Add comment documenting how rcu_seq_snap works Joel Fernandes
2018-05-23 16:04   ` Paul E. McKenney
2018-05-23  6:38 ` [PATCH 3/4] rcu: Use better variable names in funnel locking loop Joel Fernandes
2018-05-23 16:06   ` Paul E. McKenney
2018-05-23 19:23     ` Paul E. McKenney
2018-05-24  0:54       ` Joel Fernandes
2018-05-24  1:27         ` Paul E. McKenney
2018-05-23  6:38 ` [PATCH 4/4] rcu: Identify grace period is in progress as we advance up the tree Joel Fernandes
2018-05-23 16:06   ` Paul E. McKenney
