linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu()
@ 2016-12-15  2:41 Boqun Feng
  2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland

v1:

https://marc.info/?l=linux-kernel&m=148127336205765

Changes since v1:

*	Rename the primitive to for_each_leaf_node_cpu() for a shorter and
	hopefully better name

*	Fix a bug report by Colin King about bit shifting

*	Drop iterator @bit based on suggestions from Mark Rutland

*	Add a WARN_ON_ONCE() for !cpu_possible(cpu).

Based on -rcu's rcu/dev branch, commit 278c19fe5bc7 "rcu: Adjust FQS offline
checks for exact online-CPU detection". Tested by 0day and rcutorture on a
12-cpu host, the results of rcutorture are all fine except:

TREE07 ------- 7655 grace periods (4.25278 per second)
WARNING: BAD SEQ 3901:3901 last:7652 version 61
    /home/boqun/linux/tools/testing/selftests/rcutorture/res/2016.12.14-13:17:10/TREE07/console.log
WARNING: Assertion failure in /home/boqun/linux/tools/testing/selftests/rcutorture/res/2016.12.14-13:17:10/TREE07/console.log
WARNING: Summary: Stalls: 4
CPU count limited from 16 to 12

The same WARNING could be reproduced without this patchset. Paul, given the
sentence "CPU count limited from 16 to 12", is this just a problem for my
setup? Maybe I should descrease the SMP numbers or increase
rcutorture.stat_interval?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
@ 2016-12-15  2:42 ` Boqun Feng
  2016-12-15 11:43   ` Mark Rutland
  2016-12-15 15:21   ` [RFC v2.1 " Boqun Feng
  2016-12-15  2:42 ` [RFC v2 2/5] rcu: Use for_each_leaf_node_cpu() in RCU stall checking Boqun Feng
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

There are some places inside RCU core, where we need to iterate all mask
(->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
corresponding CPUs. The current code iterates all possible CPUs in this
leaf node and then checks with the mask to see whether the bit is set.

However, given the fact that most bits in cpu_possible_mask are set but
rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
its friends are usually more sparse than cpu_possible_mask), it's better
to iterate in the other way, that is iterating mask bits in a leaf node.
By doing so, we can save several checks in the loop, moreover, that fast
path checking(e.g. ->qsmask == 0) could then be consolidated into the
loop logic.

This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
more efficient way.

By design, The CPUs whose bits are set in the leaf node masks should be
a subset of possible CPUs, so we don't need extra check with
cpu_possible(), however, a WARN_ON_ONCE() is put in the loop to check
whether there are some nasty cases we miss.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c0a4bf8f1ed0..70ef44a082e0 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -295,6 +295,22 @@ struct rcu_node {
 	     cpu <= rnp->grphi; \
 	     cpu = cpumask_next((cpu), cpu_possible_mask))
 
+
+#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
+/*
+ * Iterate over all CPUs a leaf RCU node which are still masked in
+ * @mask.
+ *
+ * Note @rnp has to be a leaf node and @mask has to belong to @rnp. And we
+ * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
+ * masks of a leaf node never set a bit for an "impossible" CPU.
+ */
+#define for_each_leaf_node_cpu(rnp, mask, cpu) \
+	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
+	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \
+	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
+						  (cpu) - (rnp)->grplo + 1))
+
 /*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2 2/5] rcu: Use for_each_leaf_node_cpu() in RCU stall checking
  2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
  2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
@ 2016-12-15  2:42 ` Boqun Feng
  2016-12-15  2:42 ` [RFC v2 3/5] rcu: Use for_each_leaf_node_cpu() in ->expmask iteration Boqun Feng
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

Use for_each_leaf_node_cpu() in RCU stall checking code, to save some
extra checks, based on the fact that ->qsmask is mostly more sparse than
cpu_possible_mask.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b56e2dc31ba5..4e5b81c843de 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1417,10 +1417,11 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		for_each_leaf_node_possible_cpu(rnp, cpu)
-			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
-				if (!trigger_single_cpu_backtrace(cpu))
-					dump_cpu_task(cpu);
+
+		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
+			if (!trigger_single_cpu_backtrace(cpu))
+				dump_cpu_task(cpu);
+
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 }
@@ -1490,13 +1491,12 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
-		if (rnp->qsmask != 0) {
-			for_each_leaf_node_possible_cpu(rnp, cpu)
-				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
-					print_cpu_stall_info(rsp, cpu);
-					ndetected++;
-				}
+
+		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu) {
+			print_cpu_stall_info(rsp, cpu);
+			ndetected++;
 		}
+
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2 3/5] rcu: Use for_each_leaf_node_cpu() in ->expmask iteration
  2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
  2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
  2016-12-15  2:42 ` [RFC v2 2/5] rcu: Use for_each_leaf_node_cpu() in RCU stall checking Boqun Feng
@ 2016-12-15  2:42 ` Boqun Feng
  2016-12-15  2:42 ` [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp() Boqun Feng
  2016-12-15  2:42 ` [RFC v2 5/5] rcu: Use for_each_leaf_node_cpu() in online CPU iteration Boqun Feng
  4 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

The ->expmask of an RCU leaf node should be more sparse than the
corresponding part of cpu_possible_mask, iterating over ->expmask bitmap
rather cpu_possible_mask to save some checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_exp.h | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index a3a8756670d1..c4b3c8d01941 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -419,7 +419,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	int cpu;
 	unsigned long jiffies_stall;
 	unsigned long jiffies_start;
-	unsigned long mask;
 	int ndetected;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
@@ -444,12 +443,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
 			ndetected += rcu_print_task_exp_stall(rnp);
-			for_each_leaf_node_possible_cpu(rnp, cpu) {
+			for_each_leaf_node_cpu(rnp, rnp->expmask, cpu) {
 				struct rcu_data *rdp;
 
-				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
-					continue;
 				ndetected++;
 				rdp = per_cpu_ptr(rsp->rda, cpu);
 				pr_cont(" %d-%c%c%c", cpu,
@@ -475,14 +471,10 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			}
 			pr_cont("\n");
 		}
-		rcu_for_each_leaf_node(rsp, rnp) {
-			for_each_leaf_node_possible_cpu(rnp, cpu) {
-				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
-					continue;
+		rcu_for_each_leaf_node(rsp, rnp)
+			for_each_leaf_node_cpu(rnp, rnp->expmask, cpu)
 				dump_cpu_task(cpu);
-			}
-		}
+
 		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
 	}
 }
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
                   ` (2 preceding siblings ...)
  2016-12-15  2:42 ` [RFC v2 3/5] rcu: Use for_each_leaf_node_cpu() in ->expmask iteration Boqun Feng
@ 2016-12-15  2:42 ` Boqun Feng
  2016-12-15 12:04   ` Mark Rutland
  2016-12-15  2:42 ` [RFC v2 5/5] rcu: Use for_each_leaf_node_cpu() in online CPU iteration Boqun Feng
  4 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

->qsmask of an RCU leaf node is usually more sparse than the
corresponding cpu_possible_mask. So replace the
for_each_leaf_node_possible_cpu() in force_qs_rnp() with
for_each_leaf_node_cpu() to save several checks.

[Note we need to use "1UL << bit" instead of "1 << bit" to generate the
corresponding mask for a bit because @mask is unsigned long, this was
spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
a previous version of this patch.]

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4e5b81c843de..1ef13e63bc95 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		for_each_leaf_node_possible_cpu(rnp, cpu) {
-			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
-			if ((rnp->qsmask & bit) != 0) {
-				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
-					mask |= bit;
-			}
-		}
+
+		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
+			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
+				mask |= leaf_node_cpu_bit(rnp, cpu);
+
 		if (mask != 0) {
 			/* Idle/offline CPUs, report (releases rnp->lock. */
 			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2 5/5] rcu: Use for_each_leaf_node_cpu() in online CPU iteration
  2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
                   ` (3 preceding siblings ...)
  2016-12-15  2:42 ` [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp() Boqun Feng
@ 2016-12-15  2:42 ` Boqun Feng
  4 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15  2:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

Though mostly identical, ->qsmaskinit(A.K.A rcu_rnp_online_cpus()) is
sometimes more sparse than the corresponding part of cpu_possible_mask
for an RCU leaf node. So we use for_each_leaf_node_cpu() in
rcu_boost_kthread_setaffinity() instead to save some extra checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 69c6eb27c37f..93cebd940a93 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1175,10 +1175,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		return;
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
-	for_each_leaf_node_possible_cpu(rnp, cpu)
-		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
-		    cpu != outgoingcpu)
+
+	for_each_leaf_node_cpu(rnp, mask, cpu)
+		if (cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
+
 	if (cpumask_weight(cm) == 0)
 		cpumask_setall(cm);
 	set_cpus_allowed_ptr(t, cm);
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
@ 2016-12-15 11:43   ` Mark Rutland
  2016-12-15 14:38     ` Boqun Feng
  2016-12-15 15:21   ` [RFC v2.1 " Boqun Feng
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2016-12-15 11:43 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

On Thu, Dec 15, 2016 at 10:42:00AM +0800, Boqun Feng wrote:
> There are some places inside RCU core, where we need to iterate all mask
> (->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
> corresponding CPUs. The current code iterates all possible CPUs in this
> leaf node and then checks with the mask to see whether the bit is set.
> 
> However, given the fact that most bits in cpu_possible_mask are set but
> rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
> its friends are usually more sparse than cpu_possible_mask), it's better
> to iterate in the other way, that is iterating mask bits in a leaf node.
> By doing so, we can save several checks in the loop, moreover, that fast
> path checking(e.g. ->qsmask == 0) could then be consolidated into the
> loop logic.
> 
> This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
> more efficient way.
> 
> By design, The CPUs whose bits are set in the leaf node masks should be
> a subset of possible CPUs, so we don't need extra check with
> cpu_possible(), however, a WARN_ON_ONCE() is put in the loop to check
> whether there are some nasty cases we miss.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/rcu/tree.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index c0a4bf8f1ed0..70ef44a082e0 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -295,6 +295,22 @@ struct rcu_node {
>  	     cpu <= rnp->grphi; \
>  	     cpu = cpumask_next((cpu), cpu_possible_mask))
>  
> +
> +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> +/*
> + * Iterate over all CPUs a leaf RCU node which are still masked in
> + * @mask.
> + *
> + * Note @rnp has to be a leaf node and @mask has to belong to @rnp.

Not a big deal, but perhaps it's worth enforcing this? If we took just
the name of the mask here, (e.g. qsmask rather than rnp->qsmask), we
could have the macro always use (rnp)->(mask). That would also make the
invocations shorter.

> And we
> + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> + * masks of a leaf node never set a bit for an "impossible" CPU.
> + */
> +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> +	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \

If this happens, we'll exit the loop. If there are any reamining
possible CPUs, we'll skip them, which would be less than ideal.

I guess this shouldn't happen anyway, but it might be worth continuing.

> +	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
> +						  (cpu) - (rnp)->grplo + 1))
> +

I was going to ask if that + 1 was correct, but I see that it is!

So FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>


I had a go at handling my comments above, but I'm not sure it's any
better:

#define cpu_to_grp(rnp, cpu)	((cpu) - (rnp)->grplo)

#define grp_to_cpu(rnp, cpu)	((cpu) + (rnp)->grplo)

#define node_first_cpu(rnp, mask) \
	grp_to_cpu(find_first_bit(&(rnp)->mask, MASK_BITS((rnp)->mask)))

#define node_next_cpu(rnp, mask, cpu)
	grp_to_cpu(rnp, find_next_bit(&(rnp)->mask, MASK_BITS((rnp)->mask),
				      cpu_to_grp(rnp, cpu) + 1))

#define for_each_leaf_node_cpu(rnp, mask, cpu) \
	for ((cpu) = node_first_cpu(rnp, mask); \
	     (cpu) <= (rnp)->grphi; \
	     (cpu) = node_next_cpu(rnp, mask, cpu)) \
		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
			continue; \
		else

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-15  2:42 ` [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp() Boqun Feng
@ 2016-12-15 12:04   ` Mark Rutland
  2016-12-15 14:42     ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2016-12-15 12:04 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> ->qsmask of an RCU leaf node is usually more sparse than the
> corresponding cpu_possible_mask. So replace the
> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> for_each_leaf_node_cpu() to save several checks.
> 
> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> corresponding mask for a bit because @mask is unsigned long, this was
> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> a previous version of this patch.]

Nit: This note can go now that we use leaf_node_cpu_bit(). ;)

Thanks,
Mark.

> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/rcu/tree.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 4e5b81c843de..1ef13e63bc95 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
>  				continue;
>  			}
>  		}
> -		for_each_leaf_node_possible_cpu(rnp, cpu) {
> -			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> -			if ((rnp->qsmask & bit) != 0) {
> -				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> -					mask |= bit;
> -			}
> -		}
> +
> +		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
> +			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> +				mask |= leaf_node_cpu_bit(rnp, cpu);
> +
>  		if (mask != 0) {
>  			/* Idle/offline CPUs, report (releases rnp->lock. */
>  			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
> -- 
> 2.10.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15 11:43   ` Mark Rutland
@ 2016-12-15 14:38     ` Boqun Feng
  2016-12-15 15:10       ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-15 14:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

[-- Attachment #1: Type: text/plain, Size: 4800 bytes --]

On Thu, Dec 15, 2016 at 11:43:52AM +0000, Mark Rutland wrote:
> On Thu, Dec 15, 2016 at 10:42:00AM +0800, Boqun Feng wrote:
> > There are some places inside RCU core, where we need to iterate all mask
> > (->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
> > corresponding CPUs. The current code iterates all possible CPUs in this
> > leaf node and then checks with the mask to see whether the bit is set.
> > 
> > However, given the fact that most bits in cpu_possible_mask are set but
> > rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
> > its friends are usually more sparse than cpu_possible_mask), it's better
> > to iterate in the other way, that is iterating mask bits in a leaf node.
> > By doing so, we can save several checks in the loop, moreover, that fast
> > path checking(e.g. ->qsmask == 0) could then be consolidated into the
> > loop logic.
> > 
> > This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
> > more efficient way.
> > 
> > By design, The CPUs whose bits are set in the leaf node masks should be
> > a subset of possible CPUs, so we don't need extra check with
> > cpu_possible(), however, a WARN_ON_ONCE() is put in the loop to check
> > whether there are some nasty cases we miss.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  kernel/rcu/tree.h | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index c0a4bf8f1ed0..70ef44a082e0 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -295,6 +295,22 @@ struct rcu_node {
> >  	     cpu <= rnp->grphi; \
> >  	     cpu = cpumask_next((cpu), cpu_possible_mask))
> >  
> > +
> > +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> > +/*
> > + * Iterate over all CPUs a leaf RCU node which are still masked in
> > + * @mask.
> > + *
> > + * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
> 
> Not a big deal, but perhaps it's worth enforcing this? If we took just
> the name of the mask here, (e.g. qsmask rather than rnp->qsmask), we
> could have the macro always use (rnp)->(mask). That would also make the
> invocations shorter.
> 

I thought about this approach, but there may be some cases it seems
inappropriate, see patch #5, passing "qsmaskinitnext" directly to the
for_each_leaf_node_cpu() might be OK, but it just break another
abstraction layer which rcu_rnp_online_cpus() provides.

> > And we
> > + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> > + * masks of a leaf node never set a bit for an "impossible" CPU.
> > + */
> > +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> > +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> > +	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \
> 
> If this happens, we'll exit the loop. If there are any reamining
> possible CPUs, we'll skip them, which would be less than ideal.
> 
> I guess this shouldn't happen anyway, but it might be worth continuing.
> 

I chose to break if we met impossible only because I wanted to avoid
using that "if(...) else" trick in an iteration macro ;-)

I don't know whether this is the first time something like this is
brought into kernel, so I'm kinda hesitating to bring this in. But seems
I got you as one supporter ;-)

Certainly, skip is better than stop.

> > +	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
> > +						  (cpu) - (rnp)->grplo + 1))
> > +
> 
> I was going to ask if that + 1 was correct, but I see that it is!
> 
> So FWIW:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 

Thanks ;-)

> 
> I had a go at handling my comments above, but I'm not sure it's any
> better:
> 
> #define cpu_to_grp(rnp, cpu)	((cpu) - (rnp)->grplo)
> 
> #define grp_to_cpu(rnp, cpu)	((cpu) + (rnp)->grplo)
> 
> #define node_first_cpu(rnp, mask) \
> 	grp_to_cpu(find_first_bit(&(rnp)->mask, MASK_BITS((rnp)->mask)))
> 
> #define node_next_cpu(rnp, mask, cpu)
> 	grp_to_cpu(rnp, find_next_bit(&(rnp)->mask, MASK_BITS((rnp)->mask),
> 				      cpu_to_grp(rnp, cpu) + 1))
> 

I tried something similar, but it seems bringing too many abstraction
layers just for one macro. I basically follow the rule: if the potential
users are less than three, no need to do abstraction ;-)

But thank you for looking into this ;-)

Regards,
Boqun

> #define for_each_leaf_node_cpu(rnp, mask, cpu) \
> 	for ((cpu) = node_first_cpu(rnp, mask); \
> 	     (cpu) <= (rnp)->grphi; \
> 	     (cpu) = node_next_cpu(rnp, mask, cpu)) \
> 		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
> 			continue; \
> 		else
> 
> Thanks,
> Mark.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-15 12:04   ` Mark Rutland
@ 2016-12-15 14:42     ` Boqun Feng
  2016-12-15 14:51       ` Colin Ian King
  0 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-15 14:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]

On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > ->qsmask of an RCU leaf node is usually more sparse than the
> > corresponding cpu_possible_mask. So replace the
> > for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > for_each_leaf_node_cpu() to save several checks.
> > 
> > [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > corresponding mask for a bit because @mask is unsigned long, this was
> > spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > a previous version of this patch.]
> 
> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> 

I kinda keep this here for honoring the effort of finding out this bug
from Colin, but yes, it's no longer needed here for the current code.

Regards,
Boqun

> Thanks,
> Mark.
> 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  kernel/rcu/tree.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 4e5b81c843de..1ef13e63bc95 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
> >  				continue;
> >  			}
> >  		}
> > -		for_each_leaf_node_possible_cpu(rnp, cpu) {
> > -			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> > -			if ((rnp->qsmask & bit) != 0) {
> > -				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> > -					mask |= bit;
> > -			}
> > -		}
> > +
> > +		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
> > +			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> > +				mask |= leaf_node_cpu_bit(rnp, cpu);
> > +
> >  		if (mask != 0) {
> >  			/* Idle/offline CPUs, report (releases rnp->lock. */
> >  			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
> > -- 
> > 2.10.2
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-15 14:42     ` Boqun Feng
@ 2016-12-15 14:51       ` Colin Ian King
  2016-12-19 15:15         ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Colin Ian King @ 2016-12-15 14:51 UTC (permalink / raw)
  To: Boqun Feng, Mark Rutland
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan


[-- Attachment #1.1: Type: text/plain, Size: 2027 bytes --]

On 15/12/16 14:42, Boqun Feng wrote:
> On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
>> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
>>> ->qsmask of an RCU leaf node is usually more sparse than the
>>> corresponding cpu_possible_mask. So replace the
>>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
>>> for_each_leaf_node_cpu() to save several checks.
>>>
>>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
>>> corresponding mask for a bit because @mask is unsigned long, this was
>>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
>>> a previous version of this patch.]
>>
>> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
>>
> 
> I kinda keep this here for honoring the effort of finding out this bug
> from Colin, but yes, it's no longer needed here for the current code.

Yep, remove it.

> 
> Regards,
> Boqun
> 
>> Thanks,
>> Mark.
>>
>>> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
>>> ---
>>>  kernel/rcu/tree.c | 12 +++++-------
>>>  1 file changed, 5 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index 4e5b81c843de..1ef13e63bc95 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
>>>  				continue;
>>>  			}
>>>  		}
>>> -		for_each_leaf_node_possible_cpu(rnp, cpu) {
>>> -			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
>>> -			if ((rnp->qsmask & bit) != 0) {
>>> -				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
>>> -					mask |= bit;
>>> -			}
>>> -		}
>>> +
>>> +		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
>>> +			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
>>> +				mask |= leaf_node_cpu_bit(rnp, cpu);
>>> +
>>>  		if (mask != 0) {
>>>  			/* Idle/offline CPUs, report (releases rnp->lock. */
>>>  			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
>>> -- 
>>> 2.10.2
>>>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 837 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15 14:38     ` Boqun Feng
@ 2016-12-15 15:10       ` Mark Rutland
  2016-12-15 15:14         ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2016-12-15 15:10 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

On Thu, Dec 15, 2016 at 10:38:58PM +0800, Boqun Feng wrote:
> On Thu, Dec 15, 2016 at 11:43:52AM +0000, Mark Rutland wrote:
> > On Thu, Dec 15, 2016 at 10:42:00AM +0800, Boqun Feng wrote:
> > > +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> > > +/*
> > > + * Iterate over all CPUs a leaf RCU node which are still masked in
> > > + * @mask.
> > > + *
> > > + * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
> > 
> > Not a big deal, but perhaps it's worth enforcing this? If we took just
> > the name of the mask here, (e.g. qsmask rather than rnp->qsmask), we
> > could have the macro always use (rnp)->(mask). That would also make the
> > invocations shorter.
> 
> I thought about this approach, but there may be some cases it seems
> inappropriate, see patch #5, passing "qsmaskinitnext" directly to the
> for_each_leaf_node_cpu() might be OK, but it just break another
> abstraction layer which rcu_rnp_online_cpus() provides.

I had missed that. Given that, not enforcingi t makes sense to me.

> > > And we
> > > + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> > > + * masks of a leaf node never set a bit for an "impossible" CPU.
> > > + */
> > > +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> > > +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> > > +	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \
> > 
> > If this happens, we'll exit the loop. If there are any reamining
> > possible CPUs, we'll skip them, which would be less than ideal.
> > 
> > I guess this shouldn't happen anyway, but it might be worth continuing.
> > 
> 
> I chose to break if we met impossible only because I wanted to avoid
> using that "if(...) else" trick in an iteration macro ;-)

Understandable. ;)

> I don't know whether this is the first time something like this is
> brought into kernel, so I'm kinda hesitating to bring this in. But seems
> I got you as one supporter ;-)
> 
> Certainly, skip is better than stop.

>From a quick look around, I found at least a few instances of the pattern. e.g.

include/linux/cpufreq.h:

#define cpufreq_for_each_valid_entry(pos, table)                        \
        for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++)   \
                if (pos->frequency == CPUFREQ_ENTRY_INVALID)            \
                        continue;                                       \
                else

tools/perf/util/build-id.c:
#define dsos__for_each_with_build_id(pos, head)      \
     list_for_each_entry(pos, head, node)    \
             if (!pos->has_build_id)         \
                     continue;               \
             else

Some drivers, like drivers/net/ethernet/broadcom/bnx2x/bnx2x.h really love it!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15 15:10       ` Mark Rutland
@ 2016-12-15 15:14         ` Boqun Feng
  0 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-15 15:14 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

[-- Attachment #1: Type: text/plain, Size: 3153 bytes --]

On Thu, Dec 15, 2016 at 03:10:15PM +0000, Mark Rutland wrote:
> On Thu, Dec 15, 2016 at 10:38:58PM +0800, Boqun Feng wrote:
> > On Thu, Dec 15, 2016 at 11:43:52AM +0000, Mark Rutland wrote:
> > > On Thu, Dec 15, 2016 at 10:42:00AM +0800, Boqun Feng wrote:
> > > > +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> > > > +/*
> > > > + * Iterate over all CPUs a leaf RCU node which are still masked in
> > > > + * @mask.
> > > > + *
> > > > + * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
> > > 
> > > Not a big deal, but perhaps it's worth enforcing this? If we took just
> > > the name of the mask here, (e.g. qsmask rather than rnp->qsmask), we
> > > could have the macro always use (rnp)->(mask). That would also make the
> > > invocations shorter.
> > 
> > I thought about this approach, but there may be some cases it seems
> > inappropriate, see patch #5, passing "qsmaskinitnext" directly to the
> > for_each_leaf_node_cpu() might be OK, but it just break another
> > abstraction layer which rcu_rnp_online_cpus() provides.
> 
> I had missed that. Given that, not enforcingi t makes sense to me.
> 
> > > > And we
> > > > + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> > > > + * masks of a leaf node never set a bit for an "impossible" CPU.
> > > > + */
> > > > +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> > > > +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> > > > +	     (cpu) <= (rnp)->grphi && !WARN_ON_ONCE(!cpu_possible(cpu)); \
> > > 
> > > If this happens, we'll exit the loop. If there are any reamining
> > > possible CPUs, we'll skip them, which would be less than ideal.
> > > 
> > > I guess this shouldn't happen anyway, but it might be worth continuing.
> > > 
> > 
> > I chose to break if we met impossible only because I wanted to avoid
> > using that "if(...) else" trick in an iteration macro ;-)
> 
> Understandable. ;)
> 
> > I don't know whether this is the first time something like this is
> > brought into kernel, so I'm kinda hesitating to bring this in. But seems
> > I got you as one supporter ;-)
> > 
> > Certainly, skip is better than stop.
> 
> From a quick look around, I found at least a few instances of the pattern. e.g.
> 
> include/linux/cpufreq.h:
> 
> #define cpufreq_for_each_valid_entry(pos, table)                        \
>         for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++)   \
>                 if (pos->frequency == CPUFREQ_ENTRY_INVALID)            \
>                         continue;                                       \
>                 else
> 
> tools/perf/util/build-id.c:
> #define dsos__for_each_with_build_id(pos, head)      \
>      list_for_each_entry(pos, head, node)    \
>              if (!pos->has_build_id)         \
>                      continue;               \
>              else
> 
> Some drivers, like drivers/net/ethernet/broadcom/bnx2x/bnx2x.h really love it!
> 

Thank you!
So no reason we don't use it, let me send an updated patch ;-)

Regards,
Boqun

> Thanks,
> Mark.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC v2.1 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
  2016-12-15 11:43   ` Mark Rutland
@ 2016-12-15 15:21   ` Boqun Feng
  2016-12-15 15:29     ` Mark Rutland
  1 sibling, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-15 15:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King, Mark Rutland, Boqun Feng

There are some places inside RCU core, where we need to iterate all mask
(->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
corresponding CPUs. The current code iterates all possible CPUs in this
leaf node and then checks with the mask to see whether the bit is set.

However, given the fact that most bits in cpu_possible_mask are set but
rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
its friends are usually more sparse than cpu_possible_mask), it's better
to iterate in the other way, that is iterating mask bits in a leaf node.
By doing so, we can save several checks in the loop, moreover, that fast
path checking(e.g. ->qsmask == 0) could then be consolidated into the
loop logic.

This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
more efficient way.

By design, The CPUs whose bits are set in the leaf node masks should be
a subset of possible CPUs, so we don't need extra check with
cpu_possible(), however, a WARN_ON_ONCE() is put to check whether there
are some nasty cases we miss, and we skip that "impossible" CPU in that
case.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c0a4bf8f1ed0..b35da5b5dab1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -295,6 +295,25 @@ struct rcu_node {
 	     cpu <= rnp->grphi; \
 	     cpu = cpumask_next((cpu), cpu_possible_mask))
 
+
+#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
+/*
+ * Iterate over all CPUs a leaf RCU node which are still masked in
+ * @mask.
+ *
+ * Note @rnp has to be a leaf node and @mask has to belong to @rnp. And we
+ * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
+ * masks of a leaf node never set a bit for an "impossible" CPU.
+ */
+#define for_each_leaf_node_cpu(rnp, mask, cpu) \
+	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
+	     (cpu) <= (rnp)->grphi; \
+	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
+						  (cpu) - (rnp)->grplo + 1)) \
+		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
+			continue; \
+		else
+
 /*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2.1 1/5] rcu: Introduce for_each_leaf_node_cpu()
  2016-12-15 15:21   ` [RFC v2.1 " Boqun Feng
@ 2016-12-15 15:29     ` Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2016-12-15 15:29 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Colin King

On Thu, Dec 15, 2016 at 11:21:05PM +0800, Boqun Feng wrote:
> There are some places inside RCU core, where we need to iterate all mask
> (->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
> corresponding CPUs. The current code iterates all possible CPUs in this
> leaf node and then checks with the mask to see whether the bit is set.
> 
> However, given the fact that most bits in cpu_possible_mask are set but
> rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
> its friends are usually more sparse than cpu_possible_mask), it's better
> to iterate in the other way, that is iterating mask bits in a leaf node.
> By doing so, we can save several checks in the loop, moreover, that fast
> path checking(e.g. ->qsmask == 0) could then be consolidated into the
> loop logic.
> 
> This patch introduce for_each_leaf_node_cpu() to iterate mask bits in a
> more efficient way.
> 
> By design, The CPUs whose bits are set in the leaf node masks should be
> a subset of possible CPUs, so we don't need extra check with
> cpu_possible(), however, a WARN_ON_ONCE() is put to check whether there
> are some nasty cases we miss, and we skip that "impossible" CPU in that
> case.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  kernel/rcu/tree.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index c0a4bf8f1ed0..b35da5b5dab1 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -295,6 +295,25 @@ struct rcu_node {
>  	     cpu <= rnp->grphi; \
>  	     cpu = cpumask_next((cpu), cpu_possible_mask))
>  
> +
> +#define MASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
> +/*
> + * Iterate over all CPUs a leaf RCU node which are still masked in
> + * @mask.
> + *
> + * Note @rnp has to be a leaf node and @mask has to belong to @rnp. And we
> + * assume that no CPU is masked in @mask but not set in cpu_possible_mask. IOW,
> + * masks of a leaf node never set a bit for an "impossible" CPU.
> + */
> +#define for_each_leaf_node_cpu(rnp, mask, cpu) \
> +	for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
> +	     (cpu) <= (rnp)->grphi; \
> +	     (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
> +						  (cpu) - (rnp)->grplo + 1)) \
> +		if (WARN_ON_ONCE(!cpu_possible(cpu))) \
> +			continue; \
> +		else
> +
>  /*
>   * Union to allow "aggregate OR" operation on the need for a quiescent
>   * state by the normal and expedited grace periods.
> -- 
> 2.10.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-15 14:51       ` Colin Ian King
@ 2016-12-19 15:15         ` Boqun Feng
  2016-12-20  5:09           ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-19 15:15 UTC (permalink / raw)
  To: Colin Ian King
  Cc: Mark Rutland, linux-kernel, Paul E . McKenney ,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> On 15/12/16 14:42, Boqun Feng wrote:
> > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> >>> ->qsmask of an RCU leaf node is usually more sparse than the
> >>> corresponding cpu_possible_mask. So replace the
> >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> >>> for_each_leaf_node_cpu() to save several checks.
> >>>
> >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> >>> corresponding mask for a bit because @mask is unsigned long, this was
> >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> >>> a previous version of this patch.]
> >>
> >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> >>
> > 
> > I kinda keep this here for honoring the effort of finding out this bug
> > from Colin, but yes, it's no longer needed here for the current code.
> 
> Yep, remove it.
> 

Paul, here is a modified version of this patch, what I only did is
removing this note.

Besides I rebased the whole series on the current rcu/dev branch of -rcu
tree, on this very commit:

	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")

And I put the latest version at

git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node

If you thought it's better, I could send a v3 ;-)

Regards,
Boqun

------------------------>8
From: Boqun Feng <boqun.feng@gmail.com>
Date: Thu, 8 Dec 2016 23:21:11 +0800
Subject: [PATCH v2.1 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()

->qsmask of an RCU leaf node is usually more sparse than the
corresponding cpu_possible_mask. So replace the
for_each_leaf_node_possible_cpu() in force_qs_rnp() with
for_each_leaf_node_cpu() to save several checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4ea4496f4ecc..c2b753fb7f09 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		for_each_leaf_node_possible_cpu(rnp, cpu) {
-			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
-			if ((rnp->qsmask & bit) != 0) {
-				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
-					mask |= bit;
-			}
-		}
+
+		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
+			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
+				mask |= leaf_node_cpu_bit(rnp, cpu);
+
 		if (mask != 0) {
 			/* Idle/offline CPUs, report (releases rnp->lock. */
 			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-19 15:15         ` Boqun Feng
@ 2016-12-20  5:09           ` Paul E. McKenney
  2016-12-20  5:59             ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2016-12-20  5:09 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > On 15/12/16 14:42, Boqun Feng wrote:
> > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > >>> corresponding cpu_possible_mask. So replace the
> > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > >>> for_each_leaf_node_cpu() to save several checks.
> > >>>
> > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > >>> a previous version of this patch.]
> > >>
> > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > >>
> > > 
> > > I kinda keep this here for honoring the effort of finding out this bug
> > > from Colin, but yes, it's no longer needed here for the current code.
> > 
> > Yep, remove it.
> > 
> 
> Paul, here is a modified version of this patch, what I only did is
> removing this note.
> 
> Besides I rebased the whole series on the current rcu/dev branch of -rcu
> tree, on this very commit:
> 
> 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> 
> And I put the latest version at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> 
> If you thought it's better, I could send a v3 ;-)

I would feel better about this patchset if it reduced the number of lines
of code rather than increasing them.  That said, part of the increase
is a commment.  Still, I am not convinced that the extra level of macro
is carrying its weight.

dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")

	The commit log needs a bit of wordsmithing.

	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
	What is its purpose, really?  What does its triggering tell you?
	What other checks did you consider as an alternative?

	And if you are going to add checks of this type, should you
	also check for this being a leaf rcu_node structure?

3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")

	This does look a bit nicer, but why the added blank lines?
	Are they really helping?

	The commit log seems a bit misplaced.  This code is almost never
	executed (once per 21 seconds at the most), so performance really
	isn't a consideration.	The simpler-looking code might be.

fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")

	Ditto on blank lines.

	Again, this code is executed per expedited grace period, so
	performance really isn't a big deal.  More of a big deal than
	the stall-warning code, but we still are way off of any fastpath.

69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")

	Ditto again on blank lines.

	And on the commit log.  This code is executed about once
	per several jiffies, and on larger machines, per 20 jiffies
	or so.  Performance really isn't a consideration.

7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")

	And another ditto on blank lines.

	This code executes once per CPU-hotplug operation, so again isn't
	at all performance critical.

In short, if you are trying to sell this to me as a significant performance
boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
though perhaps I am misunderstanding its purpose.  My assumption is
that you want to detect missing UL suffixes on bitmask constants, in
which case I bet there is a better way.

Speaking of which, how do we know that this is free of bugs?

							Thanx, Paul

> Regards,
> Boqun
> 
> ------------------------>8
> From: Boqun Feng <boqun.feng@gmail.com>
> Date: Thu, 8 Dec 2016 23:21:11 +0800
> Subject: [PATCH v2.1 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
> 
> ->qsmask of an RCU leaf node is usually more sparse than the
> corresponding cpu_possible_mask. So replace the
> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> for_each_leaf_node_cpu() to save several checks.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/rcu/tree.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 4ea4496f4ecc..c2b753fb7f09 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
>  				continue;
>  			}
>  		}
> -		for_each_leaf_node_possible_cpu(rnp, cpu) {
> -			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> -			if ((rnp->qsmask & bit) != 0) {
> -				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> -					mask |= bit;
> -			}
> -		}
> +
> +		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
> +			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> +				mask |= leaf_node_cpu_bit(rnp, cpu);
> +
>  		if (mask != 0) {
>  			/* Idle/offline CPUs, report (releases rnp->lock. */
>  			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
> -- 
> 2.10.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-20  5:09           ` Paul E. McKenney
@ 2016-12-20  5:59             ` Boqun Feng
  2016-12-20  8:11               ` Boqun Feng
  2016-12-20 15:23               ` Paul E. McKenney
  0 siblings, 2 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-20  5:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 7467 bytes --]

On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > >>> corresponding cpu_possible_mask. So replace the
> > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > >>> for_each_leaf_node_cpu() to save several checks.
> > > >>>
> > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > >>> a previous version of this patch.]
> > > >>
> > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > >>
> > > > 
> > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > from Colin, but yes, it's no longer needed here for the current code.
> > > 
> > > Yep, remove it.
> > > 
> > 
> > Paul, here is a modified version of this patch, what I only did is
> > removing this note.
> > 
> > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > tree, on this very commit:
> > 
> > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > 
> > And I put the latest version at
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > 
> > If you thought it's better, I could send a v3 ;-)
> 
> I would feel better about this patchset if it reduced the number of lines
> of code rather than increasing them.  That said, part of the increase
> is a commment.  Still, I am not convinced that the extra level of macro
> is carrying its weight.
> 
> dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> 
> 	The commit log needs a bit of wordsmithing.
> 
> 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> 	What is its purpose, really?  What does its triggering tell you?
> 	What other checks did you consider as an alternative?
> 

The check is an over-case one, it's introduced because I'm worried about
some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
an "impossible" CPU. I will explanation later in more details.

> 	And if you are going to add checks of this type, should you
> 	also check for this being a leaf rcu_node structure?
> 

I don't think I want to check that, and I don't think check
cpu_possible(cpu) in the macro is similar to that.

> 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> 
> 	This does look a bit nicer, but why the added blank lines?
> 	Are they really helping?
> 
> 	The commit log seems a bit misplaced.  This code is almost never
> 	executed (once per 21 seconds at the most), so performance really
> 	isn't a consideration.	The simpler-looking code might be.
> 
> fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> 
> 	Ditto on blank lines.
> 
> 	Again, this code is executed per expedited grace period, so
> 	performance really isn't a big deal.  More of a big deal than
> 	the stall-warning code, but we still are way off of any fastpath.
> 
> 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> 
> 	Ditto again on blank lines.
> 
> 	And on the commit log.  This code is executed about once
> 	per several jiffies, and on larger machines, per 20 jiffies
> 	or so.  Performance really isn't a consideration.
> 
> 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> 
> 	And another ditto on blank lines.
> 
> 	This code executes once per CPU-hotplug operation, so again isn't
> 	at all performance critical.
> 
> In short, if you are trying to sell this to me as a significant performance
> boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,

Yep, it won't help the performance a lot, but it 

1)	helps the performance in theory, because it iterates less CPUs

2)	makes code cleaner. By "cleaner", I mean we can a) affort more
	blank lines to make loops separated from other code and b)
	descrease the indent levels for those loops. But, yes I should
	add those points in the commit log, because those are more
	visible effects.

> though perhaps I am misunderstanding its purpose.  My assumption is
> that you want to detect missing UL suffixes on bitmask constants, in
> which case I bet there is a better way.
> 

The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
constatns, and we don't need to check that, because we use
leaf_node_cpu_id() now. As I said, this is an over-case check, and we
can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
(with offset fixed by ->grplo) of cpu_possible_mask.

Hmm.. and I just check the code, the initial values of ->qsmask* and
->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
latter two are set in rcu_cpu_starting() since commit

	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")

, and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
must be a possible CPU, IIRC.

But this brings a side question, is the callsite of rcu_cpu_starting()
is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
set _this_ cpu's bit in a leaf node?

Regards,
Boqun

> Speaking of which, how do we know that this is free of bugs?
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > ------------------------>8
> > From: Boqun Feng <boqun.feng@gmail.com>
> > Date: Thu, 8 Dec 2016 23:21:11 +0800
> > Subject: [PATCH v2.1 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
> > 
> > ->qsmask of an RCU leaf node is usually more sparse than the
> > corresponding cpu_possible_mask. So replace the
> > for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > for_each_leaf_node_cpu() to save several checks.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  kernel/rcu/tree.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 4ea4496f4ecc..c2b753fb7f09 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3046,13 +3046,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
> >  				continue;
> >  			}
> >  		}
> > -		for_each_leaf_node_possible_cpu(rnp, cpu) {
> > -			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
> > -			if ((rnp->qsmask & bit) != 0) {
> > -				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> > -					mask |= bit;
> > -			}
> > -		}
> > +
> > +		for_each_leaf_node_cpu(rnp, rnp->qsmask, cpu)
> > +			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
> > +				mask |= leaf_node_cpu_bit(rnp, cpu);
> > +
> >  		if (mask != 0) {
> >  			/* Idle/offline CPUs, report (releases rnp->lock. */
> >  			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
> > -- 
> > 2.10.2
> > 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-20  5:59             ` Boqun Feng
@ 2016-12-20  8:11               ` Boqun Feng
  2016-12-20 15:32                 ` Paul E. McKenney
  2016-12-20 15:23               ` Paul E. McKenney
  1 sibling, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-20  8:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Sebastian Andrzej Siewior, Ingo Molnar,
	Peter Zijlstra

On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > >>> corresponding cpu_possible_mask. So replace the
> > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > >>>
> > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > > >>> a previous version of this patch.]
> > > > >>
> > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > >>
> > > > > 
> > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > 
> > > > Yep, remove it.
> > > > 
> > > 
> > > Paul, here is a modified version of this patch, what I only did is
> > > removing this note.
> > > 
> > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > tree, on this very commit:
> > > 
> > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > 
> > > And I put the latest version at
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > 
> > > If you thought it's better, I could send a v3 ;-)
> > 
> > I would feel better about this patchset if it reduced the number of lines
> > of code rather than increasing them.  That said, part of the increase
> > is a commment.  Still, I am not convinced that the extra level of macro
> > is carrying its weight.
> > 
> > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > 
> > 	The commit log needs a bit of wordsmithing.
> > 
> > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > 	What is its purpose, really?  What does its triggering tell you?
> > 	What other checks did you consider as an alternative?
> > 
> 
> The check is an over-case one, it's introduced because I'm worried about
> some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> an "impossible" CPU. I will explanation later in more details.
> 
> > 	And if you are going to add checks of this type, should you
> > 	also check for this being a leaf rcu_node structure?
> > 
> 
> I don't think I want to check that, and I don't think check
> cpu_possible(cpu) in the macro is similar to that.
> 
> > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > 
> > 	This does look a bit nicer, but why the added blank lines?
> > 	Are they really helping?
> > 
> > 	The commit log seems a bit misplaced.  This code is almost never
> > 	executed (once per 21 seconds at the most), so performance really
> > 	isn't a consideration.	The simpler-looking code might be.
> > 
> > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > 
> > 	Ditto on blank lines.
> > 
> > 	Again, this code is executed per expedited grace period, so
> > 	performance really isn't a big deal.  More of a big deal than
> > 	the stall-warning code, but we still are way off of any fastpath.
> > 
> > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > 
> > 	Ditto again on blank lines.
> > 
> > 	And on the commit log.  This code is executed about once
> > 	per several jiffies, and on larger machines, per 20 jiffies
> > 	or so.  Performance really isn't a consideration.
> > 
> > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > 
> > 	And another ditto on blank lines.
> > 
> > 	This code executes once per CPU-hotplug operation, so again isn't
> > 	at all performance critical.
> > 
> > In short, if you are trying to sell this to me as a significant performance
> > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> 
> Yep, it won't help the performance a lot, but it 
> 
> 1)	helps the performance in theory, because it iterates less CPUs
> 
> 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> 	blank lines to make loops separated from other code and b)
> 	descrease the indent levels for those loops. But, yes I should
> 	add those points in the commit log, because those are more
> 	visible effects.
> 
> > though perhaps I am misunderstanding its purpose.  My assumption is
> > that you want to detect missing UL suffixes on bitmask constants, in
> > which case I bet there is a better way.
> > 
> 
> The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> constatns, and we don't need to check that, because we use
> leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> (with offset fixed by ->grplo) of cpu_possible_mask.
> 
> Hmm.. and I just check the code, the initial values of ->qsmask* and
> ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> latter two are set in rcu_cpu_starting() since commit
> 
> 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> 
> , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> must be a possible CPU, IIRC.
> 
> But this brings a side question, is the callsite of rcu_cpu_starting()
> is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only

By "callsite", I mean we call rcu_cpu_starting() in a
for_each_online_cpu() loop. And that doesn't seem making sense to me,
because rcu_cpu_starting() doesn't use its parameter @cpu. So I made the
following untested patch to fix this.

Thoughts?

> set _this_ cpu's bit in a leaf node?
> 

Regards,
Boqun

-------------------------------->8
From: Boqun Feng <boqun.feng@gmail.com>
Date: Tue, 20 Dec 2016 15:10:57 +0800
Subject: [PATCH] rcu: Rename rcu_cpu_starting() to rcu_this_cpu_starting()

rcu_cpu_starting() was introduced at commit:

	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")

, and was to inform RCU core the onlining of _this_ cpu, and it was
implemented as its purpose, which made the parameter @cpu useless.

It's better if we remove the unnecessary parameter and rename it to
rcu_this_cpu_starting(), which fits its functionality well. Besides, in
rcu_init(), we actually loop over all online CPUs but keep notifying
that the boot cpu is online to RCU core, so we'd better pull the
notification part out of the loop.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/rcupdate.h |  2 +-
 kernel/cpu.c             |  2 +-
 kernel/rcu/tree.c        | 17 ++++++++---------
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 813074714a95..f23c9dafbda9 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -335,7 +335,7 @@ void rcu_sched_qs(void);
 void rcu_bh_qs(void);
 void rcu_check_callbacks(int user);
 void rcu_report_dead(unsigned int cpu);
-void rcu_cpu_starting(unsigned int cpu);
+void rcu_this_cpu_starting(void);
 
 #ifndef CONFIG_TINY_RCU
 void rcu_end_inkernel_boot(void);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5df20d6d1520..63778ed6b598 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -966,7 +966,7 @@ void notify_cpu_starting(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
 
-	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
+	rcu_this_cpu_starting();	/* Enables RCU usage on this CPU. */
 	while (st->state < target) {
 		st->state++;
 		cpuhp_invoke_callback(cpu, st->state, true, NULL);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b9d3c0e30935..c5862aef7e21 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4002,13 +4002,13 @@ int rcutree_dead_cpu(unsigned int cpu)
 }
 
 /*
- * Mark the specified CPU as being online so that subsequent grace periods
- * (both expedited and normal) will wait on it.  Note that this means that
- * incoming CPUs are not allowed to use RCU read-side critical sections
- * until this function is called.  Failing to observe this restriction
- * will result in lockdep splats.
+ * Mark this CPU(CPU that is currently running this function) as being online
+ * so that subsequent grace periods (both expedited and normal) will wait on
+ * it.  Note that this means that incoming CPUs are not allowed to use RCU
+ * read-side critical sections until this function is called.  Failing to
+ * observe this restriction will result in lockdep splats.
  */
-void rcu_cpu_starting(unsigned int cpu)
+void rcu_this_cpu_starting(void)
 {
 	unsigned long flags;
 	unsigned long mask;
@@ -4376,10 +4376,9 @@ void __init rcu_init(void)
 	 * or the scheduler are operational.
 	 */
 	pm_notifier(rcu_pm_notify, 0);
-	for_each_online_cpu(cpu) {
+	for_each_online_cpu(cpu)
 		rcutree_prepare_cpu(cpu);
-		rcu_cpu_starting(cpu);
-	}
+	rcu_this_cpu_starting(); /* Start RCU on the booting CPU */
 }
 
 #include "tree_exp.h"
-- 
2.10.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-20  5:59             ` Boqun Feng
  2016-12-20  8:11               ` Boqun Feng
@ 2016-12-20 15:23               ` Paul E. McKenney
  2016-12-21  2:34                 ` Boqun Feng
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2016-12-20 15:23 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > >>> corresponding cpu_possible_mask. So replace the
> > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > >>>
> > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > > >>> a previous version of this patch.]
> > > > >>
> > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > >>
> > > > > 
> > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > 
> > > > Yep, remove it.
> > > > 
> > > 
> > > Paul, here is a modified version of this patch, what I only did is
> > > removing this note.
> > > 
> > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > tree, on this very commit:
> > > 
> > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > 
> > > And I put the latest version at
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > 
> > > If you thought it's better, I could send a v3 ;-)
> > 
> > I would feel better about this patchset if it reduced the number of lines
> > of code rather than increasing them.  That said, part of the increase
> > is a commment.  Still, I am not convinced that the extra level of macro
> > is carrying its weight.
> > 
> > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > 
> > 	The commit log needs a bit of wordsmithing.
> > 
> > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > 	What is its purpose, really?  What does its triggering tell you?
> > 	What other checks did you consider as an alternative?
> 
> The check is an over-case one, it's introduced because I'm worried about
> some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> an "impossible" CPU. I will explanation later in more details.

Over-case check?

> > 	And if you are going to add checks of this type, should you
> > 	also check for this being a leaf rcu_node structure?
> 
> I don't think I want to check that, and I don't think check
> cpu_possible(cpu) in the macro is similar to that.

If we are adding checks, they should be catching bugs.  This is of
course a trade-off -- too many checks makes the code less readable
and makes it more difficult to change the code.  Too few checks
makes bugs harder to pin down.

At this point, I don't really see the need for either check.  ;-)

> > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > 
> > 	This does look a bit nicer, but why the added blank lines?
> > 	Are they really helping?
> > 
> > 	The commit log seems a bit misplaced.  This code is almost never
> > 	executed (once per 21 seconds at the most), so performance really
> > 	isn't a consideration.	The simpler-looking code might be.
> > 
> > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > 
> > 	Ditto on blank lines.
> > 
> > 	Again, this code is executed per expedited grace period, so
> > 	performance really isn't a big deal.  More of a big deal than
> > 	the stall-warning code, but we still are way off of any fastpath.
> > 
> > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > 
> > 	Ditto again on blank lines.
> > 
> > 	And on the commit log.  This code is executed about once
> > 	per several jiffies, and on larger machines, per 20 jiffies
> > 	or so.  Performance really isn't a consideration.
> > 
> > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > 
> > 	And another ditto on blank lines.
> > 
> > 	This code executes once per CPU-hotplug operation, so again isn't
> > 	at all performance critical.
> > 
> > In short, if you are trying to sell this to me as a significant performance
> > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> 
> Yep, it won't help the performance a lot, but it 
> 
> 1)	helps the performance in theory, because it iterates less CPUs
> 
> 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> 	blank lines to make loops separated from other code and b)
> 	descrease the indent levels for those loops. But, yes I should
> 	add those points in the commit log, because those are more
> 	visible effects.

#2 is the more important of the two, though you still have not
convinced me that those particular blank lines are helping.  Making
these functions longer isn't necessarily a good thing.

> > though perhaps I am misunderstanding its purpose.  My assumption is
> > that you want to detect missing UL suffixes on bitmask constants, in
> > which case I bet there is a better way.
> 
> The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> constatns, and we don't need to check that, because we use
> leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> (with offset fixed by ->grplo) of cpu_possible_mask.

In which case it is not needed.

> Hmm.. and I just check the code, the initial values of ->qsmask* and
> ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> latter two are set in rcu_cpu_starting() since commit
> 
> 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> 
> , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> must be a possible CPU, IIRC.
> 
> But this brings a side question, is the callsite of rcu_cpu_starting()
> is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> set _this_ cpu's bit in a leaf node?

The calls from notify_cpu_starting() are called from the various
start_kernel_secondary(), secondary_start_kernel(), and similarly
named functions.  These are called on the incoming CPU early in that
CPU's execution.  The call from rcu_init() is correct until such time
as more than one CPU can be running at rcu_init() time.  And that
day might be coming, so please see the untested patch below.

							Thanx, Paul

------------------------------------------------------------------------

commit 1e84402587173d6d4da8645689f0e24c877b3269
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Dec 20 07:17:58 2016 -0800

    rcu: Make rcu_cpu_starting() use its "cpu" argument
    
    The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
    incoming CPU's rcu_data structure.  This works for the boot CPU and for
    all CPUs onlined after rcu_init() executes (during very early boot).
    Currently, this is the full set of CPUs, so all is well.  But if
    anyone ever parallelizes boot before rcu_init() time, it will fail.
    This commit therefore substitutes the rcu_cpu_starting() function's
    this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
    (arguably) improving readability.
    
    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b9d3c0e30935..083cb8a6299c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
 	struct rcu_state *rsp;
 
 	for_each_rcu_flavor(rsp) {
-		rdp = this_cpu_ptr(rsp->rda);
+		rdp = per_cpu_ptr(rsp->rda, cpu);
 		rnp = rdp->mynode;
 		mask = rdp->grpmask;
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-20  8:11               ` Boqun Feng
@ 2016-12-20 15:32                 ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2016-12-20 15:32 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Thomas Gleixner, Sebastian Andrzej Siewior, Ingo Molnar,
	Peter Zijlstra

On Tue, Dec 20, 2016 at 04:11:51PM +0800, Boqun Feng wrote:
> On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> > On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > > >>> corresponding cpu_possible_mask. So replace the
> > > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > > >>>
> > > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > > > >>> a previous version of this patch.]
> > > > > >>
> > > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > > >>
> > > > > > 
> > > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > > 
> > > > > Yep, remove it.
> > > > > 
> > > > 
> > > > Paul, here is a modified version of this patch, what I only did is
> > > > removing this note.
> > > > 
> > > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > > tree, on this very commit:
> > > > 
> > > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > > 
> > > > And I put the latest version at
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > > 
> > > > If you thought it's better, I could send a v3 ;-)
> > > 
> > > I would feel better about this patchset if it reduced the number of lines
> > > of code rather than increasing them.  That said, part of the increase
> > > is a commment.  Still, I am not convinced that the extra level of macro
> > > is carrying its weight.
> > > 
> > > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > > 
> > > 	The commit log needs a bit of wordsmithing.
> > > 
> > > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > > 	What is its purpose, really?  What does its triggering tell you?
> > > 	What other checks did you consider as an alternative?
> > > 
> > 
> > The check is an over-case one, it's introduced because I'm worried about
> > some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> > an "impossible" CPU. I will explanation later in more details.
> > 
> > > 	And if you are going to add checks of this type, should you
> > > 	also check for this being a leaf rcu_node structure?
> > > 
> > 
> > I don't think I want to check that, and I don't think check
> > cpu_possible(cpu) in the macro is similar to that.
> > 
> > > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > > 
> > > 	This does look a bit nicer, but why the added blank lines?
> > > 	Are they really helping?
> > > 
> > > 	The commit log seems a bit misplaced.  This code is almost never
> > > 	executed (once per 21 seconds at the most), so performance really
> > > 	isn't a consideration.	The simpler-looking code might be.
> > > 
> > > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > > 
> > > 	Ditto on blank lines.
> > > 
> > > 	Again, this code is executed per expedited grace period, so
> > > 	performance really isn't a big deal.  More of a big deal than
> > > 	the stall-warning code, but we still are way off of any fastpath.
> > > 
> > > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > > 
> > > 	Ditto again on blank lines.
> > > 
> > > 	And on the commit log.  This code is executed about once
> > > 	per several jiffies, and on larger machines, per 20 jiffies
> > > 	or so.  Performance really isn't a consideration.
> > > 
> > > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > > 
> > > 	And another ditto on blank lines.
> > > 
> > > 	This code executes once per CPU-hotplug operation, so again isn't
> > > 	at all performance critical.
> > > 
> > > In short, if you are trying to sell this to me as a significant performance
> > > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> > 
> > Yep, it won't help the performance a lot, but it 
> > 
> > 1)	helps the performance in theory, because it iterates less CPUs
> > 
> > 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> > 	blank lines to make loops separated from other code and b)
> > 	descrease the indent levels for those loops. But, yes I should
> > 	add those points in the commit log, because those are more
> > 	visible effects.
> > 
> > > though perhaps I am misunderstanding its purpose.  My assumption is
> > > that you want to detect missing UL suffixes on bitmask constants, in
> > > which case I bet there is a better way.
> > > 
> > 
> > The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> > constatns, and we don't need to check that, because we use
> > leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> > can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> > must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> > (with offset fixed by ->grplo) of cpu_possible_mask.
> > 
> > Hmm.. and I just check the code, the initial values of ->qsmask* and
> > ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> > latter two are set in rcu_cpu_starting() since commit
> > 
> > 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> > 
> > , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> > a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> > are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> > must be a possible CPU, IIRC.
> > 
> > But this brings a side question, is the callsite of rcu_cpu_starting()
> > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> 
> By "callsite", I mean we call rcu_cpu_starting() in a
> for_each_online_cpu() loop. And that doesn't seem making sense to me,
> because rcu_cpu_starting() doesn't use its parameter @cpu. So I made the
> following untested patch to fix this.
> 
> Thoughts?

This would be a legitimate approach, except that the fast-boot guys
give me some reason for concern.  See my earlier patch substituting
this_cpu_ptr() for per_cpu_ptr().

Coming back to your original patch series, if the check in
for_each_leaf_node_cpu() is removed, the added blank lines are removed,
and we have some heavy-duty validation in place, I am inclined to accept
it.  I am more worried about validation than I might be in other cases.
This is because the main effect of this patch is aesthetics on the one
hand and because of the missing-UL issue in the first submission.

							Thanx, Paul

> > set _this_ cpu's bit in a leaf node?
> > 
> 
> Regards,
> Boqun
> 
> -------------------------------->8
> From: Boqun Feng <boqun.feng@gmail.com>
> Date: Tue, 20 Dec 2016 15:10:57 +0800
> Subject: [PATCH] rcu: Rename rcu_cpu_starting() to rcu_this_cpu_starting()
> 
> rcu_cpu_starting() was introduced at commit:
> 
> 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> 
> , and was to inform RCU core the onlining of _this_ cpu, and it was
> implemented as its purpose, which made the parameter @cpu useless.
> 
> It's better if we remove the unnecessary parameter and rename it to
> rcu_this_cpu_starting(), which fits its functionality well. Besides, in
> rcu_init(), we actually loop over all online CPUs but keep notifying
> that the boot cpu is online to RCU core, so we'd better pull the
> notification part out of the loop.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  include/linux/rcupdate.h |  2 +-
>  kernel/cpu.c             |  2 +-
>  kernel/rcu/tree.c        | 17 ++++++++---------
>  3 files changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 813074714a95..f23c9dafbda9 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -335,7 +335,7 @@ void rcu_sched_qs(void);
>  void rcu_bh_qs(void);
>  void rcu_check_callbacks(int user);
>  void rcu_report_dead(unsigned int cpu);
> -void rcu_cpu_starting(unsigned int cpu);
> +void rcu_this_cpu_starting(void);
> 
>  #ifndef CONFIG_TINY_RCU
>  void rcu_end_inkernel_boot(void);
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5df20d6d1520..63778ed6b598 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -966,7 +966,7 @@ void notify_cpu_starting(unsigned int cpu)
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
> 
> -	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
> +	rcu_this_cpu_starting();	/* Enables RCU usage on this CPU. */
>  	while (st->state < target) {
>  		st->state++;
>  		cpuhp_invoke_callback(cpu, st->state, true, NULL);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b9d3c0e30935..c5862aef7e21 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4002,13 +4002,13 @@ int rcutree_dead_cpu(unsigned int cpu)
>  }
> 
>  /*
> - * Mark the specified CPU as being online so that subsequent grace periods
> - * (both expedited and normal) will wait on it.  Note that this means that
> - * incoming CPUs are not allowed to use RCU read-side critical sections
> - * until this function is called.  Failing to observe this restriction
> - * will result in lockdep splats.
> + * Mark this CPU(CPU that is currently running this function) as being online
> + * so that subsequent grace periods (both expedited and normal) will wait on
> + * it.  Note that this means that incoming CPUs are not allowed to use RCU
> + * read-side critical sections until this function is called.  Failing to
> + * observe this restriction will result in lockdep splats.
>   */
> -void rcu_cpu_starting(unsigned int cpu)
> +void rcu_this_cpu_starting(void)
>  {
>  	unsigned long flags;
>  	unsigned long mask;
> @@ -4376,10 +4376,9 @@ void __init rcu_init(void)
>  	 * or the scheduler are operational.
>  	 */
>  	pm_notifier(rcu_pm_notify, 0);
> -	for_each_online_cpu(cpu) {
> +	for_each_online_cpu(cpu)
>  		rcutree_prepare_cpu(cpu);
> -		rcu_cpu_starting(cpu);
> -	}
> +	rcu_this_cpu_starting(); /* Start RCU on the booting CPU */
>  }
> 
>  #include "tree_exp.h"
> -- 
> 2.10.2
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-20 15:23               ` Paul E. McKenney
@ 2016-12-21  2:34                 ` Boqun Feng
  2016-12-21  3:40                   ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-21  2:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 10038 bytes --]

On Tue, Dec 20, 2016 at 07:23:52AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> > On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > > >>> corresponding cpu_possible_mask. So replace the
> > > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > > >>>
> > > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > > > >>> a previous version of this patch.]
> > > > > >>
> > > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > > >>
> > > > > > 
> > > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > > 
> > > > > Yep, remove it.
> > > > > 
> > > > 
> > > > Paul, here is a modified version of this patch, what I only did is
> > > > removing this note.
> > > > 
> > > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > > tree, on this very commit:
> > > > 
> > > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > > 
> > > > And I put the latest version at
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > > 
> > > > If you thought it's better, I could send a v3 ;-)
> > > 
> > > I would feel better about this patchset if it reduced the number of lines
> > > of code rather than increasing them.  That said, part of the increase
> > > is a commment.  Still, I am not convinced that the extra level of macro
> > > is carrying its weight.
> > > 
> > > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > > 
> > > 	The commit log needs a bit of wordsmithing.
> > > 
> > > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > > 	What is its purpose, really?  What does its triggering tell you?
> > > 	What other checks did you consider as an alternative?
> > 
> > The check is an over-case one, it's introduced because I'm worried about
> > some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> > an "impossible" CPU. I will explanation later in more details.
> 
> Over-case check?
> 

Oops, sorry for the typo, should be "over-care check".

> > > 	And if you are going to add checks of this type, should you
> > > 	also check for this being a leaf rcu_node structure?
> > 
> > I don't think I want to check that, and I don't think check
> > cpu_possible(cpu) in the macro is similar to that.
> 
> If we are adding checks, they should be catching bugs.  This is of
> course a trade-off -- too many checks makes the code less readable
> and makes it more difficult to change the code.  Too few checks
> makes bugs harder to pin down.
> 
> At this point, I don't really see the need for either check.  ;-)
> 

Agreed, my intent is to keep this overcare check for couples of releases
and if no one shoots his/her foot, we can remove it, if not, it
definitely means this part is subtle, and we need to pay more attention
to it, maybe write some regression tests for this particular problem to
help developers avoid it.

This check is supposed to be removed, so I'm not stick to keeping it.

> > > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > > 
> > > 	This does look a bit nicer, but why the added blank lines?
> > > 	Are they really helping?
> > > 
> > > 	The commit log seems a bit misplaced.  This code is almost never
> > > 	executed (once per 21 seconds at the most), so performance really
> > > 	isn't a consideration.	The simpler-looking code might be.
> > > 
> > > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > > 
> > > 	Ditto on blank lines.
> > > 
> > > 	Again, this code is executed per expedited grace period, so
> > > 	performance really isn't a big deal.  More of a big deal than
> > > 	the stall-warning code, but we still are way off of any fastpath.
> > > 
> > > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > > 
> > > 	Ditto again on blank lines.
> > > 
> > > 	And on the commit log.  This code is executed about once
> > > 	per several jiffies, and on larger machines, per 20 jiffies
> > > 	or so.  Performance really isn't a consideration.
> > > 
> > > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > > 
> > > 	And another ditto on blank lines.
> > > 
> > > 	This code executes once per CPU-hotplug operation, so again isn't
> > > 	at all performance critical.
> > > 
> > > In short, if you are trying to sell this to me as a significant performance
> > > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> > 
> > Yep, it won't help the performance a lot, but it 
> > 
> > 1)	helps the performance in theory, because it iterates less CPUs
> > 
> > 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> > 	blank lines to make loops separated from other code and b)
> > 	descrease the indent levels for those loops. But, yes I should
> > 	add those points in the commit log, because those are more
> > 	visible effects.
> 
> #2 is the more important of the two, though you still have not
> convinced me that those particular blank lines are helping.  Making

TBH, blank lines really help me ;-) So my habit is more like adding
blank lines between logical parts like if-else, for, while as more as
possible.

> these functions longer isn't necessarily a good thing.
> 

But you are right, I probably shouldn't introduce more blank lines in
this kind of patchset, after all, this patchset did want to clean the
code a little bit, but adding more blank lines seems like we are making
things more complicated so we need those blank lines to separate logical
parts. So we don't want those blank lines.

> > > though perhaps I am misunderstanding its purpose.  My assumption is
> > > that you want to detect missing UL suffixes on bitmask constants, in
> > > which case I bet there is a better way.
> > 
> > The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> > constatns, and we don't need to check that, because we use
> > leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> > can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> > must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> > (with offset fixed by ->grplo) of cpu_possible_mask.
> 
> In which case it is not needed.
> 
> > Hmm.. and I just check the code, the initial values of ->qsmask* and
> > ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> > latter two are set in rcu_cpu_starting() since commit
> > 
> > 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> > 
> > , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> > a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> > are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> > must be a possible CPU, IIRC.
> > 
> > But this brings a side question, is the callsite of rcu_cpu_starting()
> > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > set _this_ cpu's bit in a leaf node?
> 
> The calls from notify_cpu_starting() are called from the various
> start_kernel_secondary(), secondary_start_kernel(), and similarly
> named functions.  These are called on the incoming CPU early in that
> CPU's execution.  The call from rcu_init() is correct until such time
> as more than one CPU can be running at rcu_init() time.  And that
> day might be coming, so please see the untested patch below.
> 

Looks better than mine ;-)

But do we need to worry that we start rcu on each CPU twice, which may
slow down the boot?

Regards,
Boqun

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 1e84402587173d6d4da8645689f0e24c877b3269
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Dec 20 07:17:58 2016 -0800
> 
>     rcu: Make rcu_cpu_starting() use its "cpu" argument
>     
>     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
>     incoming CPU's rcu_data structure.  This works for the boot CPU and for
>     all CPUs onlined after rcu_init() executes (during very early boot).
>     Currently, this is the full set of CPUs, so all is well.  But if
>     anyone ever parallelizes boot before rcu_init() time, it will fail.
>     This commit therefore substitutes the rcu_cpu_starting() function's
>     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
>     (arguably) improving readability.
>     
>     Reported-by: Boqun Feng <boqun.feng@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b9d3c0e30935..083cb8a6299c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
>  	struct rcu_state *rsp;
>  
>  	for_each_rcu_flavor(rsp) {
> -		rdp = this_cpu_ptr(rsp->rda);
> +		rdp = per_cpu_ptr(rsp->rda, cpu);
>  		rnp = rdp->mynode;
>  		mask = rdp->grpmask;
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-21  2:34                 ` Boqun Feng
@ 2016-12-21  3:40                   ` Paul E. McKenney
  2016-12-21  4:18                     ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2016-12-21  3:40 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Wed, Dec 21, 2016 at 10:34:56AM +0800, Boqun Feng wrote:
> On Tue, Dec 20, 2016 at 07:23:52AM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> > > On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > > > >>> corresponding cpu_possible_mask. So replace the
> > > > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > > > >>>
> > > > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > > > >>> spotted by Colin Ian King <colin.king@canonical.com> and CoverityScan in
> > > > > > >>> a previous version of this patch.]
> > > > > > >>
> > > > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > > > >>
> > > > > > > 
> > > > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > > > 
> > > > > > Yep, remove it.
> > > > > > 
> > > > > 
> > > > > Paul, here is a modified version of this patch, what I only did is
> > > > > removing this note.
> > > > > 
> > > > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > > > tree, on this very commit:
> > > > > 
> > > > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > > > 
> > > > > And I put the latest version at
> > > > > 
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > > > 
> > > > > If you thought it's better, I could send a v3 ;-)
> > > > 
> > > > I would feel better about this patchset if it reduced the number of lines
> > > > of code rather than increasing them.  That said, part of the increase
> > > > is a commment.  Still, I am not convinced that the extra level of macro
> > > > is carrying its weight.
> > > > 
> > > > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > > > 
> > > > 	The commit log needs a bit of wordsmithing.
> > > > 
> > > > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > > > 	What is its purpose, really?  What does its triggering tell you?
> > > > 	What other checks did you consider as an alternative?
> > > 
> > > The check is an over-case one, it's introduced because I'm worried about
> > > some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> > > an "impossible" CPU. I will explanation later in more details.
> > 
> > Over-case check?
> 
> Oops, sorry for the typo, should be "over-care check".
> 
> > > > 	And if you are going to add checks of this type, should you
> > > > 	also check for this being a leaf rcu_node structure?
> > > 
> > > I don't think I want to check that, and I don't think check
> > > cpu_possible(cpu) in the macro is similar to that.
> > 
> > If we are adding checks, they should be catching bugs.  This is of
> > course a trade-off -- too many checks makes the code less readable
> > and makes it more difficult to change the code.  Too few checks
> > makes bugs harder to pin down.
> > 
> > At this point, I don't really see the need for either check.  ;-)
> 
> Agreed, my intent is to keep this overcare check for couples of releases
> and if no one shoots his/her foot, we can remove it, if not, it
> definitely means this part is subtle, and we need to pay more attention
> to it, maybe write some regression tests for this particular problem to
> help developers avoid it.
> 
> This check is supposed to be removed, so I'm not stick to keeping it.

I suggest keeping through validation.  If it triggers during that time,
consider keeping it longer.  If it does not trigger, remove it before
it goes upstream.

> > > > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > > > 
> > > > 	This does look a bit nicer, but why the added blank lines?
> > > > 	Are they really helping?
> > > > 
> > > > 	The commit log seems a bit misplaced.  This code is almost never
> > > > 	executed (once per 21 seconds at the most), so performance really
> > > > 	isn't a consideration.	The simpler-looking code might be.
> > > > 
> > > > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > > > 
> > > > 	Ditto on blank lines.
> > > > 
> > > > 	Again, this code is executed per expedited grace period, so
> > > > 	performance really isn't a big deal.  More of a big deal than
> > > > 	the stall-warning code, but we still are way off of any fastpath.
> > > > 
> > > > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > > > 
> > > > 	Ditto again on blank lines.
> > > > 
> > > > 	And on the commit log.  This code is executed about once
> > > > 	per several jiffies, and on larger machines, per 20 jiffies
> > > > 	or so.  Performance really isn't a consideration.
> > > > 
> > > > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > > > 
> > > > 	And another ditto on blank lines.
> > > > 
> > > > 	This code executes once per CPU-hotplug operation, so again isn't
> > > > 	at all performance critical.
> > > > 
> > > > In short, if you are trying to sell this to me as a significant performance
> > > > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> > > 
> > > Yep, it won't help the performance a lot, but it 
> > > 
> > > 1)	helps the performance in theory, because it iterates less CPUs
> > > 
> > > 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> > > 	blank lines to make loops separated from other code and b)
> > > 	descrease the indent levels for those loops. But, yes I should
> > > 	add those points in the commit log, because those are more
> > > 	visible effects.
> > 
> > #2 is the more important of the two, though you still have not
> > convinced me that those particular blank lines are helping.  Making
> 
> TBH, blank lines really help me ;-) So my habit is more like adding
> blank lines between logical parts like if-else, for, while as more as
> possible.
> 
> > these functions longer isn't necessarily a good thing.
> 
> But you are right, I probably shouldn't introduce more blank lines in
> this kind of patchset, after all, this patchset did want to clean the
> code a little bit, but adding more blank lines seems like we are making
> things more complicated so we need those blank lines to separate logical
> parts. So we don't want those blank lines.

It is indeed hard to see a simplification patch unless diffstat shows
fewer lines...

> > > > though perhaps I am misunderstanding its purpose.  My assumption is
> > > > that you want to detect missing UL suffixes on bitmask constants, in
> > > > which case I bet there is a better way.
> > > 
> > > The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> > > constatns, and we don't need to check that, because we use
> > > leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> > > can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> > > must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> > > (with offset fixed by ->grplo) of cpu_possible_mask.
> > 
> > In which case it is not needed.
> > 
> > > Hmm.. and I just check the code, the initial values of ->qsmask* and
> > > ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> > > latter two are set in rcu_cpu_starting() since commit
> > > 
> > > 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> > > 
> > > , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> > > a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> > > are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> > > must be a possible CPU, IIRC.
> > > 
> > > But this brings a side question, is the callsite of rcu_cpu_starting()
> > > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > > set _this_ cpu's bit in a leaf node?
> > 
> > The calls from notify_cpu_starting() are called from the various
> > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > named functions.  These are called on the incoming CPU early in that
> > CPU's execution.  The call from rcu_init() is correct until such time
> > as more than one CPU can be running at rcu_init() time.  And that
> > day might be coming, so please see the untested patch below.
> 
> Looks better than mine ;-)
> 
> But do we need to worry that we start rcu on each CPU twice, which may
> slow down the boot?

We only start a given CPU once.  The boot CPU at rcu_init() time, and
the rest at CPU-hotplug time.  Unless of course a CPU is later taken
offline, in which case we start it again when it comes back online.

							Thanx, Paul

> Regards,
> Boqun
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Dec 20 07:17:58 2016 -0800
> > 
> >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> >     
> >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> >     all CPUs onlined after rcu_init() executes (during very early boot).
> >     Currently, this is the full set of CPUs, so all is well.  But if
> >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> >     This commit therefore substitutes the rcu_cpu_starting() function's
> >     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
> >     (arguably) improving readability.
> >     
> >     Reported-by: Boqun Feng <boqun.feng@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b9d3c0e30935..083cb8a6299c 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> >  	struct rcu_state *rsp;
> >  
> >  	for_each_rcu_flavor(rsp) {
> > -		rdp = this_cpu_ptr(rsp->rda);
> > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> >  		rnp = rdp->mynode;
> >  		mask = rdp->grpmask;
> >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-21  3:40                   ` Paul E. McKenney
@ 2016-12-21  4:18                     ` Boqun Feng
  2016-12-21 16:48                       ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Boqun Feng @ 2016-12-21  4:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 3832 bytes --]

On Tue, Dec 20, 2016 at 07:40:24PM -0800, Paul E. McKenney wrote:
[...]
> > 
> > Agreed, my intent is to keep this overcare check for couples of releases
> > and if no one shoots his/her foot, we can remove it, if not, it
> > definitely means this part is subtle, and we need to pay more attention
> > to it, maybe write some regression tests for this particular problem to
> > help developers avoid it.
> > 
> > This check is supposed to be removed, so I'm not stick to keeping it.
> 
> I suggest keeping through validation.  If it triggers during that time,
> consider keeping it longer.  If it does not trigger, remove it before
> it goes upstream.
> 

Good point ;-)

[...]
> > > > 
> > > > But this brings a side question, is the callsite of rcu_cpu_starting()
> > > > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > > > set _this_ cpu's bit in a leaf node?
> > > 
> > > The calls from notify_cpu_starting() are called from the various
> > > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > > named functions.  These are called on the incoming CPU early in that
> > > CPU's execution.  The call from rcu_init() is correct until such time
> > > as more than one CPU can be running at rcu_init() time.  And that
> > > day might be coming, so please see the untested patch below.
> > 
> > Looks better than mine ;-)
> > 
> > But do we need to worry that we start rcu on each CPU twice, which may
> > slow down the boot?
> 
> We only start a given CPU once.  The boot CPU at rcu_init() time, and
> the rest at CPU-hotplug time.  Unless of course a CPU is later taken

Confused... we call rcu_cpu_starting() in a for_each_online_cpu() loop
in rcu_init(), so we basically start all online CPUs there after
applying your patch. And all the rest CPUs will get themselves start
again at CPU-hotplug time, right?

Besides, without your patch, we started the boot CPU many times in the
for_each_online_cpu() loop.

Am I missing something subtle?

Regards,
Boqun

> offline, in which case we start it again when it comes back online.
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Date:   Tue Dec 20 07:17:58 2016 -0800
> > > 
> > >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> > >     
> > >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> > >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> > >     all CPUs onlined after rcu_init() executes (during very early boot).
> > >     Currently, this is the full set of CPUs, so all is well.  But if
> > >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> > >     This commit therefore substitutes the rcu_cpu_starting() function's
> > >     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
> > >     (arguably) improving readability.
> > >     
> > >     Reported-by: Boqun Feng <boqun.feng@gmail.com>
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b9d3c0e30935..083cb8a6299c 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > >  	struct rcu_state *rsp;
> > >  
> > >  	for_each_rcu_flavor(rsp) {
> > > -		rdp = this_cpu_ptr(rsp->rda);
> > > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> > >  		rnp = rdp->mynode;
> > >  		mask = rdp->grpmask;
> > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-21  4:18                     ` Boqun Feng
@ 2016-12-21 16:48                       ` Paul E. McKenney
  2016-12-22  1:08                         ` Boqun Feng
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2016-12-21 16:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

On Wed, Dec 21, 2016 at 12:18:08PM +0800, Boqun Feng wrote:
> On Tue, Dec 20, 2016 at 07:40:24PM -0800, Paul E. McKenney wrote:
> [...]
> > > 
> > > Agreed, my intent is to keep this overcare check for couples of releases
> > > and if no one shoots his/her foot, we can remove it, if not, it
> > > definitely means this part is subtle, and we need to pay more attention
> > > to it, maybe write some regression tests for this particular problem to
> > > help developers avoid it.
> > > 
> > > This check is supposed to be removed, so I'm not stick to keeping it.
> > 
> > I suggest keeping through validation.  If it triggers during that time,
> > consider keeping it longer.  If it does not trigger, remove it before
> > it goes upstream.
> 
> Good point ;-)
> 
> [...]
> > > > > 
> > > > > But this brings a side question, is the callsite of rcu_cpu_starting()
> > > > > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > > > > set _this_ cpu's bit in a leaf node?
> > > > 
> > > > The calls from notify_cpu_starting() are called from the various
> > > > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > > > named functions.  These are called on the incoming CPU early in that
> > > > CPU's execution.  The call from rcu_init() is correct until such time
> > > > as more than one CPU can be running at rcu_init() time.  And that
> > > > day might be coming, so please see the untested patch below.
> > > 
> > > Looks better than mine ;-)
> > > 
> > > But do we need to worry that we start rcu on each CPU twice, which may
> > > slow down the boot?
> > 
> > We only start a given CPU once.  The boot CPU at rcu_init() time, and
> > the rest at CPU-hotplug time.  Unless of course a CPU is later taken
> 
> Confused... we call rcu_cpu_starting() in a for_each_online_cpu() loop
> in rcu_init(), so we basically start all online CPUs there after
> applying your patch. And all the rest CPUs will get themselves start
> again at CPU-hotplug time, right?

At rcu_init() time, there is only one online CPU, namely the boot CPU.

Or perhaps your point is that if CPUs come online before rcu_init(), they
might do so via the normal online mechanism.  I don't believe that this
is likely, because the normal online mechanism reaquires the scheduler
be running.  But either way, my hope would be that whoever fires up CPUs
before rcu_init() asks a few questions when they run into bugs.  ;-)

> Besides, without your patch, we started the boot CPU many times in the
> for_each_online_cpu() loop.

That is true.  It is harmless because it just does a group of assignments
repeatedly, and because there is only one CPU and because interrupts
are disabled, this cannot have any effect.  And my fix inadvertently
fixed this issue, didn't it?

So I do need to update the commit log accordingly.  Done!

> Am I missing something subtle?

Given the nature of RCU, the only possible answer I can give to that
question is "probably".  (Hey, you asked!!!)

							Thanx, Paul

> Regards,
> Boqun
> 
> > offline, in which case we start it again when it comes back online.
> > 
> > 							Thanx, Paul
> > 
> > > Regards,
> > > Boqun
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Date:   Tue Dec 20 07:17:58 2016 -0800
> > > > 
> > > >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> > > >     
> > > >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> > > >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> > > >     all CPUs onlined after rcu_init() executes (during very early boot).
> > > >     Currently, this is the full set of CPUs, so all is well.  But if
> > > >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> > > >     This commit therefore substitutes the rcu_cpu_starting() function's
> > > >     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
> > > >     (arguably) improving readability.
> > > >     
> > > >     Reported-by: Boqun Feng <boqun.feng@gmail.com>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b9d3c0e30935..083cb8a6299c 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > >  	struct rcu_state *rsp;
> > > >  
> > > >  	for_each_rcu_flavor(rsp) {
> > > > -		rdp = this_cpu_ptr(rsp->rda);
> > > > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> > > >  		rnp = rdp->mynode;
> > > >  		mask = rdp->grpmask;
> > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > 
> > 
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
  2016-12-21 16:48                       ` Paul E. McKenney
@ 2016-12-22  1:08                         ` Boqun Feng
  0 siblings, 0 replies; 26+ messages in thread
From: Boqun Feng @ 2016-12-22  1:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Colin Ian King, Mark Rutland, linux-kernel, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 5562 bytes --]

On Wed, Dec 21, 2016 at 08:48:45AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 21, 2016 at 12:18:08PM +0800, Boqun Feng wrote:
> > On Tue, Dec 20, 2016 at 07:40:24PM -0800, Paul E. McKenney wrote:
> > [...]
> > > > 
> > > > Agreed, my intent is to keep this overcare check for couples of releases
> > > > and if no one shoots his/her foot, we can remove it, if not, it
> > > > definitely means this part is subtle, and we need to pay more attention
> > > > to it, maybe write some regression tests for this particular problem to
> > > > help developers avoid it.
> > > > 
> > > > This check is supposed to be removed, so I'm not stick to keeping it.
> > > 
> > > I suggest keeping through validation.  If it triggers during that time,
> > > consider keeping it longer.  If it does not trigger, remove it before
> > > it goes upstream.
> > 
> > Good point ;-)
> > 
> > [...]
> > > > > > 
> > > > > > But this brings a side question, is the callsite of rcu_cpu_starting()
> > > > > > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > > > > > set _this_ cpu's bit in a leaf node?
> > > > > 
> > > > > The calls from notify_cpu_starting() are called from the various
> > > > > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > > > > named functions.  These are called on the incoming CPU early in that
> > > > > CPU's execution.  The call from rcu_init() is correct until such time
> > > > > as more than one CPU can be running at rcu_init() time.  And that
> > > > > day might be coming, so please see the untested patch below.
> > > > 
> > > > Looks better than mine ;-)
> > > > 
> > > > But do we need to worry that we start rcu on each CPU twice, which may
> > > > slow down the boot?
> > > 
> > > We only start a given CPU once.  The boot CPU at rcu_init() time, and
> > > the rest at CPU-hotplug time.  Unless of course a CPU is later taken
> > 
> > Confused... we call rcu_cpu_starting() in a for_each_online_cpu() loop
> > in rcu_init(), so we basically start all online CPUs there after
> > applying your patch. And all the rest CPUs will get themselves start
> > again at CPU-hotplug time, right?
> 
> At rcu_init() time, there is only one online CPU, namely the boot CPU.
> 
> Or perhaps your point is that if CPUs come online before rcu_init(), they
> might do so via the normal online mechanism.  I don't believe that this
> is likely, because the normal online mechanism reaquires the scheduler
> be running.  But either way, my hope would be that whoever fires up CPUs
> before rcu_init() asks a few questions when they run into bugs.  ;-)
> 

;-)

> > Besides, without your patch, we started the boot CPU many times in the
> > for_each_online_cpu() loop.
> 
> That is true.  It is harmless because it just does a group of assignments
> repeatedly, and because there is only one CPU and because interrupts
> are disabled, this cannot have any effect.  And my fix inadvertently
> fixed this issue, didn't it?
> 

Yep!

> So I do need to update the commit log accordingly.  Done!
> 
> > Am I missing something subtle?
> 
> Given the nature of RCU, the only possible answer I can give to that
> question is "probably".  (Hey, you asked!!!)
> 

True, I misread the for_each_online_cpu() loop in rcu_init(), I thought
at that time, CPUs other than boot cpu have already mask themselves in
the cpu_online_mask. But that's not true.. 

Sorry for the noice, and thank you for explanation ;-)

Regards,
Boqun

> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > offline, in which case we start it again when it comes back online.
> > > 
> > > 							Thanx, Paul
> > > 
> > > > Regards,
> > > > Boqun
> > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > ------------------------------------------------------------------------
> > > > > 
> > > > > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > Date:   Tue Dec 20 07:17:58 2016 -0800
> > > > > 
> > > > >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> > > > >     
> > > > >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> > > > >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> > > > >     all CPUs onlined after rcu_init() executes (during very early boot).
> > > > >     Currently, this is the full set of CPUs, so all is well.  But if
> > > > >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> > > > >     This commit therefore substitutes the rcu_cpu_starting() function's
> > > > >     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
> > > > >     (arguably) improving readability.
> > > > >     
> > > > >     Reported-by: Boqun Feng <boqun.feng@gmail.com>
> > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b9d3c0e30935..083cb8a6299c 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > > >  	struct rcu_state *rsp;
> > > > >  
> > > > >  	for_each_rcu_flavor(rsp) {
> > > > > -		rdp = this_cpu_ptr(rsp->rda);
> > > > > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> > > > >  		rnp = rdp->mynode;
> > > > >  		mask = rdp->grpmask;
> > > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > > 
> > > 
> > > 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-12-22  1:08 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15  2:41 [RFC v2 0/5] rcu: Introduce for_each_leaf_node_cpu() Boqun Feng
2016-12-15  2:42 ` [RFC v2 1/5] " Boqun Feng
2016-12-15 11:43   ` Mark Rutland
2016-12-15 14:38     ` Boqun Feng
2016-12-15 15:10       ` Mark Rutland
2016-12-15 15:14         ` Boqun Feng
2016-12-15 15:21   ` [RFC v2.1 " Boqun Feng
2016-12-15 15:29     ` Mark Rutland
2016-12-15  2:42 ` [RFC v2 2/5] rcu: Use for_each_leaf_node_cpu() in RCU stall checking Boqun Feng
2016-12-15  2:42 ` [RFC v2 3/5] rcu: Use for_each_leaf_node_cpu() in ->expmask iteration Boqun Feng
2016-12-15  2:42 ` [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp() Boqun Feng
2016-12-15 12:04   ` Mark Rutland
2016-12-15 14:42     ` Boqun Feng
2016-12-15 14:51       ` Colin Ian King
2016-12-19 15:15         ` Boqun Feng
2016-12-20  5:09           ` Paul E. McKenney
2016-12-20  5:59             ` Boqun Feng
2016-12-20  8:11               ` Boqun Feng
2016-12-20 15:32                 ` Paul E. McKenney
2016-12-20 15:23               ` Paul E. McKenney
2016-12-21  2:34                 ` Boqun Feng
2016-12-21  3:40                   ` Paul E. McKenney
2016-12-21  4:18                     ` Boqun Feng
2016-12-21 16:48                       ` Paul E. McKenney
2016-12-22  1:08                         ` Boqun Feng
2016-12-15  2:42 ` [RFC v2 5/5] rcu: Use for_each_leaf_node_cpu() in online CPU iteration Boqun Feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).