linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
@ 2016-12-09  8:48 Boqun Feng
  2016-12-09  8:48 ` [RFC 1/5] rcu: Introduce primitives to iterate mask bits in an RCU leaf node Boqun Feng
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan

Hi Paul,

While reading the discussion at:

https://marc.info/?l=linux-kernel&m=148044253400769

I figured we might use this fact to save some extra checks in RCU core code,
currently we iterate over all the possible CPUs on a leaf node, check whether
they were masked in a certain mask and do something. However, given the fact
that the masks on a leaf node should always be sparse than the corresponding
part of cpu_possible_mask, we'd better iterate over all bits in a mask and
check whether the corresponding CPU is possible or not.

So I made this RFC, I did a simple build/boot/rcutorture test on my box with
SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
test this one a bigger system, in the meanwhile, looking forwards to any
comment and suggestion.

So thoughts?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC 1/5] rcu: Introduce primitives to iterate mask bits in an RCU leaf node
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
@ 2016-12-09  8:48 ` Boqun Feng
  2016-12-09  8:48 ` [RFC 2/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in RCU stall checking Boqun Feng
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Boqun Feng

There are some places inside RCU core, where we need to iterate all mask
(->qsmask, ->expmask, etc) bits in a leaf node, in order to iterate all
corresponding CPUs. The current code iterates all possible CPUs in this
leaf node and then checks with the mask to see whether the bit is set.

However, given the fact that most bits in cpu_possible_mask are set but
rare bits in an RCU leaf node mask are set(in other words, ->qsmask and
its friends are usually more sparse than cpu_possible_mask), it's better
to iterate in the other way, that is iterating mask bits in a leaf node
and then checking with cpu_possible(). By doing so, we can save several
checks in the loop, moreover, that fast path checking(e.g. ->qsmask ==
0) could then be consolidated into the loop logic.

This patch introduce leaf_node_for_each_mask_bit() and
leaf_node_for_each_mask_possible_cpu() to iterate mask bits in a more
efficient way.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c0a4bf8f1ed0..4078a8ec2bd1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -260,6 +260,9 @@ struct rcu_node {
  */
 #define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))
 
+/* This returns the corresponding cpu_id for a bit in a RCU lead node */
+#define leaf_node_cpu_id(rnp, bit) ((bit) + (rnp)->grplo)
+
 /*
  * Do a full breadth-first scan of the rcu_node structures for the
  * specified rcu_state structure.
@@ -295,6 +298,33 @@ struct rcu_node {
 	     cpu <= rnp->grphi; \
 	     cpu = cpumask_next((cpu), cpu_possible_mask))
 
+
+#define QSMASK_BITS(mask)	(BITS_PER_BYTE * sizeof(mask))
+/*
+ * Iterate over all set bits in @mask of a leaf RCU node.
+ *
+ * The iterator is the bit offset in @mask of a leaf node, to get the cpu
+ * id, use leaf_node_cpu_id()
+ *
+ * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
+ */
+#define leaf_node_for_each_mask_bit(rnp, mask, bit) \
+	for ((bit) = find_first_bit(&(mask), QSMASK_BITS(mask)); \
+	     (bit) < QSMASK_BITS(mask); \
+	     (bit) = find_next_bit(&(mask), QSMASK_BITS(mask), (bit) + 1))
+
+/*
+ * Iterate over all possible CPUs a leaf RCU node which are still masked in
+ * @mask.
+ *
+ * Note @rnp has to be a leaf node and @mask has to belong to @rnp.
+ */
+#define leaf_node_for_each_mask_possible_cpu(rnp, mask, bit, cpu) \
+	leaf_node_for_each_mask_bit(rnp, mask, bit) \
+		if (!cpu_possible((cpu) = leaf_node_cpu_id(rnp, bit))) \
+			continue; \
+		else
+
 /*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in RCU stall checking
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
  2016-12-09  8:48 ` [RFC 1/5] rcu: Introduce primitives to iterate mask bits in an RCU leaf node Boqun Feng
@ 2016-12-09  8:48 ` Boqun Feng
  2016-12-09  8:48 ` [RFC 3/5] rcu: Use leaf_node_for_each_mask_possible_cpu() for ->expmask iteration Boqun Feng
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Boqun Feng

Replace for_each_leaf_node_possible_cpu() loop with
leaf_node_for_each_mask_possible_cpu() loop to gain fewer checks in RCU
stall checking code.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b56e2dc31ba5..5f6802b9ba5f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1412,15 +1412,16 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
 	int cpu;
-	unsigned long flags;
+	unsigned long flags, bit;
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		for_each_leaf_node_possible_cpu(rnp, cpu)
-			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
-				if (!trigger_single_cpu_backtrace(cpu))
-					dump_cpu_task(cpu);
+
+		leaf_node_for_each_mask_possible_cpu(rnp, rnp->qsmask, bit, cpu)
+			if (!trigger_single_cpu_backtrace(cpu))
+				dump_cpu_task(cpu);
+
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 }
@@ -1455,6 +1456,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 {
 	int cpu;
 	long delta;
+	unsigned long bit;
 	unsigned long flags;
 	unsigned long gpa;
 	unsigned long j;
@@ -1490,13 +1492,12 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
-		if (rnp->qsmask != 0) {
-			for_each_leaf_node_possible_cpu(rnp, cpu)
-				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
-					print_cpu_stall_info(rsp, cpu);
-					ndetected++;
-				}
+
+		leaf_node_for_each_mask_possible_cpu(rnp, rnp->qsmask, bit, cpu) {
+			print_cpu_stall_info(rsp, cpu);
+			ndetected++;
 		}
+
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 3/5] rcu: Use leaf_node_for_each_mask_possible_cpu() for ->expmask iteration
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
  2016-12-09  8:48 ` [RFC 1/5] rcu: Introduce primitives to iterate mask bits in an RCU leaf node Boqun Feng
  2016-12-09  8:48 ` [RFC 2/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in RCU stall checking Boqun Feng
@ 2016-12-09  8:48 ` Boqun Feng
  2016-12-09  8:48 ` [RFC 4/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in force_qs_rnp() Boqun Feng
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Boqun Feng

The ->expmask of an RCU leaf node should be more sparse than the
corresponding part of cpu_possible_mask, iterating on ->expmask bitmap
rather cpu_possible_mask to save some checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_exp.h | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index a3a8756670d1..8c50a1cb9133 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -419,7 +419,7 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 	int cpu;
 	unsigned long jiffies_stall;
 	unsigned long jiffies_start;
-	unsigned long mask;
+	unsigned long bit;
 	int ndetected;
 	struct rcu_node *rnp;
 	struct rcu_node *rnp_root = rcu_get_root(rsp);
@@ -444,12 +444,9 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
 			ndetected += rcu_print_task_exp_stall(rnp);
-			for_each_leaf_node_possible_cpu(rnp, cpu) {
+			leaf_node_for_each_mask_possible_cpu(rnp, rnp->expmask, bit, cpu) {
 				struct rcu_data *rdp;
 
-				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
-					continue;
 				ndetected++;
 				rdp = per_cpu_ptr(rsp->rda, cpu);
 				pr_cont(" %d-%c%c%c", cpu,
@@ -475,14 +472,10 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			}
 			pr_cont("\n");
 		}
-		rcu_for_each_leaf_node(rsp, rnp) {
-			for_each_leaf_node_possible_cpu(rnp, cpu) {
-				mask = leaf_node_cpu_bit(rnp, cpu);
-				if (!(rnp->expmask & mask))
-					continue;
+		rcu_for_each_leaf_node(rsp, rnp)
+			leaf_node_for_each_mask_possible_cpu(rnp, rnp->expmask, bit, cpu)
 				dump_cpu_task(cpu);
-			}
-		}
+
 		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
 	}
 }
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 4/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in force_qs_rnp()
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
                   ` (2 preceding siblings ...)
  2016-12-09  8:48 ` [RFC 3/5] rcu: Use leaf_node_for_each_mask_possible_cpu() for ->expmask iteration Boqun Feng
@ 2016-12-09  8:48 ` Boqun Feng
  2016-12-09  8:48 ` [RFC 5/5] rcu: Use leaf_node_for_each_mask_*() for leaf node online CPU iteration Boqun Feng
  2016-12-09 23:49 ` [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Paul E. McKenney
  5 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Boqun Feng

->qsmask of an RCU leaf node is usually more sparse than the
corresponding cpu_possible_mask. So replace the
for_each_leaf_node_possible_cpu() in force_qs_rnp() with
leaf_node_for_each_mask_possible_cpu() to save several checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5f6802b9ba5f..10162ac71dbe 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3014,6 +3014,7 @@ static void force_qs_rnp(struct rcu_state *rsp,
 			 bool *isidle, unsigned long *maxj)
 {
 	int cpu;
+	unsigned long bit;
 	unsigned long flags;
 	unsigned long mask;
 	struct rcu_node *rnp;
@@ -3047,13 +3048,11 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		for_each_leaf_node_possible_cpu(rnp, cpu) {
-			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
-			if ((rnp->qsmask & bit) != 0) {
-				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
-					mask |= bit;
-			}
-		}
+
+		leaf_node_for_each_mask_possible_cpu(rnp, rnp->qsmask, bit, cpu)
+			if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
+				mask |= 1 << bit;
+
 		if (mask != 0) {
 			/* Idle/offline CPUs, report (releases rnp->lock. */
 			rcu_report_qs_rnp(mask, rsp, rnp, rnp->gpnum, flags);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 5/5] rcu: Use leaf_node_for_each_mask_*() for leaf node online CPU iteration
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
                   ` (3 preceding siblings ...)
  2016-12-09  8:48 ` [RFC 4/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in force_qs_rnp() Boqun Feng
@ 2016-12-09  8:48 ` Boqun Feng
  2016-12-09 23:49 ` [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Paul E. McKenney
  5 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-09  8:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Boqun Feng

Though mostly identical, ->qsmaskinit(A.K.A rcu_rnp_online_cpus) is
sometimes more sparse than the corresponding part of cpu_possible_mask
for an RCU leaf node. So we use leaf_node_for_each_mask_possible_cpu()
in rcu_boost_kthread_setaffinity() instead to save some extra checks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tree_plugin.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 69c6eb27c37f..c954c2a7a9ba 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1170,15 +1170,17 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 	unsigned long mask = rcu_rnp_online_cpus(rnp);
 	cpumask_var_t cm;
 	int cpu;
+	unsigned long bit;
 
 	if (!t)
 		return;
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
-	for_each_leaf_node_possible_cpu(rnp, cpu)
-		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
-		    cpu != outgoingcpu)
+
+	leaf_node_for_each_mask_possible_cpu(rnp, mask, bit, cpu)
+		if (cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
+
 	if (cpumask_weight(cm) == 0)
 		cpumask_setall(cm);
 	set_cpus_allowed_ptr(t, cm);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
                   ` (4 preceding siblings ...)
  2016-12-09  8:48 ` [RFC 5/5] rcu: Use leaf_node_for_each_mask_*() for leaf node online CPU iteration Boqun Feng
@ 2016-12-09 23:49 ` Paul E. McKenney
  2016-12-10  0:45   ` Boqun Feng
  5 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-12-09 23:49 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> While reading the discussion at:
> 
> https://marc.info/?l=linux-kernel&m=148044253400769

This discussion was for stalls specifically, rather than for routine
scans of the bitmasks.

But it does look to save some code, so worth looking into.

> I figured we might use this fact to save some extra checks in RCU core code,
> currently we iterate over all the possible CPUs on a leaf node, check whether
> they were masked in a certain mask and do something. However, given the fact
> that the masks on a leaf node should always be sparse than the corresponding
> part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> check whether the corresponding CPU is possible or not.
> 
> So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> test this one a bigger system, in the meanwhile, looking forwards to any
> comment and suggestion.
> 
> So thoughts?

By analogy with for_each_cpu() and for_each_possible_cpu(), the name
should instead be for_each_leaf_node_cpu(), the tradition of excessively
long names in RCU notwithstanding.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-09 23:49 ` [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Paul E. McKenney
@ 2016-12-10  0:45   ` Boqun Feng
  2016-12-10  4:28     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Boqun Feng @ 2016-12-10  0:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 1737 bytes --]

On Fri, Dec 09, 2016 at 03:49:45PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> > Hi Paul,
> > 
> > While reading the discussion at:
> > 
> > https://marc.info/?l=linux-kernel&m=148044253400769
> 
> This discussion was for stalls specifically, rather than for routine
> scans of the bitmasks.
> 
> But it does look to save some code, so worth looking into.
> 
> > I figured we might use this fact to save some extra checks in RCU core code,
> > currently we iterate over all the possible CPUs on a leaf node, check whether
> > they were masked in a certain mask and do something. However, given the fact
> > that the masks on a leaf node should always be sparse than the corresponding
> > part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> > check whether the corresponding CPU is possible or not.
> > 
> > So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> > SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> > test this one a bigger system, in the meanwhile, looking forwards to any
> > comment and suggestion.
> > 
> > So thoughts?
> 
> By analogy with for_each_cpu() and for_each_possible_cpu(), the name
> should instead be for_each_leaf_node_cpu(), the tradition of excessively
> long names in RCU notwithstanding.  ;-)
> 

Make sense ;-)

I think it's more appropriate to call it for_each_leaf_node_mask_cpu(),
because we don't iterate all cpus of a leaf node. The word "possible"
could be dropped because obviously we won't iterate over "impossible"
cpus in a leaf node ;-)

Will modify that in next version.

Regards,
Boqun

> 							Thanx, Paul
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-10  0:45   ` Boqun Feng
@ 2016-12-10  4:28     ` Paul E. McKenney
  2016-12-10 13:36       ` Boqun Feng
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-12-10  4:28 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Sat, Dec 10, 2016 at 08:45:38AM +0800, Boqun Feng wrote:
> On Fri, Dec 09, 2016 at 03:49:45PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> > > Hi Paul,
> > > 
> > > While reading the discussion at:
> > > 
> > > https://marc.info/?l=linux-kernel&m=148044253400769
> > 
> > This discussion was for stalls specifically, rather than for routine
> > scans of the bitmasks.
> > 
> > But it does look to save some code, so worth looking into.
> > 
> > > I figured we might use this fact to save some extra checks in RCU core code,
> > > currently we iterate over all the possible CPUs on a leaf node, check whether
> > > they were masked in a certain mask and do something. However, given the fact
> > > that the masks on a leaf node should always be sparse than the corresponding
> > > part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> > > check whether the corresponding CPU is possible or not.
> > > 
> > > So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> > > SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> > > test this one a bigger system, in the meanwhile, looking forwards to any
> > > comment and suggestion.
> > > 
> > > So thoughts?
> > 
> > By analogy with for_each_cpu() and for_each_possible_cpu(), the name
> > should instead be for_each_leaf_node_cpu(), the tradition of excessively
> > long names in RCU notwithstanding.  ;-)
> > 
> 
> Make sense ;-)
> 
> I think it's more appropriate to call it for_each_leaf_node_mask_cpu(),
> because we don't iterate all cpus of a leaf node. The word "possible"
> could be dropped because obviously we won't iterate over "impossible"
> cpus in a leaf node ;-)

C'mon, Boqun!  The for_each_leaf_node_cpu() is not only consistent
with the for_each_cpu() family, it is shorter!  ;-)

							Thanx, Paul

> Will modify that in next version.
> 
> Regards,
> Boqun
> 
> > 							Thanx, Paul
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-10  4:28     ` Paul E. McKenney
@ 2016-12-10 13:36       ` Boqun Feng
  2016-12-10 17:38         ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Boqun Feng @ 2016-12-10 13:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 2843 bytes --]

On Fri, Dec 09, 2016 at 08:28:05PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 10, 2016 at 08:45:38AM +0800, Boqun Feng wrote:
> > On Fri, Dec 09, 2016 at 03:49:45PM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> > > > Hi Paul,
> > > > 
> > > > While reading the discussion at:
> > > > 
> > > > https://marc.info/?l=linux-kernel&m=148044253400769
> > > 
> > > This discussion was for stalls specifically, rather than for routine
> > > scans of the bitmasks.
> > > 
> > > But it does look to save some code, so worth looking into.
> > > 
> > > > I figured we might use this fact to save some extra checks in RCU core code,
> > > > currently we iterate over all the possible CPUs on a leaf node, check whether
> > > > they were masked in a certain mask and do something. However, given the fact
> > > > that the masks on a leaf node should always be sparse than the corresponding
> > > > part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> > > > check whether the corresponding CPU is possible or not.
> > > > 
> > > > So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> > > > SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> > > > test this one a bigger system, in the meanwhile, looking forwards to any
> > > > comment and suggestion.
> > > > 
> > > > So thoughts?
> > > 
> > > By analogy with for_each_cpu() and for_each_possible_cpu(), the name
> > > should instead be for_each_leaf_node_cpu(), the tradition of excessively
> > > long names in RCU notwithstanding.  ;-)
> > > 
> > 
> > Make sense ;-)
> > 
> > I think it's more appropriate to call it for_each_leaf_node_mask_cpu(),
> > because we don't iterate all cpus of a leaf node. The word "possible"
> > could be dropped because obviously we won't iterate over "impossible"
> > cpus in a leaf node ;-)
> 
> C'mon, Boqun!  The for_each_leaf_node_cpu() is not only consistent
> with the for_each_cpu() family, it is shorter!  ;-)
> 

Sure ;-) But for_each_leaf_node_cpu() seems like an operation that
iterates over _all_ cpus in a leaf node, but I actually implement it as
an operation that iterates only the _masked_ cpus. So I feel like word
"mask" better be added in the name.

If we call it for_each_leaf_node_cpu(rnp, mask,...), we will rely on the
hope that readers could figure it out what the primitive actually does
by the indication of the parameter @mask.

I like shorter names too, but not sure whether putting "mask" in the
name is better. After all, naming is one of the most difficult
challenges in programming ;-)

Regards,
Boqun

> 							Thanx, Paul
> 
> > Will modify that in next version.
> > 
> > Regards,
> > Boqun
> > 
> > > 							Thanx, Paul
> > > 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-10 13:36       ` Boqun Feng
@ 2016-12-10 17:38         ` Paul E. McKenney
  2016-12-11  0:06           ` Boqun Feng
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2016-12-10 17:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

On Sat, Dec 10, 2016 at 09:36:29PM +0800, Boqun Feng wrote:
> On Fri, Dec 09, 2016 at 08:28:05PM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 10, 2016 at 08:45:38AM +0800, Boqun Feng wrote:
> > > On Fri, Dec 09, 2016 at 03:49:45PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > While reading the discussion at:
> > > > > 
> > > > > https://marc.info/?l=linux-kernel&m=148044253400769
> > > > 
> > > > This discussion was for stalls specifically, rather than for routine
> > > > scans of the bitmasks.
> > > > 
> > > > But it does look to save some code, so worth looking into.
> > > > 
> > > > > I figured we might use this fact to save some extra checks in RCU core code,
> > > > > currently we iterate over all the possible CPUs on a leaf node, check whether
> > > > > they were masked in a certain mask and do something. However, given the fact
> > > > > that the masks on a leaf node should always be sparse than the corresponding
> > > > > part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> > > > > check whether the corresponding CPU is possible or not.
> > > > > 
> > > > > So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> > > > > SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> > > > > test this one a bigger system, in the meanwhile, looking forwards to any
> > > > > comment and suggestion.
> > > > > 
> > > > > So thoughts?
> > > > 
> > > > By analogy with for_each_cpu() and for_each_possible_cpu(), the name
> > > > should instead be for_each_leaf_node_cpu(), the tradition of excessively
> > > > long names in RCU notwithstanding.  ;-)
> > > > 
> > > 
> > > Make sense ;-)
> > > 
> > > I think it's more appropriate to call it for_each_leaf_node_mask_cpu(),
> > > because we don't iterate all cpus of a leaf node. The word "possible"
> > > could be dropped because obviously we won't iterate over "impossible"
> > > cpus in a leaf node ;-)
> > 
> > C'mon, Boqun!  The for_each_leaf_node_cpu() is not only consistent
> > with the for_each_cpu() family, it is shorter!  ;-)
> 
> Sure ;-) But for_each_leaf_node_cpu() seems like an operation that
> iterates over _all_ cpus in a leaf node, but I actually implement it as
> an operation that iterates only the _masked_ cpus. So I feel like word
> "mask" better be added in the name.

Although that is a fair point, the same can be said of for_each_cpu().
Which people seem to be able to use without undue pain.

> If we call it for_each_leaf_node_cpu(rnp, mask,...), we will rely on the
> hope that readers could figure it out what the primitive actually does
> by the indication of the parameter @mask.
> 
> I like shorter names too, but not sure whether putting "mask" in the
> name is better. After all, naming is one of the most difficult
> challenges in programming ;-)

The two most difficult challenges in programming are the last two hard
things that the person speaking worked on.  ;-)

Consistency is more important than the stand-alone understanding of
this particular name.  You can always add a comment pointing out that
it follows for_each_cpu().

							Thanx, Paul

> Regards,
> Boqun
> 
> > 							Thanx, Paul
> > 
> > > Will modify that in next version.
> > > 
> > > Regards,
> > > Boqun
> > > 
> > > > 							Thanx, Paul
> > > > 
> > 
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend
  2016-12-10 17:38         ` Paul E. McKenney
@ 2016-12-11  0:06           ` Boqun Feng
  0 siblings, 0 replies; 12+ messages in thread
From: Boqun Feng @ 2016-12-11  0:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 3817 bytes --]

On Sat, Dec 10, 2016 at 09:38:54AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 10, 2016 at 09:36:29PM +0800, Boqun Feng wrote:
> > On Fri, Dec 09, 2016 at 08:28:05PM -0800, Paul E. McKenney wrote:
> > > On Sat, Dec 10, 2016 at 08:45:38AM +0800, Boqun Feng wrote:
> > > > On Fri, Dec 09, 2016 at 03:49:45PM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Dec 09, 2016 at 04:48:22PM +0800, Boqun Feng wrote:
> > > > > > Hi Paul,
> > > > > > 
> > > > > > While reading the discussion at:
> > > > > > 
> > > > > > https://marc.info/?l=linux-kernel&m=148044253400769
> > > > > 
> > > > > This discussion was for stalls specifically, rather than for routine
> > > > > scans of the bitmasks.
> > > > > 
> > > > > But it does look to save some code, so worth looking into.
> > > > > 
> > > > > > I figured we might use this fact to save some extra checks in RCU core code,
> > > > > > currently we iterate over all the possible CPUs on a leaf node, check whether
> > > > > > they were masked in a certain mask and do something. However, given the fact
> > > > > > that the masks on a leaf node should always be sparse than the corresponding
> > > > > > part of cpu_possible_mask, we'd better iterate over all bits in a mask and
> > > > > > check whether the corresponding CPU is possible or not.
> > > > > > 
> > > > > > So I made this RFC, I did a simple build/boot/rcutorture test on my box with
> > > > > > SMP=4, nothing bad happens. Currently I'm waiting for the 0day and trying to
> > > > > > test this one a bigger system, in the meanwhile, looking forwards to any
> > > > > > comment and suggestion.
> > > > > > 
> > > > > > So thoughts?
> > > > > 
> > > > > By analogy with for_each_cpu() and for_each_possible_cpu(), the name
> > > > > should instead be for_each_leaf_node_cpu(), the tradition of excessively
> > > > > long names in RCU notwithstanding.  ;-)
> > > > > 
> > > > 
> > > > Make sense ;-)
> > > > 
> > > > I think it's more appropriate to call it for_each_leaf_node_mask_cpu(),
> > > > because we don't iterate all cpus of a leaf node. The word "possible"
> > > > could be dropped because obviously we won't iterate over "impossible"
> > > > cpus in a leaf node ;-)
> > > 
> > > C'mon, Boqun!  The for_each_leaf_node_cpu() is not only consistent
> > > with the for_each_cpu() family, it is shorter!  ;-)
> > 
> > Sure ;-) But for_each_leaf_node_cpu() seems like an operation that
> > iterates over _all_ cpus in a leaf node, but I actually implement it as
> > an operation that iterates only the _masked_ cpus. So I feel like word
> > "mask" better be added in the name.
> 
> Although that is a fair point, the same can be said of for_each_cpu().
> Which people seem to be able to use without undue pain.
> 
> > If we call it for_each_leaf_node_cpu(rnp, mask,...), we will rely on the
> > hope that readers could figure it out what the primitive actually does
> > by the indication of the parameter @mask.
> > 
> > I like shorter names too, but not sure whether putting "mask" in the
> > name is better. After all, naming is one of the most difficult
> > challenges in programming ;-)
> 
> The two most difficult challenges in programming are the last two hard
> things that the person speaking worked on.  ;-)
> 

;-)

> Consistency is more important than the stand-alone understanding of
> this particular name.  You can always add a comment pointing out that
> it follows for_each_cpu().
> 

Fair enough. Let us name it for_each_leaf_node_cpu() ;-)

Regards,
Boqun

> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > 							Thanx, Paul
> > > 
> > > > Will modify that in next version.
> > > > 
> > > > Regards,
> > > > Boqun
> > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > 
> > > 
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-12-11  0:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-09  8:48 [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Boqun Feng
2016-12-09  8:48 ` [RFC 1/5] rcu: Introduce primitives to iterate mask bits in an RCU leaf node Boqun Feng
2016-12-09  8:48 ` [RFC 2/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in RCU stall checking Boqun Feng
2016-12-09  8:48 ` [RFC 3/5] rcu: Use leaf_node_for_each_mask_possible_cpu() for ->expmask iteration Boqun Feng
2016-12-09  8:48 ` [RFC 4/5] rcu: Use leaf_node_for_each_mask_possible_cpu() in force_qs_rnp() Boqun Feng
2016-12-09  8:48 ` [RFC 5/5] rcu: Use leaf_node_for_each_mask_*() for leaf node online CPU iteration Boqun Feng
2016-12-09 23:49 ` [RFC 0/5] rcu: Introduce leaf_node_for_each_mask_possible_cpu() and its friend Paul E. McKenney
2016-12-10  0:45   ` Boqun Feng
2016-12-10  4:28     ` Paul E. McKenney
2016-12-10 13:36       ` Boqun Feng
2016-12-10 17:38         ` Paul E. McKenney
2016-12-11  0:06           ` Boqun Feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).