* [PATCHv3] rcu: tree: correctly handle sparse possible cpus
@ 2016-05-23 10:42 Mark Rutland
  2016-05-23 16:13 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2016-05-23 10:42 UTC (permalink / raw)
  To: paulmck
  Cc: catalin.marinas, dennis.chen, jiangshanlai, josh, linux-kernel,
	mark.rutland, mathieu.desnoyers, rostedt, steve.capper,
	will.deacon

In many cases in the RCU tree code, we iterate over the set of cpus for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each cpu in this range. However, if the set of possible
cpus is sparse, some cpus described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault
or may hit other data, depending on the address generated when the
erroneous per-cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* cpus for a given leaf node. This patch adds a new helper,
for_each_leaf_node_possible_cpu, to enable this. As iteration is often
intertwined with rcu_node local bitmask manipulation, a new
leaf_node_cpu_bit helper is added to make this simpler and more
consistent. The RCU tree code is made to use both of these where
appropriate.
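
For illustration (a condensed sketch of the pattern the diff below removes,
assuming a sparse possible set such as {0, 2, 3} and a leaf node with
grplo=0, grphi=3, so that the loop also visits the impossible CPU 1):

	for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
		/*
		 * Bogus for a !possible cpu: the generic percpu code never
		 * allocated or initialised a per-cpu area for it, so rdp
		 * may point at unmapped or unrelated memory.
		 */
		struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);

		/* ... check or manipulate rdp ... */
	}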

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30
[ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0
[ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40
[ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158
[ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60
[ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918
[ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238
[ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30
[ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88
[ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20
[ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Dennis Chen <dennis.chen@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/rcu/tree.c        | 21 +++++++++------------
 kernel/rcu/tree.h        | 15 +++++++++++++++
 kernel/rcu/tree_exp.h    | 16 +++++++---------
 kernel/rcu/tree_plugin.h |  5 +++--
 4 files changed, 34 insertions(+), 23 deletions(-)

Since v1 [1]:
 * rebase to the -rcu rcu/dev branch.
 * replace all occurrences missed by v1.
 * s/CPUS/CPUs/ in commit message. Gah.

Since v2 [2]:
 * Rebase to today's -rcu rcu/dev branch (51189938b1c011be)
 * Fix build with CONFIG_RCU_BOOST
 * Split leaf_node_cpu_bit() to avoid undefined behaviour
 * s/CPUs/cpus/ in commit message for consistency. Gah, again.

Paul, sorry for all the trouble with v1 and v2.

I've built this for x86 and arm64 with and without CONFIG_RCU_BOOST, and I've
boot-tested both configurations on arm64 with CONFIG_UBSAN, with no ill effects.
Hopefully the demons from v1 and v2 have been vanquished.

Thanks for bearing with me!

Mark.

[1] https://lkml.org/lkml/2016/5/16/331
[2] https://lkml.org/lkml/2016/5/17/191
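
For reference before the diff itself, the idiom callers are converted to looks
as follows (a sketch assembled from the hunks below, modelled on the
force_qs_rnp() change; only the mask accumulation is shown):

	unsigned long mask = 0;
	int cpu;

	for_each_leaf_node_possible_cpu(rnp, cpu) {
		unsigned long bit = leaf_node_cpu_bit(rnp, cpu);

		/* Only possible CPUs are visited, so their per-cpu data exists. */
		if (rnp->qsmask & bit)
			mask |= bit;
	}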

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index afdcb7b..74d8a04 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1285,9 +1285,9 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu))
-					dump_cpu_task(rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu(rnp, cpu)
+				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
+					dump_cpu_task(cpu);
 		}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
@@ -1353,10 +1353,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu)) {
-					print_cpu_stall_info(rsp,
-							     rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu(rnp, cpu)
+				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+					print_cpu_stall_info(rsp, cpu);
 					ndetected++;
 				}
 		}
@@ -2873,7 +2872,6 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				  unsigned long *maxj),
 			 bool *isidle, unsigned long *maxj)
 {
-	unsigned long bit;
 	int cpu;
 	unsigned long flags;
 	unsigned long mask;
@@ -2908,9 +2906,8 @@ static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		cpu = rnp->grplo;
-		bit = 1;
-		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
+			unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
 			if ((rnp->qsmask & bit) != 0) {
 				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
 					mask |= bit;
@@ -3739,7 +3736,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 
 	/* Set up local state, ensuring consistent view of global state. */
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rdp->grpmask = 1UL << (cpu - rdp->mynode->grplo);
+	rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
 	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e3959f5..dc0b7da 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -254,6 +254,13 @@ struct rcu_node {
 } ____cacheline_internodealigned_in_smp;
 
 /*
+ * Bitmasks in an rcu_node cover the interval [grplo, grphi] of CPU IDs, and
+ * are indexed relative to this interval rather than the global CPU ID space.
+ * This generates the bit for a CPU in node-local masks.
+ */
+#define leaf_node_cpu_bit(rnp, cpu) (1UL << ((cpu) - (rnp)->grplo))
+
+/*
  * Do a full breadth-first scan of the rcu_node structures for the
  * specified rcu_state structure.
  */
@@ -281,6 +288,14 @@ struct rcu_node {
 	     (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
 
 /*
+ * Iterate over all possible CPUs in a leaf RCU node.
+ */
+#define for_each_leaf_node_possible_cpu(rnp, cpu) \
+	for ((cpu) = rnp->grplo; \
+	     cpu <= rnp->grphi; \
+	     cpu = cpumask_next((cpu), cpu_possible_mask))
+
+/*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
  */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 00a02a2..d400434 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -344,7 +344,6 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 {
 	int cpu;
 	unsigned long flags;
-	unsigned long mask;
 	unsigned long mask_ofl_test;
 	unsigned long mask_ofl_ipi;
 	int ret;
@@ -356,7 +355,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 
 		/* Each pass checks a CPU for identity, offline, and idle. */
 		mask_ofl_test = 0;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
@@ -376,8 +375,8 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 		/* IPI the remaining CPUs for expedited quiescent state. */
-		mask = 1;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
+			unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
 			if (!(mask_ofl_ipi & mask))
 				continue;
 retry_ipi:
@@ -440,10 +439,10 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
 			ndetected += rcu_print_task_exp_stall(rnp);
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu(rnp, cpu) {
 				struct rcu_data *rdp;
 
+				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(rnp->expmask & mask))
 					continue;
 				ndetected++;
@@ -453,7 +452,6 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
 					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
 			}
-			mask <<= 1;
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
 			jiffies - jiffies_start, rsp->expedited_sequence,
@@ -473,8 +471,8 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			pr_cont("\n");
 		}
 		rcu_for_each_leaf_node(rsp, rnp) {
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu(rnp, cpu) {
+				mask = leaf_node_cpu_bit(rnp, cpu);
 				if (!(rnp->expmask & mask))
 					continue;
 				dump_cpu_task(cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 02a9197..0082fce 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1164,8 +1164,9 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		return;
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
-	for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1)
-		if ((mask & 0x1) && cpu != outgoingcpu)
+	for_each_leaf_node_possible_cpu(rnp, cpu)
+		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
+		    cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
 	if (cpumask_weight(cm) == 0)
 		cpumask_setall(cm);
-- 
1.9.1


* Re: [PATCHv3] rcu: tree: correctly handle sparse possible cpus
  2016-05-23 10:42 [PATCHv3] rcu: tree: correctly handle sparse possible cpus Mark Rutland
@ 2016-05-23 16:13 ` Paul E. McKenney
  2016-05-31 11:03   ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2016-05-23 16:13 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, dennis.chen, jiangshanlai, josh, linux-kernel,
	mathieu.desnoyers, rostedt, steve.capper, will.deacon

On Mon, May 23, 2016 at 11:42:59AM +0100, Mark Rutland wrote:
> In many cases in the RCU tree code, we iterate over the set of cpus for
> a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
> per-cpu data for each cpu in this range. However, if the set of possible
> cpus is sparse, some cpus described in this range are not possible, and
> thus no per-cpu region will have been allocated (or initialised) for
> them by the generic percpu code.
> 
> Erroneous accesses to a per-cpu area for these !possible cpus may fault
> or may hit other data, depending on the address generated when the
> erroneous per-cpu offset is applied. In practice, both cases have been
> observed on arm64 hardware (the former being silent, but detectable with
> additional patches).
> 
> To avoid issues resulting from this, we must iterate over the set of
> *possible* cpus for a given leaf node. This patch adds a new helper,
> for_each_leaf_node_possible_cpu, to enable this. As iteration is often
> intertwined with rcu_node local bitmask manipulation, a new
> leaf_node_cpu_bit helper is added to make this simpler and more
> consistent. The RCU tree code is made to use both of these where
> appropriate.

Thank you, Mark, queued for review and further testing.

							Thanx, Paul



* Re: [PATCHv3] rcu: tree: correctly handle sparse possible cpus
  2016-05-23 16:13 ` Paul E. McKenney
@ 2016-05-31 11:03   ` Mark Rutland
  2016-06-02 13:10     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2016-05-31 11:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: catalin.marinas, dennis.chen, jiangshanlai, josh, linux-kernel,
	mathieu.desnoyers, rostedt, steve.capper, will.deacon

On Mon, May 23, 2016 at 09:13:33AM -0700, Paul E. McKenney wrote:
> Thank you, Mark, queued for review and further testing.
> 
> 							Thanx, Paul

Thanks, Paul.

Unfortunately, it seems that in my haste to spin v3, I missed your suggested
logic to handle the !cpu_possible(rnp->grplo) case [1]. Sorry about that;
evidently I was not being sufficiently thorough.

Would you be happy to fold that in, as per the diff below? Otherwise I can send
that as a fixup patch, or respin the whole thing as v4, per your preference.

I've given the below a spin on arm64 atop -rcu/dev, with and without
RCU_BOOST and/or RCU_TRACE, and I've build-tested likewise for x86. I've
devised and tested the !cpu_possible(rnp->grplo) case by messing with the arm64
SMP code and the RCU tree fanout. So far everything seems happy: no build
issues, UBSAN splats, or other runtime failures.

So fingers crossed, that's the last issue with this patch blatted...

Thanks,
Mark.

[1] lkml.kernel.org/r/20160517190106.GJ3528@linux.vnet.ibm.com

---->8----
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index dc0b7da..f714f87 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -291,7 +291,7 @@ struct rcu_node {
  * Iterate over all possible CPUs in a leaf RCU node.
  */
 #define for_each_leaf_node_possible_cpu(rnp, cpu) \
-       for ((cpu) = rnp->grplo; \
+       for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
             cpu <= rnp->grphi; \
             cpu = cpumask_next((cpu), cpu_possible_mask))
 

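The fix relies on cpumask_next(n, mask) returning the first set bit in mask
strictly greater than n: seeding the loop with grplo - 1 yields the first
*possible* CPU at or above grplo, where the v3 macro blindly started at grplo
even when that CPU was not possible. A standalone sketch of the same logic
(plain C with a toy stand-in for cpumask_next(), no kernel dependencies):

	#include <stdio.h>

	/* Toy analogue of cpumask_next(): first set bit strictly after n. */
	static int next_possible(int n, unsigned long mask)
	{
		int cpu;

		for (cpu = n + 1; cpu < 64; cpu++)
			if (mask & (1UL << cpu))
				return cpu;
		return 64; /* analogue of nr_cpu_ids: no further CPUs */
	}

	int main(void)
	{
		unsigned long possible = 0xdUL; /* CPUs 0, 2, 3; CPU 1 is !possible */
		int grplo = 1, grphi = 3;
		int cpu;

		/* Mirrors the fixed for_each_leaf_node_possible_cpu(). */
		for (cpu = next_possible(grplo - 1, possible);
		     cpu <= grphi;
		     cpu = next_possible(cpu, possible))
			printf("cpu %d\n", cpu);	/* prints 2 and 3, skipping 1 */

		return 0;
	}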

* Re: [PATCHv3] rcu: tree: correctly handle sparse possible cpus
  2016-05-31 11:03   ` Mark Rutland
@ 2016-06-02 13:10     ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2016-06-02 13:10 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, dennis.chen, jiangshanlai, josh, linux-kernel,
	mathieu.desnoyers, rostedt, steve.capper, will.deacon

On Tue, May 31, 2016 at 12:03:15PM +0100, Mark Rutland wrote:
> Unfortunately, it seems that in my haste to spin v3, I missed your suggested
> logic to handle the !cpu_possible(rnp->grplo) case [1]. Sorry about that;
> evidently I was not being sufficiently thorough.
> 
> Would you be happy to fold that in, as per the diff below? Otherwise I can send
> that as a fixup patch, or respin the whole thing as v4, per your preference.
> 
> I've given the below a spin on arm64 atop -rcu/dev, with and without
> RCU_BOOST and/or RCU_TRACE, and I've build-tested likewise for x86. I've
> devised and tested the !cpu_possible(rnp->grplo) case by messing with the arm64
> SMP code and the RCU tree fanout. So far everything seems happy: no build
> issues, UBSAN splats, or other runtime failures.
> 
> So fingers crossed, that's the last issue with this patch blatted...

Please do respin the patch -- cleaner history and easier for people to
dig through.

							Thanx, Paul


