* [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU
@ 2020-11-06 10:36 Will Deacon
2020-11-06 10:36 ` [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die() Will Deacon
2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
0 siblings, 2 replies; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
Qian Cai, Paul E. McKenney
Hi folks,
Here are a couple of patches following on from:
https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck
which address issues when CPU onlining fails but RCU is left none the
wiser. Tested under QEMU.
If Paul is happy with the second patch, then I can take both of these
via arm64 as fixes for 5.10.
Cheers,
Will
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
--->8
Will Deacon (2):
arm64: psci: Avoid printing in cpu_psci_cpu_die()
arm64: smp: Tell RCU about CPUs that fail to come online
arch/arm64/kernel/psci.c | 2 --
arch/arm64/kernel/smp.c | 1 +
kernel/rcu/tree.c | 2 +-
3 files changed, 2 insertions(+), 3 deletions(-)
--
2.29.1.341.ge80a0c044ae-goog
* [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die()
2020-11-06 10:36 [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU Will Deacon
@ 2020-11-06 10:36 ` Will Deacon
2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
1 sibling, 0 replies; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
Qian Cai, Paul E. McKenney
cpu_psci_cpu_die() is called in the context of the dying CPU, which
will no longer be online or tracked by RCU. It is therefore not generally
safe to call printk() if the PSCI "cpu off" request fails, so remove the
pr_crit() invocation.
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kernel/psci.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index 43ae4e0c968f..6a4f3e37c3b4 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -75,8 +75,6 @@ static void cpu_psci_cpu_die(unsigned int cpu)
PSCI_0_2_POWER_STATE_TYPE_SHIFT;
ret = psci_ops.cpu_off(state);
-
- pr_crit("unable to power off CPU%u (%d)\n", cpu, ret);
}
static int cpu_psci_cpu_kill(unsigned int cpu)
--
2.29.1.341.ge80a0c044ae-goog
* [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online
2020-11-06 10:36 [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU Will Deacon
2020-11-06 10:36 ` [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die() Will Deacon
@ 2020-11-06 10:36 ` Will Deacon
2020-11-06 14:43 ` Paul E. McKenney
1 sibling, 1 reply; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
Qian Cai, Paul E. McKenney
Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
that RCU is informed early about incoming CPUs that might end up calling
into printk() before they are online. However, if such a CPU fails the
early CPU feature compatibility checks in check_local_cpu_capabilities(),
then it will be powered off or parked without informing RCU, leading to
an endless stream of stalls:
| rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
| (detected by 0, t=5252 jiffies, g=9317, q=136)
| Task dump for CPU 2:
| task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028
| Call trace:
| ret_from_fork+0x0/0x30
Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
off or parked.
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Suggested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kernel/smp.c | 1 +
kernel/rcu/tree.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 09c96f57818c..18e9727d3f64 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -413,6 +413,7 @@ void cpu_die_early(void)
/* Mark this CPU absent */
set_cpu_present(cpu, 0);
+ rcu_report_dead(cpu);
if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
update_cpu_boot_status(CPU_KILL_ME);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a52f42f64b6..bd04b09b84b3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
}
-#ifdef CONFIG_HOTPLUG_CPU
/*
* The outgoing function has no further need of RCU, so remove it from
* the rcu_node tree's ->qsmaskinitnext bit masks.
@@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
rdp->cpu_started = false;
}
+#ifdef CONFIG_HOTPLUG_CPU
/*
* The outgoing CPU has just passed through the dying-idle state, and we
* are being invoked from the CPU that was IPIed to continue the offline
--
2.29.1.341.ge80a0c044ae-goog
* Re: [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online
2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
@ 2020-11-06 14:43 ` Paul E. McKenney
0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2020-11-06 14:43 UTC (permalink / raw)
To: Will Deacon
Cc: linux-arm-kernel, linux-kernel, kernel-team, Catalin Marinas, Qian Cai
On Fri, Nov 06, 2020 at 10:36:02AM +0000, Will Deacon wrote:
> Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
> that RCU is informed early about incoming CPUs that might end up calling
> into printk() before they are online. However, if such a CPU fails the
> early CPU feature compatibility checks in check_local_cpu_capabilities(),
> then it will be powered off or parked without informing RCU, leading to
> an endless stream of stalls:
>
> | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
> | (detected by 0, t=5252 jiffies, g=9317, q=136)
> | Task dump for CPU 2:
> | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028
> | Call trace:
> | ret_from_fork+0x0/0x30
>
> Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
> off or parked.
>
> Cc: Qian Cai <cai@redhat.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Suggested-by: Qian Cai <cai@redhat.com>
> Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> ---
> arch/arm64/kernel/smp.c | 1 +
> kernel/rcu/tree.c | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 09c96f57818c..18e9727d3f64 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -413,6 +413,7 @@ void cpu_die_early(void)
>
> /* Mark this CPU absent */
> set_cpu_present(cpu, 0);
> + rcu_report_dead(cpu);
>
> if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> update_cpu_boot_status(CPU_KILL_ME);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2a52f42f64b6..bd04b09b84b3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
> smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
> }
>
> -#ifdef CONFIG_HOTPLUG_CPU
> /*
> * The outgoing function has no further need of RCU, so remove it from
> * the rcu_node tree's ->qsmaskinitnext bit masks.
> @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
> rdp->cpu_started = false;
> }
>
> +#ifdef CONFIG_HOTPLUG_CPU
> /*
> * The outgoing CPU has just passed through the dying-idle state, and we
> * are being invoked from the CPU that was IPIed to continue the offline
> --
> 2.29.1.341.ge80a0c044ae-goog
>