linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU
@ 2020-11-06 10:36 Will Deacon
  2020-11-06 10:36 ` [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die() Will Deacon
  2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
  0 siblings, 2 replies; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
	Qian Cai, Paul E. McKenney

Hi folks,

Here are a couple of patches following on from:

https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck

which address issues when CPU onlining fails but RCU is left none the
wiser. Tested under QEMU.

If Paul is happy with the second patch, then I can take both of these
via arm64 as fixes for 5.10.

Cheers,

Will

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>

--->8

Will Deacon (2):
  arm64: psci: Avoid printing in cpu_psci_cpu_die()
  arm64: smp: Tell RCU about CPUs that fail to come online

 arch/arm64/kernel/psci.c | 2 --
 arch/arm64/kernel/smp.c  | 1 +
 kernel/rcu/tree.c        | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die()
  2020-11-06 10:36 [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU Will Deacon
@ 2020-11-06 10:36 ` Will Deacon
  2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
  1 sibling, 0 replies; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
	Qian Cai, Paul E. McKenney

cpu_psci_cpu_die() is called in the context of the dying CPU, which
will no longer be online or tracked by RCU. It is therefore not generally
safe to call printk() if the PSCI "cpu off" request fails, so remove the
pr_crit() invocation.

Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/psci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index 43ae4e0c968f..6a4f3e37c3b4 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -75,8 +75,6 @@ static void cpu_psci_cpu_die(unsigned int cpu)
 		    PSCI_0_2_POWER_STATE_TYPE_SHIFT;
 
 	ret = psci_ops.cpu_off(state);
-
-	pr_crit("unable to power off CPU%u (%d)\n", cpu, ret);
 }
 
 static int cpu_psci_cpu_kill(unsigned int cpu)
-- 
2.29.1.341.ge80a0c044ae-goog



* [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online
  2020-11-06 10:36 [PATCH 0/2] arm64: Fix issues with CPU hotplug and RCU Will Deacon
  2020-11-06 10:36 ` [PATCH 1/2] arm64: psci: Avoid printing in cpu_psci_cpu_die() Will Deacon
@ 2020-11-06 10:36 ` Will Deacon
  2020-11-06 14:43   ` Paul E. McKenney
  1 sibling, 1 reply; 4+ messages in thread
From: Will Deacon @ 2020-11-06 10:36 UTC
  To: linux-arm-kernel
  Cc: linux-kernel, kernel-team, Will Deacon, Catalin Marinas,
	Qian Cai, Paul E. McKenney

Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
that RCU is informed early about incoming CPUs that might end up calling
into printk() before they are online. However, if such a CPU fails the
early CPU feature compatibility checks in check_local_cpu_capabilities(),
then it will be powered off or parked without informing RCU, leading to
an endless stream of stalls:

  | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  | rcu:	2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
  | (detected by 0, t=5252 jiffies, g=9317, q=136)
  | Task dump for CPU 2:
  | task:swapper/2       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x00000028
  | Call trace:
  | ret_from_fork+0x0/0x30

Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
off or parked.

Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Suggested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/smp.c | 1 +
 kernel/rcu/tree.c       | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 09c96f57818c..18e9727d3f64 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -413,6 +413,7 @@ void cpu_die_early(void)
 
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
+	rcu_report_dead(cpu);
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
 		update_cpu_boot_status(CPU_KILL_ME);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a52f42f64b6..bd04b09b84b3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
 	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
 /*
  * The outgoing function has no further need of RCU, so remove it from
  * the rcu_node tree's ->qsmaskinitnext bit masks.
@@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
 	rdp->cpu_started = false;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
 /*
  * The outgoing CPU has just passed through the dying-idle state, and we
  * are being invoked from the CPU that was IPIed to continue the offline
-- 
2.29.1.341.ge80a0c044ae-goog
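
Editorial note: the stall described in the commit message above can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not kernel code: every toy_* name is hypothetical,
a single unsigned long stands in for the rcu_node tree's
->qsmaskinitnext bit masks, and the "grace period" is reduced to a
check that every tracked CPU has responded. It shows why a CPU that
was announced via rcu_cpu_starting() and then parked without
rcu_report_dead() keeps every later grace period from completing.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for ->qsmaskinitnext: one bit per CPU that RCU waits on. */
unsigned long toy_qsmaskinitnext;

/* Models rcu_cpu_starting(): the incoming CPU asks RCU to track it. */
void toy_rcu_cpu_starting(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	toy_qsmaskinitnext |= 1UL << cpu;
}

/* Models rcu_report_dead(): the outgoing CPU is dropped from the mask. */
void toy_rcu_report_dead(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	toy_qsmaskinitnext &= ~(1UL << cpu);
}

/*
 * Models one grace period: snapshot the tracked CPUs, clear the bit of
 * every CPU in 'responsive' (those that report a quiescent state), and
 * return true only if no tracked CPU is left outstanding.
 */
bool toy_grace_period_completes(unsigned long responsive)
{
	unsigned long qsmask = toy_qsmaskinitnext;

	qsmask &= ~responsive;
	return qsmask == 0;
}

/*
 * Models cpu_die_early(): 'report_to_rcu' selects the behaviour with
 * (true) or without (false) the rcu_report_dead() call from the patch.
 * In the real kernel the CPU is powered off or parked after this point
 * and never reports a quiescent state again.
 */
void toy_cpu_die_early(unsigned int cpu, bool report_to_rcu)
{
	assert(cpu < TOY_NR_CPUS);
	if (report_to_rcu)
		toy_rcu_report_dead(cpu);
}
```

In this model, after CPUs 0 and 2 have started and CPU 2 dies early
without reporting, a grace period in which only CPU 0 responds cannot
complete; once CPU 2 reports itself dead, the same grace period does.
The model also hints at why the patch moves rcu_report_dead() out of
the CONFIG_HOTPLUG_CPU block: the new call in cpu_die_early() sits
outside the IS_ENABLED(CONFIG_HOTPLUG_CPU) check, so the function has
to be built regardless of that option.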



* Re: [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online
  2020-11-06 10:36 ` [PATCH 2/2] arm64: smp: Tell RCU about CPUs that fail to come online Will Deacon
@ 2020-11-06 14:43   ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2020-11-06 14:43 UTC
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, kernel-team, Catalin Marinas, Qian Cai

On Fri, Nov 06, 2020 at 10:36:02AM +0000, Will Deacon wrote:
> Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
> that RCU is informed early about incoming CPUs that might end up calling
> into printk() before they are online. However, if such a CPU fails the
> early CPU feature compatibility checks in check_local_cpu_capabilities(),
> then it will be powered off or parked without informing RCU, leading to
> an endless stream of stalls:
> 
>   | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>   | rcu:	2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
>   | (detected by 0, t=5252 jiffies, g=9317, q=136)
>   | Task dump for CPU 2:
>   | task:swapper/2       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x00000028
>   | Call trace:
>   | ret_from_fork+0x0/0x30
> 
> Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
> off or parked.
> 
> Cc: Qian Cai <cai@redhat.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Suggested-by: Qian Cai <cai@redhat.com>
> Signed-off-by: Will Deacon <will@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  arch/arm64/kernel/smp.c | 1 +
>  kernel/rcu/tree.c       | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 09c96f57818c..18e9727d3f64 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -413,6 +413,7 @@ void cpu_die_early(void)
>  
>  	/* Mark this CPU absent */
>  	set_cpu_present(cpu, 0);
> +	rcu_report_dead(cpu);
>  
>  	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>  		update_cpu_boot_status(CPU_KILL_ME);
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2a52f42f64b6..bd04b09b84b3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
>  	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
>  }
>  
> -#ifdef CONFIG_HOTPLUG_CPU
>  /*
>   * The outgoing function has no further need of RCU, so remove it from
>   * the rcu_node tree's ->qsmaskinitnext bit masks.
> @@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
>  	rdp->cpu_started = false;
>  }
>  
> +#ifdef CONFIG_HOTPLUG_CPU
>  /*
>   * The outgoing CPU has just passed through the dying-idle state, and we
>   * are being invoked from the CPU that was IPIed to continue the offline
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 

