linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths
@ 2018-05-19  4:35 Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-05-19  4:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Patches 1 and 3 are more of the same, just upgrading local_irq_disable
to hard_irq_disable in places.

Patch 2 fixes a warning people have been hitting in crash testing on
busy systems.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc/64: hard disable irqs in panic_smp_self_stop
  powerpc: smp_send_stop do not offline stopped CPUs
  powerpc/64: hard disable irqs on the panic()ing CPU

 arch/powerpc/kernel/setup-common.c | 12 ++++++++++--
 arch/powerpc/kernel/setup_64.c     |  8 ++++++++
 arch/powerpc/kernel/smp.c          |  6 ------
 3 files changed, 18 insertions(+), 8 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop
  2018-05-19  4:35 [PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths Nicholas Piggin
@ 2018-05-19  4:35 ` Nicholas Piggin
  2018-06-19 23:21   ` [1/3] " Michael Ellerman
  2018-05-19  4:35 ` [PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU Nicholas Piggin
  2 siblings, 1 reply; 5+ messages in thread
From: Nicholas Piggin @ 2018-05-19  4:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Similarly to commit 855bfe0de1 ("powerpc: hard disable irqs in
smp_send_stop loop"), irqs should be hard disabled by
panic_smp_self_stop.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/setup_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b78f142a4148..d122321ad542 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -380,6 +380,14 @@ void early_setup_secondary(void)
 
 #endif /* CONFIG_SMP */
 
+void panic_smp_self_stop(void)
+{
+	hard_irq_disable();
+	spin_begin();
+	while (1)
+		spin_cpu_relax();
+}
+
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs
  2018-05-19  4:35 [PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop Nicholas Piggin
@ 2018-05-19  4:35 ` Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU Nicholas Piggin
  2 siblings, 0 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-05-19  4:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G      D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
LR [c00000000017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c000000001017700] runqueues+0x0/0xb00 (unreliable)
[c00000000017d580] try_to_wake_up+0x230/0x720
[c00000000015a214] insert_work+0x104/0x140
[c00000000015adb0] __queue_work+0x230/0x690
[c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
[c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
[c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c00000000015bcec] process_one_work+0x1bc/0x5f0
[c00000000015ecac] worker_thread+0xac/0x6b0
[c000000000168338] kthread+0x168/0x1b0
[c00000000000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because firstly the CPU is not really offline in the
usual sense, processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so
one CPU can have marked itself offline, while another CPU is still
running processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like
frozen in time, so offline does not really reflect its state properly
anyway. There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9ca7148b5881..6d6cf14009cf 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -579,9 +579,6 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
 	nmi_ipi_busy_count--;
 	nmi_ipi_unlock();
 
-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	spin_begin();
 	while (1)
 		spin_cpu_relax();
@@ -596,9 +593,6 @@ void smp_send_stop(void)
 
 static void stop_this_cpu(void *dummy)
 {
-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	hard_irq_disable();
 	spin_begin();
 	while (1)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU
  2018-05-19  4:35 [PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop Nicholas Piggin
  2018-05-19  4:35 ` [PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs Nicholas Piggin
@ 2018-05-19  4:35 ` Nicholas Piggin
  2 siblings, 0 replies; 5+ messages in thread
From: Nicholas Piggin @ 2018-05-19  4:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Similar to previous patches, hard disable interrupts when a CPU is
in panic. This reduces the chance the watchdog has to interfere with
the panic, and avoids any other type of masked interrupt being
executed when crashing which minimises the length of the crash path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/setup-common.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 0af5c11b9e78..d45fb522fe8a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -706,12 +706,19 @@ EXPORT_SYMBOL(check_legacy_ioport);
 static int ppc_panic_event(struct notifier_block *this,
                              unsigned long event, void *ptr)
 {
+	/*
+	 * panic does a local_irq_disable, but we really
+	 * want interrupts to be hard disabled.
+	 */
+	hard_irq_disable();
+
 	/*
 	 * If firmware-assisted dump has been registered then trigger
 	 * firmware-assisted dump and let firmware handle everything else.
 	 */
 	crash_fadump(NULL, ptr);
-	ppc_md.panic(ptr);  /* May not return */
+	if (ppc_md.panic)
+		ppc_md.panic(ptr);  /* May not return */
 	return NOTIFY_DONE;
 }
 
@@ -722,7 +729,8 @@ static struct notifier_block ppc_panic_block = {
 
 void __init setup_panic(void)
 {
-	if (!ppc_md.panic)
+	/* PPC64 always does a hard irq disable in its panic handler */
+	if (!IS_ENABLED(CONFIG_PPC64) && !ppc_md.panic)
 		return;
 	atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block);
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [1/3] powerpc/64: hard disable irqs in panic_smp_self_stop
  2018-05-19  4:35 ` [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop Nicholas Piggin
@ 2018-06-19 23:21   ` Michael Ellerman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2018-06-19 23:21 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

On Sat, 2018-05-19 at 04:35:52 UTC, Nicholas Piggin wrote:
> Similarly to commit 855bfe0de1 ("powerpc: hard disable irqs in
> smp_send_stop loop"), irqs should be hard disabled by
> panic_smp_self_stop.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Series applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/8c1aef6a682f87a059f10ab606cc1e

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-06-19 23:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-19  4:35 [PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths Nicholas Piggin
2018-05-19  4:35 ` [PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop Nicholas Piggin
2018-06-19 23:21   ` [1/3] " Michael Ellerman
2018-05-19  4:35 ` [PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs Nicholas Piggin
2018-05-19  4:35 ` [PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).