All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] watchdog and NMI IPI locking improvements
@ 2017-08-09 12:41 Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 1/6] powerpc: NMI IPI improve lock primitive Nicholas Piggin
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

It was noticed that the watchdog was causing hangs and lockups in
some cases, hammering on the watchdog lock, so I've found a few
other improvements and bugs. Thanks to Paulus for finding the problem
and fixing the lock primitives (I fixed it a bit differently but the
idea is his).

Thanks,
Nick

Nicholas Piggin (6):
  powerpc: NMI IPI improve lock primitive
  powerpc/watchdog: Improve watchdog lock primitive
  powerpc/watchdog: Moderate touch_nmi_watchdog overhead
  powerpc/watchdog: Fix final-check recovered case
  powerpc/watchdog: Fix marking of stuck CPUs
  powerpc/watchdog: add locking around init/exit functions

 arch/powerpc/kernel/smp.c      |  6 +++---
 arch/powerpc/kernel/watchdog.c | 49 +++++++++++++++++++++++++++++++-----------
 2 files changed, 39 insertions(+), 16 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/6] powerpc: NMI IPI improve lock primitive
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  2017-08-10 12:07   ` [1/6] " Michael Ellerman
  2017-08-09 12:41 ` [PATCH 2/6] powerpc/watchdog: Improve watchdog " Nicholas Piggin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

When the NMI IPI lock is contended, spin at low SMT priority, using
loads only, and with interrupts enabled (where possible). This
improves behaviour under high contention (e.g., a system crash when
a number of CPUs are trying to enter the debugger).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cf0e1245b8cc..8d3320562c70 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -351,7 +351,7 @@ static void nmi_ipi_lock_start(unsigned long *flags)
 	hard_irq_disable();
 	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1) {
 		raw_local_irq_restore(*flags);
-		cpu_relax();
+		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
 		raw_local_irq_save(*flags);
 		hard_irq_disable();
 	}
@@ -360,7 +360,7 @@ static void nmi_ipi_lock_start(unsigned long *flags)
 static void nmi_ipi_lock(void)
 {
 	while (atomic_cmpxchg(&__nmi_ipi_lock, 0, 1) == 1)
-		cpu_relax();
+		spin_until_cond(atomic_read(&__nmi_ipi_lock) == 0);
 }
 
 static void nmi_ipi_unlock(void)
@@ -475,7 +475,7 @@ int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us)
 	nmi_ipi_lock_start(&flags);
 	while (nmi_ipi_busy_count) {
 		nmi_ipi_unlock_end(&flags);
-		cpu_relax();
+		spin_until_cond(nmi_ipi_busy_count == 0);
 		nmi_ipi_lock_start(&flags);
 	}
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/6] powerpc/watchdog: Improve watchdog lock primitive
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 1/6] powerpc: NMI IPI improve lock primitive Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 3/6] powerpc/watchdog: Moderate touch_nmi_watchdog overhead Nicholas Piggin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

- Hard-disable interrupts before taking the lock, which prevents
  soft-NMI re-entrancy and therefore can prevent deadlocks.
- Use raw_ variants of local_irq_disable to avoid irq debugging.
- When the lock is contended, spin at low SMT priority, using
  loads only, and with interrupts enabled (where possible).

Some stalls have been noticed at high loads that go away with improved
locking. There should not be so much locking contention in the first
place (which is addressed in a subsequent patch), but locking should
still be improved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 4b9a567c9975..3a26c3401acc 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -71,15 +71,20 @@ static inline void wd_smp_lock(unsigned long *flags)
 	 * This may be called from low level interrupt handlers at some
 	 * point in future.
 	 */
-	local_irq_save(*flags);
-	while (unlikely(test_and_set_bit_lock(0, &__wd_smp_lock)))
-		cpu_relax();
+	raw_local_irq_save(*flags);
+	hard_irq_disable(); /* Make it soft-NMI safe */
+	while (unlikely(test_and_set_bit_lock(0, &__wd_smp_lock))) {
+		raw_local_irq_restore(*flags);
+		spin_until_cond(!test_bit(0, &__wd_smp_lock));
+		raw_local_irq_save(*flags);
+		hard_irq_disable();
+	}
 }
 
 static inline void wd_smp_unlock(unsigned long *flags)
 {
 	clear_bit_unlock(0, &__wd_smp_lock);
-	local_irq_restore(*flags);
+	raw_local_irq_restore(*flags);
 }
 
 static void wd_lockup_ipi(struct pt_regs *regs)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/6] powerpc/watchdog: Moderate touch_nmi_watchdog overhead
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 1/6] powerpc: NMI IPI improve lock primitive Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 2/6] powerpc/watchdog: Improve watchdog " Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 4/6] powerpc/watchdog: Fix final-check recovered case Nicholas Piggin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

Some code can go into a tight loop calling touch_nmi_watchdog (e.g.,
stop_machine CPU hotplug code). This can cause contention on watchdog
locks particularly if all CPUs with watchdog enabled are spinning in
the loops.

Avoid this storm of activity by running the watchdog timer callback
from this path if we have exceeded the timer period since it was last
run.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 3a26c3401acc..2806383d2339 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -266,9 +266,11 @@ static void wd_timer_fn(unsigned long data)
 
 void arch_touch_nmi_watchdog(void)
 {
+	unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
 	int cpu = smp_processor_id();
 
-	watchdog_timer_interrupt(cpu);
+	if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks)
+		watchdog_timer_interrupt(cpu);
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/6] powerpc/watchdog: Fix final-check recovered case
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-08-09 12:41 ` [PATCH 3/6] powerpc/watchdog: Moderate touch_nmi_watchdog overhead Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 5/6] powerpc/watchdog: Fix marking of stuck CPUs Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 6/6] powerpc/watchdog: add locking around init/exit functions Nicholas Piggin
  5 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

When the watchdog decides to panic, it takes the lock and double
checks everything (to avoid races with the CPU being unstuck or
panic()ed by something else).

The exit label was misplaced and would result in all-CPUs backtrace
and watchdog panic even in the case that the condition was found to be
resolved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 2806383d2339..5a69654075c1 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -144,7 +144,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 	for_each_cpu(c, &wd_smp_cpus_pending)
 		set_cpu_stuck(c, tb);
 
-out:
 	wd_smp_unlock(&flags);
 
 	printk_safe_flush();
@@ -157,6 +156,11 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
+
+	return;
+
+out:
+	wd_smp_unlock(&flags);
 }
 
 static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/6] powerpc/watchdog: Fix marking of stuck CPUs
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-08-09 12:41 ` [PATCH 4/6] powerpc/watchdog: Fix final-check recovered case Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  2017-08-09 12:41 ` [PATCH 6/6] powerpc/watchdog: add locking around init/exit functions Nicholas Piggin
  5 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

When the SMP detector finds other CPUs stuck, it iterates over
them and marks them as stuck. This pulls them out of the pending
mask and allows the detector to continue with remaining good
CPUs (if nmi_watchdog=panic is not enabled).

The code to dothat was buggy because when setting a CPU stuck,
if the pending mask became empty, it resets it to keep the
watchdog running. However the iterator will continue to run
over the new pending mask and mark remaining good CPUs sas stuck.

Fix this by doing it with cpumask bitwise operations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 5a69654075c1..b84351f359ea 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -101,10 +101,10 @@ static void wd_lockup_ipi(struct pt_regs *regs)
 		nmi_panic(regs, "Hard LOCKUP");
 }
 
-static void set_cpu_stuck(int cpu, u64 tb)
+static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb)
 {
-	cpumask_set_cpu(cpu, &wd_smp_cpus_stuck);
-	cpumask_clear_cpu(cpu, &wd_smp_cpus_pending);
+	cpumask_or(&wd_smp_cpus_stuck, &wd_smp_cpus_stuck, cpumask);
+	cpumask_andnot(&wd_smp_cpus_pending, &wd_smp_cpus_pending, cpumask);
 	if (cpumask_empty(&wd_smp_cpus_pending)) {
 		wd_smp_last_reset_tb = tb;
 		cpumask_andnot(&wd_smp_cpus_pending,
@@ -112,6 +112,10 @@ static void set_cpu_stuck(int cpu, u64 tb)
 				&wd_smp_cpus_stuck);
 	}
 }
+static void set_cpu_stuck(int cpu, u64 tb)
+{
+	set_cpumask_stuck(cpumask_of(cpu), tb);
+}
 
 static void watchdog_smp_panic(int cpu, u64 tb)
 {
@@ -140,9 +144,8 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 	}
 	smp_flush_nmi_ipi(1000000);
 
-	/* Take the stuck CPU out of the watch group */
-	for_each_cpu(c, &wd_smp_cpus_pending)
-		set_cpu_stuck(c, tb);
+	/* Take the stuck CPUs out of the watch group */
+	set_cpumask_stuck(&wd_smp_cpus_pending, tb);
 
 	wd_smp_unlock(&flags);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 6/6] powerpc/watchdog: add locking around init/exit functions
  2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-08-09 12:41 ` [PATCH 5/6] powerpc/watchdog: Fix marking of stuck CPUs Nicholas Piggin
@ 2017-08-09 12:41 ` Nicholas Piggin
  5 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2017-08-09 12:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Paul Mackerras

When CPUs start and stop the watchdog, they manipulate shared data
that is normally protected by the lock. Other CPUs can be running
concurrently at this time, so it's a good idea to use locking here
to be on the safe side.

Remove the barrier which is undocumented and didn't do anything.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index b84351f359ea..2f6eadd9408d 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -300,6 +300,8 @@ static void stop_watchdog_timer_on(unsigned int cpu)
 
 static int start_wd_on_cpu(unsigned int cpu)
 {
+	unsigned long flags;
+
 	if (cpumask_test_cpu(cpu, &wd_cpus_enabled)) {
 		WARN_ON(1);
 		return 0;
@@ -314,12 +316,14 @@ static int start_wd_on_cpu(unsigned int cpu)
 	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
 		return 0;
 
+	wd_smp_lock(&flags);
 	cpumask_set_cpu(cpu, &wd_cpus_enabled);
 	if (cpumask_weight(&wd_cpus_enabled) == 1) {
 		cpumask_set_cpu(cpu, &wd_smp_cpus_pending);
 		wd_smp_last_reset_tb = get_tb();
 	}
-	smp_wmb();
+	wd_smp_unlock(&flags);
+
 	start_watchdog_timer_on(cpu);
 
 	return 0;
@@ -327,12 +331,17 @@ static int start_wd_on_cpu(unsigned int cpu)
 
 static int stop_wd_on_cpu(unsigned int cpu)
 {
+	unsigned long flags;
+
 	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
 		return 0; /* Can happen in CPU unplug case */
 
 	stop_watchdog_timer_on(cpu);
 
+	wd_smp_lock(&flags);
 	cpumask_clear_cpu(cpu, &wd_cpus_enabled);
+	wd_smp_unlock(&flags);
+
 	wd_smp_clear_cpu_pending(cpu, get_tb());
 
 	return 0;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [1/6] powerpc: NMI IPI improve lock primitive
  2017-08-09 12:41 ` [PATCH 1/6] powerpc: NMI IPI improve lock primitive Nicholas Piggin
@ 2017-08-10 12:07   ` Michael Ellerman
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2017-08-10 12:07 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Paul Mackerras, Nicholas Piggin

On Wed, 2017-08-09 at 12:41:21 UTC, Nicholas Piggin wrote:
> When the NMI IPI lock is contended, spin at low SMT priority, using
> loads only, and with interrupts enabled (where possible). This
> improves behaviour under high contention (e.g., a system crash when
> a number of CPUs are trying to enter the debugger).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Series applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/0459ddfdb31e7d812b555a2530ecba

cheers

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-08-10 12:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-09 12:41 [PATCH 0/6] watchdog and NMI IPI locking improvements Nicholas Piggin
2017-08-09 12:41 ` [PATCH 1/6] powerpc: NMI IPI improve lock primitive Nicholas Piggin
2017-08-10 12:07   ` [1/6] " Michael Ellerman
2017-08-09 12:41 ` [PATCH 2/6] powerpc/watchdog: Improve watchdog " Nicholas Piggin
2017-08-09 12:41 ` [PATCH 3/6] powerpc/watchdog: Moderate touch_nmi_watchdog overhead Nicholas Piggin
2017-08-09 12:41 ` [PATCH 4/6] powerpc/watchdog: Fix final-check recovered case Nicholas Piggin
2017-08-09 12:41 ` [PATCH 5/6] powerpc/watchdog: Fix marking of stuck CPUs Nicholas Piggin
2017-08-09 12:41 ` [PATCH 6/6] powerpc/watchdog: add locking around init/exit functions Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.