* [PATCH v2 0/2] powerpc/watchdog: provide more data in watchdog messages
@ 2018-05-05  7:25 Nicholas Piggin
  2018-05-05  7:25 ` [PATCH v2 1/2] powerpc/watchdog: don't update the watchdog timestamp if a lockup is detected Nicholas Piggin
  2018-05-05  7:26 ` [PATCH v2 2/2] powerpc/watchdog: provide more data in watchdog messages Nicholas Piggin
  0 siblings, 2 replies; 4+ messages in thread
From: Nicholas Piggin @ 2018-05-05  7:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Balbir Singh

Changes since v1, addressing Balbir's feedback:
- Split into two patches.
- Added human-readable intervals as well as TB timestamps.

Nicholas Piggin (2):
  powerpc/watchdog: don't update the watchdog timestamp if a lockup is
    detected
  powerpc/watchdog: provide more data in watchdog messages

 arch/powerpc/kernel/watchdog.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

-- 
2.17.0


* [PATCH v2 1/2] powerpc/watchdog: don't update the watchdog timestamp if a lockup is detected
  2018-05-05  7:25 [PATCH v2 0/2] powerpc/watchdog: provide more data in watchdog messages Nicholas Piggin
@ 2018-05-05  7:25 ` Nicholas Piggin
  2018-05-10 14:06   ` [v2, " Michael Ellerman
  2018-05-05  7:26 ` [PATCH v2 2/2] powerpc/watchdog: provide more data in watchdog messages Nicholas Piggin
  1 sibling, 1 reply; 4+ messages in thread
From: Nicholas Piggin @ 2018-05-05  7:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Balbir Singh

The watchdog heartbeat timestamp is updated when the local heartbeat
timer fires (or touch_nmi_watchdog() is called).

This is an interesting data point, so don't overwrite it when the
soft-NMI interrupt detects a hard lockup. That code came from a
pre-merge version, where it prevented a flood of hard lockup messages,
but the stuck CPU logic takes care of that now, so there is no reason
to update the heartbeat timestamp here.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 6256dc3b0087..0bc701f9ab35 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -245,8 +245,6 @@ void soft_nmi_interrupt(struct pt_regs *regs)
 
 	tb = get_tb();
 	if (tb - per_cpu(wd_timer_tb, cpu) >= wd_panic_timeout_tb) {
-		per_cpu(wd_timer_tb, cpu) = tb;
-
 		wd_smp_lock(&flags);
 		if (cpumask_test_cpu(cpu, &wd_smp_cpus_stuck)) {
 			wd_smp_unlock(&flags);
-- 
2.17.0
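
The effect of this change can be sketched outside the kernel. The
model below is a minimal user-space illustration, not the kernel code:
struct wd_cpu, soft_nmi_check() and the "overwrite" switch are
hypothetical names, and a plain bool stands in for membership in
wd_smp_cpus_stuck. It shows that the stuck flag alone suppresses
repeated reports, so the heartbeat timestamp can be left untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU watchdog state. */
struct wd_cpu {
	uint64_t timer_tb;	/* last heartbeat, in timebase ticks */
	bool stuck;		/* models membership in wd_smp_cpus_stuck */
};

/*
 * Models the timeout check made from the soft-NMI interrupt.
 * Returns true when a lockup would be reported.
 * overwrite=true selects the pre-patch behaviour, which clobbers
 * the heartbeat timestamp as soon as the timeout is exceeded.
 */
static bool soft_nmi_check(struct wd_cpu *c, uint64_t tb,
			   uint64_t timeout_tb, bool overwrite)
{
	if (tb - c->timer_tb < timeout_tb)
		return false;	/* heartbeat is recent enough */

	if (overwrite)
		c->timer_tb = tb;

	if (c->stuck)
		return false;	/* already reported, so no message flood */

	c->stuck = true;
	return true;
}
```

With overwrite=false, a second call after the first report still
returns false because of the stuck flag, and timer_tb keeps the time
of the last real heartbeat for later messages to print.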


* [PATCH v2 2/2] powerpc/watchdog: provide more data in watchdog messages
  2018-05-05  7:25 [PATCH v2 0/2] powerpc/watchdog: provide more data in watchdog messages Nicholas Piggin
  2018-05-05  7:25 ` [PATCH v2 1/2] powerpc/watchdog: don't update the watchdog timestamp if a lockup is detected Nicholas Piggin
@ 2018-05-05  7:26 ` Nicholas Piggin
  1 sibling, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2018-05-05  7:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Balbir Singh

Provide the current timebase and the timebase of the last heartbeat in
watchdog lockup messages, along with the interval between them in
milliseconds. Also provide a stack trace when a CPU becomes unstuck.
This can be useful: it could be where irqs are re-enabled, and so may
mark the end of the critical section responsible for the latency.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/watchdog.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 0bc701f9ab35..a99951e8199e 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -111,7 +111,13 @@ static inline void wd_smp_unlock(unsigned long *flags)
 
 static void wd_lockup_ipi(struct pt_regs *regs)
 {
-	pr_emerg("CPU %d Hard LOCKUP\n", raw_smp_processor_id());
+	int cpu = raw_smp_processor_id();
+	u64 tb = get_tb();
+
+	pr_emerg("CPU %d Hard LOCKUP\n", cpu);
+	pr_emerg("CPU %d TB:%lld, last heartbeat TB:%lld (%lldms ago)\n",
+		 cpu, tb, per_cpu(wd_timer_tb, cpu),
+		 tb_to_ns(tb - per_cpu(wd_timer_tb, cpu)) / 1000000);
 	print_modules();
 	print_irqtrace_events(current);
 	if (regs)
@@ -154,6 +160,9 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
 	pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
 		 cpu, cpumask_pr_args(&wd_smp_cpus_pending));
+	pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
+		 cpu, tb, wd_smp_last_reset_tb,
+		 tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000);
 
 	if (!sysctl_hardlockup_all_cpu_backtrace) {
 		/*
@@ -194,10 +203,19 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
 {
 	if (!cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) {
 		if (unlikely(cpumask_test_cpu(cpu, &wd_smp_cpus_stuck))) {
+			struct pt_regs *regs = get_irq_regs();
 			unsigned long flags;
 
-			pr_emerg("CPU %d became unstuck\n", cpu);
 			wd_smp_lock(&flags);
+
+			pr_emerg("CPU %d became unstuck TB:%lld\n",
+				 cpu, tb);
+			print_irqtrace_events(current);
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+
 			cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
 			wd_smp_unlock(&flags);
 		}
@@ -252,7 +270,11 @@ void soft_nmi_interrupt(struct pt_regs *regs)
 		}
 		set_cpu_stuck(cpu, tb);
 
-		pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n", cpu, (void *)regs->nip);
+		pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n",
+			 cpu, (void *)regs->nip);
+		pr_emerg("CPU %d TB:%lld, last heartbeat TB:%lld (%lldms ago)\n",
+			 cpu, tb, per_cpu(wd_timer_tb, cpu),
+			 tb_to_ns(tb - per_cpu(wd_timer_tb, cpu)) / 1000000);
 		print_modules();
 		print_irqtrace_events(current);
 		show_regs(regs);
-- 
2.17.0
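
The "(%lldms ago)" field in the new messages is a timebase delta
converted to milliseconds. The kernel does this with tb_to_ns() and a
divide by 1000000. The sketch below shows the same arithmetic in user
space under a stated assumption: ASSUMED_TB_HZ (512 MHz) and
tb_delta_to_ms() are illustrative names, and the real timebase
frequency is platform-dependent.

```c
#include <stdint.h>

/* Assumed timebase frequency; the real value depends on the platform. */
#define ASSUMED_TB_HZ 512000000ULL

/*
 * Convert the interval between two timebase readings to milliseconds.
 * Whole seconds and the remainder are converted separately so that
 * multiplying by 1000 cannot overflow for large deltas.
 */
static uint64_t tb_delta_to_ms(uint64_t now_tb, uint64_t last_tb)
{
	uint64_t delta = now_tb - last_tb;

	return (delta / ASSUMED_TB_HZ) * 1000 +
	       (delta % ASSUMED_TB_HZ) * 1000 / ASSUMED_TB_HZ;
}
```

Under that assumption, a delta of 5120000000 ticks corresponds to
10000 ms, which is the kind of figure the "ms ago" field would report
for a 10 second stall.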


* Re: [v2, 1/2] powerpc/watchdog: don't update the watchdog timestamp if a lockup is detected
  2018-05-05  7:25 ` [PATCH v2 1/2] powerpc/watchdog: don't update the watchdog timestamp if a lockup is detected Nicholas Piggin
@ 2018-05-10 14:06   ` Michael Ellerman
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2018-05-10 14:06 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

On Sat, 2018-05-05 at 07:25:59 UTC, Nicholas Piggin wrote:
> The watchdog heartbeat timestamp is updated when the local heartbeat
> timer fires (or touch_nmi_watchdog() is called).
> 
> This is an interesting data point, so don't overwrite it when the
> soft-NMI interrupt detects a hard lockup. That code came from a pre-
> merge version to prevent hard lockup messages flood, but that's taken
> care of with the stuck CPU logic now, so there is no reason to
> update the heartbeat timestamp here.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5a951c4e7e8df5d6df52bace1b4ff3

cheers
