All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/2] softlockup watchdog improvements
@ 2007-03-27  5:38 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner, john stultz

Here's couple of patches to improve the softlockup watchdog.

The first changes the softlockup timer from using jiffies to sched_clock()
as a timebase.  Xen and VMI implement sched_clock() as counting unstolen
time, so time stolen by the hypervisor won't cause the watchdog to bite.

The second adds per-cpu enable flags for the watchdog timer.  This allows
the timer to be disabled when the CPU goes into a (potentially unbounded)
tickless sleep.

I know this conflicts with
fix-bogus-softlockup-warning-with-sysrq-t.patch in -mm2.  I think that
patch incorrectly changes the behaviour of the softlockup watchdog,
and a better solution is to temporarily disable the watchdog while
doing something known to be cpu-consuming, like a long sysreq output.

	J
-- 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 0/2] softlockup watchdog improvements
@ 2007-03-27  5:38 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: virtualization, Ingo Molnar, Thomas Gleixner, Linux Kernel, john stultz

Here's couple of patches to improve the softlockup watchdog.

The first changes the softlockup timer from using jiffies to sched_clock()
as a timebase.  Xen and VMI implement sched_clock() as counting unstolen
time, so time stolen by the hypervisor won't cause the watchdog to bite.

The second adds per-cpu enable flags for the watchdog timer.  This allows
the timer to be disabled when the CPU goes into a (potentially unbounded)
tickless sleep.

I know this conflicts with
fix-bogus-softlockup-warning-with-sysrq-t.patch in -mm2.  I think that
patch incorrectly changes the behaviour of the softlockup watchdog,
and a better solution is to temporarily disable the watchdog while
doing something known to be cpu-consuming, like a long sysreq output.

	J
-- 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27  5:38 ` Jeremy Fitzhardinge
  (?)
@ 2007-03-27  5:38 ` Jeremy Fitzhardinge
  2007-03-27  7:00   ` Eric Dumazet
  2007-03-27 14:39     ` Prarit Bhargava
  -1 siblings, 2 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner,
	john stultz, Zachary Amsden, James Morris, Dan Hecht,
	Paul Mackerras, Martin Schwidefsky, Prarit Bhargava,
	Chris Lalancette, Rick Lindsley

[-- Attachment #1: softlockup-ignore-stolen-time.patch --]
[-- Type: text/plain, Size: 4021 bytes --]

The softlockup watchdog is currently a nuisance in a virtual machine,
since the whole system could have the CPU stolen from it for a long
period of time.  While it would be unlikely for a guest domain to be
denied timer interrupts for over 10s, it could happen and any softlockup
message would be completely spurious.

Earlier I proposed that sched_clock() return time in unstolen
nanoseconds, which is how Xen and VMI currently implement it.  If the
softlockup watchdog uses sched_clock() to measure time, it would
automatically ignore stolen time, and therefore only report when the
guest itself locked up.  When running native, sched_clock() returns
real-time nanoseconds, so the behaviour would be unchanged.

Note that sched_clock() used this way is inherently per-cpu, so this
patch makes sure that the per-processor watchdog thread initialized
its own timestamp.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>

---
 kernel/softlockup.c |   28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

===================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -17,8 +17,8 @@
 
 static DEFINE_SPINLOCK(print_lock);
 
-static DEFINE_PER_CPU(unsigned long, touch_timestamp);
-static DEFINE_PER_CPU(unsigned long, print_timestamp);
+static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
+static DEFINE_PER_CPU(unsigned long long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
 
 static int did_panic = 0;
@@ -37,7 +37,7 @@ static struct notifier_block panic_block
 
 void touch_softlockup_watchdog(void)
 {
-	__raw_get_cpu_var(touch_timestamp) = jiffies;
+	__raw_get_cpu_var(touch_timestamp) = sched_clock();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -48,10 +48,15 @@ void softlockup_tick(void)
 void softlockup_tick(void)
 {
 	int this_cpu = smp_processor_id();
-	unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
+	unsigned long long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
+	unsigned long long now;
 
-	/* prevent double reports: */
-	if (per_cpu(print_timestamp, this_cpu) == touch_timestamp ||
+	/* watchdog task hasn't updated timestamp yet */
+	if (touch_timestamp == 0)
+		return;
+
+	/* report at most once a second */
+	if (per_cpu(print_timestamp, this_cpu) < (touch_timestamp + NSEC_PER_SEC) ||
 		did_panic ||
 			!per_cpu(watchdog_task, this_cpu))
 		return;
@@ -62,12 +67,14 @@ void softlockup_tick(void)
 		return;
 	}
 
+	now = sched_clock();
+
 	/* Wake up the high-prio watchdog task every second: */
-	if (time_after(jiffies, touch_timestamp + HZ))
+	if (now > (touch_timestamp + NSEC_PER_SEC))
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
 
 	/* Warn about unreasonable 10+ seconds delays: */
-	if (time_after(jiffies, touch_timestamp + 10*HZ)) {
+	if (now > (touch_timestamp + 10ull*NSEC_PER_SEC)) {
 		per_cpu(print_timestamp, this_cpu) = touch_timestamp;
 
 		spin_lock(&print_lock);
@@ -87,6 +94,9 @@ static int watchdog(void * __bind_cpu)
 
 	sched_setscheduler(current, SCHED_FIFO, &param);
 	current->flags |= PF_NOFREEZE;
+
+	/* initialize timestamp */
+	touch_softlockup_watchdog();
 
 	/*
 	 * Run briefly once per second to reset the softlockup timestamp.
@@ -120,7 +130,7 @@ cpu_callback(struct notifier_block *nfb,
 			printk("watchdog for %i failed\n", hotcpu);
 			return NOTIFY_BAD;
 		}
-  		per_cpu(touch_timestamp, hotcpu) = jiffies;
+  		per_cpu(touch_timestamp, hotcpu) = 0;
   		per_cpu(watchdog_task, hotcpu) = p;
 		kthread_bind(p, hotcpu);
  		break;

-- 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 2/2] percpu enable flag for softlockup watchdog
  2007-03-27  5:38 ` Jeremy Fitzhardinge
@ 2007-03-27  5:38   ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner,
	john stultz, Zachary Amsden, James Morris, Dan Hecht,
	Paul Mackerras, Prarit Bhargava, Chris Lalancette,
	Martin Schwidefsky

[-- Attachment #1: softlockup-percpu-enable-flag.patch --]
[-- Type: text/plain, Size: 5859 bytes --]

On a NO_HZ system, there may be an arbitrarily long delay between
ticks on a CPU.  When we're disabling ticks for a CPU, also disable
the softlockup watchdog timer.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>

---
 include/linux/sched.h    |    8 ++++++++
 kernel/softlockup.c      |   23 +++++++++++++++++++----
 kernel/time/tick-sched.c |   34 +++++++++++++++-------------------
 3 files changed, 42 insertions(+), 23 deletions(-)

===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -232,10 +232,18 @@ extern void scheduler_tick(void);
 
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
+extern void softlockup_enable(void);
+extern void softlockup_disable(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
+{
+}
+static inline void softlockup_enable(void)
+{
+}
+static inline void softlockup_disable(void)
 {
 }
 static inline void spawn_softlockup_task(void)
===================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -20,6 +20,7 @@ static DEFINE_PER_CPU(unsigned long long
 static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
+static DEFINE_PER_CPU(int, enabled);
 
 static int did_panic = 0;
 
@@ -41,6 +42,18 @@ void touch_softlockup_watchdog(void)
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
+void softlockup_enable(void)
+{
+	touch_softlockup_watchdog();
+	wmb();			/* update timestamp before enable */
+	__get_cpu_var(enabled) = 1;
+}
+
+void softlockup_disable(void)
+{
+	__get_cpu_var(enabled) = 0;
+}
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -51,8 +64,8 @@ void softlockup_tick(void)
 	unsigned long long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
 	unsigned long long now;
 
-	/* watchdog task hasn't updated timestamp yet */
-	if (touch_timestamp == 0)
+	/* return if not enabled */
+	if (!__get_cpu_var(enabled))
 		return;
 
 	/* report at most once a second */
@@ -95,8 +108,8 @@ static int watchdog(void * __bind_cpu)
 	sched_setscheduler(current, SCHED_FIFO, &param);
 	current->flags |= PF_NOFREEZE;
 
-	/* initialize timestamp */
-	touch_softlockup_watchdog();
+	/* enable on this cpu */
+	softlockup_enable();
 
 	/*
 	 * Run briefly once per second to reset the softlockup timestamp.
@@ -109,6 +122,8 @@ static int watchdog(void * __bind_cpu)
 		touch_softlockup_watchdog();
 		schedule();
 	}
+
+	softlockup_disable();
 
 	return 0;
 }
===================================================================
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -228,6 +228,8 @@ void tick_nohz_stop_sched_tick(void)
 			ts->idle_tick = ts->sched_timer.expires;
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
+
+			softlockup_disable();
 		}
 		/*
 		 * calculate the expiry time for the next timer wheel
@@ -255,6 +257,7 @@ void tick_nohz_stop_sched_tick(void)
 		cpu_clear(cpu, nohz_cpu_mask);
 	}
 	raise_softirq_irqoff(TIMER_SOFTIRQ);
+
 out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
@@ -311,6 +314,8 @@ void tick_nohz_restart_sched_tick(void)
 	ts->tick_stopped  = 0;
 	hrtimer_cancel(&ts->sched_timer);
 	ts->sched_timer.expires = ts->idle_tick;
+
+	softlockup_enable();
 
 	while (1) {
 		/* Forward the time to expire in the future */
@@ -355,17 +360,12 @@ static void tick_nohz_handler(struct clo
 	tick_do_update_jiffies64(now);
 
 	/*
-	 * When we are idle and the tick is stopped, we have to touch
-	 * the watchdog as we might not schedule for a really long
-	 * time. This happens on complete idle SMP systems while
-	 * waiting on the login prompt. We also increment the "start
-	 * of idle" jiffy stamp so the idle accounting adjustment we
-	 * do when we go busy again does not account too much ticks.
-	 */
-	if (ts->tick_stopped) {
-		touch_softlockup_watchdog();
+	 * Increment the "start of idle" jiffy stamp so the idle
+	 * accounting adjustment we do when we go busy again does not
+	 * account too much ticks.
+	 */
+	if (ts->tick_stopped)
 		ts->idle_jiffies++;
-	}
 
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
@@ -450,17 +450,12 @@ static enum hrtimer_restart tick_sched_t
 	 */
 	if (regs) {
 		/*
-		 * When we are idle and the tick is stopped, we have to touch
-		 * the watchdog as we might not schedule for a really long
-		 * time. This happens on complete idle SMP systems while
-		 * waiting on the login prompt. We also increment the "start of
-		 * idle" jiffy stamp so the idle accounting adjustment we do
-		 * when we go busy again does not account too much ticks.
+		 * Increment the "start of idle" jiffy stamp so the
+		 * idle accounting adjustment we do when we go busy
+		 * again does not account too much ticks.
 		 */
-		if (ts->tick_stopped) {
-			touch_softlockup_watchdog();
+		if (ts->tick_stopped)
 			ts->idle_jiffies++;
-		}
 		/*
 		 * update_process_times() might take tasklist_lock, hence
 		 * drop the base lock. sched-tick hrtimers are per-CPU and
@@ -522,6 +517,7 @@ void tick_cancel_sched_timer(int cpu)
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
 	ts->tick_stopped = 0;
+	softlockup_enable();
 	ts->nohz_mode = NOHZ_MODE_INACTIVE;
 }
 #endif /* HIGH_RES_TIMERS */

-- 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch 2/2] percpu enable flag for softlockup watchdog
@ 2007-03-27  5:38   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Prarit Bhargava, john stultz, Linux Kernel, virtualization,
	Paul Mackerras, Martin Schwidefsky, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: softlockup-percpu-enable-flag.patch --]
[-- Type: text/plain, Size: 6043 bytes --]

On a NO_HZ system, there may be an arbitrarily long delay between
ticks on a CPU.  When we're disabling ticks for a CPU, also disable
the softlockup watchdog timer.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>

---
 include/linux/sched.h    |    8 ++++++++
 kernel/softlockup.c      |   23 +++++++++++++++++++----
 kernel/time/tick-sched.c |   34 +++++++++++++++-------------------
 3 files changed, 42 insertions(+), 23 deletions(-)

===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -232,10 +232,18 @@ extern void scheduler_tick(void);
 
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
+extern void softlockup_enable(void);
+extern void softlockup_disable(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
+{
+}
+static inline void softlockup_enable(void)
+{
+}
+static inline void softlockup_disable(void)
 {
 }
 static inline void spawn_softlockup_task(void)
===================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -20,6 +20,7 @@ static DEFINE_PER_CPU(unsigned long long
 static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
+static DEFINE_PER_CPU(int, enabled);
 
 static int did_panic = 0;
 
@@ -41,6 +42,18 @@ void touch_softlockup_watchdog(void)
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
+void softlockup_enable(void)
+{
+	touch_softlockup_watchdog();
+	wmb();			/* update timestamp before enable */
+	__get_cpu_var(enabled) = 1;
+}
+
+void softlockup_disable(void)
+{
+	__get_cpu_var(enabled) = 0;
+}
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -51,8 +64,8 @@ void softlockup_tick(void)
 	unsigned long long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
 	unsigned long long now;
 
-	/* watchdog task hasn't updated timestamp yet */
-	if (touch_timestamp == 0)
+	/* return if not enabled */
+	if (!__get_cpu_var(enabled))
 		return;
 
 	/* report at most once a second */
@@ -95,8 +108,8 @@ static int watchdog(void * __bind_cpu)
 	sched_setscheduler(current, SCHED_FIFO, &param);
 	current->flags |= PF_NOFREEZE;
 
-	/* initialize timestamp */
-	touch_softlockup_watchdog();
+	/* enable on this cpu */
+	softlockup_enable();
 
 	/*
 	 * Run briefly once per second to reset the softlockup timestamp.
@@ -109,6 +122,8 @@ static int watchdog(void * __bind_cpu)
 		touch_softlockup_watchdog();
 		schedule();
 	}
+
+	softlockup_disable();
 
 	return 0;
 }
===================================================================
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -228,6 +228,8 @@ void tick_nohz_stop_sched_tick(void)
 			ts->idle_tick = ts->sched_timer.expires;
 			ts->tick_stopped = 1;
 			ts->idle_jiffies = last_jiffies;
+
+			softlockup_disable();
 		}
 		/*
 		 * calculate the expiry time for the next timer wheel
@@ -255,6 +257,7 @@ void tick_nohz_stop_sched_tick(void)
 		cpu_clear(cpu, nohz_cpu_mask);
 	}
 	raise_softirq_irqoff(TIMER_SOFTIRQ);
+
 out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
@@ -311,6 +314,8 @@ void tick_nohz_restart_sched_tick(void)
 	ts->tick_stopped  = 0;
 	hrtimer_cancel(&ts->sched_timer);
 	ts->sched_timer.expires = ts->idle_tick;
+
+	softlockup_enable();
 
 	while (1) {
 		/* Forward the time to expire in the future */
@@ -355,17 +360,12 @@ static void tick_nohz_handler(struct clo
 	tick_do_update_jiffies64(now);
 
 	/*
-	 * When we are idle and the tick is stopped, we have to touch
-	 * the watchdog as we might not schedule for a really long
-	 * time. This happens on complete idle SMP systems while
-	 * waiting on the login prompt. We also increment the "start
-	 * of idle" jiffy stamp so the idle accounting adjustment we
-	 * do when we go busy again does not account too much ticks.
-	 */
-	if (ts->tick_stopped) {
-		touch_softlockup_watchdog();
+	 * Increment the "start of idle" jiffy stamp so the idle
+	 * accounting adjustment we do when we go busy again does not
+	 * account too much ticks.
+	 */
+	if (ts->tick_stopped)
 		ts->idle_jiffies++;
-	}
 
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
@@ -450,17 +450,12 @@ static enum hrtimer_restart tick_sched_t
 	 */
 	if (regs) {
 		/*
-		 * When we are idle and the tick is stopped, we have to touch
-		 * the watchdog as we might not schedule for a really long
-		 * time. This happens on complete idle SMP systems while
-		 * waiting on the login prompt. We also increment the "start of
-		 * idle" jiffy stamp so the idle accounting adjustment we do
-		 * when we go busy again does not account too much ticks.
+		 * Increment the "start of idle" jiffy stamp so the
+		 * idle accounting adjustment we do when we go busy
+		 * again does not account too much ticks.
 		 */
-		if (ts->tick_stopped) {
-			touch_softlockup_watchdog();
+		if (ts->tick_stopped)
 			ts->idle_jiffies++;
-		}
 		/*
 		 * update_process_times() might take tasklist_lock, hence
 		 * drop the base lock. sched-tick hrtimers are per-CPU and
@@ -522,6 +517,7 @@ void tick_cancel_sched_timer(int cpu)
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
 	ts->tick_stopped = 0;
+	softlockup_enable();
 	ts->nohz_mode = NOHZ_MODE_INACTIVE;
 }
 #endif /* HIGH_RES_TIMERS */

-- 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27  5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge
@ 2007-03-27  7:00   ` Eric Dumazet
  2007-03-27  7:12       ` Jeremy Fitzhardinge
  2007-03-27 14:39     ` Prarit Bhargava
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2007-03-27  7:00 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Prarit Bhargava,
	Chris Lalancette, Rick Lindsley

Jeremy Fitzhardinge a écrit :

> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);

...

>  void touch_softlockup_watchdog(void)
>  {
> -	__raw_get_cpu_var(touch_timestamp) = jiffies;
> +	__raw_get_cpu_var(touch_timestamp) = sched_clock();
>  }

Not very clear if this is safe on 32bit, since this is not anymore atomic.

Maybe a comment would help ?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27  7:00   ` Eric Dumazet
@ 2007-03-27  7:12       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  7:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Prarit Bhargava,
	Chris Lalancette, Rick Lindsley

Eric Dumazet wrote:
> Jeremy Fitzhardinge a écrit :
>
>> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
>
> ...
>
>>  void touch_softlockup_watchdog(void)
>>  {
>> -    __raw_get_cpu_var(touch_timestamp) = jiffies;
>> +    __raw_get_cpu_var(touch_timestamp) = sched_clock();
>>  }
>
> Not very clear if this is safe on 32bit, since this is not anymore
> atomic.

Hm, good point.  Don't think it matters very much.  These values are
per-cpu, and if an interrupt happens between the word updates and the
intermediate values causes a timeout, then it was pretty marginal
anyway.  I guess the worst case is if the low-word gets written first,
and it goes from a high value to low, then it could be sampled as if
time had gone back by up to ~4 seconds.

I'll give it another look.

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27  7:12       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27  7:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Prarit Bhargava, Rick Lindsley, john stultz, Ingo Molnar,
	Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton

Eric Dumazet wrote:
> Jeremy Fitzhardinge a écrit :
>
>> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
>
> ...
>
>>  void touch_softlockup_watchdog(void)
>>  {
>> -    __raw_get_cpu_var(touch_timestamp) = jiffies;
>> +    __raw_get_cpu_var(touch_timestamp) = sched_clock();
>>  }
>
> Not very clear if this is safe on 32bit, since this is not anymore
> atomic.

Hm, good point.  Don't think it matters very much.  These values are
per-cpu, and if an interrupt happens between the word updates and the
intermediate values causes a timeout, then it was pretty marginal
anyway.  I guess the worst case is if the low-word gets written first,
and it goes from a high value to low, then it could be sampled as if
time had gone back by up to ~4 seconds.

I'll give it another look.

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27  7:12       ` Jeremy Fitzhardinge
@ 2007-03-27  7:50         ` Eric Dumazet
  -1 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-03-27  7:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Prarit Bhargava,
	Chris Lalancette, Rick Lindsley

On Tue, 27 Mar 2007 00:12:53 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Eric Dumazet wrote:
> > Jeremy Fitzhardinge a écrit :
> >
> >> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
> >
> > ...
> >
> >>  void touch_softlockup_watchdog(void)
> >>  {
> >> -    __raw_get_cpu_var(touch_timestamp) = jiffies;
> >> +    __raw_get_cpu_var(touch_timestamp) = sched_clock();
> >>  }
> >
> > Not very clear if this is safe on 32bit, since this is not anymore
> > atomic.
> 
> Hm, good point.  Don't think it matters very much.  These values are
> per-cpu, and if an interrupt happens between the word updates and the
> intermediate values causes a timeout, then it was pretty marginal
> anyway.  I guess the worst case is if the low-word gets written first,
> and it goes from a high value to low, then it could be sampled as if
> time had gone back by up to ~4 seconds.
> 
> I'll give it another look.

OK thanks. I noticed another 'not clear' bit in your second patch :

void softlockup_enable(void)
{
	touch_softlockup_watchdog();
	wmb();			/* update timestamp before enable */
	__get_cpu_var(enabled) = 1;
}


Are you sure wmb() is needed here ?

I think a barrier() (compiler barrier) should be enough. If not, a nice comment would help too :)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27  7:50         ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2007-03-27  7:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Prarit Bhargava, Rick Lindsley, john stultz, Ingo Molnar,
	Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton

On Tue, 27 Mar 2007 00:12:53 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Eric Dumazet wrote:
> > Jeremy Fitzhardinge a écrit :
> >
> >> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
> >
> > ...
> >
> >>  void touch_softlockup_watchdog(void)
> >>  {
> >> -    __raw_get_cpu_var(touch_timestamp) = jiffies;
> >> +    __raw_get_cpu_var(touch_timestamp) = sched_clock();
> >>  }
> >
> > Not very clear if this is safe on 32bit, since this is not anymore
> > atomic.
> 
> Hm, good point.  Don't think it matters very much.  These values are
> per-cpu, and if an interrupt happens between the word updates and the
> intermediate values causes a timeout, then it was pretty marginal
> anyway.  I guess the worst case is if the low-word gets written first,
> and it goes from a high value to low, then it could be sampled as if
> time had gone back by up to ~4 seconds.
> 
> I'll give it another look.

OK thanks. I noticed another 'not clear' bit in your second patch :

void softlockup_enable(void)
{
	touch_softlockup_watchdog();
	wmb();			/* update timestamp before enable */
	__get_cpu_var(enabled) = 1;
}


Are you sure wmb() is needed here ?

I think a barrier() (compiler barrier) should be enough. If not, a nice comment would help too :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27  5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge
@ 2007-03-27 14:39     ` Prarit Bhargava
  2007-03-27 14:39     ` Prarit Bhargava
  1 sibling, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 14:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Chris Lalancette,
	Rick Lindsley



Jeremy Fitzhardinge wrote:
>
> ---
>  kernel/softlockup.c |   28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> ===================================================================
> --- a/kernel/softlockup.c
> +++ b/kernel/softlockup.c
> @@ -17,8 +17,8 @@
>  
>  static DEFINE_SPINLOCK(print_lock);
>  
> -static DEFINE_PER_CPU(unsigned long, touch_timestamp);
> -static DEFINE_PER_CPU(unsigned long, print_timestamp);
> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
> +static DEFINE_PER_CPU(unsigned long long, print_timestamp);
>  static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
>  
>  static int did_panic = 0;
> @@ -37,7 +37,7 @@ static struct notifier_block panic_block
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__raw_get_cpu_var(touch_timestamp) = jiffies;
> +	__raw_get_cpu_var(touch_timestamp) = sched_clock();
>  }
>   

I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog 
and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.

See this now obsolete patch: http://lkml.org/lkml/2007/3/15/131

P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27 14:39     ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 14:39 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel,
	virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton



Jeremy Fitzhardinge wrote:
>
> ---
>  kernel/softlockup.c |   28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> ===================================================================
> --- a/kernel/softlockup.c
> +++ b/kernel/softlockup.c
> @@ -17,8 +17,8 @@
>  
>  static DEFINE_SPINLOCK(print_lock);
>  
> -static DEFINE_PER_CPU(unsigned long, touch_timestamp);
> -static DEFINE_PER_CPU(unsigned long, print_timestamp);
> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp);
> +static DEFINE_PER_CPU(unsigned long long, print_timestamp);
>  static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
>  
>  static int did_panic = 0;
> @@ -37,7 +37,7 @@ static struct notifier_block panic_block
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__raw_get_cpu_var(touch_timestamp) = jiffies;
> +	__raw_get_cpu_var(touch_timestamp) = sched_clock();
>  }
>   

I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog 
and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.

See this now obsolete patch: http://lkml.org/lkml/2007/3/15/131

P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] percpu enable flag for softlockup watchdog
  2007-03-27  5:38   ` Jeremy Fitzhardinge
@ 2007-03-27 14:42     ` Prarit Bhargava
  -1 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 14:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Chris Lalancette, Martin Schwidefsky



Jeremy Fitzhardinge wrote:
> +static DEFINE_PER_CPU(int, enabled);
>   

Minor nit: let's call this softlockup_enabled.  Makes it easier for the 
reader ;)

P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 2/2] percpu enable flag for softlockup watchdog
@ 2007-03-27 14:42     ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 14:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: john stultz, Ingo Molnar, Linux Kernel, virtualization,
	Paul Mackerras, Martin Schwidefsky, Thomas Gleixner,
	Andrew Morton



Jeremy Fitzhardinge wrote:
> +static DEFINE_PER_CPU(int, enabled);
>   

Minor nit: let's call this softlockup_enabled.  Makes it easier for the 
reader ;)

P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27 14:39     ` Prarit Bhargava
  (?)
@ 2007-03-27 16:37     ` Jeremy Fitzhardinge
  2007-03-27 16:53         ` Prarit Bhargava
  -1 siblings, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27 16:37 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Chris Lalancette,
	Rick Lindsley

Prarit Bhargava wrote:
> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.

Why?  Is that more correct?  It seems to me that you're interested in
whether a specific CPU has gone and locked up.  If touching the watchdog
makes it update all CPU timestamps, then you'll hide the fact that other
CPUs have locked up, won't it?

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27 16:37     ` Jeremy Fitzhardinge
@ 2007-03-27 16:53         ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 16:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar,
	Thomas Gleixner, john stultz, Zachary Amsden, James Morris,
	Dan Hecht, Paul Mackerras, Martin Schwidefsky, Chris Lalancette,
	Rick Lindsley



Jeremy Fitzhardinge wrote:
> Prarit Bhargava wrote:
>   
>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>     
>
> Why?  Is that more correct?  It seems to me that you're interested in
> whether a specific CPU has gone and locked up.  If touching the watchdog
>   
> makes it update all CPU timestamps, then you'll hide the fact that other
> CPUs have locked up, won't it?
>
>   
In case of misuse, yes.  But there are cases where we know that all CPUs 
will have softlockup issues, such as when doing a "big" sysrq-t dump.  
When doing the sysrq-t we take the tasklist_lock which prevents all 
other CPUs from scheduling -- this leads to bogus softlockup messages, 
so we need to reset everyone's watchdog just before releasing the 
tasklist_lock.

Another question -- are you going to expose disable/enable_watchdog to 
other subsystems?  Or are you going to expose touch_softlockup_watchdog?

>     J
>   

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27 16:53         ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 16:53 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel,
	virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton



Jeremy Fitzhardinge wrote:
> Prarit Bhargava wrote:
>   
>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>     
>
> Why?  Is that more correct?  It seems to me that you're interested in
> whether a specific CPU has gone and locked up.  If touching the watchdog
>   
> makes it update all CPU timestamps, then you'll hide the fact that other
> CPUs have locked up, won't it?
>
>   
In case of misuse, yes.  But there are cases where we know that all CPUs 
will have softlockup issues, such as when doing a "big" sysrq-t dump.  
When doing the sysrq-t we take the tasklist_lock which prevents all 
other CPUs from scheduling -- this leads to bogus softlockup messages, 
so we need to reset everyone's watchdog just before releasing the 
tasklist_lock.

Another question -- are you going to expose disable/enable_watchdog to 
other subsystems?  Or are you going to expose touch_softlockup_watchdog?

>     J
>   

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27 16:53         ` Prarit Bhargava
@ 2007-03-27 17:10           ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27 17:10 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Jeremy Fitzhardinge, Rick Lindsley, john stultz, Ingo Molnar,
	Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton

Prarit Bhargava wrote:
> Jeremy Fitzhardinge wrote:
>   
>> Prarit Bhargava wrote:
>>   
>>     
>>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>>     
>>>       
>> Why?  Is that more correct?  It seems to me that you're interested in
>> whether a specific CPU has gone and locked up.  If touching the watchdog
>>   
>> makes it update all CPU timestamps, then you'll hide the fact that other
>> CPUs have locked up, won't it?
>>
>>   
>>     
> In case of misuse, yes.  But there are cases where we know that all CPUs 
> will have softlockup issues, such as when doing a "big" sysrq-t dump.  
> When doing the sysrq-t we take the tasklist_lock which prevents all 
> other CPUs from scheduling -- this leads to bogus softlockup messages, 
> so we need to reset everyone's watchdog just before releasing the 
> tasklist_lock.
>
> Another question -- are you going to expose disable/enable_watchdog to 
> other subsystems?  Or are you going to expose touch_softlockup_watchdog?

Well, it depends on who turns up. 

My first thought is to export both the global enable/disable interfaces
and touch_softlockup_watchdog.  But on second thoughts maybe
touch_softlockup_watchdog is completely redundant, since you'd only do
it if you're holding off timer interrupts, but the lockup only gets
reported if timer interrupts are enabled (in other words, the best it
can tell you is "you locked up for a while there", which isn't terribly
useful).  So perhaps this can just be dropped.  I haven't looked at the
users to see what they're really trying to achieve.

The enable/disable interfaces are more generally useful in that you can
say "I *know* I'm going to go away for a while, so don't bother
reporting it".

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27 17:10           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-27 17:10 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: virtualization, Rick Lindsley, Andrew Morton, Thomas Gleixner,
	Martin Schwidefsky, john stultz, Ingo Molnar, Linux Kernel,
	Paul Mackerras

Prarit Bhargava wrote:
> Jeremy Fitzhardinge wrote:
>   
>> Prarit Bhargava wrote:
>>   
>>     
>>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>>     
>>>       
>> Why?  Is that more correct?  It seems to me that you're interested in
>> whether a specific CPU has gone and locked up.  If touching the watchdog
>>   
>> makes it update all CPU timestamps, then you'll hide the fact that other
>> CPUs have locked up, won't it?
>>
>>   
>>     
> In case of misuse, yes.  But there are cases where we know that all CPUs 
> will have softlockup issues, such as when doing a "big" sysrq-t dump.  
> When doing the sysrq-t we take the tasklist_lock which prevents all 
> other CPUs from scheduling -- this leads to bogus softlockup messages, 
> so we need to reset everyone's watchdog just before releasing the 
> tasklist_lock.
>
> Another question -- are you going to expose disable/enable_watchdog to 
> other subsystems?  Or are you going to expose touch_softlockup_watchdog?

Well, it depends on who turns up. 

My first thought is to export both the global enable/disable interfaces
and touch_softlockup_watchdog.  But on second thoughts maybe
touch_softlockup_watchdog is completely redundant, since you'd only do
it if you're holding off timer interrupts, but the lockup only gets
reported if timer interrupts are enabled (in other words, the best it
can tell you is "you locked up for a while there", which isn't terribly
useful).  So perhaps this can just be dropped.  I haven't looked at the
users to see what they're really trying to achieve.

The enable/disable interfaces are more generally useful in that you can
say "I *know* I'm going to go away for a while, so don't bother
reporting it".

    J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
  2007-03-27 17:10           ` Jeremy Fitzhardinge
@ 2007-03-27 17:20             ` Prarit Bhargava
  -1 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 17:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeremy Fitzhardinge, Rick Lindsley, john stultz, Ingo Molnar,
	Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky,
	Thomas Gleixner, Andrew Morton



Jeremy Fitzhardinge wrote:
> Prarit Bhargava wrote:
>   
>> Jeremy Fitzhardinge wrote:
>>   
>>     
>>> Prarit Bhargava wrote:
>>>   
>>>     
>>>       
>>>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>>>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>>>     
>>>>       
>>>>         
>>> Why?  Is that more correct?  It seems to me that you're interested in
>>> whether a specific CPU has gone and locked up.  If touching the watchdog
>>>   
>>> makes it update all CPU timestamps, then you'll hide the fact that other
>>> CPUs have locked up, won't it?
>>>
>>>   
>>>     
>>>       
>> In case of misuse, yes.  But there are cases where we know that all CPUs 
>> will have softlockup issues, such as when doing a "big" sysrq-t dump.  
>> When doing the sysrq-t we take the tasklist_lock which prevents all 
>> other CPUs from scheduling -- this leads to bogus softlockup messages, 
>> so we need to reset everyone's watchdog just before releasing the 
>> tasklist_lock.
>>
>> Another question -- are you going to expose disable/enable_watchdog to 
>> other subsystems?  Or are you going to expose touch_softlockup_watchdog?
>>     
>
> Well, it depends on who turns up. 
>
> My first thought is to export both the global enable/disable interfaces
> and touch_softlockup_watchdog.  But on second thoughts maybe
> touch_softlockup_watchdog is completely redundant, since you'd only do
>   

IMO, if you export enable/disable you should drop touch_softlockup_watchdog.

> it if you're holding off timer interrupts, but the lockup only gets
> reported if timer interrupts are enabled (in other words, the best it
> can tell you is "you locked up for a while there", which isn't terribly
> useful).  
I like to think of the softlockup watchdog letting me know that a cpu 
hasn't scheduled in a long time.

> So perhaps this can just be dropped.  I haven't looked at the
> users to see what they're really trying to achieve.
>   

I've looked through much of that code for my previous patch ;)

AFAICT the uses appear to be cases where we _know_ that  we've gone away 
for a while and need to reset the timer.

But there were some exceptions:  touch_nmi_watchdog erroneously calls 
touch_softlockup_watchdog.  In fact, touch_nmi_watchdog is trying to 
touch all cpus softlockup watchdogs, not just one.

IIRC, There was an extra call to touch_softlockup_watchdog which wasn't 
necessary IIRC...

Look at my previous patch where I replaced touch_softlockup_watchdog 
with touch_cpu_softlockup_watchdog ...

> The enable/disable interfaces are more generally useful in that you can
> say "I *know* I'm going to go away for a while, so don't bother
> reporting it".
>
>     J
>   

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog
@ 2007-03-27 17:20             ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2007-03-27 17:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: virtualization, Rick Lindsley, Andrew Morton, Thomas Gleixner,
	Martin Schwidefsky, john stultz, Ingo Molnar, Linux Kernel,
	Paul Mackerras



Jeremy Fitzhardinge wrote:
> Prarit Bhargava wrote:
>   
>> Jeremy Fitzhardinge wrote:
>>   
>>     
>>> Prarit Bhargava wrote:
>>>   
>>>     
>>>       
>>>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog
>>>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour.
>>>>     
>>>>       
>>>>         
>>> Why?  Is that more correct?  It seems to me that you're interested in
>>> whether a specific CPU has gone and locked up.  If touching the watchdog
>>>   
>>> makes it update all CPU timestamps, then you'll hide the fact that other
>>> CPUs have locked up, won't it?
>>>
>>>   
>>>     
>>>       
>> In case of misuse, yes.  But there are cases where we know that all CPUs 
>> will have softlockup issues, such as when doing a "big" sysrq-t dump.  
>> When doing the sysrq-t we take the tasklist_lock which prevents all 
>> other CPUs from scheduling -- this leads to bogus softlockup messages, 
>> so we need to reset everyone's watchdog just before releasing the 
>> tasklist_lock.
>>
>> Another question -- are you going to expose disable/enable_watchdog to 
>> other subsystems?  Or are you going to expose touch_softlockup_watchdog?
>>     
>
> Well, it depends on who turns up. 
>
> My first thought is to export both the global enable/disable interfaces
> and touch_softlockup_watchdog.  But on second thoughts maybe
> touch_softlockup_watchdog is completely redundant, since you'd only do
>   

IMO, if you export enable/disable you should drop touch_softlockup_watchdog.

> it if you're holding off timer interrupts, but the lockup only gets
> reported if timer interrupts are enabled (in other words, the best it
> can tell you is "you locked up for a while there", which isn't terribly
> useful).  
I like to think of the softlockup watchdog letting me know that a cpu 
hasn't scheduled in a long time.

> So perhaps this can just be dropped.  I haven't looked at the
> users to see what they're really trying to achieve.
>   

I've looked through much of that code for my previous patch ;)

AFAICT the uses appear to be cases where we _know_ that  we've gone away 
for a while and need to reset the timer.

But there were some exceptions:  touch_nmi_watchdog erroneously calls 
touch_softlockup_watchdog.  In fact, touch_nmi_watchdog is trying to 
touch all cpus softlockup watchdogs, not just one.

IIRC, There was an extra call to touch_softlockup_watchdog which wasn't 
necessary IIRC...

Look at my previous patch where I replaced touch_softlockup_watchdog 
with touch_cpu_softlockup_watchdog ...

> The enable/disable interfaces are more generally useful in that you can
> say "I *know* I'm going to go away for a while, so don't bother
> reporting it".
>
>     J
>   

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2007-03-27 17:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-27  5:38 [patch 0/2] softlockup watchdog improvements Jeremy Fitzhardinge
2007-03-27  5:38 ` Jeremy Fitzhardinge
2007-03-27  5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge
2007-03-27  7:00   ` Eric Dumazet
2007-03-27  7:12     ` Jeremy Fitzhardinge
2007-03-27  7:12       ` Jeremy Fitzhardinge
2007-03-27  7:50       ` Eric Dumazet
2007-03-27  7:50         ` Eric Dumazet
2007-03-27 14:39   ` Prarit Bhargava
2007-03-27 14:39     ` Prarit Bhargava
2007-03-27 16:37     ` Jeremy Fitzhardinge
2007-03-27 16:53       ` Prarit Bhargava
2007-03-27 16:53         ` Prarit Bhargava
2007-03-27 17:10         ` Jeremy Fitzhardinge
2007-03-27 17:10           ` Jeremy Fitzhardinge
2007-03-27 17:20           ` Prarit Bhargava
2007-03-27 17:20             ` Prarit Bhargava
2007-03-27  5:38 ` [patch 2/2] percpu enable flag for " Jeremy Fitzhardinge
2007-03-27  5:38   ` Jeremy Fitzhardinge
2007-03-27 14:42   ` Prarit Bhargava
2007-03-27 14:42     ` Prarit Bhargava

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.