All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
@ 2011-05-16 23:34 Mandeep Singh Baines
  2011-05-16 23:34 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-16 23:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

In get_sample_period(), softlockup_thresh is integer divided by 5 before
the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
being rounded down to the nearest integer multiple of 5.

For example, a softlockup_thresh of 4 rounds down to 0.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 14733d4..a06972d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -110,7 +110,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh / 5 * NSEC_PER_SEC;
+	return softlockup_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary
  2011-05-16 23:34 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
@ 2011-05-16 23:34 ` Mandeep Singh Baines
  2011-05-16 23:35 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-16 23:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

Don't take any action on an unsuccessful write to /proc.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a06972d..cf0e09f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -507,15 +507,19 @@ static void watchdog_disable_all_cpus(void)
 int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 		     void __user *buffer, size_t *length, loff_t *ppos)
 {
-	proc_dointvec(table, write, buffer, length, ppos);
+	int ret;
 
-	if (write) {
-		if (watchdog_enabled)
-			watchdog_enable_all_cpus();
-		else
-			watchdog_disable_all_cpus();
-	}
-	return 0;
+	ret = proc_dointvec(table, write, buffer, length, ppos);
+	if (ret || !write)
+		goto out;
+
+	if (watchdog_enabled)
+		watchdog_enable_all_cpus();
+	else
+		watchdog_disable_all_cpus();
+
+out:
+	return ret;
 }
 
 int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4] watchdog: disable watchdog when thresh is zero
  2011-05-16 23:34 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
  2011-05-16 23:34 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
@ 2011-05-16 23:35 ` Mandeep Singh Baines
  2011-05-17  7:10   ` Ingo Molnar
  2011-05-16 23:35 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
  2011-05-18 13:53 ` [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Don Zickus
  3 siblings, 1 reply; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-16 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cf0e09f..50c2f62 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -526,7 +526,20 @@ int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 			     void __user *buffer,
 			     size_t *lenp, loff_t *ppos)
 {
-	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret || !write)
+		goto out;
+
+	if (softlockup_thresh)
+		watchdog_enable_all_cpus();
+	else
+		watchdog_disable_all_cpus();
+
+out:
+	return ret;
+
 }
 #endif /* CONFIG_SYSCTL */
 
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-16 23:34 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
  2011-05-16 23:34 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
  2011-05-16 23:35 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
@ 2011-05-16 23:35 ` Mandeep Singh Baines
  2011-05-17  7:16   ` Ingo Molnar
  2011-05-18 13:53 ` [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Don Zickus
  3 siblings, 1 reply; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-16 23:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

Before the conversion of the NMI watchdog to perf event, the watchdog
timeout was 5 seconds. Now it is 60 seconds. For my particular application,
netbooks, 5 seconds was a better timeout. With a short timeout, we
catch faults earlier and are able to send back a panic. With a 60 second
timeout, the user is unlikely to wait and will instead hit the power
button, causing us to lose the panic info.

This change configures the NMI period based on the watchdog_thresh.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/apic/hw_nmi.c |    4 ++--
 include/linux/nmi.h           |    2 +-
 kernel/watchdog.c             |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 5260fe9..d5e57db0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,9 +19,9 @@
 #include <linux/delay.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-u64 hw_nmi_get_sample_period(void)
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
-	return (u64)(cpu_khz) * 1000 * 60;
+	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
 }
 #endif
 
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c536f85..c20c55a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,7 +45,7 @@ static inline bool trigger_all_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
-u64 hw_nmi_get_sample_period(void);
+u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_enabled;
 struct ctl_table;
 extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 50c2f62..169e1e2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -359,7 +359,7 @@ static int watchdog_nmi_enable(int cpu)
 
 	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period();
+	wd_attr->sample_period = hw_nmi_get_sample_period(softlockup_thresh);
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4] watchdog: disable watchdog when thresh is zero
  2011-05-16 23:35 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
@ 2011-05-17  7:10   ` Ingo Molnar
  2011-05-18  3:36     ` [PATCH 3/4 v2] " Mandeep Singh Baines
  2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Disable " tip-bot for Mandeep Singh Baines
  0 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-17  7:10 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker


* Mandeep Singh Baines <msb@chromium.org> wrote:

> This restores the previous behavior of softlock_thresh.
> 
> Currently, setting watchdog_thresh to zero causes the watchdog
> kthreads to consume a lot of CPU.
> 
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/watchdog.c |   15 ++++++++++++++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index cf0e09f..50c2f62 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -526,7 +526,20 @@ int proc_dowatchdog_thresh(struct ctl_table *table, int write,
>  			     void __user *buffer,
>  			     size_t *lenp, loff_t *ppos)
>  {
> -	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +	int ret;
> +
> +	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +	if (ret || !write)
> +		goto out;
> +
> +	if (softlockup_thresh)
> +		watchdog_enable_all_cpus();
> +	else
> +		watchdog_disable_all_cpus();

This now adds two similar looking blocks of these 4 lines, one in 
proc_dowatchdog_enabled(), one in proc_dowatchdog_thresh().

They are not the same though. So what happens if the watchdog is disabled but 
the threshold is updated to nonzero - do we enable the watchdog?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-16 23:35 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
@ 2011-05-17  7:16   ` Ingo Molnar
  2011-05-17 14:03     ` Don Zickus
                       ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-17  7:16 UTC (permalink / raw)
  To: Mandeep Singh Baines, Andrew Morton
  Cc: linux-kernel, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker


* Mandeep Singh Baines <msb@chromium.org> wrote:

> Before the conversion of the NMI watchdog to perf event, the watchdog
> timeout was 5 seconds. Now it is 60 seconds. For my particular application,
> netbooks, 5 seconds was a better timeout. With a short timeout, we
> catch faults earlier and are able to send back a panic. With a 60 second
> timeout, the user is unlikely to wait and will instead hit the power
> button, causing us to lose the panic info.

That's an interesting observation. Have you been able to measure/observe this 
effect somehow, or do you presume that users find 60 seconds too long?

This would be a concern for upstream as well i guess.

> This change configures the NMI period based on the watchdog_thresh.

Hm, our tolerance for the two thresholds is not just human but technical: hard 
lockup warnings should indeed be triggered after just a few seconds, soft 
lockups can have false positives under extreme conditions.

So we generally want a higher threshold for soft lockups than for hard lockups.

So how about we couple the thresholds with a factor: we make the soft threshold 
twice the amount of time the hard threshold is? Then we could change the 
upstream default as well i think: lets change the NMI timeout to 10 seconds 
(and thus have the soft threshold at 20 seconds). Is 20 seconds short enough 
for most users to not hit reset?

We might want to change another aspect of the NMI watchdog: right now it tries 
to abort the offending task - which is really nasty if there was a spuriously 
long irqs-off section somewhere in the kernel. How about we just print a 
warning instead?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-17  7:16   ` Ingo Molnar
@ 2011-05-17 14:03     ` Don Zickus
  2011-05-17 18:47       ` Ingo Molnar
  2011-05-18  3:44     ` [PATCH 4/4 v2] " Mandeep Singh Baines
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Don Zickus @ 2011-05-17 14:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, Andrew Morton, linux-kernel,
	Marcin Slusarz, Peter Zijlstra, Frederic Weisbecker

On Tue, May 17, 2011 at 09:16:42AM +0200, Ingo Molnar wrote:
> Hm, our tolerance for the two thresholds is not just human but technical: hard 
> lockup warnings should indeed be triggered after just a few seconds, soft 
> lockups can have false positives under extreme conditions.
> 
> So we generally want a higher threshold for soft lockups than for hard lockups.
> 
> So how about we couple the thresholds with a factor: we make the soft threshold 
> twice the amount of time the hard threshold is? Then we could change the 
> upstream default as well i think: lets change the NMI timeout to 10 seconds 
> (and thus have the soft threshold at 20 seconds). Is 20 seconds short enough 
> for most users to not hit reset?

Making softlockup twice as long as hardlockup seems to make sense.
Setting the hardlockup to 10 seconds can be ok, but then you get into
power savings issues.  For example, I have the timers setup to trigger 5
times a period (I know it probably should be 2 times), so at 10 seconds
that means the timers are firing every 2 seconds.  That shows up on
powertop :-(.  Though I was flirting with the idea of trying to slow down
or stop the timer when the cpu goes into deeper c-states.  But that is a
different problem.

> 
> We might want to change another aspect of the NMI watchdog: right now it tries 
> to abort the offending task - which is really nasty if there was a spuriously 
> long irqs-off section somewhere in the kernel. How about we just print a 
> warning instead?

I dont understand this.  IIRC NMI watchdog will either printk or panic on
a hardlockup.  What do you mean by 'aborting' the task?

Cheers,
Don

> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-17 14:03     ` Don Zickus
@ 2011-05-17 18:47       ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-17 18:47 UTC (permalink / raw)
  To: Don Zickus
  Cc: Mandeep Singh Baines, Andrew Morton, linux-kernel,
	Marcin Slusarz, Peter Zijlstra, Frederic Weisbecker


* Don Zickus <dzickus@redhat.com> wrote:

> On Tue, May 17, 2011 at 09:16:42AM +0200, Ingo Molnar wrote:
> > Hm, our tolerance for the two thresholds is not just human but technical: hard 
> > lockup warnings should indeed be triggered after just a few seconds, soft 
> > lockups can have false positives under extreme conditions.
> > 
> > So we generally want a higher threshold for soft lockups than for hard lockups.
> > 
> > So how about we couple the thresholds with a factor: we make the soft threshold 
> > twice the amount of time the hard threshold is? Then we could change the 
> > upstream default as well i think: lets change the NMI timeout to 10 seconds 
> > (and thus have the soft threshold at 20 seconds). Is 20 seconds short enough 
> > for most users to not hit reset?
> 
> Making softlockup twice as long as hardlockup seems to make sense.
> Setting the hardlockup to 10 seconds can be ok, but then you get into
> power savings issues.  For example, I have the timers setup to trigger 5
> times a period (I know it probably should be 2 times), so at 10 seconds
> that means the timers are firing every 2 seconds.  That shows up on
> powertop :-(.  Though I was flirting with the idea of trying to slow down
> or stop the timer when the cpu goes into deeper c-states.  But that is a
> different problem.
> 
> > 
> > We might want to change another aspect of the NMI watchdog: right now it tries 
> > to abort the offending task - which is really nasty if there was a spuriously 
> > long irqs-off section somewhere in the kernel. How about we just print a 
> > warning instead?
> 
> I dont understand this.  IIRC NMI watchdog will either printk or panic on
> a hardlockup.  What do you mean by 'aborting' the task?

Oh, simple dementia on my side. We used to attempt a do_exit() in some long 
gone version of that code :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4 v2] watchdog: disable watchdog when thresh is zero
  2011-05-17  7:10   ` Ingo Molnar
@ 2011-05-18  3:36     ` Mandeep Singh Baines
  2011-05-18  8:35       ` Ingo Molnar
  2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Disable " tip-bot for Mandeep Singh Baines
  1 sibling, 1 reply; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-18  3:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, linux-kernel, Marcin Slusarz, Don Zickus,
	Peter Zijlstra, Frederic Weisbecker

Ingo Molnar (mingo@elte.hu) wrote:
> This now adds two similar looking blocks of these 4 lines, one in 
> proc_dowatchdog_enabled(), one in proc_dowatchdog_thresh().
> 
> They are not the same though. So what happens if the watchdog is disabled but 
> the threshold is updated to nonzero - do we enable the watchdog?
> 

Good point. Fixed in this patch (v2). Also modified the patch to
disable the watchdog if thresh == 0 or enabled == 0.

I used a #define for proc_dowatchdog instead of an inline to avoid
duplicating the large parameter list.

---

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and proc_dowatchdog_enabled
has been factored into __proc_dowatchdog.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
LKML-Reference: <20110517071018.GE22305@elte.hu>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/nmi.h   |    7 +++++--
 include/linux/sched.h |    1 -
 kernel/sysctl.c       |    4 +++-
 kernel/watchdog.c     |   25 +++++++++----------------
 4 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c536f85..10cbca7 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -47,9 +47,12 @@ static inline bool trigger_all_cpu_backtrace(void)
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(void);
 extern int watchdog_enabled;
+extern int watchdog_thresh;
 struct ctl_table;
-extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
-			void __user *, size_t *, loff_t *);
+extern int __proc_dowatchdog(struct ctl_table *, int ,
+			     void __user *, size_t *, loff_t *);
+#define proc_dowatchdog_enabled __proc_dowatchdog
+#define proc_dowatchdog_thresh __proc_dowatchdog
 #endif
 
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12211e1..d8b2d0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
-extern int softlockup_thresh;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..acb12f4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -731,10 +731,12 @@ static struct ctl_table kern_table[] = {
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
 		.proc_handler   = proc_dowatchdog_enabled,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 	{
 		.procname	= "watchdog_thresh",
-		.data		= &softlockup_thresh,
+		.data		= &watchdog_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dowatchdog_thresh,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cf0e09f..ea3dfc2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly softlockup_thresh = 60;
+int __read_mostly watchdog_thresh = 60;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -105,12 +105,12 @@ static unsigned long get_timestamp(int this_cpu)
 static unsigned long get_sample_period(void)
 {
 	/*
-	 * convert softlockup_thresh from seconds to ns
+	 * convert watchdog_thresh from seconds to ns
 	 * the divide by 5 is to give hrtimer 5 chances to
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh * (NSEC_PER_SEC / 5);
+	return watchdog_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +182,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + softlockup_thresh))
+	if (time_after(now, touch_ts + watchdog_thresh))
 		return now - touch_ts;
 
 	return 0;
@@ -501,19 +501,19 @@ static void watchdog_disable_all_cpus(void)
 /* sysctl functions */
 #ifdef CONFIG_SYSCTL
 /*
- * proc handler for /proc/sys/kernel/nmi_watchdog
+ * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
 
-int proc_dowatchdog_enabled(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *length, loff_t *ppos)
+int __proc_dowatchdog(struct ctl_table *table, int write,
+		      void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
-	ret = proc_dointvec(table, write, buffer, length, ppos);
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		goto out;
 
-	if (watchdog_enabled)
+	if (watchdog_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
@@ -521,13 +521,6 @@ int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 out:
 	return ret;
 }
-
-int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-			     void __user *buffer,
-			     size_t *lenp, loff_t *ppos)
-{
-	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-}
 #endif /* CONFIG_SYSCTL */
 
 
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/4 v2] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-17  7:16   ` Ingo Molnar
  2011-05-17 14:03     ` Don Zickus
@ 2011-05-18  3:44     ` Mandeep Singh Baines
  2011-05-18  8:39       ` Ingo Molnar
  2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Change the default timeout and " tip-bot for Mandeep Singh Baines
  2011-05-24  3:58     ` [tip:perf/urgent] watchdog: Fix non-standard prototype of get_softlockup_thresh() tip-bot for Ingo Molnar
  3 siblings, 1 reply; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-18  3:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, Andrew Morton, linux-kernel,
	Marcin Slusarz, Don Zickus, Peter Zijlstra, Frederic Weisbecker

Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > Before the conversion of the NMI watchdog to perf event, the watchdog
> > timeout was 5 seconds. Now it is 60 seconds. For my particular application,
> > netbooks, 5 seconds was a better timeout. With a short timeout, we
> > catch faults earlier and are able to send back a panic. With a 60 second
> > timeout, the user is unlikely to wait and will instead hit the power
> > button, causing us to lose the panic info.
> 
> That's an interesting observation. Have you been able to measure/observe this 
> effect somehow, or do you presume that users find 60 seconds too long?
> 

Mostly intuition. There is a threshold beyond which the user will hit
the power button. Not sure if its 20 seconds or 20 minutes. My feeling
was that the 1 minute was too long.

For a user experience perspective, a quick reboot also seems like a better
experience than a one minute hang. Our systems boot in 8 seconds and restore
the previous session so a reboot is almost not noticable.

> This would be a concern for upstream as well i guess.
> 
> > This change configures the NMI period based on the watchdog_thresh.
> 
> Hm, our tolerance for the two thresholds is not just human but technical: hard 
> lockup warnings should indeed be triggered after just a few seconds, soft 
> lockups can have false positives under extreme conditions.
> 
> So we generally want a higher threshold for soft lockups than for hard lockups.
> 
> So how about we couple the thresholds with a factor: we make the soft threshold 
> twice the amount of time the hard threshold is? Then we could change the 
> upstream default as well i think: lets change the NMI timeout to 10 seconds 
> (and thus have the soft threshold at 20 seconds). Is 20 seconds short enough 
> for most users to not hit reset?
> 

Agree. Implemented in this version of the patch (v2).

---

Before the conversion of the NMI watchdog to perf event, the watchdog
timeout was 5 seconds. Now it is 60 seconds. For my particular application,
netbooks, 5 seconds was a better timeout. With a short timeout, we
catch faults earlier and are able to send back a panic. With a 60 second
timeout, the user is unlikely to wait and will instead hit the power
button, causing us to lose the panic info.

This change configures the NMI period to watchdog_thresh and sets
the softlockup_thresh to watchdog_thresh * 2. In addition, watchdog_thresh
was reduced to 10 seconds as suggested by Ingo Molnar.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
LKML-Reference: <20110517071642.GF22305@elte.hu>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/apic/hw_nmi.c |    4 ++--
 include/linux/nmi.h           |    2 +-
 kernel/watchdog.c             |   19 +++++++++++++++----
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 5260fe9..d5e57db0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,9 +19,9 @@
 #include <linux/delay.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-u64 hw_nmi_get_sample_period(void)
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
-	return (u64)(cpu_khz) * 1000 * 60;
+	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
 }
 #endif
 
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 10cbca7..a26fb4a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,7 +45,7 @@ static inline bool trigger_all_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
-u64 hw_nmi_get_sample_period(void);
+u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_enabled;
 extern int watchdog_thresh;
 struct ctl_table;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index ea3dfc2..2788fa9 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly watchdog_thresh = 60;
+int __read_mostly watchdog_thresh = 10;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -91,6 +91,17 @@ static int __init nosoftlockup_setup(char *str)
 __setup("nosoftlockup", nosoftlockup_setup);
 /*  */
 
+/*
+ * Hard-lockup warnings should be triggered after just a few seconds. Soft-
+ * lockups can have false positives under extreme conditions. So we generally
+ * want a higher threshold for soft lockups than for hard lockups. So we couple
+ * the thresholds with a factor: we make the soft threshold twice the amount of
+ * time the hard threshold is.
+ */
+static int get_softlockup_thresh()
+{
+	return watchdog_thresh * 2;
+}
 
 /*
  * Returns seconds, approximately.  We don't need nanosecond
@@ -110,7 +121,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return watchdog_thresh * (NSEC_PER_SEC / 5);
+	return get_softlockup_thresh() * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +193,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + watchdog_thresh))
+	if (time_after(now, touch_ts + get_softlockup_thresh()))
 		return now - touch_ts;
 
 	return 0;
@@ -359,7 +370,7 @@ static int watchdog_nmi_enable(int cpu)
 
 	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period();
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/4 v2] watchdog: disable watchdog when thresh is zero
  2011-05-18  3:36     ` [PATCH 3/4 v2] " Mandeep Singh Baines
@ 2011-05-18  8:35       ` Ingo Molnar
  2011-05-19 16:20         ` [PATCH 3/4 v3] " Mandeep Singh Baines
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2011-05-18  8:35 UTC (permalink / raw)
  To: Mandeep Singh Baines, Andrew Morton
  Cc: linux-kernel, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker


* Mandeep Singh Baines <msb@chromium.org> wrote:

> +extern int watchdog_thresh;
>  struct ctl_table;
> -extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
> -			void __user *, size_t *, loff_t *);
> +extern int __proc_dowatchdog(struct ctl_table *, int ,
> +			     void __user *, size_t *, loff_t *);
> +#define proc_dowatchdog_enabled __proc_dowatchdog
> +#define proc_dowatchdog_thresh __proc_dowatchdog

i like the other aspects of your patch but this one is a no-no, we do not use 
1970's tech to obfuscate nice C code! :-)

If the argument list is annoying then introduce a helper structure. But having 
it longer is no big issue either. Try to shorten the function names if 
possible.

Sidenote, the sysctl code has been misdesigned a bit: it should be possible to 
add sysctls in .c files and not centralize it all into kernel/sysctl.c 
forcibly: we could should have a central static array by using a .sysctl_data 
section or such. Anyone wanna fix/improve that?

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4 v2] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-18  3:44     ` [PATCH 4/4 v2] " Mandeep Singh Baines
@ 2011-05-18  8:39       ` Ingo Molnar
  2011-05-18 13:51         ` Don Zickus
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2011-05-18  8:39 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Andrew Morton, linux-kernel, Marcin Slusarz, Don Zickus,
	Peter Zijlstra, Frederic Weisbecker


* Mandeep Singh Baines <msb@chromium.org> wrote:

> Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Mandeep Singh Baines <msb@chromium.org> wrote:
> > 
> > > Before the conversion of the NMI watchdog to perf event, the watchdog
> > > timeout was 5 seconds. Now it is 60 seconds. For my particular application,
> > > netbooks, 5 seconds was a better timeout. With a short timeout, we
> > > catch faults earlier and are able to send back a panic. With a 60 second
> > > timeout, the user is unlikely to wait and will instead hit the power
> > > button, causing us to lose the panic info.
> > 
> > That's an interesting observation. Have you been able to measure/observe this 
> > effect somehow, or do you presume that users find 60 seconds too long?
> > 
> 
> Mostly intuition. There is a threshold beyond which the user will hit
> the power button. Not sure if its 20 seconds or 20 minutes. My feeling
> was that the 1 minute was too long.
> 
> For a user experience perspective, a quick reboot also seems like a better
> experience than a one minute hang. Our systems boot in 8 seconds and restore
> the previous session so a reboot is almost not noticable.

Indeed you definitely want it configurable and have the delay down to 5 or 10 
seconds, to correlate it with your boot delay.

Personally i consider any hang over 1 second annoying so you might want to work 
on that 8 seconds boot time some more, it's too long ;-)

And any kernel code running with more than 1 second irqs off is a bug, plain 
and simple.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4 v2] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-18  8:39       ` Ingo Molnar
@ 2011-05-18 13:51         ` Don Zickus
  0 siblings, 0 replies; 30+ messages in thread
From: Don Zickus @ 2011-05-18 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, Andrew Morton, linux-kernel,
	Marcin Slusarz, Peter Zijlstra, Frederic Weisbecker

On Wed, May 18, 2011 at 10:39:36AM +0200, Ingo Molnar wrote:
> 
> * Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Mandeep Singh Baines <msb@chromium.org> wrote:
> > > 
> > > > Before the conversion of the NMI watchdog to perf event, the watchdog
> > > > timeout was 5 seconds. Now it is 60 seconds. For my particular application,
> > > > netbooks, 5 seconds was a better timeout. With a short timeout, we
> > > > catch faults earlier and are able to send back a panic. With a 60 second
> > > > timeout, the user is unlikely to wait and will instead hit the power
> > > > button, causing us to lose the panic info.
> > > 
> > > That's an interesting observation. Have you been able to measure/observe this 
> > > effect somehow, or do you presume that users find 60 seconds too long?
> > > 
> > 
> > Mostly intuition. There is a threshold beyond which the user will hit
> > the power button. Not sure if its 20 seconds or 20 minutes. My feeling
> > was that the 1 minute was too long.
> > 
> > For a user experience perspective, a quick reboot also seems like a better
> > experience than a one minute hang. Our systems boot in 8 seconds and restore
> > the previous session so a reboot is almost not noticable.
> 
> Indeed you definitely want it configurable and have the delay down to 5 or 10 
> seconds, to correlate it with your boot delay.
> 
> Personally i consider any hang over 1 second annoying so you might want to work 
> on that 8 seconds boot time some more, it's too long ;-)
> 
> And any kernel code running with more than 1 second irqs off is a bug, plain 
> and simple.

Not necessarily, but in those cases the code can use touch_nmi_watchdog
(for example slow hardware, long print outs, etc). :-)

Cheers,
Don

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-05-16 23:34 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
                   ` (2 preceding siblings ...)
  2011-05-16 23:35 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
@ 2011-05-18 13:53 ` Don Zickus
  2011-05-19 16:26   ` Mandeep Singh Baines
  3 siblings, 1 reply; 30+ messages in thread
From: Don Zickus @ 2011-05-18 13:53 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Marcin Slusarz, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

On Mon, May 16, 2011 at 04:34:58PM -0700, Mandeep Singh Baines wrote:
> In get_sample_period(), softlockup_thresh is integer divided by 5 before
> the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
> being rounded down to the nearest integer multiple of 5.
> 
> For example, a softlockup_thresh of 4 rounds down to 0.

Mandeep,

Thanks for the patches.  Unfortunately, I am taking some time off so I'll
shephard these patches when I get back.  For the most part they look fine,
aside from Ingo's comments.

Cheers,
Don

> 
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/watchdog.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 14733d4..a06972d 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -110,7 +110,7 @@ static unsigned long get_sample_period(void)
>  	 * increment before the hardlockup detector generates
>  	 * a warning
>  	 */
> -	return softlockup_thresh / 5 * NSEC_PER_SEC;
> +	return softlockup_thresh * (NSEC_PER_SEC / 5);
>  }
>  
>  /* Commands for resetting the watchdog */
> -- 
> 1.7.3.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4 v3] watchdog: disable watchdog when thresh is zero
  2011-05-18  8:35       ` Ingo Molnar
@ 2011-05-19 16:20         ` Mandeep Singh Baines
  0 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-19 16:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, Andrew Morton, linux-kernel,
	Marcin Slusarz, Don Zickus, Peter Zijlstra, Frederic Weisbecker

Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > +extern int watchdog_thresh;
> >  struct ctl_table;
> > -extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
> > -			void __user *, size_t *, loff_t *);
> > +extern int __proc_dowatchdog(struct ctl_table *, int ,
> > +			     void __user *, size_t *, loff_t *);
> > +#define proc_dowatchdog_enabled __proc_dowatchdog
> > +#define proc_dowatchdog_thresh __proc_dowatchdog
> 
> i like the other aspects of your patch but this one is a no-no, we do not use 
> 1970's tech to obfuscate nice C code! :-)
> 

Yeah, you're right. Fixed in this patch (v3).

---

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and proc_dowatchdog_enabled
has been factored into proc_dowatchdog.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
LKML-Reference: <20110517071018.GE22305@elte.hu>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
 include/linux/nmi.h   |    5 +++--
 include/linux/sched.h |    1 -
 kernel/sysctl.c       |   12 ++++++++----
 kernel/watchdog.c     |   25 +++++++++----------------
 4 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c536f85..5317b8b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -47,9 +47,10 @@ static inline bool trigger_all_cpu_backtrace(void)
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(void);
 extern int watchdog_enabled;
+extern int watchdog_thresh;
 struct ctl_table;
-extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
-			void __user *, size_t *, loff_t *);
+extern int proc_dowatchdog(struct ctl_table *, int ,
+			   void __user *, size_t *, loff_t *);
 #endif
 
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12211e1..d8b2d0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
-extern int softlockup_thresh;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..3dd0c46 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -730,14 +730,16 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 	{
 		.procname	= "watchdog_thresh",
-		.data		= &softlockup_thresh,
+		.data		= &watchdog_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog_thresh,
+		.proc_handler	= proc_dowatchdog,
 		.extra1		= &neg_one,
 		.extra2		= &sixty,
 	},
@@ -755,7 +757,9 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cf0e09f..6030191 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly softlockup_thresh = 60;
+int __read_mostly watchdog_thresh = 60;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -105,12 +105,12 @@ static unsigned long get_timestamp(int this_cpu)
 static unsigned long get_sample_period(void)
 {
 	/*
-	 * convert softlockup_thresh from seconds to ns
+	 * convert watchdog_thresh from seconds to ns
 	 * the divide by 5 is to give hrtimer 5 chances to
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh * (NSEC_PER_SEC / 5);
+	return watchdog_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +182,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + softlockup_thresh))
+	if (time_after(now, touch_ts + watchdog_thresh))
 		return now - touch_ts;
 
 	return 0;
@@ -501,19 +501,19 @@ static void watchdog_disable_all_cpus(void)
 /* sysctl functions */
 #ifdef CONFIG_SYSCTL
 /*
- * proc handler for /proc/sys/kernel/nmi_watchdog
+ * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
 
-int proc_dowatchdog_enabled(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *length, loff_t *ppos)
+int proc_dowatchdog(struct ctl_table *table, int write,
+		    void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
-	ret = proc_dointvec(table, write, buffer, length, ppos);
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		goto out;
 
-	if (watchdog_enabled)
+	if (watchdog_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
@@ -521,13 +521,6 @@ int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 out:
 	return ret;
 }
-
-int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-			     void __user *buffer,
-			     size_t *lenp, loff_t *ppos)
-{
-	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-}
 #endif /* CONFIG_SYSCTL */
 
 
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-05-18 13:53 ` [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Don Zickus
@ 2011-05-19 16:26   ` Mandeep Singh Baines
  2011-05-20 11:51     ` Ingo Molnar
  2011-06-09 11:47     ` Don Zickus
  0 siblings, 2 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-19 16:26 UTC (permalink / raw)
  To: Don Zickus
  Cc: Mandeep Singh Baines, linux-kernel, Marcin Slusarz,
	Peter Zijlstra, Frederic Weisbecker, Ingo Molnar

Don Zickus (dzickus@redhat.com) wrote:
> On Mon, May 16, 2011 at 04:34:58PM -0700, Mandeep Singh Baines wrote:
> > In get_sample_period(), softlockup_thresh is integer divided by 5 before
> > the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
> > being rounded down to the nearest integer multiple of 5.
> > 
> > For example, a softlockup_thresh of 4 rounds down to 0.
> 
> Mandeep,
> 
> Thanks for the patches.  Unfortunately, I am taking some time off so I'll
> shephard these patches when I get back.  For the most part they look fine,
> aside from Ingo's comments.
> 

No problem. Enjoy your time off:)

Ingo, since Don is out, would you mind acking the patches (assuming you're
happy). It simplifies pushing these patches into our local ChromiumOS tree
if I have an upstream ack.

> Cheers,
> Don
> 
> > 
> > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
> > Cc: Don Zickus <dzickus@redhat.com>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > ---
> >  kernel/watchdog.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 14733d4..a06972d 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -110,7 +110,7 @@ static unsigned long get_sample_period(void)
> >  	 * increment before the hardlockup detector generates
> >  	 * a warning
> >  	 */
> > -	return softlockup_thresh / 5 * NSEC_PER_SEC;
> > +	return softlockup_thresh * (NSEC_PER_SEC / 5);
> >  }
> >  
> >  /* Commands for resetting the watchdog */
> > -- 
> > 1.7.3.1
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-05-19 16:26   ` Mandeep Singh Baines
@ 2011-05-20 11:51     ` Ingo Molnar
  2011-05-23  5:14       ` Mandeep Singh Baines
  2011-06-09 11:47     ` Don Zickus
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2011-05-20 11:51 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Don Zickus, linux-kernel, Marcin Slusarz, Peter Zijlstra,
	Frederic Weisbecker


* Mandeep Singh Baines <msb@chromium.org> wrote:

> Ingo, since Don is out, would you mind acking the patches (assuming you're 
> happy). It simplifies pushing these patches into our local ChromiumOS tree if 
> I have an upstream ack.

I suspect it would simplify things even more for you if they were all upstream, 
right?

To get that process underway please send an updated series with (delta) patches 
against -tip.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
@ 2011-05-23  5:10 Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-23  5:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar, Andrew Morton

In get_sample_period(), softlockup_thresh is integer divided by 5 before
the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
being rounded down to the nearest integer multiple of 5.

For example, a softlockup_thresh of 4 rounds down to 0.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/watchdog.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 14733d4..a06972d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -110,7 +110,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh / 5 * NSEC_PER_SEC;
+	return softlockup_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary
  2011-05-23  5:10 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
@ 2011-05-23  5:10 ` Mandeep Singh Baines
  2011-05-23 11:12   ` [tip:perf/urgent] watchdog: Only " tip-bot for Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-23  5:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar, Andrew Morton

Don't take any action on an unsuccessful write to /proc.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/watchdog.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a06972d..cf0e09f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -507,15 +507,19 @@ static void watchdog_disable_all_cpus(void)
 int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 		     void __user *buffer, size_t *length, loff_t *ppos)
 {
-	proc_dointvec(table, write, buffer, length, ppos);
+	int ret;
 
-	if (write) {
-		if (watchdog_enabled)
-			watchdog_enable_all_cpus();
-		else
-			watchdog_disable_all_cpus();
-	}
-	return 0;
+	ret = proc_dointvec(table, write, buffer, length, ppos);
+	if (ret || !write)
+		goto out;
+
+	if (watchdog_enabled)
+		watchdog_enable_all_cpus();
+	else
+		watchdog_disable_all_cpus();
+
+out:
+	return ret;
 }
 
 int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4] watchdog: disable watchdog when thresh is zero
  2011-05-23  5:10 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
@ 2011-05-23  5:10 ` Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
  2011-05-23 11:12 ` [tip:perf/urgent] watchdog: Fix rounding bug in get_sample_period() tip-bot for Mandeep Singh Baines
  3 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-23  5:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar, Andrew Morton

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and proc_dowatchdog_enabled
has been factored into proc_dowatchdog.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
LKML-Reference: <20110517071018.GE22305@elte.hu>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/nmi.h   |    5 +++--
 include/linux/sched.h |    1 -
 kernel/sysctl.c       |   12 ++++++++----
 kernel/watchdog.c     |   25 +++++++++----------------
 4 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c536f85..5317b8b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -47,9 +47,10 @@ static inline bool trigger_all_cpu_backtrace(void)
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(void);
 extern int watchdog_enabled;
+extern int watchdog_thresh;
 struct ctl_table;
-extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
-			void __user *, size_t *, loff_t *);
+extern int proc_dowatchdog(struct ctl_table *, int ,
+			   void __user *, size_t *, loff_t *);
 #endif
 
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7ee2530..aaf71e0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
-extern int softlockup_thresh;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..3dd0c46 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -730,14 +730,16 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 	{
 		.procname	= "watchdog_thresh",
-		.data		= &softlockup_thresh,
+		.data		= &watchdog_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog_thresh,
+		.proc_handler	= proc_dowatchdog,
 		.extra1		= &neg_one,
 		.extra2		= &sixty,
 	},
@@ -755,7 +757,9 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cf0e09f..6030191 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly softlockup_thresh = 60;
+int __read_mostly watchdog_thresh = 60;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -105,12 +105,12 @@ static unsigned long get_timestamp(int this_cpu)
 static unsigned long get_sample_period(void)
 {
 	/*
-	 * convert softlockup_thresh from seconds to ns
+	 * convert watchdog_thresh from seconds to ns
 	 * the divide by 5 is to give hrtimer 5 chances to
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh * (NSEC_PER_SEC / 5);
+	return watchdog_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +182,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + softlockup_thresh))
+	if (time_after(now, touch_ts + watchdog_thresh))
 		return now - touch_ts;
 
 	return 0;
@@ -501,19 +501,19 @@ static void watchdog_disable_all_cpus(void)
 /* sysctl functions */
 #ifdef CONFIG_SYSCTL
 /*
- * proc handler for /proc/sys/kernel/nmi_watchdog
+ * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
 
-int proc_dowatchdog_enabled(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *length, loff_t *ppos)
+int proc_dowatchdog(struct ctl_table *table, int write,
+		    void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
-	ret = proc_dointvec(table, write, buffer, length, ppos);
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		goto out;
 
-	if (watchdog_enabled)
+	if (watchdog_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
@@ -521,13 +521,6 @@ int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 out:
 	return ret;
 }
-
-int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-			     void __user *buffer,
-			     size_t *lenp, loff_t *ppos)
-{
-	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-}
 #endif /* CONFIG_SYSCTL */
 
 
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh
  2011-05-23  5:10 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
  2011-05-23  5:10 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
@ 2011-05-23  5:10 ` Mandeep Singh Baines
  2011-05-23 11:12 ` [tip:perf/urgent] watchdog: Fix rounding bug in get_sample_period() tip-bot for Mandeep Singh Baines
  3 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-23  5:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mandeep Singh Baines, Marcin Slusarz, Don Zickus, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar, Andrew Morton

Before the conversion of the NMI watchdog to perf event, the watchdog
timeout was 5 seconds. Now it is 60 seconds. For my particular application,
netbooks, 5 seconds was a better timeout. With a short timeout, we
catch faults earlier and are able to send back a panic. With a 60 second
timeout, the user is unlikely to wait and will instead hit the power
button, causing us to lose the panic info.

This change configures the NMI period to watchdog_thresh and sets
the softlockup_thresh to watchdog_thresh * 2. In addition, watchdog_thresh
was reduced to 10 seconds as suggested by Ingo Molnar.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
LKML-Reference: <20110517071642.GF22305@elte.hu>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 arch/x86/kernel/apic/hw_nmi.c |    4 ++--
 include/linux/nmi.h           |    2 +-
 kernel/watchdog.c             |   19 +++++++++++++++----
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 5260fe9..d5e57db0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,9 +19,9 @@
 #include <linux/delay.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-u64 hw_nmi_get_sample_period(void)
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
-	return (u64)(cpu_khz) * 1000 * 60;
+	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
 }
 #endif
 
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5317b8b..2d304ef 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,7 +45,7 @@ static inline bool trigger_all_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
-u64 hw_nmi_get_sample_period(void);
+u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_enabled;
 extern int watchdog_thresh;
 struct ctl_table;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6030191..6e63097 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly watchdog_thresh = 60;
+int __read_mostly watchdog_thresh = 10;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -91,6 +91,17 @@ static int __init nosoftlockup_setup(char *str)
 __setup("nosoftlockup", nosoftlockup_setup);
 /*  */
 
+/*
+ * Hard-lockup warnings should be triggered after just a few seconds. Soft-
+ * lockups can have false positives under extreme conditions. So we generally
+ * want a higher threshold for soft lockups than for hard lockups. So we couple
+ * the thresholds with a factor: we make the soft threshold twice the amount of
+ * time the hard threshold is.
+ */
+static int get_softlockup_thresh()
+{
+	return watchdog_thresh * 2;
+}
 
 /*
  * Returns seconds, approximately.  We don't need nanosecond
@@ -110,7 +121,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return watchdog_thresh * (NSEC_PER_SEC / 5);
+	return get_softlockup_thresh() * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +193,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + watchdog_thresh))
+	if (time_after(now, touch_ts + get_softlockup_thresh()))
 		return now - touch_ts;
 
 	return 0;
@@ -359,7 +370,7 @@ static int watchdog_nmi_enable(int cpu)
 
 	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period();
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-05-20 11:51     ` Ingo Molnar
@ 2011-05-23  5:14       ` Mandeep Singh Baines
  0 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-05-23  5:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mandeep Singh Baines, Don Zickus, linux-kernel, Marcin Slusarz,
	Peter Zijlstra, Frederic Weisbecker

Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > Ingo, since Don is out, would you mind acking the patches (assuming you're 
> > happy). It simplifies pushing these patches into our local ChromiumOS tree if 
> > I have an upstream ack.
> 
> I suspect it would simplify things even more for you if they were all upstream, 
> right?
> 

Most definitely:)

> To get that process underway please send an updated series with (delta) patches 
> against -tip.
> 

Rebased series against tip/master and resent:

* Message-Id: <1306127423-3347-1-git-send-email-msb@chromium.org>

> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/urgent] watchdog: Fix rounding bug in get_sample_period()
  2011-05-23  5:10 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
                   ` (2 preceding siblings ...)
  2011-05-23  5:10 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
@ 2011-05-23 11:12 ` tip-bot for Mandeep Singh Baines
  3 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Mandeep Singh Baines @ 2011-05-23 11:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, msb, hpa, mingo, a.p.zijlstra, marcin.slusarz,
	fweisbec, tglx, mingo, dzickus

Commit-ID:  824c6b7f6294101f30e141117def224a56c203e6
Gitweb:     http://git.kernel.org/tip/824c6b7f6294101f30e141117def224a56c203e6
Author:     Mandeep Singh Baines <msb@chromium.org>
AuthorDate: Sun, 22 May 2011 22:10:20 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 May 2011 11:58:58 +0200

watchdog: Fix rounding bug in get_sample_period()

In get_sample_period(), softlockup_thresh is integer divided by
5 before the multiplication by NSEC_PER_SEC. This results in
softlockup_thresh being rounded down to the nearest integer
multiple of 5.

For example, a softlockup_thresh of 4 rounds down to 0.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-1-git-send-email-msb@chromium.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 14733d4..a06972d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -110,7 +110,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh / 5 * NSEC_PER_SEC;
+	return softlockup_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/urgent] watchdog: Only disable/enable watchdog if neccessary
  2011-05-23  5:10 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
@ 2011-05-23 11:12   ` tip-bot for Mandeep Singh Baines
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Mandeep Singh Baines @ 2011-05-23 11:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, msb, hpa, mingo, a.p.zijlstra, marcin.slusarz,
	fweisbec, tglx, mingo, dzickus

Commit-ID:  e04ab2bc41b35c0cb6cdb07c8443f91aa738cf78
Gitweb:     http://git.kernel.org/tip/e04ab2bc41b35c0cb6cdb07c8443f91aa738cf78
Author:     Mandeep Singh Baines <msb@chromium.org>
AuthorDate: Sun, 22 May 2011 22:10:21 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 May 2011 11:58:58 +0200

watchdog: Only disable/enable watchdog if neccessary

Don't take any action on an unsuccessful write to /proc.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-2-git-send-email-msb@chromium.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/watchdog.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a06972d..cf0e09f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -507,15 +507,19 @@ static void watchdog_disable_all_cpus(void)
 int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 		     void __user *buffer, size_t *length, loff_t *ppos)
 {
-	proc_dointvec(table, write, buffer, length, ppos);
+	int ret;
 
-	if (write) {
-		if (watchdog_enabled)
-			watchdog_enable_all_cpus();
-		else
-			watchdog_disable_all_cpus();
-	}
-	return 0;
+	ret = proc_dointvec(table, write, buffer, length, ppos);
+	if (ret || !write)
+		goto out;
+
+	if (watchdog_enabled)
+		watchdog_enable_all_cpus();
+	else
+		watchdog_disable_all_cpus();
+
+out:
+	return ret;
 }
 
 int proc_dowatchdog_thresh(struct ctl_table *table, int write,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/urgent] watchdog: Disable watchdog when thresh is zero
  2011-05-17  7:10   ` Ingo Molnar
  2011-05-18  3:36     ` [PATCH 3/4 v2] " Mandeep Singh Baines
@ 2011-05-23 11:13     ` tip-bot for Mandeep Singh Baines
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot for Mandeep Singh Baines @ 2011-05-23 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, msb, hpa, mingo, a.p.zijlstra, marcin.slusarz,
	fweisbec, tglx, mingo, dzickus

Commit-ID:  586692a5a5fc5740c8a46abc0f2365495c2d7c5f
Gitweb:     http://git.kernel.org/tip/586692a5a5fc5740c8a46abc0f2365495c2d7c5f
Author:     Mandeep Singh Baines <msb@chromium.org>
AuthorDate: Sun, 22 May 2011 22:10:22 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 May 2011 11:58:59 +0200

watchdog: Disable watchdog when thresh is zero

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and
proc_dowatchdog_enabled has been factored into proc_dowatchdog.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-3-git-send-email-msb@chromium.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071018.GE22305@elte.hu>
---
 include/linux/nmi.h   |    5 +++--
 include/linux/sched.h |    1 -
 kernel/sysctl.c       |   12 ++++++++----
 kernel/watchdog.c     |   25 +++++++++----------------
 4 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c536f85..5317b8b 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -47,9 +47,10 @@ static inline bool trigger_all_cpu_backtrace(void)
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(void);
 extern int watchdog_enabled;
+extern int watchdog_thresh;
 struct ctl_table;
-extern int proc_dowatchdog_enabled(struct ctl_table *, int ,
-			void __user *, size_t *, loff_t *);
+extern int proc_dowatchdog(struct ctl_table *, int ,
+			   void __user *, size_t *, loff_t *);
 #endif
 
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12211e1..d8b2d0b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,7 +315,6 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
-extern int softlockup_thresh;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..3dd0c46 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -730,14 +730,16 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 	{
 		.procname	= "watchdog_thresh",
-		.data		= &softlockup_thresh,
+		.data		= &watchdog_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dowatchdog_thresh,
+		.proc_handler	= proc_dowatchdog,
 		.extra1		= &neg_one,
 		.extra2		= &sixty,
 	},
@@ -755,7 +757,9 @@ static struct ctl_table kern_table[] = {
 		.data           = &watchdog_enabled,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = proc_dowatchdog_enabled,
+		.proc_handler   = proc_dowatchdog,
+		.extra1		= &zero,
+		.extra2		= &one,
 	},
 #endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index cf0e09f..6030191 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly softlockup_thresh = 60;
+int __read_mostly watchdog_thresh = 60;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -105,12 +105,12 @@ static unsigned long get_timestamp(int this_cpu)
 static unsigned long get_sample_period(void)
 {
 	/*
-	 * convert softlockup_thresh from seconds to ns
+	 * convert watchdog_thresh from seconds to ns
 	 * the divide by 5 is to give hrtimer 5 chances to
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return softlockup_thresh * (NSEC_PER_SEC / 5);
+	return watchdog_thresh * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +182,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + softlockup_thresh))
+	if (time_after(now, touch_ts + watchdog_thresh))
 		return now - touch_ts;
 
 	return 0;
@@ -501,19 +501,19 @@ static void watchdog_disable_all_cpus(void)
 /* sysctl functions */
 #ifdef CONFIG_SYSCTL
 /*
- * proc handler for /proc/sys/kernel/nmi_watchdog
+ * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh
  */
 
-int proc_dowatchdog_enabled(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *length, loff_t *ppos)
+int proc_dowatchdog(struct ctl_table *table, int write,
+		    void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret;
 
-	ret = proc_dointvec(table, write, buffer, length, ppos);
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	if (ret || !write)
 		goto out;
 
-	if (watchdog_enabled)
+	if (watchdog_enabled && watchdog_thresh)
 		watchdog_enable_all_cpus();
 	else
 		watchdog_disable_all_cpus();
@@ -521,13 +521,6 @@ int proc_dowatchdog_enabled(struct ctl_table *table, int write,
 out:
 	return ret;
 }
-
-int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-			     void __user *buffer,
-			     size_t *lenp, loff_t *ppos)
-{
-	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-}
 #endif /* CONFIG_SYSCTL */
 
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/urgent] watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh
  2011-05-17  7:16   ` Ingo Molnar
  2011-05-17 14:03     ` Don Zickus
  2011-05-18  3:44     ` [PATCH 4/4 v2] " Mandeep Singh Baines
@ 2011-05-23 11:13     ` tip-bot for Mandeep Singh Baines
  2011-05-24  3:58     ` [tip:perf/urgent] watchdog: Fix non-standard prototype of get_softlockup_thresh() tip-bot for Ingo Molnar
  3 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Mandeep Singh Baines @ 2011-05-23 11:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, msb, hpa, mingo, a.p.zijlstra, marcin.slusarz,
	fweisbec, tglx, mingo, dzickus

Commit-ID:  4eec42f392043063d0f019640b4ccc2a45570002
Gitweb:     http://git.kernel.org/tip/4eec42f392043063d0f019640b4ccc2a45570002
Author:     Mandeep Singh Baines <msb@chromium.org>
AuthorDate: Sun, 22 May 2011 22:10:23 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 23 May 2011 11:58:59 +0200

watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh

Before the conversion of the NMI watchdog to perf event, the
watchdog timeout was 5 seconds. Now it is 60 seconds. For my
particular application, netbooks, 5 seconds was a better
timeout. With a short timeout, we catch faults earlier and are
able to send back a panic. With a 60 second timeout, the user is
unlikely to wait and will instead hit the power button, causing
us to lose the panic info.

This change configures the NMI period to watchdog_thresh and
sets the softlockup_thresh to watchdog_thresh * 2. In addition,
watchdog_thresh was reduced to 10 seconds as suggested by Ingo
Molnar.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-4-git-send-email-msb@chromium.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071642.GF22305@elte.hu>
---
 arch/x86/kernel/apic/hw_nmi.c |    4 ++--
 include/linux/nmi.h           |    2 +-
 kernel/watchdog.c             |   19 +++++++++++++++----
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 5260fe9..d5e57db0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,9 +19,9 @@
 #include <linux/delay.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-u64 hw_nmi_get_sample_period(void)
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
-	return (u64)(cpu_khz) * 1000 * 60;
+	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
 }
 #endif
 
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5317b8b..2d304ef 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -45,7 +45,7 @@ static inline bool trigger_all_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 int hw_nmi_is_cpu_stuck(struct pt_regs *);
-u64 hw_nmi_get_sample_period(void);
+u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_enabled;
 extern int watchdog_thresh;
 struct ctl_table;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6030191..6e63097 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -28,7 +28,7 @@
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
-int __read_mostly watchdog_thresh = 60;
+int __read_mostly watchdog_thresh = 10;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
@@ -91,6 +91,17 @@ static int __init nosoftlockup_setup(char *str)
 __setup("nosoftlockup", nosoftlockup_setup);
 /*  */
 
+/*
+ * Hard-lockup warnings should be triggered after just a few seconds. Soft-
+ * lockups can have false positives under extreme conditions. So we generally
+ * want a higher threshold for soft lockups than for hard lockups. So we couple
+ * the thresholds with a factor: we make the soft threshold twice the amount of
+ * time the hard threshold is.
+ */
+static int get_softlockup_thresh()
+{
+	return watchdog_thresh * 2;
+}
 
 /*
  * Returns seconds, approximately.  We don't need nanosecond
@@ -110,7 +121,7 @@ static unsigned long get_sample_period(void)
 	 * increment before the hardlockup detector generates
 	 * a warning
 	 */
-	return watchdog_thresh * (NSEC_PER_SEC / 5);
+	return get_softlockup_thresh() * (NSEC_PER_SEC / 5);
 }
 
 /* Commands for resetting the watchdog */
@@ -182,7 +193,7 @@ static int is_softlockup(unsigned long touch_ts)
 	unsigned long now = get_timestamp(smp_processor_id());
 
 	/* Warn about unreasonable delays: */
-	if (time_after(now, touch_ts + watchdog_thresh))
+	if (time_after(now, touch_ts + get_softlockup_thresh()))
 		return now - touch_ts;
 
 	return 0;
@@ -359,7 +370,7 @@ static int watchdog_nmi_enable(int cpu)
 
 	/* Try to register using hardware perf events */
 	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period();
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
 	if (!IS_ERR(event)) {
 		printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");

^ permalink raw reply	[flat|nested] 30+ messages in thread

* linux-next: build warning in Lunus' tree
@ 2011-05-24  3:41 Stephen Rothwell
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen Rothwell @ 2011-05-24  3:41 UTC (permalink / raw)
  To: Linus; +Cc: linux-next, linux-kernel, Mandeep Singh Baines, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 400 bytes --]

Hi all,

In Linus' tree, today's linux-next build (sparc64 defconfig)
produced this warning:

kernel/watchdog.c:102: warning: function declaration isn't a prototype

Introduced by commit 4eec42f39204 ("watchdog: Change the default timeout
and configure nmi watchdog period based").

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/urgent] watchdog: Fix non-standard prototype of get_softlockup_thresh()
  2011-05-17  7:16   ` Ingo Molnar
                       ` (2 preceding siblings ...)
  2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Change the default timeout and " tip-bot for Mandeep Singh Baines
@ 2011-05-24  3:58     ` tip-bot for Ingo Molnar
  3 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Ingo Molnar @ 2011-05-24  3:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, msb, hpa, mingo, a.p.zijlstra, marcin.slusarz,
	fweisbec, tglx, sfr, mingo, dzickus

Commit-ID:  6e9101aeec39961308176e0f59e73ac5d37d243a
Gitweb:     http://git.kernel.org/tip/6e9101aeec39961308176e0f59e73ac5d37d243a
Author:     Ingo Molnar <mingo@elte.hu>
AuthorDate: Tue, 24 May 2011 05:43:18 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 24 May 2011 05:53:39 +0200

watchdog: Fix non-standard prototype of get_softlockup_thresh()

This build warning slipped through:

  kernel/watchdog.c:102: warning: function declaration isn't a prototype

As reported by Stephen Rothwell.

Also address an unused variable warning that GCC 4.6.0 reports:
we cannot do anything about failed watchdog ops during CPU hotplug
(it's not serious enough to return an error from the notifier),
so ignore them.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110524134129.8da27016.sfr@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071642.GF22305@elte.hu>
---
 kernel/watchdog.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6e63097..3d0c56a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -98,7 +98,7 @@ __setup("nosoftlockup", nosoftlockup_setup);
  * the thresholds with a factor: we make the soft threshold twice the amount of
  * time the hard threshold is.
  */
-static int get_softlockup_thresh()
+static int get_softlockup_thresh(void)
 {
 	return watchdog_thresh * 2;
 }
@@ -415,15 +415,13 @@ static void watchdog_nmi_disable(int cpu) { return; }
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /* prepare/enable/disable routines */
-static int watchdog_prepare_cpu(int cpu)
+static void watchdog_prepare_cpu(int cpu)
 {
 	struct hrtimer *hrtimer = &per_cpu(watchdog_hrtimer, cpu);
 
 	WARN_ON(per_cpu(softlockup_watchdog, cpu));
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
-
-	return 0;
 }
 
 static int watchdog_enable(int cpu)
@@ -542,17 +540,16 @@ static int __cpuinit
 cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
 	int hotcpu = (unsigned long)hcpu;
-	int err = 0;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
-		err = watchdog_prepare_cpu(hotcpu);
+		watchdog_prepare_cpu(hotcpu);
 		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
 		if (watchdog_enabled)
-			err = watchdog_enable(hotcpu);
+			watchdog_enable(hotcpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-05-19 16:26   ` Mandeep Singh Baines
  2011-05-20 11:51     ` Ingo Molnar
@ 2011-06-09 11:47     ` Don Zickus
  2011-06-09 14:51       ` Mandeep Singh Baines
  1 sibling, 1 reply; 30+ messages in thread
From: Don Zickus @ 2011-06-09 11:47 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: linux-kernel, Marcin Slusarz, Peter Zijlstra,
	Frederic Weisbecker, Ingo Molnar

On Thu, May 19, 2011 at 09:26:31AM -0700, Mandeep Singh Baines wrote:
> Don Zickus (dzickus@redhat.com) wrote:
> > On Mon, May 16, 2011 at 04:34:58PM -0700, Mandeep Singh Baines wrote:
> > > In get_sample_period(), softlockup_thresh is integer divided by 5 before
> > > the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
> > > being rounded down to the nearest integer multiple of 5.
> > > 
> > > For example, a softlockup_thresh of 4 rounds down to 0.
> > 
> > Mandeep,
> > 
> > Thanks for the patches.  Unfortunately, I am taking some time off so I'll
> > shephard these patches when I get back.  For the most part they look fine,
> > aside from Ingo's comments.
> > 
> 
> No problem. Enjoy your time off:)
> 
> Ingo, since Don is out, would you mind acking the patches (assuming you're
> happy). It simplifies pushing these patches into our local ChromiumOS tree
> if I have an upstream ack.

Hi Mandeep,

It looks like Ingo took most of your patches.  Is there any outstanding
ones that didn't make it that you see?

Cheers,
Don

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] watchdog: fix rounding issues in get_sample_period()
  2011-06-09 11:47     ` Don Zickus
@ 2011-06-09 14:51       ` Mandeep Singh Baines
  0 siblings, 0 replies; 30+ messages in thread
From: Mandeep Singh Baines @ 2011-06-09 14:51 UTC (permalink / raw)
  To: Don Zickus
  Cc: Mandeep Singh Baines, linux-kernel, Marcin Slusarz,
	Peter Zijlstra, Frederic Weisbecker, Ingo Molnar

Don Zickus (dzickus@redhat.com) wrote:
> On Thu, May 19, 2011 at 09:26:31AM -0700, Mandeep Singh Baines wrote:
> > Don Zickus (dzickus@redhat.com) wrote:
> > > On Mon, May 16, 2011 at 04:34:58PM -0700, Mandeep Singh Baines wrote:
> > > > In get_sample_period(), softlockup_thresh is integer divided by 5 before
> > > > the multiplication by NSEC_PER_SEC. This results in softlockup_thresh
> > > > being rounded down to the nearest integer multiple of 5.
> > > > 
> > > > For example, a softlockup_thresh of 4 rounds down to 0.
> > > 
> > > Mandeep,
> > > 
> > > Thanks for the patches.  Unfortunately, I am taking some time off so I'll
> > > shephard these patches when I get back.  For the most part they look fine,
> > > aside from Ingo's comments.
> > > 
> > 
> > No problem. Enjoy your time off:)
> > 
> > Ingo, since Don is out, would you mind acking the patches (assuming you're
> > happy). It simplifies pushing these patches into our local ChromiumOS tree
> > if I have an upstream ack.
> 
> Hi Mandeep,
> 
> It looks like Ingo took most of your patches.  Is there any outstanding
> ones that didn't make it that you see?
> 

Hi Don,

Thanks for following up. All the patches are now in.

I am working on a few new NMI issues so I'll be sending out a few patches
in a day or so. I've committed one critical fix to our 2.6.32 tree and am now
working on forward-porting it to tip/master:

http://git.chromium.org/gitweb/?p=chromiumos/third_party/kernel.git;a=commitdiff;h=2a825f5f38f24e655dd655253eaaa5a82460a43e;hp=09ecdd5b43b9df930efd51a3fe317c18907a6c2d

Regards,
Mandeep

> Cheers,
> Don

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-06-09 14:52 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-23  5:10 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
2011-05-23  5:10 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
2011-05-23 11:12   ` [tip:perf/urgent] watchdog: Only " tip-bot for Mandeep Singh Baines
2011-05-23  5:10 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
2011-05-23  5:10 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
2011-05-23 11:12 ` [tip:perf/urgent] watchdog: Fix rounding bug in get_sample_period() tip-bot for Mandeep Singh Baines
  -- strict thread matches above, loose matches on Subject: below --
2011-05-24  3:41 linux-next: build warning in Lunus' tree Stephen Rothwell
2011-05-16 23:34 [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Mandeep Singh Baines
2011-05-16 23:34 ` [PATCH 2/4] watchdog: only disable/enable watchdog if neccessary Mandeep Singh Baines
2011-05-16 23:35 ` [PATCH 3/4] watchdog: disable watchdog when thresh is zero Mandeep Singh Baines
2011-05-17  7:10   ` Ingo Molnar
2011-05-18  3:36     ` [PATCH 3/4 v2] " Mandeep Singh Baines
2011-05-18  8:35       ` Ingo Molnar
2011-05-19 16:20         ` [PATCH 3/4 v3] " Mandeep Singh Baines
2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Disable " tip-bot for Mandeep Singh Baines
2011-05-16 23:35 ` [PATCH 4/4] watchdog: configure nmi watchdog period based on watchdog_thresh Mandeep Singh Baines
2011-05-17  7:16   ` Ingo Molnar
2011-05-17 14:03     ` Don Zickus
2011-05-17 18:47       ` Ingo Molnar
2011-05-18  3:44     ` [PATCH 4/4 v2] " Mandeep Singh Baines
2011-05-18  8:39       ` Ingo Molnar
2011-05-18 13:51         ` Don Zickus
2011-05-23 11:13     ` [tip:perf/urgent] watchdog: Change the default timeout and " tip-bot for Mandeep Singh Baines
2011-05-24  3:58     ` [tip:perf/urgent] watchdog: Fix non-standard prototype of get_softlockup_thresh() tip-bot for Ingo Molnar
2011-05-18 13:53 ` [PATCH 1/4] watchdog: fix rounding issues in get_sample_period() Don Zickus
2011-05-19 16:26   ` Mandeep Singh Baines
2011-05-20 11:51     ` Ingo Molnar
2011-05-23  5:14       ` Mandeep Singh Baines
2011-06-09 11:47     ` Don Zickus
2011-06-09 14:51       ` Mandeep Singh Baines

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.