All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
@ 2017-02-17 18:25 Niklas Cassel
  2017-02-19 16:46 ` Guenter Roeck
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Niklas Cassel @ 2017-02-17 18:25 UTC (permalink / raw)
  To: wim, linux, edumazet, peterz; +Cc: linux-watchdog, linux-kernel, niklass

From: Niklas Cassel <niklas.cassel@axis.com>

Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.

Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
pending softirqs are no longer always handled immediately, instead,
if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
the handling of the softirqs are deferred, and are instead supposed
to be handled by ksoftirqd, when ksoftirqd gets scheduled.

If a user space process with a real-time policy starts to misbehave
by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
what will happen is that all softirqs will get deferred, while ksoftirqd,
which is supposed to handle the deferred softirqs, will never get to run.

To make sure that the watchdog is able to fire even when we do not get
to run softirqs, replace the timers with hrtimers.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
index c7bdc986dca1..0f67cd068465 100644
--- a/drivers/watchdog/softdog.c
+++ b/drivers/watchdog/softdog.c
@@ -21,13 +21,12 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/hrtimer.h>
 #include <linux/init.h>
-#include <linux/jiffies.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/reboot.h>
-#include <linux/timer.h>
 #include <linux/types.h>
 #include <linux/watchdog.h>
 
@@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
 MODULE_PARM_DESC(soft_panic,
 	"Softdog action, set to 1 to panic, 0 to reboot (default=0)");
 
-static void softdog_fire(unsigned long data)
+static struct hrtimer softdog_ticktock;
+static struct hrtimer softdog_preticktock;
+
+static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
 {
 	module_put(THIS_MODULE);
 	if (soft_noboot) {
@@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
 		emergency_restart();
 		pr_crit("Reboot didn't ?????\n");
 	}
-}
 
-static struct timer_list softdog_ticktock =
-		TIMER_INITIALIZER(softdog_fire, 0, 0);
+	return HRTIMER_NORESTART;
+}
 
 static struct watchdog_device softdog_dev;
 
-static void softdog_pretimeout(unsigned long data)
+static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
 {
 	watchdog_notify_pretimeout(&softdog_dev);
-}
 
-static struct timer_list softdog_preticktock =
-		TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
+	return HRTIMER_NORESTART;
+}
 
 static int softdog_ping(struct watchdog_device *w)
 {
-	if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
+	if (!hrtimer_active(&softdog_ticktock))
 		__module_get(THIS_MODULE);
+	hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
+		      HRTIMER_MODE_REL);
 
 	if (w->pretimeout)
-		mod_timer(&softdog_preticktock, jiffies +
-			  (w->timeout - w->pretimeout) * HZ);
+		hrtimer_start(&softdog_preticktock,
+			      ktime_set(w->timeout - w->pretimeout, 0),
+			      HRTIMER_MODE_REL);
 	else
-		del_timer(&softdog_preticktock);
+		hrtimer_cancel(&softdog_preticktock);
 
 	return 0;
 }
 
 static int softdog_stop(struct watchdog_device *w)
 {
-	if (del_timer(&softdog_ticktock))
+	if (hrtimer_cancel(&softdog_ticktock))
 		module_put(THIS_MODULE);
 
-	del_timer(&softdog_preticktock);
+	hrtimer_cancel(&softdog_preticktock);
 
 	return 0;
 }
@@ -134,6 +137,11 @@ static int __init softdog_init(void)
 	watchdog_set_nowayout(&softdog_dev, nowayout);
 	watchdog_stop_on_reboot(&softdog_dev);
 
+	hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	softdog_ticktock.function = softdog_fire;
+	hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	softdog_preticktock.function = softdog_pretimeout;
+
 	ret = watchdog_register_device(&softdog_dev);
 	if (ret)
 		return ret;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
@ 2017-02-19 16:46 ` Guenter Roeck
  2017-02-20  8:36   ` Peter Zijlstra
  2017-02-20 10:03   ` Niklas Cassel
  2017-02-20 15:35 ` Guenter Roeck
  2017-02-27  4:04 ` Guenter Roeck
  2 siblings, 2 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-19 16:46 UTC (permalink / raw)
  To: Niklas Cassel, wim, edumazet, peterz
  Cc: linux-watchdog, linux-kernel, niklass, Wolfram Sang

Cc: Wolfram for input.

On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
>
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
>

This makes the driver dependent on HIGH_RES_TIMERS, which is not available
on all architectures. Before adding that restriction, I would like to see
some discussion if this is the only feasible solution.

Is this driver the only one with this problem, or is anything using
timers affected ?

Thanks,
Guenter

> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> ---
>  drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
>  1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/hrtimer.h>
>  #include <linux/init.h>
> -#include <linux/jiffies.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/reboot.h>
> -#include <linux/timer.h>
>  #include <linux/types.h>
>  #include <linux/watchdog.h>
>
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
>  MODULE_PARM_DESC(soft_panic,
>  	"Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
>  {
>  	module_put(THIS_MODULE);
>  	if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
>  		emergency_restart();
>  		pr_crit("Reboot didn't ?????\n");
>  	}
> -}
>
> -static struct timer_list softdog_ticktock =
> -		TIMER_INITIALIZER(softdog_fire, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>
>  static struct watchdog_device softdog_dev;
>
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
>  {
>  	watchdog_notify_pretimeout(&softdog_dev);
> -}
>
> -static struct timer_list softdog_preticktock =
> -		TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>
>  static int softdog_ping(struct watchdog_device *w)
>  {
> -	if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> +	if (!hrtimer_active(&softdog_ticktock))
>  		__module_get(THIS_MODULE);
> +	hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> +		      HRTIMER_MODE_REL);
>
>  	if (w->pretimeout)
> -		mod_timer(&softdog_preticktock, jiffies +
> -			  (w->timeout - w->pretimeout) * HZ);
> +		hrtimer_start(&softdog_preticktock,
> +			      ktime_set(w->timeout - w->pretimeout, 0),
> +			      HRTIMER_MODE_REL);
>  	else
> -		del_timer(&softdog_preticktock);
> +		hrtimer_cancel(&softdog_preticktock);
>
>  	return 0;
>  }
>
>  static int softdog_stop(struct watchdog_device *w)
>  {
> -	if (del_timer(&softdog_ticktock))
> +	if (hrtimer_cancel(&softdog_ticktock))
>  		module_put(THIS_MODULE);
>
> -	del_timer(&softdog_preticktock);
> +	hrtimer_cancel(&softdog_preticktock);
>
>  	return 0;
>  }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
>  	watchdog_set_nowayout(&softdog_dev, nowayout);
>  	watchdog_stop_on_reboot(&softdog_dev);
>
> +	hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_ticktock.function = softdog_fire;
> +	hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_preticktock.function = softdog_pretimeout;
> +
>  	ret = watchdog_register_device(&softdog_dev);
>  	if (ret)
>  		return ret;
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-19 16:46 ` Guenter Roeck
@ 2017-02-20  8:36   ` Peter Zijlstra
  2017-02-20  9:02     ` Guenter Roeck
  2017-02-20 10:03   ` Niklas Cassel
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2017-02-20  8:36 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Niklas Cassel, wim, edumazet, linux-watchdog, linux-kernel,
	niklass, Wolfram Sang

On Sun, Feb 19, 2017 at 08:46:28AM -0800, Guenter Roeck wrote:
> Cc: Wolfram for input.
> 
> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> >From: Niklas Cassel <niklas.cassel@axis.com>
> >
> >Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
> >
> >Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> >pending softirqs are no longer always handled immediately, instead,
> >if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> >the handling of the softirqs are deferred, and are instead supposed
> >to be handled by ksoftirqd, when ksoftirqd gets scheduled.
> >
> >If a user space process with a real-time policy starts to misbehave
> >by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> >what will happen is that all softirqs will get deferred, while ksoftirqd,
> >which is supposed to handle the deferred softirqs, will never get to run.
> >
> >To make sure that the watchdog is able to fire even when we do not get
> >to run softirqs, replace the timers with hrtimers.
> >
> 
> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
> on all architectures. Before adding that restriction, I would like to see
> some discussion if this is the only feasible solution.
> 

It does no such thing; the hrtimer interface is always available.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-20  8:36   ` Peter Zijlstra
@ 2017-02-20  9:02     ` Guenter Roeck
  0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20  9:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Niklas Cassel, wim, edumazet, linux-watchdog, linux-kernel,
	niklass, Wolfram Sang

On 02/20/2017 12:36 AM, Peter Zijlstra wrote:
> On Sun, Feb 19, 2017 at 08:46:28AM -0800, Guenter Roeck wrote:
>> Cc: Wolfram for input.
>>
>> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>
>> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
>> on all architectures. Before adding that restriction, I would like to see
>> some discussion if this is the only feasible solution.
>>
>
> It does no such thing; the hrtimer interface is always available.
>

That is good to hear. Thanks for the correction.

Guenter

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-19 16:46 ` Guenter Roeck
  2017-02-20  8:36   ` Peter Zijlstra
@ 2017-02-20 10:03   ` Niklas Cassel
  2017-02-20 15:22     ` Eric Dumazet
  2017-02-20 15:31     ` Guenter Roeck
  1 sibling, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2017-02-20 10:03 UTC (permalink / raw)
  To: Guenter Roeck, wim, edumazet, peterz
  Cc: linux-watchdog, linux-kernel, Wolfram Sang

On 02/19/2017 05:46 PM, Guenter Roeck wrote:
> Cc: Wolfram for input.
>
> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@axis.com>
>>
>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>
>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>> pending softirqs are no longer always handled immediately, instead,
>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>> the handling of the softirqs are deferred, and are instead supposed
>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>
>> If a user space process with a real-time policy starts to misbehave
>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>> which is supposed to handle the deferred softirqs, will never get to run.
>>
>> To make sure that the watchdog is able to fire even when we do not get
>> to run softirqs, replace the timers with hrtimers.
>>
>
> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
> on all architectures. Before adding that restriction, I would like to see
> some discussion if this is the only feasible solution.
>
> Is this driver the only one with this problem, or is anything using
> timers affected ?

Anything using timers is affected.
The timers will still get incremented, but the code checking for timer
expiration is run from a softirq, which in this case never gets to run,
so the timers will never expire.

Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
were never deferred, so they always got to run when exiting an irq.

So previously with a user space process using all the CPU, like:
chrt -r 99 sh -c "while :; do :; done"
the softdog would still fire.

If we ask the system to run something all the time,
and the system does that, I don't think we can blame the system.
It is however important that the watchdog can still detect and
fire when this happens. Other drivers, not so much.

I guess another solution would be to modify the if-statements in
kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
is in state TASK_RUNNING, if we also meet some other condition.
However, do we want to add that extra complexity?
Perhaps someone with more softirq/scheduler knowledge can give
some input on this.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-20 10:03   ` Niklas Cassel
@ 2017-02-20 15:22     ` Eric Dumazet
  2017-02-20 15:31     ` Guenter Roeck
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2017-02-20 15:22 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Guenter Roeck, wim, Peter Zijlstra, linux-watchdog, LKML, Wolfram Sang

[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]

On Mon, Feb 20, 2017 at 2:03 AM, Niklas Cassel <niklas.cassel@axis.com>
wrote:

> On 02/19/2017 05:46 PM, Guenter Roeck wrote:
> > Cc: Wolfram for input.
> >
> > On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> >> From: Niklas Cassel <niklas.cassel@axis.com>
> >>
> >> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
> >>
> >> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> >> pending softirqs are no longer always handled immediately, instead,
> >> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> >> the handling of the softirqs are deferred, and are instead supposed
> >> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
> >>
> >> If a user space process with a real-time policy starts to misbehave
> >> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> >> what will happen is that all softirqs will get deferred, while
> ksoftirqd,
> >> which is supposed to handle the deferred softirqs, will never get to
> run.
> >>
> >> To make sure that the watchdog is able to fire even when we do not get
> >> to run softirqs, replace the timers with hrtimers.
> >>
> >
> > This makes the driver dependent on HIGH_RES_TIMERS, which is not
> available
> > on all architectures. Before adding that restriction, I would like to see
> > some discussion if this is the only feasible solution.
> >
> > Is this driver the only one with this problem, or is anything using
> > timers affected ?
>
> Anything using timers is affected.
> The timers will still get incremented, but the code checking for timer
> expiration is run from a softirq, which in this case never gets to run,
> so the timers will never expire.
>
> Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
> were never deferred, so they always got to run when exiting an irq.
>
> So previously with a user space process using all the CPU, like:
> chrt -r 99 sh -c "while :; do :; done"
> the softdog would still fire.
>
>
So the question is :

If some RT process does an infinite loop, should we care about system being
functional ?

Looks like the OS is now WAI (Working As Intended)



> If we ask the system to run something all the time,
> and the system does that, I don't think we can blame the system.
> It is however important that the watchdog can still detect and
> fire when this happens. Other drivers, not so much.
>
> I guess another solution would be to modify the if-statements in
> kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
> is in state TASK_RUNNING, if we also meet some other condition.
> However, do we want to add that extra complexity?
> Perhaps someone with more softirq/scheduler knowledge can give
> some input on this.
>

[-- Attachment #2: Type: text/html, Size: 3658 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-20 10:03   ` Niklas Cassel
  2017-02-20 15:22     ` Eric Dumazet
@ 2017-02-20 15:31     ` Guenter Roeck
  1 sibling, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20 15:31 UTC (permalink / raw)
  To: Niklas Cassel, wim, edumazet, peterz
  Cc: linux-watchdog, linux-kernel, Wolfram Sang

On 02/20/2017 02:03 AM, Niklas Cassel wrote:
> On 02/19/2017 05:46 PM, Guenter Roeck wrote:
>> Cc: Wolfram for input.
>>
>> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>
>> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
>> on all architectures. Before adding that restriction, I would like to see
>> some discussion if this is the only feasible solution.
>>
>> Is this driver the only one with this problem, or is anything using
>> timers affected ?
>
> Anything using timers is affected.
> The timers will still get incremented, but the code checking for timer
> expiration is run from a softirq, which in this case never gets to run,
> so the timers will never expire.
>
> Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
> were never deferred, so they always got to run when exiting an irq.
>
> So previously with a user space process using all the CPU, like:
> chrt -r 99 sh -c "while :; do :; done"
> the softdog would still fire.
>
> If we ask the system to run something all the time,
> and the system does that, I don't think we can blame the system.
> It is however important that the watchdog can still detect and
> fire when this happens. Other drivers, not so much.
>
> I guess another solution would be to modify the if-statements in
> kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
> is in state TASK_RUNNING, if we also meet some other condition.
> However, do we want to add that extra complexity?

No, in that respect using hrtimers is ok. I was more concerned about
other users of timers. But, as you say, I guess that is considered ok.

Guenter

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
  2017-02-19 16:46 ` Guenter Roeck
@ 2017-02-20 15:35 ` Guenter Roeck
  2017-02-27  4:04 ` Guenter Roeck
  2 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20 15:35 UTC (permalink / raw)
  To: Niklas Cassel, wim, edumazet, peterz
  Cc: linux-watchdog, linux-kernel, niklass

On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
>
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
>  1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/hrtimer.h>
>  #include <linux/init.h>
> -#include <linux/jiffies.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/reboot.h>
> -#include <linux/timer.h>
>  #include <linux/types.h>
>  #include <linux/watchdog.h>
>
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
>  MODULE_PARM_DESC(soft_panic,
>  	"Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
>  {
>  	module_put(THIS_MODULE);
>  	if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
>  		emergency_restart();
>  		pr_crit("Reboot didn't ?????\n");
>  	}
> -}
>
> -static struct timer_list softdog_ticktock =
> -		TIMER_INITIALIZER(softdog_fire, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>
>  static struct watchdog_device softdog_dev;
>
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
>  {
>  	watchdog_notify_pretimeout(&softdog_dev);
> -}
>
> -static struct timer_list softdog_preticktock =
> -		TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>
>  static int softdog_ping(struct watchdog_device *w)
>  {
> -	if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> +	if (!hrtimer_active(&softdog_ticktock))
>  		__module_get(THIS_MODULE);
> +	hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> +		      HRTIMER_MODE_REL);
>
>  	if (w->pretimeout)
> -		mod_timer(&softdog_preticktock, jiffies +
> -			  (w->timeout - w->pretimeout) * HZ);
> +		hrtimer_start(&softdog_preticktock,
> +			      ktime_set(w->timeout - w->pretimeout, 0),
> +			      HRTIMER_MODE_REL);
>  	else
> -		del_timer(&softdog_preticktock);
> +		hrtimer_cancel(&softdog_preticktock);
>
>  	return 0;
>  }
>
>  static int softdog_stop(struct watchdog_device *w)
>  {
> -	if (del_timer(&softdog_ticktock))
> +	if (hrtimer_cancel(&softdog_ticktock))
>  		module_put(THIS_MODULE);
>
> -	del_timer(&softdog_preticktock);
> +	hrtimer_cancel(&softdog_preticktock);
>
>  	return 0;
>  }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
>  	watchdog_set_nowayout(&softdog_dev, nowayout);
>  	watchdog_stop_on_reboot(&softdog_dev);
>
> +	hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_ticktock.function = softdog_fire;
> +	hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_preticktock.function = softdog_pretimeout;
> +
>  	ret = watchdog_register_device(&softdog_dev);
>  	if (ret)
>  		return ret;
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
  2017-02-19 16:46 ` Guenter Roeck
  2017-02-20 15:35 ` Guenter Roeck
@ 2017-02-27  4:04 ` Guenter Roeck
  2017-02-27 12:58   ` Niklas Cassel
  2 siblings, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2017-02-27  4:04 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel, niklass

On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
> 
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
> 
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
> 
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
> 
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> Reviewed-by: Guenter Roeck <linux@roeck-us.net>

Niklas,

Please rebase onto current mainline, test, and resubmit.

Thanks,
Guenter

> ---
>  drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
>  1 file changed, 24 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/hrtimer.h>
>  #include <linux/init.h>
> -#include <linux/jiffies.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/reboot.h>
> -#include <linux/timer.h>
>  #include <linux/types.h>
>  #include <linux/watchdog.h>
>  
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
>  MODULE_PARM_DESC(soft_panic,
>  	"Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>  
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
>  {
>  	module_put(THIS_MODULE);
>  	if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
>  		emergency_restart();
>  		pr_crit("Reboot didn't ?????\n");
>  	}
> -}
>  
> -static struct timer_list softdog_ticktock =
> -		TIMER_INITIALIZER(softdog_fire, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>  
>  static struct watchdog_device softdog_dev;
>  
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
>  {
>  	watchdog_notify_pretimeout(&softdog_dev);
> -}
>  
> -static struct timer_list softdog_preticktock =
> -		TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> +	return HRTIMER_NORESTART;
> +}
>  
>  static int softdog_ping(struct watchdog_device *w)
>  {
> -	if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> +	if (!hrtimer_active(&softdog_ticktock))
>  		__module_get(THIS_MODULE);
> +	hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> +		      HRTIMER_MODE_REL);
>  
>  	if (w->pretimeout)
> -		mod_timer(&softdog_preticktock, jiffies +
> -			  (w->timeout - w->pretimeout) * HZ);
> +		hrtimer_start(&softdog_preticktock,
> +			      ktime_set(w->timeout - w->pretimeout, 0),
> +			      HRTIMER_MODE_REL);
>  	else
> -		del_timer(&softdog_preticktock);
> +		hrtimer_cancel(&softdog_preticktock);
>  
>  	return 0;
>  }
>  
>  static int softdog_stop(struct watchdog_device *w)
>  {
> -	if (del_timer(&softdog_ticktock))
> +	if (hrtimer_cancel(&softdog_ticktock))
>  		module_put(THIS_MODULE);
>  
> -	del_timer(&softdog_preticktock);
> +	hrtimer_cancel(&softdog_preticktock);
>  
>  	return 0;
>  }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
>  	watchdog_set_nowayout(&softdog_dev, nowayout);
>  	watchdog_stop_on_reboot(&softdog_dev);
>  
> +	hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_ticktock.function = softdog_fire;
> +	hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	softdog_preticktock.function = softdog_pretimeout;
> +
>  	ret = watchdog_register_device(&softdog_dev);
>  	if (ret)
>  		return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-27  4:04 ` Guenter Roeck
@ 2017-02-27 12:58   ` Niklas Cassel
  2017-02-27 14:31     ` Guenter Roeck
  0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2017-02-27 12:58 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel

On 02/27/2017 05:04 AM, Guenter Roeck wrote:
> On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@axis.com>
>>
>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>
>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>> pending softirqs are no longer always handled immediately, instead,
>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>> the handling of the softirqs are deferred, and are instead supposed
>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>
>> If a user space process with a real-time policy starts to misbehave
>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>> which is supposed to handle the deferred softirqs, will never get to run.
>>
>> To make sure that the watchdog is able to fire even when we do not get
>> to run softirqs, replace the timers with hrtimers.
>>
>> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> Niklas,
>
> Please rebase onto current mainline, test, and resubmit.

I've sent out a v2 now :)

Although, I thought that it was a bit weird that the conflicting patch
was in Linus tree but is not in next-20170227.

I'm not sure if you are the web master for www.linux-watchdog.org,
but their cgit seems to be broken:
http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog.git;a=summary
http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=summary


>
> Thanks,
> Guenter
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
  2017-02-27 12:58   ` Niklas Cassel
@ 2017-02-27 14:31     ` Guenter Roeck
  0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-27 14:31 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel

On 02/27/2017 04:58 AM, Niklas Cassel wrote:
> On 02/27/2017 05:04 AM, Guenter Roeck wrote:
>> On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
>>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
>> Niklas,
>>
>> Please rebase onto current mainline, test, and resubmit.
>
> I've sent out a v2 now :)
>
> Although, I thought that it was a bit weird that the conflicting patch
> was in Linus tree but is not in next-20170227.
>
It was in the watchdog-next branch of my repository at kernel.org,
which is not in -next. I'll have to sort that out with Wim at some point.

> I'm not sure if you are the web master for www.linux-watchdog.org,
> but their cgit seems to be broken:
> http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog.git;a=summary
> http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=summary
>
Wim knows about it, and plans to replace the server.

Guenter

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-02-27 14:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
2017-02-19 16:46 ` Guenter Roeck
2017-02-20  8:36   ` Peter Zijlstra
2017-02-20  9:02     ` Guenter Roeck
2017-02-20 10:03   ` Niklas Cassel
2017-02-20 15:22     ` Eric Dumazet
2017-02-20 15:31     ` Guenter Roeck
2017-02-20 15:35 ` Guenter Roeck
2017-02-27  4:04 ` Guenter Roeck
2017-02-27 12:58   ` Niklas Cassel
2017-02-27 14:31     ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.