* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
@ 2017-02-19 16:46 ` Guenter Roeck
2017-02-20 8:36 ` Peter Zijlstra
2017-02-20 10:03 ` Niklas Cassel
2017-02-20 15:35 ` Guenter Roeck
2017-02-27 4:04 ` Guenter Roeck
2 siblings, 2 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-19 16:46 UTC (permalink / raw)
To: Niklas Cassel, wim, edumazet, peterz
Cc: linux-watchdog, linux-kernel, niklass, Wolfram Sang
Cc: Wolfram for input.
On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
>
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
>
This makes the driver dependent on HIGH_RES_TIMERS, which is not available
on all architectures. Before adding that restriction, I would like to see
some discussion if this is the only feasible solution.
Is this driver the only one with this problem, or is anything using
timers affected ?
Thanks,
Guenter
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> ---
> drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/hrtimer.h>
> #include <linux/init.h>
> -#include <linux/jiffies.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/moduleparam.h>
> #include <linux/reboot.h>
> -#include <linux/timer.h>
> #include <linux/types.h>
> #include <linux/watchdog.h>
>
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
> MODULE_PARM_DESC(soft_panic,
> "Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
> {
> module_put(THIS_MODULE);
> if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
> emergency_restart();
> pr_crit("Reboot didn't ?????\n");
> }
> -}
>
> -static struct timer_list softdog_ticktock =
> - TIMER_INITIALIZER(softdog_fire, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static struct watchdog_device softdog_dev;
>
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
> {
> watchdog_notify_pretimeout(&softdog_dev);
> -}
>
> -static struct timer_list softdog_preticktock =
> - TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static int softdog_ping(struct watchdog_device *w)
> {
> - if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> + if (!hrtimer_active(&softdog_ticktock))
> __module_get(THIS_MODULE);
> + hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> + HRTIMER_MODE_REL);
>
> if (w->pretimeout)
> - mod_timer(&softdog_preticktock, jiffies +
> - (w->timeout - w->pretimeout) * HZ);
> + hrtimer_start(&softdog_preticktock,
> + ktime_set(w->timeout - w->pretimeout, 0),
> + HRTIMER_MODE_REL);
> else
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
>
> static int softdog_stop(struct watchdog_device *w)
> {
> - if (del_timer(&softdog_ticktock))
> + if (hrtimer_cancel(&softdog_ticktock))
> module_put(THIS_MODULE);
>
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
> watchdog_set_nowayout(&softdog_dev, nowayout);
> watchdog_stop_on_reboot(&softdog_dev);
>
> + hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_ticktock.function = softdog_fire;
> + hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_preticktock.function = softdog_pretimeout;
> +
> ret = watchdog_register_device(&softdog_dev);
> if (ret)
> return ret;
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-19 16:46 ` Guenter Roeck
@ 2017-02-20 8:36 ` Peter Zijlstra
2017-02-20 9:02 ` Guenter Roeck
2017-02-20 10:03 ` Niklas Cassel
1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2017-02-20 8:36 UTC (permalink / raw)
To: Guenter Roeck
Cc: Niklas Cassel, wim, edumazet, linux-watchdog, linux-kernel,
niklass, Wolfram Sang
On Sun, Feb 19, 2017 at 08:46:28AM -0800, Guenter Roeck wrote:
> Cc: Wolfram for input.
>
> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> >From: Niklas Cassel <niklas.cassel@axis.com>
> >
> >Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
> >
> >Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> >pending softirqs are no longer always handled immediately, instead,
> >if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> >the handling of the softirqs are deferred, and are instead supposed
> >to be handled by ksoftirqd, when ksoftirqd gets scheduled.
> >
> >If a user space process with a real-time policy starts to misbehave
> >by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> >what will happen is that all softirqs will get deferred, while ksoftirqd,
> >which is supposed to handle the deferred softirqs, will never get to run.
> >
> >To make sure that the watchdog is able to fire even when we do not get
> >to run softirqs, replace the timers with hrtimers.
> >
>
> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
> on all architectures. Before adding that restriction, I would like to see
> some discussion if this is the only feasible solution.
>
It does no such thing; the hrtimer interface is always available.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-20 8:36 ` Peter Zijlstra
@ 2017-02-20 9:02 ` Guenter Roeck
0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20 9:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Niklas Cassel, wim, edumazet, linux-watchdog, linux-kernel,
niklass, Wolfram Sang
On 02/20/2017 12:36 AM, Peter Zijlstra wrote:
> On Sun, Feb 19, 2017 at 08:46:28AM -0800, Guenter Roeck wrote:
>> Cc: Wolfram for input.
>>
>> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>
>> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
>> on all architectures. Before adding that restriction, I would like to see
>> some discussion if this is the only feasible solution.
>>
>
> It does no such thing; the hrtimer interface is always available.
>
That is good to hear. Thanks for the correction.
Guenter
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-19 16:46 ` Guenter Roeck
2017-02-20 8:36 ` Peter Zijlstra
@ 2017-02-20 10:03 ` Niklas Cassel
2017-02-20 15:22 ` Eric Dumazet
2017-02-20 15:31 ` Guenter Roeck
1 sibling, 2 replies; 11+ messages in thread
From: Niklas Cassel @ 2017-02-20 10:03 UTC (permalink / raw)
To: Guenter Roeck, wim, edumazet, peterz
Cc: linux-watchdog, linux-kernel, Wolfram Sang
On 02/19/2017 05:46 PM, Guenter Roeck wrote:
> Cc: Wolfram for input.
>
> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@axis.com>
>>
>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>
>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>> pending softirqs are no longer always handled immediately, instead,
>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>> the handling of the softirqs are deferred, and are instead supposed
>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>
>> If a user space process with a real-time policy starts to misbehave
>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>> which is supposed to handle the deferred softirqs, will never get to run.
>>
>> To make sure that the watchdog is able to fire even when we do not get
>> to run softirqs, replace the timers with hrtimers.
>>
>
> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
> on all architectures. Before adding that restriction, I would like to see
> some discussion if this is the only feasible solution.
>
> Is this driver the only one with this problem, or is anything using
> timers affected ?
Anything using timers is affected.
The timers will still get incremented, but the code checking for timer
expiration is run from a softirq, which in this case never gets to run,
so the timers will never expire.
Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
were never deferred, so they always got to run when exiting an irq.
So previously with a user space process using all the CPU, like:
chrt -r 99 sh -c "while :; do :; done"
the softdog would still fire.
If we ask the system to run something all the time,
and the system does that, I don't think we can blame the system.
It is however important that the watchdog can still detect and
fire when this happens. Other drivers, not so much.
I guess another solution would be to modify the if-statements in
kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
is in state TASK_RUNNING, if we also meet some other condition.
However, do we want to add that extra complexity?
Perhaps someone with more softirq/scheduler knowledge can give
some input on this.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-20 10:03 ` Niklas Cassel
@ 2017-02-20 15:22 ` Eric Dumazet
2017-02-20 15:31 ` Guenter Roeck
1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2017-02-20 15:22 UTC (permalink / raw)
To: Niklas Cassel
Cc: Guenter Roeck, wim, Peter Zijlstra, linux-watchdog, LKML, Wolfram Sang
[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]
On Mon, Feb 20, 2017 at 2:03 AM, Niklas Cassel <niklas.cassel@axis.com>
wrote:
> On 02/19/2017 05:46 PM, Guenter Roeck wrote:
> > Cc: Wolfram for input.
> >
> > On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> >> From: Niklas Cassel <niklas.cassel@axis.com>
> >>
> >> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
> >>
> >> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> >> pending softirqs are no longer always handled immediately, instead,
> >> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> >> the handling of the softirqs are deferred, and are instead supposed
> >> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
> >>
> >> If a user space process with a real-time policy starts to misbehave
> >> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> >> what will happen is that all softirqs will get deferred, while
> ksoftirqd,
> >> which is supposed to handle the deferred softirqs, will never get to
> run.
> >>
> >> To make sure that the watchdog is able to fire even when we do not get
> >> to run softirqs, replace the timers with hrtimers.
> >>
> >
> > This makes the driver dependent on HIGH_RES_TIMERS, which is not
> available
> > on all architectures. Before adding that restriction, I would like to see
> > some discussion if this is the only feasible solution.
> >
> > Is this driver the only one with this problem, or is anything using
> > timers affected ?
>
> Anything using timers is affected.
> The timers will still get incremented, but the code checking for timer
> expiration is run from a softirq, which in this case never gets to run,
> so the timers will never expire.
>
> Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
> were never deferred, so they always got to run when exiting an irq.
>
> So previously with a user space process using all the CPU, like:
> chrt -r 99 sh -c "while :; do :; done"
> the softdog would still fire.
>
>
So the question is :
If some RT process does an infinite loop, should we care about system being
functional ?
Looks like the OS is now WAI (Working As Intended)
> If we ask the system to run something all the time,
> and the system does that, I don't think we can blame the system.
> It is however important that the watchdog can still detect and
> fire when this happens. Other drivers, not so much.
>
> I guess another solution would be to modify the if-statements in
> kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
> is in state TASK_RUNNING, if we also meet some other condition.
> However, do we want to add that extra complexity?
> Perhaps someone with more softirq/scheduler knowledge can give
> some input on this.
>
[-- Attachment #2: Type: text/html, Size: 3658 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-20 10:03 ` Niklas Cassel
2017-02-20 15:22 ` Eric Dumazet
@ 2017-02-20 15:31 ` Guenter Roeck
1 sibling, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20 15:31 UTC (permalink / raw)
To: Niklas Cassel, wim, edumazet, peterz
Cc: linux-watchdog, linux-kernel, Wolfram Sang
On 02/20/2017 02:03 AM, Niklas Cassel wrote:
> On 02/19/2017 05:46 PM, Guenter Roeck wrote:
>> Cc: Wolfram for input.
>>
>> On 02/17/2017 10:25 AM, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>
>> This makes the driver dependent on HIGH_RES_TIMERS, which is not available
>> on all architectures. Before adding that restriction, I would like to see
>> some discussion if this is the only feasible solution.
>>
>> Is this driver the only one with this problem, or is anything using
>> timers affected ?
>
> Anything using timers is affected.
> The timers will still get incremented, but the code checking for timer
> expiration is run from a softirq, which in this case never gets to run,
> so the timers will never expire.
>
> Before 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), softirqs
> were never deferred, so they always got to run when exiting an irq.
>
> So previously with a user space process using all the CPU, like:
> chrt -r 99 sh -c "while :; do :; done"
> the softdog would still fire.
>
> If we ask the system to run something all the time,
> and the system does that, I don't think we can blame the system.
> It is however important that the watchdog can still detect and
> fire when this happens. Other drivers, not so much.
>
> I guess another solution would be to modify the if-statements in
> kernel/softirq.c to sometimes do the softirq directly, even if ksoftirqd
> is in state TASK_RUNNING, if we also meet some other condition.
> However, do we want to add that extra complexity?
No, in that respect using hrtimers is ok. I was more concerned about
other users of timers. But, as you say, I guess that is considered ok.
Guenter
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
2017-02-19 16:46 ` Guenter Roeck
@ 2017-02-20 15:35 ` Guenter Roeck
2017-02-27 4:04 ` Guenter Roeck
2 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-20 15:35 UTC (permalink / raw)
To: Niklas Cassel, wim, edumazet, peterz
Cc: linux-watchdog, linux-kernel, niklass
On 02/17/2017 10:25 AM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
>
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> ---
> drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/hrtimer.h>
> #include <linux/init.h>
> -#include <linux/jiffies.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/moduleparam.h>
> #include <linux/reboot.h>
> -#include <linux/timer.h>
> #include <linux/types.h>
> #include <linux/watchdog.h>
>
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
> MODULE_PARM_DESC(soft_panic,
> "Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
> {
> module_put(THIS_MODULE);
> if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
> emergency_restart();
> pr_crit("Reboot didn't ?????\n");
> }
> -}
>
> -static struct timer_list softdog_ticktock =
> - TIMER_INITIALIZER(softdog_fire, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static struct watchdog_device softdog_dev;
>
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
> {
> watchdog_notify_pretimeout(&softdog_dev);
> -}
>
> -static struct timer_list softdog_preticktock =
> - TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static int softdog_ping(struct watchdog_device *w)
> {
> - if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> + if (!hrtimer_active(&softdog_ticktock))
> __module_get(THIS_MODULE);
> + hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> + HRTIMER_MODE_REL);
>
> if (w->pretimeout)
> - mod_timer(&softdog_preticktock, jiffies +
> - (w->timeout - w->pretimeout) * HZ);
> + hrtimer_start(&softdog_preticktock,
> + ktime_set(w->timeout - w->pretimeout, 0),
> + HRTIMER_MODE_REL);
> else
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
>
> static int softdog_stop(struct watchdog_device *w)
> {
> - if (del_timer(&softdog_ticktock))
> + if (hrtimer_cancel(&softdog_ticktock))
> module_put(THIS_MODULE);
>
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
> watchdog_set_nowayout(&softdog_dev, nowayout);
> watchdog_stop_on_reboot(&softdog_dev);
>
> + hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_ticktock.function = softdog_fire;
> + hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_preticktock.function = softdog_pretimeout;
> +
> ret = watchdog_register_device(&softdog_dev);
> if (ret)
> return ret;
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-17 18:25 [PATCH] watchdog: softdog: fire watchdog even if softirqs do not get to run Niklas Cassel
2017-02-19 16:46 ` Guenter Roeck
2017-02-20 15:35 ` Guenter Roeck
@ 2017-02-27 4:04 ` Guenter Roeck
2017-02-27 12:58 ` Niklas Cassel
2 siblings, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2017-02-27 4:04 UTC (permalink / raw)
To: Niklas Cassel
Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel, niklass
On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>
> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
> pending softirqs are no longer always handled immediately, instead,
> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
> the handling of the softirqs are deferred, and are instead supposed
> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>
> If a user space process with a real-time policy starts to misbehave
> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
> what will happen is that all softirqs will get deferred, while ksoftirqd,
> which is supposed to handle the deferred softirqs, will never get to run.
>
> To make sure that the watchdog is able to fire even when we do not get
> to run softirqs, replace the timers with hrtimers.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Niklas,
Please rebase onto current mainline, test, and resubmit.
Thanks,
Guenter
> ---
> drivers/watchdog/softdog.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
> index c7bdc986dca1..0f67cd068465 100644
> --- a/drivers/watchdog/softdog.c
> +++ b/drivers/watchdog/softdog.c
> @@ -21,13 +21,12 @@
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> +#include <linux/hrtimer.h>
> #include <linux/init.h>
> -#include <linux/jiffies.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/moduleparam.h>
> #include <linux/reboot.h>
> -#include <linux/timer.h>
> #include <linux/types.h>
> #include <linux/watchdog.h>
>
> @@ -54,7 +53,10 @@ module_param(soft_panic, int, 0);
> MODULE_PARM_DESC(soft_panic,
> "Softdog action, set to 1 to panic, 0 to reboot (default=0)");
>
> -static void softdog_fire(unsigned long data)
> +static struct hrtimer softdog_ticktock;
> +static struct hrtimer softdog_preticktock;
> +
> +static enum hrtimer_restart softdog_fire(struct hrtimer *timer)
> {
> module_put(THIS_MODULE);
> if (soft_noboot) {
> @@ -67,41 +69,42 @@ static void softdog_fire(unsigned long data)
> emergency_restart();
> pr_crit("Reboot didn't ?????\n");
> }
> -}
>
> -static struct timer_list softdog_ticktock =
> - TIMER_INITIALIZER(softdog_fire, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static struct watchdog_device softdog_dev;
>
> -static void softdog_pretimeout(unsigned long data)
> +static enum hrtimer_restart softdog_pretimeout(struct hrtimer *timer)
> {
> watchdog_notify_pretimeout(&softdog_dev);
> -}
>
> -static struct timer_list softdog_preticktock =
> - TIMER_INITIALIZER(softdog_pretimeout, 0, 0);
> + return HRTIMER_NORESTART;
> +}
>
> static int softdog_ping(struct watchdog_device *w)
> {
> - if (!mod_timer(&softdog_ticktock, jiffies + (w->timeout * HZ)))
> + if (!hrtimer_active(&softdog_ticktock))
> __module_get(THIS_MODULE);
> + hrtimer_start(&softdog_ticktock, ktime_set(w->timeout, 0),
> + HRTIMER_MODE_REL);
>
> if (w->pretimeout)
> - mod_timer(&softdog_preticktock, jiffies +
> - (w->timeout - w->pretimeout) * HZ);
> + hrtimer_start(&softdog_preticktock,
> + ktime_set(w->timeout - w->pretimeout, 0),
> + HRTIMER_MODE_REL);
> else
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
>
> static int softdog_stop(struct watchdog_device *w)
> {
> - if (del_timer(&softdog_ticktock))
> + if (hrtimer_cancel(&softdog_ticktock))
> module_put(THIS_MODULE);
>
> - del_timer(&softdog_preticktock);
> + hrtimer_cancel(&softdog_preticktock);
>
> return 0;
> }
> @@ -134,6 +137,11 @@ static int __init softdog_init(void)
> watchdog_set_nowayout(&softdog_dev, nowayout);
> watchdog_stop_on_reboot(&softdog_dev);
>
> + hrtimer_init(&softdog_ticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_ticktock.function = softdog_fire;
> + hrtimer_init(&softdog_preticktock, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + softdog_preticktock.function = softdog_pretimeout;
> +
> ret = watchdog_register_device(&softdog_dev);
> if (ret)
> return ret;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-27 4:04 ` Guenter Roeck
@ 2017-02-27 12:58 ` Niklas Cassel
2017-02-27 14:31 ` Guenter Roeck
0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2017-02-27 12:58 UTC (permalink / raw)
To: Guenter Roeck; +Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel
On 02/27/2017 05:04 AM, Guenter Roeck wrote:
> On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@axis.com>
>>
>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>
>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>> pending softirqs are no longer always handled immediately, instead,
>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>> the handling of the softirqs are deferred, and are instead supposed
>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>
>> If a user space process with a real-time policy starts to misbehave
>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>> which is supposed to handle the deferred softirqs, will never get to run.
>>
>> To make sure that the watchdog is able to fire even when we do not get
>> to run softirqs, replace the timers with hrtimers.
>>
>> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> Niklas,
>
> Please rebase onto current mainline, test, and resubmit.
I've sent out a v2 now :)
Although, I thought that it was a bit weird that the conflicting patch
was in Linus tree but is not in next-20170227.
I'm not sure if you are the web master for www.linux-watchdog.org,
but their cgit seems to be broken:
http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog.git;a=summary
http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=summary
>
> Thanks,
> Guenter
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: watchdog: softdog: fire watchdog even if softirqs do not get to run
2017-02-27 12:58 ` Niklas Cassel
@ 2017-02-27 14:31 ` Guenter Roeck
0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2017-02-27 14:31 UTC (permalink / raw)
To: Niklas Cassel; +Cc: wim, edumazet, peterz, linux-watchdog, linux-kernel
On 02/27/2017 04:58 AM, Niklas Cassel wrote:
> On 02/27/2017 05:04 AM, Guenter Roeck wrote:
>> On Fri, Feb 17, 2017 at 07:25:02PM +0100, Niklas Cassel wrote:
>>> From: Niklas Cassel <niklas.cassel@axis.com>
>>>
>>> Checking for timer expiration is done from the softirq TIMER_SOFTIRQ.
>>>
>>> Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
>>> pending softirqs are no longer always handled immediately, instead,
>>> if there are pending softirqs, and ksoftirqd is in state TASK_RUNNING,
>>> the handling of the softirqs are deferred, and are instead supposed
>>> to be handled by ksoftirqd, when ksoftirqd gets scheduled.
>>>
>>> If a user space process with a real-time policy starts to misbehave
>>> by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING,
>>> what will happen is that all softirqs will get deferred, while ksoftirqd,
>>> which is supposed to handle the deferred softirqs, will never get to run.
>>>
>>> To make sure that the watchdog is able to fire even when we do not get
>>> to run softirqs, replace the timers with hrtimers.
>>>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
>>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
>> Niklas,
>>
>> Please rebase onto current mainline, test, and resubmit.
>
> I've sent out a v2 now :)
>
> Although, I thought that it was a bit weird that the conflicting patch
> was in Linus tree but is not in next-20170227.
>
It was in the watchdog-next branch of my repository at kernel.org,
which is not in -next. I'll have to sort that out with Wim at some point.
> I'm not sure if you are the web master for www.linux-watchdog.org,
> but their cgit seems to be broken:
> http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog.git;a=summary
> http://www.linux-watchdog.org/cgi-bin/gitweb.cgi?p=linux-watchdog-next.git;a=summary
>
Wim knows about it, and plans to replace the server.
Guenter
^ permalink raw reply [flat|nested] 11+ messages in thread