linux-watchdog.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Default behavior of watchdog drivers
@ 2018-09-27 12:38 Jean Delvare
  2018-09-27 13:08 ` Guenter Roeck
  0 siblings, 1 reply; 7+ messages in thread
From: Jean Delvare @ 2018-09-27 12:38 UTC (permalink / raw)
  To: linux-watchdog, Martin Wilck, Wim Van Sebroeck, Guenter Roeck
  Cc: Giel van Schijndel, Steven J. Hill

Hello,

It seems that various watchdog drivers behave differently if the
watchdog timer is already enabled when the driver is loaded:

* iTCO_wdt will disable the timer. I think this is what most drivers
  do, but not all.
* w83627hf_wdt will let the timer run, unless option early_disable=1 is
  passed.

These are the 2 which bother me the most because they are among the
most popular watchdog drivers on x86 systems. Having a different
behavior depending on which driver is used is quite confusing.

Can we please settle on a default behavior (either all drivers reset
the timer a load time, or none do it) and have all watchdog drivers
stick to that?

If an option to get the opposite behavior is deemed useful, can we
settle on a standard name for it? Or even implement it at the
watchdog_core level, so that each driver doesn't need to implement it
separately?

While looking into this, I found a few other strange module parameters:

* f71808e_wdt has "start_withtimeout", which starts the timer even if
  nobody opens the watchdog device node. Giel, do we really need this?
* octeon-wdt has "disable", which completely disables the watchdog
  function. This "feature" was sneaked in via commit 381cec022e46
  ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
  cleanup-only patch, without any explanation nor even mention. I can't
  see how such an option can be useful. If you don't need the driver,
  just don't load it. Steven, can you explain?

Thanks,
-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
  2018-09-27 12:38 Default behavior of watchdog drivers Jean Delvare
@ 2018-09-27 13:08 ` Guenter Roeck
  2018-10-01 11:36   ` Giel van Schijndel
  0 siblings, 1 reply; 7+ messages in thread
From: Guenter Roeck @ 2018-09-27 13:08 UTC (permalink / raw)
  To: Jean Delvare, linux-watchdog, Martin Wilck, Wim Van Sebroeck
  Cc: Giel van Schijndel, Steven J. Hill

Hi Jean,

On 09/27/2018 05:38 AM, Jean Delvare wrote:
> Hello,
> 
> It seems that various watchdog drivers behave differently if the
> watchdog timer is already enabled when the driver is loaded:
> 
> * iTCO_wdt will disable the timer. I think this is what most drivers
>    do, but not all.
> * w83627hf_wdt will let the timer run, unless option early_disable=1 is
>    passed.
> 
> These are the 2 which bother me the most because they are among the
> most popular watchdog drivers on x86 systems. Having a different
> behavior depending on which driver is used is quite confusing.
> 
> Can we please settle on a default behavior (either all drivers reset
> the timer a load time, or none do it) and have all watchdog drivers
> stick to that?
> 

Since

ee142889e32f watchdog: Introduce WDOG_HW_RUNNING flag
664a39236e71 watchdog: Introduce hardware maximum heartbeat in watchdog core

the default behavior _should_ be that the timer is kept running but the
core is informed that it is running. Of course, that doesn't mean that
(legacy) drivers actually do that.

Please note that some watchdog drivers can not be stopped, so a common
mechanism to stop watchdogs during probe is technically impossible.

> If an option to get the opposite behavior is deemed useful, can we
> settle on a standard name for it? Or even implement it at the
> watchdog_core level, so that each driver doesn't need to implement it
> separately?
> 
> While looking into this, I found a few other strange module parameters:
> 
> * f71808e_wdt has "start_withtimeout", which starts the timer even if
>    nobody opens the watchdog device node. Giel, do we really need this?

We had requests for a common mechanism to do that, ie some kind of boot
timeout. Idea would be to reboot the system if the watchdog device has
not been opened after a set period of time. Maybe that is the idea here.

Guenter

> * octeon-wdt has "disable", which completely disables the watchdog
>    function. This "feature" was sneaked in via commit 381cec022e46
>    ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
>    cleanup-only patch, without any explanation nor even mention. I can't
>    see how such an option can be useful. If you don't need the driver,
>    just don't load it. Steven, can you explain?
> 
> Thanks,
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
  2018-09-27 13:08 ` Guenter Roeck
@ 2018-10-01 11:36   ` Giel van Schijndel
  0 siblings, 0 replies; 7+ messages in thread
From: Giel van Schijndel @ 2018-10-01 11:36 UTC (permalink / raw)
  To: Guenter Roeck, Jean Delvare, linux-watchdog, Martin Wilck,
	Wim Van Sebroeck
  Cc: Giel van Schijndel, Steven J. Hill

Hi Jean, Guenter,

On 27-09-18 15:08, Guenter Roeck wrote:
> Hi Jean,
>
> On 09/27/2018 05:38 AM, Jean Delvare wrote:
>> It seems that various watchdog drivers behave differently if the
>> watchdog timer is already enabled when the driver is loaded:
>>
>> ... snip ...
>>
>> Can we please settle on a default behavior (either all drivers reset
>> the timer a load time, or none do it) and have all watchdog drivers
>> stick to that?
>>
>
> ... snip ...
>
>> If an option to get the opposite behavior is deemed useful, can we
>> settle on a standard name for it? Or even implement it at the
>> watchdog_core level, so that each driver doesn't need to implement it
>> separately?
>>
>> While looking into this, I found a few other strange module parameters:
>>
>> * f71808e_wdt has "start_withtimeout", which starts the timer even if
>>    nobody opens the watchdog device node. Giel, do we really need this?
>
> We had requests for a common mechanism to do that, ie some kind of boot
> timeout. Idea would be to reboot the system if the watchdog device has
> not been opened after a set period of time. Maybe that is the idea here.

The purpose of this parameter is to reduce the time window that we need to
trust software to behave correctly. Withthis parameter we don't need to
trust user land at all in order to still be protected by the watchdog.

I'm not sure currently because I no longer work for the company that relied
on this. But, from memory, the init system that was being used at the time
was, occassionaly, prone to hang due to some services hanging at startup,
preventing watchdog startup. These were systems deployed at distant
locations
in the field with, unreliable, GPRS connections for remote maintenance. This
watchdog parameter completely eliminated the need for field trips for
anything
less than hardware failure in the time that I was there (about a year).

-- 
Met vriendelijke groet,
With kind regards,
Giel van Schijndel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
  2018-10-02 11:39   ` Wim Van Sebroeck
@ 2018-10-02 13:24     ` Guenter Roeck
  0 siblings, 0 replies; 7+ messages in thread
From: Guenter Roeck @ 2018-10-02 13:24 UTC (permalink / raw)
  To: Wim Van Sebroeck
  Cc: Jean Delvare, linux-watchdog, Martin Wilck, Giel van Schijndel,
	Steven J. Hill

Hi Wim,

On 10/02/2018 04:39 AM, Wim Van Sebroeck wrote:
> Hi Guenter,
> 
>> On 09/30/2018 06:09 AM, Wim Van Sebroeck wrote:
>>> hi Jean, Guenter,
>>>
>>>> Hi Jean,
>>>>
>>>> On 09/27/2018 05:38 AM, Jean Delvare wrote:
>>>>> Hello,
>>>>>
>>>>> It seems that various watchdog drivers behave differently if the
>>>>> watchdog timer is already enabled when the driver is loaded:
>>>>>
>>>>> * iTCO_wdt will disable the timer. I think this is what most drivers
>>>>>    do, but not all.
>>>>> * w83627hf_wdt will let the timer run, unless option early_disable=1 is
>>>>>    passed.
>>>>>
>>>>> These are the 2 which bother me the most because they are among the
>>>>> most popular watchdog drivers on x86 systems. Having a different
>>>>> behavior depending on which driver is used is quite confusing.
>>>>>
>>>>> Can we please settle on a default behavior (either all drivers reset
>>>>> the timer a load time, or none do it) and have all watchdog drivers
>>>>> stick to that?
>>>>>
>>>>
>>>> Since
>>>>
>>>> ee142889e32f watchdog: Introduce WDOG_HW_RUNNING flag
>>>> 664a39236e71 watchdog: Introduce hardware maximum heartbeat in watchdog core
>>>>
>>>> the default behavior _should_ be that the timer is kept running but the
>>>> core is informed that it is running. Of course, that doesn't mean that
>>>> (legacy) drivers actually do that.
>>>>
>>>> Please note that some watchdog drivers can not be stopped, so a common
>>>> mechanism to stop watchdogs during probe is technically impossible.
>>>
>>> Originally the default behaviour was to stop the watchdog if it was running and restart it when /dev/watchdog was opened.
>>> However some watchdog devices can't be stopped once running and the starting of it or not is done on "BIOS" level. So there we said that if the watchdog is allready running and it can't be stopped that the core should keep the watchdog running as long as the device is not "opened".
>>>
>>> So the correct behaviour is that we should not have the watchdog device active if the device is not being used, but when the device can't be stopped then we need to keep the watchdog running and do the keepalive ping ourselves. It makes no sense to start the watchdog and ping it while itis not in use and can be stopped.
>>>
>>> So looking at Jean's comment I think we need to review how w83627hf_wdt does the logic.
>>>
>>
>> I have seen requests that evrn watchdogs which can be stopped should be
>> kept running to ensure that the system is always protected. Given that,
>> I would hesitate to take that functionality away from the w83627hf_wdt
>> driver.
> 
> That's not according to the specs that stipulate that it is userspace that should control that.
> And if it's not default behaviour then you could fix it by a module_param that enables that if you set it to do that.
> 

It is difficult for userspace to control behavior that applies prior
to userspace running.

There have been some arguments back and forth discussing how and when
to enable the watchdog prior to userspace opening it (and what the initial
timeout should be). Maybe it is time to revisit that and find a consistent
solution.

Thanks,
Guenter

>> Guenter
>>
>>>>> If an option to get the opposite behavior is deemed useful, can we
>>>>> settle on a standard name for it? Or even implement it at the
>>>>> watchdog_core level, so that each driver doesn't need to implement it
>>>>> separately?
>>>>>
>>>>> While looking into this, I found a few other strange module parameters:
>>>>>
>>>>> * f71808e_wdt has "start_withtimeout", which starts the timer even if
>>>>>    nobody opens the watchdog device node. Giel, do we really need this?
>>>>
>>>> We had requests for a common mechanism to do that, ie some kind of boot
>>>> timeout. Idea would be to reboot the system if the watchdog device has
>>>> not been opened after a set period of time. Maybe that is the idea here.
>>>>
>>>> Guenter
>>>>
>>>>> * octeon-wdt has "disable", which completely disables the watchdog
>>>>>    function. This "feature" was sneaked in via commit 381cec022e46
>>>>>    ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
>>>>>    cleanup-only patch, without any explanation nor even mention. I can't
>>>>>    see how such an option can be useful. If you don't need the driver,
>>>>>    just don't load it. Steven, can you explain?
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>
>>> Kind regards,
>>> Wim.
>>>
>>>
>>
> 
> Kind regards,
> Wim.
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
  2018-09-30 13:58 ` Guenter Roeck
@ 2018-10-02 11:39   ` Wim Van Sebroeck
  2018-10-02 13:24     ` Guenter Roeck
  0 siblings, 1 reply; 7+ messages in thread
From: Wim Van Sebroeck @ 2018-10-02 11:39 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Wim Van Sebroeck, Jean Delvare, linux-watchdog, Martin Wilck,
	Giel van Schijndel, Steven J. Hill

Hi Guenter,

> On 09/30/2018 06:09 AM, Wim Van Sebroeck wrote:
> >hi Jean, Guenter,
> >
> >>Hi Jean,
> >>
> >>On 09/27/2018 05:38 AM, Jean Delvare wrote:
> >>>Hello,
> >>>
> >>>It seems that various watchdog drivers behave differently if the
> >>>watchdog timer is already enabled when the driver is loaded:
> >>>
> >>>* iTCO_wdt will disable the timer. I think this is what most drivers
> >>>   do, but not all.
> >>>* w83627hf_wdt will let the timer run, unless option early_disable=1 is
> >>>   passed.
> >>>
> >>>These are the 2 which bother me the most because they are among the
> >>>most popular watchdog drivers on x86 systems. Having a different
> >>>behavior depending on which driver is used is quite confusing.
> >>>
> >>>Can we please settle on a default behavior (either all drivers reset
> >>>the timer a load time, or none do it) and have all watchdog drivers
> >>>stick to that?
> >>>
> >>
> >>Since
> >>
> >>ee142889e32f watchdog: Introduce WDOG_HW_RUNNING flag
> >>664a39236e71 watchdog: Introduce hardware maximum heartbeat in watchdog core
> >>
> >>the default behavior _should_ be that the timer is kept running but the
> >>core is informed that it is running. Of course, that doesn't mean that
> >>(legacy) drivers actually do that.
> >>
> >>Please note that some watchdog drivers can not be stopped, so a common
> >>mechanism to stop watchdogs during probe is technically impossible.
> >
> >Originally the default behaviour was to stop the watchdog if it was running and restart it when /dev/watchdog was opened.
> >However some watchdog devices can't be stopped once running and the starting of it or not is done on "BIOS" level. So there we said that if the watchdog is allready running and it can't be stopped that the core should keep the watchdog running as long as the device is not "opened".
> >
> >So the correct behaviour is that we should not have the watchdog device active if the device is not being used, but when the device can't be stopped then we need to keep the watchdog running and do the keepalive ping ourselves. It makes no sense to start the watchdog and ping it while itis not in use and can be stopped.
> >
> >So looking at Jean's comment I think we need to review how w83627hf_wdt does the logic.
> >
> 
> I have seen requests that evrn watchdogs which can be stopped should be
> kept running to ensure that the system is always protected. Given that,
> I would hesitate to take that functionality away from the w83627hf_wdt
> driver.

That's not according to the specs that stipulate that it is userspace that should control that.
And if it's not default behaviour then you could fix it by a module_param that enables that if you set it to do that.

> Guenter
> 
> >>>If an option to get the opposite behavior is deemed useful, can we
> >>>settle on a standard name for it? Or even implement it at the
> >>>watchdog_core level, so that each driver doesn't need to implement it
> >>>separately?
> >>>
> >>>While looking into this, I found a few other strange module parameters:
> >>>
> >>>* f71808e_wdt has "start_withtimeout", which starts the timer even if
> >>>   nobody opens the watchdog device node. Giel, do we really need this?
> >>
> >>We had requests for a common mechanism to do that, ie some kind of boot
> >>timeout. Idea would be to reboot the system if the watchdog device has
> >>not been opened after a set period of time. Maybe that is the idea here.
> >>
> >>Guenter
> >>
> >>>* octeon-wdt has "disable", which completely disables the watchdog
> >>>   function. This "feature" was sneaked in via commit 381cec022e46
> >>>   ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
> >>>   cleanup-only patch, without any explanation nor even mention. I can't
> >>>   see how such an option can be useful. If you don't need the driver,
> >>>   just don't load it. Steven, can you explain?
> >>>
> >>>Thanks,
> >>>
> >>
> >
> >Kind regards,
> >Wim.
> >
> >
> 

Kind regards,
Wim.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
  2018-09-30 13:09 Wim Van Sebroeck
@ 2018-09-30 13:58 ` Guenter Roeck
  2018-10-02 11:39   ` Wim Van Sebroeck
  0 siblings, 1 reply; 7+ messages in thread
From: Guenter Roeck @ 2018-09-30 13:58 UTC (permalink / raw)
  To: Wim Van Sebroeck
  Cc: Jean Delvare, linux-watchdog, Martin Wilck, Giel van Schijndel,
	Steven J. Hill

On 09/30/2018 06:09 AM, Wim Van Sebroeck wrote:
> hi Jean, Guenter,
> 
>> Hi Jean,
>>
>> On 09/27/2018 05:38 AM, Jean Delvare wrote:
>>> Hello,
>>>
>>> It seems that various watchdog drivers behave differently if the
>>> watchdog timer is already enabled when the driver is loaded:
>>>
>>> * iTCO_wdt will disable the timer. I think this is what most drivers
>>>    do, but not all.
>>> * w83627hf_wdt will let the timer run, unless option early_disable=1 is
>>>    passed.
>>>
>>> These are the 2 which bother me the most because they are among the
>>> most popular watchdog drivers on x86 systems. Having a different
>>> behavior depending on which driver is used is quite confusing.
>>>
>>> Can we please settle on a default behavior (either all drivers reset
>>> the timer a load time, or none do it) and have all watchdog drivers
>>> stick to that?
>>>
>>
>> Since
>>
>> ee142889e32f watchdog: Introduce WDOG_HW_RUNNING flag
>> 664a39236e71 watchdog: Introduce hardware maximum heartbeat in watchdog core
>>
>> the default behavior _should_ be that the timer is kept running but the
>> core is informed that it is running. Of course, that doesn't mean that
>> (legacy) drivers actually do that.
>>
>> Please note that some watchdog drivers can not be stopped, so a common
>> mechanism to stop watchdogs during probe is technically impossible.
> 
> Originally the default behaviour was to stop the watchdog if it was running and restart it when /dev/watchdog was opened.
> However some watchdog devices can't be stopped once running and the starting of it or not is done on "BIOS" level. So there we said that if the watchdog is allready running and it can't be stopped that the core should keep the watchdog running as long as the device is not "opened".
> 
> So the correct behaviour is that we should not have the watchdog device active if the device is not being used, but when the device can't be stopped then we need to keep the watchdog running and do the keepalive ping ourselves. It makes no sense to start the watchdog and ping it while itis not in use and can be stopped.
> 
> So looking at Jean's comment I think we need to review how w83627hf_wdt does the logic.
> 

I have seen requests that evrn watchdogs which can be stopped should be
kept running to ensure that the system is always protected. Given that,
I would hesitate to take that functionality away from the w83627hf_wdt
driver.

Guenter

>>> If an option to get the opposite behavior is deemed useful, can we
>>> settle on a standard name for it? Or even implement it at the
>>> watchdog_core level, so that each driver doesn't need to implement it
>>> separately?
>>>
>>> While looking into this, I found a few other strange module parameters:
>>>
>>> * f71808e_wdt has "start_withtimeout", which starts the timer even if
>>>    nobody opens the watchdog device node. Giel, do we really need this?
>>
>> We had requests for a common mechanism to do that, ie some kind of boot
>> timeout. Idea would be to reboot the system if the watchdog device has
>> not been opened after a set period of time. Maybe that is the idea here.
>>
>> Guenter
>>
>>> * octeon-wdt has "disable", which completely disables the watchdog
>>>    function. This "feature" was sneaked in via commit 381cec022e46
>>>    ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
>>>    cleanup-only patch, without any explanation nor even mention. I can't
>>>    see how such an option can be useful. If you don't need the driver,
>>>    just don't load it. Steven, can you explain?
>>>
>>> Thanks,
>>>
>>
> 
> Kind regards,
> Wim.
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Default behavior of watchdog drivers
@ 2018-09-30 13:09 Wim Van Sebroeck
  2018-09-30 13:58 ` Guenter Roeck
  0 siblings, 1 reply; 7+ messages in thread
From: Wim Van Sebroeck @ 2018-09-30 13:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jean Delvare, linux-watchdog, Martin Wilck, Wim Van Sebroeck,
	Giel van Schijndel, Steven J. Hill

hi Jean, Guenter,

> Hi Jean,
> 
> On 09/27/2018 05:38 AM, Jean Delvare wrote:
> >Hello,
> >
> >It seems that various watchdog drivers behave differently if the
> >watchdog timer is already enabled when the driver is loaded:
> >
> >* iTCO_wdt will disable the timer. I think this is what most drivers
> >   do, but not all.
> >* w83627hf_wdt will let the timer run, unless option early_disable=1 is
> >   passed.
> >
> >These are the 2 which bother me the most because they are among the
> >most popular watchdog drivers on x86 systems. Having a different
> >behavior depending on which driver is used is quite confusing.
> >
> >Can we please settle on a default behavior (either all drivers reset
> >the timer a load time, or none do it) and have all watchdog drivers
> >stick to that?
> >
> 
> Since
> 
> ee142889e32f watchdog: Introduce WDOG_HW_RUNNING flag
> 664a39236e71 watchdog: Introduce hardware maximum heartbeat in watchdog core
> 
> the default behavior _should_ be that the timer is kept running but the
> core is informed that it is running. Of course, that doesn't mean that
> (legacy) drivers actually do that.
> 
> Please note that some watchdog drivers can not be stopped, so a common
> mechanism to stop watchdogs during probe is technically impossible.

Originally the default behaviour was to stop the watchdog if it was running and restart it when /dev/watchdog was opened.
However some watchdog devices can't be stopped once running and the starting of it or not is done on "BIOS" level. So there we said that if the watchdog is allready running and it can't be stopped that the core should keep the watchdog running as long as the device is not "opened".

So the correct behaviour is that we should not have the watchdog device active if the device is not being used, but when the device can't be stopped then we need to keep the watchdog running and do the keepalive ping ourselves. It makes no sense to start the watchdog and ping it while itis not in use and can be stopped.

So looking at Jean's comment I think we need to review how w83627hf_wdt does the logic.

> >If an option to get the opposite behavior is deemed useful, can we
> >settle on a standard name for it? Or even implement it at the
> >watchdog_core level, so that each driver doesn't need to implement it
> >separately?
> >
> >While looking into this, I found a few other strange module parameters:
> >
> >* f71808e_wdt has "start_withtimeout", which starts the timer even if
> >   nobody opens the watchdog device node. Giel, do we really need this?
> 
> We had requests for a common mechanism to do that, ie some kind of boot
> timeout. Idea would be to reboot the system if the watchdog device has
> not been opened after a set period of time. Maybe that is the idea here.
> 
> Guenter
> 
> >* octeon-wdt has "disable", which completely disables the watchdog
> >   function. This "feature" was sneaked in via commit 381cec022e46
> >   ("watchdog: octeon-wdt: File cleaning.") which was supposed to be a
> >   cleanup-only patch, without any explanation nor even mention. I can't
> >   see how such an option can be useful. If you don't need the driver,
> >   just don't load it. Steven, can you explain?
> >
> >Thanks,
> >
> 

Kind regards,
Wim.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-10-02 20:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-27 12:38 Default behavior of watchdog drivers Jean Delvare
2018-09-27 13:08 ` Guenter Roeck
2018-10-01 11:36   ` Giel van Schijndel
2018-09-30 13:09 Wim Van Sebroeck
2018-09-30 13:58 ` Guenter Roeck
2018-10-02 11:39   ` Wim Van Sebroeck
2018-10-02 13:24     ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).