[PATCH v7 0/2] watchdog: allow setting deadline for opening /dev/watchdogN

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v7 0/2] watchdog: allow setting deadline for opening /dev/watchdogN
@ 2017-11-28 10:35 Rasmus Villemoes
  2017-11-28 10:35 ` [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
  2017-11-28 10:35 ` [PATCH 2/2] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
  0 siblings, 2 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2017-11-28 10:35 UTC (permalink / raw)
  To: linux-watchdog, linux-doc, linux-kernel
  Cc: Wim Van Sebroeck, Esben Haabendal, mnhu, Rasmus Villemoes, Guenter Roeck

If a watchdog driver tells the framework that the device is running,
the framework takes care of feeding the watchdog until userspace opens
the device. If the userspace application which is supposed to do that
never comes up properly, the watchdog is fed indefinitely by the
kernel. This can be especially problematic for embedded devices.

The existing handle_boot_enabled cmdline parameter/config option
partially solves that, but that is only usable for the subset of
hardware watchdogs that have (or can be configured by the bootloader
to have) a timeout that is sufficient to make it realistic for
userspace to come up. Many devices have timeouts of a second, or even
less, making handle_boot_enabled insufficient.

These patches allow one to set a maximum time for which the kernel
will feed the watchdog, thus ensuring that either userspace has come
up, or the board gets reset. This allows fallback logic in the
bootloader to attempt some recovery (for example, if an automatic
update is in progress, it could roll back to the previous version).

The patches have been tested on a Raspberry Pi 2 and a Wandboard.

A preparatory patch of this series has already been merged
(c013b65ad8a1e "watchdog: introduce watchdog_worker_should_ping
helper"). On 2017-07-08, Guenter wrote [1]

  It is sufficiently different to handle_boot_enabled to keep it
  separate. I am mostly ok with the patch.  One comment below.

That one comment (regarding the placement of the module_param) has
been addressed in this version.

There has been some opposition to making the default value of
watchdog.open_timeout configurable in Kconfig, but in [2] Guenter said

  I used to be opposed to it, but it does seem to make some sense to
  me now after thinking about it.

I do hope that these patches can now find their way into the kernel,
but if 2/2 is somehow still controversial, please consider just taking
1/2. (I can't help but noting that handle_boot_enabled does get its
default value from Kconfig, and nobody complained about that when that
option was added).

[1] https://patchwork.kernel.org/patch/9754095/
[2] https://patchwork.kernel.org/patch/9754093/

Rasmus Villemoes (2):
  watchdog: introduce watchdog.open_timeout commandline parameter
  watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT

 Documentation/watchdog/watchdog-parameters.txt |  9 +++++++++
 drivers/watchdog/Kconfig                       |  9 +++++++++
 drivers/watchdog/watchdog_dev.c                | 27 +++++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter
  2017-11-28 10:35 [PATCH v7 0/2] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
@ 2017-11-28 10:35 ` Rasmus Villemoes
  2017-11-28 22:14   ` Guenter Roeck
  2017-11-28 10:35 ` [PATCH 2/2] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
  1 sibling, 1 reply; 6+ messages in thread
From: Rasmus Villemoes @ 2017-11-28 10:35 UTC (permalink / raw)
  To: Wim Van Sebroeck, Guenter Roeck, Jonathan Corbet
  Cc: Esben Haabendal, mnhu, Rasmus Villemoes, linux-watchdog,
	linux-doc, linux-kernel

The watchdog framework takes care of feeding a hardware watchdog until
userspace opens /dev/watchdogN. If that never happens for some reason
(buggy init script, corrupt root filesystem or whatnot) but the kernel
itself is fine, the machine stays up indefinitely. This patch allows
setting an upper limit for how long the kernel will take care of the
watchdog, thus ensuring that the watchdog will eventually reset the
machine.

This is particularly useful for embedded devices where some fallback
logic is implemented in the bootloader (e.g., use a different root
partition, boot from network, ...).

The existing handle_boot_enabled parameter has the same purpose, but
that is only usable when the hardware watchdog has a sufficiently long
timeout (possibly configured by the bootloader). Many hardware watchdogs
cannot be configured, or can only be configured up to a certain value,
making the timeout short enough that it is completely impossible to have
userspace ready soon enough. Hence it is necessary for the kernel to
handle those watchdogs for a while.

The open timeout is also used as a maximum time for an application to
re-open /dev/watchdogN after closing it.

A value of 0 (the default) means infinite timeout, preserving the
current behaviour.

The unit is milliseconds rather than seconds because that covers more
use cases. For example, one can effectively disable the kernel handling
by setting the open_timeout to 1 ms. There are also customers with very
strict requirements that may want to set the open_timeout to something
like 4500 ms, which combined with a hardware watchdog that must be
pinged every 250 ms ensures userspace is up no more than 5 seconds after
the bootloader hands control to the kernel (250 ms until the driver gets
registered and kernel handling starts, 4500 ms of kernel handling, and
then up to 250 ms from the last ping until userspace takes over).

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Esben Haabendal <esben@haabendal.dk>
---
 Documentation/watchdog/watchdog-parameters.txt |  8 ++++++++
 drivers/watchdog/watchdog_dev.c                | 27 +++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 6f9d7b4..5363bf3 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
 providing kernel parameters for builtin drivers versus loadable
 modules.
 
+The watchdog core parameter watchdog.open_timeout is the maximum time,
+in milliseconds, for which the watchdog framework will take care of
+pinging a hardware watchdog until userspace opens the corresponding
+/dev/watchdogN device. A value of 0 (the default) means an infinite
+timeout. Setting this to a non-zero value can be useful to ensure that
+either userspace comes up properly, or the board gets reset and allows
+fallback logic in the bootloader to try something else.
+
 
 -------------------------------------------------
 acquirewdt:
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 1e971a5..b4985db 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -67,6 +67,7 @@ struct watchdog_core_data {
 	struct mutex lock;
 	unsigned long last_keepalive;
 	unsigned long last_hw_keepalive;
+	unsigned long open_deadline;
 	struct delayed_work work;
 	unsigned long status;		/* Internal status bits */
 #define _WDOG_DEV_OPEN		0	/* Opened ? */
@@ -83,6 +84,19 @@ static struct workqueue_struct *watchdog_wq;
 
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
+static unsigned open_timeout;
+
+static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
+{
+	if (!open_timeout)
+		return false;
+	return time_is_before_jiffies(data->open_deadline);
+}
+
+static void watchdog_set_open_deadline(struct watchdog_core_data *data)
+{
+	data->open_deadline = jiffies + msecs_to_jiffies(open_timeout);
+}
 
 static inline bool watchdog_need_worker(struct watchdog_device *wdd)
 {
@@ -200,7 +214,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
 {
 	struct watchdog_device *wdd = wd_data->wdd;
 
-	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
+	if (!wdd)
+		return false;
+
+	if (watchdog_active(wdd))
+		return true;
+
+	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
 }
 
 static void watchdog_ping_work(struct work_struct *work)
@@ -861,6 +881,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
 		watchdog_ping(wdd);
 	}
 
+	watchdog_set_open_deadline(wd_data);
 	watchdog_update_worker(wdd);
 
 	/* make sure that /dev/watchdog can be re-opened */
@@ -959,6 +980,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
 
 	/* Record time of most recent heartbeat as 'just before now'. */
 	wd_data->last_hw_keepalive = jiffies - 1;
+	watchdog_set_open_deadline(wd_data);
 
 	/*
 	 * If the watchdog is running, prevent its driver from being unloaded,
@@ -1156,3 +1178,6 @@ module_param(handle_boot_enabled, bool, 0444);
 MODULE_PARM_DESC(handle_boot_enabled,
 	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
 	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
+module_param(open_timeout, uint, 0644);
+MODULE_PARM_DESC(open_timeout,
+		 "Maximum time in milliseconds for userspace to take over handling enabled watchdogs (0 = infinite)");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  2017-11-28 10:35 [PATCH v7 0/2] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
  2017-11-28 10:35 ` [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2017-11-28 10:35 ` Rasmus Villemoes
  1 sibling, 0 replies; 6+ messages in thread
From: Rasmus Villemoes @ 2017-11-28 10:35 UTC (permalink / raw)
  To: Wim Van Sebroeck, Guenter Roeck, Jonathan Corbet
  Cc: Esben Haabendal, mnhu, Rasmus Villemoes, linux-watchdog,
	linux-doc, linux-kernel

This allows setting a default value for the watchdog.open_timeout
commandline parameter via Kconfig.

Some BSPs allow remote updating of the kernel image and root file
system, but updating the bootloader requires physical access. Hence, if
one has a firmware update that requires relaxing the
watchdog.open_timeout a little, the value used must be baked into the
kernel image itself and cannot come from the u-boot environment via the
kernel command line.

Being able to set the initial value in .config doesn't change the fact
that the value on the command line, if present, takes precedence, and is
of course immensely useful for development purposes while one has
console acccess, as well as usable in the cases where one can make a
permanent update of the kernel command line.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Esben Haabendal <esben@haabendal.dk>
---
 Documentation/watchdog/watchdog-parameters.txt | 3 ++-
 drivers/watchdog/Kconfig                       | 9 +++++++++
 drivers/watchdog/watchdog_dev.c                | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 5363bf3..60dd2be 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -11,7 +11,8 @@ modules.
 The watchdog core parameter watchdog.open_timeout is the maximum time,
 in milliseconds, for which the watchdog framework will take care of
 pinging a hardware watchdog until userspace opens the corresponding
-/dev/watchdogN device. A value of 0 (the default) means an infinite
+/dev/watchdogN device. The defalt value is
+CONFIG_WATCHDOG_OPEN_TIMEOUT. A value of 0 means an infinite
 timeout. Setting this to a non-zero value can be useful to ensure that
 either userspace comes up properly, or the board gets reset and allows
 fallback logic in the bootloader to try something else.
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index ca200d1..a142e1e 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -63,6 +63,15 @@ config WATCHDOG_SYSFS
 	  Say Y here if you want to enable watchdog device status read through
 	  sysfs attributes.
 
+config WATCHDOG_OPEN_TIMEOUT
+	int "Timeout value for opening watchdog device"
+	default 0
+	help
+	  The maximum time, in milliseconds, for which the watchdog
+	  framework takes care of pinging a hardware watchdog. A value
+	  of 0 means infinite. The value set here can be overridden by
+	  the commandline parameter "watchdog.open_timeout".
+
 #
 # General Watchdog drivers
 #
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index b4985db..9f18952 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -84,7 +84,7 @@ static struct workqueue_struct *watchdog_wq;
 
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
-static unsigned open_timeout;
+static unsigned open_timeout = CONFIG_WATCHDOG_OPEN_TIMEOUT;
 
 static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter
  2017-11-28 10:35 ` [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2017-11-28 22:14   ` Guenter Roeck
  2017-11-29 10:56     ` Rasmus Villemoes
  0 siblings, 1 reply; 6+ messages in thread
From: Guenter Roeck @ 2017-11-28 22:14 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Wim Van Sebroeck, Jonathan Corbet, Esben Haabendal, mnhu,
	linux-watchdog, linux-doc, linux-kernel

On Tue, Nov 28, 2017 at 11:35:49AM +0100, Rasmus Villemoes wrote:
> The watchdog framework takes care of feeding a hardware watchdog until
> userspace opens /dev/watchdogN. If that never happens for some reason
> (buggy init script, corrupt root filesystem or whatnot) but the kernel
> itself is fine, the machine stays up indefinitely. This patch allows
> setting an upper limit for how long the kernel will take care of the
> watchdog, thus ensuring that the watchdog will eventually reset the
> machine.
> 
> This is particularly useful for embedded devices where some fallback
> logic is implemented in the bootloader (e.g., use a different root
> partition, boot from network, ...).
> 
> The existing handle_boot_enabled parameter has the same purpose, but
> that is only usable when the hardware watchdog has a sufficiently long
> timeout (possibly configured by the bootloader). Many hardware watchdogs
> cannot be configured, or can only be configured up to a certain value,
> making the timeout short enough that it is completely impossible to have
> userspace ready soon enough. Hence it is necessary for the kernel to
> handle those watchdogs for a while.
> 
> The open timeout is also used as a maximum time for an application to
> re-open /dev/watchdogN after closing it.
> 
> A value of 0 (the default) means infinite timeout, preserving the
> current behaviour.
> 
> The unit is milliseconds rather than seconds because that covers more
> use cases. For example, one can effectively disable the kernel handling
> by setting the open_timeout to 1 ms. There are also customers with very
> strict requirements that may want to set the open_timeout to something
> like 4500 ms, which combined with a hardware watchdog that must be
> pinged every 250 ms ensures userspace is up no more than 5 seconds after
> the bootloader hands control to the kernel (250 ms until the driver gets
> registered and kernel handling starts, 4500 ms of kernel handling, and
> then up to 250 ms from the last ping until userspace takes over).

This is quite vague, especially since it doesn't count the time from
boot to starting the watchdog driver, which can vary even across boots.
Why not make it specific, for example by adjusting the open timeout with
ktime_get_boot_ns() ?

I would actually make it even more specific and calculate the open
timeout such that the system would reboot after open_timeout, not
after <open_timeout + hardware_timeout>. Any reason for not doing that ?
The upside would be more accuracy, and I don't really see a downside.

Thanks,
Guenter

> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> Reviewed-by: Esben Haabendal <esben@haabendal.dk>
> ---
>  Documentation/watchdog/watchdog-parameters.txt |  8 ++++++++
>  drivers/watchdog/watchdog_dev.c                | 27 +++++++++++++++++++++++++-
>  2 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
> index 6f9d7b4..5363bf3 100644
> --- a/Documentation/watchdog/watchdog-parameters.txt
> +++ b/Documentation/watchdog/watchdog-parameters.txt
> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>  providing kernel parameters for builtin drivers versus loadable
>  modules.
>  
> +The watchdog core parameter watchdog.open_timeout is the maximum time,
> +in milliseconds, for which the watchdog framework will take care of
> +pinging a hardware watchdog until userspace opens the corresponding
> +/dev/watchdogN device. A value of 0 (the default) means an infinite
> +timeout. Setting this to a non-zero value can be useful to ensure that
> +either userspace comes up properly, or the board gets reset and allows
> +fallback logic in the bootloader to try something else.
> +
>  
>  -------------------------------------------------
>  acquirewdt:
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 1e971a5..b4985db 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -67,6 +67,7 @@ struct watchdog_core_data {
>  	struct mutex lock;
>  	unsigned long last_keepalive;
>  	unsigned long last_hw_keepalive;
> +	unsigned long open_deadline;
>  	struct delayed_work work;
>  	unsigned long status;		/* Internal status bits */
>  #define _WDOG_DEV_OPEN		0	/* Opened ? */
> @@ -83,6 +84,19 @@ static struct workqueue_struct *watchdog_wq;
>  
>  static bool handle_boot_enabled =
>  	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
> +static unsigned open_timeout;
> +
> +static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
> +{
> +	if (!open_timeout)
> +		return false;
> +	return time_is_before_jiffies(data->open_deadline);
> +}
> +
> +static void watchdog_set_open_deadline(struct watchdog_core_data *data)
> +{
> +	data->open_deadline = jiffies + msecs_to_jiffies(open_timeout);


> +}
>  
>  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
>  {
> @@ -200,7 +214,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
>  {
>  	struct watchdog_device *wdd = wd_data->wdd;
>  
> -	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
> +	if (!wdd)
> +		return false;
> +
> +	if (watchdog_active(wdd))
> +		return true;
> +
> +	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
>  }
>  
>  static void watchdog_ping_work(struct work_struct *work)
> @@ -861,6 +881,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  		watchdog_ping(wdd);
>  	}
>  
> +	watchdog_set_open_deadline(wd_data);
>  	watchdog_update_worker(wdd);
>  
>  	/* make sure that /dev/watchdog can be re-opened */
> @@ -959,6 +980,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
>  
>  	/* Record time of most recent heartbeat as 'just before now'. */
>  	wd_data->last_hw_keepalive = jiffies - 1;
> +	watchdog_set_open_deadline(wd_data);
>  
>  	/*
>  	 * If the watchdog is running, prevent its driver from being unloaded,
> @@ -1156,3 +1178,6 @@ module_param(handle_boot_enabled, bool, 0444);
>  MODULE_PARM_DESC(handle_boot_enabled,
>  	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
>  	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
> +module_param(open_timeout, uint, 0644);
> +MODULE_PARM_DESC(open_timeout,
> +		 "Maximum time in milliseconds for userspace to take over handling enabled watchdogs (0 = infinite)");
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter
  2017-11-28 22:14   ` Guenter Roeck
@ 2017-11-29 10:56     ` Rasmus Villemoes
  2017-11-29 15:48       ` Guenter Roeck
  0 siblings, 1 reply; 6+ messages in thread
From: Rasmus Villemoes @ 2017-11-29 10:56 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Wim Van Sebroeck, Jonathan Corbet, Esben Haabendal, mnhu,
	linux-watchdog, linux-doc, linux-kernel

On 2017-11-28 23:14, Guenter Roeck wrote:
> On Tue, Nov 28, 2017 at 11:35:49AM +0100, Rasmus Villemoes wrote:
>>
>> The unit is milliseconds rather than seconds because that covers more
>> use cases. For example, one can effectively disable the kernel handling
>> by setting the open_timeout to 1 ms. There are also customers with very
>> strict requirements that may want to set the open_timeout to something
>> like 4500 ms, which combined with a hardware watchdog that must be
>> pinged every 250 ms ensures userspace is up no more than 5 seconds after
>> the bootloader hands control to the kernel (250 ms until the driver gets
>> registered and kernel handling starts, 4500 ms of kernel handling, and
>> then up to 250 ms from the last ping until userspace takes over).
> 
> This is quite vague, especially since it doesn't count the time from
> boot to starting the watchdog driver,

My example is bad, and I now realize one cannot really get such precise
guarantees. But the example _did_ actually account for the time from
boot to device registration - it allowed 250 ms for the kernel to get
that far.

which can vary even across boots.
> Why not make it specific, for example by adjusting the open timeout with
> ktime_get_boot_ns() ?

If by "boot" we mean the moment the bootloader hands control to the
kernel, ktime_get_boot_ns() doesn't give that either - at best, it gives
an approximation of the time since timekeeping_init(), but it's not very
accurate that early (I simply injected printks of ktime_get_boot_ns at
various places in init/main.c and timestamped the output lines). If it
overshoots, we'd be subtracting more of the allowance than we should,
and I don't think we have any way of knowing when that happens or to
correct for it. So I'd rather keep the code simple and let it count from
the time the watchdog framework knows about the device, which is also
around the time when the kernel's timekeeping is reasonably accurate.

> I would actually make it even more specific and calculate the open
> timeout such that the system would reboot after open_timeout, not
> after <open_timeout + hardware_timeout>. Any reason for not doing that ?
> The upside would be more accuracy, and I don't really see a downside.

I don't think it would be that much more accurate - we schedule the
pings at a frequency of half the max_hw_heartbeat_ms==$x, with the
current code we'd get rebooted somewhere between [open_deadline + $x/2,
open_deadline + $x], and subtracting $x from the open_timeout that would
become [open_deadline - $x/2, open_deadline]. I'd rather not have the
reboot happen before the open_deadline. Sure, we could subtract $x/2
instead. Then there's the case where ->max_hw_heartbeat_ms is not set,
so we have to use ->timeout for $x, and then there's the case of $x (or
$x/2) being greater than $open_timeout. I'd really like to keep the code
simple. If it helps, I'd be happy to document the exact semantics of the
open_timeout with a nice ascii art timeline.

Rasmus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter
  2017-11-29 10:56     ` Rasmus Villemoes
@ 2017-11-29 15:48       ` Guenter Roeck
  0 siblings, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2017-11-29 15:48 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Wim Van Sebroeck, Jonathan Corbet, Esben Haabendal, mnhu,
	linux-watchdog, linux-doc, linux-kernel

On Wed, Nov 29, 2017 at 11:56:57AM +0100, Rasmus Villemoes wrote:
> On 2017-11-28 23:14, Guenter Roeck wrote:
> > On Tue, Nov 28, 2017 at 11:35:49AM +0100, Rasmus Villemoes wrote:
> >>
> >> The unit is milliseconds rather than seconds because that covers more
> >> use cases. For example, one can effectively disable the kernel handling
> >> by setting the open_timeout to 1 ms. There are also customers with very
> >> strict requirements that may want to set the open_timeout to something
> >> like 4500 ms, which combined with a hardware watchdog that must be
> >> pinged every 250 ms ensures userspace is up no more than 5 seconds after
> >> the bootloader hands control to the kernel (250 ms until the driver gets
> >> registered and kernel handling starts, 4500 ms of kernel handling, and
> >> then up to 250 ms from the last ping until userspace takes over).
> > 
> > This is quite vague, especially since it doesn't count the time from
> > boot to starting the watchdog driver,
> 
> My example is bad, and I now realize one cannot really get such precise
> guarantees. But the example _did_ actually account for the time from
> boot to device registration - it allowed 250 ms for the kernel to get
> that far.
> 
> which can vary even across boots.
> > Why not make it specific, for example by adjusting the open timeout with
> > ktime_get_boot_ns() ?
> 
> If by "boot" we mean the moment the bootloader hands control to the
> kernel, ktime_get_boot_ns() doesn't give that either - at best, it gives
> an approximation of the time since timekeeping_init(), but it's not very
> accurate that early (I simply injected printks of ktime_get_boot_ns at
> various places in init/main.c and timestamped the output lines). If it
> overshoots, we'd be subtracting more of the allowance than we should,
> and I don't think we have any way of knowing when that happens or to
> correct for it. So I'd rather keep the code simple and let it count from
> the time the watchdog framework knows about the device, which is also
> around the time when the kernel's timekeeping is reasonably accurate.
> 
We should try to make things as accurate as possible. "It won't be perfect"
is not an argument against that.

> > I would actually make it even more specific and calculate the open
> > timeout such that the system would reboot after open_timeout, not
> > after <open_timeout + hardware_timeout>. Any reason for not doing that ?
> > The upside would be more accuracy, and I don't really see a downside.
> 
> I don't think it would be that much more accurate - we schedule the
> pings at a frequency of half the max_hw_heartbeat_ms==$x, with the
> current code we'd get rebooted somewhere between [open_deadline + $x/2,
> open_deadline + $x], and subtracting $x from the open_timeout that would
> become [open_deadline - $x/2, open_deadline]. I'd rather not have the
> reboot happen before the open_deadline. Sure, we could subtract $x/2
> instead. Then there's the case where ->max_hw_heartbeat_ms is not set,
> so we have to use ->timeout for $x, and then there's the case of $x (or
> $x/2) being greater than $open_timeout. I'd really like to keep the code
> simple. If it helps, I'd be happy to document the exact semantics of the
> open_timeout with a nice ascii art timeline.
> 
It was not much of a problem to get that right after a watchdog was opened,
by timing pings accordingly such that the HW times out when it should.
It should not be that hard to get it working for the pre-open case as well.

Guenter

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-11-29 15:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-28 10:35 [PATCH v7 0/2] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
2017-11-28 10:35 ` [PATCH 1/2] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
2017-11-28 22:14   ` Guenter Roeck
2017-11-29 10:56     ` Rasmus Villemoes
2017-11-29 15:48       ` Guenter Roeck
2017-11-28 10:35 ` [PATCH 2/2] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.