[PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN

linux-watchdog.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN
@ 2019-01-16 12:14 Rasmus Villemoes
  2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-16 12:14 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

If a watchdog driver tells the framework that the device is running,
the framework takes care of feeding the watchdog until userspace opens
the device. If the userspace application which is supposed to do that
never comes up properly, the watchdog is fed indefinitely by the
kernel. This can be especially problematic for embedded devices.

The existing handle_boot_enabled cmdline parameter/config option
partially solves that, but that is only usable for the subset of
hardware watchdogs that have (or can be configured by the bootloader
to have) a timeout that is sufficient to make it realistic for
userspace to come up. Many devices have timeouts of only a few
seconds, or even less, making handle_boot_enabled insufficient.

These patches allow one to set a maximum time for which the kernel
will feed the watchdog, thus ensuring that either userspace has come
up, or the board gets reset. This allows fallback logic in the
bootloader to attempt some recovery (for example, if an automatic
update is in progress, it could roll back to the previous version).

The patches have been tested on a Raspberry Pi 2 and a Wandboard.

Changes in v8: Redo on top of 5.0-rc1 - in particular, adapt to the
jiffies->ktime_t conversion (1ff68820 "watchdog: core: make sure the
watchdog_worker is not deferred"). Add a patch to make the hardware
timeout at the deadline as requested by Guenther - which was actually
made very easy by the ktime_t conversion.

v7 submission at <https://lore.kernel.org/lkml/1511865350-20665-1-git-send-email-rasmus.villemoes@prevas.dk/>

Rasmus Villemoes (3):
  watchdog: introduce watchdog.open_timeout commandline parameter
  watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  watchdog: make the device time out at open_deadline when open_timeout
    is used

 .../watchdog/watchdog-parameters.txt          |  9 ++++
 drivers/watchdog/Kconfig                      |  9 ++++
 drivers/watchdog/watchdog_dev.c               | 42 +++++++++++++++----
 3 files changed, 53 insertions(+), 7 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
@ 2019-01-16 12:14 ` Rasmus Villemoes
  2019-01-17 21:24   ` Guenter Roeck
  2019-01-16 12:14 ` [PATCH v8 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-16 12:14 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

The watchdog framework takes care of feeding a hardware watchdog until
userspace opens /dev/watchdogN. If that never happens for some reason
(buggy init script, corrupt root filesystem or whatnot) but the kernel
itself is fine, the machine stays up indefinitely. This patch allows
setting an upper limit for how long the kernel will take care of the
watchdog, thus ensuring that the watchdog will eventually reset the
machine.

A value of 0 (the default) means infinite timeout, preserving the
current behaviour.

This is particularly useful for embedded devices where some fallback
logic is implemented in the bootloader (e.g., use a different root
partition, boot from network, ...).

There is already handle_boot_enabled serving a similar purpose. However,
such a binary choice is unsuitable if the hardware watchdog cannot be
programmed by the bootloader to provide a timeout long enough for
userspace to get up and running. Many of the embedded devices we see use
external (gpio-triggered) watchdogs with a fixed timeout of the order of
1-2 seconds.

The open timeout is also used as a maximum time for an application to
re-open /dev/watchdogN after closing it. Again, while the kernel already
has a nowayout mechanism, using that means userspace is at the mercy of
whatever timeout the hardware has.

Being a module parameter, one can revert to the ordinary behaviour of
having the kernel maintain the watchdog indefinitely by simply writing 0
to /sys/... after initially opening /dev/watchdog; conversely, one can
of course also have the current behaviour of allowing indefinite time
until the first open, and then set that module parameter.

The unit is milliseconds rather than seconds because that covers more
use cases. For example, userspace might need a long time to get in the
air initially, requiring a somewhat liberal open_timeout, but when (for
whatever reason) the application might then want to re-exec itself, it
can set a much smaller threshold.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 .../watchdog/watchdog-parameters.txt          |  8 +++++
 drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 0b88e333f9e1..5e4235989154 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
 providing kernel parameters for builtin drivers versus loadable
 modules.
 
+The watchdog core parameter watchdog.open_timeout is the maximum time,
+in milliseconds, for which the watchdog framework will take care of
+pinging a hardware watchdog until userspace opens the corresponding
+/dev/watchdogN device. A value of 0 (the default) means an infinite
+timeout. Setting this to a non-zero value can be useful to ensure that
+either userspace comes up properly, or the board gets reset and allows
+fallback logic in the bootloader to try something else.
+
 
 -------------------------------------------------
 acquirewdt:
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index f6c24b22b37c..a9585925458f 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -69,6 +69,7 @@ struct watchdog_core_data {
 	struct mutex lock;
 	ktime_t last_keepalive;
 	ktime_t last_hw_keepalive;
+	ktime_t open_deadline;
 	struct hrtimer timer;
 	struct kthread_work work;
 	unsigned long status;		/* Internal status bits */
@@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
 
+static unsigned open_timeout;
+
+static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
+{
+	return ktime_after(ktime_get(), data->open_deadline);
+}
+
+static void watchdog_set_open_deadline(struct watchdog_core_data *data)
+{
+	data->open_deadline = open_timeout ?
+		ktime_get() + ms_to_ktime(open_timeout) : KTIME_MAX;
+}
+
 static inline bool watchdog_need_worker(struct watchdog_device *wdd)
 {
 	/* All variables in milli-seconds */
@@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
 {
 	struct watchdog_device *wdd = wd_data->wdd;
 
-	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
+	if (!wdd)
+		return false;
+
+	if (watchdog_active(wdd))
+		return true;
+
+	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
 }
 
 static void watchdog_ping_work(struct kthread_work *work)
@@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
 		return -EBUSY;
 	}
 
-	if (wdd->ops->stop) {
+	if (wdd->ops->stop && !open_timeout) {
 		clear_bit(WDOG_HW_RUNNING, &wdd->status);
 		err = wdd->ops->stop(wdd);
 	} else {
@@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
 		watchdog_ping(wdd);
 	}
 
+	watchdog_set_open_deadline(wd_data);
 	watchdog_update_worker(wdd);
 
 	/* make sure that /dev/watchdog can be re-opened */
@@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
 
 	/* Record time of most recent heartbeat as 'just before now'. */
 	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
+	watchdog_set_open_deadline(wd_data);
 
 	/*
 	 * If the watchdog is running, prevent its driver from being unloaded,
@@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
 MODULE_PARM_DESC(handle_boot_enabled,
 	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
 	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
+
+module_param(open_timeout, uint, 0644);
+MODULE_PARM_DESC(open_timeout,
+	"Maximum time (in milliseconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
  2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2019-01-16 12:14 ` Rasmus Villemoes
  2019-01-16 12:14 ` [PATCH v8 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-16 12:14 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

This allows setting a default value for the watchdog.open_timeout
commandline parameter via Kconfig.

Some BSPs allow remote updating of the kernel image and root file
system, but updating the bootloader requires physical access. Hence, if
one has a firmware update that requires relaxing the
watchdog.open_timeout a little, the value used must be baked into the
kernel image itself and cannot come from the u-boot environment via the
kernel command line.

Being able to set the initial value in .config doesn't change the fact
that the value on the command line, if present, takes precedence, and is
of course immensely useful for development purposes while one has
console acccess, as well as usable in the cases where one can make a
permanent update of the kernel command line.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 Documentation/watchdog/watchdog-parameters.txt | 3 ++-
 drivers/watchdog/Kconfig                       | 9 +++++++++
 drivers/watchdog/watchdog_dev.c                | 5 +++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 5e4235989154..f550f4fb831d 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -11,7 +11,8 @@ modules.
 The watchdog core parameter watchdog.open_timeout is the maximum time,
 in milliseconds, for which the watchdog framework will take care of
 pinging a hardware watchdog until userspace opens the corresponding
-/dev/watchdogN device. A value of 0 (the default) means an infinite
+/dev/watchdogN device. The defalt value is
+CONFIG_WATCHDOG_OPEN_TIMEOUT. A value of 0 means an infinite
 timeout. Setting this to a non-zero value can be useful to ensure that
 either userspace comes up properly, or the board gets reset and allows
 fallback logic in the bootloader to try something else.
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 57f017d74a97..289d13fecc27 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -63,6 +63,15 @@ config WATCHDOG_SYSFS
 	  Say Y here if you want to enable watchdog device status read through
 	  sysfs attributes.
 
+config WATCHDOG_OPEN_TIMEOUT
+	int "Timeout value for opening watchdog device"
+	default 0
+	help
+	  The maximum time, in milliseconds, for which the watchdog
+	  framework takes care of pinging a hardware watchdog. A value
+	  of 0 means infinite. The value set here can be overridden by
+	  the commandline parameter "watchdog.open_timeout".
+
 #
 # General Watchdog drivers
 #
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index a9585925458f..804da5b2ce02 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -88,7 +88,7 @@ static struct kthread_worker *watchdog_kworker;
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
 
-static unsigned open_timeout;
+static unsigned open_timeout = CONFIG_WATCHDOG_OPEN_TIMEOUT;
 
 static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
 {
@@ -1206,4 +1206,5 @@ MODULE_PARM_DESC(handle_boot_enabled,
 
 module_param(open_timeout, uint, 0644);
 MODULE_PARM_DESC(open_timeout,
-	"Maximum time (in milliseconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
+	"Maximum time (in milliseconds, 0 means infinity) for userspace to take over a running watchdog (default="
+	__MODULE_STRING(CONFIG_WATCHDOG_OPEN_TIMEOUT) ")");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v8 3/3] watchdog: make the device time out at open_deadline when open_timeout is used
  2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
  2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
  2019-01-16 12:14 ` [PATCH v8 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
@ 2019-01-16 12:14 ` Rasmus Villemoes
  2019-01-16 14:45 ` [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Esben Haabendal
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
  4 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-16 12:14 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

When the watchdog device is not open by userspace, the kernel takes
care of pinging it. When the open_timeout feature is in use, we should
ensure that the hardware fires close to open_timeout ms after the
kernel has assumed responsibility for the device (either at boot, or
after userspace has had it open and magic-closed it).

To do this, simply reuse the logic that is already in place for
ensuring the same thing when userspace is responsible for regularly
pinging the device:

- When watchdog_active(wdd), this patch doesn't change anything.

- When !watchdoc_active(wdd), the "virtual timeout" should be taken to
be ->open_deadline". When the open_timeout feature is not used (i.e.,
when open_timeout was 0 the last time watchdog_set_open_deadline was
called), ->open_deadline is KTIME_MAX, and the arithmetic ends up
returning keepalive_interval as we used to.

This has been tested on a Wandboard with various combinations of
open_timeout (including fractional-seconds settings) and timeout-sec
properties for the on-board watchdog by booting with 'init=/bin/sh',
timestamping the lines on the serial console, and comparing the
timestamp of the 'imx2-wdt 20bc000.wdog: timeout nnn sec' line with
the timestamp of the 'U-Boot SPL ...' line (which appears just after
reset).

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/watchdog/watchdog_dev.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 804da5b2ce02..39dd2329cbea 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -133,14 +133,15 @@ static ktime_t watchdog_next_keepalive(struct watchdog_device *wdd)
 	ktime_t virt_timeout;
 	unsigned int hw_heartbeat_ms;
 
-	virt_timeout = ktime_add(wd_data->last_keepalive,
-				 ms_to_ktime(timeout_ms));
+	if (watchdog_active(wdd))
+		virt_timeout = ktime_add(wd_data->last_keepalive,
+					 ms_to_ktime(timeout_ms));
+	else
+		virt_timeout = wd_data->open_deadline;
+
 	hw_heartbeat_ms = min_not_zero(timeout_ms, wdd->max_hw_heartbeat_ms);
 	keepalive_interval = ms_to_ktime(hw_heartbeat_ms / 2);
 
-	if (!watchdog_active(wdd))
-		return keepalive_interval;
-
 	/*
 	 * To ensure that the watchdog times out wdd->timeout seconds
 	 * after the most recent ping from userspace, the last
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN
  2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
                   ` (2 preceding siblings ...)
  2019-01-16 12:14 ` [PATCH v8 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
@ 2019-01-16 14:45 ` Esben Haabendal
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
  4 siblings, 0 replies; 15+ messages in thread
From: Esben Haabendal @ 2019-01-16 14:45 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet,
	linux-kernel, linux-doc, martin, Rasmus Villemoes

Rasmus Villemoes <rasmus.villemoes@prevas.dk> writes:

> If a watchdog driver tells the framework that the device is running,
> the framework takes care of feeding the watchdog until userspace opens
> the device. If the userspace application which is supposed to do that
> never comes up properly, the watchdog is fed indefinitely by the
> kernel. This can be especially problematic for embedded devices.
>
> The existing handle_boot_enabled cmdline parameter/config option
> partially solves that, but that is only usable for the subset of
> hardware watchdogs that have (or can be configured by the bootloader
> to have) a timeout that is sufficient to make it realistic for
> userspace to come up. Many devices have timeouts of only a few
> seconds, or even less, making handle_boot_enabled insufficient.
>
> These patches allow one to set a maximum time for which the kernel
> will feed the watchdog, thus ensuring that either userspace has come
> up, or the board gets reset. This allows fallback logic in the
> bootloader to attempt some recovery (for example, if an automatic
> update is in progress, it could roll back to the previous version).
>
> The patches have been tested on a Raspberry Pi 2 and a Wandboard.
>
> Changes in v8: Redo on top of 5.0-rc1 - in particular, adapt to the
> jiffies->ktime_t conversion (1ff68820 "watchdog: core: make sure the
> watchdog_worker is not deferred"). Add a patch to make the hardware
> timeout at the deadline as requested by Guenther - which was actually
> made very easy by the ktime_t conversion.
>
> v7 submission at <https://lore.kernel.org/lkml/1511865350-20665-1-git-send-email-rasmus.villemoes@prevas.dk/>
>
> Rasmus Villemoes (3):
>   watchdog: introduce watchdog.open_timeout commandline parameter
>   watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
>   watchdog: make the device time out at open_deadline when open_timeout
>     is used
>
>  .../watchdog/watchdog-parameters.txt          |  9 ++++
>  drivers/watchdog/Kconfig                      |  9 ++++
>  drivers/watchdog/watchdog_dev.c               | 42 +++++++++++++++----
>  3 files changed, 53 insertions(+), 7 deletions(-)

For the whole series:
Acked-by: Esben Haabendal <esben@haabendal.dk>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2019-01-17 21:24   ` Guenter Roeck
  0 siblings, 0 replies; 15+ messages in thread
From: Guenter Roeck @ 2019-01-17 21:24 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-watchdog, Wim Van Sebroeck, Jonathan Corbet, linux-kernel,
	linux-doc, Esben Haabendal, martin, Rasmus Villemoes

On Wed, Jan 16, 2019 at 12:14:42PM +0000, Rasmus Villemoes wrote:
> The watchdog framework takes care of feeding a hardware watchdog until
> userspace opens /dev/watchdogN. If that never happens for some reason
> (buggy init script, corrupt root filesystem or whatnot) but the kernel
> itself is fine, the machine stays up indefinitely. This patch allows
> setting an upper limit for how long the kernel will take care of the
> watchdog, thus ensuring that the watchdog will eventually reset the
> machine.
> 
> A value of 0 (the default) means infinite timeout, preserving the
> current behaviour.
> 
> This is particularly useful for embedded devices where some fallback
> logic is implemented in the bootloader (e.g., use a different root
> partition, boot from network, ...).
> 
> There is already handle_boot_enabled serving a similar purpose. However,
> such a binary choice is unsuitable if the hardware watchdog cannot be
> programmed by the bootloader to provide a timeout long enough for
> userspace to get up and running. Many of the embedded devices we see use
> external (gpio-triggered) watchdogs with a fixed timeout of the order of
> 1-2 seconds.
> 
> The open timeout is also used as a maximum time for an application to
> re-open /dev/watchdogN after closing it. Again, while the kernel already
> has a nowayout mechanism, using that means userspace is at the mercy of
> whatever timeout the hardware has.
> 
> Being a module parameter, one can revert to the ordinary behaviour of
> having the kernel maintain the watchdog indefinitely by simply writing 0
> to /sys/... after initially opening /dev/watchdog; conversely, one can
> of course also have the current behaviour of allowing indefinite time
> until the first open, and then set that module parameter.
> 
> The unit is milliseconds rather than seconds because that covers more
> use cases. For example, userspace might need a long time to get in the
> air initially, requiring a somewhat liberal open_timeout, but when (for
> whatever reason) the application might then want to re-exec itself, it
> can set a much smaller threshold.
> 

I don't buy this use case. We are not in control of the exact time between
hand-off to the Linux kernel and to driver instantiation, and we can not even
measure it. A less-than-one second granularity, especially since it is from
instantiation time and not even from handoff time, does not really make
sense and would only create a false impression of accuracy. Let's stick with
seconds.

Guenter

> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
> index 0b88e333f9e1..5e4235989154 100644
> --- a/Documentation/watchdog/watchdog-parameters.txt
> +++ b/Documentation/watchdog/watchdog-parameters.txt
> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>  providing kernel parameters for builtin drivers versus loadable
>  modules.
>  
> +The watchdog core parameter watchdog.open_timeout is the maximum time,
> +in milliseconds, for which the watchdog framework will take care of
> +pinging a hardware watchdog until userspace opens the corresponding
> +/dev/watchdogN device. A value of 0 (the default) means an infinite
> +timeout. Setting this to a non-zero value can be useful to ensure that
> +either userspace comes up properly, or the board gets reset and allows
> +fallback logic in the bootloader to try something else.
> +
>  
>  -------------------------------------------------
>  acquirewdt:
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index f6c24b22b37c..a9585925458f 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -69,6 +69,7 @@ struct watchdog_core_data {
>  	struct mutex lock;
>  	ktime_t last_keepalive;
>  	ktime_t last_hw_keepalive;
> +	ktime_t open_deadline;
>  	struct hrtimer timer;
>  	struct kthread_work work;
>  	unsigned long status;		/* Internal status bits */
> @@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
>  static bool handle_boot_enabled =
>  	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
>  
> +static unsigned open_timeout;
> +
> +static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
> +{
> +	return ktime_after(ktime_get(), data->open_deadline);
> +}
> +
> +static void watchdog_set_open_deadline(struct watchdog_core_data *data)
> +{
> +	data->open_deadline = open_timeout ?
> +		ktime_get() + ms_to_ktime(open_timeout) : KTIME_MAX;
> +}
> +
>  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
>  {
>  	/* All variables in milli-seconds */
> @@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
>  {
>  	struct watchdog_device *wdd = wd_data->wdd;
>  
> -	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
> +	if (!wdd)
> +		return false;
> +
> +	if (watchdog_active(wdd))
> +		return true;
> +
> +	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
>  }
>  
>  static void watchdog_ping_work(struct kthread_work *work)
> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>  		return -EBUSY;
>  	}
>  
> -	if (wdd->ops->stop) {
> +	if (wdd->ops->stop && !open_timeout) {
>  		clear_bit(WDOG_HW_RUNNING, &wdd->status);
>  		err = wdd->ops->stop(wdd);
>  	} else {
> @@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  		watchdog_ping(wdd);
>  	}
>  
> +	watchdog_set_open_deadline(wd_data);
>  	watchdog_update_worker(wdd);
>  
>  	/* make sure that /dev/watchdog can be re-opened */
> @@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
>  
>  	/* Record time of most recent heartbeat as 'just before now'. */
>  	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
> +	watchdog_set_open_deadline(wd_data);
>  
>  	/*
>  	 * If the watchdog is running, prevent its driver from being unloaded,
> @@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
>  MODULE_PARM_DESC(handle_boot_enabled,
>  	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
>  	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
> +
> +module_param(open_timeout, uint, 0644);
> +MODULE_PARM_DESC(open_timeout,
> +	"Maximum time (in milliseconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN
  2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
                   ` (3 preceding siblings ...)
  2019-01-16 14:45 ` [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Esben Haabendal
@ 2019-01-21 20:45 ` Rasmus Villemoes
  2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
                     ` (3 more replies)
  4 siblings, 4 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-21 20:45 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

If a watchdog driver tells the framework that the device is running,
the framework takes care of feeding the watchdog until userspace opens
the device. If the userspace application which is supposed to do that
never comes up properly, the watchdog is fed indefinitely by the
kernel. This can be especially problematic for embedded devices.

The existing handle_boot_enabled cmdline parameter/config option
partially solves that, but that is only usable for the subset of
hardware watchdogs that have (or can be configured by the bootloader
to have) a timeout that is sufficient to make it realistic for
userspace to come up. Many devices have timeouts of only a few
seconds, or even less, making handle_boot_enabled insufficient.

These patches allow one to set a maximum time for which the kernel
will feed the watchdog, thus ensuring that either userspace has come
up, or the board gets reset. This allows fallback logic in the
bootloader to attempt some recovery (for example, if an automatic
update is in progress, it could roll back to the previous version).

The patches have been tested on a Raspberry Pi 2 and a Wandboard.

Changes in v9: Make the unit seconds instead of milliseconds.

Changes in v8: Redo on top of 5.0-rc1 - in particular, adapt to the
jiffies->ktime_t conversion (1ff68820 "watchdog: core: make sure the
watchdog_worker is not deferred"). Add a patch to make the hardware
timeout at the deadline as requested by Guenther - which was actually
made very easy by the ktime_t conversion.

v7 submission at <https://lore.kernel.org/lkml/1511865350-20665-1-git-send-email-rasmus.villemoes@prevas.dk/>

Rasmus Villemoes (3):
  watchdog: introduce watchdog.open_timeout commandline parameter
  watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  watchdog: make the device time out at open_deadline when open_timeout
    is used

 .../watchdog/watchdog-parameters.txt          |  8 ++++
 drivers/watchdog/Kconfig                      |  9 ++++
 drivers/watchdog/watchdog_dev.c               | 42 +++++++++++++++----
 3 files changed, 52 insertions(+), 7 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
@ 2019-01-21 20:45   ` Rasmus Villemoes
  2019-01-22 17:29     ` Guenter Roeck
  2019-01-21 20:45   ` [PATCH v9 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-21 20:45 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

The watchdog framework takes care of feeding a hardware watchdog until
userspace opens /dev/watchdogN. If that never happens for some reason
(buggy init script, corrupt root filesystem or whatnot) but the kernel
itself is fine, the machine stays up indefinitely. This patch allows
setting an upper limit for how long the kernel will take care of the
watchdog, thus ensuring that the watchdog will eventually reset the
machine.

A value of 0 (the default) means infinite timeout, preserving the
current behaviour.

This is particularly useful for embedded devices where some fallback
logic is implemented in the bootloader (e.g., use a different root
partition, boot from network, ...).

There is already handle_boot_enabled serving a similar purpose. However,
such a binary choice is unsuitable if the hardware watchdog cannot be
programmed by the bootloader to provide a timeout long enough for
userspace to get up and running. Many of the embedded devices we see use
external (gpio-triggered) watchdogs with a fixed timeout of the order of
1-2 seconds.

The open timeout is also used as a maximum time for an application to
re-open /dev/watchdogN after closing it. Again, while the kernel already
has a nowayout mechanism, using that means userspace is at the mercy of
whatever timeout the hardware has.

Being a module parameter, one can revert to the ordinary behaviour of
having the kernel maintain the watchdog indefinitely by simply writing 0
to /sys/... after initially opening /dev/watchdog; conversely, one can
of course also have the current behaviour of allowing indefinite time
until the first open, and then set that module parameter.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 .../watchdog/watchdog-parameters.txt          |  8 +++++
 drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 0b88e333f9e1..907c4bb13810 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
 providing kernel parameters for builtin drivers versus loadable
 modules.
 
+The watchdog core parameter watchdog.open_timeout is the maximum time,
+in seconds, for which the watchdog framework will take care of pinging
+a hardware watchdog until userspace opens the corresponding
+/dev/watchdogN device. A value of 0 (the default) means an infinite
+timeout. Setting this to a non-zero value can be useful to ensure that
+either userspace comes up properly, or the board gets reset and allows
+fallback logic in the bootloader to try something else.
+
 
 -------------------------------------------------
 acquirewdt:
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index f6c24b22b37c..ab2ad20f13eb 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -69,6 +69,7 @@ struct watchdog_core_data {
 	struct mutex lock;
 	ktime_t last_keepalive;
 	ktime_t last_hw_keepalive;
+	ktime_t open_deadline;
 	struct hrtimer timer;
 	struct kthread_work work;
 	unsigned long status;		/* Internal status bits */
@@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
 
+static unsigned open_timeout;
+
+static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
+{
+	return ktime_after(ktime_get(), data->open_deadline);
+}
+
+static void watchdog_set_open_deadline(struct watchdog_core_data *data)
+{
+	data->open_deadline = open_timeout ?
+		ktime_get() + ktime_set(open_timeout, 0) : KTIME_MAX;
+}
+
 static inline bool watchdog_need_worker(struct watchdog_device *wdd)
 {
 	/* All variables in milli-seconds */
@@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
 {
 	struct watchdog_device *wdd = wd_data->wdd;
 
-	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
+	if (!wdd)
+		return false;
+
+	if (watchdog_active(wdd))
+		return true;
+
+	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
 }
 
 static void watchdog_ping_work(struct kthread_work *work)
@@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
 		return -EBUSY;
 	}
 
-	if (wdd->ops->stop) {
+	if (wdd->ops->stop && !open_timeout) {
 		clear_bit(WDOG_HW_RUNNING, &wdd->status);
 		err = wdd->ops->stop(wdd);
 	} else {
@@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
 		watchdog_ping(wdd);
 	}
 
+	watchdog_set_open_deadline(wd_data);
 	watchdog_update_worker(wdd);
 
 	/* make sure that /dev/watchdog can be re-opened */
@@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
 
 	/* Record time of most recent heartbeat as 'just before now'. */
 	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
+	watchdog_set_open_deadline(wd_data);
 
 	/*
 	 * If the watchdog is running, prevent its driver from being unloaded,
@@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
 MODULE_PARM_DESC(handle_boot_enabled,
 	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
 	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
+
+module_param(open_timeout, uint, 0644);
+MODULE_PARM_DESC(open_timeout,
+	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
  2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2019-01-21 20:45   ` Rasmus Villemoes
  2019-01-21 20:45   ` [PATCH v9 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
  2019-05-01  0:05   ` [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Jerry Hoemann
  3 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-21 20:45 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

This allows setting a default value for the watchdog.open_timeout
commandline parameter via Kconfig.

Some BSPs allow remote updating of the kernel image and root file
system, but updating the bootloader requires physical access. Hence, if
one has a firmware update that requires relaxing the
watchdog.open_timeout a little, the value used must be baked into the
kernel image itself and cannot come from the u-boot environment via the
kernel command line.

Being able to set the initial value in .config doesn't change the fact
that the value on the command line, if present, takes precedence, and is
of course immensely useful for development purposes while one has
console acccess, as well as usable in the cases where one can make a
permanent update of the kernel command line.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 Documentation/watchdog/watchdog-parameters.txt | 8 ++++----
 drivers/watchdog/Kconfig                       | 9 +++++++++
 drivers/watchdog/watchdog_dev.c                | 5 +++--
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
index 907c4bb13810..2fdbd07f9791 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -11,10 +11,10 @@ modules.
 The watchdog core parameter watchdog.open_timeout is the maximum time,
 in seconds, for which the watchdog framework will take care of pinging
 a hardware watchdog until userspace opens the corresponding
-/dev/watchdogN device. A value of 0 (the default) means an infinite
-timeout. Setting this to a non-zero value can be useful to ensure that
-either userspace comes up properly, or the board gets reset and allows
-fallback logic in the bootloader to try something else.
+/dev/watchdogN device. A value of 0 means an infinite timeout. Setting
+this to a non-zero value can be useful to ensure that either userspace
+comes up properly, or the board gets reset and allows fallback logic
+in the bootloader to try something else.
 
 
 -------------------------------------------------
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 57f017d74a97..e1de5beb4e80 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -63,6 +63,15 @@ config WATCHDOG_SYSFS
 	  Say Y here if you want to enable watchdog device status read through
 	  sysfs attributes.
 
+config WATCHDOG_OPEN_TIMEOUT
+	int "Timeout value for opening watchdog device"
+	default 0
+	help
+	  The maximum time, in seconds, for which the watchdog framework takes
+	  care of pinging a hardware watchdog.  A value of 0 means infinite. The
+	  value set here can be overridden by the commandline parameter
+	  "watchdog.open_timeout".
+
 #
 # General Watchdog drivers
 #
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index ab2ad20f13eb..b763080741cc 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -88,7 +88,7 @@ static struct kthread_worker *watchdog_kworker;
 static bool handle_boot_enabled =
 	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
 
-static unsigned open_timeout;
+static unsigned open_timeout = CONFIG_WATCHDOG_OPEN_TIMEOUT;
 
 static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
 {
@@ -1206,4 +1206,5 @@ MODULE_PARM_DESC(handle_boot_enabled,
 
 module_param(open_timeout, uint, 0644);
 MODULE_PARM_DESC(open_timeout,
-	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
+	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default="
+	__MODULE_STRING(CONFIG_WATCHDOG_OPEN_TIMEOUT) ")");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v9 3/3] watchdog: make the device time out at open_deadline when open_timeout is used
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
  2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
  2019-01-21 20:45   ` [PATCH v9 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
@ 2019-01-21 20:45   ` Rasmus Villemoes
  2019-05-01  0:05   ` [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Jerry Hoemann
  3 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-21 20:45 UTC (permalink / raw)
  To: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Esben Haabendal, martin, Rasmus Villemoes

When the watchdog device is not open by userspace, the kernel takes
care of pinging it. When the open_timeout feature is in use, we should
ensure that the hardware fires close to open_timeout seconds after the
kernel has assumed responsibility for the device (either at boot, or
after userspace has had it open and magic-closed it).

To do this, simply reuse the logic that is already in place for
ensuring the same thing when userspace is responsible for regularly
pinging the device:

- When watchdog_active(wdd), this patch doesn't change anything.

- When !watchdoc_active(wdd), the "virtual timeout" should be taken to
be ->open_deadline". When the open_timeout feature is not used (i.e.,
when open_timeout was 0 the last time watchdog_set_open_deadline was
called), ->open_deadline is KTIME_MAX, and the arithmetic ends up
returning keepalive_interval as we used to.

This has been tested on a Wandboard with various combinations of
open_timeout and timeout-sec properties for the on-board watchdog by
booting with 'init=/bin/sh', timestamping the lines on the serial
console, and comparing the timestamp of the 'imx2-wdt 20bc000.wdog:
timeout nnn sec' line with the timestamp of the 'U-Boot SPL ...'
line (which appears just after reset).

Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/watchdog/watchdog_dev.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index b763080741cc..8ca310c4a9e4 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -133,14 +133,15 @@ static ktime_t watchdog_next_keepalive(struct watchdog_device *wdd)
 	ktime_t virt_timeout;
 	unsigned int hw_heartbeat_ms;
 
-	virt_timeout = ktime_add(wd_data->last_keepalive,
-				 ms_to_ktime(timeout_ms));
+	if (watchdog_active(wdd))
+		virt_timeout = ktime_add(wd_data->last_keepalive,
+					 ms_to_ktime(timeout_ms));
+	else
+		virt_timeout = wd_data->open_deadline;
+
 	hw_heartbeat_ms = min_not_zero(timeout_ms, wdd->max_hw_heartbeat_ms);
 	keepalive_interval = ms_to_ktime(hw_heartbeat_ms / 2);
 
-	if (!watchdog_active(wdd))
-		return keepalive_interval;
-
 	/*
 	 * To ensure that the watchdog times out wdd->timeout seconds
 	 * after the most recent ping from userspace, the last
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
@ 2019-01-22 17:29     ` Guenter Roeck
  2019-01-29 20:35       ` Rasmus Villemoes
  0 siblings, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2019-01-22 17:29 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-watchdog, Wim Van Sebroeck, Jonathan Corbet, linux-kernel,
	linux-doc, Esben Haabendal, martin, Rasmus Villemoes

On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
> The watchdog framework takes care of feeding a hardware watchdog until
> userspace opens /dev/watchdogN. If that never happens for some reason
> (buggy init script, corrupt root filesystem or whatnot) but the kernel
> itself is fine, the machine stays up indefinitely. This patch allows
> setting an upper limit for how long the kernel will take care of the
> watchdog, thus ensuring that the watchdog will eventually reset the
> machine.
> 
> A value of 0 (the default) means infinite timeout, preserving the
> current behaviour.
> 
> This is particularly useful for embedded devices where some fallback
> logic is implemented in the bootloader (e.g., use a different root
> partition, boot from network, ...).
> 
> There is already handle_boot_enabled serving a similar purpose. However,
> such a binary choice is unsuitable if the hardware watchdog cannot be
> programmed by the bootloader to provide a timeout long enough for
> userspace to get up and running. Many of the embedded devices we see use
> external (gpio-triggered) watchdogs with a fixed timeout of the order of
> 1-2 seconds.
> 
> The open timeout is also used as a maximum time for an application to
> re-open /dev/watchdogN after closing it. Again, while the kernel already
> has a nowayout mechanism, using that means userspace is at the mercy of
> whatever timeout the hardware has.
> 
> Being a module parameter, one can revert to the ordinary behaviour of
> having the kernel maintain the watchdog indefinitely by simply writing 0
> to /sys/... after initially opening /dev/watchdog; conversely, one can
> of course also have the current behaviour of allowing indefinite time
> until the first open, and then set that module parameter.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
> index 0b88e333f9e1..907c4bb13810 100644
> --- a/Documentation/watchdog/watchdog-parameters.txt
> +++ b/Documentation/watchdog/watchdog-parameters.txt
> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>  providing kernel parameters for builtin drivers versus loadable
>  modules.
>  
> +The watchdog core parameter watchdog.open_timeout is the maximum time,
> +in seconds, for which the watchdog framework will take care of pinging
> +a hardware watchdog until userspace opens the corresponding
> +/dev/watchdogN device. A value of 0 (the default) means an infinite
> +timeout. Setting this to a non-zero value can be useful to ensure that
> +either userspace comes up properly, or the board gets reset and allows
> +fallback logic in the bootloader to try something else.
> +

This is misleading. Unless I am missing something, the above only applies
if the watchdog is already runnning at boot, and after it has been opened
and closed once.

FWIW, I find this operation quite confusing. What is the rationale for not
starting the watchdog at boot time if it is not running and open_timeout
is set, but then refusing to stop it after it has been started once ?

>  
>  -------------------------------------------------
>  acquirewdt:
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index f6c24b22b37c..ab2ad20f13eb 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -69,6 +69,7 @@ struct watchdog_core_data {
>  	struct mutex lock;
>  	ktime_t last_keepalive;
>  	ktime_t last_hw_keepalive;
> +	ktime_t open_deadline;
>  	struct hrtimer timer;
>  	struct kthread_work work;
>  	unsigned long status;		/* Internal status bits */
> @@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
>  static bool handle_boot_enabled =
>  	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
>  
> +static unsigned open_timeout;
> +
> +static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
> +{
> +	return ktime_after(ktime_get(), data->open_deadline);
> +}
> +
> +static void watchdog_set_open_deadline(struct watchdog_core_data *data)
> +{
> +	data->open_deadline = open_timeout ?
> +		ktime_get() + ktime_set(open_timeout, 0) : KTIME_MAX;
> +}
> +
>  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
>  {
>  	/* All variables in milli-seconds */
> @@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
>  {
>  	struct watchdog_device *wdd = wd_data->wdd;
>  
> -	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
> +	if (!wdd)
> +		return false;
> +
> +	if (watchdog_active(wdd))
> +		return true;
> +
> +	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
>  }
>  
>  static void watchdog_ping_work(struct kthread_work *work)
> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>  		return -EBUSY;
>  	}
>  
> -	if (wdd->ops->stop) {
> +	if (wdd->ops->stop && !open_timeout) {

This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
"Turn off the watchdog timer" is well defined and doesn't leave
the option of setting a timeout on it.

>  		clear_bit(WDOG_HW_RUNNING, &wdd->status);
>  		err = wdd->ops->stop(wdd);
>  	} else {
> @@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  		watchdog_ping(wdd);
>  	}
>  
> +	watchdog_set_open_deadline(wd_data);
>  	watchdog_update_worker(wdd);
>  
>  	/* make sure that /dev/watchdog can be re-opened */
> @@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
>  
>  	/* Record time of most recent heartbeat as 'just before now'. */
>  	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
> +	watchdog_set_open_deadline(wd_data);
>  
>  	/*
>  	 * If the watchdog is running, prevent its driver from being unloaded,
> @@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
>  MODULE_PARM_DESC(handle_boot_enabled,
>  	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
>  	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
> +
> +module_param(open_timeout, uint, 0644);
> +MODULE_PARM_DESC(open_timeout,
> +	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");

The description is misleading. After the initial open, a subsequent close no
longer really stops the watchdog.

> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-22 17:29     ` Guenter Roeck
@ 2019-01-29 20:35       ` Rasmus Villemoes
  2019-01-30  7:40         ` Rasmus Villemoes
  0 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-29 20:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-watchdog, Wim Van Sebroeck, Jonathan Corbet, linux-kernel,
	linux-doc, Esben Haabendal, martin, Rasmus Villemoes

On 22/01/2019 18.29, Guenter Roeck wrote:
> On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
>> The watchdog framework takes care of feeding a hardware watchdog until
>> userspace opens /dev/watchdogN. If that never happens for some reason
>> (buggy init script, corrupt root filesystem or whatnot) but the kernel
>> itself is fine, the machine stays up indefinitely. This patch allows
>> setting an upper limit for how long the kernel will take care of the
>> watchdog, thus ensuring that the watchdog will eventually reset the
>> machine.
>>
>> A value of 0 (the default) means infinite timeout, preserving the
>> current behaviour.
>>
>> This is particularly useful for embedded devices where some fallback
>> logic is implemented in the bootloader (e.g., use a different root
>> partition, boot from network, ...).
>>
>> There is already handle_boot_enabled serving a similar purpose. However,
>> such a binary choice is unsuitable if the hardware watchdog cannot be
>> programmed by the bootloader to provide a timeout long enough for
>> userspace to get up and running. Many of the embedded devices we see use
>> external (gpio-triggered) watchdogs with a fixed timeout of the order of
>> 1-2 seconds.
>>
>> The open timeout is also used as a maximum time for an application to
>> re-open /dev/watchdogN after closing it. Again, while the kernel already
>> has a nowayout mechanism, using that means userspace is at the mercy of
>> whatever timeout the hardware has.
>>
>> Being a module parameter, one can revert to the ordinary behaviour of
>> having the kernel maintain the watchdog indefinitely by simply writing 0
>> to /sys/... after initially opening /dev/watchdog; conversely, one can
>> of course also have the current behaviour of allowing indefinite time
>> until the first open, and then set that module parameter.
>>
>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
>> ---
>>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>>  2 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
>> index 0b88e333f9e1..907c4bb13810 100644
>> --- a/Documentation/watchdog/watchdog-parameters.txt
>> +++ b/Documentation/watchdog/watchdog-parameters.txt
>> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>>  providing kernel parameters for builtin drivers versus loadable
>>  modules.
>>  
>> +The watchdog core parameter watchdog.open_timeout is the maximum time,
>> +in seconds, for which the watchdog framework will take care of pinging
>> +a hardware watchdog until userspace opens the corresponding
>> +/dev/watchdogN device. A value of 0 (the default) means an infinite
>> +timeout. Setting this to a non-zero value can be useful to ensure that
>> +either userspace comes up properly, or the board gets reset and allows
>> +fallback logic in the bootloader to try something else.
>> +
> 
> This is misleading. Unless I am missing something, the above only applies
> if the watchdog is already runnning at boot,

well, yes, if it's not running at boot, there's nothing for the kernel
to take care of. I can do s/a hardware watchdog/a running hardware
watchdog/ if that makes it clearer.

 and after it has been opened
> and closed once.
> 
> FWIW, I find this operation quite confusing. What is the rationale for not
> starting the watchdog at boot time if it is not running and open_timeout
> is set, but then refusing to stop it after it has been started once ?

I guess you have a point that there's a certain asymmetry there. In
practice, the cases where one wants this kind of robustness guarantee do
not rely on a watchdog that can/must be started and stopped by software.

>>  
>>  static void watchdog_ping_work(struct kthread_work *work)
>> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>>  		return -EBUSY;
>>  	}
>>  
>> -	if (wdd->ops->stop) {
>> +	if (wdd->ops->stop && !open_timeout) {
> 
> This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
> "Turn off the watchdog timer" is well defined and doesn't leave
> the option of setting a timeout on it.

I can drop this hunk, since it's mostly irrelevant to the actual use
cases I have in mind. It makes testing the feature on reference boards a
little more awkward, but I can live with that.

>> +
>> +module_param(open_timeout, uint, 0644);
>> +MODULE_PARM_DESC(open_timeout,
>> +	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
> 
> The description is misleading. After the initial open, a subsequent close no
> longer really stops the watchdog.

If I drop the "&& !open_timeout" above, this would be accurate, no?

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
  2019-01-29 20:35       ` Rasmus Villemoes
@ 2019-01-30  7:40         ` Rasmus Villemoes
  0 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-01-30  7:40 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-watchdog, Wim Van Sebroeck, Jonathan Corbet, linux-kernel,
	linux-doc, Esben Haabendal, martin

On 29/01/2019 21.35, Rasmus Villemoes wrote:
> On 22/01/2019 18.29, Guenter Roeck wrote:
>> On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
>>>  
>>>  static void watchdog_ping_work(struct kthread_work *work)
>>> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>>>  		return -EBUSY;
>>>  	}
>>>  
>>> -	if (wdd->ops->stop) {
>>> +	if (wdd->ops->stop && !open_timeout) {
>>
>> This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
>> "Turn off the watchdog timer" is well defined and doesn't leave
>> the option of setting a timeout on it.
> 
> I can drop this hunk, since it's mostly irrelevant to the actual use
> cases I have in mind. It makes testing the feature on reference boards a
> little more awkward, but I can live with that.

Actually, that's not enough - if there was a non-zero open_timeout at
boot, the device has had its ->open_deadline set to some finite (not
KTIME_MAX) value, and ->open_deadline does not get updated on the
WDIOC_SETOPTIONS / WDIOS_DISABLECARD path.

So, I think one way to preserve the semantics of WDIOS_DISABLECARD
(which, for a non-stopable watchdog really means "please let the kernel
take care of this indefinitely, or until WDIOS_ENABLECARD") is to extend
watchdog_stop with a timeout parameter, and do the call of
watchdog_set_open_deadline() from watchdog_stop() using the passed
timeout value. When called from the WDIOS_DISABLECARD case, we'd pass 0
for that timeout, while the call from _release would pass the module
parameter open_timeout. [Obviously, watchdog_set_open_deadline() would
also take the timeout as a parameter instead of referring to the
open_timeout directly.]

Thanks for pointing out the WDIOS_DISABLECARD case. I'll try to make the
above into code to see how it looks.

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN
  2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
                     ` (2 preceding siblings ...)
  2019-01-21 20:45   ` [PATCH v9 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
@ 2019-05-01  0:05   ` Jerry Hoemann
  2019-05-01  6:32     ` Rasmus Villemoes
  3 siblings, 1 reply; 15+ messages in thread
From: Jerry Hoemann @ 2019-05-01  0:05 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet,
	linux-kernel, linux-doc, Esben Haabendal, martin,
	Rasmus Villemoes

On Mon, Jan 21, 2019 at 08:45:38PM +0000, Rasmus Villemoes wrote:
> If a watchdog driver tells the framework that the device is running,
> the framework takes care of feeding the watchdog until userspace opens
> the device. If the userspace application which is supposed to do that
> never comes up properly, the watchdog is fed indefinitely by the
> kernel. This can be especially problematic for embedded devices.
> 
> The existing handle_boot_enabled cmdline parameter/config option
> partially solves that, but that is only usable for the subset of
> hardware watchdogs that have (or can be configured by the bootloader
> to have) a timeout that is sufficient to make it realistic for
> userspace to come up. Many devices have timeouts of only a few
> seconds, or even less, making handle_boot_enabled insufficient.
> 
> These patches allow one to set a maximum time for which the kernel
> will feed the watchdog, thus ensuring that either userspace has come
> up, or the board gets reset. This allows fallback logic in the
> bootloader to attempt some recovery (for example, if an automatic
> update is in progress, it could roll back to the previous version).
> 

Rasmus,

Sorry if I missed it, but are you still looking into adding
this feature?  I was thinking it might also be useful with
kdump when a watchdog was running in the first kernel.

After a panic if kdump is configured, the system will boot the
crash kernel.  If a watchdog was running in the first kernel
it would still running in the crash kernel environment.

Some of the drivers on HPE systems take a non-trivial amount of time
to reset during the crash kernel boot, so it would be good to have
the core pet the watchdog until user space is ready.  But as the
crash kernel environment has its issues,  we really don't want
the core to ping the watchdog indefinitely.

Thanks

Jerry

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN
  2019-05-01  0:05   ` [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Jerry Hoemann
@ 2019-05-01  6:32     ` Rasmus Villemoes
  0 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2019-05-01  6:32 UTC (permalink / raw)
  To: Jerry.Hoemann
  Cc: linux-watchdog, Guenter Roeck, Wim Van Sebroeck, Jonathan Corbet,
	linux-kernel, linux-doc, Esben Haabendal, martin,
	Rasmus Villemoes

On 01/05/2019 02.05, Jerry Hoemann wrote:
> On Mon, Jan 21, 2019 at 08:45:38PM +0000, Rasmus Villemoes wrote:
>>
>> These patches allow one to set a maximum time for which the kernel
>> will feed the watchdog,
> 
> Rasmus,
> 
> Sorry if I missed it, but are you still looking into adding
> this feature? 

I am, and still intend to get around to it when I get some idle cycles.

I was thinking it might also be useful with
> kdump when a watchdog was running in the first kernel.
> 
> After a panic if kdump is configured, the system will boot the
> crash kernel.  If a watchdog was running in the first kernel
> it would still running in the crash kernel environment.
> 
> Some of the drivers on HPE systems take a non-trivial amount of time
> to reset during the crash kernel boot, so it would be good to have
> the core pet the watchdog until user space is ready.  But as the
> crash kernel environment has its issues,  we really don't want
> the core to ping the watchdog indefinitely.

Thanks, it's great to hear that others would also be interested in this.
This will move it up a few places on my todo list. I'll try to remember
to add you to the cc list.

Thanks,
Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-05-01  6:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
2019-01-17 21:24   ` Guenter Roeck
2019-01-16 12:14 ` [PATCH v8 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
2019-01-16 12:14 ` [PATCH v8 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
2019-01-16 14:45 ` [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Esben Haabendal
2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
2019-01-22 17:29     ` Guenter Roeck
2019-01-29 20:35       ` Rasmus Villemoes
2019-01-30  7:40         ` Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
2019-05-01  0:05   ` [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Jerry Hoemann
2019-05-01  6:32     ` Rasmus Villemoes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).