All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guenter Roeck <linux@roeck-us.net>
To: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: "linux-watchdog@vger.kernel.org" <linux-watchdog@vger.kernel.org>,
	Wim Van Sebroeck <wim@linux-watchdog.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Esben Haabendal <esben@haabendal.dk>,
	"martin@hundeboll.net" <martin@hundeboll.net>,
	Rasmus Villemoes <Rasmus.Villemoes@prevas.se>
Subject: Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter
Date: Tue, 22 Jan 2019 09:29:35 -0800	[thread overview]
Message-ID: <20190122172935.GA4964@roeck-us.net> (raw)
In-Reply-To: <20190121204527.5548-2-rasmus.villemoes@prevas.dk>

On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
> The watchdog framework takes care of feeding a hardware watchdog until
> userspace opens /dev/watchdogN. If that never happens for some reason
> (buggy init script, corrupt root filesystem or whatnot) but the kernel
> itself is fine, the machine stays up indefinitely. This patch allows
> setting an upper limit for how long the kernel will take care of the
> watchdog, thus ensuring that the watchdog will eventually reset the
> machine.
> 
> A value of 0 (the default) means infinite timeout, preserving the
> current behaviour.
> 
> This is particularly useful for embedded devices where some fallback
> logic is implemented in the bootloader (e.g., use a different root
> partition, boot from network, ...).
> 
> There is already handle_boot_enabled serving a similar purpose. However,
> such a binary choice is unsuitable if the hardware watchdog cannot be
> programmed by the bootloader to provide a timeout long enough for
> userspace to get up and running. Many of the embedded devices we see use
> external (gpio-triggered) watchdogs with a fixed timeout of the order of
> 1-2 seconds.
> 
> The open timeout is also used as a maximum time for an application to
> re-open /dev/watchdogN after closing it. Again, while the kernel already
> has a nowayout mechanism, using that means userspace is at the mercy of
> whatever timeout the hardware has.
> 
> Being a module parameter, one can revert to the ordinary behaviour of
> having the kernel maintain the watchdog indefinitely by simply writing 0
> to /sys/... after initially opening /dev/watchdog; conversely, one can
> of course also have the current behaviour of allowing indefinite time
> until the first open, and then set that module parameter.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
> index 0b88e333f9e1..907c4bb13810 100644
> --- a/Documentation/watchdog/watchdog-parameters.txt
> +++ b/Documentation/watchdog/watchdog-parameters.txt
> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>  providing kernel parameters for builtin drivers versus loadable
>  modules.
>  
> +The watchdog core parameter watchdog.open_timeout is the maximum time,
> +in seconds, for which the watchdog framework will take care of pinging
> +a hardware watchdog until userspace opens the corresponding
> +/dev/watchdogN device. A value of 0 (the default) means an infinite
> +timeout. Setting this to a non-zero value can be useful to ensure that
> +either userspace comes up properly, or the board gets reset and allows
> +fallback logic in the bootloader to try something else.
> +

This is misleading. Unless I am missing something, the above only applies
if the watchdog is already runnning at boot, and after it has been opened
and closed once.

FWIW, I find this operation quite confusing. What is the rationale for not
starting the watchdog at boot time if it is not running and open_timeout
is set, but then refusing to stop it after it has been started once ?

>  
>  -------------------------------------------------
>  acquirewdt:
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index f6c24b22b37c..ab2ad20f13eb 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -69,6 +69,7 @@ struct watchdog_core_data {
>  	struct mutex lock;
>  	ktime_t last_keepalive;
>  	ktime_t last_hw_keepalive;
> +	ktime_t open_deadline;
>  	struct hrtimer timer;
>  	struct kthread_work work;
>  	unsigned long status;		/* Internal status bits */
> @@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
>  static bool handle_boot_enabled =
>  	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
>  
> +static unsigned open_timeout;
> +
> +static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
> +{
> +	return ktime_after(ktime_get(), data->open_deadline);
> +}
> +
> +static void watchdog_set_open_deadline(struct watchdog_core_data *data)
> +{
> +	data->open_deadline = open_timeout ?
> +		ktime_get() + ktime_set(open_timeout, 0) : KTIME_MAX;
> +}
> +
>  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
>  {
>  	/* All variables in milli-seconds */
> @@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
>  {
>  	struct watchdog_device *wdd = wd_data->wdd;
>  
> -	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
> +	if (!wdd)
> +		return false;
> +
> +	if (watchdog_active(wdd))
> +		return true;
> +
> +	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
>  }
>  
>  static void watchdog_ping_work(struct kthread_work *work)
> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>  		return -EBUSY;
>  	}
>  
> -	if (wdd->ops->stop) {
> +	if (wdd->ops->stop && !open_timeout) {

This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
"Turn off the watchdog timer" is well defined and doesn't leave
the option of setting a timeout on it.

>  		clear_bit(WDOG_HW_RUNNING, &wdd->status);
>  		err = wdd->ops->stop(wdd);
>  	} else {
> @@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  		watchdog_ping(wdd);
>  	}
>  
> +	watchdog_set_open_deadline(wd_data);
>  	watchdog_update_worker(wdd);
>  
>  	/* make sure that /dev/watchdog can be re-opened */
> @@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
>  
>  	/* Record time of most recent heartbeat as 'just before now'. */
>  	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
> +	watchdog_set_open_deadline(wd_data);
>  
>  	/*
>  	 * If the watchdog is running, prevent its driver from being unloaded,
> @@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
>  MODULE_PARM_DESC(handle_boot_enabled,
>  	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
>  	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
> +
> +module_param(open_timeout, uint, 0644);
> +MODULE_PARM_DESC(open_timeout,
> +	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");

The description is misleading. After the initial open, a subsequent close no
longer really stops the watchdog.

> -- 
> 2.20.1
> 

  reply	other threads:[~2019-01-22 17:29 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-16 12:14 [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Rasmus Villemoes
2019-01-16 12:14 ` [PATCH v8 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
2019-01-17 21:24   ` Guenter Roeck
2019-01-16 12:14 ` [PATCH v8 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
2019-01-16 12:14 ` [PATCH v8 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
2019-01-16 14:45 ` [PATCH v8 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Esben Haabendal
2019-01-21 20:45 ` [PATCH v9 " Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout commandline parameter Rasmus Villemoes
2019-01-22 17:29     ` Guenter Roeck [this message]
2019-01-29 20:35       ` Rasmus Villemoes
2019-01-30  7:40         ` Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 2/3] watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT Rasmus Villemoes
2019-01-21 20:45   ` [PATCH v9 3/3] watchdog: make the device time out at open_deadline when open_timeout is used Rasmus Villemoes
2019-05-01  0:05   ` [PATCH v9 0/3] watchdog: allow setting deadline for opening /dev/watchdogN Jerry Hoemann
2019-05-01  6:32     ` Rasmus Villemoes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190122172935.GA4964@roeck-us.net \
    --to=linux@roeck-us.net \
    --cc=Rasmus.Villemoes@prevas.se \
    --cc=corbet@lwn.net \
    --cc=esben@haabendal.dk \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-watchdog@vger.kernel.org \
    --cc=martin@hundeboll.net \
    --cc=rasmus.villemoes@prevas.dk \
    --cc=wim@linux-watchdog.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.