linux-watchdog.vger.kernel.org archive mirror
* [PATCH 0/2] watchdog: Provide user control over WDOG_STOP_ON_REBOOT
@ 2020-02-13 17:59 Dmitry Safonov
  2020-02-13 17:59 ` [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier Dmitry Safonov
  2020-02-13 17:59 ` [PATCH 2/2] watchdog/uapi: Add WDIOS_{RUN,STOP}_ON_REBOOT Dmitry Safonov
From: Dmitry Safonov @ 2020-02-13 17:59 UTC
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Guenter Roeck, Wim Van Sebroeck,
	linux-watchdog

Add WDIOS_RUN_ON_REBOOT and WDIOS_STOP_ON_REBOOT to control the
watchdog's behavior across reboot.
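For illustration only, a userspace program that already holds /dev/watchdog open might request the proposed behavior roughly as below. This is a sketch under stated assumptions: the helper names are hypothetical, and the two WDIOS_*_ON_REBOOT values are copied from patch 2/2 and defined locally in case the installed <linux/watchdog.h> does not provide them. WDIOC_SETOPTIONS itself is the existing ioctl that the series extends.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Values proposed in patch 2/2; defined here only if the headers lack them. */
#ifndef WDIOS_RUN_ON_REBOOT
#define WDIOS_RUN_ON_REBOOT	0x0008
#endif
#ifndef WDIOS_STOP_ON_REBOOT
#define WDIOS_STOP_ON_REBOOT	0x0010
#endif

/* Hypothetical helper: pick exactly one of the two reboot flags. */
static int reboot_policy_opts(bool keep_running)
{
	return keep_running ? WDIOS_RUN_ON_REBOOT : WDIOS_STOP_ON_REBOOT;
}

/*
 * Hypothetical helper: fd must be an already-open /dev/watchdogN.
 * Returns 0 on success or a negative errno value.
 */
static int set_reboot_policy(int fd, bool keep_running)
{
	int opts = reboot_policy_opts(keep_running);

	/* Sanity check: exactly one reboot flag is ever sent. */
	assert(opts == WDIOS_RUN_ON_REBOOT || opts == WDIOS_STOP_ON_REBOOT);
	if (ioctl(fd, WDIOC_SETOPTIONS, &opts) < 0)
		return -errno;
	return 0;
}
```

A watchdog daemon would call set_reboot_policy(fd, true) once after opening the device, which is exactly the daemon-side change questioned later in this thread.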

Changes since RFC:
o rebase over v5.6
o fixed return code for ioctl()

I sent the RFC a while ago, but it was probably too late in the release
cycle to catch any attention:
https://lkml.kernel.org/r/20200121162145.166334-1-dima@arista.com

While waiting for rc1, I changed my mind about it being RFC material,
so I am sending it as PATCH v1 instead.

Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: linux-watchdog@vger.kernel.org

Dmitry Safonov (2):
  watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier
  watchdog/uapi: Add WDIOS_{RUN,STOP}_ON_REBOOT

 drivers/watchdog/watchdog_core.c | 27 +++++++++++++--------------
 drivers/watchdog/watchdog_dev.c  | 12 ++++++++++++
 include/linux/watchdog.h         |  6 ++++++
 include/uapi/linux/watchdog.h    |  3 ++-
 4 files changed, 33 insertions(+), 15 deletions(-)

-- 
2.25.0



* [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier
  2020-02-13 17:59 [PATCH 0/2] watchdog: Provide user control over WDOG_STOP_ON_REBOOT Dmitry Safonov
@ 2020-02-13 17:59 ` Dmitry Safonov
  2020-02-13 19:12   ` Guenter Roeck
  2020-02-13 17:59 ` [PATCH 2/2] watchdog/uapi: Add WDIOS_{RUN,STOP}_ON_REBOOT Dmitry Safonov
From: Dmitry Safonov @ 2020-02-13 17:59 UTC
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Guenter Roeck, Wim Van Sebroeck,
	linux-watchdog

Many watchdog drivers use the watchdog_stop_on_reboot() helper in order
to stop the watchdog on system reboot. Unfortunately, this logic is
coded in the driver's probe function and doesn't allow the user to decide
what to do during shutdown/reboot.

On the other hand, the Xen and Qemu watchdog drivers (xen_wdt and i6300esb)
may be configured to either send an NMI or turn off/reboot the VM as
the watchdog action. As the kernel may get stuck in any state, sending
NMIs can't reliably reboot the VM.

At Arista, we benefited from the following set-up: the emulated watchdogs
trigger a VM reset, and softdog is set to catch less severe conditions and
generate a vmcore. Just before reboot, the watchdog's timeout is increased
to a good-enough value (3 minutes). That keeps the watchdog always running
and guarantees that the VM doesn't get stuck.
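The thread does not say how the timeout is raised before reboot. Assuming the process that owns the open watchdog file descriptor does it, the existing WDIOC_SETTIMEOUT ioctl is one way; the helper name below is hypothetical and the 180-second value simply corresponds to the "3 minutes" above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: raise the timeout of an already-open watchdog
 * right before reboot. Returns the timeout the driver actually applied
 * (drivers may round it), or a negative errno value.
 */
static int stretch_timeout_for_reboot(int fd, int seconds)
{
	int timeout = seconds;

	assert(seconds > 0);
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) < 0)
		return -errno;
	return timeout;	/* the kernel writes back the effective timeout */
}
```

Usage: call stretch_timeout_for_reboot(fd, 180) immediately before triggering the reboot, while the watchdog is still open and running.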

As preparation for moving the decision whether to stop the watchdog on
reboot to userspace, allow WDOG_STOP_ON_REBOOT to be set at runtime,
not only during driver probe. Always register the reboot notifier and
check WDOG_STOP_ON_REBOOT inside it (at actual reboot time).

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/watchdog/watchdog_core.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 861daf4f37b2..ebf80ff3e8ce 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -153,6 +153,10 @@ static int watchdog_reboot_notifier(struct notifier_block *nb,
 	struct watchdog_device *wdd;
 
 	wdd = container_of(nb, struct watchdog_device, reboot_nb);
+
+	if (!test_bit(WDOG_STOP_ON_REBOOT, &wdd->status))
+		return NOTIFY_DONE;
+
 	if (code == SYS_DOWN || code == SYS_HALT) {
 		if (watchdog_active(wdd)) {
 			int ret;
@@ -254,17 +258,14 @@ static int __watchdog_register_device(struct watchdog_device *wdd)
 		}
 	}
 
-	if (test_bit(WDOG_STOP_ON_REBOOT, &wdd->status)) {
-		wdd->reboot_nb.notifier_call = watchdog_reboot_notifier;
-
-		ret = register_reboot_notifier(&wdd->reboot_nb);
-		if (ret) {
-			pr_err("watchdog%d: Cannot register reboot notifier (%d)\n",
-			       wdd->id, ret);
-			watchdog_dev_unregister(wdd);
-			ida_simple_remove(&watchdog_ida, id);
-			return ret;
-		}
+	wdd->reboot_nb.notifier_call = watchdog_reboot_notifier;
+	ret = register_reboot_notifier(&wdd->reboot_nb);
+	if (ret) {
+		pr_err("watchdog%d: Cannot register reboot notifier (%d)\n",
+				wdd->id, ret);
+		watchdog_dev_unregister(wdd);
+		ida_simple_remove(&watchdog_ida, id);
+		return ret;
 	}
 
 	if (wdd->ops->restart) {
@@ -321,9 +322,7 @@ static void __watchdog_unregister_device(struct watchdog_device *wdd)
 	if (wdd->ops->restart)
 		unregister_restart_handler(&wdd->restart_nb);
 
-	if (test_bit(WDOG_STOP_ON_REBOOT, &wdd->status))
-		unregister_reboot_notifier(&wdd->reboot_nb);
-
+	unregister_reboot_notifier(&wdd->reboot_nb);
 	watchdog_dev_unregister(wdd);
 	ida_simple_remove(&watchdog_ida, wdd->id);
 }
-- 
2.25.0



* [PATCH 2/2] watchdog/uapi: Add WDIOS_{RUN,STOP}_ON_REBOOT
  2020-02-13 17:59 [PATCH 0/2] watchdog: Provide user control over WDOG_STOP_ON_REBOOT Dmitry Safonov
  2020-02-13 17:59 ` [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier Dmitry Safonov
@ 2020-02-13 17:59 ` Dmitry Safonov
From: Dmitry Safonov @ 2020-02-13 17:59 UTC
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Guenter Roeck, Wim Van Sebroeck,
	linux-watchdog

Many watchdog drivers use the watchdog_stop_on_reboot() helper in order
to stop the watchdog on system reboot. Unfortunately, this logic is
coded in the driver's probe function and doesn't allow the user to decide
what to do during shutdown/reboot.

On the other hand, the Xen and Qemu watchdog drivers (xen_wdt and i6300esb)
may be configured to either send an NMI or turn off/reboot the VM as
the watchdog action. As the kernel may get stuck in any state, sending
NMIs can't reliably reboot the VM.

At Arista, we benefited from the following set-up: the emulated watchdogs
trigger a VM reset, and softdog is set to catch less severe conditions and
generate a vmcore. Just before reboot, the watchdog's timeout is increased
to a good-enough value (3 minutes). That keeps the watchdog always running
and guarantees that the VM doesn't get stuck.

Provide new WDIOS_RUN_ON_REBOOT and WDIOS_STOP_ON_REBOOT ioctl options
to select the watchdog's strategy on reboot.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/watchdog/watchdog_dev.c | 12 ++++++++++++
 include/linux/watchdog.h        |  6 ++++++
 include/uapi/linux/watchdog.h   |  3 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 8b5c742f24e8..c854cd0245db 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -753,6 +753,18 @@ static long watchdog_ioctl(struct file *file, unsigned int cmd,
 		}
 		if (val & WDIOS_ENABLECARD)
 			err = watchdog_start(wdd);
+
+		if (val & WDIOS_RUN_ON_REBOOT) {
+			if (val & WDIOS_STOP_ON_REBOOT) {
+				err = -EINVAL;
+				break;
+			}
+			watchdog_run_on_reboot(wdd);
+			err = 0;
+		} else if (val & WDIOS_STOP_ON_REBOOT) {
+			watchdog_stop_on_reboot(wdd);
+			err = 0;
+		}
 		break;
 	case WDIOC_KEEPALIVE:
 		if (!(wdd->info->options & WDIOF_KEEPALIVEPING)) {
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 417d9f37077a..9e2ca7754631 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -150,6 +150,12 @@ static inline void watchdog_stop_on_reboot(struct watchdog_device *wdd)
 	set_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
 }
 
+/* Use the following function to keep the watchdog running on reboot */
+static inline void watchdog_run_on_reboot(struct watchdog_device *wdd)
+{
+	clear_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
+}
+
 /* Use the following function to stop the watchdog when unregistering it */
 static inline void watchdog_stop_on_unregister(struct watchdog_device *wdd)
 {
diff --git a/include/uapi/linux/watchdog.h b/include/uapi/linux/watchdog.h
index b15cde5c9054..bf19a5d3c987 100644
--- a/include/uapi/linux/watchdog.h
+++ b/include/uapi/linux/watchdog.h
@@ -53,6 +53,7 @@ struct watchdog_info {
 #define	WDIOS_DISABLECARD	0x0001	/* Turn off the watchdog timer */
 #define	WDIOS_ENABLECARD	0x0002	/* Turn on the watchdog timer */
 #define	WDIOS_TEMPPANIC		0x0004	/* Kernel panic on temperature trip */
-
+#define	WDIOS_RUN_ON_REBOOT	0x0008	/* Keep watchdog enabled on reboot */
+#define	WDIOS_STOP_ON_REBOOT	0x0010	/* Turn off the watchdog on reboot */
 
 #endif /* _UAPI_LINUX_WATCHDOG_H */
-- 
2.25.0



* Re: [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier
  2020-02-13 17:59 ` [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier Dmitry Safonov
@ 2020-02-13 19:12   ` Guenter Roeck
  2020-02-13 20:23     ` Dmitry Safonov
From: Guenter Roeck @ 2020-02-13 19:12 UTC
  To: Dmitry Safonov
  Cc: linux-kernel, Dmitry Safonov, Wim Van Sebroeck, linux-watchdog

On Thu, Feb 13, 2020 at 05:59:57PM +0000, Dmitry Safonov wrote:
> Many watchdog drivers use the watchdog_stop_on_reboot() helper in order
> to stop the watchdog on system reboot. Unfortunately, this logic is
> coded in the driver's probe function and doesn't allow the user to decide
> what to do during shutdown/reboot.
>
> On the other hand, the Xen and Qemu watchdog drivers (xen_wdt and i6300esb)
> may be configured to either send an NMI or turn off/reboot the VM as
> the watchdog action. As the kernel may get stuck in any state, sending
> NMIs can't reliably reboot the VM.
>
> At Arista, we benefited from the following set-up: the emulated watchdogs
> trigger a VM reset, and softdog is set to catch less severe conditions and
> generate a vmcore. Just before reboot, the watchdog's timeout is increased
> to a good-enough value (3 minutes). That keeps the watchdog always running
> and guarantees that the VM doesn't get stuck.
>
> As preparation for moving the decision whether to stop the watchdog on
> reboot to userspace, allow WDOG_STOP_ON_REBOOT to be set at runtime,
> not only during driver probe. Always register the reboot notifier and
> check WDOG_STOP_ON_REBOOT inside it (at actual reboot time).
> 

Does that really have to be decided at runtime, by the user ?
How about doing it with a module parameter ?

Also, I am not sure if an ioctl is the best means to do this, if it indeed
makes sense to decide it at runtime. ioctl implies an open watchdog device,
which interferes with the watchdog daemon. This means that the watchdog
daemon would have to be modified to support this, making this a quite expensive
change. It also implies that the action would have to be known when the
watchdog daemon is started, suggesting that a module parameter should be
sufficient.

Guenter


* Re: [PATCH 1/2] watchdog: Check WDOG_STOP_ON_REBOOT in reboot notifier
  2020-02-13 19:12   ` Guenter Roeck
@ 2020-02-13 20:23     ` Dmitry Safonov
From: Dmitry Safonov @ 2020-02-13 20:23 UTC
  To: Guenter Roeck
  Cc: linux-kernel, Dmitry Safonov, Wim Van Sebroeck, linux-watchdog

Hi Guenter,

On 2/13/20 7:12 PM, Guenter Roeck wrote:
> Does that really have to be decided at runtime, by the user ?
> How about doing it with a module parameter ?
> 
> Also, I am not sure if an ioctl is the best means to do this, if it indeed
> makes sense to decide it at runtime. ioctl implies an open watchdog device,
> which interferes with the watchdog daemon. This means that the watchdog
> daemon would have to be modified to support this, making this a quite expensive
> change. It also implies that the action would have to be known when the
> watchdog daemon is started, suggesting that a module parameter should be
> sufficient.

Yes, fair points. I went with ioctl() because the timeout can be changed
at runtime. But you're right, I'll look into making it a module
parameter instead.

Thanks for the review and time,
          Dmitry

