linux-watchdog.vger.kernel.org archive mirror
* [PATCH 0/2] watchdog/hpwdt: Disable Pretimeout/NMI in Crash Path
@ 2020-11-23  2:08 Jerry Hoemann
  2020-11-23  2:08 ` [PATCH 1/2] watchdog/hpwdt: Disable NMI in Crash Kernel Jerry Hoemann
  2020-11-23  2:08 ` [PATCH 2/2] watchdog/hpwdt: Reflect changes Jerry Hoemann
  0 siblings, 2 replies; 5+ messages in thread
From: Jerry Hoemann @ 2020-11-23  2:08 UTC (permalink / raw)
  To: linux, wim; +Cc: kasong, linux-watchdog, Jerry Hoemann

An intermittent issue was first noticed on RHEL 8.x during kdump.
When the dump had completed and the system was in the process of
resetting, an NMI would be generated as a result of an I/O error.

For a discussion of the underlying cause and an attempted fix, see:
	https://lkml.org/lkml/2019/12/25/159

The kernel's handling of the NMI generated an intermittent
secondary NMI that would hang the system.

As systemd enables the WDT during shutdown, the WDT should have broken
the system out of the hang, but hpwdt_pretimeout stops the WDT
in order to allow the collection of a kdump.  Since we are
already in the crash kernel when the NMI is received, stopping
the WDT is not necessary.

Jerry Hoemann (2):
  watchdog/hpwdt: Disable NMI in Crash Kernel
  watchdog/hpwdt: Reflect changes

 drivers/watchdog/hpwdt.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
1.8.3.1



* [PATCH 1/2] watchdog/hpwdt: Disable NMI in Crash Kernel
  2020-11-23  2:08 [PATCH 0/2] watchdog/hpwdt: Disable Pretimeout/NMI in Crash Path Jerry Hoemann
@ 2020-11-23  2:08 ` Jerry Hoemann
  2020-11-23  2:20   ` Guenter Roeck
  2020-11-23  2:08 ` [PATCH 2/2] watchdog/hpwdt: Reflect changes Jerry Hoemann
  1 sibling, 1 reply; 5+ messages in thread
From: Jerry Hoemann @ 2020-11-23  2:08 UTC (permalink / raw)
  To: linux, wim; +Cc: kasong, linux-watchdog, Jerry Hoemann

NMIs received during the crash path are problematic because
hpwdt_pretimeout's handling of the NMI would cause a reentry into kdump.

The situation is complicated in that I/O errors can be signaled as NMIs,
circumventing hpwdt_pretimeout's attempt to avoid claiming NMIs that are
not associated with either the WDT or the iLO NMI switch.  These NMIs can
additionally cause a secondary NMI, which causes the system to hang.

By disabling pretimeout and kdumptimeout in the crash path, we both reduce
the risk of receiving an NMI and simultaneously leave the WDT running
(if it was already in use), so that a WDT reset can break the system out
of a hang.
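
The parameter handling described above can be modeled outside the kernel
as a rough sketch. This is not the driver code: struct wdt_params and
adjust_params() are hypothetical names, the in_kdump_kernel flag stands in
for the kernel's is_kdump_kernel(), and the value of PRETIMEOUT_SEC is
assumed for illustration only.

```c
#include <stdbool.h>

/* Assumed value for illustration; the driver defines its own constant. */
#define PRETIMEOUT_SEC 9

/* Hypothetical container for the two module parameters the patch touches. */
struct wdt_params {
	bool pretimeout;   /* true: pretimeout NMI is requested */
	int  kdumptimeout; /* set to 0 in the crash kernel by the patch */
};

/*
 * Mirror the order of checks in the patched probe path:
 * 1. In the crash kernel, zero both parameters.
 * 2. Otherwise, drop pretimeout if the timeout is too short for it.
 */
static struct wdt_params adjust_params(struct wdt_params p,
				       bool in_kdump_kernel,
				       unsigned int timeout)
{
	if (in_kdump_kernel) {
		p.pretimeout = false;
		p.kdumptimeout = 0;
	}

	if (p.pretimeout && timeout <= PRETIMEOUT_SEC)
		p.pretimeout = false;

	return p;
}
```

In this model, calling adjust_params() with in_kdump_kernel set always
returns pretimeout disabled and kdumptimeout equal to 0, regardless of the
values passed in; outside the crash kernel the parameters are left alone
unless the timeout is too short for a pretimeout.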

Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
---
 drivers/watchdog/hpwdt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 7d34bcf..eeb4df2 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <asm/nmi.h>
+#include <linux/crash_dump.h>
 
 #define HPWDT_VERSION			"2.0.3"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
@@ -334,6 +335,11 @@ static int hpwdt_init_one(struct pci_dev *dev,
 	watchdog_set_nowayout(&hpwdt_dev, nowayout);
 	watchdog_init_timeout(&hpwdt_dev, soft_margin, NULL);
 
+	if (is_kdump_kernel()) {
+		pretimeout = 0;
+		kdumptimeout = 0;
+	}
+
 	if (pretimeout && hpwdt_dev.timeout <= PRETIMEOUT_SEC) {
 		dev_warn(&dev->dev, "timeout <= pretimeout. Setting pretimeout to zero\n");
 		pretimeout = 0;
-- 
1.8.3.1



* [PATCH 2/2] watchdog/hpwdt: Reflect changes
  2020-11-23  2:08 [PATCH 0/2] watchdog/hpwdt: Disable Pretimeout/NMI in Crash Path Jerry Hoemann
  2020-11-23  2:08 ` [PATCH 1/2] watchdog/hpwdt: Disable NMI in Crash Kernel Jerry Hoemann
@ 2020-11-23  2:08 ` Jerry Hoemann
  2020-11-23  2:20   ` Guenter Roeck
  1 sibling, 1 reply; 5+ messages in thread
From: Jerry Hoemann @ 2020-11-23  2:08 UTC (permalink / raw)
  To: linux, wim; +Cc: kasong, linux-watchdog, Jerry Hoemann

Bump the driver version number from 2.0.3 to 2.0.4 to reflect recent changes.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
---
 drivers/watchdog/hpwdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index eeb4df2..cbd1498 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -23,7 +23,7 @@
 #include <asm/nmi.h>
 #include <linux/crash_dump.h>
 
-#define HPWDT_VERSION			"2.0.3"
+#define HPWDT_VERSION			"2.0.4"
 #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
 #define TICKS_TO_SECS(ticks)		((ticks) * 128 / 1000)
 #define HPWDT_MAX_TICKS			65535
-- 
1.8.3.1



* Re: [PATCH 1/2] watchdog/hpwdt: Disable NMI in Crash Kernel
  2020-11-23  2:08 ` [PATCH 1/2] watchdog/hpwdt: Disable NMI in Crash Kernel Jerry Hoemann
@ 2020-11-23  2:20   ` Guenter Roeck
  0 siblings, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2020-11-23  2:20 UTC (permalink / raw)
  To: Jerry Hoemann; +Cc: wim, kasong, linux-watchdog

On Sun, Nov 22, 2020 at 07:08:39PM -0700, Jerry Hoemann wrote:
> NMIs received during the crash path are problematic because
> hpwdt_pretimeout's handling of the NMI would cause a reentry into kdump.
> 
> The situation is complicated in that I/O errors can be signaled as NMIs,
> circumventing hpwdt_pretimeout's attempt to avoid claiming NMIs that are
> not associated with either the WDT or the iLO NMI switch.  These NMIs can
> additionally cause a secondary NMI, which causes the system to hang.
> 
> By disabling pretimeout and kdumptimeout in the crash path, we both reduce
> the risk of receiving an NMI and simultaneously leave the WDT running
> (if it was already in use), so that a WDT reset can break the system out
> of a hang.
> 
> Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  drivers/watchdog/hpwdt.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index 7d34bcf..eeb4df2 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -21,6 +21,7 @@
>  #include <linux/types.h>
>  #include <linux/watchdog.h>
>  #include <asm/nmi.h>
> +#include <linux/crash_dump.h>
>  
>  #define HPWDT_VERSION			"2.0.3"
>  #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
> @@ -334,6 +335,11 @@ static int hpwdt_init_one(struct pci_dev *dev,
>  	watchdog_set_nowayout(&hpwdt_dev, nowayout);
>  	watchdog_init_timeout(&hpwdt_dev, soft_margin, NULL);
>  
> +	if (is_kdump_kernel()) {
> +		pretimeout = 0;
> +		kdumptimeout = 0;
> +	}
> +
>  	if (pretimeout && hpwdt_dev.timeout <= PRETIMEOUT_SEC) {
>  		dev_warn(&dev->dev, "timeout <= pretimeout. Setting pretimeout to zero\n");
>  		pretimeout = 0;
> -- 
> 1.8.3.1
> 


* Re: [PATCH 2/2] watchdog/hpwdt: Reflect changes
  2020-11-23  2:08 ` [PATCH 2/2] watchdog/hpwdt: Reflect changes Jerry Hoemann
@ 2020-11-23  2:20   ` Guenter Roeck
  0 siblings, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2020-11-23  2:20 UTC (permalink / raw)
  To: Jerry Hoemann; +Cc: wim, kasong, linux-watchdog

On Sun, Nov 22, 2020 at 07:08:40PM -0700, Jerry Hoemann wrote:
> Bump the driver version number from 2.0.3 to 2.0.4 to reflect recent changes.
> 
> Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  drivers/watchdog/hpwdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index eeb4df2..cbd1498 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -23,7 +23,7 @@
>  #include <asm/nmi.h>
>  #include <linux/crash_dump.h>
>  
> -#define HPWDT_VERSION			"2.0.3"
> +#define HPWDT_VERSION			"2.0.4"
>  #define SECS_TO_TICKS(secs)		((secs) * 1000 / 128)
>  #define TICKS_TO_SECS(ticks)		((ticks) * 128 / 1000)
>  #define HPWDT_MAX_TICKS			65535
> -- 
> 1.8.3.1
> 

