linux-watchdog.vger.kernel.org archive mirror
From: Jerry Hoemann <jerry.hoemann@hpe.com>
To: Ivan Mironov <mironov.ivan@gmail.com>
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wim Van Sebroeck <wim@linux-watchdog.org>,
	Guenter Roeck <linux@roeck-us.net>
Subject: Re: [RFC PATCH 0/4] watchdog: hpwdt: Fix NMI-related behaviour when CONFIG_HPWDT_NMI_DECODING is enabled
Date: Tue, 15 Jan 2019 19:22:42 -0700
Message-ID: <20190116022242.GC18342@anatevka>
In-Reply-To: <20190114023617.10656-1-mironov.ivan@gmail.com>

On Mon, Jan 14, 2019 at 07:36:13AM +0500, Ivan Mironov wrote:
> Hi,
> 
> I found out that hpwdt alters NMI behaviour unexpectedly if compiled
> with enabled CONFIG_HPWDT_NMI_DECODING:
> 
>  * System starts to panic on any NMI with misleading message.

hpwdt doesn't start to panic on any NMI.  It starts to panic on:

1) NMI_SERR		associated with system errors (SERR)
2) NMI_IO_CHECK		associated with IO errors
3) NMI_UNKNOWN		NMI unclaimed by all local handlers.

On Gen10 going forward we plan to restrict to just iLO
generated NMIs.
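
The policy above can be sketched as a small userspace model.  This is
only an illustration of the rules as stated in this mail, not the
driver's code: the enum, the function name hpwdt_would_panic() and the
two boolean inputs are hypothetical, and how the driver would decide
that a platform is Gen10 or that iLO raised the NMI is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NMI types named above; the values are
 * illustrative and are not taken from the kernel headers. */
enum nmi_kind {
	KIND_LOCAL,	/* claimed by some other local handler */
	KIND_UNKNOWN,	/* unclaimed by all local handlers */
	KIND_SERR,
	KIND_IO_CHECK,
	KIND_MAX
};

/*
 * Model of the policy described in this mail, not the driver's code.
 * gen10_or_later and ilo_is_source are assumed inputs.
 */
static bool hpwdt_would_panic(enum nmi_kind kind, bool gen10_or_later,
			      bool ilo_is_source)
{
	assert(kind < KIND_MAX);

	/* Planned Gen10 behaviour: only iLO-generated NMIs. */
	if (gen10_or_later)
		return ilo_is_source;

	/* Legacy platforms: the three types listed above. */
	return kind == KIND_SERR || kind == KIND_IO_CHECK ||
	       kind == KIND_UNKNOWN;
}
```

In this model a legacy platform panics on an unclaimed NMI regardless
of what iLO reports, while a Gen10 platform ignores it unless iLO is
flagged as the source.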

There is a long history on HP/HPE ProLiant systems where hpwdt
was the handler of general IO errors (at least the ones that would
cause an NMI to be generated), and we chose to panic in these
situations as the errors were generally quite serious.

Yes, this has caused some problems in the past as Linux has
overloaded NMI and some subsystems didn't claim the NMIs that
they generated (think profiling.)  But, I haven't seen these
types of problems for several years now.

The more modern platforms have more robust error handling built
into them and into Linux, so going forward we'll restrict hpwdt to a
more traditional WDT role.  But we're retaining the more conservative
approach for legacy platforms.

How would you suggest that the message be enhanced?


>  * Watchdog provided by hpwdt is not working after such panic.
> 
> Here are the patches that should fix this.
> 
> This is an RFC patch series because I am not sure that patches are
> correct. Questions:
> 
>  * Are "mynmi" flags always set on all supported iLO versions when iLO
>    is the source of NMI?


Unfortunately no.

hpwdt is a dual purpose driver.  It handles the iLO watchdog timer
and the "Generate NMI to System" button.  These are closely related
hardware wise.

However, some platforms generate an NMI for the "Generate NMI to System"
button without signaling it via the iLO registers.  These show up as
NMI_UNKNOWN, which is why hpwdt still claims them.

There are also some systems that do not set the nmistat bits correctly.

So as not to break legacy platforms, the nmistat bits will be used for
control only on Gen10 going forward.



>  * Is it safe to reset "mynmi" flags to zero if code decides to not panic?

Reading the registers is itself destructive (it sets them to zero), but
the real issue is that some ProLiant systems lack the ability to
acknowledge the NMI, so only one can ever be received.  Returning from
the handler is therefore not advisable, as no further NMI will be
generated via this path.  A reset through firmware is required to
restore the feature.
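
To illustrate the destructive read only (not the missing NMI
acknowledge, which cannot be modelled this simply), here is a minimal
userspace sketch.  struct fake_nmistat, fake_nmistat_read() and
nmi_is_mine() are hypothetical names, and the mask passed in by the
caller is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a read-to-clear status register. */
struct fake_nmistat {
	uint8_t bits;
};

/* Reading returns the current bits and zeroes them, as a read-to-clear
 * register would. */
static uint8_t fake_nmistat_read(struct fake_nmistat *reg)
{
	uint8_t value;

	assert(reg != NULL);
	value = reg->bits;
	reg->bits = 0;
	return value;
}

/* Read once and keep the result; a second read would see zero. */
static int nmi_is_mine(struct fake_nmistat *reg, uint8_t mask)
{
	uint8_t latched = fake_nmistat_read(reg);

	return (latched & mask) != 0;
}
```

Because the first read clears the bits, any code that needs the status
more than once has to work from the latched copy rather than read the
register again.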


> 
> Ivan Mironov (4):
>   watchdog: hpwdt: Don't disable watchdog on NMI
>   watchdog: hpwdt: Don't panic on foreign NMI
>   watchdog: hpwdt: Add more information into message
>   watchdog: hpwdt: Make panic behaviour configurable
> 
>  drivers/watchdog/hpwdt.c | 45 ++++++++++++++++++++++------------------
>  1 file changed, 25 insertions(+), 20 deletions(-)
> 
> -- 
> 2.20.1

-- 

-----------------------------------------------------------------------------
Jerry Hoemann                  Software Engineer   Hewlett Packard Enterprise
-----------------------------------------------------------------------------
