All of lore.kernel.org
 help / color / mirror / Atom feed
* ipmi watchdog questions
@ 2014-05-01 13:58 Don Zickus
  2014-05-02  0:38 ` Corey Minyard
  0 siblings, 1 reply; 14+ messages in thread
From: Don Zickus @ 2014-05-01 13:58 UTC (permalink / raw)
  To: minyard; +Cc: openipmi-developer, linux-watchdog

Hi Corey,

I stumbled upon an issue with a partner of ours, where they booted their
machine and tried to load the ipmi_watchdog module by hand and it failed.

The reason it failed was that the iTCO watchdog driver was already loaded
and it registered the misc device /dev/watchdog first.

I looked at the ipmi watchdog driver and realized it was never converted
to the new watchdog framework where the watchdog_core module manages the
'/dev/watchdog' misc device.

So being naive and not knowing much about IPMI, I decided to follow the
helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt
and convert the ipmi_watchdog to use the new watchdog framework.

I ran into a few issues and then realized the driver itself never really
binds to any hardware, so it makes the conversion process a little more
challenging.

So a few questions to you before I waste my time in this area:

- Is there any prior history about why the ipmi_watchdog was never
  converted to the new watchdog framework?  Lack of interest? Technical
hurdles?

- Is there a reason why the ipmi_watchdog is a seperate module as opposed
  to being called by ipmi_si?  It seems there shouldn't be an issue with
the watchdog always loaded, it just won't do anything until someone opens
it (from my understanding).  Also you would gain the ability to use the
shutdown/remove routines properly instead of the reboot/panic notifiers.

  In addition, passing the pointer to the 'struct watchdog_device' would be
easier if some of those extra pieces were not there (as opposed to making
it a global reference).

- What does the fasync and poll calls do for a watchdog?

I'll start with that for now.

I appreciate any feedback.  Currently we just implemented blacklisting the
iTCO watchdog driver to workaround this problem.  I thought we could do
better, hence my motivation to do work in this area.

Cheers,
Don


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-01 13:58 ipmi watchdog questions Don Zickus
@ 2014-05-02  0:38 ` Corey Minyard
  2014-05-02  1:11   ` Guenter Roeck
  2014-05-02 15:10   ` Don Zickus
  0 siblings, 2 replies; 14+ messages in thread
From: Corey Minyard @ 2014-05-02  0:38 UTC (permalink / raw)
  To: Don Zickus; +Cc: openipmi-developer, linux-watchdog

On 05/01/2014 08:58 AM, Don Zickus wrote:
> Hi Corey,
>
> I stumbled upon an issue with a partner of ours, where they booted their
> machine and tried to load the ipmi_watchdog module by hand and it failed.
>
> The reason it failed was that the iTCO watchdog driver was already loaded
> and it registered the misc device /dev/watchdog first.
>
> I looked at the ipmi watchdog driver and realized it was never converted
> to the new watchdog framework where the watchdog_core module manages the
> '/dev/watchdog' misc device.
>
> So being naive and not knowing much about IPMI, I decided to follow the
> helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt
> and convert the ipmi_watchdog to use the new watchdog framework.
>
> I ran into a few issues and then realized the driver itself never really
> binds to any hardware, so it makes the conversion process a little more
> challenging.
>
> So a few questions to you before I waste my time in this area:
>
> - Is there any prior history about why the ipmi_watchdog was never
>   converted to the new watchdog framework?  Lack of interest? Technical
> hurdles?

Mostly lack of interest, but there are some technical hurdles.

It would be hard to implement some things.  The watchdog framework has
no concept of pretimeouts.  And IPMI is message based, you send a
message to a controller to do anything, and you have to wait for the
response.  That doesn't work very well with the watchdog interface,
which assumes you can do everything immediately.

>
> - Is there a reason why the ipmi_watchdog is a seperate module as opposed
>   to being called by ipmi_si?  It seems there shouldn't be an issue with
> the watchdog always loaded, it just won't do anything until someone opens
> it (from my understanding).  Also you would gain the ability to use the
> shutdown/remove routines properly instead of the reboot/panic notifiers.

I'm not sure I understand this.  Why would you want it as part of
ipmi_si?  ipmi_msghandler would be a little more logical, but IMHO still
doesn't make sense.  It uses the IPMI interface, and the interface is
designed to have multiple users.  Better to keep it separate because
it's a separate function.

I also don't understand the comment about shutdown/remove instead of
reboot/panic.  Can you elaborate on that?

>   In addition, passing the pointer to the 'struct watchdog_device' would be
> easier if some of those extra pieces were not there (as opposed to making
> it a global reference).
>
> - What does the fasync and poll calls do for a watchdog?

The IPMI watchdog has the ability to report a pretimeout at a specific
amount of time before the final timeout, presumably to take some action
before the system reboots.  the fasync and poll (and read) calls let
this be reported to the user.

>
> I'll start with that for now.
>
> I appreciate any feedback.  Currently we just implemented blacklisting the
> iTCO watchdog driver to workaround this problem.  I thought we could do
> better, hence my motivation to do work in this area.

It would be nice, yes.  I'm afraid to get all the functionality would be
a lot of hacking on the watchdog framework or removal of function from
the driver.

Thanks,

-corey

>
> Cheers,
> Don
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02  0:38 ` Corey Minyard
@ 2014-05-02  1:11   ` Guenter Roeck
  2014-05-02  4:38     ` Corey Minyard
  2014-05-02 15:10   ` Don Zickus
  1 sibling, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2014-05-02  1:11 UTC (permalink / raw)
  To: minyard, Don Zickus; +Cc: openipmi-developer, linux-watchdog

On 05/01/2014 05:38 PM, Corey Minyard wrote:
> On 05/01/2014 08:58 AM, Don Zickus wrote:
>> Hi Corey,
>>
>> I stumbled upon an issue with a partner of ours, where they booted their
>> machine and tried to load the ipmi_watchdog module by hand and it failed.
>>
>> The reason it failed was that the iTCO watchdog driver was already loaded
>> and it registered the misc device /dev/watchdog first.
>>
>> I looked at the ipmi watchdog driver and realized it was never converted
>> to the new watchdog framework where the watchdog_core module manages the
>> '/dev/watchdog' misc device.
>>
>> So being naive and not knowing much about IPMI, I decided to follow the
>> helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt
>> and convert the ipmi_watchdog to use the new watchdog framework.
>>
>> I ran into a few issues and then realized the driver itself never really
>> binds to any hardware, so it makes the conversion process a little more
>> challenging.
>>
>> So a few questions to you before I waste my time in this area:
>>
>> - Is there any prior history about why the ipmi_watchdog was never
>>    converted to the new watchdog framework?  Lack of interest? Technical
>> hurdles?
>
> Mostly lack of interest, but there are some technical hurdles.
>
> It would be hard to implement some things.  The watchdog framework has
> no concept of pretimeouts.  And IPMI is message based, you send a

Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't work
for ipmi ? If so, can you explain ?

> message to a controller to do anything, and you have to wait for the
> response.  That doesn't work very well with the watchdog interface,
> which assumes you can do everything immediately.
>
Does it ? How so ? Please elaborate; I don't immediately see how the watchdog
subsystem would prevent you from using, say, kernel threads or delayed workers
to implement asynchronous access to or from any underlying hardware.

Guenter


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02  1:11   ` Guenter Roeck
@ 2014-05-02  4:38     ` Corey Minyard
  2014-05-02 13:17       ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Corey Minyard @ 2014-05-02  4:38 UTC (permalink / raw)
  To: Guenter Roeck, Don Zickus; +Cc: openipmi-developer, linux-watchdog

On 05/01/2014 08:11 PM, Guenter Roeck wrote:
> On 05/01/2014 05:38 PM, Corey Minyard wrote:
>> On 05/01/2014 08:58 AM, Don Zickus wrote:
>>> Hi Corey,
>>>
>>> I stumbled upon an issue with a partner of ours, where they booted
>>> their
>>> machine and tried to load the ipmi_watchdog module by hand and it
>>> failed.
>>>
>>> The reason it failed was that the iTCO watchdog driver was already
>>> loaded
>>> and it registered the misc device /dev/watchdog first.
>>>
>>> I looked at the ipmi watchdog driver and realized it was never
>>> converted
>>> to the new watchdog framework where the watchdog_core module manages
>>> the
>>> '/dev/watchdog' misc device.
>>>
>>> So being naive and not knowing much about IPMI, I decided to follow the
>>> helpful document
>>> Documentation/watchdog/convert_drivers_to_kernel_api.txt
>>> and convert the ipmi_watchdog to use the new watchdog framework.
>>>
>>> I ran into a few issues and then realized the driver itself never
>>> really
>>> binds to any hardware, so it makes the conversion process a little more
>>> challenging.
>>>
>>> So a few questions to you before I waste my time in this area:
>>>
>>> - Is there any prior history about why the ipmi_watchdog was never
>>>    converted to the new watchdog framework?  Lack of interest?
>>> Technical
>>> hurdles?
>>
>> Mostly lack of interest, but there are some technical hurdles.
>>
>> It would be hard to implement some things.  The watchdog framework has
>> no concept of pretimeouts.  And IPMI is message based, you send a
>
> Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't
> work
> for ipmi ? If so, can you explain ?
>

That isn't enough to be able to report the pretimeout to the user.  You
can set it and get it with those calls, but it also needs poll, fasync,
and read to be able to select on a pretimeout or block on a read.


>> message to a controller to do anything, and you have to wait for the
>> response.  That doesn't work very well with the watchdog interface,
>> which assumes you can do everything immediately.
>>
> Does it ? How so ? Please elaborate; I don't immediately see how the
> watchdog
> subsystem would prevent you from using, say, kernel threads or delayed
> workers
> to implement asynchronous access to or from any underlying hardware.
>
I was remembering you couldn't block in the various calls, but looking
at the code, I see that's not an issue.

-corey

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02  4:38     ` Corey Minyard
@ 2014-05-02 13:17       ` Guenter Roeck
  2014-05-02 16:44         ` Don Zickus
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2014-05-02 13:17 UTC (permalink / raw)
  To: minyard, Don Zickus; +Cc: openipmi-developer, linux-watchdog

On 05/01/2014 09:38 PM, Corey Minyard wrote:
> On 05/01/2014 08:11 PM, Guenter Roeck wrote:
>> On 05/01/2014 05:38 PM, Corey Minyard wrote:
>>> On 05/01/2014 08:58 AM, Don Zickus wrote:
>>>> Hi Corey,
>>>>
>>>> I stumbled upon an issue with a partner of ours, where they booted
>>>> their
>>>> machine and tried to load the ipmi_watchdog module by hand and it
>>>> failed.
>>>>
>>>> The reason it failed was that the iTCO watchdog driver was already
>>>> loaded
>>>> and it registered the misc device /dev/watchdog first.
>>>>
>>>> I looked at the ipmi watchdog driver and realized it was never
>>>> converted
>>>> to the new watchdog framework where the watchdog_core module manages
>>>> the
>>>> '/dev/watchdog' misc device.
>>>>
>>>> So being naive and not knowing much about IPMI, I decided to follow the
>>>> helpful document
>>>> Documentation/watchdog/convert_drivers_to_kernel_api.txt
>>>> and convert the ipmi_watchdog to use the new watchdog framework.
>>>>
>>>> I ran into a few issues and then realized the driver itself never
>>>> really
>>>> binds to any hardware, so it makes the conversion process a little more
>>>> challenging.
>>>>
>>>> So a few questions to you before I waste my time in this area:
>>>>
>>>> - Is there any prior history about why the ipmi_watchdog was never
>>>>     converted to the new watchdog framework?  Lack of interest?
>>>> Technical
>>>> hurdles?
>>>
>>> Mostly lack of interest, but there are some technical hurdles.
>>>
>>> It would be hard to implement some things.  The watchdog framework has
>>> no concept of pretimeouts.  And IPMI is message based, you send a
>>
>> Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't
>> work
>> for ipmi ? If so, can you explain ?
>>
>
> That isn't enough to be able to report the pretimeout to the user.  You
> can set it and get it with those calls, but it also needs poll, fasync,
> and read to be able to select on a pretimeout or block on a read.
>

Ah, but now you are talking about a specific implementation, which is a bit
different. The question here is what you expect to occur when a pretimeout
happens, and you have a certain set of expectations. Personally I don't know
what the best solution is; maybe a sysfs attribute or, yes, some activity
on the watchdog device entry. Why don't you (or Don) suggest something
and come up with a patch set for review ?

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02  0:38 ` Corey Minyard
  2014-05-02  1:11   ` Guenter Roeck
@ 2014-05-02 15:10   ` Don Zickus
  1 sibling, 0 replies; 14+ messages in thread
From: Don Zickus @ 2014-05-02 15:10 UTC (permalink / raw)
  To: Corey Minyard; +Cc: openipmi-developer, linux-watchdog

On Thu, May 01, 2014 at 07:38:18PM -0500, Corey Minyard wrote:
> On 05/01/2014 08:58 AM, Don Zickus wrote:
> > Hi Corey,
> >
> > I stumbled upon an issue with a partner of ours, where they booted their
> > machine and tried to load the ipmi_watchdog module by hand and it failed.
> >
> > The reason it failed was that the iTCO watchdog driver was already loaded
> > and it registered the misc device /dev/watchdog first.
> >
> > I looked at the ipmi watchdog driver and realized it was never converted
> > to the new watchdog framework where the watchdog_core module manages the
> > '/dev/watchdog' misc device.
> >
> > So being naive and not knowing much about IPMI, I decided to follow the
> > helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt
> > and convert the ipmi_watchdog to use the new watchdog framework.
> >
> > I ran into a few issues and then realized the driver itself never really
> > binds to any hardware, so it makes the conversion process a little more
> > challenging.
> >
> > So a few questions to you before I waste my time in this area:
> >
> > - Is there any prior history about why the ipmi_watchdog was never
> >   converted to the new watchdog framework?  Lack of interest? Technical
> > hurdles?
> 
> Mostly lack of interest, but there are some technical hurdles.

Hi Corey,

Thanks for all the responses.

> 
> It would be hard to implement some things.  The watchdog framework has
> no concept of pretimeouts.  And IPMI is message based, you send a
> message to a controller to do anything, and you have to wait for the
> response.  That doesn't work very well with the watchdog interface,
> which assumes you can do everything immediately.

I will defer this conversation to Guenter's expertise.  I am willing to hack
up any suggestions the both of you come up with here to see if everything
works well (same goes for the fasync/poll stuff).

> 
> >
> > - Is there a reason why the ipmi_watchdog is a seperate module as opposed
> >   to being called by ipmi_si?  It seems there shouldn't be an issue with
> > the watchdog always loaded, it just won't do anything until someone opens
> > it (from my understanding).  Also you would gain the ability to use the
> > shutdown/remove routines properly instead of the reboot/panic notifiers.
> 
> I'm not sure I understand this.  Why would you want it as part of
> ipmi_si?  ipmi_msghandler would be a little more logical, but IMHO still
> doesn't make sense.  It uses the IPMI interface, and the interface is
> designed to have multiple users.  Better to keep it separate because
> it's a separate function.
> 
> I also don't understand the comment about shutdown/remove instead of
> reboot/panic.  Can you elaborate on that?

So part of the problem with ipmi_watchdog is that it can't load
automatically (like a normal driver).  The reason is that it doesn't
attach to any device.  My suggestion to roll it into ipmi_si (or
ipmi_msghandler works too), was to help with the autoloading part.

To counter-argue the argument that a customer may not want the watchdog
running, I would argue that it does nothing until someone opens it the
first time.  So autoloading shouldn't have a big downside to it.

The second part I was trying to change is to remove the panic/reboot
notifiers and instead use proper shutdown/remove functions.  This would
make it easier to use the 'struct watchdog_device' pointer as the
pointer could be embedded in the per device struct.  Though on the other
hand, there is only ever one ipmi device per system, so maybe having it
global isn't a big deal.  I was trying to think of scalability issues.

> 
> >   In addition, passing the pointer to the 'struct watchdog_device' would be
> > easier if some of those extra pieces were not there (as opposed to making
> > it a global reference).
> >
> > - What does the fasync and poll calls do for a watchdog?
> 
> The IPMI watchdog has the ability to report a pretimeout at a specific
> amount of time before the final timeout, presumably to take some action
> before the system reboots.  the fasync and poll (and read) calls let
> this be reported to the user.
> 
> >
> > I'll start with that for now.
> >
> > I appreciate any feedback.  Currently we just implemented blacklisting the
> > iTCO watchdog driver to workaround this problem.  I thought we could do
> > better, hence my motivation to do work in this area.
> 
> It would be nice, yes.  I'm afraid to get all the functionality would be
> a lot of hacking on the watchdog framework or removal of function from
> the driver.

We can take it step by step and see how we can get there.  Again my goal
was to help a partner of ours get the ipmi_watchdog to play nice in their
system without resorting to blacklisting other watchdogs.  In addition, I
thought it would help with out of the box configuration if the
ipmi_watchdog could autoload if the ipmi pieces were loaded in the system
too instead of having to add an entry to /etc/modprobe.d.

Thanks for the help so far!

Cheers,
Don

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 13:17       ` Guenter Roeck
@ 2014-05-02 16:44         ` Don Zickus
  2014-05-02 17:18           ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Don Zickus @ 2014-05-02 16:44 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: minyard, openipmi-developer, linux-watchdog

On Fri, May 02, 2014 at 06:17:51AM -0700, Guenter Roeck wrote:
> On 05/01/2014 09:38 PM, Corey Minyard wrote:
> >On 05/01/2014 08:11 PM, Guenter Roeck wrote:
> >>On 05/01/2014 05:38 PM, Corey Minyard wrote:
> >>>On 05/01/2014 08:58 AM, Don Zickus wrote:
> >>>>Hi Corey,
> >>>>
> >>>>I stumbled upon an issue with a partner of ours, where they booted
> >>>>their
> >>>>machine and tried to load the ipmi_watchdog module by hand and it
> >>>>failed.
> >>>>
> >>>>The reason it failed was that the iTCO watchdog driver was already
> >>>>loaded
> >>>>and it registered the misc device /dev/watchdog first.
> >>>>
> >>>>I looked at the ipmi watchdog driver and realized it was never
> >>>>converted
> >>>>to the new watchdog framework where the watchdog_core module manages
> >>>>the
> >>>>'/dev/watchdog' misc device.
> >>>>
> >>>>So being naive and not knowing much about IPMI, I decided to follow the
> >>>>helpful document
> >>>>Documentation/watchdog/convert_drivers_to_kernel_api.txt
> >>>>and convert the ipmi_watchdog to use the new watchdog framework.
> >>>>
> >>>>I ran into a few issues and then realized the driver itself never
> >>>>really
> >>>>binds to any hardware, so it makes the conversion process a little more
> >>>>challenging.
> >>>>
> >>>>So a few questions to you before I waste my time in this area:
> >>>>
> >>>>- Is there any prior history about why the ipmi_watchdog was never
> >>>>    converted to the new watchdog framework?  Lack of interest?
> >>>>Technical
> >>>>hurdles?
> >>>
> >>>Mostly lack of interest, but there are some technical hurdles.
> >>>
> >>>It would be hard to implement some things.  The watchdog framework has
> >>>no concept of pretimeouts.  And IPMI is message based, you send a
> >>
> >>Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't
> >>work
> >>for ipmi ? If so, can you explain ?
> >>
> >
> >That isn't enough to be able to report the pretimeout to the user.  You
> >can set it and get it with those calls, but it also needs poll, fasync,
> >and read to be able to select on a pretimeout or block on a read.
> >
> 
> Ah, but now you are talking about a specific implementation, which is a bit
> different. The question here is what you expect to occur when a pretimeout
> happens, and you have a certain set of expectations. Personally I don't know
> what the best solution is; maybe a sysfs attribute or, yes, some activity
> on the watchdog device entry. Why don't you (or Don) suggest something
> and come up with a patch set for review ?

I look through the only other two watchdogs that I could find with
pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
notification, while kempld uses a low level configured action (nmi, smi,
sci, delay).  I think ipmi is the only one that chooses a user space
implementation (which raises another question[1]).

I can try to respectfully copy the ipmi implementation to watchdog_dev.c
and set a wdd->option to indicate its use and in addition add the
pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).

Otherwise I am not sure if adding read, fasync, and poll wrappers to
watchdog_dev.c looks like a dirty hack.

Cheers,
Don

[1] if the system is stuck such that the pretimeout goes off, is it even
possible for userspace to run?  Or guaranteed that it could run reliably?
Just curious behind the history for this addition.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 16:44         ` Don Zickus
@ 2014-05-02 17:18           ` Guenter Roeck
  2014-05-02 17:46             ` Don Zickus
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2014-05-02 17:18 UTC (permalink / raw)
  To: Don Zickus; +Cc: minyard, openipmi-developer, linux-watchdog

On 05/02/2014 09:44 AM, Don Zickus wrote:
> On Fri, May 02, 2014 at 06:17:51AM -0700, Guenter Roeck wrote:
>> On 05/01/2014 09:38 PM, Corey Minyard wrote:
>>> On 05/01/2014 08:11 PM, Guenter Roeck wrote:
>>>> On 05/01/2014 05:38 PM, Corey Minyard wrote:
>>>>> On 05/01/2014 08:58 AM, Don Zickus wrote:
>>>>>> Hi Corey,
>>>>>>
>>>>>> I stumbled upon an issue with a partner of ours, where they booted
>>>>>> their
>>>>>> machine and tried to load the ipmi_watchdog module by hand and it
>>>>>> failed.
>>>>>>
>>>>>> The reason it failed was that the iTCO watchdog driver was already
>>>>>> loaded
>>>>>> and it registered the misc device /dev/watchdog first.
>>>>>>
>>>>>> I looked at the ipmi watchdog driver and realized it was never
>>>>>> converted
>>>>>> to the new watchdog framework where the watchdog_core module manages
>>>>>> the
>>>>>> '/dev/watchdog' misc device.
>>>>>>
>>>>>> So being naive and not knowing much about IPMI, I decided to follow the
>>>>>> helpful document
>>>>>> Documentation/watchdog/convert_drivers_to_kernel_api.txt
>>>>>> and convert the ipmi_watchdog to use the new watchdog framework.
>>>>>>
>>>>>> I ran into a few issues and then realized the driver itself never
>>>>>> really
>>>>>> binds to any hardware, so it makes the conversion process a little more
>>>>>> challenging.
>>>>>>
>>>>>> So a few questions to you before I waste my time in this area:
>>>>>>
>>>>>> - Is there any prior history about why the ipmi_watchdog was never
>>>>>>     converted to the new watchdog framework?  Lack of interest?
>>>>>> Technical
>>>>>> hurdles?
>>>>>
>>>>> Mostly lack of interest, but there are some technical hurdles.
>>>>>
>>>>> It would be hard to implement some things.  The watchdog framework has
>>>>> no concept of pretimeouts.  And IPMI is message based, you send a
>>>>
>>>> Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't
>>>> work
>>>> for ipmi ? If so, can you explain ?
>>>>
>>>
>>> That isn't enough to be able to report the pretimeout to the user.  You
>>> can set it and get it with those calls, but it also needs poll, fasync,
>>> and read to be able to select on a pretimeout or block on a read.
>>>
>>
>> Ah, but now you are talking about a specific implementation, which is a bit
>> different. The question here is what you expect to occur when a pretimeout
>> happens, and you have a certain set of expectations. Personally I don't know
>> what the best solution is; maybe a sysfs attribute or, yes, some activity
>> on the watchdog device entry. Why don't you (or Don) suggest something
>> and come up with a patch set for review ?
>
> I look through the only other two watchdogs that I could find with
> pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
> notification, while kempld uses a low level configured action (nmi, smi,
> sci, delay).  I think ipmi is the only one that chooses a user space
> implementation (which raises another question[1]).
>
> I can try to respectfully copy the ipmi implementation to watchdog_dev.c
> and set a wdd->option to indicate its use and in addition add the
> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).
>
> Otherwise I am not sure if adding read, fasync, and poll wrappers to
> watchdog_dev.c looks like a dirty hack.
>
> Cheers,
> Don
>
> [1] if the system is stuck such that the pretimeout goes off, is it even
> possible for userspace to run?  Or guaranteed that it could run reliably?
> Just curious behind the history for this addition.
>

I would guess it depends. In most cases, I would assume it reflects that the
watchdog daemon did not run. This in turn may suggest that userspace is,
for all practical purposes, unable to run. Given that, I would suspect that
a solution which depends on user space to act will in most cases not be able
to fulfil its purpose, and I would not want to depend on it.

Note that kempld in practice only implements NMI, though the HW can
do more. I can ask Kontron for feedback on their opinion for possible
actions (and why they didn't implement other actions in the driver).

Guenter


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 17:18           ` Guenter Roeck
@ 2014-05-02 17:46             ` Don Zickus
  2014-05-02 21:52               ` Corey Minyard
  0 siblings, 1 reply; 14+ messages in thread
From: Don Zickus @ 2014-05-02 17:46 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: minyard, openipmi-developer, linux-watchdog

On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote:
> >>>That isn't enough to be able to report the pretimeout to the user.  You
> >>>can set it and get it with those calls, but it also needs poll, fasync,
> >>>and read to be able to select on a pretimeout or block on a read.
> >>>
> >>
> >>Ah, but now you are talking about a specific implementation, which is a bit
> >>different. The question here is what you expect to occur when a pretimeout
> >>happens, and you have a certain set of expectations. Personally I don't know
> >>what the best solution is; maybe a sysfs attribute or, yes, some activity
> >>on the watchdog device entry. Why don't you (or Don) suggest something
> >>and come up with a patch set for review ?
> >
> >I look through the only other two watchdogs that I could find with
> >pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
> >notification, while kempld uses a low level configured action (nmi, smi,
> >sci, delay).  I think ipmi is the only one that chooses a user space
> >implementation (which raises another question[1]).
> >
> >I can try to respectfully copy the ipmi implementation to watchdog_dev.c
> >and set a wdd->option to indicate its use and in addition add the
> >pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).
> >
> >Otherwise I am not sure if adding read, fasync, and poll wrappers to
> >watchdog_dev.c looks like a dirty hack.
> >
> >Cheers,
> >Don
> >
> >[1] if the system is stuck such that the pretimeout goes off, is it even
> >possible for userspace to run?  Or guaranteed that it could run reliably?
> >Just curious behind the history for this addition.
> >
> 
> I would guess it depends. In most cases, I would assume it reflects that the
> watchdog daemon did not run. This in turn may suggest that userspace is,
> for all practical purposes, unable to run. Given that, I would suspect that
> a solution which depends on user space to act will in most cases not be able
> to fulfil its purpose, and I would not want to depend on it.

I am sure Corey ran into one vendor who wanted it, hence why he
implemented it.  But yeah, I am not sure I would depend on it either.

> 
> Note that kempld in practice only implements NMI, though the HW can
> do more. I can ask Kontron for feedback on their opinion for possible
> actions (and why they didn't implement other actions in the driver).

I don't have much interest in it really.  I was just looking at what other
vendors did as reference.  NMI makes sense.  Though I don't see an NMI
handler, so I assume they just use the NMI to force a panic (like the
hpwdt does, sortof)?

Cheers,
Don

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 17:46             ` Don Zickus
@ 2014-05-02 21:52               ` Corey Minyard
  2014-05-02 23:20                 ` Guenter Roeck
  2014-05-03  2:10                 ` Don Zickus
  0 siblings, 2 replies; 14+ messages in thread
From: Corey Minyard @ 2014-05-02 21:52 UTC (permalink / raw)
  To: Don Zickus, Guenter Roeck; +Cc: openipmi-developer, linux-watchdog

On 05/02/2014 12:46 PM, Don Zickus wrote:
> On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote:
>>>>> That isn't enough to be able to report the pretimeout to the user.  You
>>>>> can set it and get it with those calls, but it also needs poll, fasync,
>>>>> and read to be able to select on a pretimeout or block on a read.
>>>>>
>>>> Ah, but now you are talking about a specific implementation, which is a bit
>>>> different. The question here is what you expect to occur when a pretimeout
>>>> happens, and you have a certain set of expectations. Personally I don't know
>>>> what the best solution is; maybe a sysfs attribute or, yes, some activity
>>>> on the watchdog device entry. Why don't you (or Don) suggest something
>>>> and come up with a patch set for review ?
>>> I look through the only other two watchdogs that I could find with
>>> pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
>>> notification, while kempld uses a low level configured action (nmi, smi,
>>> sci, delay).  I think ipmi is the only one that chooses a user space
>>> implementation (which raises another question[1]).
>>>
>>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c
>>> and set a wdd->option to indicate its use and in addition add the
>>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).
>>>
>>> Otherwise I am not sure if adding read, fasync, and poll wrappers to
>>> watchdog_dev.c looks like a dirty hack.
>>>
>>> Cheers,
>>> Don
>>>
>>> [1] if the system is stuck such that the pretimeout goes off, is it even
>>> possible for userspace to run?  Or guaranteed that it could run reliably?
>>> Just curious behind the history for this addition.
>>>
>> I would guess it depends. In most cases, I would assume it reflects that the
>> watchdog daemon did not run. This in turn may suggest that userspace is,
>> for all practical purposes, unable to run. Given that, I would suspect that
>> a solution which depends on user space to act will in most cases not be able
>> to fulfil its purpose, and I would not want to depend on it.
> I am sure Corey ran into one vendor who wanted it, hence why he
> implemented it.  But yeah, I am not sure I would depend on it either.

The driver hardware can do an NMI or an indication through an IPMI
interrupt/signal, and the driver can either panic or try to give
information to the user.

I don't exactly remember the reason for giving a pretimeout to the
user.  I think there were some older implementations that did this, so I
kept the function. You can't depend on it, of course, but if it can work
then debugging a problem with the watchdog daemon (or something else in
userland) would be a lot easier with an option like this.  I have no
idea if anyone uses it.

I do agree that the driver should be moved over to use the framework. 
Implementing read/poll should be easy in the framework.

Don, are you interested in working on this?  i won't be able to get to
it for a bit.

-corey

>
>> Note that kempld in practice only implements NMI, though the HW can
>> do more. I can ask Kontron for feedback on their opinion for possible
>> actions (and why they didn't implement other actions in the driver).
> I don't have much interest in it really.  I was just looking at what other
> vendors did as reference.  NMI makes sense.  Though I don't see an NMI
> handler, so I assume they just use the NMI to force a panic (like the
> hpwdt does, sortof)?
>
> Cheers,
> Don


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 21:52               ` Corey Minyard
@ 2014-05-02 23:20                 ` Guenter Roeck
  2014-05-03  2:10                 ` Don Zickus
  1 sibling, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2014-05-02 23:20 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Don Zickus, openipmi-developer, linux-watchdog

On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote:
[ ... ]
> 
> I do agree that the driver should be moved over to use the framework. 
> Implementing read/poll should be easy in the framework.
> 
If it isn't essential, I would prefer to drop it for now.
Otherwise, you'll have to convince Wim to accept it.
I am personally not convinced that it is worthwhile.
Eother case, the core changes should be a separate patch
(or set of patches), so the work can be independent of each other.

[ We might want to explore if a sysfs attribute would make more sense
  than the poll/fasync changes, for example ]

In this context, does the driver have to reside in the ipmi directory ?
It would be better to have it in the watchdog directory; that would ensure
that the watchdog maintainer is involved if there are changes.

Guenter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-02 21:52               ` Corey Minyard
  2014-05-02 23:20                 ` Guenter Roeck
@ 2014-05-03  2:10                 ` Don Zickus
  2014-05-03  2:51                   ` Corey Minyard
  2014-05-03  4:23                   ` Guenter Roeck
  1 sibling, 2 replies; 14+ messages in thread
From: Don Zickus @ 2014-05-03  2:10 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Guenter Roeck, openipmi-developer, linux-watchdog

On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote:
> On 05/02/2014 12:46 PM, Don Zickus wrote:
> > On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote:
> >>>>> That isn't enough to be able to report the pretimeout to the user.  You
> >>>>> can set it and get it with those calls, but it also needs poll, fasync,
> >>>>> and read to be able to select on a pretimeout or block on a read.
> >>>>>
> >>>> Ah, but now you are talking about a specific implementation, which is a bit
> >>>> different. The question here is what you expect to occur when a pretimeout
> >>>> happens, and you have a certain set of expectations. Personally I don't know
> >>>> what the best solution is; maybe a sysfs attribute or, yes, some activity
> >>>> on the watchdog device entry. Why don't you (or Don) suggest something
> >>>> and come up with a patch set for review ?
> >>> I look through the only other two watchdogs that I could find with
> >>> pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
> >>> notification, while kempld uses a low level configured action (nmi, smi,
> >>> sci, delay).  I think ipmi is the only one that chooses a user space
> >>> implementation (which raises another question[1]).
> >>>
> >>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c
> >>> and set a wdd->option to indicate its use and in addition add the
> >>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).
> >>>
> >>> Otherwise I am not sure if adding read, fasync, and poll wrappers to
> >>> watchdog_dev.c looks like a dirty hack.
> >>>
> >>> Cheers,
> >>> Don
> >>>
> >>> [1] if the system is stuck such that the pretimeout goes off, is it even
> >>> possible for userspace to run?  Or guaranteed that it could run reliably?
> >>> Just curious behind the history for this addition.
> >>>
> >> I would guess it depends. In most cases, I would assume it reflects that the
> >> watchdog daemon did not run. This in turn may suggest that userspace is,
> >> for all practical purposes, unable to run. Given that, I would suspect that
> >> a solution which depends on user space to act will in most cases not be able
> >> to fulfil its purpose, and I would not want to depend on it.
> > I am sure Corey ran into one vendor who wanted it, hence why he
> > implemented it.  But yeah, I am not sure I would depend on it either.
> 
> The driver hardware can do an NMI or an indication through an IPMI
> interrupt/signal, and the driver can either panic or try to give
> information to the user.
> 
> I don't exactly remember the reason for giving a pretimeout to the
> user.  I think there were some older implementations that did this, so I
> kept the function. You can't depend on it, of course, but if it can work
> then debugging a problem with the watchdog daemon (or something else in
> userland) would be a lot easier with an option like this.  I have no
> idea if anyone uses it.
> 
> I do agree that the driver should be moved over to use the framework. 
> Implementing read/poll should be easy in the framework.
> 
> Don, are you interested in working on this?  i won't be able to get to
> it for a bit.

I don't mind doing the work, as long as we agree on an implementation. :-)

Just throwing an idea out there based on Guenter's reply, should I just
create a new file ipmi_wdt in the drivers/watchdog area that is ported to
the new watchdog framework?

The idea is if a user pops up that wants the userspace solution that can
fallback to the old ipmi_watchdog currently implemented.  Otherwise the
new port would be the recommended way to go?  Seeing that the current
driver is updated fairly infrequent, keeping both drivers in sync wouldn't
be to hard.

Just a random idea.

Cheers,
Don

> 
> -corey
> 
> >
> >> Note that kempld in practice only implements NMI, though the HW can
> >> do more. I can ask Kontron for feedback on their opinion for possible
> >> actions (and why they didn't implement other actions in the driver).
> > I don't have much interest in it really.  I was just looking at what other
> > vendors did as reference.  NMI makes sense.  Though I don't see an NMI
> > handler, so I assume they just use the NMI to force a panic (like the
> > hpwdt does, sortof)?
> >
> > Cheers,
> > Don
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-03  2:10                 ` Don Zickus
@ 2014-05-03  2:51                   ` Corey Minyard
  2014-05-03  4:23                   ` Guenter Roeck
  1 sibling, 0 replies; 14+ messages in thread
From: Corey Minyard @ 2014-05-03  2:51 UTC (permalink / raw)
  To: Don Zickus; +Cc: Guenter Roeck, openipmi-developer, linux-watchdog

On 05/02/2014 09:10 PM, Don Zickus wrote:
> framework. 
> Implementing read/poll should be easy in the framework.
>
> Don, are you interested in working on this?  i won't be able to get to
> it for a bit.
> I don't mind doing the work, as long as we agree on an implementation. :-)
>
> Just throwing an idea out there based on Guenter's reply, should I just
> create a new file ipmi_wdt in the drivers/watchdog area that is ported to
> the new watchdog framework?

Yes, that's what I would do.  I definitely belongs in the watchdog
directory.

>
> The idea is if a user pops up that wants the userspace solution that can
> fallback to the old ipmi_watchdog currently implemented.  Otherwise the
> new port would be the recommended way to go?  Seeing that the current
> driver is updated fairly infrequent, keeping both drivers in sync wouldn't
> be to hard.
>
> Just a random idea.

Yes, that's probably a good plan.

-corey

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: ipmi watchdog questions
  2014-05-03  2:10                 ` Don Zickus
  2014-05-03  2:51                   ` Corey Minyard
@ 2014-05-03  4:23                   ` Guenter Roeck
  1 sibling, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2014-05-03  4:23 UTC (permalink / raw)
  To: Don Zickus; +Cc: Corey Minyard, openipmi-developer, linux-watchdog

On Fri, May 02, 2014 at 10:10:02PM -0400, Don Zickus wrote:
> On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote:
> > On 05/02/2014 12:46 PM, Don Zickus wrote:
> > > On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote:
> > >>>>> That isn't enough to be able to report the pretimeout to the user.  You
> > >>>>> can set it and get it with those calls, but it also needs poll, fasync,
> > >>>>> and read to be able to select on a pretimeout or block on a read.
> > >>>>>
> > >>>> Ah, but now you are talking about a specific implementation, which is a bit
> > >>>> different. The question here is what you expect to occur when a pretimeout
> > >>>> happens, and you have a certain set of expectations. Personally I don't know
> > >>>> what the best solution is; maybe a sysfs attribute or, yes, some activity
> > >>>> on the watchdog device entry. Why don't you (or Don) suggest something
> > >>>> and come up with a patch set for review ?
> > >>> I look through the only other two watchdogs that I could find with
> > >>> pretimeouts (kempld and hpwdt).  hpwdt uses NMI as its pretimeout
> > >>> notification, while kempld uses a low level configured action (nmi, smi,
> > >>> sci, delay).  I think ipmi is the only one that chooses a user space
> > >>> implementation (which raises another question[1]).
> > >>>
> > >>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c
> > >>> and set a wdd->option to indicate its use and in addition add the
> > >>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device).
> > >>>
> > >>> Otherwise I am not sure if adding read, fasync, and poll wrappers to
> > >>> watchdog_dev.c looks like a dirty hack.
> > >>>
> > >>> Cheers,
> > >>> Don
> > >>>
> > >>> [1] if the system is stuck such that the pretimeout goes off, is it even
> > >>> possible for userspace to run?  Or guaranteed that it could run reliably?
> > >>> Just curious behind the history for this addition.
> > >>>
> > >> I would guess it depends. In most cases, I would assume it reflects that the
> > >> watchdog daemon did not run. This in turn may suggest that userspace is,
> > >> for all practical purposes, unable to run. Given that, I would suspect that
> > >> a solution which depends on user space to act will in most cases not be able
> > >> to fulfil its purpose, and I would not want to depend on it.
> > > I am sure Corey ran into one vendor who wanted it, hence why he
> > > implemented it.  But yeah, I am not sure I would depend on it either.
> > 
> > The driver hardware can do an NMI or an indication through an IPMI
> > interrupt/signal, and the driver can either panic or try to give
> > information to the user.
> > 
> > I don't exactly remember the reason for giving a pretimeout to the
> > user.  I think there were some older implementations that did this, so I
> > kept the function. You can't depend on it, of course, but if it can work
> > then debugging a problem with the watchdog daemon (or something else in
> > userland) would be a lot easier with an option like this.  I have no
> > idea if anyone uses it.
> > 
> > I do agree that the driver should be moved over to use the framework. 
> > Implementing read/poll should be easy in the framework.
> > 
> > Don, are you interested in working on this?  i won't be able to get to
> > it for a bit.
> 
> I don't mind doing the work, as long as we agree on an implementation. :-)
> 
> Just throwing an idea out there based on Guenter's reply, should I just
> create a new file ipmi_wdt in the drivers/watchdog area that is ported to
> the new watchdog framework?
> 
I would suggest to move it in the 1st patch, then implement the wd framework
in the next patch. That makes it easier to identify the changes.

Guenter

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-05-03  4:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-01 13:58 ipmi watchdog questions Don Zickus
2014-05-02  0:38 ` Corey Minyard
2014-05-02  1:11   ` Guenter Roeck
2014-05-02  4:38     ` Corey Minyard
2014-05-02 13:17       ` Guenter Roeck
2014-05-02 16:44         ` Don Zickus
2014-05-02 17:18           ` Guenter Roeck
2014-05-02 17:46             ` Don Zickus
2014-05-02 21:52               ` Corey Minyard
2014-05-02 23:20                 ` Guenter Roeck
2014-05-03  2:10                 ` Don Zickus
2014-05-03  2:51                   ` Corey Minyard
2014-05-03  4:23                   ` Guenter Roeck
2014-05-02 15:10   ` Don Zickus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.