* ipmi watchdog questions @ 2014-05-01 13:58 Don Zickus 2014-05-02 0:38 ` Corey Minyard 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2014-05-01 13:58 UTC (permalink / raw) To: minyard; +Cc: openipmi-developer, linux-watchdog Hi Corey, I stumbled upon an issue with a partner of ours, where they booted their machine and tried to load the ipmi_watchdog module by hand and it failed. The reason it failed was that the iTCO watchdog driver was already loaded and it registered the misc device /dev/watchdog first. I looked at the ipmi watchdog driver and realized it was never converted to the new watchdog framework where the watchdog_core module manages the '/dev/watchdog' misc device. So being naive and not knowing much about IPMI, I decided to follow the helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt and convert the ipmi_watchdog to use the new watchdog framework. I ran into a few issues and then realized the driver itself never really binds to any hardware, so it makes the conversion process a little more challenging. So a few questions to you before I waste my time in this area: - Is there any prior history about why the ipmi_watchdog was never converted to the new watchdog framework? Lack of interest? Technical hurdles? - Is there a reason why the ipmi_watchdog is a seperate module as opposed to being called by ipmi_si? It seems there shouldn't be an issue with the watchdog always loaded, it just won't do anything until someone opens it (from my understanding). Also you would gain the ability to use the shutdown/remove routines properly instead of the reboot/panic notifiers. In addition, passing the pointer to the 'struct watchdog_device' would be easier if some of those extra pieces were not there (as opposed to making it a global reference). - What does the fasync and poll calls do for a watchdog? I'll start with that for now. I appreciate any feedback. Currently we just implemented blacklisting the iTCO watchdog driver to workaround this problem. I thought we could do better, hence my motivation to do work in this area. Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-01 13:58 ipmi watchdog questions Don Zickus @ 2014-05-02 0:38 ` Corey Minyard 2014-05-02 1:11 ` Guenter Roeck 2014-05-02 15:10 ` Don Zickus 0 siblings, 2 replies; 14+ messages in thread From: Corey Minyard @ 2014-05-02 0:38 UTC (permalink / raw) To: Don Zickus; +Cc: openipmi-developer, linux-watchdog On 05/01/2014 08:58 AM, Don Zickus wrote: > Hi Corey, > > I stumbled upon an issue with a partner of ours, where they booted their > machine and tried to load the ipmi_watchdog module by hand and it failed. > > The reason it failed was that the iTCO watchdog driver was already loaded > and it registered the misc device /dev/watchdog first. > > I looked at the ipmi watchdog driver and realized it was never converted > to the new watchdog framework where the watchdog_core module manages the > '/dev/watchdog' misc device. > > So being naive and not knowing much about IPMI, I decided to follow the > helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt > and convert the ipmi_watchdog to use the new watchdog framework. > > I ran into a few issues and then realized the driver itself never really > binds to any hardware, so it makes the conversion process a little more > challenging. > > So a few questions to you before I waste my time in this area: > > - Is there any prior history about why the ipmi_watchdog was never > converted to the new watchdog framework? Lack of interest? Technical > hurdles? Mostly lack of interest, but there are some technical hurdles. It would be hard to implement some things. The watchdog framework has no concept of pretimeouts. And IPMI is message based, you send a message to a controller to do anything, and you have to wait for the response. That doesn't work very well with the watchdog interface, which assumes you can do everything immediately. > > - Is there a reason why the ipmi_watchdog is a seperate module as opposed > to being called by ipmi_si? It seems there shouldn't be an issue with > the watchdog always loaded, it just won't do anything until someone opens > it (from my understanding). Also you would gain the ability to use the > shutdown/remove routines properly instead of the reboot/panic notifiers. I'm not sure I understand this. Why would you want it as part of ipmi_si? ipmi_msghandler would be a little more logical, but IMHO still doesn't make sense. It uses the IPMI interface, and the interface is designed to have multiple users. Better to keep it separate because it's a separate function. I also don't understand the comment about shutdown/remove instead of reboot/panic. Can you elaborate on that? > In addition, passing the pointer to the 'struct watchdog_device' would be > easier if some of those extra pieces were not there (as opposed to making > it a global reference). > > - What does the fasync and poll calls do for a watchdog? The IPMI watchdog has the ability to report a pretimeout at a specific amount of time before the final timeout, presumably to take some action before the system reboots. the fasync and poll (and read) calls let this be reported to the user. > > I'll start with that for now. > > I appreciate any feedback. Currently we just implemented blacklisting the > iTCO watchdog driver to workaround this problem. I thought we could do > better, hence my motivation to do work in this area. It would be nice, yes. I'm afraid to get all the functionality would be a lot of hacking on the watchdog framework or removal of function from the driver. Thanks, -corey > > Cheers, > Don > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 0:38 ` Corey Minyard @ 2014-05-02 1:11 ` Guenter Roeck 2014-05-02 4:38 ` Corey Minyard 2014-05-02 15:10 ` Don Zickus 1 sibling, 1 reply; 14+ messages in thread From: Guenter Roeck @ 2014-05-02 1:11 UTC (permalink / raw) To: minyard, Don Zickus; +Cc: openipmi-developer, linux-watchdog On 05/01/2014 05:38 PM, Corey Minyard wrote: > On 05/01/2014 08:58 AM, Don Zickus wrote: >> Hi Corey, >> >> I stumbled upon an issue with a partner of ours, where they booted their >> machine and tried to load the ipmi_watchdog module by hand and it failed. >> >> The reason it failed was that the iTCO watchdog driver was already loaded >> and it registered the misc device /dev/watchdog first. >> >> I looked at the ipmi watchdog driver and realized it was never converted >> to the new watchdog framework where the watchdog_core module manages the >> '/dev/watchdog' misc device. >> >> So being naive and not knowing much about IPMI, I decided to follow the >> helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt >> and convert the ipmi_watchdog to use the new watchdog framework. >> >> I ran into a few issues and then realized the driver itself never really >> binds to any hardware, so it makes the conversion process a little more >> challenging. >> >> So a few questions to you before I waste my time in this area: >> >> - Is there any prior history about why the ipmi_watchdog was never >> converted to the new watchdog framework? Lack of interest? Technical >> hurdles? > > Mostly lack of interest, but there are some technical hurdles. > > It would be hard to implement some things. The watchdog framework has > no concept of pretimeouts. And IPMI is message based, you send a Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't work for ipmi ? If so, can you explain ? > message to a controller to do anything, and you have to wait for the > response. That doesn't work very well with the watchdog interface, > which assumes you can do everything immediately. > Does it ? How so ? Please elaborate; I don't immediately see how the watchdog subsystem would prevent you from using, say, kernel threads or delayed workers to implement asynchronous access to or from any underlying hardware. Guenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 1:11 ` Guenter Roeck @ 2014-05-02 4:38 ` Corey Minyard 2014-05-02 13:17 ` Guenter Roeck 0 siblings, 1 reply; 14+ messages in thread From: Corey Minyard @ 2014-05-02 4:38 UTC (permalink / raw) To: Guenter Roeck, Don Zickus; +Cc: openipmi-developer, linux-watchdog On 05/01/2014 08:11 PM, Guenter Roeck wrote: > On 05/01/2014 05:38 PM, Corey Minyard wrote: >> On 05/01/2014 08:58 AM, Don Zickus wrote: >>> Hi Corey, >>> >>> I stumbled upon an issue with a partner of ours, where they booted >>> their >>> machine and tried to load the ipmi_watchdog module by hand and it >>> failed. >>> >>> The reason it failed was that the iTCO watchdog driver was already >>> loaded >>> and it registered the misc device /dev/watchdog first. >>> >>> I looked at the ipmi watchdog driver and realized it was never >>> converted >>> to the new watchdog framework where the watchdog_core module manages >>> the >>> '/dev/watchdog' misc device. >>> >>> So being naive and not knowing much about IPMI, I decided to follow the >>> helpful document >>> Documentation/watchdog/convert_drivers_to_kernel_api.txt >>> and convert the ipmi_watchdog to use the new watchdog framework. >>> >>> I ran into a few issues and then realized the driver itself never >>> really >>> binds to any hardware, so it makes the conversion process a little more >>> challenging. >>> >>> So a few questions to you before I waste my time in this area: >>> >>> - Is there any prior history about why the ipmi_watchdog was never >>> converted to the new watchdog framework? Lack of interest? >>> Technical >>> hurdles? >> >> Mostly lack of interest, but there are some technical hurdles. >> >> It would be hard to implement some things. The watchdog framework has >> no concept of pretimeouts. And IPMI is message based, you send a > > Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't > work > for ipmi ? If so, can you explain ? > That isn't enough to be able to report the pretimeout to the user. You can set it and get it with those calls, but it also needs poll, fasync, and read to be able to select on a pretimeout or block on a read. >> message to a controller to do anything, and you have to wait for the >> response. That doesn't work very well with the watchdog interface, >> which assumes you can do everything immediately. >> > Does it ? How so ? Please elaborate; I don't immediately see how the > watchdog > subsystem would prevent you from using, say, kernel threads or delayed > workers > to implement asynchronous access to or from any underlying hardware. > I was remembering you couldn't block in the various calls, but looking at the code, I see that's not an issue. -corey ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 4:38 ` Corey Minyard @ 2014-05-02 13:17 ` Guenter Roeck 2014-05-02 16:44 ` Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Guenter Roeck @ 2014-05-02 13:17 UTC (permalink / raw) To: minyard, Don Zickus; +Cc: openipmi-developer, linux-watchdog On 05/01/2014 09:38 PM, Corey Minyard wrote: > On 05/01/2014 08:11 PM, Guenter Roeck wrote: >> On 05/01/2014 05:38 PM, Corey Minyard wrote: >>> On 05/01/2014 08:58 AM, Don Zickus wrote: >>>> Hi Corey, >>>> >>>> I stumbled upon an issue with a partner of ours, where they booted >>>> their >>>> machine and tried to load the ipmi_watchdog module by hand and it >>>> failed. >>>> >>>> The reason it failed was that the iTCO watchdog driver was already >>>> loaded >>>> and it registered the misc device /dev/watchdog first. >>>> >>>> I looked at the ipmi watchdog driver and realized it was never >>>> converted >>>> to the new watchdog framework where the watchdog_core module manages >>>> the >>>> '/dev/watchdog' misc device. >>>> >>>> So being naive and not knowing much about IPMI, I decided to follow the >>>> helpful document >>>> Documentation/watchdog/convert_drivers_to_kernel_api.txt >>>> and convert the ipmi_watchdog to use the new watchdog framework. >>>> >>>> I ran into a few issues and then realized the driver itself never >>>> really >>>> binds to any hardware, so it makes the conversion process a little more >>>> challenging. >>>> >>>> So a few questions to you before I waste my time in this area: >>>> >>>> - Is there any prior history about why the ipmi_watchdog was never >>>> converted to the new watchdog framework? Lack of interest? >>>> Technical >>>> hurdles? >>> >>> Mostly lack of interest, but there are some technical hurdles. >>> >>> It would be hard to implement some things. The watchdog framework has >>> no concept of pretimeouts. And IPMI is message based, you send a >> >> Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't >> work >> for ipmi ? If so, can you explain ? >> > > That isn't enough to be able to report the pretimeout to the user. You > can set it and get it with those calls, but it also needs poll, fasync, > and read to be able to select on a pretimeout or block on a read. > Ah, but now you are talking about a specific implementation, which is a bit different. The question here is what you expect to occur when a pretimeout happens, and you have a certain set of expectations. Personally I don't know what the best solution is; maybe a sysfs attribute or, yes, some activity on the watchdog device entry. Why don't you (or Don) suggest something and come up with a patch set for review ? Thanks, Guenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 13:17 ` Guenter Roeck @ 2014-05-02 16:44 ` Don Zickus 2014-05-02 17:18 ` Guenter Roeck 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2014-05-02 16:44 UTC (permalink / raw) To: Guenter Roeck; +Cc: minyard, openipmi-developer, linux-watchdog On Fri, May 02, 2014 at 06:17:51AM -0700, Guenter Roeck wrote: > On 05/01/2014 09:38 PM, Corey Minyard wrote: > >On 05/01/2014 08:11 PM, Guenter Roeck wrote: > >>On 05/01/2014 05:38 PM, Corey Minyard wrote: > >>>On 05/01/2014 08:58 AM, Don Zickus wrote: > >>>>Hi Corey, > >>>> > >>>>I stumbled upon an issue with a partner of ours, where they booted > >>>>their > >>>>machine and tried to load the ipmi_watchdog module by hand and it > >>>>failed. > >>>> > >>>>The reason it failed was that the iTCO watchdog driver was already > >>>>loaded > >>>>and it registered the misc device /dev/watchdog first. > >>>> > >>>>I looked at the ipmi watchdog driver and realized it was never > >>>>converted > >>>>to the new watchdog framework where the watchdog_core module manages > >>>>the > >>>>'/dev/watchdog' misc device. > >>>> > >>>>So being naive and not knowing much about IPMI, I decided to follow the > >>>>helpful document > >>>>Documentation/watchdog/convert_drivers_to_kernel_api.txt > >>>>and convert the ipmi_watchdog to use the new watchdog framework. > >>>> > >>>>I ran into a few issues and then realized the driver itself never > >>>>really > >>>>binds to any hardware, so it makes the conversion process a little more > >>>>challenging. > >>>> > >>>>So a few questions to you before I waste my time in this area: > >>>> > >>>>- Is there any prior history about why the ipmi_watchdog was never > >>>> converted to the new watchdog framework? Lack of interest? > >>>>Technical > >>>>hurdles? > >>> > >>>Mostly lack of interest, but there are some technical hurdles. > >>> > >>>It would be hard to implement some things. The watchdog framework has > >>>no concept of pretimeouts. And IPMI is message based, you send a > >> > >>Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't > >>work > >>for ipmi ? If so, can you explain ? > >> > > > >That isn't enough to be able to report the pretimeout to the user. You > >can set it and get it with those calls, but it also needs poll, fasync, > >and read to be able to select on a pretimeout or block on a read. > > > > Ah, but now you are talking about a specific implementation, which is a bit > different. The question here is what you expect to occur when a pretimeout > happens, and you have a certain set of expectations. Personally I don't know > what the best solution is; maybe a sysfs attribute or, yes, some activity > on the watchdog device entry. Why don't you (or Don) suggest something > and come up with a patch set for review ? I look through the only other two watchdogs that I could find with pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout notification, while kempld uses a low level configured action (nmi, smi, sci, delay). I think ipmi is the only one that chooses a user space implementation (which raises another question[1]). I can try to respectfully copy the ipmi implementation to watchdog_dev.c and set a wdd->option to indicate its use and in addition add the pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). Otherwise I am not sure if adding read, fasync, and poll wrappers to watchdog_dev.c looks like a dirty hack. Cheers, Don [1] if the system is stuck such that the pretimeout goes off, is it even possible for userspace to run? Or guaranteed that it could run reliably? Just curious behind the history for this addition. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 16:44 ` Don Zickus @ 2014-05-02 17:18 ` Guenter Roeck 2014-05-02 17:46 ` Don Zickus 0 siblings, 1 reply; 14+ messages in thread From: Guenter Roeck @ 2014-05-02 17:18 UTC (permalink / raw) To: Don Zickus; +Cc: minyard, openipmi-developer, linux-watchdog On 05/02/2014 09:44 AM, Don Zickus wrote: > On Fri, May 02, 2014 at 06:17:51AM -0700, Guenter Roeck wrote: >> On 05/01/2014 09:38 PM, Corey Minyard wrote: >>> On 05/01/2014 08:11 PM, Guenter Roeck wrote: >>>> On 05/01/2014 05:38 PM, Corey Minyard wrote: >>>>> On 05/01/2014 08:58 AM, Don Zickus wrote: >>>>>> Hi Corey, >>>>>> >>>>>> I stumbled upon an issue with a partner of ours, where they booted >>>>>> their >>>>>> machine and tried to load the ipmi_watchdog module by hand and it >>>>>> failed. >>>>>> >>>>>> The reason it failed was that the iTCO watchdog driver was already >>>>>> loaded >>>>>> and it registered the misc device /dev/watchdog first. >>>>>> >>>>>> I looked at the ipmi watchdog driver and realized it was never >>>>>> converted >>>>>> to the new watchdog framework where the watchdog_core module manages >>>>>> the >>>>>> '/dev/watchdog' misc device. >>>>>> >>>>>> So being naive and not knowing much about IPMI, I decided to follow the >>>>>> helpful document >>>>>> Documentation/watchdog/convert_drivers_to_kernel_api.txt >>>>>> and convert the ipmi_watchdog to use the new watchdog framework. >>>>>> >>>>>> I ran into a few issues and then realized the driver itself never >>>>>> really >>>>>> binds to any hardware, so it makes the conversion process a little more >>>>>> challenging. >>>>>> >>>>>> So a few questions to you before I waste my time in this area: >>>>>> >>>>>> - Is there any prior history about why the ipmi_watchdog was never >>>>>> converted to the new watchdog framework? Lack of interest? >>>>>> Technical >>>>>> hurdles? >>>>> >>>>> Mostly lack of interest, but there are some technical hurdles. >>>>> >>>>> It would be hard to implement some things. The watchdog framework has >>>>> no concept of pretimeouts. And IPMI is message based, you send a >>>> >>>> Are you saying that WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT don't >>>> work >>>> for ipmi ? If so, can you explain ? >>>> >>> >>> That isn't enough to be able to report the pretimeout to the user. You >>> can set it and get it with those calls, but it also needs poll, fasync, >>> and read to be able to select on a pretimeout or block on a read. >>> >> >> Ah, but now you are talking about a specific implementation, which is a bit >> different. The question here is what you expect to occur when a pretimeout >> happens, and you have a certain set of expectations. Personally I don't know >> what the best solution is; maybe a sysfs attribute or, yes, some activity >> on the watchdog device entry. Why don't you (or Don) suggest something >> and come up with a patch set for review ? > > I look through the only other two watchdogs that I could find with > pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout > notification, while kempld uses a low level configured action (nmi, smi, > sci, delay). I think ipmi is the only one that chooses a user space > implementation (which raises another question[1]). > > I can try to respectfully copy the ipmi implementation to watchdog_dev.c > and set a wdd->option to indicate its use and in addition add the > pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). > > Otherwise I am not sure if adding read, fasync, and poll wrappers to > watchdog_dev.c looks like a dirty hack. > > Cheers, > Don > > [1] if the system is stuck such that the pretimeout goes off, is it even > possible for userspace to run? Or guaranteed that it could run reliably? > Just curious behind the history for this addition. > I would guess it depends. In most cases, I would assume it reflects that the watchdog daemon did not run. This in turn may suggest that userspace is, for all practical purposes, unable to run. Given that, I would suspect that a solution which depends on user space to act will in most cases not be able to fulfil its purpose, and I would not want to depend on it. Note that kempld in practice only implements NMI, though the HW can do more. I can ask Kontron for feedback on their opinion for possible actions (and why they didn't implement other actions in the driver). Guenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 17:18 ` Guenter Roeck @ 2014-05-02 17:46 ` Don Zickus 2014-05-02 21:52 ` Corey Minyard 0 siblings, 1 reply; 14+ messages in thread From: Don Zickus @ 2014-05-02 17:46 UTC (permalink / raw) To: Guenter Roeck; +Cc: minyard, openipmi-developer, linux-watchdog On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote: > >>>That isn't enough to be able to report the pretimeout to the user. You > >>>can set it and get it with those calls, but it also needs poll, fasync, > >>>and read to be able to select on a pretimeout or block on a read. > >>> > >> > >>Ah, but now you are talking about a specific implementation, which is a bit > >>different. The question here is what you expect to occur when a pretimeout > >>happens, and you have a certain set of expectations. Personally I don't know > >>what the best solution is; maybe a sysfs attribute or, yes, some activity > >>on the watchdog device entry. Why don't you (or Don) suggest something > >>and come up with a patch set for review ? > > > >I look through the only other two watchdogs that I could find with > >pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout > >notification, while kempld uses a low level configured action (nmi, smi, > >sci, delay). I think ipmi is the only one that chooses a user space > >implementation (which raises another question[1]). > > > >I can try to respectfully copy the ipmi implementation to watchdog_dev.c > >and set a wdd->option to indicate its use and in addition add the > >pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). > > > >Otherwise I am not sure if adding read, fasync, and poll wrappers to > >watchdog_dev.c looks like a dirty hack. > > > >Cheers, > >Don > > > >[1] if the system is stuck such that the pretimeout goes off, is it even > >possible for userspace to run? Or guaranteed that it could run reliably? > >Just curious behind the history for this addition. > > > > I would guess it depends. In most cases, I would assume it reflects that the > watchdog daemon did not run. This in turn may suggest that userspace is, > for all practical purposes, unable to run. Given that, I would suspect that > a solution which depends on user space to act will in most cases not be able > to fulfil its purpose, and I would not want to depend on it. I am sure Corey ran into one vendor who wanted it, hence why he implemented it. But yeah, I am not sure I would depend on it either. > > Note that kempld in practice only implements NMI, though the HW can > do more. I can ask Kontron for feedback on their opinion for possible > actions (and why they didn't implement other actions in the driver). I don't have much interest in it really. I was just looking at what other vendors did as reference. NMI makes sense. Though I don't see an NMI handler, so I assume they just use the NMI to force a panic (like the hpwdt does, sortof)? Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 17:46 ` Don Zickus @ 2014-05-02 21:52 ` Corey Minyard 2014-05-02 23:20 ` Guenter Roeck 2014-05-03 2:10 ` Don Zickus 0 siblings, 2 replies; 14+ messages in thread From: Corey Minyard @ 2014-05-02 21:52 UTC (permalink / raw) To: Don Zickus, Guenter Roeck; +Cc: openipmi-developer, linux-watchdog On 05/02/2014 12:46 PM, Don Zickus wrote: > On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote: >>>>> That isn't enough to be able to report the pretimeout to the user. You >>>>> can set it and get it with those calls, but it also needs poll, fasync, >>>>> and read to be able to select on a pretimeout or block on a read. >>>>> >>>> Ah, but now you are talking about a specific implementation, which is a bit >>>> different. The question here is what you expect to occur when a pretimeout >>>> happens, and you have a certain set of expectations. Personally I don't know >>>> what the best solution is; maybe a sysfs attribute or, yes, some activity >>>> on the watchdog device entry. Why don't you (or Don) suggest something >>>> and come up with a patch set for review ? >>> I look through the only other two watchdogs that I could find with >>> pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout >>> notification, while kempld uses a low level configured action (nmi, smi, >>> sci, delay). I think ipmi is the only one that chooses a user space >>> implementation (which raises another question[1]). >>> >>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c >>> and set a wdd->option to indicate its use and in addition add the >>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). >>> >>> Otherwise I am not sure if adding read, fasync, and poll wrappers to >>> watchdog_dev.c looks like a dirty hack. >>> >>> Cheers, >>> Don >>> >>> [1] if the system is stuck such that the pretimeout goes off, is it even >>> possible for userspace to run? Or guaranteed that it could run reliably? >>> Just curious behind the history for this addition. >>> >> I would guess it depends. In most cases, I would assume it reflects that the >> watchdog daemon did not run. This in turn may suggest that userspace is, >> for all practical purposes, unable to run. Given that, I would suspect that >> a solution which depends on user space to act will in most cases not be able >> to fulfil its purpose, and I would not want to depend on it. > I am sure Corey ran into one vendor who wanted it, hence why he > implemented it. But yeah, I am not sure I would depend on it either. The driver hardware can do an NMI or an indication through an IPMI interrupt/signal, and the driver can either panic or try to give information to the user. I don't exactly remember the reason for giving a pretimeout to the user. I think there were some older implementations that did this, so I kept the function. You can't depend on it, of course, but if it can work then debugging a problem with the watchdog daemon (or something else in userland) would be a lot easier with an option like this. I have no idea if anyone uses it. I do agree that the driver should be moved over to use the framework. Implementing read/poll should be easy in the framework. Don, are you interested in working on this? i won't be able to get to it for a bit. -corey > >> Note that kempld in practice only implements NMI, though the HW can >> do more. I can ask Kontron for feedback on their opinion for possible >> actions (and why they didn't implement other actions in the driver). > I don't have much interest in it really. I was just looking at what other > vendors did as reference. NMI makes sense. Though I don't see an NMI > handler, so I assume they just use the NMI to force a panic (like the > hpwdt does, sortof)? > > Cheers, > Don ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 21:52 ` Corey Minyard @ 2014-05-02 23:20 ` Guenter Roeck 2014-05-03 2:10 ` Don Zickus 1 sibling, 0 replies; 14+ messages in thread From: Guenter Roeck @ 2014-05-02 23:20 UTC (permalink / raw) To: Corey Minyard; +Cc: Don Zickus, openipmi-developer, linux-watchdog On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote: [ ... ] > > I do agree that the driver should be moved over to use the framework. > Implementing read/poll should be easy in the framework. > If it isn't essential, I would prefer to drop it for now. Otherwise, you'll have to convince Wim to accept it. I am personally not convinced that it is worthwhile. Eother case, the core changes should be a separate patch (or set of patches), so the work can be independent of each other. [ We might want to explore if a sysfs attribute would make more sense than the poll/fasync changes, for example ] In this context, does the driver have to reside in the ipmi directory ? It would be better to have it in the watchdog directory; that would ensure that the watchdog maintainer is involved if there are changes. Guenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 21:52 ` Corey Minyard 2014-05-02 23:20 ` Guenter Roeck @ 2014-05-03 2:10 ` Don Zickus 2014-05-03 2:51 ` Corey Minyard 2014-05-03 4:23 ` Guenter Roeck 1 sibling, 2 replies; 14+ messages in thread From: Don Zickus @ 2014-05-03 2:10 UTC (permalink / raw) To: Corey Minyard; +Cc: Guenter Roeck, openipmi-developer, linux-watchdog On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote: > On 05/02/2014 12:46 PM, Don Zickus wrote: > > On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote: > >>>>> That isn't enough to be able to report the pretimeout to the user. You > >>>>> can set it and get it with those calls, but it also needs poll, fasync, > >>>>> and read to be able to select on a pretimeout or block on a read. > >>>>> > >>>> Ah, but now you are talking about a specific implementation, which is a bit > >>>> different. The question here is what you expect to occur when a pretimeout > >>>> happens, and you have a certain set of expectations. Personally I don't know > >>>> what the best solution is; maybe a sysfs attribute or, yes, some activity > >>>> on the watchdog device entry. Why don't you (or Don) suggest something > >>>> and come up with a patch set for review ? > >>> I look through the only other two watchdogs that I could find with > >>> pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout > >>> notification, while kempld uses a low level configured action (nmi, smi, > >>> sci, delay). I think ipmi is the only one that chooses a user space > >>> implementation (which raises another question[1]). > >>> > >>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c > >>> and set a wdd->option to indicate its use and in addition add the > >>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). > >>> > >>> Otherwise I am not sure if adding read, fasync, and poll wrappers to > >>> watchdog_dev.c looks like a dirty hack. > >>> > >>> Cheers, > >>> Don > >>> > >>> [1] if the system is stuck such that the pretimeout goes off, is it even > >>> possible for userspace to run? Or guaranteed that it could run reliably? > >>> Just curious behind the history for this addition. > >>> > >> I would guess it depends. In most cases, I would assume it reflects that the > >> watchdog daemon did not run. This in turn may suggest that userspace is, > >> for all practical purposes, unable to run. Given that, I would suspect that > >> a solution which depends on user space to act will in most cases not be able > >> to fulfil its purpose, and I would not want to depend on it. > > I am sure Corey ran into one vendor who wanted it, hence why he > > implemented it. But yeah, I am not sure I would depend on it either. > > The driver hardware can do an NMI or an indication through an IPMI > interrupt/signal, and the driver can either panic or try to give > information to the user. > > I don't exactly remember the reason for giving a pretimeout to the > user. I think there were some older implementations that did this, so I > kept the function. You can't depend on it, of course, but if it can work > then debugging a problem with the watchdog daemon (or something else in > userland) would be a lot easier with an option like this. I have no > idea if anyone uses it. > > I do agree that the driver should be moved over to use the framework. > Implementing read/poll should be easy in the framework. > > Don, are you interested in working on this? i won't be able to get to > it for a bit. I don't mind doing the work, as long as we agree on an implementation. :-) Just throwing an idea out there based on Guenter's reply, should I just create a new file ipmi_wdt in the drivers/watchdog area that is ported to the new watchdog framework? The idea is if a user pops up that wants the userspace solution that can fallback to the old ipmi_watchdog currently implemented. Otherwise the new port would be the recommended way to go? Seeing that the current driver is updated fairly infrequent, keeping both drivers in sync wouldn't be to hard. Just a random idea. Cheers, Don > > -corey > > > > >> Note that kempld in practice only implements NMI, though the HW can > >> do more. I can ask Kontron for feedback on their opinion for possible > >> actions (and why they didn't implement other actions in the driver). > > I don't have much interest in it really. I was just looking at what other > > vendors did as reference. NMI makes sense. Though I don't see an NMI > > handler, so I assume they just use the NMI to force a panic (like the > > hpwdt does, sortof)? > > > > Cheers, > > Don > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-03 2:10 ` Don Zickus @ 2014-05-03 2:51 ` Corey Minyard 2014-05-03 4:23 ` Guenter Roeck 1 sibling, 0 replies; 14+ messages in thread From: Corey Minyard @ 2014-05-03 2:51 UTC (permalink / raw) To: Don Zickus; +Cc: Guenter Roeck, openipmi-developer, linux-watchdog On 05/02/2014 09:10 PM, Don Zickus wrote: > framework. > Implementing read/poll should be easy in the framework. > > Don, are you interested in working on this? i won't be able to get to > it for a bit. > I don't mind doing the work, as long as we agree on an implementation. :-) > > Just throwing an idea out there based on Guenter's reply, should I just > create a new file ipmi_wdt in the drivers/watchdog area that is ported to > the new watchdog framework? Yes, that's what I would do. I definitely belongs in the watchdog directory. > > The idea is if a user pops up that wants the userspace solution that can > fallback to the old ipmi_watchdog currently implemented. Otherwise the > new port would be the recommended way to go? Seeing that the current > driver is updated fairly infrequent, keeping both drivers in sync wouldn't > be to hard. > > Just a random idea. Yes, that's probably a good plan. -corey ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-03 2:10 ` Don Zickus 2014-05-03 2:51 ` Corey Minyard @ 2014-05-03 4:23 ` Guenter Roeck 1 sibling, 0 replies; 14+ messages in thread From: Guenter Roeck @ 2014-05-03 4:23 UTC (permalink / raw) To: Don Zickus; +Cc: Corey Minyard, openipmi-developer, linux-watchdog On Fri, May 02, 2014 at 10:10:02PM -0400, Don Zickus wrote: > On Fri, May 02, 2014 at 04:52:12PM -0500, Corey Minyard wrote: > > On 05/02/2014 12:46 PM, Don Zickus wrote: > > > On Fri, May 02, 2014 at 10:18:03AM -0700, Guenter Roeck wrote: > > >>>>> That isn't enough to be able to report the pretimeout to the user. You > > >>>>> can set it and get it with those calls, but it also needs poll, fasync, > > >>>>> and read to be able to select on a pretimeout or block on a read. > > >>>>> > > >>>> Ah, but now you are talking about a specific implementation, which is a bit > > >>>> different. The question here is what you expect to occur when a pretimeout > > >>>> happens, and you have a certain set of expectations. Personally I don't know > > >>>> what the best solution is; maybe a sysfs attribute or, yes, some activity > > >>>> on the watchdog device entry. Why don't you (or Don) suggest something > > >>>> and come up with a patch set for review ? > > >>> I look through the only other two watchdogs that I could find with > > >>> pretimeouts (kempld and hpwdt). hpwdt uses NMI as its pretimeout > > >>> notification, while kempld uses a low level configured action (nmi, smi, > > >>> sci, delay). I think ipmi is the only one that chooses a user space > > >>> implementation (which raises another question[1]). > > >>> > > >>> I can try to respectfully copy the ipmi implementation to watchdog_dev.c > > >>> and set a wdd->option to indicate its use and in addition add the > > >>> pretimeout ioctls to watchdog_dev.c (and struct watchdog_device). > > >>> > > >>> Otherwise I am not sure if adding read, fasync, and poll wrappers to > > >>> watchdog_dev.c looks like a dirty hack. > > >>> > > >>> Cheers, > > >>> Don > > >>> > > >>> [1] if the system is stuck such that the pretimeout goes off, is it even > > >>> possible for userspace to run? Or guaranteed that it could run reliably? > > >>> Just curious behind the history for this addition. > > >>> > > >> I would guess it depends. In most cases, I would assume it reflects that the > > >> watchdog daemon did not run. This in turn may suggest that userspace is, > > >> for all practical purposes, unable to run. Given that, I would suspect that > > >> a solution which depends on user space to act will in most cases not be able > > >> to fulfil its purpose, and I would not want to depend on it. > > > I am sure Corey ran into one vendor who wanted it, hence why he > > > implemented it. But yeah, I am not sure I would depend on it either. > > > > The driver hardware can do an NMI or an indication through an IPMI > > interrupt/signal, and the driver can either panic or try to give > > information to the user. > > > > I don't exactly remember the reason for giving a pretimeout to the > > user. I think there were some older implementations that did this, so I > > kept the function. You can't depend on it, of course, but if it can work > > then debugging a problem with the watchdog daemon (or something else in > > userland) would be a lot easier with an option like this. I have no > > idea if anyone uses it. > > > > I do agree that the driver should be moved over to use the framework. > > Implementing read/poll should be easy in the framework. > > > > Don, are you interested in working on this? i won't be able to get to > > it for a bit. > > I don't mind doing the work, as long as we agree on an implementation. :-) > > Just throwing an idea out there based on Guenter's reply, should I just > create a new file ipmi_wdt in the drivers/watchdog area that is ported to > the new watchdog framework? > I would suggest to move it in the 1st patch, then implement the wd framework in the next patch. That makes it easier to identify the changes. Guenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: ipmi watchdog questions 2014-05-02 0:38 ` Corey Minyard 2014-05-02 1:11 ` Guenter Roeck @ 2014-05-02 15:10 ` Don Zickus 1 sibling, 0 replies; 14+ messages in thread From: Don Zickus @ 2014-05-02 15:10 UTC (permalink / raw) To: Corey Minyard; +Cc: openipmi-developer, linux-watchdog On Thu, May 01, 2014 at 07:38:18PM -0500, Corey Minyard wrote: > On 05/01/2014 08:58 AM, Don Zickus wrote: > > Hi Corey, > > > > I stumbled upon an issue with a partner of ours, where they booted their > > machine and tried to load the ipmi_watchdog module by hand and it failed. > > > > The reason it failed was that the iTCO watchdog driver was already loaded > > and it registered the misc device /dev/watchdog first. > > > > I looked at the ipmi watchdog driver and realized it was never converted > > to the new watchdog framework where the watchdog_core module manages the > > '/dev/watchdog' misc device. > > > > So being naive and not knowing much about IPMI, I decided to follow the > > helpful document Documentation/watchdog/convert_drivers_to_kernel_api.txt > > and convert the ipmi_watchdog to use the new watchdog framework. > > > > I ran into a few issues and then realized the driver itself never really > > binds to any hardware, so it makes the conversion process a little more > > challenging. > > > > So a few questions to you before I waste my time in this area: > > > > - Is there any prior history about why the ipmi_watchdog was never > > converted to the new watchdog framework? Lack of interest? Technical > > hurdles? > > Mostly lack of interest, but there are some technical hurdles. Hi Corey, Thanks for all the responses. > > It would be hard to implement some things. The watchdog framework has > no concept of pretimeouts. And IPMI is message based, you send a > message to a controller to do anything, and you have to wait for the > response. That doesn't work very well with the watchdog interface, > which assumes you can do everything immediately. I will defer this conversation to Guenter's expertise. I am willing to hack up any suggestions the both of you come up with here to see if everything works well (same goes for the fasync/poll stuff). > > > > > - Is there a reason why the ipmi_watchdog is a seperate module as opposed > > to being called by ipmi_si? It seems there shouldn't be an issue with > > the watchdog always loaded, it just won't do anything until someone opens > > it (from my understanding). Also you would gain the ability to use the > > shutdown/remove routines properly instead of the reboot/panic notifiers. > > I'm not sure I understand this. Why would you want it as part of > ipmi_si? ipmi_msghandler would be a little more logical, but IMHO still > doesn't make sense. It uses the IPMI interface, and the interface is > designed to have multiple users. Better to keep it separate because > it's a separate function. > > I also don't understand the comment about shutdown/remove instead of > reboot/panic. Can you elaborate on that? So part of the problem with ipmi_watchdog is that it can't load automatically (like a normal driver). The reason is that it doesn't attach to any device. My suggestion to roll it into ipmi_si (or ipmi_msghandler works too), was to help with the autoloading part. To counter-argue the argument that a customer may not want the watchdog running, I would argue that it does nothing until someone opens it the first time. So autoloading shouldn't have a big downside to it. The second part I was trying to change is to remove the panic/reboot notifiers and instead use proper shutdown/remove functions. This would make it easier to use the 'struct watchdog_device' pointer as the pointer could be embedded in the per device struct. Though on the other hand, there is only ever one ipmi device per system, so maybe having it global isn't a big deal. I was trying to think of scalability issues. > > > In addition, passing the pointer to the 'struct watchdog_device' would be > > easier if some of those extra pieces were not there (as opposed to making > > it a global reference). > > > > - What does the fasync and poll calls do for a watchdog? > > The IPMI watchdog has the ability to report a pretimeout at a specific > amount of time before the final timeout, presumably to take some action > before the system reboots. the fasync and poll (and read) calls let > this be reported to the user. > > > > > I'll start with that for now. > > > > I appreciate any feedback. Currently we just implemented blacklisting the > > iTCO watchdog driver to workaround this problem. I thought we could do > > better, hence my motivation to do work in this area. > > It would be nice, yes. I'm afraid to get all the functionality would be > a lot of hacking on the watchdog framework or removal of function from > the driver. We can take it step by step and see how we can get there. Again my goal was to help a partner of ours get the ipmi_watchdog to play nice in their system without resorting to blacklisting other watchdogs. In addition, I thought it would help with out of the box configuration if the ipmi_watchdog could autoload if the ipmi pieces were loaded in the system too instead of having to add an entry to /etc/modprobe.d. Thanks for the help so far! Cheers, Don ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2014-05-03 4:23 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-01 13:58 ipmi watchdog questions Don Zickus 2014-05-02 0:38 ` Corey Minyard 2014-05-02 1:11 ` Guenter Roeck 2014-05-02 4:38 ` Corey Minyard 2014-05-02 13:17 ` Guenter Roeck 2014-05-02 16:44 ` Don Zickus 2014-05-02 17:18 ` Guenter Roeck 2014-05-02 17:46 ` Don Zickus 2014-05-02 21:52 ` Corey Minyard 2014-05-02 23:20 ` Guenter Roeck 2014-05-03 2:10 ` Don Zickus 2014-05-03 2:51 ` Corey Minyard 2014-05-03 4:23 ` Guenter Roeck 2014-05-02 15:10 ` Don Zickus
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.