All of lore.kernel.org
* [RFC v1 0/2] Add driver for PAPR watchdog timers
@ 2022-04-13 16:48 Scott Cheloha
  0 siblings, 0 replies; 6+ messages in thread
From: Scott Cheloha @ 2022-04-13 16:48 UTC
  To: linux-watchdog; +Cc: bjking, nlynch, aik, npiggin, vaishnavi, wvoigt

This series adds a driver for PAPR hypercall-based watchdog timers,
tentatively named "pseries-wdt".

I wanted to get some clarification on a few things before submitting
the series as a patch, hence the RFC.  The first patch adding the
hypercall to hvcall.h is straightforward, but I have questions about
the second patch (the driver).  In particular:

- In pseries_wdt_probe() we register the watchdog device with
  devm_watchdog_register_device().  However, in pseries_wdt_remove(),
  calling watchdog_unregister_device() causes a kernel panic later,
  so I assume this is the wrong thing to do.

  Do we need to do anything to clean up the watchdog device during
  pseries_wdt_remove()?  Or does devm_watchdog_register_device()
  ensure the cleanup is handled transparently?

- In pseries_wdt_probe(), is it incorrect to devm_kfree() my
  allocation in the event that devm_watchdog_register_device()
  fails?

- The enormous hypercall input/output comment is mostly for my
  edification.  It seems like the sort of thing that will rot over time.
  I intend to remove most of it.  However, as far as I know the PAPR
  revision containing these details is not published yet.  Should I
  leave the comment in to ease review for now and remove it later?
  Or should I omit it from the initial commit entirely?

- Should we print something to the console when probing/removing the
  watchdog0 device or is that just noise?

  Most drivers (as distinct from devices) seem to print something
  during initialization, so that's what I've done in
  pseries_wdt_module_init() when the capability query succeeds.

- The timeout action is currently hardcoded to a hard reset.  This
  could be made configurable through a module parameter.  I intend
  to do this in a later patch unless someone needs it included
  in the initial patch.

- We set EIO if the hypercall fails in pseries_wdt_start() or
  pseries_wdt_stop().  There is nothing userspace can do if this
  happens.  All hypercall failures in these contexts are unexpected.

  Given all of that, is there a more appropriate errno than EIO?

- The H_WATCHDOG spec indicates that H_BUSY is possible.  Is it
  probable, though?  Should we spin and retry the hypercall in
  the event that we see it?  Or is that pointless?




* Re: [RFC v1 0/2] Add driver for PAPR watchdog timers
  2022-04-19  8:49 ` Alexey Kardashevskiy
@ 2022-04-19 13:55   ` Guenter Roeck
  0 siblings, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2022-04-19 13:55 UTC
  To: Alexey Kardashevskiy, Scott Cheloha, linux-watchdog
  Cc: bjking, nathanl, npiggin, vaishnavi, wvoigt

On 4/19/22 01:49, Alexey Kardashevskiy wrote:
> 
> 
> On 14/04/2022 02:51, Scott Cheloha wrote:
>> This series adds a driver for PAPR hypercall-based watchdog timers,
>> tentatively named "pseries-wdt".
>>
>> I wanted to get some clarification on a few things before submitting
>> the series as a patch, hence the RFC.  The first patch adding the
>> hypercall to hvcall.h is straightforward, but I have questions about
>> the second patch (the driver).  In particular:
>>
>> - In pseries_wdt_probe() we register the watchdog device with
>>    devm_watchdog_register_device().  However, in pseries_wdt_remove(),
>>    calling watchdog_unregister_device() causes a kernel panic later,
>>    so I assume this is the wrong thing to do.
> 
> 
> It should have been devm_watchdog_unregister_device() (no difference though) and what was the backtrace? Most watchdog drivers do it this way  :-/
> 

Please make yourself familiar with devm_ functions and their use.
There is no exported devm_watchdog_unregister_device() because it is
not needed.

> 
>>    Do we need to do anything to clean up the watchdog device during
>>    pseries_wdt_remove()?  Or does devm_watchdog_register_device()
>>    ensure the cleanup is handled transparently?
>>
>> - In pseries_wdt_probe(), is it incorrect to devm_kfree() my
>>    allocation in the event that devm_watchdog_register_device()
>>    fails?
> 
> I am pretty sure nothing is going to free the memory you allocated in devm_kzalloc() as you do not even pass the allocated pointer to devm_watchdog_register_device(); it is an offset. The only reason devm_kfree(&pw->wd) won't barf is that @wd is the first member of the pseries_wdt struct.
> 

Again, please make yourself familiar with devm_ functions
and their use.

Guenter


* Re: [RFC v1 0/2] Add driver for PAPR watchdog timers
  2022-04-13 16:51 Scott Cheloha
  2022-04-14  2:23 ` Guenter Roeck
@ 2022-04-19  8:49 ` Alexey Kardashevskiy
  2022-04-19 13:55   ` Guenter Roeck
  1 sibling, 1 reply; 6+ messages in thread
From: Alexey Kardashevskiy @ 2022-04-19  8:49 UTC
  To: Scott Cheloha, linux-watchdog; +Cc: bjking, nathanl, npiggin, vaishnavi, wvoigt



On 14/04/2022 02:51, Scott Cheloha wrote:
> This series adds a driver for PAPR hypercall-based watchdog timers,
> tentatively named "pseries-wdt".
> 
> I wanted to get some clarification on a few things before submitting
> the series as a patch, hence the RFC.  The first patch adding the
> hypercall to hvcall.h is straightforward, but I have questions about
> the second patch (the driver).  In particular:
> 
> - In pseries_wdt_probe() we register the watchdog device with
>    devm_watchdog_register_device().  However, in pseries_wdt_remove(),
>    calling watchdog_unregister_device() causes a kernel panic later,
>    so I assume this is the wrong thing to do.


It should have been devm_watchdog_unregister_device() (no difference
though), and what was the backtrace? Most watchdog drivers do it this
way :-/


>    Do we need to do anything to clean up the watchdog device during
>    pseries_wdt_remove()?  Or does devm_watchdog_register_device()
>    ensure the cleanup is handled transparently?
> 
> - In pseries_wdt_probe(), is it incorrect to devm_kfree() my
>    allocation in the event that devm_watchdog_register_device()
>    fails?

I am pretty sure nothing is going to free the memory you allocated in
devm_kzalloc() as you do not even pass the allocated pointer to
devm_watchdog_register_device(); it is an offset. The only reason
devm_kfree(&pw->wd) won't barf is that @wd is the first member of the
pseries_wdt struct.


> - The enormous hypercall input/output comment is mostly for my
>    edification.  It seems like the sort of thing that will rot over time.
>    I intend to remove most of it.  However, as far as I know the PAPR
>    revision containing these details is not published yet.  Should I
>    leave the comment in to ease review for now and remove it later?
>    Or should I omit it from the initial commit entirely?

I'd probably remove some empty lines and add shorter comments inline, like:

+/* Bits 56-63: "timeoutAction" */
+#define PSERIES_WDTF_ACTION(ac)			SETFIELD(ac, 56, 63)
+#define PSERIES_WDTF_ACTION_HARD_POWEROFF	PSERIES_WDTF_ACTION(0x1) // "Hard poweroff"
+#define PSERIES_WDTF_ACTION_HARD_RESTART	PSERIES_WDTF_ACTION(0x2) // "Hard restart"
+#define PSERIES_WDTF_ACTION_DUMP_RESTART	PSERIES_WDTF_ACTION(0x3) // "Dump restart"


The quoted text tells you what to search for literally in the PAPR spec
once it is updated.


> - Should we print something to the console when probing/removing the
>    watchdog0 device or is that just noise?
> 
>    Most drivers (as distinct from devices) seem to print something
>    during initialization, so that's what I've done in
>    pseries_wdt_module_init() when the capability query succeeds.


I'd say it is noise but since the watchdog is not represented in the 
device tree, there is really no other way of knowing if it is running 
(unless it is a module?).

A one-line message in pseries_wdt_probe() with
PSERIES_WDTQ_MAX_NUMBER/PSERIES_WDTQ_MIN_TIMEOUT should do.


> - The timeout action is currently hardcoded to a hard reset.  This
>    could be made configurable through a module parameter.  I intend
>    to do this in a later patch unless someone needs it included
>    in the initial patch.

Make it in the initial patch, it is just a few lines.


> - We set EIO if the hypercall fails in pseries_wdt_start() or
>    pseries_wdt_stop().  There is nothing userspace can do if this
>    happens.  All hypercall failures in these contexts are unexpected.

Userspace can log the event, send an email, "sync && reboot", dunno.

>    Given all of that, is there a more appropriate errno than EIO?
> 
> - The H_WATCHDOG spec indicates that H_BUSY is possible.  Is it
>    probable, though?  Should we spin and retry the hypercall in
>    the event that we see it?  Or is that pointless?


Looks like the other parts of pseries do retry after calling cond_resched().

> 
> 


* Re: [RFC v1 0/2] Add driver for PAPR watchdog timers
  2022-04-14  2:23 ` Guenter Roeck
@ 2022-04-14 12:39   ` Nathan Lynch
  0 siblings, 0 replies; 6+ messages in thread
From: Nathan Lynch @ 2022-04-14 12:39 UTC
  To: Guenter Roeck
  Cc: bjking, aik, npiggin, vaishnavi, Scott Cheloha, linux-watchdog

Guenter Roeck <linux@roeck-us.net> writes:
> Anyway, doesn't pseries support devicetree? Why is this driver not
> instantiated through a devicetree node?

It's not ideal, but this facility doesn't have a device tree
representation specified in the platform architecture. It has to be
discovered through hypervisor calls.


* Re: [RFC v1 0/2] Add driver for PAPR watchdog timers
  2022-04-13 16:51 Scott Cheloha
@ 2022-04-14  2:23 ` Guenter Roeck
  2022-04-14 12:39   ` Nathan Lynch
  2022-04-19  8:49 ` Alexey Kardashevskiy
  1 sibling, 1 reply; 6+ messages in thread
From: Guenter Roeck @ 2022-04-14  2:23 UTC
  To: Scott Cheloha, linux-watchdog
  Cc: bjking, nathanl, aik, npiggin, vaishnavi, wvoigt

On 4/13/22 09:51, Scott Cheloha wrote:
> This series adds a driver for PAPR hypercall-based watchdog timers,
> tentatively named "pseries-wdt".
> 
> I wanted to get some clarification on a few things before submitting
> the series as a patch, hence the RFC.  The first patch adding the
> hypercall to hvcall.h is straightforward, but I have questions about
> the second patch (the driver).  In particular:
> 
> - In pseries_wdt_probe() we register the watchdog device with
>    devm_watchdog_register_device().  However, in pseries_wdt_remove(),
>    calling watchdog_unregister_devce() causes a kernel panic later,
>    so I assume this is the wrong thing to do.
> 

The whole point of using devm_ functions is to handle cleanup (or removal)
automatically. I would suggest making yourself familiar with the concept.

>    Do we need to do anything to clean up the watchdog device during
>    pseries_wdt_remove()?  Or does devm_watchdog_register_device()
>    ensure the cleanup is handled transparently?
> 
> - In pseries_wdt_probe(), is it incorrect to devm_kfree() my
>    allocation in the event that devm_watchdog_register_device()
>    fails?
> 

No. Same thing.

> - The enormous hypercall input/output comment is mostly for my
>    edification.  It seems like the sort of thing that will rot over time.
>    I intend to remove most of it.  However, as far as I know the PAPR
>    revision containing these details is not published yet.  Should I
>    leave the comment in to ease review for now and remove it later?
>    Or should I omit it from the initial commit entirely?
> 
> - Should we print something to the console when probing/removing the
>    watchdog0 device or is that just noise?
> 
It is just noise, but some developers insist on it.

>    Most drivers (as distinct from devices) seem to print something
>    during initialization, so that's what I've done in
>    pseries_wdt_module_init() when the capability query succeeds.
> 
No. If you have to print something, print it during probe. Module init
noise is even worse. And those error messages in the init function are
completely unacceptable.

Anyway, doesn't pseries support devicetree? Why is this driver not
instantiated through a devicetree node?

Guenter

> - The timeout action is currently hardcoded to a hard reset.  This
>    could be made configurable through a module parameter.  I intend
>    to do this in a later patch unless someone needs it included
>    in the initial patch.
> 
> - We set EIO if the hypercall fails in pseries_wdt_start() or
>    pseries_wdt_stop().  There is nothing userspace can do if this
>    happens.  All hypercall failures in these contexts are unexpected.
> 
>    Given all of that, is there a more appropriate errno than EIO?
> 
> - The H_WATCHDOG spec indicates that H_BUSY is possible.  Is it
>    probable, though?  Should we spin and retry the hypercall in
>    the event that we see it?  Or is that pointless?
> 


* [RFC v1 0/2] Add driver for PAPR watchdog timers
@ 2022-04-13 16:51 Scott Cheloha
  2022-04-14  2:23 ` Guenter Roeck
  2022-04-19  8:49 ` Alexey Kardashevskiy
  0 siblings, 2 replies; 6+ messages in thread
From: Scott Cheloha @ 2022-04-13 16:51 UTC
  To: linux-watchdog; +Cc: bjking, nathanl, aik, npiggin, vaishnavi, wvoigt

This series adds a driver for PAPR hypercall-based watchdog timers,
tentatively named "pseries-wdt".

I wanted to get some clarification on a few things before submitting
the series as a patch, hence the RFC.  The first patch adding the
hypercall to hvcall.h is straightforward, but I have questions about
the second patch (the driver).  In particular:

- In pseries_wdt_probe() we register the watchdog device with
  devm_watchdog_register_device().  However, in pseries_wdt_remove(),
  calling watchdog_unregister_device() causes a kernel panic later,
  so I assume this is the wrong thing to do.

  Do we need to do anything to clean up the watchdog device during
  pseries_wdt_remove()?  Or does devm_watchdog_register_device()
  ensure the cleanup is handled transparently?

- In pseries_wdt_probe(), is it incorrect to devm_kfree() my
  allocation in the event that devm_watchdog_register_device()
  fails?

- The enormous hypercall input/output comment is mostly for my
  edification.  It seems like the sort of thing that will rot over time.
  I intend to remove most of it.  However, as far as I know the PAPR
  revision containing these details is not published yet.  Should I
  leave the comment in to ease review for now and remove it later?
  Or should I omit it from the initial commit entirely?

- Should we print something to the console when probing/removing the
  watchdog0 device or is that just noise?

  Most drivers (as distinct from devices) seem to print something
  during initialization, so that's what I've done in
  pseries_wdt_module_init() when the capability query succeeds.

- The timeout action is currently hardcoded to a hard reset.  This
  could be made configurable through a module parameter.  I intend
  to do this in a later patch unless someone needs it included
  in the initial patch.

- We set EIO if the hypercall fails in pseries_wdt_start() or
  pseries_wdt_stop().  There is nothing userspace can do if this
  happens.  All hypercall failures in these contexts are unexpected.

  Given all of that, is there a more appropriate errno than EIO?

- The H_WATCHDOG spec indicates that H_BUSY is possible.  Is it
  probable, though?  Should we spin and retry the hypercall in
  the event that we see it?  Or is that pointless?




Thread overview: 6+ messages
2022-04-13 16:48 [RFC v1 0/2] Add driver for PAPR watchdog timers Scott Cheloha
2022-04-13 16:51 Scott Cheloha
2022-04-14  2:23 ` Guenter Roeck
2022-04-14 12:39   ` Nathan Lynch
2022-04-19  8:49 ` Alexey Kardashevskiy
2022-04-19 13:55   ` Guenter Roeck
