All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] watchdog: new option to skip ping in close
@ 2012-08-03 17:40 David Teigland
  2012-08-04  9:08 ` Hans de Goede
  0 siblings, 1 reply; 7+ messages in thread
From: David Teigland @ 2012-08-03 17:40 UTC (permalink / raw)
  To: wim; +Cc: linux-watchdog

If a daemon has /dev/watchdog open, and exits (say by crashing or
sigkill), watchdog_release() generates a new watchdog_ping(), extending
the life of the machine.  However, the daemon may require control over
whether or not the watchdog is pinged, and the ping generated by close
compromises this control.

I'm using the watchdog to reset machines a cluster when their shared
leases expire.  Machine A will not ping its watchdog after one of its
leases expires, intending that it be reset by its watchdog.  Machine B can
then assume that 60 seconds after the lease expiration, machine A will be
reset, and it can safely acquire A's leases.  The problem is that if the
daemon on machine A exits sometime during the 60 seconds prior to the
watchdog firing, watchdog_close() generates a new ping, extending the the
life of A by a new 60 seconds.  To account for this, B must wait 120
seconds instead of 60 seconds to account for the additional ping the
kernel may have inserted.

Signed-off-by: David Teigland <teigland@redhat.com>
---
 drivers/watchdog/watchdog_dev.c |    9 ++++++++-
 include/linux/watchdog.h        |    2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index ef8edec..b77442a 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -339,6 +339,11 @@ static long watchdog_ioctl(struct file *file, unsigned int cmd,
 	unsigned int val;
 	int err;
 
+	if (cmd == WDIOC_NOCLOSEPING) {
+		set_bit(WDOG_NO_CLOSE_PING, &wdd->status);
+		return 0;
+	}
+
 	err = watchdog_ioctl_op(wdd, cmd, arg);
 	if (err != -ENOIOCTLCMD)
 		return err;
@@ -480,7 +485,9 @@ static int watchdog_release(struct inode *inode, struct file *file)
 		if (!test_bit(WDOG_UNREGISTERED, &wdd->status))
 			dev_crit(wdd->dev, "watchdog did not stop!\n");
 		mutex_unlock(&wdd->lock);
-		watchdog_ping(wdd);
+
+		if (!test_bit(WDOG_NO_CLOSE_PING, &wdd->status))
+			watchdog_ping(wdd);
 	}
 
 	/* Allow the owner module to be unloaded again */
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index da70f0f..4fa10ff 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -31,6 +31,7 @@ struct watchdog_info {
 #define	WDIOC_SETPRETIMEOUT	_IOWR(WATCHDOG_IOCTL_BASE, 8, int)
 #define	WDIOC_GETPRETIMEOUT	_IOR(WATCHDOG_IOCTL_BASE, 9, int)
 #define	WDIOC_GETTIMELEFT	_IOR(WATCHDOG_IOCTL_BASE, 10, int)
+#define	WDIOC_NOCLOSEPING	_IOR(WATCHDOG_IOCTL_BASE, 11, int)
 
 #define	WDIOF_UNKNOWN		-1	/* Unknown flag error */
 #define	WDIOS_UNKNOWN		-1	/* Unknown status error */
@@ -140,6 +141,7 @@ struct watchdog_device {
 #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
 #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
 #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
+#define WDOG_NO_CLOSE_PING	5	/* Do not ping in watchdog_release() */
 };
 
 #ifdef CONFIG_WATCHDOG_NOWAYOUT
-- 
1.7.10.1.362.g242cab3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-03 17:40 [PATCH] watchdog: new option to skip ping in close David Teigland
@ 2012-08-04  9:08 ` Hans de Goede
  2012-08-06 16:25   ` David Teigland
  0 siblings, 1 reply; 7+ messages in thread
From: Hans de Goede @ 2012-08-04  9:08 UTC (permalink / raw)
  To: David Teigland; +Cc: wim, linux-watchdog

Hi,

I've no opinion on the usability of adding support for this, but I
do have a comment on the implementation, see below.

On 08/03/2012 07:40 PM, David Teigland wrote:
> If a daemon has /dev/watchdog open, and exits (say by crashing or
> sigkill), watchdog_release() generates a new watchdog_ping(), extending
> the life of the machine.  However, the daemon may require control over
> whether or not the watchdog is pinged, and the ping generated by close
> compromises this control.
>
> I'm using the watchdog to reset machines a cluster when their shared
> leases expire.  Machine A will not ping its watchdog after one of its
> leases expires, intending that it be reset by its watchdog.  Machine B can
> then assume that 60 seconds after the lease expiration, machine A will be
> reset, and it can safely acquire A's leases.  The problem is that if the
> daemon on machine A exits sometime during the 60 seconds prior to the
> watchdog firing, watchdog_close() generates a new ping, extending the the
> life of A by a new 60 seconds.  To account for this, B must wait 120
> seconds instead of 60 seconds to account for the additional ping the
> kernel may have inserted.
>
> Signed-off-by: David Teigland <teigland@redhat.com>
> ---
>   drivers/watchdog/watchdog_dev.c |    9 ++++++++-
>   include/linux/watchdog.h        |    2 ++
>   2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index ef8edec..b77442a 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -339,6 +339,11 @@ static long watchdog_ioctl(struct file *file, unsigned int cmd,
>   	unsigned int val;
>   	int err;
>
> +	if (cmd == WDIOC_NOCLOSEPING) {
> +		set_bit(WDOG_NO_CLOSE_PING, &wdd->status);
> +		return 0;
> +	}
> +
>   	err = watchdog_ioctl_op(wdd, cmd, arg);
>   	if (err != -ENOIOCTLCMD)
>   		return err;

Why on earth are you putting that if block there and not simply putting
the command in the switch-case where it belongs? If you were afraid watchdog_ioctl_op
could cause issues, then maybe you should have studied the code?

That would have thought you that it will always return -ENOIOCTLCMD for commands it
does not know how to handle, so it won't interfere with adding new standard commands.

Adding this if block there is completely unnecessary and quite ugly IMHO.


> @@ -480,7 +485,9 @@ static int watchdog_release(struct inode *inode, struct file *file)
>   		if (!test_bit(WDOG_UNREGISTERED, &wdd->status))
>   			dev_crit(wdd->dev, "watchdog did not stop!\n");
>   		mutex_unlock(&wdd->lock);
> -		watchdog_ping(wdd);
> +
> +		if (!test_bit(WDOG_NO_CLOSE_PING, &wdd->status))
> +			watchdog_ping(wdd);
>   	}
>
>   	/* Allow the owner module to be unloaded again */
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index da70f0f..4fa10ff 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -31,6 +31,7 @@ struct watchdog_info {
>   #define	WDIOC_SETPRETIMEOUT	_IOWR(WATCHDOG_IOCTL_BASE, 8, int)
>   #define	WDIOC_GETPRETIMEOUT	_IOR(WATCHDOG_IOCTL_BASE, 9, int)
>   #define	WDIOC_GETTIMELEFT	_IOR(WATCHDOG_IOCTL_BASE, 10, int)
> +#define	WDIOC_NOCLOSEPING	_IOR(WATCHDOG_IOCTL_BASE, 11, int)
>
>   #define	WDIOF_UNKNOWN		-1	/* Unknown flag error */
>   #define	WDIOS_UNKNOWN		-1	/* Unknown status error */
> @@ -140,6 +141,7 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +#define WDOG_NO_CLOSE_PING	5	/* Do not ping in watchdog_release() */
>   };
>
>   #ifdef CONFIG_WATCHDOG_NOWAYOUT
>

Regards,

Hans

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-04  9:08 ` Hans de Goede
@ 2012-08-06 16:25   ` David Teigland
  2012-08-07  7:53     ` Hans de Goede
  2012-08-08 20:22     ` Wim Van Sebroeck
  0 siblings, 2 replies; 7+ messages in thread
From: David Teigland @ 2012-08-06 16:25 UTC (permalink / raw)
  To: Hans de Goede; +Cc: wim, linux-watchdog

On Sat, Aug 04, 2012 at 11:08:24AM +0200, Hans de Goede wrote:
> Why on earth are you putting that if block there and not simply putting
> the command in the switch-case where it belongs? If you were afraid watchdog_ioctl_op
> could cause issues, then maybe you should have studied the code?
> 
> That would have thought you that it will always return -ENOIOCTLCMD for commands it
> does not know how to handle, so it won't interfere with adding new standard commands.
> 
> Adding this if block there is completely unnecessary and quite ugly IMHO.

It is ugly, but it's also necessary for a couple reasons that you may have
missed at first glance.  First, this ioctl is addressing behavior that
exists in the common framework layer, not in individual drivers, so it
doesn't make too much sense to pass down to the drivers.

That being said, we could do so if ENOIOCTLCMD worked as you suggest, but
it doesn't quite.  Drivers with an ioctl op will return ENOTTY, in which
case the switch statement would work, but drivers with no ioctl op (e.g.
softdog) cause watchdog_ioctl_op to return ENOIOCTLCMD, causing the switch
statement to be skipped.

In the end, we need WDIOC_NOCLOSEPING to be processed by the core,
independent of what drivers do, since as I explained this is targeting the
behavior of the core framework, not the drivers.

For those reasons, I placed the code where it is, but I would be satisfied
with a different approach that also works.

Dave


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-06 16:25   ` David Teigland
@ 2012-08-07  7:53     ` Hans de Goede
  2012-08-07 15:23       ` David Teigland
  2012-08-08 20:22     ` Wim Van Sebroeck
  1 sibling, 1 reply; 7+ messages in thread
From: Hans de Goede @ 2012-08-07  7:53 UTC (permalink / raw)
  To: David Teigland; +Cc: wim, linux-watchdog

Hi,

On 08/06/2012 06:25 PM, David Teigland wrote:
> On Sat, Aug 04, 2012 at 11:08:24AM +0200, Hans de Goede wrote:
>> Why on earth are you putting that if block there and not simply putting
>> the command in the switch-case where it belongs? If you were afraid watchdog_ioctl_op
>> could cause issues, then maybe you should have studied the code?
>>
>> That would have thought you that it will always return -ENOIOCTLCMD for commands it
>> does not know how to handle, so it won't interfere with adding new standard commands.
>>
>> Adding this if block there is completely unnecessary and quite ugly IMHO.
>
> It is ugly, but it's also necessary for a couple reasons that you may have
> missed at first glance.  First, this ioctl is addressing behavior that
> exists in the common framework layer, not in individual drivers, so it
> doesn't make too much sense to pass down to the drivers.

All the ioctls handled by the switch-case are handled primarily by the
core layer (doing things like range checking for the timeout value, etc),
WDIOC_NOCLOSEPING is no different!

> That being said, we could do so if ENOIOCTLCMD worked as you suggest, but
> it doesn't quite.  Drivers with an ioctl op will return ENOTTY,

No they don't, if they would do that all the standard ioctls would not work,
since watchdog_dev.c: watchdog_ioctl() has:

         err = watchdog_ioctl_op(wdd, cmd, arg);
         if (err != -ENOIOCTLCMD)
                 return err;

So if they returned ENOTTY we would never get into the switch case.

 > in which
> case the switch statement would work, but drivers with no ioctl op (e.g.
> softdog) cause watchdog_ioctl_op to return ENOIOCTLCMD, causing the switch
> statement to be skipped.

<sigH> it is the other way around not returning ENOIOCTLCMD (ie returning ENOTTY)
causes the switch case to be skipped, not the other way around. The watchdog_ioctl_op
driver callback is there to add custom driver specific ioctls, for all other ioctls
the watchdog_ioctl_op function must return ENOIOCTLCMD, this get turned into a
ENOTTY by the switch case if the ioctl is not one of the standard ones either,
so with WDIOC_NOCLOSEPING being a new standard ioctl it should, it *must* go in
the switch case just like all the other ioctls.

> In the end, we need WDIOC_NOCLOSEPING to be processed by the core,
> independent of what drivers do, since as I explained this is targeting the
> behavior of the core framework, not the drivers.

Right, and it will be processed by the core if it is in the switch-case because:

watchdog_ioctl_op will return ENOIOCTLCMD for non driver specific custom ioctls,
causing watchdog_ioctl() to continue into the switch case for standard ioctls!

Regards,

Hans

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-07  7:53     ` Hans de Goede
@ 2012-08-07 15:23       ` David Teigland
  0 siblings, 0 replies; 7+ messages in thread
From: David Teigland @ 2012-08-07 15:23 UTC (permalink / raw)
  To: Hans de Goede; +Cc: wim, linux-watchdog

On Tue, Aug 07, 2012 at 09:53:36AM +0200, Hans de Goede wrote:
> >In the end, we need WDIOC_NOCLOSEPING to be processed by the core,
> >independent of what drivers do, since as I explained this is targeting the
> >behavior of the core framework, not the drivers.
> 
> Right, and it will be processed by the core if it is in the switch-case because:
> 
> watchdog_ioctl_op will return ENOIOCTLCMD for non driver specific custom ioctls,
> causing watchdog_ioctl() to continue into the switch case for standard ioctls!

Thanks, there's one piece I'm missing -- could you refer me to a specific
driver that returns ENOIOCTLCMD from its ioctl method?

Thanks,
Dave


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-06 16:25   ` David Teigland
  2012-08-07  7:53     ` Hans de Goede
@ 2012-08-08 20:22     ` Wim Van Sebroeck
  2012-08-08 21:11       ` David Teigland
  1 sibling, 1 reply; 7+ messages in thread
From: Wim Van Sebroeck @ 2012-08-08 20:22 UTC (permalink / raw)
  To: David Teigland; +Cc: Hans de Goede, linux-watchdog

Hi Dave,

> On Sat, Aug 04, 2012 at 11:08:24AM +0200, Hans de Goede wrote:
> > Why on earth are you putting that if block there and not simply putting
> > the command in the switch-case where it belongs? If you were afraid watchdog_ioctl_op
> > could cause issues, then maybe you should have studied the code?
> > 
> > That would have thought you that it will always return -ENOIOCTLCMD for commands it
> > does not know how to handle, so it won't interfere with adding new standard commands.
> > 
> > Adding this if block there is completely unnecessary and quite ugly IMHO.
> 
> It is ugly, but it's also necessary for a couple reasons that you may have
> missed at first glance.  First, this ioctl is addressing behavior that
> exists in the common framework layer, not in individual drivers, so it
> doesn't make too much sense to pass down to the drivers.
> 
> That being said, we could do so if ENOIOCTLCMD worked as you suggest, but
> it doesn't quite.  Drivers with an ioctl op will return ENOTTY, in which
> case the switch statement would work, but drivers with no ioctl op (e.g.
> softdog) cause watchdog_ioctl_op to return ENOIOCTLCMD, causing the switch
> statement to be skipped.

at this moment all drivers that are allready converted to the general watchdog
infrastructure/API don't need and thus don't have an ioctl operation in place.
They all use the common code. Before we added the WDIOC_GETTIMELEFT to the
framework, I had the converted iTCO_wdt driver using an ioctl operations call
that used the ENOIOCTLCMD without any issue.

> In the end, we need WDIOC_NOCLOSEPING to be processed by the core,
> independent of what drivers do, since as I explained this is targeting the
> behavior of the core framework, not the drivers.
> 
> For those reasons, I placed the code where it is, but I would be satisfied
> with a different approach that also works.

I'm going to be honoust: I don't understand the use case. I have the impression
that you want to solve High Availability issues with watchdog resets...
And that's like replacing a car with a new car when you are out of gasoline...

So, I need to think more about this and I wonder if other people have the same
issue as you have with the extra ping.

Aside from that (and aside from the things that Hans allready suggested (thanks
Hans!)) we should look if disabling that ping is something that you want for the
complete system or for only one driver in the system.
The code for the complete system would be translated as a module or kernel parameter
and not as an extra ioctl.
If it would be needed for specific drivers then we need a more generic approach
that will also let us tweak other items in the future so that we don't need an ioctl
for every change of behaviour.

Kind regards,
Wim.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] watchdog: new option to skip ping in close
  2012-08-08 20:22     ` Wim Van Sebroeck
@ 2012-08-08 21:11       ` David Teigland
  0 siblings, 0 replies; 7+ messages in thread
From: David Teigland @ 2012-08-08 21:11 UTC (permalink / raw)
  To: Wim Van Sebroeck; +Cc: Hans de Goede, linux-watchdog

On Wed, Aug 08, 2012 at 10:22:18PM +0200, Wim Van Sebroeck wrote:
> > That being said, we could do so if ENOIOCTLCMD worked as you suggest, but
> > it doesn't quite.  Drivers with an ioctl op will return ENOTTY, in which
> > case the switch statement would work, but drivers with no ioctl op (e.g.
> > softdog) cause watchdog_ioctl_op to return ENOIOCTLCMD, causing the switch
> > statement to be skipped.
> 
> at this moment all drivers that are allready converted to the general watchdog
> infrastructure/API don't need and thus don't have an ioctl operation in place.
> They all use the common code. Before we added the WDIOC_GETTIMELEFT to the
> framework, I had the converted iTCO_wdt driver using an ioctl operations call
> that used the ENOIOCTLCMD without any issue.

Yeah, that's what confused me.  I mistakenly thought the ioctl methods I
saw in the drivers were called by watchdog_dev, when in fact they are just
unconverted.  There's no point in adding the new ioctl to unconverted
drivers.

> I'm going to be honoust: I don't understand the use case. I have the impression
> that you want to solve High Availability issues with watchdog resets...

Yes. I'm using the watchdog as an alternative to i/o fencing in a shared
storage cluster.

> And that's like replacing a car with a new car when you are out of gasoline...

I guess you mean that reseting a machine sounds severe.  It's not if you
consider the context of i/o fencing where you often use a power switch to
turn off a machine to fence it.

> So, I need to think more about this and I wonder if other people have the same
> issue as you have with the extra ping.

People who *want* the watchdog to fire given certain conditions and also
want the most predictability about it, would probably appreciate no ping
generated by the kernel in close, but only generated themselves by ioctl.

There are a couple ways I can work around this at the moment, but neither
is very nice.  First, I can add an extra 60 seconds to my timeout
calculations in order to account for a ping that the kernel generates if
my daemon is killed just before the watchdog was about to fire.

Second, the approach I'm using now, is to pre-emptively close the device
specifically to use the close to generate what might be the final ping
that I actually want.  I then re-open the device if it turns out that it
wasn't in fact the final one I needed.  So, I use close and re-open to
issue some of the keepalives instead of ioctl.  This adds complexity.  (In
the pathological case, you end up doing repeated open&close of the device
to generate all your pings, instead of ioctl!)

> The code for the complete system would be translated as a module or kernel parameter
> and not as an extra ioctl.
> If it would be needed for specific drivers then we need a more generic approach
> that will also let us tweak other items in the future so that we don't need an ioctl
> for every change of behaviour.

I would need it in the general case, for the complete system.  But, I
would also need to be able to detect if the behavior was enabled or not on
a given system.

Thanks,
Dave


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-08-08 21:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-03 17:40 [PATCH] watchdog: new option to skip ping in close David Teigland
2012-08-04  9:08 ` Hans de Goede
2012-08-06 16:25   ` David Teigland
2012-08-07  7:53     ` Hans de Goede
2012-08-07 15:23       ` David Teigland
2012-08-08 20:22     ` Wim Van Sebroeck
2012-08-08 21:11       ` David Teigland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.