All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
@ 2015-04-10 13:06 ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel, linux-watchdog, boris.brezillon, nicolas.ferre,
	alexandre.belloni
  Cc: Timo Kokkonen

The watchdog kernel API is quite limited. It has support for providing
generic device handling, but it doesn't really know anything about the
watchdog hardware or its constraints. The watchdog drivers come with a
lot of diversity and their own set of quirks and constraints. Some of
their limitations are not nice for the user space, so the drivers work
around them with all sorts of ad hoc implementations.

One common pattern is to use kernel timers or work queues to allow
longer timeout parameters than the actual hardware supports. To solve
this problem, this patch set extends the kernel watchdog API with a few
parameters that let the core know more about the watchdog HW and take
care about the timeout extending.

The patch set also implements "early_timeout_sec" feature that is very
common on many production systems where early kernel or user space
crashes must lead to a device reset. Traditional watchdog handling
does not allow this as the watchdog is stopped (fully or emulating
stopped state with kernel timers) before user space opens it for the
first time.

The changes are designed to be taken in use one driver at time. If the
driver does not set the new parameters and call
watchdog_init_params(), the watchdog behavior is exactly the same as
before.

In principle this new API makes it possible for the user space to see
every watchdog hardware to behave the same, at least in terms of
watchdog timeouts. Once the API is in, it should be easier to move
even more common behavior out of the driver code to the watchdog core
and make the drivers simpler.

I'm not anticipating this patch set to go in as is, this is merely to
help bringing up the discussion as there is a working patch set to look
at. There are propably some corner cases that I didn't get right
yet. Also, we could even remove the maximum timeout completely as the
HW no longer limits the watchdog timeouts. This propably needs some
thinking still and your commenting.

Please review and give feedback.

Patch revision history:

-v5: Re-think the approach to be fully generic. The early_timeout_sec
  handling is no longer in the driver but in the watchdog core. As a
  result the core needed to gain knowledge about the watchdog
  hardware. Appropriate handling is added in the core. The side effect
  for this is that drivers using the new extensions can be simplified
  a lot and different kinds of watchdog hardware can be made to
  behave the same for the user space.

-v4: Binding documentation is now separated completely from the driver
  patch. The documentation no longer makes any assumptions about how
  the actual implementation is made, it just describes the actual
  behavior the driver should implement in order to satisfy the
  requirement.

- v3: Rename the property to "early-timeout-sec" and use it as a
  timeout value that stops the timer in the atmel driver after the
  timeout expires. A watchdog.txt is also introduced for documenting
  the common watchdog properties, including now this one and
  "timeout-sec" property.

- v2: Rename the property to "enable-early-reset" as the behavior
  itself is not atmel specific. This way other drivers are free to
  implement same behavior with the same property name.

- v1: Propose property name "atmle,no-early-timer" for disabling the
  timer that keeps the atmel watchdog running until user space opens
  the device.


Timo Kokkonen (4):
  watchdog: Extend kernel API to know about HW limitations
  watchdog: Allow watchdog to reset device at early boot
  devicetree: Document generic watchdog properties
  watchdog: at91sam9_wdt: Convert to use new watchdog core extensions

 .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
 drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
 drivers/watchdog/watchdog_core.c                   | 108 ++++++++++++++++++++-
 drivers/watchdog/watchdog_dev.c                    |  12 +++
 include/linux/watchdog.h                           |  24 +++++
 5 files changed, 173 insertions(+), 49 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt

-- 
2.1.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
@ 2015-04-10 13:06 ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

The watchdog kernel API is quite limited. It has support for providing
generic device handling, but it doesn't really know anything about the
watchdog hardware or its constraints. The watchdog drivers come with a
lot of diversity and their own set of quirks and constraints. Some of
their limitations are not nice for the user space, so the drivers work
around them with all sorts of ad hoc implementations.

One common pattern is to use kernel timers or work queues to allow
longer timeout parameters than the actual hardware supports. To solve
this problem, this patch set extends the kernel watchdog API with a few
parameters that let the core know more about the watchdog HW and take
care about the timeout extending.

The patch set also implements "early_timeout_sec" feature that is very
common on many production systems where early kernel or user space
crashes must lead to a device reset. Traditional watchdog handling
does not allow this as the watchdog is stopped (fully or emulating
stopped state with kernel timers) before user space opens it for the
first time.

The changes are designed to be taken in use one driver at time. If the
driver does not set the new parameters and call
watchdog_init_params(), the watchdog behavior is exactly the same as
before.

In principle this new API makes it possible for the user space to see
every watchdog hardware to behave the same, at least in terms of
watchdog timeouts. Once the API is in, it should be easier to move
even more common behavior out of the driver code to the watchdog core
and make the drivers simpler.

I'm not anticipating this patch set to go in as is, this is merely to
help bringing up the discussion as there is a working patch set to look
at. There are propably some corner cases that I didn't get right
yet. Also, we could even remove the maximum timeout completely as the
HW no longer limits the watchdog timeouts. This propably needs some
thinking still and your commenting.

Please review and give feedback.

Patch revision history:

-v5: Re-think the approach to be fully generic. The early_timeout_sec
  handling is no longer in the driver but in the watchdog core. As a
  result the core needed to gain knowledge about the watchdog
  hardware. Appropriate handling is added in the core. The side effect
  for this is that drivers using the new extensions can be simplified
  a lot and different kinds of watchdog hardware can be made to
  behave the same for the user space.

-v4: Binding documentation is now separated completely from the driver
  patch. The documentation no longer makes any assumptions about how
  the actual implementation is made, it just describes the actual
  behavior the driver should implement in order to satisfy the
  requirement.

- v3: Rename the property to "early-timeout-sec" and use it as a
  timeout value that stops the timer in the atmel driver after the
  timeout expires. A watchdog.txt is also introduced for documenting
  the common watchdog properties, including now this one and
  "timeout-sec" property.

- v2: Rename the property to "enable-early-reset" as the behavior
  itself is not atmel specific. This way other drivers are free to
  implement same behavior with the same property name.

- v1: Propose property name "atmle,no-early-timer" for disabling the
  timer that keeps the atmel watchdog running until user space opens
  the device.


Timo Kokkonen (4):
  watchdog: Extend kernel API to know about HW limitations
  watchdog: Allow watchdog to reset device at early boot
  devicetree: Document generic watchdog properties
  watchdog: at91sam9_wdt: Convert to use new watchdog core extensions

 .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
 drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
 drivers/watchdog/watchdog_core.c                   | 108 ++++++++++++++++++++-
 drivers/watchdog/watchdog_dev.c                    |  12 +++
 include/linux/watchdog.h                           |  24 +++++
 5 files changed, 173 insertions(+), 49 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt

-- 
2.1.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-10 13:06 ` Timo Kokkonen
@ 2015-04-10 13:06   ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel, linux-watchdog, boris.brezillon, nicolas.ferre,
	alexandre.belloni
  Cc: Timo Kokkonen

There is a great deal of diversity in the watchdog hardware found on
different devices. Differen hardware have different contstraints on
them, many of the constraints that are excessively difficult for the
user space to satisfy.

One such constraint is the length of the timeout value, which in many
cases can be just a few seconds. Drivers are creating ad hoc solutions
with timers and workqueues to extend the timeout in order to give user
space more time between updates. Looking at the drivers it is clear
that this has resulted to a lot of duplicate code.

Add an extension to the watchdog kernel API that allows the driver to
describe tis HW constraints to the watchdog code. A kernel worker in
the core is then used to extend the watchdog timeout on behalf of the
user space. This allows the drivers to be simplified as core takes
over the timer extending.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
 drivers/watchdog/watchdog_dev.c  | 12 ++++++
 include/linux/watchdog.h         | 23 +++++++++++
 3 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index cec9b55..8e7d08d 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
 EXPORT_SYMBOL_GPL(watchdog_init_timeout);
 
 /**
+ * watchdog_init_timeout() - initialize generic watchdog parameters
+ * @wdd: Watchdog device to operate
+ * @dev: Device that stores the device tree properties
+ *
+ * Initialize the generic timeout parameters. The driver needs to set
+ * hw_features bitmask from @wdd prior calling this function in order
+ * to ensure the core knows how to handle the HW.
+ *
+ * A zero is returned on success and -EINVAL for failure.
+ */
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
+{
+	int ret = 0;
+
+	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Max HW timeout needs to be set so that core knows when to
+	 * use a kernel worker to support longer watchdog timeouts
+	 */
+	if (!wdd->hw_max_timeout)
+		return -EINVAL;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(watchdog_init_params);
+
+static void watchdog_worker(struct work_struct *work)
+{
+	struct watchdog_device *wdd = container_of(to_delayed_work(work),
+						struct watchdog_device, work);
+
+	if (time_after(jiffies, wdd->expires)) {
+		pr_crit("I will reset your machine !\n");
+		return;
+	}
+
+	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
+		(watchdog_active(wdd) &&
+			wdd->hw_max_timeout < wdd->timeout * HZ)) {
+		if (wdd->ops->ping)
+			wdd->ops->ping(wdd);
+		else
+			wdd->ops->start(wdd);
+
+		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
+	}
+}
+
+static int prepare_watchdog(struct watchdog_device *wdd)
+{
+	/* Stop the watchdog now before user space opens the device */
+	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
+		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+		wdd->ops->stop(wdd);
+
+	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+		/*
+		 * Can't stop it, use a kernel timer to tick
+		 * it until it's open by user space
+		 */
+		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+	}
+	return 0;
+}
+
+/**
  * watchdog_register_device() - register a watchdog device
  * @wdd: watchdog device
  *
@@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
 	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
 					NULL, "watchdog%d", wdd->id);
 	if (IS_ERR(wdd->dev)) {
-		watchdog_dev_unregister(wdd);
-		ida_simple_remove(&watchdog_ida, id);
 		ret = PTR_ERR(wdd->dev);
-		return ret;
+		goto dev_create_fail;
+	}
+
+	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
+
+	if (wdd->hw_max_timeout) {
+		ret = prepare_watchdog(wdd);
+		if (ret)
+			goto dev_create_fail;
 	}
 
 	return 0;
+
+dev_create_fail:
+	watchdog_dev_unregister(wdd);
+	ida_simple_remove(&watchdog_ida, id);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(watchdog_register_device);
 
@@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
 	if (wdd == NULL)
 		return;
 
+	cancel_delayed_work_sync(&wdd->work);
 	devno = wdd->cdev.dev;
 	ret = watchdog_dev_unregister(wdd);
 	if (ret)
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 6aaefba..99eb363 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
 	else
 		err = wddev->ops->start(wddev); /* restart watchdog */
 
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
+
 out_ping:
 	mutex_unlock(&wddev->lock);
 	return err;
@@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
 	if (err == 0)
 		set_bit(WDOG_ACTIVE, &wddev->status);
 
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
+
 out_start:
 	mutex_unlock(&wddev->lock);
 	return err;
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 395b70e..c16c921 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -12,6 +12,7 @@
 #include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/watchdog.h>
 
 struct watchdog_ops;
@@ -62,9 +63,13 @@ struct watchdog_ops {
  * @timeout:	The watchdog devices timeout value.
  * @min_timeout:The watchdog devices minimum timeout value.
  * @max_timeout:The watchdog devices maximum timeout value.
+ * @hw_max_timeout:The watchdog hardware maximum timeout value.
+ * @hw_margin:	Time interval between timer pings.
  * @driver-data:Pointer to the drivers private data.
  * @lock:	Lock for watchdog core internal use only.
+ * @work:	Worker used to provide longer timeouts than hardware supports.
  * @status:	Field that contains the devices internal status bits.
+ * @hw_features:Feature bits describing how the watchdog HW works.
  *
  * The watchdog_device structure contains all information about a
  * watchdog timer device.
@@ -86,8 +91,12 @@ struct watchdog_device {
 	unsigned int timeout;
 	unsigned int min_timeout;
 	unsigned int max_timeout;
+	unsigned int hw_max_timeout; /* in jiffies */
+	unsigned int hw_heartbeat; /* in jiffies */
+	unsigned long int expires; /* for soft timer */
 	void *driver_data;
 	struct mutex lock;
+	struct delayed_work work;
 	unsigned long status;
 /* Bit numbers for status flags */
 #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
@@ -95,6 +104,14 @@ struct watchdog_device {
 #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
 #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
 #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
+
+/* Bits describing features supported by the HW */
+	unsigned long hw_features;
+
+/* Can the watchdog be stopped and started */
+#define WDOG_HW_IS_STOPPABLE		BIT(0)
+/* Is the watchdog already running when the driver starts up */
+#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
 };
 
 #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
@@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
 		(t < wdd->min_timeout || t > wdd->max_timeout));
 }
 
+static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
+{
+	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
+}
+
 /* Use the following functions to manipulate watchdog driver specific data */
 static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
 {
@@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
 /* drivers/watchdog/watchdog_core.c */
 extern int watchdog_init_timeout(struct watchdog_device *wdd,
 				  unsigned int timeout_parm, struct device *dev);
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
 extern int watchdog_register_device(struct watchdog_device *);
 extern void watchdog_unregister_device(struct watchdog_device *);
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-10 13:06   ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

There is a great deal of diversity in the watchdog hardware found on
different devices. Differen hardware have different contstraints on
them, many of the constraints that are excessively difficult for the
user space to satisfy.

One such constraint is the length of the timeout value, which in many
cases can be just a few seconds. Drivers are creating ad hoc solutions
with timers and workqueues to extend the timeout in order to give user
space more time between updates. Looking at the drivers it is clear
that this has resulted to a lot of duplicate code.

Add an extension to the watchdog kernel API that allows the driver to
describe tis HW constraints to the watchdog code. A kernel worker in
the core is then used to extend the watchdog timeout on behalf of the
user space. This allows the drivers to be simplified as core takes
over the timer extending.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
 drivers/watchdog/watchdog_dev.c  | 12 ++++++
 include/linux/watchdog.h         | 23 +++++++++++
 3 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index cec9b55..8e7d08d 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
 EXPORT_SYMBOL_GPL(watchdog_init_timeout);
 
 /**
+ * watchdog_init_timeout() - initialize generic watchdog parameters
+ * @wdd: Watchdog device to operate
+ * @dev: Device that stores the device tree properties
+ *
+ * Initialize the generic timeout parameters. The driver needs to set
+ * hw_features bitmask from @wdd prior calling this function in order
+ * to ensure the core knows how to handle the HW.
+ *
+ * A zero is returned on success and -EINVAL for failure.
+ */
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
+{
+	int ret = 0;
+
+	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Max HW timeout needs to be set so that core knows when to
+	 * use a kernel worker to support longer watchdog timeouts
+	 */
+	if (!wdd->hw_max_timeout)
+		return -EINVAL;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(watchdog_init_params);
+
+static void watchdog_worker(struct work_struct *work)
+{
+	struct watchdog_device *wdd = container_of(to_delayed_work(work),
+						struct watchdog_device, work);
+
+	if (time_after(jiffies, wdd->expires)) {
+		pr_crit("I will reset your machine !\n");
+		return;
+	}
+
+	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
+		(watchdog_active(wdd) &&
+			wdd->hw_max_timeout < wdd->timeout * HZ)) {
+		if (wdd->ops->ping)
+			wdd->ops->ping(wdd);
+		else
+			wdd->ops->start(wdd);
+
+		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
+	}
+}
+
+static int prepare_watchdog(struct watchdog_device *wdd)
+{
+	/* Stop the watchdog now before user space opens the device */
+	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
+		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+		wdd->ops->stop(wdd);
+
+	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+		/*
+		 * Can't stop it, use a kernel timer to tick
+		 * it until it's open by user space
+		 */
+		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+	}
+	return 0;
+}
+
+/**
  * watchdog_register_device() - register a watchdog device
  * @wdd: watchdog device
  *
@@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
 	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
 					NULL, "watchdog%d", wdd->id);
 	if (IS_ERR(wdd->dev)) {
-		watchdog_dev_unregister(wdd);
-		ida_simple_remove(&watchdog_ida, id);
 		ret = PTR_ERR(wdd->dev);
-		return ret;
+		goto dev_create_fail;
+	}
+
+	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
+
+	if (wdd->hw_max_timeout) {
+		ret = prepare_watchdog(wdd);
+		if (ret)
+			goto dev_create_fail;
 	}
 
 	return 0;
+
+dev_create_fail:
+	watchdog_dev_unregister(wdd);
+	ida_simple_remove(&watchdog_ida, id);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(watchdog_register_device);
 
@@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
 	if (wdd == NULL)
 		return;
 
+	cancel_delayed_work_sync(&wdd->work);
 	devno = wdd->cdev.dev;
 	ret = watchdog_dev_unregister(wdd);
 	if (ret)
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 6aaefba..99eb363 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
 	else
 		err = wddev->ops->start(wddev); /* restart watchdog */
 
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
+
 out_ping:
 	mutex_unlock(&wddev->lock);
 	return err;
@@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
 	if (err == 0)
 		set_bit(WDOG_ACTIVE, &wddev->status);
 
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
+
 out_start:
 	mutex_unlock(&wddev->lock);
 	return err;
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 395b70e..c16c921 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -12,6 +12,7 @@
 #include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/watchdog.h>
 
 struct watchdog_ops;
@@ -62,9 +63,13 @@ struct watchdog_ops {
  * @timeout:	The watchdog devices timeout value.
  * @min_timeout:The watchdog devices minimum timeout value.
  * @max_timeout:The watchdog devices maximum timeout value.
+ * @hw_max_timeout:The watchdog hardware maximum timeout value.
+ * @hw_margin:	Time interval between timer pings.
  * @driver-data:Pointer to the drivers private data.
  * @lock:	Lock for watchdog core internal use only.
+ * @work:	Worker used to provide longer timeouts than hardware supports.
  * @status:	Field that contains the devices internal status bits.
+ * @hw_features:Feature bits describing how the watchdog HW works.
  *
  * The watchdog_device structure contains all information about a
  * watchdog timer device.
@@ -86,8 +91,12 @@ struct watchdog_device {
 	unsigned int timeout;
 	unsigned int min_timeout;
 	unsigned int max_timeout;
+	unsigned int hw_max_timeout; /* in jiffies */
+	unsigned int hw_heartbeat; /* in jiffies */
+	unsigned long int expires; /* for soft timer */
 	void *driver_data;
 	struct mutex lock;
+	struct delayed_work work;
 	unsigned long status;
 /* Bit numbers for status flags */
 #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
@@ -95,6 +104,14 @@ struct watchdog_device {
 #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
 #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
 #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
+
+/* Bits describing features supported by the HW */
+	unsigned long hw_features;
+
+/* Can the watchdog be stopped and started */
+#define WDOG_HW_IS_STOPPABLE		BIT(0)
+/* Is the watchdog already running when the driver starts up */
+#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
 };
 
 #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
@@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
 		(t < wdd->min_timeout || t > wdd->max_timeout));
 }
 
+static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
+{
+	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
+}
+
 /* Use the following functions to manipulate watchdog driver specific data */
 static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
 {
@@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
 /* drivers/watchdog/watchdog_core.c */
 extern int watchdog_init_timeout(struct watchdog_device *wdd,
 				  unsigned int timeout_parm, struct device *dev);
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
 extern int watchdog_register_device(struct watchdog_device *);
 extern void watchdog_unregister_device(struct watchdog_device *);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 2/4] watchdog: Allow watchdog to reset device at early boot
  2015-04-10 13:06 ` Timo Kokkonen
@ 2015-04-10 13:06   ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel, linux-watchdog, boris.brezillon, nicolas.ferre,
	alexandre.belloni
  Cc: Timo Kokkonen

Historically the watchdogs have always been stopped before user space
opens and takes over the device. This is not good on many production
systems where any crash, in kernel or user space, must always result
in a device reset.

Add a new early_timeout_sec parameter to the watchdog that gives user
space certain amount of time to set up itself and take over the
watchdog. Until this timeout has been reached the watchdog core takes
care of petting the watchdog HW. If there is any crash in kernel or
user space, reboot is guaranteed as watchdog hardware is never
stopped.

There is also mode of supplying zero seconds for the early_timeout_sec
parameter. In this mode the worker is not scheduled, so the watchdog
timer is not touched nor is the HW petted until user space takes over
it.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/watchdog_core.c | 39 ++++++++++++++++++++++++++++++---------
 include/linux/watchdog.h         |  1 +
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 8e7d08d..d6bedf7 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -111,12 +111,18 @@ EXPORT_SYMBOL_GPL(watchdog_init_timeout);
  */
 int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
 {
+	unsigned int t = 0;
 	int ret = 0;
 
 	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
 	if (ret < 0)
 		return ret;
 
+	if (!of_property_read_u32(dev->of_node, "early-timeout-sec", &t))
+		wdd->early_timeout_sec = t;
+	else
+		wdd->early_timeout_sec = -1;
+
 	/*
 	 * Max HW timeout needs to be set so that core knows when to
 	 * use a kernel worker to support longer watchdog timeouts
@@ -139,6 +145,7 @@ static void watchdog_worker(struct work_struct *work)
 	}
 
 	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
+		(!watchdog_active(wdd) && wdd->early_timeout_sec >= 0) ||
 		(watchdog_active(wdd) &&
 			wdd->hw_max_timeout < wdd->timeout * HZ)) {
 		if (wdd->ops->ping)
@@ -152,17 +159,31 @@ static void watchdog_worker(struct work_struct *work)
 
 static int prepare_watchdog(struct watchdog_device *wdd)
 {
-	/* Stop the watchdog now before user space opens the device */
-	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
-		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
-		wdd->ops->stop(wdd);
-
-	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+	if (wdd->early_timeout_sec >= 0) {
 		/*
-		 * Can't stop it, use a kernel timer to tick
-		 * it until it's open by user space
+		 * early timeout, if set, ensures that watchdog will
+		 * reset the device unless user space opens the
+		 * watchdog device within the given interval.
 		 */
-		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+		if (!(wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT))
+			wdd->ops->start(wdd);
+
+		if (wdd->early_timeout_sec > 0) {
+			schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
+		}
+	} else {
+		/* Stop the watchdog now before user space opens the device */
+		if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
+			wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+			wdd->ops->stop(wdd);
+
+		} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+			/*
+			 * Can't stop it, use a kernel timer to tick
+			 * it until it's open by user space
+			 */
+			schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+		}
 	}
 	return 0;
 }
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index c16c921..6f868b6 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -94,6 +94,7 @@ struct watchdog_device {
 	unsigned int hw_max_timeout; /* in jiffies */
 	unsigned int hw_heartbeat; /* in jiffies */
 	unsigned long int expires; /* for soft timer */
+	unsigned int early_timeout_sec;
 	void *driver_data;
 	struct mutex lock;
 	struct delayed_work work;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 2/4] watchdog: Allow watchdog to reset device at early boot
@ 2015-04-10 13:06   ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

Historically the watchdogs have always been stopped before user space
opens and takes over the device. This is not good on many production
systems where any crash, in kernel or user space, must always result
in a device reset.

Add a new early_timeout_sec parameter to the watchdog that gives user
space certain amount of time to set up itself and take over the
watchdog. Until this timeout has been reached the watchdog core takes
care of petting the watchdog HW. If there is any crash in kernel or
user space, reboot is guaranteed as watchdog hardware is never
stopped.

There is also mode of supplying zero seconds for the early_timeout_sec
parameter. In this mode the worker is not scheduled, so the watchdog
timer is not touched nor is the HW petted until user space takes over
it.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/watchdog_core.c | 39 ++++++++++++++++++++++++++++++---------
 include/linux/watchdog.h         |  1 +
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 8e7d08d..d6bedf7 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -111,12 +111,18 @@ EXPORT_SYMBOL_GPL(watchdog_init_timeout);
  */
 int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
 {
+	unsigned int t = 0;
 	int ret = 0;
 
 	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
 	if (ret < 0)
 		return ret;
 
+	if (!of_property_read_u32(dev->of_node, "early-timeout-sec", &t))
+		wdd->early_timeout_sec = t;
+	else
+		wdd->early_timeout_sec = -1;
+
 	/*
 	 * Max HW timeout needs to be set so that core knows when to
 	 * use a kernel worker to support longer watchdog timeouts
@@ -139,6 +145,7 @@ static void watchdog_worker(struct work_struct *work)
 	}
 
 	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
+		(!watchdog_active(wdd) && wdd->early_timeout_sec >= 0) ||
 		(watchdog_active(wdd) &&
 			wdd->hw_max_timeout < wdd->timeout * HZ)) {
 		if (wdd->ops->ping)
@@ -152,17 +159,31 @@ static void watchdog_worker(struct work_struct *work)
 
 static int prepare_watchdog(struct watchdog_device *wdd)
 {
-	/* Stop the watchdog now before user space opens the device */
-	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
-		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
-		wdd->ops->stop(wdd);
-
-	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+	if (wdd->early_timeout_sec >= 0) {
 		/*
-		 * Can't stop it, use a kernel timer to tick
-		 * it until it's open by user space
+		 * early timeout, if set, ensures that watchdog will
+		 * reset the device unless user space opens the
+		 * watchdog device within the given interval.
 		 */
-		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+		if (!(wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT))
+			wdd->ops->start(wdd);
+
+		if (wdd->early_timeout_sec > 0) {
+			schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
+		}
+	} else {
+		/* Stop the watchdog now before user space opens the device */
+		if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
+			wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+			wdd->ops->stop(wdd);
+
+		} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
+			/*
+			 * Can't stop it, use a kernel timer to tick
+			 * it until it's open by user space
+			 */
+			schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+		}
 	}
 	return 0;
 }
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index c16c921..6f868b6 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -94,6 +94,7 @@ struct watchdog_device {
 	unsigned int hw_max_timeout; /* in jiffies */
 	unsigned int hw_heartbeat; /* in jiffies */
 	unsigned long int expires; /* for soft timer */
+	unsigned int early_timeout_sec;
 	void *driver_data;
 	struct mutex lock;
 	struct delayed_work work;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 3/4] devicetree: Document generic watchdog properties
  2015-04-10 13:06 ` Timo Kokkonen
@ 2015-04-10 13:06   ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel, linux-watchdog, boris.brezillon, nicolas.ferre,
	alexandre.belloni
  Cc: Timo Kokkonen

There is no documentation for the watchdog properties that are common
among most of the watchdog drivers. Add document where these generic
properties can be described and told how they should be used in
drivers.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 .../devicetree/bindings/watchdog/watchdog.txt        | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt

diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
new file mode 100644
index 0000000..3781406
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
@@ -0,0 +1,20 @@
+These properties are common among most watchdog drivers. Any driver
+that requires the functionality listed below should implement them
+using these definitions.
+
+Optional properties:
+- timeout-sec: Contains the watchdog timeout in seconds.
+- early-timeout-sec: If present, specify the timeout in seconds for
+  how long it can take for the watchdog daemon to take over the
+  watchdog device. If driver supports this property it must ensure the
+  watchdog hardware is running during this period and a watchdog reset
+  must occur if user space fails to open the device in time. If left
+  zero, the driver only needs to guarantee the watchdog is not
+  stopped or is started during driver init.
+
+Example:
+
+watchdog {
+	 timeout-sec = <60>;
+	 early-timeout-sec = <120>;
+};
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 3/4] devicetree: Document generic watchdog properties
@ 2015-04-10 13:06   ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

There is no documentation for the watchdog properties that are common
among most of the watchdog drivers. Add document where these generic
properties can be described and told how they should be used in
drivers.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 .../devicetree/bindings/watchdog/watchdog.txt        | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt

diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
new file mode 100644
index 0000000..3781406
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
@@ -0,0 +1,20 @@
+These properties are common among most watchdog drivers. Any driver
+that requires the functionality listed below should implement them
+using these definitions.
+
+Optional properties:
+- timeout-sec: Contains the watchdog timeout in seconds.
+- early-timeout-sec: If present, specify the timeout in seconds for
+  how long it can take for the watchdog daemon to take over the
+  watchdog device. If driver supports this property it must ensure the
+  watchdog hardware is running during this period and a watchdog reset
+  must occur if user space fails to open the device in time. If left
+  zero, the driver only needs to guarantee the watchdog is not
+  stopped or is started during driver init.
+
+Example:
+
+watchdog {
+	 timeout-sec = <60>;
+	 early-timeout-sec = <120>;
+};
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 4/4] watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
  2015-04-10 13:06 ` Timo Kokkonen
@ 2015-04-10 13:06   ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel, linux-watchdog, boris.brezillon, nicolas.ferre,
	alexandre.belloni
  Cc: Timo Kokkonen

Fill in the new watchdog core parameters and call
watchdog_init_params() to let core know we are ready to support the
new core API extensions. This allows the ad hoc timer code to be
removed from the driver as the watchdog core takes care of petting the
driver as needed.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/at91sam9_wdt.c | 58 +++++++++--------------------------------
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
index 1443b3c..a30dadb 100644
--- a/drivers/watchdog/at91sam9_wdt.c
+++ b/drivers/watchdog/at91sam9_wdt.c
@@ -29,7 +29,6 @@
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <linux/jiffies.h>
-#include <linux/timer.h>
 #include <linux/bitops.h>
 #include <linux/uaccess.h>
 #include <linux/of.h>
@@ -83,11 +82,8 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started "
 struct at91wdt {
 	struct watchdog_device wdd;
 	void __iomem *base;
-	unsigned long next_heartbeat;	/* the next_heartbeat for the timer */
-	struct timer_list timer;	/* The timer that pings the watchdog */
 	u32 mr;
 	u32 mr_mask;
-	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
 	bool nowayout;
 	unsigned int irq;
 };
@@ -115,26 +111,11 @@ static inline void at91_wdt_reset(struct at91wdt *wdt)
 	wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT);
 }
 
-/*
- * Timer tick
- */
-static void at91_ping(unsigned long data)
-{
-	struct at91wdt *wdt = (struct at91wdt *)data;
-	if (time_before(jiffies, wdt->next_heartbeat) ||
-	    !watchdog_active(&wdt->wdd)) {
-		at91_wdt_reset(wdt);
-		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
-	} else {
-		pr_crit("I will reset your machine !\n");
-	}
-}
-
 static int at91_wdt_start(struct watchdog_device *wdd)
 {
 	struct at91wdt *wdt = to_wdt(wdd);
-	/* calculate when the next userspace timeout will be */
-	wdt->next_heartbeat = jiffies + wdd->timeout * HZ;
+
+	at91_wdt_reset(wdt);
 	return 0;
 }
 
@@ -146,7 +127,7 @@ static int at91_wdt_stop(struct watchdog_device *wdd)
 
 static int at91_wdt_set_timeout(struct watchdog_device *wdd, unsigned int new_timeout)
 {
-	wdd->timeout = new_timeout;
+	/* The watchdog hardware can't be reconfigured */
 	return at91_wdt_start(wdd);
 }
 
@@ -196,11 +177,11 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
 	 * it at the min_heartbeat period.
 	 */
 	if ((max_heartbeat / 4) >= min_heartbeat)
-		wdt->heartbeat = max_heartbeat / 4;
+		wdt->wdd.hw_heartbeat = max_heartbeat / 4;
 	else if ((max_heartbeat / 2) >= min_heartbeat)
-		wdt->heartbeat = max_heartbeat / 2;
+		wdt->wdd.hw_heartbeat = max_heartbeat / 2;
 	else
-		wdt->heartbeat = min_heartbeat;
+		wdt->wdd.hw_heartbeat = min_heartbeat;
 
 	if (max_heartbeat < min_heartbeat + 4)
 		dev_warn(dev,
@@ -220,32 +201,15 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
 			 "watchdog already configured differently (mr = %x expecting %x)\n",
 			 tmp & wdt->mr_mask, wdt->mr & wdt->mr_mask);
 
-	setup_timer(&wdt->timer, at91_ping, (unsigned long)wdt);
+	wdt->wdd.hw_features = WDOG_HW_RUNNING_AT_BOOT;
+	watchdog_init_params(&wdt->wdd, dev);
 
-	/*
-	 * Use min_heartbeat the first time to avoid spurious watchdog reset:
-	 * we don't know for how long the watchdog counter is running, and
-	 *  - resetting it right now might trigger a watchdog fault reset
-	 *  - waiting for heartbeat time might lead to a watchdog timeout
-	 *    reset
-	 */
-	mod_timer(&wdt->timer, jiffies + min_heartbeat);
-
-	/* Try to set timeout from device tree first */
-	if (watchdog_init_timeout(&wdt->wdd, 0, dev))
-		watchdog_init_timeout(&wdt->wdd, heartbeat, dev);
 	watchdog_set_nowayout(&wdt->wdd, wdt->nowayout);
 	err = watchdog_register_device(&wdt->wdd);
 	if (err)
-		goto out_stop_timer;
-
-	wdt->next_heartbeat = jiffies + wdt->wdd.timeout * HZ;
+		return err;
 
 	return 0;
-
-out_stop_timer:
-	del_timer(&wdt->timer);
-	return err;
 }
 
 /* ......................................................................... */
@@ -287,6 +251,8 @@ static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
 		}
 	}
 
+	wdt->wdd.hw_max_timeout = max * HZ;
+
 	min = secs_to_ticks(min);
 	max = secs_to_ticks(max);
 
@@ -346,6 +312,7 @@ static int __init at91wdt_probe(struct platform_device *pdev)
 	wdt->wdd.timeout = WDT_HEARTBEAT;
 	wdt->wdd.min_timeout = 1;
 	wdt->wdd.max_timeout = 0xFFFF;
+	wdt->wdd.hw_max_timeout = 15 * HZ;
 
 	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	wdt->base = devm_ioremap_resource(&pdev->dev, r);
@@ -376,7 +343,6 @@ static int __exit at91wdt_remove(struct platform_device *pdev)
 	watchdog_unregister_device(&wdt->wdd);
 
 	pr_warn("I quit now, hardware will probably reboot!\n");
-	del_timer(&wdt->timer);
 
 	return 0;
 }
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCHv5 4/4] watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
@ 2015-04-10 13:06   ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-10 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

Fill in the new watchdog core parameters and call
watchdog_init_params() to let core know we are ready to support the
new core API extensions. This allows the ad hoc timer code to be
removed from the driver as the watchdog core takes care of petting the
driver as needed.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/at91sam9_wdt.c | 58 +++++++++--------------------------------
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
index 1443b3c..a30dadb 100644
--- a/drivers/watchdog/at91sam9_wdt.c
+++ b/drivers/watchdog/at91sam9_wdt.c
@@ -29,7 +29,6 @@
 #include <linux/types.h>
 #include <linux/watchdog.h>
 #include <linux/jiffies.h>
-#include <linux/timer.h>
 #include <linux/bitops.h>
 #include <linux/uaccess.h>
 #include <linux/of.h>
@@ -83,11 +82,8 @@ MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started "
 struct at91wdt {
 	struct watchdog_device wdd;
 	void __iomem *base;
-	unsigned long next_heartbeat;	/* the next_heartbeat for the timer */
-	struct timer_list timer;	/* The timer that pings the watchdog */
 	u32 mr;
 	u32 mr_mask;
-	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
 	bool nowayout;
 	unsigned int irq;
 };
@@ -115,26 +111,11 @@ static inline void at91_wdt_reset(struct at91wdt *wdt)
 	wdt_write(wdt, AT91_WDT_CR, AT91_WDT_KEY | AT91_WDT_WDRSTT);
 }
 
-/*
- * Timer tick
- */
-static void at91_ping(unsigned long data)
-{
-	struct at91wdt *wdt = (struct at91wdt *)data;
-	if (time_before(jiffies, wdt->next_heartbeat) ||
-	    !watchdog_active(&wdt->wdd)) {
-		at91_wdt_reset(wdt);
-		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
-	} else {
-		pr_crit("I will reset your machine !\n");
-	}
-}
-
 static int at91_wdt_start(struct watchdog_device *wdd)
 {
 	struct at91wdt *wdt = to_wdt(wdd);
-	/* calculate when the next userspace timeout will be */
-	wdt->next_heartbeat = jiffies + wdd->timeout * HZ;
+
+	at91_wdt_reset(wdt);
 	return 0;
 }
 
@@ -146,7 +127,7 @@ static int at91_wdt_stop(struct watchdog_device *wdd)
 
 static int at91_wdt_set_timeout(struct watchdog_device *wdd, unsigned int new_timeout)
 {
-	wdd->timeout = new_timeout;
+	/* The watchdog hardware can't be reconfigured */
 	return at91_wdt_start(wdd);
 }
 
@@ -196,11 +177,11 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
 	 * it at the min_heartbeat period.
 	 */
 	if ((max_heartbeat / 4) >= min_heartbeat)
-		wdt->heartbeat = max_heartbeat / 4;
+		wdt->wdd.hw_heartbeat = max_heartbeat / 4;
 	else if ((max_heartbeat / 2) >= min_heartbeat)
-		wdt->heartbeat = max_heartbeat / 2;
+		wdt->wdd.hw_heartbeat = max_heartbeat / 2;
 	else
-		wdt->heartbeat = min_heartbeat;
+		wdt->wdd.hw_heartbeat = min_heartbeat;
 
 	if (max_heartbeat < min_heartbeat + 4)
 		dev_warn(dev,
@@ -220,32 +201,15 @@ static int at91_wdt_init(struct platform_device *pdev, struct at91wdt *wdt)
 			 "watchdog already configured differently (mr = %x expecting %x)\n",
 			 tmp & wdt->mr_mask, wdt->mr & wdt->mr_mask);
 
-	setup_timer(&wdt->timer, at91_ping, (unsigned long)wdt);
+	wdt->wdd.hw_features = WDOG_HW_RUNNING_AT_BOOT;
+	watchdog_init_params(&wdt->wdd, dev);
 
-	/*
-	 * Use min_heartbeat the first time to avoid spurious watchdog reset:
-	 * we don't know for how long the watchdog counter is running, and
-	 *  - resetting it right now might trigger a watchdog fault reset
-	 *  - waiting for heartbeat time might lead to a watchdog timeout
-	 *    reset
-	 */
-	mod_timer(&wdt->timer, jiffies + min_heartbeat);
-
-	/* Try to set timeout from device tree first */
-	if (watchdog_init_timeout(&wdt->wdd, 0, dev))
-		watchdog_init_timeout(&wdt->wdd, heartbeat, dev);
 	watchdog_set_nowayout(&wdt->wdd, wdt->nowayout);
 	err = watchdog_register_device(&wdt->wdd);
 	if (err)
-		goto out_stop_timer;
-
-	wdt->next_heartbeat = jiffies + wdt->wdd.timeout * HZ;
+		return err;
 
 	return 0;
-
-out_stop_timer:
-	del_timer(&wdt->timer);
-	return err;
 }
 
 /* ......................................................................... */
@@ -287,6 +251,8 @@ static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
 		}
 	}
 
+	wdt->wdd.hw_max_timeout = max * HZ;
+
 	min = secs_to_ticks(min);
 	max = secs_to_ticks(max);
 
@@ -346,6 +312,7 @@ static int __init at91wdt_probe(struct platform_device *pdev)
 	wdt->wdd.timeout = WDT_HEARTBEAT;
 	wdt->wdd.min_timeout = 1;
 	wdt->wdd.max_timeout = 0xFFFF;
+	wdt->wdd.hw_max_timeout = 15 * HZ;
 
 	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	wdt->base = devm_ioremap_resource(&pdev->dev, r);
@@ -376,7 +343,6 @@ static int __exit at91wdt_remove(struct platform_device *pdev)
 	watchdog_unregister_device(&wdt->wdd);
 
 	pr_warn("I quit now, hardware will probably reboot!\n");
-	del_timer(&wdt->timer);
 
 	return 0;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
  2015-04-10 13:06 ` Timo Kokkonen
@ 2015-04-13  7:55   ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:55 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni



On 2015/4/10 21:06, Timo Kokkonen wrote:
> The watchdog kernel API is quite limited. It has support for providing
> generic device handling, but it doesn't really know anything about the
> watchdog hardware or its constraints. The watchdog drivers come with a
> lot of diversity and their own set of quirks and constraints. Some of
> their limitations are not nice for the user space, so the drivers work
> around them with all sorts of ad hoc implementations.
>
> One common pattern is to use kernel timers or work queues to allow
> longer timeout parameters than the actual hardware supports. To solve
> this problem, this patch set extends the kernel watchdog API with a few
> parameters that let the core know more about the watchdog HW and take
> care about the timeout extending.
>
> The patch set also implements "early_timeout_sec" feature that is very
> common on many production systems where early kernel or user space
> crashes must lead to a device reset. Traditional watchdog handling
> does not allow this as the watchdog is stopped (fully or emulating
> stopped state with kernel timers) before user space opens it for the
> first time.
>
> The changes are designed to be taken in use one driver at time. If the
> driver does not set the new parameters and call
> watchdog_init_params(), the watchdog behavior is exactly the same as
> before.
>
> In principle this new API makes it possible for the user space to see
> every watchdog hardware to behave the same, at least in terms of
> watchdog timeouts. Once the API is in, it should be easier to move
> even more common behavior out of the driver code to the watchdog core
> and make the drivers simpler.
>
> I'm not anticipating this patch set to go in as is, this is merely to
> help bringing up the discussion as there is a working patch set to look
> at. There are propably some corner cases that I didn't get right
> yet. Also, we could even remove the maximum timeout completely as the
> HW no longer limits the watchdog timeouts. This propably needs some
> thinking still and your commenting.
>
> Please review and give feedback.
Tested on the SAMA5D4EK.

Tested-by Wenyou Yang <wenyou.yang@atmel.com>
>
> Patch revision history:
>
> -v5: Re-think the approach to be fully generic. The early_timeout_sec
>    handling is no longer in the driver but in the watchdog core. As a
>    result the core needed to gain knowledge about the watchdog
>    hardware. Appropriate handling is added in the core. The side effect
>    for this is that drivers using the new extensions can be simplified
>    a lot and different kinds of watchdog hardware can be made to
>    behave the same for the user space.
>
> -v4: Binding documentation is now separated completely from the driver
>    patch. The documentation no longer makes any assumptions about how
>    the actual implementation is made, it just describes the actual
>    behavior the driver should implement in order to satisfy the
>    requirement.
>
> - v3: Rename the property to "early-timeout-sec" and use it as a
>    timeout value that stops the timer in the atmel driver after the
>    timeout expires. A watchdog.txt is also introduced for documenting
>    the common watchdog properties, including now this one and
>    "timeout-sec" property.
>
> - v2: Rename the property to "enable-early-reset" as the behavior
>    itself is not atmel specific. This way other drivers are free to
>    implement same behavior with the same property name.
>
> - v1: Propose property name "atmle,no-early-timer" for disabling the
>    timer that keeps the atmel watchdog running until user space opens
>    the device.
>
>
> Timo Kokkonen (4):
>    watchdog: Extend kernel API to know about HW limitations
>    watchdog: Allow watchdog to reset device at early boot
>    devicetree: Document generic watchdog properties
>    watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
>
>   .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
>   drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
>   drivers/watchdog/watchdog_core.c                   | 108 ++++++++++++++++++++-
>   drivers/watchdog/watchdog_dev.c                    |  12 +++
>   include/linux/watchdog.h                           |  24 +++++
>   5 files changed, 173 insertions(+), 49 deletions(-)
>   create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
@ 2015-04-13  7:55   ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:55 UTC (permalink / raw)
  To: linux-arm-kernel



On 2015/4/10 21:06, Timo Kokkonen wrote:
> The watchdog kernel API is quite limited. It has support for providing
> generic device handling, but it doesn't really know anything about the
> watchdog hardware or its constraints. The watchdog drivers come with a
> lot of diversity and their own set of quirks and constraints. Some of
> their limitations are not nice for the user space, so the drivers work
> around them with all sorts of ad hoc implementations.
>
> One common pattern is to use kernel timers or work queues to allow
> longer timeout parameters than the actual hardware supports. To solve
> this problem, this patch set extends the kernel watchdog API with a few
> parameters that let the core know more about the watchdog HW and take
> care about the timeout extending.
>
> The patch set also implements "early_timeout_sec" feature that is very
> common on many production systems where early kernel or user space
> crashes must lead to a device reset. Traditional watchdog handling
> does not allow this as the watchdog is stopped (fully or emulating
> stopped state with kernel timers) before user space opens it for the
> first time.
>
> The changes are designed to be taken in use one driver at time. If the
> driver does not set the new parameters and call
> watchdog_init_params(), the watchdog behavior is exactly the same as
> before.
>
> In principle this new API makes it possible for the user space to see
> every watchdog hardware to behave the same, at least in terms of
> watchdog timeouts. Once the API is in, it should be easier to move
> even more common behavior out of the driver code to the watchdog core
> and make the drivers simpler.
>
> I'm not anticipating this patch set to go in as is, this is merely to
> help bringing up the discussion as there is a working patch set to look
> at. There are propably some corner cases that I didn't get right
> yet. Also, we could even remove the maximum timeout completely as the
> HW no longer limits the watchdog timeouts. This propably needs some
> thinking still and your commenting.
>
> Please review and give feedback.
Tested on the SAMA5D4EK.

Tested-by Wenyou Yang <wenyou.yang@atmel.com>
>
> Patch revision history:
>
> -v5: Re-think the approach to be fully generic. The early_timeout_sec
>    handling is no longer in the driver but in the watchdog core. As a
>    result the core needed to gain knowledge about the watchdog
>    hardware. Appropriate handling is added in the core. The side effect
>    for this is that drivers using the new extensions can be simplified
>    a lot and different kinds of watchdog hardware can be made to
>    behave the same for the user space.
>
> -v4: Binding documentation is now separated completely from the driver
>    patch. The documentation no longer makes any assumptions about how
>    the actual implementation is made, it just describes the actual
>    behavior the driver should implement in order to satisfy the
>    requirement.
>
> - v3: Rename the property to "early-timeout-sec" and use it as a
>    timeout value that stops the timer in the atmel driver after the
>    timeout expires. A watchdog.txt is also introduced for documenting
>    the common watchdog properties, including now this one and
>    "timeout-sec" property.
>
> - v2: Rename the property to "enable-early-reset" as the behavior
>    itself is not atmel specific. This way other drivers are free to
>    implement same behavior with the same property name.
>
> - v1: Propose property name "atmle,no-early-timer" for disabling the
>    timer that keeps the atmel watchdog running until user space opens
>    the device.
>
>
> Timo Kokkonen (4):
>    watchdog: Extend kernel API to know about HW limitations
>    watchdog: Allow watchdog to reset device at early boot
>    devicetree: Document generic watchdog properties
>    watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
>
>   .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
>   drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
>   drivers/watchdog/watchdog_core.c                   | 108 ++++++++++++++++++++-
>   drivers/watchdog/watchdog_dev.c                    |  12 +++
>   include/linux/watchdog.h                           |  24 +++++
>   5 files changed, 173 insertions(+), 49 deletions(-)
>   create mode 100644 Documentation/devicetree/bindings/watchdog/watchdog.txt
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 2/4] watchdog: Allow watchdog to reset device at early boot
  2015-04-10 13:06   ` Timo Kokkonen
@ 2015-04-13  7:56     ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:56 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni



On 2015/4/10 21:06, Timo Kokkonen wrote:
> Historically the watchdogs have always been stopped before user space
> opens and takes over the device. This is not good on many production
> systems where any crash, in kernel or user space, must always result
> in a device reset.
>
> Add a new early_timeout_sec parameter to the watchdog that gives user
> space certain amount of time to set up itself and take over the
> watchdog. Until this timeout has been reached the watchdog core takes
> care of petting the watchdog HW. If there is any crash in kernel or
> user space, reboot is guaranteed as watchdog hardware is never
> stopped.
>
> There is also mode of supplying zero seconds for the early_timeout_sec
> parameter. In this mode the worker is not scheduled, so the watchdog
> timer is not touched nor is the HW petted until user space takes over
> it.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 39 ++++++++++++++++++++++++++++++---------
>   include/linux/watchdog.h         |  1 +
>   2 files changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index 8e7d08d..d6bedf7 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -111,12 +111,18 @@ EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>    */
>   int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
>   {
> +	unsigned int t = 0;
>   	int ret = 0;
>   
>   	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>   	if (ret < 0)
>   		return ret;
>   
> +	if (!of_property_read_u32(dev->of_node, "early-timeout-sec", &t))
> +		wdd->early_timeout_sec = t;
> +	else
> +		wdd->early_timeout_sec = -1;
> +
>   	/*
>   	 * Max HW timeout needs to be set so that core knows when to
>   	 * use a kernel worker to support longer watchdog timeouts
> @@ -139,6 +145,7 @@ static void watchdog_worker(struct work_struct *work)
>   	}
>   
>   	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(!watchdog_active(wdd) && wdd->early_timeout_sec >= 0) ||
>   		(watchdog_active(wdd) &&
>   			wdd->hw_max_timeout < wdd->timeout * HZ)) {
>   		if (wdd->ops->ping)
> @@ -152,17 +159,31 @@ static void watchdog_worker(struct work_struct *work)
>   
>   static int prepare_watchdog(struct watchdog_device *wdd)
>   {
> -	/* Stop the watchdog now before user space opens the device */
> -	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> -		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> -		wdd->ops->stop(wdd);
> -
> -	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +	if (wdd->early_timeout_sec >= 0) {
>   		/*
> -		 * Can't stop it, use a kernel timer to tick
> -		 * it until it's open by user space
> +		 * early timeout, if set, ensures that watchdog will
> +		 * reset the device unless user space opens the
> +		 * watchdog device within the given interval.
>   		 */
> -		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +		if (!(wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT))
> +			wdd->ops->start(wdd);
> +
> +		if (wdd->early_timeout_sec > 0) {
> +			schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> +		}

This bracket is unnecessary.

> +	} else {
> +		/* Stop the watchdog now before user space opens the device */
> +		if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +			wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +			wdd->ops->stop(wdd);
> +
> +		} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +			/*
> +			 * Can't stop it, use a kernel timer to tick
> +			 * it until it's open by user space
> +			 */
> +			schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +		}
>   	}
>   	return 0;
>   }
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index c16c921..6f868b6 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -94,6 +94,7 @@ struct watchdog_device {
>   	unsigned int hw_max_timeout; /* in jiffies */
>   	unsigned int hw_heartbeat; /* in jiffies */
>   	unsigned long int expires; /* for soft timer */
> +	unsigned int early_timeout_sec;
>   	void *driver_data;
>   	struct mutex lock;
>   	struct delayed_work work;
Best Regards,
Wenyou Yang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 2/4] watchdog: Allow watchdog to reset device at early boot
@ 2015-04-13  7:56     ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:56 UTC (permalink / raw)
  To: linux-arm-kernel



On 2015/4/10 21:06, Timo Kokkonen wrote:
> Historically the watchdogs have always been stopped before user space
> opens and takes over the device. This is not good on many production
> systems where any crash, in kernel or user space, must always result
> in a device reset.
>
> Add a new early_timeout_sec parameter to the watchdog that gives user
> space certain amount of time to set up itself and take over the
> watchdog. Until this timeout has been reached the watchdog core takes
> care of petting the watchdog HW. If there is any crash in kernel or
> user space, reboot is guaranteed as watchdog hardware is never
> stopped.
>
> There is also mode of supplying zero seconds for the early_timeout_sec
> parameter. In this mode the worker is not scheduled, so the watchdog
> timer is not touched nor is the HW petted until user space takes over
> it.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 39 ++++++++++++++++++++++++++++++---------
>   include/linux/watchdog.h         |  1 +
>   2 files changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index 8e7d08d..d6bedf7 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -111,12 +111,18 @@ EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>    */
>   int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
>   {
> +	unsigned int t = 0;
>   	int ret = 0;
>   
>   	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>   	if (ret < 0)
>   		return ret;
>   
> +	if (!of_property_read_u32(dev->of_node, "early-timeout-sec", &t))
> +		wdd->early_timeout_sec = t;
> +	else
> +		wdd->early_timeout_sec = -1;
> +
>   	/*
>   	 * Max HW timeout needs to be set so that core knows when to
>   	 * use a kernel worker to support longer watchdog timeouts
> @@ -139,6 +145,7 @@ static void watchdog_worker(struct work_struct *work)
>   	}
>   
>   	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(!watchdog_active(wdd) && wdd->early_timeout_sec >= 0) ||
>   		(watchdog_active(wdd) &&
>   			wdd->hw_max_timeout < wdd->timeout * HZ)) {
>   		if (wdd->ops->ping)
> @@ -152,17 +159,31 @@ static void watchdog_worker(struct work_struct *work)
>   
>   static int prepare_watchdog(struct watchdog_device *wdd)
>   {
> -	/* Stop the watchdog now before user space opens the device */
> -	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> -		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> -		wdd->ops->stop(wdd);
> -
> -	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +	if (wdd->early_timeout_sec >= 0) {
>   		/*
> -		 * Can't stop it, use a kernel timer to tick
> -		 * it until it's open by user space
> +		 * early timeout, if set, ensures that watchdog will
> +		 * reset the device unless user space opens the
> +		 * watchdog device within the given interval.
>   		 */
> -		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +		if (!(wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT))
> +			wdd->ops->start(wdd);
> +
> +		if (wdd->early_timeout_sec > 0) {
> +			schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> +		}

This bracket is unnecessary.

> +	} else {
> +		/* Stop the watchdog now before user space opens the device */
> +		if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +			wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +			wdd->ops->stop(wdd);
> +
> +		} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +			/*
> +			 * Can't stop it, use a kernel timer to tick
> +			 * it until it's open by user space
> +			 */
> +			schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +		}
>   	}
>   	return 0;
>   }
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index c16c921..6f868b6 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -94,6 +94,7 @@ struct watchdog_device {
>   	unsigned int hw_max_timeout; /* in jiffies */
>   	unsigned int hw_heartbeat; /* in jiffies */
>   	unsigned long int expires; /* for soft timer */
> +	unsigned int early_timeout_sec;
>   	void *driver_data;
>   	struct mutex lock;
>   	struct delayed_work work;
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-10 13:06   ` Timo Kokkonen
@ 2015-04-13  7:56     ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:56 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni


Hi Timo,

There is trivial typo.

On 2015/4/10 21:06, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
>
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
>
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>   include/linux/watchdog.h         | 23 +++++++++++
>   3 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..8e7d08d 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>   
>   /**
> + * watchdog_init_timeout() - initialize generic watchdog parameters
s/watchdog_init_timeout/watchdog_init_params/ ?
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
> +
> +	if (time_after(jiffies, wdd->expires)) {
> +		pr_crit("I will reset your machine !\n");
> +		return;
> +	}
> +
> +	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(watchdog_active(wdd) &&
> +			wdd->hw_max_timeout < wdd->timeout * HZ)) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> +	}
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	/* Stop the watchdog now before user space opens the device */
> +	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		wdd->ops->stop(wdd);
> +
> +	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +		/*
> +		 * Can't stop it, use a kernel timer to tick
> +		 * it until it's open by user space
> +		 */
> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
> +	return 0;
> +}
> +
> +/**
>    * watchdog_register_device() - register a watchdog device
>    * @wdd: watchdog device
>    *
> @@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>   	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>   					NULL, "watchdog%d", wdd->id);
>   	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>   		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
>   	}
>   
>   	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>   
> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>   	if (wdd == NULL)
>   		return;
>   
> +	cancel_delayed_work_sync(&wdd->work);
>   	devno = wdd->cdev.dev;
>   	ret = watchdog_dev_unregister(wdd);
>   	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..99eb363 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
>   	else
>   		err = wddev->ops->start(wddev); /* restart watchdog */
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_ping:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
>   	if (err == 0)
>   		set_bit(WDOG_ACTIVE, &wddev->status);
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_start:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..c16c921 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>   #include <linux/bitops.h>
>   #include <linux/device.h>
>   #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>   #include <uapi/linux/watchdog.h>
>   
>   struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>    * @timeout:	The watchdog devices timeout value.
>    * @min_timeout:The watchdog devices minimum timeout value.
>    * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_margin:	Time interval between timer pings.
s/hw_margin/hw_heartbeat/ ?
>    * @driver-data:Pointer to the drivers private data.
>    * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>    * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>    *
>    * The watchdog_device structure contains all information about a
>    * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>   	unsigned int timeout;
>   	unsigned int min_timeout;
>   	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for soft timer */
>   	void *driver_data;
>   	struct mutex lock;
> +	struct delayed_work work;
>   	unsigned long status;
>   /* Bit numbers for status flags */
>   #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>   };
>   
>   #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>   		(t < wdd->min_timeout || t > wdd->max_timeout));
>   }
>   
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>   /* Use the following functions to manipulate watchdog driver specific data */
>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>   {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>   /* drivers/watchdog/watchdog_core.c */
>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>   				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
>   extern int watchdog_register_device(struct watchdog_device *);
>   extern void watchdog_unregister_device(struct watchdog_device *);
>   
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-13  7:56     ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  7:56 UTC (permalink / raw)
  To: linux-arm-kernel


Hi Timo,

There is trivial typo.

On 2015/4/10 21:06, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
>
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
>
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>   include/linux/watchdog.h         | 23 +++++++++++
>   3 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..8e7d08d 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>   
>   /**
> + * watchdog_init_timeout() - initialize generic watchdog parameters
s/watchdog_init_timeout/watchdog_init_params/ ?
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
> +
> +	if (time_after(jiffies, wdd->expires)) {
> +		pr_crit("I will reset your machine !\n");
> +		return;
> +	}
> +
> +	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(watchdog_active(wdd) &&
> +			wdd->hw_max_timeout < wdd->timeout * HZ)) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> +	}
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	/* Stop the watchdog now before user space opens the device */
> +	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		wdd->ops->stop(wdd);
> +
> +	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +		/*
> +		 * Can't stop it, use a kernel timer to tick
> +		 * it until it's open by user space
> +		 */
> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
> +	return 0;
> +}
> +
> +/**
>    * watchdog_register_device() - register a watchdog device
>    * @wdd: watchdog device
>    *
> @@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>   	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>   					NULL, "watchdog%d", wdd->id);
>   	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>   		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
>   	}
>   
>   	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>   
> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>   	if (wdd == NULL)
>   		return;
>   
> +	cancel_delayed_work_sync(&wdd->work);
>   	devno = wdd->cdev.dev;
>   	ret = watchdog_dev_unregister(wdd);
>   	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..99eb363 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
>   	else
>   		err = wddev->ops->start(wddev); /* restart watchdog */
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_ping:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
>   	if (err == 0)
>   		set_bit(WDOG_ACTIVE, &wddev->status);
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_start:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..c16c921 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>   #include <linux/bitops.h>
>   #include <linux/device.h>
>   #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>   #include <uapi/linux/watchdog.h>
>   
>   struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>    * @timeout:	The watchdog devices timeout value.
>    * @min_timeout:The watchdog devices minimum timeout value.
>    * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_margin:	Time interval between timer pings.
s/hw_margin/hw_heartbeat/ ?
>    * @driver-data:Pointer to the drivers private data.
>    * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>    * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>    *
>    * The watchdog_device structure contains all information about a
>    * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>   	unsigned int timeout;
>   	unsigned int min_timeout;
>   	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for soft timer */
>   	void *driver_data;
>   	struct mutex lock;
> +	struct delayed_work work;
>   	unsigned long status;
>   /* Bit numbers for status flags */
>   #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>   };
>   
>   #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>   		(t < wdd->min_timeout || t > wdd->max_timeout));
>   }
>   
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>   /* Use the following functions to manipulate watchdog driver specific data */
>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>   {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>   /* drivers/watchdog/watchdog_core.c */
>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>   				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
>   extern int watchdog_register_device(struct watchdog_device *);
>   extern void watchdog_unregister_device(struct watchdog_device *);
>   
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
  2015-04-13  7:55   ` Yang, Wenyou
@ 2015-04-13  8:19     ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-13  8:19 UTC (permalink / raw)
  To: Yang, Wenyou, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni

On 13.04.2015 10:55, Yang, Wenyou wrote:
>
>
> On 2015/4/10 21:06, Timo Kokkonen wrote:
>> The watchdog kernel API is quite limited. It has support for providing
>> generic device handling, but it doesn't really know anything about the
>> watchdog hardware or its constraints. The watchdog drivers come with a
>> lot of diversity and their own set of quirks and constraints. Some of
>> their limitations are not nice for the user space, so the drivers work
>> around them with all sorts of ad hoc implementations.
>>
>> One common pattern is to use kernel timers or work queues to allow
>> longer timeout parameters than the actual hardware supports. To solve
>> this problem, this patch set extends the kernel watchdog API with a few
>> parameters that let the core know more about the watchdog HW and take
>> care about the timeout extending.
>>
>> The patch set also implements "early_timeout_sec" feature that is very
>> common on many production systems where early kernel or user space
>> crashes must lead to a device reset. Traditional watchdog handling
>> does not allow this as the watchdog is stopped (fully or emulating
>> stopped state with kernel timers) before user space opens it for the
>> first time.
>>
>> The changes are designed to be taken in use one driver at time. If the
>> driver does not set the new parameters and call
>> watchdog_init_params(), the watchdog behavior is exactly the same as
>> before.
>>
>> In principle this new API makes it possible for the user space to see
>> every watchdog hardware to behave the same, at least in terms of
>> watchdog timeouts. Once the API is in, it should be easier to move
>> even more common behavior out of the driver code to the watchdog core
>> and make the drivers simpler.
>>
>> I'm not anticipating this patch set to go in as is, this is merely to
>> help bringing up the discussion as there is a working patch set to look
>> at. There are propably some corner cases that I didn't get right
>> yet. Also, we could even remove the maximum timeout completely as the
>> HW no longer limits the watchdog timeouts. This propably needs some
>> thinking still and your commenting.
>>
>> Please review and give feedback.
> Tested on the SAMA5D4EK.
>
> Tested-by Wenyou Yang <wenyou.yang@atmel.com>

Thank you, the typos you pointed will be fixed on the next version.

-Timo

>>
>> Patch revision history:
>>
>> -v5: Re-think the approach to be fully generic. The early_timeout_sec
>>    handling is no longer in the driver but in the watchdog core. As a
>>    result the core needed to gain knowledge about the watchdog
>>    hardware. Appropriate handling is added in the core. The side effect
>>    for this is that drivers using the new extensions can be simplified
>>    a lot and different kinds of watchdog hardware can be made to
>>    behave the same for the user space.
>>
>> -v4: Binding documentation is now separated completely from the driver
>>    patch. The documentation no longer makes any assumptions about how
>>    the actual implementation is made, it just describes the actual
>>    behavior the driver should implement in order to satisfy the
>>    requirement.
>>
>> - v3: Rename the property to "early-timeout-sec" and use it as a
>>    timeout value that stops the timer in the atmel driver after the
>>    timeout expires. A watchdog.txt is also introduced for documenting
>>    the common watchdog properties, including now this one and
>>    "timeout-sec" property.
>>
>> - v2: Rename the property to "enable-early-reset" as the behavior
>>    itself is not atmel specific. This way other drivers are free to
>>    implement same behavior with the same property name.
>>
>> - v1: Propose property name "atmle,no-early-timer" for disabling the
>>    timer that keeps the atmel watchdog running until user space opens
>>    the device.
>>
>>
>> Timo Kokkonen (4):
>>    watchdog: Extend kernel API to know about HW limitations
>>    watchdog: Allow watchdog to reset device at early boot
>>    devicetree: Document generic watchdog properties
>>    watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
>>
>>   .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
>>   drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
>>   drivers/watchdog/watchdog_core.c                   | 108
>> ++++++++++++++++++++-
>>   drivers/watchdog/watchdog_dev.c                    |  12 +++
>>   include/linux/watchdog.h                           |  24 +++++
>>   5 files changed, 173 insertions(+), 49 deletions(-)
>>   create mode 100644
>> Documentation/devicetree/bindings/watchdog/watchdog.txt
>>
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature
@ 2015-04-13  8:19     ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-13  8:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 13.04.2015 10:55, Yang, Wenyou wrote:
>
>
> On 2015/4/10 21:06, Timo Kokkonen wrote:
>> The watchdog kernel API is quite limited. It has support for providing
>> generic device handling, but it doesn't really know anything about the
>> watchdog hardware or its constraints. The watchdog drivers come with a
>> lot of diversity and their own set of quirks and constraints. Some of
>> their limitations are not nice for the user space, so the drivers work
>> around them with all sorts of ad hoc implementations.
>>
>> One common pattern is to use kernel timers or work queues to allow
>> longer timeout parameters than the actual hardware supports. To solve
>> this problem, this patch set extends the kernel watchdog API with a few
>> parameters that let the core know more about the watchdog HW and take
>> care about the timeout extending.
>>
>> The patch set also implements "early_timeout_sec" feature that is very
>> common on many production systems where early kernel or user space
>> crashes must lead to a device reset. Traditional watchdog handling
>> does not allow this as the watchdog is stopped (fully or emulating
>> stopped state with kernel timers) before user space opens it for the
>> first time.
>>
>> The changes are designed to be taken in use one driver at time. If the
>> driver does not set the new parameters and call
>> watchdog_init_params(), the watchdog behavior is exactly the same as
>> before.
>>
>> In principle this new API makes it possible for the user space to see
>> every watchdog hardware to behave the same, at least in terms of
>> watchdog timeouts. Once the API is in, it should be easier to move
>> even more common behavior out of the driver code to the watchdog core
>> and make the drivers simpler.
>>
>> I'm not anticipating this patch set to go in as is, this is merely to
>> help bringing up the discussion as there is a working patch set to look
>> at. There are propably some corner cases that I didn't get right
>> yet. Also, we could even remove the maximum timeout completely as the
>> HW no longer limits the watchdog timeouts. This propably needs some
>> thinking still and your commenting.
>>
>> Please review and give feedback.
> Tested on the SAMA5D4EK.
>
> Tested-by Wenyou Yang <wenyou.yang@atmel.com>

Thank you, the typos you pointed will be fixed on the next version.

-Timo

>>
>> Patch revision history:
>>
>> -v5: Re-think the approach to be fully generic. The early_timeout_sec
>>    handling is no longer in the driver but in the watchdog core. As a
>>    result the core needed to gain knowledge about the watchdog
>>    hardware. Appropriate handling is added in the core. The side effect
>>    for this is that drivers using the new extensions can be simplified
>>    a lot and different kinds of watchdog hardware can be made to
>>    behave the same for the user space.
>>
>> -v4: Binding documentation is now separated completely from the driver
>>    patch. The documentation no longer makes any assumptions about how
>>    the actual implementation is made, it just describes the actual
>>    behavior the driver should implement in order to satisfy the
>>    requirement.
>>
>> - v3: Rename the property to "early-timeout-sec" and use it as a
>>    timeout value that stops the timer in the atmel driver after the
>>    timeout expires. A watchdog.txt is also introduced for documenting
>>    the common watchdog properties, including now this one and
>>    "timeout-sec" property.
>>
>> - v2: Rename the property to "enable-early-reset" as the behavior
>>    itself is not atmel specific. This way other drivers are free to
>>    implement same behavior with the same property name.
>>
>> - v1: Propose property name "atmle,no-early-timer" for disabling the
>>    timer that keeps the atmel watchdog running until user space opens
>>    the device.
>>
>>
>> Timo Kokkonen (4):
>>    watchdog: Extend kernel API to know about HW limitations
>>    watchdog: Allow watchdog to reset device at early boot
>>    devicetree: Document generic watchdog properties
>>    watchdog: at91sam9_wdt: Convert to use new watchdog core extensions
>>
>>   .../devicetree/bindings/watchdog/watchdog.txt      |  20 ++++
>>   drivers/watchdog/at91sam9_wdt.c                    |  58 +++--------
>>   drivers/watchdog/watchdog_core.c                   | 108
>> ++++++++++++++++++++-
>>   drivers/watchdog/watchdog_dev.c                    |  12 +++
>>   include/linux/watchdog.h                           |  24 +++++
>>   5 files changed, 173 insertions(+), 49 deletions(-)
>>   create mode 100644
>> Documentation/devicetree/bindings/watchdog/watchdog.txt
>>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-10 13:06   ` Timo Kokkonen
@ 2015-04-13  9:29     ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  9:29 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni

Hi Timo,

If the /dev/watchdog device file doesn't open by the user space 
application, the system will reboot by the watchdog.

I think it is missing update of  the wdd->expires value. added it as 
following.

On 2015/4/10 21:06, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
>
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
>
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>   include/linux/watchdog.h         | 23 +++++++++++
>   3 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..8e7d08d 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>   
>   /**
> + * watchdog_init_timeout() - initialize generic watchdog parameters
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
> +
> +	if (time_after(jiffies, wdd->expires)) {
> +		pr_crit("I will reset your machine !\n");
> +		return;
> +	}
> +
> +	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(watchdog_active(wdd) &&
> +			wdd->hw_max_timeout < wdd->timeout * HZ)) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
    I think you missed the update of wdd->expires, as following.
                                    wdd->expires = jiffies + 
wdd->timeout * HZ;
> +	}
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	/* Stop the watchdog now before user space opens the device */
> +	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		wdd->ops->stop(wdd);
> +
> +	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +		/*
> +		 * Can't stop it, use a kernel timer to tick
> +		 * it until it's open by user space
> +		 */
> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
      ditto
                     wdd->expires = jiffies + wdd->timeout * HZ;

> +	return 0;
> +}
> +
> +/**
>    * watchdog_register_device() - register a watchdog device
>    * @wdd: watchdog device
>    *
> @@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>   	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>   					NULL, "watchdog%d", wdd->id);
>   	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>   		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
>   	}
>   
>   	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>   
> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>   	if (wdd == NULL)
>   		return;
>   
> +	cancel_delayed_work_sync(&wdd->work);
>   	devno = wdd->cdev.dev;
>   	ret = watchdog_dev_unregister(wdd);
>   	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..99eb363 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
>   	else
>   		err = wddev->ops->start(wddev); /* restart watchdog */
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_ping:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
>   	if (err == 0)
>   		set_bit(WDOG_ACTIVE, &wddev->status);
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_start:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..c16c921 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>   #include <linux/bitops.h>
>   #include <linux/device.h>
>   #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>   #include <uapi/linux/watchdog.h>
>   
>   struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>    * @timeout:	The watchdog devices timeout value.
>    * @min_timeout:The watchdog devices minimum timeout value.
>    * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_margin:	Time interval between timer pings.
>    * @driver-data:Pointer to the drivers private data.
>    * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>    * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>    *
>    * The watchdog_device structure contains all information about a
>    * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>   	unsigned int timeout;
>   	unsigned int min_timeout;
>   	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for soft timer */
>   	void *driver_data;
>   	struct mutex lock;
> +	struct delayed_work work;
>   	unsigned long status;
>   /* Bit numbers for status flags */
>   #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>   };
>   
>   #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>   		(t < wdd->min_timeout || t > wdd->max_timeout));
>   }
>   
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>   /* Use the following functions to manipulate watchdog driver specific data */
>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>   {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>   /* drivers/watchdog/watchdog_core.c */
>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>   				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
>   extern int watchdog_register_device(struct watchdog_device *);
>   extern void watchdog_unregister_device(struct watchdog_device *);
>   
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-13  9:29     ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-13  9:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Timo,

If the /dev/watchdog device file doesn't open by the user space 
application, the system will reboot by the watchdog.

I think it is missing update of  the wdd->expires value. added it as 
following.

On 2015/4/10 21:06, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
>
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
>
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   drivers/watchdog/watchdog_core.c | 87 ++++++++++++++++++++++++++++++++++++++--
>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>   include/linux/watchdog.h         | 23 +++++++++++
>   3 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..8e7d08d 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>   
>   /**
> + * watchdog_init_timeout() - initialize generic watchdog parameters
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
> +
> +	if (time_after(jiffies, wdd->expires)) {
> +		pr_crit("I will reset your machine !\n");
> +		return;
> +	}
> +
> +	if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> +		(watchdog_active(wdd) &&
> +			wdd->hw_max_timeout < wdd->timeout * HZ)) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
    I think you missed the update of wdd->expires, as following.
                                    wdd->expires = jiffies + 
wdd->timeout * HZ;
> +	}
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	/* Stop the watchdog now before user space opens the device */
> +	if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		wdd->ops->stop(wdd);
> +
> +	} else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> +		/*
> +		 * Can't stop it, use a kernel timer to tick
> +		 * it until it's open by user space
> +		 */
> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
      ditto
                     wdd->expires = jiffies + wdd->timeout * HZ;

> +	return 0;
> +}
> +
> +/**
>    * watchdog_register_device() - register a watchdog device
>    * @wdd: watchdog device
>    *
> @@ -156,13 +225,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>   	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>   					NULL, "watchdog%d", wdd->id);
>   	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>   		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
>   	}
>   
>   	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>   
> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>   	if (wdd == NULL)
>   		return;
>   
> +	cancel_delayed_work_sync(&wdd->work);
>   	devno = wdd->cdev.dev;
>   	ret = watchdog_dev_unregister(wdd);
>   	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..99eb363 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device *wddev)
>   	else
>   		err = wddev->ops->start(wddev); /* restart watchdog */
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_ping:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device *wddev)
>   	if (err == 0)
>   		set_bit(WDOG_ACTIVE, &wddev->status);
>   
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
> +
>   out_start:
>   	mutex_unlock(&wddev->lock);
>   	return err;
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..c16c921 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>   #include <linux/bitops.h>
>   #include <linux/device.h>
>   #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>   #include <uapi/linux/watchdog.h>
>   
>   struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>    * @timeout:	The watchdog devices timeout value.
>    * @min_timeout:The watchdog devices minimum timeout value.
>    * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_margin:	Time interval between timer pings.
>    * @driver-data:Pointer to the drivers private data.
>    * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>    * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>    *
>    * The watchdog_device structure contains all information about a
>    * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>   	unsigned int timeout;
>   	unsigned int min_timeout;
>   	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for soft timer */
>   	void *driver_data;
>   	struct mutex lock;
> +	struct delayed_work work;
>   	unsigned long status;
>   /* Bit numbers for status flags */
>   #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>   };
>   
>   #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>   		(t < wdd->min_timeout || t > wdd->max_timeout));
>   }
>   
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>   /* Use the following functions to manipulate watchdog driver specific data */
>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>   {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>   /* drivers/watchdog/watchdog_core.c */
>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>   				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
>   extern int watchdog_register_device(struct watchdog_device *);
>   extern void watchdog_unregister_device(struct watchdog_device *);
>   
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-13  9:29     ` Yang, Wenyou
@ 2015-04-13 11:00       ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-13 11:00 UTC (permalink / raw)
  To: Yang, Wenyou, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni

On 13.04.2015 12:29, Yang, Wenyou wrote:
> Hi Timo,
>
> If the /dev/watchdog device file doesn't open by the user space
> application, the system will reboot by the watchdog.
>

Thank you for pointing out the bug. When the struct watchdog_device is 
initialized, the expires variable is zero. Jiffies however is negative 
on boot up, so time_after() comparison with zero returns false and the 
worker keeps the watchdog alive for some minutes, until jiffies become 
positive. Then it will stop updating the HW as the expires value has 
become expired.

> I think it is missing update of  the wdd->expires value. added it as
> following.

Actually, the expires value stores the user space expiration value. It 
should only be updated when user space pings the watchdog. If we 
increment the expires value in the worker thread, the watchdog will 
never expire even if user space stops pinging it and keep it open.

So the proper fix is to have it like this (in watchdog_worker function):

	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
		pr_crit("I will reset your machine !\n");
		return;
	}

Also we need this change to re-start the worker when watchdog is 
properly stopped (in watchdog_dev.c):

@@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device *wddev)
                 goto out_stop;
         }

-       err = wddev->ops->stop(wddev);
-       if (err == 0)
+       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
+               err = wddev->ops->stop(wddev);
+               if (err == 0)
+                       clear_bit(WDOG_ACTIVE, &wddev->status);
+       } else {
+               /* Unstoppable watchdogs need the worker to keep them 
alive */
                 clear_bit(WDOG_ACTIVE, &wddev->status);
+               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+       }

  out_stop:
         mutex_unlock(&wddev->lock);


After these two modifications the watchdog is being pinged indefinitely 
until the user space opens the device and the worker is re-started if 
watchdog daemon exits after writing the magic character. I will update 
these change to the next version of the patch.

Thanks again!

-Timo

> On 2015/4/10 21:06, Timo Kokkonen wrote:
>> There is a great deal of diversity in the watchdog hardware found on
>> different devices. Differen hardware have different contstraints on
>> them, many of the constraints that are excessively difficult for the
>> user space to satisfy.
>>
>> One such constraint is the length of the timeout value, which in many
>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>> with timers and workqueues to extend the timeout in order to give user
>> space more time between updates. Looking at the drivers it is clear
>> that this has resulted to a lot of duplicate code.
>>
>> Add an extension to the watchdog kernel API that allows the driver to
>> describe tis HW constraints to the watchdog code. A kernel worker in
>> the core is then used to extend the watchdog timeout on behalf of the
>> user space. This allows the drivers to be simplified as core takes
>> over the timer extending.
>>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>> ---
>>   drivers/watchdog/watchdog_core.c | 87
>> ++++++++++++++++++++++++++++++++++++++--
>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>   include/linux/watchdog.h         | 23 +++++++++++
>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/watchdog/watchdog_core.c
>> b/drivers/watchdog/watchdog_core.c
>> index cec9b55..8e7d08d 100644
>> --- a/drivers/watchdog/watchdog_core.c
>> +++ b/drivers/watchdog/watchdog_core.c
>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>   /**
>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>> + * @wdd: Watchdog device to operate
>> + * @dev: Device that stores the device tree properties
>> + *
>> + * Initialize the generic timeout parameters. The driver needs to set
>> + * hw_features bitmask from @wdd prior calling this function in order
>> + * to ensure the core knows how to handle the HW.
>> + *
>> + * A zero is returned on success and -EINVAL for failure.
>> + */
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>> *dev)
>> +{
>> +    int ret = 0;
>> +
>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +    /*
>> +     * Max HW timeout needs to be set so that core knows when to
>> +     * use a kernel worker to support longer watchdog timeouts
>> +     */
>> +    if (!wdd->hw_max_timeout)
>> +        return -EINVAL;
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>> +
>> +static void watchdog_worker(struct work_struct *work)
>> +{
>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>> +                        struct watchdog_device, work);
>> +
>> +    if (time_after(jiffies, wdd->expires)) {
>> +        pr_crit("I will reset your machine !\n");
>> +        return;
>> +    }
>> +
>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>> +        (watchdog_active(wdd) &&
>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>> +        if (wdd->ops->ping)
>> +            wdd->ops->ping(wdd);
>> +        else
>> +            wdd->ops->start(wdd);
>> +
>> +        schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
>     I think you missed the update of wdd->expires, as following.
>                                     wdd->expires = jiffies +
> wdd->timeout * HZ;
>> +    }
>> +}
>> +
>> +static int prepare_watchdog(struct watchdog_device *wdd)
>> +{
>> +    /* Stop the watchdog now before user space opens the device */
>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>> +        wdd->ops->stop(wdd);
>> +
>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>> +        /*
>> +         * Can't stop it, use a kernel timer to tick
>> +         * it until it's open by user space
>> +         */
>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>> +    }
>       ditto
>                      wdd->expires = jiffies + wdd->timeout * HZ;
>
>> +    return 0;
>> +}
>> +
>> +/**
>>    * watchdog_register_device() - register a watchdog device
>>    * @wdd: watchdog device
>>    *
>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>> watchdog_device *wdd)
>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>                       NULL, "watchdog%d", wdd->id);
>>       if (IS_ERR(wdd->dev)) {
>> -        watchdog_dev_unregister(wdd);
>> -        ida_simple_remove(&watchdog_ida, id);
>>           ret = PTR_ERR(wdd->dev);
>> -        return ret;
>> +        goto dev_create_fail;
>> +    }
>> +
>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>> +
>> +    if (wdd->hw_max_timeout) {
>> +        ret = prepare_watchdog(wdd);
>> +        if (ret)
>> +            goto dev_create_fail;
>>       }
>>       return 0;
>> +
>> +dev_create_fail:
>> +    watchdog_dev_unregister(wdd);
>> +    ida_simple_remove(&watchdog_ida, id);
>> +    return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>> watchdog_device *wdd)
>>       if (wdd == NULL)
>>           return;
>> +    cancel_delayed_work_sync(&wdd->work);
>>       devno = wdd->cdev.dev;
>>       ret = watchdog_dev_unregister(wdd);
>>       if (ret)
>> diff --git a/drivers/watchdog/watchdog_dev.c
>> b/drivers/watchdog/watchdog_dev.c
>> index 6aaefba..99eb363 100644
>> --- a/drivers/watchdog/watchdog_dev.c
>> +++ b/drivers/watchdog/watchdog_dev.c
>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>> *wddev)
>>       else
>>           err = wddev->ops->start(wddev); /* restart watchdog */
>> +    if (wddev->hw_max_timeout &&
>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +    }
>> +
>>   out_ping:
>>       mutex_unlock(&wddev->lock);
>>       return err;
>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>> *wddev)
>>       if (err == 0)
>>           set_bit(WDOG_ACTIVE, &wddev->status);
>> +    if (wddev->hw_max_timeout &&
>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +    }
>> +
>>   out_start:
>>       mutex_unlock(&wddev->lock);
>>       return err;
>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>> index 395b70e..c16c921 100644
>> --- a/include/linux/watchdog.h
>> +++ b/include/linux/watchdog.h
>> @@ -12,6 +12,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/device.h>
>>   #include <linux/cdev.h>
>> +#include <linux/workqueue.h>
>>   #include <uapi/linux/watchdog.h>
>>   struct watchdog_ops;
>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>    * @timeout:    The watchdog devices timeout value.
>>    * @min_timeout:The watchdog devices minimum timeout value.
>>    * @max_timeout:The watchdog devices maximum timeout value.
>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>> + * @hw_margin:    Time interval between timer pings.
>>    * @driver-data:Pointer to the drivers private data.
>>    * @lock:    Lock for watchdog core internal use only.
>> + * @work:    Worker used to provide longer timeouts than hardware
>> supports.
>>    * @status:    Field that contains the devices internal status bits.
>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>    *
>>    * The watchdog_device structure contains all information about a
>>    * watchdog timer device.
>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>       unsigned int timeout;
>>       unsigned int min_timeout;
>>       unsigned int max_timeout;
>> +    unsigned int hw_max_timeout; /* in jiffies */
>> +    unsigned int hw_heartbeat; /* in jiffies */
>> +    unsigned long int expires; /* for soft timer */
>>       void *driver_data;
>>       struct mutex lock;
>> +    struct delayed_work work;
>>       unsigned long status;
>>   /* Bit numbers for status flags */
>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>> ? */
>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>> unregistered */
>> +
>> +/* Bits describing features supported by the HW */
>> +    unsigned long hw_features;
>> +
>> +/* Can the watchdog be stopped and started */
>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>> +/* Is the watchdog already running when the driver starts up */
>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>   };
>>   #define WATCHDOG_NOWAYOUT        IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>> @@ -120,6 +137,11 @@ static inline bool
>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>   }
>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>> +{
>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>> +}
>> +
>>   /* Use the following functions to manipulate watchdog driver
>> specific data */
>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>> void *data)
>>   {
>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>> watchdog_device *wdd)
>>   /* drivers/watchdog/watchdog_core.c */
>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>                     unsigned int timeout_parm, struct device *dev);
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>> *dev);
>>   extern int watchdog_register_device(struct watchdog_device *);
>>   extern void watchdog_unregister_device(struct watchdog_device *);
> Best Regards,
> Wenyou Yang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-13 11:00       ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-13 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 13.04.2015 12:29, Yang, Wenyou wrote:
> Hi Timo,
>
> If the /dev/watchdog device file doesn't open by the user space
> application, the system will reboot by the watchdog.
>

Thank you for pointing out the bug. When the struct watchdog_device is 
initialized, the expires variable is zero. Jiffies however is negative 
on boot up, so time_after() comparison with zero returns false and the 
worker keeps the watchdog alive for some minutes, until jiffies become 
positive. Then it will stop updating the HW as the expires value has 
become expired.

> I think it is missing update of  the wdd->expires value. added it as
> following.

Actually, the expires value stores the user space expiration value. It 
should only be updated when user space pings the watchdog. If we 
increment the expires value in the worker thread, the watchdog will 
never expire even if user space stops pinging it and keep it open.

So the proper fix is to have it like this (in watchdog_worker function):

	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
		pr_crit("I will reset your machine !\n");
		return;
	}

Also we need this change to re-start the worker when watchdog is 
properly stopped (in watchdog_dev.c):

@@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device *wddev)
                 goto out_stop;
         }

-       err = wddev->ops->stop(wddev);
-       if (err == 0)
+       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
+               err = wddev->ops->stop(wddev);
+               if (err == 0)
+                       clear_bit(WDOG_ACTIVE, &wddev->status);
+       } else {
+               /* Unstoppable watchdogs need the worker to keep them 
alive */
                 clear_bit(WDOG_ACTIVE, &wddev->status);
+               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+       }

  out_stop:
         mutex_unlock(&wddev->lock);


After these two modifications the watchdog is being pinged indefinitely 
until the user space opens the device and the worker is re-started if 
watchdog daemon exits after writing the magic character. I will update 
these change to the next version of the patch.

Thanks again!

-Timo

> On 2015/4/10 21:06, Timo Kokkonen wrote:
>> There is a great deal of diversity in the watchdog hardware found on
>> different devices. Differen hardware have different contstraints on
>> them, many of the constraints that are excessively difficult for the
>> user space to satisfy.
>>
>> One such constraint is the length of the timeout value, which in many
>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>> with timers and workqueues to extend the timeout in order to give user
>> space more time between updates. Looking at the drivers it is clear
>> that this has resulted to a lot of duplicate code.
>>
>> Add an extension to the watchdog kernel API that allows the driver to
>> describe tis HW constraints to the watchdog code. A kernel worker in
>> the core is then used to extend the watchdog timeout on behalf of the
>> user space. This allows the drivers to be simplified as core takes
>> over the timer extending.
>>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>> ---
>>   drivers/watchdog/watchdog_core.c | 87
>> ++++++++++++++++++++++++++++++++++++++--
>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>   include/linux/watchdog.h         | 23 +++++++++++
>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/watchdog/watchdog_core.c
>> b/drivers/watchdog/watchdog_core.c
>> index cec9b55..8e7d08d 100644
>> --- a/drivers/watchdog/watchdog_core.c
>> +++ b/drivers/watchdog/watchdog_core.c
>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>   /**
>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>> + * @wdd: Watchdog device to operate
>> + * @dev: Device that stores the device tree properties
>> + *
>> + * Initialize the generic timeout parameters. The driver needs to set
>> + * hw_features bitmask from @wdd prior calling this function in order
>> + * to ensure the core knows how to handle the HW.
>> + *
>> + * A zero is returned on success and -EINVAL for failure.
>> + */
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>> *dev)
>> +{
>> +    int ret = 0;
>> +
>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +    /*
>> +     * Max HW timeout needs to be set so that core knows when to
>> +     * use a kernel worker to support longer watchdog timeouts
>> +     */
>> +    if (!wdd->hw_max_timeout)
>> +        return -EINVAL;
>> +
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>> +
>> +static void watchdog_worker(struct work_struct *work)
>> +{
>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>> +                        struct watchdog_device, work);
>> +
>> +    if (time_after(jiffies, wdd->expires)) {
>> +        pr_crit("I will reset your machine !\n");
>> +        return;
>> +    }
>> +
>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>> +        (watchdog_active(wdd) &&
>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>> +        if (wdd->ops->ping)
>> +            wdd->ops->ping(wdd);
>> +        else
>> +            wdd->ops->start(wdd);
>> +
>> +        schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
>     I think you missed the update of wdd->expires, as following.
>                                     wdd->expires = jiffies +
> wdd->timeout * HZ;
>> +    }
>> +}
>> +
>> +static int prepare_watchdog(struct watchdog_device *wdd)
>> +{
>> +    /* Stop the watchdog now before user space opens the device */
>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>> +        wdd->ops->stop(wdd);
>> +
>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>> +        /*
>> +         * Can't stop it, use a kernel timer to tick
>> +         * it until it's open by user space
>> +         */
>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>> +    }
>       ditto
>                      wdd->expires = jiffies + wdd->timeout * HZ;
>
>> +    return 0;
>> +}
>> +
>> +/**
>>    * watchdog_register_device() - register a watchdog device
>>    * @wdd: watchdog device
>>    *
>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>> watchdog_device *wdd)
>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>                       NULL, "watchdog%d", wdd->id);
>>       if (IS_ERR(wdd->dev)) {
>> -        watchdog_dev_unregister(wdd);
>> -        ida_simple_remove(&watchdog_ida, id);
>>           ret = PTR_ERR(wdd->dev);
>> -        return ret;
>> +        goto dev_create_fail;
>> +    }
>> +
>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>> +
>> +    if (wdd->hw_max_timeout) {
>> +        ret = prepare_watchdog(wdd);
>> +        if (ret)
>> +            goto dev_create_fail;
>>       }
>>       return 0;
>> +
>> +dev_create_fail:
>> +    watchdog_dev_unregister(wdd);
>> +    ida_simple_remove(&watchdog_ida, id);
>> +    return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>> watchdog_device *wdd)
>>       if (wdd == NULL)
>>           return;
>> +    cancel_delayed_work_sync(&wdd->work);
>>       devno = wdd->cdev.dev;
>>       ret = watchdog_dev_unregister(wdd);
>>       if (ret)
>> diff --git a/drivers/watchdog/watchdog_dev.c
>> b/drivers/watchdog/watchdog_dev.c
>> index 6aaefba..99eb363 100644
>> --- a/drivers/watchdog/watchdog_dev.c
>> +++ b/drivers/watchdog/watchdog_dev.c
>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>> *wddev)
>>       else
>>           err = wddev->ops->start(wddev); /* restart watchdog */
>> +    if (wddev->hw_max_timeout &&
>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +    }
>> +
>>   out_ping:
>>       mutex_unlock(&wddev->lock);
>>       return err;
>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>> *wddev)
>>       if (err == 0)
>>           set_bit(WDOG_ACTIVE, &wddev->status);
>> +    if (wddev->hw_max_timeout &&
>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +    }
>> +
>>   out_start:
>>       mutex_unlock(&wddev->lock);
>>       return err;
>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>> index 395b70e..c16c921 100644
>> --- a/include/linux/watchdog.h
>> +++ b/include/linux/watchdog.h
>> @@ -12,6 +12,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/device.h>
>>   #include <linux/cdev.h>
>> +#include <linux/workqueue.h>
>>   #include <uapi/linux/watchdog.h>
>>   struct watchdog_ops;
>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>    * @timeout:    The watchdog devices timeout value.
>>    * @min_timeout:The watchdog devices minimum timeout value.
>>    * @max_timeout:The watchdog devices maximum timeout value.
>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>> + * @hw_margin:    Time interval between timer pings.
>>    * @driver-data:Pointer to the drivers private data.
>>    * @lock:    Lock for watchdog core internal use only.
>> + * @work:    Worker used to provide longer timeouts than hardware
>> supports.
>>    * @status:    Field that contains the devices internal status bits.
>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>    *
>>    * The watchdog_device structure contains all information about a
>>    * watchdog timer device.
>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>       unsigned int timeout;
>>       unsigned int min_timeout;
>>       unsigned int max_timeout;
>> +    unsigned int hw_max_timeout; /* in jiffies */
>> +    unsigned int hw_heartbeat; /* in jiffies */
>> +    unsigned long int expires; /* for soft timer */
>>       void *driver_data;
>>       struct mutex lock;
>> +    struct delayed_work work;
>>       unsigned long status;
>>   /* Bit numbers for status flags */
>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>> ? */
>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>> unregistered */
>> +
>> +/* Bits describing features supported by the HW */
>> +    unsigned long hw_features;
>> +
>> +/* Can the watchdog be stopped and started */
>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>> +/* Is the watchdog already running when the driver starts up */
>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>   };
>>   #define WATCHDOG_NOWAYOUT        IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>> @@ -120,6 +137,11 @@ static inline bool
>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>   }
>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>> +{
>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>> +}
>> +
>>   /* Use the following functions to manipulate watchdog driver
>> specific data */
>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>> void *data)
>>   {
>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>> watchdog_device *wdd)
>>   /* drivers/watchdog/watchdog_core.c */
>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>                     unsigned int timeout_parm, struct device *dev);
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>> *dev);
>>   extern int watchdog_register_device(struct watchdog_device *);
>>   extern void watchdog_unregister_device(struct watchdog_device *);
> Best Regards,
> Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-13 11:00       ` Timo Kokkonen
@ 2015-04-14  2:39         ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-14  2:39 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni



On 2015/4/13 19:00, Timo Kokkonen wrote:
> On 13.04.2015 12:29, Yang, Wenyou wrote:
>> Hi Timo,
>>
>> If the /dev/watchdog device file doesn't open by the user space
>> application, the system will reboot by the watchdog.
>>
>
> Thank you for pointing out the bug. When the struct watchdog_device is 
> initialized, the expires variable is zero. Jiffies however is negative 
> on boot up, so time_after() comparison with zero returns false and the 
> worker keeps the watchdog alive for some minutes, until jiffies become 
> positive. Then it will stop updating the HW as the expires value has 
> become expired.
>
>> I think it is missing update of  the wdd->expires value. added it as
>> following.
>
> Actually, the expires value stores the user space expiration value. It 
> should only be updated when user space pings the watchdog. If we 
> increment the expires value in the worker thread, the watchdog will 
> never expire even if user space stops pinging it and keep it open.
>
> So the proper fix is to have it like this (in watchdog_worker function):
>
>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
>         pr_crit("I will reset your machine !\n");
>         return;
>     }
I have a little doubt, if the watchdog is not opened by the user space, 
the watchdog will not reset the system in any case.

it doesn't work in the same way as before in the at91sam9_wdt.c.
>
> Also we need this change to re-start the worker when watchdog is 
> properly stopped (in watchdog_dev.c):
>
> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device 
> *wddev)
>                 goto out_stop;
>         }
>
> -       err = wddev->ops->stop(wddev);
> -       if (err == 0)
> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> +               err = wddev->ops->stop(wddev);
> +               if (err == 0)
> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
> +       } else {
> +               /* Unstoppable watchdogs need the worker to keep them 
> alive */
>                 clear_bit(WDOG_ACTIVE, &wddev->status);
> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +       }
Tested it, there is a issue.
After open the watchdog device file, and feed the watchdog, then close 
the watchdog device file, the system reset by the watchdog.
It didn't reach here, the watchdog_worker didn't run.

>
>  out_stop:
>         mutex_unlock(&wddev->lock);
>
>
> After these two modifications the watchdog is being pinged 
> indefinitely until the user space opens the device and the worker is 
> re-started if watchdog daemon exits after writing the magic character. 
> I will update these change to the next version of the patch.
>
> Thanks again!
>
> -Timo
>
>> On 2015/4/10 21:06, Timo Kokkonen wrote:
>>> There is a great deal of diversity in the watchdog hardware found on
>>> different devices. Differen hardware have different contstraints on
>>> them, many of the constraints that are excessively difficult for the
>>> user space to satisfy.
>>>
>>> One such constraint is the length of the timeout value, which in many
>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>>> with timers and workqueues to extend the timeout in order to give user
>>> space more time between updates. Looking at the drivers it is clear
>>> that this has resulted to a lot of duplicate code.
>>>
>>> Add an extension to the watchdog kernel API that allows the driver to
>>> describe tis HW constraints to the watchdog code. A kernel worker in
>>> the core is then used to extend the watchdog timeout on behalf of the
>>> user space. This allows the drivers to be simplified as core takes
>>> over the timer extending.
>>>
>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>> ---
>>>   drivers/watchdog/watchdog_core.c | 87
>>> ++++++++++++++++++++++++++++++++++++++--
>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>>   include/linux/watchdog.h         | 23 +++++++++++
>>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/watchdog/watchdog_core.c
>>> b/drivers/watchdog/watchdog_core.c
>>> index cec9b55..8e7d08d 100644
>>> --- a/drivers/watchdog/watchdog_core.c
>>> +++ b/drivers/watchdog/watchdog_core.c
>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device 
>>> *wdd,
>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>>   /**
>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>>> + * @wdd: Watchdog device to operate
>>> + * @dev: Device that stores the device tree properties
>>> + *
>>> + * Initialize the generic timeout parameters. The driver needs to set
>>> + * hw_features bitmask from @wdd prior calling this function in order
>>> + * to ensure the core knows how to handle the HW.
>>> + *
>>> + * A zero is returned on success and -EINVAL for failure.
>>> + */
>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>> *dev)
>>> +{
>>> +    int ret = 0;
>>> +
>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    /*
>>> +     * Max HW timeout needs to be set so that core knows when to
>>> +     * use a kernel worker to support longer watchdog timeouts
>>> +     */
>>> +    if (!wdd->hw_max_timeout)
>>> +        return -EINVAL;
>>> +
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>>> +
>>> +static void watchdog_worker(struct work_struct *work)
>>> +{
>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>>> +                        struct watchdog_device, work);
>>> +
>>> +    if (time_after(jiffies, wdd->expires)) {
>>> +        pr_crit("I will reset your machine !\n");
>>> +        return;
>>> +    }
>>> +
>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>>> +        (watchdog_active(wdd) &&
>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>>> +        if (wdd->ops->ping)
>>> +            wdd->ops->ping(wdd);
>>> +        else
>>> +            wdd->ops->start(wdd);
>>> +
>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>     I think you missed the update of wdd->expires, as following.
>>                                     wdd->expires = jiffies +
>> wdd->timeout * HZ;
>>> +    }
>>> +}
>>> +
>>> +static int prepare_watchdog(struct watchdog_device *wdd)
>>> +{
>>> +    /* Stop the watchdog now before user space opens the device */
>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>>> +        wdd->ops->stop(wdd);
>>> +
>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>>> +        /*
>>> +         * Can't stop it, use a kernel timer to tick
>>> +         * it until it's open by user space
>>> +         */
>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>> +    }
>>       ditto
>>                      wdd->expires = jiffies + wdd->timeout * HZ;
>>
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>>    * watchdog_register_device() - register a watchdog device
>>>    * @wdd: watchdog device
>>>    *
>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>>> watchdog_device *wdd)
>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>>                       NULL, "watchdog%d", wdd->id);
>>>       if (IS_ERR(wdd->dev)) {
>>> -        watchdog_dev_unregister(wdd);
>>> -        ida_simple_remove(&watchdog_ida, id);
>>>           ret = PTR_ERR(wdd->dev);
>>> -        return ret;
>>> +        goto dev_create_fail;
>>> +    }
>>> +
>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>>> +
>>> +    if (wdd->hw_max_timeout) {
>>> +        ret = prepare_watchdog(wdd);
>>> +        if (ret)
>>> +            goto dev_create_fail;
>>>       }
>>>       return 0;
>>> +
>>> +dev_create_fail:
>>> +    watchdog_dev_unregister(wdd);
>>> +    ida_simple_remove(&watchdog_ida, id);
>>> +    return ret;
>>>   }
>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>>> watchdog_device *wdd)
>>>       if (wdd == NULL)
>>>           return;
>>> +    cancel_delayed_work_sync(&wdd->work);
>>>       devno = wdd->cdev.dev;
>>>       ret = watchdog_dev_unregister(wdd);
>>>       if (ret)
>>> diff --git a/drivers/watchdog/watchdog_dev.c
>>> b/drivers/watchdog/watchdog_dev.c
>>> index 6aaefba..99eb363 100644
>>> --- a/drivers/watchdog/watchdog_dev.c
>>> +++ b/drivers/watchdog/watchdog_dev.c
>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>>> *wddev)
>>>       else
>>>           err = wddev->ops->start(wddev); /* restart watchdog */
>>> +    if (wddev->hw_max_timeout &&
>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>> +    }
>>> +
>>>   out_ping:
>>>       mutex_unlock(&wddev->lock);
>>>       return err;
>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>>> *wddev)
>>>       if (err == 0)
>>>           set_bit(WDOG_ACTIVE, &wddev->status);
>>> +    if (wddev->hw_max_timeout &&
>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>> +    }
>>> +
>>>   out_start:
>>>       mutex_unlock(&wddev->lock);
>>>       return err;
>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>>> index 395b70e..c16c921 100644
>>> --- a/include/linux/watchdog.h
>>> +++ b/include/linux/watchdog.h
>>> @@ -12,6 +12,7 @@
>>>   #include <linux/bitops.h>
>>>   #include <linux/device.h>
>>>   #include <linux/cdev.h>
>>> +#include <linux/workqueue.h>
>>>   #include <uapi/linux/watchdog.h>
>>>   struct watchdog_ops;
>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>>    * @timeout:    The watchdog devices timeout value.
>>>    * @min_timeout:The watchdog devices minimum timeout value.
>>>    * @max_timeout:The watchdog devices maximum timeout value.
>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>>> + * @hw_margin:    Time interval between timer pings.
>>>    * @driver-data:Pointer to the drivers private data.
>>>    * @lock:    Lock for watchdog core internal use only.
>>> + * @work:    Worker used to provide longer timeouts than hardware
>>> supports.
>>>    * @status:    Field that contains the devices internal status bits.
>>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>>    *
>>>    * The watchdog_device structure contains all information about a
>>>    * watchdog timer device.
>>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>>       unsigned int timeout;
>>>       unsigned int min_timeout;
>>>       unsigned int max_timeout;
>>> +    unsigned int hw_max_timeout; /* in jiffies */
>>> +    unsigned int hw_heartbeat; /* in jiffies */
>>> +    unsigned long int expires; /* for soft timer */
>>>       void *driver_data;
>>>       struct mutex lock;
>>> +    struct delayed_work work;
>>>       unsigned long status;
>>>   /* Bit numbers for status flags */
>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>>> ? */
>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>>> unregistered */
>>> +
>>> +/* Bits describing features supported by the HW */
>>> +    unsigned long hw_features;
>>> +
>>> +/* Can the watchdog be stopped and started */
>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>>> +/* Is the watchdog already running when the driver starts up */
>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>>   };
>>>   #define WATCHDOG_NOWAYOUT IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>>> @@ -120,6 +137,11 @@ static inline bool
>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>>   }
>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>>> +{
>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>>> +}
>>> +
>>>   /* Use the following functions to manipulate watchdog driver
>>> specific data */
>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>>> void *data)
>>>   {
>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>>> watchdog_device *wdd)
>>>   /* drivers/watchdog/watchdog_core.c */
>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>>                     unsigned int timeout_parm, struct device *dev);
>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>> *dev);
>>>   extern int watchdog_register_device(struct watchdog_device *);
>>>   extern void watchdog_unregister_device(struct watchdog_device *);
>> Best Regards,
>> Wenyou Yang
>
Best Regards,
Wenyou Yang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-14  2:39         ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-14  2:39 UTC (permalink / raw)
  To: linux-arm-kernel



On 2015/4/13 19:00, Timo Kokkonen wrote:
> On 13.04.2015 12:29, Yang, Wenyou wrote:
>> Hi Timo,
>>
>> If the /dev/watchdog device file doesn't open by the user space
>> application, the system will reboot by the watchdog.
>>
>
> Thank you for pointing out the bug. When the struct watchdog_device is 
> initialized, the expires variable is zero. Jiffies however is negative 
> on boot up, so time_after() comparison with zero returns false and the 
> worker keeps the watchdog alive for some minutes, until jiffies become 
> positive. Then it will stop updating the HW as the expires value has 
> become expired.
>
>> I think it is missing update of  the wdd->expires value. added it as
>> following.
>
> Actually, the expires value stores the user space expiration value. It 
> should only be updated when user space pings the watchdog. If we 
> increment the expires value in the worker thread, the watchdog will 
> never expire even if user space stops pinging it and keep it open.
>
> So the proper fix is to have it like this (in watchdog_worker function):
>
>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
>         pr_crit("I will reset your machine !\n");
>         return;
>     }
I have a little doubt, if the watchdog is not opened by the user space, 
the watchdog will not reset the system in any case.

it doesn't work in the same way as before in the at91sam9_wdt.c.
>
> Also we need this change to re-start the worker when watchdog is 
> properly stopped (in watchdog_dev.c):
>
> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device 
> *wddev)
>                 goto out_stop;
>         }
>
> -       err = wddev->ops->stop(wddev);
> -       if (err == 0)
> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> +               err = wddev->ops->stop(wddev);
> +               if (err == 0)
> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
> +       } else {
> +               /* Unstoppable watchdogs need the worker to keep them 
> alive */
>                 clear_bit(WDOG_ACTIVE, &wddev->status);
> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +       }
Tested it, there is a issue.
After open the watchdog device file, and feed the watchdog, then close 
the watchdog device file, the system reset by the watchdog.
It didn't reach here, the watchdog_worker didn't run.

>
>  out_stop:
>         mutex_unlock(&wddev->lock);
>
>
> After these two modifications the watchdog is being pinged 
> indefinitely until the user space opens the device and the worker is 
> re-started if watchdog daemon exits after writing the magic character. 
> I will update these change to the next version of the patch.
>
> Thanks again!
>
> -Timo
>
>> On 2015/4/10 21:06, Timo Kokkonen wrote:
>>> There is a great deal of diversity in the watchdog hardware found on
>>> different devices. Differen hardware have different contstraints on
>>> them, many of the constraints that are excessively difficult for the
>>> user space to satisfy.
>>>
>>> One such constraint is the length of the timeout value, which in many
>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>>> with timers and workqueues to extend the timeout in order to give user
>>> space more time between updates. Looking at the drivers it is clear
>>> that this has resulted to a lot of duplicate code.
>>>
>>> Add an extension to the watchdog kernel API that allows the driver to
>>> describe tis HW constraints to the watchdog code. A kernel worker in
>>> the core is then used to extend the watchdog timeout on behalf of the
>>> user space. This allows the drivers to be simplified as core takes
>>> over the timer extending.
>>>
>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>> ---
>>>   drivers/watchdog/watchdog_core.c | 87
>>> ++++++++++++++++++++++++++++++++++++++--
>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>>   include/linux/watchdog.h         | 23 +++++++++++
>>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/watchdog/watchdog_core.c
>>> b/drivers/watchdog/watchdog_core.c
>>> index cec9b55..8e7d08d 100644
>>> --- a/drivers/watchdog/watchdog_core.c
>>> +++ b/drivers/watchdog/watchdog_core.c
>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device 
>>> *wdd,
>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>>   /**
>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>>> + * @wdd: Watchdog device to operate
>>> + * @dev: Device that stores the device tree properties
>>> + *
>>> + * Initialize the generic timeout parameters. The driver needs to set
>>> + * hw_features bitmask from @wdd prior calling this function in order
>>> + * to ensure the core knows how to handle the HW.
>>> + *
>>> + * A zero is returned on success and -EINVAL for failure.
>>> + */
>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>> *dev)
>>> +{
>>> +    int ret = 0;
>>> +
>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    /*
>>> +     * Max HW timeout needs to be set so that core knows when to
>>> +     * use a kernel worker to support longer watchdog timeouts
>>> +     */
>>> +    if (!wdd->hw_max_timeout)
>>> +        return -EINVAL;
>>> +
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>>> +
>>> +static void watchdog_worker(struct work_struct *work)
>>> +{
>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>>> +                        struct watchdog_device, work);
>>> +
>>> +    if (time_after(jiffies, wdd->expires)) {
>>> +        pr_crit("I will reset your machine !\n");
>>> +        return;
>>> +    }
>>> +
>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>>> +        (watchdog_active(wdd) &&
>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>>> +        if (wdd->ops->ping)
>>> +            wdd->ops->ping(wdd);
>>> +        else
>>> +            wdd->ops->start(wdd);
>>> +
>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>     I think you missed the update of wdd->expires, as following.
>>                                     wdd->expires = jiffies +
>> wdd->timeout * HZ;
>>> +    }
>>> +}
>>> +
>>> +static int prepare_watchdog(struct watchdog_device *wdd)
>>> +{
>>> +    /* Stop the watchdog now before user space opens the device */
>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>>> +        wdd->ops->stop(wdd);
>>> +
>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>>> +        /*
>>> +         * Can't stop it, use a kernel timer to tick
>>> +         * it until it's open by user space
>>> +         */
>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>> +    }
>>       ditto
>>                      wdd->expires = jiffies + wdd->timeout * HZ;
>>
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>>    * watchdog_register_device() - register a watchdog device
>>>    * @wdd: watchdog device
>>>    *
>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>>> watchdog_device *wdd)
>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>>                       NULL, "watchdog%d", wdd->id);
>>>       if (IS_ERR(wdd->dev)) {
>>> -        watchdog_dev_unregister(wdd);
>>> -        ida_simple_remove(&watchdog_ida, id);
>>>           ret = PTR_ERR(wdd->dev);
>>> -        return ret;
>>> +        goto dev_create_fail;
>>> +    }
>>> +
>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>>> +
>>> +    if (wdd->hw_max_timeout) {
>>> +        ret = prepare_watchdog(wdd);
>>> +        if (ret)
>>> +            goto dev_create_fail;
>>>       }
>>>       return 0;
>>> +
>>> +dev_create_fail:
>>> +    watchdog_dev_unregister(wdd);
>>> +    ida_simple_remove(&watchdog_ida, id);
>>> +    return ret;
>>>   }
>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>>> watchdog_device *wdd)
>>>       if (wdd == NULL)
>>>           return;
>>> +    cancel_delayed_work_sync(&wdd->work);
>>>       devno = wdd->cdev.dev;
>>>       ret = watchdog_dev_unregister(wdd);
>>>       if (ret)
>>> diff --git a/drivers/watchdog/watchdog_dev.c
>>> b/drivers/watchdog/watchdog_dev.c
>>> index 6aaefba..99eb363 100644
>>> --- a/drivers/watchdog/watchdog_dev.c
>>> +++ b/drivers/watchdog/watchdog_dev.c
>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>>> *wddev)
>>>       else
>>>           err = wddev->ops->start(wddev); /* restart watchdog */
>>> +    if (wddev->hw_max_timeout &&
>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>> +    }
>>> +
>>>   out_ping:
>>>       mutex_unlock(&wddev->lock);
>>>       return err;
>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>>> *wddev)
>>>       if (err == 0)
>>>           set_bit(WDOG_ACTIVE, &wddev->status);
>>> +    if (wddev->hw_max_timeout &&
>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>> +    }
>>> +
>>>   out_start:
>>>       mutex_unlock(&wddev->lock);
>>>       return err;
>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>>> index 395b70e..c16c921 100644
>>> --- a/include/linux/watchdog.h
>>> +++ b/include/linux/watchdog.h
>>> @@ -12,6 +12,7 @@
>>>   #include <linux/bitops.h>
>>>   #include <linux/device.h>
>>>   #include <linux/cdev.h>
>>> +#include <linux/workqueue.h>
>>>   #include <uapi/linux/watchdog.h>
>>>   struct watchdog_ops;
>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>>    * @timeout:    The watchdog devices timeout value.
>>>    * @min_timeout:The watchdog devices minimum timeout value.
>>>    * @max_timeout:The watchdog devices maximum timeout value.
>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>>> + * @hw_margin:    Time interval between timer pings.
>>>    * @driver-data:Pointer to the drivers private data.
>>>    * @lock:    Lock for watchdog core internal use only.
>>> + * @work:    Worker used to provide longer timeouts than hardware
>>> supports.
>>>    * @status:    Field that contains the devices internal status bits.
>>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>>    *
>>>    * The watchdog_device structure contains all information about a
>>>    * watchdog timer device.
>>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>>       unsigned int timeout;
>>>       unsigned int min_timeout;
>>>       unsigned int max_timeout;
>>> +    unsigned int hw_max_timeout; /* in jiffies */
>>> +    unsigned int hw_heartbeat; /* in jiffies */
>>> +    unsigned long int expires; /* for soft timer */
>>>       void *driver_data;
>>>       struct mutex lock;
>>> +    struct delayed_work work;
>>>       unsigned long status;
>>>   /* Bit numbers for status flags */
>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>>> ? */
>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>>> unregistered */
>>> +
>>> +/* Bits describing features supported by the HW */
>>> +    unsigned long hw_features;
>>> +
>>> +/* Can the watchdog be stopped and started */
>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>>> +/* Is the watchdog already running when the driver starts up */
>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>>   };
>>>   #define WATCHDOG_NOWAYOUT IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>>> @@ -120,6 +137,11 @@ static inline bool
>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>>   }
>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>>> +{
>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>>> +}
>>> +
>>>   /* Use the following functions to manipulate watchdog driver
>>> specific data */
>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>>> void *data)
>>>   {
>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>>> watchdog_device *wdd)
>>>   /* drivers/watchdog/watchdog_core.c */
>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>>                     unsigned int timeout_parm, struct device *dev);
>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>> *dev);
>>>   extern int watchdog_register_device(struct watchdog_device *);
>>>   extern void watchdog_unregister_device(struct watchdog_device *);
>> Best Regards,
>> Wenyou Yang
>
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-14  2:39         ` Yang, Wenyou
@ 2015-04-14  5:31           ` Timo Kokkonen
  -1 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-14  5:31 UTC (permalink / raw)
  To: Yang, Wenyou, linux-arm-kernel, linux-watchdog, boris.brezillon,
	nicolas.ferre, alexandre.belloni

On 14.04.2015 05:39, Yang, Wenyou wrote:
>
>
> On 2015/4/13 19:00, Timo Kokkonen wrote:
>> On 13.04.2015 12:29, Yang, Wenyou wrote:
>>> Hi Timo,
>>>
>>> If the /dev/watchdog device file doesn't open by the user space
>>> application, the system will reboot by the watchdog.
>>>
>>
>> Thank you for pointing out the bug. When the struct watchdog_device is
>> initialized, the expires variable is zero. Jiffies however is negative
>> on boot up, so time_after() comparison with zero returns false and the
>> worker keeps the watchdog alive for some minutes, until jiffies become
>> positive. Then it will stop updating the HW as the expires value has
>> become expired.
>>
>>> I think it is missing update of  the wdd->expires value. added it as
>>> following.
>>
>> Actually, the expires value stores the user space expiration value. It
>> should only be updated when user space pings the watchdog. If we
>> increment the expires value in the worker thread, the watchdog will
>> never expire even if user space stops pinging it and keep it open.
>>
>> So the proper fix is to have it like this (in watchdog_worker function):
>>
>>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
>>         pr_crit("I will reset your machine !\n");
>>         return;
>>     }
> I have a little doubt, if the watchdog is not opened by the user space,
> the watchdog will not reset the system in any case.
>
> it doesn't work in the same way as before in the at91sam9_wdt.c.

That's why the timer was originally in at91sam9_wdt, it kept on pinging 
the watchdog HW until user space opened it. That's why I started working 
on the early-timeout-sec properlty in the first place. And that is the 
same pattern that repeats all over the watchdog drivers.

>>
>> Also we need this change to re-start the worker when watchdog is
>> properly stopped (in watchdog_dev.c):
>>
>> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device
>> *wddev)
>>                 goto out_stop;
>>         }
>>
>> -       err = wddev->ops->stop(wddev);
>> -       if (err == 0)
>> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
>> +               err = wddev->ops->stop(wddev);
>> +               if (err == 0)
>> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
>> +       } else {
>> +               /* Unstoppable watchdogs need the worker to keep them
>> alive */
>>                 clear_bit(WDOG_ACTIVE, &wddev->status);
>> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +       }
> Tested it, there is a issue.
> After open the watchdog device file, and feed the watchdog, then close
> the watchdog device file, the system reset by the watchdog.
> It didn't reach here, the watchdog_worker didn't run.
>

If you just close the device, watchdog is not stopped. That is to 
prevent watchdog daemon crash from hanging the system. You need to feed 
the magic char 'V' in order to stop the watchdog, which will trigger the 
start of the worker. Have you tried that?

I noticed some other issues with the patches. I'll send a new one with 
all fixes combined shortly..

Thanks for your feedback.

-Timo

>>
>>  out_stop:
>>         mutex_unlock(&wddev->lock);
>>
>>
>> After these two modifications the watchdog is being pinged
>> indefinitely until the user space opens the device and the worker is
>> re-started if watchdog daemon exits after writing the magic character.
>> I will update these change to the next version of the patch.
>>
>> Thanks again!
>>
>> -Timo
>>
>>> On 2015/4/10 21:06, Timo Kokkonen wrote:
>>>> There is a great deal of diversity in the watchdog hardware found on
>>>> different devices. Differen hardware have different contstraints on
>>>> them, many of the constraints that are excessively difficult for the
>>>> user space to satisfy.
>>>>
>>>> One such constraint is the length of the timeout value, which in many
>>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>>>> with timers and workqueues to extend the timeout in order to give user
>>>> space more time between updates. Looking at the drivers it is clear
>>>> that this has resulted to a lot of duplicate code.
>>>>
>>>> Add an extension to the watchdog kernel API that allows the driver to
>>>> describe tis HW constraints to the watchdog code. A kernel worker in
>>>> the core is then used to extend the watchdog timeout on behalf of the
>>>> user space. This allows the drivers to be simplified as core takes
>>>> over the timer extending.
>>>>
>>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>>> ---
>>>>   drivers/watchdog/watchdog_core.c | 87
>>>> ++++++++++++++++++++++++++++++++++++++--
>>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>>>   include/linux/watchdog.h         | 23 +++++++++++
>>>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/watchdog/watchdog_core.c
>>>> b/drivers/watchdog/watchdog_core.c
>>>> index cec9b55..8e7d08d 100644
>>>> --- a/drivers/watchdog/watchdog_core.c
>>>> +++ b/drivers/watchdog/watchdog_core.c
>>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device
>>>> *wdd,
>>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>>>   /**
>>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>>>> + * @wdd: Watchdog device to operate
>>>> + * @dev: Device that stores the device tree properties
>>>> + *
>>>> + * Initialize the generic timeout parameters. The driver needs to set
>>>> + * hw_features bitmask from @wdd prior calling this function in order
>>>> + * to ensure the core knows how to handle the HW.
>>>> + *
>>>> + * A zero is returned on success and -EINVAL for failure.
>>>> + */
>>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>>> *dev)
>>>> +{
>>>> +    int ret = 0;
>>>> +
>>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>>>> +    if (ret < 0)
>>>> +        return ret;
>>>> +
>>>> +    /*
>>>> +     * Max HW timeout needs to be set so that core knows when to
>>>> +     * use a kernel worker to support longer watchdog timeouts
>>>> +     */
>>>> +    if (!wdd->hw_max_timeout)
>>>> +        return -EINVAL;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>>>> +
>>>> +static void watchdog_worker(struct work_struct *work)
>>>> +{
>>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>>>> +                        struct watchdog_device, work);
>>>> +
>>>> +    if (time_after(jiffies, wdd->expires)) {
>>>> +        pr_crit("I will reset your machine !\n");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>>>> +        (watchdog_active(wdd) &&
>>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>>>> +        if (wdd->ops->ping)
>>>> +            wdd->ops->ping(wdd);
>>>> +        else
>>>> +            wdd->ops->start(wdd);
>>>> +
>>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>>     I think you missed the update of wdd->expires, as following.
>>>                                     wdd->expires = jiffies +
>>> wdd->timeout * HZ;
>>>> +    }
>>>> +}
>>>> +
>>>> +static int prepare_watchdog(struct watchdog_device *wdd)
>>>> +{
>>>> +    /* Stop the watchdog now before user space opens the device */
>>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>>>> +        wdd->ops->stop(wdd);
>>>> +
>>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>>>> +        /*
>>>> +         * Can't stop it, use a kernel timer to tick
>>>> +         * it until it's open by user space
>>>> +         */
>>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>>> +    }
>>>       ditto
>>>                      wdd->expires = jiffies + wdd->timeout * HZ;
>>>
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/**
>>>>    * watchdog_register_device() - register a watchdog device
>>>>    * @wdd: watchdog device
>>>>    *
>>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>>>> watchdog_device *wdd)
>>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>>>                       NULL, "watchdog%d", wdd->id);
>>>>       if (IS_ERR(wdd->dev)) {
>>>> -        watchdog_dev_unregister(wdd);
>>>> -        ida_simple_remove(&watchdog_ida, id);
>>>>           ret = PTR_ERR(wdd->dev);
>>>> -        return ret;
>>>> +        goto dev_create_fail;
>>>> +    }
>>>> +
>>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>>>> +
>>>> +    if (wdd->hw_max_timeout) {
>>>> +        ret = prepare_watchdog(wdd);
>>>> +        if (ret)
>>>> +            goto dev_create_fail;
>>>>       }
>>>>       return 0;
>>>> +
>>>> +dev_create_fail:
>>>> +    watchdog_dev_unregister(wdd);
>>>> +    ida_simple_remove(&watchdog_ida, id);
>>>> +    return ret;
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>>>> watchdog_device *wdd)
>>>>       if (wdd == NULL)
>>>>           return;
>>>> +    cancel_delayed_work_sync(&wdd->work);
>>>>       devno = wdd->cdev.dev;
>>>>       ret = watchdog_dev_unregister(wdd);
>>>>       if (ret)
>>>> diff --git a/drivers/watchdog/watchdog_dev.c
>>>> b/drivers/watchdog/watchdog_dev.c
>>>> index 6aaefba..99eb363 100644
>>>> --- a/drivers/watchdog/watchdog_dev.c
>>>> +++ b/drivers/watchdog/watchdog_dev.c
>>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>>>> *wddev)
>>>>       else
>>>>           err = wddev->ops->start(wddev); /* restart watchdog */
>>>> +    if (wddev->hw_max_timeout &&
>>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>>> +    }
>>>> +
>>>>   out_ping:
>>>>       mutex_unlock(&wddev->lock);
>>>>       return err;
>>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>>>> *wddev)
>>>>       if (err == 0)
>>>>           set_bit(WDOG_ACTIVE, &wddev->status);
>>>> +    if (wddev->hw_max_timeout &&
>>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>>> +    }
>>>> +
>>>>   out_start:
>>>>       mutex_unlock(&wddev->lock);
>>>>       return err;
>>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>>>> index 395b70e..c16c921 100644
>>>> --- a/include/linux/watchdog.h
>>>> +++ b/include/linux/watchdog.h
>>>> @@ -12,6 +12,7 @@
>>>>   #include <linux/bitops.h>
>>>>   #include <linux/device.h>
>>>>   #include <linux/cdev.h>
>>>> +#include <linux/workqueue.h>
>>>>   #include <uapi/linux/watchdog.h>
>>>>   struct watchdog_ops;
>>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>>>    * @timeout:    The watchdog devices timeout value.
>>>>    * @min_timeout:The watchdog devices minimum timeout value.
>>>>    * @max_timeout:The watchdog devices maximum timeout value.
>>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>>>> + * @hw_margin:    Time interval between timer pings.
>>>>    * @driver-data:Pointer to the drivers private data.
>>>>    * @lock:    Lock for watchdog core internal use only.
>>>> + * @work:    Worker used to provide longer timeouts than hardware
>>>> supports.
>>>>    * @status:    Field that contains the devices internal status bits.
>>>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>>>    *
>>>>    * The watchdog_device structure contains all information about a
>>>>    * watchdog timer device.
>>>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>>>       unsigned int timeout;
>>>>       unsigned int min_timeout;
>>>>       unsigned int max_timeout;
>>>> +    unsigned int hw_max_timeout; /* in jiffies */
>>>> +    unsigned int hw_heartbeat; /* in jiffies */
>>>> +    unsigned long int expires; /* for soft timer */
>>>>       void *driver_data;
>>>>       struct mutex lock;
>>>> +    struct delayed_work work;
>>>>       unsigned long status;
>>>>   /* Bit numbers for status flags */
>>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>>>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>>>> ? */
>>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>>>> unregistered */
>>>> +
>>>> +/* Bits describing features supported by the HW */
>>>> +    unsigned long hw_features;
>>>> +
>>>> +/* Can the watchdog be stopped and started */
>>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>>>> +/* Is the watchdog already running when the driver starts up */
>>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>>>   };
>>>>   #define WATCHDOG_NOWAYOUT IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>>>> @@ -120,6 +137,11 @@ static inline bool
>>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>>>   }
>>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>>>> +{
>>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>>>> +}
>>>> +
>>>>   /* Use the following functions to manipulate watchdog driver
>>>> specific data */
>>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>>>> void *data)
>>>>   {
>>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>>>> watchdog_device *wdd)
>>>>   /* drivers/watchdog/watchdog_core.c */
>>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>>>                     unsigned int timeout_parm, struct device *dev);
>>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>>> *dev);
>>>>   extern int watchdog_register_device(struct watchdog_device *);
>>>>   extern void watchdog_unregister_device(struct watchdog_device *);
>>> Best Regards,
>>> Wenyou Yang
>>
> Best Regards,
> Wenyou Yang
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-14  5:31           ` Timo Kokkonen
  0 siblings, 0 replies; 28+ messages in thread
From: Timo Kokkonen @ 2015-04-14  5:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 14.04.2015 05:39, Yang, Wenyou wrote:
>
>
> On 2015/4/13 19:00, Timo Kokkonen wrote:
>> On 13.04.2015 12:29, Yang, Wenyou wrote:
>>> Hi Timo,
>>>
>>> If the /dev/watchdog device file doesn't open by the user space
>>> application, the system will reboot by the watchdog.
>>>
>>
>> Thank you for pointing out the bug. When the struct watchdog_device is
>> initialized, the expires variable is zero. Jiffies however is negative
>> on boot up, so time_after() comparison with zero returns false and the
>> worker keeps the watchdog alive for some minutes, until jiffies become
>> positive. Then it will stop updating the HW as the expires value has
>> become expired.
>>
>>> I think it is missing update of  the wdd->expires value. added it as
>>> following.
>>
>> Actually, the expires value stores the user space expiration value. It
>> should only be updated when user space pings the watchdog. If we
>> increment the expires value in the worker thread, the watchdog will
>> never expire even if user space stops pinging it and keep it open.
>>
>> So the proper fix is to have it like this (in watchdog_worker function):
>>
>>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
>>         pr_crit("I will reset your machine !\n");
>>         return;
>>     }
> I have a little doubt, if the watchdog is not opened by the user space,
> the watchdog will not reset the system in any case.
>
> it doesn't work in the same way as before in the at91sam9_wdt.c.

That's why the timer was originally in at91sam9_wdt, it kept on pinging 
the watchdog HW until user space opened it. That's why I started working 
on the early-timeout-sec properlty in the first place. And that is the 
same pattern that repeats all over the watchdog drivers.

>>
>> Also we need this change to re-start the worker when watchdog is
>> properly stopped (in watchdog_dev.c):
>>
>> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device
>> *wddev)
>>                 goto out_stop;
>>         }
>>
>> -       err = wddev->ops->stop(wddev);
>> -       if (err == 0)
>> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
>> +               err = wddev->ops->stop(wddev);
>> +               if (err == 0)
>> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
>> +       } else {
>> +               /* Unstoppable watchdogs need the worker to keep them
>> alive */
>>                 clear_bit(WDOG_ACTIVE, &wddev->status);
>> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +       }
> Tested it, there is a issue.
> After open the watchdog device file, and feed the watchdog, then close
> the watchdog device file, the system reset by the watchdog.
> It didn't reach here, the watchdog_worker didn't run.
>

If you just close the device, watchdog is not stopped. That is to 
prevent watchdog daemon crash from hanging the system. You need to feed 
the magic char 'V' in order to stop the watchdog, which will trigger the 
start of the worker. Have you tried that?

I noticed some other issues with the patches. I'll send a new one with 
all fixes combined shortly..

Thanks for your feedback.

-Timo

>>
>>  out_stop:
>>         mutex_unlock(&wddev->lock);
>>
>>
>> After these two modifications the watchdog is being pinged
>> indefinitely until the user space opens the device and the worker is
>> re-started if watchdog daemon exits after writing the magic character.
>> I will update these change to the next version of the patch.
>>
>> Thanks again!
>>
>> -Timo
>>
>>> On 2015/4/10 21:06, Timo Kokkonen wrote:
>>>> There is a great deal of diversity in the watchdog hardware found on
>>>> different devices. Differen hardware have different contstraints on
>>>> them, many of the constraints that are excessively difficult for the
>>>> user space to satisfy.
>>>>
>>>> One such constraint is the length of the timeout value, which in many
>>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>>>> with timers and workqueues to extend the timeout in order to give user
>>>> space more time between updates. Looking at the drivers it is clear
>>>> that this has resulted to a lot of duplicate code.
>>>>
>>>> Add an extension to the watchdog kernel API that allows the driver to
>>>> describe tis HW constraints to the watchdog code. A kernel worker in
>>>> the core is then used to extend the watchdog timeout on behalf of the
>>>> user space. This allows the drivers to be simplified as core takes
>>>> over the timer extending.
>>>>
>>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>>> ---
>>>>   drivers/watchdog/watchdog_core.c | 87
>>>> ++++++++++++++++++++++++++++++++++++++--
>>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
>>>>   include/linux/watchdog.h         | 23 +++++++++++
>>>>   3 files changed, 119 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/watchdog/watchdog_core.c
>>>> b/drivers/watchdog/watchdog_core.c
>>>> index cec9b55..8e7d08d 100644
>>>> --- a/drivers/watchdog/watchdog_core.c
>>>> +++ b/drivers/watchdog/watchdog_core.c
>>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device
>>>> *wdd,
>>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>>>   /**
>>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
>>>> + * @wdd: Watchdog device to operate
>>>> + * @dev: Device that stores the device tree properties
>>>> + *
>>>> + * Initialize the generic timeout parameters. The driver needs to set
>>>> + * hw_features bitmask from @wdd prior calling this function in order
>>>> + * to ensure the core knows how to handle the HW.
>>>> + *
>>>> + * A zero is returned on success and -EINVAL for failure.
>>>> + */
>>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>>> *dev)
>>>> +{
>>>> +    int ret = 0;
>>>> +
>>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>>>> +    if (ret < 0)
>>>> +        return ret;
>>>> +
>>>> +    /*
>>>> +     * Max HW timeout needs to be set so that core knows when to
>>>> +     * use a kernel worker to support longer watchdog timeouts
>>>> +     */
>>>> +    if (!wdd->hw_max_timeout)
>>>> +        return -EINVAL;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>>>> +
>>>> +static void watchdog_worker(struct work_struct *work)
>>>> +{
>>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
>>>> +                        struct watchdog_device, work);
>>>> +
>>>> +    if (time_after(jiffies, wdd->expires)) {
>>>> +        pr_crit("I will reset your machine !\n");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
>>>> +        (watchdog_active(wdd) &&
>>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
>>>> +        if (wdd->ops->ping)
>>>> +            wdd->ops->ping(wdd);
>>>> +        else
>>>> +            wdd->ops->start(wdd);
>>>> +
>>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>>     I think you missed the update of wdd->expires, as following.
>>>                                     wdd->expires = jiffies +
>>> wdd->timeout * HZ;
>>>> +    }
>>>> +}
>>>> +
>>>> +static int prepare_watchdog(struct watchdog_device *wdd)
>>>> +{
>>>> +    /* Stop the watchdog now before user space opens the device */
>>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
>>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>>>> +        wdd->ops->stop(wdd);
>>>> +
>>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
>>>> +        /*
>>>> +         * Can't stop it, use a kernel timer to tick
>>>> +         * it until it's open by user space
>>>> +         */
>>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>>>> +    }
>>>       ditto
>>>                      wdd->expires = jiffies + wdd->timeout * HZ;
>>>
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/**
>>>>    * watchdog_register_device() - register a watchdog device
>>>>    * @wdd: watchdog device
>>>>    *
>>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
>>>> watchdog_device *wdd)
>>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>>>                       NULL, "watchdog%d", wdd->id);
>>>>       if (IS_ERR(wdd->dev)) {
>>>> -        watchdog_dev_unregister(wdd);
>>>> -        ida_simple_remove(&watchdog_ida, id);
>>>>           ret = PTR_ERR(wdd->dev);
>>>> -        return ret;
>>>> +        goto dev_create_fail;
>>>> +    }
>>>> +
>>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>>>> +
>>>> +    if (wdd->hw_max_timeout) {
>>>> +        ret = prepare_watchdog(wdd);
>>>> +        if (ret)
>>>> +            goto dev_create_fail;
>>>>       }
>>>>       return 0;
>>>> +
>>>> +dev_create_fail:
>>>> +    watchdog_dev_unregister(wdd);
>>>> +    ida_simple_remove(&watchdog_ida, id);
>>>> +    return ret;
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
>>>> watchdog_device *wdd)
>>>>       if (wdd == NULL)
>>>>           return;
>>>> +    cancel_delayed_work_sync(&wdd->work);
>>>>       devno = wdd->cdev.dev;
>>>>       ret = watchdog_dev_unregister(wdd);
>>>>       if (ret)
>>>> diff --git a/drivers/watchdog/watchdog_dev.c
>>>> b/drivers/watchdog/watchdog_dev.c
>>>> index 6aaefba..99eb363 100644
>>>> --- a/drivers/watchdog/watchdog_dev.c
>>>> +++ b/drivers/watchdog/watchdog_dev.c
>>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
>>>> *wddev)
>>>>       else
>>>>           err = wddev->ops->start(wddev); /* restart watchdog */
>>>> +    if (wddev->hw_max_timeout &&
>>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>>> +    }
>>>> +
>>>>   out_ping:
>>>>       mutex_unlock(&wddev->lock);
>>>>       return err;
>>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
>>>> *wddev)
>>>>       if (err == 0)
>>>>           set_bit(WDOG_ACTIVE, &wddev->status);
>>>> +    if (wddev->hw_max_timeout &&
>>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
>>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
>>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>>>> +    }
>>>> +
>>>>   out_start:
>>>>       mutex_unlock(&wddev->lock);
>>>>       return err;
>>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>>>> index 395b70e..c16c921 100644
>>>> --- a/include/linux/watchdog.h
>>>> +++ b/include/linux/watchdog.h
>>>> @@ -12,6 +12,7 @@
>>>>   #include <linux/bitops.h>
>>>>   #include <linux/device.h>
>>>>   #include <linux/cdev.h>
>>>> +#include <linux/workqueue.h>
>>>>   #include <uapi/linux/watchdog.h>
>>>>   struct watchdog_ops;
>>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>>>    * @timeout:    The watchdog devices timeout value.
>>>>    * @min_timeout:The watchdog devices minimum timeout value.
>>>>    * @max_timeout:The watchdog devices maximum timeout value.
>>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>>>> + * @hw_margin:    Time interval between timer pings.
>>>>    * @driver-data:Pointer to the drivers private data.
>>>>    * @lock:    Lock for watchdog core internal use only.
>>>> + * @work:    Worker used to provide longer timeouts than hardware
>>>> supports.
>>>>    * @status:    Field that contains the devices internal status bits.
>>>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>>>    *
>>>>    * The watchdog_device structure contains all information about a
>>>>    * watchdog timer device.
>>>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>>>       unsigned int timeout;
>>>>       unsigned int min_timeout;
>>>>       unsigned int max_timeout;
>>>> +    unsigned int hw_max_timeout; /* in jiffies */
>>>> +    unsigned int hw_heartbeat; /* in jiffies */
>>>> +    unsigned long int expires; /* for soft timer */
>>>>       void *driver_data;
>>>>       struct mutex lock;
>>>> +    struct delayed_work work;
>>>>       unsigned long status;
>>>>   /* Bit numbers for status flags */
>>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
>>>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
>>>> ? */
>>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
>>>> unregistered */
>>>> +
>>>> +/* Bits describing features supported by the HW */
>>>> +    unsigned long hw_features;
>>>> +
>>>> +/* Can the watchdog be stopped and started */
>>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
>>>> +/* Is the watchdog already running when the driver starts up */
>>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
>>>>   };
>>>>   #define WATCHDOG_NOWAYOUT IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>>>> @@ -120,6 +137,11 @@ static inline bool
>>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
>>>>   }
>>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>>>> +{
>>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>>>> +}
>>>> +
>>>>   /* Use the following functions to manipulate watchdog driver
>>>> specific data */
>>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
>>>> void *data)
>>>>   {
>>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
>>>> watchdog_device *wdd)
>>>>   /* drivers/watchdog/watchdog_core.c */
>>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>>>                     unsigned int timeout_parm, struct device *dev);
>>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
>>>> *dev);
>>>>   extern int watchdog_register_device(struct watchdog_device *);
>>>>   extern void watchdog_unregister_device(struct watchdog_device *);
>>> Best Regards,
>>> Wenyou Yang
>>
> Best Regards,
> Wenyou Yang
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
  2015-04-14  5:31           ` Timo Kokkonen
@ 2015-04-14  5:44             ` Yang, Wenyou
  -1 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-14  5:44 UTC (permalink / raw)
  To: Timo Kokkonen, linux-arm-kernel, linux-watchdog, boris.brezillon,
	Ferre, Nicolas, alexandre.belloni



> -----Original Message-----
> From: Timo Kokkonen [mailto:timo.kokkonen@offcode.fi]
> Sent: 2015年4月14日 13:32
> To: Yang, Wenyou; linux-arm-kernel@lists.infradead.org; linux-
> watchdog@vger.kernel.org; boris.brezillon@free-electrons.com; Ferre, Nicolas;
> alexandre.belloni@free-electrons.com
> Subject: Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW
> limitations
> 
> On 14.04.2015 05:39, Yang, Wenyou wrote:
> >
> >
> > On 2015/4/13 19:00, Timo Kokkonen wrote:
> >> On 13.04.2015 12:29, Yang, Wenyou wrote:
> >>> Hi Timo,
> >>>
> >>> If the /dev/watchdog device file doesn't open by the user space
> >>> application, the system will reboot by the watchdog.
> >>>
> >>
> >> Thank you for pointing out the bug. When the struct watchdog_device
> >> is initialized, the expires variable is zero. Jiffies however is
> >> negative on boot up, so time_after() comparison with zero returns
> >> false and the worker keeps the watchdog alive for some minutes, until
> >> jiffies become positive. Then it will stop updating the HW as the
> >> expires value has become expired.
> >>
> >>> I think it is missing update of  the wdd->expires value. added it as
> >>> following.
> >>
> >> Actually, the expires value stores the user space expiration value.
> >> It should only be updated when user space pings the watchdog. If we
> >> increment the expires value in the worker thread, the watchdog will
> >> never expire even if user space stops pinging it and keep it open.
> >>
> >> So the proper fix is to have it like this (in watchdog_worker function):
> >>
> >>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
> >>         pr_crit("I will reset your machine !\n");
> >>         return;
> >>     }
> > I have a little doubt, if the watchdog is not opened by the user
> > space, the watchdog will not reset the system in any case.
> >
> > it doesn't work in the same way as before in the at91sam9_wdt.c.
> 
> That's why the timer was originally in at91sam9_wdt, it kept on pinging the
> watchdog HW until user space opened it. That's why I started working on the
> early-timeout-sec properlty in the first place. And that is the same pattern that
> repeats all over the watchdog drivers.
> 
> >>
> >> Also we need this change to re-start the worker when watchdog is
> >> properly stopped (in watchdog_dev.c):
> >>
> >> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device
> >> *wddev)
> >>                 goto out_stop;
> >>         }
> >>
> >> -       err = wddev->ops->stop(wddev);
> >> -       if (err == 0)
> >> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> >> +               err = wddev->ops->stop(wddev);
> >> +               if (err == 0)
> >> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
> >> +       } else {
> >> +               /* Unstoppable watchdogs need the worker to keep them
> >> alive */
> >>                 clear_bit(WDOG_ACTIVE, &wddev->status);
> >> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >> +       }
> > Tested it, there is a issue.
> > After open the watchdog device file, and feed the watchdog, then close
> > the watchdog device file, the system reset by the watchdog.
> > It didn't reach here, the watchdog_worker didn't run.
> >
> 
> If you just close the device, watchdog is not stopped. That is to
> prevent watchdog daemon crash from hanging the system. You need to feed
> the magic char 'V' in order to stop the watchdog, which will trigger the
> start of the worker. Have you tried that?
Yes, if feed the magic char "V", then close the watchdog device file, it works fine.

> 
> I noticed some other issues with the patches. I'll send a new one with
> all fixes combined shortly..
> 
> Thanks for your feedback.
> 
> -Timo
> 
> >>
> >>  out_stop:
> >>         mutex_unlock(&wddev->lock);
> >>
> >>
> >> After these two modifications the watchdog is being pinged
> >> indefinitely until the user space opens the device and the worker is
> >> re-started if watchdog daemon exits after writing the magic character.
> >> I will update these change to the next version of the patch.
> >>
> >> Thanks again!
> >>
> >> -Timo
> >>
> >>> On 2015/4/10 21:06, Timo Kokkonen wrote:
> >>>> There is a great deal of diversity in the watchdog hardware found on
> >>>> different devices. Differen hardware have different contstraints on
> >>>> them, many of the constraints that are excessively difficult for the
> >>>> user space to satisfy.
> >>>>
> >>>> One such constraint is the length of the timeout value, which in many
> >>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
> >>>> with timers and workqueues to extend the timeout in order to give user
> >>>> space more time between updates. Looking at the drivers it is clear
> >>>> that this has resulted to a lot of duplicate code.
> >>>>
> >>>> Add an extension to the watchdog kernel API that allows the driver to
> >>>> describe tis HW constraints to the watchdog code. A kernel worker in
> >>>> the core is then used to extend the watchdog timeout on behalf of the
> >>>> user space. This allows the drivers to be simplified as core takes
> >>>> over the timer extending.
> >>>>
> >>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> >>>> ---
> >>>>   drivers/watchdog/watchdog_core.c | 87
> >>>> ++++++++++++++++++++++++++++++++++++++--
> >>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
> >>>>   include/linux/watchdog.h         | 23 +++++++++++
> >>>>   3 files changed, 119 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/watchdog/watchdog_core.c
> >>>> b/drivers/watchdog/watchdog_core.c
> >>>> index cec9b55..8e7d08d 100644
> >>>> --- a/drivers/watchdog/watchdog_core.c
> >>>> +++ b/drivers/watchdog/watchdog_core.c
> >>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device
> >>>> *wdd,
> >>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
> >>>>   /**
> >>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
> >>>> + * @wdd: Watchdog device to operate
> >>>> + * @dev: Device that stores the device tree properties
> >>>> + *
> >>>> + * Initialize the generic timeout parameters. The driver needs to set
> >>>> + * hw_features bitmask from @wdd prior calling this function in order
> >>>> + * to ensure the core knows how to handle the HW.
> >>>> + *
> >>>> + * A zero is returned on success and -EINVAL for failure.
> >>>> + */
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev)
> >>>> +{
> >>>> +    int ret = 0;
> >>>> +
> >>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> >>>> +    if (ret < 0)
> >>>> +        return ret;
> >>>> +
> >>>> +    /*
> >>>> +     * Max HW timeout needs to be set so that core knows when to
> >>>> +     * use a kernel worker to support longer watchdog timeouts
> >>>> +     */
> >>>> +    if (!wdd->hw_max_timeout)
> >>>> +        return -EINVAL;
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> >>>> +
> >>>> +static void watchdog_worker(struct work_struct *work)
> >>>> +{
> >>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
> >>>> +                        struct watchdog_device, work);
> >>>> +
> >>>> +    if (time_after(jiffies, wdd->expires)) {
> >>>> +        pr_crit("I will reset your machine !\n");
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> >>>> +        (watchdog_active(wdd) &&
> >>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
> >>>> +        if (wdd->ops->ping)
> >>>> +            wdd->ops->ping(wdd);
> >>>> +        else
> >>>> +            wdd->ops->start(wdd);
> >>>> +
> >>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>>     I think you missed the update of wdd->expires, as following.
> >>>                                     wdd->expires = jiffies +
> >>> wdd->timeout * HZ;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static int prepare_watchdog(struct watchdog_device *wdd)
> >>>> +{
> >>>> +    /* Stop the watchdog now before user space opens the device */
> >>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> >>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> >>>> +        wdd->ops->stop(wdd);
> >>>> +
> >>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> >>>> +        /*
> >>>> +         * Can't stop it, use a kernel timer to tick
> >>>> +         * it until it's open by user space
> >>>> +         */
> >>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>>> +    }
> >>>       ditto
> >>>                      wdd->expires = jiffies + wdd->timeout * HZ;
> >>>
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>>    * watchdog_register_device() - register a watchdog device
> >>>>    * @wdd: watchdog device
> >>>>    *
> >>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
> >>>> watchdog_device *wdd)
> >>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
> >>>>                       NULL, "watchdog%d", wdd->id);
> >>>>       if (IS_ERR(wdd->dev)) {
> >>>> -        watchdog_dev_unregister(wdd);
> >>>> -        ida_simple_remove(&watchdog_ida, id);
> >>>>           ret = PTR_ERR(wdd->dev);
> >>>> -        return ret;
> >>>> +        goto dev_create_fail;
> >>>> +    }
> >>>> +
> >>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> >>>> +
> >>>> +    if (wdd->hw_max_timeout) {
> >>>> +        ret = prepare_watchdog(wdd);
> >>>> +        if (ret)
> >>>> +            goto dev_create_fail;
> >>>>       }
> >>>>       return 0;
> >>>> +
> >>>> +dev_create_fail:
> >>>> +    watchdog_dev_unregister(wdd);
> >>>> +    ida_simple_remove(&watchdog_ida, id);
> >>>> +    return ret;
> >>>>   }
> >>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
> >>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
> >>>> watchdog_device *wdd)
> >>>>       if (wdd == NULL)
> >>>>           return;
> >>>> +    cancel_delayed_work_sync(&wdd->work);
> >>>>       devno = wdd->cdev.dev;
> >>>>       ret = watchdog_dev_unregister(wdd);
> >>>>       if (ret)
> >>>> diff --git a/drivers/watchdog/watchdog_dev.c
> >>>> b/drivers/watchdog/watchdog_dev.c
> >>>> index 6aaefba..99eb363 100644
> >>>> --- a/drivers/watchdog/watchdog_dev.c
> >>>> +++ b/drivers/watchdog/watchdog_dev.c
> >>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
> >>>> *wddev)
> >>>>       else
> >>>>           err = wddev->ops->start(wddev); /* restart watchdog */
> >>>> +    if (wddev->hw_max_timeout &&
> >>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> +    }
> >>>> +
> >>>>   out_ping:
> >>>>       mutex_unlock(&wddev->lock);
> >>>>       return err;
> >>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
> >>>> *wddev)
> >>>>       if (err == 0)
> >>>>           set_bit(WDOG_ACTIVE, &wddev->status);
> >>>> +    if (wddev->hw_max_timeout &&
> >>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> +    }
> >>>> +
> >>>>   out_start:
> >>>>       mutex_unlock(&wddev->lock);
> >>>>       return err;
> >>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> >>>> index 395b70e..c16c921 100644
> >>>> --- a/include/linux/watchdog.h
> >>>> +++ b/include/linux/watchdog.h
> >>>> @@ -12,6 +12,7 @@
> >>>>   #include <linux/bitops.h>
> >>>>   #include <linux/device.h>
> >>>>   #include <linux/cdev.h>
> >>>> +#include <linux/workqueue.h>
> >>>>   #include <uapi/linux/watchdog.h>
> >>>>   struct watchdog_ops;
> >>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
> >>>>    * @timeout:    The watchdog devices timeout value.
> >>>>    * @min_timeout:The watchdog devices minimum timeout value.
> >>>>    * @max_timeout:The watchdog devices maximum timeout value.
> >>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> >>>> + * @hw_margin:    Time interval between timer pings.
> >>>>    * @driver-data:Pointer to the drivers private data.
> >>>>    * @lock:    Lock for watchdog core internal use only.
> >>>> + * @work:    Worker used to provide longer timeouts than hardware
> >>>> supports.
> >>>>    * @status:    Field that contains the devices internal status bits.
> >>>> + * @hw_features:Feature bits describing how the watchdog HW works.
> >>>>    *
> >>>>    * The watchdog_device structure contains all information about a
> >>>>    * watchdog timer device.
> >>>> @@ -86,8 +91,12 @@ struct watchdog_device {
> >>>>       unsigned int timeout;
> >>>>       unsigned int min_timeout;
> >>>>       unsigned int max_timeout;
> >>>> +    unsigned int hw_max_timeout; /* in jiffies */
> >>>> +    unsigned int hw_heartbeat; /* in jiffies */
> >>>> +    unsigned long int expires; /* for soft timer */
> >>>>       void *driver_data;
> >>>>       struct mutex lock;
> >>>> +    struct delayed_work work;
> >>>>       unsigned long status;
> >>>>   /* Bit numbers for status flags */
> >>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
> >>>> @@ -95,6 +104,14 @@ struct watchdog_device {
> >>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
> >>>> ? */
> >>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
> >>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
> >>>> unregistered */
> >>>> +
> >>>> +/* Bits describing features supported by the HW */
> >>>> +    unsigned long hw_features;
> >>>> +
> >>>> +/* Can the watchdog be stopped and started */
> >>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
> >>>> +/* Is the watchdog already running when the driver starts up */
> >>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
> >>>>   };
> >>>>   #define WATCHDOG_NOWAYOUT
> IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> >>>> @@ -120,6 +137,11 @@ static inline bool
> >>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
> >>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
> >>>>   }
> >>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> >>>> +{
> >>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> >>>> +}
> >>>> +
> >>>>   /* Use the following functions to manipulate watchdog driver
> >>>> specific data */
> >>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
> >>>> void *data)
> >>>>   {
> >>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
> >>>> watchdog_device *wdd)
> >>>>   /* drivers/watchdog/watchdog_core.c */
> >>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
> >>>>                     unsigned int timeout_parm, struct device *dev);
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev);
> >>>>   extern int watchdog_register_device(struct watchdog_device *);
> >>>>   extern void watchdog_unregister_device(struct watchdog_device *);
> >>> Best Regards,
> >>> Wenyou Yang
> >>
> > Best Regards,
> > Wenyou Yang
> >
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
@ 2015-04-14  5:44             ` Yang, Wenyou
  0 siblings, 0 replies; 28+ messages in thread
From: Yang, Wenyou @ 2015-04-14  5:44 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: Timo Kokkonen [mailto:timo.kokkonen at offcode.fi]
> Sent: 2015?4?14? 13:32
> To: Yang, Wenyou; linux-arm-kernel at lists.infradead.org; linux-
> watchdog at vger.kernel.org; boris.brezillon at free-electrons.com; Ferre, Nicolas;
> alexandre.belloni at free-electrons.com
> Subject: Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW
> limitations
> 
> On 14.04.2015 05:39, Yang, Wenyou wrote:
> >
> >
> > On 2015/4/13 19:00, Timo Kokkonen wrote:
> >> On 13.04.2015 12:29, Yang, Wenyou wrote:
> >>> Hi Timo,
> >>>
> >>> If the /dev/watchdog device file doesn't open by the user space
> >>> application, the system will reboot by the watchdog.
> >>>
> >>
> >> Thank you for pointing out the bug. When the struct watchdog_device
> >> is initialized, the expires variable is zero. Jiffies however is
> >> negative on boot up, so time_after() comparison with zero returns
> >> false and the worker keeps the watchdog alive for some minutes, until
> >> jiffies become positive. Then it will stop updating the HW as the
> >> expires value has become expired.
> >>
> >>> I think it is missing update of  the wdd->expires value. added it as
> >>> following.
> >>
> >> Actually, the expires value stores the user space expiration value.
> >> It should only be updated when user space pings the watchdog. If we
> >> increment the expires value in the worker thread, the watchdog will
> >> never expire even if user space stops pinging it and keep it open.
> >>
> >> So the proper fix is to have it like this (in watchdog_worker function):
> >>
> >>     if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
> >>         pr_crit("I will reset your machine !\n");
> >>         return;
> >>     }
> > I have a little doubt, if the watchdog is not opened by the user
> > space, the watchdog will not reset the system in any case.
> >
> > it doesn't work in the same way as before in the at91sam9_wdt.c.
> 
> That's why the timer was originally in at91sam9_wdt, it kept on pinging the
> watchdog HW until user space opened it. That's why I started working on the
> early-timeout-sec properlty in the first place. And that is the same pattern that
> repeats all over the watchdog drivers.
> 
> >>
> >> Also we need this change to re-start the worker when watchdog is
> >> properly stopped (in watchdog_dev.c):
> >>
> >> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device
> >> *wddev)
> >>                 goto out_stop;
> >>         }
> >>
> >> -       err = wddev->ops->stop(wddev);
> >> -       if (err == 0)
> >> +       if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> >> +               err = wddev->ops->stop(wddev);
> >> +               if (err == 0)
> >> +                       clear_bit(WDOG_ACTIVE, &wddev->status);
> >> +       } else {
> >> +               /* Unstoppable watchdogs need the worker to keep them
> >> alive */
> >>                 clear_bit(WDOG_ACTIVE, &wddev->status);
> >> +               schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >> +       }
> > Tested it, there is a issue.
> > After open the watchdog device file, and feed the watchdog, then close
> > the watchdog device file, the system reset by the watchdog.
> > It didn't reach here, the watchdog_worker didn't run.
> >
> 
> If you just close the device, watchdog is not stopped. That is to
> prevent watchdog daemon crash from hanging the system. You need to feed
> the magic char 'V' in order to stop the watchdog, which will trigger the
> start of the worker. Have you tried that?
Yes, if feed the magic char "V", then close the watchdog device file, it works fine.

> 
> I noticed some other issues with the patches. I'll send a new one with
> all fixes combined shortly..
> 
> Thanks for your feedback.
> 
> -Timo
> 
> >>
> >>  out_stop:
> >>         mutex_unlock(&wddev->lock);
> >>
> >>
> >> After these two modifications the watchdog is being pinged
> >> indefinitely until the user space opens the device and the worker is
> >> re-started if watchdog daemon exits after writing the magic character.
> >> I will update these change to the next version of the patch.
> >>
> >> Thanks again!
> >>
> >> -Timo
> >>
> >>> On 2015/4/10 21:06, Timo Kokkonen wrote:
> >>>> There is a great deal of diversity in the watchdog hardware found on
> >>>> different devices. Differen hardware have different contstraints on
> >>>> them, many of the constraints that are excessively difficult for the
> >>>> user space to satisfy.
> >>>>
> >>>> One such constraint is the length of the timeout value, which in many
> >>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
> >>>> with timers and workqueues to extend the timeout in order to give user
> >>>> space more time between updates. Looking at the drivers it is clear
> >>>> that this has resulted to a lot of duplicate code.
> >>>>
> >>>> Add an extension to the watchdog kernel API that allows the driver to
> >>>> describe tis HW constraints to the watchdog code. A kernel worker in
> >>>> the core is then used to extend the watchdog timeout on behalf of the
> >>>> user space. This allows the drivers to be simplified as core takes
> >>>> over the timer extending.
> >>>>
> >>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> >>>> ---
> >>>>   drivers/watchdog/watchdog_core.c | 87
> >>>> ++++++++++++++++++++++++++++++++++++++--
> >>>>   drivers/watchdog/watchdog_dev.c  | 12 ++++++
> >>>>   include/linux/watchdog.h         | 23 +++++++++++
> >>>>   3 files changed, 119 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/watchdog/watchdog_core.c
> >>>> b/drivers/watchdog/watchdog_core.c
> >>>> index cec9b55..8e7d08d 100644
> >>>> --- a/drivers/watchdog/watchdog_core.c
> >>>> +++ b/drivers/watchdog/watchdog_core.c
> >>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device
> >>>> *wdd,
> >>>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
> >>>>   /**
> >>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
> >>>> + * @wdd: Watchdog device to operate
> >>>> + * @dev: Device that stores the device tree properties
> >>>> + *
> >>>> + * Initialize the generic timeout parameters. The driver needs to set
> >>>> + * hw_features bitmask from @wdd prior calling this function in order
> >>>> + * to ensure the core knows how to handle the HW.
> >>>> + *
> >>>> + * A zero is returned on success and -EINVAL for failure.
> >>>> + */
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev)
> >>>> +{
> >>>> +    int ret = 0;
> >>>> +
> >>>> +    ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> >>>> +    if (ret < 0)
> >>>> +        return ret;
> >>>> +
> >>>> +    /*
> >>>> +     * Max HW timeout needs to be set so that core knows when to
> >>>> +     * use a kernel worker to support longer watchdog timeouts
> >>>> +     */
> >>>> +    if (!wdd->hw_max_timeout)
> >>>> +        return -EINVAL;
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> >>>> +
> >>>> +static void watchdog_worker(struct work_struct *work)
> >>>> +{
> >>>> +    struct watchdog_device *wdd = container_of(to_delayed_work(work),
> >>>> +                        struct watchdog_device, work);
> >>>> +
> >>>> +    if (time_after(jiffies, wdd->expires)) {
> >>>> +        pr_crit("I will reset your machine !\n");
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> >>>> +        (watchdog_active(wdd) &&
> >>>> +            wdd->hw_max_timeout < wdd->timeout * HZ)) {
> >>>> +        if (wdd->ops->ping)
> >>>> +            wdd->ops->ping(wdd);
> >>>> +        else
> >>>> +            wdd->ops->start(wdd);
> >>>> +
> >>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>>     I think you missed the update of wdd->expires, as following.
> >>>                                     wdd->expires = jiffies +
> >>> wdd->timeout * HZ;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static int prepare_watchdog(struct watchdog_device *wdd)
> >>>> +{
> >>>> +    /* Stop the watchdog now before user space opens the device */
> >>>> +    if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> >>>> +        wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> >>>> +        wdd->ops->stop(wdd);
> >>>> +
> >>>> +    } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> >>>> +        /*
> >>>> +         * Can't stop it, use a kernel timer to tick
> >>>> +         * it until it's open by user space
> >>>> +         */
> >>>> +        schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>>> +    }
> >>>       ditto
> >>>                      wdd->expires = jiffies + wdd->timeout * HZ;
> >>>
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>>    * watchdog_register_device() - register a watchdog device
> >>>>    * @wdd: watchdog device
> >>>>    *
> >>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
> >>>> watchdog_device *wdd)
> >>>>       wdd->dev = device_create(watchdog_class, wdd->parent, devno,
> >>>>                       NULL, "watchdog%d", wdd->id);
> >>>>       if (IS_ERR(wdd->dev)) {
> >>>> -        watchdog_dev_unregister(wdd);
> >>>> -        ida_simple_remove(&watchdog_ida, id);
> >>>>           ret = PTR_ERR(wdd->dev);
> >>>> -        return ret;
> >>>> +        goto dev_create_fail;
> >>>> +    }
> >>>> +
> >>>> +    INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> >>>> +
> >>>> +    if (wdd->hw_max_timeout) {
> >>>> +        ret = prepare_watchdog(wdd);
> >>>> +        if (ret)
> >>>> +            goto dev_create_fail;
> >>>>       }
> >>>>       return 0;
> >>>> +
> >>>> +dev_create_fail:
> >>>> +    watchdog_dev_unregister(wdd);
> >>>> +    ida_simple_remove(&watchdog_ida, id);
> >>>> +    return ret;
> >>>>   }
> >>>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
> >>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
> >>>> watchdog_device *wdd)
> >>>>       if (wdd == NULL)
> >>>>           return;
> >>>> +    cancel_delayed_work_sync(&wdd->work);
> >>>>       devno = wdd->cdev.dev;
> >>>>       ret = watchdog_dev_unregister(wdd);
> >>>>       if (ret)
> >>>> diff --git a/drivers/watchdog/watchdog_dev.c
> >>>> b/drivers/watchdog/watchdog_dev.c
> >>>> index 6aaefba..99eb363 100644
> >>>> --- a/drivers/watchdog/watchdog_dev.c
> >>>> +++ b/drivers/watchdog/watchdog_dev.c
> >>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
> >>>> *wddev)
> >>>>       else
> >>>>           err = wddev->ops->start(wddev); /* restart watchdog */
> >>>> +    if (wddev->hw_max_timeout &&
> >>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> +    }
> >>>> +
> >>>>   out_ping:
> >>>>       mutex_unlock(&wddev->lock);
> >>>>       return err;
> >>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
> >>>> *wddev)
> >>>>       if (err == 0)
> >>>>           set_bit(WDOG_ACTIVE, &wddev->status);
> >>>> +    if (wddev->hw_max_timeout &&
> >>>> +        wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> +        wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> +        schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> +    }
> >>>> +
> >>>>   out_start:
> >>>>       mutex_unlock(&wddev->lock);
> >>>>       return err;
> >>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> >>>> index 395b70e..c16c921 100644
> >>>> --- a/include/linux/watchdog.h
> >>>> +++ b/include/linux/watchdog.h
> >>>> @@ -12,6 +12,7 @@
> >>>>   #include <linux/bitops.h>
> >>>>   #include <linux/device.h>
> >>>>   #include <linux/cdev.h>
> >>>> +#include <linux/workqueue.h>
> >>>>   #include <uapi/linux/watchdog.h>
> >>>>   struct watchdog_ops;
> >>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
> >>>>    * @timeout:    The watchdog devices timeout value.
> >>>>    * @min_timeout:The watchdog devices minimum timeout value.
> >>>>    * @max_timeout:The watchdog devices maximum timeout value.
> >>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> >>>> + * @hw_margin:    Time interval between timer pings.
> >>>>    * @driver-data:Pointer to the drivers private data.
> >>>>    * @lock:    Lock for watchdog core internal use only.
> >>>> + * @work:    Worker used to provide longer timeouts than hardware
> >>>> supports.
> >>>>    * @status:    Field that contains the devices internal status bits.
> >>>> + * @hw_features:Feature bits describing how the watchdog HW works.
> >>>>    *
> >>>>    * The watchdog_device structure contains all information about a
> >>>>    * watchdog timer device.
> >>>> @@ -86,8 +91,12 @@ struct watchdog_device {
> >>>>       unsigned int timeout;
> >>>>       unsigned int min_timeout;
> >>>>       unsigned int max_timeout;
> >>>> +    unsigned int hw_max_timeout; /* in jiffies */
> >>>> +    unsigned int hw_heartbeat; /* in jiffies */
> >>>> +    unsigned long int expires; /* for soft timer */
> >>>>       void *driver_data;
> >>>>       struct mutex lock;
> >>>> +    struct delayed_work work;
> >>>>       unsigned long status;
> >>>>   /* Bit numbers for status flags */
> >>>>   #define WDOG_ACTIVE        0    /* Is the watchdog running/active */
> >>>> @@ -95,6 +104,14 @@ struct watchdog_device {
> >>>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char
> >>>> ? */
> >>>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
> >>>>   #define WDOG_UNREGISTERED    4    /* Has the device been
> >>>> unregistered */
> >>>> +
> >>>> +/* Bits describing features supported by the HW */
> >>>> +    unsigned long hw_features;
> >>>> +
> >>>> +/* Can the watchdog be stopped and started */
> >>>> +#define WDOG_HW_IS_STOPPABLE        BIT(0)
> >>>> +/* Is the watchdog already running when the driver starts up */
> >>>> +#define WDOG_HW_RUNNING_AT_BOOT        BIT(1)
> >>>>   };
> >>>>   #define WATCHDOG_NOWAYOUT
> IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> >>>> @@ -120,6 +137,11 @@ static inline bool
> >>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
> >>>>           (t < wdd->min_timeout || t > wdd->max_timeout));
> >>>>   }
> >>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> >>>> +{
> >>>> +    return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> >>>> +}
> >>>> +
> >>>>   /* Use the following functions to manipulate watchdog driver
> >>>> specific data */
> >>>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
> >>>> void *data)
> >>>>   {
> >>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
> >>>> watchdog_device *wdd)
> >>>>   /* drivers/watchdog/watchdog_core.c */
> >>>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
> >>>>                     unsigned int timeout_parm, struct device *dev);
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev);
> >>>>   extern int watchdog_register_device(struct watchdog_device *);
> >>>>   extern void watchdog_unregister_device(struct watchdog_device *);
> >>> Best Regards,
> >>> Wenyou Yang
> >>
> > Best Regards,
> > Wenyou Yang
> >
Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-04-14  5:44 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-10 13:06 [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature Timo Kokkonen
2015-04-10 13:06 ` Timo Kokkonen
2015-04-10 13:06 ` [PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations Timo Kokkonen
2015-04-10 13:06   ` Timo Kokkonen
2015-04-13  7:56   ` Yang, Wenyou
2015-04-13  7:56     ` Yang, Wenyou
2015-04-13  9:29   ` Yang, Wenyou
2015-04-13  9:29     ` Yang, Wenyou
2015-04-13 11:00     ` Timo Kokkonen
2015-04-13 11:00       ` Timo Kokkonen
2015-04-14  2:39       ` Yang, Wenyou
2015-04-14  2:39         ` Yang, Wenyou
2015-04-14  5:31         ` Timo Kokkonen
2015-04-14  5:31           ` Timo Kokkonen
2015-04-14  5:44           ` Yang, Wenyou
2015-04-14  5:44             ` Yang, Wenyou
2015-04-10 13:06 ` [PATCHv5 2/4] watchdog: Allow watchdog to reset device at early boot Timo Kokkonen
2015-04-10 13:06   ` Timo Kokkonen
2015-04-13  7:56   ` Yang, Wenyou
2015-04-13  7:56     ` Yang, Wenyou
2015-04-10 13:06 ` [PATCHv5 3/4] devicetree: Document generic watchdog properties Timo Kokkonen
2015-04-10 13:06   ` Timo Kokkonen
2015-04-10 13:06 ` [PATCHv5 4/4] watchdog: at91sam9_wdt: Convert to use new watchdog core extensions Timo Kokkonen
2015-04-10 13:06   ` Timo Kokkonen
2015-04-13  7:55 ` [PATCHv5 0/4] watchdog: Extend kernel API and add early_timeout_sec feature Yang, Wenyou
2015-04-13  7:55   ` Yang, Wenyou
2015-04-13  8:19   ` Timo Kokkonen
2015-04-13  8:19     ` Timo Kokkonen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.