linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Detect stalls on guest vCPUS
@ 2022-04-05 14:19 Sebastian Ene
  2022-04-05 14:19 ` [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible Sebastian Ene
  2022-04-05 14:19 ` [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Ene @ 2022-04-05 14:19 UTC (permalink / raw)
  To: Wim Van Sebroeck, Guenter Roeck, Rob Herring
  Cc: devicetree, linux-kernel, linux-watchdog, will, qperret, maz,
	Sebastian Ene

This adds a mechanism to detect stalls on the guest vCPUS by creating a
per CPU hrtimer which periodically 'pets' the host backend driver.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapesed time between subsequent 'pet' events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked and resets the guest. The host
backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Sebastian Ene (2):
  dt-bindings: watchdog: Add qemu,vm-watchdog compatible
  watchdog: Add a mechanism to detect stalls on guest vCPUs

 .../devicetree/bindings/watchdog/vm-wdt.yaml  |  44 ++++
 drivers/watchdog/Kconfig                      |   8 +
 drivers/watchdog/Makefile                     |   1 +
 drivers/watchdog/vm-wdt.c                     | 215 ++++++++++++++++++
 4 files changed, 268 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/vm-wdt.yaml
 create mode 100644 drivers/watchdog/vm-wdt.c

-- 
2.35.1.1094.g7c7d902a7c-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible
  2022-04-05 14:19 [PATCH 0/2] Detect stalls on guest vCPUS Sebastian Ene
@ 2022-04-05 14:19 ` Sebastian Ene
  2022-04-06 12:25   ` Rob Herring
  2022-04-05 14:19 ` [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Ene @ 2022-04-05 14:19 UTC (permalink / raw)
  To: Wim Van Sebroeck, Guenter Roeck, Rob Herring
  Cc: devicetree, linux-kernel, linux-watchdog, will, qperret, maz,
	Sebastian Ene

This watchdog can be used to detect stalls on vCPUs.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 .../devicetree/bindings/watchdog/vm-wdt.yaml  | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/vm-wdt.yaml

diff --git a/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml b/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml
new file mode 100644
index 000000000000..5365c963a7f6
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/watchdog/vm-wdt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: VM watchdog
+
+allOf:
+  - $ref: "watchdog.yaml#"
+
+maintainers:
+  - Sebastian Ene <sebastianene@google.com>
+
+properties:
+  compatible:
+    enum:
+      - qemu,vm-watchdog
+  clock:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog internal clock measure in Hz used to decrement the
+      watchdog counter register on each tick.
+      Defaults to 10 if unset.
+  timeout-sec:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog expiration timeout measured in seconds.
+      Defaults to 8 if unset.
+
+required:
+  - compatible
+
+unevaluatedProperties: false
+
+examples:
+  - |
+    watchdog {
+      compatible = "qemu,vm-watchdog";
+      clock = <10>;
+      timeout-sec = <8>;
+    };
+
+...
-- 
2.35.1.1094.g7c7d902a7c-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
  2022-04-05 14:19 [PATCH 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-04-05 14:19 ` [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-05 14:19 ` Sebastian Ene
  2022-04-05 21:15   ` Guenter Roeck
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Ene @ 2022-04-05 14:19 UTC (permalink / raw)
  To: Wim Van Sebroeck, Guenter Roeck, Rob Herring
  Cc: devicetree, linux-kernel, linux-watchdog, will, qperret, maz,
	Sebastian Ene

This patch adds support for a virtual watchdog which relies on the
per-cpu hrtimers to pet at regular intervals.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 drivers/watchdog/Kconfig  |   8 ++
 drivers/watchdog/Makefile |   1 +
 drivers/watchdog/vm-wdt.c | 215 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 224 insertions(+)
 create mode 100644 drivers/watchdog/vm-wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 01ce3f41cc21..3304d128484e 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -351,6 +351,14 @@ config SL28CPLD_WATCHDOG
 	  To compile this driver as a module, choose M here: the
 	  module will be called sl28cpld_wdt.
 
+config VM_WATCHDOG
+	tristate "Virtual Machine Watchdog"
+	select LOCKUP_DETECTOR
+	help
+	  Detect CPU locks on the virtual machine.
+	  To compile this driver as a module, choose M here: the
+	  module will be called vm-wdt.
+
 # ALPHA Architecture
 
 # ARM Architecture
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 071a2e50be98..73206cbc3835 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -227,3 +227,4 @@ obj-$(CONFIG_MENZ069_WATCHDOG) += menz69_wdt.o
 obj-$(CONFIG_RAVE_SP_WATCHDOG) += rave-sp-wdt.o
 obj-$(CONFIG_STPMIC1_WATCHDOG) += stpmic1_wdt.o
 obj-$(CONFIG_SL28CPLD_WATCHDOG) += sl28cpld_wdt.o
+obj-$(CONFIG_VM_WATCHDOG) += vm-wdt.o
diff --git a/drivers/watchdog/vm-wdt.c b/drivers/watchdog/vm-wdt.c
new file mode 100644
index 000000000000..ea4351754645
--- /dev/null
+++ b/drivers/watchdog/vm-wdt.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Virtual watchdog driver.
+//  Copyright (C) Google, 2022
+
+#define pr_fmt(fmt) "vm-watchdog: " fmt
+
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/param.h>
+#include <linux/percpu.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#define DRV_NAME			"vm_wdt"
+#define DRV_VERSION			"1.0"
+
+#define VMWDT_REG_STATUS		(0x00)
+#define VMWDT_REG_LOAD_CNT		(0x04)
+#define VMWDT_REG_CURRENT_CNT		(0x08)
+#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
+#define VMWDT_REG_LEN			(0x10)
+
+#define VMWDT_DEFAULT_CLOCK_HZ		(10)
+#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
+
+struct vm_wdt_s {
+	void __iomem *membase;
+	u32 clock_freq;
+	u32 expiration_sec;
+	u32 ping_timeout_ms;
+	struct hrtimer per_cpu_hrtimer;
+	struct platform_device *dev;
+};
+
+#define vmwdt_reg_write(wdt, reg, value)	\
+	iowrite32((value), (wdt)->membase + (reg))
+#define vmwdt_reg_read(wdt, reg)		\
+	io32read((wdt)->membase + (reg))
+
+static struct platform_device *virt_dev;
+
+static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
+{
+	struct vm_wdt_s *cpu_wdt;
+	u32 ticks;
+
+	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
+
+	return HRTIMER_RESTART;
+}
+
+static void vmwdt_start(void *arg)
+{
+	u32 ticks;
+	int cpu = smp_processor_id();
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	pr_info("cpu %u vmwdt start\n", cpu);
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_CLOCK_FREQ_HZ,
+			cpu_wdt->clock_freq);
+
+	/* Compute the number of ticks required for the watchdog counter
+	 * register based on the internal clock frequency and the watchdog
+	 * timeout given from the device tree.
+	 */
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+
+	/* Enable the internal clock and start the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 1);
+
+	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer->function = vmwdt_timer_fn;
+	hrtimer_start(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static void vmwdt_stop(void *arg)
+{
+	int cpu = smp_processor_id();
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	hrtimer_cancel(hrtimer);
+
+	/* Disable the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 0);
+	pr_info("cpu %d vmwdt stop\n", cpu);
+}
+
+static int start_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_start(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_stop(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int vmwdt_probe(struct platform_device *dev)
+{
+	int cpu, ret, err;
+	void __iomem *membase;
+	struct resource *r;
+	struct vm_wdt_s *vm_wdt;
+	u32 wdt_clock, wdt_timeout_sec = 0;
+
+	r = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (r == NULL)
+		return -ENOENT;
+
+	vm_wdt = alloc_percpu(typeof(struct vm_wdt_s));
+	if (!vm_wdt)
+		return -ENOMEM;
+
+	membase = ioremap(r->start, resource_size(r));
+	if (!membase) {
+		ret = -ENXIO;
+		goto err_withmem;
+	}
+
+	virt_dev = dev;
+	platform_set_drvdata(dev, vm_wdt);
+	if (of_property_read_u32(dev->dev.of_node, "clock", &wdt_clock))
+		wdt_clock = VMWDT_DEFAULT_CLOCK_HZ;
+
+	if (of_property_read_u32(dev->dev.of_node, "timeout-sec",
+				 &wdt_timeout_sec))
+		wdt_timeout_sec = VMWDT_DEFAULT_TIMEOT_SEC;
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		cpu_wdt->membase = membase + cpu * VMWDT_REG_LEN;
+		cpu_wdt->clock_freq = wdt_clock;
+		cpu_wdt->expiration_sec = wdt_timeout_sec;
+		cpu_wdt->ping_timeout_ms = wdt_timeout_sec * MSEC_PER_SEC / 2;
+		smp_call_function_single(cpu, vmwdt_start, cpu_wdt, true);
+	}
+
+	err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					"virt/watchdog:online",
+					start_watchdog_on_cpu,
+					stop_watchdog_on_cpu);
+	if (err < 0) {
+		pr_warn("could not be initialized");
+		ret = err;
+		goto err_withmem;
+	}
+
+	return 0;
+
+err_withmem:
+	free_percpu(vm_wdt);
+	return ret;
+}
+
+static int vmwdt_remove(struct platform_device *dev)
+{
+	int cpu;
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(dev);
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		smp_call_function_single(cpu, vmwdt_stop, cpu_wdt, true);
+	}
+
+	free_percpu(vm_wdt);
+	return 0;
+}
+
+static const struct of_device_id vmwdt_of_match[] = {
+	{ .compatible = "qemu,vm-watchdog", },
+	{}
+};
+
+MODULE_DEVICE_TABLE(of, vm_watchdog_of_match);
+
+static struct platform_driver vmwdt_driver = {
+	.probe  = vmwdt_probe,
+	.remove = vmwdt_remove,
+	.driver = {
+		.name           = DRV_NAME,
+		.of_match_table = vmwdt_of_match,
+	},
+};
+
+module_platform_driver(vmwdt_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Sebastian Ene <sebastianene@google.com>");
+MODULE_DESCRIPTION("Virtual watchdog driver");
+MODULE_VERSION(DRV_VERSION);
-- 
2.35.1.1094.g7c7d902a7c-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
  2022-04-05 14:19 ` [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-04-05 21:15   ` Guenter Roeck
  2022-04-06 16:31     ` Sebastian Ene
  0 siblings, 1 reply; 9+ messages in thread
From: Guenter Roeck @ 2022-04-05 21:15 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Wim Van Sebroeck, Rob Herring, devicetree, linux-kernel,
	linux-watchdog, will, qperret, maz

Sebastian,

On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
> This patch adds support for a virtual watchdog which relies on the
> per-cpu hrtimers to pet at regular intervals.
> 

The watchdog subsystem is not intended to detect soft and hard lockups.
It is intended to detect userspace issues. A watchdog driver requires
a userspace compinent which needs to ping the watchdog on a regular basis
to prevent timeouts (and watchdog drivers are supposed to use the
watchdog kernel API).

What you have here is a CPU stall detection mechanism, similar to the
existing soft/hard lockup detection mechanism. This code does not
belong into the watchdog subsystem; it is similar to the existing
hard/softlockup detection code (kernel/watchdog.c) and should reside
at the same location.

Having said that, I could imagine a watchdog driver to be used in VMs,
but that would be similar to existing watchdog drivers. The easiest way
to get there would probably be to just instantiate one of the watchdog
devices already supported by qemu.

Guenter

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible
  2022-04-05 14:19 ` [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-06 12:25   ` Rob Herring
  2022-04-06 16:34     ` Sebastian Ene
  0 siblings, 1 reply; 9+ messages in thread
From: Rob Herring @ 2022-04-06 12:25 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: linux-kernel, devicetree, Guenter Roeck, Wim Van Sebroeck,
	Rob Herring, qperret, maz, linux-watchdog, will

On Tue, 05 Apr 2022 14:19:54 +0000, Sebastian Ene wrote:
> This watchdog can be used to detect stalls on vCPUs.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../devicetree/bindings/watchdog/vm-wdt.yaml  | 44 +++++++++++++++++++
>  1 file changed, 44 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/watchdog/vm-wdt.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'}
	hint: Standard unit suffix properties don't need a type $ref
	from schema $id: http://devicetree.org/meta-schemas/core.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml: ignoring, error in schema: properties: timeout-sec
Documentation/devicetree/bindings/watchdog/vm-wdt.example.dtb:0:0: /example-0/watchdog: failed to match any schema with compatible: ['qemu,vm-watchdog']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
  2022-04-05 21:15   ` Guenter Roeck
@ 2022-04-06 16:31     ` Sebastian Ene
  2022-04-06 16:52       ` Guenter Roeck
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Ene @ 2022-04-06 16:31 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Wim Van Sebroeck, Rob Herring, devicetree, linux-kernel,
	linux-watchdog, will, qperret, maz

On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
> Sebastian,
> 

Hello Guenter,

> On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
> > This patch adds support for a virtual watchdog which relies on the
> > per-cpu hrtimers to pet at regular intervals.
> > 
> 
> The watchdog subsystem is not intended to detect soft and hard lockups.
> It is intended to detect userspace issues. A watchdog driver requires
> a userspace compinent which needs to ping the watchdog on a regular basis
> to prevent timeouts (and watchdog drivers are supposed to use the
> watchdog kernel API).
> 

Thanks for getting back ! I wanted to create a mechanism to detect
stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
to create per-CPU binded watchdogs (in the same way as Power PC has
kernel/watchdog.c). 
The per-CPU watchdog is needed to account for time that the guest is not
running(either scheduled out or waiting for an event) to prevent spurious
reset events caused by the watchdog.

> What you have here is a CPU stall detection mechanism, similar to the
> existing soft/hard lockup detection mechanism. This code does not
> belong into the watchdog subsystem; it is similar to the existing
> hard/softlockup detection code (kernel/watchdog.c) and should reside
> at the same location.
>

I agree that this doesn't belong to the watchdog subsytem but the current
stall detection mechanism calls through MMIO into a virtual device
'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
something that we should avoid ?

> Having said that, I could imagine a watchdog driver to be used in VMs,
> but that would be similar to existing watchdog drivers. The easiest way
> to get there would probably be to just instantiate one of the watchdog
> devices already supported by qemu.
>

I am looking forward for your response,

> Guenter

Cheers,
Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible
  2022-04-06 12:25   ` Rob Herring
@ 2022-04-06 16:34     ` Sebastian Ene
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Ene @ 2022-04-06 16:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, devicetree, Guenter Roeck, Wim Van Sebroeck,
	Rob Herring, qperret, maz, linux-watchdog, will

On Wed, Apr 06, 2022 at 07:25:24AM -0500, Rob Herring wrote:
> On Tue, 05 Apr 2022 14:19:54 +0000, Sebastian Ene wrote:
> > This watchdog can be used to detect stalls on vCPUs.
> > 
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> >  .../devicetree/bindings/watchdog/vm-wdt.yaml  | 44 +++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/watchdog/vm-wdt.yaml
> > 

Hello Rob,

> 
> My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> on your patch (DT_CHECKER_FLAGS is new in v5.13):
> 
> yamllint warnings/errors:
> 
> dtschema/dtc warnings/errors:
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'}
> 	hint: Standard unit suffix properties don't need a type $ref
> 	from schema $id: http://devicetree.org/meta-schemas/core.yaml#
> /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/watchdog/vm-wdt.yaml: ignoring, error in schema: properties: timeout-sec
> Documentation/devicetree/bindings/watchdog/vm-wdt.example.dtb:0:0: /example-0/watchdog: failed to match any schema with compatible: ['qemu,vm-watchdog']
> 
> doc reference errors (make refcheckdocs):
> 
> See https://patchwork.ozlabs.org/patch/
> 

I will fix them, thanks for letting me know.

> This check can fail if there are any dependencies. The base for a patch
> series is generally the most recent rc1.
> 
> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:
> 
> pip3 install dtschema --upgrade
> 
> Please check and re-submit.
> 

Cheers,
Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
  2022-04-06 16:31     ` Sebastian Ene
@ 2022-04-06 16:52       ` Guenter Roeck
  2022-04-08 14:03         ` Sebastian Ene
  0 siblings, 1 reply; 9+ messages in thread
From: Guenter Roeck @ 2022-04-06 16:52 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Wim Van Sebroeck, Rob Herring, devicetree, linux-kernel,
	linux-watchdog, will, qperret, maz

On 4/6/22 09:31, Sebastian Ene wrote:
> On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
>> Sebastian,
>>
> 
> Hello Guenter,
> 
>> On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
>>> This patch adds support for a virtual watchdog which relies on the
>>> per-cpu hrtimers to pet at regular intervals.
>>>
>>
>> The watchdog subsystem is not intended to detect soft and hard lockups.
>> It is intended to detect userspace issues. A watchdog driver requires
>> a userspace compinent which needs to ping the watchdog on a regular basis
>> to prevent timeouts (and watchdog drivers are supposed to use the
>> watchdog kernel API).
>>
> 
> Thanks for getting back ! I wanted to create a mechanism to detect
> stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
> to create per-CPU binded watchdogs (in the same way as Power PC has
> kernel/watchdog.c).
> The per-CPU watchdog is needed to account for time that the guest is not
> running(either scheduled out or waiting for an event) to prevent spurious
> reset events caused by the watchdog.
> 
>> What you have here is a CPU stall detection mechanism, similar to the
>> existing soft/hard lockup detection mechanism. This code does not
>> belong into the watchdog subsystem; it is similar to the existing
>> hard/softlockup detection code (kernel/watchdog.c) and should reside
>> at the same location.
>>
> 
> I agree that this doesn't belong to the watchdog subsytem but the current
> stall detection mechanism calls through MMIO into a virtual device
> 'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
> something that we should avoid ?
> 

You are introducing qemu,virt-watchdog, so it seems to me that any argument
along that line doesn't really apply.

I think it is more a matter for core kernel developers to discuss and
decide how this functionality is best instantiated. It doesn't _have_
to be a device, after all, just like the current lockup detection
code is not a device. Either case, I am not really the right person
to discuss this since it is a matter of core kernel code which I am
not sufficiently familiar with. All I can say is that watchdog drivers
in the watchdog subsystem have a different scope.

Guenter

>> Having said that, I could imagine a watchdog driver to be used in VMs,
>> but that would be similar to existing watchdog drivers. The easiest way
>> to get there would probably be to just instantiate one of the watchdog
>> devices already supported by qemu.
>>
> 
> I am looking forward for your response,
> 
>> Guenter
> 
> Cheers,
> Sebastian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
  2022-04-06 16:52       ` Guenter Roeck
@ 2022-04-08 14:03         ` Sebastian Ene
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Ene @ 2022-04-08 14:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Wim Van Sebroeck, Rob Herring, devicetree, linux-kernel,
	linux-watchdog, will, qperret, maz

On Wed, Apr 06, 2022 at 09:52:05AM -0700, Guenter Roeck wrote:
> On 4/6/22 09:31, Sebastian Ene wrote:
> > On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
> > > Sebastian,
> > > 
> > 
> > Hello Guenter,
> > 
> > > On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
> > > > This patch adds support for a virtual watchdog which relies on the
> > > > per-cpu hrtimers to pet at regular intervals.
> > > > 
> > > 
> > > The watchdog subsystem is not intended to detect soft and hard lockups.
> > > It is intended to detect userspace issues. A watchdog driver requires
> > > a userspace compinent which needs to ping the watchdog on a regular basis
> > > to prevent timeouts (and watchdog drivers are supposed to use the
> > > watchdog kernel API).
> > > 
> > 
> > Thanks for getting back ! I wanted to create a mechanism to detect
> > stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
> > to create per-CPU binded watchdogs (in the same way as Power PC has
> > kernel/watchdog.c).
> > The per-CPU watchdog is needed to account for time that the guest is not
> > running(either scheduled out or waiting for an event) to prevent spurious
> > reset events caused by the watchdog.
> > 
> > > What you have here is a CPU stall detection mechanism, similar to the
> > > existing soft/hard lockup detection mechanism. This code does not
> > > belong into the watchdog subsystem; it is similar to the existing
> > > hard/softlockup detection code (kernel/watchdog.c) and should reside
> > > at the same location.
> > > 
> > 
> > I agree that this doesn't belong to the watchdog subsytem but the current
> > stall detection mechanism calls through MMIO into a virtual device
> > 'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
> > something that we should avoid ?
> > 

Hello Guenter,

> 
> You are introducing qemu,virt-watchdog, so it seems to me that any argument
> along that line doesn't really apply.
> 

I am trying to follow your guidelines to make this work, so I would be grateful
if you have some time to share your thoughts on this. 

> I think it is more a matter for core kernel developers to discuss and
> decide how this functionality is best instantiated. It doesn't _have_
> to be a device, after all, just like the current lockup detection
> code is not a device. Either case, I am not really the right person
> to discuss this since it is a matter of core kernel code which I am
> not sufficiently familiar with. All I can say is that watchdog drivers
> in the watchdog subsystem have a different scope.

This watchdog device tracks the elapsed time on a per-cpu basis, since KVM
schedules vCPUs independently.
I am attempting to re-write it to use the watchdog-core infrastructure but
doing this will loose the per-cpu watchdog binding and exposing it to
userspace would require a strong thread affinity setting. How can I overcome
this problem ?

Having it like a hard lockup detector mechanism doesn’t work either because
when the watchdog expires, we rely on crosvm (not the guest kernel) to handle
this event and reset the machine. We cannot inject the reset event back into
the guest as we don’t have NMI support on arm64.

> 
> Guenter

Thanks,
Sebastian

> 
> > > Having said that, I could imagine a watchdog driver to be used in VMs,
> > > but that would be similar to existing watchdog drivers. The easiest way
> > > to get there would probably be to just instantiate one of the watchdog
> > > devices already supported by qemu.
> > > 
> > 
> > I am looking forward for your response,
> > 
> > > Guenter
> > 
> > Cheers,
> > Sebastian
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-04-08 14:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-05 14:19 [PATCH 0/2] Detect stalls on guest vCPUS Sebastian Ene
2022-04-05 14:19 ` [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible Sebastian Ene
2022-04-06 12:25   ` Rob Herring
2022-04-06 16:34     ` Sebastian Ene
2022-04-05 14:19 ` [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
2022-04-05 21:15   ` Guenter Roeck
2022-04-06 16:31     ` Sebastian Ene
2022-04-06 16:52       ` Guenter Roeck
2022-04-08 14:03         ` Sebastian Ene

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).