* [PATCH  v2 0/2] Detect stalls on guest vCPUS
@ 2022-04-22 14:19 Sebastian Ene
  2022-04-22 14:19 ` [PATCH v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Sebastian Ene @ 2022-04-22 14:19 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Sebastian Ene

This adds a mechanism to detect stalls on the guest vCPUs by creating a
per-CPU hrtimer which periodically 'pets' the host backend driver.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapsed time between subsequent 'pet' events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked and resets the guest. The host
backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.
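
As a rough illustration of the behaviour described above, the following is a
minimal user-space model of what the host backend might do for one vCPU. It is
a sketch under stated assumptions, not the crosvm implementation: the struct
and the model_* helpers are hypothetical names, and the rule "only count ticks
while the vCPU was running" is one possible reading of "takes into account the
time that the guest is not running".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side state for one vCPU (not taken from crosvm). */
struct vcpu_wdt_model {
	uint32_t counter;	/* ticks left before the vCPU counts as stalled */
	bool enabled;		/* mirrors the guest writing 1 to the status register */
};

/* Guest 'pet': reload the counter, as a write to the load register would. */
static inline void model_pet(struct vcpu_wdt_model *w, uint32_t ticks)
{
	assert(ticks > 0);
	w->counter = ticks;
}

/*
 * Advance the model by nticks clock ticks. Ticks are consumed only while the
 * vCPU was running, so time spent descheduled by the host is not counted.
 * Returns true if the counter ran out, i.e. the vCPU looks stalled.
 */
static inline bool model_run(struct vcpu_wdt_model *w, uint32_t nticks,
			     bool vcpu_was_running)
{
	uint32_t i;

	if (!w->enabled || !vcpu_was_running)
		return false;

	for (i = 0; i < nticks; i++) {
		if (w->counter > 0)
			w->counter--;
		if (w->counter == 0)
			return true;
	}
	return false;
}
```

With the defaults from this series (10 Hz clock, 8 second timeout) the guest
loads 80 ticks on every pet. In this model a guest that pets every 4 seconds
never expires, a long period of being descheduled does not expire it either,
and a vCPU that runs for 8 seconds without petting does.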

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Changelog v2:
 - move the driver to misc as this does not cope with watchdog core
   subsystem
 - fix the dt-bindings warnings

Sebastian Ene (2):
  dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../devicetree/bindings/misc/vm-wdt.yaml      |  44 ++++
 drivers/misc/Kconfig                          |   8 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vm-wdt.c                         | 215 ++++++++++++++++++
 4 files changed, 268 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vm-wdt.yaml
 create mode 100644 drivers/misc/vm-wdt.c

-- 
2.36.0.rc2.479.g8af0fa9b8e-goog



* [PATCH  v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  2022-04-22 14:19 [PATCH v2 0/2] Detect stalls on guest vCPUS Sebastian Ene
@ 2022-04-22 14:19 ` Sebastian Ene
  2022-04-22 21:10   ` Rob Herring
  2022-04-22 14:19 ` [PATCH v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  2022-04-23  6:51 ` [PATCH v2 0/2] Detect stalls on guest vCPUS Greg Kroah-Hartman
  2 siblings, 1 reply; 11+ messages in thread
From: Sebastian Ene @ 2022-04-22 14:19 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Sebastian Ene

The stall detection mechanism allows configuring the expiration
duration and the internal counter clock frequency, measured in Hz.
Add these properties to the schema.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 .../devicetree/bindings/misc/vm-wdt.yaml      | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vm-wdt.yaml

diff --git a/Documentation/devicetree/bindings/misc/vm-wdt.yaml b/Documentation/devicetree/bindings/misc/vm-wdt.yaml
new file mode 100644
index 000000000000..cb7665a0c5af
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/vm-wdt.yaml
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/misc/vm-wdt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: VM watchdog
+
+description: |
+  This binding describes a CPU stall detector mechanism for virtual cpus.
+
+maintainers:
+  - Sebastian Ene <sebastianene@google.com>
+
+properties:
+  compatible:
+    enum:
+      - qemu,vm-watchdog
+  clock:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog internal clock measured in Hz used to decrement the
+      watchdog counter register on each tick.
+      Defaults to 10 if unset.
+  timeout-sec:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog expiration timeout measured in seconds.
+      Defaults to 8 if unset.
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    watchdog {
+      compatible = "qemu,vm-watchdog";
+      clock = <10>;
+      timeout-sec = <8>;
+    };
+
+...
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog



* [PATCH  v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-22 14:19 [PATCH v2 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-04-22 14:19 ` [PATCH v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-22 14:19 ` Sebastian Ene
  2022-04-23  6:50   ` Greg Kroah-Hartman
  2022-04-23  6:51 ` [PATCH v2 0/2] Detect stalls on guest vCPUS Greg Kroah-Hartman
  2 siblings, 1 reply; 11+ messages in thread
From: Sebastian Ene @ 2022-04-22 14:19 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Sebastian Ene

This patch adds support for a virtual watchdog which relies on the
per-cpu hrtimers to pet at regular intervals.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 drivers/misc/Kconfig  |   8 ++
 drivers/misc/Makefile |   1 +
 drivers/misc/vm-wdt.c | 215 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 224 insertions(+)
 create mode 100644 drivers/misc/vm-wdt.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2b9572a6d114..0e710149ff95 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -493,6 +493,14 @@ config OPEN_DICE
 
 	  If unsure, say N.
 
+config VM_WATCHDOG
+	tristate "Virtual Machine Watchdog"
+	select LOCKUP_DETECTOR
+	help
+	  Detect CPU locks on the virtual machine.
+	  To compile this driver as a module, choose M here: the
+	  module will be called vm-wdt.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 2ec634354cf5..868e28d01b75 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
 obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
+obj-$(CONFIG_VM_WATCHDOG) += vm-wdt.o
\ No newline at end of file
diff --git a/drivers/misc/vm-wdt.c b/drivers/misc/vm-wdt.c
new file mode 100644
index 000000000000..ea4351754645
--- /dev/null
+++ b/drivers/misc/vm-wdt.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Virtual watchdog driver.
+//  Copyright (C) Google, 2022
+
+#define pr_fmt(fmt) "vm-watchdog: " fmt
+
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/param.h>
+#include <linux/percpu.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#define DRV_NAME			"vm_wdt"
+#define DRV_VERSION			"1.0"
+
+#define VMWDT_REG_STATUS		(0x00)
+#define VMWDT_REG_LOAD_CNT		(0x04)
+#define VMWDT_REG_CURRENT_CNT		(0x08)
+#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
+#define VMWDT_REG_LEN			(0x10)
+
+#define VMWDT_DEFAULT_CLOCK_HZ		(10)
+#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
+
+struct vm_wdt_s {
+	void __iomem *membase;
+	u32 clock_freq;
+	u32 expiration_sec;
+	u32 ping_timeout_ms;
+	struct hrtimer per_cpu_hrtimer;
+	struct platform_device *dev;
+};
+
+#define vmwdt_reg_write(wdt, reg, value)	\
+	iowrite32((value), (wdt)->membase + (reg))
+#define vmwdt_reg_read(wdt, reg)		\
+	ioread32((wdt)->membase + (reg))
+
+static struct platform_device *virt_dev;
+
+static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
+{
+	struct vm_wdt_s *cpu_wdt;
+	u32 ticks;
+
+	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
+
+	return HRTIMER_RESTART;
+}
+
+static void vmwdt_start(void *arg)
+{
+	u32 ticks;
+	int cpu = smp_processor_id();
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	pr_info("cpu %u vmwdt start\n", cpu);
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_CLOCK_FREQ_HZ,
+			cpu_wdt->clock_freq);
+
+	/* Compute the number of ticks required for the watchdog counter
+	 * register based on the internal clock frequency and the watchdog
+	 * timeout given from the device tree.
+	 */
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+
+	/* Enable the internal clock and start the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 1);
+
+	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer->function = vmwdt_timer_fn;
+	hrtimer_start(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static void vmwdt_stop(void *arg)
+{
+	int cpu = smp_processor_id();
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	hrtimer_cancel(hrtimer);
+
+	/* Disable the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 0);
+	pr_info("cpu %d vmwdt stop\n", cpu);
+}
+
+static int start_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_start(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_stop(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int vmwdt_probe(struct platform_device *dev)
+{
+	int cpu, ret, err;
+	void __iomem *membase;
+	struct resource *r;
+	struct vm_wdt_s *vm_wdt;
+	u32 wdt_clock, wdt_timeout_sec = 0;
+
+	r = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (r == NULL)
+		return -ENOENT;
+
+	vm_wdt = alloc_percpu(typeof(struct vm_wdt_s));
+	if (!vm_wdt)
+		return -ENOMEM;
+
+	membase = ioremap(r->start, resource_size(r));
+	if (!membase) {
+		ret = -ENXIO;
+		goto err_withmem;
+	}
+
+	virt_dev = dev;
+	platform_set_drvdata(dev, vm_wdt);
+	if (of_property_read_u32(dev->dev.of_node, "clock", &wdt_clock))
+		wdt_clock = VMWDT_DEFAULT_CLOCK_HZ;
+
+	if (of_property_read_u32(dev->dev.of_node, "timeout-sec",
+				 &wdt_timeout_sec))
+		wdt_timeout_sec = VMWDT_DEFAULT_TIMEOT_SEC;
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		cpu_wdt->membase = membase + cpu * VMWDT_REG_LEN;
+		cpu_wdt->clock_freq = wdt_clock;
+		cpu_wdt->expiration_sec = wdt_timeout_sec;
+		cpu_wdt->ping_timeout_ms = wdt_timeout_sec * MSEC_PER_SEC / 2;
+		smp_call_function_single(cpu, vmwdt_start, cpu_wdt, true);
+	}
+
+	err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					"virt/watchdog:online",
+					start_watchdog_on_cpu,
+					stop_watchdog_on_cpu);
+	if (err < 0) {
+		pr_warn("could not be initialized\n");
+		ret = err;
+		goto err_withmem;
+	}
+
+	return 0;
+
+err_withmem:
+	free_percpu(vm_wdt);
+	return ret;
+}
+
+static int vmwdt_remove(struct platform_device *dev)
+{
+	int cpu;
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(dev);
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		smp_call_function_single(cpu, vmwdt_stop, cpu_wdt, true);
+	}
+
+	free_percpu(vm_wdt);
+	return 0;
+}
+
+static const struct of_device_id vmwdt_of_match[] = {
+	{ .compatible = "qemu,vm-watchdog", },
+	{}
+};
+
+MODULE_DEVICE_TABLE(of, vmwdt_of_match);
+
+static struct platform_driver vmwdt_driver = {
+	.probe  = vmwdt_probe,
+	.remove = vmwdt_remove,
+	.driver = {
+		.name           = DRV_NAME,
+		.of_match_table = vmwdt_of_match,
+	},
+};
+
+module_platform_driver(vmwdt_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Sebastian Ene <sebastianene@google.com>");
+MODULE_DESCRIPTION("Virtual watchdog driver");
+MODULE_VERSION(DRV_VERSION);
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog



* Re: [PATCH  v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  2022-04-22 14:19 ` [PATCH v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-22 21:10   ` Rob Herring
  0 siblings, 0 replies; 11+ messages in thread
From: Rob Herring @ 2022-04-22 21:10 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: maz, qperret, Rob Herring, Arnd Bergmann, Dragan Cvetic,
	linux-kernel, will, Greg Kroah-Hartman, devicetree

On Fri, 22 Apr 2022 14:19:49 +0000, Sebastian Ene wrote:
> The stall detection mechanism allows to configure the expiration
> duration and the internal counter clock frequency measured in Hz.
> Add these properties in the schema.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../devicetree/bindings/misc/vm-wdt.yaml      | 44 +++++++++++++++++++
>  1 file changed, 44 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/vm-wdt.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-wdt.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'}
	hint: Standard unit suffix properties don't need a type $ref
	from schema $id: http://devicetree.org/meta-schemas/core.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-wdt.yaml: ignoring, error in schema: properties: timeout-sec
Documentation/devicetree/bindings/misc/vm-wdt.example.dtb:0:0: /example-0/watchdog: failed to match any schema with compatible: ['qemu,vm-watchdog']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



* Re: [PATCH  v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-22 14:19 ` [PATCH v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-04-23  6:50   ` Greg Kroah-Hartman
  2022-04-23  9:14     ` Sebastian Ene
  0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-23  6:50 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret

On Fri, Apr 22, 2022 at 02:19:50PM +0000, Sebastian Ene wrote:
> This patch adds support for a virtual watchdog which relies on the
> per-cpu hrtimers to pet at regular intervals.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  drivers/misc/Kconfig  |   8 ++
>  drivers/misc/Makefile |   1 +
>  drivers/misc/vm-wdt.c | 215 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 224 insertions(+)
>  create mode 100644 drivers/misc/vm-wdt.c
> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 2b9572a6d114..0e710149ff95 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -493,6 +493,14 @@ config OPEN_DICE
>  
>  	  If unsure, say N.
>  
> +config VM_WATCHDOG
> +	tristate "Virtual Machine Watchdog"
> +	select LOCKUP_DETECTOR
> +	help
> +	  Detect CPU locks on the virtual machine.
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called vm-wdt.
> +
>  source "drivers/misc/c2port/Kconfig"
>  source "drivers/misc/eeprom/Kconfig"
>  source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 2ec634354cf5..868e28d01b75 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
>  obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
>  obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
>  obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
> +obj-$(CONFIG_VM_WATCHDOG) += vm-wdt.o

No tab?

> \ No newline at end of file
> diff --git a/drivers/misc/vm-wdt.c b/drivers/misc/vm-wdt.c
> new file mode 100644
> index 000000000000..ea4351754645
> --- /dev/null
> +++ b/drivers/misc/vm-wdt.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: GPL-2.0+

I have to ask, do you really mean "+" here as this is not the overall
license of the kernel.  It's not a normal license for your employer to
pick, so as long as you have legal approval, it's fine, but if not, you
need to get that.

> +//
> +// Virtual watchdog driver.
> +//  Copyright (C) Google, 2022
> +
> +#define pr_fmt(fmt) "vm-watchdog: " fmt

It's a driver, you shouldn't need any pr_* calls.

> +
> +#include <linux/cpu.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/nmi.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/param.h>
> +#include <linux/percpu.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +
> +#define DRV_NAME			"vm_wdt"

KBUILD_MODNAME?

> +#define DRV_VERSION			"1.0"

"versions" mean nothing once the code is in the kernel, please drop
this.

But why isn't this in the normal watchdog subdirectory?  Why is this a
special driver?

> +
> +#define VMWDT_REG_STATUS		(0x00)
> +#define VMWDT_REG_LOAD_CNT		(0x04)
> +#define VMWDT_REG_CURRENT_CNT		(0x08)
> +#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
> +#define VMWDT_REG_LEN			(0x10)
> +
> +#define VMWDT_DEFAULT_CLOCK_HZ		(10)
> +#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
> +
> +struct vm_wdt_s {
> +	void __iomem *membase;
> +	u32 clock_freq;
> +	u32 expiration_sec;
> +	u32 ping_timeout_ms;
> +	struct hrtimer per_cpu_hrtimer;
> +	struct platform_device *dev;
> +};
> +
> +#define vmwdt_reg_write(wdt, reg, value)	\
> +	iowrite32((value), (wdt)->membase + (reg))
> +#define vmwdt_reg_read(wdt, reg)		\
> +	io32read((wdt)->membase + (reg))
> +
> +static struct platform_device *virt_dev;
> +
> +static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
> +{
> +	struct vm_wdt_s *cpu_wdt;
> +	u32 ticks;
> +
> +	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
> +	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
> +	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
> +	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
> +
> +	return HRTIMER_RESTART;
> +}
> +
> +static void vmwdt_start(void *arg)
> +{
> +	u32 ticks;
> +	int cpu = smp_processor_id();
> +	struct vm_wdt_s *cpu_wdt = arg;
> +	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
> +
> +	pr_info("cpu %u vmwdt start\n", cpu);

When drivers work properly, they are quiet.

Again, why not have this in drivers/watchdog/ and use the apis there
instead of creating a custom one for no reason?

thanks,

greg k-h


* Re: [PATCH  v2 0/2] Detect stalls on guest vCPUS
  2022-04-22 14:19 [PATCH v2 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-04-22 14:19 ` [PATCH v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
  2022-04-22 14:19 ` [PATCH v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-04-23  6:51 ` Greg Kroah-Hartman
  2022-04-23  9:02   ` Sebastian Ene
  2 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-23  6:51 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret

On Fri, Apr 22, 2022 at 02:19:48PM +0000, Sebastian Ene wrote:
> This adds a mechanism to detect stalls on the guest vCPUS by creating a
> per CPU hrtimer which periodically 'pets' the host backend driver.
> 
> This device driver acts as a soft lockup detector by relying on the host
> backend driver to measure the elapesed time between subsequent 'pet' events.
> If the elapsed time doesn't match an expected value, the backend driver
> decides that the guest vCPU is locked and resets the guest. The host
> backend driver takes into account the time that the guest is not
> running. The communication with the backend driver is done through MMIO
> and the register layout of the virtual watchdog is described as part of
> the backend driver changes.
> 
> The host backend driver is implemented as part of:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Changelog v2:
>  - move the driver to misc as this does not cope with watchdog core
>    subsystem

Wait, why does it not cope with it?  That's not documented anywhere in
your patch that adds the driver.  In fact, most of the text here needs
to be in the changelog for the driver submission, not thrown away in the
00/XX email that will never end up in the kernel tree.

thanks,

greg k-h


* Re: [PATCH  v2 0/2] Detect stalls on guest vCPUS
  2022-04-23  6:51 ` [PATCH v2 0/2] Detect stalls on guest vCPUS Greg Kroah-Hartman
@ 2022-04-23  9:02   ` Sebastian Ene
  2022-04-23  9:36     ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Ene @ 2022-04-23  9:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	Rob Herring, devicetree, qperret, will, maz

On Sat, Apr 23, 2022 at 08:51:16AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 22, 2022 at 02:19:48PM +0000, Sebastian Ene wrote:
> > This adds a mechanism to detect stalls on the guest vCPUS by creating a
> > per CPU hrtimer which periodically 'pets' the host backend driver.
> > 
> > This device driver acts as a soft lockup detector by relying on the host
> > backend driver to measure the elapesed time between subsequent 'pet' events.
> > If the elapsed time doesn't match an expected value, the backend driver
> > decides that the guest vCPU is locked and resets the guest. The host
> > backend driver takes into account the time that the guest is not
> > running. The communication with the backend driver is done through MMIO
> > and the register layout of the virtual watchdog is described as part of
> > the backend driver changes.
> > 
> > The host backend driver is implemented as part of:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > 
> > Changelog v2:
> >  - move the driver to misc as this does not cope with watchdog core
> >    subsystem

Hello Greg,

> 
> Wait, why does it not cope with it?  That's not documented anywhere in
> your patch that adds the driver.  In fact, most of the text here needs
> to be in the changelog for the driver submission, not thrown away in the
> 00/XX email that will never end up in the kernel tree.
> 
> thanks,
> 
> greg k-h

From the previous feedback that I received on this patch, it seems that
watchdog core is not intended to be used for this type of driver. This
watchdog device tracks the elapsed time on a per-CPU basis, since KVM
schedules vCPUs independently. Watchdog core is not intended to detect
CPU stalls, and its drivers don't have a notion of CPU.
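
To make the per-CPU aspect concrete, here is a small sketch of the arithmetic
the posted driver performs for each vCPU. The register offsets and the formulas
come from the patch; the helper function names are made up for illustration and
do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the posted vm-wdt.c. */
#define VMWDT_REG_LOAD_CNT	0x04u
#define VMWDT_REG_LEN		0x10u
#define MSEC_PER_SEC		1000u

/*
 * Hypothetical helper: byte offset of a register inside the MMIO region.
 * Each vCPU owns its own VMWDT_REG_LEN-sized window, matching
 * "membase + cpu * VMWDT_REG_LEN" in vmwdt_probe().
 */
static inline uint32_t vmwdt_reg_offset(unsigned int cpu, uint32_t reg)
{
	assert(reg < VMWDT_REG_LEN);
	return cpu * VMWDT_REG_LEN + reg;
}

/* Hypothetical helper: value written to the load register on every pet. */
static inline uint32_t vmwdt_load_ticks(uint32_t clock_hz, uint32_t timeout_sec)
{
	return clock_hz * timeout_sec;
}

/* Hypothetical helper: hrtimer period, half of the expiration timeout. */
static inline uint32_t vmwdt_ping_interval_ms(uint32_t timeout_sec)
{
	return timeout_sec * MSEC_PER_SEC / 2;
}
```

With the default values, vCPU 3 pets its own load register at offset 0x34
every 4000 ms with a count of 80 ticks. A single watchdog device from the
watchdog core would expose only one such counter for the whole guest.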

Thanks,
Sebastian


* Re: [PATCH  v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-23  6:50   ` Greg Kroah-Hartman
@ 2022-04-23  9:14     ` Sebastian Ene
  2022-04-23  9:22       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Ene @ 2022-04-23  9:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	Rob Herring, devicetree, qperret, will, maz

On Sat, Apr 23, 2022 at 08:50:03AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 22, 2022 at 02:19:50PM +0000, Sebastian Ene wrote:

Hello Greg,

> > This patch adds support for a virtual watchdog which relies on the
> > per-cpu hrtimers to pet at regular intervals.
> > 
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> >  drivers/misc/Kconfig  |   8 ++
> >  drivers/misc/Makefile |   1 +
> >  drivers/misc/vm-wdt.c | 215 ++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 224 insertions(+)
> >  create mode 100644 drivers/misc/vm-wdt.c
> > 
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 2b9572a6d114..0e710149ff95 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -493,6 +493,14 @@ config OPEN_DICE
> >  
> >  	  If unsure, say N.
> >  
> > +config VM_WATCHDOG
> > +	tristate "Virtual Machine Watchdog"
> > +	select LOCKUP_DETECTOR
> > +	help
> > +	  Detect CPU locks on the virtual machine.
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called vm-wdt.
> > +
> >  source "drivers/misc/c2port/Kconfig"
> >  source "drivers/misc/eeprom/Kconfig"
> >  source "drivers/misc/cb710/Kconfig"
> > diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> > index 2ec634354cf5..868e28d01b75 100644
> > --- a/drivers/misc/Makefile
> > +++ b/drivers/misc/Makefile
> > @@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
> >  obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
> >  obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
> >  obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
> > +obj-$(CONFIG_VM_WATCHDOG) += vm-wdt.o
> 
> No tab?
> 

I will add one.

> > \ No newline at end of file
> > diff --git a/drivers/misc/vm-wdt.c b/drivers/misc/vm-wdt.c
> > new file mode 100644
> > index 000000000000..ea4351754645
> > --- /dev/null
> > +++ b/drivers/misc/vm-wdt.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> 
> I have to ask, do you really mean "+" here as this is not the overall
> license of the kernel.  It's not a normal license for your employer to
> pick, so as long as you have legal approval, it's fine, but if not, you
> need to get that.
> 

Thanks for letting me know, I think this should be :
SPDX-License-Identifier: GPL-2.0 without "+".

> > +//
> > +// Virtual watchdog driver.
> > +//  Copyright (C) Google, 2022
> > +
> > +#define pr_fmt(fmt) "vm-watchdog: " fmt
> 
> It's a driver, you shouldn't need any pr_* calls.
>

I will remove those.

> > +
> > +#include <linux/cpu.h>
> > +#include <linux/init.h>
> > +#include <linux/io.h>
> > +#include <linux/kernel.h>
> > +
> > +#include <linux/device.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/module.h>
> > +#include <linux/nmi.h>
> > +#include <linux/of.h>
> > +#include <linux/of_device.h>
> > +#include <linux/param.h>
> > +#include <linux/percpu.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/slab.h>
> > +
> > +#define DRV_NAME			"vm_wdt"
> 
> KBUILD_MODNAME?
> 
> > +#define DRV_VERSION			"1.0"
> 
> "versions" mean nothing once the code is in the kernel, please drop
> this.
> 

I will drop this.

> But why isn't this in the normal watchdog subdirectory?  Why is this a
> special driver?
> 
> > +
> > +#define VMWDT_REG_STATUS		(0x00)
> > +#define VMWDT_REG_LOAD_CNT		(0x04)
> > +#define VMWDT_REG_CURRENT_CNT		(0x08)
> > +#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
> > +#define VMWDT_REG_LEN			(0x10)
> > +
> > +#define VMWDT_DEFAULT_CLOCK_HZ		(10)
> > +#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
> > +
> > +struct vm_wdt_s {
> > +	void __iomem *membase;
> > +	u32 clock_freq;
> > +	u32 expiration_sec;
> > +	u32 ping_timeout_ms;
> > +	struct hrtimer per_cpu_hrtimer;
> > +	struct platform_device *dev;
> > +};
> > +
> > +#define vmwdt_reg_write(wdt, reg, value)	\
> > +	iowrite32((value), (wdt)->membase + (reg))
> > +#define vmwdt_reg_read(wdt, reg)		\
> > +	io32read((wdt)->membase + (reg))
> > +
> > +static struct platform_device *virt_dev;
> > +
> > +static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
> > +{
> > +	struct vm_wdt_s *cpu_wdt;
> > +	u32 ticks;
> > +
> > +	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
> > +	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
> > +	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
> > +	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
> > +
> > +	return HRTIMER_RESTART;
> > +}
> > +
> > +static void vmwdt_start(void *arg)
> > +{
> > +	u32 ticks;
> > +	int cpu = smp_processor_id();
> > +	struct vm_wdt_s *cpu_wdt = arg;
> > +	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
> > +
> > +	pr_info("cpu %u vmwdt start\n", cpu);
> 
> When drivers work properly, they are quiet.
> 

I will drop this.

> Again, why not have this in drivers/watchdog/ and use the apis there
> instead of creating a custom one for no reason?
> 

I submitted this patch to drivers/watchdog and received feedback
stating that this type of driver is not intended to be used with
watchdog core, because its drivers don't have a notion of CPU.
Moreover, we need to keep track of the elapsed time on a per-CPU
basis, and the core watchdog framework doesn't provide this
functionality.

> thanks,
> 
> greg k-h

Thanks,
Sebastian


* Re: [PATCH  v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-23  9:14     ` Sebastian Ene
@ 2022-04-23  9:22       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-23  9:22 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	Rob Herring, devicetree, qperret, will, maz

On Sat, Apr 23, 2022 at 09:14:53AM +0000, Sebastian Ene wrote:
> On Sat, Apr 23, 2022 at 08:50:03AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Apr 22, 2022 at 02:19:50PM +0000, Sebastian Ene wrote:
> 
> Hello Greg,
> 
> > > This patch adds support for a virtual watchdog which relies on the
> > > per-cpu hrtimers to pet at regular intervals.
> > > 
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > >  drivers/misc/Kconfig  |   8 ++
> > >  drivers/misc/Makefile |   1 +
> > >  drivers/misc/vm-wdt.c | 215 ++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 224 insertions(+)
> > >  create mode 100644 drivers/misc/vm-wdt.c
> > > 
> > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > index 2b9572a6d114..0e710149ff95 100644
> > > --- a/drivers/misc/Kconfig
> > > +++ b/drivers/misc/Kconfig
> > > @@ -493,6 +493,14 @@ config OPEN_DICE
> > >  
> > >  	  If unsure, say N.
> > >  
> > > +config VM_WATCHDOG
> > > +	tristate "Virtual Machine Watchdog"
> > > +	select LOCKUP_DETECTOR
> > > +	help
> > > +	  Detect CPU locks on the virtual machine.
> > > +	  To compile this driver as a module, choose M here: the
> > > +	  module will be called vm-wdt.
> > > +
> > >  source "drivers/misc/c2port/Kconfig"
> > >  source "drivers/misc/eeprom/Kconfig"
> > >  source "drivers/misc/cb710/Kconfig"
> > > diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> > > index 2ec634354cf5..868e28d01b75 100644
> > > --- a/drivers/misc/Makefile
> > > +++ b/drivers/misc/Makefile
> > > @@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
> > >  obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
> > >  obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
> > >  obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
> > > +obj-$(CONFIG_VM_WATCHDOG) += vm-wdt.o
> > 
> > No tab?
> > 
> 
> I will add one.
> 
> > > \ No newline at end of file
> > > diff --git a/drivers/misc/vm-wdt.c b/drivers/misc/vm-wdt.c
> > > new file mode 100644
> > > index 000000000000..ea4351754645
> > > --- /dev/null
> > > +++ b/drivers/misc/vm-wdt.c
> > > @@ -0,0 +1,215 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > 
> > I have to ask, do you really mean "+" here as this is not the overall
> > license of the kernel.  It's not a normal license for your employer to
> > pick, so as long as you have legal approval, it's fine, but if not, you
> > need to get that.
> > 
> 
> Thanks for letting me know. I think this should be:
> SPDX-License-Identifier: GPL-2.0, without the "+".
> 
> > > +//
> > > +// Virtual watchdog driver.
> > > +//  Copyright (C) Google, 2022
> > > +
> > > +#define pr_fmt(fmt) "vm-watchdog: " fmt
> > 
> > It's a driver, you shouldn't need any pr_* calls.
> >
> 
> I will remove those.
> 
> > > +
> > > +#include <linux/cpu.h>
> > > +#include <linux/init.h>
> > > +#include <linux/io.h>
> > > +#include <linux/kernel.h>
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/interrupt.h>
> > > +#include <linux/module.h>
> > > +#include <linux/nmi.h>
> > > +#include <linux/of.h>
> > > +#include <linux/of_device.h>
> > > +#include <linux/param.h>
> > > +#include <linux/percpu.h>
> > > +#include <linux/platform_device.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#define DRV_NAME			"vm_wdt"
> > 
> > KBUILD_MODNAME?
> > 
> > > +#define DRV_VERSION			"1.0"
> > 
> > "versions" mean nothing once the code is in the kernel, please drop
> > this.
> > 
> 
> I will drop this.
> 
> > But why isn't this in the normal watchdog subdirectory?  Why is this a
> > special driver?
> > 
> > > +
> > > +#define VMWDT_REG_STATUS		(0x00)
> > > +#define VMWDT_REG_LOAD_CNT		(0x04)
> > > +#define VMWDT_REG_CURRENT_CNT		(0x08)
> > > +#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
> > > +#define VMWDT_REG_LEN			(0x10)
> > > +
> > > +#define VMWDT_DEFAULT_CLOCK_HZ		(10)
> > > +#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
> > > +
> > > +struct vm_wdt_s {
> > > +	void __iomem *membase;
> > > +	u32 clock_freq;
> > > +	u32 expiration_sec;
> > > +	u32 ping_timeout_ms;
> > > +	struct hrtimer per_cpu_hrtimer;
> > > +	struct platform_device *dev;
> > > +};
> > > +
> > > +#define vmwdt_reg_write(wdt, reg, value)	\
> > > +	iowrite32((value), (wdt)->membase + (reg))
> > > +#define vmwdt_reg_read(wdt, reg)		\
> > > > +	ioread32((wdt)->membase + (reg))
> > > +
> > > +static struct platform_device *virt_dev;
> > > +
> > > +static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
> > > +{
> > > +	struct vm_wdt_s *cpu_wdt;
> > > +	u32 ticks;
> > > +
> > > +	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
> > > +	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
> > > +	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
> > > +	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
> > > +
> > > +	return HRTIMER_RESTART;
> > > +}
> > > +
> > > +static void vmwdt_start(void *arg)
> > > +{
> > > +	u32 ticks;
> > > +	int cpu = smp_processor_id();
> > > +	struct vm_wdt_s *cpu_wdt = arg;
> > > +	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
> > > +
> > > +	pr_info("cpu %u vmwdt start\n", cpu);
> > 
> > When drivers work properly, they are quiet.
> > 
> 
> I will drop this.
> 
> > Again, why not have this in drivers/watchdog/ and use the apis there
> > instead of creating a custom one for no reason?
> > 
> 
> I submitted this patch to the drivers/watchdog and I received some
> feedback on it stating that this type of driver is not intended to be
> used with watchdog core, because the drivers don't have a notion of
> CPU. Moreover, we need to keep track of the elapsed time on a per-cpu
> basis and the core watchdog framework doesn't provide this
> functionality.

Then please document the heck out of this in the changelog and in the
Kconfig help text as others will have the same question.

thanks,

greg k-h

* Re: [PATCH  v2 0/2] Detect stalls on guest vCPUS
  2022-04-23  9:02   ` Sebastian Ene
@ 2022-04-23  9:36     ` Marc Zyngier
  2022-04-23 11:12       ` Sebastian Ene
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2022-04-23  9:36 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Greg Kroah-Hartman, linux-kernel, Derek Kiernan, Dragan Cvetic,
	Arnd Bergmann, Rob Herring, devicetree, qperret, will

On Sat, 23 Apr 2022 10:02:24 +0100,
Sebastian Ene <sebastianene@google.com> wrote:
> 
> On Sat, Apr 23, 2022 at 08:51:16AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Apr 22, 2022 at 02:19:48PM +0000, Sebastian Ene wrote:
> > > This adds a mechanism to detect stalls on the guest vCPUS by creating a
> > > per CPU hrtimer which periodically 'pets' the host backend driver.
> > > 
> > > This device driver acts as a soft lockup detector by relying on the host
> > > backend driver to measure the elapsed time between subsequent 'pet' events.
> > > If the elapsed time doesn't match an expected value, the backend driver
> > > decides that the guest vCPU is locked and resets the guest. The host
> > > backend driver takes into account the time that the guest is not
> > > running. The communication with the backend driver is done through MMIO
> > > and the register layout of the virtual watchdog is described as part of
> > > the backend driver changes.
> > > 
> > > The host backend driver is implemented as part of:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > 
> > > Changelog v2:
> > >  - move the driver to misc as this does not cope with watchdog core
> > >    subsystem
> 
> Hello Greg,
> 
> > 
> > Wait, why does it not cope with it?  That's not documented anywhere in
> > your patch that adds the driver.  In fact, most of the text here needs
> > to be in the changelog for the driver submission, not thrown away in the
> > 00/XX email that will never end up in the kernel tree.
> > 
> > thanks,
> > 
> > greg k-h
> 
> From the previous feedback that I received on this patch it seems that
> watchdog core is not intended to be used for this type of driver. This
> watchdog device tracks the elapsed time on a per-cpu basis,
> since KVM schedules vCPUs independently. Watchdog core is not intended
> to detect CPU stalls and the drivers don't have a notion of CPU.

I must say that I don't really get the objection against the watchdog
approach. OK, there is no userspace aspect to this.  But we already
use watchdogs for more than just userspace (reboot is one of the major
use cases).

There are already per-CPU watchdogs in the tree: see how the
fsl-ls208xa platform has one SP805 per CPU (8 of them in total). As
far as I can tell, there was no objection to this. So what is special
about this one?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH  v2 0/2] Detect stalls on guest vCPUS
  2022-04-23  9:36     ` Marc Zyngier
@ 2022-04-23 11:12       ` Sebastian Ene
  0 siblings, 0 replies; 11+ messages in thread
From: Sebastian Ene @ 2022-04-23 11:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	Rob Herring, devicetree, qperret, will, Greg Kroah-Hartman

On Sat, Apr 23, 2022 at 10:36:56AM +0100, Marc Zyngier wrote:
> On Sat, 23 Apr 2022 10:02:24 +0100,
> Sebastian Ene <sebastianene@google.com> wrote:
> > 
> > On Sat, Apr 23, 2022 at 08:51:16AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Apr 22, 2022 at 02:19:48PM +0000, Sebastian Ene wrote:
> > > > This adds a mechanism to detect stalls on the guest vCPUS by creating a
> > > > per CPU hrtimer which periodically 'pets' the host backend driver.
> > > > 
> > > > This device driver acts as a soft lockup detector by relying on the host
> > > > backend driver to measure the elapsed time between subsequent 'pet' events.
> > > > If the elapsed time doesn't match an expected value, the backend driver
> > > > decides that the guest vCPU is locked and resets the guest. The host
> > > > backend driver takes into account the time that the guest is not
> > > > running. The communication with the backend driver is done through MMIO
> > > > and the register layout of the virtual watchdog is described as part of
> > > > the backend driver changes.
> > > > 
> > > > The host backend driver is implemented as part of:
> > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > > 
> > > > Changelog v2:
> > > >  - move the driver to misc as this does not cope with watchdog core
> > > >    subsystem
> > 
> > Hello Greg,
> > 
> > > 
> > > Wait, why does it not cope with it?  That's not documented anywhere in
> > > your patch that adds the driver.  In fact, most of the text here needs
> > > to be in the changelog for the driver submission, not thrown away in the
> > > 00/XX email that will never end up in the kernel tree.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > From the previous feedback that I received on this patch it seems that
> > watchdog core is not intended to be used for this type of driver. This
> > watchdog device tracks the elapsed time on a per-cpu basis,
> > since KVM schedules vCPUs independently. Watchdog core is not intended
> > to detect CPU stalls and the drivers don't have a notion of CPU.

Hello Marc,

> 
> I must say that I don't really get the objection against the watchdog
> approach. OK, there is no userspace aspect to this.  But we already
> use watchdogs for more than just userspace (reboot is one of the major
> use cases).
> 
> There are already per-CPU watchdogs in the tree: see how the
> fsl-ls208xa platform has one SP805 per CPU (8 of them in total). As
> far as I can tell, there was no objection to this. So what is special
> about this one?

I think the difference is that this driver relies on hrtimers that are
bound to a specific CPU to perform the periodic watchdog 'pet'. If we
relied on userspace to do this 'pet' operation instead, we would need
strict thread affinity settings.
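
To make the per-CPU point concrete, here is a minimal userspace sketch. It
is not the driver or the crosvm backend code; it only models the idea from
the cover letter. The names vcpu_wdt, wdt_pet, wdt_tick and simulate are
hypothetical, the counter semantics are an assumption, and the 10 Hz / 8 s
values are borrowed from the defaults in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS	4
#define CLOCK_HZ	10u	/* same value as VMWDT_DEFAULT_CLOCK_HZ in the patch */
#define TIMEOUT_SEC	8u	/* same value as the patch's default timeout */

/* Hypothetical host-side state: one down-counter per vCPU. */
struct vcpu_wdt {
	uint32_t current_cnt;
	bool expired;
};

/* Guest 'pet': reload the counter, like the write to VMWDT_REG_LOAD_CNT. */
static void wdt_pet(struct vcpu_wdt *w)
{
	w->current_cnt = CLOCK_HZ * TIMEOUT_SEC;
}

/* Host tick: count down only while the vCPU was actually running. */
static void wdt_tick(struct vcpu_wdt *w, bool vcpu_running)
{
	if (!vcpu_running || w->expired)
		return;
	if (w->current_cnt > 0)
		w->current_cnt--;
	if (w->current_cnt == 0)
		w->expired = true;
}

/*
 * Run the model for `seconds`. Healthy vCPUs pet once per second;
 * `stalled_cpu` (or -1 for none) never pets after the initial load.
 * Returns the first vCPU whose counter expires, or -1 if none does.
 */
static int simulate(int stalled_cpu, unsigned int seconds)
{
	struct vcpu_wdt wdt[NR_VCPUS] = { { 0, false } };
	unsigned int tick;
	int cpu;

	assert(stalled_cpu < NR_VCPUS);

	for (cpu = 0; cpu < NR_VCPUS; cpu++)
		wdt_pet(&wdt[cpu]);

	for (tick = 0; tick < seconds * CLOCK_HZ; tick++) {
		for (cpu = 0; cpu < NR_VCPUS; cpu++) {
			if (cpu != stalled_cpu && tick % CLOCK_HZ == 0)
				wdt_pet(&wdt[cpu]);
			wdt_tick(&wdt[cpu], true);
			if (wdt[cpu].expired)
				return cpu;
		}
	}
	return -1;
}
```

In this model, simulate(2, 20) flags vCPU 2 even though the other three
vCPUs keep petting, whereas a single shared counter would be reloaded by
any healthy vCPU and the stall would go unnoticed. Passing false as
vcpu_running to wdt_tick leaves the counter untouched, which is how the
sketch accounts for time when the guest is not running.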

Thanks,
Sebastian

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Thread overview: 11+ messages
2022-04-22 14:19 [PATCH v2 0/2] Detect stalls on guest vCPUS Sebastian Ene
2022-04-22 14:19 ` [PATCH v2 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
2022-04-22 21:10   ` Rob Herring
2022-04-22 14:19 ` [PATCH v2 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
2022-04-23  6:50   ` Greg Kroah-Hartman
2022-04-23  9:14     ` Sebastian Ene
2022-04-23  9:22       ` Greg Kroah-Hartman
2022-04-23  6:51 ` [PATCH v2 0/2] Detect stalls on guest vCPUS Greg Kroah-Hartman
2022-04-23  9:02   ` Sebastian Ene
2022-04-23  9:36     ` Marc Zyngier
2022-04-23 11:12       ` Sebastian Ene
