linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] Detect stalls on guest vCPUS
@ 2022-04-29  8:30 Sebastian Ene
  2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-04-29  8:30 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck,
	Sebastian Ene

This adds a mechanism to detect stalls on the guest vCPUS by creating a
per CPU hrtimer which periodically 'pets' the host backend driver.
On a conventional watchdog-core driver, the userspace is responsible for
delivering the 'pet' events by writing to the particular /dev/watchdogN node.
In this case we require a strong thread affinity to be able to
account for lost time on a per vCPU basis.

This device driver acts as a soft lockup detector by relying on the host
backend driver to measure the elapesed time between subsequent 'pet' events.
If the elapsed time doesn't match an expected value, the backend driver
decides that the guest vCPU is locked and resets the guest. The host
backend driver takes into account the time that the guest is not
running. The communication with the backend driver is done through MMIO
and the register layout of the virtual watchdog is described as part of
the backend driver changes.

The host backend driver is implemented as part of:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Changelog v4:
 - rename the source from vm-wdt.c -> vm-watchdog.c
 - convert all the error logging calls from pr_* to dev_* calls
 - rename the DTS node "clock" to "clock-frequency"

Changelog v3:
 - cosmetic fixes, remove pr_info and version information
 - improve description in the commit messag
 - improve description in the Kconfig help section

Sebastian Ene (2):
  dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  misc: Add a mechanism to detect stalls on guest vCPUs

 .../devicetree/bindings/misc/vm-watchdog.yaml |  45 ++++
 drivers/misc/Kconfig                          |  12 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/vm-watchdog.c                    | 206 ++++++++++++++++++
 4 files changed, 264 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml
 create mode 100644 drivers/misc/vm-watchdog.c

-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  2022-04-29  8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene
@ 2022-04-29  8:30 ` Sebastian Ene
  2022-04-29 15:53   ` Rob Herring
  2022-05-07  8:16   ` Sebastian Ene
  2022-04-29  8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring
  2 siblings, 2 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-04-29  8:30 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck,
	Sebastian Ene

The stall detection mechanism allows to configure the expiration
duration and the internal counter clock frequency measured in Hz.
Add these properties in the schema.

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml

diff --git a/Documentation/devicetree/bindings/misc/vm-watchdog.yaml b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml
new file mode 100644
index 000000000000..d7cca23357ab
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml
@@ -0,0 +1,45 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/misc/vm-watchdog.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: VM watchdog
+
+description: |
+  This binding describes a CPU stall detector mechanism for virtual cpus
+  which is accessed through MMIO.
+
+maintainers:
+  - Sebastian Ene <sebastianene@google.com>
+
+properties:
+  compatible:
+    enum:
+      - qemu,vm-watchdog
+  clock-frequency:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog internal clock measure in Hz used to decrement the
+      watchdog counter register on each tick.
+      Defaults to 10 if unset.
+  timeout-sec:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    description: |
+      The watchdog expiration timeout measured in seconds.
+      Defaults to 8 if unset.
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    watchdog {
+      compatible = "qemu,vm-watchdog";
+      clock-frequency = <10>;
+      timeout-sec = <8>;
+    };
+
+...
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-29  8:30 ` Sebastian Ene
  2022-04-29  8:48   ` Greg Kroah-Hartman
  2022-04-29  8:51   ` Greg Kroah-Hartman
  2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring
  2 siblings, 2 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-04-29  8:30 UTC (permalink / raw)
  To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic
  Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck,
	Sebastian Ene

This driver creates per-cpu hrtimers which are required to do the
periodic 'pet' operation. On a conventional watchdog-core driver, the
userspace is responsible for delivering the 'pet' events by writing to
the particular /dev/watchdogN node. In this case we require a strong
thread affinity to be able to account for lost time on a per vCPU.

This part of the driver is the 'frontend' which is reponsible for
delivering the periodic 'pet' events, configuring the virtual peripheral
and listening for cpu hotplug events. The other part of the driver
handles the peripheral emulation and this part accounts for lost time by
looking at the /proc/{}/task/{}/stat entries and is located here:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817

Signed-off-by: Sebastian Ene <sebastianene@google.com>
---
 drivers/misc/Kconfig       |  12 +++
 drivers/misc/Makefile      |   1 +
 drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
 3 files changed, 219 insertions(+)
 create mode 100644 drivers/misc/vm-watchdog.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2b9572a6d114..26c3a99e269c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -493,6 +493,18 @@ config OPEN_DICE
 
 	  If unsure, say N.
 
+config VM_WATCHDOG
+	tristate "Virtual Machine Watchdog"
+	select LOCKUP_DETECTOR
+	help
+	  Detect CPU locks on the virtual machine. This driver relies on the
+	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
+	  has to do a 'pet', it exits the guest through MMIO write and the
+	  backend driver takes into account the lost ticks for this particular
+	  CPU.
+	  To compile this driver as a module, choose M here: the
+	  module will be called vm-wdt.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 2ec634354cf5..9ea562d0c98b 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
 obj-$(CONFIG_HISI_HIKEY_USB)	+= hisi_hikey_usb.o
 obj-$(CONFIG_UID_SYS_STATS)	+= uid_sys_stats.o
 obj-$(CONFIG_OPEN_DICE)		+= open-dice.o
+obj-$(CONFIG_VM_WATCHDOG)	+= vm-watchdog.o
\ No newline at end of file
diff --git a/drivers/misc/vm-watchdog.c b/drivers/misc/vm-watchdog.c
new file mode 100644
index 000000000000..1d344e9a84a8
--- /dev/null
+++ b/drivers/misc/vm-watchdog.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Virtual watchdog driver.
+//  Copyright (C) Google, 2022
+
+#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/param.h>
+#include <linux/percpu.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#define VMWDT_REG_STATUS		(0x00)
+#define VMWDT_REG_LOAD_CNT		(0x04)
+#define VMWDT_REG_CURRENT_CNT		(0x08)
+#define VMWDT_REG_CLOCK_FREQ_HZ		(0x0C)
+#define VMWDT_REG_LEN			(0x10)
+
+#define VMWDT_DEFAULT_CLOCK_HZ		(10)
+#define VMWDT_DEFAULT_TIMEOT_SEC	(8)
+
+struct vm_wdt_s {
+	void __iomem *membase;
+	u32 clock_freq;
+	u32 expiration_sec;
+	u32 ping_timeout_ms;
+	struct hrtimer per_cpu_hrtimer;
+	struct platform_device *dev;
+};
+
+#define vmwdt_reg_write(wdt, reg, value)	\
+	iowrite32((value), (wdt)->membase + (reg))
+#define vmwdt_reg_read(wdt, reg)		\
+	io32read((wdt)->membase + (reg))
+
+static struct platform_device *virt_dev;
+
+static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer)
+{
+	struct vm_wdt_s *cpu_wdt;
+	u32 ticks;
+
+	cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer);
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+	hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms));
+
+	return HRTIMER_RESTART;
+}
+
+static void vmwdt_start(void *arg)
+{
+	u32 ticks;
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_CLOCK_FREQ_HZ,
+			cpu_wdt->clock_freq);
+
+	/* Compute the number of ticks required for the watchdog counter
+	 * register based on the internal clock frequency and the watchdog
+	 * timeout given from the device tree.
+	 */
+	ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec;
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks);
+
+	/* Enable the internal clock and start the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 1);
+
+	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer->function = vmwdt_timer_fn;
+	hrtimer_start(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+static void vmwdt_stop(void *arg)
+{
+	struct vm_wdt_s *cpu_wdt = arg;
+	struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer;
+
+	hrtimer_cancel(hrtimer);
+
+	/* Disable the watchdog */
+	vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 0);
+}
+
+static int start_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_start(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int stop_watchdog_on_cpu(unsigned int cpu)
+{
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev);
+
+	vmwdt_stop(this_cpu_ptr(vm_wdt));
+	return 0;
+}
+
+static int vmwdt_probe(struct platform_device *dev)
+{
+	int cpu, ret, err;
+	void __iomem *membase;
+	struct resource *r;
+	struct vm_wdt_s *vm_wdt;
+	u32 wdt_clock, wdt_timeout_sec = 0;
+
+	r = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (r == NULL)
+		return -ENOENT;
+
+	vm_wdt = alloc_percpu(typeof(struct vm_wdt_s));
+	if (!vm_wdt)
+		return -ENOMEM;
+
+	membase = ioremap(r->start, resource_size(r));
+	if (!membase) {
+		ret = -ENXIO;
+		goto err_withmem;
+	}
+
+	virt_dev = dev;
+	platform_set_drvdata(dev, vm_wdt);
+	if (of_property_read_u32(dev->dev.of_node, "clock-frequency",
+				 &wdt_clock))
+		wdt_clock = VMWDT_DEFAULT_CLOCK_HZ;
+
+	if (of_property_read_u32(dev->dev.of_node, "timeout-sec",
+				 &wdt_timeout_sec))
+		wdt_timeout_sec = VMWDT_DEFAULT_TIMEOT_SEC;
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		cpu_wdt->membase = membase + cpu * VMWDT_REG_LEN;
+		cpu_wdt->clock_freq = wdt_clock;
+		cpu_wdt->expiration_sec = wdt_timeout_sec;
+		cpu_wdt->ping_timeout_ms = wdt_timeout_sec * MSEC_PER_SEC / 2;
+		smp_call_function_single(cpu, vmwdt_start, cpu_wdt, true);
+	}
+
+	err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					"virt/watchdog:online",
+					start_watchdog_on_cpu,
+					stop_watchdog_on_cpu);
+	if (err < 0) {
+		dev_warn(&dev->dev, "failed to install cpu hotplug");
+		ret = err;
+		goto err_withmem;
+	}
+
+	return 0;
+
+err_withmem:
+	free_percpu(vm_wdt);
+	return ret;
+}
+
+static int vmwdt_remove(struct platform_device *dev)
+{
+	int cpu;
+	struct vm_wdt_s *vm_wdt = platform_get_drvdata(dev);
+
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) {
+		struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu);
+
+		smp_call_function_single(cpu, vmwdt_stop, cpu_wdt, true);
+	}
+
+	free_percpu(vm_wdt);
+	return 0;
+}
+
+static const struct of_device_id vmwdt_of_match[] = {
+	{ .compatible = "qemu,vm-watchdog", },
+	{}
+};
+
+MODULE_DEVICE_TABLE(of, vm_watchdog_of_match);
+
+static struct platform_driver vmwdt_driver = {
+	.probe  = vmwdt_probe,
+	.remove = vmwdt_remove,
+	.driver = {
+		.name           = KBUILD_MODNAME,
+		.of_match_table = vmwdt_of_match,
+	},
+};
+
+module_platform_driver(vmwdt_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Sebastian Ene <sebastianene@google.com>");
+MODULE_DESCRIPTION("Virtual watchdog driver");
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-04-29  8:48   ` Greg Kroah-Hartman
  2022-04-29 16:51     ` Guenter Roeck
  2022-04-29  8:51   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-29  8:48 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret, Guenter Roeck

On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> This driver creates per-cpu hrtimers which are required to do the
> periodic 'pet' operation. On a conventional watchdog-core driver, the
> userspace is responsible for delivering the 'pet' events by writing to
> the particular /dev/watchdogN node. In this case we require a strong
> thread affinity to be able to account for lost time on a per vCPU.
> 
> This part of the driver is the 'frontend' which is reponsible for
> delivering the periodic 'pet' events, configuring the virtual peripheral
> and listening for cpu hotplug events. The other part of the driver
> handles the peripheral emulation and this part accounts for lost time by
> looking at the /proc/{}/task/{}/stat entries and is located here:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  drivers/misc/Kconfig       |  12 +++
>  drivers/misc/Makefile      |   1 +
>  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 219 insertions(+)
>  create mode 100644 drivers/misc/vm-watchdog.c
> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 2b9572a6d114..26c3a99e269c 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -493,6 +493,18 @@ config OPEN_DICE
>  
>  	  If unsure, say N.
>  
> +config VM_WATCHDOG
> +	tristate "Virtual Machine Watchdog"
> +	select LOCKUP_DETECTOR
> +	help
> +	  Detect CPU locks on the virtual machine. This driver relies on the
> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> +	  has to do a 'pet', it exits the guest through MMIO write and the
> +	  backend driver takes into account the lost ticks for this particular
> +	  CPU.
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called vm-wdt.

You forgot to name the module properly here based on the Makefile change
you made.

And again, as this is called a "watchdog", it seems crazy that it is not
in drivers/watchdog/

{sigh}

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
  2022-04-29  8:48   ` Greg Kroah-Hartman
@ 2022-04-29  8:51   ` Greg Kroah-Hartman
  2022-04-29  9:26     ` Sebastian Ene
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-29  8:51 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret, Guenter Roeck

On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> This driver creates per-cpu hrtimers which are required to do the
> periodic 'pet' operation. On a conventional watchdog-core driver, the
> userspace is responsible for delivering the 'pet' events by writing to
> the particular /dev/watchdogN node. In this case we require a strong
> thread affinity to be able to account for lost time on a per vCPU.
> 
> This part of the driver is the 'frontend' which is reponsible for
> delivering the periodic 'pet' events, configuring the virtual peripheral
> and listening for cpu hotplug events. The other part of the driver
> handles the peripheral emulation and this part accounts for lost time by
> looking at the /proc/{}/task/{}/stat entries and is located here:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  drivers/misc/Kconfig       |  12 +++
>  drivers/misc/Makefile      |   1 +
>  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 219 insertions(+)
>  create mode 100644 drivers/misc/vm-watchdog.c
> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 2b9572a6d114..26c3a99e269c 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -493,6 +493,18 @@ config OPEN_DICE
>  
>  	  If unsure, say N.
>  
> +config VM_WATCHDOG
> +	tristate "Virtual Machine Watchdog"
> +	select LOCKUP_DETECTOR
> +	help
> +	  Detect CPU locks on the virtual machine. This driver relies on the
> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> +	  has to do a 'pet', it exits the guest through MMIO write and the
> +	  backend driver takes into account the lost ticks for this particular
> +	  CPU.

There's nothing to keep this tied to a virtual machine at all, right?
You are just relying on some iomem address to be updated, so it should
be a "generic_iomem_watchdog" driver as there's nothing specific to vms
at all from what I can tell.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  8:51   ` Greg Kroah-Hartman
@ 2022-04-29  9:26     ` Sebastian Ene
  2022-04-29  9:38       ` Greg Kroah-Hartman
  2022-04-29 17:02       ` Guenter Roeck
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-04-29  9:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, Rob Herring, Guenter Roeck

On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per vCPU.
> > 
> > This part of the driver is the 'frontend' which is reponsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver
> > handles the peripheral emulation and this part accounts for lost time by
> > looking at the /proc/{}/task/{}/stat entries and is located here:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > 
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> >  drivers/misc/Kconfig       |  12 +++
> >  drivers/misc/Makefile      |   1 +
> >  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> >  3 files changed, 219 insertions(+)
> >  create mode 100644 drivers/misc/vm-watchdog.c
> > 
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 2b9572a6d114..26c3a99e269c 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -493,6 +493,18 @@ config OPEN_DICE
> >  
> >  	  If unsure, say N.
> >  
> > +config VM_WATCHDOG
> > +	tristate "Virtual Machine Watchdog"
> > +	select LOCKUP_DETECTOR
> > +	help
> > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > +	  backend driver takes into account the lost ticks for this particular
> > +	  CPU.

Hi,

> 
> There's nothing to keep this tied to a virtual machine at all, right?
> You are just relying on some iomem address to be updated, so it should
> be a "generic_iomem_watchdog" driver as there's nothing specific to vms
> at all from what I can tell.
> 
> thanks,
> 
> greg k-h

That's right although I might think of using the term "generic lockup detector"
instead of watchdog. The only reason why I would keep "virtual machine"
word in, is that there is no actual hardware for this.

Thanks,
Seb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  9:26     ` Sebastian Ene
@ 2022-04-29  9:38       ` Greg Kroah-Hartman
  2022-04-29 21:03         ` Rob Herring
  2022-04-29 17:02       ` Guenter Roeck
  1 sibling, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-29  9:38 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, Rob Herring, Guenter Roeck

On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote:
> On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > This driver creates per-cpu hrtimers which are required to do the
> > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > userspace is responsible for delivering the 'pet' events by writing to
> > > the particular /dev/watchdogN node. In this case we require a strong
> > > thread affinity to be able to account for lost time on a per vCPU.
> > > 
> > > This part of the driver is the 'frontend' which is reponsible for
> > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > and listening for cpu hotplug events. The other part of the driver
> > > handles the peripheral emulation and this part accounts for lost time by
> > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > 
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > >  drivers/misc/Kconfig       |  12 +++
> > >  drivers/misc/Makefile      |   1 +
> > >  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 219 insertions(+)
> > >  create mode 100644 drivers/misc/vm-watchdog.c
> > > 
> > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > index 2b9572a6d114..26c3a99e269c 100644
> > > --- a/drivers/misc/Kconfig
> > > +++ b/drivers/misc/Kconfig
> > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > >  
> > >  	  If unsure, say N.
> > >  
> > > +config VM_WATCHDOG
> > > +	tristate "Virtual Machine Watchdog"
> > > +	select LOCKUP_DETECTOR
> > > +	help
> > > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > > +	  backend driver takes into account the lost ticks for this particular
> > > +	  CPU.
> 
> Hi,
> 
> > 
> > There's nothing to keep this tied to a virtual machine at all, right?
> > You are just relying on some iomem address to be updated, so it should
> > be a "generic_iomem_watchdog" driver as there's nothing specific to vms
> > at all from what I can tell.
> > 
> > thanks,
> > 
> > greg k-h
> 
> That's right although I might think of using the term "generic lockup detector"
> instead of watchdog. The only reason why I would keep "virtual machine"
> word in, is that there is no actual hardware for this.

That doesn't really matter, it's just a memory location in device tree
that you are needing, odds are some hardware device could use it just
like this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
@ 2022-04-29 15:53   ` Rob Herring
  2022-05-07  8:16   ` Sebastian Ene
  1 sibling, 0 replies; 18+ messages in thread
From: Rob Herring @ 2022-04-29 15:53 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: devicetree, qperret, maz, Arnd Bergmann, Rob Herring, will,
	linux-kernel, Dragan Cvetic, Guenter Roeck, Greg Kroah-Hartman

On Fri, 29 Apr 2022 08:30:31 +0000, Sebastian Ene wrote:
> The stall detection mechanism allows to configure the expiration
> duration and the internal counter clock frequency measured in Hz.
> Add these properties in the schema.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-watchdog.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'}
	hint: Standard unit suffix properties don't need a type $ref
	from schema $id: http://devicetree.org/meta-schemas/core.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-watchdog.yaml: ignoring, error in schema: properties: timeout-sec
Documentation/devicetree/bindings/misc/vm-watchdog.example.dtb:0:0: /example-0/watchdog: failed to match any schema with compatible: ['qemu,vm-watchdog']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  8:48   ` Greg Kroah-Hartman
@ 2022-04-29 16:51     ` Guenter Roeck
  2022-04-30  6:18       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2022-04-29 16:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Sebastian Ene
  Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret

On 4/29/22 01:48, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
>> This driver creates per-cpu hrtimers which are required to do the
>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>> userspace is responsible for delivering the 'pet' events by writing to
>> the particular /dev/watchdogN node. In this case we require a strong
>> thread affinity to be able to account for lost time on a per vCPU.
>>
>> This part of the driver is the 'frontend' which is reponsible for
>> delivering the periodic 'pet' events, configuring the virtual peripheral
>> and listening for cpu hotplug events. The other part of the driver
>> handles the peripheral emulation and this part accounts for lost time by
>> looking at the /proc/{}/task/{}/stat entries and is located here:
>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>
>> Signed-off-by: Sebastian Ene <sebastianene@google.com>
>> ---
>>   drivers/misc/Kconfig       |  12 +++
>>   drivers/misc/Makefile      |   1 +
>>   drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>>   3 files changed, 219 insertions(+)
>>   create mode 100644 drivers/misc/vm-watchdog.c
>>
>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>> index 2b9572a6d114..26c3a99e269c 100644
>> --- a/drivers/misc/Kconfig
>> +++ b/drivers/misc/Kconfig
>> @@ -493,6 +493,18 @@ config OPEN_DICE
>>   
>>   	  If unsure, say N.
>>   
>> +config VM_WATCHDOG
>> +	tristate "Virtual Machine Watchdog"
>> +	select LOCKUP_DETECTOR
>> +	help
>> +	  Detect CPU locks on the virtual machine. This driver relies on the
>> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
>> +	  has to do a 'pet', it exits the guest through MMIO write and the
>> +	  backend driver takes into account the lost ticks for this particular
>> +	  CPU.
>> +	  To compile this driver as a module, choose M here: the
>> +	  module will be called vm-wdt.
> 
> You forgot to name the module properly here based on the Makefile change
> you made.
> 
> And again, as this is called a "watchdog", it seems crazy that it is not
> in drivers/watchdog/
> 

I disagree. It is not a watchdog driver in the traditional sense (it does
not use, want to use, or need to use the watchdog driver API or ABI).
Its functionality is similar to the functionality of kernel/watchdog.c,
which doesn't belong into drivers/watchdog either.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  9:26     ` Sebastian Ene
  2022-04-29  9:38       ` Greg Kroah-Hartman
@ 2022-04-29 17:02       ` Guenter Roeck
  1 sibling, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2022-04-29 17:02 UTC (permalink / raw)
  To: Sebastian Ene, Greg Kroah-Hartman
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, Rob Herring

On 4/29/22 02:26, Sebastian Ene wrote:
> On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
>> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
>>> This driver creates per-cpu hrtimers which are required to do the
>>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>>> userspace is responsible for delivering the 'pet' events by writing to
>>> the particular /dev/watchdogN node. In this case we require a strong
>>> thread affinity to be able to account for lost time on a per vCPU.
>>>
>>> This part of the driver is the 'frontend' which is reponsible for
>>> delivering the periodic 'pet' events, configuring the virtual peripheral
>>> and listening for cpu hotplug events. The other part of the driver
>>> handles the peripheral emulation and this part accounts for lost time by
>>> looking at the /proc/{}/task/{}/stat entries and is located here:
>>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>>
>>> Signed-off-by: Sebastian Ene <sebastianene@google.com>
>>> ---
>>>   drivers/misc/Kconfig       |  12 +++
>>>   drivers/misc/Makefile      |   1 +
>>>   drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>>>   3 files changed, 219 insertions(+)
>>>   create mode 100644 drivers/misc/vm-watchdog.c
>>>
>>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>>> index 2b9572a6d114..26c3a99e269c 100644
>>> --- a/drivers/misc/Kconfig
>>> +++ b/drivers/misc/Kconfig
>>> @@ -493,6 +493,18 @@ config OPEN_DICE
>>>   
>>>   	  If unsure, say N.
>>>   
>>> +config VM_WATCHDOG
>>> +	tristate "Virtual Machine Watchdog"
>>> +	select LOCKUP_DETECTOR
>>> +	help
>>> +	  Detect CPU locks on the virtual machine. This driver relies on the
>>> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
>>> +	  has to do a 'pet', it exits the guest through MMIO write and the
>>> +	  backend driver takes into account the lost ticks for this particular
>>> +	  CPU.
> 
> Hi,
> 
>>
>> There's nothing to keep this tied to a virtual machine at all, right?
>> You are just relying on some iomem address to be updated, so it should
>> be a "generic_iomem_watchdog" driver as there's nothing specific to vms
>> at all from what I can tell.
>>
>> thanks,
>>
>> greg k-h
> 
> That's right although I might think of using the term "generic lockup detector"

Agreed, that would be a much better name.

Guenter


> instead of watchdog. The only reason why I would keep "virtual machine"
> word in, is that there is no actual hardware for this.
> 
> Thanks,
> Seb


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/2] Detect stalls on guest vCPUS
  2022-04-29  8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene
  2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
  2022-04-29  8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
@ 2022-04-29 20:25 ` Rob Herring
  2022-05-02  6:03   ` Sebastian Ene
  2 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2022-04-29 20:25 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic, linux-kernel,
	devicetree, maz, will, qperret, Guenter Roeck

On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote:
> This adds a mechanism to detect stalls on the guest vCPUS by creating a
> per CPU hrtimer which periodically 'pets' the host backend driver.
> On a conventional watchdog-core driver, the userspace is responsible for
> delivering the 'pet' events by writing to the particular /dev/watchdogN node.
> In this case we require a strong thread affinity to be able to
> account for lost time on a per vCPU basis.
> 
> This device driver acts as a soft lockup detector by relying on the host
> backend driver to measure the elapesed time between subsequent 'pet' events.
> If the elapsed time doesn't match an expected value, the backend driver
> decides that the guest vCPU is locked and resets the guest. The host
> backend driver takes into account the time that the guest is not
> running. The communication with the backend driver is done through MMIO
> and the register layout of the virtual watchdog is described as part of
> the backend driver changes.
> 
> The host backend driver is implemented as part of:
> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> 
> Changelog v4:
>  - rename the source from vm-wdt.c -> vm-watchdog.c
>  - convert all the error logging calls from pr_* to dev_* calls
>  - rename the DTS node "clock" to "clock-frequency"

Why do I have a v4 now when the discussion on v3 is not concluded. Give 
folks some time to respond. We're busy drinking from the firehose.

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29  9:38       ` Greg Kroah-Hartman
@ 2022-04-29 21:03         ` Rob Herring
  2022-05-04  7:29           ` Sebastian Ene
  0 siblings, 1 reply; 18+ messages in thread
From: Rob Herring @ 2022-04-29 21:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sebastian Ene, linux-kernel, Derek Kiernan, Dragan Cvetic,
	Arnd Bergmann, devicetree, qperret, will, Guenter Roeck

On Fri, Apr 29, 2022 at 11:38:52AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote:
> > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > > This driver creates per-cpu hrtimers which are required to do the
> > > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > > userspace is responsible for delivering the 'pet' events by writing to
> > > > the particular /dev/watchdogN node. In this case we require a strong
> > > > thread affinity to be able to account for lost time on a per vCPU.
> > > > 
> > > > This part of the driver is the 'frontend' which is reponsible for
> > > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > > and listening for cpu hotplug events. The other part of the driver
> > > > handles the peripheral emulation and this part accounts for lost time by
> > > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > > 
> > > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > > ---
> > > >  drivers/misc/Kconfig       |  12 +++
> > > >  drivers/misc/Makefile      |   1 +
> > > >  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 219 insertions(+)
> > > >  create mode 100644 drivers/misc/vm-watchdog.c
> > > > 
> > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > > index 2b9572a6d114..26c3a99e269c 100644
> > > > --- a/drivers/misc/Kconfig
> > > > +++ b/drivers/misc/Kconfig
> > > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > > >  
> > > >  	  If unsure, say N.
> > > >  
> > > > +config VM_WATCHDOG
> > > > +	tristate "Virtual Machine Watchdog"
> > > > +	select LOCKUP_DETECTOR
> > > > +	help
> > > > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > > > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > > > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > > > +	  backend driver takes into account the lost ticks for this particular
> > > > +	  CPU.
> > 
> > Hi,
> > 
> > > 
> > > There's nothing to keep this tied to a virtual machine at all, right?
> > > You are just relying on some iomem address to be updated, so it should
> > > be a "generic_iomem_watchdog" driver as there's nothing specific to vms
> > > at all from what I can tell.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > That's right although I might think of using the term "generic lockup detector"
> > instead of watchdog. The only reason why I would keep "virtual machine"
> > word in, is that there is no actual hardware for this.
> 
> That doesn't really matter, it's just a memory location in device tree
> that you are needing, odds are some hardware device could use it just
> like this.

Such as a shared on-chip memory that both a system control processor and 
the main processors can access. Of course, those also typically already 
have a comnunication channel.

But for a VM-hypervisor interface, why isn't one of the existing 
communications interfaces being used? One that is discoverable would be 
better than using DT.

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29 16:51     ` Guenter Roeck
@ 2022-04-30  6:18       ` Greg Kroah-Hartman
  2022-04-30 12:36         ` Guenter Roeck
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-30  6:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Sebastian Ene, Rob Herring, Arnd Bergmann, Dragan Cvetic,
	linux-kernel, devicetree, maz, will, qperret

On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote:
> On 4/29/22 01:48, Greg Kroah-Hartman wrote:
> > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > This driver creates per-cpu hrtimers which are required to do the
> > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > userspace is responsible for delivering the 'pet' events by writing to
> > > the particular /dev/watchdogN node. In this case we require a strong
> > > thread affinity to be able to account for lost time on a per vCPU.
> > > 
> > > This part of the driver is the 'frontend' which is reponsible for
> > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > and listening for cpu hotplug events. The other part of the driver
> > > handles the peripheral emulation and this part accounts for lost time by
> > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > 
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > >   drivers/misc/Kconfig       |  12 +++
> > >   drivers/misc/Makefile      |   1 +
> > >   drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 219 insertions(+)
> > >   create mode 100644 drivers/misc/vm-watchdog.c
> > > 
> > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > index 2b9572a6d114..26c3a99e269c 100644
> > > --- a/drivers/misc/Kconfig
> > > +++ b/drivers/misc/Kconfig
> > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > >   	  If unsure, say N.
> > > +config VM_WATCHDOG
> > > +	tristate "Virtual Machine Watchdog"
> > > +	select LOCKUP_DETECTOR
> > > +	help
> > > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > > +	  backend driver takes into account the lost ticks for this particular
> > > +	  CPU.
> > > +	  To compile this driver as a module, choose M here: the
> > > +	  module will be called vm-wdt.
> > 
> > You forgot to name the module properly here based on the Makefile change
> > you made.
> > 
> > And again, as this is called a "watchdog", it seems crazy that it is not
> > in drivers/watchdog/
> > 
> 
> I disagree. It is not a watchdog driver in the traditional sense (it does
> not use, want to use, or need to use the watchdog driver API or ABI).
> Its functionality is similar to the functionality of kernel/watchdog.c,
> which doesn't belong into drivers/watchdog either.

Ah, ok, that makes more sense, the user/kernel api is not the same.
Someone should put that in the changelog next time :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-30  6:18       ` Greg Kroah-Hartman
@ 2022-04-30 12:36         ` Guenter Roeck
  2022-05-02  5:58           ` Sebastian Ene
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2022-04-30 12:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sebastian Ene, Rob Herring, Arnd Bergmann, Dragan Cvetic,
	linux-kernel, devicetree, maz, will, qperret

On 4/29/22 23:18, Greg Kroah-Hartman wrote:
> On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote:
>> On 4/29/22 01:48, Greg Kroah-Hartman wrote:
>>> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
>>>> This driver creates per-cpu hrtimers which are required to do the
>>>> periodic 'pet' operation. On a conventional watchdog-core driver, the
>>>> userspace is responsible for delivering the 'pet' events by writing to
>>>> the particular /dev/watchdogN node. In this case we require a strong
>>>> thread affinity to be able to account for lost time on a per vCPU.
>>>>
>>>> This part of the driver is the 'frontend' which is reponsible for
>>>> delivering the periodic 'pet' events, configuring the virtual peripheral
>>>> and listening for cpu hotplug events. The other part of the driver
>>>> handles the peripheral emulation and this part accounts for lost time by
>>>> looking at the /proc/{}/task/{}/stat entries and is located here:
>>>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
>>>>
>>>> Signed-off-by: Sebastian Ene <sebastianene@google.com>
>>>> ---
>>>>    drivers/misc/Kconfig       |  12 +++
>>>>    drivers/misc/Makefile      |   1 +
>>>>    drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
>>>>    3 files changed, 219 insertions(+)
>>>>    create mode 100644 drivers/misc/vm-watchdog.c
>>>>
>>>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>>>> index 2b9572a6d114..26c3a99e269c 100644
>>>> --- a/drivers/misc/Kconfig
>>>> +++ b/drivers/misc/Kconfig
>>>> @@ -493,6 +493,18 @@ config OPEN_DICE
>>>>    	  If unsure, say N.
>>>> +config VM_WATCHDOG
>>>> +	tristate "Virtual Machine Watchdog"
>>>> +	select LOCKUP_DETECTOR
>>>> +	help
>>>> +	  Detect CPU locks on the virtual machine. This driver relies on the
>>>> +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
>>>> +	  has to do a 'pet', it exits the guest through MMIO write and the
>>>> +	  backend driver takes into account the lost ticks for this particular
>>>> +	  CPU.
>>>> +	  To compile this driver as a module, choose M here: the
>>>> +	  module will be called vm-wdt.
>>>
>>> You forgot to name the module properly here based on the Makefile change
>>> you made.
>>>
>>> And again, as this is called a "watchdog", it seems crazy that it is not
>>> in drivers/watchdog/
>>>
>>
>> I disagree. It is not a watchdog driver in the traditional sense (it does
>> not use, want to use, or need to use the watchdog driver API or ABI).
>> Its functionality is similar to the functionality of kernel/watchdog.c,
>> which doesn't belong into drivers/watchdog either.
> 
> Ah, ok, that makes more sense, the user/kernel api is not the same.
> Someone should put that in the changelog next time :)
> 

Renaming it to "VCPU stall detector" or similar should fix the confusion.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-30 12:36         ` Guenter Roeck
@ 2022-05-02  5:58           ` Sebastian Ene
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-05-02  5:58 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, maz, Greg Kroah-Hartman, Rob Herring

On Sat, Apr 30, 2022 at 05:36:50AM -0700, Guenter Roeck wrote:
> On 4/29/22 23:18, Greg Kroah-Hartman wrote:
> > On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote:
> > > On 4/29/22 01:48, Greg Kroah-Hartman wrote:
> > > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > > > This driver creates per-cpu hrtimers which are required to do the
> > > > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > > > userspace is responsible for delivering the 'pet' events by writing to
> > > > > the particular /dev/watchdogN node. In this case we require a strong
> > > > > thread affinity to be able to account for lost time on a per vCPU.
> > > > > 
> > > > > This part of the driver is the 'frontend' which is reponsible for
> > > > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > > > and listening for cpu hotplug events. The other part of the driver
> > > > > handles the peripheral emulation and this part accounts for lost time by
> > > > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > > > 
> > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > > > ---
> > > > >    drivers/misc/Kconfig       |  12 +++
> > > > >    drivers/misc/Makefile      |   1 +
> > > > >    drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > > > >    3 files changed, 219 insertions(+)
> > > > >    create mode 100644 drivers/misc/vm-watchdog.c
> > > > > 
> > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > > > index 2b9572a6d114..26c3a99e269c 100644
> > > > > --- a/drivers/misc/Kconfig
> > > > > +++ b/drivers/misc/Kconfig
> > > > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > > > >    	  If unsure, say N.
> > > > > +config VM_WATCHDOG
> > > > > +	tristate "Virtual Machine Watchdog"
> > > > > +	select LOCKUP_DETECTOR
> > > > > +	help
> > > > > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > > > > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > > > > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > > > > +	  backend driver takes into account the lost ticks for this particular
> > > > > +	  CPU.
> > > > > +	  To compile this driver as a module, choose M here: the
> > > > > +	  module will be called vm-wdt.
> > > > 
> > > > You forgot to name the module properly here based on the Makefile change
> > > > you made.
> > > > 
> > > > And again, as this is called a "watchdog", it seems crazy that it is not
> > > > in drivers/watchdog/
> > > > 
> > > 
> > > I disagree. It is not a watchdog driver in the traditional sense (it does
> > > not use, want to use, or need to use the watchdog driver API or ABI).
> > > Its functionality is similar to the functionality of kernel/watchdog.c,
> > > which doesn't belong into drivers/watchdog either.

Hi,

> > 
> > Ah, ok, that makes more sense, the user/kernel api is not the same.
> > Someone should put that in the changelog next time :)
> > 

I will include this in the changelog to fix the confusion.

> 
> Renaming it to "VCPU stall detector" or similar should fix the confusion.
> 

Yes I think that is a better naming.

Thanks,
Seb

> Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/2] Detect stalls on guest vCPUS
  2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring
@ 2022-05-02  6:03   ` Sebastian Ene
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-05-02  6:03 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, maz, Greg Kroah-Hartman,
	Guenter Roeck

On Fri, Apr 29, 2022 at 03:25:45PM -0500, Rob Herring wrote:
> On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote:
> > This adds a mechanism to detect stalls on the guest vCPUS by creating a
> > per CPU hrtimer which periodically 'pets' the host backend driver.
> > On a conventional watchdog-core driver, the userspace is responsible for
> > delivering the 'pet' events by writing to the particular /dev/watchdogN node.
> > In this case we require a strong thread affinity to be able to
> > account for lost time on a per vCPU basis.
> > 
> > This device driver acts as a soft lockup detector by relying on the host
> > backend driver to measure the elapesed time between subsequent 'pet' events.
> > If the elapsed time doesn't match an expected value, the backend driver
> > decides that the guest vCPU is locked and resets the guest. The host
> > backend driver takes into account the time that the guest is not
> > running. The communication with the backend driver is done through MMIO
> > and the register layout of the virtual watchdog is described as part of
> > the backend driver changes.
> > 
> > The host backend driver is implemented as part of:
> > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > 
> > Changelog v4:
> >  - rename the source from vm-wdt.c -> vm-watchdog.c
> >  - convert all the error logging calls from pr_* to dev_* calls
> >  - rename the DTS node "clock" to "clock-frequency"

Hi,

> 
> Why do I have a v4 now when the discussion on v3 is not concluded. Give 
> folks some time to respond. We're busy drinking from the firehose.
> 

I am trying to address the issues incrementlly keeping a week cadence.
Any feedback on this is welcomed.

Thanks,
Seb

> Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
  2022-04-29 21:03         ` Rob Herring
@ 2022-05-04  7:29           ` Sebastian Ene
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-05-04  7:29 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, maz, Greg Kroah-Hartman

On Fri, Apr 29, 2022 at 04:03:45PM -0500, Rob Herring wrote:
> On Fri, Apr 29, 2022 at 11:38:52AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote:
> > > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote:
> > > > > This driver creates per-cpu hrtimers which are required to do the
> > > > > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > > > > userspace is responsible for delivering the 'pet' events by writing to
> > > > > the particular /dev/watchdogN node. In this case we require a strong
> > > > > thread affinity to be able to account for lost time on a per vCPU.
> > > > > 
> > > > > This part of the driver is the 'frontend' which is reponsible for
> > > > > delivering the periodic 'pet' events, configuring the virtual peripheral
> > > > > and listening for cpu hotplug events. The other part of the driver
> > > > > handles the peripheral emulation and this part accounts for lost time by
> > > > > looking at the /proc/{}/task/{}/stat entries and is located here:
> > > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817
> > > > > 
> > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > > > ---
> > > > >  drivers/misc/Kconfig       |  12 +++
> > > > >  drivers/misc/Makefile      |   1 +
> > > > >  drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++
> > > > >  3 files changed, 219 insertions(+)
> > > > >  create mode 100644 drivers/misc/vm-watchdog.c
> > > > > 
> > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > > > > index 2b9572a6d114..26c3a99e269c 100644
> > > > > --- a/drivers/misc/Kconfig
> > > > > +++ b/drivers/misc/Kconfig
> > > > > @@ -493,6 +493,18 @@ config OPEN_DICE
> > > > >  
> > > > >  	  If unsure, say N.
> > > > >  
> > > > > +config VM_WATCHDOG
> > > > > +	tristate "Virtual Machine Watchdog"
> > > > > +	select LOCKUP_DETECTOR
> > > > > +	help
> > > > > +	  Detect CPU locks on the virtual machine. This driver relies on the
> > > > > +	  hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU
> > > > > +	  has to do a 'pet', it exits the guest through MMIO write and the
> > > > > +	  backend driver takes into account the lost ticks for this particular
> > > > > +	  CPU.
> > > 
> > > Hi,
> > > 
> > > > 
> > > > There's nothing to keep this tied to a virtual machine at all, right?
> > > > You are just relying on some iomem address to be updated, so it should
> > > > be a "generic_iomem_watchdog" driver as there's nothing specific to vms
> > > > at all from what I can tell.
> > > > 
> > > > thanks,
> > > > 
> > > > greg k-h
> > > 
> > > That's right although I might think of using the term "generic lockup detector"
> > > instead of watchdog. The only reason why I would keep "virtual machine"
> > > word in, is that there is no actual hardware for this.
> > 
> > That doesn't really matter, it's just a memory location in device tree
> > that you are needing, odds are some hardware device could use it just
> > like this.

Hi,

> 
> Such as a shared on-chip memory that both a system control processor and 
> the main processors can access. Of course, those also typically already 
> have a comnunication channel.
> 
> But for a VM-hypervisor interface, why isn't one of the existing 
> communications interfaces being used? One that is discoverable would be 
> better than using DT.
> 

In a protected VM we don't trust the host to present and control the loaded
peripherals. We rely on another entity to generate a trusted device tree
for us. I hope this clarifies the need for DT and I think this information
should also be added in the changelog.

> Rob

Thanks,
Seb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible
  2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
  2022-04-29 15:53   ` Rob Herring
@ 2022-05-07  8:16   ` Sebastian Ene
  1 sibling, 0 replies; 18+ messages in thread
From: Sebastian Ene @ 2022-05-07  8:16 UTC (permalink / raw)
  To: Sebastian Ene
  Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann,
	devicetree, qperret, will, maz, Greg Kroah-Hartman, Rob Herring,
	Guenter Roeck

On Fri, Apr 29, 2022 at 08:30:31AM +0000, Sebastian Ene wrote:
> The stall detection mechanism allows to configure the expiration
> duration and the internal counter clock frequency measured in Hz.
> Add these properties in the schema.
> 
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> ---
>  .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++
>  1 file changed, 45 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml
> 
> diff --git a/Documentation/devicetree/bindings/misc/vm-watchdog.yaml b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml
> new file mode 100644
> index 000000000000..d7cca23357ab
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml
> @@ -0,0 +1,45 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/misc/vm-watchdog.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: VM watchdog
> +
> +description: |
> +  This binding describes a CPU stall detector mechanism for virtual cpus
> +  which is accessed through MMIO.
> +
> +maintainers:
> +  - Sebastian Ene <sebastianene@google.com>
> +
> +properties:
> +  compatible:
> +    enum:
> +      - qemu,vm-watchdog
> +  clock-frequency:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      The watchdog internal clock measure in Hz used to decrement the
> +      watchdog counter register on each tick.
> +      Defaults to 10 if unset.
> +  timeout-sec:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      The watchdog expiration timeout measured in seconds.
> +      Defaults to 8 if unset.
> +
> +required:
> +  - compatible
> +
> +additionalProperties: false

Hi,

> +
> +examples:
> +  - |
> +    watchdog {
> +      compatible = "qemu,vm-watchdog";
> +      clock-frequency = <10>;
> +      timeout-sec = <8>;
> +    };
> +
> +...
> -- 
> 2.36.0.464.gb9c8b46e94-goog
>

I was thinking to move the properties clock-frequency and timeout-sec as
part of the emulated watchdog registers and keep only the registers
property in the DT.

Thanks,
Seb

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-05-07  8:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29  8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene
2022-04-29  8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene
2022-04-29 15:53   ` Rob Herring
2022-05-07  8:16   ` Sebastian Ene
2022-04-29  8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
2022-04-29  8:48   ` Greg Kroah-Hartman
2022-04-29 16:51     ` Guenter Roeck
2022-04-30  6:18       ` Greg Kroah-Hartman
2022-04-30 12:36         ` Guenter Roeck
2022-05-02  5:58           ` Sebastian Ene
2022-04-29  8:51   ` Greg Kroah-Hartman
2022-04-29  9:26     ` Sebastian Ene
2022-04-29  9:38       ` Greg Kroah-Hartman
2022-04-29 21:03         ` Rob Herring
2022-05-04  7:29           ` Sebastian Ene
2022-04-29 17:02       ` Guenter Roeck
2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring
2022-05-02  6:03   ` Sebastian Ene

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).