* [PATCH v4 0/2] Detect stalls on guest vCPUS @ 2022-04-29 8:30 Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Sebastian Ene @ 2022-04-29 8:30 UTC (permalink / raw) To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck, Sebastian Ene This adds a mechanism to detect stalls on the guest vCPUS by creating a per CPU hrtimer which periodically 'pets' the host backend driver. On a conventional watchdog-core driver, the userspace is responsible for delivering the 'pet' events by writing to the particular /dev/watchdogN node. In this case we require a strong thread affinity to be able to account for lost time on a per vCPU basis. This device driver acts as a soft lockup detector by relying on the host backend driver to measure the elapesed time between subsequent 'pet' events. If the elapsed time doesn't match an expected value, the backend driver decides that the guest vCPU is locked and resets the guest. The host backend driver takes into account the time that the guest is not running. The communication with the backend driver is done through MMIO and the register layout of the virtual watchdog is described as part of the backend driver changes. The host backend driver is implemented as part of: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 Changelog v4: - rename the source from vm-wdt.c -> vm-watchdog.c - convert all the error logging calls from pr_* to dev_* calls - rename the DTS node "clock" to "clock-frequency" Changelog v3: - cosmetic fixes, remove pr_info and version information - improve description in the commit messag - improve description in the Kconfig help section Sebastian Ene (2): dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible misc: Add a mechanism to detect stalls on guest vCPUs .../devicetree/bindings/misc/vm-watchdog.yaml | 45 ++++ drivers/misc/Kconfig | 12 + drivers/misc/Makefile | 1 + drivers/misc/vm-watchdog.c | 206 ++++++++++++++++++ 4 files changed, 264 insertions(+) create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml create mode 100644 drivers/misc/vm-watchdog.c -- 2.36.0.464.gb9c8b46e94-goog ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible 2022-04-29 8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene @ 2022-04-29 8:30 ` Sebastian Ene 2022-04-29 15:53 ` Rob Herring 2022-05-07 8:16 ` Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene 2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring 2 siblings, 2 replies; 18+ messages in thread From: Sebastian Ene @ 2022-04-29 8:30 UTC (permalink / raw) To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck, Sebastian Ene The stall detection mechanism allows to configure the expiration duration and the internal counter clock frequency measured in Hz. Add these properties in the schema. Signed-off-by: Sebastian Ene <sebastianene@google.com> --- .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml diff --git a/Documentation/devicetree/bindings/misc/vm-watchdog.yaml b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml new file mode 100644 index 000000000000..d7cca23357ab --- /dev/null +++ b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/misc/vm-watchdog.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: VM watchdog + +description: | + This binding describes a CPU stall detector mechanism for virtual cpus + which is accessed through MMIO. + +maintainers: + - Sebastian Ene <sebastianene@google.com> + +properties: + compatible: + enum: + - qemu,vm-watchdog + clock-frequency: + $ref: /schemas/types.yaml#/definitions/uint32 + description: | + The watchdog internal clock measure in Hz used to decrement the + watchdog counter register on each tick. + Defaults to 10 if unset. + timeout-sec: + $ref: /schemas/types.yaml#/definitions/uint32 + description: | + The watchdog expiration timeout measured in seconds. + Defaults to 8 if unset. + +required: + - compatible + +additionalProperties: false + +examples: + - | + watchdog { + compatible = "qemu,vm-watchdog"; + clock-frequency = <10>; + timeout-sec = <8>; + }; + +... -- 2.36.0.464.gb9c8b46e94-goog ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene @ 2022-04-29 15:53 ` Rob Herring 2022-05-07 8:16 ` Sebastian Ene 1 sibling, 0 replies; 18+ messages in thread From: Rob Herring @ 2022-04-29 15:53 UTC (permalink / raw) To: Sebastian Ene Cc: devicetree, qperret, maz, Arnd Bergmann, Rob Herring, will, linux-kernel, Dragan Cvetic, Guenter Roeck, Greg Kroah-Hartman On Fri, 29 Apr 2022 08:30:31 +0000, Sebastian Ene wrote: > The stall detection mechanism allows to configure the expiration > duration and the internal counter clock frequency measured in Hz. > Add these properties in the schema. > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > --- > .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++ > 1 file changed, 45 insertions(+) > create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch (DT_CHECKER_FLAGS is new in v5.13): yamllint warnings/errors: dtschema/dtc warnings/errors: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-watchdog.yaml: properties:timeout-sec: '$ref' should not be valid under {'const': '$ref'} hint: Standard unit suffix properties don't need a type $ref from schema $id: http://devicetree.org/meta-schemas/core.yaml# /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/misc/vm-watchdog.yaml: ignoring, error in schema: properties: timeout-sec Documentation/devicetree/bindings/misc/vm-watchdog.example.dtb:0:0: /example-0/watchdog: failed to match any schema with compatible: ['qemu,vm-watchdog'] doc reference errors (make refcheckdocs): See https://patchwork.ozlabs.org/patch/ This check can fail if there are any dependencies. The base for a patch series is generally the most recent rc1. If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date: pip3 install dtschema --upgrade Please check and re-submit. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene 2022-04-29 15:53 ` Rob Herring @ 2022-05-07 8:16 ` Sebastian Ene 1 sibling, 0 replies; 18+ messages in thread From: Sebastian Ene @ 2022-05-07 8:16 UTC (permalink / raw) To: Sebastian Ene Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, maz, Greg Kroah-Hartman, Rob Herring, Guenter Roeck On Fri, Apr 29, 2022 at 08:30:31AM +0000, Sebastian Ene wrote: > The stall detection mechanism allows to configure the expiration > duration and the internal counter clock frequency measured in Hz. > Add these properties in the schema. > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > --- > .../devicetree/bindings/misc/vm-watchdog.yaml | 45 +++++++++++++++++++ > 1 file changed, 45 insertions(+) > create mode 100644 Documentation/devicetree/bindings/misc/vm-watchdog.yaml > > diff --git a/Documentation/devicetree/bindings/misc/vm-watchdog.yaml b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml > new file mode 100644 > index 000000000000..d7cca23357ab > --- /dev/null > +++ b/Documentation/devicetree/bindings/misc/vm-watchdog.yaml > @@ -0,0 +1,45 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/misc/vm-watchdog.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: VM watchdog > + > +description: | > + This binding describes a CPU stall detector mechanism for virtual cpus > + which is accessed through MMIO. > + > +maintainers: > + - Sebastian Ene <sebastianene@google.com> > + > +properties: > + compatible: > + enum: > + - qemu,vm-watchdog > + clock-frequency: > + $ref: /schemas/types.yaml#/definitions/uint32 > + description: | > + The watchdog internal clock measure in Hz used to decrement the > + watchdog counter register on each tick. > + Defaults to 10 if unset. > + timeout-sec: > + $ref: /schemas/types.yaml#/definitions/uint32 > + description: | > + The watchdog expiration timeout measured in seconds. > + Defaults to 8 if unset. > + > +required: > + - compatible > + > +additionalProperties: false Hi, > + > +examples: > + - | > + watchdog { > + compatible = "qemu,vm-watchdog"; > + clock-frequency = <10>; > + timeout-sec = <8>; > + }; > + > +... > -- > 2.36.0.464.gb9c8b46e94-goog > I was thinking to move the properties clock-frequency and timeout-sec as part of the emulated watchdog registers and keep only the registers property in the DT. Thanks, Seb ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene @ 2022-04-29 8:30 ` Sebastian Ene 2022-04-29 8:48 ` Greg Kroah-Hartman 2022-04-29 8:51 ` Greg Kroah-Hartman 2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring 2 siblings, 2 replies; 18+ messages in thread From: Sebastian Ene @ 2022-04-29 8:30 UTC (permalink / raw) To: Rob Herring, Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic Cc: linux-kernel, devicetree, maz, will, qperret, Guenter Roeck, Sebastian Ene This driver creates per-cpu hrtimers which are required to do the periodic 'pet' operation. On a conventional watchdog-core driver, the userspace is responsible for delivering the 'pet' events by writing to the particular /dev/watchdogN node. In this case we require a strong thread affinity to be able to account for lost time on a per vCPU. This part of the driver is the 'frontend' which is reponsible for delivering the periodic 'pet' events, configuring the virtual peripheral and listening for cpu hotplug events. The other part of the driver handles the peripheral emulation and this part accounts for lost time by looking at the /proc/{}/task/{}/stat entries and is located here: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 Signed-off-by: Sebastian Ene <sebastianene@google.com> --- drivers/misc/Kconfig | 12 +++ drivers/misc/Makefile | 1 + drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ 3 files changed, 219 insertions(+) create mode 100644 drivers/misc/vm-watchdog.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 2b9572a6d114..26c3a99e269c 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -493,6 +493,18 @@ config OPEN_DICE If unsure, say N. +config VM_WATCHDOG + tristate "Virtual Machine Watchdog" + select LOCKUP_DETECTOR + help + Detect CPU locks on the virtual machine. This driver relies on the + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU + has to do a 'pet', it exits the guest through MMIO write and the + backend driver takes into account the lost ticks for this particular + CPU. + To compile this driver as a module, choose M here: the + module will be called vm-wdt. + source "drivers/misc/c2port/Kconfig" source "drivers/misc/eeprom/Kconfig" source "drivers/misc/cb710/Kconfig" diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 2ec634354cf5..9ea562d0c98b 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -59,3 +59,4 @@ obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o obj-$(CONFIG_HISI_HIKEY_USB) += hisi_hikey_usb.o obj-$(CONFIG_UID_SYS_STATS) += uid_sys_stats.o obj-$(CONFIG_OPEN_DICE) += open-dice.o +obj-$(CONFIG_VM_WATCHDOG) += vm-watchdog.o \ No newline at end of file diff --git a/drivers/misc/vm-watchdog.c b/drivers/misc/vm-watchdog.c new file mode 100644 index 000000000000..1d344e9a84a8 --- /dev/null +++ b/drivers/misc/vm-watchdog.c @@ -0,0 +1,206 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Virtual watchdog driver. +// Copyright (C) Google, 2022 + +#include <linux/cpu.h> +#include <linux/init.h> +#include <linux/io.h> +#include <linux/kernel.h> + +#include <linux/device.h> +#include <linux/interrupt.h> +#include <linux/module.h> +#include <linux/nmi.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/param.h> +#include <linux/percpu.h> +#include <linux/platform_device.h> +#include <linux/slab.h> + +#define VMWDT_REG_STATUS (0x00) +#define VMWDT_REG_LOAD_CNT (0x04) +#define VMWDT_REG_CURRENT_CNT (0x08) +#define VMWDT_REG_CLOCK_FREQ_HZ (0x0C) +#define VMWDT_REG_LEN (0x10) + +#define VMWDT_DEFAULT_CLOCK_HZ (10) +#define VMWDT_DEFAULT_TIMEOT_SEC (8) + +struct vm_wdt_s { + void __iomem *membase; + u32 clock_freq; + u32 expiration_sec; + u32 ping_timeout_ms; + struct hrtimer per_cpu_hrtimer; + struct platform_device *dev; +}; + +#define vmwdt_reg_write(wdt, reg, value) \ + iowrite32((value), (wdt)->membase + (reg)) +#define vmwdt_reg_read(wdt, reg) \ + io32read((wdt)->membase + (reg)) + +static struct platform_device *virt_dev; + +static enum hrtimer_restart vmwdt_timer_fn(struct hrtimer *hrtimer) +{ + struct vm_wdt_s *cpu_wdt; + u32 ticks; + + cpu_wdt = container_of(hrtimer, struct vm_wdt_s, per_cpu_hrtimer); + ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec; + vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks); + hrtimer_forward_now(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms)); + + return HRTIMER_RESTART; +} + +static void vmwdt_start(void *arg) +{ + u32 ticks; + struct vm_wdt_s *cpu_wdt = arg; + struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer; + + vmwdt_reg_write(cpu_wdt, VMWDT_REG_CLOCK_FREQ_HZ, + cpu_wdt->clock_freq); + + /* Compute the number of ticks required for the watchdog counter + * register based on the internal clock frequency and the watchdog + * timeout given from the device tree. + */ + ticks = cpu_wdt->clock_freq * cpu_wdt->expiration_sec; + vmwdt_reg_write(cpu_wdt, VMWDT_REG_LOAD_CNT, ticks); + + /* Enable the internal clock and start the watchdog */ + vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 1); + + hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + hrtimer->function = vmwdt_timer_fn; + hrtimer_start(hrtimer, ms_to_ktime(cpu_wdt->ping_timeout_ms), + HRTIMER_MODE_REL_PINNED); +} + +static void vmwdt_stop(void *arg) +{ + struct vm_wdt_s *cpu_wdt = arg; + struct hrtimer *hrtimer = &cpu_wdt->per_cpu_hrtimer; + + hrtimer_cancel(hrtimer); + + /* Disable the watchdog */ + vmwdt_reg_write(cpu_wdt, VMWDT_REG_STATUS, 0); +} + +static int start_watchdog_on_cpu(unsigned int cpu) +{ + struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev); + + vmwdt_start(this_cpu_ptr(vm_wdt)); + return 0; +} + +static int stop_watchdog_on_cpu(unsigned int cpu) +{ + struct vm_wdt_s *vm_wdt = platform_get_drvdata(virt_dev); + + vmwdt_stop(this_cpu_ptr(vm_wdt)); + return 0; +} + +static int vmwdt_probe(struct platform_device *dev) +{ + int cpu, ret, err; + void __iomem *membase; + struct resource *r; + struct vm_wdt_s *vm_wdt; + u32 wdt_clock, wdt_timeout_sec = 0; + + r = platform_get_resource(dev, IORESOURCE_MEM, 0); + if (r == NULL) + return -ENOENT; + + vm_wdt = alloc_percpu(typeof(struct vm_wdt_s)); + if (!vm_wdt) + return -ENOMEM; + + membase = ioremap(r->start, resource_size(r)); + if (!membase) { + ret = -ENXIO; + goto err_withmem; + } + + virt_dev = dev; + platform_set_drvdata(dev, vm_wdt); + if (of_property_read_u32(dev->dev.of_node, "clock-frequency", + &wdt_clock)) + wdt_clock = VMWDT_DEFAULT_CLOCK_HZ; + + if (of_property_read_u32(dev->dev.of_node, "timeout-sec", + &wdt_timeout_sec)) + wdt_timeout_sec = VMWDT_DEFAULT_TIMEOT_SEC; + + for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) { + struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu); + + cpu_wdt->membase = membase + cpu * VMWDT_REG_LEN; + cpu_wdt->clock_freq = wdt_clock; + cpu_wdt->expiration_sec = wdt_timeout_sec; + cpu_wdt->ping_timeout_ms = wdt_timeout_sec * MSEC_PER_SEC / 2; + smp_call_function_single(cpu, vmwdt_start, cpu_wdt, true); + } + + err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, + "virt/watchdog:online", + start_watchdog_on_cpu, + stop_watchdog_on_cpu); + if (err < 0) { + dev_warn(&dev->dev, "failed to install cpu hotplug"); + ret = err; + goto err_withmem; + } + + return 0; + +err_withmem: + free_percpu(vm_wdt); + return ret; +} + +static int vmwdt_remove(struct platform_device *dev) +{ + int cpu; + struct vm_wdt_s *vm_wdt = platform_get_drvdata(dev); + + for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) { + struct vm_wdt_s *cpu_wdt = per_cpu_ptr(vm_wdt, cpu); + + smp_call_function_single(cpu, vmwdt_stop, cpu_wdt, true); + } + + free_percpu(vm_wdt); + return 0; +} + +static const struct of_device_id vmwdt_of_match[] = { + { .compatible = "qemu,vm-watchdog", }, + {} +}; + +MODULE_DEVICE_TABLE(of, vm_watchdog_of_match); + +static struct platform_driver vmwdt_driver = { + .probe = vmwdt_probe, + .remove = vmwdt_remove, + .driver = { + .name = KBUILD_MODNAME, + .of_match_table = vmwdt_of_match, + }, +}; + +module_platform_driver(vmwdt_driver); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Sebastian Ene <sebastianene@google.com>"); +MODULE_DESCRIPTION("Virtual watchdog driver"); -- 2.36.0.464.gb9c8b46e94-goog ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene @ 2022-04-29 8:48 ` Greg Kroah-Hartman 2022-04-29 16:51 ` Guenter Roeck 2022-04-29 8:51 ` Greg Kroah-Hartman 1 sibling, 1 reply; 18+ messages in thread From: Greg Kroah-Hartman @ 2022-04-29 8:48 UTC (permalink / raw) To: Sebastian Ene Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret, Guenter Roeck On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > This driver creates per-cpu hrtimers which are required to do the > periodic 'pet' operation. On a conventional watchdog-core driver, the > userspace is responsible for delivering the 'pet' events by writing to > the particular /dev/watchdogN node. In this case we require a strong > thread affinity to be able to account for lost time on a per vCPU. > > This part of the driver is the 'frontend' which is reponsible for > delivering the periodic 'pet' events, configuring the virtual peripheral > and listening for cpu hotplug events. The other part of the driver > handles the peripheral emulation and this part accounts for lost time by > looking at the /proc/{}/task/{}/stat entries and is located here: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > --- > drivers/misc/Kconfig | 12 +++ > drivers/misc/Makefile | 1 + > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > 3 files changed, 219 insertions(+) > create mode 100644 drivers/misc/vm-watchdog.c > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > index 2b9572a6d114..26c3a99e269c 100644 > --- a/drivers/misc/Kconfig > +++ b/drivers/misc/Kconfig > @@ -493,6 +493,18 @@ config OPEN_DICE > > If unsure, say N. > > +config VM_WATCHDOG > + tristate "Virtual Machine Watchdog" > + select LOCKUP_DETECTOR > + help > + Detect CPU locks on the virtual machine. This driver relies on the > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > + has to do a 'pet', it exits the guest through MMIO write and the > + backend driver takes into account the lost ticks for this particular > + CPU. > + To compile this driver as a module, choose M here: the > + module will be called vm-wdt. You forgot to name the module properly here based on the Makefile change you made. And again, as this is called a "watchdog", it seems crazy that it is not in drivers/watchdog/ {sigh} greg k-h ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 8:48 ` Greg Kroah-Hartman @ 2022-04-29 16:51 ` Guenter Roeck 2022-04-30 6:18 ` Greg Kroah-Hartman 0 siblings, 1 reply; 18+ messages in thread From: Guenter Roeck @ 2022-04-29 16:51 UTC (permalink / raw) To: Greg Kroah-Hartman, Sebastian Ene Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret On 4/29/22 01:48, Greg Kroah-Hartman wrote: > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: >> This driver creates per-cpu hrtimers which are required to do the >> periodic 'pet' operation. On a conventional watchdog-core driver, the >> userspace is responsible for delivering the 'pet' events by writing to >> the particular /dev/watchdogN node. In this case we require a strong >> thread affinity to be able to account for lost time on a per vCPU. >> >> This part of the driver is the 'frontend' which is reponsible for >> delivering the periodic 'pet' events, configuring the virtual peripheral >> and listening for cpu hotplug events. The other part of the driver >> handles the peripheral emulation and this part accounts for lost time by >> looking at the /proc/{}/task/{}/stat entries and is located here: >> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 >> >> Signed-off-by: Sebastian Ene <sebastianene@google.com> >> --- >> drivers/misc/Kconfig | 12 +++ >> drivers/misc/Makefile | 1 + >> drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ >> 3 files changed, 219 insertions(+) >> create mode 100644 drivers/misc/vm-watchdog.c >> >> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig >> index 2b9572a6d114..26c3a99e269c 100644 >> --- a/drivers/misc/Kconfig >> +++ b/drivers/misc/Kconfig >> @@ -493,6 +493,18 @@ config OPEN_DICE >> >> If unsure, say N. >> >> +config VM_WATCHDOG >> + tristate "Virtual Machine Watchdog" >> + select LOCKUP_DETECTOR >> + help >> + Detect CPU locks on the virtual machine. This driver relies on the >> + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU >> + has to do a 'pet', it exits the guest through MMIO write and the >> + backend driver takes into account the lost ticks for this particular >> + CPU. >> + To compile this driver as a module, choose M here: the >> + module will be called vm-wdt. > > You forgot to name the module properly here based on the Makefile change > you made. > > And again, as this is called a "watchdog", it seems crazy that it is not > in drivers/watchdog/ > I disagree. It is not a watchdog driver in the traditional sense (it does not use, want to use, or need to use the watchdog driver API or ABI). Its functionality is similar to the functionality of kernel/watchdog.c, which doesn't belong into drivers/watchdog either. Guenter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 16:51 ` Guenter Roeck @ 2022-04-30 6:18 ` Greg Kroah-Hartman 2022-04-30 12:36 ` Guenter Roeck 0 siblings, 1 reply; 18+ messages in thread From: Greg Kroah-Hartman @ 2022-04-30 6:18 UTC (permalink / raw) To: Guenter Roeck Cc: Sebastian Ene, Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote: > On 4/29/22 01:48, Greg Kroah-Hartman wrote: > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > > This driver creates per-cpu hrtimers which are required to do the > > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > > userspace is responsible for delivering the 'pet' events by writing to > > > the particular /dev/watchdogN node. In this case we require a strong > > > thread affinity to be able to account for lost time on a per vCPU. > > > > > > This part of the driver is the 'frontend' which is reponsible for > > > delivering the periodic 'pet' events, configuring the virtual peripheral > > > and listening for cpu hotplug events. The other part of the driver > > > handles the peripheral emulation and this part accounts for lost time by > > > looking at the /proc/{}/task/{}/stat entries and is located here: > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > > --- > > > drivers/misc/Kconfig | 12 +++ > > > drivers/misc/Makefile | 1 + > > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 219 insertions(+) > > > create mode 100644 drivers/misc/vm-watchdog.c > > > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > > index 2b9572a6d114..26c3a99e269c 100644 > > > --- a/drivers/misc/Kconfig > > > +++ b/drivers/misc/Kconfig > > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > If unsure, say N. > > > +config VM_WATCHDOG > > > + tristate "Virtual Machine Watchdog" > > > + select LOCKUP_DETECTOR > > > + help > > > + Detect CPU locks on the virtual machine. This driver relies on the > > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > > + has to do a 'pet', it exits the guest through MMIO write and the > > > + backend driver takes into account the lost ticks for this particular > > > + CPU. > > > + To compile this driver as a module, choose M here: the > > > + module will be called vm-wdt. > > > > You forgot to name the module properly here based on the Makefile change > > you made. > > > > And again, as this is called a "watchdog", it seems crazy that it is not > > in drivers/watchdog/ > > > > I disagree. It is not a watchdog driver in the traditional sense (it does > not use, want to use, or need to use the watchdog driver API or ABI). > Its functionality is similar to the functionality of kernel/watchdog.c, > which doesn't belong into drivers/watchdog either. Ah, ok, that makes more sense, the user/kernel api is not the same. Someone should put that in the changelog next time :) thanks, greg k-h ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-30 6:18 ` Greg Kroah-Hartman @ 2022-04-30 12:36 ` Guenter Roeck 2022-05-02 5:58 ` Sebastian Ene 0 siblings, 1 reply; 18+ messages in thread From: Guenter Roeck @ 2022-04-30 12:36 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Sebastian Ene, Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret On 4/29/22 23:18, Greg Kroah-Hartman wrote: > On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote: >> On 4/29/22 01:48, Greg Kroah-Hartman wrote: >>> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: >>>> This driver creates per-cpu hrtimers which are required to do the >>>> periodic 'pet' operation. On a conventional watchdog-core driver, the >>>> userspace is responsible for delivering the 'pet' events by writing to >>>> the particular /dev/watchdogN node. In this case we require a strong >>>> thread affinity to be able to account for lost time on a per vCPU. >>>> >>>> This part of the driver is the 'frontend' which is reponsible for >>>> delivering the periodic 'pet' events, configuring the virtual peripheral >>>> and listening for cpu hotplug events. The other part of the driver >>>> handles the peripheral emulation and this part accounts for lost time by >>>> looking at the /proc/{}/task/{}/stat entries and is located here: >>>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 >>>> >>>> Signed-off-by: Sebastian Ene <sebastianene@google.com> >>>> --- >>>> drivers/misc/Kconfig | 12 +++ >>>> drivers/misc/Makefile | 1 + >>>> drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ >>>> 3 files changed, 219 insertions(+) >>>> create mode 100644 drivers/misc/vm-watchdog.c >>>> >>>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig >>>> index 2b9572a6d114..26c3a99e269c 100644 >>>> --- a/drivers/misc/Kconfig >>>> +++ b/drivers/misc/Kconfig >>>> @@ -493,6 +493,18 @@ config OPEN_DICE >>>> If unsure, say N. >>>> +config VM_WATCHDOG >>>> + tristate "Virtual Machine Watchdog" >>>> + select LOCKUP_DETECTOR >>>> + help >>>> + Detect CPU locks on the virtual machine. This driver relies on the >>>> + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU >>>> + has to do a 'pet', it exits the guest through MMIO write and the >>>> + backend driver takes into account the lost ticks for this particular >>>> + CPU. >>>> + To compile this driver as a module, choose M here: the >>>> + module will be called vm-wdt. >>> >>> You forgot to name the module properly here based on the Makefile change >>> you made. >>> >>> And again, as this is called a "watchdog", it seems crazy that it is not >>> in drivers/watchdog/ >>> >> >> I disagree. It is not a watchdog driver in the traditional sense (it does >> not use, want to use, or need to use the watchdog driver API or ABI). >> Its functionality is similar to the functionality of kernel/watchdog.c, >> which doesn't belong into drivers/watchdog either. > > Ah, ok, that makes more sense, the user/kernel api is not the same. > Someone should put that in the changelog next time :) > Renaming it to "VCPU stall detector" or similar should fix the confusion. Guenter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-30 12:36 ` Guenter Roeck @ 2022-05-02 5:58 ` Sebastian Ene 0 siblings, 0 replies; 18+ messages in thread From: Sebastian Ene @ 2022-05-02 5:58 UTC (permalink / raw) To: Guenter Roeck Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, maz, Greg Kroah-Hartman, Rob Herring On Sat, Apr 30, 2022 at 05:36:50AM -0700, Guenter Roeck wrote: > On 4/29/22 23:18, Greg Kroah-Hartman wrote: > > On Fri, Apr 29, 2022 at 09:51:51AM -0700, Guenter Roeck wrote: > > > On 4/29/22 01:48, Greg Kroah-Hartman wrote: > > > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > > > > This driver creates per-cpu hrtimers which are required to do the > > > > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > > > > userspace is responsible for delivering the 'pet' events by writing to > > > > > the particular /dev/watchdogN node. In this case we require a strong > > > > > thread affinity to be able to account for lost time on a per vCPU. > > > > > > > > > > This part of the driver is the 'frontend' which is reponsible for > > > > > delivering the periodic 'pet' events, configuring the virtual peripheral > > > > > and listening for cpu hotplug events. The other part of the driver > > > > > handles the peripheral emulation and this part accounts for lost time by > > > > > looking at the /proc/{}/task/{}/stat entries and is located here: > > > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > > > > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > > > > --- > > > > > drivers/misc/Kconfig | 12 +++ > > > > > drivers/misc/Makefile | 1 + > > > > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > > > > 3 files changed, 219 insertions(+) > > > > > create mode 100644 drivers/misc/vm-watchdog.c > > > > > > > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > > > > index 2b9572a6d114..26c3a99e269c 100644 > > > > > --- a/drivers/misc/Kconfig > > > > > +++ b/drivers/misc/Kconfig > > > > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > > > If unsure, say N. > > > > > +config VM_WATCHDOG > > > > > + tristate "Virtual Machine Watchdog" > > > > > + select LOCKUP_DETECTOR > > > > > + help > > > > > + Detect CPU locks on the virtual machine. This driver relies on the > > > > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > > > > + has to do a 'pet', it exits the guest through MMIO write and the > > > > > + backend driver takes into account the lost ticks for this particular > > > > > + CPU. > > > > > + To compile this driver as a module, choose M here: the > > > > > + module will be called vm-wdt. > > > > > > > > You forgot to name the module properly here based on the Makefile change > > > > you made. > > > > > > > > And again, as this is called a "watchdog", it seems crazy that it is not > > > > in drivers/watchdog/ > > > > > > > > > > I disagree. It is not a watchdog driver in the traditional sense (it does > > > not use, want to use, or need to use the watchdog driver API or ABI). > > > Its functionality is similar to the functionality of kernel/watchdog.c, > > > which doesn't belong into drivers/watchdog either. Hi, > > > > Ah, ok, that makes more sense, the user/kernel api is not the same. > > Someone should put that in the changelog next time :) > > I will include this in the changelog to fix the confusion. > > Renaming it to "VCPU stall detector" or similar should fix the confusion. > Yes I think that is a better naming. Thanks, Seb > Guenter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene 2022-04-29 8:48 ` Greg Kroah-Hartman @ 2022-04-29 8:51 ` Greg Kroah-Hartman 2022-04-29 9:26 ` Sebastian Ene 1 sibling, 1 reply; 18+ messages in thread From: Greg Kroah-Hartman @ 2022-04-29 8:51 UTC (permalink / raw) To: Sebastian Ene Cc: Rob Herring, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret, Guenter Roeck On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > This driver creates per-cpu hrtimers which are required to do the > periodic 'pet' operation. On a conventional watchdog-core driver, the > userspace is responsible for delivering the 'pet' events by writing to > the particular /dev/watchdogN node. In this case we require a strong > thread affinity to be able to account for lost time on a per vCPU. > > This part of the driver is the 'frontend' which is reponsible for > delivering the periodic 'pet' events, configuring the virtual peripheral > and listening for cpu hotplug events. The other part of the driver > handles the peripheral emulation and this part accounts for lost time by > looking at the /proc/{}/task/{}/stat entries and is located here: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > --- > drivers/misc/Kconfig | 12 +++ > drivers/misc/Makefile | 1 + > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > 3 files changed, 219 insertions(+) > create mode 100644 drivers/misc/vm-watchdog.c > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > index 2b9572a6d114..26c3a99e269c 100644 > --- a/drivers/misc/Kconfig > +++ b/drivers/misc/Kconfig > @@ -493,6 +493,18 @@ config OPEN_DICE > > If unsure, say N. > > +config VM_WATCHDOG > + tristate "Virtual Machine Watchdog" > + select LOCKUP_DETECTOR > + help > + Detect CPU locks on the virtual machine. This driver relies on the > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > + has to do a 'pet', it exits the guest through MMIO write and the > + backend driver takes into account the lost ticks for this particular > + CPU. There's nothing to keep this tied to a virtual machine at all, right? You are just relying on some iomem address to be updated, so it should be a "generic_iomem_watchdog" driver as there's nothing specific to vms at all from what I can tell. thanks, greg k-h ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 8:51 ` Greg Kroah-Hartman @ 2022-04-29 9:26 ` Sebastian Ene 2022-04-29 9:38 ` Greg Kroah-Hartman 2022-04-29 17:02 ` Guenter Roeck 0 siblings, 2 replies; 18+ messages in thread From: Sebastian Ene @ 2022-04-29 9:26 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, Rob Herring, Guenter Roeck On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote: > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > This driver creates per-cpu hrtimers which are required to do the > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > userspace is responsible for delivering the 'pet' events by writing to > > the particular /dev/watchdogN node. In this case we require a strong > > thread affinity to be able to account for lost time on a per vCPU. > > > > This part of the driver is the 'frontend' which is reponsible for > > delivering the periodic 'pet' events, configuring the virtual peripheral > > and listening for cpu hotplug events. The other part of the driver > > handles the peripheral emulation and this part accounts for lost time by > > looking at the /proc/{}/task/{}/stat entries and is located here: > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > --- > > drivers/misc/Kconfig | 12 +++ > > drivers/misc/Makefile | 1 + > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > 3 files changed, 219 insertions(+) > > create mode 100644 drivers/misc/vm-watchdog.c > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > index 2b9572a6d114..26c3a99e269c 100644 > > --- a/drivers/misc/Kconfig > > +++ b/drivers/misc/Kconfig > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > > If unsure, say N. > > > > +config VM_WATCHDOG > > + tristate "Virtual Machine Watchdog" > > + select LOCKUP_DETECTOR > > + help > > + Detect CPU locks on the virtual machine. This driver relies on the > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > + has to do a 'pet', it exits the guest through MMIO write and the > > + backend driver takes into account the lost ticks for this particular > > + CPU. Hi, > > There's nothing to keep this tied to a virtual machine at all, right? > You are just relying on some iomem address to be updated, so it should > be a "generic_iomem_watchdog" driver as there's nothing specific to vms > at all from what I can tell. > > thanks, > > greg k-h That's right although I might think of using the term "generic lockup detector" instead of watchdog. The only reason why I would keep "virtual machine" word in, is that there is no actual hardware for this. Thanks, Seb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 9:26 ` Sebastian Ene @ 2022-04-29 9:38 ` Greg Kroah-Hartman 2022-04-29 21:03 ` Rob Herring 2022-04-29 17:02 ` Guenter Roeck 1 sibling, 1 reply; 18+ messages in thread From: Greg Kroah-Hartman @ 2022-04-29 9:38 UTC (permalink / raw) To: Sebastian Ene Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, Rob Herring, Guenter Roeck On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote: > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote: > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > > This driver creates per-cpu hrtimers which are required to do the > > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > > userspace is responsible for delivering the 'pet' events by writing to > > > the particular /dev/watchdogN node. In this case we require a strong > > > thread affinity to be able to account for lost time on a per vCPU. > > > > > > This part of the driver is the 'frontend' which is reponsible for > > > delivering the periodic 'pet' events, configuring the virtual peripheral > > > and listening for cpu hotplug events. The other part of the driver > > > handles the peripheral emulation and this part accounts for lost time by > > > looking at the /proc/{}/task/{}/stat entries and is located here: > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > > --- > > > drivers/misc/Kconfig | 12 +++ > > > drivers/misc/Makefile | 1 + > > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 219 insertions(+) > > > create mode 100644 drivers/misc/vm-watchdog.c > > > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > > index 2b9572a6d114..26c3a99e269c 100644 > > > --- a/drivers/misc/Kconfig > > > +++ b/drivers/misc/Kconfig > > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > > > > If unsure, say N. > > > > > > +config VM_WATCHDOG > > > + tristate "Virtual Machine Watchdog" > > > + select LOCKUP_DETECTOR > > > + help > > > + Detect CPU locks on the virtual machine. This driver relies on the > > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > > + has to do a 'pet', it exits the guest through MMIO write and the > > > + backend driver takes into account the lost ticks for this particular > > > + CPU. > > Hi, > > > > > There's nothing to keep this tied to a virtual machine at all, right? > > You are just relying on some iomem address to be updated, so it should > > be a "generic_iomem_watchdog" driver as there's nothing specific to vms > > at all from what I can tell. > > > > thanks, > > > > greg k-h > > That's right although I might think of using the term "generic lockup detector" > instead of watchdog. The only reason why I would keep "virtual machine" > word in, is that there is no actual hardware for this. That doesn't really matter, it's just a memory location in device tree that you are needing, odds are some hardware device could use it just like this. thanks, greg k-h ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 9:38 ` Greg Kroah-Hartman @ 2022-04-29 21:03 ` Rob Herring 2022-05-04 7:29 ` Sebastian Ene 0 siblings, 1 reply; 18+ messages in thread From: Rob Herring @ 2022-04-29 21:03 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Sebastian Ene, linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, Guenter Roeck On Fri, Apr 29, 2022 at 11:38:52AM +0200, Greg Kroah-Hartman wrote: > On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote: > > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote: > > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > > > This driver creates per-cpu hrtimers which are required to do the > > > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > > > userspace is responsible for delivering the 'pet' events by writing to > > > > the particular /dev/watchdogN node. In this case we require a strong > > > > thread affinity to be able to account for lost time on a per vCPU. > > > > > > > > This part of the driver is the 'frontend' which is reponsible for > > > > delivering the periodic 'pet' events, configuring the virtual peripheral > > > > and listening for cpu hotplug events. The other part of the driver > > > > handles the peripheral emulation and this part accounts for lost time by > > > > looking at the /proc/{}/task/{}/stat entries and is located here: > > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > > > --- > > > > drivers/misc/Kconfig | 12 +++ > > > > drivers/misc/Makefile | 1 + > > > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > > > 3 files changed, 219 insertions(+) > > > > create mode 100644 drivers/misc/vm-watchdog.c > > > > > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > > > index 2b9572a6d114..26c3a99e269c 100644 > > > > --- a/drivers/misc/Kconfig > > > > +++ b/drivers/misc/Kconfig > > > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > > > > > > If unsure, say N. > > > > > > > > +config VM_WATCHDOG > > > > + tristate "Virtual Machine Watchdog" > > > > + select LOCKUP_DETECTOR > > > > + help > > > > + Detect CPU locks on the virtual machine. This driver relies on the > > > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > > > + has to do a 'pet', it exits the guest through MMIO write and the > > > > + backend driver takes into account the lost ticks for this particular > > > > + CPU. > > > > Hi, > > > > > > > > There's nothing to keep this tied to a virtual machine at all, right? > > > You are just relying on some iomem address to be updated, so it should > > > be a "generic_iomem_watchdog" driver as there's nothing specific to vms > > > at all from what I can tell. > > > > > > thanks, > > > > > > greg k-h > > > > That's right although I might think of using the term "generic lockup detector" > > instead of watchdog. The only reason why I would keep "virtual machine" > > word in, is that there is no actual hardware for this. > > That doesn't really matter, it's just a memory location in device tree > that you are needing, odds are some hardware device could use it just > like this. Such as a shared on-chip memory that both a system control processor and the main processors can access. Of course, those also typically already have a comnunication channel. But for a VM-hypervisor interface, why isn't one of the existing communications interfaces being used? One that is discoverable would be better than using DT. Rob ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 21:03 ` Rob Herring @ 2022-05-04 7:29 ` Sebastian Ene 0 siblings, 0 replies; 18+ messages in thread From: Sebastian Ene @ 2022-05-04 7:29 UTC (permalink / raw) To: Rob Herring Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, maz, Greg Kroah-Hartman On Fri, Apr 29, 2022 at 04:03:45PM -0500, Rob Herring wrote: > On Fri, Apr 29, 2022 at 11:38:52AM +0200, Greg Kroah-Hartman wrote: > > On Fri, Apr 29, 2022 at 09:26:26AM +0000, Sebastian Ene wrote: > > > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote: > > > > On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: > > > > > This driver creates per-cpu hrtimers which are required to do the > > > > > periodic 'pet' operation. On a conventional watchdog-core driver, the > > > > > userspace is responsible for delivering the 'pet' events by writing to > > > > > the particular /dev/watchdogN node. In this case we require a strong > > > > > thread affinity to be able to account for lost time on a per vCPU. > > > > > > > > > > This part of the driver is the 'frontend' which is reponsible for > > > > > delivering the periodic 'pet' events, configuring the virtual peripheral > > > > > and listening for cpu hotplug events. The other part of the driver > > > > > handles the peripheral emulation and this part accounts for lost time by > > > > > looking at the /proc/{}/task/{}/stat entries and is located here: > > > > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > > > > > > > Signed-off-by: Sebastian Ene <sebastianene@google.com> > > > > > --- > > > > > drivers/misc/Kconfig | 12 +++ > > > > > drivers/misc/Makefile | 1 + > > > > > drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ > > > > > 3 files changed, 219 insertions(+) > > > > > create mode 100644 drivers/misc/vm-watchdog.c > > > > > > > > > > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig > > > > > index 2b9572a6d114..26c3a99e269c 100644 > > > > > --- a/drivers/misc/Kconfig > > > > > +++ b/drivers/misc/Kconfig > > > > > @@ -493,6 +493,18 @@ config OPEN_DICE > > > > > > > > > > If unsure, say N. > > > > > > > > > > +config VM_WATCHDOG > > > > > + tristate "Virtual Machine Watchdog" > > > > > + select LOCKUP_DETECTOR > > > > > + help > > > > > + Detect CPU locks on the virtual machine. This driver relies on the > > > > > + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU > > > > > + has to do a 'pet', it exits the guest through MMIO write and the > > > > > + backend driver takes into account the lost ticks for this particular > > > > > + CPU. > > > > > > Hi, > > > > > > > > > > > There's nothing to keep this tied to a virtual machine at all, right? > > > > You are just relying on some iomem address to be updated, so it should > > > > be a "generic_iomem_watchdog" driver as there's nothing specific to vms > > > > at all from what I can tell. > > > > > > > > thanks, > > > > > > > > greg k-h > > > > > > That's right although I might think of using the term "generic lockup detector" > > > instead of watchdog. The only reason why I would keep "virtual machine" > > > word in, is that there is no actual hardware for this. > > > > That doesn't really matter, it's just a memory location in device tree > > that you are needing, odds are some hardware device could use it just > > like this. Hi, > > Such as a shared on-chip memory that both a system control processor and > the main processors can access. Of course, those also typically already > have a comnunication channel. > > But for a VM-hypervisor interface, why isn't one of the existing > communications interfaces being used? One that is discoverable would be > better than using DT. > In a protected VM we don't trust the host to present and control the loaded peripherals. We rely on another entity to generate a trusted device tree for us. I hope this clarifies the need for DT and I think this information should also be added in the changelog. > Rob Thanks, Seb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs 2022-04-29 9:26 ` Sebastian Ene 2022-04-29 9:38 ` Greg Kroah-Hartman @ 2022-04-29 17:02 ` Guenter Roeck 1 sibling, 0 replies; 18+ messages in thread From: Guenter Roeck @ 2022-04-29 17:02 UTC (permalink / raw) To: Sebastian Ene, Greg Kroah-Hartman Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, Rob Herring On 4/29/22 02:26, Sebastian Ene wrote: > On Fri, Apr 29, 2022 at 10:51:14AM +0200, Greg Kroah-Hartman wrote: >> On Fri, Apr 29, 2022 at 08:30:33AM +0000, Sebastian Ene wrote: >>> This driver creates per-cpu hrtimers which are required to do the >>> periodic 'pet' operation. On a conventional watchdog-core driver, the >>> userspace is responsible for delivering the 'pet' events by writing to >>> the particular /dev/watchdogN node. In this case we require a strong >>> thread affinity to be able to account for lost time on a per vCPU. >>> >>> This part of the driver is the 'frontend' which is reponsible for >>> delivering the periodic 'pet' events, configuring the virtual peripheral >>> and listening for cpu hotplug events. The other part of the driver >>> handles the peripheral emulation and this part accounts for lost time by >>> looking at the /proc/{}/task/{}/stat entries and is located here: >>> https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 >>> >>> Signed-off-by: Sebastian Ene <sebastianene@google.com> >>> --- >>> drivers/misc/Kconfig | 12 +++ >>> drivers/misc/Makefile | 1 + >>> drivers/misc/vm-watchdog.c | 206 +++++++++++++++++++++++++++++++++++++ >>> 3 files changed, 219 insertions(+) >>> create mode 100644 drivers/misc/vm-watchdog.c >>> >>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig >>> index 2b9572a6d114..26c3a99e269c 100644 >>> --- a/drivers/misc/Kconfig >>> +++ b/drivers/misc/Kconfig >>> @@ -493,6 +493,18 @@ config OPEN_DICE >>> >>> If unsure, say N. >>> >>> +config VM_WATCHDOG >>> + tristate "Virtual Machine Watchdog" >>> + select LOCKUP_DETECTOR >>> + help >>> + Detect CPU locks on the virtual machine. This driver relies on the >>> + hrtimers which are CPU-binded to do the 'pet' operation. When a vCPU >>> + has to do a 'pet', it exits the guest through MMIO write and the >>> + backend driver takes into account the lost ticks for this particular >>> + CPU. > > Hi, > >> >> There's nothing to keep this tied to a virtual machine at all, right? >> You are just relying on some iomem address to be updated, so it should >> be a "generic_iomem_watchdog" driver as there's nothing specific to vms >> at all from what I can tell. >> >> thanks, >> >> greg k-h > > That's right although I might think of using the term "generic lockup detector" Agreed, that would be a much better name. Guenter > instead of watchdog. The only reason why I would keep "virtual machine" > word in, is that there is no actual hardware for this. > > Thanks, > Seb ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 0/2] Detect stalls on guest vCPUS 2022-04-29 8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene @ 2022-04-29 20:25 ` Rob Herring 2022-05-02 6:03 ` Sebastian Ene 2 siblings, 1 reply; 18+ messages in thread From: Rob Herring @ 2022-04-29 20:25 UTC (permalink / raw) To: Sebastian Ene Cc: Greg Kroah-Hartman, Arnd Bergmann, Dragan Cvetic, linux-kernel, devicetree, maz, will, qperret, Guenter Roeck On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote: > This adds a mechanism to detect stalls on the guest vCPUS by creating a > per CPU hrtimer which periodically 'pets' the host backend driver. > On a conventional watchdog-core driver, the userspace is responsible for > delivering the 'pet' events by writing to the particular /dev/watchdogN node. > In this case we require a strong thread affinity to be able to > account for lost time on a per vCPU basis. > > This device driver acts as a soft lockup detector by relying on the host > backend driver to measure the elapesed time between subsequent 'pet' events. > If the elapsed time doesn't match an expected value, the backend driver > decides that the guest vCPU is locked and resets the guest. The host > backend driver takes into account the time that the guest is not > running. The communication with the backend driver is done through MMIO > and the register layout of the virtual watchdog is described as part of > the backend driver changes. > > The host backend driver is implemented as part of: > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > Changelog v4: > - rename the source from vm-wdt.c -> vm-watchdog.c > - convert all the error logging calls from pr_* to dev_* calls > - rename the DTS node "clock" to "clock-frequency" Why do I have a v4 now when the discussion on v3 is not concluded. Give folks some time to respond. We're busy drinking from the firehose. Rob ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 0/2] Detect stalls on guest vCPUS 2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring @ 2022-05-02 6:03 ` Sebastian Ene 0 siblings, 0 replies; 18+ messages in thread From: Sebastian Ene @ 2022-05-02 6:03 UTC (permalink / raw) To: Rob Herring Cc: linux-kernel, Derek Kiernan, Dragan Cvetic, Arnd Bergmann, devicetree, qperret, will, maz, Greg Kroah-Hartman, Guenter Roeck On Fri, Apr 29, 2022 at 03:25:45PM -0500, Rob Herring wrote: > On Fri, Apr 29, 2022 at 08:30:29AM +0000, Sebastian Ene wrote: > > This adds a mechanism to detect stalls on the guest vCPUS by creating a > > per CPU hrtimer which periodically 'pets' the host backend driver. > > On a conventional watchdog-core driver, the userspace is responsible for > > delivering the 'pet' events by writing to the particular /dev/watchdogN node. > > In this case we require a strong thread affinity to be able to > > account for lost time on a per vCPU basis. > > > > This device driver acts as a soft lockup detector by relying on the host > > backend driver to measure the elapesed time between subsequent 'pet' events. > > If the elapsed time doesn't match an expected value, the backend driver > > decides that the guest vCPU is locked and resets the guest. The host > > backend driver takes into account the time that the guest is not > > running. The communication with the backend driver is done through MMIO > > and the register layout of the virtual watchdog is described as part of > > the backend driver changes. > > > > The host backend driver is implemented as part of: > > https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/3548817 > > > > Changelog v4: > > - rename the source from vm-wdt.c -> vm-watchdog.c > > - convert all the error logging calls from pr_* to dev_* calls > > - rename the DTS node "clock" to "clock-frequency" Hi, > > Why do I have a v4 now when the discussion on v3 is not concluded. Give > folks some time to respond. We're busy drinking from the firehose. > I am trying to address the issues incrementlly keeping a week cadence. Any feedback on this is welcomed. Thanks, Seb > Rob ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-05-07 8:16 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-29 8:30 [PATCH v4 0/2] Detect stalls on guest vCPUS Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 1/2] dt-bindings: vm-wdt: Add qemu,vm-watchdog compatible Sebastian Ene 2022-04-29 15:53 ` Rob Herring 2022-05-07 8:16 ` Sebastian Ene 2022-04-29 8:30 ` [PATCH v4 2/2] misc: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene 2022-04-29 8:48 ` Greg Kroah-Hartman 2022-04-29 16:51 ` Guenter Roeck 2022-04-30 6:18 ` Greg Kroah-Hartman 2022-04-30 12:36 ` Guenter Roeck 2022-05-02 5:58 ` Sebastian Ene 2022-04-29 8:51 ` Greg Kroah-Hartman 2022-04-29 9:26 ` Sebastian Ene 2022-04-29 9:38 ` Greg Kroah-Hartman 2022-04-29 21:03 ` Rob Herring 2022-05-04 7:29 ` Sebastian Ene 2022-04-29 17:02 ` Guenter Roeck 2022-04-29 20:25 ` [PATCH v4 0/2] Detect stalls on guest vCPUS Rob Herring 2022-05-02 6:03 ` Sebastian Ene
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).